[jira] [Updated] (HUDI-1790) Add SqlSource for DeltaStreamer to support backfill use cases

2021-04-28 Thread Vinoth Govindarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Govindarajan updated HUDI-1790:
--
Status: Patch Available  (was: In Progress)

> Add SqlSource for DeltaStreamer to support backfill use cases
> -
>
> Key: HUDI-1790
> URL: https://issues.apache.org/jira/browse/HUDI-1790
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: pull-request-available
>
> Delta Streamer works great for incremental workloads, but we also need to
> support backfills for use cases such as adding a new column and backfilling
> only that column for the last 6 months, or reprocessing a couple of older
> partitions after a bug in our transformation logic.
> 
> If we add a SqlSource as one of the input sources to the Delta Streamer, we
> can pass any custom Spark SQL query that selects specific partitions and
> backfill them.
> 
> When we do the backfill, we shouldn't advance the last processed commit
> checkpoint; instead, the checkpoint from before the backfill should be
> carried over to the backfill commit.
>  
> cc [~nishith29]
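
A minimal, purely illustrative sketch of how such a backfill run could be configured. The property key and the source class name mentioned below are assumptions made for this sketch, not the final names:

```java
import org.apache.hudi.common.config.TypedProperties;

public class SqlSourceBackfillConfigSketch {
  public static void main(String[] args) {
    TypedProperties props = new TypedProperties();

    // Assumed property key: a plain Spark SQL snapshot query that selects only
    // the partitions (six months here) that need to be rewritten.
    props.setProperty("hoodie.deltastreamer.source.sql.sql.query",
        "SELECT * FROM warehouse.events WHERE ds BETWEEN '2020-10-01' AND '2021-03-31'");

    // The Delta Streamer job would be launched with --source-class-name pointing
    // at the SQL-backed source (name assumed), and the commit checkpoint recorded
    // before the backfill would be carried over unchanged to the backfill commit.
    System.out.println(props);
  }
}
```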



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1790) Add SqlSource for DeltaStreamer to support backfill use cases

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1790:
-
Labels: pull-request-available  (was: )

> Add SqlSource for DeltaStreamer to support backfill use cases
> -
>
> Key: HUDI-1790
> URL: https://issues.apache.org/jira/browse/HUDI-1790
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Govindarajan
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: pull-request-available
>
> Delta Streamer works great for incremental workloads, but we also need to
> support backfills for use cases such as adding a new column and backfilling
> only that column for the last 6 months, or reprocessing a couple of older
> partitions after a bug in our transformation logic.
> 
> If we add a SqlSource as one of the input sources to the Delta Streamer, we
> can pass any custom Spark SQL query that selects specific partitions and
> backfill them.
> 
> When we do the backfill, we shouldn't advance the last processed commit
> checkpoint; instead, the checkpoint from before the backfill should be
> carried over to the backfill commit.
>  
> cc [~nishith29]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vingov opened a new pull request #2896: [HUDI-1790] Added SqlSource to fetch data from any partitions for backfill use case

2021-04-28 Thread GitBox


vingov opened a new pull request #2896:
URL: https://github.com/apache/hudi/pull/2896


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *This pull request adds a new source to the Delta Streamer that performs snapshot 
queries, mainly used for backfilling historical partitions.*
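
   As a rough illustration of the idea (this is not the code in this pull request), a Row-based source could run the user-supplied query and deliberately leave the checkpoint untouched. The sketch below assumes the existing `RowSource` base class in hudi-utilities; the property key and class name are made up for illustration:

```java
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.collection.Pair;
import org.apache.hudi.utilities.schema.SchemaProvider;
import org.apache.hudi.utilities.sources.RowSource;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlBackfillSourceSketch extends RowSource {

  // Assumed key holding the user-supplied snapshot query.
  private static final String SOURCE_SQL_QUERY = "hoodie.deltastreamer.source.sql.sql.query";

  public SqlBackfillSourceSketch(TypedProperties props, JavaSparkContext sparkContext,
                                 SparkSession sparkSession, SchemaProvider schemaProvider) {
    super(props, sparkContext, sparkSession, schemaProvider);
  }

  @Override
  protected Pair<Option<Dataset<Row>>, String> fetchNextBatch(Option<String> lastCkptStr, long sourceLimit) {
    // Run whatever snapshot query the user supplied, e.g. selecting six months of partitions.
    Dataset<Row> rows = sparkSession.sql(props.getString(SOURCE_SQL_QUERY));
    // Do not advance the checkpoint: hand back the pre-backfill checkpoint so the
    // regular incremental pipeline resumes exactly where it left off.
    return Pair.of(Option.of(rows), lastCkptStr.isPresent() ? lastCkptStr.get() : null);
  }
}
```

   Leaving the checkpoint untouched is what lets a backfill run sit alongside the regular incremental runs without replaying or skipping data.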
   
   ## Brief change log
   
 - *Added a new SqlSource to the Delta Streamer to handle backfills via snapshot 
queries over any specific date range.*
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
 - *Added TestSqlSource to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2895: [HUDI-1867] Streaming read for Flink COW table

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2895:
URL: https://github.com/apache/hudi/pull/2895#issuecomment-828943047


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2895](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b8f0f53) into 
[master](https://codecov.io/gh/apache/hudi/commit/c9bcb5e33f7f9f97af0e8429a88d95f58ee48f13?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (c9bcb5e) will **increase** coverage by `5.04%`.
   > The diff coverage is `10.41%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2895/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2895      +/-   ##
    ============================================
    + Coverage     47.90%   52.94%   +5.04%
    - Complexity     3421     3748     +327
    ============================================
      Files           488      488
      Lines         23529    23572      +43
      Branches       2501     2507       +6
    ============================================
    + Hits          11271    12480    +1209
    + Misses        11277     9991    -1286
    - Partials        981     1101     +120
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (ø)` | `220.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <ø> (ø)` | `1975.00 <ø> (ø)` | |
   | hudiflink | `59.07% <10.41%> (-0.60%)` | `537.00 <0.00> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `73.33% <ø> (ø)` | `237.00 <ø> (ø)` | |
   | hudisync | `46.73% <ø> (ø)` | `144.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.75% <ø> (+60.39%)` | `375.00 <ø> (+327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/table/HoodieTableSource.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNvdXJjZS5qYXZh)
 | `59.56% <0.00%> (-3.81%)` | `26.00 <0.00> (ø)` | |
   | 
[.../hudi/table/format/mor/MergeOnReadInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvbW9yL01lcmdlT25SZWFkSW5wdXRGb3JtYXQuamF2YQ==)
 | `66.52% <3.44%> (-8.96%)` | `18.00 <0.00> (ø)` | |
   | 
[...ache/hudi/source/StreamReadMonitoringFunction.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE1vbml0b3JpbmdGdW5jdGlvbi5qYXZh)
 | `76.22% <80.00%> (-0.04%)` | `35.00 <0.00> (ø)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `88.79% <0.00%> (+5.17%)` | `28.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==)
 | `100.00% <0.00%> (+11.11%)` | `4.00% <0.00%> (+1.00%)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2895: [HUDI-1867] Streaming read for Flink COW table

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2895:
URL: https://github.com/apache/hudi/pull/2895#issuecomment-828943047


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2895](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b8f0f53) into 
[master](https://codecov.io/gh/apache/hudi/commit/c9bcb5e33f7f9f97af0e8429a88d95f58ee48f13?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (c9bcb5e) will **increase** coverage by `3.31%`.
   > The diff coverage is `10.41%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2895/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2895      +/-   ##
    ============================================
    + Coverage     47.90%   51.21%   +3.31%
    + Complexity     3421     3305     -116
    ============================================
      Files           488      425      -63
      Lines         23529    20095    -3434
      Branches       2501     2089     -412
    ============================================
    - Hits          11271    10292     -979
    + Misses        11277     8947    -2330
    + Partials        981      856     -125
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (ø)` | `220.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <ø> (ø)` | `1975.00 <ø> (ø)` | |
   | hudiflink | `59.07% <10.41%> (-0.60%)` | `537.00 <0.00> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.75% <ø> (+60.39%)` | `375.00 <ø> (+327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/table/HoodieTableSource.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNvdXJjZS5qYXZh)
 | `59.56% <0.00%> (-3.81%)` | `26.00 <0.00> (ø)` | |
   | 
[.../hudi/table/format/mor/MergeOnReadInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvbW9yL01lcmdlT25SZWFkSW5wdXRGb3JtYXQuamF2YQ==)
 | `66.52% <3.44%> (-8.96%)` | `18.00 <0.00> (ø)` | |
   | 
[...ache/hudi/source/StreamReadMonitoringFunction.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE1vbml0b3JpbmdGdW5jdGlvbi5qYXZh)
 | `76.22% <80.00%> (-0.04%)` | `35.00 <0.00> (ø)` | |
   | 
[.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrVXRpbHMuc2NhbGE=)
 | | | |
   | 
[...nal/HoodieBulkInsertDataInternalWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0hvb2RpZUJ1bGtJbnNlcnREYXRhSW50ZXJuYWxXcml0ZXJGYWN0b3J5LmphdmE=)
 | | | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `16.73%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@               Coverage Diff               @@
    ##             master    #2892       +/-   ##
    =============================================
    - Coverage     69.75%   53.01%   -16.74%
    - Complexity      375     3746     +3371
    =============================================
      Files            54      488      +434
      Lines          1997    23527    +21530
      Branches        236     2501     +2265
    =============================================
    + Hits           1393    12474    +11081
    - Misses          473     9953     +9480
    - Partials        131     1100      +969
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.41% <ø> (?)` | `1976.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[...sioning/clean/CleanMetadataV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYxTWlncmF0aW9uSGFuZGxlci5qYXZh)
 | `10.00% <0.00%> (ø)` | `3.00% <0.00%> (?%)` | |
   | 
[...apache/hudi/common/engine/TaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9UYXNrQ29udGV4dFN1cHBsaWVyLmphdmE=)
 | `100.00% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | 
[...udi/common/table/log/block/HoodieCommandBlock.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVDb21tYW5kQmxvY2suamF2YQ==)
 | `100.00% <0.00%> (ø)` | `6.00% <0.00%> (?%)` | |
   | 
[...he/hudi/common/model/EmptyHoodieRecordPayload.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0VtcHR5SG9vZGllUmVjb3JkUGF5bG9hZC5qYXZh)
 | `0.00% <0.00%> 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2895: [HUDI-1867] Streaming read for Flink COW table

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2895:
URL: https://github.com/apache/hudi/pull/2895#issuecomment-828943047


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2895](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b8f0f53) into 
[master](https://codecov.io/gh/apache/hudi/commit/c9bcb5e33f7f9f97af0e8429a88d95f58ee48f13?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (c9bcb5e) will **increase** coverage by `21.85%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2895/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@               Coverage Diff               @@
    ##             master    #2895       +/-   ##
    =============================================
    + Coverage     47.90%   69.75%   +21.85%
    + Complexity     3421      375     -3046
    =============================================
      Files           488       54      -434
      Lines         23529     1997    -21532
      Branches       2501      236     -2265
    =============================================
    - Hits          11271     1393     -9878
    + Misses        11277      473    -10804
    + Partials        981      131      -850
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.75% <ø> (+60.39%)` | `375.00 <ø> (+327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...he/hudi/metadata/HoodieMetadataFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFGaWxlU3lzdGVtVmlldy5qYXZh)
 | | | |
   | 
[...g/apache/hudi/common/util/RocksDBSchemaHelper.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUm9ja3NEQlNjaGVtYUhlbHBlci5qYXZh)
 | | | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | | | |
   | 
[...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=)
 | | | |
   | 
[...on/table/log/block/HoodieAvroDataBlockVersion.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVBdnJvRGF0YUJsb2NrVmVyc2lvbi5qYXZh)
 | | | |
   | 
[...a/org/apache/hudi/common/util/CollectionUtils.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29sbGVjdGlvblV0aWxzLmphdmE=)
 | | | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `16.73%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@               Coverage Diff               @@
    ##             master    #2892       +/-   ##
    =============================================
    - Coverage     69.75%   53.01%   -16.74%
    - Complexity      375     3746     +3371
    =============================================
      Files            54      488      +434
      Lines          1997    23527    +21530
      Branches        236     2501     +2265
    =============================================
    + Hits           1393    12474    +11081
    - Misses          473     9953     +9480
    - Partials        131     1100      +969
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.41% <ø> (?)` | `1976.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...a/org/apache/hudi/common/bloom/InternalFilter.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0ludGVybmFsRmlsdGVyLmphdmE=)
 | `46.34% <0.00%> (ø)` | `4.00% <0.00%> (?%)` | |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `100.00% <0.00%> (ø)` | `2.00% <0.00%> (?%)` | |
   | 
[...mmon/table/log/HoodieUnMergedLogRecordScanner.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVVbk1lcmdlZExvZ1JlY29yZFNjYW5uZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
  

[GitHub] [hudi] codecov-commenter commented on pull request #2895: [HUDI-1867] Streaming read for Flink COW table

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2895:
URL: https://github.com/apache/hudi/pull/2895#issuecomment-828943047


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2895](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b8f0f53) into 
[master](https://codecov.io/gh/apache/hudi/commit/c9bcb5e33f7f9f97af0e8429a88d95f58ee48f13?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (c9bcb5e) will **decrease** coverage by `38.53%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2895/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master   #2895      +/-   ##
    ============================================
    - Coverage     47.90%   9.36%   -38.54%
    + Complexity     3421      48     -3373
    ============================================
      Files           488      54      -434
      Lines         23529    1997    -21532
      Branches       2501     236     -2265
    ============================================
    - Hits          11271     187    -11084
    + Misses        11277    1797     -9480
    + Partials        981      13      -968
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (ø)` | `48.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2895?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../main/java/org/apache/hudi/util/AvroConvertor.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL0F2cm9Db252ZXJ0b3IuamF2YQ==)
 | | | |
   | 
[...java/org/apache/hudi/sink/StreamWriteOperator.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3IuamF2YQ==)
 | | | |
   | 
[...va/org/apache/hudi/metadata/BaseTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvQmFzZVRhYmxlTWV0YWRhdGEuamF2YQ==)
 | | | |
   | 
[.../org/apache/hudi/metadata/HoodieTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllVGFibGVNZXRhZGF0YS5qYXZh)
 | | | |
   | 
[.../org/apache/hudi/common/metrics/LocalRegistry.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21ldHJpY3MvTG9jYWxSZWdpc3RyeS5qYXZh)
 | | | |
   | 
[...3/internal/HoodieDataSourceInternalBatchWrite.java](https://codecov.io/gh/apache/hudi/pull/2895/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlLmphdmE=)
 | | | |
   | 

[GitHub] [hudi] danny0405 closed pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


danny0405 closed pull request #2892:
URL: https://github.com/apache/hudi/pull/2892


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


danny0405 commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828941274


   Fine, let's keep them for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1867) Streaming read for Flink COW table

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1867:
-
Labels: pull-request-available  (was: )

> Streaming read for Flink COW table
> --
>
> Key: HUDI-1867
> URL: https://issues.apache.org/jira/browse/HUDI-1867
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Supports streaming read for Copy On Write table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 opened a new pull request #2895: [HUDI-1867] Streaming read for Flink COW table

2021-04-28 Thread GitBox


danny0405 opened a new pull request #2895:
URL: https://github.com/apache/hudi/pull/2895


   Supports streaming read for Copy On Write table.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r622719034



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/UuidKeyGenerator.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.UUID;
+import java.util.stream.Collectors;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+
+/**
+ * A KeyGenerator which uses a UUID as the record key.
+ */
+public class UuidKeyGenerator extends BuiltinKeyGenerator {

Review comment:
   That's great, I will try it in the 1840.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r622716410



##
File path: pom.xml
##
@@ -112,6 +112,7 @@
 3.0.0
 
 3
+hudi-spark2

Review comment:
   ok

##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.types.{DoubleType, IntegerType, LongType, 
StringType, StructField}
+
+class TestCreateTable extends TestHoodieSqlBase {
+
+  test("Test Create Managed Hoodie Table") {
+val tableName = generateTableName
+// Create a managed table
+spark.sql(
+  s"""
+ | create table $tableName (

Review comment:
   +1 for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2833: [HUDI-89] Add configOption & refactor HoodieBootstrapConfig for a demo

2021-04-28 Thread GitBox


vinothchandar commented on pull request #2833:
URL: https://github.com/apache/hudi/pull/2833#issuecomment-828914347


   @zhedoubushishi please ping me when this is ready to go and we have all the 
configs covered.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


yanghua commented on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828909873


   @vinothchandar Can you join in and review this PR? The change is outside my 
area of expertise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2894: [HUDI-1620] Fix Metrics UTs and remove maven profile for azure tests

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2894:
URL: https://github.com/apache/hudi/pull/2894#issuecomment-828895522


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2894](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b91aef6) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `16.74%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2894/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@               Coverage Diff               @@
    ##             master    #2894       +/-   ##
    =============================================
    - Coverage     69.75%   53.00%   -16.75%
    - Complexity      375     3745     +3370
    =============================================
      Files            54      488      +434
      Lines          1997    23527    +21530
      Branches        236     2501     +2265
    =============================================
    + Hits           1393    12471    +11078
    - Misses          473     9956     +9483
    - Partials        131     1100      +969
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <ø> (?)` | `1975.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...org/apache/hudi/common/table/log/AppendResult.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9BcHBlbmRSZXN1bHQuamF2YQ==)
 | `100.00% <0.00%> (ø)` | `4.00% <0.00%> (?%)` | |
   | 
[...mmon/table/log/AbstractHoodieLogRecordScanner.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9BYnN0cmFjdEhvb2RpZUxvZ1JlY29yZFNjYW5uZXIuamF2YQ==)
 | `80.00% <0.00%> (ø)` | `34.00% <0.00%> (?%)` | |
   | 
[.../common/util/queue/IteratorBasedQueueProducer.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvSXRlcmF0b3JCYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2894: [HUDI-1620] Fix Metrics UTs and remove maven profile for azure tests

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2894:
URL: https://github.com/apache/hudi/pull/2894#issuecomment-828895522






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2894: [HUDI-1620] Fix Metrics UTs and remove maven profile for azure tests

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2894:
URL: https://github.com/apache/hudi/pull/2894#issuecomment-828895522


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2894](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b91aef6) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `16.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2894/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff               @@
    ##             master    #2894       +/-   ##
    =============================================
    - Coverage     69.75%   53.60%   -16.15%
    - Complexity      375      594      +219
    =============================================
      Files            54       94       +40
      Lines          1997     4281     +2284
      Branches        236      496      +260
    =============================================
    + Hits           1393     2295      +902
    - Misses          473     1781     +1308
    - Partials        131      205       +74
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[...n/java/org/apache/hudi/cli/HoodieSplashScreen.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVNwbGFzaFNjcmVlbi5qYXZh)
 | `42.85% <0.00%> (ø)` | `2.00% <0.00%> (?%)` | |
   | 
[...i-cli/src/main/java/org/apache/hudi/cli/Table.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL1RhYmxlLmphdmE=)
 | `60.78% <0.00%> (ø)` | `12.00% <0.00%> (?%)` | |
   | 
[...ain/scala/org/apache/hudi/cli/DedupeSparkJob.scala](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZWR1cGVTcGFya0pvYi5zY2FsYQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...rg/apache/hudi/cli/commands/CompactionCommand.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0NvbXBhY3Rpb25Db21tYW5kLmphdmE=)
 | `30.18% <0.00%> (ø)` | `22.00% <0.00%> (?%)` | |
   | 
[...rc/main/scala/org/apache/hudi/cli/DeDupeType.scala](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZUR1cGVUeXBlLnNjYWxh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] yanghua commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


yanghua commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828903309


   > > I still insist that we need to include kafka-related dependencies. If you 
   > > look back at the HoodieFlinkStreamerV2 class, what is it in essence? It is 
   > > just a program written using the Flink DataStream API, specific to one 
   > > pipeline (Kafka -> Hudi).
   > 
   > No, no one says that they don't know how to add a connector jar, and 
   > actually few people use the `HoodieFlinkStreamerV2` tool.
   
   "no one says that they don't know how to add a connector jar" -> I recommend we 
   package it into the bundle for users. It's not that users can't add it, but they 
   should not need to be aware of these details. This is a question of user 
   experience.
   
   Following your logic, why do you think users would not use FlinkWriteClient 
   directly? Why should we guide users to use Flink SQL? Couldn't users write the 
   FlinkStreamer class by themselves?
   
   All of this is about shielding users from details as much as possible, letting 
   the framework provide out-of-the-box capabilities, and giving as good an 
   experience as possible. Isn't it?
   
   "actually few people use the `HoodieFlinkStreamerV2` tool" -> Actually, there 
   are still few users of the Flink write client, because it was not yet 
   production-ready in 0.8, as you know. IMO, we don't have enough samples to 
   support that conclusion.
   
   I have never understood why we cannot include the Kafka connector to provide 
   convenience to the users who do not use SQL. It should also offer an experience 
   consistent with the Spark-based DeltaStreamer; otherwise, don't call it 
   "FlinkStreamerXXX".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2894: [HUDI-1620] Fix Metrics UTs and remove maven profile for azure tests

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2894:
URL: https://github.com/apache/hudi/pull/2894#issuecomment-828895522


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2894](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b91aef6) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2894/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2894      +/-   ##
    ============================================
    - Coverage     69.75%   69.70%   -0.06%
    + Complexity      375      374       -1
    ============================================
      Files            54       54
      Lines          1997     1997
      Branches        236      236
    ============================================
    - Hits           1393     1392       -1
      Misses          473      473
    - Partials        131      132       +1
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


vinothchandar commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r622689426



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestCreateTable.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.types.{DoubleType, IntegerType, LongType, 
StringType, StructField}
+
+class TestCreateTable extends TestHoodieSqlBase {
+
+  test("Test Create Managed Hoodie Table") {
+    val tableName = generateTableName
+    // Create a managed table
+    spark.sql(
+      s"""
+         | create table $tableName (

Review comment:
   Can we file a JIRA to write our own DFS-based catalog? We could also 
extend it to work with a metaserver down the line.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


vinothchandar commented on a change in pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#discussion_r622686509



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/UuidKeyGenerator.java
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.UUID;
+import java.util.stream.Collectors;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+
+/**
+ * A KeyGenerator which use the uuid as the record key.
+ */
+public class UuidKeyGenerator extends BuiltinKeyGenerator {

Review comment:
   I have a better suggestion. Could you try and explore time-ordered UUIDs 
instead?
   
   https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/ 
   https://github.com/f4b6a3/uuid-creator 
   
   This should do it, I think. We need not make changes to pass in the commit time per 
se; I was using that as an example. It would be good to do this in the first go 
itself, so that users don't have to regenerate/rewrite datasets.
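   
   To make the suggestion concrete, here is a minimal, self-contained Java sketch of a time-ordered identifier in the UUIDv7 style, with the millisecond timestamp in the high bits so generated keys sort roughly by creation time. This is illustration only, not the PR's key generator; the class and method names are hypothetical, and it deliberately does not depend on the uuid-creator library linked above.
   
   ```java
import java.security.SecureRandom;
import java.util.UUID;

public class TimeOrderedUuidExample {
  private static final SecureRandom RANDOM = new SecureRandom();

  // Builds a UUIDv7-style value: 48-bit epoch-millis timestamp, 4-bit version,
  // 12 random bits in the most-significant half; variant bits plus 62 random
  // bits in the least-significant half. Keys created later sort higher.
  public static UUID timeOrderedUuid() {
    long millis = System.currentTimeMillis();
    long msb = (millis << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
    long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
    return new UUID(msb, lsb);
  }

  public static void main(String[] args) {
    // Successive calls produce lexicographically increasing prefixes.
    System.out.println(timeOrderedUuid());
    System.out.println(timeOrderedUuid());
  }
}
   ```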

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/MultiPartKeysValueExtractor.java
##
@@ -31,6 +32,11 @@
 
   @Override
   public List<String> extractPartitionValuesInPath(String partitionPath) {
+    // If the partitionPath is an empty string (which means a non-partitioned table), the partition values

Review comment:
   2876 looks good. merged. 

##
File path: pom.xml
##
@@ -112,6 +112,7 @@
 3.0.0
 
 3
+hudi-spark2

Review comment:
   rename this to `hudi.spark.module`? There is a typo.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (3ca9030 -> c9bcb5e)

2021-04-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 3ca9030  [HUDI-1858] Fix cannot create table due to jar conflict 
(#2886)
 add c9bcb5e  [HUDI-1845] Exception Throws When Sync Non-Partitioned Table 
To Hive With MultiPartKeysValueExtractor (#2876)

No new revisions were added by this update.

Summary of changes:
 .../hudi/hive/MultiPartKeysValueExtractor.java |  6 
 .../hudi/hive/TestMultiPartKeysValueExtractor.java | 39 +-
 2 files changed, 21 insertions(+), 24 deletions(-)
 copy 
hudi-common/src/test/java/org/apache/hudi/common/model/TestHoodieDeltaWriteStat.java
 => 
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestMultiPartKeysValueExtractor.java
 (53%)


[GitHub] [hudi] vinothchandar merged pull request #2876: [HUDI-1845] Exception Throws When Sync Non-Partitioned Table To Hive …

2021-04-28 Thread GitBox


vinothchandar merged pull request #2876:
URL: https://github.com/apache/hudi/pull/2876


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #2894: [HUDI-1620] Fix Metrics UTs and remove maven profile for azure tests

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2894:
URL: https://github.com/apache/hudi/pull/2894#issuecomment-828895522


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2894](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b91aef6) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `60.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2894/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2894   +/-   ##
   
   - Coverage 69.75%   9.36%   -60.40% 
   + Complexity  375  48  -327 
   
 Files54  54   
 Lines  19971997   
 Branches236 236   
   
   - Hits   1393 187 -1206 
   - Misses  4731797 +1324 
   + Partials131  13  -118 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (-60.40%)` | `48.00 <ø> (-327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2894?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2894/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 

[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


pengzhiwei2018 edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-828893245


   > @pengzhiwei2018 could we make the spark-shell experience better? I think 
we need the extensions added by default when the jar is pulled in?
   > 
   > ```scala
   > $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   > 
   > scala> spark.sql("create table t1 (id int, name string, price double, ts 
long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show 
   > t, returning NoSuchObjectException
   > org.apache.hudi.exception.HoodieException: 'path' or 
'hoodie.datasource.read.paths' or both must be specified.
   >   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
   >   at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
   >   at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   >   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   >   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
   >   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   >   at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   > ```
   
   Hi @vinothchandar , you can test this by the following command
   
   - Using spark-sql
   
   > spark-sql --jars $HUDI_SPARK_BUNDLE \\
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \\
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   - Using spark-shell
   
   > spark-shell --jars $HUDI_SPARK_BUNDLE \\
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \\
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   
   just set the `spark.sql.extensions` to 
`org.apache.spark.sql.hudi.HoodieSparkSessionExtension`.
   IMO this conf is just like `spark.serializer`, which should be specified 
when creating the `SparkSession`. So it is hard to set it automatically when installing the 
hudi jar.
   Thanks~
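   
   For completeness, a minimal Java sketch of the same configuration applied programmatically when building the `SparkSession`, assuming the Hudi Spark bundle is already on the classpath (the class name and app name are illustrative):
   
   ```java
import org.apache.spark.sql.SparkSession;

public class HudiSqlExtensionExample {
  public static void main(String[] args) {
    // Same settings as the spark-sql / spark-shell commands above, set in code.
    SparkSession spark = SparkSession.builder()
        .appName("hudi-sql-extension-example")
        .master("local[*]")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .getOrCreate();

    // With the extension registered, Hudi SQL statements can be parsed by this session.
    spark.sql("show tables").show();
  }
}
   ```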
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1867) Streaming read for Flink COW table

2021-04-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-1867:


 Summary: Streaming read for Flink COW table
 Key: HUDI-1867
 URL: https://issues.apache.org/jira/browse/HUDI-1867
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Supports streaming read for Copy On Write table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


pengzhiwei2018 edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-828893245


   > @pengzhiwei2018 could we make the spark-shell experience better? I think 
we need the extensions added by default when the jar is pulled in?
   > 
   > ```scala
   > $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   > 
   > scala> spark.sql("create table t1 (id int, name string, price double, ts 
long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show 
   > t, returning NoSuchObjectException
   > org.apache.hudi.exception.HoodieException: 'path' or 
'hoodie.datasource.read.paths' or both must be specified.
   >   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
   >   at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
   >   at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   >   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   >   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
   >   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   >   at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   > ```
   
   Hi @vinothchandar , you can test this by the following command
   
   - Using spark-sql
   
   > spark-sql --jars $HUDI_SPARK_BUNDLE \\
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \\
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   - Using spark-shell
   
   > spark-shell --jars $HUDI_SPARK_BUNDLE \\
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \\
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   
   just set the `spark.sql.extensions` to 
`org.apache.spark.sql.hudi.HoodieSparkSessionExtension`.
   Thanks~
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


pengzhiwei2018 commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-828893245


   > @pengzhiwei2018 could we make the spark-shell experience better? I think 
we need the extensions added by default when the jar is pulled in?
   > 
   > ```scala
   > $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   > 
   > scala> spark.sql("create table t1 (id int, name string, price double, ts 
long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show 
   > t, returning NoSuchObjectException
   > org.apache.hudi.exception.HoodieException: 'path' or 
'hoodie.datasource.read.paths' or both must be specified.
   >   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
   >   at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
   >   at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   >   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   >   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   >   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
   >   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   >   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   >   at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
   >   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   > ```
   
   Hi @vinothchandar , you can test this by the following command
   
   - Using spark-sql
   
   > spark-sql --jars $HUDI_SPARK_BUNDLE \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   - Using spark-shell
   
   > spark-shell --jars $HUDI_SPARK_BUNDLE \
   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'  \
   --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
   
   just set the `spark.sql.extensions` to 
`org.apache.spark.sql.hudi.HoodieSparkSessionExtension`.
   Thanks~
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

2021-04-28 Thread GitBox


vinothchandar commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-82958


   @pengzhiwei2018 could we make the spark-shell experience better? I think we 
need the extensions added by default when the jar is pulled in?
   
   ```Scala 
   $ spark-shell --jars $HUDI_SPARK_BUNDLE --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   scala> spark.sql("create table t1 (id int, name string, price double, ts 
long) using hudi options(primaryKey= 'id', preCombineField = 'ts')").show 
   t, returning NoSuchObjectException
   org.apache.hudi.exception.HoodieException: 'path' or 
'hoodie.datasource.read.paths' or both must be specified.
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:77)
 at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:337)
 at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
 at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
 at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
 at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
 at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
 at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
 at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
 at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
 at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
 at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
 at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
 at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-28 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   * 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN
   * fa86907f7522bc8dbe512d48b5a87e4a6b13f035 UNKNOWN
   * 4ebe53016ce3e0648992dbe14d04f71a92f116e6 UNKNOWN
   * 682ae9985f591f6d0c30ee2ef9b159403c1e46de UNKNOWN
   * d80397fcfeaa2996ab550bcdab4524be7420a364 UNKNOWN
   * bfe3a803e19540578b94f778f7ba7551db0f86f1 UNKNOWN
   * a632e58390eb94fcc7e757bd7580780cf184f9a8 UNKNOWN
   * 2e413d601c80b123269c2fc3fc6aa9a8bd0d746a UNKNOWN
   * e797ee47aa319df3c3c40bdc4acab4f592d70ffe UNKNOWN
   * acb06df73c1c2a0ef1590f66e8b41e173d2a7a7b UNKNOWN
   * f7f78ee22a0a75c5fb866c4e9cdda01482fbcb59 UNKNOWN
   * 3a7227993309e8dd37f2aef693cb3fed69a2043c UNKNOWN
   * 8f7a8e7f4989c9e20b936123c0f6e324898471d2 UNKNOWN
   * 6824c4917ad812c5938fe5346344a4aef9b7a72e UNKNOWN
   * 252364017f5dee1dcdfa061cc3070dac518d4047 UNKNOWN
   * b1691e583f3c23ee83fcb7ee0245eed826624cc0 UNKNOWN
   * ba970bda569f0312c77cd5c139f9dec4ad2759b0 UNKNOWN
   * 4370d21d4983e5e79d1f4bafba51ae26dd29f9a0 UNKNOWN
   * 21ea9ccef8ab9d78f9c201fa58a22e3e59caaa6b UNKNOWN
   * b17028e8a232ff3015c18b8f7de5435241800bfe UNKNOWN
   * 11974f4994838cca929d1f55214a132f2dbccd60 UNKNOWN
   * c4f92e29cc2affbd3da1e02c87e99c2076d3c410 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-28 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   * 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN
   * fa86907f7522bc8dbe512d48b5a87e4a6b13f035 UNKNOWN
   * 4ebe53016ce3e0648992dbe14d04f71a92f116e6 UNKNOWN
   * 682ae9985f591f6d0c30ee2ef9b159403c1e46de UNKNOWN
   * d80397fcfeaa2996ab550bcdab4524be7420a364 UNKNOWN
   * bfe3a803e19540578b94f778f7ba7551db0f86f1 UNKNOWN
   * a632e58390eb94fcc7e757bd7580780cf184f9a8 UNKNOWN
   * 2e413d601c80b123269c2fc3fc6aa9a8bd0d746a UNKNOWN
   * e797ee47aa319df3c3c40bdc4acab4f592d70ffe UNKNOWN
   * acb06df73c1c2a0ef1590f66e8b41e173d2a7a7b UNKNOWN
   * f7f78ee22a0a75c5fb866c4e9cdda01482fbcb59 UNKNOWN
   * 3a7227993309e8dd37f2aef693cb3fed69a2043c UNKNOWN
   * 8f7a8e7f4989c9e20b936123c0f6e324898471d2 UNKNOWN
   * 6824c4917ad812c5938fe5346344a4aef9b7a72e UNKNOWN
   * 252364017f5dee1dcdfa061cc3070dac518d4047 UNKNOWN
   * b1691e583f3c23ee83fcb7ee0245eed826624cc0 UNKNOWN
   * ba970bda569f0312c77cd5c139f9dec4ad2759b0 UNKNOWN
   * 4370d21d4983e5e79d1f4bafba51ae26dd29f9a0 UNKNOWN
   * 21ea9ccef8ab9d78f9c201fa58a22e3e59caaa6b UNKNOWN
   * b17028e8a232ff3015c18b8f7de5435241800bfe UNKNOWN
   * 11974f4994838cca929d1f55214a132f2dbccd60 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1620) TestPushGateWayReporter failed when run separately

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1620:
-
Labels: pull-request-available  (was: )

> TestPushGateWayReporter failed when run separately
> --
>
> Key: HUDI-1620
> URL: https://issues.apache.org/jira/browse/HUDI-1620
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> org.apache.hudi.metrics.prometheus.TestPushGateWayReporter#testRegisterGauge
> when run separately, it failed with
> {quote}org.apache.hudi.exception.HoodieException: 
> java.lang.IllegalArgumentException
>   at org.apache.hudi.metrics.Metrics.init(Metrics.java:100)
>   at org.apache.hudi.metrics.HoodieMetrics.(HoodieMetrics.java:59)
>   at 
> org.apache.hudi.metrics.prometheus.TestPushGateWayReporter.testRegisterGauge(TestPushGateWayReporter.java:45){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan opened a new pull request #2894: [HUDI-1620] Fix Metrics UT

2021-04-28 Thread GitBox


xushiyan opened a new pull request #2894:
URL: https://github.com/apache/hudi/pull/2894


   Make sure to shut down Metrics between unit test cases to ensure test isolation (a hedged sketch of the teardown idea follows the checklist below).
   
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
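   
   A hedged JUnit 5 sketch of the isolation idea described above; whether the actual reset hook in this Hudi version is `Metrics.shutdown()` or something else is an assumption, so it is left as a commented placeholder:
   
   ```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;

public class MetricsIsolationSketchTest {

  @AfterEach
  void tearDownMetrics() {
    // Assumed teardown hook; replace with the real Metrics shutdown call, e.g.:
    // org.apache.hudi.metrics.Metrics.shutdown();
  }

  @Test
  void testRegisterGauge() {
    // Each test can now (re-)initialize metrics from a clean slate.
  }
}
   ```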


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `16.75%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2892   +/-   ##
   =
   - Coverage 69.75%   52.99%   -16.76% 
   - Complexity  375 3745 +3370 
   =
 Files54  488  +434 
 Lines  199723527+21530 
 Branches236 2501 +2265 
   =
   + Hits   139312469+11076 
   - Misses  473 9957 +9484 
   - Partials131 1101  +970 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.37% <ø> (?)` | `1975.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[...i/src/main/java/org/apache/hudi/cli/HoodieCLI.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZUNMSS5qYXZh)
 | `89.18% <0.00%> (ø)` | `18.00% <0.00%> (?%)` | |
   | 
[...org/apache/hudi/common/util/collection/Triple.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9UcmlwbGUuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...che/hudi/common/util/collection/ImmutablePair.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVQYWlyLmphdmE=)
 | `75.00% <0.00%> (ø)` | `3.00% <0.00%> (?%)` | |
   | 
[...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `16.75%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2892   +/-   ##
   =
   - Coverage 69.75%   52.99%   -16.76% 
   - Complexity  375 3745 +3370 
   =
 Files54  488  +434 
 Lines  199723527+21530 
 Branches236 2501 +2265 
   =
   + Hits   139312469+11076 
   - Misses  473 9957 +9484 
   - Partials131 1101  +970 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.37% <ø> (?)` | `1975.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   | 
[...ache/hudi/common/table/timeline/TimelineUtils.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lVXRpbHMuamF2YQ==)
 | `62.71% <0.00%> (ø)` | `21.00% <0.00%> (?%)` | |
   | 
[...apache/hudi/sink/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2V2ZW50L0JhdGNoV3JpdGVTdWNjZXNzRXZlbnQuamF2YQ==)
 | `92.30% <0.00%> (ø)` | `7.00% <0.00%> (?%)` | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...pache/hudi/common/model/HoodieMetadataWrapper.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZU1ldGFkYXRhV3JhcHBlci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] danny0405 closed pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


danny0405 closed pull request #2892:
URL: https://github.com/apache/hudi/pull/2892


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-828848333


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2893](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1ce0f37) into 
[master](https://codecov.io/gh/apache/hudi/commit/e4fd195d9fd0cc1128b8c6797d88e56402b166bd?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e4fd195) will **increase** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2893/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2893  +/-   ##
   
   + Coverage 52.99%   53.01%   +0.01% 
   - Complexity 3745 3749   +4 
   
 Files   488  488  
 Lines 2352723550  +23 
 Branches   2501 2503   +2 
   
   + Hits  1246912484  +15 
   - Misses 9957 9967  +10 
   + Partials   1101 1099   -2 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (ø)` | `220.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <26.08%> (+0.01%)` | `1978.00 <2.00> (+3.00)` | |
   | hudiflink | `59.67% <ø> (ø)` | `537.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `73.34% <92.30%> (+<0.01%)` | `237.00 <0.00> (ø)` | |
   | hudisync | `46.39% <ø> (ø)` | `142.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.75% <ø> (+0.05%)` | `375.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.34% <0.00%> (ø)` | `57.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metadata/BaseTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvQmFzZVRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/metadata/HoodieTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllVGFibGVNZXRhZGF0YS5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...c/main/scala/org/apache/hudi/HoodieFileIndex.scala](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUZpbGVJbmRleC5zY2FsYQ==)
 | `78.98% <92.30%> (-0.11%)` | `24.00 <0.00> (ø)` | |
   | 
[...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `93.18% <100.00%> (+1.07%)` | `15.00 <2.00> (+2.00)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-828848333


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2893](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1ce0f37) into 
[master](https://codecov.io/gh/apache/hudi/commit/e4fd195d9fd0cc1128b8c6797d88e56402b166bd?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e4fd195) will **increase** coverage by `0.01%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2893/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2893  +/-   ##
   
   + Coverage 52.99%   53.01%   +0.01% 
   - Complexity 3745 3749   +4 
   
 Files   488  488  
 Lines 2352723550  +23 
 Branches   2501 2503   +2 
   
   + Hits  1246912484  +15 
   - Misses 9957 9967  +10 
   + Partials   1101 1099   -2 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (ø)` | `220.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <26.08%> (+0.01%)` | `1978.00 <2.00> (+3.00)` | |
   | hudiflink | `59.67% <ø> (ø)` | `537.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `73.34% <92.30%> (+<0.01%)` | `237.00 <0.00> (ø)` | |
   | hudisync | `46.39% <ø> (ø)` | `142.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `62.00 <ø> (ø)` | |
   | hudiutilities | `69.75% <ø> (+0.05%)` | `375.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.34% <0.00%> (ø)` | `57.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metadata/BaseTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvQmFzZVRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/metadata/HoodieTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllVGFibGVNZXRhZGF0YS5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...c/main/scala/org/apache/hudi/HoodieFileIndex.scala](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUZpbGVJbmRleC5zY2FsYQ==)
 | `78.98% <92.30%> (-0.11%)` | `24.00 <0.00> (ø)` | |
   | 
[...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `93.18% <100.00%> (+1.07%)` | `15.00 <2.00> (+2.00)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-828848333


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2893](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1ce0f37) into 
[master](https://codecov.io/gh/apache/hudi/commit/e4fd195d9fd0cc1128b8c6797d88e56402b166bd?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e4fd195) will **decrease** coverage by `1.69%`.
   > The diff coverage is `26.08%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2893/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2893  +/-   ##
   
   - Coverage 52.99%   51.30%   -1.70% 
   + Complexity 3745 3308 -437 
   
 Files   488  425  -63 
 Lines 2352720071-3456 
 Branches   2501 2085 -416 
   
   - Hits  1246910298-2171 
   + Misses 9957 8919-1038 
   + Partials   1101  854 -247 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (ø)` | `220.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <26.08%> (+0.01%)` | `1978.00 <2.00> (+3.00)` | |
   | hudiflink | `59.67% <ø> (ø)` | `537.00 <ø> (ø)` | |
   | hudihadoopmr | `33.33% <ø> (ø)` | `198.00 <ø> (ø)` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.75% <ø> (+0.05%)` | `375.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.34% <0.00%> (ø)` | `57.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/metadata/BaseTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvQmFzZVRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[.../org/apache/hudi/metadata/HoodieTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllVGFibGVNZXRhZGF0YS5qYXZh)
 | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==)
 | `93.18% <100.00%> (+1.07%)` | `15.00 <2.00> (+2.00)` | |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | | | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-828848333


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2893](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1ce0f37) into 
[master](https://codecov.io/gh/apache/hudi/commit/e4fd195d9fd0cc1128b8c6797d88e56402b166bd?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e4fd195) will **increase** coverage by `16.75%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2893/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master     #2893       +/-   ##
   ==============================================
   + Coverage     52.99%     69.75%     +16.75%
   + Complexity     3745        375       -3370
   ==============================================
     Files           488         54        -434
     Lines         23527       1997      -21530
     Branches       2501        236       -2265
   ==============================================
   - Hits          12469       1393      -11076
   + Misses         9957        473       -9484
   + Partials       1101        131        -970
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.75% <ø> (+0.05%)` | `375.00 <ø> (+1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=)
 | | | |
   | 
[...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZURlbHRhU3RyZWFtZXJFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=)
 | | | |
   | 
[.../java/org/apache/hudi/common/util/CommitUtils.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29tbWl0VXRpbHMuamF2YQ==)
 | | | |
   | 
[...e/timeline/versioning/clean/CleanPlanMigrator.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5QbGFuTWlncmF0b3IuamF2YQ==)
 | | | |
   | 
[...rg/apache/hudi/cli/commands/SavepointsCommand.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1NhdmVwb2ludHNDb21tYW5kLmphdmE=)
 | | | |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-828848333


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2893](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (1ce0f37) into 
[master](https://codecov.io/gh/apache/hudi/commit/e4fd195d9fd0cc1128b8c6797d88e56402b166bd?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e4fd195) will **decrease** coverage by `43.63%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2893/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master      #2893      +/-   ##
   ==============================================
   - Coverage     52.99%      9.36%     -43.64%
   + Complexity     3745         48       -3697
   ==============================================
     Files           488         54        -434
     Lines         23527       1997      -21530
     Branches       2501        236       -2265
   ==============================================
   - Hits          12469        187      -12282
   + Misses         9957       1797       -8160
   + Partials       1101         13       -1088
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (-60.35%)` | `48.00 <ø> (-326.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2893?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2893/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[jira] [Commented] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async

2021-04-28 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335061#comment-17335061
 ] 

Nishith Agarwal commented on HUDI-1847:
---

Steps to contribute a PR for this:

 
 # Start by adding a config that controls whether compaction is SCHEDULED inline, so that users can turn off inline compaction execution while still scheduling it inline (a minimal sketch follows this list). This can be added here -> 
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java]
 # Next, this config needs to be added to 
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java]
 so it's part of the getters
 # This config then needs to be honored in all the places compaction is 
scheduled; good places to look at are: AbstractHoodieWriteClient, 
DeltaSync/HoodieDeltaStreamer and HoodieSparkSqlWriter.scala
 # Once this config is honored, you should be able to write test cases for each 
of these parts of the code to test out this feature
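
A minimal sketch of what the new flag and its getter could look like; the property name `hoodie.compact.schedule.inline` and the method names are illustrative assumptions, not the actual Hudi configuration API:

```java
import java.util.Properties;

// Editor's sketch only: the property name and getter are illustrative assumptions,
// not the actual Hudi configuration API.
class CompactionScheduleConfigSketch {
  // New flag: schedule compaction inline, but leave execution to a separate job.
  static final String SCHEDULE_INLINE_COMPACT_PROP = "hoodie.compact.schedule.inline";
  static final String DEFAULT_SCHEDULE_INLINE_COMPACT = "false";

  private final Properties props;

  CompactionScheduleConfigSketch(Properties props) {
    this.props = props;
  }

  // Getter that AbstractHoodieWriteClient, DeltaSync/HoodieDeltaStreamer and
  // HoodieSparkSqlWriter would consult to decide whether to only schedule
  // compaction inline instead of also executing it.
  boolean scheduleInlineCompactOnly() {
    return Boolean.parseBoolean(
        props.getProperty(SCHEDULE_INLINE_COMPACT_PROP, DEFAULT_SCHEDULE_INLINE_COMPACT));
  }
}
```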

> Add ability to decouple configs for scheduling inline and running async
> ---
>
> Key: HUDI-1847
> URL: https://issues.apache.org/jira/browse/HUDI-1847
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: sev:high
>
> Currently, there are 2 ways to enable compaction:
>  
>  # Inline - This will schedule compaction inline and execute inline
>  # Async - This option is only available for HoodieDeltaStreamer based jobs. 
> This turns on scheduling inline and running async as part of the same spark 
> job.
>  
> Users need a config to be able to schedule only inline while having an 
> ability to execute in their own spark job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] diogodilcl commented on issue #1679: [HUDI-1609] How to disable Hive JDBC and enable metastore

2021-04-28 Thread GitBox


diogodilcl commented on issue #1679:
URL: https://github.com/apache/hudi/issues/1679#issuecomment-828828874


   Hudi version: 0.7.0
   EMR: 6.2
   
   Hi,
   
   When I use:
   
   `"hoodie.datasource.hive_sync.use_jdbc":"false"`
   
   I get the following exception:
   
   ```
   21/04/28 22:19:49 ERROR HiveSyncTool: Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL
at 
org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:406)
at 
org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:384)
at 
org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:374)
at 
org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263)
at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181)
at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136)
at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at 
org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355)
at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403)
at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at 

[jira] [Updated] (HUDI-1371) Support file listing using metadata for Spark DataSource and Spark SQL queries

2021-04-28 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1371:

Summary: Support file listing using metadata for Spark DataSource and Spark 
SQL queries  (was: Implement Spark datasource by fetching file listing from 
metadata table)

> Support file listing using metadata for Spark DataSource and Spark SQL queries
> --
>
> Key: HUDI-1371
> URL: https://issues.apache.org/jira/browse/HUDI-1371
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1371) Implement Spark datasource by fetching file listing from metadata table

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1371:
-
Labels: pull-request-available  (was: )

> Implement Spark datasource by fetching file listing from metadata table
> ---
>
> Key: HUDI-1371
> URL: https://issues.apache.org/jira/browse/HUDI-1371
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] umehrot2 opened a new pull request #2893: [HUDI-1371] Support metadata based listing for Spark DataSource and Spark SQL

2021-04-28 Thread GitBox


umehrot2 opened a new pull request #2893:
URL: https://github.com/apache/hudi/pull/2893


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   This PR adds support for metadata-based listing for Hudi Spark DataSource 
and Spark SQL based queries. The detailed design for Spark integration (V2 
implementation specifically) can be found at 
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements#RFC15:HUDIFileListingImprovements-Spark.
 Two parts of the V2 design have already been implemented:
   - Custom FileIndex for Hudi: https://github.com/apache/hudi/pull/2651
   - Registering Hudi tables as DataSource tables in Hive metastore so they are 
executed via Hudi DataSource instead of Hive InputFormat/Serde. In the process, 
it will also use the FileIndex implemented in Hudi DataSource: 
https://github.com/apache/hudi/pull/2283
   
   In this PR we build on top of the FileIndex implementation to get the file 
listing from Hudi's metadata table if the feature is enabled, and otherwise 
fall back to distributed listing using the Spark context. The metadata table will be 
read just once and it will reduce O(N) list calls to O(1) get calls for N 
partitions. We also refactor the Hudi metadata table contract to add a new API 
which can fetch lists for multiple partitions (opens the reader just once). 
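
   For readers following along, a minimal sketch of how a query could opt into metadata-based listing once this lands; it assumes the existing `hoodie.metadata.enable` flag also gates listing on the read path, and the table path is a placeholder:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MetadataListingReadSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-metadata-listing-sketch")
        .getOrCreate();

    // Read a Hudi table with metadata-based file listing turned on, so partition
    // file lists come from the metadata table instead of per-partition file system
    // listings (O(N) list calls become O(1) metadata reads for N partitions).
    Dataset<Row> df = spark.read()
        .format("hudi")
        .option("hoodie.metadata.enable", "true")
        .load("s3://my-bucket/path/to/hudi_table");

    df.createOrReplaceTempView("hudi_table");
    spark.sql("SELECT count(*) FROM hudi_table").show();
  }
}
```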
   
   ## Brief change log
   
   ## Verify this pull request
   
   - Existing unit tests updated
   - Internally on AWS EMR ran several performance tests via Spark DataSource 
and Spark SQL to observe improvements in query planning times
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2833: [HUDI-89] Add configOption & refactor HoodieBootstrapConfig for a demo

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2833:
URL: https://github.com/apache/hudi/pull/2833#issuecomment-828792354


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2833](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (98d109a) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `16.66%`.
   > The diff coverage is `55.41%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2833/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master     #2833       +/-   ##
   ==============================================
   - Coverage     69.75%     53.08%     -16.67%
   - Complexity      375       3761       +3386
   ==============================================
     Files            54        489        +435
     Lines          1997      23792      +21795
     Branches        236       2467       +2231
   ==============================================
   + Hits           1393      12630      +11237
   - Misses          473      10082       +9609
   - Partials        131       1080        +949
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.58% <37.50%> (?)` | `220.00 <0.00> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.72% <52.65%> (?)` | `1991.00 <41.00> (?)` | |
   | hudiflink | `59.67% <66.66%> (?)` | `537.00 <0.00> (?)` | |
   | hudihadoopmr | `33.33% <100.00%> (?)` | `198.00 <0.00> (?)` | |
   | hudisparkdatasource | `73.33% <79.85%> (?)` | `237.00 <4.00> (?)` | |
   | hudisync | `46.39% <15.38%> (?)` | `142.00 <0.00> (?)` | |
   | huditimelineservice | `64.07% <0.00%> (?)` | `62.00 <0.00> (?)` | |
   | hudiutilities | `68.99% <28.26%> (-0.77%)` | `374.00 <0.00> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/cli/commands/SparkMain.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1NwYXJrTWFpbi5qYXZh)
 | `6.06% <0.00%> (ø)` | `4.00 <0.00> (?)` | |
   | 
[.../main/scala/org/apache/hudi/cli/SparkHelpers.scala](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9TcGFya0hlbHBlcnMuc2NhbGE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...pache/hudi/common/config/HoodieMetadataConfig.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Ib29kaWVNZXRhZGF0YUNvbmZpZy5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...g/apache/hudi/common/config/LockConfiguration.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9Mb2NrQ29uZmlndXJhdGlvbi5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `46.66% <0.00%> (ø)` | `57.00 <0.00> (?)` | |
   | 

[GitHub] [hudi] codecov-commenter commented on pull request #2833: [HUDI-89] Add configOption & refactor HoodieBootstrapConfig for a demo

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2833:
URL: https://github.com/apache/hudi/pull/2833#issuecomment-828792354


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2833](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (98d109a) into 
[master](https://codecov.io/gh/apache/hudi/commit/3ca90302562580a7c5c69fd3f11ab376cfac1f0b?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3ca9030) will **decrease** coverage by `0.76%`.
   > The diff coverage is `28.26%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2833/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #2833      +/-   ##
   =============================================
   - Coverage     69.75%     68.99%     -0.77%
   + Complexity      375        374         -1
   =============================================
     Files            54         54
     Lines          1997       2019        +22
     Branches        236        235         -1
   =============================================
     Hits           1393       1393
   - Misses          473        494        +21
   - Partials        131        132         +1
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `68.99% <28.26%> (-0.77%)` | `374.00 <0.00> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2833?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...callback/kafka/HoodieWriteCommitKafkaCallback.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NhbGxiYWNrL2thZmthL0hvb2RpZVdyaXRlQ29tbWl0S2Fma2FDYWxsYmFjay5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...ck/kafka/HoodieWriteCommitKafkaCallbackConfig.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NhbGxiYWNrL2thZmthL0hvb2RpZVdyaXRlQ29tbWl0S2Fma2FDYWxsYmFja0NvbmZpZy5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.52% <0.00%> (ø)` | `19.00 <0.00> (ø)` | |
   | 
[...apache/hudi/utilities/sources/AvroKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb0thZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==)
 | `82.69% <100.00%> (+0.33%)` | `6.00 <0.00> (ø)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2833/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% 

[jira] [Created] (HUDI-1866) Investigate if hive-sync works as expected in a quickstart environment for 0.8

2021-04-28 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1866:
-

 Summary: Investigate if hive-sync works as expected in a 
quickstart environment for 0.8
 Key: HUDI-1866
 URL: https://issues.apache.org/jira/browse/HUDI-1866
 Project: Apache Hudi
  Issue Type: Bug
  Components: Hive Integration
Reporter: Nishith Agarwal
Assignee: Nishith Agarwal


Hive-Sync seems to be failing for a few users as reported on Slack; see an 
example here -> 
[https://apache-hudi.slack.com/archives/C4D716NPQ/p161950993803]

 

We need to investigate whether this is a real issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1866) Investigate if hive-sync works as expected in a quickstart environment for 0.8

2021-04-28 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1866:
--
Labels: sev:critical  (was: )

> Investigate if hive-sync works as expected in a quickstart environment for 0.8
> --
>
> Key: HUDI-1866
> URL: https://issues.apache.org/jira/browse/HUDI-1866
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical
>
> Hive-Sync seems to be failing for a few users as reported on Slack; see an 
> example here -> 
> [https://apache-hudi.slack.com/archives/C4D716NPQ/p161950993803]
>  
> We need to investigate whether this is a real issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] abhijeetkushe edited a comment on issue #2850: [SUPPORT] S3 files skipped by HoodieDeltaStreamer on s3 bucket in continuous mode

2021-04-28 Thread GitBox


abhijeetkushe edited a comment on issue #2850:
URL: https://github.com/apache/hudi/issues/2850#issuecomment-824917902


   @xushiyan Thanks for your prompt reply. I agree that the issue I am facing is 
somewhat related to 
[HUDI-1723](https://issues.apache.org/jira/browse/HUDI-1723).
   It is great that the Hudi team is actively working on addressing this 
issue. We have come up with the interim solution below to address our issue 
(see the sketch after this list):
   
   - We are using INSERT while writing our data as that is both memory and time 
efficient, so using UPSERT just to handle missing files will not work for us.
   - The solution you proposed for overriding the DFSPathSelector will work for 
us. We are planning to override the [line 
below](https://github.com/apache/hudi/blob/release-0.6.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java#L92)
 
   with `f.getModificationTime() <= 
Long.valueOf(lastCheckpointStr.get()).longValue() ||
   f.getModificationTime() > (System.currentTimeMillis() - 3)`. We are 
using Hudi version 0.6.0.
   This will result in a 30-second lag while writing records, which is 
acceptable to us and will address the missing-file problem completely. The 
30-second lag will be configurable via an environment variable. The 
HoodieDeltaStreamer takes --source-class as an argument, where we will be 
providing our custom JsonDFSSource, which delegates to our custom 
DFSPathSelector.
   - Can you please validate whether HoodieDeltaStreamer will be able to record 
the correct checkpoint with the change I am proposing to make above?
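
   A rough sketch of the proposed eligibility check, written as the acceptance condition (the logical negation of the rejection condition quoted above); the class and method names are illustrative assumptions, and the lag is assumed to be passed in milliseconds:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.fs.FileStatus;

public class LaggedPathFilterSketch {

  // Only pick up files modified after the last checkpoint AND older than (now - lag),
  // so files that were still being written or not yet visible in listings are deferred
  // to a later round instead of being skipped forever.
  public static List<FileStatus> selectEligibleFiles(List<FileStatus> candidates,
                                                     long lastCheckpointMs,
                                                     long lagMs) {
    long cutoff = System.currentTimeMillis() - lagMs;
    return candidates.stream()
        .filter(f -> f.getModificationTime() > lastCheckpointMs
            && f.getModificationTime() <= cutoff)
        .collect(Collectors.toList());
  }
}
```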
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `21.87%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master     #2892       +/-   ##
   ==============================================
   - Coverage     69.75%     47.87%     -21.88%
   - Complexity      375       3419       +3044
   ==============================================
     Files            54        488        +434
     Lines          1997      23527      +21530
     Branches        236       2501       +2265
   ==============================================
   + Hits           1393      11264       +9871
   - Misses          473      11281      +10808
   - Partials        131        982        +851
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.37% <ø> (?)` | `1975.00 <ø> (?)` | |
   | hudiflink | `59.67% <ø> (?)` | `537.00 <ø> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `9.36% <ø> (-60.40%)` | `48.00 <ø> (-327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 

[jira] [Commented] (HUDI-1607) Decimal handling bug in SparkAvroPostProcessor

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334823#comment-17334823
 ] 

sivabalan narayanan commented on HUDI-1607:
---

https://issues.apache.org/jira/browse/HUDI-1343?focusedCommentId=17325964=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17325964

 

> Decimal handling bug in SparkAvroPostProcessor 
> ---
>
> Key: HUDI-1607
> URL: https://issues.apache.org/jira/browse/HUDI-1607
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jingwei Zhang
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> This issue is related to 
> [HUDI-1343|https://github.com/apache/hudi/pull/2192].
> I think the purpose of HUDI-1343 was to bridge the difference between Avro 
> 1.8.2 (used by Hudi) and Avro 1.9.2 (used by the upstream system) through an 
> internal Struct type, in particular the incompatible forms used to express a 
> nullable type between those two versions. 
> It was all good until I hit the Decimal type. Since it can be backed by either 
> FIXED or BYTES, if an Avro schema contains a decimal type with BYTES as its 
> literal type, after this two-way conversion its literal type becomes FIXED 
> instead. This causes an exception to be thrown in AvroConversionHelper, as the 
> data underneath is a HeapByteBuffer rather than a GenericFixed.
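
To make the FIXED-vs-BYTES distinction concrete, here is a small editor's sketch (not code from the linked PR) showing the two Avro representations of the same decimal logical type:

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalSchemaSketch {
  public static void main(String[] args) {
    // Decimal backed by BYTES: datum values arrive as a ByteBuffer (e.g. HeapByteBuffer).
    Schema bytesDecimal = LogicalTypes.decimal(10, 2)
        .addToSchema(Schema.create(Schema.Type.BYTES));

    // Decimal backed by FIXED: datum values arrive as a GenericFixed of a fixed size.
    Schema fixedDecimal = LogicalTypes.decimal(10, 2)
        .addToSchema(Schema.createFixed("amount_fixed", null, "example", 5));

    // A conversion round trip that silently turns the BYTES form into the FIXED form
    // leaves the schema and the underlying datum type out of sync, which is what
    // AvroConversionHelper then trips over.
    System.out.println(bytesDecimal.toString(true));
    System.out.println(fixedDecimal.toString(true));
  }
}
```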



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1864) Support for java.time.LocalDate in TimestampBasedAvroKeyGenerator

2021-04-28 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1864:
--
Labels: sev:high  (was: )

> Support for java.time.LocalDate in TimestampBasedAvroKeyGenerator
> -
>
> Key: HUDI-1864
> URL: https://issues.apache.org/jira/browse/HUDI-1864
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vaibhav Sinha
>Priority: Major
>  Labels: sev:high
>
> When we read data from MySQL which has a column of type {{Date}}, Spark 
> represents it as an instance of {{java.time.LocalDate}}. If I try and use 
> this column for partitioning while doing a write to Hudi, I get the following 
> exception
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to 
> parse input partition field :2021-04-21
>   at 
> org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:136)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.CustomAvroKeyGenerator.getPartitionPath(CustomAvroKeyGenerator.java:89)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:64)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:62) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$2(HoodieSparkSqlWriter.scala:160)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator$SliceIterator.next(Iterator.scala:271) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator.foreach(Iterator.scala:941) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator.foreach$(Iterator.scala:941) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.to(TraversableOnce.scala:315) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.to(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1449) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.scheduler.Task.run(Task.scala:131) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>  ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_171]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_171]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_171]
> Caused by: org.apache.hudi.exception.HoodieNotSupportedException: Unexpected 
> type for partition field: java.time.LocalDate
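
One possible workaround sketch (an editor's illustration, not something proposed in the ticket): convert the date column to a plain string before writing so the key generator never sees a java.time.LocalDate. The column names, date format, and table path below are assumptions:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.date_format;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class LocalDateWorkaroundSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("localdate-workaround").getOrCreate();

    // Hypothetical source table with a DATE column named "event_date" read from MySQL.
    Dataset<Row> df = spark.table("events");

    // Render the DATE column as a string so the partition path field is a type
    // the key generator already understands, instead of java.time.LocalDate.
    Dataset<Row> withStringDate =
        df.withColumn("event_date_str", date_format(col("event_date"), "yyyy-MM-dd"));

    withStringDate.write()
        .format("hudi")
        .option("hoodie.table.name", "events_hudi")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .option("hoodie.datasource.write.partitionpath.field", "event_date_str")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/events_hudi");
  }
}
```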

[jira] [Commented] (HUDI-1739) insert_overwrite_table and insert_overwrite create empty replacecommit.requested file which breaks archival

2021-04-28 Thread satish (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334816#comment-17334816
 ] 

satish commented on HUDI-1739:
--

[~shivnarayan] We already have a PR for this: 
https://github.com/apache/hudi/pull/2784

Looks like this is also a dup of HUDI-1740.

> insert_overwrite_table and insert_overwrite create empty 
> replacecommit.requested file which breaks archival
> ---
>
> Key: HUDI-1739
> URL: https://issues.apache.org/jira/browse/HUDI-1739
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jagmeet Bali
>Assignee: Susu Dong
>Priority: Minor
>  Labels: sev:high
>
> Fixes can be to 
>  # Ignore empty replacecommit.requested files.
>  # Standardise the replacecommit.requested format across all invocations be 
> it from clustering or this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] satishkotha commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival

2021-04-28 Thread GitBox


satishkotha commented on pull request #2784:
URL: https://github.com/apache/hudi/pull/2784#issuecomment-828560911


   > I will take a pass on this and land!
   
   @vinothchandar could you please review this since it's been waiting for some 
time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


danny0405 closed pull request #2892:
URL: https://github.com/apache/hudi/pull/2892


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (HUDI-1063) Save in Google Cloud Storage not working

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334773#comment-17334773
 ] 

sivabalan narayanan edited comment on HUDI-1063 at 4/28/21, 2:51 PM:
-

[~WaterKnight]: I could not reproduce the issue w/ latest master. things are 
working fine. 

I followed 
[this|https://holowczak.com/getting-started-with-apache-spark-on-google-cloud-platform-using-dataproc/]
 link to set up my cluster. 

Command I used to launch spark shell

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

[Link|https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650] to 
gist for steps I tried out. 

 

Not sure if Hadoop 3+ was tried w/ 0.5.3. Hudi has few more releases after 
0.5.0 with latest as 0.8.0 which is tested for hadoop3. If you want to try out 
hudi 0.5.3, would recommend trying out hadoop2.7 may be.  

 

 


was (Author: shivnarayan):
[~WaterKnight]: I could not reproduce the issue w/ latest master. things are 
working fine. 

Command I used to launch spark shell

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

[Link|https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650] to 
gist for steps I tried out. 

 

Not sure if Hadoop 3+ was tried w/ 0.5.3. Hudi has few more releases after 
0.5.0 with latest as 0.8.0 which is tested for hadoop3. If you want to try out 
hudi 0.5.3, would recommend trying out hadoop2.7 may be.  

 

 

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a Dataframe as follows in Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> 

[jira] [Updated] (HUDI-1063) Save in Google Cloud Storage not working

2021-04-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1063:
--
Labels: sev:critical sev:triage user-support-issues  (was: sev:critical 
user-support-issues)

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, sev:triage, user-support-issues
> Fix For: 0.9.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a Dataframe as follows in Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>  at 
> 

[jira] [Comment Edited] (HUDI-1063) Save in Google Cloud Storage not working

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334773#comment-17334773
 ] 

sivabalan narayanan edited comment on HUDI-1063 at 4/28/21, 2:50 PM:
-

[~WaterKnight]: I could not reproduce the issue with the latest master; things are 
working fine. 

Command I used to launch the Spark shell:

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

[Link|https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650] to 
gist for steps I tried out. 

 

Not sure if Hadoop 3+ was ever tried with 0.5.3. Hudi has had a few more releases after 
0.5.0, with the latest being 0.8.0, which is tested against Hadoop 3. If you want to try out 
Hudi 0.5.3, I would recommend trying Hadoop 2.7 instead, maybe.  
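
If it helps to compare launch environments, a PySpark equivalent of the spark-shell 
command above would look roughly like the following; the jar path and versions are 
placeholders, not a verified setup.

```

# Hypothetical PySpark launch mirroring the spark-shell command above
# (bundle path and versions are placeholders).
pyspark --packages org.apache.spark:spark-avro_2.12:3.0.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --jars /path/to/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```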

 

 


was (Author: shivnarayan):
[~WaterKnight]: I could not reproduce the issue with the latest master; things are 
working fine. 

Command I used to launch the Spark shell:

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

[Link|https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650] to 
gist for steps I tried out. 

 

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a DataFrame in Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> 

[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334773#comment-17334773
 ] 

sivabalan narayanan commented on HUDI-1063:
---

[~WaterKnight]: I could not reproduce the issue with the latest master; things are 
working fine. 

Command I used to launch the Spark shell:

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

[Link|https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650] to 
gist for steps I tried out. 

 

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a DataFrame in Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> 

[jira] [Comment Edited] (HUDI-1854) Corrupt blocks in GCS log files

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334771#comment-17334771
 ] 

sivabalan narayanan edited comment on HUDI-1854 at 4/28/21, 2:47 PM:
-

For me, things are working; I am not able to reproduce. I tried with the latest master, 
FYI. 

Followed this 
[link|https://holowczak.com/getting-started-with-apache-spark-on-google-cloud-platform-using-dataproc/]
 to set up my cluster. 

 

Launch command: 

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

Gist link for commands I ran. 
[https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650]

I verified via the console that the log files were > 16 MB (see the attached 
screenshot).

 

 


was (Author: shivnarayan):
For me, things are working; I am not able to reproduce. I tried with the latest master, 
FYI. 

Followed this 
[link|https://holowczak.com/getting-started-with-apache-spark-on-google-cloud-platform-using-dataproc/]
 to set up my cluster. 

 

Launch command: 

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

Gist link for commands I ran. 
[https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650]

I verified via the console that the log files were > 16 MB. 

 

 

> Corrupt blocks in GCS log files
> ---
>
> Key: HUDI-1854
> URL: https://issues.apache.org/jira/browse/HUDI-1854
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical, sev:triage
> Attachments: Screen Shot 2021-04-28 at 10.42.50 AM.png
>
>
> Details on how to reproduce this can be found here -> 
> [https://github.com/apache/hudi/issues/2692]
>  
> We need a GCS, google data proc environment to reproduce this. 
>  
> [~vburenin] Would you be able to help try out hudi 0.7 and follow the steps 
> mentioned in this ticket to help reproduce this issue and find the root cause 
> ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1854) Corrupt blocks in GCS log files

2021-04-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1854:
--
Attachment: Screen Shot 2021-04-28 at 10.42.50 AM.png

> Corrupt blocks in GCS log files
> ---
>
> Key: HUDI-1854
> URL: https://issues.apache.org/jira/browse/HUDI-1854
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical, sev:triage
> Attachments: Screen Shot 2021-04-28 at 10.42.50 AM.png
>
>
> Details on how to reproduce this can be found here -> 
> [https://github.com/apache/hudi/issues/2692]
>  
> We need a GCS, google data proc environment to reproduce this. 
>  
> [~vburenin] Would you be able to help try out hudi 0.7 and follow the steps 
> mentioned in this ticket to help reproduce this issue and find the root cause 
> ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1854) Corrupt blocks in GCS log files

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334771#comment-17334771
 ] 

sivabalan narayanan commented on HUDI-1854:
---

For me, things are working; I am not able to reproduce. I tried with the latest master, 
FYI. 

Followed this 
[link|https://holowczak.com/getting-started-with-apache-spark-on-google-cloud-platform-using-dataproc/]
 to set up my cluster. 

 

Launch command: 

```

/usr/lib/spark/bin/spark-shell --packages 
org.apache.spark:spark-avro_2.12:3.0.0 --conf 
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars 
/home/n_siva_b/hudi-spark3-bundle_2.12-0.9.0-SNAPSHOT.jar

```

Gist link for commands I ran. 
[https://gist.github.com/nsivabalan/03736cda20c10781957b83a89e2f6650]

I verified via the console that the log files were > 16 MB. 
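
For anyone repeating this check from a terminal instead of the Cloud Console, a 
command along these lines would list the largest objects under the table path; the 
bucket and table names below are placeholders, not the ones used in this test.

```

# Placeholder bucket/table paths; the first column of `gsutil ls -l` is the object size in bytes.
gsutil ls -l 'gs://my-bucket/my_table/**' | sort -n -k1 | tail -20

```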

 

 

> Corrupt blocks in GCS log files
> ---
>
> Key: HUDI-1854
> URL: https://issues.apache.org/jira/browse/HUDI-1854
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical, sev:triage
> Attachments: Screen Shot 2021-04-28 at 10.42.50 AM.png
>
>
> Details on how to reproduce this can be found here -> 
> [https://github.com/apache/hudi/issues/2692]
>  
> We need a GCS, google data proc environment to reproduce this. 
>  
> [~vburenin] Would you be able to help try out hudi 0.7 and follow the steps 
> mentioned in this ticket to help reproduce this issue and find the root cause 
> ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (e2d0335) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `60.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2892   +/-   ##
   
   - Coverage 69.75%   9.36%   -60.40% 
   + Complexity  375  48  -327 
   
 Files54  54   
 Lines  19971997   
 Branches236 236   
   
   - Hits   1393 187 -1206 
   - Misses  4731797 +1324 
   + Partials131  13  -118 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (-60.40%)` | `48.00 <ø> (-327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 

[GitHub] [hudi] danny0405 commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


danny0405 commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828420402


   > I still insist that we need to include kafka-related dependencies. If you 
look back at the HoodieFlinkStreamerV2 class. What is it in essence? It is just 
a program written using Flink DataStream API, which is specific (Kafka -> Hudi)
   
   No, no one says that they don't know how to add a connector jar; actually, 
few people use the `HoodieFlinkStreamerV2` tool.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d06be43) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2892  +/-   ##
   
   - Coverage 69.75%   69.70%   -0.06% 
   + Complexity  375  374   -1 
   
 Files54   54  
 Lines  1997 1997  
 Branches236  236  
   
   - Hits   1393 1392   -1 
 Misses  473  473  
   - Partials131  132   +1 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


danny0405 closed pull request #2892:
URL: https://github.com/apache/hudi/pull/2892


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


yanghua commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828408491


   > > I have two questions:
   > > 
   > > 1. The lowest Flink version we supported is 1.12.x?
   > > 2. Can we provide an e2e demo and documentation to show the usage of the 
flink streamer via jar-mode, just like delta-streamer, it should be out of the 
box;
   > > 
   > > I tried it, but missed the dependencies of the Kafka connector. Can we 
make the new flink streamer peer to the delta streamer?
   > 
   > Yes, people would only use Flink 1.12.x code; the code is not being removed 
because of the Flink version, it's because the logic is totally redundant. Removing it 
avoids confusion, because I found some people using the legacy code with poor 
performance.
   
   Although I know that many users are currently testing based on 1.12, the 
bar this sets for users on older versions is very high; we can only hope that they 
are willing to upgrade their Flink version in order to use Hudi. In fact, I 
personally think that the biggest improvement of the new implementation lies in 
the bucket assigner. As for the other points, we could have found a solution 
(although it might not have been very elegant). Well, I no longer have to worry about 
the Flink version, and I don't have time to pay attention to the old 
implementation.
   
   > I still think we should not include a Kafka connector in the delta 
streamer; no one complains about it missing, based on the users I see.
   
   I still insist that we need to include the Kafka-related dependencies. If you 
look back at the HoodieFlinkStreamerV2 class, what is it in essence? It is just 
a program written using the Flink DataStream API, which is specific (Kafka -> 
Hudi), not plug-in-oriented or abstraction-oriented. For a specific Flink program, 
we should provide users with an uber (fat) jar instead of making users pay 
attention to details and pay additional costs. Otherwise, why don't we make the 
source universal?
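   
   To make that cost concrete, what a user has to do today looks roughly like the 
sketch below; the connector/bundle versions, paths, and the streamer main class are 
placeholders for illustration, not verified commands.
   
   ```
   # Illustration only: supplying the Kafka connector yourself because the bundle does not ship it.
   # Connector/bundle versions and the main class are placeholders, not verified.
   cp flink-connector-kafka_2.12-1.12.2.jar "$FLINK_HOME/lib/"
   "$FLINK_HOME/bin/flink" run \
     -c <hoodie-flink-streamer-main-class> \
     hudi-flink-bundle_2.12-0.9.0-SNAPSHOT.jar <streamer args>
   ```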
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (HUDI-1343) Add standard schema postprocessor which would rewrite the schema using spark-avro conversion

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325964#comment-17325964
 ] 

sivabalan narayanan edited comment on HUDI-1343 at 4/28/21, 11:05 AM:
--

[~liujinhui] [~vbalaji] [~nishith29]: Do you folks think this is still 
required after this fix: [https://github.com/apache/hudi/pull/2765]? That fix makes 
AvroConversionUtils.convertStructTypeToAvroSchema() ensure that null is the first 
entry in the union and that the default value is set to null if a field is nullable in 
the Spark StructType. 

I mean, we have enabled the post schema processor by default, so I wanted to 
double-check whether it's still applicable. 


was (Author: shivnarayan):
[~liujinhui] [~vbalaji]: Do you folks think this is still required after 
this fix: [https://github.com/apache/hudi/pull/2765]? That fix makes 
AvroConversionUtils.convertStructTypeToAvroSchema() ensure that null is the first 
entry in the union and that the default value is set to null if a field is nullable in 
the Spark StructType. 

I mean, we have enabled the post schema processor by default, so I wanted to 
double-check whether it's still applicable. 

> Add standard schema postprocessor which would rewrite the schema using 
> spark-avro conversion
> 
>
> Key: HUDI-1343
> URL: https://issues.apache.org/jira/browse/HUDI-1343
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>
> When we use Transformer, the final Schema which we use to convert avro record 
> to bytes is auto generated by spark. This could be different (due to the way 
> Avro treats it) from the target schema that is being used to write (as the 
> target schema could be coming from Schema Registry). 
>  
> For example : 
> Schema generated by spark-avro when converting Row to avro
> {
>   "type" : "record",
>   "name" : "hoodie_source",
>   "namespace" : "hoodie.source",
>   "fields" : [ {
>     "name" : "_ts_ms",
>     "type" : [ "long", "null" ]
>   }, {
>     "name" : "_op",
>     "type" : "string"
>   }, {
>     "name" : "inc_id",
>     "type" : "int"
>   }, {
>     "name" : "year",
>     "type" : [ "int", "null" ]
>   }, {
>     "name" : "violation_desc",
>     "type" : [ "string", "null" ]
>   }, {
>     "name" : "violation_code",
>     "type" : [ "string", "null" ]
>   }, {
>     "name" : "case_individual_id",
>     "type" : [ "int", "null" ]
>   }, {
>     "name" : "flag",
>     "type" : [ "string", "null" ]
>   }, {
>     "name" : "last_modified_ts",
>     "type" : "long"
>   } ]
> }
>  
> is not compatible with the Avro Schema:
>  
> {
>   "type" : "record",
>   "name" : "formatted_debezium_payload",
>   "fields" : [ {
>     "name" : "_ts_ms",
>     "type" : [ "null", "long" ],
>     "default" : null
>   }, {
>     "name" : "_op",
>     "type" : "string",
>     "default" : null
>   }, {
>     "name" : "inc_id",
>     "type" : "int",
>     "default" : null
>   }, {
>     "name" : "year",
>     "type" : [ "null", "int" ],
>     "default" : null
>   }, {
>     "name" : "violation_desc",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "violation_code",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "case_individual_id",
>     "type" : [ "null", "int" ],
>     "default" : null
>   }, {
>     "name" : "flag",
>     "type" : [ "null", "string" ],
>     "default" : null
>   }, {
>     "name" : "last_modified_ts",
>     "type" : "long",
>     "default" : null
>   } ]
> }
>  
> Note that the type order is different for individual fields : 
> "type" : [ "null", "string" ], vs  "type" : [ "string", "null" ]
> Unexpectedly, Avro decoding fails when bytes written with first schema is 
> read using second schema.
>  
> One way to fix is to use configured target schema when generating record 
> bytes but this is not easy without breaking Record payload constructor API 
> used by deltastreamer. 
> The other option is to apply a post-processor on target schema to make it 
> schema consistent with Transformer generated records.
>  
> This ticket is to use the later approach of creating a standard schema 
> post-processor and adding it by default when Transformer is used.
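
As an illustration of the normalization such a post-processor performs (this is not 
the actual Hudi code, just a sketch over the example schemas quoted above), the idea 
is to make every nullable union null-first with an explicit null default:

```

import json

# Sketch only -- not Hudi's post-processor. Reorder nullable unions so "null" comes
# first and carries an explicit null default, matching the target/registry schema shape.
def normalize_nullable_fields(schema):
    for field in schema.get("fields", []):
        t = field.get("type")
        if isinstance(t, list) and "null" in t:
            field["type"] = ["null"] + [x for x in t if x != "null"]
            field["default"] = None
    return schema

spark_generated = {
    "type": "record",
    "name": "hoodie_source",
    "fields": [
        {"name": "_ts_ms", "type": ["long", "null"]},
        {"name": "violation_desc", "type": ["string", "null"]},
    ],
}
print(json.dumps(normalize_nullable_fields(spark_generated), indent=2))

```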



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on issue #2887: [SUPPORT]

2021-04-28 Thread GitBox


nsivabalan commented on issue #2887:
URL: https://github.com/apache/hudi/issues/2887#issuecomment-828363941


   While you update the ticket with more info, I'm curious to know whether you had set 
the partition path to empty intentionally?
   ```
   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), 
StringUtils.EMPTY)
   ```
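   
   If the intent is a genuinely non-partitioned table, the usual pairing looks roughly 
like the sketch below (placeholder names; config keys per the Hudi docs, worth 
double-checking for your version):
   
   ```
   # Sketch only: an empty partition path is usually paired with the non-partitioned
   # key generator. Assumes an existing DataFrame `df` and target `base_path`.
   hudi_options = {
       "hoodie.table.name": "my_table",
       "hoodie.datasource.write.recordkey.field": "uuid",
       "hoodie.datasource.write.partitionpath.field": "",
       "hoodie.datasource.write.keygenerator.class":
           "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
       "hoodie.datasource.write.precombine.field": "ts",
   }
   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
   ```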
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1739) insert_overwrite_table and insert_overwrite create empty replacecommit.requested file which breaks archival

2021-04-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334645#comment-17334645
 ] 

sivabalan narayanan commented on HUDI-1739:
---

CC [~satishkotha]

> insert_overwrite_table and insert_overwrite create empty 
> replacecommit.requested file which breaks archival
> ---
>
> Key: HUDI-1739
> URL: https://issues.apache.org/jira/browse/HUDI-1739
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jagmeet Bali
>Assignee: Susu Dong
>Priority: Minor
>  Labels: sev:high
>
> Fixes can be to 
>  # Ignore empty replacecommit.requested files.
>  # Standardise the replacecommit.requested format across all invocations be 
> it from clustering or this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1739) insert_overwrite_table and insert_overwrite create empty replacecommit.requested file which breaks archival

2021-04-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1739:
--
Labels: sev:high  (was: sev:critical)

> insert_overwrite_table and insert_overwrite create empty 
> replacecommit.requested file which breaks archival
> ---
>
> Key: HUDI-1739
> URL: https://issues.apache.org/jira/browse/HUDI-1739
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jagmeet Bali
>Assignee: Susu Dong
>Priority: Minor
>  Labels: sev:high
>
> Fixes can be to 
>  # Ignore empty replacecommit.requested files.
>  # Standardise the replacecommit.requested format across all invocations be 
> it from clustering or this use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter edited a comment on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (d06be43) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#2892  +/-   ##
   
   - Coverage 69.75%   69.70%   -0.06% 
   + Complexity  375  374   -1 
   
 Files54   54  
 Lines  1997 1997  
 Branches236  236  
   
   - Hits   1393 1392   -1 
 Misses  473  473  
   - Partials131  132   +1 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `69.70% <ø> (-0.06%)` | `374.00 <ø> (-1.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 edited a comment on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


danny0405 edited a comment on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828336262


   > I have two questions:
   > 
   > 1. The lowest Flink version we supported is 1.12.x?
   > 2. Can we provide an e2e demo and documentation to show the usage of the 
flink streamer via jar-mode, just like delta-streamer, it should be out of the 
box;
   > 
   > I tried it, but missed the dependencies of the Kafka connector. Can we 
make the new flink streamer peer to the delta streamer?
   
   Yes, people would only use Flink 1.12.x code; the code is not being removed 
because of the Flink version, it's because the logic is totally redundant. Removing it 
avoids confusion, because I found some people using the legacy code with poor 
performance.
   
   I still think we should not include a Kafka connector in the delta 
streamer; no one complains about it missing, based on the users I see.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


danny0405 commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828336262


   > I have two questions:
   > 
   > 1. The lowest Flink version we supported is 1.12.x?
   > 2. Can we provide an e2e demo and documentation to show the usage of the 
flink streamer via jar-mode, just like delta-streamer, it should be out of the 
box;
   > 
   > I tried it, but missed the dependencies of the Kafka connector. Can we 
make the new flink streamer peer to the delta streamer?
   
   Yes, people would only use Flink 1.12.x code; the code is not being removed 
because of the Flink version, it's because the logic is totally redundant.
   
   I still think we should not include a Kafka connector in the delta 
streamer; no one complains about it missing, based on the users I see.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


yanghua commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828323317


   I have two questions:
   
   1) Is the lowest Flink version we support 1.12.x?
   2) Can we provide an e2e demo and documentation to show the usage of the 
Flink streamer via jar-mode, just like the delta streamer? It should work out of the 
box.
   
   I tried it, but the Kafka connector dependencies were missing. Can we make 
the new Flink streamer a peer to the delta streamer?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2892:
URL: https://github.com/apache/hudi/pull/2892#issuecomment-828296052


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2892](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (ab94864) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `60.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2892/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2892   +/-   ##
   
   - Coverage 69.75%   9.36%   -60.40% 
   + Complexity  375  48  -327 
   
 Files54  54   
 Lines  19971997   
 Branches236 236   
   
   - Hits   1393 187 -1206 
   - Misses  4731797 +1324 
   + Partials131  13  -118 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (-60.40%)` | `48.00 <ø> (-327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2892?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2892/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 

[GitHub] [hudi] danny0405 commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


danny0405 commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828261491


   Hi, @yanghua can you take a look, thanks ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] MyLanPangzi removed a comment on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


MyLanPangzi removed a comment on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828259208


   @yanghua Hi, I triggered the CI and it passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] MyLanPangzi commented on pull request #2868: [HUDI-1821] Remove legacy code for Flink writer

2021-04-28 Thread GitBox


MyLanPangzi commented on pull request #2868:
URL: https://github.com/apache/hudi/pull/2868#issuecomment-828259208


   @yanghua Hi, I triggered the CI and it passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1865) Make embedded time line service singleton

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1865:
-
Labels: pull-request-available  (was: )

> Make embedded time line service singleton
> -
>
> Key: HUDI-1865
> URL: https://issues.apache.org/jira/browse/HUDI-1865
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> The filesystem view takes too much memory, make it process singleton.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 opened a new pull request #2892: [HUDI-1865] Make embedded time line service singleton

2021-04-28 Thread GitBox


danny0405 opened a new pull request #2892:
URL: https://github.com/apache/hudi/pull/2892


   The filesystem view takes too much memory; make it a process-level singleton.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1865) Make embedded time line service singleton

2021-04-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1865:
-
Summary: Make embedded time line service singleton  (was: Make write client 
of flink pipeline singleton)

> Make embedded time line service singleton
> -
>
> Key: HUDI-1865
> URL: https://issues.apache.org/jira/browse/HUDI-1865
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> The filesystem view takes too much memory, make it process singleton.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-28 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   * 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN
   * fa86907f7522bc8dbe512d48b5a87e4a6b13f035 UNKNOWN
   * 4ebe53016ce3e0648992dbe14d04f71a92f116e6 UNKNOWN
   * 682ae9985f591f6d0c30ee2ef9b159403c1e46de UNKNOWN
   * d80397fcfeaa2996ab550bcdab4524be7420a364 UNKNOWN
   * bfe3a803e19540578b94f778f7ba7551db0f86f1 UNKNOWN
   * a632e58390eb94fcc7e757bd7580780cf184f9a8 UNKNOWN
   * 2e413d601c80b123269c2fc3fc6aa9a8bd0d746a UNKNOWN
   * e797ee47aa319df3c3c40bdc4acab4f592d70ffe UNKNOWN
   * acb06df73c1c2a0ef1590f66e8b41e173d2a7a7b UNKNOWN
   * f7f78ee22a0a75c5fb866c4e9cdda01482fbcb59 UNKNOWN
   * 3a7227993309e8dd37f2aef693cb3fed69a2043c UNKNOWN
   * 8f7a8e7f4989c9e20b936123c0f6e324898471d2 UNKNOWN
   * 6824c4917ad812c5938fe5346344a4aef9b7a72e UNKNOWN
   * 252364017f5dee1dcdfa061cc3070dac518d4047 UNKNOWN
   * b1691e583f3c23ee83fcb7ee0245eed826624cc0 UNKNOWN
   * ba970bda569f0312c77cd5c139f9dec4ad2759b0 UNKNOWN
   * 4370d21d4983e5e79d1f4bafba51ae26dd29f9a0 UNKNOWN
   * 21ea9ccef8ab9d78f9c201fa58a22e3e59caaa6b UNKNOWN
   * b17028e8a232ff3015c18b8f7de5435241800bfe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2643: DO NOT MERGE (Azure CI) test branch ci

2021-04-28 Thread GitBox


hudi-bot edited a comment on pull request #2643:
URL: https://github.com/apache/hudi/pull/2643#issuecomment-792368481


   
   ## CI report:
   
   * 9831a6c50e9f49f8a71c02fc6ac50ae1446f7c1f UNKNOWN
   * a569dbe9409910fbb83b3764b300574c0e52612e Azure: 
[FAILURE](https://dev.azure.com/XUSH0012/0ef433cc-d4b4-47cc-b6a1-03d032ef546c/_build/results?buildId=142)
 
   * e6e9f1f1554a1474dd6c20338215030cad23a2e0 UNKNOWN
   * 2a6690a256c8cd8efe9ed2b1984b896fb27ef077 UNKNOWN
   * d8b7cca55e057a52a2e229d81e8cb52b60dc275f UNKNOWN
   * 3bce301333cc78194d13a702598b46e04fe9f85f UNKNOWN
   * f07f345baa450f3fec7eab59caa76b0fbda1e132 UNKNOWN
   * 869d2ce3fad330af93c1bb3b576824f519c6e68b UNKNOWN
   * fa86907f7522bc8dbe512d48b5a87e4a6b13f035 UNKNOWN
   * 4ebe53016ce3e0648992dbe14d04f71a92f116e6 UNKNOWN
   * 682ae9985f591f6d0c30ee2ef9b159403c1e46de UNKNOWN
   * d80397fcfeaa2996ab550bcdab4524be7420a364 UNKNOWN
   * bfe3a803e19540578b94f778f7ba7551db0f86f1 UNKNOWN
   * a632e58390eb94fcc7e757bd7580780cf184f9a8 UNKNOWN
   * 2e413d601c80b123269c2fc3fc6aa9a8bd0d746a UNKNOWN
   * e797ee47aa319df3c3c40bdc4acab4f592d70ffe UNKNOWN
   * acb06df73c1c2a0ef1590f66e8b41e173d2a7a7b UNKNOWN
   * f7f78ee22a0a75c5fb866c4e9cdda01482fbcb59 UNKNOWN
   * 3a7227993309e8dd37f2aef693cb3fed69a2043c UNKNOWN
   * 8f7a8e7f4989c9e20b936123c0f6e324898471d2 UNKNOWN
   * 6824c4917ad812c5938fe5346344a4aef9b7a72e UNKNOWN
   * 252364017f5dee1dcdfa061cc3070dac518d4047 UNKNOWN
   * b1691e583f3c23ee83fcb7ee0245eed826624cc0 UNKNOWN
   * ba970bda569f0312c77cd5c139f9dec4ad2759b0 UNKNOWN
   * 4370d21d4983e5e79d1f4bafba51ae26dd29f9a0 UNKNOWN
   * 21ea9ccef8ab9d78f9c201fa58a22e3e59caaa6b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan edited a comment on pull request #2889: [HUDI-1810] Fix azure setting for integ tests (Azure CI)

2021-04-28 Thread GitBox


xushiyan edited a comment on pull request #2889:
URL: https://github.com/apache/hudi/pull/2889#issuecomment-828166779


   @vinothchandar The integ tests were misconfigured previously. This change makes the [integ 
tests 
pass](https://dev.azure.com/xushiyan/apache-hudi-ci/_build/results?buildId=76=logs=d5c42908-5572-5ce6-e4a8-5e2053b947e8)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2891: [HUDI-1863] Add rate limiter to Flink writer to avoid OOM for bootstrap

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2891:
URL: https://github.com/apache/hudi/pull/2891#issuecomment-828184944


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2891](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (08576e3) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `16.74%`.
   > The diff coverage is `93.47%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2891/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@              Coverage Diff               @@
    ##             master     #2891       +/-   ##
    ===============================================
    - Coverage     69.75%    53.00%    -16.75%
    - Complexity      375      3747      +3372
    ===============================================
      Files            54       488       +434
      Lines          1997     23521     +21524
      Branches        236      2502      +2266
    ===============================================
    + Hits           1393     12468     +11075
    - Misses          473      9954      +9481
    - Partials        131      1099       +968
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `39.53% <ø> (?)` | `220.00 <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.38% <ø> (?)` | `1975.00 <ø> (?)` | |
   | hudiflink | `59.66% <93.47%> (?)` | `538.00 <4.00> (?)` | |
   | hudihadoopmr | `33.33% <ø> (?)` | `198.00 <ø> (?)` | |
   | hudisparkdatasource | `73.33% <ø> (?)` | `237.00 <ø> (?)` | |
   | hudisync | `46.39% <ø> (?)` | `142.00 <ø> (?)` | |
   | huditimelineservice | `64.36% <ø> (?)` | `62.00 <ø> (?)` | |
   | hudiutilities | `69.75% <ø> (ø)` | `375.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...e/hudi/sink/transform/RowDataToHoodieFunction.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3RyYW5zZm9ybS9Sb3dEYXRhVG9Ib29kaWVGdW5jdGlvbi5qYXZh)
 | `85.71% <89.28%> (ø)` | `8.00 <3.00> (?)` | |
   | 
[...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh)
 | `90.48% <100.00%> (ø)` | `11.00 <0.00> (?)` | |
   | 
[...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==)
 | `77.77% <100.00%> (ø)` | `22.00 <1.00> (?)` | |
   | 
[...ava/org/apache/hudi/source/StreamReadOperator.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zb3VyY2UvU3RyZWFtUmVhZE9wZXJhdG9yLmphdmE=)
 | `90.66% <100.00%> (ø)` | `15.00 <0.00> (?)` | |
   | 
[...udi/common/table/timeline/dto/FSPermissionDTO.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GU1Blcm1pc3Npb25EVE8uamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] codecov-commenter edited a comment on pull request #2891: [HUDI-1863] Add rate limiter to Flink writer to avoid OOM for bootstrap

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2891:
URL: https://github.com/apache/hudi/pull/2891#issuecomment-828184944






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1865) Make write client of flink pipeline singleton

2021-04-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-1865:


 Summary: Make write client of flink pipeline singleton
 Key: HUDI-1865
 URL: https://issues.apache.org/jira/browse/HUDI-1865
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


The filesystem view takes too much memory; make it a process-level singleton.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch asf-site updated: Travis CI build asf-site

2021-04-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new dff8c32  Travis CI build asf-site
dff8c32 is described below

commit dff8c3207cd64a87e5e4bddcd32b90857f89c9a4
Author: CI 
AuthorDate: Wed Apr 28 06:56:01 2021 +

Travis CI build asf-site
---
 content/docs/flink-quick-start-guide.html | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/content/docs/flink-quick-start-guide.html b/content/docs/flink-quick-start-guide.html
index 03d52fe..21ef68b 100644
--- a/content/docs/flink-quick-start-guide.html
+++ b/content/docs/flink-quick-start-guide.html
@@ -390,13 +390,7 @@ quick start tool for SQL users.
 The hudi-flink-bundle jar is archived with scala 2.11, so it’s recommended to use flink 1.12.x bundled with scala 2.11.
 
 Step.2 start flink cluster
-Start a standalone flink cluster within hadoop environment.
-Before you start up the cluster, we suggest to config the cluster as follows:
-
-
-  in $FLINK_HOME/conf/flink-conf.yaml, add config option taskmanager.numberOfTaskSlots: 4
-  in $FLINK_HOME/conf/workers, add item localhost as 4 lines so that there are 4 workers on the local cluster
-
+Start a standalone flink cluster within hadoop environment.
 
 Now starts the cluster:
 
@@ -449,6 +443,8 @@ The SQL CLI only executes the SQL line by line.
 WITH (
   'connector' = 'hudi',
   'path' = 'table_base_path',
+  'write.tasks' = '1', -- default is 4 ,required more resource
+  'compaction.tasks' = '1', -- default is 10 ,required more resource
   'table.type' = 'MERGE_ON_READ' -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE
 );
 
@@ -504,6 +500,7 @@ We do not need to specify endTime, if we want all changes after the given commit
   'connector' = 'hudi',
   'path' = 'table_base_path',
   'table.type' = 'MERGE_ON_READ',
+  'read.tasks' = '1', -- default is 4 ,required more resource
   'read.streaming.enabled' = 'true',  -- this option enable the streaming read
   'read.streaming.start-commit' = '20210316134557', -- specifies the start commit instant time
   'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s.
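
Read together, the two hunks above lower the default task parallelism used in the quick start examples. A minimal sketch of what the resulting Flink SQL might look like end to end is shown below; the table names, columns, and base path are illustrative assumptions, and only the `write.tasks`, `compaction.tasks`, `read.tasks`, `table.type`, and `read.streaming.*` options are taken from the updated page.

```sql
-- Sketch only: table names, columns and the path are placeholders, not part of the commit.
-- Sink table sized for a single-slot local cluster.
CREATE TABLE hudi_sink (
  uuid VARCHAR(20),
  name VARCHAR(10),
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_sink',   -- placeholder base path
  'table.type' = 'MERGE_ON_READ',     -- default is COPY_ON_WRITE
  'write.tasks' = '1',                -- default is 4, lowered to use fewer resources
  'compaction.tasks' = '1'            -- default is 10, lowered to use fewer resources
);

-- Streaming-read table over the same base path.
CREATE TABLE hudi_source (
  uuid VARCHAR(20),
  name VARCHAR(10),
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_sink',
  'table.type' = 'MERGE_ON_READ',
  'read.tasks' = '1',                                -- default is 4
  'read.streaming.enabled' = 'true',                 -- enable streaming read
  'read.streaming.start-commit' = '20210316134557',  -- start commit instant time
  'read.streaming.check-interval' = '4'              -- seconds; default is 60
);

SELECT * FROM hudi_source;
```

With every operator at parallelism 1 the job should fit in the single task slot a default standalone cluster provides, which appears to be the intent of removing the multi-slot configuration advice.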


[GitHub] [hudi] yanghua merged pull request #2890: [MINOR] minimize flink quick start resource

2021-04-28 Thread GitBox


yanghua merged pull request #2890:
URL: https://github.com/apache/hudi/pull/2890


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #2889: [HUDI-1810] Fix azure setting for integ tests (Azure CI)

2021-04-28 Thread GitBox


hudi-bot edited a comment on pull request #2889:
URL: https://github.com/apache/hudi/pull/2889#issuecomment-828163773


   
   ## CI report:
   
   * 3f35042fee2ab77100af7cddbc1b5914808ef7d1 Travis: 
[FAILURE](https://travis-ci.com/github/apachehudi-ci/hudi-branch-ci/builds/224346756)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #2889: [HUDI-1810] Fix azure setting for integ tests (Azure CI)

2021-04-28 Thread GitBox


codecov-commenter edited a comment on pull request #2889:
URL: https://github.com/apache/hudi/pull/2889#issuecomment-828165841






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #2891: [HUDI-1863] Add rate limiter to Flink writer to avoid OOM for bootstrap

2021-04-28 Thread GitBox


codecov-commenter commented on pull request #2891:
URL: https://github.com/apache/hudi/pull/2891#issuecomment-828184944


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2891](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (08576e3) into 
[master](https://codecov.io/gh/apache/hudi/commit/386767693d46e7419c4fb0fa292ccb7ab7f7098d?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (3867676) will **decrease** coverage by `60.39%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2891/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2891       +/-   ##
    =============================================
    - Coverage     69.75%    9.36%     -60.40%
    + Complexity      375       48        -327
    =============================================
      Files            54       54
      Lines          1997     1997
      Branches        236      236
    =============================================
    - Hits           1393      187       -1206
    - Misses          473     1797       +1324
    + Partials        131       13        -118
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudiclient | `?` | `?` | |
   | hudiutilities | `9.36% <ø> (-60.40%)` | `48.00 <ø> (-327.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2891?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2891/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
