[GitHub] [hudi] codecov-commenter commented on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
codecov-commenter commented on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862054362

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2210) Report

> Merging [#2210](https://codecov.io/gh/apache/hudi/pull/2210) (b845e34) into [master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682) (673d62f) will **increase** coverage by `41.65%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2210/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/2210)

```diff
@@             Coverage Diff              @@
##           master    #2210       +/-   ##
===========================================
+ Coverage    8.43%   50.09%   +41.65%
- Complexity     62     3091     +3029
===========================================
  Files          70      386      +316
  Lines        2880    18953    +16073
  Branches      359     1977     +1618
===========================================
+ Hits          243     9494     +9251
- Misses       2616     8661     +6045
- Partials       21      798      +777
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (?)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `48.20% <ø> (?)` | |
| hudiflink | `60.73% <ø> (?)` | |
| hudihadoopmr | `51.34% <ø> (?)` | |
| hudisync | `?` | |
| hudiutilities | `?` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2210) | Coverage Δ | |
|---|---|---|
| [...callback/kafka/HoodieWriteCommitKafkaCallback.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NhbGxiYWNrL2thZmthL0hvb2RpZVdyaXRlQ29tbWl0S2Fma2FDYWxsYmFjay5qYXZh) | | |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | | |
| [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | | |
| [...alCheckpointFromAnotherHoodieTimelineProvider.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrcG9pbnRGcm9tQW5vdGhlckhvb2RpZVRpbWVsaW5lUHJvdmlkZXIuamF2YQ==) | | |
| [...ties/exception/HoodieIncrementalPullException.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVJbmNyZW1lbnRhbFB1bGxFeGNlcHRpb24uamF2YQ==) | | |
| [...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=) | | |
[jira] [Updated] (HUDI-1047) Support asynchronize clustering in CoW mode
[ https://issues.apache.org/jira/browse/HUDI-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1047:
----------------------------------
    Fix Version/s: 0.9.0

> Support asynchronize clustering in CoW mode
> -------------------------------------------
>
>                 Key: HUDI-1047
>                 URL: https://issues.apache.org/jira/browse/HUDI-1047
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: leesf
>            Assignee: leesf
>            Priority: Blocker
>             Fix For: 0.9.0
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] codecov-commenter edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
codecov-commenter edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862054362

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2210) Report

> Merging [#2210](https://codecov.io/gh/apache/hudi/pull/2210) (b845e34) into [master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682) (673d62f) will **increase** coverage by `44.07%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2210/graphs/tree.svg)](https://codecov.io/gh/apache/hudi/pull/2210)

```diff
@@             Coverage Diff              @@
##           master    #2210       +/-   ##
===========================================
+ Coverage    8.43%   52.51%   +44.07%
- Complexity     62     3664     +3602
===========================================
  Files          70      474      +404
  Lines        2880    23997    +21117
  Branches      359     2741     +2382
===========================================
+ Hits          243    12601    +12358
- Misses       2616    10137     +7521
- Partials       21     1259     +1238
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (?)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `48.20% <ø> (?)` | |
| hudiflink | `60.73% <ø> (?)` | |
| hudihadoopmr | `51.34% <ø> (?)` | |
| hudisparkdatasource | `66.47% <ø> (?)` | |
| hudisync | `46.79% <ø> (+40.00%)` | :arrow_up: |
| huditimelineservice | `64.36% <ø> (?)` | |
| hudiutilities | `?` | |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2210) | Coverage Δ | |
|---|---|---|
| [...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=) | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | | |
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | | |
| [...a/org/apache/hudi/utilities/sources/SqlSource.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvU3FsU291cmNlLmphdmE=) | | |
| [...s/exception/HoodieIncrementalPullSQLException.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVJbmNyZW1lbnRhbFB1bGxTUUxFeGNlcHRpb24uamF2YQ==) | | |
| [...che/hudi/utilities/schema/SchemaPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2210/diff#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQb3N0UHJvY2Vzc29yLmphdmE=) | | |
[jira] [Updated] (HUDI-1048) Support Asynchronize clustering in MoR mode
[ https://issues.apache.org/jira/browse/HUDI-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1048:
----------------------------------
    Fix Version/s: 0.9.0

> Support Asynchronize clustering in MoR mode
> -------------------------------------------
>
>                 Key: HUDI-1048
>                 URL: https://issues.apache.org/jira/browse/HUDI-1048
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: leesf
>            Assignee: leesf
>            Priority: Blocker
>             Fix For: 0.9.0
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HUDI-2027) Certify bulk_insert row writing for COW and MOR w/ test suite infra
sivabalan narayanan created HUDI-2027:
--------------------------------------

             Summary: Certify bulk_insert row writing for COW and MOR w/ test suite infra
                 Key: HUDI-2027
                 URL: https://issues.apache.org/jira/browse/HUDI-2027
             Project: Apache Hudi
          Issue Type: Task
            Reporter: sivabalan narayanan

Certify bulk_insert row writing for COW and MOR w/ test suite infra. Should include validations for archival and cleaning.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1048) Support Asynchronize clustering in MoR mode
[ https://issues.apache.org/jira/browse/HUDI-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1048:
----------------------------------
    Priority: Blocker  (was: Major)

> Support Asynchronize clustering in MoR mode
> -------------------------------------------
>
>                 Key: HUDI-1048
>                 URL: https://issues.apache.org/jira/browse/HUDI-1048
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: leesf
>            Assignee: leesf
>            Priority: Blocker
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HUDI-1042) [Umbrella] Support clustering on filegroups
[ https://issues.apache.org/jira/browse/HUDI-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364073#comment-17364073 ]

Vinoth Chandar commented on HUDI-1042:
--------------------------------------

[~uditme] Can you please add any suggestions around config simplification, or the first-time bulk_insert + clustering issue you mentioned? We would like to improve this and call it GA in 0.9.0.

cc [~satishkotha] This is something we should make more and more intelligent as well.

> [Umbrella] Support clustering on filegroups
> -------------------------------------------
>
>                 Key: HUDI-1042
>                 URL: https://issues.apache.org/jira/browse/HUDI-1042
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: leesf
>            Assignee: leesf
>            Priority: Major
>              Labels: hudi-umbrellas
>             Fix For: 0.9.0
>
> please see
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] wangfeigithub commented on issue #3065: Not in Marker Dir occurs when I write to HDFS using Spark
wangfeigithub commented on issue #3065:
URL: https://github.com/apache/hudi/issues/3065#issuecomment-862055951

https://github.com/apache/hudi/issues/3065#issuecomment-862009477
directly writing new files using 0.8

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Updated] (HUDI-1048) Support Asynchronize clustering in MoR mode
[ https://issues.apache.org/jira/browse/HUDI-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1048:
---------------------------------
    Status: Open  (was: New)

> Support Asynchronize clustering in MoR mode
> -------------------------------------------
>
>                 Key: HUDI-1048
>                 URL: https://issues.apache.org/jira/browse/HUDI-1048
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: leesf
>            Assignee: leesf
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-2025) Ensure parity between row writer bulk_insert and rdd based bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2025:
--------------------------------------
    Description: Ensure parity between row writer bulk_insert and rdd based bulk_insert  (was: Bring parity between row writer bulk_insert and rdd based bulk_insert)

> Ensure parity between row writer bulk_insert and rdd based bulk_insert
> ----------------------------------------------------------------------
>
>                 Key: HUDI-2025
>                 URL: https://issues.apache.org/jira/browse/HUDI-2025
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> Ensure parity between row writer bulk_insert and rdd based bulk_insert

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-2025) Ensure parity between row writer bulk_insert and rdd based bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2025:
--------------------------------------
    Summary: Ensure parity between row writer bulk_insert and rdd based bulk_insert  (was: Bring parity between row writer bulk_insert and rdd based bulk_insert)

> Ensure parity between row writer bulk_insert and rdd based bulk_insert
> ----------------------------------------------------------------------
>
>                 Key: HUDI-2025
>                 URL: https://issues.apache.org/jira/browse/HUDI-2025
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> Bring parity between row writer bulk_insert and rdd based bulk_insert

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1500) Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1500:
---------------------------------
    Summary: Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer  (was: support incremental read clustering commit in deltastreamer)

> Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-1500
>                 URL: https://issues.apache.org/jira/browse/HUDI-1500
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: DeltaStreamer, Spark Integration
>            Reporter: liwei
>            Assignee: satish
>            Priority: Blocker
>             Fix For: 0.9.0
>
> now in DeltaSync.readFromSource() can not read last instant as replace
> commit, such as clustering.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
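The issue above is that `DeltaSync.readFromSource()` cannot treat a replacecommit (the instant type clustering writes) as the latest readable instant. As a rough illustration of the idea only, here is a minimal sketch of picking the latest instant from a timeline while optionally skipping replace commits; the `Instant` class, action strings, and method names below are simplified stand-ins, not Hudi's actual timeline API:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TimelineSketch {

    // Simplified stand-in for a Hudi timeline instant: an action type plus a timestamp.
    static final class Instant {
        final String action;    // e.g. "commit", "deltacommit", "replacecommit"
        final String timestamp; // sortable instant time, e.g. "20210615100000"

        Instant(String action, String timestamp) {
            this.action = action;
            this.timestamp = timestamp;
        }
    }

    // Latest instant an incremental reader may use as its checkpoint.
    // When includeReplaceCommits is false, clustering's replacecommits are skipped,
    // which mirrors the limitation described in the issue.
    static Optional<Instant> latestReadableInstant(List<Instant> timeline,
                                                   boolean includeReplaceCommits) {
        return timeline.stream()
                .filter(i -> includeReplaceCommits || !i.action.equals("replacecommit"))
                .max(Comparator.comparing(i -> i.timestamp));
    }

    public static void main(String[] args) {
        List<Instant> timeline = Arrays.asList(
                new Instant("commit", "20210615090000"),
                new Instant("replacecommit", "20210615100000"));

        // Skipping replace commits falls back to the older regular commit.
        System.out.println(latestReadableInstant(timeline, false).get().timestamp);
        // Including them surfaces the clustering commit as the latest instant.
        System.out.println(latestReadableInstant(timeline, true).get().timestamp);
    }
}
```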
[jira] [Updated] (HUDI-1500) support incremental read clustering commit in deltastreamer
[ https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1500:
---------------------------------
    Component/s: Spark Integration

> support incremental read clustering commit in deltastreamer
> -----------------------------------------------------------
>
>                 Key: HUDI-1500
>                 URL: https://issues.apache.org/jira/browse/HUDI-1500
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: DeltaStreamer, Spark Integration
>            Reporter: liwei
>            Assignee: satish
>            Priority: Blocker
>             Fix For: 0.9.0
>
> now in DeltaSync.readFromSource() can not read last instant as replace
> commit, such as clustering.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1706) Test flakiness w/ multiwriter test
[ https://issues.apache.org/jira/browse/HUDI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1706:
----------------------------------
    Priority: Blocker  (was: Major)

> Test flakiness w/ multiwriter test
> ----------------------------------
>
>                 Key: HUDI-1706
>                 URL: https://issues.apache.org/jira/browse/HUDI-1706
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Nishith Agarwal
>            Priority: Blocker
>
> [https://api.travis-ci.com/v3/job/492130170/log.txt]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1706) Test flakiness w/ multiwriter test
[ https://issues.apache.org/jira/browse/HUDI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1706:
----------------------------------
    Fix Version/s: 0.9.0

> Test flakiness w/ multiwriter test
> ----------------------------------
>
>                 Key: HUDI-1706
>                 URL: https://issues.apache.org/jira/browse/HUDI-1706
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Nishith Agarwal
>            Priority: Blocker
>             Fix For: 0.9.0
>
> [https://api.travis-ci.com/v3/job/492130170/log.txt]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1492) Handle DeltaWriteStat correctly for storage schemes that support appends
[ https://issues.apache.org/jira/browse/HUDI-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1492:
---------------------------------
    Priority: Blocker  (was: Major)

> Handle DeltaWriteStat correctly for storage schemes that support appends
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1492
>                 URL: https://issues.apache.org/jira/browse/HUDI-1492
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Vinoth Chandar
>            Assignee: Prashant Wason
>            Priority: Blocker
>             Fix For: 0.9.0
>
> Current implementation simply uses the
> {code:java}
> String pathWithPartition = hoodieWriteStat.getPath(); {code}
> to write the metadata table. this is problematic, if the delta write was
> merely an append. and can technically add duplicate files into the metadata
> table
> (not sure if this is a problem per se. but filing a Jira to track and either
> close/fix)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
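The concern in the issue above is that on append-capable storage, successive delta commits produce write stats whose `getPath()` points at the same log file, so registering each stat's path in the metadata table could record the same file twice. A toy sketch of the deduplication idea, using plain strings for paths rather than Hudi's actual `HoodieWriteStat` types (the paths below are made up for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WriteStatPathDedup {

    // Successive appends to the same log file yield write stats with identical paths;
    // deduplicate on path before registering files in the metadata table.
    static Set<String> uniqueFilePaths(List<String> writeStatPaths) {
        return new LinkedHashSet<>(writeStatPaths);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "2021/06/15/.fg1.log.1",    // first delta commit appends to the log
                "2021/06/15/.fg1.log.1",    // second delta commit appends to the same file
                "2021/06/15/fg1.parquet");  // base file

        // Naive insertion would register 3 entries, but only 2 distinct files exist.
        System.out.println(uniqueFilePaths(paths).size());
    }
}
```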
[jira] [Assigned] (HUDI-1309) Listing Metadata unreadable in S3 as the log block is deemed corrupted
[ https://issues.apache.org/jira/browse/HUDI-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar reassigned HUDI-1309:
------------------------------------
    Assignee: Nishith Agarwal

> Listing Metadata unreadable in S3 as the log block is deemed corrupted
> ----------------------------------------------------------------------
>
>                 Key: HUDI-1309
>                 URL: https://issues.apache.org/jira/browse/HUDI-1309
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Writer Core
>            Reporter: Balaji Varadarajan
>            Assignee: Nishith Agarwal
>            Priority: Blocker
>             Fix For: 0.9.0
>
> When running metadata list-partitions CLI command, I am seeing the below
> messages and the partition list is empty. Was expecting 10K partitions.
>
> {code:java}
> 36589 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning log file HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0}
> 36590 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block in file HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Log HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} has a corrupted block at 14
> 44515 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block in HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a corrupt block in s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045
> 44567 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - M{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1309) Listing Metadata unreadable in S3 as the log block is deemed corrupted
[ https://issues.apache.org/jira/browse/HUDI-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-1309:
---------------------------------
    Fix Version/s: 0.9.0

> Listing Metadata unreadable in S3 as the log block is deemed corrupted
> ----------------------------------------------------------------------
>
>                 Key: HUDI-1309
>                 URL: https://issues.apache.org/jira/browse/HUDI-1309
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Writer Core
>            Reporter: Balaji Varadarajan
>            Priority: Blocker
>             Fix For: 0.9.0
>
> When running metadata list-partitions CLI command, I am seeing the below
> messages and the partition list is empty. Was expecting 10K partitions.
>
> {code:java}
> 36589 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning log file HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0}
> 36590 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block in file HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Log HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} has a corrupted block at 14
> 44515 [Spring Shell] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block in HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045', fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a corrupt block in s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045
> 44567 [Spring Shell] INFO org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - M{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #1845: [SUPPORT] Support for Schema evolution. Facing an error
nsivabalan commented on issue #1845:
URL: https://github.com/apache/hudi/issues/1845#issuecomment-862039137

Yes, I am in sync w/ @sbernauer via Slack. He confirmed that the PR we have put up works for him (older records are able to be upserted to the Hudi table after its schema evolved). He is doing more testing for now.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744

## CI report:

* 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
hudi-bot edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641

## CI report:

* b845e34d11e4e44e2b41e2089349baddc3a10b80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=210)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
hudi-bot commented on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641

## CI report:

* b845e34d11e4e44e2b41e2089349baddc3a10b80 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert
nsivabalan commented on a change in pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#discussion_r652332012

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithCustomAvroPayload.java

## @@ -0,0 +1,107 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hudi.common.model;

import org.apache.hudi.common.util.Option;
import org.apache.hudi.exception.ColumnNotFoundException;
import org.apache.hudi.exception.UpdateKeyNotFoundException;
import org.apache.hudi.exception.WriteOperationException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

/**
 * Subclass of OverwriteWithLatestAvroPayload used for DeltaStreamer.
 *
 * combineAndGetUpdateValue - accepts the column names to be updated;
 * splitKeys - splits the configured update keys.
 */
public class OverwriteWithCustomAvroPayload extends OverwriteWithLatestAvroPayload {

  public OverwriteWithCustomAvroPayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  /**
   * Splits the comma-separated update keys.
   */
  public List<String> splitKeys(String keys) throws UpdateKeyNotFoundException {
    if (keys == null) {
      throw new UpdateKeyNotFoundException("keys cannot be null");
    } else if (keys.equals("")) {
      throw new UpdateKeyNotFoundException("keys cannot be blank");
    } else {
      return Arrays.stream(keys.split(",")).collect(Collectors.toList());
    }
  }

  /**
   * Checks that every key exists as a column in the schema.
   */
  public boolean checkColumnExists(List<String> keys, Schema schema) {
    List<Schema.Field> field = schema.getFields();
    List<Schema.Field> common = new ArrayList<>();
    for (Schema.Field columns : field) {
      if (keys.contains(columns.name())) {
        common.add(columns);
      }
    }
    return common.size() == keys.size();
  }

  @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties)
      throws WriteOperationException, IOException, ColumnNotFoundException, UpdateKeyNotFoundException {

    if (!properties.getProperty("hoodie.datasource.write.operation").equals("upsert")) {
      throw new WriteOperationException("write should be upsert");
    }

    Option<IndexedRecord> recordOption = getInsertValue(schema);

    if (!recordOption.isPresent()) {
      return Option.empty();
    }

    GenericRecord existingRecord = (GenericRecord) currentValue;
    GenericRecord incomingRecord = (GenericRecord) recordOption.get();
    List<String> keys = splitKeys(properties.getProperty("hoodie.update.keys"));
```

Review comment:
Also, let's add this config to DataSourceWriteOptions. Maybe we can name it "hoodie.datasource.write.partial.fields.to.update".
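The payload under review merges only the columns named in a configured key list (`hoodie.update.keys` in the diff, which the reviewer suggests renaming). A minimal, self-contained sketch of that partial-field merge, using plain maps instead of Avro records; all names here are illustrative stand-ins, not Hudi's API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialUpdateSketch {

    // Merge only the columns named in updateKeys from the incoming record
    // into the existing one; all other columns keep their current values.
    static Map<String, Object> combine(Map<String, Object> existing,
                                       Map<String, Object> incoming,
                                       List<String> updateKeys) {
        Map<String, Object> merged = new HashMap<>(existing);
        for (String key : updateKeys) {
            if (incoming.containsKey(key)) {
                merged.put(key, incoming.get(key));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new HashMap<>();
        existing.put("id", "u1");
        existing.put("name", "alice");
        existing.put("age", 30);

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", "u1");
        incoming.put("name", "overwritten-upstream");
        incoming.put("age", 31);

        // Analogous to configuring the update keys as "age":
        // only "age" is taken from incoming; "name" stays "alice".
        Map<String, Object> merged = combine(existing, incoming, Arrays.asList("age"));
        System.out.println(merged);
    }
}
```

This is the same shape as the payload's `combineAndGetUpdateValue`, minus the schema validation and operation checks the real class performs.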
[GitHub] [hudi] hudi-bot edited a comment on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert
hudi-bot edited a comment on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744

## CI report:

* 26dadb6627c90c9f06e66fba0b8bd24e5579665f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=209)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #3035: [HUDI-1936] Introduce a optional property for conditional upsert
hudi-bot commented on pull request #3035:
URL: https://github.com/apache/hudi/pull/3035#issuecomment-862017744

## CI report:

* 26dadb6627c90c9f06e66fba0b8bd24e5579665f UNKNOWN
[GitHub] [hudi] n3nash commented on issue #2265: Arrays with nulls in them result in broken parquet files
n3nash commented on issue #2265: URL: https://github.com/apache/hudi/issues/2265#issuecomment-862016892 The fix has been landed and a FAQ has been added here -> https://cwiki.apache.org/confluence/display/HUDI/FAQ?focusedCommentId=181310323#comment-181310323
[GitHub] [hudi] n3nash closed issue #2265: Arrays with nulls in them result in broken parquet files
n3nash closed issue #2265: URL: https://github.com/apache/hudi/issues/2265
[GitHub] [hudi] n3nash commented on issue #1845: [SUPPORT] Support for Schema evolution. Facing an error
n3nash commented on issue #1845: URL: https://github.com/apache/hudi/issues/1845#issuecomment-862016359 @nsivabalan Can you please reply above?
[GitHub] [hudi] n3nash commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.
n3nash commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-862016095 With 0.7.0, one can set `hoodie.metadata.enable` to true to eliminate issues due to file listings. Closing this ticket now.
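For reference, the setting mentioned above as it would appear among the write/read options (a minimal fragment; available since 0.7.0 per the comment):

```properties
# Maintain an internal metadata table of file listings so that readers and
# writers avoid slow recursive S3/DFS directory listings.
hoodie.metadata.enable=true
```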
[GitHub] [hudi] n3nash closed issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.
n3nash closed issue #1829: URL: https://github.com/apache/hudi/issues/1829
[GitHub] [hudi] n3nash closed issue #1679: [HUDI-1609] How to disable Hive JDBC and enable metastore
n3nash closed issue #1679: URL: https://github.com/apache/hudi/issues/1679
[GitHub] [hudi] n3nash commented on issue #1679: [HUDI-1609] How to disable Hive JDBC and enable metastore
n3nash commented on issue #1679: URL: https://github.com/apache/hudi/issues/1679#issuecomment-862014466 Closing this ticket due to inactivity. There is a [PR](https://github.com/apache/hudi/pull/2879) open that will provide ways to disable JDBC.
[jira] [Created] (HUDI-2026) Add documentation for GlobalDeleteKeyGenerator
Nishith Agarwal created HUDI-2026:
Summary: Add documentation for GlobalDeleteKeyGenerator
Key: HUDI-2026
URL: https://issues.apache.org/jira/browse/HUDI-2026
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Nishith Agarwal
Assignee: sivabalan narayanan

[https://github.com/apache/hudi/issues/3008]

{code:java}
- should hard delete records from hudi table with hive sync *** FAILED *** (24 seconds, 49 milliseconds)
  Cause: java.lang.NoSuchMethodException: org.apache.hudi.keygen.GlobalDeleteKeyGenerator.<init>()
  [scalatest]   at java.lang.Class.getConstructor0(Class.java:3110)
  [scalatest]   at java.lang.Class.newInstance(Class.java:412)
  [scalatest]   at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:98)
  [scalatest]   at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:69)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:391)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:440)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:436)
  [scalatest]   at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:436)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:497)
  [scalatest]   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:222)
  [scalatest]   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
  [scalatest]   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  [scalatest]   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  [scalatest]   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  [scalatest]   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  [scalatest]   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  [scalatest]   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  [scalatest]   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  [scalatest]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  [scalatest]   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  [scalatest]   at org.apach
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] n3nash closed issue #3008: [SUPPORT] Hive Sync issues on deletes and non partitioned table
n3nash closed issue #3008: URL: https://github.com/apache/hudi/issues/3008
[GitHub] [hudi] n3nash commented on issue #3008: [SUPPORT] Hive Sync issues on deletes and non partitioned table
n3nash commented on issue #3008: URL: https://github.com/apache/hudi/issues/3008#issuecomment-862012567 @pranotishanbhag We will add the right documentation for the GlobalDeleteKeyGenerator. Can you please expand on what code changes are needed for the second issue?
[GitHub] [hudi] hudi-bot edited a comment on pull request #3086: [HUDI-1776] Support AlterCommand For Hoodie
hudi-bot edited a comment on pull request #3086:
URL: https://github.com/apache/hudi/pull/3086#issuecomment-861654333

## CI report:

* c041853f41119f23760388d1ab5c7173fe22936b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=198) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=208)
[GitHub] [hudi] n3nash commented on issue #3059: when java client api support MERGE_ON_READ ?
n3nash commented on issue #3059: URL: https://github.com/apache/hudi/issues/3059#issuecomment-862010731 @lppsuixn Gentle ping to respond to @leesf comment
[GitHub] [hudi] n3nash commented on issue #3063: Hive database not auto created when syncing
n3nash commented on issue #3063: URL: https://github.com/apache/hudi/issues/3063#issuecomment-862010383 Closing this ticket since the issue is resolved. Thanks @veenaypatil!
[GitHub] [hudi] n3nash closed issue #3063: Hive database not auto created when syncing
n3nash closed issue #3063: URL: https://github.com/apache/hudi/issues/3063
[GitHub] [hudi] n3nash commented on issue #3065: Not in Marker Dir occurs when I write to HDFS using Spark
n3nash commented on issue #3065: URL: https://github.com/apache/hudi/issues/3065#issuecomment-862009477 @wangfeigithub Are you trying to upgrade from a previous older version of Hudi, or are you directly writing new files using 0.8?
[GitHub] [hudi] n3nash commented on issue #3078: [SUPPORT] combineAndGetUpdateValue is not getting called when Schema evolution happens
n3nash commented on issue #3078: URL: https://github.com/apache/hudi/issues/3078#issuecomment-862008843 @tandonraghav To get the expected payload semantics during compaction, you need to override the `preCombine` implementation and keep it consistent with `combineAndGetUpdateValue`. The idea is that whether it is 2 records in memory, 2 records on disk, or 1 record in memory vs 1 on disk, we call either `preCombine` or `combineAndGetUpdateValue` depending on what can be constructed as a payload. Please try overriding `preCombine` and let me know if you still see problems. We might add documentation around this to make it clearer.
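The advice above — keep one shared merge rule behind both `preCombine` and `combineAndGetUpdateValue` — can be sketched with a stripped-down stand-in payload. This is plain Java, not the real `HoodieRecordPayload` interface; the class, field names, and simplified signatures are illustrative assumptions.

```java
import java.util.Optional;

// Toy model of a record payload: both combine paths delegate to one merge
// rule, so compaction (two log records) and upsert (stored vs incoming)
// resolve duplicates identically.
class LatestWinsPayload {
  final String value;       // stand-in for the Avro record
  final long orderingVal;   // stand-in for the precombine/ordering field

  LatestWinsPayload(String value, long orderingVal) {
    this.value = value;
    this.orderingVal = orderingVal;
  }

  // The single merge rule: the record with the higher ordering value wins.
  private LatestWinsPayload merge(LatestWinsPayload other) {
    return other.orderingVal >= this.orderingVal ? other : this;
  }

  // Invoked when two incoming records share a key (e.g. during compaction).
  LatestWinsPayload preCombine(LatestWinsPayload another) {
    return merge(another);
  }

  // Invoked when an incoming record meets the record already stored on disk.
  Optional<String> combineAndGetUpdateValue(LatestWinsPayload current) {
    return Optional.of(current.merge(this).value);
  }
}
```

Because both entry points funnel into `merge`, it does not matter which path Hudi takes for a given pair of duplicates — the surviving record is the same.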
[GitHub] [hudi] atharshah-ea edited a comment on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source
atharshah-ea edited a comment on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-862002125 Hi, also looking for an example of how to specify the DefaultHoodieRecordPayload. Setting the following option did not work for us: **'hoodie.datasource.write.payload.class': 'org.apache.hudi.DefaultHoodieRecordPayload'** Output: could not create payload for class: org.apache.hudi.default hoodie record payload @nsivabalan @vinothchandar
[GitHub] [hudi] atharshah-ea commented on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source
atharshah-ea commented on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-862002125 Hi, also looking for an example of how to specify the DefaultHoodieRecordPayload. Setting the following option did not work for us: **'hoodie.datasource.write.payload.class': 'org.apache.hudi.DefaultHoodieRecordPayload'** Output: could not create payload for class: org.apache.hudi.default hoodie record payload
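Editor's note on the error above: one likely cause (an assumption, not confirmed in this thread) is that the class name is missing its package; `DefaultHoodieRecordPayload` lives under `org.apache.hudi.common.model` in hudi-common, not directly under `org.apache.hudi`. A sketch of the options that would follow from that:

```properties
# Hedged suggestion: use the fully-qualified class name from hudi-common.
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
# DefaultHoodieRecordPayload compares records on an ordering field; "ts" here
# is an example placeholder for the precombine column in your dataset.
hoodie.payload.ordering.field=ts
```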
[jira] [Comment Edited] (HUDI-2025) Bring parity between row writer bulk_insert and rdd based bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364021#comment-17364021 ]

sivabalan narayanan edited comment on HUDI-2025 at 6/16/21, 2:48 AM:

Trying to check differences between both flows. Inspected metadata that gets attached to parquet written in both paths.

// listing just the keys in extra metadata.
sn$ grep "extra" /tmp/regular_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: parquet.avro.schema
extra: writer.model.name
extra: hoodie_max_record_key

sn$ grep "extra" /tmp/rowWriter_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.spark.version
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: org.apache.spark.sql.parquet.row.metadata
extra: hoodie_max_record_key

Sample parquet meta in both paths: [https://drive.google.com/drive/folders/1scb-OysFEhB6HBz7s50-mwcpLAccWdW5?usp=sharing]

was (Author: shivnarayan):
Trying to check differences between both flows. Inspected metadata that gets attached to parquet written in both paths.

// listing just the keys in extra metadata.
sn$ grep "extra" /tmp/regular_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: parquet.avro.schema
extra: writer.model.name
extra: hoodie_max_record_key

sn$ grep "extra" /tmp/rowWriter_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.spark.version
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: org.apache.spark.sql.parquet.row.metadata
extra: hoodie_max_record_key

> Bring parity between row writer bulk_insert and rdd based bulk_insert
> ---------------------------------------------------------------------
>
> Key: HUDI-2025
> URL: https://issues.apache.org/jira/browse/HUDI-2025
> Project: Apache Hudi
> Issue Type: Task
> Reporter: sivabalan narayanan
> Priority: Major
>
> Bring parity between row writer bulk_insert and rdd based bulk_insert
[jira] [Commented] (HUDI-2025) Bring parity between row writer bulk_insert and rdd based bulk_insert
[ https://issues.apache.org/jira/browse/HUDI-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364021#comment-17364021 ]

sivabalan narayanan commented on HUDI-2025:

Trying to check differences between both flows. Inspected metadata that gets attached to parquet written in both paths.

// listing just the keys in extra metadata.
sn$ grep "extra" /tmp/regular_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: parquet.avro.schema
extra: writer.model.name
extra: hoodie_max_record_key

sn$ grep "extra" /tmp/rowWriter_bulk_insert_meta.out | cut -d"=" -f1
extra: org.apache.spark.version
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: org.apache.spark.sql.parquet.row.metadata
extra: hoodie_max_record_key

> Bring parity between row writer bulk_insert and rdd based bulk_insert
> ---------------------------------------------------------------------
>
> Key: HUDI-2025
> URL: https://issues.apache.org/jira/browse/HUDI-2025
> Project: Apache Hudi
> Issue Type: Task
> Reporter: sivabalan narayanan
> Priority: Major
>
> Bring parity between row writer bulk_insert and rdd based bulk_insert
[jira] [Created] (HUDI-2025) Bring parity between row writer bulk_insert and rdd based bulk_insert
sivabalan narayanan created HUDI-2025:
Summary: Bring parity between row writer bulk_insert and rdd based bulk_insert
Key: HUDI-2025
URL: https://issues.apache.org/jira/browse/HUDI-2025
Project: Apache Hudi
Issue Type: Task
Reporter: sivabalan narayanan

Bring parity between row writer bulk_insert and rdd based bulk_insert
[GitHub] [hudi] hudi-bot edited a comment on pull request #3086: [HUDI-1776] Support AlterCommand For Hoodie
hudi-bot edited a comment on pull request #3086:
URL: https://github.com/apache/hudi/pull/3086#issuecomment-861654333

## CI report:

* c041853f41119f23760388d1ab5c7173fe22936b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=198) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=208)
[GitHub] [hudi] xushiyan commented on pull request #3086: [HUDI-1776] Support AlterCommand For Hoodie
xushiyan commented on pull request #3086: URL: https://github.com/apache/hudi/pull/3086#issuecomment-861991019 @hudi-bot run azure
[GitHub] [hudi] yuzhaojing commented on pull request #3085: [HUDI-2019] Update writeConfig in every task
yuzhaojing commented on pull request #3085: URL: https://github.com/apache/hudi/pull/3085#issuecomment-861981444 > Also add a test case to indicate that the config is overridden when embedded server is reused. OK, I will add a test case.
[GitHub] [hudi] danny0405 commented on pull request #3085: [HUDI-2019] Update writeConfig in every task
danny0405 commented on pull request #3085: URL: https://github.com/apache/hudi/pull/3085#issuecomment-861980618 Also add a test case to indicate that the config is overridden when embedded server is reused.
[jira] [Closed] (HUDI-2022) Release writer for append handle #close
[ https://issues.apache.org/jira/browse/HUDI-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang closed HUDI-2022.
Fix Version/s: 0.9.0
Resolution: Done

61efc6af79c389ef0a77cda75e4f562ed59ef86b

> Release writer for append handle #close
> ---------------------------------------
>
> Key: HUDI-2022
> URL: https://issues.apache.org/jira/browse/HUDI-2022
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: yuzhaojing
> Assignee: yuzhaojing
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
> The writer can be released eagerly to save the memory footprint.
[GitHub] [hudi] yanghua merged pull request #3087: [HUDI-2022] Release writer for append handle #close
yanghua merged pull request #3087: URL: https://github.com/apache/hudi/pull/3087
[hudi] branch master updated: [HUDI-2022] Release writer for append handle #close (#3087)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
new 61efc6a [HUDI-2022] Release writer for append handle #close (#3087)

61efc6a is described below

commit 61efc6af79c389ef0a77cda75e4f562ed59ef86b
Author: yuzhaojing <32435329+yuzhaoj...@users.noreply.github.com>
AuthorDate: Wed Jun 16 09:18:38 2021 +0800

[HUDI-2022] Release writer for append handle #close (#3087)

Co-authored-by: 喻兆靖

---
 .../src/main/java/org/apache/hudi/io/HoodieAppendHandle.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
index 8ee4b46..64de066 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -389,6 +389,7 @@ public class HoodieAppendHandle extends
     recordItr = null;
     if (writer != null) {
       writer.close();
+      writer = null;
       // update final size, once for all log files
       for (WriteStatus status : statuses) {
[hudi] branch master updated (b8fe5b9 -> 910fe48)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

from b8fe5b9 [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
 add 910fe48 [MINOR] Rename broken codecov file (#3088)

No new revisions were added by this update.

Summary of changes:
 .codecov.yml => .codecov.yml.broken | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename .codecov.yml => .codecov.yml.broken (100%)
[GitHub] [hudi] vinothchandar merged pull request #3088: [MINOR] Rename broken codecov file
vinothchandar merged pull request #3088: URL: https://github.com/apache/hudi/pull/3088
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3088: [MINOR] Rename broken codecov file
codecov-commenter edited a comment on pull request #3088:
URL: https://github.com/apache/hudi/pull/3088#issuecomment-861892572

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report

> Merging [#3088](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (e168476) into [master](https://codecov.io/gh/apache/hudi/commit/b8fe5b91d599418cd908d833fd63edc7f362c548?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b8fe5b9) will **increase** coverage by `4.87%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3088/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3088      +/-   ##
============================================
+ Coverage     49.29%   54.17%   +4.87%
- Complexity     3726     4071     +345
============================================
  Files           530      530
  Lines         26053    26053
  Branches       2986     2986
============================================
+ Hits          12844    14115    +1271
+ Misses        11929    10534    -1395
- Partials       1280     1404     +124
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `48.18% <ø> (-0.02%)` | :arrow_down: |
| hudiflink | `60.73% <ø> (ø)` | |
| hudihadoopmr | `51.34% <ø> (ø)` | |
| hudisparkdatasource | `66.47% <ø> (ø)` | |
| hudisync | `51.45% <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | |
| hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |

Flags with carried forward coverage won't be shown.
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | :arrow_down: | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `88.79% <0.00%> (+5.17%)` | :arrow_up: | | [...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==) | `100.00% <0.00%> (+11.11%)` | :arrow_up: | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.53% <0.00%> (+30.23%)` | :arrow_up: | | 
[...ties/deltastreamer/HoodieDeltaStreamerMetrics.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lck1ldHJpY3MuamF2YQ==) | `36.11% <0.00%> (+36.11%)` | :arrow_up: | |
[GitHub] [hudi] danny0405 commented on pull request #3085: [HUDI-2019] Update writeConfig in every task
danny0405 commented on pull request #3085: URL: https://github.com/apache/hudi/pull/3085#issuecomment-861919796 Nice catch, can we make the commit message more clear: Set up the file system view storage config for singleton embedded server write config every time
[GitHub] [hudi] danny0405 commented on a change in pull request #3025: [HUDI-1955]Fix the filter condition is missing in the judgment condition of comp…
danny0405 commented on a change in pull request #3025:
URL: https://github.com/apache/hudi/pull/3025#discussion_r652258797

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/BaseScheduleCompactionActionExecutor.java

## @@ -63,7 +63,7 @@ public BaseScheduleCompactionActionExecutor(HoodieEngineContext context,
         + ", Compaction scheduled at " + instantTime));
     // Committed and pending compaction instants should have strictly lower timestamps
     List<HoodieInstant> conflictingInstants = table.getActiveTimeline()
-        .getWriteTimeline().getInstants()
+        .getWriteTimeline().filterCompletedAndCompactionInstants().getInstants()

Review comment: Sure, let us add a test case.
[GitHub] [hudi] hudi-bot edited a comment on pull request #3088: [MINOR] Rename broken codecov file
hudi-bot edited a comment on pull request #3088:
URL: https://github.com/apache/hudi/pull/3088#issuecomment-861890266

## CI report:

* e168476083aaea02220d5c3502d2ca20840cd236 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=206) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205)
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3088: [MINOR] Rename broken codecov file
codecov-commenter edited a comment on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861892572
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3088: [MINOR] Rename broken codecov file
codecov-commenter edited a comment on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861892572

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3088](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (e168476) into [master](https://codecov.io/gh/apache/hudi/commit/b8fe5b91d599418cd908d833fd63edc7f362c548?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b8fe5b9) will **increase** coverage by `44.20%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3088/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@              Coverage Diff              @@
##           master    #3088       +/-   ##
=============================================
+ Coverage     8.43%   52.63%    +44.20%
- Complexity      62      407       +345
=============================================
  Files           70       70
  Lines         2880     2880
  Branches       359      359
=============================================
+ Hits           243     1516      +1273
+ Misses        2616     1220      -1396
- Partials        21      144       +123
```

| Flag | Coverage Δ | |
|---|---|---|
| hudisync | `6.79% <ø> (ø)` | |
| hudiutilities | `71.01% <ø> (+61.91%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `88.79% <0.00%> (+5.17%)` | :arrow_up: |
| [...e/hudi/utilities/transform/ChainedTransformer.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9DaGFpbmVkVHJhbnNmb3JtZXIuamF2YQ==) | `100.00% <0.00%> (+11.11%)` | :arrow_up: |
| [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.53% <0.00%> (+30.23%)` | :arrow_up: |
| [...ties/deltastreamer/HoodieDeltaStreamerMetrics.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lck1ldHJpY3MuamF2YQ==) | `36.11% <0.00%> (+36.11%)` | :arrow_up: |
| [...g/apache/hudi/utilities/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlci5qYXZh) | `100.00% <0.00%> (+42.85%)` | :arrow_up: |
| [...ities/checkpointing/InitialCheckPointProvider.java](https://codecov.io/gh/apache/hudi/pull/3088/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvSW5pdGlhbENoZWNrUG9pbnRQcm92aWRlci5qYXZh) | `45.45% <0.00%> (+45.45%)` | :arrow_up: |
[GitHub] [hudi] hudi-bot edited a comment on pull request #3088: [MINOR] Rename broken codecov file
hudi-bot edited a comment on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861890266 ## CI report: * e168476083aaea02220d5c3502d2ca20840cd236 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=206) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205)
[GitHub] [hudi] codecov-commenter commented on pull request #3088: [MINOR] Rename broken codecov file
codecov-commenter commented on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861892572

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) Report
> Merging [#3088](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (e168476) into [master](https://codecov.io/gh/apache/hudi/commit/b8fe5b91d599418cd908d833fd63edc7f362c548?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation) (b8fe5b9) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3088/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3088?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)

```diff
@@           Coverage Diff           @@
##           master    #3088   +/-   ##
=======================================
  Coverage     8.43%    8.43%
  Complexity      62       62
  Files           70       70
  Lines         2880     2880
  Branches       359      359
  Hits           243      243
  Misses        2616     2616
  Partials        21       21
```

| Flag | Coverage Δ | |
|---|---|---|
| hudisync | `6.79% <ø> (ø)` | |
| hudiutilities | `9.09% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
[GitHub] [hudi] vinothchandar commented on pull request #3088: [MINOR] Rename broken codecov file
vinothchandar commented on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861890684 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #3088: [MINOR] Rename broken codecov file
hudi-bot commented on pull request #3088: URL: https://github.com/apache/hudi/pull/3088#issuecomment-861890266 ## CI report: * e168476083aaea02220d5c3502d2ca20840cd236 UNKNOWN
[GitHub] [hudi] vinothchandar opened a new pull request #3088: [MINOR] Rename broken codecov file
vinothchandar opened a new pull request #3088: URL: https://github.com/apache/hudi/pull/3088 - Stop polluting PRs with wrong coverage info - Retaining the file, so someone can try digging in ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Commented] (HUDI-765) Implement OrcReaderIterator
[ https://issues.apache.org/jira/browse/HUDI-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363932#comment-17363932 ] Jintao commented on HUDI-765: - This PR#2999 has been merged. We can close this ticket. > Implement OrcReaderIterator > --- > > Key: HUDI-765 > URL: https://issues.apache.org/jira/browse/HUDI-765 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: lamber-ken >Assignee: Teresa Kang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HUDI-764) Implement HoodieOrcWriter
[ https://issues.apache.org/jira/browse/HUDI-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363931#comment-17363931 ] Jintao edited comment on HUDI-764 at 6/15/21, 10:28 PM: The PR#2999 has been merged. We can close this ticket. was (Author: guan): The PR#2999 has been landed. We can close this ticket. > Implement HoodieOrcWriter > - > > Key: HUDI-764 > URL: https://issues.apache.org/jira/browse/HUDI-764 > Project: Apache Hudi > Issue Type: Sub-task > Components: Storage Management >Reporter: lamber-ken >Assignee: Teresa Kang >Priority: Critical > Labels: pull-request-available > > Implement HoodieOrcWriter > * Avro to ORC schema > * Write record in row
[jira] [Commented] (HUDI-764) Implement HoodieOrcWriter
[ https://issues.apache.org/jira/browse/HUDI-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363931#comment-17363931 ] Jintao commented on HUDI-764: - The PR#2999 has been landed. We can close this ticket. > Implement HoodieOrcWriter > - > > Key: HUDI-764 > URL: https://issues.apache.org/jira/browse/HUDI-764 > Project: Apache Hudi > Issue Type: Sub-task > Components: Storage Management >Reporter: lamber-ken >Assignee: Teresa Kang >Priority: Critical > Labels: pull-request-available > > Implement HoodieOrcWriter > * Avro to ORC schema > * Write record in row
[hudi] branch master updated: [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new b8fe5b9  [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)

b8fe5b9 is described below

commit b8fe5b91d599418cd908d833fd63edc7f362c548
Author: Jintao Guan
AuthorDate: Tue Jun 15 15:21:43 2021 -0700

    [HUDI-764] [HUDI-765] ORC reader writer Implementation (#2999)

    Co-authored-by: Qingyun (Teresa) Kang
---
 LICENSE                                            |  12 +
 NOTICE                                             |  12 +
 .../apache/hudi/config/HoodieStorageConfig.java    |  42 ++
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  17 +
 .../apache/hudi/io/storage/HoodieFileWriter.java   |  10 +
 .../hudi/io/storage/HoodieFileWriterFactory.java   |  13 +
 .../apache/hudi/io/storage/HoodieHFileWriter.java  |  10 +-
 .../apache/hudi/io/storage/HoodieOrcConfig.java    |  72 ++
 .../apache/hudi/io/storage/HoodieOrcWriter.java    | 172 +
 .../hudi/io/storage/HoodieParquetWriter.java       |   9 +-
 .../java/org/apache/hudi/table/HoodieTable.java    |   1 +
 .../hudi/io/storage/TestHoodieOrcReaderWriter.java | 261 +++
 .../src/test/resources/exampleSchemaWithUDT.avsc   |  67 ++
 .../io/storage/TestHoodieFileWriterFactory.java    |   7 +
 hudi-common/pom.xml                                |   8 +
 .../apache/hudi/common/model/HoodieFileFormat.java |   3 +-
 .../org/apache/hudi/common/util/AvroOrcUtils.java  | 799 +
 .../org/apache/hudi/common/util/BaseFileUtils.java | 133 +++-
 .../apache/hudi/common/util/OrcReaderIterator.java | 118 +++
 .../java/org/apache/hudi/common/util/OrcUtils.java | 235 ++
 .../org/apache/hudi/common/util/ParquetUtils.java  |  60 +-
 .../hudi/io/storage/HoodieFileReaderFactory.java   |   8 +
 .../apache/hudi/io/storage/HoodieOrcReader.java    |  91 +++
 .../apache/hudi/common/util/TestAvroOrcUtils.java  |  76 ++
 .../hudi/common/util/TestOrcReaderIterator.java    |  92 +++
 .../io/storage/TestHoodieFileReaderFactory.java    |   7 +-
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java  |   9 +
 .../main/scala/org/apache/hudi/DefaultSource.scala |  13 +-
 pom.xml                                            |   2 +
 29 files changed, 2268 insertions(+), 91 deletions(-)

```diff
diff --git a/LICENSE b/LICENSE
index 385191d..28222a7 100644
--- a/LICENSE
+++ b/LICENSE
@@ -333,3 +333,15 @@ Copyright (c) 2005, European Commission project OneLab under contract 034819 (ht
  Home page: https://commons.apache.org/proper/commons-lang/
  License: http://www.apache.org/licenses/LICENSE-2.0
+
+ ---
+
+ This product includes code from StreamSets Data Collector
+
+ * com.streamsets.pipeline.lib.util.avroorc.AvroToOrcRecordConverter copied and modified to org.apache.hudi.common.util.AvroOrcUtils
+ * com.streamsets.pipeline.lib.util.avroorc.AvroToOrcSchemaConverter copied and modified to org.apache.hudi.common.util.AvroOrcUtils
+
+ Copyright 2018 StreamSets Inc.
+
+ Home page: https://github.com/streamsets/datacollector-oss
+ License: http://www.apache.org/licenses/LICENSE-2.0
diff --git a/NOTICE b/NOTICE
index 2f1aee6..9b24933 100644
--- a/NOTICE
+++ b/NOTICE
@@ -147,3 +147,15 @@ its NOTICE file:
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
+
+
+
+This product includes code from StreamSets Data Collector, which includes the following in
+its NOTICE file:
+
+  StreamSets datacollector-oss
+  Copyright 2018 StreamSets Inc.
+
+  This product includes software developed at
+  StreamSets (http://www.streamsets.com/).
+
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
index 50b45f3..3cd8817 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java
@@ -39,10 +39,21 @@ public class HoodieStorageConfig extends DefaultHoodieConfig {
   public static final String DEFAULT_PARQUET_BLOCK_SIZE_BYTES = DEFAULT_PARQUET_FILE_MAX_BYTES;
   public static final String PARQUET_PAGE_SIZE_BYTES = "hoodie.parquet.page.size";
   public static final String DEFAULT_PARQUET_PAGE_SIZE_BYTES = String.valueOf(1 * 1024 * 1024);
+  public static final String HFILE_FILE_MAX_BYTES = "hoodie.hfile.max.file.size";
   public static final String HFILE_BLOCK_SIZE_BYTES = "hoodie.hfile.block.size";
   public static final String DEFAULT_HFILE_BLOCK_SIZE_BYTES = String.valueOf(1 * 1024 * 1024);
   public static
```
[GitHub] [hudi] prashantwason merged pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
prashantwason merged pull request #2999: URL: https://github.com/apache/hudi/pull/2999
[jira] [Commented] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x
[ https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363911#comment-17363911 ] Nishith Agarwal commented on HUDI-1975:

[~vinaypatil18] I think there are two options:
1. Shade the dropwizard inside Hudi to let Hudi use 4.1.x
2. Downgrade to 3.1.x and make changes for the workaround

To be able to answer this, can you dig into whether shading will help? (Does the prometheus package bring its own dropwizard, or is the environment expected to provide it?) Secondly, can you dig up when the 4.x upgrade was done and what the reason for it was. We can take a call then.

> Upgrade java-prometheus-client from 3.1.2 to 4.x
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Nishith Agarwal
> Priority: Blocker
> Fix For: 0.9.0
>
> Find more details here -> https://github.com/apache/hudi/issues/2774
[jira] [Updated] (HUDI-1413) Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync
[ https://issues.apache.org/jira/browse/HUDI-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1413: -- Labels: sev:critical (was: ) > Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync > -- > > Key: HUDI-1413 > URL: https://issues.apache.org/jira/browse/HUDI-1413 > Project: Apache Hudi > Issue Type: New Feature > Components: Usability >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Priority: Major > Labels: sev:critical > Fix For: 0.9.0 > > > GH issue : https://github.com/apache/hudi/issues/2270
[jira] [Assigned] (HUDI-1413) Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync
[ https://issues.apache.org/jira/browse/HUDI-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1413: - Assignee: sivabalan narayanan > Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync > -- > > Key: HUDI-1413 > URL: https://issues.apache.org/jira/browse/HUDI-1413 > Project: Apache Hudi > Issue Type: New Feature > Components: Usability >Affects Versions: 0.9.0 >Reporter: Balaji Varadarajan >Assignee: sivabalan narayanan >Priority: Major > Labels: sev:critical > Fix For: 0.9.0 > > > GH issue : https://github.com/apache/hudi/issues/2270
[GitHub] [hudi] prashantwason commented on a change in pull request #3079: [HUDI-2013] Removed option to fallback to file listing when Metadata Table is enabled.
prashantwason commented on a change in pull request #3079: URL: https://github.com/apache/hudi/pull/3079#discussion_r652145709

## File path: hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java

```diff
@@ -101,11 +101,7 @@ protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataCon
     try {
       return fetchAllPartitionPaths();
     } catch (Exception e) {
-      if (metadataConfig.enableFallback()) {
-        LOG.error("Failed to retrieve list of partition from metadata", e);
-      } else {
-        throw new HoodieMetadataException("Failed to retrieve list of partition from metadata", e);
-      }
+      throw new HoodieMetadataException("Failed to retrieve list of partition from metadata", e);
```

Review comment: To make lookups from metadata table fail fast. When fallback was enabled, any errors in lookups from metadata table are logged and file-listing from file system is used instead.
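The fail-fast vs. fallback trade-off discussed in the review can be sketched generically. The class below is a hypothetical illustration (not Hudi's `BaseTableMetadata`): with fallback, a metadata-table lookup failure is swallowed and a slower listing path is used; without it, the failure surfaces immediately so metadata-table bugs are not masked.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the metadata-vs-listing partition lookup,
// illustrating the behavior change in the reviewed diff.
class PartitionLookup {
    static List<String> fetch(Supplier<List<String>> metadataLookup,
                              Supplier<List<String>> fileListing,
                              boolean fallbackEnabled) {
        try {
            return metadataLookup.get();
        } catch (Exception e) {
            if (fallbackEnabled) {
                // Old behavior: log-and-fall-back, hiding metadata errors
                // behind a (correct but slower) file-system listing.
                return fileListing.get();
            }
            // New behavior: fail fast and surface the metadata error.
            throw new RuntimeException(
                    "Failed to retrieve list of partition from metadata", e);
        }
    }
}
```

With fallback enabled, a failing metadata lookup silently returns the listing result; with it disabled, the same failure propagates as an exception to the caller.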
[GitHub] [hudi] hudi-bot edited a comment on pull request #2906: [HUDI-393] remove travis
hudi-bot edited a comment on pull request #2906: URL: https://github.com/apache/hudi/pull/2906#issuecomment-830929050 ## CI report: * e96f2c03ab0f4fd6deb6803479fa6624eb21ed73 UNKNOWN * 0985b9b4a64b8015257eae8d85dfd899acf7a910 UNKNOWN * 7d6525d83a0fb964feb121e44fc617342778aff5 UNKNOWN * 9e3ef96320c2f6555da6c06264315468654ac520 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=200)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3086: [HUDI-1776] Support AlterCommand For Hoodie
hudi-bot edited a comment on pull request #3086: URL: https://github.com/apache/hudi/pull/3086#issuecomment-861654333 ## CI report: * c041853f41119f23760388d1ab5c7173fe22936b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=198)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3085: [HUDI-2019] Update writeConfig in every task
hudi-bot edited a comment on pull request #3085: URL: https://github.com/apache/hudi/pull/3085#issuecomment-861654289 ## CI report: * 12391dcc275f51727c7977dee297c2f551fbee05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=197)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3084: [HUDI-2017] Add API to set a metric in the registry.
hudi-bot edited a comment on pull request #3084: URL: https://github.com/apache/hudi/pull/3084#issuecomment-861654242 ## CI report: * 0f912497450d109667dc36d6f79b8feea2b83203 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=196)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3083: [HUDI-2016] Fixed bootstrap of Metadata Table when some actions are in progress.
hudi-bot edited a comment on pull request #3083: URL: https://github.com/apache/hudi/pull/3083#issuecomment-861654193 ## CI report: * ed51501fde0714172ba786b0d882d9c115f4271e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=195)
[jira] [Commented] (HUDI-2023) Validate Schema evolution in hudi
[ https://issues.apache.org/jira/browse/HUDI-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363845#comment-17363845 ] sivabalan narayanan commented on HUDI-2023: --- dump of steps : https://gist.github.com/nsivabalan/33147072fabf5afa9cf2dfee1734e57a > Validate Schema evolution in hudi > - > > Key: HUDI-2023 > URL: https://issues.apache.org/jira/browse/HUDI-2023 > Project: Apache Hudi > Issue Type: Test > Components: Testing >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > > Test schema evolution in hudi and document the same
[GitHub] [hudi] hudi-bot edited a comment on pull request #3076: [HUDI-2008] Add an annotation to suppress the compiler warnings
hudi-bot edited a comment on pull request #3076: URL: https://github.com/apache/hudi/pull/3076#issuecomment-861653989 ## CI report: * 59ed952fd6c4145121c47e1ea75b14150be0b194 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=191)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3081: [HUDI-2014] Support flink hive sync in batch mode
hudi-bot edited a comment on pull request #3081: URL: https://github.com/apache/hudi/pull/3081#issuecomment-861654094 ## CI report: * 8432465ad2fbc979bca85f99afdfcd3f89aa1147 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=193)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3082: [HUDI-1717] Metadata Reader should merge all the un-synced but complete instants from the dataset timeline.
hudi-bot edited a comment on pull request #3082: URL: https://github.com/apache/hudi/pull/3082#issuecomment-861654149 ## CI report: * d1051893a988c176f36aae88ebedbed43ee9619d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=194)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3079: [HUDI-2013] Removed option to fallback to file listing when Metadata Table is enabled.
hudi-bot edited a comment on pull request #3079: URL: https://github.com/apache/hudi/pull/3079#issuecomment-861654035 ## CI report: * ea8a509b21a003b77f17703fad20ab2d9015087f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=192)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3073: [HUDI-2006] Adding more yaml templates to test suite
hudi-bot edited a comment on pull request #3073: URL: https://github.com/apache/hudi/pull/3073#issuecomment-861653904 ## CI report: * e039f490bf755cc7b29a1446aa8156dfd3adc9c4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=189)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3074: [HUDI-2007] Fixing hudi_test_suite for spark nodes and adding spark bulk_insert node
hudi-bot edited a comment on pull request #3074: URL: https://github.com/apache/hudi/pull/3074#issuecomment-861653947 ## CI report: * 2e782acd2f3d8f79d2014875cddf69a62f736a54 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=190)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3050: [HUDI-1988] FinalizeWrite() been executed twice in AbstractHoodieWrit…
hudi-bot edited a comment on pull request #3050: URL: https://github.com/apache/hudi/pull/3050#issuecomment-861653843 ## CI report: * 1f85ab34292c5d57503581f120f04b0472beed1d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=188)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
hudi-bot edited a comment on pull request #2999:
URL: https://github.com/apache/hudi/pull/2999#issuecomment-861684975

## CI report:

* cc731794bc22494488d94eb4e951105125ab5b97 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=203)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
hudi-bot edited a comment on pull request #2999:
URL: https://github.com/apache/hudi/pull/2999#issuecomment-861684975

## CI report:

* cc731794bc22494488d94eb4e951105125ab5b97 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=203)
[GitHub] [hudi] hudi-bot commented on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
hudi-bot commented on pull request #2999:
URL: https://github.com/apache/hudi/pull/2999#issuecomment-861684975

## CI report:

* cc731794bc22494488d94eb4e951105125ab5b97 UNKNOWN
[GitHub] [hudi] jintaoguan commented on pull request #2999: [HUDI-764] [HUDI-765] ORC reader writer Implementation
jintaoguan commented on pull request #2999:
URL: https://github.com/apache/hudi/pull/2999#issuecomment-861683763

@leesf I added a new subtask in [HUDI-57](https://issues.apache.org/jira/browse/HUDI-57) about adding a new integration test for the ORC reader/writer. We will merge this PR first and work on the integration tests later.
[jira] [Created] (HUDI-2024) Add integration tests for ORC reader/writer
Jintao created HUDI-2024:

Summary: Add integration tests for ORC reader/writer
Key: HUDI-2024
URL: https://issues.apache.org/jira/browse/HUDI-2024
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Jintao

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2206: [HUDI-1105] Adding dedup support for Bulk Insert w/ Rows
hudi-bot edited a comment on pull request #2206:
URL: https://github.com/apache/hudi/pull/2206#issuecomment-861663727

## CI report:

* 031b8fa2a947f69815bce9fa181dc98dd972d07e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=201) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=202)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2819: [HUDI-1794] Moved static COMMIT_FORMATTER to thread local variable as SimpleDateFormat is not thread safe.
hudi-bot edited a comment on pull request #2819:
URL: https://github.com/apache/hudi/pull/2819#issuecomment-861653496

## CI report:

* 199e377c5de50a8bce5b0af4f70d8090d714 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=183)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2915: [HUDI-251] Adds JDBC source support for DeltaStreamer
hudi-bot edited a comment on pull request #2915:
URL: https://github.com/apache/hudi/pull/2915#issuecomment-861653740

## CI report:

* 877103f83dc9ea2ed3d8bffecd0d740c3dfc391a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=186)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3087: [HUDI-2022] Release writer for append handle #close
hudi-bot edited a comment on pull request #3087:
URL: https://github.com/apache/hudi/pull/3087#issuecomment-861654382

## CI report:

* 6711f3ecada0ce60bd4bdda130a18b2984ec71e6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=199)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2206: [HUDI-1105] Adding dedup support for Bulk Insert w/ Rows
hudi-bot edited a comment on pull request #2206:
URL: https://github.com/apache/hudi/pull/2206#issuecomment-861663727

## CI report:

* 031b8fa2a947f69815bce9fa181dc98dd972d07e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=201) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=202)
[GitHub] [hudi] vinothchandar commented on pull request #2206: [HUDI-1105] Adding dedup support for Bulk Insert w/ Rows
vinothchandar commented on pull request #2206:
URL: https://github.com/apache/hudi/pull/2206#issuecomment-861665899

Let's give this a shot.

@hudi-bot run azure
[GitHub] [hudi] hudi-bot edited a comment on pull request #2879: [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy…
hudi-bot edited a comment on pull request #2879:
URL: https://github.com/apache/hudi/pull/2879#issuecomment-861653620

## CI report:

* 11e1ac37a95ef12de1d0f3ada11e5fccc57960e2 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=185)