Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]
nsivabalan commented on PR #10073: URL: https://github.com/apache/hudi/pull/10073#issuecomment-1822259003 testHoodieAsyncClusteringJobWithScheduleAndExecute{String, HoodieRecordType}[1] is known to be flaky. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
majian1998 closed pull request #10120: [HUDI-7110] Add call procedure for show column stats information URL: https://github.com/apache/hudi/pull/10120
Re: [PR] [HUDI-7133] Improve dbt example for better guidance [hudi]
hudi-bot commented on PR #10155: URL: https://github.com/apache/hudi/pull/10155#issuecomment-1822243889 ## CI report: * af912a3fed7270708fad935b7df55fb508cd5536 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21090) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822243792 ## CI report: * 7a678c8f26e7b94fca3812d29e9ddca59b083127 UNKNOWN
Re: [PR] [HUDI-7123] Improve CI scripts [hudi]
hudi-bot commented on PR #10136: URL: https://github.com/apache/hudi/pull/10136#issuecomment-1822243691 ## CI report: * dd3d933d329208fcdf9c00ed2dcb12a7e22cce26 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20982) * 0aede30a2da391f50f6d750b102901e811b35880 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21089)
Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]
hudi-bot commented on PR #10122: URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822243625 ## CI report: * bbf765005b4e1e92730e0dff736bcb561d928b7b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21081)
Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]
hudi-bot commented on PR #10097: URL: https://github.com/apache/hudi/pull/10097#issuecomment-1822243447 ## CI report: * 43400ce2317882c76a68eb3a855c9dd814c92234 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21077)
Re: [PR] [HUDI-7133] Improve dbt example for better guidance [hudi]
hudi-bot commented on PR #10155: URL: https://github.com/apache/hudi/pull/10155#issuecomment-1822235911 ## CI report: * af912a3fed7270708fad935b7df55fb508cd5536 UNKNOWN
Re: [PR] [HUDI-7123] Improve CI scripts [hudi]
hudi-bot commented on PR #10136: URL: https://github.com/apache/hudi/pull/10136#issuecomment-1822235518 ## CI report: * dd3d933d329208fcdf9c00ed2dcb12a7e22cce26 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20982) * 0aede30a2da391f50f6d750b102901e811b35880 UNKNOWN
Re: [I] [SUPPORT] hudi-examples-dbt not running with spark thrift server [hudi]
xushiyan commented on issue #6125: URL: https://github.com/apache/hudi/issues/6125#issuecomment-181787 @sambhav13 I'm updating the instructions in the dbt example (using spark 3.2 and hudi 0.14.0). Please check this out and let us know if it helps. https://github.com/apache/hudi/blob/af912a3fed7270708fad935b7df55fb508cd5536/hudi-examples/hudi-examples-dbt/README.md
[jira] [Updated] (HUDI-7133) Improve dbt example for better guidance
[ https://issues.apache.org/jira/browse/HUDI-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7133: - Labels: pull-request-available (was: ) > Improve dbt example for better guidance > --- > > Key: HUDI-7133 > URL: https://issues.apache.org/jira/browse/HUDI-7133 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7133] Improve dbt example for better guidance [hudi]
xushiyan opened a new pull request, #10155: URL: https://github.com/apache/hudi/pull/10155 ### Change Logs Update dbt example with more detailed instructions. ### Impact Improve dbt example for learning. ### Risk level None. ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-7133) Improve dbt example for better guidance
Raymond Xu created HUDI-7133: Summary: Improve dbt example for better guidance Key: HUDI-7133 URL: https://issues.apache.org/jira/browse/HUDI-7133 Project: Apache Hudi Issue Type: Improvement Components: docs Reporter: Raymond Xu Assignee: Raymond Xu Fix For: 0.14.1
[jira] [Closed] (HUDI-7096) Improve Incr Query for partitions touched based on start and end
[ https://issues.apache.org/jira/browse/HUDI-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-7096. - Fix Version/s: 0.14.1 Resolution: Fixed > Improve Incr Query for partitions touched based on start and end > > > Key: HUDI-7096 > URL: https://issues.apache.org/jira/browse/HUDI-7096 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > > We could improve incremental queries by fetching the partitions directly from the commit metadata of the commits between start and end, thus avoiding polling the metadata table or doing file-system listing to fetch partitions in FileIndex.
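The HUDI-7096 idea can be illustrated with a small Python model: rather than listing the file system or polling the metadata table, take the union of the partitions recorded in the metadata of each commit between the start and end instants. This is a minimal sketch, not Hudi's `BaseHoodieTableFileIndex` code; `CommitMetadata` and `partitions_touched` are hypothetical names, and the exclusive-start, inclusive-end range is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CommitMetadata:
    """Hypothetical stand-in for the metadata of one completed commit."""
    # Assumed fixed-width timestamps, so string order matches time order.
    instant_time: str
    partitions: Set[str] = field(default_factory=set)


def partitions_touched(commits: List[CommitMetadata], start: str, end: str) -> Set[str]:
    """Union of partitions written by commits with start < instant_time <= end.

    No file-system listing or metadata-table lookup is needed: the partition
    paths come straight from each commit's metadata.
    """
    touched: Set[str] = set()
    for commit in commits:
        if start < commit.instant_time <= end:
            touched |= commit.partitions
    return touched
```

For example, with commits at `20231120100000000`, `20231121100000000` and `20231122100000000`, asking for the range from the first to the second instant returns only the partitions written by the second commit.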
Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]
hudi-bot commented on PR #10014: URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822191008 ## CI report: * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070) * 32469364d39eebf69fb001955aa2cccdfc772f1c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21087)
(hudi) branch master updated (2522f6de6f1 -> c5af85dfd91)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 2522f6de6f1 [HUDI-7128] DeleteMarkerProcedures support delete in batch mode (#10148) add c5af85dfd91 [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata (#10098) No new revisions were added by this update. Summary of changes: .../hudi/client/BaseHoodieTableServiceClient.java | 4 +++ .../org/apache/hudi/BaseHoodieTableFileIndex.java | 30 -- .../hudi/common/testutils/HoodieTestUtils.java | 10 +++- .../hudi/hadoop/HiveHoodieTableFileIndex.java | 4 ++- .../scala/org/apache/hudi/HoodieFileIndex.scala| 4 ++- .../apache/hudi/SparkHoodieTableFileIndex.scala| 8 -- .../hudi/TestHoodieMergeHandleWithSparkMerger.java | 6 - .../org/apache/hudi/functional/TestBootstrap.java | 7 ++--- 8 files changed, 62 insertions(+), 11 deletions(-)
Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]
nsivabalan merged PR #10098: URL: https://github.com/apache/hudi/pull/10098
Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]
hudi-bot commented on PR #10152: URL: https://github.com/apache/hudi/pull/10152#issuecomment-1822183860 ## CI report: * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN * b1748e270c379c479bb3286e635482d204b853c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071) * 6407bc6c69ed9e43b970dbd4a7d5a441ebe45150 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21086)
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
hudi-bot commented on PR #10120: URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822183724 ## CI report: * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056) * 946ec17878c5110741fa3b1e3bbff4fc804d77ed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21085)
Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]
hudi-bot commented on PR #10014: URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822183538 ## CI report: * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070) * 32469364d39eebf69fb001955aa2cccdfc772f1c UNKNOWN
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822176875 ## CI report: * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079) * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080) * 7a678c8f26e7b94fca3812d29e9ddca59b083127 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21083)
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
hudi-bot commented on PR #10120: URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822176736 ## CI report: * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056) * 946ec17878c5110741fa3b1e3bbff4fc804d77ed UNKNOWN
Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]
hudi-bot commented on PR #10152: URL: https://github.com/apache/hudi/pull/10152#issuecomment-1822169857 ## CI report: * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN * b1748e270c379c479bb3286e635482d204b853c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071) * 6407bc6c69ed9e43b970dbd4a7d5a441ebe45150 UNKNOWN
Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]
hudi-bot commented on PR #10073: URL: https://github.com/apache/hudi/pull/10073#issuecomment-1822169590 ## CI report: * 48df6bbec2473dbbbedb1b723896acb17056e80f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21076)
Re: [I] Fail to add default partition [hudi]
ad1happy2go commented on issue #10154: URL: https://github.com/apache/hudi/issues/10154#issuecomment-1822169099 @njalan Does the partition column in your data contain NULLs? When do you hit this error? It looks like you are trying to add the null partition. This may be a Hive issue rather than a Hudi one. You could try: `ALTER TABLE ods_xxx.xx ADD IF NOT EXISTS PARTITION (xx=null) LOCATION '/HIVE_DEFAULT_PARTITION'`
Re: [I] [SUPPORT] Clean action failure triggers an exception while trying to check whether metadata is a table [hudi]
ad1happy2go commented on issue #10127: URL: https://github.com/apache/hudi/issues/10127#issuecomment-1822166971 @shubhamn21 Were you able to resolve this issue?
Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]
ad1happy2go commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1822164822 @abhisheksahani91 This looks related to https://github.com/apache/hudi/pull/5269, which is yet to be fixed. To unblock yourself, you can disable the timeline server for now:
```
hoodie.write.markers.type = 'direct',
hoodie.embed.timeline.server = 'false'
```
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
majian1998 commented on PR #10120: URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822158300 Improvement: use FileSystemView to obtain the latest file slices, so that only valid, up-to-date files are shown when displaying column stats information. I also rebased onto the latest master branch to rerun the tests. cc @danny0405 @stream2000
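The column-stats change described above boils down to dropping stats rows for file versions that are no longer part of the latest file slices. The Python sketch below models that filter under simplifying assumptions: `DataFile`, `ColumnStat` and both helper functions are hypothetical, and picking the newest version per file group only approximates what a Hudi FileSystemView returns (it ignores pending compactions, log files and cleaning).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class DataFile:
    file_id: str       # file group id
    instant_time: str  # commit that wrote this version of the file
    name: str


@dataclass(frozen=True)
class ColumnStat:
    file_name: str
    column: str
    min_value: object
    max_value: object


def latest_file_names(files: Iterable[DataFile]) -> Set[str]:
    """Keep only the newest version of each file group.

    A crude stand-in for asking a FileSystemView for the latest file slices.
    """
    latest: Dict[str, DataFile] = {}
    for f in files:
        current = latest.get(f.file_id)
        if current is None or f.instant_time > current.instant_time:
            latest[f.file_id] = f
    return {f.name for f in latest.values()}


def visible_column_stats(stats: Iterable[ColumnStat], files: Iterable[DataFile]) -> List[ColumnStat]:
    """Drop stats rows that belong to superseded file versions, preserving order."""
    keep = latest_file_names(files)
    return [s for s in stats if s.file_name in keep]
```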
[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk
[ https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HUDI-7132: - Description: https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21 We can create a new FailedWriteMetadataEvent for `subtaskFailed` was: https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21 > Data may be lost in flink#chk > - > > Key: HUDI-7132 > URL: https://issues.apache.org/jira/browse/HUDI-7132 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 1.1.0 >Reporter: Bo Cui >Priority: Major > > https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 > before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null > https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21 > We can create a new FailedWriteMetadataEvent for `subtaskFailed`
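The fix proposed in the updated HUDI-7132 description (recording a failure event instead of leaving a null slot) can be modeled as follows. This is a toy Python sketch, not Hudi's `StreamWriteOperatorCoordinator`; `ToyCoordinator`, `WriteMetadataEvent` and `failed_write_metadata_event` are hypothetical stand-ins, and the rule that a checkpoint must not commit when any slot is missing or failed is an assumption chosen to show why an explicit marker is easier to reason about than null.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WriteMetadataEvent:
    """Simplified stand-in for one subtask's write result."""
    task_id: int
    write_statuses: List[str] = field(default_factory=list)
    failed: bool = False


def failed_write_metadata_event(task_id: int) -> WriteMetadataEvent:
    """Hypothetical placeholder playing the role of the proposed FailedWriteMetadataEvent."""
    return WriteMetadataEvent(task_id, [], failed=True)


class ToyCoordinator:
    """Toy model of a coordinator's per-subtask event buffer (not Hudi code)."""

    def __init__(self, parallelism: int) -> None:
        # None means "this subtask has not reported yet".
        self.event_buffer: List[Optional[WriteMetadataEvent]] = [None] * parallelism

    def handle_write_metadata_event(self, event: WriteMetadataEvent) -> None:
        self.event_buffer[event.task_id] = event

    def subtask_failed(self, task_id: int) -> None:
        # Proposed change: mark the slot as failed instead of resetting it to None,
        # so the checkpoint path never sees an ambiguous null entry after a failure.
        self.event_buffer[task_id] = failed_write_metadata_event(task_id)

    def statuses_to_commit(self) -> Optional[List[str]]:
        """Return the write statuses to commit, or None if this checkpoint must not commit."""
        statuses: List[str] = []
        for event in list(self.event_buffer):  # iterate over a snapshot of the buffer
            if event is None or event.failed:
                return None  # missing or failed subtask: do not commit partial data
            statuses.extend(event.write_statuses)
        return statuses
```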
[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk
[ https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Cui updated HUDI-7132: - Description: https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21 was: https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null. We can add a lock around these 2 lines to solve the problem. https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21 > Data may be lost in flink#chk > - > > Key: HUDI-7132 > URL: https://issues.apache.org/jira/browse/HUDI-7132 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 1.1.0 >Reporter: Bo Cui >Priority: Major > > https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35 > before that line runs, eventBuffer may be updated by `subtaskFailed`, and some elements of eventBuffer are null > https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
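The earlier version of the HUDI-7132 description suggested guarding the two linked lines with a lock instead. A minimal Python sketch of that idea, assuming a hypothetical `LockedEventBuffer` in place of the real Java coordinator: the failure handler and the checkpoint reader acquire the same `threading.Lock`, so checking that every slot is filled and copying the buffer happen atomically with respect to `subtask_failed`.

```python
import threading
from typing import List, Optional


class LockedEventBuffer:
    """Toy model (not Hudi code): writers, the failure handler and the
    checkpoint reader share one lock, so the reader never observes the
    buffer in the middle of a failure reset."""

    def __init__(self, parallelism: int) -> None:
        self._lock = threading.Lock()
        self._events: List[Optional[str]] = [None] * parallelism

    def add_event(self, task_id: int, event: str) -> None:
        with self._lock:
            self._events[task_id] = event

    def subtask_failed(self, task_id: int) -> None:
        with self._lock:
            self._events[task_id] = None  # reset the failed subtask's slot

    def snapshot_if_complete(self) -> Optional[List[str]]:
        """Atomically check that every slot is filled and copy the buffer."""
        with self._lock:
            if any(e is None for e in self._events):
                return None
            return list(self._events)
```

The design choice is that the completeness check and the copy sit inside one critical section; checking first and reading later, without the lock, is exactly the window in which `subtask_failed` could null out a slot.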
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
danny0405 commented on PR #10120: URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822104388 There are some test failures: https://github.com/apache/hudi/actions/runs/6940997119/job/18914907624?pr=10120; you can rebase onto the latest master to fix the Azure tests.
Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]
hudi-bot commented on PR #10135: URL: https://github.com/apache/hudi/pull/10135#issuecomment-1822079113 ## CI report: * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069)
Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]
hudi-bot commented on PR #10014: URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822078574 ## CI report: * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070)
Re: [I] [SUPPORT] Invalid number of file groups for partition:column_stats [hudi]
ocean-zhc commented on issue #7657: URL: https://github.com/apache/hudi/issues/7657#issuecomment-1822048995 > I have came across the same problem using 0.12.0 version. I have set > > hoodie.metadata.index.bloom.filter.enable=false hoodie.metadata.index.column.stats.enable=false > > these configs to false and it helped me to bypass this error. TKS!
(hudi) branch master updated (a1afcdd989c -> 2522f6de6f1)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from a1afcdd989c [HUDI-7115] Add in new options for the bigquery sync (#10125) add 2522f6de6f1 [HUDI-7128] DeleteMarkerProcedures support delete in batch mode (#10148) No new revisions were added by this update. Summary of changes: .../command/procedures/DeleteMarkerProcedure.scala | 11 +++- .../procedures/DeleteSavepointProcedure.scala | 37 +-- .../sql/hudi/procedure/TestCallProcedure.scala | 44 ++ .../hudi/procedure/TestSavepointsProcedure.scala | 71 ++ 4 files changed, 143 insertions(+), 20 deletions(-)
[jira] [Closed] (HUDI-7128) DeleteProcedures support delete in batch mode
[ https://issues.apache.org/jira/browse/HUDI-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7128. Resolution: Fixed Fixed via master branch: 2522f6de6f13f44bac89c81bb753c58a52cc780c > DeleteProcedures support delete in batch mode > - > > Key: HUDI-7128 > URL: https://issues.apache.org/jira/browse/HUDI-7128 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 0.14.1 > > > DeleteMarkerProcedures supports delete in batch mode. For example, if a user wants to delete 100 or more markers, before this PR they had to execute the Spark SQL job 100 times; after this PR it runs just once, which greatly reduces Spark SQL execution time.
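The batch mode described in HUDI-7128 replaces one Spark SQL call per instant with a single call that loops over many instants. A minimal Python sketch, assuming (not confirmed by this message) that the instants arrive as one comma-separated string; `delete_markers_in_batch` is hypothetical and `delete_one` stands in for whatever removes the markers of a single instant.

```python
from typing import Callable, List


def delete_markers_in_batch(instant_times: str, delete_one: Callable[[str], None]) -> List[str]:
    """Delete markers for every instant in a comma-separated list in one call.

    Blank entries and duplicates are skipped. Returns the instants that were
    processed, in input order.
    """
    processed: List[str] = []
    for raw in instant_times.split(","):
        instant = raw.strip()
        if instant and instant not in processed:
            delete_one(instant)
            processed.append(instant)
    return processed
```

Usage: `delete_markers_in_batch("001, 002,,001,003", deleted.append)` calls the callback once each for `001`, `002` and `003`.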
Re: [PR] [HUDI-7128] DeleteProcedures support batch mode [hudi]
danny0405 merged PR #10148: URL: https://github.com/apache/hudi/pull/10148
[jira] [Assigned] (HUDI-7059) Read record positions with filter pushdown using Spark parquet reader
[ https://issues.apache.org/jira/browse/HUDI-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7059: --- Assignee: Lin Liu > Read record positions with filter pushdown using Spark parquet reader > - > > Key: HUDI-7059 > URL: https://issues.apache.org/jira/browse/HUDI-7059 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Lin Liu >Priority: Major > Labels: pull-request-available >
Re: [PR] [HUDI-7128] DeleteProcedures support batch mode [hudi]
xuzifu666 commented on PR #10148: URL: https://github.com/apache/hudi/pull/10148#issuecomment-1822035400 @danny0405 @yihua PTAL. The CI error seems unrelated to the code changed in this PR.
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822035335 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079) * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080) * 7a678c8f26e7b94fca3812d29e9ddca59b083127 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21083) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Making misc fixes to deltastreamer sources(S3 and GCS) [hudi]
hudi-bot commented on PR #10095: URL: https://github.com/apache/hudi/pull/10095#issuecomment-1822035213 ## CI report: * b77d51aac5e370b00bab3acfccd471cf03a1c718 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21015) * 6b39eb40210756b1cb6c50317690d5df99e09d9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21082)
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822030407 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079) * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080) * 7a678c8f26e7b94fca3812d29e9ddca59b083127 UNKNOWN
Re: [PR] [MINOR] Making misc fixes to deltastreamer sources(S3 and GCS) [hudi]
hudi-bot commented on PR #10095: URL: https://github.com/apache/hudi/pull/10095#issuecomment-1822030206 ## CI report: * b77d51aac5e370b00bab3acfccd471cf03a1c718 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21015) * 6b39eb40210756b1cb6c50317690d5df99e09d9d UNKNOWN
Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]
hudi-bot commented on PR #10122: URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822025391 ## CI report: * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057) * bbf765005b4e1e92730e0dff736bcb561d928b7b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21081)
Re: [PR] [HUDI-7097] Fixing instantiation of Hms Uri with HiveSync tool [hudi]
nsivabalan commented on code in PR #10099: URL: https://github.com/apache/hudi/pull/10099#discussion_r1401453914
##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -103,15 +103,29 @@ public class HiveSyncTool extends HoodieSyncTool implements AutoCloseable {
   public HiveSyncTool(Properties props, Configuration hadoopConf) {
     super(props, hadoopConf);
-    String metastoreUris = props.getProperty(METASTORE_URIS.key());
-    // Give precedence to HiveConf.ConfVars.METASTOREURIS if it is set.
-    // Else if user has provided HiveSyncConfigHolder.METASTORE_URIS, then set that in hadoop conf.
-    if (isNullOrEmpty(hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname)) && nonEmpty(metastoreUris)) {
-      LOG.info(String.format("Setting %s = %s", HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris));
-      hadoopConf.set(HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris);
+    String configuredMetastoreUris = props.getProperty(METASTORE_URIS.key());
+    String existingHadoopConfMetastoreUris = hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname);
Review Comment:
Appreciate it. Agreed, then; we can simplify.
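The removed lines encode a simple precedence rule: a metastore URI already present in the Hadoop configuration wins, and the Hudi hive-sync option is copied over only when that value is missing. The Java sketch below models only that old rule (the replacement logic in the PR is cut off in this excerpt). A plain `Map` stands in for Hadoop's `Configuration`, the Hudi property key is an assumption, and `MetastoreUriPrecedence` is a hypothetical class; `hive.metastore.uris` is Hive's standard key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Sketch of the precedence rule stated in the removed comment; a Map stands in for Hadoop's Configuration.
class MetastoreUriPrecedence {
  // Hive's metastore URI key (HiveConf.ConfVars.METASTOREURIS.varname).
  static final String HIVE_METASTORE_URIS = "hive.metastore.uris";
  // Assumed key behind HiveSyncConfigHolder.METASTORE_URIS.
  static final String HUDI_METASTORE_URIS = "hoodie.datasource.hive_sync.metastore.uris";

  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  // An existing Hive/Hadoop value wins; otherwise the Hudi option is copied into the Hadoop conf.
  static void applyMetastoreUris(Properties props, Map<String, String> hadoopConf) {
    String configuredMetastoreUris = props.getProperty(HUDI_METASTORE_URIS);
    String existingHadoopConfMetastoreUris = hadoopConf.get(HIVE_METASTORE_URIS);
    if (isNullOrEmpty(existingHadoopConfMetastoreUris) && !isNullOrEmpty(configuredMetastoreUris)) {
      hadoopConf.put(HIVE_METASTORE_URIS, configuredMetastoreUris);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(HUDI_METASTORE_URIS, "thrift://hudi-configured:9083"); // hypothetical host
    Map<String, String> conf = new HashMap<>();
    applyMetastoreUris(props, conf);
    System.out.println(conf.get(HIVE_METASTORE_URIS));
  }
}
```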
Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]
hudi-bot commented on PR #10122: URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822019706 ## CI report: * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057) * bbf765005b4e1e92730e0dff736bcb561d928b7b UNKNOWN
[jira] [Commented] (HUDI-7131) The requested schema is not compatible with the file schema
[ https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788607#comment-17788607 ] loukey_j commented on HUDI-7131: The schema of the table has not changed, only the partition value of the data has changed. > The requested schema is not compatible with the file schema > --- > > Key: HUDI-7131 > URL: https://issues.apache.org/jira/browse/HUDI-7131 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: loukey_j >Priority: Critical > Labels: core, merge, spark > > use global Index and data partition change , report an error: The requested > schema is not compatible with the file schema... > Why not use the schema of > org.apache.hudi.common.table.TableSchemaResolver#getTableAvroSchemaInternal > to read hudi data > > CREATE TABLE if not exists unisql.hudi_ut_time_traval > (id INT, version INT, name STRING, birthDate TIMESTAMP, inc_day STRING) USING > HUDI > PARTITIONED BY (inc_day) TBLPROPERTIES (type='cow', primaryKey='id'); > insert into unisql.hudi_ut_time_traval > select 1 as id, 1 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' > as timestamp) as birthDate, cast('2023-10-01' as date) as inc_day; > select * from hudi_ut_time_traval; > +---+-+--+--++---+---+-+---+--+ > |_hoodie_commit_time|_hoodie_commit_seqno > |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name |id > |version|name |birthDate |inc_day | > +---+-+--+--++---+---+-+---+--+ > |20231122100234339 |20231122100234339_0_0|1 |inc_day=2023-10-01 > |8a510742-c060-4d12-898e-70bbd122f2e3-0_0-19-16_20231122100234339.parquet|1 > |1 |str_1|2023-01-01 12:12:12|2023-10-01| > +---+-+--+--++---+---+-+---+--+ > merge into hudi_ut_time_traval t using ( > select 1 as id, 2 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' > as timestamp) as birthDate, cast('2023-10-02' as date) as inc_day > ) s on t.id=s.id when matched THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * > Caused by: 
org.apache.parquet.io.ParquetDecodingException: The requested > schema is not compatible with the file schema. incompatible types: required > int32 id != optional int32 id > at > org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101) > at > org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81) > at > org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57) > at org.apache.parquet.schema.MessageType.accept(MessageType.java:55) > at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:162) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:135) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:225); > parquet schema: > { > "type" : "record", > "name" : "hudi_ut_time_traval_record", > "namespace" : "hoodie.hudi_ut_time_traval", > "fields" : [ { > "name" : "_hoodie_commit_time", > "type" : [ "null", "string" ], > "doc" : "", > "default" : null > }, { > "name" : "_hoodie_commit_seqno", > "type" : [ "null", "string" ], > "doc" : "", > "default" : null > }, { > "name" : "_hoodie_record_key", > "type" : [ "null", "string" ], > "doc" : "", > "default" : null > }, { > "name" : "_hoodie_partition_path", > "type" : [ "null", "string" ], > "doc" : "", > "default" : null > }, { > "name" : "_hoodie_file_name", > "type" : [ "null", "string" ], > "doc" : "", > "default" : null > }, { > "name" : "id", > "type" : [ "null", "int" ], > "default" : null > }, { > "name" : "version", > "type" : [ "null", "int" ], > "default" : null > }, { > "name" : "name", > "type" : [ "null", "string" ], > "default" : null > }, { > "name" : "birthDate", > "type" : [ "null", { > "type" : "long", > "logicalType" : "timestamp-micros" > } ], > "default" : null > }, { > "name" : "inc_day", > "type" : [ "null", "string" ], > "default" : null > } ] > } > 
org.apache.hudi.io.HoodieMergedReadHandle#readerSchema: >
[jira] [Commented] (HUDI-7131) The requested schema is not compatible with the file schema
[ https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788605#comment-17788605 ]
Danny Chen commented on HUDI-7131:
--
It looks like a known issue: we do not support schema evolution on partition fields yet.
[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk
[ https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bo Cui updated HUDI-7132:
    Description:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, `eventBuffer` may be updated by `subtaskFailed`, leaving some elements of `eventBuffer` null.
We can guard the two linked lines with a lock to solve the problem.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
  was:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, `eventBuffer` may be updated by `subtaskFailed`, leaving some elements of `eventBuffer` null.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
> Data may be lost in flink#chk
>
>                 Key: HUDI-7132
>                 URL: https://issues.apache.org/jira/browse/HUDI-7132
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 1.1.0
>            Reporter: Bo Cui
>            Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line executes, `eventBuffer` may be updated by `subtaskFailed`, leaving some elements of `eventBuffer` null.
> We can guard the two linked lines with a lock to solve the problem.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
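The race in HUDI-7132 is a check-then-act problem: the checkpoint path reads `eventBuffer` while `subtaskFailed` may clear a slot. Below is a minimal Java sketch of the proposed fix, assuming a single lock guards every access to the buffer. `EventBufferSketch`, `snapshot`, and `allReported` are hypothetical simplifications, not the actual `StreamWriteOperatorCoordinator` code, and the sketch does not decide what the commit path should do when a slot is empty.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical, simplified model of a coordinator-side event buffer with one slot per write subtask.
class EventBufferSketch {
  private final Object lock = new Object();
  private final String[] eventBuffer;

  EventBufferSketch(int parallelism) {
    this.eventBuffer = new String[parallelism];
  }

  // A subtask reports its write result.
  void handleEvent(int subtask, String event) {
    synchronized (lock) {
      eventBuffer[subtask] = event;
    }
  }

  // A failed subtask's slot is reset: the concurrent update the issue describes.
  void subtaskFailed(int subtask) {
    synchronized (lock) {
      eventBuffer[subtask] = null;
    }
  }

  // Returns a copy taken under the same lock; callers inspect this copy instead of the live array,
  // so a concurrent subtaskFailed cannot change it between a null check and its use.
  String[] snapshot() {
    synchronized (lock) {
      return Arrays.copyOf(eventBuffer, eventBuffer.length);
    }
  }

  // True only if every subtask has a reported event in the given snapshot.
  static boolean allReported(String[] snapshot) {
    return Arrays.stream(snapshot).allMatch(Objects::nonNull);
  }
}
```

Copying under the lock keeps the critical section short; the caller can then inspect the snapshot without holding the lock.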
Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]
danny0405 commented on PR #10151: URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821998306 @VitoMakarevich Nice catch! Can you fix the compile error? error file=/home/runner/work/hudi/hudi/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala message=expected start of definition, but was Token(VAL,val,12946,val)
[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk
[ https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bo Cui updated HUDI-7132:
    Description:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, `eventBuffer` may be updated by `subtaskFailed`, leaving some elements of `eventBuffer` null.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
> Data may be lost in flink#chk
>
>                 Key: HUDI-7132
>                 URL: https://issues.apache.org/jira/browse/HUDI-7132
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 1.1.0
>            Reporter: Bo Cui
>            Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line executes, `eventBuffer` may be updated by `subtaskFailed`, leaving some elements of `eventBuffer` null.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
[jira] [Created] (HUDI-7132) Data may be lost in flink#chk
Bo Cui created HUDI-7132:
         Summary: Data may be lost in flink#chk
             Key: HUDI-7132
             URL: https://issues.apache.org/jira/browse/HUDI-7132
         Project: Apache Hudi
      Issue Type: Bug
      Components: flink
Affects Versions: 1.1.0
        Reporter: Bo Cui
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821990146 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079) * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080)
Re: [PR] [HUDI-5968] Fix global index duplicate and handle custom payload when update partition [hudi]
loukey-lj commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1821987556 @xushiyan @nsivabalan https://issues.apache.org/jira/browse/HUDI-7131
[jira] [Updated] (HUDI-7131) The requested schema is not compatible with the file schema
[ https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
loukey_j updated HUDI-7131:
    Affects Version/s: 0.14.0
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821984783 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079) * 2db7b8bee13140a4756427aeb802bad13822e5af UNKNOWN
[jira] [Created] (HUDI-7131) The requested schema is not compatible with the file schema
loukey_j created HUDI-7131: -- Summary: The requested schema is not compatible with the file schema Key: HUDI-7131 URL: https://issues.apache.org/jira/browse/HUDI-7131 Project: Apache Hudi Issue Type: Bug Reporter: loukey_j use global Index and data partition change , report an error: The requested schema is not compatible with the file schema... Why not use the schema of org.apache.hudi.common.table.TableSchemaResolver#getTableAvroSchemaInternal to read hudi data CREATE TABLE if not exists unisql.hudi_ut_time_traval (id INT, version INT, name STRING, birthDate TIMESTAMP, inc_day STRING) USING HUDI PARTITIONED BY (inc_day) TBLPROPERTIES (type='cow', primaryKey='id'); insert into unisql.hudi_ut_time_traval select 1 as id, 1 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-01' as date) as inc_day; select * from hudi_ut_time_traval; +---+-+--+--++---+---+-+---+--+ |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name |id |version|name |birthDate |inc_day | +---+-+--+--++---+---+-+---+--+ |20231122100234339 |20231122100234339_0_0|1 |inc_day=2023-10-01 |8a510742-c060-4d12-898e-70bbd122f2e3-0_0-19-16_20231122100234339.parquet|1 |1 |str_1|2023-01-01 12:12:12|2023-10-01| +---+-+--+--++---+---+-+---+--+ merge into hudi_ut_time_traval t using ( select 1 as id, 2 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-02' as date) as inc_day ) s on t.id=s.id when matched THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. 
incompatible types: required int32 id != optional int32 id at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101) at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81) at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57) at org.apache.parquet.schema.MessageType.accept(MessageType.java:55) at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:162) at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:135) at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:225); parquet schema: { "type" : "record", "name" : "hudi_ut_time_traval_record", "namespace" : "hoodie.hudi_ut_time_traval", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "int" ], "default" : null }, { "name" : "version", "type" : [ "null", "int" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "birthDate", "type" : [ "null", { "type" : "long", "logicalType" : "timestamp-micros" } ], "default" : null }, { "name" : "inc_day", "type" : [ "null", "string" ], "default" : null } ] } org.apache.hudi.io.HoodieMergedReadHandle#readerSchema: 
{"type":"record","name":"hudi_ut_time_traval_record","namespace":"hoodie.hudi_ut_time_traval","fields":[{"name":"id","type":"int"},{"name":"version","type":"int"},{"name":"name","type":"string"},{"name":"birthDate","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"inc_day","type":["null",{"type":"int","logicalType":"date"}],"default":null}]}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
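The mismatch in this trace is about nullability: the `readerSchema` above declares `id` as a plain Avro `"int"`, which parquet-avro converts to `required int32`, while the file schema stores it as `["null","int"]`, i.e. `optional int32`. As we understand parquet-mr, `ColumnIOFactory` rejects a requested column whose repetition is more restrictive than the file column's. Below is a simplified, JDK-only model of that rule; the class, enum, and message format are made up for illustration and are not parquet-mr's API.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified model of Parquet's repetition compatibility check.
 * Not parquet-mr code: names and message format are illustrative only.
 */
class RepetitionCheck {
  enum Repetition {
    REQUIRED, OPTIONAL, REPEATED;

    // Declared from most to least restrictive, so a smaller ordinal
    // means "more restrictive".
    boolean isMoreRestrictiveThan(Repetition other) {
      return this.ordinal() < other.ordinal();
    }
  }

  /**
   * Returns an error message if a column requested with {@code requested}
   * repetition cannot be read from a file column with {@code inFile}
   * repetition, or null when the read is allowed.
   */
  static String check(String name, String type, Repetition requested, Repetition inFile) {
    if (requested.isMoreRestrictiveThan(inFile)) {
      return "incompatible types: "
          + requested.name().toLowerCase(Locale.ROOT) + " " + type + " " + name
          + " != "
          + inFile.name().toLowerCase(Locale.ROOT) + " " + type + " " + name;
    }
    return null;
  }

  public static void main(String[] args) {
    // Reader schema says "int" (required); file schema says ["null","int"] (optional).
    System.out.println(check("id", "int32", Repetition.REQUIRED, Repetition.OPTIONAL));
    // The reverse direction is allowed, so no error message is produced.
    System.out.println(check("id", "int32", Repetition.OPTIONAL, Repetition.REQUIRED));
  }
}
```

Under this model, requesting `optional` for a column the file stores as `required` is accepted, so reading with a nullable schema (as the reporter suggests with `TableSchemaResolver`) would plausibly avoid this particular error. That is our reading of the trace, not a confirmed fix.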
Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]
hudi-bot commented on PR #10152: URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821984821 ## CI report: * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN * d0fe92994777e2067d654e2585c75c91616f8598 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061) * b1748e270c379c479bb3286e635482d204b853c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071)
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
nsivabalan commented on code in PR #10150: URL: https://github.com/apache/hudi/pull/10150#discussion_r1401416490 ## hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java: ## @@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, Properties properties return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord); } + public boolean isDeleted(Schema schema, Properties props) { +if (recordBytes.length == 0) { + return true; +} +try { + GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema); + return isDeleteRecord(incomingRecord, props); Review Comment: done
Re: [PR] [HUDI-7097] Fixing instantiation of Hms Uri with HiveSync tool [hudi]
xushiyan commented on code in PR #10099: URL: https://github.com/apache/hudi/pull/10099#discussion_r1401411887 ## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java: ## @@ -103,15 +103,29 @@ public class HiveSyncTool extends HoodieSyncTool implements AutoCloseable { public HiveSyncTool(Properties props, Configuration hadoopConf) { super(props, hadoopConf); -String metastoreUris = props.getProperty(METASTORE_URIS.key()); -// Give precedence to HiveConf.ConfVars.METASTOREURIS if it is set. -// Else if user has provided HiveSyncConfigHolder.METASTORE_URIS, then set that in hadoop conf. -if (isNullOrEmpty(hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname)) && nonEmpty(metastoreUris)) { - LOG.info(String.format("Setting %s = %s", HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris)); - hadoopConf.set(HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris); +String configuredMetastoreUris = props.getProperty(METASTORE_URIS.key()); +String existingHadoopConfMetastoreUris = hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname); Review Comment: So I wrote a small snippet to check how much memory copying the conf could cost:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;
import org.apache.hadoop.conf.Configuration;

// hadoopConf() comes from the surrounding test context (not shown)
Configuration originalConf = hadoopConf();
long freeMem0 = Runtime.getRuntime().freeMemory();
// add 1000 extra entries to simulate a large, realistic conf
IntStream.range(0, 1000).forEach(i -> {
  originalConf.set("typical.hadoop.configuration.key" + i, "https://www.example.com:8080/path?query=value#fragment" + i);
});
System.out.println("no conf entries: " + originalConf.size());
// keep 100 copies alive so they are not garbage collected before measuring
List<Configuration> l = new ArrayList<>();
IntStream.range(0, 100).forEach(i -> {
  l.add(new Configuration(originalConf));
});
long freeMem1 = Runtime.getRuntime().freeMemory();
System.out.println("after copy, used mem (MB): " + (freeMem0 - freeMem1) / (1024.0 * 1024.0));
```
Each Hadoop conf has 2k+ properties, and making 100 copies costs about 30 MB:
```
no conf entries: 2162
after copy, used mem (MB): 29.616012573242188
```
This is already an extreme case in terms of the number of confs and meta sync tasks, so I don't think memory will be an issue.
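The removed lines in this diff encode a simple precedence rule: keep `hive.metastore.uris` from the Hadoop conf if it is already set, otherwise fall back to the Hudi sync property. A minimal sketch of that rule, assuming a plain `Map<String, String>` as a stand-in for Hadoop's `Configuration`, and writing into a copy so the caller's object is left untouched (a copy being the kind of cost the measurement above estimates). The Hudi property key is our assumption of what `METASTORE_URIS.key()` resolves to, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of the metastore-URI precedence rule.
 * A Map<String, String> stands in for Hadoop's Configuration.
 */
class MetastoreUriResolver {
  // Varname of HiveConf.ConfVars.METASTOREURIS.
  static final String HIVE_METASTORE_URIS = "hive.metastore.uris";
  // Assumed key behind HiveSyncConfigHolder.METASTORE_URIS.
  static final String HUDI_METASTORE_URIS = "hoodie.datasource.hive_sync.metastore.uris";

  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  /** Returns a copy of hadoopConf with the effective URIs applied; the input map is not modified. */
  static Map<String, String> resolve(Properties props, Map<String, String> hadoopConf) {
    Map<String, String> copy = new HashMap<>(hadoopConf);
    String configured = props.getProperty(HUDI_METASTORE_URIS);
    // A value already present in the Hadoop/Hive conf wins; the Hudi property is only a fallback.
    if (isNullOrEmpty(copy.get(HIVE_METASTORE_URIS)) && !isNullOrEmpty(configured)) {
      copy.put(HIVE_METASTORE_URIS, configured);
    }
    return copy;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(HUDI_METASTORE_URIS, "thrift://hudi-sync:9083");
    Map<String, String> conf = new HashMap<>();
    System.out.println(resolve(props, conf).get(HIVE_METASTORE_URIS)); // thrift://hudi-sync:9083
    conf.put(HIVE_METASTORE_URIS, "thrift://hive-conf:9083");
    System.out.println(resolve(props, conf).get(HIVE_METASTORE_URIS)); // thrift://hive-conf:9083
  }
}
```

Working on a copy trades a small, bounded amount of memory for not leaking the Hudi setting into a `Configuration` that other components may share.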
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
danny0405 commented on code in PR #10150: URL: https://github.com/apache/hudi/pull/10150#discussion_r1401397731 ## hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java: ## @@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, Properties properties return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord); } + public boolean isDeleted(Schema schema, Properties props) { +if (recordBytes.length == 0) { + return true; +} +try { + GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema); + return isDeleteRecord(incomingRecord, props); Review Comment: Maybe we just cache a specific `isDeleted` flag for this `DefaultHoodieRecordPayload` so that this flag can be reused all the time.
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
danny0405 commented on code in PR #10150: URL: https://github.com/apache/hudi/pull/10150#discussion_r1401397731 ## hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java: ## @@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, Properties properties return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord); } + public boolean isDeleted(Schema schema, Properties props) { +if (recordBytes.length == 0) { + return true; +} +try { + GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema); + return isDeleteRecord(incomingRecord, props); Review Comment: I feel like we can cache the Avro generic record for a little while after the `isDeleted` call, given that `isDeleted` should always be invoked before `getInsertValue` and `combineAndGetUpdateValue`. We can then drop the cached Avro record in the last step of `getInsertValue`, which would eliminate 2 Avro deserializations. Another choice is to just cache a specific `isDeleted` flag for this `DefaultHoodieRecordPayload` so that the flag can be reused all the time.
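The second option in this review (caching an `isDeleted` flag) could look roughly like the sketch below. It is hypothetical and JDK-only: a type parameter `R` stands in for Avro's `GenericRecord`, and the deserializer and delete check are passed in as functions rather than calling `HoodieAvroUtils.bytesToAvro` and `isDeleteRecord`. It assumes the schema and properties used for the delete check do not change between calls, since the cached flag ignores them after the first call.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical payload that caches its delete flag after the first check.
 * R stands in for Avro's GenericRecord so the sketch needs only the JDK.
 */
class CachingPayload<R> {
  private final byte[] recordBytes;
  private final Function<byte[], R> deserializer; // stands in for bytes-to-Avro deserialization
  private final Predicate<R> isDeleteRecord;      // stands in for the delete-marker check
  private Boolean deletedFlag;                    // null until the first isDeleted() call

  CachingPayload(byte[] recordBytes, Function<byte[], R> deserializer, Predicate<R> isDeleteRecord) {
    this.recordBytes = recordBytes;
    this.deserializer = deserializer;
    this.isDeleteRecord = isDeleteRecord;
  }

  boolean isDeleted() {
    if (deletedFlag == null) {
      // Empty bytes always mean a delete, as in the diff; otherwise deserialize once.
      deletedFlag = recordBytes.length == 0 || isDeleteRecord.test(deserializer.apply(recordBytes));
    }
    return deletedFlag;
  }

  public static void main(String[] args) {
    int[] deserializations = {0};
    CachingPayload<String> payload = new CachingPayload<>(
        "deleted".getBytes(StandardCharsets.UTF_8),
        bytes -> { deserializations[0]++; return new String(bytes, StandardCharsets.UTF_8); },
        record -> record.equals("deleted"));
    payload.isDeleted();
    payload.isDeleted();
    System.out.println(deserializations[0]); // 1
  }
}
```

The counter in `main` is a cheap way to confirm that repeated calls cost a single deserialization. The sketch is not thread-safe, which is usually acceptable for a per-record object but worth checking against how payloads are shared.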
Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]
hudi-bot commented on PR #10098: URL: https://github.com/apache/hudi/pull/10098#issuecomment-1821947415 ## CI report: * abd651f8dcbde53717e473efe1c15d4bd486b0eb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21011) * c64942862556ae29fb52af06007bc5a303d42100 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21078)
Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]
hudi-bot commented on PR #10073: URL: https://github.com/apache/hudi/pull/10073#issuecomment-1821947343 ## CI report: * fff2ac40c67fdcd15fdf4b65890e00d63aa60a0a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21014) * 48df6bbec2473dbbbedb1b723896acb17056e80f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21076)
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821947562 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]
hudi-bot commented on PR #10097: URL: https://github.com/apache/hudi/pull/10097#issuecomment-1821947383 ## CI report: * aaf5a310c5ac999c81498308fdc11d6d5171463d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21016) * 43400ce2317882c76a68eb3a855c9dd814c92234 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21077)
Re: [PR] [HUDI-7023] Support querying without syncing partition metadata to catalog [hudi]
hudi-bot commented on PR #10153: URL: https://github.com/apache/hudi/pull/10153#issuecomment-1821942182 ## CI report: * 46a4c3344c79fd9a61db78620e8c40e7d98bcd36 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21062)
Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]
hudi-bot commented on PR #10151: URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821942128 ## CI report: * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21075)
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821942086 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f UNKNOWN
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
hudi-bot commented on PR #10102: URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821941946 ## CI report: * a7f01f6ad7008830e6e2993b0ba5c986ca493093 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21074)
Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]
hudi-bot commented on PR #10098: URL: https://github.com/apache/hudi/pull/10098#issuecomment-1821941910 ## CI report: * abd651f8dcbde53717e473efe1c15d4bd486b0eb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21011) * c64942862556ae29fb52af06007bc5a303d42100 UNKNOWN
Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]
hudi-bot commented on PR #10097: URL: https://github.com/apache/hudi/pull/10097#issuecomment-1821941849 ## CI report: * aaf5a310c5ac999c81498308fdc11d6d5171463d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21016) * 43400ce2317882c76a68eb3a855c9dd814c92234 UNKNOWN
Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]
hudi-bot commented on PR #10073: URL: https://github.com/apache/hudi/pull/10073#issuecomment-1821941789 ## CI report: * fff2ac40c67fdcd15fdf4b65890e00d63aa60a0a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21014) * 48df6bbec2473dbbbedb1b723896acb17056e80f UNKNOWN
Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]
nsivabalan commented on code in PR #10122: URL: https://github.com/apache/hudi/pull/10122#discussion_r1401369779 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnReadWithCompact.java: ## @@ -159,6 +159,8 @@ public void testNonBlockingConcurrencyControlWithPartialUpdatePayload() throws E // because the data files belongs 3rd commit is not included in the last compaction. Map readOptimizedResult = Collections.singletonMap("par1", "[id1,par1,id1,Danny,23,2,par1]"); TestData.checkWrittenData(tempFile, readOptimizedResult, 1); +pipeline1.end(); +pipeline2.end(); Review Comment: responded below ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java: ## @@ -29,37 +31,95 @@ import org.apache.hudi.config.HoodieWriteConfig; import org.apache.hudi.timeline.service.TimelineService; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.io.IOException; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Map; +import java.util.Properties; +import java.util.Set; +import java.util.concurrent.atomic.AtomicInteger; /** * Timeline Service that runs as part of write client. 
*/ public class EmbeddedTimelineService { + // lock used when starting/stopping/modifying embedded services + private static final Object SERVICE_LOCK = new Object(); private static final Logger LOG = LoggerFactory.getLogger(EmbeddedTimelineService.class); - + private static final AtomicInteger NUM_SERVERS_RUNNING = new AtomicInteger(0); + // Map of port to existing timeline service running on that port + private static final Map RUNNING_SERVICES = new HashMap<>(); + private static final Registry METRICS_REGISTRY = Registry.getRegistry("TimelineService"); + private static final String NUM_EMBEDDED_TIMELINE_SERVERS = "numEmbeddedTimelineServers"; private int serverPort; private String hostAddr; - private HoodieEngineContext context; + private final HoodieEngineContext context; private final SerializableConfiguration hadoopConf; private final HoodieWriteConfig writeConfig; - private final String basePath; + private TimelineService.Config serviceConfig; + private final Set basePaths; // the set of base paths using this EmbeddedTimelineService private transient FileSystemViewManager viewManager; private transient TimelineService server; - public EmbeddedTimelineService(HoodieEngineContext context, String embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) { + private EmbeddedTimelineService(HoodieEngineContext context, String embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) { setHostAddr(embeddedTimelineServiceHostAddr); this.context = context; this.writeConfig = writeConfig; -this.basePath = writeConfig.getBasePath(); +this.basePaths = new HashSet<>(); +this.basePaths.add(writeConfig.getBasePath()); this.hadoopConf = context.getHadoopConf(); this.viewManager = createViewManager(); } + /** + * Returns an existing embedded timeline service if one is running for the given configuration and reuse is enabled, or starts a new one. 
+ * @param context The {@link HoodieEngineContext} for the client + * @param embeddedTimelineServiceHostAddr The host address to use for the service (nullable) + * @param writeConfig The {@link HoodieWriteConfig} for the client + * @return A running {@link EmbeddedTimelineService} + * @throws IOException if an error occurs while starting the service + */ + public static EmbeddedTimelineService getOrStartEmbeddedTimelineService(HoodieEngineContext context, String embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) throws IOException { +return getOrStartEmbeddedTimelineService(context, embeddedTimelineServiceHostAddr, writeConfig, TimelineService::new); + } + + static EmbeddedTimelineService getOrStartEmbeddedTimelineService(HoodieEngineContext context, String embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig, + TimelineServiceCreator timelineServiceCreator) throws IOException { +// if reuse is enabled, check if any existing instances are compatible +if (writeConfig.isEmbeddedTimelineServerReuseEnabled()) { + synchronized (SERVICE_LOCK) { +for (EmbeddedTimelineService service : RUNNING_SERVICES.values()) { + if (service.canReuseFor(writeConfig, embeddedTimelineServiceHostAddr)) { +service.addBasePath(writeConfig.getBasePath()); +LOG.info("Reusing existing embedded timeline server with configuration: " + service.serviceConfig); +return service; + } +} +// if no compatible instance is found, create a new one +
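The `getOrStartEmbeddedTimelineService` method in this diff follows a get-or-create registry pattern: under a lock, return a compatible running service and record the extra base path, otherwise start a new one. The sketch below shows that pattern in isolation. Everything in it is hypothetical: the class name, the host-only compatibility rule standing in for `canReuseFor`, the fake port counter, and the `release` method, which assumes a service should stop once no base path uses it (the truncated diff does not show how shutdown is handled).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical get-or-start registry; not Hudi's EmbeddedTimelineService. */
class SharedService {
  private static final Object LOCK = new Object();
  // Running services keyed by port, like the port -> service map in the diff.
  private static final Map<Integer, SharedService> RUNNING = new HashMap<>();
  private static int nextFakePort = 1; // fake port numbers, for illustration only

  final int port;
  final String hostAddr;
  final Set<String> basePaths = new HashSet<>(); // base paths currently using this service

  private SharedService(int port, String hostAddr) {
    this.port = port;
    this.hostAddr = hostAddr;
  }

  // Hypothetical compatibility rule standing in for canReuseFor(...).
  private boolean canReuseFor(String hostAddr) {
    return Objects.equals(this.hostAddr, hostAddr);
  }

  static SharedService getOrStart(String hostAddr, String basePath, boolean reuseEnabled) {
    synchronized (LOCK) {
      if (reuseEnabled) {
        for (SharedService service : RUNNING.values()) {
          if (service.canReuseFor(hostAddr)) {
            service.basePaths.add(basePath);
            return service;
          }
        }
      }
      // No compatible instance (or reuse disabled): "start" a new one and register it.
      SharedService service = new SharedService(nextFakePort++, hostAddr);
      service.basePaths.add(basePath);
      RUNNING.put(service.port, service);
      return service;
    }
  }

  /** Detaches a base path and "stops" the service once nothing uses it (assumed behavior). */
  static void release(SharedService service, String basePath) {
    synchronized (LOCK) {
      service.basePaths.remove(basePath);
      if (service.basePaths.isEmpty()) {
        RUNNING.remove(service.port);
      }
    }
  }

  static int runningCount() {
    synchronized (LOCK) {
      return RUNNING.size();
    }
  }

  public static void main(String[] args) {
    SharedService a = getOrStart("host-1", "/tables/a", true);
    SharedService b = getOrStart("host-1", "/tables/b", true);
    System.out.println(a == b);         // true: the second writer reuses the first service
    System.out.println(runningCount()); // 1
  }
}
```

Holding one lock around both the lookup and the registration is what keeps two writers from starting duplicate services for the same configuration at the same time.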
Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]
hudi-bot commented on PR #10151: URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821901635 ## CI report: * b124e2a54cd9b3fec6d19c7c131b93234cd8c68c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21060) * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21075)
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821901599 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
hudi-bot commented on PR #10102: URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821901457 ## CI report: * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930) * a7f01f6ad7008830e6e2993b0ba5c986ca493093 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21074)
Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]
hudi-bot commented on PR #10151: URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821895329 ## CI report: * b124e2a54cd9b3fec6d19c7c131b93234cd8c68c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21060) * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d UNKNOWN
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
hudi-bot commented on PR #10102: URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821895011 ## CI report: * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930) * a7f01f6ad7008830e6e2993b0ba5c986ca493093 UNKNOWN
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
linliu-code commented on code in PR #10102: URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java: ## @@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) { && !HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime )) { // hit a block with instant time greater than should be processed, stop processing further - break; + continue; } Review Comment: @danny0405, after discussing with @yihua, the fix here is correct: the order of log blocks has been reversed, and the "break" logic was meant for processing blocks in time order. CC: @yihua
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
linliu-code commented on code in PR #10102: URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java: ## @@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) { && !HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime )) { // hit a block with instant time greater than should be processed, stop processing further - break; + continue; } Review Comment: @danny0405, after discussing with @yihua, the fix here is correct since the order of log blocks has been reversed, and the "break" logic was for the old design where the blocks are in the order of time. CC: @yihua
Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]
linliu-code commented on code in PR #10102: URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java: ## @@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) { && !HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime )) { // hit a block with instant time greater than should be processed, stop processing further - break; + continue; } Review Comment: @danny0405, after discussing with @yihua, the fix here is correct since the order of log blocks has been reversed, and the "break" logic was meant for processing blocks in time order. CC: @yihua
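The argument in this review is that `break` is only safe when blocks are visited oldest-first: the first block newer than the requested instant proves that every later block is newer too. When blocks are visited newest-first, the first block seen is usually the one beyond the time-travel instant, so `break` discards all older, valid blocks, while `continue` skips just that block. A small, hypothetical simulation (not Hudi code) with each block reduced to an instant time and a value; it compares instant strings lexicographically, which assumes same-width timestamps like the 17-digit ones in this archive.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the break-vs-continue change; not Hudi code. */
class LogBlockScan {
  static final class Block {
    final String instantTime;
    final String value;

    Block(String instantTime, String value) {
      this.instantTime = instantTime;
      this.value = value;
    }
  }

  /**
   * Visits blocks newest-first (reverse of file order) and keeps values whose
   * instant time is <= latestInstantTime. Instant times are compared as
   * strings, which assumes equal-width timestamps.
   */
  static List<String> scan(List<Block> blocksInFileOrder, String latestInstantTime, boolean stopOnNewer) {
    List<String> kept = new ArrayList<>();
    for (int i = blocksInFileOrder.size() - 1; i >= 0; i--) {
      Block block = blocksInFileOrder.get(i);
      if (block.instantTime.compareTo(latestInstantTime) > 0) {
        if (stopOnNewer) {
          break;    // old logic: only correct when visiting blocks oldest-first
        }
        continue;   // new logic: skip this block, older blocks may still qualify
      }
      kept.add(block.value);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Block> blocks = List.of(
        new Block("20231122100000000", "v1"),
        new Block("20231122110000000", "v2"),
        new Block("20231122120000000", "v3"));
    String asOf = "20231122110000000";
    System.out.println(scan(blocks, asOf, true));  // []
    System.out.println(scan(blocks, asOf, false)); // [v2, v1]
  }
}
```

With `break`, a time-travel read as of the second instant returns nothing, because the newest block is visited first and ends the scan.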
Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]
VitoMakarevich commented on PR #10151: URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821859085 Added a test. At least the `fileIndex`-related tests pass (I only ran those). I can also verify that calling `.refresh` on an existing index does not refresh the list of files, as reported; with this one-line code change it starts working correctly.
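The bug class described here, a `refresh` that leaves stale listings behind, can be illustrated with a small cache. This is a hypothetical sketch, not Hudi's file index: a per-partition map of file lists filled lazily from a lister function, where `refresh()` clears the cached entries so that the next lookup sees files added since the first listing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical file-index sketch; not Hudi's implementation. */
class CachedFileIndex {
  private final Function<String, List<String>> lister; // stands in for a file-system listing
  private final Map<String, List<String>> cachedFiles = new HashMap<>();

  CachedFileIndex(Function<String, List<String>> lister) {
    this.lister = lister;
  }

  /** Lists a partition once, then serves the cached result. */
  List<String> filesIn(String partition) {
    return cachedFiles.computeIfAbsent(partition, lister);
  }

  /** Drops cached listings so the next lookup lists the partition again. */
  void refresh() {
    cachedFiles.clear();
  }

  public static void main(String[] args) {
    Map<String, List<String>> fileSystem = new HashMap<>();
    fileSystem.put("p1", new ArrayList<>(List.of("f1")));
    CachedFileIndex index = new CachedFileIndex(p -> new ArrayList<>(fileSystem.get(p)));
    System.out.println(index.filesIn("p1")); // [f1]
    fileSystem.get("p1").add("f2");          // a new file lands after the first listing
    System.out.println(index.filesIn("p1")); // [f1] (stale until refresh)
    index.refresh();
    System.out.println(index.filesIn("p1")); // [f1, f2]
  }
}
```

If `refresh()` did not clear `cachedFiles`, the last lookup would still print `[f1]`, which matches the symptom reported above.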
Re: [PR] [HUDI-7129] Fix bug when upgrade from table version three using UpgradeOrDowngradeProcedure [hudi]
hudi-bot commented on PR #10147: URL: https://github.com/apache/hudi/pull/10147#issuecomment-1821842256 ## CI report: * 994d062df78afd5062dec418cddff167daff42d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21058)
Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]
hudi-bot commented on PR #10120: URL: https://github.com/apache/hudi/pull/10120#issuecomment-1821786342 ## CI report: * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821715566 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]
hudi-bot commented on PR #10150: URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821703634 ## CI report: * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059) * c586dc82aa4c791cabd8f3172ee1f982f71433da UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]
hudi-bot commented on PR #10135: URL: https://github.com/apache/hudi/pull/10135#issuecomment-1821703416 ## CI report: * 3d48bfc5c41a59a1114eb73a5ef9a7b7fda5eccf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21053) * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]
hudi-bot commented on PR #10122: URL: https://github.com/apache/hudi/pull/10122#issuecomment-1821683706 ## CI report: * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch master updated: [HUDI-7115] Add in new options for the bigquery sync (#10125)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a1afcdd989c [HUDI-7115] Add in new options for the bigquery sync (#10125) a1afcdd989c is described below commit a1afcdd989ce2d634290d1bd9e099a17057e6b4d Author: Tim Brown AuthorDate: Tue Nov 21 14:58:12 2023 -0600 [HUDI-7115] Add in new options for the bigquery sync (#10125) - Add in new options for the bigquery sync --- hudi-gcp/pom.xml | 3 +- .../hudi/gcp/bigquery/BigQuerySyncConfig.java | 20 .../apache/hudi/gcp/bigquery/BigQuerySyncTool.java | 23 + .../gcp/bigquery/HoodieBigQuerySyncClient.java | 58 +++--- .../hudi/gcp/bigquery/TestBigQuerySyncConfig.java | 2 +- .../hudi/gcp/bigquery/TestBigQuerySyncTool.java| 12 ++--- .../gcp/bigquery/TestBigQuerySyncToolArgs.java | 8 ++- .../gcp/bigquery/TestHoodieBigQuerySyncClient.java | 26 ++ 8 files changed, 114 insertions(+), 38 deletions(-) diff --git a/hudi-gcp/pom.xml b/hudi-gcp/pom.xml index b1cfb8076a6..2c308fbf424 100644 --- a/hudi-gcp/pom.xml +++ b/hudi-gcp/pom.xml @@ -36,7 +36,7 @@ See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google com.google.cloud libraries-bom -25.1.0 +26.15.0 pom import @@ -70,7 +70,6 @@ See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google com.google.cloud google-cloud-pubsub - ${google.cloud.pubsub.version} diff --git a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java index 94510ca8dfa..ed8895ca217 100644 --- a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java +++ b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java @@ -122,6 +122,20 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable .markAdvanced() 
.withDocumentation("Fetch file listing from Hudi's metadata"); + public static final ConfigProperty BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER = ConfigProperty + .key("hoodie.gcp.bigquery.sync.require_partition_filter") + .defaultValue(false) + .sinceVersion("0.14.1") + .markAdvanced() + .withDocumentation("If true, configure table to require a partition filter to be specified when querying the table"); + + public static final ConfigProperty BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID = ConfigProperty + .key("hoodie.gcp.bigquery.sync.big_lake_connection_id") + .noDefaultValue() + .sinceVersion("0.14.1") + .markAdvanced() + .withDocumentation("The Big Lake connection ID to use"); + public BigQuerySyncConfig(Properties props) { super(props); setDefaults(BigQuerySyncConfig.class.getName()); @@ -147,6 +161,10 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable public String sourceUri; @Parameter(names = {"--source-uri-prefix"}, description = "Name of the source uri gcs path prefix of the table", required = false) public String sourceUriPrefix; +@Parameter(names = {"--big-lake-connection-id"}, description = "The Big Lake connection ID to use when creating the table if using the manifest file approach.") +public String bigLakeConnectionId; +@Parameter(names = {"--require-partition-filter"}, description = "If true, configure table to require a partition filter to be specified when querying the table") +public Boolean requirePartitionFilter; public boolean isHelp() { return hoodieSyncConfigParams.isHelp(); @@ -164,6 +182,8 @@ public class BigQuerySyncConfig extends HoodieSyncConfig implements Serializable props.setPropertyIfNonNull(BIGQUERY_SYNC_SYNC_BASE_PATH.key(), hoodieSyncConfigParams.basePath); props.setPropertyIfNonNull(BIGQUERY_SYNC_PARTITION_FIELDS.key(), StringUtils.join(",", hoodieSyncConfigParams.partitionFields)); props.setPropertyIfNonNull(BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA.key(), 
hoodieSyncConfigParams.useFileListingFromMetadata); + props.setPropertyIfNonNull(BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID.key(), bigLakeConnectionId); + props.setPropertyIfNonNull(BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER.key(), requirePartitionFilter); return props; } } diff --git a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java index 19c8449f8fa..28c071e5231 100644 --- a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java +++ b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java
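The two new BigQuery sync options are plain property keys, and the CLI values are copied in only when they are non-null. The sketch below mirrors that pattern with `java.util.Properties`. The class name, the `setIfNonNull` helper, and the sample connection ID are hypothetical; only the two property keys are taken from the diff above. Because `requirePartitionFilter` is declared as a boxed `Boolean`, an omitted flag stays `null` and nothing is written. The `ConfigProperty` default then presumably applies: `false` for the partition-filter option, and no default for the connection ID.

```java
import java.util.Properties;

public class BigQuerySyncPropsSketch {
  // Keys copied from the BigQuerySyncConfig diff above.
  static final String REQUIRE_PARTITION_FILTER = "hoodie.gcp.bigquery.sync.require_partition_filter";
  static final String BIG_LAKE_CONNECTION_ID = "hoodie.gcp.bigquery.sync.big_lake_connection_id";

  // Hypothetical helper mirroring the setPropertyIfNonNull calls in the diff:
  // a null value (flag not supplied) leaves the key unset.
  static void setIfNonNull(Properties props, String key, Object value) {
    if (value != null) {
      props.setProperty(key, String.valueOf(value));
    }
  }

  static Properties toProps(String bigLakeConnectionId, Boolean requirePartitionFilter) {
    Properties props = new Properties();
    setIfNonNull(props, BIG_LAKE_CONNECTION_ID, bigLakeConnectionId);
    setIfNonNull(props, REQUIRE_PARTITION_FILTER, requirePartitionFilter);
    return props;
  }

  public static void main(String[] args) {
    // Neither flag supplied: no keys are set.
    System.out.println(toProps(null, null).isEmpty());  // prints true
    // Hypothetical connection ID value, for illustration only.
    Properties props = toProps("my-project.us.my-connection", true);
    System.out.println(props.getProperty(REQUIRE_PARTITION_FILTER));  // prints true
  }
}
```

The resulting `Properties` object is the kind of input the `BigQuerySyncConfig(Properties props)` constructor shown in the diff accepts.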
Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]
nsivabalan merged PR #10125: URL: https://github.com/apache/hudi/pull/10125
Re: [PR] [HUDI-7129] Fix bug when upgrade from table version three using UpgradeOrDowngradeProcedure [hudi]
hudi-bot commented on PR #10147: URL: https://github.com/apache/hudi/pull/10147#issuecomment-1821620165 ## CI report: * 1dee5fb303eff272371c638d07d80806676fd5aa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21054) * 994d062df78afd5062dec418cddff167daff42d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21058) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7130] Adding support for configuring value serializer with JsonKakfaSource [hudi]
hudi-bot commented on PR #10149: URL: https://github.com/apache/hudi/pull/10149#issuecomment-1821620236 ## CI report: * e809a39b71dcfa3ddcfc6348b6740391b2a08dbd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21055) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch branch-0.x created (now 0908f648152)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git

at 0908f648152 [HUDI-6999] Adding row writer support to HoodieStreamer (#9913)

This branch includes the following new commits:

new 226a46d4841 [HUDI-6846] Fix a bug of consistent bucket index clustering (#9679)
new 69225bc9bf6 [HUDI-6823] instantiate writeTimer in StreamWriteOperatorCoordinator (#9637)
new a30c904608a [HUDI-6853] ArchiveCommitsProcedure should throw an exception when the archive operation executes failed (#9703)
new 4afc077f56b [MINOR] Fix hbase index config improper use (#9582)
new e870ef66653 [HUDI-6630] Automatic release connection for hoodie metaserver client (#9340)
new 20c5ef50bdf [HUDI-6862] Replace directory connector markers in TestSqlStatement (#9458)
new 9e647b17ea1 [HUDI-6847] Improve the incremental clean fallback logic (#9681)
new 903933f607b [HUDI-6848] Fix non-unique uid for hudi operators (#9680)
new 68ea64f7e24 [MINOR] Close record readers in TestHoodieReaderWriterBase after use during tests (#9504)
new ea0c7fa7e29 [HUDI-6870] Pass project ID to BigQuery job (#9730)
new e0b2fb67816 [HUDI-6865] Fix InternalSchema schemaId when column is dropped (#9724)
new fa04fb901f1 [MINOR] Enhancing validate staged bundles script (#8591)
new 4c288b35053 [HUDI-6871] BigQuery sync improvements (#9741)
new 2bd4d3618aa [HUDI-6708] Support record level indexing with async indexer (#9517)
new b786ce7b491 [MINOR] Close resources in tests (#9685)
new 7ee50a13f4a [MINOR] Fix default config values if not specified (#9625)
new aea93b3b71c [HUDI-6882] Differentiate between replacecommits in cluster planning (#9755)
new e4f53c5334f [MINOR] Set connection settings for maven to avoid build flakiness (#9772)
new d7d0b0e5d09 [MINOR] Mark a few new configs advanced and tag since version of 0.14.0 (#9771)
new b32be910dbb [HUDI-6881] Hudi configured spark.scheduler.allocation.file should include scheme since Spark3.2 (#9754)
new 0ab1beb4e18 [HUDI-6011] Fix cli show archived commits breaks for replacecommit (#8345)
new b688181616c [HUDI-5924] Fixing cli clean command to trim down a subset based on start and end (#8169)
new 073b36a2da5 [MINOR] Fix the check for connector identity in HoodieHiveCatalog (#9770)
new 936ece380ec [HUDI-6062] Fix irregular enum config (#8564)
new 0dd2e0aa055 [HUDI-6893] Copy the trino bundle to override the one in the image (#9781)
new a6aec4719cd [HUDI-6827] Fix task failure when insert into empty dataset (#9797)
new b535919ab7d [HUDI-6892] ExternalSpillableMap may cause data duplication when flink compaction (#9778)
new c935303ce51 fixing build/compilation issue. Fixed missing import in HoodieTableMetadataUtil
new b9980984f2e [HUDI-6922] Fix inconsistency between base file format and catalog input format (#9830)
new 757b0a529ab [HUDI-6828] Fix wrong partitionToReplaceIds when insertOverwrite empty data into partitions (#9811)
new c88d6ffcbd5 [MINOR] Disable falky integration test temporarily (#9823)
new bab7a1ed44a [HUDI-6916] Improve performance of Custom Key Generators (#9821)
new a66cf28be04 [HUDI-6913] Set default database name correctly (#9816)
new c925d98c170 [HUDI-5911] SimpleTransactionDirectMarkerBasedDetectionStrategy can't work with none-partitioned table (#8143)
new 05867751b3b [HUDI-6926] Disable DROP_PARTITION_COLUMNS when upsert MOR table (#9840)
new fcb7c89fe75 [HUDI-6873] fix clustering mor (#9774)
new 42f09b3d4ff [HUDI-6927] CDC file clean not work (#9841)
new 25db3575fe5 [HUDI-6917] Fix docker integ tests (#9843)
new 8c616c1fc74 Fixing build failures with InsertIntoHoodieTableCommand
new 9665ef44928 [HUDI-6937] CopyOnWriteInsertHandler#consume cause clustering performance degradation (#9851)
new b8186d11303 Follow up HUDI-6937, fix the RealtimeCompactedRecordReader props instantiation (#9853)
new 63d513ef543 [HUDI-6894] ReflectionUtils is not thread safe (#9786)
new 14e89fd7866 [HUDI-6941] Fix partition pruning for multiple partition fields (#9863)
new 93d6a66b577 [HUDI-6944] Fix flink boostrap concurrency issue (#9867)
new 3e33ecde8ba [HUDI-6945] Fix HoodieRowDataParquetWriter cast issue (#9868)
new bca004c3a09 [HUDI-6924] Fix hoodie table config not wok in table properties (#9836)
new e60690a52cd [HUDI-6950] Query should process listed partitions to avoid driver oom due to large number files in table first partition (#9875)
new 7121c9826b0 [MINOR] HFileBootstrapIndex: use try-with-resources in two places (#9813)
new 871f8b7e6e1 [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming (#9053)
new bee5e5c5da9 [HUDI-5031] Fix MERGE INTO creates empty partition files
Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]
abhisheksahani91 commented on issue #10138: URL: https://github.com/apache/hudi/issues/10138#issuecomment-1821565443 @ad1happy2go I also want to add that the connection-refused error is observed when I generate high load on Hudi ingestion. [screenshot: https://github.com/apache/hudi/assets/122790088/a4802067-b8a6-4c57-9a09-f1f22036842e] In the screenshot above, you can see Hudi reading more than 1 million records from Kafka in a single read; after the second read, async compaction was triggered, and it resulted in the connection-refused error. [screenshot: https://github.com/apache/hudi/assets/122790088/cc99d2e0-024b-49a1-85d5-78c2e852ce14]
Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]
hudi-bot commented on PR #10152: URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821555262 ## CI report: * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN * d0fe92994777e2067d654e2585c75c91616f8598 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061) * b1748e270c379c479bb3286e635482d204b853c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]
hudi-bot commented on PR #10014: URL: https://github.com/apache/hudi/pull/10014#issuecomment-1821554730 ## CI report: * 898d03c01442d0b4ac84056f25ff49f1f9aba0c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20733) * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]
hudi-bot commented on PR #10135: URL: https://github.com/apache/hudi/pull/10135#issuecomment-1821544132 ## CI report: * 4913158456e1dfaa1366ba7bd5029578f3bf4cef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21041) * 3d48bfc5c41a59a1114eb73a5ef9a7b7fda5eccf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21053) * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]
hudi-bot commented on PR #10152: URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821544519 ## CI report: * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN * d0fe92994777e2067d654e2585c75c91616f8598 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061) * b1748e270c379c479bb3286e635482d204b853c5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]
hudi-bot commented on PR #10014: URL: https://github.com/apache/hudi/pull/10014#issuecomment-1821543331 ## CI report: * 898d03c01442d0b4ac84056f25ff49f1f9aba0c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20733) * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build