Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]

2023-11-21 Thread via GitHub


nsivabalan commented on PR #10073:
URL: https://github.com/apache/hudi/pull/10073#issuecomment-1822259003

   testHoodieAsyncClusteringJobWithScheduleAndExecute{String, HoodieRecordType}[1] is known to be flaky.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


majian1998 closed pull request #10120: [HUDI-7110] Add call procedure for show 
column stats information
URL: https://github.com/apache/hudi/pull/10120





Re: [PR] [HUDI-7133] Improve dbt example for better guidance [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10155:
URL: https://github.com/apache/hudi/pull/10155#issuecomment-1822243889

   
   ## CI report:
   
   * af912a3fed7270708fad935b7df55fb508cd5536 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21090)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822243792

   
   ## CI report:
   
   * 7a678c8f26e7b94fca3812d29e9ddca59b083127 UNKNOWN
   
   



Re: [PR] [HUDI-7123] Improve CI scripts [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10136:
URL: https://github.com/apache/hudi/pull/10136#issuecomment-1822243691

   
   ## CI report:
   
   * dd3d933d329208fcdf9c00ed2dcb12a7e22cce26 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20982)
 
   * 0aede30a2da391f50f6d750b102901e811b35880 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21089)
 
   
   



Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10122:
URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822243625

   
   ## CI report:
   
   * bbf765005b4e1e92730e0dff736bcb561d928b7b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21081)
 
   
   



Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10097:
URL: https://github.com/apache/hudi/pull/10097#issuecomment-1822243447

   
   ## CI report:
   
   * 43400ce2317882c76a68eb3a855c9dd814c92234 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21077)
 
   
   



Re: [PR] [HUDI-7133] Improve dbt example for better guidance [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10155:
URL: https://github.com/apache/hudi/pull/10155#issuecomment-1822235911

   
   ## CI report:
   
   * af912a3fed7270708fad935b7df55fb508cd5536 UNKNOWN
   
   



Re: [PR] [HUDI-7123] Improve CI scripts [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10136:
URL: https://github.com/apache/hudi/pull/10136#issuecomment-1822235518

   
   ## CI report:
   
   * dd3d933d329208fcdf9c00ed2dcb12a7e22cce26 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20982)
 
   * 0aede30a2da391f50f6d750b102901e811b35880 UNKNOWN
   
   



Re: [I] [SUPPORT] hudi-examples-dbt not running with spark thrift server [hudi]

2023-11-21 Thread via GitHub


xushiyan commented on issue #6125:
URL: https://github.com/apache/hudi/issues/6125#issuecomment-181787

   @sambhav13 I'm updating the instructions in the dbt example (using spark 3.2 
and hudi 0.14.0). Please check this out and let us know if it helps.
   
   
https://github.com/apache/hudi/blob/af912a3fed7270708fad935b7df55fb508cd5536/hudi-examples/hudi-examples-dbt/README.md





[jira] [Updated] (HUDI-7133) Improve dbt example for better guidance

2023-11-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7133:
-
Labels: pull-request-available  (was: )

> Improve dbt example for better guidance
> ---
>
> Key: HUDI-7133
> URL: https://issues.apache.org/jira/browse/HUDI-7133
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7133] Improve dbt example for better guidance [hudi]

2023-11-21 Thread via GitHub


xushiyan opened a new pull request, #10155:
URL: https://github.com/apache/hudi/pull/10155

   ### Change Logs
   
   Update dbt example with more detailed instructions.
   
   ### Impact
   
   Improve dbt example for learning.
   
   ### Risk level
   
   None.
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7133) Improve dbt example for better guidance

2023-11-21 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-7133:


 Summary: Improve dbt example for better guidance
 Key: HUDI-7133
 URL: https://issues.apache.org/jira/browse/HUDI-7133
 Project: Apache Hudi
  Issue Type: Improvement
  Components: docs
Reporter: Raymond Xu
Assignee: Raymond Xu
 Fix For: 0.14.1








[jira] [Closed] (HUDI-7096) Improve Incr Query for partitions touched based on start and end

2023-11-21 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7096.
-
Fix Version/s: 0.14.1
   Resolution: Fixed

> Improve Incr Query for partitions touched based on start and end
> 
>
> Key: HUDI-7096
> URL: https://issues.apache.org/jira/browse/HUDI-7096
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>
> We could improve the incremental query by fetching partitions directly from 
> the commit metadata of the commits between start and end. This avoids polling 
> the metadata table or doing a file-system-based listing to fetch partitions in 
> FileIndex.
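The idea in the description can be sketched as follows. This is a minimal illustration, not Hudi's implementation: `CommitMetadata`, `partitions_touched`, the sample timeline, and the choice of an exclusive start and inclusive end are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommitMetadata:
    """Hypothetical stand-in for the metadata stored with one completed commit."""
    instant_time: str
    # partition path -> names of the files written in that partition
    partition_to_files: Dict[str, List[str]] = field(default_factory=dict)


def partitions_touched(timeline: List[CommitMetadata], start: str, end: str) -> List[str]:
    """Collect partitions written by commits with start < instant_time <= end.

    Only the commit metadata is read; no table-wide listing is needed.
    Instant times are fixed-width timestamps, so string comparison orders them.
    """
    touched = set()
    for commit in timeline:
        if start < commit.instant_time <= end:
            touched.update(commit.partition_to_files)
    return sorted(touched)


# Hypothetical timeline with three commits.
timeline = [
    CommitMetadata("20231120090000", {"dt=2023-11-18": ["f1.parquet"]}),
    CommitMetadata("20231121090000", {"dt=2023-11-20": ["f2.parquet"]}),
    CommitMetadata(
        "20231121100000",
        {"dt=2023-11-20": ["f3.parquet"], "dt=2023-11-21": ["f4.parquet"]},
    ),
]

print(partitions_touched(timeline, "20231120090000", "20231121100000"))
```

The cost of this lookup grows with the number of commits in the requested range rather than with the number of partitions in the table, which is the motivation stated in the ticket.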





Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10014:
URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822191008

   
   ## CI report:
   
   * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070)
 
   * 32469364d39eebf69fb001955aa2cccdfc772f1c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21087)
 
   
   



(hudi) branch master updated (2522f6de6f1 -> c5af85dfd91)

2023-11-21 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 2522f6de6f1 [HUDI-7128] DeleteMarkerProcedures support delete in batch 
mode (#10148)
 add c5af85dfd91 [HUDI-7096] Improving incremental query to fetch 
partitions based on commit metadata (#10098)

No new revisions were added by this update.

Summary of changes:
 .../hudi/client/BaseHoodieTableServiceClient.java  |  4 +++
 .../org/apache/hudi/BaseHoodieTableFileIndex.java  | 30 --
 .../hudi/common/testutils/HoodieTestUtils.java | 10 +++-
 .../hudi/hadoop/HiveHoodieTableFileIndex.java  |  4 ++-
 .../scala/org/apache/hudi/HoodieFileIndex.scala|  4 ++-
 .../apache/hudi/SparkHoodieTableFileIndex.scala|  8 --
 .../hudi/TestHoodieMergeHandleWithSparkMerger.java |  6 -
 .../org/apache/hudi/functional/TestBootstrap.java  |  7 ++---
 8 files changed, 62 insertions(+), 11 deletions(-)



Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]

2023-11-21 Thread via GitHub


nsivabalan merged PR #10098:
URL: https://github.com/apache/hudi/pull/10098





Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10152:
URL: https://github.com/apache/hudi/pull/10152#issuecomment-1822183860

   
   ## CI report:
   
   * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN
   * b1748e270c379c479bb3286e635482d204b853c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071)
 
   * 6407bc6c69ed9e43b970dbd4a7d5a441ebe45150 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21086)
 
   
   



Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10120:
URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822183724

   
   ## CI report:
   
   * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056)
 
   * 946ec17878c5110741fa3b1e3bbff4fc804d77ed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21085)
 
   
   



Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10014:
URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822183538

   
   ## CI report:
   
   * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070)
 
   * 32469364d39eebf69fb001955aa2cccdfc772f1c UNKNOWN
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822176875

   
   ## CI report:
   
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
 
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
 
   * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080)
 
   * 7a678c8f26e7b94fca3812d29e9ddca59b083127 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21083)
 
   
   



Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10120:
URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822176736

   
   ## CI report:
   
   * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056)
 
   * 946ec17878c5110741fa3b1e3bbff4fc804d77ed UNKNOWN
   
   



Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10152:
URL: https://github.com/apache/hudi/pull/10152#issuecomment-1822169857

   
   ## CI report:
   
   * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN
   * b1748e270c379c479bb3286e635482d204b853c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071)
 
   * 6407bc6c69ed9e43b970dbd4a7d5a441ebe45150 UNKNOWN
   
   



Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10073:
URL: https://github.com/apache/hudi/pull/10073#issuecomment-1822169590

   
   ## CI report:
   
   * 48df6bbec2473dbbbedb1b723896acb17056e80f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21076)
 
   
   



Re: [I] Fail to add default partition [hudi]

2023-11-21 Thread via GitHub


ad1happy2go commented on issue #10154:
URL: https://github.com/apache/hudi/issues/10154#issuecomment-1822169099

   @njalan Does the partition column in your data contain NULLs? When do you hit 
this error? It looks like you are trying to add the null partition. This may be 
a Hive-related issue rather than a Hudi one.
   
   You may try - 
   ALTER TABLE ods_xxx.xx ADD IF NOT EXISTS PARTITION (xx=null) LOCATION 
'/HIVE_DEFAULT_PARTITION'





Re: [I] [SUPPORT] Clean action failure triggers an exception while trying to check whether metadata is a table [hudi]

2023-11-21 Thread via GitHub


ad1happy2go commented on issue #10127:
URL: https://github.com/apache/hudi/issues/10127#issuecomment-1822166971

   @shubhamn21 Were you able to resolve this issue?





Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub


ad1happy2go commented on issue #10138:
URL: https://github.com/apache/hudi/issues/10138#issuecomment-1822164822

   @abhisheksahani91 This looks related to the following, which is yet to be fixed. 
   https://github.com/apache/hudi/pull/5269
   
   To unblock you can disable the timeline server for now - 
   ```
 hoodie.write.markers.type= 'direct',
 hoodie.embed.timeline.server= 'false'
   ```
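As a rough sketch of how the two settings above might be merged into an existing set of writer options: the table name and the `with_workaround` helper are hypothetical, and only the two option keys and their values come from the comment.

```python
# Hypothetical base options for an existing Hudi writer; the table name is a placeholder.
base_options = {
    "hoodie.table.name": "example_table",
}

# The two settings quoted above, passed as plain string values.
workaround_options = {
    "hoodie.write.markers.type": "direct",
    "hoodie.embed.timeline.server": "false",
}


def with_workaround(options):
    """Return a copy of the writer options with the workaround applied."""
    merged = dict(options)
    merged.update(workaround_options)
    return merged


hudi_options = with_workaround(base_options)
for key in sorted(hudi_options):
    print(f"{key}={hudi_options[key]}")
```

With a PySpark writer, a dictionary like this would typically be passed via `.options(**hudi_options)`. Returning a copy leaves the base options untouched, so the workaround is easy to drop once the underlying issue is fixed.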





Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


majian1998 commented on PR #10120:
URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822158300

   Improvement: use FileSystemView to obtain the latest file slices, so that only 
valid, up-to-date file information is displayed when showing column stats. 
   Also rebased onto the latest master branch to rerun the tests.
   cc @danny0405 @stream2000 





[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HUDI-7132:
-
Description: 
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
some elements of eventBuffer are null.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21

We can create a new FailedWriteMetadataEvent for `subtaskFailed`.


  was:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
some elements of eventBuffer are null.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21




> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 1.1.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
> some elements of eventBuffer are null.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
> We can create a new FailedWriteMetadataEvent for `subtaskFailed`.
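The difference between clearing a buffer slot and recording an explicit failure event can be illustrated with a small sketch. It does not reproduce `StreamWriteOperatorCoordinator`; `WriteEvent`, both `subtask_failed_*` helpers, and `try_commit` are hypothetical names, and the commit rule shown is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WriteEvent:
    """Hypothetical stand-in for the metadata one subtask reports before a commit."""
    task_id: int
    records_written: int
    failed: bool = False


def subtask_failed_with_null(buffer: List[Optional[WriteEvent]], task_id: int) -> None:
    # Clearing the slot makes a failed subtask indistinguishable from an idle one.
    buffer[task_id] = None


def subtask_failed_with_marker(buffer: List[Optional[WriteEvent]], task_id: int) -> None:
    # Recording an explicit failure marker keeps the failure visible to the committer.
    buffer[task_id] = WriteEvent(task_id, 0, failed=True)


def try_commit(buffer: List[Optional[WriteEvent]]) -> Optional[int]:
    """Return the committed record count, or None when the commit must be skipped."""
    if any(event is not None and event.failed for event in buffer):
        return None
    # Null slots are skipped, so whatever they used to hold is silently dropped.
    return sum(event.records_written for event in buffer if event is not None)


buffer_a = [WriteEvent(0, 10), WriteEvent(1, 20)]
subtask_failed_with_null(buffer_a, 1)
print(try_commit(buffer_a))  # commits subtask 0 only; subtask 1's records are lost

buffer_b = [WriteEvent(0, 10), WriteEvent(1, 20)]
subtask_failed_with_marker(buffer_b, 1)
print(try_commit(buffer_b))  # the commit is skipped because the failure is visible
```

In this sketch, the marker turns a silent partial commit into a detectable condition. Whether skipping the commit is the right reaction in the real coordinator is a separate design question.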





[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HUDI-7132:
-
Description: 
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
some elements of eventBuffer are null.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



  was:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
some elements of eventBuffer are null.
We can add a lock around these two lines to solve the problem.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21




> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 1.1.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line of code runs, eventBuffer may be updated by `subtaskFailed`, so 
> some elements of eventBuffer are null.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21





Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


danny0405 commented on PR #10120:
URL: https://github.com/apache/hudi/pull/10120#issuecomment-1822104388

   You have some test failures: 
https://github.com/apache/hudi/actions/runs/6940997119/job/18914907624?pr=10120
   You can rebase onto the latest master to fix the Azure tests.





Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10135:
URL: https://github.com/apache/hudi/pull/10135#issuecomment-1822079113

   
   ## CI report:
   
   * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069)
 
   
   



Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10014:
URL: https://github.com/apache/hudi/pull/10014#issuecomment-1822078574

   
   ## CI report:
   
   * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070)
 
   
   



Re: [I] [SUPPORT] Invalid number of file groups for partition:column_stats [hudi]

2023-11-21 Thread via GitHub


ocean-zhc commented on issue #7657:
URL: https://github.com/apache/hudi/issues/7657#issuecomment-1822048995

   > I came across the same problem using version 0.12.0. I set
   > 
   > hoodie.metadata.index.bloom.filter.enable=false
   > hoodie.metadata.index.column.stats.enable=false
   > 
   > Setting these configs to false helped me bypass this error.
   
   Thanks!





(hudi) branch master updated (a1afcdd989c -> 2522f6de6f1)

2023-11-21 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a1afcdd989c [HUDI-7115] Add in new options for the bigquery sync (#10125)
 add 2522f6de6f1 [HUDI-7128] DeleteMarkerProcedures support delete in batch mode (#10148)

No new revisions were added by this update.

Summary of changes:
 .../command/procedures/DeleteMarkerProcedure.scala | 11 +++-
 .../procedures/DeleteSavepointProcedure.scala  | 37 +--
 .../sql/hudi/procedure/TestCallProcedure.scala | 44 ++
 .../hudi/procedure/TestSavepointsProcedure.scala   | 71 ++
 4 files changed, 143 insertions(+), 20 deletions(-)



[jira] [Closed] (HUDI-7128) DeleteProcedures support delete in batch mode

2023-11-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7128.

Resolution: Fixed

Fixed via master branch: 2522f6de6f13f44bac89c81bb753c58a52cc780c

> DeleteProcedures support delete in batch mode
> -
>
> Key: HUDI-7128
> URL: https://issues.apache.org/jira/browse/HUDI-7128
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>
> DeleteMarkerProcedures support delete in batch mode.
> E.g., if a user wants to delete 100 or more markers, before this PR a Spark SQL 
> job has to be executed 100 times; after this PR a single job is enough, which 
> greatly reduces execution time in Spark SQL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7128] DeleteProcedures support batch mode [hudi]

2023-11-21 Thread via GitHub


danny0405 merged PR #10148:
URL: https://github.com/apache/hudi/pull/10148





[jira] [Assigned] (HUDI-7059) Read record positions with filter pushdown using Spark parquet reader

2023-11-21 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7059:
---

Assignee: Lin Liu

> Read record positions with filter pushdown using Spark parquet reader
> -
>
> Key: HUDI-7059
> URL: https://issues.apache.org/jira/browse/HUDI-7059
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
>






Re: [PR] [HUDI-7128] DeleteProcedures support batch mode [hudi]

2023-11-21 Thread via GitHub


xuzifu666 commented on PR #10148:
URL: https://github.com/apache/hudi/pull/10148#issuecomment-1822035400

   @danny0405 @yihua PTAL. The CI error does not seem to be related to the changed code.





Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822035335

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
   * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080)
   * 7a678c8f26e7b94fca3812d29e9ddca59b083127 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21083)
   
   



Re: [PR] [MINOR] Making misc fixes to deltastreamer sources(S3 and GCS) [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10095:
URL: https://github.com/apache/hudi/pull/10095#issuecomment-1822035213

   
   ## CI report:
   
   * b77d51aac5e370b00bab3acfccd471cf03a1c718 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21015)
   * 6b39eb40210756b1cb6c50317690d5df99e09d9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21082)
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1822030407

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
   * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080)
   * 7a678c8f26e7b94fca3812d29e9ddca59b083127 UNKNOWN
   
   



Re: [PR] [MINOR] Making misc fixes to deltastreamer sources(S3 and GCS) [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10095:
URL: https://github.com/apache/hudi/pull/10095#issuecomment-1822030206

   
   ## CI report:
   
   * b77d51aac5e370b00bab3acfccd471cf03a1c718 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21015)
   * 6b39eb40210756b1cb6c50317690d5df99e09d9d UNKNOWN
   
   



Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10122:
URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822025391

   
   ## CI report:
   
   * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057)
   * bbf765005b4e1e92730e0dff736bcb561d928b7b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21081)
   
   



Re: [PR] [HUDI-7097] Fixing instantiation of Hms Uri with HiveSync tool [hudi]

2023-11-21 Thread via GitHub


nsivabalan commented on code in PR #10099:
URL: https://github.com/apache/hudi/pull/10099#discussion_r1401453914


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -103,15 +103,29 @@ public class HiveSyncTool extends HoodieSyncTool implements AutoCloseable {
 
   public HiveSyncTool(Properties props, Configuration hadoopConf) {
     super(props, hadoopConf);
-    String metastoreUris = props.getProperty(METASTORE_URIS.key());
-    // Give precedence to HiveConf.ConfVars.METASTOREURIS if it is set.
-    // Else if user has provided HiveSyncConfigHolder.METASTORE_URIS, then set that in hadoop conf.
-    if (isNullOrEmpty(hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname)) && nonEmpty(metastoreUris)) {
-      LOG.info(String.format("Setting %s = %s", HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris));
-      hadoopConf.set(HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris);
+    String configuredMetastoreUris = props.getProperty(METASTORE_URIS.key());
+    String existingHadoopConfMetastoreUris = hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname);

Review Comment:
   Appreciate it. Agreed, then; we can simplify.
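
The precedence rule stated in the removed comment of the hunk above can be sketched without Hadoop or Hive on the classpath. This is a minimal illustration, not the code in the PR: a plain `Map` stands in for the Hadoop `Configuration`, `hive.metastore.uris` is the property name behind `HiveConf.ConfVars.METASTOREURIS`, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map stands in for the Hadoop Configuration object.
class MetastoreUriPrecedence {
    static final String HIVE_METASTORE_URIS = "hive.metastore.uris";

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // An existing value in the conf wins; the sync-config value is applied
    // only when the conf has none. Returns true when the conf was updated.
    static boolean applyConfiguredUris(Map<String, String> conf, String configuredUris) {
        if (isNullOrEmpty(conf.get(HIVE_METASTORE_URIS)) && !isNullOrEmpty(configuredUris)) {
            conf.put(HIVE_METASTORE_URIS, configuredUris);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(applyConfiguredUris(conf, "thrift://host-a:9083"));
        System.out.println(applyConfiguredUris(conf, "thrift://host-b:9083"));
        System.out.println(conf.get(HIVE_METASTORE_URIS));
    }
}
```

Returning a boolean makes it easy for a caller to log only when the value was actually set, as the removed `LOG.info` call did.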
   






Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10122:
URL: https://github.com/apache/hudi/pull/10122#issuecomment-1822019706

   
   ## CI report:
   
   * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057)
   * bbf765005b4e1e92730e0dff736bcb561d928b7b UNKNOWN
   
   



[jira] [Commented] (HUDI-7131) The requested schema is not compatible with the file schema

2023-11-21 Thread loukey_j (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788607#comment-17788607
 ] 

loukey_j commented on HUDI-7131:


The schema of the table has not changed; only the partition value of the data 
has changed.

> The requested schema is not compatible with the file schema
> ---
>
> Key: HUDI-7131
> URL: https://issues.apache.org/jira/browse/HUDI-7131
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: loukey_j
>Priority: Critical
>  Labels: core, merge, spark
>
> When a global index is used and the partition of a record changes, an error is 
> reported: The requested schema is not compatible with the file schema...
> Why not use the schema from 
> org.apache.hudi.common.table.TableSchemaResolver#getTableAvroSchemaInternal 
> to read Hudi data?
>  
> CREATE TABLE if not exists unisql.hudi_ut_time_traval
> (id INT, version INT, name STRING, birthDate TIMESTAMP, inc_day STRING) USING HUDI
> PARTITIONED BY (inc_day) TBLPROPERTIES (type='cow', primaryKey='id');
> insert into unisql.hudi_ut_time_traval
> select 1 as id, 1 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-01' as date) as inc_day;
> select * from hudi_ut_time_traval;
> +---+-+--+--++---+---+-+---+--+
> |_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name |id |version|name |birthDate |inc_day |
> +---+-+--+--++---+---+-+---+--+
> |20231122100234339 |20231122100234339_0_0|1 |inc_day=2023-10-01 |8a510742-c060-4d12-898e-70bbd122f2e3-0_0-19-16_20231122100234339.parquet|1 |1 |str_1|2023-01-01 12:12:12|2023-10-01|
> +---+-+--+--++---+---+-+---+--+
> merge into hudi_ut_time_traval t using (
> select 1 as id, 2 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-02' as date) as inc_day
> ) s on t.id=s.id when matched THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *
> Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required int32 id != optional int32 id
> at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101)
> at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81)
> at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
> at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
> at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:162)
> at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:135)
> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:225);
> parquet schema:
> {
> "type" : "record",
> "name" : "hudi_ut_time_traval_record",
> "namespace" : "hoodie.hudi_ut_time_traval",
> "fields" : [ {
> "name" : "_hoodie_commit_time",
> "type" : [ "null", "string" ],
> "doc" : "",
> "default" : null
> }, {
> "name" : "_hoodie_commit_seqno",
> "type" : [ "null", "string" ],
> "doc" : "",
> "default" : null
> }, {
> "name" : "_hoodie_record_key",
> "type" : [ "null", "string" ],
> "doc" : "",
> "default" : null
> }, {
> "name" : "_hoodie_partition_path",
> "type" : [ "null", "string" ],
> "doc" : "",
> "default" : null
> }, {
> "name" : "_hoodie_file_name",
> "type" : [ "null", "string" ],
> "doc" : "",
> "default" : null
> }, {
> "name" : "id",
> "type" : [ "null", "int" ],
> "default" : null
> }, {
> "name" : "version",
> "type" : [ "null", "int" ],
> "default" : null
> }, {
> "name" : "name",
> "type" : [ "null", "string" ],
> "default" : null
> }, {
> "name" : "birthDate",
> "type" : [ "null", {
> "type" : "long",
> "logicalType" : "timestamp-micros"
> } ],
> "default" : null
> }, {
> "name" : "inc_day",
> "type" : [ "null", "string" ],
> "default" : null
> } ]
> }
> org.apache.hudi.io.HoodieMergedReadHandle#readerSchema:
> 

[jira] [Commented] (HUDI-7131) The requested schema is not compatible with the file schema

2023-11-21 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788605#comment-17788605
 ] 

Danny Chen commented on HUDI-7131:
--

It looks like a known issue; we do not support schema evolution on partition 
fields yet.
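
The message in the reported trace, `required int32 id != optional int32 id`, suggests the reader requested `id` as required while the data file stores it as optional. The following is a simplified model of that kind of nullability check, written as a sketch under that assumption; it is not Parquet's actual `ColumnIOFactory` code, and all names in it are hypothetical.

```java
import java.util.Locale;

// Sketch only: a simplified model of a requested-vs-file repetition check.
class RepetitionCheckSketch {
    // Declared from most to least restrictive, so ordinal order encodes restrictiveness.
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    static boolean isMoreRestrictive(Repetition a, Repetition b) {
        return a.ordinal() < b.ordinal();
    }

    static String label(Repetition r) {
        return r.name().toLowerCase(Locale.ROOT);
    }

    // Returns null when the requested field can be read from the file field,
    // otherwise a message in the style of the one in the trace.
    static String check(String typeAndName, Repetition requested, Repetition inFile) {
        if (isMoreRestrictive(requested, inFile)) {
            return "incompatible types: " + label(requested) + " " + typeAndName
                + " != " + label(inFile) + " " + typeAndName;
        }
        return null;
    }

    public static void main(String[] args) {
        // Reader asks for a required column, file holds an optional one: rejected.
        System.out.println(check("int32 id", Repetition.REQUIRED, Repetition.OPTIONAL));
        // Same nullability on both sides: accepted (prints null).
        System.out.println(check("int32 id", Repetition.OPTIONAL, Repetition.OPTIONAL));
    }
}
```

Under this model, reading with a schema in which `id` is nullable, like the Avro schema quoted in the issue, would pass the check.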


[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HUDI-7132:
-
Description: 
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, eventBuffer may be updated by `subtaskFailed`, which 
leaves some elements of eventBuffer null.
We can guard the two linked lines with a lock to solve the problem.
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21



  was:
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, eventBuffer may be updated by `subtaskFailed`, which 
leaves some elements of eventBuffer null.

https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21


> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 1.1.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line executes, eventBuffer may be updated by `subtaskFailed`, which 
> leaves some elements of eventBuffer null.
> We can guard the two linked lines with a lock to solve the problem.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21
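
The locking idea proposed in the description can be sketched as follows. This is an illustration under stated assumptions, not the actual `StreamWriteOperatorCoordinator` code: the buffer holds plain strings instead of write events, and apart from `eventBuffer` and `subtaskFailed`, every name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: one lock guards every read and write of the shared buffer.
class EventBufferSketch {
    private final Object lock = new Object();
    private final String[] eventBuffer;

    EventBufferSketch(int parallelism) {
        this.eventBuffer = new String[parallelism];
    }

    // Called when a subtask reports its write result.
    void handleEvent(int subtask, String event) {
        synchronized (lock) {
            eventBuffer[subtask] = event;
        }
    }

    // Called when a subtask fails: its buffered event is discarded.
    void subtaskFailed(int subtask) {
        synchronized (lock) {
            eventBuffer[subtask] = null;
        }
    }

    // Snapshots and resets the buffer in one critical section, so a concurrent
    // subtaskFailed cannot null out an entry halfway through the read.
    List<String> drainForCommit() {
        synchronized (lock) {
            List<String> drained = new ArrayList<>();
            for (int i = 0; i < eventBuffer.length; i++) {
                if (eventBuffer[i] != null) {
                    drained.add(eventBuffer[i]);
                    eventBuffer[i] = null;
                }
            }
            return drained;
        }
    }

    public static void main(String[] args) {
        EventBufferSketch coordinator = new EventBufferSketch(2);
        coordinator.handleEvent(0, "event-0");
        coordinator.handleEvent(1, "event-1");
        coordinator.subtaskFailed(1);
        System.out.println(coordinator.drainForCommit());
    }
}
```

The commit path works on the returned snapshot after leaving the critical section, which keeps the lock hold time short.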





Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]

2023-11-21 Thread via GitHub


danny0405 commented on PR #10151:
URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821998306

   @VitoMakarevich Nice catch. Can you fix the compile error?
   error file=/home/runner/work/hudi/hudi/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala message=expected start of definition, but was Token(VAL,val,12946,val)





[jira] [Updated] (HUDI-7132) Data may be lost in flink#chk

2023-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HUDI-7132:
-
Description: 
https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
Before this line executes, eventBuffer may be updated by `subtaskFailed`, which 
leaves some elements of eventBuffer null.

https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21

> Data may be lost in flink#chk
> -
>
> Key: HUDI-7132
> URL: https://issues.apache.org/jira/browse/HUDI-7132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 1.1.0
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L524C23-L524C35
> Before this line executes, eventBuffer may be updated by `subtaskFailed`, which 
> leaves some elements of eventBuffer null.
> https://github.com/apache/hudi/blob/a1afcdd989ce2d634290d1bd9e099a17057e6b4d/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java#L305C10-L305C21





[jira] [Created] (HUDI-7132) Data may be lost in flink#chk

2023-11-21 Thread Bo Cui (Jira)
Bo Cui created HUDI-7132:


 Summary: Data may be lost in flink#chk
 Key: HUDI-7132
 URL: https://issues.apache.org/jira/browse/HUDI-7132
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 1.1.0
Reporter: Bo Cui








Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821990146

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
   * 2db7b8bee13140a4756427aeb802bad13822e5af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21080)
   
   



Re: [PR] [HUDI-5968] Fix global index duplicate and handle custom payload when update partition [hudi]

2023-11-21 Thread via GitHub


loukey-lj commented on PR #8490:
URL: https://github.com/apache/hudi/pull/8490#issuecomment-1821987556

   @xushiyan @nsivabalan https://issues.apache.org/jira/browse/HUDI-7131





[jira] [Updated] (HUDI-7131) The requested schema is not compatible with the file schema

2023-11-21 Thread loukey_j (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

loukey_j updated HUDI-7131:
---
Affects Version/s: 0.14.0


Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821984783

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
   * 2db7b8bee13140a4756427aeb802bad13822e5af UNKNOWN
   
   



[jira] [Created] (HUDI-7131) The requested schema is not compatible with the file schema

2023-11-21 Thread loukey_j (Jira)
loukey_j created HUDI-7131:
--

 Summary: The requested schema is not compatible with the file 
schema
 Key: HUDI-7131
 URL: https://issues.apache.org/jira/browse/HUDI-7131
 Project: Apache Hudi
  Issue Type: Bug
Reporter: loukey_j


When a global index is used and a record's partition changes, the merge fails 
with the error: The requested schema is not compatible with the file schema...

Why not use the schema from 
org.apache.hudi.common.table.TableSchemaResolver#getTableAvroSchemaInternal to 
read Hudi data?

 
CREATE TABLE if not exists unisql.hudi_ut_time_traval
(id INT, version INT, name STRING, birthDate TIMESTAMP, inc_day STRING) USING HUDI
PARTITIONED BY (inc_day) TBLPROPERTIES (type='cow', primaryKey='id');

insert into unisql.hudi_ut_time_traval
select 1 as id, 1 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-01' as date) as inc_day;

select * from hudi_ut_time_traval;
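+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+---+-------+-----+-------------------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                       |id |version|name |birthDate          |inc_day   |
+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+---+-------+-----+-------------------+----------+
|20231122100234339  |20231122100234339_0_0|1                 |inc_day=2023-10-01    |8a510742-c060-4d12-898e-70bbd122f2e3-0_0-19-16_20231122100234339.parquet|1  |1      |str_1|2023-01-01 12:12:12|2023-10-01|
+-------------------+---------------------+------------------+----------------------+------------------------------------------------------------------------+---+-------+-----+-------------------+----------+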

merge into hudi_ut_time_traval t using (
select 1 as id, 2 as version, 'str_1' as name, cast('2023-01-01 12:12:12.0' as timestamp) as birthDate, cast('2023-10-02' as date) as inc_day
) s on t.id=s.id when matched THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *

Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required int32 id != optional int32 id
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:162)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:135)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:225);

parquet schema:
{
"type" : "record",
"name" : "hudi_ut_time_traval_record",
"namespace" : "hoodie.hudi_ut_time_traval",
"fields" : [ {
"name" : "_hoodie_commit_time",
"type" : [ "null", "string" ],
"doc" : "",
"default" : null
}, {
"name" : "_hoodie_commit_seqno",
"type" : [ "null", "string" ],
"doc" : "",
"default" : null
}, {
"name" : "_hoodie_record_key",
"type" : [ "null", "string" ],
"doc" : "",
"default" : null
}, {
"name" : "_hoodie_partition_path",
"type" : [ "null", "string" ],
"doc" : "",
"default" : null
}, {
"name" : "_hoodie_file_name",
"type" : [ "null", "string" ],
"doc" : "",
"default" : null
}, {
"name" : "id",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "version",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "name",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "birthDate",
"type" : [ "null", {
"type" : "long",
"logicalType" : "timestamp-micros"
} ],
"default" : null
}, {
"name" : "inc_day",
"type" : [ "null", "string" ],
"default" : null
} ]
}

org.apache.hudi.io.HoodieMergedReadHandle#readerSchema:

{"type":"record","name":"hudi_ut_time_traval_record","namespace":"hoodie.hudi_ut_time_traval","fields":[{"name":"id","type":"int"},{"name":"version","type":"int"},{"name":"name","type":"string"},{"name":"birthDate","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"inc_day","type":["null",{"type":"int","logicalType":"date"}],"default":null}]}
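
To make the mismatch between the two schemas above concrete, here is a minimal sketch. It is not Parquet or Hudi code: the `Repetition` enum, the `NullabilityCheck` class and its helper methods are hypothetical names used only for illustration. It assumes, as the error message suggests, that a read is rejected when the requested schema marks a field as required while the file stores it as optional. The field lists mirror the two schemas above (file schema: every field nullable; readerSchema: id, version and name non-nullable).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NullabilityCheck {
    // Simplified stand-in for a field's repetition; not the real Parquet enum.
    enum Repetition { REQUIRED, OPTIONAL }

    // Assumption: a requested field is incompatible when it is stricter than the file's field.
    static boolean compatible(Repetition requested, Repetition file) {
        return !(requested == Repetition.REQUIRED && file == Repetition.OPTIONAL);
    }

    // Returns the first requested field that is stricter than the file's field, or null if none.
    static String firstIncompatible(Map<String, Repetition> requested, Map<String, Repetition> file) {
        for (Map.Entry<String, Repetition> e : requested.entrySet()) {
            Repetition inFile = file.get(e.getKey());
            if (inFile != null && !compatible(e.getValue(), inFile)) {
                return e.getKey();
            }
        }
        return null;
    }

    // Data fields of the file schema: all are ["null", ...] unions, hence optional.
    static Map<String, Repetition> fileSchema() {
        Map<String, Repetition> m = new LinkedHashMap<>();
        for (String f : new String[] {"id", "version", "name", "birthDate", "inc_day"}) {
            m.put(f, Repetition.OPTIONAL);
        }
        return m;
    }

    // Fields of the readerSchema: id, version and name are plain (non-nullable) types.
    static Map<String, Repetition> readerSchema() {
        Map<String, Repetition> m = new LinkedHashMap<>();
        m.put("id", Repetition.REQUIRED);
        m.put("version", Repetition.REQUIRED);
        m.put("name", Repetition.REQUIRED);
        m.put("birthDate", Repetition.OPTIONAL);
        m.put("inc_day", Repetition.OPTIONAL);
        return m;
    }

    public static void main(String[] args) {
        System.out.println("first incompatible field: " + firstIncompatible(readerSchema(), fileSchema()));
    }
}
```

Under these assumptions the first field flagged is `id`, the same field named in the exception. Reading with a schema whose fields are all nullable would not trip this particular check.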
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10152:
URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821984821

   
   ## CI report:
   
   * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN
   * d0fe92994777e2067d654e2585c75c91616f8598 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061)
 
   * b1748e270c379c479bb3286e635482d204b853c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071)
 
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


nsivabalan commented on code in PR #10150:
URL: https://github.com/apache/hudi/pull/10150#discussion_r1401416490


##
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##
@@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, 
Properties properties
 return isDeleteRecord(incomingRecord, properties) ? Option.empty() : 
Option.of(incomingRecord);
   }
 
+  public boolean isDeleted(Schema schema, Properties props) {
+if (recordBytes.length == 0) {
+  return true;
+}
+try {
+  GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, 
schema);
+  return isDeleteRecord(incomingRecord, props);

Review Comment:
   done






Re: [PR] [HUDI-7097] Fixing instantiation of Hms Uri with HiveSync tool [hudi]

2023-11-21 Thread via GitHub


xushiyan commented on code in PR #10099:
URL: https://github.com/apache/hudi/pull/10099#discussion_r1401411887


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -103,15 +103,29 @@ public class HiveSyncTool extends HoodieSyncTool 
implements AutoCloseable {
 
   public HiveSyncTool(Properties props, Configuration hadoopConf) {
 super(props, hadoopConf);
-String metastoreUris = props.getProperty(METASTORE_URIS.key());
-// Give precedence to HiveConf.ConfVars.METASTOREURIS if it is set.
-// Else if user has provided HiveSyncConfigHolder.METASTORE_URIS, then set 
that in hadoop conf.
-if (isNullOrEmpty(hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname)) 
&& nonEmpty(metastoreUris)) {
-  LOG.info(String.format("Setting %s = %s", 
HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris));
-  hadoopConf.set(HiveConf.ConfVars.METASTOREURIS.varname, metastoreUris);
+String configuredMetastoreUris = props.getProperty(METASTORE_URIS.key());
+String existingHadoopConfMetastoreUris = 
hadoopConf.get(HiveConf.ConfVars.METASTOREURIS.varname);

Review Comment:
   So I wrote a small snippet to check how much memory it could cost.
   
   ```java
   // hadoopConf() is the test helper that supplies the base Hadoop Configuration
   Configuration originalConf = hadoopConf();
   long freeMem0 = Runtime.getRuntime().freeMemory();
   // add 1000 URL-like entries on top of the defaults
   IntStream.range(0, 1000).forEach(i -> {
     originalConf.set("typical.hadoop.configuration.key" + i, "https://www.example.com:8080/path?query=value#fragment" + i);
   });
   System.out.println("no conf entries: " + originalConf.size());
   // keep references to the copies so they are not garbage collected
   List<Configuration> l = new ArrayList<>();
   IntStream.range(0, 100).forEach(i -> {
     l.add(new Configuration(originalConf));
   });
   long freeMem1 = Runtime.getRuntime().freeMemory();
   System.out.println("after copy, used mem (MB): " + (freeMem0 - freeMem1) / (1024.0 * 1024.0));
   ```
   
   Each Hadoop conf has 2k+ properties, and making 100 copies cost about 30 MB.
   
   ```
   no conf entries: 2162
   after copy, used mem (MB): 29.616012573242188
   ```
   
   This is even an extreme case for the number of confs and meta sync tasks, so I 
don't think memory will be an issue.
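
One caveat on the measurement: `Runtime.freeMemory()` is relative to the currently reserved heap, so the delta can be skewed if the heap grows between the two readings. A slightly more robust variant compares `totalMemory() - freeMemory()` before and after. The sketch below is only an illustration of that idea and uses `java.util.Properties` as a stand-in for Hadoop's `Configuration`, so its numbers are not comparable to the ones above; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ConfCopyFootprint {
    // Used heap = heap currently reserved by the JVM minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds a Properties object with the given number of URL-like entries.
    static Properties buildConf(int entries) {
        Properties conf = new Properties();
        for (int i = 0; i < entries; i++) {
            conf.setProperty("typical.hadoop.configuration.key" + i,
                "https://www.example.com:8080/path?query=value#fragment" + i);
        }
        return conf;
    }

    // Makes independent copies of the original, keeping references so they stay on the heap.
    static List<Properties> copy(Properties original, int copies) {
        List<Properties> out = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            Properties p = new Properties();
            p.putAll(original);
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        Properties original = buildConf(1000);
        System.gc(); // only a hint to the JVM; the result stays approximate
        long before = usedHeapBytes();
        List<Properties> copies = copy(original, 100);
        long after = usedHeapBytes();
        System.out.println("entries per conf: " + original.size());
        System.out.println("copies: " + copies.size());
        System.out.println("approx. used heap delta (MB): " + (after - before) / (1024.0 * 1024.0));
    }
}
```

Even with this change the figure is a rough estimate, since garbage collection can run at any point between the two readings.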






Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


danny0405 commented on code in PR #10150:
URL: https://github.com/apache/hudi/pull/10150#discussion_r1401397731


##
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##
@@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, 
Properties properties
 return isDeleteRecord(incomingRecord, properties) ? Option.empty() : 
Option.of(incomingRecord);
   }
 
+  public boolean isDeleted(Schema schema, Properties props) {
+if (recordBytes.length == 0) {
+  return true;
+}
+try {
+  GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, 
schema);
+  return isDeleteRecord(incomingRecord, props);

Review Comment:
   Maybe we just cache a specific `isDeleted` flag for this 
`DefaultHoodieRecordPayload` so that this flag can be reused all the time.






Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


danny0405 commented on code in PR #10150:
URL: https://github.com/apache/hudi/pull/10150#discussion_r1401397731


##
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##
@@ -89,6 +90,18 @@ public Option getInsertValue(Schema schema, 
Properties properties
 return isDeleteRecord(incomingRecord, properties) ? Option.empty() : 
Option.of(incomingRecord);
   }
 
+  public boolean isDeleted(Schema schema, Properties props) {
+if (recordBytes.length == 0) {
+  return true;
+}
+try {
+  GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, 
schema);
+  return isDeleteRecord(incomingRecord, props);

Review Comment:
   I feel like we can cache the Avro generic record for a little while after the 
`isDeleted` call, since `isDeleted` should always be invoked before 
`getInsertValue` and `combineAndGetUpdateValue`. We can discard the Avro record 
in the last step of `getInsertValue`; that would eliminate 2 Avro 
deserializations.
   
   Another choice is to just cache a specific `isDeleted` flag for this 
`DefaultHoodieRecordPayload` so that the flag can be reused all the time.
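
A minimal sketch of the second option (caching the flag), to show the shape of the idea rather than the actual change. `CachedDeleteFlagPayload` and its `deleteCheck` predicate are hypothetical: the predicate stands in for the `bytesToAvro` + `isDeleteRecord` step, and the sketch assumes the schema and properties are the same on every call, which the real `isDeleted(Schema, Properties)` signature does not guarantee.

```java
import java.util.function.Predicate;

class CachedDeleteFlagPayload {
    private final byte[] recordBytes;
    // Stand-in for "deserialize the bytes and check the delete marker".
    private final Predicate<byte[]> deleteCheck;
    // null until the flag has been computed once.
    private Boolean deleted;
    // Counts how often the expensive check actually ran.
    private int evaluations;

    CachedDeleteFlagPayload(byte[] recordBytes, Predicate<byte[]> deleteCheck) {
        this.recordBytes = recordBytes;
        this.deleteCheck = deleteCheck;
    }

    boolean isDeleted() {
        if (deleted == null) {
            if (recordBytes.length == 0) {
                // An empty payload is treated as a delete, as in the diff above.
                deleted = true;
            } else {
                evaluations++;
                deleted = deleteCheck.test(recordBytes);
            }
        }
        return deleted;
    }

    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        CachedDeleteFlagPayload p =
            new CachedDeleteFlagPayload(new byte[] {1}, bytes -> bytes[0] == 1);
        System.out.println(p.isDeleted());
        System.out.println(p.isDeleted());
        System.out.println("expensive checks run: " + p.evaluations());
    }
}
```

The cost is one extra field per payload object; the first option (caching the record) would save more deserialization work but holds on to more memory while the record is cached.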






Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10098:
URL: https://github.com/apache/hudi/pull/10098#issuecomment-1821947415

   
   ## CI report:
   
   * abd651f8dcbde53717e473efe1c15d4bd486b0eb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21011)
 
   * c64942862556ae29fb52af06007bc5a303d42100 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21078)
 
   
   



Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10073:
URL: https://github.com/apache/hudi/pull/10073#issuecomment-1821947343

   
   ## CI report:
   
   * fff2ac40c67fdcd15fdf4b65890e00d63aa60a0a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21014)
 
   * 48df6bbec2473dbbbedb1b723896acb17056e80f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21076)
 
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821947562

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
 
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
 
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21079)
 
   
   



Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10097:
URL: https://github.com/apache/hudi/pull/10097#issuecomment-1821947383

   
   ## CI report:
   
   * aaf5a310c5ac999c81498308fdc11d6d5171463d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21016)
 
   * 43400ce2317882c76a68eb3a855c9dd814c92234 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21077)
 
   
   



Re: [PR] [HUDI-7023] Support querying without syncing partition metadata to catalog [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10153:
URL: https://github.com/apache/hudi/pull/10153#issuecomment-1821942182

   
   ## CI report:
   
   * 46a4c3344c79fd9a61db78620e8c40e7d98bcd36 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21062)
 
   
   



Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10151:
URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821942128

   
   ## CI report:
   
   * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21075)
 
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821942086

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
 
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
 
   * 1f7f7cdaf5be480a170169e5d97bc6ec76aa5d6f UNKNOWN
   
   



Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821941946

   
   ## CI report:
   
   * a7f01f6ad7008830e6e2993b0ba5c986ca493093 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21074)
 
   
   



Re: [PR] [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10098:
URL: https://github.com/apache/hudi/pull/10098#issuecomment-1821941910

   
   ## CI report:
   
   * abd651f8dcbde53717e473efe1c15d4bd486b0eb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21011)
 
   * c64942862556ae29fb52af06007bc5a303d42100 UNKNOWN
   
   



Re: [PR] [HUDI-7095] Making perf enhancements to JSON serde [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10097:
URL: https://github.com/apache/hudi/pull/10097#issuecomment-1821941849

   
   ## CI report:
   
   * aaf5a310c5ac999c81498308fdc11d6d5171463d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21016)
 
   * 43400ce2317882c76a68eb3a855c9dd814c92234 UNKNOWN
   
   



Re: [PR] [HUDI-7086] Scaling gcs event source [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10073:
URL: https://github.com/apache/hudi/pull/10073#issuecomment-1821941789

   
   ## CI report:
   
   * fff2ac40c67fdcd15fdf4b65890e00d63aa60a0a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21014)
 
   * 48df6bbec2473dbbbedb1b723896acb17056e80f UNKNOWN
   
   



Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]

2023-11-21 Thread via GitHub


nsivabalan commented on code in PR #10122:
URL: https://github.com/apache/hudi/pull/10122#discussion_r1401369779


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnReadWithCompact.java:
##
@@ -159,6 +159,8 @@ public void 
testNonBlockingConcurrencyControlWithPartialUpdatePayload() throws E
 // because the data files belongs 3rd commit is not included in the last 
compaction.
 Map readOptimizedResult = Collections.singletonMap("par1", 
"[id1,par1,id1,Danny,23,2,par1]");
 TestData.checkWrittenData(tempFile, readOptimizedResult, 1);
+pipeline1.end();
+pipeline2.end();

Review Comment:
   responded below



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java:
##
@@ -29,37 +31,95 @@
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.timeline.service.TimelineService;
 
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
 
 /**
  * Timeline Service that runs as part of write client.
  */
 public class EmbeddedTimelineService {
+  // lock used when starting/stopping/modifying embedded services
+  private static final Object SERVICE_LOCK = new Object();
 
   private static final Logger LOG = 
LoggerFactory.getLogger(EmbeddedTimelineService.class);
-
+  private static final AtomicInteger NUM_SERVERS_RUNNING = new 
AtomicInteger(0);
+  // Map of port to existing timeline service running on that port
+  private static final Map RUNNING_SERVICES 
= new HashMap<>();
+  private static final Registry METRICS_REGISTRY = 
Registry.getRegistry("TimelineService");
+  private static final String NUM_EMBEDDED_TIMELINE_SERVERS = 
"numEmbeddedTimelineServers";
   private int serverPort;
   private String hostAddr;
-  private HoodieEngineContext context;
+  private final HoodieEngineContext context;
   private final SerializableConfiguration hadoopConf;
   private final HoodieWriteConfig writeConfig;
-  private final String basePath;
+  private TimelineService.Config serviceConfig;
+  private final Set basePaths; // the set of base paths using this 
EmbeddedTimelineService
 
   private transient FileSystemViewManager viewManager;
   private transient TimelineService server;
 
-  public EmbeddedTimelineService(HoodieEngineContext context, String 
embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) {
+  private EmbeddedTimelineService(HoodieEngineContext context, String 
embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) {
 setHostAddr(embeddedTimelineServiceHostAddr);
 this.context = context;
 this.writeConfig = writeConfig;
-this.basePath = writeConfig.getBasePath();
+this.basePaths = new HashSet<>();
+this.basePaths.add(writeConfig.getBasePath());
 this.hadoopConf = context.getHadoopConf();
 this.viewManager = createViewManager();
   }
 
+  /**
+   * Returns an existing embedded timeline service if one is running for the 
given configuration and reuse is enabled, or starts a new one.
+   * @param context The {@link HoodieEngineContext} for the client
+   * @param embeddedTimelineServiceHostAddr The host address to use for the 
service (nullable)
+   * @param writeConfig The {@link HoodieWriteConfig} for the client
+   * @return A running {@link EmbeddedTimelineService}
+   * @throws IOException if an error occurs while starting the service
+   */
+  public static EmbeddedTimelineService 
getOrStartEmbeddedTimelineService(HoodieEngineContext context, String 
embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig) throws 
IOException {
+return getOrStartEmbeddedTimelineService(context, 
embeddedTimelineServiceHostAddr, writeConfig, TimelineService::new);
+  }
+
+  static EmbeddedTimelineService 
getOrStartEmbeddedTimelineService(HoodieEngineContext context, String 
embeddedTimelineServiceHostAddr, HoodieWriteConfig writeConfig,
+   
TimelineServiceCreator timelineServiceCreator) throws IOException {
+// if reuse is enabled, check if any existing instances are compatible
+if (writeConfig.isEmbeddedTimelineServerReuseEnabled()) {
+  synchronized (SERVICE_LOCK) {
+for (EmbeddedTimelineService service : RUNNING_SERVICES.values()) {
+  if (service.canReuseFor(writeConfig, 
embeddedTimelineServiceHostAddr)) {
+service.addBasePath(writeConfig.getBasePath());
+LOG.info("Reusing existing embedded timeline server with 
configuration: " + service.serviceConfig);
+return service;
+  }
+}
+// if no compatible instance is found, create a new one
+
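
The diff is cut off at this point in the archive. As a rough illustration of the reuse pattern it introduces (look up a compatible running instance under a lock, register the new base path on it, otherwise start a new instance), here is a self-contained sketch. `ReusableServiceRegistry`, `Service` and the string `configKey` are hypothetical simplifications; the real code matches instances through `canReuseFor` and a port-keyed map, and what follows the cut-off is not reproduced here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ReusableServiceRegistry {
    // Hypothetical stand-in for an embedded service shared by several base paths.
    static final class Service {
        final String configKey;
        final Set<String> basePaths = new HashSet<>();

        Service(String configKey) {
            this.configKey = configKey;
        }
    }

    // Single lock guarding every read and write of the registry.
    private static final Object LOCK = new Object();
    private static final Map<String, Service> RUNNING = new HashMap<>();

    static Service getOrStart(String configKey, String basePath, boolean reuseEnabled) {
        if (reuseEnabled) {
            synchronized (LOCK) {
                // Reuse a compatible instance if one exists, otherwise register a new one.
                Service service = RUNNING.computeIfAbsent(configKey, Service::new);
                service.basePaths.add(basePath);
                return service;
            }
        }
        // Reuse disabled: every caller gets its own, unregistered instance.
        Service service = new Service(configKey);
        service.basePaths.add(basePath);
        return service;
    }

    public static void main(String[] args) {
        Service a = getOrStart("conf-A", "/tables/t1", true);
        Service b = getOrStart("conf-A", "/tables/t2", true);
        System.out.println("same instance: " + (a == b));
        System.out.println("base paths served: " + a.basePaths.size());
    }
}
```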

Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10151:
URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821901635

   
   ## CI report:
   
   * b124e2a54cd9b3fec6d19c7c131b93234cd8c68c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21060)
 
   * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21075)
 
   
   



Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821901599

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
 
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
 
   
   



Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821901457

   
   ## CI report:
   
   * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930)
 
   * a7f01f6ad7008830e6e2993b0ba5c986ca493093 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21074)
 
   
   



Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10151:
URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821895329

   
   ## CI report:
   
   * b124e2a54cd9b3fec6d19c7c131b93234cd8c68c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21060)
 
   * 3d1e603aea3cc23614de38d511b5d4ddeac92f5d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10102:
URL: https://github.com/apache/hudi/pull/10102#issuecomment-1821895011

   
   ## CI report:
   
   * c3ff2511a30564e5a5ff0cb407326ff6ef0584e3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20930)
 
   * a7f01f6ad7008830e6e2993b0ba5c986ca493093 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


linliu-code commented on code in PR #10102:
URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##
@@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) {
 && 
!HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME),
 HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
 )) {
   // hit a block with instant time greater than should be processed, 
stop processing further
-  break;
+  continue;
 }

Review Comment:
   @danny0405, after discussing with @yihua, the fix here is correct: the 
order of log blocks has been reversed, and the "break" logic was written for 
blocks in time order. CC: @yihua






Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


linliu-code commented on code in PR #10102:
URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##
@@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) {
 && 
!HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME),
 HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
 )) {
   // hit a block with instant time greater than should be processed, 
stop processing further
-  break;
+  continue;
 }

Review Comment:
   @danny0405, after discussing with @yihua, the fix here is correct: the 
order of log blocks has been reversed, and the "break" logic was for the old 
design, where the blocks were in time order. CC: @yihua
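
   To illustrate the point above, here is a minimal, self-contained sketch. It 
is not the actual BaseHoodieLogRecordReader code; the class name, method name 
and sample instant times are hypothetical. It assumes instant times are strings 
that compare lexicographically and that blocks arrive newest first. Under 
those assumptions, "break" stops at the first too-new block and drops every 
older, valid block behind it, while "continue" only skips the too-new ones.

```java
import java.util.ArrayList;
import java.util.List;

public class ReverseScanSketch {

    // Returns the instant times that would be processed when the blocks are
    // listed newest first. skipInsteadOfStop=true models "continue",
    // false models the old "break".
    public static List<String> scan(List<String> blockInstantTimes,
                                    String latestInstantTime,
                                    boolean skipInsteadOfStop) {
        List<String> processed = new ArrayList<>();
        for (String instantTime : blockInstantTimes) {
            // Block is newer than the instant we are allowed to read up to.
            if (instantTime.compareTo(latestInstantTime) > 0) {
                if (skipInsteadOfStop) {
                    continue; // older blocks may still follow, keep scanning
                }
                break; // only safe if blocks are in ascending time order
            }
            processed.add(instantTime);
        }
        return processed;
    }

    public static void main(String[] args) {
        // Hypothetical instant times, newest first (reversed order).
        List<String> newestFirst = List.of("005", "004", "003", "002", "001");
        System.out.println(scan(newestFirst, "003", false)); // prints []
        System.out.println(scan(newestFirst, "003", true));  // prints [003, 002, 001]
    }
}
```

   With the old "break", a time travel query up to instant 003 would read no 
log blocks at all in this example, which matches the kind of bug the PR title 
describes.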






Re: [PR] [HUDI-7102] Fix a bug for time travel queries on MOR tables [hudi]

2023-11-21 Thread via GitHub


linliu-code commented on code in PR #10102:
URL: https://github.com/apache/hudi/pull/10102#discussion_r1401341463


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##
@@ -260,7 +260,7 @@ private void scanInternalV1(Option keySpecOpt) {
 && 
!HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME),
 HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
 )) {
   // hit a block with instant time greater than should be processed, 
stop processing further
-  break;
+  continue;
 }

Review Comment:
   @danny0405, after discussing with @yihua, the fix here is correct: the 
order of log blocks has been reversed, and the "break" logic was written for 
blocks in time order. CC: @yihua






Re: [PR] [HUDI-7034] Refresh index fix - remove cached file slices within part… [hudi]

2023-11-21 Thread via GitHub


VitoMakarevich commented on PR #10151:
URL: https://github.com/apache/hudi/pull/10151#issuecomment-1821859085

   Added a test. At least the `fileIndex`-related tests pass (I ran only 
those). I can also confirm that, as reported, calling `.refresh` on an existing 
index does not refresh the list of files; with this one-line change it works 
correctly.





Re: [PR] [HUDI-7129] Fix bug when upgrade from table version three using UpgradeOrDowngradeProcedure [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10147:
URL: https://github.com/apache/hudi/pull/10147#issuecomment-1821842256

   
   ## CI report:
   
   * 994d062df78afd5062dec418cddff167daff42d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21058)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7110] Add call procedure for show column stats information [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10120:
URL: https://github.com/apache/hudi/pull/10120#issuecomment-1821786342

   
   ## CI report:
   
   * 03451f7cd016ee9fb078f4d78f3b771e8719c233 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21056)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821715566

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
 
   * c586dc82aa4c791cabd8f3172ee1f982f71433da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21073)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custome delete marker [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10150:
URL: https://github.com/apache/hudi/pull/10150#issuecomment-1821703634

   
   ## CI report:
   
   * ea3efa0db6b2a2e88508641d6ffb7eec9c33bf00 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21059)
 
   * c586dc82aa4c791cabd8f3172ee1f982f71433da UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10135:
URL: https://github.com/apache/hudi/pull/10135#issuecomment-1821703416

   
   ## CI report:
   
   * 3d48bfc5c41a59a1114eb73a5ef9a7b7fda5eccf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21053)
 
   * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7112] Reuse existing timeline server and performance improvements [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10122:
URL: https://github.com/apache/hudi/pull/10122#issuecomment-1821683706

   
   ## CI report:
   
   * 697114b6ec4f578123363a89a6846e352bc3a53e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21057)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch master updated: [HUDI-7115] Add in new options for the bigquery sync (#10125)

2023-11-21 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a1afcdd989c [HUDI-7115] Add in new options for the bigquery sync 
(#10125)
a1afcdd989c is described below

commit a1afcdd989ce2d634290d1bd9e099a17057e6b4d
Author: Tim Brown 
AuthorDate: Tue Nov 21 14:58:12 2023 -0600

[HUDI-7115] Add in new options for the bigquery sync (#10125)

- Add in new options for the bigquery sync
---
 hudi-gcp/pom.xml   |  3 +-
 .../hudi/gcp/bigquery/BigQuerySyncConfig.java  | 20 
 .../apache/hudi/gcp/bigquery/BigQuerySyncTool.java | 23 +
 .../gcp/bigquery/HoodieBigQuerySyncClient.java | 58 +++---
 .../hudi/gcp/bigquery/TestBigQuerySyncConfig.java  |  2 +-
 .../hudi/gcp/bigquery/TestBigQuerySyncTool.java| 12 ++---
 .../gcp/bigquery/TestBigQuerySyncToolArgs.java |  8 ++-
 .../gcp/bigquery/TestHoodieBigQuerySyncClient.java | 26 ++
 8 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/hudi-gcp/pom.xml b/hudi-gcp/pom.xml
index b1cfb8076a6..2c308fbf424 100644
--- a/hudi-gcp/pom.xml
+++ b/hudi-gcp/pom.xml
@@ -36,7 +36,7 @@ See 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google
   
 com.google.cloud
 libraries-bom
-25.1.0
+26.15.0
 pom
 import
   
@@ -70,7 +70,6 @@ See 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google
 
   com.google.cloud
   google-cloud-pubsub
-  ${google.cloud.pubsub.version}
 
 
 
diff --git 
a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java 
b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java
index 94510ca8dfa..ed8895ca217 100644
--- 
a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java
+++ 
b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncConfig.java
@@ -122,6 +122,20 @@ public class BigQuerySyncConfig extends HoodieSyncConfig 
implements Serializable
   .markAdvanced()
   .withDocumentation("Fetch file listing from Hudi's metadata");
 
+  public static final ConfigProperty 
BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER = ConfigProperty
+  .key("hoodie.gcp.bigquery.sync.require_partition_filter")
+  .defaultValue(false)
+  .sinceVersion("0.14.1")
+  .markAdvanced()
+  .withDocumentation("If true, configure table to require a partition 
filter to be specified when querying the table");
+
+  public static final ConfigProperty 
BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID = ConfigProperty
+  .key("hoodie.gcp.bigquery.sync.big_lake_connection_id")
+  .noDefaultValue()
+  .sinceVersion("0.14.1")
+  .markAdvanced()
+  .withDocumentation("The Big Lake connection ID to use");
+
   public BigQuerySyncConfig(Properties props) {
 super(props);
 setDefaults(BigQuerySyncConfig.class.getName());
@@ -147,6 +161,10 @@ public class BigQuerySyncConfig extends HoodieSyncConfig 
implements Serializable
 public String sourceUri;
 @Parameter(names = {"--source-uri-prefix"}, description = "Name of the 
source uri gcs path prefix of the table", required = false)
 public String sourceUriPrefix;
+@Parameter(names = {"--big-lake-connection-id"}, description = "The Big 
Lake connection ID to use when creating the table if using the manifest file 
approach.")
+public String bigLakeConnectionId;
+@Parameter(names = {"--require-partition-filter"}, description = "If true, 
configure table to require a partition filter to be specified when querying the 
table")
+public Boolean requirePartitionFilter;
 
 public boolean isHelp() {
   return hoodieSyncConfigParams.isHelp();
@@ -164,6 +182,8 @@ public class BigQuerySyncConfig extends HoodieSyncConfig 
implements Serializable
   props.setPropertyIfNonNull(BIGQUERY_SYNC_SYNC_BASE_PATH.key(), 
hoodieSyncConfigParams.basePath);
   props.setPropertyIfNonNull(BIGQUERY_SYNC_PARTITION_FIELDS.key(), 
StringUtils.join(",", hoodieSyncConfigParams.partitionFields));
   
props.setPropertyIfNonNull(BIGQUERY_SYNC_USE_FILE_LISTING_FROM_METADATA.key(), 
hoodieSyncConfigParams.useFileListingFromMetadata);
+  props.setPropertyIfNonNull(BIGQUERY_SYNC_BIG_LAKE_CONNECTION_ID.key(), 
bigLakeConnectionId);
+  props.setPropertyIfNonNull(BIGQUERY_SYNC_REQUIRE_PARTITION_FILTER.key(), 
requirePartitionFilter);
   return props;
 }
   }
diff --git 
a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java 
b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java
index 19c8449f8fa..28c071e5231 100644
--- a/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java
+++ b/hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java
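
The two options added to BigQuerySyncConfig above could be supplied as plain 
properties. The sketch below is a hedged illustration using only 
java.util.Properties; the class name, helper method and the sample connection 
ID are hypothetical, and only the two property keys are taken from the diff. 
It mirrors the "set only if non-null" behaviour shown in the diff, so unset 
options are left out of the properties.

```java
import java.util.Properties;

public class BigQuerySyncOptionsSketch {

    // Keys copied from the diff above.
    static final String REQUIRE_PARTITION_FILTER =
        "hoodie.gcp.bigquery.sync.require_partition_filter";
    static final String BIG_LAKE_CONNECTION_ID =
        "hoodie.gcp.bigquery.sync.big_lake_connection_id";

    // Hypothetical helper: adds each option only when a value was provided.
    public static Properties buildProps(String bigLakeConnectionId,
                                        Boolean requirePartitionFilter) {
        Properties props = new Properties();
        if (bigLakeConnectionId != null) {
            props.setProperty(BIG_LAKE_CONNECTION_ID, bigLakeConnectionId);
        }
        if (requirePartitionFilter != null) {
            props.setProperty(REQUIRE_PARTITION_FILTER,
                String.valueOf(requirePartitionFilter));
        }
        return props;
    }

    public static void main(String[] args) {
        // "my-project.us.my-connection" is a made-up connection ID.
        Properties props = buildProps("my-project.us.my-connection", true);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the real tool such a Properties object would presumably be passed to the 
BigQuerySyncConfig(Properties) constructor shown in the diff; that wiring is 
left out here to keep the sketch free of Hudi dependencies.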

Re: [PR] [HUDI-7115] Add in new options for the bigquery sync [hudi]

2023-11-21 Thread via GitHub


nsivabalan merged PR #10125:
URL: https://github.com/apache/hudi/pull/10125





Re: [PR] [HUDI-7129] Fix bug when upgrade from table version three using UpgradeOrDowngradeProcedure [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10147:
URL: https://github.com/apache/hudi/pull/10147#issuecomment-1821620165

   
   ## CI report:
   
   * 1dee5fb303eff272371c638d07d80806676fd5aa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21054)
 
   * 994d062df78afd5062dec418cddff167daff42d8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21058)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7130] Adding support for configuring value serializer with JsonKakfaSource [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10149:
URL: https://github.com/apache/hudi/pull/10149#issuecomment-1821620236

   
   ## CI report:
   
   * e809a39b71dcfa3ddcfc6348b6740391b2a08dbd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21055)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch branch-0.x created (now 0908f648152)

2023-11-21 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch branch-0.x
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at 0908f648152 [HUDI-6999] Adding row writer support to HoodieStreamer 
(#9913)

This branch includes the following new commits:

 new 226a46d4841 [HUDI-6846] Fix a bug of consistent bucket index 
clustering (#9679)
 new 69225bc9bf6 [HUDI-6823] instantiate writeTimer in 
StreamWriteOperatorCoordinator (#9637)
 new a30c904608a [HUDI-6853] ArchiveCommitsProcedure should throw an 
exception when the archive operation executes failed (#9703)
 new 4afc077f56b [MINOR] Fix hbase index config improper use (#9582)
 new e870ef66653 [HUDI-6630] Automatic release connection for hoodie 
metaserver client (#9340)
 new 20c5ef50bdf [HUDI-6862] Replace directory connector markers in 
TestSqlStatement (#9458)
 new 9e647b17ea1 [HUDI-6847] Improve the incremental clean fallback logic 
(#9681)
 new 903933f607b [HUDI-6848] Fix non-unique uid for hudi operators (#9680)
 new 68ea64f7e24 [MINOR] Close record readers in TestHoodieReaderWriterBase 
after use during tests (#9504)
 new ea0c7fa7e29 [HUDI-6870] Pass project ID to BigQuery job (#9730)
 new e0b2fb67816 [HUDI-6865] Fix InternalSchema schemaId when column is 
dropped (#9724)
 new fa04fb901f1 [MINOR] Enhancing validate staged bundles script (#8591)
 new 4c288b35053 [HUDI-6871] BigQuery sync improvements (#9741)
 new 2bd4d3618aa [HUDI-6708] Support record level indexing with async 
indexer (#9517)
 new b786ce7b491 [MINOR] Close resources in tests (#9685)
 new 7ee50a13f4a [MINOR] Fix default config values if not specified (#9625)
 new aea93b3b71c [HUDI-6882] Differentiate between replacecommits in 
cluster planning (#9755)
 new e4f53c5334f [MINOR] Set connection settings for maven to avoid build 
flakiness (#9772)
 new d7d0b0e5d09 [MINOR] Mark a few new configs advanced and tag since 
version of 0.14.0 (#9771)
 new b32be910dbb [HUDI-6881] Hudi configured 
spark.scheduler.allocation.file should include scheme since Spark3.2 (#9754)
 new 0ab1beb4e18 [HUDI-6011] Fix cli show archived commits breaks for 
replacecommit (#8345)
 new b688181616c [HUDI-5924] Fixing cli clean command to trim down a subset 
based on start and end (#8169)
 new 073b36a2da5 [MINOR] Fix the check for connector identity in 
HoodieHiveCatalog (#9770)
 new 936ece380ec [HUDI-6062] Fix irregular enum config (#8564)
 new 0dd2e0aa055 [HUDI-6893] Copy the trino bundle to override the one in 
the image (#9781)
 new a6aec4719cd [HUDI-6827] Fix task failure when insert into empty 
dataset (#9797)
 new b535919ab7d [HUDI-6892] ExternalSpillableMap may cause data 
duplication when flink compaction (#9778)
 new c935303ce51 fixing build/compilation issue. Fixed missing import in 
HoodieTableMetadataUtil
 new b9980984f2e [HUDI-6922] Fix inconsistency between base file format and 
catalog input format (#9830)
 new 757b0a529ab [HUDI-6828] Fix wrong partitionToReplaceIds when 
insertOverwrite empty data into partitions (#9811)
 new c88d6ffcbd5 [MINOR] Disable falky integration test temporarily (#9823)
 new bab7a1ed44a [HUDI-6916] Improve performance of Custom Key Generators 
(#9821)
 new a66cf28be04 [HUDI-6913] Set default database name correctly (#9816)
 new c925d98c170 [HUDI-5911] 
SimpleTransactionDirectMarkerBasedDetectionStrategy can't work with 
none-partitioned table (#8143)
 new 05867751b3b [HUDI-6926] Disable DROP_PARTITION_COLUMNS when upsert MOR 
table (#9840)
 new fcb7c89fe75 [HUDI-6873] fix clustering mor (#9774)
 new 42f09b3d4ff  [HUDI-6927] CDC file clean not work (#9841)
 new 25db3575fe5 [HUDI-6917] Fix docker integ tests (#9843)
 new 8c616c1fc74 Fixing build failures with InsertIntoHoodieTableCommand
 new 9665ef44928 [HUDI-6937] CopyOnWriteInsertHandler#consume cause 
clustering performance degradation (#9851)
 new b8186d11303 Follow up HUDI-6937, fix the RealtimeCompactedRecordReader 
props instantiation (#9853)
 new 63d513ef543 [HUDI-6894] ReflectionUtils is not thread safe (#9786)
 new 14e89fd7866 [HUDI-6941] Fix partition pruning for multiple partition 
fields (#9863)
 new 93d6a66b577 [HUDI-6944] Fix flink boostrap concurrency issue (#9867)
 new 3e33ecde8ba [HUDI-6945] Fix HoodieRowDataParquetWriter cast issue 
(#9868)
 new bca004c3a09 [HUDI-6924] Fix hoodie table config not wok in table 
properties (#9836)
 new e60690a52cd [HUDI-6950] Query should process listed partitions to 
avoid driver oom due to large number files in table first partition (#9875)
 new 7121c9826b0 [MINOR] HFileBootstrapIndex: use try-with-resources in two 
places (#9813)
 new 871f8b7e6e1 [HUDI-6369] Fix spacial curve with sample strategy fails 
when 0 or 1 rows only is incoming (#9053)
 new bee5e5c5da9 [HUDI-5031] Fix MERGE INTO creates empty partition files 

Re: [I] [SUPPORT] HUDI MOR table type compaction failed post adding new field in the schema [hudi]

2023-11-21 Thread via GitHub


abhisheksahani91 commented on issue #10138:
URL: https://github.com/apache/hudi/issues/10138#issuecomment-1821565443

   @ad1happy2go 
   I also want to add that the connection refused error occurs when I 
generate a high load on Hudi ingestion.
   Screenshot: https://github.com/apache/hudi/assets/122790088/a4802067-b8a6-4c57-9a09-f1f22036842e
   In the screenshot above, you can see Hudi reading more than 1 million 
records from Kafka in a single read.
   
   After the second read, async compaction was triggered and resulted in a 
connection refused error.
   
   Screenshot: https://github.com/apache/hudi/assets/122790088/cc99d2e0-024b-49a1-85d5-78c2e852ce14
   
   





Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10152:
URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821555262

   
   ## CI report:
   
   * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN
   * d0fe92994777e2067d654e2585c75c91616f8598 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061)
 
   * b1748e270c379c479bb3286e635482d204b853c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21071)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10014:
URL: https://github.com/apache/hudi/pull/10014#issuecomment-1821554730

   
   ## CI report:
   
   * 898d03c01442d0b4ac84056f25ff49f1f9aba0c0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20733)
 
   * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21070)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7120] Performance improvements in deltastreamer executor code path [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10135:
URL: https://github.com/apache/hudi/pull/10135#issuecomment-1821544132

   
   ## CI report:
   
   * 4913158456e1dfaa1366ba7bd5029578f3bf4cef Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21041)
 
   * 3d48bfc5c41a59a1114eb73a5ef9a7b7fda5eccf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21053)
 
   * 34ffc8261d951bde500df7688800b2ed6afb4fa6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21069)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10152:
URL: https://github.com/apache/hudi/pull/10152#issuecomment-1821544519

   
   ## CI report:
   
   * 9764bc6527e5e3e83ed08263484beb45c1796d47 UNKNOWN
   * d0fe92994777e2067d654e2585c75c91616f8598 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21061)
 
   * b1748e270c379c479bb3286e635482d204b853c5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7052] Fix partition key validation for custom key generators. [hudi]

2023-11-21 Thread via GitHub


hudi-bot commented on PR #10014:
URL: https://github.com/apache/hudi/pull/10014#issuecomment-1821543331

   
   ## CI report:
   
   * 898d03c01442d0b4ac84056f25ff49f1f9aba0c0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20733)
 
   * c359d7d70cb6e34ce4d62b2f71f39b91b49ea334 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




