[GitHub] [hudi] lokeshj1703 commented on issue #7261: [SUPPORT] Meta sync error when trying to write to s3 bucket

2022-12-13 Thread GitBox


lokeshj1703 commented on issue #7261:
URL: https://github.com/apache/hudi/issues/7261#issuecomment-1350558263

   @devanshguptatrepp Were you able to try out the changes suggested above?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] boneanxs opened a new pull request, #7456: [HUDI-4917][FOLLOW_UP]Optimize codes logic to not break the old class meaning

2022-12-13 Thread GitBox


boneanxs opened a new pull request, #7456:
URL: https://github.com/apache/hudi/pull/7456

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   1. `HoodieRangeInfoHandle` is bound to a single file slice, but the old fix broke that
contract by allowing one instance to handle different files. We can instead change the
constructor to accept a `BaseFile`, while keeping the methods as they were before.
   2. There is no need to extract the `FileId` again to build the pair, since
`HoodieBaseFile` already carries the `FileId` value.
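
   The proposed constructor change can be illustrated with a small sketch. This is a Python model of the idea only, not Hudi's actual Java code; the class names mirror the discussion, but the method and its body are invented for illustration.

   ```python
   # Illustrative Python model of the proposed design (Hudi itself is Java;
   # the method body here is invented). The handle is constructed from
   # exactly one base file, so the per-file binding the class originally
   # promised is restored, and the FileId never needs to be re-extracted.

   class HoodieBaseFile:
       """Minimal stand-in: a base file that already carries its file id."""
       def __init__(self, path, file_id):
           self.path = path
           self.file_id = file_id

   class HoodieRangeInfoHandle:
       """Bound to a single base file; methods keep their old no-arg shape."""
       def __init__(self, base_file):
           self._base_file = base_file

       def file_id_pair(self):
           # No need to parse the FileId out of the path again:
           # the HoodieBaseFile already holds it.
           return (self._base_file.file_id, self._base_file.path)

   handle = HoodieRangeInfoHandle(HoodieBaseFile("/data/f1.parquet", "f1"))
   print(handle.file_id_pair())  # ('f1', '/data/f1.parquet')
   ```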
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   No
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] alexeykudinkin opened a new pull request, #7455: [DO_NOT_MERGE] Release 0.12.2 blockers candidate

2022-12-13 Thread GitBox


alexeykudinkin opened a new pull request, #7455:
URL: https://github.com/apache/hudi/pull/7455

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] alexeykudinkin closed pull request #7454: [DO_NOT_MERGE] Release 0.12.2 branch candidate

2022-12-13 Thread GitBox


alexeykudinkin closed pull request #7454: [DO_NOT_MERGE] Release 0.12.2 branch 
candidate
URL: https://github.com/apache/hudi/pull/7454





[hudi] branch release-0.12.2-blockers-candidate updated (51af3e5f943 -> 51f15f500b3)

2022-12-13 Thread akudinkin
This is an automated email from the ASF dual-hosted git repository.

akudinkin pushed a change to branch release-0.12.2-blockers-candidate
in repository https://gitbox.apache.org/repos/asf/hudi.git


 discard 51af3e5f943 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata 
(#7436)
 discard 5a6b4de0e04 [HUDI-5296] Allow disable schema on read after enabling 
(#7421)
 discard 7538e7e1512 [HUDI-5078] Fixing isTableService for replace commits 
(#7037)
 discard 1c0c379df92 [HUDI-5353] Close file readers (#7412)
 discard f11edd53c78 [MINOR] Fix Out of Bounds Exception for 
DayBasedCompactionStrategy (#7360)
 discard 4bb8f31d64b [HUDI-5372] Fix NPE caused by alter table add column. 
(#7236)
 discard 25571aa03d0 [HUDI-5347] Cleaned up transient state from 
`ExpressionPayload` making it non-serializable (#7424)
 discard b09e361723d [HUDI-5336] Fixing parsing of log files while building 
file groups (#7393)
 discard e647096286d [HUDI-5338] Adjust coalesce behavior within NONE sort mode 
for bulk insert (#7396)
 discard f33a430b054 [HUDI-5342] Add new bulk insert sort modes repartitioning 
data by partition path (#7402)
 discard ff817d9009c [HUDI-5358] Fix flaky tests in 
TestCleanerInsertAndCleanByCommits (#7420)
 discard c7c74e127d7 [HUDI-5350] Fix oom cause compaction event lost problem 
(#7408)
 discard 34537d29375 [HUDI-5346][HUDI-5320] Fixing Create Table as Select 
(CTAS) performance gaps (#7370)
 discard aecfb40a99a [HUDI-5291] Fixing NPE in MOR column stats accounting 
(#7349)
 discard e56631f34c0 [HUDI-5345] Avoid fs.exists calls for metadata table in 
HFileBootstrapIndex (#7404)
 discard 7ccdbaedb45 [HUDI-5347] FIxing performance traps in Spark SQL `MERGE 
INTO` implementation (#7395)
 discard 025a8db3f1d [HUDI-5344] Fix CVE - upgrade protobuf-java (#6960)
 discard 94860a41dfd [HUDI-5163] Fix failure handling with spark datasource 
write (#7140)
 discard fa2fd8e97ed [HUDI-5344] Fix CVE - upgrade protobuf-java to 3.18.2 
(#6957)
 discard 14004c83f63 [HUDI-5151] Fix bug with broken flink data skipping caused 
by ClassNotFoundException of InLineFileSystem (#7124)
 discard 0c963205084 [HUDI-5253] HoodieMergeOnReadTableInputFormat could have 
duplicate records issue if it contains delta files while still splittable 
(#7264)
 discard 7e3451269a8 [HUDI-5242] Do not fail Meta sync in Deltastreamer when 
inline table service fails (#7243)
 discard 6948ab10020 [HUDI-5277] Close HoodieWriteClient before exiting 
RunClusteringProcedure (#7300)
 discard 44bbfef9a3a [HUDI-5260] Fix insert into sql command with strict sql 
insert mode (#7269)
 discard 7614443d518 [HUDI-5252] ClusteringCommitSink supports to rollback 
clustering (#7263)
 add d4ec501f755 [HUDI-5260] Fix insert into sql command with strict sql 
insert mode (#7269)
 add 5230a11f15d [HUDI-5277] Close HoodieWriteClient before exiting 
RunClusteringProcedure (#7300)
 add a78cb091f94 [HUDI-5242] Do not fail Meta sync in Deltastreamer when 
inline table service fails (#7243)
 add 4ccee729d29 [HUDI-5253] HoodieMergeOnReadTableInputFormat could have 
duplicate records issue if it contains delta files while still splittable 
(#7264)
 add 64a359b5bd8 [HUDI-5151] Fix bug with broken flink data skipping caused 
by ClassNotFoundException of InLineFileSystem (#7124)
 add ab80838fd35 [HUDI-5344] Fix CVE - upgrade protobuf-java to 3.18.2 
(#6957)
 add e3c956284ed [HUDI-5163] Fix failure handling with spark datasource 
write (#7140)
 add 8b294b05639 [HUDI-5344] Fix CVE - upgrade protobuf-java (#6960)
 add 8510aacba8e [HUDI-5347] FIxing performance traps in Spark SQL `MERGE 
INTO` implementation (#7395)
 add 4a28b8389f9 [HUDI-5345] Avoid fs.exists calls for metadata table in 
HFileBootstrapIndex (#7404)
 add 725a9b210a1 [HUDI-5291] Fixing NPE in MOR column stats accounting 
(#7349)
 add d156989 [HUDI-5346][HUDI-5320] Fixing Create Table as Select 
(CTAS) performance gaps (#7370)
 add f1d643e8f9e [HUDI-5350] Fix oom cause compaction event lost problem 
(#7408)
 add 172c438d64b [HUDI-5358] Fix flaky tests in 
TestCleanerInsertAndCleanByCommits (#7420)
 add bca85f376c1 [HUDI-5342] Add new bulk insert sort modes repartitioning 
data by partition path (#7402)
 add 7b8a7208602 [HUDI-5338] Adjust coalesce behavior within NONE sort mode 
for bulk insert (#7396)
 add 08b414dd15c [HUDI-5336] Fixing parsing of log files while building 
file groups (#7393)
 add 438f3ab6ae3 [HUDI-5347] Cleaned up transient state from 
`ExpressionPayload` making it non-serializable (#7424)
 add 39031d3c9ef [HUDI-5372] Fix NPE caused by alter table add column. 
(#7236)
 add 68361fae88e [MINOR] Fix Out of Bounds Exception for 
DayBasedCompactionStrategy (#7360)
 add 6e6940fc59e [HUDI-5353] Close file readers (#7412)
 add 4085f27cfb0 [HUDI-5078] Fixing isTableService for replace commits 
(#7037)
 add 70e4615c26a [HUDI-5296] Allow disable schema on read after enabling 
(#7421)
 add 51f15f500b3 

[GitHub] [hudi] hudi-bot commented on pull request #7454: [DO_NOT_MERGE] Release 0.12.2 branch candidate

2022-12-13 Thread GitBox


hudi-bot commented on PR #7454:
URL: https://github.com/apache/hudi/pull/7454#issuecomment-1350500644

   
   ## CI report:
   
   * d4f483df3c772b248200c9e781461c953900226a UNKNOWN
   * 51af3e5f943ce612de538e918ed36bd688312a73 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-13 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1350500405

   
   ## CI report:
   
   * e58d4db34dea4225808760126be11d3c559da896 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350500318

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * d2158f73ae32ae032293b86137aa477853b2df02 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13708)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13713)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] wzx140 commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


wzx140 commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350499176

   @hudi-bot run azure





[hudi] branch release-0.12.2-blockers-candidate created (now 51af3e5f943)

2022-12-13 Thread akudinkin
This is an automated email from the ASF dual-hosted git repository.

akudinkin pushed a change to branch release-0.12.2-blockers-candidate
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at 51af3e5f943 [HUDI-5348] Cache file slices in HoodieBackedTableMetadata 
(#7436)

No new revisions were added by this update.



[GitHub] [hudi] hudi-bot commented on pull request #7454: [DO_NOT_MERGE] Release 0.12.2 branch candidate

2022-12-13 Thread GitBox


hudi-bot commented on PR #7454:
URL: https://github.com/apache/hudi/pull/7454#issuecomment-1350495465

   
   ## CI report:
   
   * d4f483df3c772b248200c9e781461c953900226a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-13 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1350495242

   
   ## CI report:
   
   * 424e8f0d6a4ec77335f375ccd567b46076340b72 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13413)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13435)
 
   * e58d4db34dea4225808760126be11d3c559da896 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13711)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13712)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] stream2000 commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-13 Thread GitBox


stream2000 commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1350494670

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-13 Thread GitBox


hudi-bot commented on PR #7365:
URL: https://github.com/apache/hudi/pull/7365#issuecomment-1350490058

   
   ## CI report:
   
   * 424e8f0d6a4ec77335f375ccd567b46076340b72 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13413)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13435)
 
   * e58d4db34dea4225808760126be11d3c559da896 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350489965

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * d2158f73ae32ae032293b86137aa477853b2df02 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13708)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] txl2017 commented on issue #7312: [SUPPORT]How about add ttlEvent method in BaseAvroPayload

2022-12-13 Thread GitBox


txl2017 commented on issue #7312:
URL: https://github.com/apache/hudi/issues/7312#issuecomment-1350489131

   Yes, exactly.
   First, we can use TTL to manage a record's life cycle, which is a common
requirement.
   In addition, we can use TTL to trigger an event that runs other logic when
we do not want to delete the records.
   
   For example, our sensors upload data every 10 seconds. If a sensor goes bad,
it stops uploading data.
   The uploaded data has a `STATUS` field that marks the sensor's status; if a
sensor has not uploaded data in 6 minutes,
   we consider the sensor bad and have to change its `STATUS` value to 0.
   
   So we want a TTL event: if a sensor's last update is more than 6 minutes old,
do not delete the record but change some of its values.
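
   The requested behavior can be sketched as a small function. This is a hedged illustration only — `apply_ttl_event`, the 6-minute window, and the record shape are hypothetical stand-ins, not an existing Hudi payload API:

   ```python
   # Hypothetical TTL-event sketch (names are illustrative): instead of
   # deleting an expired record, flip its STATUS field to 0 when the
   # last update is older than the TTL window.
   from datetime import datetime, timedelta

   TTL = timedelta(minutes=6)

   def apply_ttl_event(record, now):
       """Return the record, expiring it by mutation of a copy rather
       than deleting it."""
       if now - record["last_update"] > TTL:
           expired = dict(record)
           expired["STATUS"] = 0  # mark the sensor as bad instead of deleting
           return expired
       return record

   now = datetime(2022, 12, 13, 12, 0, 0)
   ok = {"id": "s1", "STATUS": 1, "last_update": now - timedelta(minutes=1)}
   stale = {"id": "s2", "STATUS": 1, "last_update": now - timedelta(minutes=10)}
   print(apply_ttl_event(ok, now)["STATUS"], apply_ttl_event(stale, now)["STATUS"])  # 1 0
   ```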





[GitHub] [hudi] hudi-bot commented on pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


hudi-bot commented on PR #7224:
URL: https://github.com/apache/hudi/pull/7224#issuecomment-1350483067

   
   ## CI report:
   
   * 01aa2325fda6298bc776083260b5863f2124523d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13700)
 
   * 6c98c102cf86ca7158a89d284c73bc5652028297 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13710)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5388) Fix the pom to not depend on specific OS and Arch

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5388:

Fix Version/s: 0.13.0

> Fix the pom to not depend on specific OS and Arch
> -
>
> Key: HUDI-5388
> URL: https://issues.apache.org/jira/browse/HUDI-5388
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/pull/6751#issuecomment-1350468029]
>  
> HUDI-4972 introduces the risk of building release artifacts with different 
> versions of dependencies on M1 Macbook (note that the release manager needs 
> to build the artifacts locally and push them to the staging area using the 
> release script). We should directly upgrade the versions without keeping two 
> variants based on the OS/Arch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5388) Fix the pom to not depend on specific OS and Arch

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5388:

Description: 
[https://github.com/apache/hudi/pull/6751#issuecomment-1350468029]

 

HUDI-4972 introduces the risk of building release artifacts with different 
versions of dependencies on M1 Macbook (note that the release manager needs to 
build the artifacts locally and push them to the staging area using the release 
script). We should directly upgrade the versions without keeping two variants 
based on the OS/Arch.

> Fix the pom to not depend on specific OS and Arch
> -
>
> Key: HUDI-5388
> URL: https://issues.apache.org/jira/browse/HUDI-5388
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>
> [https://github.com/apache/hudi/pull/6751#issuecomment-1350468029]
>  
> HUDI-4972 introduces the risk of building release artifacts with different 
> versions of dependencies on M1 Macbook (note that the release manager needs 
> to build the artifacts locally and push them to the staging area using the 
> release script). We should directly upgrade the versions without keeping two 
> variants based on the OS/Arch.





[jira] [Updated] (HUDI-5388) Fix the pom to not depend on specific OS and Arch

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5388:

Priority: Critical  (was: Major)

> Fix the pom to not depend on specific OS and Arch
> -
>
> Key: HUDI-5388
> URL: https://issues.apache.org/jira/browse/HUDI-5388
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Critical
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/pull/6751#issuecomment-1350468029]
>  
> HUDI-4972 introduces the risk of building release artifacts with different 
> versions of dependencies on M1 Macbook (note that the release manager needs 
> to build the artifacts locally and push them to the staging area using the 
> release script). We should directly upgrade the versions without keeping two 
> variants based on the OS/Arch.





[jira] [Updated] (HUDI-5388) Fix the pom to not depend on specific OS and Arch

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5388:

Summary: Fix the pom to not depend on specific OS and Arch  (was: Fix the 
pom to not depend on specific OS/Arch)

> Fix the pom to not depend on specific OS and Arch
> -
>
> Key: HUDI-5388
> URL: https://issues.apache.org/jira/browse/HUDI-5388
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>






[jira] [Created] (HUDI-5388) Fix the pom to not depend on specific OS/Arch

2022-12-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5388:
---

 Summary: Fix the pom to not depend on specific OS/Arch
 Key: HUDI-5388
 URL: https://issues.apache.org/jira/browse/HUDI-5388
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[GitHub] [hudi] yihua commented on pull request #6751: [HUDI-4972] Fixes to make unit tests work on m1

2022-12-13 Thread GitBox


yihua commented on PR #6751:
URL: https://github.com/apache/hudi/pull/6751#issuecomment-1350468029

   After revisiting this PR, I feel that this introduces the risk of building 
release artifacts with different versions of dependencies on M1 Macbook (note 
that the release manager needs to build the artifacts locally and push them to 
the staging area using the release script).  We should directly upgrade the 
versions without keeping two variants based on the OS/Arch.  
[HUDI-5388](https://issues.apache.org/jira/browse/HUDI-5388) for followup.
   
   cc @nsivabalan @alexeykudinkin @xushiyan 





[jira] [Assigned] (HUDI-5025) Rollback failed with log file not found when rollOver in rollback process

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-5025:
---

Assignee: Ethan Guo

> Rollback failed with log file not found when rollOver in rollback process
> -
>
> Key: HUDI-5025
> URL: https://issues.apache.org/jira/browse/HUDI-5025
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: konwu
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
> Attachments: image-2022-10-13-18-15-57-023.png
>
>
> Currently, when a rollOver happens during rollback, no new log file is 
> created, which causes the exception below.
> some test log:
> 2022-10-13 16:58:54,613 INFO  
> org.apache.hudi.common.table.log.HoodieLogFormatWriter       [] - 
> HoodieLogFile\{pathStr='viewfs://dcfs/ns-common/car/dws/dws_order_info_by_flinkbatch_history/2022-10-12/.0002-1251-4f1c-8f75-71ff51071ee3_20221013052439696.log.1_2-4-0',
>  fileLen=0} exists. Appending to existing file
> 2022-10-13 16:58:54,974 INFO  
> org.apache.hudi.table.action.rollback.BaseRollbackHelper     [] - after 
> testrollback writer.LogFile: 
> viewfs://dcfs/ns-common/car/dws/dws_order_info_by_flinkbatch_history/2022-10-12/.0002-1251-4f1c-8f75-71ff51071ee3_20221013052439696.log.2_1-0-1
>  
> !image-2022-10-13-18-15-57-023.png!
>  





[GitHub] [hudi] alexeykudinkin opened a new pull request, #7454: [DNM] Release 0.12.2 branch candidate

2022-12-13 Thread GitBox


alexeykudinkin opened a new pull request, #7454:
URL: https://github.com/apache/hudi/pull/7454

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] kepplertreet opened a new issue, #7453: [SUPPORT] Hudi Upsert fails for

2022-12-13 Thread GitBox


kepplertreet opened a new issue, #7453:
URL: https://github.com/apache/hudi/issues/7453

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   After the initial bulk insert, I ran a streaming job with the Hudi configs 
mentioned below.
   The job fails to upsert for a given commit time. 
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Carry out a table bulk insert using the following Hudi Configs 
   2. Ran a Spark Structured Streaming Application on top of it for Incremental 
CDC
   
   **Expected behavior**
  Column stats are created and used for Incremental Upsert Operations 
   
   **Environment Description**
   
   * Hudi version : 0.11.1 (EMR)
   
   * Spark version :  3.3.0 (EMR)
   
   * Hive version : 3.1.3 (EMR)
   
   * Emr version : 6.8.0 
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : NO 
   
   **Additional context**
 
  - Hudi Configs
  *Bulk Insert*
  `"hoodie.table.name":  ,
   "hoodie.datasource.write.table.name":  ,
   "hoodie.datasource.write.table.type" : "MERGE_ON_READ", 
   "hoodie.datasource.write.recordkey.field": "id", 
   "hoodie.datasource.write.partitionpath.field" : 
"_year_month",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.SimpleKeyGenerator", 
   "hoodie.datasource.hive_sync.table" :  , 
   "hoodie.datasource.hive_sync.database" :  
, 
   "hoodie.datasource.write.row.writer.enable" : "true", 
   "hoodie.upsert.shuffle.parallelism": 6,
   "hoodie.bulkinsert.shuffle.parallelism" : 338, 
   "hoodie.table.version": "4",
   "hoodie.datasource.write.operation": "bulk_insert",
   "hoodie.datasource.write.hive_style_partitioning": 
"false",
   "hoodie.datasource.write.precombine.field": 
"_commit_time_ms",
   "hoodie.datasource.write.commitmeta.key.prefix": "_",
   "hoodie.datasource.write.insert.drop.duplicates": 
"false",
   "hoodie.datasource.hive_sync.enable": "true",
   "hoodie.datasource.hive_sync.use_jdbc": "true",
   "hoodie.datasource.hive_sync.auto_create_database": 
"true",
   "hoodie.datasource.hive_sync.support_timestamp": "false",
   "hoodie.datasource.hive_sync.skip_ro_suffix": "true",
   "hoodie.parquet.compression.codec": "snappy",
   "hoodie.metrics.on": "false",
   "hoodie.metadata.enable": "true",
   "hoodie.metadata.metrics.enable": "false",
   "hoodie.metadata.clean.async": "false",
   "hoodie.metadata.index.column.stats.enable": "true", 
   "hoodie.metadata.index.bloom.filter.enable": "true",
   "hoodie.datasource.compaction.async.enable": "false",
   "hoodie.compact.inline": "true",
   "hoodie.index.type": "BLOOM",
   "hoodie.parquet.small.file.limit": 209715200,
   "hoodie.parquet.max.file.size": 268435456`
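   For context, a minimal PySpark-style sketch of how options like the ones above are typically assembled and passed to a DataFrame writer. This is illustrative only; the table name, base path, and `df` are hypothetical placeholders, not values from this issue.

   ```python
   # Assemble the Hudi write options as a plain dict (placeholder values).
   hudi_options = {
       "hoodie.table.name": "my_table",  # hypothetical table name
       "hoodie.datasource.write.table.type": "MERGE_ON_READ",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.partitionpath.field": "_year_month",
       "hoodie.datasource.write.precombine.field": "_commit_time_ms",
       "hoodie.datasource.write.operation": "bulk_insert",
       "hoodie.metadata.enable": "true",
       "hoodie.metadata.index.column.stats.enable": "true",
   }

   # With a SparkSession and DataFrame `df` available, the write would look like:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)

   print(hudi_options["hoodie.datasource.write.operation"])  # → bulk_insert
   ```

   Passing the options as a single dict keeps the bulk-insert and streaming-upsert jobs consistent: only the `hoodie.datasource.write.operation` entry needs to change between them.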
   *Upsert (Spark Structured Streaming)*
   Property                                           Value
   hoodie.table.version                               4
   hoodie.datasource.write.operation                  upsert
   hoodie.datasource.write.hive_style_partitioning    false
   hoodie.datasource.write.precombine.field           _commit_time_ms
   hoodie.datasource.write.commitmeta.key.prefix      _
   hoodie.datasource.write.insert.drop.duplicates     false
   hoodie.datasource.hive_sync.enable                 true
   hoodie.datasource.hive_sync.use_jdbc               true
   hoodie.datasource.hive_sync.auto_create_database   true
   hoodie.datasource.hive_sync.support_timestamp      false
   hoodie.datasource.hive_sync.skip_ro_suffix
[jira] [Updated] (HUDI-5387) Add bundle validation for hudi-cli-bundle

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5387:

Description: We added a new hudi-cli-bundle in 
[https://github.com/apache/hudi/pull/7224] .  We should add bundle 
validation for hudi-cli-bundle in 
[https://github.com/apache/hudi/tree/master/packaging/bundle-validation] so 
that we can validate the basic functionality of the CLI bundle. 

> Add bundle validation for hudi-cli-bundle
> -
>
> Key: HUDI-5387
> URL: https://issues.apache.org/jira/browse/HUDI-5387
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.0
>
>
> We added a new hudi-cli-bundle in [https://github.com/apache/hudi/pull/7224] .  
> We should add bundle validation for hudi-cli-bundle in 
> [https://github.com/apache/hudi/tree/master/packaging/bundle-validation] so 
> that we can validate the basic functionality of the CLI bundle. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua commented on pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


yihua commented on PR #7224:
URL: https://github.com/apache/hudi/pull/7224#issuecomment-1350422376

   As a follow-up, we should add bundle validation for hudi-cli-bundle: 
HUDI-5387.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-5387) Add bundle validation for hudi-cli-bundle

2022-12-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5387:
---

 Summary: Add bundle validation for hudi-cli-bundle
 Key: HUDI-5387
 URL: https://issues.apache.org/jira/browse/HUDI-5387
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5387) Add bundle validation for hudi-cli-bundle

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5387:

Fix Version/s: 0.13.0

> Add bundle validation for hudi-cli-bundle
> -
>
> Key: HUDI-5387
> URL: https://issues.apache.org/jira/browse/HUDI-5387
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350419453

   
   ## CI report:
   
   * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


hudi-bot commented on PR #7224:
URL: https://github.com/apache/hudi/pull/7224#issuecomment-1350419147

   
   ## CI report:
   
   * 01aa2325fda6298bc776083260b5863f2124523d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13700)
 
   * 6c98c102cf86ca7158a89d284c73bc5652028297 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] yihua commented on a diff in pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


yihua commented on code in PR #7224:
URL: https://github.com/apache/hudi/pull/7224#discussion_r1048026350


##
packaging/hudi-cli-bundle/pom.xml:
##
@@ -0,0 +1,244 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  
+hudi
+org.apache.hudi
+0.13.0-SNAPSHOT
+../../pom.xml
+  
+  4.0.0
+  hudi-cli-bundle_${scala.binary.version}
+  jar
+
+  
+true
+${project.parent.basedir}
+true
+
+3.0.3
+2.0.2
+3.21.0
+2.6.2
+  
+
+  
+
+  
+org.apache.rat
+apache-rat-plugin
+  
+  
+org.apache.maven.plugins
+maven-shade-plugin
+${maven-shade-plugin.version}
+
+  
+package
+
+  shade
+
+
+  ${shadeSources}
+  
${project.build.directory}/dependency-reduced-pom.xml
+  
+  
+
+
+  true
+
+
+  META-INF/LICENSE
+  target/classes/META-INF/LICENSE
+
+
+  META-INF/spring.handlers
+
+
+  META-INF/spring.schemas
+
+
+  META-INF/spring.factories
+
+
+
+  org.apache.hudi.cli.Main
+
+
+  
META-INF/spring/org.springframework.boot.actuate.autoconfigure.web.ManagementContextConfiguration.imports
+
+
+  
META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports
+
+  
+  
+
+  
+  org.apache.hudi:hudi-cli
+  
org.apache.hudi:hudi-utilities_${scala.binary.version}
+  
+  com.fasterxml:classmate
+  com.fasterxml.woodstox:woodstox-core
+  com.google.code.gson:gson
+  com.google.re2j:re2j
+  com.jakewharton.fliptables:fliptables
+
+  jakarta.el:jakarta.el-api
+  jakarta.validation:jakarta.validation-api
+  net.java.dev.jna:jna
+
+  org.apache.commons:commons-configuration2

Review Comment:
   I resolved it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] yihua commented on a diff in pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


yihua commented on code in PR #7224:
URL: https://github.com/apache/hudi/pull/7224#discussion_r1048021060


##
packaging/hudi-cli-bundle/pom.xml:
##
@@ -0,0 +1,244 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  
+hudi
+org.apache.hudi
+0.13.0-SNAPSHOT
+../../pom.xml
+  
+  4.0.0
+  hudi-cli-bundle_${scala.binary.version}
+  jar
+
+  
+true
+${project.parent.basedir}
+true
+
+3.0.3
+2.0.2
+3.21.0
+2.6.2
+  
+
+  
+
+  
+org.apache.rat
+apache-rat-plugin
+  
+  
+org.apache.maven.plugins
+maven-shade-plugin
+${maven-shade-plugin.version}
+
+  
+package
+
+  shade
+
+
+  ${shadeSources}
+  
${project.build.directory}/dependency-reduced-pom.xml
+  
+  
+
+
+  true
+
+
+  META-INF/LICENSE
+  target/classes/META-INF/LICENSE
+
+
+  META-INF/spring.handlers
+
+
+  META-INF/spring.schemas
+
+
+  META-INF/spring.factories
+
+
+
+  org.apache.hudi.cli.Main
+
+
+  
META-INF/spring/org.springframework.boot.actuate.autoconfigure.web.ManagementContextConfiguration.imports
+
+
+  
META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports
+
+  
+  
+
+  
+  org.apache.hudi:hudi-cli
+  
org.apache.hudi:hudi-utilities_${scala.binary.version}
+  
+  com.fasterxml:classmate
+  com.fasterxml.woodstox:woodstox-core
+  com.google.code.gson:gson
+  com.google.re2j:re2j
+  com.jakewharton.fliptables:fliptables
+
+  jakarta.el:jakarta.el-api
+  jakarta.validation:jakarta.validation-api
+  net.java.dev.jna:jna
+
+  org.apache.commons:commons-configuration2

Review Comment:
   As the `commons-configuration2` dependency is removed, this `include` entry 
is no longer needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350361418

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * d2158f73ae32ae032293b86137aa477853b2df02 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13708)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-13 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1350358330

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 2d1a6bec193bc6064f049b70fb7e8dcfa9a97277 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13704)
 
   * 0d169ce9d63166dcfd93889019704b3edb71ed10 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13709)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350358060

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * 060d8e2673cb07498e34b20424da566a411f4e1d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13703)
 
   *  Unknown: [CANCELED](TBD) 
   * d2158f73ae32ae032293b86137aa477853b2df02 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #7224: [HUDI-5182] Create Hudi CLI Bundle

2022-12-13 Thread GitBox


hudi-bot commented on PR #7224:
URL: https://github.com/apache/hudi/pull/7224#issuecomment-1350357883

   
   ## CI report:
   
   * 01aa2325fda6298bc776083260b5863f2124523d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13700)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] wzx140 commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


wzx140 commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350357398

   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] soumilshah1995 closed issue #7451: [SUPPORT] o120.showString. java.lang.NullPointerException

2022-12-13 Thread GitBox


soumilshah1995 closed issue #7451: [SUPPORT]  o120.showString. 
java.lang.NullPointerException
URL: https://github.com/apache/hudi/issues/7451


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4526) Improve handling when the spillableMapBasePath disk directory is full

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4526:

Priority: Blocker  (was: Critical)

> Improve handling when the spillableMapBasePath disk directory is full
> ---
>
> Key: HUDI-4526
> URL: https://issues.apache.org/jira/browse/HUDI-4526
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> {code:java}
> // code placeholder
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/data13/yarnenv/local/filecache/72005/spark-jars.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/data/gaiaadmin/gaiaenv/tdwgaia/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 22/08/02 19:14:55 ERROR AbstractHoodieLogRecordReader: Got exception when 
> reading log file
> org.apache.hudi.exception.HoodieIOException: Unable to create 
> :/tmp/hudi-BITCASK-092a9065-a2b6-4a72-aff4-23a7072e8064
>   at 
> org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:122)
>   at 
> org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:197)
>   at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.processNextDeletedRecord(HoodieMergedLogRecordScanner.java:168)
>   at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>   at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
>   at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
>   at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
>   at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:192)
>   at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:110)
>   at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:103)
>   at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:324)
>   at 
> org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:370)
>   at 
> org.apache.hudi.HoodieMergeOnReadRDD$LogFileIterator.<init>(HoodieMergeOnReadRDD.scala:171)
>   at 
> org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:92)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1419)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Unable to create 
> :/tmp/hudi-BITCASK-092a9065-a2b6-4a72-aff4-23a7072e8064
>   at org.apache.hudi.common.util.FileIOUtils.mkdir(FileIOUtils.java:70)
>   at 
> org.apache.hudi.common.util.collection.DiskMap.<init>(DiskMap.java:55)
>   at 
> 
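The HoodieIOException above is raised when ExternalSpillableMap cannot create its spill directory under /tmp (for example because that disk is full). A minimal, hedged sketch of the defensive pattern the ticket asks for — try the configured path first, then fall back to another writable temp location instead of failing the whole read. This is illustrative only, not Hudi's actual implementation; the function name `ensure_spill_dir` is hypothetical.

```python
import os
import tempfile

def ensure_spill_dir(preferred: str) -> str:
    """Return a usable spill directory: try the configured path first and
    fall back to a fresh temp directory if it cannot be created (e.g. the
    disk holding it is full or the path is not writable)."""
    try:
        os.makedirs(preferred, exist_ok=True)
        return preferred
    except OSError:
        # Fall back to another writable location instead of aborting the scan.
        return tempfile.mkdtemp(prefix="hudi-spill-")

# The returned directory always exists, whichever branch was taken.
print(os.path.isdir(ensure_spill_dir(tempfile.mkdtemp())))  # → True
```

In Hudi itself the spill location is configurable (the `hoodie.memory.spillable.map.path` setting), so pointing it at a larger volume is the usual operational fix.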

[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-13 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1350352121

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 2d1a6bec193bc6064f049b70fb7e8dcfa9a97277 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13704)
 
   * 0d169ce9d63166dcfd93889019704b3edb71ed10 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-5318) Clustering scheduling lists all partitions in the table even when PARTITION_SELECTED is set

2022-12-13 Thread Qijun Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qijun Fu closed HUDI-5318.
--
 Reviewers: Shaofeng Li
Resolution: Fixed

> Clustering scheduling lists all partitions in the table even when 
> PARTITION_SELECTED is set
> 
>
> Key: HUDI-5318
> URL: https://issues.apache.org/jira/browse/HUDI-5318
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: Qijun Fu
>Assignee: Qijun Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> Currently PartitionAwareClusteringPlanStrategy lists all partitions in the 
> table whether PARTITION_SELECTED is set or not. Listing all partitions in the 
> dataset is a very expensive operation when the number of partitions is huge. 
> We can skip listing all partitions when PARTITION_SELECTED is set, so that 
> clustering scheduling can benefit a lot from partition pruning.
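The partition-pruning idea described above can be sketched as follows. This is a hedged illustration, not Hudi's actual implementation; `partitions_to_cluster` and `list_all_partitions` are hypothetical names.

```python
def partitions_to_cluster(selected, list_all_partitions):
    """selected: partitions from PARTITION_SELECTED (may be empty or None);
    list_all_partitions: a callable performing the expensive full listing."""
    if selected:
        # A selection is present: skip the expensive full listing entirely.
        return list(selected)
    return list_all_partitions()

# Example: with a selection present, the expensive listing is never invoked.
calls = []
def expensive_listing():
    calls.append(1)
    return ["p1", "p2", "p3"]

print(partitions_to_cluster(["p1"], expensive_listing))  # → ['p1']
print(calls)  # → [] — the full listing was skipped
```

The gain scales with the number of partitions: on a table with thousands of partitions, scheduling over a handful of selected partitions avoids a metadata scan that dominates the plan time.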



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5318) Clustering scheduling lists all partitions in the table even when PARTITION_SELECTED is set

2022-12-13 Thread Qijun Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qijun Fu updated HUDI-5318:
---
Fix Version/s: 0.12.2

> Clustering scheduling lists all partitions in the table even when 
> PARTITION_SELECTED is set
> 
>
> Key: HUDI-5318
> URL: https://issues.apache.org/jira/browse/HUDI-5318
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: Qijun Fu
>Assignee: Qijun Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> Currently PartitionAwareClusteringPlanStrategy lists all partitions in the 
> table whether PARTITION_SELECTED is set or not. Listing all partitions in the 
> dataset is a very expensive operation when the number of partitions is huge. 
> We can skip listing all partitions when PARTITION_SELECTED is set, so that 
> clustering scheduling can benefit a lot from partition pruning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-5318) Clustering scheduling lists all partitions in the table even when PARTITION_SELECTED is set

2022-12-13 Thread Qijun Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qijun Fu resolved HUDI-5318.


> Clustering scheduling lists all partitions in the table even when 
> PARTITION_SELECTED is set
> 
>
> Key: HUDI-5318
> URL: https://issues.apache.org/jira/browse/HUDI-5318
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: Qijun Fu
>Assignee: Qijun Fu
>Priority: Major
>  Labels: pull-request-available
>
> Currently PartitionAwareClusteringPlanStrategy lists all partitions in the 
> table whether PARTITION_SELECTED is set or not. Listing all partitions in the 
> dataset is a very expensive operation when the number of partitions is huge. 
> We can skip listing all partitions when PARTITION_SELECTED is set, so that 
> clustering scheduling can benefit a lot from partition pruning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5233) Fix bug when InternalSchemaUtils.collectTypeChangedCols returns all columns

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5233:

Priority: Blocker  (was: Major)

> Fix bug when InternalSchemaUtils.collectTypeChangedCols returns all columns
> ---
>
> Key: HUDI-5233
> URL: https://issues.apache.org/jira/browse/HUDI-5233
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Alexander Trushev
>Assignee: Alexander Trushev
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> InternalSchemaUtils.collectTypeChangedCols returns all columns instead of 
> changed ones



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5244) Fix bugs in schema evolution client with lost operation field and not found schema

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5244:

Priority: Blocker  (was: Major)

> Fix bugs in schema evolution client with lost operation field and not found 
> schema
> --
>
> Key: HUDI-5244
> URL: https://issues.apache.org/jira/browse/HUDI-5244
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Alexander Trushev
>Assignee: Alexander Trushev
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
>
> Currently, BaseHoodieWriteClient contains schema evolution methods such as
> * addColumn
> * deleteColumns
> * renameColumn
> * updateColumnNullability
> * updateColumnType
> * updateColumnComment
> * reOrderColPosition
> These methods are not covered with tests and contain two issues:
> # Lost operation field in avro schema
> # Not found schema for table
> {code:java}
> org.apache.hudi.exception.HoodieException: cannot find schema for current 
> table: /tmp/hudi
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.getInternalSchemaAndMetaClient(BaseHoodieWriteClient.java:1767)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.addColumn(BaseHoodieWriteClient.java:1673)
>   at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.test(TestWriteCopyOnWrite.java:454)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5244) Fix bugs in schema evolution client with lost operation field and not found schema

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5244:

Fix Version/s: 0.12.2

> Fix bugs in schema evolution client with lost operation field and not found 
> schema
> --
>
> Key: HUDI-5244
> URL: https://issues.apache.org/jira/browse/HUDI-5244
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Alexander Trushev
>Assignee: Alexander Trushev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
>
> Currently, BaseHoodieWriteClient contains schema evolution methods such as
> * addColumn
> * deleteColumns
> * renameColumn
> * updateColumnNullability
> * updateColumnType
> * updateColumnComment
> * reOrderColPosition
> These methods are not covered with tests and contain two issues:
> # Lost operation field in avro schema
> # Not found schema for table
> {code:java}
> org.apache.hudi.exception.HoodieException: cannot find schema for current 
> table: /tmp/hudi
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.getInternalSchemaAndMetaClient(BaseHoodieWriteClient.java:1767)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.addColumn(BaseHoodieWriteClient.java:1673)
>   at 
> org.apache.hudi.sink.TestWriteCopyOnWrite.test(TestWriteCopyOnWrite.java:454)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5260) Insert into sql with strict insert mode and no preCombineField should not overwrite existing records

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5260:

Priority: Critical  (was: Minor)

> Insert into sql with strict insert mode and no preCombineField should not 
> overwrite existing records
> 
>
> Key: HUDI-5260
> URL: https://issues.apache.org/jira/browse/HUDI-5260
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: kazdy
>Assignee: kazdy
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> Spark SQL insert updates the whole record if a record with the same PK already 
> exists in a Hudi table that has no preCombineField specified and strict insert 
> mode is used.
> To Reproduce
> Steps to reproduce the behavior:
> create table hudi_cow_nonpcf_tbl (
>   uuid int,
>   name string,
>   price double
> ) using hudi;
> set hoodie.sql.insert.mode=strict;
> # first insert
> insert into hudi_cow_nonpcf_tbl select 1, 'a1', 20;
> select * from hudi_cow_nonpcf_tbl;
> # returns
> 1    a1    20.0
> # another insert with the same key, different values:
> insert into hudi_cow_nonpcf_tbl select 1, 'a2', 30;
> select * from hudi_cow_nonpcf_tbl;
> # returns
> 1    a2    30.0
> Expected behavior
> There's a difference in behavior when a precombine field is specified, in 
> which case Hudi throws an error.
> I would expect the second insert to fail if a record with the same key already 
> exists, when no precombine field is specified and strict insert mode is 
> enabled.
> https://github.com/apache/hudi/issues/7266
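The expected strict-insert semantics described above can be sketched as follows. This is purely illustrative of the desired behavior, not Hudi's code: with `hoodie.sql.insert.mode=strict`, an insert whose record key already exists should fail regardless of whether a precombine field is configured. The `strict_insert` function is a hypothetical model.

```python
def strict_insert(table: dict, key, row, has_precombine: bool):
    """Model of strict insert mode: reject duplicate keys, with or without
    a precombine field (`has_precombine` should make no difference)."""
    if key in table:
        raise ValueError(f"Duplicate key {key} not allowed in strict insert mode")
    table[key] = row

tbl = {}
strict_insert(tbl, 1, ("a1", 20.0), has_precombine=False)
try:
    # The second insert with the same key should fail, not overwrite.
    strict_insert(tbl, 1, ("a2", 30.0), has_precombine=False)
except ValueError as e:
    print(e)  # duplicate rejected; table still holds ("a1", 20.0)
```

The bug reported here is that, in the no-precombine case, the second insert silently overwrites the row instead of raising.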



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350344575

   > > @alexeykudinkin At present, the problem I encounter is not only that the 
Spark datasource cannot be read after it is written, but also that the Spark 
sql cannot be read after it is written by Flink using hive sync. In other 
words, the SparkSQL query can not immediately read new data in any other way 
except by writing data in SQL. Therefore, I think this is a problem that needs 
to be solved
   > 
   > Interesting. Can you please create another issue specifically for this one 
as this hardly could be related?
   
   @alexeykudinkin here https://github.com/apache/hudi/issues/7452


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5277) RunClusteringProcedure can't exit correctly

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5277:

Priority: Blocker  (was: Minor)

> RunClusteringProcedure can't exit correctly
> --
>
> Key: HUDI-5277
> URL: https://issues.apache.org/jira/browse/HUDI-5277
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Qijun Fu
>Assignee: Qijun Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> When executing RunClusteringProcedure, we found that the Spark program 
> could not exit although the clustering finished successfully. 
> We then checked the threads and found that some timeline server threads 
> were still running. We should close the timeline server before exiting 
> RunClusteringProcedure. This problem does not exist only in 
> RunClusteringProcedure; other procedures such as 
> RunCompactionProcedure have the same problem.
> We have submitted a PR to fix the problem in clustering.
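The shutdown symptom described in this issue can be sketched in plain Java. This is a minimal, hypothetical stand-in (the class and method names below are invented for illustration and are not Hudi's actual timeline server API): a non-daemon worker thread keeps the JVM alive until the server is explicitly closed, so closing it in a finally block is what lets the procedure's driver exit.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for an embedded timeline server: it owns a
// non-daemon thread, which keeps the JVM alive until close() is called.
class EmbeddedServer implements AutoCloseable {
    final CountDownLatch stop = new CountDownLatch(1);
    final Thread worker = new Thread(() -> {
        try { stop.await(); } catch (InterruptedException ignored) { }
    }, "timeline-server");

    EmbeddedServer() { worker.start(); }

    boolean isRunning() { return worker.isAlive(); }

    @Override
    public void close() throws InterruptedException {
        stop.countDown();   // signal the worker to exit
        worker.join();      // wait until the thread is really gone
    }
}

public class ProcedureShutdownDemo {
    public static void main(String[] args) throws Exception {
        EmbeddedServer server = new EmbeddedServer();
        try {
            // ... run the clustering/compaction work here ...
            System.out.println("server running: " + server.isRunning()); // true
        } finally {
            // Without this, the non-daemon thread keeps the JVM alive
            // even after the procedure's work has finished.
            server.close();
        }
        System.out.println("server running: " + server.isRunning()); // false
    }
}
```

The same pattern applies to any procedure that starts background services: whatever owns the service should close it on the exit path, success or failure.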



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5252) ClusteringCommitSink supports to rollback clustering

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5252:

Priority: Blocker  (was: Major)

> ClusteringCommitSink supports to rollback clustering
> 
>
> Key: HUDI-5252
> URL: https://issues.apache.org/jira/browse/HUDI-5252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
>
> When the commit buffer contains a failed ClusteringCommitEvent, the 
> ClusteringCommitSink invokes CompactionUtil#rollbackCompaction to roll back 
> the clustering. ClusteringCommitSink should instead call 
> ClusteringUtil#rollbackClustering to roll back the clustering. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] JoshuaZhuCN opened a new issue, #7452: [SUPPORT]SparkSQL can not read the latest data(snapshot mode) after write by flink

2022-12-13 Thread GitBox


JoshuaZhuCN opened a new issue, #7452:
URL: https://github.com/apache/hudi/issues/7452

   SparkSQL cannot read the latest data (snapshot mode) after it is written by Flink.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. run a Spark session first (e.g. spark-sql, spark-shell, thrift-server)
   2. insert data into Hudi via the Flink stream API using hive sync
   3. upsert data into Hudi via the Flink stream API using hive sync
   4. query with Spark SQL; it cannot read the latest data from step 3
   5. execute `refresh table xxx` with Spark SQL
   6. query with Spark SQL again; now it can read the latest data from step 3
   
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Spark version : 3.1.3
   
   * Hive version : 3.1.0
   
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5206) RowColumnReader should not return null value for certain null child columns

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5206:

Priority: Critical  (was: Major)

> RowColumnReader should not return null value for certain null child columns
> ---
>
> Key: HUDI-5206
> URL: https://issues.apache.org/jira/browse/HUDI-5206
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When reading into the vector of a row-type column whose child columns are 
> null, RowColumnReader should not return a null value, because the value of 
> the row-type column itself may not be null; returning null results in 
> incorrect values for the row-type column.
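The invariant behind this bug can be illustrated with a tiny, self-contained Java sketch. It is hypothetical and does not use Flink's actual vectorized reader classes: a row value is modeled as an Object[] of child values, and the point is that null-ness must be decided from the row itself, not from its children.

```java
import java.util.Arrays;
import java.util.Objects;

public class RowNullDemo {
    // Correct: a row is null only when the row value itself is null.
    static boolean rowIsNull(Object[] row) {
        return row == null;
    }

    // Buggy variant resembling the report: a row with any null child is
    // treated as a null row, which discards its non-null children.
    static boolean buggyRowIsNull(Object[] row) {
        return row == null || Arrays.stream(row).anyMatch(Objects::isNull);
    }

    public static void main(String[] args) {
        Object[] row = new Object[] { 42, null };  // non-null row, one null child
        System.out.println(rowIsNull(row));        // false: the row survives
        System.out.println(buggyRowIsNull(row));   // true: the row is wrongly dropped
    }
}
```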



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5008) Avoid unset HoodieROTablePathFilter in IncrementalRelation

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5008:

Priority: Critical  (was: Major)

> Avoid unset HoodieROTablePathFilter in IncrementalRelation
> --
>
> Key: HUDI-5008
> URL: https://issues.apache.org/jira/browse/HUDI-5008
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: incremental-query, spark
>Reporter: Hui An
>Assignee: Hui An
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> If users create an IncrementalRelation while joining another existing Hive 
> Hudi table, all files under the Hive Hudi table will be selected, because the 
> pathFilter is unset inside IncrementalRelation.
> HoodieROTablePathFilter can now accept {{as.of.instant}} to do time travel, 
> so instead we pass as.of.instant to the DataFrame (without changing the Spark 
> Hadoop conf globally) to avoid this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5185) Compaction run fails with --hoodieConfigs

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5185:

Fix Version/s: 0.12.2

> Compaction run fails with --hoodieConfigs
> -
>
> Key: HUDI-5185
> URL: https://issues.apache.org/jira/browse/HUDI-5185
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: xi chaomin
>Assignee: xi chaomin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> compaction run --schemaFilePath /tmp/compaction.schema --hoodieConfigs 
> hoodie.embed.timeline.server=false
> {code:java}
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> 11150 [main] ERROR org.apache.hudi.cli.commands.SparkMain [] - Fail to 
> execute commandString
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> java.lang.IllegalArgumentException: null
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
>  ~[hudi-common-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.lambda$buildProperties$0(UtilHelpers.java:233)
>  ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_271]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.buildProperties(UtilHelpers.java:231) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.HoodieCompactor.<init>(HoodieCompactor.java:69) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:420) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:156) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5185) Compaction run fails with --hoodieConfigs

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-5185.
---
Resolution: Fixed

> Compaction run fails with --hoodieConfigs
> -
>
> Key: HUDI-5185
> URL: https://issues.apache.org/jira/browse/HUDI-5185
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: xi chaomin
>Assignee: xi chaomin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> compaction run --schemaFilePath /tmp/compaction.schema --hoodieConfigs 
> hoodie.embed.timeline.server=false
> {code:java}
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> 11150 [main] ERROR org.apache.hudi.cli.commands.SparkMain [] - Fail to 
> execute commandString
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> java.lang.IllegalArgumentException: null
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
>  ~[hudi-common-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.lambda$buildProperties$0(UtilHelpers.java:233)
>  ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_271]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.buildProperties(UtilHelpers.java:231) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.HoodieCompactor.<init>(HoodieCompactor.java:69) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:420) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:156) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5185) Compaction run fails with --hoodieConfigs

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5185:

Priority: Blocker  (was: Major)

> Compaction run fails with --hoodieConfigs
> -
>
> Key: HUDI-5185
> URL: https://issues.apache.org/jira/browse/HUDI-5185
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: xi chaomin
>Assignee: xi chaomin
>Priority: Blocker
>  Labels: pull-request-available
>
> compaction run --schemaFilePath /tmp/compaction.schema --hoodieConfigs 
> hoodie.embed.timeline.server=false
> {code:java}
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> 11150 [main] ERROR org.apache.hudi.cli.commands.SparkMain [] - Fail to 
> execute commandString
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 
> java.lang.IllegalArgumentException: null
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
>  ~[hudi-common-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.lambda$buildProperties$0(UtilHelpers.java:233)
>  ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_271]
> 131915 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.UtilHelpers.buildProperties(UtilHelpers.java:231) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.hudi.utilities.HoodieCompactor.<init>(HoodieCompactor.java:69) 
> ~[hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:420) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:156) 
> ~[hudi-cli-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131916 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
> 131917 [Thread-10] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -   
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5067) Merge the columns stats of multiple log blocks from the same log file

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5067:

Priority: Critical  (was: Major)

> Merge the columns stats of multiple log blocks from the same log file
> -
>
> Key: HUDI-5067
> URL: https://issues.apache.org/jira/browse/HUDI-5067
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core
>Reporter: Danny Chen
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5025) Rollback failed with log file not found when rollOver in rollback process

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5025:

Priority: Blocker  (was: Major)

> Rollback failed with log file not found when rollOver in rollback process
> -
>
> Key: HUDI-5025
> URL: https://issues.apache.org/jira/browse/HUDI-5025
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: konwu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
> Attachments: image-2022-10-13-18-15-57-023.png
>
>
> Currently, when a rollover happens during rollback, a new log file is not 
> created, which causes the exception below. 
> Some test logs:
> 2022-10-13 16:58:54,613 INFO  
> org.apache.hudi.common.table.log.HoodieLogFormatWriter       [] - 
> HoodieLogFile\{pathStr='viewfs://dcfs/ns-common/car/dws/dws_order_info_by_flinkbatch_history/2022-10-12/.0002-1251-4f1c-8f75-71ff51071ee3_20221013052439696.log.1_2-4-0',
>  fileLen=0} exists. Appending to existing file
> 2022-10-13 16:58:54,974 INFO  
> org.apache.hudi.table.action.rollback.BaseRollbackHelper     [] - after 
> testrollback writer.LogFile: 
> viewfs://dcfs/ns-common/car/dws/dws_order_info_by_flinkbatch_history/2022-10-12/.0002-1251-4f1c-8f75-71ff51071ee3_20221013052439696.log.2_1-0-1
>  
> !image-2022-10-13-18-15-57-023.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5088) Failed to synchronize the hive metadata of the Flink table

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5088:

Priority: Blocker  (was: Major)

> Failed to synchronize the hive metadata of the Flink table
> --
>
> Key: HUDI-5088
> URL: https://issues.apache.org/jira/browse/HUDI-5088
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink-sql
>Affects Versions: 0.12.1
>Reporter: waywtdcc
>Assignee: waywtdcc
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2, 0.13.0
>
> Attachments: image-2022-10-27-13-51-06-216.png
>
>
> The Flink table failed to synchronize the Hive metadata. After using the 
> Flink catalog and specifying changelog.enabled=true and 
> hive_sync.skip_ro_suffix=true, real-time writing to the table fails to 
> synchronize the Hive metadata.
> I guess the reason is:
> When using the Flink catalog to create the Flink table, the Hive table is 
> created first. The Hive table created by default has no _hoodie_operation 
> field, but the Flink table has this field. The 
> inconsistent fields cause the synchronization failure.
>  * sql
>  
>  * 
> {code:java}
> CREATE TABLE datagen (
>  id INT,
>  name string,
>  ts3  timestamp(3)
> ) WITH (
>   'connector' = 'datagen',
>   'rows-per-second'='10',
>   'fields.id.kind'='sequence',
>   'fields.id.start'='1',
>   'fields.id.end'='1000'
>   );
> CREATE CATALOG myhudi WITH(
> 'type' = 'hudi',
> 'default-database' = 'default',
> 'catalog.path' = '/user/hdpu/warehouse',
> 'mode' = 'hms',
> 'hive.conf.dir' = 'hdfs:///user/hdpu/streamx/conf_data/hive_conf',
> -- table-prop.* properties: used to add attributes by default when 
> creating tables.
>  'table-prop.connector' =  'hudi',
> 'table-prop.table.type' =  'MERGE_ON_READ',
> 'table-prop.compaction.tasks' = '4',
> 'table-prop.write.tasks' = '4',
> 'table-prop.index.bootstrap.enabled' = 'true',
> 'table-prop.hive_sync.skip_ro_suffix' = 'true',
> 'table-prop.compaction.delta_commits' = '1',
> 'table-prop.compaction.async.enabled' = 'true',
> 'table-prop.changelog.enabled' = 'true',
> 'table-prop.index.type' = 'BUCKET',
> 'table-prop.index.global.enabled' = 'true',
> 'table-prop.read.utc-timezone' = 'false'
>  );
> CREATE CATALOG myhive WITH (
>   'type' = 'hive',
>   'default-database' = 'default',
>   'hive-conf-dir' = 'hdfs:///user/hdpu/streamx/conf_data/hive_conf'
> );
> drop table if exists  myhive.test_hudi3.hudi_datagen_incre;
> drop table if exists  myhudi.test_hudi3.hudi_datagen_incre;
> create table if not exists myhudi.test_hudi3.hudi_datagen_incre
> (id bigint not null, name string,ts3 timestamp(3)
> ,PRIMARY KEY (`id`) NOT ENFORCED
> )
> ;
> show create table myhudi.test_hudi3.hudi_datagen_incre;
> insert into  myhudi.test_hudi3.hudi_datagen_incre
> select id,name,ts3
> from datagen;
>  {code}
>  * error
> {code:java}
> 22/10/25 13:55:01 ERROR HMSDDLExecutor: Failed to update table for 
> hudi_datagen_incre_ro
> InvalidOperationException(message:The following columns have types 
> incompatible with the existing columns in their respective positions :
> id)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:59744)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:59730)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result.read(ThriftHiveMetastore.java:59672)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1693)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1677)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:375)
>     at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:322)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at 

[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: image-2022-12-14-11-26-37-252.png

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writes get the same 
> `instantsToRollback`, the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: (was: 1670986960525.jpg)

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writes get the same 
> `instantsToRollback`, the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: image-2022-12-14-11-26-21-995.png

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writes get the same 
> `instantsToRollback`, the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Attachment: (was: WechatIMG70.jpeg)

> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writes get the same 
> `instantsToRollback`, the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5386:

Description: 
{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not locked, multiple writes get the same 
`instantsToRollback`, the same `instant` will be deleted multiple times and the 
same `rollback.inflight` will be created multiple times.

!image-2022-12-14-11-26-37-252.png!

!image-2022-12-14-11-26-21-995.png!

  was:
{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not locked, multiple writes get the same 
`instantsToRollback`, the same `instant` will be deleted multiple times and the 
same `rollback.inflight` will be created multiple times.


> Rollback conflict in occ mode
> -
>
> Key: HUDI-5386
> URL: https://issues.apache.org/jira/browse/HUDI-5386
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: HunterXHunter
>Priority: Major
> Attachments: image-2022-12-14-11-26-21-995.png, 
> image-2022-12-14-11-26-37-252.png
>
>
> {code:java}
> configuration parameter: 
> 'hoodie.cleaner.policy.failed.writes' = 'LAZY'
> 'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
> Because `getInstantsToRollback` is not locked, multiple writers get the same 
> `instantsToRollback`, so the same `instant` will be deleted multiple times and 
> the same `rollback.inflight` will be created multiple times.
> !image-2022-12-14-11-26-37-252.png!
> !image-2022-12-14-11-26-21-995.png!





[jira] [Created] (HUDI-5386) Rollback conflict in occ mode

2022-12-13 Thread HunterXHunter (Jira)
HunterXHunter created HUDI-5386:
---

 Summary: Rollback conflict in occ mode
 Key: HUDI-5386
 URL: https://issues.apache.org/jira/browse/HUDI-5386
 Project: Apache Hudi
  Issue Type: Bug
Reporter: HunterXHunter
 Attachments: 1670986960525.jpg, WechatIMG70.jpeg

{code:java}
configuration parameter: 
'hoodie.cleaner.policy.failed.writes' = 'LAZY'
'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control' {code}
Because `getInstantsToRollback` is not locked, multiple writers get the same 
`instantsToRollback`, so the same `instant` will be deleted multiple times and the 
same `rollback.inflight` will be created multiple times.
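The race can be sketched in a few lines of Python. This is a toy model, not Hudi code; names such as `pending` and `rollback_with_lock` are illustrative. Fetching the instants to roll back inside the lock (the analogue of calling `getInstantsToRollback` under the transaction lock) ensures each instant is rolled back exactly once:

```python
import threading

# Toy model of the race in HUDI-5386 (illustrative names, not Hudi code):
# `pending` stands in for the timeline's failed instants; rolling back means
# deleting the instant and creating its rollback.inflight exactly once.
pending = {"20221213T110000"}
lock = threading.Lock()
rolled_back = []

def rollback_with_lock():
    # Read the instants to roll back *inside* the lock, so a second writer
    # sees an empty set instead of picking up the same instant again.
    with lock:
        for instant in list(pending):
            pending.discard(instant)
            rolled_back.append(instant)

writers = [threading.Thread(target=rollback_with_lock) for _ in range(4)]
for w in writers:
    w.start()
for w in writers:
    w.join()

print(rolled_back)  # ['20221213T110000'] -- rolled back exactly once
```

Without the lock, all four writers could read the same non-empty `pending` set before any of them deletes the instant, which is the duplicate-rollback symptom described above.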





[GitHub] [hudi] xushiyan commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-12-13 Thread GitBox


xushiyan commented on PR #7139:
URL: https://github.com/apache/hudi/pull/7139#issuecomment-1350337021

   @dongkelun Thanks for making the patch. The root cause here is that we did not 
support converting the data source write config into table config when using 
saveAsTable(), where the table was not created yet and the Hudi catalog table should 
handle this conversion. I made this patch: https://github.com/apache/hudi/pull/7448


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] alexeykudinkin commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


alexeykudinkin commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350336358

   > @alexeykudinkin I think the query engine should not limit the writing way 
for querying data. Even for tables created by SparkSQL, the query engine 
should be able to query new data regardless of the way in which the data is 
written by the Spark datasource, Spark SQL, the Java client, Flink SQL, or the Flink 
stream API, without requiring users to do additional operations for different 
writing methods when using the query engine.
   
   This is not a limitation of the query engine; it is a limitation of how 
you're using it. When writing to a table specified as a path, the following 
issues are at play:
   
   1. Spark SQL will cache the Relation within the session cache when queried
   2. When writing to a table identified by a full path rather than a name, 
Spark has no way to invalidate the SQL session cache (since it doesn't have the 
table identifier)
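
The caching behavior described here can be illustrated with a toy model (pure Python, not Spark's actual `SessionCatalog` implementation; all names are illustrative): a relation cached under a table identifier cannot be evicted by a writer that only knows the storage path.

```python
# Toy model of a session relation cache keyed by table identifier.
class SessionRelationCache:
    def __init__(self):
        self._cache = {}  # table identifier -> cached relation/data

    def query(self, table, load):
        # The first query caches the relation; later queries reuse it.
        if table not in self._cache:
            self._cache[table] = load()
        return self._cache[table]

    def refresh_table(self, table):
        # What an explicit `REFRESH TABLE xxx` achieves: drop the cached entry.
        self._cache.pop(table, None)

storage = {"rows": 1}                       # data living at the table's path
cache = SessionRelationCache()
assert cache.query("db.t", lambda: storage["rows"]) == 1
storage["rows"] = 2                         # path-only write: no table id, no eviction
assert cache.query("db.t", lambda: storage["rows"]) == 1   # stale read
cache.refresh_table("db.t")
assert cache.query("db.t", lambda: storage["rows"]) == 2   # fresh after refresh
```

The stale read in the middle is exactly why `refresh table xxx` is needed when the write path bypasses the identifier the cache is keyed on.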





[jira] [Closed] (HUDI-5203) Debezium payload does not handle null-field cases

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-5203.
---
Resolution: Fixed

> Debezium payload does not handle null-field cases
> -
>
> Key: HUDI-5203
> URL: https://issues.apache.org/jira/browse/HUDI-5203
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> https://github.com/apache/hudi/issues/7152





[jira] [Updated] (HUDI-5203) Debezium payload does not handle null-field cases

2022-12-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5203:

Priority: Critical  (was: Major)

> Debezium payload does not handle null-field cases
> -
>
> Key: HUDI-5203
> URL: https://issues.apache.org/jira/browse/HUDI-5203
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> https://github.com/apache/hudi/issues/7152





[GitHub] [hudi] alexeykudinkin commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


alexeykudinkin commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350331029

   > @alexeykudinkin Let's resolve the related issues all in this one. Re-open 
it.
   
   Can you elaborate why you think these issues are related?





[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350311469

   > > @alexeykudinkin i don't understand what "write into the table by its id" 
means, just using sql like insert into/update/delete from db.table to write 
data?
   > 
   > Correct. You can do the same from Spark DS.
   > 
   
   @alexeykudinkin I think the query engine should not limit the writing method 
for querying data. Even for tables created by SparkSQL, the query engine 
should be able to query new data regardless of the way in which the data is 
written via the Spark datasource, Spark SQL, the Java client, Flink SQL, or the Flink 
stream APIs, without requiring users to do additional operations for different 
writing methods when using the query engine.
   > > @alexeykudinkin At present, the problem I encounter is not only that the 
Spark datasource cannot be read after it is written, but also that the Spark 
sql cannot be read after it is written by Flink using hive sync. In other 
words, the SparkSQL query can not immediately read new data in any other way 
except by writing data in SQL. Therefore, I think this is a problem that needs 
to be solved
   > 
   > Interesting. Can you please create another issue specifically for this one 
as this hardly could be related?
   I'll verify it again.
   





[GitHub] [hudi] danny0405 commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


danny0405 commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350309180

   @alexeykudinkin Let's resolve the related issues all in this one. Re-open it.





[GitHub] [hudi] soumilshah1995 opened a new issue, #7451: [SUPPORT] Failed to upsert for commit time 20221214022409792

2022-12-13 Thread GitBox


soumilshah1995 opened a new issue, #7451:
URL: https://github.com/apache/hudi/issues/7451

   Hello, I am using Glue 4.0 and I am trying to make a tutorial for the community. 
The title is "Move data from DynamoDB into Apache Hudi".
   
   # Table 
   
![image](https://user-images.githubusercontent.com/39345855/207491290-6a23b4ce-1ed9-4e2c-8839-03afdb11b883.png)
   
   # Sample JSON
   ```
   {
 "id": {
   "S": "bbbe96f6-f762-4750-a946-a5e0dc5d0fa5"
 },
 "address": {
   "S": "361 Terri Rapids\nDonnaberg, ID 55397"
 },
 "city": {
   "S": "361 Terri Rapids\nDonnaberg, ID 55397"
 },
 "first_name": {
   "S": "Melissa"
 },
 "last_name": {
   "S": "Frederick"
 },
 "state": {
   "S": "Miss what personal door country energy rate. Court school its 
indicate. Remember finally because debate role hospital appear."
 },
 "text": {
   "S": "Miss what personal door country energy rate. Court school its 
indicate. Remember finally because debate role hospital appear."
 }
   }
   ```
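
Note that the sample record above is in DynamoDB's AttributeValue JSON format (every field wrapped as `{"S": ...}`). A minimal sketch of unwrapping such a record into a flat dict before writing to Hudi, assuming all attributes are strings as in the sample (`unwrap` is a hypothetical helper, not part of the Glue job below):

```python
def unwrap(item):
    # Strip the DynamoDB type wrapper; only the "S" (string) type descriptor
    # is handled because every attribute in the sample record is a string.
    return {key: value["S"] for key, value in item.items()}

record = {"id": {"S": "bbbe96f6-f762-4750-a946-a5e0dc5d0fa5"},
          "first_name": {"S": "Melissa"}}
print(unwrap(record))
# {'id': 'bbbe96f6-f762-4750-a946-a5e0dc5d0fa5', 'first_name': 'Melissa'}
```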
   
    Glue table 
   
![image](https://user-images.githubusercontent.com/39345855/207491453-876af9d6-2be9-4cd2-a1d5-8c0e5b6e9565.png)
   
   ```
   import sys
   from awsglue.transforms import *
   from awsglue.utils import getResolvedOptions
   from pyspark.context import SparkContext
   from awsglue.context import GlueContext
   from awsglue.job import Job
   
   args = getResolvedOptions(sys.argv, ["JOB_NAME"])
   sc = SparkContext()
   glueContext = GlueContext(sc)
   spark = glueContext.spark_session
   job = Job(glueContext)
   job.init(args["JOB_NAME"], args)
   
   # Script generated for node AWS Glue Data Catalog
   AWSGlueDataCatalog_node1670979530578 = 
glueContext.create_dynamic_frame.from_catalog(
   database="dev.dynamodbdb",
   table_name="dev_users",
   transformation_ctx="AWSGlueDataCatalog_node1670979530578",
   )
   
   # Script generated for node Rename Field
   RenameField_node1670981729330 = RenameField.apply(
   frame=AWSGlueDataCatalog_node1670979530578,
   old_name="id",
   new_name="pk",
   transformation_ctx="RenameField_node1670981729330",
   )
   
   # Script generated for node Change Schema (Apply Mapping)
   ChangeSchemaApplyMapping_node1670981753064 = ApplyMapping.apply(
   frame=RenameField_node1670981729330,
   mappings=[
   ("address", "string", "address", "string"),
   ("city", "string", "city", "string"),
   ("last_name", "string", "last_name", "string"),
   ("text", "string", "text", "string"),
   ("pk", "string", "pk", "string"),
   ("state", "string", "state", "string"),
   ("first_name", "string", "first_name", "string"),
   ],
   transformation_ctx="ChangeSchemaApplyMapping_node1670981753064",
   )
   
   additional_options={
   "hoodie.datasource.hive_sync.database": "hudidb",
   "hoodie.table.name": "hudi_table",
   "hoodie.datasource.hive_sync.table": "hudi_table",
   
   "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
   "hoodie.datasource.write.operation": "upsert",
   "hoodie.datasource.write.recordkey.field": "pk",
   "hoodie.datasource.write.precombine.field": "pk",
   "hoodie.combine.before.delete":"false",
   
   
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "hoodie.datasource.hive_sync.enable": "true",
   'hoodie.datasource.hive_sync.sync_as_datasource': 'false',
   "hoodie.datasource.hive_sync.use_jdbc": "false",
   
   'hoodie.datasource.hive_sync.partition_extractor_class': 
'org.apache.hudi.hive.MultiPartKeysValueExtractor',
   "hoodie.datasource.hive_sync.mode": "hms",
   "path": "s3://glue-learn-begineers/data/"
   }
   df = ChangeSchemaApplyMapping_node1670981753064.toDF()
   
   
df.write.format("hudi").options(**additional_options).mode("overwrite").save()
   
   job.commit()
   
   ```
   
    Error Message 
   
   An error occurred while calling o138.save. Failed to upsert for commit time 
20221214022409792
   # Detailed Logs 
   ```
   2022-12-14 02:24:16,947 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls to: spark
   2022-12-14 02:24:16,948 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing view acls groups to: 
   2022-12-14 02:24:16,949 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): Changing modify acls groups to: 
   2022-12-14 02:24:16,949 INFO [main] spark.SecurityManager 
(Logging.scala:logInfo(61)): SecurityManager: authentication enabled; ui acls 
disabled; users  with view permissions: Set(spark); groups with view 
permissions: Set(); users  with modify permissions: Set(spark); groups with 
modify permissions: Set()
   2022-12-14 02:24:17,452 INFO [netty-rpc-connection-0] 
client.TransportClientFactory (TransportClientFactory.java:createClient(310)): 

[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-13 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1350285610

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3a660182f2351faa568087baa1de216d4151702a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13679)
 
   * 2d1a6bec193bc6064f049b70fb7e8dcfa9a97277 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13704)
 
   * 0d169ce9d63166dcfd93889019704b3edb71ed10 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch master updated (6de923cfdfd -> f56531f5489)

2022-12-13 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6de923cfdfd [HUDI-5318] Fix partition pruning for clustering 
scheduling (#7366)
 add f56531f5489 [HUDI-4961] Support optional table synchronization to 
hive. (#7398)

No new revisions were added by this update.

Summary of changes:
 ...ingPolicy.java => HoodieSyncTableStrategy.java} |  6 +++---
 .../apache/hudi/configuration/FlinkOptions.java|  7 +++
 .../apache/hudi/sink/utils/HiveSyncContext.java|  2 ++
 .../org/apache/hudi/hive/HiveSyncConfigHolder.java |  6 ++
 .../java/org/apache/hudi/hive/HiveSyncTool.java| 24 ++
 5 files changed, 38 insertions(+), 7 deletions(-)
 copy 
hudi-common/src/main/java/org/apache/hudi/common/model/{HoodieCleaningPolicy.java
 => HoodieSyncTableStrategy.java} (86%)



[GitHub] [hudi] XuQianJin-Stars merged pull request #7398: [HUDI-4961] Support optional table synchronization to hive.

2022-12-13 Thread GitBox


XuQianJin-Stars merged PR #7398:
URL: https://github.com/apache/hudi/pull/7398





[GitHub] [hudi] hudi-bot commented on pull request #7423: [HUDI-5384] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2022-12-13 Thread GitBox


hudi-bot commented on PR #7423:
URL: https://github.com/apache/hudi/pull/7423#issuecomment-1350279292

   
   ## CI report:
   
   * 2905580eede076436b472c22da2f2d6af27d1e1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13699)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-13 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1350279402

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3a660182f2351faa568087baa1de216d4151702a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13679)
 
   * 2d1a6bec193bc6064f049b70fb7e8dcfa9a97277 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13704)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350279525

   
   ## CI report:
   
   * 80a0afc67d1e03473cdf9057375519158fee92d5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13705)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13701)
 
   * f11664234aaf6c74c98c1d75a364770931f9c00b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13706)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350279052

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * 24c41da0d3a33d1f851e463715d4494cbd873a86 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13696)
 
   * 060d8e2673cb07498e34b20424da566a411f4e1d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13703)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350273315

   
   ## CI report:
   
   * 80a0afc67d1e03473cdf9057375519158fee92d5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13705)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13701)
 
   * f11664234aaf6c74c98c1d75a364770931f9c00b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7440: [HUDI-5377] Write call stack information to lock file

2022-12-13 Thread GitBox


hudi-bot commented on PR #7440:
URL: https://github.com/apache/hudi/pull/7440#issuecomment-1350273184

   
   ## CI report:
   
   * 67e64ca0d35342d303f5c0027db72ec4c14f1890 UNKNOWN
   * 3a660182f2351faa568087baa1de216d4151702a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13679)
 
   * 2d1a6bec193bc6064f049b70fb7e8dcfa9a97277 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350272921

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * 24c41da0d3a33d1f851e463715d4494cbd873a86 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13696)
 
   * 060d8e2673cb07498e34b20424da566a411f4e1d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350266753

   
   ## CI report:
   
   * 80a0afc67d1e03473cdf9057375519158fee92d5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13701)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13705)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7449: [HUDI-5261] Use Proper Parallelism for Engine Context APIs

2022-12-13 Thread GitBox


hudi-bot commented on PR #7449:
URL: https://github.com/apache/hudi/pull/7449#issuecomment-1350266678

   
   ## CI report:
   
   * 46cb8c24bff18e89b912f53b759387d91d420019 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13698)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7345: [HUDI-3378] RFC46 rebase

2022-12-13 Thread GitBox


hudi-bot commented on PR #7345:
URL: https://github.com/apache/hudi/pull/7345#issuecomment-1350266283

   
   ## CI report:
   
   * e5f1dba84479f08417f25f53a79f6dae4425ba23 UNKNOWN
   * 2cfcca5c4f1a3a17b68d50f605f736c3a03c2e3f UNKNOWN
   * 24c41da0d3a33d1f851e463715d4494cbd873a86 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13696)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5377) Write call stack information to lock file

2022-12-13 Thread HunterXHunter (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HunterXHunter updated HUDI-5377:

Summary: Write call stack information to lock file  (was: Add call stack 
information to lock file)

> Write call stack information to lock file
> -
>
> Key: HUDI-5377
> URL: https://issues.apache.org/jira/browse/HUDI-5377
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: HunterXHunter
>Assignee: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
>
> When OCC is enabled, sometimes an exception is thrown: 'Unable to acquire
> lock'. We need to know which step caused the deadlock.
> For example:
>  
> LOCK-TIME : 2022-12-13 11:13:15.015
> LOCK-STACK-INFO :
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.acquireLock
>  (FileSystemBasedLockProvider.java:148)
>      
> org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider.tryLock 
> (FileSystemBasedLockProvider.java:100)
>      org.apache.hudi.client.transaction.lock.LockManager.lock 
> (LockManager.java:102)
>      org.apache.hudi.client.transaction.TransactionManager.beginTransaction 
> (TransactionManager.java:58)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableService 
> (BaseHoodieWriteClient.java:1425)
>      org.apache.hudi.client.BaseHoodieWriteClient.scheduleCompactionAtInstant 
> (BaseHoodieWriteClient.java:1037)
>      org.apache.hudi.util.CompactionUtil.scheduleCompaction 
> (CompactionUtil.java:72)
>      
> org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2
>  (StreamWriteOperatorCoordinator.java:250)
>      org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0 
> (NonThrownExecutor.java:130)
>      java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
>      java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
>      java.lang.Thread.run (Thread.java:750)
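
The proposed lock-file payload can be approximated with the standard library. This is a sketch only: the format mirrors the LOCK-TIME / LOCK-STACK-INFO layout shown above, and `write_lock_info` is an illustrative name, not Hudi's API.

```python
import io
import time
import traceback

def write_lock_info(out):
    # Record the acquisition time and the acquiring thread's call stack,
    # mirroring the LOCK-TIME / LOCK-STACK-INFO layout shown in the ticket.
    out.write("LOCK-TIME : " + time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
    out.write("LOCK-STACK-INFO :\n")
    for frame in traceback.format_stack():
        # Keep only the 'File "...", line N, in ...' header of each frame.
        out.write("     " + frame.splitlines()[0].strip() + "\n")

buf = io.StringIO()          # in a lock provider this would be the lock file
write_lock_info(buf)
print(buf.getvalue().splitlines()[0][:11])  # LOCK-TIME :
```

Writing this once at acquisition time means a later 'Unable to acquire lock' failure can report exactly which call path is holding the lock.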





[GitHub] [hudi] nsivabalan commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


nsivabalan commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350264478

   @hudi-bot run azure





[GitHub] [hudi] china-shang closed issue #7133: lazyReading affect

2022-12-13 Thread GitBox


china-shang closed issue #7133: lazyReading affect
URL: https://github.com/apache/hudi/issues/7133





[GitHub] [hudi] china-shang commented on issue #7133: lazyReading affect

2022-12-13 Thread GitBox


china-shang commented on issue #7133:
URL: https://github.com/apache/hudi/issues/7133#issuecomment-1350258584

   ok
   
   Sagar Sumit ***@***.***> wrote on Tue, Dec 13, 2022 at 22:50:
   
   > @china-shang  please close the issue if
   > your query is answered.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   





[GitHub] [hudi] YannByron commented on pull request #7410: [HUDI-3478] imporve cdc-related codes

2022-12-13 Thread GitBox


YannByron commented on PR #7410:
URL: https://github.com/apache/hudi/pull/7410#issuecomment-1350256706

   @hudi-bot  run azure





[GitHub] [hudi] alexeykudinkin commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


alexeykudinkin commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350235937

   > @alexeykudinkin i don't understand what "write into the table by its id" 
means, just using sql like insert into/update/delete from db.table to write 
data?
   
   Correct. You can do the same from Spark DS.
   
   > @alexeykudinkin At present, the problem I encounter is not only that the 
Spark datasource cannot be read after it is written, but also that the Spark 
sql cannot be read after it is written by Flink using hive sync. In other 
words, the SparkSQL query can not immediately read new data in any other way 
except by writing data in SQL. Therefore, I think this is a problem that needs 
to be solved
   
   Interesting. Can you please create another issue specifically for this one 
as this hardly could be related?





[GitHub] [hudi] Zouxxyy commented on a diff in pull request #7175: [HUDI-5191] Fix compatibility with avro 1.10

2022-12-13 Thread GitBox


Zouxxyy commented on code in PR #7175:
URL: https://github.com/apache/hudi/pull/7175#discussion_r1047921632


##
.github/workflows/bot.yml:
##
@@ -73,6 +73,14 @@ jobs:
         run: |
           HUDI_VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
           ./packaging/bundle-validation/ci_run.sh $HUDI_VERSION
+      - name: Common Test

Review Comment:
   @alexeykudinkin Because this patch fixes the Avro compatibility of the test 
cases in the hudi-common module under Spark 3, a test step for the hudi-common 
module is added to the bot workflow






[GitHub] [hudi] rahil-c commented on a diff in pull request #7397: [HUDI-5205] Upgrade Flink to 1.16.0

2022-12-13 Thread GitBox


rahil-c commented on code in PR #7397:
URL: https://github.com/apache/hudi/pull/7397#discussion_r1047911803


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
##
@@ -291,19 +291,19 @@ public void handleEventFromOperator(int i, OperatorEvent operatorEvent) {
   }
 
   @Override
-  public void subtaskFailed(int i, @Nullable Throwable throwable) {
-    // reset the event
-    this.eventBuffer[i] = null;
-    LOG.warn("Reset the event for task [" + i + "]", throwable);
+  public void subtaskReset(int i, long l) {

Review Comment:
   I believe this method `subtaskReset` was just moved as opposed to actually 
changed (at least when I'm examining this diff). 
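   For context, Flink 1.16 drops the coordinator's `subtaskFailed` hook in favor of `subtaskReset`, and the diff keeps the same body under the new signature: the buffered event for that subtask is discarded. A minimal standalone sketch of that reset behavior, with the Flink `OperatorCoordinator` types elided and `EventBufferSketch` as an illustrative name rather than Hudi's actual class:

   ```java
   // Sketch of the event-buffer reset discussed above: when a subtask is
   // reset, its buffered (not yet committed) event is dropped so the
   // restarted subtask can resend it. Mirrors the body that moved from
   // subtaskFailed(int, Throwable) to subtaskReset(int, long).
   public class EventBufferSketch {
     private final Object[] eventBuffer;

     public EventBufferSketch(int parallelism) {
       this.eventBuffer = new Object[parallelism];
     }

     // Stand-in for handleEventFromOperator: remember the pending event.
     public void handleEvent(int subtask, Object event) {
       eventBuffer[subtask] = event;
     }

     // Equivalent of subtaskReset(int subtask, long checkpointId):
     // forget the pending event for the reset subtask.
     public void subtaskReset(int subtask, long checkpointId) {
       eventBuffer[subtask] = null;
     }

     public Object pendingEvent(int subtask) {
       return eventBuffer[subtask];
     }
   }
   ```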






[GitHub] [hudi] JoshuaZhuCN commented on issue #7322: [SUPPORT][HELP] SparkSQL can not read the latest change data without execute "refresh table xxx"

2022-12-13 Thread GitBox


JoshuaZhuCN commented on issue #7322:
URL: https://github.com/apache/hudi/issues/7322#issuecomment-1350206823

   @alexeykudinkin At present, the problem I am encountering is not only that 
data written through the Spark datasource cannot be read back immediately, but 
also that data written by Flink with Hive sync cannot be read by Spark SQL. In 
other words, a SparkSQL query cannot immediately see new data written in any 
way other than through SQL. Therefore, I think this is a problem that needs 
to be solved





[hudi] branch master updated: [HUDI-5318] Fix partition pruning for clustering scheduling (#7366)

2022-12-13 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
     new 6de923cfdfd [HUDI-5318] Fix partition pruning for clustering scheduling (#7366)
6de923cfdfd is described below

commit 6de923cfdfdfcc4d265e3af5e12749295c29bb1c
Author: StreamingFlames <18889897...@163.com>
AuthorDate: Wed Dec 14 09:10:50 2022 +0800

[HUDI-5318] Fix partition pruning for clustering scheduling (#7366)

Co-authored-by: Nicholas Jiang 
---
 .../PartitionAwareClusteringPlanStrategy.java  | 24 
 .../TestPartitionAwareClusteringPlanStrategy.java  |  2 +-
 .../hudi/procedure/TestClusteringProcedure.scala   | 66 ++
 3 files changed, 78 insertions(+), 14 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java
index 7042585f59b..e12d6d27aa2 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java
@@ -75,11 +75,18 @@ public abstract class PartitionAwareClusteringPlanStrategy
-    List<String> partitionPaths = FSUtils.getAllPartitionPaths(getEngineContext(), config.getMetadataConfig(), metaClient.getBasePath());
-
-    // get matched partitions if set
-    partitionPaths = getMatchedPartitions(config, partitionPaths);
-    // filter the partition paths if needed to reduce list status
+    String partitionSelected = config.getClusteringPartitionSelected();
+    List<String> partitionPaths;
+
+    if (StringUtils.isNullOrEmpty(partitionSelected)) {
+      // get matched partitions if set
+      partitionPaths = getRegexPatternMatchedPartitions(config, FSUtils.getAllPartitionPaths(getEngineContext(), config.getMetadataConfig(), metaClient.getBasePath()));
+      // filter the partition paths if needed to reduce list status
+    } else {
+      partitionPaths = Arrays.asList(partitionSelected.split(","));
+    }
+
     partitionPaths = filterPartitionPaths(partitionPaths);
 
     if (partitionPaths.isEmpty()) {
@@ -118,15 +125,6 @@ public abstract class PartitionAwareClusteringPlanStrategy
-  protected List<String> getMatchedPartitions(HoodieWriteConfig config, List<String> partitionPaths) {
-    String partitionSelected = config.getClusteringPartitionSelected();
-    if (!StringUtils.isNullOrEmpty(partitionSelected)) {
-      return Arrays.asList(partitionSelected.split(","));
-    } else {
-      return getRegexPatternMatchedPartitions(config, partitionPaths);
-    }
-  }
-
   public List<String> getRegexPatternMatchedPartitions(HoodieWriteConfig config, List<String> partitionPaths) {
     String pattern = config.getClusteringPartitionFilterRegexPattern();
     if (!StringUtils.isNullOrEmpty(pattern)) {
diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestPartitionAwareClusteringPlanStrategy.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestPartitionAwareClusteringPlanStrategy.java
index 440bc956153..a053a961105 100644
--- a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestPartitionAwareClusteringPlanStrategy.java
+++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestPartitionAwareClusteringPlanStrategy.java
@@ -71,7 +71,7 @@ public class TestPartitionAwareClusteringPlanStrategy {
     fakeTimeBasedPartitionsPath.add("20210719");
     fakeTimeBasedPartitionsPath.add("20210721");
 
-    List<String> list = strategyTestRegexPattern.getMatchedPartitions(hoodieWriteConfig, fakeTimeBasedPartitionsPath);
+    List<String> list = strategyTestRegexPattern.getRegexPatternMatchedPartitions(hoodieWriteConfig, fakeTimeBasedPartitionsPath);
     assertEquals(2, list.size());
     assertTrue(list.contains("20210721"));
     assertTrue(list.contains("20210723"));
diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala
index cc61db4a03d..fa82e419f7b 100644
--- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala
@@ -28,6 +28,7 @@ import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstant, 
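To summarize the change above: when an explicit partition list is configured (read via `config.getClusteringPartitionSelected()`), the strategy clusters exactly those partitions and skips listing all partitions; otherwise it falls back to the regex filter. A minimal standalone sketch of that selection flow, with simplified names that are not the actual Hudi API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of the partition-selection logic introduced by HUDI-5318.
// The real code lives in PartitionAwareClusteringPlanStrategy and reads
// its settings from HoodieWriteConfig; names here are illustrative.
public class PartitionSelectionSketch {

  // If an explicit comma-separated partition list is configured, use it
  // verbatim (and avoid listing every partition); otherwise fall back to
  // the regex-based filter.
  static List<String> selectPartitions(String partitionSelected,
                                       String regexPattern,
                                       List<String> allPartitions) {
    if (partitionSelected != null && !partitionSelected.isEmpty()) {
      return Arrays.asList(partitionSelected.split(","));
    }
    return getRegexPatternMatchedPartitions(regexPattern, allPartitions);
  }

  // Keep only partitions whose path matches the configured regex;
  // an empty pattern keeps everything.
  static List<String> getRegexPatternMatchedPartitions(String pattern,
                                                       List<String> partitionPaths) {
    if (pattern == null || pattern.isEmpty()) {
      return partitionPaths;
    }
    Pattern p = Pattern.compile(pattern);
    return partitionPaths.stream()
        .filter(path -> p.matcher(path).matches())
        .collect(Collectors.toList());
  }
}
```

Usage: `selectPartitions("20210718,20210723", ".*21", all)` ignores the regex entirely, while `selectPartitions(null, ".*21", all)` applies it, which is the behavioral point of the fix.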

[GitHub] [hudi] leesf merged pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling

2022-12-13 Thread GitBox


leesf merged PR #7366:
URL: https://github.com/apache/hudi/pull/7366





[GitHub] [hudi] hudi-bot commented on pull request #7450: [HUDI-5375] Fixing reusing file readers with Metadata reader within FileIndex

2022-12-13 Thread GitBox


hudi-bot commented on PR #7450:
URL: https://github.com/apache/hudi/pull/7450#issuecomment-1350202268

   
   ## CI report:
   
   * 80a0afc67d1e03473cdf9057375519158fee92d5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13701)
   
   Bot commands: @hudi-bot supports the following commands:
   
   - `@hudi-bot run azure` re-run the last Azure build




