Re: [I] [to be discussed] Support for rollbackFailedWrites to delete inactive clustering plans [hudi]

2026-02-10 Thread via GitHub


nsivabalan commented on issue #17879:
URL: https://github.com/apache/hudi/issues/17879#issuecomment-3881384507

   Gotcha.

   To summarize, here are the different requirements we have around table service orchestration and deployment:

   - Compaction
     - R1: Only one TS writer should be able to execute a given compaction plan.
       -> Emit heartbeats during execution (see the sketch after this list).
   - Clustering
     - R1: Only one TS writer should be able to execute a given clustering plan.
       -> Emit heartbeats during execution.
     - R2: Only one TS writer should be able to perform both planning and execution. If execution fails after planning for any reason, the plan should be cleaned up by some TS writer after some threshold.
       -> A concurrent writer or TS planner could hit a FileNotFound issue if not for an abort state.
       -> Either introduce an abort state, or go with inflight deletion/nuking of plans.

   Let me know if this is right.
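
   For concreteness, here is a minimal sketch of the "emit heartbeats during execution" idea, assuming a simple per-instant heartbeat file whose modification time signals liveness. The class and file layout are illustrative, not Hudi's actual heartbeat client:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch (not Hudi's actual heartbeat client): a table service writer
// keeps a per-instant heartbeat file fresh while it executes a plan, so other
// writers can treat a stale file as a dead executor.
public class InstantHeartbeat implements AutoCloseable {
  private final Path heartbeatFile;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public InstantHeartbeat(Path heartbeatDir, String instantTime, long intervalMs)
      throws IOException {
    Files.createDirectories(heartbeatDir);
    this.heartbeatFile = heartbeatDir.resolve(instantTime);
    // Refresh immediately, then at a fixed interval until closed.
    scheduler.scheduleAtFixedRate(this::beat, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  // Rewriting the file bumps its modification time; readers compare that
  // mtime against an expiry threshold to decide whether the executor is alive.
  private void beat() {
    try {
      Files.write(heartbeatFile,
          Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      // In a real writer, losing the ability to heartbeat should fail the execution.
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow(); // stop beating once execution completes or fails
  }
}
```

   A TS writer would wrap plan execution in a try-with-resources block around this class, so the heartbeat stops exactly when execution completes or fails.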
   



Re: [I] [to be discussed] Support for rollbackFailedWrites to delete inactive clustering plans [hudi]

2026-01-20 Thread via GitHub


kbuci commented on issue #17879:
URL: https://github.com/apache/hudi/issues/17879#issuecomment-3776026228

   Sorry, let me clarify; our goals are:
   - to have a way for clean to automatically roll back (and delete the plans of) clustering writes that have failed, and
   - a new utility API that takes a list of partitions and attempts a rollback (and plan deletion) of any failed clustering writes targeting those partitions (see the sketch after this list).
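
   As a rough shape for that utility, something like the following could work; the interface name and signature are hypothetical, not an existing Hudi API:

```java
import java.util.List;

// Hypothetical shape of the proposed utility API (illustrative only): roll
// back, and delete the plans of, any failed clustering writes whose plan
// targets one of the given partitions.
public interface FailedClusteringCleanup {
  void rollbackFailedClustering(List<String> partitionPaths);
}
```
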
   Normally this would be unsafe (without implementing the RFC) due to the multi-writer cases you mentioned. But we are able to implement this cleanup logic internally since, for us:
   - Clustering is only attempted by a dedicated table service writer job that directly calls the clustering APIs. In addition, we ensure that scheduling and execution happen in the same job.
   - We don't enable clustering "inline" within write commit / deltastreamer.
   
   Because of this, in our internal build we can just start a heartbeat in the schedule-clustering call. And, similar to ingestion writes, once the heartbeat has expired, Hudi can assume that the plan is safe to roll back and delete (either by clean or by the aforementioned utility API).
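
   A minimal sketch of that expiry rule, assuming the per-instant heartbeat file from the earlier sketch; plan discovery and the actual rollback are stubbed out, and this is not Hudi's clean implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of the expiry rule: a pending clustering plan whose heartbeat file
// has not been refreshed within `expiryMs` is assumed dead, so it gets rolled
// back and its plan deleted. Discovery and rollback are placeholders.
public class ExpiredClusteringCleaner {
  private final Path heartbeatDir;
  private final long expiryMs;

  public ExpiredClusteringCleaner(Path heartbeatDir, long expiryMs) {
    this.heartbeatDir = heartbeatDir;
    this.expiryMs = expiryMs;
  }

  public void cleanUp(List<String> pendingClusteringInstants) throws IOException {
    for (String instant : pendingClusteringInstants) {
      Path hb = heartbeatDir.resolve(instant);
      // In this deployment, scheduling always starts a heartbeat, so a missing
      // file counts as expired too.
      boolean expired = !Files.exists(hb)
          || System.currentTimeMillis()
              - Files.getLastModifiedTime(hb).toMillis() > expiryMs;
      if (expired) {
        rollbackAndDeletePlan(instant);
      }
    }
  }

  // Placeholder: a real writer would roll back the failed write and remove the
  // pending clustering plan from the timeline here.
  private void rollbackAndDeletePlan(String instant) {
    System.out.println("Rolling back and deleting plan for instant " + instant);
  }
}
```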
   
   > Are you asking for the proposed RFC to be implemented?

   No, we don't have to implement the RFC for this, although that would be an ideal long-term solution to keep our clustering flow aligned with the existing OSS flow.

   > You are looking for support to rollback and nuke an existing clustering plan (already scheduled). This should not be a big ask, assuming you can enable the config for just 1 of the dedicated table service writers, and whenever it detects a pending clustering plan in the timeline, it could rollback and nuke the plan.

   Yes, though both the clustering and clean jobs should have this config enabled. Alternatively, this could be made a table-level config.

   > But chances are that another concurrent ingestion writer could result in a file-not-found issue, which needs to be tackled.

   Yes, that's right; we have an internal fix we can upstream for this.
   
   
   



Re: [I] [to be discussed] Support for rollbackFailedWrites to delete inactive clustering plans [hudi]

2026-01-18 Thread via GitHub


nsivabalan commented on issue #17879:
URL: https://github.com/apache/hudi/issues/17879#issuecomment-3766412418

   I will focus on the problem statement before diving into the solution. Let me know if my understanding is right.
   
   1. You are looking for support to rollback and nuke an existing clustering plan (already scheduled).
   This should not be a big ask, assuming you can enable the config for just 1 of the dedicated table service writers, and whenever it detects a pending clustering plan in the timeline, it could rollback and nuke the plan. But chances are that another concurrent ingestion writer could result in a file-not-found issue, which needs to be tackled. That's why we proposed https://github.com/apache/hudi/pull/12856

   Are you asking for the proposed RFC to be implemented? Can you help clarify, please?
   
   2. Based on your requirements, the ask is not as simple as in 1. You could have multiple table service writers, where table service writer 1 and table service writer 2 could contend to perform clustering for the same table, depending on how the table services are orchestrated. Say we have a pending clustering plan in the timeline: unless we have a heartbeat, how would another table service writer know whether a given clustering instant is being worked upon or not? Does that mean you have already incorporated heartbeats for table services?
   2.b. If my hunch is right, heartbeats are enabled for both scheduling and execution of clustering. But typically scheduling and execution can be de-coupled, so I am not sure how we would enable heartbeats for scheduling in such cases. Alternatively, after scheduling, if the writer shuts down, the heartbeat could be seen as expired, right? But we would not want to rollback and nuke the clustering plan in this case.
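
   To make the ambiguity in 2.b concrete, here is a tiny illustrative decision table in code (all names hypothetical): the same expired heartbeat means "dead executor, safe to clean up" in one deployment, and "healthy plan waiting for an executor" in another:

```java
// Illustrative only: why heartbeat expiry alone is ambiguous once scheduling
// and execution are decoupled. All names are hypothetical.
public class PlanLiveness {
  enum PlanState { BEING_EXECUTED, EXECUTOR_DIED, SCHEDULED_AWAITING_EXECUTOR }

  static PlanState classify(boolean heartbeatExpired,
                            boolean sameJobSchedulesAndExecutes) {
    if (!heartbeatExpired) {
      // Someone is actively beating: the plan is being worked on.
      return PlanState.BEING_EXECUTED;
    }
    if (sameJobSchedulesAndExecutes) {
      // The only process that ever beats is the one that executes, so an
      // expired heartbeat means it died: safe to rollback and nuke the plan.
      return PlanState.EXECUTOR_DIED;
    }
    // Decoupled deployment: the scheduler's heartbeat is expected to expire
    // once it exits, so expiry must NOT trigger rollback/plan deletion here.
    return PlanState.SCHEDULED_AWAITING_EXECUTOR;
  }
}
```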
   
   Without going into the solution, can you help clarify the problem statement and requirements?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]