Yes, your interpretation is correct.
I thought about using your meta-resource idea, but it suffers from a
problem similar to the one I am currently facing. The partitions of the
meta-resource will be "placed" on some participant by Helix. Now imagine
that participant fails. The partition of the meta-resource will go
through its DROPPED state in order to be placed elsewhere. So that
DROPPED state still cannot be distinguished from the partition being
dropped to reflect the dropping of the resource it represents. Either I
have misunderstood your solution (1) or I need to somehow have a
participant that never fails.
I can see how to use the task framework to build a reliable workflow
with a sequence of steps, where one of the steps does the final cleanup
(something along the lines of the sketch below). Is this already
available in the codebase (if I were willing to make my own build)?
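For concreteness, here is roughly what I have in mind. This is only a
sketch, assuming the upcoming task framework exposes a
Workflow/JobConfig/TaskDriver style API; the commands, config keys and
class name below are placeholders I made up, not existing code:

    import org.apache.helix.HelixManager;
    import org.apache.helix.task.JobConfig;
    import org.apache.helix.task.TaskDriver;
    import org.apache.helix.task.Workflow;

    import java.util.Collections;

    public class DropResourceWorkflow {
      public static void submit(HelixManager manager, String resource) {
        // Step 1: drop the resource and wait for the external view to converge.
        JobConfig.Builder dropJob = new JobConfig.Builder()
            .setCommand("DropResource") // bound to a Task implementation via a TaskFactory
            .setJobCommandConfigMap(Collections.singletonMap("resource", resource));

        // Step 2: ask every participant to clean up state left behind by the resource.
        JobConfig.Builder cleanupJob = new JobConfig.Builder()
            .setCommand("CleanupResourceState")
            .setJobCommandConfigMap(Collections.singletonMap("resource", resource));

        Workflow.Builder workflow = new Workflow.Builder("drop-" + resource)
            .addJob("drop", dropJob)
            .addJob("cleanup", cleanupJob)
            .addParentChildDependency("drop", "cleanup"); // cleanup runs only after drop

        new TaskDriver(manager).start(workflow.build());
      }
    }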
Thanks,
Vinayak
On 5/19/14, 9:11 AM, Kanak Biscuitwala wrote:
So if I understand correctly, you basically want to know whether state should be
kept (in case the partition could move back) or not. Right now, Helix treats a
dropped resource as if all its partitions have been dropped. Separately, Helix
treats a moved partition as a dropped partition on one participant and an added
partition on another participant. So they're currently very much linked.
This requires some more thought, but here's what comes to mind:
1. Have a meta-resource whose partitions are simply the names of the other
resources in the cluster. When you drop a resource, the operation would be to
simultaneously drop the resource and drop its partition from the meta-resource
(a rough sketch follows this list). Then you can get a separate transition for
a dropped resource. I haven't thought about the race conditions here, and there
could be some impact depending on your app.
2. In the upcoming task framework, create a task that manages the drop-resource
scenario from beginning to end: for instance, call HelixAdmin#dropResource, wait
for the external view to converge, and then issue cleanup requests to the
participants. Participants would implement a cleanup callback. This is something
we're trying to get out the door this quarter.
3. Something that works, but you would like to avoid: ask HelixAdmin if the
resource exists.
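To make (1) a bit more concrete, here is a very rough sketch, assuming the
meta-resource is an ordinary Helix resource whose partitions are named after
the application resources; the meta-resource name and the helper class are
placeholders only:

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;

    public class DropWithMetaResource {
      // Drop 'resource' and remove its partition from the meta-resource, so that
      // participants can treat the meta-resource partition's DROPPED transition
      // as "the resource itself is gone".
      public static void drop(String zkAddr, String cluster, String resource) {
        HelixAdmin admin = new ZKHelixAdmin(zkAddr);

        // Remove the partition named after the resource from the meta-resource.
        IdealState meta = admin.getResourceIdealState(cluster, "META_RESOURCE");
        meta.getRecord().getMapFields().remove(resource);
        meta.getRecord().getListFields().remove(resource);
        meta.setNumPartitions(meta.getRecord().getMapFields().size());
        admin.setResourceIdealState(cluster, "META_RESOURCE", meta);

        // Drop the resource itself.
        admin.dropResource(cluster, resource);
      }
    }

For (3), the check inside the DROPPED callback would just be something like
admin.getResourcesInCluster(cluster).contains(resourceName).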
Perhaps others can chime in with ideas.
----------------------------------------
Date: Sun, 18 May 2014 12:08:15 -0700
From: [email protected]
To: [email protected]
Subject: Need two kinds of DROPPED states?
Hi Guys,
It looks like when a partition that is on a participant (P1) is moved to
another participant (P2), P1 is sent a transition request from OFFLINE
-> DROPPED.
In another scenario, when a resource is dropped using HelixAdmin, its
partitions undergo a similar transition to DROPPED.
As an application, one might need to do different things in those two
cases. For example, in the first case the partition is being dropped only
to become live somewhere else, so any shared state for the resource should
not be lost. In the second scenario, on the other hand, the application
might want to clean up all state associated with the resource.
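To make the problem concrete, both cases currently land in the same
transition callback. The following is only a sketch of an OnlineOffline-style
state model (the class name is made up); there is nothing available in the
DROPPED callback to tell the two cases apart:

    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.Message;
    import org.apache.helix.participant.statemachine.StateModel;
    import org.apache.helix.participant.statemachine.StateModelInfo;
    import org.apache.helix.participant.statemachine.Transition;

    @StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE", "DROPPED"})
    public class MyStateModel extends StateModel {

      @Transition(to = "ONLINE", from = "OFFLINE")
      public void onBecomeOnlineFromOffline(Message message, NotificationContext context) {
        // Start serving the partition.
      }

      @Transition(to = "OFFLINE", from = "ONLINE")
      public void onBecomeOfflineFromOnline(Message message, NotificationContext context) {
        // Stop serving the partition, but keep its shared state.
      }

      @Transition(to = "DROPPED", from = "OFFLINE")
      public void onBecomeDroppedFromOffline(Message message, NotificationContext context) {
        // This same callback fires when the partition is merely moving to
        // another participant AND when the whole resource is dropped via
        // HelixAdmin, so I cannot decide here whether to preserve or to
        // clean up the shared state.
      }
    }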
Is there a way for the application to distinguish between the first kind
of DROPPED and the second kind? I am looking to have the state machine
itself handle both scenarios, without the need for the application to
trigger some special activity to perform the cleanup in the second scenario.
Thanks,
Vinayak