Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-20 Thread Zhu Zhu
Thanks Till for the explanation! That looks good to me. Thanks, Zhu Zhu. On Mon, Oct 21, 2019 at 2:45 AM Till Rohrmann wrote: > Hi Zhu Zhu, > > the cluster partition does not need to be registered at the RM before it > can be used. The cluster partition descriptor will be reported to the > client as part of

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-20 Thread Till Rohrmann
Hi Zhu Zhu, the cluster partition does not need to be registered at the RM before it can be used. The cluster partition descriptor will be reported to the client as part of the job execution result. This information is used to construct a JobGraph which can consume from a cluster partition. The
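The flow Till describes (descriptor returned to the client in the job result, then embedded in the next JobGraph without an RM lookup) can be sketched as below. This is a minimal illustration under stated assumptions: `ClusterPartitionDescriptor`, `JobExecutionResult.cluster_partitions`, `JobVertex` and `build_consuming_job` are hypothetical names, not Flink's actual classes or signatures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ClusterPartitionDescriptor:
    # Hypothetical: identifies a partition that outlives the job that produced it.
    partition_id: str
    producer_job_id: str
    task_executor_address: str


@dataclass
class JobExecutionResult:
    job_id: str
    # Hypothetical field: descriptors of cluster partitions produced by this job.
    cluster_partitions: List[ClusterPartitionDescriptor] = field(default_factory=list)


@dataclass
class JobVertex:
    name: str
    # Cluster partitions this vertex reads instead of an upstream vertex.
    consumed_cluster_partitions: List[ClusterPartitionDescriptor] = field(default_factory=list)


def build_consuming_job(result: JobExecutionResult, consumer_name: str) -> List[JobVertex]:
    """Builds a one-vertex 'job graph' reading every cluster partition of a previous result.

    No ResourceManager lookup happens here: the descriptors taken from the
    client-side result are embedded directly into the new graph.
    """
    return [JobVertex(consumer_name, list(result.cluster_partitions))]
```

Because the descriptor travels with the result, the consuming job can be built purely on the client; whether the partition still exists is only discovered when the consumer is deployed.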

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-20 Thread Zhu Zhu
Thanks Chesnay for proposing this FLIP! And sorry for the late response on it. The FLIP overall looks good to me, except for one question. - If a cluster partition does not exist in RM, how can users tell whether it is not produced yet, or it is already released? Users/InteractiveQuery may need

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-15 Thread Chesnay Schepler
I have updated the FLIP. - adopted job-/cluster partitions naming scheme - out-lined interface for new component living in the RM (currently called ThinShuffleMaster, but I'm not a fan of the name. Suggestions would be appreciated) - added a note that the ShuffleService changes are only

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-13 Thread Zhijiang
Thanks for these further considerations Chesnay! I guess we might have some misunderstanding. Actually I was not against the previous proposal Till suggested before, and I think it is a formal way to do that. And my previous proposal was not for excluding the ShuffleService completely. The

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-13 Thread Zhijiang
Thanks for the further explanation, Till! It is fine with me to run only one ShuffleMaster instance, as we do now, and to let the RM handle the deletion of cluster partitions in a lightweight way. I also have no concerns about letting the TE handle the deletion of cluster partitions, as it already does for job partitions.

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-13 Thread Chesnay Schepler
I'm quite torn on whether to exclude the ShuffleServices from the proposal. I think I'm now on my third or fourth iteration for a response, so I'll just send both so I can stop thinking for a bit about whether to push for one or the other: Opinion A, aka "Nu Uh": I'm not in favor of

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-13 Thread Till Rohrmann
I think we won't necessarily run multiple ShuffleMasters. I think it would be better to pass in a leaner interface into the RM to only handle the deletion of the global result partitions. Letting the TEs handle the deletion of the global result partitions might work as long as we don't have an

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-11 Thread zhijiang
Sorry for the delay in catching up with the recent progress. Thanks for the FLIP update and the valuable discussions! I also like the terms job/cluster partitions, and agree with most of the previous comments. I have only one concern left, on the ShuffleMaster side: >However, if the separation of JM/RM into

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-11 Thread Chesnay Schepler
h I like job-/cluster partitions. On 10/10/2019 16:27, Till Rohrmann wrote: I think we should introduce a separate interface for the ResourceManager so that it can list and delete global result partitions from the shuffle service implementation. As long as the JM and RM run in the same

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-10 Thread Till Rohrmann
I think we should introduce a separate interface for the ResourceManager so that it can list and delete global result partitions from the shuffle service implementation. As long as the JM and RM run in the same process, this interface could be implemented by the ShuffleMaster implementations.
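A minimal sketch of the "leaner interface" idea, assuming the JM and RM share one process so a single object can serve both. `ClusterPartitionManager`, `InMemoryShuffleMaster` and their methods are hypothetical names chosen for illustration; they are not the interface that was eventually added to Flink.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Set


class ClusterPartitionManager(ABC):
    """Hypothetical lean interface handed to the ResourceManager.

    It can only list and delete global/cluster partitions; it knows nothing
    about registering partitions for running jobs.
    """

    @abstractmethod
    def list_cluster_partitions(self) -> Set[str]: ...

    @abstractmethod
    def release_cluster_partitions(self, partition_ids: Iterable[str]) -> None: ...


class InMemoryShuffleMaster(ClusterPartitionManager):
    """Toy ShuffleMaster that also implements the lean interface."""

    def __init__(self) -> None:
        # partition id -> True once the partition has been promoted to a cluster partition
        self._partitions: Dict[str, bool] = {}

    def register_partition(self, partition_id: str) -> None:
        self._partitions[partition_id] = False

    def promote(self, partition_id: str) -> None:
        self._partitions[partition_id] = True

    def list_cluster_partitions(self) -> Set[str]:
        return {pid for pid, is_cluster in self._partitions.items() if is_cluster}

    def release_cluster_partitions(self, partition_ids: Iterable[str]) -> None:
        for pid in partition_ids:
            # Only cluster partitions may be deleted through this interface.
            if self._partitions.get(pid):
                del self._partitions[pid]


class ResourceManager:
    def __init__(self, partitions: ClusterPartitionManager) -> None:
        # The RM is typed against the lean interface, not the full ShuffleMaster.
        self._partitions = partitions

    def release_all_cluster_partitions(self) -> None:
        self._partitions.release_cluster_partitions(self._partitions.list_cluster_partitions())
```

The design point is the type the RM depends on: once the JM and RM are split into separate processes, a different class could implement `ClusterPartitionManager` without the RM changing.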

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler
Are there any other opinions with regard to the naming scheme? (local/global, promote) On 06/09/2019 15:16, Chesnay Schepler wrote: Hello, FLIP-36 (interactive programming) proposes a new

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler
While we could argue that it's a new interface so we aren't technically changing anything about the ShuffleMaster, I'd assume most people would just have the ShuffleMaster implement the new interface and call it a day. On 09/10/2019 09:57, Chesnay Schepler wrote: So should we enforce having

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-09 Thread Chesnay Schepler
So should we enforce having 2 instances now or defer this to a later date? I'd rather do this early since it changes 2 assumptions that ShuffleMaster can currently make: - every partition release is preceded by a registration of said partition - the release of partitions may rely on local data
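The two assumptions Chesnay lists can be made concrete with a toy ShuffleMaster. This is a sketch under assumptions: `StatefulShuffleMaster`, `PartitionDescriptor` and the two release methods are hypothetical. It shows why a second instance (for example one living in the RM) breaks a release path that depends on an earlier registration and on locally remembered data, while a release driven only by the descriptor still works.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PartitionDescriptor:
    partition_id: str
    location: str  # hypothetical: where the partition data lives (host, path, ...)


class StatefulShuffleMaster:
    """Toy ShuffleMaster that remembers data at registration time."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}
        self.released: List[Tuple[str, str]] = []

    def register(self, desc: PartitionDescriptor) -> None:
        self._locations[desc.partition_id] = desc.location

    def release_by_id(self, partition_id: str) -> None:
        # Relies on both assumptions: a prior register() on *this* instance,
        # and local data (the remembered location). Raises KeyError otherwise.
        location = self._locations.pop(partition_id)
        self.released.append((partition_id, location))

    def release_by_descriptor(self, desc: PartitionDescriptor) -> None:
        # Uses only what the descriptor carries, so any instance can do it.
        self._locations.pop(desc.partition_id, None)
        self.released.append((desc.partition_id, desc.location))
```

With one instance in the JM and another in the RM, only the descriptor-based path is safe for the RM, which is why changing these assumptions early matters to existing ShuffleMaster implementations.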

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-04 Thread Till Rohrmann
Thanks for updating the FLIP. I think the RM does not need to have access to a full-fledged ShuffleMaster implementation. Instead, it should be enough to give it a leaner interface which only supports deleting result partitions and listing available global partitions. This might entail that one will

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-04 Thread Chesnay Schepler
I have updated the FLIP. - consistently use "local"/"global" terminology; this incidentally should make it easier to update the terminology if we decide on other names - inform RM via heartbeats from TE about available global partitions - add dedicated method for releasing global partitions -
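A possible shape for the RM-side bookkeeping when TEs report global partitions via heartbeats, together with a dedicated release step. This is a minimal sketch assuming each heartbeat carries the TE's full set of global partitions; `ClusterPartitionTracker` and its methods are hypothetical names, not the FLIP's actual classes.

```python
from typing import Dict, Iterable, Set


class ClusterPartitionTracker:
    """Hypothetical RM-side view of global partitions, rebuilt from TE heartbeats."""

    def __init__(self) -> None:
        self._by_executor: Dict[str, Set[str]] = {}

    def on_heartbeat(self, executor_id: str, reported: Iterable[str]) -> None:
        # Assumption: each heartbeat carries the full set held by the TE,
        # so it replaces (rather than merges with) the previous report.
        self._by_executor[executor_id] = set(reported)

    def on_executor_lost(self, executor_id: str) -> None:
        # Partitions stored on a lost TE are no longer available.
        self._by_executor.pop(executor_id, None)

    def available(self) -> Set[str]:
        return set().union(*self._by_executor.values())

    def executors_to_release(self, partition_ids: Iterable[str]) -> Dict[str, Set[str]]:
        """Groups a release request by the TEs that must delete the data."""
        wanted = set(partition_ids)
        plan: Dict[str, Set[str]] = {}
        for executor_id, held in self._by_executor.items():
            hit = held & wanted
            if hit:
                plan[executor_id] = hit
        return plan
```

One consequence of this heartbeat-only design: a partition absent from `available()` may be not yet produced or already released, and the tracker alone cannot tell the two apart.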

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-04 Thread Till Rohrmann
On Fri, Oct 4, 2019 at 12:37 PM Chesnay Schepler wrote: > *Till: In the FLIP you wrote "The set of partitions to release may contain > local > and/or global partitions; the promotion set must only refer to local > partitions." to describe the `releasePartitions`. I think the JM should > never

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-04 Thread Chesnay Schepler
Till: In the FLIP you wrote "The set of partitions to release may contain local and/or global partitions; the promotion set must only refer to local partitions." to describe the `releasePartitions`. I think the JM should never be in the situation to release a global partition. Moreover, I
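The quoted `releasePartitions` contract (a release set plus a promotion set restricted to local partitions), combined with Till's stricter remark that the JM never releases global partitions, could look roughly like the sketch below. `JobPartitionTracker` and `release_partitions` are hypothetical stand-ins; rejecting global partitions in the release set is the stricter variant from the discussion, not the wording in the FLIP.

```python
from typing import Set


class JobPartitionTracker:
    """Hypothetical JM-side tracker illustrating the release/promote contract."""

    def __init__(self) -> None:
        self.local: Set[str] = set()    # job-local partitions
        self.global_: Set[str] = set()  # promoted (global) partitions

    def release_partitions(self, to_release: Set[str], to_promote: Set[str]) -> Set[str]:
        """Promotes `to_promote`, releases `to_release`, and returns what was released."""
        # Validate before mutating anything.
        if not to_promote <= self.local:
            raise ValueError("promotion set must only refer to local partitions")
        if to_release & self.global_:
            # Stricter variant: the JM never releases global partitions.
            raise ValueError("JM must not release global partitions")
        self.local -= to_promote
        self.global_ |= to_promote
        released = to_release & self.local
        self.local -= released
        return released
```

Under this variant, deleting global partitions becomes the responsibility of whichever component owns them after the job ends, rather than the JM.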

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-02 Thread Till Rohrmann
Thanks for addressing our comments Chesnay. See some comments inline. On Wed, Oct 2, 2019 at 4:07 PM Chesnay Schepler wrote: > Thank you for your comments; I've aggregated them a bit and added > comments to each of them. > > 1) Concept name (proposal: persistent) > > I agree that "global" is

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-10-02 Thread Chesnay Schepler
Thank you for your comments; I've aggregated them a bit and added comments to each of them. 1) Concept name (proposal: persistent) I agree that "global" is rather undescriptive, particularly so since we never had a notion of "local" partitions. I'm not a fan of "persistent"; as to me this

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-09-30 Thread Becket Qin
Forgot to say that I agree with Till that it seems a good idea to let TEs register the global partitions to the RM instead of letting JM do it. This simplifies quite a few things. Thanks, Jiangjie (Becket) Qin On Sun, Sep 29, 2019 at 11:25 PM Becket Qin wrote: > Hi Chesnay, > > Thanks for the

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-09-30 Thread Becket Qin
Hi Chesnay, Thanks for the proposal. My understanding of the entire workflow, step by step, is as follows: - JM maintains the local and global partition metadata when the task runs to create result partitions. The tasks themselves do not distinguish between local / global partitions. Only the

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-09-17 Thread zhijiang
Thanks Chesnay for this FLIP, and sorry for getting to it a bit late. I also have some concerns similar to those Till already raised. 1. The consistent terminology in different components. On JM side, PartitionTracker#getPersistedBlockingPartitions is defined for getting global

Re: [DISCUSS] FLIP-67: Global partitions lifecycle

2019-09-10 Thread Till Rohrmann
Thanks Chesnay for drafting the FLIP and starting this discussion. I have a couple of comments: * I know that I've also coined the terms global/local result partition but maybe it is not the perfect name. Maybe we could rethink the terminology and call them persistent result partitions? * Nit: I