Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-09-27 Thread orpl
Hello, As we are developing an MR3 extension for Celeborn, I would like to add my comments on stage re-run in the context of using Celeborn for MR3. I don't know the internal details of Spark stage re-run very well, so my apologies if my comments are irrelevant to the proposal in the design documen

Question on speculative execution,

2023-08-19 Thread orpl
Hello Celeborn team, We are quite close to completing our Celeborn-MR3 client, and I have a question on speculative execution in the context of using Celeborn. MR3 supports speculative execution, which allows several task attempts to run concurrently. When a task attempt succeeds, all other concu
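A minimal sketch of one way to reason about this question, assuming that every pushed batch is tagged with the attempt that produced it and that the first attempt to report success is the one whose output reducers should see. `AttemptRegistry`, `reportSuccess` and `Batch` are hypothetical names for illustration, not Celeborn API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model, not Celeborn code: the first attempt of a mapper to report
// success wins, and reads keep only the batches produced by that attempt.
final class AttemptRegistry {
  /** A pushed batch tagged with the mapper and attempt that produced it. */
  record Batch(int mapIndex, int attemptId, String payload) {}

  // mapIndex -> attemptId of the first attempt that reported success
  private final Map<Integer, Integer> committed = new ConcurrentHashMap<>();

  /** Returns true only for the first attempt of mapIndex to call this. */
  boolean reportSuccess(int mapIndex, int attemptId) {
    return committed.putIfAbsent(mapIndex, attemptId) == null;
  }

  /** Attempt chosen for mapIndex, or null if no attempt has succeeded yet. */
  Integer committedAttempt(int mapIndex) {
    return committed.get(mapIndex);
  }

  /** Drops batches written by losing (or still running) attempts. */
  List<Batch> filter(List<Batch> batches) {
    List<Batch> kept = new ArrayList<>();
    for (Batch b : batches) {
      Integer winner = committed.get(b.mapIndex());
      if (winner != null && winner == b.attemptId()) {
        kept.add(b);
      }
    }
    return kept;
  }
}
```

`putIfAbsent` makes the first-success rule atomic even when two attempts finish at nearly the same time; in this model, data pushed by the losing attempts is still present but is filtered out on read.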

Re: Question on ShuffleClient.readPartition()

2023-08-14 Thread orpl
Hi Keyong, Thank you for your detailed response. We will think about how to pass ordered data. Thanks, --- Sungwoo On Tue, 15 Aug 2023, Keyong Zhou wrote: Hi Sungwoo, Thanks for your mail. For your questions: 1. ShuffleClient.readPartition() may not read chunks in the same order that the

Question on ShuffleClient.readPartition()

2023-08-14 Thread orpl
Hi Celeborn team, We are implementing a Celeborn-MR3 client, and have a question on the order of chunks returned by ShuffleClient.readPartition(). --- Setup With shuffleId, mapId, attemptId, partitionId all fixed, suppose that a mapper with mapIndex M calls ShuffleClient.pushData() several t
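Since the reply listed above says that `ShuffleClient.readPartition()` may not return chunks in the order they were pushed, one option is for the client to carry its own ordering. The sketch below assumes a hypothetical framing added by the client itself (4-byte mapIndex, 4-byte sequence number, then the payload); neither the framing nor `SequencedChunks` is part of Celeborn.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side framing: [mapIndex:int][seq:int][payload bytes].
final class SequencedChunks {
  record Chunk(int mapIndex, int seq, byte[] payload) {}

  /** Mapper side: prefix each pushed chunk with its mapper and push sequence number. */
  static byte[] frame(int mapIndex, int seq, byte[] payload) {
    return ByteBuffer.allocate(8 + payload.length)
        .putInt(mapIndex)
        .putInt(seq)
        .put(payload)
        .array();
  }

  /** Reducer side: split a framed chunk back into header fields and payload. */
  static Chunk parse(byte[] framed) {
    ByteBuffer buf = ByteBuffer.wrap(framed);
    int mapIndex = buf.getInt();
    int seq = buf.getInt();
    byte[] payload = new byte[buf.remaining()];
    buf.get(payload);
    return new Chunk(mapIndex, seq, payload);
  }

  /** Restores per-mapper push order regardless of the order chunks arrived in. */
  static List<Chunk> reorder(List<byte[]> arrived) {
    List<Chunk> chunks = new ArrayList<>();
    for (byte[] f : arrived) {
      chunks.add(parse(f));
    }
    chunks.sort(Comparator.<Chunk>comparingInt(c -> c.mapIndex())
        .thenComparingInt(c -> c.seq()));
    return chunks;
  }
}
```

Sorting buffers a whole partition in memory, so for large partitions a real client would more likely merge or spill; the sketch only shows the ordering rule.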

Re: Q. How to interrupt ShuffleClient and avoid revive requests due to HARD_SPLIT

2023-08-03 Thread orpl
2. You can try to add the shuffle id in ShuffleClientImpl's stageEndShuffleSet. Currently ShuffleClient does not have an API like `setStageEnd`, but I think it's fine to add one. Let me know if you are interested in sending a PR :) Hi Keyong, Thanks for the detailed reply. We have decided to us
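The suggestion quoted above can be modeled as follows: once a shuffle ID is in a stage-end set, a HARD_SPLIT no longer triggers a revive request. This is a hypothetical model, not `ShuffleClientImpl`; `ReviveGate`, `setStageEnd` and `onHardSplit` are illustrative names (the reply itself notes that `ShuffleClient` has no `setStageEnd` API).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the idea in the reply; not Celeborn's ShuffleClientImpl.
final class ReviveGate {
  enum Action { REVIVE, SKIP }

  // Shuffle IDs whose stage has ended (finished or abandoned).
  private final Set<Integer> stageEndShuffleSet = ConcurrentHashMap.newKeySet();

  /** Hypothetical setStageEnd: mark a shuffle as ended so later splits are not revived. */
  void setStageEnd(int shuffleId) {
    stageEndShuffleSet.add(shuffleId);
  }

  /** Decides what a pusher does when a push for shuffleId hits HARD_SPLIT. */
  Action onHardSplit(int shuffleId) {
    // After stage end, reviving would only request new locations nobody will read.
    return stageEndShuffleSet.contains(shuffleId) ? Action.SKIP : Action.REVIVE;
  }
}
```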

Q. How to interrupt ShuffleClient and avoid revive requests due to HARD_SPLIT

2023-07-30 Thread orpl
Hi Celeborn team, We are implementing a Celeborn-MR3 client, and have a question on how to properly unregister a shuffle ID via ShuffleClient. Here is a description of the problem. 1. Suppose that several ShuffleClients are pushing data for a common shuffle ID. 2. For some reason (e.g., a H

Re: Question of fetching mapper output

2023-07-20 Thread orpl
Hi Keyong, Unlike the Spark/Flink clients, we had to modify the MR3 runtime code directly to support Celeborn, and thus did not add new code to Celeborn. We release the MR3 runtime code on GitHub, which could serve just as an example of exploiting Celeborn. The API is clean and the code is also c

Re: Question of fetching mapper output

2023-07-16 Thread orpl
We have extended the implementation of MR3 so that all partition inputs can be fetched with a single call, e.g.: rssShuffleClient.readPartition(..., 0, 100) Now, Hive-MR3 with Celeborn runs as fast as Hive-MR3 with its own shuffle handlers when tested with the 10TB TPC-DS benchmark. For some que
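A toy model of the single-call read, assuming, as `readPartition(..., 0, 100)` suggests for the 100-mapper scenario in the original question of this thread, that the map-index range is half-open `[startMapIndex, endMapIndex)`. `PartitionStore` is a hypothetical in-memory stand-in, not Celeborn's `ShuffleClient`; it only shows that one range read returns the same records as one read per mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a partition store; not Celeborn's ShuffleClient.
final class PartitionStore {
  // partitionId -> (mapIndex -> records pushed by that mapper, in push order)
  private final Map<Integer, TreeMap<Integer, List<String>>> data = new TreeMap<>();

  void push(int mapIndex, int partitionId, String record) {
    data.computeIfAbsent(partitionId, p -> new TreeMap<>())
        .computeIfAbsent(mapIndex, m -> new ArrayList<>())
        .add(record);
  }

  /** Reads one partition for mappers in the half-open range [startMapIndex, endMapIndex). */
  List<String> readPartition(int partitionId, int startMapIndex, int endMapIndex) {
    List<String> out = new ArrayList<>();
    TreeMap<Integer, List<String>> byMap = data.getOrDefault(partitionId, new TreeMap<>());
    // subMap(from, to) includes 'from' and excludes 'to'.
    for (List<String> records : byMap.subMap(startMapIndex, endMapIndex).values()) {
      out.addAll(records);
    }
    return out;
  }
}
```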

Re: Question on implementing Celeborn client,

2023-07-13 Thread orpl
Is there some way to use the Celeborn API to check whether CommitFiles succeeds in step 6? Currently we are testing with 10TB TPC-DS data, and a heavy query (query 24) occasionally fails with: Caused by: java.io.IOException: Premature EOF from inputStream We are speculating that this error occurs b

Re: Question on implementing Celeborn client,

2023-07-13 Thread orpl
Following are the main steps for a shuffle stage: 1. LifecycleManager sends RequestSlots to Master to request slots for the current shuffle; 2. Master allocates slots among workers for the shuffle and returns RequestSlotsResponse; 3. LifecycleManager sends ReserveSlots to workers; workers do initi
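Steps 1 to 3 above can be sketched as an in-memory simulation, assuming round-robin placement of partition slots over workers and a fixed per-worker slot capacity. Both assumptions are for illustration only; `SlotFlow` is hypothetical and says nothing about how the Master actually chooses workers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of steps 1-3; message names follow the thread, classes are illustrative.
final class SlotFlow {
  /** Step 2: the master spreads partition slots over workers (round-robin assumed here). */
  static Map<String, List<Integer>> allocateSlots(List<String> workers, int numPartitions) {
    Map<String, List<Integer>> slots = new HashMap<>();
    for (int p = 0; p < numPartitions; p++) {
      String worker = workers.get(p % workers.size());
      slots.computeIfAbsent(worker, w -> new ArrayList<>()).add(p);
    }
    return slots;
  }

  /**
   * Step 3: the lifecycle manager asks each worker to reserve its slots.
   * Returns the number of reserved slots; this sketch does not roll back on failure.
   */
  static int reserveSlots(Map<String, List<Integer>> slots, Map<String, Integer> capacity) {
    int reserved = 0;
    for (Map.Entry<String, List<Integer>> e : slots.entrySet()) {
      int want = e.getValue().size();
      int free = capacity.getOrDefault(e.getKey(), 0);
      if (want > free) {
        throw new IllegalStateException("worker " + e.getKey() + " cannot reserve " + want + " slots");
      }
      capacity.put(e.getKey(), free - want);
      reserved += want;
    }
    return reserved;
  }
}
```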

Question of fetching mapper output

2023-07-13 Thread orpl
Hi Team, I have a question on how a reducer should fetch the output of mappers. As an example, consider this standard scenario: 1. There are 100 mappers and 50 reducers. 2. Each mapper creates 50 partitions, each of which is to be fetched by the corresponding reducer. 3. Each reducer is responsi
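For the scenario above, the number of fetch calls a reducer makes depends on how it splits the mapper range. A hypothetical helper, `FetchPlan`, that cuts `[0, numMappers)` into half-open ranges of at most `batchSize` mappers, one fetch call per range (the half-open convention is an assumption):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split [0, numMappers) into half-open [start, end) ranges
// of at most batchSize mappers; each range would be one fetch call.
final class FetchPlan {
  record Range(int startMapIndex, int endMapIndex) {}

  static List<Range> ranges(int numMappers, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Range> out = new ArrayList<>();
    for (int start = 0; start < numMappers; start += batchSize) {
      out.add(new Range(start, Math.min(start + batchSize, numMappers)));
    }
    return out;
  }
}
```

With 100 mappers, `batchSize = 1` gives 100 calls per reducer (5,000 across 50 reducers), while `batchSize = 100` gives a single `[0, 100)` call per reducer.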

Re: Question on implementing Celeborn client,

2023-07-12 Thread orpl
Hi Keyong, Thanks for your quick reply. We found the Celeborn API clean and very intuitive, and have not yet encountered serious problems getting our system up and running. We are not sure about just a few points that are not immediately obvious from the Celeborn API (e.g., whether or n

Question on implementing Celeborn client,

2023-07-11 Thread orpl
Hi Team, We are currently implementing a Celeborn client for our application (called MR3, which is similar to Tez), and have a question on the internals of Celeborn. The question is whether a reducer should wait until the completion of all mappers before starting to fetch mapper output. From
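If the answer is that a reducer must wait for every mapper, the reducer-side wait could be modeled with a `CountDownLatch`. `MapperBarrier` is hypothetical; it does not model Celeborn's commit step, and a real client would have to count each mapper only once even when speculative attempts of the same mapper both finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reducer-side gate: reading starts only after every mapper has reported completion.
final class MapperBarrier {
  private final CountDownLatch remaining;

  MapperBarrier(int numMappers) {
    this.remaining = new CountDownLatch(numMappers);
  }

  /** Called once per mapper when its output has been fully pushed. */
  void mapperDone() {
    remaining.countDown();
  }

  /** Blocks the reducer until all mappers are done; returns false on timeout. */
  boolean awaitAllMappers(long timeout, TimeUnit unit) throws InterruptedException {
    return remaining.await(timeout, unit);
  }
}
```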