Hello,
As we are developing an MR3 extension for Celeborn, I would like to add my
comments on stage re-run in the context of using Celeborn for MR3. I don't
know the internal details of Spark stage re-run very well, so I apologize
if my comments are irrelevant to the proposal in the design documen
Hello Celeborn team,
We are quite close to completing our Celeborn-MR3 client, and I have a
question on speculative execution in the context of using Celeborn.
MR3 supports speculative execution, which allows several task attempts to
run concurrently. When a task attempt succeeds, all other concu
Hi Keyong,
Thank you for your detailed response. We will think about how to pass
ordered data.
Thanks,
--- Sungwoo
On Tue, 15 Aug 2023, Keyong Zhou wrote:
Hi Sungwoo,
Thanks for your mail. For your questions:
1. ShuffleClient.readPartition() may not read chunks in the same order that
the
Hi Celeborn team,
We are implementing a Celeborn-MR3 client, and have a question on the
order of chunks returned by ShuffleClient.readPartition().
--- Setup
With shuffleId, mapId, attemptId, partitionId all fixed, suppose that a
mapper with mapIndex M calls ShuffleClient.pushData() several t
2. You can try to add the shuffle id in
ShuffleClientImpl's stageEndShuffleSet. Currently
ShuffleClient does not have an API like `setStageEnd`, but I think it's
fine to add one. Let me know if you are interested in sending a PR :)
Hi Keyong,
Thanks for the detailed reply. We have decided to us
Hi Celeborn team,
We are implementing a Celeborn-MR3 client, and have a question on how to
properly unregister a shuffle ID via ShuffleClient. Here is a description
of the problem.
1. Suppose that several ShuffleClients are pushing data for a common
shuffle ID.
2. For some reason (e.g., a H
Hi Keyong,
Unlike the Spark/Flink clients, we had to directly modify the MR3 runtime code
to support Celeborn and thus did not add new code to Celeborn. We release the
MR3 runtime code on GitHub, which could be used just as an example of
exploiting Celeborn.
The API is clean and the code is also c
We have extended the implementation of MR3 so that all partition
inputs can be fetched with a single call, e.g.:
rssShuffleClient.readPartition(..., 0, 100)
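To illustrate what such a single ranged call amounts to, here is a minimal sketch using a toy in-memory store. It is not the Celeborn API: the class ToyShuffleStore, its push() and readPartition() methods, and the record format are all hypothetical, and the sketch assumes that the trailing arguments 0 and 100 denote a half-open range of mapper indices.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeReadSketch {

    // Toy in-memory stand-in for a shuffle service (hypothetical, NOT Celeborn).
    static class ToyShuffleStore {
        // records.get(mapIndex).get(partitionId) holds what that mapper
        // wrote for that partition.
        private final List<List<List<String>>> records = new ArrayList<>();

        ToyShuffleStore(int numMappers, int numPartitions) {
            for (int m = 0; m < numMappers; m++) {
                List<List<String>> perMapper = new ArrayList<>();
                for (int p = 0; p < numPartitions; p++) {
                    perMapper.add(new ArrayList<>());
                }
                records.add(perMapper);
            }
        }

        void push(int mapIndex, int partitionId, String record) {
            records.get(mapIndex).get(partitionId).add(record);
        }

        // Assumed semantics: return the data of one partition written by
        // mappers with index in [startMapIndex, endMapIndex).
        List<String> readPartition(int partitionId, int startMapIndex, int endMapIndex) {
            List<String> out = new ArrayList<>();
            for (int m = startMapIndex; m < endMapIndex; m++) {
                out.addAll(records.get(m).get(partitionId));
            }
            return out;
        }
    }

    // Every mapper pushes exactly one record to every partition.
    static ToyShuffleStore build(int numMappers, int numPartitions) {
        ToyShuffleStore store = new ToyShuffleStore(numMappers, numPartitions);
        for (int m = 0; m < numMappers; m++) {
            for (int p = 0; p < numPartitions; p++) {
                store.push(m, p, "m" + m + "-p" + p);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 100 mappers, 50 partitions.
        ToyShuffleStore store = build(100, 50);
        // One call covers all mappers instead of one call per mapper.
        List<String> all = store.readPartition(7, 0, 100);
        System.out.println(all.size()); // 100
    }
}
```

In this toy model, a reducer that needs the whole of partition 7 issues one call over the full mapper range rather than 100 per-mapper calls.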
Now, Hive-MR3 with Celeborn runs as fast as Hive-MR3 with its own shuffle
handlers when tested with the 10TB TPC-DS benchmark. For some que
Is there some way to use the Celeborn API to check whether CommitFiles succeeds in
step 6? Currently we are testing with TPC-DS 10TB data, and a heavy query
(query 24) occasionally fails with:
Caused by: java.io.IOException: Premature EOF from inputStream
We are speculating that this error occurs b
The following are the main steps for a shuffle stage:
1. LifecycleManager sends RequestSlots to Master to request slots for the
current shuffle;
2. Master allocates slots among workers for the shuffle and
returns RequestSlotsResponse;
3. LifecycleManager sends ReserveSlots to workers; workers do
initi
Hi Team,
I have a question on how a reducer should fetch the output of mappers.
As an example, consider this standard scenario:
1. There are 100 mappers and 50 reducers.
2. Each mapper creates 50 partitions, each of which is to be fetched by
the corresponding reducer.
3. Each reducer is responsi
Hi Keyong,
Thanks for your quick reply. We thought that the Celeborn API was clean and
very intuitive, and have not yet encountered serious problems in getting
our system up and running. We are not sure about just a few points that
are not immediately obvious from the Celeborn API (e.g., whether or n
Hi Team,
We are currently implementing a Celeborn client for our application
(called MR3, which is similar to Tez), and have a question on the internals
of Celeborn.
The question is whether a reducer should wait until the completion of all
mappers before starting to fetch mapper output. From