mccheah commented on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-647844596


   > I was thinking a lot on `ShuffleDriverComponents` and I have an idea how 
to improve it.
   > 
   > The problem I believe this class tries to fulfil two very separate roles: 
be a **builder** and the **result of the building** in the same time.
   > 
   > This is why we need this kind of check:
   > 
https://github.com/apache/spark/blob/289050e4309b024146867b7b32974ab4746eb570/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleDriverComponents.java#L47-L49
   > 
   > If the building is cleanly separated from the result of the building then 
we can be sure the prerequisites are fulfilled before.
   > 
   > I would change it by transforming it to be the result of the building in 
the following way:
   > 
   > * the `initializeApplication` (I mean the process of the initialisation 
and not the returned Map) should be part of the building. The documentation of 
the `ShuffleDataIO#driver` method can be extended by mentioning this is the 
right place to initialize.
   > * the `ShuffleDriverComponents` could have a new method which gives back 
the "additional SparkConf settings necessary for initializing the executor 
components" we can call it like `additonalExecutorConfigs`. This new method 
would replace the old `initializeApplication`
   > 
   > One more idea / question:
   > 
   > * I do not see why the `ShuffleOutputTracker` is optional. Either we or 
the API user can provide an implementation where the methods are empty this way 
the API a bit simpler.
   > 
   > @mccheah what do you think?
   
   You bring up a good point. I can adjust the PR accordingly. It does seem 
like the components does both an initialization and a runtime mode, and it 
would be more ideal to separate the two. Thanks for critically thinking about 
this!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to