squito commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API
URL: https://github.com/apache/spark/pull/25823#discussion_r328687947
########## File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java ##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import org.apache.spark.annotation.Private;
+
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * :: Private ::
+ * An interface for building shuffle support modules for the Driver.
+ */
+@Private
+public interface ShuffleDriverComponents {
+
+  /**
+   * Called once in the driver to bootstrap this module that is specific
+   * to this application.
+   *
+   * @return additional SparkConf settings necessary for initializing the executors.

Review comment:
   I don't think the docs here and for the interface are great, though I'm also struggling to come up with something better. I think we need to give a better explanation of what this class is for -- more like what is in the jira: "this can be used for creating tables in the shuffle storage database, or registering / unregistering against file servers."

   Also, this `initializeApplication()` method should say a bit more about the purpose of the return value, e.g. some conf which needs to be passed to the executors but cannot be set statically as a cluster config (e.g. something like a host:port the executor components should connect back to).

   What do you think of passing in SparkEnv explicitly? With so much stuff getting initialized in the driver, it's hard to know what you can count on being available when this is called. It also seems helpful to make that explicit, rather than expecting developers to just know they can call `SparkEnv.get`.
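   To make the point about the return value concrete, here is a rough, hypothetical sketch (not code from the PR) of what a driver-side component might do: register the application with an external shuffle store and propagate the store's runtime address to the executors through the returned conf, exactly the kind of value that cannot be a static cluster config. It assumes `initializeApplication()` returns a `Map<String, String>`, as the `java.util.Map` import and the javadoc suggest; the class, helper, package, and config-key names below are invented for illustration only.

```java
package org.example.shuffle; // hypothetical package, not part of the PR

import java.util.Collections;
import java.util.Map;

/**
 * Illustrative sketch mirroring the shape of
 * ShuffleDriverComponents#initializeApplication(); the external store
 * client below is entirely hypothetical.
 */
public class ExampleDriverComponents {

  /** Hypothetical handle to an external shuffle storage service. */
  static final class ExternalStoreClient {
    private final String address;

    ExternalStoreClient(String address) {
      this.address = address;
    }

    String address() {
      return address;
    }

    void unregister() {
      // Tear down the application's state in the external store.
    }
  }

  private ExternalStoreClient client;

  /**
   * Called once in the driver. Registers the application with the external
   * store (e.g. creates its tables) and returns the conf the executors need:
   * the store's host:port is only known at runtime, so it has to travel
   * through this return value rather than a static cluster config.
   */
  public Map<String, String> initializeApplication() {
    client = new ExternalStoreClient("shuffle-store.example.com:7788");
    return Collections.singletonMap(
        "spark.shuffle.plugin.example.storeAddress", client.address());
  }

  /** Driver-side counterpart that cleans up when the application ends. */
  public void cleanupApplication() {
    if (client != null) {
      client.unregister();
    }
  }
}
```

   Spelling that lifecycle out in the javadoc (what runs once per application in the driver, and why the returned conf exists) would address both points above.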
