squito commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API
URL: https://github.com/apache/spark/pull/25823#discussion_r328687947
########## File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java ##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import org.apache.spark.annotation.Private;
+
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * :: Private ::
+ * An interface for building shuffle support modules for the Driver.
+ */
+@Private
+public interface ShuffleDriverComponents {
+
+  /**
+   * Called once in the driver to bootstrap this module that is specific
+   * to this application.
+   *
+   * @return additional SparkConf settings necessary for initializing the executors.

Review comment:
   I don't think the docs here and for the interface are great, though I'm also struggling to come up with something better. I think we need to give a better explanation of what this class is for -- more like what is in the jira: "this can be used for creating tables in the shuffle storage database, or registering / unregistering against file servers."

   Also, this `initializeApplication()` method should say a bit more about the purpose of the return value, e.g. some conf which needs to be passed to the executors but cannot be set statically as a cluster config (e.g. something like a host:port the executor components should connect back to).

   What do you think of passing in SparkEnv explicitly? With so much stuff getting initialized in the driver, it's hard to know what you can count on being available when this is called. It also seems helpful to make that explicit, rather than expecting developers to just know they can call `SparkEnv.get`.
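   To make the point about the return value concrete, here is a rough, hypothetical sketch (not code from the PR) of what a driver-side component might do: register the application with an external shuffle store and propagate the store's runtime address to the executors through the returned conf, exactly the kind of value that cannot be a static cluster config. It assumes `initializeApplication()` returns a `Map<String, String>`, as the `java.util.Map` import and the javadoc suggest; the class, helper, package, and config-key names below are invented for illustration only.

```java
package org.example.shuffle; // hypothetical package, not part of the PR

import java.util.Collections;
import java.util.Map;

/**
 * Illustrative sketch mirroring the shape of
 * ShuffleDriverComponents#initializeApplication(); the external store
 * client below is entirely hypothetical.
 */
public class ExampleDriverComponents {

  /** Hypothetical handle to an external shuffle storage service. */
  static final class ExternalStoreClient {
    private final String address;

    ExternalStoreClient(String address) {
      this.address = address;
    }

    String address() {
      return address;
    }

    void unregister() {
      // Tear down the application's state in the external store.
    }
  }

  private ExternalStoreClient client;

  /**
   * Called once in the driver. Registers the application with the external
   * store (e.g. creates its tables) and returns the conf the executors need:
   * the store's host:port is only known at runtime, so it has to travel
   * through this return value rather than a static cluster config.
   */
  public Map<String, String> initializeApplication() {
    client = new ExternalStoreClient("shuffle-store.example.com:7788");
    return Collections.singletonMap(
        "spark.shuffle.plugin.example.storeAddress", client.address());
  }

  /** Driver-side counterpart that cleans up when the application ends. */
  public void cleanupApplication() {
    if (client != null) {
      client.unregister();
    }
  }
}
```

   Spelling that lifecycle out in the javadoc (what runs once per application in the driver, and why the returned conf exists) would address both points above.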
