[GitHub] [spark] HyukjinKwon commented on a change in pull request #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

GitBox Thu, 01 Aug 2019 00:52:14 -0700

HyukjinKwon commented on a change in pull request #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
URL: https://github.com/apache/spark/pull/25007#discussion_r309565135


 ##########
 File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
 ##########
 @@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.api;
+
+import org.apache.spark.annotation.Private;
+
+/**
+ * :: Private ::
+ * An interface for plugging in modules for storing and reading temporary shuffle data.
+ * <p>
+ * This is the root of a plugin system for storing shuffle bytes to arbitrary storage
+ * backends in the sort-based shuffle algorithm implemented by the
+ * {@link org.apache.spark.shuffle.sort.SortShuffleManager}. If another shuffle algorithm is
+ * needed instead of sort-based shuffle, one should implement
+ * {@link org.apache.spark.shuffle.ShuffleManager} instead.
+ * <p>
+ * A single instance of this module is loaded per process in the Spark application.
+ * The default implementation reads and writes shuffle data from the local disks of
+ * the executor, and is the implementation of shuffle file storage that has remained
+ * consistent throughout most of Spark's history.
+ * <p>
+ * Alternative implementations of shuffle data storage can be loaded via setting
+ * <code>spark.shuffle.sort.io.plugin.class</code>.
+ * @since 3.0.0
+ */
+@Private
 
 Review comment:
   Question from [SPARK-28568](https://issues.apache.org/jira/browse/SPARK-28568): is this an API or not? It looks like one, given the PR description. The documentation for `@Private` says:
   
   >  * This should be used only when the standard Scala / Java means of protecting classes are
   >  * insufficient.  In particular, Java has no equivalent of private[spark], so we use this annotation
   >  * in its place.
   
   So `@Private` doesn't look like it is intended for APIs. Shall we change it to `@Unstable` (maybe with an explicit warning)?
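
For illustration only, the suggested change might look roughly like the sketch below. It is not taken from the PR: the `Unstable` annotation here is a local stand-in for `org.apache.spark.annotation.Unstable` so that the example compiles without Spark on the classpath, the warning wording in the Javadoc is hypothetical, and the interface body is left empty because the diff above does not show its methods.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public final class AnnotationSketch {

  // Hypothetical stand-in for org.apache.spark.annotation.Unstable, declared
  // locally so this sketch is self-contained. Runtime retention is assumed
  // here so the marker can be inspected via reflection below.
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface Unstable {}

  /**
   * :: Unstable ::
   * An interface for plugging in modules for storing and reading temporary shuffle data.
   * <p>
   * WARNING (hypothetical wording): this interface is still being designed and may
   * change or be removed between releases.
   */
  @Unstable
  public interface ShuffleDataIO {
    // Methods intentionally omitted; they are not visible in the quoted diff.
  }

  // Returns true when the given type carries the stand-in @Unstable marker.
  public static boolean isMarkedUnstable(Class<?> cls) {
    return cls.isAnnotationPresent(Unstable.class);
  }

  public static void main(String[] args) {
    // Reports whether the sketched interface is marked as an unstable API.
    System.out.println(isMarkedUnstable(ShuffleDataIO.class));
  }
}
```

The point of the sketch is that the marker would then describe the interface as a user-facing but still-evolving API, rather than as an internal class that merely cannot be hidden with Java visibility rules.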
