Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/16137#discussion_r91855251
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -956,24 +976,24 @@ class SparkContext(config: SparkConf) extends Logging {
   }
   /**
-   * Get an RDD for a Hadoop-readable dataset from a Hadoop JobConf given its InputFormat and other
-   * necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable),
-   * using the older MapReduce API (`org.apache.hadoop.mapred`).
+   * Get an RDD for a Hadoop-readable dataset from a Hadoop `JobConf` given its `InputFormat`
+   * and other necessary info (e.g. file name for a filesystem-based dataset, table name
+   * for HyperTable), using the older MapReduce API (`org.apache.hadoop.mapred`).
    *
-   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
+   * @note Because Hadoop's `RecordReader` class re-uses the same Writable object for each
+   * record, directly caching the returned RDD or directly passing it to an aggregation
+   * or shuffle operation will create many references to the same object.
+   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you
+   * should first copy them using a `map` function.
+   * @param conf `JobConf` for setting up the dataset. Note: This will be put into a Broadcast.
    *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
    *             sure you won't modify the conf. A safe approach is always creating a new conf for
    *             a new RDD.
    * @param inputFormatClass Class of the InputFormat
    * @param keyClass Class of the keys
    * @param valueClass Class of the values
-   * @param minPartitions Minimum number of Hadoop Splits to generate.
-   *
-   * @note Because Hadoop's RecordReader class re-uses the same Writable object for each
-   * record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
-   * operation will create many references to the same object.
-   * If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first
-   * copy them using a `map` function.
+   * @param minPartitions minimum number of Hadoop Splits to generate.
--- End diff --
Say "partitions" not "splits" despite what the existing string says