Why do you want only the third partition? You can access individual partitions using the partitions() function, and you can use filter() to keep only the data you care about. Moreover, unless you define a custom partitioner when you create your RDDs, you have no control over which data ends up in partition #3. So there is rarely a good reason to operate on an individual partition.
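If you really do need a single partition, here is a minimal PySpark sketch of one way to do it, using mapPartitionsWithIndex. The helper names only_partition and third_partition_lines, and the choice of 4 minimum partitions, are assumptions made up for illustration; they are not part of Spark. Partition indices are zero-based, so the "third" partition is index 2.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def only_partition(target: int) -> Callable[[int, Iterable[T]], Iterator[T]]:
    """Build a function for RDD.mapPartitionsWithIndex that keeps one partition.

    `target` is the zero-based partition index to keep (hypothetical helper).
    """
    def keep(index: int, records: Iterable[T]) -> Iterator[T]:
        # Pass the target partition's records through; return nothing for the rest.
        return iter(records) if index == target else iter(())
    return keep


def third_partition_lines(path: str) -> List[str]:
    """Hypothetical driver: collect the lines that land in partition index 2."""
    # Imported lazily so the helper above can be used without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # Assumption: ask for at least 4 partitions so that index 2 exists.
    rdd = sc.textFile(path, minPartitions=4)
    return rdd.mapPartitionsWithIndex(only_partition(2)).collect()
```

Note that this still schedules a task for every partition; the other tasks just return nothing. Which lines end up in partition 2 depends on how the input file is split, so if you are after specific records, filter() is usually the better tool.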
-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:01 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Spark- How can I run MapReduce only on one partition in an RDD?

Hi All,

I use textFile to create a RDD. However, I don't want to handle the whole data in this RDD. For example, maybe I only want to solve the data in 3rd partition of the RDD.

How can I do it? Here are some possible solutions that I'm thinking:
1. Create multiple RDDs when reading the file
2. Run MapReduce functions with the specific partition for an RDD.

However, I cannot find any appropriate function.

Thank you and wait for your suggestions.

Best,
Tim