RE: Spark- How can I run MapReduce only on one partition in an RDD?

Ganelin, Ilya Thu, 13 Nov 2014 15:10:08 -0800

Why do you only want the third partition? You can enumerate an RDD's 
partitions using the partitions() function, and you can use the filter() 
function so that the RDD contains only the data you care about. Moreover, when 
you create your RDDs, unless you define a custom partitioner, you have no way 
of controlling what data is in partition #3. Therefore, there is almost no 
reason to want to operate on an individual partition.
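
If you do need to run a job against a single partition, one option is 
mapPartitionsWithIndex(), which passes each partition's index along with its 
iterator. Below is a minimal PySpark-style sketch, not a definitive 
implementation. The helper names (only_partition, count_words, 
run_on_partition) are hypothetical, and `rdd` is assumed to be an RDD created 
elsewhere, for example with textFile(). Partition indices are zero-based, so 
the "3rd partition" is index 2.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def only_partition(target: int, fn: Callable[[Iterator[T]], Iterable[U]]):
    """Build a function for RDD.mapPartitionsWithIndex that applies `fn`
    to partition `target` and yields nothing for every other partition."""
    def per_partition(index: int, records: Iterator[T]) -> Iterator[U]:
        if index == target:
            return iter(fn(records))
        # Any other partition: return an empty iterator without
        # consuming its records.
        return iter(())
    return per_partition


def count_words(lines: Iterator[str]) -> Iterator[int]:
    """Hypothetical per-partition job: total word count of the lines."""
    yield sum(len(line.split()) for line in lines)


def run_on_partition(rdd, target: int, fn):
    """Assumes `rdd` is a pyspark RDD; collects results from one partition."""
    return rdd.mapPartitionsWithIndex(only_partition(target, fn)).collect()
```

With an existing RDD, run_on_partition(rdd, 2, count_words) would return a 
one-element list holding the word count of the third partition. Spark still 
schedules a task for every partition; the other tasks simply return nothing.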

-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:01 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Spark- How can I run MapReduce only on one partition in an RDD?

Hi All,

I use textFile to create an RDD. However, I don't want to process all of the 
data in this RDD. For example, maybe I only want to process the data in the 
3rd partition of the RDD.

How can I do it? Here are some possible solutions that I'm considering:
1. Create multiple RDDs when reading the file.
2. Run MapReduce functions on a specific partition of an RDD.

However, I cannot find any appropriate function.

Thank you. I look forward to your suggestions.

Best,
Tim
