The direct answer you are looking for may be RDD.mapPartitionsWithIndex().
The better question is: why are you looking at only the 3rd partition? To analyze a random sample? Then look into RDD.sample(). Are you sure the data you are looking for is in the 3rd partition? What if you end up with only 2 partitions after loading your data? Or perhaps you want to filter() your RDD instead?

Adrian

Tim Chou wrote
> Hi All,
>
> I use textFile to create an RDD. However, I don't want to handle the whole
> data in this RDD. For example, maybe I only want to process the data in the
> 3rd partition of the RDD.
>
> How can I do it? Here are some possible solutions I'm thinking of:
> 1. Create multiple RDDs when reading the file.
> 2. Run MapReduce functions on a specific partition of an RDD.
>
> However, I cannot find any appropriate function.
>
> Thank you; I look forward to your suggestions.
>
> Best,
> Tim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-How-can-I-run-MapReduce-only-on-one-partition-in-an-RDD-tp18882p18884.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.