Re: Ways to partition the RDD

2014-08-14 Thread Daniel Siegmann
> d information. Since the RDD is already partitioned, there is no need to worry about repartitioning. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp12083p12136.html Sent from the Apache Spark User Li

Re: Ways to partition the RDD

2014-08-14 Thread bdev
Thanks Daniel for the detailed information. Since the RDD is already partitioned, there is no need to worry about repartitioning. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp12083p12136.html Sent from the Apache Spark User

Re: Ways to partition the RDD

2014-08-14 Thread Daniel Siegmann
> tions? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp12083p12128.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --

Re: Ways to partition the RDD

2014-08-14 Thread bdev
Thanks, will give that a try. I see the number of partitions requested is 8 (through HashPartitioner(8)). If I have a 40-node cluster, what's the recommended number of partitions? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD

Re: Ways to partition the RDD

2014-08-14 Thread ssb61
(pfUser(0) -> pfUser(1))}) .partitionBy(new org.apache.spark.HashPartitioner(8)) You have a kvRdd with pageName as Key and UserID as Value. Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp1208
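
For readers following along, here is a rough sketch of what hash partitioning by key does to a key-value collection like the kvRdd above. It is plain Scala with no Spark dependency, so it only imitates the idea behind HashPartitioner (key hash modulo the partition count, kept non-negative); the object name, the partitionFor helper, and the sample page names and user IDs are all made up for illustration and are not from the original message.

```scala
object HashPartitionSketch {
  // Hypothetical helper: picks a partition index from the key's hashCode,
  // taken modulo numPartitions and shifted so it is never negative.
  // A null key is sent to partition 0.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up (pageName, userId) pairs standing in for the kvRdd.
    val pairs = Seq("Home" -> "u1", "About" -> "u2", "Home" -> "u3", "Contact" -> "u4")

    // Group records by their partition index, using 8 partitions as in the thread.
    val byPartition = pairs.groupBy { case (page, _) => partitionFor(page, 8) }

    byPartition.toSeq.sortBy(_._1).foreach { case (index, records) =>
      println(s"partition $index: ${records.mkString(", ")}")
    }
  }
}
```

The point to notice is that every record with the same pageName lands in the same partition, which is why later per-key operations on a pre-partitioned RDD can avoid another shuffle.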

Re: Ways to partition the RDD

2014-08-13 Thread bdev
Forgot to mention, I'm using Spark 1.0.0 and running against a 40-node yarn-cluster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp12083p12088.html Sent from the Apache Spark User List mailing list archive at Nabbl

Ways to partition the RDD

2014-08-13 Thread bdev
http://apache-spark-user-list.1001560.n3.nabble.com/Ways-to-partition-the-RDD-tp12083.html Sent from the Apache Spark User List mailing list archive at Nabble.com.