Re: Custom Partitioning Spark
Why do you use a custom partitioner? Are you doing a join? And can you share some code that shows how you implemented the custom partitioner?

On Tue, Apr 21, 2015 at 8:38 PM, ayan guha guha.a...@gmail.com wrote:

Are you looking for one of these?

*mapPartitions*(*func*): similar to map, but runs separately on each partition (block) of the RDD, so *func* must be of type Iterator[T] => Iterator[U] when running on an RDD of type T.

*mapPartitionsWithIndex*(*func*): similar to mapPartitions, but also provides *func* with an integer value representing the index of the partition, so *func* must be of type (Int, Iterator[T]) => Iterator[U] when running on an RDD of type T.

On Wed, Apr 22, 2015 at 1:00 AM, MUHAMMAD AAMIR mas.ha...@gmail.com wrote:

Hi Archit,

Thanks a lot for your reply. I am using rdd.partitions.length to check the number of partitions; rdd.partitions returns the array of partitions. I would like to add one more question: do you have any idea how to get the objects in each partition? Further, is there any way to figure out which particular partition an object belongs to?

Thanks,

On Tue, Apr 21, 2015 at 12:16 PM, Archit Thakur archit279tha...@gmail.com wrote:

Hi,

This should work. How are you checking the number of partitions?

Thanks and Regards,
Archit Thakur.

On Mon, Apr 20, 2015 at 7:26 PM, mas mas.ha...@gmail.com wrote:

Hi,

I aim to do custom partitioning on a text file. I first convert it into a pair RDD and then try to use my custom partitioner. However, somehow it is not working. My code snippet is given below.

val file = sc.textFile(filePath)
val locLines = file
  .map(line => line.split("\t"))
  .map(line => ((line(2).toDouble, line(3).toDouble), line(5).toLong))
val ck = locLines.partitionBy(new HashPartitioner(50)) // new CustomPartitioner(50) -- neither way works here

While reading the file, the textFile method automatically partitions it. However, when I explicitly want to partition the new RDD locLines, it doesn't appear to do anything, and even the number of partitions is the same as the one created by sc.textFile(). Any help in this regard will be highly appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Partitioning-Spark-tp22571.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Regards,
Muhammad Aamir

--
Best Regards,
Ayan Guha

--
Deepak
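Since the thread never shows the CustomPartitioner that Deepak asked about, here is a minimal sketch of what a grid-style partitioner for the (Double, Double) coordinate keys above might look like. GridPartitioner and cellSize are hypothetical names invented for illustration; in a real Spark job the class would extend org.apache.spark.Partitioner, and the local abstract class below is only a stand-in so the snippet compiles without a Spark dependency.

```scala
// Stand-in for org.apache.spark.Partitioner so this compiles without Spark.
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Buckets (x, y) coordinate keys -- like the (Double, Double) keys of
// locLines in the thread -- into grid cells of side cellSize, then maps
// each cell to a partition index in [0, partitions).
class GridPartitioner(val partitions: Int, val cellSize: Double) extends Partitioner {
  require(partitions > 0, "need at least one partition")

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case (x: Double, y: Double) =>
      val cell = (x / cellSize).toInt * 31 + (y / cellSize).toInt
      ((cell % partitions) + partitions) % partitions // non-negative modulo
    case _ => 0 // fallback for unexpected key types
  }

  // Spark compares partitioners with equals to decide whether a shuffle
  // is needed, so equality must reflect the partitioning behaviour.
  override def equals(other: Any): Boolean = other match {
    case g: GridPartitioner =>
      g.partitions == partitions && g.cellSize == cellSize
    case _ => false
  }
  override def hashCode: Int = 31 * partitions + cellSize.hashCode
}
```

One likely cause of the "nothing happens" symptom in the original post: partitionBy returns a new RDD rather than repartitioning the receiver in place, so the partition count must be checked on the result (ck.partitions.length), not on locLines, which will always report the partitioning chosen by sc.textFile().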
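On Muhammad Aamir's follow-up question of which partition a given object belongs to: for a HashPartitioner, the placement rule is the key's hashCode modulo the partition count, adjusted to be non-negative. The helper below (hashPartition is a made-up name) reimplements that rule locally so you can predict where a pair will land after partitionBy(new HashPartitioner(50)).

```scala
// Reimplementation of HashPartitioner's placement rule:
// key.hashCode % numPartitions, forced into [0, numPartitions).
def hashPartition(key: Any, numPartitions: Int): Int = {
  val mod = key.hashCode % numPartitions
  if (mod < 0) mod + numPartitions else mod
}

// The composite key in the thread is a (Double, Double) tuple:
val key = (12.5, 47.25)
val target = hashPartition(key, 50)
```

To see the objects actually sitting in each partition, the mapPartitionsWithIndex operation Ayan quoted does exactly that, e.g. ck.mapPartitionsWithIndex((i, it) => it.map(x => (i, x))).collect() pairs every element with the index of the partition it lives in.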