Re: Custom Partitioning Spark

2015-05-04 Thread ๏̯͡๏
Why do you use a custom partitioner?
Are you doing a join?
Also, can you share some code that shows how you implemented the custom
partitioner?

On Tue, Apr 21, 2015 at 8:38 PM, ayan guha guha.a...@gmail.com wrote:

 Are you looking for these?

 *mapPartitions*(*func*): Similar to map, but runs separately on each
 partition (block) of the RDD, so *func* must be of type
 Iterator[T] => Iterator[U] when running on an RDD of type T.
 *mapPartitionsWithIndex*(*func*): Similar to mapPartitions, but also
 provides *func* with an integer value representing the index of the
 partition, so *func* must be of type (Int, Iterator[T]) => Iterator[U]
 when running on an RDD of type T.
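The Iterator-in/Iterator-out contract is easy to see without a cluster. Here is a minimal sketch that simulates an RDD's partitions as plain Scala collections; the object and helper names are mine for illustration, not Spark API:

```scala
object MapPartitionsSketch {
  // Simulate an RDD's partitions as a sequence of blocks.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4, 5))

  // mapPartitions-style: func consumes the whole iterator of one partition.
  def mapPartitions[T, U](parts: Seq[Seq[T]])(func: Iterator[T] => Iterator[U]): Seq[Seq[U]] =
    parts.map(p => func(p.iterator).toList)

  // mapPartitionsWithIndex-style: func also receives the partition index.
  def mapPartitionsWithIndex[T, U](parts: Seq[Seq[T]])(func: (Int, Iterator[T]) => Iterator[U]): Seq[Seq[U]] =
    parts.zipWithIndex.map { case (p, idx) => func(idx, p.iterator).toList }

  def main(args: Array[String]): Unit = {
    // One result per partition: the per-partition sum.
    println(mapPartitions(partitions)(it => Iterator(it.sum)))  // List(List(3), List(12))
    // Tag every element with the index of the partition it lives in.
    println(mapPartitionsWithIndex(partitions)((i, it) => it.map(x => (i, x))))
  }
}
```

On a real RDD the call looks the same, e.g. rdd.mapPartitionsWithIndex((i, it) => it.map(x => (i, x))).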

 On Wed, Apr 22, 2015 at 1:00 AM, MUHAMMAD AAMIR mas.ha...@gmail.com
 wrote:

 Hi Archit,

 Thanks a lot for your reply. I am using rdd.partitions.length to check
 the number of partitions; rdd.partitions returns the array of partitions.
 I would like to add one more question: do you have any idea how to
 get the objects in each partition? Further, is there any way to figure out
 which particular partition an object belongs to?
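For what it's worth, both questions can be answered from the partitioner itself: Spark's HashPartitioner places a key by key.hashCode modulo the partition count (made non-negative), and on a live RDD, rdd.glom().collect() or mapPartitionsWithIndex will list the objects per partition. A cluster-free sketch of that arithmetic (the object and method names are mine, not Spark API):

```scala
object PartitionLookupSketch {
  // The rule Spark's HashPartitioner uses: non-negative hashCode mod numPartitions.
  def hashPartition(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // "Which objects end up in each partition?" -- group records by target partition.
  def objectsPerPartition[K, V](records: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => hashPartition(k, numPartitions) }

  def main(args: Array[String]): Unit = {
    val data = Seq(("a", 1), ("b", 2), ("c", 3))
    println(objectsPerPartition(data, 4))
    println(hashPartition("a", 4))  // the partition key "a" belongs to
  }
}
```

On an RDD that has a partitioner set, rdd.partitioner.map(_.getPartition(key)) answers the membership question directly.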

 Thanks,

 On Tue, Apr 21, 2015 at 12:16 PM, Archit Thakur 
 archit279tha...@gmail.com wrote:

 Hi,

 This should work. How are you checking the number of partitions?

 Thanks and Regards,
 Archit Thakur.

 On Mon, Apr 20, 2015 at 7:26 PM, mas mas.ha...@gmail.com wrote:

 Hi,

 I aim to do custom partitioning on a text file. I first convert it into a
 pair RDD and then try to use my custom partitioner. However, somehow it is
 not working. My code snippet is given below.

 val file = sc.textFile(filePath)
 val locLines = file.map(line => line.split("\t"))
   .map(line => ((line(2).toDouble, line(3).toDouble), line(5).toLong))
 val ck = locLines.partitionBy(new HashPartitioner(50)) // new CustomPartitioner(50) -- neither way works here

 While reading the file, the textFile method automatically partitions it.
 However, when I explicitly repartition the new RDD locLines, it doesn't
 appear to do anything: even the number of partitions is the same as the
 one created by sc.textFile().

 Any help in this regard will be highly appreciated.
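For reference, a custom partitioner in Spark must extend org.apache.spark.Partitioner and override numPartitions and getPartition (overriding equals/hashCode is also recommended so Spark can detect co-partitioned RDDs). Below is a runnable sketch of that contract with a plain trait standing in for the Spark base class, plus a hypothetical grid scheme for the (Double, Double) keys above; in real code, delete the trait and extend the Spark class instead:

```scala
// Stand-in for org.apache.spark.Partitioner so this sketch runs without Spark;
// in a real job, drop this trait and write `extends org.apache.spark.Partitioner`.
trait PartitionerContract {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical partitioner for (x, y) coordinate keys: bucket by x-band so
// that nearby points land in the same partition.
class GridPartitioner(partitions: Int, maxX: Double) extends PartitionerContract {
  require(partitions > 0, "need at least one partition")
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (x: Double, _) =>
      val band = ((x / maxX) * partitions).toInt
      math.min(math.max(band, 0), partitions - 1) // clamp into [0, partitions)
    case _ => 0 // unexpected keys all go to partition 0
  }
}

object GridPartitionerDemo {
  def main(args: Array[String]): Unit = {
    val p = new GridPartitioner(50, maxX = 100.0)
    println(p.getPartition((12.0, 34.0))) // band 6 of 50
    println(p.getPartition((99.9, 0.0)))  // last band, 49
  }
}
```

Note also that partitionBy only takes effect on the RDD it returns: check ck.partitions.length (and ck.partitioner), not locLines, which keeps whatever partitioning sc.textFile gave it.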




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Partitioning-Spark-tp22571.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





 --
 Regards,
 Muhammad Aamir


 *CONFIDENTIALITY: This email is intended solely for the person(s) named
 and may be confidential and/or privileged. If you are not the intended
 recipient, please delete it, notify me, and do not copy, use, or disclose
 its content.*




 --
 Best Regards,
 Ayan Guha




-- 
Deepak


Re: Custom Partitioning Spark

2015-04-21 Thread MUHAMMAD AAMIR
Hi Archit,

Thanks a lot for your reply. I am using rdd.partitions.length to check
the number of partitions; rdd.partitions returns the array of partitions.
I would like to add one more question: do you have any idea how to get
the objects in each partition? Further, is there any way to figure out
which particular partition an object belongs to?

Thanks,





-- 
Regards,
Muhammad Aamir




Re: Custom Partitioning Spark

2015-04-21 Thread ayan guha
Are you looking for these?

*mapPartitions*(*func*): Similar to map, but runs separately on each
partition (block) of the RDD, so *func* must be of type
Iterator[T] => Iterator[U] when running on an RDD of type T.
*mapPartitionsWithIndex*(*func*): Similar to mapPartitions, but also
provides *func* with an integer value representing the index of the
partition, so *func* must be of type (Int, Iterator[T]) => Iterator[U]
when running on an RDD of type T.





-- 
Best Regards,
Ayan Guha


Re: Custom Partitioning Spark

2015-04-21 Thread Archit Thakur
Hi,

This should work. How are you checking the number of partitions?

Thanks and Regards,
Archit Thakur.
