Re: Custom Partitioning Spark

ayan guha Tue, 21 Apr 2015 08:11:13 -0700

Are you looking for these?

*mapPartitions*(*func*)
Similar to map, but runs separately on each partition (block) of the RDD, so
*func* must be of type Iterator<T> => Iterator<U> when running on an RDD of
type T.

*mapPartitionsWithIndex*(*func*)
Similar to mapPartitions, but also provides *func* with an integer value
representing the index of the partition, so *func* must be of type (Int,
Iterator<T>) => Iterator<U> when running on an RDD of type T.
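
For the two follow-up questions in the thread (which objects sit in each
partition, and which partition a given object belongs to), something along
these lines may help. It is only a sketch under assumptions: the object name
PartitionInspection, the Record alias, the helper withPartitionIndex, the
local[2] master and the sample records are all made up for illustration, and
SparkContext.getOrCreate assumes Spark 1.4 or later.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionInspection {
  // Hypothetical record shape mirroring the thread: ((col2, col3), col5).
  type Record = ((Double, Double), Long)

  // Tags every record with the index of the partition that holds it.
  def withPartitionIndex(rdd: RDD[Record]): RDD[(Int, Record)] =
    rdd.mapPartitionsWithIndex((idx, records) => records.map(rec => (idx, rec)))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("partition-inspection").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)
    try {
      // Made-up sample data standing in for the parsed text file.
      val locLines: RDD[Record] = sc.parallelize(Seq(
        ((1.0, 2.0), 10L), ((3.0, 4.0), 20L), ((5.0, 6.0), 30L), ((1.0, 2.0), 40L)))

      val partitioner = new HashPartitioner(50)
      // partitionBy returns a new RDD; locLines itself is left unchanged.
      val ck = locLines.partitionBy(partitioner)

      println(s"locLines: ${locLines.partitions.length} partitions")
      println(s"ck: ${ck.partitions.length} partitions")

      // Objects in each partition: collect (partitionIndex, record) pairs.
      withPartitionIndex(ck).collect().foreach { case (idx, rec) =>
        println(s"partition $idx holds $rec")
      }

      // Partition of a particular key: ask the partitioner directly.
      val key = (1.0, 2.0)
      println(s"key $key maps to partition ${partitioner.getPartition(key)}")
    } finally {
      sc.stop()
    }
  }
}
```

Since partitionBy returns a new RDD rather than modifying the one it is called
on, the partition count has to be read from ck, not from locLines. collect()
pulls everything to the driver, so this kind of inspection is only sensible on
small data.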


On Wed, Apr 22, 2015 at 1:00 AM, MUHAMMAD AAMIR <[email protected]> wrote:

> Hi Archit,
>
> Thanks a lot for your reply. I am using "rdd.partitions.length" to check
> the number of partitions; rdd.partitions returns the array of partitions.
> I would like to add one more question here: do you have any idea how to get
> the objects in each partition? Further, is there any way to figure out
> which particular partition an object belongs to?
>
> Thanks,
>
> On Tue, Apr 21, 2015 at 12:16 PM, Archit Thakur <[email protected]> wrote:
>
>> Hi,
>>
>> This should work. How are you checking the number of partitions?
>>
>> Thanks and Regards,
>> Archit Thakur.
>>
>> On Mon, Apr 20, 2015 at 7:26 PM, mas <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I aim to do custom partitioning on a text file. I first convert it into a
>>> pair RDD and then try to use my custom partitioner. However, somehow it is
>>> not working. My code snippet is given below.
>>>
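>>> import org.apache.spark.HashPartitioner
>>>
>>> val file = sc.textFile(filePath)
>>> // Key each line by columns 2 and 3 (as doubles); the value is column 5.
>>> val locLines = file.map(line => line.split("\t"))
>>>   .map(line => ((line(2).toDouble, line(3).toDouble), line(5).toLong))
>>> // Also tried: new CustomPartitioner(50) -- neither partitioner works here.
>>> val ck = locLines.partitionBy(new HashPartitioner(50))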
>>>
>>> While reading the file, the "textFile" method partitions it
>>> automatically. However, when I explicitly partition the new RDD
>>> "locLines", it doesn't appear to do anything, and the number of
>>> partitions is the same as the number created by sc.textFile().
>>>
>>> Any help in this regard will be highly appreciated.
>>>
>>
>
>
> --
> Regards,
> Muhammad Aamir
>
>



-- 
Best Regards,
Ayan Guha
