Yes, I can take that as an example, but my actual use case is that I need to
resolve a data skew: when I do grouping based on key (A-Z), the resulting
partitions are skewed, like
(partition no., no. of keys, total elements with those keys)
<< partition: [(0, 0, 0), (1, 15, 17395), (2, 0, 0), (3, 0, 0), (4, 13,
18196), (5, 0, 0), (6, 0, 0), (7, 0, 0), (8, 1, 1), (9, 0, 0)] and
elements: >>
The data has been skewed into partitions 1 and 4. I need to split those
partitions, do the processing on the split partitions, and I should also be
able to combine the split partitions back.
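
For what it's worth, one common way to split a skewed partition and combine it back is key salting. Below is a minimal plain-Python sketch of the idea (the names `NUM_SALTS`, `salt_key`, and `unsalt_key` are my own illustration, not a pyspark API): each hot key is spread over several sub-keys, each sub-group is processed independently, and the salt is stripped to merge the partial results per original key.

```python
import random
from collections import defaultdict

random.seed(42)   # deterministic salting for this sketch
NUM_SALTS = 4     # split each hot key across 4 sub-keys

def salt_key(key):
    # Spread one hot key over NUM_SALTS sub-keys.
    return (key, random.randrange(NUM_SALTS))

def unsalt_key(salted_key):
    # Recover the original key when combining the splits back.
    return salted_key[0]

# Simulated skewed data: key "A" carries almost all the elements.
data = [("A", i) for i in range(1000)] + [("B", i) for i in range(10)]

# Split: group by the salted key instead of the raw key.
groups = defaultdict(list)
for k, v in data:
    groups[salt_key(k)].append(v)

# Process each split independently (here: a partial sum per sub-key).
partials = {sk: sum(vs) for sk, vs in groups.items()}

# Combine: strip the salt and merge partial results per original key.
combined = defaultdict(int)
for sk, s in partials.items():
    combined[unsalt_key(sk)] += s

print(dict(combined))  # "A" was processed in several splits; totals are unchanged
```

In PySpark the same split/process/combine pattern would presumably be a `map` to salt the keys, a `groupByKey`/`reduceByKey` (or `partitionBy`) over the salted keys, and a final `map` plus `reduceByKey` after stripping the salt — this only works cleanly when the per-key processing can be merged from partial results.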

On Tue, Sep 1, 2015 at 10:42 PM, Davies Liu <dav...@databricks.com> wrote:

> You can take the sortByKey as example:
> https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L642
>
> On Tue, Sep 1, 2015 at 3:48 AM, Jem Tucker <jem.tuc...@gmail.com> wrote:
> > something like...
> >
> > class RangePartitioner(Partitioner):
> >     def __init__(self, numParts):
> >         self.numPartitions = numParts
> >         self.partitionFunc = self.rangePartition
> >     def rangePartition(self, key):
> >         # Logic to turn the key into a partition id
> >         return partition_id
> >
> > On Tue, Sep 1, 2015 at 11:38 AM shahid ashraf <sha...@trialx.com> wrote:
> >>
> >> Hi
> >>
> >> I think a range partitioner is not available in pyspark, so if we want
> >> one we need to create it ourselves. My question is: how should we do that?
> >>
> >> On Tue, Sep 1, 2015 at 3:57 PM, Jem Tucker <jem.tuc...@gmail.com>
> wrote:
> >>>
> >>> Ah, sorry, I misread your question. In pyspark it looks like you just
> >>> need to instantiate the Partitioner class with numPartitions and
> >>> partitionFunc.
> >>>
> >>> On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf <sha...@trialx.com>
> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I did not get this. E.g., what if I need to create a custom
> >>>> partitioner like a range partitioner?
> >>>>
> >>>> On Tue, Sep 1, 2015 at 3:22 PM, Jem Tucker <jem.tuc...@gmail.com>
> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> You just need to extend Partitioner and override the numPartitions
> >>>>> and getPartition methods, see below
> >>>>>
> >>>>> class MyPartitioner extends Partitioner {
> >>>>>   def numPartitions: Int = ??? // Return the number of partitions
> >>>>>   def getPartition(key: Any): Int = ??? // Return the partition for a given key
> >>>>> }
> >>>>>
> >>>>> On Tue, Sep 1, 2015 at 10:15 AM shahid qadri <
> shahidashr...@icloud.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Sparkians
> >>>>>>
> >>>>>> How can we create a custom partitioner in pyspark?
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>>>>> For additional commands, e-mail: user-h...@spark.apache.org
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> with Regards
> >>>> Shahid Ashraf
> >>
> >>
> >>
> >>
> >> --
> >> with Regards
> >> Shahid Ashraf
>



-- 
with Regards
Shahid Ashraf
