GitHub user xubo245 commented on a diff in the pull request:
    --- Diff: python/pyspark/sql/ ---
    @@ -667,6 +667,55 @@ def repartition(self, numPartitions, *cols):
                 raise TypeError("numPartitions should be an int or Column")
    +    @since("2.3.0")
    +    def repartitionByRange(self, numPartitions, *cols, **kwargs):
    +        """
    +        Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The
    +        resulting DataFrame is range partitioned.
    +        ``numPartitions`` can be an int to specify the target number of partitions or a Column.
    +        If it is a Column, it will be used as the first partitioning column. If not specified,
    +        the default number of partitions is used.
    +        At least one partition-by expression must be specified.
    +        When no explicit sort order is specified, "ascending nulls first" is assumed.
    +        >>> df.repartitionByRange(2, "age").rdd.getNumPartitions()
    +        2
    +        >>> data = df.union(df).repartition(1, "age")
    --- End diff --
    OK, change it to repartitionByRange.
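
For readers unfamiliar with the term, here is a rough pure-Python sketch of what "range partitioned" with "ascending nulls first" means in the docstring above. It is only an illustration under stated assumptions, not Spark's implementation: the helper names `sort_key` and `range_partition` are hypothetical, and the boundaries are taken from the fully sorted input, whereas Spark estimates them by sampling, so its actual partition contents can differ.

```python
from bisect import bisect_left


def sort_key(value):
    # "Ascending nulls first": None sorts before every real value.
    return (0,) if value is None else (1, value)


def range_partition(values, num_partitions):
    """Toy range partitioner (hypothetical helper, not a PySpark API).

    Each partition covers one contiguous range of the sort order, so
    equal values always end up in the same partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    parts = [[] for _ in range(num_partitions)]
    keys = sorted(sort_key(v) for v in values)
    if not keys:
        return parts
    n = len(keys)
    # Upper bound (inclusive) of every partition except the last one.
    bounds = [
        keys[max(0, i * n // num_partitions - 1)]
        for i in range(1, num_partitions)
    ]
    for v in values:
        # First partition whose upper bound is >= the key; keys above
        # all bounds fall into the last partition.
        parts[bisect_left(bounds, sort_key(v))].append(v)
    return parts


ages = [30, 25, None, 41, 25, 18]
print(range_partition(ages, 2))
# prints [[25, None, 25, 18], [30, 41]]
```

The null and the lower ages land in the first partition and the higher ages in the second, which is the property that distinguishes range partitioning from the hash partitioning used by repartition.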

