[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...

felixcheung Fri, 25 May 2018 22:19:59 -0700

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21427#discussion_r191040210
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4931,6 +4931,63 @@ def foo3(key, pdf):
             expected4 = udf3.func((), pdf)
             self.assertPandasEqual(expected4, result4)
     
    +    def test_column_order(self):
    +        import pandas as pd
    +        from pyspark.sql.functions import pandas_udf, PandasUDFType
    +        df = self.data
    +
    +        # Function returns a pdf with required column names, but order could be arbitrary using dict
    +        def change_col_order(pdf):
    +            # Constructing a DataFrame from a dict should result in the same order,
    +            # but use from_items to ensure the pdf column order is different than schema
    +            return pd.DataFrame.from_items([
    +                ('id', pdf.id),
    +                ('u', pdf.v * 2),
    +                ('v', pdf.v)])
    +
    +        ordered_udf = pandas_udf(
    +            change_col_order,
    +            'id long, v int, u int',
    +            PandasUDFType.GROUPED_MAP
    +        )
    +
    +        def positional_col_order(pdf):
    --- End diff --
    
    Would it be more natural/common to `zip(range(3))` the columns, or just name them one by one explicitly?
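
To make the two alternatives in the question concrete, here is a minimal pandas-only sketch. It is not the code from the pull request: the sample `pdf`, the variable names `named` and `positional`, and the exact way `range(3)` is zipped with the columns are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one group's input frame (not taken from the PR).
pdf = pd.DataFrame({'id': [1, 1], 'v': [10, 20]})

# Alternative 1: name the columns one by one, in an order that differs
# from the declared schema 'id long, v int, u int'.
named = pd.DataFrame(
    {'id': pdf.id, 'u': pdf.v * 2, 'v': pdf.v},
    columns=['id', 'u', 'v'],
)

# Alternative 2: positional labels 0, 1, 2, built by zipping range(3)
# with the column values listed in schema order (id, v, u).
values = [pdf.id, pdf.v, pdf.v * 2]
positional = pd.DataFrame(dict(zip(range(3), values)))

print(list(named.columns))
print(list(positional.columns))
```

The first frame carries string labels, so a consumer can match columns to the schema by name; the second carries only integer labels, so a consumer would have to rely on position.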



