GitHub user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20728
I was just double-checking whether we can write a test. Mind adding the test
below if it makes sense?
```diff
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 19653072ea3..c46423ac905 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -4381,6 +4381,24 @@ class ScalarPandasUDFTests(ReusedSQLTestCase):
         result = df.withColumn('time', foo_udf(df.time))
         self.assertEquals(df.collect(), result.collect())

+    @unittest.skipIf(sys.version_info[:2] < (3, 5),
+                     "Type hints are supported from Python 3.5.")
+    def test_type_annotation(self):
+        from pyspark.sql.functions import pandas_udf
+        # Regression test to check if type hints can be used. See SPARK-23569.
+        # Note that it throws an error during compilation in lower Python versions if 'exec'
+        # is not used. Also, note that we explicitly use another dictionary to avoid modifications
+        # in the current 'locals()'.
+        #
+        # Hyukjin: I think it's an ugly way to test issues about syntax specific to
+        # higher versions of Python, which we shouldn't encourage. This was the last resort
+        # I could come up with at that time.
+        _locals = {}
+        exec(
+            "import pandas as pd\ndef _noop(col: pd.Series) -> pd.Series: return col",
+            _locals)
+        df = self.spark.range(1).select(
+            pandas_udf(f=_locals['_noop'], returnType='bigint')('id'))
+        self.assertEqual(df.first()[0], 0)
+
    @unittest.skipIf(
        not _have_pandas or not _have_pyarrow,
```