Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158208592
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     
            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
            >>> from pyspark.sql.types import IntegerType, StringType
    -       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
    -       >>> @pandas_udf(StringType())
    +       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
    +       >>> @pandas_udf(StringType())  # doctest: +SKIP
            ... def to_upper(s):
            ...     return s.str.upper()
            ...
    -       >>> @pandas_udf("integer", PandasUDFType.SCALAR)
    +       >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
            ... def add_one(x):
            ...     return x + 1
            ...
    -       >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
    +       >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", "age"))  # doctest: +SKIP
    --- End diff --
    
    The name change shouldn't have been committed; I'll change it back. I
    don't think we can make the doctests conditional on whether pandas/pyarrow
    is installed, so unless we make these required dependencies and have them
    installed on all the workers, we need to skip them.
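
    For illustration, here is a minimal standalone sketch of what the
    `# doctest: +SKIP` directive does, using only the standard-library
    `doctest` module. The function `sample` and its docstring are hypothetical
    and are not part of the PR; the point is only that a skipped example is
    never executed, so it cannot fail when pandas is missing.

```python
import doctest


def sample():
    """Hypothetical docstring, not taken from pyspark.

    >>> 1 + 1
    2
    >>> import pandas  # doctest: +SKIP
    >>> pandas.Series(["a", "bb"]).str.len().tolist()  # doctest: +SKIP
    [1, 2]
    """


# Parse the docstring into a DocTest and run it with a quiet runner.
parser = doctest.DocTestParser()
test = parser.get_doctest(sample.__doc__, {}, "sample", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# Examples marked +SKIP are never executed, so the run reports no failures
# whether or not pandas is importable in this environment.
print(results.failed, results.attempted)
```

    The trade-off is the one described above: skipped examples stay in the
    docs but are never checked, even on machines that do have pandas and
    pyarrow.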

