[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

cloud-fan Mon, 16 Oct 2017 08:25:27 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18732
  
    @icexelloss As an API, I think it's a little confusing that `@pandas_udf` 
can define both `Series* -> Series` functions and `DataFrame -> DataFrame` 
functions. Besides, to support `StructType` as the return type of a 
`Series* -> Series` function, I think we would have to add an extra flag to 
`@pandas_udf`. For the upcoming `DataFrame -> Scalar` pandas UDAF, we would 
also need extra flags to indicate whether partial aggregation is supported.
    
    In my experience with Java/Scala API design, it's a bad idea to have a 
single method with many flag parameters. It is better to have more methods. 
For this case, `@pandas_udf`, `@pandas_grouped_udf` and `@pandas_udaf` look 
better to me.
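
    To make the trade-off concrete, here is a minimal pure-Python sketch of the 
two shapes the API could take. It does not import Spark or pandas, and none of 
it is Spark's actual implementation: `UdfKind`, `_tag`, 
`pandas_udf_with_flags`, the `grouped`/`aggregate`/`partial` parameters, and 
the decorator bodies are all assumptions made up for illustration. Only the 
three decorator names come from the proposal above.

```python
from enum import Enum


class UdfKind(Enum):
    """Hypothetical tag for the three function shapes discussed above."""
    SCALAR = "Series* -> Series"
    GROUPED = "DataFrame -> DataFrame"
    AGGREGATE = "DataFrame -> Scalar"


def _tag(func, kind, return_type, partial=False):
    # Illustrative helper: record metadata on the function and return it.
    func.udf_kind = kind
    func.return_type = return_type
    func.partial = partial
    return func


# Option A (assumed shape): one decorator, behaviour selected by flags.
# Every new capability adds a flag, and invalid combinations must be rejected.
def pandas_udf_with_flags(return_type, grouped=False, aggregate=False, partial=False):
    if grouped and aggregate:
        raise ValueError("grouped and aggregate are mutually exclusive")
    if partial and not aggregate:
        raise ValueError("partial only applies to aggregate functions")
    if grouped:
        kind = UdfKind.GROUPED
    elif aggregate:
        kind = UdfKind.AGGREGATE
    else:
        kind = UdfKind.SCALAR
    return lambda func: _tag(func, kind, return_type, partial)


# Option B (assumed shape): one decorator per kind, so each signature only
# exposes the options that make sense for that kind.
def pandas_udf(return_type):
    return lambda func: _tag(func, UdfKind.SCALAR, return_type)


def pandas_grouped_udf(return_type):
    return lambda func: _tag(func, UdfKind.GROUPED, return_type)


def pandas_udaf(return_type, partial=False):
    return lambda func: _tag(func, UdfKind.AGGREGATE, return_type, partial)


# Hypothetical usage: the decorator name alone tells the reader the shape.
@pandas_grouped_udf("id long, v double")
def passthrough(group):
    return group


@pandas_udaf("double", partial=True)
def total(values):
    return sum(values)
```

    With separate decorators there is no invalid flag combination to validate, 
which is the main reason the split reads better to me in this sketch.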



