Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19872#discussion_r154782452
--- Diff: python/pyspark/sql/group.py ---
@@ -89,8 +89,15 @@ def agg(self, *exprs):
        else:
            # Columns
            assert all(isinstance(c, Column) for c in exprs), "all exprs should be Column"
-           jdf = self._jgd.agg(exprs[0]._jc,
-                               _to_seq(self.sql_ctx._sc, [c._jc for c in exprs[1:]]))
+           if isinstance(exprs[0], UDFColumn):
+               assert all(isinstance(c, UDFColumn) for c in exprs)
--- End diff ---
So I'm a little worried about this change: if other folks have wrapped Java
UDAFs (which is reasonable, since there weren't other ways to make UDAFs in
PySpark before this), it seems like they won't be able to mix them. I'd
suggest maybe doing what @viirya suggested below, but with just a warning
instead of a failure until Spark 3.
What do y'all think?
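A minimal sketch of the warn-instead-of-fail idea (the `Column`/`UDFColumn` stand-in classes and the `check_agg_exprs` helper name here are hypothetical, just to illustrate the deprecation-style path, not the PR's actual code):

```python
import warnings


class Column:
    """Stand-in for pyspark.sql.Column (illustration only)."""


class UDFColumn(Column):
    """Stand-in for the UDF-backed column type this PR introduces."""


def check_agg_exprs(exprs):
    """If the first expr is a UDFColumn but the rest are mixed, emit a
    DeprecationWarning instead of raising, so existing wrapped Java
    UDAFs keep working until Spark 3."""
    if isinstance(exprs[0], UDFColumn):
        if not all(isinstance(c, UDFColumn) for c in exprs):
            warnings.warn(
                "Mixing UDF columns with other aggregate expressions is "
                "deprecated and will become an error in Spark 3.0",
                DeprecationWarning)
```

That way mixed expressions still go through the old code path, and users get a heads-up that the behavior will tighten later.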
---