[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

viirya Sun, 03 Dec 2017 22:59:19 -0800

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r154569884
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -89,8 +89,15 @@ def agg(self, *exprs):
             else:
                 # Columns
                 assert all(isinstance(c, Column) for c in exprs), "all exprs 
should be Column"
    -            jdf = self._jgd.agg(exprs[0]._jc,
    -                                _to_seq(self.sql_ctx._sc, [c._jc for c in 
exprs[1:]]))
    +            if isinstance(exprs[0], UDFColumn):
    +                assert all(isinstance(c, UDFColumn) for c in exprs)
    +                jdf = self._jgd.aggInPandas(
    +                    _to_seq(self.sql_ctx._sc, [c._jc for c in exprs]))
    +            else:
    +                jdf = self._jgd.agg(exprs[0]._jc,
    --- End diff --
    
    If `exprs[n]` (n > 0) is a `UDFColumn`? I think we should make sure if any 
column is a `UDFColumn`, all columns should be `UDFColumn`.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

Reply via email to