[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029595 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=No

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029720 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=No

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029736 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,37 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029696 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=No

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed t

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029714 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=No

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029786 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed t

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142029655 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=No

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141976177 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141955128 --- Diff: python/pyspark/sql/functions.py --- @@ -2129,7 +2129,8 @@ def _create_udf(f, returnType, vectorized): def _udf(f, returnType=StringType

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953540 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953435 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,18 @@ case class CoGroup( out

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141953052 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -24,9 +24,9 @@ import org.apache.spark.broadcast.B

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141952916 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141950729 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[s

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141950202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[s

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141950175 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -47,8 +47,8 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141950131 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -47,8 +47,8 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141949108 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala --- @@ -37,6 +37,9 @@ object AttributeSet { /*

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141906843 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141892887 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col, valu

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141891357 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col,

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141886413 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141885324 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3377,74 @@ def test_vectorized_udf_empty_partition(self): res = df.select(f(col('id')))

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141883671 --- Diff: python/pyspark/sql/functions.py --- @@ -2206,6 +2207,10 @@ def pandas_udf(f=None, returnType=StringType()): | 8| JOHN DOE|

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141830344 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col, valu

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141829344 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141828817 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col, valu

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141827443 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[sql](

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803015 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala --- @@ -37,6 +37,9 @@ object AttributeSet { /** Co

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788690 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788272 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141804070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -47,8 +47,8 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,18 @@ case class CoGroup( outputO

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -24,9 +24,9 @@ import org.apache.spark.broadcast.Broad

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788365 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141807573 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141804329 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[sql](

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141820198 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec(udf

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141817634 --- Diff: python/pyspark/sql/group.py --- @@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col, valu

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-28 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141780866 --- Diff: python/pyspark/sql/functions.py --- @@ -2206,6 +2207,10 @@ def pandas_udf(f=None, returnType=StringType()): | 8| JOHN DOE|

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141775540 --- Diff: python/pyspark/sql/functions.py --- @@ -2129,7 +2129,8 @@ def _create_udf(f, returnType, vectorized): def _udf(f, returnType=StringTyp

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-28 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141765717 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3377,74 @@ def test_vectorized_udf_empty_partition(self): res = df.select(f(col('id'))

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-28 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141765529 --- Diff: python/pyspark/sql/functions.py --- @@ -2206,6 +2207,10 @@ def pandas_udf(f=None, returnType=StringType()): | 8| JOHN DOE

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-28 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141765893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExe

<    1   2   3