Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172583099
--- Diff: python/pyspark/worker.py ---
@@ -91,10 +92,16 @@ def verify_result_length(*a):
def wrap_grouped_map_pandas_udf(f, return_type):
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172374978
--- Diff: python/pyspark/worker.py ---
@@ -91,10 +92,16 @@ def verify_result_length(*a):
def wrap_grouped_map_pandas_udf(f, return_type):
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172250093
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala ---
@@ -75,28 +76,66 @@ case class FlatMapGroups
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172250020
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala ---
@@ -75,28 +76,66 @@ case class FlatMapGroups
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172249968
--- Diff: python/pyspark/worker.py ---
@@ -149,18 +156,30 @@ def read_udfs(pickleSer, infile, eval_type):
num_udfs = read_int(infile)
ud
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172249841
--- Diff: python/pyspark/worker.py ---
@@ -149,18 +156,30 @@ def read_udfs(pickleSer, infile, eval_type):
num_udfs = read_int(infile)
ud
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172249757
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.1094
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172249706
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.1094
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172249785
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.1094
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172226094
--- Diff: python/pyspark/sql/types.py ---
@@ -1725,6 +1737,29 @@ def _get_local_timezone():
return os.environ.get('TZ', 'dateutil/:')
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r172225237
--- Diff: python/pyspark/worker.py ---
@@ -149,18 +156,30 @@ def read_udfs(pickleSer, infile, eval_type):
num_udfs = read_int(infile)
ud
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171469275
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.109
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171466325
--- Diff: python/pyspark/sql/types.py ---
@@ -1725,6 +1737,29 @@ def _get_local_timezone():
return os.environ.get('TZ', 'dateutil/:')
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171465908
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.109
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171285307
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.109
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171305660
--- Diff: python/pyspark/worker.py ---
@@ -149,18 +156,30 @@ def read_udfs(pickleSer, infile, eval_type):
num_udfs = read_int(infile)
u
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171311096
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala ---
@@ -75,28 +76,66 @@ case class FlatMapGroup
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171297236
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.109
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171307853
--- Diff: python/pyspark/worker.py ---
@@ -149,18 +156,30 @@ def read_udfs(pickleSer, infile, eval_type):
num_udfs = read_int(infile)
u
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171297003
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 2| 1.109
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r171315375
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala ---
@@ -75,28 +76,66 @@ case class FlatMapGroup
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r167501868
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,13 @@ def from_arrow_schema(arrow_schema):
for field in arrow_schema])
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r164483676
--- Diff: python/pyspark/sql/udf.py ---
@@ -54,7 +54,7 @@ def _create_udf(f, returnType, evalType):
"Instead, create a 1-arg pandas_u
GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/20295
[SPARK-23011] Support alternative function form with group aggregate pandas UDF
## What changes were proposed in this pull request?
This PR proposes to support an alternative function f
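The review comments above touch the arity check in `pyspark/sql/udf.py` ("Instead, create a 1-arg pandas_u...") and the wrapping logic in `pyspark/worker.py`. A minimal, pandas-free sketch of the general idea — dispatching on the number of arguments so a one-argument form `f(pdf)` and a two-argument form `f(key, pdf)` can coexist — is shown below. The helper names (`wrap_grouped_map`, `apply_per_group`) and the plain-dict "groups" are hypothetical stand-ins for illustration, not the actual Spark internals.

```python
# Illustrative sketch only: dispatch a user function on its arity so both
# the original one-argument form f(group) and an alternative two-argument
# form f(key, group) are accepted. Helper names here are hypothetical.
from inspect import getfullargspec


def wrap_grouped_map(f):
    """Return a uniform (key, group) -> result callable for either form."""
    num_args = len(getfullargspec(f).args)
    if num_args == 1:
        return lambda key, group: f(group)       # original form: f(group)
    elif num_args == 2:
        return lambda key, group: f(key, group)  # alternative form: f(key, group)
    raise ValueError("grouped map function must take 1 or 2 arguments")


def apply_per_group(groups, f):
    """Apply a user function to each (key, group) pair in a dict of groups."""
    wrapped = wrap_grouped_map(f)
    return {key: wrapped(key, group) for key, group in groups.items()}


groups = {1: [1.0, 2.0], 2: [3.0]}

# One-argument form: the grouping key is not passed to the function.
totals = apply_per_group(groups, lambda vals: sum(vals))

# Two-argument form: the function also receives the grouping key.
keyed = apply_per_group(groups, lambda key, vals: (key, sum(vals)))
```

The same arity inspection is what lets a wrapper decide, once per UDF, which calling convention to use without any per-row overhead.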