Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r132188490
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java
---
@@ -65,15 +65,35 @@
final Row row
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18933#discussion_r133229705
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -912,6 +912,14 @@ object SQLConf {
.intConf
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/15821
@BryanCutler, are Timestamp and Date types supported now with Arrow 0.3?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/15821
>@icexelloss, yes, Arrow supports it, but Spark stores timestamps in a
different way, which caused some complications. After talking with Holden, we
agreed it was better to keep this PR to sim
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
Hi @BryanCutler @HyukjinKwon @ueshin, mind taking another look? I think
this is in good shape. Thanks!
---
-
To
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/23248#discussion_r239565253
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -131,8 +131,20 @@ object ExtractPythonUDFs
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239587020
--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -44,9 +44,18 @@ def python_plus_one(self):
@property
def
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239587089
--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -231,12 +266,10 @@ def test_array_type(self):
self.assertEquals(result1
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239587065
--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -87,8 +96,34 @@ def ordered_window(self):
def unpartitioned_window(self
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239587136
--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -245,11 +278,101 @@ def test_invalid_args(self):
foo_udf
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239587375
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -144,24 +282,107 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r239922856
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -144,24 +282,107 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/23248#discussion_r239925749
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -131,8 +131,20 @@ object ExtractPythonUDFs
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22208#discussion_r212716787
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -216,8 +216,16 @@ class Dataset[T] private[sql](
private[sql] def
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22244
@cloud-fan Thanks! I will take a look later today and incorporate this with
my patch.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22104
Thanks all for the review!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22208
@dongjoon-hyun Could you please take another look? I changed it to use the
resolver, try to resolve columns with backticks, and added unit tests as well
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22208
@dongjoon-hyun SGTM. I misunderstood your suggestion about resolver.
Keeping it simple was my preference too.
GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/22305
[WIP][SPARK-24561][SQL][Python] User-defined window aggregation functions
with Pandas UDF (bounded window)
## What changes were proposed in this pull request?
### **This is currently
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
The current state is a minimal working version - I copied some code from
`WindowExec` to make this work but will need to refactor those
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22104
@cloud-fan Sure! Updated
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22329#discussion_r214940744
--- Diff: python/pyspark/sql/functions.py ---
@@ -2804,6 +2804,20 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 1|1.5
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22329#discussion_r215267320
--- Diff: python/pyspark/sql/functions.py ---
@@ -2804,6 +2804,22 @@ def pandas_udf(f=None, returnType=None,
functionType=None):
| 1|1.5
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22329
LGTM
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
retest this please
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r218243887
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
---
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r218244042
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
---
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
@felixcheung I am waiting for some in-depth review. @ueshin do you have
some time to review this in the near future? Thanks
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227591428
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227591518
--- Diff: python/pyspark/sql/tests.py ---
@@ -6481,12 +6516,116 @@ def test_invalid_args(self):
foo_udf = pandas_udf(lambda x: x,
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r227591746
--- Diff: python/pyspark/sql/tests.py ---
@@ -6323,6 +6333,33 @@ def ordered_window(self):
def unpartitioned_window(self):
return
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
Hey @gatorsmile it has been quite a while with no review progress on this.
@BryanCutler has some initial comments but I want to get more people's feedback
before addressing those. Since no
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
No worries. Thank you @HyukjinKwon and @ueshin
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232084279
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -73,68 +118,147 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232388369
--- Diff: python/pyspark/worker.py ---
@@ -154,6 +154,47 @@ def wrapped(*series):
return lambda *a: (wrapped(*a), arrow_return_type
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232393187
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -73,68 +118,147 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232393305
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -73,68 +118,147 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232393452
--- Diff: python/pyspark/sql/tests.py ---
@@ -6323,6 +6333,33 @@ def ordered_window(self):
def unpartitioned_window(self):
return
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232393335
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -27,17 +27,62 @@ import
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r232393476
--- Diff: python/pyspark/sql/tests.py ---
@@ -6323,6 +6333,33 @@ def ordered_window(self):
def unpartitioned_window(self):
return
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r234790403
--- Diff: python/pyspark/sql/tests.py ---
@@ -7064,12 +7098,104 @@ def test_invalid_args(self):
foo_udf = pandas_udf(lambda x: x,
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r234790364
--- Diff: python/pyspark/sql/tests.py ---
@@ -89,6 +89,7 @@
from pyspark.sql.types import _merge_type
from pyspark.tests import QuietTest
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r234790633
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -73,68 +118,151 @@ case class
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r234790479
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
---
@@ -27,17 +27,62 @@ import
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r235182927
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -63,7 +65,7 @@ private[spark] object PythonEvalType
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/22305
@BryanCutler @HyukjinKwon @ueshin I have addressed all the comments so far.
Could you please take another look? Thanks
Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/22305#discussion_r235417425
--- Diff: python/pyspark/worker.py ---
@@ -154,6 +154,47 @@ def wrapped(*series):
return lambda *a: (wrapped(*a), arrow_return_type