Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19331
Thanks! Merging to master.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18659
@BryanCutler Hmm, I'm not exactly sure why it doesn't work (or why mine
works), but we can use `fillna(0)` before casting, like:
```
pa.Array.from_pandas(s.fillna(0).astype
```
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19249
A late LGTM. Btw, can we use the same idea for `MapType`?
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19234
LGTM.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18659
@BryanCutler I think it's okay to rename `size` to `length` (or a longer name
to avoid a name conflict, like `_length_
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18659#discussion_r140396700
--- Diff: python/pyspark/serializers.py ---
@@ -199,6 +211,55 @@ def __repr__(self):
return "ArrowSerializer"
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18659#discussion_r139579800
--- Diff: python/pyspark/worker.py ---
@@ -71,7 +73,19 @@ def wrap_udf(f, return_type):
return lambda *a: f(*a)
-def
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18659#discussion_r139580569
--- Diff: python/pyspark/worker.py ---
@@ -71,7 +73,19 @@ def wrap_udf(f, return_type):
return lambda *a: f(*a)
-def
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18659#discussion_r139583530
--- Diff: python/pyspark/sql/tests.py ---
@@ -3122,6 +3122,185 @@ def test_filtered_frame(self):
self.assertTrue(pdf.empty
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18659
@BryanCutler I'm okay with upgrading pyarrow to 0.7, except for the same
concerns as #18974.
I guess we need to discuss the upgrade policy and strategy for pyarrow
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19246#discussion_r139871986
--- Diff: python/pyspark/sql/types.py ---
@@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True,
metadata=None):
self.dataType
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18754#discussion_r139872489
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
---
@@ -224,6 +226,25 @@ private[arrow] class DoubleWriter(val
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142841543
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,67 @@ def pivot(self, pivot_col, values=None):
jgd = self._jgd.pivot(pivot_col
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142592915
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,89 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142610856
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142720877
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,89 @@
+/*
+ * Licensed
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I'd say I prefer 1, too. I'm just wondering what happens if we use timestamps
in nested types. Currently we don't support nested types, but in the future
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18732
I submitted a pr #19505 to introduce `@pandas_grouped_udf` instead of
reusing `@pandas_udf`.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144768652
--- Diff: python/pyspark/sql/functions.py ---
@@ -2044,7 +2044,7 @@ class UserDefinedFunction(object):
.. versionadded:: 1.3
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18732
I'm +0 for now.
I'm just wondering whether we can support struct types in vectorized UDFs
when needed in the future.
As for adding a pandas UDAF, I think we need another decorator
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19505
[SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf
## What changes were proposed in this pull request?
This is a follow-up of #18732.
This pr introduces
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144780130
--- Diff: python/pyspark/sql/functions.py ---
@@ -2192,67 +2195,82 @@ def pandas_udf(f=None, returnType=StringType()):
:param f: user-defined
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r144827985
--- Diff: python/pyspark/sql/session.py ---
@@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
data = [schema.toInternal(row
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r144829187
--- Diff: python/pyspark/sql/types.py ---
@@ -1624,6 +1624,50 @@ def to_arrow_type(dt):
return arrow_type
+def to_arrow_schema
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r144828405
--- Diff: python/pyspark/sql/session.py ---
@@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
data = [schema.toInternal(row
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144924728
--- Diff: python/pyspark/sql/functions.py ---
@@ -2121,33 +2127,40 @@ def wrapper(*args):
wrapper.func = self.func
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144926703
--- Diff: python/pyspark/sql/functions.py ---
@@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()):
:param f: user-defined
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144929491
--- Diff: python/pyspark/sql/functions.py ---
@@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()):
:param f: user-defined
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144924765
--- Diff: python/pyspark/sql/functions.py ---
@@ -2038,13 +2038,19 @@ def _wrap_function(sc, func, returnType
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144926010
--- Diff: python/pyspark/sql/functions.py ---
@@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()):
:param f: user-defined
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
Jenkins, retest this please.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r145037361
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
---
@@ -55,6 +55,12 @@ object ArrowWriter {
case
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r145036404
--- Diff: python/pyspark/sql/types.py ---
@@ -1619,11 +1619,38 @@ def to_arrow_type(dt):
arrow_type = pa.decimal(dt.precision, dt.scale
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144859680
--- Diff: python/pyspark/sql/functions.py ---
@@ -2121,33 +2127,35 @@ def wrapper(*args):
wrapper.func = self.func
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144859965
--- Diff: python/pyspark/sql/functions.py ---
@@ -2121,33 +2127,35 @@ def wrapper(*args):
wrapper.func = self.func
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19517
[SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf
## What changes were proposed in this pull request?
This is a follow-up of #18732.
This pr modifies
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I disagree with using `DateTimeUtils.defaultTimeZone()` for the timezone.
If `DateTimeUtils.defaultTimeZone()` is different from the system timezone in
Python, the return values differ between
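The concern above is truncated, but presumably the issue is that the same instant renders as different wall-clock values when the JVM default and Python's system timezone disagree. A minimal stdlib illustration (the epoch value and the +09:00 offset are arbitrary examples, not Spark code):

```python
from datetime import datetime, timezone, timedelta

# One epoch instant, rendered under two different zone assumptions.
instant = 1_500_000_000  # seconds since the epoch
as_utc = datetime.fromtimestamp(instant, tz=timezone.utc)
as_jst = as_utc.astimezone(timezone(timedelta(hours=9)))  # hypothetical JVM default

# Both objects denote the same instant, but any code that drops the zone
# and keeps only the wall clock will return different values for each.
```

So two sides that agree on the instant can still disagree on the returned timestamp value unless they also agree on the timezone.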
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19475
Jenkins, retest this please.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r144250165
--- Diff: python/pyspark/sql/types.py ---
@@ -1619,11 +1619,47 @@ def to_arrow_type(dt):
arrow_type = pa.decimal(dt.precision, dt.scale
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18664#discussion_r144248880
--- Diff: python/pyspark/sql/types.py ---
@@ -1619,11 +1619,47 @@ def to_arrow_type(dt):
arrow_type = pa.decimal(dt.precision, dt.scale
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144848936
--- Diff: python/pyspark/sql/functions.py ---
@@ -2121,33 +2127,35 @@ def wrapper(*args):
wrapper.func = self.func
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r144852099
--- Diff: python/pyspark/sql/functions.py ---
@@ -2121,33 +2127,35 @@ def wrapper(*args):
wrapper.func = self.func
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19505#discussion_r145027904
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -137,11 +137,15 @@ object ExtractPythonUDFs extends
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r145032365
--- Diff: python/pyspark/sql/session.py ---
@@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
data = [schema.toInternal(row
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19459#discussion_r145034007
--- Diff: python/pyspark/sql/session.py ---
@@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema):
data = [schema.toInternal(row
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19491
[SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.
## What changes were proposed in this pull request?
When fixing schema field names using escape characters
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19147
@felixcheung Thank you for your comment.
We already support data types in string form. I'll add a test to confirm it.
As for the decorator name, we can use a more generic decorator name
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r137507341
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala
---
@@ -0,0 +1,329 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r137507456
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala
---
@@ -0,0 +1,329 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19085#discussion_r136092600
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala
---
@@ -35,6 +35,16 @@ import org.apache.spark.rdd.RDD
/** Utilities
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19085#discussion_r136093307
--- Diff: python/pyspark/sql/tests.py ---
@@ -2480,6 +2480,11 @@ def assertCollectSuccess(typecode, value):
a = array.array(t
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18787
LGTM, pending Jenkins.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r138003254
--- Diff: python/pyspark/sql/tests.py ---
@@ -3122,6 +3124,147 @@ def test_filtered_frame(self):
self.assertTrue(pdf.empty
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r138005735
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala
---
@@ -0,0 +1,329 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r138012166
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala
---
@@ -62,6 +62,7 @@ import org.apache.spark.util.Utils
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19147#discussion_r138003592
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala
---
@@ -62,6 +62,7 @@ import org.apache.spark.util.Utils
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19147
[WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Python
## What changes were proposed in this pull request?
This pr introduces vectorized UDFs in Python.
Note that this pr should
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19158
[SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop
SparkContext.
## What changes were proposed in this pull request?
`pyspark.sql.tests.SQLTests2` doesn't stop newly
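The fix being described (the excerpt is truncated) is the standard unittest teardown pattern: stop the context the test created so it doesn't leak into later tests. A sketch with a stand-in object, since real pyspark is not assumed here:

```python
import unittest

class StubContext:
    """Stand-in for SparkContext in this sketch."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

class SQLTests2Sketch(unittest.TestCase):
    def setUp(self):
        # The real test creates a new SparkContext here.
        self.sc = StubContext()

    def tearDown(self):
        # The fix: always stop the context, even if the test failed.
        self.sc.stop()

    def test_something(self):
        self.assertFalse(self.sc.stopped)
```

Without the `tearDown`, a context created in `setUp` would outlive the test case and interfere with any later suite that tries to create its own.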
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19147
The test failure above should be fixed by #19158.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18659
@BryanCutler I sent a pr to your repository:
https://github.com/BryanCutler/spark/pull/26. Could you take a look at it,
please?
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19158
Thanks for reviewing! Merging to master/2.2/2.1/2.0.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19147
Jenkins, retest this please.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19325
LGTM, pending Jenkins.
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/19349
[SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF.
## What changes were proposed in this pull request?
Currently we use Arrow File format to communicate with Python
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141788690
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
---
@@ -44,14 +44,17 @@ case class ArrowEvalPythonExec
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141803015
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala
---
@@ -37,6 +37,9 @@ object AttributeSet
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141788272
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141803787
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
---
@@ -24,9 +24,9 @@ import
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141788365
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141804070
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -47,8 +47,8 @@ import org.apache.spark.sql.types.StructType
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141803992
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
---
@@ -519,3 +519,18 @@ case class CoGroup
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141807573
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141804329
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[sql
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r141829344
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
---
@@ -44,14 +44,17 @@ case class ArrowEvalPythonExec
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142057939
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
---
@@ -44,14 +66,24 @@ case class ArrowEvalPythonExec
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142055482
--- Diff: python/pyspark/sql/functions.py ---
@@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()):
@since(2.3)
def pandas_udf(f=None
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142055226
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142066688
--- Diff: python/pyspark/worker.py ---
@@ -32,8 +32,9 @@
from pyspark.serializers import write_with_length, write_int, read_long, \
write_long
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19349
The performance test I ran locally, based on @BryanCutler's
(https://github.com/apache/spark/pull/18659#issuecomment-315879173), is as
follows:
```python
from pyspark.sql.functions
```
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19349
cc @BryanCutler @HyukjinKwon @viirya @cloud-fan
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141250242
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala
---
@@ -0,0 +1,197 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141250227
--- Diff: python/pyspark/serializers.py ---
@@ -211,33 +212,37 @@ def __repr__(self):
return "ArrowSerializer"
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141250251
--- Diff: python/pyspark/serializers.py ---
@@ -251,6 +256,36 @@ def __repr__(self):
return "ArrowPandasSerializer"
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141250303
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -0,0 +1,429 @@
+/*
+ * Licensed to the Apache Software
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141250276
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141256430
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala
---
@@ -0,0 +1,197 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19349#discussion_r141257653
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala
---
@@ -0,0 +1,197 @@
+/*
+ * Licensed
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141365149
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -718,62 +705,69 @@ class ColumnarBatchSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141365196
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -718,62 +705,69 @@ class ColumnarBatchSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19367#discussion_r141360887
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
---
@@ -25,19 +25,25 @@ import
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18958
Jenkins, retest this please.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18958#discussion_r134393145
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java
---
@@ -307,64 +293,70 @@ public void update(int ordinal
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18974
Do we need to upgrade pyarrow in the Jenkins environment?
LGTM except for that.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r135439793
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r135439372
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r135438683
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
---
@@ -111,6 +125,66 @@ private[sql] object ArrowConverters
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r135439857
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18787#discussion_r135439310
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18945#discussion_r134925269
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1762,7 +1762,7 @@ def toPandas(self):
else:
--- End diff --
If we use
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19027
LGTM.
Btw, I'm just curious why we need tests with `numpy` here.