[GitHub] spark issue #19331: [SPARK-22109][SQL] Resolves type conflicts between strin...

2017-09-23 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19331 Thanks! merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDFs

2017-09-20 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18659 @BryanCutler Hmm, I'm not exactly sure the reason why it doesn't work (or mine works) but we can use `fillna(0)` before casting like: ``` pa.Array.from_pandas(s.fillna(0).astype

[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-17 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19249 A late LGTM. Btw, can we use the same idea for `MapType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-17 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19234 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDFs

2017-09-18 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18659 @BryanCutler I think it's okay to rename `size` to `length` (or longer name to avoid name-conflict like `_length_

[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...

2017-09-21 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r140396700 --- Diff: python/pyspark/serializers.py --- @@ -199,6 +211,55 @@ def __repr__(self): return "ArrowSerializer"

[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...

2017-09-18 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139579800 --- Diff: python/pyspark/worker.py --- @@ -71,7 +73,19 @@ def wrap_udf(f, return_type): return lambda *a: f(*a) -def

[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...

2017-09-18 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139580569 --- Diff: python/pyspark/worker.py --- @@ -71,7 +73,19 @@ def wrap_udf(f, return_type): return lambda *a: f(*a) -def

[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...

2017-09-18 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r139583530 --- Diff: python/pyspark/sql/tests.py --- @@ -3122,6 +3122,185 @@ def test_filtered_frame(self): self.assertTrue(pdf.empty

[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDFs

2017-09-18 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18659 @BryanCutler I'm ok to upgrade pyarrow to 0.7 except for the same concerns as #18974. I guess we need to discuss upgrade policy and strategy of pyarrow

[GitHub] spark pull request #19246: [SPARK-22025][PySpark] Speeding up fromInternal f...

2017-09-19 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19246#discussion_r139871986 --- Diff: python/pyspark/sql/types.py --- @@ -410,6 +410,24 @@ def __init__(self, name, dataType, nullable=True, metadata=None): self.dataType

[GitHub] spark pull request #18754: [WIP][SPARK-21552][SQL] Add DecimalType support t...

2017-09-19 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18754#discussion_r139872489 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -224,6 +226,25 @@ private[arrow] class DoubleWriter(val

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-04 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142841543 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,67 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-04 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142592915 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-04 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142610856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-04 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142720877 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-10 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18664 I'd say I prefer 1, too. I'm just wondering what if we use timestamp in nested types. Currently we don't support nested types but in the future

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-16 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18732 I submitted a pr #19505 to introduce `@pandas_grouped_udf` instead of reusing `@pandas_udf`. --- - To unsubscribe, e-mail

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144768652 --- Diff: python/pyspark/sql/functions.py --- @@ -2044,7 +2044,7 @@ class UserDefinedFunction(object): .. versionadded:: 1.3

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-15 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18732 I'm +0 for now. I'm just wondering whether we can support struct types in vectorized UDF when needed in the future. As for adding pandas UDAF, I think we need another decorator

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19505 [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf ## What changes were proposed in this pull request? This is a follow-up of #18732. This pr introduces

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144780130 --- Diff: python/pyspark/sql/functions.py --- @@ -2192,67 +2195,82 @@ def pandas_udf(f=None, returnType=StringType()): :param f: user-defined

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144827985 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144829187 --- Diff: python/pyspark/sql/types.py --- @@ -1624,6 +1624,50 @@ def to_arrow_type(dt): return arrow_type +def to_arrow_schema

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r144828405 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144924728 --- Diff: python/pyspark/sql/functions.py --- @@ -2121,33 +2127,40 @@ def wrapper(*args): wrapper.func = self.func

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144926703 --- Diff: python/pyspark/sql/functions.py --- @@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()): :param f: user-defined

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144929491 --- Diff: python/pyspark/sql/functions.py --- @@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()): :param f: user-defined

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144924765 --- Diff: python/pyspark/sql/functions.py --- @@ -2038,13 +2038,19 @@ def _wrap_function(sc, func, returnType

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144926010 --- Diff: python/pyspark/sql/functions.py --- @@ -2192,67 +2205,82 @@ def pandas_udf(f=None, returnType=StringType()): :param f: user-defined

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-17 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18664 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r145037361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -55,6 +55,12 @@ object ArrowWriter { case

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r145036404 --- Diff: python/pyspark/sql/types.py --- @@ -1619,11 +1619,38 @@ def to_arrow_type(dt): arrow_type = pa.decimal(dt.precision, dt.scale

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144859680 --- Diff: python/pyspark/sql/functions.py --- @@ -2121,33 +2127,35 @@ def wrapper(*args): wrapper.func = self.func

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144859965 --- Diff: python/pyspark/sql/functions.py --- @@ -2121,33 +2127,35 @@ def wrapper(*args): wrapper.func = self.func

[GitHub] spark pull request #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-17 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19517 [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf ## What changes were proposed in this pull request? This is a follow-up of #18732. This pr modifies

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-11 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18664 I disagree with using `DateTimeUtils.defaultTimeZone()` for the timezone. If `DateTimeUtils.defaultTimeZone()` is different from system timezone in Python, the return values are different between

[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

2017-10-12 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19475 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-10-12 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r144250165 --- Diff: python/pyspark/sql/types.py --- @@ -1619,11 +1619,47 @@ def to_arrow_type(dt): arrow_type = pa.decimal(dt.precision, dt.scale

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-10-12 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r144248880 --- Diff: python/pyspark/sql/types.py --- @@ -1619,11 +1619,47 @@ def to_arrow_type(dt): arrow_type = pa.decimal(dt.precision, dt.scale

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144848936 --- Diff: python/pyspark/sql/functions.py --- @@ -2121,33 +2127,35 @@ def wrapper(*args): wrapper.func = self.func

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r144852099 --- Diff: python/pyspark/sql/functions.py --- @@ -2121,33 +2127,35 @@ def wrapper(*args): wrapper.func = self.func

[GitHub] spark pull request #19505: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19505#discussion_r145027904 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -137,11 +137,15 @@ object ExtractPythonUDFs extends

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145032365 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r145034007 --- Diff: python/pyspark/sql/session.py --- @@ -414,6 +415,43 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row

[GitHub] spark pull request #19491: [SPARK-22273][SQL] Fix key/value schema field nam...

2017-10-13 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19491 [SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators. ## What changes were proposed in this pull request? When fixing schema field names using escape characters

[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19147 @felixcheung Thank you for your comment. We already support data type in string form. I'll add a test to confirm it. As for decorator name, we can have a more generic decorator name

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-07 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r137507341 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-07 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r137507456 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed

[GitHub] spark pull request #19085: [SPARK-21534][SQL][PySpark] PickleException when ...

2017-08-30 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19085#discussion_r136092600 --- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala --- @@ -35,6 +35,16 @@ import org.apache.spark.rdd.RDD /** Utilities

[GitHub] spark pull request #19085: [SPARK-21534][SQL][PySpark] PickleException when ...

2017-08-30 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19085#discussion_r136093307 --- Diff: python/pyspark/sql/tests.py --- @@ -2480,6 +2480,11 @@ def assertCollectSuccess(typecode, value): a = array.array(t

[GitHub] spark issue #18787: [SPARK-21583][SQL] Create a ColumnarBatch from ArrowColu...

2017-08-30 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18787 LGTM, pending Jenkins. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r138003254 --- Diff: python/pyspark/sql/tests.py --- @@ -3122,6 +3124,147 @@ def test_filtered_frame(self): self.assertTrue(pdf.empty

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r138005735 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r138012166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -62,6 +62,7 @@ import org.apache.spark.util.Utils

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r138003592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -62,6 +62,7 @@ import org.apache.spark.util.Utils

[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-06 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19147 [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Python ## What changes were proposed in this pull request? This pr introduces vectorized UDFs in Python. Note that this pr should

[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...

2017-09-07 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19158 [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop SparkContext. ## What changes were proposed in this pull request? `pyspark.sql.tests.SQLTests2` doesn't stop newly

[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19147 The test failure above should be fixed by #19158. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-09-12 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18659 @BryanCutler I sent a pr to your repository https://github.com/BryanCutler/spark/pull/26. Could you take a look at it please

[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19158 Thanks for reviewing! merging to master/2.2/2.1/2.0 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19147 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19325: [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_u...

2017-09-25 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19325 LGTM, pending Jenkins. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-25 Thread ueshin
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19349 [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format for vectorized UDF. ## What changes were proposed in this pull request? Currently we use Arrow File format to communicate with Python

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788690 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803015 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala --- @@ -37,6 +37,9 @@ object AttributeSet

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788272 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -24,9 +24,9 @@ import

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141788365 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141804070 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -47,8 +47,8 @@ import org.apache.spark.sql.types.StructType

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141803992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,18 @@ case class CoGroup

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141807573 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141804329 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,29 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r141829344 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +44,17 @@ case class ArrowEvalPythonExec

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142057939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +66,24 @@ case class ArrowEvalPythonExec

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142055482 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,31 +2186,69 @@ def udf(f=None, returnType=StringType()): @since(2.3) def pandas_udf(f=None

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-01 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142055226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-02 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142066688 --- Diff: python/pyspark/worker.py --- @@ -32,8 +32,9 @@ from pyspark.serializers import write_with_length, write_int, read_long, \ write_long

[GitHub] spark issue #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format f...

2017-09-26 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19349 The performance test I did in my local based on @BryanCutler's (https://github.com/apache/spark/pull/18659#issuecomment-315879173) is as follows: ```python from pyspark.sql.functions

[GitHub] spark issue #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream format f...

2017-09-25 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19349 cc @BryanCutler @HyukjinKwon @viirya @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141250242 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141250227 --- Diff: python/pyspark/serializers.py --- @@ -211,33 +212,37 @@ def __repr__(self): return "ArrowSerializer"

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141250251 --- Diff: python/pyspark/serializers.py --- @@ -251,6 +256,36 @@ def __repr__(self): return "ArrowPandasSerializer"

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141250303 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -0,0 +1,429 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141250276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141256430 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed

[GitHub] spark pull request #19349: [SPARK-22125][PYSPARK][SQL] Enable Arrow Stream f...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19349#discussion_r141257653 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141365149 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -718,62 +705,69 @@ class ColumnarBatchSuite

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141365196 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -718,62 +705,69 @@ class ColumnarBatchSuite

[GitHub] spark pull request #19367: [SPARK-22143][SQL] Fix memory leak in OffHeapColu...

2017-09-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19367#discussion_r141360887 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala --- @@ -25,19 +25,25 @@ import

[GitHub] spark issue #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarchy to ma...

2017-08-21 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18958 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...

2017-08-22 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r134393145 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --- @@ -307,64 +293,70 @@ public void update(int ordinal

[GitHub] spark issue #18974: [SPARK-21750][SQL] Use Arrow 0.6.0

2017-08-22 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18974 Do we need to upgrade pyarrow in Jenkins environment? LGTM except for it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r135439793 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -1629,6 +1632,39 @@ class ArrowConvertersSuite

[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r135439372 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1261,4 +1264,55 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r135438683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -111,6 +125,66 @@ private[sql] object ArrowConverters

[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r135439857 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -1629,6 +1632,39 @@ class ArrowConvertersSuite

[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r135439310 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1261,4 +1264,55 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18945: Add option to convert nullable int columns to flo...

2017-08-23 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18945#discussion_r134925269 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1762,7 +1762,7 @@ def toPandas(self): else: --- End diff -- If we use

[GitHub] spark issue #19027: [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as...

2017-08-23 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19027 LGTM. Btw, I'm just curious why we need tests with `numpy` here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

<    5   6   7   8   9   10   11   12   13   14   >