[jira] [Resolved] (SPARK-25801) pandas_udf grouped_map fails with input dataframe with more than 255 columns

2018-10-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-25801. -- Resolution: Fixed Fix Version/s: 2.4.0 > pandas_udf grouped_map fails with input

[jira] [Updated] (SPARK-22809) pyspark is sensitive to imports with dots

2018-10-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22809: - Fix Version/s: 2.3.2 > pyspark is sensitive to imports with dots >

[jira] [Commented] (SPARK-22809) pyspark is sensitive to imports with dots

2018-10-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661502#comment-16661502 ] Bryan Cutler commented on SPARK-22809: -- Sure, I probably shouldn't have tested out of the branches.

[jira] [Comment Edited] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5

2018-11-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675866#comment-16675866 ] Bryan Cutler edited comment on SPARK-25079 at 11/5/18 11:09 PM: Sounds

[jira] [Commented] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5

2018-11-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675866#comment-16675866 ] Bryan Cutler commented on SPARK-25079: -- Sounds like a good plan [~shaneknapp]!  The instances of

[jira] [Comment Edited] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5

2018-11-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675866#comment-16675866 ] Bryan Cutler edited comment on SPARK-25079 at 11/5/18 11:08 PM: Sounds

[jira] [Commented] (SPARK-25344) Break large PySpark unittests into smaller files

2018-11-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685613#comment-16685613 ] Bryan Cutler commented on SPARK-25344: -- [~hyukjin.kwon] no problem, I can take on ML and MLlib >

[jira] [Commented] (SPARK-25344) Break large tests.py files into smaller files

2018-10-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643762#comment-16643762 ] Bryan Cutler commented on SPARK-25344: -- No, I don't have strong feelings; my only preference was to

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635762#comment-16635762 ] Bryan Cutler commented on SPARK-25461: -- Thanks for looking into this [~viirya]! You are right that

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637626#comment-16637626 ] Bryan Cutler commented on SPARK-25461: -- I filed ARROW-3428, which deals with the incorrect cast from

[jira] [Comment Edited] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637626#comment-16637626 ] Bryan Cutler edited comment on SPARK-25461 at 10/3/18 11:53 PM: I filed

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642379#comment-16642379 ] Bryan Cutler commented on SPARK-25461: -- Just wanted to add that the resolution here added a note

[jira] [Assigned] (SPARK-25471) Fix tests for Python 3.6 with Pandas 0.23+

2018-09-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-25471: Assignee: (was: Bryan Cutler) > Fix tests for Python 3.6 with Pandas 0.23+ >

[jira] [Created] (SPARK-25471) Fix tests for Python 3.6 with Pandas 0.23+

2018-09-19 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-25471: Summary: Fix tests for Python 3.6 with Pandas 0.23+ Key: SPARK-25471 URL: https://issues.apache.org/jira/browse/SPARK-25471 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25432) Consider if using standard getOrCreate from PySpark into JVM SparkSession would simplify code

2018-09-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626148#comment-16626148 ] Bryan Cutler commented on SPARK-25432: -- moved description :) > Consider if using standard

[jira] [Updated] (SPARK-25432) Consider if using standard getOrCreate from PySpark into JVM SparkSession would simplify code

2018-09-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-25432: - Description: As we saw in [https://github.com/apache/spark/pull/22295/files] the logic can get

[jira] [Updated] (SPARK-25432) Consider if using standard getOrCreate from PySpark into JVM SparkSession would simplify code

2018-09-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-25432: - Environment: (was: As we saw in [https://github.com/apache/spark/pull/22295/files] the

[jira] [Commented] (SPARK-25351) Handle Pandas category type when converting from Python with Arrow

2018-09-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629572#comment-16629572 ] Bryan Cutler commented on SPARK-25351: -- Hi [~pgadige], yes please go ahead with this issue! When

[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744288#comment-16744288 ] Bryan Cutler commented on SPARK-26591: -- [~elch10] please go ahead and make a Jira for Arrow

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-16 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Description: _This is just a placeholder for now to collect what needs to be fixed when we

[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742517#comment-16742517 ] Bryan Cutler commented on SPARK-26591: -- I created the same virtual environment and could not

[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743460#comment-16743460 ] Bryan Cutler commented on SPARK-26591: -- [~elch10] this seems like it is more an Arrow issue with

[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743463#comment-16743463 ] Bryan Cutler commented on SPARK-26591: -- And yes, you could build pyarrow yourself, but it shouldn't

[jira] [Resolved] (SPARK-26676) Make HiveContextSQLTests.test_unbounded_frames test compatible with Python 2 and PyPy

2019-01-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-26676. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23604

[jira] [Assigned] (SPARK-26676) Make HiveContextSQLTests.test_unbounded_frames test compatible with Python 2 and PyPy

2019-01-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-26676: Assignee: Hyukjin Kwon > Make HiveContextSQLTests.test_unbounded_frames test compatible

[jira] [Commented] (SPARK-26315) auto cast threshold from Integer to Float in approxSimilarityJoin of BucketedRandomProjectionLSHModel

2018-12-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717873#comment-16717873 ] Bryan Cutler commented on SPARK-26315: -- I believe {{def approxSimilarityJoin(...)}} in LSHModel in
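The summary asks for an integer `threshold` to be accepted by the Python `approxSimilarityJoin`. A minimal sketch of the kind of coercion this implies, assuming the JVM side expects a double; `coerce_threshold` is a hypothetical helper, not PySpark's actual code:

```python
from numbers import Real


def coerce_threshold(threshold):
    """Hypothetical helper: normalize a distance threshold to a Python float.

    Assumption: the JVM method expects a Double, so an int such as 1
    is converted to 1.0 before it is handed over.
    """
    # bool is a subclass of int; reject it so True does not become 1.0
    if isinstance(threshold, bool) or not isinstance(threshold, Real):
        raise TypeError("threshold must be a real number, got %r" % (threshold,))
    return float(threshold)


print(coerce_threshold(1))    # 1.0
print(coerce_threshold(2.5))  # 2.5
```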

[jira] [Comment Edited] (SPARK-26200) Column values are incorrectly transposed when a field in a PySpark Row requires serialization

2018-11-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704092#comment-16704092 ] Bryan Cutler edited comment on SPARK-26200 at 11/30/18 12:56 AM: - I

[jira] [Commented] (SPARK-26200) Column values are incorrectly transposed when a field in a PySpark Row requires serialization

2018-11-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704092#comment-16704092 ] Bryan Cutler commented on SPARK-26200: -- I think this is a duplicate of

[jira] [Resolved] (SPARK-26200) Column values are incorrectly transposed when a field in a PySpark Row requires serialization

2018-11-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-26200. -- Resolution: Duplicate > Column values are incorrectly transposed when a field in a PySpark
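One way values can end up transposed, sketched in plain Python under stated assumptions: in Spark 2.x, `Row(**kwargs)` sorts its fields by name, so a row matched to a schema by position rather than by name swaps values whenever the schema order differs from alphabetical order. `make_sorted_row` and the two conversion helpers are hypothetical stand-ins, not `pyspark.sql.Row` or Spark's serializer:

```python
from collections import namedtuple


def make_sorted_row(**kwargs):
    """Simulate a Spark 2.x-style Row(**kwargs): fields sorted by name.

    Illustration only; this is not pyspark.sql.Row.
    """
    names = sorted(kwargs)
    RowType = namedtuple("Row", names)
    return RowType(**{n: kwargs[n] for n in names})


def to_tuple_by_position(row, _schema_names):
    # Ignores field names: values come out in the row's own (sorted) order.
    return tuple(row)


def to_tuple_by_name(row, schema_names):
    # Looks up each schema field by name, so ordering differences are harmless.
    return tuple(getattr(row, n) for n in schema_names)


schema_names = ["name", "age"]               # order declared by the user
row = make_sorted_row(name="Alice", age=30)  # stored as (age, name)

print(to_tuple_by_position(row, schema_names))  # (30, 'Alice')  <- transposed
print(to_tuple_by_name(row, schema_names))      # ('Alice', 30)
```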

[jira] [Assigned] (SPARK-24333) Add fit with validation set to spark.ml GBT: Python API

2018-12-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-24333: Assignee: Huaxin Gao > Add fit with validation set to spark.ml GBT: Python API >

[jira] [Resolved] (SPARK-24333) Add fit with validation set to spark.ml GBT: Python API

2018-12-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-24333. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 21465

[jira] [Assigned] (SPARK-25274) Improve toPandas with Arrow by sending out-of-order record batches

2018-12-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-25274: Assignee: Bryan Cutler > Improve toPandas with Arrow by sending out-of-order record

[jira] [Resolved] (SPARK-25274) Improve toPandas with Arrow by sending out-of-order record batches

2018-12-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-25274. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22275
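A plain-Python sketch of the reordering idea in the summary, assuming each partition's batches are forwarded as soon as they are ready and the original order is reported once all of them have arrived. `reorder_batches` is a hypothetical helper, and strings stand in for Arrow record batches:

```python
def reorder_batches(received, batch_order):
    """Reassemble batches that arrived out of order.

    received:    batches in the order they were received
    batch_order: for each output position, the index into `received`
                 (assumed to be sent after all batches have been sent)
    """
    if sorted(batch_order) != list(range(len(received))):
        raise ValueError("batch_order must be a permutation of received indices")
    return [received[i] for i in batch_order]


# Partitions 0..2 finished in the order 2, 0, 1.
received = ["p2-batch", "p0-batch", "p1-batch"]
batch_order = [1, 2, 0]  # received[1] belongs to partition 0, and so on
print(reorder_batches(received, batch_order))
# ['p0-batch', 'p1-batch', 'p2-batch']
```

The appeal of this design is that finished batches never have to be held back while an earlier partition is still running; only the small index list is deferred to the end.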

[jira] [Created] (SPARK-26573) Python worker not reused with mapPartitions if not consuming iterator

2019-01-08 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-26573: Summary: Python worker not reused with mapPartitions if not consuming iterator Key: SPARK-26573 URL: https://issues.apache.org/jira/browse/SPARK-26573 Project: Spark
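To illustrate what "not consuming iterator" means, here is a plain-Python sketch of a `mapPartitions`-style function that stops early, next to a hypothetical variant that drains the rest of its input before returning. Whether draining on the user side is enough to let the worker be reused is an assumption, not something stated in this issue:

```python
from collections import deque
from itertools import islice


def first_n(iterator, n=2):
    """A mapPartitions-style function that stops early, leaving input unread."""
    return list(islice(iterator, n))


def first_n_draining(iterator, n=2):
    """Same result, but exhausts the remaining input before returning.

    Hypothetical workaround: the caller then sees the input fully consumed.
    """
    result = list(islice(iterator, n))
    deque(iterator, maxlen=0)  # drain whatever is left without storing it
    return result


partition = iter(range(5))
print(first_n(partition))           # [0, 1]
print(list(partition))              # [2, 3, 4]  <- left unconsumed

partition = iter(range(5))
print(first_n_draining(partition))  # [0, 1]
print(list(partition))              # []
```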

[jira] [Assigned] (SPARK-26349) Pyspark should not accept insecure p4yj gateways

2019-01-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-26349: Assignee: Imran Rashid > Pyspark should not accept insecure p4yj gateways >

[jira] [Resolved] (SPARK-26349) Pyspark should not accept insecure p4yj gateways

2019-01-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-26349. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23441

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Target Version/s: (was: 2.4.0) > Upgrade apache/arrow to 0.12.0 >

[jira] [Created] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-26566: Summary: Upgrade apache/arrow to 0.12.0 Key: SPARK-26566 URL: https://issues.apache.org/jira/browse/SPARK-26566 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736540#comment-16736540 ] Bryan Cutler commented on SPARK-26566: -- Version 0.12.0 is slated to be released in mid January >

[jira] [Resolved] (SPARK-25272) Show some kind of test output to indicate pyarrow tests were run

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-25272. -- Resolution: Won't Fix > Show some kind of test output to indicate pyarrow tests were run >

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Fix Version/s: (was: 2.4.0) > Upgrade apache/arrow to 0.12.0 >

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Affects Version/s: (was: 2.3.0) 2.4.0 > Upgrade apache/arrow to

[jira] [Assigned] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-26566: Assignee: (was: Bryan Cutler) > Upgrade apache/arrow to 0.12.0 >

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Description: _This is just a placeholder for now to collect what needs to be fixed when we

[jira] [Commented] (SPARK-26591) illegal hardware instruction

2019-01-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740571#comment-16740571 ] Bryan Cutler commented on SPARK-26591: -- Could you share some details of your pyarrow installation -

[jira] [Commented] (SPARK-25344) Break large tests.py files into smaller files

2018-09-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614176#comment-16614176 ] Bryan Cutler commented on SPARK-25344: -- From the mailing list I think we should agree on a few

[jira] [Commented] (SPARK-26200) Column values are incorrectly transposed when a field in a PySpark Row requires serialization

2018-11-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705224#comment-16705224 ] Bryan Cutler commented on SPARK-26200: -- Thanks [~davidlyness], I'll mark this as a duplicate since

[jira] [Commented] (SPARK-26412) Allow Pandas UDF to take an iterator of pd.DataFrames for the entire partition

2019-01-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751785#comment-16751785 ] Bryan Cutler commented on SPARK-26412: -- [~mengxr] I think Arrow record batches would be a much more
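The summary asks for a Pandas UDF that receives an iterator over a whole partition. The usual motivation is doing expensive setup, such as loading a model, once per partition instead of once per batch. A plain-Python sketch of that pattern, with lists standing in for `pd.DataFrame` batches and a hypothetical `load_model` as the setup step (this is not a PySpark API):

```python
SETUP_CALLS = []  # records how many times the setup step ran


def load_model():
    """Hypothetical expensive setup (e.g. loading model weights)."""
    SETUP_CALLS.append("load")
    return lambda x: x * 2


def predict_partition(batches):
    """Consume an iterator of batches and yield one output batch per input batch."""
    model = load_model()  # runs once per partition, not once per batch
    for batch in batches:
        yield [model(x) for x in batch]


batches = iter([[1, 2], [3], [4, 5, 6]])
print(list(predict_partition(batches)))  # [[2, 4], [6], [8, 10, 12]]
print(len(SETUP_CALLS))                  # 1
```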

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2019-01-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751771#comment-16751771 ] Bryan Cutler commented on SPARK-26410: -- This could be useful to have, but it does seem a little

[jira] [Commented] (SPARK-24579) SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2019-01-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751802#comment-16751802 ] Bryan Cutler commented on SPARK-24579: -- It would be great to start up this discussion again; I saw

[jira] [Commented] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801250#comment-16801250 ] Bryan Cutler commented on SPARK-27276: -- [~shaneknapp] this will need an upgrade on Jenkins, so let

[jira] [Created] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27276: Summary: Increase the minimum pyarrow version to 0.12.0 Key: SPARK-27276 URL: https://issues.apache.org/jira/browse/SPARK-27276 Project: Spark Issue Type:

[jira] [Updated] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27276: - Description: The current minimum version is 0.8.0, which is pretty ancient since Arrow has

[jira] [Updated] (SPARK-27276) Increase the minimum pyarrow version to 0.12.1

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27276: - Summary: Increase the minimum pyarrow version to 0.12.1 (was: Increase the minimum pyarrow

[jira] [Commented] (SPARK-27276) Increase the minimum pyarrow version to 0.12.1

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810024#comment-16810024 ] Bryan Cutler commented on SPARK-27276: -- I think we should use 0.12.1, there was a bug fix
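A minimum-version requirement like 0.12.1 comes down to comparing version strings numerically rather than lexically ("0.8.0" sorts after "0.12.1" as a string). A minimal sketch, assuming simple dotted numeric versions; `version_tuple` and `meets_minimum` are hypothetical helpers, not PySpark's own version check:

```python
import re

MINIMUM_PYARROW_VERSION = "0.12.1"  # value from the SPARK-27276 summary


def version_tuple(version):
    """Parse the leading numeric part of a version, e.g. '0.12.1' -> (0, 12, 1).

    Suffixes such as '.dev12' are ignored; this is a simplification,
    not full PEP 440 parsing.
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("unparseable version: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MINIMUM_PYARROW_VERSION):
    # Tuples compare element by element, so (0, 13, 0) > (0, 12, 1).
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.8.0"))   # False
print(meets_minimum("0.12.0"))  # False
print(meets_minimum("0.12.1"))  # True
print(meets_minimum("0.13.0"))  # True
```

In practice the installed string would come from `pyarrow.__version__`.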

[jira] [Commented] (SPARK-27353) PySpark Row __repr__ bug

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810009#comment-16810009 ] Bryan Cutler commented on SPARK-27353: -- Works for me out of master; can you provide a script to

[jira] [Created] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27387: Summary: Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests Key: SPARK-27387 URL: https://issues.apache.org/jira/browse/SPARK-27387 Project:

[jira] [Commented] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810197#comment-16810197 ] Bryan Cutler commented on SPARK-27387: -- This can be done after the upgrade of pyarrow version to

[jira] [Commented] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810200#comment-16810200 ] Bryan Cutler commented on SPARK-27387: -- I can work on this > Replace sqlutils assertPandasEqual

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811355#comment-16811355 ] Bryan Cutler commented on SPARK-27389: -- From the stacktrace, it looks like it's getting this from
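`UnknownTimeZoneError` is what pytz raises when a zone name is missing from the tz data it uses, so whether a legacy name such as `US/Pacific-New` resolves depends on the environment. A minimal sketch of checking this up front, using the standard-library `zoneinfo` module (Python 3.9+) instead of pytz; `is_known_timezone` is a hypothetical helper:

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError  # Python 3.9+


def is_known_timezone(name):
    """Return True if `name` resolves in this environment's tz database."""
    try:
        ZoneInfo(name)
    except (ZoneInfoNotFoundError, ValueError):
        # ValueError covers malformed keys; ZoneInfoNotFoundError covers missing ones
        return False
    return True


# Results depend on the installed tz data, so no fixed output is assumed here.
for name in ["America/Los_Angeles", "US/Pacific-New"]:
    print(name, is_known_timezone(name))
```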

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Summary: Setting random seed produces different results in RandomForestRegressor (was: I am

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Description: I am interested in finding out if there is a bug in the implementation of

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Component/s: ML > Setting random seed produces different results in RandomForestRegressor >

[jira] [Commented] (SPARK-27293) I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other peo

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804124#comment-16804124 ] Bryan Cutler commented on SPARK-27293: -- Setting the seed like in your example for randomSplit and

[jira] [Resolved] (SPARK-27240) Use pandas DataFrame for struct type argument in Scalar Pandas UDF.

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27240. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24177

[jira] [Assigned] (SPARK-27240) Use pandas DataFrame for struct type argument in Scalar Pandas UDF.

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27240: Assignee: Takuya Ueshin > Use pandas DataFrame for struct type argument in Scalar Pandas

[jira] [Commented] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2019-02-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777416#comment-16777416 ] Bryan Cutler commented on SPARK-23836: -- I can work on this > Support returning StructType to the

[jira] [Resolved] (SPARK-25147) GroupedData.apply pandas_udf crashing

2019-02-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-25147. -- Resolution: Cannot Reproduce Going to resolve this for now, please reopen if the above

[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780751#comment-16780751 ] Bryan Cutler commented on SPARK-26943: -- If you can try to reproduce locally, that would be ideal.

[jira] [Commented] (SPARK-26858) Vectorized gapplyCollect, Arrow optimization in native R function execution

2019-02-20 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773284#comment-16773284 ] Bryan Cutler commented on SPARK-26858: -- {quote} (One other possibility I was thinking about batches

[jira] [Commented] (SPARK-23961) pyspark toLocalIterator throws an exception

2019-03-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784843#comment-16784843 ] Bryan Cutler commented on SPARK-23961: -- I could also reproduce with a nearly identical error using

[jira] [Commented] (SPARK-27039) toPandas with Arrow swallows maxResultSize errors

2019-03-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783690#comment-16783690 ] Bryan Cutler commented on SPARK-27039: -- I was able to reproduce in v2.4.0, but it looks like

[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`

2019-02-21 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774566#comment-16774566 ] Bryan Cutler commented on SPARK-26943: -- Could you please provide a complete script to reproduce?

[jira] [Created] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality

2019-03-14 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27163: Summary: Cleanup and consolidate Pandas UDF functionality Key: SPARK-27163 URL: https://issues.apache.org/jira/browse/SPARK-27163 Project: Spark Issue Type:

[jira] [Updated] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality

2019-03-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27163: - Priority: Minor (was: Major) > Cleanup and consolidate Pandas UDF functionality >

[jira] [Resolved] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2019-03-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23836. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23900

[jira] [Assigned] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2019-03-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23836: Assignee: Bryan Cutler > Support returning StructType to the level support in GroupedMap

[jira] [Commented] (SPARK-26858) Vectorized gapplyCollect, Arrow optimization in native R function execution

2019-02-19 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772266#comment-16772266 ] Bryan Cutler commented on SPARK-26858: -- [~hyukjin.kwon] actually {{pyarrow.Table.from_batches}}

[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Description: Version 0.12.0 includes the following selected fixes/improvements relevant to

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813896#comment-16813896 ] Bryan Cutler commented on SPARK-27389: -- Thanks [~shaneknapp] for the fix. I couldn't come up with

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812877#comment-16812877 ] Bryan Cutler commented on SPARK-27389: -- [~shaneknapp], I had a couple of successful tests with

[jira] [Assigned] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27387: Assignee: Bryan Cutler > Replace sqlutils assertPandasEqual with Pandas

[jira] [Commented] (SPARK-27463) SPIP: Support Dataframe Cogroup via Pandas UDFs

2019-05-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842335#comment-16842335 ] Bryan Cutler commented on SPARK-27463: -- [~d80tb7] I think you could remove the SPIP label from this
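Cogrouping pairs, for every key, the group from one dataset with the group from the other, including keys that appear on only one side. A plain-Python sketch of those semantics, with lists of tuples standing in for the per-key pandas DataFrames a cogrouped Pandas UDF would presumably receive; `cogroup` is a hypothetical helper, not the Spark API:

```python
from collections import defaultdict


def cogroup(left, right, key):
    """Yield (key, left_group, right_group) for every key on either side.

    Keys present on only one side get an empty list for the other side.
    """
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for rec in left:
        left_groups[key(rec)].append(rec)
    for rec in right:
        right_groups[key(rec)].append(rec)
    for k in sorted(set(left_groups) | set(right_groups)):
        # .get() does not trigger default_factory, so missing keys yield []
        yield k, left_groups.get(k, []), right_groups.get(k, [])


left = [(1, "a"), (2, "b"), (1, "c")]
right = [(1, 10.0), (3, 30.0)]
for k, lgroup, rgroup in cogroup(left, right, key=lambda r: r[0]):
    print(k, lgroup, rgroup)
# 1 [(1, 'a'), (1, 'c')] [(1, 10.0)]
# 2 [(2, 'b')] []
# 3 [] [(3, 30.0)]
```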

[jira] [Resolved] (SPARK-27712) createDataFrame() reorders row

2019-05-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27712. -- Resolution: Duplicate > createDataFrame() reorders row >

[jira] [Updated] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27805: - Affects Version/s: (was: 3.1.0) 2.4.3 > toPandas does not propagate

[jira] [Resolved] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27805. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24677

[jira] [Assigned] (SPARK-27805) toPandas does not propagate SparkExceptions with arrow enabled

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27805: Assignee: David Vogelbacher > toPandas does not propagate SparkExceptions with arrow

[jira] [Comment Edited] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855969#comment-16855969 ] Bryan Cutler edited comment on SPARK-27939 at 6/4/19 6:13 PM: -- Linked to a

[jira] [Resolved] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27939. -- Resolution: Not A Problem > Defining a schema with VectorUDT

[jira] [Commented] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855969#comment-16855969 ] Bryan Cutler commented on SPARK-27939: -- Another problem with Python {{Row}} class > Defining a

[jira] [Comment Edited] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855966#comment-16855966 ] Bryan Cutler edited comment on SPARK-27939 at 6/4/19 6:11 PM: -- The problem

[jira] [Commented] (SPARK-27939) Defining a schema with VectorUDT

2019-06-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855966#comment-16855966 ] Bryan Cutler commented on SPARK-27939: -- The problem is the {{Row}} class sorts the field names
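The comment above is cut off in this archive. As a hedged illustration of the behavior it names: in PySpark 2.x, constructing a `Row` with keyword arguments sorts the field names alphabetically, so the positional order of the values may no longer match a schema declared in a different order. The sketch below does not import PySpark. `SortedRow` is a hypothetical stand-in that only mimics that sorting, and the field names `label` and `features` are illustrative assumptions, not taken from the issue.

```python
class SortedRow(tuple):
    """Hypothetical stand-in for PySpark 2.x Row(**kwargs): field names are sorted."""

    def __new__(cls, **kwargs):
        names = sorted(kwargs)  # alphabetical, regardless of argument order
        row = super().__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names  # tuple subclasses get a __dict__, so this is allowed
        return row


# Assumed schema order: a double label followed by a vector-like features column.
schema_fields = ["label", "features"]

row = SortedRow(label=1.0, features=[0.5, 2.0])

# Pairing the row's values with the schema by position mismatches them.
paired = list(zip(schema_fields, row))
print(row.__fields__)  # ['features', 'label']
print(paired)          # [('label', [0.5, 2.0]), ('features', 1.0)]
```

In PySpark 2.x, declaring the fields positionally, e.g. `Row("label", "features")(1.0, vec)`, keeps the declared order instead of sorting it.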

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Description: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a Spark job

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Environment: (was: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a

[jira] [Created] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27992: Summary: PySpark socket server should sync with JVM connection thread future Key: SPARK-27992 URL: https://issues.apache.org/jira/browse/SPARK-27992 Project: Spark

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-10 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Affects Version/s: (was: 2.4.3) 3.0.0 > PySpark socket server should

[jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27992: - Description: Both SPARK-27805 and SPARK-27548 identified an issue that errors in a Spark job

[jira] [Resolved] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-28003. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24844

[jira] [Assigned] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-24 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-28003: Assignee: Li Jin > spark.createDataFrame with Arrow doesn't work with pandas.NaT
