[jira] [Created] (SPARK-21187) Complete support for remaining Spark data type in Arrow Converters

2017-06-22 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-21187: Summary: Complete support for remaining Spark data type in Arrow Converters Key: SPARK-21187 URL: https://issues.apache.org/jira/browse/SPARK-21187 Project: Spark

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2017-06-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060413#comment-16060413 ] Bryan Cutler commented on SPARK-13534: -- That is correct [~rxin], this did not have s

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-06-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Summary: Complete support for remaining Spark data types in Arrow Converters (was: Complete supp

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-06-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-06-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-06-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters. Currently,

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-06-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060510#comment-16060510 ] Bryan Cutler commented on SPARK-21187: -- Pandas only supports flat columns, I'm not s

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2017-06-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070475#comment-16070475 ] Bryan Cutler commented on SPARK-13534: -- Hi [~jaise...@gmail.com], the DataFrameWrite

[jira] [Comment Edited] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2017-06-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070475#comment-16070475 ] Bryan Cutler edited comment on SPARK-13534 at 6/30/17 6:05 PM:

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077390#comment-16077390 ] Bryan Cutler commented on SPARK-21190: -- This is a great discussion so far and I wou

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078837#comment-16078837 ] Bryan Cutler commented on SPARK-21190: -- [~rxin] I was talking about 2 different thin

[jira] [Created] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-11 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-21375: Summary: Add date and timestamp support to ArrowConverters for toPandas() collection Key: SPARK-21375 URL: https://issues.apache.org/jira/browse/SPARK-21375 Project:

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082735#comment-16082735 ] Bryan Cutler commented on SPARK-21375: -- I'm working on this > Add date and timestam

[jira] [Updated] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21375: - Description: Date and timestamp are not yet supported in DataFrame.toPandas() using ArrowConvert

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083018#comment-16083018 ] Bryan Cutler commented on SPARK-21190: -- [~cloud_fan] yes, I know not every function

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2017-07-12 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084390#comment-16084390 ] Bryan Cutler commented on SPARK-13534: -- Hi [~tagar], the {{ArrowSerializer}} doesn't

[jira] [Created] (SPARK-21404) Simple Vectorized Python UDFs using Arrow

2017-07-13 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-21404: Summary: Simple Vectorized Python UDFs using Arrow Key: SPARK-21404 URL: https://issues.apache.org/jira/browse/SPARK-21404 Project: Spark Issue Type: Improve

[jira] [Commented] (SPARK-21404) Simple Vectorized Python UDFs using Arrow

2017-07-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086010#comment-16086010 ] Bryan Cutler commented on SPARK-21404: -- I'll submit the work I've done so far as a W

[jira] [Updated] (SPARK-21404) Simple Vectorized Python UDFs using Arrow

2017-07-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21404: - Description: Using Arrow, Python UDFs can be evaluated in vectorized form by using the column da

[jira] [Updated] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21375: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-21187 > Add date and timestamp sup

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-07-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090624#comment-16090624 ] Bryan Cutler commented on SPARK-21187: -- NOTE - There was a bug fixed in Arrow 0.4.1

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100434#comment-16100434 ] Bryan Cutler commented on SPARK-21375: -- Thanks for the details [~wesmckinn]. The ap

[jira] [Comment Edited] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100434#comment-16100434 ] Bryan Cutler edited comment on SPARK-21375 at 7/25/17 5:49 PM:

[jira] [Comment Edited] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100434#comment-16100434 ] Bryan Cutler edited comment on SPARK-21375 at 7/25/17 5:50 PM:

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100572#comment-16100572 ] Bryan Cutler commented on SPARK-21375: -- Also, there has been some discussion about T

[jira] [Resolved] (SPARK-21231) Conda install of packages during Jenkins testing is causing intermittent failure

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-21231. -- Resolution: Resolved > Conda install of packages during Jenkins testing is causing intermittent

[jira] [Closed] (SPARK-21231) Conda install of packages during Jenkins testing is causing intermittent failure

2017-07-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-21231. Resolved by https://github.com/apache/spark/pull/18459 > Conda install of packages during Jenkins test

[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-07-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102420#comment-16102420 ] Bryan Cutler commented on SPARK-21190: -- Hi [~icexelloss], yes I think there is defin

[jira] [Created] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-07-31 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-21583: Summary: Create a ColumnarBatch with ArrowColumnVectors for row based iteration Key: SPARK-21583 URL: https://issues.apache.org/jira/browse/SPARK-21583 Project: Spark

[jira] [Updated] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-07-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21583: - Description: The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. It wou

[jira] [Updated] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-07-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21583: - Description: The existing {{ArrowColumnVector}} creates a read-only vector of Arrow data. It wou

[jira] [Commented] (SPARK-21583) Create a ColumnarBatch with ArrowColumnVectors for row based iteration

2017-07-31 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107652#comment-16107652 ] Bryan Cutler commented on SPARK-21583: -- I have this implemented already, will submit

[jira] [Commented] (SPARK-12717) pyspark broadcast fails when using multiple threads

2017-08-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110007#comment-16110007 ] Bryan Cutler commented on SPARK-12717: -- Thanks [~hyukjin.kwon]! What are your thoug

[jira] [Commented] (SPARK-12717) pyspark broadcast fails when using multiple threads

2017-08-02 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111304#comment-16111304 ] Bryan Cutler commented on SPARK-12717: -- Sure, I'll open a PR for 2.2 and ping you.

[jira] [Created] (SPARK-16231) PySpark ML DataFrame example fails on Vector conversion

2016-06-27 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16231: Summary: PySpark ML DataFrame example fails on Vector conversion Key: SPARK-16231 URL: https://issues.apache.org/jira/browse/SPARK-16231 Project: Spark Issue

[jira] [Created] (SPARK-16260) PySpark ML Example Improvements and Cleanup

2016-06-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16260: Summary: PySpark ML Example Improvements and Cleanup Key: SPARK-16260 URL: https://issues.apache.org/jira/browse/SPARK-16260 Project: Spark Issue Type: Umbre

[jira] [Created] (SPARK-16261) Fix Incorrect appNames in PySpark ML Examples

2016-06-28 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16261: Summary: Fix Incorrect appNames in PySpark ML Examples Key: SPARK-16261 URL: https://issues.apache.org/jira/browse/SPARK-16261 Project: Spark Issue Type: Sub

[jira] [Commented] (SPARK-12428) Write a script to run all PySpark MLlib examples for testing

2016-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353488#comment-15353488 ] Bryan Cutler commented on SPARK-12428: -- Hey Holden, I was thinking about doing this

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353555#comment-15353555 ] Bryan Cutler commented on SPARK-16247: -- I'm not sure if this is the issue, but the f

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355456#comment-15355456 ] Bryan Cutler commented on SPARK-16247: -- I think you need to specify the {labelCol} i

[jira] [Comment Edited] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355456#comment-15355456 ] Bryan Cutler edited comment on SPARK-16247 at 6/29/16 3:53 PM:

[jira] [Commented] (SPARK-16247) Using pyspark dataframe with pipeline and cross validator

2016-06-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357548#comment-15357548 ] Bryan Cutler commented on SPARK-16247: -- Great, glad that solved the problem! A cros

[jira] [Commented] (SPARK-15009) PySpark CountVectorizerModel should be able to construct from vocabulary list

2016-07-03 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360782#comment-15360782 ] Bryan Cutler commented on SPARK-15009: -- At the time I reported this, it was blocked

[jira] [Commented] (SPARK-16260) PySpark ML Example Improvements and Cleanup

2016-07-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363135#comment-15363135 ] Bryan Cutler commented on SPARK-16260: -- I have a couple tasks I still plan to add he

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Priority: Trivial (was: Major) > Example cleanup and fix minor issues >

[jira] [Created] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16403: Summary: Example cleanup and fix minor issues Key: SPARK-16403 URL: https://issues.apache.org/jira/browse/SPARK-16403 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365174#comment-15365174 ] Bryan Cutler commented on SPARK-16403: -- I'm working on this > Example cleanup and f

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-06 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Description: General cleanup of examples, focused on PySpark ML, to remove unused imports, sync

[jira] [Created] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16421: Summary: Improve output from ML examples Key: SPARK-16421 URL: https://issues.apache.org/jira/browse/SPARK-16421 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366461#comment-15366461 ] Bryan Cutler commented on SPARK-16421: -- I'll be working on this once the blocking is

[jira] [Updated] (SPARK-16421) Improve output from ML examples

2016-07-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16421: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-16260 > Improve output from ML exa

[jira] [Commented] (SPARK-15623) 2.0 python coverage ml.feature

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371229#comment-15371229 ] Bryan Cutler commented on SPARK-15623: -- Hey [~holdenk], think I can close this off n

[jira] [Updated] (SPARK-16403) Example cleanup and fix minor issues

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16403: - Description: General cleanup of examples, focused on PySpark ML, to remove unused imports, sync

[jira] [Resolved] (SPARK-14087) PySpark ML JavaModel does not properly own params after being fit

2016-07-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-14087. -- Resolution: Resolved Fix Version/s: 2.0.0 This is no longer an issue as the PySpark wrap

[jira] [Commented] (SPARK-16421) Improve output from ML examples

2016-07-18 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382579#comment-15382579 ] Bryan Cutler commented on SPARK-16421: -- Yeah, I'm working on it now > Improve outpu

[jira] [Updated] (SPARK-16197) Cleanup PySpark status api and example

2016-07-22 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16197: - Description: Cleanup of Status API example to use SparkSession and be more consistent with other

[jira] [Commented] (SPARK-16765) Add Pipeline API example for KMeans

2016-07-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397994#comment-15397994 ] Bryan Cutler commented on SPARK-16765: -- Was there some specific use of Pipelines wit

[jira] [Created] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-16800: Summary: Fix Java Examples that throw exception Key: SPARK-16800 URL: https://issues.apache.org/jira/browse/SPARK-16800 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16800: - Description: Some Java examples fail to run due to an exception thrown when using mllib.linalg.Ve

[jira] [Updated] (SPARK-16800) Fix Java Examples that throw exception

2016-07-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-16800: - Description: Some Java examples fail to run due to an exception thrown when using mllib.linalg.Ve

[jira] [Commented] (SPARK-27039) toPandas with Arrow swallows maxResultSize errors

2019-03-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783690#comment-16783690 ] Bryan Cutler commented on SPARK-27039: -- I was able to reproduce in v2.4.0, but it l

[jira] [Commented] (SPARK-23961) pyspark toLocalIterator throws an exception

2019-03-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784843#comment-16784843 ] Bryan Cutler commented on SPARK-23961: -- I could also reproduce with a nearly identi

[jira] [Assigned] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2019-03-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23836: Assignee: Bryan Cutler > Support returning StructType to the level support in GroupedMap

[jira] [Resolved] (SPARK-23836) Support returning StructType to the level support in GroupedMap Arrow's "scalar" UDFS (or similar)

2019-03-07 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23836. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23900 https://git

[jira] [Created] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality

2019-03-14 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27163: Summary: Cleanup and consolidate Pandas UDF functionality Key: SPARK-27163 URL: https://issues.apache.org/jira/browse/SPARK-27163 Project: Spark Issue Type:

[jira] [Updated] (SPARK-27163) Cleanup and consolidate Pandas UDF functionality

2019-03-14 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27163: - Priority: Minor (was: Major) > Cleanup and consolidate Pandas UDF functionality > -

[jira] [Resolved] (SPARK-27240) Use pandas DataFrame for struct type argument in Scalar Pandas UDF.

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27240. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24177 https://git

[jira] [Assigned] (SPARK-27240) Use pandas DataFrame for struct type argument in Scalar Pandas UDF.

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27240: Assignee: Takuya Ueshin > Use pandas DataFrame for struct type argument in Scalar Pandas

[jira] [Created] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27276: Summary: Increase the minimum pyarrow version to 0.12.0 Key: SPARK-27276 URL: https://issues.apache.org/jira/browse/SPARK-27276 Project: Spark Issue Type: Im

[jira] [Commented] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-25 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801250#comment-16801250 ] Bryan Cutler commented on SPARK-27276: -- [~shaneknapp] this will need an upgrade on

[jira] [Updated] (SPARK-27276) Increase the minimum pyarrow version to 0.12.0

2019-03-26 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27276: - Description: The current minimum version is 0.8.0, which is pretty ancient since Arrow has been

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Summary: Setting random seed produces different results in RandomForestRegressor (was: I am int

[jira] [Commented] (SPARK-27293) I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other peo

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804124#comment-16804124 ] Bryan Cutler commented on SPARK-27293: -- Setting the seed like in your example for r

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Description: I am interested in finding out if there is a bug in the implementation of RandomFo

[jira] [Updated] (SPARK-27293) Setting random seed produces different results in RandomForestRegressor

2019-03-28 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27293: - Component/s: ML > Setting random seed produces different results in RandomForestRegressor >

[jira] [Commented] (SPARK-27353) PySpark Row __repr__ bug

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810009#comment-16810009 ] Bryan Cutler commented on SPARK-27353: -- Works for me out of master, can you provide

[jira] [Updated] (SPARK-27276) Increase the minimum pyarrow version to 0.12.1

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27276: - Summary: Increase the minimum pyarrow version to 0.12.1 (was: Increase the minimum pyarrow vers

[jira] [Commented] (SPARK-27276) Increase the minimum pyarrow version to 0.12.1

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810024#comment-16810024 ] Bryan Cutler commented on SPARK-27276: -- I think we should use 0.12.1, there was a b

[jira] [Created] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27387: Summary: Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests Key: SPARK-27387 URL: https://issues.apache.org/jira/browse/SPARK-27387 Project: S

[jira] [Commented] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810197#comment-16810197 ] Bryan Cutler commented on SPARK-27387: -- This can be done after the upgrade of pyarr

[jira] [Commented] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-04 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810200#comment-16810200 ] Bryan Cutler commented on SPARK-27387: -- I can work on this > Replace sqlutils asse

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811355#comment-16811355 ] Bryan Cutler commented on SPARK-27389: -- From the stacktrace, it looks like it's ge

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812877#comment-16812877 ] Bryan Cutler commented on SPARK-27389: -- [~shaneknapp], I had a couple of successful

[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-09 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813896#comment-16813896 ] Bryan Cutler commented on SPARK-27389: -- Thanks [~shaneknapp] for the fix. I couldn'

[jira] [Assigned] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

2019-04-11 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-27387: Assignee: Bryan Cutler > Replace sqlutils assertPandasEqual with Pandas assert_frame_equa

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818649#comment-16818649 ] Bryan Cutler commented on SPARK-27396: -- Thanks for this [~revans2], overall I think

[jira] [Created] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-23 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27548: Summary: PySpark toLocalIterator does not raise errors from worker Key: SPARK-27548 URL: https://issues.apache.org/jira/browse/SPARK-27548 Project: Spark Iss

[jira] [Resolved] (SPARK-26970) Can't load PipelineModel that was created in Scala with Python due to missing Interaction transformer

2019-04-23 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-26970. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24426 [https://gi

[jira] [Commented] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829861#comment-16829861 ] Bryan Cutler commented on SPARK-27548: -- This is not that easy to fix by itself. Sin

[jira] [Comment Edited] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829861#comment-16829861 ] Bryan Cutler edited comment on SPARK-27548 at 4/30/19 12:46 AM: --

[jira] [Comment Edited] (SPARK-27548) PySpark toLocalIterator does not raise errors from worker

2019-04-29 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829861#comment-16829861 ] Bryan Cutler edited comment on SPARK-27548 at 4/30/19 12:46 AM: --

[jira] [Commented] (SPARK-27463) SPIP: Support Dataframe Cogroup via Pandas UDFs

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830535#comment-16830535 ] Bryan Cutler commented on SPARK-27463: -- I left some comments on the doc. Overall, I

[jira] [Commented] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830742#comment-16830742 ] Bryan Cutler commented on SPARK-27519: -- Thanks for the script [~f7faf8ba36], I was

[jira] [Updated] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27519: - Affects Version/s: (was: 3.0.0) > Pandas udf corrupting data > -- >

[jira] [Resolved] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-27519. -- Resolution: Fixed Fix Version/s: 3.0.0 Problem does not happen when running the latest

[jira] [Comment Edited] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830743#comment-16830743 ] Bryan Cutler edited comment on SPARK-27519 at 4/30/19 10:49 PM: --

[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-27612: - Description: When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends

[jira] [Created] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-04-30 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-27612: Summary: Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None Key: SPARK-27612 URL: https://issues.apache.org/jira/browse/SPARK-27612

[jira] [Commented] (SPARK-27519) Pandas udf corrupting data

2019-04-30 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830755#comment-16830755 ] Bryan Cutler commented on SPARK-27519: -- I made SPARK-27612 for the problem with {{R

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-01 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831092#comment-16831092 ] Bryan Cutler commented on SPARK-27612: -- Thanks [~mgaido], it seems like the problem
