Re: Possible SPIP to improve matrix and vector column type support

2018-05-12 Thread Leif Walsh

I filed an SPIP for this at https://issues.apache.org/jira/browse/SPARK-24258. Let’s discuss!
On Wed, Apr 18, 2018 at 23:33 Leif Walsh wrote:
> I agree we should reuse as much as possible. For PySpark, I think the
> obvious choices of Breeze and numpy arrays already made make a lot of
> sense, I

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Leif Walsh

I agree we should reuse as much as possible. For PySpark, I think the obvious choices of Breeze and numpy arrays already made make a lot of sense, I’m not sure about the other language bindings and would defer to others. I was under the impression that UDTs were gone and (probably?) not coming bac

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Joseph Bradley

Thanks for the thoughts! We've gone back and forth quite a bit about local linear algebra support in Spark. For reference, there have been some discussions here:
https://issues.apache.org/jira/browse/SPARK-6442
https://issues.apache.org/jira/browse/SPARK-16365
https://issues.apache.org/jira/brows

Possible SPIP to improve matrix and vector column type support

2018-04-11 Thread Leif Walsh

Hi all, I’ve been playing around with the Vector and Matrix UDTs in pyspark.ml and I’ve found myself wanting more. There is a minor issue: with Arrow serialization enabled, these types don’t serialize properly in Python UDF calls or in toPandas. There’s a natural representation for the