Github user paberline commented on the pull request:
https://github.com/apache/spark/pull/6995#issuecomment-115004710
Building on the DoubleArrayWritable example, I have added support for storing
NumPy double arrays and matrices as the value elements of SequenceFiles: 1-D
arrays are stored as arrays of doubles, and matrices as nested arrays of
doubles.
Each value element is a discrete matrix or array. This is useful when you
have many matrices that you don't want to join into a single Spark DataFrame
for storage in a Parquet file.
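To illustrate the layout described above, here is a minimal sketch of preparing such records on the Python side: each value is one discrete matrix, converted to nested lists of Python floats so it can map to nested arrays of doubles. The matrix names, the `sc` SparkContext, and the output path are assumptions for the example, so the Spark calls are left as comments:

```python
import numpy as np

# Each record is a (key, value) pair; the value is one discrete matrix,
# represented as nested lists of floats (-> nested arrays of doubles).
matrices = {
    "m1": np.arange(6, dtype=np.float64).reshape(2, 3),
    "m2": np.eye(2),
}
records = [(key, m.tolist()) for key, m in matrices.items()]

# With a SparkContext `sc` (assumed) the records could then be written
# out as a SequenceFile, e.g.:
#   sc.parallelize(records).saveAsSequenceFile("/path/to/output")
```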
Pandas DataFrames and Series can be easily converted to and from NumPy
matrices, so I've also added the ability to store the schema-less data from
DataFrames and Series that contain double values.
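The round trip mentioned above is the standard pandas `.values` / constructor pair; a small illustration (the column labels here are invented, and note that `.values` drops the index and column labels, which is why the stored data is schema-less):

```python
import numpy as np
import pandas as pd

# A DataFrame of doubles; the column labels are hypothetical.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# DataFrame -> NumPy array: labels and index are dropped (schema-less).
arr = df.values  # shape (2, 2), dtype float64

# NumPy array -> DataFrame: a default integer index and default column
# labels are generated, since no schema was stored with the doubles.
df2 = pd.DataFrame(arr)
```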
There seems to be demand for this functionality:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E