GitHub user dusenberrymw commented on the pull request:
https://github.com/apache/spark/pull/7554#issuecomment-123497391
@mengxr No problem, it has been enjoyable to work on! Here are some
thoughts:
1. For local matrices and vectors, Scala has `Matrices` and `Vectors`
classes, which each contain factory methods for creating the various local
`Matrix` and `Vector` types (`DenseVector`, `SparseVector`, `DenseMatrix`,
`SparseMatrix`). These factory methods are the recommended way to create
these matrices and vectors. On the Python side, there are also `Matrices` and
`Vectors` classes with factory methods; however, rather than calling the Scala
counterparts, these just mimic the behavior and create the various `Matrix` and
`Vector` types directly in Python. For the _distributed_ matrices, I thought
it would be best to follow the same idea, so I added a `DistributedMatrices`
class in Scala containing factory methods, and created the equivalent in
Python. On the Python side, I think this ends up being a really clean
solution, as it allows the specific types of distributed matrices (`RowMatrix`,
`IndexedRowMatrix`, etc.) to simply be wrappers over their Scala/Java
counterparts, similar to how the RDD and DataFrame classes act in Python. This keeps the
creation logic within the factory methods, and allows for clean conversions
between the distributed matrix types in Python. Really interested in your
thoughts on this! I'd definitely be willing to pull that out into a separate
pull request though, should that end up being the best idea.
2. For point number _2_, yes, there were still a few long Python doctests,
but I have cleaned those up now! Also, it looks like I'm now having issues with
the unit tests between Python 2 and Python 3 (`2` vs. `2L`, for example), so I
need to look into that. The logic is correct; I just need to fix the tests so
that both Python versions produce the same output.
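
To illustrate the `2` vs. `2L` mismatch: Python 2 prints long integers with a trailing `L`, which Python 3 never does, so a doctest written against one version fails on the other. The block below is only a rough sketch of one possible workaround, not necessarily what this pull request will end up doing. It uses the standard `doctest.OutputChecker` hook to strip the `L` suffix from both the expected and the actual output before comparing. The names `LongAgnosticChecker` and `_LONG_SUFFIX` are hypothetical and are not part of Spark.

```python
import doctest
import re

# Matches an integer literal followed by Python 2's long suffix, e.g. "2L".
_LONG_SUFFIX = re.compile(r"\b(\d+)L\b")


class LongAgnosticChecker(doctest.OutputChecker):
    """Hypothetical checker that treats `2` and `2L` as the same output."""

    def check_output(self, want, got, optionflags):
        # Drop the long suffix on both sides, then defer to the default check.
        want = _LONG_SUFFIX.sub(r"\1", want)
        got = _LONG_SUFFIX.sub(r"\1", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


# Run a doctest whose expected output was written under Python 2.
example = ">>> 1 + 1\n2L\n"
test = doctest.DocTestParser().get_doctest(example, {}, "long_example", None, 0)
runner = doctest.DocTestRunner(checker=LongAgnosticChecker(), verbose=False)
results = runner.run(test)
```

A simpler alternative would be to rewrite the affected doctests so that they print values with `str()` or convert them with `int()`, which avoids the suffix without needing a custom checker.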