GitHub user dusenberrymw commented on the pull request:

    https://github.com/apache/spark/pull/7554#issuecomment-123497391
  
    @mengxr No problem, it has been enjoyable to work on!  Here are some 
thoughts:
    
    1. For local matrices and vectors, Scala has `Matrices` and `Vectors` 
classes, each containing factory methods for creating the various local 
`Matrix` and `Vector` types (`DenseVector`, `SparseVector`, `DenseMatrix`, 
`SparseMatrix`).  These factory methods are the recommended way to create 
these matrices and vectors.  On the Python side, there are also `Matrices` and 
`Vectors` classes with factory methods; however, rather than calling their 
Scala counterparts, they mimic the behavior and create the various `Matrix` 
and `Vector` types directly in Python.  For the _distributed_ matrices, I 
thought it would be best to follow the same idea, so I added a 
`DistributedMatrices` class in Scala containing factory methods, and created 
the equivalent in Python.  On the Python side, I think this ends up being a 
really clean solution, as it allows the specific types of distributed matrices 
(`RowMatrix`, `IndexedRowMatrix`, etc.) to simply be wrappers over their 
Scala/Java counterparts, similar to how the RDD and DataFrame classes act in 
Python.  This keeps the creation logic within the factory methods and allows 
for clean conversions between the distributed matrix types in Python.  I'm 
really interested in your thoughts on this!  I'd definitely be willing to pull 
that out into a separate pull request, though, should that end up being the 
best idea.
    2. For point number _2_, yes, there were still a few long Python doctests, 
but I have cleaned those up now!  Also, it looks like I'm now having issues 
with the unit tests between Python 2 and Python 3 (`2` vs `2L`, for example), 
so I need to look into that.  The logic is correct; I just need to fix the 
output so that both Python versions print the same thing.
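
A minimal sketch of the factory-plus-wrapper layout described in point 1, written in plain Python so it runs without Spark. Everything here is illustrative: `_FakeJavaMatrix` is a hypothetical stand-in for the JVM-side object, and the method names `rowMatrix`, `indexedRowMatrix`, and `toRowMatrix` are assumptions, not necessarily the names used in the pull request.

```python
class _FakeJavaMatrix:
    """Hypothetical stand-in for the Scala/Java matrix a wrapper would hold."""

    def __init__(self, rows):
        self.rows = list(rows)

    def numRows(self):
        return len(self.rows)


class DistributedMatrix:
    """Thin wrapper: holds the backing object and delegates calls to it."""

    def __init__(self, java_matrix):
        self._java_matrix = java_matrix

    def numRows(self):
        return self._java_matrix.numRows()


class RowMatrix(DistributedMatrix):
    pass


class IndexedRowMatrix(DistributedMatrix):
    def toRowMatrix(self):
        # Conversion between types: drop the row indices and wrap the result.
        vectors = [vector for _, vector in self._java_matrix.rows]
        return RowMatrix(_FakeJavaMatrix(vectors))


class DistributedMatrices:
    """Factory class: all creation logic lives here, not in the wrappers."""

    @staticmethod
    def rowMatrix(rows):
        return RowMatrix(_FakeJavaMatrix(rows))

    @staticmethod
    def indexedRowMatrix(indexed_rows):
        return IndexedRowMatrix(_FakeJavaMatrix(indexed_rows))


indexed = DistributedMatrices.indexedRowMatrix([(0, [1.0, 2.0]), (1, [3.0, 4.0])])
converted = indexed.toRowMatrix()
print(type(converted).__name__, converted.numRows())
```

Because the wrappers only delegate, adding a new matrix type under this layout would mean adding one wrapper class and one factory method, with no construction logic duplicated between them.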
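
One possible way to handle the `2` vs `2L` mismatch from point 2, assuming it comes from Python 2 rendering `long` values with an `L` suffix when a doctest echoes the bare expression: print the value instead, since `str()` omits the suffix. The `num_rows` helper below is hypothetical and only stands in for whatever call the real doctests exercise.

```python
import doctest


def num_rows(rows):
    """Return the number of rows.

    Echoing the bare expression would compare against repr(), which shows
    ``2L`` for a Python 2 long.  Printing uses str(), which renders ``2`` on
    both Python 2 and Python 3.

    >>> print(num_rows([[1.0, 2.0], [3.0, 4.0]]))
    2
    """
    return len(rows)


# Run the docstring example explicitly so the check does not depend on
# how this file is invoked.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    num_rows.__doc__, {"num_rows": num_rows}, "num_rows", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print("failed:", results.failed, "attempted:", results.attempted)
```

Wrapping the value in `int(...)` inside the doctest is another option with the same effect for values that fit in a machine integer.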


