Github user nilmeier commented on the pull request:
https://github.com/apache/spark/pull/8563#issuecomment-145419186
I addressed Fred's review comments in the most recent update. I was not able
to fully address every issue raised; details are below:
Review comments:
1) --> Try making the portion of the examples with input data more condensed,
perhaps by reading the matrices from string constants
X Done.
2) --> There's an orphan/duplicate ScalaDoc comment at line 356 of BlockMatrix.scala
X Done.
3) --> Remove carriage returns on lines 370, 481, 569, 732
X Done.
4) --> Recommend adding a call to the new subtract method to the MLlib
programmer's guide
X Done.
5) --> New API calls to BlockMatrix should have corresponding PySpark APIs
O Mike Dusenberry has an open JIRA for this. I will be meeting with him,
since he has done Python API work before.
6) --> Error message at line 394 should print out the block sizes that don't match
X Added error messages to .add and .subtract.
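For reference, the check is roughly along the lines of the sketch below; `rowsPerBlock` and `colsPerBlock` are the existing BlockMatrix members, `other` is the second operand, and the exact message wording may differ from what was committed:

```scala
// Sketch of block-size validation in add/subtract.
require(rowsPerBlock == other.rowsPerBlock && colsPerBlock == other.colsPerBlock,
  s"Cannot add/subtract matrices with different block dimensions: " +
    s"($rowsPerBlock x $colsPerBlock) vs " +
    s"(${other.rowsPerBlock} x ${other.colsPerBlock})")
```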
7) --> The code at line 384 should multiply every element of b.head by -1,
as far as I can see
X Fixed this error.
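The fix follows from subtract being add with the right-hand operand negated: every element of every block of b must be multiplied by -1, not only the first block. A minimal sketch using Breeze, the matrix library backing the local blocks:

```scala
import breeze.linalg.DenseMatrix

// Negate every element of a local block; subtract(a, b) == add(a, b negated blockwise).
val block = DenseMatrix((1.0, 2.0), (3.0, 4.0))
val negated = block.map(_ * -1.0)  // DenseMatrix((-1.0, -2.0), (-3.0, -4.0))
```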
8) --> Line 456 and 465-471 have wrong indentation
X Done.
9) --> Scaladoc at 474 should state that blockRowRange and blockColRange are
block indexes, not row/column indexes
X Done.
10) --> In lines 460-463, consider making a single pass over the blocks
instead of 4 passes
O I couldn't see an easy way to do this without introducing additional layers
of recursion, which could impact performance. I would like to address this in
a later revision, since the recursive building procedures are a general
design issue to consider here.
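For the record, a single-pass split over the block index is straightforward in isolation; the sketch below (plain Scala, hypothetical block values) partitions into four quadrants in one traversal. Wiring this into the recursive build without extra recursion depth is the part that would need the larger rewrite:

```scala
// One pass over the blocks instead of four filtered passes.
val blocks = Seq(((0, 0), "A11"), ((0, 1), "A12"), ((1, 0), "A21"), ((1, 1), "A22"))
val split = 1  // block rows/cols below `split` belong to the upper-left quadrant
val quadrants = blocks.groupBy { case ((i, j), _) => (i < split, j < split) }
// quadrants((true, true))   -> upper-left blocks
// quadrants((false, false)) -> lower-right blocks
```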
11) --> Add a note to SchurComplement that the current implementation assumes
that a11 fits into memory on the driver
X Done.
12) --> Might want to use a case class in the return type of blockLUtoSolver
O I will try to do this in the near future, but I didn't include the changes
in this revision.
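The change would be along these lines; the type and field names below are hypothetical, chosen only to illustrate the shape of a named return type replacing a bare tuple:

```scala
// Hypothetical named result type for blockLUtoSolver.
case class BlockLUSolution(
    permutation: Array[Int],  // row permutation from pivoting
    lower: BlockMatrix,       // L factor as a BlockMatrix
    upper: BlockMatrix)       // U factor as a BlockMatrix
```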
13) --> Take a close look at the performance impact of the chains of
multiplications at line 811 when there are many levels of recursion
O We did some studies, and the code would benefit from refactoring to remove
recursion. It would require a significant rewrite, and might be beyond the
scope of the initial submission.
14) --> In recursiveSequencesBuild, you may want to break off more than one
block from the upper left corner of the matrix; in many cases, the available
memory on the driver can comfortably hold, say, 10x10 blocks. You should be
able to query the SparkContext's memory information to determine how much
heap space is available for the in-memory portion of the computation. On a
side note, it would be good to produce a user-friendly error if it looks like
there is not enough local heap to process one block's data locally.
O This may require a significant rewrite to handle correctly; I would like to
try it in the next revision.
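A lightweight first step, independent of the larger rewrite, could be a driver-side heap check via the JVM Runtime API; in this sketch the `rowsPerBlock`/`colsPerBlock` names and the safety factor of 2 are illustrative assumptions:

```scala
// Rough user-friendly error when one block likely won't fit in local heap.
val rt = Runtime.getRuntime
val freeHeap = rt.maxMemory - (rt.totalMemory - rt.freeMemory)
val blockBytes = 8L * rowsPerBlock * colsPerBlock  // doubles in one dense block
require(freeHeap > 2 * blockBytes,
  s"Not enough driver heap ($freeHeap bytes free) to process one " +
    s"$rowsPerBlock x $colsPerBlock block ($blockBytes bytes) locally.")
```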
15) --> Might want to reuse buffers for the local data allocated at lines
623-629 to avoid triggering global GC at each level of recursion.
O I would like to explore this in the future as well, but I didn't address it
in this update.