Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12299#issuecomment-208466094
  
    I figured that it would be rare to have mixed sparse/dense vectors, but 
maybe we should indeed address this for both cases to avoid surprises.
    
    I'm inclined to fix it, since the error is larger than I imagined. For 
example, the covariances I see when running the test in this PR are around 
-496, which is way off, and the error remains significant even for values 
orders of magnitude smaller.
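    A minimal sketch of the cancellation mode that can produce errors of that size (hypothetical data, not the PR's actual test): when covariance is formed by subtracting the outer product of the means from the uncentered Gramian, values sitting on a large common offset cancel away all of the significant digits.

```python
# Variance of one column, computed two ways. The values share a large
# common offset, so x*x is ~1e18 while the true variance is 2/3.
xs = [1e9, 1e9 + 1, 1e9 + 2]
n = len(xs)
mean = sum(xs) / n

# "Shifted" formula: uncentered second moment minus squared mean.
# Both terms round to the same double, so every digit cancels.
var_shifted = sum(x * x for x in xs) / n - mean * mean

# Centered formula: subtract the mean first, then square.
var_centered = sum((x - mean) ** 2 for x in xs) / n

print(var_shifted)   # 0.0 -- all significant digits lost
print(var_centered)  # 0.6666... -- the true population variance 2/3
```

    The deviations here are small integers, so the centered pass is exact, while the shifted pass returns a variance of literally zero.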
    
    I wonder if it's better to push the centering deeper into the 
calculation of the Gramian -- for example, by optionally passing column means 
to subtract. This avoids a temporary vector, but it effectively means losing 
the benefit of sparsity. Maybe I can figure out a way to be clever about that.
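    A rough sketch of what "passing column means into the Gramian" could look like (hypothetical helper names, not the RowMatrix code): the means are subtracted row-by-row inside the accumulation, which is numerically safe but, as noted, turns every sparse row dense.

```python
def gramian(rows, means=None):
    """Accumulate A^T A over rows; if column means are supplied,
    subtract them from each row first, yielding the centered Gramian."""
    d = len(rows[0])
    g = [[0.0] * d for _ in range(d)]
    for x in rows:
        # Once means are subtracted, zero entries generally become
        # nonzero, so a sparse row is effectively dense from here on.
        v = x if means is None else [xi - m for xi, m in zip(x, means)]
        for i in range(d):
            for j in range(d):
                g[i][j] += v[i] * v[j]
    return g

def covariance(rows):
    """Sample covariance via the centered Gramian (two passes)."""
    n, d = len(rows), len(rows[0])
    means = [sum(x[j] for x in rows) / n for j in range(d)]
    g = gramian(rows, means)
    return [[g[i][j] / (n - 1) for j in range(d)] for i in range(d)]
```

    The large-offset column from before is handled correctly here, because the offset is gone before any large products are formed.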
    
    Thoughts? Otherwise I'll investigate along those lines.


