[GitHub] spark pull request: [WIP] [SPARK-1328] Add vector statistics

yinxusen Mon, 31 Mar 2014 09:36:09 -0700

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/268#issuecomment-39110100
  
    @mengxr I am not sure I understand what you mean by a sparse vector here. In
    your example, is the column `Vector(1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0)` (case 1),
    or is it
    `RDD(Vector(1.0), Vector(0.0), Vector(2.0), Vector(0.0), Vector(3.0), Vector(0.0), Vector(0.0))`
    (case 2)?
    
    If you mean case 1, it is easy to rewrite the computation in O(nnz). Case 2 is
    harder, because we cannot tell whether a column is sparse until we have counted
    its nonzeros (nnz). If you mean case 1, I think I should handle sparse vectors
    differently from dense ones, along the lines of the following code:
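    For case 1, here is a minimal sketch of what an O(nnz) computation could look like. It assumes a hypothetical `SparseColumn` type (not an MLlib class) that stores only the nonzero entries plus the column's logical length, and computes the mean and population variance by visiting only the stored values. The implicit zeros add nothing to the sums and enter only through the length n.

```scala
// Hypothetical stand-in for a sparse column (case 1): only the nonzero
// entries are stored, together with the logical length `size`.
final case class SparseColumn(size: Int, indices: Array[Int], values: Array[Double])

object SparseColumnStats {
  // Mean and population variance in O(nnz): implicit zeros contribute
  // nothing to `sum` or `sumSq`, so only the stored values are visited.
  def meanAndVariance(col: SparseColumn): (Double, Double) = {
    require(col.size > 0, "column must be non-empty")
    var sum = 0.0
    var sumSq = 0.0
    var i = 0
    while (i < col.values.length) {
      val v = col.values(i)
      sum += v
      sumSq += v * v
      i += 1
    }
    val n = col.size.toDouble // zeros are counted here, via the length
    val mean = sum / n
    val variance = sumSq / n - mean * mean
    (mean, variance)
  }

  def main(args: Array[String]): Unit = {
    // Vector(1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 0.0) stored sparsely.
    val col = SparseColumn(7, Array(0, 2, 4), Array(1.0, 2.0, 3.0))
    val (mean, variance) = meanAndVariance(col)
    println(s"nnz=${col.values.length} mean=$mean variance=$variance")
  }
}
```

    The one-pass formula sumSq / n - mean^2 keeps the sketch short, but it can lose precision when the mean is large relative to the spread; a numerically stabler update such as Welford's may be preferable in practice.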
    
    `rdd.first() match {
      case dv: DenseVector => ??? // dense-vector path
      case sv: SparseVector => ??? // sparse-vector path
    }`
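    A minimal sketch of the dense/sparse split follows. It assumes hypothetical stand-in types `DenseVec` and `SparseVec` in place of the MLlib vector classes, and uses a plain `Seq` in place of the RDD so it runs without Spark. It matches on the runtime type of every row rather than only the first one, so input that mixes dense and sparse rows still takes the right path. Per-column sums serve as the example statistic.

```scala
// Hypothetical stand-ins for MLlib's DenseVector / SparseVector; a Seq
// plays the role of the RDD so the sketch runs without Spark.
sealed trait Vec { def size: Int }
final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
}
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

object ColumnSums {
  // Adds every row into `acc`, choosing the code path per row.
  def columnSums(rows: Seq[Vec], numCols: Int): Array[Double] = {
    val acc = new Array[Double](numCols)
    rows.foreach {
      case DenseVec(values) =>
        // Dense path: visit every entry.
        var j = 0
        while (j < values.length) { acc(j) += values(j); j += 1 }
      case SparseVec(_, indices, values) =>
        // Sparse path: visit only the stored (nonzero) entries.
        var k = 0
        while (k < indices.length) { acc(indices(k)) += values(k); k += 1 }
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      DenseVec(Array(1.0, 0.0, 2.0)),
      SparseVec(3, Array(2), Array(3.0)),
      SparseVec(3, Array.empty[Int], Array.empty[Double])
    )
    println(columnSums(rows, 3).mkString(", "))
  }
}
```

    Because `Vec` is sealed, the compiler can warn if a vector type is left unhandled in the match.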


