[GitHub] spark pull request: [SPARK-6487][MLlib] Add sequential pattern min...

mengxr Thu, 09 Jul 2015 08:51:08 -0700

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/7258#issuecomment-120045016
  
    @zhangjiajin Since you have already collected the frequent items (length-1 
patterns) on the driver, you don't need to keep the RDD of length-1 patterns. When 
generating the final patterns, recomputing that RDD is more expensive than 
parallelizing the collected items.
    
    Another suggestion is to separate the local computation from the distributed 
computation, which makes the implementation easier to read. We can create a private 
object called `LocalPrefixSpan` with a method `run(sequences: Array[Array[Int]], 
minCount: Int): Array[(Array[Int], Int)]`, and put all local methods under this object.
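
    To make the suggested split concrete, here is a minimal sketch of what such an object might look like. Only the name `LocalPrefixSpan` and the `run` signature come from the comment above; the helper `grow`, the projection strategy, and the assumption that each sequence is a flat array of single items are illustrative, not the code in the pull request. In Spark itself the object would presumably be package-private rather than public.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: only the object name and the `run` signature are taken
// from the review comment. Everything else is an assumption for illustration.
object LocalPrefixSpan {

  /** Returns every frequent sequential pattern together with its support count. */
  def run(sequences: Array[Array[Int]], minCount: Int): Array[(Array[Int], Int)] = {
    val results = ArrayBuffer.empty[(Array[Int], Int)]
    grow(Array.empty[Int], sequences, minCount, results)
    results.toArray
  }

  // Hypothetical helper: extends `prefix` by each item that is frequent in the
  // projected database, then recurses on the new projection.
  private def grow(
      prefix: Array[Int],
      projected: Array[Array[Int]],
      minCount: Int,
      results: ArrayBuffer[(Array[Int], Int)]): Unit = {
    // Count each item at most once per sequence (support, not raw occurrences).
    val frequent = projected
      .flatMap(_.distinct)
      .groupBy(identity)
      .map { case (item, occurrences) => (item, occurrences.length) }
      .filter { case (_, count) => count >= minCount }
      .toSeq
      .sortBy(_._1)

    frequent.foreach { case (item, count) =>
      val newPrefix = prefix :+ item
      results += ((newPrefix, count))
      // Project: keep what follows the first occurrence of `item` in each sequence.
      val nextProjected = projected
        .filter(_.contains(item))
        .map(seq => seq.drop(seq.indexOf(item) + 1))
        .filter(_.nonEmpty)
      grow(newPrefix, nextProjected, minCount, results)
    }
  }
}
```

    A caller on the distributed side would then only need `LocalPrefixSpan.run(localSequences, minCount)` for each projected partition that is small enough to process locally, which keeps all RDD handling out of this object.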



