[GitHub] spark pull request: [SPARK-6487][MLlib] Add sequential pattern min...

zhangjiajin Wed, 08 Jul 2015 04:46:01 -0700

Github user zhangjiajin commented on the pull request:

    https://github.com/apache/spark/pull/7258#issuecomment-119547968
  
    @feynmanliang findPatternsLengthOne and getPatternsWithPrefix do similar
work, but their input data types differ: findPatternsLengthOne operates on the
entire RDD, while getPatternsWithPrefix operates on only part of the data,
although the content of that data is similar. Before the shuffle, the prefixes
and their projected databases are distributed across all partitions. After the
shuffle, each partition holds only one prefix and its projected database. In
other words, after the shuffle, one partition of the new RDD plays the same
role as the entire original RDD. So findPatternsLengthOne still differs from
getPatternsWithPrefix; for example, findPatternsLengthOne contains some
collect() calls.
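The distinction can be illustrated with a minimal plain-Python sketch that does not use Spark. It makes two simplifying assumptions: an RDD is modelled as a list of partitions, and each sequence is a list of single items with no itemsets. The function names mirror the two methods under discussion, but they are hypothetical stand-ins, not the code in this pull request. find_patterns_length_one merges counts across every partition, which is the point where a Spark version would need collect(). get_patterns_with_prefix runs the same counting on one prefix's in-memory projected database.

```python
from collections import Counter, defaultdict

# Hypothetical plain-Python stand-ins for the Spark methods discussed above.
# A "partition" is just a list of sequences, and each sequence is a list of
# single items (no itemsets); both are simplifying assumptions.


def frequent_items(sequences, min_count):
    """Items that occur in at least min_count sequences (counted once per sequence)."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    return {item for item, c in counts.items() if c >= min_count}


def project(seq, item):
    """Suffix of seq after the first occurrence of item, or None if item is absent."""
    return seq[seq.index(item) + 1:] if item in seq else None


def find_patterns_length_one(partitions, min_count):
    # Whole-dataset step: count per partition, then merge the partial counts
    # in one place (in Spark, this is where a reduce plus collect() would sit).
    merged = Counter()
    for part in partitions:
        local = Counter()
        for seq in part:
            local.update(set(seq))
        merged.update(local)
    return sorted(item for item, c in merged.items() if c >= min_count)


def shuffle_by_prefix(partitions, frequent):
    # Before the "shuffle", (prefix, suffix) pairs are produced in every partition;
    # grouping by prefix brings each prefix's whole projected database together.
    grouped = defaultdict(list)
    for part in partitions:
        for seq in part:
            for item in frequent:
                suffix = project(seq, item)
                if suffix:  # drop absent (None) and empty suffixes
                    grouped[(item,)].append(suffix)
    return dict(grouped)


def get_patterns_with_prefix(prefix, projected_db, min_count):
    # Local step: the projected database is an ordinary in-memory list, so the
    # same length-one counting is applied recursively without any RDD calls.
    patterns = []
    for item in frequent_items(projected_db, min_count):
        new_prefix = prefix + (item,)
        patterns.append(new_prefix)
        sub_db = [s for s in (project(seq, item) for seq in projected_db) if s]
        patterns.extend(get_patterns_with_prefix(new_prefix, sub_db, min_count))
    return patterns


def prefix_span(partitions, min_count):
    # Driver: length-one patterns over all partitions, then local mining per prefix.
    frequent = find_patterns_length_one(partitions, min_count)
    results = [(item,) for item in frequent]
    for prefix, projected_db in shuffle_by_prefix(partitions, frequent).items():
        results.extend(get_patterns_with_prefix(prefix, projected_db, min_count))
    return sorted(results)


partitions = [[[1, 2, 3], [1, 3]], [[2, 3], [1, 2]]]
print(prefix_span(partitions, 2))
# [(1,), (1, 2), (1, 3), (2,), (2, 3), (3,)]
```

Calling get_patterns_with_prefix((), all_sequences, 2) on the flattened data returns the same patterns. This reflects the comment's observation that, after the shuffle, one prefix's projected database looks like a whole database. The two functions differ only in where their data lives: spread across partitions, or in one local list.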
