GitHub user zhangjiajin commented on the pull request:
https://github.com/apache/spark/pull/7258#issuecomment-119547968
@feynmanliang findPatternsLengthOne and getPatternsWithPrefix do similar work,
but their input data is different. findPatternsLengthOne operates on the
entire RDD, whereas getPatternsWithPrefix operates on only part of the RDD's
data, although the content of that data is similar. Before the shuffle, the
prefixes and their projected databases are distributed across all partitions.
After the shuffle, each partition contains only one prefix and its projected
database. In other words, after the shuffle, one partition of the new RDD plays
the same role as the entire old RDD. That is why findPatternsLengthOne is
implemented differently from getPatternsWithPrefix; for example,
findPatternsLengthOne uses some collect calls.
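To make the distinction concrete, here is a minimal plain-Python sketch, assuming a simplified model rather than the code in the pull request: lists stand in for RDD partitions, each sequence is a list of single items (not itemsets), and the helper names project, shuffle_by_prefix and prefix_span are hypothetical, invented for illustration. find_patterns_length_one mirrors findPatternsLengthOne by counting over every partition and merging the partial counts (the step that needs a collect-like operation), while get_patterns_with_prefix mirrors getPatternsWithPrefix by working locally on the single projected database that lands in one partition after the shuffle.

```python
from collections import Counter, defaultdict


def find_patterns_length_one(partitions, min_count):
    # Runs over the ENTIRE dataset, spread across all partitions.
    # Each partition counts its items (each item at most once per sequence)...
    partial_counts = [
        Counter(item for seq in part for item in set(seq)) for part in partitions
    ]
    # ...then the partial counts are merged in one place, analogous to a
    # reduce followed by a collect to the driver.
    total = sum(partial_counts, Counter())
    return {item: c for item, c in total.items() if c >= min_count}


def project(seq, item):
    # Hypothetical helper: suffix of seq after the first occurrence of item,
    # or None if item does not occur in seq.
    if item not in seq:
        return None
    return seq[seq.index(item) + 1:]


def shuffle_by_prefix(partitions, frequent_items):
    # Simulated shuffle: before it, the projections of one prefix are scattered
    # over all partitions; afterwards each key holds its whole projected database.
    grouped = defaultdict(list)
    for part in partitions:
        for seq in part:
            for item in frequent_items:
                suffix = project(seq, item)
                if suffix:  # skip None (item absent) and empty suffixes
                    grouped[(item,)].append(suffix)
    return dict(grouped)


def get_patterns_with_prefix(prefix, projected_db, min_count):
    # Runs locally on ONE prefix's projected database, which behaves like a
    # small "entire dataset", so no distributed collect is needed here.
    patterns = []
    counts = Counter(item for seq in projected_db for item in set(seq))
    for item in sorted(counts):
        if counts[item] >= min_count:
            new_prefix = prefix + (item,)
            patterns.append((new_prefix, counts[item]))
            next_db = [s for s in (project(seq, item) for seq in projected_db) if s]
            patterns.extend(get_patterns_with_prefix(new_prefix, next_db, min_count))
    return patterns


def prefix_span(partitions, min_count):
    # Hypothetical driver: global length-one step, shuffle, then local recursion.
    length_one = find_patterns_length_one(partitions, min_count)
    result = [((item,), c) for item, c in sorted(length_one.items())]
    for prefix, db in sorted(shuffle_by_prefix(partitions, length_one).items()):
        result.extend(get_patterns_with_prefix(prefix, db, min_count))
    return result


parts = [[[1, 2, 3], [1, 3]], [[2, 3], [1, 2]]]
print(prefix_span(parts, 2))
```

With two "partitions" of two sequences each and a minimum count of 2, the sketch finds the three length-one patterns globally and then (1, 2), (1, 3) and (2, 3) locally, one prefix at a time.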