GitHub user feynmanliang commented on the pull request:

    https://github.com/apache/spark/pull/7412#issuecomment-121508184
  
    * The PR title is off: it should say "before local processing" instead of "before projection".
    * Instead of terminating on `minPatternsBeforeShuffle`, should the termination condition be some maximum size of a projected database? The current condition doesn't handle the case where we have `minPatternsBeforeShuffle` prefixes but some of them still have very large projected databases (e.g. if the sequences are very long, many prefixes may each receive most of the dataset even after `minPatternsBeforeShuffle` prefixes have been generated).
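
To make the suggestion concrete, here is a rough sketch of the two termination conditions side by side. It is plain Scala with no Spark dependency and is not the PR's code: `PrefixStats`, `maxProjectedDbSize`, and the helper names are hypothetical, and the projected database size is assumed to be the total number of items left in a prefix's postfixes.

```scala
object ProjectedDbTermination {
  // Hypothetical record: a prefix plus the size of its projected database
  // (assumed here to be the total number of items left in its postfixes).
  final case class PrefixStats(prefix: List[Int], projectedDbSize: Long)

  // Count-based condition, as described for the current code:
  // stop once enough prefixes have been generated.
  def enoughPrefixes(stats: Seq[PrefixStats], minPatternsBeforeShuffle: Int): Boolean =
    stats.size >= minPatternsBeforeShuffle

  // Size-based condition, as suggested in this comment:
  // stop only when every projected database is small enough to process locally.
  def allSmallEnough(stats: Seq[PrefixStats], maxProjectedDbSize: Long): Boolean =
    stats.forall(_.projectedDbSize <= maxProjectedDbSize)

  // Split prefixes into (ready for local processing, still too large).
  def splitBySize(
      stats: Seq[PrefixStats],
      maxProjectedDbSize: Long): (Seq[PrefixStats], Seq[PrefixStats]) =
    stats.partition(_.projectedDbSize <= maxProjectedDbSize)
}
```

With three prefixes whose projected databases hold 10, 1000000, and 20 items, `enoughPrefixes(stats, 3)` is already true, while `allSmallEnough(stats, 1000L)` is false and `splitBySize` isolates the one oversized prefix for further distributed expansion.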

