GitHub user Syrux commented on the issue:
https://github.com/apache/spark/pull/17575
Yo Sean, I already pushed the requested changes to this PR, in case this is
the right place to do so.
(If not, I can just revert them.)
I added two new methods to make testing possible. The first finds all
frequent items in a database; the second actually cleans the database
using those frequent items. Although I didn't end up using the first method,
the pre-processing method is now much easier to understand, so I left the new
method in. Just tell me if I need to put that piece of code back.
I also added tests for several kinds of sequence database. More
specifically: one where there is at most one item per itemset, one where there
can be multiple items per itemset, and one where cleaning the database empties
it. Together they should cover all cases.
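
As a rough illustration of the two helper methods and the three test scenarios described above, here is a minimal sketch in plain Python rather than the PR's actual Spark code. The names `find_frequent_items`, `clean_database`, and `preprocess` are hypothetical, `min_count` is assumed to be an absolute number of sequences, and dropping itemsets and sequences that end up empty is an assumption about what "cleaning" means here.

```python
from collections import Counter


def find_frequent_items(database, min_count):
    """Return the items that occur in at least `min_count` sequences."""
    counts = Counter()
    for sequence in database:
        # Count an item once per sequence, however many itemsets contain it.
        counts.update({item for itemset in sequence for item in itemset})
    return {item for item, n in counts.items() if n >= min_count}


def clean_database(database, frequent_items):
    """Drop infrequent items, then any itemsets and sequences left empty
    (assumed behaviour, not confirmed by the PR)."""
    cleaned = []
    for sequence in database:
        kept = [[i for i in itemset if i in frequent_items] for itemset in sequence]
        kept = [itemset for itemset in kept if itemset]
        if kept:
            cleaned.append(kept)
    return cleaned


def preprocess(database, min_count):
    """Hypothetical pre-processing step: find frequent items, then clean."""
    return clean_database(database, find_frequent_items(database, min_count))


# Case 1: at most one item per itemset.
single = [[[1], [2], [3]], [[1], [3]], [[4]]]
assert preprocess(single, 2) == [[[1], [3]], [[1], [3]]]

# Case 2: several items per itemset.
multi = [[[1, 2], [3]], [[1], [3, 2], [1, 2]], [[1, 2], [5]], [[6]]]
assert preprocess(multi, 2) == [[[1, 2], [3]], [[1], [3, 2], [1, 2]], [[1, 2]]]

# Case 3: no item is frequent, so cleaning empties the database.
sparse = [[[1], [2]], [[3, 4]]]
assert preprocess(sparse, 2) == []
```

Keeping the frequent-item search separate from the cleaning step is what lets each scenario be checked on small hand-written databases like these.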
Of course, the new implementation passes all of these tests, and the old
one doesn't.
Everything else remains as it was.
Tell me if the way I did it was ok. I hope it's up to standards :)