[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

Syrux Sat, 08 Apr 2017 12:41:25 -0700

Github user Syrux commented on the issue:

    https://github.com/apache/spark/pull/17575
  
    Yo Sean, I've already pushed the requested changes, in case this is the
right place for them.
    (I can just revert them if not.)
    
    I added two new methods to make the code testable: first, a method that
finds all frequent items in a database; second, a method that actually cleans
the database using those frequent items. Although I didn't end up using the
first method, splitting it out makes the pre-processing method much easier to
understand, so I kept it. Just tell me if I should put that piece of code back
where it was.
    
    I also added tests for several kinds of sequence databases: one where each
itemset contains at most one item, one where itemsets can contain multiple
items, and one where cleaning the database empties it entirely. Together they
should cover all cases.
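    To make the split concrete, here is a rough sketch of what "find frequent
items, then clean the database" could look like, run against the three kinds
of databases above. This is NOT the code from the PR and not Spark's internal
representation: it is plain Python on nested lists, and the names
`find_frequent_items`, `clean_database`, and `min_count` are hypothetical.
It assumes an item's support is the number of sequences containing it, and that
cleaning drops infrequent items, then empty itemsets, then empty sequences.

```python
from collections import Counter
from typing import List, Set

# Hypothetical representation: a sequence is a list of itemsets,
# and an itemset is a list of int items.
Itemset = List[int]
Sequence = List[Itemset]


def find_frequent_items(db: List[Sequence], min_count: int) -> Set[int]:
    """Hypothetical helper: items appearing in at least `min_count` sequences.

    Support is counted once per sequence, not once per occurrence."""
    counts: Counter = Counter()
    for seq in db:
        # A set, so an item repeated inside one sequence counts only once.
        counts.update({item for itemset in seq for item in itemset})
    return {item for item, c in counts.items() if c >= min_count}


def clean_database(db: List[Sequence], frequent: Set[int]) -> List[Sequence]:
    """Hypothetical helper: keep only frequent items, then drop any itemsets
    and sequences that became empty."""
    cleaned: List[Sequence] = []
    for seq in db:
        new_seq: Sequence = []
        for itemset in seq:
            kept = [item for item in itemset if item in frequent]
            if kept:  # drop itemsets left empty
                new_seq.append(kept)
        if new_seq:  # drop sequences left empty
            cleaned.append(new_seq)
    return cleaned


# Case 1: at most one item per itemset.
db1 = [[[1], [2], [3]], [[1], [3]], [[2], [4]]]
out1 = clean_database(db1, find_frequent_items(db1, 2))

# Case 2: itemsets with multiple items; the last sequence disappears.
db2 = [[[1, 2], [3]], [[1, 4], [2, 3]], [[5]]]
out2 = clean_database(db2, find_frequent_items(db2, 2))

# Case 3: nothing is frequent, so cleaning empties the database.
db3 = [[[1]], [[2]], [[3]]]
out3 = clean_database(db3, find_frequent_items(db3, 2))
```

    Keeping the two steps separate means each one can be checked on its own,
which is the point of splitting them out for tests.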
    
    Of course, the new implementation passes all these tests, and the old one
doesn't.
    Everything else is unchanged.
    
    Let me know if the way I did it is OK. I hope it's up to standard :)


