[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

Syrux Sat, 08 Apr 2017 07:15:06 -0700

Github user Syrux commented on the issue:

    https://github.com/apache/spark/pull/17575
  
    Yes exactly, the current implementation adds too many unnecessary 
delimiters. With this one-line change, delimiters are only placed where needed. 
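    
    To make the difference concrete, here is a minimal sketch in Python (not the actual Spark code). It assumes the internal encoding is a flat list of integers where 0 separates itemsets and also opens and closes the sequence. The names `clean_naive` and `clean_compact` are hypothetical, as is the example sequence.

```python
from typing import List, Set

DELIMITER = 0  # assumed itemset separator in the flat encoding


def clean_naive(sequence: List[List[int]], frequent: Set[int]) -> List[int]:
    """Write a delimiter after every itemset, even one emptied by filtering."""
    out = [DELIMITER]
    for itemset in sequence:
        out.extend(sorted(i for i in itemset if i in frequent))
        out.append(DELIMITER)
    return out


def clean_compact(sequence: List[List[int]], frequent: Set[int]) -> List[int]:
    """Write a delimiter only after an itemset that still holds a frequent item."""
    out = [DELIMITER]
    for itemset in sequence:
        kept = sorted(i for i in itemset if i in frequent)
        if kept:  # skip itemsets that became empty after filtering
            out.extend(kept)
            out.append(DELIMITER)
    return out


# Hypothetical input: item 5 is infrequent and gets filtered out.
example = [[1, 2], [5], [3]]
frequent_items = {1, 2, 3}

print(clean_naive(example, frequent_items))    # [0, 1, 2, 0, 0, 3, 0]
print(clean_compact(example, frequent_items))  # [0, 1, 2, 0, 3, 0]
```

    Both encodings describe the same itemsets, so the mined patterns should not change; the compact form simply avoids storing and scanning the redundant zeros.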
    
    Currently there are no tests to verify that the algorithm cleans the 
sequences correctly. I only found this inefficiency by printing intermediate 
values while I implemented other things in my local copy of the repository. 
    
    If you want, I can add some tests, but that will require a small 
refactoring to separate the cleaning part into its own method. Calling the 
current method would directly run the main algorithm ... ^^'
    
    Two of the existing tests did cover cases where sequences of zeros were 
left, but not at pertinent places (the Integer/String type, variable-size 
itemsets tests clean a five at the end of the third sequence, leaving two zeros 
instead of one). 
    
    I can, however, vouch that the previous code worked just fine. The 
results of the old implementation and of this one are the same. They also 
match the results I obtained with another standalone CP-based 
implementation. It's just that this code makes the pre-processing more 
efficient.


