Github user Syrux commented on the issue:
https://github.com/apache/spark/pull/17575
Yes, exactly: the current implementation adds too many unnecessary
delimiters. With this one-line change, delimiters are only placed where needed.
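Here is a minimal sketch of the difference. It assumes the internal encoding described in the PrefixSpan source comments: each sequence is flattened into an int array with 0 as the itemset delimiter, and the remaining items are mapped to 1-based ints. So <(12)(31)1> becomes [0, 1, 2, 0, 1, 3, 0, 1, 0]. This is not the Spark code. The function name `clean_sequence`, the `item_to_int` mapping and the `old_behaviour` flag are hypothetical and exist only to contrast two approaches. The old approach emits a delimiter after every itemset, even one emptied by cleaning. The new approach emits one only after itemsets that still contain items.

```python
def clean_sequence(itemsets, item_to_int, old_behaviour=False):
    """Hypothetical sketch of the cleaning step (not Spark's actual code).

    Drops items missing from item_to_int (i.e. infrequent items), maps the
    remaining ones to 1-based ints, and flattens the sequence with 0 as the
    itemset delimiter. Returns [] if no frequent item survives.
    """
    encoded = [0]  # leading delimiter
    contains_freq_items = False
    for itemset in itemsets:
        # Keep only frequent items, shifted to 1-based ints, sorted within the itemset.
        kept = sorted(item_to_int[item] + 1 for item in itemset if item in item_to_int)
        if kept:
            contains_freq_items = True
            encoded.extend(kept)
            encoded.append(0)  # delimiter only after a non-empty itemset
        elif old_behaviour:
            encoded.append(0)  # assumed old behaviour: delimiter even for an emptied itemset
    return encoded if contains_freq_items else []


# Item 3 is infrequent, so the middle itemset (3) is emptied by cleaning.
item_to_int = {1: 0, 2: 1}
sequence = [[1, 2], [3], [1]]
print(clean_sequence(sequence, item_to_int, old_behaviour=True))  # [0, 1, 2, 0, 0, 1, 0]
print(clean_sequence(sequence, item_to_int))                      # [0, 1, 2, 0, 1, 0]
```

Both encodings describe the same itemsets. The old one just carries a redundant 0 wherever an itemset was emptied.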
Currently there are no tests that verify whether the algorithm cleans the
sequences correctly. I only found this inefficiency by printing intermediate
values while implementing other things in my local repository.
If you want, I can add some tests, but that will require a small
refactor to move the cleaning step into its own method. Calling the current
method would also run the main algorithm directly... ^^'
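If the cleaning step were moved into its own method, a test could check an invariant on its output, such as "no two consecutive delimiters". Below is a minimal Python sketch of such a check. It assumes the same 0-delimited encoding. The helper name `has_redundant_delimiters` is hypothetical and is not part of Spark or its test suite.

```python
def has_redundant_delimiters(encoded):
    """Hypothetical test helper: return True if a 0-delimited encoded
    sequence contains two adjacent delimiters, i.e. an empty itemset
    left behind by cleaning."""
    return any(a == 0 and b == 0 for a, b in zip(encoded, encoded[1:]))


print(has_redundant_delimiters([0, 1, 2, 0, 1, 0]))     # False
print(has_redundant_delimiters([0, 1, 2, 0, 0, 1, 0]))  # True
```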
Two of the existing tests do cover cases where runs of zeros were
left, but not at places where it matters (the Integer/String type, variable-size
itemsets tests remove a five at the end of the third sequence, leaving two zeros
instead of one).
I can, however, vouch that the previous code worked just fine: the old
implementation and this one produce the same results, and both match the
results I obtained with another standalone CP-based implementation. This
change simply makes the pre-processing more efficient.