[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955705#comment-15955705 ]

yuhao yang commented on SPARK-20203:

[~Syrux] Since you have some experience using PrefixSpan, I'd like your input (or, better, a contribution) on SPARK-20114: https://issues.apache.org/jira/browse/SPARK-20114

> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> ---
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
> Issue Type: Wish
> Components: MLlib
> Affects Versions: 2.1.0
> Reporter: Cyril de Vogelaere
> Priority: Trivial
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more
> user-friendly, at least for new users.
> Personally, when I run an algorithm, I expect it to find all solutions by
> default, and only a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10,
> which prevents all solutions from being printed when running on slightly
> larger datasets.
> I feel that this should be changed, but since it would change the default
> behavior of PrefixSpan, I think asking for the community's opinion should
> come first. So, what do you think?

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954904#comment-15954904 ]

Nick Pentreath commented on SPARK-20203:

I see there is a comment in the code that says: {{// TODO: support unbounded pattern length when maxPatternLength = 0}}. But the same thing can essentially be achieved by setting the pattern length to {{Int.MaxValue}}, as Sean has previously said, so I don't really think this is a valid work item (in fact, that comment should probably be removed).

Is an unbounded default really better (or worse) from an API / user-facing perspective? There are arguments either way, but to be honest I see nothing compelling enough to warrant a change here.
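The point made in the comment above, that an effectively unbounded search can already be requested by passing {{Int.MaxValue}} explicitly, can be sketched as follows. This is only an illustrative sketch, not code from the issue: it assumes the RDD-based `org.apache.spark.mllib.fpm.PrefixSpan` API with Spark on the classpath, and the local master, the object name, and the toy input sequences are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.PrefixSpan

object UnboundedPrefixSpanSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for illustration.
    val conf = new SparkConf()
      .setAppName("UnboundedPrefixSpanSketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical toy data: each sequence is an array of itemsets.
      val sequences = sc.parallelize(Seq(
        Array(Array(1, 2), Array(3)),
        Array(Array(1), Array(3, 2), Array(1, 2)),
        Array(Array(1, 2), Array(5)),
        Array(Array(6))
      ), 2).cache()

      // Explicitly lift the pattern-length cap instead of relying on
      // the default (10, according to the issue description).
      val prefixSpan = new PrefixSpan()
        .setMinSupport(0.5)
        .setMaxPatternLength(Int.MaxValue)

      val model = prefixSpan.run(sequences)

      // collect() brings every frequent pattern to the driver; with no
      // length cap this result can grow very large on real datasets.
      model.freqSequences.collect().foreach { fs =>
        val pattern = fs.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]")
        println(s"$pattern, ${fs.freq}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

On a small dataset like this one the cap makes no practical difference; the trade-off discussed in this thread only shows up when long frequent patterns exist and the result set grows accordingly.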
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953328#comment-15953328 ]

Cyril de Vogelaere commented on SPARK-20203:

Oh, I thought we were talking about the performance implications of adding an {{if}} that would be evaluated often. As for the issue you just pointed out, I agree it would be a major negative consequence of that change. Sorry, I didn't understand that this was what you were talking about.

Well, then I suppose we should resolve this thread as "won't fix", unless you think the potential user-friendliness outweighs that major drawback.
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953319#comment-15953319 ]

Sean Owen commented on SPARK-20203:

How can this not have performance implications? You generate more frequent patterns, potentially a lot more. You can see this even in the comments and error messages about collecting too many elements to the driver.
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953318#comment-15953318 ]

Cyril de Vogelaere commented on SPARK-20203:

I'm not splitting it; I deleted the other thread. I did agree that adding the special value zero might have a tiny negative effect on performance without adding any new functionality, so I closed it, following that line of thought.

This post is just about changing the default value, which, you agreed, can be discussed. That's a new context for discussion, so I created a new thread. This should make more sense, no?
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953299#comment-15953299 ]

Cyril de Vogelaere commented on SPARK-20203:

This cannot have performance implications: we are not changing anything but the default value.

It does change the number of solutions we are searching for, so of course it will take longer, since the search space is bigger. But on a dataset where it already found everything, it should still do so.

Now it would just find everything by default, which, I agree, should be debated, to decide whether that is really what we want the default behavior of the program to be.
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953297#comment-15953297 ]

Cyril de Vogelaere commented on SPARK-20203:

SPARK-20180 was about adding a special value (0) to find all patterns regardless of their length, and making it the default value. You pointed out that it might lower performance without adding any functionality, so I closed that thread.

This one is just about changing the default value, with no other changes in the code. You said it needed discussion, since it was a change in default behavior. But the number of comments on the last thread would discourage discussion, so I felt a new thread would be more appropriate.
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953289#comment-15953289 ]

Sean Owen commented on SPARK-20203:

This is again not addressing the point, that doing so has performance implications. Or could. That has to be established.
[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953280#comment-15953280 ]

Sean Owen commented on SPARK-20203:

I don't understand, this is the same as SPARK-20180?