[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955705#comment-15955705
 ] 

yuhao yang commented on SPARK-20203:


[~Syrux] Since you have some experience using PrefixSpan, I'd like to have 
your input (or, better, a contribution) in 
https://issues.apache.org/jira/browse/SPARK-20114 . 

> Change default maxPatternLength value to Int.MaxValue in PrefixSpan
> ---
>
> Key: SPARK-20203
> URL: https://issues.apache.org/jira/browse/SPARK-20203
> Project: Spark
> Issue Type: Wish
> Components: MLlib
> Affects Versions: 2.1.0
> Reporter: Cyril de Vogelaere
> Priority: Trivial
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> I think changing the default value to Int.MaxValue would be more 
> user-friendly, at least for new users.
> Personally, when I run an algorithm, I expect it to find all solutions by 
> default, and only a limited number of them when I set the parameters to do so.
> The current implementation limits the length of solution patterns to 10, 
> which prevents all solutions from being returned when running on slightly 
> larger datasets.
> I feel this should be changed, but since it would change the default 
> behavior of PrefixSpan, I think asking for the community's opinion should 
> come first. So, what do you think?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-04 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954904#comment-15954904
 ] 

Nick Pentreath commented on SPARK-20203:


I see there is a comment in the code that says: {{// TODO: support unbounded 
pattern length when maxPatternLength = 0}}. But the same thing can essentially 
be achieved by setting the pattern length to {{Int.MaxValue}}, as Sean has 
previously said, so I don't really think this is a valid work item (in fact, 
that comment should probably be removed).

Is an unbounded default really better (or worse) from an API / user facing 
perspective? There are arguments either way but to be honest I see nothing 
compelling enough to warrant a change here.
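
To illustrate why {{Int.MaxValue}} already behaves as "unbounded", here is a minimal sketch of a pattern-growth miner with a length cap. It is not MLlib's implementation: the names CapSketch, contains and mine are hypothetical, and sequences are simplified to lists of single items rather than itemsets. The recursion stops on its own once no extension is frequent, so a very large cap needs no special-case value such as 0.

```scala
object CapSketch {
  // Simplification: a sequence is a list of single items, not itemsets.
  type Sequence = List[String]

  /** True if `pattern` occurs in `seq` as an order-preserving subsequence. */
  def contains(seq: Sequence, pattern: Sequence): Boolean = pattern match {
    case Nil => true
    case head :: tail =>
      val idx = seq.indexOf(head)
      idx >= 0 && contains(seq.drop(idx + 1), tail)
  }

  /** Naive pattern growth: extend each frequent pattern by one item at a
    * time, and stop extending once a pattern reaches maxPatternLength. */
  def mine(db: List[Sequence], minCount: Int, maxPatternLength: Int): List[Sequence] = {
    val items: List[String] = db.flatten.distinct
    def grow(prefix: Sequence): List[Sequence] =
      if (prefix.length >= maxPatternLength) Nil
      else items.flatMap { item =>
        val candidate = prefix :+ item
        val support = db.count(s => contains(s, candidate))
        // Infrequent candidates are pruned, which bounds the recursion
        // even when maxPatternLength is Int.MaxValue.
        if (support >= minCount) candidate :: grow(candidate) else Nil
      }
    grow(Nil)
  }

  def main(args: Array[String]): Unit = {
    val db = List(List("a", "b", "c"), List("a", "b", "c"), List("a", "c"))
    println(mine(db, minCount = 2, maxPatternLength = 2).size)            // capped
    println(mine(db, minCount = 2, maxPatternLength = Int.MaxValue).size) // effectively unbounded
  }
}
```

In MLlib itself the cap is set through {{setMaxPatternLength}} on {{PrefixSpan}}; the sketch only mirrors the idea of the parameter.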



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953328#comment-15953328
 ] 

Cyril de Vogelaere commented on SPARK-20203:


Oh, I thought we were talking about the performance implications of adding an 
if that would be tested often.

For the issue you just pointed out, I agree it would be a major negative 
consequence of that change.
Sorry, I didn't understand that this was what you were talking about.

Well, then I suppose we should resolve this thread with a "won't fix". Unless 
you think the potential user-friendliness can outweigh that major drawback.



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953319#comment-15953319
 ] 

Sean Owen commented on SPARK-20203:
---

How can this not have performance implications? You generate more frequent 
patterns, potentially a lot more. You can see this even in the comments and 
error messages about collecting too many elements to the driver.
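
As a rough illustration of how much the output can grow, consider a worst case that is an assumption of this sketch, not a measurement: every sequence in the dataset is the same list of n distinct items, so every order-preserving subsequence is frequent. The number of patterns of length i is then C(n, i), and the cap decides how many of those terms are summed. The names PatternGrowth, choose and patternCount are hypothetical.

```scala
object PatternGrowth {
  /** Binomial coefficient C(n, k), using BigInt to avoid overflow. */
  def choose(n: Int, k: Int): BigInt =
    (1 to k).foldLeft(BigInt(1)) { (acc, i) => acc * (n - k + i) / i }

  /** Number of frequent patterns of length 1..maxPatternLength when all
    * sequences are the same list of n distinct items (worst case). */
  def patternCount(n: Int, maxPatternLength: Int): BigInt =
    (1 to math.min(n, maxPatternLength)).map(i => choose(n, i)).sum

  def main(args: Array[String]): Unit = {
    val n = 30
    println(s"cap 10:       ${patternCount(n, 10)}")
    println(s"Int.MaxValue: ${patternCount(n, Int.MaxValue)}")
  }
}
```

Real datasets are rarely this extreme, but the sketch shows why raising the cap can multiply the number of patterns that have to be generated and collected.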



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953318#comment-15953318
 ] 

Cyril de Vogelaere commented on SPARK-20203:


I'm not splitting it; I deleted the other thread.

I did agree that adding the zero special value might have a tiny negative 
effect on performance, without adding new functionality.
So I closed it, following that line of thought.

This post is just about changing the default value, which, you agreed, can be 
discussed.
That's a new context of discussion, so I created a new thread. This should make 
more sense, no?




[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953299#comment-15953299
 ] 

Cyril de Vogelaere commented on SPARK-20203:


This cannot have performance implications: we are not changing anything but the 
default value.
It does change the number of solutions we are searching for, so of course it 
will take longer, since the search space is bigger.

But on a dataset where it already found everything, it should still do so.
Now it would just find everything by default. Which, I agree, should be 
debated, to know whether that's really what we want the default behavior of the 
program to be.



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Cyril de Vogelaere (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953297#comment-15953297
 ] 

Cyril de Vogelaere commented on SPARK-20203:


SPARK-20180 was about adding a special value (0) to find all patterns no matter 
their length, and making it the default value.
You pointed out that it might lower performance without adding more 
functionality, so I closed that thread.

This one is just about changing the default value, with no other changes in the 
code.
You said it needed discussion, since it was a change in default behavior. But 
the number of comments on the last thread would discourage discussion, so I 
felt a new thread would be more appropriate.



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953289#comment-15953289
 ] 

Sean Owen commented on SPARK-20203:
---

This is again not addressing the point, that doing so has performance 
implications. Or could. That has to be established.



[jira] [Commented] (SPARK-20203) Change default maxPatternLength value to Int.MaxValue in PrefixSpan

2017-04-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953280#comment-15953280
 ] 

Sean Owen commented on SPARK-20203:
---

I don't understand, this is the same as SPARK-20180?


