[text] Longest common subsequence wrong result?

2017-03-31 Thread Sébastien Piller
Hi all,
If I call
new LongestCommonSubsequence().apply("xxx", "yyy")
I get 0 (correct).
If I call
new LongestCommonSubsequence().apply("Gandalf", "Sauron")
I get 2, which looks incorrect to me (I expected 1, since there is no 
sequence of 2 chars common to both strings). Is it a bug or expected behavior?
Thanks

Sent from my Samsung Galaxy smartphone.

RE : Re: [text] Longest common subsequence wrong result?

2017-03-31 Thread Sébastien Piller
The return type is Integer, documented as "longestCommonSubsequenceLength".


-------- Original message --------
From: paul womack
Date: 31.03.17 13:33 (GMT+01:00)
To: Commons Users List
Subject: Re: [text] Longest common subsequence wrong result?
Sébastien Piller wrote:
> Hi all,
> If I call
> new LongestCommonSubsequence().apply("xxx", "yyy")
> I get 0 (correct).
> If I call
> new LongestCommonSubsequence().apply("Gandalf", "Sauron")
> I get 2, which looks incorrect to me (I expected 1, since there is no 
> sequence of 2 chars common to both strings). Is it a bug or expected behavior?

What is the return type of the method?

  BugBear





RE : Re: [text] Longest common subsequence wrong result?

2017-03-31 Thread Sébastien Piller
Oh OK, my bad, I did not read carefully enough...
Maybe just put that note at the class level? If you only read the class 
description, it is not clear that there is a difference between the two concepts.
Thanks for the clarification!
-------- Original message --------
From: Rob Tompkins
Date: 31.03.17 13:34 (GMT+01:00)
To: Commons Users List
Subject: Re: [text] Longest common subsequence wrong result?
Hello Sébastien,

From what I can tell, this is expected behaviour. I think it hinges on 
the definition of “subsequence” differing from the definition of “substring.” 
A subsequence is an ordered list of elements derived by deleting some 
(possibly zero) elements from the original list. A substring is derived the 
same way, with the additional requirement that the remaining characters 
were adjacent in the original list.

So, in your example, “Gandalf” and “Sauron” share the subsequence {a, n}. 
But if we were to restrict to substrings, then the longest common substring 
would simply be a single character, e.g. {a}.
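
To make the distinction concrete, here is a minimal sketch in plain Java. It is not the Commons Text implementation; the class and method names (SubsequenceVsSubstring, lcsLength, longestCommonSubstringLength) are made up for illustration. It computes both lengths with the textbook dynamic-programming tables, so the two definitions can be compared side by side on the strings from this thread.

```java
// Illustrative sketch only: hypothetical names, not part of Commons Text.
final class SubsequenceVsSubstring {

    // Length of the longest common subsequence: matching characters must
    // keep their relative order but do not need to be adjacent.
    static int lcsLength(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    // Extend the best subsequence of the two shorter prefixes.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Skip one character from either side.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[left.length()][right.length()];
    }

    // Length of the longest common substring: matching characters must be
    // adjacent in both inputs, so a mismatch resets the run to zero.
    static int longestCommonSubstringLength(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        int best = 0;
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("Gandalf", "Sauron"));                    // 2 ("an")
        System.out.println(longestCommonSubstringLength("Gandalf", "Sauron")); // 1
        System.out.println(lcsLength("xxx", "yyy"));                           // 0
    }
}
```

The only difference between the two methods is what happens on a mismatch: the subsequence table carries the best value forward, while the substring table drops back to zero.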

I’ve tried to spell this out in the javadoc here 
(http://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LongestCommonSubsequence.html#logestCommonSubsequence-java.lang.CharSequence-java.lang.CharSequence-),
 but I suppose I should have been clearer in the documentation. 

Do let me know if you think there’s a way to better present these details.

Many thanks and all the best,
-Rob
