[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-20 Thread PascalSchumacher
Github user PascalSchumacher commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
@yufcuy: Thanks for the pull request. :+1: 

@kinow: Thanks for the review. I have added the link and replaced tabs with 
spaces after merging. Feel free to change/improve anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Ack @PascalSchumacher after @yufcuy 's feedback we can merge it and include 
in 3.x releases, and then think where, and if, we should move text-related code 
:-)

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread PascalSchumacher
Github user PascalSchumacher commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
@kinow I think this has to stay in lang. At least until lang 4.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-10-07 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Hi @yufcuy,

Sorry for the delay to look into this.

I looked at the first two implements this morning to refresh my memory. The 
first one creating the whole comparison table, and the second one with just the 
current and previous row. Both described in the Wikipedia page you linked 
(thanks for that).

Then I started reviewing your pull request, and I believe your 
implementation is correct :-) though I couldn't find the exact algorithm 
implementation description on Wikipedia. The best I could find was this page: 
http://blog.softwx.net/2014/12/optimizing-levenshtein-algorithm-in-c.html

The page mentioned above mentions the single-array approach. We could add a 
link to it in the Javadoc, as for the previous two implementations. What do you 
think?

There are other trivial changes regarding spelling, typos, tabs vs. spaces, 
etc. So I will add a few more comments, but all tests are passing, I see no 
regression (feature-wise, or in performance), and if you agree with the minor 
adjustments we may have to do, then I believe we are ready to merge it.

@britter should we keep it in [lang], or in [text]? Either way, I will 
replicate the change in the Levenshtein implementation in [text] :-)

Cheers
Bruno


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-24 Thread britter
Github user britter commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
@kinow can you take the lead in reviewing this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-17 Thread yufcuy
Github user yufcuy commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Hello, @britter @kinow   
The details of Levenshtein distance can be find at 
https://en.wikipedia.org/wiki/Levenshtein_distance, as it describe, the 
algorithm is compute a matrix to hold the Levenshtein distances between all 
prefixes of the first string and all prefixes of the second. For all i and j, 
d[i][j] will hold the Levenshtein distance between the first i characters of s 
and the first j characters of t. 
The previous impl use two matrix rows for the construction. When we 
calculate the value of d[i][j] we find only the value of d[i-1][j-1], d[i-1][j] 
and d[i][j-1] are used according to the algorithm. And we use a matrix row p[] 
for the construction, when calculate p[j] at the ith iteration, it actually 
calculate the value of d[i][j], and only d[i-1][j-1] is covered by p[j-1], so 
we use variable 'upper_left' to hold d[i-1][j-1], and use variable 'upper' to 
hold next 'upper_left'.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-14 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Though I also agree with @britter that some better description would help 
reviewing it :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-14 Thread kinow
Github user kinow commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
I think string distances were the one of the first things we thought about 
moving to [text]. Might be interesting to discuss implementing - if it makes 
sense - this change there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-14 Thread britter
Github user britter commented on the issue:

https://github.com/apache/commons-lang/pull/189
  
Hello @yufcuy,

can you provide more information as to why you think this change is 
necessary? Does it improve performance properties? Do you have benchmarked the 
new implementation against the old implementation?

Regards,
Benedikt


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-13 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/commons-lang/pull/189
  

[![Coverage 
Status](https://coveralls.io/builds/7859475/badge)](https://coveralls.io/builds/7859475)

Coverage increased (+0.02%) to 93.592% when pulling 
**aa5b88ec1012a738bfd24d21ed91e88b3edd094d on yufcuy:master** into 
**dad86bc0a29689fd29bf03b382a39621718e8b05 on apache:master**.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

2016-09-13 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/commons-lang/pull/189
  

[![Coverage 
Status](https://coveralls.io/builds/7859475/badge)](https://coveralls.io/builds/7859475)

Coverage increased (+0.02%) to 93.592% when pulling 
**aa5b88ec1012a738bfd24d21ed91e88b3edd094d on yufcuy:master** into 
**dad86bc0a29689fd29bf03b382a39621718e8b05 on apache:master**.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---