[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user HiuKwok commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r214529178 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- Hi all, Just a quick update finally I kind of figure the way out of this problem, while I am trying to make it happen (the implementation). But the concept is mainly Normalize incoming word at the beginning of the method call by ```Normalizer.normalize()``` ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user kinow commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r209907553 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- Yeah, a very slippery problem. We still have the option to simply document that the method does not work well with unicode strings. But first I would like to spend at least a few hours with paper and pencil (and eraser, because this could take a bit till I give up or find a possible way around it), and perhaps even check in the mailing list if other devs have any idea. I think you found a very interesting problem (*)! Keep the ideas coming if you have any on how to solve this issue :+1: _* if I had more time, I would possibly either go through other methods checking for that or, just try some fuzzifier approach to test the whole project ! Not aware of any static or dynamic analysis tool that does that_ ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user HiuKwok commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r209878957 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- @kinow It turn out the problem is large than that, because the incorrect length basically affecting the whole method, especially the part which involve parsing inside the while loop. https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5612-L5619 While string like following is entered, then Exception would be thrown on here instead. https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5613 ![image](https://user-images.githubusercontent.com/37996731/44082531-137cb1a0-9fe4-11e8-9219-8f373aebce67.png) ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user HiuKwok commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r209659530 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- @kinow May be try to store a original text.length before performance any upper || lower case operation and make use of it? haha, it's just a thought. Would try out later ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user kinow commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r209587514 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- Oohh, great testing @HiuKwok ! Thanks for sharing here. >Just a quick question what you mean by remove the length( ) mean? Would you mind to specify more on that? Sure. I think there could be a possibility to fix the issue by addressing how the length of the lower'ed/upper'ed text is used https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5603 So maybe there could be another way to work around the way we use the strings lengths, and avoid the exception. ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user HiuKwok commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r209574053 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- Hi @kinow, yes you are right after I did try to come up with a draft java main to generate all || most unicode in string and compare it's length between original, toLowerCase() and to UpperCase(). ![image](https://user-images.githubusercontent.com/37996731/44028895-032488d4-9f2e-11e8-839b-c1259a182b8e.png) It's seem like no matter which one we pick (toLowerCase || toUpperCase), it would still tend to come up with a incorrect length. Just a quick question what you mean by remove the ```length( )``` mean? Would you mind to specify more on that? Thanks, ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
Github user kinow commented on a diff in the pull request: https://github.com/apache/commons-lang/pull/340#discussion_r208851249 --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java --- @@ -5596,8 +5596,8 @@ private static String replace(final String text, String searchString, final Stri } String searchText = text; if (ignoreCase) { - searchText = text.toLowerCase(); - searchString = searchString.toLowerCase(); + searchText = text.toUpperCase(); + searchString = searchString.toUpperCase(); --- End diff -- Just leaving a comment here too to have a review here in GitHub. While your example works, as the character is considered already in upper case, the reverse case would still fail after changing from `toLowerCase` to `toUpperCase`. So I think we should find another solution or update the documentation stating how the code works with unicode. ---
[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...
GitHub user HiuKwok opened a pull request: https://github.com/apache/commons-lang/pull/340 [LANG-1406] StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase Fix for Lang-1406 to avoid any exception while performing String.UTils.replaceIgnoreCase() against uniCode String object. Plz let me if there have any extra things need to be done for this PR since I am the first timer for commons-lang project( add more test case?). All the best You can merge this pull request into a Git repository by running: $ git pull https://github.com/HiuKwok/commons-lang master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/commons-lang/pull/340.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #340 commit e0f6c7802b5e721019a602bf30b31c79dbf6d233 Author: Hiu Kwok Date: 2018-08-08T11:44:51Z toUpperCase() > toLowerCase() to avoid unicode string length miscalculation commit 590f90889bf61a5570bd98b78e73410a07d7410b Author: Hiu Kwok Date: 2018-08-08T11:46:29Z Assertion for example mentioned on LANG-1406 Description ---