[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-09-01 Thread HiuKwok
Github user HiuKwok commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r214529178
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

Hi all, Just a quick update finally I kind of figure the way out of this 
problem,
while I am trying to make it happen (the implementation).
But the concept is mainly Normalize incoming word at the beginning of the 
method call by ```Normalizer.normalize()```


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-14 Thread kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r209907553
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

Yeah, a very slippery problem. We still have the option to simply document 
that the method does not work well with unicode strings. 

But first I would like to spend at least a few hours with paper and pencil 
(and eraser, because this could take a bit till I give up or find a possible 
way around it), and perhaps even check in the mailing list if other devs have 
any idea.

I think you found a very interesting problem (*)! Keep the ideas coming if 
you have any on how to solve this issue :+1: 

_* if I had more time, I would possibly either go through other methods 
checking for that or, just try some fuzzifier approach to test the whole 
project ! Not aware of any static or dynamic analysis tool that does that_


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-14 Thread HiuKwok
Github user HiuKwok commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r209878957
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

@kinow It turn out the problem is large than that, because the incorrect 
length basically affecting the whole method, especially the part which involve 
parsing inside the while loop.


https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5612-L5619

While string like following is entered, then Exception would be thrown on 
here instead.

https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5613


![image](https://user-images.githubusercontent.com/37996731/44082531-137cb1a0-9fe4-11e8-9219-8f373aebce67.png)






---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-13 Thread HiuKwok
Github user HiuKwok commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r209659530
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

@kinow May be try to store a original text.length before performance any 
upper || lower case operation and make use of it?

haha, it's just a thought. Would try out later 


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-13 Thread kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r209587514
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

Oohh, great testing @HiuKwok ! Thanks for sharing here.

>Just a quick question what you mean by remove the length( ) mean? Would 
you mind to specify more on that?

Sure. I think there could be a possibility to fix the issue by addressing 
how the length of the lower'ed/upper'ed text is used 
https://github.com/apache/commons-lang/blob/590f90889bf61a5570bd98b78e73410a07d7410b/src/main/java/org/apache/commons/lang3/StringUtils.java#L5603

So maybe there could be another way to work around the way we use the 
strings lengths, and avoid the exception.


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-13 Thread HiuKwok
Github user HiuKwok commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r209574053
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

Hi @kinow, yes you are right after I did try to come up with a draft java 
main to generate all || most unicode in string and compare it's length between 
original, toLowerCase() and to UpperCase().


![image](https://user-images.githubusercontent.com/37996731/44028895-032488d4-9f2e-11e8-839b-c1259a182b8e.png)
  
It's seem like no matter which one we pick (toLowerCase || toUpperCase), it 
would still tend to come up with a incorrect length.

Just a quick question what you mean by remove the ```length( )``` mean? 
Would you mind to specify more on that? 

Thanks, 


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-09 Thread kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/commons-lang/pull/340#discussion_r208851249
  
--- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
@@ -5596,8 +5596,8 @@ private static String replace(final String text, 
String searchString, final Stri
  }
  String searchText = text;
  if (ignoreCase) {
- searchText = text.toLowerCase();
- searchString = searchString.toLowerCase();
+ searchText = text.toUpperCase();
+ searchString = searchString.toUpperCase();
--- End diff --

Just leaving a comment here too to have a review here in GitHub. While your 
example works, as the character is considered already in upper case, the 
reverse case would still fail after changing from `toLowerCase` to 
`toUpperCase`. So I think we should find another solution or update the 
documentation stating how the code works with unicode.


---


[GitHub] commons-lang pull request #340: [LANG-1406] StringIndexOutOfBoundsException ...

2018-08-08 Thread HiuKwok
GitHub user HiuKwok opened a pull request:

https://github.com/apache/commons-lang/pull/340

[LANG-1406]  StringIndexOutOfBoundsException in 
StringUtils.replaceIgnoreCase

Fix for Lang-1406 to avoid any exception while performing 
String.UTils.replaceIgnoreCase() against uniCode String object. 

Plz let me if there have any extra things need to be done for this PR since 
I am the first timer for commons-lang project( add more test case?).

All the best

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HiuKwok/commons-lang master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/commons-lang/pull/340.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #340


commit e0f6c7802b5e721019a602bf30b31c79dbf6d233
Author: Hiu Kwok 
Date:   2018-08-08T11:44:51Z

toUpperCase() > toLowerCase() to avoid unicode string length miscalculation

commit 590f90889bf61a5570bd98b78e73410a07d7410b
Author: Hiu Kwok 
Date:   2018-08-08T11:46:29Z

Assertion for example mentioned on LANG-1406 Description




---