[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15501419#comment-15501419 ]

ASF GitHub Bot commented on LANG-910:
-------------------------------------

Github user PascalSchumacher commented on the issue:

    https://github.com/apache/commons-lang/pull/184

    the non-breaking space replacement was added with https://issues.apache.org/jira/browse/LANG-910

> Patch to extend StringUtils
> ---------------------------
>
>                 Key: LANG-910
>                 URL: https://issues.apache.org/jira/browse/LANG-910
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.1
>         Environment: Developed on Ubuntu 13.04 with openjdk 7u25
>            Reporter: Timur Yarosh
>              Labels: patch
>             Fix For: Discussion
>
>         Attachments: LANG-910.patch, substring-matches-and-white-space-normalize.patch
>
>
> This patch extends StringUtils capabilities: added methods to find substring(s) by Pattern. Also method org.apache.commons.lang3.StringUtils#normalizeSpace now replaces ASCII #160 char to normal whitespace.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992593#comment-14992593 ]

Pascal Schumacher commented on LANG-910:
----------------------------------------

Replacing hard space (ASCII #160) with normal whitespace is an enhancement, but not normalizing mixtures of hard spaces and other whitespace is not o.k. [LANG-1184].
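To make the concern about mixed runs concrete, here is a minimal sketch, not the Commons Lang implementation, of a normalization that treats the hard space like any other whitespace, so that a hard space next to a normal space collapses to a single space. The hard space is U+00A0 (code point 160, which lies outside 7-bit ASCII). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the StringUtils.normalizeSpace code.
final class HardSpaceNormalizer {

    // A run of regex whitespace and/or non-breaking spaces (U+00A0).
    private static final Pattern MIXED_RUN = Pattern.compile("[\\s\\u00A0]+");

    // Collapses every mixed run to one ordinary space, then trims both ends.
    static String normalize(String text) {
        if (text == null) {
            return null;
        }
        return MIXED_RUN.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Hard space followed by a normal space, plus a trailing hard space.
        System.out.println("[" + normalize("a\u00A0 b\u00A0") + "]"); // prints [a b]
    }
}
```

Matching the hard space inside the same character class as the other whitespace is what makes mixtures collapse; replacing it in a separate pass after the collapse would leave two adjacent spaces.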
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801994#comment-13801994 ]

Matt Benson commented on LANG-910:
----------------------------------

Followup: Your example suggested that a user looking for "kins" would intuitively supply 2 as the final argument. Maybe so, but I think the _correct_ behavior would really be: Yes, support pulling in each matching subsequence, but rather than:

{code}
assertEquals("kins", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile(() ), 2));
{code}

The correct assertion would be:

{code}
assertEquals("kins", StringUtils.substringsMatching("two little pumpkins sitting over there", Pattern.compile(() ), 1)[1]);
{code}

Less intuitive perhaps, but IMO more defensible in terms of "why did we do it this way?". We provide the required functionality but the path from {{Pattern}}/{{Matcher}} APIs is fairly evident.

> Patch to extend StringUtils
> ---------------------------
>
>                 Key: LANG-910
>                 URL: https://issues.apache.org/jira/browse/LANG-910
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.1
>         Environment: Developed on Ubuntu 13.04 with openjdk 7u25
>            Reporter: Timur Yarosh
>              Labels: patch
>             Fix For: 3.2, Discussion
>
>         Attachments: LANG-910.patch, substring-matches-and-white-space-normalize.patch

--
This message was sent by Atlassian JIRA
(v6.1#6144)
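The indexing in the second assertion above can be sketched with plain java.util.regex. This is an illustration, not the attached patch: the method body is a guess at the proposed semantics, and the pattern `(.{4}) ` is an assumption standing in for the regex that is missing from the examples (four characters followed by a space).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed method; not the code from LANG-910.patch.
final class SubstringsMatchingSketch {

    // Returns the text captured by the given group for every successive match.
    static String[] substringsMatching(String str, Pattern pattern, int group) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            found.add(matcher.group(group));
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumed pattern: four characters followed by a space.
        Pattern pattern = Pattern.compile("(.{4}) ");
        String[] all = substringsMatching("two little pumpkins sitting over there", pattern, 1);
        System.out.println(all[0]); // prints ttle
        System.out.println(all[1]); // prints kins
    }
}
```

In this reading the `int` argument keeps its Matcher meaning (which capture group), and the array index selects which match, so the two numbers in `..., 1)[1]` answer different questions.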
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802010#comment-13802010 ]

Henri Yandell commented on LANG-910:
------------------------------------

Gotya. Sounds reasonable. I agree with the [0] group being supported btw.

I'm going to drop the 3.2 off this for the moment as the patch needs more work. I also wonder if it should go in a PatternUtils.
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800660#comment-13800660 ]

Matt Benson commented on LANG-910:
----------------------------------

My primary comment is that I find the method naming uncomfortable, particularly in the case of those returning {{String[]}}. I would prefer this family of names instead be called {{substring\[s]Match}}_ing_.

To your last question, I feel as though I'm not understanding what you feel is less natural about the way the proposed API works. Can you elaborate on the context of your example?
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801043#comment-13801043 ]

Henri Yandell commented on LANG-910:
------------------------------------

+1 to 'ing'.

On my "less natural" statement, let's use the example:

    assertEquals("ttle", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile(() )));

Here the pattern finds four characters, followed by a space. So the first one it finds is the "ttle" on the end of "little". So let's skip to the next word, just like we would with substring:

    assertEquals("kins", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile(() ), 1));

More fool us, because regex capture groups are 1-indexed and not 0-indexed, the answer is "ttle" again. Okay, so let's increment our start index:

    assertEquals("kins", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile(() ), 2));

This test fails. The result was null and not "kins". This is because there is only one group in (), regardless of how often it matches. So using the substringMatching version never allows us to get to "kins". Moving to substringsMatching would, in that we would get an array back for either the undeclared index or the 1-index. Nothing else.

My concern is that this is confusing and not the simplicity a user would be aiming for with the code.
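The three attempts above can be replayed against java.util.regex directly. This is a sketch under one assumption: the regex is missing from the examples, so `(.{4}) ` (four characters, then a space) is used as a stand-in. In the plain Matcher API the group number selects a parenthesized group within one match, while reaching the next match takes another call to find(). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
final class GroupVersusMatch {

    // Assumed stand-in for the regex that is not shown in the thread.
    static final Pattern FOUR_THEN_SPACE = Pattern.compile("(.{4}) ");

    // Returns group 1 of the n-th match (0-based), or null if there are fewer matches.
    static String nthMatch(String input, int n) {
        Matcher matcher = FOUR_THEN_SPACE.matcher(input);
        for (int i = 0; matcher.find(); i++) {
            if (i == n) {
                return matcher.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "two little pumpkins sitting over there";
        Matcher matcher = FOUR_THEN_SPACE.matcher(input);
        matcher.find();
        System.out.println(matcher.groupCount()); // prints 1: one group, however many matches
        System.out.println(matcher.group(1));     // prints ttle
        // matcher.group(2) would throw IndexOutOfBoundsException: the pattern has no group 2.
        System.out.println(nthMatch(input, 1));   // prints kins
    }
}
```

Note that a bare Matcher throws for a nonexistent group, whereas the patched method is reported above to return null, so the patch presumably guards the group index itself.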
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801093#comment-13801093 ]

Matt Benson commented on LANG-910:
----------------------------------

Thanks for the clear example. I suppose that as long as you're wrapping up {{Pattern}}/{{Matcher}} functionality it shouldn't hurt to do this. I do feel as though the method variants that do not accept a capture group parameter should specify 0 internally to imply the entire matched pattern.
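A sketch of the overload arrangement suggested here, with hypothetical names and signatures rather than the ones in the attached patch: the variant without a group parameter passes 0, which in java.util.regex denotes the entire match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical overloads; not the code from the attached patch.
final class SubstringMatchingSketch {

    // No group given: delegate with 0, i.e. the whole matched text.
    static String substringMatching(String str, Pattern pattern) {
        return substringMatching(str, pattern, 0);
    }

    // Returns the requested group of the first match, or null when the input
    // is null, nothing matches, or the pattern has no such group.
    static String substringMatching(String str, Pattern pattern, int group) {
        if (str == null || pattern == null) {
            return null;
        }
        Matcher matcher = pattern.matcher(str);
        if (!matcher.find() || group < 0 || group > matcher.groupCount()) {
            return null;
        }
        return matcher.group(group);
    }
}
```

With an assumed pattern of `(.{4}) ` and the "two little pumpkins sitting over there" input, the no-group variant would return "ttle " including the trailing space, group 1 would return "ttle", and group 2 would return null.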
[jira] [Commented] (LANG-910) Patch to extend StringUtils
[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793599#comment-13793599 ]

Henri Yandell commented on LANG-910:
------------------------------------

I've committed the whitespace change. Reviewing the new Pattern methods.

> Patch to extend StringUtils
> ---------------------------
>
>                 Key: LANG-910
>                 URL: https://issues.apache.org/jira/browse/LANG-910
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.1
>         Environment: Developed on Ubuntu 13.04 with openjdk 7u25
>            Reporter: Timur Yarosh
>              Labels: patch
>             Fix For: 3.2
>
>         Attachments: substring-matches-and-white-space-normalize.patch