Re: RFR: 8197594 - String and character repeat
On 2/18/18 1:37 AM, James Laskey wrote: Didn’t I hear someone mentioning “\U1D11A” at some point? On 2/19/18 7:55 AM, Martin Buchholz wrote: Oops, I already got it wrong - it's already at 6 hex digits because there are 17 planes, not 16. MAX_CODE_POINT is U+10. Yes, we need a variable width syntax like regex \x{h...h} Yeah, there are a bunch of syntactic alternatives to consider. An "obvious" alternative to "\u" is "\Uxx" which works if you're always willing to specify six digits (or to have some weird non-delimited but variable-length sequence, such as the existing octal escapes for chars (does anybody use those (see JLS 3.10.6)?)) The difference between \u and \U is rather subtle, though. Or a delimited sequence such as used by regex might be an alternative. And java regex also supports \N{name}The character with Unicode character name 'name' so we could do the same for the java language. Although it would be a little weird to have every Unicode update make some previously invalid source files valid. We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in source files directly. No Unicode escapes needed. Even if one is willing to have a source file in UTF-8 (as opposed to say, ASCII) there are things in Unicode that are really hard to edit. For example, there are zero-width spaces, joiners, non-joiners, directionality markers, etc. I think escapes are the bare minimum. Some kind of name-based interpolation would be useful, but the actual Unicode names are rather unwieldy. Maybe something like HTML entities would be worthwhile to investigate, though probably with a different syntax. s'marks
Re: RFR: 8197594 - String and character repeat
To close the loop on this, I've reopened JDK-4993841, which requests adding an API Character.toString(int) which converts an int codepoint value to a String. (This seems like the obvious API, in parallel with Character.toString(char), but of course alternatives could be considered.) s'marks On 2/20/18 11:46 AM, Louis Wasserman wrote: I'm with Brian: adding a separate API to make it easier to get from a codepoint to a String seems independently merited, and makes the single repeat API work well for that case. A very quick regex-powered search comes up with 183 hits in Google for (new String|String.copyValueOf|String.valueOf)(Character.toChars(..)). I do, however, recommend a separate thread for discussing that API :) On Tue, Feb 20, 2018 at 11:33 AM Kevin Bourrillion wrote: Just to add another dimension to this data: most of the usages of our repeat method (~75%) are in test code. These tests usually just want any old test string of a certain length. Repeating a single character is the obvious way to get that. Among production code usages (~25%), there are a few roughly equal use cases: ascii indentation/alignment, redaction, and Martin's expected case of "drawing" with ASCII symbols, and "other". On Thu, Feb 15, 2018 at 12:52 PM, Louis Wasserman wrote: I don't think there's a case for demand to merit having a repeat(CharSequence, int) at all. I did an analysis of usages of Guava's Strings.repeat on Google's codebase. Users might be rolling their own implementations, too, but this should be a very good proxy for demand. StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1 StringRepeat_SingleConstantCodePoint = 28 // strings with .codePointCount(...) == 1 StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither of the above StringRepeat_CharSequenceToString = 2 // Strings.repeat(CharSequence.toString(), n) StringRepeat_NoneOfTheAbove = 248 Notably, it seems like basically nobody needs to repeat a CharSequence -- definitely not enough demand to merit the awkwardness of e.g. Rope.repeat(n) inheriting a repeat returning a String. Based on this data, I'd recommend providing one and only one method of this type: String.repeat(int). There's no real advantage to a static repeat(char, int) method when the overwhelming majority of these are constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n). Character.toString(c).repeat(n) isn't a bad workaround if you don't have a constant char. There also isn't much demand for dealing with the code point case specially; the String.repeat(int) method seems like it'd handle that just fine. On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey wrote: On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov < ivan.gerasi...@oracle.com> wrote: Hello! The link with the webrev returned 404, but I could find it at this location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ A few minor comments: 1) This check: 2992 final long limit = (long)count * 2L; 2993 if ((long)Integer.MAX_VALUE < limit) { can be possibly simplified as if (count > Integer.MAX_VALUE - count) { Good. 2) Should String repeat(final int codepoint, final int count) be optimized for codepoints that can be represented with a single char? E.g. like this: public static String repeat(final int codepoint, final int count) { return Character.isBmpCodePoint(codepoint)) ? repeat((char) codepoint, count) : (new String(Character.toChars(codepoint))).repeat(count); } Yes, avoid array allocation. 3) Using long arithmetic can possibly be avoided in the common path of repeat(final int count): E.g. like this: if (count < 0) { throw new IllegalArgumentException("count is negative, " + count); } else if (count == 1) { return this; } else if (count == 0) { return ""; } final int len = value.length; if (Integer.MAX_VALUE / count < len) { throw new IllegalArgumentException( "Resulting string exceeds maximum string length: " + ((long)len * (long)count)); } final int limit = count * len; Good. Thank you. With kind regards, Ivan On 2/15/18 9:20 AM, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) For the sake of transparency, only 1 is necessary, 2-4 are convenience methods. In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). 3 and 4 convert to String before calling 1. Performance runs with jmh (results as comment in [2]) show that these methods are significantly faster that Str
Re: RFR: 8197594 - String and character repeat
. On Sun, Feb 18, 2018 at 11:19 AM, Martin Buchholz wrote: > > - how many digits to consume after the escape? How much do we trust > Unicode to never ever grow beyond 5 hex digits? > Oops, I already got it wrong - it's already at 6 hex digits because there are 17 planes, not 16. MAX_CODE_POINT is U+10. Yes, we need a variable width syntax like regex \x{h...h} And java regex also supports \N{name} The character with Unicode character name 'name' so we could do the same for the java language. Although it would be a little weird to have every Unicode update make some previously invalid source files valid. We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in source files directly. No Unicode escapes needed.
Re: RFR: 8197594 - String and character repeat
On 2/18/18, 11:19 AM, Martin Buchholz wrote: On Sun, Feb 18, 2018 at 1:37 AM, James Laskey wrote: Didn’t I hear someone mentioning “\U1D11A” at some point? Unicode codepoint escapes are still Not a Thing, apparently. Yes, it's a language feature, but an easy one to implement. Except for the usual bikeshedding about syntax - include the '+'? - how many digits to consume after the escape? How much do we trust Unicode to never ever grow beyond 5 hex digits? |now we have some "similar" already in regex \x|{h...h} :-) It's hard to say "never" given we are adding so many emoji in Unicode but it will take a while to fill those unsigned space. / /
Re: RFR: 8197594 - String and character repeat
On Sun, Feb 18, 2018 at 1:37 AM, James Laskey wrote: > Didn’t I hear someone mentioning “\U1D11A” at some point? > Unicode codepoint escapes are still Not a Thing, apparently. Yes, it's a language feature, but an easy one to implement. Except for the usual bikeshedding about syntax - include the '+'? - how many digits to consume after the escape? How much do we trust Unicode to never ever grow beyond 5 hex digits?
Re: RFR: 8197594 - String and character repeat
Am 18.02.2018 um 06:10 schrieb Stuart Marks: Fair enough. I'll be less unhappy if there is a way to convert from a code point to a String, as requested by JDK-4993841. This will reduce new String(Character.toChars(codepoint)).repeat(count) to Character.toString(codepoint).repeat(count) Shorter and maybe more logical to get a String by a String constructor, instead overloading toString() of Character: String(codepoint).repeat(count) -Ulf
Re: RFR: 8197594 - String and character repeat
Didn’t I hear someone mentioning “\U1D11A” at some point? Sent from my iPhone > On Feb 18, 2018, at 1:10 AM, Stuart Marks wrote: > > Fair enough. I'll be less unhappy if there is a way to convert from a code > point to a String, as requested by JDK-4993841. This will reduce > >new String(Character.toChars(codepoint)).repeat(count) > > to > >Character.toString(codepoint).repeat(count) > > But this is still fairly roundabout. Since most cases are constants, the > advice is to use a string literal instead of a char literal. This works for > BMP characters, e.g. "-".repeat(10) or "\u2501".repeat(15). But if I want a > non-BMP character as a string literal, I have encode it into a surrogate pair > myself. For example, a string literal containing the character U+1D11A > MUSICAL SYMBOL FIVE-LINE STAFF would be "\uD834\uDD1A". Ugh! Or, I could just > call a function and live with it not being a constant. It would be nice if > there were an escape sequence that allowed any Unicode code point, including > supplementary characters, to be put to n a string literal. > > s'marks > >> On Feb 16, 2018, at 18:02, Brian Goetz wrote: >> >> Disagree. >> >> On #3, most of the time the char being repeated is already a literal. So >> just make it a string. >> >> On #2, better to aim for string.ofCodePoint(int) and compose w repeat. >> >> Down to one method again :) >> >> Sent from my MacBook Wheel >> >>> On Feb 16, 2018, at 5:13 PM, Stuart Marks wrote: >>> >>> Let me put in an argument for handling code points: >>> 3. public static String repeat(final int codepoint, final int count) >>> >>> Most of the String and Character API handles code points on an equal >>> footing with chars. I think this is important, as over time Unicode is >>> continuing to add supplementary characters -- those that can't be >>> represented in a Java char value. Examples abound of how such characters >>> are mishandled. Therefore, I believe Java APIs should have full support for >>> code points. >>> >>> This is a small thing, and some might consider it a rare case -- how often >>> does one need to repeat something like an emoji? The issue however isn't >>> that particular use case. Instead what's required is the ability to handle >>> *any Unicode character* uniformly, regardless of whether or not it's a >>> supplementary character. The way to do that is to deal with code points, so >>> any Java API that deals with character data must also handle code points. >>> >>> If we were to add just one method: >>> 1. public String repeat(final int count) >>> >>> the workaround is to take the character, turn it into a string, and call >>> the repeat() method on it. For a 'char' value, this isn't too bad, but I'd >>> argue it isn't pretty either: >>> >>> Character.toString(charVal).repeat(n) >>> >>> But this only handles BMP characters, not supplementary characters. >>> Unfortunately, there's no direct way to turn a code point into a string -- >>> you have to turn it into a byte array first! Thus, to get a string from a >>> code point and repeat it, you have to do this: >>> >>> new String(Character.toChars(codepoint)).repeat(count) >>> >>> This is enough indirection that it's hard to discover, and I suspect that >>> most people won't put in the effort to do this correctly, resulting in more >>> code that mishandles supplementary characters. >>> >>> Thus, I think we need to add API #3 that performs the repeat function on >>> code points. >>> >>> (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, >>> which is closed. I think I'll reopen it.) >>> 2. public static String repeat(final char ch, final int count) >>> >>> I can see that this API is not as important as one that handles code >>> points, and it seems to be less frequently used according to Louis W's >>> analysis. But if you have char data you want to repeat, not having this >>> seems like an omission; it seems backwards to have to create a string from >>> the char, only for repeat() to extract that char from that String in order >>> to repeat it. Thus I've vote for inclusion of this method as well. >>> >>> s'marks >>> >>> On 2/16/18 5:10 AM, Jim Laskey wrote: We’re going with the one instance method (Louis clinched it.) with recommended enhancements and not touching CharSequence. Working it up now. — Jim > On Feb 16, 2018, at 7:46 AM, Alan Bateman wrote: > > On 15/02/2018 17:20, Jim Laskey wrote: >> This is a pre-CSR code review [1] for String repeat methods >> (Enhancement). >> >> The proposal is to introduce four new methods; >> >> 1. public String repeat(final int count) >> 2. public static String repeat(final char ch, final int count) >> 3. public static String repeat(final int codepoint, final int count) >> 4. public static String repeat(final CharSequence seq, final int count) >> > Just catching
Re: RFR: 8197594 - String and character repeat
Fair enough. I'll be less unhappy if there is a way to convert from a code point to a String, as requested by JDK-4993841. This will reduce new String(Character.toChars(codepoint)).repeat(count) to Character.toString(codepoint).repeat(count) But this is still fairly roundabout. Since most cases are constants, the advice is to use a string literal instead of a char literal. This works for BMP characters, e.g. "-".repeat(10) or "\u2501".repeat(15). But if I want a non-BMP character as a string literal, I have encode it into a surrogate pair myself. For example, a string literal containing the character U+1D11A MUSICAL SYMBOL FIVE-LINE STAFF would be "\uD834\uDD1A". Ugh! Or, I could just call a function and live with it not being a constant. It would be nice if there were an escape sequence that allowed any Unicode code point, including supplementary characters, to be put to n a string literal. s'marks > On Feb 16, 2018, at 18:02, Brian Goetz wrote: > > Disagree. > > On #3, most of the time the char being repeated is already a literal. So > just make it a string. > > On #2, better to aim for string.ofCodePoint(int) and compose w repeat. > > Down to one method again :) > > Sent from my MacBook Wheel > >> On Feb 16, 2018, at 5:13 PM, Stuart Marks wrote: >> >> Let me put in an argument for handling code points: >> >>> 3. public static String repeat(final int codepoint, final int count) >> >> Most of the String and Character API handles code points on an equal footing >> with chars. I think this is important, as over time Unicode is continuing to >> add supplementary characters -- those that can't be represented in a Java >> char value. Examples abound of how such characters are mishandled. >> Therefore, I believe Java APIs should have full support for code points. >> >> This is a small thing, and some might consider it a rare case -- how often >> does one need to repeat something like an emoji? The issue however isn't >> that particular use case. Instead what's required is the ability to handle >> *any Unicode character* uniformly, regardless of whether or not it's a >> supplementary character. The way to do that is to deal with code points, so >> any Java API that deals with character data must also handle code points. >> >> If we were to add just one method: >> >>> 1. public String repeat(final int count) >> >> the workaround is to take the character, turn it into a string, and call the >> repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue >> it isn't pretty either: >> >> Character.toString(charVal).repeat(n) >> >> But this only handles BMP characters, not supplementary characters. >> Unfortunately, there's no direct way to turn a code point into a string -- >> you have to turn it into a byte array first! Thus, to get a string from a >> code point and repeat it, you have to do this: >> >> new String(Character.toChars(codepoint)).repeat(count) >> >> This is enough indirection that it's hard to discover, and I suspect that >> most people won't put in the effort to do this correctly, resulting in more >> code that mishandles supplementary characters. >> >> Thus, I think we need to add API #3 that performs the repeat function on >> code points. >> >> (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, >> which is closed. I think I'll reopen it.) >> >>> 2. public static String repeat(final char ch, final int count) >> >> I can see that this API is not as important as one that handles code points, >> and it seems to be less frequently used according to Louis W's analysis. But >> if you have char data you want to repeat, not having this seems like an >> omission; it seems backwards to have to create a string from the char, only >> for repeat() to extract that char from that String in order to repeat it. >> Thus I've vote for inclusion of this method as well. >> >> s'marks >> >> >>> On 2/16/18 5:10 AM, Jim Laskey wrote: >>> We’re going with the one instance method (Louis clinched it.) with >>> recommended enhancements and not touching CharSequence. >>> Working it up now. >>> — Jim On Feb 16, 2018, at 7:46 AM, Alan Bateman wrote: On 15/02/2018 17:20, Jim Laskey wrote: > This is a pre-CSR code review [1] for String repeat methods (Enhancement). > > The proposal is to introduce four new methods; > > 1. public String repeat(final int count) > 2. public static String repeat(final char ch, final int count) > 3. public static String repeat(final int codepoint, final int count) > 4. public static String repeat(final CharSequence seq, final int count) > Just catching up on this thread and it's hard to see where the bidding is currently at. Are you planning to send an updated proposal, a list of methods is fine, even if it's just one, is okay (implementation can follow later). -Alan >> >
Re: RFR: 8197594 - String and character repeat
> On Feb 17, 2018, at 9:40 AM, Martin Buchholz wrote: > > Let me join the chorus of agreement with Brian here. > > The most popular use case will forever be ASCII line of non-letter symbols. We may also wish to have a repeating version on StringBuilder, though: sb.append(INDENT_CHARS, indentLevel) (or a default method on Appendable with the same effect.) By way of background … my primary motivation for these sorts of methods is to take things that require execution as _statements_ (loops, if-then, etc) and turn them into _expressions_, not primarily because they are more compact, but because they are then _composible_. Repeating a string today requires a loop (yes, I know you can do it with a stream expression), which means it can’t be done inline as a method parameter, requiring you to potentially unroll a deeply nested expression just to create a statement context to do this bit of paperwork.
Re: RFR: 8197594 - String and character repeat
Let me join the chorus of agreement with Brian here. The most popular use case will forever be ASCII line of non-letter symbols. Another way to think about codepoints is as conversion between UTF-32 and UTF-16, or as general support for text in UTF-32 format, which is not supported well within the JDK (and we probably should not do too much work in this direction). On Sat, Feb 17, 2018 at 1:50 AM, Remi Forax wrote: > - Mail original - > > De: "Brian Goetz" > > > I really can’t see the value of more than one method. If we need other > forms > > they should be for constructing strings not repeating strings. > > > > Sent from my MacBook Wheel > > I fully agree. > > Rémi
Re: RFR: 8197594 - String and character repeat
- Mail original - > De: "Brian Goetz" > À: "Xueming Shen" > Cc: "core-libs-dev" > Envoyé: Samedi 17 Février 2018 03:30:10 > Objet: Re: RFR: 8197594 - String and character repeat > I really can’t see the value of more than one method. If we need other forms > they should be for constructing strings not repeating strings. > > Sent from my MacBook Wheel I fully agree. Rémi > >> On Feb 16, 2018, at 6:12 PM, Xueming Shen wrote: >> >>> On 2/16/18, 5:13 PM, Stuart Marks wrote: >>> Let me put in an argument for handling code points: >>> >>>> 3. public static String repeat(final int codepoint, final int count) >>> >>> Most of the String and Character API handles code points on an equal footing >>> with chars. I think this is important, as over time Unicode is continuing to >>> add supplementary characters -- those that can't be represented in a Java >>> char >>> value. Examples abound of how such characters are mishandled. Therefore, I >>> believe Java APIs should have full support for code points. >>> >>> This is a small thing, and some might consider it a rare case -- how often >>> does >>> one need to repeat something like an emoji? The issue however isn't that >>> particular use case. Instead what's required is the ability to handle *any >>> Unicode character* uniformly, regardless of whether or not it's a >>> supplementary >>> character. The way to do that is to deal with code points, so any Java API >>> that >>> deals with character data must also handle code points. >>> >>> If we were to add just one method: >>> >>>> 1. public String repeat(final int count) >>> >>> the workaround is to take the character, turn it into a string, and call the >>> repeat() method on it. For a 'char' value, this isn't too bad, but I'd >>> argue it >>> isn't pretty either: >>> >>>Character.toString(charVal).repeat(n) >> >> How about >> >> public static repeat(int count, char... chars)? >> >> String.repeat(100, '*'); >> String.repeat(100, 'x', 'y'); >> >> and it should not be too bad and kinda consistent to have >> >> String.repeat(n, Character.toChars(0x12345)); >> > > -sherman
Re: RFR: 8197594 - String and character repeat
I really can’t see the value of more than one method. If we need other forms they should be for constructing strings not repeating strings. Sent from my MacBook Wheel > On Feb 16, 2018, at 6:12 PM, Xueming Shen wrote: > >> On 2/16/18, 5:13 PM, Stuart Marks wrote: >> Let me put in an argument for handling code points: >> >>> 3. public static String repeat(final int codepoint, final int count) >> >> Most of the String and Character API handles code points on an equal footing >> with chars. I think this is important, as over time Unicode is continuing to >> add supplementary characters -- those that can't be represented in a Java >> char value. Examples abound of how such characters are mishandled. >> Therefore, I believe Java APIs should have full support for code points. >> >> This is a small thing, and some might consider it a rare case -- how often >> does one need to repeat something like an emoji? The issue however isn't >> that particular use case. Instead what's required is the ability to handle >> *any Unicode character* uniformly, regardless of whether or not it's a >> supplementary character. The way to do that is to deal with code points, so >> any Java API that deals with character data must also handle code points. >> >> If we were to add just one method: >> >>> 1. public String repeat(final int count) >> >> the workaround is to take the character, turn it into a string, and call the >> repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue >> it isn't pretty either: >> >>Character.toString(charVal).repeat(n) > > How about > > public static repeat(int count, char... chars)? > > String.repeat(100, '*'); > String.repeat(100, 'x', 'y'); > > and it should not be too bad and kinda consistent to have > > String.repeat(n, Character.toChars(0x12345)); > > -sherman
Re: RFR: 8197594 - String and character repeat
On 2/16/18, 5:13 PM, Stuart Marks wrote: Let me put in an argument for handling code points: 3. public static String repeat(final int codepoint, final int count) Most of the String and Character API handles code points on an equal footing with chars. I think this is important, as over time Unicode is continuing to add supplementary characters -- those that can't be represented in a Java char value. Examples abound of how such characters are mishandled. Therefore, I believe Java APIs should have full support for code points. This is a small thing, and some might consider it a rare case -- how often does one need to repeat something like an emoji? The issue however isn't that particular use case. Instead what's required is the ability to handle *any Unicode character* uniformly, regardless of whether or not it's a supplementary character. The way to do that is to deal with code points, so any Java API that deals with character data must also handle code points. If we were to add just one method: 1. public String repeat(final int count) the workaround is to take the character, turn it into a string, and call the repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue it isn't pretty either: Character.toString(charVal).repeat(n) How about public static repeat(int count, char... chars)? String.repeat(100, '*'); String.repeat(100, 'x', 'y'); and it should not be too bad and kinda consistent to have String.repeat(n, Character.toChars(0x12345)); -sherman
Re: RFR: 8197594 - String and character repeat
Disagree. On #3, most of the time the char being repeated is already a literal. So just make it a string. On #2, better to aim for string.ofCodePoint(int) and compose w repeat. Down to one method again :) Sent from my MacBook Wheel > On Feb 16, 2018, at 5:13 PM, Stuart Marks wrote: > > Let me put in an argument for handling code points: > >> 3. public static String repeat(final int codepoint, final int count) > > Most of the String and Character API handles code points on an equal footing > with chars. I think this is important, as over time Unicode is continuing to > add supplementary characters -- those that can't be represented in a Java > char value. Examples abound of how such characters are mishandled. Therefore, > I believe Java APIs should have full support for code points. > > This is a small thing, and some might consider it a rare case -- how often > does one need to repeat something like an emoji? The issue however isn't that > particular use case. Instead what's required is the ability to handle *any > Unicode character* uniformly, regardless of whether or not it's a > supplementary character. The way to do that is to deal with code points, so > any Java API that deals with character data must also handle code points. > > If we were to add just one method: > >> 1. public String repeat(final int count) > > the workaround is to take the character, turn it into a string, and call the > repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue > it isn't pretty either: > >Character.toString(charVal).repeat(n) > > But this only handles BMP characters, not supplementary characters. > Unfortunately, there's no direct way to turn a code point into a string -- > you have to turn it into a byte array first! Thus, to get a string from a > code point and repeat it, you have to do this: > >new String(Character.toChars(codepoint)).repeat(count) > > This is enough indirection that it's hard to discover, and I suspect that > most people won't put in the effort to do this correctly, resulting in more > code that mishandles supplementary characters. > > Thus, I think we need to add API #3 that performs the repeat function on code > points. > > (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, > which is closed. I think I'll reopen it.) > >> 2. public static String repeat(final char ch, final int count) > > I can see that this API is not as important as one that handles code points, > and it seems to be less frequently used according to Louis W's analysis. But > if you have char data you want to repeat, not having this seems like an > omission; it seems backwards to have to create a string from the char, only > for repeat() to extract that char from that String in order to repeat it. > Thus I've vote for inclusion of this method as well. > > s'marks > > >> On 2/16/18 5:10 AM, Jim Laskey wrote: >> We’re going with the one instance method (Louis clinched it.) with >> recommended enhancements and not touching CharSequence. >> Working it up now. >> — Jim >>> On Feb 16, 2018, at 7:46 AM, Alan Bateman wrote: >>> >>> On 15/02/2018 17:20, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) >>> Just catching up on this thread and it's hard to see where the bidding is >>> currently at. Are you planning to send an updated proposal, a list of >>> methods is fine, even if it's just one, is okay (implementation can follow >>> later). >>> >>> -Alan >
Re: RFR: 8197594 - String and character repeat
Let me put in an argument for handling code points: 3. public static String repeat(final int codepoint, final int count) Most of the String and Character API handles code points on an equal footing with chars. I think this is important, as over time Unicode is continuing to add supplementary characters -- those that can't be represented in a Java char value. Examples abound of how such characters are mishandled. Therefore, I believe Java APIs should have full support for code points. This is a small thing, and some might consider it a rare case -- how often does one need to repeat something like an emoji? The issue however isn't that particular use case. Instead what's required is the ability to handle *any Unicode character* uniformly, regardless of whether or not it's a supplementary character. The way to do that is to deal with code points, so any Java API that deals with character data must also handle code points. If we were to add just one method: 1. public String repeat(final int count) the workaround is to take the character, turn it into a string, and call the repeat() method on it. For a 'char' value, this isn't too bad, but I'd argue it isn't pretty either: Character.toString(charVal).repeat(n) But this only handles BMP characters, not supplementary characters. Unfortunately, there's no direct way to turn a code point into a string -- you have to turn it into a byte array first! Thus, to get a string from a code point and repeat it, you have to do this: new String(Character.toChars(codepoint)).repeat(count) This is enough indirection that it's hard to discover, and I suspect that most people won't put in the effort to do this correctly, resulting in more code that mishandles supplementary characters. Thus, I think we need to add API #3 that performs the repeat function on code points. (Hm, the lack of Character.toString(codepoint) is covered by JDK-4993841, which is closed. I think I'll reopen it.) 2. public static String repeat(final char ch, final int count) I can see that this API is not as important as one that handles code points, and it seems to be less frequently used according to Louis W's analysis. But if you have char data you want to repeat, not having this seems like an omission; it seems backwards to have to create a string from the char, only for repeat() to extract that char from that String in order to repeat it. Thus I've vote for inclusion of this method as well. s'marks On 2/16/18 5:10 AM, Jim Laskey wrote: We’re going with the one instance method (Louis clinched it.) with recommended enhancements and not touching CharSequence. Working it up now. — Jim On Feb 16, 2018, at 7:46 AM, Alan Bateman wrote: On 15/02/2018 17:20, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) Just catching up on this thread and it's hard to see where the bidding is currently at. Are you planning to send an updated proposal, a list of methods is fine, even if it's just one, is okay (implementation can follow later). -Alan
Re: RFR: 8197594 - String and character repeat
We’re going with the one instance method (Louis clinched it.) with recommended enhancements and not touching CharSequence. Working it up now. — Jim > On Feb 16, 2018, at 7:46 AM, Alan Bateman wrote: > > On 15/02/2018 17:20, Jim Laskey wrote: >> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >> >> The proposal is to introduce four new methods; >> >> 1. public String repeat(final int count) >> 2. public static String repeat(final char ch, final int count) >> 3. public static String repeat(final int codepoint, final int count) >> 4. public static String repeat(final CharSequence seq, final int count) >> > Just catching up on this thread and it's hard to see where the bidding is > currently at. Are you planning to send an updated proposal, a list of methods > is fine, even if it's just one, is okay (implementation can follow later). > > -Alan > > > > > > > > > >
Re: RFR: 8197594 - String and character repeat
On 15/02/2018 21:59, Joseph D. Darcy wrote: : My general recommendation if the code review and CSR review are to be serialized is to estimate which one is more likely to generate feedback that might modify the proposal and run though that process first. Since there has already been feedback leading to updates of this proposal, code review first in this case seems like a reasonable choice :-) I agree, esp. for API additions where it is highly likely there will be several iterations as the API is discussed. It's pointless finalizing a CSR without getting wider feedback first. -Alan.
Re: RFR: 8197594 - String and character repeat
On 15/02/2018 17:20, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) Just catching up on this thread and it's hard to see where the bidding is currently at. Are you planning to send an updated proposal, a list of methods is fine, even if it's just one, is okay (implementation can follow later). -Alan
Re: RFR: 8197594 - String and character repeat
On 15 February 2018 at 20:52, Louis Wasserman wrote: > Based on this data, I'd recommend providing one and only one method of this > type: String.repeat(int). Only adding the one instance method seems like the best plan in this case. Stephen
Re: RFR: 8197594 - String and character repeat
On 2/15/2018 12:38 PM, Roger Riggs wrote: Hi Jim, Its cleaner to do the API (CSR) review before and without the implementation. It helps keep the issues separate. Don't make statements about count == Integer.MAX_VALUE / 2. There is no point, unless it should throw IAE. My general recommendation if the code review and CSR review are to be serialized is to estimate which one is more likely to generate feedback that might modify the proposal and run though that process first. Since there has already been feedback leading to updates of this proposal, code review first in this case seems like a reasonable choice :-) Developers at their discretion can run CSR and code review in parallel, but feedback may come from either process. Proposals still under development can also go through the two-phase CSR process to get some initial feedback before an intended-to-be-final version is produced. HTH, -Joe
Re: RFR: 8197594 - String and character repeat
This matches my intuition too. Sent from my MacBook Wheel > On Feb 15, 2018, at 12:52 PM, Louis Wasserman wrote: > > I don't think there's a case for demand to merit having a > repeat(CharSequence, int) at all. > > I did an analysis of usages of Guava's Strings.repeat on Google's > codebase. Users might be rolling their own implementations, too, but this > should be a very good proxy for demand. > > StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1 > StringRepeat_SingleConstantCodePoint = 28 // strings with > .codePointCount(...) == 1 > StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither > of the above > StringRepeat_CharSequenceToString = 2 // > Strings.repeat(CharSequence.toString(), n) > StringRepeat_NoneOfTheAbove = 248 > > Notably, it seems like basically nobody needs to repeat a CharSequence -- > definitely not enough demand to merit the awkwardness of e.g. > Rope.repeat(n) inheriting a repeat returning a String. > > Based on this data, I'd recommend providing one and only one method of this > type: String.repeat(int). There's no real advantage to a static > repeat(char, int) method when the overwhelming majority of these are > constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n). > Character.toString(c).repeat(n) isn't a bad workaround if you don't have a > constant char. There also isn't much demand for dealing with the code > point case specially; the String.repeat(int) method seems like it'd handle > that just fine. > >> On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey wrote: >> >> >> >>> On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov >> wrote: >>> >>> Hello! >>> >>> The link with the webrev returned 404, but I could find it at this >> location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ >>> >>> A few minor comments: >>> >>> 1) >>> >>> This check: >>> >>> 2992 final long limit = (long)count * 2L; >>> 2993 if ((long)Integer.MAX_VALUE < limit) { >>> >>> can be possibly simplified as >>> if (count > Integer.MAX_VALUE - count) { >> >> Good. >> >>> >>> 2) >>> Should String repeat(final int codepoint, final int count) be optimized >> for codepoints that can be represented with a single char? >>> >>> E.g. like this: >>> >>> public static String repeat(final int codepoint, final int count) { >>> return Character.isBmpCodePoint(codepoint)) >>> ? repeat((char) codepoint, count) >>> : (new String(Character.toChars(codepoint))).repeat(count); >>> } >> >> Yes, avoid array allocation. >> >>> >>> 3) >>> Using long arithmetic can possibly be avoided in the common path of >> repeat(final int count): >>> >>> E.g. like this: >>> >>>if (count < 0) { >>>throw new IllegalArgumentException("count is negative, " + >> count); >>>} else if (count == 1) { >>>return this; >>>} else if (count == 0) { >>>return ""; >>> } >>>final int len = value.length; >>>if (Integer.MAX_VALUE / count < len) { >>>throw new IllegalArgumentException( >>>"Resulting string exceeds maximum string length: " + >> ((long)len * (long)count)); >>>} >>>final int limit = count * len; >> >> Good. >> >> Thank you. >> >>> >>> With kind regards, >>> Ivan >>> On 2/15/18 9:20 AM, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods >> (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) For the sake of transparency, only 1 is necessary, 2-4 are convenience >> methods. In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, >> 10). 3 and 4 convert to String before calling 1. Performance runs with jmh (results as comment in [2]) show that these methods are significantly faster that StringBuilder equivalents. - fewer memory allocations - fewer char to byte array conversions - faster pyramid replication vs O(N) copying I left StringBuilder out of scope. It falls under the category of Appendables#append with repeat. A much bigger project. All comments welcome. Especially around the need for convenience methods, the JavaDoc content and expanding the tests. — Jim [1] webrev: >> http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >>> >>> -- >>> With kind regards, >>> Ivan Gerasimov >>> >> >>
Re: RFR: 8197594 - String and character repeat
Good to have the data. Thank you Louis. Sent from my iPhone > On Feb 15, 2018, at 4:52 PM, Louis Wasserman wrote: > > I don't think there's a case for demand to merit having a > repeat(CharSequence, int) at all. > > I did an analysis of usages of Guava's Strings.repeat on Google's codebase. > Users might be rolling their own implementations, too, but this should be a > very good proxy for demand. > > StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == > 1 > StringRepeat_SingleConstantCodePoint = 28 // strings with > .codePointCount(...) == 1 > StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither > of the above > StringRepeat_CharSequenceToString = 2 // > Strings.repeat(CharSequence.toString(), n) > StringRepeat_NoneOfTheAbove = 248 > > Notably, it seems like basically nobody needs to repeat a CharSequence -- > definitely not enough demand to merit the awkwardness of e.g. Rope.repeat(n) > inheriting a repeat returning a String. > > Based on this data, I'd recommend providing one and only one method of this > type: String.repeat(int). There's no real advantage to a static repeat(char, > int) method when the overwhelming majority of these are constants: e.g. > compare SomeUtilClass.repeat('*', n) versus "*".repeat(n). > Character.toString(c).repeat(n) isn't a bad workaround if you don't have a > constant char. There also isn't much demand for dealing with the code point > case specially; the String.repeat(int) method seems like it'd handle that > just fine. > >> On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey wrote: >> >> >> > On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov >> > wrote: >> > >> > Hello! >> > >> > The link with the webrev returned 404, but I could find it at this >> > location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ >> > >> > A few minor comments: >> > >> > 1) >> > >> > This check: >> > >> > 2992 final long limit = (long)count * 2L; >> > 2993 if ((long)Integer.MAX_VALUE < limit) { >> > >> > can be possibly simplified as >> > if (count > Integer.MAX_VALUE - count) { >> >> Good. >> >> > >> > 2) >> > Should String repeat(final int codepoint, final int count) be optimized >> > for codepoints that can be represented with a single char? >> > >> > E.g. like this: >> > >> > public static String repeat(final int codepoint, final int count) { >> >return Character.isBmpCodePoint(codepoint)) >> >? repeat((char) codepoint, count) >> >: (new String(Character.toChars(codepoint))).repeat(count); >> > } >> >> Yes, avoid array allocation. >> >> > >> > 3) >> > Using long arithmetic can possibly be avoided in the common path of >> > repeat(final int count): >> > >> > E.g. like this: >> > >> > if (count < 0) { >> > throw new IllegalArgumentException("count is negative, " + >> > count); >> > } else if (count == 1) { >> > return this; >> > } else if (count == 0) { >> > return ""; >> > } >> > final int len = value.length; >> > if (Integer.MAX_VALUE / count < len) { >> > throw new IllegalArgumentException( >> > "Resulting string exceeds maximum string length: " + >> > ((long)len * (long)count)); >> > } >> > final int limit = count * len; >> >> Good. >> >> Thank you. >> >> > >> > With kind regards, >> > Ivan >> > >> > On 2/15/18 9:20 AM, Jim Laskey wrote: >> >> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >> >> >> >> The proposal is to introduce four new methods; >> >> >> >> 1. public String repeat(final int count) >> >> 2. public static String repeat(final char ch, final int count) >> >> 3. public static String repeat(final int codepoint, final int count) >> >> 4. public static String repeat(final CharSequence seq, final int count) >> >> >> >> For the sake of transparency, only 1 is necessary, 2-4 are convenience >> >> methods. >> >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, >> >> 10). >> >> 3 and 4 convert to String before calling 1. >> >> >> >> Performance runs with jmh (results as comment in [2]) show that these >> >> methods are significantly faster that StringBuilder equivalents. >> >> - fewer memory allocations >> >> - fewer char to byte array conversions >> >> - faster pyramid replication vs O(N) copying >> >> >> >> I left StringBuilder out of scope. It falls under the category of >> >> Appendables#append with repeat. A much bigger project. >> >> >> >> All comments welcome. Especially around the need for convenience >> >> methods, the JavaDoc content and expanding the tests. >> >> >> >> — Jim >> >> >> >> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >> >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >> >> >> >> >> > >> > -- >> > With kind regards, >> > Ivan Gerasimov >> > >>
Re: RFR: 8197594 - String and character repeat
I don't think there's a case for demand to merit having a repeat(CharSequence, int) at all. I did an analysis of usages of Guava's Strings.repeat on Google's codebase. Users might be rolling their own implementations, too, but this should be a very good proxy for demand. StringRepeat_SingleConstantChar = 4.475 K // strings with .length() == 1 StringRepeat_SingleConstantCodePoint = 28 // strings with .codePointCount(...) == 1 StringRepeat_MultiCodePointConstant = 1.156 K // constant strings neither of the above StringRepeat_CharSequenceToString = 2 // Strings.repeat(CharSequence.toString(), n) StringRepeat_NoneOfTheAbove = 248 Notably, it seems like basically nobody needs to repeat a CharSequence -- definitely not enough demand to merit the awkwardness of e.g. Rope.repeat(n) inheriting a repeat returning a String. Based on this data, I'd recommend providing one and only one method of this type: String.repeat(int). There's no real advantage to a static repeat(char, int) method when the overwhelming majority of these are constants: e.g. compare SomeUtilClass.repeat('*', n) versus "*".repeat(n). Character.toString(c).repeat(n) isn't a bad workaround if you don't have a constant char. There also isn't much demand for dealing with the code point case specially; the String.repeat(int) method seems like it'd handle that just fine. On Thu, Feb 15, 2018 at 11:44 AM Jim Laskey wrote: > > > > On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov > wrote: > > > > Hello! > > > > The link with the webrev returned 404, but I could find it at this > location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ > > > > A few minor comments: > > > > 1) > > > > This check: > > > > 2992 final long limit = (long)count * 2L; > > 2993 if ((long)Integer.MAX_VALUE < limit) { > > > > can be possibly simplified as > > if (count > Integer.MAX_VALUE - count) { > > Good. > > > > > 2) > > Should String repeat(final int codepoint, final int count) be optimized > for codepoints that can be represented with a single char? > > > > E.g. like this: > > > > public static String repeat(final int codepoint, final int count) { > >return Character.isBmpCodePoint(codepoint)) > >? repeat((char) codepoint, count) > >: (new String(Character.toChars(codepoint))).repeat(count); > > } > > Yes, avoid array allocation. > > > > > 3) > > Using long arithmetic can possibly be avoided in the common path of > repeat(final int count): > > > > E.g. like this: > > > > if (count < 0) { > > throw new IllegalArgumentException("count is negative, " + > count); > > } else if (count == 1) { > > return this; > > } else if (count == 0) { > > return ""; > > } > > final int len = value.length; > > if (Integer.MAX_VALUE / count < len) { > > throw new IllegalArgumentException( > > "Resulting string exceeds maximum string length: " + > ((long)len * (long)count)); > > } > > final int limit = count * len; > > Good. > > Thank you. > > > > > With kind regards, > > Ivan > > > > On 2/15/18 9:20 AM, Jim Laskey wrote: > >> This is a pre-CSR code review [1] for String repeat methods > (Enhancement). > >> > >> The proposal is to introduce four new methods; > >> > >> 1. public String repeat(final int count) > >> 2. public static String repeat(final char ch, final int count) > >> 3. public static String repeat(final int codepoint, final int count) > >> 4. public static String repeat(final CharSequence seq, final int count) > >> > >> For the sake of transparency, only 1 is necessary, 2-4 are convenience > methods. > >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, > 10). > >> 3 and 4 convert to String before calling 1. > >> > >> Performance runs with jmh (results as comment in [2]) show that these > >> methods are significantly faster that StringBuilder equivalents. > >> - fewer memory allocations > >> - fewer char to byte array conversions > >> - faster pyramid replication vs O(N) copying > >> > >> I left StringBuilder out of scope. It falls under the category of > >> Appendables#append with repeat. A much bigger project. > >> > >> All comments welcome. Especially around the need for convenience > >> methods, the JavaDoc content and expanding the tests. > >> > >> — Jim > >> > >> [1] webrev: > http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 > >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 > >> > >> > > > > -- > > With kind regards, > > Ivan Gerasimov > > > >
Re: RFR: 8197594 - String and character repeat
Thank you Roger. > On Feb 15, 2018, at 4:38 PM, Roger Riggs wrote: > > Hi Jim, > > Its cleaner to do the API (CSR) review before and without the implementation. > It helps keep the issues separate. This was on advice from a member of the core libs team. He can speak up if he wants. > > Don't make statements about count == Integer.MAX_VALUE / 2. > There is no point, unless it should throw IAE. > > On 2/15/2018 3:16 PM, Jim Laskey wrote: >>> On Feb 15, 2018, at 4:04 PM, Remi Forax wrote: >>> >>> I'm not sure we need 4, it's just a convenient method that may be slower >>> than if the user code calls toString() (because of profile pollution), >>> so i'm not sure i pull it's own weight. >>> >>> And about adding a default method into CharSequence, the default method >>> should return a CharSequence or String ? >> If you mean each class returns an instance of its class, I think that >> overlaps Appendable.. > Beware thinking about default methods on interfaces; the dragons will get you. > Recently, this was discussed in the context of Reader.transferTo. >>> and what about the other implementations, AbstractStringBuilder and >>> CharBuffer at least ? >> This falls into the Appendable.append( T t, int count) realm mentioned >> originally. >> >> Long term this could be a goal, and maybe defaulting CharSequence#repeat >> returning a string would be shortsighted. >> >> But, I think having instance String#repeat returning a CharSequence would >> limit its use (methods expecting strings.) > There is no point in returning CharSequence, a String is a CharSequence and > can be used anywhere > a CharSequence can. > > $.02, Roger > >> >> — Jim >> >>> Rémi >>> >>> - Mail original - >>>> De: "Jim Laskey" >>>> À: "Brian Goetz" >>>> Cc: "core-libs-dev" >>>> Envoyé: Jeudi 15 Février 2018 18:34:19 >>>> Objet: Re: RFR: 8197594 - String and character repeat >>>> Very reasonable approach. >>>> >>>> >>>>> On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: >>>>> >>>>> I suggest merging 1 and 4 by making it an instance method on CS with a >>>>> default >>>>> in CS and an override on string and string builder? >>>>> >>>>> Sent from my MacBook Wheel >>>>> >>>>>> On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: >>>>>> >>>>>> This is a pre-CSR code review [1] for String repeat methods >>>>>> (Enhancement). >>>>>> >>>>>> The proposal is to introduce four new methods; >>>>>> >>>>>> 1. public String repeat(final int count) >>>>>> 2. public static String repeat(final char ch, final int count) >>>>>> 3. public static String repeat(final int codepoint, final int count) >>>>>> 4. public static String repeat(final CharSequence seq, final int count) >>>>>> >>>>>> For the sake of transparency, only 1 is necessary, 2-4 are convenience >>>>>> methods. >>>>>> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, >>>>>> 10). >>>>>> 3 and 4 convert to String before calling 1. >>>>>> >>>>>> Performance runs with jmh (results as comment in [2]) show that these >>>>>> methods are significantly faster that StringBuilder equivalents. >>>>>> - fewer memory allocations >>>>>> - fewer char to byte array conversions >>>>>> - faster pyramid replication vs O(N) copying >>>>>> >>>>>> I left StringBuilder out of scope. It falls under the category of >>>>>> Appendables#append with repeat. A much bigger project. >>>>>> >>>>>> All comments welcome. Especially around the need for convenience >>>>>> methods, the JavaDoc content and expanding the tests. >>>>>> >>>>>> — Jim >>>>>> >>>>>> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >>>>>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >>>>>> >
Re: RFR: 8197594 - String and character repeat
Hi Jim, Its cleaner to do the API (CSR) review before and without the implementation. It helps keep the issues separate. Don't make statements about count == Integer.MAX_VALUE / 2. There is no point, unless it should throw IAE. On 2/15/2018 3:16 PM, Jim Laskey wrote: On Feb 15, 2018, at 4:04 PM, Remi Forax wrote: I'm not sure we need 4, it's just a convenient method that may be slower than if the user code calls toString() (because of profile pollution), so i'm not sure i pull it's own weight. And about adding a default method into CharSequence, the default method should return a CharSequence or String ? If you mean each class returns an instance of its class, I think that overlaps Appendable.. Beware thinking about default methods on interfaces; the dragons will get you. Recently, this was discussed in the context of Reader.transferTo. and what about the other implementations, AbstractStringBuilder and CharBuffer at least ? This falls into the Appendable.append( T t, int count) realm mentioned originally. Long term this could be a goal, and maybe defaulting CharSequence#repeat returning a string would be shortsighted. But, I think having instance String#repeat returning a CharSequence would limit its use (methods expecting strings.) There is no point in returning CharSequence, a String is a CharSequence and can be used anywhere a CharSequence can. $.02, Roger — Jim Rémi - Mail original - De: "Jim Laskey" À: "Brian Goetz" Cc: "core-libs-dev" Envoyé: Jeudi 15 Février 2018 18:34:19 Objet: Re: RFR: 8197594 - String and character repeat Very reasonable approach. On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: I suggest merging 1 and 4 by making it an instance method on CS with a default in CS and an override on string and string builder? Sent from my MacBook Wheel On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) For the sake of transparency, only 1 is necessary, 2-4 are convenience methods. In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). 3 and 4 convert to String before calling 1. Performance runs with jmh (results as comment in [2]) show that these methods are significantly faster that StringBuilder equivalents. - fewer memory allocations - fewer char to byte array conversions - faster pyramid replication vs O(N) copying I left StringBuilder out of scope. It falls under the category of Appendables#append with repeat. A much bigger project. All comments welcome. Especially around the need for convenience methods, the JavaDoc content and expanding the tests. — Jim [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594
Re: RFR: 8197594 - String and character repeat
- Mail original - > De: "Jim Laskey" > À: "Remi Forax" > Cc: "Brian Goetz" , "core-libs-dev" > > Envoyé: Jeudi 15 Février 2018 21:16:48 > Objet: Re: RFR: 8197594 - String and character repeat >> On Feb 15, 2018, at 4:04 PM, Remi Forax wrote: >> >> I'm not sure we need 4, it's just a convenient method that may be slower >> than if >> the user code calls toString() (because of profile pollution), >> so i'm not sure i pull it's own weight. >> >> And about adding a default method into CharSequence, the default method >> should >> return a CharSequence or String ? > > If you mean each class returns an instance of its class, I think that overlaps > Appendable.. yes, very true > >> and what about the other implementations, AbstractStringBuilder and >> CharBuffer >> at least ? > > This falls into the Appendable.append( T t, int count) realm mentioned > originally. > > Long term this could be a goal, and maybe defaulting CharSequence#repeat > returning a string would be shortsighted. > > But, I think having instance String#repeat returning a CharSequence would > limit > its use (methods expecting strings.) yes, i think we should play safe here and only add repeat to String. > > — Jim Rémi > >> >> Rémi >> >> - Mail original - >>> De: "Jim Laskey" >>> À: "Brian Goetz" >>> Cc: "core-libs-dev" >>> Envoyé: Jeudi 15 Février 2018 18:34:19 >>> Objet: Re: RFR: 8197594 - String and character repeat >> >>> Very reasonable approach. >>> >>> >>>> On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: >>>> >>>> I suggest merging 1 and 4 by making it an instance method on CS with a >>>> default >>>> in CS and an override on string and string builder? >>>> >>>> Sent from my MacBook Wheel >>>> >>>>> On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: >>>>> >>>>> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >>>>> >>>>> The proposal is to introduce four new methods; >>>>> >>>>> 1. public String repeat(final int count) >>>>> 2. public static String repeat(final char ch, final int count) >>>>> 3. public static String repeat(final int codepoint, final int count) >>>>> 4. public static String repeat(final CharSequence seq, final int count) >>>>> >>>>> For the sake of transparency, only 1 is necessary, 2-4 are convenience >>>>> methods. >>>>> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, >>>>> 10). >>>>> 3 and 4 convert to String before calling 1. >>>>> >>>>> Performance runs with jmh (results as comment in [2]) show that these >>>>> methods are significantly faster that StringBuilder equivalents. >>>>> - fewer memory allocations >>>>> - fewer char to byte array conversions >>>>> - faster pyramid replication vs O(N) copying >>>>> >>>>> I left StringBuilder out of scope. It falls under the category of >>>>> Appendables#append with repeat. A much bigger project. >>>>> >>>>> All comments welcome. Especially around the need for convenience >>>>> methods, the JavaDoc content and expanding the tests. >>>>> >>>>> — Jim >>>>> >>>>> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >>>>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594
Re: RFR: 8197594 - String and character repeat
Thank you Remi. Re final: Attila got into my head. > On Feb 15, 2018, at 4:24 PM, Remi Forax wrote: > > - Mail original - >> De: "Ivan Gerasimov" >> À: "Jim Laskey" , "core-libs-dev" >> >> Envoyé: Jeudi 15 Février 2018 20:36:44 >> Objet: Re: RFR: 8197594 - String and character repeat > >> Hello! >> >> The link with the webrev returned 404, but I could find it at this >> location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ > > Jim, > in static String repeat(char ch, int count) > you can remove the 'else's because they are after a throw or a return as you > have done in the other methods. > > in static String repeat(int codepoint, int count), > the parenthesis just after the return seems unnecessary, > > in String repeat(int count) > at the end instead of testing isLatin1() >return new String(bytes, coder); > should be equivalent. > > in replicate > you can return 'result' after the call to Arrays.fill, to signal to readers > that this is an optimization and that the primary code is below, it will also > avoid to shift the primary code to the right. > > and nitpicking about the style, can you remove those pesky final, it's the > Nashorn style but not the JDK one :) > > cheers, > Rémi > > >> On 2/15/18 9:20 AM, Jim Laskey wrote: >>> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >>> >>> The proposal is to introduce four new methods; >>> >>> 1. public String repeat(final int count) >>> 2. public static String repeat(final char ch, final int count) >>> 3. public static String repeat(final int codepoint, final int count) >>> 4. public static String repeat(final CharSequence seq, final int count) >>> >>> For the sake of transparency, only 1 is necessary, 2-4 are convenience >>> methods. >>> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). >>> 3 and 4 convert to String before calling 1. >>> >>> Performance runs with jmh (results as comment in [2]) show that these >>> methods are significantly faster that StringBuilder equivalents. >>> - fewer memory allocations >>> - fewer char to byte array conversions >>> - faster pyramid replication vs O(N) copying >>> >>> I left StringBuilder out of scope. It falls under the category of >>> Appendables#append with repeat. A much bigger project. >>> >>> All comments welcome. Especially around the need for convenience >>> methods, the JavaDoc content and expanding the tests. >>> >>> — Jim >>> >>> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >>> >>> >> >> -- >> With kind regards, >> Ivan Gerasimov
Re: RFR: 8197594 - String and character repeat
- Mail original - > De: "Ivan Gerasimov" > À: "Jim Laskey" , "core-libs-dev" > > Envoyé: Jeudi 15 Février 2018 20:36:44 > Objet: Re: RFR: 8197594 - String and character repeat > Hello! > > The link with the webrev returned 404, but I could find it at this > location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ Jim, in static String repeat(char ch, int count) you can remove the 'else's because they are after a throw or a return as you have done in the other methods. in static String repeat(int codepoint, int count), the parenthesis just after the return seems unnecessary, in String repeat(int count) at the end instead of testing isLatin1() return new String(bytes, coder); should be equivalent. in replicate you can return 'result' after the call to Arrays.fill, to signal to readers that this is an optimization and that the primary code is below, it will also avoid to shift the primary code to the right. and nitpicking about the style, can you remove those pesky final, it's the Nashorn style but not the JDK one :) cheers, Rémi > On 2/15/18 9:20 AM, Jim Laskey wrote: >> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >> >> The proposal is to introduce four new methods; >> >> 1. public String repeat(final int count) >> 2. public static String repeat(final char ch, final int count) >> 3. public static String repeat(final int codepoint, final int count) >> 4. public static String repeat(final CharSequence seq, final int count) >> >> For the sake of transparency, only 1 is necessary, 2-4 are convenience >> methods. >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). >> 3 and 4 convert to String before calling 1. >> >> Performance runs with jmh (results as comment in [2]) show that these >> methods are significantly faster that StringBuilder equivalents. >> - fewer memory allocations >> - fewer char to byte array conversions >> - faster pyramid replication vs O(N) copying >> >> I left StringBuilder out of scope. It falls under the category of >> Appendables#append with repeat. A much bigger project. >> >> All comments welcome. Especially around the need for convenience >> methods, the JavaDoc content and expanding the tests. >> >> — Jim >> >> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >> >> > > -- > With kind regards, > Ivan Gerasimov
Re: RFR: 8197594 - String and character repeat
> On Feb 15, 2018, at 4:04 PM, Remi Forax wrote: > > I'm not sure we need 4, it's just a convenient method that may be slower than > if the user code calls toString() (because of profile pollution), > so i'm not sure i pull it's own weight. > > And about adding a default method into CharSequence, the default method > should return a CharSequence or String ? If you mean each class returns an instance of its class, I think that overlaps Appendable.. > and what about the other implementations, AbstractStringBuilder and > CharBuffer at least ? This falls into the Appendable.append( T t, int count) realm mentioned originally. Long term this could be a goal, and maybe defaulting CharSequence#repeat returning a string would be shortsighted. But, I think having instance String#repeat returning a CharSequence would limit its use (methods expecting strings.) — Jim > > Rémi > > - Mail original - >> De: "Jim Laskey" >> À: "Brian Goetz" >> Cc: "core-libs-dev" >> Envoyé: Jeudi 15 Février 2018 18:34:19 >> Objet: Re: RFR: 8197594 - String and character repeat > >> Very reasonable approach. >> >> >>> On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: >>> >>> I suggest merging 1 and 4 by making it an instance method on CS with a >>> default >>> in CS and an override on string and string builder? >>> >>> Sent from my MacBook Wheel >>> >>>> On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: >>>> >>>> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >>>> >>>> The proposal is to introduce four new methods; >>>> >>>> 1. public String repeat(final int count) >>>> 2. public static String repeat(final char ch, final int count) >>>> 3. public static String repeat(final int codepoint, final int count) >>>> 4. public static String repeat(final CharSequence seq, final int count) >>>> >>>> For the sake of transparency, only 1 is necessary, 2-4 are convenience >>>> methods. >>>> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, >>>> 10). >>>> 3 and 4 convert to String before calling 1. >>>> >>>> Performance runs with jmh (results as comment in [2]) show that these >>>> methods are significantly faster that StringBuilder equivalents. >>>> - fewer memory allocations >>>> - fewer char to byte array conversions >>>> - faster pyramid replication vs O(N) copying >>>> >>>> I left StringBuilder out of scope. It falls under the category of >>>> Appendables#append with repeat. A much bigger project. >>>> >>>> All comments welcome. Especially around the need for convenience >>>> methods, the JavaDoc content and expanding the tests. >>>> >>>> — Jim >>>> >>>> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >>>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >>>>
Re: RFR: 8197594 - String and character repeat
I'm not sure we need 4, it's just a convenient method that may be slower than if the user code calls toString() (because of profile pollution), so i'm not sure i pull it's own weight. And about adding a default method into CharSequence, the default method should return a CharSequence or String ? and what about the other implementations, AbstractStringBuilder and CharBuffer at least ? Rémi - Mail original - > De: "Jim Laskey" > À: "Brian Goetz" > Cc: "core-libs-dev" > Envoyé: Jeudi 15 Février 2018 18:34:19 > Objet: Re: RFR: 8197594 - String and character repeat > Very reasonable approach. > > >> On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: >> >> I suggest merging 1 and 4 by making it an instance method on CS with a >> default >> in CS and an override on string and string builder? >> >> Sent from my MacBook Wheel >> >>> On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: >>> >>> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >>> >>> The proposal is to introduce four new methods; >>> >>> 1. public String repeat(final int count) >>> 2. public static String repeat(final char ch, final int count) >>> 3. public static String repeat(final int codepoint, final int count) >>> 4. public static String repeat(final CharSequence seq, final int count) >>> >>> For the sake of transparency, only 1 is necessary, 2-4 are convenience >>> methods. >>> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). >>> 3 and 4 convert to String before calling 1. >>> >>> Performance runs with jmh (results as comment in [2]) show that these >>> methods are significantly faster that StringBuilder equivalents. >>> - fewer memory allocations >>> - fewer char to byte array conversions >>> - faster pyramid replication vs O(N) copying >>> >>> I left StringBuilder out of scope. It falls under the category of >>> Appendables#append with repeat. A much bigger project. >>> >>> All comments welcome. Especially around the need for convenience >>> methods, the JavaDoc content and expanding the tests. >>> >>> — Jim >>> >>> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >>> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >>>
Re: RFR: 8197594 - String and character repeat
> On Feb 15, 2018, at 3:36 PM, Ivan Gerasimov wrote: > > Hello! > > The link with the webrev returned 404, but I could find it at this location: > http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ > > A few minor comments: > > 1) > > This check: > > 2992 final long limit = (long)count * 2L; > 2993 if ((long)Integer.MAX_VALUE < limit) { > > can be possibly simplified as > if (count > Integer.MAX_VALUE - count) { Good. > > 2) > Should String repeat(final int codepoint, final int count) be optimized for > codepoints that can be represented with a single char? > > E.g. like this: > > public static String repeat(final int codepoint, final int count) { >return Character.isBmpCodePoint(codepoint)) >? repeat((char) codepoint, count) >: (new String(Character.toChars(codepoint))).repeat(count); > } Yes, avoid array allocation. > > 3) > Using long arithmetic can possibly be avoided in the common path of > repeat(final int count): > > E.g. like this: > > if (count < 0) { > throw new IllegalArgumentException("count is negative, " + count); > } else if (count == 1) { > return this; > } else if (count == 0) { > return ""; > } > final int len = value.length; > if (Integer.MAX_VALUE / count < len) { > throw new IllegalArgumentException( > "Resulting string exceeds maximum string length: " + > ((long)len * (long)count)); > } > final int limit = count * len; Good. Thank you. > > With kind regards, > Ivan > > On 2/15/18 9:20 AM, Jim Laskey wrote: >> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >> >> The proposal is to introduce four new methods; >> >> 1. public String repeat(final int count) >> 2. public static String repeat(final char ch, final int count) >> 3. public static String repeat(final int codepoint, final int count) >> 4. public static String repeat(final CharSequence seq, final int count) >> >> For the sake of transparency, only 1 is necessary, 2-4 are convenience >> methods. >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). >> 3 and 4 convert to String before calling 1. >> >> Performance runs with jmh (results as comment in [2]) show that these >> methods are significantly faster that StringBuilder equivalents. >> - fewer memory allocations >> - fewer char to byte array conversions >> - faster pyramid replication vs O(N) copying >> >> I left StringBuilder out of scope. It falls under the category of >> Appendables#append with repeat. A much bigger project. >> >> All comments welcome. Especially around the need for convenience >> methods, the JavaDoc content and expanding the tests. >> >> — Jim >> >> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >> >> > > -- > With kind regards, > Ivan Gerasimov >
Re: RFR: 8197594 - String and character repeat
Hello! The link with the webrev returned 404, but I could find it at this location: http://cr.openjdk.java.net/~jlaskey/8197594/webrev-00/ A few minor comments: 1) This check: 2992 final long limit = (long)count * 2L; 2993 if ((long)Integer.MAX_VALUE < limit) { can be possibly simplified as if (count > Integer.MAX_VALUE - count) { 2) Should String repeat(final int codepoint, final int count) be optimized for codepoints that can be represented with a single char? E.g. like this: public static String repeat(final int codepoint, final int count) { return Character.isBmpCodePoint(codepoint)) ? repeat((char) codepoint, count) : (new String(Character.toChars(codepoint))).repeat(count); } 3) Using long arithmetic can possibly be avoided in the common path of repeat(final int count): E.g. like this: if (count < 0) { throw new IllegalArgumentException("count is negative, " + count); } else if (count == 1) { return this; } else if (count == 0) { return ""; } final int len = value.length; if (Integer.MAX_VALUE / count < len) { throw new IllegalArgumentException( "Resulting string exceeds maximum string length: " + ((long)len * (long)count)); } final int limit = count * len; With kind regards, Ivan On 2/15/18 9:20 AM, Jim Laskey wrote: This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) For the sake of transparency, only 1 is necessary, 2-4 are convenience methods. In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). 3 and 4 convert to String before calling 1. Performance runs with jmh (results as comment in [2]) show that these methods are significantly faster that StringBuilder equivalents. - fewer memory allocations - fewer char to byte array conversions - faster pyramid replication vs O(N) copying I left StringBuilder out of scope. It falls under the category of Appendables#append with repeat. A much bigger project. All comments welcome. Especially around the need for convenience methods, the JavaDoc content and expanding the tests. — Jim [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 -- With kind regards, Ivan Gerasimov
Re: RFR: 8197594 - String and character repeat
Very reasonable approach. > On Feb 15, 2018, at 1:31 PM, Brian Goetz wrote: > > I suggest merging 1 and 4 by making it an instance method on CS with a > default in CS and an override on string and string builder? > > Sent from my MacBook Wheel > >> On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: >> >> This is a pre-CSR code review [1] for String repeat methods (Enhancement). >> >> The proposal is to introduce four new methods; >> >> 1. public String repeat(final int count) >> 2. public static String repeat(final char ch, final int count) >> 3. public static String repeat(final int codepoint, final int count) >> 4. public static String repeat(final CharSequence seq, final int count) >> >> For the sake of transparency, only 1 is necessary, 2-4 are convenience >> methods. >> In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). >> 3 and 4 convert to String before calling 1. >> >> Performance runs with jmh (results as comment in [2]) show that these >> methods are significantly faster that StringBuilder equivalents. >> - fewer memory allocations >> - fewer char to byte array conversions >> - faster pyramid replication vs O(N) copying >> >> I left StringBuilder out of scope. It falls under the category of >> Appendables#append with repeat. A much bigger project. >> >> All comments welcome. Especially around the need for convenience >> methods, the JavaDoc content and expanding the tests. >> >> — Jim >> >> [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 >> [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >> >
Re: RFR: 8197594 - String and character repeat
I suggest merging 1 and 4 by making it an instance method on CS with a default in CS and an override on string and string builder? Sent from my MacBook Wheel > On Feb 15, 2018, at 9:20 AM, Jim Laskey wrote: > > This is a pre-CSR code review [1] for String repeat methods (Enhancement). > > The proposal is to introduce four new methods; > > 1. public String repeat(final int count) > 2. public static String repeat(final char ch, final int count) > 3. public static String repeat(final int codepoint, final int count) > 4. public static String repeat(final CharSequence seq, final int count) > > For the sake of transparency, only 1 is necessary, 2-4 are convenience > methods. > In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). > 3 and 4 convert to String before calling 1. > > Performance runs with jmh (results as comment in [2]) show that these > methods are significantly faster that StringBuilder equivalents. > - fewer memory allocations > - fewer char to byte array conversions > - faster pyramid replication vs O(N) copying > > I left StringBuilder out of scope. It falls under the category of > Appendables#append with repeat. A much bigger project. > > All comments welcome. Especially around the need for convenience > methods, the JavaDoc content and expanding the tests. > > — Jim > > [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 > [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594 >
RFR: 8197594 - String and character repeat
This is a pre-CSR code review [1] for String repeat methods (Enhancement). The proposal is to introduce four new methods; 1. public String repeat(final int count) 2. public static String repeat(final char ch, final int count) 3. public static String repeat(final int codepoint, final int count) 4. public static String repeat(final CharSequence seq, final int count) For the sake of transparency, only 1 is necessary, 2-4 are convenience methods. In the case of 2, “*”.repeat(10) performs as well as String.repeat(‘*’, 10). 3 and 4 convert to String before calling 1. Performance runs with jmh (results as comment in [2]) show that these methods are significantly faster that StringBuilder equivalents. - fewer memory allocations - fewer char to byte array conversions - faster pyramid replication vs O(N) copying I left StringBuilder out of scope. It falls under the category of Appendables#append with repeat. A much bigger project. All comments welcome. Especially around the need for convenience methods, the JavaDoc content and expanding the tests. — Jim [1] webrev: http://cr.openjdk.java.net//oj/home/jlaskey/8197594/webrev-00 [2] jbs: https://bugs.openjdk.java.net/browse/JDK-8197594