I'm pretty sure that in the majority of cases where getBytes is used without
specifying an encoding, it's in test code. Almost everywhere else, the code has
gone out of its way to litter useless UnsupportedEncodingException handling all
over the place.
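
One way to keep that exception handling out of every call site is to do the
encoding in a single helper. A minimal sketch, assuming Java 7 or later
(StandardCharsets is newer than this thread); Utf8Bytes is a hypothetical
name, not an existing Shindig class:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Shindig. */
final class Utf8Bytes {
    private Utf8Bytes() {}

    /** Encodes as UTF-8 regardless of the platform default encoding. */
    static byte[] of(String s) {
        // getBytes(Charset) declares no checked exception,
        // unlike getBytes(String charsetName).
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six umlauts, two bytes each in UTF-8: prints 12 on any platform.
        System.out.println(of("\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc").length);
    }
}
```

On older JVMs the same helper could wrap getBytes("UTF-8") and rethrow the
UnsupportedEncodingException as an unchecked exception, so it is caught in
exactly one place.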

On Thu, May 8, 2008 at 11:58 AM, Paul Lindner <[EMAIL PROTECTED]> wrote:

> I agree with this. String.getBytes() is evil, especially for performance
> reasons.  See my post about it here:
>
>
> http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performance.html
>
> Here are some code chunks that use Java NIO to convert between character
> sets without the performance problems of looking up character sets
> all the time...
>
>
>    private static final Charset UTF8 = Charset.forName("UTF-8");
>
>    try {
>        CharsetEncoder toUTF8Bytes = UTF8.newEncoder()
>                     .onMalformedInput(CodingErrorAction.REPORT)
>                     .onUnmappableCharacter(CodingErrorAction.REPORT);
>
>       return toUTF8Bytes.encode(CharBuffer.wrap(str));
>    } catch (Exception ex) {
>        // do something else
>    }
>
>    String s = UTF8.decode(ByteBuffer.wrap(output)).toString();
>
>
>
> On 5/8/08 8:44 AM, "Henning P. Schmiedehausen" <[EMAIL PROTECTED]> wrote:
>
> > Having been burned far too many times by far too many Java
> > applications that assume platform encoding == UTF-8 and running
> > applications between different platforms, the free usage of the
> > getBytes() method inside the Shindig code base concerns me a lot.
> >
> > A simple example: Apply the following patch:
> >
> > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java
> > ===================================================================
> > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java	(revision 654541)
> > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java	(working copy)
> > @@ -37,14 +37,17 @@
> >      crypter = new BasicBlobCrypter("0123456789abcdef".getBytes());
> >      crypter.timeSource = new FakeTimeSource();
> >    }
> > +
> > +
> > +
> >
> >    @Test
> >    public void testHmacSha1() throws Exception {
> >      String key = "abcd1234";
> > -    String val = "your mother is a hedgehog";
> > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> >      byte[] expected = new byte[] {
> > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > -        -51, 115, -122, -91, 39, 26, -18, 122, 30, 90,
> > +        -45, -20, 16, -21, -64, 8, 79, -41, -28, -101,
> > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> >      };
> >      byte[] hmac = Crypto.hmacSha1(key.getBytes(), val.getBytes());
> >      assertArrayEquals(expected, hmac);
> > @@ -53,10 +56,10 @@
> >    @Test
> >    public void testHmacSha1Verify() throws Exception {
> >      String key = "abcd1234";
> > -    String val = "your mother is a hedgehog";
> > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> >      byte[] expected = new byte[] {
> > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > -        -51, 115, -122, -91, 39, 26, -18, 122, 30, 90,
> > +        -45, -20, 16, -21, -64, 8, 79, -41, -28, -101,
> > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> >      };
> >      Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected);
> >    }
> > @@ -65,10 +68,10 @@
> >    @Test
> >    public void testHmacSha1VerifyTampered() throws Exception {
> >      String key = "abcd1234";
> > -    String val = "your mother is a hedgehog";
> > +    String val = "your mother is a hedgehog (\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc)";
> >      byte[] expected = new byte[] {
> > -        -21, 2, 47, -101, 9, -40, 18, 43, 76, 117,
> > -        -51, 115, -122, -91, 39, 0, -18, 122, 30, 90,
> > +        -45, -20, 16, -21, -64, 15, 79, -41, -28, -101,
> > +        -108, 73, -113, 79, 57, 40, 107, -1, 107, -61,
> >      };
> >      try {
> >        Crypto.hmacSha1Verify(key.getBytes(), val.getBytes(), expected);
> >
> >
> > Now run
> >
> > export LANG=en_US.UTF-8 ; mvn clean ; mvn
> >
> > and
> >
> > export LANG=en_US.ISO-8859-1 ; mvn clean ; mvn
> >
> > The second one then gives me errors in CryptoTest, which means that
> > the actual bytes depend on the platform encoding. Which is bad if you
> > happen to live outside US-ASCII. :-)
> >
> > I have a largeish patch that makes sure getBytes("UTF-8") is used
> > everywhere getBytes() is currently used (the only place where this
> > is ok is the BasicBlobCrypter, there is only a single bug in
> > there... :-) ). However, this needs to deal with the useless
> > UnsupportedEncodingException (but be honest: if getBytes("UTF-8")
> > fails, then this is the smallest of your problems. ;-) ).
> >
> > There is a good article from Joel on Software that deals in depth with
> > the whole encoding shebang. Very readable:
> >
> > http://www.joelonsoftware.com/articles/Unicode.html
> >
> > You can also apply this patch:
> >
> > Index: java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java
> > ===================================================================
> > --- java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java	(revision 654541)
> > +++ java/gadgets/src/test/java/org/apache/shindig/util/CryptoTest.java	(working copy)
> > @@ -39,6 +39,17 @@
> >    }
> >
> >    @Test
> > +  public void testCharsetEncoding() throws Exception {
> > +      String str = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc";
> > +
> > +      assertEquals(12, str.getBytes("UTF-8").length);
> > +      assertEquals(6, str.getBytes("ISO-8859-1").length);
> > +
> > +      assertEquals(12, str.getBytes().length);
> > +  }
> > +
> > +
> > +  @Test
> >    public void testHmacSha1() throws Exception {
> >      String key = "abcd1234";
> >      String val = "your mother is a hedgehog";
> >
> > and run with ISO-8859-1 and UTF-8 platform encodings to illustrate the
> > problem.
> >
> > Best regards
> >     Henning (living in äöü country. ;-) )
>
>
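
The byte counts from the quoted testCharsetEncoding patch, and the strict NIO
encoder from Paul's snippet, can be tried outside the test suite. A rough
sketch, assuming Java 7 or later for StandardCharsets; CharsetDemo and
encodeStrict are made-up names for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; not part of Shindig. */
final class CharsetDemo {
    private CharsetDemo() {}

    /** Encodes strictly: throws instead of silently substituting '?'. */
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        // Encoders are stateful and not thread-safe, so create one per call.
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = enc.encode(CharBuffer.wrap(s));
        // Copy only the encoded bytes; the backing array may be larger.
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String str = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc";
        System.out.println(encodeStrict(str, StandardCharsets.UTF_8).length);      // 12
        System.out.println(encodeStrict(str, StandardCharsets.ISO_8859_1).length); // 6
        // The no-argument form depends on the JVM's default charset.
        System.out.println(str.getBytes().length);
    }
}
```

With REPORT set, encoding the same string as US-ASCII throws a
CharacterCodingException rather than producing question marks, which makes
encoding mistakes visible early.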
