Well spotted! You could also compare the resolved Charset against the StandardCharsets constants instead of going through the name comparison: http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
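For illustration, here is a rough sketch of what that comparison could look like. It is not the actual Groovy source: the class name Utf16BomSketch is made up, and the body of writeUtf16Bom is my own stand-in for the private helper referenced in the quoted method. It only assumes that Charset.forName resolves a name case-insensitively to the canonical charset, which is then compared by equality with the StandardCharsets constants.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16BomSketch {

    // Hypothetical stand-in for the private helper in the quoted method:
    // writes FE FF for big endian, FF FE for little endian.
    public static void writeUtf16Bom(OutputStream stream, boolean bigEndian) throws IOException {
        if (bigEndian) {
            stream.write(0xFE);
            stream.write(0xFF);
        } else {
            stream.write(0xFF);
            stream.write(0xFE);
        }
    }

    // Resolves the charset name first, so differently-cased names such as
    // "utf-16le" are treated the same as the canonical "UTF-16LE".
    // Charset.forName throws an unchecked exception for unknown names.
    public static void writeUTF16BomIfRequired(String charsetName, OutputStream stream) throws IOException {
        Charset charset = Charset.forName(charsetName);
        if (StandardCharsets.UTF_16BE.equals(charset)) {
            writeUtf16Bom(stream, true);
        } else if (StandardCharsets.UTF_16LE.equals(charset)) {
            writeUtf16Bom(stream, false);
        }
        // Any other charset: no BOM is written by this sketch.
    }

    // Formats bytes as space-separated lowercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        for (String name : new String[] {"UTF-16LE", "utf-16le", "UTF-16BE", "UTF-8"}) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeUTF16BomIfRequired(name, out);
            System.out.println(name + " -> [" + hex(out.toByteArray()) + "]");
        }
    }
}
```

Comparing Charset objects avoids repeating the string literals and the double Charset.forName(...).name() call, but either approach should behave the same for names the JVM can resolve.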

2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:

> No, it's a Groovy bug.
>
>     private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>         if ("UTF-16BE".equals(charset)) {
>             writeUtf16Bom(stream, true);
>         } else if ("UTF-16LE".equals(charset)) {
>             writeUtf16Bom(stream, false);
>         }
>     }
>
> should be
>
>     private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>         if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>             writeUtf16Bom(stream, true);
>         } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>             writeUtf16Bom(stream, false);
>         }
>     }
>
> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll probably want
> to fix that regardless of what we decide on the *withPrintWriter*
> question. I'll open a Jira and a PR.
>
> -Keegan
>
> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glafo...@gmail.com>
> wrote:
>
>> From Groovy's point of view (i.e. when you're coding in Groovy), the BOM
>> is automatically discarded when you use one of our reader methods
>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>
>> I tend to think that always having the BOM is a good thing (I even
>> thought it was mandatory), but Groovy should guess the endianness
>> regardless anyway.
>>
>> Happy to hear what others think about all this too, though.
>>
>> Guillaume
>>
>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>
>>> The code as-is today writes the BOM regardless of platform. I just
>>> tested in Linux with the same results. I think there are 2 parts to the
>>> question of "what's the correct behavior?"
>>>
>>> 1. Should the BOM be written at all, particularly when the platform is
>>>    Windows?
>>> 2. Should the behavior of *withPrintWriter* differ (even if the
>>>    difference is to be smarter) from the behavior of *new PrintWriter*?
>>>
>>> *Discussion*
>>>
>>> 1. Strictly speaking, yes, because RFC 2781
>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to assume
>>> big endian if there is no BOM. However, in practice, many applications
>>> disregard the RFC and assume little-endian because that's what Windows
>>> does
>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>> Because of this, the behavior could be changed so that when writing
>>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion, it's
>>> best practice to always write a BOM when working with UTF-16, and Java
>>> should have done this in its implementation of PrintWriter.
>>>
>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing the
>>> smarter, more correct thing, but the typical user would assume it is
>>> just a shorthand convenience for newing up a PrintWriter (I certainly
>>> did). So the question is: is it better to just document this difference in
>>> the GroovyDoc, or to change the behavior to be closer to Java? And if the
>>> latter, what breakages would that cause within Groovy itself? Making that
>>> change could break folks in production, because they could rely on that BOM
>>> being there, for example in cases where the file is created on Windows but
>>> then processed on Linux, or when working with a third-party library that is
>>> more picky about the presence of a BOM.
>>>
>>> -Keegan
>>>
>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glafo...@gmail.com>
>>> wrote:
>>>
>>>> Now... whether that is what should be done is the good question to ask :-)
>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>
>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>
>>>>> I forgot to mention that. Yes, I ran the test mentioned in Windows.
>>>>>
>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glafo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> That's a good question.
>>>>>> I guess this is happening on Windows? (I haven't tried here, since
>>>>>> I'm on OS X)
>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>
>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>
>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>> problems. I was intrigued by this SO question
>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>
>>>>>>> It appears that using withPrintWriter(charset) produces a BOM whereas
>>>>>>> new PrintWriter(file, charset) does not, as demonstrated here:
>>>>>>>
>>>>>>>     File file = new File("tmp.txt")
>>>>>>>     try {
>>>>>>>         String text = " "
>>>>>>>         String charset = "UTF-16LE"
>>>>>>>
>>>>>>>         file.withPrintWriter(charset) { it << text }
>>>>>>>         println "withPrintWriter"
>>>>>>>         file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>
>>>>>>>         PrintWriter w = new PrintWriter(file, charset)
>>>>>>>         w.print(text)
>>>>>>>         w.close()
>>>>>>>         println "\n\nnew PrintWriter"
>>>>>>>         file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>     } finally {
>>>>>>>         file.delete()
>>>>>>>     }
>>>>>>>
>>>>>>> Outputs
>>>>>>>
>>>>>>>     withPrintWriter
>>>>>>>     ff fe 20 00
>>>>>>>
>>>>>>>     new PrintWriter
>>>>>>>     20 00
>>>>>>>
>>>>>>> Is this difference in behavior intentional? It seems kinda odd to
>>>>>>> me.
>>>>>>>
>>>>>>> -Keegan

--
Guillaume Laforge
Groovy Project Manager
Product Ninja & Advocate at Restlet <http://restlet.com>

Blog: http://glaforge.appspot.com/
Social: @glaforge <http://twitter.com/glaforge> / Google+ <https://plus.google.com/u/0/114130972232398734985/posts>