No, it's a Groovy bug.

    private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
        if ("UTF-16BE".equals(charset)) {
            writeUtf16Bom(stream, true);
        } else if ("UTF-16LE".equals(charset)) {
            writeUtf16Bom(stream, false);
        }
    }

should be

    private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
        if ("UTF-16BE".equals(Charset.forName(charset).name())) {
            writeUtf16Bom(stream, true);
        } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
            writeUtf16Bom(stream, false);
        }
    }

in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll probably want to fix that regardless of what we decide on the *withPrintWriter* question. I'll open a Jira and a PR.

-Keegan

On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glafo...@gmail.com> wrote:

> From Groovy's point of view (ie. when you're coding in Groovy), the BOM is
> automatically discarded when you use one of our reader methods (withReader,
> etc), so it's transparent whether the BOM is here or not.
>
> I tend to think that having the BOM always is a good thing (I even thought
> that was mandatory), but Groovy should guess the endianness regardless
> anyway.
>
> Happy to hear what others think too about all this though.
>
> Guillaume
>
>
> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>
>> The code as-is today writes the BOM regardless of platform. I just
>> tested in Linux with the same results. I think there are 2 parts to the
>> question of "what's the correct behavior?"
>>
>> 1. Should the BOM be written at all, particularly when the platform is
>> Windows?
>> 2. Should the behavior of *withPrintWriter* differ (even if the
>> difference is to be smarter) from the behavior of *new PrintWriter*?
>>
>> *Discussion*
>> 1. Strictly speaking, yes. Because RFC 2781
>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to assume big
>> endian if there is no BOM. However, in practice, many applications
>> disregard the RFC and assume little-endian because that's what Windows
>> does
>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>> Because of this, the behavior could be changed so that when writing
>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion, it's
>> best practice to always write a BOM when working with UTF-16, and Java
>> should have done this in their implementation of their PrintWriter.
>>
>> 2. This is a tough one. Arguably, *withPrintWriter* is doing the
>> smarter, more correct behavior, but the typical user would assume this is
>> just a shorthand convenience for newing up a PrintWriter (I certainly
>> did). So the question is, is it better to just document this difference in
>> the GroovyDoc? Or to change the behavior to be closer to Java? And if the
>> latter, what breakages would that cause within Groovy itself? Making that
>> change could break folks in production, because they could rely on that BOM
>> being there, in cases for example where the file is created on Windows, but
>> then processed on Linux or when working with a third party library that is
>> more picky about the presence of a BOM.
>>
>> -Keegan
>>
>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glafo...@gmail.com>
>> wrote:
>>
>>> Now... whether that is what should be done or not is the good question
>>> to ask :-)
>>> Does Windows manage to open UTF-16 files without BOMs?
>>>
>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>
>>>> I forgot to mention that. Yes, I ran the test mentioned in Windows.
>>>>
>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glafo...@gmail.com>
>>>> wrote:
>>>>
>>>>> That's a good question.
>>>>> I guess this is happening on Windows? (I haven't tried here, since I'm
>>>>> on OS X)
>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>
>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>
>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>> problems.
>>>>>> I was intrigued by this SO question
>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>> on UTF 16 BOMs in Java vs Groovy.
>>>>>>
>>>>>> It appears using withPrintWriter(charset) produces a BOM whereas new
>>>>>> PrintWriter(file, charset) does not. As demonstrated here:
>>>>>>
>>>>>> File file = new File("tmp.txt")
>>>>>> try {
>>>>>>     String text = " "
>>>>>>     String charset = "UTF-16LE"
>>>>>>
>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>     println "withPrintWriter"
>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>
>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>     w.print(text)
>>>>>>     w.close()
>>>>>>     println "\n\nnew PrintWriter"
>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>> } finally {
>>>>>>     file.delete()
>>>>>> }
>>>>>>
>>>>>> Outputs
>>>>>>
>>>>>> withPrintWriter
>>>>>> ff fe 20 00
>>>>>>
>>>>>> new PrintWriter
>>>>>> 20 00
>>>>>>
>>>>>>
>>>>>> Is this difference in behavior intentional? It seems kinda odd to me.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Laforge
>>>>> Groovy Project Manager
>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>
>>>>> Blog: http://glaforge.appspot.com/
>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Guillaume Laforge
>>> Groovy Project Manager
>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>
>>> Blog: http://glaforge.appspot.com/
>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>
>>
>>
>
>
> --
> Guillaume Laforge
> Groovy Project Manager
> Product Ninja & Advocate at Restlet <http://restlet.com>
>
> Blog: http://glaforge.appspot.com/
> Social: @glaforge <http://twitter.com/glaforge> / Google+
> <https://plus.google.com/u/0/114130972232398734985/posts>
>
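
P.S. For anyone wondering why the Charset.forName(charset).name() change at the top of this message matters: my assumption is that the problem shows up when the caller passes a charset name in a different case (or as an alias), since a plain String.equals on the raw name won't match. Here is a rough, standalone Java sketch of just that comparison. The class and method names (BomCharsetCheck, isUtf16LeRaw, isUtf16LeCanonical) are made up for illustration and are not part of Groovy.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; none of these names exist in Groovy.
public class BomCharsetCheck {

    // Mirrors the current check: exact, case-sensitive match on the caller's string.
    public static boolean isUtf16LeRaw(String charset) {
        return "UTF-16LE".equals(charset);
    }

    // Mirrors the proposed check: resolve the name to its canonical form first.
    public static boolean isUtf16LeCanonical(String charset) {
        return "UTF-16LE".equals(Charset.forName(charset).name());
    }

    public static void main(String[] args) {
        String requested = "utf-16le"; // charset names are not case-sensitive in Java
        System.out.println("raw match:       " + isUtf16LeRaw(requested));
        System.out.println("canonical match: " + isUtf16LeCanonical(requested));
    }
}
```

With "utf-16le" the raw comparison fails while the canonical one succeeds. One thing to keep in mind with this approach: Charset.forName throws IllegalArgumentException for a null name and UnsupportedCharsetException for an unknown one, whereas the plain equals check quietly returns false in both cases.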