Another point of interest: the current code doesn't respect charset aliases. For example, passing the charset string "UTF_16LE" does not write the BOM, even though it is an alias for "UTF-16LE".
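A minimal sketch of how alias-aware handling could look, assuming the decision is made on the canonical charset name rather than on the raw string. The class and method names here are hypothetical and are not Groovy's actual implementation; only `java.nio.charset.Charset.forName` and `Charset.name` are real JDK calls.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not taken from the Groovy sources.
final class CharsetAliasCheck {

    // Resolve any registered alias (and any letter case) to the canonical charset name.
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    // Compare against the canonical name instead of the caller-supplied string,
    // so that aliases are treated the same way as "UTF-16LE" itself.
    static boolean isUtf16LittleEndian(String charsetName) {
        return "UTF-16LE".equals(canonicalName(charsetName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "UTF_16LE", "utf-16le", "UTF-8"}) {
            System.out.println(name + " -> " + canonicalName(name)
                    + ", little-endian UTF-16: " + isUtf16LittleEndian(name));
        }
    }
}
```

With a check like this, whatever BOM policy is chosen would apply consistently to "UTF-16LE" and its aliases.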
-Keegan

On Jun 8, 2015 5:20 PM, "Keegan Witt" <keeganw...@gmail.com> wrote:

> The code as-is today writes the BOM regardless of platform. I just tested
> in Linux with the same results. I think there are 2 parts to the question
> of "what's the correct behavior?"
>
> 1. Should the BOM be written at all, particularly when the platform is
>    Windows?
> 2. Should the behavior of *withPrintWriter* differ (even if the
>    difference is to be smarter) from the behavior of *new PrintWriter*?
>
> *Discussion*
> 1. Strictly speaking, yes, because RFC 2781
>    <http://tools.ietf.org/html/rfc2781> states in section 4.3 to assume
>    big endian if there is no BOM. However, in practice, many applications
>    disregard the RFC and assume little-endian because that's what Windows
>    does
>    <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>    Because of this, the behavior could be changed so that when writing
>    UTF-16LE on Windows, it doesn't write the BOM. But in my opinion, it's
>    best practice to always write a BOM when working with UTF-16, and Java
>    should have done this in their implementation of their PrintWriter.
>
> 2. This is a tough one. Arguably, *withPrintWriter* is doing the
>    smarter, more correct behavior, but the typical user would assume this
>    is just a shorthand convenience for newing up a PrintWriter (I
>    certainly did). So the question is, is it better to just document this
>    difference in the GroovyDoc? Or to change the behavior to be closer to
>    Java? And if the latter, what breakages would that cause within Groovy
>    itself? Making that change could break folks in production, because
>    they could rely on that BOM being there, for example in cases where
>    the file is created on Windows but then processed on Linux, or when
>    working with a third party library that is more picky about the
>    presence of a BOM.
>
> -Keegan
>
> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glafo...@gmail.com>
> wrote:
>
>> Now... whether that is what should be done or not is the good question
>> to ask :-)
>> Does Windows manage to open UTF-16 files without BOMs?
>>
>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>
>>> I forgot to mention that. Yes, I ran the test mentioned in Windows.
>>>
>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glafo...@gmail.com>
>>> wrote:
>>>
>>>> That's a good question.
>>>> I guess this is happening on Windows? (I haven't tried here, since I'm
>>>> on OS X)
>>>> I think BOMs were mandatory in text files on Windows.
>>>>
>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>
>>>>> I've always taken a perverse pleasure in character encoding problems.
>>>>> I was intrigued by this SO question
>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>> on UTF 16 BOMs in Java vs Groovy.
>>>>>
>>>>> It appears using withPrintWriter(charset) produces a BOM whereas new
>>>>> PrintWriter(file, charset) does not. As demonstrated here:
>>>>>
>>>>> File file = new File("tmp.txt")
>>>>> try {
>>>>>     String text = " "
>>>>>     String charset = "UTF-16LE"
>>>>>
>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>     println "withPrintWriter"
>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>
>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>     w.print(text)
>>>>>     w.close()
>>>>>     println "\n\nnew PrintWriter"
>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>> } finally {
>>>>>     file.delete()
>>>>> }
>>>>>
>>>>> Outputs
>>>>>
>>>>> withPrintWriter
>>>>> ff fe 20 00
>>>>>
>>>>> new PrintWriter
>>>>> 20 00
>>>>>
>>>>> Is this difference in behavior intentional? It seems kinda odd to me.
>>>>>
>>>>> -Keegan
>>>>
>>>> --
>>>> Guillaume Laforge
>>>> Groovy Project Manager
>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>
>>>> Blog: http://glaforge.appspot.com/
>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
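
For reference, the RFC 2781 section 4.3 rule mentioned in the quoted discussion (text labelled plain "UTF-16" with no BOM should be read as big-endian) can be sketched from the reader's side as below. This is an illustrative sketch only: the class and method names are hypothetical, and the sample byte arrays are the two outputs shown in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reader-side helper for illustration only.
final class Utf16ByteOrder {

    // Pick a byte order for UTF-16 data: honor a BOM if one is present,
    // otherwise fall back to big-endian as RFC 2781 section 4.3 recommends.
    static Charset detect(byte[] data) {
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // BOM FF FE: little-endian
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // BOM FE FF: big-endian
            }
        }
        return StandardCharsets.UTF_16BE; // no BOM: assume big-endian
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // withPrintWriter output
        byte[] withoutBom = {0x20, 0x00};                        // new PrintWriter output
        System.out.println("with BOM:    " + detect(withBom));
        System.out.println("without BOM: " + detect(withoutBom));
    }
}
```

A strictly RFC-following reader would therefore misread the BOM-less `20 00` file as big-endian, which is the interoperability risk raised in the thread.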