Cool. I'll wait for PR 36 to be merged first, because I was also thinking the Javadoc would need to change from

    is "UTF-16BE" or "UTF-16LE"

to

    is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
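To make "an equivalent alias" concrete, here is a minimal Java sketch of the canonical-name lookup that the fix relies on. It is not the Groovy source: the class name CharsetAliasSketch and the helpers canonicalName and needsUtf16Bom are made up for illustration, and only Charset.forName, Charset.name and Charset.isSupported are real JDK calls.

```java
import java.nio.charset.Charset;

public class CharsetAliasSketch {

    /** Returns the canonical charset name for any supported name or alias. */
    public static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    /** Hypothetical helper: true when the name resolves to UTF-16BE or UTF-16LE. */
    public static boolean needsUtf16Bom(String charsetName) {
        String canonical = canonicalName(charsetName);
        return "UTF-16BE".equals(canonical) || "UTF-16LE".equals(canonical);
    }

    public static void main(String[] args) {
        // Charset names are case-insensitive, so comparing the raw string
        // against "UTF-16LE" would miss "utf-16le". The third entry is an
        // alias that may or may not exist on a given JVM, hence the guard.
        String[] names = {"UTF-16LE", "utf-16le", "UnicodeLittleUnmarked", "UTF-8"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + " -> not supported on this JVM");
                continue;
            }
            System.out.println(name + " -> " + canonicalName(name)
                    + ", needs UTF-16 BOM: " + needsUtf16Bom(name));
        }
    }
}
```

The lowercase name resolves to UTF-16LE because charset names are case-insensitive. Which other aliases are recognized depends on the JVM, which is why the Javadoc wording stays generic.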

-Keegan

On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glafo...@gmail.com> wrote:

> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>
>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>
> Cool!
>
>> How would you feel about a PR to copy the Javadoc comment mentioning the
>> UTF-16 BOM on File.newWriter to all the other methods that use
>> writeUTF16BomIfRequired (at least until we decide we're going to change
>> the current behavior)?
>
> Right, worth it!
>
>> -Keegan
>>
>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glafo...@gmail.com>
>> wrote:
>>
>>> Good point!
>>>
>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>
>>>> That's only available in Java 7. Isn't Groovy still targeting 1.6 for
>>>> the non-indy version?
>>>>
>>>> -Keegan
>>>>
>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com> wrote:
>>>>
>>>>> Well spotted!
>>>>>
>>>>> You could also compare with StandardCharsets, instead of going
>>>>> through the name comparison:
>>>>>
>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>
>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>
>>>>>> No, it's a Groovy bug.
>>>>>>
>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>         writeUtf16Bom(stream, true);
>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>         writeUtf16Bom(stream, false);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> should be
>>>>>>
>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>         writeUtf16Bom(stream, true);
>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>         writeUtf16Bom(stream, false);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll probably
>>>>>> want to fix that regardless of what we decide on the
>>>>>> *withPrintWriter* question. I'll open a Jira and a PR.
>>>>>>
>>>>>> -Keegan
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glafo...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy), the
>>>>>>> BOM is automatically discarded when you use one of our reader methods
>>>>>>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>>>>>>
>>>>>>> I tend to think that always having the BOM is a good thing (I even
>>>>>>> thought that was mandatory), but Groovy should guess the endianness
>>>>>>> regardless anyway.
>>>>>>>
>>>>>>> Happy to hear what others think about all this too, though.
>>>>>>>
>>>>>>> Guillaume
>>>>>>>
>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>
>>>>>>>> The code as-is today writes the BOM regardless of platform. I just
>>>>>>>> tested in Linux with the same results. I think there are 2 parts to
>>>>>>>> the question of "what's the correct behavior?"
>>>>>>>>
>>>>>>>> 1. Should the BOM be written at all, particularly when the
>>>>>>>> platform is Windows?
>>>>>>>> 2. Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>> difference is to be smarter) from the behavior of
>>>>>>>> *new PrintWriter*?
>>>>>>>>
>>>>>>>> *Discussion*
>>>>>>>> 1. Strictly speaking, yes. Because RFC 2781
>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>> assume big endian if there is no BOM. However, in practice, many
>>>>>>>> applications disregard the RFC and assume little-endian because
>>>>>>>> that's what Windows does
>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion, it's
>>>>>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>>>>>> should have done this in their implementation of their PrintWriter.
>>>>>>>>
>>>>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing the
>>>>>>>> smarter, more correct behavior, but the typical user would assume
>>>>>>>> this is just a shorthand convenience for newing up a PrintWriter
>>>>>>>> (I certainly did). So the question is, is it better to just document
>>>>>>>> this difference in the GroovyDoc? Or to change the behavior to be
>>>>>>>> closer to Java? And if the latter, what breakages would that cause
>>>>>>>> within Groovy itself? Making that change could break folks in
>>>>>>>> production, because they could rely on that BOM being there, in
>>>>>>>> cases for example where the file is created on Windows, but then
>>>>>>>> processed on Linux or when working with a third party library that
>>>>>>>> is more picky about the presence of a BOM.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge
>>>>>>>> <glafo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Now...
>>>>>>>>> whether that is what should be done or not is the real question
>>>>>>>>> to ask :-)
>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>
>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> I forgot to mention that. Yes, I ran the test mentioned on
>>>>>>>>>> Windows.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge
>>>>>>>>>> <glafo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> That's a good question.
>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>
>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>> problems. I was intrigued by this SO question
>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>> on UTF-16 BOMs in Java vs Groovy.
>>>>>>>>>>>>
>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not.
>>>>>>>>>>>> As demonstrated here:
>>>>>>>>>>>>
>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>> try {
>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>
>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>
>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>> } finally {
>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Outputs
>>>>>>>>>>>>
>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>
>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>> 20 00
>>>>>>>>>>>>
>>>>>>>>>>>> Is this difference in behavior intentional? It seems kinda odd
>>>>>>>>>>>> to me.
>>>>>>>>>>>>
>>>>>>>>>>>> -Keegan
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>
>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
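
For reference, the two byte dumps quoted in the thread can be reproduced with plain Java and no Groovy on the classpath. This is only a sketch under stated assumptions: the class name Utf16BomSketch and the helpers encode and hex are invented for illustration, and the manual BOM write merely imitates the output reported for withPrintWriter; it is not copied from Groovy's writeUtf16Bom.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf16BomSketch {

    /**
     * Encodes text through a JDK writer for the given charset name.
     * When writeLeBom is true, the little-endian BOM (ff fe) is written
     * to the stream by hand before any text.
     */
    public static byte[] encode(String text, String charsetName, boolean writeLeBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (writeLeBom) {
                out.write(0xFF);
                out.write(0xFE);
            }
            Writer writer = new OutputStreamWriter(out, charsetName);
            writer.write(text);
            writer.close();
        } catch (IOException e) {
            // Not expected for an in-memory stream and a supported charset.
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** Formats bytes as space-separated two-digit lowercase hex. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16LE, no BOM:    " + hex(encode(" ", "UTF-16LE", false)));
        System.out.println("UTF-16LE, BOM first: " + hex(encode(" ", "UTF-16LE", true)));
        System.out.println("UTF-16:              " + hex(encode(" ", "UTF-16", false)));
    }
}
```

The first two lines print 20 00 and ff fe 20 00, matching the dumps above. The third prints fe ff 00 20: when asked for plain "UTF-16", the JDK encoder writes a big-endian BOM itself, and it only omits the BOM for the explicit UTF-16BE and UTF-16LE names.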