I'm wondering whether the write methods for Path in NioGroovyMethods should do the same.
Cheers,
Paolo

On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganw...@gmail.com> wrote:

> Cool. I'll wait for PR 36 to be merged first, because I also was thinking
> the Javadoc would be changed from
> is "UTF-16BE" or "UTF-16LE"
> to
> is "UTF-16BE" or "UTF-16LE" (or an equivalent alias)
>
> -Keegan
>
>
> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge <glafo...@gmail.com>
> wrote:
>
>>
>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>
>>> Created GROOVY-7461 <https://issues.apache.org/jira/browse/GROOVY-7461>
>>> and PR 36 <https://github.com/apache/incubator-groovy/pull/36>.
>>>
>>
>> Cool!
>>
>>
>>> How would you feel about a PR to copy the Javadoc comment mentioning the
>>> UTF-16 BOM on File.newWriter to all the other methods that use
>>> writeUTF16BomIfRequired (at least until we decide we're going to change
>>> the current behavior)?
>>>
>>
>> Right, worth it!
>>
>>
>>>
>>> -Keegan
>>>
>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge <glafo...@gmail.com>
>>> wrote:
>>>
>>>> Good point!
>>>>
>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>
>>>>> That's only available in Java 7. Isn't Groovy still targeting 1.6 for
>>>>> the non-indy version?
>>>>>
>>>>> -Keegan
>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Well spotted!
>>>>>>
>>>>>> You could also compare with the StandardCharset, instead of going
>>>>>> through the name comparison:
>>>>>>
>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>>>>>>
>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>
>>>>>>> No, it's a Groovy bug.
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(charset)) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> should be
>>>>>>>
>>>>>>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>>>>>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, true);
>>>>>>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>>>>>>         writeUtf16Bom(stream, false);
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll
>>>>>>> probably want to fix that regardless of what we decide on the
>>>>>>> *withPrintWriter* question. I'll open a Jira and a PR.
>>>>>>>
>>>>>>> -Keegan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <
>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> From Groovy's point of view (i.e. when you're coding in Groovy), the
>>>>>>>> BOM is automatically discarded when you use one of our reader methods
>>>>>>>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>>>>>>>
>>>>>>>> I tend to think that always having the BOM is a good thing (I even
>>>>>>>> thought that was mandatory), but Groovy should guess the endianness
>>>>>>>> regardless anyway.
>>>>>>>>
>>>>>>>> Happy to hear what others think too about all this though.
>>>>>>>>
>>>>>>>> Guillaume
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>
>>>>>>>>> The code as-is today writes the BOM regardless of platform. I
>>>>>>>>> just tested in Linux with the same results. I think there are 2
>>>>>>>>> parts to the question of "what's the correct behavior?"
>>>>>>>>>
>>>>>>>>> 1. Should the BOM be written at all, particularly when the
>>>>>>>>> platform is Windows?
>>>>>>>>> 2. Should the behavior of *withPrintWriter* differ (even if the
>>>>>>>>> difference is to be smarter) from the behavior of *new
>>>>>>>>> PrintWriter*?
>>>>>>>>>
>>>>>>>>> *Discussion*
>>>>>>>>> 1. Strictly speaking, yes, because RFC 2781
>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 to
>>>>>>>>> assume big endian if there is no BOM. However, in practice, many
>>>>>>>>> applications disregard the RFC and assume little-endian because
>>>>>>>>> that's what Windows does
>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>>>>>>> Because of this, the behavior could be changed so that when writing
>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM. But in my opinion,
>>>>>>>>> it's best practice to always write a BOM when working with UTF-16,
>>>>>>>>> and Java should have done this in their implementation of their
>>>>>>>>> PrintWriter.
>>>>>>>>>
>>>>>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is doing
>>>>>>>>> the smarter, more correct behavior, but the typical user would
>>>>>>>>> assume this is just a shorthand convenience for newing up a
>>>>>>>>> PrintWriter (I certainly did). So the question is, is it better to
>>>>>>>>> just document this difference in the GroovyDoc? Or to change the
>>>>>>>>> behavior to be closer to Java? And if the latter, what breakages
>>>>>>>>> would that cause within Groovy itself? Making that change could
>>>>>>>>> break folks in production, because they could rely on that BOM
>>>>>>>>> being there, in cases for example where the file is created on
>>>>>>>>> Windows, but then processed on Linux or when working with a third
>>>>>>>>> party library that is more picky about the presence of a BOM.
>>>>>>>>>
>>>>>>>>> -Keegan
>>>>>>>>>
>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <
>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Now... whether that is what should be done or not is the good
>>>>>>>>>> question to ask :-)
>>>>>>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>>>>>>
>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> I forgot to mention that. Yes, I ran the test mentioned in
>>>>>>>>>>> Windows.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <
>>>>>>>>>>> glafo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's a good question.
>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried here,
>>>>>>>>>>>> since I'm on OS X)
>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganw...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>>>>>>> problems. I was intrigued by this SO question
>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char>
>>>>>>>>>>>>> on UTF 16 BOMs in Java vs Groovy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a BOM
>>>>>>>>>>>>> whereas new PrintWriter(file, charset) does not. As
>>>>>>>>>>>>> demonstrated here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> File file = new File("tmp.txt")
>>>>>>>>>>>>> try {
>>>>>>>>>>>>>     String text = " "
>>>>>>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>>>>>>
>>>>>>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>>>>>>     println "withPrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>>>>>>     w.print(text)
>>>>>>>>>>>>>     w.close()
>>>>>>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it) }
>>>>>>>>>>>>> } finally {
>>>>>>>>>>>>>     file.delete()
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Outputs
>>>>>>>>>>>>>
>>>>>>>>>>>>> withPrintWriter
>>>>>>>>>>>>> ff fe 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>> new PrintWriter
>>>>>>>>>>>>> 20 00
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this difference in behavior intentional? It seems kinda
>>>>>>>>>>>>> odd to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Keegan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Guillaume Laforge
>>>>>>>>>>>> Groovy Project Manager
>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
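
For anyone skimming the thread, here is a minimal, self-contained Java sketch of the alias-tolerant check proposed above. It is only an illustration, not the code in ResourceGroovyMethods: the class name Utf16BomSketch and the bomHex helper are invented for this example, and writeUtf16Bom is a hypothetical stand-in for the private helper of the same name mentioned in the thread. The idea is to resolve the caller's charset string through Charset.forName(...).name() so that case variants and aliases map to the canonical name before comparing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class Utf16BomSketch {

    // Hypothetical stand-in for the private writeUtf16Bom helper named in the thread.
    static void writeUtf16Bom(OutputStream stream, boolean bigEndian) throws IOException {
        if (bigEndian) {
            stream.write(0xFE);
            stream.write(0xFF);
        } else {
            stream.write(0xFF);
            stream.write(0xFE);
        }
    }

    // Same shape as the fix quoted in the thread: compare canonical charset names,
    // not the raw string the caller passed in.
    static void writeUTF16BomIfRequired(String charset, OutputStream stream) throws IOException {
        String canonical = Charset.forName(charset).name();
        if ("UTF-16BE".equals(canonical)) {
            writeUtf16Bom(stream, true);
        } else if ("UTF-16LE".equals(canonical)) {
            writeUtf16Bom(stream, false);
        }
    }

    // Illustration-only helper: hex dump of the BOM (if any) written for a charset name.
    static String bomHex(String charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeUTF16BomIfRequired(charset, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "utf-16le", "UTF-16BE", "UTF-8"}) {
            System.out.println(name + " -> [" + bomHex(name) + "]");
        }
    }
}
```

Because charset names are not case-sensitive, the lowercase "utf-16le" takes the same branch as "UTF-16LE" here, whereas a plain string comparison against the raw argument would skip the BOM for it. The sketch uses UncheckedIOException for brevity, so it assumes Java 8 or later.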