So let's say, perhaps, we don't generate a BOM, unless asked specifically... but not with new methods, but with new parameters to such methods. In addition to specifying a charset, we could also pass a boolean saying we want a BOM to be generated (false by default, needs to be specified as true if BOM wanted) ?
2015-06-09 21:47 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: > I get that -- and I wish JDK did the same. But what bothers me most about > the current state is that sometimes it's transparent, sometimes it's not -- > depending on how it was invoked. And while we could fix the new instance > usage too with metaClass, that could lead to weird inconsistencies when > Groovy is invoked from Java. > > I really think most users would not expect these two usages to behave > differently. I think most would expect the difference to be stylistic > only. So as much as it pains me to say this, I think it's better not to > violate the principle of least surprise, and remain consistent across all > styles of invocation with Java's poor life choices. > > But maybe the friendlier APIs can be moved into new methods, such as new > BomAwareWriter() / WithBomAwareWriter{} What do you think? If we did > that, I guess it'd be consistent to do the same for the readers as well. > > -Keegan > > On Tue, Jun 9, 2015 at 3:22 PM, Guillaume Laforge <glafo...@gmail.com> > wrote: > >> >> 2015-06-09 18:57 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: >> >>> I created PR 37 <https://github.com/apache/incubator-groovy/pull/37> to >>> correct the JavaDoc I mentioned (as well as to document the existing >>> behavior for the non-NIO methods). >>> >>> Java doesn't eat the BOM, but this is a problem Java folks are used to >>> dealing with, and why things like Apache Common-IO's BOMInputStream >>> <https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html> >>> exist. >>> >> >> That's also why I made Groovy eat the BOM too, so that it's transparent >> to our users :-) >> But that was a long time ago since I worked on those parts of the >> codebase, and it's been refactored quite a bit (by Jim for example). >> >> >>> >>> -Keegan >>> >>> On Tue, Jun 9, 2015 at 11:33 AM, Guillaume Laforge <glafo...@gmail.com> >>> wrote: >>> >>>> So now, how to decide what's best? :-) >>>> >>>> Is a Java reader happy with the BOM? and eats it transparently? (I >>>> think in the past that wasn't the case but I may be wrong) >>>> >>>> 2015-06-09 17:21 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: >>>> >>>>> That's an excellent point, Paolo. NioGroovyMethods.newWriter claims >>>>> (in the JavaDoc) it will write the BOM if needed, but it doesn't because >>>>> it >>>>> uses Java's implementation rather than with Groovy's >>>>> writeUTF16BomIfRequired. None of the methods in NioGroovyMethods use >>>>> writeUTF16BomIfRequired. >>>>> >>>>> Whichever we decide, we should be consistent. >>>>> >>>>> -Keegan >>>>> >>>>> On Tue, Jun 9, 2015 at 11:08 AM, Paolo Di Tommaso < >>>>> paolo.ditomm...@gmail.com> wrote: >>>>> >>>>>> I'm wondering if NioGroovyMethods that implement the write methods >>>>>> for Path should do the same. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Paolo >>>>>> >>>>>> >>>>>> On Tue, Jun 9, 2015 at 4:02 PM, Keegan Witt <keeganw...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Cool. I'll wait for PR 36 to be merged first, because I also was >>>>>>> thinking the Javadoc would be changed from >>>>>>> is "UTF-16BE" or "UTF-16LE" >>>>>>> to >>>>>>> is "UTF-16BE" or "UTF-16LE" (or an equivalent alias) >>>>>>> >>>>>>> -Keegan >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 9, 2015 at 9:08 AM, Guillaume Laforge < >>>>>>> glafo...@gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> 2015-06-09 15:04 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: >>>>>>>> >>>>>>>>> Created GROOVY-7461 >>>>>>>>> <https://issues.apache.org/jira/browse/GROOVY-7461> and PR 36 >>>>>>>>> <https://github.com/apache/incubator-groovy/pull/36>. >>>>>>>>> >>>>>>>> >>>>>>>> Cool! >>>>>>>> >>>>>>>> >>>>>>>>> How would you feel about a PR to copy the Javadoc comment >>>>>>>>> mentioning the UTF-16 BOM on File.newWriter to all the other >>>>>>>>> methods that use writeUTF16BomIfRequired (at least until we >>>>>>>>> decide we're going to change the current behavior)? >>>>>>>>> >>>>>>>> >>>>>>>> Right, worth it! >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -Keegan >>>>>>>>> >>>>>>>>> On Tue, Jun 9, 2015 at 8:17 AM, Guillaume Laforge < >>>>>>>>> glafo...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Good point! >>>>>>>>>> >>>>>>>>>> 2015-06-09 14:11 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: >>>>>>>>>> >>>>>>>>>>> That's only available in Java 7. Isn't Groovy still targeting >>>>>>>>>>> 1.6 for the non-indy version? >>>>>>>>>>> >>>>>>>>>>> -Keegan >>>>>>>>>>> On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glafo...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Well spotted! >>>>>>>>>>>> >>>>>>>>>>>> You could also compare with the StandardCharset, instead of >>>>>>>>>>>> going through the name comparison: >>>>>>>>>>>> >>>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html >>>>>>>>>>>> >>>>>>>>>>>> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganw...@gmail.com>: >>>>>>>>>>>> >>>>>>>>>>>>> No, it's a Groovy bug. >>>>>>>>>>>>> >>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, >>>>>>>>>>>>> final OutputStream stream) throws IOException { >>>>>>>>>>>>> if ("UTF-16BE".equals(charset)) { >>>>>>>>>>>>> writeUtf16Bom(stream, true); >>>>>>>>>>>>> } else if ("UTF-16LE".equals(charset)) { >>>>>>>>>>>>> writeUtf16Bom(stream, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> should be >>>>>>>>>>>>> >>>>>>>>>>>>> private static void writeUTF16BomIfRequired(final String charset, >>>>>>>>>>>>> final OutputStream stream) throws IOException { >>>>>>>>>>>>> if ("UTF-16BE".equals(Charset.forName(charset).name())) { >>>>>>>>>>>>> writeUtf16Bom(stream, true); >>>>>>>>>>>>> } else if >>>>>>>>>>>>> ("UTF-16LE".equals(Charset.forName(charset).name())) { >>>>>>>>>>>>> writeUtf16Bom(stream, false); >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> in org.codehaus.groovy.runtime.ResourceGroovyMethods. We'll >>>>>>>>>>>>> probably want to fix that regardless of what we decide on the >>>>>>>>>>>>> *withPrintWriter* question. I'll open a Jira and a PR. >>>>>>>>>>>>> >>>>>>>>>>>>> -Keegan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge < >>>>>>>>>>>>> glafo...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> From Groovy's point of view (ie. when you're coding in >>>>>>>>>>>>>> Groovy), the BOM is automatically discarded when you use one of >>>>>>>>>>>>>> our reader >>>>>>>>>>>>>> methods (withReader, etc), so it's transparent whether the BOM >>>>>>>>>>>>>> is here or >>>>>>>>>>>>>> not. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I tend to think that having the BOM always is a good thing (I >>>>>>>>>>>>>> even thought that was mandatory), but Groovy should guess the >>>>>>>>>>>>>> endianness >>>>>>>>>>>>>> regardless anyway. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Happy to hear what others think too about all this though. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guillaume >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganw...@gmail.com> >>>>>>>>>>>>>> : >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The code as-is today writes the BOM regardless of platform. >>>>>>>>>>>>>>> I just tested in Linux with the same results. I think there >>>>>>>>>>>>>>> are 2 parts to >>>>>>>>>>>>>>> the question of "what's the correct behavior?" >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. Should the BOM be written at all, particularly when the >>>>>>>>>>>>>>> platform is Windows? >>>>>>>>>>>>>>> 2. Should the behavior of *withPrintWriter* differ (even >>>>>>>>>>>>>>> if the difference is to be smarter) from the behavior of *new >>>>>>>>>>>>>>> PrintWriter*? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Discussion* >>>>>>>>>>>>>>> 1. Strictly speaking, yes. Because RFC 2781 >>>>>>>>>>>>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 >>>>>>>>>>>>>>> to assume big endian if there is no BOM. However, in practice, >>>>>>>>>>>>>>> many >>>>>>>>>>>>>>> applications disregard the RFC and assume little-endian because >>>>>>>>>>>>>>> that's what Windows >>>>>>>>>>>>>>> does >>>>>>>>>>>>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>. >>>>>>>>>>>>>>> Because of this, the behavior could be changed so that when >>>>>>>>>>>>>>> writing >>>>>>>>>>>>>>> UTF-16LE on Windows, it doesn't write the BOM. But in my >>>>>>>>>>>>>>> opinion, it's >>>>>>>>>>>>>>> best practice to always write a BOM when working with UTF-16, >>>>>>>>>>>>>>> and Java >>>>>>>>>>>>>>> should have done this in their implementation of their >>>>>>>>>>>>>>> PrintWriter. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. This is a tough one. Arguably, *withPrintWriter* is >>>>>>>>>>>>>>> doing the smarter, more correct behavior, but the typical user >>>>>>>>>>>>>>> would assume >>>>>>>>>>>>>>> this is just a shorthand convenience for newing up a >>>>>>>>>>>>>>> PrintWriter (I >>>>>>>>>>>>>>> certainly did). So the question is, is it better to just >>>>>>>>>>>>>>> document this >>>>>>>>>>>>>>> difference in the GroovyDoc? Or to change the behavior to be >>>>>>>>>>>>>>> closer to >>>>>>>>>>>>>>> Java? And if the latter, what breakages would that cause >>>>>>>>>>>>>>> within Groovy >>>>>>>>>>>>>>> itself? Making that change could break folks in production, >>>>>>>>>>>>>>> because they >>>>>>>>>>>>>>> could rely on that BOM being there, in cases for example where >>>>>>>>>>>>>>> the file is >>>>>>>>>>>>>>> created on Windows, but then processed on Linux or when working >>>>>>>>>>>>>>> with a >>>>>>>>>>>>>>> third party library that is more picky about the presence of a >>>>>>>>>>>>>>> BOM. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Keegan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge < >>>>>>>>>>>>>>> glafo...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now... is it what should be done or not is the good >>>>>>>>>>>>>>>> question to ask :-) >>>>>>>>>>>>>>>> Does Windows manages to open UTF-16 files without BOMs? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt < >>>>>>>>>>>>>>>> keeganw...@gmail.com>: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I forgot to mention that. Yes, I ran the test mentioned >>>>>>>>>>>>>>>>> in Windows. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge < >>>>>>>>>>>>>>>>> glafo...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> That's a good question. >>>>>>>>>>>>>>>>>> I guess this is happening on Windows? (I haven't tried >>>>>>>>>>>>>>>>>> here, since I'm on OS X) >>>>>>>>>>>>>>>>>> I think BOMs were mandatory in text files on Windows. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt < >>>>>>>>>>>>>>>>>> keeganw...@gmail.com>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I've always taken a perverse pleasure in character >>>>>>>>>>>>>>>>>>> encoding problems. I was intrigued by this SO question >>>>>>>>>>>>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> >>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>> UTF 16 BOMs in Java vs Groovy. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It appears using withPrintWriter(charset) produces a >>>>>>>>>>>>>>>>>>> BOM whereas new PrintWriter(file, charset) does not. >>>>>>>>>>>>>>>>>>> As demonstrated here: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> File file = new File("tmp.txt")try { >>>>>>>>>>>>>>>>>>> String text = " " >>>>>>>>>>>>>>>>>>> String charset = "UTF-16LE" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> file.withPrintWriter(charset) { it << text } >>>>>>>>>>>>>>>>>>> println "withPrintWriter" >>>>>>>>>>>>>>>>>>> file.getBytes().each { System.out.format("%02x ", it) } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> PrintWriter w = new PrintWriter(file, charset) >>>>>>>>>>>>>>>>>>> w.print(text) >>>>>>>>>>>>>>>>>>> w.close() >>>>>>>>>>>>>>>>>>> println "\n\nnew PrintWriter" >>>>>>>>>>>>>>>>>>> file.getBytes().each { System.out.format("%02x ", it) >>>>>>>>>>>>>>>>>>> }} finally { >>>>>>>>>>>>>>>>>>> file.delete()} >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Outputs >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> withPrintWriter >>>>>>>>>>>>>>>>>>> ff fe 20 00 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> new PrintWriter >>>>>>>>>>>>>>>>>>> 20 00 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is this difference in behavior intentional? It seems >>>>>>>>>>>>>>>>>>> kinda odd to me. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -Keegan >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Guillaume Laforge >>>>>>>>>>>>>>>>>> Groovy Project Manager >>>>>>>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Guillaume Laforge >>>>>>>>>>>>>>>> Groovy Project Manager >>>>>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Guillaume Laforge >>>>>>>>>>>>>> Groovy Project Manager >>>>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Guillaume Laforge >>>>>>>>>>>> Groovy Project Manager >>>>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>>>>>> >>>>>>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Laforge >>>>>>>>>> Groovy Project Manager >>>>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>>>> >>>>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Laforge >>>>>>>> Groovy Project Manager >>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>>>>>> >>>>>>>> Blog: http://glaforge.appspot.com/ >>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Guillaume Laforge >>>> Groovy Project Manager >>>> Product Ninja & Advocate at Restlet <http://restlet.com> >>>> >>>> Blog: http://glaforge.appspot.com/ >>>> Social: @glaforge <http://twitter.com/glaforge> / Google+ >>>> <https://plus.google.com/u/0/114130972232398734985/posts> >>>> >>> >>> >> >> >> -- >> Guillaume Laforge >> Groovy Project Manager >> Product Ninja & Advocate at Restlet <http://restlet.com> >> >> Blog: http://glaforge.appspot.com/ >> Social: @glaforge <http://twitter.com/glaforge> / Google+ >> <https://plus.google.com/u/0/114130972232398734985/posts> >> > > -- Guillaume Laforge Groovy Project Manager Product Ninja & Advocate at Restlet <http://restlet.com> Blog: http://glaforge.appspot.com/ Social: @glaforge <http://twitter.com/glaforge> / Google+ <https://plus.google.com/u/0/114130972232398734985/posts>