Re: Java source encoding (was Re: [RTF] Jfor integration)
Victor Mote wrote:

Peter B. West wrote:

(I wouldn't say it was heated.) I am curious about the impact of someone working without any formal IDE, and just using (X)Emacs and JDEE for development. As far as I know, XEmacs does not support Unicode, but if the non-ASCII characters were restricted to comments, and XEmacs thought it was dealing with ISO-8859-15, would there be any actual problems? ASCII nulls aren't going to appear in such UTF-8, are they?

As long as 1) the editor doesn't think it needs to change the content when opening or saving the file, and 2) the non-ASCII characters don't mess up the editor's display, I don't think there is any problem. The first issue is easy to test by opening the file, saving it as something else, and diffing. Since control characters have the same code points in ASCII and UTF-8, the second problem should be a non-issue.

I never bit the emacs bullet, and don't directly know the impact there. In vi and Notepad (as in most other non-Unicode editors), you'll see the two (or more) bytes displayed in their single-byte forms, which makes sense. I had to look it up to be sure it is true for control characters as well, but the UTF-8 range 00-7F always represents a single-byte character, so there should be no ASCII nulls (at least not as the result of an ASCII-to-UTF-8 conversion).

Yes. Now that I think a bit more about it, UTF-8 guarantees that all non-ASCII characters will have the 8th bit set; that's the success of the encoding (which is a delight, btw). So not only will there be no NULs, but no TAB, LF, CR, etc.

The affected changes can be seen at: http://marc.theaimsgroup.com/?l=fop-cvsm=105647684725575w=2

Of course, the files affected are listed there as well if you would like to test them in your favorite editor.
It may be true that it is safer in the short term to either 1) eliminate such characters, or 2) encode them in the \u format, but I think it probably makes more sense to simply say that we all need to work in a Unicode-aware environment or at least a non-Unicode-hostile one. Either way, this is probably something that we should document in the style guide.

As far as comments are concerned, we can say "Don't do it", because the comment is not going to be readable in any editor that is not Unicode capable, and \u escapes make no sense in a comment. For code, just use the \u form if it is necessary.

There are examples in alt.design's org/apache/fop/datatypes/CountryLanguageScript.java, generated(!) from xml-lang.xsl and xml-lang.xml, currently in the conf directory. The language codes from ISO 639-2T, ISO 639-2B and ISO 639-1 include the French name. http://www.loc.gov/standards/iso639-2/langhome.html presents ISO 639-2 in four tables, sorted by English name, French name, bibliographic code and terminology code respectively. I included the French names in the XML, and in the generated code, although I have not done the same for script or country codes. The easiest way out is probably to remove the French names, but I am loath to do that.

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]
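The property discussed in this message (every byte of a non-ASCII character has the 8th bit set, so no stray NUL, TAB, LF or CR bytes appear) can be checked directly. The following is a minimal sketch, not anything from the FOP code base; the class name `Utf8HighBitCheck`, its method names and the sample text are made up for illustration, and it assumes well-formed input without unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

final class Utf8HighBitCheck {

    /**
     * Returns true if every UTF-8 byte that belongs to a non-ASCII
     * character has its 8th bit set. Assumes well-formed text.
     */
    static boolean nonAsciiBytesHaveHighBit(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint > 0x7F) {
                String single = new String(Character.toChars(codePoint));
                for (byte b : single.getBytes(StandardCharsets.UTF_8)) {
                    if ((b & 0x80) == 0) {
                        return false; // a byte that could be mistaken for ASCII
                    }
                }
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    /** Counts bytes below 0x20 (NUL, TAB, LF, CR, ...) in the UTF-8 form. */
    static int controlByteCount(String text) {
        int count = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0xFF) < 0x20) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample: c-cedilla, a-umlaut and the euro sign,
        // written as escapes so this file itself stays pure ASCII.
        String sample = "fran\u00e7ais \u00e4 \u20ac";
        System.out.println("high bit set on all non-ASCII bytes: "
                + nonAsciiBytesHaveHighBit(sample));
        System.out.println("control bytes introduced: " + controlByteCount(sample));
    }
}
```

Control bytes show up in the count only when the text itself contains control characters, which is the point Peter makes above.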
Re: Java source encoding (was Re: [RTF] Jfor integration)
Victor Mote wrote:

I never bit the emacs bullet, and don't directly know the impact there.

GNU emacs 21 knows about UTF-8. Unfortunately, the NT port seems to be less rock solid than usual; I got the first emacs crashes since I abandoned Solaris 2.1, well, let's say an epoch or two ago.

BTW, I notice the absence of you and Peter from http://cvs.apache.org/~sgala/nightmap.html

J.Pietschmann
Re: Java source encoding (was Re: [RTF] Jfor integration)
J.Pietschmann wrote: Victor Mote wrote:

BTW, I notice the absence of you and Peter from http://cvs.apache.org/~sgala/nightmap.html

Ok, so how do we drive this thing? Is it zoomable? I couldn't find pietsch on there either.

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
RE: Java source encoding (was Re: [RTF] Jfor integration)
J.Pietschmann wrote:

BTW, I notice the absence of you and Peter from http://cvs.apache.org/~sgala/nightmap.html

OK, I should be on there the next time the map is regenerated. Now that you guys know how to get here, come on over!

Victor Mote
RE: Java source encoding (was Re: [RTF] Jfor integration)
Peter B. West wrote:

Ok, so how do we drive this thing? Is it zoomable? I couldn't find pietsch on there either.

Sorry, I meant to show the path (it took a while to figure out):

Joerg gave us: http://cvs.apache.org/~sgala/nightmap.html

Then (hoping to find something instructive) look at: http://cvs.apache.org/~sgala

You'll have to view the source to see the intended URL underneath: http://cvs.apache.org/~dirkx/sgala.html

Instructions are down at the bottom. A couple of gotchas:

1. committers is a CVS module in /home/cvs
2. The FAQ to which he refers is in the committers/krell

Victor Mote
Re: Java source encoding (was Re: [RTF] Jfor integration)
Bertrand Delacretaz wrote:

Sure - it is by accident that comments in the jfor source code contain non-ASCII chars (in people's names IIRC).

Oops, I didn't think about that. We could

a) Force ISO-8859-1 for all Java source files in the build file. Is this a discrimination of, ummm, non-western contributors who might want to have their names in their native script in the files?
b) Keep a list of Java source files which need a different encoding and force ISO-8859-1 on the rest
c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other IDEs are, to a large part, screwed.

J.Pietschmann
Re: Java source encoding (was Re: [RTF] Jfor integration)
On Friday, 4 Jul 2003, at 21:12 Europe/Zurich, J.Pietschmann wrote:

Bertrand Delacretaz wrote:

Sure - it is by accident that comments in the jfor source code contain non-ASCII chars (in people's names IIRC).

Oops, I didn't think about that. We could

What I meant is that I think (or rather hope) people are OK with having their names spelled slightly wrong in source files. I don't think it's worth the hassle to worry about encodings just to write contributors' names.

-Bertrand
RE: Java source encoding (was Re: [RTF] Jfor integration)
J.Pietschmann wrote:

Oops, I didn't think about that. We could

a) Force ISO-8859-1 for all Java source files in the build file. Is this a discrimination of, ummm, non-western contributors who might want to have their names in their native script in the files?
b) Keep a list of Java source files which need a different encoding and force ISO-8859-1 on the rest
c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other IDEs are, to a large part, screwed.

I already chose option #3 for the files in question (probably 2 weeks ago). If Java source files are Unicode, then can a Java editor really claim to be such if it can't handle UTF-8? I guess I don't understand how this became such an important and heated discussion.

Victor Mote
Re: Java source encoding (was Re: [RTF] Jfor integration)
Victor Mote wrote:

J.Pietschmann wrote:

Oops, I didn't think about that. We could

a) Force ISO-8859-1 for all Java source files in the build file. Is this a discrimination of, ummm, non-western contributors who might want to have their names in their native script in the files?
b) Keep a list of Java source files which need a different encoding and force ISO-8859-1 on the rest
c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other IDEs are, to a large part, screwed.

I already chose option #3 for the files in question (probably 2 weeks ago). If Java source files are Unicode, then can a Java editor really claim to be such if it can't handle UTF-8? I guess I don't understand how this became such an important and heated discussion.

(I wouldn't say it was heated.) I am curious about the impact of someone working without any formal IDE, and just using (X)Emacs and JDEE for development. As far as I know, XEmacs does not support Unicode, but if the non-ASCII characters were restricted to comments, and XEmacs thought it was dealing with ISO-8859-15, would there be any actual problems? ASCII nulls aren't going to appear in such UTF-8, are they?

Peter
--
Peter B. West http://www.powerup.com.au/~pbwest/resume.html
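Peter's question (what happens when a single-byte editor opens a UTF-8 file and saves it again) can be simulated at the byte level. This is only a sketch of the idea: it uses ISO-8859-1 as a stand-in for ISO-8859-15 because `StandardCharsets` guarantees it on every JVM, and it models the editor as a plain decode followed by an encode, which a real editor may or may not match. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {

    /** Models "open, then save" in an editor that treats the file as ISO-8859-1. */
    static byte[] openAndSaveAsLatin1(byte[] fileBytes) {
        String asSeenByEditor = new String(fileBytes, StandardCharsets.ISO_8859_1);
        return asSeenByEditor.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Number of characters such an editor would display for the given bytes. */
    static int displayedLength(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.ISO_8859_1).length();
    }

    public static void main(String[] args) {
        // A comment line containing one non-ASCII character (c-cedilla),
        // stored on disk as UTF-8.
        String comment = "// fran\u00e7ais";
        byte[] onDisk = comment.getBytes(StandardCharsets.UTF_8);

        byte[] saved = openAndSaveAsLatin1(onDisk);
        System.out.println("bytes unchanged after save: " + Arrays.equals(onDisk, saved));
        System.out.println("characters in the text:     " + comment.length());
        System.out.println("characters displayed:       " + displayedLength(onDisk));
    }
}
```

Under these assumptions the file content survives untouched; the only visible effect is that each non-ASCII character is displayed as two or more single-byte characters.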
Re: Java source encoding (was Re: [RTF] Jfor integration)
Christian Geisert wrote:

Java source is whatever the platform encoding is (see file.encoding system property) so the best thing is to avoid chars > 127 at all (use \u instead) ... uh this won't work in this case as these are comments. Maybe there's a workaround for special French characters (like ä = ae in German)

Recent javacs can be told to use any of the encodings known to the RT library. I think there is also a property file somewhere which can be used to define a default. Whether it is prudent to fiddle with these kinds of settings is another matter.

And, uh, comment language is *english*, guys :-)

J.Pietschmann
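To make the platform-encoding point concrete: the same bytes decode to different text depending on which charset the reader assumes, which is why javac accepts an explicit `-encoding` option. The sketch below only inspects the running JVM and decodes two bytes by hand; it does not invoke the compiler, and the class name `SourceEncodingDemo` is invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingDemo {

    /** Decodes the same raw bytes under an explicitly named charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The two bytes that UTF-8 uses for c-cedilla (U+00E7).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA7};

        // What this JVM assumes when nobody names an encoding.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Naming the charset removes the dependence on the platform.
        System.out.println("chars when read as UTF-8:      "
                + decode(bytes, StandardCharsets.UTF_8).length());
        System.out.println("chars when read as ISO-8859-1: "
                + decode(bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```

The first two lines of output vary from machine to machine, which is exactly the concern raised above; the last two do not.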
RE: Java source encoding (was Re: [RTF] Jfor integration)
Christian Geisert wrote:

Java source is Unicode, and I don't think the encoding would matter,

but Java source is whatever the platform encoding is (see file.encoding system property) so the best thing is to avoid chars > 127 at all (use \u instead) ... uh this won't work in this case as these are comments. Maybe there's a workaround for special French characters (like ä = ae in German)

O'Reilly's Java in a Nutshell, 4th edition, page 20: "Java programs are written using the Unicode character set." Java uses Unicode specifically to avoid the "avoid chars > 127" problem. The platform may affect defaults, but the editor is where the problem shows up, as not all editors support Unicode. AFAIK, all Java compilers must handle Unicode files. The issue was that the editor knew that the 128+ chars were not encoded correctly. Converting to UTF-8 solved the problem nicely.

BTW, I think the long-hand ASCII \u will work in comments as well as source (too lazy to test it).

Victor Mote
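On the untested guess about \u in comments: the Java Language Specification translates Unicode escapes before comments and string literals are recognized, so a well-formed escape is accepted in both places. A small sketch follows (the class name and the sample word are made up); note that the flip side of this rule is that a malformed escape inside a comment is a compile error, so the comments here contain only complete four-digit escapes.

```java
final class UnicodeEscapeDemo {

    // An escape in a comment is translated as well: \u00e7 is c-cedilla.
    static final String LANGUAGE_NAME = "fran\u00e7ais";

    /** Returns the code point produced by the escape in LANGUAGE_NAME. */
    static int escapedCodePoint() {
        return LANGUAGE_NAME.codePointAt(4);
    }

    public static void main(String[] args) {
        // The source file is pure ASCII, yet the string holds a non-ASCII char.
        System.out.println("length: " + LANGUAGE_NAME.length());
        System.out.println("code point: U+00"
                + Integer.toHexString(escapedCodePoint()).toUpperCase());
    }
}
```

Whether such an escape is readable to a human in a comment is a separate matter, as Peter points out elsewhere in the thread.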
Re: Java source encoding (was Re: [RTF] Jfor integration)
On Thursday, 3 Jul 2003, at 21:16 Europe/Zurich, J.Pietschmann wrote:

...And, uh, comment language is *english*, guys :-)

Sure - it is by accident that comments in the jfor source code contain non-ASCII chars (in people's names IIRC). No problem in removing the accents!

-Bertrand