Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-05 Thread Peter B. West
Victor Mote wrote:
Peter B. West wrote:


(I wouldn't say it was heated.)  I am curious about the impact of
someone working without any formal IDE, and just using (X)Emacs and JDEE
for development.  As far as I know, XEmacs does not support Unicode, but
if the non-ASCII characters were restricted to comments, and XEmacs
thought it was dealing with ISO-8859-15, would there be any actual
problems?  ASCII nulls aren't going to appear in such UTF-8, are they?


As long as 1) the editor doesn't think it needs to change the content when
opening or saving the file, and 2) the non-ASCII characters don't mess up
the editor's display, I don't think there is any problem. The first issue is
easy to test by opening the file, saving as something else, and diffing.
Since control characters have the same code points in ASCII and UTF-8, the
second problem should be a non-issue.
I never bit the Emacs bullet, and don't directly know the impact there. In
vi and Notepad (as in most other non-Unicode editors), you'll see the two
(or more) bytes displayed in their single-byte forms, which makes sense.
I had to look it up to be sure it is true for control characters as well,
but bytes in the UTF-8 range 0x00-0x7F always represent single-byte
characters, so there should be no ASCII nulls (at least not as the result
of an ASCII-to-UTF-8 conversion).
Yes.  Now that I think a bit more about it, UTF-8 guarantees that every 
byte of a non-ASCII character has the 8th bit set; that's the key to the 
encoding's success (which is a delight, btw).  So not only will there be no 
NULs, but no TAB, LF, CR, etc.
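To make the points above concrete, here is a small Java sketch (class and method names are made up for illustration). It uses ISO-8859-1, a single-byte charset that decodes every byte value, to stand in for a non-Unicode editor: the UTF-8 bytes of one non-ASCII character show up as several Latin-1 characters, an open-and-save cycle leaves the bytes untouched, and every byte of a non-ASCII character has the high bit set, so none can be mistaken for NUL, TAB, LF or CR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8InLatin1Editor {

    /** True if every UTF-8 byte of every non-ASCII character has the high bit (0x80) set. */
    static boolean nonAsciiBytesHaveHighBit(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= 0x80) {
                byte[] encoded = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : encoded) {
                    if ((b & 0x80) == 0) {
                        return false;
                    }
                }
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    /** What a single-byte (ISO-8859-1) editor would display for the UTF-8 bytes. */
    static String latin1View(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Simulates "open as ISO-8859-1, save, diff": true if the bytes come back unchanged. */
    static boolean latin1RoundTripIsLossless(String text) {
        byte[] original = text.getBytes(StandardCharsets.UTF_8);
        byte[] saved = new String(original, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(original, saved);
    }

    public static void main(String[] args) {
        String sample = "fran\u00e7ais";                      // contains a c-cedilla
        System.out.println(nonAsciiBytesHaveHighBit(sample));  // true
        System.out.println(latin1View(sample));                // two Latin-1 chars where the cedilla was
        System.out.println(latin1RoundTripIsLossless(sample)); // true
    }
}
```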

The affected changes can be seen at:
http://marc.theaimsgroup.com/?l=fop-cvs&m=105647684725575&w=2
Of course, the files affected are listed there as well if you would like to
test them in your favorite editor.
It may be true that it is safer in the short term to either 1) eliminate
such characters, or 2) encode them in the \u escape format, but I think it
probably makes more sense to simply say that we all need to work in a
Unicode-aware environment or at least a non-Unicode-hostile one. Either way,
this is probably something that we should document in the style guide.
As far as comments are concerned, we can say "Don't do it", because the 
comment is not going to be readable in any editor that is not Unicode-capable,
and \u escapes make no sense in a comment.  For code, just use the 
\u form if it is necessary.
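For code, the escape form looks like this. A minimal sketch (the class name is hypothetical): javac turns the six ASCII characters of the escape into the single character U+00E7 before it parses anything else, so the source file stays pure ASCII whatever encoding an editor assumes.

```java
public class UnicodeEscapes {
    // Written with an escape, so this file is plain ASCII; javac turns the escape
    // into the single character U+00E7 (c-cedilla) before parsing the literal.
    static final String FRENCH = "fran\u00e7ais";

    /** The code point javac produced for the escaped character. */
    static int cedillaCodePoint() {
        return FRENCH.codePointAt(4);
    }

    public static void main(String[] args) {
        System.out.println(FRENCH.length());               // 8: the escape is one char
        System.out.printf("U+%04X%n", cedillaCodePoint()); // U+00E7
    }
}
```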

There are examples in alt.design's 
org/apache/fop/datatypes/CountryLanguageScript.java, generated(!) from 
xml-lang.xsl and xml-lang.xml, currently in the conf directory.  The 
language codes from ISO 639-2T, ISO 639-2B and ISO 639-1 include the 
French name. http://www.loc.gov/standards/iso639-2/langhome.html 
represents ISO 639-2 in four tables, sorted by English name, French 
name, bibliographic code and terminology code respectively.  I included 
the French names in the XML, and in the generated code, although I have 
not done the same for script or country codes.  The easiest way out is 
probably to remove the French names, but I am loath to do that.
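If the names are to stay, one option (an assumption here, not what xml-lang.xsl currently does) is for the generator to write every non-ASCII character as an escape, so the generated Java is ASCII-only while the string values are unchanged. A sketch of such an escaping step in Java, with a hypothetical method name:

```java
public class JavaEscaper {
    /**
     * Replaces every non-ASCII char with a six-character Unicode escape
     * (backslash, 'u', four hex digits) so generated source is pure ASCII.
     * Quotes and backslashes are not touched; string-literal escaping is a
     * separate step.
     */
    static String toAsciiJava(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);                                 // ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c)); // e.g. c-cedilla -> 00e7
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A French language name of the kind the ISO 639-2 tables list.
        System.out.println(toAsciiJava("fran\u00e7ais"));
    }
}
```

Because it works on UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how javac expects such characters to be written.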

Peter
--
Peter B. West  http://www.powerup.com.au/~pbwest/resume.html
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]


Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-05 Thread J.Pietschmann
Victor Mote wrote:
I never bit the Emacs bullet, and don't directly know the impact there.
GNU Emacs 21 knows about UTF-8. Unfortunately, the NT port seems to
be less rock-solid than usual; I got my first Emacs crashes since
I abandoned Solaris 2.1 in, well, let's say an epoch or two ago.
BTW, I notice the absence of you and Peter from
 http://cvs.apache.org/~sgala/nightmap.html
J.Pietschmann



Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-05 Thread Peter B. West
J.Pietschmann wrote:
Victor Mote wrote:

BTW, I notice the absence of you and Peter from
 http://cvs.apache.org/~sgala/nightmap.html
Ok, so how do we drive this thing?  Is it zoomable?  I couldn't find 
pietsch on there either.

Peter
--
Peter B. West  http://www.powerup.com.au/~pbwest/resume.html


RE: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-05 Thread Victor Mote
J.Pietschmann wrote:

 BTW, I notice the absence of you and Peter from
   http://cvs.apache.org/~sgala/nightmap.html

OK, I should be on there the next time the map is regenerated. Now that you
guys know how to get here, come on over!

Victor Mote





RE: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-05 Thread Victor Mote
Peter B. West wrote:

 Ok, so how do we drive this thing?  Is it zoomable?  I couldn't find 
 pietsch on there either.

Sorry, I meant to show the path (it took a while to figure out):
Joerg gave us:
http://cvs.apache.org/~sgala/nightmap.html
Then (hoping to find something instructive) look at:
http://cvs.apache.org/~sgala
You'll have to view the source to see the intended URL underneath:
http://cvs.apache.org/~dirkx/sgala.html
Instructions are down at the bottom. A couple of gotchas:
1. "committers" is a CVS module in /home/cvs.
2. The FAQ to which he refers is in committers/krell.

Victor Mote




Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-04 Thread J.Pietschmann
Bertrand Delacretaz wrote:
Sure - it is by accident that comments in the jfor source code contain
non-ASCII chars (in people's names IIRC).
Oops, I didn't think about that. We could
a) Force ISO-8859-1 for all Java source files in the build file.
   Is this discrimination against, ummm, non-Western contributors
   who might want to have their names in their native script
   in the files?
b) Keep a list of Java source files which need a different encoding
  and force ISO-8859-1 on the rest
c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other
  IDEs are, for the most part, screwed.
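Option b) needs to know which files actually contain non-ASCII bytes. A sketch of a hypothetical helper (not part of the FOP build) that sorts files into pure ASCII, valid UTF-8, and neither; the last group is most likely ISO-8859-1 or another single-byte encoding:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SourceEncodingCheck {
    enum Kind { ASCII, UTF8, NOT_UTF8 }

    /** Classifies raw file bytes; ASCII files read the same as ISO-8859-1 or UTF-8. */
    static Kind classify(byte[] content) {
        boolean ascii = true;
        for (byte b : content) {
            if ((b & 0x80) != 0) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return Kind.ASCII;
        }
        // A strict decoder throws instead of silently substituting U+FFFD.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(content));
            return Kind.UTF8;
        } catch (CharacterCodingException e) {
            return Kind.NOT_UTF8; // e.g. ISO-8859-1 accented letters
        }
    }

    /** Usage: java SourceEncodingCheck File1.java File2.java ... */
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(classify(Files.readAllBytes(Paths.get(name))) + "\t" + name);
        }
    }
}
```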
J.Pietschmann





Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-04 Thread Bertrand Delacretaz
On Friday, 4 Jul 2003, at 21:12 Europe/Zurich, J.Pietschmann wrote:

Bertrand Delacretaz wrote:
Sure - it is by accident that comments in the jfor source code 
contain non-ASCII chars (in people's names IIRC).
Oops, I didn't think about that. We could
What I meant is that I think (or rather hope) people are OK with having 
their names spelled slightly wrong in source files.
I don't think it's worth the hassle to worry about encodings just to 
write contributors' names.

-Bertrand



RE: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-04 Thread Victor Mote
J.Pietschmann wrote:

 Oops, I didn't think about that. We could
 a) Force ISO-8859-1 for all Java source files in the build file.
 Is this discrimination against, ummm, non-Western contributors
 who might want to have their names in their native script
 in the files?
 b) Keep a list of Java source files which need a different encoding
and force ISO-8859-1 on the rest
 c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other
IDEs are, for the most part, screwed.

I already chose option #3 for the files in question (probably 2 weeks ago).
If Java source files are Unicode, can a Java editor really claim to be
one if it can't handle UTF-8? I guess I don't understand how this became
such an important and heated discussion.

Victor Mote





Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-04 Thread Peter B. West
Victor Mote wrote:
J.Pietschmann wrote:


Oops, I didn't think about that. We could
a) Force ISO-8859-1 for all Java source files in the build file.
   Is this discrimination against, ummm, non-Western contributors
   who might want to have their names in their native script
   in the files?
b) Keep a list of Java source files which need a different encoding
  and force ISO-8859-1 on the rest
c) Switch to UTF-8. Eclipse can deal with UTF-8. Users of other
  IDEs are, for the most part, screwed.


I already chose option #3 for the files in question (probably 2 weeks ago).
If Java source files are Unicode, can a Java editor really claim to be
one if it can't handle UTF-8? I guess I don't understand how this became
such an important and heated discussion.
(I wouldn't say it was heated.)  I am curious about the impact of 
someone working without any formal IDE, and just using (X)Emacs and JDEE 
for development.  As far as I know, XEmacs does not support Unicode, but 
if the non-ASCII characters were restricted to comments, and XEmacs 
thought it was dealing with ISO-8859-15, would there be any actual 
problems?  ASCII nulls aren't going to appear in such UTF-8, are they?

Peter
--
Peter B. West  http://www.powerup.com.au/~pbwest/resume.html


Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-03 Thread J.Pietschmann
Christian Geisert wrote:
Java source is whatever the platform encoding is (see file.encoding
system property) so the best thing is to avoid chars > 127 at all
(use \u instead) ... uh, this won't work in this case as
these are comments. Maybe there's a workaround for special French 
characters (like ä = ae in German)
Recent javacs can be told to use any of the encodings known
to the RT library. I think there is also a property file
somewhere which can be used to define a default. Whether
it is prudent to fiddle with these kind of settings is
another matter.
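The javac switch in question is -encoding. A sketch that drives it through the standard javax.tools API, equivalent to running javac -encoding UTF-8 on a UTF-8 source file (it needs a JDK, not just a JRE; the class name and the temporary Hello.java are made up for illustration). Without the switch, javac falls back to the platform default charset, which is UTF-8 by default only on JDK 18 and later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithEncoding {
    /** Writes a small UTF-8 source file, compiles it with -encoding UTF-8, returns javac's exit code. */
    static int compileUtf8Sample() {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path src = dir.resolve("Hello.java");
            // The escape in this literal is decoded when *this* class is compiled,
            // so Hello.java on disk holds the real c-cedilla as two UTF-8 bytes.
            String code = "class Hello { String name = \"fran\u00e7ais\"; }\n";
            Files.write(src, code.getBytes(StandardCharsets.UTF_8));
            // Same as: javac -encoding UTF-8 -d <dir> <dir>/Hello.java
            return javac.run(null, null, null,
                    "-encoding", "UTF-8", "-d", dir.toString(), src.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("javac exit code: " + compileUtf8Sample()); // 0 on success
    }
}
```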
And, uh, comment language is *English*, guys :-)

J.Pietschmann





RE: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-03 Thread Victor Mote
Christian Geisert wrote:

  Java source is Unicode, and I don't think the encoding would matter, but

 Java source is whatever the platform encoding is (see file.encoding
 system property) so the best thing is to avoid chars > 127 at all
 (use \u instead) ... uh, this won't work in this case as
 these are comments. Maybe there's a workaround for special French
 characters (like ä = ae in German)

O'Reilly's Java in a Nutshell, 4th edition, page 20: "Java programs are
written using the Unicode character set."

Java uses Unicode specifically to avoid the "avoid chars > 127" problem. The
platform may affect defaults, but the editor is where the problem shows up,
as not all editors support Unicode. AFAIK, all Java compilers must handle
Unicode files. The issue was that the editor knew that the 128+ chars were
not encoded correctly. Converting to UTF-8 solved the problem nicely.

BTW, I think the long-hand ASCII \u escapes will work in comments as well as
in source (too lazy to test it).
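The guess holds: javac translates Unicode escapes before it even looks for comments (JLS §3.3), so they are processed inside comments too. That cuts both ways, as this sketch shows (class and method names are hypothetical): an escaped line feed ends a // comment early, and a backslash followed by u and anything other than four hex digits is a compile error even inside a comment.

```java
public class EscapeInComment {
    /**
     * javac translates Unicode escapes before it recognizes comments
     * (JLS section 3.3), so an escaped line feed inside a line comment
     * ends that comment, and the rest of the line is compiled as code.
     */
    static boolean assignmentAfterEscapedNewline() {
        boolean ran = false;
        // Only the text before the escape is comment: \u000a ran = true;
        return ran;
    }

    public static void main(String[] args) {
        System.out.println(assignmentAfterEscapedNewline()); // true
    }
}
```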

Victor Mote





Re: Java source encoding (was Re: [RTF] Jfor integration)

2003-07-03 Thread Bertrand Delacretaz
On Thursday, 3 Jul 2003, at 21:16 Europe/Zurich, J.Pietschmann wrote:
...And, uh, comment language is *English*, guys :-)
Sure - it is by accident that comments in the jfor source code contain
non-ASCII chars (in people's names IIRC).
No problem in removing the accents!
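If the accents are to go, a bulk cleanup could look like this sketch (hypothetical class name), which uses java.text.Normalizer to decompose each character and drop the combining marks. Note that it strips rather than transliterates: "ä" becomes "a", not the German-style "ae".

```java
import java.text.Normalizer;

public class AccentStripper {
    /**
     * Decomposes each character (Unicode NFD) and removes the combining
     * marks, so an accented letter keeps only its base letter. This strips
     * rather than transliterates: a-umlaut becomes "a", not the German "ae".
     */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", ""); // \p{M}: any Unicode mark
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("fran\u00e7ais")); // francais
    }
}
```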

-Bertrand
