Re: Japanese transformation is not stable

2004-07-28 Thread Yoshiki Hayashi
André Malo [EMAIL PROTECTED] writes:

 Over the last few months I've encountered some bugs in the Sun JRE's
 iso-2022-jp charset converter, but I think the converter will be stable
 soon.

Well, I have a reason to believe it will never be their top
priority to fix those bugs.

 The second, though less important, reason is that I'm currently working on
 restructuring the docs to create a better platform for translators, which
 includes rewriting the styles and the build tools. I'm using the resulting
 diffs to check if something went wrong.

It would be nice if you could add a target which takes an
XML file as an argument and transforms only that file.

 Ok. That's the reason why I asked. I had shift_jis in mind, since we're
 currently recoding to shift_jis for the CHM files, because the HTML Help
 compiler seems to support only this charset for Japanese. If euc-jp is better
 for the online pages, we should use it.

Japanese Windows defaults to shift_jis.  I don't think any
software besides MUAs and web browsers supports other character
encoding schemes.  I don't know how well Windows supports
UTF-8 nowadays.

 Just to make it clear: this doesn't affect the *source* encoding. Keep it as
 you like.

I thought a little bit about shift_jis's advantage of being
directly editable in whatever editor on Windows, but I soon
realized that applies only to the transformed files. ;-)

-- 
Yoshiki Hayashi

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Japanese transformation is not stable

2004-07-27 Thread Yoshiki Hayashi
André Malo [EMAIL PROTECTED] writes:

 Hmm. It still happens that different JREs (?) produce different iso-2022-jp
 output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).

Well, at least mine removes bogus escape sequences and
produces more desirable output, but yeah, it still happens.
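
A minimal sketch of why such diffs appear, using Python's iso2022_jp codec as a stand-in for the JRE converter (an assumption; the thread does not say which bytes each JRE emits). The two byte strings below are hand-built for illustration. Both decode to the same text, but the second carries a redundant escape sequence, so a byte-level diff reports a change where the text has none.

```python
# ESC $ B designates JIS X 0208; ESC ( B switches back to ASCII.
ESC_JIS = b"\x1b$B"
ESC_ASCII = b"\x1b(B"

# Minimal stream: one switch in, one switch out.
minimal = ESC_JIS + b"F|K\\8l" + ESC_ASCII

# Same text with a redundant re-designation in the middle
# (a hand-built example of a "bogus" escape sequence).
redundant = ESC_JIS + b"F|" + ESC_JIS + b"K\\8l" + ESC_ASCII

text_a = minimal.decode("iso2022_jp")
text_b = redundant.decode("iso2022_jp")

same_text = text_a == text_b        # the decoded text is identical
same_bytes = minimal == redundant   # the raw bytes are not

# Decoding and re-encoding with one converter normalizes the escapes.
normalized = text_b.encode("iso2022_jp")
```

Comparing the decoded text instead of the raw bytes would be one way to tell a real change from converter noise.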

 I'd suggest finally switching the transformation to shift_jis, which is more
 stable (because it has none of these problematic escape sequences).

I'd rather use euc-jp than shift_jis.  For one thing,
shift_jis is a nightmare for auto-detection, since almost any
byte sequence can represent a valid character.  If I have to choose
among the three major character encoding schemes in Japan, I
always choose euc-jp.  It doesn't have the quirks sjis has.  The
fact that the current one uses iso-2022-jp is just for legacy
reasons.
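
The detection problem can be sketched in Python, assuming its euc_jp and shift_jis codecs behave like a strict decoder would; the sample text and the helper name decodes_as are made up for illustration. EUC-JP bytes pass as valid Shift_JIS (decoding to half-width katakana garbage), while the reverse is rejected, so "does it decode?" tells a detector very little for Shift_JIS.

```python
def decodes_as(data: bytes, codec: str) -> bool:
    """Return True if data is a valid byte sequence in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

text = "あいう"  # hypothetical sample text
euc_bytes = text.encode("euc_jp")       # b'\xa4\xa2\xa4\xa4\xa4\xa6'
sjis_bytes = text.encode("shift_jis")   # b'\x82\xa0\x82\xa2\x82\xa4'

# EUC-JP bytes are also valid Shift_JIS: 0xA1-0xDF are single-byte
# half-width katakana there, so the decoder accepts them.
euc_passes_as_sjis = decodes_as(euc_bytes, "shift_jis")

# The reverse fails: 0x82 is not a valid EUC-JP lead byte.
sjis_passes_as_euc = decodes_as(sjis_bytes, "euc_jp")

# Another Shift_JIS quirk: trail bytes overlap ASCII. The second byte
# of U+8868 is 0x5C, the ASCII backslash.
trail_is_backslash = "表".encode("shift_jis")[1] == 0x5C
```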

 If we decide to switch the charset, then we also need to decide whether we
 want to move the files in CVS (ja.jis to ja.sjis) or just change the charset
 configuration for the docs directories. I'd prefer the latter. But for
 backwards compat reasons we may need to move the files.

I prefer moving the files and adding redirection magic to make old
URLs that point directly at them keep working.  It doesn't look right to
get a euc-jp or shift_jis file from a URL ending in .ja.jis.

-- 
Yoshiki Hayashi




Re: Japanese transformation is not stable

2004-07-27 Thread Hiroaki KAWAI
  Hmm. It still happens that different JREs (?) produce different iso-2022-jp
  output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).
 
 Well, at least mine removes bogus escape sequences and
 produces more desirable output, but yeah, it still happens.

Over the last few months I've encountered some bugs in the Sun JRE's
iso-2022-jp charset converter, but I think the converter will be stable
soon.
I'm working on the input XML files, and I'm not watching the generated
HTML files. I feel the diffs of the HTML files are less important than
those of the XML files.
I can just ignore the diffs of the generated HTML files right now.

Well, I don't understand what harm the diffs do to us, so can I ask for
some reasons?


  I'd suggest finally switching the transformation to shift_jis, which is more
  stable (because it has none of these problematic escape sequences).
 
 I'd rather use euc-jp than shift_jis.  For one thing,
 shift_jis is a nightmare for auto-detection, since almost any
 byte sequence can represent a valid character.  If I have to choose
 among the three major character encoding schemes in Japan, I
 always choose euc-jp.  It doesn't have the quirks sjis has.  The
 fact that the current one uses iso-2022-jp is just for legacy
 reasons.

IMHO, whatever charset we choose, we will more or less face this kind of
problem.
# I, myself, prefer UTF-8. :-)
## Because it supports a wide range of characters.

But shift_jis is actually a worse choice, because there are well-known
issues around the Shift_JIS and CP932 charsets.
The alias definitions have changed repeatedly between Java releases.
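
The Shift_JIS versus CP932 difference can be sketched with Python's shift_jis and cp932 codecs. This illustrates the two mappings only; it says nothing about which alias a given Java release resolved to, and the helper name can_encode is made up for the example.

```python
# The same byte pair maps to different Unicode characters.
raw = b"\x81\x60"
as_sjis = raw.decode("shift_jis")   # U+301C WAVE DASH
as_cp932 = raw.decode("cp932")      # U+FF5E FULLWIDTH TILDE

def can_encode(ch: str, codec: str) -> bool:
    """Return True if ch has a byte representation in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# CP932 also carries vendor extensions that plain Shift_JIS lacks,
# for example U+2460 CIRCLED DIGIT ONE.
circled_one = "\u2460"
in_cp932 = can_encode(circled_one, "cp932")
in_sjis = can_encode(circled_one, "shift_jis")
```

If the name "shift_jis" silently resolves to one table in one release and to the other in the next, characters like these change in the output without any change in the source.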

I have no strong preference as to which charset we use (anything except shift_jis).


---Hiroaki Kawai





Re: Japanese transformation is not stable

2004-07-27 Thread André Malo
* Hiroaki KAWAI [EMAIL PROTECTED] wrote:

   Hmm. It still happens that different JREs (?) produce different
   iso-2022-jp output (i.e. any time someone builds all and diffs, he gets
   .ja.jis diffs).
  
  Well, at least mine removes bogus escape sequences and
  produces more desirable output, but yeah, it still happens.
 
 Over the last few months I've encountered some bugs in the Sun JRE's
 iso-2022-jp charset converter, but I think the converter will be stable
 soon.
 I'm working on the input XML files, and I'm not watching the generated
 HTML files. I feel the diffs of the HTML files are less important than
 those of the XML files.
 I can just ignore the diffs of the generated HTML files right now.
 
 Well, I don't understand what harm the diffs do to us, so can I ask for
 some reasons?

The problem is that someone who builds the whole tree gets Japanese diffs,
and most people just cannot decide whether they did something wrong or not
(I can, because I've glanced over the accompanying RFC ;-).

The second, though less important, reason is that I'm currently working on
restructuring the docs to create a better platform for translators, which
includes rewriting the styles and the build tools. I'm using the resulting
diffs to check if something went wrong.

   I'd suggest finally switching the transformation to shift_jis, which is
   more stable (because it has none of these problematic escape
   sequences).
  
  I'd rather use euc-jp than shift_jis.  For one thing,
  shift_jis is a nightmare for auto-detection, since almost any
  byte sequence can represent a valid character.  If I have to choose
  among the three major character encoding schemes in Japan, I
  always choose euc-jp.  It doesn't have the quirks sjis has.  The
  fact that the current one uses iso-2022-jp is just for legacy
  reasons.
 
 IMHO, whatever charset we choose, we will more or less face this kind of
 problem.
 # I, myself, prefer UTF-8. :-)
 ## Because it supports a wide range of characters.

UTF-8 is cool, but too large for the resulting HTML pages. A two-byte encoding
is way smaller, and any wider range of characters one needs (if any) is
supported by HTML itself (&#xxx;).
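
A rough Python sketch of both points, with a made-up sample string: the same Japanese text takes three bytes per character in UTF-8 and two in euc-jp, and a character outside the target charset can still be emitted as an HTML numeric character reference. ASCII markup costs one byte in either encoding, so the saving on a real page is smaller than the raw ratio suggests.

```python
text = "日本語の文書"  # hypothetical sample: six Japanese characters

utf8_size = len(text.encode("utf-8"))    # 3 bytes per character here
euc_size = len(text.encode("euc_jp"))    # 2 bytes per character here

# A character outside the target charset (here a Hangul syllable,
# U+D55C) is written as a numeric character reference instead.
mixed = "日本語 한"
html_bytes = mixed.encode("euc_jp", errors="xmlcharrefreplace")
```

Here html_bytes ends with the ASCII text `&#54620;`, which a browser renders as the original character regardless of the page charset.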

 But shift_jis is actually a worse choice, because there are well-known
 issues around the Shift_JIS and CP932 charsets.
 The alias definitions have changed repeatedly between Java releases.

Ok. That's the reason why I asked. I had shift_jis in mind, since we're
currently recoding to shift_jis for the CHM files, because the HTML Help
compiler seems to support only this charset for Japanese. If euc-jp is better
for the online pages, we should use it.

If no one objects, I'm going to start the conversion to euc-jp within a few days.

Just to make it clear: this doesn't affect the *source* encoding. Keep it as
you like.

nd
-- 
Wrnog is always spelled wnrog on Usenet. If you do not spell
wonrg as wrogn, that is simply worgn. On the other hand, you may
happily spell rihgt as rhigt, because a wnorg spelling of
rigth is not regarded as wrnog.   -- Hajo Pflüger in dnq




Japanese transformation is not stable (was: cvs commit: httpd-2.0/docs/manual/mod mod_status.html.ja.jis allmodules.xml.ja core.html.ja.jis core.xml.meta index.html.ja.jis mod_status.html mod_status.html.en mod_status.xml.meta quickreference.html.ja.jis)

2004-07-26 Thread André Malo
* [EMAIL PROTECTED] wrote:

 yoshiki 2004/07/25 21:09:37
 
   Modified:    docs/manual/mod allmodules.xml.ja core.html.ja.jis
                core.xml.meta index.html.ja.jis mod_status.html
                mod_status.html.en mod_status.xml.meta
                quickreference.html.ja.jis
   Added:       docs/manual/mod mod_status.html.ja.jis
   Log:
   Update transformation.

Hmm. It still happens that different JREs (?) produce different iso-2022-jp
output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).
I'd suggest finally switching the transformation to shift_jis, which is more
stable (because it has none of these problematic escape sequences).
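
What "escape sequences" means here can be shown with a small Python sketch (the sample string is made up; Python's codecs stand in for whatever converter the build uses). iso-2022-jp is a stateful 7-bit encoding that switches character sets with ESC sequences, whereas euc-jp and shift_jis mark Japanese characters with high-bit bytes and need no switching.

```python
text = "Apache 日本語"  # hypothetical sample mixing ASCII and Japanese

codecs_to_try = ("iso2022_jp", "euc_jp", "shift_jis")
encoded = {codec: text.encode(codec) for codec in codecs_to_try}

# Only the stateful encoding contains the ESC byte (0x1B).
has_escape = {codec: b"\x1b" in data for codec, data in encoded.items()}

# In iso-2022-jp the Japanese run is bracketed by ESC $ B ... ESC ( B.
iso_bytes = encoded["iso2022_jp"]
```

Because a converter is free to decide where it places those switches, two correct converters can produce different bytes for the same text; the stateless encodings leave no such choice.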

If we decide to switch the charset, then we also need to decide whether we
want to move the files in CVS (ja.jis to ja.sjis) or just change the charset
configuration for the docs directories. I'd prefer the latter. But for
backwards compat reasons we may need to move the files.

Opinions?

nd
-- 
Solid and comprehensive book
  -- from a review

http://pub.perlig.de/books.html#apache2




Re: Japanese transformation is not stable (was: cvs commit: httpd-2.0/docs/manual/mod mod_status.html.ja.jis allmodules.xml.ja core.html.ja.jis core.xml.meta index.html.ja.jis mod_status.html mod_status.html.en mod_status.xml.meta quickreference.html.ja.jis)

2004-07-26 Thread sweetaman
hi
its good to having mailos from u but
i m really not geting u
pl
help me out
thanx

* [EMAIL PROTECTED] wrote:

 yoshiki 2004/07/25 21:09:37

   Modified:    docs/manual/mod allmodules.xml.ja core.html.ja.jis
                core.xml.meta index.html.ja.jis mod_status.html
                mod_status.html.en mod_status.xml.meta
                quickreference.html.ja.jis
   Added:       docs/manual/mod mod_status.html.ja.jis
   Log:
   Update transformation.

 Hmm. It still happens that different JREs (?) produce different iso-2022-jp
 output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).
 I'd suggest finally switching the transformation to shift_jis, which is more
 stable (because it has none of these problematic escape sequences).

 If we decide to switch the charset, then we also need to decide whether we
 want to move the files in CVS (ja.jis to ja.sjis) or just change the charset
 configuration for the docs directories. I'd prefer the latter. But for
 backwards compat reasons we may need to move the files.

 Opinions?

 nd
 --
 Solid and comprehensive book
   -- from a review

 http://pub.perlig.de/books.html#apache2




Re: Japanese transformation is not stable

2004-07-26 Thread André Malo
* [EMAIL PROTECTED] wrote:

 its good to having mailos from u but
 i m really not geting u
 pl
 help me out

Sorry, I don't understand your request.

nd
-- 
Gates's behavior had proven to me that I did not need to count on him and
his two companions -- Karl May, Winnetou III

Something new in the West: http://pub.perlig.de/books.html#apache2
