Re: Japanese transformation is not stable
André Malo [EMAIL PROTECTED] writes:

>> Over the last few months, I've encountered some bugs in the Sun JRE's implementation of the iso-2022-jp charset converter, but I think the converter will be stable soon.

Well, I have reason to believe that fixing those bugs will never be their top priority.

> The second, though not so important, reason is that I'm currently working on restructuring the docs to create a better platform for translators, which includes rewriting the styles and the build tools. I'm using the resulting diffs to check whether something went wrong.

It would be nice if you could add a target that takes an XML file as an argument and transforms only that file.

> Ok. That's the reason why I asked. I had shift_jis in mind, since we're currently recoding to shift_jis for the CHM files, because the HTML Help compiler seems to support only this charset for Japanese. If euc-jp is better for the online pages, we should use it.

Japanese Windows defaults to shift_jis. I don't think any software besides MUAs and web browsers supports other character encoding schemes. I don't know how well Windows supports UTF-8 nowadays.

> Just to make clear: this doesn't affect the *source* encoding. Keep it as you like.

I thought a little about shift_jis's advantage of being directly editable in whatever editor on Windows, but I soon realized that this only applies to the transformed files. ;-)

-- 
Yoshiki Hayashi

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Japanese transformation is not stable
André Malo [EMAIL PROTECTED] writes:

> Hmm. It still happens that different JREs (?) produce different iso-2022-jp output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).

Well, at least mine removes bogus escape sequences and produces more desirable output, but yeah, it still happens.

> I'd suggest finally switching the transformation to shift_jis, which is more stable (because there are none of these problematic escape sequences).

I'd rather use euc-jp than shift_jis. For one thing, shift_jis is a nightmare for auto-detection, since almost any byte sequence can represent a valid character. If I choose from the three major character encoding schemes in Japan, I always choose euc-jp. It doesn't have the quirks sjis has. The fact that the current one uses iso-2022-jp is just for legacy reasons.

> If we decide to switch the charset, then we also need to decide whether we want to move the files in CVS (ja.jis to ja.sjis) or just change the charset configuration for the docs directories. I'd prefer the latter. But for backwards compat reasons we may need to move the files.

I prefer moving the files and adding redirection magic so that old URLs pointing directly at them keep working. It doesn't look right to get a euc-jp or shift_jis file from a URL ending in .ja.jis.

-- 
Yoshiki Hayashi
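The auto-detection point can be illustrated with a small sketch. It uses Python's built-in codecs rather than the Java converters discussed in the thread, and the helper name `plausible_charsets` and the three-hiragana sample are assumptions chosen for illustration. It shows only one case, not the general claim: bytes written as euc-jp also decode cleanly (but wrongly) as shift_jis, while the shift_jis bytes for the same text are rejected by the euc-jp decoder.

```python
# Candidate charsets to test; names are Python codec names (an assumption,
# the thread itself is about Java converters).
CANDIDATES = ("euc_jp", "shift_jis")


def plausible_charsets(data, candidates=CANDIDATES):
    """Return the candidate charsets under which `data` decodes without error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


text = "あいう"
euc_bytes = text.encode("euc_jp")
sjis_bytes = text.encode("shift_jis")

# The euc-jp bytes are also a valid shift_jis byte stream (half-width
# katakana and punctuation), so strict decoding alone cannot rule it out.
print(plausible_charsets(euc_bytes))
# The shift_jis bytes contain 0x82, which euc-jp does not accept.
print(plausible_charsets(sjis_bytes))
# What a reader would see if the euc-jp bytes were mislabelled as shift_jis.
print(euc_bytes.decode("shift_jis"))
```

A real detector would weigh character frequencies rather than rely on strict decoding alone; the sketch only shows why shift_jis rarely eliminates itself as a candidate.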
Re: Japanese transformation is not stable
>> Hmm. It still happens that different JREs (?) produce different iso-2022-jp output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).

> Well, at least mine removes bogus escape sequences and produces more desirable output, but yeah, it still happens.

Over the last few months, I've encountered some bugs in the Sun JRE's implementation of the iso-2022-jp charset converter, but I think the converter will be stable soon.

I'm working on the input XML files, and I'm not watching the generated HTML files. I feel the diffs of the HTML files are less important than those of the XML files. I can just ignore the diffs of the generated HTML files right now. Well, I don't understand what harm the diffs do to us, so can I ask for some reasons?

>> I'd suggest finally switching the transformation to shift_jis, which is more stable (because there are none of these problematic escape sequences).

> I'd rather use euc-jp than shift_jis. For one thing, shift_jis is a nightmare for auto-detection, since almost any byte sequence can represent a valid character. If I choose from the three major character encoding schemes in Japan, I always choose euc-jp. It doesn't have the quirks sjis has. The fact that the current one uses iso-2022-jp is just for legacy reasons.

IMHO, whatever charset we choose, we will more or less face this kind of problem.
# I myself prefer UTF-8. :-)
## Because it supports a wide range of characters.

But shift_jis is actually a worse choice, because there are well-known issues around the Shift_JIS and CP932 charsets. The alias definitions changed again and again between Java releases.

I have no strong preference about which charset to use (anything except shift_jis).

---
Hiroaki Kawai
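The Shift_JIS versus CP932 issue mentioned above can be sketched with Python's codecs, which ship both mappings as separate charsets. This is an analogous illustration, not a reproduction of the Java alias changes; the helper `encodable` is a hypothetical name. The same two bytes decode to different Unicode characters, and some characters exist in only one of the two mappings.

```python
# The byte pair 0x81 0x60 is the "wave dash" position in both charsets,
# but the two mappings disagree about which Unicode character it is.
data = b"\x81\x60"
as_sjis = data.decode("shift_jis")   # U+301C WAVE DASH
as_cp932 = data.decode("cp932")      # U+FF5E FULLWIDTH TILDE


def encodable(text, charset):
    """Return True if `text` can be encoded in `charset` without error."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(f"shift_jis: U+{ord(as_sjis):04X}, cp932: U+{ord(as_cp932):04X}")

# CIRCLED DIGIT ONE is a vendor extension: present in cp932, absent from
# the plain shift_jis mapping.
print(encodable("\u2460", "cp932"), encodable("\u2460", "shift_jis"))
```

If a converter's "Shift_JIS" label silently switches between these two mappings, the generated files change even though the source text did not.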
Re: Japanese transformation is not stable
* Hiroaki KAWAI [EMAIL PROTECTED] wrote:

>>> Hmm. It still happens that different JREs (?) produce different iso-2022-jp output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).

>> Well, at least mine removes bogus escape sequences and produces more desirable output, but yeah, it still happens.

> Over the last few months, I've encountered some bugs in the Sun JRE's implementation of the iso-2022-jp charset converter, but I think the converter will be stable soon.
>
> I'm working on the input XML files, and I'm not watching the generated HTML files. I feel the diffs of the HTML files are less important than those of the XML files. I can just ignore the diffs of the generated HTML files right now. Well, I don't understand what harm the diffs do to us, so can I ask for some reasons?

The problem is that someone who builds the whole tree gets Japanese diffs - and most people just cannot decide whether they did something wrong or not (I can, because I've glanced over the accompanying RFC ;-)

The second, though not so important, reason is that I'm currently working on restructuring the docs to create a better platform for translators, which includes rewriting the styles and the build tools. I'm using the resulting diffs to check whether something went wrong.

>>> I'd suggest finally switching the transformation to shift_jis, which is more stable (because there are none of these problematic escape sequences).

>> I'd rather use euc-jp than shift_jis. For one thing, shift_jis is a nightmare for auto-detection, since almost any byte sequence can represent a valid character. If I choose from the three major character encoding schemes in Japan, I always choose euc-jp. It doesn't have the quirks sjis has. The fact that the current one uses iso-2022-jp is just for legacy reasons.

> IMHO, whatever charset we choose, we will more or less face this kind of problem.
> # I myself prefer UTF-8. :-)
> ## Because it supports a wide range of characters.

UTF-8 is cool, but too large for the resulting HTML pages. A two-byte encoding is way smaller, and the wider range of characters one needs (if any) is supported by HTML itself (&#xxx;).

> But shift_jis is actually a worse choice, because there are well-known issues around the Shift_JIS and CP932 charsets. The alias definitions changed again and again between Java releases.

Ok. That's the reason why I asked. I had shift_jis in mind, since we're currently recoding to shift_jis for the CHM files, because the HTML Help compiler seems to support only this charset for Japanese. If euc-jp is better for the online pages, we should use it.

If no one objects, I'm going to start the conversion to euc-jp within a few days.

Just to make clear: this doesn't affect the *source* encoding. Keep it as you like.

nd
-- 
Flhacs wird im Usenet grundsätzlich alsfhc geschrieben. Schreibt man lafhsc nicht slfach, so ist das schlichtweg hclafs. Hingegen darf man rihctig ruhig rhitcgi schreiben, weil eine shcalfe Schreibweise bei irhictg nicht als shflac angesehen wird.
 -- Hajo Pflüger in dnq
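The size argument and the character-reference fallback can be checked with a short sketch. The sample string and the helper name `encoded_sizes` are assumptions for illustration, and the numbers apply to pure Japanese text only; ASCII markup takes one byte per character in all three encodings, so the difference on a real page is smaller.

```python
def encoded_sizes(text, charsets=("utf_8", "euc_jp", "shift_jis")):
    """Return the encoded length in bytes of `text` for each charset."""
    return {name: len(text.encode(name)) for name in charsets}


# Nine kanji/kana characters: three bytes each in UTF-8, two bytes each in
# the two legacy encodings.
sample = "日本語のマニュアル"
sizes = encoded_sizes(sample)
print(sizes)

# A character outside the target charset can still be delivered as an HTML
# numeric character reference instead of failing the conversion.
with_ref = "5 €".encode("euc_jp", errors="xmlcharrefreplace")
print(with_ref)
```

The `xmlcharrefreplace` error handler is Python's way of producing such references; the XSLT toolchain used for the docs would need its own equivalent.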
Japanese transformation is not stable (was: cvs commit: httpd-2.0/docs/manual/mod mod_status.html.ja.jis allmodules.xml.ja core.html.ja.jis core.xml.meta index.html.ja.jis mod_status.html mod_status.html.en mod_status.xml.meta quickreference.html.ja.jis)
* [EMAIL PROTECTED] wrote:

> yoshiki 2004/07/25 21:09:37
>
> Modified: docs/manual/mod allmodules.xml.ja core.html.ja.jis core.xml.meta index.html.ja.jis mod_status.html mod_status.html.en mod_status.xml.meta quickreference.html.ja.jis
> Added: docs/manual/mod mod_status.html.ja.jis
> Log:
> Update transformation.

Hmm. It still happens that different JREs (?) produce different iso-2022-jp output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).

I'd suggest finally switching the transformation to shift_jis, which is more stable (because there are none of these problematic escape sequences).

If we decide to switch the charset, then we also need to decide whether we want to move the files in CVS (ja.jis to ja.sjis) or just change the charset configuration for the docs directories. I'd prefer the latter. But for backwards compat reasons we may need to move the files.

Opinions?

nd
-- 
A solid and comprehensive book
 -- from a review

http://pub.perlig.de/books.html#apache2
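Why two converters can emit different bytes for the same iso-2022-jp text can be sketched as follows. The example uses Python's codecs instead of a JRE, and the `redundant` byte string is hand-built for illustration; it is not output captured from any Java version. iso-2022-jp switches character sets with escape sequences, and a stream with extra switches decodes to the same text while differing byte for byte. euc-jp and shift_jis carry no such state.

```python
ESC = b"\x1b"


def escape_counts(text, charsets=("iso2022_jp", "euc_jp", "shift_jis")):
    """Count ESC bytes in the encoded form of `text` for each charset."""
    return {name: text.encode(name).count(ESC) for name in charsets}


def same_text(a, b, charset="iso2022_jp"):
    """Compare two byte streams by decoded text rather than by raw bytes."""
    return a.decode(charset) == b.decode(charset)


# Only the stateful encoding needs escape sequences to switch character sets.
counts = escape_counts("日本語 docs")
print(counts)

canonical = "日本語".encode("iso2022_jp")
# Hand-built variant: switches back to ASCII and into JIS X 0208 again
# after the first character, which is legal but unnecessary.
redundant = b"\x1b$BF|\x1b(B\x1b$BK\\8l\x1b(B"

print(canonical == redundant)            # byte-level diff reports a change
print(same_text(canonical, redundant))   # decoded text is identical
```

Comparing decoded text instead of raw bytes is one way to tell a harmless converter difference from a real change in the generated page.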
Re: Japanese transformation is not stable (was: cvs commit: httpd-2.0/docs/manual/mod mod_status.html.ja.jis allmodules.xml.ja core.html.ja.jis core.xml.meta index.html.ja.jis mod_status.html mod_status.html.en mod_status.xml.meta quickreference.html.ja.jis)
hi its good to having mailos from u but i m really not geting u pl help me out thanx
Re: Japanese transformation is not stable
* [EMAIL PROTECTED] wrote:

> its good to having mailos from u but i m really not geting u pl help me out

Sorry, I don't understand your request.

nd
-- 
Gates's behavior had proven to me that I need not count on him and his two companions
 -- Karl May, Winnetou III

Something new in the West: http://pub.perlig.de/books.html#apache2