On Thu, 28 Mar 2002, Dan Kogai wrote:

Dan,

> First, Thank you so much (as much as the number of code points for all
> Korean charset combined!) for submitting a patch so quickly.  It was
> applied hairlessly.

Oh, that's much more than I really deserve :-)

> I am now hopeful that 1.00 will be shipped in next 24 hours.
> Coincidentally, it is 09:00 JST, meaning 00:00 Zulu.

I'd rather say UTC instead of Zulu :-)

> On Thursday, March 28, 2002, at 08:01 , Jungshik Shin wrote:
> > Yeah, that's a common mistake made by (Japanese) programmers when
> > they didn't bother to read RFC 1557 (or Ken Lunde's book) :-)
>
> I wonder where that 2022.enc came from.  Has nobody touched *.euc that
> came from Tcl?  If so, NI-S, you should issue a warning to Tcl/Tk
> community!

That'll be great. NI-S, could you alert the Tcl/Tk community to this
issue? Although I understand why every programming language/dev.
library has to reinvent the wheel (for the sake of portability), it
really is a big headache to monitor every single one of them for
potential mistakes like this.

Recently, I talked to the author of a popular web-bbs program in Korea
written in PHP. Multibyte encoding support in PHP is primitive at best
(as is usually the case, Japanese encodings are more or less supported
but other East Asian encodings are not). It can't even handle multibyte
encodings which use the GL range for the second or third octet (SJIS,
Big5, GBK, CP949, etc.) because it began by supporting only ISO-8859-1.

> > encoding, but nobody really needs it any more... ISO-2022-KR decoder
> > is still of use because there are some old emails floating around in
> > people's mailboxes and some (outdated) programs still generate it.
> > (this is why Mozilla has ISO-2022-KR decoder but doesn't have the
> > encoder)
> > ISO-2022-KR is very rarely used these days. It MUST NOT be
> > used for outgoing messages any more. However, the decoder is still handy
> > to have (see above.)
>
> You capped MUST NOT.  Not even *depreciated*.
> Is this de facto or de jure ?

Perhaps de facto, because we never revised RFC 1557 (see below).
However, it can be argued that there's no need, because it's not
standards-track but just informational. Maybe I used too strong a
wording. Major mail programs have retained the ability to decode
ISO-2022-KR. However, most web mail services cannot handle ISO-2022-KR,
and that's why ISO-2022-KR should never be used for outgoing emails.

As for revising RFC 1557, we tried to draft a new RFC on Korean email
exchange around 1997 and 1998, because it was obvious that ISO-2022-KR
(7bit) had had its day and it was time for it to rest :-) with major
email programs (MUA and MTA) supporting MIME (base64/q-p) and the
8BITMIME extension negotiation mechanism. However, the effort got
nowhere because people from Microsoft insisted that we should give up
EUC-KR in favor of ks_c_5601-1987 or something similar. Most other
people, including Ken Lunde, Erik van der Poel (the author of RFC 14xx
for ISO-2022-JP), Frank Tang (Netscape), Woohyung CHOI (the author of
RFC 1557 for ISO-2022-KR) and Kyungseok GIM (Korean representative to
ISO/IEC JTC1/SC2/WG2 and JTC1/SC22/WG22), pitched in and made their
cases for EUC-KR. That debate even made a couple of articles in major
Korean newspapers, and a public hearing was even held in Seoul with an
official from MoIC (Ministry of Information and Communication).

Anyway, the flaw of the MS designation became crystal clear when KSA
changed the name of KS C 5601 to KS X 1001, and a tentative conclusion
was that we couldn't use ks_c_5601*. However, MS went on to use it
nonetheless. Now, with the browser market completely dominated by MS IE
and the OS market still dominated by MS-Windows, 'ks_c_5601-1987' is
everywhere. Mozilla and Linux/Unix/Mac users still adhere to EUC-KR.

> > One (rather drastic) way to reduce the number of spam mails
> > is to just filter out email messages with MIME charset 'ks_c_5601-1987'
> > and C-T 'text/html'.
>
> Well, a moderate number of spams are okay to me; I even enjoy them
> sometimes and they were useful in the course of forging Encode :)

What I do is use procmail to collect potential spam in a separate
folder (of course, I have a more fine-tuned filter than the above) and
drop in there from time to time to look for anything interesting.

> > Spammers are much more likely to use non-standard
> > and broken mail programs than non-spammers (at least in Korea).
>
> Glad to hear that.  What is the socially accepted way to include
> Korean messages in MIME header?  =?euc-kr?b... good enough?  Or do you
> guys prefer quoted-printable?  Or Korea is so much into the future and
> =?UTF-8?b= is the standard :?

Right now, EUC-KR with B or Q encoding is widely used by
Linux/Unix/Mozilla/MacOS users and some web mail services. (Last time I
checked, the most popular web mail service in Korea did not RFC
2047-encode message headers. I wrote to them several times, but I gave
up. It took me several email messages to make them replace 'KST' with
'+0900 (KST)' in the Date: header.) However, I think we have to move on
to UTF-8 as soon as possible, and that's the consensus among the Korean
user community.

> I have.  Another welcome thing is test data.  See t/*.euc and
> t/*.ref.  t/(JP|KR).t does a round-trip matching test to see if it is
> okay.

That's nice. I'll try to build a Perl dev-snapshot (I finally squeezed
out some disk space. Hmm, hard disks are cheap and I should buy a 100GB
disk....) and see Encode in action.

> Anyway, Kamsahamnida!

Chonmaneyo!

Jungshik
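
P.S. For illustration only, here is a minimal Python sketch of the RFC
2047 'B' encoding of an EUC-KR header value discussed above. The helper
names encode_word and decode_word are hypothetical (they are not part of
Encode or of any mail program), and the sketch assumes a short value: it
ignores the 75-character limit on encoded-words and header folding.

```python
import base64
from email.header import decode_header


def encode_word(text, charset="euc-kr"):
    """Build a single RFC 2047 'B' encoded-word (hypothetical helper)."""
    raw = text.encode(charset)                       # text -> EUC-KR octets
    payload = base64.b64encode(raw).decode("ascii")  # octets -> base64 text
    return "=?{}?B?{}?=".format(charset.upper(), payload)


def decode_word(value):
    """Decode a header value containing encoded-words back to text."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded parts come back as bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii"))
        else:
            # Plain, unencoded header text comes back as str.
            parts.append(data)
    return "".join(parts)


subject = encode_word("한글")   # two Hangul syllables as sample text
print(subject)
print(decode_word(subject))
```

The same helper with charset="utf-8" gives the =?UTF-8?B?...?= form; Q
encoding would replace the base64 step with quoted-printable-style
escaping.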