Re: Unicode support in cd9660 [patch for review]
Boris Popov wrote:
> On Wed, 27 Dec 2000, Maxim Sobolev wrote:
> > Several days ago I got a CD with Russian filenames on it and discovered
> > that I'm unable to read those filenames. After some hacking I produced a
> > patch, which should solve this problem in a manner similar to what we
> > have in the msdosfs module (i.e. a user-provided conversion table). I have
> > to emphasize that it's a temporary solution until we have iconv support
> > in the kernel.
>
> The patch seems to be OK as a temporary solution for CDs with Russian
> file names. And as a temporary solution it suits the ports collection
> well, not the main tree.
>
> In the near future we'll have an iconv interface in the kernel which
> uses the libiconv library written by Konstantin Chuguev. I'm really sorry
> for the delays, but my current job leaves me nearly zero spare time, and
> there is hope that January will be less busy.

Ok folks, I'll do a port out of it.

-Maxim

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Unicode support in cd9660 [patch for review]
On Thu, Dec 28, 2000 at 11:19:16AM +0200, Maxim Sobolev scribbled:
| Boris Popov wrote:
| > On Wed, 27 Dec 2000, Maxim Sobolev wrote:
| > In the near future we'll have iconv interface in the kernel which
| > uses libiconv library written by Konstantin Chuguev. I'm really sorry for
| > delays, but my current job leaves nearly zero spare time to me and there
| > is a hope that January will be less busy.
|
| Ok folks, I'll do a port out of it.

Thank you very much! :) We can finally end this discussion and return
-current back to its daily scheduled "is xxx build broken?" talk.

/me notes that keichii has generated more than 50% of -i18n traffic
in the last month... :(

--
[EMAIL PROTECTED] | [EMAIL PROTECTED]
http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
Re: Unicode support in cd9660 [patch for review]
Hello,

Thanks for the hint, Michael. I have made a port based on Matsuzaki-san's
patch (with my little addition). The only problem: I don't have a web
page or FTP directory. So if anybody agrees to host a distfile, I would
gladly send it, as well as a port tarball.

On Wed, 27 Dec 2000, Michael C . Wu wrote:
> Have you seen ports/chinese/big5fs? Japanese/Korean do the same thing too.
> You could simply make a port of this that loads KLD's. This enables us

Regards,
Vladimir

--
Vladimir Kushnir | [EMAIL PROTECTED] | Powered by FreeBSD
Re: Unicode support in cd9660 [patch for review]
Hello Vladimir,

Friday, December 29, 2000, 1:01:25 AM, you wrote:

VK> Hello,
VK> Thanks for the hint, Michael. I have made a port based on Matsuzaki-san'
VK> patch (with my little addition). The only problem - I don't have web
VK> page/ftp directory. So if anybody agrees to host a distfile I would
VK> gladly send it as well as a port tarball.

Feel free to dump it to me and I'll put it into MASTER_SITE_LOCAL.

-Maxim
Unicode support in cd9660 [patch for review]
Hi,

Several days ago I got a CD with Russian filenames on it and discovered that
I'm unable to read those filenames. After some hacking I produced a patch,
which should solve this problem in a manner similar to what we have in the
msdosfs module (i.e. a user-provided conversion table). I have to emphasize
that it's a temporary solution until we have iconv support in the kernel.

Please somebody review attached patches.

-Maxim

Index: cd9660/cd9660_lookup.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_lookup.c,v
retrieving revision 1.25
diff -d -u -r1.25 cd9660_lookup.c
--- cd9660/cd9660_lookup.c 2000/10/03 04:39:50 1.25
+++ cd9660/cd9660_lookup.c 2000/12/27 10:03:04
@@ -239,7 +239,7 @@
                 if (namelen != 1 || ep->name[0] != 0)
                     goto notfound;
-            } else if (!(res = isofncmp(name, len, ep->name, namelen, imp->joliet_level))) {
+            } else if (!(res = isofncmp(name, len, ep->name,
+                namelen, imp->joliet_level, imp->ctable))) {
                 if (isoflags & 2)
                     ino = isodirino(ep, imp);
                 else
Index: cd9660/cd9660_mount.h
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_mount.h,v
retrieving revision 1.4
diff -d -u -r1.4 cd9660_mount.h
--- cd9660/cd9660_mount.h 2000/05/01 20:05:04 1.4
+++ cd9660/cd9660_mount.h 2000/12/27 10:03:04
@@ -47,6 +47,7 @@
     struct export_args export; /* network export info */
     int flags; /* mounting flags, see below */
     int ssector; /* starting sector, 0 for 1st session */
+    u_char *ctable[256]; /* Table for converting unicode filenames */
 };
 #define ISOFSMNT_NORRIP 0x0001 /* disable Rock Ridge Ext.*/
 #define ISOFSMNT_GENS 0x0002 /* enable generation numbers */
Index: cd9660/cd9660_rrip.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_rrip.c,v
retrieving revision 1.18
diff -d -u -r1.18 cd9660_rrip.c
--- cd9660/cd9660_rrip.c 2000/05/05 09:58:17 1.18
+++ cd9660/cd9660_rrip.c 2000/12/27 10:03:05
@@ -301,7 +301,7 @@
 {
     isofntrans(isodir->name,isonum_711(isodir->name_len),
         ana->outbuf,ana->outlen,
-        1,isonum_711(isodir->flags)&4, ana->imp->joliet_level);
+        1,isonum_711(isodir->flags)&4, ana->imp->joliet_level,
+        ana->imp->ctable);
     switch (*ana->outbuf) {
     default:
         break;
@@ -509,7 +509,7 @@
     pwhead = isodir->name + isonum_711(isodir->name_len);
     if (!(isonum_711(isodir->name_len)&1))
         pwhead++;
-    isochar(isodir->name, pwhead, ana->imp->joliet_level, &c);
+    isochar(isodir->name, pwhead, ana->imp->joliet_level, &c, ana->imp->ctable);
     /* If it's not the '.' entry of the root dir obey SP field */
     if (c != 0 || isonum_733(isodir->extent) != ana->imp->root_extent)
@@ -646,7 +646,7 @@
     *outlen = 0;
     isochar(isodir->name, isodir->name + isonum_711(isodir->name_len),
-        imp->joliet_level, &c);
+        imp->joliet_level, &c, imp->ctable);
     tab = rrip_table_getname;
     if (c == 0 || c == 1) {
         cd9660_rrip_defname(isodir,analyze);
Index: cd9660/cd9660_util.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_util.c,v
retrieving revision 1.15
diff -d -u -r1.15 cd9660_util.c
--- cd9660/cd9660_util.c 2000/10/29 13:56:43 1.15
+++ cd9660/cd9660_util.c 2000/12/27 10:03:05
@@ -52,25 +52,28 @@
  * Return number of bytes consumed
  */
 int
-isochar(isofn, isoend, joliet_level, c)
+isochar(isofn, isoend, joliet_level, c, ctable)
     u_char *isofn;
     u_char *isoend;
     int joliet_level;
     u_char *c;
+    u_char **ctable;
 {
     *c = *isofn++;
     if (joliet_level == 0 || isofn == isoend)
         /* (00) and (01) are one byte in Joliet, too */
         return 1;
-    /* No Unicode support yet :-( */
+    /* Limited Unicode support yet :-( */
+    /* (requires user-supplied conversion table) */
     switch (*c) {
-    default:
-        *c = '?';
-        break;
-    case '\0':
+    case '\0': /* ANSI */
         *c = *isofn;
         break;
+    default:
+        if ((ctable[*c] == NULL) || ((*c = ctable[*c][*isofn]) == '\0'))
+            *c = '?';
+        break;
     }
     return 2;
 }
@@ -81,12 +84,13 @@
  * Note: Version number plus ';' may be omitted.
  */
 int
-isofncmp(fn, fnlen, isofn,
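The lookup that the patch adds to isochar() can be tried outside the kernel. What follows is a minimal userland sketch, not the committed code: the name isochar_sketch, the ANSI C prototype, and the extra NULL check on ctable itself are assumptions made for illustration. It treats a Joliet name as big-endian 16-bit units, passes through units whose high byte is 0, and otherwise consults the optional 256-entry page registered for that high byte.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u_char;

/*
 * Userland sketch of the patched isochar() (hypothetical name).
 * Decodes one character of an ISO 9660 / Joliet name into *c and
 * returns the number of bytes consumed.
 * ctable[hi] is either NULL or a 256-entry page indexed by the low byte.
 */
static int
isochar_sketch(const u_char *isofn, const u_char *isoend, int joliet_level,
    u_char *c, u_char *const *ctable)
{
    u_char hi, lo;

    assert(isofn != NULL && c != NULL);
    *c = *isofn++;
    if (joliet_level == 0 || isofn == isoend)
        return 1;       /* plain ISO 9660, or a trailing single byte */

    hi = *c;
    lo = *isofn;
    if (hi == 0)
        *c = lo;        /* high byte 0: use the low byte as is */
    else if (ctable == NULL || ctable[hi] == NULL ||
        (*c = ctable[hi][lo]) == '\0')
        *c = '?';       /* no page loaded, or no mapping in the page */
    return 2;
}
```

With a page for high byte 0x04 that maps low byte 0x10 to 0xE1, the Joliet unit 04 10 (U+0410, Cyrillic capital A) comes out as the KOI8-R byte 0xE1, while any unit whose page is not loaded still becomes '?', as before the patch.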
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 12:05:57 +0200, Maxim Sobolev wrote:
> Please somebody review attached patches.
>
> + u_char *ctable[256]; /* Table for converting unicode filenames */

You decided to use a conversion table per Unicode base. It takes a lot of
memory and is not satisfactory in any case, because you miss the other
graphic characters belonging to the charset of the OS that made the CD
(I mean high code table characters like copyright, angle quotes and so on).

A better variant is to use exact to/from Unicode conversion tables, if you
know the exact charset of the OS that made the CD. I.e. I suggest using the
method that MSDOSFS currently uses, which is
foreign charset -> Unicode -> native charset. It takes little memory and
converts names without losing characters.

--
Andrey A. Chernov
http://ache.pp.ru/
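The "Unicode -> native charset" half of that suggestion can be pictured as one compact sorted table per local charset instead of a page per Unicode base. The sketch below is only an illustration under stated assumptions: the struct ucs_map layout, the function names, and the six-entry KOI8-R sample are hypothetical and are not taken from MSDOSFS, which keeps its own table format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One Unicode code point and its byte in the local charset (hypothetical layout). */
struct ucs_map {
    uint16_t ucs;
    unsigned char local;
};

/* Tiny KOI8-R sample, sorted by code point; a real table would cover the whole charset. */
static const struct ucs_map koi8r_sample[] = {
    { 0x00A9, 0xBF },   /* copyright sign */
    { 0x0410, 0xE1 },   /* Cyrillic capital A */
    { 0x0411, 0xE2 },   /* Cyrillic capital BE */
    { 0x0430, 0xC1 },   /* Cyrillic small a */
    { 0x0431, 0xC2 },   /* Cyrillic small be */
    { 0x2500, 0x80 },   /* box drawings light horizontal */
};

static int
ucs_map_cmp(const void *key, const void *elem)
{
    uint16_t k = *(const uint16_t *)key;
    uint16_t e = ((const struct ucs_map *)elem)->ucs;

    return (k > e) - (k < e);
}

/* Map one UCS-2 code point to the local charset, or '?' if it has no mapping. */
static unsigned char
ucs_to_local(uint16_t ucs)
{
    const struct ucs_map *m;

    if (ucs < 0x80)
        return (unsigned char)ucs;  /* ASCII is shared */
    m = bsearch(&ucs, koi8r_sample,
        sizeof(koi8r_sample) / sizeof(koi8r_sample[0]),
        sizeof(koi8r_sample[0]), ucs_map_cmp);
    return m != NULL ? m->local : '?';
}
```

A table like this costs a few bytes per mapped character regardless of which Unicode block the character lives in, so the copyright sign (U+00A9) and box-drawing characters (U+25xx) sit next to the Cyrillic letters without any extra pages.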
Re: Unicode support in cd9660 [patch for review]
Andrey Chernov wrote:
> On Wed, Dec 27, 2000 at 12:05:57 +0200, Maxim Sobolev wrote:
> > Please somebody review attached patches.
> >
> > + u_char *ctable[256]; /* Table for converting unicode filenames */
>
> You decided to use a conversion table per Unicode base. It takes a lot of
> memory

Not too much: only 1K per cd9660 mount point for machines with
sizeof(u_char *) == 4. This also provides the opportunity to load several
tables with different bases transparently.

> and is not satisfactory in any case, because you miss the other graphic
> characters belonging to the charset of the OS that made the CD (I mean
> high code table characters like copyright, angle quotes and so on).
> A better variant is to use exact to/from Unicode conversion tables, if you
> know the exact charset of the OS that made the CD.

I'm not sure how I could obtain the charset for each of the dozen+ OSes that
may create a CD.

> I.e. I suggest using the method that MSDOSFS currently uses, which is
> foreign charset -> Unicode -> native charset. It takes little memory and
> converts names without losing characters.

I don't see any problems, because it's likely that the usual special high
code table characters (copyright, angle quotes and so on) will be
represented using Unicode charcodes with the first byte (`base') equal to 0,
so they can be mapped directly into the native charset. In my implementation
only Unicode characters with base != 0 are translated. All less usual
characters (graphics and so on) can be translated by extending the
appropriate codetable to include additional translation tables with
different bases (e.g. 0x25 for graphics chars etc.).

-Maxim
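The 1K figure is simply 256 pointers of 4 bytes each; every base that actually gets a table loaded adds one 256-byte page on top. A small sketch of that arithmetic, with the function name ctable_footprint and the two constants being illustrative assumptions rather than anything in the patch:

```c
#include <assert.h>
#include <stddef.h>

#define CTABLE_SLOTS 256u   /* one pointer per possible high byte ("base") */
#define PAGE_BYTES   256u   /* one output byte per possible low byte */

/*
 * Bytes used per mount point: the pointer array itself plus one
 * 256-byte page for every base that has a table loaded.
 */
static size_t
ctable_footprint(size_t ptr_size, size_t loaded_pages)
{
    assert(loaded_pages <= CTABLE_SLOTS);
    return CTABLE_SLOTS * ptr_size + loaded_pages * PAGE_BYTES;
}
```

For example, ctable_footprint(4, 0) is 1024, a single Cyrillic page (base 0x04) brings it to 1280, and adding a second page for base 0x25 gives 1536. With 8-byte pointers the empty array alone is 2048 bytes.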
Re: [FreeBSD-tech-jp 2988] Re: Unicode support in cd9660 [patch for review]
On Wed, 27 Dec 2000 21:28:19 +0900, Motomichi Matsuzaki [EMAIL PROTECTED] said:

msaki> Any ideas?

There was a discussion about this issue on the [EMAIL PROTECTED] mailing
list, with the subjects "Unicode support in kernel" and "code set recoding
engine, V2", in October and November 1999.

A summary of my proposal is the following:
http://mail-index.netbsd.org/tech-kern/1999/10/15/0009.html
http://mail-index.netbsd.org/tech-kern/1999/11/23/0002.html
(the former one is somewhat difficult, though)

--
soda
Re: Unicode support in cd9660 [patch for review]
Motomichi Matsuzaki wrote:
> At Wed, 27 Dec 2000 12:05:57 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
> > Several days ago I got a CD with Russian filenames on it and discovered
> > that I'm unable to read those filenames. After some hacking I produced a
> > patch,
>
> Vladimir Kushnir's patch will be for it.
> http://www.freebsd.org/cgi/getmsg.cgi?fetch=270425+0+/usr/local/www/db/text/2000/freebsd-hackers/20001203.freebsd-hackers
> and it is based on my patch:
> http://triaez.kaisei.org/~mzaki/joliet/
>
> > which should solve this problem in a manner similar to what we have in
> > the msdosfs module (i.e. a user-provided conversion table). I have to
> > emphasize that it's a temporary solution until we have iconv support in
> > the kernel.
>
> *PLEASE* be careful about filename I18N.
>
> 1. Joliet extension
> The Joliet extension is built on a Unicode basis, and is a
> "multilingual" filesystem. We can find CDs which contain files named in
> all of English, French, Russian, Chinese, and Japanese. So charset
> conversion per mount is not sufficient.

You can specify multiple charset conversion tables for each mount point; the
only problem is creating appropriate conversion tables (I do not have any
CDs with anything other than English/Russian filenames :-).

> 3. Relation to userland applications
> Currently, conversion tables between Unicode and local charsets are widely
> needed and implemented, e.g. for the Joliet extension, the FAT filesystem,
> TrueType rasterizers, WWW browsers, and so on. We should share the tables
> as much as possible for consistency. So the ideal solution to code
> conversion is not an in-kernel table but a userland shared library.
> Therefore, filename code conversion should also be done in userland as far
> as possible.
>
> 4. Rough idea of me
> My preliminary idea for filesystem I18N:
> * filenames recorded on Unix filesystems (e.g. FFS, MFS) use an arbitrary
>   codeset, for example Unicode.
> * the interface between kernel and userland should use a filesystem-safe
>   encoding, for example UTF-8.
> * userland applications can convert from/to the user-requested charsets,
>   such as latin-2, koi8, and euc-jp, using a shared library.
> * the Joliet extension and UDF, which are based on Unicode, need no
>   in-kernel conversion, in case Unix filesystems use Unicode.
> * the FAT filesystem, which uses both Unicode and conventional codepages,
>   requires in-kernel conversion in order to write the conventional 8.3
>   names.
>
> Any ideas?

Thanks for pointing this out, but I think that your proposal is too generic
to be committed any time soon (not even to mention MFC'ing it). Moreover, as
I pointed out, efforts to provide generic Unicode functionality in
kernel/userland are currently underway, so it is likely that part of your
work will be duplicated/obsoleted. What I'm proposing here is a quick'n'dirty
(and accordingly limited) solution to allow mounting CDs with Unicode
filenames on them. This solution is meant to be temporary until iconv-based
kernel interfaces appear.

-Maxim
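The "filesystem-safe encoding" item in the quoted list can be made concrete: a Joliet name is a sequence of 16-bit units, and each unit maps to one to three UTF-8 bytes, none of which is ever 0x00 or '/' unless the character itself is NUL or a slash. The sketch below assumes plain UCS-2 with no surrogate pairs; the function name ucs2_to_utf8 is illustrative and does not come from any of the patches discussed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode one UCS-2 code unit as UTF-8 into out[] (room for 3 bytes).
 * Returns the number of bytes written, or 0 for a surrogate code unit,
 * which this sketch does not try to combine into a pair.
 */
static int
ucs2_to_utf8(uint16_t ucs, unsigned char out[3])
{
    assert(out != 0);
    if (ucs < 0x80) {
        out[0] = (unsigned char)ucs;                    /* 0xxxxxxx */
        return 1;
    }
    if (ucs < 0x800) {
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));    /* 110xxxxx */
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));  /* 10xxxxxx */
        return 2;
    }
    if (ucs >= 0xD800 && ucs <= 0xDFFF)
        return 0;                                       /* surrogate half */
    out[0] = (unsigned char)(0xE0 | (ucs >> 12));       /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (ucs & 0x3F));
    return 3;
}
```

Under this scheme a name mixing Russian and Japanese needs no per-mount table at all: U+0410 becomes D0 90 and U+65E5 becomes E6 97 A5, and the conversion to koi8 or euc-jp is left to userland.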
Re: Unicode support in cd9660 [patch for review]
At Wed, 27 Dec 2000 14:54:00 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
> > The Joliet extension is built on a Unicode basis, and is a
> > "multilingual" filesystem. We can find CDs which contain files named in
> > all of English, French, Russian, Chinese, and Japanese. So charset
> > conversion per mount is not sufficient.
>
> You can specify multiple charset conversion tables for each mount point;
> the only problem is creating appropriate conversion tables (I do not have
> any CDs with anything other than English/Russian filenames :-).

Suppose a file whose name contains multilingual characters. Think of
Japanese researchers of Russian literature. The Microsoft Word document
files about their work may have such complex filenames. And Joliet can
handle them. The multiple mount point solution is insufficient for these
situations.

> > 4. Rough idea of me
> > My preliminary idea for filesystem I18N:
>
> Thanks for pointing this out, but I think that your proposal is too
> generic to be committed any time soon (not even to mention MFC'ing it).

Yes, you're right. I have no more than such a rough idea indeed.

> What I'm proposing here is a quick'n'dirty (and accordingly limited)
> solution to allow mounting CDs with Unicode filenames on them. This
> solution is meant to be temporary until iconv-based kernel interfaces
> appear.

But your solution is not effective and is quite harmful to multibyte users.
The "loading conversion tables on every mount point" idea is totally wrong.

--
Motomichi Matsuzaki [EMAIL PROTECTED]
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan
Re: Unicode support in cd9660 [patch for review]
Motomichi Matsuzaki wrote:
> At Wed, 27 Dec 2000 14:54:00 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
> > > The Joliet extension is built on a Unicode basis, and is a
> > > "multilingual" filesystem. We can find CDs which contain files named
> > > in all of English, French, Russian, Chinese, and Japanese. So charset
> > > conversion per mount is not sufficient.
> >
> > You can specify multiple charset conversion tables for each mount point;
> > the only problem is creating appropriate conversion tables (I do not
> > have any CDs with anything other than English/Russian filenames :-).
>
> Suppose a file whose name contains multilingual characters. Think of
> Japanese researchers of Russian literature. The Microsoft Word document
> files about their work may have such complex filenames. And Joliet can
> handle them.

Yeah, but unfortunately our fs interface can't. :(

> The multiple mount point solution is insufficient for these situations.
>
> > > 4. Rough idea of me
> > > My preliminary idea for filesystem I18N:
> >
> > Thanks for pointing this out, but I think that your proposal is too
> > generic to be committed any time soon (not even to mention MFC'ing it).
>
> Yes, you're right. I have no more than such a rough idea indeed.
>
> > What I'm proposing here is a quick'n'dirty (and accordingly limited)
> > solution to allow mounting CDs with Unicode filenames on them. This
> > solution is meant to be temporary until iconv-based kernel interfaces
> > appear.
>
> But your solution is not effective and is quite harmful to multibyte
> users.

You are not quite right. For multibyte users my solution (workaround?) is at
least equal to the previous no-Unicode case. I do not see how it can be
harmful.

-Maxim
Re: Unicode support in cd9660 [patch for review]
At Wed, 27 Dec 2000 15:38:58 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
> > But your solution is not effective and is quite harmful to multibyte
> > users.
>
> You are not quite right. For multibyte users my solution (workaround?) is
> at least equal to the previous no-Unicode case. I do not see how it can be
> harmful.

1. With just your workaround, multibyte users gain nothing.

2. Following your direction, the loadable conversion table will either grow
   immensely to support multibyte charsets, or be abandoned. A fundamental
   misdesign will lead to such an unfortunate situation.

So I said your solution was harmful.

--
Motomichi Matsuzaki [EMAIL PROTECTED]
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan
Re: Unicode support in cd9660 [patch for review]
Motomichi Matsuzaki wrote:
> At Wed, 27 Dec 2000 15:38:58 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
> > > But your solution is not effective and is quite harmful to multibyte
> > > users.
> >
> > You are not quite right. For multibyte users my solution (workaround?)
> > is at least equal to the previous no-Unicode case. I do not see how it
> > can be harmful.
>
> 1. With just your workaround, multibyte users gain nothing.
>
> 2. Following your direction, the loadable conversion table will either
>    grow immensely to support multibyte charsets, or be abandoned. A
>    fundamental misdesign will lead to such an unfortunate situation.
>
> So I said your solution was harmful.

The patches I proposed are in no way an official direction of the Project
and, as I advertised, are merely a workaround to allow non-English users to
read CDs with native filenames until a comprehensive iconv for the kernel is
introduced. I would be glad if someone replaced my hack with a more generic
solution.

-Maxim
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 12:05:57PM +0200, Maxim Sobolev scribbled:
| Several days ago I got a CD with Russian filenames on it and discovered that
| I'm unable to read those filenames. After some hacking I produced a patch,
| which should solve this problem in the manner similar to what we have in
| msdosfs module (i.e. user-provided conversion table). I have to emphasize that
| it's a temporary solution until we will have iconv support in kernel.
|
| Please somebody review attached patches.

Please do not assume Unicode. I18N/L10N efforts have been crying for
programmers to *not* and *never* assume *anything*. :)
Also, this belongs on -i18n more than anywhere else. I really
do not want to generate all the traffic on -i18n alone.. :)

Have you seen ports/chinese/big5fs? Japanese/Korean do the same thing too.
You could simply make a port of this that loads KLD's. This enables us
to have the support without having hacks in src/sys.
I talked to Boris at length about this. And I think this would be
the best way to implement "temporary" hacks.

As to the progress of iconv, we should have it soon, as soon as
itojun and I work out how to import either the Citrus code or
Konstantin's code.

If you make this a port, I say "go for it." :)
If you wish to commit to src/sys, I have strong doubts about this.
I hope you do not mind my bluntness about this, as I really
want to express my feelings.

Merry Christmas and a Happy New Year,
Michael

--
[EMAIL PROTECTED] | [EMAIL PROTECTED]
http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
| Motomichi Matsuzaki wrote:
| > At Wed, 27 Dec 2000 15:38:58 +0200, Maxim Sobolev [EMAIL PROTECTED] wrote:
| > > > But your solution is no effective and much harmful to multibyte users.
| > > You are not quite right. For multibyte users my solution (workaround?)
| > > is at least equal to the previous no-unicode case. I do not see how it
| > > can be harmful.
| >
| > 1. In just your workaround, multibyte users will take no merits.
| >
| > 2. Based on your direction, the size of loadable conversion table
| >    will immensely expand for multibyte support, or be abandoned.
| >    Fundamental misdesign will lead to such unfortunate situation.
| > So I said your solution was harmful.
|
| Proposed by me patches is no way an official direction of the Project and
| as I advertised are merely a workaround to allow non-English users to read
| CD with native filenames until comprehensive iconv for kernel will be
| introduced. I would be glad if someone will replace my hack with more
| generic solution.

I think that if this "hack" is made a russian/xxxfs port, everyone
can be happy. If you want a unicode FS like this, perhaps you
can have a sysutils/unicodefs. :)

After all, what are KLD's for but modularity? :)

--
[EMAIL PROTECTED] | [EMAIL PROTECTED]
http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
Re: Unicode support in cd9660 [patch for review]
"Michael C . Wu" wrote:
> As to the progress of iconv, we should have it soon, as soon as
> itojun and I work out how to import either the Citrus code or
> Konstantin's code.

As I could see from the CVSed Citrus code, it's a locale library rather than
iconv, and it's just got the stub calls of iconv functions. So I thought
Citrus and iconv complement each other and don't do the same. Am I wrong?

As for the iconv library itself, its userland part is complete. I'm really
busy these days despite Christmas and the New Year, but I expect to release
v2.1 in 3 weeks with the following changes:
* the two patches from ports will be incorporated;
* a few charsets added (to provide compatibility with the libiconv port and
  to be able to use glib-1.3 with iconv - the only port still depending on
  libiconv);
* memory and file management functions moved into a separate file; then a
  kernel-side iconv implementation can compile iconv with its own specific
  memory and file management functions (from a different file).

I tried to write a kernel module, but I don't have enough knowledge of the
kernel. If anybody would like to do it, I am ready to help.

Regards,
Konstantin.

--
Konstantin Chuguev - Application Engineer
Francis House, 112 Hills Road
Cambridge CB2 1PQ, United Kingdom
D A N T E
WWW: http://www.dante.net
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
| "Michael C . Wu" wrote:
| > On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
| > | Motomichi Matsuzaki wrote:
| > I think that making this "hack" a russian/xxxfs port and I think
| > everyone can be happy. If you want unicode FS like this, perhaps you
| > can have a sysutils/unicodefs. :)
|
| No, it's no way a ports- commit.

Thank you for the one-sentence reply without giving any reasons.
As a community, I thought we are supposed to communicate.
I have expressed my reasons and ideas, and you return the favor
with a one-line comment. :)

There is nothing wrong with having a port install a KLD; vmware does it.
And ports/emulators/linux_base depends on linux.ko.

keichii@recursive:/usr/ports$ ls japanese/msdosfs/
Makefile     files/       patches.5/   pkg-descr
distinfo     patches.4/   pkg-comment  pkg-plist
keichii@recursive:/usr/ports$ ls chinese/big5fs/
Makefile     distinfo     files/       pkg-comment  pkg-descr    pkg-plist
keichii@recursive:/usr/ports$ cat chinese/big5fs/pkg-descr
This port installs two kernel modules, cd9660.ko and msdos.ko, which
will let users read Big5 filenames on Joliet and VFAT filesystems.

Why do you think we did this? You have admitted that this is an ugly hack.
Since when did we allow ugly hacks in the kernel when we can avoid it?
Japanese/Chinese chose to do this because we do not want dirty stuff in
the kernel. We have had this FS support since 2.2.x.

Please understand that we do not want hacks, and we certainly do not
want to be reliant on unicode alone. We should never ever assume
stuff in I18N, and you are doing exactly this. Have you any idea
what this would cause? Do you know how hard it is to remove something
after it works for a while? We do NOT ever want to have to do legacy
support in the future, because *you* want unicode and you want it now.
In addition, this would probably mean laziness in the future about doing
this support properly.

In -SMP, they are avoiding hacks that make things work temporarily.
Why should I18N do otherwise?

You can commit this to work with iconv() after iconv() gets in.
I would personally cheer you on.

I apologize for any harshness in the emails; it is quite hard
to express things without facial expressions. I think we all
care about the project a lot. Please be assured that I mean no
offense to you.

--
[EMAIL PROTECTED] | [EMAIL PROTECTED]
http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 13:44:11 +0200, Maxim Sobolev wrote:
> I'm not sure how I could obtain the charset for each of the dozen+ OSes
> that may create a CD.

There are not that many, usually only one Russian charset per OS :-)

> I don't see any problems, because it's likely that the usual special high
> code table characters (copyright, angle quotes and so on) will be
> represented using Unicode charcodes with the first byte (`base') equal to
> 0, so they can be mapped directly into the native charset. In my
> implementation only Unicode characters with base != 0 are translated. All
> less usual characters (graphics and so on) can be translated by extending
> the appropriate codetable to include additional translation tables with
> different bases (e.g. 0x25 for graphics chars etc.).

Well, I could live with it if you add the _whole_ windows-1251 set as
Unicode to your loadable table and provide the corresponding mapping for all
matching KOI8-R characters, as MSDOSFS currently does. You can get those
tables from MSDOSFS. This is the minimal basis; you can make a separate
table for KOI8-U, etc.

I say this because I see that your patch tries to treat KOI8-R and KOI8-U as
the same charset, which is not acceptable. Also, please name those tables
after the local charset and not 'cd9660'.

--
Andrey A. Chernov
http://ache.pp.ru/
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 12:48:12 -0600, Michael C . Wu wrote:
> On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
> | "Michael C . Wu" wrote:
> | > On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
> | > | Motomichi Matsuzaki wrote:
> | > I think that making this "hack" a russian/xxxfs port and I think
> | > everyone can be happy. If you want unicode FS like this, perhaps you
> | > can have a sysutils/unicodefs. :)
> |
> | No, it's no way a ports- commit.

I agree.

> Thank you for the one sentence reply without giving any reasons.
> As a community, I thought we are supposed to communicate.
> I have expressed my reasons and ideas, and you return the favor
> with an one line comment. :)

This is a general method of telling the kernel how to convert names from
Unicode to any local charset, not particularly the Russian one. Anybody
should feel free to add their own single-byte charset tables.

Yes, it is a per-FS hack, but until iconv or something like it is
integrated, some hack is needed just to read the CDs sold at a nearby shop.

> keichii@recursive:/usr/ports$ ls japanese/msdosfs/

Japanese etc. is _very_ different and needs tons of effort even when
implemented as a similar hack. We are currently talking about, and limited
to, single-byte charsets only.

We have almost full single-byte charset localization in the
kernel/userland, so this step is the next logical extension in that
direction. I can't say anything similar about double-byte charsets like
Japanese etc.

--
Andrey A. Chernov
http://ache.pp.ru/
Re: Unicode support in cd9660 [patch for review]
-audit trimmed, cc'ed to -i18n

On Wed, Dec 27, 2000 at 10:02:01PM +0300, Andrey Chernov scribbled:
| On Wed, Dec 27, 2000 at 12:48:12 -0600, Michael C . Wu wrote:
| > On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
| > | "Michael C . Wu" wrote:
| > | > On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
| > | > | Motomichi Matsuzaki wrote:
| > | > I think that making this "hack" a russian/xxxfs port and I think
| > | > everyone can be happy. If you want unicode FS like this, perhaps you
| > | > can have a sysutils/unicodefs. :)
| > |
| > | No, it's no way a ports- commit.
|
| I agree.

I disagree.

| > Thank you for the one sentence reply without giving any reasons.
| > As a community, I thought we are supposed to communicate.
| > I have expressed my reasons and ideas, and you return the favor
| > with an one line comment. :)
|
| This is general method of telling kernel how to convert names from Unicode
| to any local charset, not particulary to Russian one. Anybody feel free to
| add their own single bytes charsets tables.

READ: "single byte"
You are breaking CJK multibyte support. Why?
Why do you want to make software engineering mistakes?

| Yes, it is a per-FS hack, but until iconv or something like will be
| integrated, some hack needed just to read CDs selling at nearby shop.
|
| > keichii@recursive:/usr/ports$ ls japanese/msdosfs/
|
| Japanese etc. is _very_ different and needs tons of efforts even when
| implemented like similar hack, we currently talk about and limited to
| single bytes charsets only.

Why should you ignore the multibyte stuff because it is harder?
SMPng is hard, but no one is ignoring SMP because SMP is hard to
implement. Since device drivers for cheap hardware are hard to
write/fix, should we also ignore those?
Your logic fails to evaluate to TRUE.

| We have almost full single bytes charset localization in the
| kernel/userland, so this step is next logical extention in that direction.
| I can't say something like about double bytes charsets like Japanese etc.

Right, and as much as you want single-byte stuff to work, CJK is
also a large market for FreeBSD. Where else in the world is there a
full-fledged 200-page monthly print magazine for BSD but Japan?
(Sorry, but DaemonNews is not 200 pages :)

The proposal breaks multibyte support, and I do not believe it is
acceptable to do this. You want Russian to work, and we CJK people
want CJK to work. If you were looking at this from my point of view,
would you say yes? Probably not. The frustration with the C charset
is enough; can we please not make the mistake that PDP-based BSD's made
long ago?

--
[EMAIL PROTECTED] | [EMAIL PROTECTED]
http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy.
Re: Unicode support in cd9660 [patch for review]
On Wed, Dec 27, 2000 at 13:35:52 -0600, Michael C . Wu wrote:
> READ: "single byte"
> You are breaking CJK multibyte support. Why?
> Why do you want to make software engineering mistakes?

1) Nobody can break something which does not exist yet.
2) The stuff discussed is optional, and by not using it you get the old functionality.

> Why should you ignore the multibyte stuff because it is harder?

Not for this reason, but because it must be implemented at a higher abstraction level first to be considered. Since I do not have much interest in multibyte characters, it is not my task in general, but others'.

> The proposal breaks multibyte support and I do not believe it is acceptable

Again, I see no way it can break nonexistent support. Remember that the conversion table is optional, and without loading it you get exactly the previous functionality. Very similar stuff has been in MSDOSFS for years and nobody complains.

--
Andrey A. Chernov
http://ache.pp.ru/
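The "optional table" argument can be sketched as follows. This is an illustration only, not the code from the patch: `convert_name` and `KOI8_R_TABLE` are hypothetical names, the table is built from Python's `koi8_r` codec rather than loaded from a user-supplied file, and the fallback of replacing every non-ASCII character with `?` is an assumed simplification of the previous behaviour.

```python
# Hypothetical table: Unicode code point -> KOI8-R byte, for the upper half
# of the charset. A real table would be supplied by the user at mount time.
KOI8_R_TABLE = {ord(bytes([b]).decode("koi8_r")): b for b in range(0x80, 0x100)}


def convert_name(ucs2_be: bytes, table=None) -> bytes:
    """Convert a UCS-2 big-endian file name to a single-byte local charset.

    With table=None the function falls back to the assumed old behaviour:
    ASCII passes through and everything else becomes '?'.
    """
    out = bytearray()
    for i in range(0, len(ucs2_be) - 1, 2):
        code = (ucs2_be[i] << 8) | ucs2_be[i + 1]
        if code < 0x80:
            out.append(code)                # plain ASCII, unchanged
        elif table is not None and code in table:
            out.append(table[code])         # mapped by the optional table
        else:
            out.append(ord("?"))            # no table or no mapping
    return bytes(out)


if __name__ == "__main__":
    name = "Привет.txt".encode("utf-16-be")
    print(convert_name(name))                # table not loaded
    print(convert_name(name, KOI8_R_TABLE))  # table loaded
```

In this sketch, not loading a table leaves the output identical to the fallback path, which is the sense in which the thread describes the table as optional. It also shows the limit raised elsewhere in the thread: each input character yields exactly one output byte.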
Re: Unicode support in cd9660 [patch for review]
Aimed at the thread, not the participants.

This is off-topic for audit-. Please bring it back to audit- when you have some actual code to audit.

Thanks!

M

> On Wed, Dec 27, 2000 at 12:05:57PM +0200, Maxim Sobolev scribbled:
> | Several days ago I got a CD with Russian filenames on it and discovered that
> | I'm unable to read those filenames. After some hacking I produced a patch,
> | which should solve this problem in the manner similar to what we have in
> | msdosfs module (i.e. user-provided conversion table). I have to emphasize that
> | it's a temporary solution until we will have iconv support in kernel.
> |
> | Please somebody review attached patches.
>
> Please do not assume Unicode. I18N/L10N efforts have been crying
> for programmers to *not* and *never* assume *anything*. :)
>
> Also, this belongs on -i18n more than anything else.
> I really do not want to generate all the traffic on -i18n alone.. :)
>
> Have you seen ports/chinese/big5fs? Japanese/Korean do the same thing too.
> You could simply make a port of this that loads KLD's.
> This enables us to have the support without having hacks in src/sys.
> I talked to Boris at length about this. And I think this would
> be the best way to implement "temporary" hacks.
>
> As to the progress of iconv, we should have it soon, as soon as itojun
> and I work out how to import either the Citrus code or Konstantin's code.
>
> If you make this a port, I say "go for it." :)
> If you wish to commit to src/sys, I have strong doubts about this.
> I hope you do not mind my bluntness about this, as I really want to
> express the feelings.
>
> Merry Christmas and a Happy New Year,
> Michael
> --
> +--+
> | [EMAIL PROTECTED] | [EMAIL PROTECTED] |
> | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
> +--+

--
Mark Murray
Warning: this .sig is umop ap!sdn
Re: Unicode support in cd9660 [patch for review]
[[ sorry for the minorly offtopic post ]]

In message [EMAIL PROTECTED] "Michael C. Wu" writes:
: Right, and as much as you want single byte stuff to work, CJK is also
: a large market for FreeBSD. Where else in the world is there a full-fledged
: print monthly 200-page magazine for BSD but Japan?
: (Sorry, but DaemonNews is not 200 pages :)

I think that the BSD magazine is published more like quarterly. There have only been 6 issues of BSD Magazine since October 1999 (issue 6 just arrived the other day). There have been two issues of the FreeBSD magazine in Japan since October 2000. The second issue of FreeBSD mag showed up last week.

However, your point about how big FreeBSD is in Japan is nonetheless quite valid.

Warner
Re: Unicode support in cd9660 [patch for review]
On Wed, 27 Dec 2000, Maxim Sobolev wrote:

> Several days ago I got a CD with Russian filenames on it and discovered that
> I'm unable to read those filenames. After some hacking I produced a patch,
> which should solve this problem in the manner similar to what we have in
> msdosfs module (i.e. user-provided conversion table). I have to emphasize that
> it's a temporary solution until we will have iconv support in kernel.

The patch seems to be ok as temporary solution for CDs with Russian file names. And as temporary solution it well suits to the ports collection, not to the main tree.

In the near future we'll have iconv interface in the kernel which uses libiconv library written by Konstantin Chuguev. I'm really sorry for delays, but my current job leaves nearly zero spare time to me and there is a hope that January will be less busy.

--
Boris Popov
http://www.butya.kz/~bp/
Re: Unicode support in cd9660 [patch for review]
-On [20001227 20:05], Andrej Cernov ([EMAIL PROTECTED]) wrote:
> Yes, it is a per-FS hack, but until iconv or something like will be
> integrated, some hack needed just to read CDs selling at nearby shop.

As far as I know, Boris Popov was working on iconv() support. I see he is cc:'d, so I happily await his ideas/status update.

Furthermore, even though I only use special characters for Dutch, German, Norwegian, Swedish, Danish, Finnish and Icelandic, I feel that _if_ we start to add Unicode support we do it right from the beginning. Unicode was meant to solve the native character set problem for all languages [as far as my knowledge stretches] and should not be a working, but ugly, hack which only allows per-language solutions.

If you want people to have the ability to mount their CD's with Russian characters, by all means provide the patch on your website.

I am by no means an l10n or i18n hacker or wizard, otherwise I would've dedicated my ample time to working on this and solving it once and for all with all other interested parties.

In short: please do it right from the start.

Thanks,

--
Jeroen Ruigrok van der Werven          VIA Net.Works The Netherlands
BSD: Technical excellence at its best  Network- and systemadministrator
D78D D0AD 244D 1D12 C9CA 7152 035C 1138 546A B867
I'm breaking you down... I'm taking you down...