Re: Unicode support in cd9660 [patch for review]

2000-12-28 Thread Maxim Sobolev

Boris Popov wrote:

 On Wed, 27 Dec 2000, Maxim Sobolev wrote:

  Several days ago I got a CD with Russian filenames on it and discovered that
  I'm unable to read those filenames. After some hacking I produced a patch,
  which should solve this problem in a manner similar to what we have in
  the msdosfs module (i.e. a user-provided conversion table). I have to emphasize that
  it's a temporary solution until we have iconv support in the kernel.

 The patch seems to be OK as a temporary solution for CDs with
 Russian file names. And as a temporary solution it is well suited to the ports
 collection, not to the main tree.

 In the near future we'll have an iconv interface in the kernel which
 uses the libiconv library written by Konstantin Chuguev. I'm really sorry for
 the delays, but my current job leaves me nearly zero spare time, and there
 is hope that January will be less busy.

Ok folks, I'll do a port out of it.

-Maxim



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Unicode support in cd9660 [patch for review]

2000-12-28 Thread Michael C . Wu

On Thu, Dec 28, 2000 at 11:19:16AM +0200, Maxim Sobolev scribbled:
| Boris Popov wrote:
|  On Wed, 27 Dec 2000, Maxim Sobolev wrote:
|  In the near future we'll have an iconv interface in the kernel which
|  uses the libiconv library written by Konstantin Chuguev. I'm really sorry for
|  the delays, but my current job leaves me nearly zero spare time, and there
|  is hope that January will be less busy.
| 
| Ok folks, I'll do a port out of it.

Thank you very much!  :)  We can finally end this discussion and 
return -current back to its daily scheduled "is xxx build broken?" 
talk.   

/me notes that keichii has generated more than 50% of -i18n traffic
in the last month... :(

-- 
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: Unicode support in cd9660 [patch for review]

2000-12-28 Thread Vladimir Kushnir

Hello,
Thanks for the hint, Michael. I have made a port based on Matsuzaki-san's
patch (with my little addition). The only problem is that I don't have a web
page/ftp directory. So if anybody agrees to host a distfile I would
gladly send it as well as a port tarball.

On Wed, 27 Dec 2000, Michael C . Wu wrote:


 Have you seen ports/chinese/big5fs?  Japanese/Korean do the same thing too
 You could simply make a port of this that loads KLD's.  This enables us

Regards,
Vladimir

-- 

===|===
 Vladimir Kushnir  |
 [EMAIL PROTECTED]  |Powered by FreeBSD









Re: Unicode support in cd9660 [patch for review]

2000-12-28 Thread Maxim Sobolev

Hello Vladimir,

Friday, December 29, 2000, 1:01:25 AM, you wrote:

VK Hello,
VK Thanks for the hint, Michael. I have made a port based on Matsuzaki-san's
VK patch (with my little addition). The only problem is that I don't have a web
VK page/ftp directory. So if anybody agrees to host a distfile I would
VK gladly send it as well as a port tarball.

Feel free to dump it to me and I'll put it into MASTER_SITE_LOCAL.

-Maxim







Unicode support in cd9660 [patch for review]

2000-12-27 Thread Maxim Sobolev

Hi,

Several days ago I got a CD with Russian filenames on it and discovered that
I'm unable to read those filenames. After some hacking I produced a patch,
which should solve this problem in a manner similar to what we have in
the msdosfs module (i.e. a user-provided conversion table). I have to emphasize that
it's a temporary solution until we have iconv support in the kernel.

Please somebody review attached patches.

-Maxim


Index: cd9660/cd9660_lookup.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_lookup.c,v
retrieving revision 1.25
diff -d -u -r1.25 cd9660_lookup.c
--- cd9660/cd9660_lookup.c  2000/10/03 04:39:50 1.25
+++ cd9660/cd9660_lookup.c  2000/12/27 10:03:04
@@ -239,7 +239,7 @@
if (namelen != 1
|| ep->name[0] != 0)
goto notfound;
-   } else if (!(res = isofncmp(name, len, ep->name, namelen, imp->joliet_level))) {
+   } else if (!(res = isofncmp(name, len, ep->name, namelen, imp->joliet_level, imp->ctable))) {
if (isoflags & 2)
ino = isodirino(ep, imp);
else
Index: cd9660/cd9660_mount.h
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_mount.h,v
retrieving revision 1.4
diff -d -u -r1.4 cd9660_mount.h
--- cd9660/cd9660_mount.h   2000/05/01 20:05:04 1.4
+++ cd9660/cd9660_mount.h   2000/12/27 10:03:04
@@ -47,6 +47,7 @@
struct  export_args export; /* network export info */
int flags;  /* mounting flags, see below */
int ssector;/* starting sector, 0 for 1st session */
+   u_char  *ctable[256];   /* Table for converting unicode filenames */
 };
 #define ISOFSMNT_NORRIP 0x0001  /* disable Rock Ridge Ext.*/
 #define ISOFSMNT_GENS   0x0002  /* enable generation numbers */
Index: cd9660/cd9660_rrip.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_rrip.c,v
retrieving revision 1.18
diff -d -u -r1.18 cd9660_rrip.c
--- cd9660/cd9660_rrip.c2000/05/05 09:58:17 1.18
+++ cd9660/cd9660_rrip.c2000/12/27 10:03:05
@@ -301,7 +301,7 @@
 {
isofntrans(isodir->name,isonum_711(isodir->name_len),
   ana->outbuf,ana->outlen,
-  1,isonum_711(isodir->flags)&4, ana->imp->joliet_level);
+  1,isonum_711(isodir->flags)&4, ana->imp->joliet_level, ana->imp->ctable);
switch (*ana->outbuf) {
default:
break;
@@ -509,7 +509,7 @@
pwhead = isodir->name + isonum_711(isodir->name_len);
if (!(isonum_711(isodir->name_len)&1))
pwhead++;
-   isochar(isodir->name, pwhead, ana->imp->joliet_level, &c);
+   isochar(isodir->name, pwhead, ana->imp->joliet_level, &c, ana->imp->ctable);
 
/* If it's not the '.' entry of the root dir obey SP field */
if (c != 0 || isonum_733(isodir->extent) != ana->imp->root_extent)
@@ -646,7 +646,7 @@
*outlen = 0;
 
isochar(isodir->name, isodir->name + isonum_711(isodir->name_len),
-   imp->joliet_level, &c);
+   imp->joliet_level, &c, imp->ctable);
tab = rrip_table_getname;
if (c == 0 || c == 1) {
cd9660_rrip_defname(isodir,analyze);
Index: cd9660/cd9660_util.c
===
RCS file: /home/ncvs/src/sys/isofs/cd9660/cd9660_util.c,v
retrieving revision 1.15
diff -d -u -r1.15 cd9660_util.c
--- cd9660/cd9660_util.c2000/10/29 13:56:43 1.15
+++ cd9660/cd9660_util.c2000/12/27 10:03:05
@@ -52,25 +52,28 @@
  * Return number of bytes consumed
  */
 int
-isochar(isofn, isoend, joliet_level, c)
+isochar(isofn, isoend, joliet_level, c, ctable)
   u_char *isofn;
   u_char *isoend;
   int joliet_level;
   u_char *c;
+  u_char **ctable;
 {
   *c = *isofn++;
   if (joliet_level == 0 || isofn == isoend)
   /* (00) and (01) are one byte in Joliet, too */
   return 1;
 
-  /* No Unicode support yet :-( */
+  /* Limited Unicode support yet :-( */
+  /* (requires user-supplied conversion table) */
   switch (*c) {
-  default:
-  *c = '?';
-  break;
-  case '\0':
+  case '\0':   /* ANSI */
   *c = *isofn;
   break;
+  default:
+  if ((ctable[*c] == NULL) || ((*c = ctable[*c][*isofn]) == '\0'))
+  *c = '?';
+  break;
   }
   return 2;
 }
@@ -81,12 +84,13 @@
  * Note: Version number plus ';' may be omitted.
  */
 int
-isofncmp(fn, fnlen, isofn, 

Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Andrey A. Chernov

On Wed, Dec 27, 2000 at 12:05:57 +0200, Maxim Sobolev wrote:
 Please somebody review attached patches.
 + u_char  *ctable[256];   /* Table for converting unicode filenames */

You decided to use a per-Unicode-base conversion table; it takes a lot of memory
and is not satisfactory in any case, because you miss other graphics related to
the charset of the OS that made the CD (I mean high code table characters like
copyright, angle quotes and so on). A better variant is to use exact to/from
Unicode conversion tables, if you know the exact charset of the OS that made the CD.
I.e. I suggest using the method that MSDOSFS currently uses, which is
foreign-charset -> Unicode -> native charset. It takes little memory and
converts names without losing characters.
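
[Editorial illustration, not part of the original message.] The pivot idea
described above (foreign charset -> Unicode -> native charset) can be sketched
roughly as follows. This is a minimal sketch, not the MSDOSFS code: the
function names and the UNMAPPED marker are hypothetical, windows-1251 and
KOI8-R are assumed as the foreign and native charsets, and only a handful of
characters are covered.

```c
/* Marker for bytes this sketch does not cover (U+FFFD REPLACEMENT CHARACTER). */
#define UNMAPPED 0xFFFDu

/* Step 1: foreign charset (windows-1251) to Unicode. Only a subset is covered. */
static unsigned int
cp1251_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                       /* ASCII range is shared */
    if (b >= 0xC0)
        return 0x0410u + (b - 0xC0);    /* 0xC0..0xFF map to U+0410..U+044F */
    switch (b) {
    case 0xA8: return 0x0401;           /* CYRILLIC CAPITAL LETTER IO */
    case 0xA9: return 0x00A9;           /* COPYRIGHT SIGN */
    case 0xAB: return 0x00AB;           /* LEFT-POINTING ANGLE QUOTE */
    case 0xB8: return 0x0451;           /* CYRILLIC SMALL LETTER IO */
    case 0xBB: return 0x00BB;           /* RIGHT-POINTING ANGLE QUOTE */
    default:   return UNMAPPED;
    }
}

/* Step 2: Unicode to native charset (KOI8-R). Deliberately tiny table. */
static unsigned char
unicode_to_koi8r(unsigned int cp)
{
    if (cp < 0x80)
        return (unsigned char)cp;
    switch (cp) {
    case 0x00A9: return 0xBF;           /* COPYRIGHT SIGN */
    case 0x0401: return 0xB3;           /* CYRILLIC CAPITAL LETTER IO */
    case 0x0451: return 0xA3;           /* CYRILLIC SMALL LETTER IO */
    case 0x0410: return 0xE1;           /* CYRILLIC CAPITAL LETTER A */
    case 0x0430: return 0xC1;           /* CYRILLIC SMALL LETTER A */
    default:     return '?';            /* no equivalent in this sketch */
    }
}

/* Both steps chained: one foreign byte in, one native byte out. */
static unsigned char
cp1251_to_koi8r(unsigned char b)
{
    return unicode_to_koi8r(cp1251_to_unicode(b));
}
```

With two exact tables per charset, the copyright sign survives the round trip,
while a character with no native equivalent (the angle quotes here) falls back
to '?'.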

-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Maxim Sobolev

Andrey Chernov wrote:

 On Wed, Dec 27, 2000 at 12:05:57 +0200, Maxim Sobolev wrote:
  Please somebody review attached patches.
  + u_char  *ctable[256];   /* Table for converting unicode filenames */

 You decided to use a per-Unicode-base conversion table; it takes a lot of memory

Not too much - only 1K per cd9660 mount point for machines with sizeof(u_char *) == 4.
This also provides an opportunity to load several tables with different bases
transparently.
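
[Editorial illustration, not part of the original message.] The memory figure
above follows from the `u_char *ctable[256]` declaration in the patch: one
pointer per possible base byte, plus one 256-byte table per base actually
loaded. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <stddef.h>

/* Bytes used by the top-level pointer array: one pointer per possible base. */
static size_t
ctable_index_bytes(size_t pointer_size)
{
    return 256 * pointer_size;
}

/* Total bytes once `loaded_bases` 256-entry single-byte tables are attached. */
static size_t
ctable_total_bytes(size_t pointer_size, size_t loaded_bases)
{
    return ctable_index_bytes(pointer_size) + loaded_bases * 256;
}
```

With 4-byte pointers the index alone is 1024 bytes, as stated; on a machine
with 8-byte pointers it would be 2048 bytes, and each loaded base adds another
256 bytes.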

 and is not satisfactory in any case, because you miss other graphics related to
 the charset of the OS that made the CD (I mean high code table characters like
 copyright, angle quotes and so on). A better variant is to use exact to/from
 Unicode conversion tables, if you know the exact charset of the OS that made the CD.

I'm not sure how I could obtain the charset for each of the dozen+ OSes that may create a CD.

 I.e. I suggest using the method that MSDOSFS currently uses, which is
 foreign-charset -> Unicode -> native charset. It takes little memory and
 converts names without losing characters.

I don't see any problems, because it's likely that usual special high code table
characters (copyright, angle quotes and so on) will be represented using Unicode
charcodes with first byte (`base') equal to 0, so they can be mapped directly into
native charset. In my implementation only Unicode characters with base !=0 are to be
translated. All less usual characters (graphics and so on) can be translated by
extending appropriate codetable to include additional translation tables with different
bases (e.g. 0x25 for graphics chars etc.).
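
[Editorial illustration, not part of the original message.] The per-base
lookup described above can be sketched as a standalone function. It mirrors
the logic of the patched isochar() but is not the patch itself: the names
joliet_to_local and fill_sample_cyrillic are hypothetical, and the sample
table assumes KOI8-R as the local charset and fills in only four letters.

```c
#include <stddef.h>

/*
 * Translate one two-byte Joliet character (big-endian UCS-2) into a single
 * byte of the local charset. ctable has 256 slots, one per base (high byte);
 * a NULL slot means no table was loaded for that base.
 */
static unsigned char
joliet_to_local(const unsigned char pair[2], unsigned char *const ctable[256])
{
    unsigned char base = pair[0];
    unsigned char low = pair[1];
    unsigned char c;

    if (base == 0)
        return low;             /* base 0 is passed through unchanged */
    if (ctable[base] == NULL)
        return '?';             /* no table loaded for this base */
    c = ctable[base][low];
    return c != '\0' ? c : '?'; /* a zero entry means "no mapping" */
}

/* Hypothetical helper: a base-0x04 (Cyrillic) table with four KOI8-R letters. */
static void
fill_sample_cyrillic(unsigned char table[256])
{
    table[0x10] = 0xE1;         /* U+0410 CYRILLIC CAPITAL LETTER A */
    table[0x11] = 0xE2;         /* U+0411 CYRILLIC CAPITAL LETTER BE */
    table[0x30] = 0xC1;         /* U+0430 CYRILLIC SMALL LETTER A */
    table[0x31] = 0xC2;         /* U+0431 CYRILLIC SMALL LETTER BE */
}
```

Adding support for another range (for instance base 0x25) would then only
mean attaching one more 256-byte table to the corresponding slot.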

-Maxim






Re: [FreeBSD-tech-jp 2988] Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Noriyuki Soda

 On Wed, 27 Dec 2000 21:28:19 +0900,
Motomichi Matsuzaki [EMAIL PROTECTED] said:

msaki Any ideas?

There was a discussion about this issue on the [EMAIL PROTECTED]
mailing list with Subject: "Unicode support in kernel" and
"code set recoding engine, V2" in October and November, 1999.

Summary of my proposal is the following:
http://mail-index.netbsd.org/tech-kern/1999/10/15/0009.html
http://mail-index.netbsd.org/tech-kern/1999/11/23/0002.html
(the former one is somewhat difficult, though)
--
soda





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Maxim Sobolev

Motomichi Matsuzaki wrote:

 At Wed, 27 Dec 2000 12:05:57 +0200,
 Maxim Sobolev [EMAIL PROTECTED] wrote:
  Several days ago I got a CD with Russian filenames on it and discovered that
  I'm unable to read those filenames. After some hacking I produced a patch,

 Vladimir Kushnir's patch will be for it.

 
http://www.freebsd.org/cgi/getmsg.cgi?fetch=270425+0+/usr/local/www/db/text/2000/freebsd-hackers/20001203.freebsd-hackers

 and it is based on my patch:

 http://triaez.kaisei.org/~mzaki/joliet/

  which should solve this problem in a manner similar to what we have in
  the msdosfs module (i.e. a user-provided conversion table). I have to emphasize that
  it's a temporary solution until we have iconv support in the kernel.

 *PLEASE* be careful about filename I18N.

 1. Joliet extension

 The Joliet extension is built on a Unicode basis,
 and is a "multilingual" filesystem.
 We can find CDs which contain files named in all of
 English, French, Russian, Chinese, and Japanese.
 So charset conversion per mount is not sufficient.

You can specify multiple charset conversion tables for each mount point; the only
problem is creating appropriate conversion tables (I do not have any CDs with
anything other than English/Russian filenames :- ).

 3. Relation to userland applications

 Currently, conversion tables between Unicode and local charsets are
 widely needed and implemented, e.g. for the Joliet extension,
 the FAT filesystem, TrueType rasterizers, WWW browsers, and so on.
 We should share the tables as far as possible for consistency.
 So the ideal solution for code conversion is not an in-kernel table
 but a userland shared library.
 Therefore, filename code conversion should also be done in userland
 as far as possible.

 4. Rough idea of me

 My preliminary idea to the filesystem I18N:

 * filenames recorded on Unix filesystems (e.g. FFS, MFS) use
   an arbitrary codeset, for example Unicode.

 * interface between kernel and userland should use
   filesystem-safe encoding, for example UTF-8.

 * userland applications can convert from/to the user-requested
   charsets, such as latin-2, koi8, and euc-jp, using shared library.

 * the Joliet extension and UDF, which are based on Unicode, need
   no in-kernel conversion if Unix filesystems use Unicode.

 * the FAT filesystem, which uses both Unicode and conventional
   codepages, requires in-kernel conversion in order to
   write the conventional 8.3 names.

 Any ideas?
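
[Editorial illustration, not part of the original message.] The second point
of the quoted list, passing names between kernel and userland in a
filesystem-safe encoding such as UTF-8, can be sketched for Joliet names as
follows. This is only a sketch under stated assumptions: the function name is
hypothetical, the input is assumed to be big-endian UCS-2 as stored by Joliet,
and surrogate code units get no special handling.

```c
#include <stddef.h>

/*
 * Convert a big-endian UCS-2 name of `len` bytes to NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if `len` is odd or the output buffer is too small.
 */
static size_t
ucs2be_to_utf8(const unsigned char *in, size_t len,
    unsigned char *out, size_t outsize)
{
    size_t i, o = 0;

    if (len % 2 != 0)
        return (size_t)-1;
    for (i = 0; i < len; i += 2) {
        unsigned int cp = ((unsigned int)in[i] << 8) | in[i + 1];
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;

        if (o + need >= outsize)        /* keep room for the final NUL */
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    if (o >= outsize)                   /* covers len == 0 with outsize == 0 */
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

A name mixing Latin, Cyrillic and Japanese characters comes out as one byte
string with no per-mount table involved, which is the property the proposal
relies on.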

Thanks for pointing that out, but I think that your proposal is too generic to be
committed any time soon (not to mention MFC'ing it). Moreover, as I pointed out,
efforts to provide generic Unicode functionality in kernel/userland are currently
underway, so it is likely that part of your work will be duplicated/obsoleted.

What I'm proposing here is a quick'n'dirty (and accordingly limited) solution to
allow mounting CDs with Unicode filenames on them. This solution is meant to be
temporary until iconv-based kernel interfaces appear.

-Maxim






Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Motomichi Matsuzaki


At Wed, 27 Dec 2000 14:54:00 +0200,
Maxim Sobolev [EMAIL PROTECTED] wrote:
  The Joliet extension is built on a Unicode basis,
  and is a "multilingual" filesystem.
  We can find CDs which contain files named in all of
  English, French, Russian, Chinese, and Japanese.
  So charset conversion per mount is not sufficient.
 You can specify multiple charset conversion tables for each mount point; the only
 problem is creating appropriate conversion tables (I do not have any CDs with
 anything other than English/Russian filenames :- ).

Suppose a file whose name contains multilingual characters.

Think of Japanese researchers of Russian literature.
The Microsoft Word document files about their work may have
such complex filenames. And Joliet can handle them.

The multiple mount point solution is insufficient for these situations.

  4. Rough idea of me
  My preliminary idea to the filesystem I18N:
 Thanks for pointing that out, but I think that your proposal is too
 generic to be committed any time soon (not to mention MFC'ing it).

Yes, you're right. I have no more than a rough idea indeed.

 What I'm proposing here is a quick'n'dirty (and accordingly limited) solution to
 allow mounting CDs with Unicode filenames on them.
 This solution is meant to be temporary until iconv-based kernel interfaces appear.

But your solution is not effective and is quite harmful to multibyte users.
The "loading conversion tables on every mount point" idea is totally wrong.

-- 
Motomichi Matsuzaki [EMAIL PROTECTED] 
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan 






Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Maxim Sobolev

Motomichi Matsuzaki wrote:

 At Wed, 27 Dec 2000 14:54:00 +0200,
 Maxim Sobolev [EMAIL PROTECTED] wrote:
   The Joliet extension is built on a Unicode basis,
   and is a "multilingual" filesystem.
   We can find CDs which contain files named in all of
   English, French, Russian, Chinese, and Japanese.
   So charset conversion per mount is not sufficient.
  You can specify multiple charset conversion tables for each mount point; the only
  problem is creating appropriate conversion tables (I do not have any CDs with
  anything other than English/Russian filenames :- ).

 Suppose a file whose name contains multilingual characters.

 Think of Japanese researchers of Russian literature.
 The Microsoft Word document files about their work may have
 such complex filenames. And Joliet can handle them.

Yeah, but unfortunately our fs interface can't. :(

 The multiple mount point solution is insufficient for these situations.

   4. Rough idea of me
   My preliminary idea to the filesystem I18N:
  Thanks for pointing that out, but I think that your proposal is too
  generic to be committed any time soon (not to mention MFC'ing it).

 Yes, you're right. I have no more than a rough idea indeed.

  What I'm proposing here is a quick'n'dirty (and accordingly limited) solution to
  allow mounting CDs with Unicode filenames on them.
  This solution is meant to be temporary until iconv-based kernel interfaces appear.

 But your solution is not effective and is quite harmful to multibyte users.

You are not quite right. For multibyte users my solution (workaround?) is at least
equal to the previous no-Unicode case. I do not see how it can be harmful.

-Maxim






Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Motomichi Matsuzaki



At Wed, 27 Dec 2000 15:38:58 +0200,
Maxim Sobolev [EMAIL PROTECTED] wrote:
  But your solution is not effective and is quite harmful to multibyte users.
 You are not quite right. For multibyte users my solution (workaround?) is at least
 equal to the previous no-Unicode case. I do not see how it can be harmful.

1. With just your workaround, multibyte users gain nothing.

2. Following your direction, the size of the loadable conversion table
   will expand immensely for multibyte support, or be abandoned.
   A fundamental misdesign will lead to such an unfortunate situation.
   That is why I said your solution was harmful.

-- 
Motomichi Matsuzaki [EMAIL PROTECTED] 
Dept. of Biological Sciences, Grad. School of Science, Univ. of Tokyo, Japan 
  





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Maxim Sobolev

Motomichi Matsuzaki wrote:

 At Wed, 27 Dec 2000 15:38:58 +0200,
 Maxim Sobolev [EMAIL PROTECTED] wrote:
   But your solution is not effective and is quite harmful to multibyte users.
  You are not quite right. For multibyte users my solution (workaround?) is at least
  equal to the previous no-Unicode case. I do not see how it can be harmful.

 1. With just your workaround, multibyte users gain nothing.

 2. Following your direction, the size of the loadable conversion table
    will expand immensely for multibyte support, or be abandoned.
    A fundamental misdesign will lead to such an unfortunate situation.
    That is why I said your solution was harmful.

The patches I proposed are in no way an official direction of the Project and, as I
advertised, are merely a workaround to allow non-English users to read CDs with
native filenames until comprehensive iconv support for the kernel is introduced.
I would be glad if someone replaced my hack with a more generic solution.

-Maxim






Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Michael C . Wu

On Wed, Dec 27, 2000 at 12:05:57PM +0200, Maxim Sobolev scribbled:
| Several days ago I got a CD with Russian filenames on it and discovered that
| I'm unable to read those filenames. After some hacking I produced a patch,
| which should solve this problem in a manner similar to what we have in
| the msdosfs module (i.e. a user-provided conversion table). I have to emphasize that
| it's a temporary solution until we have iconv support in the kernel.
| 
| Please somebody review attached patches.

Please do not assume Unicode.  I18N/L10N efforts have been crying for
programmers to *not* and *never* assume *anything*.  :)

Also, this belongs on -i18n more than anywhere else.  I really
do not want to generate all the traffic on -i18n alone.. :)

Have you seen ports/chinese/big5fs?  Japanese/Korean do the same thing too.
You could simply make a port of this that loads KLD's.  This enables us 
to have the support without having hacks in src/sys.  I talked
to Boris at length about this.  And I think this would be the best way
to implement "temporary" hacks.

As to the progress of iconv, we should have it soon, as soon as
itojun and I work out how to import either the Citrus code
or Konstantin's code. 

If you make this a port, I say "go for it." :) If you wish to commit
to src/sys, I have strong doubts about this.  I hope you do not
mind my bluntness about this, as I really want to express the feelings.

Merry Christmas and a Happy New Year,
Michael
-- 
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Michael C . Wu

On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
| Motomichi Matsuzaki wrote:
| 
|  At Wed, 27 Dec 2000 15:38:58 +0200,
|  Maxim Sobolev [EMAIL PROTECTED] wrote:
|    But your solution is not effective and is quite harmful to multibyte users.
|   You are not quite right. For multibyte users my solution (workaround?) is at least
|   equal to the previous no-Unicode case. I do not see how it can be harmful.
| 
|  1. With just your workaround, multibyte users gain nothing.
| 
|  2. Following your direction, the size of the loadable conversion table
|     will expand immensely for multibyte support, or be abandoned.
|     A fundamental misdesign will lead to such an unfortunate situation.
|     That is why I said your solution was harmful.
| 
| The patches I proposed are in no way an official direction of the Project and, as I
| advertised, are merely a workaround to allow non-English users to read CDs with
| native filenames until comprehensive iconv support for the kernel is introduced.
| I would be glad if someone replaced my hack with a more generic solution.

I think that if this "hack" is made a russian/xxxfs port,
everyone can be happy. If you want a Unicode FS like this, perhaps you
can have a sysutils/unicodefs.  :) 

After all, what are KLD's for but modularity? :)


-- 
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Konstantin Chuguev

"Michael C . Wu" wrote:

 As to the progress of iconv, we should have it soon, as soon as
 itojun and I work out how to import either the Citrus code
 or Konstantin's code.


As I could see from the CVSed Citrus code, it's a locale library rather than iconv,
and it's just got the stub calls of iconv functions. So I thought Citrus and iconv
complement each other and don't do the same. Am I wrong?

As for the iconv library itself, its userland part is complete. I'm really busy
these days despite Christmas and the New Year, but I expect to release v2.1 in 3
weeks with the following changes:

   * the two patches from ports will be incorporated;
   * a few charsets added (to provide compatibility with the libiconv port and to
 be able to use glib-1.3 with iconv - the only port still depending on
 libiconv);
   * memory and file management functions moved into a separate file; then a
 kernel-side iconv implementation can compile iconv with its own specific
 memory and file management functions (from a different file).

I tried to write a kernel module, but I don't have enough knowledge of the kernel.
If anybody would like to do it, I am ready to help.

Regards,
Konstantin.

--
  * *Konstantin Chuguev - Application Engineer
   *  *  Francis House, 112 Hills Road
 *   Cambridge CB2 1PQ, United Kingdom
 D  A  N  T  E   WWW:http://www.dante.net




Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Michael C . Wu

On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
| "Michael C . Wu" wrote:
|  On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
|  | Motomichi Matsuzaki wrote:
|  I think that if this "hack" is made a russian/xxxfs port,
|  everyone can be happy. If you want a Unicode FS like this, perhaps you
|  can have a sysutils/unicodefs.  :)

| No, it's no way a ports- commit.

Thank you for the one sentence reply without giving any reasons.
As a community, I thought we are supposed to communicate.
I have expressed my reasons and ideas, and you return the favor
with a one-line comment.  :)

There is nothing wrong with having a port install a KLD; vmware does it.
And ports/emulators/linux_base depends on linux.ko. 

keichii@recursive:/usr/ports$ ls japanese/msdosfs/
Makefile files/   patches.5/   pkg-descr
distinfo patches.4/   pkg-comment  pkg-plist
keichii@recursive:/usr/ports$ ls chinese/big5fs/
Makefile distinfo files/   pkg-comment  pkg-descr    pkg-plist
keichii@recursive:/usr/ports$ cat chinese/big5fs/pkg-descr 
This port installs two kernel modules, cd9660.ko and
msdos.ko, which will let users read Big5 filenames on
Joliet and VFAT filesystems.

Why do you think we did this?  You have admitted that this is an
ugly hack.  Since when did we allow ugly hacks in the kernel when
we can avoid it? Japanese/Chinese chose to do this because we
do not want dirty stuff in the kernel.  We have had this FS
support since 2.2.x.  

Please understand that we do not want hacks, and we certainly do
not want to be reliant on unicode alone.  We should never ever
assume stuff in I18N, and you are doing exactly this.  Have you
any idea what this would cause?  Do you know how hard it is to remove
something after it works for a while?  We do NOT ever want to have to
do legacy support in the future, because *you* want unicode and 
you want it now.  

In addition, this would probably lead to laziness in the future about
doing this support properly.  In -SMP, they are avoiding hacks
that make things work temporarily.  Why should I18N do otherwise?

You can commit this to work with iconv() after iconv() gets in.
I would personally cheer you on.

I apologize for any harshness in the emails, it is quite hard
to express things without facial expressions.  I think we all
care about the project a lot.  Please be assured
that I mean no offense to you.
-- 
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Andrey A. Chernov

On Wed, Dec 27, 2000 at 13:44:11 +0200, Maxim Sobolev wrote:
 I'm not sure how I could obtain the charset for each of the dozen+ OSes that may create a CD.

There are not so many, usually only one Russian charset per OS :-)

 I don't see any problems, because it's likely that usual special high code table
 characters (copyright, angle quotes and so on) will be represented using Unicode
 charcodes with first byte (`base') equal to 0, so they can be mapped directly into
 native charset. In my implementation only Unicode characters with base !=0 are to be
 translated. All less usual characters (graphics and so on) can be translated by
 extending appropriate codetable to include additional translation tables with
 different bases (e.g. 0x25 for graphics chars etc.).

Well, I could live with it if you add the _whole_ windows-1251 set as
Unicode to your loadable table and provide the corresponding mapping for all
matching KOI8-R characters, as MSDOSFS currently does. You can get those
tables from MSDOSFS. This is the minimal basis; you can make a separate table
for KOI8-U, etc. I say this because I see that your patch tries to treat KOI8-R
and KOI8-U as the same charset, which is not acceptable. Also,
please name those tables after the local charset and not 'cd9660'.

-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Andrey A. Chernov

On Wed, Dec 27, 2000 at 12:48:12 -0600, Michael C . Wu wrote:
 On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
 | "Michael C . Wu" wrote:
 |  On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
 |  | Motomichi Matsuzaki wrote:
 |  I think that if this "hack" is made a russian/xxxfs port,
 |  everyone can be happy. If you want a Unicode FS like this, perhaps you
 |  can have a sysutils/unicodefs.  :)
 
 | No, it's no way a ports- commit.

I agree.

 Thank you for the one sentence reply without giving any reasons.
 As a community, I thought we are supposed to communicate.
 I have expressed my reasons and ideas, and you return the favor
 with a one-line comment.  :)

This is a general method of telling the kernel how to convert names from Unicode
to any local charset, not particularly to the Russian one. Anybody should feel free to
add their own single-byte charset tables.

Yes, it is a per-FS hack, but until iconv or something like it is
integrated, some hack is needed just to read CDs sold at a nearby shop.

 keichii@recursive:/usr/ports$ ls japanese/msdosfs/

Japanese etc. is _very_ different and needs tons of effort even when
implemented as a similar hack; what we are currently talking about is limited to
single-byte charsets only.

We have almost full single-byte charset localization in the
kernel/userland, so this step is the next logical extension in that direction.
I can't say anything like that about double-byte charsets like Japanese etc.

-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Michael C . Wu


-audit trimmed, cc'ed to -i18n

On Wed, Dec 27, 2000 at 10:02:01PM +0300, Andrey Chernov scribbled:
| On Wed, Dec 27, 2000 at 12:48:12 -0600, Michael C . Wu wrote:
|  On Wed, Dec 27, 2000 at 08:32:26PM +0200, Maxim Sobolev scribbled:
|  | "Michael C . Wu" wrote:
|  |  On Wed, Dec 27, 2000 at 05:57:19PM +0200, Maxim Sobolev scribbled:
|  |  | Motomichi Matsuzaki wrote:
|  |  I think that if this "hack" is made a russian/xxxfs port,
|  |  everyone can be happy. If you want a Unicode FS like this, perhaps you
|  |  can have a sysutils/unicodefs.  :)
|  
|  | No, it's no way a ports- commit.
| 
| I agree.

I disagree.

|  Thank you for the one sentence reply without giving any reasons.
|  As a community, I thought we are supposed to communicate.
|  I have expressed my reasons and ideas, and you return the favor
|  with an one line comment.  :)
| 
| This is general method of telling kernel how to convert names from Unicode
| to any local charset, not particulary to Russian one. Anybody feel free to
| add their own single bytes charsets tables.

READ: "single byte"  
You are breaking CJK multibyte support.  Why? Why do you want make 
software engineering mistakes?  
 
| Yes, it is a per-FS hack, but until iconv or something like it is
| integrated, some hack is needed just to read CDs sold at a nearby shop.
| 
|  keichii@recursive:/usr/ports$ ls japanese/msdosfs/
| 
| Japanese etc. is _very_ different and needs tons of effort even when
| implemented as a similar hack; what we are discussing now is limited to
| single-byte charsets only.

Why should you ignore the multibyte stuff because it is harder?
SMPng is hard, but no one is ignoring SMP because SMP is hard to implement.
Since device drivers for cheap hardware are hard to write/fix, should
we also ignore those?  Your logic fails to evaluate to TRUE.

| We have almost full single-byte charset localization in the
| kernel/userland, so this step is the next logical extension in that direction.
| I can't say the same about double-byte charsets like Japanese etc.

Right, and as much as you want single-byte stuff to work, CJK is also
a large market for FreeBSD.  Where else in the world but Japan is there a
full-fledged, 200-page monthly print magazine for BSD?
(Sorry, but DaemonNews is not 200 pages :)

The proposal breaks multibyte support, and I do not believe that is acceptable.
You want Russian to work, and we CJK people want CJK to work.
If you were looking at this from my point of view, would you say yes?
Probably not.

The frustration with the C charset is enough; can we please not make the
mistake that the PDP-based BSDs made long ago?

-- 
+--+
| [EMAIL PROTECTED] | [EMAIL PROTECTED] |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+--+





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Andrey A. Chernov

On Wed, Dec 27, 2000 at 13:35:52 -0600, Michael C . Wu wrote:

 READ: "single byte"  
 You are breaking CJK multibyte support.  Why? Why do you want to make
 software engineering mistakes?

1) Nobody can break something which does not exist yet.
2) The stuff discussed is optional, and if you do not use it you get the old
functionality.

 Why should you ignore the multibyte stuff because it is harder?

Not for this reason, but because it must be implemented at a higher
abstraction level first to be considered. Since I do not have much interest
in multibyte characters, it is in general not my task, but a task for others.

 The proposal breaks multibyte support and I do not believe it is acceptable

Again, I see no way it can break nonexistent support. Remember that the
conversion table is optional, and without loading it you get exactly the
previous functionality.

Very similar stuff has been in MSDOSFS for years and nobody has complained.
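
To make the "optional table" point concrete, here is a minimal sketch in C.
It is not the code from the patch under review. The names u2l_entry,
koi8r_sample, u2l_convert and u2l_name are hypothetical, the table holds only
four KOI8-R entries for illustration, the input is assumed to be big-endian
UCS-2, and the '?' fallback for unmapped characters is an assumption about
the old behaviour rather than a quote of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of a user-provided Unicode -> local charset table. */
struct u2l_entry {
	uint16_t unicode;	/* UCS-2 code point */
	uint8_t  local;		/* byte in the local single-byte charset */
};

/* Four KOI8-R mappings, for illustration only (a real table is larger). */
const struct u2l_entry koi8r_sample[] = {
	{ 0x0410, 0xE1 },	/* CYRILLIC CAPITAL LETTER A */
	{ 0x0430, 0xC1 },	/* CYRILLIC SMALL LETTER A */
	{ 0x0431, 0xC2 },	/* CYRILLIC SMALL LETTER BE */
	{ 0x0432, 0xD7 },	/* CYRILLIC SMALL LETTER VE */
};

/*
 * Convert one UCS-2 character to a local byte.  ASCII passes through.
 * If no table is loaded (tbl == NULL) or the character is not in the
 * table, the result is '?', i.e. the table only ever adds mappings.
 */
uint8_t
u2l_convert(const struct u2l_entry *tbl, size_t n, uint16_t uc)
{
	size_t i;

	if (uc < 0x80)
		return ((uint8_t)uc);
	if (tbl != NULL) {
		for (i = 0; i < n; i++)
			if (tbl[i].unicode == uc)
				return (tbl[i].local);
	}
	return ('?');
}

/*
 * Convert a big-endian UCS-2 name of srclen bytes into a NUL-terminated
 * local string of at most dstlen bytes.  Returns the number of characters
 * written, not counting the terminating NUL.
 */
size_t
u2l_name(const struct u2l_entry *tbl, size_t n,
    const uint8_t *src, size_t srclen, char *dst, size_t dstlen)
{
	size_t i, o = 0;

	if (dstlen == 0)
		return (0);
	for (i = 0; i + 1 < srclen && o + 1 < dstlen; i += 2) {
		uint16_t uc = (uint16_t)((src[i] << 8) | src[i + 1]);

		dst[o++] = (char)u2l_convert(tbl, n, uc);
	}
	dst[o] = '\0';
	return (o);
}
```

Calling u2l_name() with a NULL table goes through the same code path and
only yields ASCII plus '?', which is the sense in which loading a table is
optional. A 256-slot target like this obviously cannot hold a multibyte
charset; that would need a different interface.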

-- 
Andrey A. Chernov
http://ache.pp.ru/





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Mark Murray

Aimed at the thread, not the participants.

This is off-topic for audit-. Please bring it back to audit- when you have
some actual code to audit.

Thanks!

M

 On Wed, Dec 27, 2000 at 12:05:57PM +0200, Maxim Sobolev scribbled:
 | Several days ago I got a CD with Russian filenames on it and discovered that
 | I'm unable to read those filenames. After some hacking I produced a patch,
 | which should solve this problem in the manner similar to what we have in
 | msdosfs module (i.e. user-provided conversion table). I have to emphasize that
 | it's a temporary solution until we will have iconv support in kernel.
 | 
 | Please somebody review attached patches.
 
 Please do not assume Unicode.  I18N/L10N efforts have been crying for
 programmers to *not* and *never* assume *anything*.  :)
 
 Also, this belongs on -i18n more than anywhere else.  I really
 do not want to generate all the traffic on -i18n alone.. :)
 
 Have you seen ports/chinese/big5fs?  Japanese/Korean do the same thing too.
 You could simply make a port of this that loads KLDs.  This enables us 
 to have the support without having hacks in src/sys.  I talked
 to Boris at length about this.  And I think this would be the best way
 to implement "temporary" hacks.
 
 As to the progress of iconv, we should have it soon, as soon as
 itojun and I work out how to import either the Citrus code
 or Konstantin's code. 
 
 If you make this a port, I say "go for it." :) If you wish to commit
 to src/sys, I have strong doubts about this.  I hope you do not
 mind my bluntness about this, as I really want to express the feelings.
 
 Merry Christmas and a Happy New Year,
 Michael
 -- 
 +--+
 | [EMAIL PROTECTED] | [EMAIL PROTECTED] |
 | http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
 +--+
 
 
 
--
Mark Murray
Warning: this .sig is umop ap!sdn





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Warner Losh

[[ sorry for the minorly offtopic post ]]

In message [EMAIL PROTECTED] "Michael
C. Wu" writes:
: Right, and as much as you want single-byte stuff to work, CJK is also
: a large market for FreeBSD.  Where else in the world but Japan is there a
: full-fledged, 200-page monthly print magazine for BSD?
: (Sorry, but DaemonNews is not 200 pages :)

I think that the BSD magazine is published more like quarterly.  There
have only been 6 issues since October 1999 of BSD Magazine (issue 6
just arrived the other day).  There have been two issues of the
FreeBSD magazine in Japan since October 2000.  The second issue of
FreeBSD mag showed up last week.

However, your point about how big FreeBSD is in Japan is nonetheless
quite valid.

Warner





Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Boris Popov

On Wed, 27 Dec 2000, Maxim Sobolev wrote:

 Several days ago I got a CD with Russian filenames on it and discovered that
 I'm unable to read those filenames. After some hacking I produced a patch,
 which should solve this problem in the manner similar to what we have in
 msdosfs module (i.e. user-provided conversion table). I have to emphasize that
 it's a temporary solution until we will have iconv support in kernel.

The patch seems to be OK as a temporary solution for CDs with
Russian file names. And as a temporary solution it is well suited to the
ports collection, not to the main tree.

In the near future we'll have an iconv interface in the kernel which
uses the libiconv library written by Konstantin Chuguev. I'm really sorry for
the delays, but my current job leaves me nearly zero spare time, and there
is hope that January will be less busy.

--
Boris Popov
http://www.butya.kz/~bp/






Re: Unicode support in cd9660 [patch for review]

2000-12-27 Thread Jeroen Ruigrok van der Werven

-On [20001227 20:05], Andrej Cernov ([EMAIL PROTECTED]) wrote:
Yes, it is a per-FS hack, but until iconv or something like it is
integrated, some hack is needed just to read CDs sold at a nearby shop.

As far as I know, Boris Popov was working on iconv() support.
I see he is cc:'d, so I happily await his ideas/status update.

Furthermore, even though I only use special characters for Dutch,
German, Norwegian, Swedish, Danish, Finnish and Icelandic, I feel that
_if_ we start to add Unicode support we should do it right from the beginning.
Unicode was meant to solve the native character set problem for all
languages [as far as my knowledge stretches], and its support should not be
a working but ugly hack which only allows per-language solutions.

If you want people to be able to mount their CDs with Russian
characters, by all means provide the patch on your website.

I am by no means an l10n or i18n hacker or wizard; otherwise I
would have dedicated my ample time to working on this and solving it once
and for all with all other interested parties.

In short: please do it right from the start.

Thanks,

-- 
Jeroen Ruigrok van der Werven  VIA Net.Works The Netherlands
BSD: Technical excellence at its best  Network- and systemadministrator
  D78D D0AD 244D 1D12 C9CA  7152 035C 1138 546A B867
I'm breaking you down...  I'm taking you down...

