Re: [RFC 0/2] Case-insensitive filename lookup for XFS

2007-10-24 Thread Barry Naujok
Hi Anton,

On Tue, 23 Oct 2007 20:07:47 +1000, Anton Altaparmakov <[EMAIL PROTECTED]> 
wrote:

> I forgot to say:  If you do what I did for NTFS you can also throw
> away your custom dentry operations that your patch adds as the dcache
> then only holds correctly cased names so you are fine to do case
> sensitive dcache lookups at all times.  Access via wrongly cased name
> will always go to ->lookup inode operation and that is fine because
> such lookups almost never happen because majority of users will either
> use a GUI in which case all names are always correctly cased as the
> names displayed in the GUI are obtained from a ->readdir and thus show
> the correct case or they will use the command line in which case they
> will be savvy enough to use tab-completion in which case the names are
> correct case, too.  Tab-completion does not work on wrongly cased
> names so you are very unlikely to ever get a wrongly cased name at all.
>
> And yes of course you can on purpose construct a test / benchmark
> where having to do the ->lookup each time will be really slow because
> you keep creating files and then accessing them by wrongly cased name
> on purpose (or whatever) but I would hope that you do not care about
> such artificial benchmarks that do not reflect any real-world loads...

I have been looking at ntfs_lookup() and seeing how it does its stuff.
It seems that is the best way to go.

One thing I have noticed is that with two or more attempted
case-insensitive lookups of a name that does not exist, where the
names differ only in case (i.e. ntfs_lookup_inode_by_name() fails
with -ENOENT), d_add(dent, NULL) is called for each one, populating
the dcache with effectively duplicate negative dentries.

Eg:
  # cat /mnt/foo/fileNOTexist
  # cat /mnt/foo/FILEnotEXIST

Will have two negative dentries, am I correct?
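
The question above can be pictured with a toy userspace model (not kernel code). It is only a sketch, assuming a cache keyed on exact byte strings the way a case-sensitive dcache is; the toy_cache type and both helpers are hypothetical names made up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Toy model of a case-sensitive name cache. An entry with found == 0
 * plays the role of a negative dentry. All names are hypothetical. */
struct toy_entry { char name[64]; int found; };
struct toy_cache { struct toy_entry e[MAX_ENTRIES]; int n; };

/* Exact (case-sensitive) match on the cached spelling. */
static struct toy_entry *cache_find(struct toy_cache *c, const char *name)
{
    for (int i = 0; i < c->n; i++)
        if (strcmp(c->e[i].name, name) == 0)
            return &c->e[i];
    return NULL;
}

/* Look up a name that does not exist on disk: on a cache miss,
 * record a negative entry under the spelling that was asked for. */
static void lookup_missing(struct toy_cache *c, const char *name)
{
    if (cache_find(c, name) || c->n == MAX_ENTRIES)
        return;
    snprintf(c->e[c->n].name, sizeof(c->e[c->n].name), "%s", name);
    c->e[c->n].found = 0;
    c->n++;
}
```

Calling lookup_missing() with "fileNOTexist" and then "FILEnotEXIST" leaves two entries in this model, one per spelling; repeating either spelling adds nothing further.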

Regards,
Barry.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] Case-insensitive XFS - kernel patch

2007-10-23 Thread Barry Naujok
On Wed, 24 Oct 2007 06:19:12 +1000, Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> This patch is quite badly mangled by your mailer.  Could you just
> attach it?  (Or even better use a mailer that handles inlined text
> without mangling it..)

Found the setting at last. Here it is again...  


===
fs/xfs/Makefile
===

--- a/fs/xfs/Makefile   2007-10-23 17:19:40.0 +1000
+++ b/fs/xfs/Makefile   2007-10-23 16:17:22.173903950 +1000
@@ -74,6 +74,7 @@ xfs-y += xfs_alloc.o \
   xfs_trans_extfree.o \
   xfs_trans_inode.o \
   xfs_trans_item.o \
+  xfs_unicode.o \
   xfs_utils.o \
   xfs_vfsops.o \
   xfs_vnodeops.o \

===
fs/xfs/linux-2.6/xfs_iops.c
===

--- a/fs/xfs/linux-2.6/xfs_iops.c   2007-10-23 17:19:41.0 +1000
+++ b/fs/xfs/linux-2.6/xfs_iops.c   2007-10-23 16:43:19.828562924 +1000
@@ -47,12 +47,17 @@
  #include "xfs_buf_item.h"
  #include "xfs_utils.h"
  #include "xfs_vnodeops.h"
+#include "xfs_da_btree.h"
+#include "xfs_unicode.h"

  #include 
  #include 
  #include 
  #include 

+struct dentry_operations xfs_dentry_operations;
+struct dentry_operations xfs_nls_dentry_operations;
+
  /*
   * Bring the atime in the XFS inode uptodate.
   * Used before logging the inode to disk or when the Linux inode goes away.
@@ -369,10 +374,17 @@ xfs_vn_lookup(
  {
bhv_vnode_t *cvp;
int error;
+   struct xfs_mount *mp = XFS_I(dir)->i_mount;
+   struct dentry   *result;

if (dentry->d_name.len >= MAXNAMELEN)
return ERR_PTR(-ENAMETOOLONG);

+   if (xfs_sb_version_hasunicode(&mp->m_sb) ||
+   xfs_sb_version_hasoldci(&mp->m_sb))
+   dentry->d_op = mp->m_nls ? &xfs_nls_dentry_operations :
+   &xfs_dentry_operations;
+
error = xfs_lookup(XFS_I(dir), dentry, &cvp);
if (unlikely(error)) {
if (unlikely(error != ENOENT))
@@ -381,7 +393,11 @@ xfs_vn_lookup(
return NULL;
}

-   return d_splice_alias(vn_to_inode(cvp), dentry);
+   result = d_splice_alias(vn_to_inode(cvp), dentry);
+   if (result)
+   result->d_op = dentry->d_op;
+
+   return result;
  }

  STATIC int
@@ -823,3 +839,74 @@ const struct inode_operations xfs_symlin
.listxattr  = xfs_vn_listxattr,
.removexattr= xfs_vn_removexattr,
  };
+
+
+STATIC int
+xfs_dentry_hash(
+   struct dentry   *dir,
+   struct qstr *this)
+{
+   this->hash = xfs_dir_hashname(XFS_I(dir->d_inode),
+   this->name, this->len);
+   return 0;
+}
+
+STATIC int
+xfs_dentry_compare(
+   struct dentry   *dir,
+   struct qstr *a,
+   struct qstr *b)
+{
+   int result = xfs_dir_compname(XFS_I(dir->d_inode), a->name, a->len,
+   b->name, b->len);
+   if (result == 0) {
+   if (a->len == b->len)
+   memcpy((unsigned char *)a->name, b->name, a->len);
+   else {
+   /* TODO: more complicated when name lengths differ */
+   }
+   }
+   return result;
+}
+
+STATIC int
+xfs_nls_dentry_hash(
+   struct dentry   *dir,
+   struct qstr *this)
+{
+   xfs_mount_t *mp = XFS_I(dir->d_inode)->i_mount;
+
+   this->hash = xfs_nls_hash(mp->m_nls, mp->m_cft, this->name, this->len);
+   return 0;
+}
+
+STATIC int
+xfs_nls_dentry_compare(
+   struct dentry   *dir,
+   struct qstr *a,
+   struct qstr *b)
+{
+   xfs_mount_t *mp = XFS_I(dir->d_inode)->i_mount;
+   int result = xfs_nls_casecmp(mp->m_nls, mp->m_cft,
+   a->name, a->len, b->name, b->len);
+   if (result == 0) {
+   if (a->len == b->len)
+   memcpy((unsigned char *)a->name, b->name, a->len);
+   else {
+   /* TODO: more complicated when name lengths differ */
+   }
+   }
+   return result;
+}
+
+struct dentry_operations xfs_dentry_operations =
+{
+   .d_hash = xfs_dentry_hash,
+   .d_compare = xfs_dentry_compare,
+};
+
+struct dentry_operations xfs_nls_dentry_operations =
+{
+   .d_hash = xfs_nls_dentry_hash,
+   .d_compare = xfs_nls_dentry_compare,
+};

===
fs/xfs/linux-2.6/xfs_linux.h

[RFC 1/2] Case-insensitive XFS - kernel patch

2007-10-23 Thread Barry Naujok


 Makefile  |1
 linux-2.6/xfs_iops.c  |   89 
 linux-2.6/xfs_linux.h |2
 linux-2.6/xfs_super.c |8
 xfs_clnt.h|5
 xfs_da_btree.c|   20 +
 xfs_da_btree.h|   21 +
 xfs_dir2.c|  177 +---
 xfs_dir2.h|9
 xfs_dir2_block.c  |   30 +-
 xfs_dir2_data.c   |3
 xfs_dir2_leaf.c   |   19 +
 xfs_dir2_node.c   |5
 xfs_dir2_sf.c |   35 ++-
 xfs_mount.c   |   25 ++
 xfs_mount.h   |8
 xfs_sb.h  |   33 ++-
 xfs_unicode.c |  547 ++
 xfs_unicode.h |   75 ++
 xfs_vfsops.c  |   53 
 20 files changed, 1100 insertions(+), 65 deletions(-)


[RFC 2/2] Case-insensitive XFS - mkfs.xfs

2007-10-23 Thread Barry Naujok


 include/xfs_sb.h |   27 +-
 libxfs/xfs_mount.c   |2
 mkfs/Makefile|2
 mkfs/casefoldtable.c |  608 +++
 mkfs/proto.c |  158 +
 mkfs/xfs_mkfs.c  |   93 ---
 mkfs/xfs_mkfs.h  |   24 +-
 7 files changed, 866 insertions(+), 48 deletions(-)

===
xfsprogs/include/xfs_sb.h
===

--- a/xfsprogs/include/xfs_sb.h 2007-10-23 17:14:16.0 +1000
+++ b/xfsprogs/include/xfs_sb.h 2007-10-23 16:56:07.765557256 +1000
@@ -46,10 +46,12 @@ struct xfs_mount;
 #define XFS_SB_VERSION_SECTORBIT   0x0800
 #define XFS_SB_VERSION_EXTFLGBIT   0x1000
 #define XFS_SB_VERSION_DIRV2BIT    0x2000
+#define XFS_SB_VERSION_OLDCIBIT    0x4000
 #define XFS_SB_VERSION_MOREBITSBIT 0x8000
 #define XFS_SB_VERSION_OKSASHFBITS \
        (XFS_SB_VERSION_EXTFLGBIT | \
-        XFS_SB_VERSION_DIRV2BIT)
+        XFS_SB_VERSION_DIRV2BIT | \
+        XFS_SB_VERSION_OLDCIBIT)
 #define XFS_SB_VERSION_OKREALFBITS \
(XFS_SB_VERSION_ATTRBIT | \
 XFS_SB_VERSION_NLINKBIT | \
@@ -82,13 +84,12 @@ struct xfs_mount;
 #define XFS_SB_VERSION2_DONOTUSEBIT2   0x0004
 #define XFS_SB_VERSION2_ATTR2BIT   0x0008  /* Inline attr rework */
 #define XFS_SB_VERSION2_PARENTBIT  0x0010  /* Parent pointers */
-#define XFS_SB_VERSION2_SASHFBITS  0xff00  /* Mask: features that
-  require changing
-  PROM and SASH */
+#define XFS_SB_VERSION2_UNICODEBIT 0x0020  /* Unicode names */

 #define XFS_SB_VERSION2_OKREALFBITS \
-       (XFS_SB_VERSION2_ATTR2BIT | \
-        XFS_SB_VERSION2_LAZYSBCOUNTBIT)
+       (XFS_SB_VERSION2_LAZYSBCOUNTBIT | \
+        XFS_SB_VERSION2_ATTR2BIT | \
+        XFS_SB_VERSION2_UNICODEBIT)
 #define XFS_SB_VERSION2_OKSASHFBITS \
        (0)
 #define XFS_SB_VERSION2_OKREALBITS \
@@ -151,6 +152,8 @@ typedef struct xfs_sb
__uint16_t  sb_logsectsize; /* sector size for the log, bytes */
__uint32_t  sb_logsunit;/* stripe unit size for the log */
__uint32_t  sb_features2;   /* additional feature bits */
+   __uint32_t  sb_bad_features2; /* unusable space */
+   xfs_ino_t   sb_cftino;  /* unicode case folding table inode */
 } xfs_sb_t;

 /*
@@ -169,7 +172,7 @@ typedef enum {
XFS_SBS_GQUOTINO, XFS_SBS_QFLAGS, XFS_SBS_FLAGS, XFS_SBS_SHARED_VN,
XFS_SBS_INOALIGNMT, XFS_SBS_UNIT, XFS_SBS_WIDTH, XFS_SBS_DIRBLKLOG,
XFS_SBS_LOGSECTLOG, XFS_SBS_LOGSECTSIZE, XFS_SBS_LOGSUNIT,
-   XFS_SBS_FEATURES2,
+   XFS_SBS_FEATURES2, XFS_SBS_BAD_FEATURES2, XFS_SBS_CFTINO,
XFS_SBS_FIELDCOUNT
 } xfs_sb_field_t;

@@ -194,13 +197,15 @@ typedef enum {
 #define XFS_SB_IFREE       XFS_SB_MVAL(IFREE)
 #define XFS_SB_FDBLOCKS    XFS_SB_MVAL(FDBLOCKS)
 #define XFS_SB_FEATURES2   XFS_SB_MVAL(FEATURES2)
+#define XFS_SB_CFTINO      XFS_SB_MVAL(CFTINO)
 #define XFS_SB_NUM_BITS    ((int)XFS_SBS_FIELDCOUNT)
 #define XFS_SB_ALL_BITS    ((1LL << XFS_SB_NUM_BITS) - 1)
 #define XFS_SB_MOD_BITS    \
(XFS_SB_UUID | XFS_SB_ROOTINO | XFS_SB_RBMINO | XFS_SB_RSUMINO | \
 XFS_SB_VERSIONNUM | XFS_SB_UQUOTINO | XFS_SB_GQUOTINO | \
 XFS_SB_QFLAGS | XFS_SB_SHARED_VN | XFS_SB_UNIT | XFS_SB_WIDTH | \
-XFS_SB_ICOUNT | XFS_SB_IFREE | XFS_SB_FDBLOCKS | XFS_SB_FEATURES2)
+XFS_SB_ICOUNT | XFS_SB_IFREE | XFS_SB_FDBLOCKS | XFS_SB_FEATURES2 | \
+XFS_SB_CFTINO)


 /*
@@ -455,6 +460,12 @@ static inline void xfs_sb_version_addatt
((sbp)->sb_features2 | XFS_SB_VERSION2_ATTR2BIT)));
 }

+static inline int xfs_sb_version_hasunicode(xfs_sb_t *sbp)
+{
+   return (xfs_sb_version_hasmorebits(sbp) &&  \
+   ((sbp)->sb_features2 & XFS_SB_VERSION2_UNICODEBIT));
+}
+
 /*
  * end of superblock version macros
  */

===
xfsprogs/libxfs/xfs_mount.c
===

--- a/xfsprogs/libxfs/xfs_mount.c   2007-10-23 17:14:16.0 +1000
+++ b/xfsprogs/libxfs/xfs_mount.c   2007-10-23 16:52:26.438099100 +1000
@@ -140,6 +140,8 @@ static struct {
 { offsetof(xfs_sb_t, sb_logsectsize),0 },
 { offsetof(xfs_sb_t, sb_logsunit),  0 },
 { offsetof(xfs_sb_t, sb_features2), 0 },
+{ offsetof(xfs_sb_t, sb_bad_features2), 0 },
+{ offsetof(xfs_sb_t, sb_cftino),0 },
 { sizeof(xfs_sb_t), 0 }
 };


===
xfsprogs/mkfs/Makefile
===

[RFC 0/2] Case-insensitive filename lookup for XFS

2007-10-23 Thread Barry Naujok


Following is the initial test version of case-insensitive support
for XFS in Linux. It implements case-insensitivity utilising a
Unicode case folding table stored on disk generated from
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt

As the filesystem stores names as Unicode (UTF-8), the "nls"
mount option has been added to support systems not utilising
UTF-8 natively. If the nls mount option is not used, it will
use the default NLS defined in the kernel's config.

To allow case-insensitivity to be a mount option rather than
a mkfs option, the hashes stored on disk are always case-folded.
This is indicated by the new "unicode" bit in the superblock.
This bit is also associated with the presence of the case-folding
table on disk.
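
As a rough illustration of why always folding before hashing lets case-insensitivity be chosen at mount time, here is a minimal sketch. It assumes an ASCII-only fold standing in for the on-disk Unicode table, and a rotate-and-xor hash picked purely for illustration; neither fold_ascii() nor ci_hashname() is a function from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* ASCII-only fold, standing in for the on-disk folding table
 * (assumption: the real table covers Unicode, this does not). */
static unsigned char fold_ascii(unsigned char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (unsigned char)(ch - 'A' + 'a') : ch;
}

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* Illustrative hash over the folded bytes: every spelling that
 * folds to the same bytes produces the same hash value. */
static uint32_t ci_hashname(const unsigned char *name, size_t len)
{
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++)
        hash = rol32(hash, 7) ^ fold_ascii(name[i]);
    return hash;
}
```

With this shape, "README" and "readme" land on the same hash, so a case-sensitive mount can still find an exact name through the folded hash and then compare bytes exactly, while a case-insensitive mount compares folded names.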

With the case-folding table on disk, it allows us to upgrade
the table in the future while retaining backwards and forwards
compatibility. It also allows special case tables such as
Turkic case which is supported in this patch set.

The case-insensitive support also installs a couple of
dentry_operations for the XFS inodes: hash and compare.

Currently, there are a couple of outstanding issues with the
dentry cache interaction:

  - The first lookup, if case-mismatched, will continue to
    have the mismatched case in the cache. Not really sure
    if this is an issue or not. If it is an issue, how
    should I resolve it?

  - As above, but with a lookup of a non-existent name followed
    by creating the file with a different case: the first failed
    lookup will define the case used. I have partially resolved
    this with a memcpy if the two lengths are the same.
    How do I fix this if the lengths are different?
    (TODOs show the location of this problem.)
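
To make the differing-length case concrete, here is a small userspace sketch, not the patch's code. It assumes a fold that handles ASCII plus U+212A KELVIN SIGN (UTF-8: E2 84 AA), which CaseFolding.txt folds to 'k'; fold_step(), ci_compare() and adopt_case() are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Fold one character at p (n bytes remain) and report how many bytes
 * it used. Handles ASCII and U+212A KELVIN SIGN only; any other byte
 * is passed through unchanged, one byte at a time. */
static unsigned char fold_step(const unsigned char *p, size_t n, size_t *used)
{
    if (n >= 3 && p[0] == 0xE2 && p[1] == 0x84 && p[2] == 0xAA) {
        *used = 3;
        return 'k';
    }
    *used = 1;
    return (p[0] >= 'A' && p[0] <= 'Z') ? (unsigned char)(p[0] - 'A' + 'a') : p[0];
}

/* Returns 0 when the two names match after folding, 1 otherwise. */
static int ci_compare(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    size_t i = 0, j = 0, ua, ub;

    while (i < alen && j < blen) {
        if (fold_step(a + i, alen - i, &ua) != fold_step(b + j, blen - j, &ub))
            return 1;
        i += ua;
        j += ub;
    }
    return !(i == alen && j == blen);
}

/* On a match, overwrite a's spelling with b's, but only when the byte
 * lengths are equal so the copy fits a's buffer. Returns 1 if copied. */
static int adopt_case(unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    if (ci_compare(a, alen, b, blen) != 0 || alen != blen)
        return 0;
    memcpy(a, b, alen);
    return 1;
}
```

Here "readme" can adopt the spelling "ReadMe" in place, but the one-byte name "k" matches the three-byte Kelvin sign and cannot take its spelling without a larger buffer, which is the situation the TODOs mark.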

Other TODOs:

  - Support for case-insensitive extended attributes
    as a separate mount option.

  - Other xfsprogs updates: xfs_repair, xfs_db



Re: RFC: Case-insensitive support for XFS

2007-10-09 Thread Barry Naujok
On Mon, 08 Oct 2007 15:44:48 +1000, Nicholas Miell <[EMAIL PROTECTED]>  
wrote:



> On Mon, 2007-10-08 at 15:07 +1000, Barry Naujok wrote:
>> On Sat, 06 Oct 2007 04:52:18 +1000, Nicholas Miell <[EMAIL PROTECTED]>
>> wrote:
>>
>>> On Fri, 2007-10-05 at 16:44 +0100, Christoph Hellwig wrote:
>>>> [Adding -fsdevel because some of the things touched here might be of
>>>>  broader interest and Urban because his name is on nls_utf8.c]
>>>>
>>>> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>>>>>
>>>>> On its own, Linux only provides case conversion for old-style
>>>>> character sets - 8 bit sequences only. A lot of distros are
>>>>> now defaulting to UTF-8 and Linux NLS stuff does not support
>>>>> case conversion for any Unicode sets.
>>>>
>>>> The lack of case tables in nls_utf8.c definitely seems odd to me.
>>>> Urban, is there a reason for that?  The only thing that comes to
>>>> mind is that these tables might be quite large.
>>>>
>>>
>>> Case conversion in Unicode is locale dependent. The legacy 8-bit
>>> character encodings don't code for enough characters to run into the
>>> ambiguities, so they can get away with fixed case conversion tables.
>>> Unicode can't.
>>
>> Based on http://www.unicode.org/reports/tr21/tr21-5.html and
>> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>>
>> Doing case comparison using that table should cater for most
>> circumstances except a few exceptions. It should be enough
>> to satisfy a locale independent case-insensitive filesystem
>> (i.e. the C + F case folding option).
>>
>> Is normalization required after case-folding? What I read
>> implies it is not necessary for this purpose (and would
>> slow things down and bloat the code more).
>>
>> Now I suppose, it's just a question of a fixed table in the
>> kernel driver (HFS+ style), or data stored in a special
>> inode on-disk (NTFS style, shared refcounted in memory
>> when the same). With the on-disk option, the table can be
>> generated by mkfs.xfs.
>
> You also have to decide whether to screw over people who speak Turkic
> languages and expect an 'I' to 'ı' mapping or everybody else who expects
> an 'I' to 'i' mapping.


I have had a thought about this. If the case table is stored on-disk like
NTFS, then mkfs.xfs can specify whether to use Turkic I's or not.

That guarantees consistent case folding for the filesystem. mkfs.xfs can
default to a Turkic case table if the user's locale is tr/az and the
"default case table" if not. mkfs.xfs will have to highlight this setting
if the user specifies the generic case-insensitive option. mkfs.xfs
should also allow the user to specify which of the case tables to use.
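
A minimal sketch of that mkfs-time choice, assuming the locale is available as a POSIX-style string such as "tr_TR.UTF-8". The cft_kind enum and both functions are hypothetical names, not part of the posted mkfs.xfs patch; the Turkic mappings used (U+0049 to U+0131, U+0130 to U+0069) are the 'T' entries of CaseFolding.txt.

```c
#include <stdint.h>
#include <string.h>

/* Which case-folding table to write to disk (hypothetical enum). */
enum cft_kind { CFT_DEFAULT, CFT_TURKIC };

/* Pick the Turkic table when the locale's language is "tr" or "az",
 * i.e. those two letters followed by end of string, '_', '.' or '@'. */
static enum cft_kind pick_table(const char *locale)
{
    if (locale && (strncmp(locale, "tr", 2) == 0 ||
                   strncmp(locale, "az", 2) == 0)) {
        char next = locale[2];

        if (next == '\0' || next == '_' || next == '.' || next == '@')
            return CFT_TURKIC;
    }
    return CFT_DEFAULT;
}

/* Fold one code point. Only ASCII and the two Turkic special cases
 * are handled; every other code point is returned unchanged. */
static uint32_t fold_cp(enum cft_kind kind, uint32_t cp)
{
    if (kind == CFT_TURKIC) {
        if (cp == 0x0049)       /* 'I' -> dotless 'ı' */
            return 0x0131;
        if (cp == 0x0130)       /* 'İ' -> 'i' */
            return 0x0069;
    }
    if (cp >= 'A' && cp <= 'Z')
        return cp - 'A' + 'a';
    return cp;
}
```

Because the choice is made once and stored with the filesystem, the same name folds the same way no matter which locale later mounts it.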

Barry.


Re: RFC: Case-insensitive support for XFS

2007-10-08 Thread Barry Naujok
On Mon, 08 Oct 2007 15:44:48 +1000, Nicholas Miell <[EMAIL PROTECTED]>  
wrote:



> You also have to decide whether to screw over people who speak Turkic
> languages and expect an 'I' to 'ı' mapping or everybody else who expects
> an 'I' to 'i' mapping.


I suspect they would be used to the false case-insensitive match. I
tested it on Windows XP with NTFS: İ (U+0130) did not match I or i
or ı (U+0131). I also tested it with the Turkish language/keyboard set.

Once it's set in a filesystem, the handling of it can't really be
swapped back and forth either, otherwise, you may lose access to
that file.

There is no practical way that I can see of supporting this
fully, even with using the NLS tables. The on-disk hashes have
to remain consistent regardless of what language is specified.

Barry.


Re: RFC: Case-insensitive support for XFS

2007-10-07 Thread Barry Naujok
On Mon, 08 Oct 2007 15:44:48 +1000, Nicholas Miell <[EMAIL PROTECTED]>  
wrote:



> You also have to decide whether to screw over people who speak Turkic
> languages and expect an 'I' to 'ı' mapping or everybody else who expects
> an 'I' to 'i' mapping.


Is there some way in the kernel, that I'm unaware of, in knowing what
the user's current language and/or codepage locale is set to?

The only thing I've found is the isocharset option that the other
filesystems use or the default_nls_table() if one isn't specified.
The default one seems to be a CONFIG option.


> Although, if you're content in ignoring the kernel's native NLS case
> mapping tables (which expect a locale-independent 1-to-1 mapping), you
> could just uppercase everything and map both 'i' and 'ı' to 'I'.
>
> Then you have to decide whether things like 'ê' map to 'E' or 'Ê', which
> is also locale dependent.


Looking at case-folding, it would be generating lower case equivalent
characters, nls->charset2lower.
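
For reference, the 1-to-1 style of table being discussed can be sketched in userspace as a 256-entry byte map. This is an illustration for ISO-8859-1 only, under the assumption that one byte is one character; build_latin1_lower() and fold_8bit() are hypothetical helpers and not the kernel's NLS code.

```c
#include <stddef.h>

/* Fill a 256-entry lower-case map for ISO-8859-1: 'A'..'Z' and
 * 0xC0..0xDE (except 0xD7, the multiplication sign) map to the byte
 * 0x20 higher; every other byte maps to itself. */
static void build_latin1_lower(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        if ((c >= 'A' && c <= 'Z') ||
            (c >= 0xC0 && c <= 0xDE && c != 0xD7))
            table[c] = (unsigned char)(c + 0x20);
        else
            table[c] = (unsigned char)c;
    }
}

/* Fold a name in place, one byte per character. */
static void fold_8bit(const unsigned char table[256],
                      unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = table[s[i]];
}
```

In this map 'Ê' (0xCA) folds to 'ê' (0xEA) rather than to a plain 'e', which is one fixed answer to the 'ê' question quoted above; a table like this has no way to express locale-dependent alternatives.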

Barry.


Re: RFC: Case-insensitive support for XFS

2007-10-07 Thread Barry Naujok
On Sat, 06 Oct 2007 04:52:18 +1000, Nicholas Miell <[EMAIL PROTECTED]>  
wrote:



> On Fri, 2007-10-05 at 16:44 +0100, Christoph Hellwig wrote:
>> [Adding -fsdevel because some of the things touched here might be of
>>  broader interest and Urban because his name is on nls_utf8.c]
>>
>> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>>>
>>> On its own, Linux only provides case conversion for old-style
>>> character sets - 8 bit sequences only. A lot of distros are
>>> now defaulting to UTF-8 and Linux NLS stuff does not support
>>> case conversion for any Unicode sets.
>>
>> The lack of case tables in nls_utf8.c definitely seems odd to me.
>> Urban, is there a reason for that?  The only thing that comes to
>> mind is that these tables might be quite large.
>
> Case conversion in Unicode is locale dependent. The legacy 8-bit
> character encodings don't code for enough characters to run into the
> ambiguities, so they can get away with fixed case conversion tables.
> Unicode can't.


Based on http://www.unicode.org/reports/tr21/tr21-5.html and
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt

Doing case comparison using that table should cater for most
circumstances except a few exceptions. It should be enough
to satisfy a locale independent case-insensitive filesystem
(i.e. the C + F case folding option).

Is normalization required after case-folding? What I read
implies it is not necessary for this purpose (and would
slow things down and bloat the code more).

Now I suppose, it's just a question of a fixed table in the
kernel driver (HFS+ style), or data stored in a special
inode on-disk (NTFS style, shared refcounted in memory
when the same). With the on-disk option, the table can be
generated by mkfs.xfs.

Regards,
Barry.




Re: RFC: Case-insensitive support for XFS

2007-10-07 Thread Barry Naujok
On Sat, 06 Oct 2007 05:10:23 +1000, Anton Altaparmakov <[EMAIL PROTECTED]>  
wrote:



> Hi,
>
> On 5 Oct 2007, at 16:44, Christoph Hellwig wrote:
>> [Adding -fsdevel because some of the things touched here might be of
>>  broader interest and Urban because his name is on nls_utf8.c]
>>
>> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>>> On its own, Linux only provides case conversion for old-style
>>> character sets - 8 bit sequences only. A lot of distros are
>>> now defaulting to UTF-8 and Linux NLS stuff does not support
>>> case conversion for any Unicode sets.
>>
>> The lack of case tables in nls_utf8.c definitely seems odd to me.
>> Urban, is there a reason for that?  The only thing that comes to
>> mind is that these tables might be quite large.
>>
>>> NTFS in Linux also implements its own dcache and NTFS also
>>
>> ^^^ dentry operations?
>
> Where did that come from?  NTFS does not have its own dcache!  It
> doesn't have its own dentry operations either...  NTFS uses the default
> ones...
>
> All the case insensitivity handling "cleverness" is done inside
> ntfs_lookup(), i.e. the NTFS directory inode operation ->lookup.


Sorry if I got this wrong. I derived my comment from fs/ntfs/namei.c:

 * In order to handle the case insensitivity issues of NTFS with regards to the
 * dcache and the dcache requiring only one dentry per directory, we deal with
 * dentry aliases that only differ in case in ->ntfs_lookup() while maintaining
 * a case sensitive dcache.

Reading it again, I misinterpreted it :)


>>> Internally, the names will probably be converted to "u16"s for
>>> efficient processing. Conversion between UTF-8 and UTF-16/UCS-2
>>> is very straightforward.
>>
>> Do we really need that?  And if so please make sure this only happens
>> for filesystems created with the case insensitivity option so normal
>> filesystems don't have to pay for these bloated strings.
>
> There is nothing efficient about using u16 in memory AFAIK.  In fact for
> the majority of the time it just means you use twice the memory per string...
>
> FWIW Mac OS X uses utf8 in the kernel and so does HFS(+) and I can't see
> anything wrong with that.  And Windows uses u16 (little endian) and so
> does NTFS.  So there is precedent for doing both internally...
>
> What are the reasons for suggesting that it would be more efficient to
> use u16 internally?


As I said to Christoph before, the only reason is the nls conversions
use wchar_t. As I don't have any case tables yet (one of the primary
points for discussion), I haven't settled on which method to use.

If I do use u16, it will only be used temporarily for case comparison.
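
A sketch of what such a temporary conversion could look like, restricted to the Basic Multilingual Plane. This is an assumption-laden userspace illustration, not code from the patch or from fs/nls; utf8_to_ucs2() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 into 16-bit code units, BMP only. Returns the number
 * of units written, or -1 on malformed input, on a sequence longer
 * than three bytes (non-BMP), or when out[] is too small. */
static int utf8_to_ucs2(const unsigned char *s, size_t len,
                        uint16_t *out, size_t outcap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t need;    /* continuation bytes expected */

        if (c < 0x80) {
            cp = c;
            need = 0;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F;
            need = 1;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F;
            need = 2;
        } else {
            return -1;  /* stray continuation byte or 4-byte form */
        }
        if (len - i - 1 < need)
            return -1;  /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong encodings and surrogate code points. */
        if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800))
            return -1;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;
        if (n == outcap)
            return -1;
        out[n++] = (uint16_t)cp;
        i += need + 1;
    }
    return (int)n;
}
```

The 16-bit buffer only needs to live for the duration of one comparison, so the stored name stays UTF-8 and nothing on disk changes.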

Regards,
barry.


Re: RFC: Case-insensitive support for XFS

2007-10-07 Thread Barry Naujok
On Sat, 06 Oct 2007 01:44:42 +1000, Christoph Hellwig <[EMAIL PROTECTED]>  
wrote:



> [Adding -fsdevel because some of the things touched here might be of
>  broader interest and Urban because his name is on nls_utf8.c]
>
> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>
>> It will be proposed that in the future, XFS may default to
>> UTF-8 on disk and to go for the old format, explicitly
>> use a mkfs.xfs option. Two superblock bits will be used: one for
>> case-insensitive (which generates lowercase hashes on disk
>> and already exists on IRIX filesystems) and a new one
>> for UTF-8 filenames. Any combination of the two bits can be
>> used and the dentry_operations will be adjusted accordingly.
>
> I don't think arbitrary combinations make sense.  Without case
> insensitive support a unix filesystem couldn't care less what charset
> the filenames are in, except for the terminating 0 and '/', '.', '..'
> it's an entirely opaque stream of bytes.  So choosing a charset only
> makes sense with the case insensitive filename option.


I was thinking along the lines of the isocharset mount option,
which specifies that the 8-bit codepage should be converted to/from
UTF-8. In the end, I suppose it ends up as an "opaque stream of bytes"
for a case sensitive filesystem. I've started implementing the
changes to XFS, and UTF-8 and the old format have no differences.


>> So, in regards to the UTF-8 case-conversion/folding table, we
>> have several options to choose from:
>>    - Use the HFS+ method as-is.
>>    - Use an NTFS scheme with an on-disk table.
>>    - Pick a current table and stick with it (similar to HFS+).
>>    - How much of Unicode do we support? Just the "Basic
>>      Multilingual Plane" (U+0000 - U+FFFF) or the entire set?
>>      (anything above U+FFFF won't have case-conversion
>>       requirements). Seems that all the other filesystems
>>       just support the "BMP".
>>    - UTF-8, UTF-16 or UCS-2.
>>
>> With the last point, UTF-8 has several advantages IMO:
>>    - xfs_repair can easily detect UTF-8 sequences in filenames
>>      and also validate UTF-8 sequences.
>>    - char based structures don't change.
>>    - no embedded "nulls" in filenames.
>>    - no endian conversions required.
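
The first advantage listed above (a checker being able to validate UTF-8 in filenames) can be sketched as follows. This is only an illustration of the standard well-formedness rules, assuming names arrive as raw bytes with a length; utf8_valid() is a hypothetical helper and not something from xfs_repair.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if s[0..len) is well-formed UTF-8, 0 otherwise. Rejects
 * stray continuation bytes, truncated sequences, overlong encodings,
 * surrogates and code points above U+10FFFF. */
static int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp, min;
        size_t need;    /* continuation bytes expected */

        if (c < 0x80) {
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;
        }
        if (len - i - 1 < need)
            return 0;
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}
```

A name stored in an 8-bit codepage, such as a lone 0xE9 byte for Latin-1 'é', fails this check, which is what makes old-format names detectable.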


> I think the right approach is to use the fs/nls/ code and allow the
> user to select any table with a mount option as at least in Russia
> and eastern Europe some non-utf8 charsets still seem to be preferred.
> The default should of course be utf8 and support for utf8 case
> conversion should be added to fs/nls/
>
>> Internally, the names will probably be converted to "u16"s for
>> efficient processing. Conversion between UTF-8 and UTF-16/UCS-2
>> is very straightforward.
>
> Do we really need that?  And if so please make sure this only happens
> for filesystems created with the case insensitivity option so normal
> filesystems don't have to pay for these bloated strings.


Sort of, as the NLS conversions use wchar_t's. From that, I can
convert straight back to utf8 anyway.

Barry.


Re: [PATCH 1/7] manpage for fallocate

2007-07-10 Thread Barry Naujok
On Wed, 11 Jul 2007 06:18:20 +1000, Amit K. Arora  
<[EMAIL PROTECTED]> wrote:



Following is the modified version of the manpage originally submitted by
David Chinner. Please use `nroff -man fallocate.2 | less` to view.


A few more touch-ups attached.

Regards,
Barry.

fallocate.2
Description: Binary data