Bug#798300: linux: fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length

2021-05-09 Thread Thomas Schmitt
Hi,

> if this still should be trackled,

It's still in kernel 5.10, at least.

> please report it upstream

I agree that this bug is not Debain's business.
But i'd need a kernel list member who would be ready to review a patch.


Warning:
From here on it's just whining of a userlander about kernel cruelty.

There is no maintainer for isofs. Nearest is linux-scsi because of the
semi-proximity between cdrom and isofs.
Having had a real test machine last year, i tried to submit patches for
both subsystems [1] [2], but did not get any reaction.

After this experience i did not submit my patch for the name length issues.
It is based on 5.10:
--

From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001
From: Thomas Schmitt 
Date: Tue, 22 Sep 2020 12:34:50 +0200
Subject: [PATCH 1/1] isofs: truncate oversized Rock Ridge names to 255 bytes

Enlarge the limit for name bytes from 253 to 255.
Do not discard all bytes of the NM field where the overflow occurs, but
rather append them to the accumulated name before truncating it to exactly
255 bytes.
Map trailing incomplete UTF-8 bytes to '_'.

Signed-off-by: Thomas Schmitt 
---
 fs/isofs/rock.c | 73 ++---
 fs/isofs/rock.h |  2 ++
 2 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c
index 94ef92fe806c..e1db8776b67e 100644
--- a/fs/isofs/rock.c
+++ b/fs/isofs/rock.c
@@ -192,6 +192,63 @@ static int rock_check_overflow(struct rock_state *rs, int 
sig)
return 0;
 }

+/*
+ * Find backward from idx the start byte of a possible UTF-8 character.
+ *   https://en.wikipedia.org/wiki/UTF-8#Description
+ * Return -1 if no incomplete UTF-8 sequence is found at the name end.
+ */
+static int rock_find_utf8_start(char *name, int idx)
+{
+   unsigned char *uname, uch;
+   int i;
+
+   uname = (unsigned char *)name;
+   /* Up to 4-byte codes */
+   for (i = 0; i < 4 && idx - i >= 0; i++) {
+   uch = uname[idx - i];
+   if ((uch & 0xe0) == 0xc0) {
+   /* UTF-8 start byte for 2-byte codes */
+   if (i >= 1)
+   return -1;  /* tail is complete */
+   else
+   return (idx - i);   /* tail not complete */
+   } else if ((uch & 0xf0) == 0xe0) {
+   /* UTF-8 start byte for 3-byte codes */
+   if (i >= 2)
+   return -1;
+   else
+   return (idx - i);
+   } else if ((uch & 0xf8) == 0xf0) {
+   /* UTF-8 start byte for 4-byte codes */
+   if (i >= 3)
+   return -1;
+   else
+   return (idx - i);
+   } else if ((uch & 0xc0) != 0x80) {
+   /* not an UTF-8 tail byte, so no UTF-8 */
+   return -1;
+   }
+   }
+   /* That would be an oversized UTF-8 code or no UTF-8 at all */
+   return -1;
+}
+
+/*
+ * Replace trailing incomplete UTF-8 sequence by '_' characters.
+ */
+static void rock_erase_incomplete_utf8(char *name, int len)
+{
+   int i;
+
+   if (len <= 0)
+   return;
+   i = rock_find_utf8_start(name, len - 1);
+   if (i < 0)
+   return;
+   for (; i < len; i++)
+   name[i] = '_';
+}
+
 /*
  * return length of name field; 0: not found, -1: to be ignored
  */
@@ -271,16 +328,24 @@ int get_rock_ridge_filename(struct iso_directory_record 
*de,
break;
}
len = rr->len - 5;
-   if (retnamlen + len >= 254) {
-   truncate = 1;
-   break;
-   }
p = memchr(rr->u.NM.name, '\0', len);
if (unlikely(p))
len = p - rr->u.NM.name;
+   if (retnamlen + len > RR_NAME_LEN) {
+   truncate = 1;
+   /* Take as many characters as possible */
+   len = RR_NAME_LEN - retnamlen;
+   if (len <= 0) {
+   rock_erase_incomplete_utf8(retname,
+  retnamlen);
+   break;
+   }
+   }
memcpy(retname + retnamlen, rr->u.NM.name, len);
retnamlen += len;
retname[retnamlen] = '\0';
+   if (truncate == 1)
+   rock_erase_incomplete_

Bug#798300: linux: fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length

2015-09-07 Thread Thomas Schmitt

Source: linux
Version: 4.1.6
Severity: normal

Dear Maintainer,

I experience coarse name truncation with very long Rock Ridge
filenames in mounted ISO 9660 filesystems. It happens with
names of 255 or 254 characters. 253 is ok.

Now i riddle about the reasoning behind constant 254
in fs/isofs/rock.c function get_rock_ridge_filename() line 270.
Is there a limit in the data structures which would forbid
filenames of 254 or 255 bytes length ?

Affected are at least Debian kernels 4.1.0-2-amd64 and
3.16.0-4-amd64 but probably it reaches far more back.

I seemingly fixed the problem for length 254 and 255 by changing
254 to 256 and building a new kernel. VM makes courageous.

--

Debian's
  /usr/src/linux-4.1.6/fs/isofs/rock.c
and also
  https://github.com/torvalds/linux/blob/master/fs/isofs/rock.c
function get_rock_ridge_filename(), line 270 say

if ((strlen(retname) + rr->len - 5) >= 254) {
truncate = 1;
break;
}

The value 1 in "truncate" causes interpretation of NM
entries to end prematurely. "break" prevents the current
NM name piece from being added to the result.

Different distribution of the name over NM entries causes
different effective truncation lengths.
It depends on ECMA-119 name and the SUSP data in the ECMA-119
directory record how much of the name fits into the first NM,
and how much must go to the second NM in Continuation Area.
This second NM gets dropped.

Further it seems that truncation without any attempt to generate
a unique filename is suboptimal for the further processing of
such names.

In the times of UTF-8, truncation would have to show
some additional intelligence to avoid incomplete multi-byte
characters.

--

To reproduce the problem, create an ISO with a Rock Ridge
filename of at least 254 bytes length

  $ touch my_dummy_file

  $ genisoimage -R -o test.iso -graft-points 
/12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234=my_dummy_file


Then mount it and list the file

  # mount -o loop test.iso /mnt/iso

  $ ls /mnt/iso
  
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456

That's 146 characters only.
Filenames of length 245 or 255 with same 146 first characters
all get mapped to the same name. readdir(3) delivers several
dirents with the same .dname.

Different production yields different truncations.

  $ xorriso -as mkisofs ...same.options.as.above...

yields after mount 155 characters.

  $ xorriso -outdev /dvdbuffer/test.iso -blank as_needed \
-map my_dummy_file /...254.characters...

yields 93 characters.

xorriso as alternative reader shows in all three cases 254
characters.

--
Investigating the possible impact of a change from 254 to 256
in rock.c
-
dir.c:

static do_isofs_readdir() calls get_rock_ridge_filename()
and hands over the result to dir_emit(), which i understand from
include/linux/fs.h is a "filldir" function type, by which
the kernel receives dirent information.
Such a receiver and its consumers should be good for 255
bytes plus 0, shouldn't it ?

The name buffer tmpname is a parameter handed over by the only
caller isofs_readdir(). It obtains the buffer by
tmpname = (char *)__get_free_page(GFP_KERNEL);
"struct page" sends me on a log journey into regions of the
memory management which i do not understand.
Nevertheless a widely used object named "page" that can take
254 bytes should be able to take 256 bytes, shouldn't it ?.

This buffer is not used after the call of do_isofs_readdir()
but rather gets freed immediately.

-
namei.c:

static isofs_find_entry() uses the result of get_rock_ridge_filename()
as argument of static isofs_cmp(). I believe to trace it through
struct dentry_operations to isofs_dentry_cmp_common() in
fs/isofs/inode.c. Its use of struct qstr *name seems harmless
with a potential name->len == 255.

The name buffer tmpname is a parameter handed over by the only
caller isofs_lookup():
  struct page *page;
  page = alloc_page(GFP_USER);
The parameter tmpname is
  page_address(page)
As above: a "page" that can take 254 bytes should be able
to take 256 bytes.

This buffer is not used after the call of isofs_find_entry()
but rather gets freed immediately.

-

So s/254/256/ _should_ be safe ... if i am not mistaken.

On my Debian "unstable" VM i worked my way th