Bug#798300: linux: fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length
Hi, > if this still should be trackled, It's still in kernel 5.10, at least. > please report it upstream I agree that this bug is not Debain's business. But i'd need a kernel list member who would be ready to review a patch. Warning: From here on it's just whining of a userlander about kernel cruelty. There is no maintainer for isofs. Nearest is linux-scsi because of the semi-proximity between cdrom and isofs. Having had a real test machine last year, i tried to submit patches for both subsystems [1] [2], but did not get any reaction. After this experience i did not submit my patch for the name length issues. It is based on 5.10: -- From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001 From: Thomas Schmitt Date: Tue, 22 Sep 2020 12:34:50 +0200 Subject: [PATCH 1/1] isofs: truncate oversized Rock Ridge names to 255 bytes Enlarge the limit for name bytes from 253 to 255. Do not discard all bytes of the NM field where the overflow occurs, but rather append them to the accumulated name before truncating it to exactly 255 bytes. Map trailing incomplete UTF-8 bytes to '_'. Signed-off-by: Thomas Schmitt --- fs/isofs/rock.c | 73 ++--- fs/isofs/rock.h | 2 ++ 2 files changed, 71 insertions(+), 4 deletions(-) diff --git a/fs/isofs/rock.c b/fs/isofs/rock.c index 94ef92fe806c..e1db8776b67e 100644 --- a/fs/isofs/rock.c +++ b/fs/isofs/rock.c @@ -192,6 +192,63 @@ static int rock_check_overflow(struct rock_state *rs, int sig) return 0; } +/* + * Find backward from idx the start byte of a possible UTF-8 character. + * https://en.wikipedia.org/wiki/UTF-8#Description + * Return -1 if no incomplete UTF-8 sequence is found at the name end. + */ +static int rock_find_utf8_start(char *name, int idx) +{ + unsigned char *uname, uch; + int i; + + uname = (unsigned char *)name; + /* Up to 4-byte codes */ + for (i = 0; i < 4 && idx - i >= 0; i++) { + uch = uname[idx - i]; + if ((uch & 0xe0) == 0xc0) { + /* UTF-8 start byte for 2-byte codes */ + if (i >= 1) + return -1; /* tail is complete */ + else + return (idx - i); /* tail not complete */ + } else if ((uch & 0xf0) == 0xe0) { + /* UTF-8 start byte for 3-byte codes */ + if (i >= 2) + return -1; + else + return (idx - i); + } else if ((uch & 0xf8) == 0xf0) { + /* UTF-8 start byte for 4-byte codes */ + if (i >= 3) + return -1; + else + return (idx - i); + } else if ((uch & 0xc0) != 0x80) { + /* not an UTF-8 tail byte, so no UTF-8 */ + return -1; + } + } + /* That would be an oversized UTF-8 code or no UTF-8 at all */ + return -1; +} + +/* + * Replace trailing incomplete UTF-8 sequence by '_' characters. + */ +static void rock_erase_incomplete_utf8(char *name, int len) +{ + int i; + + if (len <= 0) + return; + i = rock_find_utf8_start(name, len - 1); + if (i < 0) + return; + for (; i < len; i++) + name[i] = '_'; +} + /* * return length of name field; 0: not found, -1: to be ignored */ @@ -271,16 +328,24 @@ int get_rock_ridge_filename(struct iso_directory_record *de, break; } len = rr->len - 5; - if (retnamlen + len >= 254) { - truncate = 1; - break; - } p = memchr(rr->u.NM.name, '\0', len); if (unlikely(p)) len = p - rr->u.NM.name; + if (retnamlen + len > RR_NAME_LEN) { + truncate = 1; + /* Take as many characters as possible */ + len = RR_NAME_LEN - retnamlen; + if (len <= 0) { + rock_erase_incomplete_utf8(retname, + retnamlen); + break; + } + } memcpy(retname + retnamlen, rr->u.NM.name, len); retnamlen += len; retname[retnamlen] = '\0'; + if (truncate == 1) + rock_erase_incomplete_
Bug#798300: linux: fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length
Source: linux Version: 4.1.6 Severity: normal Dear Maintainer, I experience coarse name truncation with very long Rock Ridge filenames in mounted ISO 9660 filesystems. It happens with names of 255 or 254 characters. 253 is ok. Now i riddle about the reasoning behind constant 254 in fs/isofs/rock.c function get_rock_ridge_filename() line 270. Is there a limit in the data structures which would forbid filenames of 254 or 255 bytes length ? Affected are at least Debian kernels 4.1.0-2-amd64 and 3.16.0-4-amd64 but probably it reaches far more back. I seemingly fixed the problem for length 254 and 255 by changing 254 to 256 and building a new kernel. VM makes courageous. -- Debian's /usr/src/linux-4.1.6/fs/isofs/rock.c and also https://github.com/torvalds/linux/blob/master/fs/isofs/rock.c function get_rock_ridge_filename(), line 270 say if ((strlen(retname) + rr->len - 5) >= 254) { truncate = 1; break; } The value 1 in "truncate" causes interpretation of NM entries to end prematurely. "break" prevents the current NM name piece from being added to the result. Different distribution of the name over NM entries causes different effective truncation lengths. It depends on ECMA-119 name and the SUSP data in the ECMA-119 directory record how much of the name fits into the first NM, and how much must go to the second NM in Continuation Area. This second NM gets dropped. Further it seems that truncation without any attempt to generate a unique filename is suboptimal for the further processing of such names. In the times of UTF-8, truncation would have to show some additional intelligence to avoid incomplete multi-byte characters. -- To reproduce the problem, create an ISO with a Rock Ridge filename of at least 254 bytes length $ touch my_dummy_file $ genisoimage -R -o test.iso -graft-points /12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234=my_dummy_file Then mount it and list the file # mount -o loop test.iso /mnt/iso $ ls /mnt/iso 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456 That's 146 characters only. Filenames of length 245 or 255 with same 146 first characters all get mapped to the same name. readdir(3) delivers several dirents with the same .dname. Different production yields different truncations. $ xorriso -as mkisofs ...same.options.as.above... yields after mount 155 characters. $ xorriso -outdev /dvdbuffer/test.iso -blank as_needed \ -map my_dummy_file /...254.characters... yields 93 characters. xorriso as alternative reader shows in all three cases 254 characters. -- Investigating the possible impact of a change from 254 to 256 in rock.c - dir.c: static do_isofs_readdir() calls get_rock_ridge_filename() and hands over the result to dir_emit(), which i understand from include/linux/fs.h is a "filldir" function type, by which the kernel receives dirent information. Such a receiver and its consumers should be good for 255 bytes plus 0, shouldn't it ? The name buffer tmpname is a parameter handed over by the only caller isofs_readdir(). It obtains the buffer by tmpname = (char *)__get_free_page(GFP_KERNEL); "struct page" sends me on a log journey into regions of the memory management which i do not understand. Nevertheless a widely used object named "page" that can take 254 bytes should be able to take 256 bytes, shouldn't it ?. This buffer is not used after the call of do_isofs_readdir() but rather gets freed immediately. - namei.c: static isofs_find_entry() uses the result of get_rock_ridge_filename() as argument of static isofs_cmp(). I believe to trace it through struct dentry_operations to isofs_dentry_cmp_common() in fs/isofs/inode.c. Its use of struct qstr *name seems harmless with a potential name->len == 255. The name buffer tmpname is a parameter handed over by the only caller isofs_lookup(): struct page *page; page = alloc_page(GFP_USER); The parameter tmpname is page_address(page) As above: a "page" that can take 254 bytes should be able to take 256 bytes. This buffer is not used after the call of isofs_find_entry() but rather gets freed immediately. - So s/254/256/ _should_ be safe ... if i am not mistaken. On my Debian "unstable" VM i worked my way th