Hi,

I now have running support for large data files in cd9660.
Shall we first discuss it here, or shall I submit a large PR which motivates the change, assesses the situation at length, presents the model change, and proposes a patch?

Overview:

The goal is to let cd9660 recognize files with multiple file sections and represent their multiple directory records as a single vnode with a uniform byte space.

No duplicate filenames shall be presented to VFS. If files with equal names are found, then only the last one of those with the highest ISO 9660 version number will be visible. This guarantee depends on properly sorted ISO 9660 directories. The decision as to which intervals of directory records form single files depends on properly set ISO 9660 Multi-Extent flag bits.

The filesystem-specific vnode.v_data struct iso_node needed a change to represent the 1:n relation between file and file section. This change caused adjustments all over the code of cd9660. It makes nearly full use of the 64 bits of NetBSD's ino_t and employs malloc(9) or kmem(9) memory for files with more than one section. (Currently 12 bytes per file section.)

ABI compatibility of the changed struct iso_node is guaranteed for systems with up to 96 bit pointer types. The API of cd9660_node.h is not compatible in my current implementation. An alternative implementation is possible with 100 % API/ABI compatibility for single-section files. It would cause a size increase of 4 to 8 bytes in struct iso_node. fstat(1) and pmap(1) include <isofs/cd9660/cd9660_node.h>.

Several implementations of interface methods are affected:

- cd9660_readdir() serving as VOP_READDIR(9)
  The case of mount -o norrip,nogens already used a delivery function with delayed file candidates: cd9660_vnops.c : iso_shipdir(). Originally its only task was to find the youngest version of an ISO 9660 data file. Now it also counts the follow-up directory records of the same file and skips over them.

- cd9660_lookup() as VOP_LOOKUP(9)
  mount -o norrip returned the last record of matching name, whereas -o rrip returned the first matching record. Now norrip,nogens with a healthy ISO 9660 filesystem drops only older versions of the same name.
  All three filesystem interpretation types now return the ino based on the byte address of the first record of the winning file and on its number of file sections.

- cd9660_vget_internal() serving effectively as VFS_VGET(9)
  If the ino_t input parameter indicates a number of file sections larger than one, then the created iso_node gets a malloced iso_node.iso_fsect.many of type M_ISOFSFSECT. The iso_node.i_number will indicate a file section count larger than 1 only if such memory is attached to the iso_node and valid.

- cd9660_bmap() as VOP_BMAP(9)
  Nothing changes for files with a single file section. Those with more file sections need a loop to find the section which holds the desired block. As in the single-section case, the last section serves as the base of the resulting block address, regardless of whether its size includes the desired block.

The changes are tested with an ISO 9660 filesystem containing a large data file and examples of the Rock Ridge POSIX file types regular, directory, block device, fifo, and symbolic link. The test for block device functionality requires that the filesystem be created specifically for a device file which is readable on the local machine.

xorriso perception of the Rock Ridge aspect:

dr-x------    1 1000     0             0 May  6 15:31 '/'
dr-x------    1 1000     0             0 May  3 14:58 '/dev'
prw-------    1 1000     0             0 May 24 14:29 '/dev/test.fifo'
br--------    1 1000     5          0,12 May 14 14:33 '/dev/wd1e'
dr-x------    1 1000     1000          0 May  6 15:30 '/my'
-r--------    1 1000     1000 4329375744 May  6 15:30 '/my/large_file'
dr-x------    1 1000     0             0 Jan 19 14:41 '/reg'
-r-x------    1 1000     0        133411 Jan 19 14:41 '/reg/tar'
lr-x------    1 1000     0             0 May 24 14:29 '/reg/to_regfile' -> 'tar'
-r--------    1 1000     1000          6 May  6 15:34 '/small_file'

A script is available for creating, mounting, and testing this filesystem. It is not trivially portable to other computers because several individual adaptations have to be made.
Its result varies between

  #######################
  # BAD TEST RESULTS: 8 #
  #######################

and

  +++++++++++++++
  + All is well +
  +++++++++++++++

I will publish the script and am ready to support its conversion into an automatic test of whatever can be tested in general.

There is also an ISO image emerging which (by hex editor) exposes exotic or even illegal situations. 6.1.3 and the host operating system show interesting effects when mounting it.

Remaining restrictions:

- ISO 9660 allows a file to be composed of multiple file sections with sizes which are not aligned to the filesystem block size. cd9660 demands that all but the last file section of a file have sizes which are multiples of the block size, usually 2 KiB.

- cd9660 imposes a deliberate limit of 128 on the number of sections per file. CD9660_FSECT_MAX can be adjusted in cd9660_node.h.

Remaining problems:

- The name comparison for finding identical names is still not fully in sync underneath VOP_READDIR(9) and VOP_LOOKUP(9). It is done by two different functions in cd9660_util.c, where I see incompatibilities in cases of non-compliant version suffixes.

- I could not yet find ISO images or software which would provide test opportunities for ISO 9660 Associated Files or Extended Attributes (which are not related to getextattr(1)/extattr(9)).

- Some code paths are not as clear as they could be. I restricted myself to augmenting the existing code for the 1:n relation. Some code relied on the contrary assumption in order to take shortcuts. So far I have tackled only the assumptions, not the shortcuts. So some shortcuts became quite curvy and not so short any more.

About the inode number inflation:

Large data files get giant inode numbers, because the file section count is encoded above bit 48 of ino_t. The hardest reason why this information has to be encoded in ino_t is the desire to implement method VFS_VGET(9).
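
To illustrate the kind of encoding meant here, a minimal sketch in C follows. It is not taken from the patch. The exact bit layout, the sketch_* names, and the choice to leave the upper bits at zero for single-section files are assumptions made only for this example. What comes from the text above is merely that the section count sits above bit 48 and that the number of sections is limited to 128.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption, not the layout of the patch):
 * the low 48 bits hold the byte address of the first directory record,
 * the bits above hold the file section count. Single-section files
 * keep zero in the upper bits, so their numbers stay small.
 */
#define SKETCH_ADDR_BITS 48
#define SKETCH_ADDR_MASK ((UINT64_C(1) << SKETCH_ADDR_BITS) - 1)
#define SKETCH_FSECT_MAX 128	/* same value as the limit named above */

/* Combine record address and section count into one 64-bit number. */
static uint64_t
sketch_ino_pack(uint64_t rec_addr, unsigned nsect)
{
	assert(rec_addr <= SKETCH_ADDR_MASK);
	assert(nsect >= 1 && nsect <= SKETCH_FSECT_MAX);
	if (nsect == 1)
		return rec_addr;
	return ((uint64_t)nsect << SKETCH_ADDR_BITS) | rec_addr;
}

/* Recover the byte address of the first directory record. */
static uint64_t
sketch_ino_addr(uint64_t ino)
{
	return ino & SKETCH_ADDR_MASK;
}

/* Recover the section count; zero upper bits mean a single section. */
static unsigned
sketch_ino_nsect(uint64_t ino)
{
	unsigned n = (unsigned)(ino >> SKETCH_ADDR_BITS);

	return n == 0 ? 1 : n;
}
```

With such a layout, a lookup by inode number alone can recover both the record address and the number of directory records to read, which is what a VFS_VGET(9) style entry point needs. The price is that every multi-section file gets a number of at least 2^49.
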
If VOP_LOOKUP(9) were the only method which leads to creation of a vnode, then the address and count could be stored in some other members of struct iso_node. Simply returning EOPNOTSUPP from VFS_VGET(9) would open this path. One could then cut inode numbers to 32 bit and port the cd9660 improvements to FreeBSD. (Not that freebsd-hackers would be much interested in cd9660.)

Have a nice day :)

Thomas