Hi, i have pushed the very large commit 87f5053 to branch ts-multiextent :
New API around iso9660_statv2_t as successor of iso9660_stat_t for reading data files of 4 GiB or larger, while keeping old API and ABI valid. http://git.savannah.gnu.org/cgit/libcdio.git/commit/?h=ts-multiextent&id=87f50538ea360c3e533a4bbee8d8e1b9feba4131 Before jumping on the changeset, reviewers should read the following introduction to problem and proposed solution. -------------------------------------------------------------------------- The lack of multi-extent support in iso9660_stat_t led to the development of a successor type: iso9660_statv2_t That's because any ABI compatible extension of iso9660_stat_t is prevented by its last member, an array of variable size, and by the fact that the struct behind iso9660_stat_t is public. So i present iso9660_statv2_t for approval or rejection in branch ts-multiextent http://git.savannah.gnu.org/cgit/libcdio.git/log/?h=ts-multiextent The new API is based on the work of Pete Batard in branch pbatard-multiextent2 where he implemented the proper assessment of multi-extent files while reading ISO 9660 directory records. His proposal is much more concise than iso9660_statv2_t. I see only two problems with his proposal: * It breaks the ABI. * It will break the ABI again if ever the maximum number of extents ISO_MAX_MULTIEXTENT shall be incresed from 8 to a higher number. It is unattractive to choose a large value already now, because each extent slot costs 8 byte and iso9660_stat_t can have many instances. (This could be fixed without a new API by cherrypicking parts of my proposal. I will probably do, if iso9660_statv2_t gets rejected.) It turned out that avoiding thee two ABI problems implies to stirr up a lot of code. Pete is officially allowed to laugh now. -------------------------------------------------------------------------- Other than its predecessor the entrails of iso9660_statv2_t are completely private and accessible only via API functions. This gives hope that future changes will not cause API or ABI problems. iso9660_stat_free and all API functions which return iso9660_stat_t are now deprececated. Each of them has a successor function: iso9660_fs_stat -> iso9660_fs_statv2 iso9660_fs_stat_translate -> iso9660_fs_statv2_translate iso9660_fs_readdir -> iso9660_fs_readdir_v2 iso9660_ifs_stat -> iso9660_ifs_statv2 iso9660_ifs_stat_translate -> iso9660_ifs_statv2_translate iso9660_ifs_readdir -> iso9660_ifs_readdir_v2 iso9660_fs_find_lsn -> iso9660_fs_find_lsn_v2 iso9660_fs_find_lsn_with_path -> iso9660_fs_find_lsn_with_path_v2 iso9660_ifs_find_lsn -> iso9660_ifs_find_lsn_v2 iso9660_ifs_find_lsn_with_path -> iso9660_ifs_find_lsn_with_path_v2 iso9660_stat_free -> iso9660_statv2_free Further the list type for iso9660_stat_t members and its methods have successors: CdioISO9660FileList_t -> CdioISO9660FileListV2_t iso9660_filelist_new -> iso9660_filelist_new_v2 iso9660_filelist_free -> iso9660_filelist_free_v2 The new iso9660_statv2_t has access methods for its struct members: iso9660_statv2_get_rr iso9660_statv2_get_tm iso9660_statv2_get_total_size iso9660_statv2_get_extents iso9660_statv2_get_xa iso9660_statv2_get_type iso9660_statv2_get_filename and a convenience function to obtain a legacy iso9660_stat_t: iso9660_statv2_get_v1 -------------------------------------------------------------------------- The old API around iso9660_stat_t is still provided without change. Each application of libiso9660 which wants to access data files of 4 GiB or larger will have to make the transition by own effort. See example/extract.c for the alternative usage of both APIs. There are two scenarios for adaption of applications to new iso9660_statv2_t: * Many direct accesses to members of iso9660_stat_t. * Few such accesses and mainly file data content reading. -------------------------------------------------------------------------- If the application has many direct accesses to members of iso9660_stat_t, then continue to use the pointer as a convenience object. Program example/extract.c demonstrates this in the #else cases of #ifdef LIBISO9660_USE_LEGACY_STAT. * Let us assume that the existing iso9660_stat_t is declared like this: iso9660_stat_t *p_stat; * As its source of information introduce a new pointer to iso9660_statv2_t or a list of such objects. E.g. + iso9660_statv2_t *p_statv2; * If you have lists of iso9660_stat_t, then change their type to the type for the new iso9660_statv2_t elements. E.g. - CdioISO9660FileList_t *p_filelist; + CdioISO9660FileListV2_t *p_filelist; * Change the old API calls to obtain the iso9660_stat_t object or a list of such objects to the corresponding calls of the new API. E.g. - p_stat = iso9660_fs_stat(...); + p_statv2 = iso9660_fs_statv2(...); - p_filelist = iso9660_fs_readdir(...); + p_filelist = iso9660_fs_readdir_v2(...); * When using list elements, obtain the iso9660_statv2_t object. E.g. - p_stat = (iso9660_stat_t *) _cdio_list_node_data(); + p_statv2 = (iso9660_statv2_t *) _cdio_list_node_data(); * Obtain the convenience iso9660_stat_t from the iso9660_statv2_t by call iso9660_statv2_get_v1(). E.g. + p_stat = iso9660_statv2_get_v1(p_statv2); * Now use the convenience iso9660_stat_t for all purposes except for data file content. E.g. strcpy(name, p_stat.filename); * For data file content use the iso9660_statv2_t object. E.g. + uint32_t num_extents; + iso9660_extent_descr_t *extents + total_size = iso9660_statv2_get_total_size(p_statv2); + num_extents = iso9660_statv2_get_extents(p_statv2, &extents); The content of a datafile is comprised of the byte intervals which are described by iso9660_extent_descr_t: * iso9660_extent_descr_t.lsn is the block address * iso9660_extent_descr_t.size is the byte count of the interval All those intervals concatenated constitute the data file content. So you will have to loop over num_extents when reading data. See exaple/extract.c . * Replace iso9660_stat_t.secsize by a macro usage for each extent. E.g. - blocks = p_stat.secsize; + blocks = CDIO_EXTENT_BLOCKS(extents[index].size); * Finally, wherever the derived iso9660_stat_t is disposed, dispose the iso9660_statv2_t object, too: + iso9660_statv2_free(p_statv2); iso9660_stat_free(p_stat); -------------------------------------------------------------------------- If only few direct accesses to members of iso9660_stat_t occur, then it is unnecessary to have a convenience iso9660_stat_t. Program example/isolist.c demonstrates this approach. * Change the types from old to new - iso9660_stat_t *p_stat + iso9660_statv2_t *p_stat - CdioISO9660FileList_t *p_filelist; + CdioISO9660FileListV2_t *p_filelist; * Change the old API calls to obtain the iso9660_stat_t object or a list of such objects to the corresponding calls of the new API. E.g. - p_stat = iso9660_fs_stat(...); + p_stat = iso9660_fs_statv2(...); - p_filelist = iso9660_fs_readdir(...); + p_filelist = iso9660_fs_readdir_v2(...); * When using list elements, obtain the iso9660_statv2_t object. E.g. - p_stat = (iso9660_stat_t *) _cdio_list_node_data(); + p_stat = (iso9660_statv2_t *) _cdio_list_node_data(); * Use the access methods of iso9660_statv2 rather than direct access to members of iso9660_stat_t. (For member .secsize, see below.) E.g. - strcpy(name, p_stat.filename); + strcpy(name, iso9660_statv2_get_filename(p_stat)); * Adapt to the new way to learn about data file content. E.g. + uint64_t total_size; + uint32_t num_extents; + iso9660_extent_descr_t *extents + total_size = iso9660_statv2_get_total_size(p_stat); + num_extents = iso9660_statv2_get_extents(p_stat, &extents); The content of a datafile is comprised of the byte intervals which are described by iso9660_extent_descr_t: * iso9660_extent_descr_t[index].lsn is the block address * iso9660_extent_descr_t[index].size is the byte count of the interval All those intervals concatenated constitute the data file content. So you will have to loop over num_extents when reading data. index ranges from 0 to num_extents - 1. See exaple/extract.c . * Replace iso9660_stat_t.secsize by a macro usage for each extent. E.g. - blocks = p_stat.secsize; + blocks = CDIO_EXTENT_BLOCKS(extents[index].size); * Finally use the new disposal call for the iso9660_statv2_t object: - iso9660_stat_free(p_stat); + iso9660_statv2_free(p_stat); -------------------------------------------------------------------------- There are minimal runtime precautions to detect mismatch of type and API function. Typical messages are: cdio 3 message: Probable programming error with using iso9660_statv2_get_extents : Detected non-iso9660_statv2_t cdio ASSERT message: file iso9660_fs.c: line 2132 (iso9660_statv2_get_extents): assertion failed: (_iso9660_demand_statv2(p_stat, "iso9660_statv2_get_extents")) or !ASSERT: file iso9660_fs.c: line 2236 (iso9660_stat_free): assertion failed: (_iso9660_demand_stat(p_stat, "iso9660_stat_free")) The names "_iso9660_demand_statv2" or "_iso9660_demand_stat" tell what type was expected as parameter. ========================================================================== Implementation report: * The expansion of API struct iso9660_stat_s by Pete Batard was reverted in include/cdio/iso9660.h . * The new struct iso9660_statv2_s is defined in lib/iso9660/iso9660_fs.c. In include/cdio/iso9660.h there are only opaque declarations: struct iso9660_statv2_s; typedef struct iso9660_statv2_s iso9660_statv2_t; * All feature implementations in iso9660_fs.c were converted to the new type iso9660_statv2_t. The old API functions are now frontends to the new ones, which derive their resuling iso9660_stat_t from the iso9660_statv2_t which they obtain from the implementations. * Creation and cloning of iso9660_statv2_t are now delegated to private methods: iso9660_statv2_new(), iso9660_statv2_clone(). * Initialization, cloning and disposal of entrails of iso_rock_statbuf_t are now delegated to semi-public functions in lib/iso9660/rock.c: iso9660_rock_statbuf_init(), iso9660_rock_statbuf_clone_entrails(), iso9660_rock_statbuf_free_entrails(). * Semi-public functions of lib/iso9660/rock.c which have iso9660_stat_s as parameter have been accompanied by equivalent functions which have iso_rock_statbuf_t instead: int get_rock_ridge_filename_rr(iso9660_dir_t * p_iso9660_dir, /*out*/ char * psz_name, /*in/out*/ iso_rock_statbuf_t *p_rr); int parse_rock_ridge_stat_rr(iso9660_dir_t *p_iso9660_dir, /*out*/ iso_rock_statbuf_t *p_rr); The old functions are not used by libcdio any more but they are declared in include/cdio/rock.h. So they still exist as frontends to the new functions. * The definition of bool_3way_t in include/cdio/types.h got a comment which warns that the type may not get a value 3 or else breaks the runtime safety check which shall detect bad combinations of old and new types with API calls. * Examples, tests, and src/ programs have been adapted to use the new API: example/extract.c (shows usage of both APIs) example/isofile.c example/isofile2.c example/isolist.c src/iso-read.c src/util.c src/cd-info.c src/iso-info.c test/testisocd2.c test/testisocd_joliet.c * One test was braought back to pure legacy iso9660_stat_t use (i.e. i reverted Pete Batard's changes to it). Just to surely have such an application at compile time: test/testisocd.c * Fixed wrong description of CdioList_t in iso9660.h. (Its destructor implementation iso9660_dirlist_free() submits free(3) as member destructor function. Its users inside libcdio submit text strings as members. External users of legacy iso9660_stat_t will suffer no memory leaks as long as only directories are submitted to the list. With iso9660_stat2_t there would be substantial memory leaking.) -------------------------------------------------------------------------- Open decisions * Legacy iso9660_stat_s has an anonymous enum for the file type. It is not possible to use the same enum names in another enum in successor iso9660_statv2_s. Alternative solutions are: - Define and use a named enum and thus change the souce code of the API definition. To my understanding the change is compatible. But that is a bet on the plausibility of compiler behavior. At least i cannot give hard reasons why this must stay API and ABI compatible under all circumstances. Define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative. - Simulate in iso9660_statv2_s the anonymous enum by some integer-ish type which bears the same values as publicly defined by the anonymous enum. This is a dirty hack, but leaves the legacy API literally unchanged. Do NOT define ISO9660_WITH_FILE_TYPE_ENUM to activate this alternative. This is currently enabled. * Where to declare semi-public methods of iso_rock_statbuf_t ? I introduced three calls by which iso9660_fs.c can avoid to dig in the entrails of iso_rock_statbuf_t. Their prototypes should be exposed to lib/iso9660/iso9660_fs.c lib/iso9660/rock.c but not to the API. So i can hardly declare them in include/cdio/rock.h -------------------------------------------------------------------------- Test runs on GNU/Linux I configured libcdio with --disable-shared in order to get it ready for valgrind (and for gdb, too). The ISOs used are available as http://scdbackup.webframe.org/multi_extent_8k.iso.gz MD5 c2c57fa0bc428f6032b33f61ceae8bef (use gzip to uncompress) https://cdimage.debian.org/debian-cd/current/i386/iso-cd/debian-9.4.0-i386-netinst.iso MD5 a9a477d5cd311635d502c4ab746743d3 After next Debian point release it will be in the archive: http://cdimage.debian.org/mirror/cdimage/archive/9.4.0/i386/iso-cd/debian-9.4.0-i386-netinst.iso Both ISOs are stored in some directory on disk, where a subdirectory ./test_dir exists isodir=/dvdbuffer Sometimes they get mounted at mounted=/mnt/iso Run-time memory checker valgrind was applied by this prefix to the program runs: valgrind --leak-check=full \ Now for the particular runnable programs. example/example.c : example/extract "$isodir"/debian-9.4.0-i386-netinst.iso \ "$isodir"/test_dir/debian-9.4.0-i386-netinst 2>&1 | less mount "$isodir"/debian-9.4.0-i386-netinst.iso "$mounted" # Symbolic links will be missing because example/extract prefers # Joliet unconditionally. diff -r "$isodir"/test_dir/debian-9.4.0-i386-netinst "$mounted" 2>&1 | less umount "$mounted" /bin/rm -r "$isodir"/test_dir/debian-9.4.0-i386-netinst example/extract "$isodir"/multi_extent_8k.iso \ "$isodir"/test_dir/multi_extent_8k 2>&1 | less mount "$isodir"/multi_extent_8k.iso "$mounted" # No difference should show up. diff -r "$isodir"/test_dir/multi_extent_8k "$mounted" 2>&1 | less umount "$mounted" /bin/rm -r "$isodir"/test_dir/multi_extent_8k example/isofile.c : example/isofile "$isodir"/debian-9.4.0-i386-netinst.iso \ /.disk/mkisofs "$isodir"/test_dir/disk_mkisofs 2>&1 | less # Should bear the lengthy xorriso command by which the ISO was made view "$isodir"/test_dir/disk_mkisofs example/isofile2.c : # Put debian-9.4.0-i386-netinst onto DVD+RW # (On Linux kernel: Eject and load after burning, so the kernel learns about # the new state of medium and drive.) example/isofile2 /dev/sr4 md5sum.txt 2>&1 | less # Should bear a long list of MD5 sums and the file view md5sum.txt rm md5sum.txt example/isolist.c : example/isolist "$isodir"/multi_extent_8k.iso 2>&1 | less example/isolist "$isodir"/debian-9.4.0-i386-netinst.iso 2>&1 | less src/iso-read.c : # (valgrind reports memory leaks about function parse_options()) src/iso-read "$isodir"/multi_extent_8k.iso \ -e /multi_extent_file \ -o "$isodir"/test_dir/multi_extent_file mount "$isodir"/multi_extent_8k.iso "$mounted" diff "$isodir"/test_dir/multi_extent_file "$mounted"/multi_extent_file \ 2>&1 | less umount "$mounted" rm "$isodir"/test_dir/multi_extent_file src/cd-info.c : src/cd-info -C /dev/sr4 --iso9660 --dvd 2>&1 | less src/iso-info.c : src/iso-info -i "$isodir"/debian-9.4.0-i386-netinst.iso \ --iso9660 2>&1 | less test/testisocd2.c : test/testisocd2 2>&1 | less test/testisocd_joliet.c : test/testisocd_joliet 2>&1 | less test/testisocd.c : # Put DVD+RW into drive with highest number (e.g. /dev/sr4) (cd test ; ./testisocd ) 2>&1 | less --------------------------------------------------------------------------- Have a nice day :) Thomas