Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
> On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
> > On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
> > > On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
> > > > Other things like reading from remote sites, progress indicator, protecting your mounted disks, uncompressing on-the-fly, checking sha1 of the data and of the bmap file itself - are goodies, although important ones.
> > >
> > > Why sha1? If the check is there for security reasons, please use at least sha256.
> >
> > Should not be difficult to implement if there is demand.
>
> SHA-256 is used to create the signatures of other distributed files:
> https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM

FYI, I've implemented SHA-256 support and it will be the default for bmaptool soon. I also made sure other hash functions can be used as well (e.g., SHA-512).

--
Best Regards,
Artem Bityutskiy

--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
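A minimal sketch of what a per-range checksum with a selectable hash function could look like, using Python's hashlib. The function name range_checksum, its parameters and the 1 MiB read size are assumptions made for illustration only; this is not bmaptool's actual code.

```python
import hashlib
import io


def range_checksum(image, first_block, last_block, block_size=4096, algo="sha256"):
    """Hash blocks first_block..last_block (inclusive) of a seekable image.

    'algo' is any name accepted by hashlib.new(), e.g. "sha1", "sha256"
    or "sha512".
    """
    digest = hashlib.new(algo)
    image.seek(first_block * block_size)
    remaining = (last_block - first_block + 1) * block_size
    while remaining > 0:
        # Read in bounded chunks so that huge ranges do not exhaust memory.
        chunk = image.read(min(remaining, 1024 * 1024))
        if not chunk:
            break  # image is shorter than the range; hash what is there
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical 3-block image: one block of 'a', two blocks of 'b'.
    image = io.BytesIO(b"a" * 4096 + b"b" * 8192)
    print(range_checksum(image, 1, 2))
    print(range_checksum(image, 1, 2, algo="sha512"))
```

Switching from SHA-1 to SHA-256 or SHA-512 then only changes the 'algo' argument and the length of the hexadecimal string stored per range.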
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 12:24 +0200, Björn Persson wrote:
> Artem Bityutskiy wrote:
> > On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
> > > On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
> > > > On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
> > > > > On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
> > > > > > Other things like reading from remote sites, progress indicator, protecting your mounted disks, uncompressing on-the-fly, checking sha1 of the data and of the bmap file itself - are goodies, although important ones.
> > > > >
> > > > > Why sha1? If the check is there for security reasons, please use at least sha256.
> > > >
> > > > Should not be difficult to implement if there is demand.
> > >
> > > SHA-256 is used to create the signatures of other distributed files:
> > > https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
> > > Therefore if bmap is used it should also use at least SHA-256. Using SHA-1 has been recommended against for more than 7 years now:
> > > http://csrc.nist.gov/groups/ST/hash/policy_2006.html
> >
> > Sure, good point, thank you, I'll implement sha-256 support.
>
> Speaking of security, how is the integrity of the bmap file itself verified? A checksum is of no use if you don't know who generated the checksum. Fedora's checksum files are OpenPGP signed, as you can see in the one that Till linked to. I don't see a cryptographic signature in your example file. Are there detached signatures for the bmap files? And does Bmaptool verify the signatures?

I've implemented gpg signature verification. Now the bmap file can be gpg-signed and in this case bmaptool will verify the signature. Both Fedora-like clearsign gpg signatures and detached signatures are supported.

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
> You definitely need the fsync before doing the fiemap. We saw this on certain file systems including ext4 when adding fiemap support (efficient reading of holes) to cp. This is a bug in the fiemap interface IMHO in that it returns fairly useless data unless FIEMAP_FLAG_SYNC is specified. For a general utility like cp, we couldn't sync each file before copying (even only large files), so we restrict fiemap usage to files that have a different disk usage than apparent size and so probably contain holes.

FYI, I've just made sure that FIEMAP_FLAG_SYNC is used in the project.

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On Fri, 2013-08-16 at 09:51 -0500, Eric Sandeen wrote:
> by single file I meant a _single_ file, not the original file and a mapping file. :)

Oh, sorry, OK.

> I realize that you have a fully-fledged set of tools, and you're not looking for new directions, but I was thinking about encoding mapping info file data into a single file, with a tool to extract it again. That way there's nothing to get out of sync.

Well, I do wish to know about alternatives, or the better ways to do it, of course.

> Actually, now that I think about it qemu-img can do that already:
>
> (sorry, I'm getting a little off topic here, bear with me)
>
> # truncate --size=1g fsfile
> # mkfs.ext4 fsfile
> # cp fsfile --sparse=never fsfile.copy
> // fsfile is sparse; the copy is not.
> # du -hc fsfile*
> 49M fsfile
> 1.0G fsfile.copy
> # qemu-img convert -f raw -O qcow fsfile.copy fsfile.qcow
> // the qcow image now contains only data+mapping info,
> // no zero ranges:
> # du -h fsfile.qcow
> 832K fsfile.qcow
> // and can be re-extracted into a sparse file
> # qemu-img convert -O raw fsfile.qcow fsfile.copy2
> # du -hc fsfile.copy2
> 352K fsfile.copy2

So from 49M down to 352K, sparseness increased by 48M? Where did it come from? It must be that this command turned zero blocks into gaps. Like the zeroed out inode tables, etc.

> Ok, sorry for that diversion, but that's cool - the tool I want already exists, and I hadn't realized it. :)

Exactly this usage is not good enough for flashing purposes, because when flashing to a block device you have to flash the zeroes, you cannot skip them as you do for holes. But I am sure the qcow tools can save sparseness without turning zeroes into gaps, or at least this should not be too difficult to implement.

But in case of bmaptool I chose to keep the sparseness information in a separate file because I wanted to make sure the bmap is optional. Those who use Windows/Mac or do not want to install any additional tools could still use the old method of throwing the entire image at the target block device.

Another goal I had was to make the additional bmap file a human-readable text file.

But yes, having all in one file does have its own advantages.

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On Sat, 2013-08-17 at 17:09 +0100, Richard W.M. Jones wrote:
> On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
> > But then there's the issue of transporting these sparse files around. We have had the same problem in the past with large e2image metadata image files, which may be terabytes in length, with only gigabytes or megabytes of real data. e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still processes terabytes of zeros, and loses all notion of sparseness.
>
> xz preserves sparseness. We use it for preserving and compressing virt-sparsify'd images.

Right, this is a good solution for the problem area of just saving the sparseness and later restoring it. The problem area for bmaptool is a bit wider.

1. There are large images which are inherently sparse
2. They are distributed via ftp/http/etc servers

How do we let users flash these images
a) very quickly
b) easily
c) without breaking the old way of flashing (dd)

For the first part, we exploit the sparseness information, which is saved in the bmap file.

For the second part, we implement stream-reading directly from the remote server, stream-decompressing on-the-fly, and flashing in parallel. This rules out the "xz preserves sparseness" design.

For the third part, we keep the sparseness information in a separate file, which makes it entirely optional.

--
Best Regards,
Artem Bityutskiy
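The combination of stream-reading and skipping unmapped blocks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: copy_mapped and its parameters are hypothetical names, the source is any object with read() (for example a decompressing stream), and the destination is any seekable file object. It is not bmaptool's implementation.

```python
import io


def copy_mapped(src, dst, ranges, block_size=4096):
    """Copy only the mapped block ranges from a non-seekable stream to dst.

    'ranges' is a sorted list of non-overlapping (first, last) block pairs,
    both inclusive. Returns the number of bytes written to dst.
    """
    pos = 0       # current byte offset in the source stream
    written = 0
    for first, last in ranges:
        start = first * block_size
        length = (last - first + 1) * block_size
        # A stream cannot seek, so unmapped data is read and thrown away.
        while pos < start:
            skipped = src.read(min(start - pos, 1 << 20))
            if not skipped:
                raise EOFError("stream ended before a mapped range")
            pos += len(skipped)
        # Only mapped blocks are written; mapped blocks that happen to
        # contain zeroes are written too, unmapped blocks are left alone.
        dst.seek(start)
        while length > 0:
            chunk = src.read(min(length, 1 << 20))
            if not chunk:
                raise EOFError("stream ended inside a mapped range")
            dst.write(chunk)
            pos += len(chunk)
            written += len(chunk)
            length -= len(chunk)
    return written


if __name__ == "__main__":
    # Hypothetical 4-block image where only blocks 0 and 3 are mapped.
    src = io.BytesIO(b"A" * 4096 + b"\0" * 8192 + b"B" * 4096)
    dst = io.BytesIO()
    print(copy_mapped(src, dst, [(0, 0), (3, 3)]))
```

The time saved comes from the writes that are skipped; the whole stream still has to be read (and decompressed), which is why the approach helps most when the target device is the bottleneck.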
Re: Suggestion: bmap files and bmaptool
On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
> You definitely need the fsync before doing the fiemap. We saw this on certain file systems including ext4 when adding fiemap support (efficient reading of holes) to cp. This is a bug in the fiemap interface IMHO in that it returns fairly useless data unless FIEMAP_FLAG_SYNC is specified. For a general utility like cp, we couldn't sync each file before copying (even only large files), so we restrict fiemap usage to files that have a different disk usage than apparent size and so probably contain holes.

I see, thanks a lot for the suggestion. Just to make it clear, this is more like working around file-system bugs, which, as I guess, were present in the early days of FIEMAP.

I'll look into adding FIEMAP_FLAG_SYNC to Fiemap.py in bmap-tools, thanks!

--
Best Regards,
Artem Bityutskiy
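The heuristic quoted above (only bother with extent mapping when disk usage differs from the apparent size) can be approximated in a few lines. This is a sketch, not the coreutils code: the function names are made up for illustration, and it assumes Linux, where st_blocks is counted in 512-byte units.

```python
import os


def has_fewer_blocks_than_size(st_size, st_blocks):
    """True if the allocated space is smaller than the apparent file size.

    Assumes st_blocks is in 512-byte units, as on Linux.
    """
    return st_blocks * 512 < st_size


def probably_sparse(path):
    """Cheap hint that 'path' contains holes; it can give false answers
    on file systems that compress data or allocate lazily."""
    st = os.stat(path)
    return has_fewer_blocks_than_size(st.st_size, st.st_blocks)


if __name__ == "__main__":
    # A 1 GiB file occupying 49 MiB on disk (100352 blocks of 512 bytes).
    print(has_fewer_blocks_than_size(1 << 30, 100352))
```

Because the check uses only stat() data, it costs nothing compared with syncing the file, which is the trade-off described for cp.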
Re: Suggestion: bmap files and bmaptool
On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
> But then there's the issue of transporting these sparse files around. We have had the same problem in the past with large e2image metadata image files, which may be terabytes in length, with only gigabytes or megabytes of real data. e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still processes terabytes of zeros, and loses all notion of sparseness.

xz preserves sparseness. We use it for preserving and compressing virt-sparsify'd images.

> Another approach which might (?) be more robust, is to somehow encode that sparseness in a single file format that can be transported/compressed/copied w/o losing the sparseness information, and another tool to operate efficiently on that format at the destination, either by unpacking it to a normal sparse file or piping it to some other process.

qcow2 :-)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
Re: Suggestion: bmap files and bmaptool
Hi Eric,

thanks for the question. Sorry my answers contain extra information, but I assume that other people read this, and may benefit from the info. After all, I am trying to get people interested, this is a good tool IMO, and I would like to get more users and hopefully contributors. :-)

On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
> On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
> > # Make the image to be sparse
> > $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw Fedora-x86_64-19-20130627-sda.raw.sparse
> >
> > # Generate the bmap file
> > $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o Fedora-x86_64-19-20130627-sda.raw.sparse.bmap
>
> So this is the part that interests me . . .

Before going further, I want to quickly note that the Tizen image generation software uses the BmapCreate library API directly, instead of running the bmaptool command-line utility. The library comes with the bmap-tools project.

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py

> There seem to be two issues here; how do we efficiently (compress and) transport sparse files while retaining sparseness, and how do we efficiently operate on files which are already sparse.

Yes, our assumption is that sparseness gets lost as soon as the image is compressed or copied to the download server, or anywhere else.

The idea is that as soon as you generate the raw image on the build server, you generate the bmap file right away, _before_ the sparseness gets lost. The sparseness information is then saved in the bmap file. Then you can compress the image, copy it around, and lose the sparseness. The bmap file preserves it.

And of course this means that you should not modify the image later on, otherwise the bmap file becomes incorrect, and the checksums, which are inside the bmap file, will probably mismatch.

> For the latter, you're using your bmap tool to map what is hopefully a static file (via fibmap or fiemap, I guess?).

Yes, we use FIEMAP. Here is the python module which does the job:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py

> I haven't looked at how you've done it, but you do need to be very careful that the file is stable and quiesced on disk.

Right. We generate the bmap file on the _build server_, inside the tool which generates the raw image. At this point we do know we fully control the image, and no one touches it while we generate the bmap.

But yes, this is a good point, maybe I need to put it in the man page. Which, by the way, as I figure out now, needs to be somewhat updated:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1

> Mapping it this way can be fraught with errors if the file is changing, or has delalloc blocks, etc.

Good point. The Tizen image generator fsync()'s the file before creating the bmap, but I guess I have to do this in the BmapCreate library too, to be safe. Thanks!

> And of course getting the mapping wrong means data corruption.

Right. But as I said, we have been using bmaptool for a year now, and nothing which looks like a corruption has been reported so far. But the importance of fsync() is a very good point, I'll improve the library and make it explicitly fsync, and probably ignore the EROFS error, in case the file is R/O.

> If the file is known to be sparse, then going forward, using SEEK_HOLE / SEEK_DATA is probably the best approach.

Why are they better than FIEMAP? I did consider them, actually, but they are very new, and build servers tend to use older kernels, so I chose FIEMAP. I actually first used FIBMAP, but it is too slow, so I switched to FIEMAP.

> But then there's the issue of transporting these sparse files around. We have had the same problem in the past with large e2image metadata image files, which may be terabytes in length, with only gigabytes or megabytes of real data. e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still processes terabytes of zeros, and loses all notion of sparseness.

Right, but the scenario I keep in mind is that the bmap file is created at the _very_ beginning, and carried/published together with the image, as a stand-alone file with the same basename and .bmap extension.

The zeroes in the image can be very well compressed with xz, so people download/copy a lot less than Terabytes. And then people just run this command to re-create the original sparse file:

$ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img

This will decompress huge.img.xz on-the-fly and put it to a_sparse_copy.img. The a_sparse_copy.img file will be sparse.

Note, bmaptool auto-discovers the bmap file if it has a common basename with the image, and if it sits in the same directory, so this command can instead be:

bmaptool copy huge.img.xz a_sparse_copy

(analogy to cp from to). And of course, huge.img.xz can be, say: bmaptool copy
Re: Suggestion: bmap files and bmaptool
On 8/16/13 3:46 AM, Artem Bityutskiy wrote:
> > Another approach which might (?) be more robust, is to somehow encode that sparseness in a single file format that can be transported/compressed/copied w/o losing the sparseness information, and another tool to operate efficiently on that format at the destination, either by unpacking it to a normal sparse file or piping it to some other process.
>
> Err, not sure I fully understand, but it sounds like what bmap-tools project actually does.

by single file I meant a _single_ file, not the original file and a mapping file. :)

I realize that you have a fully-fledged set of tools, and you're not looking for new directions, but I was thinking about encoding mapping info file data into a single file, with a tool to extract it again. That way there's nothing to get out of sync.

Actually, now that I think about it qemu-img can do that already:

(sorry, I'm getting a little off topic here, bear with me)

# truncate --size=1g fsfile
# mkfs.ext4 fsfile
# cp fsfile --sparse=never fsfile.copy
// fsfile is sparse; the copy is not.
# du -hc fsfile*
49M fsfile
1.0G fsfile.copy
# qemu-img convert -f raw -O qcow fsfile.copy fsfile.qcow
// the qcow image now contains only data+mapping info,
// no zero ranges:
# du -h fsfile.qcow
832K fsfile.qcow
// and can be re-extracted into a sparse file
# qemu-img convert -O raw fsfile.qcow fsfile.copy2
# du -hc fsfile.copy2
352K fsfile.copy2

Ok, sorry for that diversion, but that's cool - the tool I want already exists, and I hadn't realized it. :)

So that's a decent option for encoding sparse files for efficient transfer, too.

> Piping is not implemented, because sparseness cannot be easily passed through a pipe.

Err, right ;)

-Eric

> > Just some thoughts...
>
> Thanks a lot for the feed-back!
Re: Suggestion: bmap files and bmaptool
On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
> # Make the image to be sparse
> $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw Fedora-x86_64-19-20130627-sda.raw.sparse
>
> # Generate the bmap file
> $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o Fedora-x86_64-19-20130627-sda.raw.sparse.bmap

So this is the part that interests me . . .

There seem to be two issues here; how do we efficiently (compress and) transport sparse files while retaining sparseness, and how do we efficiently operate on files which are already sparse.

For the latter, you're using your bmap tool to map what is hopefully a static file (via fibmap or fiemap, I guess?). I haven't looked at how you've done it, but you do need to be very careful that the file is stable and quiesced on disk. Mapping it this way can be fraught with errors if the file is changing, or has delalloc blocks, etc. And of course getting the mapping wrong means data corruption.

If the file is known to be sparse, then going forward, using SEEK_HOLE / SEEK_DATA is probably the best approach.

But then there's the issue of transporting these sparse files around. We have had the same problem in the past with large e2image metadata image files, which may be terabytes in length, with only gigabytes or megabytes of real data. e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still processes terabytes of zeros, and loses all notion of sparseness.

xfs_metadump worked around this by creating its own compact format describing a sparse file's data sparseness, which is unpacked into a normal sparse file by xfs_mdrestore.

More recently e2image gained something slightly similar, but used the existing qcow format to encode the sparseness. qemu-img convert to raw type turns it back into a normal sparse file readable by e2fsprogs tools.

So I guess your solution requires 2 pieces of information; the existing file, and the mapping file. Are there mechanisms to ensure that they are in sync?

Another approach which might (?) be more robust, is to somehow encode that sparseness in a single file format that can be transported/compressed/copied w/o losing the sparseness information, and another tool to operate efficiently on that format at the destination, either by unpacking it to a normal sparse file or piping it to some other process.

Just some thoughts...

-Eric
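For reference, the SEEK_HOLE / SEEK_DATA approach mentioned above can be sketched in Python with os.lseek (Linux, Python 3.3 or later). The function name data_ranges is an assumption for illustration, and the fsync call follows the caution about unstable files and delayed allocation rather than any documented requirement of lseek.

```python
import errno
import os


def data_ranges(path):
    """Return a list of (start, end) byte offsets, end exclusive, that the
    file system reports as data. Everything in between is a hole."""
    ranges = []
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.fsync(fd)  # flush pending writes before mapping the file
        except OSError as err:
            # Tolerate read-only file systems or files that cannot be synced.
            if err.errno not in (errno.EROFS, errno.EINVAL):
                raise
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a hole up to the end of the file
                raise
            # There is always an implicit hole at end-of-file, so this
            # call succeeds and returns an offset greater than 'start'.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            ranges.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return ranges
```

On a file system without hole support the whole file is reported as one data range, so the result is always safe to copy from, just not always minimal.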
Re: Suggestion: bmap files and bmaptool
On Tue, 2013-08-13 at 17:48 +0200, Jochen Schmitt wrote:
> On Tue, Aug 13, 2013 at 04:58:16PM +0300, Artem Bityutskiy wrote:
> > Hi Fedora developers,
> >
> > $ dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd
> > 4194304+0 records in
> > 4194304+0 records out
> > 2147483648 bytes (2.1 GB) copied, 799.487 s, 2.7 MB/s
>
> 1.) Seems to be a slow USB flash drive.

Sure, the typical slow, but large and cheap USB stick most people have.

> 2.) What time do you get if you specify a large bs value on the dd command? For example
> dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd bs=100M

It is faster:

2147483648 bytes (2.1 GB) copied, 529.886 s, 4.1 MB/s

bmaptool is still a bit faster than dd, because it sets the I/O scheduler to 'noop' and tweaks the queue size. It also does not hog the system, and it reacts to Ctrl-C almost immediately, unlike dd. I carefully took care of this.

But this all is not the point, these are additional goodies. The main point of bmaptool is in having the bmap file, and _then_ you get real speed-up, based on the sparseness of the original image.

Other things like reading from remote sites, progress indicator, protecting your mounted disks, uncompressing on-the-fly, checking sha1 of the data and of the bmap file itself - are goodies, although important ones.

This is not of interest for people who flash once a month. This is of interest for people who flash images more often than that, e.g., for testing, or in a production line (not Fedora case, though, I guess).

But even for people who flash occasionally it would be nice to run only one single command:

bmaptool copy URL-to-.xz-file /dev/my-block-device

which would do everything, very quickly (if the connectivity is not a bottleneck), and reliably.

Check the docs if you are interested, we have rpm/deb packages, publish tarballs, etc.

But the base principle is to utilize the inherent sparseness most raw images have a lot of, record this in the bmap file before it is lost, then publish the image in any form (compressed or not), and use the bmap file for fetching the sparseness information and writing/copying only the real data, and leaving out the zeroes.

And the larger the image is, and the more often you have to copy/flash it, the more useful the bmap file is.

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 09:31 +0300, Artem Bityutskiy wrote:
> But this all is not the point, these are additional goodies. The main point of bmaptool is in having the bmap file, and _then_ you get real speed-up, based on the sparseness of the original image.

Just as a rough demonstration, as I described in the original e-mail, I've reconstructed the original image sparseness using 'cp --sparse=always', and then created the bmap file, which is at the end of this e-mail.

You can see that only 33.6% of the image contains non-zero blocks, so with the bmap file you'd copy only 688.6MiB instead of 2.0GiB. This is where the speed would come from. This is why I got a seemingly unrealistically fast flashing speed in the original e-mail.

But sure, 33.6% is not the exact number, the real number will probably be slightly higher. The explanation why can be found here:

https://source.tizen.org/documentation/reference/bmaptool/introduction

in the "Reconstructing sparse files" section.

It is technically easy to add bmap file generation to the fedora image build tools and generate the bmap file along with the image. I could try to do this if people are interested.

Here is the bmap file I got:

<?xml version="1.0" ?>
<!-- This file contains the block map for an image file, which is basically a list of useful (mapped) block numbers in the image file. In other words, it lists only those blocks which contain data (boot sector, partition table, file-system metadata, files, directories, extents, etc). These blocks have to be copied to the target device. The other blocks do not contain any useful data and do not have to be copied to the target device.

     The block map an optimization which allows to copy or flash the image to the image quicker than copying of flashing the entire image. This is because with bmap less data is copied: <MappedBlocksCount> blocks instead of <BlocksCount> blocks.

     Besides the machine-readable data, this file contains useful commentaries which contain human-readable information like image size, percentage of mapped data, etc.

     The 'version' attribute is the block map file format version in the 'major.minor' format. The version major number is increased whenever an incompatible block map format change is made. The minor number changes in case of minor backward-compatible changes. -->

<bmap version="1.3">
    <!-- Image size in bytes: 2.0 GiB -->
    <ImageSize> 2147483648 </ImageSize>

    <!-- Size of a block in bytes -->
    <BlockSize> 4096 </BlockSize>

    <!-- Count of blocks in the image file -->
    <BlocksCount> 524288 </BlocksCount>

    <!-- Count of mapped blocks: 688.6 MiB or 33.6% -->
    <MappedBlocksCount> 176288 </MappedBlocksCount>

    <!-- The checksum of this bmap file. When it is calculated, the value of the SHA1 checksum has be zero (40 ASCII "0" symbols). -->
    <BmapFileSHA1> 75b48dc596a5e92d7cc4935d8fcc3a91c2e48b0f </BmapFileSHA1>

    <!-- The block map which consists of elements which may either be a range of blocks or a single block. The 'sha1' attribute (if present) is the SHA1 checksum of this blocks range. -->
    <BlockMap>
        <Range sha1="ce5cc4f31d623a34c01f791e6e5c1ee65456044b"> 0-15 </Range>
        <Range sha1="4ca6ee9c01354785502ba201a7c03cdd4ffbaedb"> 240-1967 </Range>
        <Range sha1="6f0869f38618931042495803f5b4c02f2d7d03e9"> 8224-9583 </Range>
        <Range sha1="b7650fa6a6bb703e30438ae22ca7a53c107f72ad"> 9632-11599 </Range>
        <Range sha1="4e04cad0efb9194e7e00d407caaf89c9bc19042d"> 11632-11775 </Range>
        <Range sha1="feebd3924ce0db12f1d7186577ca54985984ab69"> 33008-33023 </Range>
        <Range sha1="ce74e311d9c1316f03b3b5a64225b607196419b9"> 33136-33567 </Range>
        <Range sha1="1e6a1f19428243b09da296e5e5494d9443b90c64"> 34032-34095 </Range>
        <Range sha1="9b2b5218b79d1b51e00d67814a8eea41d5cc430f"> 34544-34591 </Range>
        <Range sha1="f36da167d811feba274928d5c293b569b9f8c0aa"> 34656-34671 </Range>
        <Range sha1="2ffd3f90346ba2d0a9841865ccca0adfa19b830e"> 35056-35087 </Range>
        <Range sha1="48a82266b1351a543a169df69274c7f82c743988"> 35568-35615 </Range>
        <Range sha1="71ffc02cadefd321af333fa4b8e54ca6a2a48fbe"> 35888-35903 </Range>
        <Range sha1="e3b51cd7304df8df536e9ebd6d84d6d0d516d0a0"> 36080-36255 </Range>
        <Range sha1="6939de8eed4c6c35691f6b6ccf805e7d7a141b70"> 36304-36511 </Range>
        <Range sha1="9d2b2e2c0cc771f2a2684317ffdaa8557d5962e1"> 36592-37071 </Range>
        <Range sha1="5ccf6bca3a0140f3f549b67cf114e8847d8fd82e"> 37104-1 </Range>
        <Range sha1="a3259735e513eb38f27058e6c87bd555d060615a"> 62656-62687 </Range>
        <Range sha1="27c3ac0865d0b12180395e2d083f7bbdc0440bb3"> 62992-63023 </Range>
        <Range sha1="bfe6dbe0c325a0a95a6cf6d62226287f4dd0b99d"> 64144-64175 </Range>
        <Range sha1="ac97574508a8bd42f39bed6ccbe0ab1e57cbfd54"> 65120-98559 </Range>
        <Range sha1="2351cae4967fe643d5003b6f526c811cd311df45"> 98672-100975 </Range>
        <Range sha1="6a47e91dc5be78d69761ab762fc2e3ae1ea9bc8e"> 101040-101615
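A rough sketch of how a version 1.3 bmap file like the one above could be read and sanity-checked with Python's standard library. The function names are hypothetical, and the self-checksum step is written from the comment in the file (hash the file with the checksum field replaced by 40 "0" characters); it assumes a complete, well-formed file and is not bmaptool's actual code.

```python
import hashlib
import xml.etree.ElementTree as ET


def mapped_blocks(bmap_text):
    """Sum the blocks listed in <Range> elements.

    A range is either "first-last" (inclusive) or a single block number.
    """
    root = ET.fromstring(bmap_text)
    total = 0
    for rng in root.find("BlockMap").findall("Range"):
        first, _, last = rng.text.strip().partition("-")
        total += int(last or first) - int(first) + 1
    return total


def verify_bmap_sha1(bmap_text):
    """Check <BmapFileSHA1> by hashing the text with the checksum zeroed."""
    root = ET.fromstring(bmap_text)
    recorded = root.find("BmapFileSHA1").text.strip()
    zeroed = bmap_text.replace(recorded, "0" * 40, 1)
    return hashlib.sha1(zeroed.encode("utf-8")).hexdigest() == recorded


if __name__ == "__main__":
    # Hypothetical two-range bmap; the checksum is computed the same way
    # the verifier expects it, so verification succeeds.
    template = ('<bmap version="1.3"><BmapFileSHA1> %s </BmapFileSHA1>'
                '<BlockMap><Range> 0-15 </Range><Range> 240 </Range>'
                '</BlockMap></bmap>')
    digest = hashlib.sha1((template % ("0" * 40)).encode("utf-8")).hexdigest()
    doc = template % digest
    print(mapped_blocks(doc), verify_bmap_sha1(doc))
```

With the header values quoted above, 176288 mapped blocks of 4096 bytes are 722075648 bytes, that is 688.6 MiB, or 33.6% of the 524288 blocks in the image.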
Re: Suggestion: bmap files and bmaptool
On 2013-8-13 at 9:52 PM, Artem Bityutskiy dedeki...@gmail.com wrote:
> Hi Fedora developers,
>
> I would like to suggest that you take a look at bmaptool, which you can use for flashing Fedora ISO images to USB sticks (or other block devices).

I read an article about this tool in Chinese months ago. I can help submit a package review tomorrow if you want, then let us test it.

Cheers.
Re: Suggestion: bmap files and bmaptool
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

Other things like reading from remote sites, a progress indicator, protecting your mounted disks, uncompressing on the fly, and checking the sha1 of the data and of the bmap file itself are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use at least sha256. You can also encode the checksum as base64, base85 or base91 to reduce the size of the bmap file.

But the basic principle is to exploit the inherent sparseness that most raw images have a lot of: record it in the bmap file before it is lost, then publish the image in any form (compressed or not), and use the bmap file to fetch the sparseness information and write/copy only the real data, leaving out the zeroes.

This does not sound safe, because it does not ensure that all data that should be zero actually is zero. It works well for unassigned file system blocks, but if there is a file containing zeroes in the file system (that is not a sparse file), it might not contain zeroes afterwards, as far as I understand bmap.

Regards
Till
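To put rough numbers on the encoding suggestion above, here is a small Python sketch that compares how many characters a digest takes in hex, base64 and base85. The function name encoded_lengths is made up for this illustration and is not part of bmaptool. base91 is left out because the Python standard library has no encoder for it.

```python
import base64
import hashlib


def encoded_lengths(data: bytes, algorithm: str = "sha256") -> dict:
    """Return the text length of one digest under several encodings.

    Hypothetical helper for illustration only; bmaptool itself may
    store checksums differently.
    """
    digest = hashlib.new(algorithm, data).digest()
    return {
        "hex": len(digest.hex()),
        "base64": len(base64.b64encode(digest)),
        "base85": len(base64.b85encode(digest)),
    }


if __name__ == "__main__":
    for name in ("sha1", "sha256"):
        print(name, encoded_lengths(b"example block data", name))
```

The saving per range is a couple of dozen characters at most, so it only matters for bmap files with very many ranges.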
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 16:35 +0800, Christopher Meng wrote:
On 2013-8-13 at 9:52 PM, Artem Bityutskiy dedeki...@gmail.com wrote:

Hi Fedora developers, I would like to suggest that you take a look at bmaptool, which you can use for flashing Fedora ISO images to USB sticks (or other block devices).

I read an article about this tool in Chinese some months ago. I can help submit a package review tomorrow if you want, so that we can test it.

Sure, please. The packaging may not be entirely correct; I am open to fixing it, of course. Also, please use the tip of the 'devel' branch in your experiments for now.

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

Other things like reading from remote sites, a progress indicator, protecting your mounted disks, uncompressing on the fly, and checking the sha1 of the data and of the bmap file itself are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use at least sha256.

Should not be difficult to implement if there is demand.

SHA-256 is used to create the signatures of other distributed files: https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
Therefore, if bmap is used, it should also use at least SHA-256. Using SHA-1 has been recommended against for more than 7 years now: http://csrc.nist.gov/groups/ST/hash/policy_2006.html

It works well for unassigned file system blocks, but if there is a file containing zeroes in the file system (that is not a sparse file), it might not contain zeroes afterwards, as far as I understand bmap.

It will; those blocks will be explicitly specified in the bmap file, and the zeroes will be copied. And this is exactly why I said that 'cp --sparse=always' does not generate the correct bmap file; I used it only for demonstration purposes.

I see.

Regards
Till
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

Other things like reading from remote sites, a progress indicator, protecting your mounted disks, uncompressing on the fly, and checking the sha1 of the data and of the bmap file itself are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use at least sha256.

Should not be difficult to implement if there is demand.

SHA-256 is used to create the signatures of other distributed files: https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
Therefore, if bmap is used, it should also use at least SHA-256. Using SHA-1 has been recommended against for more than 7 years now: http://csrc.nist.gov/groups/ST/hash/policy_2006.html

Sure, good point, thank you, I'll implement sha-256 support.

--
Best Regards,
Artem Bityutskiy
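As a rough illustration of what a per-range checksum with a selectable hash could look like, here is a minimal Python sketch. It is not bmaptool's code: the function name range_checksum and the 4096-byte block size are assumptions made for the example, and the thread does not say how bmaptool reads or hashes the data internally.

```python
import hashlib


def range_checksum(image, first_block, last_block,
                   block_size=4096, algorithm="sha256"):
    """Hash the inclusive block range [first_block, last_block] of a
    seekable binary file object.

    block_size and the function name are assumptions for illustration.
    Any algorithm name accepted by hashlib.new() can be passed.
    """
    digest = hashlib.new(algorithm)
    image.seek(first_block * block_size)
    remaining = (last_block - first_block + 1) * block_size
    while remaining > 0:
        chunk = image.read(min(remaining, 1024 * 1024))
        if not chunk:  # image is shorter than the range claims
            break
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()
```

Because the algorithm is only a name passed to hashlib.new(), switching from "sha1" to "sha256" or "sha512" does not change the reading loop.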
Re: Suggestion: bmap files and bmaptool
Artem Bityutskiy wrote:
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

Other things like reading from remote sites, a progress indicator, protecting your mounted disks, uncompressing on the fly, and checking the sha1 of the data and of the bmap file itself are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use at least sha256.

Should not be difficult to implement if there is demand.

SHA-256 is used to create the signatures of other distributed files: https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
Therefore, if bmap is used, it should also use at least SHA-256. Using SHA-1 has been recommended against for more than 7 years now: http://csrc.nist.gov/groups/ST/hash/policy_2006.html

Sure, good point, thank you, I'll implement sha-256 support.

Speaking of security, how is the integrity of the bmap file itself verified? A checksum is of no use if you don't know who generated the checksum. Fedora's checksum files are OpenPGP signed, as you can see in the one that Till linked to. I don't see a cryptographic signature in your example file. Are there detached signatures for the bmap files? And does Bmaptool verify the signatures?

--
Björn Persson
Sent from my computer.
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 12:24 +0200, Björn Persson wrote:

Speaking of security, how is the integrity of the bmap file itself verified?

This is not implemented, unfortunately. This is another thing which I would probably need to do, and it is a very good point. I will look at this after I do the SHA-256 thing.

A checksum is of no use if you don't know who generated the checksum. Fedora's checksum files are OpenPGP signed, as you can see in the one that Till linked to.

Right, the bmap file could also contain such a signature.

I don't see a cryptographic signature in your example file. Are there detached signatures for the bmap files?

Well, of course detached signatures can be generated.

And does Bmaptool verify the signatures?

But no, bmaptool does not verify them. And again, if there is real interest from the Fedora community, I will try to implement this faster (or accept someone's contribution :-))

Thanks for the feedback!

--
Best Regards,
Artem Bityutskiy
Re: Suggestion: bmap files and bmaptool
On 08/14/2013 02:21 AM, Artem Bityutskiy wrote:

I think I covered this part in the documentation. But here is a short description.

1. The bmap file should be created just after the image is generated.
2. The blocks where zeroes were explicitly written will be mapped to real sectors which will contain zeroes.
3. The blocks which were not explicitly written to will be unmapped.
4. Creation of the bmap file is done using the FIEMAP ioctl.
5. Only unmapped blocks will be omitted from the bmap file.

While on this, I should note that this works best on the ext4 file system. I did not test ext2/3, but they should work as well as ext4. Btrfs was also tested, but it is a little bit worse than ext4; I can explain why if someone is interested.

Have you looked at partimage? It sounds like this, except that it works on many different filesystems and doesn't need the blocks to be unmapped to compress it (i.e. it works on normal partitions as well as images).
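The mapped/unmapped distinction in the steps quoted above can be seen from user space with a short Python sketch. Note that this does not use the FIEMAP ioctl that bmaptool is described as using; it swaps in lseek() with SEEK_DATA/SEEK_HOLE, which asks the file system a similar question and is easier to call from Python. The function name mapped_ranges is made up for this example, and the result depends on the file system: one that does not track holes may report the whole file as data.

```python
import errno
import os


def mapped_ranges(path):
    """Return a list of (start, end) byte offsets, end exclusive, that
    the file system reports as data (not holes).

    Illustrative substitute for FIEMAP: uses SEEK_DATA / SEEK_HOLE.
    """
    ranges = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains
                    break
                raise
            # The end of the file counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            ranges.append((start, end))
            offset = end
    return ranges
```

Blocks that were explicitly written with zeroes show up inside the returned ranges, while blocks that were never written fall into the gaps between them.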
Re: Suggestion: bmap files and bmaptool
On Wed, 2013-08-14 at 21:16 -0700, Samuel Sieb wrote:
On 08/14/2013 02:21 AM, Artem Bityutskiy wrote:

I think I covered this part in the documentation. But here is a short description.

1. The bmap file should be created just after the image is generated.
2. The blocks where zeroes were explicitly written will be mapped to real sectors which will contain zeroes.
3. The blocks which were not explicitly written to will be unmapped.
4. Creation of the bmap file is done using the FIEMAP ioctl.
5. Only unmapped blocks will be omitted from the bmap file.

While on this, I should note that this works best on the ext4 file system. I did not test ext2/3, but they should work as well as ext4. Btrfs was also tested, but it is a little bit worse than ext4; I can explain why if someone is interested.

Have you looked at partimage? It sounds like this, except that it works on many different filesystems and doesn't need the blocks to be unmapped to compress it (i.e. it works on normal partitions as well as images).

No, I never saw this project before. Yeah, it sounds like it uses similar ideas to speed things up, but it has a different purpose and tries to understand the internal format of the file system. Hence it does not support ext4/btrfs, simply because, as I guess, they are too complex and are developed too quickly. It is just too difficult to maintain a parallel implementation in user space.

Bmaptool does not know anything about the internals of the file system. It does not care what FS is underneath. bmaptool simply uses the FIEMAP ioctl and asks the FS which blocks are mapped (used). This would not work for partimage, since it needs to know about all the blocks (superblock, all the other metadata blocks), not just the blocks belonging to a single file.

Now, why did I say that ext4 is the best one to use on the server (most probably ext2/3 are as good, but I did not verify)? Because ext4 is perfect at preserving the gaps. Even if you have a one-block gap, it will still account for it as unmapped. I have a test where I create random mapped areas, and ext4 keeps all the gaps. But Btrfs sometimes maps small 1-block gaps. This is related to its internal structure. So with Btrfs the bmap file becomes less ideal.

Anyway, thanks for letting me know about partimage.

--
Best Regards,
Artem Bityutskiy
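On the flashing side, the idea of writing only the mapped blocks can be sketched in a few lines of Python. This is an illustration under assumptions, not bmaptool's implementation: the function name copy_mapped, the 4096-byte block size and the (first, last) inclusive range tuples are chosen here to mirror the ranges in the example bmap file.

```python
def copy_mapped(src, dst, block_ranges, block_size=4096):
    """Copy only the listed inclusive block ranges from src to dst.

    src and dst are seekable binary file objects. Everything outside
    block_ranges is skipped and left untouched in dst. Names and the
    block size are assumptions for illustration.
    """
    copied = 0
    for first, last in block_ranges:
        offset = first * block_size
        remaining = (last - first + 1) * block_size
        src.seek(offset)
        dst.seek(offset)
        while remaining > 0:
            chunk = src.read(min(remaining, 1024 * 1024))
            if not chunk:  # source ended before the range did
                break
            dst.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied
```

Skipped blocks keep whatever the target held before, which is why the bmap file has to list every block that was explicitly written, including blocks of zeroes.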
Re: Suggestion: bmap files and bmaptool
On Tue, Aug 13, 2013 at 04:58:16PM +0300, Artem Bityutskiy wrote:

Hi Fedora developers,

$ dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB) copied, 799.487 s, 2.7 MB/s

1.) That seems to be a slow USB flash drive.

2.) What time do you get if you specify a large bs value on the dd command? For example:

dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd bs=100M

Best Regards:
Jochen Schmitt
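The dd figures quoted above can be checked with a little arithmetic. The Python sketch below assumes dd's default block size of 512 bytes, since no bs was given on the original command line; the function name dd_summary is made up for the example.

```python
def dd_summary(records, record_size, seconds):
    """Return (total bytes, throughput in MB/s with 1 MB = 10**6 bytes)."""
    total = records * record_size
    return total, total / seconds / 1e6


# 4194304 records of 512 bytes (dd's default bs) in 799.487 s
total, rate = dd_summary(4194304, 512, 799.487)

# With bs=100M (100 * 1024 * 1024 bytes) the same image needs far
# fewer write calls: some full records plus one partial record.
full_records, leftover = divmod(total, 100 * 1024 * 1024)

if __name__ == "__main__":
    print(total, round(rate, 1), full_records, leftover)
```

A larger bs cuts the number of write calls from about four million to about twenty, but whether that changes the elapsed time depends on the drive, which is what the question above is asking.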