Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy

On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
 On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
  On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
   On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
   
Other things like reading from remote sites, progress indicator,
protecting your mounted disks, uncompressing on-the-fly, checking sha1
of the data and of the bmap file itself - are goodies, although
important ones.
   
   Why sha1? If the check is there for security reasons, please use at
   least sha256.
  
  Should not be difficult to implement if there is demand.
 
 SHA-256 is used to create the signatures of other distributed files:
 https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM

FYI, I've implemented SHA-256 support and it will be the default for
bmaptool soon. I also made sure other hash functions can be used as well
(e.g., SHA-512).
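
For readers who want to see what a selectable hash function looks like in
practice, here is a minimal sketch using Python's standard hashlib module.
It is not bmaptool's actual code; the function name file_checksum and its
parameters are made up for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    'algorithm' is any name accepted by hashlib.new(), e.g. "sha256"
    or "sha512". The function name and defaults are hypothetical.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        # Read in chunks so that large images do not have to fit in memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum("some.img", "sha512") would switch the hash function
without touching the rest of the code.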

-- 
Best Regards,
Artem Bityutskiy

-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy

On Wed, 2013-08-14 at 12:24 +0200, Björn Persson wrote:
 Artem Bityutskiy wrote:
 On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
  On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
   On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

 Other things like reading from remote sites, progress
 indicator, protecting your mounted disks, uncompressing
 on-the-fly, checking sha1 of the data and of the bmap file
 itself - are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use
at least sha256.
   
   Should not be difficult to implement if there is demand.
  
  SHA-256 is used to create the signatures of other distributed files:
  https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
  
  Therefore, if bmap is used, it should also use at least SHA-256. Using
  SHA-1 has been recommended against for more than 7 years now:
  http://csrc.nist.gov/groups/ST/hash/policy_2006.html
 
 Sure, good point, thank you, I'll implement sha-256 support.
 
 Speaking of security, how is the integrity of the bmap file itself
 verified? A checksum is of no use if you don't know who generated the
 checksum. Fedora's checksum files are OpenPGP signed, as you can see in
 the one that Till linked to. I don't see a cryptographic signature in
 your example file. Are there detached signatures for the bmap files?
 And does Bmaptool verify the signatures?

I've implemented gpg signature verification.

Now the bmap file can be gpg-signed and in this case bmaptool will
verify the signature. Both Fedora-like clearsign gpg signatures and
detached signatures are supported.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy

On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
 You definitely need the fsync before doing the fiemap.
 We saw this on certain file systems including ext4 when adding
 fiemap support (efficient reading of holes) to cp.
 This is a bug in the fiemap interface IMHO in that it returns
 fairly useless data unless FIEMAP_FLAG_SYNC is specified.
 For a general utility like cp, we couldn't sync each file before copying
 (even only large files), so we restrict fiemap usage to files that
 have a different disk usage than apparent size and so probably contain holes.

FYI, I've just made sure that FIEMAP_FLAG_SYNC is used in the
project.
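
The check Pádraig describes for cp (only bother with fiemap when disk usage
differs from the apparent size) can be sketched in a few lines of Python.
This is an illustration of the heuristic only, not the coreutils code; the
function name probably_sparse is made up, and the sketch assumes Linux,
where st_blocks is counted in 512-byte units.

```python
import os


def probably_sparse(path):
    """Return True if the file occupies less disk space than its size.

    Heuristic only: compression or delayed allocation on some file
    systems can also make the disk usage smaller than the apparent size.
    """
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (true on Linux),
    # regardless of the file system block size.
    return st.st_blocks * 512 < st.st_size
```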

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy

On Fri, 2013-08-16 at 09:51 -0500, Eric Sandeen wrote:
 by single file I meant a _single_ file, not the original file and a mapping
 file.  :)

Oh, sorry, OK.

 I realize that you have a fully-fledged set of tools, and you're not looking
 for new directions, but I was thinking about encoding mapping info + file data
 into a single file, with a tool to extract it again.  That way there's nothing
 to get out of sync.

Well, I do wish to know about alternatives, or better ways to do it,
of course.

 Actually, now that I think about it qemu-img can do that already:
 (sorry, I'm getting a little off topic here, bear with me)
 
 # truncate --size=1g fsfile
 # mkfs.ext4 fsfile
 # cp fsfile --sparse=never fsfile.copy
 
 // fsfile is sparse; the copy is not.
 
 # du -hc fsfile*
 49M   fsfile
 1.0G  fsfile.copy
 
 # qemu-img convert -f raw -O qcow fsfile.copy fsfile.qcow
 
 // the qcow image now contains only data+mapping info,
 // no zero ranges:
 # du -h fsfile.qcow
 832K  fsfile.qcow
 
 // and can be re-extracted into a sparse file
 
 # qemu-img convert -O raw fsfile.qcow fsfile.copy2
 # du -hc fsfile.copy2 
 352K  fsfile.copy2

So from 49M down to 352K, sparseness increased by 48M? Where did it come
from? It must be that this command turned zero blocks into gaps, such as
the zeroed-out inode tables, etc.

 Ok, sorry for that diversion, but that's cool - the tool I want
 already exists, and I hadn't realized it.  :)

This exact usage is not good enough for flashing purposes, because
when flashing to a block device you have to flash the zeroes; you cannot
skip them as you do for holes.

But I am sure the qcow tools can save sparseness without turning zeroes
into gaps, or at least this should not be too difficult to
implement.

But in case of bmaptool I chose to keep the sparseness information in a
separate file because I wanted to make sure the bmap is optional. Those
who use Windows/Mac or do not want to install any additional tools could
still use the old method of throwing the entire image to the target
block device.

Another goal I had was to make the additional bmap file a
human-readable text file.

But yes, having all in one file does have its own advantages.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy

On Sat, 2013-08-17 at 17:09 +0100, Richard W.M. Jones wrote:
 On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
  But then there's the issue of transporting these sparse files
  around.  We have had the same problem in the past with large e2image
  metadata image files, which may be terabytes in length, with only
  gigabytes or megabytes of real data.  e2image _itself_ creates a
  sparse file, but bzipping it or rsyncing it still processes
  terabytes of zeros, and loses all notion of sparseness.
 
 xz preserves sparseness.  We use it for preserving and compressing
 virt-sparsify'd images.

Right, this is a good solution for the problem area of just saving the
sparseness and later restoring it.

The problem area for bmaptool is a bit wider.

1. There are large images which are inherently sparse
2. They are distributed via ftp/http/etc servers

How do we enable users to flash these images

a) very quickly
b) easily
c) without breaking the old way of flashing (dd)

For the first part, we exploit the sparseness information, which is
saved in the bmap file.

For the second part, we implement stream-reading directly from the
remote service, stream-decompressing on-the-fly, and flashing in
parallel. This rules out the "xz preserves sparseness" design.

For the third part, we keep the sparseness information in a separate
file which makes it entirely optional.
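
To make the idea concrete, here is a rough Python sketch of copying only the
mapped blocks from a forward-only stream (for example, a decompressed
download) to a seekable target. It is not bmaptool's implementation: the
function copy_mapped, its parameters, and the (first_block, last_block)
range format are assumptions made for illustration.

```python
def copy_mapped(src, dst, ranges, block_size=4096, chunk_size=1024 * 1024):
    """Copy only the mapped blocks from 'src' to 'dst'.

    'src' only needs read(); 'dst' needs seek() and write().
    'ranges' is a sorted, non-overlapping list of (first_block, last_block)
    tuples, both inclusive. All names here are hypothetical.
    """
    pos = 0  # current byte offset in the source stream
    for first, last in ranges:
        start = first * block_size
        remaining = (last - first + 1) * block_size
        # The stream cannot seek, so unmapped data is read and discarded.
        while pos < start:
            skipped = src.read(min(start - pos, chunk_size))
            if not skipped:
                raise EOFError("image ended before a mapped range")
            pos += len(skipped)
        # Mapped data is written at the same offset in the target.
        dst.seek(start)
        while remaining > 0:
            chunk = src.read(min(remaining, chunk_size))
            if not chunk:
                break  # the image may end inside the last mapped block
            dst.write(chunk)
            pos += len(chunk)
            remaining -= len(chunk)
```

Unmapped regions of the target are never written, which is where the time
saving comes from. Blocks of zeroes that are mapped are still written.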


-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy

On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
 You definitely need the fsync before doing the fiemap.
 We saw this on certain file systems including ext4 when adding
 fiemap support (efficient reading of holes) to cp.
 This is a bug in the fiemap interface IMHO in that it returns
 fairly useless data unless FIEMAP_FLAG_SYNC is specified.
 For a general utility like cp, we couldn't sync each file before copying
 (even only large files), so we restrict fiemap usage to files that
 have a different disk usage than apparent size and so probably contain holes.

I see, thanks a lot for the suggestion.

Just to make it clear, this is more like working around file-system
bugs, which, as I guess, were present in the early days of FIEMAP.

I'll look into adding FIEMAP_FLAG_SYNC to Fiemap.py in bmap-tools,
thanks!

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-18 Thread Pádraig Brady

On 08/16/2013 09:46 AM, Artem Bityutskiy wrote:
Hi Eric,

thanks for the question. Sorry my answers contain extra information, but
I assume that other people read this, and may benefit from the info.
After all, I am trying to get people interested, this is a good tool
IMO, and I would like to get more users and hopefully contributors. :-)

On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
# Make the image sparse
$ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw \
    Fedora-x86_64-19-20130627-sda.raw.sparse

# Generate the bmap file
$ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o \
    Fedora-x86_64-19-20130627-sda.raw.sparse.bmap

So this is the part that interests me . . .

Before going further, I want to quickly note that the Tizen image
generation software uses the BmapCreate library API directly, instead of
running the bmaptool command-line utility. The library comes with the
bmap-tools project.

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py

There seem to be two issues here; how do we efficiently (compress and)
transport sparse files while retaining sparseness, and how do we
efficiently operate on files which are already sparse.

Yes, our assumption is that sparseness gets lost as soon as the
image is compressed or copied to the download server, or anywhere else.

The idea is that as soon as you generate the raw image on the build
server, you generate the bmap file right away, _before_ the sparseness
gets lost.

The sparseness information is then saved in the bmap file. Then you can
compress the image, copy it around, and lose the sparseness. The bmap
file preserves it.

And of course this means that you should not modify the image later on,
otherwise the bmap file becomes incorrect, and the checksums, which are
inside the bmap file, will probably mismatch.

For the latter, you're using your bmap tool to map what is hopefully a
static file (via fibmap or fiemap, I guess?).

Yes, we use FIEMAP. Here is the python module which does the job:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py

I haven't looked at how you've done it, but you do need to be very
careful that the file is stable and quiesced on disk.

Right. We generate the bmap file on the _build server_, inside the tool
which generates the raw image. At this point we do know we fully control
the image, and no one touches it while we generate the bmap.

But yes, this is a good point, maybe I need to put it in the man page.
Which, by the way, as I realize now, needs to be somewhat updated:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1

Mapping it this way can be fraught with errors if the file is
changing, or has delalloc blocks, etc.

Good point. Tizen image generator fsync()'s the file before creating
bmap, but I guess I have to do this in the BmapCreate library too, to be
safe.

Thanks!

And of course getting the mapping wrong means data corruption.

Right. But as I said, we have been using bmaptool for a year now, and
nothing which looks like corruption has been reported so far.

But the importance of fsync() is a very good point, I'll improve the
library and make it explicitly fsync, and probably ignore the EROFS
error, in case the file is R/O.
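
A minimal Python sketch of that idea, assuming Linux (where fsync() works on
a descriptor opened read-only). This is not the BmapCreate code; the
function name sync_before_mapping is made up for illustration.

```python
import errno
import os


def sync_before_mapping(path):
    """Flush a file to disk so that its block mapping is stable."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    except OSError as err:
        # Assumption: on a read-only file system there is nothing to
        # flush, so EROFS is ignored; any other error is re-raised.
        if err.errno != errno.EROFS:
            raise
    finally:
        os.close(fd)
```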

If the file is known to be sparse, then going forward, using
SEEK_HOLE / SEEK_DATA is probably the best approach.

Why are they better than FIEMAP?

I did consider them, actually, but they are very new, and build servers
tend to use older kernels, so I chose FIEMAP. I actually first used
FIBMAP, but it is too slow, so I switched to FIEMAP.
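
For comparison, this is roughly what the SEEK_HOLE / SEEK_DATA approach looks
like from Python (os.SEEK_DATA and os.SEEK_HOLE are available since Python
3.3 on platforms that support them). It is only a sketch of the technique,
not code from bmap-tools; the function name data_ranges is made up.

```python
import errno
import os


def data_ranges(path):
    """Return a list of (start, end) byte ranges that contain data.

    'end' is exclusive. File systems without real hole support report
    the whole file as a single data range.
    """
    ranges = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # only a hole remains up to the end of the file
                raise
            # The end of the file counts as a hole, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            ranges.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return ranges
```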

But then there's the issue of transporting these sparse files around.
We have had the same problem in the past with large e2image metadata
image files, which may be terabytes in length, with only gigabytes or
megabytes of real data. e2image _itself_ creates a sparse file, but
bzipping it or rsyncing it still processes terabytes of zeros, and
loses all notion of sparseness.

Right, but the scenario I keep in mind is that the bmap file is created
at the _very_ beginning, and carried/published together with the image,
as a stand-alone file with the same basename and .bmap extension.

The zeroes in the image can be very well compressed with xz, so people
download/copy a lot less than Terabytes. And then people just run this
command to re-create the original sparse file:

$ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img

This will decompress huge.img.xz on-the-fly and write it to
a_sparse_copy.img. The a_sparse_copy.img file will be sparse.

Note, bmaptool auto-discovers the bmap file if it has a common
basename with the image, and if it sits in the same directory, so this
command can instead be:

bmaptool copy

Re: Suggestion: bmap files and bmaptool

2013-08-17 Thread Richard W.M. Jones

On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
 But then there's the issue of transporting these sparse files
 around.  We have had the same problem in the past with large e2image
 metadata image files, which may be terabytes in length, with only
 gigabytes or megabytes of real data.  e2image _itself_ creates a
 sparse file, but bzipping it or rsyncing it still processes
 terabytes of zeros, and loses all notion of sparseness.

xz preserves sparseness.  We use it for preserving and compressing
virt-sparsify'd images.

 Another approach which might (?) be more robust, is to somehow
 encode that sparseness in a single file format that can be
 transported/compressed/copied w/o losing the sparseness information,
 and another tool to operate efficiently on that format at the
 destination, either by unpacking it to a normal sparse file or
 piping it to some other process.

qcow2 :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

Re: Suggestion: bmap files and bmaptool

2013-08-16 Thread Artem Bityutskiy

Hi Eric,