Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
 On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
  On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
   On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
   
Other things like reading from remote sites, progress indicator,
protecting your mounted disks, uncompressing on-the-fly, checking sha1
of the data and of the bmap file itself - are goodies, although
important ones.
   
   Why sha1? If the check is there for security reasons, please use at
   least sha256.
  
  Should not be difficult to implement if there is demand.
 
 SHA-256 is used to create the signatures of other distributed files:
 https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM

FYI, I've implemented SHA-256 support and it will be the default for
bmaptool soon. I also made sure other hash functions can be used as well
(e.g., SHA-512).
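
A minimal sketch of what a selectable hash function could look like for the per-range checksums, assuming a seekable image opened in binary mode. The function name `range_checksum` and its parameters are hypothetical and are not taken from the bmap-tools sources; only `hashlib.new()` is a real standard-library call.

```python
import hashlib


def range_checksum(image, first_block, last_block, block_size=4096, algo="sha256"):
    """Hash blocks first_block..last_block (inclusive) of an open binary file.

    `algo` is any name accepted by hashlib.new(), e.g. "sha1", "sha256", "sha512".
    """
    digest = hashlib.new(algo)
    image.seek(first_block * block_size)
    remaining = (last_block - first_block + 1) * block_size
    while remaining > 0:
        # Read in bounded chunks so large ranges do not need to fit in memory.
        chunk = image.read(min(remaining, 1024 * 1024))
        if not chunk:
            break  # the image is shorter than the range claims
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()
```

Switching from SHA-256 to SHA-512 in this sketch is only a matter of passing `algo="sha512"`.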

-- 
Best Regards,
Artem Bityutskiy

-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 12:24 +0200, Björn Persson wrote:
 Artem Bityutskiy wrote:
 On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
  On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
   On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

 Other things like reading from remote sites, progress
 indicator, protecting your mounted disks, uncompressing
 on-the-fly, checking sha1 of the data and of the bmap file
 itself - are goodies, although important ones.

Why sha1? If the check is there for security reasons, please use
at least sha256.
   
   Should not be difficult to implement if there is demand.
  
  SHA-256 is used to create the signatures of other distributed files:
  https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
  
  Therefore, if bmap is used, it should also use at least SHA-256. Use of
  SHA-1 has been recommended against for more than 7 years now:
  http://csrc.nist.gov/groups/ST/hash/policy_2006.html
 
 Sure, good point, thank you, I'll implement sha-256 support.
 
 Speaking of security, how is the integrity of the bmap file itself
 verified? A checksum is of no use if you don't know who generated the
 checksum. Fedora's checksum files are OpenPGP signed, as you can see in
 the one that Till linked to. I don't see a cryptographic signature in
 your example file. Are there detached signatures for the bmap files?
 And does Bmaptool verify the signatures?

I've implemented gpg signature verification.

Now the bmap file can be gpg-signed and in this case bmaptool will
verify the signature. Both Fedora-like clearsign gpg signatures and
detached signatures are supported.
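
As a rough illustration of the two signature styles, the sketch below shells out to the `gpg` command-line tool. This is an assumption made for the example and says nothing about how bmaptool itself talks to GnuPG; the helper names are hypothetical. `gpg --verify FILE` checks a clearsigned file, and `gpg --verify SIGFILE FILE` checks a detached signature.

```python
import subprocess


def gpg_verify_command(bmap_path, detached_sig=None):
    """Build the gpg command line for a clearsigned or detached-signed bmap."""
    cmd = ["gpg", "--verify"]
    if detached_sig is not None:
        # Detached signature: the signature file comes first, then the data.
        cmd.append(detached_sig)
    cmd.append(bmap_path)
    return cmd


def verify_bmap_signature(bmap_path, detached_sig=None):
    """Return True if gpg accepts the signature (requires gpg and the signer's key)."""
    result = subprocess.run(gpg_verify_command(bmap_path, detached_sig),
                            capture_output=True)
    return result.returncode == 0
```

A zero exit status only means the signature matches some key in the local keyring, so the caller still has to decide which keys to trust.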

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-09-17 Thread Artem Bityutskiy
On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
 You definitely need the fsync before doing the fiemap.
 We saw this on certain file systems including ext4 when adding
 fiemap support (efficient reading of holes) to cp.
 This is a bug in the fiemap interface IMHO in that it returns
 fairly useless data unless FIEMAP_FLAG_SYNC is specified.
 For a general utility like cp, we couldn't sync each file before copying
 (even only large files), so we restrict fiemap usage to files that
 have a different disk usage than apparent size and so probably contain holes.

FYI, I've just made sure that FIEMAP_FLAG_SYNC is used in the
project.
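
For readers unfamiliar with the flag, here is a hedged sketch of a FIEMAP request with FIEMAP_FLAG_SYNC set, written against the Linux `struct fiemap` layout. It is not a copy of Fiemap.py; the helper names are made up for illustration, and the ioctl itself only works on Linux file systems that implement FIEMAP.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B      # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x00000001   # ask the kernel to sync the file before mapping

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
_HEADER = struct.Struct("=QQLLLL")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# 2 reserved u64, fe_flags, 3 reserved u32
_EXTENT = struct.Struct("=QQQQQLLLL")


def build_fiemap_request(start, length, extent_count):
    """Return a zeroed request buffer with FIEMAP_FLAG_SYNC set in the header."""
    header = _HEADER.pack(start, length, FIEMAP_FLAG_SYNC, 0, extent_count, 0)
    return bytearray(header + bytes(_EXTENT.size * extent_count))


def mapped_extents(fd, start, length, extent_count=32):
    """Return (logical_offset, length) pairs for up to extent_count extents."""
    buf = build_fiemap_request(start, length, extent_count)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf, True)  # the kernel fills buf in place
    mapped = _HEADER.unpack_from(buf)[3]
    extents = []
    for i in range(mapped):
        fields = _EXTENT.unpack_from(buf, _HEADER.size + i * _EXTENT.size)
        extents.append((fields[0], fields[2]))
    return extents
```

A real caller would loop until it sees an extent flagged as the last one; this sketch returns only the first batch.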

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy
On Fri, 2013-08-16 at 09:51 -0500, Eric Sandeen wrote:
 by single file I meant a _single_ file, not the original file and a mapping
 file.  :)

Oh, sorry, OK.

 I realize that you have a fully-fledged set of tools, and you're not looking
 for new directions, but I was thinking about encoding mapping info and file data
 into a single file, with a tool to extract it again.  That way there's nothing
 to get out of sync.

Well, I do wish to know about alternatives, or the better ways to do it,
of course.

 Actually, now that I think about it qemu-img can do that already:
 (sorry, I'm getting a little off topic here, bear with me)
 
 # truncate --size=1g fsfile
 # mkfs.ext4 fsfile
 # cp fsfile --sparse=never fsfile.copy
 
 // fsfile is sparse; the copy is not.
 
 # du -hc fsfile*
 49M   fsfile
 1.0G  fsfile.copy
 
 # qemu-img convert -f raw -O qcow fsfile.copy fsfile.qcow
 
 // the qcow image now contains only data+mapping info,
 // no zero ranges:
 # du -h fsfile.qcow
 832K  fsfile.qcow
 
 // and can be re-extracted into a sparse file
 
 # qemu-img convert -O raw fsfile.qcow fsfile.copy2
 # du -hc fsfile.copy2 
 352K  fsfile.copy2

So from 49M down to 352K, sparseness increased by 48M? Where did it come
from? It must be that this command turned zero blocks into gaps, like the
zeroed-out inode tables, etc.

 Ok, sorry for that diversion, but that's cool - the tool I want
 already exists, and I hadn't realized it.  :)

This exact usage is not good enough for flashing purposes, because
when flashing to a block device you have to flash the zeroes; you cannot
skip them as you do for holes.

But I am sure the qcow tools can save sparseness without turning zeroes
into gaps, or at least this should not be too difficult to
implement.

But in case of bmaptool I chose to keep the sparseness information in a
separate file because I wanted to make sure the bmap is optional. Those
who use Windows/Mac or do not want to install any additional tools could
still use the old method of throwing the entire image to the target
block device.

Another goal I had was to make the additional bmap file a
human-readable text file.

But yes, having all in one file does have its own advantages.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy
On Sat, 2013-08-17 at 17:09 +0100, Richard W.M. Jones wrote:
 On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
  But then there's the issue of transporting these sparse files
  around.  We have had the same problem in the past with large e2image
  metadata image files, which may be terabytes in length, with only
  gigabytes or megabytes of real data.  e2image _itself_ creates a
  sparse file, but bzipping it or rsyncing it still processes
  terabytes of zeros, and loses all notion of sparseness.
 
 xz preserves sparseness.  We use it for preserving and compressing
 virt-sparsify'd images.

Right, this is a good solution for the problem area of just saving the
sparseness and later restoring it.

The problem area for bmaptool is a bit wider.

1. There are large images which are inherently sparse
2. They are distributed via ftp/http/etc servers

How do we enable users to flash these images

a) very quickly
b) easily
c) without breaking the old way of flashing (dd)

For the first part, we exploit the sparseness information, which is
saved in the bmap file.

For the second part, we implement stream-reading directly from the
remote service, stream-decompressing on-the-fly, and flashing in
parallel. This rules out the "xz preserves sparseness" design.

For the third part, we keep the sparseness information in a separate
file which makes it entirely optional.
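
To make "stream-decompressing on-the-fly" concrete, here is a small sketch using Python's standard `lzma` module. It assumes `source` is any file-like object that yields compressed bytes (for example an HTTP response body) and `sink` is a writable binary file. The function name is hypothetical, and bmaptool's real pipeline may be organized quite differently.

```python
import lzma


def stream_decompress_xz(source, sink, chunk_size=64 * 1024):
    """Decompress an .xz stream chunk by chunk; return the number of bytes written."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    written = 0
    while not decompressor.eof:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # the source ended (possibly before the xz stream was complete)
        data = decompressor.decompress(chunk)
        sink.write(data)
        written += len(data)
    return written
```

Because the compressed image is never stored locally, nothing on disk can carry sparseness information; it has to arrive separately, which is the role of the bmap file.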


-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-19 Thread Artem Bityutskiy
On Sun, 2013-08-18 at 16:43 +0100, Pádraig Brady wrote:
 You definitely need the fsync before doing the fiemap.
 We saw this on certain file systems including ext4 when adding
 fiemap support (efficient reading of holes) to cp.
 This is a bug in the fiemap interface IMHO in that it returns
 fairly useless data unless FIEMAP_FLAG_SYNC is specified.
 For a general utility like cp, we couldn't sync each file before copying
 (even only large files), so we restrict fiemap usage to files that
 have a different disk usage than apparent size and so probably contain holes.
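
The heuristic described for cp can be sketched as follows. This is an illustration only, not the coreutils code: it treats a file as "probably sparse" when its allocated size is smaller than its apparent size, which is one reading of "different disk usage than apparent size". Both function names are hypothetical.

```python
import os


def looks_sparse(apparent_size, allocated_blocks):
    """Compare allocated space (st_blocks, in 512-byte units) with apparent size."""
    return allocated_blocks * 512 < apparent_size


def file_looks_sparse(path):
    """Apply the heuristic to a file on disk using a single stat() call."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

The check costs one `stat()` call, which is why it suits a general-purpose tool that cannot afford to sync every file.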

I see, thanks a lot for the suggestion.

Just to make it clear, this is more like working around file-system
bugs, which, I guess, were present in the early days of FIEMAP.

I'll look into adding FIEMAP_FLAG_SYNC to Fiemap.py in bmap-tools,
thanks!

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-18 Thread Pádraig Brady
On 08/16/2013 09:46 AM, Artem Bityutskiy wrote:
 Hi Eric,
 
 thanks for the question. Sorry my answers contain extra information, but
 I assume that other people read this, and may benefit from the info.
 After all, I am trying to get people interested, this is a good tool
 IMO, and I would like to get more users and hopefully contributors. :-)
 
 On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
 On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
 # Make the image to be sparse
 $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw
 Fedora-x86_64-19-20130627-sda.raw.sparse

 # Generate the bmap file
 $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o
 Fedora-x86_64-19-20130627-sda.raw.sparse.bmap

 So this is the part that interests me . . .
 
 Before going further, I want to quickly note that the Tizen image
 generation software uses the BmapCreate library API directly, instead of
 running the bmaptool command-line utility. The library comes with the
 bmap-tools project.
 
 http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py
 
 There seem to be two issues here; how do we efficiently (compress and)
 transport sparse files while retaining sparseness, and how do we
 efficiently operate on files which are already sparse.
 
 Yes, our assumption is that sparseness gets lost as soon as the
 image is compressed or copied to the download server, or anywhere else.
 
 The idea is that as soon as you generate the raw image on the build
 server, you generate the bmap file right away, _before_ the sparseness
 gets lost.
 
 The sparseness information is then saved in the bmap file. Then you can
 compress the image, copy it around, and lose the sparseness. The bmap
 file preserves it.
 
 And of course this means that you should not modify the image later on,
 otherwise the bmap file becomes incorrect, and the checksums, which are
 inside the bmap file, will probably mismatch.
 
 For the latter, you're using your bmap tool to map what is hopefully a
 static file (via fibmap or fiemap, I guess?).
 
 Yes, we use FIEMAP. Here is the python module which does the job:
 
 http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py
 
 I haven't looked at how you've done it, but you do need to be very
 careful that the file is stable and quiesced on disk.
 
 Right. We generate the bmap file on the _build server_, inside the tool
 which generates the raw image. At this point we do know we fully control
 the image, and no one touches it while we generate the bmap.
 
 But yes, this is a good point, maybe I need to put it in the man page.
 Which, by the way, as I figure out now, needs to be somewhat updated:
 
 http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1
 
   Mapping it this way can be fraught with errors if the file is
 changing, or has delalloc blocks, etc.
 
 Good point. Tizen image generator fsync()'s the file before creating
 bmap, but I guess I have to do this in the BmapCreate library too, to be
 safe.
 
 Thanks!
 
   And of course getting the mapping wrong means data corruption.
 
 Right. But as I said, we have been using bmaptool for a year now, and nothing
 which looks like corruption has been reported so far.
 
 But the importance of fsync() is a very good point, I'll improve the
 library and make it explicitly fsync, and probably ignore the EROFS
 error, in case the file is R/O.
 
   If the file is known to be sparse, then going forward, using
 SEEK_HOLE / SEEK_DATA is probably the best approach.
 
 Why are they better than FIEMAP?
 
 I did consider them, actually, but they are very new, and build servers
 tend to use older kernels, so I chose FIEMAP. I actually first used
 FIBMAP, but it is too slow, so I switched to FIEMAP.

 But then there's the issue of transporting these sparse files around.
 We have had the same problem in the past with large e2image metadata
 image files, which may be terabytes in length, with only gigabytes or
 megabytes of real data.  e2image _itself_ creates a sparse file, but
 bzipping it or rsyncing it still processes terabytes of zeros, and
 loses all notion of sparseness.
 
 Right, but the scenario I keep in mind is that the bmap file is created
 at the _very_ beginning, and carried/published together with the image,
 as a stand-alone file with the same basename and .bmap extension.
 
 The zeroes in the image can be very well compressed with xz, so people
 download/copy a lot less than Terabytes. And then people just run this
 command to re-create the original sparse file:
 
 $ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img
 
 This will decompress huge.img.xz on-the-fly and put it to
 a_sparse_copy.img. The a_sparse_copy.img file will be sparse.
 
 Note, bmaptool auto-discovers the bmap file if it has a common
 basename with the image, and if it sits in the same directory, so this
 command can instead be:
 
 bmaptool copy 

Re: Suggestion: bmap files and bmaptool

2013-08-17 Thread Richard W.M. Jones
On Thu, Aug 15, 2013 at 12:34:26PM -0500, Eric Sandeen wrote:
 But then there's the issue of transporting these sparse files
 around.  We have had the same problem in the past with large e2image
 metadata image files, which may be terabytes in length, with only
 gigabytes or megabytes of real data.  e2image _itself_ creates a
 sparse file, but bzipping it or rsyncing it still processes
 terabytes of zeros, and loses all notion of sparseness.

xz preserves sparseness.  We use it for preserving and compressing
virt-sparsify'd images.

 Another approach which might (?) be more robust, is to somehow
 encode that sparseness in a single file format that can be
 transported/compressed/copied w/o losing the sparseness information,
 and another tool to operate efficiently on that format at the
 destination, either by unpacking it to a normal sparse file or
 piping it to some other process.

qcow2 :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

Re: Suggestion: bmap files and bmaptool

2013-08-16 Thread Artem Bityutskiy
Hi Eric,

thanks for the question. Sorry my answers contain extra information, but
I assume that other people read this, and may benefit from the info.
After all, I am trying to get people interested, this is a good tool
IMO, and I would like to get more users and hopefully contributors. :-)

On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
 On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
  # Make the image to be sparse
  $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw
 Fedora-x86_64-19-20130627-sda.raw.sparse
  
  # Generate the bmap file
  $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o
 Fedora-x86_64-19-20130627-sda.raw.sparse.bmap
 
 So this is the part that interests me . . .

Before going further, I want to quickly note that the Tizen image
generation software uses the BmapCreate library API directly, instead of
running the bmaptool command-line utility. The library comes with the
bmap-tools project.

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py

 There seem to be two issues here; how do we efficiently (compress and)
 transport sparse files while retaining sparseness, and how do we
 efficiently operate on files which are already sparse.

Yes, our assumption is that sparseness gets lost as soon as the
image is compressed or copied to the download server, or anywhere else.

The idea is that as soon as you generate the raw image on the build
server, you generate the bmap file right away, _before_ the sparseness
gets lost.

The sparseness information is then saved in the bmap file. Then you can
compress the image, copy it around, and lose the sparseness. The bmap
file preserves it.

And of course this means that you should not modify the image later on,
otherwise the bmap file becomes incorrect, and the checksums, which are
inside the bmap file, will probably mismatch.

 For the latter, you're using your bmap tool to map what is hopefully a
 static file (via fibmap or fiemap, I guess?).

Yes, we use FIEMAP. Here is the python module which does the job:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py

 I haven't looked at how you've done it, but you do need to be very
 careful that the file is stable and quiesced on disk.

Right. We generate the bmap file on the _build server_, inside the tool
which generates the raw image. At this point we do know we fully control
the image, and no one touches it while we generate the bmap.

But yes, this is a good point, maybe I need to put it in the man page.
Which, by the way, as I figure out now, needs to be somewhat updated:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1

   Mapping it this way can be fraught with errors if the file is
 changing, or has delalloc blocks, etc.

Good point. Tizen image generator fsync()'s the file before creating
bmap, but I guess I have to do this in the BmapCreate library too, to be
safe.

Thanks!

   And of course getting the mapping wrong means data corruption.

Right. But as I said, we have been using bmaptool for a year now, and nothing
which looks like corruption has been reported so far.

But the importance of fsync() is a very good point, I'll improve the
library and make it explicitly fsync, and probably ignore the EROFS
error, in case the file is R/O.
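
The fsync step described here might look roughly like the sketch below. It is a guess at the shape of such a helper, not the BmapCreate code: the name `sync_before_mapping` is hypothetical, and the file is opened read-only on the assumption that the caller only needs to read the image.

```python
import errno
import os


def sync_before_mapping(path):
    """fsync the file so its block mapping is stable; tolerate EROFS.

    Returns True if the sync happened, False if it was skipped because the
    file system rejected it as read-only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    except OSError as err:
        if err.errno != errno.EROFS:
            raise
        return False
    finally:
        os.close(fd)
    return True
```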

   If the file is known to be sparse, then going forward, using
 SEEK_HOLE / SEEK_DATA is probably the best approach.

Why are they better than FIEMAP?

I did consider them, actually, but they are very new, and build servers
tend to use older kernels, so I chose FIEMAP. I actually first used
FIBMAP, but it is too slow, so I switched to FIEMAP.
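
For comparison, this is roughly what the SEEK_DATA / SEEK_HOLE approach looks like from Python. It is a sketch under the assumption of a Linux kernel and file system that support these `lseek()` modes; where support is missing, the kernel typically reports the whole file as a single data range. The function name is made up for the example.

```python
import errno
import os


def data_ranges(path):
    """Return a list of (start, end) byte offsets that contain data."""
    ranges = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a hole up to end of file
                raise
            # The end of this data run is the start of the next hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            ranges.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return ranges
```

Unlike FIEMAP, this interface reports data and holes rather than physical extents, so the caller never sees on-disk block addresses.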
 
 But then there's the issue of transporting these sparse files around.
 We have had the same problem in the past with large e2image metadata
 image files, which may be terabytes in length, with only gigabytes or
 megabytes of real data.  e2image _itself_ creates a sparse file, but
 bzipping it or rsyncing it still processes terabytes of zeros, and
 loses all notion of sparseness.

Right, but the scenario I keep in mind is that the bmap file is created
at the _very_ beginning, and carried/published together with the image,
as a stand-alone file with the same basename and .bmap extension.

The zeroes in the image can be very well compressed with xz, so people
download/copy a lot less than Terabytes. And then people just run this
command to re-create the original sparse file:

$ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img

This will decompress huge.img.xz on-the-fly and put it to
a_sparse_copy.img. The a_sparse_copy.img file will be sparse.

Note, bmaptool auto-discovers the bmap file if it has a common
basename with the image, and if it sits in the same directory, so this
command can instead be:

bmaptool copy huge.img.xz a_sparse_copy

(analogy to cp from to).
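
One plausible way to implement such auto-discovery is sketched below. The search order (strip one extension at a time and append .bmap) is an assumption made for illustration; the message only says that the bmap file shares a basename with the image and sits in the same directory. Both function names are hypothetical.

```python
import os


def candidate_bmap_paths(image_path):
    """List bmap file names to try, from most to least specific."""
    directory, name = os.path.split(image_path)
    candidates = []
    while True:
        candidates.append(os.path.join(directory, name + ".bmap"))
        stem, ext = os.path.splitext(name)
        if not ext:
            break  # no more extensions to strip
        name = stem
    return candidates


def find_bmap(image_path):
    """Return the first candidate that exists on disk, or None."""
    for candidate in candidate_bmap_paths(image_path):
        if os.path.isfile(candidate):
            return candidate
    return None
```

For `huge.img.xz` this would try `huge.img.xz.bmap`, then `huge.img.bmap`, then `huge.bmap`.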

And of course, huge.img.xz can be, say:

bmaptool copy 

Re: Suggestion: bmap files and bmaptool

2013-08-16 Thread Eric Sandeen
On 8/16/13 3:46 AM, Artem Bityutskiy wrote:
 Another approach which might (?) be more robust, is to somehow encode
  that sparseness in a single file format that can be
  transported/compressed/copied w/o losing the sparseness information,
  and another tool to operate efficiently on that format at the
  destination, either by unpacking it to a normal sparse file or piping
  it to some other process.
 Err, not sure I fully understand, but it sounds like what bmap-tools
 project actually does.

by single file I meant a _single_ file, not the original file and a mapping
file.  :)

I realize that you have a fully-fledged set of tools, and you're not looking
for new directions, but I was thinking about encoding mapping info and file data
into a single file, with a tool to extract it again.  That way there's nothing
to get out of sync.

Actually, now that I think about it qemu-img can do that already:
(sorry, I'm getting a little off topic here, bear with me)

# truncate --size=1g fsfile
# mkfs.ext4 fsfile
# cp fsfile --sparse=never fsfile.copy

// fsfile is sparse; the copy is not.

# du -hc fsfile*
49M fsfile
1.0G fsfile.copy

# qemu-img convert -f raw -O qcow fsfile.copy fsfile.qcow

// the qcow image now contains only data+mapping info,
// no zero ranges:
# du -h fsfile.qcow
832K fsfile.qcow

// and can be re-extracted into a sparse file

# qemu-img convert -O raw fsfile.qcow fsfile.copy2
# du -hc fsfile.copy2 
352K fsfile.copy2

Ok, sorry for that diversion, but that's cool - the tool I want
already exists, and I hadn't realized it.  :)

So that's a decent option for encoding sparse files for efficient
transfer, too.

 Piping is not implemented, because sparseness cannot be easily passed
 though a pipe.

Err, right ;)

-Eric

  Just some thoughts...
  
 Thanks a lot for the feed-back!


Re: Suggestion: bmap files and bmaptool

2013-08-15 Thread Eric Sandeen
On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
 # Make the image to be sparse
 $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw 
 Fedora-x86_64-19-20130627-sda.raw.sparse
 
 # Generate the bmap file
 $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o 
 Fedora-x86_64-19-20130627-sda.raw.sparse.bmap

So this is the part that interests me . . . 

There seem to be two issues here; how do we efficiently (compress and) 
transport sparse files while retaining sparseness, and how do we efficiently 
operate on files which are already sparse.

For the latter, you're using your bmap tool to map what is hopefully a static 
file (via fibmap or fiemap, I guess?).

I haven't looked at how you've done it, but you do need to be very careful that 
the file is stable and quiesced on disk.  Mapping it this way can be fraught with 
errors if the file is changing, or has delalloc blocks, etc.  And of course 
getting the mapping wrong means data corruption.  If the file is known to be 
sparse, then going forward, using SEEK_HOLE / SEEK_DATA is probably the best 
approach.

But then there's the issue of transporting these sparse files around.  We have 
had the same problem in the past with large e2image metadata image files, which 
may be terabytes in length, with only gigabytes or megabytes of real data.  
e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still 
processes terabytes of zeros, and loses all notion of sparseness.

xfs_metadump worked around this by creating its own compact format describing a 
sparse file's data and sparseness, which is unpacked into a normal sparse file 
by xfs_mdrestore.

More recently e2image gained something slightly similar, but used the existing 
qcow format to encode the sparseness.  qemu-img convert to raw type turns 
it back into a normal sparse file readable by e2fsprogs tools.

So I guess your solution requires 2 pieces of information; the existing file, 
and the mapping file.  Are there mechanisms to ensure that they are in sync?

Another approach which might (?) be more robust, is to somehow encode that 
sparseness in a single file format that can be transported/compressed/copied 
w/o losing the sparseness information, and another tool to operate efficiently 
on that format at the destination, either by unpacking it to a normal sparse 
file or piping it to some other process.

Just some thoughts...

-Eric

Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Tue, 2013-08-13 at 17:48 +0200, Jochen Schmitt wrote:
 On Tue, Aug 13, 2013 at 04:58:16PM +0300, Artem Bityutskiy wrote:
  Hi Fedora developers,
 
  $ dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd
  4194304+0 records in
  4194304+0 records out
  2147483648 bytes (2.1 GB) copied, 799.487 s, 2.7 MB/s
 
 1.) Seems to be a slow USB flash drive.

Sure, it is the typical slow but large and cheap USB stick most people have.

 2.) What time do you get if you specify a large bs value
 on the dd command? For example:
 
 dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd bs=100M

It is faster: 2147483648 bytes (2.1 GB) copied, 529.886 s, 4.1 MB/s


bmaptool is still a bit faster than dd, because it sets the I/O
scheduler to 'noop' and tweaks the queue size. It also does not hog the
system, and it reacts to Ctrl-C almost immediately, unlike dd. I
took care to make sure of this.

But this all is not the point, these are additional goodies. The main
point of bmaptool is in having the bmap file, and _then_ you get real
speed-up, based on the sparseness of the original image.

Other things like reading from remote sites, progress indicator,
protecting your mounted disks, uncompressing on-the-fly, checking sha1
of the data and of the bmap file itself - are goodies, although
important ones.

This is not of interest for people who flash once a month. This is of
interest for people who flash images more often than that, e.g., for
testing, or in a production line (not Fedora case, though, I guess).

But even for people who flash occasionally it would be nice to run only
one single command:

bmaptool copy URL-to-.xz-file /dev/my-block-device

which would do everything, very quickly (if the connectivity is not a
bottleneck), and reliably.

Check the docs if you are interested, we have rpm/deb packages, publish
tarballs, etc.

But the base principle is to utilize the inherent sparseness most raw
images have a lot of, record this in the bmap file before it is lost,
then publish the image in any form (compressed or not), and use the bmap
file for fetching the sparseness information and writing/copying only
the real data, and leaving out the zeroes.

And the larger the image, and the more often you have to copy/flash
it, the more useful the bmap file is.
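
The base principle can be sketched in a few lines: given the mapped block ranges from a bmap file, copy only those blocks and leave the rest of the target untouched. This sketch assumes a seekable source image, whereas bmaptool also handles compressed streams that can only be read sequentially. The function name and parameters are hypothetical.

```python
def copy_mapped_ranges(image, target, ranges, block_size=4096):
    """Copy only the mapped block ranges; return the number of bytes copied.

    `ranges` is an iterable of (first_block, last_block) pairs, inclusive.
    """
    copied = 0
    for first, last in ranges:
        # Mapped data keeps its offset; unmapped blocks are simply skipped.
        image.seek(first * block_size)
        target.seek(first * block_size)
        remaining = (last - first + 1) * block_size
        while remaining > 0:
            chunk = image.read(min(remaining, 1024 * 1024))
            if not chunk:
                raise ValueError("image ends inside a mapped range")
            target.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied
```

Blocks outside the ranges are never written, so the saving grows with the share of unmapped blocks in the image.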

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 09:31 +0300, Artem Bityutskiy wrote:
 But this all is not the point, these are additional goodies. The main
 point of bmaptool is in having the bmap file, and _then_ you get real
 speed-up, based on the sparseness of the original image.

Just as a rough demonstration, as I described in the original e-mail,
I've reconstructed the original image sparseness using 'cp
--sparse=always', and then created the bmap file, which is at the end of
this e-mail.

You can see that only 33.6% of the image contains non-zero blocks, so
with the bmap file you'd copy only 688.6 MiB instead of 2.0 GiB (2.1 GB).
This is where the speed would come from. This is why I got a seemingly
unrealistically fast flashing speed in the original e-mail.
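
The figures can be checked from the values in the bmap file at the end of this message (BlockSize 4096, BlocksCount 524288, MappedBlocksCount 176288). The short calculation below only restates that arithmetic.

```python
BLOCK_SIZE = 4096
BLOCKS_COUNT = 524288
MAPPED_BLOCKS_COUNT = 176288

# Total image size and the part that actually has to be written.
image_bytes = BLOCKS_COUNT * BLOCK_SIZE
mapped_bytes = MAPPED_BLOCKS_COUNT * BLOCK_SIZE

mapped_mib = mapped_bytes / 2**20
mapped_percent = 100 * MAPPED_BLOCKS_COUNT / BLOCKS_COUNT

print(f"{mapped_mib:.1f} MiB mapped, {mapped_percent:.1f}% of the image")
# prints "688.6 MiB mapped, 33.6% of the image"
```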

But sure, 33.6% is not the exact number; the real number will probably be
slightly higher. The explanation of why can be found here:

https://source.tizen.org/documentation/reference/bmaptool/introduction

in the "Reconstructing sparse files" section.

It is technically easy to add bmap file generation to the fedora image
build tools and generate the bmap file along with the image. I could try
to do this if people are interested.

Here is the bmap file I got:


?xml version=1.0 ?
!-- This file contains the block map for an image file, which is basically
 a list of useful (mapped) block numbers in the image file. In other words,
 it lists only those blocks which contain data (boot sector, partition
 table, file-system metadata, files, directories, extents, etc). These
 blocks have to be copied to the target device. The other blocks do not
 contain any useful data and do not have to be copied to the target
 device.

 The block map an optimization which allows to copy or flash the image to
 the image quicker than copying of flashing the entire image. This is
 because with bmap less data is copied: MappedBlocksCount blocks instead
 of BlocksCount blocks.

 Besides the machine-readable data, this file contains useful commentaries
 which contain human-readable information like image size, percentage of
 mapped data, etc.

 The 'version' attribute is the block map file format version in the
 'major.minor' format. The version major number is increased whenever an
 incompatible block map format change is made. The minor number changes
 in case of minor backward-compatible changes. --

<bmap version="1.3">
<!-- Image size in bytes: 2.0 GiB -->
<ImageSize> 2147483648 </ImageSize>

<!-- Size of a block in bytes -->
<BlockSize> 4096 </BlockSize>

<!-- Count of blocks in the image file -->
<BlocksCount> 524288 </BlocksCount>

<!-- Count of mapped blocks: 688.6 MiB or 33.6%  -->
<MappedBlocksCount> 176288 </MappedBlocksCount>

<!-- The checksum of this bmap file. When it is calculated, the value of
 the SHA1 checksum has be zero (40 ASCII 0 symbols). -->
<BmapFileSHA1> 75b48dc596a5e92d7cc4935d8fcc3a91c2e48b0f </BmapFileSHA1>

<!-- The block map which consists of elements which may either be a
 range of blocks or a single block. The 'sha1' attribute (if present)
 is the SHA1 checksum of this blocks range. -->
<BlockMap>
<Range sha1="ce5cc4f31d623a34c01f791e6e5c1ee65456044b"> 0-15 </Range>
<Range sha1="4ca6ee9c01354785502ba201a7c03cdd4ffbaedb"> 240-1967 </Range>
<Range sha1="6f0869f38618931042495803f5b4c02f2d7d03e9"> 8224-9583 </Range>
<Range sha1="b7650fa6a6bb703e30438ae22ca7a53c107f72ad"> 9632-11599 </Range>
<Range sha1="4e04cad0efb9194e7e00d407caaf89c9bc19042d"> 11632-11775 </Range>
<Range sha1="feebd3924ce0db12f1d7186577ca54985984ab69"> 33008-33023 </Range>
<Range sha1="ce74e311d9c1316f03b3b5a64225b607196419b9"> 33136-33567 </Range>
<Range sha1="1e6a1f19428243b09da296e5e5494d9443b90c64"> 34032-34095 </Range>
<Range sha1="9b2b5218b79d1b51e00d67814a8eea41d5cc430f"> 34544-34591 </Range>
<Range sha1="f36da167d811feba274928d5c293b569b9f8c0aa"> 34656-34671 </Range>
<Range sha1="2ffd3f90346ba2d0a9841865ccca0adfa19b830e"> 35056-35087 </Range>
<Range sha1="48a82266b1351a543a169df69274c7f82c743988"> 35568-35615 </Range>
<Range sha1="71ffc02cadefd321af333fa4b8e54ca6a2a48fbe"> 35888-35903 </Range>
<Range sha1="e3b51cd7304df8df536e9ebd6d84d6d0d516d0a0"> 36080-36255 </Range>
<Range sha1="6939de8eed4c6c35691f6b6ccf805e7d7a141b70"> 36304-36511 </Range>
<Range sha1="9d2b2e2c0cc771f2a2684317ffdaa8557d5962e1"> 36592-37071 </Range>
<Range sha1="5ccf6bca3a0140f3f549b67cf114e8847d8fd82e"> 37104-1 </Range>
<Range sha1="a3259735e513eb38f27058e6c87bd555d060615a"> 62656-62687 </Range>
<Range sha1="27c3ac0865d0b12180395e2d083f7bbdc0440bb3"> 62992-63023 </Range>
<Range sha1="bfe6dbe0c325a0a95a6cf6d62226287f4dd0b99d"> 64144-64175 </Range>
<Range sha1="ac97574508a8bd42f39bed6ccbe0ab1e57cbfd54"> 65120-98559 </Range>
<Range sha1="2351cae4967fe643d5003b6f526c811cd311df45"> 98672-100975 </Range>
<Range sha1="6a47e91dc5be78d69761ab762fc2e3ae1ea9bc8e"> 101040-101615 </Range>
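
For readers who want to experiment with the format, here is a minimal
sketch of how the block map could be read back. It assumes the file is
ordinary XML with the tag names shown in the example; the sample document,
its values and the function names are made up for illustration and are not
taken from bmaptool itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened bmap document. The tag names follow the example
# in this thread; the numbers are invented so that the counts add up.
SAMPLE_BMAP = """<bmap version="1.3">
    <ImageSize> 81920 </ImageSize>
    <BlockSize> 4096 </BlockSize>
    <BlocksCount> 20 </BlocksCount>
    <MappedBlocksCount> 7 </MappedBlocksCount>
    <BlockMap>
        <Range> 0-3 </Range>
        <Range> 8 </Range>
        <Range> 12-13 </Range>
    </BlockMap>
</bmap>
"""

HEADER_FIELDS = ("ImageSize", "BlockSize", "BlocksCount", "MappedBlocksCount")


def parse_bmap(bmap_xml):
    """Return (header, ranges): a dict of ints and a list of inclusive (first, last) pairs."""
    root = ET.fromstring(bmap_xml)
    header = {name: int(root.findtext(name).strip()) for name in HEADER_FIELDS}
    ranges = []
    for elem in root.find("BlockMap").findall("Range"):
        # An element is either "first-last" or a single block number.
        first, _, last = elem.text.strip().partition("-")
        ranges.append((int(first), int(last or first)))
    return header, ranges


def mapped_blocks(ranges):
    """Count the blocks covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


header, ranges = parse_bmap(SAMPLE_BMAP)
print(ranges, mapped_blocks(ranges), header["MappedBlocksCount"])
```

The same arithmetic explains the header of the real example: 176288 mapped
blocks of 4096 bytes are 722075648 bytes, which is about 688.6 MiB, or
33.6% of the 524288 blocks in the image.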

Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Christopher Meng
On 2013-8-13 at 9:52 PM, Artem Bityutskiy dedeki...@gmail.com wrote:

 Hi Fedora developers,

 I would like to suggest that you take a look at bmaptool, which you can use
 for flashing Fedora ISO images to USB sticks (or other block devices).

I read an article about this tool in Chinese months ago.

I can help submit a package review tomorrow if you want, then let us test
it.

Cheers.

Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Till Maas
On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:

 Other things like reading from remote sites, progress indicator,
 protecting your mounted disks, uncompressing on-the-fly, checking sha1
 of the data and of the bmap file itself - are goodies, although
 important ones.

Why sha1? If the check is there for security reasons, please use at
least sha256. You can also encode the checksum as base64, base85 or
base91 to reduce the size of the bmap file.
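
To put rough numbers on the size argument, here is a small sketch that
compares how many characters each encoding needs per checksum. It uses only
the Python standard library; base91 is left out because the standard
library has no encoder for it, and the input bytes are arbitrary.

```python
import base64
import hashlib


def encoded_lengths(digest):
    """Return the number of characters needed to store a raw digest in each encoding."""
    return {
        "hex": len(digest.hex()),
        "base64": len(base64.b64encode(digest)),
        "base85": len(base64.b85encode(digest)),
    }


data = b"example block data"  # arbitrary input, only the digest size matters
for name in ("sha1", "sha256"):
    digest = hashlib.new(name, data).digest()
    print(name, encoded_lengths(digest))
```

One caveat for an XML-based file: base85 alphabets, including the one used
by Python's b85encode, contain characters such as '<' and '&' that would
have to be escaped, so base64 may be the more practical choice.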

 But the base principle is to utilize the inherent sparseness most raw
 images have a lot of, record this in the bmap file before it is lost,
 then publish the image in any form (compressed or not), and use the bmap
 file for fetching the sparseness information and writing/copying only
 the real data, and leaving out the zeroes.

This does not sound safe, because it does not ensure that all data that
should be zero actually is zero. It works well for unassigned file
system blocks, but if the file system contains a file full of zeroes
(one that is not a sparse file), that file might not contain zeroes
afterwards, as far as I understand bmap.
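
To make the question concrete, here is a minimal sketch of the copy step
described in the quoted paragraph. It is not bmaptool's code: the function
name, the tiny 4-byte "blocks" and the in-memory files are assumptions made
for illustration. The destination is pre-filled with 0xFF to stand in for
stale data on a used USB stick.

```python
import io


def copy_mapped(src, dst, ranges, block_size):
    """Copy only the listed inclusive block ranges from src to dst."""
    for first, last in ranges:
        offset = first * block_size
        length = (last - first + 1) * block_size
        src.seek(offset)
        dst.seek(offset)
        dst.write(src.read(length))


BLOCK = 4  # toy block size, for readability only

# Image: block 0 has data, block 1 is a hole, block 2 holds zeroes that
# were explicitly written, block 3 has data.
image = io.BytesIO(b"AAAA" + b"\x00" * 4 + b"\x00" * 4 + b"BBBB")
# Target device with stale content everywhere.
device = io.BytesIO(b"\xff" * 16)

# Block 2 is listed because it is mapped, block 1 is not listed.
copy_mapped(image, device, [(0, 0), (2, 3)], BLOCK)
print(device.getvalue())
```

In this sketch a block that is listed in the map is overwritten even when
it contains only zeroes, while a block that is missing from the map keeps
whatever the device held before. So everything depends on whether zeroed
file data ends up in the map.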

Regards
Till

Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 16:35 +0800, Christopher Meng wrote:
 On 2013-8-13 at 9:52 PM, Artem Bityutskiy dedeki...@gmail.com wrote:
 
  Hi Fedora developers,
 
  I would like to suggest that you take a look at bmaptool, which you can
  use for flashing Fedora ISO images to USB sticks (or other block
  devices).
 
 I read an article about this tool in Chinese months ago.
 
 I can help submit a package review tomorrow if you want, then let us
 test it.
 
Sure, please. The packaging may not be entirely correct; I am open to
fixing it, of course. Also, please use the tip of the 'devel' branch for
your experiments for now.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Till Maas
On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
 On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
  On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
  
   Other things like reading from remote sites, progress indicator,
   protecting your mounted disks, uncompressing on-the-fly, checking sha1
   of the data and of the bmap file itself - are goodies, although
   important ones.
  
  Why sha1? If the check is there for security reasons, please use at
  least sha256.
 
 Should not be difficult to implement if there is demand.

SHA-256 is used to create the signatures of other distributed files:
https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM

Therefore, if bmap is used, it should also use at least SHA-256. Using
SHA-1 has been recommended against for more than 7 years now:
http://csrc.nist.gov/groups/ST/hash/policy_2006.html

   It works well for unassigned file
  system blocks, but if there is a file containing zeroes in the file
  system (that is not a sparse file) it might not contain zeroes
  afterwards as far as I understand bmap.
 
 It will, those blocks will be explicitly specified in the bmap file. And
 the zeroes will be copied.
 
 And this is exactly why I said that 'cp --sparse=always' does not
 generate the correct bmap file, I used it only for demonstration
 purposes.

I see.

Regards
Till

Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
 On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
  On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
   On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
   
    Other things like reading from remote sites, progress indicator,
    protecting your mounted disks, uncompressing on-the-fly, checking sha1
    of the data and of the bmap file itself - are goodies, although
    important ones.
   
   Why sha1? If the check is there for security reasons, please use at
   least sha256.
  
  Should not be difficult to implement if there is demand.
 
 SHA-256 is used to create the signatures of other distributed files:
 https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
 
 Therefore, if bmap is used, it should also use at least SHA-256. Using
 SHA-1 has been recommended against for more than 7 years now:
 http://csrc.nist.gov/groups/ST/hash/policy_2006.html

Sure, good point, thank you, I'll implement sha-256 support.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Björn Persson
Artem Bityutskiy wrote:
On Wed, 2013-08-14 at 11:44 +0200, Till Maas wrote:
 On Wed, Aug 14, 2013 at 12:21:23PM +0300, Artem Bityutskiy wrote:
  On Wed, 2013-08-14 at 10:37 +0200, Till Maas wrote:
   On Wed, Aug 14, 2013 at 09:31:22AM +0300, Artem Bityutskiy wrote:
   
    Other things like reading from remote sites, progress
    indicator, protecting your mounted disks, uncompressing
    on-the-fly, checking sha1 of the data and of the bmap file
    itself - are goodies, although important ones.
   
   Why sha1? If the check is there for security reasons, please use
   at least sha256.
  
  Should not be difficult to implement if there is demand.
 
 SHA-256 is used to create the signatures of other distributed files:
 https://fedoraproject.org/static/checksums/Fedora-19-i386-CHECKSUM
 
 Therefore, if bmap is used, it should also use at least SHA-256. Using
 SHA-1 has been recommended against for more than 7 years now:
 http://csrc.nist.gov/groups/ST/hash/policy_2006.html

Sure, good point, thank you, I'll implement sha-256 support.

Speaking of security, how is the integrity of the bmap file itself
verified? A checksum is of no use if you don't know who generated the
checksum. Fedora's checksum files are OpenPGP signed, as you can see in
the one that Till linked to. I don't see a cryptographic signature in
your example file. Are there detached signatures for the bmap files?
And does Bmaptool verify the signatures?
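
For reference, the only self-check visible in the example file is the
BmapFileSHA1 field. Going by the comment in that file (the checksum is
calculated while the field holds 40 ASCII zeroes), a check could be
sketched as below. The function name and the sample document are
hypothetical. Such a check only catches accidental corruption: anyone who
modifies the file can recompute the value, which is why a signature is
needed on top of it.

```python
import hashlib

OPEN_TAG = "<BmapFileSHA1>"
CLOSE_TAG = "</BmapFileSHA1>"


def bmap_checksum_ok(bmap_text):
    """Recompute the SHA1 with the checksum field zeroed and compare it to the recorded value."""
    start = bmap_text.index(OPEN_TAG) + len(OPEN_TAG)
    end = bmap_text.index(CLOSE_TAG)
    field = bmap_text[start:end]
    recorded = field.strip()
    # Put 40 ASCII zeroes where the checksum is, keeping the whitespace.
    zeroed = bmap_text[:start] + field.replace(recorded, "0" * 40) + bmap_text[end:]
    return hashlib.sha1(zeroed.encode("utf-8")).hexdigest() == recorded


# Build a small hypothetical bmap document the same way.
template = (
    '<bmap version="1.3">\n'
    "<BlocksCount> 20 </BlocksCount>\n"
    "<BmapFileSHA1> " + "0" * 40 + " </BmapFileSHA1>\n"
    "</bmap>\n"
)
checksum = hashlib.sha1(template.encode("utf-8")).hexdigest()
good = template.replace("0" * 40, checksum, 1)
corrupted = good.replace("<BlocksCount> 20", "<BlocksCount> 21")

print(bmap_checksum_ok(good), bmap_checksum_ok(corrupted))
```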

-- 
Björn Persson

Sent from my computer.



Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 12:24 +0200, Björn Persson wrote:
 Speaking of security, how is the integrity of the bmap file itself
 verified?

This is not implemented, unfortunately. This is another thing which I
probably would need to do, and this is a very good point.

I will look at this, after I do the SHA256 thing.

  A checksum is of no use if you don't know who generated the
 checksum. Fedora's checksum files are OpenPGP signed, as you can see
 in the one that Till linked to.

Right, the bmap file could also contain such a signature.

  I don't see a cryptographic signature in
 your example file. Are there detached signatures for the bmap files?

Well, of course detached signatures can be generated.

 And does Bmaptool verify the signatures?

But no, bmaptool does not verify them. And again, if there is real
interest from Fedora community, I will try to implement this faster (or
accept someone's contribution :-))

Thanks for the feedback!

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Samuel Sieb

On 08/14/2013 02:21 AM, Artem Bityutskiy wrote:

I think I covered this part in the documentation. But here is a short
description.

1. The bmap file should be created just after the image is generated.
2. The blocks where zeroes were explicitly written will be mapped to
real sectors which will contain zeroes.
3. The blocks which were not explicitly written to will be unmapped.
4. Creation of the bmap file is done using the FIEMAP ioctl.
5. Only unmapped blocks will be omitted from the bmap file.

While on this, I should note that this works best on the ext4 file system.
I did not test ext2/3, but they should work as well as ext4. Btrfs was
also tested, but it is a little bit worse than ext4, I can explain why
if someone is interested.


Have you looked at partimage?  It sounds like this except that it works 
on many different filesystems and doesn't need the blocks to be unmapped 
to compress it (i.e. it works on normal partitions as well as images).


Re: Suggestion: bmap files and bmaptool

2013-08-14 Thread Artem Bityutskiy
On Wed, 2013-08-14 at 21:16 -0700, Samuel Sieb wrote:
 On 08/14/2013 02:21 AM, Artem Bityutskiy wrote:
  I think I covered this part in the documentation. But here is a short
  description.
 
  1. The bmap file should be created just after the image is generated.
  2. The blocks where zeroes were explicitly written will be mapped to
  real sectors which will contain zeroes.
  3. The blocks which were not explicitly written to will be unmapped.
  4. Creation of the bmap file is done using the FIEMAP ioctl.
  5. Only unmapped blocks will be omitted from the bmap file.
 
  While on this, I should note that this works best on the ext4 file
  system. I did not test ext2/3, but they should work as well as ext4.
  Btrfs was also tested, but it is a little bit worse than ext4, I can
  explain why if someone is interested.
 
 Have you looked at partimage?  It sounds like this except that it works 
 on many different filesystems and doesn't need the blocks to be unmapped 
 to compress it (i.e. it works on normal partitions as well as images).

No, I never saw this project before. Yeah, it sounds like it uses similar
ideas to speed things up, but it has a different purpose and tries to
understand the internal file-system format. Hence it does not support
ext4/btrfs, I guess simply because they are too complex and are developed
too quickly. It is just too difficult to maintain a parallel
implementation in user space.

Bmaptool does not know anything about the internals of the file system.
It does not care which FS is underneath. bmaptool simply uses the
FIEMAP ioctl to ask the FS which blocks are mapped (used).

This would not work for partimage since it needs to know about all the
blocks (superblock, all the other meta-data blocks), not just blocks
belonging to a single file.
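
For anyone who wants to try the "ask the file system" idea without
bmaptool, here is a rough sketch. Note that it does not use the FIEMAP
ioctl: it swaps in lseek() with SEEK_DATA/SEEK_HOLE, a different Linux
interface that answers a similar question and is much easier to call from
Python. The function names and the test file layout are made up for
illustration.

```python
import errno
import os
import tempfile


def mapped_block_ranges(path, block_size=4096):
    """Return inclusive (first, last) block ranges that the file system reports as data."""
    ranges = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:  # no more data after offset
                    break
                raise
            # The end of the file counts as a hole, so this never exceeds size.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            first = data_start // block_size
            last = (hole_start - 1) // block_size
            if ranges and first <= ranges[-1][1]:
                # Extents smaller than block_size may share a block: merge them.
                ranges[-1] = (ranges[-1][0], max(last, ranges[-1][1]))
            else:
                ranges.append((first, last))
            offset = hole_start
    return ranges


def make_sparse_file(path, block_size=4096):
    """Write block 0 and block 256, leave everything else up to block 511 unwritten."""
    with open(path, "wb") as f:
        f.write(b"x" * block_size)
        f.seek(256 * block_size)
        f.write(b"y" * block_size)
        f.truncate(512 * block_size)
        f.flush()
        os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "sparse.img")
    make_sparse_file(image)
    print(mapped_block_ranges(image))
```

What gets printed depends on the file system. One that tracks holes should
report two short ranges around blocks 0 and 256, while one without hole
support reports the whole file as a single mapped range.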

Now, here is why I said that ext4 is the best one to use on the server
(most probably ext2/3 are as good, but I did not verify). ext4 is perfect
at preserving the gaps. Even if you have a one-block gap, it will still
account for it as unmapped. I have a test where I create random
mapped areas, and ext4 keeps all the gaps. But btrfs sometimes maps
small 1-block gaps. This is related to its internal structure. So with
btrfs the bmap file becomes less ideal.

Anyway, thanks for letting me know about partimage.

-- 
Best Regards,
Artem Bityutskiy


Re: Suggestion: bmap files and bmaptool

2013-08-13 Thread Jochen Schmitt
On Tue, Aug 13, 2013 at 04:58:16PM +0300, Artem Bityutskiy wrote:
 Hi Fedora developers,

 $ dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd
 4194304+0 records in
 4194304+0 records out
 2147483648 bytes (2.1 GB) copied, 799.487 s, 2.7 MB/s

1.) Seems to be a slow USB flash drive.

2.) What time do you get if you specify a large bs value
for the dd command? For example:

dd if=Fedora-x86_64-19-20130627-sda.raw of=/dev/sdd bs=100M

Best Regards:

Jochen Schmitt