Re: Shrinking a device - performance?

2017-04-01 Thread Peter Grandi
[ ... ]

>>>   $  D='btrfs f2fs gfs2 hfsplus jfs nilfs2 reiserfs udf xfs'
>>>   $  find $D -name '*.ko' | xargs size | sed 's/^  *//;s/ .*\t//g'
>>>   text    filename
>>>   832719  btrfs/btrfs.ko
>>>   237952  f2fs/f2fs.ko
>>>   251805  gfs2/gfs2.ko
>>>   72731   hfsplus/hfsplus.ko
>>>   171623  jfs/jfs.ko
>>>   173540  nilfs2/nilfs2.ko
>>>   214655  reiserfs/reiserfs.ko
>>>   81628   udf/udf.ko
>>>   658637  xfs/xfs.ko

That was Linux AMD64.

> udf is 637K on Mac OS 10.6
> exfat is 75K on Mac OS 10.9
> msdosfs is 79K on Mac OS 10.9
> ntfs is 394K (That must be Paragon's ntfs for Mac)
...
> zfs is 1.7M (10.9)
> spl is 247K (10.9)

Similar on Linux AMD64 but smaller:

  $ size updates/dkms/*.ko | sed 's/^  *//;s/ .*\t//g'
  text    filename
  62005   updates/dkms/spl.ko
  184370  updates/dkms/splat.ko
  3879    updates/dkms/zavl.ko
  22688   updates/dkms/zcommon.ko
  1012212 updates/dkms/zfs.ko
  39874   updates/dkms/znvpair.ko
  18321   updates/dkms/zpios.ko
  319224  updates/dkms/zunicode.ko

> If they are somehow comparable even with the differences, 833K
> is not bad for btrfs compared to zfs. I did not look at the
> format of the file; it must be binary, but compression may be
> optional for third party kexts. So the kernel module sizes are
> large for both btrfs and zfs. Given the feature sets of both,
> is that surprising?

Not surprising, and indeed I agree with the statement that
appeared earlier that "there are use cases that actually need
them". There are also use cases that need realtime translation
of file content from Chinese to Spanish, and one could add to
ZFS or Btrfs an extension to detect the language of text files
and invoke Google Translate via HTTP, for example with option
"translate=chinese-spanish" at mount time; or, less flexibly,
there are many use cases where B-Tree lookup of records in files
is useful, and it would be possible to add that to Btrfs or ZFS,
so that for example 'lseek(4,"Jane Smith",SEEK_KEY)' would be
possible, as in the ancient TSS/370 filesystem design.

But the question is about engineering, where best to implement
those "feature sets": in the kernel or higher levels. There is
no doubt for me that realtime language translation and seeking
by key can be added to a filesystem kernel module, and would
"work". The issue is a crudely technical one: "works" for an
engineer is not a binary state, but a statistical property over
a wide spectrum of cost/benefit tradeoffs.

Adding "feature sets" because "there are use cases that actually
need them" is fine, adding their implementation to the kernel
driver of a filesystem is quite a different proposition, which
may have downsides, as the implementations of those feature sets
may make code more complex and harder to understand and test,
never mind debug, even for the base features. But of course lots
of people know better :-).

But there is more; look again at some compiled code sizes as a
crude proxy for complexity, divided into two groups, both of
robust, full-featured designs:

  1012212 updates/dkms/zfs.ko
  832719  btrfs/btrfs.ko
  658637  xfs/xfs.ko

  237952  f2fs/f2fs.ko
  173540  nilfs2/nilfs2.ko
  171623  jfs/jfs.ko
  81628   udf/udf.ko

The code size for JFS or NILFS2 or UDF is roughly 1/4 the code
size for XFS, yet there is little difference in functionality.
Compared to ZFS, as to base functionality JFS lacks checksums and
snapshots (in theory it has subvolumes, but they are disabled),
while NILFS2 has snapshots and checksums (but does not verify them
on ordinary reads), and yet its code size is 1/6 that of ZFS.
ZFS also has RAID, but looking at the code size of the Linux MD
RAID modules I see rather smaller numbers. Even so ZFS has a
good reputation for reliability despite its amazing complexity,
but that is also because Sun invested big into massive release
engineering for it, and similarly for XFS.

Therefore my impression is that the filesystems in the first
group have a lot of cool features like compression or dedup
etc. that could have been implemented at user level, and having
them in the kernel is good "for 'marketing' purposes, to win
box-ticking competitions".


Re: Shrinking a device - performance?

2017-04-01 Thread Kai Krakow
On Mon, 27 Mar 2017 20:06:46 +0500, Roman Mamedov wrote:

> On Mon, 27 Mar 2017 16:49:47 +0200
> Christian Theune  wrote:
> 
> > Also: the idea of migrating on btrfs also has its downside - the
> > performance of “mkdir” and “fsync” is abysmal at the moment. I’m
> > waiting for the current shrinking job to finish but this is likely
> > limited to the “find free space” algorithm. We’re talking about a
> > few megabytes converted per second. Sigh.  
> 
> Btw since this is all on LVM already, you could set up lvmcache with
> a small SSD-based cache volume. Even some old 60GB SSD would work
> wonders for performance, and with the cache policy of "writethrough"
> you don't have to worry about its reliability (much).

That's maybe the best recommendation to speed things up. I'm using
bcache here for the same reasons (speeding up random workloads) and it
works wonders.

Though for such big storage I'd maybe recommend a bigger, and new,
SSD. Bigger SSDs tend to last much longer. Just don't use the whole of
it, to allow for better wear leveling, and you'll get a final setup
that can serve the system much longer than just the period of
migration.
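
For reference, an lvmcache setup along the lines Roman suggests could
look roughly like this; the VG, LV and device names are invented for
illustration:

  # add the SSD to the existing volume group
  pvcreate /dev/sdX
  vgextend vg0 /dev/sdX
  # create a cache pool on the SSD, deliberately smaller than the
  # device so some flash stays unused for wear leveling
  lvcreate --type cache-pool -L 40G -n fastcache vg0 /dev/sdX
  # attach it to the data LV in writethrough mode, so a dying SSD
  # cannot take any data with it
  lvconvert --type cache --cachepool vg0/fastcache \
            --cachemode writethrough vg0/data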

-- 
Regards,
Kai

Replies to list-only preferred.




Re: Shrinking a device - performance?

2017-03-31 Thread GWB
Indeed, that does make sense.  It's the output of the size command in
the Berkeley format of "text", not decimal, octal or hex.  Out of
curiosity about kernel module sizes, I dug up some old MacBooks and
looked around in:

/System/Library/Extensions/[modulename].kext/Contents/MacOS:

udf is 637K on Mac OS 10.6
exfat is 75K on Mac OS 10.9
msdosfs is 79K on Mac OS 10.9
ntfs is 394K (That must be Paragon's ntfs for Mac)

And here are the kernel extension sizes for zfs (from OpenZFS):

/Library/Extensions/[modulename].kext/Contents/MacOS:

zfs is 1.7M (10.9)
spl is 247K (10.9)

Different kernel from Linux, of course (evidently a "mish-mash" of
NeXTSTEP, BSD, Mach and Apple's own code), but that is one large
kernel extension for zfs.  If they are somehow comparable even with
the differences, 833K is not bad for btrfs compared to zfs.  I did not
look at the format of the file; it must be binary, but compression may
be optional for third party kexts.

So the kernel module sizes are large for both btrfs and zfs.  Given
the feature sets of both, is that surprising?

My favourite kernel extension in Mac OS X is:

/System/Library/Extensions/Dont Steal Mac OS X.kext/

Subtle, very subtle.

Gordon

On Fri, Mar 31, 2017 at 9:42 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> GWB posted on Fri, 31 Mar 2017 19:02:40 -0500 as excerpted:
>
>> It is confusing, and now that I look at it, more than a little funny.
>> Your use of xargs returns the size of the kernel module for each of the
>> filesystem types.  I think I get it now: you are pointing to how large
>> the kernel module for btrfs is compared to other file system kernel
>> modules, 833 megs (piping find through xargs to sed).  That does not
>> mean the btrfs kernel module can accommodate an upper limit of a command
>> line length that is 833 megs.  It is just a very big loadable kernel
>> module.
>
> Umm... 833 K, not M, I believe.  (The unit is bytes not KiB.)
>
> Because if just one kernel module is nearing a gigabyte, then the kernel
> must be many gigabytes either monolithic or once assembled in memory, and
> it just ain't so.
>
> But FWIW megs was my first-glance impression too, until my brain said "No
> way!  Doesn't work!" and I took a second look.
>
> The kernel may indeed no longer fit on a 1.44 MB floppy, but it's still
> got a ways to go before it's multiple GiB! =:^)  While they're XZ-
> compressed, I'm still fitting several monolithic-build kernels including
> their appended initramfs, along with grub, its config and modules, and a
> few other misc things, in a quarter-GB dup-mode btrfs, meaning 128 MiB
> capacity, including the 16 MiB system chunk so 112 MiB for data and
> metadata.  That simply wouldn't be possible if the kernel itself were
> multi-GB, even uncompressed.  Even XZ isn't /that/ good!
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>


Re: Shrinking a device - performance?

2017-03-31 Thread Duncan
GWB posted on Fri, 31 Mar 2017 19:02:40 -0500 as excerpted:

> It is confusing, and now that I look at it, more than a little funny.
> Your use of xargs returns the size of the kernel module for each of the
> filesystem types.  I think I get it now: you are pointing to how large
> the kernel module for btrfs is compared to other file system kernel
> modules, 833 megs (piping find through xargs to sed).  That does not
> mean the btrfs kernel module can accommodate an upper limit of a command
> line length that is 833 megs.  It is just a very big loadable kernel
> module.

Umm... 833 K, not M, I believe.  (The unit is bytes not KiB.)

Because if just one kernel module is nearing a gigabyte, then the kernel 
must be many gigabytes either monolithic or once assembled in memory, and 
it just ain't so.

But FWIW megs was my first-glance impression too, until my brain said "No 
way!  Doesn't work!" and I took a second look.

The kernel may indeed no longer fit on a 1.44 MB floppy, but it's still 
got a ways to go before it's multiple GiB! =:^)  While they're XZ-
compressed, I'm still fitting several monolithic-build kernels including 
their appended initramfs, along with grub, its config and modules, and a 
few other misc things, in a quarter-GB dup-mode btrfs, meaning 128 MiB 
capacity, including the 16 MiB system chunk so 112 MiB for data and 
metadata.  That simply wouldn't be possible if the kernel itself were 
multi-GB, even uncompressed.  Even XZ isn't /that/ good!
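
A quick sanity check of the arithmetic, re-using the module path from
the earlier listing (the numbers are the ones already quoted above):

  $ size btrfs/btrfs.ko | awk 'NR==2 {
        printf "%d bytes = %.0f KiB = %.2f MiB\n", $1, $1/1024, $1/2^20 }'
  832719 bytes = 813 KiB = 0.79 MiB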

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Shrinking a device - performance?

2017-03-31 Thread GWB
It is confusing, and now that I look at it, more than a little funny.
Your use of xargs returns the size of the kernel module for each of
the filesystem types.  I think I get it now: you are pointing to how
large the kernel module for btrfs is compared to other file system
kernel modules, 833 megs (piping find through xargs to sed).  That
does not mean the btrfs kernel module can accommodate an upper limit
of a command line length that is 833 megs.  It is just a very big
loadable kernel module.

So same question, but different expression: what is the significance
of the large size of the btrfs kernel module?  Is it that the larger
the module, the more complex, the more prone to breakage, and more
difficult to debug?  Is the hfsplus kernel module less complex, and
more robust?  What did the file system designers of hfsplus (or udf)
know better (or worse?) than the file system designers of btrfs?

VAX/VMS clusters just aren't happy outside of a deeply hidden bunker
running 9 machines in a cluster from one storage device connected by
Myrinet over 500 miles to the next cluster.  I applaud the move to
x86, but like I wrote earlier, time has moved on.  I suppose weird is
in the eye of the beholder, but yes, when dial-up was king and disco
pants roamed the earth, they were nice.  I don't think x86 is a viable
use case even for OpenVMS.  If you really need a VAX/VMS cluster,
chances are you already have one running with a continuous uptime of
more than a decade, and you have already upgraded and changed out
every component several times by cycling down one machine in the
cluster at a time.

Gordon

On Fri, Mar 31, 2017 at 3:27 PM, Peter Grandi wrote:
>> [ ... ] what the significance of the xargs size limits of
>> btrfs might be. [ ... ] So what does it mean that btrfs has a
>> higher xargs size limit than other file systems? [ ... ] Or
>> does the lower capacity for argument length for hfsplus
>> demonstrate it is the superior file system for avoiding
>> breakage? [ ... ]
>
> That confuses, as my understanding of command argument size
> limit is that it is a system, not filesystem, property, and for
> example can be obtained with 'getconf _POSIX_ARG_MAX'.
>
>> Personally, I would go back to fossil and venti on Plan 9 for
>> an archival data server (using WORM drives),
>
> In an ideal world we would be using Plan 9. Not necessarily with
> Fossil and Venti. As to storage/backup/archival, Linux-based
> options are not bad, even if the platform is far messier than
> Plan 9 (or some other alternatives). BTW I just noticed with a
> search that AWS might be offering Plan 9 hosts :-).
>
>> and VAX/VMS cluster for an HA server. [ ... ]
>
> Uhmmm, however nice it was, it was fairly weird. An IA32 or
> AMD64 port has been promised however :-).
>
> https://www.theregister.co.uk/2016/10/13/openvms_moves_slowly_towards_x86/


Re: Shrinking a device - performance?

2017-03-31 Thread Peter Grandi
> [ ... ] what the significance of the xargs size limits of
> btrfs might be. [ ... ] So what does it mean that btrfs has a
> higher xargs size limit than other file systems? [ ... ] Or
> does the lower capacity for argument length for hfsplus
> demonstrate it is the superior file system for avoiding
> breakage? [ ... ]

That confuses, as my understanding of command argument size
limit is that it is a system, not filesystem, property, and for
example can be obtained with 'getconf _POSIX_ARG_MAX'.
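
For instance, on a Linux AMD64 box (the second value is typical with
the default stack limit, and is the same whatever filesystem the
arguments refer to):

  $ getconf _POSIX_ARG_MAX   # minimum any POSIX system must allow
  4096
  $ getconf ARG_MAX          # what this particular system allows
  2097152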

> Personally, I would go back to fossil and venti on Plan 9 for
> an archival data server (using WORM drives),

In an ideal world we would be using Plan 9. Not necessarily with
Fossil and Venti. As to storage/backup/archival, Linux-based
options are not bad, even if the platform is far messier than
Plan 9 (or some other alternatives). BTW I just noticed with a
search that AWS might be offering Plan 9 hosts :-).

> and VAX/VMS cluster for an HA server. [ ... ]

Uhmmm, however nice it was, it was fairly weird. An IA32 or
AMD64 port has been promised however :-).

https://www.theregister.co.uk/2016/10/13/openvms_moves_slowly_towards_x86/


Re: Shrinking a device - performance?

2017-03-31 Thread GWB
Well, now I am curious.  Until we hear back from Christian on the
progress of the never-ending file system shrinkage, I suppose it can't
hurt to ask what the significance of the xargs size limits of btrfs
might be.  Or, again, if Christian is already happily on his way to
an xfs server running over lvm, skip, ignore, delete.

Here is the output of xargs --show-limits on my laptop:

<<
$ xargs --show-limits
Your environment variables take up 4830 bytes
POSIX upper limit on argument length (this system): 2090274
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2085444
Size of command buffer we are actually using: 131072

Execution of xargs will continue now...
>>

That is for a laptop system.  So what does it mean that btrfs has a
higher xargs size limit than other file systems?  Could I
theoretically use 40% of the total allowed argument length of the
system for btrfs arguments alone?  Would that make balance, shrinkage,
etc., faster?  Does the higher capacity for argument length mean btrfs
is overly complex and therefore more prone to breakage?  Or does the
lower capacity for argument length for hfsplus demonstrate it is the
superior file system for avoiding breakage?

Or does it mean that hfsplus is very old (and reflects older xargs
limits), and that btrfs is newer code?  I am relatively new to btrfs,
and would like to find out.  I am also attracted to the idea that it
is better to leave some operations to the system itself, and not code
them into the file system.  For example, I think deduplication "off
line" or "out of band" is an advantage for btrfs over zfs.  But that's
only for what I do.  For other uses deduplication "in line", while
writing the file, is preferred, and that is what zfs does (preferably
with lots of memory, at least one ssd to run zil, caches, etc.).

I use btrfs now because Ubuntu has it as a default in the kernel, and
I assume that when (not "if") I have to use a system rescue disk (USB
or CD) it will have some capacity to repair btrfs.  Along the way,
btrfs has been quite good as a general purpose file system on root; it
makes and sends snapshots, and so far only needs an occasional scrub
and balance.  My earlier experience with btrfs on a 2TB drive was more
complicated, but I expected that for a file system with a lot of
potential but less maturity.
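
For what it is worth, the occasional scrub and balance above amount to
something like this (the usage filters are just an example):

  # verify data and metadata against their checksums, in the foreground
  btrfs scrub start -Bd /
  # rewrite only chunks that are at most half full, instead of a full balance
  btrfs balance start -dusage=50 -musage=50 /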

Personally, I would go back to fossil and venti on Plan 9 for an
archival data server (using WORM drives), and VAX/VMS cluster for an
HA server.  But of course that no longer makes sense except for a very
few usage cases.  Time has moved on, prices have dropped drastically,
and hardware can do a lot more per penny than it used to.

Gordon

On Fri, Mar 31, 2017 at 12:25 PM, Peter Grandi wrote:
>>>> My guess is that very complex risky slow operations like
>>>> that are provided by "clever" filesystem developers for
>>>> "marketing" purposes, to win box-ticking competitions.
>
>>>> That applies to those system developers who do know better;
>>>> I suspect that even some filesystem developers are
>>>> "optimistic" as to what they can actually achieve.
>
>>>> There are cases where there really is no other sane
>>>> option. Not everyone has the kind of budget needed for
>>>> proper HA setups,
>
>>> Thanks for letting me know, that must have never occurred to
>>> me, just as it must have never occurred to me that some
>>> people expect extremely advanced features that imply
>>> big-budget high-IOPS high-reliability storage to be fast and
>>> reliable on small-budget storage too :-)
>
>> You're missing my point (or intentionally ignoring it).
>
> In "Thanks for letting me know" I am not missing your point, I
> am simply pointing out that I do know that people try to run
> high-budget workloads on low-budget storage.
>
> The argument as to whether "very complex risky slow operations"
> should be provided in the filesystem itself is a very different
> one, and I did not develop it fully. But it is quite "optimistic"
> to simply state "there really is no other sane option", even
> when for people that don't have "proper HA setups".
>
> Let's start by assuming, for the time being, that "very complex
> risky slow operations" are indeed feasible on very reliable high
> speed storage layers. Then the questions become:
>
> * Is it really true that "there is no other sane option" to
>   running "very complex risky slow operations" even on storage
>   that is not "big-budget high-IOPS high-reliability"?
>
> * Is it really true that it is a good idea to run "very complex
>   risky slow operations" even on "big-budget high-IOPS
>   high-reliability storage"?
>
>> Those types of operations are implemented because there are
>> use cases that actually need them, not because some developer
>> thought it would be cool. [ ... ]
>
> And this is the really crucial bit, I'll disregard without
> agreeing too much (but in part I do) with the rest of the
> 

Re: Shrinking a device - performance?

2017-03-31 Thread Peter Grandi
>>> My guess is that very complex risky slow operations like
>>> that are provided by "clever" filesystem developers for
>>> "marketing" purposes, to win box-ticking competitions.

>>> That applies to those system developers who do know better;
>>> I suspect that even some filesystem developers are
>>> "optimistic" as to what they can actually achieve.

>>> There are cases where there really is no other sane
>>> option. Not everyone has the kind of budget needed for
>>> proper HA setups,

>> Thanks for letting me know, that must have never occurred to
>> me, just as it must have never occurred to me that some
>> people expect extremely advanced features that imply
>> big-budget high-IOPS high-reliability storage to be fast and
>> reliable on small-budget storage too :-)

> You're missing my point (or intentionally ignoring it).

In "Thanks for letting me know" I am not missing your point, I
am simply pointing out that I do know that people try to run
high-budget workloads on low-budget storage.

The argument as to whether "very complex risky slow operations"
should be provided in the filesystem itself is a very different
one, and I did not develop it fully. But it is quite "optimistic"
to simply state "there really is no other sane option", even
when for people that don't have "proper HA setups".

Let's start by assuming, for the time being, that "very complex
risky slow operations" are indeed feasible on very reliable high
speed storage layers. Then the questions become:

* Is it really true that "there is no other sane option" to
  running "very complex risky slow operations" even on storage
  that is not "big-budget high-IOPS high-reliability"?

* Is it really true that it is a good idea to run "very complex
  risky slow operations" even on "big-budget high-IOPS
  high-reliability storage"?

> Those types of operations are implemented because there are
> use cases that actually need them, not because some developer
> thought it would be cool. [ ... ]

And this is the really crucial bit, I'll disregard without
agreeing too much (but in part I do) with the rest of the
response, as those are less important matters, and this is going
to be longer than a Twitter message.

First, I agree that "there are use cases that actually need
them", and I need to explain what I am agreeing to: I believe
that computer systems, "system" in a wide sense, have what I
call "inewvitable functionality", that is functionality that is
not optional, but must be provided *somewhere*: for example
print spooling is "inevitable functionality" as long as there
are multiple users, and spell checking is another example.

The only choice as to "inevitable functionality" is *where* to
provide it. For example spooling can be done among two users by
queuing jobs manually with one saying "I am going to print now",
and the other user waits until the print is finished, or by
using a spool program that queues jobs on the source system, or
by using a spool program that queues jobs on the target
printer. Spell checking can be done on the fly in the document
processor, batch with a tool, or manually by the document
author. All these are valid implementations of "inevitable
functionality", just with very different performance envelope,
where the "system" includes the users as "peripherals" or
"plugins" :-) in the manual implementations.

There is no dispute from me that multiple devices,
adding/removing block devices, data compression, structural
repair, balancing, growing/shrinking, defragmentation, quota
groups, integrity checking, deduplication, ... are all in the
general case "inevitable functionality", and every non-trivial
storage system *must* implement them.

The big question is *where*: for example when I started using
UNIX the 'fsck' tool was several years away, and when the system
crashed I did, like everybody, filetree integrity checking and
structure recovery myself (with the help of 'ncheck' and
'icheck' and 'adb'), that is 'fsck' was implemented in my head.

In the general case there are four places where such
"inevitable functionality" can be implemented:

* In the filesystem module in the kernel, for example Btrfs
  scrubbing.
* In a tool that uses hooks provided by the filesystem module in
  the kernel, for example Btrfs deduplication, 'send'/'receive'.
* In a tool, for example 'btrfsck'.
* In the system administrator.

Consider the "very complex risky slow" operation of
defragmentation; the system administrator can implement it by
dumping and reloading the volume, or a tool can implement it by
running on the unmounted filesystem, or a tool and the kernel
can implement it by using kernel module hooks, or it can be
provided entirely in the kernel module.
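
To make that concrete, here is the same operation at two of those
layers (the paths are invented):

  # tool plus kernel hooks: defragmentation driven through the btrfs ioctls
  btrfs filesystem defragment -r /srv/vol
  # system administrator implementation: dump and reload into a fresh volume
  tar -C /srv/vol -cf - . | tar -C /srv/vol.new -xf -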

My argument is that providing "very complex risky slow"
maintenance operations as filesystem primitives looks awesomely
convenient, a good way to "win box-ticking competitions" for
"marketing" purposes, but is rather bad idea for several
reasons, of varying strengths:

* Most system 

Re: Shrinking a device - performance?

2017-03-31 Thread Austin S. Hemmelgarn

On 2017-03-30 11:55, Peter Grandi wrote:

>>> My guess is that very complex risky slow operations like that are
>>> provided by "clever" filesystem developers for "marketing" purposes,
>>> to win box-ticking competitions. That applies to those system
>>> developers who do know better; I suspect that even some filesystem
>>> developers are "optimistic" as to what they can actually achieve.

>> There are cases where there really is no other sane option. Not
>> everyone has the kind of budget needed for proper HA setups,

> Thanks for letting me know, that must have never occurred to me, just as
> it must have never occurred to me that some people expect extremely
> advanced features that imply big-budget high-IOPS high-reliability
> storage to be fast and reliable on small-budget storage too :-)
You're missing my point (or intentionally ignoring it).  Those types of 
operations are implemented because there are use cases that actually 
need them, not because some developer thought it would be cool.  The one 
possible counter-example of this is XFS, which doesn't support shrinking 
the filesystem at all, but that was a conscious decision because their 
target use case (very large scale data storage) does not need that 
feature and not implementing it allows them to make certain other parts 
of the filesystem faster.



>> and if you need maximal uptime and as a result have to reprovision the
>> system online, then you pretty much need a filesystem that supports
>> online shrinking.

> That's a bigger topic than we can address here. The topic used to be
> known in one related domain as "Very Large Databases", which were
> defined as databases so large and critical that the time needed for
> maintenance and backup was too long for taking them offline etc.;
> that is a topic that has largely vanished from discussion, I guess
> because most management just don't want to hear it :-).
No, it's mostly vanished because of changes in best current practice. 
That was a topic in an era where the only platform that could handle 
high-availability was VMS, and software wasn't routinely written to 
handle things like load balancing.  As a result, people ran a single 
system which hosted the database, and if that went down, everything went 
down.  By contrast, it's rare these days outside of small companies to 
see singly hosted databases that aren't specific to the local system, 
and once you start parallelizing on the system level, backup and 
maintenance times generally go down.



>> Also, it's not really all that slow on most filesystems, BTRFS is just
>> hurt by its comparatively poor performance, and the COW metadata
>> updates that are needed.

> Btrfs in realistic situations has pretty good speed *and* performance,
> and COW actually helps, as it often results in less head repositioning
> than update-in-place. What makes it a bit slower with metadata is having
> 'dup' by default to recover from especially damaging bitflips in
> metadata, but then that does not impact performance, only speed.
I and numerous other people have done benchmarks running single metadata 
and single data profiles on BTRFS, and it consistently performs worse 
than XFS and ext4 even under those circumstances.  It's not horrible 
performance (it's better for example than trying the same workload on 
NTFS on Windows), but it's still not what most people would call 'high' 
performance or speed.



>>> That feature set is arguably not appropriate for VM images, but
>>> lots of people know better :-).

>> That depends on a lot of factors.  I have no issues personally running
>> small VM images on BTRFS, but I'm also running on decent SSD's
>> (>500MB/s read and write speeds), using sparse files, and keeping on
>> top of managing them. [ ... ]

> Having (relatively) big-budget high-IOPS storage for high-IOPS workloads
> helps, that must have never occurred to me either :-).
It's not big budget, the SSD's in question are at best mid-range 
consumer SSD's that cost only marginally more than a decent hard drive, 
and they really don't get all that great performance in terms of IOPS 
because they're all on the same cheap SATA controller.  The point I was 
trying to make (which I should have been clearer about) is that they 
have good bulk throughput, which means that the OS can do much more 
aggressive writeback caching, which in turn means that COW and 
fragmentation have less impact.



>>> XFS and 'ext4' are essentially equivalent, except for the fixed-size
>>> inode table limitation of 'ext4' (and XFS reportedly has finer
>>> grained locking). Btrfs is nearly as good as either on most workloads
>>> in single-device mode [ ... ]

>> No, if you look at actual data, [ ... ]

> Well, I have looked at actual data in many published but often poorly
> made "benchmarks", and to me they seem quite equivalent indeed, within
> somewhat differently shaped performance envelopes, so the results
> depend on the testing point within that envelope. I have done my own
> simplistic actual data gathering, most recently here:

  

Re: Shrinking a device - performance?

2017-03-31 Thread Peter Grandi
>> [ ... ] CentOS, Redhat, and Oracle seem to take the position
>> that very large data subvolumes using btrfs should work
>> fine. But I would be curious what the rest of the list thinks
>> about 20 TiB in one volume/subvolume.

> To be sure I'm a biased voice here, as I have multiple
> independent btrfs on multiple partitions here, with no btrfs
> over 100 GiB in size, and that's on ssd so maintenance
> commands normally return in minutes or even seconds,

That's a bit extreme I think, as there are downsides to having
too many small volumes too.

> not the hours to days or even weeks it takes on multi-TB btrfs
> on spinning rust.

Or months :-).

> But FWIW... 1) Don't put all your data eggs in one basket,
> especially when that basket isn't yet entirely stable and
> mature.

Really good point here.

> A mantra commonly repeated on this list is that btrfs is still
> stabilizing,

My impression is that most 4.x and later versions are very
reliable for "base" functionality, that is excluding
multi-device, compression, qgroups, ... Put another way, what
scratches the Facebook itches works well :-).

> [ ... ] the time/cost/hassle-factor of the backup, and being
> practically prepared to use them, is even *MORE* important
> than it is on fully mature and stable filesystems.

Indeed, or at least *different* filesystems. I backup JFS
filesystems to XFS ones, and Btrfs filesystems to NILFS2 ones,
for example.

> 2) Don't make your filesystems so large that any maintenance
> on them, including both filesystem maintenance like btrfs
> balance/scrub/check/ whatever, and normal backup and restore
> operations, takes impractically long,

As per my preceding post, that's the big deal, but so many
people "know better" :-).

> where "impractically" can be reasonably defined as so long it
> discourages you from doing them in the first place and/or so
> long that it's going to cause unwarranted downtime.

That's the "Very Large DataBase" level of trouble.

> Some years ago, before I started using btrfs and while I was
> using mdraid, I learned this one the hard way. I had a bunch
> of rather large mdraids setup, [ ... ]

I have recently seen another much "funnier" example: people who
"know better" and follow every cool trend decide to consolidate
their server farm on VMs, backed by a storage server with a
largish single pool of storage holding the virtual disk images
of all the server VMs. They look like geniuses until the storage
pool system crashes, and a minimal integrity check on restart
takes two days during which the whole organization is without
access to any email, files, databases, ...

> [ ... ] And there was a good chance it was /not/ active and
> mounted at the time of the crash and thus didn't need
> repaired, saving that time entirely! =:^)

As to that I have switched to using 'autofs' to mount volumes
only on access, using a simple script that turns '/etc/fstab'
into an automounter dynamic map, which means that most of the
time most volumes on my (home) systems are not mounted:

  http://www.sabi.co.uk/blog/anno06-3rd.html?060928#060928
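
Not the script from that post, but a minimal sketch of the idea: turn
'/etc/fstab' into an autofs direct map (referenced from 'auto.master'
with a '/-' entry), skipping comments, swap and '/':

  # assumes plain device paths, not UUID=, in the first fstab field
  awk '!/^[ \t]*#/ && NF >= 4 && $2 != "/" && $3 != "swap" {
      printf "%s -fstype=%s,%s :%s\n", $2, $3, $4, $1
  }' /etc/fstab > /etc/auto.direct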

> Eventually I arranged things so I could keep root mounted
> read-only unless I was updating it, and that's still the way I
> run it today.

The ancient way, instead of having '/' RO and '/var' RW, was to
have '/' RW and '/usr' RO (so for example it could be shared
across many systems via NFS etc.), and while both are good ideas,
I prefer the ancient way. But then some people who know better
are moving to merge '/' with '/usr' without understanding the
history and the advantages.

> [ ... ] If it's multiple TBs, chances are it's going to be
> faster to simply blow away and recreate from backup, than it
> is to try to repair... [ ... ]

Or to shrink or defragment or dedup etc., except on very high
IOPS-per-TB storage.

> [ ... ] how much simpler it would have been had they had an
> independent btrfs of say a TB or two for each system they were
> backing up.

That is the general alternative to a single large pool/volume:
sharding/chunking of filetrees, sometimes, like with Lustre or
Ceph etc. with a "metafilesystem" layer on top.

Done manually my suggestion is to do the sharding per-week (or
other suitable period) rather than per-system, in a circular
"crop rotation" scheme. So that once a volume has been filled,
it becomes read-only and can even be unmounted until it needs
to be reused:

  http://www.sabi.co.uk/blog/12-fou.html?121218b#121218b

Then there is the problem that "a TB or two" is less easy with
increasing disk capacities, but then I think that disks with a
capacity larger than 1TB are not suitable for ordinary
workloads, and more for tape-cartridge like usage.

> What would they have done had the btrfs gone bad and needed
> repaired? [ ... ]

In most cases I have seen of designs aimed at achieving the
lowest cost and highest flexibility "low IOPS single poool" at
the expense of scalability and maintainability, the "clever"
designer had been promoted or had 

Re: Shrinking a device - performance?

2017-03-31 Thread Peter Grandi
> Can you try to first dedup the btrfs volume?  This is probably
> out of date, but you could try one of these: [ ... ] Yep,
> that's probably a lot of work. [ ... ] My recollection is that
> btrfs handles deduplication differently than zfs, but both of
> them can be very, very slow

But the big deal there is that dedup is indeed a very expensive
operation, even worse than 'balance'. A balanced, deduped volume
will shrink faster in most cases, but the time taken simply
moved from shrinking to preparing.
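
In btrfs terms the "preparing" plus the shrink would be something like
this (mount point and amount are only an example):

  # pack data out of mostly-empty chunks so whole chunks become unallocated
  btrfs balance start -dusage=50 /mnt/big
  # then shrink the filesystem, here by 2 TiB; the LV/partition is shrunk after
  btrfs filesystem resize -2t /mnt/big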

> Again, I'm not an expert in btrfs, but in most cases a full
> balance and scrub takes care of any problems on the root
> partition, but that is a relatively small partition.  A full
> balance (without the options) and scrub on 20 TiB must take a
> very long time even with robust hardware, would it not?

There have been reports of several months for volumes of that
size subject to ordinary workload.

> CentOS, Redhat, and Oracle seem to take the position that very
> large data subvolumes using btrfs should work fine.

This is a long-standing controversy, and for example there have
been "interesting" debates in the XFS mailing list. Btrfs in
this is not really different from others, with one major
difference in context, that many Btrfs developers work for a
company that relies on large numbers of small servers, to the
point that fixing multidevice issues has not been a priority.

The controversy of large volumes is that while no doubt the
logical structures of recent filesystem types can support single
volumes of many petabytes (or even much larger), and such
volumes have indeed been created and "work"-ish, so they are
unquestionably "syntactically valid", the tradeoffs involved
especially as to maintainability may mean that they don't "work"
well and sustainably so.

The fundamental issue is metadata: while the logical structures,
using 48-64 bit pointers, unquestionably scale "syntactically",
they don't scale pragmatically when considering whole-volume
maintenance like checking, repair, balancing, scrubbing,
indexing (which includes making incremental backups etc.).

Note: large volumes don't have just a speed problem for
whole-volume operations, they also have a memory problem, as
most tools hold an in-memory copy of the metadata. There have been
cases where indexing or repair of a volume requires a lot more
RAM (many hundreds GiB or some TiB of RAM) than the system on
which the volume was being used.

The problem is of course smaller if the large volume contains
mostly large files, and bigger if the volume is stored on low
IOPS-per-TB devices and used on small-memory systems. But even
with large files, where filetree object metadata (inodes etc.)
are relatively few, space metadata must eventually, at least
potentially, resolve down to single sectors, and that can be a
lot of metadata unless both used and free space are very
unfragmented.

The fundamental technological issue is: *data* IO rates, in both
random IOPS and sequential ones, can be scaled "almost" linearly
by parallelizing them using RAID or equivalent, allowing large
volumes to serve scalably large and parallel *data* workloads,
but *metadata* IO rates cannot be easily parallelized, because
metadata structures are graphs, not arrays of bytes like files.

So a large volume on 100 storage devices can serve in parallel a
significant percentage of 100 times the data workload of a small
volume on 1 storage device, but not so much for the metadata
workload.

For example, I have never seen a parallel 'fsck' tool that can
take advantage of 100 storage devices to complete a scan of a
single volume on 100 storage devices in not much longer time
than the scan of a volume on one of the storage devices.

> But I would be curious what the rest of the list thinks about
> 20 TiB in one volume/subvolume.

Personally I think that while volumes of many petabytes "work"
syntactically, there are serious maintainability problems (which
I have seen happen at a number of sites) with volumes larger
than 4TB-8TB with any current local filesystem design.

That depends also on number/size of storage devices, and their
nature, that is IOPS, as after all metadata workloads do scale a
bit with number of available IOPS, even if far more slowly than
data workloads.

For example I think that an 8TB volume is not desirable on a
single 8TB disk for ordinary workloads (but then I think that
disks above 1-2TB are just not suitable for ordinary filesystem
workloads), but with lots of smaller/faster disks a 12TB volume
would probably be acceptable, and maybe a number of flash SSDs
might make acceptable even a 20TB volume.

Of course there are lots of people who know better. :-)


Re: Shrinking a device - performance?

2017-03-31 Thread Peter Grandi
>>> The way btrfs is designed I'd actually expect shrinking to
>>> be fast in most cases. [ ... ]

>> The proposed "move whole chunks" implementation helps only if
>> there are enough unallocated chunks "below the line". If regular
>> 'balance' is done on the filesystem there will be some, but that
>> just spreads the cost of the 'balance' across time, it does not
>> by itself make a «risky, difficult, slow operation» any less so,
>> just spreads the risk, difficulty, slowness across time.

> Isn't that too pessimistic?

Maybe, it depends on the workload impacting the volume and how
much it churns the free/unallocated situation.

> Most of my filesystems have 90+% of free space unallocated,
> even those I never run balance on.

That seems quite lucky to me, as that is definitely not my experience
or even my expectation in the general case: in my laptop and
desktop with relatively few updates I have to run 'balance'
fairly frequently, and "Knorrie" has produced a nice tool that
produces a graphical map of free vs. unallocated space, and most
examples and users find quite a bit of balancing needs to be
done.
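
That situation is easy to check, and where needed a filtered balance
reclaims the nearly empty chunks (path illustrative):

  # compare "Device allocated" against what is actually used
  btrfs filesystem usage /mnt/vol
  # rewrite only data chunks that are at most 10% full
  btrfs balance start -dusage=10 /mnt/vol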

> For me it wouldn't just spread the cost, it would reduce it
> considerably.

In your case the cost of the implicit or explicit 'balance'
simply does not arise because 'balance' is not necessary, and
then moving whole chunks is indeed cheap. The argument here is
in part whether used space (extents) or allocated space (chunks)
is more fragmented as well as the amount of metadata to update
in either case.


Re: Shrinking a device - performance?

2017-03-30 Thread Duncan
Duncan posted on Fri, 31 Mar 2017 05:26:39 + as excerpted:

> Compare that to the current thread where someone's trying to do a resize
> of a 20+ TB btrfs and it was looking to take a week, due to the massive
> size and the slow speed of balance on his highly reflinked filesystem on
> spinning rust.

Heh, /this/ thread.  =:^)  I obviously lost track of the thread I was 
replying to.

Which in a way makes the reply even more forceful, as it's obviously 
generically targeted, not just at this thread.  Even if I were so devious 
as to arrange that deliberately (I'm not and I didn't, FWIW, but of 
course if you suspect that then this assurance won't mean much either).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Shrinking a device - performance?

2017-03-30 Thread Duncan
GWB posted on Thu, 30 Mar 2017 20:00:22 -0500 as excerpted:

> CentOS, Redhat, and Oracle seem to take the position that very large
> data subvolumes using btrfs should work fine.  But I would be curious
> what the rest of the list thinks about 20 TiB in one volume/subvolume.

To be sure I'm a biased voice here, as I have multiple independent btrfs 
on multiple partitions here, with no btrfs over 100 GiB in size, and 
that's on ssd so maintenance commands normally return in minutes or even 
seconds, not the hours to days or even weeks it takes on multi-TB btrfs 
on spinning rust.  But FWIW...

IMO there are two rules favoring multiple relatively smaller btrfs over 
single far larger btrfs:

1) Don't put all your data eggs in one basket, especially when that 
basket isn't yet entirely stable and mature.

A mantra commonly repeated on this list is that btrfs is still 
stabilizing, not fully stable and mature, the result being that keeping 
backups of any data you value more than the time/cost/hassle-factor of 
the backup, and being practically prepared to use them, is even *MORE* 
important than it is on fully mature and stable filesystems.  If 
potential users aren't prepared to do that, flat answer, they should be 
looking at other filesystems, tho in reality, that rule applies to stable 
and mature filesystems too, and any good sysadmin understands that not 
having a backup is in reality defining the data in question as worth less 
than the cost of that backup, regardless of any protests to the contrary.

Based on that and the fact that if this less than 100% stable and mature 
filesystem fails, all those subvolumes and snapshots you painstakingly 
created aren't going to matter, it's all up in smoke, it just makes sense 
to subdivide that data roughly along functional lines and split it up 
into multiple independent btrfs, so that if a filesystem fails, it'll 
take only a fraction of the total data with it, and restoring/repairing/
rebuilding will hopefully only have to be done on a small fraction of 
that data.

Which brings us to rule #2:

2) Don't make your filesystems so large that any maintenance on them, 
including both filesystem maintenance like btrfs balance/scrub/check/
whatever, and normal backup and restore operations, takes impractically 
long, where "impractically" can be reasonably defined as so long it 
discourages you from doing them in the first place and/or so long that 
it's going to cause unwarranted downtime.

Some years ago, before I started using btrfs and while I was using 
mdraid, I learned this one the hard way.  I had a bunch of rather large 
mdraids setup, each with multiple partitions and filesystems[1].  This 
was before mdraid got proper write-intent bitmap support, so after a 
crash, I'd have to repair any of these large mdraids that had been active 
at the time, a process taking hours, even for the primary one containing 
root and /home, because it contained for example a large media partition 
that was unlikely to have been mounted at the same time.

After getting tired of this I redid things, putting each partition/
filesystem on its own mdraid.  Then it would take only a few minutes each 
for the mdraids for root, /home and /var/log, and I could be back in 
business with them in half an hour or so, instead of the couple hours I 
had to wait before, to get the bigger mdraid back up and repaired.  Sure, 
if the much larger media raid was active and the partition mounted too, 
I'd still have it to repair, but I could do that in the background.  And 
there was a good chance it was /not/ active and mounted at the time of 
the crash and thus didn't need repaired, saving that time entirely! =:^)

Eventually I arranged things so I could keep root mounted read-only 
unless I was updating it, and that's still the way I run it today.  That 
makes it very nice when a crash impairs /home and /var/log, since there's 
much less chance root was affected, and with a normal root mount, at 
least I have my full normal system available to me, including the latest 
installed btrfs-progs, and manpages and text-mode browsers such as lynx 
available to me to help troubleshoot, that aren't normally available in 
typical distros' rescue modes.

Meanwhile, a scrub (my btrfs but for /boot are raid1 both data and 
metadata, and /boot is mixed-mode dup, so scrub can normally repair crash 
damage getting the two mirrors out of sync) of root takes only ~10 
seconds, a scrub of /home takes only ~45 seconds, and a scrub of /var/log 
is normally done nearly as fast as I hit enter on the command.  
Similarly, btrfs balance and btrfs check normally run in under a minute, 
partly because I'm on ssd, and partly because those three filesystems are 
all well under 50 GiB each.

Of course I may have to run two or three scrubs, depending on what was 
mounted writable at the time of the crash, and I've had /home and /var/
log (but not root as it's read-only by default) go unmountable until 
repaired a couple 

Re: Shrinking a device - performance?

2017-03-30 Thread GWB
Hello, Christian,

I very much enjoyed the discussion you sparked with your original
post.  My ability in btrfs is very limited, much less than the others
who have replied here, so this may not be much help.

Let us assume that you have been able to shrink the device to the size
you need, and you are now merrily on your way to moving the data to
XFS.  If so, ignore this email, delete, whatever, and read no further.

If that is not the case, perhaps try something like the following.

Can you try to first dedup the btrfs volume?  This is probably out of
date, but you could try one of these:

https://btrfs.wiki.kernel.org/index.php/Deduplication
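
For example, with an out-of-band tool such as duperemove (assuming it
is installed; the path is made up), a dedup pass looks roughly like:

  # hash file contents, then submit the dedup ioctls for duplicates found;
  # -d actually dedupes, -r recurses, -h prints human-readable sizes
  duperemove -dhr /mnt/big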

If that does not work, this is a longer shot, but you might consider
adding an intermediate step of creating yet another btrfs volume on
the underlying lvm2 device mapper, turning on dedup, compression, and
whatever else can squeeze some extra space out of the current btrfs
volume.  You could then try to copy over files and see if you get the
results you need (or try sending the current btrfs volume as a
snapshot, but I'm guessing 20TB is too much).
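
A rough sketch of that intermediate step, assuming the volume group
still has room for a staging LV (all names and sizes invented):

  # carve a staging LV out of the same VG and put a compressed btrfs on it
  lvcreate -L 10T -n staging vg0
  mkfs.btrfs -L staging /dev/vg0/staging
  mount -o compress=zlib /dev/vg0/staging /mnt/staging
  # copy the data across, preserving attributes
  rsync -aHAX /mnt/old/ /mnt/staging/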

Once the new btrfs volume on top of lvm2 is complete, you could just
delete the old one, and then transfer the (hopefully compressed and
deduped) data to XFS.

Yep, that's probably a lot of work.

I use both btrfs (on root on Ubuntu) and zfs (for data, home), and I
try to do as little as possible with live mounted file systems other
than snapshots.  I avoid sending and receive snapshots from the live
system (mostly zfs, but sometimes btrfs) but instead write increment
snapshots as a file on the backup disks, and then import the
incremental snaps into a backup pool at night.

My recollection is that btrfs handles deduplication differently than
zfs, but both of them can be very, very slow (from the human
perspective; call that what you will; a suboptimal relationship of
the parameters of performance and speed).

The advantage you have is that with lvm you can create a number of
different file systems.  And lvm can also create snapshots.  I think
both zfs and btrfs both have a more "elegant" way of dealing with
snapshots, but lvm allows a file system without that feature to have
it.  Others on the list can tell you about the disadvantages.

I would be curious how it turns out for you.  If you are able to move
the data to XFS running on top of lvm, what is your plan for snapshots
in lvm?

Again, I'm not an expert in btrfs, but in most cases a full balance
and scrub takes care of any problems on the root partition, but that
is a relatively small partition.  A full balance (without the options)
and scrub on 20 TiB must take a very long time even with robust
hardware, would it not?

CentOS, Redhat, and Oracle seem to take the position that very large
data subvolumes using btrfs should work fine.  But I would be curious
what the rest of the list thinks about 20 TiB in one volume/subvolume.

Gordon



On Thu, Mar 30, 2017 at 5:13 PM, Piotr Pawłow  wrote:
>> The proposed "move whole chunks" implementation helps only if
>> there are enough unallocated chunks "below the line". If regular
>> 'balance' is done on the filesystem there will be some, but that
>> just spreads the cost of the 'balance' across time, it does not
>> by itself make a «risky, difficult, slow operation» any less so,
>> just spreads the risk, difficulty, slowness across time.
>
> Isn't that too pessimistic? Most of my filesystems have 90+% of free
> space unallocated, even those I never run balance on. For me it wouldn't
> just spread the cost, it would reduce it considerably.


Re: Shrinking a device - performance?

2017-03-30 Thread Piotr Pawłow
> The proposed "move whole chunks" implementation helps only if
> there are enough unallocated chunks "below the line". If regular
> 'balance' is done on the filesystem there will be some, but that
> just spreads the cost of the 'balance' across time, it does not
> by itself make a «risky, difficult, slow operation» any less so,
> just spreads the risk, difficulty, slowness across time.

Isn't that too pessimistic? Most of my filesystems have 90+% of free
space unallocated, even those I never run balance on. For me it wouldn't
just spread the cost, it would reduce it considerably.


Re: Shrinking a device - performance?

2017-03-30 Thread Peter Grandi
>> My guess is that very complex risky slow operations like that are
>> provided by "clever" filesystem developers for "marketing" purposes,
>> to win box-ticking competitions. That applies to those system
>> developers who do know better; I suspect that even some filesystem
>> developers are "optimistic" as to what they can actually achieve.

> There are cases where there really is no other sane option. Not
> everyone has the kind of budget needed for proper HA setups,

Thanks for letting me know, that must have never occurred to me, just as
it must have never occurred to me that some people expect extremely
advanced features that imply big-budget high-IOPS high-reliability
storage to be fast and reliable on small-budget storage too :-)

> and if you need maximal uptime and as a result have to reprovision the
> system online, then you pretty much need a filesystem that supports
> online shrinking.

That's a bigger topic than we can address here. The topic used to be
known in one related domain as "Very Large Databases", which were
defined as databases so large and critical that the time needed for
maintenance and backup was too long for taking them offline etc.;
that is a topic that has largely vanished from discussion, I guess
because most management just don't want to hear it :-).

> Also, it's not really all that slow on most filesystems, BTRFS is just
> hurt by its comparatively poor performance, and the COW metadata
> updates that are needed.

Btrfs in realistic situations has pretty good speed *and* performance,
and COW actually helps, as it often results in less head repositioning
than update-in-place. What makes it a bit slower with metadata is having
'dup' by default to recover from especially damaging bitflips in
metadata, but then that does not impact performance, only speed.

>> That feature set is arguably not appropriate for VM images, but
>> lots of people know better :-).

> That depends on a lot of factors.  I have no issues personally running
> small VM images on BTRFS, but I'm also running on decent SSD's
> (>500MB/s read and write speeds), using sparse files, and keeping on
> top of managing them. [ ... ]

Having (relatively) big-budget high-IOPS storage for high-IOPS workloads
helps, that must have never occurred to me either :-).

>> XFS and 'ext4' are essentially equivalent, except for the fixed-size
>> inode table limitation of 'ext4' (and XFS reportedly has finer
>> grained locking). Btrfs is nearly as good as either on most workloads
>> in single-device mode [ ... ]

> No, if you look at actual data, [ ... ]

Well, I have looked at actual data in many published but often poorly
made "benchmarks", and to me they seem they seem quite equivalent
indeed, within somewhat differently shaped performance envelopes, so the
results depend on the testing point within that envelope. I have been
done my own simplistic actual data gathering, most recently here:

  http://www.sabi.co.uk/blog/17-one.html?170302#170302
  http://www.sabi.co.uk/blog/17-one.html?170228#170228

and however simplistic, they are fairly informative (and for writes they
point a finger at a layer below the filesystem type).
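
(The sort of "simplistic" probe meant here is on the order of the
following, which is purely illustrative and not the exact tests behind
the links above; the target path is a placeholder, and caches should be
dropped before the read-back or it will come from RAM:)

  $  dd if=/dev/zero of=/srv/test/seq.img bs=1M count=4096 conv=fsync    # sequential write, flushed at the end
  $  dd if=/srv/test/seq.img of=/dev/null bs=1M                          # sequential read back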

[ ... ]

>> "Flexibility" in filesystems, especially on rotating disk
>> storage with extremely anisotropic performance envelopes, is
>> very expensive, but of course lots of people know better :-).

> Time is not free,

Your time seems especially and uniquely precious as you "waste"
as little as possible editing your replies into readability.

> and humans generally prefer to minimize the amount of time they have
> to work on things. This is why ZFS is so popular, it handles most
> errors correctly by itself and usually requires very little human
> intervention for maintenance.

That seems to me a pretty illusion, as it does not contain any magical
AI, just pretty ordinary and limited error correction for trivial cases.

> 'Flexibility' in a filesystem costs some time on a regular basis, but
> can save a huge amount of time in the long run.

Like everything else. The difficulty is having flexibility at scale with
challenging workloads. "An engineer can do for a nickel what any damn
fool can do for a dollar" :-).

> To look at it another way, I have a home server system running BTRFS
> on top of LVM. [ ... ]

But usually home servers have "unchallenging" workloads, and it is
relatively easy to overbudget their storage, because the total absolute
cost is "affordable".


Re: Shrinking a device - performance?

2017-03-30 Thread Peter Grandi
> I’ve glazed over on “Not only that …” … can you make youtube
> video of that :)) [ ... ]  It’s because I’m special :*

Well played again, that's a fairly credible impersonation of a
node.js/mongodb developer :-).

> On a real note, thanks [ ... ] too much of open source stuff is
> based on short comments :/

Yes... In part that's because the "sw engineering" aspect of
programming takes a lot of time that unpaid volunteers sometimes
cannot afford to take, in part though I have noticed sometimes
free sw authors who do get paid to do free sw act as if they had
a policy of obfuscation to protect their turf/jobs.

Regardless, mailing lists, IRC channel logs, wikis, personal
blogs, search engines allow a mosaic of lore to form, which
in part remedies the situation, and here we are :-).


Re: Shrinking a device - performance?

2017-03-30 Thread Peter Grandi
>> As a general consideration, shrinking a large filetree online
>> in-place is an amazingly risky, difficult, slow operation and
>> should be a last desperate resort (as apparently in this case),
>> regardless of the filesystem type, and expecting otherwise is
>> "optimistic".

> The way btrfs is designed I'd actually expect shrinking to be
> fast in most cases. It could probably be done by moving whole
> chunks at near platter speed, [ ... ] It just hasn't been
> implemented yet.

That seems to me a rather "optimistic" argument, as most of the
cost of shrinking is the 'balance' to pack extents into chunks.

As that thread implies, the current implementation in effect
does a "balance" while shrinking, by moving extents from chunks
"above the line" to free space in chunks "below the line".

The proposed "move whole chunks" implementation helps only if
there are enough unallocated chunks "below the line". If regular
'balance' is done on the filesystem there will be some, but that
just spreads the cost of the 'balance' across time, it does not
by itself make a «risky, difficult, slow operation» any less so,
just spreads the risk, difficulty, slowness across time.
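
(In concrete terms, the "pack extents into chunks" part is what a
filtered 'balance' does; a rough sketch of running it explicitly before
a shrink, with the mount point and target size below as placeholders:)

  $  btrfs balance start -dusage=50 /srv/backy    # repack data chunks that are at most 50% used
  $  btrfs filesystem resize 1:20T /srv/backy     # then shrink devid 1 to the target size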

More generally, one of the downsides of Btrfs is that because of
its two-level design (allocated/unallocated chunks, used/free nodes
or blocks) it requires regular 'balance' runs more than most other
designs do, and those are indeed «risky, difficult, slow».

Compare an even more COW-oriented design like NILFS2, which similarly
requires, though a bit less often, running its garbage collector, which
is also «risky, difficult, slow». Just like in Btrfs that is a tradeoff
that shrinks the performance envelope in one direction and expands it
in another.

But in the case of Btrfs it shrinks it perhaps a bit more than it
expands it, as the added flexibility of having chunk-based
'profiles' is only very partially taken advantage of.


Re: Shrinking a device - performance?

2017-03-30 Thread Piotr Pawłow
> As a general consideration, shrinking a large filetree online
> in-place is an amazingly risky, difficult, slow operation and
> should be a last desperate resort (as apparently in this case),
> regardless of the filesystem type, and expecting otherwise is
> "optimistic".

The way btrfs is designed I'd actually expect shrinking to be fast in
most cases. It could probably be done by moving whole chunks at near
platter speed, instead of extent-by-extent as it is done now, as long as
there is enough free space. There was a discussion about it already:
http://www.spinics.net/lists/linux-btrfs/msg38608.html. It just hasn't
been implemented yet.



Re: Shrinking a device - performance?

2017-03-28 Thread Austin S. Hemmelgarn

On 2017-03-28 10:43, Peter Grandi wrote:

This is going to be long because I am writing something detailed
hoping pointlessly that someone in the future will find it by
searching the list archives while doing research before setting
up a new storage system, and they will be the kind of person
that tolerates reading messages longer than Twitter. :-).


I’m currently shrinking a device and it seems that the
performance of shrink is abysmal.


When I read this kind of statement I am reminded of all the
cases where someone left me to decatastrophize a storage system
built on "optimistic" assumptions. The usual "optimism" is what
I call the "syntactic approach", that is the axiomatic belief
that any syntactically valid combination of features not only
will "work", but very fast too and reliably despite slow cheap
hardware and "unattentive" configuration. Some people call that
the expectation that system developers provide or should provide
an "O_PONIES" option. In particular I get very saddened when
people use "performance" to mean "speed", as the difference
between the two is very great.

As a general consideration, shrinking a large filetree online
in-place is an amazingly risky, difficult, slow operation and
should be a last desperate resort (as apparently in this case),
regardless of the filesystem type, and expecting otherwise is
"optimistic".

My guess is that very complex risky slow operations like that
are provided by "clever" filesystem developers for "marketing"
purposes, to win box-ticking competitions. That applies to those
system developers who do know better; I suspect that even some
filesystem developers are "optimistic" as to what they can
actually achieve.
There are cases where there really is no other sane option.  Not 
everyone has the kind of budget needed for proper HA setups, and if you 
need maximal uptime and as a result have to reprovision the system 
online, then you pretty much need a filesystem that supports online 
shrinking.  Also, it's not really all that slow on most filesystems,
BTRFS is just hurt by its comparatively poor performance, and the COW
metadata updates that are needed.



I intended to shrink a ~22TiB filesystem down to 20TiB. This is
still using LVM underneath so that I can’t just remove a device
from the filesystem but have to use the resize command.


That is actually a very good idea because Btrfs multi-device is
not quite as reliable as DM/LVM2 multi-device.
This depends on how much you trust your storage hardware relative to how 
much you trust the kernel code.  For raid5/6, yes, BTRFS multi-device is 
currently crap.  For most people raid10 in BTRFS is too.  For raid1 mode 
however, it really is personal opinion.



Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
   Total devices 1 FS bytes used 18.21TiB
   devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy


Maybe 'balance' should have been used a bit more.


This has been running since last Thursday, so roughly 3.5days
now. The “used” number in devid1 has moved about 1TiB in this
time. The filesystem is seeing regular usage (read and write)
and when I’m suspending any application traffic I see about
1GiB of movement every now and then. Maybe once every 30
seconds or so. Does this sound fishy or normal to you?


With consistent "optimism" this is a request to assess whether
"performance" of some operations is adequate on a filetree
without telling us either what the filetree contents look like,
what the regular workload is, or what the storage layer looks
like.

Being one of the few system administrators crippled by lack of
psychic powers :-), I rely on guesses and inferences here, and
having read the whole thread containing some belated details.

From the ~22TB total capacity my guess is that the storage layer
involves rotating hard disks, and from later details the
filesystem contents seems to be heavily reflinked files of
several GB in size, and workload seems to be backups to those
files from several source hosts. Considering the general level
of "optimism" in the situation my wild guess is that the storage
layer is based on large slow cheap rotating disks in the 4TB-8TB
range, with very low IOPS-per-TB.


Thanks for that info. The 1min per 1GiB is what I saw too -
the “it can take longer” wasn’t really explainable to me.


A contemporary rotating disk device can do around 0.5MB/s
transfer rate with small random accesses with barriers up to
around 80-160MB/s in purely sequential access without barriers.

1GB/m of simultaneous read-write means around 16MB/s reads plus
16MB/s writes which is fairly good *performance* (even if slow
*speed*) considering that moving extents around, even across
disks, involves quite a bit of randomish same-disk updates of
metadata; because it all usually depends on how many randomish
metadata updates need to be done, on any filesystem type, as those
must be done with barriers.


As I’m not using snapshots: would large files (100+gb)


Using 

Re: Shrinking a device - performance?

2017-03-28 Thread Tomasz Kusmierz
I’ve glazed over on “Not only that …” … can you make youtube video of that :
> On 28 Mar 2017, at 16:06, Peter Grandi  wrote:
> 
>> I glazed over at “This is going to be long” … :)
>>> [ ... ]
> 
> Not only that, you also top-posted while quoting it pointlessly
> in its entirety, to the whole mailing list. Well played :-).
It’s because I’m special :* 

On a real note, thanks for giving a f to provide a detailed comment … too much 
of open source stuff is based on short comments :/



Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
> I glazed over at “This is going to be long” … :)
>> [ ... ]

Not only that, you also top-posted while quoting it pointlessly
in its entirety, to the whole mailing list. Well played :-).


Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
>  [ ... ] slaps together a large storage system in the cheapest
> and quickest way knowing that while it is mostly empty it will
> seem very fast regardless and therefore appear to have awesome
> performance, and then the "clever" sysadm disappears surrounded
> by a halo of glory before the storage system gets full workload
> and fills up; [ ... ]

Fortunately or unfortunately Btrfs is particularly suitable for
this technique, as it has an enormous number of checkbox-ticking
awesome-looking features: transparent compression, dynamic
add/remove, online balance/scrub, different sized member devices,
online grow/shrink, online defrag, limitless scalability, online
dedup, arbitrary subvolumes and snapshots, COW and reflinking,
online conversion of RAID profiles, ... and one can use all of
them at the same time, and for the initial period, while volume
workload is low and not much space is used, it will look absolutely
fantastic, cheap, flexible, always available, fast, the work of
genius of a very cool sysadm.


Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
> [ ... ] reminded of all the cases where someone left me to
> decatastrophize a storage system built on "optimistic"
> assumptions.

In particular when some "clever" sysadm with a "clever" (or
dumb) manager slaps together a large storage system in the
cheapest and quickest way knowing that while it is mostly empty
it will seem very fast regardless and therefore appear to have awesome
performance, and then the "clever" sysadm disappears surrounded
by a halo of glory before the storage system gets full workload
and fills up; when that happens usually I get to inherit it.
BTW The same technique also can be done with HPC clusters.

>> I intended to shrink a ~22TiB filesystem down to 20TiB. This
>> is still using LVM underneath so that I can’t just remove a
>> device from the filesystem but have to use the resize
>> command.

>> Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
>> Total devices 1 FS bytes used 18.21TiB
>> devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

Ahh it is indeed a filled up storage system now running a full
workload. At least it wasn't me who inherited it this time. :-)


Re: Shrinking a device - performance?

2017-03-28 Thread Tomasz Kusmierz
I glazed over at “This is going to be long” … :)

> On 28 Mar 2017, at 15:43, Peter Grandi  wrote:
> 
> This is going to be long because I am writing something detailed
> hoping pointlessly that someone in the future will find it by
> searching the list archives while doing research before setting
> up a new storage system, and they will be the kind of person
> that tolerates reading messages longer than Twitter. :-).
> 
>> I’m currently shrinking a device and it seems that the
>> performance of shrink is abysmal.
> 
> When I read this kind of statement I am reminded of all the
> cases where someone left me to decatastrophize a storage system
> built on "optimistic" assumptions. The usual "optimism" is what
> I call the "syntactic approach", that is the axiomatic belief
> that any syntactically valid combination of features not only
> will "work", but very fast too and reliably despite slow cheap
> hardware and "unattentive" configuration. Some people call that
> the expectation that system developers provide or should provide
> an "O_PONIES" option. In particular I get very saddened when
> people use "performance" to mean "speed", as the difference
> between the two is very great.
> 
> As a general consideration, shrinking a large filetree online
> in-place is an amazingly risky, difficult, slow operation and
> should be a last desperate resort (as apparently in this case),
> regardless of the filesystem type, and expecting otherwise is
> "optimistic".
> 
> My guess is that very complex risky slow operations like that
> are provided by "clever" filesystem developers for "marketing"
> purposes, to win box-ticking competitions. That applies to those
> system developers who do know better; I suspect that even some
> filesystem developers are "optimistic" as to what they can
> actually achieve.
> 
>> I intended to shrink a ~22TiB filesystem down to 20TiB. This is
>> still using LVM underneath so that I can’t just remove a device
>> from the filesystem but have to use the resize command.
> 
> That is actually a very good idea because Btrfs multi-device is
> not quite as reliable as DM/LVM2 multi-device.
> 
>> Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
>>   Total devices 1 FS bytes used 18.21TiB
>>   devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy
> 
> Maybe 'balance' should have been used a bit more.
> 
>> This has been running since last Thursday, so roughly 3.5days
>> now. The “used” number in devid1 has moved about 1TiB in this
>> time. The filesystem is seeing regular usage (read and write)
>> and when I’m suspending any application traffic I see about
>> 1GiB of movement every now and then. Maybe once every 30
>> seconds or so. Does this sound fishy or normal to you?
> 
> With consistent "optimism" this is a request to assess whether
> "performance" of some operations is adequate on a filetree
> without telling us either what the filetree contents look like,
> what the regular workload is, or what the storage layer looks
> like.
> 
> Being one of the few system administrators crippled by lack of
> psychic powers :-), I rely on guesses and inferences here, and
> having read the whole thread containing some belated details.
> 
> From the ~22TB total capacity my guess is that the storage layer
> involves rotating hard disks, and from later details the
> filesystem contents seems to be heavily reflinked files of
> several GB in size, and workload seems to be backups to those
> files from several source hosts. Considering the general level
> of "optimism" in the situation my wild guess is that the storage
> layer is based on large slow cheap rotating disks in the 4TB-8TB
> range, with very low IOPS-per-TB.
> 
>> Thanks for that info. The 1min per 1GiB is what I saw too -
>> the “it can take longer” wasn’t really explainable to me.
> 
> A contemporary rotating disk device can do around 0.5MB/s
> transfer rate with small random accesses with barriers up to
> around 80-160MB/s in purely sequential access without barriers.
> 
> 1GB/m of simultaneous read-write means around 16MB/s reads plus
> 16MB/s writes which is fairly good *performance* (even if slow
> *speed*) considering that moving extents around, even across
> disks, involves quite a bit of randomish same-disk updates of
> metadata; because it all usually depends on how many randomish
> metadata updates need to be done, on any filesystem type, as those
> must be done with barriers.
> 
>> As I’m not using snapshots: would large files (100+gb)
> 
> Using 100GB sized VM virtual disks (never mind with COW) seems
> very unwise to me to start with, but of course a lot of other
> people know better :-). Just like a lot of other people know
> better that large single pool storage systems are awesome in
> every respect :-): cost, reliability, speed, flexibility,
> maintenance, etc.
> 
>> with long chains of CoW history (specifically reflink copies)
>> also hurt?
> 
> Oh yes... They are about one 

Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
This is going to be long because I am writing something detailed
hoping pointlessly that someone in the future will find it by
searching the list archives while doing research before setting
up a new storage system, and they will be the kind of person
that tolerates reading messages longer than Twitter. :-).

> I’m currently shrinking a device and it seems that the
> performance of shrink is abysmal.

When I read this kind of statement I am reminded of all the
cases where someone left me to decatastrophize a storage system
built on "optimistic" assumptions. The usual "optimism" is what
I call the "syntactic approach", that is the axiomatic belief
that any syntactically valid combination of features not only
will "work", but very fast too and reliably despite slow cheap
hardware and "unattentive" configuration. Some people call that
the expectation that system developers provide or should provide
an "O_PONIES" option. In particular I get very saddened when
people use "performance" to mean "speed", as the difference
between the two is very great.

As a general consideration, shrinking a large filetree online
in-place is an amazingly risky, difficult, slow operation and
should be a last desperate resort (as apparently in this case),
regardless of the filesystem type, and expecting otherwise is
"optimistic".

My guess is that very complex risky slow operations like that
are provided by "clever" filesystem developers for "marketing"
purposes, to win box-ticking competitions. That applies to those
system developers who do know better; I suspect that even some
filesystem developers are "optimistic" as to what they can
actually achieve.

> I intended to shrink a ~22TiB filesystem down to 20TiB. This is
> still using LVM underneath so that I can’t just remove a device
> from the filesystem but have to use the resize command.

That is actually a very good idea because Btrfs multi-device is
not quite as reliable as DM/LVM2 multi-device.

> Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
>Total devices 1 FS bytes used 18.21TiB
>devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

Maybe 'balance' should have been used a bit more.

> This has been running since last Thursday, so roughly 3.5days
> now. The “used” number in devid1 has moved about 1TiB in this
> time. The filesystem is seeing regular usage (read and write)
> and when I’m suspending any application traffic I see about
> 1GiB of movement every now and then. Maybe once every 30
> seconds or so. Does this sound fishy or normal to you?

With consistent "optimism" this is a request to assess whether
"performance" of some operations is adequate on a filetree
without telling us either what the filetree contents look like,
what the regular workload is, or what the storage layer looks
like.

Being one of the few system administrators crippled by lack of
psychic powers :-), I rely on guesses and inferences here, and
having read the whole thread containing some belated details.

From the ~22TB total capacity my guess is that the storage layer
involves rotating hard disks, and from later details the
filesystem contents seems to be heavily reflinked files of
several GB in size, and workload seems to be backups to those
files from several source hosts. Considering the general level
of "optimism" in the situation my wild guess is that the storage
layer is based on large slow cheap rotating disks in the 4TB-8TB
range, with very low IOPS-per-TB.

> Thanks for that info. The 1min per 1GiB is what I saw too -
> the “it can take longer” wasn’t really explainable to me.

A contemporary rotating disk device can do around 0.5MB/s
transfer rate with small random accesses with barriers up to
around 80-160MB/s in purely sequential access without barriers.

1GB/m of simultaneous read-write means around 16MB/s reads plus
16MB/s writes which is fairly good *performance* (even if slow
*speed*) considering that moving extents around, even across
disks, involves quite a bit of randomish same-disk updates of
metadata; because it all usually depends on how many randomish
metadata updates need to be done, on any filesystem type, as those
must be done with barriers.
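
(Back of the envelope, conflating decimal GB and binary GiB since this
is only rough:)

  $  echo $(( 1024 * 1024 / 60 ))    # KiB/s corresponding to 1GiB per minute
  17476

that is roughly 17MB/s of reads plus the same again of writes on the
same spindles, which is where the figure above comes from.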

> As I’m not using snapshots: would large files (100+gb)

Using 100GB sized VM virtual disks (never mind with COW) seems
very unwise to me to start with, but of course a lot of other
people know better :-). Just like a lot of other people know
better that large single pool storage systems are awesome in
every respect :-): cost, reliability, speed, flexibility,
maintenance, etc.

> with long chains of CoW history (specifically reflink copies)
> also hurt?

Oh yes... They are about one of the worst cases for using
Btrfs. But also very "optimistic" to think that kind of stuff
can work awesomely on *any* filesystem type.

> Something I’d like to verify: does having traffic on the
> volume have the potential to delay this infinitely? [ ... ]
> it’s just slow and we’re looking forward to about 

Re: Shrinking a device - performance?

2017-03-27 Thread Roman Mamedov
On Mon, 27 Mar 2017 16:49:47 +0200
Christian Theune  wrote:

> Also: the idea of migrating on btrfs also has its downside - the performance 
> of “mkdir” and “fsync” is abysmal at the moment. I’m waiting for the current 
> shrinking job to finish but this is likely limited to the “find free space” 
> algorithm. We’re talking about a few megabytes converted per second. Sigh.

Btw since this is all on LVM already, you could set up lvmcache with a small
SSD-based cache volume. Even some old 60GB SSD would work wonders for
performance, and with the cache policy of "writethrough" you don't have to
worry about its reliability (much).
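
A rough sketch of what that could look like, assuming the VG is the
'vgsys' seen earlier in the thread; the SSD device name, cache LV names
and sizes below are only placeholders:

  $  pvcreate /dev/sdX
  $  vgextend vgsys /dev/sdX
  $  lvcreate -L 55G -n backy_cache vgsys /dev/sdX
  $  lvcreate -L 1G -n backy_cache_meta vgsys /dev/sdX
  $  lvconvert --type cache-pool --poolmetadata vgsys/backy_cache_meta vgsys/backy_cache
  $  lvconvert --type cache --cachepool vgsys/backy_cache --cachemode writethrough vgsys/backy

With "writethrough" every write still hits the HDDs before it completes,
so losing the SSD only costs the cache, not the data.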

-- 
With respect,
Roman


Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 4:48 PM, Roman Mamedov  wrote:
> 
> On Mon, 27 Mar 2017 15:20:37 +0200
> Christian Theune  wrote:
> 
>> (Background info: we’re migrating large volumes from btrfs to xfs and can
>> only do this step by step: copying some data, shrinking the btrfs volume,
>> extending the xfs volume, rinse repeat. If someone should have any
>> suggestions to speed this up and not having to think in terms of _months_
>> then I’m all ears.)
> 
> I would only suggest that you reconsider XFS. You can't shrink XFS, therefore
> you won't have the flexibility to migrate in the same way to anything better
> that comes along in the future (ZFS perhaps? or even Bcachefs?). XFS does not
> perform that much better over Ext4, and very importantly, Ext4 can be shrunk.

That is true. However, we have moved the expected feature set of the 
filesystem (i.e. CoW) down to “store files safely and reliably” and we’ve seen 
too much breakage with ext4 in the past. Of course “persistence means you’ll 
have to say I’m sorry”, and thus with either choice we may be faced with some 
issue in the future that we might have circumvented with another solution, and 
yes, flexibility is worth a great deal.

We’ve run XFS and ext4 on different (large and small) workloads in the last 2 
years and I have to say I’m much more happy about XFS even with the shrinking 
limitation.

To us ext4 is prohibitive with its fsck performance and we do like the tight 
error checking in XFS.

Thanks for the reminder though - especially for the public archive it is wise 
to make this tradeoff against flexibility known. :-)

Hugs,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Roman Mamedov
On Mon, 27 Mar 2017 15:20:37 +0200
Christian Theune  wrote:

> (Background info: we’re migrating large volumes from btrfs to xfs and can
> only do this step by step: copying some data, shrinking the btrfs volume,
> extending the xfs volume, rinse repeat. If someone should have any
> suggestions to speed this up and not having to think in terms of _months_
> then I’m all ears.)

I would only suggest that you reconsider XFS. You can't shrink XFS, therefore
you won't have the flexibility to migrate in the same way to anything better
that comes along in the future (ZFS perhaps? or even Bcachefs?). XFS does not
perform that much better over Ext4, and very importantly, Ext4 can be shrunk.

From the looks of it Ext4 has also overcome its 16TB limitation:
http://askubuntu.com/questions/779754/how-do-i-resize-an-ext4-partition-beyond-the-16tb-limit
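
(For completeness, since shrinking keeps coming up in this thread, the
usual Ext4 shrink cycle is offline and looks roughly like the lines
below; the LV name, mount point and sizes are placeholders, and the
final resize2fs without a size grows the fs back to exactly fill the
reduced LV.)

  $  umount /srv/backup
  $  e2fsck -f /dev/vgsys/backup
  $  resize2fs /dev/vgsys/backup 19500G    # shrink the fs a little below the target first
  $  lvreduce -L 20T /dev/vgsys/backup     # then reduce the LV to the target
  $  resize2fs /dev/vgsys/backup           # and grow the fs to fit the LV exactly
  $  mount /dev/vgsys/backup /srv/backup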

-- 
With respect,
Roman


Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 4:17 PM, Austin S. Hemmelgarn  
> wrote:
> 
> One other thing that I just thought of:
> For a backup system, assuming some reasonable thinning system is used for the 
> backups, I would personally migrate things slowly over time by putting new 
> backups on the new filesystem, and shrinking the old filesystem as the old 
> backups there get cleaned out.  Unfortunately, most backup software I've seen 
> doesn't handle this well, so it's not all that easy to do, but it does save 
> you from having to migrate data off of the old filesystem, and means you 
> don't have to worry as much about the resize of the old FS taking forever.

Right. This is an option we can do from a software perspective (our own 
solution - https://bitbucket.org/flyingcircus/backy) but our systems in use 
can’t hold all the data twice. Even though we’re migrating to a backend 
implementation that uses less data than before I have to perform an “inplace” 
migration in some way. This is VM block device backup. So basically we migrate 
one VM with all its previous data and that works quite fine with a little 
headroom. However, migrating all VMs to a new “full” backup and then wait for 
the old to shrink would only work if we had a completely empty backup server in 
place, which we don’t.

Also: the idea of migrating on btrfs also has its downside - the performance of 
“mkdir” and “fsync” is abysmal at the moment. I’m waiting for the current 
shrinking job to finish but this is likely limited to the “find free space” 
algorithm. We’re talking about a few megabytes converted per second. Sigh.

Cheers,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Austin S. Hemmelgarn

On 2017-03-27 09:54, Christian Theune wrote:

Hi,


On Mar 27, 2017, at 3:50 PM, Christian Theune  wrote:

Hi,


On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn  wrote:



Something I’d like to verify: does having traffic on the volume have
the potential to delay this infinitely? I.e. does the system write
to any segments that we’re trying to free so it may have to work on
the same chunk over and over again? If not, then this means it’s
just slow and we’re looking forward to about 2 months worth of time
shrinking this volume. (And then again on the next bigger server
probably about 3-4 months).


 I don't know. I would hope not, but I simply don't know enough
about the internal algorithms for that. Maybe someone else can confirm?

I'm not 100% certain, but I believe that while it can delay things, it can't do 
so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
programmer by profession), it looks like writes to chunks that are being 
compacted or moved will go to the new location, not the old one, but writes to 
chunks which aren't being touched by the resize currently will just go to where 
the chunk is currently.  Based on this, lowering the amount of traffic to the 
FS could probably speed things up a bit, but it likely won't help much.


I hoped that this is the strategy implemented, otherwise it would end up in an 
infinite cat-and-mouse game. ;)


(Background info: we’re migrating large volumes from btrfs to xfs
and can only do this step by step: copying some data, shrinking the
btrfs volume, extending the xfs volume, rinse repeat. If someone
should have any suggestions to speed this up and not having to think
in terms of _months_ then I’m all ears.)


 All I can suggest is to move some unused data off the volume and do
it in fewer larger steps. Sorry.

Same.

The other option though is to just schedule a maintenance window, nuke the old 
FS, and restore from a backup.  If you can afford to take the system off-line 
temporarily, this will almost certainly go faster (assuming you have a 
reasonably fast means of restoring backups).


Well. This is the backup. ;)


One strategy that does come to mind: we’re converting our backup from a system 
that uses reflinks to a non-reflink based system. We can convert this in place 
so this would remove all the reflink stuff in the existing filesystem and then 
we maybe can do the FS conversion faster when this isn’t an issue any longer. I 
think I’ll

One other thing that I just thought of:
For a backup system, assuming some reasonable thinning system is used 
for the backups, I would personally migrate things slowly over time by 
putting new backups on the new filesystem, and shrinking the old 
filesystem as the old backups there get cleaned out.  Unfortunately, 
most backup software I've seen doesn't handle this well, so it's not all 
that easy to do, but it does save you from having to migrate data off of 
the old filesystem, and means you don't have to worry as much about the 
resize of the old FS taking forever.



Re: Shrinking a device - performance?

2017-03-27 Thread Austin S. Hemmelgarn

On 2017-03-27 09:50, Christian Theune wrote:

Hi,


On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn  wrote:



Something I’d like to verify: does having traffic on the volume have
the potential to delay this infinitely? I.e. does the system write
to any segments that we’re trying to free so it may have to work on
the same chunk over and over again? If not, then this means it’s
just slow and we’re looking forward to about 2 months worth of time
shrinking this volume. (And then again on the next bigger server
probably about 3-4 months).


  I don't know. I would hope not, but I simply don't know enough
about the internal algorithms for that. Maybe someone else can confirm?

I'm not 100% certain, but I believe that while it can delay things, it can't do 
so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
programmer by profession), it looks like writes to chunks that are being 
compacted or moved will go to the new location, not the old one, but writes to 
chunks which aren't being touched by the resize currently will just go to where 
the chunk is currently.  Based on this, lowering the amount of traffic to the 
FS could probably speed things up a bit, but it likely won't help much.


I hoped that this is the strategy implemented, otherwise it would end up in an 
infinite cat-and-mouse game. ;)
I know that balance and replace work this way, and the code for resize 
appears to handle things similarly to both, so I'm pretty certain it 
works this way.  TBH though, it's really the only sane way to handle 
something like this.



(Background info: we’re migrating large volumes from btrfs to xfs
and can only do this step by step: copying some data, shrinking the
btrfs volume, extending the xfs volume, rinse repeat. If someone
should have any suggestions to speed this up and not having to think
in terms of _months_ then I’m all ears.)


  All I can suggest is to move some unused data off the volume and do
it in fewer larger steps. Sorry.

Same.

The other option though is to just schedule a maintenance window, nuke the old 
FS, and restore from a backup.  If you can afford to take the system off-line 
temporarily, this will almost certainly go faster (assuming you have a 
reasonably fast means of restoring backups).


Well. This is the backup. ;)

Ah, yeah, that does complicate things a bit more.



Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:50 PM, Christian Theune  wrote:
> 
> Hi,
> 
>> On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn  
>> wrote:
>>> 
 Something I’d like to verify: does having traffic on the volume have
 the potential to delay this infinitely? I.e. does the system write
 to any segments that we’re trying to free so it may have to work on
 the same chunk over and over again? If not, then this means it’s
 just slow and we’re looking forward to about 2 months worth of time
 shrinking this volume. (And then again on the next bigger server
 probably about 3-4 months).
>>> 
>>>  I don't know. I would hope not, but I simply don't know enough
>>> about the internal algorithms for that. Maybe someone else can confirm?
>> I'm not 100% certain, but I believe that while it can delay things, it can't 
>> do so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
>> programmer by profession), it looks like writes to chunks that are being 
>> compacted or moved will go to the new location, not the old one, but writes 
>> to chunks which aren't being touched by the resize currently will just go to 
>> where the chunk is currently.  Based on this, lowering the amount of traffic 
>> to the FS could probably speed things up a bit, but it likely won't help 
>> much.
> 
> I hoped that this is the strategy implemented, otherwise it would end up in 
> an infinite cat-and-mouse game. ;)
> 
 (Background info: we’re migrating large volumes from btrfs to xfs
 and can only do this step by step: copying some data, shrinking the
 btrfs volume, extending the xfs volume, rinse repeat. If someone
 should have any suggestions to speed this up and not having to think
 in terms of _months_ then I’m all ears.)
>>> 
>>>  All I can suggest is to move some unused data off the volume and do
>>> it in fewer larger steps. Sorry.
>> Same.
>> 
>> The other option though is to just schedule a maintenance window, nuke the 
>> old FS, and restore from a backup.  If you can afford to take the system 
>> off-line temporarily, this will almost certainly go faster (assuming you 
>> have a reasonably fast means of restoring backups).
> 
> Well. This is the backup. ;)

One strategy that does come to mind: we’re converting our backup from a system 
that uses reflinks to a non-reflink based system. We can convert this in place 
so this would remove all the reflink stuff in the existing filesystem and then 
we maybe can do the FS conversion faster when this isn’t an issue any longer. I 
think I’ll

Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn  
> wrote:
>> 
>>> Something I’d like to verify: does having traffic on the volume have
>>> the potential to delay this infinitely? I.e. does the system write
>>> to any segments that we’re trying to free so it may have to work on
>>> the same chunk over and over again? If not, then this means it’s
>>> just slow and we’re looking forward to about 2 months worth of time
>>> shrinking this volume. (And then again on the next bigger server
>>> probably about 3-4 months).
>> 
>>   I don't know. I would hope not, but I simply don't know enough
>> about the internal algorithms for that. Maybe someone else can confirm?
> I'm not 100% certain, but I believe that while it can delay things, it can't 
> do so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
> programmer by profession), it looks like writes to chunks that are being 
> compacted or moved will go to the new location, not the old one, but writes 
> to chunks which aren't being touched by the resize currently will just go to 
> where the chunk is currently.  Based on this, lowering the amount of traffic 
> to the FS could probably speed things up a bit, but it likely won't help much.

I hoped that this is the strategy implemented, otherwise it would end up in an 
infinite cat-and-mouse game. ;)

>>> (Background info: we’re migrating large volumes from btrfs to xfs
>>> and can only do this step by step: copying some data, shrinking the
>>> btrfs volume, extending the xfs volume, rinse repeat. If someone
>>> should have any suggestions to speed this up and not having to think
>>> in terms of _months_ then I’m all ears.)
>> 
>>   All I can suggest is to move some unused data off the volume and do
>> it in fewer larger steps. Sorry.
> Same.
> 
> The other option though is to just schedule a maintenance window, nuke the 
> old FS, and restore from a backup.  If you can afford to take the system 
> off-line temporarily, this will almost certainly go faster (assuming you have 
> a reasonably fast means of restoring backups).

Well. This is the backup. ;)

Thanks,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Austin S. Hemmelgarn

On 2017-03-27 09:24, Hugo Mills wrote:

On Mon, Mar 27, 2017 at 03:20:37PM +0200, Christian Theune wrote:

Hi,


On Mar 27, 2017, at 3:07 PM, Hugo Mills  wrote:

  On my hardware (consumer HDDs and SATA, RAID-1 over 6 devices), it
takes about a minute to move 1 GiB of data. At that rate, it would
take 1000 minutes (or about 16 hours) to move 1 TiB of data.

  However, there are cases where some items of data can take *much*
longer to move. The biggest of these is when you have lots of
snapshots. When that happens, some (but not all) of the metadata can
take a very long time. In my case, with a couple of hundred snapshots,
some metadata chunks take 4+ hours to move.



Thanks for that info. The 1min per 1GiB is what I saw too - the “it
can take longer” wasn’t really explainable to me.



As I’m not using snapshots: would large files (100+gb) with long
chains of CoW history (specifically reflink copies) also hurt?


   Yes, that's the same issue -- it's to do with the number of times
an extent is shared. Snapshots are one way of creating that sharing,
reflinks are another.
FWIW, I've noticed less of an issue with reflinks than snapshots, but I 
can't comment on this specific case.



Something I’d like to verify: does having traffic on the volume have
the potential to delay this infinitely? I.e. does the system write
to any segments that we’re trying to free so it may have to work on
the same chunk over and over again? If not, then this means it’s
just slow and we’re looking forward to about 2 months worth of time
shrinking this volume. (And then again on the next bigger server
probably about 3-4 months).


   I don't know. I would hope not, but I simply don't know enough
about the internal algorithms for that. Maybe someone else can confirm?
I'm not 100% certain, but I believe that while it can delay things, it 
can't do so infinitely.  AFAICT from looking at the code (disclaimer: I 
am not a C programmer by profession), it looks like writes to chunks 
that are being compacted or moved will go to the new location, not the 
old one, but writes to chunks which aren't being touched by the resize 
currently will just go to where the chunk is currently.  Based on this, 
lowering the amount of traffic to the FS could probably speed things up 
a bit, but it likely won't help much.



(Background info: we’re migrating large volumes from btrfs to xfs
and can only do this step by step: copying some data, shrinking the
btrfs volume, extending the xfs volume, rinse repeat. If someone
should have any suggestions to speed this up and not having to think
in terms of _months_ then I’m all ears.)


   All I can suggest is to move some unused data off the volume and do
it in fewer larger steps. Sorry.

Same.

The other option though is to just schedule a maintenance window, nuke 
the old FS, and restore from a backup.  If you can afford to take the 
system off-line temporarily, this will almost certainly go faster 
(assuming you have a reasonably fast means of restoring backups).



Re: Shrinking a device - performance?

2017-03-27 Thread Hugo Mills
On Mon, Mar 27, 2017 at 03:20:37PM +0200, Christian Theune wrote:
> Hi,
> 
> > On Mar 27, 2017, at 3:07 PM, Hugo Mills  wrote:
> > 
> >   On my hardware (consumer HDDs and SATA, RAID-1 over 6 devices), it
> > takes about a minute to move 1 GiB of data. At that rate, it would
> > take 1000 minutes (or about 16 hours) to move 1 TiB of data.
> > 
> >   However, there are cases where some items of data can take *much*
> > longer to move. The biggest of these is when you have lots of
> > snapshots. When that happens, some (but not all) of the metadata can
> > take a very long time. In my case, with a couple of hundred snapshots,
> > some metadata chunks take 4+ hours to move.

> Thanks for that info. The 1min per 1GiB is what I saw too - the “it
> can take longer” wasn’t really explainable to me.

> As I’m not using snapshots: would large files (100+gb) with long
> chains of CoW history (specifically reflink copies) also hurt?

   Yes, that's the same issue -- it's to do with the number of times
an extent is shared. Snapshots are one way of creating that sharing,
reflinks are another.

> Something I’d like to verify: does having traffic on the volume have
> the potential to delay this infinitely? I.e. does the system write
> to any segments that we’re trying to free so it may have to work on
> the same chunk over and over again? If not, then this means it’s
> just slow and we’re looking forward to about 2 months worth of time
> shrinking this volume. (And then again on the next bigger server
> probably about 3-4 months).

   I don't know. I would hope not, but I simply don't know enough
about the internal algorithms for that. Maybe someone else can confirm?

> (Background info: we’re migrating large volumes from btrfs to xfs
> and can only do this step by step: copying some data, shrinking the
> btrfs volume, extending the xfs volume, rinse repeat. If someone
> should have any suggestions to speed this up and not having to think
> in terms of _months_ then I’m all ears.)

   All I can suggest is to move some unused data off the volume and do
it in fewer larger steps. Sorry.

   Hugo.

-- 
Hugo Mills | Jenkins! Chap with the wings there! Five rounds
hugo@... carfax.org.uk | rapid!
http://carfax.org.uk/  | Brigadier Alistair Lethbridge-Stewart
PGP: E2AB1DE4  |Dr Who and the Daemons




Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:07 PM, Hugo Mills  wrote:
> 
>   On my hardware (consumer HDDs and SATA, RAID-1 over 6 devices), it
> takes about a minute to move 1 GiB of data. At that rate, it would
> take 1000 minutes (or about 16 hours) to move 1 TiB of data.
> 
>   However, there are cases where some items of data can take *much*
> longer to move. The biggest of these is when you have lots of
> snapshots. When that happens, some (but not all) of the metadata can
> take a very long time. In my case, with a couple of hundred snapshots,
> some metadata chunks take 4+ hours to move.

Thanks for that info. The 1min per 1GiB is what I saw too - the “it can take 
longer” wasn’t really explainable to me.

As I’m not using snapshots: would large files (100+gb) with long chains of CoW 
history (specifically reflink copies) also hurt?

Something I’d like to verify: does having traffic on the volume have the 
potential to delay this infinitely? I.e. does the system write to any segments 
that we’re trying to free so it may have to work on the same chunk over and 
over again? If not, then this means it’s just slow and we’re looking forward to 
about 2 months worth of time shrinking this volume. (And then again on the next 
bigger server probably about 3-4 months).

(Background info: we’re migrating large volumes from btrfs to xfs and can only 
do this step by step: copying some data, shrinking the btrfs volume, extending 
the xfs volume, rinse repeat. If someone should have any suggestions to speed 
this up and not having to think in terms of _months_ then I’m all ears.)
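
(To make that cycle concrete, one iteration could be sketched roughly as
below; the xfs LV name, mount points and step size are only placeholders:)

  $  btrfs filesystem resize 1:-2200g /srv/backy    # shrink the btrfs fs a bit more than needed
  $  lvreduce -L -2t vgsys/backy                    # shrink the LV underneath by 2TiB
  $  btrfs filesystem resize 1:max /srv/backy       # grow the fs back to fill the smaller LV
  $  lvextend -l +100%FREE vgsys/backy_xfs          # hand the freed space to the xfs LV
  $  xfs_growfs /srv/backy-xfs                      # and grow xfs online, via its mount point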

Cheers,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Hugo Mills
On Mon, Mar 27, 2017 at 01:17:26PM +0200, Christian Theune wrote:
> Hi,
> 
> I’m currently shrinking a device and it seems that the performance of shrink 
> is abysmal. I intended to shrink a ~22TiB filesystem down to 20TiB. This is 
> still using LVM underneath so that I can’t just remove a device from the 
> filesystem but have to use the resize command.
> 
> Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
> Total devices 1 FS bytes used 18.21TiB
> devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy
> 
> This has been running since last Thursday, so roughly 3.5days now. The “used” 
> number in devid1 has moved about 1TiB in this time. The filesystem is seeing 
> regular usage (read and write) and when I’m suspending any application 
> traffic I see about 1GiB of movement every now and then. Maybe once every 30 
> seconds or so.
> 
> Does this sound fishy or normal to you?

   On my hardware (consumer HDDs and SATA, RAID-1 over 6 devices), it
takes about a minute to move 1 GiB of data. At that rate, it would
take 1000 minutes (or about 16 hours) to move 1 TiB of data.

   However, there are cases where some items of data can take *much*
longer to move. The biggest of these is when you have lots of
snapshots. When that happens, some (but not all) of the metadata can
take a very long time. In my case, with a couple of hundred snapshots,
some metadata chunks take 4+ hours to move.

   Hugo.

-- 
Hugo Mills | Great films about cricket: Silly Point Break
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune

> On Mar 27, 2017, at 1:51 PM, Christian Theune  wrote:
> 
> Hi,
> 
> (I hope I’m not double posting. My mail client was misconfigured and I think 
> I only managed to send the mail correctly this time.)

Turns out I did double post. Mea culpa.

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

(I hope I’m not double posting. My mail client was misconfigured and I think I 
only managed to send the mail correctly this time.)

I’m currently shrinking a device and it seems that the performance of shrink is 
abysmal. I intended to shrink a ~22TiB filesystem down to 20TiB. This is still 
using LVM underneath so that I can’t just remove a device from the filesystem 
but have to use the resize command.

Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
   Total devices 1 FS bytes used 18.21TiB
   devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

This has been running since last Thursday, so roughly 3.5days now. The “used” 
number in devid1 has moved about 1TiB in this time. The filesystem is seeing 
regular usage (read and write) and when I’m suspending any application traffic 
I see about 1GiB of movement every now and then. Maybe once every 30 seconds or 
so.

Does this sound fishy or normal to you?

Kind regards,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick





Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

I’m currently shrinking a device and it seems that the performance of shrink is 
abysmal. I intended to shrink a ~22TiB filesystem down to 20TiB. This is still 
using LVM underneath so that I can’t just remove a device from the filesystem 
but have to use the resize command.

Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
Total devices 1 FS bytes used 18.21TiB
devid1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

This has been running since last Thursday, so roughly 3.5days now. The “used” 
number in devid1 has moved about 1TiB in this time. The filesystem is seeing 
regular usage (read and write) and when I’m suspending any application traffic 
I see about 1GiB of movement every now and then. Maybe once every 30 seconds or 
so.

Does this sound fishy or normal to you?

Kind regards,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick


