Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2022-04-14 Thread Lasse Collin
On 2021-01-20 Sebastian Andrzej Siewior wrote:
> On 2021-01-18 23:52:50 [+0200], Lasse Collin wrote:
> > I have understood that *in practice* the problem with the xz command
> > line tool is limited to "xz -T0" usage so fixing this use case is
> > enough for most people. Please correct me if I missed something.  
> 
> Correct.

There is some code for special behavior with -T0 now for both
compression and decompression. I haven't updated the man page yet but
the commit messages should be helpful. I hope it can be documented so
that it sounds simple enough. :-)

> In the parallel decompress I added code on Linux to query the
> available memory. I would prefer that as an upper limit on 64bit if no
> limit is given. The reason is that *this* amount of memory is safe to
> use without over-committing / involving swap.

This may be the way to go on Linux but I didn't add it yet. The
committed code uses total_ram / 4. Since MemAvail is Linux-specific
something more broadly available needs exist for better portability,
and total_ram / 4 could perhaps be it. It can be tweaked if needed,
it's just a starting point.

> For 32bit applications I would cap that limit to 2.5 GiB or so. The
> reason is that the *normal* case is to run 32bit application on a
> 32bit kernel and so likely only 3GiB can be addressed at most (minus
> a few details like linked in libs, NULL page, guard pages and so on).
> The 32bit application on 64bit kernel is probably a shortcut where
> something is done a 32bit chroot - like building a package.
> 
> I'm not sure what a sane upper limit is on other OSes. Limitting it on
> 32bit does probably more good than bad if there is no -M parameter.

I think a generic cap needs to be below 2 GiB. For example, if 32-bit
MIPS can do only 2 GiB. There could be OS+arch-specific exceptions
though.

The code currently in xz.git uses 1400 MiB. There needs to be some
extra room if repeated mallocs and frees fragment the address space a
little. Perhaps it's too conservative but it allows eight compression
threads at the default xz -6, and one thread at -9 in threaded mode (so
it can create a file that can be decompressed in threaded mode).

> > An alternative "fix" for the liblzma case could be adding a simple
> > API function that would scale down the number of threads in a
> > lzma_mt structure based on a memory usage limit and if the
> > application is 32 bits. Currently the thread count and LZMA2
> > settings adjusting code is in xz, not in liblzma.  
> 
> It might help. dpkg checks the memlimit with
> lzma_stream_encoder_mt_memusage() and decreases the memory limit until
> it fits. It looks simpler compared to rpm's attempt and various
> exceptions.

Now that lzma_mt structure contains memlimit_threading already, a flag
could be added to use it to reduce the number of threads at the encoder
initialization. I suppose reducing the thread count would go a long
way. It doesn't affect the compressed output so it can be done when
people wish reproducible output.

> > The idea for the current 4020 MiB special limit is based on a patch
> > that was in use in FreeBSD to solve the problem of 32-bit xz on
> > 64-bit kernel. So at least FreeBSD should be supported to not make
> > 32-bit xz worse under 64-bit FreeBSD kernel.  
> 
> Is this a common case?

I don't *know* but I guess some build 32-bit packages on a 64-bit
kernel so it may be common enough use case.

> While poking around, Linux has this personality() syscall/function.
> There is a flag called PER_LINUX32_3GB and PER_LINUX_32BIT which are
> set if the command is invoked with `linux32' say
>   linux32 xz
> 
> then it would set that flag set and could act. It is not set by
> starting a 32bit application on a 64bit kernel on its own or on a
> 32bit kernel. I don't know if this is common practise but I use this
> in my chroots. So commands like `uname -m' return `i686' instead of
> `x86_64'. If other chroot environments do it as well then it could be
> used as a hack to assume that it is run on 64bit kernel. That is if
> we want that ofcourse :)

I haven't look at this but it sounds that it could be useful. If xz
knows that it has 4 GiB of address space the default limit could be much
higher.

-- 
Lasse Collin



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-19 Thread Sebastian Andrzej Siewior
On 2021-01-18 23:52:50 [+0200], Lasse Collin wrote:
> On 2021-01-10 Sebastian Andrzej Siewior wrote:
> > I hope for sane defaults :)
> 
> I hope so too. So far I have felt that the suggested solutions have
> significant flaws or downsides, and I'm not able to see what is a good
> enough compromise. As a result the discussion hasn't progressed much
> and I feel it's partly my fault, sorry. I will try again:
> 
> I have understood that *in practice* the problem with the xz command
> line tool is limited to "xz -T0" usage so fixing this use case is
> enough for most people. Please correct me if I missed something.

Correct.

> The change in XZ Utils 5.2.5 helps a little with 32-bit xz running
> under 64-bit kernel but only if one specifies a memory usage limit like
> -M90% together with -T0. To make plain -T0 work too, in an earlier
> email I suggested that -T0 could also imply a memory usage limit if no
> limit was otherwise specified (a preliminary patch was included too). I
> have been hesitant to make changes to the defaults of the memory usage
> limiter but this solution would only affect a very specific situation
> and thus I feel it would be fine. Comments would be appreciated.

In the parallel decompress I added code on Linux to query the
available memory. I would prefer that as an upper limit on 64bit if no
limit is given. The reason is that *this* amount of memory is safe to
use without over-committing / involving swap.
For 32bit applications I would cap that limit to 2.5 GiB or so. The
reason is that the *normal* case is to run 32bit application on a 32bit
kernel and so likely only 3GiB can be addressed at most (minus a few
details like linked in libs, NULL page, guard pages and so on).
The 32bit application on 64bit kernel is probably a shortcut where
something is done a 32bit chroot - like building a package.

I'm not sure what a sane upper limit is on other OSes. Limitting it on
32bit does probably more good than bad if there is no -M parameter.

> The problem with applications using liblzma and running out of address
> space sounds harder to fix. As I explained in another email, making
> liblzma more robust with memory allocation failures is not a perfect
> fix and can still result in severe problems depending on how the
> application as a whole works (with some apps it could be enough).

Yes. For liblzma you get the memory limitation from the caller. I've
seen in Debian's dpkg to use physmem/2 with 2GiB as upper limit on
32bit. That works ;)

> An alternative "fix" for the liblzma case could be adding a simple API
> function that would scale down the number of threads in a lzma_mt
> structure based on a memory usage limit and if the application is 32
> bits. Currently the thread count and LZMA2 settings adjusting code is
> in xz, not in liblzma.

It might help. dpkg checks the memlimit with
lzma_stream_encoder_mt_memusage() and decreases the memory limit until
it fits. It looks simpler compared to rpm's attempt and various
exceptions.

> > Anyway. Not to overcompilcate things: On Linux you can obtain the
> > available system memory which I would cap to 2 or 2.5 GiB by default.
> > Nobody should be hurt by that.
> 
> If full 4 GiB of address space is available, capping to 2 GiB to 2.5 GiB
> when the available memory isn't known would mean fewer threads than
> with the 4020 MiB limit. Obviously this is less bad than failing due to
> running out of address space but it still makes me feel that if
> available memory is used on Linux, it should be ported to other OSes
> too.

I didn't understand that last sentance.

> The idea for the current 4020 MiB special limit is based on a patch
> that was in use in FreeBSD to solve the problem of 32-bit xz on 64-bit
> kernel. So at least FreeBSD should be supported to not make 32-bit xz
> worse under 64-bit FreeBSD kernel.

Is this a common case? 
While poking around, Linux has this personality() syscall/function.
There is a flag called PER_LINUX32_3GB and PER_LINUX_32BIT which are set
if the command is invoked with `linux32' say
linux32 xz

then it would set that flag set and could act. It is not set by starting
a 32bit application on a 64bit kernel on its own or on a 32bit kernel.
I don't know if this is common practise but I use this in my chroots. So
commands like `uname -m' return `i686' instead of `x86_64'.
If other chroot environments do it as well then it could be used as a
hack to assume that it is run on 64bit kernel. That is if we want that
ofcourse :)

> In liblzma, if a new function is added to reduce the thread count based
> on a memory usage limit, a capping the limit to 2 to 3 GiB on 32-bit
> applications could be fine even if there is more available memory. Being
> conservative means fewer threads but it would make it more likely that
> things keep working if the application allocates memory after liblzma
> has already done so.
> 
> Oh well. :-( I think I still made this sound like a mess. In any case,
> let's at least try to find 

Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-18 Thread Lasse Collin
On 2021-01-10 Sebastian Andrzej Siewior wrote:
> I hope for sane defaults :)

I hope so too. So far I have felt that the suggested solutions have
significant flaws or downsides, and I'm not able to see what is a good
enough compromise. As a result the discussion hasn't progressed much
and I feel it's partly my fault, sorry. I will try again:

I have understood that *in practice* the problem with the xz command
line tool is limited to "xz -T0" usage so fixing this use case is
enough for most people. Please correct me if I missed something.

The change in XZ Utils 5.2.5 helps a little with 32-bit xz running
under 64-bit kernel but only if one specifies a memory usage limit like
-M90% together with -T0. To make plain -T0 work too, in an earlier
email I suggested that -T0 could also imply a memory usage limit if no
limit was otherwise specified (a preliminary patch was included too). I
have been hesitant to make changes to the defaults of the memory usage
limiter but this solution would only affect a very specific situation
and thus I feel it would be fine. Comments would be appreciated.


The problem with applications using liblzma and running out of address
space sounds harder to fix. As I explained in another email, making
liblzma more robust with memory allocation failures is not a perfect
fix and can still result in severe problems depending on how the
application as a whole works (with some apps it could be enough).

An alternative "fix" for the liblzma case could be adding a simple API
function that would scale down the number of threads in a lzma_mt
structure based on a memory usage limit and if the application is 32
bits. Currently the thread count and LZMA2 settings adjusting code is
in xz, not in liblzma.

> Anyway. Not to overcompilcate things: On Linux you can obtain the
> available system memory which I would cap to 2 or 2.5 GiB by default.
> Nobody should be hurt by that.

If full 4 GiB of address space is available, capping to 2 GiB to 2.5 GiB
when the available memory isn't known would mean fewer threads than
with the 4020 MiB limit. Obviously this is less bad than failing due to
running out of address space but it still makes me feel that if
available memory is used on Linux, it should be ported to other OSes
too.

The idea for the current 4020 MiB special limit is based on a patch
that was in use in FreeBSD to solve the problem of 32-bit xz on 64-bit
kernel. So at least FreeBSD should be supported to not make 32-bit xz
worse under 64-bit FreeBSD kernel.

In liblzma, if a new function is added to reduce the thread count based
on a memory usage limit, a capping the limit to 2 to 3 GiB on 32-bit
applications could be fine even if there is more available memory. Being
conservative means fewer threads but it would make it more likely that
things keep working if the application allocates memory after liblzma
has already done so.

Oh well. :-( I think I still made this sound like a mess. In any case,
let's at least try to find some solution to the "xz -T0" case. It would
be nice to hear if my suggestion makes any sense. Thanks.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-10 Thread Sebastian Andrzej Siewior
On 2021-01-08 18:40:18 [+0200], Lasse Collin wrote:
> Hello!
Hi,

> Sorry for the two-week silence, I was ill. It will take a few days for
> me to catch up with the emails.

No worries. Take your time to get better.

> On 2020-12-26 Sebastian Andrzej Siewior wrote:
> > On 2020-12-26 09:33:04 [+0300], Vitaly Chikunov wrote:
> > > This wasn't working, because `memlimit_compress` initialized with
> > > zero, thus memory limit is never lowered for 32-bit address space,
> > > causing `Cannot allocate memory' error (in `lzma_outq_init()'). For
> > > example, when `-T0' is used on 32 CPUs with compression level
> > > higher than `-6'.  
> > 
> > That is one way. It might be that hardware_init() should pass
> > `total_ram' to hardware_memlimit_set() instead of 0.
> > hardware_memlimit_get() treats 0 as unlimited but I don't think it
> > makes sense since memory is never unlimited.
> 
> 0 means disabled, that is, xz is expected to behave just like most
> other programs that might allocate a lot of memory but don't have any
> internal memory usage limiting. Memory isn't unlimited but many
> programs sort of behave as if it were and fail hard if allocation
> fails. That's not robust but it seems to work most of the time and
> many seem find this to be acceptable behavior in general.

I hope for sane defaults :)

> The whole limiter feature exist because I felt it was good to have a
> mechanism to control the memory usage, especially when decompressing
> since a .xz file may cause xz to allocate 4 GiB of memory for a single
> thread. However, I think few people think the same and thus the limiter
> is off by default for both compression and decompression.

The memory limiter sounds reasonable - no doubts.

> > Also, 32bit with almost 4GiB as a limit is working. If you increase
> > your input (the example from your previous email) then you also end
> > up "can not allocate memory) simply because 32bit can not allocate
> > 4GiB of memory. I'm not sure if the actual memory limit is exported.
> > It is usually at around 3GiB but there architectures which allow less
> > than that (not to mention kernel configurations).
> 
> I don't know if Linux makes it possible for userspace applications to
> know the available address space. It can indeed vary depending on the
> kernel config. xz also needs to be portable to many other kernels. The
> 4020 MiB hack works with 64-bit kernels running 32-bit applications
> since in that case many kernels provide 4 GiB of address space.

That is kind of a pain. I'm not aware of anything that reports the
possible address limit other than some test-and-error hacks.
Debian had a 2:2 split and was "forced" to switch to a 3:1 split because
some java applications expected / required a larger virtual address
space. I think that every distro ships a 3:1 32bit kernel now.
You can also have architecture level limitations. If I remember correctly
there was (is) a MIPS achitecture which can not assign more than 2GiB of
address space to a single application.

Anyway. Not to overcompilcate things: On Linux you can obtain the
available system memory which I would cap to 2 or 2.5 GiB by default.
Nobody should be hurt by that.

> There are also resource limits that may also be somewhat OS-specific.
> On GNU/Linux one can use "ulimit -v LIM" where LIM is the virtual
> memory limit in KiB. Trying to exceed it will result in ENOMEM just
> like when running out of address space.
> 
> Trying to figure out the various limits doesn't sound practical
> especially if it is supposed to work with kernels other than Linux.
> Simply trying to allocate a lot of memory (to test if it works) is more
> realistic but I think it's still dumb.

Oh, I though that this isn't Linux only. xz could query it but for
liblzma it is imposible since it is part of something bigger. If the
user is able to set ulimit, it is reasonable to assume that he can also
use -M. While I observed it a few times that a script invoked "xz -T0"
as somepoint which led to bad outcome on big iron.

Sebastian



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-09 Thread Lasse Collin
On 2021-01-09 Vitaly Chikunov wrote:
> Few percent of the people report bugs or misfeatures, most just adapt
> or abandon usage. For example,
> 
> 1. We added -T0 to our rpm building system for xz call, and when it's
> failed (tests) the change is just reverted. Yes, we didn't gain speed
> up, but things are working again. How much people did the same?

Possibly many. In some ways the existence of -T0 may even have been
counter-productive since it may make people think xz is smart while it
isn't.

Also worth keeping in mind is that threaded mode in xz tends to produce 
slightly bigger output than single-threaded mode, so when maximum compression 
is wanted threading may need to be avoided anyway.

> 2. More important problem is not just with xz tool, but with liblzma
> xz mt compression.

Yes and it's even harder to fix than xz.

> (I think, they switched to zstd after all.)

I think zstd is nowadays the way to go with package managers unless
file size is very important or the users' connection speeds are known
to be somewhat low (under 10 Mbit/s). Package managers usually download
and decompress as separate steps (not parallel) so zstd's fast
decompression makes up for the extra time spent to download a slightly
bigger file. Threaded decompression with xz won't help much because
most packages are so small that threading doesn't come into play.

> We have to build and test re-build thousands of packages on many
> architectures every day, some of them got compression failures So,
> there is new solution - increase memory limit to 512MiB:

Just to clarify, I see you mean 512 MiB as reserved memory so 3.5 GiB as
the memlimit for liblzma.

> So, this is not like happy careless life out there.

I agree, I don't mean to downplay the severity of the problem.

> Only proper solution, of course, is to make liblzma mt code robust on
> per-thread memory allocation.

That would help but it's not the final answer *in general*.

It's hard to know how much memory the rest of the application (outside
liblzma) may need. If liblzma were more robust against memory
allocation failures, it would still be possible that liblzma manages to
allocate quite a bit of memory and *then* the non-liblzma part of the
application needs to allocate some memory but that fails because
liblzma already grabbed most of the address space. Thus in the end you
could still need hacks like you described. Obviously this isn't a
problem if you *know* that the application won't allocate any memory
during the compression phase.

If 32-bit address space is too limited, one way to is to run the
compressor in a separate process. It may sound like a hack but there's
a reason why 64-bit address space got common.

The new lzma_outq doesn't allocate all buffers at once. If the call to
lzma_block_encoder_init() (and the associated structure
initializations) were moved to get_thread(), it could be a start to
make the encoder tolerate allocation failures because then no input
would be copied to thread-specific buffer before the encoder
initialization has succeeded. There are some details that complicate
the idea but perhaps it should be looked at.

> My ugly and dumb hack does not meant to be merged as is, but (RFC)
> invitation to discussion and would show your attitude to the problem.

Sure. I know I can be stubborn and when memlimits are discussed I may
be extra defensive. Also, I know I have a problem that if all suggested
solutions don't look perfect enough I may too easily reject them. But I
also recognize that the problem you (and others) describe is
significant, I just don't know how to *properly* solve it.

For the xz tool, in the previous email I wondered if it could help to
make -T0 set a memlimit if and only if no limit was otherwise
specified. Then the default memlimit would be specific to the -T0 case
which is known to be silly even on 64-bit arch if one happens to have a
lot of hardware threads but relatively small amount of RAM. I wrote a
rough preliminary patch for this:

diff --git a/src/xz/args.c b/src/xz/args.c
index 9238fb3..0ad924e 100644
--- a/src/xz/args.c
+++ b/src/xz/args.c
@@ -29,6 +29,25 @@ bool opt_ignore_check = false;
 const char stdin_filename[] = "(stdin)";
 
 
+/// True if using --threads=0 to autodetect the number of threads.
+static bool threads_are_autodetected = false;
+
+
+static void
+set_memlimits_for_autodetected_threads(void)
+{
+   if (threads_are_autodetected) {
+   if (hardware_memlimit_get(MODE_COMPRESS) == UINT64_MAX)
+   hardware_memlimit_set(95, true, false, true);
+
+   if (hardware_memlimit_get(MODE_DECOMPRESS) == UINT64_MAX)
+   hardware_memlimit_set(95, false, true, true);
+   }
+
+   return;
+}
+
+
 /// Parse and set the memory usage limit for compression and/or decompression.
 static void
 parse_memlimit(const char *name, const char *name_percentage, char *str,
@@ -246,11 +265,14 @@ parse_real(args_info *args, int argc, char **argv)
 

Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-09 Thread Vitaly Chikunov
Hi,

On Fri, Jan 08, 2021 at 06:40:18PM +0200, Lasse Collin wrote:
> On 2020-12-26 Sebastian Andrzej Siewior wrote:
> > On 2020-12-26 09:33:04 [+0300], Vitaly Chikunov wrote:
> > > This wasn't working, because `memlimit_compress` initialized with
> > > zero, thus memory limit is never lowered for 32-bit address space,
> > > causing `Cannot allocate memory' error (in `lzma_outq_init()'). For
> > > example, when `-T0' is used on 32 CPUs with compression level
> > > higher than `-6'.  
> > 
> > That is one way. It might be that hardware_init() should pass
> > `total_ram' to hardware_memlimit_set() instead of 0.
> > hardware_memlimit_get() treats 0 as unlimited but I don't think it
> > makes sense since memory is never unlimited.
> 
> 0 means disabled, that is, xz is expected to behave just like most
> other programs that might allocate a lot of memory but don't have any
> internal memory usage limiting. Memory isn't unlimited but many
> programs sort of behave as if it were and fail hard if allocation
> fails. That's not robust but it seems to work most of the time and
> many seem find this to be acceptable behavior in general.

Few percent of the people report bugs or misfeatures, most just adapt
or abandon usage. For example,

1. We added -T0 to our rpm building system for xz call, and when it's
failed (tests) the change is just reverted. Yes, we didn't gain speed
up, but things are working again. How much people did the same?

2. More important problem is not just with xz tool, but with liblzma xz
mt compression. For example, in Fedora's RPM they added multi-thread
xz compression in (note: because it's usage is configurable by other
means this commit does not mean they stated use it at the time):

  
https://github.com/rpm-software-management/rpm/commit/7740d1098642cd42f725fb9a2a3819ceaf51a80b

then, guess what, it didn't worked on 32-bit systems. Did you have bug
report about it? People tried to solve it by themselves with so
complicated code (which we found out lateris not always effective):

  
https://github.com/rpm-software-management/rpm/commit/a60f36a55cff3331e8bef3a1ab95c87d313911bb

Then they optimized it to use 4 threads on 32-bit architectures.

  
https://github.com/rpm-software-management/rpm/commit/4dfb381742c27449aafcd907c05f3a84015ffb54

(I think, they switched to zstd after all.)

In our rpmbuild we are using lzma, but wanted to speed up compression
and use multi-threaded xz, so we optimistically merged these upstream
changes:

  
http://git.altlinux.org/gears/r/rpm-build.git?p=rpm-build.git;a=commit;h=a4bace177116163e8f6cb9ce6e31f94252e06775

Again, we got the same problem as them, and thought to reserve 128MiB will
be enough:

  
http://git.altlinux.org/gears/r/rpm-build.git?p=rpm-build.git;a=commitdiff;h=0b18c0498e1caa14149369c168440ede6aa664b2

This turned out to be not enough, so we increased to 256MiB:

  
http://git.altlinux.org/gears/r/rpm-build.git?p=rpm-build.git;a=commitdiff;h=afe6605589afaf241b360c32d6e03d4aa0c271cd

We have to build and test re-build thousands of packages on many
architectures every day, some of them got compression failures So,
there is new solution - increase memory limit to 512MiB:

  
http://git.altlinux.org/gears/r/rpm-build.git?p=rpm-build.git;a=commitdiff;h=f5fcb8f43d99f718bc339c5eb1dbecefc717b706

It tuned out to be not enough for some rare (and huge) packages (like llvm).
So, there is proposed new ugly fix:

  
http://git.altlinux.org/people/vt/packages/?p=rpm-build.git;a=commitdiff;h=ca3433a6aabf98a837e69f78f7f3426940c23ec5

This pass all our tests, yet. (What else we could do not changing liblzma
and still wanting to not abandon T0?)

So, this is not like happy careless life out there.

> The whole limiter feature exist because I felt it was good to have a
> mechanism to control the memory usage, especially when decompressing
> since a .xz file may cause xz to allocate 4 GiB of memory for a single
> thread. However, I think few people think the same and thus the limiter
> is off by default for both compression and decompression.
> 
> > Also, 32bit with almost 4GiB as a limit is working. If you increase
> > your input (the example from your previous email) then you also end
> > up "can not allocate memory) simply because 32bit can not allocate
> > 4GiB of memory. I'm not sure if the actual memory limit is exported.
> > It is usually at around 3GiB but there architectures which allow less
> > than that (not to mention kernel configurations).
> 
> I don't know if Linux makes it possible for userspace applications to
> know the available address space. It can indeed vary depending on the
> kernel config. xz also needs to be portable to many other kernels. The
> 4020 MiB hack works with 64-bit kernels running 32-bit applications
> since in that case many kernels provide 4 GiB of address space.

Knowing available memory is _not different_ from guard malloc(3) call,
but even worse, because it does not guarantee malloc(3) will succeed (by
so many reason). 

Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-08 Thread Lasse Collin
On 2020-12-27 Vitaly Chikunov wrote:
> On Sat, Dec 26, 2020 at 05:04:02PM +0200, Lasse Collin wrote:
> > I cannot make everyone happy.  
> 
> Wow, that's philosophical! I think, we should solve this fundamental
> problem first. -- Even if we cannot satisfy everybody, better than
> satisfying just one party and make other unhappy, we can give users
> choice. If that's approach is accepted we can rework patch to make it
> better.

The ability to use "xz -T0 -M100%" (-M is short but sets the limit for
both compression and decompression) gives choice in the fairly common
special case where 32-bit programs have 4 GiB of address space. The
hack is mentioned on the man page but it's not explained well enough so
the documentation should be improved. Perhaps it should be referred in
the -T option since that may be the most likely place where users might
look:

If you use -T0, it may lead to memory allocation failures with
32-bit xz if the system has many cores. Combining -T0 with, for
example, --memlimit-compress=90% may help if running a 64-bit
kernel: even if the system has a lot of RAM, 32-bit xz will set
the compression limit to at most 4020 MiB which may make -T0 work
under a 64-bit kernel. See --memlimit-compress for details.

> For example, percentage memory limit on 32-bit systems is calculated
> against whole memory and not against 'physical' 4MiB limit -- user
> should somehow find this, probably by trial and error, wasting her
> time.

This is a fair point, although using "xz -vv" or even the more obscure
"xz --info-memory" do reveal the effective limits.

> By this, I think its always better that program works by default.

I mostly agree with all that you wrote. However, you missed the crucial
detail that not all 32-bit xz binaries have access to 4 GiB of address
space. With Linux it's true only when running a 64-bit kernel. That is
a common special case but it's still a special case. Making that
special case work is still improvement but one has to keep in mind that
it's just a special case. Making 32-bit xz actually robust would
require much more than just limiting memory usage to 4020 MiB.

Having a limit affects single-threaded situation too. A command like "xz
--lzma2=dict=512MiB" has no chance to work on any 32-bit system. If
there's no limit, it will result in allocation failure. If there is a
memory usage limit, xz will scale the dictionary size down so that the
limit isn't exceeded. For some this is good default behavior (things
keep working), for others it's not (the output file isn't what the user
expected it to be, yet there were no errors!). The defaults cannot make
both users happy so one of the users has to set some options to change
the defaults. If I ask which user that should be, for once everyone will
agree that it should be "the other user, not me".

Obviously one could have a limit that only affects the thread count. It
wuold make things even more complicated though and is very probably not
worth it.

> I reason like this: Setting [non-zero value to]
> `--memlimit-compress=` _increases_ use cases by avoiding memory
> errors, in comparison to not setting it [or setting it to 0]. So it
> should be enabled by default.

True. Someone else might say:

Setting a limit increases the chance of getting output files that
aren't compressed with the exact settings that the user specified,
thus it should be disabled by default.

Which would also be true. However, long ago the default limit was based
on percentage of total RAM. With smaller RAM sizes back then, it would
more easily result in settings being adjusted. Trying to just keep
things working for 32-bit executables is a mild adjustment/limit in
comparison.

While I mostly agree with you, I feel my opinion is also quite
irrelevant. With anything related to memory usage limiting I feel I need
to be really careful to not make things worse.

An alternative idea could be to make -T0 imply --memlimit-compress=100%
*if* no limit is otherwise specified. This would help on such 64-bit
platforms too which have tons of cores but not tons of RAM. However,
this would cause breakage on systems where xz doesn't know how to
detect the amount of RAM.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2021-01-08 Thread Lasse Collin
Hello!

Sorry for the two-week silence, I was ill. It will take a few days for
me to catch up with the emails.

On 2020-12-26 Sebastian Andrzej Siewior wrote:
> On 2020-12-26 09:33:04 [+0300], Vitaly Chikunov wrote:
> > This wasn't working, because `memlimit_compress` initialized with
> > zero, thus memory limit is never lowered for 32-bit address space,
> > causing `Cannot allocate memory' error (in `lzma_outq_init()'). For
> > example, when `-T0' is used on 32 CPUs with compression level
> > higher than `-6'.  
> 
> That is one way. It might be that hardware_init() should pass
> `total_ram' to hardware_memlimit_set() instead of 0.
> hardware_memlimit_get() treats 0 as unlimited but I don't think it
> makes sense since memory is never unlimited.

0 means disabled, that is, xz is expected to behave just like most
other programs that might allocate a lot of memory but don't have any
internal memory usage limiting. Memory isn't unlimited but many
programs sort of behave as if it were and fail hard if allocation
fails. That's not robust but it seems to work most of the time and
many seem find this to be acceptable behavior in general.

The whole limiter feature exist because I felt it was good to have a
mechanism to control the memory usage, especially when decompressing
since a .xz file may cause xz to allocate 4 GiB of memory for a single
thread. However, I think few people think the same and thus the limiter
is off by default for both compression and decompression.

> Also, 32bit with almost 4GiB as a limit is working. If you increase
> your input (the example from your previous email) then you also end
> up "can not allocate memory) simply because 32bit can not allocate
> 4GiB of memory. I'm not sure if the actual memory limit is exported.
> It is usually at around 3GiB but there architectures which allow less
> than that (not to mention kernel configurations).

I don't know if Linux makes it possible for userspace applications to
know the available address space. It can indeed vary depending on the
kernel config. xz also needs to be portable to many other kernels. The
4020 MiB hack works with 64-bit kernels running 32-bit applications
since in that case many kernels provide 4 GiB of address space.

There are also resource limits that may also be somewhat OS-specific.
On GNU/Linux one can use "ulimit -v LIM" where LIM is the virtual
memory limit in KiB. Trying to exceed it will result in ENOMEM just
like when running out of address space.

Trying to figure out the various limits doesn't sound practical
especially if it is supposed to work with kernels other than Linux.
Simply trying to allocate a lot of memory (to test if it works) is more
realistic but I think it's still dumb.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2020-12-26 Thread Vitaly Chikunov
Lasse,

On Sat, Dec 26, 2020 at 05:04:02PM +0200, Lasse Collin wrote:
> On 2020-12-26 Vitaly Chikunov wrote:
> > This wasn't working, because `memlimit_compress` initialized with
> > zero, thus memory limit is never lowered for 32-bit address space,
> > causing `Cannot allocate memory' error (in `lzma_outq_init()'). For
> > example, when `-T0' is used on 32 CPUs with compression level higher
> > than `-6'.
> 
> Zero means disabling the limiter. That is the default. The patch would
> change this and enable a memory usage limit by default on 32-bit
> platforms.
> 
> If you want to enable a limiter by default, you may use the environment
> variables XZ_DEFAULTS or XZ_OPT (see the man page).
> 
> The whole memory usage limiting feature is a bit controversial and I'm
> hesitant to enable anything by default (although with threaded
> decompression there may need to be some default *soft* limit). Your
> suggestion is reasonable but someone else may prefer that xz gives a
> hard error if too high settings are specified instead of xz trying to
> outsmart the user and change the settings automatically. Perhaps that
> sounds silly but I don't have energy for endless arguments where both
> sides are kind of right but both wishes cannot be fulfilled at the
> same time.
> 
> I cannot make everyone happy.

Wow, that's philosophical! I think, we should solve this fundamental
problem first. -- Even if we cannot satisfy everybody, better than
satisfying just one party and make other unhappy, we can give users
choice. If that's approach is accepted we can rework patch to make it
better.

Then, I think, better approach is 'sane defaults', thus more use cases
are working correctly out of the box. (Most users don't need to test
memory subsystem they need compression from xz.)

Users who want (by hard to imagine unknown reason) xz to fail on 32-bit
arches would use `--no-adjust` -- we could (rework the patch to) disable
memory auto-fitting logic if this option is used.  Does it sound good?

Enabling (robust behavior) by default is better than not, because -- you
do not shift burden to users: 1) unexpected consequences (of xz not
working in some corner cases, somewhere), 2) then, user would have hard
time solving the issue (man, googling, reading sources). -- By what
combination of options of xz to solve this (if she is not expert in xz
usage hacks). For example, percentage memory limit on 32-bit systems is
calculated against whole memory and not against 'physical' 4MiB limit --
user should somehow find this, probably by trial and error, wasting her
time.

So, to make `xz -T0` "just work" on all possible systems, single portable
approach is just disabling `-T0` and wasting opportinity to speed up
compression real time.
Second option is to make correct xz option generator depending on system
configuration -- she should understand virtual memory and how xz works,
and still may miss something.

By this, I think its always better that program works by default.

Your suggestion (in previous mail) to use `--memlimit-compress=100%`
seems to offset this patch. But, still, need rare expert to know about
this hack. Also, it would be easier for users (I'm also suggested be
colleague) if `--memlimit-compress=100%` is setting enabled by default,
and if someone wants to disable limiter she would use
`--memlimit-compress=0`.

I reason like this: Setting [non-zero value to] `--memlimit-compress=` 
_increases_ use cases by avoiding memory errors, in comparison to not
setting it [or setting it to 0]. So it should be enabled by default.

Thanks,

> 
> -- 
> Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2020-12-26 Thread Sebastian Andrzej Siewior
On 2020-12-26 09:33:04 [+0300], Vitaly Chikunov wrote:
> This wasn't working, because `memlimit_compress` initialized with zero,
> thus memory limit is never lowered for 32-bit address space, causing
> `Cannot allocate memory' error (in `lzma_outq_init()'). For example,
> when `-T0' is used on 32 CPUs with compression level higher than `-6'.

That is one way. It might be that hardware_init() should pass
`total_ram' to hardware_memlimit_set() instead of 0.
hardware_memlimit_get() treats 0 as unlimited but I don't think it makes
sense since memory is never unlimited.
Also, 32bit with almost 4GiB as a limit is working. If you increase your
input (the example from your previous email) then you also end up "can
not allocate memory) simply because 32bit can not allocate 4GiB of
memory. I'm not sure if the actual memory limit is exported. It is
usually at around 3GiB but there architectures which allow less than
that (not to mention kernel configurations).

> Signed-off-by: Vitaly Chikunov 

Sebastian



Re: [xz-devel] [PATCH] xz: Fix setting memory limit on 32-bit systems

2020-12-26 Thread Lasse Collin
On 2020-12-26 Vitaly Chikunov wrote:
> This wasn't working, because `memlimit_compress` initialized with
> zero, thus memory limit is never lowered for 32-bit address space,
> causing `Cannot allocate memory' error (in `lzma_outq_init()'). For
> example, when `-T0' is used on 32 CPUs with compression level higher
> than `-6'.

Zero means disabling the limiter. That is the default. The patch would
change this and enable a memory usage limit by default on 32-bit
platforms.

If you want to enable a limiter by default, you may use the environment
variables XZ_DEFAULTS or XZ_OPT (see the man page).

The whole memory usage limiting feature is a bit controversial and I'm
hesitant to enable anything by default (although with threaded
decompression there may need to be some default *soft* limit). Your
suggestion is reasonable but someone else may prefer that xz gives a
hard error if too high settings are specified instead of xz trying to
outsmart the user and change the settings automatically. Perhaps that
sounds silly but I don't have energy for endless arguments where both
sides are kind of right but both wishes cannot be fulfilled at the
same time.

I cannot make everyone happy.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode