On 2020-12-27 Vitaly Chikunov wrote:
> On Sat, Dec 26, 2020 at 05:04:02PM +0200, Lasse Collin wrote:
> > I cannot make everyone happy.  
> 
> Wow, that's philosophical! I think, we should solve this fundamental
> problem first. -- Even if we cannot satisfy everybody, better than
> satisfying just one party and make other unhappy, we can give users
> choice. If that's approach is accepted we can rework patch to make it
> better.

The ability to use "xz -T0 -M100%" (-M is short but sets the limit for
both compression and decompression) gives choice in the fairly common
special case where 32-bit programs have 4 GiB of address space. The
hack is mentioned on the man page but it's not explained well enough so
the documentation should be improved. Perhaps it should be referred to in
the description of the -T option, since that is probably where users are
most likely to look:

    If you use -T0, it may lead to memory allocation failures with
    32-bit xz if the system has many cores. Combining -T0 with, for
    example, --memlimit-compress=90% may help if running a 64-bit
    kernel: even if the system has a lot of RAM, 32-bit xz will set
    the compression limit to at most 4020 MiB which may make -T0 work
    under a 64-bit kernel. See --memlimit-compress for details.
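
To make the arithmetic in that proposed text concrete, here is a minimal
sketch of how a percentage-based limit could be turned into a byte count
with the 4020 MiB cap for 32-bit xz. This is not the actual xz code; the
function name effective_limit and its parameters are made up for
illustration.

```python
MIB = 1024 * 1024

# Cap for 32-bit xz as described in the text above.
CAP_32BIT = 4020 * MIB


def effective_limit(total_ram, percent, is_32bit):
    """Turn a percentage of total RAM into a limit in bytes.

    Hypothetical helper: the percentage is taken of the total RAM and,
    for a 32-bit build, the result is clamped to CAP_32BIT.
    """
    limit = total_ram * percent // 100
    if is_32bit and limit > CAP_32BIT:
        limit = CAP_32BIT
    return limit


if __name__ == "__main__":
    # 32-bit xz on a machine with 16 GiB of RAM, limit given as 90%.
    print(effective_limit(16 * 1024 * MIB, 90, True) // MIB, "MiB")
```

Under these assumptions, 90% of 16 GiB would be far more than a 32-bit
process can address, so the cap is what ends up deciding the limit.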

> For example, percentage memory limit on 32-bit systems is calculated
> against whole memory and not against 'physical' 4MiB limit -- user
> should somehow find this, probably by trial and error, wasting her
> time.

This is a fair point, although using "xz -vv" or even the more obscure
"xz --info-memory" does reveal the effective limits.

> By this, I think its always better that program works by default.

I mostly agree with all that you wrote. However, you missed the crucial
detail that not all 32-bit xz binaries have access to 4 GiB of address
space. With Linux it's true only when running a 64-bit kernel. That is
a common special case but it's still a special case. Making that
special case work is still an improvement but one has to keep in mind that
it's just a special case. Making 32-bit xz actually robust would
require much more than just limiting memory usage to 4020 MiB.

Having a limit affects the single-threaded case too. A command like "xz
--lzma2=dict=512MiB" has no chance to work on any 32-bit system. If
there's no limit, it will result in allocation failure. If there is a
memory usage limit, xz will scale the dictionary size down so that the
limit isn't exceeded. For some this is good default behavior (things
keep working), for others it's not (the output file isn't what the user
expected it to be, yet there were no errors!). The defaults cannot make
both users happy so one of the users has to set some options to change
the defaults. If I ask which user that should be, for once everyone will
agree that it should be "the other user, not me".
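
The difference between the two behaviors can be sketched roughly as
follows. This is only an illustration, not how xz is implemented: the
cost model in estimated_usage (encoder memory as ten times the
dictionary size) and the 1 MiB step are assumptions chosen to keep the
example short.

```python
MIB = 1024 * 1024


def estimated_usage(dict_size):
    # Hypothetical cost model: assume the encoder needs about ten times
    # the dictionary size. The real figure depends on other settings.
    return 10 * dict_size


def scale_dict(dict_size, limit):
    """Return a dictionary size whose estimated usage fits the limit.

    A limit of 0 means "no limit": the requested size is returned
    unchanged and any allocation failure happens later.
    """
    if limit == 0:
        return dict_size
    # Shrink in 1 MiB steps until the estimate fits (assumed step size).
    while dict_size > MIB and estimated_usage(dict_size) > limit:
        dict_size -= MIB
    if estimated_usage(dict_size) > limit:
        raise MemoryError("limit too small even for a 1 MiB dictionary")
    return dict_size


if __name__ == "__main__":
    requested = 512 * MIB
    print(scale_dict(requested, 0) // MIB, "MiB without a limit")
    print(scale_dict(requested, 4020 * MIB) // MIB, "MiB with a limit")
```

With a limit, the command keeps working but silently uses a smaller
dictionary than requested; without one, the requested size is kept and
the allocation simply fails on a 32-bit system.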

Obviously one could have a limit that only affects the thread count. It
would make things even more complicated though and is very probably not
worth it.
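
For illustration, a limit that touches only the thread count could look
something like the sketch below. The function threads_for_limit and the
per-thread memory figure are hypothetical; real per-thread usage
depends on the compression settings.

```python
MIB = 1024 * 1024


def threads_for_limit(cores, per_thread_bytes, limit):
    """Pick a thread count that keeps total usage under the limit.

    Hypothetical helper: compression settings are never changed, only
    the number of threads. A limit of 0 means "no limit".
    """
    if limit == 0:
        return cores
    fitting = limit // per_thread_bytes
    # Never go below one thread; whether a single thread actually fits
    # is a separate question that this kind of limit would not answer.
    return max(1, min(cores, fitting))


if __name__ == "__main__":
    # 64 cores, an assumed 200 MiB per thread, and a 4020 MiB limit.
    print(threads_for_limit(64, 200 * MIB, 4020 * MIB), "threads")
```

The extra complication is visible even here: the single-threaded case
still needs its own policy when one thread alone exceeds the limit.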

> I reason like this: Setting [non-zero value to]
> `--memlimit-compress=` _increases_ use cases by avoiding memory
> errors, in comparison to not setting it [or setting it to 0]. So it
> should be enabled by default.

True. Someone else might say:

    Setting a limit increases the chance of getting output files that
    aren't compressed with the exact settings that the user specified,
    thus it should be disabled by default.

Which would also be true. However, long ago the default limit was based
on percentage of total RAM. With smaller RAM sizes back then, it would
more easily result in settings being adjusted. Trying to just keep
things working for 32-bit executables is a mild adjustment/limit in
comparison.

While I mostly agree with you, I feel my opinion is also quite
irrelevant. With anything related to memory usage limiting I feel I need
to be really careful to not make things worse.

An alternative idea could be to make -T0 imply --memlimit-compress=100%
*if* no limit is otherwise specified. This would also help on 64-bit
platforms that have tons of cores but not tons of RAM. However,
this would cause breakage on systems where xz doesn't know how to
detect the amount of RAM.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
