Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Kevin Kofler via devel
Daniel Alley wrote:

>>Try xz -9, it should be better than zstd. It will take longer to compress,
>>but should actually be FASTER (!) to decompress, which is what really
>>matters.
> 
> Please provide data - any data - to support this claim, because it flies
> completely in the face of every benchmark the internet has to offer,
> including the one Sirius performed below.

In any case, according to Sirius' benchmark, it looks like zstd -19 actually 
beats even xz -9 at compression ratio (while being worlds faster to 
decompress), so it looks like a good alternative. It takes 3 times longer to 
compress, but who cares, since that happens only once per compose, on one 
computer, vs. millions of Fedora users having to download and decompress the 
file. The tradeoff should be obvious.

(You can also see that the decompression time does in fact go down from xz 
-4 to -6 to -7, then stays constant on -7, -8, -9 where little to no further 
size reduction is reached. This is consistent with what I explained in my 
previous reply to your post above. But of course zstd at any level is about 
6 times faster to decompress than xz at any level.)

Given the benchmark results on one of the actually affected files, I now 
think zstd -19 is what we want to use, not xz -9.
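
One rough way to weigh size against decompression speed is to add up transfer time and decompression time per client. The following is a minimal sketch using the size percentages and decompression times from Sirius' benchmark; the 50 Mbit/s link speed and the helper name total_seconds are assumptions picked for illustration, not measurements.

```shell
#!/bin/bash
# Size of the uncompressed f41-filelist.xml from Sirius' benchmark.
ORIG_BYTES=985194446
# Hypothetical client link speed in Mbit/s (an assumption, override as needed).
LINK_MBIT=${LINK_MBIT:-50}

# total_seconds PERCENT_OF_ORIGINAL DECOMPRESS_SECONDS
# Prints download time plus decompression time for one client.
total_seconds() {
  awk -v o="$ORIG_BYTES" -v p="$1" -v d="$2" -v m="$LINK_MBIT" \
    'BEGIN { printf "%.1f\n", o * p / 100 * 8 / (m * 1000000) + d }'
}

echo "xz -9:    $(total_seconds 4.8 4.2) s"
echo "zstd -19: $(total_seconds 4.66 0.7) s"
echo "zstd -10: $(total_seconds 5.26 0.6) s"
```

On slower links the size difference weighs more and the decompression time less, so the ranking can shift with the assumed link speed.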

Kevin Kofler
--
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Kevin Kofler via devel
Daniel Alley wrote:

>>Try xz -9, it should be better than zstd. It will take longer to compress,
>>but should actually be FASTER (!) to decompress, which is what really
>>matters.
> 
> Please provide data - any data - to support this claim, because it flies
> completely in the face of every benchmark the internet has to offer,
> including the one Sirius performed below.

I think you misunderstood what I wrote (which admittedly was somewhat 
misleading). I mean xz decompresses faster when the input was compressed 
with xz -9 than when it was compressed with just xz (which according to the 
manpage currently defaults to xz -6, but in any case, less than -9), which 
was the context.

If you look at https://quixdb.github.io/squash-benchmark/ , wherever a 
higher compression level actually compresses better (e.g., on the enwik8 or 
mozilla benchmarks), xz gets slower to compress, but faster to decompress 
with increasing compression level. (Though if the maximum compression ratio 
is reached before -9, as on ooffice, decompression will actually get slower 
again with higher levels. The speedup comes from having less input to 
process.)
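
The "less input to process" point can be roughly checked against Sirius' xz figures for f41-filelist.xml by dividing the compressed size by the decompression time. This is only a sketch: the percentages are rounded to one decimal in the benchmark, and the helper name input_rate is made up for illustration.

```shell
#!/bin/bash
# Size of the uncompressed f41-filelist.xml from Sirius' benchmark.
ORIG_BYTES=985194446

# input_rate PERCENT_OF_ORIGINAL DECOMPRESS_SECONDS
# Prints how many MB of compressed input xz consumed per second.
input_rate() {
  awk -v o="$ORIG_BYTES" -v p="$1" -v d="$2" \
    'BEGIN { printf "%.1f\n", o * p / 100 / 1000000 / d }'
}

# Rows are "level percent-of-original decompress-seconds" from the benchmark.
for row in "4 5.3 4.5" "6 5.1 4.4" "7 4.8 4.2" "9 4.8 4.2"; do
  set -- $row
  echo "xz -$1: $(input_rate "$2" "$3") MB/s of compressed input"
done
```

If the rate stays about the same across levels, the decompression time is tracking the compressed size, which is what the explanation above predicts.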

xz at any level will of course still be nowhere near zstd in decompression 
speed. That is not what I intended to claim (and I thought it is obvious 
that that is not the correct interpretation), though my message was somewhat 
ambiguous, and I apologize for that.

Kevin Kofler


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Daniel Alley
>Try xz -9, it should be better than zstd. It will take longer to compress, but 
>should actually be FASTER (!) to decompress, which is what really matters.

Please provide data - any data - to support this claim, because it flies 
completely in the face of every benchmark the internet has to offer, including 
the one Sirius performed below.


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Tulio Magno Quites Machado Filho
Sirius via devel writes:

> echo % of original
> echo Time to decompress the file, output to /dev/null
> time gzip -d -c ${INPUTFILE}.gz > /dev/null

Keep in mind that gzip has its own zlib implementation, while
createrepo_c uses the system-provided zlib.
That means, when creating a repository, results may vary.

-- 
Tulio Magno


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth: 
> Also note that adding '-T0' to use all available cores of the CPU will 
> greatly speed up the results with zstd.
> 
> However, all this talking about the optimal compression level, but in 
> the end there's no way to set that to createrepo_c options, so ;-)

True. But running these tests illustrates quite well that there are
diminishing returns or serious tradeoffs involved in reaching for the biggest
compression ratios. Either they do not perform as well as a lower ratio,
they take an inordinately long time to run, or they require bespoke solutions
(like custom dicts tailored very specifically to what you are trying to
compress).

Saving bandwidth is a laudable goal, but we cannot lose sight of practical
issues. :)

-- 
Kind regards,

/S


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Mattia Verga via devel
Il 26/03/24 10:41, Sirius via devel ha scritto:
> In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
>> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
>>> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
>>> devel@lists.fedoraproject.org> wrote:
>>>
>>>> Aoife Moloney wrote:
>>>>> The zstd compression type was chosen to match createrepo_c settings.
>>>>> As an alternative, we might want to choose xz,
>>>> Since xz consistently compresses better than zstd, I would strongly
>>>> suggest using xz everywhere to minimize download sizes. However:
>>>>
>>>>> especially after zlib-ng has been made the default in Fedora and brought
>>>>> performance improvements.
>>>> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
>>>> (which is mostly due to the format, so, while some implementations manage
>>>> to do better than others at the expense of more compression time, there is
>>>> a limit to how well they can do and it is nowhere near xz or even zstd) and
>>>> should hence never be used at all.
>>>>
>>> There are two parts to this which users will see as 'slowness'. Part one is
>>> downloading the data from a mirror. Part two is uncompressing the data. In
>>> work I have been a part of, we have found that while xz gave us much
>>> smaller files, the time to uncompress was so much larger that our download
>>> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
>>> but uncompressed much faster than xz. This is data dependent though so it
>>> would be good to see if someone could test to see if xz uncompression of
>>> the datafiles will be too slow.
>> Hi there,
>>
>> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.
> Added tests with zstd 1-19, not using a dictionary to improve it any
> further.
>
> Input File: f41-filelist.xml, Size: 985194446 bytes
>
> ZStd Level  1, 1.7s to compress, 6.46% file size,  0.6s decompress
> ZStd Level  2, 1.7s to compress, 6.34% file size,  0.7s decompress
> ZStd Level  3, 2.1s to compress, 6.26% file size,  0.7s decompress
> ZStd Level  4, 2.3s to compress, 6.26% file size,  0.7s decompress
> ZStd Level  5, 5.7s to compress, 5.60% file size,  0.6s decompress
> ZStd Level  6, 7.2s to compress, 5.42% file size,  0.6s decompress
> ZStd Level  7, 8.1s to compress, 5.39% file size,  0.6s decompress
> ZStd Level  8, 9.5s to compress, 5.31% file size,  0.6s decompress
> ZStd Level  9,10.4s to compress, 5.28% file size,  0.6s decompress
> ZStd Level 10,13.6s to compress, 5.26% file size,  0.6s decompress
> ZStd Level 11,18.4s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 12,19.5s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 13,30.9s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 14,39.7s to compress, 5.23% file size,  0.6s decompress
> ZStd Level 15,56.1s to compress, 5.21% file size,  0.6s decompress
> ZStd Level 16,  1min58s to compress, 5.52% file size,  0.7s decompress
> ZStd Level 17,  2min25s to compress, 5.36% file size,  0.7s decompress
> ZStd Level 18,  3min46s to compress, 5.43% file size,  0.8s decompress
> ZStd Level 19, 10min36s to compress, 4.66% file size,  0.7s decompress
>
> So to save 5.2MB in filesize (lvl19 vs lvl15) the server has to spend
> eleven times longer compressing the file (and I did not look at resources
> like CPU or RAM while doing this). I am sure there are other compression
> mechanisms that can squeeze these files a bit further, but at what cost?
> If it is a once a day event, maybe a high compression ratio is
> justifiable. If it has to happen hundreds of times per day - not so much.
>
>
> ## zstd
> function do_zstd()
> {
>   let cl=1
>   echo Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes
>   echo
>   while [[ $cl -le 19 ]]
>   do
>     echo ZStd compression level ${cl}
>     echo Time to compress the file
>     time zstd -z -${cl} ${INPUTFILE}
>     COMPRESSED_SIZE=$(ls -ln ${INPUTFILE}.zst | awk '{print $5}')
>     echo Compressed to
>     echo "scale=5
>     ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
>     "|bc
>     echo % of original
>     echo Time to decompress the file, output to /dev/null
>     time zstd -d -c ${INPUTFILE}.zst > /dev/null
>     rm -f ${INPUTFILE}.zst
>     let cl=$cl+1
>     echo
>   done
> }
>
> --
> Kind regards,
>
> /S
> --

Also note that adding '-T0' to use all available cores of the CPU will 
greatly speed up the results with zstd.

However, all this talking about the optimal compression level, but in 
the end there's no way to set that to createrepo_c options, so ;-)

Mattia


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth: 
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth: 
> > On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
> > devel@lists.fedoraproject.org> wrote:
> > 
> > > Aoife Moloney wrote:
> > > > The zstd compression type was chosen to match createrepo_c settings.
> > > > As an alternative, we might want to choose xz,
> > >
> > > Since xz consistently compresses better than zstd, I would strongly
> > > suggest
> > > using xz everywhere to minimize download sizes. However:
> > >
> > > > especially after zlib-ng has been made the default in Fedora and brought
> > > > performance improvements.
> > >
> > > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> > > (which is mostly due to the format, so, while some implementations manage
> > > to
> > > do better than others at the expense of more compression time, there is a
> > > limit to how well they can do and it is nowhere near xz or even zstd) and
> > > should hence never be used at all.
> > >
> > >
> > There are two parts to this which users will see as 'slowness'. Part one is
> > downloading the data from a mirror. Part two is uncompressing the data. In
> > work I have been a part of, we have found that while xz gave us much
> > smaller files, the time to uncompress was so much larger that our download
> > gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> > but uncompressed much faster than xz. This is data dependent though so it
> > would be good to see if someone could test to see if xz uncompression of
> > the datafiles will be too slow.
> 
> Hi there,
> 
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.

Added tests with zstd 1-19, not using a dictionary to improve it any
further.

Input File: f41-filelist.xml, Size: 985194446 bytes

ZStd Level  1, 1.7s to compress, 6.46% file size,  0.6s decompress
ZStd Level  2, 1.7s to compress, 6.34% file size,  0.7s decompress
ZStd Level  3, 2.1s to compress, 6.26% file size,  0.7s decompress
ZStd Level  4, 2.3s to compress, 6.26% file size,  0.7s decompress
ZStd Level  5, 5.7s to compress, 5.60% file size,  0.6s decompress
ZStd Level  6, 7.2s to compress, 5.42% file size,  0.6s decompress
ZStd Level  7, 8.1s to compress, 5.39% file size,  0.6s decompress
ZStd Level  8, 9.5s to compress, 5.31% file size,  0.6s decompress
ZStd Level  9,10.4s to compress, 5.28% file size,  0.6s decompress
ZStd Level 10,13.6s to compress, 5.26% file size,  0.6s decompress
ZStd Level 11,18.4s to compress, 5.25% file size,  0.6s decompress
ZStd Level 12,19.5s to compress, 5.25% file size,  0.6s decompress
ZStd Level 13,30.9s to compress, 5.25% file size,  0.6s decompress
ZStd Level 14,39.7s to compress, 5.23% file size,  0.6s decompress
ZStd Level 15,56.1s to compress, 5.21% file size,  0.6s decompress
ZStd Level 16,  1min58s to compress, 5.52% file size,  0.7s decompress
ZStd Level 17,  2min25s to compress, 5.36% file size,  0.7s decompress
ZStd Level 18,  3min46s to compress, 5.43% file size,  0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size,  0.7s decompress

So to save 5.2MB in filesize (lvl19 vs lvl15) the server has to spend
eleven times longer compressing the file (and I did not look at resources
like CPU or RAM while doing this). I am sure there are other compression
mechanisms that can squeeze these files a bit further, but at what cost?
If it is a once a day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day - not so much.
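
For reference, the two figures in that paragraph can be re-derived from the table. This is a small sketch that takes the percentages and times straight from the results above; the helper names saved_mib and time_ratio are made up for illustration.

```shell
#!/bin/bash
# Size of the uncompressed f41-filelist.xml from the results above.
ORIG_BYTES=985194446

# saved_mib PCT_A PCT_B
# Prints the MiB saved when going from PCT_A to PCT_B percent of the original.
saved_mib() {
  awk -v o="$ORIG_BYTES" -v a="$1" -v b="$2" \
    'BEGIN { printf "%.1f\n", o * (a - b) / 100 / 1048576 }'
}

# time_ratio SLOW_SECONDS FAST_SECONDS
# Prints how many times longer the slow run took.
time_ratio() {
  awk -v s="$1" -v f="$2" 'BEGIN { printf "%.1f\n", s / f }'
}

# Level 15: 5.21%, 56.1s. Level 19: 4.66%, 10min36s = 636s.
echo "level 15 -> 19 saves $(saved_mib 5.21 4.66) MiB"
echo "level 19 takes $(time_ratio 636 56.1)x as long to compress"
```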


## zstd
function do_zstd()
{
  let cl=1
  echo Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes
  echo
  while [[ $cl -le 19 ]]
  do
    echo ZStd compression level ${cl}
    echo Time to compress the file
    time zstd -z -${cl} ${INPUTFILE}
    COMPRESSED_SIZE=$(ls -ln ${INPUTFILE}.zst | awk '{print $5}')
    echo Compressed to
    echo "scale=5
    ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
    "|bc
    echo % of original
    echo Time to decompress the file, output to /dev/null
    time zstd -d -c ${INPUTFILE}.zst > /dev/null
    rm -f ${INPUTFILE}.zst
    let cl=$cl+1
    echo
  done
}

-- 
Kind regards,

/S


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth: 
> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
> devel@lists.fedoraproject.org> wrote:
> 
> > Aoife Moloney wrote:
> > > The zstd compression type was chosen to match createrepo_c settings.
> > > As an alternative, we might want to choose xz,
> >
> > Since xz consistently compresses better than zstd, I would strongly
> > suggest
> > using xz everywhere to minimize download sizes. However:
> >
> > > especially after zlib-ng has been made the default in Fedora and brought
> > > performance improvements.
> >
> > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> > (which is mostly due to the format, so, while some implementations manage
> > to
> > do better than others at the expense of more compression time, there is a
> > limit to how well they can do and it is nowhere near xz or even zstd) and
> > should hence never be used at all.
> >
> >
> There are two parts to this which users will see as 'slowness'. Part one is
> downloading the data from a mirror. Part two is uncompressing the data. In
> work I have been a part of, we have found that while xz gave us much
> smaller files, the time to uncompress was so much larger that our download
> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> but uncompressed much faster than xz. This is data dependent though so it
> would be good to see if someone could test to see if xz uncompression of
> the datafiles will be too slow.

Hi there,

Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.

Input File: f41-filelist.xml, Size: 985194446 bytes
XZ level 1 : 21s to compress, 5.3% filesize, 4.4s to decompress
XZ level 2 : 28s to compress, 5.1% filesize, 4.2s to decompress
XZ level 3 : 44s to compress, 5.1% filesize, 4.2s to decompress
XZ level 4 : 55s to compress, 5.3% filesize, 4.5s to decompress
XZ level 5 : 1min25s to compress, 5.3% filesize, 4.3s to decompress
XZ level 6 : 2min49s to compress, 5.1% filesize, 4.4s to decompress
XZ level 7 : 2min55s to compress, 4.8% filesize, 4.2s to decompress
XZ level 8 : 3min 4s to compress, 4.8% filesize, 4.2s to decompress
XZ level 9 : 3min12s to compress, 4.8% filesize, 4.2s to decompress

Input File: f41-filelist.xml, Size: 985194446 bytes
GZ Level 1 :  6s to compress, 7.9% filesize, 4.2s to decompress
GZ Level 2 :  6s to compress, 7.8% filesize, 4.1s to decompress
GZ Level 3 :  7s to compress, 7.6% filesize, 4.1s to decompress
GZ Level 4 :  8s to compress, 6.8% filesize, 4.0s to decompress
GZ Level 5 :  9s to compress, 6.6% filesize, 4.0s to decompress
GZ Level 6 : 12s to compress, 6.6% filesize, 4.0s to decompress
GZ Level 7 : 15s to compress, 6.5% filesize, 4.0s to decompress
GZ Level 8 : 24s to compress, 6.4% filesize, 4.0s to decompress
GZ Level 9 : 28s to compress, 6.3% filesize, 4.0s to decompress

xz level 2 is not a shabby compromise as you get small filesize and time
to compress is the same as gzip level 9. To get the smallest filesizes,
the time (and memory requirements) of xz becomes very noticeable for not
much gain.



#!/bin/bash

INPUTFILE=f41-filelist.xml
INPUTFILESIZE=$(ls -ln f41-filelist.xml|awk '{print $5}')
## gzip
function do_gzip()
{
  let cl=1
  echo Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes
  echo
  while [[ $cl -le 9 ]]
  do
    echo GZip compression level ${cl}
    echo Time to compress the file
    time gzip -k -${cl} ${INPUTFILE}
    COMPRESSED_SIZE=$(ls -ln ${INPUTFILE}.gz | awk '{print $5}')
    echo Compressed to
    echo "scale=5
    ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
    "|bc
    echo % of original
    echo Time to decompress the file, output to /dev/null
    time gzip -d -c ${INPUTFILE}.gz > /dev/null
    rm -f ${INPUTFILE}.gz
    let cl=$cl+1
    echo
  done
}

## xz
function do_xz()
{
  let cl=1
  echo Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes
  echo
  while [[ $cl -le 9 ]]
  do
    echo XZ compression level ${cl}
    echo Time to compress the file
    time xz -k -z -${cl} ${INPUTFILE}
    COMPRESSED_SIZE=$(ls -ln ${INPUTFILE}.xz | awk '{print $5}')
    echo Compressed to
    echo "scale=5
    ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
    "|bc
    echo % of original
    echo Time to decompress the file, output to /dev/null
    time xz -d -c ${INPUTFILE}.xz > /dev/null
    rm -f ${INPUTFILE}.xz
    let cl=$cl+1
    echo
  done
}

do_gzip
do_xz

-- 
Kind regards,

/S


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Kevin Fenzi
On Mon, Mar 25, 2024 at 10:59:15PM +0100, Kevin Kofler via devel wrote:
> Zbigniew Jędrzejewski-Szmek wrote:
> 
> > On Mon, Mar 25, 2024 at 04:50:28PM -0400, Neal Gompa wrote:
> >> Keep in mind we also want to make the compose process faster too, I
> >> don't know if it's worth it to spend 20x more time compressing
> >> repodata when we keep trying to get back hours and minutes in the
> >> compose time.
> > 
> > I wanted to write that the compression times are small enough for this not
> > to matter, but indeed, at the very highest levels, they do become
> > noticeable.
> 
> 5 minutes? On a process that is run once every 24 hours? While at the same 
> time saving download time for all Fedora users? I fail to see the issue.

7 repodata files compressed x 5 arches x 2 (debuginfo) x 2 (server and
Everything ) = 140 files * 5min -> 11 hours?

That's of course an inflation of what it would be... most of the repodata
files are way smaller than filelists, but still... keep in mind that any
one thing we do, if we do it a zillion times, will add up.
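
Spelled out, the worst-case estimate above works out as follows. This is a sketch only: the factors are the ones given in the message, the variable names are made up, and 5 minutes per file is the deliberately pessimistic assumption that every file costs as much as filelists.

```shell
#!/bin/bash
# Factors from the estimate above.
repodata_files=7
arches=5
debuginfo_variants=2   # regular and debuginfo
trees=2                # Server and Everything
minutes_per_file=5     # pessimistic: every file as slow as filelists

files=$((repodata_files * arches * debuginfo_variants * trees))
total_minutes=$((files * minutes_per_file))

echo "${files} files x ${minutes_per_file} min = ${total_minutes} min"
echo "about $((total_minutes / 60)) h $((total_minutes % 60)) min"
```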

kevin




Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Kevin Kofler via devel
Zbigniew Jędrzejewski-Szmek wrote:

> On Mon, Mar 25, 2024 at 04:50:28PM -0400, Neal Gompa wrote:
>> Keep in mind we also want to make the compose process faster too, I
>> don't know if it's worth it to spend 20x more time compressing
>> repodata when we keep trying to get back hours and minutes in the
>> compose time.
> 
> I wanted to write that the compression times are small enough for this not
> to matter, but indeed, at the very highest levels, they do become
> noticeable.

5 minutes? On a process that is run once every 24 hours? While at the same 
time saving download time for all Fedora users? I fail to see the issue.

> $ time xz -k -v 8e09489af54bbd4ab85470d449f0b0afa4a26fc3eb97c1665c741427bbc8f060-filelists.xml
> 8e09489af54bbd4ab85470d449f0b0afa4a26fc3eb97c1665c741427bbc8f060-filelists.xml (1/1)
>   100 %   44.3 MiB / 862.9 MiB = 0.051   33 MiB/s   0:26
> xz -k -v   196.88s user 0.63s system 749% cpu 26.337 total
> (This is multithreaded, and gives a compression ratio of 5.14%.)

That is not the highest compression level of xz though. Try xz -9, it should 
be better than zstd. It will take longer to compress, but should actually be 
FASTER (!) to decompress, which is what really matters.

Kevin Kofler


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Mar 25, 2024 at 04:50:28PM -0400, Neal Gompa wrote:
> On Mon, Mar 25, 2024 at 4:40 PM Zbigniew Jędrzejewski-Szmek
>  wrote:
> >
> > On Mon, Mar 25, 2024 at 07:29:09PM +0100, Kevin Kofler via devel wrote:
> > > Daniel Alley wrote:
> > > > One more point: createrepo_c uses zstd compression level 10, but the
> > > > range goes all the way up to level 22.  I would oppose making the
> > > > default much computationally heavier than it is currently, but if
> > > > spending 20x longer to compress the repo 10% more is desirable to the
> > > > fedora project, then createrepo_c could perhaps add the ability to
> > > > select a compression level.
> > > >
> > > > zstd at high compression levels is very nearly as good at compressing as
> > > > xz and sometimes better, while remaining much faster to decompress.
> > >
> > > Considering that compression happens once on the server and downloading
> > > and decompression happens many times on many computers, I think we should
> > > use the highest possible compression level.
> >
> > +1
> >
> 
> Keep in mind we also want to make the compose process faster too, I
> don't know if it's worth it to spend 20x more time compressing
> repodata when we keep trying to get back hours and minutes in the
> compose time.

I wanted to write that the compression times are small enough for this not
to matter, but indeed, at the very highest levels, they do become noticeable.

$ mv 8e09489af54bbd4ab85470d449f0b0afa4a26fc3eb97c1665c741427bbc8f060-filelists.xml filelists.xml
$ time zstd -k -9 filelists.xml
filelists.xml :  5.38%   (   863 MiB =>   46.4 MiB, filelists.xml.zst) 
zstd -k -9   4.96s user 0.18s system 103% cpu 4.971 total

$ time zstd -k -21 filelists.xml
Warning : compression level higher than max, reduced to 19 
filelists.xml :  4.74%   (   863 MiB =>   40.9 MiB, filelists.xml.zst) 
zstd -k -21   321.49s user 0.31s system 99% cpu 5:22.20 total

$ time zstd -k -21 -T8 filelists.xml
Warning : compression level higher than max, reduced to 19 
filelists.xml :  4.74%   (   863 MiB =>   40.9 MiB, filelists.xml.zst) 
zstd -k -21 -T8   874.57s user 0.70s system 732% cpu 1:59.51 total

$ time xz -k -v 8e09489af54bbd4ab85470d449f0b0afa4a26fc3eb97c1665c741427bbc8f060-filelists.xml
8e09489af54bbd4ab85470d449f0b0afa4a26fc3eb97c1665c741427bbc8f060-filelists.xml (1/1)
  100 %   44.3 MiB / 862.9 MiB = 0.051   33 MiB/s   0:26
xz -k -v   196.88s user 0.63s system 749% cpu 26.337 total
(This is multithreaded, and gives a compression ratio of 5.14%.)

Dunno, I think anything below a minute should be OK…
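
A note on the "reduced to 19" warnings above: the zstd command line only accepts levels 20 to 22 when --ultra is also given, so the runs above effectively measured level 19. Below is a minimal sketch of what a real level 22 run could look like; the function name compress_ultra is made up for illustration, and -T0 (use all available cores) is an optional addition. Ultra levels need considerably more memory, on the decompressing side as well.

```shell
#!/bin/bash
# compress_ultra FILE [LEVEL]
# Writes FILE.zst and keeps FILE. LEVEL defaults to 22, the maximum.
compress_ultra() {
  local file=$1 level=${2:-22}
  # --ultra unlocks levels 20-22; -T0 uses all cores; -f overwrites old output.
  zstd -q -f -k --ultra "-${level}" -T0 "$file" -o "${file}.zst"
}

# Example: time compress_ultra filelists.xml 22
```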

Zbyszek


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Neal Gompa
On Mon, Mar 25, 2024 at 4:40 PM Zbigniew Jędrzejewski-Szmek
 wrote:
>
> On Mon, Mar 25, 2024 at 07:29:09PM +0100, Kevin Kofler via devel wrote:
> > Daniel Alley wrote:
> > > One more point: createrepo_c uses zstd compression level 10, but the range
> > > goes all the way up to level 22.  I would oppose making the default much
> > > computationally heavier than it is currently, but if spending 20x longer
> > > to compress the repo 10% more is desirable to the fedora project, then
> > > createrepo_c could perhaps add the ability to select a compression
> > > level.
> > >
> > > zstd at high compression levels is very nearly as good at compressing as
> > > xz and sometimes better, while remaining much faster to decompress.
> >
> > Considering that compression happens once on the server and downloading and
> > decompression happens many times on many computers, I think we should use
> > the highest possible compression level.
>
> +1
>

Keep in mind we also want to make the compose process faster too, I
don't know if it's worth it to spend 20x more time compressing
repodata when we keep trying to get back hours and minutes in the
compose time.



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Mar 25, 2024 at 07:29:09PM +0100, Kevin Kofler via devel wrote:
> Daniel Alley wrote:
> > One more point: createrepo_c uses zstd compression level 10, but the range
> > goes all the way up to level 22.  I would oppose making the default much
> > computationally heavier than it is currently, but if spending 20x longer
> > to compress the repo 10% more is desirable to the fedora project, then
> > createrepo_c could perhaps add the ability to select a compression
> > level.
> >
> > zstd at high compression levels is very nearly as good at compressing as
> > xz and sometimes better, while remaining much faster to decompress.
> 
> Considering that compression happens once on the server and downloading and 
> decompression happens many times on many computers, I think we should use 
> the highest possible compression level.

+1

Zbyszek


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Kevin Kofler via devel
Daniel Alley wrote:
> One more point: createrepo_c uses zstd compression level 10, but the range
> goes all the way up to level 22.  I would oppose making the default much
> computationally heavier than it is currently, but if spending 20x longer
> to compress the repo 10% more is desirable to the fedora project, then
> createrepo_c could perhaps add the ability to select a compression
> level.
>
> zstd at high compression levels is very nearly as good at compressing as
> xz and sometimes better, while remaining much faster to decompress.

Considering that compression happens once on the server and downloading and 
decompression happens many times on many computers, I think we should use 
the highest possible compression level.

By the way, xz in principle also supports stronger parameters than -9; 
there is just no preset for them.
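
As a rough illustration of what "stronger than -9" can look like, here is a 
minimal sketch using Python's standard lzma module with a custom filter 
chain. The function name xz_compress_custom and the 128 MiB default are 
assumptions made up for this example; the -9 preset uses a 64 MiB 
dictionary, and a larger one can only help when the input is bigger than 
that. Memory use grows with the dictionary size on both the compressing and 
the decompressing side, so this is not a recommendation, just a 
demonstration that the knobs exist.

```python
import lzma


def xz_compress_custom(data: bytes, dict_size: int = 128 * 1024 * 1024) -> bytes:
    """Compress to .xz with settings beyond what the -9 preset selects.

    Starts from preset 9 with the "extreme" flag and overrides the
    dictionary size. The 128 MiB default is an illustrative assumption.
    """
    filters = [
        {
            "id": lzma.FILTER_LZMA2,
            # Preset supplies defaults for every option not set explicitly.
            "preset": 9 | lzma.PRESET_EXTREME,
            # Preset 9 would use 64 MiB here.
            "dict_size": dict_size,
        }
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def xz_decompress(blob: bytes) -> bytes:
    """Decompress an .xz stream; no special options are needed to read it."""
    return lzma.decompress(blob)
```

The result is an ordinary .xz stream, so any xz decompressor can read it as 
long as it is allowed enough memory for the chosen dictionary size.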

Kevin Kofler


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-25 Thread Daniel Alley
One more point: createrepo_c uses zstd compression level 10, but the range goes 
all the way up to level 22.  I would oppose making the default much 
computationally heavier than it is currently, but if spending 20x longer to 
compress the repo 10% more is desirable to the Fedora Project, then 
createrepo_c could perhaps add the ability to select a compression level.

zstd at high compression levels is very nearly as good at compressing as xz and 
sometimes better, while remaining much faster to decompress.


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-24 Thread Daniel Alley
Also, to use that squash benchmark you will probably want to run it yourself 
with modern libraries and modern hardware, as the data on their website 
(assuming it's the same as the data in their GitHub repo) is 8+ years old.  
zstd has improved a fair bit during that timeframe.


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-24 Thread Daniel Alley
But we're not compressing text, we're compressing XML. 

Anyway, I ran an experiment on a local copy of the Fedora 38 release repo and 
the differences (while they do exist) aren't very significant.  Less than 10%


createrepo_c --update --skip-stat --recycle-pkglist --general-compress-type gz .

du -bh repodata/* | grep .gz
18M   repodata/f6dee453a7f86804214e402ad2e444b989f044f0b16fa7ba74e5a27a8a49cd07-primary.xml.gz
52M   repodata/131fa4fcd206fd3a718e4765983c8b7b276e7e634e45c226d9c465145f8e69e9-filelists.xml.gz
7.1M  repodata/1c4bf077a2bdf4743a7cded3e2f72282dec5f8e4910692d193e371508552322a-other.xml.gz


createrepo_c --update --skip-stat --recycle-pkglist --general-compress-type zstd .

15M   repodata/5c05e888c6da5a13dc2a73fc6fc6e6b2f4ec9120a9544fa26c20cff14a8ace27-primary.xml.zst
41M   repodata/289503c7ec867863ee67188b8d9981f7e291158a9821ae813124eb480b41cc94-filelists.xml.zst
5.5M  repodata/f78b3010d62173a4a81951c24bad28deb7cb91ab678fdd515a56ff9a72574953-other.xml.zst


createrepo_c --update --skip-stat --recycle-pkglist --general-compress-type xz .

14M   repodata/a4a4b9c7da02d0cbc7bb3aa39f7f919c7ca033e685ef44e42478a6daf841b32a-primary.xml.xz
41M   repodata/244e49e5b8c95280bb67a9695e4177fc9e7358f4482df2b489126c02673a48ad-filelists.xml.xz
4.9M  repodata/1f492b3f77a2f9d8a0c1f646200db2575f4a37f5df4c955c8f39e622324eb3ec-other.xml.xz
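
To put the three listings side by side, the totals can be compared with a 
few lines of Python. This is only a sketch: the figures are the rounded 
values printed by du above (so the percentages are approximate), and the 
names SIZES, total and saving_percent are made up for this example.

```python
# Sizes as printed by `du -bh` in the listings above (rounded by du).
SIZES = {
    "gz": {"primary": 18.0, "filelists": 52.0, "other": 7.1},
    "zstd": {"primary": 15.0, "filelists": 41.0, "other": 5.5},
    "xz": {"primary": 14.0, "filelists": 41.0, "other": 4.9},
}


def total(fmt: str) -> float:
    """Combined size of primary, filelists and other for one format."""
    return sum(SIZES[fmt].values())


def saving_percent(baseline: str, candidate: str) -> float:
    """Percent by which `candidate` is smaller than `baseline` in total."""
    return 100.0 * (total(baseline) - total(candidate)) / total(baseline)


if __name__ == "__main__":
    for baseline, candidate in [("gz", "zstd"), ("gz", "xz"), ("zstd", "xz")]:
        print(f"{candidate} vs {baseline}: "
              f"{saving_percent(baseline, candidate):.1f}% smaller")
```

On these rounded figures, zstd comes out roughly 20% smaller than gz in 
total, while xz is under 3% smaller than zstd in total.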


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-22 Thread Kevin Kofler via devel
Neal Gompa wrote:
> Regular zstd compression is less optimized due to the lack of
> dictionaries, but it's also effectively the fallback path, though much
> faster to decompress while providing pretty good compression (which is
> why we have been gradually switching *everything* to zstd).

"pretty good" if you only really care about speed. It loses to the much 
older xz in almost all compression ratio benchmarks, often significantly 
(e.g., xz compresses text (enwik8 testcase) by a factor of 4 (*), zstd only 
by a factor of 2.5 [1]). So I still think zstd is a step backwards compared 
to xz.

(*) The factor of 4 is with xz -9, but you should always use that level: 
decompression is no slower with it than with lower compression levels, and 
in fact slightly faster. Only compression is slower, and compression 
happens once on the server.

[1] source: https://quixdb.github.io/squash-benchmark/

Kevin Kofler


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-21 Thread Mattia Verga via devel
On 21/03/24 03:00, Kevin Kofler via devel wrote:

> Since xz consistently compresses better than zstd, I would strongly suggest
> using xz everywhere to minimize download sizes. However:
>
>> especially after zlib-ng has been made the default in Fedora and brought
>> performance improvements.
>
> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> (which is mostly due to the format, so, while some implementations manage to
> do better than others at the expense of more compression time, there is a
> limit to how well they can do and it is nowhere near xz or even zstd) and
> should hence never be used at all.

Yep, I messed things up. So let's stick with zstd, which is createrepo_c's 
new default anyway.

Mattia


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-21 Thread Kevin Kofler via devel
Stephen Smoogen wrote:
> There are two parts to this which users will see as 'slowness'. Part one
> is downloading the data from a mirror. Part two is uncompressing the data.
> In work I have been a part of, we have found that while xz gave us much
> smaller files, the time to uncompress was so much larger that our download
> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> but uncompressed much faster than xz. This is data dependent though so it
> would be good to see if someone could test to see if xz uncompression of
> the datafiles will be too slow.

This very much depends on the speed of the local Internet connection vs. the 
speed of the user's CPU, so the tradeoff will unfortunately be different 
from user to user. Back in the delta RPM days, I saw both sides of the 
tradeoff: delta RPMs initially helped, but as my ISP gradually increased 
the bandwidth allocations while my computer stayed the same, they more and 
more just made things worse. It works the same way for metadata 
compression, though I have not timed how that will work out for me 
personally.

That said, another part of the tradeoff is that, for some users, more to 
download means more money getting charged on their metered bandwidth plan. 
That is of course not an issue for those of us lucky enough to be on a 
flatrate broadband plan.
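
The time side of this tradeoff can be sketched as a tiny model: total time 
is download time plus decompression time, and there is a break-even link 
speed below which the smaller file wins. Everything here is an assumption 
for illustration: the function names, the simplification that download and 
decompression do not overlap, and above all the numbers, which are made up 
and not measurements.

```python
def total_seconds(compressed_mb, uncompressed_mb, link_mb_s, decompress_mb_s):
    """Seconds to download a file and then decompress it (no overlap assumed)."""
    return compressed_mb / link_mb_s + uncompressed_mb / decompress_mb_s


def break_even_link_mb_s(small, large, uncompressed_mb):
    """Link speed at which both formats take the same total time.

    `small` and `large` are (compressed_mb, decompress_mb_s) pairs, where
    `small` is the smaller but slower-to-decompress file. Below the returned
    speed the smaller file wins; above it the faster decompressor wins.
    """
    size_gap = large[0] - small[0]
    time_gap = uncompressed_mb / small[1] - uncompressed_mb / large[1]
    return size_gap / time_gap


# Hypothetical numbers, chosen only for illustration (not measurements):
UNCOMPRESSED_MB = 400.0
XZ = (40.0, 100.0)    # (compressed MB, decompression MB/s)
ZSTD = (46.0, 600.0)  # 15% larger download, 6x faster decompression

if __name__ == "__main__":
    print(break_even_link_mb_s(XZ, ZSTD, UNCOMPRESSED_MB))
    for link in (1.0, 10.0):
        print(link,
              total_seconds(XZ[0], UNCOMPRESSED_MB, link, XZ[1]),
              total_seconds(ZSTD[0], UNCOMPRESSED_MB, link, ZSTD[1]))
```

With these made-up inputs the break-even point is about 1.8 MB/s: on a 
1 MB/s link the smaller xz file finishes first, on a 10 MB/s link zstd 
does. Real values depend on the actual files and on each user's CPU.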

Kevin Kofler


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-21 Thread Neal Gompa
On Thu, Mar 21, 2024 at 8:20 AM Stephen Smoogen wrote:
>
> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel wrote:
>>
>> Aoife Moloney wrote:
>> > The zstd compression type was chosen to match createrepo_c settings.
>> > As an alternative, we might want to choose xz,
>>
>> Since xz consistently compresses better than zstd, I would strongly suggest
>> using xz everywhere to minimize download sizes. However:
>>
>> > especially after zlib-ng has been made the default in Fedora and brought
>> > performance improvements.
>>
>> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
>> (which is mostly due to the format, so, while some implementations manage to
>> do better than others at the expense of more compression time, there is a
>> limit to how well they can do and it is nowhere near xz or even zstd) and
>> should hence never be used at all.
>>
>
> There are two parts to this which users will see as 'slowness'. Part one is 
> downloading the data from a mirror. Part two is uncompressing the data. In 
> work I have been a part of, we have found that while xz gave us much smaller 
> files, the time to uncompress was so much larger that our download gains were 
> lost. Using zstd gave larger downloads (maybe 10 to 20% bigger) but 
> uncompressed much faster than xz. This is data dependent though so it would 
> be good to see if someone could test to see if xz uncompression of the 
> datafiles will be too slow.
>

Fedora has been using optimized zstd compression "by default" since
Fedora 30 anyway with Zchunk metadata:
https://fedoraproject.org/wiki/Changes/Zchunk_Metadata

Regular zstd compression is less optimized due to the lack of
dictionaries, but it's also effectively the fallback path, though much
faster to decompress while providing pretty good compression (which is
why we have been gradually switching *everything* to zstd).



--
真実はいつも一つ!/ Always, there's only one truth!


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-21 Thread Stephen Smoogen
On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
devel@lists.fedoraproject.org> wrote:

> Aoife Moloney wrote:
> > The zstd compression type was chosen to match createrepo_c settings.
> > As an alternative, we might want to choose xz,
>
> Since xz consistently compresses better than zstd, I would strongly suggest
> using xz everywhere to minimize download sizes. However:
>
> > especially after zlib-ng has been made the default in Fedora and brought
> > performance improvements.
>
> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> (which is mostly due to the format, so, while some implementations manage to
> do better than others at the expense of more compression time, there is a
> limit to how well they can do and it is nowhere near xz or even zstd) and
> should hence never be used at all.
>
>
There are two parts to this which users will see as 'slowness'. Part one is
downloading the data from a mirror. Part two is uncompressing the data. In
work I have been a part of, we have found that while xz gave us much
smaller files, the time to uncompress was so much larger that our download
gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
but uncompressed much faster than xz. This is data dependent though so it
would be good to see if someone could test to see if xz uncompression of
the datafiles will be too slow.






-- 
Stephen Smoogen, Red Hat Automotive
Let us be kind to one another, for most of us are fighting a hard battle.
-- Ian MacClaren


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-20 Thread Kevin Kofler via devel
Aoife Moloney wrote:
> The zstd compression type was chosen to match createrepo_c settings.
> As an alternative, we might want to choose xz,

Since xz consistently compresses better than zstd, I would strongly suggest 
using xz everywhere to minimize download sizes. However:

> especially after zlib-ng has been made the default in Fedora and brought
> performance improvements.

zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly 
(which is mostly due to the format, so, while some implementations manage to 
do better than others at the expense of more compression time, there is a 
limit to how well they can do and it is nowhere near xz or even zstd) and 
should hence never be used at all.

Kevin Kofler