Re: rpmbuild is very slow with large files

2022-07-15 Thread Marius Schwarz

On 13.07.22 at 16:30, John Reiser wrote:

On 7/11/22 Marius Schwarz wrote:
I am building (not finished yet, started 15 minutes ago) a
~2.5 GB rpm and found that rpmbuild is an extreme bottleneck.


IMHO, this is caused by a file-read function which reads files in 32k
blocks, which is very slow and extremely I/O intensive.  The result is a
task running permanently at 100% on one core. With a change to larger
chunks, we could speed up many build tasks on the farm.


Multicore use would also be helpful, e.g. while packing the files.

Any counter-arguments?


If you give the complete package name and URL of the repo,
then more persons may be likely to help investigate.
Specifying a reproducible example is always good.



All issues solved so far.  Just to give you (all) an impression, here
are source and result in my test repo:


3,2G    /usr/share/pva/vosk-model-de-0.21

[vosk-model-de-0.21]# du -sh *
100M    am
12K    conf
685M    graph
8,2M    ivector
4,0K    README
2,1G    rescore
281M    rnnlm

[rescore]# ll
total 2171812
-rw-r--r-- 1 root root 2115929988 14. Sep 2021  G.carpa <-
-rw-r--r-- 1 root root  107992138 14. Sep 2021  G.fst

So compressing this 2+ GB file (and others) was slowing down the process
because of the single-core compression default.


Building this now takes just ~4-5 minutes on 8 cores and a system doing 
other things in parallel.


The result is a 1.7 GB rpm:

-rw-r--r-- 1 root root 1758210157 14. Jul 09:44 
/home/linux-am-dienstagde/repo/x86_64/fedora/35/pva-vosk-model-de-large-1-2.x86_64.rpm


Luckily, most vosk language models do not change frequently and are not
that big, but some are.


If this ever makes it into the Fedora repo, it will take a lot of space
and tie up resources on builds ;)


@BCotton:

No idea if you remember, but I once said it would waste 100 GB plus
updates. In the last year there were only a few updates to the language
models, which reduces the expected space needed over time a lot.


Best regards,
Marius
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure


Re: rpmbuild is very slow with large files

2022-07-13 Thread John Reiser

On 7/11/22 Marius Schwarz wrote:

I am building (not finished yet, started 15 minutes ago) a ~2.5 GB rpm
and found that rpmbuild is an extreme bottleneck.

IMHO, this is caused by a file-read function which reads files in 32k blocks,
which is very slow and extremely I/O intensive.  The result is a task running
permanently at 100% on one core. With a change to larger chunks, we could
speed up many build tasks on the farm.

Multicore use would also be helpful, e.g. while packing the files.

Any counter-arguments?


If you give the complete package name and URL of the repo,
then more persons may be likely to help investigate.
Specifying a reproducible example is always good.

If you know "strace -p $PID" then please learn "perf record -p $PID".

If the size of the package is in gigabytes, then upstream bears some
responsibility for investigating and documenting the use of
data compression with the package.  What does upstream say?

In the few samples of "read(" from the output of strace,
I see text similar to JSON or XML tags.  A large dataset
that contains zillions of repetitions of only a few dozen
tags creates O(n**2) work for deflation.  Finding many matches
of any particular tag is quick, but which match can be extended
the most, considering the exact context of prefixes and suffixes?
A "looser" compression such as "gzip -3" or lzo might be
much faster with only slightly larger output.
A software implementation of a hardware technique such as WK,
or even "ancient" modem compression MNP5 or MNP10,
might also be a good choice.
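
The trade-off between compression level and speed can be checked on a sample of the data before changing anything in the build. What follows is a minimal Python sketch, not what rpmbuild does internally: it uses the standard zlib module as a stand-in for the payload compressor, and the synthetic tag-heavy sample produced by `make_sample` is an assumption that only imitates the kind of repetitive text visible in the strace output.

```python
import time
import zlib


def make_sample(repeats: int = 20000) -> bytes:
    """Build a synthetic, tag-heavy byte string (assumed stand-in for real data)."""
    tags = [b"_I", b"_B", b"_E", b"@", b"s", b"t", b"k", b"u:"]
    parts = []
    for i in range(repeats):
        parts.append(tags[i % len(tags)])
        parts.append(b" ")
        if i % 7 == 0:
            parts.append(b"\n")
    return b"".join(parts)


def compare_levels(data: bytes, levels=(1, 3, 6, 9)):
    """Return a list of (level, compressed_size, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must give back the original bytes.
        assert zlib.decompress(packed) == data
        results.append((level, len(packed), elapsed))
    return results


if __name__ == "__main__":
    sample = make_sample()
    for level, size, seconds in compare_levels(sample):
        print(f"level {level}: {size} bytes in {seconds:.4f} s")
```

Running it on a real slice of the model files instead of the synthetic sample would give numbers that are more meaningful for this package.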


Re: rpmbuild is very slow with large files

2022-07-12 Thread Stephen Smoogen
On Mon, 11 Jul 2022 at 18:52, Marius Schwarz  wrote:

>
> Hi,
>
> I am building (not finished yet, started 15 minutes ago) a ~2.5
> GB rpm and found that rpmbuild is an extreme bottleneck.
>
>
At that size, is RPM actually a good fit for the data inside it? Building
it is going to be slow, and so is installing and upgrading it. Downloads
from mirrors are going to be problematic, judging from the complaints we
get from users about various large rpms taking too long to download,
timing out, or breaking something else.

I realize RPM is the cardboard box we are comfortable using for a lot of
things, but this seems like trying to use it for a shipping container
across the ocean.


-- 
Stephen Smoogen, Red Hat Automotive
Let us be kind to one another, for most of us are fighting a hard battle.
-- Ian MacClaren


Re: rpmbuild is very slow with large files

2022-07-12 Thread Sérgio Basto


The only thing in common is "large files"; the rest is different, but it
may be worth checking if you are talking about rawhide.

https://pagure.io/copr/copr/issue/2241



On Tue, 2022-07-12 at 00:52 +0200, Marius Schwarz wrote:
> 
> Hi,
> 
> I am building (not finished yet, started 15 minutes ago) a ~2.5
> GB rpm and found that rpmbuild is an extreme bottleneck.
>
> IMHO, this is caused by a file-read function which reads files in 32k
> blocks, which is very slow and extremely I/O intensive.  The result is a
> task running permanently at 100% on one core. With a change to larger
> chunks, we could speed up many build tasks on the farm.
>
> Multicore use would also be helpful, e.g. while packing the files.
>
> Any counter-arguments?
> 
> strace example:
> 
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477601377}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477685727}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477892054}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=47876}) = 0
> [pid 2604060] read(5, "_I @_I s_I t_E\nauss\303\244het 
> auss\303\244h"..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478212651}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478301347}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478409015}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478505273}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478701366}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478784826}) = 0
> [pid 2604060] read(5, " Y_I k_I t_E\naustun austun 'aU_B"..., 32768)
> = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478962539}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479045029}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479130924}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479213446}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479407336}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222,
> tv_nsec=479489832}) = 0
> [pid 2604060] read(5, "s_I v_I u:_I k_I s_I @_I s_E\naus"..., 32768)
> = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479720335}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479803090}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479950309}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480067186}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480305924}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480417985}) = 0
> [pid 2604060] read(5, "B s_I ts_I u:_I g_I R_I E_I n_I "..., 32768) =
> 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480654716}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480763606}) = 0
> 
> and I don't think this task needs to read the clock that often, either.

-- 
Sérgio M. B.


Re: rpmbuild is very slow with large files

2022-07-12 Thread Florian Weimer
* Miroslav Lichvar:

> On Tue, Jul 12, 2022 at 09:26:14AM +0200, Florian Weimer wrote:
>> * Marius Schwarz:
>> > and I don't think this task needs to read the clock that often, either.
>> 
>> strace shouldn't see a system call here because clock_gettime should be
>> handled in the vDSO.  This suggests something is wrong with the system
>> (unless it's some obscure variant that really doesn't have vDSO support).
>
> It doesn't necessarily have to be something wrong with the system.
> The vDSO clock_gettime() works only with specific clocksources,
> typically TSC on x86_64. On some older HW it's not reliable enough to
> be selected by the kernel, or it could be a VM which doesn't have one
> that would work with migrations, etc.

True, but I suspect a Xeon E5-2620 v4 is recent enough to have a stable
TSC (I see nonstop_tsc and constant_tsc in /proc/cpuinfo for some random
lab system, for a start).  So I suspect that virtualization is masking
it, or otherwise interfering with the TSC.

Thanks,
Florian


Re: rpmbuild is very slow with large files

2022-07-12 Thread Florian Festi
On 7/12/22 11:02, Marius Schwarz wrote:
> On 12.07.22 at 10:55, Marius Schwarz wrote:
>>
>> The rpmbuild process for this one rpm was single-threaded. With an
>> lsof loop, I could see bytes getting appended to the resulting file
>> at an awfully slow rate, which is very frustrating to see
>> on an 8-core system.
>>
>> The thing is, I do test builds of VOSK with language model and code
>> etc. on one of my servers. If this project ever reaches the Fedora
>> build farm,
>> we can expect a very long build time if nothing is changed in
>> rpmbuild. Is there maybe a hidden parallel compression option somewhere?
>>
>
> Looks like someone else had the same problem with rpmbuild's XZ
> single-core compression:
>
> https://insujang.github.io/2020-11-07/accelerating-ceph-rpm-packaging-using-multithreaded-compression/
>
> --define "_binary_payload w2T16.xzdio"
>
> Question: Is the resulting compression format suitable for the Fedora
> repo, or is it against a policy?

Yeah, we did quite some work upstream to get builds to run in parallel at
various stages and levels.

RPM does support multithreaded compression where the compression
libraries support it, but it needs to be enabled. As the results are not
the same as with single-threaded compression, this may have an impact on
the viability of deltarpms. But IIRC zstd, while producing different
output, would at least have reproducible results. But I am not the one
that actually checked and decided against threaded compression.
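
To make the payload string quoted above easier to read: in "w2T16.xzdio" the digits after "w" are the compression level, the number after "T" is the thread count, and the suffix names the compressor. The following Python sketch only illustrates that layout. It is not rpm's own parser, and the names `PayloadSpec` and `parse_payload` are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class PayloadSpec(NamedTuple):
    level: Optional[int]    # compression level, None if not given
    threads: Optional[int]  # thread count after "T", None if not given
    io_type: str            # e.g. "xzdio", "gzdio", "zstdio"


# "w", optional level digits, optional "T<threads>", then ".<io type>"
_PAYLOAD_RE = re.compile(r"^w(\d*)(?:T(\d+))?\.([a-z]+)$")


def parse_payload(spec: str) -> PayloadSpec:
    """Split a payload string such as "w2T16.xzdio" into its parts."""
    match = _PAYLOAD_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognised payload string: {spec!r}")
    level, threads, io_type = match.groups()
    return PayloadSpec(
        level=int(level) if level else None,
        threads=int(threads) if threads is not None else None,
        io_type=io_type,
    )


if __name__ == "__main__":
    print(parse_payload("w2T16.xzdio"))
```

For the string from the linked article this yields level 2, 16 threads and the xz compressor.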

Florian



Re: rpmbuild is very slow with large files

2022-07-12 Thread Marius Schwarz

On 12.07.22 at 10:55, Marius Schwarz wrote:


The rpmbuild process for this one rpm was single-threaded. With an
lsof loop, I could see bytes getting appended to the resulting file
at an awfully slow rate, which is very frustrating to see
on an 8-core system.


The thing is, I do test builds of VOSK with language model and code
etc. on one of my servers. If this project ever reaches the Fedora
build farm,
we can expect a very long build time if nothing is changed in
rpmbuild. Is there maybe a hidden parallel compression option somewhere?


Looks like someone else had the same problem with rpmbuild's XZ
single-core compression:

https://insujang.github.io/2020-11-07/accelerating-ceph-rpm-packaging-using-multithreaded-compression/

--define "_binary_payload w2T16.xzdio"

Question: Is the resulting compression format suitable for the Fedora
repo, or is it against a policy?

best regards,
Marius



Re: rpmbuild is very slow with large files

2022-07-12 Thread Marius Schwarz

On 12.07.22 at 09:47, Kamil Dudka wrote:

On Tuesday, July 12, 2022 12:52:13 AM CEST Marius Schwarz wrote:

Multicore use would also be helpful i.e. while packing the files.

What do you mean by packing?  Creation of the resulting RPM packages?
I believe this phase already runs in parallel in case multiple RPM
packages are being created out of a single source RPM package.


"packing" as in "compression".

The rpmbuild process for this one rpm was single-threaded. With an
lsof loop, I could see bytes getting appended to the resulting file
at an awfully slow rate, which is very frustrating to see on
an 8-core system.


The thing is, I do test builds of VOSK with language model and code etc.
on one of my servers. If this project ever reaches the Fedora build farm,
we can expect a very long build time if nothing is changed in rpmbuild.
Is there maybe a hidden parallel compression option somewhere?


best regards,
Marius


Re: rpmbuild is very slow with large files

2022-07-12 Thread Marius Schwarz

On 12.07.22 at 10:30, Miroslav Lichvar wrote:

be selected by the kernel, or it could be a VM which doesn't have one
that would work with migrations, etc.


I think you're right, it's a VM running on Xen.

best regards,
Marius


Re: rpmbuild is very slow with large files

2022-07-12 Thread Marius Schwarz

On 12.07.22 at 09:26, Florian Weimer wrote:

* Marius Schwarz:


I am building (not finished yet, started 15 minutes ago) a ~2.5
GB rpm and found that rpmbuild is an extreme bottleneck.

IMHO, this is caused by a file-read function which reads files in 32k
blocks, which is very slow and extremely I/O intensive.  The result is a
task running permanently at 100% on one core. With a change to larger
chunks, we could speed up many build tasks on the farm.

That's unlikely.  32K is not a small buffer size.

It's more likely that time is spent during compression.


In this case, pigz, pbzip2 or another parallel compression mode would
be helpful.



strace shouldn't see a system call here because clock_gettime should be
handled in the vDSO.  This suggests something is wrong with the system
(unless it's some obscure variant that really doesn't have vDSO support).

Thanks,
Florian


It's a "normal" (desktopless) F35 on a Xeon E5-2620 v4.

best regards,
Marius Schwarz


Re: rpmbuild is very slow with large files

2022-07-12 Thread Miroslav Lichvar
On Tue, Jul 12, 2022 at 09:26:14AM +0200, Florian Weimer wrote:
> * Marius Schwarz:
> > and I don't think this task needs to read the clock that often, either.
> 
> strace shouldn't see a system call here because clock_gettime should be
> handled in the vDSO.  This suggests something is wrong with the system
> (unless it's some obscure variant that really doesn't have vDSO support).

It doesn't necessarily have to be something wrong with the system.
The vDSO clock_gettime() works only with specific clocksources,
typically TSC on x86_64. On some older HW it's not reliable enough to
be selected by the kernel, or it could be a VM which doesn't have one
that would work with migrations, etc.
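
Which clocksource the kernel actually selected can be read from sysfs. The sketch below is a small Python helper for that, assuming the usual Linux location /sys/devices/system/clocksource/clocksource0; the function names are made up for this example. A value other than tsc would be consistent with the explanation above, but whether a given clocksource is served by the vDSO depends on the kernel and the hypervisor, so treat the output as a hint only.

```python
from pathlib import Path
from typing import Optional

# Assumed standard sysfs location on Linux; absent on other systems.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_sysfs(name: str, base: Path = CLOCKSOURCE_DIR) -> Optional[str]:
    """Return the stripped content of a sysfs file, or None if it is unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        return None


def describe_clocksource(base: Path = CLOCKSOURCE_DIR) -> str:
    """Summarise the current and available clocksources in one line."""
    current = read_sysfs("current_clocksource", base)
    available = read_sysfs("available_clocksource", base)
    if current is None:
        return "clocksource information not available on this system"
    return f"current: {current}; available: {available or 'unknown'}"


if __name__ == "__main__":
    print(describe_clocksource())
```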

-- 
Miroslav Lichvar


Re: rpmbuild is very slow with large files

2022-07-12 Thread Kamil Dudka
On Tuesday, July 12, 2022 12:52:13 AM CEST Marius Schwarz wrote:
> 
> Hi,
> 
> I am building (not finished yet, started 15 minutes ago) a ~2.5
> GB rpm and found that rpmbuild is an extreme bottleneck.
>
> IMHO, this is caused by a file-read function which reads files in 32k
> blocks, which is very slow and extremely I/O intensive.  The result is a
> task running permanently at 100% on one core. With a change to larger
> chunks, we could speed up many build tasks on the farm.
>
> Multicore use would also be helpful, e.g. while packing the files.

What do you mean by packing?  Creation of the resulting RPM packages?
I believe this phase already runs in parallel in case multiple RPM
packages are being created out of a single source RPM package.

Kamil

> Any counter-arguments ?
> 
> strace example:
> 
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477601377}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477685727}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=477892054}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=47876}) = 0
> [pid 2604060] read(5, "_I @_I s_I t_E\nauss\303\244het 
> auss\303\244h"..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478212651}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478301347}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478409015}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478505273}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478701366}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478784826}) = 0
> [pid 2604060] read(5, " Y_I k_I t_E\naustun austun 'aU_B"..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=478962539}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479045029}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479130924}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479213446}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479407336}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222,
> tv_nsec=479489832}) = 0
> [pid 2604060] read(5, "s_I v_I u:_I k_I s_I @_I s_E\naus"..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479720335}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479803090}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=479950309}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480067186}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480305924}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480417985}) = 0
> [pid 2604060] read(5, "B s_I ts_I u:_I g_I R_I E_I n_I "..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480654716}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
> tv_nsec=480763606}) = 0
> 
> and I don't think this task needs to read the clock that often, either.


Re: rpmbuild is very slow with large files

2022-07-12 Thread Florian Weimer
* Marius Schwarz:

> I am building (not finished yet, started 15 minutes ago) a ~2.5
> GB rpm and found that rpmbuild is an extreme bottleneck.
>
> IMHO, this is caused by a file-read function which reads files in 32k
> blocks, which is very slow and extremely I/O intensive.  The result is a
> task running permanently at 100% on one core. With a change to larger
> chunks, we could speed up many build tasks on the farm.

That's unlikely.  32K is not a small buffer size.

It's more likely that time is spent during compression.
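
A quick way to check how much the read size itself matters is to time a plain read loop with different chunk sizes and no compression at all. The following Python sketch does that under a few assumptions: `make_test_file` and `read_in_chunks` are hypothetical helpers written for this example, the file is filled with random bytes, and since a freshly written temporary file will normally sit in the page cache, the loop measures per-call overhead rather than disk throughput.

```python
import os
import tempfile
import time


def make_test_file(size: int) -> str:
    """Create a temporary file of `size` random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(os.urandom(size))
    return path


def read_in_chunks(path: str, chunk_size: int):
    """Read the whole file unbuffered; return (bytes_read, read_calls, seconds)."""
    total = 0
    calls = 0
    start = time.perf_counter()
    # buffering=0 so that each read() here maps to a read on the file descriptor.
    with open(path, "rb", buffering=0) as handle:
        while True:
            block = handle.read(chunk_size)
            if not block:
                break
            total += len(block)
            calls += 1
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    path = make_test_file(64 * 1024 * 1024)
    try:
        for chunk in (32 * 1024, 1024 * 1024):
            total, calls, seconds = read_in_chunks(path, chunk)
            print(f"{chunk:>8} byte chunks: {calls} reads, {seconds:.3f} s")
    finally:
        os.remove(path)
```

If both chunk sizes finish in a small fraction of the total build time, the read size is unlikely to be the bottleneck and the time is going elsewhere, for example into compression.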

> [pid 2604060] read(5, "B s_I ts_I u:_I g_I R_I E_I n_I "..., 32768) = 32768
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222,
> tv_nsec=480654716}) = 0
> [pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222,
> tv_nsec=480763606}) = 0
>
> and I don't think this task needs to read the clock that often, either.

strace shouldn't see a system call here because clock_gettime should be
handled in the vDSO.  This suggests something is wrong with the system
(unless it's some obscure variant that really doesn't have vDSO support).

Thanks,
Florian


rpmbuild is very slow with large files

2022-07-11 Thread Marius Schwarz


Hi,

I am building (not finished yet, started 15 minutes ago) a ~2.5
GB rpm and found that rpmbuild is an extreme bottleneck.


IMHO, this is caused by a file-read function which reads files in 32k
blocks, which is very slow and extremely I/O intensive.  The result is a
task running permanently at 100% on one core. With a change to larger
chunks, we could speed up many build tasks on the farm.


Multicore use would also be helpful, e.g. while packing the files.

Any counter-arguments?

strace example:

[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=477601377}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=477685727}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=477892054}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=47876}) = 0
[pid 2604060] read(5, "_I @_I s_I t_E\nauss\303\244het 
auss\303\244h"..., 32768) = 32768
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478212651}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478301347}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478409015}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478505273}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478701366}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478784826}) = 0

[pid 2604060] read(5, " Y_I k_I t_E\naustun austun 'aU_B"..., 32768) = 32768
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=478962539}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479045029}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479130924}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479213446}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479407336}) = 0

[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222,
tv_nsec=479489832}) = 0
[pid 2604060] read(5, "s_I v_I u:_I k_I s_I @_I s_E\naus"..., 32768) = 32768
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479720335}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479803090}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=479950309}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=480067186}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=480305924}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=480417985}) = 0

[pid 2604060] read(5, "B s_I ts_I u:_I g_I R_I E_I n_I "..., 32768) = 32768
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=480654716}) = 0
[pid 2604060] clock_gettime(CLOCK_REALTIME, {tv_sec=1657579222, 
tv_nsec=480763606}) = 0


and I don't think this task needs to read the clock that often, either.