On 14/2/24 13:02, Thomas Reichel wrote:
When downloading small files the download time seems dominated by
connection latency rather than bandwidth. Downloading several small
files in parallel is an effective way to reduce the impact of latency.
However, downloading many small files in parallel does not always saturate
bandwidth, which is inefficient. This commit attempts to fully utilize
parallelism to download small files while maintaining high bandwidth
utilization.
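To see why latency dominates for small files, a per-file cost of one round trip plus transfer time is a useful sketch. The RTT, bandwidth, and file sizes below are illustrative assumptions, not measurements from this patch:

```python
# Sketch: per-file download cost ~ connection latency + size / bandwidth.
# All numbers below are illustrative assumptions.

RTT_S = 0.050              # 50 ms connection/setup latency (assumed)
BANDWIDTH_MIB_S = 50.0     # 50 MiB/s link bandwidth (assumed)

def download_time_s(size_mib: float) -> float:
    """Time to fetch one file: one round trip plus the transfer itself."""
    return RTT_S + size_mib / BANDWIDTH_MIB_S

small = download_time_s(0.1)    # a 100 KiB package
large = download_time_s(100.0)  # a 100 MiB package

# For the small file, latency is ~96% of the total time;
# for the large file it is negligible.
print(f"small: {small:.3f}s  latency share {RTT_S / small:.0%}")
print(f"large: {large:.3f}s  latency share {RTT_S / large:.2%}")
```

Under this model, adding parallel connections mostly hides the RTT term, which only matters for the small files.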
This is accomplished by downloading the smallest files first using
all but one parallel connection while downloading a large file using the
remaining connection. The result seems to maintain a more stable
download speed throughout entire transactions. This is in contrast to
the usual behavior I observed when downloading many packages, where
the download speed progressively declines as smaller packages are downloaded.
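The split described above can be sketched as two ends of one sorted queue: the dedicated connection pulls from the large end while the others pull from the small end. This models the idea only; it is not the actual pacman/libalpm code:

```python
# Sketch of the split: files sorted by size, the smallest fed to all but
# one connection, the largest kept on a dedicated connection.
# Models the strategy only; not the actual pacman/libalpm code.
from collections import deque

def next_file(queue: deque, dedicated: bool):
    """The dedicated connection pops the largest remaining file;
    every other connection pops the smallest remaining one."""
    return queue.pop() if dedicated else queue.popleft()

sizes = deque(sorted([3, 120, 1, 45, 7, 900]))  # MiB, illustrative
print(next_file(sizes, dedicated=True))    # 900: dedicated connection
print(next_file(sizes, dedicated=False))   # 1: a small-file connection
print(next_file(sizes, dedicated=False))   # 3
```

The large file keeps the link busy while the remaining connections amortize the per-file latency across many small packages.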
When my entire cache is deleted and all packages are redownloaded using
`pacman -Qqn | sudo pacman -Sw -`, the mean download speed is 47.8
MiB/s. After this patch, the mean download speed is 54.0 MiB/s. In terms
of time savings, this patch causes a 14.9 GiB download to go from 5
minutes 20 seconds to 4 minutes 43 seconds on my system and network.
Your mileage may vary on different systems, networks, and selections of
packages. I expect there to be virtually no effect on selections of
packages that are all fairly large.
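The reported means are consistent with the quoted size and times; a quick check using only the numbers from this message:

```python
# Quick sanity check of the figures quoted above:
# 14.9 GiB downloaded in 5 min 20 s (before) vs 4 min 43 s (after).
total_mib = 14.9 * 1024               # 14.9 GiB in MiB

before = total_mib / (5 * 60 + 20)    # MiB/s before the patch
after = total_mib / (4 * 60 + 43)     # MiB/s after the patch

print(f"before: {before:.1f} MiB/s")  # ~47.7, matching the reported 47.8
print(f"after:  {after:.1f} MiB/s")   # ~53.9, matching the reported 54.0
print(f"speedup: {after / before - 1:.1%}")
```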
I just tested this on my current update and saw a 20% deficit after
applying this patch. I guess there is less benefit (or a negative
benefit...) if your download speed is slower and your latency
relatively lower.
The sorting based on size was implemented after a back-of-the-envelope
calculation that showed this was a good choice over a range of possible
speed/latency/package-size combinations. I won't consider this patch
without a more thorough analysis.
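A calculation of the kind mentioned can be sketched as a greedy simulation under a latency + size/bandwidth model, which lets the two orderings be compared across parameter choices. All parameters and the split heuristic below are illustrative assumptions, not the original calculation, and the outcome depends heavily on them:

```python
# Rough back-of-the-envelope model: per-file cost = RTT + size / bandwidth,
# each connection works through its queue sequentially, and the transaction
# ends when the last connection finishes. All parameters are illustrative
# assumptions, not the calculation from the original patch.
import heapq
import random

def makespan(queues, rtt, bw):
    """Finish time of the slowest connection, given per-connection queues."""
    return max(sum(rtt + size / bw for size in queue) for queue in queues)

def round_robin(sizes, k):
    """Baseline: deal files to k connections in arrival order."""
    queues = [[] for _ in range(k)]
    for i, size in enumerate(sizes):
        queues[i % k].append(size)
    return queues

def sorted_split(sizes, k):
    """Patch-style sketch: k-1 connections share the small files (assigned
    least-loaded first); one connection takes the large files, biggest
    first. The split point is a crude assumption."""
    ordered = sorted(sizes)
    cut = len(ordered) - max(1, len(ordered) // k)
    small, large = ordered[:cut], ordered[cut:][::-1]
    heap = [(0.0, i) for i in range(k - 1)]   # (load, connection id)
    queues = [[] for _ in range(k - 1)]
    for size in small:
        load, i = heapq.heappop(heap)
        queues[i].append(size)
        heapq.heappush(heap, (load + size, i))
    return queues + [large]

random.seed(0)
sizes = [random.lognormvariate(0, 2) for _ in range(200)]  # MiB, skewed
for name, queues in [("round robin", round_robin(sizes, 4)),
                     ("sorted split", sorted_split(sizes, 4))]:
    # Assume the 50 MiB/s link splits evenly over 4 connections (crude).
    print(f"{name}: {makespan(queues, rtt=0.05, bw=12.5):.1f} s")
```

Sweeping the RTT, bandwidth, and size-distribution parameters in a model like this is one way to do the more thorough analysis asked for above.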
Also, patches should go via:
https://gitlab.archlinux.org/pacman/pacman
Allan