Re: Software Management call for RFEs
> And there are package diffs, which are ed-style diffs of the Packages
> file I mentioned above.

This approach would work quite well for primary.xml, because it doesn't contain cross-references between packages using non-natural keys.

It doesn't work for the SQLite database, either in binary or SQL dump format, because of the reliance on artificial primary keys (such as package IDs). I once tried this. With about 10k packages in fedora-updates, the delta over 2-3 days was +491 -479 packages. Assuming deletions are cheap, the delta should ideally be about 5%. As expected, binary bsdiff yields a much bigger (~29%) delta. Very roughly, that's the 5% that really describes new packages, plus an almost constant 24% overhead to fix up the inevitable changes in surrogate keys. Not as bad as I was afraid, but still not worth it (IMO).

So, we need *.xml deltas. Yum can rebuild xml -> .sqlite locally, but this needs quite a lot of memory and takes tens of seconds. Add the time needed to patch the quite large uncompressed xml file, and suddenly the fact that you're downloading just 1/10th of the data hardly pays off (ignoring very specific use cases, like mobile data, for a moment).

For DNF, it's different. It has to rebuild xml -> .solv anyway, so this comes for free.

> However, for many users that follow unstable or testing, package diffs
> are currently slower than downloading the full Packages file, because
> the diffs are incremental (i.e., they contain the changes from file
> version N to N+1, and you have to apply all of them to get to the
> current version), and apt-get can easily write 100 MB or more because
> the Packages file is rewritten locally multiple times.

Yes, patch chaining should be avoided. I'd like to use N -> 1 deltas that could be applied to many recent snapshots.

--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
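The surrogate-key overhead described above can be illustrated with a toy sketch (this is not the real repodata format; the "dumps" and package names below are invented for illustration). With natural keys, adding one package touches one line of the dump; with sequential pkgKey-style IDs, the same addition can renumber almost every line, so a line-based diff balloons:

```python
import difflib

# Natural keys: adding a package changes only the lines that mention it.
old_natural = ["pkg foo-1.0", "pkg bar-2.1", "req bar-2.1 -> foo-1.0"]
new_natural = old_natural + ["pkg baz-3.0"]

# Surrogate keys: the same addition renumbers every ID after the
# insertion point, so nearly every line of the dump differs.
old_ids = ["1 foo-1.0", "2 bar-2.1", "req 2 -> 1"]
new_ids = ["1 baz-3.0", "2 foo-1.0", "3 bar-2.1", "req 3 -> 2"]

def changed(a, b):
    """Count added/removed lines in a unified diff of a vs b."""
    return sum(1 for line in difflib.unified_diff(a, b, lineterm="")
               if line.startswith(("+", "-"))
               and not line.startswith(("+++", "---")))

print(changed(old_natural, new_natural))  # tiny delta: the new line only
print(changed(old_ids, new_ids))          # nearly every line changed
```

The second diff is several times larger even though the logical change is identical, which is roughly the "constant ~24% overhead" effect.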
Re: Software Management call for RFEs
> Can you point me to the primary.xml -> SQLite translation in yum? I've
> got a fairly efficient primary.xml parser.

Just set mddownloadpolicy=xml in yum.conf. It should work, but since downloading sqlite.bz2 is much better, very few people use this. Yum uses a fairly efficient parser, written in C, using libxml2 (the yum-metadata-parser package). It's always bundled, because Yum has to support xml-only repositories anyway. Oh, there's a typo in yum.conf.5.. fixed.

> It might be interesting to see if it's possible to reduce the latency
> introduced by the SQLite conversion to close to zero. (Decompression
> and INSERTs can be interleaved with downloading, and maybe the index
> creation improvements in SQLite are sufficient these days.)

We have to checksum the downloaded data before processing, and this kills pipelining.

Also, when updating primary_db with a bunch of INSERTs and DELETEs, your database differs from the one on the server:

1) different *.sqlite checksum
2) different pkgId -> pkgKey mapping
3) different order of packages from SELECTs

For speed, Yum joins primary_db and filelists_db via pkgKey, so 2) breaks Yum, unless you always download/delta-update both.. and this kills the win in the "we don't need filelists" case.
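To make the pkgKey point concrete, here's a minimal sketch of the kind of join involved. The table and column names are simplified stand-ins loosely modeled on the createrepo schema, not the real one:

```python
import sqlite3

# Toy model of why primary_db and filelists_db must agree on pkgKey.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, pkgId TEXT, name TEXT);
CREATE TABLE filelist (pkgKey INTEGER, filename TEXT);
INSERT INTO packages VALUES (1, 'abc123', 'foo'), (2, 'def456', 'bar');
-- filelists generated against the SAME pkgKey mapping:
INSERT INTO filelist VALUES (1, '/usr/bin/foo'), (2, '/usr/bin/bar');
""")

# The fast pkgKey join described above:
rows = con.execute("""
    SELECT p.name, f.filename
    FROM packages p JOIN filelist f ON p.pkgKey = f.pkgKey
""").fetchall()
print(sorted(rows))

# If a locally delta-updated primary_db renumbered its pkgKeys while
# filelists_db kept the server's numbering, this join would silently
# pair packages with the wrong file lists.
```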
Yum: faster bash completion of available packages
Hi,

People were complaining that yum autocompletion of package names is very slow (bug 919852). Now there's a shortcut that (if enabled) makes it much faster. However, the behavior changes slightly:

- disabled but cached repositories are used
- configured package excludes are not honored
- all available arches are suggested
- there's no installed/available package split

It's now in F19 and rawhide, but DISABLED, due to the above. To enable it, just set the YUM_CACHEDIR environment variable to the path where repository metadata are stored, usually /var/cache/yum.

Feedback is very welcome, so we can decide whether to scrap this or enable it by default. Thanks!
Re: Problem with F17 yum requesting old repodata?
> /pub/fedora/linux/updates/17/x86_64/repodata/34881e74623de1754bf0e12f01884bea615fdcee05eded189f22d41bf9d4260b-other.sqlite.bz2
> That file no longer exists; it was on my mirror from 21:12 Monday to
> 22:17 Tuesday. I'm guessing there's either a yum bug or a broken repo
> push that is:
> - causing yum to try to fetch an out-of-date repodata file

That's puzzling. Repodata are fetched in two steps:

1) repomd.xml + primary.sqlite: when the repo is opened
2) filelists + other: loaded on demand

There's usually just a very small delay between 1) and 2), and 1) may use metadata at most 6 hours old. When something requests otherdata that expired 3 days ago, this means that either:

- the metadata_expire option was changed in yum.conf,
- or it's some long-running application.

> - causing it to try over and over again rapidly

AFAIK, Yum (cli) does not need other.sqlite. Maybe something runs "repoquery --changelog somepackage" in a retry loop.. Or maybe it's PackageKit, trying to fetch changelogs of updated packages without updating the primary metadata first? That would fit the long-running app guess.
Re: Problem with F17 yum requesting old repodata?
> This is what I get for the last few days:
> # yum --skip-broken update

This is a fairly common scenario. Let me explain..

The repository cache cookie was older than 6 hours, so Yum updates the metalink.xml file:

> updates/17/x86_64/metalink | 16 kB 00:00

The repomd.xml timestamp stored in metalink.xml has changed. Yum retrieves the new repomd.xml from some mirror:

> updates | 4.7 kB 00:00

Then the $hash-primary.sqlite.bz2 referenced by the new repomd.xml needs to be retrieved. But the first mirror Yum tried didn't have it:

> http://ftp.informatik.uni-frankfurt.de/fedora/updates/17/x86_64/repodata/00c7410a78aa8dd0f4934ed4935377b99e0339101cee369c1b1691f3025950ac-primary.sqlite.bz2: [Errno 14] HTTP Error 404 - Not Found

Yum tries the same relative URL on another mirror. This time the download was successful:

> Trying other mirror.
> updates/primary_db | 6.9 MB 00:06

Metadata are pushed to mirrors as independent files. Probably the tiny repomd.xml gets way ahead of primary.sqlite.bz2, so a race is possible. But since we handle it (by trying another mirror, or reverting to the previous metadata when all mirrors fail), I don't consider this a bug.
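The mirror-failover behavior described above can be sketched as a small function (illustrative only; this is not yum's actual code, and the names are invented):

```python
def fetch_from_mirrors(mirrors, relpath, fetch):
    """Try relpath on each mirror in turn; return the first success.

    `fetch` is a callable(url) returning bytes, or raising IOError
    (e.g. on an HTTP 404 from a mirror that hasn't synced yet).
    """
    errors = []
    for mirror in mirrors:
        url = mirror.rstrip("/") + "/" + relpath
        try:
            return fetch(url)
        except IOError as e:
            errors.append((url, e))   # remember the failure, fall through
    # All mirrors failed: the caller can revert to previous metadata.
    raise IOError("all mirrors failed: %r" % errors)
```

The key property is that a 404 from one mirror is not an error from the user's point of view; it only becomes one if every mirror fails.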
re: request: remove gd.tuwien.ac.at from mirror-lists
On Wed Sep 26 08:02:05, Reindl Harald wrote:

> yes, and that is why statistics of mirrors are meaningless
> because of this fact my idea to give us a config-option
> dear yum, if the selected mirror provides lower than 500 KB/sec
> try another one, because my line can do 12 MB/sec

FWIW, we've increased the low speed limit in urlgrabber from 1 to 1000 B/s. This should fix the most pathological cases. When the speed falls below this limit for 30s, the download is aborted as if it timed out, and the next mirror is used. Each timeout also halves the mirror's estimated speed, so it will very likely be avoided next time.

https://admin.fedoraproject.org/updates/FEDORA-2012-14928/python-urlgrabber-3.9.1-21.fc18

The timeout value (30s by default) can be adjusted in yum.conf. The low speed limit is hardcoded, but there's a simple patch to add it to yum.conf.. could be merged if necessary.

http://lists.baseurl.org/pipermail/yum-devel/2012-September/009634.html
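A minimal sketch of the bookkeeping described above (illustrative only; urlgrabber's real implementation differs in detail, and the names here are invented):

```python
LOW_SPEED_LIMIT = 1000   # bytes/sec; was 1 B/s before the fix
LOW_SPEED_TIME = 30      # seconds below the limit before aborting

estimated_speed = {}     # mirror url -> estimated bytes/sec

def on_timeout(mirror):
    """A timed-out download halves the mirror's estimated speed,
    so the mirror selection code will very likely avoid it next time."""
    estimated_speed[mirror] = estimated_speed.get(mirror, 1e6) / 2.0
```

Repeated timeouts decay the estimate geometrically, so a persistently slow mirror drops out of contention quickly, while one successful fast download can rehabilitate it.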
yum pragma: no-cache (was Re: F18 DNF and history)
> From: Nicolas Mailhot nicolas.mail...@laposte.net
> can we get a package downloader that sends the correct cache-control
> http headers to refresh data automatically, instead of complaining
> metadata is wrong and aborting (for people behind a caching proxy)?

Have you tried changing the http_caching option in yum.conf from the default 'all' to 'packages'?
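For reference, the suggested setting is a one-line change in the [main] section of yum.conf (values as documented: 'all' is the default, 'packages' caches only package downloads):

```ini
[main]
# let the proxy cache packages, but always revalidate metadata
http_caching=packages
```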
Re: [announce] yum: parallel downloading
> I'd be happy if yum/urlgrabber/libcurl finally used http keepalives.

It does, indeed. The parallel downloader tries to use keepalives, too (we cache and reuse the last idle process instead of closing it).

> Last time I looked (and it has been a while), it didn't, so you always
> paid the TCP slow-start penalty for each package.

/me just checked with tcpflow that we really do. Please contact me off-list if you can reproduce it. Thanks!
Re: [announce] yum: parallel downloading
Hi Glen,

> Why is the default three connections rather than one? Is a tripling of
> the number of connections to a mirror on a Fedora release day
> desirable?

$ grep maxconnections /var/cache/yum/*/metalink.xml
/var/cache/yum/fedora/metalink.xml:   <resources maxconnections="1">
/var/cache/yum/updates/metalink.xml:  <resources maxconnections="1">

Yum honors this.

> Consider that a large mirror site already sees concurrent connections
> in the multiple 10,000s.

The three-connection limit is only used when the above is not available (e.g. a baseurl setup with just one mirror). I don't mind lowering it to just two, as that should work well enough in most cases.
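Reading that limit out of the metalink is a small parsing job; here is a sketch. The element and attribute names follow the metalink files shown in the grep output above, but treat the exact namespace URI and document shape used here as assumptions:

```python
import xml.etree.ElementTree as ET

# A trimmed-down metalink.xml sample for illustration.
SAMPLE = """<metalink xmlns="http://www.metalinker.org/">
  <files><file name="repomd.xml">
    <resources maxconnections="1">
      <url>http://example.org/repodata/repomd.xml</url>
    </resources>
  </file></files>
</metalink>"""

ns = {"m": "http://www.metalinker.org/"}
root = ET.fromstring(SAMPLE)
res = root.find(".//m:resources", ns)
# Fall back to yum's default of 3 when no limit is published.
limit = int(res.get("maxconnections", 3))
print(limit)  # 1
```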
Re: [announce] yum: parallel downloading
> The number of concurrent users is now lower because, well, each of
> them now completes a yum update in one third of the time.

I think Glen's concern was that the consumed resources (flow caches, TCP hash entries, sockets) may scale faster than the aggregate downloading speed. I am aware of this, and in most cases the downloader in urlgrabber will make just one concurrent connection to a mirror, because:

1) The Nth concurrent connection to the same host is assumed to be N times slower than the 1st one, so we'll very likely not select the same mirror again.

2) maxconnections=1 in every metalink I've seen so far. This is a hard limit; we block until a download finishes and reuse its connection when the limit is maxed out.

The reason for NOT banning >1 connections to the same host altogether is that (as John Reiser wrote) a 2nd connection does help quite a lot when downloading many small files and just one mirror is available. I agree that using strictly one connection and HTTP pipelining would be even better, but we can't do that with libcurl.

-- Zdenek
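Point 1) can be sketched as a tiny selection heuristic (illustrative only; function and variable names are invented, not urlgrabber's):

```python
def best_mirror(speeds, active):
    """Pick a mirror by estimated effective speed.

    speeds: host -> estimated bytes/sec
    active: host -> number of connections currently open to that host

    The Nth concurrent connection to a host is assumed to run N times
    slower, so its effective speed estimate is divided by (active + 1).
    """
    def effective(host):
        return speeds[host] / (active.get(host, 0) + 1)
    return max(speeds, key=effective)
```

With this penalty, a second connection to the fastest mirror only wins when every alternative is estimated at less than half its speed, which is why the downloader usually spreads across mirrors.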
Re: [announce] yum: parallel downloading
> Both packages are compatible with older versions.
> Can we use them in Fedora 17 too?

Yes, I've used it in F14 for some time.

-- Zdeněk Pavlas
Re: [announce] yum: parallel downloading
> So disabling the fastestmirror plugin before testing this would be the
> way to go?

The fastestmirror plugin does some initial mirror sorting. We mostly ignore this, so disabling fastestmirror makes sense but is not strictly necessary.

-- Zdeněk Pavlas
[announce] yum: parallel downloading
Hi,

New yum and urlgrabber packages have just hit Rawhide. These releases include some new features, including parallel downloading of packages and metadata, and new mirror selection code. As we plan to include these features in RHEL7, I welcome any feedback or bug reports!

python-urlgrabber-3.9.1-12.fc18 supports a new API to urlgrab() files in parallel, and yum-3.4.3-26.fc18 can use this. Both packages are compatible with older versions.

Feature list:

- parallel downloading of packages and metadata

  If possible, multiple files are downloaded in parallel (see below for the limitations that apply).

- configurable 'max_connections' limit in yum.conf

  This is the maximum number of simultaneous connections Yum makes. The purpose of this is to limit local resources (the number of processes forked). The default is to use urlgrabber's default value of 5.

- mirror limits are honored, too

  Making many connections to the same mirror usually does not help much; it just consumes more resources. That's why Yum also honors the mirror limits from metalink.xml. If no such limit is available, at most 3 simultaneous connections are made to any single mirror.

- new mirror selection algorithm

  The real download speed is calculated after each download, and the mirror's statistics get updated. These are in turn used when selecting mirrors for further downloads. This should be more accurate than measuring latencies in the fastestmirror plugin, but slow mirrors now have to be retried from time to time, and the statistics need some time to build up.

- ctrl-c handling

  This is a long-standing problem in Yum. Due to various shortcomings in rpm and curl, it's impossible to react immediately to SIGINT. But now the downloader runs in a different process, so we can exit even if curl is still stuck. The "skip to next mirror" feature is gone (we don't want to restart all currently running downloads).

Known limitations:

- metalink.xml and repomd.xml downloads are not parallelized yet.
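For anyone wanting to try the new limit, the yum.conf fragment would look like this (the option name and default are as described in the feature list above; the value shown is just an example):

```ini
[main]
# cap the total number of simultaneous downloader connections
# (the default is urlgrabber's default of 5)
max_connections=5
```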
-- Zdeněk Pavlas