In my experience, package management's routine failure to find/use a 'healthy' 
repository is a long-standing production problem.

I'm requesting a fix.

Here's a summary:

I run openSUSE 13.1 on numerous machines:

        lsb_release -rd
                Description:    openSUSE 13.1 (Bottle) (x86_64)
                Release:        13.1

Current count is ~200.

The machines are installed at multiple locations around the globe.

They're connected to the 'net via a variety of different network providers.

Some of the machines are directly connected to the 'net, some are behind LAN 
routers, switches & firewalls.

Package management for all of the machines is handled exclusively via zypper 
cli.

Each machine has a common core of repositories defined in /etc/zypp/repos.d, 
and frequently has a number of additional @openSUSE dev (!'home') repos defined.

In ALL cases, the default repo sets have been installed with the download 
redirector as baseurl,

        baseurl=http://download.opensuse.org/...

Regular package maintenance consists of

        zypper clean --all
        zypper (d)up

The maintenance frequency is nominally 1/wk, often 1/dy, and in devs' cases, 
often more frequent.

In virtually ALL cases, the update process regularly fails @ 
retrieving/refreshing the repos' (meta)data.

For example, a typical result is:

        ...
        Checking whether to refresh metadata for KDE4-Extra-Unstable
        Retrieving: repomd.xml .......................................................................................[error]
        File '/repodata/repomd.xml' not found on medium 'http://download.opensuse.org/repositories/KDE:/Unstable:/Extra/KDE_Current_openSUSE_13.1'

        Abort, retry, ignore? [a/r/i/? shows all options] (a):
        ...

This occurs occasionally for any/all repos, whether the standard distribution 
repos (security, update, etc), core DM (e.g. KDE*) additional repos, or the 
more 'esoteric' !home OBS-hosted repos (e.g., security:netfilter).

Roughly 15% of overall update/upgrade attempts fail this way.

The error is NON-recoverable.  'Abort' & 'retry' *never* work.

Chats @ IRC re: the issue typically result in the same '(non)responses' :  
"wait", "works for me", "prove it", etc.

The ONLY solution(s) that work are:

        (1) wait some random amount of time -- typically hours, occasionally
            days -- until the system magically heals itself, or
        (2) visit the download.opensuse.org link for the repo, click 'details'
            for a target page, identify a specific working/available repo for
            the package(s) of interest, and manually edit baseurl= for the
            problematic repo.

Neither is tenable for a reliable operating environment.  It is simply 
unmanageable, whether in a single local environment or a widely distributed one.

(2) is further confounded by the fact that, at any given time, a 
previously-working, manually-selected repo may, itself, fail, requiring -- yet 
again -- another manual intervention.

Within the scope of our environment, no other distro's package management 
system has anywhere near the failure rate demonstrated here.  (We have 600+ 
other machines running a mix of CentOS, Fedora, Debian & Ubuntu.)

This has been occurring for literally years, across multiple openSUSE versions, 
and remains unaddressed.  I know, without any doubt, that others experience 
similar/frequent failures -- it's been a frequent discussion with our partners, 
as well as in openSUSE* IRC channels.

This needs a fix.  As to what, specifically, that fix can/should be -- I'm 
unclear.  If a solution already exists, I'm unaware of it.

One idea: a fallback mechanism *within* a repo's definition would be useful.

For example, allow multiple, numbered baseurls in a given repo's def'n:

        baseurl1=http://direct/url/to/specific/site/1/...
        baseurl2=http://direct/url/to/specific/site/2/...
        baseurl3=http://download.opensuse.org/...
        ...
        baseurlN=http://direct/url/to/specific/site/3/...


and add functionality to zypper so that, for each repo, the baseurls would be 
tried in order on any given failure.
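
To make the proposed behaviour concrete, here is a rough Python sketch of "try the numbered baseurls in order".  It is not zypper/libzypp code, and the baseurlN keys do not exist in zypper today; they are the proposal above.  The helper names (numbered_baseurls, repomd_reachable, first_working_baseurl) are made up for illustration, and the probe simply checks that /repodata/repomd.xml answers an HTTP HEAD request, which is an assumption about what "healthy" should mean.

```python
import re
import urllib.error
import urllib.request


def numbered_baseurls(repo_options):
    """Return the baseurlN values from a dict of repo options, ordered by N."""
    found = []
    for key, value in repo_options.items():
        match = re.fullmatch(r"baseurl(\d+)", key)
        if match:
            found.append((int(match.group(1)), value))
    return [url for _, url in sorted(found)]


def repomd_reachable(baseurl, timeout=10):
    """Probe <baseurl>/repodata/repomd.xml with an HTTP HEAD request."""
    url = baseurl.rstrip("/") + "/repodata/repomd.xml"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Not found, DNS failure, refused connection and timeout all count as unhealthy.
        return False


def first_working_baseurl(repo_options, probe=repomd_reachable):
    """Try each numbered baseurl in order; return the first that passes, else None."""
    for url in numbered_baseurls(repo_options):
        if probe(url):
            return url
    return None
```

The probe is passed in as a parameter so the ordering logic can be exercised without touching the network, e.g. first_working_baseurl(opts, probe=lambda url: True).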

By adding, e.g., a 

        failcount2abort=X

to a given repo's defn, to /etc/zypp(er).conf, or to both, the overall process 
could be terminated after "X" consecutive failures, indicating a likely 
systemic problem requiring further intervention.

I'd appreciate hearing from "those responsible for keeping the redirector & 
repos working" re:

        * acknowledgement, or refusal thereof, of the failure issue
        * clarification as to why it occurs in the first place
        * ideas/suggestions as to what can/should be done to fix it

Thanks.

Grant