Greetings,
As part of the catalog v1 work that johansen and I are doing, we've
discovered that there are numerous cases where consistency issues with
the catalog data can cause problems when refreshing publisher metadata.
The currently known cases, along with our current solution for each (if
one exists), are listed below, grouped by origin scenario. All of
these cases are for v1 catalog incremental update.
What has become clear in enumerating these cases is that we may need
different behaviour based on whether one or multiple origins are present
for a publisher.
The purpose of this inquiry is to determine whether the proposed or
current behaviours noted below are acceptable, and whether we should
emit additional warnings for certain error cases or alter the behaviour.
Please note that only exceptional error cases are detailed here, and
that the client doesn't actually support multiple origins yet, so those
cases do not have to be addressed in the forthcoming catalog v1
changeset, but do need to be addressed very soon.
Overview
==============================
In short, how do we deal with these single origin cases:
* (good) -> (older)
* (good) -> (rebuilt_newer)
* (good) -> (different)
* (good) -> (different_malicious)
* (good) -> (malicious)
...and how do we deal with these multiple origin cases:
* (good, good) -> (good, older)
* (good, good) -> (good, rebuilt_newer)
* (good, good) -> (good, different)
* (good, good) -> (good, different_malicious)
* (good, good) -> (malicious)
...and the reverse of all the cases above. Of course, there are further
combinations possible, but they should just be variations on the above.
It is important to note that the catalog v1 incremental update mechanism
depends on the timestamps used in catalog retrieval, which assume that
publication occurs on one host, with the data then replicated to other
origins. While all of the timestamp data is in UTC, there can still be
unexpected variations between hosts.
This is important because timestamps are used to determine the order of
updates. The host that is the publication source must ensure that the
time on each update is after the previous update. If multiple origins
have updates from different sources with different timestamps, it's
possible to introduce inconsistency into the update process.
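As a rough illustration of the ordering requirement above, here is a
minimal sketch. The function name and the sample timestamps are
hypothetical and are not taken from the pkg client; the only thing it
shows is that each update must be strictly later than the one before it.

```python
from datetime import datetime, timezone


def updates_are_ordered(timestamps):
    """Return True if every update is strictly later than the previous one."""
    return all(earlier < later
               for earlier, later in zip(timestamps, timestamps[1:]))


# Hypothetical update log written by a single publication host.
single_source = [
    datetime(2009, 5, 8, 16, 10, 25, 686485, tzinfo=timezone.utc),
    datetime(2009, 5, 8, 16, 12, 1, 120000, tzinfo=timezone.utc),
]

# Hypothetical log where a second host, whose clock runs behind, has also
# published: its update lands "before" the previous one.
mixed_sources = single_source + [
    datetime(2009, 5, 8, 16, 11, 30, 0, tzinfo=timezone.utc),
]

print(updates_are_ordered(single_source))
print(updates_are_ordered(mixed_sources))
```

The second log is the kind of inconsistency that can appear when origins
carry updates from different publication sources.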
Another assumption is that multiple origins for the same publisher do
not accept publication -- they must be read-only or a variation thereof.
That is, all origins for a publisher are expected to contain the same
set of package data (barring synchronisation issues).
Single Origin
==============================
These cases assume that a publisher only has a single origin such as this:
publisher: example.com
origin: http://pkg.example.com/repository
Case 1
------------------------------
Scenario:
pkg.example.com's repository server has had catastrophic disk failure,
and has restored an older version of the repository from backup.
Current Refresh Behaviour:
Since last_modified is older, but the creation date of the catalog
matches the last retrieved one, the client will abort the incremental
update and silently perform a full retrieval instead.
Case 2
------------------------------
Scenario:
pkg.example.com's repository server has had catastrophic disk failure,
but did not have a backup to restore. The repository was rebuilt using
a copy of the package data, which means there is a completely new
catalog in place.
Current Refresh Behaviour:
The client will attempt an incremental update because the rebuilt
catalog is newer than the last one it retrieved. However, it will
detect that the creation date (and time) of the new catalog does not
match that of the old catalog. So, it will abort the incremental update
and silently perform a full retrieval instead.
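The decision described in Cases 1 and 2 can be sketched roughly as
follows. This is only an illustration under stated assumptions:
CatalogInfo and choose_refresh are hypothetical names, not the client's
actual code, and the sketch checks the creation date first rather than
starting an incremental update and then aborting it. The outcome is the
same.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in for the two catalog attributes discussed here.
CatalogInfo = namedtuple("CatalogInfo", ["created", "last_modified"])


def choose_refresh(local, remote):
    """Return 'none', 'incremental', or 'full' for a single-origin refresh."""
    if remote.created != local.created:
        # Different catalog identity (e.g. rebuilt repository): Case 2.
        return "full"
    if remote.last_modified < local.last_modified:
        # Same identity but older data (e.g. restored from backup): Case 1.
        return "full"
    if remote.last_modified == local.last_modified:
        return "none"
    return "incremental"


def utc(*args):
    """Build a UTC datetime; helper for the sample data only."""
    return datetime(*args, tzinfo=timezone.utc)


local = CatalogInfo(created=utc(2009, 5, 1), last_modified=utc(2009, 5, 8))
restored = CatalogInfo(created=utc(2009, 5, 1), last_modified=utc(2009, 5, 4))
rebuilt = CatalogInfo(created=utc(2009, 5, 9), last_modified=utc(2009, 5, 9))
updated = CatalogInfo(created=utc(2009, 5, 1), last_modified=utc(2009, 5, 10))

for name, remote in (("restored", restored), ("rebuilt", rebuilt),
                     ("updated", updated)):
    print(name, choose_refresh(local, remote))
```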
Case 3
------------------------------
Scenario:
pkg.example.com's package repository is completely rebuilt every night
(similar to the ON nightly repository we have). This means a new
catalog is put into place each time.
Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects
that the creation date and time of the new catalog do not match, it will
abort and silently perform a full retrieval instead.
Case 4
------------------------------
Scenario:
User publishes copies of the packages in pkg.example.com's repository to
their own repository, and executes set-publisher -O http://localhost
example.com.
Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects
that the creation date and time of the new catalog do not match, it will
abort and silently perform a full retrieval instead.
Case 5
------------------------------
Scenario:
Malicious user has redirected the client's requests to pkg.example.com's
repository to their own evil source via <insert nefarious plan here>,
which has been built from scratch, so has a different catalog than
pkg.example.com's repository.
Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects
that the creation date and time of the new catalog do not match, it will
abort and silently perform a full retrieval instead.
Case 6
------------------------------
Scenario:
Malicious user has redirected the client's requests to pkg.example.com's
repository to their own evil source via <insert nefarious plan here>.
However, they used a copy of pkg.example.com's repository and then added
their new, modified versions of packages.
Current Refresh Behaviour:
The client will silently incrementally update, unaware that the source
of the catalog data has changed.
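To make it concrete why this case goes unnoticed: if the only identity
check is a comparison of creation timestamps, a copied repository
carries the same timestamp and passes. A minimal sketch, where
same_identity and the sample values are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def same_identity(local_created, remote_created):
    """Identity check based only on the catalog creation timestamp."""
    return local_created == remote_created


# Creation timestamp of the genuine catalog the client last retrieved.
genuine_created = datetime(2009, 5, 1, 9, 30, 0, 123456, tzinfo=timezone.utc)

# A copy of the repository keeps the original creation timestamp (Case 6).
cloned_created = genuine_created

# A repository built from scratch gets a new creation timestamp (Case 5).
scratch_created = genuine_created + timedelta(days=30)

print(same_identity(genuine_created, cloned_created))
print(same_identity(genuine_created, scratch_created))
```

The copy is indistinguishable from the genuine source by this check
alone, while the from-scratch repository is at least detected as
different.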
Case 7
------------------------------
Scenario:
Malicious user had redirected the client's requests to pkg.example.com's
repository to their own evil source via <insert nefarious plan here>.
However, they used a copy of pkg.example.com's repository and then added
their new, modified versions of packages. The client's user discovers
this and fixes the problem, but the client still has a copy of the
malicious user's repository data.
Outstanding Issues
------------------------------
The client is relying on creation date and time (which is accurate to
the microsecond level, with six digits of precision). Is this a
concern? Or is there a point where we say "good enough"?
Even then, is there any way to protect against the malicious user
scenarios above? It seems like signing the catalog is the only way to
deal with this. But that only helps the network repository case, and not
the on-disk case where manifest signing is the only thing we can depend on.
Multiple Origins
==============================
These cases assume that a publisher has multiple origins such as this:
publisher: example.com
origins: http://pkg.example.com/repository,
http://pkg.example.net/repository
Case 1
------------------------------
Scenario:
The example.net repository is an older copy of the example.com
repository.
Current Refresh Behaviour:
Since last_modified is older, but the creation date of the catalog
matches the last retrieved one, the client will abort the incremental
update and perform a full retrieval.
Case 2
------------------------------
Scenario:
The example.net repository is a copy of the example.com repository, but
its catalog data is older.
Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects
that the creation date and time of the new catalog do not match, it will
abort and perform a full retrieval instead.
Case 3
------------------------------
Scenario:
pkg.example.com's package repository is completely rebuilt every night
(similar to the ON nightly repository we have). This means a new
catalog is put into place each time. However, pkg.example.net is a copy
of this repository that has to be synchronized, and so its contents
don't always exactly match.
Current Refresh Behaviour:
The client, when contacting pkg.example.net for an incremental update,
will silently do nothing, thinking that no updates are available.
Case 4
------------------------------
Scenario:
One of the origins for pkg.example.com's repository has been compromised
by a malicious user via <insert nefarious plan here>, which has been
built from scratch, so has a different catalog than pkg.example.com's
repository.
Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects
that the creation date and time of the new catalog do not match, it will
abort and silently perform a full retrieval instead.
Case 5
------------------------------
Scenario:
One of the origins for pkg.example.com's repository has been compromised
by a malicious user via <insert nefarious plan here>. However, they
used a copy of pkg.example.com's repository and then added their new,
modified versions of packages.
Current Refresh Behaviour:
The client will silently incrementally update, unaware that the source
of the catalog data has changed.
Case 6
------------------------------
Scenario:
One of pkg.example.com's origins was compromised, corrupted, or
contained older data for some period of time.
Outstanding Issues
------------------------------
Silently performing a full retrieval for the multiple origin cases isn't
likely the right answer here. Instead, trying another origin seems the
right thing to do.
However, how does the client know which origin is authoritative?
Specifically, it seems like the client would have to contact every
origin and then pick the newest source with matching identity
information (creation date), on the assumption that it is the
authoritative one.
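A minimal sketch of that selection, assuming the client has already
fetched the catalog attributes from every origin. OriginCatalog,
pick_authoritative, and the third origin URI are hypothetical and exist
only for illustration:

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical record of what one origin reported for its catalog.
OriginCatalog = namedtuple("OriginCatalog",
                           ["origin", "created", "last_modified"])


def pick_authoritative(known_created, candidates):
    """Return the newest candidate whose creation date matches, or None."""
    matching = [c for c in candidates if c.created == known_created]
    if not matching:
        return None
    return max(matching, key=lambda c: c.last_modified)


def utc(*args):
    """Build a UTC datetime; helper for the sample data only."""
    return datetime(*args, tzinfo=timezone.utc)


known_created = utc(2009, 5, 1)
candidates = [
    OriginCatalog("http://pkg.example.com/repository",
                  known_created, utc(2009, 5, 10)),
    OriginCatalog("http://pkg.example.net/repository",
                  known_created, utc(2009, 5, 8)),
    # Hypothetical origin whose catalog has a different creation date.
    OriginCatalog("http://pkg.example.invalid/repository",
                  utc(2009, 5, 9), utc(2009, 5, 12)),
]

best = pick_authoritative(known_created, candidates)
print(best.origin if best else "no matching origin")
```

Note that this does nothing for Case 5: a compromised origin built from
a copy has a matching creation date, and would win if its data is newest.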
Cheers,
--
Shawn Walker