Greetings,

As part of the catalog v1 work that johansen and I are doing, we've discovered that there are numerous cases where consistency issues with the catalog data can cause problems when refreshing publisher metadata.

The currently known cases, along with our current solution for each (if one exists), are listed below, grouped by origin scenario. All of these cases apply to v1 catalog incremental update.

What has become clear in enumerating these cases is that we may need different behaviour based on whether one or multiple origins are present for a publisher.

The purpose of this inquiry is to determine whether the proposed or current behaviours noted below are acceptable, and whether we should emit additional warnings for certain error cases or alter the behaviour.

Please note that only exceptional error cases are detailed here. Also, the client doesn't actually support multiple origins yet, so those cases do not have to be addressed in the forthcoming catalog v1 changeset, but they do need to be addressed very soon.

Overview
==============================
In short, how do we deal with these single origin cases:
* (good) -> (older)
* (good) -> (rebuilt_newer)
* (good) -> (different)
* (good) -> (different_malicious)
* (good) -> (malicious)

...and how do we deal with these multiple origin cases:
* (good, good) -> (good, older)
* (good, good) -> (good, rebuilt_newer)
* (good, good) -> (good, different)
* (good, good) -> (good, different_malicious)
* (good, good) -> (malicious)

...and the reverse of all the cases above. Of course, there are further combinations possible, but they should just be variations on the above.

It is important to note that the catalog v1 incremental update mechanism depends on the timestamps used in catalog retrieval, and those timestamps assume that publication occurs on one host, with the data then replicated to other origins. While all of the timestamp data is in UTC, there can still be unexpected variations between hosts.

This is important because timestamps are used to determine the order of updates. The host that is the publication source must ensure that the time on each update is after the previous update. If multiple origins have updates from different sources with different timestamps, it's possible to introduce inconsistency into the update process.
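As a rough illustration of the ordering requirement on the publication host, the sketch below returns a timestamp that is always strictly later than the previous update's. The helper name next_update_time() is hypothetical and is not taken from the pkg sources; it only shows one way a host could guard against its clock stepping backwards.

```python
from datetime import datetime, timedelta, timezone


def next_update_time(previous, now=None):
    """Return a UTC timestamp strictly later than `previous`.

    Hypothetical helper: if the host clock has not advanced past the
    previous update (or has stepped backwards), nudge the new timestamp
    forward by one microsecond so update order is preserved.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if previous is not None and now <= previous:
        return previous + timedelta(microseconds=1)
    return now
```

Note that this only protects ordering on a single publication host; it does nothing for the case where different origins receive updates from different sources.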

Another assumption is that multiple origins for the same publisher do not accept publication -- they must be read-only or a variation thereof. That is, all origins for a publisher are expected to contain the same set of package data (barring synchronisation issues).


Single Origin
==============================
These cases assume that a publisher only has a single origin such as this:

publisher: example.com
origin: http://pkg.example.com/repository

Case 1
------------------------------
Scenario:
pkg.example.com's repository server has had catastrophic disk failure, and has restored an older version of the repository from backup.

Current Refresh Behaviour:
Since last_modified is older, but the creation date of the catalog matches the last retrieved one, the client will abort the incremental update and silently perform a full retrieval instead.

Case 2
------------------------------
Scenario:
pkg.example.com's repository server has had catastrophic disk failure, but did not have a backup to restore. The repository was rebuilt using a copy of the package data, which means there is a completely new catalog in place.

Current Refresh Behaviour:
The client will attempt an incremental update because the rebuilt catalog is newer than the last one it retrieved. However, it will detect that the creation date (and time) of the new catalog does not match that of the old catalog. So, it will abort the incremental update and silently perform a full retrieval instead.
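The decision described in Cases 1 and 2 can be summarised in a small sketch. This is not the client's actual code: CatalogInfo and choose_refresh() are hypothetical names, the check is collapsed into a single step (the client really starts an incremental update and then aborts), and treating an equal last_modified as "nothing to do" is an assumption.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical summary of the two catalog attributes the client compares.
CatalogInfo = namedtuple("CatalogInfo", ["created", "last_modified"])

FULL, INCREMENTAL, NOTHING = "full", "incremental", "nothing"


def choose_refresh(local, remote):
    """Decide how to refresh, given local and remote catalog attributes."""
    if local is None:
        # No catalog retrieved yet, so there is nothing to update from.
        return FULL
    if remote.created != local.created:
        # Case 2: the catalog was rebuilt, so its identity has changed.
        return FULL
    if remote.last_modified < local.last_modified:
        # Case 1: same catalog, but the origin went back in time.
        return FULL
    if remote.last_modified == local.last_modified:
        # Assumption: identical timestamps mean no updates are available.
        return NOTHING
    return INCREMENTAL
```

For example, with a local catalog created on 2009-01-01 and last modified on 2009-01-02, a remote catalog with the same creation date and a last_modified of 2009-01-03 yields "incremental", while any remote catalog with a different creation date yields "full".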

Case 3
------------------------------
Scenario:
pkg.example.com's package repository is completely rebuilt every night (similar to the ON nightly repository we have). This means a new catalog is put into place each time.

Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects that the creation date and time of the new catalog do not match, it will abort and silently perform a full retrieval instead.

Case 4
------------------------------
Scenario:
User publishes copies of the packages in pkg.example.com's repository to their own repository, and executes set-publisher -O http://localhost example.com.

Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects that the creation date and time of the new catalog do not match, it will abort and silently perform a full retrieval instead.

Case 5
------------------------------
Scenario:
Malicious user has redirected the client's requests to pkg.example.com's repository to their own evil source via <insert nefarious plan here>. That source has been built from scratch, so it has a different catalog than pkg.example.com's repository.

Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects that the creation date and time of the new catalog do not match, it will abort and silently perform a full retrieval instead.

Case 6
------------------------------
Scenario:
Malicious user has redirected the client's requests to pkg.example.com's repository to their own evil source via <insert nefarious plan here>. However, they used a copy of pkg.example.com's repository and then added their new, modified versions of packages.

Current Refresh Behaviour:
The client will silently incrementally update, unaware that the source of the catalog data has changed.

Case 7
------------------------------
Scenario:
Malicious user had redirected the client's requests to pkg.example.com's repository to their own evil source via <insert nefarious plan here>. However, they used a copy of pkg.example.com's repository and then added their new, modified versions of packages. Client user discovers this and fixes the problem, but the client currently has a copy of the malicious user's repository data.

Outstanding Issues
------------------------------
The client is relying on creation date and time (which is accurate to the microsecond level, with six digits of precision). Is this a concern? Or is there a point where we say "good enough"?

Even then, is there any way to protect against the malicious user scenarios above? It seems like signing the catalog is the only way to deal with this. But that only helps the network repository case, and not the on-disk case, where manifest signing is the only thing we can depend on.
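To make the signing idea concrete, here is a minimal sketch of why a signed catalog would catch Case 6 even though the creation date matches. It is only an illustration: HMAC-SHA256 with a shared secret is used purely as a standard-library stand-in for a real public-key signature, and sign_catalog() and catalog_is_authentic() are hypothetical names, not an existing pkg interface.

```python
import hashlib
import hmac


def sign_catalog(catalog_bytes, key):
    """Return a hex signature over the raw catalog data.

    Stand-in only: a real design would use a public-key signature so
    that clients never hold the signing key.
    """
    return hmac.new(key, catalog_bytes, hashlib.sha256).hexdigest()


def catalog_is_authentic(catalog_bytes, signature, key):
    """Check that the catalog data matches the signature it came with."""
    expected = sign_catalog(catalog_bytes, key)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

A copied catalog with extra entries appended would fail this check, because the attacker cannot produce a valid signature for the modified data without the key.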


Multiple Origins
==============================
These cases assume that a publisher has multiple origins such as this:

publisher: example.com
origins: http://pkg.example.com/repository, http://pkg.example.net/repository

Case 1
------------------------------
Scenario:
The example.net repository is an older copy of the example.com repository.

Current Refresh Behaviour:
Since last_modified is older, but the creation date of the catalog matches the last retrieved one, the client will abort the incremental update and perform a full retrieval.

Case 2
------------------------------
Scenario:
The example.net repository is a copy of the example.com repository, but its catalog data is older.

Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects that the creation date and time of the new catalog do not match, it will abort and perform a full retrieval instead.

Case 3
------------------------------
Scenario:
pkg.example.com's package repository is completely rebuilt every night (similar to the ON nightly repository we have). This means a new catalog is put into place each time. However, pkg.example.net is a copy of this repository that has to be synchronized, and so its contents don't always exactly match.

Current Refresh Behaviour:
The client, when contacting pkg.example.net for an incremental update, will silently do nothing, thinking that no updates are available.

Case 4
------------------------------
Scenario:
One of the origins for pkg.example.com's repository has been compromised by a malicious user via <insert nefarious plan here>. The compromised origin's repository has been built from scratch, so it has a different catalog than pkg.example.com's repository.

Current Refresh Behaviour:
The client will attempt an incremental update. However, when it detects that the creation date and time of the new catalog do not match, it will abort and silently perform a full retrieval instead.

Case 5
------------------------------
Scenario:
One of the origins for pkg.example.com's repository has been compromised by a malicious user via <insert nefarious plan here>. However, they used a copy of pkg.example.com's repository and then added their new, modified versions of packages.

Current Refresh Behaviour:
The client will silently incrementally update, unaware that the source of the catalog data has changed.

Case 6
------------------------------
Scenario:
One of pkg.example.com's origins was compromised, corrupted, or contained older data for some period of time.

Outstanding Issues
------------------------------
Silently performing a full retrieval is unlikely to be the right answer for the multiple origin cases. Instead, trying another origin seems like the right thing to do.

However, how does the client know which origin is authoritative? Specifically, it seems like the client would have to contact every origin and then pick the newest source with matching identity information (creation date), on the assumption that it is the authoritative one.
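The selection idea above could look roughly like the following sketch. OriginState and pick_origin() are hypothetical names, and the sketch assumes the client has already contacted every origin and collected each catalog's creation date and last_modified time.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical record of what each origin reported for its catalog.
OriginState = namedtuple("OriginState", ["uri", "created", "last_modified"])


def pick_origin(known_created, states):
    """Pick the newest origin whose catalog identity matches ours.

    Returns None when no origin matches the known creation date, in
    which case the caller still has to decide what to do (for example,
    fall back to a full retrieval).
    """
    matching = [s for s in states if s.created == known_created]
    if not matching:
        return None
    return max(matching, key=lambda s: s.last_modified)
```

This would skip an origin that is simply out of date or rebuilt, but it does not help with the multiple origin Case 5: a compromised origin that started from a copy of the real catalog has a matching creation date and may well be the "newest".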

Cheers,
--
Shawn Walker
_______________________________________________
pkg-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
