On 01/15/10 04:00 AM, Darren J Moffat wrote:
> ... full file. On the other hand, for people following a fast-moving repository who have low bandwidth, it may well be a choice worth taking.
Correct, and that fits best with our current target audience. Remember that this is also saving Sun bandwidth costs (currently).
> Yes, the optimisation is a very clever one, and it is probably saving a lot of downloads for some people. Is there enough data in the logs of pkg.opensolaris.org and ipkg.sfbay to calculate whether this is actually the case? If so, it would make a very interesting article.
Note that this isn't just a reduction in the amount of bandwidth used. It also reduces the amount of memory required to perform the upgrade (thereby increasing the amount of memory available for zfs, etc.), the amount of disk space used during the upgrade, the upgrade time, and the amount of energy used by the system performing the upgrade.
The potential savings, of course, are determined by the size of the differences between the build you are upgrading from and the one you are upgrading to, and also depend on the number of packages you have installed.
However, as a random point of reference, my b129 system would currently download 28,392 files going from b129 to b130 for a total of roughly 443MB.
If pkg(1) didn't use the content hash algorithm for determining files to retrieve for updates, those numbers would jump to around 30,836 files and roughly 506MB. So that's roughly a 13% savings for that particular build.
For "slower" builds, such as 109 -> 110, the savings should be greater. I believe there have been more detailed comparisons in the past, but I'm unable to find them at the moment.
...
> My current belief is that this optimisation is great for the /dev repository, but it won't be suitable for some enterprise deployments of Solaris Next.
It is important to remember that pkg(5) is still under development; it does not yet have the complete set of enterprise-level functionality that it is planned to have. A number of gaps in that functionality are known, and we appreciate the time you are taking to provide feedback toward that goal. So the absence of certain functionality is usually not intentional.
> An alternative to doing the deduplication (which is what this really is) at pkg publish time would be to keep the current method but allow a per-repository setting on the client that determines whether full file hashes/signatures are always used or whether the optimisations can be used.
Files can be shared across packages (at least inside the download cache), and that particular implementation would be fairly complex. As a result, I don't feel that it would be appropriate to have a per-repository setting.
However, as I mentioned in the bugzilla entry that was opened, I think it would probably be agreeable to have an image-level setting that asked pkg(5) to use a "paranoid" update algorithm instead of an "efficient" update algorithm.
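A rough sketch of what such an image-level knob could look like follows; the property name and its values are purely illustrative, not a committed pkg(5) interface.

    def plan_downloads(old_manifest, new_manifest, update_policy="efficient"):
        """Choose payloads to fetch based on a hypothetical image property.

        update_policy:
          "efficient" -- skip payloads whose content hash is unchanged
          "paranoid"  -- re-fetch every payload delivered by the new build
        """
        if update_policy == "paranoid":
            return dict(new_manifest)  # download everything, verify later
        return {
            path: digest
            for path, digest in new_manifest.items()
            if old_manifest.get(path) != digest
        }

    old = {"usr/bin/ls": "aa11", "etc/motd": "bb22"}
    new = {"usr/bin/ls": "cc33", "etc/motd": "bb22"}
    print(plan_downloads(old, new))                            # one file
    print(plan_downloads(old, new, update_policy="paranoid"))  # both files

Making the policy a property of the image rather than of a repository sidesteps the problem above: a file shared by packages from different publishers is handled the same way no matter where it came from.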
There is also work being done with other groups to prevent needless changing of bits where possible, and there are additional publication-time optimisations that are on the pkg(5) list to investigate. Ideally, the entire publication process would be more efficient about not re-delivering bits that haven't changed.
However, there are a variety of challenges with the tools used to build different systems and programs that favour the current approach used by pkg(1).
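To make the publication-side idea concrete: a content-addressed store makes "don't re-store unchanged bits" nearly free at publish time. The sketch below uses an invented on-disk layout and function name; it is not the pkg(5) depot code.

    import hashlib
    import os

    def publish_payload(repo_dir: str, data: bytes) -> str:
        """Store a file payload under its content hash, skipping duplicates.

        Because the store is keyed by hash, republishing a build in which
        a payload's bits did not change writes nothing new, and a client
        that already holds that hash never downloads it again.
        """
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(repo_dir, "file", digest[:2], digest)
        if not os.path.exists(path):  # unchanged bits are a no-op
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fobj:
                fobj.write(data)
        return digest

Of course, this only pays off to the extent that builds are reproducible; a tool that embeds a timestamp in its output defeats the deduplication even when nothing meaningful changed.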
Cheers,
-- 
Shawn Walker
