Re: [pkg-discuss] Catalog Format and Operation Change Proposal [take 1]

Danek Duvall Wed, 22 Jul 2009 16:55:54 -0700

On Wed, Jul 15, 2009 at 02:57:26PM -0500, Shawn Walker wrote:

>     Catalogs are not separated by locale since manifests are not,


Hm.  Do we really want to bloat a catalog with dozens of languages?  I
assume that almost all clients will want just one language, and however
fast the serialization and deserialization is, multiplying the data in
there is going to slow things down.

I don't see this being an issue for "base" or "depend", but definitely for
"summary".

>     and all data is assumed to be encodable using UTF-8.

We'll want an escape hatch for this, I'm pretty sure.  My plan for the
manifests was to have a set action, at or near the beginning of the
manifest, that provides the encoding name, at which point the engine could
switch over to that encoding.  For manifests which may change encoding from
action to action (descriptions in multiple languages, for instance), the
action itself could provide the encoding, and as long as the encoding
appears in the action string before any values that use it, we can switch
over at that point.

I imagine that this could be done here, but I'm not sure whether JSON
provides for multiple encodings in-stream.

And then there are any issues that the unicode() type might have with
certain character sets, if such issues exist.
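
A small sketch of the JSON side of this, using the standard json module
rather than simplejson.  As far as I know, a JSON text is Unicode
throughout and has no in-stream way to switch encodings, so text in several
languages would have to share one encoding (or be \u-escaped into plain
ASCII).  The summary strings and language keys below are made up for
illustration and are not part of the proposal:

```python
import json

# Hypothetical per-locale summaries; the keys and text are invented.
summary = {
    "summary": {
        "en": "DVD writing tools",
        "de": "DVD-Brennprogramme für Solaris",
        "ja": "DVD 書き込みツール",
    }
}

# Default behaviour: non-ASCII characters become \uXXXX escapes, so the
# serialized text is pure ASCII regardless of the languages involved.
escaped = json.dumps(summary)

# Alternative: keep the characters as-is and encode the whole text as UTF-8.
raw_utf8 = json.dumps(summary, ensure_ascii=False).encode("utf-8")

# Both forms decode back to the same data.
decoded_from_escaped = json.loads(escaped)
decoded_from_utf8 = json.loads(raw_utf8.decode("utf-8"))
```

Either way there is one encoding for the whole file; a per-action encoding
switch like the one described for manifests has no direct equivalent.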

>         created:
>             The value is an ISO-8601 formatted date in UTC time
>             indicating when the catalog was created.
> 
>         last-modified:
>             The value is an ISO-8601 formatted date in UTC time
>             indicating when the catalog was last updated.
> 
>         Example:
> 
>         {
>             'created': '2005-06-14T08:00:00.686485',
>             'last-modified': '2009-05-08T16:10:25.686485',

These should be in UTC, right?  Also, is there a reason you chose the
extended format, rather than the basic format used in the FMRIs?
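
For reference, the two ISO-8601 forms side by side, as a sketch in modern
Python (the timestamp is the one from the SUNWdvdrw FMRI quoted below; the
variable names are just for illustration):

```python
from datetime import datetime, timezone

# Timestamp taken from the FMRI example (20090218T042840Z), in UTC.
stamp = datetime(2009, 2, 18, 4, 28, 40, tzinfo=timezone.utc)

# Basic format, as used in FMRIs: no separators, trailing Z for UTC.
basic = stamp.strftime("%Y%m%dT%H%M%SZ")

# Extended format: same value with '-' and ':' separators.
extended = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")

# Parsing the basic form back; strptime returns a naive datetime, so the
# UTC zone is attached explicitly.
parsed = datetime.strptime(basic, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
```

Note that the values in the quoted example carry no "Z" or offset, which is
why the UTC question comes up at all.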

>             'package-count': 40802,
>             'unique-package-count': 1706,
>             'update-logs': ['2008100208', '2009050816'],
>             'version': 1,
>         }
> 
>     - catalog.<name>
> 
>         Catalog files will contain a python dict structure serialized
>         in JSON (JavaScript Object Notation) format.  Version entries
>         for each package stem are kept in ascending version order to
>         allow fast lookups by the client and avoid sort overhead on
>         load.  The structure can be described as follows:
> 
>         {
>             <publisher-prefix>: {
>                 <FMRI package stem>: [
>                     {
>                         "op-time": <ISO-8601 Date and Time>

What's "op-time" doing in the catalog?

>             "SUNWdvdrw":[
>               {
>                 "version":"5.21.4.10.8,5.11-0.108:20090218T042840Z",
>                 "actions":[
>                   "set name=variant.zone value=global value=nonglobal",
>                   "set name=variant.arch value=sparc value=i386",
>                   "depend [email protected] type=require",
>                   "depend [email protected] type=require",
>                   "depend [email protected] type=require"
>                 ]
>               }
>             ],

I wonder if we couldn't push the actions fully into JSON, too:

    "actions": [
        {
            "": "set",
            "name": "variant.zone",
            "value": ["global", "nonglobal"]
        },
        {
            "": "depend",
            "fmri": "[email protected]",
            "type": "require"
        }
    ]

except that this might add sufficient extra depth to the catalog that
serialization times on SPARC kick up again.
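
A rough sketch of that conversion, assuming only simple key=value action
strings (real action strings have more syntax than this handles).  The
helper name action_to_dict and the example FMRI are hypothetical:

```python
import json
import shlex

def action_to_dict(action):
    """Turn 'set name=x value=a value=b' into a JSON-friendly dict."""
    tokens = shlex.split(action)
    # The empty-string key holds the action type, as in the example above.
    result = {"": tokens[0]}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        if key in result:
            # Repeated keys collapse into a list of values.
            existing = result[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[key] = [existing, value]
        else:
            result[key] = value
    return result

zone = action_to_dict("set name=variant.zone value=global value=nonglobal")
# Made-up FMRI, for illustration only.
dep = action_to_dict("depend fmri=pkg:/example@1.0 type=require")
as_json = json.dumps({"actions": [zone, dep]}, indent=4)
```

Each action becomes one more level of dict (and possibly list) nesting per
catalog entry, which is where the extra depth comes from.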

We've been talking about providing the install folks with a quick way of
determining the size of packages to be installed; it would make sense for
this information to go here as well.  There would be multiple entries, for
all the combinations of variants and facets.  It would also be synthetic,
not pulled directly from the manifest.

I would also expect package rename and obsoletion set actions to go here as
well.

The structure also brings up the issue of a stable ordering of entries,
which will be necessary for catalog signing.  Signing and verification
could always happen on a transform of the catalog, where dictionaries are
turned into lists of key/value pairs sorted by key, but given that we'll
want to verify the signature every time we read the catalog (I assume),
this seems a bit expensive.  Perhaps you or Bart have given some more
thought to this?
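
To make the "transform" option concrete, here is a minimal sketch that
serializes with sorted keys and fixed separators and hashes the result.
It computes a digest only, not a signature; SHA-256 and the function name
canonical_digest are assumptions for illustration, not anything from the
proposal:

```python
import hashlib
import json

def canonical_digest(catalog):
    """Hash a catalog dict in a form that ignores dict ordering."""
    # sort_keys gives a stable key order at every nesting level; fixed
    # separators remove any whitespace variation between writers.
    text = json.dumps(catalog, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Same content, different insertion order.
a = {"version": 1, "package-count": 40802, "update-logs": ["2008100208"]}
b = {"update-logs": ["2008100208"], "package-count": 40802, "version": 1}
c = {"version": 2, "package-count": 40802, "update-logs": ["2008100208"]}

digest_a = canonical_digest(a)
digest_b = canonical_digest(b)
digest_c = canonical_digest(c)
```

This re-serializes the whole catalog on every verification, which is the
cost in question; hashing the file bytes as written would avoid it, but
only if the writer's output order is itself stable.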

> 2.2  Server Changes
> 
>     To enable clients to retrieve the new catalog files and incremental
>     updates to them, the following changes will be made:
> 
>     - The new catalog files will be stored in the /var/pkg/catalog

Don't you mean <repo>/catalog?  (See also the next comment.)

> 2.3.1  Image Changes
> 
>     - The image object, upon initialization, will remove the
>       /var/pkg/catalog directory and its contents if possible.
>       If this cannot be done (due to permissions), the client
>       will continue on.  If it can be removed, a new directory
>       named /var/pkg/publisher will be created, and publisher objects
>       will be told to store and retrieve their metadata from it.

Interesting.  This, plus your proposal to put the server-side catalog in
/var/pkg/catalog suggests that you're intending to be able to use /var/pkg
as the root of a repo on which you can just run a depot.  If so, I'm not
sure that's exactly the way you want to do it.  The files and manifests,
IMHO, really ought to be in a dataset of their own, outside of the ROOT
dataset, since they're completely shareable, and freeing up disk space
shouldn't be prevented simply because you're trying to remove old files
held on disk by an old snapshot.

Of course, sharing manifests and files (and even the catalog) between
client and server should be doable, but I'd like you to take the above into
account.

And my assumption could be completely wrong.  In which case I don't
understand the use of /var/pkg/publisher here instead of /var/pkg/catalog.

>     - For performance reasons, the client api will also store
>       versions of each of the catalogs proposed that only
>       contain entries for installed FMRIs to accelerate common
>       client functions such as info, list, uninstall, etc.

Does/could this eliminate the need for /var/pkg/state/install as well, and
for the "installed" files?

>     It was discovered that the likely reason for poor serialization on
>     some SPARC systems is that simplejson uses a recursive function-
>     based iterative encoder that does not perform well on SPARC systems
>     (due to register windows?).

We have a workaround now, but we should probably file a bug on this and
possibly work on an implementation that isn't recursive.
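
As a sketch of what "isn't recursive" could look like: the encoder below
walks nested dicts and lists with an explicit stack instead of one function
call per nesting level.  It is not simplejson's code, it leans on
json.dumps for scalars and keys, and it assumes string keys; the name
encode_iterative is made up:

```python
import json

def encode_iterative(obj):
    """Serialize nested dicts/lists to JSON text without recursion."""
    out = []
    # Each frame is [iterator over children, is_dict, no_child_emitted_yet].
    stack = []

    def emit(value):
        # Containers push a frame for the main loop; scalars are written now.
        if isinstance(value, dict):
            out.append("{")
            stack.append([iter(value.items()), True, True])
        elif isinstance(value, (list, tuple)):
            out.append("[")
            stack.append([iter(value), False, True])
        else:
            out.append(json.dumps(value))

    emit(obj)
    while stack:
        frame = stack[-1]
        children, is_dict, first = frame
        try:
            item = next(children)
        except StopIteration:
            # Container exhausted: close it and return to its parent.
            out.append("}" if is_dict else "]")
            stack.pop()
            continue
        if not first:
            out.append(", ")
        frame[2] = False
        if is_dict:
            key, value = item
            out.append(json.dumps(key) + ": ")
            emit(value)
        else:
            emit(item)
    return "".join(out)

sample = {"SUNWdvdrw": [{"version": "5.21.4.10.8", "actions": ["set a=b"]}]}
encoded = encode_iterative(sample)
```

The depth of the Python call stack stays constant however deep the data
nests; whether that actually helps on SPARC would need measuring.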

Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
