[email protected] wrote:
On Wed, Jul 15, 2009 at 02:57:26PM -0500, Shawn Walker wrote:
2.1.2  Proposed Catalog Format

    Catalogs using this format will be composed of the following files:

    - catalog.attrs.<name>
        This file will contain a python dict structure serialized in
        JSON (JavaScript Object Notation) format.  The metadata within
        is used to describe the catalog file and its contents using the
        following attributes:

    - catalog.<name>

        Catalog files will contain a python dict structure serialized
        in JSON (JavaScript Object Notation) format.  Version entries
        for each package stem are kept in ascending version order to
        allow fast lookups by the client and avoid sort overhead on
        load.  The structure can be described as follows:


    Initially, the server will offer the following catalogs.  Each has
    its content based on a tradeoff between memory usage, loading times,
    and bandwidth needs which depend on the client being used to perform
    packaging operations or the operation being performed.

    - catalog.base
        This catalog file only contains the FMRIs of the packages that
        the repository contains.  Loading just this catalog is useful
        when performing basic listing operations using the CLI, or when
        simply checking to see if a given package FMRI is valid.

    - catalog.dependency
        This catalog file contains the FMRIs of the packages that the
        repository contains, any 'depend' actions, and any 'set' actions
        for facets or variants.  This information is intended to be used
        during dependency calculation by install, uninstall, etc.

    - catalog.summary
        This catalog file contains the FMRIs of the packages that the
        repository contains and any 'set' actions (excluding those for
        facets or variants).  This information is intended to be used
        primarily by GUI clients such as packagemanager(1), or the BUI
        (Browser UI) provided by pkg.depotd(1m) for quick, efficient
        access to package metadata for listing.

    To enable incremental catalog updates, an "updatelog" will also be
    provided as a single, merged file that can be used to incrementally
    update any of the catalogs.  It is composed of the following files:

    - updatelog.attrs.<logdate>
        This file will contain a python dict structure serialized in
        JSON (JavaScript Object Notation) format.

    - updatelog.<logdate>

        This file will contain a python dict structure serialized in
        JSON (JavaScript Object Notation) format.  <logdate> is a UTC
        date and time of the format YYYYMMDDHH.
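
    As a minimal sketch of the <logdate> naming described above, the
    following derives an updatelog file name from a timestamp.  The
    function name updatelog_name is hypothetical and not part of the
    proposal; only the UTC YYYYMMDDHH format comes from the text.

```python
from datetime import datetime, timedelta, timezone

def updatelog_name(when):
    """Return 'updatelog.<logdate>' for a timezone-aware datetime.

    <logdate> is the UTC date and hour in YYYYMMDDHH form.
    The function name is illustrative only.
    """
    utc = when.astimezone(timezone.utc)  # normalize to UTC first
    return "updatelog." + utc.strftime("%Y%m%d%H")

# A timestamp given in a -0500 local zone lands in a later UTC hour.
local = datetime(2009, 7, 15, 14, 57, 26, tzinfo=timezone(timedelta(hours=-5)))
print(updatelog_name(local))
```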

After thinking about this some more, and discussing this offline last
week, it seems like we should be able to unify the <object>.attrs.<name>
file with the <object>.<name> and simplify.  In particular, having to
download an attrs file and then the actual file, if something has
changed, leads to a bunch more network round-trips than we really need.

Having the attributes data available separately is really a necessity when it comes to validating the state of the catalog on the server against the state of the catalog on the client. In particular, being able to retrieve *just* the attributes data for the catalog lets me see whether it has been rebuilt without downloading the whole catalog again.

The same is true of updatelogs as well.
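
To illustrate the kind of check I mean, here is a rough sketch that compares a locally cached attrs file with the one fetched from the server. The "created" key is purely an assumption for the example (the actual attribute names are not settled here), as is the function name.

```python
import json

def catalog_rebuilt(local_attrs_text, server_attrs_text, key="created"):
    """Return True if the server's attrs differ from the local copy.

    Both arguments are JSON-serialized dicts. The 'created' key is a
    hypothetical attribute used only for illustration.
    """
    local = json.loads(local_attrs_text)
    server = json.loads(server_attrs_text)
    return local.get(key) != server.get(key)

local = json.dumps({"created": "20090715T195726Z"})
server = json.dumps({"created": "20090716T010000Z"})
print(catalog_rebuilt(local, server))
```

Only the small attrs payload is transferred for this check; the full catalog is fetched afterwards only if the comparison reports a difference.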

Your suggestion of a single unified file below would be the alternative, but to me it would be no different from having an attrs file that describes all of the catalogs and the updatelog, which is fine.

Ideally, we should be able to use a conditional HTTP GET
(If-Modified-Since) for most of these catalogs/updatelogs.  If they
contain the attributes within themselves, it saves us from having to
perform another download.  Having some kind of single unified
description of the types of catalogs that are supported, and the names
of the updatelogs, in a single file seems reasonable, but creating an
attrs file for each object seems unnecessary.
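
A conditional GET of this kind could look roughly like the sketch below, using only the Python standard library. The function names and the example URL are hypothetical; a server that honors If-Modified-Since answers 304 Not Modified when the catalog is unchanged, which urllib reports as an HTTPError.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

def build_conditional_request(url, last_mtime):
    """Build a GET request carrying an If-Modified-Since header.

    last_mtime is the local copy's modification time in seconds
    since the epoch.
    """
    req = urllib.request.Request(url)
    # HTTP dates are always expressed in GMT.
    req.add_header("If-Modified-Since", formatdate(last_mtime, usegmt=True))
    return req

def fetch_if_modified(url, last_mtime, opener=urllib.request.urlopen):
    """Return the new body, or None if the server answers 304."""
    req = build_conditional_request(url, last_mtime)
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the local copy is current
            return None
        raise

# Hypothetical URL, shown only to illustrate the header that is sent.
req = build_conditional_request("http://example.org/catalog.base", 0)
print(req.get_header("If-modified-since"))
```

With this approach an unchanged catalog costs one round-trip and no body transfer, which is the saving described above.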

One problem I've been trying to overcome with the current approach is user images. In particular, I was concerned that if a user image were moved, the file modification times for the catalog wouldn't be preserved, which would cause unexpected results. I suppose, though, that the same applies to the package files being managed inside the user image, so perhaps the right answer is to document that user images should only be "moved" using tools that preserve timestamps, etc.
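
As a small illustration of the timestamp concern, the sketch below copies a file with and without metadata preservation using Python's shutil. The helper names and the file name are made up for the example; shutil.copy2 keeps the modification time, while shutil.copy does not.

```python
import os
import shutil
import tempfile

def copy_file(src, dst, preserve_times=True):
    """Copy src to dst and return the copy's modification time."""
    if preserve_times:
        shutil.copy2(src, dst)  # copies data plus metadata such as mtime
    else:
        shutil.copy(src, dst)   # copies data and permission bits only
    return os.stat(dst).st_mtime

def demo():
    """Return (kept, lost): whether each copy still has the old mtime."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "catalog.base")
        with open(src, "w") as f:
            f.write("{}")
        old = 1_000_000_000  # an arbitrary time well in the past
        os.utime(src, (old, old))
        kept = copy_file(src, os.path.join(tmp, "kept"), True)
        lost = copy_file(src, os.path.join(tmp, "lost"), False)
        return int(kept) == old, int(lost) == old

print(demo())
```

A tool that behaves like the second copy would make every catalog in the moved image look freshly modified.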

I need to think on what you've suggested here as well as what Danek has mentioned. I'll send out an updated proposal after a few more comments have come in and I've had more time to reflect on this.

Cheers,
--
Shawn Walker
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
