Hello,

I would like to propose changes that add new functionality to IPS.
Basically, the changes will add a new file (or files) on the server
side containing specific metadata about FMRIs. Other distributions
already have this kind of metadata [1].

The file for Ubuntu gutsy is around 1.3M gzipped and 6.3M uncompressed,
but it contains much more information than we would put into the
cataloginfo file, and it covers many more packages (5425).
I've made sample cataloginfo data and it's around 30K gzipped.

Benefits:
     - Package Manager/Update Manager startup time with fully loaded
       descriptions will be reduced. Our goal is to have a fully
       functional GUI application with loaded data in less than 10
       seconds. This needs to be done in a scalable way, so the user
       will not lose performance even with lots of authorities.
     - The number of client-server operations will be reduced.
     - Ability to search for package descriptions/names using
       web-based search. The fix for 3014 could be extended to allow
       users to search the descriptions of the packages.
     - The "pkg list -s" performance would be much improved.

When the cataloginfo file is created/re-created:
     The idea is that the server will create/re-create the cataloginfo
     file from the metadata:
         - while packages are being sent to the server
         - pkg.depotd:
             - add "--rebuild-cataloginfo"

Synchronization of the cataloginfo files:
     pkg(1):
         - the --refresh operation should allow the user to specify
           that the cataloginfo file should not be fetched; by default
           the cataloginfo file is always fetched.

     api:
         - extend current refresh operation:

         def refresh(self, full_refresh, auths=None, locales=["C",]):
                 """Refreshes the catalogs. full_refresh controls
                 whether to do a full retrieval of the catalog and
                 cataloginfo from the authority or only update the
                 existing catalog and cataloginfo files. auths is a
                 list of authorities to refresh. Passing an empty list
                 or using the default value means all known authorities
                 will be refreshed. locales is a list of the
                 locale-specific cataloginfo files which will be
                 downloaded during the refresh operation. Passing an
                 empty list will skip getting locale-specific
                 cataloginfo files. The default is the "C" cataloginfo
                 file, which is supposed to be downloaded. While it
                 currently returns an image object, this is an
                 expedient for allowing existing code to work while the
                 rest of the API is put into place."""

     Package Manager:
         - any operation which involves a call to the api refresh.

The cataloginfo:
     There should be one default-language cataloginfo file per
     repository and corresponding i18n files. These files can be
     stored in a compressed binary format, such as one generated using
     cPickle.
     Example:
         catalog/
                attrs
                catalog
                cataloginfo
                i18n/
                    cataloginfo_de
                    cataloginfo_pl
                    cataloginfo_en_GB
         cfg_cache
         file/
         index/
         pkg/
         trans/
         updatelog/
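
To make the layout above a bit more concrete, here is a minimal sketch
of how a locale could be mapped to its cataloginfo file, and how such a
file could be written and read back as a gzip-compressed pickle. The
function names (cataloginfo_path, save_cataloginfo, load_cataloginfo)
and the repo_root argument are only illustrative and are not part of
the existing pkg code. The sketch uses Python 3's pickle module in
place of cPickle, and it assumes that the "C" locale maps to
catalog/cataloginfo.

```python
import gzip
import os
import pickle
import tempfile


def cataloginfo_path(repo_root, locale="C"):
    """Return the path of the cataloginfo file for a locale.

    Assumption: "C" maps to catalog/cataloginfo, and any other locale
    maps to catalog/i18n/cataloginfo_<locale>, as in the layout above.
    """
    if locale == "C":
        return os.path.join(repo_root, "catalog", "cataloginfo")
    return os.path.join(repo_root, "catalog", "i18n",
                        "cataloginfo_" + locale)


def save_cataloginfo(repo_root, data, locale="C"):
    """Pickle `data` and write it gzip-compressed for `locale`."""
    path = cataloginfo_path(repo_root, locale)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with gzip.open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_cataloginfo(repo_root, locale="C"):
    """Read back the data stored by save_cataloginfo()."""
    with gzip.open(cataloginfo_path(repo_root, locale), "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Round trip in a throwaway directory standing in for a repository.
    with tempfile.TemporaryDirectory() as root:
        save_cataloginfo(root, {"PKG_NAME": {}}, locale="de")
        print(load_cataloginfo(root, locale="de"))
```

One thing to keep in mind with this format: unpickling data can run
arbitrary code, so a client would have to trust the depot it downloads
the file from, or a plainer serialization would have to be used.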

     The catalog/attrs file should contain information about the last
     modification of cataloginfo, similar to the "S Last-Modified:
     [timespec]" attribute for the catalog file.

     cataloginfo may contain the following information about a package:
         - FMRI
         - display name
         - display description
         - categories
         - ??????????????????????????????????????????

         Because we can have multiple FMRIs for each package, the
         cataloginfo file should store only those values which have
         changed.
         Example:
         The server has 4 versions of the same package PKG_NAME:
             A_FMRI4
             A_FMRI3
             A_FMRI2
             A_FMRI1

         The display description was set in A_FMRI1 and then updated
         in A_FMRI3, and the categories were updated in A_FMRI4, so we
         should have:
             PKG_NAME
                 A_FMRI4
                      categories: Applications/CoolGnomeApps
                 A_FMRI3
                      display description: here is version 3.45 of some
                                           fancy app
                 A_FMRI1
                      categories: Applications/CoolApps
                      display name: fancy package
                      display description: here is some fancy application

         This makes it possible to get the attributes of a specific
         version, as well as the newest ones for packages that are not
         installed, while at the same time reducing the size of the
         file.
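
The "store only what changed" idea can be sketched roughly as follows.
This is only an illustration of the scheme, not existing pkg code:
build_deltas and resolve are made-up names, the attributes are kept in
plain dicts, and the sketch assumes that versions can be listed from
oldest to newest and that attributes are only ever added or changed,
never removed.

```python
def build_deltas(versions):
    """Reduce full per-version attributes to only the changed values.

    versions: list of (fmri, attrs) pairs, oldest first, where attrs
    holds the complete attributes of that version. Versions that
    change nothing get no entry at all.
    """
    deltas = []
    current = {}
    for fmri, attrs in versions:
        changed = {k: v for k, v in attrs.items() if current.get(k) != v}
        if changed:
            deltas.append((fmri, changed))
            current.update(changed)
    return deltas


def resolve(deltas, order, fmri=None):
    """Rebuild the full attributes of one version from the deltas.

    order: every FMRI of the package, oldest first (including those
    with no delta entry). With fmri=None the newest version is used.
    """
    if fmri is None:
        fmri = order[-1]
    position = {f: i for i, f in enumerate(order)}
    wanted = position[fmri]
    attrs = {}
    # deltas are stored oldest first, so later changes win.
    for delta_fmri, changed in deltas:
        if position[delta_fmri] <= wanted:
            attrs.update(changed)
    return attrs


# The PKG_NAME example from above, written out with full attributes.
OLD_DESC = "here is some fancy application"
NEW_DESC = "here is version 3.45 of some fancy app"
versions = [
    ("A_FMRI1", {"categories": "Applications/CoolApps",
                 "display name": "fancy package",
                 "display description": OLD_DESC}),
    ("A_FMRI2", {"categories": "Applications/CoolApps",
                 "display name": "fancy package",
                 "display description": OLD_DESC}),
    ("A_FMRI3", {"categories": "Applications/CoolApps",
                 "display name": "fancy package",
                 "display description": NEW_DESC}),
    ("A_FMRI4", {"categories": "Applications/CoolGnomeApps",
                 "display name": "fancy package",
                 "display description": NEW_DESC}),
]
order = [fmri for fmri, _ in versions]
deltas = build_deltas(versions)
```

Here deltas only has entries for A_FMRI1, A_FMRI3 and A_FMRI4, as in
the example, and resolve(deltas, order, "A_FMRI2") gives back the
A_FMRI1 values, since A_FMRI2 did not change anything.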

     The corresponding i18n/cataloginfo_* file may contain:
         - FMRI
         - translated display name
         - translated short description
         - translated categories
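
On the client side, a translated entry could simply be laid over the
default one. The sketch below assumes (this is not decided anywhere
above) that the i18n file uses the same attribute names as the default
cataloginfo file and that missing or empty translations fall back to
the default values. The name localize is only illustrative.

```python
def localize(default_attrs, translated_attrs):
    """Overlay translated values on the default ("C") attributes.

    Assumption: both dicts use the same attribute names, and an
    attribute with no (or an empty) translation keeps its default
    value.
    """
    merged = dict(default_attrs)
    merged.update({k: v for k, v in translated_attrs.items() if v})
    return merged


default_entry = {
    "display name": "fancy package",
    "display description": "here is some fancy application",
    "categories": "Applications/CoolApps",
}
# Hypothetical content of a cataloginfo_de entry for the same FMRI,
# with only the display name translated.
german_entry = {"display name": "schickes Paket"}

localized = localize(default_entry, german_entry)
```

With this data, localized carries the translated display name, while
the description and categories stay as in the default file.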

[1]
  Gentoo
      Gentoo stores the metadata for each package in a separate file,
      which makes refreshing the catalog not very efficient:

http://sources.gentoo.org/viewcvs.py/gentoo-x86/app-office/openoffice/metadata.xml?view=markup

  Ubuntu
      A similar solution to the proposed one, but Ubuntu stores all
      information about the packages in a flat file (similar to a
      manifest):
      http://archive.ubuntu.com/ubuntu/dists/gutsy/main/
      http://archive.ubuntu.com/ubuntu/dists/gutsy/main/binary-i386/
      http://archive.ubuntu.com/ubuntu/dists/gutsy/main/i18n/



-- 
best
Michal Pryc
http://blogs.sun.com/migi
