Re: [pkg-discuss] Observations on IPS

Moinak Ghosh Tue, 16 Sep 2008 08:00:54 -0700

On Mon, Sep 15, 2008 at 10:21 PM, Stephen Hahn <[EMAIL PROTECTED]> wrote:
> * Moinak Ghosh <[EMAIL PROTECTED]> [2008-09-14 19:31]:
>
>  Responses to the points that haven't already been responded to in
>  other threads/fora, or aren't already tracked in the bug database.
>  I will make the overall comment that many of these points are
>  insufficiently well-specified to act upon.
>
>>    2. One fundamental design approach in IPS is to use an intelligent
>> package and metadata server. This makes IPS unsuitable for community
>> distro mirrors. Community distros need to use public mirror services like
>> say Ibiblio and it will be very rare, if at all for mirrors to run a custom
>> server on their machines just to mirror a particular distro's packages.
>
>  Actually, it's always been the plan to have the retrieval side be
>  simple.  We had to take a detour when we determined that Python's base
>  HTTP implementation didn't support HTTP/1.1, and thus couldn't
>  pipeline.  (Trivia:  the logo actually hints at this relationship--the
>  retrieval server is much smaller and simpler than the publication
>  server...)
>


   See Peter's comments. We may even decide to distribute packages
   and metadata over simple anonftp. In addition since metadata is in general
   read-only (as you mention below) with occasional writes during publishing
   I do not see much of a need in having a server side component.

   The client does not seem to be that lightweight. It has to do a bunch of
   processing: checking package revisions, processing metadata and generating
   package plans, which are non-trivial computations. I'd tend to think that
   the processing load is being balanced 50-50 between server and client. It
   should be possible for the client to do the server-side component's work
   as well, without loss of functionality.

>>    3. Another fundamental restriction is that an IPS repo cannot be
>> rsync-ed. IPS maintains an index in a huge sparse file rendering rsync
>> impossible. In addition a running server is continuously accessing/
>> updating metadata making it unsafe for rsync. Rsync is a tried and
>> proven and highly optimized algorithm for mirroring used virtually by
>> every mirroring service on the planet and distros need to support it.
>
>  Dan pointed out that the index implementation changed some time ago.
>  I am uncertain why you believe that there is continuous change in the
>  metadata; such a belief is incorrect, and the discrete changes at
>  package publication time can be isolated from any rsync service.
>

   This is fine and removes one big problem, the sparse file. However,
   rsync is still not straightforward. When rsync-ing from server_a to
   server_b, the depotd on server_b will have to be stopped for the duration
   of the rsync. Alternatively one has to maintain a duplicate directory
   structure on server_b, rsync to that and then cpio it to the actual depot
   to reduce downtime. In any case this is some amount of round-about
   activity and does not fit into the straight zero-complexity distribution
   of content used all over the place today.
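
   The duplicate-directory workaround can be sketched roughly as below.
   This is only an illustration under stated assumptions: shutil.copytree
   stands in for the rsync transfer, a symlink switch stands in for the
   cpio step (a different technique, chosen because the rename is atomic on
   POSIX), and the names publish_snapshot and "current" are hypothetical.
   Whether depotd can serve a depot through a symlink is not something I
   have verified.

```python
import os
import shutil
import tempfile


def publish_snapshot(source_dir, depot_root, link_name="current"):
    """Copy source_dir into a fresh snapshot directory under depot_root,
    then atomically repoint depot_root/link_name at that snapshot.

    Hypothetical helper: copytree stands in for an rsync transfer.
    """
    os.makedirs(depot_root, exist_ok=True)
    # Stage the new copy beside the live one so readers never see a
    # half-transferred tree.
    snapshot = tempfile.mkdtemp(prefix="snapshot-", dir=depot_root)
    shutil.copytree(source_dir, snapshot, dirs_exist_ok=True)
    # Create the new symlink under a temporary name, then rename it over
    # the old link; rename replaces the old link in a single step.
    tmp_link = os.path.join(depot_root, link_name + ".tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(snapshot), tmp_link)
    os.replace(tmp_link, os.path.join(depot_root, link_name))
    return snapshot
```

   Old snapshot directories are left in place here; cleaning them up is a
   separate decision. Even in this form it is still extra machinery
   compared to a plain rsync of static files.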

>> Why is IPS re-inventing mirroring ?
>
>  I don't believe we are.
>

   What about Pkgrecv? Why do you need that if rsync will suffice?

>>    7. There is no boolean dependency mechanism in IPS though this
>> may possibly appear at some point.
>
>  Not certain what you mean by a boolean dependency mechanism.
>

   A dependency relation of the form:
   (PkgA, Version: 1.0)  (requires) (PkgB, (version >= 1.0 and version <= 1.5))

   Makes it possible to exactly specify software requirements and allows
   computation of the exact limit set of the packages absolutely needed to
   perform an install or upgrade.
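
   To make the relation above concrete, here is a minimal sketch of a
   range-bounded dependency check. The class and function names are
   hypothetical and are not part of IPS; versions are assumed to be plain
   dotted integers with the same number of components.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn '1.5' into (1, 5) so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class RangeDependency:
    """Hypothetical 'requires' relation with a lower and an upper bound."""
    name: str
    min_version: str
    max_version: str

    def is_satisfied_by(self, name, version):
        if name != self.name:
            return False
        candidate = parse_version(version)
        low = parse_version(self.min_version)
        high = parse_version(self.max_version)
        return low <= candidate <= high


def candidates(dep, available):
    """Return the available (name, version) pairs that satisfy dep."""
    return [(n, v) for n, v in available if dep.is_satisfied_by(n, v)]


# PkgA 1.0 requires PkgB with 1.0 <= version <= 1.5
dep = RangeDependency("PkgB", "1.0", "1.5")
available = [("PkgB", "0.9"), ("PkgB", "1.2"), ("PkgB", "1.10"),
             ("PkgB", "1.5"), ("PkgC", "1.2")]
print(candidates(dep, available))  # [('PkgB', '1.2'), ('PkgB', '1.5')]
```

   Note that 1.10 is rejected because it is compared as (1, 10), which is
   above the upper bound (1, 5).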

   In addition there should be a logical separation of package dependencies
   between base OS packages and layered software. For example, the transitive
   dependency closure for an application package, say Gaim, should not
   include core OS packages like the kernel or libc. This makes it possible
   to cleanly separate application package transforms from base OS package
   transforms.
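
   A rough sketch of such a layered closure follows. The dependency graph
   is made up for illustration (it is not Gaim's real dependency list) and
   layered_closure is a hypothetical name; the point is only that the
   traversal stops at anything marked as belonging to the base layer.

```python
def layered_closure(package, depends_on, base_layer):
    """Transitive dependencies of package, not descending into base_layer."""
    closure = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            # Base OS packages are neither included nor followed.
            if dep in base_layer or dep in closure:
                continue
            closure.add(dep)
            stack.append(dep)
    return closure


# Illustrative graph only; the names and edges are assumptions.
DEPENDS_ON = {
    "gaim": ["gtk2", "libxml2", "libc"],
    "gtk2": ["glib2", "libc"],
    "glib2": ["libc"],
    "libxml2": ["libc"],
    "libc": ["kernel"],
}
BASE_LAYER = {"kernel", "libc"}

print(sorted(layered_closure("gaim", DEPENDS_ON, BASE_LAYER)))
# ['glib2', 'gtk2', 'libxml2']
```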

   pkg image-update today does not give an easy way to upgrade my base OS
   without upgrading all the bundled applications.

>>    8. IPS metadata is extremely opaque making it impossible for anyone
>> to understand it and cost of corruption high both on installed system and
>> on the repository server. With other solutions repairing a corrupt repo can
>> be as simple as an rsync from a mirror. We believe that simple human-
>> readable metadata that adequately serves the purpose is enough and is
>> in fact vital.
>
>  I'm sure I'm too close to this, so you'll need to explain "extremely
>  opaque" and "impossible for anyone".  What specific improvements would
>  lead to simple human-readability?
>

   I will admit here that my original comment goes a little overboard. I have
   been compiling this list based on wide feedback and did not digest this
   one.

   However the approach of naming files in the repo as hashes instead of the
   actual filenames is confusing. One cannot figure out what is what without
   cross-checking with the manifest.
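
   The cross-check I mean looks roughly like the sketch below. It assumes a
   simplified manifest where each file entry is a line of the form
   "file <hash> key=value ..." with a path attribute; the real manifest
   format may differ in detail, the sample hashes are invented, and paths
   containing spaces are not handled.

```python
def index_manifest(manifest_text):
    """Map content hash -> installed path for every 'file' line."""
    index = {}
    for line in manifest_text.splitlines():
        fields = line.split()
        # Only 'file <hash> key=value ...' lines carry a hash-named payload.
        if len(fields) < 3 or fields[0] != "file":
            continue
        attrs = dict(f.split("=", 1) for f in fields[2:] if "=" in f)
        if "path" in attrs:
            index[fields[1]] = attrs["path"]
    return index


# Invented sample data in the assumed format.
SAMPLE = """\
dir mode=0755 path=usr/bin
file 3f2a9c path=usr/bin/example mode=0555
file 9b71e0 mode=0444 path=usr/share/doc/example/README
"""

for file_hash, path in index_manifest(SAMPLE).items():
    print(file_hash, "->", path)
```

   Having to build such an index just to answer "what is this file?" is
   exactly the extra step I am complaining about.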

>>   10. IPS performance seems to be on the low side. I have seen an
>> image-update in a machine in the US taking 3mins to compute the update
>> plan. It seems to me as a gut feeling that the abstractions used are not
>> utilizing Python's strengths. Far too much complexity.
>
>  We regularly run performance tests to see what operations are
>  expensive.  I believe the bulk of the cost you are seeing is actually
>  directory scanning in the image, but the next performance checks will
>  confirm that.
>

   It seems to be directory scanning from a little DTracing I did today but
   further digging is warranted.

>>   11. IPS operations are somewhat opaque from the observability point of
>> view. It is rather difficult for developers.
>
>  Vague; please expand.
>

   I will point you to an example:
   http://www.thewrittenword.com/www/projects/pkgutils/pkgadd/

   Excruciating verbosity, yes, but I will expect it if I am providing a ' -v '
   argument. It is clear as daylight what the utility is doing underneath. In
   contrast pkg image-update -v, for example, is excruciatingly silent. What if
   fetching  pkg.opensolaris.org/catalog/0  is slow due to a network problem
   ... the user won't have a clue.
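
   The kind of observability I have in mind can be sketched as below. This
   is not how pkg is implemented; step and fetch_catalog are hypothetical
   names, and the sleep merely stands in for a slow network fetch.

```python
import sys
import time
from contextlib import contextmanager


@contextmanager
def step(description, verbose, out=sys.stderr):
    """Announce a step and report how long it took when verbose is set."""
    start = time.monotonic()
    if verbose:
        print(f"{description} ...", file=out, flush=True)
    try:
        yield
    finally:
        # Reported even if the step raised, so a stall or failure is visible.
        if verbose:
            elapsed = time.monotonic() - start
            print(f"{description}: finished after {elapsed:.2f}s",
                  file=out, flush=True)


def fetch_catalog():
    """Stand-in for a network fetch; the delay is artificial."""
    time.sleep(0.1)


with step("Fetching catalog", verbose=True):
    fetch_catalog()
```

   With something like this the user at least sees which step is in
   progress while the network is slow.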

>>   13. The download cache in IPS uses hashes instead of filenames making
>> it impossible for a human to understand. Sometimes esp. in emergencies
>> human visibility into the guts of a system is critical.
>
>  At what points would the download cache contents be useful in an
>  emergency, in a way beyond that envisioned by the fix subcommand to
>  pkg(1)?
>

   The ability to see filenames makes it clear what is in the cache. One
   cannot predict what kind of crooked emergency situation might arise
   requiring hackery beyond one's dreams just to get something critical
   to work: Pkg itself screwed up, hand copying of files and so on. I cannot
   articulate examples right now but everyone, including myself, has faced
   such situations in the past. I remember one case where I desperately needed
   to apply a patch and a bug in patchadd caused me to hand-edit pkginfo
   files of 30 packages to get the patch to install.

   While in this thread, I will dare to make a few more comments which I
   was able to recollect yesterday:

   *) Adopting an existing FOSS packaging framework and working with that
   community would have gone a long way to boost SUN's perception among
   FOSS communities.

   *) How does IPS compare to something like Smart (http://labix.org/smart)?
   I'd guess IPS still has a way to go to match those features. By that time
   those solutions will have moved further forward.

   *) Why tie every package version to an ON build number? What sense does
   it make to refer to an ON build number for, say, Thunderbird?  It is
   understandable that one may require tagging as releases are synced to
   ON builds, but a separate taglist property would have been more useful.
   This would also allow flexibility in tagging packages for multiple different
   kinds of deliverables like, say, a Network Appliance focussed distro.

   *) The feature of tagging within a package and filtering is not yet being
   used and the potential to misuse this is already being exploited. Consider
   the monolithic 450M OpenOffice package. There is no way one can install,
   say, a single component or selected components like Writer. One has to
   install the whole hog.
   This increases opaqueness and reduces visibility into what is already there
   in the packages, unless one is prepared to list all files in the package.
   I'd say sub-package tagging makes sense only for multi-architecture support,
   not the way things like  *-devel, *-doc etc. are being collapsed into a
   single package.
   Imagine a small town college student in India sitting with a 128Kbps link
   trying to install OpenOffice, SunStudio and blah, blah, blah on his freshly
   installed OpenSolaris 2008.xx (on his home PC) that he got from a recent
   Bangalore OpenSolaris user group meet. Unfortunately he can't even ask
   someone in Bangalore with say a 2Mbps link to download the packages
   and provide those to him on a DVD!

  *) One final point from my observation: enterprises today have heterogeneous
  environments with Windows, Linux, Solaris and possibly other legacy
  OS-es like, say, AIX. Leaving aside Windows and legacy, there are significant
  frameworks set up for controlled delivery of software to hundreds and
  thousands of boxes, typically involving a package repository. The management
  of all these can become a hell of a lot easier if it is possible to use a
  uniform repository across platforms. So the repository needs to be modular
  and extensible to different native packaging systems. Unfortunately IPS
  tightly couples packaging and network repository, making this use-case
  impossible. If IPS had defined an independent stable on-disk format and had
  worked with an existing community repository project rather than re-doing
  from scratch, it would have made possible a common repository deployment for
  both Linux and Solaris, reduced administrative and maintenance cost and
  reduced one small barrier to entry for OpenSolaris.

Regards,
Moinak.

-- 
================================
http://www.belenix.org/
http://moinakg.wordpress.com/
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
