Re: archive rebuilds wrt Lucas' victory

2013-04-18 Thread Goswin von Brederlow
On Tue, Apr 16, 2013 at 10:22:20PM +0200, Adam Borowski wrote:
 On Tue, Apr 16, 2013 at 11:29:43AM +0200, Goswin von Brederlow wrote:
  On Mon, Apr 15, 2013 at 12:30:43AM +0200, Adam Borowski wrote:
   Too bad, I see what seems to be most of the time being spent in dpkg
   installing dependencies -- how could this be avoided?  One idea would
   be to reformat as btrfs (+eatmydata) and find some kind of a tree of
   packages with similar build-depends, snapshotting nodes of the tree to
   quickly reset to a wanted state.
  
  I think snapshotting is a good idea there. Or rather forking the
  filesystem. Say you have 2 packages:
  
  Package: A
  Build-Depends: X, Y
  
  Package: B
  Build-Depends: X, Z
  
  You would start with the bare build chroot and install X. Then you
  create snapshots SA and SB from that. In SA you install Y and in SB
  you install Z. Now both packages can be built.
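
Just to make that concrete, something along these lines is what I have
in mind (a rough sketch only: the paths, the lower-case package names
x/y/z and the assumption that the bare chroot lives in a btrfs
subvolume are all made up for illustration):

#!/usr/bin/python
# Sketch of the "fork the filesystem" idea for the A/B example above.
import subprocess

BASE = "/srv/chroots/base"   # hypothetical bare build chroot (btrfs subvolume)

def snapshot(src, dst):
    # "btrfs subvolume snapshot SRC DST" creates a writable snapshot.
    subprocess.check_call(["btrfs", "subvolume", "snapshot", src, dst])

def install(chroot, packages):
    # Install build-dependencies inside the given chroot.
    subprocess.check_call(["chroot", chroot, "apt-get", "install", "-y"]
                          + list(packages))

install(BASE, ["x"])                 # shared part of A's and B's Build-Depends
snapshot(BASE, "/srv/chroots/SA")    # fork for package A
snapshot(BASE, "/srv/chroots/SB")    # fork for package B
install("/srv/chroots/SA", ["y"])    # A Build-Depends: X, Y
install("/srv/chroots/SB", ["z"])    # B Build-Depends: X, Z
# Both A and B can now be built, with X unpacked and configured only once.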
 
 So you would include intermediate states as nodes in the graph as well? 
 Interesting -- this could indeed optimize cases like that, at the cost of
 making the problem a good deal harder algorithmically.
 
  BUT:
  
  - Easy with 2 packages. But how do you do that with 3?
 
 You mean, an algorithmic challenge?  In our kind of crowd?  That's the fun
 stuff!

Anything exponential will not work, and for O(n^c) the c must be
rather small to still find a solution in time. Fun stuff indeed. After
all, if we spend 4 weeks searching for an optimal solution we might as
well just build everything like we do now and be quicker.
 
 If we reduce the problem by two simplifications:
 
 * can snapshot only before building a package (no intermediate states)

I don't think that is a good simplification. The chance that for two
packages like A and B there is a third package C that only
Build-Depends on X is rather small. And then you wouldn't get a common
node for A and B.

 * the cost of purging a package is same as installing it

That somewhat fixes what the first one broke, since now B can be a
child of A by purging Y and installing Z. Still wasteful, though.

It makes the graph a lot smaller though.

 a solution is to find a minimal spanning tree, possibly with a
 constraint on the tree's height.  And with the graph not being malicious,
 I have a hunch the tree would behave nicely, not requiring too many
 snapshots (random graphs tend to produce short trees).

Luckily, minimal spanning trees are well researched. :)

Note: the graph would be directed and should have weights according to
the (estimated) complexity of installing/purging a package.
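
With the two simplifications the graph is effectively undirected, so the
simple version really is only a few lines. A sketch, assuming networkx
is acceptable and with toy Build-Depends sets standing in for real data
(the full directed problem would want a minimum spanning arborescence
rather than a plain MST):

from itertools import combinations
import networkx as nx

# One node per source package's Build-Depends set, plus the bare chroot.
builddeps = {
    "root": frozenset(),
    "A": frozenset({"X", "Y"}),
    "B": frozenset({"X", "Z"}),
    "C": frozenset({"W", "Z"}),
}

G = nx.Graph()
for (p, dp), (q, dq) in combinations(builddeps.items(), 2):
    # Cost of turning one chroot state into the other: packages to
    # install plus packages to purge, at the same price each (that is
    # exactly the second simplification above).
    G.add_edge(p, q, weight=len(dp ^ dq))

tree = nx.minimum_spanning_tree(G, weight="weight")
for p, q, data in tree.edges(data=True):
    print(p, "->", q, "cost", data["weight"])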
 
 The full problem may take a bit more thinking.
 
  - Y and Z may both depend on W. So initially we should have
installed X and W.
  - Package C may Build-Conflicts: X but depend on most of the stuff X
depends on. So taking the filesystem with X installed and purging X
will be faster than starting from scratch.
 
 I.e., edges that purge should have a lesser cost than edges that install.
 
  - Doing multiple apt/dpkg runs is more expensive than a combined one.
A single run will save startup time and triggers.
 
 Again, parameters to the edge cost function.
 
  - Could we install packages without running triggers and only trigger
them at the end of each chain? Or somewhere in the middle?
 
 Could be worth looking at.  Not sure how many triggers can work
 incrementally, and how many rebuild everything every time like man-db.
 Of course, this is moot if we snapshot only before package build.

Even if the trigger is incremental we don't lose anything if we delay
running the trigger. It will do more work in that later run, but not
more than the individual runs put together. On the other hand,
non-incremental triggers will add up to a lot more.
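
Something like this, I imagine (a sketch only; the chroot path and the
.deb list are placeholders, and whether every package copes with
deferred triggers would need testing):

import subprocess

CHROOT = "/srv/chroots/SA"              # hypothetical snapshot from above
debs = ["/tmp/debs/y_1.0_amd64.deb"]    # placeholder, as seen inside the chroot

# One combined dpkg run per node of the tree; --no-triggers records
# trigger activations without running them.
subprocess.check_call(["chroot", CHROOT, "dpkg", "--install",
                       "--no-triggers"] + debs)

# ... further nodes/branches of the chain ...

# Run all accumulated triggers once, at the end of the chain.
subprocess.check_call(["chroot", CHROOT, "dpkg",
                       "--triggers-only", "--pending"])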
 
  - There will be multiple ways to build the tree. We might install U
first and then V, or V first and then U. Also we might have to install
V in multiple branches when V cannot be installed in a common root,
unless we install V in a common root and then uninstall V again for a
subtree. This probably needs a heuristic for how long installing (or
uninstalling) a package takes. Package size will be a major factor
but postinst scripts can take a long time to run (update-texmf anyone?).
 
 What about something akin to: log(size) + size/X ?  For smallish packages,
 dpkg churn is the dominant factor, for big ones it's actual I/O and
 registering individual files.
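
Plugging in some made-up sizes, with X an arbitrary constant just to
show the shape of the curve:

from math import log

X = 1000000                # made-up tuning constant (bytes)

def edge_cost(size_bytes):
    # log() stands for the per-package dpkg churn, size/X for the I/O.
    return log(size_bytes) + size_bytes / X

for size in (20000, 2000000, 200000000):   # small, medium, huge package
    print(size, round(edge_cost(size), 1))
# -> roughly 9.9, 16.5 and 219.1: small packages all cost about the
#    same, big ones are dominated by the size term.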

That would be something to tune experimentally I guess. Possible
factors I can see would be:

- fixed cost for apt/dpkg startup time (or cost relative to number of
  installed packages/files, i.e. database size)
- number of dpkg runs needed for a set of packages (multiplier for the first)
- package size in bytes and number of files
- cost for triggers
- cost for preinst/postinst/prerm/postrm scripts

The last two would be harder to get. The rest all comes from the
Packages files.
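
Getting the size-related inputs really is cheap; a sketch with nothing
but the standard library (the file name is a placeholder for wherever
the Packages file happens to live):

def parse_packages(path):
    # Yield one dict per stanza of a Debian Packages file.
    pkg = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                   # blank line ends a stanza
                if pkg:
                    yield pkg
                pkg = {}
            elif line[0] not in " \t" and ":" in line:
                key, _, value = line.partition(":")
                pkg[key] = value.strip()
    if pkg:
        yield pkg

costs = {}
for pkg in parse_packages("Packages"):     # placeholder path
    size = int(pkg.get("Size", 0))              # .deb size in bytes
    installed = int(pkg.get("Installed-Size", 0))   # in KiB
    costs[pkg["Package"]] = (size, installed)   # inputs for the cost function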
 
  - Build chroots, even as snapshots, take space. You can only have so
many of them in 

Re: FINAL release update

2013-04-18 Thread Gunnar Wolf
Neil McGovern said [Thu, Apr 18, 2013 at 03:22:21PM +0100]:
 Hi all,
 
 Once again, and hopefully for the final time for this cycle, we are
 writing to you with a release update.
 (...)
 Awesomeness of Wheezy
 =====================
 
 We really need some more work on http://wiki.debian.org/NewInWheezy,
 please help contribute! Let's tell everyone why Wheezy will be the best
 release ever.

What? Does this mean this is our high point? That from now on we will
just go downhill?

/me is severely disappointed.

