Re: Speeding up dpkg, a proposal

2011-03-18 Thread Goswin von Brederlow
Russell Coker russ...@coker.com.au writes:

 I recently had a situation where I was doing a backup to a USB flash device 
 and I decided to install some Debian packages.  The sync() didn't complete 
 until the backup completed because the write-back buffers were never empty!

Which is odd because I've used sync() while copying large amounts of
data and the sync() completes while the copying is going on steadily.
It only waits for the currently dirty buffers to be written, not for any
buffers that get dirtied later. At least it used to.

MfG
Goswin


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87r5a4tvcx.fsf@frosties.localnet



Re: Speeding up dpkg, a proposal

2011-03-18 Thread Russell Coker
On Sat, 19 Mar 2011, Goswin von Brederlow goswin-...@web.de wrote:
  I recently had a situation where I was doing a backup to a USB flash
  device  and I decided to install some Debian packages.  The sync()
  didn't complete until the backup completed because the write-back
  buffers were never empty!
 
 Which is odd because I've used sync() while copying large amounts of
 data and the sync() completes while the copying is going on steadily.
 It only waits for the currently dirty buffers to be written, not for any
 buffers that get dirtied later. At least it used to.

Maybe.  Of course, if I happened to have 500M of dirty buffers on the flash 
device, that could take a long time to be written and give a result that's 
pretty close to my first impression.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/





Re: Speeding up dpkg, a proposal

2011-03-17 Thread Goswin von Brederlow
Marius Vollmer marius.voll...@nokia.com writes:

 ext Chow Loong Jin hyper...@ubuntu.com writes:

 Could we somehow avoid using sync()? sync() syncs all mounted filesystems, 
 which
 isn't exactly very friendly when you have a few slow-syncing filesystems like
 btrfs (or even NFS) mounted.

 Hmm, right.  We could keep a list of all files that need fsyncing, and
 then fsync them all just before writing the checkpoint.

 Half of that is already done (for the content of the packages), we would
 need to add it for the files in /var/lib/dpkg/, or we could just fsync
 the whole directory.

 But then again, I would argue that the sync() is actually necessary
 always, for correct semantics: You also want to sync everything that the
 postinst script has done before recording that a package is fully
 installed.

Except for chroots, the throw-away-after-use kind, this really doesn't
matter. If the system crashes at any point before the chroot is thrown
away, then it just gets thrown away after boot and the whole operation is
restarted from scratch.

MfG
Goswin





Re: Speeding up dpkg, a proposal

2011-03-17 Thread Goswin von Brederlow
Adrian von Bidder avbid...@fortytwo.ch writes:

 On Wednesday 02 March 2011 17.02:11 Marius Vollmer wrote:
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.

 You don't want to do this. While production systems usually are upgraded in 
 downtime windows (with less load), it is sometimes necessary to install some 
 package (tcpdump or whatever to diagnose problems...) while the system is 
 under high load. Especially when you're trying to find out why the machine 
 has a load of 20 and you can't afford to kill it...

 On a machine with lots of RAM (== disk cache...) and high I/O load, you 
 don't want to do a (global!) sync().  This can totally kill the machine for 
 20min or more and is a big no go.

 -- vbi

Then don't use the option. It should definitely be an option:

sync / fs sync / fsync / sync only metadata / single sync at end / no sync
at all

MfG
Goswin





Re: Speeding up dpkg, a proposal

2011-03-17 Thread Russell Coker
On Fri, 18 Mar 2011, Goswin von Brederlow goswin-...@web.de wrote:
  On a machine with lots of RAM (== disk cache...) and high I/O load, you 
  don't want to do a (global!) sync().  This can totally kill the machine
  for  20min or more and is a big no go.
  
  -- vbi
 
 Then don't use the option. It should definitely be an option:

It's a pity that there is no kernel support for syncing one filesystem (or
maybe a few filesystems).

I recently had a situation where I was doing a backup to a USB flash device 
and I decided to install some Debian packages.  The sync() didn't complete 
until the backup completed because the write-back buffers were never empty!

If dpkg had only synced the filesystems used for Debian files (i.e. ones on the 
hard drive) then the package install would have taken a fraction of the time 
and I could have used the programs in question while the backup was running.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/





Re: Speeding up dpkg, a proposal

2011-03-17 Thread Olaf van der Spek
On Fri, Mar 18, 2011 at 12:11 AM, Russell Coker russ...@coker.com.au wrote:
 Then don't use the option. It should definitely be an option:

 It's a pity that there is no kernel support for syncing one filesystem (or
 maybe a few filesystems).

That'd be only a partial workaround. Even with a single fs one big
sync can be bad.

 I recently had a situation where I was doing a backup to a USB flash device
 and I decided to install some Debian packages.  The sync() didn't complete
 until the backup completed because the write-back buffers were never empty!

Didn't dpkg stop using sync?

Olaf





Re: Speeding up dpkg, a proposal

2011-03-17 Thread Henrique de Moraes Holschuh
On Fri, 18 Mar 2011, Russell Coker wrote:
 On Fri, 18 Mar 2011, Goswin von Brederlow goswin-...@web.de wrote:
   On a machine with lots of RAM (== disk cache...) and high I/O load, you 
   don't want to do a (global!) sync().  This can totally kill the machine
   for  20min or more and is a big no go.
   
   -- vbi
  
  Then don't use the option. It should definitely be an option:
 
 It's a pity that there is no kernel support for syncing one filesystem (or 
 maybe a few filesystems).

It is being implemented right now, actually...  Maybe it will be in 2.6.39.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Marius Vollmer
ext Chow Loong Jin hyper...@ubuntu.com writes:

 Could we somehow avoid using sync()? sync() syncs all mounted filesystems, 
 which
 isn't exactly very friendly when you have a few slow-syncing filesystems like
 btrfs (or even NFS) mounted.

Hmm, right.  We could keep a list of all files that need fsyncing, and
then fsync them all just before writing the checkpoint.

Half of that is already done (for the content of the packages), we would
need to add it for the files in /var/lib/dpkg/, or we could just fsync
the whole directory.

But then again, I would argue that the sync() is actually necessary
always, for correct semantics: You also want to sync everything that the
postinst script has done before recording that a package is fully
installed.





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Raphael Hertzog
Hi Marius,

no need to CC Guillem privately, the dpkg maintainers are reachable at
debian-d...@lists.debian.org. :)

On Wed, 02 Mar 2011, Marius Vollmer wrote:
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.

The big sync() doesn't work. It means dpkg never finishes its work on
systems with lots of unrelated I/O.

We have used that in the past and we had to revert, see
commit 5ee4e4e0458088cde1625ddb5a3d736f31a335d3 and the associated bug
reports:
http://bugs.debian.org/588339
http://bugs.debian.org/595927
http://bugs.debian.org/600075

Later we completely removed the USE_SYNC_SYNC codepath because of this. Even
if you're not using sync() in the same way that we did, its mere usage
makes the whole solution a no-go.

Other people have already mentioned --force-unsafe-io as a way to recover
some of the lost performance. It does still sync the updates to the
internal database (but not the files installed by the packages).

We've seen reports of poor performance with btrfs (and that's what you use
for Meego IIRC) so you might want to investigate why btrfs is coping so
badly with a few fsync() just on the status files.

That's my 2cents on this discussion.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Follow my Debian News ▶ http://RaphaelHertzog.com (English)
  ▶ http://RaphaelHertzog.fr (Français)





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Marius Vollmer
ext Raphael Hertzog hert...@debian.org writes:

 On Wed, 02 Mar 2011, Marius Vollmer wrote:
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.

 The big sync() doesn't work. It means dpkg never finishes its work on
 systems with lots of unrelated I/O.

Ok, understood.  It's now clear to me that the big sync should be
replaced with deferred fsyncs.  (I would defer the fsync of the content
of all packages until modstatdb_checkpoint, not just until
tar_deferred_extract.)

With that change, do you think the approach is sound?

 We've seen reports of poor performance with btrfs (and that's what you use
 for Meego IIRC) so you might want to investigate why btrfs is coping so
 badly with a few fsync() just on the status files.

This is about Harmattan, which uses ext4.

To understand our troubles, you need to know that we have around 2500
packages with just a single file in each.  For those packages, dpkg spends
the largest part of its time writing the nine journal entries to
/var/lib/dpkg/updates.

We will reduce the number of our packages, so this issue might solve
itself that way, but I had good success in reducing the per-package
overhead of dpkg, and if it is correct and works for us, why not use the
'reckless' option as well?





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Olaf van der Spek
On Thu, Mar 3, 2011 at 8:33 AM, Marius Vollmer marius.voll...@nokia.com wrote:
 And in the big picture, all we need is some guarantee that renames are
 committed in order, and after the content of the file that is being
 renamed.  I have the impression that all reasonable filesystems give
 that guarantee now, no?

No, they took shortcuts in the implementation and commit the rename
before the source file.

 Hmm, right.  We could keep a list of all files that need fsyncing, and
 then fsync them all just before writing the checkpoint.

Instead of fsync you might want to use the asynchronous Linux-specific sync options.
-- 
Olaf





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Raphael Hertzog
On Thu, 03 Mar 2011, Marius Vollmer wrote:
 ext Raphael Hertzog hert...@debian.org writes:
 
  On Wed, 02 Mar 2011, Marius Vollmer wrote:
  - Instead, we move all packages that are to be unpacked into
half-installed / reinstreq before touching the first one, and put a
big sync() right before carefully writing /var/lib/dpkg/status.
 
  The big sync() doesn't work. It means dpkg never finishes its work on
  systems with lots of unrelated I/O.
 
 Ok, understood.  It's now clear to me that the big sync should be
 replaced with deferred fsyncs.  (I would defer the fsync of the content
 of all packages until modstatdb_checkpoint, not just until
 tar_deferred_extract.)

This is assuming you don't use --force-unsafe-io. Otherwise you don't sync
package contents at all.

 With that change, do you think the approach is sound?

It looks like it could work in principle. But it might have unexpected
complications in case of interruptions. You said it yourself:
"it leaves its database behind in a correct but quite outdated and not so
friendly state".

The reinstreq flag is usually present on a single package only, and we know
that this single package is (likely) broken. So we reinstall it and we can go 
ahead.

Now with your scheme, we have many packages in that state and we don't know
which ones are really broken. At least the one which was being processed at
the time of the interruption (as in power loss) is.

Are we sure there are no cases where this brokenness leads to failures in the
preinst of some of the other packages to be reinstalled? How is the package
manager supposed to order the reinstallations?

 To understand our troubles, you need to know that we have around 2500
 packages with just a single file in each.  For those packages, dpkg spends
 the largest part of its time in writing the nine journal entries to
 /var/lib/dpkg/updates.

nine? I haven't reviewed the code, but that's quite a lot indeed. Maybe there's
room for optimization here.

A quick review indeed reveals this sequence (for an upgrade):
- half_installed + reinstreq
- unpacked + reinstreq
- half_installed + reinstreq
- unpacked + reinstreq
- unpacked
- unpacked (again at start of configure, don't know why)
- half_configured
- installed
- the final installation in the status file

Indeed, your scenario is very particular. Usually you have many files, and thus
the fsync() of all the files is what takes the most time (compared to the 9
fsync() calls for the status information), and there --force-unsafe-io shows a
sizable improvement.

 We will reduce the number of our packages, so this issue might solve
 itself that way, but I had good success in reducing the per-package
 overhead of dpkg, and if it is correct and works for us, why not use the
 'reckless' option as well?

I don't think we're interested in adding more options that make it even more
difficult to understand what dpkg does. Either there's a better way of doing it
and we use it all the time, or we keep it as it is.

For instance, I wonder if we could not get rid of two modstatdb_note calls in
the above list:
- the first unpacked + reinstreq could be directly brought back to
half_installed + reinstreq with minimal consequences (the only difference
comes when one of the conflictor/to-be-deconfigured packages fails to be
deconfigured).
- the other one at the start of the configure process.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Follow my Debian News ▶ http://RaphaelHertzog.com (English)
  ▶ http://RaphaelHertzog.fr (Français)


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20110303133441.gc11...@rivendell.home.ouaza.com



Re: Speeding up dpkg, a proposal

2011-03-03 Thread Phillip Susi
I have another proposal.  It looks like right now dpkg extracts all of
the files in the archive, then for each one calls fsync() and then
rename().  Because this is done serially for each file in the archive,
it forces small, out-of-order writes that cause extra seeking and queue
plugging.  It would be much better to use AIO to queue up all of the
syncs at once, so that the elevator can coalesce and reorder them for
optimal writing.

Further, this processing is done on a per-archive basis.  It would be
even better if multiple archives could be extracted at once, and all of
the fsyncs from all of the archives batched up.





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Raphael Hertzog
Hi,

On Thu, 03 Mar 2011, Phillip Susi wrote:
 I have another proposal.  It looks like right now dpkg extracts all of
 the files in the archive, then for each one, calls fsync() then
 rename().  Because this is done serially for each file in the archive,
 it forces small, out of order writes that cause extra seeking and queue
 plugging.

That's wrong. The writeback is initiated before the fsync() so the
filesystem can order the writes how it wants.

And we use some Linux-specific ioctl to avoid that fragmentation.

 It would be much better to use aio to queue up all of the syncs at once,
 so that the elevator can coalesce and reorder them for optimal writing.

I'm not convinced it would help. You're welcome to try and provide a
patch if it works.

I'm not even convinced it's possible with the existing interfaces (but I
have no experience with AIO). aio_fsync() is only usable with aio_write()
and it's not possible to use lio_listio() to batch a bunch of aio_fsync().

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Follow my Debian News ▶ http://RaphaelHertzog.com (English)
  ▶ http://RaphaelHertzog.fr (Français)





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Guillem Jover
On Thu, 2011-03-03 at 18:49:44 +0100, Raphael Hertzog wrote:
 On Thu, 03 Mar 2011, Phillip Susi wrote:
  It would be much better to use aio to queue up all of the syncs at once,
  so that the elevator can coalesce and reorder them for optimal writing.
 
 I'm not convinced it would help. You're welcome to try and provide a
 patch if it works.
 
 I'm not even convinced it's possible with the existing interfaces (but I
 have no experience with AIO). aio_fsync() is only usable with aio_write()
 and it's not possible to use lio_listio() to batch a bunch of aio_fsync().

Actually, this was discarded early on, as Linux does not implement
aio_fsync() for any file system. Also, the interface is quite cumbersome,
as it requires keeping state for each aio operation and using SA_SIGINFO
(which is not yet available everywhere).

regards,
guillem





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Phillip Susi
On 3/3/2011 12:49 PM, Raphael Hertzog wrote:
 That's wrong. The writeback is initiated before the fsync() so the
 filesystem can order the write how it wants.

Don't you mean it MAY be initiated, if the cache decides there is enough
memory pressure? I don't know of any other call besides fsync and friends
to force the writeback, so before that is called, the data could (and
likely is, if you have plenty of memory) still be sitting in the cache
with the disk queue idle.

 And we use some Linux-specific ioctl to avoid that fragmentation.

Could you be more specific?





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Olaf van der Spek
On Thu, Mar 3, 2011 at 7:32 PM, Phillip Susi ps...@cfl.rr.com wrote:
 And we use some Linux-specific ioctl to avoid that fragmentation.

 Could you be more specific?

   sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WRITE);
   sync_file_range(fd.a, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE);

Olaf





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Phillip Susi
On 3/3/2011 1:30 PM, Guillem Jover wrote:
 Actually, this was discarded early on, as Linux does not implement
 aio_fsync() for any file system. Also the interface is quite cumbersome
 as it requires to keep state for each aio operation, and using SA_SIGINFO
 (which is not yet available everywhere).

I was wondering why I couldn't find the fs implementation.  How
annoying.  The POSIX and libaio wrapper interfaces are cumbersome, but
directly calling io_submit would be perfect for queuing up all of the
writes at once and then waiting for their completion.  If only the
kernel actually implemented it.  Sigh.





Re: Speeding up dpkg, a proposal

2011-03-03 Thread Phillip Susi
On 3/3/2011 1:32 PM, Phillip Susi wrote:
 Don't you mean it MAY be initiated if the cache decides there is enough
 memory pressure?  I don't know of any other call besides fsync and
 friends to force the writeback so before that is called, it could ( and
 likely is if you have plenty of memory ) still be sitting in the cache
 and the disk queue idle.

Nevermind, I figured out what you meant: the new
writeback_init/barrier() code that uses sync_file_range.  I hadn't seen
that before.  That should do quite well for an individual archive, so I
guess the next thing to do is to try to delay the calls to
tar_deferred_extract() until after multiple archives are unpacked.





Re: Speeding up dpkg, a proposal

2011-03-02 Thread Chow Loong Jin
Hi,

On Thursday 03,March,2011 12:02 AM, Marius Vollmer wrote:
 [...]
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.

Could we somehow avoid using sync()? sync() syncs all mounted filesystems, which
isn't exactly very friendly when you have a few slow-syncing filesystems like
btrfs (or even NFS) mounted. I recall my schroots that ran on tmpfs unpacking
exceptionally slowly due to this issue until I stuck libeatmydata (or a variant
of it) onto the schroots' dpkgs.

I actually recall there being some things mentioned about FS_IOC_SYNCFS (lkml
thread[1], dpkg-devel thread[2]) for faster per-filesystem syncing, but that
seems to have died of natural causes last year.

 [...]


[1] http://thread.gmane.org/gmane.linux.file-systems/44628
[2] http://lists.debian.org/debian-dpkg/2010/11/msg00069.html

-- 
Kind regards,
Loong Jin





Re: Speeding up dpkg, a proposal

2011-03-02 Thread Thilo Six
Chow Loong Jin wrote the following on 02.03.2011 18:51

 Hi,
 
 On Thursday 03,March,2011 12:02 AM, Marius Vollmer wrote:
 [...]
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.
 
 Could we somehow avoid using sync()? sync() syncs all mounted filesystems, 
 which
 isn't exactly very friendly when you have a few slow-syncing filesystems like
 btrfs (or even NFS) mounted. I recall my schroots that ran on tmpfs unpacking
 exceptionally slowly due to this issue until I stuck libeatmydata (or a 
 variant
 of it) onto the schroots' dpkgs.
 
 I actually recall there being some things mentioned about FS_IOC_SYNCFS (lkml
 thread[1], dpkg-devel thread[2]) for faster per-filesystem syncing, but that
 seems to have died of natural causes last year.
 
 [...]
 
 
 [1] http://thread.gmane.org/gmane.linux.file-systems/44628
 [2] http://lists.debian.org/debian-dpkg/2010/11/msg00069.html

Just for reference: AFAIK there was a discussion about this topic last year.
If you are interested in why dpkg does all those syncs, read:
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/770841

-- 
bye Thilo

4096R/0xC70B1A8F
721B 1BA0 095C 1ABA 3FC6  7C18 89A4 A2A0 C70B 1A8F






Re: Speeding up dpkg, a proposal

2011-03-02 Thread Roger Leigh
On Thu, Mar 03, 2011 at 01:51:35AM +0800, Chow Loong Jin wrote:
 Hi,
 
 On Thursday 03,March,2011 12:02 AM, Marius Vollmer wrote:
  [...]
  - Instead, we move all packages that are to be unpacked into
half-installed / reinstreq before touching the first one, and put a
big sync() right before carefully writing /var/lib/dpkg/status.
 
 Could we somehow avoid using sync()? sync() syncs all mounted filesystems,
 which isn't exactly very friendly when you have a few slow-syncing
 filesystems like btrfs (or even NFS) mounted. I recall my schroots that ran
 on tmpfs unpacking exceptionally slowly due to this issue until I stuck
 libeatmydata (or a variant of it) onto the schroots' dpkgs.

Btrfs is quite simply awful in chroots at present, and --force-unsafe-io
doesn't seem to help massively either.  It's dog slow; it's quicker to
untar a chroot onto ext3 than to bother with Btrfs.

This is a shame, because Btrfs snapshots are the fastest and most reliable
out there at the moment (LVM snapshots are fast, but not /that/ fast, and
LVM appears to have locking issues which need addressing to make it robust
enough to handle simultaneous creation and removal of many snapshots).

It would be great if there were a solution to this problem; is anyone
running Btrfs as a root filesystem who has any suggestions?


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.




Re: Speeding up dpkg, a proposal

2011-03-02 Thread Mike Hommey
On Wed, Mar 02, 2011 at 08:13:06PM +, Roger Leigh wrote:
 On Thu, Mar 03, 2011 at 01:51:35AM +0800, Chow Loong Jin wrote:
  Hi,
  
  On Thursday 03,March,2011 12:02 AM, Marius Vollmer wrote:
   [...]
   - Instead, we move all packages that are to be unpacked into
 half-installed / reinstreq before touching the first one, and put a
 big sync() right before carefully writing /var/lib/dpkg/status.
  
  Could we somehow avoid using sync()? sync() syncs all mounted filesystems,
  which isn't exactly very friendly when you have a few slow-syncing
  filesystems like btrfs (or even NFS) mounted. I recall my schroots that ran
  on tmpfs unpacking exceptionally slowly due to this issue until I stuck
  libeatmydata (or a variant of it) onto the schroots' dpkgs.
 
 Btrfs is quite simply awful in chroots at present, and it seems
 --force-unsafe-io doesn't really seem to help massively either.
 It's dog slow--it's quicker to untar a chroot onto ext3 than to
 bother with Btrfs.
 
 This is a shame, because Btrfs snapshots are the fastest and most reliable
 out there at the moment (LVM snapshots are fast, but not
 /that/ fast, and LVM appears to have locking issues which need
 addressing to make it robust enough to handle simultaneous
 creation and removal of many snapshots).
 
 It would be great if there was a solution to this problem; is anyone
 running Btrfs as a root filesystem who has any suggestions?

eatmydata works great for my chroots on btrfs.

Mike





Re: Speeding up dpkg, a proposal

2011-03-02 Thread Bastian Blank
On Wed, Mar 02, 2011 at 08:13:06PM +, Roger Leigh wrote:
 Btrfs is quite simply awful in chroots at present, and
 --force-unsafe-io doesn't seem to help massively either.
 It's dog slow--it's quicker to untar a chroot onto ext3 than to
 bother with Btrfs.

Because unsafe-io does not apply to the handling of the info directory.

Bastian

-- 
Those who hate and fight must stop themselves -- otherwise it is not stopped.
-- Spock, Day of the Dove, stardate unknown





Re: Speeding up dpkg, a proposal

2011-03-02 Thread Adrian von Bidder
On Wednesday 02 March 2011 17.02:11 Marius Vollmer wrote:
 - Instead, we move all packages that are to be unpacked into
   half-installed / reinstreq before touching the first one, and put a
   big sync() right before carefully writing /var/lib/dpkg/status.

You don't want to do this. While production systems usually are upgraded in 
downtime windows (with less load), it is sometimes necessary to install some 
package (tcpdump or whatever to diagnose problems...) while the system is 
under high load. Especially when you're trying to find out why the machine 
has a load of 20 and you can't afford to kill it...

On a machine with lots of RAM (== disk cache...) and high I/O load, you 
don't want to do a (global!) sync().  This can totally kill the machine for 
20min or more and is a big no go.

-- vbi

-- 
featured link: http://www.pool.ntp.org




Re: Speeding up dpkg, a proposal

2011-03-02 Thread Adrian von Bidder
Yodel again!
On Wednesday 02 March 2011 17.02:11 Marius Vollmer wrote:
 It shows a speed-up of between two and six times in our environment (ext4
 on a slowish flash drive).  I am not sure whether messing with the
 fundamentals of dpkg is worth a factor of two in performance.

To not be all negative: read the recent discussion about fsync() and other 
stuff in dpkg. (I'm not sure where the discussion happened exactly; it was 
about dpkg becoming extremely slow in some use cases on modern filesystems 
like btrfs, and took place a short time before the release.) Since then, 
there is an option for dpkg:

  unsafe-io: Do not perform safe I/O operations when unpacking.
  Currently this implies not performing file system syncs before
  file renames, which is known to cause substantial performance
  degradation on some file systems, unfortunately the ones that
  require the safe I/O in the first place due to their unreliable
  behaviour causing zero-length files on abrupt system crashes.

  Note: For ext4, the main offender, consider using instead the
  mount option nodelalloc, which will fix both the performance
  degradation and the data safety issues, the latter by making
  the file system not produce zero-length files on abrupt system
  crashes with any software not doing syncs before atomic renames.

  Warning: Using this option might improve performance at the
  cost of losing data, use with care.
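The "safe I/O" the excerpt refers to is the usual write-temp-file, fsync, 
then rename pattern; --force-unsafe-io skips the fsync step. A minimal 
sketch of that pattern (illustrative only, not dpkg's actual code; the 
filename suffix is made up):

```python
import os

def safe_write(path: str, data: bytes) -> None:
    """Write atomically: temp file, fsync, then rename over the target.

    The fsync() before rename() is what guarantees the new file has its
    content on disk before it replaces the old one; skipping it is what
    can leave zero-length files after a crash on some filesystems.
    """
    tmp = path + ".dpkg-new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)          # flush file data before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)      # atomic replace of the target

safe_write("status", b"Status: install ok installed\n")
```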

So you should compare against dpkg with unsafe-io. Very slightly pre-dating 
this: I often (on btrfs, and when I'm inside development chroots that don't 
matter much) end up wrapping aptitude/apt-get/dpkg with eatmydata and get 
the same benefits. But the tool really deserves its name: a hard reboot 
directly after a package installation will leave a mess behind... (I have 
so far avoided this; I recently had a zero-length initrd after a kernel 
upgrade that *might* have been related. OTOH I did a clean shutdown there, 
so it shouldn't have happened...)

Yet another point, for the future: I *think* that btrfs is building an 
interface so applications can directly access btrfs transactions. That 
would allow doing package upgrades inside a btrfs transaction, and since 
(again, I *think* so, I'm not sure) a transaction can even be ended without 
forcing an immediate sync (the result is more like a barrier than a sync), 
this would be a fine way to deal with the situation.  At the cost that it's 
btrfs specific.

cheers
-- vbi

-- 
BOFH excuse #233:

TCP/IP UDP alarm threshold is set too low.


signature.asc
Description: This is a digitally signed message part.


Re: Speeding up dpkg, a proposal

2011-03-02 Thread Chow Loong Jin
On Thursday 03,March,2011 02:45 PM, Marius Vollmer wrote:
 ext Chow Loong Jin hyper...@ubuntu.com writes:
 
 Could we somehow avoid using sync()? sync() syncs all mounted filesystems, 
 which
 isn't exactly very friendly when you have a few slow-syncing filesystems like
 btrfs (or even NFS) mounted.
 
 Hmm, right.  We could keep a list of all files that need fsyncing, and
 then fsync them all just before writing the checkpoint.

I remember there being some list of files to be fsynced in one of the
older dpkgs. It's probably that which led to the ext4 slowdown -- you'd get the
same effect of one sync() per file on systems with an ext4 root. If you had a
process doing heavy I/O in the background, each of those fsync()s would take a
considerable amount of time.

 Half of that is already done (for the content of the packages), we would
 need to add it for the files in /var/lib/dpkg/, or we could just fsync
 the whole directory.
 
 But then again, I would argue that the sync() is actually necessary
 always, for correct semantics: You also want to sync everything that the
 postinst script has done before recording that a package is fully
 installed.

Yes, you're right. I completely forgot about that. I don't think most postinst
scripts sync when done. I suppose the best we can do is batch things as much
as possible to reduce the number of sync()s needed.
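The batching idea discussed above (keep a list of files that need fsyncing, 
then flush them all just before the checkpoint, instead of one fsync per 
file) could be sketched like this. Names and structure are hypothetical, 
not dpkg's implementation:

```python
import os

class DeferredSync:
    """Collect written files during unpack; fsync them in one batch
    right before recording the status checkpoint."""

    def __init__(self):
        self._pending = []

    def write(self, path: str, data: bytes) -> None:
        with open(path, "wb") as f:
            f.write(data)
        self._pending.append(path)    # defer the fsync until checkpoint

    def checkpoint(self) -> None:
        # One batch of per-file fsyncs -- avoids both a global sync()
        # and the one-fsync-per-file-as-you-go pattern.
        for path in self._pending:
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self._pending.clear()

s = DeferredSync()
s.write("a.conf", b"one\n")
s.write("b.conf", b"two\n")
s.checkpoint()
```

This keeps the durability point in one place (the checkpoint) while letting 
the kernel schedule the writeback of all the files in between.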

-- 
Kind regards,
Loong Jin



signature.asc
Description: OpenPGP digital signature


Re: Speeding up dpkg, a proposal

2011-03-02 Thread Marius Vollmer
ext Chow Loong Jin hyper...@ubuntu.com writes:

 I remember seeing there being some list of files to be fsynced in one of the
 older dpkgs. It's probably that which led to the ext4 slowdown [...]

Hmm, performance is the ultimate reason for doing all this, but right
now, I am mostly interested in whether my changes are correct.  I know
that they improve performance, but I am not totally convinced that they
are actually correct in the way that they change the status of packages,
etc.

I am only proposing to add this as an option to dpkg, not to make it the
default.

We might enable it in Harmattan, if I have the balls and it does in fact
speed things up enough, but nothing of that is certain right now.  We
might get the improvement we need just from reducing our number of
packages to something reasonable.

 But then again, I would argue that the sync() is actually necessary
 always, for correct semantics: You also want to sync everything that the
 postinst script has done before recording that a package is fully
 installed.

 Yes, you're right. I completely forgot about that. I don't think most postinst
 scripts sync when done. I suppose the best that can be done is to batch the
 stuff as best as can be done to reduce the number of sync()s needed.

On the other hand, it _is_ the job of the maintainer scripts to sync if
that is necessary for correctness, and maybe we don't want to take that
responsibility away from them.

And in the big picture, all we need is some guarantee that renames are
committed in order, and after the content of the file that is being
renamed.  I have the impression that all reasonable filesystems give
that guarantee now, no?
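For what it's worth, POSIX itself doesn't promise that ordering: to make a 
rename fully durable you fsync the file before the rename and the parent 
directory after it, and filesystems differ on how much of that they do 
implicitly. A sketch of the fully explicit version (assuming POSIX 
semantics; names are illustrative):

```python
import os

def durable_rename(tmp: str, target: str) -> None:
    """Rename, then fsync the containing directory so the new directory
    entry itself survives a crash.  The data-before-rename ordering
    still relies on the fsync done on the file beforehand."""
    os.rename(tmp, target)
    dirfd = os.open(os.path.dirname(target) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)       # commit the directory entry
    finally:
        os.close(dirfd)

# Write the new content, flush it to disk, then swap it into place.
with open("status.new", "wb") as f:
    f.write(b"ok\n")
    f.flush()
    os.fsync(f.fileno())
durable_rename("status.new", "status")
```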


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87r5ao8xqi@big.research.nokia.com