Re: NFS writes lock up system with -o tcp,-w32768

2011-03-31 Thread Walter Haidinger
On 30.03.2011 21:54, Claudio Jeker wrote:
 I guess there is a reason why the default is 8k.

Shouldn't mount_nfs(8) be updated then? It still says:
 -r readsize
 Set the read data size to the specified value. It should normally
 be a power of 2 greater than or equal to 1024.

No recommendation for writesize but I'd assume the values for readsize
would apply too.

Maybe something like: 
...power of 2 greater than or equal to the default of 8192.
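
The constraint quoted from the man page can be written as a small check. This is only a sketch of the wording above, not code from mount_nfs(8); the function name iosize_ok and the idea of passing the lower bound as a parameter are assumptions made for illustration. It says nothing about which sizes are actually safe to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if size is a power of two and at least min.
 * min would be 1024 per the current man page text, or 8192 per the
 * wording suggested above.
 */
bool
iosize_ok(uint32_t size, uint32_t min)
{
	assert(min > 0);
	/* size >= min > 0, so the power-of-two test below is well defined */
	return size >= min && (size & (size - 1)) == 0;
}
```

With the suggested wording, iosize_ok(32768, 8192) would still pass, so a documentation change alone would not flag the -w32768 case from the subject line.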
  
Walter



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:

 Hi,
 
 In fsck_ffs's pass1.c it just takes forever for large sized partitions
 and also if you have very high number of files stored on that
 partition (used inodes count goes high).
 
 fsck main limitation is in pass1.c.
 
 In pass1.c I found out that it in fact proceeded to check all inodes,
 but there's a misleading comment there, which says, Find all
 allocated blocks. So the original intent was to check only used
 inodes in that code block but somebody deleted that part of code when
 compared to FreeBSD. Is there any special reason not to build a used
 inode list, then only go through it as FreeBSD does? I know they added
 some stuff in the last year but that part of code has existed for a
 long time and we don't have it. Why not?
 
 I was reading cvs ver 1.46 of pass1.c in FreeBSD.
 
 Thanks

AFAIK, we never had that optimization.

It is interesting because it really speeds up fsck_ffs for filesystems
with few used inodes.

There's also a dangerous part: it assumes the cylinder group summary
info is ok when softdeps has been used. 

I suppose that's the reason why it was never included into OpenBSD.

I'll ponder if I want to work on this.

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Stuart Henderson
On 2011-03-31, Otto Moerbeek o...@drijf.net wrote:
 On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:
 In fsck_ffs's pass1.c it just takes forever for large sized partitions
 and also if you have very high number of files stored on that
 partition (used inodes count goes high).

If you really have a lot of used inodes, skipping the unused ones
isn't going to help :-)

You could always build your large-sized filesystems with a larger
value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
filesystem use patterns with larger partitions (for specialist uses
e.g. storing backups as huge single files it might be appropriate
to go even higher).

Of course this does involve dump/restore if you need to do this for
an existing filesystem.

 It is interesting because it really speeds up fsck_ffs for filesystems
 with few used inodes.

 There's also a dangerous part: it assumes the cylinder group summary
 info is ok when softdeps has been used. 

 I suppose that's the reason why it was never included into OpenBSD.

 I'll ponder if I want to work on this.

A safer alternative to this optimization might be for the installer
(or newfs) to consider the fs size when deciding on a default inode
density.



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Benny Lofgren
On 2011-03-31 11.13, Stuart Henderson wrote:
 On 2011-03-31, Otto Moerbeek o...@drijf.net wrote:
 On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:
 In fsck_ffs's pass1.c it just takes forever for large sized partitions
 and also if you have very high number of files stored on that
 partition (used inodes count goes high).
 If you really have a lot of used inodes, skipping the unused ones
 isn't going to help :-)
 You could always build your large-sized filesystems with a larger
 value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
 filesystem use patterns with larger partitions (for specialist uses
 e.g. storing backups as huge single files it might be appropriate
 to go even higher).
 Of course this does involve dump/restore if you need to do this for
 an existing filesystem.
 It is interesting because it really speeds up fsck_ffs for filesystems
 with few used inodes.
 There's also a dangerous part: it assumes the cylinder group summary
 info is ok when softdeps has been used. 
 I suppose that's the reason why it was never included into OpenBSD.
 I'll ponder if I want to work on this.
 
 A safer alternative to this optimization might be for the installer
 (or newfs) to consider the fs size when deciding on a default inode
 density.

I think this is a very good idea regardless. I often forget to manually
tune large file systems, and end up with some ridiculously skewed
resource allocations.

For example, this is what one of my file systems looks like right now:

skynet:~# df -ih /u0
Filesystem     Size    Used   Avail Capacity   iused      ifree  %iused  Mounted on
/dev/raid1a   12.6T    7.0T    5.5T    56%    881220  211866810     0%   /u0

This one takes about an hour to fsck.

In general, the default values and algorithms for allocations could
probably do with a tune-up, since of course today's disks are several
magnitudes larger than only a few years ago (let alone than those that
were around when the bulk of the file system code was written!), and the
usage patterns are also in my experience often wildly different in a
large file system than in a smaller one.

I guess an fs like the one above would benefit a lot from the optimization
the OP mentions.

Perhaps it could be optional, since Otto mentions that it makes
assumptions on correctness of the cylinder group summary info. I haven't
looked at the code in a while, so I can't really judge the consequences
of that, or if some middle ground can be reached where the CG info is
sanity checked without the need for a full scan through every inode.
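
One conceivable shape for such a middle ground is to compare the free-inode count claimed by the summary against the allocation bitmap of the same cylinder group. The sketch below is purely illustrative: struct cg_view and both function names are hypothetical and do not come from fsck_ffs, and a bitmap that is itself stale would defeat the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one cylinder group. */
struct cg_view {
	uint32_t	 ipg;		/* inodes per group */
	uint32_t	 nifree;	/* free inodes claimed by the summary */
	const uint8_t	*inosused;	/* one bit per inode, set = allocated */
};

/* Count the inodes the bitmap marks as allocated. */
uint32_t
count_used(const struct cg_view *cg)
{
	uint32_t i, used = 0;

	assert(cg != NULL && cg->inosused != NULL);
	for (i = 0; i < cg->ipg; i++)
		if (cg->inosused[i / 8] & (1u << (i % 8)))
			used++;
	return used;
}

/* The summary is plausible only if used + free adds up to ipg. */
bool
summary_consistent(const struct cg_view *cg)
{
	return count_used(cg) + cg->nifree == cg->ipg;
}
```

This costs one pass over the bitmaps rather than over every inode, but it only shows that the two pieces of metadata agree with each other, not that either is correct.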


Regards,
/Benny

-- 
internetlabbet.se / work:   +46 8 551 124 80  / Words must
Benny Löfgren/  mobile: +46 70 718 11 90 /   be weighed,
/   fax:+46 8 551 124 89/not counted.
   /email:  benny -at- internetlabbet.se



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 09:13:41AM +, Stuart Henderson wrote:

 On 2011-03-31, Otto Moerbeek o...@drijf.net wrote:
  On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:
  In fsck_ffs's pass1.c it just takes forever for large sized partitions
  and also if you have very high number of files stored on that
  partition (used inodes count goes high).
 
 If you really have a lot of used inodes, skipping the unused ones
 isn't going to help :-)
 
 You could always build your large-sized filesystems with a larger
 value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
 filesystem use patterns with larger partitions (for specialist uses
 e.g. storing backups as huge single files it might be appropriate
 to go even higher).

disklabel has code already to move to larger block and frag sizes for
large (new) partitions. newfs picks these settings up.


 
 Of course this does involve dump/restore if you need to do this for
 an existing filesystem.
 
  It is interesting because it really speeds up fsck_ffs for filesystems
  with few used inodes.
 
  There's also a dangerous part: it assumes the cylinder group summary
  info is ok when softdeps has been used. 
 
  I suppose that's the reason why it was never included into OpenBSD.
 
  I'll ponder if I want to work on this.
 
 A safer alternative to this optimization might be for the installer
 (or newfs) to consider the fs size when deciding on a default inode
 density.

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 12:30:29PM +0200, Benny Lofgren wrote:

 On 2011-03-31 11.13, Stuart Henderson wrote:
  On 2011-03-31, Otto Moerbeek o...@drijf.net wrote:
  On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:
  In fsck_ffs's pass1.c it just takes forever for large sized partitions
  and also if you have very high number of files stored on that
  partition (used inodes count goes high).
  If you really have a lot of used inodes, skipping the unused ones
  isn't going to help :-)
  You could always build your large-sized filesystems with a larger
  value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
  filesystem use patterns with larger partitions (for specialist uses
  e.g. storing backups as huge single files it might be appropriate
  to go even higher).
  Of course this does involve dump/restore if you need to do this for
  an existing filesystem.
  It is interesting because it really speeds up fsck_ffs for filesystems
  with few used inodes.
  There's also a dangerous part: it assumes the cylinder group summary
  info is ok when softdeps has been used. 
  I suppose that's the reason why it was never included into OpenBSD.
  I'll ponder if I want to work on this.
  
  A safer alternative to this optimization might be for the installer
  (or newfs) to consider the fs size when deciding on a default inode
  density.
 
 I think this is a very good idea regardless. I often forget to manually
 tune large file systems, and end up with some ridiculously skewed
 resource allocations.
 
 For example, this is what one of my file systems looks like right now:
 
 skynet:~# df -ih /u0
 Filesystem     Size    Used   Avail Capacity   iused      ifree  %iused  Mounted on
 /dev/raid1a   12.6T    7.0T    5.5T    56%    881220  211866810     0%   /u0
 
 This one takes about an hour to fsck.
 
 In general, the default values and algorithms for allocations could
 probably do with a tune-up, since of course today's disks are several
 magnitudes larger than only a few years ago (let alone than those that
 were around when the bulk of the file system code was written!), and the
 usage patterns are also in my experience often wildly different in a
 large file system than in a smaller one.

We do that already: inode density will be lower for newly created
partitions, because disklabel sets larger block and fragment sizes.

-Otto

 
 I guess an fs like the one above would benefit a lot from the optimization
 the OP mentions.
 
 Perhaps it could be optional, since Otto mentions that it makes
 assumptions on correctness of the cylinder group summary info. I haven't
 looked at the code in a while, so I can't really judge the consequences
 of that, or if some middle ground can be reached where the CG info is
 sanity checked without the need for a full scan through every inode.
 
 
 Regards,
 /Benny
 
 -- 
 internetlabbet.se / work:   +46 8 551 124 80  / Words must
 Benny Löfgren/  mobile: +46 70 718 11 90 /   be weighed,
 /   fax:+46 8 551 124 89/not counted.
/email:  benny -at- internetlabbet.se



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Stuart Henderson
On 2011/03/31 12:46, Otto Moerbeek wrote:
  
  In general, the default values and algorithms for allocations could
  probably do with a tune-up, since of course today's disks are several
  magnitudes larger than only a few years ago (let alone than those that
  were around when the bulk of the file system code was written!), and the
  usage patterns are also in my experience often wildly different in a
  large file system than in a smaller one.
 
 We do that already: inode density will be lower for newly created
 partitions, because disklabel sets larger block and fragment sizes.

Ah, the manual is out-of-date.

Index: newfs.8
===
RCS file: /cvs/src/sbin/newfs/newfs.8,v
retrieving revision 1.68
diff -u -p -r1.68 newfs.8
--- newfs.8 21 Mar 2010 07:51:23 -  1.68
+++ newfs.8 31 Mar 2011 11:10:18 -
@@ -169,7 +169,7 @@ The expected average file size for the f
 The expected average number of files per directory on the file system.
 .It Fl i Ar bytes
 This specifies the density of inodes in the file system.
-The default is to create an inode for each 8192 bytes of data space.
+The default is to create an inode for every 4 fragments.
 If fewer inodes are desired, a larger number should be used;
 to create more inodes a smaller number should be given.
 .It Fl m Ar free-space



acpivideo: do not trust _DOD for brightness

2011-03-31 Thread Martynas Venckus
Please test, details below.

We attach acpivout(4) to every device enumerated in _DOD.  However,
if you read the ACPI spec closely, it says that _DOD (unlike _DOS) is
not required if the system supports LCD brightness control.

In my case the situation is even worse: _DOD enumerates only 0x400
(which is non-existent), yet there's this DD03 device, which is the
LCD and can handle brightness control perfectly well.

I suggest we trust _DOD less and search for devices having _BCL,
_BCM, and _BQC instead, because only such a device would be able to
handle brightness control.

We use the same trick in other drivers (look for functions instead
of trusting enumeration crap in acpi).
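
The "look for functions" idea can be modelled outside the kernel as a simple predicate. This is a standalone sketch, not the driver code: struct acpi_dev, has_method and can_control_brightness are hypothetical names, and a plain string list stands in for the AML namespace that the real driver searches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ACPI device node and its method names. */
struct acpi_dev {
	const char		*name;
	const char *const	*methods;
	size_t			 nmethods;
};

bool
has_method(const struct acpi_dev *d, const char *m)
{
	size_t i;

	for (i = 0; i < d->nmethods; i++)
		if (strcmp(d->methods[i], m) == 0)
			return true;
	return false;
}

/* A device qualifies only if all three brightness methods exist. */
bool
can_control_brightness(const struct acpi_dev *d)
{
	assert(d != NULL);
	return has_method(d, "_BCL") && has_method(d, "_BCM") &&
	    has_method(d, "_BQC");
}
```

Under this model a device such as DD03 is picked up whether or not _DOD lists it, while an enumerated address with no matching methods is simply never attached.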

The following diff fixes my problem (Toshiba L300):

 acpivideo0 at acpi0: OVGA
+acpivout0 at acpivideo0: DD03

 acpivar.h   |8 -
 acpivideo.c |   81

 acpivout.c  |   30 +-
 3 files changed, 20 insertions(+), 99 deletions(-)

Index: acpivar.h
===
RCS file: /cvs/src/sys/dev/acpi/acpivar.h,v
retrieving revision 1.69
diff -u -r1.69 acpivar.h
--- acpivar.h   2 Jan 2011 04:56:57 -   1.69
+++ acpivar.h   31 Mar 2011 07:57:59 -
@@ -49,9 +49,6 @@

struct acpi_softc *sc_acpi;
struct aml_node *sc_devnode;
-
-   int *sc_dod;
-   size_t  sc_dod_len;
 };

 struct acpi_attach_args {
@@ -61,11 +58,6 @@
void*aaa_table;
struct aml_node *aaa_node;
const char  *aaa_dev;
-};
-
-struct acpivideo_attach_args {
-   struct acpi_attach_args aaa;
-   int dod;
 };

 struct acpi_mem_map {
Index: acpivideo.c
===
RCS file: /cvs/src/sys/dev/acpi/acpivideo.c,v
retrieving revision 1.7
diff -u -r1.7 acpivideo.c
--- acpivideo.c 27 Jul 2010 06:12:50 -  1.7
+++ acpivideo.c 31 Mar 2011 07:57:59 -
@@ -54,7 +54,6 @@
 int	acpivideo_notify(struct aml_node *, int, void *);

 void	acpivideo_set_policy(struct acpivideo_softc *, int);
-void	acpivideo_get_dod(struct acpivideo_softc *);
 int	acpi_foundvout(struct aml_node *, void *);
 int	acpivideo_print(void *, const char *);

@@ -101,8 +100,6 @@
 	acpivideo_set_policy(sc,
 	    DOS_SWITCH_BY_OSPM | DOS_BRIGHTNESS_BY_OSPM);

-	acpivideo_get_dod(sc);
-	aml_find_node(aaa->aaa_node, "_DCS", acpi_foundvout, sc);
 	aml_find_node(aaa->aaa_node, "_BCL", acpi_foundvout, sc);
 }

@@ -137,7 +134,7 @@
 	args.type = AML_OBJTYPE_INTEGER;

 	aml_evalname(sc->sc_acpi, sc->sc_devnode, "_DOS", 1, &args, &res);
-	DPRINTF(("%s: set policy to %d", DEVNAME(sc), aml_val2int(&res)));
+	DPRINTF(("%s: set policy to %X\n", DEVNAME(sc), aml_val2int(&res)));

 	aml_freevalue(&res);
 }
@@ -145,45 +142,23 @@
 int
 acpi_foundvout(struct aml_node *node, void *arg)
 {
-   struct aml_valueres;
-   int i, addr;
-   charfattach = 0;
-
struct acpivideo_softc *sc = (struct acpivideo_softc *)arg;
struct device *self = (struct device *)arg;
-   struct acpivideo_attach_args av;
+   struct acpi_attach_args aaa;
+   node = node-parent;

-   if (sc-sc_dod == NULL)
-   return (0);
-   DPRINTF((Inside acpi_foundvout()));
-   if (aml_evalname(sc-sc_acpi, node-parent, _ADR, 0, NULL, res)) {
-   DPRINTF((%s: no _ADR\n, DEVNAME(sc)));
+   DPRINTF((Inside acpi_foundvout()\n));
+   if (node-parent != sc-sc_devnode)
return (0);
-   }
-   addr = aml_val2int(res);
-   DPRINTF((_ADR: %X\n, addr));
-   aml_freevalue(res);

-   for (i = 0; i  sc-sc_dod_len; i++)
-   if (addr == (sc-sc_dod[i]0x)) {
-   DPRINTF((Matched: %X\n, sc-sc_dod[i]));
-   fattach = 1;
-   break;
-   }
-   if (fattach) {
-   memset(av, 0, sizeof(av));
-   av.aaa.aaa_iot = sc-sc_acpi-sc_iot;
-   av.aaa.aaa_memt = sc-sc_acpi-sc_memt;
-   av.aaa.aaa_node = node-parent;
-   av.aaa.aaa_name = acpivout;
-   av.dod = sc-sc_dod[i];
-   /*
-*  Make sure we don't attach twice if both _BCL and
-* _DCS methods are found by zeroing the DOD address.
-*/
-   sc-sc_dod[i] = 0;
+   if (aml_searchname(node, _BCM)  aml_searchname(node, _BQC)) {
+   memset(aaa, 0, sizeof(aaa));
+   aaa.aaa_iot = sc-sc_acpi-sc_iot;
+   aaa.aaa_memt = sc-sc_acpi-sc_memt;
+   aaa.aaa_node = node;
+   aaa.aaa_name = acpivout;

-   config_found(self, av, acpivideo_print);
+   config_found(self, aaa, acpivideo_print);
}

return (0);
@@ -202,38 +177,6 @@
}

return (UNCONF);
-}
-
-void
-acpivideo_get_dod(struct 

Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 12:30:29PM +0200, Benny Lofgren wrote:

 For example, this is what one of my file systems looks like right now:
 
 skynet:~# df -ih /u0
 Filesystem     Size    Used   Avail Capacity   iused      ifree  %iused  Mounted on
 /dev/raid1a   12.6T    7.0T    5.5T    56%    881220  211866810     0%   /u0
 
 This one takes about an hour to fsck.

The change discussed won't help you much here, since ffs2 filesystems
already initialize only the inode blocks actually in use.

Memory use will be reduced, however, which might be even more
worthwhile. 

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Marco Peereboom
On Thu, Mar 31, 2011 at 09:13:41AM +, Stuart Henderson wrote:
 On 2011-03-31, Otto Moerbeek o...@drijf.net wrote:
  On Wed, Mar 30, 2011 at 03:45:02PM -0500, Amit Kulkarni wrote:
  In fsck_ffs's pass1.c it just takes forever for large sized partitions
  and also if you have very high number of files stored on that
  partition (used inodes count goes high).
 
 If you really have a lot of used inodes, skipping the unused ones
 isn't going to help :-)
 
 You could always build your large-sized filesystems with a larger
 value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
 filesystem use patterns with larger partitions (for specialist uses
 e.g. storing backups as huge single files it might be appropriate
 to go even higher).

So this helps a lot to reduce fsck time. However, if you play a lot
with the tuning parameters, the only thing you tune in is less speed.
I played quite a bit with the parameters and the results were always
worse than the defaults.

 
 Of course this does involve dump/restore if you need to do this for
 an existing filesystem.
 
  It is interesting because it really speeds up fsck_ffs for filesystems
  with few used inodes.
 
  There's also a dangerous part: it assumes the cylinder group summary
  info is ok when softdeps has been used. 
 
  I suppose that's the reason why it was never included into OpenBSD.
 
  I'll ponder if I want to work on this.
 
 A safer alternative to this optimization might be for the installer
 (or newfs) to consider the fs size when deciding on a default inode
 density.



additional bpf mtap for carp

2011-03-31 Thread Mike Belopuhov
bpf is not called on multicast/broadcast packets arriving on the carp
interface.  this diff adds the missing tap, which allows us to set up
drop filters and allows tcpdump to show all the packets.

OK/not-OK?

Index: ip_carp.c
===
RCS file: /home/cvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.181
diff -u -p -u -p -r1.181 ip_carp.c
--- ip_carp.c   8 Mar 2011 22:53:28 -   1.181
+++ ip_carp.c   31 Mar 2011 13:02:43 -
@@ -1580,6 +1580,11 @@ carp_input(struct mbuf *m, u_int8_t *sho
 			if (m0 == NULL)
 				continue;
 			m0->m_pkthdr.rcvif = &vh->sc_if;
+#if NBPFILTER > 0
+			if (vh->sc_if.if_bpf)
+				bpf_mtap_hdr(vh->sc_if.if_bpf, (char *)&eh,
+				    ETHER_HDR_LEN, m0, BPF_DIRECTION_IN);
+#endif
 			ether_input(&vh->sc_if, &eh, m0);
 		}
 		return (1);



Re: additional bpf mtap for carp

2011-03-31 Thread Claudio Jeker
On Thu, Mar 31, 2011 at 03:38:37PM +0200, Mike Belopuhov wrote:
 bpf is not called on multicast/broadcast packets arriving on the carp
 interface.  this diff adds the missing tap, which allows us to set up
 drop filters and allows tcpdump to show all the packets.
 
 OK/not-OK?
 
 Index: ip_carp.c
 ===
 RCS file: /home/cvs/src/sys/netinet/ip_carp.c,v
 retrieving revision 1.181
 diff -u -p -u -p -r1.181 ip_carp.c
 --- ip_carp.c 8 Mar 2011 22:53:28 -   1.181
 +++ ip_carp.c 31 Mar 2011 13:02:43 -
 @@ -1580,6 +1580,11 @@ carp_input(struct mbuf *m, u_int8_t *sho
  			if (m0 == NULL)
  				continue;
  			m0->m_pkthdr.rcvif = &vh->sc_if;
 +#if NBPFILTER > 0
 +			if (vh->sc_if.if_bpf)
 +				bpf_mtap_hdr(vh->sc_if.if_bpf, (char *)&eh,
 +				    ETHER_HDR_LEN, m0, BPF_DIRECTION_IN);
 +#endif
The packet accounting is missing as well. So add this
vh->sc_if.if_ipackets++;
and then the diff is OK claudio@
  			ether_input(&vh->sc_if, &eh, m0);
   }
   return (1);
 

-- 
:wq Claudio



wol for xl(4)

2011-03-31 Thread Stefan Sperling
This is an attempt to add wol support to xl(4).

Unfortunately, while I have an xl(4) card to test with none of the
motherboards I have will do WOL with it since they all lack an
on-board WOL connector :(

So test reports are needed.
Please also check whether WOL is disabled by default.

Index: ic/xl.c
===
RCS file: /cvs/src/sys/dev/ic/xl.c,v
retrieving revision 1.99
diff -u -p -r1.99 xl.c
--- ic/xl.c 22 Sep 2010 08:49:14 -  1.99
+++ ic/xl.c 31 Mar 2011 15:48:36 -
@@ -191,6 +191,9 @@ void xl_testpacket(struct xl_softc *);
 int xl_miibus_readreg(struct device *, int, int);
 void xl_miibus_writereg(struct device *, int, int, int);
 void xl_miibus_statchg(struct device *);
+#ifndef SMALL_KERNEL
+int xl_wol(struct ifnet *, int);
+#endif
 
 int
 xl_activate(struct device *self, int act)
@@ -2368,6 +2371,12 @@ xl_stop(struct xl_softc *sc)
 	ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
 
 	xl_freetxrx(sc);
+
+#ifndef SMALL_KERNEL
+	/* Call upper layer WOL power routine if WOL is enabled. */
+	if ((sc->xl_flags & XL_FLAG_WOL) && sc->wol_power)
+		sc->wol_power(sc->wol_power_arg);
+#endif
 }
 
 void
@@ -2637,6 +2646,15 @@ xl_attach(struct xl_softc *sc)
CSR_WRITE_2(sc, XL_W0_MFG_ID, XL_NO_XCVR_PWR_MAGICBITS);
}
 
+#ifndef SMALL_KERNEL
+	/* Check availability of WOL. */
+	if ((sc->xl_caps & XL_CAPS_PWRMGMT) != 0) {
+		ifp->if_capabilities |= IFCAP_WOL;
+		ifp->if_wol = xl_wol;
+		xl_wol(ifp, 0);
+	}
+#endif
+
/*
 * Call MI attach routines.
 */
@@ -2668,6 +2686,24 @@ xl_detach(struct xl_softc *sc)
 
return (0);
 }
+
+#ifndef SMALL_KERNEL
+int
+xl_wol(struct ifnet *ifp, int enable)
+{
+	struct xl_softc *sc = ifp->if_softc;
+
+	XL_SEL_WIN(7);
+	if (enable) {
+		CSR_WRITE_2(sc, XL_W7_BM_PME, XL_BM_PME_MAGIC);
+		sc->xl_flags |= XL_FLAG_WOL;
+	} else {
+		CSR_WRITE_2(sc, XL_W7_BM_PME, 0);
+		sc->xl_flags &= ~XL_FLAG_WOL;
+	}
+	return (0);
+}
+#endif
 
 struct cfdriver xl_cd = {
0, xl, DV_IFNET
Index: ic/xlreg.h
===
RCS file: /cvs/src/sys/dev/ic/xlreg.h,v
retrieving revision 1.26
diff -u -p -r1.26 xlreg.h
--- ic/xlreg.h  21 Sep 2010 01:05:12 -  1.26
+++ ic/xlreg.h  31 Mar 2011 15:42:36 -
@@ -411,6 +411,12 @@
 #define XL_W7_BM_LEN		0x06
 #define XL_W7_BM_STATUS		0x0B
 #define XL_W7_BM_TIMEr		0x0A
+#define XL_W7_BM_PME		0x0C
+
+#define	XL_BM_PME_WAKE		0x0001
+#define	XL_BM_PME_MAGIC		0x0002
+#define	XL_BM_PME_LINKCHG	0x0004
+#define	XL_BM_PME_WAKETIMER	0x0008
 
 /*
  * bus master control registers
@@ -571,6 +577,7 @@ struct xl_mii_frame {
 #define XL_FLAG_NO_XCVR_PWR	0x0080
 #define XL_FLAG_USE_MMIO	0x0100
 #define XL_FLAG_NO_MMIO		0x0200
+#define XL_FLAG_WOL		0x0400
 
 #define XL_NO_XCVR_PWR_MAGICBITS   0x0900
 
@@ -604,6 +611,8 @@ struct xl_softc {
 	caddr_t			sc_listkva;
 	bus_dmamap_t		sc_rx_sparemap;
 	bus_dmamap_t		sc_tx_sparemap;
+	void			(*wol_power)(void *);
+	void			*wol_power_arg;
 };
 
 #define xl_rx_goodframes(x) \
@@ -740,6 +749,13 @@ struct xl_stats {
 #define XL_PSTATE_D3		0x0003
 #define XL_PME_EN		0x0010
 #define XL_PME_STATUS		0x8000
+
+/* Bits in the XL_PCI_PWRMGMTCAP register */
+#define XL_PME_CAP_D0		0x0800
+#define XL_PME_CAP_D1		0x1000
+#define XL_PME_CAP_D2		0x2000
+#define XL_PME_CAP_D3_HOT	0x4000
+#define XL_PME_CAP_D3_COLD	0x8000
 
 extern int xl_intr(void *);
 extern void xl_attach(struct xl_softc *);
Index: pci/if_xl_pci.c
===
RCS file: /cvs/src/sys/dev/pci/if_xl_pci.c,v
retrieving revision 1.34
diff -u -p -r1.34 if_xl_pci.c
--- pci/if_xl_pci.c 19 Sep 2010 09:22:58 -  1.34
+++ pci/if_xl_pci.c 31 Mar 2011 15:43:05 -
@@ -92,10 +92,14 @@ int xl_pci_match(struct device *, void *
 void xl_pci_attach(struct device *, struct device *, void *);
 int xl_pci_detach(struct device *, int);
 void xl_pci_intr_ack(struct xl_softc *);
+#ifndef SMALL_KERNEL
+void xl_pci_wol_power(void *);
+#endif
 
 struct xl_pci_softc {
 	struct xl_softc		psc_softc;
 	pci_chipset_tag_t	psc_pc;
+	pcitag_t		psc_tag;
 	bus_size_t		psc_iosize;
 	bus_size_t		psc_funsize;
 };
@@ -156,9 +160,11 @@ xl_pci_attach(struct device *parent, str
u_int32_t command;
 
 	psc->psc_pc = pc;
+	psc->psc_tag = pa->pa_tag;
 	sc->sc_dmat = pa->pa_dmat;
 
 	sc->xl_flags = 0;
+   

Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 02:50:36PM -0500, Amit Kulkarni wrote:

 
  If you really have a lot of used inodes, skipping the unused ones
  isn't going to help :-)
 
  You could always build your large-sized filesystems with a larger
  value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
  filesystem use patterns with larger partitions (for specialist uses
  e.g. storing backups as huge single files it might be appropriate
  to go even higher).
 
 
 Stuart,
 
 Thanks for the tip. I looked up my 80G filesystem and it was created
 without -i, so it gets one inode per 8KB (4 times the frag size, per
 your update to the newfs man page). This is a no-brainer optimization
 which can get huge wins in fsck immediately, without much change to
 the existing code.

I don't think we want to change the default density. Larger
partitions already get larger blocks and fragments, and as a
consequence a lower number of inodes.

 Otto,
 In my tests on AMD64, if FFS partition size increases beyond 30GB,
 fsck starts taking exponential time even if you have zero used inodes.
 This is a for i () for j() loop and if you reduce the for j() inner
 loop it is a win.

Yes, it becomes very slow, but I don't think it is exponential.

 
 dumpfs -m /downloads
 # newfs command for /dev/wd0o
 newfs -O 1 -b 16384 -e 4096 -f 2048 -g 16384 -h 64 -m 5 -o time -s
 172714816 /dev/wd0o
 
 So, if I read it correctly, setting just the block size higher to say
 64Kb does auto tune frag size to 1/8 which is 8Kb (newfs complains
 appropriately) but the auto tune inode length to 4 times frag which is
 32Kb is not implemented now? Is this the proposed formula?

There's no such thing as inode length. 

 
 If a user tunes -i inodes, or -f frags or -b block size, it should all
 auto-adjust to the same outcome based on above formula in the future?

I don't see any formula.

If you feel you have too many inodes, you can use a larger -i, -b and/or
-f. For newly created partitions, newfs will pick up larger -b and -f
from the disklabel entry. If you still want fewer inodes, increase -f,
-b or -i further.

 
 dumpfs doesn't show the total inodes or the inode length in a easily
 readable format (-m option). Just trying to understand what the
 acronyms mean.

You want total inodes = ng * ipg (number of cylinder groups * inodes per
group) in the dumpfs header. I have no idea what you mean by inode length.
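
As a worked form of that product, here is a small sketch. The function names are hypothetical, the two inputs are whatever the dumpfs header reports for the number of cylinder groups and inodes per group, and the percentage helper simply truncates toward zero (an assumption, not a statement about how df rounds).

```c
#include <assert.h>
#include <stdint.h>

/* Total inodes = number of cylinder groups * inodes per group. */
uint64_t
total_inodes(uint32_t ncg, uint32_t ipg)
{
	return (uint64_t)ncg * ipg;
}

/* Whole-number percentage of inodes in use, truncated toward zero. */
unsigned
percent_inodes_used(uint64_t iused, uint64_t total)
{
	assert(total > 0 && iused <= total);
	return (unsigned)(iused * 100 / total);
}
```

For the 12.6T filesystem shown earlier in the thread, iused + ifree is 881220 + 211866810 = 212748030 inodes, and percent_inodes_used(881220, 212748030) truncates to 0.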

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
So here's an initial, only lightly tested diff.

Beware, this very well could eat your filesystems.

To notice any difference, you should use the -p mode of fsck_ffs (rc
does that) and the fs should have been mounted with softdep.

I have seen very nice speedups already.

-Otto

Index: dir.c
===
RCS file: /cvs/src/sbin/fsck_ffs/dir.c,v
retrieving revision 1.24
diff -u -p -r1.24 dir.c
--- dir.c   27 Oct 2009 23:59:32 -  1.24
+++ dir.c   31 Mar 2011 08:30:36 -
@@ -443,8 +443,8 @@ linkup(ino_t orphan, ino_t parentdir)
 		idesc.id_type = ADDR;
 		idesc.id_func = pass4check;
 		idesc.id_number = oldlfdir;
-		adjust(&idesc, lncntp[oldlfdir] + 1);
-		lncntp[oldlfdir] = 0;
+		adjust(&idesc, ILNCOUNT(oldlfdir) + 1);
+		ILNCOUNT(oldlfdir) = 0;
 		dp = ginode(lfdir);
 	}
 	if (GET_ISTATE(lfdir) != DFOUND) {
@@ -457,7 +457,7 @@ linkup(ino_t orphan, ino_t parentdir)
 		printf("\n\n");
 		return (0);
 	}
-	lncntp[orphan]--;
+	ILNCOUNT(orphan)--;
 	if (lostdir) {
 		if ((changeino(orphan, "..", lfdir) & ALTERED) == 0 &&
 		    parentdir != (ino_t)-1)
@@ -465,7 +465,7 @@ linkup(ino_t orphan, ino_t parentdir)
 		dp = ginode(lfdir);
 		DIP_SET(dp, di_nlink, DIP(dp, di_nlink) + 1);
 		inodirty();
-		lncntp[lfdir]++;
+		ILNCOUNT(lfdir)++;
 		pwarn("DIR I=%u CONNECTED. ", orphan);
 		if (parentdir != (ino_t)-1) {
 			printf("PARENT WAS I=%u\n", parentdir);
@@ -476,7 +476,7 @@ linkup(ino_t orphan, ino_t parentdir)
 			 * fixes the parent link count so that fsck does
 			 * not need to be rerun.
 			 */
-			lncntp[parentdir]++;
+			ILNCOUNT(parentdir)++;
 		}
 		if (preen == 0)
 			printf("\n");
@@ -636,7 +636,7 @@ allocdir(ino_t parent, ino_t request, in
 	DIP_SET(dp, di_nlink, 2);
 	inodirty();
 	if (ino == ROOTINO) {
-		lncntp[ino] = DIP(dp, di_nlink);
+		ILNCOUNT(ino) = DIP(dp, di_nlink);
 		cacheino(dp, ino);
 		return(ino);
 	}
@@ -650,8 +650,8 @@ allocdir(ino_t parent, ino_t request, in
 	inp->i_dotdot = parent;
 	SET_ISTATE(ino, GET_ISTATE(parent));
 	if (GET_ISTATE(ino) == DSTATE) {
-		lncntp[ino] = DIP(dp, di_nlink);
-		lncntp[parent]++;
+		ILNCOUNT(ino) = DIP(dp, di_nlink);
+		ILNCOUNT(parent)++;
 	}
 	dp = ginode(parent);
 	DIP_SET(dp, di_nlink, DIP(dp, di_nlink) + 1);
Index: extern.h
===
RCS file: /cvs/src/sbin/fsck_ffs/extern.h,v
retrieving revision 1.10
diff -u -p -r1.10 extern.h
--- extern.h25 Jun 2007 19:59:55 -  1.10
+++ extern.h31 Mar 2011 11:56:53 -
@@ -54,6 +54,7 @@ int   ftypeok(union dinode *);
 void	getpathname(char *, size_t, ino_t, ino_t);
 void	inocleanup(void);
 void	inodirty(void);
+struct inostat *inoinfo(ino_t);
 int	linkup(ino_t, ino_t);
 int	makeentry(ino_t, ino_t, char *);
 void	pass1(void);
Index: fsck.h
===
RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
retrieving revision 1.23
diff -u -p -r1.23 fsck.h
--- fsck.h  10 Jun 2008 23:10:29 -  1.23
+++ fsck.h  31 Mar 2011 11:55:42 -
@@ -66,6 +66,19 @@ union dinode {
 #define BUFSIZ 1024
 #endif
 
+/*
+ * Each inode on the file system is described by the following structure.
+ * The linkcnt is initially set to the value in the inode. Each time it
+ * is found during the descent in passes 2, 3, and 4 the count is
+ * decremented. Any inodes whose count is non-zero after pass 4 needs to
+ * have its link count adjusted by the value remaining in ino_linkcnt.
+ */
+struct inostat {
+	char	ino_state;	/* state of inode, see below */
+	char	ino_type;	/* type of inode */
+	short	ino_linkcnt;	/* number of links not found */
+};
+
 #define	USTATE	01		/* inode not allocated */
 #define	FSTATE	02		/* inode is file */
 #define	DSTATE	03		/* inode is directory */
@@ -73,12 +86,20 @@ union dinode {
 #define	DCLEAR	05		/* directory is to be cleared */
 #define	FCLEAR	06		/* file is to be cleared */
 
-#define GET_ISTATE(ino)		(stmap[(ino)] & 0xf)
-#define GET_ITYPE(ino)		(stmap[(ino)] >> 4)
-#define SET_ISTATE(ino, v)	do { stmap[(ino)] = (stmap[(ino)] & 0xf0) | \
-				((v) & 0xf); } while (0)
-#define SET_ITYPE(ino, v)  do { 

Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 10:14:46PM +0200, Otto Moerbeek wrote:

 So here's an initial, only lightly tested diff.
 
 Beware, this very well could eat your filesystems.
 
 To notice any difference, you should use the -p mode of fsck_ffs (rc
 does that) and the fs should have been mounted with softdep.
 
 I have seen very nice speedups already.

But don't count yourself a rich man too soon: for ffs2 filesystems,
you won't see a lot of speedup, because inode blocks are allocated
on-demand there, so a filesystem with few inodes used likely has few
inode blocks.

Also, depending on the usage patterns, you might have a fs where high
numbered inodes are used, while the fs itself is pretty empty. Filling
up a fs with lots of files and then removing a lot of them is an
example that could lead to such a situation. This diff does not speed
things up in such cases.
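
Why a single high-numbered inode defeats the speedup can be seen in a small model. This is an illustrative sketch only, not the code from the diff: inodes_to_scan is a hypothetical name, and the bitmap layout (one bit per inode, least significant bit first) is an assumption. It returns how many leading inodes of a cylinder group would still have to be examined if everything past the highest allocated inode is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the highest allocated inode in the group plus
 * one, i.e. the length of the prefix that still needs checking.
 * inosused holds one bit per inode, set = allocated (assumed layout).
 */
uint32_t
inodes_to_scan(const uint8_t *inosused, uint32_t ipg)
{
	uint32_t i;

	assert(inosused != NULL);
	for (i = ipg; i > 0; i--) {
		uint32_t bit = i - 1;

		if (inosused[bit / 8] & (1u << (bit % 8)))
			return i;
	}
	return 0;	/* nothing allocated, the whole group can be skipped */
}
```

A group with only its first three inodes allocated needs three checks, but a group with just the first and the last inode allocated still needs all of them, which is the fill-then-delete situation described above.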

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Otto Moerbeek
On Thu, Mar 31, 2011 at 10:12:07PM +0200, Otto Moerbeek wrote:


  So, if I read it correctly, setting just the block size higher to say
  64Kb does auto tune frag size to 1/8 which is 8Kb (newfs complains
  appropriately) but the auto tune inode length to 4 times frag which is
  32Kb is not implemented now? Is this the proposed formula?
 
 There's no such thing as inode length. 
 
  
  If a user tunes -i inodes, or -f frags or -b block size, it should all
  auto-adjust to the same outcome based on above formula in the future?
 
 I don't see any formula.

Ah, now I understand what you mean by formula.

The rule is: if no -i parameter is given, its value is computed as
4 * fragment size.
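
As a rough illustration of that rule (a sketch under stated
assumptions, not the newfs source; the function names are made up, and
"no -i given" is modelled as a density of 0):

```c
#include <stdint.h>

/*
 * Sketch of the rule described above, not the newfs source:
 * if the user gave no -i value (modelled here as 0), the
 * bytes-per-inode density defaults to 4 * fragment size.
 */
static int64_t
default_density(int64_t user_density, int64_t fsize)
{
	if (user_density > 0)
		return user_density;	/* an explicit -i wins */
	return 4 * fsize;
}

/*
 * Rough inode count for a filesystem of fs_bytes at that density.
 * The real count is rounded per cylinder group, so treat this as
 * an estimate only.
 */
static int64_t
estimate_inodes(int64_t fs_bytes, int64_t density)
{
	return fs_bytes / density;
}
```

For example, a 2048-byte fragment gives a density of 8192 bytes per
inode and an 8192-byte fragment gives 32768, so the second filesystem
ends up with roughly a quarter of the inodes for the same size.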

Default values for -b and -f are taken from the disklabel.
disklabel(8) in -E mode fills them in based on fs partition size. If
you specify -f or -b with newfs, these values override the values in
the label, and the label will be updated after the newfs. So the next
time you do a newfs, you'll re-use the last values for -b and -f. 

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Amit Kulkarni
 If you really have a lot of used inodes, skipping the unused ones
 isn't going to help :-)

 You could always build your large-sized filesystems with a larger
 value of bytes-per-inode. newfs -i 32768 or 65536 is good for common
 filesystem use patterns with larger partitions (for specialist uses
 e.g. storing backups as huge single files it might be appropriate
 to go even higher).


Stuart,

Thanks for the tip. I looked up my 80G filesystem: it was created
without -i, so it uses 8Kb per inode (4 times the frag size, per your
update to the newfs man page). This is a no-brainer optimization which
can get huge wins in fsck immediately without much change to the
existing code.

Otto,
In my tests on AMD64, if FFS partition size increases beyond 30GB,
fsck starts taking exponential time even if you have zero used inodes.
This is a for i () for j() loop and if you reduce the for j() inner
loop it is a win.
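
The loop shape described here can be modelled as follows. This is only
an illustrative sketch, not the pass1.c code: ncg, ipg and limit[] are
made-up parameter names for the number of cylinder groups, the inodes
per group, and an optional per-group bound on the inner loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many inodes a "for each group, for each inode" scan
 * visits. If limit is NULL every group is scanned in full;
 * otherwise group c is scanned only up to limit[c] inodes
 * (clamped to ipg).
 */
static uint64_t
inodes_visited(uint32_t ncg, uint32_t ipg, const uint32_t *limit)
{
	uint64_t visited = 0;
	uint32_t c, j, n;

	for (c = 0; c < ncg; c++) {
		n = ipg;
		if (limit != NULL && limit[c] < ipg)
			n = limit[c];
		for (j = 0; j < n; j++)
			visited++;	/* stands in for checking one inode */
	}
	return visited;
}
```

In this model the work of a full scan is ncg * ipg, so it grows with
the total number of inodes on the partition, and any reduction of the
inner bound removes that many inode checks directly.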

dumpfs -m /downloads
# newfs command for /dev/wd0o
newfs -O 1 -b 16384 -e 4096 -f 2048 -g 16384 -h 64 -m 5 -o time -s
172714816 /dev/wd0o

So, if I read it correctly, setting just the block size higher to say
64Kb does auto tune frag size to 1/8 which is 8Kb (newfs complains
appropriately) but the auto tune inode length to 4 times frag which is
32Kb is not implemented now? Is this the proposed formula?

If a user tunes -i inodes, or -f frags or -b block size, it should all
auto-adjust to the same outcome based on above formula in the future?

dumpfs doesn't show the total inodes or the inode length in an easily
readable format (-m option). Just trying to understand what the
acronyms mean.

Thanks

 disklabel has code already to move to larger block and frag sizes for
 large (new) partitions. newfs picks these settings up.



 Of course this does involve dump/restore if you need to do this for
 an existing filesystem.

  It is interesting because it really speeds up fsck_ffs for filesystems
  with few used inodes.
 
  There's also a dangerous part: it assumes the cylinder group summary
  info is ok when softdeps has been used.
 
  I suppose that's the reason why it was never included into OpenBSD.
 
  I'll ponder if I want to work on this.


 A safer alternative to this optimization might be for the installer
 (or newfs) to consider the fs size when deciding on a default inode
 density.

-Otto



Re: horribly slow fsck_ffs pass1 performance

2011-03-31 Thread Amit Kulkarni
 I don't think we want to change the default density. Larger
 partitions already get larger blocks and fragments, and as a
 consequence a lower number of inodes.


 Otto,
 In my tests on AMD64, if FFS partition size increases beyond 30GB,
 fsck starts taking exponential time even if you have zero used inodes.
 This is a for i () for j() loop and if you reduce the for j() inner
 loop it is a win.

 Yes, it becomes very slow, but I don't think it is exponential.

Wow, this works even with the ***existing code***. I did a newfs
-b 65536 -f 8192 wd0m (this has an implicit -i 32768), and fsck chewed
through an 80G partition with 2 clang static analyzer runs (2100 files
of 200 Kb each) within 1 minute. Before this, it never got past pass1
in over 5 hours.

Insanely fast fsck runs. Thanks Stuart and Otto. Why don't you make
this the newfs default? What does everybody say?
newfs -b 65536 -f 8192 -i 32768

Somebody ought to change the section in the FAQ too!

I will try out your diff right now.


 dumpfs -m /downloads
 # newfs command for /dev/wd0o
 newfs -O 1 -b 16384 -e 4096 -f 2048 -g 16384 -h 64 -m 5 -o time -s
 172714816 /dev/wd0o

 So, if I read it correctly, setting just the block size higher to say
 64Kb does auto tune frag size to 1/8 which is 8Kb (newfs complains
 appropriately) but the auto tune inode length to 4 times frag which is
 32Kb is not implemented now? Is this the proposed formula?

 There's no such thing as inode length.


Sorry, what I meant was the amount of space considered per single inode?