Re: properly name raid volumes on mpii

2011-02-24 Thread David Gwynne
On 24/02/2011, at 5:33 PM, Mark Kettenis wrote:

 Hmm, this looks like a fix for PR 6269, although that report is for
 mpi(4).  Any chance a similar fix works for mpi(4)?

thats a fairly old PR. is anyone able to reproduce it on mpi(4) with a modern
kernel?

dlg



Re: properly name raid volumes on mpii

2011-02-24 Thread David Gwynne
On 24/02/2011, at 6:51 PM, David Gwynne wrote:

 On 24/02/2011, at 5:33 PM, Mark Kettenis wrote:

 Hmm, this looks like a fix for PR 6269, although that report is for
 mpi(4).  Any chance a similar fix works for mpi(4)?

 thats a fairly old PR. is anyone able to reproduce it on mpi(4) with a
 modern kernel?

hrm, a quick look over the code isnt inspiring :/



pfsync defer and ipv6

2011-03-02 Thread David Gwynne
there have been reports of panics with pfsync defer combined with
ipv6 traffic. ive been over the pfsync code repeatedly trying to
find out where it treats ipv4 and ipv6 differently without any luck.
i just had a lightbulb moment literally minutes ago and came up
with the code below. turns out that you SHOULD treat v4 and v6
differently in some cases...

i would appreciate testing of this change.

cheers,
dlg

Index: if_pfsync.c
===
RCS file: /cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.160
diff -u -p -r1.160 if_pfsync.c
--- if_pfsync.c 11 Jan 2011 08:33:27 -  1.160
+++ if_pfsync.c 2 Mar 2011 07:53:01 -
@@ -74,7 +74,11 @@
 #endif
 
 #ifdef INET6
+#include <netinet/ip6.h>
+#include <netinet/in_pcb.h>
+#include <netinet/icmp6.h>
 #include <netinet6/nd6.h>
+#include <netinet6/ip6_divert.h>
 #endif /* INET6 */
 
 #include "carp.h"
@@ -1745,8 +1749,18 @@ pfsync_undefer(struct pfsync_deferral *p
 	if (drop)
 		m_freem(pd->pd_m);
 	else {
-		ip_output(pd->pd_m, (void *)NULL, (void *)NULL, 0,
-		    (void *)NULL, (void *)NULL);
+		switch (pd->pd_st->key[PF_SK_WIRE]->af) {
+#ifdef INET
+		case AF_INET:
+			ip_output(pd->pd_m, NULL, NULL, 0, NULL, NULL);
+			break;
+#endif /* INET */
+#ifdef INET6
+		case AF_INET6:
+			ip6_output(pd->pd_m, NULL, NULL, 0, NULL, NULL, NULL);
+			break;
+#endif /* INET6 */
+		}
 	}
 
 	pool_put(&sc->sc_pool, pd);
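
The idea in the diff above, picking the output routine from the address family stored with the state, can be tried outside the kernel. This is only a userland sketch: `struct deferred_pkt`, `send_v4()`, `send_v6()` and `undefer()` are hypothetical stand-ins, not the pfsync structures or the real `ip_output()`/`ip6_output()`.

```c
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the kernel output paths. */
enum out_path { OUT_NONE, OUT_V4, OUT_V6 };

struct deferred_pkt {
	int		 af;	/* address family recorded with the state */
	const void	*data;	/* packet payload, unused in this sketch */
};

static enum out_path
send_v4(const struct deferred_pkt *p)
{
	(void)p;
	return (OUT_V4);
}

static enum out_path
send_v6(const struct deferred_pkt *p)
{
	(void)p;
	return (OUT_V6);
}

/* Release a deferred packet through the path matching its family. */
enum out_path
undefer(const struct deferred_pkt *p)
{
	switch (p->af) {
	case AF_INET:
		return (send_v4(p));
	case AF_INET6:
		return (send_v6(p));
	default:
		return (OUT_NONE);	/* unknown family: send nothing */
	}
}
```

Handing a v6 packet to the v4 path is the mistake the switch avoids; the default case here simply refuses to send.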



Re: OpenBSD crash on an IBM x3550 M3

2011-03-04 Thread David Gwynne
i agree that mikebs change should go in.

On 05/03/2011, at 12:10 AM, Mark Kettenis wrote:

 Date: Fri, 4 Mar 2011 07:30:24 -0600
 From: Marco Peereboom sl...@peereboom.us

 That is a huge penalty because it is read over the pci bus.  The trick
 with 0x should work just fine per the doco and other os' drivers
 (on top of my head).  The question I have is does Linux only have one
 device per interrupt?

 Linux probably does a better job at avoiding shared interrupts than we
 do, but on some hardware it can't be avoided so it has to deal with
 it.

 If you want to avoid reading the interrupt status register, you'll have
 to stop trusting the hardware (or rather the firmware) in
 mpi_reply(), and do bounds checks before accessing sc->sc_rcbs[] and
 sc->sc_ccbs[].  To be honest, that would be a good idea even if we
 didn't have this bug.

 In the meantime I think mikeb's fix should be committed.

 I am going to reference the doco one more time on this.

 On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote:
 On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
 On Fri, Feb 04, 2011 at 14:53 +, emeric boit wrote:
 Hello,

 After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
 I find the system randomly panics after a period of use.
 uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
 kernel: page fault trap, code=0
 Stopped at  mpi_reply+0x102:    movq    0(%r13),%rax
 ddb{0}

 ddb{0} trace
 mpi_reply() at mpi_reply+0x102
 mpi_intr() at mpi_intr+0x20
 Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
 --- interrupt ---
 Bad frame pointer: 0x8000194e1920
 end trace frame: 0x8000194e1920, count: -3
 Xspllower+0xe:
 ddb{0}


 We've tried different things, but after this hint i realised
 that what might be happening is that bnx and mpi interrupts
 are chained (it's bnx0 actually, my initial guess about bnx1
 was wrong) and mpi_intr is called first.  Currently neither
 mpi(4) nor mpii(4) checks the interrupt status register;
 both look directly into the reply post queue.  Although
 there's not supposed to be any race between host cpu reading
 from the memory and ioc writing to it, in practice it turns
 out that in some particular hardware configurations this rule
 is violated and we read a garbled reply from the controller.

 If my memory serves, I've considered this for the mpii_intr
 but never got into the situation where it was needed and
 thus omitted it.  I guess I have to bring it back too.

 Emeric tortured the machine with this diff and reported that
 it solves the issue for him.  OK to commit?

 On Wed, Mar 02, 2011 at 17:20 +, emeric boit wrote:
 hi,

 This change doesn't solve the issue.

 I have noticed that the server crashes when I use the network.

 I copy a small file several times without problem.
 On the IBM I do :
 scp USER@IP:/tmp/mpi.c .

 And when I copy a larger file the server crashes :
 scp USER@IP:/bsd .


 And when I copy the same file (bsd) from a usb key I don't have a
 problem.

 Emeric.


 that sounds like an interrupt sharing bug of some sort.
 is it bnx1 that you're using to reproduce a crash?

 try the following diff please (on a clean checkout):

 Index: mpi.c
 ===
 RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
 retrieving revision 1.166
 diff -u -p -r1.166 mpi.c
 --- mpi.c  1 Mar 2011 23:48:33 -   1.166
 +++ mpi.c  2 Mar 2011 17:40:13 -
 @@ -887,6 +887,9 @@ mpi_intr(void *arg)
u_int32_t   reg;
int rv = 0;

 +  if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
 +  return (rv);
 +
while ((reg = mpi_pop_reply(sc)) != 0x) {
mpi_reply(sc, reg);
rv = 1;


 ok krw@.

  Ken
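
The shape of mikeb's fix, checking the device's own status bit before touching the reply queue on a shared interrupt line, can be sketched in userland. Everything here is made up for illustration: `struct fake_ioc`, the `INTR_STATUS_REPLY` bit value and the `REPLY_EMPTY` marker are assumptions, not the mpi(4) register layout.

```c
#include <stddef.h>
#include <stdint.h>

#define INTR_STATUS_REPLY	0x08u		/* hypothetical status bit */
#define REPLY_EMPTY		0xffffffffu	/* hypothetical "queue empty" marker */
#define NREPLIES		4

struct fake_ioc {
	uint32_t intr_status;		/* models the interrupt status register */
	uint32_t queue[NREPLIES];	/* models the reply post queue */
	size_t	 head;			/* next queue slot to read */
	int	 handled;		/* replies processed so far */
};

static uint32_t
pop_reply(struct fake_ioc *ioc)
{
	if (ioc->head >= NREPLIES)
		return (REPLY_EMPTY);
	return (ioc->queue[ioc->head++]);
}

/* Returns 1 if the interrupt was ours, 0 if another device raised it. */
int
fake_intr(struct fake_ioc *ioc)
{
	uint32_t reg;
	int rv = 0;

	/* On a shared line, leave the queue alone unless the device asks. */
	if ((ioc->intr_status & INTR_STATUS_REPLY) == 0)
		return (rv);

	while ((reg = pop_reply(ioc)) != REPLY_EMPTY) {
		ioc->handled++;
		rv = 1;
	}
	ioc->intr_status &= ~INTR_STATUS_REPLY;

	return (rv);
}
```

The cost kettenis mentions is the extra register read on every interrupt; the sketch does not model that, only the ordering of the check and the drain loop.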



Re: hme empty rx ring

2011-03-22 Thread David Gwynne
makes sense to me.

On 23/03/2011, at 10:28 AM, Alexander Bluhm wrote:

 Hi,
 
 When the kernel runs out of mbuf clusters, the hme receive ring may
 become empty.  In that case, the hme driver cannot recover as the
 ring is only filled after receiving data.  My fix is to fill an
 empty receive ring every second.
 
 ok?
 
 bluhm
 
 
 Index: dev/ic/hme.c
 ===
 RCS file: /data/mirror/openbsd/cvs/src/sys/dev/ic/hme.c,v
 retrieving revision 1.61
 diff -u -p -r1.61 hme.c
 --- dev/ic/hme.c  15 Oct 2009 17:54:54 -  1.61
 +++ dev/ic/hme.c  23 Mar 2011 00:07:51 -
 @@ -362,6 +362,13 @@ hme_tick(arg)
   bus_space_write_4(t, mac, HME_MACI_EXCNT, 0);
   bus_space_write_4(t, mac, HME_MACI_LTCNT, 0);
 
 + /*
 +  * If buffer allocation fails, the receive ring may become
 +  * empty. There is no receive interrupt to recover from that.
 +  */
 + if (sc->sc_rx_cnt == 0)
 + 	hme_fill_rx_ring(sc);
 +
   mii_tick(&sc->sc_mii);
   splx(s);
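
A rough userland model of the recovery path described above: if the ring drains while the allocator is empty, nothing ever refills it unless a periodic tick does. `struct fake_nic`, `fill_rx_ring()` and `nic_tick()` are hypothetical names for this sketch, not the hme(4) code.

```c
#define RX_RING_SIZE	8

struct fake_nic {
	int rx_cnt;	/* buffers currently posted to the receive ring */
	int pool;	/* buffers the (hypothetical) allocator can supply */
};

/* Post as many buffers as the allocator can supply; returns how many. */
int
fill_rx_ring(struct fake_nic *nic)
{
	int added = 0;

	while (nic->rx_cnt < RX_RING_SIZE && nic->pool > 0) {
		nic->pool--;
		nic->rx_cnt++;
		added++;
	}
	return (added);
}

/* Periodic housekeeping: an empty ring gets no rx interrupt to refill it. */
void
nic_tick(struct fake_nic *nic)
{
	if (nic->rx_cnt == 0)
		fill_rx_ring(nic);
}
```

The tick only steps in when the ring is completely empty; a partly filled ring is left to the normal receive path, as in the diff.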



Re: kill loopback link1 wankery

2011-04-02 Thread David Gwynne
all the things i can imagine using this for, i can do with pf features.

ok.

On 02/04/2011, at 10:49 PM, Henning Brauer wrote:

 lo has that link1 wankery where it kind of replies to all addresses in
 the subnet, except that it doesn't really - it is very halfbaked and
 gets in the way. unless somebody has a VERY convincing reason to keep
 this it'll be gone in a few.
 we'll be able to use the fast rb tree lookup in in_iawithaddr after
 this.

 Index: sys/netinet/ip_input.c
 ===
 RCS file: /cvs/src/sys/netinet/ip_input.c,v
 retrieving revision 1.186
 diff -u -p -r1.186 ip_input.c
 --- sys/netinet/ip_input.c11 Feb 2011 12:16:30 -  1.186
 +++ sys/netinet/ip_input.c2 Apr 2011 12:44:12 -
 @@ -688,10 +688,7 @@ in_iawithaddr(struct in_addr ina, struct
   TAILQ_FOREACH(ia, &in_ifaddr, ia_list) {
   if (ia->ia_ifp->if_rdomain != rdomain)
   continue;
 - if ((ina.s_addr == ia->ia_addr.sin_addr.s_addr) ||
 - ((ia->ia_ifp->if_flags & (IFF_LOOPBACK|IFF_LINK1)) ==
 - (IFF_LOOPBACK|IFF_LINK1) &&
 -  ia->ia_net == (ina.s_addr & ia->ia_netmask)))
 + if (ina.s_addr == ia->ia_addr.sin_addr.s_addr)
   return ia;
   /* check ancient classful too, e. g. for rarp-based netboot */
   if (((ip_directedbcast == 0) || (m && ip_directedbcast &&
 Index: share/man/man4/lo.4
 ===
 RCS file: /cvs/src/share/man/man4/lo.4,v
 retrieving revision 1.26
 diff -u -p -r1.26 lo.4
 --- share/man/man4/lo.4   31 May 2007 19:19:50 -  1.26
 +++ share/man/man4/lo.4   2 Apr 2011 12:44:12 -
 @@ -70,33 +70,6 @@ The loopback should
 .Em never
 be configured first unless no hardware
 interfaces exist.
 -.Pp
 -Configuring a loopback interface for
 -.Xr inet 4
 -with the
 -.Em link1
 -flag set will make the interface answer to the whole set of
 -addresses identified as being in super-net which is specified
 -by the address and netmask.
 -Obviously you should not set the
 -.Em link1
 -flag on interface
 -.Nm lo0 ,
 -but instead use another interface like
 -.Nm lo1 .
 -.Sh EXAMPLES
 -.Bd -literal
 -# ifconfig lo1 create
 -# ifconfig lo1 inet 192.168.1.1 netmask 255.255.255.0 link1
 -.Ed
 -.Pp
 -is equivalent to:
 -.Bd -literal
 -# ifconfig lo1 create
 -# awk 'BEGIN {for(i=1;i<255;i++) \e
 - print "ifconfig lo1 inet 192.168.1."i" netmask 255.255.255.255 alias"}'| \e
 - sh
 -.Ed
 .Sh DIAGNOSTICS
 .Bl -diag
 .It lo%d: can't handle af%d.
 @@ -116,16 +89,8 @@ The
 .Nm
 device appeared in
 .Bx 4.2 .
 -.Pp
 -The wildcard functionality first appeared in
 -.Ox 2.3 .
 .Sh BUGS
 Previous versions of the system enabled the loopback interface
 automatically, using a non-standard Internet address (127.1).
 Use of that address is now discouraged; a reserved host address
 for the local network should be used instead.
 -.Pp
 -Care should be taken when using NAT with interfaces that have the
 -.Em link1
 -flag set, because it may believe the packets are coming from a
 -loopback address.


 --
 Henning Brauer, h...@bsws.de, henn...@openbsd.org
 BS Web Services, http://bsws.de
 Full-Service ISP - Secure Hosting, Mail and DNS Services
 Dedicated Servers, Rootservers, Application Hosting
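
For readers who never used link1, the difference between the old and new address match in the ip_input.c hunk can be shown with plain integers. This is a sketch under assumptions: `struct fake_ifaddr`, `match_old()` and `match_new()` are invented here, addresses are in host byte order, and the single `link1` field stands for IFF_LOOPBACK and IFF_LINK1 both being set.

```c
#include <stdint.h>

struct fake_ifaddr {
	uint32_t addr;		/* interface address, host byte order */
	uint32_t netmask;	/* netmask, host byte order */
	int	 link1;		/* models IFF_LOOPBACK|IFF_LINK1 both set */
};

/* Old behaviour: with link1, any address in the subnet counts as local. */
int
match_old(const struct fake_ifaddr *ia, uint32_t dst)
{
	if (dst == ia->addr)
		return (1);
	return (ia->link1 &&
	    (ia->addr & ia->netmask) == (dst & ia->netmask));
}

/* New behaviour: only the exact address matches. */
int
match_new(const struct fake_ifaddr *ia, uint32_t dst)
{
	return (dst == ia->addr);
}
```

An exact-match test is what makes a keyed lookup possible later; a per-interface netmask comparison would still need a walk over every address.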



Re: atascsi dma_alloc() - make atascsi play nicer with bigmem

2011-04-02 Thread David Gwynne
ok.

On 02/04/2011, at 11:15 PM, Kenneth R Westerback wrote:

 Another driver malloc'ing and passing potentially dma unsafe memory
 to do i/o into.
 
 ok?
 
  Ken
 
 Index: atascsi.c
 ===
 RCS file: /cvs/src/sys/dev/ata/atascsi.c,v
 retrieving revision 1.101
 diff -u -p -r1.101 atascsi.c
 --- atascsi.c 3 Feb 2011 21:22:19 -   1.101
 +++ atascsi.c 2 Apr 2011 13:03:58 -
 @@ -26,6 +26,7 @@
 #include <sys/device.h>
 #include <sys/proc.h>
 #include <sys/queue.h>
 +#include <sys/pool.h>
 
 #include <scsi/scsi_all.h>
 #include <scsi/scsi_disk.h>
 @@ -335,8 +336,8 @@ atascsi_probe(struct scsi_link *link)
   xa = scsi_io_get(&ahp->ahp_iopool, SCSI_NOSLEEP);
   if (xa == NULL)
   panic("no free xfers on a new port");
 - /* XXX dma reachable */
 - identify = malloc(sizeof(*identify), M_TEMP, M_WAITOK);
 + identify = dma_alloc(sizeof(*identify),
 + PR_WAITOK | PR_ZERO);
   xa->pmp_port = ap->ap_pmp_port;
   xa->data = identify;
   xa->datalen = sizeof(*identify);
 @@ -353,10 +354,10 @@ atascsi_probe(struct scsi_link *link)
   if (rv == 0) {
   bcopy(identify, &ap->ap_identify,
   sizeof(ap->ap_identify));
 - free(identify, M_TEMP);
 + dma_free(identify, sizeof(*identify));
   break;
   }
 - free(identify, M_TEMP);
 + dma_free(identify, sizeof(*identify));
   delay(500);
   } while (count--);



Re: remove bufqs from vnds

2011-04-02 Thread David Gwynne
ok

On 02/04/2011, at 11:58 PM, Thordur Bjornsson wrote:

 Hi,
 
 So, it doesn't make sense to have a bufq for vnds.
 
 The disk that stores the image backing the vnd has its own bufq
 of course and what happens is that vnd puts a buf on its bufq,
 which is promptly removed when we call vndstart, followed by a call
 to strategy so the buf ends up almost immediately on the bufq
 on the underlying disk.
 
 Tested on vnd/svnd (and with the image on NFS. vnd is broken on nfs!).
 
 OK?
 
 
 Index: vnd.c
 ===
 RCS file: /home/thib/cvs/src/sys/dev/vnd.c,v
 retrieving revision 1.107
 diff -u -p -r1.107 vnd.c
 --- vnd.c 15 Feb 2011 20:02:11 -  1.107
 +++ vnd.c 2 Apr 2011 11:34:38 -
 @@ -127,8 +127,6 @@ struct vnd_softc {
   struct disk  sc_dk;
   char sc_dk_name[16];
 
 - struct bufq  sc_bufq;
 -
   char sc_file[VNDNLEN];  /* file we're covering */
   int  sc_flags;  /* flags */
   size_t   sc_size;   /* size of vnd in sectors */
 @@ -159,7 +157,7 @@ int numvnd = 0;
 void  vndattach(int);
 
 void  vndclear(struct vnd_softc *);
 -void vndstart(struct vnd_softc *);
 +void vndstart(struct vnd_softc *, struct buf *);
 int   vndsetcred(struct vnd_softc *, struct ucred *);
 void  vndiodone(struct buf *);
 void  vndshutdown(void);
 @@ -445,64 +443,50 @@ vndstrategy(struct buf *bp)
 
   /* No bypassing of buffer cache?  */
   if (vndsimple(bp-b_dev)) {
 - /* Loop until all queued requests are handled.  */
 - for (;;) {
 - int part = DISKPART(bp-b_dev);
 - daddr64_t off = DL_SECTOBLK(vnd-sc_dk.dk_label,
 - 
 DL_GETPOFFSET(vnd-sc_dk.dk_label-d_partitions[part]));
 - aiov.iov_base = bp-b_data;
 - auio.uio_resid = aiov.iov_len = bp-b_bcount;
 - auio.uio_iov = aiov;
 - auio.uio_iovcnt = 1;
 - auio.uio_offset = dbtob((off_t)(bp-b_blkno + off));
 - auio.uio_segflg = UIO_SYSSPACE;
 - auio.uio_procp = p;
 -
 - vn_lock(vnd-sc_vp, LK_EXCLUSIVE | LK_RETRY, p);
 - if (bp-b_flags  B_READ) {
 - auio.uio_rw = UIO_READ;
 - bp-b_error = VOP_READ(vnd-sc_vp, auio, 0,
 - vnd-sc_cred);
 - if (vnd-sc_keyctx)
 - vndencrypt(vnd, bp-b_data,
 -bp-b_bcount, bp-b_blkno, 0);
 - } else {
 - if (vnd-sc_keyctx)
 - vndencrypt(vnd, bp-b_data,
 -bp-b_bcount, bp-b_blkno, 1);
 - auio.uio_rw = UIO_WRITE;
 - /*
 -  * Upper layer has already checked I/O for
 -  * limits, so there is no need to do it again.
 -  */
 - bp-b_error = VOP_WRITE(vnd-sc_vp, auio,
 - IO_NOLIMIT, vnd-sc_cred);
 - /* Data in buffer cache needs to be in clear */
 - if (vnd-sc_keyctx)
 - vndencrypt(vnd, bp-b_data,
 -bp-b_bcount, bp-b_blkno, 0);
 - }
 - VOP_UNLOCK(vnd-sc_vp, 0, p);
 - if (bp-b_error)
 - bp-b_flags |= B_ERROR;
 - bp-b_resid = auio.uio_resid;
 - s = splbio();
 - biodone(bp);
 - splx(s);
 -
 - /* If nothing more is queued, we are done. */
 - if (!bufq_peek(vnd-sc_bufq))
 - return;
 -
 + int part = DISKPART(bp-b_dev);
 + daddr64_t off = DL_SECTOBLK(vnd-sc_dk.dk_label,
 + DL_GETPOFFSET(vnd-sc_dk.dk_label-d_partitions[part]));
 + aiov.iov_base = bp-b_data;
 + auio.uio_resid = aiov.iov_len = bp-b_bcount;
 + auio.uio_iov = aiov;
 + auio.uio_iovcnt = 1;
 + auio.uio_offset = dbtob((off_t)(bp-b_blkno + off));
 + auio.uio_segflg = UIO_SYSSPACE;
 + auio.uio_procp = p;
 +
 + vn_lock(vnd-sc_vp, LK_EXCLUSIVE | LK_RETRY, p);
 + if (bp-b_flags  B_READ) {
 + auio.uio_rw = UIO_READ;
 + bp-b_error = VOP_READ(vnd-sc_vp, auio, 0,
 + vnd-sc_cred);
 + if (vnd-sc_keyctx)
 + vndencrypt(vnd, bp-b_data,
 + 

iopools for ips(4)

2011-04-03 Thread David Gwynne
this cuts ips over to using iopools. it gets the usual benefits of
more reliable ioctl paths, better io scheduling between volumes and
the pt busses, and a removal of NO_CCB.

i dont have an ips, so i cant test this. id like more than an ok
from gcc before committing this.

Index: ips.c
===
RCS file: /cvs/src/sys/dev/pci/ips.c,v
retrieving revision 1.104
diff -u -p -r1.104 ips.c
--- ips.c   12 Oct 2010 00:53:32 -  1.104
+++ ips.c   3 Apr 2011 11:57:20 -
@@ -417,6 +417,8 @@ struct ips_softc {
struct ips_ccb *sc_ccb;
int sc_nccbs;
struct ips_ccbq sc_ccbq_free;
+   struct mutexsc_ccb_mtx;
+   struct scsi_iopool  sc_iopool;
 
struct dmamem   sc_sqm;
paddr_t sc_sqtail;
@@ -480,8 +482,8 @@ u_int32_t ips_morpheus_status(struct ips
 
 struct ips_ccb *ips_ccb_alloc(struct ips_softc *, int);
 void   ips_ccb_free(struct ips_softc *, struct ips_ccb *, int);
-struct ips_ccb *ips_ccb_get(struct ips_softc *);
-void   ips_ccb_put(struct ips_softc *, struct ips_ccb *);
+void   *ips_ccb_get(void *);
+void   ips_ccb_put(void *, void *);
 
 intips_dmamem_alloc(struct dmamem *, bus_dma_tag_t, bus_size_t);
 void   ips_dmamem_free(struct dmamem *);
@@ -660,6 +662,8 @@ ips_attach(struct device *parent, struct
ccb0.c_cmdbpa = sc-sc_cmdbm.dm_paddr;
SLIST_INIT(sc-sc_ccbq_free);
SLIST_INSERT_HEAD(sc-sc_ccbq_free, ccb0, c_link);
+   mtx_init(sc-sc_ccb_mtx, IPL_BIO);
+   scsi_iopool_init(sc-sc_iopool, sc, ips_ccb_get, ips_ccb_put);
 
/* Get adapter info */
if (ips_getadapterinfo(sc, SCSI_NOSLEEP)) {
@@ -731,6 +735,7 @@ ips_attach(struct device *parent, struct
sc-sc_scsi_link.adapter_buswidth = sc-sc_nunits;
sc-sc_scsi_link.adapter = ips_scsi_adapter;
sc-sc_scsi_link.adapter_softc = sc;
+   sc-sc_scsi_link.pool = sc-sc_iopool;
 
bzero(saa, sizeof(saa));
saa.saa_sc_link = sc-sc_scsi_link;
@@ -774,6 +779,7 @@ ips_attach(struct device *parent, struct
link-adapter_buswidth = lastarget + 1;
link-adapter = ips_scsi_pt_adapter;
link-adapter_softc = pt;
+   link-pool = sc-sc_iopool;
 
saa.saa_sc_link = link;
config_found(self, saa, scsiprint);
@@ -841,11 +847,11 @@ ips_scsi_cmd(struct scsi_xfer *xs)
struct scsi_sense_data sd;
struct scsi_rw *rw;
struct scsi_rw_big *rwb;
-   struct ips_ccb *ccb;
+   struct ips_ccb *ccb = xs-io;
struct ips_cmd *cmd;
int target = link-target;
u_int32_t blkno, blkcnt;
-   int code, s;
+   int code;
 
DPRINTF(IPS_D_XFER, (%s: ips_scsi_cmd: xs %p, target %d, 
opcode 0x%02x, flags 0x%x\n, sc-sc_dev.dv_xname, xs, target,
@@ -894,16 +900,7 @@ ips_scsi_cmd(struct scsi_xfer *xs)
else
code = IPS_CMD_WRITE;
 
-   s = splbio();
-   ccb = ips_ccb_get(sc);
-   splx(s);
-   if (ccb == NULL) {
-   DPRINTF(IPS_D_ERR, (%s: ips_scsi_cmd: no ccb\n,
-   sc-sc_dev.dv_xname));
-   xs-error = XS_NO_CCB;
-   scsi_done(xs);
-   return;
-   }
+   ccb = xs-io;
 
cmd = ccb-c_cmdbva;
cmd-code = code;
@@ -914,12 +911,9 @@ ips_scsi_cmd(struct scsi_xfer *xs)
if (ips_load_xs(sc, ccb, xs)) {
DPRINTF(IPS_D_ERR, (%s: ips_scsi_cmd: ips_load_xs 
failed\n, sc-sc_dev.dv_xname));
-
-   s = splbio();
-   ips_ccb_put(sc, ccb);
-   splx(s);
xs-error = XS_DRIVER_STUFFUP;
-   break;
+   scsi_done(xs);
+   return;
}
 
if (cmd-sgcnt  0)
@@ -954,17 +948,6 @@ ips_scsi_cmd(struct scsi_xfer *xs)
memcpy(xs-data, sd, MIN(xs-datalen, sizeof(sd)));
break;
case SYNCHRONIZE_CACHE:
-   s = splbio();
-   ccb = ips_ccb_get(sc);
-   splx(s);
-   if (ccb == NULL) {
-   DPRINTF(IPS_D_ERR, (%s: ips_scsi_cmd: no ccb\n,
-   sc-sc_dev.dv_xname));
-   xs-error = XS_NO_CCB;
-   scsi_done(xs);
-   return;
-   }
-
cmd = ccb-c_cmdbva;
cmd-code = IPS_CMD_FLUSH;
 
@@ -991,12 +974,11 @@ ips_scsi_pt_cmd(struct scsi_xfer *xs)
struct ips_pt *pt = link-adapter_softc;
struct ips_softc *sc = pt-pt_sc;
struct device *dev = link-device_softc;
-   struct ips_ccb *ccb;
+   struct 

iopools for twe(4)

2011-04-03 Thread David Gwynne
if you have a twe, let me first say how sorry i am for you.

this cuts twe over to using iopools. it gets the usual benefits of
more reliable ioctl paths, better io scheduling between volumes and
the pt busses, and a removal of NO_CCB.

it is a bit more than a straight conversion, it also moves the
draining of the async event notification over to iohandlers. this
simplifies them a bit and makes them reliable when the controller
is under load.

i dont have a twe, so i cant test this. id like more than an ok
from gcc before committing this.

Index: twe.c
===
RCS file: /cvs/src/sys/dev/ic/twe.c,v
retrieving revision 1.38
diff -u -p -r1.38 twe.c
--- twe.c   20 Sep 2010 06:17:49 -  1.38
+++ twe.c   3 Apr 2011 14:23:55 -
@@ -70,8 +70,8 @@ struct scsi_adapter twe_switch = {
twe_scsi_cmd, tweminphys, 0, 0,
 };
 
-static __inline struct twe_ccb *twe_get_ccb(struct twe_softc *sc);
-static __inline void twe_put_ccb(struct twe_ccb *ccb);
+void *twe_get_ccb(void *);
+void twe_put_ccb(void *, void *);
 void twe_dispose(struct twe_softc *sc);
 int  twe_cmd(struct twe_ccb *ccb, int flags, int wait);
 int  twe_start(struct twe_ccb *ccb, int wait);
@@ -80,28 +80,33 @@ int  twe_done(struct twe_softc *sc, stru
 void twe_copy_internal_data(struct scsi_xfer *xs, void *v, size_t size);
 void twe_thread_create(void *v);
 void twe_thread(void *v);
+void twe_aen(void *, void *);
 
-
-static __inline struct twe_ccb *
-twe_get_ccb(sc)
-	struct twe_softc *sc;
+void *
+twe_get_ccb(void *xsc)
 {
+	struct twe_softc *sc = xsc;
 	struct twe_ccb *ccb;
 
+	mtx_enter(&sc->sc_ccb_mtx);
 	ccb = TAILQ_LAST(&sc->sc_free_ccb, twe_queue_head);
-	if (ccb)
+	if (ccb != NULL)
 		TAILQ_REMOVE(&sc->sc_free_ccb, ccb, ccb_link);
-	return ccb;
+	mtx_leave(&sc->sc_ccb_mtx);
+
+	return (ccb);
 }
 
-static __inline void
-twe_put_ccb(ccb)
-	struct twe_ccb *ccb;
+void
+twe_put_ccb(void *xsc, void *xccb)
 {
-	struct twe_softc *sc = ccb->ccb_sc;
+	struct twe_softc *sc = xsc;
+	struct twe_ccb *ccb = xccb;
 
 	ccb->ccb_state = TWE_CCB_FREE;
+	mtx_enter(&sc->sc_ccb_mtx);
 	TAILQ_INSERT_TAIL(&sc->sc_free_ccb, ccb, ccb_link);
+	mtx_leave(&sc->sc_ccb_mtx);
 }
 
 void
@@ -176,6 +181,10 @@ twe_attach(sc)
 	TAILQ_INIT(&sc->sc_ccbq);
 	TAILQ_INIT(&sc->sc_free_ccb);
 	TAILQ_INIT(&sc->sc_done_ccb);
+	mtx_init(&sc->sc_ccb_mtx, IPL_BIO);
+	scsi_iopool_init(&sc->sc_iopool, sc, twe_get_ccb, twe_put_ccb);
+
+	scsi_ioh_set(&sc->sc_aen, &sc->sc_iopool, twe_aen, sc);
 
 	pa = sc->sc_cmdmap->dm_segs[0].ds_addr +
 	    sizeof(struct twe_cmd) * (TWE_MAXCMDS - 1);
@@ -238,9 +247,10 @@ twe_attach(sc)
/* drain aen queue */
for (veseen_srst = 0, aen = -1; aen != TWE_AEN_QEMPTY; ) {
 
-   if ((ccb = twe_get_ccb(sc)) == NULL) {
+   ccb = scsi_io_get(sc-sc_iopool, 0);
+   if (ccb == NULL) {
errstr = : out of ccbs\n;
-   continue;
+   break;
}
 
ccb-ccb_xs = NULL;
@@ -256,10 +266,13 @@ twe_attach(sc)
pb-param_id = 2;
pb-param_size = 2;
 
-   if (twe_cmd(ccb, BUS_DMA_NOWAIT, 1)) {
+   error = twe_cmd(ccb, BUS_DMA_NOWAIT, 1);
+   scsi_io_put(sc-sc_iopool, ccb);
+   if (error) {
errstr = : error draining attention queue\n;
break;
}
+
aen = *(u_int16_t *)pb-data;
TWE_DPRINTF(TWE_D_AEN, (aen=%x , aen));
if (aen == TWE_AEN_SRST)
@@ -305,7 +318,8 @@ twe_attach(sc)
return 1;
}
 
-   if ((ccb = twe_get_ccb(sc)) == NULL) {
+   ccb = scsi_io_get(sc-sc_iopool, 0);
+   if (ccb == NULL) {
printf(: out of ccbs\n);
twe_dispose(sc);
return 1;
@@ -323,7 +337,10 @@ twe_attach(sc)
pb-table_id = TWE_PARAM_UC;
pb-param_id = TWE_PARAM_UC;
pb-param_size = TWE_MAX_UNITS;
-   if (twe_cmd(ccb, BUS_DMA_NOWAIT, 1)) {
+
+   error = twe_cmd(ccb, BUS_DMA_NOWAIT, 1);
+   scsi_io_put(sc-sc_iopool, ccb);
+   if (error) {
printf(: failed to fetch unit parameters\n);
twe_dispose(sc);
return 1;
@@ -336,7 +353,8 @@ twe_attach(sc)
if (pb-data[i] == 0)
continue;
 
-   if ((ccb = twe_get_ccb(sc)) == NULL) {
+   ccb = scsi_io_get(sc-sc_iopool, 0);
+   if (ccb == NULL) {
printf(: out of ccbs\n);
twe_dispose(sc);
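
The get/put pair that an iopool is built from, a free list behind a mutex with `void *` callbacks, can be modelled in userland roughly like this. It is a sketch only: `struct fake_softc`, `fake_ccb_get()` and `fake_ccb_put()` are invented names, and a pthread mutex stands in for the kernel mutex; none of this is the scsi midlayer API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct fake_ccb {
	TAILQ_ENTRY(fake_ccb) link;
	int busy;
};

struct fake_softc {
	pthread_mutex_t mtx;		/* stands in for sc_ccb_mtx */
	TAILQ_HEAD(, fake_ccb) free_ccbs;
};

void
fake_init(struct fake_softc *sc, struct fake_ccb *ccbs, size_t n)
{
	size_t i;

	pthread_mutex_init(&sc->mtx, NULL);
	TAILQ_INIT(&sc->free_ccbs);
	for (i = 0; i < n; i++) {
		ccbs[i].busy = 0;
		TAILQ_INSERT_TAIL(&sc->free_ccbs, &ccbs[i], link);
	}
}

/* "get" callback: hand out a free ccb, or NULL when none are left. */
void *
fake_ccb_get(void *xsc)
{
	struct fake_softc *sc = xsc;
	struct fake_ccb *ccb;

	pthread_mutex_lock(&sc->mtx);
	ccb = TAILQ_FIRST(&sc->free_ccbs);
	if (ccb != NULL) {
		TAILQ_REMOVE(&sc->free_ccbs, ccb, link);
		ccb->busy = 1;
	}
	pthread_mutex_unlock(&sc->mtx);

	return (ccb);
}

/* "put" callback: return a ccb to the free list. */
void
fake_ccb_put(void *xsc, void *xccb)
{
	struct fake_softc *sc = xsc;
	struct fake_ccb *ccb = xccb;

	assert(ccb != NULL && ccb->busy);
	ccb->busy = 0;
	pthread_mutex_lock(&sc->mtx);
	TAILQ_INSERT_TAIL(&sc->free_ccbs, ccb, link);
	pthread_mutex_unlock(&sc->mtx);
}
```

With callbacks shaped like these, the command path no longer has to cope with a NULL ccb itself; whoever owns the pool decides what to do when get returns NULL.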
  

Re: ahci big mem friendlification

2011-04-03 Thread David Gwynne
ahci_port_read_ncq_error is used from interrupt context, so you either have to
preallocate it during port attach (hi kettenis!) or fix your flags.

dlg

On 03/04/2011, at 11:38 PM, Kenneth R Westerback wrote:

 Another allocation/memory use made big mem friendly.

  Ken

 Index: ahci.c
 ===
 RCS file: /cvs/src/sys/dev/pci/ahci.c,v
 retrieving revision 1.172
 diff -u -p -r1.172 ahci.c
 --- ahci.c28 Jan 2011 06:32:31 -  1.172
 +++ ahci.c3 Apr 2011 13:35:05 -
 @@ -27,6 +27,7 @@
 #include <sys/timeout.h>
 #include <sys/queue.h>
 #include <sys/mutex.h>
 +#include <sys/pool.h>

 #include <machine/bus.h>

 @@ -391,8 +392,6 @@ struct ahci_port {
   u_int32_t   ap_err_saved_active;
   u_int32_t   ap_err_saved_active_cnt;

 - u_int8_tap_err_scratch[512];
 -
 #ifdef AHCI_DEBUG
   charap_name[16];
 #define PORTNAME(_ap) ((_ap)-ap_name)
 @@ -3098,9 +3097,10 @@ ahci_port_read_ncq_error(struct ahci_por
 {
   struct ahci_ccb *ccb;
   struct ahci_cmd_hdr *cmd_slot;
 - u_int32_t   cmd;
 + u_int8_t*ap_err_scratch = NULL;
   struct ata_fis_h2d  *fis;
   int rc = EIO;
 + u_int32_t   cmd;

   DPRINTF(AHCI_D_VERBOSE, %s: read log page\n, PORTNAME(ap));

 @@ -3112,10 +3112,11 @@ ahci_port_read_ncq_error(struct ahci_por
   ahci_port_start(ap, 0);

   /* Prep error CCB for READ LOG EXT, page 10h, 1 sector. */
 + ap_err_scratch = dma_alloc(DEV_BSIZE, PR_WAITOK | PR_ZERO);
   ccb = ahci_get_err_ccb(ap);
   ccb->ccb_xa.flags = ATA_F_NOWAIT | ATA_F_READ | ATA_F_POLL;
 - ccb->ccb_xa.data = ap->ap_err_scratch;
 - ccb->ccb_xa.datalen = 512;
 + ccb->ccb_xa.data = ap_err_scratch;
 + ccb->ccb_xa.datalen = DEV_BSIZE;
   cmd_slot = ccb->ccb_cmd_hdr;
   bzero(ccb->ccb_cmd_table, sizeof(struct ahci_cmd_table));

 @@ -3160,7 +3161,7 @@ err:
   struct ata_log_page_10h *log;
   int err_slot;

 - log = (struct ata_log_page_10h *)ap->ap_err_scratch;
 + log = (struct ata_log_page_10h *)ap_err_scratch;
   if (ISSET(log->err_regs.type, ATA_LOG_10H_TYPE_NOTQUEUED)) {
   /* Not queued bit was set - wasn't an NCQ error? */
   printf(%s: read NCQ error page, but not an NCQ 
 @@ -3181,6 +3182,9 @@ err:

   /* Restore saved CMD register state */
   ahci_pwrite(ap, AHCI_PREG_CMD, cmd);
 +
 + if (ap_err_scratch)
 + dma_free(ap_err_scratch, DEV_BSIZE);

   return (rc);
 }



Re: ahci big mem friendlification

2011-04-03 Thread David Gwynne
ok

On 04/04/2011, at 1:41 AM, Kenneth R Westerback wrote:

 On Sun, Apr 03, 2011 at 09:38:44AM -0400, Kenneth R Westerback wrote:
 Another allocation/memory use made big mem friendly.
 
  Ken
 
 Try to avoid allocating memory in interrupt context, as pointed out
 by dlg@.
 
  Ken
 
 Index: ahci.c
 ===
 RCS file: /cvs/src/sys/dev/pci/ahci.c,v
 retrieving revision 1.172
 diff -u -p -r1.172 ahci.c
 --- ahci.c28 Jan 2011 06:32:31 -  1.172
 +++ ahci.c3 Apr 2011 15:33:56 -
 @@ -27,6 +27,7 @@
 #include <sys/timeout.h>
 #include <sys/queue.h>
 #include <sys/mutex.h>
 +#include <sys/pool.h>
 
 #include <machine/bus.h>
 
 @@ -391,7 +392,7 @@ struct ahci_port {
   u_int32_t   ap_err_saved_active;
   u_int32_t   ap_err_saved_active_cnt;
 
 - u_int8_tap_err_scratch[512];
 + u_int8_t*ap_err_scratch;
 
 #ifdef AHCI_DEBUG
   charap_name[16];
 @@ -1094,6 +1095,12 @@ ahci_port_alloc(struct ahci_softc *sc, u
   DEVNAME(sc), port);
   goto reterr;
   }
 + ap->ap_err_scratch = dma_alloc(DEV_BSIZE, PR_NOWAIT | PR_ZERO);
 + if (ap->ap_err_scratch == NULL) {
 + printf("%s: unable to allocate DMA scratch buf for port %d\n",
 + DEVNAME(sc), port);
 + goto freeport;
 + }
 
 #ifdef AHCI_DEBUG
   snprintf(ap->ap_name, sizeof(ap->ap_name), "%s.%d",
 @@ -1318,6 +1325,8 @@ ahci_port_free(struct ahci_softc *sc, u_
   ahci_dmamem_free(sc, ap->ap_dmamem_rfis);
   if (ap->ap_dmamem_cmd_table)
   ahci_dmamem_free(sc, ap->ap_dmamem_cmd_table);
 + if (ap->ap_err_scratch)
 + dma_free(ap->ap_err_scratch, DEV_BSIZE);
 
   /* bus_space(9) says we dont free the subregions handle */
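
The pattern the revised diff settles on is: allocate the scratch buffer once at port attach, use it from the error path, free it at detach. A userland sketch of that lifecycle follows. `struct fake_port` and the three functions are hypothetical, `calloc()` stands in for the kernel allocator, and the `memset()` only imitates the device writing into the buffer.

```c
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE	512

struct fake_port {
	unsigned char *err_scratch;	/* allocated once, at attach time */
};

/* Attach path: may fail, so the caller can back out cleanly. */
int
port_alloc(struct fake_port *ap)
{
	ap->err_scratch = calloc(1, SCRATCH_SIZE);
	return (ap->err_scratch == NULL ? -1 : 0);
}

/* Error path, possibly in interrupt context: no allocation here. */
int
port_read_error(struct fake_port *ap, unsigned char fill)
{
	if (ap->err_scratch == NULL)
		return (-1);
	/* Imitates the device filling the buffer. */
	memset(ap->err_scratch, fill, SCRATCH_SIZE);
	return (ap->err_scratch[0]);
}

/* Detach path: release the buffer allocated at attach. */
void
port_free(struct fake_port *ap)
{
	free(ap->err_scratch);
	ap->err_scratch = NULL;
}
```

Failing at attach is easy to report and recover from; failing or sleeping inside the error handler is not, which is the point dlg raised about the first version.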



avoiding races with timeout_del()

2011-04-03 Thread David Gwynne
there is an expectation that if you timeout_del the timeout will
not run. however, it doesnt prevent it from being about to run, or
from running on another cpu at the same time as you're doing the
timeout_del. you can check if the timeout ran with timeout_triggered,
but that can race unless you are at splsoftclock or higher.
unfortunately spls dont protect you if you don't hold the big lock.

this adds a return to timeout_del so you can know if the timeout
was pending or not. if you know you went timeout_add and timeout_del
returns 0, then you know you have to wait for your timeout to finish
working (which is your responsibility, not the timeout codes).

discussed with drinking art and art.

a manpage change will follow if this is ok.

ok?

Index: sys/timeout.h
===
RCS file: /cvs/src/sys/sys/timeout.h,v
retrieving revision 1.20
diff -u -p -r1.20 timeout.h
--- sys/timeout.h   26 May 2010 17:50:00 -  1.20
+++ sys/timeout.h   3 Apr 2011 15:59:32 -
@@ -91,7 +91,7 @@ void timeout_add_sec(struct timeout *, i
 void timeout_add_msec(struct timeout *, int);
 void timeout_add_usec(struct timeout *, int);
 void timeout_add_nsec(struct timeout *, int);
-void timeout_del(struct timeout *);
+int timeout_del(struct timeout *);
 
 void timeout_startup(void);
 
Index: kern/kern_timeout.c
===
RCS file: /cvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_timeout.c
--- kern/kern_timeout.c 4 Nov 2009 19:14:10 -   1.32
+++ kern/kern_timeout.c 3 Apr 2011 15:59:32 -
@@ -263,16 +263,21 @@ timeout_add_nsec(struct timeout *to, int
timeout_add(to, to_ticks);
 }
 
-void
+int
 timeout_del(struct timeout *to)
 {
+   int ret = 0;
+
 	mtx_enter(&timeout_mutex);
 	if (to->to_flags & TIMEOUT_ONQUEUE) {
 		CIRCQ_REMOVE(&to->to_list);
 		to->to_flags &= ~TIMEOUT_ONQUEUE;
+		ret = 1;
 	}
 	to->to_flags &= ~TIMEOUT_TRIGGERED;
 	mtx_leave(&timeout_mutex);
+
+   return (ret);
 }
 
 /*
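
How a caller might use the new return value can be sketched with a toy timeout. This is a single-threaded model under assumptions: `struct fake_timeout`, the `TO_ONQUEUE` flag and every function below are hypothetical, and the locking that the real code does with a mutex is left out.

```c
#define TO_ONQUEUE	0x1	/* hypothetical flag: timeout is scheduled */

struct fake_timeout {
	int flags;
};

void
fake_timeout_add(struct fake_timeout *to)
{
	to->flags |= TO_ONQUEUE;
}

/* Models the timer code taking the timeout off the queue to run it. */
void
fake_timeout_fire(struct fake_timeout *to)
{
	to->flags &= ~TO_ONQUEUE;
}

/* Returns 1 if the timeout was still pending and has been cancelled. */
int
fake_timeout_del(struct fake_timeout *to)
{
	int ret = 0;

	if (to->flags & TO_ONQUEUE) {
		to->flags &= ~TO_ONQUEUE;
		ret = 1;
	}
	return (ret);
}

/*
 * Teardown helper: returns 1 when the caller must still wait for the
 * handler to finish, 0 when it is safe to release the timeout's state.
 */
int
must_wait_for_handler(struct fake_timeout *to, int was_added)
{
	return (was_added && fake_timeout_del(to) == 0);
}
```

The waiting itself stays with the caller, as the post says; the return value only tells it whether waiting is needed.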



Re: resurect and fix bce(4)

2011-04-03 Thread David Gwynne
On 04/04/2011, at 5:31 AM, Mark Kettenis wrote:

 Date: Sun, 3 Apr 2011 13:46:40 +0200
 From: Claudio Jeker cje...@diehard.n-r-g.com
 
 bce(4) was turned off because of limitations in the DMA engine that allows
 the chip to access only 1G of memory. On systems with more than 1G of
 memory hilarity ensued.
 
 Now I rewrote the driver to use bcopy() to copy the mbufs into a safely
 allocated DMA memory buffer. So the chip will now work with any amount of
 memory in the machine. It works for me so if you own a bce(4) please test.
 
 +/* Create the data DMA region and maps. */
 +if ((sc->bce_data = (caddr_t)uvm_km_kmemalloc_pla(kernel_map,
 +uvm.kernel_object, (BCE_NTXDESC + BCE_NRXDESC) * MCLBYTES, 0,
 +UVM_KMF_NOWAIT, 0, (paddr_t)(0x4000 - 1), 0, 0, 1)) == NULL) {
 +printf(": unable to alloc space for ring");
 +return;
 
 Sorry, but I think drivers should not call uvm memory allocations
 directly.  Instead we should promote _bus_dmamem_alloc_range() found
 on i386 and amd64 to a first class citizen.

i would like to see this in bus_dma too.

dlg



Re: softraid iopoolification

2011-04-04 Thread David Gwynne
comments inline...

On 04/04/2011, at 9:01 AM, Kenneth R Westerback wrote:

 Works on my crypto volume. People with other volume types would be nice
 to hear from.

  Ken

 Index: softraid.c
 ===
 RCS file: /cvs/src/sys/dev/softraid.c,v
 retrieving revision 1.222
 diff -u -p -r1.222 softraid.c
 --- softraid.c15 Mar 2011 13:29:41 -  1.222
 +++ softraid.c3 Apr 2011 21:07:48 -
 @@ -1805,7 +1805,7 @@ sr_wu_alloc(struct sr_discipline *sd)
   for (i = 0; i  no_wu; i++) {
   wu = sd-sd_wu[i];
   wu-swu_dis = sd;
 - sr_wu_put(wu);
 + sr_wu_put(sd, wu);
   }

   return (0);
 @@ -1833,17 +1833,15 @@ sr_wu_free(struct sr_discipline *sd)
 }

 void
 -sr_wu_put(struct sr_workunit *wu)
 +sr_wu_put(void *xsd, void *xwu)
 {
 - struct sr_discipline*sd = wu-swu_dis;
 + struct sr_discipline*sd = (struct sr_discipline *)xsd;
 + struct sr_workunit  *wu = (struct sr_workunit *)xwu;
   struct sr_ccb   *ccb;
 -
   int s;

   DNPRINTF(SR_D_WU, %s: sr_wu_put: %p\n, DEVNAME(sd-sd_sc), wu);

 - s = splbio();
 -
   wu-swu_xs = NULL;
   wu-swu_state = SR_WU_FREE;
   wu-swu_ios_complete = 0;
 @@ -1864,9 +1862,12 @@ sr_wu_put(struct sr_workunit *wu)
   }
   TAILQ_INIT(wu-swu_ccb);

 + mtx_enter(sd-sd_wu_mtx);
   TAILQ_INSERT_TAIL(sd-sd_wu_freeq, wu, swu_link);
   sd-sd_wu_pending--;
 + mtx_leave(sd-sd_wu_mtx);

 + s = splbio();
   /* wake up sleepers */
 #ifdef DIAGNOSTIC
   if (sd-sd_wu_sleep  0)
 @@ -1874,34 +1875,23 @@ sr_wu_put(struct sr_workunit *wu)
 #endif /* DIAGNOSTIC */
   if (sd-sd_wu_sleep)
   wakeup(sd-sd_wu_sleep);
 -
   splx(s);
 }

 -struct sr_workunit *
 -sr_wu_get(struct sr_discipline *sd, int canwait)
 +void *
 +sr_wu_get(void *xsd)
 {
 + struct sr_discipline*sd = (struct sr_discipline *)xsd;
   struct sr_workunit  *wu;
 - int s;
 -
 - s = splbio();

 - for (;;) {
 - wu = TAILQ_FIRST(sd-sd_wu_freeq);
 - if (wu) {
 - TAILQ_REMOVE(sd-sd_wu_freeq, wu, swu_link);
 - wu-swu_state = SR_WU_INPROGRESS;
 - sd-sd_wu_pending++;
 - break;
 - } else if (wu == NULL  canwait) {
 - sd-sd_wu_sleep++;
 - tsleep(sd-sd_wu_sleep, PRIBIO, sr_wu_get, 0);
 - sd-sd_wu_sleep--;
 - } else
 - break;
 + mtx_enter(&sd->sd_wu_mtx);
 + wu = TAILQ_FIRST(&sd->sd_wu_freeq);
 + if (wu) {
 + TAILQ_REMOVE(&sd->sd_wu_freeq, wu, swu_link);
 + wu->swu_state = SR_WU_INPROGRESS;
 + sd->sd_wu_pending++;
   }
 -
 - splx(s);
 + mtx_leave(&sd->sd_wu_mtx);

   DNPRINTF(SR_D_WU, %s: sr_wu_get: %p\n, DEVNAME(sd-sd_sc), wu);

 @@ -1949,19 +1939,9 @@ sr_scsi_cmd(struct scsi_xfer *xs)
   goto stuffup;
   }

 - /*
 -  * we'll let the midlayer deal with stalls instead of being clever
 -  * and sending sr_wu_get !(xs-flags  SCSI_NOSLEEP) in cansleep
 -  */
 - if ((wu = sr_wu_get(sd, 0)) == NULL) {
 - DNPRINTF(SR_D_CMD, %s: sr_scsi_cmd no wu\n, DEVNAME(sc));
 - xs-error = XS_NO_CCB;
 - sr_scsi_done(sd, xs);
 - return;
 - }
 -
 - xs-error = XS_NOERROR;
 + wu = xs-io;
   wu-swu_xs = xs;
 + xs-error = XS_NOERROR;

scsi_xs_exec sets xs->error to XS_NOERROR before giving it to this func, so
this is technically unnecessary. not an issue with this diff exactly, just an
interesting factoid.


   /* the midlayer will query LUNs so report sense to stop scanning */
   if (link-target != 0 || link-lun != 0) {
 @@ -2049,10 +2029,9 @@ stuffup:
   xs-error = XS_DRIVER_STUFFUP;
   }
 complete:
 - if (wu)
 - sr_wu_put(wu);
   sr_scsi_done(sd, xs);
 }
 +
 int
 sr_scsi_ioctl(struct scsi_link *link, u_long cmd, caddr_t addr, int flag)
 {
 @@ -3042,6 +3021,8 @@ sr_ioctl_createraid(struct sr_softc *sc,
   }

   /* setup scsi midlayer */
 + mtx_init(sd-sd_wu_mtx, IPL_BIO);
 + scsi_iopool_init(sd-sd_iopool, sd, sr_wu_get, sr_wu_put);
   if (sd-sd_openings)
   sd-sd_link.openings = sd-sd_openings(sd);
   else
 @@ -3051,6 +3032,7 @@ sr_ioctl_createraid(struct sr_softc *sc,
   sd-sd_link.adapter = sr_switch;
   sd-sd_link.adapter_target = SR_MAX_LD;
   sd-sd_link.adapter_buswidth = 1;
 + sd-sd_link.pool = sd-sd_iopool;
   bzero(saa, sizeof(saa));
   saa.saa_sc_link = sd-sd_link;

 @@ -3953,11 +3935,17 @@ sr_rebuild_thread(void *arg)
   mysize += sz;
   lba = blk * sz;

 -

Re: no swapping to vnds

2011-04-04 Thread David Gwynne
On 04/04/2011, at 8:43 PM, Thordur Bjornsson wrote:

 On Mon, Apr 04, 2011 at 12:34:17PM +0200, Otto Moerbeek wrote:
 On Mon, Apr 04, 2011 at 09:22:41AM +, Thordur Bjornsson wrote:

 Hi,

 1) Swapping to svnds has issues (pagedaemon deadlocks) and has been
   broken since forever.
 2) Swapping to vnds makes no sense, why add another layer when you
   can just swap to a regular file instead ?

 so stop supporting swapping to vnds. If this turns out to be kosher
 I have a diff tested that removes vnds in favour of svnds.

 I don't know if this is the right check, but the & is redundant to get
 the address of a function.
 It's the easiest check. It's hard to map a dev_t to a device since
 it is MD, so checking for that function is the best way I could
 come up with.

 And doh on the '&'. I'll commit without it.

block drivers look themselves up by comparing to their own functions, so there
is a precedent for doing it this way.
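
a rough sketch of that idiom, for illustration only. the table, the struct and
the two strategy functions below are made-up stand-ins, not the real bdevsw or
vnd(4) code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device switch entry. */
struct blk_switch {
	void (*d_strategy)(int);
};

/* Hypothetical strategy routines for two different drivers. */
static void vnd_like_strategy(int unit) { (void)unit; }
static void sd_like_strategy(int unit) { (void)unit; }

static const struct blk_switch blk_table[] = {
	{ sd_like_strategy },	/* major 0 */
	{ vnd_like_strategy },	/* major 1 */
};

/* Returns 1 if the given major number belongs to the vnd-like driver. */
static int
is_vnd_like(size_t maj)
{
	if (maj >= sizeof(blk_table) / sizeof(blk_table[0]))
		return (0);
	/* A function name decays to its address, so no '&' is needed. */
	return (blk_table[maj].d_strategy == vnd_like_strategy);
}
```

the point is only that the driver is identified by the function pointer in its
switch entry, so nothing has to map a dev_t back to a driver name.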



  -Otto



 OK ?


 Index: uvm/uvm_swap.c
 ===
 RCS file: /home/thib/cvs/src/sys/uvm/uvm_swap.c,v
 retrieving revision 1.100
 diff -u -p -r1.100 uvm_swap.c
 --- uvm/uvm_swap.c  21 Dec 2010 20:14:44 -  1.100
 +++ uvm/uvm_swap.c  4 Apr 2011 09:14:59 -
 @@ -912,6 +912,10 @@ swap_on(struct proc *p, struct swapdev *
 	vp = sdp->swd_vp;
 	dev = sdp->swd_dev;

 +   /* no swapping to vnds. */
 +   if (bdevsw[major(dev)].d_strategy == vndstrategy)
 +   return (EOPNOTSUPP);
 +
 /*
  * open the swap file (mostly useful for block device files to
  * let device driver know what is up).



Re: softraid iopoolification

2011-04-04 Thread David Gwynne
this reads fine by me, except for one thing. i worry that it looks like wu's
get lists of ccbs attached to them that are released when the wu is released.
before iopools there was a wu per call to softraids scsi_cmd handler, but now
the same wu can be given to scsi_cmd multiple times. sr_scsi_done should
probably clean up the ccb list before the xs and wu are given back to the
midlayer.

if this isnt a problem and testing goes well, then it has my ok.

the wakeups in sr_wu_put are no longer necessary as the iopool code takes over
responsibility for sleeping for resources. its just extra code, but it doesnt
affect my ok above.
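
to illustrate the shape of get/put handlers that never sleep, here is a
userland sketch. everything in it (struct wu, struct wu_pool, pool_get,
pool_put) is a hypothetical model, not the softraid or iopool code, and the
comments only mark where a kernel would hold a mutex:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work unit kept on a singly linked free list. */
struct wu {
	struct wu	*next;
	int		 in_use;
};

/* Hypothetical pool; a kernel would guard these fields with a mutex. */
struct wu_pool {
	struct wu	*free_head;
	int		 pending;
};

/* get: never sleeps; returns NULL when empty so the caller can queue. */
static void *
pool_get(void *cookie)
{
	struct wu_pool *p = cookie;
	struct wu *w;

	/* the lock would be taken here */
	w = p->free_head;
	if (w != NULL) {
		p->free_head = w->next;
		w->next = NULL;
		w->in_use = 1;
		p->pending++;
	}
	/* the lock would be released here */

	return (w);
}

/* put: hands the unit back; no wakeup, waiting is the caller's job. */
static void
pool_put(void *cookie, void *io)
{
	struct wu_pool *p = cookie;
	struct wu *w = io;

	w->in_use = 0;
	/* the lock would be taken here */
	w->next = p->free_head;
	p->free_head = w;
	p->pending--;
	/* the lock would be released here */
}
```

because get just returns NULL when the list is empty, there is nobody asleep
inside the pool for put to wake up.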

dlg

On 05/04/2011, at 12:47 AM, Kenneth R Westerback wrote:

 On Sun, Apr 03, 2011 at 07:01:04PM -0400, Kenneth R Westerback wrote:
 Works on my crypto volume. People with other volume types would be nice
 to hear from.

  Ken

 v2. Use scsi_io_[get|put](), stop trying so hard to avoid calling scsi_done()
 at SPLBIO as this is nice but not necessary, remove random 'improvements' to
 make diff as small as possible.

 Tests still desired!

  Ken

 Index: softraid.c
 ===
 RCS file: /cvs/src/sys/dev/softraid.c,v
 retrieving revision 1.222
 diff -u -p -r1.222 softraid.c
 --- softraid.c15 Mar 2011 13:29:41 -  1.222
 +++ softraid.c4 Apr 2011 14:36:04 -
 @@ -1805,7 +1805,7 @@ sr_wu_alloc(struct sr_discipline *sd)
   for (i = 0; i  no_wu; i++) {
   wu = sd-sd_wu[i];
   wu-swu_dis = sd;
 - sr_wu_put(wu);
 + sr_wu_put(sd, wu);
   }

   return (0);
 @@ -1833,9 +1833,10 @@ sr_wu_free(struct sr_discipline *sd)
 }

 void
 -sr_wu_put(struct sr_workunit *wu)
 +sr_wu_put(void *xsd, void *xwu)
 {
 - struct sr_discipline*sd = wu-swu_dis;
 + struct sr_discipline*sd = (struct sr_discipline *)xsd;
 + struct sr_workunit  *wu = (struct sr_workunit *)xwu;
   struct sr_ccb   *ccb;

   int s;
 @@ -1864,9 +1865,14 @@ sr_wu_put(struct sr_workunit *wu)
   }
   TAILQ_INIT(wu-swu_ccb);

 + splx(s);
 +
 + mtx_enter(sd-sd_wu_mtx);
   TAILQ_INSERT_TAIL(sd-sd_wu_freeq, wu, swu_link);
   sd-sd_wu_pending--;
 + mtx_leave(sd-sd_wu_mtx);

 + s = splbio();
   /* wake up sleepers */
 #ifdef DIAGNOSTIC
   if (sd-sd_wu_sleep  0)
 @@ -1878,30 +1884,20 @@ sr_wu_put(struct sr_workunit *wu)
   splx(s);
 }

 -struct sr_workunit *
 -sr_wu_get(struct sr_discipline *sd, int canwait)
 +void *
 +sr_wu_get(void *xsd)
 {
 + struct sr_discipline*sd = (struct sr_discipline *)xsd;
   struct sr_workunit  *wu;
 - int s;

 - s = splbio();
 -
 - for (;;) {
 - wu = TAILQ_FIRST(sd-sd_wu_freeq);
 - if (wu) {
 - TAILQ_REMOVE(sd-sd_wu_freeq, wu, swu_link);
 - wu-swu_state = SR_WU_INPROGRESS;
 - sd-sd_wu_pending++;
 - break;
 - } else if (wu == NULL  canwait) {
 - sd-sd_wu_sleep++;
 - tsleep(sd-sd_wu_sleep, PRIBIO, sr_wu_get, 0);
 - sd-sd_wu_sleep--;
 - } else
 - break;
 + mtx_enter(sd-sd_wu_mtx);
 + wu = TAILQ_FIRST(sd-sd_wu_freeq);
 + if (wu) {
 + TAILQ_REMOVE(sd-sd_wu_freeq, wu, swu_link);
 + wu-swu_state = SR_WU_INPROGRESS;
 + sd-sd_wu_pending++;
   }
 -
 - splx(s);
 + mtx_leave(sd-sd_wu_mtx);

   DNPRINTF(SR_D_WU, %s: sr_wu_get: %p\n, DEVNAME(sd-sd_sc), wu);

 @@ -1949,18 +1945,7 @@ sr_scsi_cmd(struct scsi_xfer *xs)
   goto stuffup;
   }

 - /*
 -  * we'll let the midlayer deal with stalls instead of being clever
 -  * and sending sr_wu_get !(xs-flags  SCSI_NOSLEEP) in cansleep
 -  */
 - if ((wu = sr_wu_get(sd, 0)) == NULL) {
 - DNPRINTF(SR_D_CMD, %s: sr_scsi_cmd no wu\n, DEVNAME(sc));
 - xs-error = XS_NO_CCB;
 - sr_scsi_done(sd, xs);
 - return;
 - }
 -
 - xs-error = XS_NOERROR;
 + wu = xs-io;
   wu-swu_xs = xs;

   /* the midlayer will query LUNs so report sense to stop scanning */
 @@ -2049,8 +2034,6 @@ stuffup:
   xs-error = XS_DRIVER_STUFFUP;
   }
 complete:
 - if (wu)
 - sr_wu_put(wu);
   sr_scsi_done(sd, xs);
 }
 int
 @@ -3042,6 +3025,8 @@ sr_ioctl_createraid(struct sr_softc *sc,
   }

   /* setup scsi midlayer */
 + mtx_init(sd-sd_wu_mtx, IPL_BIO);
 + scsi_iopool_init(sd-sd_iopool, sd, sr_wu_get, sr_wu_put);
   if (sd-sd_openings)
   sd-sd_link.openings = sd-sd_openings(sd);
   else
 @@ -3051,6 +3036,7 @@ sr_ioctl_createraid(struct sr_softc *sc,
   sd-sd_link.adapter = sr_switch;
   sd-sd_link.adapter_target = 

let isp(4) refuse to let the midlayer use high luns

2011-04-05 Thread David Gwynne
...so the io path doesnt have to do it EVERY TIME FOR EVERY IO.
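
the idea, as a standalone sketch: reject the lun once at probe time and the
command path never sees it. struct hba and hba_probe are hypothetical names
for illustration, not the isp(4) code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical adapter state: number of luns the hardware supports. */
struct hba {
	int maxluns;
};

/*
 * Probe-time check: a nonzero return tells the caller not to attach
 * anything at this lun, so no io is ever issued to it.
 */
static int
hba_probe(const struct hba *h, int lun)
{
	if (lun >= h->maxluns)
		return (ENXIO);

	return (0);
}
```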


Index: isp_openbsd.c
===
RCS file: /cvs/src/sys/dev/ic/isp_openbsd.c,v
retrieving revision 1.45
diff -u -p -r1.45 isp_openbsd.c
--- isp_openbsd.c   31 Dec 2010 19:20:42 -  1.45
+++ isp_openbsd.c   5 Apr 2011 11:28:57 -
@@ -61,6 +61,7 @@
 #define	_XT(xs)	((((xs)->timeout/1000) * hz) + (3 * hz))
 
 static void ispminphys(struct buf *, struct scsi_link *);
+static int isp_scsi_probe(struct scsi_link *);
 static void ispcmd_slow(XS_T *);
 static void ispcmd(XS_T *);
 
@@ -94,6 +95,7 @@ isp_attach(struct ispsoftc *isp)
 	struct scsibus_attach_args saa;
 	struct scsi_link *lptr = &isp->isp_osinfo._link[0];
 	isp->isp_osinfo._adapter.scsi_minphys = ispminphys;
+	isp->isp_osinfo._adapter.dev_probe = isp_scsi_probe;
 
 	isp->isp_state = ISP_RUNSTATE;
 
@@ -283,6 +285,17 @@ isp_add2_blocked_queue(struct ispsoftc *
 	xs->free_list.le_next = NULL;
 }
 
+int
+isp_scsi_probe(struct scsi_link *link)
+{
+	struct ispsoftc *isp = (struct ispsoftc *)link->adapter_softc;
+
+	if (link->lun >= isp->isp_maxluns)
+		return (ENXIO);
+
+	return (0);
+}
+
 void
 ispcmd(XS_T *xs)
 {
@@ -298,13 +311,6 @@ ispcmd(XS_T *xs)
 
 	ISP_LOCK(isp);
 
-	if (XS_LUN(xs) >= isp->isp_maxluns) {
-		xs->error = XS_SELTIMEOUT;
-		scsi_done(xs);
-		ISP_UNLOCK(isp);
-		return;
-	}
-
 	if (isp->isp_state < ISP_RUNSTATE) {
 		ISP_DISABLE_INTS(isp);
 		isp_init(isp);



isp isnt out of resources, it just gets busy

2011-04-05 Thread David Gwynne
so use the appropriate define to report that.

Index: isp_openbsd.c
===
RCS file: /cvs/src/sys/dev/ic/isp_openbsd.c,v
retrieving revision 1.46
diff -u -p -r1.46 isp_openbsd.c
--- isp_openbsd.c   5 Apr 2011 12:09:20 -   1.46
+++ isp_openbsd.c   5 Apr 2011 13:01:11 -
@@ -330,7 +330,7 @@ ispcmd(XS_T *xs)
 	 */
 	if (isp->isp_osinfo.blocked) {
 		if (xs->flags & SCSI_POLL) {
-			xs->error = XS_NO_CCB;
+			xs->error = XS_BUSY;
 			scsi_done(xs);
 			ISP_UNLOCK(isp);
 			return;
@@ -401,7 +401,7 @@ isp_polled_cmd(struct ispsoftc *isp, XS_
 		break;
 	case CMD_RQLATER:
 	case CMD_EAGAIN:
-		xs->error = XS_NO_CCB;
+		xs->error = XS_BUSY;
 		/* FALLTHROUGH */
 	case CMD_COMPLETE:
 		scsi_done(xs);



make ahci reserve a ccb for error recovery

2011-04-17 Thread David Gwynne
since atascsi screws it up...

Index: ahci.c
===
RCS file: /cvs/src/sys/dev/pci/ahci.c,v
retrieving revision 1.174
diff -u -p -r1.174 ahci.c
--- ahci.c  7 Apr 2011 15:30:16 -   1.174
+++ ahci.c  18 Apr 2011 01:39:19 -
@@ -373,6 +373,7 @@ struct ahci_port {
 	TAILQ_HEAD(, ahci_ccb)	ap_ccb_free;
 	TAILQ_HEAD(, ahci_ccb)	ap_ccb_pending;
 	struct mutex		ap_ccb_mtx;
+	struct ahci_ccb		*ap_ccb_err;
 
 	u_int32_t		ap_state;
 #define AP_S_NORMAL		0
@@ -863,8 +864,7 @@ noccc:
 	aaa.aaa_methods = &ahci_atascsi_methods;
 	aaa.aaa_minphys = NULL;
 	aaa.aaa_nports = AHCI_MAX_PORTS;
-	aaa.aaa_ncmds = sc->sc_ncmds;
-	aaa.aaa_capability = ASAA_CAP_NEEDS_RESERVED;
+	aaa.aaa_ncmds = sc->sc_ncmds - 1;
 	if (!(sc->sc_flags & AHCI_F_NO_NCQ) &&
 	    (sc->sc_cap & AHCI_REG_CAP_SNCQ)) {
 		aaa.aaa_capability |= ASAA_CAP_NCQ | ASAA_CAP_PMP_NCQ;
@@ -1292,6 +1292,10 @@ nomem:
 
 	ahci_enable_interrupts(ap);
 
+	/* grab a ccb for use during error recovery */
+	ap->ap_ccb_err = &ap->ap_ccbs[sc->sc_ncmds - 1];
+	TAILQ_REMOVE(&ap->ap_ccb_free, ap->ap_ccb_err, ccb_entry);
+
 freeport:
 	if (rc != 0)
 		ahci_port_free(sc, port);
@@ -1313,6 +1317,9 @@ ahci_port_free(struct ahci_softc *sc, u_
 		ahci_write(sc, AHCI_REG_IS, 1 << port);
 	}
 
+	if (ap->ap_ccb_err)
+		ahci_put_ccb(ap->ap_ccb_err);
+
 	if (ap->ap_ccbs) {
 		while ((ccb = ahci_get_ccb(ap)) != NULL)
 			bus_dmamap_destroy(sc->sc_dmat, ccb->ccb_dmamap);
@@ -3012,12 +3019,11 @@ ahci_get_err_ccb(struct ahci_port *ap)
 	 * Grab a CCB to use for error recovery.  This should never fail, as
 	 * we ask atascsi to reserve one for us at init time.
 	 */
-	err_ccb = ahci_get_ccb(ap);
-	KASSERT(err_ccb != NULL);
+	err_ccb = ap->ap_ccb_err;
 	err_ccb->ccb_xa.flags = 0;
 	err_ccb->ccb_done = ahci_empty_done;
 
-	return err_ccb;
+	return (err_ccb);
 }
 
 void
@@ -3037,8 +3043,10 @@ ahci_put_err_ccb(struct ahci_ccb *ccb)
 		printf("ahci_port_err_ccb_restore but SACT %08x != 0?\n", sact);
 	KASSERT(ahci_pread(ap, AHCI_PREG_CI) == 0);
 
+#ifdef DIAGNOSTIC
 	/* Done with the CCB */
-	ahci_put_ccb(ccb);
+	KASSERT(ccb == ap->ap_ccb_err);
+#endif
 
 	/* Restore outstanding command state */
 	ap->ap_sactive = ap->ap_err_saved_sactive;



Re: eliminate gdt(4) raw_scsi mode

2011-04-19 Thread David Gwynne
ok

On 20/04/2011, at 12:54 AM, Kenneth R Westerback wrote:

 gdt(4) man page says 'transparent raw SCSI mode' is unsupported.
 The code just returns errors to any attempts to submit i/o.  I'm
 pretty sure nobody is going to add support so eliminate the framework
 for it.
 
 Shrinks the iopool diff.
 
 Any dissenting voices?
 
  Ken
 
 Index: share/man/man4/gdt.4
 ===
 RCS file: /cvs/src/share/man/man4/gdt.4,v
 retrieving revision 1.30
 diff -u -p -r1.30 gdt.4
 --- share/man/man4/gdt.4  1 Apr 2011 19:13:58 -   1.30
 +++ share/man/man4/gdt.4  19 Apr 2011 13:50:43 -
 @@ -145,7 +145,7 @@ inspired by the Linux driver by
 .Sh BUGS
 An ISA & EISA front-end is needed.
 .Pp
 -The driver does not yet support transparent raw SCSI mode.
 +The driver does not support transparent raw SCSI mode.
 .Pp
 It would be nice to configure the RAID units after boot
 but the information on how to do that is not public.
 Index: sys/dev/ic/gdt_common.c
 ===
 RCS file: /cvs/src/sys/dev/ic/gdt_common.c,v
 retrieving revision 1.55
 diff -u -p -r1.55 gdt_common.c
 --- sys/dev/ic/gdt_common.c   12 Oct 2010 00:53:32 -  1.55
 +++ sys/dev/ic/gdt_common.c   19 Apr 2011 13:50:44 -
 @@ -82,7 +82,6 @@ int gdt_ioctl_disk(struct gdt_softc *, s
 int   gdt_ioctl_alarm(struct gdt_softc *, struct bioc_alarm *);
 int   gdt_ioctl_setstate(struct gdt_softc *, struct bioc_setstate *);
 #endif /* NBIO > 0 */
 -void gdt_raw_scsi_cmd(struct scsi_xfer *);
 void  gdt_scsi_cmd(struct scsi_xfer *);
 void  gdt_start_ccbs(struct gdt_softc *);
 int   gdt_sync_event(struct gdt_softc *, int, u_int8_t,
 @@ -99,10 +98,6 @@ struct scsi_adapter gdt_switch = {
   gdt_scsi_cmd, gdtminphys, 0, 0,
 };
 
 -struct scsi_adapter gdt_raw_switch = {
 - gdt_raw_scsi_cmd, gdtminphys, 0, 0,
 -};
 -
 int gdt_cnt = 0;
 u_int8_t gdt_polling;
 u_int8_t gdt_from_wait;
 @@ -484,26 +479,6 @@ gdt_attach(struct gdt_softc *sc)
 
   config_found(sc-sc_dev, saa, scsiprint);
 
 - sc-sc_raw_link = malloc(sc-sc_bus_cnt * sizeof (struct scsi_link),
 - M_DEVBUF, M_NOWAIT | M_ZERO);
 - if (sc-sc_raw_link == NULL)
 - panic(gdt_attach);
 -
 - for (i = 0; i  sc-sc_bus_cnt; i++) {
 - /* Fill in the prototype scsi_link. */
 - sc-sc_raw_link[i].adapter_softc = sc;
 - sc-sc_raw_link[i].adapter = gdt_raw_switch;
 - sc-sc_raw_link[i].adapter_target = 7;
 - sc-sc_raw_link[i].openings = 4;/* XXX a guess */
 - sc-sc_raw_link[i].adapter_buswidth =
 - (sc-sc_class  GDT_FC) ? GDT_MAXID : 16;   /* XXX */
 -
 - bzero(saa, sizeof(saa));
 - saa.saa_sc_link = sc-sc_raw_link[i];
 -
 - config_found(sc-sc_dev, saa, scsiprint);
 - }
 -
   gdt_polling = 0;
   return (0);
 }
 @@ -987,43 +962,6 @@ gdt_internal_cache_cmd(struct scsi_xfer 
   }
 
   xs-error = XS_NOERROR;
 -}
 -
 -/* Start a raw SCSI operation */
 -void
 -gdt_raw_scsi_cmd(struct scsi_xfer *xs)
 -{
 - struct scsi_link *link = xs-sc_link;
 - struct gdt_softc *sc = link-adapter_softc;
 - struct gdt_ccb *ccb;
 - int s;
 -
 - GDT_DPRINTF(GDT_D_CMD, (gdt_raw_scsi_cmd ));
 -
 - if (xs-cmdlen  12 /* XXX create #define */) {
 - GDT_DPRINTF(GDT_D_CMD, (CDB too big %p , xs));
 - bzero(xs-sense, sizeof(xs-sense));
 - xs-sense.error_code = SSD_ERRCODE_VALID | SSD_ERRCODE_CURRENT;
 - xs-sense.flags = SKEY_ILLEGAL_REQUEST;
 - xs-sense.add_sense_code = 0x20; /* illcmd, 0x24 illfield */
 - xs-error = XS_SENSE;
 - scsi_done(xs);
 - return;
 - }
 -
 - if ((ccb = gdt_get_ccb(sc, xs-flags)) == NULL) {
 - GDT_DPRINTF(GDT_D_CMD, (no ccb available for %p , xs));
 - xs-error = XS_DRIVER_STUFFUP;
 - scsi_done(xs);
 - return;
 - }
 -
 - xs-error = XS_DRIVER_STUFFUP;
 - s = splbio();
 - scsi_done(xs);
 - gdt_free_ccb(sc, ccb);
 -
 - splx(s);
 }
 
 void
 Index: sys/dev/ic/gdtvar.h
 ===
 RCS file: /cvs/src/sys/dev/ic/gdtvar.h,v
 retrieving revision 1.17
 diff -u -p -r1.17 gdtvar.h
 --- sys/dev/ic/gdtvar.h   12 Aug 2009 17:51:33 -  1.17
 +++ sys/dev/ic/gdtvar.h   19 Apr 2011 13:50:44 -
 @@ -268,7 +268,6 @@ struct gdt_softc {
   struct  device sc_dev;
   void   *sc_ih;
   struct  scsi_link sc_link;  /* Virtual SCSI bus for cache devs */
 - struct  scsi_link *sc_raw_link; /* Raw SCSI busses */
 
   int sc_class;   /* Controller class */
 #define GDT_ISA   0x01



Re: impact of unaligned partitions/slices on 4kB sector drives (wd10ears)

2011-05-14 Thread David Gwynne
On 14/05/2011, at 6:43 PM, Abel Abraham Camarillo Ojeda wrote:

 I'm starting to get angry about the _horrible_ performance on this drive
 (WD10EARS-00Y), some developer ever got a chance to see something about
 this?

don't get angry, it's just a disk.

we changed the default alignment of partitions on all disks to mitigate this
problem. the only issue you may have with a default install on one of these
drives is a small fragment size on the ffs partitions.
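
for anyone wondering what "aligned" means in numbers, a small sketch. it
assumes 512 byte logical sectors on top of 4096 byte physical sectors, and the
helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Round a starting LBA up to the next multiple of align (in sectors). */
static uint64_t
align_up(uint64_t lba, uint64_t align)
{
	return (((lba + align - 1) / align) * align);
}

/*
 * Returns 1 if a partition starting at lba (counted in logical sectors)
 * begins exactly on a physical sector boundary.
 */
static int
is_aligned(uint64_t lba, uint32_t logical, uint32_t physical)
{
	return (((lba * logical) % physical) == 0);
}
```

with those assumptions a partition starting at sector 63 straddles physical
sectors, while one starting at sector 64 does not, so a write of one 4k
filesystem block touches one physical sector instead of two.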

i have had a look at querying disks for their physical and logical block
alignments and offsets, but the WD??EARS-00? drives dont report this info.
according to western digital, the next generation of these drives
(WD??EARS-11? iirc) are supposed to report them. if i ever find a disk that
does report the physical to logical alignment, i might have a look at having
the system make use of those values.

huggz,
dlg


 The original message is at:

 http://marc.info/?l=openbsd-tech&m=126281899324219&w=2

 (I wasn't subscribed to this list back then)

 Thanks.

 OpenBSD 4.9-current (kobj) #0: Sun May  1 14:32:33 CDT 2011
root@maetel.00z:/usr/kobj
 real mem = 1608056832 (1533MB)
 avail mem = 1551196160 (1479MB)
 mainbus0 at root
 bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xfd400 (50 entries)
 bios0: vendor American Megatrends Inc. version V1.3 date 11/15/2010
 bios0: MSI MS-7623
 acpi0 at bios0: rev 0
 acpi0: sleep states S0 S3 S4 S5
 acpi0: tables DSDT FACP APIC MCFG OEMB SRAT HPET SSDT
 acpi0: wakeup devices PCE2(S4) PCE3(S4) PCE4(S4) PCE5(S4) PCE6(S4)
 PCE7(S4) PCE9(S4) PCEA(S4) PCEB(S4) PCEC(S4) SBAZ(S4) PSKE(S4)
 PSMS(S4) ECIR(S4) PS2K(S3) PS2M(S3) P0PC(S4) UHC1(S4) UHC2(S4)
 UHC3(S4) USB4(S4) UHC5(S4) UHC6(S4) UHC7(S4) PWRB(S3)
 acpitimer0 at acpi0: 3579545 Hz, 32 bits
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: AMD Phenom(tm) II X4 955 Processor, 3200.77 MHz
 cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DN
OW
 cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
 64b/line 16-way L2 cache
 cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully
associative
 cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully
associative
 cpu0: apic clock running at 200MHz
 cpu1 at mainbus0: apid 1 (application processor)
 cpu1: AMD Phenom(tm) II X4 955 Processor, 3200.16 MHz
 cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DN
OW
 cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
 64b/line 16-way L2 cache
 cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully
associative
 cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully
associative
 cpu2 at mainbus0: apid 2 (application processor)
 cpu2: AMD Phenom(tm) II X4 955 Processor, 3200.15 MHz
 cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DN
OW
 cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
 64b/line 16-way L2 cache
 cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully
associative
 cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully
associative
 cpu3 at mainbus0: apid 3 (application processor)
 cpu3: AMD Phenom(tm) II X4 955 Processor, 3200.16 MHz
 cpu3:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DN
OW
 cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
 64b/line 16-way L2 cache
 cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully
associative
 cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully
associative
 ioapic0 at mainbus0: apid 4 pa 0xfec0, version 21, 24 pins
 acpimcfg0 at acpi0 addr 0xe000, bus 0-255
 acpihpet0 at acpi0: 14318180 Hz
 acpiprt0 at acpi0: bus 0 (PCI0)
 acpiprt1 at acpi0: bus 1 (P0P1)
 acpiprt2 at acpi0: bus -1 (PCE2)
 acpiprt3 at acpi0: bus -1 (PCE3)
 acpiprt4 at acpi0: bus -1 (PCE4)
 acpiprt5 at acpi0: bus 2 (PCE5)
 acpiprt6 at acpi0: bus -1 (PCE6)
 acpiprt7 at acpi0: bus -1 (PCE7)
 acpiprt8 at acpi0: bus -1 (PCE9)
 acpiprt9 at acpi0: bus -1 (PCEA)
 acpiprt10 at acpi0: bus -1 (PCEB)
 acpiprt11 at acpi0: bus -1 (PCEC)
 acpiprt12 at acpi0: bus 3 (P0PC)
 acpicpu0 at acpi0: PSS
 acpicpu1 at acpi0: PSS
 acpicpu2 at acpi0: PSS
 acpicpu3 at acpi0: PSS
 acpibtn0 at acpi0: PWRB
 pci0 at mainbus0 bus 0
 pchb0 at pci0 dev 0 function 0 AMD RS780 Host rev 0x00
 ppb0 at pci0 dev 1 function 0 AMD RS780 PCIE rev 0x00
 pci1 at ppb0 bus 1
 vga1 at pci1 dev 5 function 0 vendor ATI, unknown product 0x9616 rev 0x00
 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
 wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
 ppb1 at 

smarter query for hds preferred ownership

2011-06-13 Thread David Gwynne
this is based on the solaris code. its a lot less arbitrary than
the stupid linux code.

if this works i'll add back the printing of the physical path, it can be useful 
when debugging wiring issues.
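
the decision itself boils down to two bits in one byte of the vendor page. a
standalone sketch of just that test, using the same bit values as the diff
below but otherwise hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define VPD_STATE_VALID		0x80	/* state byte carries usable info */
#define VPD_STATE_PREFERRED	0x40	/* this path is the preferred one */

/*
 * Returns 0 and stores 1 or 0 in *preferred when the state byte is
 * valid, or -1 (leaving *preferred untouched) when it is not.
 */
static int
path_preferred(uint8_t state, int *preferred)
{
	if (!(state & VPD_STATE_VALID))
		return (-1);

	*preferred = (state & VPD_STATE_PREFERRED) != 0;
	return (0);
}
```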

cheers,
dlg

Index: mpath_hds.c
===
RCS file: /cvs/src/sys/scsi/mpath_hds.c,v
retrieving revision 1.2
diff -u -p -r1.2 mpath_hds.c
--- mpath_hds.c 28 Apr 2011 10:43:36 -  1.2
+++ mpath_hds.c 14 Jun 2011 05:34:00 -
@@ -37,6 +37,17 @@
 #include <scsi/scsiconf.h>
 #include <scsi/mpathvar.h>
 
+#define HDS_VPD		0xe0
+
+struct hds_vpd {
+	struct scsi_vpd_hdr	hdr; /* HDS_VPD */
+	u_int8_t		state;
+#define HDS_VPD_VALID		0x80
+#define HDS_VPD_PREFERRED	0x40
+
+	/* followed by lots of unknown stuff */
+};
+
 struct hds_softc {
 	struct device		sc_dev;
 	struct mpath_path	sc_path;
@@ -80,7 +91,7 @@ struct hds_device {
 	char *product;
 };
 
-int	hds_priority(struct hds_softc *);
+int	hds_preferred(struct scsi_link *, int *);
 
 struct hds_device hds_devices[] = {
 /*	  vendor  device  */
@@ -94,8 +105,9 @@ hds_match(struct device *parent, void *m
 {
 	struct scsi_attach_args *sa = aux;
 	struct scsi_inquiry_data *inq = sa->sa_inqbuf;
+	struct scsi_link *link = sa->sa_sc_link;
 	struct hds_device *s;
-	int i;
+	int i, preferred;
 
 	if (mpath_path_probe(sa->sa_sc_link) != 0)
 		return (0);
@@ -104,8 +116,11 @@ hds_match(struct device *parent, void *m
 		s = &hds_devices[i];
 
 		if (bcmp(s->vendor, inq->vendor, strlen(s->vendor)) == 0 &&
-		    bcmp(s->product, inq->product, strlen(s->product)) == 0)
-			return (3);
+		    bcmp(s->product, inq->product, strlen(s->product)) == 0 &&
+		    hds_preferred(link, &preferred) == 0) {
+			/* match above sym(4) */
+			return (4);
+		}
 	}
 
 	return (0);
@@ -128,7 +143,7 @@ hds_attach(struct device *parent, struct
 	sc->sc_path.p_link = link;
 	sc->sc_path.p_ops = &hds_mpath_ops;
 
-	if (hds_priority(sc) != 0)
+	if (hds_preferred(link, &sc->sc_active) != 0)
 		return;
 
 	if (!sc->sc_active)
@@ -190,68 +205,26 @@ hds_mpath_offline(struct scsi_link *link
 }
 
 int
-hds_priority(struct hds_softc *sc)
+hds_preferred(struct scsi_link *link, int *preferred)
 {
-	u_int8_t *buffer;
-	struct scsi_inquiry *cdb;
-	struct scsi_xfer *xs;
-	size_t length;
-	u_int8_t ldev[9];
-	u_int8_t ctrl;
-	u_int8_t port;
-	int p, c;
+	struct hds_vpd *pg;
 	int error;
 
-	length = MIN(sc->sc_path.p_link->inqdata.additional_length + 5, 255);
-	if (length < 51)
-		return (EIO);
-
-	buffer = dma_alloc(length, PR_WAITOK);
-
-	xs = scsi_xs_get(sc->sc_path.p_link, scsi_autoconf);
-	if (xs == NULL) {
-		error = EBUSY;
-		goto done;
-	}
-
-	cdb = (struct scsi_inquiry *)xs->cmd;
-	cdb->opcode = INQUIRY;
-	_lto2b(length, cdb->length);
-
-	xs->cmdlen = sizeof(*cdb);
-	xs->flags |= SCSI_DATA_IN;
-	xs->data = buffer;
-	xs->datalen = length;
+	pg = dma_alloc(sizeof(*pg), PR_WAITOK);
 
-	error = scsi_xs_sync(xs);
-	scsi_xs_put(xs);
-
-	if (error != 0)
+	error = scsi_inquire_vpd(link, pg, sizeof(*pg), HDS_VPD, scsi_autoconf);
+	if (error)
 		goto done;
 
-	/* XXX magical */
-	bzero(ldev, sizeof(ldev));
-	scsi_strvis(ldev, buffer + 44, 4);
-	ctrl = buffer[49];
-	port = buffer[50];
-
-	if (strlen(ldev) < 4 || ldev[3] < '0' || ldev[3] > 'F' ||
-	    ctrl < '0' || ctrl > '9' ||
-	    port < 'A' || port > 'B') {
-		error = EIO;
+	if (_2btol(pg->hdr.page_length) < sizeof(pg->state) ||
+	    !ISSET(pg->state, HDS_VPD_VALID)) {
+		error = ENXIO;
 		goto done;
 	}
 
-	c = ctrl - '0';
-	p = port - 'A';
-	if ((c & 0x1) == (p & 0x1))
-		sc->sc_active = 1;
-
-	printf("%s: ldev %s, controller %c, port %c\n", DEVNAME(sc), ldev,
-	    ctrl, port);
+	*preferred = ISSET(pg->state, HDS_VPD_PREFERRED);
 
-	error = 0;
 done:
-	dma_free(buffer, length);
+	dma_free(pg, sizeof(*pg));
 	return (error);
 }



dont let sdmmc devices respond to scsi vpd queries

2011-06-14 Thread David Gwynne
ie, check if the VPD bit is set when an inquiry is issued and stop
if it is. adds a free check for the cdblen there too.
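
for reference, the two checks spelled out against a raw cdb. INQUIRY is a 6
byte command with opcode 0x12, and EVPD is bit 0 of byte 1. the enum and the
function name are made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_INQUIRY	0x12	/* SCSI INQUIRY opcode */
#define INQ_EVPD	0x01	/* EVPD bit in byte 1 of the CDB */
#define INQ_CDBLEN	6	/* INQUIRY is a 6 byte command */

enum inq_kind { INQ_BAD, INQ_STANDARD, INQ_VPD };

/* Classify a CDB: malformed, a plain INQUIRY, or a VPD page request. */
static enum inq_kind
classify_inquiry(const uint8_t *cdb, size_t cdblen)
{
	if (cdblen != INQ_CDBLEN || cdb[0] != OP_INQUIRY)
		return (INQ_BAD);
	if (cdb[1] & INQ_EVPD)
		return (INQ_VPD);

	return (INQ_STANDARD);
}
```

an emulated device that only has standard inquiry data to offer would answer
INQ_STANDARD and fail the other two cases.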

i cant even ping my x60 atm, so i cant test. anyone else want to
give it a spin?

Index: sdmmc_scsi.c
===
RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v
retrieving revision 1.26
diff -u -p -r1.26 sdmmc_scsi.c
--- sdmmc_scsi.c25 Oct 2010 10:36:49 -  1.26
+++ sdmmc_scsi.c15 Jun 2011 03:25:28 -
@@ -80,6 +80,7 @@ void  *sdmmc_ccb_alloc(void *);
 void	sdmmc_ccb_free(void *, void *);
 
 void	sdmmc_scsi_cmd(struct scsi_xfer *);
+void	sdmmc_inquiry(struct scsi_xfer *);
 void	sdmmc_start_xs(struct sdmmc_softc *, struct sdmmc_ccb *);
 void	sdmmc_complete_xs(void *);
 void	sdmmc_done_xs(struct sdmmc_ccb *);
@@ -296,7 +297,6 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
 	struct sdmmc_softc *sc = link->adapter_softc;
 	struct sdmmc_scsi_softc *scbus = sc->sc_scsibus;
 	struct sdmmc_scsi_target *tgt = &scbus->sc_tgt[link->target];
-	struct scsi_inquiry_data inq;
 	struct scsi_read_cap_data rcd;
 	u_int32_t blockno;
 	u_int32_t blockcnt;
@@ -327,17 +327,7 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
 		break;
 
 	case INQUIRY:
-		bzero(&inq, sizeof inq);
-		inq.device = T_DIRECT;
-		inq.version = 2;
-		inq.response_format = 2;
-		inq.additional_length = 32;
-		strlcpy(inq.vendor, "SD/MMC ", sizeof(inq.vendor));
-		snprintf(inq.product, sizeof(inq.product),
-		    "Drive #%02d", link->target);
-		strlcpy(inq.revision,, sizeof(inq.revision));
-		bcopy(&inq, xs->data, MIN(xs->datalen, sizeof inq));
-		scsi_done(xs);
+		sdmmc_inquiry(xs);
 		return;
 
 	case TEST_UNIT_READY:
@@ -381,6 +371,39 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
 	ccb->ccb_blockno = blockno;
 
 	sdmmc_start_xs(sc, ccb);
+}
+
+void
+sdmmc_inquiry(struct scsi_xfer *xs)
+{
+	struct scsi_link *link = xs->sc_link;
+	struct scsi_inquiry_data inq;
+	struct scsi_inquiry *cdb = (struct scsi_inquiry *)xs->cmd;
+
+	if (xs->cmdlen != sizeof(*cdb)) {
+		xs->error = XS_DRIVER_STUFFUP;
+		goto done;
+	}
+
+	if (ISSET(cdb->flags, SI_EVPD)) {
+		xs->error = XS_DRIVER_STUFFUP;
+		goto done;
+	}
+
+	bzero(&inq, sizeof inq);
+	inq.device = T_DIRECT;
+	inq.version = 2;
+	inq.response_format = 2;
+	inq.additional_length = 32;
+	strlcpy(inq.vendor, "SD/MMC ", sizeof(inq.vendor));
+	snprintf(inq.product, sizeof(inq.product),
+	    "Drive #%02d", link->target);
+	strlcpy(inq.revision,, sizeof(inq.revision));
+
+	bcopy(&inq, xs->data, MIN(xs->datalen, sizeof(inq)));
+
+done:
+	scsi_done(xs);
 }
 
 void



try to refill bnx(4) when all the mbufs are gone

2011-06-14 Thread David Gwynne
this is like the change i did for ix(4)...

Index: if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.94
diff -u -p -r1.94 if_bnx.c
--- if_bnx.c18 Apr 2011 04:27:31 -  1.94
+++ if_bnx.c15 Jun 2011 04:05:28 -
@@ -363,7 +363,8 @@ int bnx_get_buf(struct bnx_softc *, u_in
 
 int	bnx_init_tx_chain(struct bnx_softc *);
 void	bnx_init_tx_context(struct bnx_softc *);
-void	bnx_fill_rx_chain(struct bnx_softc *);
+void	bnx_fill_rx_chain(struct bnx_softc *, int);
+void	bnx_refill_rx_chain(void *);
 void	bnx_init_rx_context(struct bnx_softc *);
 int	bnx_init_rx_chain(struct bnx_softc *);
 void	bnx_free_rx_chain(struct bnx_softc *);
@@ -933,6 +934,7 @@ bnx_attachhook(void *xsc)
 	ether_ifattach(ifp);
 
 	timeout_set(&sc->bnx_timeout, bnx_tick, sc);
+	timeout_set(&sc->rx_refill, bnx_refill_rx_chain, sc);
 
 	/* Print some important debugging info. */
 	DBRUN(BNX_INFO, bnx_dump_driver_state(sc));
@@ -3233,6 +3235,7 @@ bnx_stop(struct bnx_softc *sc)
 	DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __FUNCTION__);
 
 	timeout_del(&sc->bnx_timeout);
+	timeout_del(&sc->rx_refill);
 
 	ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
 
@@ -3706,6 +3709,7 @@ bnx_get_buf(struct bnx_softc *sc, u_int1
 * last rx_bd entry so that rx_mbuf_ptr and rx_mbuf_map matches)
 * and update our counter.
 */
+	sc->rx_mbuf_alloc++;
 	sc->rx_mbuf_ptr[*chain_prod] = m;
 	sc->rx_mbuf_map[first_chain_prod] = sc->rx_mbuf_map[*chain_prod];
 	sc->rx_mbuf_map[*chain_prod] = map;
@@ -3981,10 +3985,11 @@ bnx_init_rx_context(struct bnx_softc *sc
 /*   Nothing*/
 //
 void
-bnx_fill_rx_chain(struct bnx_softc *sc)
+bnx_fill_rx_chain(struct bnx_softc *sc, int offset)
 {
u_int16_t   prod, chain_prod;
u_int32_t   prod_bseq;
+   int refill = 0;
 #ifdef BNX_DEBUG
int rx_mbuf_alloc_before, free_rx_bd_before;
 #endif
@@ -4007,6 +4012,7 @@ bnx_fill_rx_chain(struct bnx_softc *sc)
break;
}
prod = NEXT_RX_BD(prod);
+   refill = 1;
}
 
 #if 0
@@ -4016,17 +4022,33 @@ bnx_fill_rx_chain(struct bnx_softc *sc)
 	    (free_rx_bd_before - sc->free_rx_bd)));
 #endif
 
-	/* Save the RX chain producer index. */
-	sc->rx_prod = prod;
-	sc->rx_prod_bseq = prod_bseq;
-
-	/* Tell the chip about the waiting rx_bd's. */
-	REG_WR16(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BDIDX, sc->rx_prod);
-	REG_WR(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BSEQ, sc->rx_prod_bseq);
+	if (refill) {
+		/* Save the RX chain producer index. */
+		sc->rx_prod = prod;
+		sc->rx_prod_bseq = prod_bseq;
+
+		/* Tell the chip about the waiting rx_bd's. */
+		REG_WR16(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BDIDX,
+		    sc->rx_prod);
+		REG_WR(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BSEQ,
+		    sc->rx_prod_bseq);
+	} else if (sc->rx_mbuf_alloc < 2)
+		timeout_add(&sc->rx_refill, offset);
 
 	DBPRINT(sc, BNX_EXCESSIVE_RECV, "Exiting %s()\n", __FUNCTION__);
 }
 
+void
+bnx_refill_rx_chain(void *xsc)
+{
+	struct bnx_softc *sc = xsc;
+	int s;
+
+	s = splnet();
+	bnx_fill_rx_chain(sc, 1);
+	splx(s);
+}
+
 //
 /* Allocate memory and initialize the RX data structures.   */
 /*  */
@@ -4071,7 +4093,7 @@ bnx_init_rx_chain(struct bnx_softc *sc)
}
 
 	/* Fill up the RX chain. */
-	bnx_fill_rx_chain(sc);
+	bnx_fill_rx_chain(sc, 1);
 
 	for (i = 0; i < RX_PAGES; i++)
 		bus_dmamap_sync(sc->bnx_dmatag, sc->rx_bd_chain_map[i], 0,
@@ -4120,7 +4142,7 @@ bnx_free_rx_chain(struct bnx_softc *sc)
 			}
 			m_freem(sc->rx_mbuf_ptr[i]);
 			sc->rx_mbuf_ptr[i] = NULL;
-			DBRUNIF(1, sc->rx_mbuf_alloc--);
+			sc->rx_mbuf_alloc--;
 		}
 	}
 
@@ -4335,6 +4357,7 @@ bnx_rx_intr(struct bnx_softc *sc)
 		/* Remove the mbuf from RX chain. */
 		m = sc->rx_mbuf_ptr[sw_chain_cons];
 		sc->rx_mbuf_ptr[sw_chain_cons] = NULL;
+		sc->rx_mbuf_alloc--;
 
/*
 * Frames received on the NetXteme II are prepended 
@@ -4483,7 +4506,6 @@ bnx_rx_int_next_rx:
DBPRINT(sc, BNX_VERBOSE_RECV,
%s(): 

lru/failover path scheduling in mpath(4)

2011-06-15 Thread David Gwynne
the subject line says it all, but happy to explain further if
required.
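
in case it helps, a tiny model of the two policies. SCHED_STAY, SCHED_NEXT and
struct path_set are invented for this sketch and are not the names used in
mpath(4); only the behaviour of "hand out the current path, advance only when
asked to" mirrors the diff below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies: keep using one path, or rotate on every io. */
enum sched { SCHED_STAY, SCHED_NEXT };

/* Hypothetical path set; paths are just indices 0..npaths-1. */
struct path_set {
	int	npaths;
	int	cur;	/* path to hand out next, -1 if there are none */
};

/*
 * Return the current path.  With SCHED_NEXT the cursor also moves on,
 * wrapping at the end; with SCHED_STAY the same path is handed out
 * until a caller explicitly asks for the next one (e.g. after a path
 * has gone away).
 */
static int
next_path(struct path_set *ps, enum sched how)
{
	int p = ps->cur;

	if (p >= 0 && how == SCHED_NEXT)
		ps->cur = (ps->cur + 1) % ps->npaths;

	return (p);
}
```

a failover style device would normally be called with SCHED_STAY and only get
SCHED_NEXT when a command fails because the physical path is gone.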

Index: mpath.c
===
RCS file: /cvs/src/sys/scsi/mpath.c,v
retrieving revision 1.21
diff -u -p -r1.21 mpath.c
--- mpath.c 27 Apr 2011 05:22:24 -  1.21
+++ mpath.c 15 Jun 2011 08:03:06 -
@@ -58,6 +58,7 @@ struct mpath_dev {
 
u_intd_path_count;
 
+   const struct mpath_ops  *d_ops;
struct devid*d_id;
 };
 
@@ -89,7 +90,7 @@ void  mpath_cmd(struct scsi_xfer *);
 void   mpath_minphys(struct buf *, struct scsi_link *);
 intmpath_probe(struct scsi_link *);
 
-struct mpath_path *mpath_next_path(struct mpath_dev *);
+struct mpath_path *mpath_next_path(struct mpath_dev *, int);
 void   mpath_done(struct scsi_xfer *);
 
 struct scsi_adapter mpath_switch = {
@@ -161,7 +162,7 @@ mpath_probe(struct scsi_link *link)
 }
 
 struct mpath_path *
-mpath_next_path(struct mpath_dev *d)
+mpath_next_path(struct mpath_dev *d, int next)
 {
struct mpath_path *p;
 
@@ -169,7 +170,7 @@ mpath_next_path(struct mpath_dev *d)
 		panic("%s: d is NULL", __func__);
 
 	p = d->d_next_path;
-	if (p != NULL) {
+	if (p != NULL && next == MPATH_NEXT) {
 		d->d_next_path = TAILQ_NEXT(p, p_entry);
 		if (d->d_next_path == NULL)
 			d->d_next_path = TAILQ_FIRST(&d->d_paths);
@@ -194,7 +195,7 @@ mpath_cmd(struct scsi_xfer *xs)
 
 	if (ISSET(xs->flags, SCSI_POLL)) {
 		mtx_enter(&d->d_mtx);
-		p = mpath_next_path(d);
+		p = mpath_next_path(d, d->d_ops->op_schedule);
 		mtx_leave(&d->d_mtx);
 		if (p == NULL) {
 			mpath_xs_stuffup(xs);
@@ -232,7 +233,7 @@ mpath_cmd(struct scsi_xfer *xs)
 
 	mtx_enter(&d->d_mtx);
 	SIMPLEQ_INSERT_TAIL(&d->d_ccbs, ccb, c_entry);
-	p = mpath_next_path(d);
+	p = mpath_next_path(d, d->d_ops->op_schedule);
 	mtx_leave(&d->d_mtx);
 
 	if (p != NULL)
@@ -294,11 +295,15 @@ mpath_done(struct scsi_xfer *mxs)
 	struct mpath_ccb *ccb = xs->io;
 	struct mpath_dev *d = mpath_devs[link->target];
 	struct mpath_path *p;
+	int next = d->d_ops->op_schedule;
 
-	if (mxs->error == XS_RESET || mxs->error == XS_SELTIMEOUT) {
+	switch (mxs->error) {
+	case XS_SELTIMEOUT: /* physical path is gone, try the next */
+		next = MPATH_NEXT;
+	case XS_RESET:
 		mtx_enter(&d->d_mtx);
 		SIMPLEQ_INSERT_HEAD(&d->d_ccbs, ccb, c_entry);
-		p = mpath_next_path(d);
+		p = mpath_next_path(d, next);
 		mtx_leave(&d->d_mtx);
 
 		scsi_xs_put(mxs);
@@ -363,7 +368,7 @@ mpath_path_probe(struct scsi_link *link)
 }
 
 int
-mpath_path_attach(struct mpath_path *p)
+mpath_path_attach(struct mpath_path *p, const struct mpath_ops *ops)
 {
 	struct scsi_link *link = p->p_link;
struct mpath_dev *d = NULL;
@@ -381,7 +386,7 @@ mpath_path_attach(struct mpath_path *p)
 		if ((d = mpath_devs[target]) == NULL)
 			continue;
 
-		if (DEVID_CMP(d->d_id, link->id))
+		if (DEVID_CMP(d->d_id, link->id) && d->d_ops == ops)
 			break;
 
 		d = NULL;
@@ -403,6 +408,7 @@ mpath_path_attach(struct mpath_path *p)
 		TAILQ_INIT(&d->d_paths);
 		SIMPLEQ_INIT(&d->d_ccbs);
 		d->d_id = devid_copy(link->id);
+		d->d_ops = ops;
 
 		mpath_devs[target] = d;
 		newdev = 1;
Index: mpath_emc.c
===
RCS file: /cvs/src/sys/scsi/mpath_emc.c,v
retrieving revision 1.5
diff -u -p -r1.5 mpath_emc.c
--- mpath_emc.c 15 Jun 2011 01:10:50 -  1.5
+++ mpath_emc.c 15 Jun 2011 08:03:06 -
@@ -94,11 +94,12 @@ int emc_mpath_checksense(struct scsi_xf
 int		emc_mpath_online(struct scsi_link *);
 int		emc_mpath_offline(struct scsi_link *);
 
-struct mpath_ops emc_mpath_ops = {
+const struct mpath_ops emc_mpath_ops = {
 	"emc",
 	emc_mpath_checksense,
 	emc_mpath_online,
 	emc_mpath_offline,
+	MPATH_ROUNDROBIN
 };
 
 struct emc_device {
@@ -156,7 +157,6 @@ emc_attach(struct device *parent, struct
 	/* init path */
 	scsi_xsh_set(&sc->sc_path.p_xsh, link, emc_mpath_start);
 	sc->sc_path.p_link = link;
-	sc->sc_path.p_ops = &emc_mpath_ops;
 
 	if (emc_sp_info(sc)) {
 		printf("%s: unable to get sp info\n", DEVNAME(sc));
@@ -172,7 +172,7 @@ emc_attach(struct device *parent, struct
 	    sc->sc_sp + 'A', sc->sc_port);
 
 	if (sc->sc_lun_state == EMC_SP_INFO_LUN_STATE_OWNED) {
-		if (mpath_path_attach(&sc->sc_path) != 0)
+		if (mpath_path_attach(&sc->sc_path, &emc_mpath_ops) != 0)
printf(%s: 

Re: lru/failover path scheduling in mpath(4)

2011-06-15 Thread David Gwynne
On 15/06/2011, at 8:16 PM, Mark Kettenis wrote:

> Date: Wed, 15 Jun 2011 18:04:24 +1000
> From: David Gwynne l...@animata.net
>
> the subject line says it all, but happy to explain further if
> required.
>
> Hmm, I'm somewhat confused:
>
> +#define MPATH_ROUNDROBIN	0
> +#define MPATH_NEXT		MPATH_ROUNDROBIN
> +#define MPATH_LRU		1

ah crap, i mean MRU for most recently used. i dont know why LRU keeps coming
out of my fingers.

> What does MPATH_NEXT mean?  Is that the strategy you fall back on if
> the path you're using fails?  What if you have more than 4 paths to a
> disk with a device that only allows a single active path and one of
> them fails?  Are you suddenly going to try roundrobin scheduling of
> IOs?

if you have a group of four paths and you're doing MRU scheduling, only one of
those will be used. if that path goes away (ie, you get XS_SELTIMEOUT from the
physical adapter), MPATH_NEXT moves us to the next available path in the
group.

> Also, I don't quite understand what MPATH_LRU would mean.  Least
> Recently Used?  How is that different from Round Robin?  Don't you
> need a Last Used policy for devices that can only have a single active
> path or devices where there is a significant overhead for switching to
> a different path?

round robin uses all available paths, L^HMRU uses only one of the available
paths until it fails or goes away.
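
The difference between the two policies can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the mpath(4) code: the names `sched_policy`, `path_group`, `pick_path` and `path_failed` are hypothetical, and paths are modelled as plain array indexes instead of a TAILQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names, loosely modelled on the MPATH_* defines. */
enum sched_policy { SCHED_ROUNDROBIN, SCHED_MRU };

struct path_group {
	int	npaths;		/* number of paths in the group */
	int	cur;		/* index of the path to use next */
};

/*
 * Return the index of the path to issue the next command on.
 * Round robin advances the cursor after every pick, so all paths get
 * used.  MRU leaves the cursor alone, so the same path is reused.
 */
int
pick_path(struct path_group *g, enum sched_policy policy)
{
	int p;

	assert(g != NULL && g->npaths > 0);

	p = g->cur;
	if (policy == SCHED_ROUNDROBIN)
		g->cur = (g->cur + 1) % g->npaths;
	return (p);
}

/* Called when the current path goes away: move on to the next one. */
void
path_failed(struct path_group *g)
{
	assert(g != NULL && g->npaths > 0);

	g->cur = (g->cur + 1) % g->npaths;
}
```

With four paths and the MRU policy, `pick_path()` keeps returning the same index until `path_failed()` is called, after which it sticks to the next index.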

dlg



Re: dont let sdmmc devices respond to scsi vpd queries

2011-06-15 Thread David Gwynne
On 15/06/2011, at 9:52 PM, Kenneth R Westerback wrote:

 On Wed, Jun 15, 2011 at 01:27:03PM +1000, David Gwynne wrote:
 ie, check if the VPD bit is set when an inquiry is issued and stop
 if it is. adds a free check for the cdblen there too.

 i cant even ping my x60 atm, so i cant test. anyone else want to
 give it a spin?

 I have several sdmmc devices. What would I look for? i.e. what problem
 does this fix?

the kernel seems to ignore garbage vpd pages just fine, so you wont see
anything visible in dmesg. if you run
https://svn.itee.uq.edu.au/repo/openbsd-scsidump/ against a card before this
diff you'll see it try to parse the inquiry page as vpd pages cos thats what
the device returns. after the diff it should only show the basic inquiry since
thats all the code emulates.
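
As a rough sketch of the check described above (not the driver code itself), the EVPD bit lives in bit 0 of byte 1 of the 6-byte INQUIRY CDB. The function name `inquiry_is_standard` and the macro names are made up for this example; it works on a raw byte array instead of the kernel's `struct scsi_inquiry`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INQUIRY_OPCODE	0x12	/* SCSI INQUIRY operation code */
#define INQUIRY_CDBLEN	6	/* INQUIRY is a 6-byte CDB */
#define INQUIRY_EVPD	0x01	/* EVPD flag: byte 1, bit 0 */

/*
 * Return 1 if the CDB is a plain INQUIRY that an emulated device can
 * answer with its standard inquiry data, 0 otherwise.
 */
int
inquiry_is_standard(const uint8_t *cdb, size_t cdblen)
{
	assert(cdb != NULL);

	if (cdblen != INQUIRY_CDBLEN)
		return (0);	/* wrong CDB length */
	if (cdb[0] != INQUIRY_OPCODE)
		return (0);	/* not an INQUIRY at all */
	if (cdb[1] & INQUIRY_EVPD)
		return (0);	/* a VPD page was requested */
	return (1);
}
```

A caller that emulates only the basic inquiry would fail the command whenever this returns 0, which is roughly what the diff does with XS_DRIVER_STUFFUP.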

dlg


  Ken


 Index: sdmmc_scsi.c
 ===
 RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v
 retrieving revision 1.26
 diff -u -p -r1.26 sdmmc_scsi.c
 --- sdmmc_scsi.c 25 Oct 2010 10:36:49 -  1.26
 +++ sdmmc_scsi.c 15 Jun 2011 03:25:28 -
 @@ -80,6 +80,7 @@ void   *sdmmc_ccb_alloc(void *);
 void sdmmc_ccb_free(void *, void *);

 void sdmmc_scsi_cmd(struct scsi_xfer *);
 +void	sdmmc_inquiry(struct scsi_xfer *);
 void sdmmc_start_xs(struct sdmmc_softc *, struct sdmmc_ccb *);
 void sdmmc_complete_xs(void *);
 void sdmmc_done_xs(struct sdmmc_ccb *);
 @@ -296,7 +297,6 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
  	struct sdmmc_softc *sc = link->adapter_softc;
  	struct sdmmc_scsi_softc *scbus = sc->sc_scsibus;
  	struct sdmmc_scsi_target *tgt = &scbus->sc_tgt[link->target];
 -struct scsi_inquiry_data inq;
  struct scsi_read_cap_data rcd;
  u_int32_t blockno;
  u_int32_t blockcnt;
 @@ -327,17 +327,7 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
  break;

  case INQUIRY:
 -		bzero(&inq, sizeof inq);
 -		inq.device = T_DIRECT;
 -		inq.version = 2;
 -		inq.response_format = 2;
 -		inq.additional_length = 32;
 -		strlcpy(inq.vendor, "SD/MMC ", sizeof(inq.vendor));
 -		snprintf(inq.product, sizeof(inq.product),
 -		    "Drive #%02d", link->target);
 -strlcpy(inq.revision,, sizeof(inq.revision));
 -		bcopy(&inq, xs->data, MIN(xs->datalen, sizeof inq));
 -		scsi_done(xs);
 +		sdmmc_inquiry(xs);
  return;

  case TEST_UNIT_READY:
 @@ -381,6 +371,39 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
  	ccb->ccb_blockno = blockno;
 
  	sdmmc_start_xs(sc, ccb);
 +}
 +
 +void
 +sdmmc_inquiry(struct scsi_xfer *xs)
 +{
 +	struct scsi_link *link = xs->sc_link;
 +	struct scsi_inquiry_data inq;
 +	struct scsi_inquiry *cdb = (struct scsi_inquiry *)xs->cmd;
 +
 +	if (xs->cmdlen != sizeof(*cdb)) {
 +		xs->error = XS_DRIVER_STUFFUP;
 +		goto done;
 +	}
 +
 +	if (ISSET(cdb->flags, SI_EVPD)) {
 +		xs->error = XS_DRIVER_STUFFUP;
 +		goto done;
 +	}
 +
 +	bzero(&inq, sizeof inq);
 +	inq.device = T_DIRECT;
 +	inq.version = 2;
 +	inq.response_format = 2;
 +	inq.additional_length = 32;
 +	strlcpy(inq.vendor, "SD/MMC ", sizeof(inq.vendor));
 +	snprintf(inq.product, sizeof(inq.product),
 +	    "Drive #%02d", link->target);
 +strlcpy(inq.revision,, sizeof(inq.revision));
 +
 +	bcopy(&inq, xs->data, MIN(xs->datalen, sizeof(inq)));
 +
 +done:
 +	scsi_done(xs);
 }

 void



Re: try to refill bnx(4) when all the mbufs are gone

2011-06-15 Thread David Gwynne
On Wed, Jun 15, 2011 at 07:39:12PM +0100, Stuart Henderson wrote:
 On 2011/06/15 14:06, David Gwynne wrote:
  this is like the change i did for ix(4)...
 
  Index: if_bnx.c
 
 can i have the if_bnxreg.h part too, please? :)

yes...

Index: if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.94
diff -u -p -r1.94 if_bnx.c
--- if_bnx.c18 Apr 2011 04:27:31 -  1.94
+++ if_bnx.c15 Jun 2011 21:36:46 -
@@ -363,7 +363,8 @@ int bnx_get_buf(struct bnx_softc *, u_in
 
 int    bnx_init_tx_chain(struct bnx_softc *);
 void   bnx_init_tx_context(struct bnx_softc *);
-void   bnx_fill_rx_chain(struct bnx_softc *);
+void   bnx_fill_rx_chain(struct bnx_softc *, int);
+void   bnx_refill_rx_chain(void *);
 void   bnx_init_rx_context(struct bnx_softc *);
 int    bnx_init_rx_chain(struct bnx_softc *);
 void   bnx_free_rx_chain(struct bnx_softc *);
@@ -933,6 +934,7 @@ bnx_attachhook(void *xsc)
ether_ifattach(ifp);
 
 	timeout_set(&sc->bnx_timeout, bnx_tick, sc);
+	timeout_set(&sc->rx_refill, bnx_refill_rx_chain, sc);
 
/* Print some important debugging info. */
DBRUN(BNX_INFO, bnx_dump_driver_state(sc));
@@ -3233,6 +3235,7 @@ bnx_stop(struct bnx_softc *sc)
 	DBPRINT(sc, BNX_VERBOSE_RESET, "Entering %s()\n", __FUNCTION__);
 
 	timeout_del(&sc->bnx_timeout);
+	timeout_del(&sc->rx_refill);
 
 	ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
 
@@ -3706,6 +3709,7 @@ bnx_get_buf(struct bnx_softc *sc, u_int1
 * last rx_bd entry so that rx_mbuf_ptr and rx_mbuf_map matches)
 * and update our counter.
 */
+	sc->rx_mbuf_alloc++;
 	sc->rx_mbuf_ptr[*chain_prod] = m;
 	sc->rx_mbuf_map[first_chain_prod] = sc->rx_mbuf_map[*chain_prod];
 	sc->rx_mbuf_map[*chain_prod] = map;
@@ -3981,10 +3985,11 @@ bnx_init_rx_context(struct bnx_softc *sc
 /*   Nothing*/
 //
 void
-bnx_fill_rx_chain(struct bnx_softc *sc)
+bnx_fill_rx_chain(struct bnx_softc *sc, int offset)
 {
u_int16_t   prod, chain_prod;
u_int32_t   prod_bseq;
+   int refill = 0;
 #ifdef BNX_DEBUG
int rx_mbuf_alloc_before, free_rx_bd_before;
 #endif
@@ -4007,6 +4012,7 @@ bnx_fill_rx_chain(struct bnx_softc *sc)
break;
}
prod = NEXT_RX_BD(prod);
+   refill = 1;
}
 
 #if 0
@@ -4016,17 +4022,33 @@ bnx_fill_rx_chain(struct bnx_softc *sc)
 		    (free_rx_bd_before - sc->free_rx_bd)));
 #endif
 
-	/* Save the RX chain producer index. */
-	sc->rx_prod = prod;
-	sc->rx_prod_bseq = prod_bseq;
-
-	/* Tell the chip about the waiting rx_bd's. */
-	REG_WR16(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BDIDX, sc->rx_prod);
-	REG_WR(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BSEQ, sc->rx_prod_bseq);
+	if (refill) {
+		/* Save the RX chain producer index. */
+		sc->rx_prod = prod;
+		sc->rx_prod_bseq = prod_bseq;
+
+		/* Tell the chip about the waiting rx_bd's. */
+		REG_WR16(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BDIDX,
+		    sc->rx_prod);
+		REG_WR(sc, MB_RX_CID_ADDR + BNX_L2CTX_HOST_BSEQ,
+		    sc->rx_prod_bseq);
+	} else if (sc->rx_mbuf_alloc < 2)
+		timeout_add(&sc->rx_refill, offset);
 
 	DBPRINT(sc, BNX_EXCESSIVE_RECV, "Exiting %s()\n", __FUNCTION__);
 }
 
+void
+bnx_refill_rx_chain(void *xsc)
+{
+   struct bnx_softc *sc = xsc;
+   int s;
+
+   s = splnet();
+   bnx_fill_rx_chain(sc, 1);
+   splx(s);
+}
+
 //
 /* Allocate memory and initialize the RX data structures.   */
 /*  */
@@ -4071,7 +4093,7 @@ bnx_init_rx_chain(struct bnx_softc *sc)
}
 
/* Fill up the RX chain. */
-   bnx_fill_rx_chain(sc);
+   bnx_fill_rx_chain(sc, 1);
 
 	for (i = 0; i < RX_PAGES; i++)
 		bus_dmamap_sync(sc->bnx_dmatag, sc->rx_bd_chain_map[i], 0,
@@ -4120,7 +4142,7 @@ bnx_free_rx_chain(struct bnx_softc *sc)
}
 			m_freem(sc->rx_mbuf_ptr[i]);
 			sc->rx_mbuf_ptr[i] = NULL;
-			DBRUNIF(1, sc->rx_mbuf_alloc--);
+			sc->rx_mbuf_alloc--;
}
}
 
@@ -4335,6 +4357,7 @@ bnx_rx_intr(struct bnx_softc *sc)
/* Remove the mbuf from RX chain. */
 		m = sc->rx_mbuf_ptr[sw_chain_cons];
 		sc->rx_mbuf_ptr[sw_chain_cons] = NULL;
+   sc-rx_mbuf_alloc

Re: two minor carp and pfsync fixes

2011-06-22 Thread David Gwynne
On 22/06/2011, at 3:52 AM, Stefan Rinkes wrote:

> Hi,
>
> while playing around with carp and pfsync I spotted
> two minor bugs.
>
> 1. Not all pfstate flags are synced, cause pfsync uses
>    u_int8_t, while pf uses u_int16_t for state_flags.
>    Currently that means PFSTATE_SCRUB_TCP flags don't
>    get synced.
>
> retrieving revision 1.333
> diff -u -p -r1.333 pfvar.h
> --- sys/net/pfvar.h 20 Jun 2011 19:03:41 -  1.333
> +++ sys/net/pfvar.h 21 Jun 2011 17:33:31 -
> @@ -892,13 +892,13 @@ struct pfsync_state {
>         u_int8_t         proto;
>         u_int8_t         direction;
>         u_int8_t         log;
> -       u_int8_t         state_flags;
> +       u_int16_t        state_flags;
>         u_int8_t         timeout;
>         u_int8_t         sync_flags;
>         u_int8_t         updates;
>         u_int8_t         min_ttl;
>         u_int8_t         set_tos;
> -       u_int8_t         pad[4];
> +       u_int8_t         pad[3];
>  } __packed;
>
>  #define PFSYNC_FLAG_SRCNODE     0x04


this diff is not ok.

you changed the wire format, didnt consider endianness, and it looks like
you put a multibyte value on an unaligned boundary.
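
The alignment and byte order concerns can be illustrated with a small sketch. The struct below is a hypothetical cut-down record, not the real `struct pfsync_state`, and `put_be16`/`get_be16` are helper names invented for the example. It assumes a compiler that understands `__attribute__((packed))` (gcc or clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed wire record; NOT the real struct pfsync_state. */
struct wire_rec {
	uint8_t		proto;
	uint8_t		direction;
	uint8_t		log;
	uint16_t	flags;		/* ends up at odd offset 3 */
	uint8_t		pad[3];
} __attribute__((packed));

/*
 * Store a 16-bit value in network byte order one byte at a time.
 * This works at any offset and on hosts of either endianness.
 */
void
put_be16(uint8_t *p, uint16_t v)
{
	assert(p != NULL);

	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Read back a 16-bit value stored by put_be16(). */
uint16_t
get_be16(const uint8_t *p)
{
	assert(p != NULL);

	return ((uint16_t)((p[0] << 8) | p[1]));
}
```

`offsetof(struct wire_rec, flags)` is 3 here, so a direct 16-bit load or store on that field is unaligned, and copying the host value straight into it would also put the bytes on the wire in host order.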

i'll have a look at the missing flags problem myself in the next few days.

> 2. If you are using IP balanced carp and set one of
>    the interfaces to down, the mbufs still reach pf.
>    Cause carp_ourether() returns NULL if the interface
>    is down and the mbufs get copied by carp_input(),
>    cause the M_MCAST flag is set. The copied mbuf is
>    dropped in ether_input() since the carp interface is down
>    and the original mbuf reaches pf. IMHO carp should always
>    take care of mbufs with its MAC address, else the machine has
>    to do some unnecessary work.
>
> retrieving revision 1.184
> diff -u -p -r1.184 ip_carp.c
> --- sys/netinet/ip_carp.c   4 May 2011 16:05:49 -   1.184
> +++ sys/netinet/ip_carp.c   21 Jun 2011 17:34:42 -
> @@ -1514,9 +1514,7 @@ carp_ourether(void *v, struct ether_head
>
>         TAILQ_FOREACH(vh, &cif->vhif_vrs, sc_list) {
>                 struct carp_vhost_entry *vhe;
> -               if ((vh->sc_if.if_flags & (IFF_UP|IFF_RUNNING)) !=
> -                   (IFF_UP|IFF_RUNNING))
> -                       continue;
> +
>                 if (vh->sc_balancing == CARP_BAL_ARP) {
>                         LIST_FOREACH(vhe, &vh->carp_vhosts, vhost_entries)
>                                 if (vhe->state == MASTER &&

this looks reasonable to me. mcbride, mpf, could you chip in on this?

dlg



Re: ansi some files in dev

2011-06-22 Thread David Gwynne
ok

On 23/06/2011, at 3:06 AM, Ted Unangst wrote:

 checked with md5, before line folding caused differences.


 Index: cninit.c
 ===
 RCS file: /home/tedu/cvs/src/sys/dev/cninit.c,v
 retrieving revision 1.10
 diff -u -r1.10 cninit.c
 --- cninit.c  26 Jun 2010 23:24:44 -  1.10
 +++ cninit.c  22 Jun 2011 17:03:06 -
 @@ -54,7 +54,7 @@
 struct consdev *cn_tab = NULL;

 void
 -cninit()
 +cninit(void)
 {
   struct consdev *cp;

 @@ -82,8 +82,7 @@
 }

 int
 -cnset(dev)
 - dev_t dev;
 +cnset(dev_t dev)
 {
   struct consdev *cp;

 Index: ksyms.c
 ===
 RCS file: /home/tedu/cvs/src/sys/dev/ksyms.c,v
 retrieving revision 1.20
 diff -u -r1.20 ksyms.c
 --- ksyms.c   26 Dec 2010 15:41:00 -  1.20
 +++ ksyms.c   22 Jun 2011 17:01:08 -
 @@ -62,8 +62,7 @@

 /*ARGSUSED*/
 void
 -ksymsattach(num)
 - int num;
 +ksymsattach(int num)
 {

 #if defined(__sparc64__) || defined(__mips__)
 @@ -152,10 +151,7 @@

 /*ARGSUSED*/
 int
 -ksymsopen(dev, flag, mode, p)
 - dev_t dev;
 - int flag, mode;
 - struct proc *p;
 +ksymsopen(dev_t dev, int flag, int mode, struct proc *p)
 {

   /* There are no non-zero minor devices */
 @@ -175,10 +171,7 @@

 /*ARGSUSED*/
 int
 -ksymsclose(dev, flag, mode, p)
 - dev_t dev;
 - int flag, mode;
 - struct proc *p;
 +ksymsclose(dev_t dev, int flag, int mode, struct proc *p)
 {

   return (0);
 @@ -186,10 +179,7 @@

 /*ARGSUSED*/
 int
 -ksymsread(dev, uio, flags)
 - dev_t dev;
 - struct uio *uio;
 - int flags;
 +ksymsread(dev_t dev, struct uio *uio, int flags)
 {
   int error;
   size_t len;
 @@ -218,32 +208,3 @@

   return (0);
 }
 -
 -/* XXX - not yet */
 -#if 0
 -paddr_t
 -ksymsmmap(dev, off, prot)
 - dev_t dev;
 - off_t off;
 - int prot;
 -{
 - vaddr_t va;
 - paddr_t pa;
 -
 -	if (off < 0)
 -		return (-1);
 -	if (off >= ksym_head_size + ksym_syms_size)
 -		return (-1);
 -
 -	if ((vaddr_t)off < ksym_head_size) {
 -		va = (vaddr_t)ksym_head + off;
 -	} else {
 -		va = (vaddr_t)ksym_syms + off;
 -	}
 -
 -	if (pmap_extract(pmap_kernel, va, &pa) == FALSE)
 -		panic("ksymsmmap: unmapped page");
 -
 - return (pa);
 -}
 -#endif
 Index: sequencer.c
 ===
 RCS file: /home/tedu/cvs/src/sys/dev/sequencer.c,v
 retrieving revision 1.20
 diff -u -r1.20 sequencer.c
 --- sequencer.c   18 Nov 2010 21:15:14 -  1.20
 +++ sequencer.c   22 Jun 2011 17:01:38 -
 @@ -1250,7 +1250,7 @@
  */

 int
 -midi_unit_count()
 +midi_unit_count(void)
 {
   return (0);
 }
 Index: systrace.c
 ===
 RCS file: /home/tedu/cvs/src/sys/dev/systrace.c,v
 retrieving revision 1.54
 diff -u -r1.54 systrace.c
 --- systrace.c2 Apr 2011 17:04:35 -   1.54
 +++ systrace.c22 Jun 2011 16:59:32 -
 @@ -199,11 +199,8 @@

 /* ARGSUSED */
 int
 -systracef_read(fp, poff, uio, cred)
 - struct file *fp;
 - off_t *poff;
 - struct uio *uio;
 - struct ucred *cred;
 +systracef_read(struct file *fp, off_t *poff, struct uio *uio,
 +struct ucred *cred)
 {
  	struct fsystrace *fst = (struct fsystrace *)fp->f_data;
   struct str_process *process;
 @@ -250,11 +247,8 @@

 /* ARGSUSED */
 int
 -systracef_write(fp, poff, uio, cred)
 - struct file *fp;
 - off_t *poff;
 - struct uio *uio;
 - struct ucred *cred;
 +systracef_write(struct file *fp, off_t *poff, struct uio *uio,
 +struct ucred *cred)
 {
   return (EIO);
 }
 @@ -265,11 +259,7 @@

 /* ARGSUSED */
 int
 -systracef_ioctl(fp, cmd, data, p)
 - struct file *fp;
 - u_long cmd;
 - caddr_t data;
 - struct proc *p;
 +systracef_ioctl(struct file *fp, u_long cmd, caddr_t data, struct proc *p)
 {
   int ret = 0;
  	struct fsystrace *fst = (struct fsystrace *)fp->f_data;
 @@ -409,10 +399,7 @@

 /* ARGSUSED */
 int
 -systracef_poll(fp, events, p)
 - struct file *fp;
 - int events;
 - struct proc *p;
 +systracef_poll(struct file *fp, int events, struct proc *p)
 {
  	struct fsystrace *fst = (struct fsystrace *)fp->f_data;
   int revents = 0;
 @@ -434,28 +421,21 @@

 /* ARGSUSED */
 int
 -systracef_kqfilter(fp, kn)
 - struct file *fp;
 - struct knote *kn;
 +systracef_kqfilter(struct file *fp, struct knote *kn)
 {
   return (1);
 }

 /* ARGSUSED */
 int
 -systracef_stat(fp, sb, p)
 - struct file *fp;
 - struct stat *sb;
 - struct proc *p;
 +systracef_stat(struct file *fp, struct stat *sb, struct proc *p)
 {
   return (EOPNOTSUPP);
 }

 /* ARGSUSED */
 int
 -systracef_close(fp, p)
 - struct file *fp;
 - struct proc *p;
 +systracef_close(struct file *fp, struct proc *p)
 {
   struct fsystrace *fst = (struct fsystrace 

Re: ahci.c: intel_3400_4 needs same flags as intel_3400_1 to avoid a 30 sec boot hang

2011-06-23 Thread David Gwynne
you dawe,

you could point both chips at the same function...
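
A minimal sketch of that suggestion, using a made-up device table instead of the real ahci(4) one: two table entries simply name the same attach routine. Every identifier here (`softc`, `F_IPMS_PROBE`, the product ids, `dev_lookup`) is hypothetical and only stands in for the driver's own names.

```c
#include <assert.h>
#include <stddef.h>

struct softc {
	int	flags;
};

#define F_IPMS_PROBE		0x01	/* stand-in for AHCI_F_IPMS_PROBE */

/* Hypothetical product ids, for illustration only. */
#define PRODUCT_3400_AHCI_1	1
#define PRODUCT_3400_AHCI_4	4

typedef int (*attach_fn)(struct softc *);

/* One attach routine shared by every 3400-series entry. */
int
intel_3400_attach(struct softc *sc)
{
	assert(sc != NULL);

	sc->flags |= F_IPMS_PROBE;
	return (0);
}

struct dev_entry {
	int		product;
	attach_fn	attach;
};

const struct dev_entry dev_table[] = {
	{ PRODUCT_3400_AHCI_1,	intel_3400_attach },
	{ PRODUCT_3400_AHCI_4,	intel_3400_attach },
};

/* Look up the attach routine for a product id, or NULL if unknown. */
attach_fn
dev_lookup(int product)
{
	size_t i;

	for (i = 0; i < sizeof(dev_table) / sizeof(dev_table[0]); i++)
		if (dev_table[i].product == product)
			return (dev_table[i].attach);
	return (NULL);
}
```

Adding another chip with the same quirk then costs one table line and no new function.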

dlg

On 23/06/2011, at 10:50 PM, Dawe wrote:

 Hi,
 the intel_3400_4 has the same issue as the intel_3400_1, ahci(4)
 hangs for 30 seconds on boot and resume. See also PR6630.

 Index: ahci.c
 ===
 RCS file: /cvs/src/sys/dev/pci/ahci.c,v
 retrieving revision 1.180
 diff -u -p -r1.180 ahci.c
 --- ahci.c14 Jun 2011 10:40:14 -  1.180
 +++ ahci.c23 Jun 2011 12:34:49 -
 @@ -458,6 +458,8 @@ int   ahci_amd_hudson2_attach(struct 
 ahc
   struct pci_attach_args *);
 int   ahci_intel_3400_1_attach(struct ahci_softc *,
   struct pci_attach_args *);
 +int  ahci_intel_3400_4_attach(struct ahci_softc *,
 + struct pci_attach_args *);
 int   ahci_nvidia_mcp_attach(struct ahci_softc *,
   struct pci_attach_args *);

 @@ -482,6 +484,8 @@ static const struct ahci_device ahci_dev

   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_3400_AHCI_1,
   NULL,   ahci_intel_3400_1_attach },
 + { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_3400_AHCI_4,
 + NULL,   ahci_intel_3400_4_attach },

   { PCI_VENDOR_NVIDIA,PCI_PRODUCT_NVIDIA_MCP65_AHCI_2,
   NULL,   ahci_nvidia_mcp_attach },
 @@ -717,6 +721,13 @@ ahci_amd_hudson2_attach(struct ahci_soft

 int
 ahci_intel_3400_1_attach(struct ahci_softc *sc, struct pci_attach_args *pa)
 +{
 +	sc->sc_flags |= AHCI_F_IPMS_PROBE;
 +	return (0);
 +}
 +
 +int
 +ahci_intel_3400_4_attach(struct ahci_softc *sc, struct pci_attach_args *pa)
  {
  	sc->sc_flags |= AHCI_F_IPMS_PROBE;
  	return (0);


 OpenBSD 4.9-current (GENERIC.MP) #9: Thu Jun 23 13:06:40 CEST 2011
d...@padtree.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
 real mem = 1998045184 (1905MB)
 avail mem = 1930702848 (1841MB)
 mainbus0 at root
 bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xe0010 (78 entries)
 bios0: vendor LENOVO version 6IET68WW (1.28 ) date 07/12/2010
 bios0: LENOVO 25184QG
 acpi0 at bios0: rev 2
 acpi0: sleep states S0 S3 S4 S5
 acpi0: tables DSDT FACP SSDT ECDT APIC MCFG HPET ASF! SLIC BOOT SSDT TCPA
SSDT
 SSDT SSDT
 acpi0: wakeup devices LID_(S3) SLPB(S3) UART(S3) IGBE(S4) EXP1(S4) EXP2(S4)
 EXP3(S4) EXP4(S4) EXP5(S4) EHC1(S3) EHC2(S3) HDEF(S4)
 acpitimer0 at acpi0: 3579545 Hz, 24 bits
 acpiec0 at acpi0
 acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz, 2261.37 MHz
 cpu0:

FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3
,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG
 cpu0: 256KB 64b/line 8-way L2 cache
 cpu0: apic clock running at 133MHz
 cpu1 at mainbus0: apid 1 (application processor)
 cpu1: Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz, 2261.00 MHz
 cpu1:

FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3
,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG
 cpu1: 256KB 64b/line 8-way L2 cache
 cpu2 at mainbus0: apid 4 (application processor)
 cpu2: Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz, 2261.00 MHz
 cpu2:

FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3
,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG
 cpu2: 256KB 64b/line 8-way L2 cache
 cpu3 at mainbus0: apid 5 (application processor)
 cpu3: Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz, 2261.00 MHz
 cpu3:

FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUS
H,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3
,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG
 cpu3: 256KB 64b/line 8-way L2 cache
 ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins
 ioapic0: misconfigured as apic 2, remapped to apid 1
 acpimcfg0 at acpi0 addr 0xe000, bus 0-255
 acpihpet0 at acpi0: 14318179 Hz
 acpiprt0 at acpi0: bus 0 (PCI0)
 acpiprt1 at acpi0: bus -1 (PEG_)
 acpiprt2 at acpi0: bus 2 (EXP1)
 acpiprt3 at acpi0: bus 3 (EXP2)
 acpiprt4 at acpi0: bus -1 (EXP3)
 acpiprt5 at acpi0: bus 5 (EXP4)
 acpiprt6 at acpi0: bus 13 (EXP5)
 acpicpu0 at acpi0: C3, C1, PSS
 acpicpu1 at acpi0: C3, C1, PSS
 acpicpu2 at acpi0: C3, C1, PSS
 acpicpu3 at acpi0: C3, C1, PSS
 acpipwrres0 at acpi0: PUBS
 acpitz0 at acpi0: critical temperature is 100 degC
 acpibtn0 at acpi0: LID_
 acpibtn1 at acpi0: SLPB
 acpibat0 at acpi0: BAT0 not present
 acpibat1 at acpi0: BAT1 not present
 acpiac0 at acpi0: AC unit online
 acpithinkpad0 at acpi0
 cpu0: Enhanced SpeedStep 2261 MHz: speeds: 2267, 2266, 2133, 1999, 1866,
1733,
 1599, 1466, 1333, 1199 MHz
 pci0 at mainbus0 bus 0
 pchb0 at pci0 

include support for ospf6d config files

2011-06-26 Thread David Gwynne
this was surprisingly straightforward, i just copied it from ospfd.

ok?

Index: parse.y
===
RCS file: /cvs/src/usr.sbin/ospf6d/parse.y,v
retrieving revision 1.20
diff -u -p -r1.20 parse.y
--- parse.y 13 Dec 2010 13:43:37 -  1.20
+++ parse.y 27 Jun 2011 01:29:46 -
@@ -121,6 +121,7 @@ typedef struct {
 %token SET TYPE
 %token YES NO
 %token DEMOTE
+%token INCLUDE
 %token ERROR
 %token	<v.string>	STRING
 %token	<v.number>	NUMBER
@@ -131,6 +132,7 @@ typedef struct {
 %%
 
 grammar: /* empty */
+   | grammar include '\n'
| grammar '\n'
| grammar conf_main '\n'
| grammar varset '\n'
@@ -138,6 +140,21 @@ grammar: /* empty */
 		| grammar error '\n'		{ file->errors++; }
;
 
+include		: INCLUDE STRING		{
+			struct file	*nfile;
+
+			if ((nfile = pushfile($2, 1)) == NULL) {
+				yyerror("failed to include file %s", $2);
+				free($2);
+				YYERROR;
+			}
+			free($2);
+
+			file = nfile;
+			lungetc('\n');
+		}
+		;
+
 string		: string STRING	{
 			if (asprintf(&$$, "%s %s", $1, $2) == -1) {
 				free($1);
@@ -526,6 +543,7 @@ lookup(char *s)
 		{"external-tag",	EXTTAG},
 		{"fib-update",		FIBUPDATE},
 		{"hello-interval",	HELLOINTERVAL},
+		{"include",		INCLUDE},
 		{"interface",		INTERFACE},
 		{"metric",		METRIC},
 		{"no",			NO},



Re: OpenOSPF6d does not send LSAs for passive interfaces

2011-06-28 Thread David Gwynne
this works great for me. i'll pressure claudio@ to have a look at it over the
next week or two.

cheers,
dlg

On 25/04/2011, at 8:44 PM, Patrick Coleman wrote:

 On Wed, Jan 5, 2011 at 8:32 PM, Jan Johansson janj+open...@wenf.org
wrote:

 So I found a bug here.

 Your mk2 patch (didn't try the mk1) does not advertise gif
 tunnels this works with the unpatched binary.

 Apologies for the delay on this one - finally got around to setting up
 a test environment today. See [1] for an updated patch.

 I've reworked the logic a bit (basically, it wasn't correctly dealing
 with interfaces with unknown physical link states before). I've tested
 the patch and it works with CARP and loopback interfaces, and gif
 interfaces should work the same.

 Comments appreciated.

 Cheers,

 Patrick

 1. Also uploaded to
 http://patrick.ld.net.au/ospf6d-fix-passive-interfaces-mk3.patch

 Index: interface.c
 ===
 RCS file: /cvs/src/usr.sbin/ospf6d/interface.c,v
 retrieving revision 1.15
 diff -u -p -r1.15 interface.c
 --- interface.c   20 Sep 2009 20:45:06 -  1.15
 +++ interface.c   25 Apr 2011 10:35:00 -
 @@ -145,11 +145,15 @@ if_fsm(struct iface *iface, enum iface_e
  	if (iface->state != old_state) {
  		orig_rtr_lsa(iface);
  		orig_link_lsa(iface);
 -
 -		/* state change inform RDE */
 -		ospfe_imsg_compose_rde(IMSG_IFINFO,
 -		    iface->self->peerid, 0, iface, sizeof(struct iface));
  	}
 +
 +	/*
 +	 * Send interface update to RDE regardless of whether state changes - a
 +	 * passive interface will remain in the DOWN state but may need to have
 +	 * prefix LSAs sent regardless.
 +	 */
 +	ospfe_imsg_compose_rde(IMSG_IFINFO,
 +	    iface->self->peerid, 0, iface, sizeof(struct iface));
 
  	if (old_state & (IF_STA_MULTI | IF_STA_POINTTOPOINT) &&
  	    (iface->state & (IF_STA_MULTI | IF_STA_POINTTOPOINT)) == 0)
 Index: rde.c
 ===
 RCS file: /cvs/src/usr.sbin/ospf6d/rde.c,v
 retrieving revision 1.50
 diff -u -p -r1.50 rde.c
 --- rde.c 22 Aug 2010 20:55:10 -  1.50
 +++ rde.c 25 Apr 2011 10:35:03 -
 @@ -22,6 +22,7 @@
  #include <sys/socket.h>
  #include <sys/queue.h>
  #include <sys/param.h>
 +#include <net/if_types.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <err.h>
 @@ -587,11 +588,20 @@ rde_dispatch_imsg(int fd, short event, v
  			iface = if_find(ifp->ifindex);
  			if (iface == NULL)
  				fatalx("interface lost in rde");
 -			iface->flags = ifp->flags;
 -			iface->linkstate = ifp->linkstate;
  			iface->nh_reachable = ifp->nh_reachable;
 -			if (iface->state != ifp->state) {
 +
 +			/*
 +			 * Resend LSAs if interface flags change - carp/passive interfaces
 +			 * can come up and down without changing state.
 +			 */
 +			if ((iface->state != ifp->state) ||
 +			    (iface->linkstate != ifp->linkstate) ||
 +			    (iface->flags != ifp->flags)) {
 +
  				iface->state = ifp->state;
 +				iface->flags = ifp->flags;
 +				iface->linkstate = ifp->linkstate;
 +
  				area = area_find(rdeconf, iface->area_id);
  				if (!area)
  					fatalx("interface lost area");
 @@ -1459,8 +1469,43 @@ orig_intra_lsa_rtr(struct area *area, st

  	numprefix = 0;
  	LIST_FOREACH(iface, &area->iface_list, entry) {
 -		if (iface->state & IF_STA_DOWN)
 +		/*
 +		 * Do not send a LSA for interfaces that:
 +		 *  - are down (kernel flags)
 +		 *
 +		 *  - are not carp and have a physical link state of down, excluding
 +		 *    unknown interfaces: if an interface has a link state of unknown
 +		 *    then the driver supplies no information about the physical link
 +		 *    state
 +		 *
 +		 *  - are carp and have a physical link state of down or unknown: carp
 +		 *    uses the DOWN state for the backup interface, and the UNKNOWN link
 +		 *    state if something broke
 +		 *
 +		 *  - are in the down state, and are not [carp or marked as passive]:
 +		 *    carp and passive interfaces will always have an OSPF state of
 +		 *    DOWN.
 +		 *
 +		 * Note we recheck interface flags and link state here in addition to
 +		 * if_act_* as passive interfaces can change link state while remaining
 +		 * in IF_STA_DOWN.
 +		 */
 + if (!(iface-flags  

Re: malloc flags: even more strict 'S'

2011-07-12 Thread David Gwynne
i like this.

On 12/07/2011, at 9:23 PM, Otto Moerbeek wrote:

 Hi,
 
 at the cost of some speed, reduce the malloc cache size to 0 with
 flag 'S'.  This means that pages that become free will be unmapped asap.
 This detects more use-after-free bugs. The slowdown is because of more
 unmap/mmap calls. 
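
 The effect of a zero-sized cache can be sketched with a toy model. This is not the libc malloc code: `page_cache` and `page_release` are hypothetical names, and the real cache does more than count pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a free-page cache; not the libc implementation. */
struct page_cache {
	size_t	capacity;	/* max pages kept mapped after free */
	size_t	cached;		/* pages currently held in the cache */
	size_t	unmapped;	/* pages handed back to the kernel */
};

/*
 * Account for one page being freed.  If the cache has room, the page
 * stays mapped, so a stale pointer into it can still be dereferenced
 * without a fault.  Otherwise the page is unmapped and a later access
 * through a stale pointer would fault.
 * Returns 1 if the page was unmapped, 0 if it was cached.
 */
int
page_release(struct page_cache *c)
{
	assert(c != NULL);

	if (c->cached < c->capacity) {
		c->cached++;
		return (0);
	}
	c->unmapped++;
	return (1);
}
```

 With `capacity` set to 64 the first freed page stays mapped; with `capacity` set to 0, as flag 'S' now does, every freed page is unmapped straight away.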
 
 ok?
 
   -Otto
 
 Index: malloc.c
 ===
 RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
 retrieving revision 1.138
 diff -u -p -r1.138 malloc.c
 --- malloc.c  20 Jun 2011 18:04:06 -  1.138
 +++ malloc.c  12 Jul 2011 11:18:41 -
 @@ -68,6 +68,8 @@
 #define MALLOC_MAXCACHE   256
 #define MALLOC_DELAYED_CHUNKS 15  /* max of getrnibble() */
 #define MALLOC_INITIAL_REGIONS	512
 +#define MALLOC_DEFAULT_CACHE 64
 +
 /*
  * When the P option is active, we move allocations between half a page
  * and a whole page towards the end, subject to alignment constraints.
 @@ -461,7 +463,7 @@ omalloc_init(struct dir_info **dp)
*/
   mopts.malloc_abort = 1;
   mopts.malloc_move = 1;
 - mopts.malloc_cache = 64;
 + mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
 
  	for (i = 0; i < 3; i++) {
   switch (i) {
 @@ -551,10 +553,12 @@ omalloc_init(struct dir_info **dp)
   case 's':
   mopts.malloc_freeprot = mopts.malloc_junk = 0;
   mopts.malloc_guard = 0;
 + mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
   break;
   case 'S':
   mopts.malloc_freeprot = mopts.malloc_junk = 1;
   mopts.malloc_guard = MALLOC_PAGESIZE;
 + mopts.malloc_cache = 0;
   break;
   case 'x':
   mopts.malloc_xmalloc = 0;



support specifying scheme/method in apache server configs

2011-07-13 Thread David Gwynne
in my environment i have nginx in front of apache to offload ssl
and to let me easily point different parts of the uri namespace at
all the crazy backends we have. this works fine except when apache
wants to canonicalise something on the ssl backends. because the
ssl is done in nginx, apache doesnt know that it should use https
as the scheme rather than just http, and it redirects the user to
the wrong port.

this diff models the newer apache behaviour of letting you specify
the scheme/method as part of the ServerName directive.

i can set virtualhosts up like this now:

# nginx has an ssl listener on 443 that proxies to this backend
# using plain http.
<VirtualHost _default_:280>
ServerName https://internal.eait.uq.edu.au

# other shizz

</VirtualHost>

with this diff apache canonicalises with https at the start of the
url instead of the default of http.
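
To make the canonicalisation step concrete, here is a small standalone sketch. It is not apache's `ap_construct_url()`; `scheme_default_port` and `build_url` are hypothetical helpers that only show why the scheme and its default port have to agree, and only http and https are modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Default port for a scheme; only http and https are modelled here. */
unsigned
scheme_default_port(const char *scheme)
{
	assert(scheme != NULL);

	return (strcmp(scheme, "https") == 0 ? 443 : 80);
}

/*
 * Build scheme://host[:port]uri into buf, leaving the port out when it
 * is the default for the scheme.  Returns 0 on success, -1 if buf is
 * too small.
 */
int
build_url(char *buf, size_t len, const char *scheme, const char *host,
    unsigned port, const char *uri)
{
	int n;

	if (port == scheme_default_port(scheme))
		n = snprintf(buf, len, "%s://%s%s", scheme, host, uri);
	else
		n = snprintf(buf, len, "%s://%s:%u%s", scheme, host, port,
		    uri);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Called with "https" and port 443 this yields a url with no port in it, while "http" with the backend port 280 yields one that points the client at a port only nginx's backend connection should see.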

please note i dont like userland (too many strings), and im generally
unfamiliar with apache internals, so i would appreciate both eyes
and tests.

ok?

Index: src/include/http_core.h
===
RCS file: /cvs/src/usr.sbin/httpd/src/include/http_core.h,v
retrieving revision 1.12
diff -u -p -r1.12 http_core.h
--- src/include/http_core.h 24 Aug 2007 11:31:29 -  1.12
+++ src/include/http_core.h 14 Jul 2011 03:33:02 -
@@ -138,6 +138,8 @@ API_EXPORT(const char *) ap_get_remote_l
 API_EXPORT(char *) ap_construct_url(pool *p, const char *uri, request_rec *r);
 API_EXPORT(const char *) ap_get_server_name(request_rec *r);
 API_EXPORT(unsigned) ap_get_server_port(const request_rec *r);
+API_EXPORT(const char *) ap_get_server_method(const request_rec *r);
+API_EXPORT(unsigned) ap_get_default_port(const request_rec *r);
 API_EXPORT(unsigned long) ap_get_limit_req_body(const request_rec *r);
 API_EXPORT(void) ap_custom_response(request_rec *r, int status, char *string);
 API_EXPORT(int) ap_exists_config_define(char *name);
Index: src/include/httpd.h
===
RCS file: /cvs/src/usr.sbin/httpd/src/include/httpd.h,v
retrieving revision 1.30
diff -u -p -r1.30 httpd.h
--- src/include/httpd.h 25 Feb 2010 07:49:53 -  1.30
+++ src/include/httpd.h 14 Jul 2011 03:33:02 -
@@ -141,12 +141,8 @@ extern "C" {
 #define DEFAULT_HTTP_PORT  80
 #define DEFAULT_HTTPS_PORT 443
 #define ap_is_default_port(port,r) ((port) == ap_default_port(r))
-#define ap_http_method(r)   (((r)->ctx != NULL && ap_ctx_get((r)->ctx, \
-    "ap::http::method") != NULL) ? ((char *)ap_ctx_get((r)->ctx,   \
-    "ap::http::method")) : "http")
-#define ap_default_port(r)  (((r)->ctx != NULL && ap_ctx_get((r)->ctx, \
-    "ap::default::port") != NULL) ? atoi((char *)ap_ctx_get((r)->ctx,  \
-    "ap::default::port")) : DEFAULT_HTTP_PORT)
+#define ap_http_method(r)   ap_get_server_method(r)
+#define ap_default_port(r)  ap_get_default_port(r)
 
 /* - Default user name and group name running standalone -- */
 /* --- These may be specified as numbers by placing a # before a number --- */
Index: src/main/http_core.c
===
RCS file: /cvs/src/usr.sbin/httpd/src/main/http_core.c,v
retrieving revision 1.27
diff -u -p -r1.27 http_core.c
--- src/main/http_core.c10 May 2010 02:00:50 -  1.27
+++ src/main/http_core.c14 Jul 2011 03:33:02 -
@@ -804,6 +804,42 @@ ap_get_server_port(const request_rec *r)
: port;
 }
 
+API_EXPORT(const char *)
+ap_get_server_method(const request_rec *r)
+{
+   const char *method;
+
+   if (r->ctx != NULL) {
+       method = ap_ctx_get(r->ctx, "ap::http::method");
+       if (method != NULL)
+           return (method);
+   }
+
+   if (r->server->ctx != NULL) {
+       method = ap_ctx_get(r->server->ctx, "ap::http::method");
+       if (method != NULL)
+           return (method);
+   }
+
+   return ("http");
+}
+
+API_EXPORT(unsigned)
+ap_get_default_port(const request_rec *r)
+{
+   const char *v = NULL;
+
+   if (r->ctx != NULL)
+       v = ap_ctx_get(r->ctx, "ap::default::port");
+   if (v == NULL && r->server->ctx != NULL)
+       v = ap_ctx_get(r->server->ctx, "ap::default::port");
+
+   if (v == NULL)
+       return (DEFAULT_HTTP_PORT);
+
+   return (atoi(v));
+}
+
 API_EXPORT(char *)
 ap_construct_url(pool *p, const char *uri, request_rec *r)
 {
@@ -1751,6 +1787,43 @@ static const char *set_server_string_slo
 return NULL;
 }
 
+static const char *
+set_server_name(cmd_parms *cmd, void *dummy, char *arg)
+{
+   const char *err = ap_check_cmd_context(cmd,
+   NOT_IN_DIR_LOC_FILE|NOT_IN_LIMIT);
+   const char *part;
+   int port;
+
+   if (err != NULL)
+   return (err);
+
+   if (strncmp("https://", arg, 8) == 0) {
+       ap_ctx_set(cmd->server->ctx, "ap::http::method", "https");
+   

Re: support specifying scheme/method in apache server configs

2011-07-19 Thread David Gwynne
noone has an opinion?

would anyone get upset if i committed this?

dlg

On 14/07/2011, at 1:40 PM, David Gwynne wrote:

> in my environment i have nginx in front of apache to offload ssl
> and to let me easily point different parts of the uri namespace at
> all crazy backends we have. this works fine except if the apache
> wants to canonicalise something on the ssl backends. because the
> ssl is done in nginx, apache doesnt know that it should use https
> as the scheme rather than just http and redirects the user to the
> wrong port.
>
> this diff models the newer apache behaviour of letting you specify
> the scheme/method as part of the ServerName directive.
>
> i can set virtualhosts up like this now:
>
> # nginx has an ssl listener on 443 that proxies to this backend
> # using plain http.
> <VirtualHost _default_:280>
>     ServerName https://internal.eait.uq.edu.au
>
>     # other shizz
>
> </VirtualHost>
>
> with this diff apache canonicalises with https at the start of the
> url instead of the default of http.
>
> please note i dont like userland (too many strings), and im generally
> unfamiliar with apache internals, so i would appreciate both eyes
> and tests.
>
> ok?


Re: support specifying scheme/method in apache server configs

2011-07-25 Thread David Gwynne
On Thu, Jul 21, 2011 at 02:21:19PM +0100, Federico Schwindt wrote:
> On Thu, Jul 14, 2011 at 4:40 AM, David Gwynne l...@animata.net wrote:
> > in my environment i have nginx in front of apache to offload ssl
> > and to let me easily point different parts of the uri namespace at
> > all crazy backends we have. this works fine except if the apache
> > wants to canonicalise something on the ssl backends. because the
> > ssl is done in nginx, apache doesnt know that it should use https
> > as the scheme rather than just http and redirects the user to the
> > wrong port.
> [..]
> > ok?
>
> no as it is. please don't use atoi, use ap_strtol.

it seems a bit unfair of you to hold me to a higher standard than
the rest of the apache codebase, but ill live :)

> to check for `:' you can use strchr(). i believe you need an ap_pstrdup() for:
>
> +   cmd->server->server_hostname = arg;

other stuff stores arg directly, so i assumed that was ok. again,
i'll live :)

> the strtonum() is wrong. it should be 65535 (the value is inclusive)

cool.

> but you could replace that bit just calling server_port().

but then id be calling atoi...

> you need to update the documentation as well.

true.

here's an update diff to the code:

Index: src/include/http_core.h
===
RCS file: /cvs/src/usr.sbin/httpd/src/include/http_core.h,v
retrieving revision 1.12
diff -u -p -r1.12 http_core.h
--- src/include/http_core.h 24 Aug 2007 11:31:29 -  1.12
+++ src/include/http_core.h 25 Jul 2011 06:45:38 -
@@ -138,6 +138,8 @@ API_EXPORT(const char *) ap_get_remote_l
 API_EXPORT(char *) ap_construct_url(pool *p, const char *uri, request_rec *r);
 API_EXPORT(const char *) ap_get_server_name(request_rec *r);
 API_EXPORT(unsigned) ap_get_server_port(const request_rec *r);
+API_EXPORT(const char *) ap_get_server_method(const request_rec *r);
+API_EXPORT(unsigned) ap_get_default_port(const request_rec *r);
 API_EXPORT(unsigned long) ap_get_limit_req_body(const request_rec *r);
 API_EXPORT(void) ap_custom_response(request_rec *r, int status, char *string);
 API_EXPORT(int) ap_exists_config_define(char *name);
Index: src/include/httpd.h
===
RCS file: /cvs/src/usr.sbin/httpd/src/include/httpd.h,v
retrieving revision 1.30
diff -u -p -r1.30 httpd.h
--- src/include/httpd.h 25 Feb 2010 07:49:53 -  1.30
+++ src/include/httpd.h 25 Jul 2011 06:45:38 -
@@ -141,12 +141,8 @@ extern "C" {
 #define DEFAULT_HTTP_PORT  80
 #define DEFAULT_HTTPS_PORT 443
 #define ap_is_default_port(port,r) ((port) == ap_default_port(r))
-#define ap_http_method(r)   (((r)->ctx != NULL && ap_ctx_get((r)->ctx, \
-    "ap::http::method") != NULL) ? ((char *)ap_ctx_get((r)->ctx,   \
-    "ap::http::method")) : "http")
-#define ap_default_port(r)  (((r)->ctx != NULL && ap_ctx_get((r)->ctx, \
-    "ap::default::port") != NULL) ? atoi((char *)ap_ctx_get((r)->ctx,  \
-    "ap::default::port")) : DEFAULT_HTTP_PORT)
+#define ap_http_method(r)   ap_get_server_method(r)
+#define ap_default_port(r)  ap_get_default_port(r)
 
 /* - Default user name and group name running standalone -- */
 /* --- These may be specified as numbers by placing a # before a number --- */
Index: src/main/http_core.c
===
RCS file: /cvs/src/usr.sbin/httpd/src/main/http_core.c,v
retrieving revision 1.27
diff -u -p -r1.27 http_core.c
--- src/main/http_core.c10 May 2010 02:00:50 -  1.27
+++ src/main/http_core.c25 Jul 2011 06:45:38 -
@@ -804,6 +804,42 @@ ap_get_server_port(const request_rec *r)
: port;
 }
 
+API_EXPORT(const char *)
+ap_get_server_method(const request_rec *r)
+{
+   const char *method;
+
+   if (r->ctx != NULL) {
+       method = ap_ctx_get(r->ctx, "ap::http::method");
+       if (method != NULL)
+           return (method);
+   }
+
+   if (r->server->ctx != NULL) {
+       method = ap_ctx_get(r->server->ctx, "ap::http::method");
+       if (method != NULL)
+           return (method);
+   }
+
+   return ("http");
+}
+
+API_EXPORT(unsigned)
+ap_get_default_port(const request_rec *r)
+{
+   const char *v = NULL;
+
+   if (r->ctx != NULL)
+       v = ap_ctx_get(r->ctx, "ap::default::port");
+   if (v == NULL && r->server->ctx != NULL)
+       v = ap_ctx_get(r->server->ctx, "ap::default::port");
+
+   if (v == NULL)
+       return (DEFAULT_HTTP_PORT);
+
+   return ((unsigned)ap_strtol(v, NULL, 10));
+}
+
 API_EXPORT(char *)
 ap_construct_url(pool *p, const char *uri, request_rec *r)
 {
@@ -1751,6 +1787,43 @@ static const char *set_server_string_slo
 return NULL;
 }
 
+static const char *
+set_server_name(cmd_parms *cmd, void *dummy, char *arg)
+{
+   const char *err = ap_check_cmd_context(cmd,
+   NOT_IN_DIR_LOC_FILE|NOT_IN_LIMIT);
+   const

vmware is stupid wrt to the giaddrs in dhcp packets when pxe booting

2011-08-17 Thread David Gwynne
vmwares pxe rom in guests uses the giaddr (the address of the dhcp
relay) as the default ip gateway.

this is a problem if you're running carped firewalls, because you'll
be running a dhcrelay on each of them attached to the hardware
interface, not the carped interface. if the vmware client requesting
the dhcp lease gets a response via the backup firewall, it will
then try to tftp by using the backup firewalls hardware ip as its
gateway, not the ip on the carp interface. that in turn means
tftp-proxy will insert rules on the backup, but the replies from
the tftp server to the client will be sent via the actual carp
master, which doesnt have the tftp rules and will block them.

this diff zeroes out the giaddr in dhcp replies so vmware cant be
stupid and use it as the ip gateway address.
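
as a rough standalone illustration of the idea (not the dhcrelay code itself), the sketch below clears giaddr on replies only. struct ex_bootp is a hypothetical, cut-down header invented for the example; the real dhcp packet structure has many more fields.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <stdint.h>

#define EX_BOOTREQUEST	1
#define EX_BOOTREPLY	2

/* Hypothetical cut-down BOOTP header: only the fields this sketch needs. */
struct ex_bootp {
	uint8_t		op;	/* EX_BOOTREQUEST or EX_BOOTREPLY */
	struct in_addr	giaddr;	/* relay agent address */
};

/*
 * Clear the relay address on replies heading back to the client, so a
 * client cannot mistake the relay for its default gateway. Requests
 * are left alone because the server needs giaddr to pick a subnet.
 */
void
scrub_giaddr(struct ex_bootp *p)
{
	if (p->op == EX_BOOTREPLY)
		p->giaddr.s_addr = 0;	/* 0.0.0.0 in any byte order */
}
```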

im not sure this is the right way to fix the problem.

Index: dhcrelay.c
===
RCS file: /cvs/src/usr.sbin/dhcrelay/dhcrelay.c,v
retrieving revision 1.35
diff -u -p -r1.35 dhcrelay.c
--- dhcrelay.c  21 Jun 2011 17:31:07 -  1.35
+++ dhcrelay.c  18 Aug 2011 03:56:51 -
@@ -269,6 +269,8 @@ relay(struct interface_info *ip, struct 
return;
}
 
+   packet->giaddr.s_addr = 0x0;
+
    if (send_packet(interfaces, packet, length,
        interfaces->primary_address, &to, &hto) != -1)
        debug("forwarded BOOTREPLY for %s to %s",



use strtonum to check Port statements in apache

2011-08-25 Thread David Gwynne
i want to push my set scheme in http diff again, but it makes
sense to reuse server_port as part of that.

this makes server_port more palatable to me.

ok?
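
for readers without strtonum(3) to hand, here is a portable sketch of the same kind of check built on strtol(3). parse_port() is a made-up helper for this example, and its error strings only imitate the ones strtonum reports.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a TCP port in the inclusive range 1..65535.
 * On success returns the port and sets *errstr to NULL. On failure
 * returns 0 and points *errstr at a short reason, in the style of
 * strtonum(3).
 */
int
parse_port(const char *s, const char **errstr)
{
	char *end;
	long v;

	v = strtol(s, &end, 10);
	if (end == s || *end != '\0') {
		/* empty string or trailing junk, which atoi() would accept */
		*errstr = "invalid";
		return (0);
	}
	if (v < 1) {
		*errstr = "too small";
		return (0);
	}
	if (v > 65535) {
		/* also catches overflow, where strtol returns LONG_MAX */
		*errstr = "too large";
		return (0);
	}

	*errstr = NULL;
	return ((int)v);
}
```

the point of the stricter parse is that input like "80x" is rejected outright instead of being silently read as 80.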

Index: src/main/http_core.c
===
RCS file: /cvs/src/usr.sbin/httpd/src/main/http_core.c,v
retrieving revision 1.27
diff -u -p -r1.27 http_core.c
--- src/main/http_core.c10 May 2010 02:00:50 -  1.27
+++ src/main/http_core.c25 Aug 2011 10:23:02 -
@@ -1779,14 +1779,16 @@ static const char *server_port(cmd_parms
 if (err != NULL) {
return err;
 }
-    port = atoi(arg);
-    if (port <= 0 || port >= 65536) { /* 65536 == 1<<16 */
-       return ap_pstrcat(cmd->temp_pool, "The port number \"", arg,
-                         "\" is outside the appropriate range "
-                         "(i.e., 1..65535).", NULL);
+
+    port = (int)strtonum(arg, 1, 65535, &err);
+    if (err != NULL) {
+       return ap_pstrcat(cmd->temp_pool,
+           "The port number \"", arg, "\" is ", err, NULL);
     }
+
     cmd->server->port = port;
-return NULL;
+
+return (NULL);
 }
 
 static const char *set_signature_flag(cmd_parms *cmd, core_dir_config *d, 



Re: scratch increasing MAXPHYS

2011-11-11 Thread David Gwynne
its generally hard for the cpu to slow a disk down...

On 12/11/2011, at 8:36 AM, Geoff Steckel wrote:

> Increasing MAXPHYS to 256K shows a few places where it's assumed that
> there are 16 pages in MAXPHYS.
>
> In dev/ic/ahci.c I had to make this change @307 to make the
> scatter-gather table large enough - 1 entry per page + extra
> because that's what the previous code had and didn't say why.
> I could understand +1 because a lot of code works that way.
>
> /* this makes ahci_cmd_table 512 bytes, supporting 128-byte alignment */
> /* #define AHCI_MAX_PRDT 24 too small for 256K of 4K pages */
> /* extra 12 is to match old 16 + 8 */
> #define AHCI_MAX_PRDT   ((MAXPHYS / PAGE_SIZE) + 8)
>
> Grep-ing shows at least dev/ic/osiopvar.h doesn't compute
> DMA resources from MAXPHYS. There are probably other 17s buried
> in ugly places.
>
> It doesn't seem to help disk I/O speed at all.
> It *does* decrease interrupt rate to about 400/sec.
>
> Now to try some other tests. Grumble.
>    Geoff Steckel
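
the arithmetic behind the AHCI_MAX_PRDT change quoted above can be checked with a small sketch. prdt_entries() and EX_PRDT_SLACK are names made up for this example; the slack of 8 is taken from the quoted define, and the reason for it is not stated in the thread.

```c
#include <stddef.h>

/* Extra entries on top of one per page; 8 reproduces the old 16 + 8 = 24. */
#define EX_PRDT_SLACK	8

/*
 * Number of scatter-gather entries needed so that a transfer of up to
 * maxphys bytes, split into page_size pieces, always fits.
 */
size_t
prdt_entries(size_t maxphys, size_t page_size)
{
	return (maxphys / page_size + EX_PRDT_SLACK);
}
```

with 4K pages a 64K MAXPHYS gives the old value of 24 entries, and 256K needs 72, which is why the fixed 24 was too small.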



jumbos for bnx(4)

2011-11-21 Thread David Gwynne
this diff enables large ethernet frames on bnx(4) chips.

initial tests are good, im going to put it on the firewalls now.

Index: mii/brgphy.c
===
RCS file: /cvs/src/sys/dev/mii/brgphy.c,v
retrieving revision 1.93
diff -u -p -r1.93 brgphy.c
--- mii/brgphy.c24 May 2010 21:23:23 -  1.93
+++ mii/brgphy.c21 Nov 2011 06:12:02 -
@@ -1117,21 +1117,42 @@ void
 brgphy_jumbo_settings(struct mii_softc *sc)
 {
u_int32_t val;
+   char *devname;
 
-   /* Set Jumbo frame settings in the PHY. */
-   if (sc->mii_model == MII_MODEL_BROADCOM_BCM5401) {
-       /* Cannot do read-modify-write on the BCM5401 */
-       PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x4c20);
+   devname = sc->mii_dev.dv_parent->dv_cfdata->cf_driver->cd_name;
+
+   /* enable jumbo support on bnx(4) chips */
+   if (strcmp(devname, "bnx") == 0) {
+       /* Set Jumbo frame settings in the PHY. */
+       if (sc->mii_model == MII_MODEL_BROADCOM_BCM5401) {
+           /* Cannot do read-modify-write on the BCM5401 */
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x4c20);
+       } else {
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x7);
+           val = PHY_READ(sc, BRGPHY_MII_AUXCTL);
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL,
+               val | BRGPHY_AUXCTL_LONG_PKT);
+       }
+
+       val = PHY_READ(sc, BRGPHY_MII_PHY_EXTCTL);
+       PHY_WRITE(sc, BRGPHY_MII_PHY_EXTCTL,
+           val & ~(BRGPHY_PHY_EXTCTL_HIGH_LA));
    } else {
-       PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x7);
-       val = PHY_READ(sc, BRGPHY_MII_AUXCTL);
-       PHY_WRITE(sc, BRGPHY_MII_AUXCTL,
-           val & ~(BRGPHY_AUXCTL_LONG_PKT | 0x7));
+       /* Set Jumbo frame settings in the PHY. */
+       if (sc->mii_model == MII_MODEL_BROADCOM_BCM5401) {
+           /* Cannot do read-modify-write on the BCM5401 */
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x4c20);
+       } else {
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL, 0x7);
+           val = PHY_READ(sc, BRGPHY_MII_AUXCTL);
+           PHY_WRITE(sc, BRGPHY_MII_AUXCTL,
+               val & ~(BRGPHY_AUXCTL_LONG_PKT | 0x7));
+       }
+
+       val = PHY_READ(sc, BRGPHY_MII_PHY_EXTCTL);
+       PHY_WRITE(sc, BRGPHY_MII_PHY_EXTCTL,
+           val & ~(BRGPHY_PHY_EXTCTL_HIGH_LA));
    }
-
-   val = PHY_READ(sc, BRGPHY_MII_PHY_EXTCTL);
-   PHY_WRITE(sc, BRGPHY_MII_PHY_EXTCTL,
-       val & ~BRGPHY_PHY_EXTCTL_HIGH_LA);
 }
 
 void
Index: pci/if_bnx.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
retrieving revision 1.95
diff -u -p -r1.95 if_bnx.c
--- pci/if_bnx.c22 Jun 2011 16:44:27 -  1.95
+++ pci/if_bnx.c21 Nov 2011 06:12:02 -
@@ -850,6 +850,8 @@ bnx_attachhook(void *xsc)
    sc->bnx_rx_ticks   = 18;
 #endif
 
+   sc->mbuf_alloc_size = BNX_MAX_JUMBO_MRU;
+
/* Update statistics once every second. */
sc-bnx_stats_ticks = 100  0x00;
 
@@ -880,9 +882,10 @@ bnx_attachhook(void *xsc)
    ifp->if_ioctl = bnx_ioctl;
    ifp->if_start = bnx_start;
    ifp->if_watchdog = bnx_watchdog;
+   ifp->if_hardmtu = 9000;
    IFQ_SET_MAXLEN(&ifp->if_snd, USABLE_TX_BD - 1);
    IFQ_SET_READY(&ifp->if_snd);
-   m_clsetwms(ifp, MCLBYTES, 2, USABLE_RX_BD);
+   m_clsetwms(ifp, sc->mbuf_alloc_size, 2, USABLE_RX_BD);
    bcopy(sc->eaddr, sc->arpcom.ac_enaddr, ETHER_ADDR_LEN);
    bcopy(sc->bnx_dev.dv_xname, ifp->if_xname, IFNAMSIZ);
 
@@ -896,8 +899,6 @@ bnx_attachhook(void *xsc)
    ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING;
 #endif
 
-   sc->mbuf_alloc_size = BNX_MAX_MRU;
-
    printf("%s: address %s\n", sc->bnx_dev.dv_xname,
        ether_sprintf(sc->arpcom.ac_enaddr));
 
@@ -2626,8 +2627,8 @@ bnx_dma_alloc(struct bnx_softc *sc)
     * Create DMA maps for the Rx buffer mbufs.
     */
    for (i = 0; i < TOTAL_RX_BD; i++) {
-       if (bus_dmamap_create(sc->bnx_dmatag, BNX_MAX_MRU,
-           BNX_MAX_SEGMENTS, BNX_MAX_MRU, 0, BUS_DMA_NOWAIT,
+       if (bus_dmamap_create(sc->bnx_dmatag, sc->mbuf_alloc_size,
+           1, sc->mbuf_alloc_size, 0, BUS_DMA_NOWAIT,
            &sc->rx_mbuf_map[i])) {
            printf(": Could not create Rx mbuf %d DMA map!\n", i);
            rc = ENOMEM;
@@ -3641,10 +3642,10 @@ bnx_get_buf(struct bnx_softc *sc, u_int1
        *prod_bseq);
 
    /* This is a new mbuf allocation. */
-   m = MCLGETI(NULL, M_DONTWAIT, &sc->arpcom.ac_if, MCLBYTES);
+   m = MCLGETI(NULL, M_DONTWAIT, &sc->arpcom.ac_if, sc->mbuf_alloc_size);
    if (!m)
  

Re: allow _proxy variables in sudoers

2012-01-15 Thread David Gwynne
you forgot https_proxy and no_proxy...

however, im against this change since it allows a user to redirect a program
they need privileges to use to an arbitrary proxy they specify, something
there is no good mitigation against.

dlg

On 16/01/2012, at 12:15 AM, Peter Hessler wrote:

> When you are in a place that requires proxy configs to reach the
> internet, it would be very nice if you could then use pkg_add.
>
> So, add ftp_proxy and http_proxy to the default env_keep variable.
>
> OK?
>
> Index: usr.bin/sudo/sudoers
> ===
> RCS file: /cvs/openbsd/src/usr.bin/sudo/sudoers,v
> retrieving revision 1.25
> diff -u -p -r1.25 sudoers
> --- usr.bin/sudo/sudoers  26 Oct 2009 19:28:26 -  1.25
> +++ usr.bin/sudo/sudoers  15 Jan 2012 14:13:48 -
> @@ -20,7 +20,7 @@ Defaults env_keep +="DESTDIR DISTDIR EDI
>  Defaults env_keep +="MAKECONF MULTI_PACKAGES NOMAN OKAY_FILES OWNER PKG_CACHE"
>  Defaults env_keep +="PKG_DBDIR PKG_DESTDIR PKG_PATH PKG_TMPDIR PORTSDIR"
>  Defaults env_keep +="RELEASEDIR SHARED_ONLY SSH_AUTH_SOCK SUBPACKAGE VISUAL"
> -Defaults env_keep +="WRKOBJDIR"
> +Defaults env_keep +="WRKOBJDIR ftp_proxy http_proxy"
>
>  # Uncomment to disable the lecture the first time you run sudo
>  #Defaults !lecture
>
>
>
> --
> Conscious is when you are aware of something and conscience is when you
> wish you weren't.



Re: login_yubikey does not accept user.name

2012-04-04 Thread David Gwynne
pretty confident a regex wont fly :)

On 04/04/2012, at 11:34 PM, Björn Ketelaars wrote:

> 2012/4/3 Theo de Raadt dera...@cvs.openbsd.org:
> Hmm, I'd like to see that refactored somehow.
>
> Also, '-' should not be legal at the start of a login name.  There
> are things that care.  I think at the end it is OK, though.
>
> Crazy eh.  Isn't there something else in libc that checks this?
>
> New diff:
>
> Index: login_yubikey.c
> ===
> RCS file: /cvs/src/libexec/login_yubikey/login_yubikey.c,v
> retrieving revision 1.4
> diff -u -r1.4 login_yubikey.c
> --- login_yubikey.c   1 Feb 2012 16:07:28 -   1.4
> +++ login_yubikey.c   4 Apr 2012 13:23:01 -
> @@ -36,6 +36,7 @@
>  #include <ctype.h>
>  #include <login_cap.h>
>  #include <pwd.h>
> +#include <regex.h>
>  #include <stdarg.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -165,12 +166,15 @@
>  static int
>  clean_string(const char *s)
>  {
> -   while (*s) {
> -       if (!isalnum(*s) && *s != '-' && *s != '_')
> -           return (0);
> -       ++s;
> -   }
> -   return (1);
> +   char p[] = "^[0-9a-z_]+(($|[0-9a-z_-]$)|([0-9a-z\\._-]+[0-9a-z_-]$))";
> +   int ret = 0;
> +   regex_t r;
> +
> +   regcomp(&r,p,REG_EXTENDED);
> +   if (regexec(&r,s,0,0,0) == 0)
> +       ret = 1;
> +   regfree(&r);
> +   return (ret);
>  }
>
>  static int
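
for comparison, a regex-free check with roughly the same rules as the pattern quoted above could look like the sketch below. clean_login() and login_char_ok() are hypothetical names for this example, and the exact rule set (lowercase letters, digits, '_', '-' and '.', no leading '-' or '.', no trailing '.') is an assumption read off the quoted regex rather than anything that was committed.

```c
#include <stddef.h>

/* Characters allowed anywhere in a login name in this sketch. */
static int
login_char_ok(char c)
{
	return ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
	    c == '_' || c == '-' || c == '.');
}

/*
 * Return 1 if s looks like an acceptable login name, 0 otherwise.
 * The name must be non-empty, must not start with '-' or '.', and
 * must not end with '.'.
 */
int
clean_login(const char *s)
{
	size_t i;

	if (s[0] == '\0' || s[0] == '-' || s[0] == '.')
		return (0);

	for (i = 0; s[i] != '\0'; i++) {
		if (!login_char_ok(s[i]))
			return (0);
	}

	/* i is at least 1 here, so s[i - 1] is the last character */
	return (s[i - 1] != '.');
}
```

a plain loop like this has no compile step that can fail and no allocation to free, which is part of why it is easier to audit in a login helper.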



Re: tftpd patch

2012-07-02 Thread David Gwynne
that tftpd has been unlinked from the tree, and therefore unlikely to get
patches against it.

have you tried the new one to see if you get annoying errors out of it?

dlg

On 29/06/2012, at 4:11 AM, Peter J. Philipp wrote:

> Hi,
>
> I have the weird scenario when I try to tftp a file from a remote tftpd that's
> also openbsd that my pf doesn't keep a state open.  This is something I need
> to fix, however I found this in the logs on the remote tftpd and it's
> misleading:
>
> Jun 28 14:03:21 hostname tftpd[2506]: recv: Connection refused
>
> It first boggled my mind what it's trying to recv and then it came to me...
> the write error message is delayed because of the ICMP port unreachable
> travel time at which point the descriptor is already blocking in read I guess.
> So I have changed it to this:
>
> Jun 28 14:03:21 hostname tftpd[2506]: sendfile: Connection refused
>
> which to me is a lot more explanatory on what it fails on.  sendfile is
> the function not the syscall.  I'd rather see send in there than recv.
>
> Here is the patch:
>
>
> Index: tftpd.c
> ===
> RCS file: /cvs/src/libexec/tftpd/tftpd.c,v
> retrieving revision 1.63
> diff -u -r1.63 tftpd.c
> --- tftpd.c   27 Oct 2009 23:59:32 -  1.63
> +++ tftpd.c   28 Jun 2012 18:00:29 -
> @@ -669,7 +669,10 @@
>             error = 1;
>             if (errno == EINTR)
>                 continue;
> -           syslog(LOG_ERR, "recv: %m");
> +           if (errno == ECONNREFUSED)
> +               syslog(LOG_ERR, "sendfile: %m");
> +           else
> +               syslog(LOG_ERR, "recv: %m");
>             goto abort;
>         }
>         ap->th_opcode = ntohs((u_short)ap->th_opcode);
>
>
>
> If you think kittens will die because of this patch then don't commit it
> but otherwise I'm just trying to make sense of this better.
>
> Cheers,
>
> -peter



Re: OpenSSL handling intermediate certificates

2012-08-08 Thread David Gwynne
i believe as an ssl client you can add intermediate certs to /etc/ssl/cert.pem
and they'll be used to validate the endpoint.

if you're an ssl server and your program doesnt let you specify a chain, you
can just cat them on the end of the crt. eg, i do something like the following
when configuring certs in nginx:

root@host /etc/ssl# cat hostname.crt.201208 ca-bundle > hostname.chain.201208
root@host /etc/ssl# ln -s hostname.chain.201208 hostname.crt

and then i configure nginx to use the /etc/ssl/hostname.crt symlink to get the
full chain.

dlg

On 08/08/2012, at 6:35 AM, Justin N. Lindberg wrote:

> I suppose my question boils down to "How can I validate certificates
> from SSL servers that fail to send intermediate certificates?"
>
> There seem to be quite a few such servers out there, including some I
> have little choice but to use, and OpenSSL apparently doesn't like to
> validate a certificate if the intermediate certificates are not present.
>
> I tried this with OpenBSD's Apache httpd, and I had to install an
> intermediate certificate chain file, and use a directive like
>
> SSLCertificateChainFile /etc/ssl/sub.class1.server.ca.pem
>
> in httpd.conf in order for my certificate to validate with a web
> browser in OpenBSD. The default httpd.conf, which is rather verbosely
> self-documenting, does not mention this directive.
>
> My research leads me to believe that the tool c_rehash, which is not
> installed by default, will allow me to put intermediate certificates
> like this somewhere OpenSSL can use them for validating certificates
> from servers that do not present a complete chain all the way to a
> certificate directly signed by one of the roots.
>
> Is there an easier or right way to do this?  I feel like I must be
> doing something wrong when I'm driving myself bananas with all this
> technical fussing around just to validate common certificates.
>
> Thanks,
>
> --Justin



slight code refactoring in mfi(4)

2012-08-10 Thread David Gwynne
this moves knowledge of where the inbound doorbell on chips is out
of code and into the structure that stores the chip differences.

ive tested this on a perc5 (which is the xscale gen). id like a
skinny user to give it a spin too.
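
the shape of the change, reduced to a standalone sketch: the per-chip table carries the doorbell offset, so callers stop branching on the chip type. every ex_ name and both register offsets below are invented for the illustration and are not taken from the driver headers.

```c
#include <stddef.h>

/* Register offsets are made up for the example. */
#define EX_IDB		0x20
#define EX_SKINNY_IDB	0x00

/* Per-chip description: anything that differs between generations. */
struct ex_iop_ops {
	const char	*name;
	unsigned long	 idb;	/* inbound doorbell offset for this chip */
};

static const struct ex_iop_ops ex_iop_xscale = { "xscale", EX_IDB };
static const struct ex_iop_ops ex_iop_skinny = { "skinny", EX_SKINNY_IDB };

struct ex_softc {
	const struct ex_iop_ops	*sc_iop;
	unsigned long		 last_reg;	/* last register written */
	unsigned int		 last_val;	/* last value written */
};

/* Stand-in for a register write; it only records what was written. */
static void
ex_write(struct ex_softc *sc, unsigned long reg, unsigned int val)
{
	sc->last_reg = reg;
	sc->last_val = val;
}

/* No "if skinny ... else ..." here: the table already knows the offset. */
void
ex_ring_doorbell(struct ex_softc *sc, unsigned int val)
{
	ex_write(sc, sc->sc_iop->idb, val);
}
```

adding a new chip generation then means adding one table entry rather than touching every place that rings the doorbell.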

Index: mfi.c
===
RCS file: /cvs/src/sys/dev/ic/mfi.c,v
retrieving revision 1.122
diff -u -p -r1.122 mfi.c
--- mfi.c   12 Jan 2012 06:12:30 -  1.122
+++ mfi.c   10 Aug 2012 07:04:58 -
@@ -118,7 +118,8 @@ static const struct mfi_iop_ops mfi_iop_
mfi_xscale_fw_state,
mfi_xscale_intr_ena,
mfi_xscale_intr,
-   mfi_xscale_post
+   mfi_xscale_post,
+   MFI_IDB
 };
 
 u_int32_t  mfi_ppc_fw_state(struct mfi_softc *);
@@ -130,7 +131,8 @@ static const struct mfi_iop_ops mfi_iop_
mfi_ppc_fw_state,
mfi_ppc_intr_ena,
mfi_ppc_intr,
-   mfi_ppc_post
+   mfi_ppc_post,
+   MFI_IDB
 };
 
 u_int32_t  mfi_gen2_fw_state(struct mfi_softc *);
@@ -142,7 +144,8 @@ static const struct mfi_iop_ops mfi_iop_
mfi_gen2_fw_state,
mfi_gen2_intr_ena,
mfi_gen2_intr,
-   mfi_gen2_post
+   mfi_gen2_post,
+   MFI_IDB
 };
 
 u_int32_t  mfi_skinny_fw_state(struct mfi_softc *);
@@ -154,7 +157,8 @@ static const struct mfi_iop_ops mfi_iop_
mfi_skinny_fw_state,
mfi_skinny_intr_ena,
mfi_skinny_intr,
-   mfi_skinny_post
+   mfi_skinny_post,
+   MFI_SKINNY_IDB
 };
 
 #define mfi_fw_state(_s)   ((_s)->sc_iop->mio_fw_state(_s))
@@ -362,6 +366,7 @@ mfi_transition_firmware(struct mfi_softc
 {
    int32_t fw_state, cur_state;
    int max_wait, i;
+   bus_size_t  idb = sc->sc_iop->mio_idb;
 
    fw_state = mfi_fw_state(sc) & MFI_STATE_MASK;
 
@@ -378,17 +383,11 @@ mfi_transition_firmware(struct mfi_softc
            printf("%s: firmware fault\n", DEVNAME(sc));
            return (1);
        case MFI_STATE_WAIT_HANDSHAKE:
-           if (sc->sc_flags & MFI_IOP_SKINNY)
-               mfi_write(sc, MFI_SKINNY_IDB, MFI_INIT_CLEAR_HANDSHAKE);
-           else
-               mfi_write(sc, MFI_IDB, MFI_INIT_CLEAR_HANDSHAKE);
+           mfi_write(sc, idb, MFI_INIT_CLEAR_HANDSHAKE);
            max_wait = 2;
            break;
        case MFI_STATE_OPERATIONAL:
-           if (sc->sc_flags & MFI_IOP_SKINNY)
-               mfi_write(sc, MFI_SKINNY_IDB, MFI_INIT_READY);
-           else
-               mfi_write(sc, MFI_IDB, MFI_INIT_READY);
+           mfi_write(sc, idb, MFI_INIT_READY);
            max_wait = 10;
            break;
case MFI_STATE_UNDEFINED:
Index: mfivar.h
===
RCS file: /cvs/src/sys/dev/ic/mfivar.h,v
retrieving revision 1.42
diff -u -p -r1.42 mfivar.h
--- mfivar.h12 Jan 2012 06:12:30 -  1.42
+++ mfivar.h10 Aug 2012 07:04:58 -
@@ -104,6 +104,7 @@ struct mfi_iop_ops {
void(*mio_intr_ena)(struct mfi_softc *);
int (*mio_intr)(struct mfi_softc *);
void(*mio_post)(struct mfi_softc *, struct mfi_ccb *);
+   bus_size_t  mio_idb;
 };
 
 struct mfi_softc {



possible performance gain for mpi(4)

2012-08-11 Thread David Gwynne
ive been beating my head against why mpi is slow on some machines
and not others, and i think this may be why.

issuing a command to the chip is done by posting its address to a
register. in my code this was done by doing a write to the register
and then using a barrier immediately after. i think the barrier
causes the cpu to wait till it knows the memory is flushed to the
register, when in reality we dont care when it happens, we should
go do other more important things.

ive only done basic testing so far, but i am hopeful.

Index: mpi.c
===
RCS file: /cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.175
diff -u -p -r1.175 mpi.c
--- mpi.c   16 Jan 2012 10:55:46 -  1.175
+++ mpi.c   12 Aug 2012 03:17:49 -
@@ -1198,7 +1198,8 @@ mpi_start(struct mpi_softc *sc, struct m
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
    ccb->ccb_state = MPI_CCB_QUEUED;
-   mpi_write(sc, MPI_REQ_QUEUE, ccb->ccb_cmd_dva);
+   bus_space_write_4(sc->sc_iot, sc->sc_ioh,
+       MPI_REQ_QUEUE, ccb->ccb_cmd_dva);
 }
 
 int



get mfi(4) product name out of the firmware

2012-08-13 Thread David Gwynne
this makes mfi(4) print details about itself like mfii(4) does.

instead of this:

mfi0 at pci10 dev 14 function 0 "Dell PERC 5" rev 0x00: apic 10 int 14, 0x1f031028
mfi0: logical drives 2, version 5.2.2-0072, 256MB RAM
scsibus0 at mfi0: 2 targets
sd0 at scsibus0 targ 0 lun 0: <DELL, PERC 5/i, 1.03> SCSI3 0/direct fixed naa.600188b03accac0016f418dd0a7f7f12
sd0: 69376MB, 512 bytes/sector, 142082048 sectors
sd1 at scsibus0 targ 1 lun 0: <DELL, PERC 5/i, 1.03> SCSI3 0/direct fixed naa.600188b03accac0016f418f8e8a0a5aa
sd1: 1905664MB, 512 bytes/sector, 3902799872 sectors

itll now print this:

mfi0 at pci10 dev 14 function 0 "Dell PERC 5" rev 0x00: apic 10 int 14, 0x1f031028
mfi0: "PERC 5/i Integrated", firmware 5.2.2-0072, 256MB cache
scsibus0 at mfi0: 2 targets
sd0 at scsibus0 targ 0 lun 0: <DELL, PERC 5/i, 1.03> SCSI3 0/direct fixed naa.600188b03accac0016f418dd0a7f7f12
sd0: 69376MB, 512 bytes/sector, 142082048 sectors
sd1 at scsibus0 targ 1 lun 0: <DELL, PERC 5/i, 1.03> SCSI3 0/direct fixed naa.600188b03accac0016f418f8e8a0a5aa
sd1: 1905664MB, 512 bytes/sector, 3902799872 sectors

cos we get the board name out of the firmware i can then get rid of the table
of subtypes in mfi_pci.c later on. gotta love a space saving.
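
a small userland sketch of what the new attach message does: decode the little-endian memory size the firmware reports, and only mention the cache when it is non-zero. ex_le16() and ex_format_info() are hypothetical helpers for this example; the kernel diff uses letoh16() and printf directly.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode a 16-bit little-endian value from raw bytes, on any host. */
static uint16_t
ex_le16(const uint8_t b[2])
{
	return ((uint16_t)(b[0] | (b[1] << 8)));
}

/*
 * Build the attach line into buf. The cache size is appended only when
 * the controller reports a non-zero amount of memory.
 */
int
ex_format_info(char *buf, size_t len, const char *dev, const char *product,
    const char *fw, const uint8_t mem_le[2])
{
	unsigned int mb = ex_le16(mem_le);

	if (mb > 0)
		return (snprintf(buf, len,
		    "%s: \"%s\", firmware %s, %uMB cache\n",
		    dev, product, fw, mb));

	return (snprintf(buf, len, "%s: \"%s\", firmware %s\n",
	    dev, product, fw));
}
```

decoding from bytes rather than casting keeps the result the same on big-endian hosts, which is the job letoh16() does in the kernel.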

Index: mfi.c
===
RCS file: /cvs/src/sys/dev/ic/mfi.c,v
retrieving revision 1.126
diff -u -p -r1.126 mfi.c
--- mfi.c   13 Aug 2012 06:19:15 -  1.126
+++ mfi.c   14 Aug 2012 03:30:07 -
@@ -736,11 +736,11 @@ mfi_attach(struct mfi_softc *sc, enum mf
goto noinit;
}
 
-   printf("%s: logical drives %d, version %s, %dMB RAM\n",
-       DEVNAME(sc),
-       sc->sc_info.mci_lds_present,
-       sc->sc_info.mci_package_version,
-       sc->sc_info.mci_memory_size);
+   printf("%s: \"%s\", firmware %s", DEVNAME(sc),
+       sc->sc_info.mci_product_name, sc->sc_info.mci_package_version);
+   if (letoh16(sc->sc_info.mci_memory_size) > 0)
+       printf(", %uMB cache", letoh16(sc->sc_info.mci_memory_size));
+   printf("\n");
 
    sc->sc_ld_cnt = sc->sc_info.mci_lds_present;
    sc->sc_max_ld = sc->sc_ld_cnt;



Re: inet6 autoconf spl fix

2012-09-03 Thread David Gwynne
yes.

On 02/09/2012, at 8:20 PM, Stefan Sperling s...@openbsd.org wrote:

> prelist_update() runs at IPL_SOFTNET. Code moved out of it into a
> workq task for adding new addresses from process context should
> run at IPL_SOFTNET, too, shouldn't it?
>
> Index: netinet6/nd6_rtr.c
> ===
> RCS file: /cvs/src/sys/netinet6/nd6_rtr.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 nd6_rtr.c
> --- netinet6/nd6_rtr.c  28 Aug 2012 20:32:02 -  1.62
> +++ netinet6/nd6_rtr.c  2 Sep 2012 10:01:34 -
> @@ -1298,7 +1298,9 @@ nd6_addr_add(void *prptr, void *arg2)
>     struct nd_prefix *pr = (struct nd_prefix *)prptr;
>     struct in6_ifaddr *ia6 = NULL;
>     struct ifaddr *ifa;
> -   int ifa_plen, autoconf, privacy;
> +   int ifa_plen, autoconf, privacy, s;
> +
> +   s = splsoftnet();
>
>     autoconf = 1;
>     privacy = (pr->ndpr_ifp->if_xflags & IFXF_INET6_NOPRIVACY) == 0;
> @@ -1362,6 +1364,8 @@ nd6_addr_add(void *prptr, void *arg2)
>     pfxlist_onlink_check();
>
>     pr->ndpr_refcnt--;
> +
> +   splx(s);
>  }
>
>  /*



BCM5719 support for bge(4)

2012-09-10 Thread David Gwynne
i dont have this hardware, so i can only test that it hasnt broken
this chip:

bge0 at pci3 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003):
ivec 0x795, address 00:14:4f:a9:34:90
brgphy0 at bge0 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0

i need tests from any bge users willing to give it a spin to make
sure it hasnt broken support for previous chips, in particular i
need a BCM5717 test since this driver touches the conditionals
around that one a lot.

this is an intermediate diff on the way to BCM5720 support.


Index: if_bge.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.311
diff -u -p -r1.311 if_bge.c
--- if_bge.c	4 Jul 2012 13:24:41 -0000	1.311
+++ if_bge.c	10 Sep 2012 12:53:31 -0000
@@ -305,6 +305,7 @@ const struct pci_matchid bge_devices[] =
 #define BGE_IS_5755_PLUS(sc)		((sc)->bge_flags & BGE_5755_PLUS)
 #define BGE_IS_5700_FAMILY(sc)		((sc)->bge_flags & BGE_5700_FAMILY)
 #define BGE_IS_5714_FAMILY(sc)		((sc)->bge_flags & BGE_5714_FAMILY)
+#define BGE_IS_5717_PLUS(sc)		((sc)->bge_flags & BGE_5717_PLUS)
 #define BGE_IS_JUMBO_CAPABLE(sc)	((sc)->bge_flags & BGE_JUMBO_CAPABLE)
 
 static const struct bge_revision {
@@ -400,6 +401,7 @@ static const struct bge_revision bge_maj
 	{ BGE_ASICREV_BCM5906, "unknown BCM5906" },
 	{ BGE_ASICREV_BCM57780, "unknown BCM57780" },
 	{ BGE_ASICREV_BCM5717, "unknown BCM5717" },
+	{ BGE_ASICREV_BCM5719, "unknown BCM5719" },
 	{ BGE_ASICREV_BCM57765, "unknown BCM57765" },
 
 	{ 0, NULL }
@@ -1260,7 +1262,19 @@ bge_chipinit(struct bge_softc *sc)
 	if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5703 ||
 	    BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5704)
 		dma_rw_ctl &= ~BGE_PCIDMARWCTL_MINDMA;
-
+	if (BGE_IS_5717_PLUS(sc)) {
+		dma_rw_ctl &= ~BGE_PCIDMARWCTL_DIS_CACHE_ALIGNMENT;
+		if (sc->bge_chipid == BGE_CHIPID_BCM57765_A0)
+			dma_rw_ctl &= ~BGE_PCIDMARWCTL_CRDRDR_RDMA_MRRS_MSK;
+		/*
+		 * Enable HW workaround for controllers that misinterpret
+		 * a status tag update and leave interrupts permanently
+		 * disabled.
+		 */
+		if (BGE_ASICREV(sc->bge_chipid) != BGE_ASICREV_BCM5717 &&
+		    BGE_ASICREV(sc->bge_chipid) != BGE_ASICREV_BCM57765)
+			dma_rw_ctl |= BGE_PCIDMARWCTL_TAGGED_STATUS_WA;
+	}
 	pci_conf_write(pa->pa_pc, pa->pa_tag, BGE_PCI_DMA_RW_CTL, dma_rw_ctl);
 
 	/*
@@ -1318,7 +1332,7 @@ bge_blockinit(struct bge_softc *sc)
 	vaddr_t			rcb_addr;
 	int			i;
 	bge_hostaddr		taddr;
-	u_int32_t		val;
+	u_int32_t		dmactl, val;
 
 	/*
 	 * Initialize the memory window pointer register so that
@@ -1346,8 +1360,7 @@ bge_blockinit(struct bge_softc *sc)
 
 	/* Configure mbuf pool watermarks */
 	/* new Broadcom docs strongly recommend these: */
-	if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5717 ||
-	    BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM57765) {
+	if (BGE_IS_5717_PLUS(sc)) {
 		CSR_WRITE_4(sc, BGE_BMAN_MBUFPOOL_READDMA_LOWAT, 0x0);
 		CSR_WRITE_4(sc, BGE_BMAN_MBUFPOOL_MACRX_LOWAT, 0x2a);
 		CSR_WRITE_4(sc, BGE_BMAN_MBUFPOOL_HIWAT, 0xa0);
@@ -1372,8 +1385,16 @@ bge_blockinit(struct bge_softc *sc)
 	CSR_WRITE_4(sc, BGE_BMAN_DMA_DESCPOOL_HIWAT, 10);
 
 	/* Enable buffer manager */
-	CSR_WRITE_4(sc, BGE_BMAN_MODE,
-	    BGE_BMANMODE_ENABLE|BGE_BMANMODE_LOMBUF_ATTN);
+	val = BGE_BMANMODE_ENABLE | BGE_BMANMODE_LOMBUF_ATTN;
+	/*
+	 * Change the arbitration algorithm of TXMBUF read request to
+	 * round-robin instead of priority based for BCM5719.  When
+	 * TXFIFO is almost empty, RDMA will hold its request until
+	 * TXFIFO is not almost empty.
+	 */
+	if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5719)
+		val |= BGE_BMANMODE_NO_TX_UNDERRUN;
+	CSR_WRITE_4(sc, BGE_BMAN_MODE, val);
 
 	/* Poll for buffer manager start indication */
 	for (i = 0; i < 2000; i++) {
@@ -1408,8 +1429,7 @@ bge_blockinit(struct bge_softc *sc)
 	/* Initialize the standard RX ring control block */
 	rcb = &sc->bge_rdata->bge_info.bge_std_rx_rcb;
 	BGE_HOSTADDR(rcb->bge_hostaddr, BGE_RING_DMA_ADDR(sc, bge_rx_std_ring));
-	if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5717 ||
-	    BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM57765)
+	if (BGE_IS_5717_PLUS(sc))
 		rcb->bge_maxlen_flags = (BGE_RCB_MAXLEN_FLAGS(512, 0) |
 		    (ETHER_MAX_DIX_LEN << 2));
 	else if (BGE_IS_5705_PLUS(sc))
@@ -1417,7 +1437,11 @@ bge_blockinit(struct bge_softc *sc)
 	else
 

tell mii where bge(4)s phy is up front

2012-09-12 Thread David Gwynne
bge(4)s traditionally only have a phy at address 1, which is enforced
by the mii_read backend by failing reads at any other address.

why not just tell mii up front that the phy is at address 1?

why not avoid a conditional in an io path?

also, this is necessary to support recent chips which have phys at
different locations. 5717s through 5720s (and maybe chips in the
future) have phys at addresses relative to the pci function number
and whether theyre serdes capable. to support them we'll have to
be able to support phys at locations other than 1.
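
As a rough sketch of where this is heading, the address passed to mii_attach() could be computed instead of hard-coded. bge_phy_addr() is a hypothetical helper, not something in the diff below, and the +8 offset for serdes parts is an assumption taken from how other drivers handle these chips; the mail itself only says the address depends on the pci function number and on serdes capability.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick the MII address to hand to mii_attach().
 * Older chips keep the PHY at address 1.  For 5717-class chips the
 * address follows the PCI function number; the serdes offset of 8
 * is an assumption, not something stated in this thread.
 */
static int
bge_phy_addr(bool is_5717_plus, int pci_function, bool is_serdes)
{
	if (!is_5717_plus)
		return (1);
	return (pci_function + (is_serdes ? 8 : 1));
}
```

With a helper like this the read path needs no address check at all, since mii only ever asks for the one address it was told about at attach time.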

tested on this:

bge0 at pci3 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003):
ivec 0x795, address 00:14:4f:a9:34:90
brgphy0 at bge0 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0

so at least one old thing still works. ok?


Index: if_bge.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.311
diff -u -p -r1.311 if_bge.c
--- if_bge.c	4 Jul 2012 13:24:41 -0000	1.311
+++ if_bge.c	13 Sep 2012 01:16:19 -0000
@@ -586,18 +586,6 @@ bge_miibus_readreg(struct device *dev, i
 	u_int32_t val, autopoll;
 	int i;
 
-	/*
-	 * Broadcom's own driver always assumes the internal
-	 * PHY is at GMII address 1. On some chips, the PHY responds
-	 * to accesses at all addresses, which could cause us to
-	 * bogusly attach the PHY 32 times at probe type. Always
-	 * restricting the lookup to address 1 is simpler than
-	 * trying to figure out which chips revisions should be
-	 * special-cased.
-	 */
-	if (phy != 1)
-		return (0);
-
 	/* Reading with autopolling on may trigger PCI errors */
 	autopoll = CSR_READ_4(sc, BGE_MI_MODE);
 	if (autopoll & BGE_MIMODE_AUTOPOLL) {
@@ -2253,8 +2241,8 @@ bge_attach(struct device *parent, struct
 		if (sc->bge_flags & BGE_PHY_FIBER_MII)
 			mii_flags |= MIIF_HAVEFIBER;
 		mii_attach(&sc->bge_dev, &sc->bge_mii, 0xffffffff,
-			   MII_PHY_ANY, MII_OFFSET_ANY, mii_flags);
-		
+		    1, MII_OFFSET_ANY, mii_flags);
+
 		if (LIST_FIRST(&sc->bge_mii.mii_phys) == NULL) {
 			printf("%s: no PHY found!\n", sc->bge_dev.dv_xname);
 			ifmedia_add(&sc->bge_mii.mii_media,



Re: if_bnx.c bnx_rx_intr diff

2013-01-22 Thread David Gwynne
i cant see anything except some stupid debug code touching that memory. it is 
reading it though. shouldnt we sync it for READs at the bottom of rx_intr too?

dlg

On 22/01/2013, at 9:49 AM, David Hill dh...@mindcry.org wrote:

> I think this should be POSTREAD in bnx_rx_intr.  This matches
> FreeBSD.
>
> Index: if_bnx.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_bnx.c,v
> retrieving revision 1.100
> diff -N -u -p if_bnx.c
> --- if_bnx.c	13 Jan 2013 05:45:10 -0000	1.100
> +++ if_bnx.c	21 Jan 2013 23:46:38 -0000
> @@ -4323,7 +4323,7 @@ bnx_rx_intr(struct bnx_softc *sc)
>  		bus_dmamap_sync(sc->bnx_dmatag,
>  		    sc->rx_bd_chain_map[i], 0,
>  		    sc->rx_bd_chain_map[i]->dm_mapsize,
> -		    BUS_DMASYNC_POSTWRITE);
> +		    BUS_DMASYNC_POSTREAD);
>  
>  	/* Get the hardware's view of the RX consumer index. */
>  	hw_cons = sc->hw_rx_cons = sblk->status_rx_quick_consumer_index0;



scsi_xfers and hand rolled queue.h type operations

2013-02-03 Thread David Gwynne
scsi_xfers have a thing in them for letting adapters store them on
lists via a LIST_ENTRY. turns out most adapters want SIMPLEQ type
operations, so they end up doing things by hand. LIST_ENTRYs and
SIMPLEQ_ENTRYs are the same size, so this is effectively just a
code simplification.
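
For illustration, the enqueue/dequeue pattern the adapters end up with looks roughly like this in isolation. struct xfer and the xfer_enqueue()/xfer_dequeue() names are hypothetical stand-ins for struct scsi_xfer and the gdt functions; the SIMPLEQ macros come from sys/queue.h, which is assumed to provide them on the build host.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct scsi_xfer. */
struct xfer {
	SIMPLEQ_ENTRY(xfer) xfer_list;
	int id;
};
SIMPLEQ_HEAD(xfer_list, xfer);

/* Queue at the front when infront is set, otherwise at the tail. */
static void
xfer_enqueue(struct xfer_list *q, struct xfer *xs, int infront)
{
	if (infront)
		SIMPLEQ_INSERT_HEAD(q, xs, xfer_list);
	else
		SIMPLEQ_INSERT_TAIL(q, xs, xfer_list);
}

/* Take the first entry off the queue, or return NULL if it is empty. */
static struct xfer *
xfer_dequeue(struct xfer_list *q)
{
	struct xfer *xs = SIMPLEQ_FIRST(q);

	if (xs != NULL)
		SIMPLEQ_REMOVE_HEAD(q, xfer_list);
	return (xs);
}
```

The queue head tracks the tail itself, so the separate "last" pointer and the empty-list special cases in the hand-rolled version go away.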

id appreciate it if a gdt user could test this for me. isp seems
to be fine so far.

ok?

Index: scsi/scsiconf.h
===================================================================
RCS file: /cvs/src/sys/scsi/scsiconf.h,v
retrieving revision 1.150
diff -u -p -r1.150 scsiconf.h
--- scsi/scsiconf.h	1 Jul 2012 01:41:13 -0000	1.150
+++ scsi/scsiconf.h	4 Feb 2013 04:24:40 -0000
@@ -390,7 +390,7 @@ struct scsi_attach_args {
  * (via the scsi_link structure)
  */
 struct scsi_xfer {
-	LIST_ENTRY(scsi_xfer) free_list;
+	SIMPLEQ_ENTRY(scsi_xfer) xfer_list;
 	int	flags;
 	struct	scsi_link *sc_link;	/* all about our device and adapter */
 	int	retries;		/* the number of times to retry */
@@ -414,6 +414,7 @@ struct scsi_xfer {
 
 	void *io;			/* adapter io resource */
 };
+SIMPLEQ_HEAD(scsi_xfer_list, scsi_xfer);
 
 /*
  * Per-request Flag values
Index: dev/ic/gdt_common.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/gdt_common.c,v
retrieving revision 1.61
diff -u -p -r1.61 gdt_common.c
--- dev/ic/gdt_common.c	15 Aug 2012 02:38:14 -0000	1.61
+++ dev/ic/gdt_common.c	4 Feb 2013 04:24:40 -0000
@@ -129,7 +129,7 @@ gdt_attach(struct gdt_softc *sc)
 	TAILQ_INIT(&sc->sc_free_ccb);
 	TAILQ_INIT(&sc->sc_ccbq);
 	TAILQ_INIT(&sc->sc_ucmdq);
-	LIST_INIT(&sc->sc_queue);
+	SIMPLEQ_INIT(&sc->sc_queue);
 
 	mtx_init(&sc->sc_ccb_mtx, IPL_BIO);
 	scsi_iopool_init(&sc->sc_iopool, sc, gdt_ccb_alloc, gdt_ccb_free);
@@ -517,14 +517,10 @@ gdt_eval_mapping(u_int32_t size, int *cy
 void
 gdt_enqueue(struct gdt_softc *sc, struct scsi_xfer *xs, int infront)
 {
-	if (infront || LIST_FIRST(&sc->sc_queue) == NULL) {
-		if (LIST_FIRST(&sc->sc_queue) == NULL)
-			sc->sc_queuelast = xs;
-		LIST_INSERT_HEAD(&sc->sc_queue, xs, free_list);
-		return;
-	}
-	LIST_INSERT_AFTER(sc->sc_queuelast, xs, free_list);
-	sc->sc_queuelast = xs;
+	if (infront)
+		SIMPLEQ_INSERT_HEAD(&sc->sc_queue, xs, xfer_list);
+	else
+		SIMPLEQ_INSERT_TAIL(&sc->sc_queue, xs, xfer_list);
 }
 
 /*
@@ -535,13 +531,9 @@ gdt_dequeue(struct gdt_softc *sc)
 {
 	struct scsi_xfer *xs;
 
-	xs = LIST_FIRST(&sc->sc_queue);
-	if (xs == NULL)
-		return (NULL);
-	LIST_REMOVE(xs, free_list);
-
-	if (LIST_FIRST(&sc->sc_queue) == NULL)
-		sc->sc_queuelast = NULL;
+	xs = SIMPLEQ_FIRST(&sc->sc_queue);
+	if (xs != NULL)
+		SIMPLEQ_REMOVE_HEAD(&sc->sc_queue, xfer_list);
 
 	return (xs);
 }
@@ -584,7 +576,7 @@ gdt_scsi_cmd(struct scsi_xfer *xs)
 	}
 
 	/* Don't double enqueue if we came from gdt_chain. */
-	if (xs != LIST_FIRST(&sc->sc_queue))
+	if (xs != SIMPLEQ_FIRST(&sc->sc_queue))
 		gdt_enqueue(sc, xs, 0);
 
 	while ((xs = gdt_dequeue(sc)) != NULL) {
@@ -1307,8 +1299,8 @@ gdt_chain(struct gdt_softc *sc)
 {
 	GDT_DPRINTF(GDT_D_INTR, ("gdt_chain(%p) ", sc));
 
-	if (LIST_FIRST(&sc->sc_queue))
-		gdt_scsi_cmd(LIST_FIRST(&sc->sc_queue));
+	if (!SIMPLEQ_EMPTY(&sc->sc_queue))
+		gdt_scsi_cmd(SIMPLEQ_FIRST(&sc->sc_queue));
 }
 
 void
Index: dev/ic/gdtvar.h
===================================================================
RCS file: /cvs/src/sys/dev/ic/gdtvar.h,v
retrieving revision 1.21
diff -u -p -r1.21 gdtvar.h
--- dev/ic/gdtvar.h	15 Aug 2012 02:38:14 -0000	1.21
+++ dev/ic/gdtvar.h	4 Feb 2013 04:24:40 -0000
@@ -131,8 +131,7 @@ struct gdt_softc {
 	struct gdt_ccb sc_ccbs[GDT_MAXCMDS];
 	TAILQ_HEAD(, gdt_ccb) sc_free_ccb, sc_ccbq;
 	TAILQ_HEAD(, gdt_ucmd) sc_ucmdq;
-	LIST_HEAD(, scsi_xfer) sc_queue;
-	struct scsi_xfer *sc_queuelast;
+	struct scsi_xfer_list sc_queue;
 
 	struct mutex		sc_ccb_mtx;
 	struct scsi_iopool	sc_iopool;
Index: dev/ic/isp_openbsd.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/isp_openbsd.c,v
retrieving revision 1.47
diff -u -p -r1.47 isp_openbsd.c
--- dev/ic/isp_openbsd.c	22 Oct 2011 19:34:06 -0000	1.47
+++ dev/ic/isp_openbsd.c	4 Feb 2013 04:24:40 -0000
@@ -103,7 +103,7 @@ isp_attach(struct ispsoftc *isp)
 	 * We only manage a single wait queues for dual bus controllers.
 	 * This is arguably broken.
 	 */
-	isp->isp_osinfo.wqf = isp->isp_osinfo.wqt = NULL;
+	SIMPLEQ_INIT(&isp->isp_osinfo.wq);
 
 	lptr->adapter_softc = isp;
 	lptr->adapter = &isp->isp_osinfo._adapter;
@@ -276,13 +276,7 @@ 

Re: [PATCH] tftpd: DoS vuln

2013-03-15 Thread David Gwynne

On 15/03/2013, at 9:02 AM, Maxime Villard rusty...@gmx.fr wrote:

> Hi,
> there is a huge bug in the tftp daemon.

sure, but then there's also a huge bug in your fix by your own definition of 
huge.

when oack fails it frees the client struct and everything hanging off it. now
you avoid the unconditional free of the client options straight after oack
is called by going to the error label if oack fails, which calls nak, which
then uses client and tries to free it again.
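
The rule being argued over is that the client must be torn down exactly once on every path. A minimal sketch of that shape, assuming a callee that only reports failure and a caller that owns the cleanup (struct client, send_oack() and client_open() are hypothetical stand-ins, not the tftpd code):

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct tftp_client. */
struct client {
	char	*options;
};

static int client_frees;	/* counts teardowns, for demonstration only */

/* Single teardown path: release everything hanging off the client. */
static void
client_free(struct client *c)
{
	free(c->options);
	free(c);
	client_frees++;
}

/* Stand-in for oack(): report failure, leave cleanup to the caller. */
static int
send_oack(struct client *c, int fail)
{
	(void)c;
	return (fail ? -1 : 0);
}

/* Stand-in for tftp_open(): exactly one free on every path. */
static int
client_open(struct client *c, int fail)
{
	if (c->options != NULL) {
		if (send_oack(c, fail) == -1)
			goto error;
		free(c->options);
		c->options = NULL;
	}
	return (0);

error:
	client_free(c);	/* stands in for nak(), which frees the client */
	return (-1);
}
```

If the callee also freed the client on failure, the error label would free it a second time, which is the double free under discussion.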

nice find though. i'll put a fix in shortly which will look very much like 
yours.

dlg


 
> -- /usr/src/usr.sbin/tftpd/tftpd.c --
> In tftp_open() on line 870, the daemon checks for options
> (OACK) and handle them through the oack() function. Then,
> it frees and NULLs the variable client->options.
>
> oack() - l.1390 - does some stuff, and if an error happens,
> client_free() is called with the client structure, which
> frees this structure, including client->options, but does
> not null it.
>
> So, when returning after an error, client->options is freed
> twice, causing a double-free and a crash.
>
> I succeeded in making the server crash, by changing the
> tsize and blksize options before transferring a small file
> in localhost. In fact, if you look on line 1432, you'll see
> that you just have to close the socket to make the server
> crash, just after sending the OACK packet.
>
> Typically, I just have to do:
>  $ tftp localhost
>  tftp> tsize 2
>  Tsize option on.
>  tftp> blksize 10
>  tftp> get test
>  [...wait ~ 10 sec...]
>  tftpd in free(): error: chunk is already free 0x1a658ca65f40
>
> Here is a patch. I switched oack() to int, so that we can
> handle errors and call nak() to alert the client before
> closing the connection. nak() frees the client structure,
> so all goes well.
>
> Ok/Comments?
>
>
> Index: tftpd.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/tftpd/tftpd.c,v
> retrieving revision 1.8
> diff -u -r1.8 tftpd.c
> --- tftpd.c	13 Jul 2012 02:31:46 -0000	1.8
> +++ tftpd.c	15 Mar 2013 00:36:40 -0000
> @@ -168,7 +168,7 @@
>  void		tftp(struct tftp_client *, struct tftphdr *, size_t);
>  void		tftp_open(struct tftp_client *, const char *);
>  void		nak(struct tftp_client *, int);
> -void		oack(struct tftp_client *);
> +int		oack(struct tftp_client *);
>  void		oack_done(int, short, void *);
>  
>  void		sendfile(struct tftp_client *);
> @@ -876,8 +876,8 @@
>  		goto error;
>  
>  	if (client->options) {
> -		oack(client);
> -
> +		if (oack(client) == -1)
> +			goto error;
>  		free(client->options);
>  		client->options = NULL;
>  	} else if (client->opcode == WRQ) {
> @@ -1386,7 +1386,7 @@
>  /*
>   * Send an oack packet (option acknowledgement).
>   */
> -void
> +int
>  oack(struct tftp_client *client)
>  {
>  	struct opt_client *options = client->options;
> @@ -1436,10 +1436,10 @@
>  	    oack_done, client);
>  
>  	event_add(&client->sev, &client->tv);
> -	return;
> +	return 0;
>  
>  error:
> -	client_free(client);
> +	return -1;
>  }
>  
>  int
 




Re: [PATCH] tftpd: DoS vuln

2013-03-17 Thread David Gwynne
good point.

On 16/03/2013, at 10:17 PM, Maxime Villard rusty...@gmx.fr wrote:

> On 15/03/2013 12:17, David Gwynne wrote:
>>
>> On 15/03/2013, at 9:02 AM, Maxime Villard rusty...@gmx.fr wrote:
>>
>>> Hi,
>>> there is a huge bug in the tftp daemon.
>>
>> sure, but then there's also a huge bug in your fix by your own definition of
>> huge.
>> when oack fails it frees the client struct and everything hanging off it.
>> now you avoid the unconditional free of the client options straight after
>> oack is called by going to the error label if oack fails, which calls nak,
>> which then uses client and tries to free it again.
>>
>
> No. I removed the call to client_free() in oack():
>
> -	client_free(client);
> +	return -1;
>
> There is no double-free in my patch. But I saw you committed
> fixes, so ok now.
>
>> nice find though. i'll put a fix in shortly which will look very much like
>> yours.
>>
>> dlg
>>
 
 
 
 
 
 
 




better filenames for certificates in relayd

2013-03-19 Thread David Gwynne
this lets the code that picks the filenames to use for certificates
fall through to using the services name, instead of just the ip
addresses of the service.

eg, if i have this in relayd.conf:

relay sslnews.eait.uq.edu.au {
listen on 0.0.0.0 port 563 ssl
forward to news port 119 check send  expect 200 *
protocol sslencap
}

i can have this on disk:

/etc/ssl/private/sslnews.eait.uq.edu.au.key
/etc/ssl/sslnews.eait.uq.edu.au.crt

and it works(tm).

it makes it easier to separate the service (relayd) from the hosts
underlying configuration. imagine a pool of boxes doing ssl offloading
with a centrally managed relayd.conf.
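
The lookup order this gives can be sketched as a small helper that produces the candidate base names in turn: address and port, then address alone, then the relay name. cert_candidate() is a hypothetical function for illustration only; the diff below does the same thing inline in relay_load_certfiles().

```c
#include <stdio.h>

/*
 * Build the i-th candidate base name for a relay's certificate files.
 * Order: "addr:port", then "addr", then the relay name.
 * Returns the name length, or -1 on a bad index, error or truncation.
 */
static int
cert_candidate(char *buf, size_t len, int i, const char *addr,
    unsigned int port, const char *name)
{
	int n;

	switch (i) {
	case 0:
		n = snprintf(buf, len, "%s:%u", addr, port);
		break;
	case 1:
		n = snprintf(buf, len, "%s", addr);
		break;
	case 2:
		n = snprintf(buf, len, "%s", name);
		break;
	default:
		return (-1);
	}
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (n);
}
```

A caller would try index 0, 1 and 2 in order and stop at the first name for which both the .crt and the .key load. The sketch also treats a truncated name as an error, which a plain comparison of the snprintf() result against -1 would not catch.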

ok?

Index: relay.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/relay.c,v
retrieving revision 1.164
diff -u -p -r1.164 relay.c
--- relay.c	10 Mar 2013 23:32:53 -0000	1.164
+++ relay.c	19 Mar 2013 07:49:28 -0000
@@ -42,6 +42,7 @@
 #include <pwd.h>
 #include <event.h>
 #include <fnmatch.h>
+#include <netdb.h>
 
 #include <openssl/ssl.h>
 
@@ -81,6 +82,7 @@ void		 relay_ssl_readcb(int, short, void
 void		 relay_ssl_writecb(int, short, void *);
 
 char		*relay_load_file(const char *, off_t *);
+int		 relay_load_certfile(struct relay *, const char *);
 static __inline int
 		 relay_proto_cmp(struct protonode *, struct protonode *);
 extern void	 bufferevent_read_pressure_cb(struct evbuffer *, size_t,
@@ -2352,10 +2354,38 @@ relay_load_file(const char *name, off_t 
 }
 
 int
+relay_load_certfile(struct relay *rlay, const char *cert)
+{
+	char	 file[PATH_MAX];
+
+	if (snprintf(file, sizeof(file),
+	    "/etc/ssl/%s.crt", cert) == -1)
+		return (-1);
+
+	if ((rlay->rl_ssl_cert = relay_load_file(file,
+	    &rlay->rl_conf.ssl_cert_len)) == NULL)
+		return (-1);
+
+	log_debug("%s: using certificate %s", __func__, file);
+
+	if (snprintf(file, sizeof(file),
+	    "/etc/ssl/private/%s.key", cert) == -1)
+		return -1;
+
+	if ((rlay->rl_ssl_key = relay_load_file(file,
+	    &rlay->rl_conf.ssl_key_len)) == NULL)
+		return (-1);
+
+	log_debug("%s: using private key %s", __func__, file);
+
+	return (0);
+}
+
+int
 relay_load_certfiles(struct relay *rlay)
 {
 	char	 certfile[PATH_MAX];
-	char	 hbuf[sizeof("ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255")];
+	char	 hbuf[NI_MAXHOST];
 	struct protocol *proto = rlay->rl_proto;
 	int	 useport = htons(rlay->rl_conf.port);
 
@@ -2372,36 +2402,19 @@ relay_load_certfiles(struct relay *rlay)
 	if (print_host(&rlay->rl_conf.ss, hbuf, sizeof(hbuf)) == NULL)
 		return (-1);
 
-	if (snprintf(certfile, sizeof(certfile),
-	    "/etc/ssl/%s:%u.crt", hbuf, useport) == -1)
+	if (snprintf(certfile, sizeof(certfile), "%s:%u",
+	    hbuf, useport) == -1)
 		return (-1);
-	if ((rlay->rl_ssl_cert = relay_load_file(certfile,
-	    &rlay->rl_conf.ssl_cert_len)) == NULL) {
-		if (snprintf(certfile, sizeof(certfile),
-		    "/etc/ssl/%s.crt", hbuf) == -1)
-			return (-1);
-		if ((rlay->rl_ssl_cert = relay_load_file(certfile,
-		    &rlay->rl_conf.ssl_cert_len)) == NULL)
-			return (-1);
-		useport = 0;
-	}
-	log_debug("%s: using certificate %s", __func__, certfile);
+	if (relay_load_certfile(rlay, certfile) == 0)
+		return (0);
 
-	if (useport) {
-		if (snprintf(certfile, sizeof(certfile),
-		    "/etc/ssl/private/%s:%u.key", hbuf, useport) == -1)
-			return -1;
-	} else {
-		if (snprintf(certfile, sizeof(certfile),
-		    "/etc/ssl/private/%s.key", hbuf) == -1)
-			return -1;
-	}
-	if ((rlay->rl_ssl_key = relay_load_file(certfile,
-	    &rlay->rl_conf.ssl_key_len)) == NULL)
-		return (-1);
-	log_debug("%s: using private key %s", __func__, certfile);
+	if (relay_load_certfile(rlay, hbuf) == 0)
+		return (0);
 
-	return (0);
+	if (relay_load_certfile(rlay, rlay->rl_conf.name) == 0)
+		return (0);
+
+	return (-1);
 }
 
 static __inline int
Index: relayd.conf.5
===
RCS file: /cvs/src/usr.sbin/relayd/relayd.conf.5,v
retrieving revision 1.132
diff -u -p -r1.132 relayd.conf.5
--- relayd.conf.5   29 Nov 2012 01:01:53 -  1.132
+++ relayd.conf.5   19 Mar 2013 07:49:28 -
@@ -665,6 +665,11 @@ If these files are not present, the rela
 .Pa /etc/ssl/private/address.key
 and
 .Pa /etc/ssl/address.crt .
+If those files are not present, the relay will finally try to use
+.Pa /etc/ssl/private/name.key
+and 
+.Pa 

Re: better filenames for certificates in relayd

2013-03-19 Thread David Gwynne

On 19/03/2013, at 7:56 PM, Reyk Floeter r...@openbsd.org wrote:

> On Tue, Mar 19, 2013 at 05:57:16PM +1000, David Gwynne wrote:
>> this lets the code that picks the filenames to use for certificates
>> fall through to using the services name, instead of just the ip
>> addresses of the service.
>>
>> eg, if i have this in relayd.conf:
>>
>>  relay sslnews.eait.uq.edu.au {
>>  listen on 0.0.0.0 port 563 ssl
>>  forward to news port 119 check send  expect 200 *
>>  protocol sslencap
>>  }
>>
>> i can have this on disk:
>>
>>  /etc/ssl/private/sslnews.eait.uq.edu.au.key
>>  /etc/ssl/sslnews.eait.uq.edu.au.crt
>>
>> and it works(tm).
>>
>> it makes it easier to separate the service (relayd) from the hosts
>> underlying configuration. imagine a pool of boxes doing ssl offloading
>> with a centrally managed relayd.conf.
>>
>> ok?
>>
>
> "better" is a definition based on your setup - using the ip-based
> scheme allows to use the same cert+key files for multiple relays
> running on the same ip but a different port.  this is actually also
> very common and better for 50% of the other users :) but as long as
> you keep the current behavior and check the ip-based keys / certs
> first, like your diff does, it should be ok.

yeah, the port and host certs are preferred over the name ones.

in my mind the best solution would be to require the user to explicitly specify 
the files in the configuration.

> but please wait with this diff for three reasons:
> - it conflicts with the bigger ssl inspection diff that should go in
> first (in next few days).

ok.

> - we need to look at SNI, it is highly demanded by many, and this
> might require adjustments to the configuration logic as well.

specifying multiple certs by name would make sense for that.

> - i would like to review the diff more carefully but i'll fly home in
> a few hours and de-jetlag afterwards before i'm able to do it.

while i was driving home i realised that i leak memory if a port/host cert 
exists but its key doesnt. so yeah, it could do with some tweaks.
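
One way to avoid that kind of leak is to treat the certificate and key as a unit and only publish them once both have loaded. The following is a sketch under that assumption, with hypothetical names (struct cert_pair, load_file(), load_pair()); it is not the relayd code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical holder for one certificate/key pair. */
struct cert_pair {
	char	*cert;
	char	*key;
};

/* Read a whole file into a NUL-terminated malloc'd buffer; NULL on failure. */
static char *
load_file(const char *path)
{
	FILE *fp;
	char *buf = NULL;
	long len;

	if ((fp = fopen(path, "rb")) == NULL)
		return (NULL);
	if (fseek(fp, 0, SEEK_END) == 0 && (len = ftell(fp)) >= 0 &&
	    fseek(fp, 0, SEEK_SET) == 0 &&
	    (buf = malloc((size_t)len + 1)) != NULL) {
		if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
			free(buf);
			buf = NULL;
		} else
			buf[len] = '\0';
	}
	fclose(fp);
	return (buf);
}

/*
 * Load a certificate and its key together.  On any failure nothing is
 * stored in *cp and nothing stays allocated, so the next candidate
 * name can be tried without leaking the earlier certificate.
 */
static int
load_pair(struct cert_pair *cp, const char *crt, const char *key)
{
	char *c, *k;

	if ((c = load_file(crt)) == NULL)
		return (-1);
	if ((k = load_file(key)) == NULL) {
		free(c);	/* release the cert when its key is missing */
		return (-1);
	}
	cp->cert = c;
	cp->key = k;
	return (0);
}
```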

dlg

 
> reyk
 

Re: bge: use BUS_DMA_NOWAIT in functions called from the timeout

2013-05-21 Thread David Gwynne
sad but ok dlg@

On 22/05/2013, at 2:06 AM, Mark Kettenis mark.kette...@xs4all.nl wrote:

>> From: Theo de Raadt dera...@cvs.openbsd.org
>> Date: Tue, 21 May 2013 09:23:04 -0600
>>
>>> I'd say no.  Why is the driver tearing down and reinitializing the dma
>>> maps when a watchdog timeout happens?  That's just wrong.
>>
>> Because bge_watchdog() simply calls bge_init() to redo everything.  It
>> is simply the way the driver was written.
>>
>> During autoconfig, NOWAIT will be OK.  bge is not a hotplug capable
>> chipset either, so the non-watchdog bge_init() code path will get the
>> resources it wants.
>>
>> The change seems sane enough relative to the alternative of rewriting
>> everything.
>
> Fair enough.  Indeed other drivers are doing the same thing.  Guess
> that if you get a watchdog timeout under memory pressure, you're
> simply screwed.
>
> ok kettenis@



Re: bge: don't use autopoll on anything above BCM5705

2013-05-29 Thread David Gwynne
ive tested this on:

bge0 at pci4 dev 0 function 0 "Broadcom BCM5721" rev 0x21, BCM5750 C1 (0x4201):
apic 0 int 16, address 00:18:f3:d1:80:64
brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0

bge1 at pci5 dev 0 function 0 "Broadcom BCM5720" rev 0x00, BCM5720 A0
(0x5720000), APE firmware NCSI 1.0.85.0: apic 0 int 16, address
d4:ae:52:a7:05:38
brgphy1 at bge1 phy 1: BCM5720C 10/100/1000baseT PHY, rev. 0
bge2 at pci5 dev 0 function 1 "Broadcom BCM5720" rev 0x00, BCM5720 A0
(0x5720000), APE firmware NCSI 1.0.85.0: apic 0 int 17, address
d4:ae:52:a7:05:39
brgphy2 at bge2 phy 2: BCM5720C 10/100/1000baseT PHY, rev. 0

bge0 at pci3 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003):
ivec 0x795, address 00:14:4f:a9:34:90
brgphy0 at bge0 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0
bge1 at pci3 dev 4 function 1 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003):
ivec 0x796, address 00:14:4f:a9:34:91
brgphy1 at bge1 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0

bge2 at pci7 dev 0 function 0 "Broadcom BCM5719" rev 0x01, unknown BCM5719
(0x5719001), APE firmware NCSI 1.0.60.0: ivec 0x795, address 00:10:18:e5:e1:b8
brgphy2 at bge2 phy 1: BCM5719C 10/100/1000baseT PHY, rev. 0
bge3 at pci7 dev 0 function 1 "Broadcom BCM5719" rev 0x01, unknown BCM5719
(0x5719001), APE firmware NCSI 1.0.60.0: ivec 0x796, address 00:10:18:e5:e1:b9
brgphy3 at bge3 phy 2: BCM5719C 10/100/1000baseT PHY, rev. 0
bge4 at pci7 dev 0 function 2 "Broadcom BCM5719" rev 0x01, unknown BCM5719
(0x5719001), APE firmware NCSI 1.0.60.0: ivec 0x795, address 00:10:18:e5:e1:ba
brgphy4 at bge4 phy 3: BCM5719C 10/100/1000baseT PHY, rev. 0
bge5 at pci7 dev 0 function 3 "Broadcom BCM5719" rev 0x01, unknown BCM5719
(0x5719001), APE firmware NCSI 1.0.60.0: ivec 0x796, address 00:10:18:e5:e1:bb
brgphy5 at bge5 phy 4: BCM5719C 10/100/1000baseT PHY, rev. 0
bge6 at pci13 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3
(0x9003): ivec 0x7d6, address 00:14:4f:a9:34:92

working fine on each of them.

makes sense to me. ok.

On 28/05/2013, at 12:53 AM, Mike Belopuhov m...@belopuhov.com wrote:

> Hi,
>
> While trying to fix the link state bug on BCM5719, David Imhoff
> has arrived at conclusion that the chip won't generate proper
> link state interrupts which renders auto-polling mode useless.
>
> As it turns out neither Linux nor FreeBSD use auto-polling mode
> for anything newer than BCM5705 and recent Broadcom documentation
> is not describing this method at all.
>
> This diff brings us in line with others, but requires heavy
> testing on currently supported hardware.
>
> OK's are welcome as well.
>
> diff --git sys/dev/pci/if_bge.c sys/dev/pci/if_bge.c
> index 1d37192..8007108 100644
> --- sys/dev/pci/if_bge.c
> +++ sys/dev/pci/if_bge.c
> @@ -1055,10 +1055,22 @@ bge_miibus_statchg(struct device *dev)
>  	    (mii->mii_media_active & IFM_ETH_FMASK) != sc->bge_flowflags) {
>  		sc->bge_flowflags = mii->mii_media_active & IFM_ETH_FMASK;
>  		mii->mii_media_active &= ~IFM_ETH_FMASK;
>  	}
>  
> +	if (!BGE_STS_BIT(sc, BGE_STS_LINK) &&
> +	    mii->mii_media_status & IFM_ACTIVE &&
> +	    IFM_SUBTYPE(mii->mii_media_active) != IFM_NONE)
> +		BGE_STS_SETBIT(sc, BGE_STS_LINK);
> +	else if (BGE_STS_BIT(sc, BGE_STS_LINK) &&
> +	    (!(mii->mii_media_status & IFM_ACTIVE) ||
> +	    IFM_SUBTYPE(mii->mii_media_active) == IFM_NONE))
> +		BGE_STS_CLRBIT(sc, BGE_STS_LINK);
> +
> +	if (!BGE_STS_BIT(sc, BGE_STS_LINK))
> +		return;
> +
>  	/* Set the port mode (MII/GMII) to match the link speed. */
>  	mac_mode = CSR_READ_4(sc, BGE_MAC_MODE) &
>  	    ~(BGE_MACMODE_PORTMODE | BGE_MACMODE_HALF_DUPLEX);
>  	tx_mode = CSR_READ_4(sc, BGE_TX_MODE);
>  	rx_mode = CSR_READ_4(sc, BGE_RX_MODE);
> @@ -1773,11 +1785,11 @@ int
>  bge_blockinit(struct bge_softc *sc)
>  {
>  	volatile struct bge_rcb *rcb;
>  	vaddr_t			rcb_addr;
>  	bge_hostaddr		taddr;
> -	u_int32_t		dmactl, val;
> +	u_int32_t		dmactl, mimode, val;
>  	int			i, limit;
>  
>  	/*
>  	 * Initialize the memory window pointer register so that
>  	 * we can access the first 32K of internal NIC RAM. This will
> @@ -2369,13 +2381,23 @@ bge_blockinit(struct bge_softc *sc)
>  
>  	/* Enable PHY auto polling (for MII/GMII only) */
>  	if (sc->bge_flags & BGE_PHY_FIBER_TBI) {
>  		CSR_WRITE_4(sc, BGE_MI_STS, BGE_MISTS_LINK);
>  	} else {
> -		BGE_STS_SETBIT(sc, BGE_STS_AUTOPOLL);
> -		BGE_SETBIT(sc, BGE_MI_MODE, BGE_MIMODE_AUTOPOLL|10<<16);
> -		if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5700)
> +		if ((sc->bge_flags & BGE_CPMU_PRESENT) != 0)
> +			mimode = BGE_MIMODE_500KHZ_CONST;
> +		else
> +			mimode = BGE_MIMODE_BASE;
> +		if (BGE_IS_5700_FAMILY(sc) ||
> +		    BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5705) {
 + mimode |= 

Re: bge: don't use autopoll on anything above BCM5705

2013-05-29 Thread David Gwynne

On 30/05/2013, at 2:56 AM, Mike Belopuhov m...@belopuhov.com wrote:
 
 cool, thanks! does your 5719 work w/o other diffs?

yes, otherwise i would have fixed it :)

dlg



Re: ipsec / PF received-on

2013-06-05 Thread David Gwynne

On 04/06/2013, at 11:22 AM, Claudio Jeker cje...@diehard.n-r-g.com wrote:

 On Mon, Jun 03, 2013 at 11:49:58PM +0100, Stuart Henderson wrote:
 On a router running PF and isakmpd, I have a rule like this:
 
 match out on pppoe0 inet all received-on vlan5 nat-to $someip
 
 I was surprised to find this being applied to packets received on vlan5
 and caught by an ipsec flow; the resulting *encapsulated* (proto ESP) packets
 (as in, generated on the router itself, not actually themselves received on
 vlan5) end up getting natted.
 
 What does anyone else think...expected or not?
 
 
 Question, would you expect the ipsec packets to match against this rule?
 match out on pppoe0 inet all received-on enc0 nat-to $someip
 
 As in should we change the received interface when we hit ipsec?
 Think carefully since this path is edged by dragons and deep dark
 rabbit holes.

there is precedent for virtual interfaces to overwrite the rcvif on the way
through. eg, em0 could become trunk0 which could become vlan0.

in this particular case though (ipsec gateway) id argue encapsulation by ipsec 
should clear the rcvif. is there any error handling for ipsec packets that 
relies on it?

dlg



let ciss(4) disks use all ccbs

2010-06-28 Thread David Gwynne
if ciss(4) has multiple disks on it, this should let any one of
those disks use all the commands. with the iopools stuff thats
now in the tree, io on another disk should then be able to start
up and get a fair share.
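
A rough sketch of the arithmetic behind this change. The function names and the numbers in the usage note are made up for illustration; only the two expressions come from the diff below.

```c
/*
 * Old behaviour: the controller's command slots are split evenly
 * between the logical disks, so one busy disk can never use more
 * than its share.
 */
static int
ciss_openings_old(int maxcmd, int maxunits)
{
	return (maxcmd / (maxunits ? maxunits : 1));
}

/*
 * New behaviour: every disk may use all the slots; sharing is left
 * to the iopool code mentioned above.
 */
static int
ciss_openings_new(int maxcmd)
{
	return (maxcmd);
}
```

With a hypothetical controller that has 64 commands and 4 logical disks, the old calculation gives each disk 16 openings, while the new one lets a single busy disk use all 64.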

i want a test from someone who has multiple LDs on a single ciss(4).

if you could run iogen on both disks at the same time, or find over
both of them while running this diff, that would be great.

Index: ciss.c
===
RCS file: /cvs/src/sys/dev/ic/ciss.c,v
retrieving revision 1.55
diff -u -p -r1.55 ciss.c
--- ciss.c  15 Jun 2010 04:11:34 -  1.55
+++ ciss.c  18 Jun 2010 10:54:37 -
@@ -369,7 +369,7 @@ ciss_attach(struct ciss_softc *sc)
 
 	sc->sc_link.device = &ciss_dev;
 	sc->sc_link.adapter_softc = sc;
-	sc->sc_link.openings = sc->maxcmd / (sc->maxunits? sc->maxunits : 1);
+	sc->sc_link.openings = sc->maxcmd;
 	sc->sc_link.adapter = &ciss_switch;
 	sc->sc_link.luns = 1;
 	sc->sc_link.adapter_target = sc->maxunits;



Re: 4K Sector Disks

2010-06-28 Thread David Gwynne
On 29/06/2010, at 12:20 PM, J.C. Roberts wrote:

 dlg,

 It took me weeks and a few failed attempts with various disk
 manufacturers, but it's done, and we have victory!

 The value from the modified atactl output for reg 106: 4000

 Finally it seems we have a disk that is properly showing us 4k sectors
 rather than lying. I kind of guessed this disk might be correct
 considering the performance drop in 512b benchmarks compared to 4k and
 larger benchmarks.

 I won't be home for a week, but if you can't find a Crucial C300
 locally in .au, let me know and I'll deal with it.

unfortunately 0x4000 doesnt mean the physical block size is 4k. it means that
the low bits of that field are a valid representation of the block size.
0x4000 is saying there is a 1:1 map from physical to logical blocks.

see atascsi.h:

u_int16_t	p2l_sect;	/* 106 */
#define ATA_ID_P2L_SECT_MASK		0xc000
#define ATA_ID_P2L_SECT_VALID		0x4000
#define ATA_ID_P2L_SECT_SET		0x2000
#define ATA_ID_P2L_SECT_SIZESET		0x1000
#define ATA_ID_P2L_SECT_SIZE		0x000f

169 is cool though ;)

u_int16_t   data_set_mgmt;  /* 169 */
#define ATA_ID_DATA_SET_MGMT_TRIM 0x0001
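
A minimal sketch of how word 106 could be decoded with the masks above. The helper name `ata_p2l_ratio` is hypothetical, and the reading of the bits (top two bits must be 01 for the word to be valid, bit 13 says the low nibble holds a power-of-two ratio) is my understanding of the ATA IDENTIFY layout rather than something taken from the driver.

```c
#include <stdint.h>

#define ATA_ID_P2L_SECT_MASK	0xc000
#define ATA_ID_P2L_SECT_VALID	0x4000
#define ATA_ID_P2L_SECT_SET	0x2000
#define ATA_ID_P2L_SECT_SIZE	0x000f

/*
 * Return the number of logical sectors per physical sector, or 0 if
 * word 106 does not carry valid information.
 */
static unsigned int
ata_p2l_ratio(uint16_t p2l_sect)
{
	if ((p2l_sect & ATA_ID_P2L_SECT_MASK) != ATA_ID_P2L_SECT_VALID)
		return (0);

	/* valid, but no "multiple logical per physical" bit: 1:1 map */
	if (!(p2l_sect & ATA_ID_P2L_SECT_SET))
		return (1);

	/* low nibble is the power-of-two exponent */
	return (1U << (p2l_sect & ATA_ID_P2L_SECT_SIZE));
}
```

Under that reading, 0x4000 decodes to a ratio of 1, which matches the 1:1 map described above. A disk with 512 byte logical sectors on 4k physical sectors would be expected to report something like 0x6003, a ratio of 8.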

dlg


   jcr

 Model: C300-CTFDDAC256MAG, Rev: 0001, Serial #: 1015C87C
 Device type: ATA, fixed
 Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 500118192
 Device capabilities:
   ATA standby timer values
   IORDY operation
   IORDY disabling
 Device supports the following standards:
 ATA-4 ATA-5 ATA-6 ATA-7 ATA-8
 Device supports the following command sets:
   NOP command
   READ BUFFER command
   WRITE BUFFER command
   Host Protected Area feature set
   Read look-ahead
   Write cache
   Power Management feature set
   Security Mode feature set
   SMART feature set
   Flush Cache Ext command
   Flush Cache command
   Device Configuration Overlay feature set
   48bit address feature set
   Set Max security extension commands
   Power-up in standby feature set
   Advanced Power Management feature set
   DOWNLOAD MICROCODE command
   IDLE IMMEDIATE with UNLOAD FEATURE
   SMART self-test
   SMART error logging
 Device has enabled the following command sets/features:
   NOP command
   READ BUFFER command
   WRITE BUFFER command
   Host Protected Area feature set
   Read look-ahead
   Write cache
   Power Management feature set
   SMART feature set
   Flush Cache Ext command
   Flush Cache command
   Device Configuration Overlay feature set
   48bit address feature set
   DOWNLOAD MICROCODE command
  0: 0x045a
  1: 0x3fff
  2: 0x
  3: 0x0010
  4: 0x7e00
  5: 0x
  6: 0x003f
  7: 0x03d3
  8: 0xfdd0
  9: 0x
 10: 0x3030
 11: 0x3030
 12: 0x3030
 13: 0x3030
 14: 0x3031
 15: 0x3531
 16: 0x3030
 17: 0x3030
 18: 0x3843
 19: 0x4337
 20: 0x
 21: 0x
 22: 0x
 23: 0x3030
 24: 0x3130
 25: 0x
 26: 0x
 27: 0x3343
 28: 0x3030
 29: 0x432d
 30: 0x4654
 31: 0x
 32: 0x4341
 33: 0x3532
 34: 0x4d36
 35: 0x4741
 36: 0x
 37: 0x
 38: 0x
 39: 0x
 40: 0x
 41: 0x
 42: 0x
 43: 0x
 44: 0x
 45: 0x
 46: 0x
 47: 0x8010
 48: 0x4000
 49: 0x2f00
 50: 0x4000
 51: 0x
 52: 0x
 53: 0x0007
 54: 0x3fff
 55: 0x0010
 56: 0x003f
 57: 0x32b0
 58: 0x1dcf
 59: 0x0110
 60: 0x
 61: 0x0fff
 62: 0x
 63: 0x0007
 64: 0x0003
 65: 0x0078
 66: 0x0078
 67: 0x0078
 68: 0x0078
 69: 0x
 70: 0x
 71: 0x
 72: 0x
 73: 0x
 74: 0x
 75: 0x001f
 76: 0x070e
 77: 0x
 78: 0x004c
 79: 0x0040
 80: 0x01f0
 81: 0x0028
 82: 0x746b
 83: 0x7d29
 84: 0x6173
 85: 0x7469
 86: 0xbc01
 87: 0x6163
 88: 0x407f
 89: 0x0005
 90: 0x0005
 91: 0x00fe
 92: 0x
 93: 0x
 94: 0x
 95: 0x0040
 96: 0x0100
 97: 0x0100
 98: 0x
 99: 0x0001
 100: 0x32b0
 101: 0x1dcf
 102: 0x
 103: 0x
 104: 0x0100
 105: 0x
 106: 0x4000
 107: 0x
 108: 0x5075
 109: 0x00a1
 110: 0x7cc8
 111: 0x
 112: 0x
 113: 0x
 114: 0x
 115: 0x
 116: 0x
 117: 0x
 118: 0x
 119: 0x401e
 120: 0x401c
 121: 0x
 122: 0x
 123: 0x
 124: 0x
 125: 0x
 126: 0x
 127: 0x
 128: 0x0029
 129: 0x3030
 130: 0x3031
 131: 0x2e45
 132: 0x312e
 133: 0x3030
 134: 0x
 135: 0x
 136: 0x
 137: 0x3334
 138: 0x3639
 139: 0x2020
 140: 0x2020
 141: 0x3342
 142: 0x4c36
 143: 0x
 144: 0x
 145: 0x
 146: 0x
 147: 0x
 148: 0x
 149: 0x
 150: 0x
 151: 0x
 152: 0x
 153: 0x
 154: 0x
 155: 0x
 156: 0x
 157: 0x
 158: 0x
 159: 0x
 160: 0x
 161: 0x
 162: 0x
 163: 0x
 164: 0x
 165: 0x
 166: 0x
 167: 0x
 168: 0x
 169: 0x0001
 170: 0x
 171: 0x
 172: 0x
 173: 0x
 174: 0x
 175: 0x
 176: 0x
 177: 0x
 178: 0x
 179: 0x
 180: 0x
 181: 0x
 182: 0x
 183: 0x
 184: 0x
 185: 0x
 186: 0x

bufq massage

2010-08-29 Thread David Gwynne
this diff is largely a mechanical change.

firstly, it makes struct bufq a member of the softc for devices
that use it, rather than it being a pointer to something that needs
to be allocated at attach. since all these devices need a bufq to
operate, it makes sense to have it allocated as part of the softc
and get bufq_init to just initialise all its fields. it also gets
rid of the possibility that you wont be able to allocate the bufq
struct during attach, which is something you dont want to happen.

secondly, it consistently implements a split between wrapper functions
and the per discipline implementation of the bufq handlers. it
consistently does the locking in the wrappers rather than doing
half in the wrappers and the other half in the implementations.

it also consistently handles the outstanding bufq bq pointer in the
wrappers.

this hides most of the implementation inside kern_bufq.c. the only
stuff left in buf.h is for the bits each implementation needs to
put bufs on their queues.
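
The split described above can be sketched in userland C. This is not the kernel code: the fifo discipline, the `b_next` link field and the `bufq_locked` flag standing in for the mutex are all assumptions made for illustration. Only the `impl_queue`/`impl_dequeue` names and the idea of embedding `struct bufq` in the softc come from the diff.

```c
#include <stddef.h>

struct buf {
	struct buf	*b_next;	/* hypothetical link for this sketch */
	int		 b_blkno;
};

/* per-discipline handlers; they never take the lock themselves */
struct bufq_impl {
	void		 (*impl_queue)(void *, struct buf *);
	struct buf	*(*impl_dequeue)(void *);
};

struct fifo {
	struct buf	 *head;
	struct buf	**tailp;
};

static void
fifo_queue(void *data, struct buf *bp)
{
	struct fifo *f = data;

	bp->b_next = NULL;
	*f->tailp = bp;
	f->tailp = &bp->b_next;
}

static struct buf *
fifo_dequeue(void *data)
{
	struct fifo *f = data;
	struct buf *bp = f->head;

	if (bp != NULL) {
		f->head = bp->b_next;
		if (f->head == NULL)
			f->tailp = &f->head;
	}
	return (bp);
}

static const struct bufq_impl bufq_impl_fifo = { fifo_queue, fifo_dequeue };

struct bufq {
	const struct bufq_impl	*bufq_impl;
	struct fifo		 bufq_fifo;
	int			 bufq_locked;	/* stand-in for a mutex */
};

/* the bufq lives inside the softc, so attach cannot fail to allocate it */
struct softc {
	struct bufq		 sc_bufq;
};

static void
bufq_init(struct bufq *bq)
{
	bq->bufq_impl = &bufq_impl_fifo;
	bq->bufq_fifo.head = NULL;
	bq->bufq_fifo.tailp = &bq->bufq_fifo.head;
	bq->bufq_locked = 0;
}

/* wrappers: all locking happens here, then dispatch to the discipline */
static void
bufq_queue(struct bufq *bq, struct buf *bp)
{
	bq->bufq_locked = 1;		/* a real mutex enter would go here */
	bq->bufq_impl->impl_queue(&bq->bufq_fifo, bp);
	bq->bufq_locked = 0;
}

static struct buf *
bufq_dequeue(struct bufq *bq)
{
	struct buf *bp;

	bq->bufq_locked = 1;
	bp = bq->bufq_impl->impl_dequeue(&bq->bufq_fifo);
	bq->bufq_locked = 0;
	return (bp);
}
```

A driver would call `bufq_init(&sc->sc_bufq)` once at attach and then only ever use the wrappers, which is the shape the wd.c changes below follow.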

ive tested this extensively on sd(4) and thib has tested this on
wd(4). we'd like some wider exposure, especially over suspends and
resumes on a variety of machines. i have tried to preserve the
locking semantics, but testing would be lovely.

ok?

dlg

Index: dev/ata/wd.c
===
RCS file: /cvs/src/sys/dev/ata/wd.c,v
retrieving revision 1.85
diff -u -p -r1.85 wd.c
--- dev/ata/wd.c28 Jun 2010 08:35:46 -  1.85
+++ dev/ata/wd.c25 Aug 2010 12:05:33 -
@@ -121,7 +121,7 @@ struct wd_softc {
 	/* General disk infos */
 	struct device sc_dev;
 	struct disk sc_dk;
-	struct bufq *sc_bufq;
+	struct bufq sc_bufq;
 
 	/* IDE disk soft states */
 	struct ata_bio sc_wdc_bio; /* current transfer */
@@ -369,7 +369,7 @@ wdattach(struct device *parent, struct d
 	 */
 	wd->sc_dk.dk_driver = &wddkdriver;
 	wd->sc_dk.dk_name = wd->sc_dev.dv_xname;
-	wd->sc_bufq = bufq_init(BUFQ_DEFAULT);
+	bufq_init(&wd->sc_bufq, BUFQ_DEFAULT);
 	wd->sc_sdhook = shutdownhook_establish(wd_shutdown, wd);
 	if (wd->sc_sdhook == NULL)
 		printf("%s: WARNING: unable to establish shutdown hook\n",
@@ -413,7 +413,7 @@ wddetach(struct device *self, int flags)
 
 	/* Remove unprocessed buffers from queue */
 	s = splbio();
-	while ((bp = BUFQ_DEQUEUE(sc->sc_bufq)) != NULL) {
+	while ((bp = bufq_dequeue(&sc->sc_bufq)) != NULL) {
 		bp->b_error = ENXIO;
 		bp->b_flags |= B_ERROR;
 		biodone(bp);
@@ -435,7 +435,7 @@ wddetach(struct device *self, int flags)
 		shutdownhook_disestablish(sc->sc_sdhook);
 
 	/* Detach disk. */
-	bufq_destroy(sc->sc_bufq);
+	bufq_destroy(&sc->sc_bufq);
 	disk_detach(&sc->sc_dk);
 
 	return (0);
@@ -486,7 +486,7 @@ wdstrategy(struct buf *bp)
 	    (wd->sc_flags & (WDF_WLABEL|WDF_LABELLING)) != 0) <= 0)
 		goto done;
 	/* Queue transfer on drive, activate drive and controller if idle. */
-	BUFQ_QUEUE(wd->sc_bufq, bp);
+	bufq_queue(&wd->sc_bufq, bp);
 	s = splbio();
 	wdstart(wd);
 	splx(s);
@@ -518,7 +518,7 @@ wdstart(void *arg)
 	while (wd->openings > 0) {
 
 		/* Is there a buf for us ? */
-		if ((bp = BUFQ_DEQUEUE(wd->sc_bufq)) == NULL)
+		if ((bp = bufq_dequeue(&wd->sc_bufq)) == NULL)
 			return;
 		/*
 		 * Make the command. First lock the device
Index: kern/kern_bufq.c
===
RCS file: /cvs/src/sys/kern/kern_bufq.c,v
retrieving revision 1.14
diff -u -p -r1.14 kern_bufq.c
--- kern/kern_bufq.c19 Jul 2010 21:39:15 -  1.14
+++ kern/kern_bufq.c25 Aug 2010 12:05:33 -
@@ -30,45 +30,70 @@ SLIST_HEAD(, bufq)  bufqs = SLIST_HEAD_IN
 struct mutex   bufqs_mtx = MUTEX_INITIALIZER(IPL_NONE);
 intbufqs_stop;
 
-struct buf *(*bufq_dequeuev[BUFQ_HOWMANY])(struct bufq *, int) = {
-   bufq_disksort_dequeue,
-   bufq_fifo_dequeue
+struct bufq_impl {
+   void*(*impl_create)(void);
+   void (*impl_destroy)(void *);
+
+   void (*impl_queue)(void *, struct buf *);
+   struct buf  *(*impl_dequeue)(void *);
+   void (*impl_requeue)(void *, struct buf *);
+   int  (*impl_peek)(void *);
 };
-void (*bufq_queuev[BUFQ_HOWMANY])(struct bufq *, struct buf *) = {
+
+void   *bufq_disksort_create(void);
+voidbufq_disksort_destroy(void *);
+voidbufq_disksort_queue(void *, struct buf *);
+struct buf *bufq_disksort_dequeue(void *);
+voidbufq_disksort_requeue(void *, struct buf *);
+int bufq_disksort_peek(void *);
+
+struct bufq_impl bufq_impl_disksort = {
+   bufq_disksort_create,
+   

Re: Backout mclgeti for vr(4).

2010-08-29 Thread David Gwynne
unless someone fixes mclgeti in this driver in the next 24 hours, this should
go in.

this has my ok on august 31.

On 28/08/2010, at 4:07 AM, Thordur I Bjornsson wrote:

 As seen on misc@, and also by myself sometime ago, but I forgot about
 it as work got hectic and I was moving.

 Anyways, vr(4) will not even survive a few ping -f's before the pools
 become corrupt; this (obviously) has something to do with how MCLGETI
 takes and puts bufs off the rings.

 Reverting MCLGETI fixes the issue. I've attached a diff; I remember
 testing it, and it survives fine.

 I did spend some time back then trying to figure this out, but to no avail.
 The problem is that the card is apparently still messing with mbufs after
 they have been taken off the ring (this is kind of similar to rev 1.94, I
 think).

 Index: dev/pci/if_vr.c
 ===
 RCS file: /home/cvs/src/sys/dev/pci/if_vr.c,v
 retrieving revision 1.105
 diff -u -p -r1.105 if_vr.c
 --- dev/pci/if_vr.c   19 May 2010 15:27:35 -  1.105
 +++ dev/pci/if_vr.c   5 Aug 2010 15:59:05 -
 @@ -1,4 +1,4 @@
 -/*   $OpenBSD: if_vr.c,v 1.105 2010/05/19 15:27:35 oga Exp $ */
 +/*   $OpenBSD: if_vr.c,v 1.95 2009/06/04 16:56:20 sthen Exp $*/

 /*
  * Copyright (c) 1997, 1998
 @@ -135,10 +135,9 @@ void vr_setcfg(struct vr_softc *, int);
 void vr_iff(struct vr_softc *);
 void vr_reset(struct vr_softc *);
 int vr_list_rx_init(struct vr_softc *);
 -void vr_fill_rx_ring(struct vr_softc *);
 int vr_list_tx_init(struct vr_softc *);

 -int vr_alloc_mbuf(struct vr_softc *, struct vr_chain_onefrag *);
 +int vr_alloc_mbuf(struct vr_softc *, struct vr_chain_onefrag *, struct mbuf
*);

 /*
  * Supported devices  quirks
 @@ -664,7 +663,6 @@ vr_attach(struct device *parent, struct
   /*
* Call MI attach routines.
*/
 - m_clsetwms(ifp, MCLBYTES, 2, VR_RX_LIST_CNT - 1);
   if_attach(ifp);
   ether_ifattach(ifp);
   return;
 @@ -749,6 +747,9 @@ vr_list_rx_init(struct vr_softc *sc)
   sc-sc_listmap-dm_segs[0].ds_addr +
   offsetof(struct vr_list_data, vr_rx_list[i]);

 + if (vr_alloc_mbuf(sc, cd-vr_rx_chain[i], NULL))
 + return (ENOBUFS);
 +
   if (i == (VR_RX_LIST_CNT - 1))
   nexti = 0;
   else
 @@ -760,30 +761,11 @@ vr_list_rx_init(struct vr_softc *sc)
   offsetof(struct vr_list_data, vr_rx_list[nexti]));
   }

 - cd-vr_rx_prod = cd-vr_rx_cons = cd-vr_rx_chain[0];
 - cd-vr_rx_cnt = 0;
 - vr_fill_rx_ring(sc);
 + cd-vr_rx_head = cd-vr_rx_chain[0];

   return (0);
 }

 -void
 -vr_fill_rx_ring(struct vr_softc *sc)
 -{
 - struct vr_chain_data*cd;
 - struct vr_list_data *ld;
 -
 - cd = sc-vr_cdata;
 - ld = sc-vr_ldata;
 -
 - while (cd-vr_rx_cnt  VR_RX_LIST_CNT) {
 - if (vr_alloc_mbuf(sc, cd-vr_rx_prod))
 - break;
 - cd-vr_rx_prod = cd-vr_rx_prod-vr_nextdesc;
 - cd-vr_rx_cnt++;
 - }
 -}
 -
 /*
  * A frame has been uploaded: pass the resulting mbuf chain up to
  * the higher level protocols.
 @@ -791,7 +773,7 @@ vr_fill_rx_ring(struct vr_softc *sc)
 void
 vr_rxeof(struct vr_softc *sc)
 {
 - struct mbuf *m;
 + struct mbuf *m0, *m;
   struct ifnet*ifp;
   struct vr_chain_onefrag *cur_rx;
   int total_len = 0;
 @@ -799,21 +781,20 @@ vr_rxeof(struct vr_softc *sc)

   ifp = sc-arpcom.ac_if;

 - while(sc-vr_cdata.vr_rx_cnt  0) {
 + for (;;) {
 +
   bus_dmamap_sync(sc-sc_dmat, sc-sc_listmap,
   0, sc-sc_listmap-dm_mapsize,
   BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 - rxstat = letoh32(sc-vr_cdata.vr_rx_cons-vr_ptr-vr_status);
 + rxstat = letoh32(sc-vr_cdata.vr_rx_head-vr_ptr-vr_status);
   if (rxstat  VR_RXSTAT_OWN)
   break;

 - rxctl = letoh32(sc-vr_cdata.vr_rx_cons-vr_ptr-vr_ctl);
 + rxctl = letoh32(sc-vr_cdata.vr_rx_head-vr_ptr-vr_ctl);

 - cur_rx = sc-vr_cdata.vr_rx_cons;
 - m = cur_rx-vr_mbuf;
 - cur_rx-vr_mbuf = NULL;
 - sc-vr_cdata.vr_rx_cons = cur_rx-vr_nextdesc;
 - sc-vr_cdata.vr_rx_cnt--;
 + m0 = NULL;
 + cur_rx = sc-vr_cdata.vr_rx_head;
 + sc-vr_cdata.vr_rx_head = cur_rx-vr_nextdesc;

   /*
* If an error occurs, update stats, clear the
 @@ -843,13 +824,24 @@ vr_rxeof(struct vr_softc *sc)
   printf(\n);
 #endif

 - m_freem(m);
 + /* Reinitialize descriptor */
 + cur_rx-vr_ptr-vr_status = htole32(VR_RXSTAT);
 + cur_rx-vr_ptr-vr_data =
 + htole32(cur_rx-vr_map-dm_segs[0].ds_addr +
 +

raise the openings on mpi(4)

2010-09-13 Thread David Gwynne
last time i tried this i caused weird issues on people using the
SPI variants of mpi(4).

this restricts the large number of openings to the SAS or FC mpi(4)
variants, both of which i have successfully tested myself.
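
A small sketch of the selection logic, with stand-in names: `enum port_type` and `mpi_openings` are hypothetical and only mirror the condition in the diff below, where the real code tests `sc_porttype` against `MPI_PORTFACTS_PORTTYPE_SCSI`.

```c
/* stand-ins for the port types the controller can report */
enum port_type { PORT_SCSI, PORT_SAS, PORT_FC };

static int
mpi_openings(enum port_type type, int maxcmds, int buswidth)
{
	/* SPI keeps the conservative per-target share */
	if (type == PORT_SCSI)
		return (maxcmds / buswidth);

	/* SAS and FC may use every command on any one target */
	return (maxcmds);
}
```

With made-up values of 128 commands and a bus width of 16, an SPI target gets 8 openings while a SAS or FC target gets all 128.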

ok?

Index: mpi.c
===
RCS file: /cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.161
diff -u -p -r1.161 mpi.c
--- mpi.c   13 Sep 2010 07:48:12 -  1.161
+++ mpi.c   13 Sep 2010 07:49:52 -
@@ -339,7 +340,10 @@ mpi_attach(struct mpi_softc *sc)
 	sc->sc_link.adapter_softc = sc;
 	sc->sc_link.adapter_target = sc->sc_target;
 	sc->sc_link.adapter_buswidth = sc->sc_buswidth;
-	sc->sc_link.openings = sc->sc_maxcmds / sc->sc_buswidth;
+	if (sc->sc_porttype == MPI_PORTFACTS_PORTTYPE_SCSI)
+		sc->sc_link.openings = sc->sc_maxcmds / sc->sc_buswidth;
+	else
+		sc->sc_link.openings = sc->sc_maxcmds;
 	sc->sc_link.pool = &sc->sc_iopool;
 
 	bzero(&saa, sizeof(saa));



Re: dhcpd autoproxy option

2010-10-19 Thread David Gwynne
On 19/10/2010, at 4:17 AM, Driton Husnovic wrote:

 Hi @Tech,
 
 I want to configure dhcpd with WPAD option. I see that option-252 is 
 renamed/moved as  autoproxy.
 But there is no manual or reference about autoproxy.
 Can anyone add any syntax or mini example to the list please?

option autoproxy-script "http://example.com/autoproxy.pac";

cheers,
dlg



enable vmt(4) by default

2010-10-25 Thread David Gwynne
thanks to work by jonathan matthew, vmt is now actually useful for
more than helping your clocks keep in sync.

is there a reason we shouldnt enable it?

cheers,
dlg

Index: amd64/conf/GENERIC
===
RCS file: /cvs/src/sys/arch/amd64/conf/GENERIC,v
retrieving revision 1.305
diff -u -p -r1.305 GENERIC
--- amd64/conf/GENERIC  4 Oct 2010 09:32:43 -   1.305
+++ amd64/conf/GENERIC  26 Oct 2010 01:27:36 -
@@ -59,7 +59,7 @@ aibs* at acpi?
 mpbios0at bios0
 
 ipmi0  at mainbus? disable # IPMI
-#vmt0  at mainbus? # VMware Tools
+vmt0   at mainbus? # VMware Tools
 
 option PCIVERBOSE
 option USBVERBOSE
Index: i386/conf/GENERIC
===
RCS file: /cvs/src/sys/arch/i386/conf/GENERIC,v
retrieving revision 1.699
diff -u -p -r1.699 GENERIC
--- i386/conf/GENERIC   4 Oct 2010 09:32:43 -   1.699
+++ i386/conf/GENERIC   26 Oct 2010 01:27:36 -
@@ -44,7 +44,7 @@ acpi0 at bios?
 mpbios0at bios0
 pcibios0 at bios0 flags 0x # use 0x30 for a total verbose
 ipmi0  at mainbus? disable # IPMI
-#vmt0  at mainbus? # VMware Tools
+vmt0   at mainbus? # VMware Tools
 esm0   at mainbus? # Dell Embedded Server Management
 amdmsr0at mainbus? # MSR access for AMD Geode LX CPUs with 
GP



allow relayd to install a route using a specific interface

2010-11-05 Thread David Gwynne
if you have two carped routers and you also want to redistribute
routes that relayd inserts into the kernel via ospf or bgp, but
only on the router that has the master carp interface, then this
diff should allow you to do so.

in relayd you can have a config like:

table <routers> { $gw1 ip ttl 1 }
router somenet {
	route 192.168.1.0/24
	interface carp0
	rtlabel somenet
	forward to <routers> check icmp
}

and in ospfd:

redistribute rtlabel somenet

this will cause the firewall to advertise a route to 192.168.1.0/24
only if it can ping $gw1 AND only if the carp0 interface is up (ie,
it is the master).

my intended use is to build very redundant anycast setups for
services checked with relayd.

ok?

Index: parse.y
===
RCS file: /cvs/src/usr.sbin/relayd/parse.y,v
retrieving revision 1.149
diff -u -p -r1.149 parse.y
--- parse.y 26 Oct 2010 15:04:37 -  1.149
+++ parse.y 5 Nov 2010 08:30:19 -
@@ -1501,6 +1501,18 @@ routeoptsl	: ROUTE address '/' NUMBER {
 			router->rt_conf.gwtable = $3->conf.id;
 			router->rt_conf.gwport = $3->conf.port;
 		}
+		| INTERFACE STRING {
+			size_t rv;
+
+			rv = strlcpy(router->rt_conf.ifname, $2,
+			    sizeof(router->rt_conf.ifname));
+			free($2);
+
+			if (rv >= sizeof(router->rt_conf.ifname)) {
+				yyerror("router interface name truncated");
+				YYERROR;
+			}
+		}
 		| RTABLE NUMBER {
 			if (router->rt_conf.rtable) {
 				yyerror("router %s rtable already specified",
Index: pfe_route.c
===
RCS file: /cvs/src/usr.sbin/relayd/pfe_route.c,v
retrieving revision 1.1
diff -u -p -r1.1 pfe_route.c
--- pfe_route.c 13 Aug 2009 13:51:21 -  1.1
+++ pfe_route.c 5 Nov 2010 08:30:19 -
@@ -1,4 +1,4 @@
-/* $OpenBSD: pfe_route.c,v 1.1 2009/08/13 13:51:21 reyk Exp $  */
+/* $OpenBSD$   */
 
 /*
  * Copyright (c) 2009 Reyk Floeter r...@openbsd.org
@@ -19,8 +19,10 @@
 #include sys/types.h
 #include sys/queue.h
 #include sys/socket.h
+#include sys/uio.h
 
 #include net/if.h
+#include net/if_dl.h
 #include netinet/in.h
 #include arpa/inet.h
 #include net/route.h
@@ -38,23 +40,7 @@
 
 extern struct imsgev   *iev_main;
 
-struct relay_rtmsg {
-   struct rt_msghdrrm_hdr;
-   union {
-   struct {
-   struct sockaddr_in  rm_dst;
-   struct sockaddr_in  rm_gateway;
-   struct sockaddr_in  rm_netmask;
-   struct sockaddr_rtlabel rm_label;
-   }u4;
-   struct {
-   struct sockaddr_in6 rm_dst;
-   struct sockaddr_in6 rm_gateway;
-   struct sockaddr_in6 rm_netmask;
-   struct sockaddr_rtlabel rm_label;
-   }u6;
-   }rm_u;
-};
+void iov_add(struct iovec *, int *, void *, size_t);
 
 void
 init_routes(struct relayd *env)
@@ -106,19 +92,52 @@ sync_routes(struct relayd *env, struct r
}
 }
 
+#define ROUNDUP(a) (a0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : 
sizeof(long))
+
+void
+iov_add(struct iovec *iov, int *iovcount, void *base, size_t len)
+{
+   static char  pad[sizeof(long)];
+
+   iov[*iovcount].iov_base = base;
+   iov[*iovcount].iov_len = len;
+   (*iovcount)++;
+
+   if (ROUNDUP(len)  len) {
+   iov[*iovcount].iov_base = pad;
+   iov[*iovcount].iov_len = ROUNDUP(len) - len;
+   (*iovcount)++;
+   }
+}
+
 int
 pfe_route(struct relayd *env, struct ctl_netroute *crt)
 {
-   struct relay_rtmsg   rm;
-   struct sockaddr_rtlabel  sr;
+   struct rt_msghdrrm_hdr;
+   union {
+   struct {
+   struct sockaddr_in  rm_dst;
+   struct sockaddr_in  rm_gateway;
+   struct sockaddr_in  rm_netmask;
+   }u4;
+   struct {
+   struct sockaddr_in6 rm_dst;
+   struct sockaddr_in6 rm_gateway;
+   struct sockaddr_in6 rm_netmask;
+   }u6;
+   }rm_u;
+   struct sockaddr_rtlabel  rm_label;
+   struct sockaddr_dl   rm_ifp;
+
struct sockaddr_storage *gw;
struct sockaddr_in  *s4;
struct sockaddr_in6 
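
The `ROUNDUP` macro and `iov_add()` helper in the diff above pad each piece of the routing message to a multiple of `sizeof(long)`. The following is a standalone sketch of that idea, reconstructed from the garbled diff text, so treat the exact macro body as an assumption rather than the committed code. The `assert` on `max` is an addition for the sketch; the original helper takes no array size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* round a length up to the next multiple of sizeof(long) */
#define ROUNDUP(a) \
	((a) > 0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : sizeof(long))

/*
 * Append base/len to the iovec array, followed by a zeroed pad entry
 * if len is not already long-aligned. max is the size of the array.
 */
static void
iov_add_padded(struct iovec *iov, int *iovcount, int max, void *base,
    size_t len)
{
	static char pad[sizeof(long)];

	assert(*iovcount + 2 <= max);

	iov[*iovcount].iov_base = base;
	iov[*iovcount].iov_len = len;
	(*iovcount)++;

	if (ROUNDUP(len) > len) {
		iov[*iovcount].iov_base = pad;
		iov[*iovcount].iov_len = ROUNDUP(len) - len;
		(*iovcount)++;
	}
}
```

Building the message as an iovec list means each sockaddr can be a separate variable and optional ones (the label, the interface) can simply be left out, instead of needing one packed struct per combination.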

limit the number of cpus amd64 will attach to the number of cpuinfo slots we have

2010-11-24 Thread David Gwynne
without this diff this box panics on boot while attaching the 36th
cpu. its a buffer overrun...

analysis done by kettenis.

ok?

Index: cpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.38
diff -u -p -r1.38 cpu.c
--- cpu.c   13 Nov 2010 04:16:42 -  1.38
+++ cpu.c   24 Nov 2010 13:04:30 -
@@ -168,6 +168,9 @@ cpu_match(struct device *parent, void *m
 	struct cfdata *cf = match;
 	struct cpu_attach_args *caa = aux;
 
+	if (~cpus_attached == 0)
+		return 0;
+
 	if (strcmp(caa->caa_name, cf->cf_driver->cd_name) == 0)
 		return 1;
 	return 0;


OpenBSD 4.8-current (GENERIC.MP) #0: Wed Nov 24 22:59:08 EST 2010
d...@dlg.eait.uq.edu.au:/home/dlg/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3747188736 (3573MB)
avail mem = 3633528832 (3465MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdf79c000 (103 entries)
bios0: vendor Dell Inc. version 1.2.1 date 08/02/2010
bios0: Dell Inc. PowerEdge R815
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET MCFG WD__ SLIC ERST HEST BERT EINJ IV__ 
SRAT SLIT SS__ TCPA
acpi0: wakeup devices PCI0(S5) PCI1(S5)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 6174, 2200.31 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu0: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu0: apic clock running at 200MHz
cpu1 at mainbus0: apid 48 (application processor)
cpu1: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu1: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu2 at mainbus0: apid 32 (application processor)
cpu2: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu2: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu2: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu3 at mainbus0: apid 16 (application processor)
cpu3: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu3: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu3: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu4 at mainbus0: apid 1 (application processor)
cpu4: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu4: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu4: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu4: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu4: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu5 at mainbus0: apid 49 (application processor)
cpu5: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu5: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu5: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu5: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu5: DTLB 48 4KB entries fully associative, 48 4MB entries fully associative
cpu6 at mainbus0: apid 33 (application processor)
cpu6: AMD Opteron(tm) Processor 6174, 2200.04 MHz
cpu6: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,CX16,POPCNT,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu6: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu6: ITLB 32 4KB entries fully associative, 16 4MB entries fully associative
cpu6: DTLB 48 4KB entries fully associative, 48 4MB entries fully 

Re: limit the number of cpus amd64 will attach to the number of cpuinfo slots we have

2010-11-25 Thread David Gwynne
On Thu, Nov 25, 2010 at 06:04:23PM -0500, Kenneth R Westerback wrote:
 Index: amd64/cpu.c
 ===
 RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
 retrieving revision 1.38
 diff -u -p -r1.38 cpu.c
 --- amd64/cpu.c   13 Nov 2010 04:16:42 -  1.38
 +++ amd64/cpu.c   25 Nov 2010 23:00:44 -
 @@ -135,8 +135,6 @@ struct cpu_info cpu_info_primary = { 0, 
  
  struct cpu_info *cpu_info_list = cpu_info_primary;
  
 -u_int32_t cpus_attached = 0;
 -
  #ifdef MULTIPROCESSOR
  /*
   * Array of CPU info structures.  Must be statically-allocated because
 @@ -345,8 +343,6 @@ cpu_attach(struct device *parent, struct
   panic(unknown processor type??);
   }
   cpu_vm_init(ci);
 -
 - cpus_attached |= (1  ci-ci_cpuid);
  
  #if defined(MULTIPROCESSOR)
   if (mp_verbose) {
 Index: include/cpu.h
 ===
 RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
 retrieving revision 1.60
 diff -u -p -r1.60 cpu.h
 --- include/cpu.h 22 Nov 2010 21:07:18 -  1.60
 +++ include/cpu.h 25 Nov 2010 23:00:44 -
 @@ -211,8 +211,6 @@ extern struct cpu_info cpu_info_primary;
  
  #define aston(p) ((p)-p_md.md_astpending = 1)
  
 -extern u_int32_t cpus_attached;
 -
  #define curpcb   curcpu()-ci_curpcb
  
  /*

this should follow krws diff above.

information about each cpu is stored in a statically sized array.
if we have more physical cpus than entries in that array, we currently
start storing cpu_info structures in unowned memory after that
array. this leads to Bad Things(tm).

MAXCPUs should be bumped up on amd64, but we should also avoid
panicking on ridiculously overspecced boxes. i am sure i will regret
claiming that a 48core machine is overspecced in the very near
future though.
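
A userland sketch of the check described above. `MAXCPUS` is set to a hypothetical 4 so the limit is easy to hit, and `cpu_try_attach` is an invented helper; the slot scan is the same loop as in the diff below.

```c
#include <stddef.h>

#define MAXCPUS		4	/* hypothetical; small on purpose */
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))

struct cpu_info {
	int	ci_cpuid;
};

/* statically sized, like the real array */
static struct cpu_info *cpu_info[MAXCPUS];

/* index of the first free slot, or nitems(cpu_info) if the array is full */
static size_t
cpu_first_free_slot(void)
{
	size_t cpunum;

	for (cpunum = 0; cpunum < nitems(cpu_info); cpunum++) {
		if (cpu_info[cpunum] == NULL)
			break;
	}
	return (cpunum);
}

/* returns 1 if the cpu got a slot, 0 if it was refused */
static int
cpu_try_attach(struct cpu_info *ci)
{
	size_t slot = cpu_first_free_slot();

	if (slot >= nitems(cpu_info))
		return (0);	/* refuse instead of writing past the array */

	ci->ci_cpuid = (int)slot;
	cpu_info[slot] = ci;
	return (1);
}
```

The point is that the refusal happens before anything is stored, so the extra cpus are simply left unattached rather than scribbling on whatever follows the array.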

Index: amd64/cpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.38
diff -u -p -r1.38 cpu.c
--- amd64/cpu.c 13 Nov 2010 04:16:42 -  1.38
+++ amd64/cpu.c 26 Nov 2010 06:31:53 -
@@ -167,10 +165,24 @@ cpu_match(struct device *parent, void *m
 {
 	struct cfdata *cf = match;
 	struct cpu_attach_args *caa = aux;
+	int cpunum;
 
-	if (strcmp(caa->caa_name, cf->cf_driver->cd_name) == 0)
+	/*
+	 * we cannot attach more cpus than we are set up to store in the
+	 * cpu_info array, so fail to match if we have more cpus than slots
+	 * in the array.
+	 */
+
+	for (cpunum = 0; cpunum < nitems(cpu_info); cpunum++) {
+		if (cpu_info[cpunum] == NULL)
+			break;
+	}
+
+	if (cpunum < nitems(cpu_info) &&
+	    strcmp(caa->caa_name, cf->cf_driver->cd_name) == 0)
 		return 1;
-	return 0;
+
+	return (0);
 }
 
 static void



Re: no printing cache info

2010-11-28 Thread David Gwynne
i agree with mark.

On 28/11/2010, at 11:12 PM, Mark Kettenis wrote:

 Date: Sun, 28 Nov 2010 17:02:43 +1100 (EST)
 From: Damien Miller d...@mindrot.org
 
 On Sat, 27 Nov 2010, Ted Unangst wrote:
 
 if you really really need to know that your cpu cache has 48 fully 
 associative entries, go consult the spec sheet.  otherwise, save some 
 electrons.
 
 or, how about only print this (and flags) for the first attached CPU?
 Unless there are plans to support asymmetric MP sometime soon...
 
 Best thing would be to print it once per socket, i.e. for the first
 core of each physical CPU.
 
 Oh, and the flags can be subtly different for other CPUs in the
 system, even if they are exactly the same model, because the BIOS can
 enable/disable some features.



Re: CVS: cvs.openbsd.org: src

2010-11-29 Thread David Gwynne
seems to work fine on my big box.

On 29/11/2010, at 5:43 PM, Philip Guenther wrote:

 On Sun, 28 Nov 2010, Philip Guenther wrote:
 On Sun, 28 Nov 2010, Philip Guenther wrote:
 On Sunday, November 28, 2010, David Gwynne d...@cvs.openbsd.org wrote:
 ...
 Log message:
 bump the number of supported cpus from 32 up to 64. lets me attach and
use
 all 48 cores in one of my boxes.

 requested by deraadt@
 made possible by the recent pmap diff by kettenis@

 Doesn't pm_cpus in the pmap need to change to a u_int64_t and locore.S
 and pmap.c (at least) change to match?
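
A short sketch of why the mask type matters once cpu ids go past 31. The helper names are made up; the `1ULL << cpu_id` test is the form used in the pmap.c hunk of the diff that follows.

```c
#include <stdint.h>

/* set the bit for cpu_id in a 64-bit cpu mask */
static uint64_t
cpu_mask_set(uint64_t mask, int cpu_id)
{
	/* 1ULL keeps the shift in 64 bits; 1U would overflow past bit 31 */
	return (mask | (1ULL << cpu_id));
}

/* nonzero if cpu_id is present in the mask */
static int
cpu_mask_test(uint64_t mask, int cpu_id)
{
	return ((mask & (1ULL << cpu_id)) != 0);
}
```

With a 32-bit mask there is nowhere to record cpu 47 on a 48 core box, and shifting a 32-bit `1U` by 32 or more is undefined in C, so both the field and the constant have to widen together.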

 Here's a diff to do that.

 It also corrects the x86_atomic_*_{l,ul}() macros to actually expand to
 the functions that operate on longs instead of ints (64- and 32-bits,
 respectively) and removes the unused x86_multicast_ipi() function.
 Finally, tlb_shoot_wait has been operated on with 32bit atomic ops, so
 make it an (unsigned) int instead of a long.  (This would have never
 worked on a big-endian platform.)

 Compile tested only so far (about to get on plane).

 Revised diff that doesn't include my bogus flailing on x86_atomic_cas_ul()
 (which does operate on unsigned longs) or tlb_shoot_wait.

 I'm running this now on my lowly little 4 core amd64.

 Philip Guenther


 diff -ru t/amd64/intr.c ./amd64/intr.c
 --- t/amd64/intr.cSun Nov 28 20:27:17 2010
 +++ ./amd64/intr.cSun Nov 28 18:48:08 2010
 @@ -498,7 +498,7 @@

   simple_lock(ci-ci_slock);
   pic-pic_hwmask(pic, ih-ih_pin);
 - x86_atomic_clearbits_l(ci-ci_ipending, (1  ih-ih_slot));
 + x86_atomic_clearbits_u32(ci-ci_ipending, (1  ih-ih_slot));

   /*
* Remove the handler from the chain.
 diff -ru t/amd64/ipi.c ./amd64/ipi.c
 --- t/amd64/ipi.c Sun Nov 28 20:27:17 2010
 +++ ./amd64/ipi.c Sun Nov 28 18:48:46 2010
 @@ -50,7 +50,7 @@
 {
   int ret;

 - x86_atomic_setbits_l(&ci->ci_ipis, ipimask);
 + x86_atomic_setbits_u32(&ci->ci_ipis, ipimask);

   /* Don't send IPI to cpu which isn't (yet) running. */
   if (!(ci->ci_flags & CPUF_RUNNING))
 @@ -88,7 +88,7 @@
   continue;
   if ((ci->ci_flags & CPUF_RUNNING) == 0)
   continue;
 - x86_atomic_setbits_l(&ci->ci_ipis, ipimask);
 + x86_atomic_setbits_u32(&ci->ci_ipis, ipimask);
   count++;
   }
   if (!count)
 @@ -98,23 +98,6 @@
 }

 void
 -x86_multicast_ipi(int cpumask, int ipimask)
 -{
 - struct cpu_info *ci;
 - CPU_INFO_ITERATOR cii;
 -
 - cpumask &= ~(1U << cpu_number());
 - if (cpumask == 0)
 - return;
 -
 - CPU_INFO_FOREACH(cii, ci) {
 - if ((cpumask & (1U << ci->ci_cpuid)) == 0)
 - continue;
 - x86_send_ipi(ci, ipimask);
 - }
 -}
 -
 -void
 x86_ipi_handler(void)
 {
   extern struct evcount ipi_count;
 @@ -122,7 +105,7 @@
   u_int32_t pending;
   int bit;

 - pending = x86_atomic_testset_ul(&ci->ci_ipis, 0);
 + pending = x86_atomic_testset_u32(&ci->ci_ipis, 0);

   for (bit = 0; bit < X86_NIPI && pending; bit++) {
   if (pending & (1<<bit)) {
 diff -ru t/amd64/locore.S ./amd64/locore.S
 --- t/amd64/locore.S  Sun Nov 28 20:27:17 2010
 +++ ./amd64/locore.S  Sun Nov 28 19:00:57 2010
 @@ -762,7 +762,7 @@
   /* clear the old pmap's bit for the cpu */
   movq PCB_PMAP(%r13),%rcx
   lock
 - btrl %edi,PM_CPUS(%rcx)
 + btrq %rdi,PM_CPUS(%rcx)

   /* Save stack pointers. */
   movq %rsp,PCB_RSP(%r13)
 @@ -800,9 +800,11 @@
   /* set the new pmap's bit for the cpu */
   movl CPUVAR(CPUID),%edi
   movq PCB_PMAP(%r13),%rcx
 - movl PM_CPUS(%rcx),%eax
 +#ifdef DIAGNOSTIC
 + movq PM_CPUS(%rcx),%rax
 +#endif
   lock
 - btsl %edi,PM_CPUS(%rcx)
 + btsq %rdi,PM_CPUS(%rcx)
 #ifdef DIAGNOSTIC
   jc _C_LABEL(switch_pmcpu_set)
 #endif
 diff -ru t/amd64/pmap.c ./amd64/pmap.c
 --- t/amd64/pmap.cSun Nov 28 20:36:05 2010
 +++ ./amd64/pmap.cSun Nov 28 20:32:48 2010
 @@ -351,7 +351,7 @@
 pmap_is_active(struct pmap *pmap, int cpu_id)
 {
   return (pmap == pmap_kernel() ||
 - (pmap->pm_cpus & (1U << cpu_id)) != 0);
 + (pmap->pm_cpus & (1ULL << cpu_id)) != 0);
 }

 static __inline u_int
 @@ -1064,7 +1064,7 @@

 #ifdef DIAGNOSTIC
   if (pmap->pm_cpus != 0)
 - printf("pmap_destroy: pmap %p cpus=0x%lx\n",
 + printf("pmap_destroy: pmap %p cpus=0x%llx\n",
   (void *)pmap, pmap->pm_cpus);
 #endif

 @@ -1127,7 +1127,7 @@
   /*
    * mark the pmap in use by this processor.
    */
 - x86_atomic_setbits_ul(&pmap->pm_cpus, (1U << cpu_number()));
 + x86_atomic_setbits_u64(&pmap->pm_cpus, (1ULL << cpu_number()));
   }
 }

 @@ -1143,7 +1143,7 @@
   /*
    * mark the pmap no longer in use by this processor.
    */
 - x86_atomic_clearbits_ul(&pmap->pm_cpus, (1U << cpu_number()));
 + x86_atomic_clearbits_u64(&pmap->pm_cpus

Re: better matching of boot devices on amd64

2010-12-01 Thread David Gwynne
it's the same code, so yes, it does.

i'll fix it and commit it if/when this diff gets oked.

dlg

On 02/12/2010, at 3:21 PM, Theo de Raadt wrote:

 i386 doesn't have this bug?
 
 the boot loader passes a variable that identifies the disk its
 booting off made up of a bunch of fields like adapter, controller,
 disk, and partition offsets, plus a table of all the disks it can
 see which includes this id and a checksum.
 
 the kernel goes through and checksums the disks and then maps that
 back to the id associated with that disk, and then compares some
 of the fields in those ids against the boot disks id to figure out
 which disk its on.
 
 the problem is we overflow one of those fields (the disk id one).
 since the other fields are set to 0 by the boot loader, this doesnt
 really matter that much. however, since those fields are now
 significant because of the overflow, we should compare them too.
 
 this prevents sd16 being matched as the boot disk after sd0 on my
 system with 25 disks attached.
 
 sorry if this explanation sucks.
 
 ok?
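
The overflow described above can be reproduced with a small packed-field sketch. The shifts, masks and EX_* names below are assumptions made for illustration and are not copied from sys/sys/reboot.h; the point is only that an unmasked unit of 16 spills into the next field, so a comparison limited to type and unit cannot tell disk 0 from disk 16.

```c
#include <stdint.h>

/*
 * Illustrative field layout only; these shifts and masks are
 * assumptions for this sketch, not the real bootdev definitions.
 */
#define EX_ADAPTOR_SHIFT	24
#define EX_CONTROLLER_SHIFT	20
#define EX_UNIT_SHIFT		16
#define EX_FIELD_MASK		0xf
#define EX_TYPE_MASK		0xff

#define EX_ADAPTOR(v)	 (((v) >> EX_ADAPTOR_SHIFT) & EX_FIELD_MASK)
#define EX_CONTROLLER(v) (((v) >> EX_CONTROLLER_SHIFT) & EX_FIELD_MASK)
#define EX_UNIT(v)	 (((v) >> EX_UNIT_SHIFT) & EX_FIELD_MASK)
#define EX_TYPE(v)	 ((v) & EX_TYPE_MASK)

/* Pack the fields without masking, which is how the overflow arises. */
static uint32_t
ex_makedev(uint32_t type, uint32_t adaptor, uint32_t controller,
    uint32_t unit)
{
	return ((adaptor << EX_ADAPTOR_SHIFT) |
	    (controller << EX_CONTROLLER_SHIFT) |
	    (unit << EX_UNIT_SHIFT) | type);
}

/* Old style comparison: type and unit only. */
static int
ex_match_loose(uint32_t a, uint32_t b)
{
	return (EX_TYPE(a) == EX_TYPE(b) && EX_UNIT(a) == EX_UNIT(b));
}

/* New style comparison: adaptor and controller are checked too. */
static int
ex_match_strict(uint32_t a, uint32_t b)
{
	return (EX_ADAPTOR(a) == EX_ADAPTOR(b) &&
	    EX_CONTROLLER(a) == EX_CONTROLLER(b) &&
	    ex_match_loose(a, b));
}
```

Under this layout, unit 16 reads back as unit 0 with controller 1, so the loose match accepts it as disk 0 while the strict match rejects it.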
 
 Index: dkcsum.c
 ===
 RCS file: /cvs/src/sys/arch/amd64/amd64/dkcsum.c,v
 retrieving revision 1.15
 diff -u -p -r1.15 dkcsum.c
 --- dkcsum.c 10 Dec 2008 23:41:19 -  1.15
 +++ dkcsum.c 2 Dec 2010 05:08:25 -
 @@ -71,10 +71,13 @@ dkcsumattach(void)
 
 #ifdef DEBUG
  printf("dkcsum: bootdev=%#x\n", bootdev);
 -for (bdi = bios_diskinfo; bdi->bios_number != -1; bdi++)
 -if (bdi->bios_number & 0x80)
 -printf("dkcsum: BIOS drive %#x checksum is %#x\n",
 -bdi->bios_number, bdi->checksum);
 +for (bdi = bios_diskinfo; bdi->bios_number != -1; bdi++) {
 +if (bdi->bios_number & 0x80) {
 +printf("dkcsum: BIOS drive %#x bsd_dev=%#x "
 +"checksum=%#x\n", bdi->bios_number, bdi->bsd_dev,
 +bdi->checksum);
 +}
 +}
 #endif
  pribootdev = altbootdev = 0;
 
 @@ -180,7 +183,9 @@ dkcsumattach(void)
   */
 
  /* B_TYPE dependent hd unit counting bootblocks */
 -if ((B_TYPE(bootdev) == B_TYPE(hit->bsd_dev)) &&
 +if ((B_ADAPTOR(bootdev) == B_ADAPTOR(hit->bsd_dev)) &&
 +(B_CONTROLLER(bootdev) == B_CONTROLLER(hit->bsd_dev)) &&
 +(B_TYPE(bootdev) == B_TYPE(hit->bsd_dev)) &&
  (B_UNIT(bootdev) == B_UNIT(hit->bsd_dev))) {
  int type, ctrl, adap, part, unit;



Re: better matching of boot devices on amd64

2010-12-02 Thread David Gwynne
On 02/12/2010, at 10:35 PM, Kenneth R Westerback wrote:


 If this works then ok k...@. Certainly comparing all the fields is
 better as far as I am concerned.

 My only twinge is relying on the overflow of the unit field. Some
 clever dick (intern?) may look at MAKEBOOTDEV() some day and apply
 the masks as the word is being constructed. Perhaps a cautionary
 note in sys/sys/reboot.h saying we rely on unit overflowing would
 not go amiss.

we'll fix the bootloader eventually. i just dont think that is a great idea at
this point in the cycle.

dlg



working hotplug for busy devices on mpii(4)

2010-12-23 Thread David Gwynne
hi guys,

this makes mpii properly detach devices, which helps a lot if they
have commands in flight. the relevant changes are:

- call the activate(DVACT_DEACTIVATE) function against all the luns
on the target that is going away.
- issue the target reset BEFORE detaching the children devices.
this is needed now that the midlayer will sleep until all outstanding
commands on a device come back from the adapter before calling the
child device's detach routine.

i have tested this on straight disks, but not on raid volumes. i
need someone to test disk removal behind a raid set before i can
commit it.

assuming testing goes well, can i get oks too?

dlg

Index: mpii.c
===
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.35
diff -u -p -r1.35 mpii.c
--- mpii.c  23 Aug 2010 00:53:36 -  1.35
+++ mpii.c  24 Dec 2010 05:46:34 -
@@ -3417,6 +3417,8 @@ mpii_event_sas(struct mpii_softc *sc, st
mpii_remove_dev(sc, dev);
if (sc-sc_scsibus) {
SET(dev-flags, MPII_DF_DETACH);
+   scsi_activate(sc-sc_scsibus, dev-slot, -1,
+   DVACT_DEACTIVATE);
if (scsi_task(mpii_event_defer, sc,
dev, 0) != 0)
printf(%s: unable to run device 
@@ -3529,27 +3531,19 @@ mpii_event_defer(void *xsc, void *arg)
struct mpii_softc   *sc = xsc;
struct mpii_device  *dev = arg;
 
-   /*
-* SAS and IR events are delivered separately, so it won't hurt
-* to wait for a second.
-*/
-   tsleep(sc, PRIBIO, mpiipause, hz);
-
-   if (!ISSET(dev-flags, MPII_DF_HIDDEN)) {
-   if (ISSET(dev-flags, MPII_DF_ATTACH))
-   scsi_probe_target(sc-sc_scsibus, dev-slot);
-   else if (ISSET(dev-flags, MPII_DF_DETACH))
-   scsi_detach_target(sc-sc_scsibus, dev-slot,
-   DETACH_FORCE);
-   }
-
if (ISSET(dev-flags, MPII_DF_DETACH)) {
mpii_sas_remove_device(sc, dev-dev_handle);
+   if (!ISSET(dev-flags, MPII_DF_HIDDEN)) {
+   scsi_detach_target(sc-sc_scsibus, dev-slot,
+   DETACH_FORCE);
+   }
free(dev, M_DEVBUF);
-   return;
-   }
 
-   CLR(dev-flags, MPII_DF_ATTACH);
+   } else if (ISSET(dev-flags, MPII_DF_ATTACH)) {
+   CLR(dev-flags, MPII_DF_ATTACH);
+   if (!ISSET(dev-flags, MPII_DF_HIDDEN))
+   scsi_probe_target(sc-sc_scsibus, dev-slot);
+   }
 }
 
 void
@@ -4547,9 +4541,12 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
 
case MPII_IOCSTATUS_BUSY:
case MPII_IOCSTATUS_INSUFFICIENT_RESOURCES:
+   xs-error = XS_BUSY;
+   break;
+
case MPII_IOCSTATUS_SCSI_IOC_TERMINATED:
case MPII_IOCSTATUS_SCSI_TASK_TERMINATED:
-   xs-error = XS_BUSY;
+   xs-error = XS_RESET;
break;
 
case MPII_IOCSTATUS_SCSI_INVALID_DEVHANDLE:
@@ -4559,6 +4556,7 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
 
default:
xs-error = XS_DRIVER_STUFFUP;
+   break;
}
 
if (sie-scsi_state  MPII_SCSIIO_ERR_STATE_AUTOSENSE_VALID)



timeout io on mpii(4)

2010-12-23 Thread David Gwynne
i can reliably produce a situation where an io on a disk attached
to mpii(4) never completes. this implements timeouts on scsi io so
we can recover from this situation.

ok?

Index: mpii.c
===
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.35
diff -u -p -r1.35 mpii.c
--- mpii.c  23 Aug 2010 00:53:36 -  1.35
+++ mpii.c  24 Dec 2010 06:04:38 -
@@ -1757,7 +1757,8 @@ struct mpii_ccb {
volatile enum {
MPII_CCB_FREE,
MPII_CCB_READY,
-   MPII_CCB_QUEUED
+   MPII_CCB_QUEUED,
+   MPII_CCB_TIMEOUT
}   ccb_state;
 
void(*ccb_done)(struct mpii_ccb *);
@@ -1822,6 +1823,15 @@ struct mpii_softc {
struct mpii_ccb_listsc_ccb_free;
struct mutexsc_ccb_free_mtx;
 
+   struct mutexsc_ccb_mtx;
+   /*
+* this protects the ccb state and list entry
+* between mpii_scsi_cmd and scsidone.
+*/
+
+   struct mpii_ccb_listsc_ccb_tmos;
+   struct scsi_iohandler   sc_ccb_tmo_handler;
+
struct scsi_iopool  sc_iopool;
 
struct mpii_dmamem  *sc_requests;
@@ -1894,6 +1904,10 @@ int  mpii_alloc_queues(struct mpii_softc
 void   mpii_push_reply(struct mpii_softc *, struct mpii_rcb *);
 void   mpii_push_replies(struct mpii_softc *);
 
+void   mpii_scsi_cmd_tmo(void *);
+void   mpii_scsi_cmd_tmo_handler(void *, void *);
+void   mpii_scsi_cmd_tmo_done(struct mpii_ccb *);
+
 intmpii_alloc_dev(struct mpii_softc *);
 intmpii_insert_dev(struct mpii_softc *, struct mpii_device *);
 intmpii_remove_dev(struct mpii_softc *, struct mpii_device *);
@@ -4035,7 +4049,11 @@ mpii_alloc_ccbs(struct mpii_softc *sc)
int i;
 
SLIST_INIT(sc-sc_ccb_free);
+   SLIST_INIT(sc-sc_ccb_tmos);
mtx_init(sc-sc_ccb_free_mtx, IPL_BIO);
+   mtx_init(sc-sc_ccb_mtx, IPL_BIO);
+   scsi_ioh_set(sc-sc_ccb_tmo_handler, sc-sc_iopool,
+   mpii_scsi_cmd_tmo_handler, sc);
 
sc-sc_ccbs = malloc(sizeof(*ccb) * (sc-sc_request_depth-1),
M_DEVBUF, M_NOWAIT | M_ZERO);
@@ -4448,6 +4466,7 @@ mpii_scsi_cmd(struct scsi_xfer *xs)
DNPRINTF(MPII_D_CMD, %s:  Offset0: 0x%02x\n, DEVNAME(sc),
io-sgl_offset0);
 
+   timeout_set(xs-stimeout, mpii_scsi_cmd_tmo, ccb);
if (xs-flags  SCSI_POLL) {
if (mpii_poll(sc, ccb) != 0) {
xs-error = XS_DRIVER_STUFFUP;
@@ -4459,10 +4478,66 @@ mpii_scsi_cmd(struct scsi_xfer *xs)
DNPRINTF(MPII_D_CMD, %s:mpii_scsi_cmd(): opcode: %02x 
datalen: %d\n, DEVNAME(sc), xs-cmd-opcode, xs-datalen);
 
+   timeout_add_msec(xs-stimeout, xs-timeout);
mpii_start(sc, ccb);
 }
 
 void
+mpii_scsi_cmd_tmo(void *xccb)
+{
+   struct mpii_ccb *ccb = xccb;
+   struct mpii_softc   *sc = ccb-ccb_sc;
+
+   printf(%s: mpii_scsi_cmd_tmo\n, DEVNAME(sc));
+
+   mtx_enter(sc-sc_ccb_mtx);
+   if (ccb-ccb_state == MPII_CCB_QUEUED) {
+   ccb-ccb_state = MPII_CCB_TIMEOUT;
+   SLIST_INSERT_HEAD(sc-sc_ccb_tmos, ccb, ccb_link);
+   }
+   mtx_leave(sc-sc_ccb_mtx);
+
+   scsi_ioh_add(sc-sc_ccb_tmo_handler);
+}
+
+void
+mpii_scsi_cmd_tmo_handler(void *cookie, void *io)
+{
+   struct mpii_softc   *sc = cookie;
+   struct mpii_ccb *tccb = io;
+   struct mpii_ccb *ccb;
+   struct mpii_msg_scsi_task_request   *stq;
+
+   mtx_enter(sc-sc_ccb_mtx);
+   ccb = SLIST_FIRST(sc-sc_ccb_tmos);
+   if (ccb != NULL) {
+   SLIST_REMOVE_HEAD(sc-sc_ccb_tmos, ccb_link);
+   ccb-ccb_state = MPII_CCB_QUEUED;
+   }
+   /* should remove any other ccbs for the same dev handle */
+   mtx_leave(sc-sc_ccb_mtx);
+
+   if (ccb == NULL) {
+   scsi_io_put(sc-sc_iopool, tccb);
+   return;
+   }
+
+   stq = tccb-ccb_cmd;
+   stq-function = MPII_FUNCTION_SCSI_TASK_MGMT;
+   stq-task_type = MPII_SCSI_TASK_TARGET_RESET;
+   stq-dev_handle = htole16(ccb-ccb_dev_handle);
+
+   tccb-ccb_done = mpii_scsi_cmd_tmo_done;
+   mpii_start(sc, tccb);
+}
+
+void
+mpii_scsi_cmd_tmo_done(struct mpii_ccb *tccb)
+{
+   mpii_scsi_cmd_tmo_handler(tccb-ccb_sc, tccb);
+}
+
+void
 mpii_scsi_cmd_done(struct mpii_ccb *ccb)
 {
struct mpii_msg_scsi_io_error   *sie;
@@ -4470,6 +4545,14 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
struct scsi_xfer*xs = ccb-ccb_cookie;
struct mpii_ccb_bundle  *mcb = ccb-ccb_cmd;
bus_dmamap_tdmap = ccb-ccb_dmamap;
+
+   

Re: working hotplug for busy devices on mpii(4)

2010-12-28 Thread David Gwynne
ive had no takers on testing.

i cant see how raid and sas events will race in the current code,
so i think the 1 second sleep to avoid confusion is unnecessary. i
will put it in and deal with fallout if it comes up.

On Fri, Dec 24, 2010 at 11:44:52AM -0500, Kenneth R Westerback wrote:
 On Fri, Dec 24, 2010 at 04:01:19PM +1000, David Gwynne wrote:
  hi guys,
  
  this makes mpii properly detach devices, which helps a lot if they
  have commands in flight. the relevant changes are:
  
  - call the activate(DVACT_DEACTIVATE) function against all the luns
  on the target that is going away.
  - issue the target reset BEFORE detaching the children devices.
  this is needed now that the midlayer will sleep until all outstanding
  commands on a device come back from the adapter before calling the
  child device's detach routine.
  
  i have tested this on straight disks, but not on raid volumes. i
  need someone to test disk removal behind a raid set before i can
  commit it.
  
  assuming testing goes well, can i get oks too?
  
  dlg
 
 If testing goes well, ok k...@. Alas I don't have any to test.
 
  Ken
 
  
  Index: mpii.c
  ===
  RCS file: /cvs/src/sys/dev/pci/mpii.c,v
  retrieving revision 1.35
  diff -u -p -r1.35 mpii.c
  --- mpii.c  23 Aug 2010 00:53:36 -  1.35
  +++ mpii.c  24 Dec 2010 05:46:34 -
  @@ -3417,6 +3417,8 @@ mpii_event_sas(struct mpii_softc *sc, st
  mpii_remove_dev(sc, dev);
  if (sc-sc_scsibus) {
  SET(dev-flags, MPII_DF_DETACH);
  +   scsi_activate(sc-sc_scsibus, dev-slot, -1,
  +   DVACT_DEACTIVATE);
  if (scsi_task(mpii_event_defer, sc,
  dev, 0) != 0)
  printf(%s: unable to run device 
  @@ -3529,27 +3531,19 @@ mpii_event_defer(void *xsc, void *arg)
  struct mpii_softc   *sc = xsc;
  struct mpii_device  *dev = arg;
   
  -   /*
  -* SAS and IR events are delivered separately, so it won't hurt
  -* to wait for a second.
  -*/
  -   tsleep(sc, PRIBIO, mpiipause, hz);
  -
  -   if (!ISSET(dev-flags, MPII_DF_HIDDEN)) {
  -   if (ISSET(dev-flags, MPII_DF_ATTACH))
  -   scsi_probe_target(sc-sc_scsibus, dev-slot);
  -   else if (ISSET(dev-flags, MPII_DF_DETACH))
  -   scsi_detach_target(sc-sc_scsibus, dev-slot,
  -   DETACH_FORCE);
  -   }
  -
  if (ISSET(dev-flags, MPII_DF_DETACH)) {
  mpii_sas_remove_device(sc, dev-dev_handle);
  +   if (!ISSET(dev-flags, MPII_DF_HIDDEN)) {
  +   scsi_detach_target(sc-sc_scsibus, dev-slot,
  +   DETACH_FORCE);
  +   }
  free(dev, M_DEVBUF);
  -   return;
  -   }
   
  -   CLR(dev-flags, MPII_DF_ATTACH);
  +   } else if (ISSET(dev-flags, MPII_DF_ATTACH)) {
  +   CLR(dev-flags, MPII_DF_ATTACH);
  +   if (!ISSET(dev-flags, MPII_DF_HIDDEN))
  +   scsi_probe_target(sc-sc_scsibus, dev-slot);
  +   }
   }
   
   void
  @@ -4547,9 +4541,12 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
   
  case MPII_IOCSTATUS_BUSY:
  case MPII_IOCSTATUS_INSUFFICIENT_RESOURCES:
  +   xs-error = XS_BUSY;
  +   break;
  +
  case MPII_IOCSTATUS_SCSI_IOC_TERMINATED:
  case MPII_IOCSTATUS_SCSI_TASK_TERMINATED:
  -   xs-error = XS_BUSY;
  +   xs-error = XS_RESET;
  break;
   
  case MPII_IOCSTATUS_SCSI_INVALID_DEVHANDLE:
  @@ -4559,6 +4556,7 @@ mpii_scsi_cmd_done(struct mpii_ccb *ccb)
   
  default:
  xs-error = XS_DRIVER_STUFFUP;
  +   break;
  }
   
  if (sie-scsi_state  MPII_SCSIIO_ERR_STATE_AUTOSENSE_VALID)



iopools for mfi(4)

2010-12-29 Thread David Gwynne
the subject pretty much says it all. this is the least intrusive
version of the change i could come up with.

io on multiple volumes is scheduled better, and the ioctl paths
become more reliable with this change.

tests? ok?
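
The shape of the change is easiest to see in isolation: the ccb get and put routines take an opaque cookie so a generic pool can call them. The sketch below is a userland illustration under stated assumptions; ex_iopool, ex_softc, ex_ccb and the ex_* functions are hypothetical and only mimic the callback signatures used in the diff, not the real scsi_iopool implementation.

```c
#include <stddef.h>

/* Hypothetical miniature driver state. */
struct ex_ccb {
	struct ex_ccb	*next;
};

struct ex_softc {
	struct ex_ccb	*freeq;		/* free ccb list */
};

/* Hypothetical pool: a cookie plus get/put callbacks. */
struct ex_iopool {
	void	*cookie;
	void	*(*get)(void *);
	void	 (*put)(void *, void *);
};

/* Take a ccb off the free list, or return NULL if none is left. */
static void *
ex_get_ccb(void *cookie)
{
	struct ex_softc *sc = cookie;
	struct ex_ccb *ccb = sc->freeq;

	if (ccb != NULL)
		sc->freeq = ccb->next;
	return (ccb);
}

/* Return a ccb to the free list. */
static void
ex_put_ccb(void *cookie, void *io)
{
	struct ex_softc *sc = cookie;
	struct ex_ccb *ccb = io;

	ccb->next = sc->freeq;
	sc->freeq = ccb;
}

/* Record the cookie and callbacks in the pool. */
static void
ex_iopool_init(struct ex_iopool *iopl, void *cookie,
    void *(*get)(void *), void (*put)(void *, void *))
{
	iopl->cookie = cookie;
	iopl->get = get;
	iopl->put = put;
}
```

Because the pool owns allocation, the command path no longer has to fetch a ccb itself or hand it back on every error path, which is what the removed mfi_get_ccb()/mfi_put_ccb() calls in the diff reflect.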

Index: mfi.c
===
RCS file: /cvs/src/sys/dev/ic/mfi.c,v
retrieving revision 1.113
diff -u -p -r1.113 mfi.c
--- mfi.c   24 Sep 2010 01:30:05 -  1.113
+++ mfi.c   29 Dec 2010 09:49:27 -
@@ -63,8 +63,8 @@ struct scsi_adapter mfi_switch = {
mfi_scsi_cmd, mfiminphys, 0, 0, mfi_scsi_ioctl
 };
 
-struct mfi_ccb *mfi_get_ccb(struct mfi_softc *);
-void   mfi_put_ccb(struct mfi_ccb *);
+void * mfi_get_ccb(void *);
+void   mfi_put_ccb(void *, void *);
 intmfi_init_ccb(struct mfi_softc *);
 
 struct mfi_mem *mfi_allocmem(struct mfi_softc *, size_t);
@@ -85,6 +85,8 @@ int   mfi_scsi_io(struct mfi_ccb *, struc
 void   mfi_scsi_xs_done(struct mfi_ccb *);
 intmfi_mgmt(struct mfi_softc *, uint32_t, uint32_t, uint32_t,
void *, uint8_t *);
+intmfi_do_mgmt(struct mfi_softc *, struct mfi_ccb * , uint32_t,
+   uint32_t, uint32_t, void *, uint8_t *);
 void   mfi_mgmt_done(struct mfi_ccb *);
 
 #if NBIO  0
@@ -146,9 +148,10 @@ static const struct mfi_iop_ops mfi_iop_
 #define mfi_my_intr(_s)((_s)-sc_iop-mio_intr(_s))
 #define mfi_post(_s, _c)   ((_s)-sc_iop-mio_post((_s), (_c)))
 
-struct mfi_ccb *
-mfi_get_ccb(struct mfi_softc *sc)
+void *
+mfi_get_ccb(void *cookie)
 {
+   struct mfi_softc*sc = cookie;
struct mfi_ccb  *ccb;
 
mtx_enter(sc-sc_ccb_mtx);
@@ -165,9 +168,10 @@ mfi_get_ccb(struct mfi_softc *sc)
 }
 
 void
-mfi_put_ccb(struct mfi_ccb *ccb)
+mfi_put_ccb(void *cookie, void *io)
 {
-   struct mfi_softc*sc = ccb-ccb_sc;
+   struct mfi_softc*sc = cookie;
+   struct mfi_ccb  *ccb = io;
struct mfi_frame_header *hdr = ccb-ccb_frame-mfr_header;
 
DNPRINTF(MFI_D_CCB, %s: mfi_put_ccb: %p\n, DEVNAME(sc), ccb);
@@ -239,7 +243,7 @@ mfi_init_ccb(struct mfi_softc *sc)
ccb-ccb_dmamap);
 
/* add ccb to queue */
-   mfi_put_ccb(ccb);
+   mfi_put_ccb(sc, ccb);
}
 
return (0);
@@ -436,7 +440,7 @@ mfi_initialize_firmware(struct mfi_softc
return (1);
}
 
-   mfi_put_ccb(ccb);
+   mfi_put_ccb(sc, ccb);
 
return (0);
 }
@@ -638,6 +642,7 @@ mfi_attach(struct mfi_softc *sc, enum mf
 
SLIST_INIT(sc-sc_ccb_freeq);
mtx_init(sc-sc_ccb_mtx, IPL_BIO);
+   scsi_iopool_init(sc-sc_iopool, sc, mfi_get_ccb, mfi_put_ccb);
 
rw_init(sc-sc_lock, mfi_lock);
 
@@ -718,6 +723,7 @@ mfi_attach(struct mfi_softc *sc, enum mf
sc-sc_link.adapter = mfi_switch;
sc-sc_link.adapter_target = MFI_MAX_LD;
sc-sc_link.adapter_buswidth = sc-sc_max_ld;
+   sc-sc_link.pool = sc-sc_iopool;
 
bzero(saa, sizeof(saa));
saa.saa_sc_link = sc-sc_link;
@@ -931,7 +937,6 @@ mfi_scsi_xs_done(struct mfi_ccb *ccb)
break;
}
 
-   mfi_put_ccb(ccb);
scsi_done(xs);
 }
 
@@ -988,7 +993,7 @@ mfi_scsi_cmd(struct scsi_xfer *xs)
struct scsi_link*link = xs-sc_link;
struct mfi_softc*sc = link-adapter_softc;
struct device   *dev = link-device_softc;
-   struct mfi_ccb  *ccb;
+   struct mfi_ccb  *ccb = xs-io;
struct scsi_rw  *rw;
struct scsi_rw_big  *rwb;
struct scsi_rw_16   *rw16;
@@ -1007,13 +1012,6 @@ mfi_scsi_cmd(struct scsi_xfer *xs)
goto stuffup;
}
 
-   if ((ccb = mfi_get_ccb(sc)) == NULL) {
-   DNPRINTF(MFI_D_CMD, %s: mfi_scsi_cmd no ccb\n, DEVNAME(sc));
-   xs-error = XS_NO_CCB;
-   scsi_done(xs);
-   return;
-   }
-
xs-error = XS_NOERROR;
 
switch (xs-cmd-opcode) {
@@ -1023,10 +1021,8 @@ mfi_scsi_cmd(struct scsi_xfer *xs)
rwb = (struct scsi_rw_big *)xs-cmd;
blockno = (uint64_t)_4btol(rwb-addr);
blockcnt = _2btol(rwb-length);
-   if (mfi_scsi_io(ccb, xs, blockno, blockcnt)) {
-   mfi_put_ccb(ccb);
+   if (mfi_scsi_io(ccb, xs, blockno, blockcnt))
goto stuffup;
-   }
break;
 
case READ_COMMAND:
@@ -1035,10 +1031,8 @@ mfi_scsi_cmd(struct scsi_xfer *xs)
blockno =
(uint64_t)(_3btol(rw-addr)  (SRW_TOPADDR  16 | 0x));
blockcnt = rw-length ? rw-length : 0x100;
-   if (mfi_scsi_io(ccb, xs, blockno, blockcnt)) {
-   mfi_put_ccb(ccb);
+   if (mfi_scsi_io(ccb, xs, blockno, blockcnt))
   

Re: cut vnd's over to bufqs, again.

2010-12-29 Thread David Gwynne
explanation makes perfect sense to me and the diff is good.

ok dlg@

On 29/12/2010, at 11:03 PM, Thordur Bjornsson wrote:

 hi,
 
so cut vnds over to bufqs. this diff is similar to a diff
that was committed, but got backed out after one of the
hackathon fiascos, with a small difference.
 
there is no reason to keep an active count, bufq_peek is
enough to figure out if the queue is empty or not.
 
in vndiodone, there is no need to jump through hoops to
figure out if we need to disk_unbusy(). We always need to:
there is a one-to-one match against disk_busy() in vndstart, as
we set the biodone callback to null so we don't end up there
twice.
 
OK?
 
 ciao, thib. 
 
 
 Index: dev/vnd.c
 ===
 RCS file: /usr/cvs/src/sys/dev/vnd.c,v
 retrieving revision 1.104
 diff -u -p -r1.104 vnd.c
 --- dev/vnd.c 22 Dec 2010 13:12:14 -  1.104
 +++ dev/vnd.c 28 Dec 2010 11:54:44 -
 @@ -1,4 +1,4 @@
 -/*   $OpenBSD: vnd.c,v 1.104 2010/12/22 13:12:14 jsing Exp $ */
 +/*   $OpenBSD: vnd.c,v 1.92 2009/06/04 05:57:27 krw Exp $*/
 /*$NetBSD: vnd.c,v 1.26 1996/03/30 23:06:11 christos Exp $*/
 
 /*
 @@ -127,6 +127,8 @@ struct vnd_softc {
   struct disk  sc_dk;
   char sc_dk_name[16];
 
 + struct bufq  sc_bufq;
 +
   char sc_file[VNDNLEN];  /* file we're covering */
   int  sc_flags;  /* flags */
   size_t   sc_size;   /* size of vnd in sectors */
 @@ -135,7 +137,6 @@ struct vnd_softc {
   size_t   sc_ntracks;/* # of tracks per cylinder */
   struct vnode*sc_vp; /* vnode */
   struct ucred*sc_cred;   /* credentials */
 - struct buf   sc_tab;/* transfer queue */
   blf_ctx *sc_keyctx; /* key context */
   struct rwlocksc_rwlock;
 };
 @@ -209,6 +210,7 @@ vndattach(int num)
   vnd_softc = (struct vnd_softc *)mem;
   for (i = 0; i  num; i++) {
   rw_init(vnd_softc[i].sc_rwlock, vndlock);
 + bufq_init(vnd_softc[i].sc_bufq, BUFQ_DEFAULT);
   }
   numvnd = num;
 
 @@ -489,8 +491,8 @@ vndstrategy(struct buf *bp)
   biodone(bp);
   splx(s);
 
 - /* If nothing more is queued, we are done.  */
 - if (!vnd-sc_tab.b_active)
 + /* If nothing more is queued, we are done. */
 + if (!bufq_peek(vnd-sc_bufq))
   return;
 
   /*
 @@ -498,9 +500,8 @@ vndstrategy(struct buf *bp)
* routine might queue using same links.
*/
   s = splbio();
 - bp = vnd-sc_tab.b_actf;
 - vnd-sc_tab.b_actf = bp-b_actf;
 - vnd-sc_tab.b_active--;
 + bp = bufq_dequeue(vnd-sc_bufq);
 + KASSERT(bp != NULL);
   splx(s);
   }
   }
 @@ -596,13 +597,9 @@ vndstrategy(struct buf *bp)
   splx(s);
   return;
   }
 - /*
 -  * Just sort by block number
 -  */
 - nbp-vb_buf.b_cylinder = nbp-vb_buf.b_blkno;
 +
 + bufq_queue(vnd-sc_bufq, nbp-vb_buf);
   s = splbio();
 - disksort(vnd-sc_tab, nbp-vb_buf);
 - vnd-sc_tab.b_active++;
   vndstart(vnd);
   splx(s);
   bn += sz;
 @@ -625,8 +622,9 @@ vndstart(struct vnd_softc *vnd)
* Dequeue now since lower level strategy routine might
* queue using same links
*/
 - bp = vnd-sc_tab.b_actf;
 - vnd-sc_tab.b_actf = bp-b_actf;
 + bp = bufq_dequeue(vnd-sc_bufq);
 + if (bp == NULL)
 + return;
 
   DNPRINTF(VDB_IO,
   vndstart(%d): bp %p vp %p blkno %lld addr %p cnt %lx\n,
 @@ -675,13 +673,8 @@ vndiodone(struct buf *bp)
 
 out:
   putvndbuf(vbp);
 -
 - if (vnd-sc_tab.b_active) {
 - disk_unbusy(vnd-sc_dk, (pbp-b_bcount - pbp-b_resid),
 - (pbp-b_flags  B_READ));
 - if (!vnd-sc_tab.b_actf)
 - vnd-sc_tab.b_active--;
 - }
 + disk_unbusy(vnd-sc_dk, (pbp-b_bcount - pbp-b_resid),
 + (pbp-b_flags  B_READ));
 }
 
 /* ARGSUSED */



Re: timeout io on mpii(4)

2010-12-30 Thread David Gwynne
On 31/12/2010, at 3:51 AM, Mike Belopuhov wrote:

 On Fri, Dec 24, 2010 at 16:09 +1000, David Gwynne wrote:
 i can reliably produce a situation where an io on a disk attached
 to mpii(4) never completes. this implements timeouts on scsi io so
 we can recover from this situation.

 ok?


 although, you've already committed the change, i have two questions
 regarding this.  please find them inline.

cool :) answers inline too.


 Index: mpii.c
 ===
 RCS file: /cvs/src/sys/dev/pci/mpii.c,v
 retrieving revision 1.35
 diff -u -p -r1.35 mpii.c
 --- mpii.c   23 Aug 2010 00:53:36 -  1.35
 +++ mpii.c   24 Dec 2010 06:04:38 -
 @@ -4448,6 +4466,7 @@ mpii_scsi_cmd(struct scsi_xfer *xs)
  DNPRINTF(MPII_D_CMD, "%s:  Offset0: 0x%02x\n", DEVNAME(sc),
  io->sgl_offset0);

 +timeout_set(&xs->stimeout, mpii_scsi_cmd_tmo, ccb);
  if (xs->flags & SCSI_POLL) {
  if (mpii_poll(sc, ccb) != 0) {
  xs->error = XS_DRIVER_STUFFUP;
 @@ -4459,10 +4478,66 @@ mpii_scsi_cmd(struct scsi_xfer *xs)
  DNPRINTF(MPII_D_CMD, "%s:mpii_scsi_cmd(): opcode: %02x "
  "datalen: %d\n", DEVNAME(sc), xs->cmd->opcode, xs->datalen);

 +timeout_add_msec(&xs->stimeout, xs->timeout);
  mpii_start(sc, ccb);
 }

 void
 +mpii_scsi_cmd_tmo(void *xccb)
 +{
 +struct mpii_ccb *ccb = xccb;
 +struct mpii_softc   *sc = ccb->ccb_sc;
 +
 +printf("%s: mpii_scsi_cmd_tmo\n", DEVNAME(sc));
 +
 +mtx_enter(&sc->sc_ccb_mtx);
 +if (ccb->ccb_state == MPII_CCB_QUEUED) {
 +ccb->ccb_state = MPII_CCB_TIMEOUT;
 +SLIST_INSERT_HEAD(&sc->sc_ccb_tmos, ccb, ccb_link);
 +}
 +mtx_leave(&sc->sc_ccb_mtx);
 +
 +scsi_ioh_add(&sc->sc_ccb_tmo_handler);
 +}
 +
 +void
 +mpii_scsi_cmd_tmo_handler(void *cookie, void *io)
 +{
 +struct mpii_softc   *sc = cookie;
 +struct mpii_ccb *tccb = io;
 +struct mpii_ccb *ccb;
 +struct mpii_msg_scsi_task_request   *stq;
 +
 +mtx_enter(&sc->sc_ccb_mtx);
 +ccb = SLIST_FIRST(&sc->sc_ccb_tmos);
 +if (ccb != NULL) {
 +SLIST_REMOVE_HEAD(&sc->sc_ccb_tmos, ccb_link);
 +ccb->ccb_state = MPII_CCB_QUEUED;
 +}
 +/* should remove any other ccbs for the same dev handle */
 +mtx_leave(&sc->sc_ccb_mtx);
 +
 +if (ccb == NULL) {
 +scsi_io_put(&sc->sc_iopool, tccb);
 +return;
 +}
 +
 +stq = tccb->ccb_cmd;
 +stq->function = MPII_FUNCTION_SCSI_TASK_MGMT;
 +stq->task_type = MPII_SCSI_TASK_TARGET_RESET;
 +stq->dev_handle = htole16(ccb->ccb_dev_handle);
 +

 why do you issue 'target reset' instead of 'abort task' here?

thats what solaris and linux do.


 +tccb->ccb_done = mpii_scsi_cmd_tmo_done;
 +mpii_start(sc, tccb);
 +}
 +
 +void
 +mpii_scsi_cmd_tmo_done(struct mpii_ccb *tccb)
 +{
 +mpii_scsi_cmd_tmo_handler(tccb->ccb_sc, tccb);
 +}
 +

 why do you call this function again from here?

xs timeouts put the xs on a list to be serviced. the io handler services that
list. the done function calling the handler again is the way it pulls the next
one off the list.
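
The pattern in that answer, a list filled by timeouts and drained one entry at a time by a handler that the completion routine re-invokes, can be sketched in userland as follows. All ex_* names are hypothetical, there is no locking, and the "recovery command" is just a counter; this is a simplified model of the idea, not the mpii code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct ex_ccb {
	struct ex_ccb	*next;
	int		 id;
};

struct ex_softc {
	struct ex_ccb	*tmos;		/* timed-out ccbs awaiting recovery */
	int		 resets;	/* recovery commands issued so far */
	int		 busy;		/* a recovery command is in flight */
};

/* Timeout path: park the ccb on the list for later servicing. */
static void
ex_tmo(struct ex_softc *sc, struct ex_ccb *ccb)
{
	ccb->next = sc->tmos;
	sc->tmos = ccb;
}

/*
 * Handler: take one ccb off the list and issue a recovery command
 * for it.  Returns the ccb being recovered, or NULL once the list
 * is empty and the handler goes idle.
 */
static struct ex_ccb *
ex_tmo_handler(struct ex_softc *sc)
{
	struct ex_ccb *ccb = sc->tmos;

	if (ccb == NULL) {
		sc->busy = 0;	/* nothing left, release the slot */
		return (NULL);
	}
	sc->tmos = ccb->next;
	sc->busy = 1;
	sc->resets++;
	return (ccb);
}

/* Completion of a recovery command: run the handler again. */
static struct ex_ccb *
ex_tmo_done(struct ex_softc *sc)
{
	return (ex_tmo_handler(sc));
}
```

Only one recovery command is outstanding at a time, and each completion pulls the next timed-out entry, so the list empties without a separate worker loop.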

dlg



Re: working hotplug for busy devices on mpii(4)

2010-12-30 Thread David Gwynne
On 31/12/2010, at 4:03 AM, Mike Belopuhov wrote:

 On Wed, Dec 29, 2010 at 11:01 +1000, David Gwynne wrote:
 ive had no takers on testing.

 i cant see how raid and sas events will race in the current code,
 so i think the 1 second sleep to avoid confusion is unnecessary. i
 will put it in and deal with fallout if it comes up.


 i didn't look thru the diff yet, sorry for that.
 the race you're talking about is that for IR enabled setups
 you first get an event for SAS and then for IR, and the purpose
 of the IR event handler is to set the MPII_DF_HIDDEN flag.

 as you can see handling for SAS drives and volume drives is
 different:

   if (!ISSET(dev->flags, MPII_DF_HIDDEN)) {
       if (ISSET(dev->flags, MPII_DF_ATTACH))
           scsi_probe_target(sc->sc_scsibus, dev->slot);
       else if (ISSET(dev->flags, MPII_DF_DETACH))
           scsi_detach_target(sc->sc_scsibus, dev->slot,
               DETACH_FORCE);
   }

   if (ISSET(dev->flags, MPII_DF_DETACH)) {
       mpii_sas_remove_device(sc, dev->dev_handle);
       free(dev, M_DEVBUF);
       return;
   }

 we're not supposed to call scsi_detach_target on volume disks.
 so your change looks wrong to me, iirc how it works.

i'll wire up an IR volume and muck around then. does this mean that the
physical and logical disks have colliding or overlapping entries in the
sc_devs array?

dlg



Re: Teach pcidump(8) about another capability

2011-01-12 Thread David Gwynne
ok

On 12/01/2011, at 8:44 AM, Mark Kettenis wrote:

 Capability 0x12 is called SATA.  I have a diff to add PCI_CAP_SATA
 to pcireg.h, but given the fact that we're in ABI lock, that'll have
 to wait.  But it is anyhow better to use the number of elements in the
 array to decide whether we know about a capability or not.
 
 ok?
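
The bounds check the diff switches to can be shown in a few lines. This is a sketch with a shortened, made-up table; ex_capnames and ex_capname() are hypothetical names, and only the nitems() macro matches what the diff adds.

```c
#include <stddef.h>

#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Shortened, illustrative table; index 0 doubles as the fallback name. */
static const char *ex_capnames[] = {
	"Reserved",
	"Power Management",
	"AGP",
};

/*
 * Look up a capability name.  The limit comes from the array itself,
 * so adding an entry to the table needs no separate "max" constant.
 */
static const char *
ex_capname(unsigned int cap)
{
	if (cap >= nitems(ex_capnames))
		cap = 0;
	return (ex_capnames[cap]);
}
```

Tying the limit to the array size means a new table entry is picked up automatically, without a header constant having to change.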
 
 Index: pcidump.c
 ===
 RCS file: /cvs/src/usr.sbin/pcidump/pcidump.c,v
 retrieving revision 1.26
 diff -u -p -r1.26 pcidump.c
 --- pcidump.c 19 Dec 2010 23:23:21 -  1.26
 +++ pcidump.c 11 Jan 2011 22:41:28 -
 @@ -35,6 +35,10 @@
 
 #define PCIDEV "/dev/pci"
 
 +#ifndef nitems
 +#define nitems(_a)   (sizeof((_a)) / sizeof((_a)[0]))
 +#endif
 +
 __dead void usage(void);
 void scanpcidomain(void);
 int probe(int, int, int);
 @@ -86,9 +90,9 @@ const char *pci_capnames[] = {
   "AGP8",
   "Secure",
   "PCI Express",
 - "Extended Message Signaled Interrupts (MSI-X)"
 + "Extended Message Signaled Interrupts (MSI-X)",
 + "SATA"
 };
 -#define PCI_CAPNAMES_MAX PCI_CAP_MSIX
 
 int
 main(int argc, char *argv[])
 @@ -337,7 +341,7 @@ dump_caplist(int bus, int dev, int func,
   return;
   cap = PCI_CAPLIST_CAP(reg);
   printf("\t0x%04x: Capability 0x%02x: ", ptr, cap);
 - if (cap > PCI_CAPNAMES_MAX)
 + if (cap >= nitems(pci_capnames))
   cap = 0;
   printf("%s\n", pci_capnames[cap]);
   if (cap == PCI_CAP_PCIEXPRESS)



Re: de(4) bus_dma diff needs testing.

2009-06-14 Thread David Gwynne

On 15/06/2009, at 1:39 AM, Christian Weisgerber wrote:


Theo de Raadt dera...@cvs.openbsd.org wrote:


ahci0: error 35 loading dmamap
de0: unable to load tx map, error = 35


bigmem is unworkable.
Your DMA descriptor rings are in high memory.
They should be in low memory.


I thought iommu takes care of that?


what iommu?

also, why should you have to fix these mappings in bus_dma if you  
simply allocate memory up front in the right place for the devices?


dlg



Re: scsi midlayer tweak

2009-08-13 Thread David Gwynne
this is a new version of my diff, which is necessary following some
changes that have been committed to the tree since my original diff.
it also fixes a race in the scsi_scsi_cmd completion path, and locks
the sd buffers consistently.

this diff has been tested on mpi, siop, isp, ami, ahci, and umass
(usb). i have read through mfi, arc, and sili. if you want to avoid
this diff breaking any other controllers, please try the diff out
and report results.

On Thu, Aug 13, 2009 at 02:50:00AM +1000, David Gwynne wrote:
 this diff starts to address several problems i have with the scsi
 midlayer.
 
 the most important at the moment is that the entrypoint into the
 current midlayer is through a function called scsi_scsi_cmd. the
 problem with this function is that it is impossible to start an
 async scsi operation and then tell if the command has failed,
 completed, or been queued without shoving a buf down with it.
 
 this sucks for mpath because it simply wants to take commands from
 sd/cd/etc and push them down a real physical path. the buf handling
 will be done by the sd to mpath leg of the journey, trying to do
 it again on the mpath to mpi/isp/etc leg of the journey will cause
 a use after free.
 
 this lets drivers like sd/cd/mpath allocate an xs, fill it in, and
 supply a completion routine for it. in the mpath case this completion
 routine goes on and completes the io it was asked to do on cd/sd/etc's
 behalf.
 
 i have tweaked sd to use this new interface too to verify it is
 usable. i have also reimplemented scsi_scsi_cmd to retain backward
 compat for old users (currently everything except sd).
 
 these changes clear the way to make an hba's scsi_cmd routine not
 need to return an error code. in the future the ability of the hba
 to complete an xs will be reported by changing the state of the xs
 and completing it. there is a huge amount of confusion in hba drivers
 at the moment about the right way to report errors up to the midlayer
 and then in turn up to the device drivers. forcing all reporting
 to be done via the xs will simplify code hugely and make it more
 robust.
 
 the last benefit is this makes it easier to allow the hba to provide
 the xs to cd/sd/mpath/etc.
 
 i encourage everyone to test this diff and tell me what blows up.
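
The interface described above, where the caller fills in a transfer and supplies a completion routine and all status comes back through the transfer itself, can be modelled in userland like this. The ex_* structures and functions are hypothetical and far simpler than struct scsi_xfer; the sketch only shows how a pass-through layer such as mpath can complete the upper request from the lower one's completion.

```c
#include <stddef.h>

/* Hypothetical miniature of a transfer with a completion hook. */
struct ex_xfer {
	int	 error;				/* result, set on completion */
	void	*cookie;			/* caller's context */
	void	(*done)(struct ex_xfer *);	/* completion routine */
};

/*
 * Stand-in for the adapter: finish the transfer and report the
 * outcome through the xfer rather than through a return value.
 */
static void
ex_exec(struct ex_xfer *xs, int hw_error)
{
	xs->error = hw_error;
	xs->done(xs);
}

/* Pass-through completion: copy status up and finish the upper xfer. */
static void
ex_passthru_done(struct ex_xfer *lower)
{
	struct ex_xfer *upper = lower->cookie;

	upper->error = lower->error;
	if (upper->done != NULL)
		upper->done(upper);
}

/* Pass-through submit: wire the lower xfer to the upper one and run it. */
static void
ex_passthru(struct ex_xfer *upper, struct ex_xfer *lower, int hw_error)
{
	lower->cookie = upper;
	lower->done = ex_passthru_done;
	ex_exec(lower, hw_error);
}
```

Nothing in the submit path inspects a return code; whether the command failed or completed is visible only in the transfer when its completion routine runs.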

Index: mpath.c
===
RCS file: /cvs/src/sys/scsi/mpath.c,v
retrieving revision 1.3
diff -u -p -r1.3 mpath.c
--- mpath.c 9 Aug 2009 16:55:02 -   1.3
+++ mpath.c 13 Aug 2009 21:44:17 -
@@ -78,6 +78,8 @@ int   mpath_cmd(struct scsi_xfer *);
 void   mpath_minphys(struct buf *, struct scsi_link *);
 intmpath_probe(struct scsi_link *);
 
+void   mpath_done(struct scsi_xfer *);
+
 struct scsi_adapter mpath_switch = {
mpath_cmd,
scsi_minphys,
@@ -148,36 +150,53 @@ mpath_cmd(struct scsi_xfer *xs)
struct scsi_link *link = xs->sc_link;
struct mpath_node *n = mpath_nodes[link->target];
struct mpath_path *p = TAILQ_FIRST(&n->node_paths);
-   int rv;
-   int s;
+   struct scsi_xfer *mxs;
 
if (n == NULL || p == NULL) {
mpath_xs_stuffup(xs);
return (COMPLETE);
}
 
-   rv = scsi_scsi_cmd(p->path_link, xs->cmd, xs->cmdlen,
-   xs->data, xs->datalen,
-   2, xs->timeout, NULL, SCSI_POLL |
-   (xs->flags & (SCSI_DATA_IN|SCSI_DATA_OUT)));
+   mxs = scsi_xs_get(p->path_link, xs->flags);
+   if (mxs == NULL) {
+   mpath_xs_stuffup(xs);
+   return (COMPLETE);
+   }
 
+   memcpy(mxs->cmd, xs->cmd, xs->cmdlen);
+   mxs->cmdlen = xs->cmdlen;
+   mxs->data = xs->data;
+   mxs->datalen = xs->datalen;
+   mxs->retries = xs->retries;
+   mxs->timeout = xs->timeout;
+   mxs->req_sense_length = xs->req_sense_length;
 
-   xs->flags |= ITSDONE;
-   if (rv == 0) {
-   xs->error = XS_NOERROR;
-   xs->status = SCSI_OK;
-   xs->resid = 0;
-   } else {
-   printf("%s: t%dl%d rv %d cmd %x\n", DEVNAME(mpath),
-   link->target, link->lun, rv, xs->cmd->opcode);
-   xs->error = XS_DRIVER_STUFFUP;
-   }
+   mxs->cookie = xs;
+   mxs->done = mpath_done;
+
+   scsi_xs_exec(mxs);
+
+   return (COMPLETE); /* doesnt matter anymore */
+}
+
+void
+mpath_done(struct scsi_xfer *mxs)
+{
+   struct scsi_xfer *xs = mxs->cookie;
+   int s;
+
+   xs->error = mxs->error;
+   xs->status = mxs->status;
+   xs->flags = mxs->flags;
+   xs->resid = mxs->resid;
+
+   memcpy(&xs->sense, &mxs->sense, sizeof(xs->sense));
+
+   scsi_xs_put(mxs);
 
s = splbio();
scsi_done(xs);
splx(s);
-
-   return (COMPLETE);
 }
 
 void
Index: scsi_base.c
===
RCS file: /cvs/src/sys/scsi/scsi_base.c,v
retrieving revision 1.134
diff -u -p -r1.134 scsi_base.c
--- scsi_base.c 13

Re: scsi midlayer tweak

2009-08-20 Thread David Gwynne
 This second patch applies cleanly, and no errors were generated in 
 dmesg.  Subsequent usage of growisofs fails to burn a dvd, and hangs the 
 computer.  The following error message was displayed on the console:
 
 cd0(atapiscsi0:0:0):User Command with no buffer

ah, good catch. this is another update to my diff that includes a
rewritten ioctl path.

Index: mpath.c
===
RCS file: /cvs/src/sys/scsi/mpath.c,v
retrieving revision 1.3
diff -u -p -r1.3 mpath.c
--- mpath.c 9 Aug 2009 16:55:02 -   1.3
+++ mpath.c 20 Aug 2009 13:41:57 -
@@ -78,6 +78,8 @@ int   mpath_cmd(struct scsi_xfer *);
 void   mpath_minphys(struct buf *, struct scsi_link *);
 intmpath_probe(struct scsi_link *);
 
+void   mpath_done(struct scsi_xfer *);
+
 struct scsi_adapter mpath_switch = {
mpath_cmd,
scsi_minphys,
@@ -148,36 +150,53 @@ mpath_cmd(struct scsi_xfer *xs)
struct scsi_link *link = xs->sc_link;
struct mpath_node *n = mpath_nodes[link->target];
struct mpath_path *p = TAILQ_FIRST(&n->node_paths);
-   int rv;
-   int s;
+   struct scsi_xfer *mxs;
 
if (n == NULL || p == NULL) {
mpath_xs_stuffup(xs);
return (COMPLETE);
}
 
-   rv = scsi_scsi_cmd(p->path_link, xs->cmd, xs->cmdlen,
-   xs->data, xs->datalen,
-   2, xs->timeout, NULL, SCSI_POLL |
-   (xs->flags & (SCSI_DATA_IN|SCSI_DATA_OUT)));
+   mxs = scsi_xs_get(p->path_link, xs->flags);
+   if (mxs == NULL) {
+   mpath_xs_stuffup(xs);
+   return (COMPLETE);
+   }
 
+   memcpy(mxs->cmd, xs->cmd, xs->cmdlen);
+   mxs->cmdlen = xs->cmdlen;
+   mxs->data = xs->data;
+   mxs->datalen = xs->datalen;
+   mxs->retries = xs->retries;
+   mxs->timeout = xs->timeout;
+   mxs->req_sense_length = xs->req_sense_length;
 
-   xs->flags |= ITSDONE;
-   if (rv == 0) {
-   xs->error = XS_NOERROR;
-   xs->status = SCSI_OK;
-   xs->resid = 0;
-   } else {
-   printf("%s: t%dl%d rv %d cmd %x\n", DEVNAME(mpath),
-   link->target, link->lun, rv, xs->cmd->opcode);
-   xs->error = XS_DRIVER_STUFFUP;
-   }
+   mxs->cookie = xs;
+   mxs->done = mpath_done;
+
+   scsi_xs_exec(mxs);
+
+   return (COMPLETE); /* doesnt matter anymore */
+}
+
+void
+mpath_done(struct scsi_xfer *mxs)
+{
+   struct scsi_xfer *xs = mxs->cookie;
+   int s;
+
+   xs->error = mxs->error;
+   xs->status = mxs->status;
+   xs->flags = mxs->flags;
+   xs->resid = mxs->resid;
+
+   memcpy(&xs->sense, &mxs->sense, sizeof(xs->sense));
+
+   scsi_xs_put(mxs);
 
s = splbio();
scsi_done(xs);
splx(s);
-
-   return (COMPLETE);
 }
 
 void
Index: scsi_base.c
===
RCS file: /cvs/src/sys/scsi/scsi_base.c,v
retrieving revision 1.134
diff -u -p -r1.134 scsi_base.c
--- scsi_base.c 13 Aug 2009 21:35:56 -  1.134
+++ scsi_base.c 20 Aug 2009 13:41:57 -
@@ -1,4 +1,4 @@
-/* $OpenBSD: scsi_base.c,v 1.133 2009/08/13 19:49:31 dlg Exp $ */
+/* $OpenBSD: scsi_base.c,v 1.134 2009/08/13 21:35:56 dlg Exp $ */
 /* $NetBSD: scsi_base.c,v 1.43 1997/04/02 02:29:36 mycroft Exp $   */
 
 /*
@@ -50,15 +50,14 @@
 #include <scsi/scsi_disk.h>
 #include <scsi/scsiconf.h>
 
-static __inline struct scsi_xfer *scsi_make_xs(struct scsi_link *,
-struct scsi_generic *, int cmdlen, u_char *data_addr,
-int datalen, int retries, int timeout, struct buf *, int flags);
 static __inline void asc2ascii(u_int8_t, u_int8_t ascq, char *result,
 size_t len);
 intsc_err1(struct scsi_xfer *);
 intscsi_interpret_sense(struct scsi_xfer *);
 char   *scsi_decode_sense(struct scsi_sense_data *, int);
 
+void   scsi_xs_done(struct scsi_xfer *);
+
 /* Values for flag parameter to scsi_decode_sense. */
 #defineDECODE_SENSE_KEY1
 #defineDECODE_ASC_ASCQ 2
@@ -94,6 +93,7 @@ scsi_init()
/* Initialize the scsi_xfer pool. */
pool_init(&scsi_xfer_pool, sizeof(struct scsi_xfer), 0,
0, 0, "scxspl", NULL);
+   pool_setipl(&scsi_xfer_pool, IPL_BIO);
/* Initialize the scsi_plug pool */
pool_init(&scsi_plug_pool, sizeof(struct scsi_plug), 0,
0, 0, "scsiplug", NULL);
@@ -188,42 +188,43 @@ scsi_deinit()
  */
 
 struct scsi_xfer *
-scsi_get_xs(struct scsi_link *sc_link, int flags)
+scsi_xs_get(struct scsi_link *link, int flags)
 {
-   struct scsi_xfer*xs;
-   int s;
-
-   SC_DEBUG(sc_link, SDEV_DB3, ("scsi_get_xs\n"));
+   struct scsi_xfer *xs;
 
-   s = splbio();
-   while (sc_link->openings == 0) {
-   SC_DEBUG(sc_link, SDEV_DB3, ("sleeping\n"));
-   if ((flags & SCSI_NOSLEEP) != 0) {
-   splx(s);
-   

Re: Add IDE / SATA support for AMD SB900 chipset.

2009-10-12 Thread David Gwynne

do the sb900 chipsets suffer the same bugs as the sb600 ones?

dlg

On 13/10/2009, at 11:00 AM, Brad wrote:


On Sun, Sep 20, 2009 at 02:31:27PM -0400, Brad wrote:
The following diffs add support for IDE and SATA with the AMD SB900  
chipset.


Here is an updated diff after the last commit to ahci(4).


Index: ahci.c
===
RCS file: /cvs/src/sys/dev/pci/ahci.c,v
retrieving revision 1.150
diff -u -p -r1.150 ahci.c
--- ahci.c  13 Oct 2009 00:19:38 -  1.150
+++ ahci.c  13 Oct 2009 00:56:24 -
@@ -427,6 +427,9 @@ int ahci_nvidia_mcp_attach(struct ahci
struct pci_attach_args *);

static const struct ahci_device ahci_devices[] = {
+   { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_SB900_SATA,
+   NULL,   ahci_ati_sb600_attach },
+
{ PCI_VENDOR_ATI,   PCI_PRODUCT_ATI_SB600_SATA,
NULL,   ahci_ati_sb600_attach },
{ PCI_VENDOR_ATI,   PCI_PRODUCT_ATI_SBX00_SATA_1,
Index: pciide.c
===
RCS file: /cvs/src/sys/dev/pci/pciide.c,v
retrieving revision 1.301
diff -u -p -r1.301 pciide.c
--- pciide.c5 Oct 2009 20:39:26 -   1.301
+++ pciide.c9 Oct 2009 23:22:46 -
@@ -560,6 +560,10 @@ const struct pciide_product_desc pciide_
{ PCI_PRODUCT_AMD_CS5536_IDE,
  0,
  amd756_chip_map
+   },
+   { PCI_PRODUCT_AMD_SB900_IDE,
+ 0,
+ ixp_chip_map
}
};






Re: [PATCH] Link state change with BCM5709

2009-11-09 Thread David Gwynne
On Fri, Oct 30, 2009 at 09:50:34PM +0200, Atte Peltomäki wrote:
 Hi,
 
 Link state change interrupt was not generated due to a missing bit in
 the MAC event register. This fixes at least carp and ifstated for 5709 
 chip (eg. in Dell R610). 

i have verified this on the 5709. i dont have a box with older
versions of bnx in it to test on them though. have you been able
to try this on a 2950 or r200?

 Note that Broadcom Linux driver also sets STATUS_ATTN_BITS_TIMER_ABORT
 on the same go which I think is detection for a hung pci bus. Code for this 
 interrupt is missing in the bnx_phy_intr() handler, so I left it out 
 for now. If you have any ideas how to test it, I can try porting the
 handler code. 
 
 
 --- sys/dev/pci/if_bnx.c  Fri Oct 30 21:04:04 2009
 +++ sys/dev/pci/if_bnx.cFri Oct 30 21:05:56 2009
 @@ -3461,6 +3461,7 @@
 
 /* Set up link change interrupt generation. */
 REG_WR(sc, BNX_EMAC_ATTENTION_ENA, BNX_EMAC_ATTENTION_ENA_LINK);
 +   REG_WR(sc, BNX_HC_ATTN_BITS_ENABLE, STATUS_ATTN_BITS_LINK_STATE);
 
 /* Program the physical address of the status block. */
 REG_WR(sc, BNX_HC_STATUS_ADDR_L, (u_int32_t)(sc->status_block_paddr));
 
 -- 
 Atte Peltomäki
  atte.peltom...@iki.fi  http://kameli.org
 Your effort to remain what you are is what limits you



Re: UBC?

2010-01-30 Thread David Gwynne
On Sun, Jan 31, 2010 at 09:41:14AM +1000, David Gwynne wrote:
 pmap_enter in this situation should fail, not panic. the error would be 
 handled properly as the stack unwinds up to ami_allocmem.
 
 i can change ami_allocmem to take a NOWAIT etc as an argument rather than 
 just assume it, so callers with process context (like the sensor refresh 
 shown below) can sleep waiting for mappings.

here's such a change.

i havent even compiled this, let alone tested it.
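
One thing worth checking when a nowait argument is folded into a flags word with the conditional operator: in C, | binds tighter than ?:, so the conditional needs its own parentheses. A small sketch of the difference follows; the TOY_ constants are illustrative values picked for this example, not the real M_* definitions.

```c
/* illustrative flag values only; the real ones live in the kernel headers */
#define TOY_M_NOWAIT	0x0002
#define TOY_M_ZERO	0x0008

/* parses as (TOY_M_ZERO | nowait) ? TOY_M_NOWAIT : 0 */
static int
flags_unparenthesized(int nowait)
{
	return TOY_M_ZERO | nowait ? TOY_M_NOWAIT : 0;
}

/* the conditional is evaluated first, then or'ed into the flags */
static int
flags_parenthesized(int nowait)
{
	return TOY_M_ZERO | (nowait ? TOY_M_NOWAIT : 0);
}
```

Without the parentheses the zeroing flag is dropped and the nowait flag is always set, whatever the caller asked for. Some compilers warn about the first form.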

Index: ami.c
===
RCS file: /cvs/src/sys/dev/ic/ami.c,v
retrieving revision 1.199
diff -u -p -r1.199 ami.c
--- ami.c   9 Jan 2010 23:15:06 -   1.199
+++ ami.c   30 Jan 2010 23:49:09 -
@@ -124,7 +124,7 @@ voidami_write(struct ami_softc *, bus_
 
 void   ami_copyhds(struct ami_softc *, const u_int32_t *,
const u_int8_t *, const u_int8_t *);
-struct ami_mem *ami_allocmem(struct ami_softc *, size_t);
+struct ami_mem *ami_allocmem(struct ami_softc *, size_t, int);
 void   ami_freemem(struct ami_softc *, struct ami_mem *);
 intami_alloc_ccbs(struct ami_softc *, int);
 
@@ -256,31 +256,33 @@ ami_write(struct ami_softc *sc, bus_size
 }
 
 struct ami_mem *
-ami_allocmem(struct ami_softc *sc, size_t size)
+ami_allocmem(struct ami_softc *sc, size_t size, int nowait)
 {
struct ami_mem  *am;
int nsegs;
 
-   am = malloc(sizeof(struct ami_mem), M_DEVBUF, M_NOWAIT|M_ZERO);
+   am = malloc(sizeof(struct ami_mem), M_DEVBUF, M_ZERO |
+   (nowait ? M_NOWAIT : 0));
if (am == NULL)
return (NULL);
 
am->am_size = size;
+   nowait = nowait ? BUS_DMA_NOWAIT : 0;
 
if (bus_dmamap_create(sc->sc_dmat, size, 1, size, 0,
-   BUS_DMA_NOWAIT | BUS_DMA_ALLOCNOW, &am->am_map) != 0)
+   nowait | BUS_DMA_ALLOCNOW, &am->am_map) != 0)
goto amfree; 
 
if (bus_dmamem_alloc(sc->sc_dmat, size, PAGE_SIZE, 0, &am->am_seg, 1,
-   &nsegs, BUS_DMA_NOWAIT) != 0)
+   &nsegs, nowait) != 0)
goto destroy;
 
if (bus_dmamem_map(sc->sc_dmat, &am->am_seg, nsegs, size, &am->am_kva,
-   BUS_DMA_NOWAIT) != 0)
+   nowait) != 0)
goto free;
 
if (bus_dmamap_load(sc->sc_dmat, am->am_map, am->am_kva, size, NULL,
-   BUS_DMA_NOWAIT) != 0)
+   nowait) != 0)
goto unmap;
 
memset(am->am_kva, 0, size);
@@ -337,7 +339,8 @@ ami_alloc_ccbs(struct ami_softc *sc, int
return (1);
}
 
-   sc->sc_ccbmem_am = ami_allocmem(sc, sizeof(struct ami_ccbmem) * nccbs);
+   sc->sc_ccbmem_am = ami_allocmem(sc,
+   sizeof(struct ami_ccbmem) * nccbs, 1);
if (sc->sc_ccbmem_am == NULL) {
printf(": unable to allocate ccb dmamem\n");
goto free_ccbs;
@@ -413,14 +416,14 @@ ami_attach(struct ami_softc *sc)
paddr_t pa;
int s;
 
-   am = ami_allocmem(sc, NBPG);
+   am = ami_allocmem(sc, NBPG, 1);
if (am == NULL) {
printf(": unable to allocate init data\n");
return (1);
}
pa = htole32(AMIMEM_DVA(am));
 
-   sc->sc_mbox_am = ami_allocmem(sc, sizeof(struct ami_iocmd));
+   sc->sc_mbox_am = ami_allocmem(sc, sizeof(struct ami_iocmd), 1);
if (sc->sc_mbox_am == NULL) {
printf(": unable to allocate mbox\n");
goto free_idata;
@@ -1914,7 +1917,7 @@ ami_mgmt(struct ami_softc *sc, u_int8_t 
}
 
if (size) {
-   if ((am = ami_allocmem(sc, size)) == NULL) {
+   if ((am = ami_allocmem(sc, size, 0)) == NULL) {
error = ENOMEM;
goto memerr;
}



pfsync bug

2010-03-24 Thread David Gwynne
i'll give a cookie* to anyone who can fix the bug described at
http://cvs.openbsd.org/cgi-bin/query-pr-wrapper?full=yes&numbers=6329.

ive stared at the pfsync code for hours trying to find it and cannot. fresh
eyes might have better luck though.

dlg

 * may not include real cookie.



disk caches and bioctl hacking

2010-05-19 Thread David Gwynne
this code lets userland check and change the caches on disks using
the new ioctl i just put into the tree.

id be extremely happy if someone would take the functionality here
and fit it into bioctl.

/*
 * Copyright (c) 2010 David Gwynne d...@openbsd.org
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>
#include <sys/ioctl.h>
#include <sys/dkio.h>

__dead void usage(void);

__dead void
usage(void)
{
	extern char *__progname;

	fprintf(stderr, "usage: %s [-rRwW] disk\n", __progname);

	exit(1);
}

int
main(int argc, char *argv[])
{
	struct dk_cache dkc;
	int fd;
	char *disk = NULL;
	long cmd = 0;
	int rdcache = -1, wrcache = -1;
	int set = 0;

	int ch;

	/* lower case disables a cache, upper case enables it */
	while ((ch = getopt(argc, argv, "rRwW")) != -1) {
		switch (ch) {
		case 'r':
			rdcache = 0;
			break;
		case 'R':
			rdcache = 1;
			break;
		case 'w':
			wrcache = 0;
			break;
		case 'W':
			wrcache = 1;
			break;
		default:
			usage();
			/* NOTREACHED */
		}
	}
	argc -= optind;
	argv += optind;

	if (argc != 1)
		usage();
	disk = argv[0];

	fd = open(disk, O_RDONLY);
	if (fd == -1)
		err(1, "open: %s", disk);

	/* fetch the current settings so only the requested ones change */
	if (ioctl(fd, DIOCGCACHE, &dkc) == -1)
		err(1, "ioctl DIOCGCACHE");

	if (wrcache != -1) {
		dkc.wrcache = wrcache;
		set = 1;
	}
	if (rdcache != -1) {
		dkc.rdcache = rdcache;
		set = 1;
	}

	if (set) {
		if (ioctl(fd, DIOCSCACHE, &dkc) == -1)
			err(1, "ioctl DIOCSCACHE");
	}

	printf("%s: write cache: %s, read cache: %s\n", disk,
	    dkc.wrcache ? "enabled" : "disabled",
	    dkc.rdcache ? "enabled" : "disabled");

	return (0);
}



possible fix to races in ami(4)

2010-05-31 Thread David Gwynne
you cant test a variable and then sleep on it without blocking
interrupts, cos a completion could change the variables state between
those two actions.
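
A toy, single-threaded model of that window, assuming a completion arrives exactly between the test and the sleep. All names are hypothetical, and the model only imitates the kernel behaviour in which interrupts blocked by the sleeper are delivered once it is actually asleep.

```c
#include <stdbool.h>

struct toy_sc {
	bool drained;		/* condition the sleeper is waiting for */
	bool sleeping;		/* sleeper is parked on the wait channel */
	bool intr_blocked;	/* models holding splbio() */
	bool intr_pending;	/* completion held back while blocked */
};

/* completion interrupt: set the condition and wake whoever is asleep */
static void
toy_intr(struct toy_sc *sc)
{
	if (sc->intr_blocked) {
		sc->intr_pending = true;
		return;
	}
	sc->drained = true;
	sc->sleeping = false;	/* a wakeup only helps a sleeper already parked */
}

/* returns true if the sleeper ends up stuck with nobody left to wake it */
static bool
toy_wait(struct toy_sc *sc, bool block_intr)
{
	if (block_intr)
		sc->intr_blocked = true;	/* s = splbio() */
	if (!sc->drained) {			/* the test */
		toy_intr(sc);			/* completion lands right here */
		sc->sleeping = true;		/* the sleep */
		sc->intr_blocked = false;	/* asleep, so interrupts get in */
		if (sc->intr_pending) {
			sc->intr_pending = false;
			toy_intr(sc);
		}
	}
	return sc->sleeping;
}
```

With interrupts left open the wakeup fires before anyone is asleep and is lost; with them blocked the completion is deferred until the sleeper is parked, so the wakeup finds it.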

could anyone test setting a hotspare on ami(4) while doing io?

dlg

Index: ami.c
===
RCS file: /cvs/src/sys/dev/ic/ami.c,v
retrieving revision 1.204
diff -u -p ami.c
--- ami.c   20 May 2010 00:55:17 -  1.204
+++ ami.c   31 May 2010 12:42:47 -
@@ -186,11 +186,8 @@ ami_remove_runq(struct ami_ccb *ccb)
splassert(IPL_BIO);
 
TAILQ_REMOVE(&ccb->ccb_sc->sc_ccb_runq, ccb, ccb_link);
-   if (TAILQ_EMPTY(&ccb->ccb_sc->sc_ccb_runq)) {
-   ccb->ccb_sc->sc_drained = 1;
-   if (ccb->ccb_sc->sc_drainio)
-   wakeup(ccb->ccb_sc);
-   }
+   if (ccb->ccb_sc->sc_drainio && TAILQ_EMPTY(&ccb->ccb_sc->sc_ccb_runq))
+   wakeup(ccb->ccb_sc);
 }
 
 void
@@ -198,7 +195,6 @@ ami_insert_runq(struct ami_ccb *ccb)
 {
splassert(IPL_BIO);
 
-   ccb->ccb_sc->sc_drained = 0;
TAILQ_INSERT_TAIL(&ccb->ccb_sc->sc_ccb_runq, ccb, ccb_link);
 }
 
@@ -539,7 +535,6 @@ ami_attach(struct ami_softc *sc)
/* error already printed */
goto free_mbox;
}
-   sc->sc_drained = 1;
 
/* hack for hp netraid version encoding */
if ('A' <= sc->sc_fwver[2] && sc->sc_fwver[2] <= 'Z' &&
@@ -1016,7 +1011,6 @@ ami_runqueue(struct ami_softc *sc)
 
while ((ccb = TAILQ_FIRST(&sc->sc_ccb_preq)) != NULL) {
if (sc->sc_exec(sc, &ccb->ccb_cmd) != 0) {
-   /* this is now raceable too with other incoming io */
timeout_add(&sc->sc_run_tmo, 1);
break;
}
@@ -1895,10 +1889,8 @@ ami_mgmt(struct ami_softc *sc, u_int8_t opcode, u_int8
goto err;
}
ccb->ccb_done = ami_done_ioctl;
-   } else {
+   } else
ccb = sc->sc_mgmtccb;
-   ccb->ccb_done = ami_done_dummy;
-   }
 
if (size) {
if ((am = ami_allocmem(sc, size)) == NULL) {
@@ -1930,22 +1922,29 @@ ami_mgmt(struct ami_softc *sc, u_int8_t opcode, u_int8
 
if (opcode != AMI_CHSTATE) {
ami_start(sc, ccb);
+   s = splbio();
while (ccb->ccb_state != AMI_CCB_READY)
tsleep(ccb, PRIBIO,"ami_mgmt", 0);
+   splx(s);
} else {
/* change state must be run with id 0xfe and MUST be polled */
+   s = splbio();
sc->sc_drainio = 1;
-   while (sc->sc_drained != 1)
+   while (!TAILQ_EMPTY(&sc->sc_ccb_runq)) {
if (tsleep(sc, PRIBIO, "ami_mgmt_drain", hz * 60) ==
EWOULDBLOCK) {
printf("%s: drain io timeout\n", DEVNAME(sc));
ccb->ccb_flags |= AMI_CCB_F_ERR;
goto restartio;
}
-   ami_poll(sc, ccb);
+   }
+
+   error = sc->sc_poll(sc, &ccb->ccb_cmd);
+   if (error == -1)
+   ccb->ccb_flags |= AMI_CCB_F_ERR;
+
 restartio:
/* restart io */
-   s = splbio();
sc->sc_drainio = 0;
ami_runqueue(sc);
splx(s);
@@ -1966,7 +1965,6 @@ memerr:
} else {
ccb->ccb_flags = 0;
ccb->ccb_state = AMI_CCB_FREE;
-   ccb->ccb_done = NULL;
}
 
 err:
Index: amivar.h
===
RCS file: /cvs/src/sys/dev/ic/amivar.h,v
retrieving revision 1.54
diff -u -p amivar.h
--- amivar.h28 Oct 2008 11:43:10 -  1.54
+++ amivar.h31 May 2010 12:42:47 -
@@ -149,7 +149,6 @@ struct ami_softc {
charsc_plist[AMI_BIG_MAX_PDRIVES];
 
struct ami_ccb  *sc_mgmtccb;
-   int sc_drained;
int sc_drainio;
u_int8_tsc_drvinscnt;
 };



use siphash to protect the mroute hash

2014-11-15 Thread David Gwynne
i have no idea how to test this.

tests? oks?

Index: ip_mroute.c
===
RCS file: /cvs/src/sys/netinet/ip_mroute.c,v
retrieving revision 1.71
diff -u -p -r1.71 ip_mroute.c
--- ip_mroute.c 30 Sep 2014 12:54:22 -  1.71
+++ ip_mroute.c 15 Nov 2014 11:18:29 -
@@ -93,6 +93,9 @@
 
 #include <sys/stdarg.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 #define IP_MULTICASTOPTS 0
 #defineM_PULLUP(m, len)
 \
do { \
@@ -110,11 +113,12 @@ int   ip_mrtproto = IGMP_DVMRP;/* for
 #define NO_RTE_FOUND   0x1
 #define RTE_FOUND  0x2
 
-#define MFCHASH(a, g)  \
-   ((((a).s_addr >> 20) ^ ((a).s_addr >> 10) ^ (a).s_addr ^ \
-   ((g).s_addr >> 20) ^ ((g).s_addr >> 10) ^ (g).s_addr) & mfchash)
+u_int32_t _mfchash(struct in_addr, struct in_addr);
+
+#define MFCHASH(a, g) _mfchash((a), (g))
 LIST_HEAD(mfchashhdr, mfc) *mfchashtbl;
 u_long mfchash;
+SIPHASH_KEY mfchashkey;
 
 u_char nexpire[MFCTBLSIZ];
 struct vif viftable[MAXVIFS];
@@ -281,8 +285,10 @@ static struct mfc *
 mfc_find(struct in_addr *o, struct in_addr *g)
 {
struct mfc *rt;
+   u_int32_t hash;
 
-   LIST_FOREACH(rt, &mfchashtbl[MFCHASH(*o, *g)], mfc_hash) {
+   hash = MFCHASH(*o, *g);
+   LIST_FOREACH(rt, &mfchashtbl[hash], mfc_hash) {
if (in_hosteq(rt->mfc_origin, *o) &&
in_hosteq(rt->mfc_mcastgrp, *g) &&
(rt->mfc_stall == NULL))
@@ -498,6 +504,7 @@ ip_mrouter_init(struct socket *so, struc
ip_mrouter = so;
 
mfchashtbl = hashinit(MFCTBLSIZ, M_MRTABLE, M_WAITOK, &mfchash);
+   arc4random_buf(&mfchashkey, sizeof(mfchashkey));
memset(nexpire, 0, sizeof(nexpire));
 
pim_assert = 0;
@@ -509,6 +516,18 @@ ip_mrouter_init(struct socket *so, struc
log(LOG_DEBUG, "ip_mrouter_init\n");
 
return (0);
+}
+
+u_int32_t
+_mfchash(struct in_addr o, struct in_addr g)
+{
+   SIPHASH_CTX ctx;
+
+   SipHash24_Init(&ctx, &mfchashkey);
+   SipHash24_Update(&ctx, &o.s_addr, sizeof(o.s_addr));
+   SipHash24_Update(&ctx, &g.s_addr, sizeof(g.s_addr));
+
+   return (SipHash24_End(&ctx) & mfchash);
 }
 
 /*



Re: VPLS patch [0/3]: introduction

2014-11-15 Thread David Gwynne

 On 15 Nov 2014, at 3:40 am, Stuart Henderson st...@openbsd.org wrote:
 
 On 2014/11/14 15:24, Rafael Zalamena wrote:
 On Fri, Nov 14, 2014 at 05:41:32PM +0100, Mike Belopuhov wrote:
 
 is it possible to call it something other than just wire(4)?
 vpls maybe?
 
 pseudowire(4) then? (looks long to me)
 I wouldn't call vpls as it may be expanded later to do also VPWS.
 
 pwe(4)?
 
 though, wire(4) doesn't seem bad to me..

merge it into gif(4).

how about mpw(4) for MPLS Pseudo Wire like mpe(4) is MPLS Provider Edge?

or merge it into gif(4) ;)



siphash for ffs inode hash

2014-11-16 Thread David Gwynne
im in two minds about this.

on the one hand, its not like this can be attacked remotely. on the
other hand, maybe someone would create a metric buttload of files
and open all the ones with inodes whose values are 4096 apart.

discussion? tests? ok?

part of me thinks we should apply siphash to all the kernel hashes
as a matter of course rather than waste the effort trying to figure
out if they are attackable or not.
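
A small sketch of why inode numbers a fixed stride apart are a problem for an add-then-mask hash. It assumes a table mask of 4095, chosen only to match the "4096 apart" example above; the function names are hypothetical.

```c
#include <stdint.h>

/* same shape as the old INOHASH(): add device and inode number, then mask */
static uint32_t
old_inohash(uint32_t dev, uint32_t inum, uint32_t mask)
{
	return (dev + inum) & mask;
}

/*
 * count how many of n inode numbers, spaced stride apart and starting at
 * first, fall into the same bucket as the first one
 */
static unsigned int
same_bucket_count(uint32_t dev, uint32_t first, uint32_t stride,
    unsigned int n, uint32_t mask)
{
	uint32_t want = old_inohash(dev, first, mask);
	unsigned int i, hits = 0;

	for (i = 0; i < n; i++) {
		if (old_inohash(dev, first + i * stride, mask) == want)
			hits++;
	}
	return hits;
}
```

With a stride equal to the table size every inode lands on one chain, so lookups degrade to a linear list walk. A keyed hash makes the stride that collides unpredictable to whoever is creating the files.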

Index: ufs_ihash.c
===
RCS file: /cvs/src/sys/ufs/ufs/ufs_ihash.c,v
retrieving revision 1.19
diff -u -p -r1.19 ufs_ihash.c
--- ufs_ihash.c 14 Sep 2014 14:17:27 -  1.19
+++ ufs_ihash.c 16 Nov 2014 14:25:19 -
@@ -41,12 +41,30 @@
 #include <ufs/ufs/inode.h>
 #include <ufs/ufs/ufs_extern.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 /*
  * Structures associated with inode cacheing.
  */
 LIST_HEAD(ihashhead, inode) *ihashtbl;
 u_long ihash;  /* size of hash table - 1 */
-#define INOHASH(device, inum)   (&ihashtbl[((device) + (inum)) & ihash])
+SIPHASH_KEY ihashkey;
+
+struct ihashhead *ufs_ihash(dev_t, ufsino_t);
+#define INOHASH(device, inum) ufs_ihash((device), (inum))
+
+struct ihashhead *
+ufs_ihash(dev_t dev, ufsino_t inum)
+{
+   SIPHASH_CTX ctx;
+
+   SipHash24_Init(&ctx, &ihashkey);
+   SipHash24_Update(&ctx, &dev, sizeof(dev));
+   SipHash24_Update(&ctx, &inum, sizeof(inum));
+
+   return (&ihashtbl[SipHash24_End(&ctx) & ihash]);
+}
 
 /*
  * Initialize inode hash table.
@@ -55,6 +73,7 @@ void
 ufs_ihashinit(void)
 {
ihashtbl = hashinit(desiredvnodes, M_UFSMNT, M_WAITOK, &ihash);
+   arc4random_buf(&ihashkey, sizeof(ihashkey));
 }
 
 /*
@@ -65,11 +84,14 @@ struct vnode *
 ufs_ihashlookup(dev_t dev, ufsino_t inum)
 {
 struct inode *ip;
+   struct ihashhead *ipp;
 
/* XXXLOCKING lock hash list */
-   LIST_FOREACH(ip, INOHASH(dev, inum), i_hash)
+   ipp = INOHASH(dev, inum);
+   LIST_FOREACH(ip, ipp, i_hash) {
if (inum == ip->i_number && dev == ip->i_dev)
break;
+   }
/* XXXLOCKING unlock hash list? */
 
if (ip)
@@ -86,11 +108,13 @@ struct vnode *
 ufs_ihashget(dev_t dev, ufsino_t inum)
 {
struct proc *p = curproc;
+   struct ihashhead *ipp;
struct inode *ip;
struct vnode *vp;
 loop:
/* XXXLOCKING lock hash list */
-   LIST_FOREACH(ip, INOHASH(dev, inum), i_hash) {
+   ipp = INOHASH(dev, inum);
+   LIST_FOREACH(ip, ipp, i_hash) {
if (inum == ip->i_number && dev == ip->i_dev) {
vp = ITOV(ip);
/* XXXLOCKING unlock hash list? */
@@ -119,7 +143,8 @@ ufs_ihashins(struct inode *ip)
 
/* XXXLOCKING lock hash list */
 
-   LIST_FOREACH(curip, INOHASH(dev, inum), i_hash) {
+   ipp = INOHASH(dev, inum);
+   LIST_FOREACH(curip, ipp, i_hash) {
if (inum == curip->i_number && dev == curip->i_dev) {
/* XXXLOCKING unlock hash list? */
lockmgr(&ip->i_lock, LK_RELEASE, NULL);
@@ -127,7 +152,6 @@ ufs_ihashins(struct inode *ip)
}
}
 
-   ipp = INOHASH(dev, inum);
SET(ip->i_flag, IN_HASHED);
LIST_INSERT_HEAD(ipp, ip, i_hash);
/* XXXLOCKING unlock hash list? */



siphash for ufs disk quota hash

2014-11-16 Thread David Gwynne
can someone test this? and eyeball the use of a pointer as data to
get mixed into the hash?
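
For anyone eyeballing the pointer question, a sketch of what "pointer as data" means: the bytes of the address itself are fed to the hash, not the bytes of the object it points at. FNV-1a is used here purely as a stand-in so the example is self-contained; it is not SipHash, and toy_vnode and toy_dqhash are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_FNV_INIT	2166136261u
#define TOY_FNV_PRIME	16777619u

/* FNV-1a over a byte range; a stand-in for the keyed hash, NOT SipHash */
static uint32_t
toy_fnv1a(uint32_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len-- > 0) {
		h ^= *p++;
		h *= TOY_FNV_PRIME;
	}
	return h;
}

struct toy_vnode {
	int v_id;
};

/* mixes the address held in vp, then the id, and masks to a bucket index */
static uint32_t
toy_dqhash(struct toy_vnode *vp, unsigned long id, uint32_t mask)
{
	uint32_t h = TOY_FNV_INIT;

	h = toy_fnv1a(h, &vp, sizeof(vp));	/* bytes of the pointer, not of *vp */
	h = toy_fnv1a(h, &id, sizeof(id));
	return h & mask;
}
```

Because only the address is hashed, changing the contents of the object does not move the entry to a different bucket, which is what a cache keyed on the vnode's identity needs.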

Index: ufs_quota.c
===
RCS file: /cvs/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.35
diff -u -p -r1.35 ufs_quota.c
--- ufs_quota.c 13 Oct 2014 03:46:33 -  1.35
+++ ufs_quota.c 17 Nov 2014 00:59:49 -
@@ -53,6 +53,9 @@
 
 #include <sys/queue.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 /*
  * The following structure records disk usage for a user or group on a
  * filesystem. There is one allocated for each quota that exists on any
@@ -805,9 +808,8 @@ qsync(struct mount *mp)
 /*
  * Code pertaining to management of the in-core dquot data structures.
  */
-#define DQHASH(dqvp, id) \
-   (&dqhashtbl[((((long)(dqvp)) >> 8) + id) & dqhash])
 LIST_HEAD(dqhash, dquot) *dqhashtbl;
+SIPHASH_KEY dqhashkey;
 u_long dqhash;
 
 /*
@@ -824,6 +826,7 @@ void
 ufs_quota_init(void)
 {
dqhashtbl = hashinit(desiredvnodes, M_DQUOT, M_WAITOK, &dqhash);
+   arc4random_buf(&dqhashkey, sizeof(dqhashkey));
TAILQ_INIT(&dqfreelist);
 }
 
@@ -835,6 +838,7 @@ int
 dqget(struct vnode *vp, u_long id, struct ufsmount *ump, int type,
 struct dquot **dqp)
 {
+   SIPHASH_CTX ctx;
struct proc *p = curproc;
struct dquot *dq;
struct dqhash *dqh;
@@ -851,7 +855,11 @@ dqget(struct vnode *vp, u_long id, struc
/*
 * Check the cache first.
 */
-   dqh = DQHASH(dqvp, id);
+   SipHash24_Init(&ctx, &dqhashkey);
+   SipHash24_Update(&ctx, &dqvp, sizeof(dqvp));
+   SipHash24_Update(&ctx, &id, sizeof(id));
+   dqh = &dqhashtbl[SipHash24_End(&ctx) & dqhash];
+
LIST_FOREACH(dq, dqh, dq_hash) {
if (dq->dq_id != id ||
dq->dq_vp != dqvp)



ufs_ihash.c double handles LIST entry invalidation

2014-11-16 Thread David Gwynne
src/sys/queue.h invalidates pointers in lists if DIAGNOSTIC is
enabled, so ufs_ihash.c doesnt have to.
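
A rough sketch of the idea, using a hand-rolled list entry rather than the real queue.h macros. The poison value (void *)-1 and all toy_ names are assumptions made for illustration.

```c
#include <stddef.h>

/* same shape as a LIST_ENTRY: next pointer plus address of the previous next */
struct toy_entry {
	struct toy_entry *le_next;
	struct toy_entry **le_prev;
};

/* illustrative poison value for stale link pointers */
#define TOY_INVALID	((void *)-1)

/* unlink e, then poison its links so later use of them faults loudly */
static void
toy_list_remove(struct toy_entry *e)
{
	if (e->le_next != NULL)
		e->le_next->le_prev = e->le_prev;
	*e->le_prev = e->le_next;
	e->le_next = TOY_INVALID;
	e->le_prev = TOY_INVALID;
}
```

If the remove operation already leaves the links unusable, a caller that sets them to NULL afterwards is doing the same job twice.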

ok?

Index: ufs_ihash.c
===
RCS file: /cvs/src/sys/ufs/ufs/ufs_ihash.c,v
retrieving revision 1.20
diff -u -p -r1.20 ufs_ihash.c
--- ufs_ihash.c 17 Nov 2014 00:59:31 -  1.20
+++ ufs_ihash.c 17 Nov 2014 07:08:29 -
@@ -173,9 +173,5 @@ ufs_ihashrem(struct inode *ip)
LIST_REMOVE(ip, i_hash);
CLR(ip->i_flag, IN_HASHED);
}
-#ifdef DIAGNOSTIC
-   ip->i_hash.le_next = NULL;
-   ip->i_hash.le_prev = NULL;
-#endif
/* XXXLOCKING unlock hash list? */
 }



siphash for inode caches, or when you have a hammer everything looks like a nail

2014-11-17 Thread David Gwynne
this cuts all the hashes in the filesystems over to using siphash
to get keys into their hashes.

i have compiled all of them, and will start on testing them as much
as i can. so far i have done softdep and started on the nfs srv
cache. id like help and review though.

Index: isofs/cd9660/cd9660_node.c
===
RCS file: /cvs/src/sys/isofs/cd9660/cd9660_node.c,v
retrieving revision 1.25
diff -u -p -r1.25 cd9660_node.c
--- isofs/cd9660/cd9660_node.c  14 Sep 2014 14:17:25 -  1.25
+++ isofs/cd9660/cd9660_node.c  18 Nov 2014 01:03:13 -
@@ -48,6 +48,9 @@
 #include <sys/malloc.h>
 #include <sys/stat.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 #include <isofs/cd9660/iso.h>
 #include <isofs/cd9660/cd9660_extern.h>
 #include <isofs/cd9660/cd9660_node.h>
@@ -56,9 +59,12 @@
 /*
  * Structures associated with iso_node caching.
  */
+u_int cd9660_isohash(dev_t, cdino_t);
+
 struct iso_node **isohashtbl;
 u_long isohash;
-#define INOHASH(device, inum)   (((device) + ((inum)>>12)) & isohash)
+SIPHASH_KEY isohashkey;
+#define INOHASH(device, inum) cd9660_isohash((device), (inum))
 
 extern int prtactive;  /* 1 = print out reclaim of active vnodes */
 
@@ -73,7 +79,19 @@ cd9660_init(vfsp)
 {
 
isohashtbl = hashinit(desiredvnodes, M_ISOFSMNT, M_WAITOK, &isohash);
+   arc4random_buf(&isohashkey, sizeof(isohashkey));
return (0);
+}
+
+u_int
+cd9660_isohash(dev_t device, cdino_t inum)
+{
+   SIPHASH_CTX ctx;
+
+   SipHash24_Init(&ctx, &isohashkey);
+   SipHash24_Update(&ctx, &device, sizeof(device));
+   SipHash24_Update(&ctx, &inum, sizeof(inum));
+   return (SipHash24_End(&ctx) & isohash);
 }
 
 /*
Index: isofs/udf/udf.h
===
RCS file: /cvs/src/sys/isofs/udf/udf.h,v
retrieving revision 1.19
diff -u -p -r1.19 udf.h
--- isofs/udf/udf.h 17 Sep 2013 04:31:56 -  1.19
+++ isofs/udf/udf.h 18 Nov 2014 01:03:13 -
@@ -71,6 +71,7 @@ struct umount {
struct unode *um_vat;
struct long_ad um_root_icb;
LIST_HEAD(udf_hash_lh, unode) *um_hashtbl;
+   SIPHASH_KEY um_hashkey;
u_long um_hashsz;
struct mutex um_hashmtx;
int um_psecs;
Index: isofs/udf/udf_subr.c
===
RCS file: /cvs/src/sys/isofs/udf/udf_subr.c,v
retrieving revision 1.23
diff -u -p -r1.23 udf_subr.c
--- isofs/udf/udf_subr.c3 Nov 2014 21:28:35 -   1.23
+++ isofs/udf/udf_subr.c18 Nov 2014 01:03:13 -
@@ -38,6 +38,9 @@
 #include <sys/dirent.h>
 #include <sys/disklabel.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 #include <isofs/udf/ecma167-udf.h>
 #include <isofs/udf/udf.h>
 #include <isofs/udf/udf_extern.h>
Index: isofs/udf/udf_vfsops.c
===
RCS file: /cvs/src/sys/isofs/udf/udf_vfsops.c,v
retrieving revision 1.42
diff -u -p -r1.42 udf_vfsops.c
--- isofs/udf/udf_vfsops.c  12 Jul 2014 18:50:00 -  1.42
+++ isofs/udf/udf_vfsops.c  18 Nov 2014 01:03:13 -
@@ -67,6 +67,9 @@
 #include <sys/endian.h>
 #include <sys/specdev.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 #include <isofs/udf/ecma167-udf.h>
 #include <isofs/udf/udf.h>
 #include <isofs/udf/udf_extern.h>
@@ -364,6 +367,7 @@ udf_mountfs(struct vnode *devvp, struct 
mtx_init(&ump->um_hashmtx, IPL_NONE);
ump->um_hashtbl = hashinit(UDF_HASHTBLSIZE, M_UDFMOUNT, M_WAITOK,
&ump->um_hashsz);
+   arc4random_buf(&ump->um_hashkey, sizeof(ump->um_hashkey));
 
/* Get the VAT, if needed */
if (ump->um_flags & UDF_MNT_FIND_VAT) {
Index: isofs/udf/udf_vnops.c
===
RCS file: /cvs/src/sys/isofs/udf/udf_vnops.c,v
retrieving revision 1.56
diff -u -p -r1.56 udf_vnops.c
--- isofs/udf/udf_vnops.c   3 Nov 2014 21:28:35 -   1.56
+++ isofs/udf/udf_vnops.c   18 Nov 2014 01:03:13 -
@@ -50,6 +50,9 @@
 #include <sys/specdev.h>
 #include <sys/unistd.h>
 
+#include <dev/rndvar.h>
+#include <crypto/siphash.h>
+
 #include <isofs/udf/ecma167-udf.h>
 #include <isofs/udf/udf.h>
 #include <isofs/udf/udf_extern.h>
@@ -92,7 +95,8 @@ udf_hashlookup(struct umount *ump, udfin
 
 loop:
mtx_enter(&ump->um_hashmtx);
-   lh = &ump->um_hashtbl[id & ump->um_hashsz];
+   lh = &ump->um_hashtbl[SipHash24(&ump->um_hashkey, &id, sizeof(id)) &
+   ump->um_hashsz];
if (lh == NULL) {
mtx_leave(&ump->um_hashmtx);
return (ENOENT);
@@ -127,7 +131,8 @@ udf_hashins(struct unode *up)
 
vn_lock(up->u_vnode, LK_EXCLUSIVE | LK_RETRY, p);
mtx_enter(&ump->um_hashmtx);
-   lh = &ump->um_hashtbl[up->u_ino & ump->um_hashsz];
+   lh = &ump->um_hashtbl[SipHash24(&ump->um_hashkey,
+   &up->u_ino, sizeof(up->u_ino)) & ump->um_hashsz];
if (lh == NULL)
panic("hash entry is NULL, up->u_ino = %d", 

Re: siphash for inode caches, or when you have a hammer everything looks like a nail

2014-11-17 Thread David Gwynne

 On 18 Nov 2014, at 11:39, Ted Unangst t...@tedunangst.com wrote:
 
 On Tue, Nov 18, 2014 at 11:06, David Gwynne wrote:
 
 +#include <dev/rndvar.h>
 
 Sorry to interrupt :), but I think it's become clear that rndvar.h is
 the wrong header for arc4random() to live in.

agreed.

 The diff below moves it to systm.h and fixes the two files that didn't
 compile.

if this is quickly oked, i think it should go in first so i can update the hash 
diff. the hash diff needs more time for scrutiny and testing which would only 
hold up your simpler header change.

 
 Index: dev/rndvar.h
 ===
 RCS file: /cvs/src/sys/dev/rndvar.h,v
 retrieving revision 1.33
 diff -u -p -r1.33 rndvar.h
 --- dev/rndvar.h  19 Jan 2014 23:52:54 -  1.33
 +++ dev/rndvar.h  18 Nov 2014 01:31:17 -
 @@ -72,9 +72,6 @@ extern struct rndstats rndstats;
 void random_start(void);
 
 void enqueue_randomness(int, int);
 -void arc4random_buf(void *, size_t);
 -u_int32_t arc4random(void);
 -u_int32_t arc4random_uniform(u_int32_t);
 
 #endif /* _KERNEL */
 
 Index: sys/systm.h
 ===
 RCS file: /cvs/src/sys/sys/systm.h,v
 retrieving revision 1.102
 diff -u -p -r1.102 systm.h
 --- sys/systm.h   9 Oct 2014 04:04:27 -   1.102
 +++ sys/systm.h   18 Nov 2014 01:31:28 -
 @@ -215,6 +215,10 @@ int  copyin(const void *, void *, size_t)
   __attribute__ ((__bounded__(__buffer__,2,3)));
 int   copyout(const void *, void *, size_t);
 
 +void arc4random_buf(void *, size_t);
 +u_int32_t arc4random(void);
 +u_int32_t arc4random_uniform(u_int32_t);
 +
 struct timeval;
 struct timespec;
 int   hzto(const struct timeval *);
 Index: netinet/ip_id.c
 ===
 RCS file: /cvs/src/sys/netinet/ip_id.c,v
 retrieving revision 1.23
 diff -u -p -r1.23 ip_id.c
 --- netinet/ip_id.c   31 Mar 2011 10:36:42 -  1.23
 +++ netinet/ip_id.c   18 Nov 2014 01:35:18 -
 @@ -26,7 +26,7 @@
  * be reused for at least 32768 calls.
  */
 #include <sys/param.h>
 -#include <dev/rndvar.h>
 +#include <sys/systm.h>
 
 static u_int16_t ip_shuffle[65536];
 static int isindex = 0;
 Index: netinet6/ip6_id.c
 ===
 RCS file: /cvs/src/sys/netinet6/ip6_id.c,v
 retrieving revision 1.8
 diff -u -p -r1.8 ip6_id.c
 --- netinet6/ip6_id.c 8 Feb 2010 12:16:02 -   1.8
 +++ netinet6/ip6_id.c 18 Nov 2014 01:36:12 -
 @@ -85,13 +85,12 @@
 #include <sys/param.h>
 #include <sys/kernel.h>
 #include <sys/socket.h>
 +#include <sys/systm.h>
 
 #include <net/if.h>
 #include <netinet/in.h>
 #include <netinet/ip6.h>
 #include <netinet6/ip6_var.h>
 -
 -#include <dev/rndvar.h>
 
 struct randomtab {
   const int   ru_bits; /* resulting bits */



