Re: No bus_space_read_8 on x86 ?

2012-10-12 Thread John Baldwin
On Wednesday, October 10, 2012 5:44:09 pm Carl Delsey wrote:
> Sorry for the slow response. I was dealing with a bit of a family
> emergency. Responses inline below.
>
> On 10/09/12 08:54, John Baldwin wrote:
> > On Monday, October 08, 2012 4:59:24 pm Warner Losh wrote:
> > > On Oct 5, 2012, at 10:08 AM, John Baldwin wrote:
> <snip>
> > I think cxgb* already have an implementation.  For amd64 we should
> > certainly have bus_space_*_8(), at least for SYS_RES_MEMORY.  I think
> > they should fail for SYS_RES_IOPORT.  I don't think we can force a
> > compile-time error though, would just have to return -1 on reads or
> > some such?
>
> Yes. Exactly what I was thinking.
>
> > > I believe it was because bus reads weren't guaranteed to be atomic
> > > on i386.  I don't know if that's still the case or a concern, but
> > > it was an intentional omission.
> > True.  If you are on a 32-bit system you can read the two 4 byte
> > values and then build a 64-bit value.  For 64-bit platforms we should
> > offer bus_read_8() however.
>
> I believe there is still no way to perform a 64-bit read on an i386 (or
> at least not without messing with SSE instructions), but if you have to
> read a 64-bit register, you are stuck with doing two 32-bit reads and
> concatenating them. I figure we may as well provide an implementation
> for those who have to do that as well as the implementation for 64-bit.

I think the problem though is that the way you should glue those two 32-bit
reads together is device dependent.  I don't think you can provide a completely
device-neutral bus_read_8() on i386.  We should certainly have it on 64-bit
platforms, but I think drivers that want to work on 32-bit platforms need to
explicitly merge the two words themselves.
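The device dependence John describes can be seen in a small sketch: the same pair of 4-byte reads splices into two different 64-bit values depending on which half the device presents at the lower offset. This is illustration only, not the bus_space API; read32() stands in for bus_space_read_4() and the register contents are made up.

```c
#include <stdint.h>

/* Fake device registers: low offset first, then offset 4. */
static const uint32_t fake_reg[2] = { 0xdeadbeef, 0x01234567 };

/* Stand-in for bus_space_read_4(); illustration only. */
static uint32_t
read32(unsigned off)
{
	return (fake_reg[off / 4]);
}

/* Device puts the low half at the lower offset. */
static uint64_t
read64_low_first(void)
{
	uint32_t lo = read32(0);
	uint32_t hi = read32(4);
	return (((uint64_t)hi << 32) | lo);
}

/* Device puts the high half at the lower offset. */
static uint64_t
read64_high_first(void)
{
	uint32_t hi = read32(0);
	uint32_t lo = read32(4);
	return (((uint64_t)hi << 32) | lo);
}
```

The two functions return different values for the same device, which is why no single device-neutral bus_read_8() can be picked on i386.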
 
-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: No bus_space_read_8 on x86 ?

2012-10-12 Thread Robert Watson


On Fri, 12 Oct 2012, John Baldwin wrote:

> > > > I believe it was because bus reads weren't guaranteed to be atomic
> > > > on i386.  I don't know if that's still the case or a concern, but
> > > > it was an intentional omission.
> > > True.  If you are on a 32-bit system you can read the two 4 byte
> > > values and then build a 64-bit value.  For 64-bit platforms we
> > > should offer bus_read_8() however.
> >
> > I believe there is still no way to perform a 64-bit read on an i386
> > (or at least not without messing with SSE instructions), but if you
> > have to read a 64-bit register, you are stuck with doing two 32-bit
> > reads and concatenating them. I figure we may as well provide an
> > implementation for those who have to do that as well as the
> > implementation for 64-bit.
>
> I think the problem though is that the way you should glue those two
> 32-bit reads together is device dependent.  I don't think you can
> provide a completely device-neutral bus_read_8() on i386.  We should
> certainly have it on 64-bit platforms, but I think drivers that want to
> work on 32-bit platforms need to explicitly merge the two words
> themselves.


Indeed -- and on non-x86, where there are uncached direct map segments and 
TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit has quite 
different effects in terms of atomicity.  Where uncached I/Os are being used, 
those differences may affect semantics significantly -- e.g., if your device 
has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves 
of two different 64-bit values, rather than two halves of the same value.  As 
device drivers depend on those atomicity semantics, we should (at the bus_space 
level) offer only the exactly expected semantics, rather than trying to patch 
things up.  If a device driver accessing 64-bit fields wants to support doing 
it using two 32-bit reads, it can figure out how to splice it together, 
following bus_space_read_region_4().


Robert


Re: No bus_space_read_8 on x86 ?

2012-10-12 Thread Carl Delsey

On 10/12/2012 9:04 AM, Robert Watson wrote:


On Fri, 12 Oct 2012, John Baldwin wrote:

> > > > > I believe it was because bus reads weren't guaranteed to be
> > > > > atomic on i386.  I don't know if that's still the case or a
> > > > > concern, but it was an intentional omission.
> > > > True.  If you are on a 32-bit system you can read the two 4 byte
> > > > values and then build a 64-bit value.  For 64-bit platforms we
> > > > should offer bus_read_8() however.
> > >
> > > I believe there is still no way to perform a 64-bit read on an i386
> > > (or at least not without messing with SSE instructions), but if you
> > > have to read a 64-bit register, you are stuck with doing two 32-bit
> > > reads and concatenating them. I figure we may as well provide an
> > > implementation for those who have to do that as well as the
> > > implementation for 64-bit.
> >
> > I think the problem though is that the way you should glue those two
> > 32-bit reads together is device dependent.  I don't think you can
> > provide a completely device-neutral bus_read_8() on i386.  We should
> > certainly have it on 64-bit platforms, but I think drivers that want
> > to work on 32-bit platforms need to explicitly merge the two words
> > themselves.
>
> Indeed -- and on non-x86, where there are uncached direct map segments
> and TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit
> has quite different effects in terms of atomicity.  Where uncached I/Os
> are being used, those differences may affect semantics significantly --
> e.g., if your device has a 64-bit memory-mapped FIFO or registers, 2x
> 32-bit gives you two halves of two different 64-bit values, rather than
> two halves of the same value.  As device drivers depend on those
> atomicity semantics, we should (at the bus_space level) offer only the
> exactly expected semantics, rather than trying to patch things up.  If
> a device driver accessing 64-bit fields wants to support doing it using
> two 32-bit reads, it can figure out how to splice it together,
> following bus_space_read_region_4().
I wouldn't make any default behaviour for bus_space_read_8 on i386, just 
amd64. My assumption (which may be unjustified) is that by far the most 
common implementation to read a 64-bit register on i386 would be to 
read the lower 4 bytes first, followed by the upper 4 bytes (or vice 
versa), and then stitch them together.  I think we should provide helper 
functions for these two cases; otherwise I fear our code base will be 
littered with multiple independent implementations of this.


Some driver writer who wants to take advantage of these helper functions 
would do something like

#ifdef i386
#define	bus_space_read_8	bus_space_read_8_lower_first
#endif
otherwise, using bus_space_read_8 won't compile for i386 builds.
If these implementations won't work for their case, they are free to 
write their own implementation or take whatever action is necessary.
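A sketch of what the two helpers Carl proposes might look like, with the bus_space tag/handle machinery stubbed out as a plain read callback so only the word order is on display. The names bus_space_read_8_lower_first/_upper_first are his suggestion, not existing API, and read4_fn stands in for bus_space_read_4().

```c
#include <stdint.h>

/* Stand-in for bus_space_read_4() plus its tag and handle. */
typedef uint32_t (*read4_fn)(unsigned off);

/* Low half lives at off, high half at off + 4. */
static uint64_t
read_8_lower_first(read4_fn read4, unsigned off)
{
	uint64_t lo;

	lo = read4(off);
	return (lo | ((uint64_t)read4(off + 4) << 32));
}

/* High half lives at off, low half at off + 4. */
static uint64_t
read_8_upper_first(read4_fn read4, unsigned off)
{
	uint64_t hi;

	hi = read4(off);
	return ((hi << 32) | (uint64_t)read4(off + 4));
}

/* Example "device": 0x11111111 at offset 0, 0x22222222 at offset 4. */
static uint32_t
sample_read4(unsigned off)
{
	return (off == 0 ? 0x11111111u : 0x22222222u);
}
```

A driver that knows its device's layout would pick one helper (e.g. via the #ifdef shim above) and get the other behaviour by writing its own.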


I guess my question is, are these cases common enough that it is worth 
helping developers by providing functions that do the double read and 
shifts for them, or do we leave them to deal with it on their own, at the 
risk of some duplicated code?


Thanks,
Carl



Re: NFS server bottlenecks

2012-10-12 Thread Rick Macklem
I wrote:
> Oops, I didn't get the readahead option description
> quite right in the last post. The default readahead
> is 1, which does result in rsize * 2, since there is
> the read + 1 readahead.
>
> rsize * 16 would actually be for the option readahead=15
> and for readahead=16 the calculation would be rsize * 17.
>
> However, the example was otherwise ok, I think? rick
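In code form, the arithmetic Rick is correcting is just the following (a sketch; the function name is illustrative): a client issues the read itself plus `readahead` readaheads, so the data in flight is rsize * (readahead + 1).

```c
/*
 * Sketch of the readahead arithmetic: one read plus `readahead`
 * readaheads gives rsize * (readahead + 1) bytes in flight.
 */
static unsigned long
inflight_bytes(unsigned long rsize, unsigned readahead)
{
	return (rsize * (readahead + 1));
}
```

So the default readahead=1 gives rsize * 2, readahead=15 gives rsize * 16, and readahead=16 gives rsize * 17, matching the correction above.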

I've attached the patch drc3.patch (it assumes drc2.patch has already been
applied) that replaces the single mutex with one for each hash list
for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200.

These patches are also at:
  http://people.freebsd.org/~rmacklem/drc2.patch
  http://people.freebsd.org/~rmacklem/drc3.patch
in case the attachments don't get through.

rick
ps: I haven't tested drc3.patch a lot, but I think it's ok?

--- fs/nfsserver/nfs_nfsdcache.c.orig	2012-02-29 21:07:53.0 -0500
+++ fs/nfsserver/nfs_nfsdcache.c	2012-10-03 08:23:24.0 -0400
@@ -164,8 +164,19 @@ NFSCACHEMUTEX;
 int nfsrc_floodlevel = NFSRVCACHE_FLOODLEVEL, nfsrc_tcpsavedreplies = 0;
 #endif	/* !APPLEKEXT */
 
+SYSCTL_DECL(_vfs_nfsd);
+
+static int	nfsrc_tcphighwater = 0;
+SYSCTL_INT(_vfs_nfsd, OID_AUTO, tcphighwater, CTLFLAG_RW,
+    &nfsrc_tcphighwater, 0,
+    "High water mark for TCP cache entries");
+static int	nfsrc_udphighwater = NFSRVCACHE_UDPHIGHWATER;
+SYSCTL_INT(_vfs_nfsd, OID_AUTO, udphighwater, CTLFLAG_RW,
+    &nfsrc_udphighwater, 0,
+    "High water mark for UDP cache entries");
+
 static int nfsrc_tcpnonidempotent = 1;
-static int nfsrc_udphighwater = NFSRVCACHE_UDPHIGHWATER, nfsrc_udpcachesize = 0;
+static int nfsrc_udpcachesize = 0;
 static TAILQ_HEAD(, nfsrvcache) nfsrvudplru;
 static struct nfsrvhashhead nfsrvhashtbl[NFSRVCACHE_HASHSIZE],
 nfsrvudphashtbl[NFSRVCACHE_HASHSIZE];
@@ -781,8 +792,15 @@ nfsrc_trimcache(u_int64_t sockref, struc
 {
 	struct nfsrvcache *rp, *nextrp;
 	int i;
+	static time_t lasttrim = 0;
 
+	if (NFSD_MONOSEC == lasttrim &&
+	    nfsrc_tcpsavedreplies < nfsrc_tcphighwater &&
+	    nfsrc_udpcachesize < (nfsrc_udphighwater +
+	    nfsrc_udphighwater / 2))
+		return;
 	NFSLOCKCACHE();
+	lasttrim = NFSD_MONOSEC;
 	TAILQ_FOREACH_SAFE(rp, &nfsrvudplru, rc_lru, nextrp) {
 		if (!(rp->rc_flag & (RC_INPROG|RC_LOCKED|RC_WANTED)) &&
 		    rp->rc_refcnt == 0 &&
--- fs/nfsserver/nfs_nfsdcache.c.sav	2012-10-10 18:56:01.0 -0400
+++ fs/nfsserver/nfs_nfsdcache.c	2012-10-12 21:04:21.0 -0400
@@ -160,7 +160,8 @@ __FBSDID("$FreeBSD: head/sys/fs/nfsserve
 #include <fs/nfs/nfsport.h>
 
 extern struct nfsstats newnfsstats;
-NFSCACHEMUTEX;
+extern struct mtx nfsrc_tcpmtx[NFSRVCACHE_HASHSIZE];
+extern struct mtx nfsrc_udpmtx;
 int nfsrc_floodlevel = NFSRVCACHE_FLOODLEVEL, nfsrc_tcpsavedreplies = 0;
 #endif	/* !APPLEKEXT */
 
@@ -208,10 +209,11 @@ static int newnfsv2_procid[NFS_V3NPROCS]
 	NFSV2PROC_NOOP,
 };
 
+#define	nfsrc_hash(xid)	(((xid) + ((xid) >> 24)) % NFSRVCACHE_HASHSIZE)
 #define	NFSRCUDPHASH(xid) \
-	(nfsrvudphashtbl[((xid) + ((xid) >> 24)) % NFSRVCACHE_HASHSIZE])
+	(nfsrvudphashtbl[nfsrc_hash(xid)])
 #define	NFSRCHASH(xid) \
-	(nfsrvhashtbl[((xid) + ((xid) >> 24)) % NFSRVCACHE_HASHSIZE])
+	(nfsrvhashtbl[nfsrc_hash(xid)])
 #define	TRUE	1
 #define	FALSE	0
 #define	NFSRVCACHE_CHECKLEN	100
@@ -262,6 +264,18 @@ static int nfsrc_getlenandcksum(mbuf_t m
 static void nfsrc_marksametcpconn(u_int64_t);
 
 /*
+ * Return the correct mutex for this cache entry.
+ */
+static __inline struct mtx *
+nfsrc_cachemutex(struct nfsrvcache *rp)
+{
+
+	if ((rp->rc_flag & RC_UDP) != 0)
+		return (&nfsrc_udpmtx);
+	return (&nfsrc_tcpmtx[nfsrc_hash(rp->rc_xid)]);
+}
+
+/*
  * Initialize the server request cache list
  */
 APPLESTATIC void
@@ -336,10 +350,12 @@ nfsrc_getudp(struct nfsrv_descript *nd, 
 	struct sockaddr_in6 *saddr6;
 	struct nfsrvhashhead *hp;
 	int ret = 0;
+	struct mtx *mutex;
 
+	mutex = nfsrc_cachemutex(newrp);
 	hp = NFSRCUDPHASH(newrp->rc_xid);
 loop:
-	NFSLOCKCACHE();
+	mtx_lock(mutex);
 	LIST_FOREACH(rp, hp, rc_hash) {
 	    if (newrp->rc_xid == rp->rc_xid &&
 		newrp->rc_proc == rp->rc_proc &&
@@ -347,8 +363,8 @@ loop:
 		nfsaddr_match(NETFAMILY(rp), &rp->rc_haddr, nd->nd_nam)) {
 			if ((rp->rc_flag & RC_LOCKED) != 0) {
 				rp->rc_flag |= RC_WANTED;
-				(void)mtx_sleep(rp, NFSCACHEMUTEXPTR,
-				    (PZERO - 1) | PDROP, "nfsrc", 10 * hz);
+				(void)mtx_sleep(rp, mutex, (PZERO - 1) | PDROP,
+				    "nfsrc", 10 * hz);
 				goto loop;
 			}
 			if (rp->rc_flag == 0)
@@ -358,14 +374,14 @@ loop:
 			TAILQ_INSERT_TAIL(&nfsrvudplru, rp, rc_lru);
 			if (rp->rc_flag & RC_INPROG) {
 				newnfsstats.srvcache_inproghits++;
-				NFSUNLOCKCACHE();
+				mtx_unlock(mutex);
 				ret = RC_DROPIT;
 			} else if (rp->rc_flag & RC_REPSTATUS) {
 				/*
 				 * V2 only.
 				 */
 				newnfsstats.srvcache_nonidemdonehits++;
-

Re: NFS server bottlenecks

2012-10-12 Thread Garrett Wollman
On Fri, 12 Oct 2012 22:05:54 -0400 (EDT), Rick Macklem rmack...@uoguelph.ca 
said:

> I've attached the patch drc3.patch (it assumes drc2.patch has already been
> applied) that replaces the single mutex with one for each hash list
> for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200.

I haven't tested this at all, but I think putting all of the mutexes
in an array like that is likely to cause cache-line ping-ponging.  It
may be better to use a pool mutex, or to put the mutexes adjacent in
memory to the list heads that they protect.  (But I probably won't be
able to do the performance testing on any of these for a while.  I
have a server running the drc2 code but haven't gotten my users to
put a load on it yet.)
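One shape Garrett's second suggestion could take is pairing each hash-list head with the mutex that protects it and padding the pair to a cache line, so CPUs working different buckets never bounce the same line. This is a sketch only, not what drc3.patch does: pthread_mutex_t stands in for struct mtx, the list head is reduced to a pointer, and a 64-byte line size is assumed.

```c
#include <pthread.h>

/* Assumed cache-line size; the kernel would use CACHE_LINE_SIZE. */
#define	LINE_SIZE	64

/*
 * Each bucket carries its own lock, and the alignment pads the pair
 * out to a full cache line so adjacent buckets never share one.
 */
struct cache_bucket {
	pthread_mutex_t	mtx;		/* protects only this bucket */
	void		*list_head;	/* stand-in for the hash list head */
} __attribute__((__aligned__(LINE_SIZE)));
```

The alternative pool-mutex approach (mtx_pool(9) in FreeBSD) trades this layout control for not having to declare the array at all.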

-GAWollman