how to get max # of mbufs in a packet from xn to the tcp stack?

2011-07-06 Thread Colin Percival
Hi all,

I've attached a patch which fixes the nfrags  18, netback won't be able to
handle it problem with xen netfront when TSO is enabled.

It's not finished, though:
+   int max_mbuf_chain_len = 16;/* XXX Set this based on interface? */

I'm not sure what the right way is to feed a value from the interface up into
tcp_output; can someone advise?

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Index: kern/uipc_mbuf.c
===
--- kern/uipc_mbuf.c	(revision 223824)
+++ kern/uipc_mbuf.c	(working copy)
@@ -525,12 +525,14 @@
  * only their reference counts are incremented.
  */
 struct mbuf *
-m_copym(struct mbuf *m, int off0, int len, int wait)
+m_copy_nbufs(struct mbuf *m, int off0, int len, int wait, long * outlen,
+int nbufmax)
 {
 	struct mbuf *n, **np;
 	int off = off0;
 	struct mbuf *top;
 	int copyhdr = 0;
+	int len_orig = len;
 
 	KASSERT(off = 0, (m_copym, negative off %d, off));
 	KASSERT(len = 0, (m_copym, negative len %d, len));
@@ -546,7 +548,7 @@
 	}
 	np = top;
 	top = 0;
-	while (len  0) {
+	while (len  0  nbufmax--  0) {
 		if (m == NULL) {
 			KASSERT(len == M_COPYALL, 
 			(m_copym, length  size of mbuf chain));
@@ -584,6 +586,9 @@
 	if (top == NULL)
 		mbstat.m_mcfail++;	/* XXX: No consistency. */
 
+	if (outlen)
+		*outlen = len_orig - len;
+
 	return (top);
 nospace:
 	m_freem(top);
@@ -591,6 +596,13 @@
 	return (NULL);
 }
 
+struct mbuf *
+m_copym(struct mbuf *m, int off0, int len, int wait)
+{
+
+	return (m_copy_nbufs(m, off0, len, wait, NULL, INT_MAX));
+}
+
 /*
  * Returns mbuf chain with new head for the prepending case.
  * Copies from mbuf (chain) n from off for len to mbuf (chain) m
Index: netinet/tcp_output.c
===
--- netinet/tcp_output.c	(revision 223824)
+++ netinet/tcp_output.c	(working copy)
@@ -183,6 +183,7 @@
 	int sack_rxmit, sack_bytes_rxmt;
 	struct sackhole *p;
 	int tso;
+	int max_mbuf_chain_len = 16;	/* XXX Set this based on interface? */
 	struct tcpopt to;
 #if 0
 	int maxburst = TCP_MAXBURST;
@@ -806,16 +807,6 @@
 		struct mbuf *mb;
 		u_int moff;
 
-		if ((tp-t_flags  TF_FORCEDATA)  len == 1)
-			TCPSTAT_INC(tcps_sndprobe);
-		else if (SEQ_LT(tp-snd_nxt, tp-snd_max) || sack_rxmit) {
-			tp-t_sndrexmitpack++;
-			TCPSTAT_INC(tcps_sndrexmitpack);
-			TCPSTAT_ADD(tcps_sndrexmitbyte, len);
-		} else {
-			TCPSTAT_INC(tcps_sndpack);
-			TCPSTAT_ADD(tcps_sndbyte, len);
-		}
 		MGETHDR(m, M_DONTWAIT, MT_DATA);
 		if (m == NULL) {
 			SOCKBUF_UNLOCK(so-so_snd);
@@ -847,7 +838,8 @@
 			mtod(m, caddr_t) + hdrlen);
 			m-m_len += len;
 		} else {
-			m-m_next = m_copy(mb, moff, (int)len);
+			m-m_next = m_copy_nbufs(mb, moff, len, M_DONTWAIT,
+			len, max_mbuf_chain_len);
 			if (m-m_next == NULL) {
 SOCKBUF_UNLOCK(so-so_snd);
 (void) m_free(m);
@@ -856,6 +848,18 @@
 			}
 		}
 
+		/* Update stats here as m_copy_nbufs may have adjusted len. */
+		if ((tp-t_flags  TF_FORCEDATA)  len == 1)
+			TCPSTAT_INC(tcps_sndprobe);
+		else if (SEQ_LT(tp-snd_nxt, tp-snd_max) || sack_rxmit) {
+			tp-t_sndrexmitpack++;
+			TCPSTAT_INC(tcps_sndrexmitpack);
+			TCPSTAT_ADD(tcps_sndrexmitbyte, len);
+		} else {
+			TCPSTAT_INC(tcps_sndpack);
+			TCPSTAT_ADD(tcps_sndbyte, len);
+		}
+
 		/*
 		 * If we're sending everything we've got, set PUSH.
 		 * (This will keep happy those implementations which only
Index: sys/mbuf.h
===
--- sys/mbuf.h	(revision 223824)
+++ sys/mbuf.h	(working copy)
@@ -849,6 +849,7 @@
 		int, int, int, int);
 struct mbuf	*m_copypacket(struct mbuf *, int);
 void		 m_copy_pkthdr(struct mbuf *, struct mbuf *);
+struct mbuf	*m_copy_nbufs(struct mbuf *, int, int, int, long *, int);
 struct mbuf	*m_copyup(struct mbuf *n, int len, int dstoff);
 struct mbuf	*m_defrag(struct mbuf *, int);
 void		 m_demote(struct mbuf *, int);
___
freebsd-xen@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to freebsd-xen-unsubscr...@freebsd.org


Re: breakage in blkfront with ring_pages 1

2011-07-06 Thread Justin T. Gibbs

On 7/5/11 10:14 PM, Colin Percival wrote:

 On 07/05/11 19:42, Colin Percival wrote:
 On 07/05/11 19:04, Justin T. Gibbs wrote:
 On 7/5/11 7:14 PM, Colin Percival wrote:
 Maybe the right option is to have a loader tunable dev.xn.linuxback to
 control which version of the protocol is used?

 What a mess.

 Yep. Mess or not, shall I go ahead with having a loader tunable 

control this,

 or can you think of a better solution?

 Does anyone object to the attached patch? It keeps the differing 

behaviour to
 a minimum -- we MUST set ring-ref with a FreeBSD blkback, and we MUST 

NOT set
 it with a linux blkback -- but otherwise errs in the direction of 

setting more

 variables than are needed, to maximize the possibility of a future blkback
 being compatible with both blkback_is_linux=0 and blkback_is_linux=1.


It would be better to just change the FreeBSD blkback driver to be
compatible with the RedHat convention.

I'm still unclear on why the current FreeBSD blkfront driver believes
that it can use more than one page in your configuration given that the
RedHat blkfront doesn't advertise this capability in a way that the FreeBSD
blkfront understands (max-ring-pages isn'te set by blkback).  Did you do
something to force blkfront to use more than one page?

--
Justin

___
freebsd-xen@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to freebsd-xen-unsubscr...@freebsd.org


Re: breakage in blkfront with ring_pages 1

2011-07-06 Thread Colin Percival
On 07/06/11 15:55, Justin T. Gibbs wrote:
 On 7/5/11 10:14 PM, Colin Percival wrote:
 On 07/05/11 19:42, Colin Percival wrote:
 Yep. Mess or not, shall I go ahead with having a loader tunable
 control this,
 or can you think of a better solution?

 Does anyone object to the attached patch? It keeps the differing
 behaviour to
 a minimum -- we MUST set ring-ref with a FreeBSD blkback, and we MUST
 NOT set
 it with a linux blkback -- but otherwise errs in the direction of
 setting more
 variables than are needed, to maximize the possibility of a future blkback
 being compatible with both blkback_is_linux=0 and blkback_is_linux=1.
 
 It would be better to just change the FreeBSD blkback driver to be
 compatible with the RedHat convention.

Fine with me, but that will of course break compatibility between pre- and
post- patch versions of FreeBSD.  Aside from you, how many people use the
FreeBSD blkback driver?

 I'm still unclear on why the current FreeBSD blkfront driver believes
 that it can use more than one page in your configuration given that the
 RedHat blkfront doesn't advertise this capability in a way that the FreeBSD
 blkfront understands (max-ring-pages isn'te set by blkback).  Did you do
 something to force blkfront to use more than one page?

I'm seeing max-ring-pages set to 4.  I don't know what tree EC2 is using on
their Dom0 -- I've heard rumours that there's a lot of RedHat going on behind
the scenes, but given Amazon's size it's entirely possible that they got a
blkback which hasn't been released publicly.

I'm sent an email asking where their blkback came from; I'll report back if/when
I know something.

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
___
freebsd-xen@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to freebsd-xen-unsubscr...@freebsd.org