Further mbuf adjustments and changes
reachable with intrusive changes, like removing the second argument from m_ext. CSUM flags: The current CSUM flags are a bit chaotic and rather poorly document, especially that their use on the outbound (down the stack) and inbound (up the stack) use is rather different. Especially the latter are handled partially incorrect in almost all drivers. To bring clarity into this mess the CSUM flags are named and arranged more appropriately with compatibility mappings. The drivers then can be corrected one by one as the work progresses in the new 11-HEAD and MFCd without issue to then 10-stable. The l[3-5]hlen fields provide the means to remove all packet header parsing from the drivers for offload setup. Others: Mbuf initialization is unified through m_init() and m_pkthdr_init() to avoid duplication. m_free_fast() is removed for lack of usage. Patch is available here: http://people.freebsd.org/~andre/mbuf-adjustments-20130821.diff This work is sponsored by the FreeBSD Foundation. -- Andre Index: sys/mbuf.h === --- sys/mbuf.h (revision 254596) +++ sys/mbuf.h (working copy) @@ -67,8 +67,10 @@ * type: * * mtod(m, t) -- Convert mbuf pointer to data pointer of correct type. + * mtodo(m, o) -- Same as above but with offset 'o' into data. */ #definemtod(m, t) ((t)((m)-m_data)) +#definemtodo(m, o) ((void *)(((m)-m_data) + (o))) /* * Argument structure passed to UMA routines during mbuf and packet @@ -80,74 +82,98 @@ }; #endif /* _KERNEL */ -#if defined(__LP64__) -#define M_HDR_PAD6 -#else -#define M_HDR_PAD2 -#endif - /* * Header present at the beginning of every mbuf. + * Size ILP32: 20 + * LP64: 32 */ struct m_hdr { struct mbuf *mh_next; /* next buffer in chain */ struct mbuf *mh_nextpkt;/* next chain in queue/record */ caddr_t mh_data; /* location of data */ - int mh_len;/* amount of data in this mbuf */ - int mh_flags; /* flags; see below */ - shortmh_type; /* type of data in this mbuf */ - uint8_t pad[M_HDR_PAD];/* word align */ + int32_t mh_len;/* amount of data in this mbuf */ + uint32_t mh_type:8, /* type of data in this mbuf */ +mh_flags:24; /* flags; see below */ }; /* * Packet tag structure (see below for details). + * Size ILP32: 16 + * LP64: 24 */ struct m_tag { SLIST_ENTRY(m_tag) m_tag_link; /* List of packet tags */ - u_int16_t m_tag_id; /* Tag ID */ - u_int16_t m_tag_len; /* Length of data */ - u_int32_t m_tag_cookie; /* ABI/Module ID */ + uint16_tm_tag_id; /* Tag ID */ + uint16_tm_tag_len; /* Length of data */ + uint32_tm_tag_cookie; /* ABI/Module ID */ void(*m_tag_free)(struct m_tag *); }; /* * Record/packet header in first mbuf of chain; valid only if M_PKTHDR is set. + * Size ILP32: 48 + * LP64: 56 */ struct pkthdr { struct ifnet*rcvif; /* rcv interface */ - /* variables for ip and tcp reassembly */ - void*header;/* pointer to packet header */ - int len; /* total packet length */ - uint32_t flowid;/* packet's 4-tuple system -* flow identifier -*/ - /* variables for hardware checksum */ - int csum_flags;/* flags regarding checksum */ - int csum_data; /* data field used by csum routines */ - u_int16_ttso_segsz; /* TSO segment size */ + SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */ + int32_t len; /* total packet length */ + + /* Layer crossing persistent information. */ + uint32_t flowid;/* packet's 4-tuple system */ + uint64_t csum_flags;/* checksum and offload features */ + + uint16_t fibnum;/* this packet should use this fib */ + uint8_t cosqos;/* class/quality of service */ + uint8_t rsstype; /* hash type */ + + uint8_t l2hlen;/* layer 2 header length */ + uint8_t l3hlen;/* layer 3 header length */ + uint8_t l4hlen;/* layer 4 header length */ + uint8_t l5hlen;/* layer 5 header length */ + union { - u_int16_t vt_vtag; /* Ethernet 802.1p+q vlan tag */ - u_int16_t vt_nrecs; /* # of IGMPv3 records in this chain */ - } PH_vt; - u_int16_tfibnum
Re: Further mbuf adjustments and changes
On 8/21/13 9:40 PM, Andre Oppermann wrote: [ actual text removed.. ] Patch is available here: http://people.freebsd.org/~andre/mbuf-adjustments-20130821.diff one small thing I noticed.. - u16 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR); + u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR); #if __FreeBSD_version = 80 if we are (in ixgbe) supporting different versions then your chang eneeds to be enclosed in a #if __FreeBSD_version = 1 section does anyone have a copy of jeff's old mbuf rewrite from a couple of years ago? Have you looked at it? is there overlap? I'm not sure that we have much of a gain any more by having mbufs with built in storage, except to complicate the code. we could just have small external storage types so tha tthe mbuf itself was always 'external'.. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [CFT] VMware vmxnet3 ethernet driver
Bezüglich Bryan Venteicher's Nachricht vom 05.08.2013 02:12 (localtime): Hi, I've ported the OpenBSD vmxnet3 ethernet driver to FreeBSD. I did a lot of cleanup, bug fixes, new features, etc (+2000 new lines) along the way so there is not much of a resemblance left. The driver is in good enough shape I'd like additional testers. A patch against -CURRENT is at [1]. Alternatively, the driver and a Makefile is at [2]; this should compile at least as far back as 9.1. I can look at 8-STABLE if there is interest. Obviously, besides reports of 'it works', I'm interested performance vs the emulated e1000, and (for those using it) the VMware tools vmxnet3 driver. Hopefully it is no worse :) Hello Bryan, thanks a lot for your hard work! It seems if_vmx doesn't support jumbo frames. If I set mtu 9000, I get »vmx0: cannot populate Rx queue 0«, I have no problems using jumbo frames with vmxnet3. I took a oldish host (4x2,8GHz Core2[LGA775]) with recent software: ESXi 5.1U1 and FreeBSD-9.2-RC2 Two guests are connected to one MTU9000 VMware Software Switch. Simple iperf (standard TCP) results: vmxnet3jumbo - vmxnet3jumbo 5.3Gbits/sec, load: 40-60%Sys 0.5-2%Intr vmxnet3 - vmxnet3 1.85 GBits/sec, load: 60-80%Sys 0-0.8%Intr if_vmx - if_vmx 1.51 GBits/sec, load: 10-45%Sys 40-48%Intr !!! if_vmxjumbo - if_vmxjumbo not possible if_em(e1000) - if_em(e1000) 1.23 GBits/sec, load: 80-60%Sys 0.5-8%Intr if_em(e1000)jumbo - if_em(e1000)jumbo 2.27Gbits/sec, load: 40-30%Sys 0.5-5%Intr if_igb(e1000e)junmbo - if_igb(e1000e)jumbo 5.03 Gbits/s, load: 70-60%Sys 0.5%Intr if_igb(e1000e) - if_igb(e1000e) 1.39 Gbits/s, load: 60-80%Sys 0.5%Intr f_igb(e1000e) - if_igb(e1000e), both hw.em.[rt]xd=4096 1.66 Gbits/s, load: 65-90%Sys 0.5%Intr if_igb(e1000e)junmbo - if_igb(e1000e)jumbo, both hw.em.[rt]xd=4096 4.81 Gbits/s, load: 65%Sys 0.5%Intr Conclusion: if_vmx performs well compared to the regular emulated nics and standard MTU, but it's behind tuned e1000e nic emulation and can't reach vmxnet3 performance with regular mtu. If one needs throughput, the missing jumbo frame support in if_vmx is a show stopper. e1000e is preferable over e1000, even if not officially choosable with FreeBSD-selection as guest (edit .vmx and alter ethernet0.virtualDev = e1000e, and dont forget to set hw.em.enable_msix=0 in loader.conf, although the driver e1000e attaches is if_igb!) Thanks, -Harry signature.asc Description: OpenPGP digital signature
How to best overload the fileops ?
I am working on linux epoll functionality for linuxlator. It implements epoll through kqueue, but there is the need to overload fo_close function in fileops to free some memory. There is no generic mechanism defined in the kernel sources allowing to do this, and it isn't desirable to do this in a hackish way. So I am suggesting this particular way, see code snippets below. This approach is inspired by how C++ classes are sub-classed, with C++ class being similar to kernel file descriptor type and C++ vtbl being similar to fileops. I am looking for an opinion(s) on these questions: * Is such code is acceptable for kernel? * Does it look too ugly? * Any suggestions on how to improve it? As the system develops, other places may require to do such overloading too, so this approach can be reused. Thank you, Yuri *** In sys/file.h add these macros (they define how overloading is done): #define FDCLASS_DEFINE(cls)\ struct fileops* fdcls_##cls##_fileops(void);\ struct fileops* fdcls_##cls##_fileops(void) {\ return (cls##ops);\ } #define FDCLASS_INHERIT(cls, cls_parent, cls_init_func)\ extern struct fileops* fdcls_##cls_parent##_fileops(void);\ static void cls##_fdcls_init(void *dummy __unused) {\ cls##ops = *fdcls_##cls_parent##_fileops();\ cls_init_func();\ }\ SYSINIT(cls##_fdcls, SI_SUB_PSEUDO, SI_ORDER_ANY, cls##_fdcls_init, NULL); *** In the end of kern/kern_event.c add the line exposing kqueue's fileops: FDCLASS_DEFINE(kqueue) *** In linux_epoll.c add code that would initialize epoll fileops with the base class fileops: /* overload kqueue fileops */ static struct fileops epollops; static struct fileops epollops_base; static void epoll_init(void) { /* overload only fo_close operation */ epollops_base = epollops; epollops.fo_close = epoll_close; } FDCLASS_INHERIT(epoll, kqueue, epoll_init) ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Problems with iconv in base and static linking
Hi, While packaging my just-rebuilt ports today, I noticed a strange message occurring during the package creation stage: $ sudo make -C /usr/ports/ports-mgmt/pkg repackage === Building package for pkg-1.1.4_1 Creating package for pkg-1.1.4_1 Service unavailable$ In fact, *every* make package/repackage produces the Service unavailable message. The message is actually produced by the pkg(8) command, which is run as follows: /usr/local/sbin/pkg-static create -o /usr/ports/packages pkg-1.1.4_1 Now comes the interesting part: if you use /usr/local/sbin/pkg instead, the Service unavailable message does *not* appear. It turns out this is because pkg(8) uses libarchive, which is now compiled with iconv support from base by default. But the iconv in base does *not* work properly in statically linked executables. For example, take this small program: #include err.h #include iconv.h int main(void) { iconv_t ic = iconv_open(UTF-8, ISO-8859-1); if (ic == (iconv_t)-1) err(1, iconv_open failed); iconv_close(ic); return 0; } If you compile and link this statically, it will produce: $ cc -static iconv-test.c -o iconv-test-static $ ./iconv-test-static iconv-test-static: iconv_open failed: Invalid argument Service unavailable$ The reason for the message is that libc's iconv tries to dlopen(3) a dynamic library in /usr/lib/i18n, which does not work in static executables. As a quick fix for pkg(8), we could build the static version of libarchive without -DHAVE_ICONV and friends. This also helps other consumers of libarchive that link statically. Of course, there may be other consumers of libc's iconv that might want to link statically, so it should really be fixed there instead. For example, by not doing the dlopen, and failing gracefully. Or maybe by actually linking in (a subset of) the /usr/lib/i18n libraries. -Dimitry ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Question about socket timeouts
On Monday, August 19, 2013 11:13:02 pm Daniel Eischen wrote: On Mon, 19 Aug 2013, Adrian Chadd wrote: Yes! Please file a PR! This sorta implies that both are acceptable (although, the Linux behavior seems more desirable). http://austingroupbugs.net/view.php?id=369 No, that says round up, so it does mean that the requested timeout should be the minimum amount slept. tvtohz() does this. Really odd that the socket code is using its own version of this rather than tvtohz(). Oh, I bet this just predates tvtohz(). Interesting that it keeps getting bug fixes in its history that simply using tvtohz() would have solved. Try this: Index: uipc_socket.c === --- uipc_socket.c (revision 254570) +++ uipc_socket.c (working copy) @@ -2699,21 +2699,16 @@ sosetopt(struct socket *so, struct sockopt *sopt) if (error) goto bad; - /* assert(hz 0); */ if (tv.tv_sec 0 || tv.tv_sec INT_MAX / hz || tv.tv_usec 0 || tv.tv_usec = 100) { error = EDOM; goto bad; } - /* assert(tick 0); */ - /* assert(ULONG_MAX - INT_MAX = 100); */ - val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick; - if (val INT_MAX) { + val = tvtohz(tv); + if (val == INT_MAX) { error = EDOM; goto bad; } - if (val == 0 tv.tv_usec != 0) - val = 1; switch (sopt-sopt_name) { case SO_SNDTIMEO: -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
[patch] Addition of 'futimensat' call allowing to set file time with nanosecond precision
Please check in this patch: http://www.freebsd.org/cgi/query-pr.cgi?pr=181459 Thank you, Yuri ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
Yuri wrote this message on Wed, Aug 21, 2013 at 11:37 -0700: I am working on linux epoll functionality for linuxlator. It implements epoll through kqueue, but there is the need to overload fo_close function in fileops to free some memory. How did this memory get allocated in the first place? Why does it need to be free'd in fo_close and not another location? There is no generic mechanism defined in the kernel sources allowing to do this, and it isn't desirable to do this in a hackish way. So I am suggesting this particular way, see code snippets below. This approach is inspired by how C++ classes are sub-classed, with C++ class being similar to kernel file descriptor type and C++ vtbl being similar to fileops. I am looking for an opinion(s) on these questions: * Is such code is acceptable for kernel? * Does it look too ugly? * Any suggestions on how to improve it? As the system develops, other places may require to do such overloading too, so this approach can be reused. Thank you, Yuri *** In sys/file.h add these macros (they define how overloading is done): #define FDCLASS_DEFINE(cls)\ struct fileops* fdcls_##cls##_fileops(void);\ struct fileops* fdcls_##cls##_fileops(void) {\ return (cls##ops);\ } #define FDCLASS_INHERIT(cls, cls_parent, cls_init_func)\ extern struct fileops* fdcls_##cls_parent##_fileops(void);\ static void cls##_fdcls_init(void *dummy __unused) {\ cls##ops = *fdcls_##cls_parent##_fileops();\ cls_init_func();\ }\ SYSINIT(cls##_fdcls, SI_SUB_PSEUDO, SI_ORDER_ANY, cls##_fdcls_init, NULL); *** In the end of kern/kern_event.c add the line exposing kqueue's fileops: FDCLASS_DEFINE(kqueue) *** In linux_epoll.c add code that would initialize epoll fileops with the base class fileops: /* overload kqueue fileops */ static struct fileops epollops; static struct fileops epollops_base; static void epoll_init(void) { /* overload only fo_close operation */ epollops_base = epollops; epollops.fo_close = epoll_close; } FDCLASS_INHERIT(epoll, kqueue, epoll_init) ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
On 08/21/2013 16:21, John-Mark Gurney wrote: How did this memory get allocated in the first place? Why does it need to be free'd in fo_close and not another location? It is allocated by the epoll module right after kqueue object is created and is attached to the opaque field in the file object. And it should be deallocated when this fd is closed, hence fo_close. Yuri ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
On Wed, Aug 21, 2013 at 04:53:06PM -0700, Yuri wrote: On 08/21/2013 16:21, John-Mark Gurney wrote: How did this memory get allocated in the first place? Why does it need to be free'd in fo_close and not another location? It is allocated by the epoll module right after kqueue object is created and is attached to the opaque field in the file object. And it should be deallocated when this fd is closed, hence fo_close. Short answer is provide epollops with your own fo_close and the rest as it is currently in kqueueops. All function are static, but this is not a real problem since you have to modify kern_event.c anyway. I don't know how your code looks like in general, so in case its not clear, simply wrapping sys_kqueue is inherently racy (some other thread may close the fd or even reuse it for something else by the time you try to do anything with it), thus modification of current code is unavoidable. -- Mateusz Guzik mjguzik gmail.com ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
On 08/21/2013 17:10, Mateusz Guzik wrote: Short answer is provide epollops with your own fo_close and the rest as it is currently in kqueueops. All function are static, but this is not a real problem since you have to modify kern_event.c anyway. This is exactly what this code I am asking about is doing. kqueueops functions are all static. This modification allows to export fileops to child modules. Since there is nothing similar in the kernel code, I am asking does this way look ugly or not. I don't know how your code looks like in general, so in case its not clear, simply wrapping sys_kqueue is inherently racy (some other thread may close the fd or even reuse it for something else by the time you try to do anything with it), thus modification of current code is unavoidable. No, sys_kqueue calling code is all protected by the lock on this file object. So nobody can close or reuse it. Yuri ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
On Wed, Aug 21, 2013 at 05:30:05PM -0700, Yuri wrote: On 08/21/2013 17:10, Mateusz Guzik wrote: Short answer is provide epollops with your own fo_close and the rest as it is currently in kqueueops. All function are static, but this is not a real problem since you have to modify kern_event.c anyway. This is exactly what this code I am asking about is doing. I somehow missed your first mail and reponsed only to your reply to jmg, sorry! kqueueops functions are all static. This modification allows to export fileops to child modules. Since there is nothing similar in the kernel code, I am asking does this way look ugly or not. I don't think there is a need to provide anything like this right now, nor I have any good idea how to implement it. I don't know how your code looks like in general, so in case its not clear, simply wrapping sys_kqueue is inherently racy (some other thread may close the fd or even reuse it for something else by the time you try to do anything with it), thus modification of current code is unavoidable. No, sys_kqueue calling code is all protected by the lock on this file object. So nobody can close or reuse it. I don't follow. sys_kqueue creates fp on its own, before that there is nothing to lock in the first place. By the time it returns, created fp can be long gone because some other thread closed it. FreeBSD idiom of dealing with that is creating fp with 2 refs and dropping one when initialization is finished. That way close() from userland does not kill fp until the code creating the object finishes. Just my $0,03. -- Mateusz Guzik mjguzik gmail.com ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: How to best overload the fileops ?
I don't think there is a need to provide anything like this right now, nor I have any good idea how to implement it. This kinda leave it hanging in the same state. To do this kqueue fileops need to be exposed. It is always possible to create something like struct fileops* kqueue_fileops() and that would do it. I just tried to make such exposure as nice as I could, using some accepted paradigms (overloading, etc) and macros that look like some IDE might create. Another approach is to read fileops from file after the first call to sys_kqueue, but I dislike this because this would require an additional lock, also this would make the first call to epoll_create different from the others, which it shouldn't be. Also this would look much more like a hack. What is wrong with the suggested approach anyway? No, sys_kqueue calling code is all protected by the lock on this file object. So nobody can close or reuse it. I don't follow. sys_kqueue creates fp on its own, before that there is nothing to lock in the first place. By the time it returns, created fp can be long gone because some other thread closed it. I added the method kqueue_locked that leaves the the lock release to the calling routine (another kernel module). This way both epoll_create and sys_kqueue run under one atomic lock. Yuri ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
why does buildkernel set COMPILER_TYPE?
I've noticed that if you do a: make buildworld WITHOUT_CLANG_IS_CC=YES and then do a: make buildkernel (w/o the WITHOUT_CLANG_IS_CC=YES option) that it fails... Apparently instead of letting buildkernel figure out which compiler it will use, the src/Makefile.inc1 forces COMPILER_TYPE to be what the options specified instead of using what bsd.compiler.mk figures out... Can w/ please fix this? This isn't the first time I've run into this, and it's quite anoying. This simple patch fixes it: ndex: Makefile.inc1 === --- Makefile.inc1 (revision 254546) +++ Makefile.inc1 (working copy) @@ -428,7 +428,7 @@ .endif # kernel stage -KMAKEENV= ${WMAKEENV} +KMAKEENV= ${WMAKEENV:NCOMPILER_TYPE=*} KMAKE= ${KMAKEENV} ${MAKE} ${.MAKEFLAGS} ${KERNEL_FLAGS} KERNEL=${INSTKERNNAME} # This just removes setting COMPILER_TYPE for the kernel target and lets the magic in bsd.compiler.mk do it's thing... Comments? -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org