Re: IO Performance under VMware on LSI RAID controller
On Sep 19, 2013, at 11:25 AM, Guy Helmer guy.hel...@gmail.com wrote:

> Normally I build VMware ESXi servers with enterprise-class WD SATA drives, and I/O performance in FreeBSD VMs on those servers is fine. Whenever I build a VMware ESXi server with a RAID controller, I/O performance in FreeBSD VMs is awful. I've previously seen this effect under VMware ESXi with 3ware 9690SA-8I and 9650 RAID controllers, and now I'm seeing similar performance with a Dell 6/iR controller. Any suggestions would be appreciated. Guy

(Replying to self due to a hint received off-list.)

I seem to remember FreeBSD device driver developers mentioning controllers that don't deal well with large I/O requests. It turns out that may be the case with VMware device drivers as well -- reducing the VMware Disk.DiskMaxIOSize value from its huge default of 32767 KB to 32 KB seems to have helped. Disk ops/sec in the FreeBSD VM now peak over 400/sec.

Guy
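For reference, the same knob can also be changed from the ESXi shell rather than the vSphere client. A sketch, assuming a reasonably recent esxcli; the option path and syntax may differ between ESXi releases, so verify against your host:

```shell
# Show the current maximum I/O request size the host will issue (in KB)
esxcli system settings advanced list -o /Disk/DiskMaxIOSize

# Lower it to 32 KB so large guest I/Os are split into smaller requests
# before they reach a RAID controller that handles big transfers poorly
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 32
```

The change takes effect for new I/O without a reboot; raising it back to the default undoes it.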
Re: About Transparent Superpages and Non-transparent superpages
On Sep 19, 2013, at 22:06, Patrick Dung wrote:

>> We at Line Rate (now F5) are developing support for 1 gig superpages on amd64. We're basing our work on 9.1.0 for now. An early preview is available here: https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2
>
> That is cool. What type of applications can take advantage of the 1 GB page size? And is it transparent, or do applications need to be modified?

It's transparent for the kernel: all of UMA and kmem_malloc()/kmem_free() is backed by 1 gig superpages. It's not transparent for userspace: applications need to pass a new flag to mmap() to get 1 gig pages. This is useful in applications with high memory pressure, where memory bandwidth and TLB misses are a limiting factor.

--
Sebastian Kuzminsky

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Network stack changes
On Sep 19, 2013, at 16:08, Luigi Rizzo ri...@iet.unipi.it wrote:

> On Thu, Sep 19, 2013 at 03:54:34PM -0400, George Neville-Neil wrote:
>> On Sep 14, 2013, at 15:24, Luigi Rizzo ri...@iet.unipi.it wrote:
>>> On Saturday, September 14, 2013, Olivier Cochard-Labbé oliv...@cochard.me wrote:
>>>> On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo ri...@iet.unipi.it wrote:
>>>>> IXIA? For the timescales we need to address we don't need an IXIA; a netmap sender is more than enough.
>>>> The great netmap generates only one IP flow (same src/dst IP and same src/dst port).
>>> True, the sample app generates only one flow, but it is trivial to modify it to generate multiple flows. My point was: we have the ability to generate high-rate traffic, as long as we tolerate 0.1-1 us of jitter. Beyond that, you do need some IXIA-like solution.
>> On the bandwidth side, can a modern sender with netmap really do a full 10G? I hate the cost of an IXIA, but I have not been able to destroy our stack as effectively with anything else.
> Yes, George, you can download the picobsd image
> http://info.iet.unipi.it/~luigi/netmap/20120618-netmap-picobsd-head-amd64.bin
> and try for yourself. Granted, this does not have all the knobs of an IXIA, but it can surely blast the full 14.88 Mpps to the link, and it only takes a bit of userspace programming to generate reasonably arbitrary streams of packets. A netmap sender/receiver is not CPU bound even with 1 core.

Interesting. It's on my todo list.

Best,
George
Re: About Transparent Superpages and Non-transparent superpages
[Repost -- the previous email was stuck because I used an old email address.]

On 21 September 2013 03:09, Cedric Blancher cedric.blanc...@gmail.com wrote:

On 20 September 2013 17:20, Sebastian Kuzminsky s.kuzmin...@f5.com wrote:

> On Sep 19, 2013, at 22:06, Patrick Dung wrote:
>>> We at Line Rate (now F5) are developing support for 1 gig superpages on amd64. We're basing our work on 9.1.0 for now. An early preview is available here: https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2
>> That is cool. What type of applications can take advantage of the 1 GB page size? And is it transparent, or do applications need to be modified?
> It's transparent for the kernel: all of UMA and kmem_malloc()/kmem_free() is backed by 1 gig superpages. It's not transparent for userspace: applications need to pass a new flag to mmap() to get 1 gig pages.

That may be the wrong approach. What happens if x86 gets more huge/large-page sizes, as SPARC has? (Hint: sign an NDA with Intel and AMD and prepare to be surprised -- and then allocate 16 more bits of mmap() flags if you wish to stick with your approach.) For example, SPARC64 does 8k, 64k, 512k, 4M, 32M, 256M, 2G, and 256G pages (the actual page sizes differ from one MMU implementation to another, and can be probed via pagesize -a). A much better option would be to follow the Solaris API, which can enumerate the available page sizes and then set a preferred size for the heap, the stack, or a given address range (the last is used to get large pages for file I/O via mmap()).
For example, ksh93 uses this API to request 64k pages for the stack (this mainly aims at SPARC, where 64k stack pages can be a real performance booster if you shuffle a lot of strings via the stack); note memcntl() takes the address of the memcntl_mha struct:
---
int main(int argc, char *argv[])
{
#if _lib_memcntl
        /* advise larger stack page size */
        struct memcntl_mha mha;
        mha.mha_cmd      = MHA_MAPSIZE_STACK;
        mha.mha_flags    = 0;
        mha.mha_pagesize = 64 * 1024;
        (void)memcntl(NULL, 0, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0);
#endif
        return(sh_main(argc, argv, (Shinit_f)0));
}
---
Below is an excerpt of the memcntl(2) manpage describing the API:
---
System Calls                                          memcntl(2)

NAME
     memcntl - memory management control

SYNOPSIS
     #include <sys/types.h>
     #include <sys/mman.h>

     int memcntl(caddr_t addr, size_t len, int cmd, caddr_t arg,
          int attr, int mask);

DESCRIPTION
     The memcntl() function allows the calling process to apply
     a variety of control operations over the address space
     identified by the mappings established for the address
     range [addr, addr + len).  The addr argument must be a
     multiple of the pagesize as returned by sysconf(3C).

     The scope of the control operations can be further defined
     with additional selection criteria (in the form of
     attributes) according to the bit pattern contained in attr.

     The following attributes specify page mapping selection
     criteria:

     SHARED         Page is mapped shared.
     PRIVATE        Page is mapped private.

     The following attributes specify page protection selection
     criteria.  The selection criteria are constructed by a
     bitwise OR operation on the attribute bits and must match
     exactly.

     PROT_READ      Page can be read.
     PROT_WRITE     Page can be written.
     PROT_EXEC      Page can be executed.

     The following criteria may also be specified:

     PROC_TEXT      Process text.
     PROC_DATA      Process data.

     The PROC_TEXT attribute specifies all privately mapped
     segments with read and execute permission, and the
     PROC_DATA attribute specifies all privately mapped segments
     with write permission.

     Selection criteria can be used to describe various abstract
     memory objects within the address space on which to
     operate.  If an operation shall not be constrained by the
     selection criteria, attr must have the value 0.

     The operation to be performed is identified by the argument
     cmd.  The symbolic names for the operations are defined in
     <sys/mman.h> as follows:

     MC_LOCK        Lock in memory all pages in the range with
                    attributes attr.  A given page may be locked
                    multiple times through different mappings;
                    however, within a given mapping, page locks
                    do not nest.  Multiple lock operations on
                    the same address in the same process will
                    all be removed with a single unlock
                    operation.  A page locked in one process and
                    mapped in another (or visible through a
                    different mapping in the locking process) is
                    locked in memory as long as the locking
                    process does neither an implicit nor
                    explicit unlock operation.
---
Re: About Transparent Superpages and Non-transparent superpages
On 19 September 2013 22:34, Sebastian Kuzminsky s.kuzmin...@f5.com wrote:

> On Sep 18, 2013, at 10:08, Patrick Dung wrote:
>> I have seen somewhere that superpages support was being developed in HEAD too. Any insight on it?
> We at Line Rate (now F5) are developing support for 1 gig superpages on amd64. We're basing our work on 9.1.0 for now. An early preview is available here: https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2

Have you ever asked Roland Mainz roland.ma...@nrubsig.org to look at your work? He worked on MPSS (multiple page size support) on Solaris and did a lot of the optimisation work there.

Ced
--
Cedric Blancher cedric.blanc...@gmail.com
Institute Pasteur