Re: IO Performance under VMware on LSI RAID controller

2013-09-20 Thread Guy Helmer
On Sep 19, 2013, at 11:25 AM, Guy Helmer guy.hel...@gmail.com wrote:

 Normally I build VMware ESXi servers with enterprise-class WD SATA drives, and 
 I/O performance in FreeBSD VMs on those servers is fine.
 Whenever I build a VMware ESXi server with a RAID controller, I/O performance 
 is awful in FreeBSD VMs. I've previously seen this effect under VMware ESXi 
 with 3ware 9690SA-8I and 9650 RAID controllers, and now I'm seeing similarly 
 poor performance with a Dell 6/iR controller.
 
 Any suggestions would be appreciated.
 
 Guy

(Replying to self due to hint received off-list)

I seem to remember FreeBSD device driver developers previously mentioning 
controllers that don't deal well with large I/O requests. It turns out that may 
be the case with VMware device drivers as well -- reducing the VMware 
Disk.DiskMaxIOSize value from its huge default of 32767KB to 32KB seems to have 
helped. Disk ops/sec in the FreeBSD VM now peak at over 400/sec.
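
For reference, Disk.DiskMaxIOSize lives under Configuration > Advanced
Settings > Disk in the vSphere Client; on esxcli-capable hosts (ESXi 5.x,
which I'm assuming here) something like this should set it from the shell:
---
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 32
---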

Guy




Re: About Transparent Superpages and Non-transparent superpages

2013-09-20 Thread Sebastian Kuzminsky
On Sep 19, 2013, at 22:06, Patrick Dung wrote:

 We at Line Rate (now F5) are developing support for 1 Gig superpages on 
 amd64.  We're basing our work on 9.1.0 for now.
 
 An early preview is available here:
 
 https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2
 
 That is cool.
 
 What type of applications can take advantage of the 1 GB page size?
 And is it transparent, or do applications need to be modified?

It's transparent for the kernel: all UMA and kmem_malloc()/kmem_free() 
allocations are backed by 1 GB superpages.

It's not transparent for userspace: applications need to pass a new flag to 
mmap() to get 1 GB pages.

This is useful in applications with high memory pressure, where memory 
bandwidth and TLB misses are a limiting factor.
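
From userspace it would look roughly like this (a sketch only:
MAP_1GIG_SUPERPAGE below is a hypothetical placeholder, since the real
flag name isn't given in this thread):
---
#include <sys/mman.h>
#include <stddef.h>

/* MAP_1GIG_SUPERPAGE stands in for the new mmap() flag; request one
 * gigabyte of anonymous memory backed by a single 1 GB superpage. */
void *
alloc_1gig_page(void)
{
	return mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_1GIG_SUPERPAGE, -1, 0);
}
---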


-- 
Sebastian Kuzminsky



Re: Network stack changes

2013-09-20 Thread George Neville-Neil

On Sep 19, 2013, at 16:08, Luigi Rizzo ri...@iet.unipi.it wrote:

 On Thu, Sep 19, 2013 at 03:54:34PM -0400, George Neville-Neil wrote:
 
 On Sep 14, 2013, at 15:24, Luigi Rizzo ri...@iet.unipi.it wrote:
 
 
 
 On Saturday, September 14, 2013, Olivier Cochard-Labbé oliv...@cochard.me 
 wrote:
 On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo ri...@iet.unipi.it wrote:
 
 IXIA? For the timescales we need to address we don't need an IXIA;
 a netmap sender is more than enough.
 
 
 The great netmap generates only one IP flow (same src/dst IP and same
 src/dst port).
 
 True, the sample app generates only one flow, but it is trivial to modify it 
 to generate multiple flows (see the sketch below). My point was, we have the 
 ability to generate high-rate traffic, as long as we can tolerate 0.1-1 us of 
 jitter. Beyond that, you do need some IXIA-like solution.
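 
 For illustration, here is a minimal sketch of the kind of change meant
 here (not actual netmap pkt-gen code; offsets assume a plain
 Ethernet/IPv4/UDP frame):
 ---
 #include <arpa/inet.h>	/* htons() */
 #include <stdint.h>
 #include <string.h>
 
 #define UDP_SPORT_OFF	(14 + 20)	/* Ethernet + IPv4 headers */
 #define UDP_CSUM_OFF	(14 + 20 + 6)
 
 /* Rewrite the UDP source port of a prebuilt template frame so that
  * each transmitted packet belongs to a different flow. */
 static void
 set_flow(uint8_t *frame, uint16_t flow_id)
 {
 	uint16_t sport = htons(10000 + flow_id % 50000);
 	memcpy(frame + UDP_SPORT_OFF, &sport, sizeof(sport));
 	/* The UDP checksum is optional over IPv4; zero it rather than
 	 * recomputing it for every packet. */
 	memset(frame + UDP_CSUM_OFF, 0, 2);
 }
 ---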
 
 
 On the bandwidth side, can a modern sender with netmap really do a full 10G? 
  I hate the cost of an
 IXIA but I have not been able to destroy our stack as effectively with 
 anything else.
 
 yes george, you can download the picobsd image
 
 http://info.iet.unipi.it/~luigi/netmap/20120618-netmap-picobsd-head-amd64.bin
 
 and try for yourself.
 
 Granted this does not have all the knobs of an ixia but it can
 surely blast the full 14.88 Mpps to the link, and it only takes a
 bit of userspace programming to generate reasonably arbitrary streams
 of packets. A netmap sender/receiver is not CPU bound even with 1 core.
 

Interesting.  It's on my todo.

Best,
George







Re: About Transparent Superpages and Non-transparent superpages

2013-09-20 Thread Cedric Blancher
On 19 September 2013 22:34, Sebastian Kuzminsky s.kuzmin...@f5.com wrote:
 On Sep 18, 2013, at 10:08 , Patrick Dung wrote:

 I have seen somewhere that superpages support was being developed in HEAD 
 too.
 Any insight on it?


 We at Line Rate (now F5) are developing support for 1 Gig superpages on 
 amd64.  We're basing our work on 9.1.0 for now.

 An early preview is available here:

 https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2

Have you ever asked Roland Mainz roland.ma...@nrubsig.org to look at
your work? He worked on MPSS (multiple page size support) on Solaris
and did a lot of the optimisation work there.

Ced
-- 
Cedric Blancher cedric.blanc...@gmail.com
Institute Pasteur


Re: About Transparent Superpages and Non-transparent superpages

2013-09-20 Thread Cedric Blancher
On 20 September 2013 17:20, Sebastian Kuzminsky s.kuzmin...@f5.com wrote:
 On Sep 19, 2013, at 22:06, Patrick Dung wrote:

 We at Line Rate (now F5) are developing support for 1 Gig superpages on 
 amd64.  We're basing our work on 9.1.0 for now.
 
 An early preview is available here:
 
 https://github.com/Seb-LineRate/freebsd/tree/freebsd-9.1.0-1gig-pages-NOT-READY-2

 That is cool.

 What type of applications can take advantage of the 1 GB page size?
 And is it transparent, or do applications need to be modified?

 It's transparent for the kernel: all UMA and kmem_malloc()/kmem_free() 
 allocations are backed by 1 GB superpages.

 It's not transparent for userspace: applications need to pass a new flag to 
 mmap() to get 1 GB pages.

That may be the wrong approach. What happens if x86 gets more
huge/large-page sizes, as SPARC has? (Hint: sign an NDA with Intel and
AMD and get surprised -- and then allocate 16 more flag bits for mmap()
if you wish to stick with your approach.) For example, SPARC64 does 8k,
64k, 512k, 4M, 32M, 256M, 2GB and 256GB pages (the actual page sizes
differ from one MMU implementation to another, and can be probed via
pagesize -a).

A much better option would be to follow the Solaris model, which has
APIs to enumerate the available page sizes and then to set a size for
the heap, the stack, or a given address range (the last of these is how
large pages get applied to file I/O via mmap()).
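
As a concrete illustration of the enumeration side, a minimal sketch
using getpagesizes(3C), the Solaris call that backs pagesize -a:
---
#include <sys/mman.h>	/* getpagesizes() */
#include <stdio.h>

int main(void)
{
	size_t sizes[32];
	/* getpagesizes() returns the number of page sizes the MMU supports. */
	int n = getpagesizes(sizes, 32);
	for (int i = 0; i < n; i++)
		printf("%zu\n", sizes[i]);
	return 0;
}
---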

For example, ksh93 uses this API to request 64k pages for the stack (this
mainly aims at SPARC, where 64k stack pages can be a real performance
booster if you shuffle a lot of strings on the stack):
---
#include <sys/types.h>
#include <sys/mman.h>	/* memcntl(), MC_HAT_ADVISE, MHA_MAPSIZE_STACK */
#include <shell.h>	/* sh_main(), Shinit_f */

int main(int argc, char *argv[])
{
#if _lib_memcntl
	/* advise a larger page size for the stack */
	struct memcntl_mha mha;
	mha.mha_cmd = MHA_MAPSIZE_STACK;
	mha.mha_flags = 0;
	mha.mha_pagesize = 64 * 1024;
	(void)memcntl(NULL, 0, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0);
#endif
	return(sh_main(argc, argv, (Shinit_f)0));
}
---

Below is the memcntl(2) manpage describing the API:
---



System Calls   memcntl(2)



NAME
 memcntl - memory management control

SYNOPSIS
 #include <sys/types.h>
 #include <sys/mman.h>

 int memcntl(caddr_t addr, size_t len, int cmd, caddr_t arg,
      int attr, int mask);


DESCRIPTION
 The memcntl() function allows the calling process to apply a
 variety of control operations over the address space identi-
 fied by the  mappings  established  for  the  address  range
 [addr, addr + len).


 The addr argument must be a  multiple  of  the  pagesize  as
 returned by sysconf(3C). The scope of the control operations
 can be further defined with  additional  selection  criteria
 (in  the  form  of  attributes) according to the bit pattern
 contained in attr.


 The following attributes specify page mapping selection cri-
 teria:

 SHARED     Page is mapped shared.


 PRIVATE    Page is mapped private.



 The following attributes specify page  protection  selection
 criteria.  The  selection criteria are constructed by a bit-
 wise OR operation on  the  attribute  bits  and  must  match
 exactly.

 PROT_READ     Page can be read.


 PROT_WRITE    Page can be written.


 PROT_EXEC     Page can be executed.



 The following criteria may also be specified:


 PROC_TEXT    Process text.


 PROC_DATA    Process data.



 The PROC_TEXT attribute specifies all privately mapped  seg-
 ments  with  read  and execute permission, and the PROC_DATA
 attribute specifies all privately mapped segments with write
 permission.


 Selection criteria can be used to describe various  abstract
 memory objects within the address space on which to operate.
 If an operation shall not be constrained  by  the  selection
 criteria, attr must have the value 0.


 The operation to be performed is identified by the  argument
 cmd.  The  symbolic  names for the operations are defined in
 <sys/mman.h> as follows:

 MC_LOCK

 Lock in memory all pages in the  range  with  attributes
 attr.  A given page may be locked multiple times through
 different mappings; however,  within  a  given  mapping,
 page  locks do not nest. Multiple lock operations on the
 same address in the same process  will  all  be  removed
 with  a  single  unlock  operation. A page locked in one
 process and mapped in another (or visible through a dif-
 ferent  mapping  in  the  locking  process) is locked in
 memory as long as the locking process  does  neither  an
 implicit nor explicit unlock operation. If a