Re: unnbound vs file descriptors

2014-12-16 Thread Bogdan Andu
Maybe just a naive question..
but did you sudo vipw
and set the unbound login class for the unbound user?
/Bogdan
 

 On Tuesday, December 16, 2014 9:46 AM, Otto Moerbeek o...@drijf.net wrote:
   

 Hi,

So I have started using unbound on a mailserver (running amd64 5.6-stable).

First observation is that it uses (too?) many file descriptors in the
default setup.

Dec 15 22:38:00 mx1 unbound: [8713:0] error: can't create socket: Too many open files
Dec 15 22:38:00 mx1 last message repeated 1366 times

$ unbound-checkconf -o outgoing-range
4000

But even after setting this to 1500 and having a login.conf entry:

unbound:\
        :openfiles-cur=2048:\
        :tc=daemon:

I am still seeing these log messages.

I'd like to make sure the settings out of the box are reasonable
(setting outgoing-range and maybe other options in the default config,
and/or having a default entry in login.conf), but so far unbound is
not cooperating. Any clue on what setting I should fiddle with?

    -Otto





   

Re: unnbound vs file descriptors

2014-12-16 Thread Otto Moerbeek
On Tue, Dec 16, 2014 at 09:04:52AM +0000, Bogdan Andu wrote:

 Maybe just a naive question..
 but did you sudo vipw
 and set the unbound login class for the unbound user?

That's not necessary anymore these days, I believe. The rc.d subsystem
takes care of setting the proper class, if available.  At least it
does not document that setting the login class in the pwd db is needed.

-Otto






Re: unnbound vs file descriptors

2014-12-16 Thread Antoine Jacoutot
  Maybe just a naive question..
  but did you sudo vipw
  and set the unbound login class for the unbound user?
 
 That's not necessary anymore these days, I believe. The rc.d subsystem
 takes care of setting the proper class, if available.  At least it

That's correct.

 does not document that setting the login class in the pwd db is needed.

Because it's not :-)

-- 
Antoine



Re: Binary code patching and paravirtualization

2014-12-16 Thread Mark Kettenis
 Date: Mon, 15 Dec 2014 23:21:35 +0100 (CET)
 From: Stefan Fritsch s...@sfritsch.de
 
 On Thu, 11 Dec 2014, Mark Kettenis wrote:
 
   From: Alexey Suslikov alexey.susli...@gmail.com
   Date: Thu, 11 Dec 2014 20:51:14 +0000 (UTC)
   
   Stefan Fritsch sf at sfritsch.de writes:
   
--- a/sys/arch/amd64/include/specialreg.h
+++ b/sys/arch/amd64/include/specialreg.h
@@ -158,6 +158,7 @@
 #define CPUIDECX_AVX    0x10000000  /* Advanced Vector Extensions */
 #define CPUIDECX_F16C   0x20000000  /* 16bit fp conversion */
 #define CPUIDECX_RDRAND 0x40000000  /* RDRAND instruction */
+#define CPUIDECX_HYPERV 0x80000000  /* Hypervisor present */
   
   Is this flag standardized? The last time I tried to push this, there
   was an objection based on the "reserved for future use" status of this flag.
   
   See http://marc.info/?l=openbsd-bugs&m=136907278229145&w=2
  
  Well, that thread started out with a questionable workaround for a
  hypervisor bug.  That may have influenced the debate about the
  flag a bit.
  
  You can be almost certain that Intel and AMD will not use that
  reserved bit for anything else.  The Linux KVM virtualization business
  is too important for them.  And if Microsoft Hyper-V or VMware ESX
  sets that bit as well, this becomes an absolute certainty.
 
 The Intel manual says "Not Used, Always returns 0", which is different from
 "Reserved", which is stated for other bits.
 
 FTR, jasper@ checked that VMware sets the bit while VirtualBox does not.
 So, many but not all hypervisors set it.
 
  I prefer the CPUIDECX_HV name used in the diff you posted in:
 
 OK?

ok kettenis@

 diff --git a/sys/arch/amd64/amd64/identcpu.c b/sys/arch/amd64/amd64/identcpu.c
 --- a/sys/arch/amd64/amd64/identcpu.c
 +++ b/sys/arch/amd64/amd64/identcpu.c
 @@ -129,6 +129,7 @@ const struct {
          { CPUIDECX_AVX,      "AVX" },
          { CPUIDECX_F16C,     "F16C" },
          { CPUIDECX_RDRAND,   "RDRAND" },
 +        { CPUIDECX_HV,       "HV" },
  }, cpu_ecpuid_ecxfeatures[] = {
          { CPUIDECX_LAHF,     "LAHF" },
          { CPUIDECX_CMPLEG,   "CMPLEG" },
 diff --git a/sys/arch/amd64/include/specialreg.h b/sys/arch/amd64/include/specialreg.h
 --- a/sys/arch/amd64/include/specialreg.h
 +++ b/sys/arch/amd64/include/specialreg.h
 @@ -158,6 +158,7 @@
  #define CPUIDECX_AVX    0x10000000  /* Advanced Vector Extensions */
  #define CPUIDECX_F16C   0x20000000  /* 16bit fp conversion */
  #define CPUIDECX_RDRAND 0x40000000  /* RDRAND instruction */
 +#define CPUIDECX_HV     0x80000000  /* Running on hypervisor */
  
  /*
   * Structured Extended Feature Flags Parameters (CPUID function 0x7, leaf 0)
 
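For readers following the diff: identcpu.c prints feature names by walking a table of (bit, name) pairs against the CPUID register value, which is why the patch only needs one new define and one new table entry. Below is a minimal userland sketch of that pattern, not the kernel code; the table is abridged to the four ECX bits in the hunk, and format_ecx_features() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CPUID leaf 1 ECX bits, values as in the specialreg.h hunk above. */
#define CPUIDECX_AVX	0x10000000u	/* Advanced Vector Extensions */
#define CPUIDECX_F16C	0x20000000u	/* 16bit fp conversion */
#define CPUIDECX_RDRAND	0x40000000u	/* RDRAND instruction */
#define CPUIDECX_HV	0x80000000u	/* Running on hypervisor */

/* Abridged bit-to-name table in the style of identcpu.c. */
static const struct {
	uint32_t bit;
	const char *str;
} ecx_features[] = {
	{ CPUIDECX_AVX,		"AVX" },
	{ CPUIDECX_F16C,	"F16C" },
	{ CPUIDECX_RDRAND,	"RDRAND" },
	{ CPUIDECX_HV,		"HV" },
};

/*
 * Hypothetical helper: write the names of the bits set in ecx to buf as
 * a comma-separated list and return how many names were written.  If buf
 * is too small, it keeps a truncated list and the count stops early.
 */
static int
format_ecx_features(uint32_t ecx, char *buf, size_t len)
{
	size_t i, off = 0;
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(ecx_features) / sizeof(ecx_features[0]); i++) {
		int r;

		if ((ecx & ecx_features[i].bit) == 0)
			continue;
		r = snprintf(buf + off, len - off, "%s%s",
		    n > 0 ? "," : "", ecx_features[i].str);
		if (r < 0 || (size_t)r >= len - off)
			break;
		off += (size_t)r;
		n++;
	}
	return n;
}
```

For example, an ECX value of 0x90000000 (the AVX and hypervisor bits) yields "AVX,HV". As Stefan notes, a clear HV bit does not prove bare metal, since not every hypervisor sets it.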



Re: Dell R630 high interrupts on acpi0

2014-12-16 Thread Hrvoje Popovski

On 16.12.2014. 6:16, Jonathan Matthew wrote:

On Sun, Dec 14, 2014 at 06:22:37PM +0100, Hrvoje Popovski wrote:

Hi all,

I have two new Dell R630s and have -current on them from Sun Dec
14 15:07:17. Installation went great and very fast.
The problem is that I see around 11k interrupts on acpi0. At first I
thought the problem was similar to this thread:
http://marc.info/?l=openbsd-misc&m=140551906923931&w=2

But whether the system profile in the Dell BIOS is set to Performance or
to DAPC, there are always interrupts on acpi0.
In the links below you can find acpidump and dmesg output for the Performance
and DAPC settings in the Dell BIOS.


We just got some r630s too, so I spent some time last week figuring out what's
going on here.  Something in the AML wants to talk to the intel MEI device.
Normally this works, but on the new generation of dell machines (we've seen it
on r630s and r730s), it's been moved outside the pci memory range we currently
allow on amd64.  You can see this in your dmesgs:

0:22:0: mem address conflict 0x3303000/0x10
0:22:1: mem address conflict 0x3302000/0x10

The interrupt will keep triggering until it manages to talk to the device,
which will never happen.

kettenis@ says we can get the pci memory range information we need to deal with
this from acpi.  Until that happens, expanding the allowed pci memory range
makes things work properly.
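A toy model of the failure mode, assuming the allowed PCI memory window is a single range starting at 0: a BAR located above the 36-bit limit (2^36 = 0x1000000000, 64 GiB) is rejected and the device is never mapped, while a wider window accepts it. bar_fits() and the 2^42 limit are illustrative assumptions, not taken from pci_machdep.c, which tracks the window with an extent map instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCIMEM_LIMIT_36BIT	(1ULL << 36)	/* 64 GiB, what PAE can address */
#define PCIMEM_LIMIT_42BIT	(1ULL << 42)	/* a wider, illustrative limit */

/*
 * Return true if the BAR [addr, addr + size) lies entirely below limit.
 * Compares against limit - addr so that addr + size cannot overflow.
 * A zero-sized BAR is treated as not fitting.
 */
static bool
bar_fits(uint64_t addr, uint64_t size, uint64_t limit)
{
	if (size == 0 || addr >= limit)
		return false;
	return size <= limit - addr;
}
```

With a 16-byte BAR at 0x2000000000 (2^37), bar_fits() is false for the 36-bit window and true for the 42-bit one, which mirrors why widening the allowed range lets the AML reach the device.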

ok?


Index: pci_machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/pci/pci_machdep.c,v
retrieving revision 1.59
diff -u -p -u -p -r1.59 pci_machdep.c
--- pci_machdep.c   19 Apr 2014 11:53:42 -0000  1.59
+++ pci_machdep.c   16 Dec 2014 04:21:53 -0000
@@ -622,13 +622,17 @@ pci_init_extents(void)
                  * here.  As long as vendors continue to support
                  * 32-bit operating systems, we should never see BARs
                  * outside that region.
+                 *
+                 * Dell 13G servers have important devices outside the
+                 * 36-bit address space.  Until we can extract the address
+                 * ranges from acpi, expand the allowed range to suit.
                  */
                 pcimem_ex = extent_create("pcimem", 0, 0xffffffffffffffffUL,
                     M_DEVBUF, NULL, 0, EX_NOWAIT);
                 if (pcimem_ex == NULL)
                         return;
-                extent_alloc_region(pcimem_ex, 0x1000000000UL,
-                    0xfffffff000000000UL, EX_NOWAIT);
+                extent_alloc_region(pcimem_ex, 0x40000000000UL,
+                    0xfffffc0000000000UL, EX_NOWAIT);
 
                 for (bmp = bios_memmap; bmp->type != BIOS_MAP_END; bmp++) {
                         /*




Hi,

your patch makes acpi0 calm as an overeaten grandad :)

Thank you.

vmstat without your patch
# vmstat -i
interrupt                       total     rate
irq0/clock                  103588029      799
irq0/ipi                       102596        0
irq144/acpi0               1579930055    12201
irq96/mfii0                      6938        0
irq114/em0                     270250        2
irq99/ehci0                       131        0
irq99/ehci1                        28        0
Total                      1683898027    13004

vmstat with your patch
# vmstat -i
interrupt                       total     rate
irq0/clock                     848873      800
irq0/ipi                        14085       13
irq144/acpi0                        3        0
irq96/mfii0                      2235        2
irq114/em0                       3685        3
irq99/ehci0                        56        0
irq99/ehci1                        28        0
Total                          868965      819




Re: divert(4) m_pullup

2014-12-16 Thread Mark Kettenis
 Date: Mon, 15 Dec 2014 23:44:54 -0500
 From: Lawrence Teo l...@openbsd.org
 
 Make divert_output() do an m_pullup only if truly needed.
 
 ok?

Questionable.  AFAIK m_pullup(9) will only do the pullup if it is
necessary in the first place.  Is there a measurable speedup from
inlining the check?

 Index: netinet/ip_divert.c
 ===
 RCS file: /cvs/src/sys/netinet/ip_divert.c,v
 retrieving revision 1.31
 diff -u -p -r1.31 ip_divert.c
 --- netinet/ip_divert.c   5 Dec 2014 15:50:04 -0000   1.31
 +++ netinet/ip_divert.c   13 Dec 2014 04:32:23 -0000
 @@ -101,7 +101,8 @@ divert_output(struct inpcb *inp, struct 
   /* Do basic sanity checks. */
   if (m->m_pkthdr.len < sizeof(struct ip))
   goto fail;
 - if ((m = m_pullup(m, sizeof(struct ip))) == NULL) {
 + if (m->m_len < sizeof(struct ip) &&
 + (m = m_pullup(m, sizeof(struct ip))) == NULL) {
   /* m_pullup() has freed the mbuf, so just return. */
   divstat.divs_errors++;
   return (ENOBUFS);
 Index: netinet6/ip6_divert.c
 ===
 RCS file: /cvs/src/sys/netinet6/ip6_divert.c,v
 retrieving revision 1.31
 diff -u -p -r1.31 ip6_divert.c
 --- netinet6/ip6_divert.c 5 Dec 2014 15:50:04 -0000   1.31
 +++ netinet6/ip6_divert.c 13 Dec 2014 04:32:24 -0000
 @@ -104,7 +104,8 @@ divert6_output(struct inpcb *inp, struct
   /* Do basic sanity checks. */
   if (m->m_pkthdr.len < sizeof(struct ip6_hdr))
   goto fail;
 - if ((m = m_pullup(m, sizeof(struct ip6_hdr))) == NULL) {
 + if (m->m_len < sizeof(struct ip6_hdr) &&
 + (m = m_pullup(m, sizeof(struct ip6_hdr))) == NULL) {
   /* m_pullup() has freed the mbuf, so just return. */
   div6stat.divs_errors++;
   return (ENOBUFS);
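To illustrate Mark's point, here is a deliberately simplified userland model; toy_mbuf and toy_pullup() are hypothetical stand-ins, not the kernel's struct mbuf or m_pullup(9). The pullup routine itself returns early when the first len bytes are already contiguous, so an extra m_len test at the call site only saves a function call, not a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a packet-header mbuf. */
struct toy_mbuf {
	size_t m_len;		/* bytes contiguous in the first buffer */
	size_t pkthdr_len;	/* total packet length across the chain */
};

static unsigned int toy_copies;	/* how many times data had to be moved */

/*
 * Model of the pullup contract: make the first len bytes contiguous.
 * On failure the mbuf is freed and NULL is returned, which is why the
 * callers in the diff must not touch m afterwards.
 */
static struct toy_mbuf *
toy_pullup(struct toy_mbuf *m, size_t len)
{
	if (m->m_len >= len)
		return m;		/* already contiguous: no work */
	if (m->pkthdr_len < len) {
		free(m);
		return NULL;
	}
	toy_copies++;			/* real code would move data here */
	m->m_len = len;
	return m;
}
```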
 
 



Re: Dell R630 high interrupts on acpi0

2014-12-16 Thread Mark Kettenis
 Date: Tue, 16 Dec 2014 15:16:58 +1000
 From: Jonathan Matthew jonat...@d14n.org
 
 On Sun, Dec 14, 2014 at 06:22:37PM +0100, Hrvoje Popovski wrote:
  Hi all,
  
  I have two new Dell R630s and have -current on them from Sun Dec
  14 15:07:17. Installation went great and very fast.
  The problem is that I see around 11k interrupts on acpi0. At first I
  thought the problem was similar to this thread:
  http://marc.info/?l=openbsd-misc&m=140551906923931&w=2
  
  But whether the system profile in the Dell BIOS is set to Performance or
  to DAPC, there are always interrupts on acpi0.
  In the links below you can find acpidump and dmesg output for the Performance
  and DAPC settings in the Dell BIOS.
 
 We just got some r630s too, so I spent some time last week figuring out what's
 going on here.  Something in the AML wants to talk to the intel MEI device.
 Normally this works, but on the new generation of dell machines (we've seen it
 on r630s and r730s), it's been moved outside the pci memory range we currently
 allow on amd64.  You can see this in your dmesgs:
 
 0:22:0: mem address conflict 0x3303000/0x10
 0:22:1: mem address conflict 0x3302000/0x10
 
 The interrupt will keep triggering until it manages to talk to the device,
 which will never happen.
 
 kettenis@ says we can get the pci memory range information we need to deal with
 this from acpi.  Until that happens, expanding the allowed pci memory range
 makes things work properly.
 
 ok?

ok kettenis@ (although I'd prefer if you did a s/acpi/ACPI/ in the comment).




Re: Binary code patching and paravirtualization

2014-12-16 Thread Mike Larkin
On Tue, Dec 16, 2014 at 11:08:03AM +0100, Mark Kettenis wrote:

ok mlarkin@ too



Re: divert(4) m_pullup

2014-12-16 Thread Mike Belopuhov
On 16 December 2014 at 12:08, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Date: Mon, 15 Dec 2014 23:44:54 -0500
 From: Lawrence Teo l...@openbsd.org

 Make divert_output() do an m_pullup only if truly needed.

 ok?

 Questionable.  AFAIK m_pullup(9) will only do the pullup if it is
 necessary in the first place.

I agree.  m_pullup already checks that.

 Is there a measurable speedup from inlining the check?




Re: unnbound vs file descriptors

2014-12-16 Thread Otto Moerbeek
On Tue, Dec 16, 2014 at 10:30:21AM +0100, Antoine Jacoutot wrote:

   Maybe just a naive question..
   but did you sudo vipw
   and set the unbound login class for the unbound user?
  
  That's not necessary anymore these days, I believe. The rc.d subsystem
  takes care of setting the proper class, if available.  At least it
 
 That's correct.
 
  does not document that setting the login class in the pwd db is needed.
 
 Because it's not :-)
 
 -- 
 Antoine

Well, there's more to it than that.

unbound has code to set its own rlimits. It uses setusercontext()
with the class of the _unbound user. So the class of the _unbound user
*does* matter.

If I set the class of the _unbound user and set both cur and max, things
seem to work:

unbound:\
        :openfiles=2048:\
        :tc=daemon:

Just setting cur does not work, since unbound then tries to set a cur
higher than max and you'll get an error:

unbound: unbound: setting resource limit openfiles: Invalid argument

in the daemon log.
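The error Otto describes is ordinary setrlimit(2) behaviour: a soft limit (rlim_cur) above the hard limit (rlim_max) is rejected with EINVAL, which is what openfiles-cur=2048 alone produces when the inherited max is lower. A minimal sketch of that failure, not unbound's code (which goes through setusercontext(3)); try_cur_above_max() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Try to set the soft RLIMIT_NOFILE one above the current hard limit,
 * the same mistake as openfiles-cur with a lower inherited max.
 * Returns the errno from the failing call, or 0 if nothing failed
 * (for example when the hard limit is RLIM_INFINITY).
 */
static int
try_cur_above_max(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return errno;
	if (rl.rlim_max == RLIM_INFINITY)
		return 0;	/* no finite hard limit to exceed */
	rl.rlim_cur = rl.rlim_max + 1;	/* soft above hard: invalid */
	if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
		return errno;
	return 0;
}
```

Setting the plain openfiles capability, as above, sets cur and max to the same value, so this particular error cannot occur.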

-Otto



kmem readable kvm db

2014-12-16 Thread Ted Unangst
The kvm_bsd.db file only needs to be readable by programs that are
setgid kmem. This is not much of an info leak since any user can read
/bsd (or in many cases download a copy), but moving forward it would
be nice to patch these leaks up one by one.

A few kmem grovelers appear to still work afterwards.

Index: kvm_mkdb.c
===
RCS file: /cvs/src/usr.sbin/kvm_mkdb/kvm_mkdb.c,v
retrieving revision 1.18
diff -u -p -r1.18 kvm_mkdb.c
--- kvm_mkdb.c  20 Jul 2014 01:38:40 -0000  1.18
+++ kvm_mkdb.c  16 Dec 2014 19:22:54 -0000
@@ -31,6 +31,9 @@
 
 #include <sys/param.h>
 #include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 
 #include <db.h>
 #include <err.h>
@@ -42,10 +45,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-
-#include <sys/types.h>
-#include <sys/time.h>
-#include <sys/resource.h>
+#include <grp.h>
 
 #include "extern.h"
 
@@ -131,6 +131,7 @@ kvm_mkdb(int fd, const char *dbdir, char
DB *db;
char dbtemp[MAXPATHLEN], dbname[MAXPATHLEN];
int r;
+   struct group *gr;
 
r = snprintf(dbtemp, sizeof(dbtemp), "%skvm_%s.tmp",
dbdir, nlistname);
@@ -155,7 +156,7 @@ kvm_mkdb(int fd, const char *dbdir, char
 
(void)umask(0);
db = dbopen(dbtemp, O_CREAT | O_EXLOCK | O_TRUNC | O_RDWR,
-   S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH, DB_HASH, openinfo);
+   S_IRUSR | S_IWUSR | S_IRGRP, DB_HASH, openinfo);
if (db == NULL) {
warn("can't dbopen %s", dbtemp);
return(1);
@@ -167,6 +168,14 @@ kvm_mkdb(int fd, const char *dbdir, char
}
if (db->close(db)) {
warn("can't dbclose %s", dbtemp);
+   (void)unlink(dbtemp);
+   return(1);
+   }
+
+   if ((gr = getgrnam("kmem")) == NULL) {
+   warn("can't find kmem group");
+   } else if (chown(dbtemp, -1, gr->gr_gid)) {
+   warn("can't chown %s", dbtemp);
(void)unlink(dbtemp);
return(1);
}
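The permission change in the diff amounts to: create the file 0640 (S_IRUSR | S_IWUSR | S_IRGRP, dropping S_IROTH) and hand group ownership to kmem. A standalone sketch of that pattern, assuming plain open(2) instead of dbopen(3); make_group_readable() is a hypothetical name and, like the diff, it only warns if the group does not exist.

```c
#include <assert.h>
#include <err.h>
#include <fcntl.h>
#include <grp.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) path with mode 0640 and give it to group.
 * The mode only applies when the file is newly created.  A missing
 * group produces a warning; a failed chown removes the file.
 * Returns 0 on success, -1 on error.
 */
static int
make_group_readable(const char *path, const char *group)
{
	struct group *gr;
	mode_t omask;
	int fd;

	omask = umask(0);	/* so the mode below is used verbatim */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY,
	    S_IRUSR | S_IWUSR | S_IRGRP);
	(void)umask(omask);
	if (fd == -1) {
		warn("can't open %s", path);
		return -1;
	}
	(void)close(fd);

	if ((gr = getgrnam(group)) == NULL) {
		warnx("can't find %s group", group);
	} else if (chown(path, (uid_t)-1, gr->gr_gid) == -1) {
		warn("can't chown %s", path);
		(void)unlink(path);
		return -1;
	}
	return 0;
}
```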



rand()/random() warning

2014-12-16 Thread Carlin Bingham
There is no warning when compiling code that calls random(), but there are two
warnings when compiling code that calls rand():


: warning: random() may return determinstic values, is that what you want?
warning: rand() may return determinstic values, is that what you want?


Is the first parameter to __warn_references in random.c supposed to be
'random' not 'rand'?



Index: lib/libc/stdlib/random.c
===
RCS file: /cvs/src/lib/libc/stdlib/random.c,v
retrieving revision 1.26
diff -u -p -u -r1.26 random.c
--- lib/libc/stdlib/random.c    9 Dec 2014 08:00:53 -0000   1.26
+++ lib/libc/stdlib/random.c    16 Dec 2014 20:22:12 -0000
@@ -417,6 +417,6 @@ random(void)
 }
 
 #if defined(APIWARN)
-__warn_references(rand,
+__warn_references(random,
     "warning: random() may return determinstic values, is that what you want?");
 #endif
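For context on why the first argument matters: on ELF targets with GNU ld, a section named .gnu.warning.SYMBOL makes the linker print that section's text when SYMBOL is referenced. A minimal sketch of the idea, assuming an ELF target and a GCC-compatible compiler; WARN_REFERENCES and legacy_random() are hypothetical and this is not OpenBSD's <sys/cdefs.h> definition.

```c
#include <assert.h>

/*
 * Hypothetical macro in the spirit of __warn_references(): store msg in
 * a section named .gnu.warning.<sym>, which GNU ld reports when it sees
 * a reference to sym.  Assumes ELF and GCC-style attributes.
 */
#define WARN_REFERENCES(sym, msg)					\
	static const char warn_refs_##sym[]				\
	    __attribute__((used, section(".gnu.warning." #sym))) = msg

long legacy_random(void);

/* Toy generator: the same seed always yields the same sequence. */
static unsigned long legacy_state = 1;

long
legacy_random(void)
{
	legacy_state = legacy_state * 1103515245UL + 12345UL;
	return (long)((legacy_state / 65536UL) % 32768UL);
}

/* The first argument must name the function being warned about. */
WARN_REFERENCES(legacy_random,
    "warning: legacy_random() returns deterministic values");
```

Passing rand instead of random, as in revision 1.26, attaches the random() message to rand, which explains both symptoms in the report: two warnings for rand() and none for random().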



Re: rand()/random() warning

2014-12-16 Thread Stuart Henderson
On 2014/12/17 09:37, Carlin Bingham wrote:
 There is no warning when compiling code that calls random() but two 
 warnings when compiling code that calls rand() -

I noticed this earlier as well.

 : warning: random() may return determinstic values, is that what you want?
 warning: rand() may return determinstic values, is that what you want?

There's a typo in deterministic too.

 Is the first parameter to __warn_references in random.c supposed to be
 'random' not 'rand'?
 
 
 



Re: Binary code patching and paravirtualization

2014-12-16 Thread Stefan Fritsch
On Mon, 15 Dec 2014, Ted Unangst wrote:

 On Mon, Dec 15, 2014 at 23:55, Stefan Fritsch wrote:
  
  Only in order to get a flags field that can be tweaked with config(8). And
  to allow disable via config(8), though that could also be achieved with a
  flag.
  
  Tweaking the behavior with a flags value is necessary because hypervisors
  do not always announce what they are capable of. One reason for this is that
  qemu is responsible for setting most of the cpuid flags but the kvm kernel
  module offers some interfaces in all cases. Another reason seems to be
  simple oversight, for example Illumos KVM forgot to add the cpuid stuff
  from Linux KVM. And if there is some bug it may be a good idea to have a
  simple way to check if it is related to the paravirt stuff.
  
  I have just noticed that there is support in config(8) to set
  non-device-related int values. But this does not seem to be supported in the
  in-kernel UKC. And I can't see right now how config finds the valid int
  values in the kernel. Or is it just necessary to add the name of the
  variable prepended by an underscore to the config tool? Would this be
  preferred over introducing the paravirt device? But being able to set this
  from the in-kernel UKC would be nice, too.

What I forgot to mention is that just because a hypervisor supports a
specific interface, it is not always a good idea to use it.
For example, recent Intel CPUs with a recent Linux/KVM have better
hardware support for APIC virtualization, and using the Hyper-V APIC access
MSRs actually slows things down.

 I think it would be better to avoid fake device proliferation, but
 others may have other opinions.
 
 So the problem is that some hypervisors are broken and don't
 identify as such? Perhaps we ignore them to start? Then we can add a
 mechanism to force paravirt code patching.

 I think the introduction of paravirt code patching and the mechanism
 used to enable or disable are separate issues. It can start as a
 normal option PARAVIRT, and then we discuss what else needs to be done?

Yes, we can first start with some things that are unproblematic 
WRT detection. AFAIK, this is the case for the PV EOI optimization.


Unfortunately, the bit that gives the largest performance boost at the 
moment, the APIC access MSRs, is not so easy:
- KVM has supported it for a long time
- but it is only announced if qemu is started with a special command line option
- Illumos KVM supports it too, but there is no way to announce it

Also, old hypervisors tend to stay around for a long time and users can't 
always influence the configuration when they rent a vserver.
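One way to express the override discussed here (features supported but not announced, and features announced but better left unused) is to combine the announced bits with a user-supplied flags word. This is a sketch only: the PV_* names and the force-on/force-off split of the flags value are hypothetical, not anything in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical paravirt feature bits. */
#define PV_EOI		0x0001u	/* paravirtualized end-of-interrupt */
#define PV_APIC_MSR	0x0002u	/* APIC access through hypervisor MSRs */

/*
 * Hypothetical flags layout: the low 16 bits force features on, the
 * high 16 bits force them off.  Forcing off wins, so a feature that is
 * announced but slower (as described for the APIC MSRs on recent Intel
 * CPUs) can still be disabled.
 */
static uint32_t
pv_effective(uint32_t announced, uint32_t flags)
{
	uint32_t force_on = flags & 0xffffu;
	uint32_t force_off = flags >> 16;

	return ((announced | force_on) & ~force_off) & 0xffffu;
}
```

A flags word like this is the kind of value that could be set through config(8) or the in-kernel UKC, which is the mechanism question raised earlier in the thread.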

On the other hand, the same positive effect as the APIC access MSRs should 
also be achievable by adding x2apic support. kettenis wanted to check how 
much effort that would be. So, I think it would be best to wait a bit with 
this piece.



Re: Binary code patching and paravirtualization

2014-12-16 Thread Alexey Suslikov
 CVSROOT: /cvs
 Module name: src
 Changes by: s...@cvs.openbsd.org 2014/12/16 14:02:58
 Modified files:
 sys/arch/amd64/amd64: identcpu.c
 sys/arch/amd64/include: specialreg.h

 Log message:
 Define and print HV cpuid flag.
 This is set by many hypervisors, including kvm, vmware, hyper-v.

Do they set the HV flag only for amd64 guests? How about i386 ones?



Re: Dell R630 high interrupts on acpi0

2014-12-16 Thread Hrvoje Popovski



Hi,

On the R630 I have custom BIOS settings and noticed that even though C-states
are disabled in the BIOS, I can see them in dmesg:

acpicpu0 at acpi0: C1
acpicpu1 at acpi0: C1
acpicpu2 at acpi0: C1
acpicpu3 at acpi0: C1
acpicpu4 at acpi0: C1
acpicpu5 at acpi0: C1
acpicpu6 at acpi0: C1
acpicpu7 at acpi0: C1

X2Apic mode is disabled too, but in dmesg I see:
cpu0: FPU,CPI,...SSE4.1,SSE4.2,x2APIC

This is not good, right?

R630 bios settings

Processor Settings:
Logical Processor - Disabled
QPI Speed - Maximum data rate
Alternate RTID Settings - Disabled
Virtualization Technology - Disabled
Address Translation Service (ATS) - Disabled (gray)
Adjacent Cache Line Prefetch - Enabled
Hardware Prefetcher - Enabled
DCU Streamer Prefetcher - Enabled
DCU IP Prefetcher - Enabled
Execute Disabled - Enabled
Logical Processor Idling - Disabled
Configurable TDP - Nominal
X2Apic Mode - Disabled (gray)
Dell Controlled Turbo - Enabled

System Profile Settings:
System Profile - Custom
CPU Power Management - Maximum Performance
Memory Frequency - Maximum Performance
Turbo Boost - Enabled
Energy Efficient Turbo - Disabled
C1E - Disabled
C states - Disabled
Collaborative CPU Performance Control - Disabled
Memory Patrol Scrub - Standard
Memory Refresh Rate - 1x
Uncore Frequency - Maximum
Energy Efficient Policy - Performance

Full dmesg and acpidump from R630
http://kosjenka.srce.hr/~hrvoje/R630_custom_dmesg.txt
http://kosjenka.srce.hr/~hrvoje/R630_custom.tgz



The R620 has similar settings, and there I can't see C-states in dmesg:
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
acpicpu4 at acpi0
acpicpu5 at acpi0

R620 bios settings

Processor Settings:
Logical Processor - Disabled
Alternate RTID Settings - Disabled
Virtualization Technology - Disabled
Adjacent Cache Line Prefetch - Enabled
Hardware Prefetcher - Enabled
DCU Streamer Prefetcher - Enabled
DCU IP Prefetcher - Enabled
Logical Processor Idling - Disabled
Dell Controlled Turbo - Enabled

System Profile Settings:
System Profile - Custom
CPU Power Management - Maximum Performance
Memory Frequency - Maximum Performance
Turbo Boost - Enabled
C1E - Disabled
C states - Disabled
Monitor/Mwait - Disabled
Memory Patrol Scrub - Standard
Memory Refresh Rate - 1x
Memory Operating Voltage - Auto
Collaborative CPU Performance Control - Disabled


Full dmesg and acpidump from R620
http://kosjenka.srce.hr/~hrvoje/R620_custom_dmesg.txt
http://kosjenka.srce.hr/~hrvoje/R620_custom.tgz





Re: Dell R630 high interrupts on acpi0

2014-12-16 Thread Philip Guenther
On Tue, Dec 16, 2014 at 2:45 PM, Hrvoje Popovski hrv...@srce.hr wrote:
 On the R630 I have custom BIOS settings and noticed that even though C-states are
 disabled in the BIOS, I can see them in dmesg:
 acpicpu0 at acpi0: C1

Uh, ACPI *requires* that C1 exist.  The halt instruction is defined as
entering C1, so not having C1 would mean your CPU lacks a basic
mandatory ia32 instruction.  Hopefully the BIOS docs explain that
you're just disabling deep C-states or something like that.  If not,
yell at the company that made it.

With the exception of C1E, I wouldn't tell a BIOS to disable
C-states unless it was causing the OS to have a problem or you're
actively trying to use the computer to heat your house.  C1E is a
cross between C1 and C3; the issue is that bugs in multiple early
hardware implementations mean it'll behave poorly depending on exactly
how the OS handles it.  This is something to test...and then test
again with each release you install...


 X2Apic mode is disabled too, but in dmesg I see:
 cpu0: FPU,CPI,...SSE4.1,SSE4.2,x2APIC

That just means the CPU has the feature bit set in the CPUID sets.
The BIOS is presumably configuring ACPI (which is what OpenBSD pays
attention to) to use the original LAPIC tables instead of the x2APIC
tables for locating CPUs and interrupts.
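To make the distinction concrete: the x2APIC entry in the cpu0 line comes from CPUID leaf 1, ECX bit 21 (0x00200000), which describes what the processor supports rather than the mode the BIOS sets up. A minimal sketch of that check; ecx_has_x2apic() and cpu_has_x2apic() are hypothetical names, and the CPUID read assumes GCC's or Clang's <cpuid.h> on x86.

```c
#include <assert.h>
#include <stdint.h>

#define CPUIDECX_X2APIC	0x00200000u	/* CPUID.1:ECX bit 21 */

/* Pure check on an ECX value, so it can be exercised on any host. */
static int
ecx_has_x2apic(uint32_t ecx)
{
	return (ecx & CPUIDECX_X2APIC) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if this CPU advertises x2APIC support, 0 otherwise. */
static int
cpu_has_x2apic(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
		return 0;	/* CPUID leaf 1 not available */
	return ecx_has_x2apic(ecx);
}
#endif
```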


 The R620 has similar settings, and there I can't see C-states in dmesg:
 acpicpu0 at acpi0

That's either insane, or a bug in our acpicpu code, IMO.


 R620 bios settings
...
 Monitor/Mwait - Disabled

I would suggest leaving that on.  We ain't using it *right now*, but,
well, the source tree on my laptop is, and more than ever.  :-)


Philip Guenther



Re: Dell R630 high interrupts on acpi0

2014-12-16 Thread Frederic Nowak
Hi!

The below diff extracts the memory range information from ACPI. It looks
up all the memory ranges in _CRS and calculates minimal and maximal
values for pci_machdep.c.
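The min/max bookkeeping described here can be tried out in isolation. This sketch follows the diff's conventions (a two-element array starting at {UINT64_MAX, 0}); struct mem_range and the helper names are hypothetical. It also computes the region above the maximum that the diff marks unavailable with extent_alloc_region().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one _CRS memory descriptor. */
struct mem_range {
	uint64_t min, max;	/* inclusive bounds */
};

/* Widen range[0..1] to cover r; range starts out as {UINT64_MAX, 0}. */
static void
range_include(uint64_t range[2], const struct mem_range *r)
{
	if (r->min < range[0])
		range[0] = r->min;
	if (r->max > range[1])
		range[1] = r->max;
}

/* True once at least one descriptor has been folded in. */
static int
range_valid(const uint64_t range[2])
{
	return range[0] <= range[1];
}

/*
 * Start and size of the region above range[1], mirroring the
 * pcimem_range[1] + 1 and "all ones minus pcimem_range[1]" arguments
 * in the diff.  If range[1] is UINT64_MAX, start wraps and size is 0.
 */
static void
reserved_above(const uint64_t range[2], uint64_t *start, uint64_t *size)
{
	*start = range[1] + 1;
	*size = UINT64_MAX - range[1];
}
```

The ranges in the usage check below are made up; range_valid() is one way to express the fallback check discussed next.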

I tested this on two amd64 machines and see no difference in pcidump.

Do you think we need to keep the old method in case ACPI on some
machines does not report correct memory ranges? A simple check would be
to compare pcimem_range[0] and pcimem_range[1]: if ACPI reported nothing,
they keep their initial values {UINT64_MAX, 0} and [0] stays above [1]...


Index: sys/arch/amd64/include/pci_machdep.h
===
RCS file: /cvs/src/sys/arch/amd64/include/pci_machdep.h,v
retrieving revision 1.22
diff -u -p -b -r1.22 pci_machdep.h
--- sys/arch/amd64/include/pci_machdep.h    6 Nov 2013 10:40:36 -0000   1.22
+++ sys/arch/amd64/include/pci_machdep.h    17 Dec 2014 06:00:01 -0000
@@ -64,6 +64,7 @@ extern int pci_mcfg_min_bus, pci_mcfg_ma
 
 struct pci_attach_args;
 
+extern uint64_t pcimem_range[2];
 extern struct extent *pciio_ex;
 extern struct extent *pcimem_ex;
 extern struct extent *pcibus_ex;
Index: sys/arch/amd64/pci/pci_machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/pci/pci_machdep.c,v
retrieving revision 1.60
diff -u -p -b -r1.60 pci_machdep.c
--- sys/arch/amd64/pci/pci_machdep.c    16 Dec 2014 23:13:20 -0000  1.60
+++ sys/arch/amd64/pci/pci_machdep.c    17 Dec 2014 07:16:47 -0000
@@ -66,6 +66,7 @@
  */
 
 #include <sys/types.h>
+#include <sys/stdint.h>
 #include <sys/param.h>
 #include <sys/time.h>
 #include <sys/systm.h>
@@ -592,6 +593,7 @@ pci_intr_disestablish(pci_chipset_tag_t 
intr_disestablish(cookie);
 }
 
+uint64_t pcimem_range[2] = {UINT64_MAX, 0};
 struct extent *pciio_ex;
 struct extent *pcimem_ex;
 struct extent *pcibus_ex;
@@ -618,31 +620,25 @@ pci_init_extents(void)
 
if (pcimem_ex == NULL) {
/*
-* Cover the 36-bit address space addressable by PAE
-* here.  As long as vendors continue to support
-* 32-bit operating systems, we should never see BARs
-* outside that region.
-*
-* Dell 13G servers have important devices outside the
-* 36-bit address space.  Until we can extract the address
-* ranges from ACPI, expand the allowed range to suit.
+* Cover the address space extracted from ACPI.
 */
pcimem_ex = extent_create("pcimem", 0, 0xffffffffffffffffUL,
M_DEVBUF, NULL, 0, EX_NOWAIT);
if (pcimem_ex == NULL)
return;
-   extent_alloc_region(pcimem_ex, 0x40000000000UL,
-   0xfffffc0000000000UL, EX_NOWAIT);
+   extent_alloc_region(pcimem_ex, pcimem_range[1] + 1,
+   (0xffffffffffffffffUL - pcimem_range[1]), EX_NOWAIT);
 
for (bmp = bios_memmap; bmp->type != BIOS_MAP_END; bmp++) {
/*
-* Ignore address space beyond 4G.
+* Ignore address space beyond address range
+* extracted from ACPI.
 */
-   if (bmp->addr >= 0x100000000ULL)
+   if (bmp->addr > pcimem_range[1])
continue;
size = bmp->size;
-   if (bmp->addr + size >= 0x100000000ULL)
-   size = 0x100000000ULL - bmp->addr;
+   if (bmp->addr + size > pcimem_range[1])
+   size = pcimem_range[1] - bmp->addr + 1;
 
/* Ignore zero-sized regions. */
if (size == 0)
Index: sys/dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.277
diff -u -p -b -r1.277 acpi.c
--- sys/dev/acpi/acpi.c 9 Dec 2014 06:58:29 -0000   1.277
+++ sys/dev/acpi/acpi.c 17 Dec 2014 05:59:36 -0000
@@ -385,0 +385,0 @@ TAILQ_HEAD(, acpi_pci) acpi_pcirootdevs

 int acpi_getpci(struct aml_node *node, void *arg);
 int acpi_getminbus(union acpi_resource *crs, void *arg);
+int acpi_getmemrange(union acpi_resource *crs, void *arg);
+
+int
+acpi_getmemrange(union acpi_resource *crs, void *arg)
+{
+   int typ = AML_CRSTYPE(crs);
+   uint64_t *range = arg;  /* size 2 */
+   uint64_t min, max;
+
+   switch(typ) {
+   case LR_24BIT:
+   min = crs->lr_m24._min;
+   max = crs->lr_m24._max;
+   break;
+   case LR_32BIT:
+   case LR_32BITFIXED:
+   min = crs->lr_m32._min;
+   max = crs->lr_m32._max;
+   break;
+   case LR_WORD:
+   min = crs->lr_word._min;
+   max = crs->lr_word._max;
+   break;
+   case LR_DWORD:
+   min = crs->lr_dword._min;
+   max =