Re: i386: vm.pmap kernel local race condition

2013-02-17 Thread Alan Cox
On 02/17/2013 08:17, Eugene Grosbein wrote:
 17.02.2013 01:25, Alan Cox wrote:

 Regardless of what that web site says, this is not really a race condition.  
 Instead, you're exhausting a resource in the kernel because of the 
 characteristics of your workload.  The kernel tries to handle this 
 gracefully, but in extreme cases, the kernel can't keep up with the demand.  
 Have you simply tried doing as the panic message suggests, i.e., increase 
 vm.pmap.shpgperproc?  Alternatively, you can increase vm.pmap.pv_entry_max 
 to more directly accomplish the same.

 That said, if possible, you should do as Adrian suggests and change your 
 Squid configuration to not use 500 helper processes.  That will allow a lot 
 more of your machine's physical memory to go to caching data rather than
 bookkeeping data structures in the kernel.

 A warning in src/sys/i386/conf/NOTES scares me:

 #
 # Set the number of PV entries per process.  Increasing this can
 # stop panics related to heavy use of shared memory.  However, that can
 # (combined with large amounts of physical memory) cause panics at
 # boot time due the kernel running out of VM space.
 #
 # If you're tweaking this, you might also want to increase the sysctls
 # vm.v_free_min, vm.v_free_reserved, and vm.v_free_target.
 #
 # The value below is the one more than the default.
 #
 options PMAP_SHPGPERPROC=201

 I guess my 4G of physical RAM counts as a large amount for i386,
 so I'm afraid of causing a kernel panic at boot time on a remote server.
 I have no idea how exactly I should increase the value of vm.pmap.shpgperproc.
 What should the increment be?

Run sysctl vm.kvm_free and sysctl vm.pmap.pv_entry_max.  The former
sysctl tells you how many bytes of free kernel address space that you
have.  Basically, you want to ensure that this number doesn't go to
zero.  Every pv entry occupies 12 bytes.  To start with, my suggestion
would be to increase the maximum number of pv entries until it consumes
about half of your current free kernel address space.

Once you've decided on a new maximum value that you want to try, set the
tunable vm.pmap.pv_entries in /boot/loader.conf.
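
As a rough illustration of the sizing rule above, the arithmetic can be sketched in C. This is only one reading of the advice: it assumes the additional pv entries should consume about half of the bytes reported by vm.kvm_free, at 12 bytes per entry. The helper name suggest_pv_entry_max is hypothetical and is not a kernel function.

```c
#include <stdint.h>

/* Size of one pv entry on i386, as stated in the message above. */
#define PV_ENTRY_BYTES 12u

/*
 * Hypothetical helper (not part of FreeBSD): given the current values of
 * vm.kvm_free (in bytes) and vm.pmap.pv_entry_max (in entries), return a
 * candidate new maximum whose additional entries would consume about half
 * of the currently free kernel address space.
 */
static uint64_t
suggest_pv_entry_max(uint64_t kvm_free_bytes, uint64_t current_max)
{
	uint64_t budget_bytes = kvm_free_bytes / 2;

	return (current_max + budget_bytes / PV_ENTRY_BYTES);
}
```

The result would then be the value to try for the vm.pmap.pv_entries tunable in /boot/loader.conf; it is a starting point to experiment with, not a guaranteed-safe setting.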

Ignore the other comments from NOTES.


 Using concurrency extension means moving from squid-3.1 to squid-3.2,
 I might try that too, thanks. I was not aware of this new feature in 3.2

 Eugene Grosbein


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: i386: vm.pmap kernel local race condition

2013-02-16 Thread Alan Cox
On Thu, Feb 14, 2013 at 7:55 AM, Eugene Grosbein eu...@grosbein.pp.ru wrote:

 Hi!

 I've got FreeBSD 8.3-STABLE/i386 server that can be reliably panicked
 using just 'squid -k rotatelog' command. It seems the system suffers
 from the problem described here:

 http://cxsecurity.com/issue/WLB-2010090156

 I could not find any FreeBSD Security Advisory containing a fix.

 My server has 4G physical RAM (about 3.2G available) and runs
 squid (about 110M VSS) with 500 ntlm_auth subprocesses.
 A smaller number of ntlm_auth helpers sometimes results in a squid crash,
 as squid sometimes has several hundred requests per second to authorize
 and does not tolerate running out of free ntlm_auth helpers.

 squid -k rotatelog at midnight results in crash:

 Feb 14 00:03:00 irl savecore: reboot after panic: get_pv_entry: increase
 vm.pmap.shpgperproc
 Feb 14 00:03:00 irl savecore: writing core to vmcore.1

 Btw, I have coredump.

 vm.pmap.shpgperproc has its default value (200) here, as do vm.v_free_min,
 vm.v_free_reserved, vm.v_free_target, and KVA_PAGES.

 These crashes are pretty regular

 # last|fgrep reboot
 reboot   ~ Thu Feb 14 00:03
 reboot   ~ Wed Feb 13 19:08
 reboot   ~ Wed Feb 13 10:40
 reboot   ~ Wed Feb 13 00:04
 reboot   ~ Tue Feb 12 00:09
 reboot   ~ Mon Feb 11 00:03
 reboot   ~ Sun Feb 10 00:03
 reboot   ~ Thu Feb  7 00:03
 reboot   ~ Wed Feb  6 10:52
 reboot   ~ Sun Feb  3 00:03
 reboot   ~ Sat Feb  2 00:03

 Can this be considered a security problem?
 Can it be fixed without switching to amd64?
 I have only remote access to this production server, no serial console.


Regardless of what that web site says, this is not really a race
condition.  Instead, you're exhausting a resource in the kernel because of
the characteristics of your workload.  The kernel tries to handle this
gracefully, but in extreme cases, the kernel can't keep up with the
demand.  Have you simply tried doing as the panic message suggests, i.e.,
increase vm.pmap.shpgperproc?  Alternatively, you can increase
vm.pmap.pv_entry_max to more directly accomplish the same.

That said, if possible, you should do as Adrian suggests and change your
Squid configuration to not use 500 helper processes.  That will allow a lot
more of your machine's physical memory to go to caching data rather than
bookkeeping data structures in the kernel.

Regards,
Alan


Re: superpages not solving PV entries limit warning

2012-05-10 Thread Alan Cox
On Thu, May 10, 2012 at 2:32 AM, Adam Vande More amvandem...@gmail.com wrote:

 On Wed, May 9, 2012 at 12:55 PM, Charles Owens
 cow...@greatbaysoftware.com wrote:

  Hi fellow BSD-types,
 
  I have a busy system that forks lots of processes and I repeatedly see the
  message: "Approaching the limit on PV entries, consider increasing either
  the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable."
 
  System details:
 
   * 8.1-RELEASE-p2 i386 PAE kernel
   * 6 GB RAM
 

 The warning no longer applies to your version or to several earlier ones.
 It has been removed from current releases.


For amd64, yes, but not i386.  It can't be removed from i386.

Alan


Re: superpages not solving PV entries limit warning

2012-05-10 Thread Alan Cox
On Thu, May 10, 2012 at 10:52 AM, Charles Owens cow...@greatbaysoftware.com
 wrote:

  That's very helpful!  I had read about that and wondered if it applied to
 i386.

 Should I have expected superpages to completely cure the condition... or
 does it just help?  Should I now be looking at tuning the related pmap
 sysctls to give further relief?


Superpages won't cure the problem due to the nature of your workload.
After a fork, writes to portions of the address space that are both
superpages and copy-on-write will trigger demotion, or re-instantiation of
the 4KB page granularity PV entries.  Ultimately, repromotion to superpages
may occur, but in the meantime, your peak usage of PV entries is only
slightly reduced.
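
To put a number on the demotion cost described above: one superpage mapping stands in for many 4KB mappings, and after demotion each of those 4KB mappings needs its own PV entry again. A minimal sketch, assuming 2MB superpages (i386 with PAE) or 4MB superpages (i386 without PAE); the helper name is hypothetical.

```c
#include <stdint.h>

#define SMALL_PAGE_BYTES 4096u

/*
 * Hypothetical helper: number of 4KB mappings, and therefore
 * 4KB-granularity PV entries, covered by one superpage of the given size.
 */
static uint32_t
pv_entries_after_demotion(uint32_t superpage_bytes)
{
	return (superpage_bytes / SMALL_PAGE_BYTES);
}
```

With a fork-heavy, copy-on-write workload, many superpages can be in the demoted state at the same time, which is consistent with peak PV entry usage being only slightly reduced.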

The bottom line is that you'll need to resort to tuning.

Alan


Re: AMD Erratum 383 crashes FreeBSD 9-Stable

2012-03-17 Thread Alan Cox
On Sat, Mar 17, 2012 at 12:10 PM, Richard Yao r...@cs.stonybrook.edu wrote:

 On 03/17/12 13:08, Richard Yao wrote:
  Dear FreeBSD Developers:
 
  I used the ZFS Guru LiveCD to install FreeBSD 9 in KVM on a host system
  with an AMD Thuban processor (K10h). I then proceeded to compile perl
  and the VM crashed. Linux's dmesg gave me the following hint as to the
  cause:
 
  [ 3568.234654] KVM: Guest triggered AMD Erratum 383
 
  I also tried installing Gentoo Prefix, a userland package manager like
  NetBSD pkgsrc, and the VM also crashed with the same message when
  compiling the first component. AMD has documented this issue, with a
  workaround for hypervisors and a statement saying that they won't fix it:
 
  If system software performs uncommon methods to change the page size of
  an active page table that is valid, the CPU core may, under a highly
  specific and detailed set of conditions, form duplicate TLB entries for
  a single linear address. The CPU core will machine check if this page is
  then accessed prior to it being invalidated from the TLB.
 
  http://support.amd.com/us/Embedded_TechDocs/41322.pdf
 
  Has anyone done anything to work around this issue? I have a Gentoo
  Hardened VM running on this machine which has no problem compiling
  software, so I am sure that some sort of page table workaround is
 possible.
 
  Yours truly,
  Richard Yao
 

 I was tired when I wrote that, so my eyes seem to have skipped some
 advice from AMD on how to work around this in the kernel:

 Affected software must ensure that page sizes are only increased or
 decreased after the entry is invalidated and flushed out of all TLBs.
 When flushing multiple entries from the TLB, software may wish to use a
 single MOV CR3 value to invalidate the TLB instead of repetitive INVLPG
 instructions

 Also, I am not on the mailing list, so please CC replies to me.


When the FreeBSD kernel detects that it is running on an affected
processor, it automatically enables the recommended workaround.  However,
because you are running within a virtual machine, the automatic detection
may not be working.  Alternatively, you may be using a newer processor
revision that still suffers from the bug but for which the kernel doesn't
enable the workaround.  Can you tell us how the FreeBSD guest sees the underlying
processor, e.g., the first few lines of dmesg from the guest?

Alan


Re: directory listing hangs in ufs state

2011-12-22 Thread Alan Cox

On 12/22/2011 03:48, Kostik Belousov wrote:

On Wed, Dec 21, 2011 at 09:03:02PM +0400, Andrey Zonov wrote:

On 15.12.2011 17:01, Kostik Belousov wrote:

On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote:

On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:


On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote:

On 14.12.2011 22:22, Jeremy Chadwick wrote:

On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:

Hi Jeremy,

This is not hardware problem, I've already checked that. I also ran
fsck today and got no errors.

After some more exploration of how mongodb works, I found that when
listing hangs, one of the mongodb threads is in biowr state for a long
time. It periodically calls msync(MS_SYNC), according to the ktrace
output.

If I remove the msync() calls from mongodb, how often will data be
synced by the OS?

--
Andrey Zonov

On 14.12.2011 2:15, Jeremy Chadwick wrote:

On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:

Have you any ideas what is going on? or how to catch the problem?

Assuming this isn't a file on the root filesystem, try booting the
machine in single-user mode and using fsck -f on the filesystem in
question.

Can you verify there's no problems with the disk this file lives on as
well (smartctl -a /dev/disk)?  I'm doubting this is the problem, but
thought I'd mention it.

I have no real answer, I'm sorry.  msync(2) indicates it's effectively
deprecated (see BUGS).  It looks like this is effectively a mmap-version
of fsync(2).

I replaced msync(2) with fsync(2).  Unfortunately, from man pages it
is not obvious that I can do this. Anyway, thanks.

Sorry, that wasn't what I was implying.  Let me try to explain
differently.

msync(2) looks, to me, like an mmap-specific version of fsync(2).  Based
on the man page, it seems that with msync() you can effectively
guarantee flushing of certain pages within an mmap()'d region to disk.
fsync() would cause **all** buffers/internal pages to be flushed to
disk.

One would need to look at the code to mongodb to find out what it's
actually doing with msync().  That is to say, if it's doing something
like this (I probably have the semantics wrong -- I've never spent much
time with mmap()):

fd = open("/some/file", O_RDWR);
ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ret = msync(ptr, 65536, MS_SYNC);
/* or alternatively, this (on FreeBSD, a len of 0 flushes the whole region):
ret = msync(ptr, 0, MS_SYNC);
*/

Then this, to me, would be mostly the equivalent to:

fd = open("/some/file", O_RDWR);
ret = fsync(fd);

Otherwise, if it's calling msync() only on an address/location within
the region ptr points to, then that may be more efficient (less pages to
flush).


They call msync() for the whole file.  So, there will not be any
difference.
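
For reference, the two flushing approaches compared above can be put side by side in a small self-contained program. This is a minimal sketch, not mongodb's code: the scratch-file template, the 64KB length, and the function name flush_both_ways are all placeholders chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustrative only: map a scratch file, dirty it, then flush it first
 * with msync(MS_SYNC) and then with fsync().  Returns 0 on success and
 * -1 on any failure.
 */
static int
flush_both_ways(void)
{
	char path[] = "/tmp/msync_sketch.XXXXXX";	/* placeholder name */
	const size_t len = 65536;
	char *ptr;
	int fd, ret = -1;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);		/* the file goes away once fd is closed */
	if (ftruncate(fd, (off_t)len) == -1)
		goto out;
	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto out;
	memset(ptr, 'x', len);	/* dirty every page of the mapping */

	/* Flush only the pages in [ptr, ptr + len) and wait for the writes. */
	if (msync(ptr, len, MS_SYNC) == -1)
		goto unmap;
	/* Flush all dirty data for the file behind fd. */
	if (fsync(fd) == -1)
		goto unmap;
	ret = 0;
unmap:
	munmap(ptr, len);
out:
	close(fd);
	return (ret);
}
```

When the mapping covers the whole file, as here, both calls have the same data to flush, which matches the observation that msync() on the whole file should not differ in scope from fsync().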



The mmap() arguments -- specifically flags (see man page) -- also play
a role here.  The one that catches my attention is MAP_NOSYNC.  So you
may need to look at the mongodb code to figure out what its mmap()
call is.

One might wonder why they don't just use open() with O_SYNC.  I
imagine that has to do with, again, performance; possibly they don't want
all I/O synchronous, and would rather flush certain pages in the mmap'd
region to disk as needed.  I see the legitimacy in that approach (vs.
just using O_SYNC).

There's really no easy way for me to tell you which is more efficient,
better, blah blah without spending a lot of time with a benchmarking
program that tests all of this, *plus* an entire system (world) built
with profiling.


I ran mongodb for two hours with fsync() and got the following:
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 10:34:52 2011     3  192744    314 3080182

This is output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'.

Then I ran it with the default msync():
STARTED                  INBLK   OUBLK MAJFLT  MINFLT
Thu Dec 15 12:34:53 2011     0 7241555     79 5401945

There are also two graphs of disk busyness [1] [2].

The difference is significant: a factor of 37!  That is what I expected to get.

In the comments for vm_object_page_clean() I found this:

  *  When stuffing pages asynchronously, allow clustering.  XXX we need a
  *  synchronous clustering mode implementation.

To me this means that msync(MS_SYNC) flushes each page to disk in a separate
I/O transaction.  If we multiply 4K by 37 we get about 150K.  This number is
the size of a single transaction in my experience.
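
The figures above can be checked with a few lines of C. This sketch assumes the OUBLK column reads 192744 for the fsync() run and 7241555 for the msync() run, and that each msync() write covers one 4KB page; the function names are made up for illustration.

```c
#include <stdint.h>

/* Whole-number ratio of output blocks: msync() run over fsync() run. */
static uint64_t
oublk_ratio(uint64_t msync_oublk, uint64_t fsync_oublk)
{
	return (msync_oublk / fsync_oublk);
}

/*
 * If every msync() write is a single 4KB page, the same data written in
 * ratio-times fewer operations implies this many KB per operation.
 */
static uint64_t
implied_transaction_kb(uint64_t ratio)
{
	return (4 * ratio);
}
```

With those inputs the ratio comes out at 37 and the implied transaction size at 148KB, which is the "about 150K" quoted above.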

+alc@, kib@

Am I right? Is there any plan to implement this?

Current buffer clustering code can do only async writes. In fact, I
am not quite sure what would constitute sync clustering, because the
ability to delay the write is important to be able to cluster at all.

Also, I am not sure that lack of clustering is the biggest problem.
IMO, the fact that each write is sync is the first problem there. It
would be quite a bit of work to add the tracking of the issued writes to the
vm_object_page_clean() and down the stack. Esp. due to custom page
write 

Re: directory listing hangs in ufs state

2011-12-14 Thread Alan Cox
On Wed, Dec 14, 2011 at 12:22 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:

 On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
  Hi Jeremy,
 
  This is not hardware problem, I've already checked that. I also ran
  fsck today and got no errors.
 
  After some more exploration of how mongodb works, I found that when
  listing hangs, one of the mongodb threads is in biowr state for a long
  time. It periodically calls msync(MS_SYNC), according to the ktrace
  output.
 
  If I remove the msync() calls from mongodb, how often will data be
  synced by the OS?
 
  --
  Andrey Zonov
 
  On 14.12.2011 2:15, Jeremy Chadwick wrote:
  On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
  
  Have you any ideas what is going on? or how to catch the problem?
  
  Assuming this isn't a file on the root filesystem, try booting the
  machine in single-user mode and using fsck -f on the filesystem in
  question.
  
  Can you verify there's no problems with the disk this file lives on as
  well (smartctl -a /dev/disk)?  I'm doubting this is the problem, but
  thought I'd mention it.

 I have no real answer, I'm sorry.  msync(2) indicates it's effectively
 deprecated (see BUGS).  It looks like this is effectively a mmap-version
 of fsync(2).


Yikes, I just looked at this man page.  I'm afraid that the text in the
BUGS section is highly misleading.  The MS_INVALIDATE option should be
obsolete for the reason given there.  Under a strict reading of the
applicable standard, FreeBSD could implement this option as a NOP.
However, we treat it something like madvise(MADV_DONTNEED|FREE).  In
contrast, MS_SYNC is definitely not obsolete.

Alan

P.S. If someone wants to take a crack at fixing this man page, contact me
off list.


Re: PAE broken on 7-STABLE

2011-12-05 Thread Alan Cox
On Mon, Dec 5, 2011 at 4:15 PM, Arnaud Lacombe lacom...@gmail.com wrote:

 Hi,

 A FreeBSD 7-STABLE miserably crashes on the following:

 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address = 0xbfef
 fault code= supervisor read, page not present
 instruction pointer   = 0x20:0xc05fd1c2
 stack pointer = 0x28:0xc0af6c7c
 frame pointer = 0x28:0xc0af6cc0
 code segment  = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, def32 1, gran 1
 processor eflags  = interrupt enabled, resume, IOPL = 0
 current process   = 0 ()
 trap number   = 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper(c0662728,0,c062b78b,c0af6b28,0,...) at db_trace_self_wrapper+0x26
 panic(c062b78b,c06639cc,c06c1de4,1,1,...) at panic+0x106
 trap_fatal(c0c74388,c065b897,c064d922,10,c0c74000,...) at trap_fatal+0x270
 trap_pfault(c06d4e40,c0c74380,c0af6c40,3,c06c1bc0,...) at trap_pfault+0x2aa
 trap(c0af6c3c) at trap+0x36e
 calltrap() at calltrap+0x6
 --- trap 0xc, eip = 0xc05fd1c2, esp = 0xc0af6c7c, ebp = 0xc0af6cc0 ---
 pmap_map(c0af6d68,3f6ba000,6,3fef8000,6,...) at pmap_map+0x72
 vm_page_startup(c0d3e000,a,c0af6d88,c03f8f26,0,...) at vm_page_startup+0x35a
 vm_mem_init(0,af,af0020,af,0,...) at vm_mem_init+0x18
 mi_startup() at mi_startup+0x56
 begin() at begin+0x2c

 on a machine with 24GB of RAM, while PAE is meant to support up to 64GB.

  - Arnaud

 ps: this is just a report, I'm not really expecting anything, any
 longer, from the FreeBSD community.


At this early stage in the boot process, the page table pages for the
kernel address space must be statically allocated.  When PAE was still
actively used, it was unusual to find machines that had more than about
16GB of RAM.  So, the static allocation of page table pages was set
accordingly.  For larger machines, it is necessary to increase NKPT.  The
following comment appears in i386/include/pmap.h:

/* Initial number of kernel page tables. */
#ifndef NKPT
#ifdef PAE
/* 152 page tables needed to map 16G (76B struct vm_page, 2M page tables). */
#define NKPT		240
#else
/* 18 page tables needed to map 4G (72B struct vm_page, 4M page tables). */
#define NKPT		30
#endif
#endif

That said, a machine with 24GB of RAM is likely not going to be usable for
many workloads unless you also increase the size of the kernel virtual
address space (and thereby reduce the size of the user virtual address
space).
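
The numbers in that header comment can be reproduced with a short calculation. The sketch below is an assumption-laden illustration, not kernel code: it counts only the page table pages needed to map the vm_page array, taking the structure size and the span of one page table page from the comment (76 bytes and 2MB with PAE, 72 bytes and 4MB without). The function name is hypothetical.

```c
#include <stdint.h>

#define PAGE_BYTES 4096ULL

/*
 * Hypothetical helper: page table pages needed just to map the vm_page
 * array for ram_bytes of physical memory.  vm_page_bytes is the size of
 * one struct vm_page; pt_span_bytes is the amount of address space that
 * one page table page maps.
 */
static uint64_t
vm_page_array_page_tables(uint64_t ram_bytes, uint64_t vm_page_bytes,
    uint64_t pt_span_bytes)
{
	uint64_t array_bytes = (ram_bytes / PAGE_BYTES) * vm_page_bytes;

	/* Round up to a whole number of page table pages. */
	return ((array_bytes + pt_span_bytes - 1) / pt_span_bytes);
}
```

For 16GB with PAE this gives 152 and for 4GB without PAE it gives 18, matching the comment; for 24GB with PAE it gives 228. That figure covers the vm_page array alone, and other early kernel mappings draw on the same static allocation, so it does not by itself say what NKPT value to choose.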

Regards,
Alan


Re: svn commit: r227420 - stable/9/sys/vm

2011-11-10 Thread Alan Cox
On Thu, Nov 10, 2011 at 11:27 AM, George Kontostanos gkontos.m...@gmail.com
 wrote:

 Just out of curiosity or confusion maybe..

 Will those commits be included in FreeBSD 9.0-RELEASE ?

 Thanks


Yes, I believe so.

Alan


Re: 32GB limit per swap device?

2011-08-23 Thread Alan Cox

On 08/22/2011 21:36, Matthew Dillon wrote:

 The limitation was ONLY due to a *minor* 32-bit integer overflow in one
 or two *intermediate* calculations in the radix tree code, which I
 long ago fixed in DragonFly.

 Just find the changes in the DFly codebase and determine if they need
 to be applied.

 The swap space radix code (which I wrote long ago) is in page-sized
 blocks, so you actually probably want to keep using a 32-bit integer for
 the block number there to keep the physical memory reservation required
 for the radix tree low.  If you just pop the base block id up to 64 bits
 without adjusting the radix code to overlay a 64 bit bitmap on it you
 waste a lot of physical memory for the same amount of swap reservation.


Unfortunately, in FreeBSD, when daddr_t was increased to 64 bits a few 
years ago, the bitmap size was not increased.  So, we have been wasting 
about half the space used by the blist structure for some time now.  I 
expect that we'll fix this after the current code freeze ends.


However, as Alexander observed, the primary reason for the 32GB limit on 
the size of a swap partition was artificial.  The limit was being 
implemented in the wrong place.  It was being performed before the 
conversion from 512 byte blocks to 4KB pages.  Since a blist is managing 
swap space at the granularity of pages and the limit is imposed by the 
blist code, the check should come after the conversion.  Just by moving 
the check to its proper place, the effective limit can be increased from 
32GB to 256GB.  Kostik committed such a change earlier today.
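
The jump from 32GB to 256GB follows directly from the unit in which the block limit is applied. A minimal sketch, assuming the limit is the 67108864 blocks reported in the kernel warning quoted elsewhere in this thread, with 512-byte disk blocks and 4KB pages; the names below are illustrative only.

```c
#include <stdint.h>

/* Block-count limit, taken from the kernel warning in this thread. */
#define MAX_BLIST_BLOCKS 67108864ULL

/*
 * Hypothetical helper: largest swap device size in bytes when the block
 * limit is applied to blocks of unit_bytes each.
 */
static uint64_t
swap_limit_bytes(uint64_t unit_bytes)
{
	return (MAX_BLIST_BLOCKS * unit_bytes);
}
```

Applied to 512-byte disk blocks the limit works out to 32GB; applied to 4KB pages, which is the unit the blist actually manages, the same block count covers 256GB.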


Alan




Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox
On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ru wrote:

 On 10.08.2011 19:16, per...@pluto.rain.com wrote:

 Chuck Swiger cswi...@mac.com wrote:

  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:

 I am trying to set up 64GB partitions for swap for a system that
 has 64GB of RAM (with the idea to dump kernel core etc). But, on
 8-stable as of today I get:

 WARNING: reducing size to maximum of 67108864 blocks per swap unit

 Is there workaround for this limitation?


 Another interesting question:

 swap pager operates in page blocks (PAGE_SIZE=4k on common arch).

 The block device size is passed to swaponsomething() as a number of _disk_
 blocks (e.g. in DEV_BSIZE=512 units). After that, the maximum object check
 of the kernel b-lists (on top of which the swap pager is built) is enforced.

 The (possible) problem is that the real object count we will operate on is
 not the value passed to swaponsomething(), since that is in the wrong units.

 We should check the b-list limit against the (X * DEV_BSIZE / PAGE_SIZE)
 value, which is roughly (X / 8), so we should be able to address 32*8=256G.

 The code should look like this:

 Index: vm/swap_pager.c
 ===================================================================
 --- vm/swap_pager.c (revision 223877)
 +++ vm/swap_pager.c (working copy)
 @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
         u_long mblocks;

         /*
 +        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
 +        * First chop nblks off to page-align it, then convert.
 +        *
 +        * sw->sw_nblks is in page-sized chunks now too.
 +        */
 +       nblks &= ~(ctodb(1) - 1);
 +       nblks = dbtoc(nblks);
 +
 +       /*
          * If we go beyond this, we get overflows in the radix
          * tree bitmap code.
          */
 @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
                     mblocks);
                 nblks = mblocks;
         }
 -       /*
 -        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
 -        * First chop nblks off to page-align it, then convert.
 -        *
 -        * sw->sw_nblks is in page-sized chunks now too.
 -        */
 -       nblks &= ~(ctodb(1) - 1);
 -       nblks = dbtoc(nblks);

         sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
         sp->sw_vp = vp;

 (move pages recalculation before b-list check)


 Can someone comment on this?


I believe that you are correct.  Have you tried testing this change on a
large swap device?

Alan


Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox

On 08/20/2011 12:41, Kostik Belousov wrote:

On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:

On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ru wrote:


On 10.08.2011 19:16, per...@pluto.rain.com wrote:


Chuck Swiger cswi...@mac.com wrote:

  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:

I am trying to set up 64GB partitions for swap for a system that
has 64GB of RAM (with the idea to dump kernel core etc). But, on
8-stable as of today I get:

WARNING: reducing size to maximum of 67108864 blocks per swap unit

Is there workaround for this limitation?


Another interesting question:

swap pager operates in page blocks (PAGE_SIZE=4k on common arch).

The block device size is passed to swaponsomething() as a number of _disk_
blocks (e.g. in DEV_BSIZE=512 units). After that, the maximum object check
of the kernel b-lists (on top of which the swap pager is built) is enforced.

The (possible) problem is that the real object count we will operate on is
not the value passed to swaponsomething(), since that is in the wrong units.

We should check the b-list limit against the (X * DEV_BSIZE / PAGE_SIZE)
value, which is roughly (X / 8), so we should be able to address 32*8=256G.

The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c (revision 223877)
+++ vm/swap_pager.c (working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
        u_long mblocks;

        /*
+        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+        * First chop nblks off to page-align it, then convert.
+        *
+        * sw->sw_nblks is in page-sized chunks now too.
+        */
+       nblks &= ~(ctodb(1) - 1);
+       nblks = dbtoc(nblks);
+
+       /*
         * If we go beyond this, we get overflows in the radix
         * tree bitmap code.
         */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
                    mblocks);
                nblks = mblocks;
        }
-       /*
-        * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-        * First chop nblks off to page-align it, then convert.
-        *
-        * sw->sw_nblks is in page-sized chunks now too.
-        */
-       nblks &= ~(ctodb(1) - 1);
-       nblks = dbtoc(nblks);

        sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
        sp->sw_vp = vp;


(move pages recalculation before b-list check)


Can someone comment on this?



I believe that you are correct.  Have you tried testing this change on a
large swap device?

I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.



I'm uncertain whether the current limit, 0x40000000 /
BLIST_META_RADIX, is exact or not, but I doubt that it is too large.



When the initial code was committed, our daddr_t was 32bit, I checked
the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32bits of daddr_t.

Especially interesting is the following typedef:
typedef uint32_t u_daddr_t;  /* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.

I wonder whether we could just use full 64bit and de-facto remove the
limitation on the swap partition size.


I would rather argue first that the subr_blist code should not be using
daddr_t at all.  The code is abusing daddr_t and defining u_daddr_t to
represent things that are not disk addresses.  Instead, it should either 
define its own type or directly use (u)int*_t.  Then, as for choosing 
between 32 and 64 bits, I'm skeptical of using this structure for 
managing more than 32 bits worth of blocks, given the amount of RAM it 
will use.





Re: MCA: CPU 0 UNCOR PCC DTLB L1 error

2011-05-11 Thread Alan Cox
On Tue, May 10, 2011 at 7:52 AM, John Hay j...@meraka.org.za wrote:

 Hi,

 I have seen this panic a few times on a Gigabyte E350N-USB3 running
 8-STABLE. I have only seen it while in X, but then the machine is always
 in X. At first, I just got these hangs, so I bought a PCI-express RS232
 card and could see these at last. For some reason it does not go past
 this, so I have not been able to get a dump yet.

 Does anybody have an idea of why this is or how to debug it further? I searched
 the archives and found something similar about a year ago, but it looks
 like it was solved with a fix that got committed.

 http://www.freebsd.org/cgi/query-pr.cgi?pr=140338

 I have now disabled mca in loader.conf with 'hw.mca.enabled=0' and I have
 not seen that panic again. I do occasionally see a panic in devfs_open(),
 but I guess that should be handled in another thread.

 The kernel is basically a GENERIC kernel with puc uncommented and the
 following in loader.conf

 vm.kmem_size=12G
 hw.mca.enabled=0
 zfs_load=YES
 ahci_load=YES
 xhci_load=YES
 amdtemp_load=YES
 ng_ubt_load=YES
 uplcom_load=YES

 Here is the panic message and after that dmesg.

 John
 --
 John Hay -- j...@meraka.csir.co.za / j...@freebsd.org

 
 MCA: Bank 0, Status 0xb6010015
 MCA: Global Cap 0x0106, Status 0x0004
 MCA: Vendor AuthenticAMD, ID 0x500f10, APIC ID 0
 MCA: CPU 0 UNCOR PCC DTLB L1 error
 MCA: Address 0x8016c4000


 Fatal trap 28: machine check trap while in user mode
 cpuid = 0; apic id = 00
 instruction pointer = 0x43:0x80156af85
 stack pointer   = 0x3b:0x7fffcb18
 frame pointer   = 0x3b:0x80fe87800
 code segment= base 0x0, limit 0xf, type 0x1b
= DPL 3, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, IOPL = 0
 current process = 2484 (initial thread)
 trap number = 28
 panic: machine check trap
 cpuid = 0
 KDB: stack backtrace:
 #0 0x80608d5e at kdb_backtrace+0x5e
 #1 0x805d6707 at panic+0x187
 #2 0x808bf4c0 at trap_fatal+0x290
 #3 0x808bfaa9 at trap+0x109
 #4 0x808a7d94 at calltrap+0x8
 


Please try the following patch:

Index: x86/x86/mca.c
===================================================================
--- x86/x86/mca.c   (revision 219060)
+++ x86/x86/mca.c   (working copy)
@@ -665,7 +665,8 @@ mca_setup(uint64_t mcg_cap)
         * for Erratum 383.
         */
        if (cpu_vendor_id == CPU_VENDOR_AMD &&
-           CPUID_TO_FAMILY(cpu_id) == 0x10 && amd10h_L1TP)
+           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
+           CPUID_TO_FAMILY(cpu_id) == 0x14) && amd10h_L1TP)
                workaround_erratum383 = 1;

        mtx_init(&mca_lock, "mca", NULL, MTX_SPIN);
Index: i386/i386/pmap.c
===================================================================
--- i386/i386/pmap.c(revision 219060)
+++ i386/i386/pmap.c(working copy)
@@ -758,7 +758,8 @@ pmap_init(void)
         * machine monitor.
         */
        if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
-           CPUID_TO_FAMILY(cpu_id) == 0x10)
+           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
+           CPUID_TO_FAMILY(cpu_id) == 0x14))
                workaround_erratum383 = 1;

        /*
Index: amd64/amd64/pmap.c
===================================================================
--- amd64/amd64/pmap.c  (revision 219060)
+++ amd64/amd64/pmap.c  (working copy)
@@ -727,7 +727,8 @@ pmap_init(void)
         * machine monitor.
         */
        if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
-           CPUID_TO_FAMILY(cpu_id) == 0x10)
+           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
+           CPUID_TO_FAMILY(cpu_id) == 0x14))
                workaround_erratum383 = 1;

        /*
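
The family test in the patch can be related to the MCA output quoted earlier in this message with a small decoder. This is a sketch following the usual x86 CPUID convention (base family in bits 8-11, extended family in bits 20-27, added when the base family is 0xf); the function names are hypothetical and this is not the kernel's CPUID_TO_FAMILY macro.

```c
#include <stdint.h>

/* Decode the display family from a CPUID signature such as 0x500f10. */
static unsigned
cpuid_display_family(uint32_t id)
{
	unsigned base = (id >> 8) & 0xf;
	unsigned ext = (id >> 20) & 0xff;

	return (base == 0xf ? base + ext : base);
}

/* Family test corresponding to the patched condition (0x10 or 0x14). */
static int
family_matches_patched_check(uint32_t id)
{
	unsigned family = cpuid_display_family(id);

	return (family == 0x10 || family == 0x14);
}
```

The ID 0x500f10 from the panic report decodes to family 0x14, which the unpatched test for family 0x10 alone does not match; that is presumably why the patch widens the condition.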


Re: panic on vm_page_cache_transfer: object 0xfffffff0035508000's type is not compatible with cache pages

2011-03-18 Thread Alan Cox

On 03/08/2011 08:15, John Baldwin wrote:

On Tuesday, March 08, 2011 5:54:36 am Willem Jan Withagen wrote:

System:

FreeBSD zfs.digiware.nl 8.2-STABLE FreeBSD 8.2-STABLE #1: Sat Feb 26
06:28:43 CET 2011
r...@zfs.digiware.nl:/usr/obj/usr/src/src8/src/sys/ZFS  amd64

Don't have a serial console, so I wrote down the traceback.
But my guess is that that is not enough, however I needed the system so
I rebooted.

tb:
vm_object_split   at  +0x125
vm_space_fork     at  +0x3f7
fork1             at  +0x6a9
fork              at  +0xee
syscall_entr      at  +1c
syscall           at  +4c

rip = 0x8006bc39c
rsp = 0x7fffe9d8
rbp = 0x800a04470

It looks a lot like what I find on
http://people.freebsd.org/~pho/stress/log/kostik079.html

But my system is amd64, with 8Gb RAM and is fully ZFS based
with swap on 2 gpt freebsd-swap partitions.

System crashed last night around 1:30, which is when a few large rsync
backups are coming in.

Would I be able to call doadump to obtain something usefull afterward
(provided I have savecore set?)

Hmm, judging from the info at the URL above, I'm not sure what to make of this
assertion.  In vm_object_split(), the 'new_object' is always OBJT_DEFAULT, so
it will always fail that half of the assertion.  In fact, this is the only
place that vm_page_cache_transfer() is called, so 'new_object->type ==
OBJT_SWAP' is pretty much guaranteed to almost never be true.

I guess it is assuming that swap_pager_copy() would have always converted
'new_object' to OBJT_SWAP if it had any cache pages?  Perhaps that is a bogus
assumption if 'orig_object' only has cache pages and no currently-swapped out
pages (or if the swapped out pages are not in the range of the new object)?

I've cc'd Alan to see if he has any ideas.



Yes, it is assuming that the object is converted to OBJT_SWAP.  As a 
rule, for a page to be PG_CACHE, it should exist somewhere on secondary 
storage.


Alan



Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Thread Alan Cox
2010/12/23 Dan Langille d...@langille.org

 On 12/22/2010 9:57 AM, John Baldwin wrote:

 On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:

 Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
 Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105,
 Status 0x
 Dec 21 12:42:26 kavkaz kernel: MCA: Vendor AuthenticAMD, ID 0x40f33,
 APIC ID 0
 Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD
 Memory
 Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0


 You are getting corrected ECC errors in your RAM.  You see them once an
 hour
 because we poll the machine check registers once an hour.  If this happens
 constantly you might have a DIMM that is dying?


 John:

 I take it these ECC errors *may* have been happening for some time. What
 has changed is the OS now polls for the errors and reports them.


Yes, we enabled MCA by default in 8.1-RELEASE.

Alan


Re: MCA messages after upgrade to 8.2-BEAT1

2010-12-24 Thread Alan Cox
On Fri, Dec 24, 2010 at 5:08 PM, Carl Johnson ca...@peak.org wrote:

 Alan Cox alan.l@gmail.com writes:

  2010/12/23 Dan Langille d...@langille.org
 
  On 12/22/2010 9:57 AM, John Baldwin wrote:
 
  On Wednesday, December 22, 2010 7:41:25 am Miroslav Lachman wrote:
 
  Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e4833
  Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0105,
  Status 0x
  Dec 21 12:42:26 kavkaz kernel: MCA: Vendor AuthenticAMD, ID 0x40f33,
  APIC ID 0
  Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD
  Memory
  Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0
 
 
  You are getting corrected ECC errors in your RAM.  You see them once an
  hour
  because we poll the machine check registers once an hour.  If this
 happens
  constantly you might have a DIMM that is dying?
 
 
  John:
 
  I take it these ECC errors *may* have been happening for some time. What
  has changed is the OS now polls for the errors and reports them.
 
 
  Yes, we enabled MCA by default in 8.1-RELEASE.

 Is there some reason that it is only available for i386 and not for
 amd64?  Linux has something called mcelog, for machine check errors,
 which sounds similar and is available for amd64.


Perhaps I'm misunderstanding your question, but our MCA driver is supported
and enabled by default on both i386 and amd64.

Alan


Re: Panic: attempted pmap_enter on 2MB page

2010-10-08 Thread Alan Cox

Kurt Alstrup wrote:

Apologies for late response, wanted to check the code again.


On 10/07/2010 10:03 AM, Alan Cox wrote:
  

Alan Cox wrote:
At a high-level, I agree with much of what you say.  In particular, if
pmap_enter() is applied to a virtual address that is already mapped by a large
page, the reported panic could result.  However, barring bugs, for example, in
memory allocation by the upper levels of the kernel, the panic inducing
situation shouldn't occur.


Calls to malloc() of items larger than a page take a turn through UMA and
eventually end up in kmem_malloc() via its page_alloc() routine.
kmem_malloc() in turn gets the pages from vm_page_alloc(), parks them in
the kmem_object and maps them into the kernel_pmap in a loop calling
pmap_enter() for each page. The assigned VAs are pulled from kmem_map.
Pages acquired through vm_page_alloc() may be backed by a superpage
reservation and thus are eligible for auto-promotion.

Calls to free() initially take a similar route, ending up in kmem_free()
via UMA's page_free() routine. From there the call path is vm_map_remove(),
vm_map_delete() to pmap_remove().

This logic indicates that, from the kernel/vm perspective, the malloc()/free()
pair will map/unmap pages as needed. However, the pmapper never unmaps
these pages as far as I can tell. The call path is pmap_remove(),
pmap_remove_pte() to pmap_unuse_pt(), which ignores the removal because the
VA >= VM_MAXUSER_ADDRESS.

  


No, consider:

static int
pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t va,
   pd_entry_t ptepde, vm_page_t *free)
{
   pt_entry_t oldpte;
   vm_page_t m;

   PMAP_LOCK_ASSERT(pmap, MA_OWNED);
   oldpte = pte_load_clear(ptq);

pte_load_clear() zeroes the PTE, regardless of whether it is a kernel or 
user PTE.
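
The point about pte_load_clear() can be illustrated outside the kernel. What follows is a minimal user-space sketch, assuming that the operation behaves like an atomic exchange with zero; the kernel's own definition is not quoted in this thread, and model_pte_load_clear() is a hypothetical stand-in built on C11 atomic_exchange().

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pt_entry_t;

/*
 * User-space model of pte_load_clear() (an assumption, not kernel code):
 * atomically replace the entry with zero and return what it held before.
 * Nothing here inspects the virtual address, so a kernel PTE and a user
 * PTE are cleared in exactly the same way.
 */
static pt_entry_t
model_pte_load_clear(_Atomic pt_entry_t *ptq)
{
	return (atomic_exchange(ptq, (pt_entry_t)0));
}
```

The caller gets the old contents back and can still inspect their bits, while the entry itself is already zero.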


I'm afraid that I need to catch an airplane.  I'll follow up to the rest 
of your message later.


Regards,
Alan




Re: Panic: attempted pmap_enter on 2MB page

2010-10-07 Thread Alan Cox

Kurt Alstrup wrote:

Up front disclaimer: I may very well be wrong on this..

  


At a high-level, I agree with much of what you say.  In particular, if 
pmap_enter() is applied to a virtual address that is already mapped by a 
large page, the reported panic could result.  However, barring bugs, for 
example, in memory allocation by the upper levels of the kernel, the 
panic inducing situation shouldn't occur.


At a lower-level, it appears that you are misinterpreting what 
pmap_unuse_pt() does.  It is not pmap_unuse_pt()'s responsibility to 
clear page table entries, either for user-space page tables or the 
kernel page table.  When a region of the kernel virtual address space is 
deallocated, we do clear the kernel page table entries.  See, for 
example, pmap_remove_pte() (or pmap_qremove()).


The special case for the kernel page table in pmap_unuse_pt() has a 
different purpose.  User-space page table pages are reference counted.  
When there are no longer any valid mappings contained in a user-space 
page table page, we remove that page from the page table and free it.  
However, for the kernel page table, we never free unused page table 
pages.  Instead, they persist, but with all of their mappings marked 
invalid.  In other words, the special case for the kernel page table in 
pmap_unuse_pt() is skipping the code that unmaps and frees the unused 
page table page.
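
The difference between the two cases can be sketched as a toy model. This is a simplified illustration, not the kernel's pmap_unuse_pt(): the structure, the boundary constant and every model_ name are hypothetical, and the real code tracks far more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boundary between user and kernel virtual addresses. */
#define MODEL_VM_MAXUSER_ADDRESS 0x0000800000000000ULL

struct model_ptpage {
	unsigned valid_mappings;	/* valid PTEs left in this page */
	bool linked;			/* still installed in the page table? */
};

/*
 * Called after one PTE in 'pt' has already been cleared by the caller.
 * Returns true if the page table page itself was unlinked and can be
 * freed.  For kernel addresses the page table page always persists.
 */
static bool
model_unuse_pt(uint64_t va, struct model_ptpage *pt)
{
	if (va >= MODEL_VM_MAXUSER_ADDRESS)
		return (false);		/* kernel: keep the page */
	if (--pt->valid_mappings == 0) {
		pt->linked = false;	/* user: last mapping gone */
		return (true);
	}
	return (false);
}
```

In both cases the PTE has been cleared before this function runs; the early return only decides whether the now-empty page table page is torn down.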


This special handling of the kernel page table does create another 
special case when we destroy a large page mapping within the kernel 
address space.  We have to reinsert into the kernel page table the old 
page table page (for small page mappings) that was sitting idle while 
the kernel page table held a large page mapping in its place.  At the 
same time, all of the page table entries in this page must be 
invalidated.  This is handled by pmap_remove_pde().


I'm curious to know more about what you were doing when you encountered this 
panic.  Were you also using ISO images containing large MFS roots?


Regards,
Alan



Re: Panic: attempted pmap_enter on 2MB page

2010-10-06 Thread Alan Cox

Dave Hayes wrote:

Alan Cox a...@rice.edu writes:
  

[snip]

Is this problem reproducible?  I don't recall if you mentioned that
earlier.



Sort of. 


It seems that every time I generate a bootable FreeBSD ISO, a die is
rolled.  If it comes up a certain number then it crashes, otherwise it's
fine. ;)

My ISO generation process might be relevant; I create a 600MB ramdisk
(it used to be 512 on FreeBSD 7.3) which loads from the ISO on
boot. This winds up being the root partition. 


As a datapoint the same die roll happens on FreeBSD 7.3 although the
chance of working seems to be greater. 


If you'd like a copy of the ISO to see this for yourself I can make it
available. I'm guessing it will also crash for you in this way modulo
hardware issues. 
  


When you build your kernel for this ISO are you increasing the value of 
NKPT?


Alan



Re: Panic: attempted pmap_enter on 2MB page

2010-10-06 Thread Alan Cox

Dave Hayes wrote:

Alan Cox a...@rice.edu writes:
  
When you build your kernel for this ISO are you increasing the value of 
NKPT?



No. I was under the impression that this value auto-tunes on amd64,
is that correct?
  


After initialization, yes.  However, the kernel starts out with just 
NKPT page table pages.  With a 600MB MFS root in your kernel, the 
default setting for NKPT won't provide enough page table pages to map 
your entire kernel during initialization.


Try setting NKPT to 320, and let me know if the strange crashes go away.
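
The sizing argument above can be checked with a little arithmetic. This is a minimal sketch, assuming 4KB pages and 512 entries per page table page, so that each page table page maps 2MB. It takes 32 as the default NKPT, which is the figure quoted in the mdconfig thread elsewhere in this archive. The model_ names are hypothetical, not kernel identifiers.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096ULL	/* assumed base page size */
#define MODEL_NPTEPG		512ULL	/* assumed PTEs per page table page */
#define MODEL_BYTES_PER_PTPAGE	(MODEL_PAGE_SIZE * MODEL_NPTEPG)	/* 2MB */

/* Bytes of kernel address space that 'nkpt' page table pages can map. */
static uint64_t
model_nkpt_coverage(uint64_t nkpt)
{
	return (nkpt * MODEL_BYTES_PER_PTPAGE);
}

/* Page table pages needed to map an image of 'image_bytes', rounded up. */
static uint64_t
model_nkpt_needed(uint64_t image_bytes)
{
	return ((image_bytes + MODEL_BYTES_PER_PTPAGE - 1) /
	    MODEL_BYTES_PER_PTPAGE);
}
```

Under these assumptions 32 page table pages cover 64MB, a 600MB MFS root alone needs 300, and 320 cover 640MB, which leaves some room for the kernel itself.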

Alan



Re: Panic: attempted pmap_enter on 2MB page

2010-10-05 Thread Alan Cox

Dave Hayes wrote:

Alan Cox alan.l@gmail.com writes:
  

I'm afraid that I can't offer much insight without a stack trace.  At
initialization time, we map the kernel with 2MB pages.  I suspect that
something within the kernel is later trying to change one of those mappings.
If I had to guess, it's related to the mfs root.



Here is the stack trace. The machine is sitting here in KDB if you
need me to extract any information from it. I

  db> bt
  Tracing pid 0 tid 0 td 0x80c67140
  kdb_enter() at kdbenter+0x3d
  panic() at panic+0x17b
  pmap_enter() at pmap_enter+0x641
  kmem_malloc() at kmem_malloc+0x1b5
  uma_large_malloc() at uma_large_malloc+0x4a
  malloc() at malloc+0xd7
  acpi_alloc_wakeup_handler() at acpi_alloc_wakeup_handler+0x82
  mi_startup() at mi_startup+0x59
  btext() at btext+0x2c
  db>

  


Thanks.

There are two pieces of information that might be helpful: the value of 
the global variable kernel_vm_end and the virtual address that was 
passed to pmap_enter().


Is this problem reproducible?  I don't recall if you mentioned that earlier.

Can you take a crash dump?

Alan



Re: Panic: attempted pmap_enter on 2MB page

2010-10-03 Thread Alan Cox
On Sat, Oct 2, 2010 at 9:11 PM, Dave Hayes d...@jetcafe.org wrote:

 What does the above mentioned panic mean? I'm booting from
 an mfsroot off of a DVD with a loader.conf like this:

  autoboot_delay=5
  mfsroot_load=YES
  mfsroot_type=mfs_root
  mfsroot_name=/mfsboot
  vfs.root.mountfrom=ufs:md0
  vfs.root.mountfrom.options=rw
  kern.ipc.nmbclusters=32768
  net.inet.tcp.tcbhashsize=16384
  vm.pmap.pg_ps_enabled=1
  vm.kmem_size=2G
  accf_http_load=YES
  net.inet.tcp.syncache.hashsize=1024
  net.inet.tcp.syncache.bucketlimit=100

 This is FreeBSD 8.1-RELEASE amd64 running with the debugger installed
 into the kernel. Thanks in advance for any insight provided. :)


I'm afraid that I can't offer much insight without a stack trace.  At
initialization time, we map the kernel with 2MB pages.  I suspect that
something within the kernel is later trying to change one of those mappings.
If I had to guess, it's related to the mfs root.

Regards,
Alan


Re: today's 8.1/i386: panic: bad pte

2010-07-20 Thread Alan Cox
On Mon, Jul 19, 2010 at 11:40 PM, Mikhail T. mi+t...@aldan.algebra.com
wrote:

 Some part of KDE4's kdm crashed at start-up and seems to have taken the
 entire machine with it:

   kgdb /boot/kernel/kernel /var/crash/vmcore.22
   GNU gdb 6.1.1 [FreeBSD]
   Copyright 2004 Free Software Foundation, Inc.
   GDB is free software, covered by the GNU General Public License, and
   you are
   welcome to change it and/or distribute copies of it under certain
   conditions.
   Type "show copying" to see the conditions.
   There is absolutely no warranty for GDB.  Type "show warranty" for
   details.
   This GDB was configured as i386-marcel-freebsd...

   Unread portion of the kernel message buffer:
   <6>pid 18398 (drkonqi), uid 0: exited on signal 11 (core dumped)
   TPTE at 0xbfca9488  IS ZERO @ VA 2a522000
   panic: bad pte
   Uptime: 2h28m24s
   Physical memory: 1263 MB
   Dumping 195 MB: 180 164 148 132 116 100 84 68 52 36 20 4

   Reading symbols from /boot/kernel/splash_pcx.ko...Reading symbols
   from /boot/kernel/splash_pcx.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/splash_pcx.ko
   Reading symbols from /boot/kernel/vesa.ko...Reading symbols from
   /boot/kernel/vesa.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/vesa.ko
   Reading symbols from /boot/modules/nvidia.ko...done.
   Loaded symbols for /boot/modules/nvidia.ko
   Reading symbols from /boot/kernel/linux.ko...Reading symbols from
   /boot/kernel/linux.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/linux.ko
   Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
   /boot/kernel/acpi.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/acpi.ko
   Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols
   from /boot/kernel/linprocfs.ko.symbols...done.
   done.
   Loaded symbols for /boot/kernel/linprocfs.ko
   #0  doadump () at pcpu.h:231
   231 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
   (kgdb) bt full
   #0  doadump () at pcpu.h:231
   No locals.
   #1  0xc05d10a4 in boot (howto=260) at
   /usr/src/sys/kern/kern_shutdown.c:416
_giantcnt = Variable _giantcnt is not available.
   (kgdb) where
   #0  doadump () at pcpu.h:231
   #1  0xc05d10a4 in boot (howto=260) at
   /usr/src/sys/kern/kern_shutdown.c:416
   #2  0xc05d12b1 in panic (fmt=Variable fmt is not available.
   ) at /usr/src/sys/kern/kern_shutdown.c:590
   #3  0xc07f0406 in pmap_remove_pages (pmap=0xc85bbc78) at
   /usr/src/sys/i386/i386/pmap.c:4198
   #4  0xc079516b in vmspace_exit (td=0xc51f3a00) at
   /usr/src/sys/vm/vm_map.c:409
   #5  0xc05a7253 in exit1 (td=0xc51f3a00, rv=139) at
   /usr/src/sys/kern/kern_exit.c:303
   #6  0xc05d3296 in sigexit (td=0xc51f3a00, sig=139) at
   /usr/src/sys/kern/kern_sig.c:2872
   #7  0xc05d47a8 in postsig (sig=11) at /usr/src/sys/kern/kern_sig.c:2759
   #8  0xc06082f8 in ast (framep=0xe5fafd38) at
   /usr/src/sys/kern/subr_trap.c:234
   #9  0xc07e2c44 in doreti_ast () at
   /usr/src/sys/i386/i386/exception.s:368

 Does this look familiar to anyone? Thanks!


Historically, this panic has indicated flakey memory.  This panic occurs
because a memory location within a page table has unexpectedly changed to
zero.

Alan


Re: today's 8.1/i386: panic: bad pte

2010-07-20 Thread Alan Cox

Mikhail T. wrote:

20.07.2010 12:47, Alan Cox wrote:
Historically, this panic has indicated flakey memory.  This panic 
occurs because a memory location within a page table has unexpectedly 
changed to zero.
Ouch... Thanks for the hint (maybe, the panic should say something 
like that?)


In any case, is there a way to identify the flakey DIMM? I did run 
memtest on this box and haven't received any errors... Thanks! Yours,


No, not from the panic message.  If a thorough memtest didn't turn up a 
problem, then I would start looking for another cause.


Alan



Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?

2010-07-10 Thread Alan Cox
On Fri, Jul 9, 2010 at 6:53 PM, Markus Gebert markus.geb...@hostpoint.ch wrote:
[snip]


 Yes, this hardware comes from Sun directly, but getting Sun (/Oracle)
 support for this issue is gonna be tough. FreeBSD is unsupported, and in a
 short test we couldn't reproduce the problem with a Linux kernel. While I
 agree that a hardware issue has always been and still is a possibility to be
 considered, the fact that we tested this on two machines remains as well as
 the fact that 6.x, 7.x do not show the behavior. Another possibility is of
 course, that the X4100 is prone to such issues and somehow 6.x and 7.x have
 workarounds we're not aware of or just do something different in a way so that
 this issue does not get triggered.


8.1 is our first release to have the driver for configuring and reporting
machine check exceptions enabled by default.  Prior to 8.1, you had to
explicitly enable the driver at boot time.

Alan


Re: Locking a file backed mdconfig into memory

2010-06-04 Thread Alan Cox

Matthew D Fleming wrote:

On Fri, Jun 04, 2010 at 08:20:49AM -0400, John Baldwin wrote:
  
Hmmm, I would just try increasing NKPT then.  You might have to poke 
around in sys/amd64 to see what the default size is and how to tune 
it.



When Isilon did the stable/7 merge and the amd64 default NKPT changed from 
240 to 32, amd64 started having weird pmap issues during boot.  At panic 
time the stack wasn't very useful, and I didn't finish debugging the 
issue since eventually I just had to get something working.  We just 
reverted NKPT to 240 and it worked for us.  I didn't see anything in 
options.amd64 so I hard-coded it in amd64/include/pmap.h.


Supposedly amd64 can deal with a small NKPT and grow dynamically, but it 
didn't seem to work for us. :-( Perhaps when we do the next merge 
project I'll have a few days to devote to debugging the root cause.
  


NKPT controls the number of page table pages that are initially 
allocated at the bottom of the top 2GB of the kernel address space.  
However, the vast majority of the kernel address space, 510GB in
FreeBSD >= 7.3, is below these page table pages.  The page table pages for this 
region are dynamically allocated as needed.


If you're booting a kernel and modules greater than 64GB in size, then I 
can certainly see why you would need to increase NKPT.


John, is there some way to know at boot time how big the kernel and 
modules were?  Then, we could probably eliminate NKPT.


Alan



Re: Locking a file backed mdconfig into memory

2010-06-04 Thread Alan Cox

On 6/4/2010 1:53 PM, John Baldwin wrote:

On Friday 04 June 2010 1:58:13 pm Alan Cox wrote:
   

Matthew D Fleming wrote:
 

On Fri, Jun 04, 2010 at 08:20:49AM -0400, John Baldwin wrote:

   

Hmmm, I would just try increasing NKPT then.  You might have to poke
around in sys/amd64 to see what the default size is and how to tune
it.

 

When Isilon did the stable/7 merge and the amd64 default NKPT changed from
240 to 32, amd64 started having weird pmap issues during boot.  At panic
time the stack wasn't very useful, and I didn't finish debugging the
issue since eventually I just had to get something working.  We just
reverted NKPT to 240 and it worked for us.  I didn't see anything in
options.amd64 so I hard-coded it in amd64/include/pmap.h.

Supposedly amd64 can deal with a small NKPT and grow dynamically, but it
didn't seem to work for us. :-( Perhaps when we do the next merge
project I'll have a few days to devote to debugging the root cause.

   

NKPT controls the number of page table pages that are initially
allocated at the bottom of the top 2GB of the kernel address space.
However, the vast majority of the kernel address space, 510GB in
FreeBSD >= 7.3, is below these page table pages.  The page table pages for this
region are dynamically allocated as needed.

If you're booting a kernel and modules greater than 64GB in size, then I
can certainly see why you would need to increase NKPT.
 

64GB seems like a lot of address space, I would not expect that to be
completely used by kernel and modules.  I think earlier in the thread someone
said they had problems with a mere 295MB mfsroot.

   


Oops.  I meant to say 64MB.  :-)


John, is there some way to know at boot time how big the kernel and
modules were?  Then, we could probably eliminate NKPT.
 

I think the loader knows, so it could pass that info to the kernel.

   




Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920

2010-04-13 Thread Alan Cox
On Tue, Apr 13, 2010 at 12:35 AM, Andrew Snow a...@modulus.org wrote:


 The statements about the scheduler flipping between cores is also somewhat
 false, ULE does the right thing now for long-running computational threads.

 Furthermore, I can't see how a Gflops benchmark which fits in the CPU cache
 has anything to do with the memory architecture of the operating system.


It can.  Search the web for descriptions of page coloring.  Roughly
speaking, if your cache is physically indexed, the way in which the virtual
memory system allocates physical pages to virtual addresses can affect
whether or not the cache is fully utilized.  In a pathological case, those
physical pages that your application touches reside in the same part of the
cache and consequently you suffer frequent conflict misses.  Meanwhile, the
other parts of the cache go unused.  Page coloring creates a predictable
mapping between virtual and physical addresses so that a carefully written
application can avoid the pathological case.

Our support for superpages has the same effect.
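
The conflict-miss argument can be made concrete with a small calculation. This is a minimal sketch, assuming a physically indexed, set-associative cache. The number of page colors is taken to be the size of one cache way divided by the page size, and a page's color is its physical frame number modulo that count. The function names and the sample geometry in the note below (8MB, 16-way, 4KB pages) are hypothetical and are not taken from any particular processor.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page colors for a physically indexed cache. */
static unsigned
model_num_colors(uint64_t cache_bytes, unsigned ways, uint64_t page_bytes)
{
	uint64_t way_bytes;

	assert(ways > 0 && page_bytes > 0);
	way_bytes = cache_bytes / ways;
	return (way_bytes <= page_bytes ? 1 :
	    (unsigned)(way_bytes / page_bytes));
}

/* Color of the physical page that contains 'paddr'. */
static unsigned
model_page_color(uint64_t paddr, uint64_t page_bytes, unsigned colors)
{
	assert(colors > 0 && page_bytes > 0);
	return ((unsigned)((paddr / page_bytes) % colors));
}
```

With the sample geometry there are 128 colors. Physical pages whose frame numbers differ by a multiple of 128 share a color and compete for the same cache sets, while 128 physically contiguous pages cover every color exactly once.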

Alan


Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920

2010-04-12 Thread Alan Cox
On Sun, Apr 11, 2010 at 11:12 PM, Maho NAKATA cha...@mac.com wrote:

 Hi FreeBSD developers,
 [the original article in Japanese can be found at
 http://blog.goo.ne.jp/nakatamaho/e/b5f6fbc3cc6e1ac4947463eb1ca4eb0a ]

 *Abstract*
 I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10 amd64
 using dgemm
 (a linear algebra routine, matrix-matrix multiplication).
 I obtained only 70% of theoretical peak performance on FreeBSD 8/amd64 and
 almost 95% on Ubuntu 9.10 /amd64. I'm really disappointed.

 *Introduction*
 I'm a friend of Gotoh Kazushige, the principal developer of GotoBLAS. He
 told me that FreeBSD is not a suitable OS for scientific computing or
 high performance computing. He says (in Japanese and my translation):

  I guess FreeBSD does page coloring, but I don't think FreeBSD considers
  the very large cache sizes that recent CPUs have. Support for very large
  caches on Linux is still not very sophisticated, but on *BSDs it's
  worse; they use too fine-grained a memory allocation method, so we
  cannot expect large contiguous physical memory allocation.


These statements about FreeBSD's memory management are wrong, or at least
outdated.  FreeBSD is very likely to allocate physical memory in contiguous
chunks to your memory-hungry application even if automatic superpage
promotion does not occur.

You should refer your friend to my paper at
http://www.usenix.org/events/osdi02/tech/full_papers/navarro/navarro_html/ and
tell him that FreeBSD >= 7.2 implements a variation on what that paper
describes.

Regards,
Alan


Re: Strange problem with 8-stable, VMWare vSphere 4 AMD CPUs (unexpected shutdowns)

2010-02-23 Thread Alan Cox

Alan Cox wrote:
The next public revision guide from AMD will contain an erratum (383) 
that documents the bug.  However, it doesn't really tell us anything 
that we didn't already know.


Could someone on this list please test the attached patch in an amd64 
FreeBSD 8 guest running on vSphere 4 with an AMD Family 10h processor 
underneath?  Before testing the patch, remove the manual setting of 
vm.pmap.pg_ps_enabled=0 from /boot/loader.conf.  After booting the 
virtual machine, please run sysctl vm.pmap.pg_ps_enabled to verify 
that superpage promotion has been automatically disabled.
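
The processor-family test that the patch relies on can be sketched in isolation. This is a minimal example, assuming the usual x86 CPUID signature layout: base family in bits 8-11, extended family in bits 20-27, with the extended value added only when the base family is 0xf. It is not a copy of the kernel's CPUID_TO_FAMILY() macro, model_cpuid_family() is a hypothetical name, and 0x00100f42 is just an example signature.

```c
#include <stdint.h>

/*
 * Decode the display family from a CPUID signature (leaf 1, EAX),
 * following the common x86 convention described above.
 */
static unsigned
model_cpuid_family(uint32_t cpu_id)
{
	unsigned base = (cpu_id >> 8) & 0xf;	/* bits 8-11 */
	unsigned ext = (cpu_id >> 20) & 0xff;	/* bits 20-27 */

	return (base == 0xf ? base + ext : base);
}
```

Under this convention a signature such as 0x00100f42 decodes to family 0x10, which is the value the patch compares against. The ID 0x40f33 reported in the MCA thread in this archive decodes to family 0xf.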


Thanks,
Alan

Index: amd64/amd64/pmap.c
===
--- amd64/amd64/pmap.c	(revision 204175)
+++ amd64/amd64/pmap.c	(working copy)
@@ -686,6 +686,15 @@ pmap_init(void)
 	pv_entry_high_water = 9 * (pv_entry_max / 10);
 
 	/*
+	 * Disable large page mappings by default if the kernel is running in
+	 * a virtual machine on an AMD Family 10h processor.  This is a work-
+	 * around for Erratum 383.
+	 */
+	if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
+	    CPUID_TO_FAMILY(cpu_id) == 0x10)
+		pg_ps_enabled = 0;
+
+	/*
 	 * Are large page mappings enabled?
 	 */
 	TUNABLE_INT_FETCH("vm.pmap.pg_ps_enabled", &pg_ps_enabled);
Index: kern/subr_param.c
===
--- kern/subr_param.c	(revision 204175)
+++ kern/subr_param.c	(working copy)
@@ -74,10 +74,6 @@ __FBSDID("$FreeBSD$");
 #define	MAXFILES (maxproc * 2)
 #endif
 
-/* Values of enum VM_GUEST members are used as indices in
- * vm_guest_sysctl_names */
-enum VM_GUEST { VM_GUEST_NO = 0, VM_GUEST_VM, VM_GUEST_XEN };
-
 static int sysctl_kern_vm_guest(SYSCTL_HANDLER_ARGS);
 
 int	hz;
Index: sys/systm.h
===
--- sys/systm.h	(revision 204175)
+++ sys/systm.h	(working copy)
@@ -45,6 +45,10 @@
 #include <sys/queue.h>
 #include <sys/stdint.h>	/* for people using printf mainly */
 
+/* Values of enum VM_GUEST members are used as indices in
+ * vm_guest_sysctl_names */
+enum VM_GUEST { VM_GUEST_NO = 0, VM_GUEST_VM, VM_GUEST_XEN };
+
 extern int cold;		/* nonzero if we are doing a cold boot */
 extern int rebooting;		/* boot() has been called. */
 extern const char *panicstr;	/* panic message */
@@ -63,6 +67,7 @@ extern int bootverbose;   /* nonzero to print verbo
 
 extern int maxusers;		/* system tune hint */
 extern int ngroups_max;		/* max # of supplemental groups */
+extern int vm_guest;		/* Running as virtual machine guest? */
 
 #ifdef INVARIANTS		/* The option is always available */
 #define	KASSERT(exp,msg) do {						\

Re: Strange problem with 8-stable, VMWare vSphere 4 AMD CPUs (unexpected shutdowns)

2010-02-11 Thread Alan Cox
On Thu, Feb 11, 2010 at 7:13 AM, John Baldwin j...@freebsd.org wrote:

 On Wednesday 10 February 2010 1:38:37 pm Ivan Voras wrote:
  On 10 February 2010 19:35, Andriy Gapon a...@icyb.net.ua wrote:
   on 10/02/2010 20:26 Ivan Voras said the following:
   On 10 February 2010 19:10, Andriy Gapon a...@icyb.net.ua wrote:
   on 10/02/2010 20:03 Ivan Voras said the following:
   When you say "very unique" is it in the "it is not Linux or Windows"
   sense or do we do something nonstandard?
   The former - neither Linux, Windows or OpenSolaris seem to have what
   we have.
  
   I can't find the exact documents but I think both Windows
   MegaUltimateServer (the highest priced version of Windows Server,
   whatever it's called today) and Linux (though disabled and marked
   Experimental) have it, or have some kind of support for large pages
   that might not be as pervasive (maybe they use it for kernel only?). I
   have no idea about (Open)Solaris.
  
   I haven't said that those OSes do not use large pages.
   I've said what I've said :-)
 
  Ok :)
 
  Is there a difference between large pages as they are commonly known
  and superpages as in FreeBSD ? In other words - are you referencing
  some specific mechanism, like automatic promotion / demotion of the
  large pages or maybe something else?

 Yes, the automatic promotion / demotion.  That is a far-less common
 feature.
 FreeBSD/i386 has used large pages for the kernel text as far back as at
 least
 4.x, but that is not the same as superpages.  Linux does not have automatic
 promotion / demotion to my knowledge.  I do not know about other OS's.


A comparison of current large page support among Unix-like and Windows
operating systems has two dimensions: (1) whether or not the creation of
large pages for applications is automatic and (2) whether or not the machine
administrator has to statically partition the machine's physical memory
between large and small pages at boot time.

For FreeBSD, large pages are created automatically and there is not a static
partitioning of physical memory.  In contrast, Linux does not create large
pages automatically and does require a static partitioning.  Specifically,
Linux requires the administrator to explicitly and statically partition the
machine's physical memory at boot time into two parts, one that is dedicated
to large pages and another for general use.  To utilize large pages an
application has to explicitly request memory from the dedicated large pages
pool.  However, to make this somewhat easier, but not automatic, there do
exist re-implementations of malloc that you can explicitly link with your
application.

In Solaris, the application has to explicitly request the use of large
pages, either via explicit kernel calls in the program or from the command
line with support from a library.  However, there is not a static
partitioning of physical memory.  So, for example, when you run the Sun jdk
on Solaris, it explicitly requests large pages for much of its data, and
this works without administrator having to configure the machine for large
page usage.

To the best of my knowledge, Windows is just like Solaris.

Alan


Re: Strange problem with 8-stable, VMWare vSphere 4 AMD CPUs (unexpected shutdowns)

2010-02-11 Thread Alan Cox
The next public revision guide from AMD will contain an erratum (383) 
that documents the bug.  However, it doesn't really tell us anything 
that we didn't already know.


Alan



Re: Strange problem with 8-stable, VMWare vSphere 4 AMD CPUs (unexpected shutdowns)

2010-02-10 Thread Alan Cox
On Wed, Feb 10, 2010 at 12:46 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
[snip]


 I read what Andriy wrote to mean that the way FreeBSD utilises 4MB TLB
 on certain models of AMD processors is broken/quirky, and on those CPUs,
 users should stick to vm.pmap.pg_ps_enabled=0 (loader.conf).


No.  He said, "We don't do anything that strays from specifications."  So,
he is not saying that FreeBSD is doing anything broken.

Here is what I know.  Several of us, myself included, have been able to
reproduce either lockups or machine check exceptions when BOTH the machine
check driver and superpages are enabled on AMD family 10h processors.  There
have been no reports of this problem on either Intel or earlier AMD
processors.  Moreover, there is no evidence of instability in AMD family 10h
processors until the machine check driver is enabled.  By default, FreeBSD
8.0 enables superpages but disables the machine check driver.  So, running
natively, i.e., without virtualization, you shouldn't experience a problem,
unless you explicitly enable the machine check driver.  However, running on
top of a hypervisor, like vSphere 4, you might face a problem because the
hypervisor might enable machine check exceptions, regardless of what the
FreeBSD guest does.  I really don't know whether vSphere 4 enables machine
check exceptions or not.  If it does, then either you disable the use of
superpages in the FreeBSD guest, or you find a way to disable the machine
check driver in the hypervisor.
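
Whether a given machine is an AMD family 10h part can be read from the CPU
signature that FreeBSD prints at boot as "Id = 0x...".  The following is a
minimal sketch, not kernel code, that decodes family, model, and stepping from
such a value using the standard CPUID encoding as AMD applies it (the extended
fields are added only when the base family is 0xf).  The struct and function
names are hypothetical.

```c
#include <stdint.h>

/* Decoded CPU signature; a hypothetical helper type for this sketch. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode the CPUID signature (the "Id = 0x..." value shown in dmesg). */
struct cpu_sig
decode_cpu_id(uint32_t id)
{
    struct cpu_sig s;
    unsigned base_family = (id >> 8) & 0xf;
    unsigned base_model = (id >> 4) & 0xf;
    unsigned ext_family = (id >> 20) & 0xff;
    unsigned ext_model = (id >> 16) & 0xf;

    s.stepping = id & 0xf;
    s.family = base_family;
    s.model = base_model;
    /* AMD rule: extended fields only count when the base family is 0xf. */
    if (base_family == 0xf) {
        s.family += ext_family;
        s.model |= ext_model << 4;
    }
    return (s);
}

/* Nonzero if the signature belongs to AMD family 10h. */
int
is_amd_family_10h(uint32_t id)
{
    return (decode_cpu_id(id).family == 0x10);
}
```

For instance, a signature of 0x100f62 decodes to family 0x10, model 6,
stepping 2, so a machine reporting it would fall in the affected family.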

Both Andriy and I have reported this problem to people at AMD, but we
haven't yet received AMD's analysis.  These things take time.

Regards,
Alan


Re: 8.0-RC1 panic attaching ppc

2009-09-28 Thread Alan Cox

Daniel O'Connor wrote:

On Sun, 27 Sep 2009, Alan Cox wrote:
  

Ok, now I can explain what is happening.  The kernel is using 1GB
pages to implement the direct map.  Unfortunately, pmap_extract()
doesn't know how to handle a 1GB page mapping.  pmap_kextract() only
works by an accident of its different implementation.  In other
words, it should not be relied upon to work either.

Please revert whatever patch John gave you and try the attached
patch. It simply disables the use of 1GB page mapping by the direct
map.



Your patch fixes (works around?) the problem.
  


Thanks.  I've committed the patch.

Yes, it's a workaround.  Fortunately(?), on my test machine, I don't 
see any measurable effect from disabling the use of 1GB pages by the 
direct map.


Alan



Re: 8.0-RC1 panic attaching ppc

2009-09-26 Thread Alan Cox

Daniel O'Connor wrote:

On Sat, 26 Sep 2009, Alan Cox wrote:
  

John Baldwin wrote:


On Friday 25 September 2009 3:20:05 am Daniel O'Connor wrote:
  

On Thu, 24 Sep 2009, John Baldwin wrote:


Can you try this patch perhaps:

Index: sys/amd64/isa/isa_dma.c
===
--- isa_dma.c   (revision 197430)
+++ isa_dma.c   (working copy)
  

This patch fixes the panic for me.

I haven't tried printing (don't have any device handy here).


I wonder if pmap_extract(kernel_pmap) doesn't work with direct map
addresses for some reason?  I kind of find that hard to believe
actually.  Alan, the original panic was in
pmap_extract(kernel_pmap, ...) calls in the isa_dma code.  My patch
that fixes the panic just changes them to pmap_kextract().
  

Is this problem occurring on an AMD processor?



Yes,
CPU: AMD Athlon(tm) II X2 240 Processor (2812.73-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x100f62  Stepping = 2
  
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 3974762496 (3790 MB)

  


Ok, now I can explain what is happening.  The kernel is using 1GB pages 
to implement the direct map.  Unfortunately, pmap_extract() doesn't know 
how to handle a 1GB page mapping.  pmap_kextract() only works by an 
accident of its different implementation.  In other words, it should 
not be relied upon to work either.


Please revert whatever patch John gave you and try the attached patch.  
It simply disables the use of 1GB page mapping by the direct map.
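
To illustrate the failure mode, here is a toy sketch of a page-table walk
that handles all three x86-64 page sizes.  It is not the FreeBSD
pmap_extract(); the struct and function are hypothetical, and the three
entries are assumed to have been looked up already.  The point is the PG_PS
check at the PDP level: a walker that omits it would treat a 1GB mapping as
a pointer to a page directory.

```c
#include <stdint.h>

#define PG_V      0x001ULL               /* entry is valid */
#define PG_PS     0x080ULL               /* entry maps a large page */
#define PG_FRAME  0x000ffffffffff000ULL  /* physical address bits */
#define PDRSHIFT  21                     /* log2 of 2MB */
#define PDPSHIFT  30                     /* log2 of 1GB */

/* Hypothetical: the entries that translate one address below the PML4. */
struct walk {
    uint64_t pdpe;   /* page directory pointer entry */
    uint64_t pde;    /* page directory entry */
    uint64_t pte;    /* page table entry */
};

/* Return the physical address for va, or 0 if va is not mapped. */
uint64_t
toy_extract(const struct walk *w, uint64_t va)
{
    uint64_t mask;

    if ((w->pdpe & PG_V) == 0)
        return (0);
    if (w->pdpe & PG_PS) {
        /* 1GB page: the translation ends at the PDP level. */
        mask = (1ULL << PDPSHIFT) - 1;
        return ((w->pdpe & PG_FRAME & ~mask) | (va & mask));
    }
    if ((w->pde & PG_V) == 0)
        return (0);
    if (w->pde & PG_PS) {
        /* 2MB page: the translation ends at the PD level. */
        mask = (1ULL << PDRSHIFT) - 1;
        return ((w->pde & PG_FRAME & ~mask) | (va & mask));
    }
    if ((w->pte & PG_V) == 0)
        return (0);
    /* Ordinary 4KB page. */
    return ((w->pte & PG_FRAME) | (va & 0xfffULL));
}
```

With a PDP entry of 0x40000000 | PG_V | PG_PS, any address in that 1GB
region resolves directly from the PDP entry plus the low 30 bits of the
address, without consulting the lower levels at all.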


Regards,
Alan


Index: amd64/amd64/pmap.c
===
--- amd64/amd64/pmap.c  (revision 197425)
+++ amd64/amd64/pmap.c  (working copy)
@@ -442,7 +442,7 @@
     if (ndmpdp < 4)     /* Minimum 4GB of dirmap */
         ndmpdp = 4;
     DMPDPphys = allocpages(firstaddr, NDMPML4E);
-    if ((amd_feature & AMDID_PAGE1GB) == 0)
+    if (TRUE || (amd_feature & AMDID_PAGE1GB) == 0)
         DMPDphys = allocpages(firstaddr, ndmpdp);
     dmaplimit = (vm_paddr_t)ndmpdp << PDPSHIFT;
 
@@ -476,7 +476,7 @@
 
     /* Now set up the direct map space using either 2MB or 1GB pages */
     /* Preset PG_M and PG_A because demotion expects it */
-    if ((amd_feature & AMDID_PAGE1GB) == 0) {
+    if (TRUE || (amd_feature & AMDID_PAGE1GB) == 0) {
         for (i = 0; i < NPDEPG * ndmpdp; i++) {
             ((pd_entry_t *)DMPDphys)[i] = (vm_paddr_t)i << PDRSHIFT;
             ((pd_entry_t *)DMPDphys)[i] |= PG_RW | PG_V | PG_PS |

Re: 8.0-RC1 panic attaching ppc

2009-09-25 Thread Alan Cox

John Baldwin wrote:

On Friday 25 September 2009 3:20:05 am Daniel O'Connor wrote:
  

On Thu, 24 Sep 2009, John Baldwin wrote:


Can you try this patch perhaps:

Index: sys/amd64/isa/isa_dma.c
===
--- isa_dma.c   (revision 197430)
+++ isa_dma.c   (working copy)
  

This patch fixes the panic for me.

I haven't tried printing (don't have any device handy here).



I wonder if pmap_extract(kernel_pmap) doesn't work with direct map addresses 
for some reason?  I kind of find that hard to believe actually.  Alan, the 
original panic was in pmap_extract(kernel_pmap, ...) calls in the isa_dma 
code.  My patch that fixes the panic just changes them to pmap_kextract(). 

  


Is this problem occurring on an AMD processor?

Alan



Re: 8.0-RC1 panic attaching ppc

2009-09-25 Thread Alan Cox

John Baldwin wrote:

On Friday 25 September 2009 3:20:05 am Daniel O'Connor wrote:
  

On Thu, 24 Sep 2009, John Baldwin wrote:


Can you try this patch perhaps:

Index: sys/amd64/isa/isa_dma.c
===
--- isa_dma.c   (revision 197430)
+++ isa_dma.c   (working copy)
  

This patch fixes the panic for me.

I haven't tried printing (don't have any device handy here).



I wonder if pmap_extract(kernel_pmap) doesn't work with direct map addresses 
for some reason?  I kind of find that hard to believe actually.  Alan, the 
original panic was in pmap_extract(kernel_pmap, ...) calls in the isa_dma 
code.  My patch that fixes the panic just changes them to pmap_kextract(). 

  


In principle, pmap_extract(kernel_pmap, ...) should work just fine.

Alan



Re: current zfs tuning in RELENG_7 (AMD64) suggestions ?

2009-05-02 Thread Alan Cox
On Fri, May 1, 2009 at 6:28 PM, Freddie Cash fjwc...@gmail.com wrote:

 On Fri, May 1, 2009 at 4:12 PM, Louis Kowolowski
 lou...@cryptomonkeys.org wrote:
  On May 1, 2009, at 1:53 PM, Pete French wrote:
  ...
  The tuning isn't there to improve performance, it's there to prevent
  the box going titus due to a panic when the ARC gets too big, and
  you are missing the main one, which is to limit the size of the ARC.
  On recent versions of BSD (and you are running 7.2, so that's fine) then
  the defaults for kmem size are fine, but you still need something like
  this:
 
  vfs.zfs.arc_max=256M
 
  In there to stop the ARC growing. That's the only tuning I have on
  my 4 gig machine, which takes a steady stream of data and is used
  for taking backup snapshots. ZFS is excellent, and for me is perfectly
  stable, to the point where I am starting to roll it out to production
  machines, with the above tuning.
 
  I agree, although I'm using 384 instead of 256.  My systems have been
  running in production for almost a year now w/o any ZFS issues.

 The exact value to use will depend on the system.  Particularly on the
 amount of RAM in the system, and what kmem_max is set to.  A
 rule-of-thumb we've been using is:
   kmem_max should be half of the amount of RAM (or 1.5 GB as that's
 the current max)


This information is outdated.  The current max in RELENG_7 for amd64 is
~3.75GB.
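
As a rough sketch of the quoted rule of thumb with the corrected ceiling:
take half of RAM, clamped to a cap.  The function name is hypothetical, and
the cap is passed in as a parameter because the exact limit depends on the
branch and architecture; 3840 MB below is only an approximation of the
"~3.75GB" figure.

```c
#include <stdint.h>

/*
 * Rule-of-thumb kmem_max: half of physical RAM, but never more than
 * the platform's ceiling (cap_bytes is an assumption of the caller).
 */
uint64_t
suggest_kmem_max(uint64_t ram_bytes, uint64_t cap_bytes)
{
    uint64_t half = ram_bytes / 2;

    return (half < cap_bytes ? half : cap_bytes);
}
```

With a cap of 3840 MB, a 4 GB machine gets 2 GB, while a 16 GB machine is
clamped to the cap.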

Regards,
Alan


Re: problems with 7.2, vm_page_insert: page already inserted

2009-04-11 Thread Alan Cox
Please post your kernel configuration file.

Regards,
Alan


Re: problems with 7.2, vm_page_insert: page already inserted

2009-04-11 Thread Alan Cox

Raul wrote:

On Sat, 11-04-2009 at 19:37 -0500, Alan Cox wrote:

  

Please post your kernel configuration file.



It's rather simple:

[]
include GENERIC
ident   TURING

options IPSEC
device  enc
device  crypto
[]

That's all.
  


Ok, thanks.

Alan



Re: panic when detaching swap-backed md(4) disk which has gone into swap

2009-03-28 Thread Alan Cox
Please file a PR on this panic with the stack trace.

Thanks,
Alan


Re: GCC build causes panic: page already inserted

2009-03-16 Thread Alan Cox
On Mon, Mar 16, 2009 at 12:59 PM, Dan Allen danalle...@airwired.net wrote:

 I saw that someone else had this happen last week...  It is not a hardware
 failure.


I have not seen that.  I have only seen an assertion failure that would have
nothing to do with your reported panic.



 While building the latest GCC 4.4 from /usr/ports/lang/gcc44 I got a core
 dump with the message

vm_page_insert: page already inserted

 I build this port every week on a Toshiba laptop (1.8GHz Core 2 Duo, 1 GB
 RAM, 160 GB HD, plenty of free space, RELENG_7).  I have never seen this
 until today.  Just before building this port I completely built the kernel
 and world and installed them, so I am as up-to-date as you could be.

 I suspect recent changes to vm code... perhaps in
 /usr/src/sys/vm/vm_meter.c or vm_page.c ?

 The compressed core dump is 41 MB.



For now, can you just provide the stack trace?

Regards,
Alan


Re: 7.1 panic vm_page_startup: inconsistent page counts

2009-03-16 Thread Alan Cox

Peter Jeremy wrote:

On 2009-Mar-12 08:46:50 -0400, John Baldwin j...@freebsd.org wrote:
  

On Thursday 12 March 2009 12:36:46 am Peter Jeremy wrote:


I'm trying to upgrade an 11 month old FreeBSD 7 image in a VMware
4.5.2 guest to an up-to-date -stable and it panics as above.  I've
added a printf to report the two counts and there's a difference of
one page.  I don't have any problems with the old 7-stable image or
up-to-date 6-stable or -current using the same VMware version.

A screendump for a verbose boot can be found at
http://imagebin.ca/img/wahNNw.gif

Can I safely delete the assert?
  
I don't think so, I would report it to Alan.  The one earlier report of this 
didn't include the detail that it was only off by one page.



This is a bit moot now since you've disabled the test but rolling back
to a kernel from 26th Feb (before the superpages MFC) doesn't have the
page count discrepancy.

  


It's useful to know that the older kernel doesn't fail the assertion.  
Thanks.


Alan



Re: 7.1-STABLE does not boot after recent superpage support MFC

2009-02-27 Thread Alan Cox
On Fri, Feb 27, 2009 at 11:11 AM, John Baldwin j...@freebsd.org wrote:

 On Friday 27 February 2009 11:26:25 am Igor Sysoev wrote:
  On Fri, Feb 27, 2009 at 10:26:15AM -0500, John Baldwin wrote:
 
   On Friday 27 February 2009 8:08:30 am Igor Sysoev wrote:
Is anyone able to boot a kernel with the recently merged superpage support?
I have csup'd world to
*default date=2009.02.26.23.59.59
then rebuild world and kernel does not boot:
   
FreeBSD 7.1-STABLE #4: Fri Feb 27 11:59:13 MSK 2009
X
kernel trap 12 with interrupts disabled
   
   
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x8:0x803b1d80
stack pointer   = 0x10:0x80686ce0
frame pointer   = 0x10:0x80686d00
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 0 ()
trap number = 12
panic: page fault
cpuid = 0
   
And the message is cycled. The kernel does not boot despite
vm.pmap.pg_ps_enabled value.
  
   This should now be fixed, apologies for the breakage. :(
 
  Thank you, your commit has fixed the bug.
 
  Now I have
 
  $sysctl vm.pmap.pde
  vm.pmap.pde.promotions: 518
  vm.pmap.pde.p_failures: 4534
  vm.pmap.pde.mappings: 0
  vm.pmap.pde.demotions: 423
 
  Does this mean that (518 - 423) * 2 = 190M are mapped via 2M pages ?

 I don't think that includes the direct map which uses 2M pages.  I think
 your conclusion is correct, but alc@ would know for sure.

 --
 John Baldwin



These counts are cumulative.  So, they don't really provide you with an
instantaneous number of the active 2MB page mappings.  Moreover, when a 2MB
page mapping is destroyed in its entirety, for example, when exit()ing a
process, that does not trigger a demotion.  In other words, a promoted 2MB
page mapping can cease to exist without a demotion occurring.

You are correct that in RELENG_7 these counts don't say anything about the
direct map.
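
Put differently, the arithmetic in the question gives at best an upper bound,
not a current value.  A minimal sketch, assuming vm.pmap.pde.mappings is 0 as
in the quoted output (so every 2MB mapping came from a promotion); the
function name is hypothetical:

```c
#include <stdint.h>

/*
 * Upper bound, in MB, on memory still mapped by promoted 2MB pages.
 * Mappings destroyed whole (e.g. at exit()) are not counted as
 * demotions, so the real figure can be anywhere from 0 to this value.
 */
uint64_t
promoted_upper_bound_mb(uint64_t promotions, uint64_t demotions)
{
    if (demotions >= promotions)
        return (0);
    return ((promotions - demotions) * 2);
}
```

For the counters shown above (518 promotions, 423 demotions) this yields
190, which is a ceiling rather than the amount mapped at that instant.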

Alan


Re: 7.1-STABLE does not boot after recent superpage support MFC

2009-02-27 Thread Alan Cox
On Fri, Feb 27, 2009 at 10:42 AM, John Baldwin j...@freebsd.org wrote:

 On Friday 27 February 2009 11:21:00 am Michael Butler wrote:
  John Baldwin wrote:
   On Friday 27 February 2009 8:08:30 am Igor Sysoev wrote:
  
   And the message is cycled. The kernel does not boot despite
   vm.pmap.pg_ps_enabled value.
  
   This should now be fixed, apologies for the breakage. :(
 
  What are the benefits and/or impacts of enabling this?
 
  Is there anything to be gained with respect to cache and/or TLB
  utilization in allowing entry promotion through a reduced footprint or
  similar? How much does this depend on architecture, say, e.g. Core-2 Duo
  vs. Pentium?

 Yes, there are gains due to what you mention, but it does depend on the
 specific processor, and specifically on how it manages entries for large
 pages in its TLB (some processors have separate TLB entries for large
 pages and have very few of them, others can store either a small or large
 page in a single TLB slot, etc.).  Alan knows far more of the details of
 this than I do.

  I note that it is not enabled by default in -current either - just curious,

 Actually, it is enabled by default on amd64 in current.

 --
 John Baldwin



The short answer is ... if you're running an amd64 kernel on a Pentium 4,
Core 2, or tri- or quad-core Opteron/Phenom, enable promotion.  Your results
with other amd64-compatible processors, single- and dual-core Athlon/Opteron
and Atom, will be application dependent.  You'll win some and you'll lose
some.

For a longer answer with data and figures, take a look at this paper:
http://ft.ornl.gov/pubs-archive/ispass-final-csmd.pdf

That said, there are secondary benefits to enabling large page support that
have nothing to do with the TLB, specifically, it makes fork()ing and
exit()ing large address spaces cheaper.
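
For intuition on the TLB side, the amount of memory a TLB can cover at once
is simply its entry count times the page size.  The sketch below is only that
arithmetic; the entry counts in the usage note are made-up illustrations, not
figures for any particular processor.

```c
#include <stdint.h>

/* Bytes of address space covered by a TLB with the given geometry. */
uint64_t
tlb_reach_bytes(unsigned entries, uint64_t page_size)
{
    return ((uint64_t)entries * page_size);
}
```

A hypothetical 64-entry TLB covers 256 KB with 4 KB pages, whereas even a
hypothetical 32 entries for 2 MB pages cover 64 MB.  How many large-page
entries a real processor has is exactly what makes the result processor and
application dependent.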

Regards,
Alan


Re: Panic: vm_page_free: freeing wired page with 6.2 RELEASE

2007-04-22 Thread Alan Cox

Martin Blapp wrote:



Hi,

Just got this panic on a loaded mailserver ... The server was rock
stable up to this panic, and after we loaded it a bit more recently, it
panicked.

Any ideas?



Can you post your kernel configuration file?

Alan



Re: systat -vm output showing negative total virtual memory

2006-11-19 Thread Alan Cox
The change to vm_meter.c is ok.  Could you please add a comment like 
that above the location of the patch.


Alan



Re: panic: vm_thread_swapin: cannot get kstack for proc: 643 under high memory load

2005-06-01 Thread Alan Cox
On Wed, Jun 01, 2005 at 01:57:59PM +0100, Robert Watson wrote:
 
 Had an interesting panic on RELENG_5 today.  System was under high memory 
 load due to a run-away pine process that was also generating a very high 
 memory load on the kernel due to heavy network I/O, but the swap pager 
 keels over due to ENOMEM (error 12).  Before I knew it, syslogd was core 
 dumping and the system panicked due to a stack paging issue (possible that 
 there was no memory into which to load the stack?).
 
 Some details below; core is available.  An undesirable failure mode...
 
 Robert N M Watson
 
 
 Had an interesting panic on RELENG_5 today.  System was under high memory
 load due to a run-away pine process, but the swap pager keels over due to
 ENOMEM (error 12).  Before I knew it, syslogd was core dumping and the
 system panicked due to a stack paging issue (possible that there was no
 memory into which to load the stack?).

We don't release the kernel virtual address range for the stack on a
swap out.  So, it's still available in vm_thread_swapin().
Furthermore, at this point in vm_thread_swapin(), we have already
allocated the required physical page(s).  The panic is a direct result 
of the I/O error.  In other words, vm_thread_swapin() panics if there
is an I/O error during page in of the stack page(s).

 Regardless, not a desirable failure mode.

Indeed, I'll see what I can do.

Alan


Re: panic: mutex vm object not owned

2005-03-11 Thread Alan Cox
On Fri, Mar 11, 2005 at 06:59:02PM +, Gavin Atkinson wrote:
 
 Hi,
 
 I'm tracking RELENG_5 and since my last update I can no longer start X
 without getting the following panic:
 
 panic: mutex vm object not owned at /usr/src/sys/vm/vm_page.c:334
 cpuid = 0
 KDB: enter: panic
 [ thread pid 2762 tid 100116 ]
 stopped at kdb_enter+0x2b:   nop
 
 At which point the machine hangs solid (note this is over a serial console
 - it is after X has claimed the display).
 
 I've done a binary chop to find where the bug was introduced, and it
 started at some point between 19:00 on 2005/02/25 and 00:00 on 2005/02/26.
 The MFC to vm_page.c which added the assertion I'm tripping up on happened
 between those two points:
 $FreeBSD: src/sys/vm/vm_page.c,v 1.290.2.4 2005/02/25 23:38:22 alc Exp $
 
 vm_page.c:334 seems to correspond to the line
 VM_OBJECT_LOCK_ASSERT(m->object, MA_OWNED);
 within vm_page_wakeup()
 
 Any suggestions?  I'll try to coerce it into generating a backtrace but
 given the machine wedges before even giving the DDB prompt I'm not sure
 I'll be able to.
 

Are you using an Nvidia driver?

Regards,
Alan


Re: panic: mutex vm object not owned

2005-03-11 Thread Alan Cox
I'm pretty sure that I understand the cause.  Please apply and test
the attached patch.

Regards,
Alan
Index: pci/agp.c
===
RCS file: /home/ncvs/src/sys/pci/agp.c,v
retrieving revision 1.45
diff -u -r1.45 agp.c
--- pci/agp.c   16 Aug 2004 12:25:48 -0000  1.45
+++ pci/agp.c   11 Mar 2005 19:17:09 -0000
@@ -501,6 +501,7 @@
      * because vm_page_grab() used with VM_ALLOC_RETRY may
      * block and we can't hold a mutex while blocking.
      */
+    VM_OBJECT_LOCK(mem->am_obj);
     for (i = 0; i < mem->am_size; i += PAGE_SIZE) {
         /*
          * Find a page from the object and wire it
@@ -509,18 +510,18 @@
          * AGP_PAGE_SIZE. If this is the first call to bind,
          * the pages will be allocated and zeroed.
          */
-        VM_OBJECT_LOCK(mem->am_obj);
         m = vm_page_grab(mem->am_obj, OFF_TO_IDX(i),
             VM_ALLOC_WIRED | VM_ALLOC_ZERO | VM_ALLOC_RETRY);
-        VM_OBJECT_UNLOCK(mem->am_obj);
         AGP_DPF("found page pa=%#x\n", VM_PAGE_TO_PHYS(m));
     }
+    VM_OBJECT_UNLOCK(mem->am_obj);
 
     mtx_lock(&sc->as_lock);
 
     if (mem->am_is_bound) {
         device_printf(dev, "memory already bound\n");
         error = EINVAL;
+        VM_OBJECT_LOCK(mem->am_obj);
         goto bad;
     }
 
@@ -532,10 +533,9 @@
      * (i.e. use alpha_XXX_dmamap()). I don't have access to any
      * alpha AGP hardware to check.
      */
+    VM_OBJECT_LOCK(mem->am_obj);
     for (i = 0; i < mem->am_size; i += PAGE_SIZE) {
-        VM_OBJECT_LOCK(mem->am_obj);
         m = vm_page_lookup(mem->am_obj, OFF_TO_IDX(i));
-        VM_OBJECT_UNLOCK(mem->am_obj);
 
         /*
          * Install entries in the GATT, making sure that if
@@ -566,6 +566,7 @@
         vm_page_wakeup(m);
         vm_page_unlock_queues();
     }
+    VM_OBJECT_UNLOCK(mem->am_obj);
 
     /*
      * Flush the cpu cache since we are providing a new mapping
@@ -586,7 +587,7 @@
     return 0;
 bad:
     mtx_unlock(&sc->as_lock);
-    VM_OBJECT_LOCK(mem->am_obj);
+    VM_OBJECT_LOCK_ASSERT(mem->am_obj, MA_OWNED);
     for (i = 0; i < mem->am_size; i += PAGE_SIZE) {
         m = vm_page_lookup(mem->am_obj, OFF_TO_IDX(i));
         vm_page_lock_queues();
Index: pci/agp_i810.c
===
RCS file: /home/ncvs/src/sys/pci/agp_i810.c,v
retrieving revision 1.30.2.1
diff -u -r1.30.2.1 agp_i810.c
--- pci/agp_i810.c  1 Mar 2005 08:11:50 -0000   1.30.2.1
+++ pci/agp_i810.c  11 Mar 2005 19:15:45 -0000
@@ -609,13 +609,10 @@
             vm_page_t m;
 
             VM_OBJECT_LOCK(mem->am_obj);
-            m = vm_page_grab(mem->am_obj, 0,
+            m = vm_page_grab(mem->am_obj, 0, VM_ALLOC_NOBUSY |
                 VM_ALLOC_WIRED | VM_ALLOC_ZERO | VM_ALLOC_RETRY);
             VM_OBJECT_UNLOCK(mem->am_obj);
-            vm_page_lock_queues();
             mem->am_physical = VM_PAGE_TO_PHYS(m);
-            vm_page_wakeup(m);
-            vm_page_unlock_queues();
         } else {
             mem->am_physical = 0;
         }
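
The pattern behind the panic and the patch can be sketched in userland C:
a callee asserts that its caller holds the object's lock, so a caller that
drops the lock too early trips the assertion.  This is only an illustration
with a flag standing in for the mutex; every name in it is hypothetical and
none of it is kernel code.

```c
#include <assert.h>

/* Toy object; "locked" stands in for the VM object mutex. */
struct toy_object {
    int locked;
    int busy_pages;
};

void
toy_lock(struct toy_object *o)
{
    assert(!o->locked);
    o->locked = 1;
}

void
toy_unlock(struct toy_object *o)
{
    assert(o->locked);
    o->locked = 0;
}

/* Callee with a locking precondition, in the spirit of the new assertion. */
void
toy_page_wakeup(struct toy_object *o)
{
    assert(o->locked);  /* plays the role of VM_OBJECT_LOCK_ASSERT(..., MA_OWNED) */
    o->busy_pages--;
}

/* Caller keeps the lock held across the whole loop, as the patch does. */
void
toy_wakeup_all(struct toy_object *o)
{
    toy_lock(o);
    while (o->busy_pages > 0)
        toy_page_wakeup(o);
    toy_unlock(o);
}
```

Calling toy_page_wakeup() after toy_unlock() instead would abort on the
assertion, which is the userland analogue of the "mutex vm object not owned"
panic.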