On Monday, 7 June 2021, Aneesh Kumar K.V wrote:
>
> This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires
> the platform to support updating higher-level page tables without
> updating page table entries. This also needs to invalidate the Page Walk
> Cache on architecture
On 12 May 2017 at 13:35, Michael Ellerman wrote:
> Nicholas Piggin writes:
>
> > The single-operand form of tlbie used to be accepted as the second
> > operand (L) being implicitly 0. Newer binutils reject this.
> >
> > Change remaining single-op tlbie
On Fri, Jul 09, 2010 at 09:34:16AM +0200, Jens Axboe wrote:
On 2010-07-09 08:57, divya wrote:
On Friday 02 July 2010 12:16 PM, divya wrote:
On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
On Wednesday, 30 June 2010 at 13:22:27, divya wrote:
While running fs_racer test from LTP on a
candidates are from Christoph and Nick Piggin
(added to CC)
No commits relating to POWER6 or PPC.
Not sure what's happening here. The first warning looks like some mutex
corruption, but it doesn't have a stack trace (these are 2 separate
dumps, right? ie. the copy_process stack doesn't relate
to support.
Therefore, also take the RCU read lock along with disabling IRQs to ensure
the RCU grace period does at least cover these lookups.
Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
Requested-by: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Nick Piggin npig...@suse.de
Cc
On Wed, Mar 24, 2010 at 06:56:31PM +1100, Benjamin Herrenschmidt wrote:
Some powerpc code needs to ensure that all previous iounmap/vunmap calls have
really been flushed out of the MMU hash table. Without that, various
hotplug operations may fail when trying to return those pieces to
the hypervisor
On Wed, Feb 10, 2010 at 10:04:06PM +1100, Anton Blanchard wrote:
For performance reasons we are about to change ISYNC_ON_SMP to sometimes be
lwsync. Now that the macro name doesn't make sense, change it and
LWSYNC_ON_SMP
to better explain what the barriers are doing.
Signed-off-by: Anton
On Wed, Feb 17, 2010 at 08:37:14PM +1100, Anton Blanchard wrote:
Hi Nick,
Cool. How does it go when there are significant amount of instructions
between the lock and the unlock? A real(ish) workload, like dbench on
ramdisk (which should hit the dcache lock).
Good question, I'll see
On Wed, Feb 17, 2010 at 08:43:14PM +1100, Anton Blanchard wrote:
Hi Nick,
Ah, good to see this one come back. I also tested tbench over localhost
btw which actually did show some speedup on the G5.
BTW. this was the last thing left:
On Wed, Feb 10, 2010 at 09:57:28PM +1100, Anton Blanchard wrote:
Recent versions of the PowerPC architecture added a hint bit to the larx
instructions to differentiate between an atomic operation and a lock
operation:
0 Other programs might attempt to modify the word in storage addressed
On Wed, Feb 10, 2010 at 10:10:25PM +1100, Anton Blanchard wrote:
Nick Piggin discovered that lwsync barriers around locks were faster than
isync
on 970. That was a long time ago and I completely dropped the ball in testing
his patches across other ppc64 processors.
Turns out the idea
On Tue, Jul 21, 2009 at 10:02:26AM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2009-07-20 at 12:38 +0200, Nick Piggin wrote:
On Mon, Jul 20, 2009 at 08:00:41PM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2009-07-20 at 10:10 +0200, Nick Piggin wrote:
Maybe I don't understand your
On Mon, Jul 20, 2009 at 05:11:13PM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2009-07-15 at 15:56 +0200, Nick Piggin wrote:
I would like to merge the new support that depends on this in 2.6.32,
so unless there's major objections, I'd like this to go in early during
the merge window. We
On Thu, Jul 16, 2009 at 11:54:15AM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2009-07-15 at 15:56 +0200, Nick Piggin wrote:
Interesting arrangement. So are these last level ptes modifiable
from userspace or something? If not, I wonder if you could manage
them as another level of pointers
On Mon, Jul 20, 2009 at 08:00:41PM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2009-07-20 at 10:10 +0200, Nick Piggin wrote:
Maybe I don't understand your description correctly. The TLB contains
PMDs, but you say the HW still logically performs another translation
step using entries
On Mon, Jul 20, 2009 at 07:59:21PM +1000, Benjamin Herrenschmidt wrote:
On Mon, 2009-07-20 at 10:05 +0200, Nick Piggin wrote:
Unless anybody has other preferences, just send it straight to Linus in
the next merge window -- if any conflicts did come up anyway they would
be trivial. You
On Wed, Jul 15, 2009 at 05:49:47PM +1000, Benjamin Herrenschmidt wrote:
Upcoming patches to support the new 64-bit BookE powerpc architecture
will need to have the virtual address corresponding to PTE page when
freeing it, due to the way the HW table walker works.
Basically, the TLB can be
On Fri, Jun 12, 2009 at 11:14:10AM +0530, Sachin Sant wrote:
Nick Piggin wrote:
I can't really work it out. It seems to be the kmem_cache_cache which has
a problem, but there have already been lots of caches created and even
this same cache_node already used right beforehand with no problem
On Fri, Jun 12, 2009 at 01:38:50PM +0530, Sachin Sant wrote:
Nick Piggin wrote:
I was able to boot yesterday's next (20090611) on this machine. Not sure
Still with SLQB? With debug options turned on?
Ah .. spoke too soon. The kernel was not compiled with SLQB. Sorry
about
On Mon, Jun 08, 2009 at 05:42:14PM +0530, Sachin Sant wrote:
Pekka J Enberg wrote:
Hi Sachin,
__slab_alloc_page: nid=2, cache_node=c000de01ba00,
cache_list=c000de01ba00
__slab_alloc_page: nid=2, cache_node=c000de01bd00,
cache_list=c000de01bd00
__slab_alloc_page: nid=2,
On Tue, May 12, 2009 at 03:56:13PM +1000, Stephen Rothwell wrote:
Hi Nick,
On Tue, 12 May 2009 06:57:16 +0200 Nick Piggin npig...@suse.de wrote:
Hmm, I think (hope) your problems were fixed with the recent memory
corruption bug fix for SLQB. (if not, let me know)
This one possibly
On Tue, May 12, 2009 at 04:52:45PM +1000, Stephen Rothwell wrote:
Hi Nick,
On Tue, 12 May 2009 16:03:52 +1000 Stephen Rothwell s...@canb.auug.org.au
wrote:
This is what I have been getting for the last few days:
bisected into the net changes, I will follow up there, sorry.
No
On Mon, May 11, 2009 at 06:21:35AM -0600, Matthew Wilcox wrote:
On Mon, May 11, 2009 at 05:34:07PM +0530, Sachin Sant wrote:
Matthew Wilcox wrote:
On Mon, May 11, 2009 at 05:16:10PM +0530, Sachin Sant wrote:
Today's Next tree failed to boot on a Power6 box with following BUG :
This
On Thu, Apr 30, 2009 at 11:06:36AM +0530, Sachin Sant wrote:
Nick Piggin wrote:
Well kmalloc is failing. It should not be though, even if the
current node is offline, it should be able to fall back to other
nodes. Stephen's trace indicates the same thing.
Could you try the following patch
On Thu, Apr 30, 2009 at 11:06:36AM +0530, Sachin Sant wrote:
Nick Piggin wrote:
Well kmalloc is failing. It should not be though, even if the
current node is offline, it should be able to fall back to other
nodes. Stephen's trace indicates the same thing.
Could you try the following patch
On Thu, Apr 30, 2009 at 03:17:12PM +0530, Sachin Sant wrote:
Nick Piggin wrote:
Hmm, forget that. Actually my last patch had a silly mistake because I
forgot MAX_ORDER shift is applied to PAGE_SIZE, rather than 1. So
kmalloc(PAGE_SIZE) was failing as too large.
This patch should do
On Thu, Apr 30, 2009 at 09:00:04PM +1000, Stephen Rothwell wrote:
Hi Pekka, Nick,
On Thu, 30 Apr 2009 13:38:04 +0300 Pekka Enberg penb...@cs.helsinki.fi
wrote:
Stephen, does this patch fix all the boot problems for you as well?
Unfortunately not, I am still getting this:
Memory:
On Thu, Apr 30, 2009 at 02:20:29PM +0300, Pekka Enberg wrote:
On Thu, 2009-04-30 at 13:18 +0200, Nick Piggin wrote:
OK thanks. So I think we have 2 problems. One with MAX_ORDER = 9
that is fixed by the previous patch, and another which is probably
due to having no memory on node 0 which I
On Thu, Apr 30, 2009 at 02:20:29PM +0300, Pekka Enberg wrote:
On Thu, 2009-04-30 at 13:18 +0200, Nick Piggin wrote:
OK thanks. So I think we have 2 problems. One with MAX_ORDER = 9
that is fixed by the previous patch, and another which is probably
due to having no memory on node 0 which I
On Fri, May 01, 2009 at 12:00:33AM +1000, Stephen Rothwell wrote:
Hi Nick,
On Thu, 30 Apr 2009 15:05:42 +0200 Nick Piggin npig...@suse.de wrote:
Hmm, this might do it. The following code now passes some stress testing
in a userspace harness whereas before it did not (and was obviously
On Tue, Apr 28, 2009 at 02:22:06PM +0300, Pekka Enberg wrote:
Nick,
Here's another one. I think we need to either fix these rather quickly
or make SLUB the default for linux-next again so we don't interfere
with other testing.
Yeah, I'm working on it. Let me either give you a fix or a patch
..
Does this help?
---
SLQB: fix slab calculation
SLQB didn't consider MAX_ORDER when defining which sizes of kmalloc
slabs to create. It panics at boot if it tries to create a cache
which exceeds MAX_ORDER-1.
Signed-off-by: Nick Piggin npig...@suse.de
---
Index: linux-2.6/include/linux/slqb_def.h
On Wed, Apr 29, 2009 at 09:56:19PM +0530, Sachin Sant wrote:
Nick Piggin wrote:
Does this help?
---
With the patch the machine boots past the failure point, but panics
immediately with the following trace...
OK good, that solves one problem.
Unable to handle kernel paging request
On Wed, Mar 04, 2009 at 03:03:15PM +1100, Benjamin Herrenschmidt wrote:
Alright, sorry for the delay, I had those stored in my "need more
than half a brain cell for review" list and only got to them today :-)
No problem :)
On Thu, 2009-02-19 at 18:12 +0100, Nick Piggin wrote:
Using
On Wed, Mar 04, 2009 at 03:04:11PM +1100, Benjamin Herrenschmidt wrote:
On Thu, 2009-02-19 at 18:21 +0100, Nick Piggin wrote:
OK, here is this patch again. You didn't think I'd let a 2% performance
improvement be forgotten? :)
Anyway, the patch won't work well on architectures without lwsync
the generic code would be able to measure it in case the platform
does not provide it.
But this simple patch at least makes it throttle again.
Signed-off-by: Nick Piggin npig...@suse.de
---
Index: linux-2.6/arch/powerpc/platforms/powermac/cpufreq_64.c
OK, here is this patch again. You didn't think I'd let a 2% performance
improvement be forgotten? :)
Anyway, the patch won't work well on architectures without lwsync, but I won't
bother fixing that kind of thing and making it merge worthy until you
guys say something positive about it.
20 runs of
Using lwsync, isync sequence in a microbenchmark is 5 times faster on my G5 than
using sync for smp_mb. Although it takes more instructions.
Running tbench with 4 clients on my 4 core G5 (20 times) gives the
following:
unpatched AVG=920.33 STD=2.36
patched AVG=921.27 STD=2.77
So not a big
On Tue, Feb 17, 2009 at 03:55:40AM +0300, Alexey Dobriyan wrote:
FYI, on powerpc-64-smp-n-debug-n:
mm/slqb.c: In function '__slab_free':
mm/slqb.c:1648: error: implicit declaration of function 'slab_free_to_remote'
mm/slqb.c: In function 'kmem_cache_open':
mm/slqb.c:2174: error: implicit
On Tue, Feb 10, 2009 at 01:53:51PM +0200, Pekka Enberg wrote:
On Tue, Feb 10, 2009 at 11:54 AM, Sachin P. Sant sach...@in.ibm.com wrote:
Sachin P. Sant wrote:
Hi Stephen,
Today's next randconfig build on powerpc fails with
CC mm/slqb.o
mm/slqb.c: In function '__slab_free':
On Friday 12 December 2008 07:43, Andrew Morton wrote:
On Thu, 11 Dec 2008 20:28:00 +
Do they actually cross the page boundaries?
Some flavours of slab have at times done an order-1 allocation for
objects which would fit into an order-0 page (etc) if it looks like
that will be
On Friday 12 December 2008 13:47, Andrew Morton wrote:
On Fri, 12 Dec 2008 12:31:33 +1000 Nick Piggin nickpig...@yahoo.com.au
wrote:
On Friday 12 December 2008 07:43, Andrew Morton wrote:
On Thu, 11 Dec 2008 20:28:00 +
Do they actually cross the page boundaries?
Some
On Tuesday 18 November 2008 09:53, Paul Mackerras wrote:
I'd love to be able to use a 4k base page size if I could still get
the reduction in page faults and the expanded TLB reach that we get
now with 64k pages. If we could allocate the page cache for large
files with order-4 allocations
On Tuesday 18 November 2008 13:08, Linus Torvalds wrote:
On Tue, 18 Nov 2008, Paul Mackerras wrote:
Also, you didn't respond to my comments about the purely software
benefits of a larger page size.
I realize that there are benefits. It's just that the downsides tend to
swamp the upsides.
smp_wmb to revert back to eieio for all CPUs. Restore the behaviour
introduced in 74f0609526afddd88bef40b651da24f3167b10b2.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/synch.h
-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-12 12:28:57.0 +1100
+++ linux-2.6/arch/powerpc/include/asm
Implement a more optimal mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock benchmark
from 203 to 173 cycles on a G5.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
On Thu, Nov 06, 2008 at 03:09:08PM +1100, Paul Mackerras wrote:
Nick Piggin writes:
On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
Implement a more optimal mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock
On Mon, Nov 03, 2008 at 04:32:22PM +1100, Paul Mackerras wrote:
Nick Piggin writes:
This is an interesting one for me. AFAIKS it is possible to use lwsync for
a full barrier after a successful ll/sc operation, right? (or stop me here
if I'm wrong).
An lwsync would order subsequent
A previous change removed __SUBARCH_HAS_LWSYNC define, and replaced it
with __powerpc64__. smp_wmb() seems to be the last place not updated.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/system.h
smp_rmb can be lwsync if possible. Clarify the comment.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===
--- linux-2.6.orig/arch/powerpc/include/asm/system.h 2008-11-01 23:56
Hi guys,
This is an interesting one for me. AFAIKS it is possible to use lwsync for
a full barrier after a successful ll/sc operation, right? (or stop me here
if I'm wrong).
Anyway, I was interested in exploring this. Unfortunately my G5 might not
be very indicative of more modern, and future
On Sat, Nov 01, 2008 at 11:47:58AM -0500, Kumar Gala wrote:
On Nov 1, 2008, at 7:33 AM, Nick Piggin wrote:
A previous change removed __SUBARCH_HAS_LWSYNC define, and replaced it
with __powerpc64__. smp_wmb() seems to be the last place not updated.
Uugh... no.. I missed the patch
On Thu, Oct 23, 2008 at 03:43:58PM +1100, Benjamin Herrenschmidt wrote:
On Wed, 2008-10-22 at 17:59 +0200, Ingo Molnar wrote:
* Nick Piggin [EMAIL PROTECTED] wrote:
Speed up generic mutex implementations.
- atomic operations which both modify the variable and return something
On Mon, Oct 13, 2008 at 11:20:20AM -0500, Scott Wood wrote:
On Mon, Oct 13, 2008 at 11:15:47AM -0500, Scott Wood wrote:
On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
+static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
+{
+ int t;
+
+ __asm__
Implement a more optimal mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock benchmark
from 203 to 173 cycles on a G5.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
Implement a more optimal mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock benchmark
from 203 to 173 cycles on a G5.
+static inline int
+__mutex_fastpath_trylock
from 590 cycles
to 203 cycles on a ppc970 system.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/asm-generic/mutex-dec.h
===
--- linux-2.6.orig/include/asm-generic/mutex-dec.h
+++ linux-2.6/include/asm
On Wednesday 20 August 2008 07:08, Steven Rostedt wrote:
On Tue, 19 Aug 2008, Mathieu Desnoyers wrote:
Ok, there are two cases where it's ok :
1 - in stop_machine, considering we are not touching code executed in
NMI handlers.
2 - when using my replace_instruction_safe() which uses a
On Thursday 31 July 2008 03:34, Andrew Morton wrote:
On Wed, 30 Jul 2008 18:23:18 +0100 Mel Gorman [EMAIL PROTECTED] wrote:
On (30/07/08 01:43), Andrew Morton didst pronounce:
On Mon, 28 Jul 2008 12:17:10 -0700 Eric Munson [EMAIL PROTECTED]
wrote:
Certain workloads benefit if their data
On Thursday 31 July 2008 16:14, Andrew Morton wrote:
On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin [EMAIL PROTECTED]
wrote:
Do we expect that this change will be replicated in other
memory-intensive apps? (I do).
Such as what? It would be nice to see some numbers with some HPC
On Thursday 31 July 2008 21:27, Mel Gorman wrote:
On (31/07/08 16:26), Nick Piggin didst pronounce:
I imagine it should be, unless you're using a CPU with separate TLBs for
small and huge pages, and your large data set is mapped with huge pages,
in which case you might now introduce *new
On Thu, Jul 31, 2008 at 01:48:31PM -0500, Kumar Gala wrote:
Implement _PAGE_SPECIAL and pte_special() for 32-bit powerpc. This bit will
be used by the fast get_user_pages() to differentiate PTEs that correspond
to a valid struct page from special mappings that don't such as IO mappings
On Wed, Jul 30, 2008 at 03:08:40PM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2008-07-30 at 15:06 +1000, Michael Ellerman wrote:
+
+/*
+ * The performance critical leaf functions are made noinline otherwise
gcc
+ * inlines everything into a single function which results in too
On Wed, Jul 30, 2008 at 07:33:26AM -0500, Kumar Gala wrote:
On Jul 29, 2008, at 10:37 PM, Benjamin Herrenschmidt wrote:
From: Nick Piggin [EMAIL PROTECTED]
Implement lockless get_user_pages_fast for powerpc. Page table
existence
is guaranteed with RCU, and speculative page references
On Thursday 24 July 2008 20:50, Sebastien Dugue wrote:
From: Sebastien Dugue [EMAIL PROTECTED]
Date: Tue, 22 Jul 2008 11:56:41 +0200
Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree
lockless
The radix tree used by interrupt controllers for their irq reverse
mapping
This can be folded into powerpc-implement-pte_special.patch
--
Ben has now freed up a pte bit on 64k pages. Use it for special pte bit.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/asm-powerpc/pgtable-64k.h
guarantee on them.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---
arch/powerpc/Kconfig     |   3
arch/powerpc/mm/Makefile |   2
arch/powerpc/mm/gup.c    | 245
include/asm
On Tue, May 13, 2008 at 12:25:27PM -0500, Jon Tollefson wrote:
Instead of using the variable mmu_huge_psize to keep track of the huge
page size we use an array of MMU_PAGE_* values. For each supported
huge page size we need to know the hugepte_shift value and have a
pgtable_cache. The hstate
On Thursday 12 June 2008 22:14, Paul Mackerras wrote:
Nick Piggin writes:
/* turn off LED */
val64 = readq(&bar0->adapter_control);
val64 = val64 & (~ADAPTER_LED_ON);
writeq(val64, &bar0->adapter_control
On Wednesday 11 June 2008 15:35, Nick Piggin wrote:
On Wednesday 11 June 2008 15:13, Paul Mackerras wrote:
Nick Piggin writes:
I just wish we had even one actual example of things going wrong with
the current rules we have on powerpc to motivate changing to this
model.
~/usr
On Wednesday 04 June 2008 05:07, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations
On Wednesday 11 June 2008 05:19, Jesse Barnes wrote:
On Tuesday, June 10, 2008 12:05 pm Roland Dreier wrote:
me too. That's the whole basis for readX_relaxed() and its cohorts:
we make our weirdest machines (like altix) conform to the x86 norm.
Then where it really kills us we
On Wednesday 11 June 2008 13:40, Benjamin Herrenschmidt wrote:
On Wed, 2008-06-11 at 13:29 +1000, Nick Piggin wrote:
Exactly, yes. I guess everybody has had good intentions here, but
as noticed, what is lacking is coordination and documentation.
You mention strong ordering WRT spin_unlock
On Wednesday 11 June 2008 14:18, Paul Mackerras wrote:
Nick Piggin writes:
OK, I'm sitll not quite sure where this has ended up. I guess you are
happy with x86 semantics as they are now. That is, all IO accesses are
strongly ordered WRT one another and WRT cacheable memory (which includes
On Tuesday 03 June 2008 18:15, Jeremy Higdon wrote:
On Tue, Jun 03, 2008 at 02:33:11PM +1000, Nick Piggin wrote:
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Would we be able to use Ben's trick of setting a per-cpu flag in
writel() and then checking that in spin unlock issuing
On Tuesday 03 June 2008 16:53, Paul Mackerras wrote:
Nick Piggin writes:
So your readl can pass an earlier cacheable store or earlier writel?
No. It's quite gross at the moment, it has a sync before the access
(i.e. a full mb()) and a twi; isync sequence after the access that
stalls
Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not
ordered with one another, or operations to other memory types (ie.
load/load and store/store reordering is allowed). Also, as you know,
store/load reordering is explicitly
On Wednesday 04 June 2008 05:07, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations
On Wednesday 04 June 2008 00:47, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and store/store reordering is allowed). Also, as you know
On Wednesday 04 June 2008 07:44, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
I don't understand why you keep talking about DMA. Are you talking
about ordering between readX() and DMA? PCI provides those guarantees.
I guess you haven't been reading the whole thread. The
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Jeremy Higdon wrote:
We don't actually have that problem on the Altix. All writes issued
by CPU X will be ordered with respect to each other. But writes by
CPU X and CPU Y will not be, unless an mmiowb() is done by the
original CPU
On Fri, May 23, 2008 at 04:40:21PM +1000, Paul Mackerras wrote:
Nick Piggin writes:
Anyway, even if there were zero, then the point is still that you
implement that API, so you should either strongly order your
__raw_ and _relaxed then you can weaken your rmb, or you have to
strengthen
On Tue, May 13, 2008 at 12:19:36PM -0500, Jon Tollefson wrote:
Allow alloc_bm_huge_page() to be overridden by architectures that can't
always use bootmem. This requires huge_boot_pages to be available for
use by this function. The 16G pages on ppc64 have to be reserved prior
to boot-time. The
On Fri, May 23, 2008 at 12:14:41PM +1000, Paul Mackerras wrote:
Nick Piggin writes:
More than one device driver does raw/relaxed io accessors and expects the
*mb functions to order them.
Can you point us at an example?
Uh, I might be getting confused because the semantics are completely
On Fri, May 23, 2008 at 02:53:21PM +1000, Paul Mackerras wrote:
Nick Piggin writes:
There don't seem to actually be read*_relaxed calls that also use rmb
in the same file (although there is no reason why they might not appear).
But I must be thinking of are the raw_read accessors
barrier which fits the bill.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/asm-powerpc/system.h
===
--- linux-2.6.orig/include/asm-powerpc/system.h
+++ linux-2.6/include/asm-powerpc/system.h
@@ -34,7 +34,7
lwsync is the recommended method of store/store ordering on caching enabled
memory. For those subarchs which have lwsync, use it rather than eieio for
smp_wmb.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/asm-powerpc/system.h
On Wed, May 21, 2008 at 11:27:03AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 16:10 +0200, Nick Piggin wrote:
Hi,
I'm sure I've sent these patches before, but I can't remember why they
weren't merged. They still seem obviously correct to me.
We should already do all
On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
lwsync is the recommended method of store/store ordering on caching enabled
memory. For those subarchs which have lwsync, use it rather than eieio for
smp_wmb
On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
lwsync is the recommended
On Wed, May 21, 2008 at 11:43:00AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 17:34 +0200, Nick Piggin wrote:
On Wed, May 21, 2008 at 11:26:32AM -0400, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-21 at 16:12 +0200, Nick Piggin wrote:
lwsync is the recommended
On Wed, May 21, 2008 at 10:12:03PM +0200, Segher Boessenkool wrote:
From memory, I measured lwsync is 5 times faster than eieio on
a dual G5. This was on a simple microbenchmark that made use of
smp_wmb for store ordering, but it did not involve any IO access
(which presumably would
the subsequent non-trivial
patches filter up in their own time. I don't know, just a heads up.
On Wed, May 14, 2008 at 04:12:54PM -0700, Andrew Morton wrote:
From: Nick Piggin [EMAIL PROTECTED]
spufs: convert nopfn to fault
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Acked-by: Jeremy Kerr
On Tue, May 13, 2008 at 12:07:15PM -0500, Jon Tollefson wrote:
This patch set builds on Nick Piggin's patches for multi size and giant
hugetlb page support of April 22. The following set of patches adds
support for 16G huge pages on ppc64 and support for multiple huge page
sizes at the
On Tuesday 19 February 2008 20:25, Andi Kleen wrote:
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
I actually once measured context switching performance in the scheduler,
and removing the unlikely hint for testing RT tasks IIRC gave about 5%
performance drop.
OT: what
On Tuesday 19 February 2008 20:57, Andi Kleen wrote:
On Tue, Feb 19, 2008 at 08:46:46PM +1100, Nick Piggin wrote:
I think it was just a simple context switch benchmark, but not lmbench
(which I found to be a bit too variable). But it was a long time ago...
Do you still have it?
I thought
On Tuesday 19 February 2008 01:39, Andi Kleen wrote:
Arjan van de Ven [EMAIL PROTECTED] writes:
you have more faith in the authors knowledge of how his code actually
behaves than I think is warranted :)
iirc there was a mm patch some time ago to keep track of the actual
unlikely values at
On Tuesday 19 February 2008 13:40, Arjan van de Ven wrote:
On Tue, 19 Feb 2008 13:33:53 +1100
Nick Piggin [EMAIL PROTECTED] wrote:
Actually one thing I don't like about gcc is that I think it still
emits cmovs for likely/unlikely branches, which is silly (the gcc
developers seem
On Tuesday 19 February 2008 16:58, Willy Tarreau wrote:
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
Note in particular the last predictors; assuming branch ending
with goto, including call, causing early function return or
returning negative constant are not taken. Just