Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

2007-08-17 Thread Krishna Kumar2
Hi Dave, I ran 3 iterations of 45 sec tests (total 1 hour 16 min, but I will run a longer one tonight). The results are (results in KB/s, and %): I ran an 8.5-hour run with no batching + another 8.5-hour run with batching (Buffer sizes: 32 128 512 4096 16384, Threads: 1 8 32, Each test run

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Herbert Xu wrote: On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote: The cost of doing so seems to me to be well down in the noise - 44 bytes of extra kernel text on a ppc64 G5 config, and I don't believe the extra few cycles for the occasional extra

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Geert Uytterhoeven
On Thu, 16 Aug 2007, Linus Torvalds wrote: On Fri, 17 Aug 2007, Paul Mackerras wrote: I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions) for a G5 config. One of the things that volatile generally screws up is a simple

[PATCH] phy layer: fix genphy_setup_forced (don't reset)

2007-08-17 Thread Domen Puncer
Writing BMCR_RESET bit will reset MII_BMCR to default values. This is clearly not what we want. Signed-off-by: Domen Puncer [EMAIL PROTECTED] --- drivers/net/phy/phy_device.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: work-powerpc.git/drivers/net/phy/phy_device.c

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: #define atomic_read_volatile(v) \ ({ \ forget((v)->counter); \ ((v)->counter); \ }) where: *vomit* :)

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Thu, 16 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote: The compiler can also reorder non-volatile accesses. For an example patch that cares about this, please see:

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Stefan Richter
Nick Piggin wrote: I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. Which documentation is there? For driver authors, there is LDD3. It

Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-17 Thread Peter Zijlstra
On Thu, 2007-08-16 at 09:01 -0700, Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote: There seem to be some unbalanced rcu_read_{,un}lock() issues of late, how about doing something like this: This will break when rcu_read_lock() and

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Stefan Richter wrote: Nick Piggin wrote: I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. Which documentation is there?

Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-17 Thread Eric Dumazet
On Fri, 17 Aug 2007 07:33:39 +0800 Herbert Xu [EMAIL PROTECTED] wrote: On Thu, Aug 16, 2007 at 05:40:44PM +0200, Eric Dumazet wrote: So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? I think a workqueue would be the best

Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-17 Thread Herbert Xu
On Fri, Aug 17, 2007 at 10:10:30AM +0200, Eric Dumazet wrote: Will a workqueue react the same in case of a DDOS situation, where softirq could use all CPU cycles to handle incoming packets and feed the GC list, and GC would never have a chance to scan and free some items ? Well when that

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Paul Mackerras wrote: Herbert Xu writes: On Fri, Aug 17, 2007 at 03:09:57PM +1000, Paul Mackerras wrote: Herbert Xu writes: Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? There are some in

Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-17 Thread Eric Dumazet
On Fri, 17 Aug 2007 16:15:22 +0800 Herbert Xu [EMAIL PROTECTED] wrote: On Fri, Aug 17, 2007 at 10:10:30AM +0200, Eric Dumazet wrote: Will a workqueue react the same in case of a DDOS situation, where softirq could use all CPU cycles to handle incoming packets and feed the GC list, and

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: #define atomic_read_volatile(v) \ ({ \ forget((v)->counter); \ ((v)->counter); \

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Herbert Xu wrote: On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote: BTW, the sort of missing barriers that triggered this thread aren't that subtle. It'll result in a simple lock-up if the loop condition holds upon entry. At which

Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-08-17 Thread Richard MUSIL
Thomas Graf wrote: @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops) if (ops->policy) ops->flags |= GENL_CMD_CAP_HASPOL; -genl_lock(); +genl_fam_lock(family); list_add_tail(&ops->ops_list, &family->ops_list); -

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Stefan Richter wrote: [...] Just use spinlocks if you're not absolutely clear about potential races and memory ordering issues -- they're pretty cheap and simple. I fully agree with this. As Paul Mackerras mentioned elsewhere, a lot of authors

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Andi Kleen
On Friday 17 August 2007 05:42, Linus Torvalds wrote: On Fri, 17 Aug 2007, Paul Mackerras wrote: I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions) for a G5 config. One of the things that volatile generally screws up is a simple

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Sure, now that I learned of these properties I can start to audit code and insert barriers where I believe they are needed, but this simply means that almost all occurrences of atomic_read will get barriers (unless there already

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Also, why would you want to make these insane accessors for atomic_t types? Just make sure everybody knows the basics of barriers, and they can apply that knowledge to atomic_t and all other lockless memory accesses as well.

[PATCH 2.6.22.3] ppp: fix output buffer size in ppp_decompress_frame

2007-08-17 Thread Konstantin Sharlaimov
This patch addresses the issue with osize too small errors in mppe encryption. The patch fixes the issue with wrong output buffer size being passed to ppp decompression routine. Signed-off-by: Konstantin Sharlaimov [EMAIL PROTECTED] --- As pointed out by Suresh Mahalingam, the issue addressed by

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul Mackerras
Satyam Sharma writes: I wonder if this'll generate smaller and better code than _both_ the other atomic_read_volatile() variants. Would need to build allyesconfig on lots of diff arch's etc to test the theory though. I'm sure it would be a tiny effect. This whole thread is arguing about

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Sure, now that I learned of these properties I can start to audit code and insert barriers where I believe they are needed, but this simply means that almost all

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Andi Kleen wrote: On Friday 17 August 2007 05:42, Linus Torvalds wrote: On Fri, 17 Aug 2007, Paul Mackerras wrote: I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions) for a G5 config. One of the

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Stefan Richter
Nick Piggin wrote: Satyam Sharma wrote: And we have driver / subsystem maintainers such as Stefan coming up and admitting that often a lot of code that's written to use atomic_read() does assume the read will not be elided by the compiler. So these are broken on i386 and x86-64? The

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. Probably you didn't mean that, but no, schedule() is not barrier because it sleeps. It's a barrier because it's

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: I think they would both be equally ugly, You think both these are equivalent in terms of looks: | while (!atomic_read(v)) { | while (!atomic_read_xxx(v)) { ...

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than special volatile accesses. This is obviously just a taste thing.

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. Probably you didn't mean that, but no, schedule() is not barrier

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: [...] You think both these are equivalent in terms of looks: | while (!atomic_read(v)) { | while (!atomic_read_xxx(v)) { ... |

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. Probably you didn't mean that, but no, schedule() is not

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than special

[PATCH 3/6] ibmveth: Add ethtool TSO handlers

2007-08-17 Thread Brian King
Add handlers for get_tso and get_ufo to prevent errors being printed by ethtool. Signed-off-by: Brian King [EMAIL PROTECTED] --- linux-2.6-bjking1/drivers/net/ibmveth.c |2 ++ 1 file changed, 2 insertions(+) diff -puN drivers/net/ibmveth.c~ibmveth_ethtool_get_tso drivers/net/ibmveth.c ---

[PATCH 4/6] ibmveth: Add ethtool driver stats hooks

2007-08-17 Thread Brian King
Add ethtool hooks to ibmveth to retrieve driver statistics. Signed-off-by: Brian King [EMAIL PROTECTED] --- linux-2.6-bjking1/drivers/net/ibmveth.c | 51 1 file changed, 51 insertions(+) diff -puN drivers/net/ibmveth.c~ibmveth_ethtool_driver_stats

[PATCH 5/6] ibmveth: Remove dead frag processing code

2007-08-17 Thread Brian King
Removes dead frag processing code from ibmveth. Since NETIF_F_SG was not set, this code was never executed. Also, since the ibmveth interface can only handle 6 fragments, core networking code would need to be modified in order to efficiently enable this support. Signed-off-by: Brian King [EMAIL

[PATCH 6/6] ibmveth: Remove use of bitfields

2007-08-17 Thread Brian King
Removes the use of bitfields from the ibmveth driver. This results in slightly smaller object code. Signed-off-by: Brian King [EMAIL PROTECTED] --- linux-2.6-bjking1/drivers/net/ibmveth.c | 90 linux-2.6-bjking1/drivers/net/ibmveth.h | 56

[PATCH 2/6] ibmveth: Implement ethtool hooks to enable/disable checksum offload

2007-08-17 Thread Brian King
This patch adds the appropriate ethtool hooks to allow for enabling/disabling of hypervisor assisted checksum offload for TCP. Signed-off-by: Brian King [EMAIL PROTECTED] --- linux-2.6-bjking1/drivers/net/ibmveth.c | 125 +++-

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: On Thu, 16 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote: The compiler can also reorder non-volatile

Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-17 Thread Paul E. McKenney
On Fri, Aug 17, 2007 at 09:56:45AM +0200, Peter Zijlstra wrote: On Thu, 2007-08-16 at 09:01 -0700, Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote: There seem to be some unbalanced rcu_read_{,un}lock() issues of late, how about doing something

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Linus Torvalds
On Fri, 17 Aug 2007, Nick Piggin wrote: That's not obviously just taste to me. Not when the primitive has many (perhaps, the majority) of uses that do not require said barriers. And this is not solely about the code generation (which, as Paul says, is relatively minor even on x86). I

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler

[PATCH 3/7] fs_enet: Don't share the interrupt.

2007-08-17 Thread Scott Wood
This driver can't handle an interrupt immediately after request_irq (making it fail with CONFIG_DEBUG_SHIRQ), and has unshared interrupts on all hardware I'm aware of. Signed-off-by: Scott Wood [EMAIL PROTECTED] --- drivers/net/fs_enet/fs_enet-main.c |2 +- 1 files changed, 1 insertions(+),

[PATCH 5/7] fs_enet: Align receive buffers.

2007-08-17 Thread Scott Wood
At least some hardware driven by this driver needs receive buffers to be aligned on a 16-byte boundary. This usually happens by chance, but it breaks if slab debugging is enabled. Signed-off-by: Scott Wood [EMAIL PROTECTED] --- drivers/net/fs_enet/fs_enet-main.c | 21 +++--

[PATCH 7/7] fs_enet: sparse fixes

2007-08-17 Thread Scott Wood
Mostly a bunch of __iomem annotations. Signed-off-by: Scott Wood [EMAIL PROTECTED] --- drivers/net/fs_enet/fs_enet-main.c | 16 +- drivers/net/fs_enet/fs_enet.h | 30 +- drivers/net/fs_enet/mac-fcc.c | 60 ---

[PATCH 6/7] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.

2007-08-17 Thread Scott Wood
The existing OF glue code was crufty and broken. Rather than fix it, it will be removed, and the ethernet driver now talks to the device tree directly. The old, non-CONFIG_PPC_CPM_NEW_BINDING code can go away once CPM platforms are dropped from arch/ppc (which will hopefully be soon), and

[PATCH 0/7] fs_enet patches

2007-08-17 Thread Scott Wood
The following patchset includes several updates for the fs_enet driver, the most prominent being conversion to an of_platform device (with platform_device code remaining until arch/ppc goes away). It also includes a generic MDIO bitbang library, and converts fs_enet to use it. I have a powerpc

[PATCH 6/7 v2] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.

2007-08-17 Thread Scott Wood
The existing OF glue code was crufty and broken. Rather than fix it, it will be removed, and the ethernet driver now talks to the device tree directly. The old, non-CONFIG_PPC_CPM_NEW_BINDING code can go away once CPM platforms are dropped from arch/ppc (which will hopefully be soon), and

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Segher Boessenkool wrote: atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll
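
For readers following the thread, the busy-wait pattern under discussion can be sketched in user space as below. This is only an illustration under stated assumptions: it relies on GCC or Clang inline asm, `cpu_relax_sketch()` is a hypothetical stand-in for the kernel's `cpu_relax()` (here reduced to a plain compiler barrier), and `wait_for_flag()` and its spin limit are invented for the example.

```c
/*
 * Compiler barrier: an empty asm with a "memory" clobber. The compiler
 * must assume memory changed, so values cached in registers are reloaded.
 */
#define barrier() __asm__ __volatile__ ("" : : : "memory")

/* Hypothetical stand-in for cpu_relax(); the real one is per-architecture. */
#define cpu_relax_sketch() barrier()

/*
 * Spin until *flag becomes non-zero or max_spins iterations have passed.
 * Returns the number of spins taken, or -1 if the flag never became set.
 * The barrier in the loop body forces *flag to be re-read each iteration,
 * even though flag is not declared volatile.
 */
static int wait_for_flag(int *flag, int max_spins)
{
	int spins = 0;

	while (!*flag && spins < max_spins) {
		cpu_relax_sketch();
		spins++;
	}
	return *flag ? spins : -1;
}
```

The point of the sketch is that the re-read is requested at the loop, where it is needed, rather than by making every access to the variable volatile.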

Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-17 Thread Corey Minyard
Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 09:56:45AM +0200, Peter Zijlstra wrote: On Thu, 2007-08-16 at 09:01 -0700, Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote: There seem to be some unbalanced rcu_read_{,un}lock() issues of

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Chris Friesen
Linus Torvalds wrote: - in other words, the *only* possible meaning for volatile is a purely single-CPU meaning. And if you only have a single CPU involved in the process, the volatile is by definition pointless (because even without a volatile, the compiler is required to make the

Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-17 Thread Paul E. McKenney
On Fri, Aug 17, 2007 at 08:53:57AM -0700, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 09:56:45AM +0200, Peter Zijlstra wrote: On Thu, 2007-08-16 at 09:01 -0700, Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote: There seem to be some

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: On Thu, 16 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: First of all, I think this illustrates that what you want

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Sat, Aug 18, 2007 at 12:01:38AM +0530, Satyam Sharma wrote: On Fri, 17 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote: On Thu, 16 Aug 2007, Paul E. McKenney wrote: On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Arjan van de Ven
On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote: Linus Torvalds wrote: - in other words, the *only* possible meaning for volatile is a purely single-CPU meaning. And if you only have a single CPU involved in the process, the volatile is by definition pointless (because

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Fri, Aug 17, 2007 at 11:54:33AM -0700, Arjan van de Ven wrote: On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote: Linus Torvalds wrote: - in other words, the *only* possible meaning for volatile is a purely single-CPU meaning. And if you only have a single CPU involved

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
Isn't RDMA _part_ of the software net stack within Linux? It very much is not so. This is just nit-picking. You can draw the boundary of the software net stack wherever you want, but I think Sean's point was just that RDMA drivers already are part of Linux, and we all want them to get

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Arjan van de Ven
On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote: What about reading values modified in interrupt handlers, as in your random case? Or is this a bug where the user of atomic_read() is invalidly expecting a read each time it is called? the interrupt handler case is an SMP

[PATCH] e1000e: Update e1000e driver to use devres

2007-08-17 Thread Brandon Philips
Conversion of e1000e probe() and remove() to devres. Depends on [patch 1/4] Update net core to use devres. Signed-off-by: Brandon Philips [EMAIL PROTECTED] --- drivers/net/e1000e/netdev.c | 70 ++-- 1 file changed, 17 insertions(+), 53 deletions(-)

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 12:52:39 -0700 When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO and LRO and who

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Of course, since *normal* accesses aren't necessarily limited wrt re-ordering, the question then becomes one of "with regard to *what* does it limit re-ordering?". A C compiler that re-orders two different volatile accesses that have a sequence point in between them is pretty clearly a buggy

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
(and yes, it is perfectly legitimate to want a non-volatile read for a data type that you also want to do atomic RMW operations on) ...which is undefined behaviour in C (and GCC) when that data is declared volatile, which is a good argument against implementing atomics that way in itself.

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
In a reasonable world, gcc should just make that be (on x86) addl $1,i(%rip) on x86-64, which is indeed what it does without the volatile. But with the volatile, the compiler gets really nervous, and doesn't dare do it in one instruction, and thus generates crap like movl

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Now the second wording *IS* technically correct, but come on, it's 24 words long whereas the original one was 3 -- and hopefully anybody reading the shorter phrase *would* have known anyway what was meant, without having to be pedantic about it :-) Well you were talking pretty formal (and

[RFC] restore netdev_priv optimization

2007-08-17 Thread Stephen Hemminger
Compile tested only!!! Fix optimization of netdev_priv() lost by the addition of multiqueue. Move the variable size subqueues to after the constant size priv area. When putting back the old netdev_priv() code, I tried to make it clearer by using roundup() and ALIGN() macros. ---

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) [ This is exactly equivalent to using "+m" in the constraints, as recently explained on a GCC list somewhere, in response to the patch in my bitops series a few weeks back where I thought "+m" was bogus. ] [It wasn't explained
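
A minimal user-space sketch of the forget() idea quoted above may help. It assumes GCC or Clang (extended inline asm and statement expressions); the type `my_atomic_t` and the helper `read_twice_sum()` are hypothetical, and the quoting of the asm constraints is a reconstruction, since the archive stripped the quote characters from the original mail.

```c
/* Hypothetical stand-in for the kernel's atomic_t; not the real definition. */
typedef struct { int counter; } my_atomic_t;

/*
 * Empty asm that names (a) as both a memory output and a memory input.
 * The compiler must assume the asm may have changed (a), so it cannot
 * reuse a copy of (a) held in a register across this point.
 */
#define forget(a) __asm__ __volatile__ ("" : "=m" (a) : "m" (a))

/* As proposed in the thread: forget any cached copy, then read the value. */
#define atomic_read_volatile(v) \
	({ \
		forget((v)->counter); \
		((v)->counter); \
	})

/* Hypothetical helper: each use of the macro performs its own load. */
static int read_twice_sum(my_atomic_t *v)
{
	return atomic_read_volatile(v) + atomic_read_volatile(v);
}
```

Unlike a full "memory" clobber, this form only tells the compiler to forget the one object named in the constraints, which is the property the thread is debating.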

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
Here, I should obviously admit that the semantics of *(volatile int *) aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) precisely what constitutes as access to object of volatile-qualified type is implementation-defined, but GCC does help us

Re: [RFC] restore netdev_priv optimization

2007-08-17 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 15:40:22 -0700 Compile tested only!!! Obviously. The first loopback transmit is guaranteed to crash. Fix optimization of netdev_priv() lost by the addition of multiqueue. Move the variable size subqueues to after the constant

Re: [RFC] restore netdev_priv optimization

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 16:04:09 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 15:40:22 -0700 Compile tested only!!! Obviously. The first loopback transmit is guaranteed to crash. That is fixable. Fix optimization of

[PATCH] ethernet: optimize memcpy and memset

2007-08-17 Thread Stephen Hemminger
The ethernet header management only needs to handle a fixed size address (6 bytes). If the memcpy/memset are changed to be passed a constant length, then compiler can optimize for this case (and if it is smart eliminate string instructions). Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] ---
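
The effect described in this patch summary can be illustrated with a short sketch. This is not the patch itself: `struct eth_hdr_sketch` and both helper functions are hypothetical names invented here, and `ETH_ALEN` is defined locally to mirror the kernel's 6-byte constant.

```c
#include <string.h>

/* Ethernet hardware addresses are 6 bytes; the kernel calls this ETH_ALEN. */
#define ETH_ALEN 6

/* Hypothetical header layout for illustration, not struct ethhdr itself. */
struct eth_hdr_sketch {
	unsigned char dest[ETH_ALEN];
	unsigned char source[ETH_ALEN];
	unsigned short proto;
};

/*
 * Variable length: len is only known at run time, so the compiler
 * generally has to emit a general-purpose copy.
 */
static void set_addrs_variable(struct eth_hdr_sketch *h,
			       const unsigned char *src,
			       const unsigned char *dst, size_t len)
{
	memcpy(h->source, src, len);
	memcpy(h->dest, dst, len);
}

/*
 * Constant length: with ETH_ALEN known at compile time, an optimizing
 * compiler is free to expand each copy into a few plain loads and stores.
 */
static void set_addrs_constant(struct eth_hdr_sketch *h,
			       const unsigned char *src,
			       const unsigned char *dst)
{
	memcpy(h->source, src, ETH_ALEN);
	memcpy(h->dest, dst, ETH_ALEN);
}
```

Both functions produce the same header contents; whether the constant-length version is actually faster depends on the compiler and flags, so comparing the generated assembly is the way to confirm it.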

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO and LRO and who knows what else. Not true at all. Full classification and

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Sat, 18 Aug 2007, Segher Boessenkool wrote: #define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) [ This is exactly equivalent to using "+m" in the constraints, as recently explained on a GCC list somewhere, in response to the patch in my bitops series a few weeks back

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Sat, 18 Aug 2007, Segher Boessenkool wrote: No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. volatile has nothing to do with reordering. If you're

[PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
The network code does memset for 6 and 8 byte values that can easily be optimized into simple assignments without string instructions. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/asm-i386/string.h 2007-08-17 15:14:37.0 -0700 +++ b/include/asm-i386/string.h

Re: [RFC] restore netdev_priv optimization

2007-08-17 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 16:19:28 -0700 The subqueue is only referenced in start/stop queue and that only happens once per packet on normal tx, and only if multiqueue is used. If it only happens when multiqueue, then why does loopback need at least one

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a)) [ This is exactly equivalent to using "+m" in the constraints, as recently explained on a GCC list somewhere, in response to the patch in my bitops series a few weeks back where I thought "+m" was bogus. ] [It wasn't explained

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 08:50:30PM -0700, Linus Torvalds wrote: Just try it yourself: volatile int i; int j; int testme(void) { return i = 1; } int testme2(void) { return j = 1; } and compile with all

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 16:31:07 -0700 When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. Same thing with TSO

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
atomic_dec() writes to memory, so it _does_ have volatile semantics, implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. I don't think an atomic_dec() implemented as an inline asm volatile or one that uses a forget macro

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Herbert Xu
On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) I had totally forgotten that I'd already filed that bug more than six years ago until they just closed yours as a duplicate of mine :) Good luck in getting it

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. volatile has nothing to do with reordering. If you're talking of volatile the type-qualifier keyword, then

[RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread Stephen Hemminger
Fix optimization of netdev_priv() lost by the addition of multiqueue. Only configurations that define MULTIQUEUE need the extra overhead in netdevice structure and the loss of the netdev_priv optimization. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/linux/netdevice.h

Re: [PATCH] e1000e: Update e1000e driver to use devres

2007-08-17 Thread Tejun Heo
Brandon Philips wrote: Conversion of e1000e probe() and remove() to devres. Depends on [patch 1/4] Update net core to use devres. Signed-off-by: Brandon Philips [EMAIL PROTECTED] Acked-by: Tejun Heo [EMAIL PROTECTED] -- tejun - To unsubscribe from this list: send the line unsubscribe

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 17:49:09 -0700 Fix optimization of netdev_priv() lost by the addition of multiqueue. Only configurations that define MULTIQUEUE need the extra overhead in netdevice structure and the loss of the netdev_priv optimization.

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) I had totally forgotten that I'd already filed that bug more than six years ago until they

RE: e1000 autotuning doesn't get along with itself

2007-08-17 Thread Brandeburg, Jesse
Rick Jones wrote: Hi Rick, allow me to respond on my way out on a Friday... :-) hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 15 TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread Kok, Auke
David Miller wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 17:49:09 -0700 Fix optimization of netdev_priv() lost by the addition of multiqueue. Only configurations that define MULTIQUEUE need the extra overhead in netdevice structure and the loss of the netdev_priv

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Christoph Lameter
On Fri, 17 Aug 2007, Paul E. McKenney wrote: On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: gcc bugzilla bug #33102, for whatever that ends up being worth. ;-) I had totally forgotten that I'd already

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:21:25 -0700 Kok, Auke [EMAIL PROTECTED] wrote: David Miller wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 17:49:09 -0700 Fix optimization of netdev_priv() lost by the addition of multiqueue. Only configurations that define MULTIQUEUE

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Christoph Lameter wrote: On Fri, 17 Aug 2007, Paul E. McKenney wrote: On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote: On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote: gcc bugzilla bug #33102, for whatever that ends up being

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread David Miller
From: Kok, Auke [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 18:21:25 -0700 this sounds highly optimistic (64 queues is enough for everyone?) and probably will be quickly outdated by both hardware and demand... As such drivers appear in the tree we can adjust the value. Even the most aggressively

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 18:28:07 -0700 Plan C was replacing MULTIQUEUE boolean with an int value 1 ... 256. All this was a one-day "what if" exercise, not really a big churn. Yes, that's another reasonable approach.

[PATCH] atm: replace DPRINTK() with pr_debug

2007-08-17 Thread Stephen Hemminger
Get rid of using DPRINTK macro in ATM and use pr_debug (in kernel.h). Using the standard macro is cleaner and forces code to check for bad arguments and formatting. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/net/atm/clip.c 2007-08-17 15:05:49.0 -0400 +++ b/net/atm/clip.c
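
The point about forcing the code to check for bad arguments and formatting can be illustrated with a small user-space sketch. This shows one common way to get that effect and is not a copy of the kernel's pr_debug definition, which may differ. The names MY_DEBUG, my_pr_debug and report_vcc are hypothetical, and the `##__VA_ARGS__` form is a GNU C extension.

```c
#include <stdio.h>

/* Hypothetical user-space analogue of a debug-print macro. */
#ifdef MY_DEBUG
#define my_pr_debug(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#else
/*
 * Disabled case: the call sits under `if (0)`, so the compiler still
 * type-checks the format string against the arguments and then drops
 * the call as dead code. A macro that expands to nothing would hide
 * format bugs until someone turned debugging on.
 */
#define my_pr_debug(fmt, ...) \
	do { if (0) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)
#endif

/* Example caller; returns vcc unchanged so the call is observable. */
static int report_vcc(int vcc)
{
	my_pr_debug("clip: vcc %d\n", vcc);   /* checked against %d */
	return vcc;
}
```

With format warnings enabled (for example -Wformat in GCC), passing a mismatched argument to my_pr_debug should be diagnosed whether or not MY_DEBUG is defined.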

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Sat, 18 Aug 2007, Segher Boessenkool wrote: atomic_dec() writes to memory, so it _does_ have volatile semantics, implicitly, as long as the compiler cannot optimise the atomic variable away completely -- any store counts as a side effect. I don't think an

Re: [RFC] restore netdev_priv optimization (planb)

2007-08-17 Thread Kok, Auke
David Miller wrote: From: Kok, Auke [EMAIL PROTECTED] Date: Fri, 17 Aug 2007 18:21:25 -0700 this sounds highly optimistic (64 queues is enough for everyone?) and probably will be quickly outdated by both hardware and demand... As such drivers appear in the tree we can adjust the value. Even

[PATCH] net/802: indentation cleanup

2007-08-17 Thread Stephen Hemminger
Run the 802 related protocols through Lindent (and hand cleanup) to fix indentation and whitespace style issues. --- a/net/802/fc.c 2007-08-17 14:39:56.0 -0400 +++ b/net/802/fc.c 2007-08-17 14:50:52.0 -0400 @@ -44,33 +44,29 @@ static int fc_header(struct sk_buff *skb

Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven
On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote: The network code does memset for 6 and 8 byte values, that can easily be optimized into simple assignments without string instructions. so... question. Why are we doing this by hand? Wouldn't gcc just generate this code in the first
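
As a rough illustration of what "simple assignments" means here, the following user-space sketch fills 6 or 8 bytes with a few fixed-width stores instead of a byte loop. It is not the i386 patch. The helper names fill6 and fill8 are hypothetical, and fixed-size memcpy is used only as the portable way to express an unaligned word store, which compilers typically lower to a single move.

```c
#include <stdint.h>
#include <string.h>

/* Fill 6 bytes (e.g. an Ethernet address) with byte c:
 * one 4-byte store followed by one 2-byte store. */
static inline void fill6(void *p, unsigned char c)
{
	uint32_t w = (uint32_t)c * 0x01010101u;          /* c in all 4 bytes */
	uint16_t h = (uint16_t)((unsigned int)c * 0x0101u); /* c in both bytes */

	memcpy(p, &w, sizeof w);
	memcpy((char *)p + 4, &h, sizeof h);
}

/* Fill 8 bytes with byte c using a single 8-byte store. */
static inline void fill8(void *p, unsigned char c)
{
	uint64_t q = (uint64_t)c * 0x0101010101010101ull; /* c in all 8 bytes */

	memcpy(p, &q, sizeof q);
}
```

Whether hand-written helpers like these beat what the compiler already emits for a constant-size memset is exactly the question raised in the reply.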

Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:49:34 -0700 Arjan van de Ven [EMAIL PROTECTED] wrote: On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote: The network code does memset for 6 and 8 byte values, that can easily be optimized into simple assignments without string instructions. so...

Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven
On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote: On Fri, 17 Aug 2007 18:49:34 -0700 Arjan van de Ven [EMAIL PROTECTED] wrote: On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote: The network code does memset for 6 and 8 byte values, that can easily be optimized

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma
On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: I didn't quite understand what you said here, so I'll tell what I think: * foo() is a compiler barrier if the definition of foo() is invisible to the compiler at a callsite. * foo() is also a compiler barrier if the

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
The asm volatile implementation does have exactly the same reordering guarantees as the volatile cast thing, I don't think so. asm volatile creates a side effect. Yeah. Side effects aren't allowed to be reordered wrt sequence points. Yeah. This is exactly the same reason as why
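
For readers unfamiliar with the two constructs being compared, here is a minimal GNU C sketch of what a volatile-cast read and an asm-volatile compiler barrier can look like. It takes no side on whether they give the same reordering guarantees, which is the disputed point, and it is not the code from the patch series. The names my_atomic_t, my_barrier, my_atomic_read_cast, my_atomic_read_barrier and wait_nonzero are hypothetical.

```c
/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct { int counter; } my_atomic_t;

/*
 * Compiler-only barrier (GNU C): an empty asm with a "memory" clobber.
 * It emits no instruction, but the compiler may not keep memory values
 * cached in registers across it or move memory accesses past it.
 */
#define my_barrier() __asm__ __volatile__("" ::: "memory")

/* Variant A: read through a volatile-qualified lvalue. */
static inline int my_atomic_read_cast(const my_atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Variant B: plain read preceded by a compiler barrier. */
static inline int my_atomic_read_barrier(const my_atomic_t *v)
{
	my_barrier();
	return v->counter;
}

/*
 * Poll the counter up to max_spins times. The volatile read forces a
 * fresh load on every iteration instead of one hoisted load.
 * Returns 1 if a nonzero value was seen, 0 otherwise.
 */
static int wait_nonzero(const my_atomic_t *v, long max_spins)
{
	while (max_spins-- > 0)
		if (my_atomic_read_cast(v) != 0)
			return 1;
	return 0;
}
```

Neither variant is a hardware memory barrier; both only constrain what the compiler may do.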
