group ownership of tun devices -- nonfunctional?

2007-08-17 Thread Mike Mohr
Per the post here:

http://lkml.org/lkml/2007/6/18/228

it appears that the group ownership patch has made it into .23.  I am
using these patches, amongst which the kernel component appears to be
identical:

http://sigxcpu.org/unsorted-patches/0001-allow-tun-ownership-by-group.patch
http://sigxcpu.org/unsorted-patches/tunctl_gid.diff

I can create devices that are owned by my user account (tunctl -u
`whoami` -t tap0) and that works fine.  However, if I use group
permissions with -g it stops working.  In all cases, if I pass -g,
the interface is created correctly but it is unusable as a
non-root user.

So my question is: am I doing something wrong?  If I am, I don't see
it.  If I am not doing anything wrong on my end, then presumably
something is missing from the kernel patch I applied.  I read over it
and I can't see any issues, especially considering that tunctl comes
back without error (even with -g) and creates an interface.

Just wondering if this was an issue that should be looked into--

Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Thinking outside the box on file systems

2007-08-17 Thread Kyle Moffett

On Aug 17, 2007, at 15:01:48, Phillip Susi wrote:

> [EMAIL PROTECTED] wrote:
> > It will become even *more* of a "not that common" if the lock will
> > block moves and ACL changes *across the filesystem* for
> > potentially *minutes* at a time.
>
> It will not take anywhere NEAR minutes at a time to update the in
> memory dentries, more like 50ms.


One last comment:

50ms to update in-memory dentries would be FRIGGING TERRIBLE!!!   
Using Perl, an interpreted language, the following script takes 3.39s  
to run on one of my lower-end systems:


for (0 .. 10000) {
    mkdir "a-$_";
    mkdir "b-$_";
    rename "a-$_", "b-$_";
}

It's not even deleting things afterwards so it's populating a  
directory with ten thousand entries.  We can easily calculate  
10,000/3.39 = 2,949 entries per second, or 0.339 milliseconds per entry.


When I change it to rmdir things instead, the runtime goes down to  
2.89s == 3460 entries/sec == 0.289 milliseconds per entry.


If such a scheme even increases the overhead of a directory rename by  
a hundredth of a millisecond on that box it would easily be a 2-3%  
performance hit.  Given that people tend to kill for 1% performance  
boosts, that's not likely to be a good idea.
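
The arithmetic behind that estimate can be checked with a small sketch.
The function names are made up for illustration.  The inputs are the
figures quoted above (10,000 entries in 3.39s or 2.89s), and the 0.01 ms
penalty is the hypothetical extra cost per rename.

```c
#include <assert.h>
#include <stdio.h>

/* Average cost of one loop iteration, in milliseconds. */
double ms_per_entry(double total_seconds, int entries)
{
    assert(entries > 0);
    return total_seconds * 1000.0 / entries;
}

/* Relative slowdown, in percent, if every entry costs extra_ms more. */
double overhead_percent(double base_ms, double extra_ms)
{
    return extra_ms / base_ms * 100.0;
}

/* Print the per-entry cost and the impact of a 0.01 ms penalty. */
void report(const char *label, double total_seconds, int entries)
{
    double base = ms_per_entry(total_seconds, entries);

    printf("%s: %.3f ms/entry, +0.01 ms costs about %.1f%%\n",
           label, base, overhead_percent(base, 0.01));
}
```

Calling report("mkdir+rename", 3.39, 10000) and
report("with rmdir", 2.89, 10000) gives a slowdown of roughly 3% in
both cases, which is where the 2-3% figure comes from.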


Cheers,
Kyle Moffett



Re: Marvell 88E8056 gigabit ethernet controller

2007-08-17 Thread Willy Tarreau
On Fri, Aug 17, 2007 at 05:03:41PM -0700, Stephen Hemminger wrote:
> On Fri, 17 Aug 2007 05:42:13 -0700 (PDT)
> Kevin E <[EMAIL PROTECTED]> wrote:
> 
> > Hi all,
> > 
> > I've read where the onboard Marvell lan controller on
> > some Gigabyte boards don't work.  I've got two systems
> > using the same Gigabyte board, on one the LAN works on
> > the other it dies like described by others.  Here's
> > the systems:
> > 
> > 
> > Working system:
> > Gigabyte 965P-DS3 rev 3.3  (BIOS F10)
> > Core2 Q6600
> > 2GB Corsair XMS2 memory
> > kernel 2.6.22.3
> > 
> > lspci for LAN controller:
> > 04:00.0 Ethernet controller: Marvell Technology Group
> > Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev
> > 14)
> > 
> > 
> > Broken system:
> > Gigabyte 965P-DS3 rev 3.3  (BIOS F10)
> > Core2 E4400
> > 2GB Corsair XMS2 memory
> > kernel 2.6.22.3
> > 
> > lspci for LAN controller:
> > 03:00.0 Ethernet controller: Marvell Technology Group
> > Ltd. Unknown device 4364 (rev 12)
> > 
> > 
> > The BIOS for the two systems are setup the same and
> > the config for the kernels are the same too.  I've
> > actually tried taking the kernel from the working
> > system and booting it on the broken one but still the
> > LAN dies after a couple of seconds.  The working
> > system has one card plugged in (nvidia based PCI-X
> > video card), I've taken that card and plugged into the
> > broken system, booted the same kernel, and it still
> > dies after a while.
> > 
> > I will gladly provide any info needed if it can help
> > in getting this chipset working on the Gigabyte
> > boards.
> > 
> > Thanks,
> > Kevin
> 
> I maintain the sky2 driver, and have one of the (buggy) Gigabyte motherboards.
> It is interesting that the problem seems to track with video card.

No Stephen, look again, he says that moving the video card into the broken
system does not change anything.

> Are you using the Nvidia binary driver?
> The video card in the system I have troubles with is:
>   ATI Technologies Inc RV370 [Radeon X300SE]
> 
> Surprisingly, using other PCI-E cards with same driver (different Marvell chips)
> has no problem.  Vendor version of sk98lin driver has same failure mode
> on the buggy hardware.
> 
> You might want to look at lspci -vvv output on two system to see if there
> are differences. Perhaps there is a CPU speed dependency?

I don't understand why the working one is on PCI bus 4 while the other
is on PCI bus 3. It's just as if the chip embedded a PCI bridge. Maybe
those chips are just cheaper dual-channel controllers with one faulty
controller disabled. It would also explain why the PCI ID is different.

Willy



Re: kernel 2.6.19 porting

2007-08-17 Thread Willy Tarreau
On Fri, Aug 17, 2007 at 06:24:17PM +0200, Xu Yang wrote:
> Hi everyone,
> 
> I have built up a software virtual prototype system with ARM11MPCORE.
> my system is similar with realview_eb_mpcore. but my system is
> simpler, which has only a mpcore , uart0, console,timer0_1, and
> memorys.
> 
> I want to run kernel 2.6.19 on it. I have done some modification, and
> now I can just print out some information onto the console, and stuck
> at
> 
> <7>Calibrating delay loop...
> 
> Could you guys give me some suggestion that what other adaptions
> should I do according to my system?

It's likely that one of your timers is not working. Adding a few
printk() calls to the calibration function should help you.

> any suggestion is welcome:)

If you upgrade to a more recent kernel (e.g. 2.6.22), you'll get
more feedback because it's fresher in people's minds.

Willy



Re: [Patch 2.6.22.2 ] : drivers/net/via-rhine.c: Offload checksum handling to VT6105M

2007-08-17 Thread Willy Tarreau
On Fri, Aug 17, 2007 at 11:34:37AM -0700, K Naru wrote:
> From: Kim Naru ([EMAIL PROTECTED])
> 
> Added support to offload TCP/UDP/IP checksum to the
> VIA Technologies VT6105M chip.
> Firstly, let the stack know this chip is capable of
> doing its own checksum(IPV4 only).
> Secondly offload checksum to VT6105M, if necessary. 
> 
> 
> Verbose Mode:
> 
> #1. Define 3 bits(18,19,20) in Transmit Descriptor 1
> of chip, which affect checksum processing.
> The prefix(TDES1) for the 3 variables is the short
> name for Transmit Descriptor 1.
> #2. In rhine_init_one(), if pci_rev >=  VT6105M then
> set  NETIF_F_IP_CSUM(see skbuff.h for details).
> #3. In rhine_start_tx() if NETIF_F_IP_CSUM is set AND
> the stack requires a checksum then
> set either bit 19(UDP),20(TCP) AND bit 18(IP).
> 
> Note : The numbered items above(i.e.#1,#2,#3) denote
> pseudo code.
> 
> This patch was developed and tested on Imedia
> linux-2.6.20 under a PC-Engines Alix System board
> (www.pcengines.ch/alix.htm). It was tested(compilation
> only) on linux-2.6.22.2. The minor code change between
> 2.6.20 and 2.6.22 is the use of ip_hdr() in 2.6.22.
> 
> In 2.6.20 :
> struct iphdr *ip = skb->nh.iph;
> In 2.6.22 :
> const struct iphdr *ip = ip_hdr(skb);
> 
> Testing:
> 
> 
> ttcp, netperf, ftp and top were used. There appears to
> be a small CPU utilization gain. Throughput results
> were more inconclusive.
> 
> The data sheet used to get information is 'VT6105M
> Data Sheet, Revision 1.63  June21,2006'.
> 
> Signed-off-by: Kim Naru ([EMAIL PROTECTED])
> 
> ---
> 
> 
> --- drivers/net/via-rhine.c 2007-08-17
> 00:24:33.0 -0700
> +++ drivers/net/via-rhine.c.orig2007-08-15
> 05:03:20.0 -0700
> @@ -95,8 +95,6 @@ static const int
> multicast_filter_limit 
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -345,9 +343,6 @@ struct tx_desc {
>  
>  /* Initial value for tx_desc.desc_length, Buffer size
> goes to bits 0-10 */
>  #define TXDESC 0x00e08000
> -#define TDES1_TCPCK 0x00100000  /* Bit 20, Transmit Desc 1 (VT6105M Data Sheet 1.63) */
> -#define TDES1_UDPCK 0x00080000  /* Bit 19, Transmit Desc 1 (VT6105M Data Sheet 1.63) */
> -#define TDES1_IPCK  0x00040000  /* Bit 18, Transmit Desc 1 (VT6105M Data Sheet 1.63) */
>  
>  enum rx_status_bits {
> RxOK=0x8000, RxWholePkt=0x0300, RxErr=0x008F
> @@ -793,9 +788,6 @@ static int __devinit
> rhine_init_one(stru
> if (rp->quirks & rqRhineI)
> dev->features |=
> NETIF_F_SG|NETIF_F_HW_CSUM;
>  
> -   if (pci_rev >=  VT6105M)
> -   dev->features |= NETIF_F_IP_CSUM;   /*
> tell stack chip does checksum */
> -
> /* dev->name not defined before
> register_netdev()! */
> rc = register_netdev(dev);
> if (rc)
> @@ -1270,20 +1262,6 @@ static int
> rhine_start_tx(struct sk_buff
>  
> /* lock eth irq */
> spin_lock_irq(&rp->lock);
> -
> -   if ((dev->features & NETIF_F_IP_CSUM) &&
> (skb->ip_summed == CHECKSUM_PARTIAL)) { 
> -   const struct iphdr *ip = ip_hdr(skb);
> -
> -   /* offload checksum to chip. */
> -
> -   if (ip->protocol == IPPROTO_TCP)
> -   rp->tx_ring[entry].desc_length
> |= TDES1_TCPCK;
> -   else if (ip->protocol == IPPROTO_UDP)
> -   rp->tx_ring[entry].desc_length
> |=  TDES1_UDPCK;
> -   rp->tx_ring[entry].desc_length |= 
> TDES1_IPCK;
> -
> -   }
> -
> wmb();
> rp->tx_ring[entry].tx_status =
> cpu_to_le32(DescOwn);
> wmb();
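
For readers following the description in items #1 to #3 of the quoted
mail, the flag selection can be sketched in user-space C.  This is only
an illustration: the mask values are derived from the bit numbers given
in the description (18, 19, 20), and tdes1_csum_bits() is a hypothetical
helper, not a function in the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* Transmit Descriptor 1 checksum-request bits, per the description. */
#define TDES1_IPCK  (UINT32_C(1) << 18)   /* IP header checksum */
#define TDES1_UDPCK (UINT32_C(1) << 19)   /* UDP checksum */
#define TDES1_TCPCK (UINT32_C(1) << 20)   /* TCP checksum */

/* Sanity-check the masks against their hexadecimal form. */
static_assert(TDES1_TCPCK == 0x00100000u, "bit 20");
static_assert(TDES1_UDPCK == 0x00080000u, "bit 19");
static_assert(TDES1_IPCK  == 0x00040000u, "bit 18");

/*
 * Hypothetical helper: returns the bits to OR into the descriptor.
 * offload_enabled models the NETIF_F_IP_CSUM feature flag and
 * checksum_partial models skb->ip_summed == CHECKSUM_PARTIAL.
 */
uint32_t tdes1_csum_bits(int offload_enabled, int checksum_partial,
                         uint8_t ip_protocol)
{
    uint32_t bits = 0;

    if (!offload_enabled || !checksum_partial)
        return 0;               /* the stack already did the checksum */

    if (ip_protocol == IPPROTO_TCP)
        bits |= TDES1_TCPCK;
    else if (ip_protocol == IPPROTO_UDP)
        bits |= TDES1_UDPCK;

    return bits | TDES1_IPCK;   /* IP checksum is always requested */
}
```

A TCP packet with both conditions set yields 0x00140000, a UDP packet
0x000c0000, and any other protocol only the IP bit.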


Your patch is reversed (the .orig file should come first), and it is
not at the proper directory level. You should proceed this way:

   $  diff -u ./drivers/net/via-rhine.c{.orig,}

Note the "./", which makes your patch apply at both -p0 and -p1.

Willy



Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-08-17 Thread Kyle Moffett

Finally moved back in and with internet.  Yay!

On Aug 17, 2007, at 00:56:44, Casey Schaufler wrote:
> It would not surprise me particularly much if Kyle or someone like
> him could produce a perl script that could generate an SELinux
> policy that, when added to the reference policy on a system
> described by the reference policy, could do a fair imitation of the
> Smack scheme.


Umm, when did I ever say "emulate smack on top of the reference  
policy"?  I state categorically that I can write an estimated 500  
line perl script which will generate a standalone SELinux policy  
based directly on a smack ruleset.  It would require no additional  
policy beyond what the script outputs, and the script would be only  
roughly 500 lines so it can't contain all that much direct source-to- 
output text.


I've started tinkering with that perl script, though I probably won't  
get it finished till tomorrow or sunday.



> One point that I would like to make clear however is that the
> requirement for a 400,000 line reference policy for a jumping off
> point is one of the reasons for Smack.


There is no "requirement" for a 400,000-line reference policy to  
reproduce exactly the behavior of SMACK.  The SMACK architecture is  
trivial and therefore the SELinux policy is also simple.



> > and argue that SMACK is better, anyway, because of its
> > simplicity / speed / something.
>
> My understanding of the current SELinux philosophy is that policy
> should only be written by professionals, and that this was "always"
> the intention. I respect that, and for policy that requires the
> level of sophistication that SELinux does I would have a hard time
> arguing otherwise.


I can also state categorically that given the set of all admins,  
users, and software developers, hardly a fraction of them are  
qualified to write security policy at all.  Hell, most admins and  
software developers can't get SUID binaries right, and that's a  
thousand times simpler than a MAC security policy.  Ergo the only  
people who should be writing security policy for deployment are those  
people who have studied and trained in the stuff.  Those people are  
also known as "security professionals".



> One of the things that limited the widespread adoption of MLS
> systems was that the policy, even one as simple as Bell & LaPadula,
> was considered too complex for most uses. I do not see that SELinux,
> or AppArmor for that matter, addresses this fundamental impediment
> to the use of mandatory access control. Yes, you can do just about
> anything with the right combination of classes, booleans, and other
> interesting facilities, but you can't do simple things directly.


Neither security nor your average distro nowadays is "simple" by any  
stretch of the imagination.  Hell, my desktop system hits at least 2  
million unique lines of code during boot, let alone logging in to  
XFCE.  If you can show me a security system other than SELinux which  
is sufficiently flexible to secure those 2 million lines of code  
along with the other 50 million lines of code found in various pieces  
of software on my Debian box then I'll go put on my dunce hat and sit  
in the corner.



Cheers,
Kyle Moffett





Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 > This is also a series of falsehoods.  All packet filtering,
 > queue management, and packet scheduling facilities work perfectly
 > fine and as designed with both LRO and TSO.

I'm not sure I follow.  Perhaps "broken" was too strong a word to use,
but if you pass a huge segment to a NIC with TSO, then you've given
the NIC control of scheduling the packets that end up getting put on
the wire.  If your software packet scheduling is operating at a bigger
scale, then things work fine, but I don't see how you can say that TSO
doesn't lead to head-of-line blocking etc at short time scales.  And
yes of course I agree you can make sure things work by using short
segments or not using TSO at all.

Similarly with LRO the packets that get passed to the stack are not
the packets that were actually on the wire.  Sure, most filtering will
work fine but eg are you sure your RTT estimates aren't going to get
screwed up and cause some subtle bug?  And I could trot out all the
same bugaboos that are brought up about RDMA and warn darkly about
security problems with bugs in NIC hardware that after all has to
parse and rewrite TCP and IP packets.

Also, looking at the complexity and bug-fixing effort that go into
making TSO work vs the really pretty small gain it gives also makes
part of me wonder whether the noble proclamations about
maintainability are always taken to heart.

Of course I know everything I just wrote is wrong because I forgot to
refer to the crucial axiom that stateless == good && RDMA == bad.
And sometimes it's unfortunate that in Linux when there's disagreement
about something, the default action is *not* to do something.

Sorry for prolonging this argument.  Dave, I should say that I
appreciate all the work you've done in helping build the most kick-ass
networking stack in history.  And as I said before, I have plenty of
interesting work to do however this turns out, so I'll try to leave
any further arguing to people who actually have a dog in this fight.

 - R.


Re: Kernel 2.6.20: INFO: possible circular locking dependency detected

2007-08-17 Thread Willy Tarreau
On Thu, Aug 16, 2007 at 09:23:48PM +0100, Alan Cox wrote:
> On Thu, 16 Aug 2007 19:50:43 +0200
> Authenticated <[EMAIL PROTECTED]> wrote:
> 
> > 
> > ===
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.20-1.2948.self #1
> > ---
> > squid/4058 is trying to acquire lock:
> >  (tty_mutex){--..}, at: [] print_warning+0x8b/0x12e
> > 
> > but task is already holding lock:
> >  (>s_dquot.dqptr_sem){}, at: [] 
> > dquot_alloc_inode+0x34/0x10a
> > 
> > which lock already depends on the new lock.
> 
> Yep - known 2.6.20 problem. It's very, very unlikely to ever cause a real
> problem however.

Is there any known unmerged fix for this one ?

Willy



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> > The documentation simply doesn't say "+m" is allowed.  The code to
> > allow it was added for the benefit of people who do not read the
> > documentation.  Documentation for "+m" might get added later if it
> > is decided this [the code, not the documentation] is a sane thing
> > to have (which isn't directly obvious).
>
> Huh?
>
> "If the (current) documentation doesn't match up with the (current)
> code, then _at least one_ of them has to be (as of current) wrong."
>
> I wonder how could you even try to disagree with that.


Easy.

The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

"This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It corresponds
to GCC version 4.3.0.  The internals of the GNU compilers, including
how to port them to new targets and some information about how to write
front ends for new languages, are documented in a separate manual."

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use "+m", and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean "+m"
doesn't actually do something.  It also doesn't mean it does what
you think it should do.  It might do just that of course.  But treating
writing C code as an empirical science isn't such a smart idea.


> And I didn't go whining about this ... you asked me. (I think I'd said
> something to the effect of GCC docs are often wrong,


No need to guess at what you said, even if you managed to delete
your own mail already, there are plenty of free web-based archives
around.  You said:


> See, "volatile" C keyword, for all its ill-definition and dodgy
> semantics, is still at least given somewhat of a treatment in the C
> standard (whose quality is ... ummm, sadly not always good and clear,
> but unsurprisingly, still about 5,482 orders-of-magnitude times
> better than GCC docs).


and that to me reads as complaining that the ISO C standard "isn't
very good" and that the GCC documentation is 10**5482 times worse
even.  Which of course is hyperbole and cannot be true.  It also
isn't helpful in any way or form for anyone on this list.  I call
that whining.


> which is true,


Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage.


> but probably you feel saying that is "not allowed" on non-gcc lists?)


You're allowed to say whatever you want.  Let's have a quote again
shall we?  I said:


> If you find any problems/shortcomings in the GCC documentation,
> please file a PR, don't go whine on some unrelated mailing lists.
> Thank you.


I read that as a friendly request, not a prohibition.  Well maybe
not actually friendly, more a bit angry.  A request, either way.


> As for the "PR"


"Problem report", a bugzilla ticket.  Sorry for using terminology
unknown to you.


> you're requesting me to file with GCC for this, that
> gcc-patches@ thread did precisely that


Actually not -- PRs make sure issues aren't forgotten (although
they might gather dust, sure).  But yes, submitting patches is a
Great Thing(tm).


> and more (submitted a patch to
> said documentation -- and no, saying "documentation might get added
> later" is totally bogus and nonsensical -- documentation exists to
> document current behaviour, not past).


When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical
to expect everything that is *not* a supported feature to be mentioned
there.


> I wouldn't have replied, really, if you weren't so provoking.


Hey, maybe that character trait is good for something, then.
Now to build a business plan around it...


Segher



SLUB bug on sparc64

2007-08-17 Thread David Miller

Hi Christoph,

When I force SLUB debugging on sparc64, it barfs on early bootup while
making the sysfs nodes.  Removing the BUG()'s on the sysfs error
returns, and adding some tracing I captured the log below.

BTW, I would recommend removing the BUG() calls, they serve only to
force someone hitting them to edit them out and reboot which is
absolutely pointless :-) This code can also often run before the
console is first enabled, making analysis of the error return that
much harder if it BUG()'s instead of trying to continue.

The problem seems to be that both tsb_512KB and tsb_1MB are
unmergeable, and so are the kmalloc caches of those sizes.  But the
"unique names" created for the aliases are identical, always ":%d"
where %d is the cache size, so the kobject_add() fails the second
time around.
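
The collision can be modelled in a few lines of user-space C.  This is a
sketch of the naming scheme as described above, not the SLUB code
itself: make_alias_id() and ids_collide() are hypothetical names, and
the only assumption is that the name is derived from the size alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the id builder: the name depends only on size. */
void make_alias_id(char *buf, size_t len, unsigned int object_size)
{
    assert(len > 0);
    snprintf(buf, len, ":%u", object_size);
}

/* Returns 1 if two caches of these sizes would get the same sysfs name. */
int ids_collide(unsigned int size_a, unsigned int size_b)
{
    char a[32], b[32];

    make_alias_id(a, sizeof(a), size_a);
    make_alias_id(b, sizeof(b), size_b);
    return strcmp(a, b) == 0;
}
```

With a 512KB cache and a 512KB kmalloc cache, ids_collide(524288, 524288)
returns 1: both would be registered as ":524288", and the second
registration is the one that fails.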

[16529.414107] sysfs_slab_add: name[shmem_inode_cache] size[800] unmergeable[1]
[16529.414958] sysfs_slab_add: name[nsproxy] size[120] unmergeable[1]
[16529.415683] sysfs_slab_add: name[posix_timers_cache] size[232] unmergeable[1]
[16529.416459] sysfs_slab_add: name[uid_cache] size[160] unmergeable[1]
[16529.417180] sysfs_slab_add: name[ip_mrt_cache] size[192] unmergeable[1]
[16529.417952] sysfs_slab_add: name[UDP-Lite] size[800] unmergeable[1]
[16529.418674] sysfs_slab_add: name[tcp_bind_bucket] size[128] unmergeable[1]
[16529.419402] sysfs_slab_add: name[inet_peer_cache] size[160] unmergeable[1]
[16529.420179] sysfs_slab_add: name[secpath_cache] size[128] unmergeable[1]
[16529.420913] sysfs_slab_add: name[xfrm_dst_cache] size[416] unmergeable[1]
[16529.421689] sysfs_slab_add: name[ip_fib_alias] size[128] unmergeable[1]
[16529.422417] sysfs_slab_add: name[ip_fib_hash] size[128] unmergeable[1]
[16529.423194] sysfs_slab_add: name[ip_dst_cache] size[384] unmergeable[1]
[16529.423930] sysfs_slab_add: name[arp_cache] size[288] unmergeable[1]
[16529.424723] sysfs_slab_add: name[RAW] size[800] unmergeable[1]
[16529.425445] sysfs_slab_add: name[UDP] size[800] unmergeable[1]
[16529.426234] sysfs_slab_add: name[tw_sock_TCP] size[224] unmergeable[1]
[16529.426968] sysfs_slab_add: name[request_sock_TCP] size[160] unmergeable[1]
[16529.427745] sysfs_slab_add: name[TCP] size[1568] unmergeable[1]
[16529.428467] sysfs_slab_add: name[eventpoll_pwq] size[144] unmergeable[1]
[16529.429244] sysfs_slab_add: name[eventpoll_epi] size[224] unmergeable[1]
[16529.429977] sysfs_slab_add: name[sgpool-128] size[3168] unmergeable[1]
[16529.430753] sysfs_slab_add: name[sgpool-64] size[1632] unmergeable[1]
[16529.431495] sysfs_slab_add: name[sgpool-32] size[864] unmergeable[1]
[16529.432223] sysfs_slab_add: name[sgpool-16] size[480] unmergeable[1]
[16529.432995] sysfs_slab_add: name[sgpool-8] size[288] unmergeable[1]
[16529.433722] sysfs_slab_add: name[scsi_io_context] size[184] unmergeable[1]
[16529.434528] sysfs_slab_add: name[blkdev_ioc] size[136] unmergeable[1]
[16529.435254] sysfs_slab_add: name[blkdev_queue] size[1624] unmergeable[1]
[16529.436028] sysfs_slab_add: name[blkdev_requests] size[360] unmergeable[1]
[16529.436760] sysfs_slab_add: name[biovec-256] size[4192] unmergeable[1]
[16529.437538] sysfs_slab_add: name[biovec-128] size[2144] unmergeable[1]
[16529.438270] sysfs_slab_add: name[biovec-64] size[1120] unmergeable[1]
[16529.439041] sysfs_slab_add: name[biovec-16] size[352] unmergeable[1]
[16529.439768] sysfs_slab_add: name[biovec-4] size[160] unmergeable[1]
[16529.440540] sysfs_slab_add: name[biovec-1] size[88] unmergeable[1]
[16529.441266] sysfs_slab_add: name[bio] size[192] unmergeable[1]
[16529.442035] sysfs_slab_add: name[sock_inode_cache] size[704] unmergeable[1]
[16529.442901] sysfs_slab_add: name[skbuff_fclone_cache] size[480] unmergeable[1]
[16529.443710] sysfs_slab_add: name[skbuff_head_cache] size[288] unmergeable[1]
[16529.65] sysfs_slab_add: name[file_lock_cache] size[248] unmergeable[1]
[16529.445197] sysfs_slab_add: name[proc_inode_cache] size[640] unmergeable[1]
[16529.445993] sysfs_slab_add: name[sigqueue] size[232] unmergeable[1]
[16529.446730] sysfs_slab_add: name[radix_tree_node] size[624] unmergeable[1]
[16529.447515] sysfs_slab_add: name[bdev_cache] size[832] unmergeable[1]
[16529.448250] sysfs_slab_add: name[sysfs_dir_cache] size[160] unmergeable[1]
[16529.449103] sysfs_slab_add: name[mnt_cache] size[288] unmergeable[1]
[16529.449836] sysfs_slab_add: name[inode_cache] size[608] unmergeable[1]
[16529.450616] sysfs_slab_add: name[dentry] size[280] unmergeable[1]
[16529.451347] sysfs_slab_add: name[filp] size[288] unmergeable[1]
[16529.452127] sysfs_slab_add: name[names_cache] size[4192] unmergeable[1]
[16529.452890] sysfs_slab_add: name[key_jar] size[224] unmergeable[1]
[16529.453670] sysfs_slab_add: name[idr_layer_cache] size[600] unmergeable[1]
[16529.454515] sysfs_slab_add: name[buffer_head] size[176] unmergeable[1]
[16529.455293] sysfs_slab_add: name[mm_struct] size[1056] unmergeable[1]
[16529.456028] sysfs_slab_add: name[vm_area_struct] size[240] unmergeable[1]
[16529.456807] sysfs_slab_add: 

Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 20:31:39 -0700

> On Fri, 17 Aug 2007 18:57:00 -0700
> Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> > > On Fri, 17 Aug 2007 18:49:34 -0700
> > > Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > > 
> > > > 
> > > > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > > > The network code does memset for 6 and 8 byte values, that can easily
> > > > > be optimized into simple assignments without string instructions.
> > > > 
> > > > 
> > > > so... question.
> > > > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > > > the first place (when using __builtin_memset)? I very much suspect it
> > > > would (and if some version doesn't we really ought to get that
> > > > fixed)
> > > 
> > > i386 and x86_64 are not using __builtin_memset, at least from the
> > > code that I see generated.
> > 
> > .. maybe we should just fix it that way then?
> > 
> There probably is history behind the decision, like gcc problems
> on some old compiler version.

Yes, but those reasons are very likely no longer true.

In fact, just removing the memcpy macro altogether is the best
thing to do.  GCC will do inline memcpy when appropriate.
Then put all of the rep; movsl; etc. code in an external
assembler file and name the routine memcpy.

The inlining is senseless even if it all gets optimized away
into the bare necessary instructions.  All the x86 registers
get clobbered in most of those rep; movsl; code paths, so
it's hardly going to be more expensive to extern the thing
and it will make the kernel smaller to boot.

Anyway, the point is to make the real C symbol be named "memcpy",
because that's what makes all the automatic gcc inline memcpy logic
kick in.  If you define "memcpy" as a macro which calls
differently named functions, you bypass all of that.
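
A minimal user-space sketch of the difference, assuming a hypothetical
out-of-line routine my_memcpy_generic() that stands in for the assembler
version (in a real build it would live in a separate file, so the
compiler would know nothing about it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic copy routine, standing in for the asm version. */
void *my_memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Uses the real "memcpy" name: the compiler recognises it and is free
 * to expand a small constant-size copy inline. */
void copy_mac_plain(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 6);
}

/* A wrapper macro that redirects to a differently named function: the
 * compiler has no special knowledge of that name. */
#define MY_MEMCPY(d, s, n) my_memcpy_generic((d), (s), (n))

void copy_mac_wrapped(unsigned char *dst, const unsigned char *src)
{
    MY_MEMCPY(dst, src, 6);
}
```

Both functions copy the same six bytes.  Only the first gives the
compiler the chance to apply its built-in memcpy handling.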


[PATCH 1/2] scripts/get_maintainer.pl

2007-08-17 Thread Joe Perches
Add an easier method to find maintainers to CC for a patch

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

Changes since last submittal:

Added options to include|ignore penguin-chiefs
Changed defaults to appropriate for git-send-email
Added optional git signed-off-by: checking
Changed syntax to be more perl like  

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
new file mode 100755
index 000..9c1d4cb
--- /dev/null
+++ b/scripts/get_maintainer.pl
@@ -0,0 +1,375 @@
+#!/usr/bin/perl -w
+# (c) 2007, Joe Perches <[EMAIL PROTECTED]>
+#   created from checkpatch.pl
+#
+# Print the contact information for the maintainers
+# of the files modified in a patch
+#
+# usage: perl scripts/get_maintainers.pl 
+#
+# Licensed under the terms of the GNU GPL License version 2
+
+use strict;
+
+my $P = $0;
+$P =~ s@.*/@@g;
+
+my $V = '0.08';
+
+use Getopt::Long qw(:config no_auto_abbrev);
+
+my $tree = "./";
+my $email_usename = 1;
+my $email_maintainer = 1;
+my $email_list = 1;
+my $email_subscriber_list = 0;
+my $email_git = 1;
+my $email_git_penguin_chiefs = 0;
+my $email_multiline = 1;
+my $email_separator = ", ";
+
+my %saw;
+
+my @penguin_chief = ();
+push(@penguin_chief,"Linus Torvalds:[EMAIL PROTECTED]");
+push(@penguin_chief,"Andrew Morton:[EMAIL PROTECTED]");
+
+my @penguin_chief_names = ();
+foreach my $chief (@penguin_chief) {
+if ($chief =~ m/^(.*):(.*)/) {
+   my $chief_name = $1;
+   my $chief_addr = $2;
+   push(@penguin_chief_names, $chief_name);
+}
+}
+my $penguin_chiefs = join("|",@penguin_chief_names);
+
+GetOptions(
+  'tree=s' => \$tree,
+  'git!' => \$email_git,
+  'git-chief-penguins!' => \$email_git_penguin_chiefs,
+  'm!' => \$email_maintainer,
+  'n!' => \$email_usename,
+  'l!' => \$email_list,
+  's!' => \$email_subscriber_list,
+  'multiline!' => \$email_multiline,
+  'separator=s' => \$email_separator,
+  ) or exit;
+
+my $exit = 0;
+
+if ($#ARGV < 0 ||
+(!$email_maintainer
+ && !$email_list
+ && !$email_subscriber_list
+ && !$email_git
+ && !$email_git_penguin_chiefs)) {
+print "usage: $P [options] patchfile\n";
+print "version: $V\n";
+print "  --tree [path] => linux kernel source path\n";
+print "  --git => include recent git \*-by: signers\n";
+print "  --git-penguin-chiefs => include ${penguin_chiefs}\n";
+print "  --m => include maintainer(s) if any\n";
+print "  --n => include name 'Full Name <[EMAIL PROTECTED]>'\n";
+print "  --l => include list(s) if any\n";
+print "  --s => include subscriber only list(s) if any\n";
+print "  --separator [, ] => separator for multiple addresses on 1 line\n";
+print "  --multiline => print 1 address per line\n";
+print "Default: [--git --m --l --multiline]\n";
+print "Be sure to select something...\n";
+exit(1);
+}
+
+if ($tree && !top_of_kernel_tree($tree)) {
+if (${tree} ne "" and ${tree} ne "./") {
+   print "'${tree}' ";
+} else {
+   print "The current directory ";
+}
+print "doesn't appear to be a linux kernel source tree\n";
+exit(2);
+}
+
+## Read MAINTAINERS for type/value pairs
+
+my @typevalue = ();
+open(MAINT, "<${tree}MAINTAINERS") || die "$P: Can't open ${tree}MAINTAINERS\n";
+while (<MAINT>) {
+if (m/^(\C):\s*(.*)/) {
+   my $type = $1;
+   my $value = $2;
+
+   ##Filename pattern matching
+   if ($type eq "F" || $type eq "X") {
+   $value =~ s@\.@\\\.@g;   ##Convert . to \.
+   $value =~ s/\*/\.\*/g;   ##Convert * to .*
+   }
+   push(@typevalue, "$type:$value");
+} elsif (!/^(\s)*$/) {
+   push(@typevalue, $_);
+}
+}
+close(MAINT);
+
+## Find the patched filenames
+
+my @patchedfiles = ();
+open(PATCH, "<$ARGV[0]") or die "Can't open $ARGV[0]\n";
+while (<PATCH>) {
+if (m/^\+\+\+\s+(\S+)/) {
+   my $file = $1;
+   $file =~ s@^[^/]*/@@;
+   $file =~ s@\n@@;
+   push(@patchedfiles, $file);
+}
+}
+close(PATCH);
+
+# Sort and uniq patchedfiles
+
+undef %saw;
+@patchedfiles = sort @patchedfiles;
+@patchedfiles = grep(!$saw{$_}++, @patchedfiles);
+
+# Find responsible parties
+
+my @email_to = ();
+foreach my $patchedfile (@patchedfiles) {
+my $exclude = 0;
+
+#Git
+
+if ($email_git_penguin_chiefs) {
+   foreach my $chief (@penguin_chief) {
+   if ($chief =~ m/^(.*):(.*)/) {
+   my $chief_name = $1;
+   my $chief_addr = $2;
+   if ($email_usename) {
+   push(@email_to, format_email($chief_name, $chief_addr));
+   } else {
+   push(@email_to, $chief_addr);
+   }
+   }
+   }
+}
+
+if ($email_git) {
+   recent_git_signoffs($patchedfile);
+}
+
+#Do not match excluded file patterns
+
+foreach my $line (@typevalue) {
+   if ($line 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Linus Torvalds


On Sat, 18 Aug 2007, Satyam Sharma wrote:
> 
> No code does (or would do, or should do):
> 
>   x.counter++;
> 
> on an "atomic_t x;" anyway.

That's just an example of a general problem.

No, you don't use "x.counter++". But you *do* use

if (atomic_read(&x) <= 1)

and loading into a register is stupid and pointless, when you could just 
do it as a regular memory-operand to the cmp instruction.

And as far as the compiler is concerned, the problem is the 100% same: 
combining operations with the volatile memop.

The fact is, a compiler that thinks that

movl mem,reg
cmpl $val,reg

is any better than

cmpl $val,mem

is just not a very good compiler. But when talking about "volatile", 
that's exactly what you always get (and always have gotten - this is 
not a regression, and I doubt gcc is alone in this).

Linus
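
To make the point above concrete, here is a minimal sketch in plain C, not the kernel's actual definitions. The type my_atomic_t and all four function names are hypothetical and exist only for illustration. The first reader goes through a volatile cast, the second through a plain load; comparing the code a given compiler emits for the two predicates (for example with gcc -O2 -S) is one way to check whether the volatile variant is loaded into a register before the compare. No particular output is claimed here, since that depends on compiler and version.

```c
/* Hypothetical stand-in for the kernel's atomic_t, for illustration only. */
typedef struct { int counter; } my_atomic_t;

/* Read through a volatile cast: the compiler must perform the load. */
static inline int my_atomic_read_volatile(const my_atomic_t *v)
{
    return *(const volatile int *)&v->counter;
}

/* Plain read: the compiler is free to fold the load into the compare. */
static inline int my_atomic_read_plain(const my_atomic_t *v)
{
    return v->counter;
}

/* The "if (atomic_read(&x) <= 1)" pattern, once per reader. */
static int is_last_ref_volatile(const my_atomic_t *v)
{
    return my_atomic_read_volatile(v) <= 1;
}

static int is_last_ref_plain(const my_atomic_t *v)
{
    return my_atomic_read_plain(v) <= 1;
}
```

Both predicates return the same value; only the generated instructions may differ.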
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rtc: Make rtc-ds1742 driver hotplug-aware

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Kay Sievers wrote:
> 
> > I'm not the one who's advocating a change here.  If you want to
> > first change/break and then fix things, all of that is up to you.
> 
> I'm happy to do that. Patch is attached.

NAK.  That wasn't even a serious attempt at the "fix" part,
though it does the "break" part well enough to cause severe
regressions.

(As well as leaving all my technical points about your pushback
un-addressed.  As I noted before, the evident fact that you don't
have technical responses to them says to me that your pushback
has no real technical basis ...)


Out of around 300 platform drivers in the tree (and many more
not yet merged upstream), this makes it so that only *THREE* of
them can hotplug ... versus in the current tree, essentially
everything that's not a legacy driver is hotplugging just fine.

That's one heck of a regression.  Just shy of 100% ...

Plus it treats rtc-ds1742 as if it's a platform_driver not
an i2c_driver.

- Dave



Re: + cifs-check-for-granted-memory.patch added to -mm tree

2007-08-17 Thread Cyrill Gorcunov
[Jesper Juhl - Sat, Aug 18, 2007 at 12:17:33AM +0200]
| On 17/08/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
| >
| > The patch titled
| >  CIFS: check for granted memory
| > has been added to the -mm tree.  Its filename is
| >  cifs-check-for-granted-memory.patch
| >
| > *** Remember to use Documentation/SubmitChecklist when testing your code ***
| >
| > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
| > out what to do about this
| >
| > --
| > Subject: CIFS: check for granted memory
| > From: Cyrill Gorcunov <[EMAIL PROTECTED]>
| >
| > Add a check for granted memory to prevent possible NULL pointer usage.
| >
| > Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
| > Cc: Steven French <[EMAIL PROTECTED]>
| > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
| > ---
| >
| >  fs/cifs/sess.c |4 
| >  1 files changed, 4 insertions(+)
| >
| > diff -puN fs/cifs/sess.c~cifs-check-for-granted-memory fs/cifs/sess.c
| > --- a/fs/cifs/sess.c~cifs-check-for-granted-memory
| > +++ a/fs/cifs/sess.c
| > @@ -372,6 +372,10 @@ CIFS_SessSetup(unsigned int xid, struct
| >
| > /* 2000 big enough to fit max user, domain, NOS name etc. */
| > str_area = kmalloc(2000, GFP_KERNEL);
| > +   if (str_area == NULL) {
| > +   cifs_small_buf_release(smb_buf);
| > +   return -ENOMEM;
| > +   }
| 
| The patch, as such, is fine - not arguing against it, but as a matter
| of style; don't we usually prefer the "if (!foo)" form over "if (foo
| == NULL)" ??

I just don't want to mix 'if (!ptr)' and 'if (ptr == NULL)'
within one procedure. Look at the code ;) Some procs in CIFS
_do_ use '!ptr' but others use 'ptr == NULL'. The proc being
patched already uses 'if (ptr == NULL)', so I kept that form.
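
For readers without the CIFS source at hand, the pattern in the quoted hunk (check the allocation, release what was already acquired, return -ENOMEM) can be sketched in userspace C. This is only an illustration under assumptions: struct small_buf, small_buf_release(), session_setup() and failing_alloc() are hypothetical stand-ins, and the allocator is passed in as a parameter purely so that the failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer that must be released on every exit path. */
struct small_buf { int released; };

static void small_buf_release(struct small_buf *b)
{
    b->released = 1;
}

/* Allocator is injectable only so the out-of-memory path can be tested. */
static int session_setup(struct small_buf *smb_buf, void *(*alloc)(size_t))
{
    char *str_area = alloc(2000);

    if (str_area == NULL) {
        /* Do not leak the buffer acquired earlier. */
        small_buf_release(smb_buf);
        return -ENOMEM;
    }

    str_area[0] = '\0';   /* placeholder for the real work */
    free(str_area);
    small_buf_release(smb_buf);
    return 0;
}

/* Always-failing allocator used to simulate memory exhaustion. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

Calling session_setup() with failing_alloc returns -ENOMEM with the buffer released; calling it with malloc returns 0.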

| 
| -- 
| Jesper Juhl <[EMAIL PROTECTED]>
| Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
| Plain text mails only, please  http://www.expita.com/nomime.html
| 


Cyrill



Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:57:00 -0700
Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> 
> On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> > On Fri, 17 Aug 2007 18:49:34 -0700
> > Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > > The network code does memset for 6 and 8 byte values, that can easily
> > > > be optimized into simple assignments without string instructions.
> > > 
> > > 
> > > so... question.
> > > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > > the first place (when using __builtin_memset)? I very much suspect it
> > > would (and if some version doesn't we really ought to get that
> > > fixed)
> > 
> > i386 and x86_64 are not using __builtin_memset, at least from the
> > code that I see generated.
> 
> .. maybe we should just fix it that way then?
> 
There probably is history behind the decision, like gcc problems
on some old compiler version.
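
The two approaches being weighed in this thread can be sketched side by side in userspace C. This is not the posted patch; zero6(), zero8() and zero6_builtin() are hypothetical names. The hand-written variants use fixed-size memcpy of a zero constant, which avoids alignment and aliasing assumptions and which compilers commonly lower to plain stores. The builtin variant assumes GCC or Clang, where __builtin_memset with a constant size may be expanded inline.

```c
#include <stdint.h>
#include <string.h>

/* Zero 6 bytes (e.g. an Ethernet-address-sized field): one 4-byte store
 * plus one 2-byte store, expressed via fixed-size memcpy. */
static inline void zero6(void *p)
{
    const uint32_t z32 = 0;
    const uint16_t z16 = 0;

    memcpy(p, &z32, sizeof z32);
    memcpy((unsigned char *)p + 4, &z16, sizeof z16);
}

/* Zero 8 bytes with a single 8-byte store. */
static inline void zero8(void *p)
{
    const uint64_t z64 = 0;

    memcpy(p, &z64, sizeof z64);
}

/* Alternative: leave the expansion to the compiler (GCC/Clang builtin). */
static inline void zero6_builtin(void *p)
{
    __builtin_memset(p, 0, 6);
}
```

Whether the builtin produces equally good code on a given compiler version is exactly the open question in the thread, so no claim is made about that here.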

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH 1/7] Simple Performance Counters: Core Piece

2007-08-17 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Fri, 17 Aug 2007, Mathieu Desnoyers wrote:
> 
> > Actually, get_cycles(), at least on some AMD cpus, does not synchronize the
> > core, which can skew the results. You might want to use
> > get_cycles_sync() there.
> 
> get_cycles() results as used here are bound to a single processor. If we 
> end up on a different processor at the end of the measurement then the 
> result is discarded. So no need for get_cycles_sync().
> 

I may be wrong, but I think the UP case still needs to synchronize the
core to have precise measurement. This sync core will make sure that
rdtsc is not executed speculatively. It therefore makes sure it is not
misplaced in the instruction stream. I therefore don't think this is a
SMP special case.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > The "asm volatile" implementation does have exactly the same
> > > > > reordering guarantees as the "volatile cast" thing,
> > > > 
> > > > I don't think so.
> > > 
> > > "asm volatile" creates a side effect.
> > 
> > Yeah.
> > 
> > > Side effects aren't
> > > allowed to be reordered wrt sequence points.
> > 
> > Yeah.
> > 
> > > This is exactly
> > > the same reason as why "volatile accesses" cannot be reordered.
> > 
> > No, the code in that sub-thread I earlier pointed you at *WAS* written
> > such that there was a sequence point after all the uses of that volatile
> > access cast macro, and _therefore_ we were safe from re-ordering
> > (behaviour guaranteed by the C standard).
> 
> And exactly the same is true for the "asm" version.
> 
> > Now, one cannot fantasize that "volatile asms" are also sequence points.
> 
> Sure you can do that.  I don't though.
> 
> > In fact such an argument would be sadly mistaken, because "sequence
> > points" are defined by the C standard and it'd be horribly wrong to
> > even _try_ claiming that the C standard knows about "volatile asms".
> 
> That's nonsense.  GCC can extend the C standard any way they
> bloody well please -- witness the fact that they added an
> extra class of side effects...
> 
> > > > Read the relevant GCC documentation.
> > > 
> > > I did, yes.
> > 
> > No, you didn't read:
> > 
> > http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> > 
> > Read the bit about the need for artificial dependencies, and the example
> > given there:
> > 
> > asm volatile("mtfsf 255,%0" : : "f" (fpenv));
> > sum = x + y;
> > 
> > The docs explicitly say the addition can be moved before the "volatile
> > > asm". Hopefully, as you know, (x + y) is a C "expression" and hence
> > a "sequence point" as defined by the standard.
> 
> The _end of a full expression_ is a sequence point, not every
> expression.  And that is irrelevant here anyway.
> 
> It is perfectly fine to compute x+y any time before the
> assignment; the C compiler is allowed to compute it _after_
> the assignment even, if it could figure out how ;-)
> 
> x+y does not contain a side effect, you know.
> 
> > I know there is also stuff written about "side-effects" there which
> > _could_ give the same semantic w.r.t. sequence points as the volatile
> > access casts,
> 
> s/could/does/
> 
> > but hey, it's GCC's own documentation, you obviously can't
> > find fault with _me_ if there's wrong stuff written in there. Say that
> > to GCC ...
> 
> There's nothing wrong there.
> 
> > > See, "volatile" C keyword, for all its ill-definition and dodgy
> > semantics, is still at least given somewhat of a treatment in the C
> > standard (whose quality is ... ummm, sadly not always good and clear,
> > but unsurprisingly, still about 5,482 orders-of-magnitude times
> > better than GCC docs).
> 
> If you find any problems/shortcomings in the GCC documentation,
> please file a PR, don't go whine on some unrelated mailing lists.
> Thank you.
> 
> > Semantics of "volatile" as applies to inline
> > asm, OTOH? You're completely relying on the compiler for that ...
> 
> Yes, and?  GCC promises the behaviour it has documented.

LOTS there, which obviously isn't correct, but which I'll reply to later,
easier stuff first. (call this "handwaving" if you want, but don't worry,
I /will/ bother myself to reply)


> > > > [ of course, if the (latest) GCC documentation is *yet again*
> > > >   wrong, then alright, not much I can do about it, is there. ]
> > > 
> > > There was (and is) nothing wrong about the "+m" documentation, if
> > > that is what you are talking about.  It could be extended now, to
> > > allow "+m" -- but that takes more than just "fixing" the documentation.
> > 
> > No, there was (and is) _everything_ wrong about the "+" documentation as
> > applies to memory-constrained operands. I don't give a whit if it's
> > some workaround in their gimplifier, or the other, that makes it possible
> > to use "+m" (like the current kernel code does). The docs suggest
> > otherwise, so there's obviously a clear disconnect between the docs and
> > actual GCC behaviour.
> 
> The documentation simply doesn't say "+m" is allowed.  The code to
> allow it was added for the benefit of people who do not read the
> documentation.  Documentation for "+m" might get added later if it
> is decided this [the code, not the documentation] is a sane thing
> to have (which isn't directly obvious).

Huh?

"If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong."

I wonder how could you even try to disagree with that.

And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of GCC docs are often wrong, which is true,
but probably you feel saying that is "not allowed" on non-gcc lists?)

As for the "PR" you're requesting me to file with GCC for this, that
gcc-patches@ 

Re: [PATCH 7/7] Simple Performance Counters: SLUB instrumentation

2007-08-17 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Fri, 17 Aug 2007, Mathieu Desnoyers wrote:
> 
> > Why do you printk inside the timing period ? Filling the printk buffers
> > or outputting on things such as serial console could really hurt your
> > results.
> 
> It was easier to code?
> 
> > I hope you run your system with idle=poll and without frequency scaling
> > at all, because otherwise your cycle count would be completely off on
> > many AMD and Intel CPUs. You can have a look at this (very rough)
> > document on the topic: 
> 
> The cpu will definitely not be idle during these measurements and no 
> frequency scaling is active.

The problem is that if the cpu is idle _before_ the measurements, the
frequency will change differently from one cpu to another. Therefore,
the cycle counters may have large offsets when you start your tests. So,
if get_cycles() is executed on different CPUs (thread being migrated)
between the beginning and the end of the test, the results would be
skewed.
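
One way to guard against the migration case is to record the CPU number alongside each cycle reading and to drop the measurement when the two differ. The sketch below shows only that bookkeeping in plain C. struct sample and elapsed_cycles() are hypothetical names, and the readings are passed in as plain values rather than taken from get_cycles(), so nothing here depends on kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* One timestamp reading together with the CPU it was taken on. */
struct sample {
    int cpu;
    uint64_t cycles;
};

/* Returns false (and leaves *delta untouched) when the task migrated
 * between the two readings, because per-CPU counters may be offset. */
static bool elapsed_cycles(struct sample start, struct sample end,
                           uint64_t *delta)
{
    if (start.cpu != end.cpu)
        return false;

    *delta = end.cycles - start.cycles;
    return true;
}
```

This does not address the separate concern about speculative execution of rdtsc; it only covers readings taken on two different CPUs.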

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH 4/4] maps: /proc/<pid>/pmaps interface - memory maps in granularity of pages

2007-08-17 Thread Fengguang Wu
Matt,

On Fri, Aug 17, 2007 at 11:58:08AM -0500, Matt Mackall wrote:
> On Fri, Aug 17, 2007 at 02:47:27PM +0800, Fengguang Wu wrote:
> > It's not easy to do direct performance comparisons between pmaps and
> > pagemap/kpagemap. However some close analyses are still possible :)
> > 
> > 1) code size
> > pmaps               ~200 LOC
> > pagemap/kpagemap    ~300 LOC
> > 
> > 2) dataset size
> > take for example my running firefox on Intel Core 2:
> > VSZ         400 MB
> > RSS          64 MB, or 16k pages
> > pmaps        64 KB, wc shows 2k lines, i.e. that many page ranges
> > pagemap     800 KB, could be heavily optimized by returning partial data
> 
> I take it you're in 64-bit mode?

Yes. That will be the common case.

> You're right, this data compresses well in many circumstances. I
> suspect it will suffer under memory pressure though. That will
> fragment the ranges in-memory and also fragment the active bits. The
> worst case here is huge, of course, but realistically I'd expect
> something like 2x-4x.

Not likely to degrade even under memory pressure ;)

The compress_ratio = (VSZ:RSS) * (RSS:page_ranges).
- On fresh startup and no memory pressure,
  - the VSZ:RSS ratio of ALL processes is 4516796KB:457048KB ~= 10:1.
  - the firefox case shows a (RSS:page_ranges) of 16k:2k ~= 8:1.
- On memory pressure,
  - as VSZ goes up, RSS will be bounded by physical memory.
    So the VSZ:RSS ratio actually goes up with memory pressure.
  - page range is a good unit of locality. They are more likely to be
    reclaimed as a whole. So (RSS:page_ranges) wouldn't degrade as much.
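
The arithmetic behind the estimate above can be written out as a small C helper. compress_ratio() is a hypothetical name; the inputs are the figures quoted in this mail (system-wide VSZ and RSS in KB, and the firefox page and range counts). Like the estimate itself, it mixes a system-wide ratio with a single-process ratio, so it is a rough indication only.

```c
/* compress_ratio = (VSZ:RSS) * (RSS pages : page ranges) */
static double compress_ratio(double vsz_kb, double rss_kb,
                             double rss_pages, double page_ranges)
{
    return (vsz_kb / rss_kb) * (rss_pages / page_ranges);
}
```

With 4516796 KB VSZ, 457048 KB RSS, 16k pages and 2k ranges, this comes out at about 79:1, in line with the rounded 10:1 and 8:1 factors.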

> But there are still the downsides I have mentioned:
> 
> - you don't get page frame numbers

True. I guess PFNs are meaningless to a normal user?

> - you can't do random access

Not for now.

It would be trivial to support seek-by-address semantics: the seqfile
operations already iterate by addresses. Only that we cannot do it via
the regular read/pread/seek interfaces. They have different semantics
on fpos. However, tricks like ioctl(begin_addr, end_addr) can be
employed if necessary.

> And how long does it take to pull the data out? My benchmarks show
> greater than 50MB/s (and that's with the version in -mm that's doing
> double buffering), so that 800K would take < .016s. 

You are right :)

> > kpagemap    256 KB
> > 
> > 3) runtime overheads
> > pmaps       2k lines of string processing (encode/decode)
> > kpagemap    16k seek()/read()s, and context switches (could be
> >             optimized somewhat by doing a PFN sort first, but
> >             that also has non-trivial overhead)
> 
> You can do anywhere between 16k small reads or 1 large read. Depends

No way to avoid the seeks if PFNs are discontinuous. Too bad that
memory gets fragmented with uptime, at least for the current kernel.

But sure, sequential reads are viable when doing whole system memory
analysis, or for memory hog processes.

> what data you're trying to get. Right now, kpagemap is fast enough
> that I can do realtime displays of the whole of memory in my desktop
> in a GUI written in Python. And Python is fairly horrible for drawing
> bitmaps and such.
> 
> http://www.selenic.com/Screenshot-kpagemap.png
> 
> > So pmaps seems to be a clear winner :)
> 
> Except that it's only providing a subset of the data.

Yes, and it's a nice graph :)



Re: [PATCH] rtc: Make rtc-ds1742 driver hotplug-aware

2007-08-17 Thread Kay Sievers
On Fri, 2007-08-17 at 12:50 -0700, David Brownell wrote:
> On Friday 17 August 2007, Kay Sievers wrote:
> > On Fri, 2007-08-17 at 09:55 -0700, David Brownell wrote:

> > > 
> > > If so, then whoever tried to change the usage of module aliases
> > > in that way goofed in several ways.  First, by not changing all
> > > the in-kernel uses.  Second, by not changing all the out-of-tree
> > > uses in various distros, toolchains, etc.  Third, by not even
> > > documenting that change...
> > 
> > Yes, please fix the misuse of MODALIAS.
> 
> I'm not the one who's advocating a change here.  If you want to
> first change/break and then fix things, all of that is up to you.

I'm happy to do that. Patch is attached.

Thanks,
Kay


From: Kay Sievers <[EMAIL PROTECTED]>
Subject: platform: prefix MODALIAS with "platform:"

Prefix platform modalias strings with "platform:", which allows the
modprobe config to blacklist alias resolving if userspace
configures it.

Send uevents for all platform devices.

Add MODULE_ALIAS's to: pxa2xx_pcmcia, ds1742 and pcspkr to trigger
module autoloading by userspace.

  $ modinfo pcspkr
  alias:  platform:pcspkr
  license:GPL
  description:PC Speaker beeper driver
  ...

  $ modprobe -n -v platform:pcspkr
  insmod /lib/modules/2.6.23-rc3-g28e8351a-dirty/kernel/drivers/input/misc/pcspkr.ko

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>
---

 base/platform.c   |   15 ++-
 input/misc/pcspkr.c   |1 +
 pcmcia/pxa2xx_mainstone.c |2 +-
 pcmcia/pxa2xx_sharpsl.c   |2 +-
 rtc/rtc-ds1742.c  |1 +
 5 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 869ff8c..9bfc434 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -160,11 +160,6 @@ static void platform_device_release(struct device *dev)
  *
  * Create a platform device object which can have other objects attached
  * to it, and which will have attached objects freed when it is released.
- *
- * This device will be marked as not supporting hotpluggable drivers; no
- * device add/remove uevents will be generated.  In the unusual case that
- * the device isn't being dynamically allocated as a legacy "probe the
- * hardware" driver, infrastructure code should reverse this marking.
  */
 struct platform_device *platform_device_alloc(const char *name, unsigned int id)
 {
@@ -177,12 +172,6 @@ struct platform_device *platform_device_alloc(const char *name, unsigned int id)
pa->pdev.id = id;
device_initialize(&pa->pdev.dev);
pa->pdev.dev.release = platform_device_release;
-
-   /* prevent hotplug "modprobe $(MODALIAS)" from causing trouble in
-* legacy probe-the-hardware drivers, which don't properly split
-* out device enumeration logic from drivers.
-*/
-   pa->pdev.dev.uevent_suppress = 1;
}
 
return pa ? &pa->pdev : NULL;
@@ -530,7 +519,7 @@ static ssize_t
 modalias_show(struct device *dev, struct device_attribute *a, char *buf)
 {
struct platform_device  *pdev = to_platform_device(dev);
-   int len = snprintf(buf, PAGE_SIZE, "%s\n", pdev->name);
+   int len = snprintf(buf, PAGE_SIZE, "platform:%s\n", pdev->name);
 
return (len >= PAGE_SIZE) ? (PAGE_SIZE - 1) : len;
 }
@@ -546,7 +535,7 @@ static int platform_uevent(struct device *dev, char **envp, int num_envp,
struct platform_device  *pdev = to_platform_device(dev);
 
envp[0] = buffer;
-   snprintf(buffer, buffer_size, "MODALIAS=%s", pdev->name);
+   snprintf(buffer, buffer_size, "MODALIAS=platform:%s", pdev->name);
return 0;
 }
 
diff --git a/drivers/input/misc/pcspkr.c b/drivers/input/misc/pcspkr.c
index 906bf5e..e1a4402 100644
--- a/drivers/input/misc/pcspkr.c
+++ b/drivers/input/misc/pcspkr.c
@@ -23,6 +23,7 @@
 MODULE_AUTHOR("Vojtech Pavlik <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("PC Speaker beeper driver");
 MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:pcspkr");
 
 #ifdef CONFIG_X86
 /* Use the global PIT lock ! */
diff --git a/drivers/pcmcia/pxa2xx_mainstone.c b/drivers/pcmcia/pxa2xx_mainstone.c
index 383107b..f6722ba 100644
--- a/drivers/pcmcia/pxa2xx_mainstone.c
+++ b/drivers/pcmcia/pxa2xx_mainstone.c
@@ -175,7 +175,6 @@ static int __init mst_pcmcia_init(void)
if (!mst_pcmcia_device)
return -ENOMEM;
 
-   mst_pcmcia_device->dev.uevent_suppress = 0;
mst_pcmcia_device->dev.platform_data = &mst_pcmcia_ops;
 
ret = platform_device_add(mst_pcmcia_device);
@@ -195,3 +194,4 @@ fs_initcall(mst_pcmcia_init);
 module_exit(mst_pcmcia_exit);
 
 MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:pxa2xx-pcmcia");
diff --git a/drivers/pcmcia/pxa2xx_sharpsl.c b/drivers/pcmcia/pxa2xx_sharpsl.c
index a2daa3f..d5c33bd 100644
--- a/drivers/pcmcia/pxa2xx_sharpsl.c
+++ b/drivers/pcmcia/pxa2xx_sharpsl.c
@@ -261,7 +261,6 @@ 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> > > > The "asm volatile" implementation does have exactly the same
> > > > reordering guarantees as the "volatile cast" thing,
> > >
> > > I don't think so.
> >
> > "asm volatile" creates a side effect.
>
> Yeah.
>
> > Side effects aren't
> > allowed to be reordered wrt sequence points.
>
> Yeah.
>
> > This is exactly
> > the same reason as why "volatile accesses" cannot be reordered.
>
> No, the code in that sub-thread I earlier pointed you at *WAS* written
> such that there was a sequence point after all the uses of that volatile
> access cast macro, and _therefore_ we were safe from re-ordering
> (behaviour guaranteed by the C standard).

And exactly the same is true for the "asm" version.

> Now, one cannot fantasize that "volatile asms" are also sequence points.

Sure you can do that.  I don't though.

> In fact such an argument would be sadly mistaken, because "sequence
> points" are defined by the C standard and it'd be horribly wrong to
> even _try_ claiming that the C standard knows about "volatile asms".

That's nonsense.  GCC can extend the C standard any way they
bloody well please -- witness the fact that they added an
extra class of side effects...

> > > Read the relevant GCC documentation.
> >
> > I did, yes.
>
> No, you didn't read:
>
> http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> Read the bit about the need for artificial dependencies, and the example
> given there:
>
> asm volatile("mtfsf 255,%0" : : "f" (fpenv));
> sum = x + y;
>
> The docs explicitly say the addition can be moved before the "volatile
> asm". Hopefully, as you know, (x + y) is a C "expression" and hence
> a "sequence point" as defined by the standard.

The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the
assignment; the C compiler is allowed to compute it _after_
the assignment even, if it could figure out how ;-)

x+y does not contain a side effect, you know.

> I know there is also stuff written about "side-effects" there which
> _could_ give the same semantic w.r.t. sequence points as the volatile
> access casts,

s/could/does/

> but hey, it's GCC's own documentation, you obviously can't
> find fault with _me_ if there's wrong stuff written in there. Say that
> to GCC ...

There's nothing wrong there.

> See, "volatile" C keyword, for all its ill-definition and dodgy
> semantics, is still at least given somewhat of a treatment in the C
> standard (whose quality is ... ummm, sadly not always good and clear,
> but unsurprisingly, still about 5,482 orders-of-magnitude times
> better than GCC docs).

If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.

> Semantics of "volatile" as applies to inline
> asm, OTOH? You're completely relying on the compiler for that ...

Yes, and?  GCC promises the behaviour it has documented.

> > > [ of course, if the (latest) GCC documentation is *yet again*
> > >   wrong, then alright, not much I can do about it, is there. ]
> >
> > There was (and is) nothing wrong about the "+m" documentation, if
> > that is what you are talking about.  It could be extended now, to
> > allow "+m" -- but that takes more than just "fixing" the documentation.
>
> No, there was (and is) _everything_ wrong about the "+" documentation as
> applies to memory-constrained operands. I don't give a whit if it's
> some workaround in their gimplifier, or the other, that makes it possible
> to use "+m" (like the current kernel code does). The docs suggest
> otherwise, so there's obviously a clear disconnect between the docs and
> actual GCC behaviour.

The documentation simply doesn't say "+m" is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).

> [ You seem to often take issue with _amazingly_ petty and pedantic things,
>   by the way :-) ]

If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting you're not.

And I simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > I didn't quite understand what you said here, so I'll tell what I think:
> > 
> > * foo() is a compiler barrier if the definition of foo() is invisible to
> >  the compiler at a callsite.
> > 
> > * foo() is also a compiler barrier if the definition of foo() includes
> >  a barrier, and it is inlined at the callsite.
> > 
> > If the above is wrong, or if there's something else at play as well,
> > do let me know.
> 
> [...]
> If a function is not completely visible to the compiler (so it can't
> determine whether a barrier could be in it or not), then it must always
> assume it will contain a barrier so it always does the right thing.

Yup, that's what I'd said just a few sentences above, as you can see. I
was actually asking for "elaboration" on "how a compiler determines that
function foo() (say foo == schedule), even when it cannot see that it has
a barrier(), as you'd mentioned, is a 'sleeping' function" actually, and
whether compilers have a "notion of sleep to automatically assume a
compiler barrier whenever such a sleeping function foo() is called". But
I think you've already qualified the discussion to this kernel, so okay,
I shouldn't nit-pick anymore.
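
As an illustration of the second bullet quoted above (a function that contains a barrier and is inlined at the callsite), here is a minimal userspace sketch. It assumes GCC or Clang; the barrier() macro uses the same empty-asm-with-memory-clobber form as the kernel's, while the variable flag and the function read_twice_with_barrier() are hypothetical.

```c
/* Compiler-only barrier: emits no instruction, but tells the compiler
 * that memory may have changed, so cached values must be reloaded. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int flag;

/* Inlined at the callsite, so the barrier inside is visible to the
 * compiler there and the second read of 'flag' cannot reuse the first. */
static inline int read_twice_with_barrier(void)
{
    int first = flag;

    barrier();

    int second = flag;

    return first + second;
}
```

In a single-threaded run both reads see the same value; the barrier only constrains what the compiler may assume, not what the hardware does.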


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> On Fri, 17 Aug 2007 18:49:34 -0700
> Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > The network code does memset for 6 and 8 byte values, that can easily
> > > be optimized into simple assignments without string instructions.
> > 
> > 
> > so... question.
> > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > the first place (when using __builtin_memset)? I very much suspect it
> > would (and if some version doesn't we really ought to get that
> > fixed)
> 
> i386 and x86_64 are not using __builtin_memset, at least from the
> code that I see generated.

.. maybe we should just fix it that way then?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:49:34 -0700
Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> 
> On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > The network code does memset for 6 and 8 byte values, that can easily
> > be optimized into simple assignments without string instructions.
> 
> 
> so... question.
> Why are we doing this by hand? Wouldn't gcc just generate this code in
> the first place (when using __builtin_memset)? I very much suspect it
> would (and if some version doesn't we really ought to get that
> fixed)

i386 and x86_64 are not using __builtin_memset, at least from the
code that I see generated.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> The network code does memset for 6 and 8 byte values, that can easily
> be optimized into simple assignments without string instructions.


so... question.
Why are we doing this by hand? Wouldn't gcc just generate this code in
the first place (when using __builtin_memset)? I very much suspect it
would (and if some version doesn't we really ought to get that
fixed)



Re: [PATCH] - git-send-email.perl

2007-08-17 Thread Joe Perches
On Fri, 2007-08-17 at 16:38 -0700, Junio C Hamano wrote:
> Joe Perches <[EMAIL PROTECTED]> writes:
> ... Signed-off-by: ...
> I do not see a patch to "Documentation/git-send-email.txt" here...
> Something like this, with appropriate error checking, perhaps?
> 
>   open my $cc, "${cc_cmd} $t |";
> while (my $c = <$cc>) {
>   ...
>   }
> close $cc;

Add --cc-cmd, the ability to execute an arbitrary "cmd" to
generate per patch file specific "Cc:"s to git-send-email.perl

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index d243ed1..9a48847 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -34,6 +34,12 @@ The --bcc option must be repeated for each user you want on the bcc list.
 +
 The --cc option must be repeated for each user you want on the cc list.
 
+--cc-cmd::
+   Specify a command to execute once per patch file which
+   should generate patch file specific "Cc:" entries.
+   Output of this command must be single email address per line.
+   Default is the value of 'sendemail.cccmd' configuration value.
+   
 --chain-reply-to, --no-chain-reply-to::
If this is set, each email will be sent as a reply to the previous
email sent.  If disabled with "--no-chain-reply-to", all emails after
@@ -124,6 +130,9 @@ sendemail.aliasfiletype::
Format of the file(s) specified in sendemail.aliasesfile. Must be
one of 'mutt', 'mailrc', 'pine', or 'gnus'.
 
+sendemail.cccmd::
+   Command to execute to generate per patch file specific "Cc:"s.
+
 sendemail.bcc::
Email address (or alias) to always bcc.
 
diff --git a/git-send-email.perl b/git-send-email.perl
index 69559b2..d49947c 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -46,6 +46,9 @@ Options:
--cc   Specify an initial "Cc:" list for the entire series
   of emails.
 
+   --cc-cmd   Specify a command to execute per file which adds
+  per file specific cc address entries
+
--bcc  Specify a list of email addresses that should be Bcc:
  on all the emails.
 
@@ -157,13 +160,14 @@ if ($@) {
 my ($quiet, $dry_run) = (0, 0);
 
 # Variables with corresponding config settings
-my ($thread, $chain_reply_to, $suppress_from, $signed_off_cc);
+my ($thread, $chain_reply_to, $suppress_from, $signed_off_cc, $cc_cmd);
 
 my %config_settings = (
 "thread" => [\$thread, 1],
 "chainreplyto" => [\$chain_reply_to, 1],
 "suppressfrom" => [\$suppress_from, 0],
 "signedoffcc" => [\$signed_off_cc, 1],
+"cccmd" => [\$cc_cmd, ""],
 );
 
 foreach my $setting (keys %config_settings) {
@@ -189,6 +193,7 @@ my $rc = GetOptions("sender|from=s" => \$sender,
"smtp-server=s" => \$smtp_server,
"compose" => \$compose,
"quiet" => \$quiet,
+   "cc-cmd=s" => \$cc_cmd,
"suppress-from!" => \$suppress_from,
"signed-off-cc|signed-off-by-cc!" => \$signed_off_cc,
"dry-run" => \$dry_run,
@@ -652,11 +657,25 @@ foreach my $t (@files) {
}
}
close F;
+
+   if (${cc_cmd} ne "") {
+   open(F, "${cc_cmd} $t |")
+   or die "(cc-cmd) Could not execute '${cc_cmd}'\n";
+   while(<F>) {
+   my $c = $_;
+   $c =~ s/^\s*//g;
+   $c =~ s/\n$//g;
+   push @cc, $c;
+   printf("(cc-cmd) Adding cc: %s from: '%s'\n", $c, $cc_cmd)
+   unless $quiet;
+   }
+   close F;
+   }
+
if (defined $author) {
$message = "From: $author\n\n$message";
}
 
-
send_message();
 
# set up for the next message




Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > atomic_dec() writes
> > > > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > > > long as the compiler cannot optimise the atomic variable away
> > > > > completely -- any store counts as a side effect.
> > > > 
> > > > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > > > or one that uses a "forget" macro would have the same re-ordering
> > > > guarantees as an atomic_dec() that uses a volatile access cast.
> > > 
> > > The "asm volatile" implementation does have exactly the same
> > > reordering guarantees as the "volatile cast" thing,
> > 
> > I don't think so.
> 
> "asm volatile" creates a side effect.

Yeah.

> Side effects aren't
> allowed to be reordered wrt sequence points.

Yeah.

> This is exactly
> the same reason as why "volatile accesses" cannot be reordered.

No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that volatile
access cast macro, and _therefore_ we were safe from re-ordering
(behaviour guaranteed by the C standard).

But you seem to be missing the simple and basic fact that:

(something_that_has_side_effects || statement)
!= something_that_is_a_sequence_point

Now, one cannot fantasize that "volatile asms" are also sequence points.
In fact such an argument would be sadly mistaken, because "sequence
points" are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about "volatile asms".


> > > if that is
> > > implemented by GCC in the "obvious" way.  Even a "plain" asm()
> > > will do the same.
> > 
> > Read the relevant GCC documentation.
> 
> I did, yes.

No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the example
given there:

asm volatile("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;

The docs explicitly say the addition can be moved before the "volatile
asm". Hopefully, as you know, (x + y) is a C "expression" and hence
a "sequence point" as defined by the standard. So the "volatile asm"
should've happened before it, right? Wrong.
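
To make the "artificial dependency" idea concrete, here is a portable sketch that uses an empty asm template instead of the PowerPC instruction from the docs. It assumes GCC or Clang extended asm; compiler_barrier, add_then_pin and store_then_load are hypothetical names. It only illustrates the mechanism and does not settle the sequence-point question argued above.

```c
#include <assert.h>

/* Compiler-only barrier, same shape as the kernel's barrier() macro. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/*
 * The empty asm names 'sum' as an in/out register operand, so the compiler
 * has to compute 'sum' before the asm and cannot assume anything about its
 * value afterwards. That operand is the artificial dependency.
 */
static inline int add_then_pin(int x, int y)
{
    int sum = x + y;

    __asm__ __volatile__("" : "+r"(sum));
    return sum;
}

static int shared;

/* The "memory" clobber forces 'shared' to be stored, then re-read. */
static inline int store_then_load(int v)
{
    shared = v;
    compiler_barrier();
    return shared;
}
```

Without the "+r" operand (or a "memory" clobber covering the data in question), nothing ties the asm to the surrounding arithmetic, which is the situation the quoted GCC example describes.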

I know there is also stuff written about "side-effects" there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts, but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...

See, "volatile" C keyword, for all its ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs). Semantics of "volatile" as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


> > [ of course, if the (latest) GCC documentation is *yet again*
> >   wrong, then alright, not much I can do about it, is there. ]
> 
> There was (and is) nothing wrong about the "+m" documentation, if
> that is what you are talking about.  It could be extended now, to
> allow "+m" -- but that takes more than just "fixing" the documentation.

No, there was (and is) _everything_ wrong about the "+" documentation as
applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it possible
to use "+m" (like the current kernel code does). The docs suggest
otherwise, so there's obviously a clear disconnect between the docs and
actual GCC behaviour.


[ You seem to often take issue with _amazingly_ petty and pedantic things,
  by the way :-) ]


Re: [PATCH] powerpc: Implement atomic{,64}_{read,write}() without volatile

2007-08-17 Thread Segher Boessenkool

Instead, use asm() like all other atomic operations already do.



+static __inline__ long atomic64_read(const atomic_t *v)



+static __inline__ void atomic64_set(atomic_t *v, long i)


s/atomic_t/atomic64_t/ in both lines.  I've edited my copy of the 
patch.


Ouch, sorry about that.


Segher



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Christoph Lameter wrote:

> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
> > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > > >
> > > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > > 
> > > I had totally forgotten that I'd already filed that bug more
> > > than six years ago until they just closed yours as a duplicate
> > > of mine :)
> > > 
> > > Good luck in getting it fixed!
> > 
> > Well, just got done re-opening it for the third time.  And a local
> > gcc community member advised me not to give up too easily.  But I
> > must admit that I am impressed with the speed that it was identified
> > as duplicate.
> > 
> > Should be entertaining!  ;-)
> 
> Right. ROTFL... volatile actually breaks atomic_t instead of making it 
> safe. x++ becomes a register load, increment and a register store. Without 
> volatile we can increment the memory directly.

No code does (or would do, or should do):

x.counter++;

on an "atomic_t x;" anyway.


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Christoph Lameter
On Fri, 17 Aug 2007, Paul E. McKenney wrote:

> On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > >
> > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > 
> > I had totally forgotten that I'd already filed that bug more
> > than six years ago until they just closed yours as a duplicate
> > of mine :)
> > 
> > Good luck in getting it fixed!
> 
> Well, just got done re-opening it for the third time.  And a local
> gcc community member advised me not to give up too easily.  But I
> must admit that I am impressed with the speed that it was identified
> as duplicate.
> 
> Should be entertaining!  ;-)

Right. ROTFL... volatile actually breaks atomic_t instead of making it 
safe. x++ becomes a register load, increment and a register store. Without 
volatile we can increment the memory directly. It seems that volatile 
requires that the variable is loaded into a register first and then 
operated upon. Understandable when you think about volatile being used to 
access memory mapped I/O registers where a RMW operation could be 
problematic.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3506
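
A minimal pair for checking this on a given compiler, assuming nothing beyond standard C; inc_plain and inc_volatile are hypothetical names. Build with "gcc -O2 -S" and compare the two function bodies. The code generated for the volatile version is compiler- and version-dependent, so this is a way to look, not a statement of what you will see.

```c
#include <assert.h>

/* Plain counter: the compiler may use a single memory-operand increment. */
void inc_plain(int *p)
{
    (*p)++;
}

/*
 * Volatile counter: per the message above, gcc emitted a separate load,
 * increment in a register, and store for this form.
 */
void inc_volatile(volatile int *p)
{
    (*p)++;
}
```

Neither function is atomic; the comparison is only about instruction selection.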


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> >
> > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> 
> I had totally forgotten that I'd already filed that bug more
> than six years ago until they just closed yours as a duplicate
> of mine :)
> 
> Good luck in getting it fixed!

Well, just got done re-opening it for the third time.  And a local
gcc community member advised me not to give up too easily.  But I
must admit that I am impressed with the speed that it was identified
as duplicate.

Should be entertaining!  ;-)

Thanx, Paul


Re: kfree(0) - ok?

2007-08-17 Thread Christoph Lameter
On Sat, 18 Aug 2007, Thomas Gleixner wrote:

> If yes, who invented this 1980s reminiscence, where you got valid
> pointers for malloc(0) ?

I believe his first name started with an L and ended with an s ;-)

Seriously the kmalloc(0) pointer allowed us to detect cases in which 
people tried to store into objects allocated with kmalloc(0).

If we would just return NULL then we would not be able to distinguish it 
from a failure (that is what I initially had).
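
A userspace model of that distinction, offered as a sketch rather than the slab allocator's actual code: model_alloc, model_free and ZERO_SIZE_SENTINEL are hypothetical names, and the sentinel value is an arbitrary small non-NULL address chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sentinel: non-NULL, but never the address of a real object. */
#define ZERO_SIZE_SENTINEL ((void *)16)

/* Zero-size requests succeed with the sentinel; NULL still means failure. */
static void *model_alloc(size_t size)
{
    if (size == 0)
        return ZERO_SIZE_SENTINEL;
    return malloc(size);
}

/* Freeing NULL or the sentinel is a no-op, so callers need no special case. */
static void model_free(void *p)
{
    if (p == NULL || p == ZERO_SIZE_SENTINEL)
        return;
    free(p);
}
```

A caller can therefore keep the usual "if (!ptr) return -ENOMEM;" check, while any attempt to store through the zero-size pointer hits an address that is not a valid object.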



[PATCH 5/6] UML - Remove unneeded if from hostfs

2007-08-17 Thread Jeff Dike
Get rid of an empty if statement which might look like a bug to a
casual reader.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 fs/hostfs/hostfs_user.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22/fs/hostfs/hostfs_user.c
===
--- linux-2.6.22.orig/fs/hostfs/hostfs_user.c   2007-08-17 13:36:34.0 
-0400
+++ linux-2.6.22/fs/hostfs/hostfs_user.c2007-08-17 13:37:45.0 
-0400
@@ -283,7 +283,7 @@ int set_attr(const char *file, struct ho
}
}
 
-   if(attrs->ia_valid & HOSTFS_ATTR_CTIME) ;
+   /* Note: ctime is not handled */
if(attrs->ia_valid & (HOSTFS_ATTR_ATIME | HOSTFS_ATTR_MTIME)){
err = stat_file(file, NULL, NULL, NULL, NULL, NULL, NULL,
>ia_atime, >ia_mtime, NULL,


[PATCH 6/6] UML - Fix hostfs style

2007-08-17 Thread Jeff Dike
Style fixes in hostfs.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 fs/hostfs/hostfs.h  |9 +
 fs/hostfs/hostfs_kern.c |  226 
 fs/hostfs/hostfs_user.c |  139 +
 3 files changed, 202 insertions(+), 172 deletions(-)

Index: linux-2.6.22/fs/hostfs/hostfs.h
===
--- linux-2.6.22.orig/fs/hostfs/hostfs.h2007-08-17 13:36:34.0 
-0400
+++ linux-2.6.22/fs/hostfs/hostfs.h 2007-08-17 13:37:37.0 -0400
@@ -3,7 +3,8 @@
 
 #include "os.h"
 
-/* These are exactly the same definitions as in fs.h, but the names are
+/*
+ * These are exactly the same definitions as in fs.h, but the names are
  * changed so that this file can be included in both kernel and user files.
  */
 
@@ -21,7 +22,8 @@
 #define HOSTFS_ATTR_FORCE  512 /* Not a change, but a change it */
 #define HOSTFS_ATTR_ATTR_FLAG  1024
 
-/* If you are very careful, you'll notice that these two are missing:
+/*
+ * If you are very careful, you'll notice that these two are missing:
  *
  * #define ATTR_KILL_SUID  2048
  * #define ATTR_KILL_SGID  4096
@@ -76,7 +78,8 @@ extern int make_symlink(const char *from
 extern int unlink_file(const char *file);
 extern int do_mkdir(const char *file, int mode);
 extern int do_rmdir(const char *file);
-extern int do_mknod(const char *file, int mode, unsigned int major, unsigned int minor);
+extern int do_mknod(const char *file, int mode, unsigned int major,
+   unsigned int minor);
 extern int link_file(const char *from, const char *to);
 extern int do_readlink(char *file, char *buf, int size);
 extern int rename_file(char *from, char *to);
Index: linux-2.6.22/fs/hostfs/hostfs_kern.c
===
--- linux-2.6.22.orig/fs/hostfs/hostfs_kern.c   2007-08-17 13:36:34.0 
-0400
+++ linux-2.6.22/fs/hostfs/hostfs_kern.c2007-08-17 13:37:37.0 
-0400
@@ -6,22 +6,15 @@
  * 2003-02-10 Petr Baudis <[EMAIL PROTECTED]>
  */
 
-#include 
 #include 
 #include 
-#include 
-#include 
+#include 
 #include 
-#include 
-#include 
 #include 
-#include 
 #include  /* mark_page_accessed */
-#include 
 #include "hostfs.h"
-#include "kern_util.h"
-#include "kern.h"
 #include "init.h"
+#include "kern.h"
 
 struct hostfs_inode_info {
char *host_filename;
@@ -62,18 +55,18 @@ static int __init hostfs_args(char *opti
char *ptr;
 
ptr = strchr(options, ',');
-   if(ptr != NULL)
+   if (ptr != NULL)
*ptr++ = '\0';
-   if(*options != '\0')
+   if (*options != '\0')
root_ino = options;
 
options = ptr;
-   while(options){
+   while (options) {
ptr = strchr(options, ',');
-   if(ptr != NULL)
+   if (ptr != NULL)
*ptr++ = '\0';
-   if(*options != '\0'){
-   if(!strcmp(options, "append"))
+   if (*options != '\0') {
+   if (!strcmp(options, "append"))
append = 1;
else printf("hostfs_args - unsupported option - %s\n",
options);
@@ -103,7 +96,7 @@ static char *dentry_name(struct dentry *
 
len = 0;
parent = dentry;
-   while(parent->d_parent != parent){
+   while (parent->d_parent != parent) {
len += parent->d_name.len + 1;
parent = parent->d_parent;
}
@@ -111,12 +104,12 @@ static char *dentry_name(struct dentry *
root = HOSTFS_I(parent->d_inode)->host_filename;
len += strlen(root);
name = kmalloc(len + extra + 1, GFP_KERNEL);
-   if(name == NULL)
+   if (name == NULL)
return NULL;
 
name[len] = '\0';
parent = dentry;
-   while(parent->d_parent != parent){
+   while (parent->d_parent != parent) {
len -= parent->d_name.len + 1;
name[len] = '/';
strncpy(&name[len + 1], parent->d_name.name,
@@ -137,7 +130,8 @@ static char *inode_name(struct inode *in
 
 static int read_name(struct inode *ino, char *name)
 {
-   /* The non-int inode fields are copied into ints by stat_file and
+   /*
+* The non-int inode fields are copied into ints by stat_file and
 * then copied into the inode because passing the actual pointers
 * in and having them treated as int * breaks on big-endian machines
 */
@@ -150,7 +144,7 @@ static int read_name(struct inode *ino, 
err = stat_file(name, _ino, _mode, _nlink, >i_uid,
>i_gid, _size, >i_atime, >i_mtime,
>i_ctime, _blksize, _blocks, -1);
-   if(err)
+   if (err)
return err;
 
ino->i_ino = i_ino;
@@ -167,33 +161,33 @@ static char *follow_link(char *link)

[PATCH 2/6] UML - Use 64-bits for block size on x86_64

2007-08-17 Thread Jeff Dike
The BLKGETSIZE ioctl expects a pointer to a long, os_file_size was providing
an int. Therefore, ubd access to host block devices caused a segmentation
fault on 64-bit systems.

Signed-off-by: Nicolas George <[EMAIL PROTECTED]>
Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 arch/um/os-Linux/file.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.22/arch/um/os-Linux/file.c
===
--- linux-2.6.22.orig/arch/um/os-Linux/file.c   2007-08-17 13:36:35.0 
-0400
+++ linux-2.6.22/arch/um/os-Linux/file.c2007-08-17 13:37:55.0 
-0400
@@ -296,7 +296,8 @@ int os_file_size(char *file, unsigned lo
}
 
if(S_ISBLK(buf.ust_mode)){
-   int fd, blocks;
+   int fd;
+   long blocks;
 
fd = os_open_file(file, of_read(OPENFLAGS()), 0);
if(fd < 0){
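
A standalone userspace sketch of the corrected pattern, for illustration: block_dev_size is a hypothetical name, the code assumes Linux headers (BLKGETSIZE comes from <linux/fs.h>), and it is not the UML function itself. The point is the type of 'blocks': the ioctl writes a long-sized value, so an int on the stack is too small on 64-bit hosts.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE */

/* Returns 0 and fills *size_out (bytes) on success, -errno on failure. */
static int block_dev_size(const char *file, unsigned long long *size_out)
{
    unsigned long blocks;   /* long-sized: the kernel stores a long here */
    int fd, err;

    fd = open(file, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (ioctl(fd, BLKGETSIZE, &blocks) < 0) {
        err = -errno;
        close(fd);
        return err;
    }
    close(fd);

    /* BLKGETSIZE reports the size in 512-byte units. */
    *size_out = (unsigned long long)blocks * 512;
    return 0;
}
```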


[PATCH 1/6] UML - Fix inlines

2007-08-17 Thread Jeff Dike
"extern inline" will have different semantics with gcc 4.3.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 include/asm-um/pgalloc.h  |2 +-
 include/asm-um/pgtable-3level.h   |2 +-
 include/asm-um/processor-x86_64.h |2 +-
 include/asm-um/smp.h  |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.22/include/asm-um/pgalloc.h
===
--- linux-2.6.22.orig/include/asm-um/pgalloc.h  2007-08-13 22:38:59.0 
-0400
+++ linux-2.6.22/include/asm-um/pgalloc.h   2007-08-16 12:04:12.0 
-0400
@@ -42,7 +42,7 @@ static inline void pte_free(struct page 
 
 #ifdef CONFIG_3_LEVEL_PGTABLES
 
-extern __inline__ void pmd_free(pmd_t *pmd)
+static inline void pmd_free(pmd_t *pmd)
 {
free_page((unsigned long)pmd);
 }
Index: linux-2.6.22/include/asm-um/pgtable-3level.h
===
--- linux-2.6.22.orig/include/asm-um/pgtable-3level.h   2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.22/include/asm-um/pgtable-3level.h2007-08-15 
10:39:59.0 -0400
@@ -69,7 +69,7 @@ static inline pmd_t *pmd_alloc_one(struc
 return pmd;
 }
 
-extern inline void pud_clear (pud_t *pud)
+static inline void pud_clear (pud_t *pud)
 {
 set_pud(pud, __pud(0));
 }
Index: linux-2.6.22/include/asm-um/processor-x86_64.h
===
--- linux-2.6.22.orig/include/asm-um/processor-x86_64.h 2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.22/include/asm-um/processor-x86_64.h  2007-08-15 
10:39:59.0 -0400
@@ -18,7 +18,7 @@ struct arch_thread {
 };
 
 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
-extern inline void rep_nop(void)
+static inline void rep_nop(void)
 {
__asm__ __volatile__("rep;nop": : :"memory");
 }
Index: linux-2.6.22/include/asm-um/smp.h
===
--- linux-2.6.22.orig/include/asm-um/smp.h  2007-07-08 19:32:17.0 
-0400
+++ linux-2.6.22/include/asm-um/smp.h   2007-08-15 10:39:59.0 -0400
@@ -18,7 +18,7 @@ extern int hard_smp_processor_id(void);
 extern int ncpus;
 
 
-extern inline void smp_cpus_done(unsigned int maxcpus)
+static inline void smp_cpus_done(unsigned int maxcpus)
 {
 }
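
Background for the change, as a hedged sketch: under the traditional GNU C (gnu89) rules an "extern inline" definition is used only for inlining and no out-of-line copy is emitted, whereas under C99 rules "extern inline" does emit an external definition. A header helper declared "static inline" behaves the same under both, which is why the conversion is safe. The names below (demo_pages_to_bytes, DEMO_PAGE_SHIFT) are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Each translation unit that includes this gets its own internal-linkage
 * copy, so the result does not depend on which "extern inline" dialect
 * the compiler implements.
 */
static inline unsigned long demo_pages_to_bytes(unsigned long pages)
{
    return pages << DEMO_PAGE_SHIFT;
}
```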
 


[PATCH 3/6] UML - Userspace files should call libc directly

2007-08-17 Thread Jeff Dike
A number of files that were changed in the recent removal of tt mode
are userspace files which call the os_* wrappers instead of calling
libc directly.  A few other files were affected by this, through

This patch makes these call glibc directly.

There are also style fixes in the affected areas.

os_print_error has no remaining callers, so it is deleted.

There is an interface change to os_set_exec_close, eliminating a
parameter which was always the same.  The callers are fixed as well.

os_process_pc got its error path cleaned up.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 arch/um/include/os.h |3 
 arch/um/kernel/ksyms.c   |1 
 arch/um/kernel/physmem.c |4 -
 arch/um/os-Linux/aio.c   |4 -
 arch/um/os-Linux/drivers/ethertap_user.c |   61 --
 arch/um/os-Linux/drivers/tuntap_user.c   |   24 +++
 arch/um/os-Linux/file.c  |  102 +--
 arch/um/os-Linux/helper.c|8 +-
 arch/um/os-Linux/mem.c   |6 -
 arch/um/os-Linux/process.c   |   37 ++-
 arch/um/os-Linux/skas/process.c  |4 -
 arch/um/os-Linux/start_up.c  |5 -
 12 files changed, 138 insertions(+), 121 deletions(-)

Index: linux-2.6.22/arch/um/os-Linux/file.c
===
--- linux-2.6.22.orig/arch/um/os-Linux/file.c   2007-08-17 14:32:32.0 
-0400
+++ linux-2.6.22/arch/um/os-Linux/file.c2007-08-17 14:32:32.0 
-0400
@@ -82,13 +82,6 @@ int os_access(const char* file, int mode
return 0;
 }
 
-void os_print_error(int error, const char* str)
-{
-   errno = error < 0 ? -error : error;
-
-   perror(str);
-}
-
 /* FIXME? required only by hostaudio (because it passes ioctls verbatim) */
 int os_ioctl_generic(int fd, unsigned int cmd, unsigned long arg)
 {
@@ -181,19 +174,19 @@ int os_file_mode(char *file, struct open
 
*mode_out = OPENFLAGS();
 
-   err = os_access(file, OS_ACC_W_OK);
-   if((err < 0) && (err != -EACCES))
-   return(err);
-
-   *mode_out = of_write(*mode_out);
-
-   err = os_access(file, OS_ACC_R_OK);
-   if((err < 0) && (err != -EACCES))
-   return(err);
+   err = access(file, W_OK);
+   if(err && (errno != EACCES))
+   return -errno;
+   else if(!err)
+   *mode_out = of_write(*mode_out);
 
-   *mode_out = of_read(*mode_out);
+   err = access(file, R_OK);
+   if(err && (errno != EACCES))
+   return -errno;
+   else if(!err)
+   *mode_out = of_read(*mode_out);
 
-   return(0);
+   return err;
 }
 
 int os_open_file(char *file, struct openflags flags, int mode)
@@ -212,15 +205,15 @@ int os_open_file(char *file, struct open
 
fd = open64(file, f, mode);
if(fd < 0)
-   return(-errno);
+   return -errno;
 
if(flags.cl && fcntl(fd, F_SETFD, 1)){
err = -errno;
-   os_close_file(fd);
+   close(fd);
return err;
}
 
-   return(fd);
+   return fd;
 }
 
 int os_connect_socket(char *name)
@@ -292,31 +285,33 @@ int os_file_size(char *file, unsigned lo
err = os_stat_file(file, &buf);
if(err < 0){
printk("Couldn't stat \"%s\" : err = %d\n", file, -err);
-   return(err);
+   return err;
}
 
if(S_ISBLK(buf.ust_mode)){
int fd;
long blocks;
 
-   fd = os_open_file(file, of_read(OPENFLAGS()), 0);
-   if(fd < 0){
-   printk("Couldn't open \"%s\", errno = %d\n", file, -fd);
-   return(fd);
+   fd = open(file, O_RDONLY, 0);
+   if(fd < 0) {
+   err = -errno;
+   printk("Couldn't open \"%s\", errno = %d\n", file,
+  errno);
+   return err;
}
if(ioctl(fd, BLKGETSIZE, &blocks) < 0){
err = -errno;
printk("Couldn't get the block size of \"%s\", "
   "errno = %d\n", file, errno);
-   os_close_file(fd);
-   return(err);
+   close(fd);
+   return err;
}
*size_out = ((long long) blocks) * 512;
-   os_close_file(fd);
-   return(0);
+   close(fd);
}
-   *size_out = buf.ust_size;
-   return(0);
+   else *size_out = buf.ust_size;
+
+   return 0;
 }
 
 int os_file_modtime(char *file, unsigned long *modtime)
@@ -334,35 +329,28 @@ int os_file_modtime(char *file, unsigned
return 0;
 }
 
-int os_get_exec_close(int fd, int* close_on_exec)
+int os_get_exec_close(int fd, 

[PATCH 4/6] UML - Clean up tlb flush path

2007-08-17 Thread Jeff Dike
Tidy the tlb flushing code.

With tt mode gone, there is no reason to have the capability to have
different host-level address space updating routines.  So, do_op is
called directly from do_mmap, do_mprotect, and do_munmap, rather than
calling a function pointer that it is given.

There was a large amount of data that was passed from function to
function, being used at the lowest level, without being changed.  This
stuff is now encapsulated in a structure which is initialized at the
top layer and passed down.  This simplifies the code, reduces the
amount of code needed to pass the parameters around, and saves on
stack space.

A somewhat more subtle change is the meaning of the current operation
index.  It used to start at -1, being pre-incremented when adding an
operation.  It now starts at 0, being post-incremented, with
associated adjustments of +/- 1 on comparisons.

In addition, tlb.h contained a couple of declarations which had no
users outside of tlb.c, so they could be moved or deleted.
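
The two ideas above (one structure carrying the unchanging state, and an index that starts at 0 and is post-incremented) can be sketched in plain C as below. This is a simplified model, not the patch: vm_change, add_op, apply_ops, MAX_OPS and the 'flushed' counter are hypothetical names introduced for the illustration.

```c
#include <assert.h>

enum op_type { OP_NONE, OP_MMAP, OP_MUNMAP };

struct vm_op {
    enum op_type type;
    unsigned long addr;
    unsigned long len;
};

#define MAX_OPS 4

/* Everything the low-level routines need, initialized once at the top. */
struct vm_change {
    struct vm_op ops[MAX_OPS];
    int index;      /* number of queued ops == next free slot */
    int force;
    int flushed;    /* demo only: counts ops handed to apply_ops() */
};

/* Stand-in for the host-level calls; applies ops[0..end-1]. */
static int apply_ops(struct vm_change *vc, int end)
{
    vc->flushed += end;
    return 0;
}

/* Queue one op, flushing first if the array is already full. */
static int add_op(struct vm_change *vc, enum op_type type,
                  unsigned long addr, unsigned long len)
{
    int ret = 0;

    if (vc->index == MAX_OPS) {
        ret = apply_ops(vc, vc->index);
        vc->index = 0;
    }
    /* index starts at 0 and is post-incremented */
    vc->ops[vc->index++] = (struct vm_op){ .type = type, .addr = addr, .len = len };
    return ret;
}
```

With the index meaning "number of queued operations", the full test is a plain comparison against the array size and the flush call takes the index directly, with no +/- 1 adjustment at the call site.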

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
--
 arch/um/include/tlb.h |   27 -
 arch/um/kernel/tlb.c  |  258 --
 2 files changed, 127 insertions(+), 158 deletions(-)

Index: linux-2.6.22/arch/um/include/tlb.h
===
--- linux-2.6.22.orig/arch/um/include/tlb.h 2007-08-17 13:36:34.0 
-0400
+++ linux-2.6.22/arch/um/include/tlb.h  2007-08-17 13:37:27.0 -0400
@@ -8,34 +8,7 @@
 
 #include "um_mmu.h"
 
-struct host_vm_op {
-   enum { NONE, MMAP, MUNMAP, MPROTECT } type;
-   union {
-   struct {
-   unsigned long addr;
-   unsigned long len;
-   unsigned int prot;
-   int fd;
-   __u64 offset;
-   } mmap;
-   struct {
-   unsigned long addr;
-   unsigned long len;
-   } munmap;
-   struct {
-   unsigned long addr;
-   unsigned long len;
-   unsigned int prot;
-   } mprotect;
-   } u;
-};
-
 extern void force_flush_all(void);
-extern void fix_range_common(struct mm_struct *mm, unsigned long start_addr,
- unsigned long end_addr, int force,
-int (*do_ops)(struct mm_context *,
-  struct host_vm_op *, int, int,
-  void **));
 extern int flush_tlb_kernel_range_common(unsigned long start,
 unsigned long end);
 
Index: linux-2.6.22/arch/um/kernel/tlb.c
===
--- linux-2.6.22.orig/arch/um/kernel/tlb.c  2007-08-17 13:36:34.0 
-0400
+++ linux-2.6.22/arch/um/kernel/tlb.c   2007-08-17 13:37:27.0 -0400
@@ -12,19 +12,85 @@
 #include "skas.h"
 #include "tlb.h"
 
+struct host_vm_change {
+   struct host_vm_op {
+   enum { NONE, MMAP, MUNMAP, MPROTECT } type;
+   union {
+   struct {
+   unsigned long addr;
+   unsigned long len;
+   unsigned int prot;
+   int fd;
+   __u64 offset;
+   } mmap;
+   struct {
+   unsigned long addr;
+   unsigned long len;
+   } munmap;
+   struct {
+   unsigned long addr;
+   unsigned long len;
+   unsigned int prot;
+   } mprotect;
+   } u;
+   } ops[1];
+   int index;
+   struct mm_id *id;
+   void *data;
+   int force;
+};
+
+#define INIT_HVC(mm, force) \
+   ((struct host_vm_change) \
+{ .ops = { { .type = NONE } }, \
+  .id  = >context.id, \
+  .data= NULL, \
+  .index   = 0, \
+  .force   = force })
+
+static int do_ops(struct host_vm_change *hvc, int end,
+ int finished)
+{
+   struct host_vm_op *op;
+   int i, ret = 0;
+
+   for (i = 0; i < end && !ret; i++) {
+   op = >ops[i];
+   switch(op->type) {
+   case MMAP:
+   ret = map(hvc->id, op->u.mmap.addr, op->u.mmap.len,
+ op->u.mmap.prot, op->u.mmap.fd,
+ op->u.mmap.offset, finished, >data);
+   break;
+   case MUNMAP:
+   ret = unmap(hvc->id, op->u.munmap.addr,
+   op->u.munmap.len, finished, >data);
+   

[PATCH 0/6] UML - Code cleanups for 2.6.24

2007-08-17 Thread Jeff Dike
These patches are 2.6.24 material.

They are code cleanup, plus a minor bug fix.

Jeff

-- 
Work email - jdike at linux dot intel dot com


[PATCH 3/3] dma: use dma_flags_set_dmaflush in ib_umem_get

2007-08-17 Thread akepner

Add a new parameter "dmaflush" to ib_umem_get, and update all 
callers. For now only the mthca IB driver makes use of the new 
parameter.


Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>
--
 drivers/infiniband/core/umem.c   |8 ++--
 drivers/infiniband/hw/amso1100/c2_provider.c |2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |2 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c   |2 +-
 drivers/infiniband/hw/ipath/ipath_mr.c   |2 +-
 drivers/infiniband/hw/mlx4/cq.c  |2 +-
 drivers/infiniband/hw/mlx4/doorbell.c|2 +-
 drivers/infiniband/hw/mlx4/mr.c  |3 ++-
 drivers/infiniband/hw/mlx4/qp.c  |2 +-
 drivers/infiniband/hw/mlx4/srq.c |2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |7 ++-
 drivers/infiniband/hw/mthca/mthca_user.h |   10 +-
 include/rdma/ib_umem.h   |4 ++--
 13 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 26d0470..c626d2c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -64,9 +64,11 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
+ * @dmaflush: map this memory "coherently", if necessary 
+ *  (for architectures that support posted DMA)
  */
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-   size_t size, int access)
+   size_t size, int access, int dmaflush)
 {
struct ib_umem *umem;
struct page **page_list;
@@ -78,6 +80,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
int ret;
int off;
int i;
+   int flags = dmaflush ? dma_flags_set_dmaflush(DMA_BIDIRECTIONAL): 
+   DMA_BIDIRECTIONAL;
 
if (!can_do_mlock())
return ERR_PTR(-EPERM);
@@ -155,7 +159,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
chunk->nmap = ib_dma_map_sg(context->device,
&chunk->page_list[0],
chunk->nents,
-   DMA_BIDIRECTIONAL);
+   flags);
if (chunk->nmap <= 0) {
for (i = 0; i < chunk->nents; ++i)
put_page(chunk->page_list[i].page);
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 997cf15..17243b7 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -449,7 +449,7 @@ static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
return ERR_PTR(-ENOMEM);
c2mr->pd = c2pd;
 
-   c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc);
+   c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
if (IS_ERR(c2mr->umem)) {
err = PTR_ERR(c2mr->umem);
kfree(c2mr);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index f0c7775..d0a514c 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -601,7 +601,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
if (!mhp)
return ERR_PTR(-ENOMEM);
 
-   mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc);
+   mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
if (IS_ERR(mhp->umem)) {
err = PTR_ERR(mhp->umem);
kfree(mhp);
diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index d97eda3..c13c11c 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -329,7 +329,7 @@ struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
}
 
e_mr->umem = ib_umem_get(pd->uobject->context, start, length,
-mr_access_flags);
+mr_access_flags, 0);
if (IS_ERR(e_mr->umem)) {
ib_mr = (void *)e_mr->umem;
goto reg_user_mr_exit1;
diff --git a/drivers/infiniband/hw/ipath/ipath_mr.c b/drivers/infiniband/hw/ipath/ipath_mr.c
index e442470..e351222 100644
--- a/drivers/infiniband/hw/ipath/ipath_mr.c
+++ b/drivers/infiniband/hw/ipath/ipath_mr.c
@@ -195,7 +195,7 @@ struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 

Re: [PATCH] - git-send-email.perl

2007-08-17 Thread Junio C Hamano
Joe Perches <[EMAIL PROTECTED]> writes:

> Here's a patch to enable a command line option
> that takes a string argument
>
>   cc-cmd
>
> This modifies the @cc array to include whatever
> output is produced by cc_cmd $patchfile
>
> cccmd can be stored in a config settings file
>
> previous versions of this patch were submitted
> against an older version of git-send-email.perl

... Signed-off-by: ...


> diff --git a/git-send-email.perl b/git-send-email.perl
> index 69559b2..828a77a 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -46,6 +46,9 @@ Options:
> --cc   Specify an initial "Cc:" list for the entire series
>of emails.
>  
> +   --cc-cmd   Specify a command to execute per file which adds
> +  per file specific cc address entries
> +
> --bcc  Specify a list of email addresses that should be Bcc:
> on all the emails.
>  

I do not see a patch to "Documentation/git-send-email.txt" here...

> @@ -652,11 +657,21 @@ foreach my $t (@files) {
>   }
>   }
>   close F;
> +
> + if (${cc_cmd} ne "") {
> + my $output = `${cc_cmd} $t`;
> + my @lines = split("\n", $output);
> + foreach my $c (@lines) {
> + push @cc, $c;
> + printf("(cc-cmd) Adding cc: %s from: '%s'\n", $c, $cc_cmd)
> + unless $quiet;
> + }
> + }
> +

Something like this, with appropriate error checking, perhaps?

open my $cc, "${cc_cmd} $t |";
while (my $c = <$cc>) {
...
}
close $cc;


[PATCH 1/3] dma: introduce no-op stub "dma_flags_set_dmaflush"

2007-08-17 Thread akepner

Introduce a no-op stub function "dma_flags_set_dmaflush". Arches 
can override this if necessary to associate a "dmaflush" attribute
with a DMA-able memory region. In-flight DMA will be flushed to 
host memory when a memory region with the dmaflush attribute is 
written to.

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>
-- 
 dma-mapping.h |7 +++
 1 files changed, 7 insertions(+)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2dc21cb..594a651 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -99,4 +99,11 @@ static inline void dmam_release_declared_memory(struct device *dev)
 }
 #endif /* ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY */
 
+#ifndef ARCH_DOES_POSTED_DMA
+static inline int
+dma_flags_set_dmaflush(int dir) {
+   return (dir);
+}
+#endif /* ARCH_DOES_POSTED_DMA */
+
 #endif
-- 
Arthur
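
As a rough user-space illustration of the override pattern this stub sets up (a sketch, not kernel code): the enum is redefined locally with the values the kernel headers use, and pick_map_flags() is a hypothetical caller modelled on the flags computation in patch 3/3. With ARCH_DOES_POSTED_DMA left undefined, the stub hands the direction back unchanged, so existing callers should see no difference.

```c
#include <assert.h>

/* Local copy of the kernel's DMA direction values so the sketch builds
 * in user space. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

#ifndef ARCH_DOES_POSTED_DMA
/* Generic no-op stub, as in the patch: no attribute bits are added. */
static inline int dma_flags_set_dmaflush(int dir)
{
	return dir;
}
#endif /* ARCH_DOES_POSTED_DMA */

/* Hypothetical caller: choose the flags to pass to a map routine. */
static int pick_map_flags(int dmaflush)
{
	assert(dmaflush == 0 || dmaflush == 1);
	return dmaflush ? dma_flags_set_dmaflush(DMA_BIDIRECTIONAL)
			: DMA_BIDIRECTIONAL;
}
```

An architecture that defines ARCH_DOES_POSTED_DMA would supply its own dma_flags_set_dmaflush() and the caller would not need to change.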



[PATCH 2/3] dma: override "dma_flags_set_dmaflush" for sn-ia64

2007-08-17 Thread akepner

Define ARCH_DOES_POSTED_DMA, and override the dma_flags_set_dmaflush 
function. Also define dma_flags_get_direction, dma_flags_get_dmaflush 
to retrieve the direction and "dmaflush attribute" from flags 
passed to the sn_dma_map_* functions.


Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>
--
 arch/ia64/sn/pci/pci_dma.c |   35 ++-
 include/asm-ia64/sn/io.h   |   26 ++
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d79ddac..754240b 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -153,7 +153,7 @@ EXPORT_SYMBOL(sn_dma_free_coherent);
  * @dev: device to map for
  * @cpu_addr: kernel virtual address of the region to map
  * @size: size of the region
- * @direction: DMA direction
+ * @flags: DMA direction, and arch-specific attributes
  *
  * Map the region pointed to by @cpu_addr for DMA and return the
  * DMA address.
@@ -167,17 +167,23 @@ EXPORT_SYMBOL(sn_dma_free_coherent);
  *   figure out how to save dmamap handle so can use two step.
  */
 dma_addr_t sn_dma_map_single(struct device *dev, void *cpu_addr, size_t size,
-int direction)
+int flags)
 {
dma_addr_t dma_addr;
unsigned long phys_addr;
struct pci_dev *pdev = to_pci_dev(dev);
struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev);
+   int dmaflush = dma_flags_get_dmaflush(flags);
 
BUG_ON(dev->bus != &pci_bus_type);
 
phys_addr = __pa(cpu_addr);
-   dma_addr = provider->dma_map(pdev, phys_addr, size, SN_DMA_ADDR_PHYS);
+   if (dmaflush)
+   dma_addr = provider->dma_map_consistent(pdev, phys_addr, size, 
+   SN_DMA_ADDR_PHYS);
+   else
+   dma_addr = provider->dma_map(pdev, phys_addr, size, 
+SN_DMA_ADDR_PHYS);
if (!dma_addr) {
printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__);
return 0;
@@ -240,18 +246,20 @@ EXPORT_SYMBOL(sn_dma_unmap_sg);
  * @dev: device to map for
  * @sg: scatterlist to map
  * @nhwentries: number of entries
- * @direction: direction of the DMA transaction
+ * @flags: direction of the DMA transaction, and arch-specific attributes
  *
  * Maps each entry of @sg for DMA.
  */
 int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
- int direction)
+ int flags)
 {
unsigned long phys_addr;
struct scatterlist *saved_sg = sg;
struct pci_dev *pdev = to_pci_dev(dev);
struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev);
int i;
+   int dmaflush = dma_flags_get_dmaflush(flags);
+   int direction = dma_flags_get_direction(flags);
 
BUG_ON(dev->bus != &pci_bus_type);
 
@@ -259,12 +267,21 @@ int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
 * Setup a DMA address for each entry in the scatterlist.
 */
for (i = 0; i < nhwentries; i++, sg++) {
+   dma_addr_t dma_addr;
phys_addr = SG_ENT_PHYS_ADDRESS(sg);
-   sg->dma_address = provider->dma_map(pdev,
-   phys_addr, sg->length,
-   SN_DMA_ADDR_PHYS);
 
-   if (!sg->dma_address) {
+   if (dmaflush) {
+   dma_addr = provider->dma_map_consistent(pdev,
+   phys_addr,
+   sg->length,
+   SN_DMA_ADDR_PHYS);
+   } else {
+   dma_addr = provider->dma_map(pdev,
+phys_addr, sg->length,
+SN_DMA_ADDR_PHYS);
+   }
+
+   if (!(sg->dma_address = dma_addr)) {
printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__);
 
/*
diff --git a/include/asm-ia64/sn/io.h b/include/asm-ia64/sn/io.h
index 41c73a7..c82eb90 100644
--- a/include/asm-ia64/sn/io.h
+++ b/include/asm-ia64/sn/io.h
@@ -271,4 +271,30 @@ sn_pci_set_vchan(struct pci_dev *pci_dev, unsigned long *addr, int vchan)
return 0;
 }
 
+#define ARCH_DOES_POSTED_DMA
+/* here we steal some upper bits from the "direction" argument to the 
+ * dma_map_* routines */
+#define DMA_ATTR_SHIFT 8
+/* bottom 8 bits for direction, remaining bits for additional "attributes" */
+#define DMA_FLUSH_ATTR 0x1
+/* For now the only attribute is "flush in-flight dma when writing to 
+ * this DMA mapped memory" */
+#define DMA_DIR_MASK   ((1 << DMA_ATTR_SHIFT) - 1)
+#define DMA_ATTR_MASK  ~DMA_DIR_MASK
+
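
The patch text above is cut off before the helper definitions. Purely as an illustration, here is one way the helpers named in the changelog could be written on top of these macros. The function bodies are an assumption, not the code from the patch, and the sketch is plain user-space C.

```c
#include <assert.h>

/* Bottom 8 bits carry the direction, the bits above carry attributes. */
#define DMA_ATTR_SHIFT	8
#define DMA_FLUSH_ATTR	0x1
#define DMA_DIR_MASK	((1 << DMA_ATTR_SHIFT) - 1)
#define DMA_ATTR_MASK	(~DMA_DIR_MASK)

/* Assumed body: tag a direction with the "dmaflush" attribute. */
static inline int dma_flags_set_dmaflush(int dir)
{
	assert((dir & DMA_ATTR_MASK) == 0);	/* direction must fit in 8 bits */
	return (DMA_FLUSH_ATTR << DMA_ATTR_SHIFT) | dir;
}

/* Assumed body: recover the plain direction from the combined flags. */
static inline int dma_flags_get_direction(int flags)
{
	return flags & DMA_DIR_MASK;
}

/* Assumed body: 1 if the dmaflush attribute is set, 0 otherwise. */
static inline int dma_flags_get_dmaflush(int flags)
{
	return (flags >> DMA_ATTR_SHIFT) & DMA_FLUSH_ATTR;
}
```

With this layout a plain direction value passes through both getters unchanged, which is what lets callers that never set the attribute keep working.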

[PATCH 0/3] allow drivers to flush in-flight DMA

2007-08-17 Thread akepner

Altix supports "posted DMA", so that DMA may complete out 
of order. In some cases it's necessary for a driver to 
ensure that in-flight DMA has been flushed to memory for 
correct operation.

In particular this can be a problem with Infiniband, where 
writes to Completion Queues can race with DMA of data.

The following patchset addresses this problem by allowing a 
memory region to be mapped with a "barrier" attribute. (On 
Altix, writes to memory regions with the barrier attribute 
have the side effect that in-flight DMA gets flushed to host 
memory.)

There are three patches in this set:

[1/3] dma: introduce no-op stub "dma_flags_set_dmaflush"
[2/3] dma: override "dma_flags_set_dmaflush" for sn-ia64
[3/3] dma: use dma_flags_set_dmaflush in ib_umem_get 
  (mthca only, for now)

-- 
Arthur



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
> > > No it does not have any volatile semantics. atomic_dec() can be reordered
> > > at will by the compiler within the current basic unit if you do not add a
> > > barrier.
> >
> > "volatile" has nothing to do with reordering.
>
> If you're talking of "volatile" the type-qualifier keyword, then
> http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> otherwise.

I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that "volatile" creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.

> > atomic_dec() writes
> > to memory, so it _does_ have "volatile semantics", implicitly, as
> > long as the compiler cannot optimise the atomic variable away
> > completely -- any store counts as a side effect.
>
> I don't think an atomic_dec() implemented as an inline "asm volatile"
> or one that uses a "forget" macro would have the same re-ordering
> guarantees as an atomic_dec() that uses a volatile access cast.

The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing, if that is
implemented by GCC in the "obvious" way.  Even a "plain" asm()
will do the same.


Segher



Re: [PATCH] [5/12] x86_64: Make patching more robust, fix paravirt issue

2007-08-17 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> This patch breaks Xen booting.  I get infinite recursive faults during
> patching when this patch is present.  If I boot with
> "noreplace-paravirt" it works OK, and it works as expected if I back
> this patch out.  I haven't tracked down the exact failure mode; it's a
> little hard to debug because it overwrites all kernel memory with
> recursive fault stackframes and then finally traps out to Xen when it
> hits the bottom of memory.
> 
> I think we should back this one out before .23.

I agree (second time this has broken during .23 devel).

thanks,
-chris


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Herbert Xu
On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
>
> gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)

I had totally forgotten that I'd already filed that bug more
than six years ago until they just closed yours as a duplicate
of mine :)

Good luck in getting it fixed!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> > > > atomic_dec() writes
> > > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > > long as the compiler cannot optimise the atomic variable away
> > > > completely -- any store counts as a side effect.
> > >
> > > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > > or one that uses a "forget" macro would have the same re-ordering
> > > guarantees as an atomic_dec() that uses a volatile access cast.
> >
> > The "asm volatile" implementation does have exactly the same
> > reordering guarantees as the "volatile cast" thing,
>
> I don't think so.

"asm volatile" creates a side effect.  Side effects aren't
allowed to be reordered wrt sequence points.  This is exactly
the same reason as why "volatile accesses" cannot be reordered.

> > if that is
> > implemented by GCC in the "obvious" way.  Even a "plain" asm()
> > will do the same.
>
> Read the relevant GCC documentation.

I did, yes.

> [ of course, if the (latest) GCC documentation is *yet again*
>   wrong, then alright, not much I can do about it, is there. ]

There was (and is) nothing wrong about the "+m" documentation, if
that is what you are talking about.  It could be extended now, to
allow "+m" -- but that takes more than just "fixing" the documentation.


Segher



Re: [PATCH] [5/12] x86_64: Make patching more robust, fix paravirt issue

2007-08-17 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Commit 19d36ccdc34f5ed444f8a6af0cbfdb6790eb1177 "x86: Fix alternatives
> and kprobes to remap write-protected kernel text" uses code which is
> being patched for patching.
>
> In particular, paravirt_ops does patching in two stages: first it
> calls paravirt_ops.patch, then it fills any remaining instructions
> with nop_out().  nop_out calls text_poke() which calls
> lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
> that call site is one of the places we patch.
>
> If we always do patching as one single call to text_poke(), we only
> need make sure we're not patching the memcpy in text_poke itself.
> This means the prototype to paravirt_ops.patch needs to change, to
> marshal the new code into a buffer rather than patching in place as it
> does now.  It also means all patching goes through text_poke(), which
> is known to be safe (apply_alternatives is also changed to make a
> single patch).
>   

Hi Andi,

This patch breaks Xen booting.  I get infinite recursive faults during
patching when this patch is present.  If I boot with
"noreplace-paravirt" it works OK, and it works as expected if I back
this patch out.  I haven't tracked down the exact failure mode; it's a
little hard to debug because it overwrites all kernel memory with
recursive fault stackframes and then finally traps out to Xen when it
hits the bottom of memory.

I think we should back this one out before .23.

J


Re: Marvell 88E8056 gigabit ethernet controller

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 05:42:13 -0700 (PDT)
Kevin E <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
>   I've read where the onboard Marvell lan controller on
> some Gigabyte boards don't work.  I've got two systems
> using the same Gigabyte board, on one the LAN works on
> the other it dies like described by others.  Here's
> the systems:
> 
> 
> Working system:
> Gigabyte 965P-DS3 rev 3.3  (BIOS F10)
> Core2 Q6600
> 2GB Corsair XMS2 memory
> kernel 2.6.22.3
> 
> lspci for LAN controller:
> 04:00.0 Ethernet controller: Marvell Technology Group
> Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev
> 14)
> 
> 
> Broken system:
> Gigabyte 965P-DS3 rev 3.3  (BIOS F10)
> Core2 E4400
> 2GB Corsair XMS2 memory
> kernel 2.6.22.3
> 
> lspci for LAN controller:
> 03:00.0 Ethernet controller: Marvell Technology Group
> Ltd. Unknown device 4364 (rev 12)
> 
> 
>   The BIOS for the two systems are setup the same and
> the config for the kernels are the same too.  I've
> actually tried taking the kernel from the working
> system and booting it on the broken one but still the
> LAN dies after a couple of seconds.  The working
> system has one card plugged in (nvidia based PCI-X
> video card), I've taken that card and plugged into the
> broken system, booted the same kernel, and it still
> dies after a while.
> 
>   I will gladly provide any info needed if it can help
> in getting this chipset working on the Gigabyte
> boards.
> 
>   Thanks,
>   Kevin

I maintain the sky2 driver, and have one of the (buggy) Gigabyte motherboards.
It is interesting that the problem seems to track with video card.
Are you using the Nvidia binary driver?
The video card in the system I have troubles with is:
ATI Technologies Inc RV370 [Radeon X300SE]

Surprisingly, using other PCI-E cards with same driver (different Marvell chips)
has no problem.  Vendor version of sk98lin driver has same failure mode
on the buggy hardware.

You might want to look at the lspci -vvv output on the two systems to see if
there are differences. Perhaps there is a CPU speed dependency?




Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 16:31:07 -0700

>  > >  > When using RDMA you lose the capability to do packet shaping,
>  > >  > classification, and all the other wonderful networking facilities
>  > >  > you've grown to love and use over the years.
>  > > 
>  > > Same thing with TSO and LRO and who knows what else.
>  > 
>  > Not true at all.  Full classification and filtering still is usable
>  > with TSO and LRO.
> 
> Well, obviously with TSO and LRO the packets that the stack sends or
> receives are not the same as what's on the wire.  Whether that breaks
> your wonderful networking facilities or not depends on the specifics
> of the particular facility I guess -- for example shaping is clearly
> broken by TSO.  (And people can wonder what the packet trains TSO
> creates do to congestion control on the internet, but the netdev crowd
> has already decided that TSO is "good" and RDMA is "bad")

This is also a series of falsehoods.  All packet filtering,
queue management, and packet scheduling facilities work perfectly
fine and as designed with both LRO and TSO.

When problems come up, they are bugs, and we fix them.

Please stop spreading this FUD about TSO and LRO.

The fact is that RDMA bypasses the whole stack so that supporting
these facilities is not even _POSSIBLE_.  With stateless offloads it
is possible to support all of these facilities, and we do.


[announce] Updated PS3 Linux Distro Kit released

2007-08-17 Thread Geoff Levand
For those interested, an updated PS3 Linux Distributor's Starter
Kit (v1.4) was released.

The release note is here:

  http://www.kernel.org/pub/linux/kernel/people/geoff/cell/CELL-Linux-CL_20070817-ADDON/README-e.txt

And the CD-ROM iso image is here (169 MiB):

  ftp://ftp.infradead.org/pub/Sony-PS3/CELL-Linux-CL_20070817-ADDON.iso
  http://ftp.uk.linux.org/pub/linux/Sony-PS3/CELL-Linux-CL_20070817-ADDON.iso

The extracted files are here for convenient browsing:

  http://www.kernel.org/pub/linux/kernel/people/geoff/cell/CELL-Linux-CL_20070817-ADDON

-Geoff




Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 08:50:30PM -0700, Linus Torvalds wrote:
> Just try it yourself:
> 
>   volatile int i;
>   int j;
> 
>   int testme(void)
>   {
>   return i <= 1;
>   }
> 
>   int testme2(void)
>   {
>   return j <= 1;
>   }
> 
> and compile with all the optimizations you can.
> 
> I get:
> 
>   testme:
>   movl    i(%rip), %eax
>   subl    $1, %eax
>   setle   %al
>   movzbl  %al, %eax
>   ret
> 
> vs
> 
>   testme2:
>   xorl    %eax, %eax
>   cmpl    $1, j(%rip)
>   setle   %al
>   ret
> 
> (now, whether that "xorl + setle" is better than "setle + movzbl", I don't 
> really know - maybe it is. But that's not the point. The point is the 
> difference between
> 
> movl    i(%rip), %eax
> subl    $1, %eax
> 
> and
> 
> cmpl    $1, j(%rip)

gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)

Thanx, Paul


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> > > #define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> > >
> > > [ This is exactly equivalent to using "+m" in the constraints, as recently
> > >   explained on a GCC list somewhere, in response to the patch in my bitops
> > >   series a few weeks back where I thought "+m" was bogus. ]
> >
> > [It wasn't explained on a GCC list in response to your patch, as
> > far as I can see -- if I missed it, please point me to an archived
> > version of it].
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

Ah yes, that old thread, thank you.

> That's when _I_ came to know how GCC interprets "+m", but probably
> this has been explained on those lists multiple times. Who cares,
> anyway?

I just couldn't find the thread you meant, I thought I had
missed it, that's all :-)


Segher
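
For readers following the thread, a minimal sketch of how the forget() macro quoted above is typically used. The macro is copied from the thread; poll_flag() and the flag variable are hypothetical names introduced only for this example, and the sketch assumes GCC-style inline asm (GCC or Clang).

```c
#include <assert.h>

/* forget() as quoted in the thread: an empty asm statement that claims
 * to read and write `a` in memory, so the compiler cannot keep using a
 * value of `a` it loaded earlier. */
#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

static int flag;	/* hypothetical shared variable */

/* Hypothetical helper: poll `flag` up to `limit` times and return how
 * many polls saw it still zero. */
static int poll_flag(int limit)
{
	int polls = 0;

	assert(limit >= 0);
	while (polls < limit) {
		forget(flag);	/* make the next read of flag go to memory */
		if (flag)
			break;
		polls++;
	}
	return polls;
}
```

Without the forget() call the compiler would be free to read flag once before the loop. Whether this behaves identically to a volatile access on every supported GCC version is exactly what the thread is arguing about, so treat the sketch as an illustration of intent only.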



[PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
The network code does memset for 6 and 8 byte values, which can easily
be optimized into simple assignments without string instructions.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- a/include/asm-i386/string.h 2007-08-17 15:14:37.0 -0700
+++ b/include/asm-i386/string.h 2007-08-17 16:49:10.0 -0700
@@ -228,6 +228,14 @@ static __always_inline void * __constant
case 4:
*(unsigned long *)s = pattern;
return s;
+   case 6:
+   *(unsigned long *)s = pattern;
+   *(2+(unsigned short *)s) = pattern;
+   return s;
+   case 8:
+   *(unsigned long *)s = pattern;
+   *(1+(unsigned long *)s) = pattern;
+   return s;
}
 #define COMMON(x) \
 __asm__  __volatile__( \
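
The idea behind the two new cases can be tried out in user space. The following is a sketch under stated assumptions, not the kernel code: small_memset() is a hypothetical name, and fixed-size memcpy() calls stand in for the patch's pointer casts so the example has no alignment or aliasing concerns (compilers typically turn such copies into plain stores).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space analogue of the 6- and 8-byte cases: fill the
 * buffer with one 4-byte store plus one 2- or 4-byte store. */
static void *small_memset(void *s, int c, size_t count)
{
	/* Replicate the fill byte into all four bytes of a 32-bit word. */
	uint32_t pattern = 0x01010101u * (unsigned char)c;
	unsigned char *p = s;

	assert(count == 6 || count == 8);
	memcpy(p, &pattern, 4);			/* bytes 0..3 */
	if (count == 6)
		memcpy(p + 4, &pattern, 2);	/* bytes 4..5 */
	else
		memcpy(p + 4, &pattern, 4);	/* bytes 4..7 */
	return s;
}
```

Because every byte of the pattern is the same, copying only its first two bytes for the 6-byte case gives the right result regardless of endianness.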


Re: kfree(0) - ok?

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Thomas Gleixner wrote:

> On Sat, 2007-08-18 at 00:42 +0300, Pekka Enberg wrote:
> > On 8/18/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > > That was merged over my objections. IMHO ksize(NULL) should fail since we
> > > are determining the size of an unallocated object.
> > 
> > Agreed, especially as we have real zero-sized objects returned from
> > kmalloc() et al now.
> 
> Do we really ? 
> 
> If yes, who invented this 1980s reminiscence, where you got valid
> pointers for malloc(0) ?

No, not valid. Dereferencing it will oops. See ZERO_SIZE_PTR.

> This is completely stupid. You do not go into a bar and order an empty
> glass, just because you might eventually become thirsty later.

What we're doing presently is at least better than what SLAB did
previously (did return a valid pointer!), all this time :-)

I do agree with you in principle, of course. But it's not for the kind of
cases that you describe -- "kmalloc(0) just because I'd eventually want
to krealloc() it to something meaningful later". This was done because
there's a lot of lazy callsites that "don't want to write code for corner
cases explicitly". Sad, very sad, I say :-)

[ The krealloc() discussion on this thread came about when I noticed that
  it's the only callsite of ksize() that would reasonably / meaningfully
  want to deal with NULL ptrs, for whom I noticed (from Andrew's initial
  mail) that ksize(NULL) returned 0. As you know from the canonical
  semantics of realloc(), it _is_ supposed to deal with NULL ptrs, hence
  the discussion. ]


Satyam
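
The ZERO_SIZE_PTR behaviour described above can be modelled in user space. This is a sketch, not the slab code: the two macros follow the kernel's definitions as best I recall them (treat the exact value 16 as an assumption), and model_kmalloc() / model_kfree() are hypothetical stand-ins built on malloc() and free().

```c
#include <assert.h>
#include <stdlib.h>

/* Sentinel for zero-sized allocations: non-NULL, but points into a page
 * that is never mapped, so dereferencing it faults. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

/* Hypothetical model of kmalloc(): size 0 yields the sentinel. */
static void *model_kmalloc(size_t size)
{
	void *p;

	if (size == 0)
		return ZERO_SIZE_PTR;
	p = malloc(size);
	/* A real allocation never collides with NULL..ZERO_SIZE_PTR. */
	assert(p == NULL || !ZERO_OR_NULL_PTR(p));
	return p;
}

/* Hypothetical model of kfree(): NULL and the sentinel are both no-ops. */
static void model_kfree(const void *p)
{
	if (ZERO_OR_NULL_PTR(p))
		return;
	free((void *)p);
}
```

The single comparison in ZERO_OR_NULL_PTR() is what lets lazy call sites pass either NULL or a zero-sized "object" to the free path without a separate check.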


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > No it does not have any volatile semantics. atomic_dec() can be
> > > > reordered
> > > > at will by the compiler within the current basic unit if you do not add
> > > > a
> > > > barrier.
> > > 
> > > "volatile" has nothing to do with reordering.
> > 
> > If you're talking of "volatile" the type-qualifier keyword, then
> > http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> > otherwise.
> 
> I'm not sure what in that mail you mean, but anyway...
> 
> Yes, of course, the fact that "volatile" creates a side effect
> prevents certain things from being reordered wrt the atomic_dec();
> but the atomic_dec() has a side effect *already* so the volatile
> doesn't change anything.

That's precisely what that sub-thread (read down to the last mail
there, and not the first mail only) shows. So yes, "volatile" does
have something to do with re-ordering (as guaranteed by the C
standard).


> > > atomic_dec() writes
> > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > long as the compiler cannot optimise the atomic variable away
> > > completely -- any store counts as a side effect.
> > 
> > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > or one that uses a "forget" macro would have the same re-ordering
> > guarantees as an atomic_dec() that uses a volatile access cast.
> 
> The "asm volatile" implementation does have exactly the same
> reordering guarantees as the "volatile cast" thing,

I don't think so.

> if that is
> implemented by GCC in the "obvious" way.  Even a "plain" asm()
> will do the same.

Read the relevant GCC documentation.

[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]


Re: kfree(0) - ok?

2007-08-17 Thread Thomas Gleixner
On Sat, 2007-08-18 at 00:42 +0300, Pekka Enberg wrote:
> On 8/18/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > That was merged over my objections. IMHO ksize(NULL) should fail since we
> > are determining the size of an unallocated object.
> 
> Agreed, especially as we have real zero-sized objects returned from
> kmalloc() et al now.

Do we really ? 

If yes, who invented this 1980s reminiscence, where you got valid
pointers for malloc(0) ?

This is completely stupid. You do not go into a bar and order an empty
glass, just because you might eventually become thirsty later.

tglx




Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > #define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> > 
> > [ This is exactly equivalent to using "+m" in the constraints, as recently
> >   explained on a GCC list somewhere, in response to the patch in my bitops
> >   series a few weeks back where I thought "+m" was bogus. ]
> 
> [It wasn't explained on a GCC list in response to your patch, as
> far as I can see -- if I missed it, please point me to an archived
> version of it].

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

is a follow-up in the thread on the [EMAIL PROTECTED] mailing list,
which began with:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01677.html

that was posted by Jan Hubicka, as he quotes in that initial posting,
after I had submitted:

http://lkml.org/lkml/2007/7/23/252

which was a (wrong) patch to "rectify" what I thought was the "bogus"
"+m" constraint, as per the quoted extract from gcc docs (that was
given in that (wrong) patch's changelog).

That's when _I_ came to know how GCC interprets "+m", but probably
this has been explained on those lists multiple times. Who cares,
anyway?


> One last time: it isn't equivalent on older (but still supported
> by Linux) versions of GCC.  Current versions of GCC allow it, but
> it has no documented behaviour at all, so use it at your own risk.

True.
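
For reference, a minimal sketch of the two spellings being compared. The
name forget_rw() and the read_twice*() helpers are made up for this
illustration; forget() is the macro quoted above. This only shows that
both forms are accepted by a current GCC. Whether they are equivalent on
older compilers is exactly the point under dispute.

```c
/* Sketch only: the two constraint spellings side by side. */
#define forget(a)     __asm__ __volatile__ ("" : "=m" (a) : "m" (a))
#define forget_rw(a)  __asm__ __volatile__ ("" : "+m" (a))  /* hypothetical name */

static int counter;

/*
 * Reads counter twice. The empty asm in between tells the compiler that
 * counter may have been modified, so the second read has to be redone
 * instead of reusing the first value.
 */
int read_twice(void)
{
	int first = counter;

	forget(counter);
	return first + counter;
}

/* Same thing with the read-write "+m" constraint. */
int read_twice_rw(void)
{
	int first = counter;

	forget_rw(counter);
	return first + counter;
}
```

Building both functions with gcc -O2 -S and comparing the output is one
way to see how a given compiler version treats the two forms.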


[PATCH] serial: keep the DTR setting for serial console.

2007-08-17 Thread Yinghai Lu
[PATCH] serial: keep the DTR setting for serial console.

With "x86, serial: convert legacy COM ports to platform devices" reverted,
the serial console is already running before the port is probed again.

uart_add_one_port() ==> uart_configure_port() ==> set_mctrl(port, 0) then
clears the DTR setting made by uart_set_options(), and I lose the output
from the serial console again.

So keep the DTR setting in uart_configure_port().

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
Cc: Russell King <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Bjorn Helgaas <[EMAIL PROTECTED]>

diff --git a/drivers/serial/serial_core.c b/drivers/serial/serial_core.c
index 030a606..70f1106 100644
--- a/drivers/serial/serial_core.c
+++ b/drivers/serial/serial_core.c
@@ -2117,10 +2117,11 @@ uart_configure_port(struct uart_driver *drv, struct 
uart_state *state,
 
/*
 * Ensure that the modem control lines are de-activated.
+* keep the DTR setting that is set in uart_set_options()
 * We probably don't need a spinlock around this, but
 */
spin_lock_irqsave(&port->lock, flags);
-   port->ops->set_mctrl(port, 0);
+   port->ops->set_mctrl(port, port->mctrl & TIOCM_DTR);
spin_unlock_irqrestore(&port->lock, flags);
 
/*
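
The effect of the changed line can be shown in userspace with a small
sketch. The helper name mctrl_keep_dtr() is made up for this example;
TIOCM_DTR and TIOCM_RTS are the usual modem-control bits from
<sys/ioctl.h> on Linux.

```c
#include <sys/ioctl.h>   /* TIOCM_DTR, TIOCM_RTS */

/*
 * Hypothetical helper: the value that would be handed to set_mctrl()
 * when every modem control line except DTR is de-activated.
 */
unsigned int mctrl_keep_dtr(unsigned int mctrl)
{
	return mctrl & TIOCM_DTR;
}
```

With DTR and RTS both set on entry, only DTR survives; with DTR clear
on entry, the result is 0, which matches the old behaviour.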


[PATCH] x86-64: memset optimization

2007-08-17 Thread Stephen Hemminger
Optimize uses of memset with small constant offsets.
This will generate smaller code, and avoid the slow rep/string instructions.
Code copied from i386 with a little cleanup.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- a/include/asm-x86_64/string.h   2007-08-17 15:14:32.0 -0700
+++ b/include/asm-x86_64/string.h   2007-08-17 15:36:30.0 -0700
@@ -42,9 +42,51 @@ extern void *__memcpy(void *to, const vo
 __ret = __builtin_memcpy((dst),(src),__len);   \
   __ret; }) 
 #endif
-
 #define __HAVE_ARCH_MEMSET
-void *memset(void *s, int c, size_t n);
+void *__memset(void *s, int c, size_t n);
+
+/* Optimize for cases of trivial memset's
+ * Compiler should optimize away all but the case used.
+ */
+static __always_inline void *
+__constant_c_and_count_memset(void *s, int c, size_t count)
+{
+   unsigned long pattern = 0x01010101UL * (unsigned char) c;
+
+   switch (count) {
+   case 0:
+   return s;
+   case 1:
+   *(unsigned char *)s = pattern;
+   return s;
+   case 2:
+   *(unsigned short *)s = pattern;
+   return s;
+   case 3:
+   *(unsigned short *)s = pattern;
+   *(2+(unsigned char *)s) = pattern;
+   return s;
+   case 4:
+   *(unsigned long *)s = pattern;
+   return s;
+   case 6:
+   *(unsigned long *)s = pattern;
+   *(2+(unsigned short *)s) = pattern;
+   return s;
+   case 8:
+   *(unsigned long *)s = pattern;
+   *(1+(unsigned long *)s) = pattern;
+   return s;
+   default:
+   return __memset(s, c, count);
+   }
+}
+#define memset(s, c, count)\
+   (__builtin_constant_p(c)\
+? __constant_c_and_count_memset((s),(c),(count))   \
+: __memset((s),(c),(count)))
+
+
 
 #define __HAVE_ARCH_MEMMOVE
 void * memmove(void * dest,const void *src,size_t count);
--- a/arch/x86_64/kernel/x8664_ksyms.c  2007-08-17 15:14:32.0 -0700
+++ b/arch/x86_64/kernel/x8664_ksyms.c  2007-08-17 15:44:58.0 -0700
@@ -48,10 +48,12 @@ EXPORT_SYMBOL(__read_lock_failed);
 #undef memmove
 
 extern void * memset(void *,int,__kernel_size_t);
+extern void * __memset(void *,int,__kernel_size_t);
 extern void * memcpy(void *,const void *,__kernel_size_t);
 extern void * __memcpy(void *,const void *,__kernel_size_t);
 
 EXPORT_SYMBOL(memset);
+EXPORT_SYMBOL(__memset);
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(__memcpy);
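
One detail worth keeping in mind when carrying the i386 code over:
on x86-64 an unsigned long is 8 bytes wide, so a store through an
unsigned long pointer writes 8 bytes, not 4. Below is a userspace
sketch of the same small-count idea written with fixed-width types.
The name small_memset() is made up for this illustration, and the
fixed-size memcpy() calls stand in for single stores (compilers
typically turn them into one); it is not the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace helper, not the kernel's memset. */
static inline void *small_memset(void *s, int c, size_t count)
{
	/* Replicate the fill byte into every byte of a 64-bit word. */
	const uint64_t p64 = 0x0101010101010101ULL * (unsigned char)c;
	const uint32_t p32 = (uint32_t)p64;
	const uint16_t p16 = (uint16_t)p64;
	const uint8_t  p8  = (uint8_t)p64;
	unsigned char *p = s;

	switch (count) {
	case 0:
		return s;
	case 1:
		memcpy(p, &p8, 1);
		return s;
	case 2:
		memcpy(p, &p16, 2);
		return s;
	case 3:
		memcpy(p, &p16, 2);
		memcpy(p + 2, &p8, 1);
		return s;
	case 4:
		memcpy(p, &p32, 4);
		return s;
	case 6:
		memcpy(p, &p32, 4);
		memcpy(p + 4, &p16, 2);
		return s;
	case 8:
		memcpy(p, &p64, 8);
		return s;
	default:
		/* Anything else goes to the general routine. */
		return memset(s, c, count);
	}
}
```

Each case writes exactly count bytes, which can be checked by placing
guard bytes on both sides of the destination.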
 


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread Roland Dreier
 > >  > When using RDMA you lose the capability to do packet shaping,
 > >  > classification, and all the other wonderful networking facilities
 > >  > you've grown to love and use over the years.
 > > 
 > > Same thing with TSO and LRO and who knows what else.
 > 
 > Not true at all.  Full classification and filtering still is usable
 > with TSO and LRO.

Well, obviously with TSO and LRO the packets that the stack sends or
receives are not the same as what's on the wire.  Whether that breaks
your wonderful networking facilities or not depends on the specifics
of the particular facility I guess -- for example shaping is clearly
broken by TSO.  (And people can wonder what the packet trains TSO
creates do to congestion control on the internet, but the netdev crowd
has already decided that TSO is "good" and RDMA is "bad")

 - R.


Re: kfree(0) - ok?

2007-08-17 Thread Christoph Lameter
On Sat, 18 Aug 2007, Pekka Enberg wrote:

> Agreed, especially as we have real zero-sized objects returned from
> kmalloc() et al now.



Slab allocators: Fail if ksize is called with a NULL parameter

A NULL pointer means that the object was not allocated. One cannot
determine the size of an object that has not been allocated. Currently
we return 0 but we really should BUG() on attempts to determine the size
of something nonexistent.

krealloc() interprets NULL to mean a zero sized object. Handle that
separately in krealloc().

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c2007-08-17 16:17:41.0 -0700
+++ linux-2.6/mm/slab.c 2007-08-17 16:18:15.0 -0700
@@ -4436,7 +4436,8 @@ const struct seq_operations slabstats_op
  */
 size_t ksize(const void *objp)
 {
-   if (unlikely(ZERO_OR_NULL_PTR(objp)))
+   BUG_ON(!objp);
+   if (unlikely(objp == ZERO_SIZE_PTR))
return 0;
 
return obj_size(virt_to_cache(objp));
Index: linux-2.6/mm/slob.c
===
--- linux-2.6.orig/mm/slob.c2007-08-17 16:18:19.0 -0700
+++ linux-2.6/mm/slob.c 2007-08-17 16:18:40.0 -0700
@@ -484,7 +484,8 @@ size_t ksize(const void *block)
 {
struct slob_page *sp;
 
-   if (ZERO_OR_NULL_PTR(block))
+   BUG_ON(!block);
+   if (block == ZERO_SIZE_PTR)
return 0;
 
sp = (struct slob_page *)virt_to_page(block);
Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-08-17 16:16:36.0 -0700
+++ linux-2.6/mm/slub.c 2007-08-17 16:17:36.0 -0700
@@ -2426,7 +2426,8 @@ size_t ksize(const void *object)
struct page *page;
struct kmem_cache *s;
 
-   if (ZERO_OR_NULL_PTR(object))
+   BUG_ON(!object);
+   if (object == ZERO_SIZE_PTR)
return 0;
 
page = get_object_page(object);
Index: linux-2.6/mm/util.c
===
--- linux-2.6.orig/mm/util.c2007-08-17 16:16:29.0 -0700
+++ linux-2.6/mm/util.c 2007-08-17 16:16:32.0 -0700
@@ -81,14 +81,16 @@ EXPORT_SYMBOL(kmemdup);
 void *krealloc(const void *p, size_t new_size, gfp_t flags)
 {
void *ret;
-   size_t ks;
+   size_t ks = 0;
 
if (unlikely(!new_size)) {
kfree(p);
return ZERO_SIZE_PTR;
}
 
-   ks = ksize(p);
+   if (p)
+   ks = ksize(p);
+
if (ks >= new_size)
return (void *)p;
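
The intended semantics can be modelled in userspace. This is only a
sketch: the model_*() functions and struct model_hdr are made up for
this example, assert() stands in for BUG_ON(), and the size is kept in
a small header in front of the object instead of being looked up in a
slab cache. ZERO_SIZE_PTR and ZERO_OR_NULL_PTR() are written the way
the kernel headers define them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

struct model_hdr { size_t size; };	/* hypothetical bookkeeping */

void *model_kmalloc(size_t n)
{
	struct model_hdr *h;

	if (!n)
		return ZERO_SIZE_PTR;	/* valid, but zero-sized */
	h = malloc(sizeof(*h) + n);
	if (!h)
		return NULL;
	h->size = n;
	return h + 1;
}

void model_kfree(const void *p)
{
	if (ZERO_OR_NULL_PTR(p))
		return;			/* nothing was allocated */
	free((struct model_hdr *)p - 1);
}

size_t model_ksize(const void *p)
{
	assert(p != NULL);		/* models BUG_ON(!objp) */
	if (p == ZERO_SIZE_PTR)
		return 0;
	return ((const struct model_hdr *)p - 1)->size;
}

void *model_krealloc(const void *p, size_t new_size)
{
	size_t ks = 0;
	void *ret;

	if (!new_size) {
		model_kfree(p);
		return ZERO_SIZE_PTR;
	}
	if (p)				/* NULL is treated as size 0 here */
		ks = model_ksize(p);
	if (ks >= new_size)
		return (void *)p;

	ret = model_kmalloc(new_size);
	if (ret) {
		if (ks)
			memcpy(ret, p, ks);
		model_kfree(p);
	}
	return ret;
}
```

In this model a zero-sized allocation has size 0, a NULL pointer has no
size at all, and only the krealloc() path accepts NULL.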
 
 


Re: [PATCH][kprobes] support kretprobe-blacklist

2007-08-17 Thread Masami Hiramatsu
Hi Andrew,

Thank you for your comments.

Andrew Morton wrote:
>> Index: 2.6-mm/include/asm-i386/kprobes.h
>> ===
>> --- 2.6-mm.orig/include/asm-i386/kprobes.h
>> +++ 2.6-mm/include/asm-i386/kprobes.h
>> @@ -44,6 +44,7 @@ typedef u8 kprobe_opcode_t;
>>
>>  #define ARCH_SUPPORTS_KRETPROBES
>>  #define flush_insn_slot(p)  do { } while (0)
>> +#define ARCH_SUPPORTS_KRETPROBE_BLACKLIST
> 
> Can we avoid adding this please?

Yes, sure. I'll update my patch and eliminate those ifdefs.

> It should at least have been a CONFIG_foo thing, defined in arch/*/Kconfig.
> 
> But that still requires nasty ifdefs in the C code.  It would be very small
> overhead just to require that all architectures implement
> arch_kretprobe_blacklist[] (which can be renamed to kretprobe_blacklist[]).
>  Architectures which don't need a blacklist can just have { { 0, 0 } }.
> 
> If the few bytes of overhead on non-x86 really offends then one could do
> something like this:
> 
> in powerpc header file:
> 
> #define kretprobe_blacklist_size 0
> 
> in x86 header file:
> 
> extern const int kretprobe_blacklist_size;
> 
> in x86 C file:
> 
> const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);

It looks very nice, thank you for the advice.
I think we could instead define "kretprobe_blacklist" itself as 0 in (for
example) the ppc header, rather than introducing "kretprobe_blacklist_size",
and check in common code whether "kretprobe_blacklist" is 0. Is that OK?

> and then this code:
> 
>> --- 2.6-mm.orig/kernel/kprobes.c
>> +++ 2.6-mm/kernel/kprobes.c
>> @@ -716,6 +716,18 @@ int __kprobes register_kretprobe(struct
>>  int ret = 0;
>>  struct kretprobe_instance *inst;
>>  int i;
>> +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST
>> +void *addr = rp->kp.addr;
>> +
>> +if (addr == NULL)
>> +kprobe_lookup_name(rp->kp.symbol_name, addr);
>> +addr += rp->kp.offset;
>> +
>> +for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) {
>> +if (arch_kretprobe_blacklist[i].addr == addr)
>> +return -EINVAL;
>> +}
>> +#endif
> 
> can be put inside
> 
>   if (kretprobe_blacklist_size) {
>   ...
>   }
> 
> so the compiler will remove it all for (say) powerpc.
>
> There are lots of ways of doing it but code like this:
> 
>> +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST
>> +/* lookup the function address from its name */
>> +for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) {
>> +kprobe_lookup_name(arch_kretprobe_blacklist[i].name,
>> +   arch_kretprobe_blacklist[i].addr);
>> +if (!arch_kretprobe_blacklist[i].addr)
>> +printk("kretprobe: Unknown blacklisted function: %s\n",
>> +   arch_kretprobe_blacklist[i].name);
>> +}
>> +#endif
> 
> really isn't the sort of thing we like to see spreading through core kernel
> code.
> 
> Have a think about it please, see what we can come up with?

OK, I see. I'll do that next time.

Best regards,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]
Tel: +1-978-392-2419
Tel: +1-508-982-2642 (cell phone)
Fax: +1-978-392-1001
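
A userspace sketch of the size-based variant suggested above. All
names here (struct blacklist_entry, dummy_target,
check_probe_address()) are made up for the illustration; only the
kretprobe_blacklist_size idea comes from the thread.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct blacklist_entry {		/* hypothetical layout */
	const char *name;
	const void *addr;
};

/* Stands in for a function that must not be return-probed. */
static int dummy_target;

/* What an architecture with a blacklist would provide. */
static const struct blacklist_entry kretprobe_blacklist[] = {
	{ "dummy_target", &dummy_target },
};
static const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);

/*
 * An architecture without a blacklist would instead have
 *	#define kretprobe_blacklist_size 0
 * in its header, and the block below would be compiled away.
 */
int check_probe_address(const void *addr)
{
	if (kretprobe_blacklist_size) {
		for (int i = 0; i < kretprobe_blacklist_size; i++) {
			if (kretprobe_blacklist[i].addr == addr)
				return -EINVAL;
		}
	}
	return 0;
}
```

The common code then needs no #ifdef: the check is an ordinary if on a
value that is a compile-time zero where no blacklist exists.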


Re: [PATCH] Fix section mismatch in the Adaptec DPT SCSI Raid driver

2007-08-17 Thread Joe Korty
On Fri, Aug 17, 2007 at 02:18:56PM -0700, Andrew Morton wrote:
> Please always provide at least a copy of the error message when providing
> patches which fix warnings, or build errors, or section mismatches.
> 
> For section mismatches, an analysis of what caused the problem would help,
> too.  It saves others from having to do the same thing.
> 
> In this case, I'd need to see what error is being fixed so that I can judge
> the seriousness of the problem.  In this case I don't _think_ it'll be
> terribly serious because iirc most architectures don't free exitcall memory.




Fix section mismatch in the Adaptec DPT SCSI Raid driver.

WARNING: vmlinux.o(.init.text+0x1fcd2): Section mismatch:
reference to .exit.text:adpt_exit (between 'adpt_init' and 
'ahc_linux_init')

This warning is due to adaptec device detection calling the exit routine
on failure to properly register the adaptec device.

The exit routine + call was added on July 30 by
  Commit: 55d9fcf57ba5ec427544fca7abc335cf3da78160
  Author: Matthew Wilcox
  Subject: [SCSI] dpt_i2o: convert to SCSI hotplug model.

Matthew: isn't a module exit routine a little too strong to be calling
on the failure of a single device?  Module exit implies that other,
non-failing adaptec raid devices will also get shut down.

Signed-off-by: Joe Korty <[EMAIL PROTECTED]>

Index: 2.6.23-rc3-git1/drivers/scsi/dpt_i2o.c
===
--- 2.6.23-rc3-git1.orig/drivers/scsi/dpt_i2o.c 2007-08-17 16:36:05.0 
-0400
+++ 2.6.23-rc3-git1/drivers/scsi/dpt_i2o.c  2007-08-17 16:50:13.0 
-0400
@@ -3351,7 +3351,7 @@
return count > 0 ? 0 : -ENODEV;
 }
 
-static void __exit adpt_exit(void)
+static void adpt_exit(void)
 {
while (hba_chain)
adpt_release(hba_chain);


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> #define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
>
> [ This is exactly equivalent to using "+m" in the constraints, as recently
>   explained on a GCC list somewhere, in response to the patch in my bitops
>   series a few weeks back where I thought "+m" was bogus. ]


[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.


Segher



Re: [PATCH 01/12] Blackfin arch: add peripheral resource allocation support

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Robin Getz wrote:
> On Fri 17 Aug 2007 17:10, David Brownell pondered:
> > On the other hand, maybe you want your "typical" customer to
> > be more of a systems integrator than anything else.
> 
> We are getting yelled at by our customers (I was on the phone yesterday), 
> because the kernel build environment we distribute (the default) was not 
> inside Eclipse, and someone couldn't push a button on a GUI rather than 
> typing "make".

Press the "enter" button after typing "make".  ;)


> While Linux and other open source software is free, for some taking advantage 
> of its benefits can require a significant investment in time. Linux is 
> powerful, rich, and flexible. It is these very characteristics that make 
> Linux so appealing that also create a significant level of complexity.

Yes.

 
> We are seeing more and more "first time buyers" jumping from bare metal - no 
> OS, to Linux. For them - the ease of not having to write configuration files 
> gives them a warm fuzzy feeling when they boot their board, and it 
> just "works".

There's certainly a lot to be said for having things
just work the first time when you're making such a big
transition in tools.  On the other hand, at some point
the training wheels need to come off!



> > > >That said, how you handle pinmux on Blackfin is your business.
> > > >
> > > >But you should know that this approach seems idiosyncratic and
> > > >more complex than needed:  when pin config is done early and as
> > > >part of board setup, drivers don't need to care about it or to
> > > >handle any pinmux errors.  And heck, products can sometimes be
> > > >shipped with the bootloader having done all pinmux setup, so
> > > >Linux won't need to worry about it at all.  That can help ship
> > > >multiple board revisions using the same kernel.
> > > 
> > > This works for fixed function boards.
> > 
> > That is, for typical products embedding Linux...
> 
> We have multiple customers shipping the same bootloader/kernel binary on 
> different products, and the only difference is the /etc/rc file - which 
> drivers they install, and a few things in userspace.

Be careful there.  Remember that the driver model is predicated
on knowing the devices first, and *then* matching drivers.  I
expect you *will* see problems if you get people thinking system
config comes from a "which driver" selection rather than "here's
the exact hardware that's available".  Maybe configfs should be
used for device config.


> Could this be smaller - sure - but NAND is cheap (according to them) compared 
> to the effort and cost of maintaining and testing multiple kernel versions 
> for every product.
>  
> Could this be faster - sure - but it is done at init - and then never again. 
> We have a ~2-3 second boot time - maybe we shave off a few ms - things go 
> pretty fast at 600MHz.

I'm more used to clock rates less than 1/3 that much.  :)

- Dave




Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool
> > > Here, I should obviously admit that the semantics of *(volatile int *)&
> > > aren't any neater or well-defined in the _language standard_ at all. The
> > > standard does say (verbatim) "precisely what constitutes as access to
> > > object of volatile-qualified type is implementation-defined", but GCC
> > > does help us out here by doing the right thing.
> >
> > Where do you get that idea?
>
> Try a testcase (experimentally verify).

That doesn't prove anything.  Experiments can only disprove
things.

> > GCC manual, section 6.1, "When
> > is a Volatile Object Accessed?" doesn't say anything of the
> > kind.
>
> True, "implementation-defined" as per the C standard _is_ supposed to mean
> "unspecified behaviour where each implementation documents how the choice
> is made". So ok, probably GCC isn't "documenting" this
> implementation-defined behaviour which it is supposed to, but can't really
> fault them much for this, probably.

GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.


Segher
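
For readers following along, the construct under discussion looks
roughly like this. It is a sketch with a simplified atomic_t and
made-up function names; it shows only the syntax and takes no position
on what the compiler is required to do with it, which is what the
exchange above is about.

```c
typedef struct { int counter; } atomic_t;	/* simplified stand-in */

/* Read through a volatile-qualified lvalue: the "volatile cast" form. */
static inline int atomic_read_cast(const atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Plain read, for comparison. */
static inline int atomic_read_plain(const atomic_t *v)
{
	return v->counter;
}
```

Both return the same value in a single-threaded test; any difference
lies in which optimisations the compiler considers itself allowed to
apply around the access.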



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> Now the second wording *IS* technically correct, but come on, it's
> 24 words long whereas the original one was 3 -- and hopefully anybody
> reading the shorter phrase *would* have known anyway what was meant,
> without having to be pedantic about it :-)


Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher



Re: [PATCH 02/12] Blackfin arch: Add label to call new GPIO API

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Robin Getz wrote:
> On Fri 17 Aug 2007 14:24, David Brownell pondered:
> > Just for the record, this is an unusual way to use these calls.
> 
> That is part of the natural evolution of the kernel isn't it - per James's 
> keynote at OLS - you release something, and see how people [ab]use it until 
> it either grows, evolves, or it dies.

Yep ... and it's worth knowing when you're doing
something different.  Different isn't always worse,
isn't always better.


> > Other platforms completely decouple these issues from the
> > IRQ infrastructure ... doing the pinmux and gpio claiming
> > separately from the request_irq()/free_irq() paths, mostly
> > as part of board setup.  Doing all of that "early":
> 
> is early:
>  - early in the kernel?
>  - early before the kernel? (in the bootloader).

Both of those are "earlier", yes.  Different product developers
may argue for either placement.

 
> >  - keeps those error returns from causing hard-to-track-down
> >runtime bugs;
> 
> The current Blackfin implementation causes a run time message:
> "the pin  driver requested, was already claimed by yyy driver".
> 
> I don't think that is too bad?

Given some product with a Blackfin chip, would you expect a
customer -- who may not even see the Linux bits!! -- to be
able to solve such problems?  If it's not possible for such
problems to crop up in the field, product support (and field
troubleshooting) gets easier...


> >  - works always, even on platforms where a given IRQ may
> >appear on any of several pins/balls;
> 
> But requires custom bootloaders or board setup for every hardware platform?

One or both, yes.  That's typical in embedded setups.
They're not necessarily all that different, but that
code does need to handle the hardware differences.


> Most of our users would not like that, since they do as you say - use the 
> same kernel - with different drivers on multiple platforms.

I thought I referred to different revisions of one platform... :)

 
> >  - makes it easier to cross-check against board schematics,
> >by keeping most board-specific setup in one source file;
> 
> Yes - but we are not talking about muxing a common peripheral (like a single 
> UART) out many different pins (A or B or C). The UART pins are fixed. If you 
> want the UART, you need to use pin A. If you want to use the I2C that also 
> sits on pin A, you will get the message:
> "pin A, requested by I2C, was already claimed by UART driver".

Not all platforms work that way though.  There can often be several
options for where a given signal gets routed.

And then there are the errors where someone accidentally copies
something like "GPIO 29" to two places ... invariants like "only
one GPIO requestor at a time" are needed to turn up such stuff.


> >  - shrinks the kernel's runtime footprint;
> 
> I agree - making things more flexible/easier to use - is normally more 
> complex/larger/slower. (I know - easier to use is a matter of opinion). Since 
> this is normally done once, in _init functions, I'm not sure that makes much 
> of a difference here.
>  
> >  - allows the label to be more descriptive ... describing
> >exactly *which* IRQ, so that using the labels for better
> >diagnostics actually gives better diagnostics.
> 
> I'm not sure what you mean?

The $SUBJECT patch uses the string "IRQ" in all cases.
But "smc_irq" and "codec_irq" would be more informative
as entries in a list of even just a handful of GPIOs.
And with a few dozen, I'd find "IRQ" not at all helpful.
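
A tiny model of such a claim table, to make the point about labels
concrete. Everything here (NUM_PINS, pin_request(), pin_free(),
pin_owner_label()) is made up for this sketch and is not the Blackfin
or gpiolib API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NUM_PINS 48			/* arbitrary for the example */

static const char *pin_owner[NUM_PINS];	/* NULL means unclaimed */

/* Claim a pin; only one requestor at a time is allowed. */
int pin_request(unsigned int pin, const char *label)
{
	if (pin >= NUM_PINS || !label)
		return -EINVAL;
	if (pin_owner[pin])
		return -EBUSY;		/* caller can report pin_owner[pin] */
	pin_owner[pin] = label;
	return 0;
}

void pin_free(unsigned int pin)
{
	if (pin < NUM_PINS)
		pin_owner[pin] = NULL;
}

const char *pin_owner_label(unsigned int pin)
{
	return pin < NUM_PINS ? pin_owner[pin] : NULL;
}
```

If both claims on GPIO 29 were labelled "IRQ", the conflict report
would say that "IRQ" collided with "IRQ"; with "smc_irq" and
"codec_irq" it names the two parties.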


> > Again, not "wrong"; but probably sub-optimal.  You might
> > want to move towards earlier binding now, while Linux is
> > still young on Blackfin and you don't have legacy code to
> > worry about.
> 
> Our overall goal is to keep as much code - including bootloader - platform 
> agnostic, and not require people to write any of code/configuration data to 
> boot up something, and get things working in a semi-standard manner.

The issue is just where those limits lie.  IMO it's not at
all unreasonable to require board-specific code.  External
chips will need board-specific glue data in most cases (how
they're addressed, what IRQs they use, and so on); and you
may have drivers available that correspond to devices that
are not wired up on that particular hardware.


> This still has its limits - which is why we publish all our hardware 
> designs. 
> If you implement things the similar way (because for the most part it is 
> fixed by the processor designer) - the bootloader/kernel/driver will just 
> work.

Sure ... you'd need to say "this board uses "
and if integrated in the SOC that's often enough.  External
devices need more configuration.  Even for integrated ones,
that knowledge doesn't belong in the driver ... "which of the
many UARTS to use as console" isn't standard, and neither
is "what hardware handshaking pins are in use".

 
> I would rather force a little extra complexity on me (as a kernel developer) 
> 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> In a reasonable world, gcc should just make that be (on x86)
>
> 	addl $1,i(%rip)
>
> on x86-64, which is indeed what it does without the volatile. But with the
> volatile, the compiler gets really nervous, and doesn't dare do it in one
> instruction, and thus generates crap like
>
> 	movl i(%rip), %eax
> 	addl $1, %eax
> 	movl %eax, i(%rip)
>
> instead. For no good reason, except that "volatile" just doesn't have any
> good/clear semantics for the compiler, so most compilers will just make it
> be "I will not touch this access in any way, shape, or form". Including
> even trivially correct instruction optimization/combination.


It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?


> but is
> (again) something that gcc doesn't dare do, since "i" is volatile.


Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like "people
who hit the go-slow-in-unspecified-ways button should get what
they deserve" ;-)


Segher
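
A minimal test case for anyone who wants to look at this on their own
compiler. The variable and function names are made up for the sketch.
Compiling it with gcc -O2 -S and comparing the code for the two bump
functions shows what a given GCC version and target actually emit; no
particular output is claimed here.

```c
static volatile int vi;	/* increment goes through volatile accesses */
static int pi;		/* plain counterpart */

void bump_volatile(void)
{
	vi++;
}

void bump_plain(void)
{
	pi++;
}

int read_volatile(void)
{
	return vi;
}

int read_plain(void)
{
	return pi;
}
```

The functions are kept non-static so that the compiler has to emit
code for each of them.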



Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

2007-08-17 Thread Oleg Nesterov
Neil Horman wrote:
> 
> +static int proc_pid_limits(struct task_struct *task, char *buffer)
> +{
> + unsigned int i;
> + int count = 0;
> + char *bufptr = buffer;
> +
> + struct rlimit rlim[RLIM_NLIMITS];
> +
> + read_lock(&tasklist_lock);
> + memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
> + read_unlock(&tasklist_lock);

Please don't re-introduce tasklist_lock unless strictly needed. And in this case
it doesn't help, sys_setrlimit() changes ->rlim[] under task_lock().

However, I think the whole patch is not right. The "tsk" itself is pinned, but
its ->signal is not stable and can be == NULL.

You can use lock_task_sighand() to access ->signal.

Oleg.



Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> (and yes, it is perfectly legitimate to
> want a non-volatile read for a data type that you also want to do
> atomic RMW operations on)


...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher



Re: + cifs-check-for-granted-memory.patch added to -mm tree

2007-08-17 Thread Jesper Juhl
On 17/08/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> The patch titled
>  CIFS: check for granted memory
> has been added to the -mm tree.  Its filename is
>  cifs-check-for-granted-memory.patch
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
>
> --
> Subject: CIFS: check for granted memory
> From: Cyrill Gorcunov <[EMAIL PROTECTED]>
>
> Add a check for granted memory to prevent possible NULL pointer usage.
>
> Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
> Cc: Steven French <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  fs/cifs/sess.c |4 
>  1 files changed, 4 insertions(+)
>
> diff -puN fs/cifs/sess.c~cifs-check-for-granted-memory fs/cifs/sess.c
> --- a/fs/cifs/sess.c~cifs-check-for-granted-memory
> +++ a/fs/cifs/sess.c
> @@ -372,6 +372,10 @@ CIFS_SessSetup(unsigned int xid, struct
>
> /* 2000 big enough to fit max user, domain, NOS name etc. */
> str_area = kmalloc(2000, GFP_KERNEL);
> +   if (str_area == NULL) {
> +   cifs_small_buf_release(smb_buf);
> +   return -ENOMEM;
> +   }

The patch, as such, is fine - not arguing against it, but as a matter
of style; don't we usually prefer the "if (!foo)" form over "if (foo
== NULL)" ??

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH 01/12] Blackfin arch: add peripheral resource allocation support

2007-08-17 Thread Robin Getz
On Fri 17 Aug 2007 17:10, David Brownell pondered:
> On the other hand, maybe you want your "typical" customer to
> be more of a systems integrator than anything else.

We are getting yelled at by our customers (I was on the phone yesterday), 
because the kernel build environment we distribute (the default) was not 
inside Eclipse, and someone couldn't push a button on a GUI rather than 
typing "make".

While Linux and other open source software is free, for some taking advantage 
of its benefits can require a significant investment in time. Linux is 
powerful, rich, and flexible. It is these very characteristics that make 
Linux so appealing that also create a significant level of complexity.

We are seeing more and more "first time buyers" jumping from bare metal - no 
OS, to Linux. For them - the ease of not having to write configuration files 
gives them a warm fuzzy feeling when they boot their board, and it 
just "works".

> > While potentially causing conflicting usage, for someone without
> > detailed hardware knowledge. The platform device board file is a good
> > thing to track conflicting memory or IO space resources as well as
> > IRQs. We also utilize platform device files for exactly these purposes.
> > 
> > The dynamic resource allocation for pinmux and gpio seems to us the
> > best way to handle things. The "resource allocation" mechanism will
> > spill an error and dump in case conflicting usage is detected. It'll
> > also tell you who is causing the conflicting usage.   
> 
> That's your call, of course.  I was pointing out why the "early"
> binding of pin resources is the more usual strategy with Linux.
> A "late" strategy is a bit surprising, and has its own issues.

Agreed - Every implementation has its own issues, but based on what we were 
being asked to do - it was the best way we could accommodate everyone.

> > >That said, how you handle pinmux on Blackfin is your business.
> > >
> > >But you should know that this approach seems idiosyncratic and
> > >more complex than needed:  when pin config is done early and as
> > >part of board setup, drivers don't need to care about it or to
> > >handle any pinmux errors.  And heck, products can sometimes be
> > >shipped with the bootloader having done all pinmux setup, so
> > >Linux won't need to worry about it at all.  That can help ship
> > >multiple board revisions using the same kernel.
> > 
> > This works for fixed function boards.
> 
> That is, for typical products embedding Linux...

We have multiple customers shipping the same bootloader/kernel binary on 
different products, and the only difference is the /etc/rc file - which 
drivers they install, and a few things in userspace.

Could this be smaller - sure - but NAND is cheap (according to them) compared 
to the effort and cost of maintaining and testing multiple kernel versions 
for every product.
 
Could this be faster - sure - but it is done at init - and then never again. 
We have a ~2-3 second boot time - maybe we shave off a few ms - things go 
pretty fast at 600MHz.

-Robin


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Segher Boessenkool

> Of course, since *normal* accesses aren't necessarily limited wrt
> re-ordering, the question then becomes one of "with regard to *what* does
> it limit re-ordering?".
>
> A C compiler that re-orders two different volatile accesses that have a
> sequence point in between them is pretty clearly a buggy compiler. So at a
> minimum, it limits re-ordering wrt other volatiles (assuming sequence
> points exists). It also means that the compiler cannot move it
> speculatively across conditionals, but other than that it's starting to
> get fuzzy.


This is actually really well-defined in C, not fuzzy at all.
"Volatile accesses" are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.

We certainly should avoid volatile whenever possible, but "because
it's fuzzy wrt reordering" is not a reason -- all alternatives have
exactly the same issues.
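
As a minimal userspace sketch of the rule above (the names `fake_reg`, `scratch` and `poke_twice` are made up for illustration and are not kernel symbols): the two stores to the volatile object are separated by sequence points, so the compiler has to emit both, in program order. The store to the plain variable has no observable ordering relative to them, so under the as-if rule it may be moved or combined.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device register; not a real kernel symbol. */
static volatile uint32_t fake_reg;
static uint32_t scratch;

/*
 * Every full statement ends in a sequence point. The volatile accesses
 * are side effects that must be performed in this order and may not be
 * merged or dropped. The plain store to scratch is not observable in
 * between, so the compiler is free to schedule it elsewhere.
 */
uint32_t poke_twice(uint32_t a, uint32_t b)
{
    fake_reg = a;       /* volatile access #1 */
    scratch = a + b;    /* ordinary store */
    fake_reg = b;       /* volatile access #2, stays after #1 */
    return fake_reg;    /* volatile read, must really be performed */
}
```

Comparing the generated assembly at -O2 with and without the `volatile` qualifier is one way to see the difference.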


Segher



Re: [PATCH][kprobes] support kretprobe-blacklist

2007-08-17 Thread Andrew Morton
On Fri, 17 Aug 2007 15:43:04 -0400
Masami Hiramatsu <[EMAIL PROTECTED]> wrote:

> This patch introduces architecture dependent kretprobe
> blacklists to prohibit users from inserting return
> probes on the function in which kprobes can be inserted
> but kretprobes can not.
> 
> Signed-off-by: Masami Hiramatsu <[EMAIL PROTECTED]>
> 
> ---
>  
> When a kretprobe is inserted at the entry of "__switch_to()",
> it causes a kernel panic on i386 with recent kernels.
> 
>  
> In include/asm-i386/current.h, "current" is defined as a per-CPU
> variable:
> 
>  DECLARE_PER_CPU(struct task_struct *, current_task);
>  static __always_inline struct task_struct *get_current(void)
>  {
>  return x86_read_percpu(current_task);
>  }
> 
>  #define current get_current()
> 
> This means the "current" macro is decoupled from the stack register.
> Kretprobe expects either that the value of "current" does not change while
> the target function runs, or that both "current" and the stack register
> change in the target function.
> But __switch_to() in arch/i386/kernel/process.c changes only the value of
> "current" and doesn't change the stack register (the stack was already
> switched to the new one before calling __switch_to()):
> 
>  struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct
>  task_struct *next_p)
>  {
>   ...
> x86_write_percpu(current_task, next_p);
> 
> return prev_p;
>  }
> 
> Kretprobe modifies the return address stored in the stack, and
> it saves the original return address in the kretprobe_instance with
> the value of "current".
> When kretprobe is hit at the return of __switch_to(), it searches
> kretprobe_instance related to the probe point from a hash list by
> using the hash-value of the "current" instead of stack address.
> However, since the value of "current" is already changed,
> it can't find proper instance, and invokes BUG() macro.
> 
>  
> As a result of discussion with other kprobe developers, I'd
> like to introduce an arch-dependent blacklist.
> Introducing a "__kretprobes" mark (like "__kprobes") is another way,
> but I think it is less efficient, because a kretprobe can
> be inserted only at the entry of a function, so there is no need to
> check against the whole function.
> 
> This patch also removes the "__kprobes" mark from "__switch_to" on x86_64
> and registers "__switch_to" in the blacklist on x86_64, because that mark
> is there only to prohibit users from inserting a kretprobe.
> 
> Index: 2.6-mm/include/asm-i386/kprobes.h
> ===
> --- 2.6-mm.orig/include/asm-i386/kprobes.h
> +++ 2.6-mm/include/asm-i386/kprobes.h
> @@ -44,6 +44,7 @@ typedef u8 kprobe_opcode_t;
> 
>  #define ARCH_SUPPORTS_KRETPROBES
>  #define flush_insn_slot(p)   do { } while (0)
> +#define ARCH_SUPPORTS_KRETPROBE_BLACKLIST

Can we avoid adding this please?

It should at least have been a CONFIG_foo thing, defined in arch/*/Kconfig.

But that still requires nasty ifdefs in the C code.  It would be very small
overhead just to require that all architectures implement
arch_kretprobe_blacklist[] (which can be renamed to kretprobe_blacklist[]).
 Architectures which don't need a blacklist can just have { { 0, 0 } }.

If the few bytes of overhead on non-x86 really offends then one could do
something like this:

in powerpc header file:

#define kretprobe_blacklist_size 0

in x86 header file:

extern const int kretprobe_blacklist_size;

in x86 C file:

const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);

and then this code:

> --- 2.6-mm.orig/kernel/kprobes.c
> +++ 2.6-mm/kernel/kprobes.c
> @@ -716,6 +716,18 @@ int __kprobes register_kretprobe(struct
>   int ret = 0;
>   struct kretprobe_instance *inst;
>   int i;
> +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST
> + void *addr = rp->kp.addr;
> +
> + if (addr == NULL)
> + kprobe_lookup_name(rp->kp.symbol_name, addr);
> + addr += rp->kp.offset;
> +
> + for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) {
> + if (arch_kretprobe_blacklist[i].addr == addr)
> + return -EINVAL;
> + }
> +#endif

can be put inside

if (kretprobe_blacklist_size) {
...
}

so the compiler will remove it all for (say) powerpc.
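
A minimal userspace sketch of that idea, under stated assumptions: `SKETCH_ARCH_HAS_BLACKLIST`, `struct blacklist_entry`, `switch_to_stub` and `addr_is_blacklisted()` are hypothetical names for illustration, not anything in the patch. When the size is the literal constant 0, the guarded loop is dead code the compiler can discard, and no #ifdef is needed at the call site.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical entry type, modelled on the name/addr pair in the patch. */
struct blacklist_entry {
    const char *name;
    void *addr;
};

#ifdef SKETCH_ARCH_HAS_BLACKLIST    /* made-up switch standing in for "x86" */
static int switch_to_stub;          /* stand-in for a function that must not be probed */
static struct blacklist_entry kretprobe_blacklist[] = {
    { "__switch_to", &switch_to_stub },
};
static const int kretprobe_blacklist_size = ARRAY_SIZE(kretprobe_blacklist);
#else
/* No blacklist on this "architecture": a constant 0 makes the loop dead code. */
#define kretprobe_blacklist_size 0
static struct blacklist_entry kretprobe_blacklist[1];   /* only keeps the code compiling */
#endif

/* Returns nonzero if addr matches a blacklisted entry. */
int addr_is_blacklisted(const void *addr)
{
    int i;

    if (kretprobe_blacklist_size) {
        for (i = 0; i < kretprobe_blacklist_size; i++)
            if (kretprobe_blacklist[i].addr == addr)
                return 1;
    }
    return 0;
}
```

The one-element placeholder array in the #else branch is a choice made for this sketch; it exists only so the dead code still type-checks.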

There are lots of ways of doing it but code like this:

> 
> +#ifdef ARCH_SUPPORTS_KRETPROBE_BLACKLIST
> + /* lookup the function address from its name */
> + for (i = 0; arch_kretprobe_blacklist[i].name != NULL; i++) {
> + kprobe_lookup_name(arch_kretprobe_blacklist[i].name,
> +arch_kretprobe_blacklist[i].addr);
> + if (!arch_kretprobe_blacklist[i].addr)
> + printk("kretprobe: Unknown blacklisted function: %s\n",
> +arch_kretprobe_blacklist[i].name);
> + }
> +#endif

really isn't the sort of thing we like to see spreading through core kernel
code.


RE: [PATCH 02/12] Blackfin arch: Add label to call new GPIO API

2007-08-17 Thread Hennerich, Michael


>-Original Message-
>From: David Brownell [mailto:[EMAIL PROTECTED]
>
>On Friday 17 August 2007, Hennerich, Michael wrote:
>> What Mike wants to point out is that an external IRQ is first a GPIO and
>> needs to be configured like an INPUT GPIO, and then a specific bit needs
>> to be set to unmask it as an IRQ.
>>
>> So why not use the GPIO infrastructure to setup this pin as GPIO?
>
>My comments about the advantages of using that infrastructure
>for *early* binding captured the key points ... it's "failfast".
>
>For IRQs you're probably on decently firm ground, since it's
>extremely rare that people not handle request_irq() errors.
>
>Remember, I just pointed out that the "late fail" strategy
>is unusual.  That doesn't mean it's wrong ... just it'll be
>a bit of surprise, some cognitive dissonance to developers
>picking up a Blackfin project, potentially more error prone.
>

Dave,

Thanks - we really appreciate your feedback.
Please believe me - we have been having similar internal discussions
for a long time about how we should handle these things.

Things need to be DAU-proof (foolproof).

We would rather have some verbose runtime messages than a system that
silently doesn't do what is expected.
(The bootloader doesn't know which kernel modules will be loaded and
which specific HW setup they require.)

We also don't fear the memory overhead (compared to the support
overhead), and the runtime overhead is almost negligible since these
functions are only called once, or at most twice (module remove).

I see your points - I would prefer having a fixed-function board suiting
all our customers' needs - or something like an x86 system where
everything is fixed or dedicated and abstracted by IO/Memory and IRQ.

-Michael 

  


>- Dave


Re: [PATCH -mm] x86_64: Save registers in saved_context during suspend and hibernation

2007-08-17 Thread Rafael J. Wysocki
On Friday, 17 August 2007 23:08, Andrew Morton wrote:
> On Fri, 17 Aug 2007 15:26:22 +0200
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > During hibernation and suspend on x86_64 save CPU registers in the
> > saved_context structure rather than in a handful of separate variables.
> 
> You have "-mm" in the subject but afaict this patch is not dependent upon
> anything in -mm?

It shouldn't be.  The -mm means that I regard it as -mm material.


Re: [PATCH 02/12] Blackfin arch: Add label to call new GPIO API

2007-08-17 Thread Robin Getz
On Fri 17 Aug 2007 14:24, David Brownell pondered:
> Just for the record, this is an unusual way to use these calls.

That is part of the natural evolution of the kernel, isn't it? Per James's 
keynote at OLS: you release something, and see how people [ab]use it until 
it either grows and evolves, or dies.

> Other platforms completely decouple these issues from the
> IRQ infrastructure ... doing the pinmux and gpio claiming
> separately from the request_irq()/free_irq() paths, mostly
> as part of board setup.  Doing all of that "early":

Does "early" mean:
 - early in the kernel?
 - early before the kernel (in the bootloader)?

>  - keeps those error returns from causing hard-to-track-down
>runtime bugs;

The current Blackfin implementation causes a run time message:
"the pin  driver requested, was already claimed by yyy driver".

I don't think that is too bad?

>  - works always, even on platforms where a given IRQ may
>appear on any of several pins/balls;

But requires custom bootloaders or board setup for every hardware platform? 
Most of our users would not like that, since they do as you say - use the 
same kernel - with different drivers on multiple platforms.

>  - makes it easier to cross-check against board schematics,
>by keeping most board-specific setup in one source file;

Yes - but we are not talking about muxing a common peripheral (like a single 
UART) out many different pins (A or B or C). The UART pins are fixed. If you 
want the UART, you need to use pin A. If you want to use the I2C that also 
sits on pin A, you will get the message:
"pin A, requested by I2C, was already claimed by UART driver".
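
A rough userspace sketch of that kind of late claim tracking, to show where the message comes from. This is not the Blackfin code: `sketch_pin_request()`, `sketch_pin_free()` and the table size are invented for illustration.

```c
#include <stdio.h>
#include <errno.h>

#define SKETCH_NUM_PINS 16    /* arbitrary size for the sketch */

/* owner[pin] is NULL while the pin is free, else the label of the claimant. */
static const char *owner[SKETCH_NUM_PINS];

/* Claim a pin for the driver named by label; name the current owner on conflict. */
int sketch_pin_request(unsigned int pin, const char *label)
{
    if (pin >= SKETCH_NUM_PINS)
        return -EINVAL;
    if (owner[pin]) {
        fprintf(stderr, "pin %u, requested by %s, was already claimed by %s\n",
                pin, label, owner[pin]);
        return -EBUSY;
    }
    owner[pin] = label;
    return 0;
}

/* Release a pin so another driver can claim it, e.g. on module removal. */
void sketch_pin_free(unsigned int pin)
{
    if (pin < SKETCH_NUM_PINS)
        owner[pin] = NULL;
}
```

The cost is one pointer per pin plus a check at request time, which is paid once per driver load.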

>  - shrinks the kernel's runtime footprint;

I agree - making things more flexible/easier to use - is normally more 
complex/larger/slower. (I know - easier to use is a matter of opinion). Since 
this is normally done once, in _init functions, I'm not sure that makes much 
of a difference here.
 
>  - allows the label to be more descriptive ... describing
>exactly *which* IRQ, so that using the labels for better
>diagnostics actually gives better diagnostics.

I'm not sure what you mean?

> Again, not "wrong"; but probably sub-optimal.  You might
> want to move towards earlier binding now, while Linux is
> still young on Blackfin and you don't have legacy code to
> worry about.

Our overall goal is to keep as much code as possible - including the 
bootloader - platform agnostic, and not require people to write any code or 
configuration data to boot something up and get things working in a 
semi-standard manner.

This still has its limits - which is why we publish all our hardware designs. 
If you implement things in a similar way (because for the most part it is 
fixed by the processor designer) - the bootloader/kernel/driver will just 
work.

I would rather force a little extra complexity on me (as a kernel developer) 
than have to answer thousands of questions from end users, who are trying to 
move the kernel onto their hardware.

-Robin


Re: kfree(0) - ok?

2007-08-17 Thread Pekka Enberg
On 8/18/07, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> That was merged over my objections. IMHO ksize(NULL) should fail since we
> are determining the size of an unallocated object.

Agreed, especially as we have real zero-sized objects returned from
kmalloc() et al now.


Re: kfree(0) - ok?

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Christoph Lameter wrote:

> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> 
> > Hmm, I didn't know ksize(NULL) was also allowed to succeed (and
> > return 0).
> 
> That was merged over my objections. IMHO ksize(NULL) should fail since we 
> are determining the size of an unallocated object.

Agreed, I'd have implemented ksize() that oops'ed on NULL, myself.
For that matter, I'd wish that kfree() oops'ed on NULL too (and have
duly participated in such a flamewar once), but not many (if any) on
this list seem to sympathize with such an opinion :-)

> > Oh yes, of course. We want krealloc(NULL) cases to behave consistently 
> > as expected, and letting ksize(NULL) return 0 means the code for 
> > krealloc() can lose an extra "if (!p)" check that would otherwise have 
> > been required. Cool.
> 
> krealloc should check for that.

Agreed again, explicitly checking for that only sounds fair to me.
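
A minimal userspace sketch of what the explicit check looks like, assuming a made-up size header in front of each object (`sketch_alloc()`, `sketch_size()` and `sketch_realloc()` are hypothetical helpers, not the slab API). The size query stays strict about NULL and the realloc path handles NULL itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header so the sketch can answer "how big is this object?". */
struct sketch_hdr {
    size_t size;
};

void *sketch_alloc(size_t size)
{
    struct sketch_hdr *h = malloc(sizeof(*h) + size);

    if (!h)
        return NULL;
    h->size = size;
    return h + 1;
}

void sketch_free(void *p)
{
    if (p)
        free((struct sketch_hdr *)p - 1);
}

/* Strict variant: the caller must pass a real object, never NULL. */
size_t sketch_size(const void *p)
{
    return ((const struct sketch_hdr *)p - 1)->size;
}

/* The NULL case is handled here, explicitly, not inside sketch_size(). */
void *sketch_realloc(void *p, size_t new_size)
{
    void *n;
    size_t old_size;

    if (!p)
        return sketch_alloc(new_size);

    old_size = sketch_size(p);
    if (new_size <= old_size)
        return p;

    n = sketch_alloc(new_size);
    if (n) {
        memcpy(n, p, old_size);
        sketch_free(p);
    }
    return n;
}
```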


Re: how to add debug information into the vmlinux

2007-08-17 Thread Jesper Juhl
On 17/08/07, Xu Yang <[EMAIL PROTECTED]> wrote:
> Hello everyone,
>
> I am trying to port kernel 2.6.19 onto my system, so I need the C code
> that can show me where the program is running. I add -g when I
> compile it.
>
You shouldn't need to do that manually, simply go into "make
menuconfig", enter the "Kernel hacking" menu and select the "Kernel
debugging" and "Compile the kernel with debug info" options.
You may also want to enable "Compile the kernel with frame pointers"
and various other options in that menu to get more debug info.


-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH 02/12] Blackfin arch: Add label to call new GPIO API

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Hennerich, Michael wrote:
> What Mike wants to point out is that an external IRQ is first a GPIO and
> needs to be configured like an INPUT GPIO, and then a specific bit needs
> to be set to unmask it as an IRQ.
> 
> So why not use the GPIO infrastructure to setup this pin as GPIO?

My comments about the advantages of using that infrastructure
for *early* binding captured the key points ... it's "failfast".

For IRQs you're probably on decently firm ground, since it's
extremely rare that people not handle request_irq() errors.

Remember, I just pointed out that the "late fail" strategy
is unusual.  That doesn't mean it's wrong ... just it'll be
a bit of surprise, some cognitive dissonance to developers
picking up a Blackfin project, potentially more error prone.

- Dave



Re: [PATCH 02/12] Blackfin arch: Add label to call new GPIO API

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Mike Frysinger wrote:

> as Michael pointed out, in the Blackfin world we tend to keep things
> very dynamic as we have dev systems which allow for dropping in of
> optional cards at will, so doing this in the bootloader is way too
> inflexible.

That's the tradeoff:  optimize for development boards, or
instead for more fixed-function product boards.


> oh, and another [smallish] data point.  the Blackfin processor has a
> small bootrom on it that could be likened to a very micro bios.  so
> it's possible to actually boot the linux kernel straight without a
> boot loader.  send the kernel over the UART to a Blackfin and watch it
> go go go :)

That's not uncommon, although I'm more used to seeing the
on-chip ROMs relying on a second stage loader for stuff like
getting memory and other clocks going at optimal speeds,
and loading "big" images (that won't fit on-chip SRAM).

- Dave



Re: [PATCH 01/12] Blackfin arch: add peripheral resource allocation support

2007-08-17 Thread David Brownell
On Friday 17 August 2007, Hennerich, Michael wrote:
> Hi Dave,
> 

> Right - our patch descriptions need to be worked on.

Yes, please ... that makes reviewing easier!
 

> For an experienced systems engineer who is at the same time the guy who
> does both the hardware and the software, this is not an issue.

I guess I've rarely come across job descriptions like that.
Lowlevel software folk need to be able to use schematics and
often test equipment, but not design product circuits ... and
circuit designers rarely have responsibility to ship software.

On the other hand, maybe you want your "typical" customer to
be more of a systems integrator than anything else.


> We provide all kinds of drivers utilizing almost any peripheral on
> Blackfin.

Chip vendors supporting Linux drivers for all their hardware.
What a pleasant change!  :)


> While potentially causing conflicting usage, for someone without
> detailed hardware knowledge. The platform device board file is a good
> thing to track conflicting memory or IO space resources as well as IRQs.
> We also utilize platform device files for exactly these purposes.
> 
> The dynamic resource allocation for pinmux and gpio seems to us the best
> way to handle things. The "resource allocation" mechanism will spill an
> error and dump in case conflicting usage is detected. It'll also tell
> you who is causing the conflicting usage.   

That's your call, of course.  I was pointing out why the "early"
binding of pin resources is the more usual strategy with Linux.
A "late" strategy is a bit surprising, and has its own issues.


> >That said, how you handle pinmux on Blackfin is your business.
> >
> >But you should know that this approach seems idiosyncratic and
> >more complex than needed:  when pin config is done early and as
> >part of board setup, drivers don't need to care about it or to
> >handle any pinmux errors.  And heck, products can sometimes be
> >shipped with the bootloader having done all pinmux setup, so
> >Linux won't need to worry about it at all.  That can help ship
> >multiple board revisions using the same kernel.
> 
> This works for fixed function boards.

That is, for typical products embedding Linux...


> But not for development boards
> where we provide Lego-like add-on cards, and allow people to connect
> their homebrew hardware.

Development boards are usually run differently than product
boards.  All that flexibility in the development boards is
not necessarily a feature in the product version; it costs
space and time, which the application may need.  Being able
to shift costs *early* and then drop them at runtime is a
useful strategy to apply in most places.

And heck -- most development setups get used only with one
card stack at a time, and it's easy to install new kernels
for new stacks.

- Dave


Re: [PATCH 5/6] Filter based on a nodemask as well as a gfp_mask

2007-08-17 Thread Christoph Lameter
On Fri, 17 Aug 2007, Mel Gorman wrote:

> @@ -696,6 +696,16 @@ static inline struct zonelist *node_zone
>   return &NODE_DATA(nid)->node_zonelist;
>  }
>  
> +static inline int zone_in_nodemask(unsigned long zone_addr,
> + nodemask_t *nodes)
> +{
> +#ifdef CONFIG_NUMA
> + return node_isset(zonelist_zone(zone_addr)->node, *nodes);
> +#else
> + return 1;
> +#endif /* CONFIG_NUMA */
> +}
> +

This is dereferencing the zone in a filtering operation. I wonder if
we could encode the node in the zone_addr as well? x86_64 aligns zones on
page boundaries. So we have 10 bits left after taking 2 for the zone id.
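
A small sketch of that encoding, assuming 4 KiB alignment so the low 12 bits of the zone address are free: 2 bits for the zone id and the remaining 10 for the node id. All `sketch_*` names are hypothetical and plain integers stand in for the pointer.

```c
#include <stdint.h>

#define SKETCH_ALIGN_BITS 12u    /* 4 KiB alignment assumed */
#define SKETCH_ZONE_BITS   2u
#define SKETCH_NODE_BITS  (SKETCH_ALIGN_BITS - SKETCH_ZONE_BITS)    /* 10 */

#define SKETCH_LOW_MASK   (((uintptr_t)1 << SKETCH_ALIGN_BITS) - 1)
#define SKETCH_ZONE_MASK  (((uintptr_t)1 << SKETCH_ZONE_BITS) - 1)
#define SKETCH_NODE_MASK  (((uintptr_t)1 << SKETCH_NODE_BITS) - 1)

/* Pack the zone index and node id into the free low bits of an aligned address. */
uintptr_t sketch_encode(uintptr_t zone_ptr, unsigned int zone_idx, unsigned int nid)
{
    return (zone_ptr & ~SKETCH_LOW_MASK) |
           (((uintptr_t)nid & SKETCH_NODE_MASK) << SKETCH_ZONE_BITS) |
           ((uintptr_t)zone_idx & SKETCH_ZONE_MASK);
}

unsigned int sketch_zone_idx(uintptr_t addr)
{
    return (unsigned int)(addr & SKETCH_ZONE_MASK);
}

/* The node id is read from the encoded word, without touching the zone. */
unsigned int sketch_nid(uintptr_t addr)
{
    return (unsigned int)((addr >> SKETCH_ZONE_BITS) & SKETCH_NODE_MASK);
}

uintptr_t sketch_zone_ptr(uintptr_t addr)
{
    return addr & ~SKETCH_LOW_MASK;
}
```

With this layout a nodemask filter could test `sketch_nid()` directly, at the price of capping the node count at 1024.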

> -int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
> +int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
>  {
> - int i;
> -
> - for (i = 0; zl->_zones[i]; i++) {
> - int nid = zone_to_nid(zonelist_zone(zl->_zones[i]));
> + int nid;
>  
> + for_each_node_mask(nid, *nodemask)
>   if (node_isset(nid, current->mems_allowed))
>   return 1;
> - }
> +
>   return 0;

Hmmm... This is equivalent to

nodemask_t temp;

nodes_and(temp, *nodemask, current->mems_allowed);
return !nodes_empty(temp);

which avoids the loop over all nodes.
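
The equivalence can be checked with a toy model where a nodemask is a single 64-bit word (`sketch_nodemask_t` and both helpers are assumptions for illustration; the real nodemask_t is a bitmap that can span several words):

```c
#include <stdint.h>

typedef uint64_t sketch_nodemask_t;    /* toy mask: at most 64 nodes */

/* Loop form: test each node that is set in mask against allowed. */
int sketch_intersects_loop(sketch_nodemask_t mask, sketch_nodemask_t allowed)
{
    unsigned int nid;

    for (nid = 0; nid < 64; nid++)
        if (((mask >> nid) & 1) && ((allowed >> nid) & 1))
            return 1;
    return 0;
}

/* AND form: one word-wide operation followed by an emptiness test. */
int sketch_intersects_and(sketch_nodemask_t mask, sketch_nodemask_t allowed)
{
    sketch_nodemask_t temp = mask & allowed;

    return temp != 0;
}
```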

> - }
> - if (num == 0) {
> - kfree(zl);
> - return ERR_PTR(-EINVAL);
> + for_each_node_mask(nd, *nodemask) {
> + struct zone *z = &NODE_DATA(nd)->node_zones[k];
> + if (z->present_pages > 0)
> + return 1;

Here you could use an and with the N_HIGH_MEMORY or N_NORMAL_MEMORY 
nodemask.

> @@ -1149,12 +1125,19 @@ unsigned slab_node(struct mempolicy *pol
>   case MPOL_INTERLEAVE:
>   return interleave_nodes(policy);
>  
> - case MPOL_BIND:
> + case MPOL_BIND: {

No { } needed.

>   /*
>* Follow bind policy behavior and start allocation at the
>* first node.
>*/
> - return zone_to_nid(zonelist_zone(policy->v.zonelist->_zones[0]));
> + struct zonelist *zonelist;
> + unsigned long *z;
> + enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
> + zonelist = &NODE_DATA(numa_node_id())->node_zonelist;
> + z = first_zones_zonelist(zonelist, &policy->v.nodes,
> + highest_zoneidx);
> + return zone_to_nid(zonelist_zone(*z));
> + }
>  
>   case MPOL_PREFERRED:
>   if (policy->v.preferred_node >= 0)

> @@ -1330,14 +1314,6 @@ struct mempolicy *__mpol_copy(struct mem
>   }
>   *new = *old;
>   atomic_set(&new->refcnt, 1);
> - if (new->policy == MPOL_BIND) {
> - int sz = ksize(old->v.zonelist);
> - new->v.zonelist = kmemdup(old->v.zonelist, sz, GFP_KERNEL);
> - if (!new->v.zonelist) {
> - kmem_cache_free(policy_cache, new);
> - return ERR_PTR(-ENOMEM);
> - }
> - }
>   return new;

That is a good optimization.

> @@ -1680,32 +1647,6 @@ void mpol_rebind_policy(struct mempolicy
>   *mpolmask, *newmask);
>   *mpolmask = *newmask;
>   break;
> - case MPOL_BIND: {
> - nodemask_t nodes;
> - unsigned long *z;
> - struct zonelist *zonelist;
> -
> - nodes_clear(nodes);
> - for (z = pol->v.zonelist->_zones; *z; z++)
> - node_set(zone_to_nid(zonelist_zone(*z)), nodes);
> - nodes_remap(tmp, nodes, *mpolmask, *newmask);
> - nodes = tmp;
> -
> - zonelist = bind_zonelist(&nodes);
> -
> - /* If no mem, then zonelist is NULL and we keep old zonelist.
> -  * If that old zonelist has no remaining mems_allowed nodes,
> -  * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
> -  */
> -
> - if (!IS_ERR(zonelist)) {
> - /* Good - got mem - substitute new zonelist */
> - kfree(pol->v.zonelist);
> - pol->v.zonelist = zonelist;
> - }
> - *mpolmask = *newmask;
> - break;
> - }

Simply dropped? We still need to recalculate the node_mask depending on 
the new cpuset environment!


Re: Early printk behaviour

2007-08-17 Thread Robin Getz
On Fri 17 Aug 2007 17:09, Mike Frysinger pondered:
> On 8/17/07, Robin Getz <[EMAIL PROTECTED]> wrote:
> >
> > Something like:
> >
> > Index: kernel/printk.c
> > ===
> > --- kernel/printk.c (revision 3568)
> > +++ kernel/printk.c (working copy)
> > @@ -1104,6 +1104,22 @@
> >  }
> >  EXPORT_SYMBOL(unregister_console);
> >
> > +int __init disable_boot_consoles(void)
> > +{
> > +   struct console *con;
> > +
> > +   for (con = console_drivers; con; con = con->next) {
> > +   if (con->flags & CON_BOOT) {
> > +   printk(KERN_INFO "Unregister BootConsole %s%d\n",
> > +   con->name, con->index);
> > +   unregister_console(con);
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +late_initcall(disable_boot_consoles);
> 
> is there any need for a return value then ?
> void __init disable_boot_consoles(void);

So, we don't get compiler warnings?

Otherwise:
kernel/printk.c:1119: warning: initialization from incompatible pointer type

> and if we dont think anyone else wants to call it ...
> static void __init disable_boot_consoles(void);

So I think static is Ok, but it needs to be int - that is the proper prototype
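
To illustrate why the return type matters, here is a userspace sketch of an initcall table. The function-pointer type mirrors the kernel's `int (*)(void)` initcall signature; the `sketch_*` names and the table are made up. Storing a `void (*)(void)` function in such a table is what produces the "incompatible pointer type" warning.

```c
#include <stddef.h>

/* Same shape as the kernel's initcall signature: no arguments, int result. */
typedef int (*sketch_initcall_t)(void);

static int sketch_calls;    /* counts how often the initcall ran */

/* static is fine; the int return type is what the table requires. */
static int sketch_disable_boot_consoles(void)
{
    sketch_calls++;
    return 0;
}

static sketch_initcall_t sketch_initcalls[] = {
    sketch_disable_boot_consoles,
};

/* Run every registered initcall, stopping at the first nonzero result. */
int sketch_run_initcalls(void)
{
    size_t i;
    int ret;

    for (i = 0; i < sizeof(sketch_initcalls) / sizeof(sketch_initcalls[0]); i++) {
        ret = sketch_initcalls[i]();
        if (ret)
            return ret;
    }
    return 0;
}
```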

-Robin


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-17 Thread David Miller
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 12:52:39 -0700

>  > When using RDMA you lose the capability to do packet shaping,
>  > classification, and all the other wonderful networking facilities
>  > you've grown to love and use over the years.
> 
> Same thing with TSO and LRO and who knows what else.

Not true at all.  Full classification and filtering still is usable
with TSO and LRO.


[PATCH] IRDA: Avoid a label defined but not used warning in irda_init()

2007-08-17 Thread Jesper Juhl
Hi,

Easily avoidable compiler warnings bug me.


Building irmod without CONFIG_SYSCTL currently results in : 
 net/irda/irmod.c:132: warning: label 'out_err_2' defined but not used

But that can easily be avoided by simply moving the label inside 
the existing "#ifdef CONFIG_SYSCTL" one line above it.

This patch moves the label and buys us one less warning with no 
ill effects.


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 net/irda/irmod.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/irda/irmod.c b/net/irda/irmod.c
index 1900937..8ba703d 100644
--- a/net/irda/irmod.c
+++ b/net/irda/irmod.c
@@ -128,8 +128,8 @@ static int __init irda_init(void)
  out_err_3:
 #ifdef CONFIG_SYSCTL
irda_sysctl_unregister();
-#endif
  out_err_2:
+#endif
 #ifdef CONFIG_PROC_FS
irda_proc_unregister();
 #endif





Re: kfree(0) - ok?

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Satyam Sharma wrote:

> On Fri, 17 Aug 2007, Christoph Lameter wrote:
> 
> > On Fri, 17 Aug 2007, Andrew Morton wrote:
> > 
> > > are we seeing a pattern here?  We could stick the unlikely inside
> > > ZERO_OR_NULL_PTR() itself.  That's a little bit sleazy though - there 
> > > might
> > > be future callsites at which it is likely, who knows?
> > 
> > Thought about that myself but then there would be a weird side effect to 
> > ZERO_OR_NULL_PTR().
> 
> True, but I suspect such a side-effect to actually matter only for the
> BUG_ON case, where introducing the unlikely() would mean the output from
> the show_registers() dump during the BUG() would show a not-useful-at-all
> %%eax == 0x001 value, but only if CONFIG_PROFILE_LIKELY=y, admittedly.

Hang on, BUG_ON() already uses unlikely anyway. And I've just verified
from a testcase that gcc doesn't get confused by unlikely(unlikely(...))
kind of code, so we're in the clear, I think.
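
A testcase along those lines might look like the sketch below. It assumes the plain `__builtin_expect(!!(x), 0)` form of the hint (i.e. without likely/unlikely profiling) and a GCC-compatible compiler; `sketch_unlikely` and `sketch_is_null_nested()` are hypothetical names. Because `!!` normalises the value to 0 or 1, wrapping the hint twice yields the same result as wrapping it once.

```c
#include <stddef.h>

/* Branch hint in the style of the kernel macro; needs GCC or Clang. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* The nested hint evaluates to exactly the same 0/1 value as a single one. */
int sketch_is_null_nested(const void *p)
{
    if (sketch_unlikely(sketch_unlikely(p == NULL)))
        return 1;
    return 0;
}
```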


Re: [PATCH] Fix section mismatch in the Adaptec DPT SCSI Raid driver

2007-08-17 Thread Andrew Morton
On Fri, 17 Aug 2007 16:51:15 -0400
Joe Korty <[EMAIL PROTECTED]> wrote:

> Fix section mismatch in the Adaptec DPT SCSI Raid driver.
> 
> Signed-off-by: Joe Korty <[EMAIL PROTECTED]>
> 
> Index: 2.6.23-rc3-git1/drivers/scsi/dpt_i2o.c
> ===
> --- 2.6.23-rc3-git1.orig/drivers/scsi/dpt_i2o.c   2007-08-17 16:36:05.0 -0400
> +++ 2.6.23-rc3-git1/drivers/scsi/dpt_i2o.c   2007-08-17 16:50:13.0 -0400
> @@ -3351,7 +3351,7 @@
>   return count > 0 ? 0 : -ENODEV;
>  }
>  
> -static void __exit adpt_exit(void)
> +static void adpt_exit(void)
>  {
>   while (hba_chain)
>   adpt_release(hba_chain);

Please always provide at least a copy of the error message when providing
patches which fix warnings, or build errors, or section mismatches.

For section mismatches, an analysis of what caused the problem would help,
too.  It saves others from having to do the same thing.

In this case, I'd need to see what error is being fixed so that I can judge
the seriousness of the problem.  I don't _think_ it'll be terribly serious
because iirc most architectures don't free exitcall memory.



Re: kfree(0) - ok?

2007-08-17 Thread Christoph Lameter
On Sat, 18 Aug 2007, Satyam Sharma wrote:

> > page = get_object_page(object);
> 
> Hmm, I didn't know ksize(NULL) was also allowed to succeed (and
> return 0).

That was merged over my objections. IMHO ksize(NULL) should fail since we 
are determining the size of an unallocated object.

> Oh yes, of course. We want krealloc(NULL) cases to behave consistently 
> as expected, and letting ksize(NULL) return 0 means the code for 
> krealloc() can lose an extra "if (!p)" check that would otherwise have 
> been required. Cool.

krealloc should check for that.
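
A rough userspace model of the variant being debated, where the size query
accepts NULL and reports 0 so that the resize path needs no separate "if (!p)"
test. Every model_* name and the size header are hypothetical; this is not the
slab allocator's code, and unlike the kernel's krealloc() it does not treat a
new size of 0 specially:

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored in front of each allocation. */
union model_hdr {
	size_t size;		/* usable size of the payload */
	max_align_t align;	/* keeps the payload suitably aligned */
};

static void *model_alloc(size_t size)
{
	union model_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->size = size;
	return h + 1;		/* payload starts right after the header */
}

static void model_free(void *p)
{
	if (p)
		free((union model_hdr *)p - 1);
}

/* The debated behaviour: NULL is accepted and reported as size 0. */
static size_t model_ksize(const void *p)
{
	if (!p)
		return 0;
	return ((const union model_hdr *)p - 1)->size;
}

/* Grows an allocation; relies on model_ksize(NULL) == 0 for p == NULL. */
static void *model_krealloc(void *p, size_t new_size)
{
	size_t old_size = model_ksize(p);
	void *n;

	if (new_size <= old_size)
		return p;	/* already large enough */

	n = model_alloc(new_size);
	if (!n)
		return NULL;	/* the old block stays valid on failure */
	if (old_size)
		memcpy(n, p, old_size);
	model_free(p);
	return n;
}
```

Under the alternative preferred in the reply above, model_ksize() would reject
NULL and model_krealloc() would start with an explicit NULL check instead.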


Re: Early printk behaviour

2007-08-17 Thread Mike Frysinger
On 8/17/07, Robin Getz <[EMAIL PROTECTED]> wrote:
> On Fri 17 Aug 2007 03:49, Gerd Hoffmann pondered:
> > Mike Frysinger wrote:
> > >> Hmm, sort of, although I didn't think about the case of no real console
> > >> replacing the early console.  The intention of the patch is to have a
> > >> smooth handover from the boot console to the real console.  And, yes, if
> > >> no real console is ever registered the boot console keeps running ...
> > >
> > > i think it also occurs in the case where real console != early console
> >
> > No.  At least not if the boot console has the CON_BOOT flag set as it
> > should.  Last message you'll see on the boot console is the handover
> > printk, telling you which real console device prints the following
> > messages.  Whether early and real console go to the same physical device
> > or not doesn't matter.
> >
> > >> So you can either let it running and *not* mark it __init, so it can
> > >> keep on going without breaking.  Or you can explicitly unregister your
> > >> boot console at some point, maybe using a late_initcall.
> > >
> > > wouldnt a common kernel late_initcall() be more appropriate ?  if
> > > early console hasnt switched over (for whatever reason), then kill it
> >
> > Hmm, yes, should be doable in generic code.  Check whether the current
> > console has CON_BOOT set and if so unregister it.
>
> Something like:
>
> Index: kernel/printk.c
> ===
> --- kernel/printk.c (revision 3568)
> +++ kernel/printk.c (working copy)
> @@ -1104,6 +1104,22 @@
>  }
>  EXPORT_SYMBOL(unregister_console);
>
> +int __init disable_boot_consoles(void)
> +{
> +   struct console *con;
> +
> +   for (con = console_drivers; con; con = con->next) {
> +   if (con->flags & CON_BOOT) {
> +   printk(KERN_INFO "Unregister BootConsole %s%d\n",
> +   con->name, con->index);
> +   unregister_console(con);
> +   }
> +   }
> +   return 0;
> +}
> +late_initcall(disable_boot_consoles);

is there any need for a return value then ?
void __init disable_boot_consoles(void);

and if we dont think anyone else wants to call it ...
static void __init disable_boot_consoles(void);
-mike
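
A userspace sketch of the list walk in the quoted patch, for anyone who wants
to experiment outside the kernel. All model_* names, the struct layout and the
flag value are assumptions for illustration. The walk saves the next pointer
before unlinking an entry, and the function returns an int; as far as I recall,
initcalls are declared as int (*)(void), which would be one reason to keep the
return value:

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <stddef.h>

#define MODEL_CON_BOOT 0x08	/* assumed flag value, illustration only */

/* Cut-down stand-in for struct console. */
struct model_console {
	const char *name;
	int index;
	unsigned int flags;
	struct model_console *next;
};

static struct model_console *model_console_drivers;

/* Push a console onto the head of the singly linked list. */
static void model_register_console(struct model_console *con)
{
	con->next = model_console_drivers;
	model_console_drivers = con;
}

/* Unlink a console; returns 0 if it was found, 1 otherwise. */
static int model_unregister_console(struct model_console *con)
{
	struct model_console **pp;

	for (pp = &model_console_drivers; *pp; pp = &(*pp)->next) {
		if (*pp == con) {
			*pp = con->next;
			con->next = NULL;
			return 0;
		}
	}
	return 1;
}

/* Drop every console still flagged as a boot console. */
static int model_disable_boot_consoles(void)
{
	struct model_console *con, *next;
	int removed = 0;

	for (con = model_console_drivers; con; con = next) {
		next = con->next;	/* save before the entry is unlinked */
		if (con->flags & MODEL_CON_BOOT) {
			model_unregister_console(con);
			removed++;
		}
	}
	return removed;
}
```

Here the return value is the number of consoles removed, which is handy for
testing; a real initcall would return 0 on success instead.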


Re: [PATCH -mm] x86_64: Save registers in saved_context during suspend and hibernation

2007-08-17 Thread Andrew Morton
On Fri, 17 Aug 2007 15:26:22 +0200
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> During hibernation and suspend on x86_64 save CPU registers in the
> saved_context structure rather than in a handful of separate variables.

You have "-mm" in the subject but afaict this patch is not dependent upon
anything in -mm?



[PATCH] Cyclades: Avoid label defined but not used warning

2007-08-17 Thread Jesper Juhl
Hi,

I don't like compiler warnings, especially not when they are easy to 
avoid.

Building the cyclades driver without CONFIG_PCI currently results in this:

   CC  drivers/char/cyclades.o
 drivers/char/cyclades.c: In function 'cy_init':
 drivers/char/cyclades.c:5488: warning: label 'err_unr' defined but not used

That's easily avoided, with no ill effects, by simply moving what's 
under the 'err_unr' label (used from only a single location) inside 
the existing #ifdef - so that's what this patch does.

Compile tested on i386 only. I don't have the hardware to do any 
other testing.

Please consider applying.


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 drivers/char/cyclades.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/cyclades.c b/drivers/char/cyclades.c
index 9e0adfe..858bdab 100644
--- a/drivers/char/cyclades.c
+++ b/drivers/char/cyclades.c
@@ -5480,13 +5480,13 @@ static int __init cy_init(void)
 #ifdef CONFIG_PCI
/* look for pci boards */
   retval = pci_register_driver(&cy_pci_driver);
-   if (retval && !nboards)
-   goto err_unr;
+   if (retval && !nboards) {
+   tty_unregister_driver(cy_serial_driver);
+   goto err_frtty;
+   }
 #endif
 
return 0;
-err_unr:
-   tty_unregister_driver(cy_serial_driver);
 err_frtty:
put_tty_driver(cy_serial_driver);
 err:
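
The shape of the change can be tried out in userspace with a sketch like the
one below. MODEL_HAVE_BUS stands in for CONFIG_PCI, and model_init(), the
UNDO_* flags and the undone variable are hypothetical names that only record
which cleanup steps ran. The cleanup that used to sit under its own label is
done inside the #ifdef, so no label is left unused when the macro is not
defined:

```c
#include <assert.h>	/* only needed for the checks in the usage note */

#define MODEL_HAVE_BUS 1	/* hypothetical stand-in for CONFIG_PCI */

/* Records which cleanup steps ran, so the paths can be checked. */
enum {
	UNDO_TTY_REGISTER = 1 << 0,	/* models tty_unregister_driver() */
	UNDO_TTY_ALLOC    = 1 << 1,	/* models put_tty_driver() */
};

static unsigned int undone;

/* Each argument is the error code (0 for success) of one init step. */
static int model_init(int alloc_error, int tty_error, int bus_error)
{
	int retval;

	undone = 0;
	(void)bus_error;	/* unused when MODEL_HAVE_BUS is not defined */

	retval = alloc_error;
	if (retval)
		goto err;

	retval = tty_error;
	if (retval)
		goto err_frtty;

#ifdef MODEL_HAVE_BUS
	retval = bus_error;
	if (retval) {
		/* formerly under a separate label used only from here */
		undone |= UNDO_TTY_REGISTER;
		goto err_frtty;
	}
#endif

	return 0;
err_frtty:
	undone |= UNDO_TTY_ALLOC;
err:
	return retval;
}
```

A bus failure undoes both earlier steps, a tty registration failure undoes
only the allocation, and an allocation failure undoes nothing. Removing the
MODEL_HAVE_BUS define should still compile without an unused-label warning.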




Re: [PATCH 6/6] Do not use FASTCALL for __alloc_pages_nodemask()

2007-08-17 Thread Christoph Lameter
On Fri, 17 Aug 2007, Mel Gorman wrote:

> Opinions as to why FASTCALL breaks on one machine are welcome.

Could we get rid of FASTCALL? AFAIK the compiler should automatically 
choose the right calling convention?


Re: kfree(0) - ok?

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Christoph Lameter wrote:

> On Fri, 17 Aug 2007, Andrew Morton wrote:
> 
> > are we seeing a pattern here?  We could stick the unlikely inside
> > ZERO_OR_NULL_PTR() itself.  That's a little bit sleazy though - there might
> > be future callsites at which it is likely, who knows?
> 
> Thought about that myself but then there would be a weird side effect to 
> ZERO_OR_NULL_PTR().

True, but I suspect such a side-effect to actually matter only for the
BUG_ON case, where introducing the unlikely() would mean the output from
the show_registers() dump during the BUG() would show a not-useful-at-all
%%eax == 0x001 value, but only if CONFIG_PROFILE_LIKELY=y, admittedly.

> But since you're thinking along the same lines: Let's do 
> it. I will fix up the patch to do just that.

Ok, thanks.


Satyam


Re: [PATCH 3/6] Embed zone_id information within the zonelist->zones pointer

2007-08-17 Thread Christoph Lameter
On Fri, 17 Aug 2007, Mel Gorman wrote:

> +/*
> + * SMP will align zones to a large boundary so the zone ID will fit in the
> + * least significant bits. Otherwise, ZONES_SHIFT must be 2 or less to
> + * fit

ZONES_SHIFT is always 2 or less

Acked-by: Christoph Lameter <[EMAIL PROTECTED]>
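
The idea in the quoted comment, keeping a small zone ID in the low bits of an
aligned pointer, can be sketched in userspace as follows. The model_* names
and the struct are hypothetical, and MODEL_ZONES_SHIFT is fixed at 2 to match
the remark above; the encoded value is kept as a uintptr_t rather than as a
misaligned pointer:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_ZONES_SHIFT 2	/* two low bits, as in the remark above */
#define MODEL_ZONE_ID_MASK (((uintptr_t)1 << MODEL_ZONES_SHIFT) - 1)

/* Hypothetical stand-in for struct zone; a long gives >= 4-byte alignment
 * on common platforms, which encode checks at run time anyway. */
struct model_zone {
	long pages;
};

/* Pack the zone ID into the unused low bits of the zone's address. */
static uintptr_t model_encode_zone(struct model_zone *z, unsigned int id)
{
	assert(((uintptr_t)z & MODEL_ZONE_ID_MASK) == 0);	/* aligned? */
	assert(id <= MODEL_ZONE_ID_MASK);			/* ID fits? */
	return (uintptr_t)z | id;
}

/* Recover the real pointer by masking the ID bits off again. */
static struct model_zone *model_zone_ptr(uintptr_t ref)
{
	return (struct model_zone *)(ref & ~MODEL_ZONE_ID_MASK);
}

/* Read the zone ID without touching the zone itself. */
static unsigned int model_zone_id(uintptr_t ref)
{
	return (unsigned int)(ref & MODEL_ZONE_ID_MASK);
}
```

The point of the trick is that the ID can be read without dereferencing the
zone at all; the cost is that every use of the pointer has to mask first.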

