Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Eric W. Biederman wrote: At this point given that we actually have a small user space dependency and the fact that after I have reviewed the code it looks harmless to change the inode number of those inodes, in both cases they are just anonymous inodes generated with new_inode, and anything that we wrap is likely to be equally so. So it looks to me like we need to do three things: - Fix the inode number Okay. its already done. - Fix the name on the hugetlbfs dentry to hold the key I don't see need for doing this for hugetlbfs inodes. Currently, they don't base their name on "key" + basing on the "key" is kind of useless anyway (its not unique). - Add a big fat comment that user space programs depend on this behavior of both the dentry name and the inode number. I don't think, the user-space can depend on the dentry-name. It can only depend on inode# to match shmid. (since key is not unique esp. for key=0x). BTW, I agree that shmid is not unique even without namespaces as its based on seq# and we wrap seq#. Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On 6/8/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: "Albert Cahalan" <[EMAIL PROTECTED]> writes: > On 6/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: >> So it looks to me like we need to do three things: >> - Fix the inode number >> - Fix the name on the hugetlbfs dentry to hold the key >> - Add a big fat comment that user space programs depend on this >> behavior of both the dentry name and the inode number. > > Assuming that this proposed fix goes in: > > Since the inode number is the shmid, and this is a number > that the kernel randomly chooses AFAIK, there should be > no need to have different shm segments sharing the same > inode number. Where we run into inode number confusion is that all of these shm segments are actually files on a tmpfs filesystem somewhere, and by making the inode number the shmid we loose the tmpfs inode number. So it is possible we get tmpfs inode number conflicts. However the inode number is not used for anything, and the files are not visible in any other way except as shm segments so it doesn't matter. Eh, the kernel choses both shmid and tmpfs inode number. You could set a high bit in one or the other. There is another case with ipc namespaces where we ultimately need to support duplicate shmids on the same machine (so migration is a possibility). However by and large the user space processes with duplicate ids should be invisible to each other. On the bright side, this only screws up people who get the crazy idea that processes can be migrated. > The situation with the key is a bit more disturbing, though > we already hit that anyway when IPC_PRIVATE is used. > (why anybody would NOT use IPC_PRIVATE is a mystery) > So having the key in the name doesn't make things worse. Having "SYSV" in the name appears mandatory. Otherwise you don't even know it is a shm file. Although I may be confused. It's mandatory for a different reason: to satisfy parsers. It is nearly useless for identifying shm files. Look what I can do: touch /SYSV touch '/SYSV (deleted)' (so pmap creates a shm, looks for the address in /proc/self/maps, determines the device major/minor in use, and then uses that) Hmm. Thinking about this I have just realized that we may want to approach this a little differently. Currently I am reusing the dentry and inode structure that hugetlbfs and tmpfs return me, and simply have a distinct struct file for each shm mapping. There is a little more cost but it may actually make sense to have a dentry and inode that is specific to shm.c so we can do whatever we need to without adding requirements to the normal tmpfs or hugtlb code. Piggybacking on tmpfs has always seemed a bit dirty to me. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
"Albert Cahalan" <[EMAIL PROTECTED]> writes: > On 6/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: > >> So it looks to me like we need to do three things: >> - Fix the inode number >> - Fix the name on the hugetlbfs dentry to hold the key >> - Add a big fat comment that user space programs depend on this >> behavior of both the dentry name and the inode number. > > Assuming that this proposed fix goes in: > > Since the inode number is the shmid, and this is a number > that the kernel randomly chooses AFAIK, there should be > no need to have different shm segments sharing the same > inode number. Where we run into inode number confusion is that all of these shm segments are actually files on a tmpfs filesystem somewhere, and by making the inode number the shmid we loose the tmpfs inode number. So it is possible we get tmpfs inode number conflicts. However the inode number is not used for anything, and the files are not visible in any other way except as shm segments so it doesn't matter. There is another case with ipc namespaces where we ultimately need to support duplicate shmids on the same machine (so migration is a possibility). However by and large the user space processes with duplicate ids should be invisible to each other. > The situation with the key is a bit more disturbing, though > we already hit that anyway when IPC_PRIVATE is used. > (why anybody would NOT use IPC_PRIVATE is a mystery) > So having the key in the name doesn't make things worse. Having "SYSV" in the name appears mandatory. Otherwise you don't even know it is a shm file. Although I may be confused. > I have some concern about the device minor number. > This should be the same for all shm mappings; I do not > know if the behavior changed. So I haven't changed anything here. But I haven't really looked either. I don't have a clue if hugetlbfs files use the same device minor number as tmpfs files. Hmm. Thinking about this I have just realized that we may want to approach this a little differently. Currently I am reusing the dentry and inode structure that hugetlbfs and tmpfs return me, and simply have a distinct struct file for each shm mapping. There is a little more cost but it may actually make sense to have a dentry and inode that is specific to shm.c so we can do whatever we need to without adding requirements to the normal tmpfs or hugtlb code. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On 6/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote: So it looks to me like we need to do three things: - Fix the inode number - Fix the name on the hugetlbfs dentry to hold the key - Add a big fat comment that user space programs depend on this behavior of both the dentry name and the inode number. Assuming that this proposed fix goes in: Since the inode number is the shmid, and this is a number that the kernel randomly chooses AFAIK, there should be no need to have different shm segments sharing the same inode number. The situation with the key is a bit more disturbing, though we already hit that anyway when IPC_PRIVATE is used. (why anybody would NOT use IPC_PRIVATE is a mystery) So having the key in the name doesn't make things worse. I have some concern about the device minor number. This should be the same for all shm mappings; I do not know if the behavior changed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
"Serge E. Hallyn" <[EMAIL PROTECTED]> writes: > Ok, so IIUC the problem was that inode->i_ino was being set to the id, > and the id can be the same for different things in two namespaces. There is nothing preventing inode number collisions in this code even without multiple namespaces, and even when it was functioning correctly. However as it does not seem possible to find these files through normal filesystem operations that does not seem to be a problem. > So aside from not using the id as inode->i_ino, an alternative is to use > a separate superblock, spearate mqeueue fs, for each ipc ns. > > I haven't looked at that enough to see whether it's feasible, i.e. I > don't know what else mqueue fs is used for. Eric, does that sound > reasonable to you? At this point given that we actually have a small user space dependency and the fact that after I have reviewed the code it looks harmless to change the inode number of those inodes, in both cases they are just anonymous inodes generated with new_inode, and anything that we wrap is likely to be equally so. So it looks to me like we need to do three things: - Fix the inode number - Fix the name on the hugetlbfs dentry to hold the key - Add a big fat comment that user space programs depend on this behavior of both the dentry name and the inode number. So Badari it looks like your original patch plus a little bit is what we need. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Serge E. Hallyn wrote: Quoting Serge E. Hallyn ([EMAIL PROTECTED]): Quoting Badari Pulavarty ([EMAIL PROTECTED]): On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote: Quoting Badari Pulavarty ([EMAIL PROTECTED]): On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: On Thu, 07 Jun 2007 10:06:37 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote: On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: BTW, I agree with Eric that its would be nice to use shmid as part of name instead of forcing to be as inode number. It should be possible for pmap to workout shmid from "key" or name. Isn't it ? It is not at all nice. 1. it's incompatible ABI breakage 2. where will you put the key then, in the inode? :-) Nope. Currently "key" is part of the name (but its not unique). Changing to "SYSVID%d" is no good either. Look, people are ***parsing*** this stuff in /proc. The /proc filesystem is not some random sandbox to be playing in. Before you go messing with it, note that the device number also matters. (it's per-boot dynamic, but that's OK) That's how one knows that /SYSV is not just a regular file; sadly these didn't get a non-/ prefix. (and no you can't fix that now; it's way too late) Next time you feel like breaking an ABI, mind putting "LET'S BREAK AN ABI!" in the subject of your email? I am not breaking ABI. Its already broken in the current mainline. I am trying to fix it by putting back the ino# as shmid. Eric had a suggestion that, instead of depending on the inode# to be shmid, we could embed shmid into name (instead of "key" which is currently not unique). BTW, I suspect this kind of thing also breaks: a. fuser, lsof, and other resource usage display tools b. various obscure emulators (similar to valgrind) If you strongly feel that "old" behaviour needs to be retained, yup, we should put it back. The change was, afaik, accidental. here is the patch I originally suggested. Confused. Will this one-liner fix all the userspace breakage to which Albert refers? Yes. Albert, please correct me if I am wrong. It will, but could lead to two different inodes with the same i_ino, right? Only if we generate same ID in two different namespaces. Is it currently possible ? Should be nothing stopping it. (just to be more certain, a quick test showed I can get id 0 for different keys, and different ids for the same key 0xff, in different ipc namespaces) Funny. I played with it and decided that it can happen :) Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Quoting Serge E. Hallyn ([EMAIL PROTECTED]): > Quoting Badari Pulavarty ([EMAIL PROTECTED]): > > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote: > > > Quoting Badari Pulavarty ([EMAIL PROTECTED]): > > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > > > > > On Thu, 07 Jun 2007 10:06:37 -0700 > > > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as > > > > > > > > part > > > > > > > > of name instead of forcing to be as inode number. It should be > > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it > > > > > > > > ? > > > > > > > > > > > > > > It is not at all nice. > > > > > > > > > > > > > > 1. it's incompatible ABI breakage > > > > > > > 2. where will you put the key then, in the inode? :-) > > > > > > > > > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > > > > > > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > > > > > is not some random sandbox to be playing in. > > > > > > > > > > > > > > Before you go messing with it, note that the device number > > > > > > > also matters. (it's per-boot dynamic, but that's OK) > > > > > > > That's how one knows that /SYSV is not just > > > > > > > a regular file; sadly these didn't get a non-/ prefix. > > > > > > > (and no you can't fix that now; it's way too late) > > > > > > > > > > > > > > Next time you feel like breaking an ABI, mind putting > > > > > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > > > > > > > > > I am not breaking ABI. Its already broken in the current > > > > > > mainline. I am trying to fix it by putting back the ino# > > > > > > as shmid. Eric had a suggestion that, instead of depending > > > > > > on the inode# to be shmid, we could embed shmid into name > > > > > > (instead of "key" which is currently not unique). > > > > > > > > > > > > > BTW, I suspect this kind of thing also breaks: > > > > > > > a. fuser, lsof, and other resource usage display tools > > > > > > > b. various obscure emulators (similar to valgrind) > > > > > > > > > > > > If you strongly feel that "old" behaviour needs to be retained, > > > > > > > > > > yup, we should put it back. The change was, afaik, accidental. > > > > > > > > > > > here is the patch I originally suggested. > > > > > > > > > > Confused. Will this one-liner fix all the userspace breakage to which > > > > > Albert refers? > > > > > > > > Yes. Albert, please correct me if I am wrong. > > > > > > It will, but could lead to two different inodes with the same i_ino, > > > right? > > > > Only if we generate same ID in two different namespaces. Is it currently > > possible ? > > Should be nothing stopping it. (just to be more certain, a quick test showed I can get id 0 for different keys, and different ids for the same key 0xff, in different ipc namespaces) -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Serge E. Hallyn wrote: Quoting Badari Pulavarty ([EMAIL PROTECTED]): On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote: Quoting Badari Pulavarty ([EMAIL PROTECTED]): On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: On Thu, 07 Jun 2007 10:06:37 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote: On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: BTW, I agree with Eric that its would be nice to use shmid as part of name instead of forcing to be as inode number. It should be possible for pmap to workout shmid from "key" or name. Isn't it ? It is not at all nice. 1. it's incompatible ABI breakage 2. where will you put the key then, in the inode? :-) Nope. Currently "key" is part of the name (but its not unique). Changing to "SYSVID%d" is no good either. Look, people are ***parsing*** this stuff in /proc. The /proc filesystem is not some random sandbox to be playing in. Before you go messing with it, note that the device number also matters. (it's per-boot dynamic, but that's OK) That's how one knows that /SYSV is not just a regular file; sadly these didn't get a non-/ prefix. (and no you can't fix that now; it's way too late) Next time you feel like breaking an ABI, mind putting "LET'S BREAK AN ABI!" in the subject of your email? I am not breaking ABI. Its already broken in the current mainline. I am trying to fix it by putting back the ino# as shmid. Eric had a suggestion that, instead of depending on the inode# to be shmid, we could embed shmid into name (instead of "key" which is currently not unique). BTW, I suspect this kind of thing also breaks: a. fuser, lsof, and other resource usage display tools b. various obscure emulators (similar to valgrind) If you strongly feel that "old" behaviour needs to be retained, yup, we should put it back. The change was, afaik, accidental. here is the patch I originally suggested. Confused. Will this one-liner fix all the userspace breakage to which Albert refers? Yes. Albert, please correct me if I am wrong. It will, but could lead to two different inodes with the same i_ino, right? Only if we generate same ID in two different namespaces. Is it currently possible ? Should be nothing stopping it. But like I say we never find the inode based on i_ino, and don't hash the inode, so it might be ok. Correct. We might end up with same shmid - which mean same inode# shows up in /proc/pid/maps. If we don't unshare pid namespace or look from parent namespace - we will end up seeing same shmid/inode# in different /proc/pid/maps, even though they are different. But I guess its okay.. Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Quoting Serge E. Hallyn ([EMAIL PROTECTED]): > Quoting Badari Pulavarty ([EMAIL PROTECTED]): > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > > > On Thu, 07 Jun 2007 10:06:37 -0700 > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > > > > of name instead of forcing to be as inode number. It should be > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > > > > > > > It is not at all nice. > > > > > > > > > > 1. it's incompatible ABI breakage > > > > > 2. where will you put the key then, in the inode? :-) > > > > > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > > > is not some random sandbox to be playing in. > > > > > > > > > > Before you go messing with it, note that the device number > > > > > also matters. (it's per-boot dynamic, but that's OK) > > > > > That's how one knows that /SYSV is not just > > > > > a regular file; sadly these didn't get a non-/ prefix. > > > > > (and no you can't fix that now; it's way too late) > > > > > > > > > > Next time you feel like breaking an ABI, mind putting > > > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > > > > > I am not breaking ABI. Its already broken in the current > > > > mainline. I am trying to fix it by putting back the ino# > > > > as shmid. Eric had a suggestion that, instead of depending > > > > on the inode# to be shmid, we could embed shmid into name > > > > (instead of "key" which is currently not unique). > > > > > > > > > BTW, I suspect this kind of thing also breaks: > > > > > a. fuser, lsof, and other resource usage display tools > > > > > b. various obscure emulators (similar to valgrind) > > > > > > > > If you strongly feel that "old" behaviour needs to be retained, > > > > > > yup, we should put it back. The change was, afaik, accidental. > > > > > > > here is the patch I originally suggested. > > > > > > Confused. Will this one-liner fix all the userspace breakage to which > > > Albert refers? > > > > Yes. Albert, please correct me if I am wrong. > > It will, but could lead to two different inodes with the same i_ino, > right? Well I guess it's not *technically* a problem since these inodes are never hashed. -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Quoting Badari Pulavarty ([EMAIL PROTECTED]): > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote: > > Quoting Badari Pulavarty ([EMAIL PROTECTED]): > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > > > > On Thu, 07 Jun 2007 10:06:37 -0700 > > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > > > > > of name instead of forcing to be as inode number. It should be > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > > > > > > > > > It is not at all nice. > > > > > > > > > > > > 1. it's incompatible ABI breakage > > > > > > 2. where will you put the key then, in the inode? :-) > > > > > > > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > > > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > > > > is not some random sandbox to be playing in. > > > > > > > > > > > > Before you go messing with it, note that the device number > > > > > > also matters. (it's per-boot dynamic, but that's OK) > > > > > > That's how one knows that /SYSV is not just > > > > > > a regular file; sadly these didn't get a non-/ prefix. > > > > > > (and no you can't fix that now; it's way too late) > > > > > > > > > > > > Next time you feel like breaking an ABI, mind putting > > > > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > > > > > > > I am not breaking ABI. Its already broken in the current > > > > > mainline. I am trying to fix it by putting back the ino# > > > > > as shmid. Eric had a suggestion that, instead of depending > > > > > on the inode# to be shmid, we could embed shmid into name > > > > > (instead of "key" which is currently not unique). > > > > > > > > > > > BTW, I suspect this kind of thing also breaks: > > > > > > a. fuser, lsof, and other resource usage display tools > > > > > > b. various obscure emulators (similar to valgrind) > > > > > > > > > > If you strongly feel that "old" behaviour needs to be retained, > > > > > > > > yup, we should put it back. The change was, afaik, accidental. > > > > > > > > > here is the patch I originally suggested. > > > > > > > > Confused. Will this one-liner fix all the userspace breakage to which > > > > Albert refers? > > > > > > Yes. Albert, please correct me if I am wrong. > > > > It will, but could lead to two different inodes with the same i_ino, > > right? > > Only if we generate same ID in two different namespaces. Is it currently > possible ? Should be nothing stopping it. But like I say we never find the inode based on i_ino, and don't hash the inode, so it might be ok. -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote: > Quoting Badari Pulavarty ([EMAIL PROTECTED]): > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > > > On Thu, 07 Jun 2007 10:06:37 -0700 > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > > > > of name instead of forcing to be as inode number. It should be > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > > > > > > > It is not at all nice. > > > > > > > > > > 1. it's incompatible ABI breakage > > > > > 2. where will you put the key then, in the inode? :-) > > > > > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > > > is not some random sandbox to be playing in. > > > > > > > > > > Before you go messing with it, note that the device number > > > > > also matters. (it's per-boot dynamic, but that's OK) > > > > > That's how one knows that /SYSV is not just > > > > > a regular file; sadly these didn't get a non-/ prefix. > > > > > (and no you can't fix that now; it's way too late) > > > > > > > > > > Next time you feel like breaking an ABI, mind putting > > > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > > > > > I am not breaking ABI. Its already broken in the current > > > > mainline. I am trying to fix it by putting back the ino# > > > > as shmid. Eric had a suggestion that, instead of depending > > > > on the inode# to be shmid, we could embed shmid into name > > > > (instead of "key" which is currently not unique). > > > > > > > > > BTW, I suspect this kind of thing also breaks: > > > > > a. fuser, lsof, and other resource usage display tools > > > > > b. various obscure emulators (similar to valgrind) > > > > > > > > If you strongly feel that "old" behaviour needs to be retained, > > > > > > yup, we should put it back. The change was, afaik, accidental. > > > > > > > here is the patch I originally suggested. > > > > > > Confused. Will this one-liner fix all the userspace breakage to which > > > Albert refers? > > > > Yes. Albert, please correct me if I am wrong. > > It will, but could lead to two different inodes with the same i_ino, > right? Only if we generate same ID in two different namespaces. Is it currently possible ? Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Quoting Badari Pulavarty ([EMAIL PROTECTED]): > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > > On Thu, 07 Jun 2007 10:06:37 -0700 > > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > > > of name instead of forcing to be as inode number. It should be > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > > > > > It is not at all nice. > > > > > > > > 1. it's incompatible ABI breakage > > > > 2. where will you put the key then, in the inode? :-) > > > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > > is not some random sandbox to be playing in. > > > > > > > > Before you go messing with it, note that the device number > > > > also matters. (it's per-boot dynamic, but that's OK) > > > > That's how one knows that /SYSV is not just > > > > a regular file; sadly these didn't get a non-/ prefix. > > > > (and no you can't fix that now; it's way too late) > > > > > > > > Next time you feel like breaking an ABI, mind putting > > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > > > I am not breaking ABI. Its already broken in the current > > > mainline. I am trying to fix it by putting back the ino# > > > as shmid. Eric had a suggestion that, instead of depending > > > on the inode# to be shmid, we could embed shmid into name > > > (instead of "key" which is currently not unique). > > > > > > > BTW, I suspect this kind of thing also breaks: > > > > a. fuser, lsof, and other resource usage display tools > > > > b. various obscure emulators (similar to valgrind) > > > > > > If you strongly feel that "old" behaviour needs to be retained, > > > > yup, we should put it back. The change was, afaik, accidental. > > > > > here is the patch I originally suggested. > > > > Confused. Will this one-liner fix all the userspace breakage to which > > Albert refers? > > Yes. Albert, please correct me if I am wrong. It will, but could lead to two different inodes with the same i_ino, right? thanks, -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote: > On Thu, 07 Jun 2007 10:06:37 -0700 > Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > > of name instead of forcing to be as inode number. It should be > > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > > > It is not at all nice. > > > > > > 1. it's incompatible ABI breakage > > > 2. where will you put the key then, in the inode? :-) > > > > Nope. Currently "key" is part of the name (but its not unique). > > > > > > > > Changing to "SYSVID%d" is no good either. Look, people > > > are ***parsing*** this stuff in /proc. The /proc filesystem > > > is not some random sandbox to be playing in. > > > > > > Before you go messing with it, note that the device number > > > also matters. (it's per-boot dynamic, but that's OK) > > > That's how one knows that /SYSV is not just > > > a regular file; sadly these didn't get a non-/ prefix. > > > (and no you can't fix that now; it's way too late) > > > > > > Next time you feel like breaking an ABI, mind putting > > > "LET'S BREAK AN ABI!" in the subject of your email? > > > > I am not breaking ABI. Its already broken in the current > > mainline. I am trying to fix it by putting back the ino# > > as shmid. Eric had a suggestion that, instead of depending > > on the inode# to be shmid, we could embed shmid into name > > (instead of "key" which is currently not unique). > > > > > BTW, I suspect this kind of thing also breaks: > > > a. fuser, lsof, and other resource usage display tools > > > b. various obscure emulators (similar to valgrind) > > > > If you strongly feel that "old" behaviour needs to be retained, > > yup, we should put it back. The change was, afaik, accidental. > > > here is the patch I originally suggested. > > Confused. Will this one-liner fix all the userspace breakage to which > Albert refers? Yes. Albert, please correct me if I am wrong. Thanks, Badari > > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared > > memory (shmid). It was useful in debugging, but its changed recently. > > This patch sets inode number to shared memory id to match /proc/pid/maps. > > > > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> > > > > Index: linux-2.6.22-rc4/ipc/shm.c > > === > > --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700 > > +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.0 -0700 > > @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace > > shp->shm_nattch = 0; > > shp->id = shm_buildid(ns, id, shp->shm_perm.seq); > > shp->shm_file = file; > > + file->f_dentry->d_inode->i_ino = shp->id; > > > > ns->shm_tot += numpages; > > shm_unlock(shp); > > > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Thu, 07 Jun 2007 10:06:37 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote: > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > > > BTW, I agree with Eric that its would be nice to use shmid as part > > > of name instead of forcing to be as inode number. It should be > > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > > > It is not at all nice. > > > > 1. it's incompatible ABI breakage > > 2. where will you put the key then, in the inode? :-) > > Nope. Currently "key" is part of the name (but its not unique). > > > > > Changing to "SYSVID%d" is no good either. Look, people > > are ***parsing*** this stuff in /proc. The /proc filesystem > > is not some random sandbox to be playing in. > > > > Before you go messing with it, note that the device number > > also matters. (it's per-boot dynamic, but that's OK) > > That's how one knows that /SYSV is not just > > a regular file; sadly these didn't get a non-/ prefix. > > (and no you can't fix that now; it's way too late) > > > > Next time you feel like breaking an ABI, mind putting > > "LET'S BREAK AN ABI!" in the subject of your email? > > I am not breaking ABI. Its already broken in the current > mainline. I am trying to fix it by putting back the ino# > as shmid. Eric had a suggestion that, instead of depending > on the inode# to be shmid, we could embed shmid into name > (instead of "key" which is currently not unique). > > > BTW, I suspect this kind of thing also breaks: > > a. fuser, lsof, and other resource usage display tools > > b. various obscure emulators (similar to valgrind) > > If you strongly feel that "old" behaviour needs to be retained, yup, we should put it back. The change was, afaik, accidental. > here is the patch I originally suggested. Confused. Will this one-liner fix all the userspace breakage to which Albert refers? > Thanks, > Badari > > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared > memory (shmid). It was useful in debugging, but its changed recently. > This patch sets inode number to shared memory id to match /proc/pid/maps. > > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> > > Index: linux-2.6.22-rc4/ipc/shm.c > === > --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700 > +++ linux-2.6.22-rc4/ipc/shm.c2007-06-06 08:23:57.0 -0700 > @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace > shp->shm_nattch = 0; > shp->id = shm_buildid(ns, id, shp->shm_perm.seq); > shp->shm_file = file; > + file->f_dentry->d_inode->i_ino = shp->id; > > ns->shm_tot += numpages; > shm_unlock(shp); > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Quoting Albert Cahalan ([EMAIL PROTECTED]): > On 6/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > >On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> > >wrote: > >> Eric W. Biederman writes: > >> > Badari Pulavarty <[EMAIL PROTECTED]> writes: > >> > >> >> Your recent cleanup to shm code, namely > >> >> > >> >> [PATCH] shm: make sysv ipc shared memory use stacked files > >> >> > >> >> took away one of the debugging feature for shm segments. > >> >> Originally, shmid were forced to be the inode numbers and > >> >> they show up in /proc/pid/maps for the process which mapped > >> >> this shared memory segments (vma listing). That way, its easy > >> >> to find out who all mapped this shared memory segment. Your > >> >> patchset, took away the inode# setting. So, we can't easily > >> >> match the shmem segments to /proc/pid/maps easily. (It was > >> >> really useful in tracking down a customer problem recently). > >> >> Is this done deliberately ? Anything wrong in setting this back ? > >> > > >> > Theoretically it makes the stacked file concept more brittle, > >> > because it means the lower layers can't care about their inode > >> > number. > >> > > >> > We do need something to tie these things together. > >> > > >> > So I suspect what makes most sense is to simply rename the > >> > dentry SYSVID > >> > >> Please stop breaking things in /proc. The pmap command relys > >> on the old behavior. > > > >What effect did this change have upon the pmap command? Details, please. > > > >> It's time to revert. > > > >Probably true, but we'd need to understand what the impact was. > > Very simply, pmap reports the shmid. > > albert 0 ~$ pmap `pidof X` | egrep -2 shmid > 3005 16384K rw-s- /dev/fb0 > 3105152K rw---[ anon ] > 31076000384K rw-s-[ shmid=0x3f428000 ] > 310d6000384K rw-s-[ shmid=0x3f430001 ] > 31136000384K rw-s-[ shmid=0x3f438002 ] > 31196000384K rw-s-[ shmid=0x3f440003 ] > 311f6000384K rw-s-[ shmid=0x3f448004 ] > 31256000384K rw-s-[ shmid=0x3f450005 ] > 312b6000384K rw-s-[ shmid=0x3f460006 ] > 31316000384K rw-s-[ shmid=0x3f870007 ] > 31491000140K r /usr/share/fonts/type1/gsfonts/n021003l.pfb > 3150e000 9496K rw---[ anon ] Ok, so IIUC the problem was that inode->i_ino was being set to the id, and the id can be the same for different things in two namespaces. So aside from not using the id as inode->i_ino, an alternative is to use a separate superblock, spearate mqeueue fs, for each ipc ns. I haven't looked at that enough to see whether it's feasible, i.e. I don't know what else mqueue fs is used for. Eric, does that sound reasonable to you? thanks, -serge - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote: > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: > > > BTW, I agree with Eric that its would be nice to use shmid as part > > of name instead of forcing to be as inode number. It should be > > possible for pmap to workout shmid from "key" or name. Isn't it ? > > It is not at all nice. > > 1. it's incompatible ABI breakage > 2. where will you put the key then, in the inode? :-) Nope. Currently "key" is part of the name (but its not unique). > > Changing to "SYSVID%d" is no good either. Look, people > are ***parsing*** this stuff in /proc. The /proc filesystem > is not some random sandbox to be playing in. > > Before you go messing with it, note that the device number > also matters. (it's per-boot dynamic, but that's OK) > That's how one knows that /SYSV is not just > a regular file; sadly these didn't get a non-/ prefix. > (and no you can't fix that now; it's way too late) > > Next time you feel like breaking an ABI, mind putting > "LET'S BREAK AN ABI!" in the subject of your email? I am not breaking ABI. Its already broken in the current mainline. I am trying to fix it by putting back the ino# as shmid. Eric had a suggestion that, instead of depending on the inode# to be shmid, we could embed shmid into name (instead of "key" which is currently not unique). > BTW, I suspect this kind of thing also breaks: > a. fuser, lsof, and other resource usage display tools > b. various obscure emulators (similar to valgrind) If you strongly feel that "old" behaviour needs to be retained, here is the patch I originally suggested. Thanks, Badari "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared memory (shmid). It was useful in debugging, but its changed recently. This patch sets inode number to shared memory id to match /proc/pid/maps. Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> Index: linux-2.6.22-rc4/ipc/shm.c === --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700 +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.0 -0700 @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace shp->shm_nattch = 0; shp->id = shm_buildid(ns, id, shp->shm_perm.seq); shp->shm_file = file; + file->f_dentry->d_inode->i_ino = shp->id; ns->shm_tot += numpages; shm_unlock(shp); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote: BTW, I agree with Eric that its would be nice to use shmid as part of name instead of forcing to be as inode number. It should be possible for pmap to workout shmid from "key" or name. Isn't it ? It is not at all nice. 1. it's incompatible ABI breakage 2. where will you put the key then, in the inode? :-) Changing to "SYSVID%d" is no good either. Look, people are ***parsing*** this stuff in /proc. The /proc filesystem is not some random sandbox to be playing in. Before you go messing with it, note that the device number also matters. (it's per-boot dynamic, but that's OK) That's how one knows that /SYSV is not just a regular file; sadly these didn't get a non-/ prefix. (and no you can't fix that now; it's way too late) Next time you feel like breaking an ABI, mind putting "LET'S BREAK AN ABI!" in the subject of your email? BTW, I suspect this kind of thing also breaks: a. fuser, lsof, and other resource usage display tools b. various obscure emulators (similar to valgrind) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Thu, 2007-06-07 at 00:53 -0400, Albert Cahalan wrote: > On 6/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> > > wrote: > > > Eric W. Biederman writes: > > > > Badari Pulavarty <[EMAIL PROTECTED]> writes: > > > > > > >> Your recent cleanup to shm code, namely > > > >> > > > >> [PATCH] shm: make sysv ipc shared memory use stacked files > > > >> > > > >> took away one of the debugging feature for shm segments. > > > >> Originally, shmid were forced to be the inode numbers and > > > >> they show up in /proc/pid/maps for the process which mapped > > > >> this shared memory segments (vma listing). That way, its easy > > > >> to find out who all mapped this shared memory segment. Your > > > >> patchset, took away the inode# setting. So, we can't easily > > > >> match the shmem segments to /proc/pid/maps easily. (It was > > > >> really useful in tracking down a customer problem recently). > > > >> Is this done deliberately ? Anything wrong in setting this back ? > > > > > > > > Theoretically it makes the stacked file concept more brittle, > > > > because it means the lower layers can't care about their inode > > > > number. > > > > > > > > We do need something to tie these things together. > > > > > > > > So I suspect what makes most sense is to simply rename the > > > > dentry SYSVID > > > > > > Please stop breaking things in /proc. The pmap command relys > > > on the old behavior. > > > > What effect did this change have upon the pmap command? Details, please. > > > > > It's time to revert. > > > > Probably true, but we'd need to understand what the impact was. > > Very simply, pmap reports the shmid. > > albert 0 ~$ pmap `pidof X` | egrep -2 shmid > 3005 16384K rw-s- /dev/fb0 > 3105152K rw---[ anon ] > 31076000384K rw-s-[ shmid=0x3f428000 ] > 310d6000384K rw-s-[ shmid=0x3f430001 ] > 31136000384K rw-s-[ shmid=0x3f438002 ] > 31196000384K rw-s-[ shmid=0x3f440003 ] > 311f6000384K rw-s-[ shmid=0x3f448004 ] > 31256000384K rw-s-[ shmid=0x3f450005 ] > 312b6000384K rw-s-[ shmid=0x3f460006 ] > 31316000384K rw-s-[ shmid=0x3f870007 ] > 31491000140K r /usr/share/fonts/type1/gsfonts/n021003l.pfb > 3150e000 9496K rw---[ anon ] pmap seems to get shmid from "ino#" field of /proc/pid/map. Its already broken in current mainline. But, the breakage is not due to namespaces or container effort :( Its due to noble effort from Eric to clean up the shm code, take out the hacks to handle hugetlbfs and make the code more streamlined and readable. If we really really want old behaviour, we need my one line patch to force shmid as inode# :( BTW, I agree with Eric that its would be nice to use shmid as part of name instead of forcing to be as inode number. It should be possible for pmap to workout shmid from "key" or name. Isn't it ? Andrew/Linus, its up to you to figure out if its worth breaking. Here is the patch to base dentry-name on shmid - so we don't need to use ino# to identify shmid. Thanks, Badari Instead of basing dentry name on the shm "key", base it on "shmid" - so it shows up clearly in /proc/pid/maps. Earlier we were forcing ino# to match shmid. Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> Index: linux-2.6.22-rc4/ipc/shm.c === --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700 +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 13:43:36.0 -0700 @@ -364,6 +364,14 @@ static int newseg (struct ipc_namespace return error; } + error = -ENOSPC; + id = shm_addid(ns, shp); + if(id == -1) + goto no_id; + + /* Build an id, so we can use it for filename */ + shp->id = shm_buildid(ns, id, shp->shm_perm.seq); + if (shmflg & SHM_HUGETLB) { /* hugetlb_zero_setup takes care of mlock user accounting */ file = hugetlb_zero_setup(size); @@ -377,34 +385,28 @@ static int newseg (struct ipc_namespace if ((shmflg & SHM_NORESERVE) && sysctl_overcommit_memory != OVERCOMMIT_NEVER) acctflag = 0; - sprintf (name, "SYSV%08x", key); + sprintf (name, "SYSVID%d", shp->id); file = shmem_file_setup(name, size, acctflag); } error = PTR_ERR(file); if (IS_ERR(file)) goto no_file; - error = -ENOSPC; - id = shm_addid(ns, shp); - if(id == -1) - goto no_id; - shp->shm_cprid = current->tgid; shp->shm_lprid = 0; shp->shm_atim = shp->shm_dtim = 0; shp->shm_ctim = get_seconds(); shp->shm_segsz = size; shp->shm_nattch = 0; - shp->id = shm_buildid(ns, id, shp->shm_perm.seq); shp->shm_file = file; ns->shm_tot += numpages;
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On 6/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote: On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> wrote: > Eric W. Biederman writes: > > Badari Pulavarty <[EMAIL PROTECTED]> writes: > > >> Your recent cleanup to shm code, namely > >> > >> [PATCH] shm: make sysv ipc shared memory use stacked files > >> > >> took away one of the debugging feature for shm segments. > >> Originally, shmid were forced to be the inode numbers and > >> they show up in /proc/pid/maps for the process which mapped > >> this shared memory segments (vma listing). That way, its easy > >> to find out who all mapped this shared memory segment. Your > >> patchset, took away the inode# setting. So, we can't easily > >> match the shmem segments to /proc/pid/maps easily. (It was > >> really useful in tracking down a customer problem recently). > >> Is this done deliberately ? Anything wrong in setting this back ? > > > > Theoretically it makes the stacked file concept more brittle, > > because it means the lower layers can't care about their inode > > number. > > > > We do need something to tie these things together. > > > > So I suspect what makes most sense is to simply rename the > > dentry SYSVID > > Please stop breaking things in /proc. The pmap command relys > on the old behavior. What effect did this change have upon the pmap command? Details, please. > It's time to revert. Probably true, but we'd need to understand what the impact was. Very simply, pmap reports the shmid. albert 0 ~$ pmap `pidof X` | egrep -2 shmid 3005 16384K rw-s- /dev/fb0 3105152K rw---[ anon ] 31076000384K rw-s-[ shmid=0x3f428000 ] 310d6000384K rw-s-[ shmid=0x3f430001 ] 31136000384K rw-s-[ shmid=0x3f438002 ] 31196000384K rw-s-[ shmid=0x3f440003 ] 311f6000384K rw-s-[ shmid=0x3f448004 ] 31256000384K rw-s-[ shmid=0x3f450005 ] 312b6000384K rw-s-[ shmid=0x3f460006 ] 31316000384K rw-s-[ shmid=0x3f870007 ] 31491000140K r /usr/share/fonts/type1/gsfonts/n021003l.pfb 3150e000 9496K rw---[ anon ] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> wrote: > Eric W. Biederman writes: > > Badari Pulavarty <[EMAIL PROTECTED]> writes: > > >> Your recent cleanup to shm code, namely > >> > >> [PATCH] shm: make sysv ipc shared memory use stacked files > >> > >> took away one of the debugging feature for shm segments. > >> Originally, shmid were forced to be the inode numbers and > >> they show up in /proc/pid/maps for the process which mapped > >> this shared memory segments (vma listing). That way, its easy > >> to find out who all mapped this shared memory segment. Your > >> patchset, took away the inode# setting. So, we can't easily > >> match the shmem segments to /proc/pid/maps easily. (It was > >> really useful in tracking down a customer problem recently). > >> Is this done deliberately ? Anything wrong in setting this back ? > > > > Theoretically it makes the stacked file concept more brittle, > > because it means the lower layers can't care about their inode > > number. > > > > We do need something to tie these things together. > > > > So I suspect what makes most sense is to simply rename the > > dentry SYSVID > > Please stop breaking things in /proc. The pmap command relys > on the old behavior. What effect did this change have upon the pmap command? Details, please. > It's time to revert. Probably true, but we'd need to understand what the impact was. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Eric W. Biederman writes: Badari Pulavarty <[EMAIL PROTECTED]> writes: Your recent cleanup to shm code, namely [PATCH] shm: make sysv ipc shared memory use stacked files took away one of the debugging feature for shm segments. Originally, shmid were forced to be the inode numbers and they show up in /proc/pid/maps for the process which mapped this shared memory segments (vma listing). That way, its easy to find out who all mapped this shared memory segment. Your patchset, took away the inode# setting. So, we can't easily match the shmem segments to /proc/pid/maps easily. (It was really useful in tracking down a customer problem recently). Is this done deliberately ? Anything wrong in setting this back ? Theoretically it makes the stacked file concept more brittle, because it means the lower layers can't care about their inode number. We do need something to tie these things together. So I suspect what makes most sense is to simply rename the dentry SYSVID Please stop breaking things in /proc. The pmap command relys on the old behavior. It's time to revert. Put back the segment ID where it belongs, and leave the key where it belongs too. Containers are NOT worth breaking our ABIs left and right. We don't need to leap off that bridge just because Solaris did, unless you can explain why complexity and bloat are desirable. We already have SE Linux, chroot, KVM, and several more! - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Badari Pulavarty <[EMAIL PROTECTED]> writes: > > -- Shared Memory Segments > keyshmid owner perms bytes nattch status > 0x 884737 db2inst1 76733554432 13 > 0x 950275 db2fenc1 70123052288 13 > > There is no unique way to identify them easily :( > > I guess, like you suggested, we can change the dentry name to use shmid > instead of the portions of the "key" to make it unique. I think, I can > work out a patch for this. Thanks. That should be more robust. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
On Wed, 2007-06-06 at 11:02 -0600, Eric W. Biederman wrote: > Badari Pulavarty <[EMAIL PROTECTED]> writes: > > > Hi Eric, > > > > Your recent cleanup to shm code, namely > > > > [PATCH] shm: make sysv ipc shared memory use stacked files > > > > took away one of the debugging feature for shm segments. > > Originally, shmid were forced to be the inode numbers and > > they show up in /proc/pid/maps for the process which mapped > > this shared memory segments (vma listing). That way, its easy > > to find out who all mapped this shared memory segment. Your > > patchset, took away the inode# setting. So, we can't easily > > match the shmem segments to /proc/pid/maps easily. (It was > > really useful in tracking down a customer problem recently). > > Is this done deliberately ? Anything wrong in setting this back ? > > > > Comments ? > > > > Thanks, > > Badari > > > > Without patch: > > -- > > > > # ipcs -m > > > > -- Shared Memory Segments > > keyshmid owner perms bytes nattch status > > 0x 884737 db2inst1 76733554432 13 > > > > # grep 884737 /proc/*/maps > > # > > > > With patch: > > --- > > > > # ipcs -m > > > > -- Shared Memory Segments > > keyshmid owner perms bytes nattch status > > 0x 884737 db2inst1 76733554432 13 > > > > # grep 884737 /proc/*/maps > > /proc/0/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/1/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/2/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/3/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/4/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/5/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/6/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/7/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/8/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/11121/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/11122/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/11124/maps:4000389c000-4000589c000 rw-s 00:08 884737 > > /SYSV (deleted) > > /proc/11575/maps:40006724000-40008724000 rw-s 00:08 884737 > > /SYSV (deleted) > > > > > > > > Here is the patch. > > > > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared > > memory (shmid). It was useful in debugging, but its changed recently. > > This patch sets inode number to shared memory id to match /proc/pid/maps. > > Theoretically it makes the stacked file concept more brittle, because > it means the lower layers can't care about their inode number. > > We do need something to tie these things together. > > So I suspect what makes most sense is to simply rename the dentry > SYSVID Yep. Currently, we use part of "key" as the dentry name. For example, # ipcs -- Shared Memory Segments keyshmid owner perms bytes nattch status 0x083d0d74 851968 db2inst1 76733554432 13 # grep 83d0d74 /proc/*/maps /proc/0/maps:40004724000-40006724000 rw-s 00:08 851968 /SYSV083d0d74 (deleted) /proc/1/maps:40004724000-40006724000 rw-s 00:08 851968 /SYSV083d0d74 (deleted) /proc/2/maps:40004724000-40006724000 rw-s 00:08 851968 /SYSV083d0d74 (deleted) /proc/3/maps:40004724000-40006724000 rw-s 00:08 851968 /SYSV083d0d74 (deleted) .. The issue is with the ones with key = 0x000, like following: # ipcs -- Shared Memory Segments keyshmid owner perms bytes nattch status 0x 884737 db2inst1 76733554432 13 0x 950275 db2fenc1 70123052288 13 There is no unique way to identify them easily :( I guess, like you suggested, we can change the dentry name to use shmid instead of the portions of the "key" to make it unique. I think, I can work out a patch for this. > > That should give you the necessary information while not doing something > that is a long term maintenance problem. > > Do you think you can cook up a patch to that effect? > Otherwise I will see if I can. > > In practice I'm not really against your change, but I would prefer > to leave the code in a state where someone can reimplement hugetlbfs > or shmfs and we simply don't care. Thanks for your suggestion. Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL
Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Badari Pulavarty <[EMAIL PROTECTED]> writes: > Hi Eric, > > Your recent cleanup to shm code, namely > > [PATCH] shm: make sysv ipc shared memory use stacked files > > took away one of the debugging feature for shm segments. > Originally, shmid were forced to be the inode numbers and > they show up in /proc/pid/maps for the process which mapped > this shared memory segments (vma listing). That way, its easy > to find out who all mapped this shared memory segment. Your > patchset, took away the inode# setting. So, we can't easily > match the shmem segments to /proc/pid/maps easily. (It was > really useful in tracking down a customer problem recently). > Is this done deliberately ? Anything wrong in setting this back ? > > Comments ? > > Thanks, > Badari > > Without patch: > -- > > # ipcs -m > > -- Shared Memory Segments > keyshmid owner perms bytes nattch status > 0x 884737 db2inst1 76733554432 13 > > # grep 884737 /proc/*/maps > # > > With patch: > --- > > # ipcs -m > > -- Shared Memory Segments > keyshmid owner perms bytes nattch status > 0x 884737 db2inst1 76733554432 13 > > # grep 884737 /proc/*/maps > /proc/0/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/1/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/2/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/3/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/4/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/5/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/6/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/7/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/8/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/11121/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/11122/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/11124/maps:4000389c000-4000589c000 rw-s 00:08 884737 > /SYSV (deleted) > /proc/11575/maps:40006724000-40008724000 rw-s 00:08 884737 > /SYSV (deleted) > > > > Here is the patch. > > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared > memory (shmid). It was useful in debugging, but its changed recently. > This patch sets inode number to shared memory id to match /proc/pid/maps. Theoretically it makes the stacked file concept more brittle, because it means the lower layers can't care about their inode number. We do need something to tie these things together. So I suspect what makes most sense is to simply rename the dentry SYSVID That should give you the necessary information while not doing something that is a long term maintenance problem. Do you think you can cook up a patch to that effect? Otherwise I will see if I can. In practice I'm not really against your change, but I would prefer to leave the code in a state where someone can reimplement hugetlbfs or shmfs and we simply don't care. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid
Hi Eric, Your recent cleanup to shm code, namely [PATCH] shm: make sysv ipc shared memory use stacked files took away one of the debugging feature for shm segments. Originally, shmid were forced to be the inode numbers and they show up in /proc/pid/maps for the process which mapped this shared memory segments (vma listing). That way, its easy to find out who all mapped this shared memory segment. Your patchset, took away the inode# setting. So, we can't easily match the shmem segments to /proc/pid/maps easily. (It was really useful in tracking down a customer problem recently). Is this done deliberately ? Anything wrong in setting this back ? Comments ? Thanks, Badari Without patch: -- # ipcs -m -- Shared Memory Segments keyshmid owner perms bytes nattch status 0x 884737 db2inst1 76733554432 13 # grep 884737 /proc/*/maps # With patch: --- # ipcs -m -- Shared Memory Segments keyshmid owner perms bytes nattch status 0x 884737 db2inst1 76733554432 13 # grep 884737 /proc/*/maps /proc/0/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/1/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/2/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/3/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/4/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/5/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/6/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/7/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/8/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/11121/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/11122/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) /proc/11124/maps:4000389c000-4000589c000 rw-s 00:08 884737 /SYSV (deleted) /proc/11575/maps:40006724000-40008724000 rw-s 00:08 884737 /SYSV (deleted) Here is the patch. "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared memory (shmid). It was useful in debugging, but its changed recently. This patch sets inode number to shared memory id to match /proc/pid/maps. Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> Index: linux-2.6.22-rc4/ipc/shm.c === --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700 +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.0 -0700 @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace shp->shm_nattch = 0; shp->id = shm_buildid(ns, id, shp->shm_perm.seq); shp->shm_file = file; + file->f_dentry->d_inode->i_ino = shp->id; ns->shm_tot += numpages; shm_unlock(shp); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/