Re: [Cluster-devel] gfs uevent and sysfs changes
On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
> In that case gfs2 should be able to generate the id itself from the fsname and it still doesn't need it passed in, even if it continues to expose the id in sysfs. Perhaps better still, it should be possible for David to generate the id directly if he really needs it from the fsname.

It's not actually a crc of the fsname, but a crc of the cpg name gfs_controld creates for the mountgroup, which is gfs:mount:fsname. Also, we may at some point want to allow that generated id to be overridden by one that's set explicitly.

> worry about!), and I don't see that netlink should have any more overhead than any other method of sending messages.

netlink is painful compared to uevents; look at dlm_controld/netlink.c, which uses the generic netlink interface to transfer a data structure from the kernel to userspace. A library would help, but there didn't seem to be a de facto netlink lib when I needed it; maybe that's changed.
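As a rough illustration of the id derivation described in this message, here is a minimal sketch. It assumes the standard zlib CRC-32; the thread does not say which CRC variant gfs_controld actually uses, and the helper names `mountgroup_id` and `effective_id` are hypothetical.

```python
import zlib


def mountgroup_id(fsname: str) -> int:
    """Hypothetical helper: CRC of the cpg name built from the fsname."""
    # The cpg name format "gfs:mount:<fsname>" comes from the thread.
    cpg_name = "gfs:mount:" + fsname
    # ASSUMPTION: plain zlib CRC-32; the real daemon may use another variant.
    return zlib.crc32(cpg_name.encode("utf-8")) & 0xFFFFFFFF


def effective_id(fsname: str, explicit_id=None) -> int:
    """Prefer an explicitly configured id, else fall back to the generated one."""
    if explicit_id is not None:
        return explicit_id
    return mountgroup_id(fsname)
```

The second function reflects the remark that a generated id might later be overridden by one that is set explicitly.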
Re: [Cluster-devel] gfs uevent and sysfs changes
On Fri, Dec 05, 2008 at 08:52:58AM -0600, David Teigland wrote:
> On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
> > In that case gfs2 should be able to generate the id itself from the fsname and it still doesn't need it passed in, even if it continues to expose the id in sysfs. Perhaps better still, it should be possible for David to generate the id directly if he really needs it from the fsname.
>
> It's not actually a crc of the fsname, but a crc of the cpg name gfs_controld creates for the mountgroup, which is gfs:mount:fsname. Also, we may at some point want to allow that generated id to be overridden by one that's set explicitly.

The fact that this id comes from gfs_controld, and becomes available only during mount, makes me think it's not well suited to be the statfs fsid. GFS should probably do its own thing for statfs (like a hash of just the fsname) instead of depending on gfs_controld for it. With nolock the daemons won't be there, and we'd still want the same fsid to be produced.
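To make the "hash of just the fsname" idea concrete, a small sketch follows. It assumes the fsid is represented as two 32-bit words (as in the Linux `fsid_t`) and uses SHA-1 purely as a stand-in hash; neither the hash choice nor the name `fsid_from_fsname` comes from the thread.

```python
import hashlib


def fsid_from_fsname(fsname: str):
    """Hypothetical: derive a two-word fsid from the fsname alone."""
    # ASSUMPTION: SHA-1 is only a placeholder; any stable hash would do.
    digest = hashlib.sha1(fsname.encode("utf-8")).digest()
    # Take the first 8 bytes and split them into two 32-bit values.
    word0 = int.from_bytes(digest[0:4], "big")
    word1 = int.from_bytes(digest[4:8], "big")
    return (word0, word1)
```

Because the input is only the fsname stored in the superblock, the same value comes out whether or not any daemon is running, which is the property wanted for nolock mounts.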
Re: [Cluster-devel] gfs uevent and sysfs changes
On Thu, Dec 4, 2008 at 5:38 PM, David Teigland [EMAIL PROTECTED] wrote:
> On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
> > ah, so just to make sure i'm with you here: (1) gfs_controld is generating this id-which-is-the-mountgroup-id, and (2) gfs_kernel will no longer receive this in the hostdata string, so (3) i can just rip out my in-kernel hostdata-parsing gunk and instead send in the mountgroup id on my own (i have my own up/downcall channel)? if i've got it right, then everything's a cinch and i'll shut up :)
>
> Yep. Generally, the best way to uniquely identify and refer to a gfs filesystem is using the fsname string (specified during mkfs with -t and saved in the superblock). But, sometimes it's just a lot easier to have a numerical identifier instead. I expect this is why you're using the id, and it's why we were using it for communicating about plocks.

yes, the numerical id gets used a lot in my pNFS stuff, where the kernel needs to make upcalls, of which some then get relayed over multicast -- so, I've just been stashing that in the superblock. thanks for clearing up my questions.

> In cluster1 and cluster2 the cluster infrastructure dynamically selected a unique id when needed, and it never worked great. In cluster3 the id is just a crc of the fsname string. Now that I think about this a bit more, there may be a reason to keep the id in the string. There was some interest on linux-kernel about better using the statfs fsid field, and this id is what gfs should be putting there.

interesting; that'd be cool. i've been meaning to look at statfs more often in my stuff anyway.

> > say, one tangential question (i won't be offended if you skip it - heh): is there a particular reason that you folks went with the uevent mechanism for doing upcalls? i'm just curious, given the seeming-complexity and possible overhead of using the whole layered netlink apparatus vs. something like Trond Myklebust's rpc_pipefs (don't let the rpc fool you; it's a barebones, dead-simple pipe). -- and no, i'm not selling anything :) my boss was asking for a list of differences between rpc_pipefs and uevents and the best i could come up with is the former's bidirectional. Trond mentioned the netlink overhead and i wondered if that was actually a significant factor or just lost in the noise in most cases.
>
> The uevents looked pretty simple when I was initially designing how the kernel/user interactions would work, and they fit well with sysfs files which I was using too. I don't think the overhead of using uevents is too bad. Sysfs files and uevents definitely don't work great if you need any kind of sophisticated bi-directional interface.

great, thanks -- always good to get folks' anecdotal advice and keep it in my toolbag for later.

cheers,
d.

> Dave
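For readers unfamiliar with what a daemon sees on the receiving end of a uevent, here is a minimal parsing sketch. Kernel uevents arrive over netlink as an "action@devpath" header followed by NUL-terminated KEY=VALUE strings; the function name `parse_uevent` and the sample payload in the usage note are made up for illustration and are not taken from gfs_controld.

```python
def parse_uevent(buf: bytes) -> dict:
    """Split a raw uevent buffer into its header and KEY=VALUE environment."""
    # Fields are NUL-terminated; drop the empty trailing piece(s).
    parts = [p for p in buf.split(b"\0") if p]
    if not parts:
        raise ValueError("empty uevent buffer")
    # The first field has the form "action@devpath".
    action, _, devpath = parts[0].decode("utf-8").partition("@")
    env = {}
    for field in parts[1:]:
        key, sep, value = field.decode("utf-8").partition("=")
        if sep:  # ignore fields that are not KEY=VALUE
            env[key] = value
    return {"action": action, "devpath": devpath, "env": env}
```

A call such as `parse_uevent(b"change@/fs/gfs2/demo\0ACTION=change\0SUBSYSTEM=gfs2\0")` yields the action, the devpath, and a dictionary of the environment strings. The message flows one way only, kernel to userspace, which matches the point above that uevents are a poor fit for a sophisticated bi-directional interface.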
Re: [Cluster-devel] gfs uevent and sysfs changes
On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
> Here are the compatibility aspects to the recent ideas about changes to the user/kernel interface between gfs (1 & 2) and gfs_controld.
>
> . gfs_controld can remove id from hostdata string in mount options

hi david, I know I'm a peripheral consumer of the cluster suite, but I thought I'd chime in and say that I am currently using the id as passed into the kernel in the hostdata string (I believe by mount.gfs2?) in my pNFS work. does the above "gfs_controld can remove id from hostdata string" comment refer to something orthogonal, or would it affect what gets stored in the superblock's hostdata at mount time?

..hm, sorry, I don't have the code right in front of me, but is that id in the hostdata string the same thing as the mountgroup id? if so, then my above worry about the hostdata string is moot, because if gfs_controld still has that info I can just make a downcall.

thanks,
d.

>   - no compat issues AFAICT
>
> . getting rid of id sysfs file from lock_dlm
>
>   - new gfs_controld old gfs-kernel
>     old kernel provides both block and id sysfs files
>     new daemon looks for block instead of id in sysfs
>
>   - old gfs_controld new gfs-kernel
>     old daemon looks for id sysfs file
>     new kernel needs to provide id as well as block sysfs files
>
>   Once everyone is using the new daemon, we can remove the id sysfs file from the kernel.
>
> . uevent strings to replace recover_done/recover_status sysfs files
>
>   - new gfs_controld old gfs-kernel
>     old kernel has recover sysfs files, and no new uevent strings
>     new daemon needs to look for either sysfs files or uevent strings
>
>   - old gfs_controld new gfs-kernel
>     old daemon looks for recover sysfs files, not new uevent strings
>     new kernel needs to provide both sysfs files and uevent strings
>
>   Once everyone is using new kernel and new daemon, we can remove the recover sysfs files from kernel, and daemon can stop looking for recover sysfs files.
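The "new daemon needs to look for either sysfs files or uevent strings" rule can be sketched as a simple fallback. This is only an illustration: the uevent key name `RECOVERY` and the directory layout are assumptions, and only the `recover_status` file name comes from the list above.

```python
import os
import tempfile


def recovery_status(uevent_env, fs_dir, uevent_key="RECOVERY"):
    """Return (source, value), preferring uevent strings over sysfs files."""
    # New kernel: the status arrives as a uevent string.
    # ASSUMPTION: "RECOVERY" is a placeholder key, not the real one.
    if uevent_key in uevent_env:
        return ("uevent", uevent_env[uevent_key])
    # Old kernel: fall back to the recover_status sysfs file.
    path = os.path.join(fs_dir, "recover_status")
    try:
        with open(path) as f:
            return ("sysfs", f.read().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # Simulate an old kernel with a temporary directory standing in for sysfs.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "recover_status"), "w") as f:
            f.write("0\n")
        print(recovery_status({}, d))
```

Once both sides are new, the sysfs branch becomes dead code and can be dropped, mirroring the removal plan in the quoted list.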
Re: [Cluster-devel] gfs uevent and sysfs changes
On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
> On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
> > Here are the compatibility aspects to the recent ideas about changes to the user/kernel interface between gfs (1 & 2) and gfs_controld.
> >
> > . gfs_controld can remove id from hostdata string in mount options
>
> hi david, I know I'm a peripheral consumer of the cluster suite, but I thought I'd chime in and say that I am currently using the id as passed into the kernel in the hostdata string (I believe by mount.gfs2?) in my pNFS work. does the above "gfs_controld can remove id from hostdata string" comment refer to something orthogonal, or would it affect what gets stored in the superblock's hostdata at mount time?

yes

> ..hm, sorry, I don't have the code right in front of me, but is that id in the hostdata string the same thing as the mountgroup id? if so, then my above worry about the hostdata string is moot, because if gfs_controld still has that info I can just make a downcall.

Yes, it's created in gfs_controld, and passed to mount.gfs via the hostdata string which is then passed into the kernel during mount(2).

Previously, gfs-kernel (lock_dlm actually) would pass this id back up to gfs_controld within the plock op structures. This was because plock ops for all gfs fs's were funnelled to gfs_controld through a single misc device. gfs_controld would match the op to a particular fs using the id. The dlm does this now, using the lockspace id.

Dave
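The id-based matching described in this message amounts to a lookup table keyed by the numeric id. Below is a minimal sketch of that idea; the class and field names (`PlockOp`, `PlockDemux`, `fs_id`) are hypothetical and do not reflect the actual plock op structures.

```python
from dataclasses import dataclass


@dataclass
class PlockOp:
    """Hypothetical stand-in for a plock op read from the shared device."""
    fs_id: int   # numeric id carried in the op
    optype: str  # e.g. "lock" or "unlock" (illustrative values)


class PlockDemux:
    """Route ops arriving on one shared channel to the right filesystem."""

    def __init__(self):
        self._fs_by_id = {}

    def register(self, fs_id: int, fsname: str) -> None:
        # Called at mount time, when the id for this fs becomes known.
        self._fs_by_id[fs_id] = fsname

    def route(self, op: PlockOp):
        # Returns the fsname for the op, or None if the id is unknown.
        return self._fs_by_id.get(op.fs_id)
```

With a single device shared by every mounted fs, the id inside each op is the only thing that tells the daemon which filesystem the op belongs to, which is why it had to travel back up from the kernel.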
Re: [Cluster-devel] gfs uevent and sysfs changes
On Thu, Dec 4, 2008 at 4:07 PM, David Teigland [EMAIL PROTECTED] wrote:
> On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
> > On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
> > > Here are the compatibility aspects to the recent ideas about changes to the user/kernel interface between gfs (1 & 2) and gfs_controld.
> > >
> > > . gfs_controld can remove id from hostdata string in mount options
> >
> > hi david, I know I'm a peripheral consumer of the cluster suite, but I thought I'd chime in and say that I am currently using the id as passed into the kernel in the hostdata string (I believe by mount.gfs2?) in my pNFS work. does the above "gfs_controld can remove id from hostdata string" comment refer to something orthogonal, or would it affect what gets stored in the superblock's hostdata at mount time?
>
> yes
>
> > ..hm, sorry, I don't have the code right in front of me, but is that id in the hostdata string the same thing as the mountgroup id? if so, then my above worry about the hostdata string is moot, because if gfs_controld still has that info I can just make a downcall.
>
> Yes, it's created in gfs_controld, and passed to mount.gfs via the hostdata string which is then passed into the kernel during mount(2).

ah, so just to make sure i'm with you here: (1) gfs_controld is generating this id-which-is-the-mountgroup-id, and (2) gfs_kernel will no longer receive this in the hostdata string, so (3) i can just rip out my in-kernel hostdata-parsing gunk and instead send in the mountgroup id on my own (i have my own up/downcall channel)? if i've got it right, then everything's a cinch and i'll shut up :)

say, one tangential question (i won't be offended if you skip it - heh): is there a particular reason that you folks went with the uevent mechanism for doing upcalls? i'm just curious, given the seeming-complexity and possible overhead of using the whole layered netlink apparatus vs. something like Trond Myklebust's rpc_pipefs (don't let the rpc fool you; it's a barebones, dead-simple pipe). -- and no, i'm not selling anything :) my boss was asking for a list of differences between rpc_pipefs and uevents and the best i could come up with is the former's bidirectional. Trond mentioned the netlink overhead and i wondered if that was actually a significant factor or just lost in the noise in most cases.

thanks again,
d.

> Previously, gfs-kernel (lock_dlm actually) would pass this id back up to gfs_controld within the plock op structures. This was because plock ops for all gfs fs's were funnelled to gfs_controld through a single misc device. gfs_controld would match the op to a particular fs using the id. The dlm does this now, using the lockspace id.
>
> Dave