Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-05 Thread David Teigland
On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
 In that case gfs2 should be able to generate the id itself from the
 fsname and it still doesn't need it passed in, even if it continues to
 expose the id in sysfs.
 
 Perhaps better still, it should be possible for David to generate the id
 directly if he really needs it from the fsname.

It's not actually a crc of the fsname, but a crc of the cpg name
gfs_controld creates for the mountgroup, which is gfs:mount:fsname.
Also, we may at some point want to allow that generated id to be overridden
by one that's set explicitly.
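
A minimal sketch of the id derivation described above, in Python. It assumes zlib's CRC-32 stands in for whatever crc routine gfs_controld actually uses, so the numeric values will not necessarily match the daemon's. The function names and the `override` parameter are hypothetical, added only to illustrate the "set explicitly" case.

```python
import zlib
from typing import Optional

# Prefix of the cpg name as quoted in the thread: gfs:mount:<fsname>
CPG_PREFIX = "gfs:mount:"


def mountgroup_cpg_name(fsname: str) -> str:
    """Build the cpg name for a mountgroup from the fsname."""
    return CPG_PREFIX + fsname


def mountgroup_id(fsname: str, override: Optional[int] = None) -> int:
    """Return a 32-bit id for the mountgroup.

    Assumption: zlib.crc32 is used as a stand-in crc; the real daemon's
    routine may differ. An explicit override, if given, wins.
    """
    if override is not None:
        return override & 0xFFFFFFFF
    return zlib.crc32(mountgroup_cpg_name(fsname).encode("utf-8")) & 0xFFFFFFFF
```

Because the input is the cpg name rather than the bare fsname, anything that wants to reproduce the id independently has to know the prefix as well as the crc routine.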

 worry about!), and I don't see that netlink should have any more
 overhead than any other method of sending messages.

netlink is painful compared to uevents, look at dlm_controld/netlink.c
which uses the generic netlink interface to transfer a data structure
from the kernel to userspace.  A library would help, but there didn't seem
to be a de facto netlink lib when I needed it, maybe that's changed.
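
For comparison, a rough sketch of why the uevent side is simple to consume: the kernel's uevent datagram is a header of the form action@devpath followed by NUL-separated KEY=VALUE strings, so a parser needs no library. The sample path and the RECOVERY key in the usage below are illustrative assumptions, not taken from this thread.

```python
from typing import Dict, Tuple


def parse_uevent(datagram: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Split a kernel uevent datagram into (action, devpath, env).

    The datagram is a sequence of NUL-terminated strings; the first is
    "action@devpath", the rest are KEY=VALUE pairs.
    """
    fields = [f.decode("utf-8", "replace") for f in datagram.split(b"\0") if f]
    if not fields or "@" not in fields[0]:
        raise ValueError("not a kernel uevent datagram")
    action, _, devpath = fields[0].partition("@")
    env: Dict[str, str] = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            env[key] = value
    return action, devpath, env


# Illustrative message; the path and keys are hypothetical examples.
sample = b"\0".join([
    b"change@/fs/gfs2/alpha:data",
    b"ACTION=change",
    b"DEVPATH=/fs/gfs2/alpha:data",
    b"SUBSYSTEM=gfs2",
    b"RECOVERY=Done",
]) + b"\0"
action, devpath, env = parse_uevent(sample)
```

The trade-off is that this is one-way, text-only signalling; moving a binary data structure in either direction is where generic netlink comes in.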



Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-05 Thread David Teigland
On Fri, Dec 05, 2008 at 08:52:58AM -0600, David Teigland wrote:
 On Fri, Dec 05, 2008 at 09:51:45AM +, Steven Whitehouse wrote:
  In that case gfs2 should be able to generate the id itself from the
  fsname and it still doesn't need it passed in, even if it continues to
  expose the id in sysfs.
  
  Perhaps better still, it should be possible for David to generate the id
  directly if he really needs it from the fsname.
 
 It's not actually a crc of the fsname, but a crc of the cpg name
 gfs_controld creates for the mountgroup, which is gfs:mount:fsname.
 Also, we may at some point want to allow that generated id to be overridden
 by one that's set explicitly.

The fact that this id comes from gfs_controld, and becomes available only
during mount, makes me think it's not well suited to be the statfs fsid.
GFS should probably do its own thing for statfs (like a hash of just the
fsname) instead of depending on gfs_controld for it.  With nolock the
daemons won't be there, and we'd still want the same fsid to be produced.
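
A small sketch of the "hash of just the fsname" idea, assuming SHA-1 as the hash and a two-word fsid like the one statfs(2) reports (f_fsid holds two ints). The hash choice and the function name are assumptions for illustration only; the point is that the result depends on nothing but the fsname, so it is the same with or without the daemons.

```python
import hashlib
from typing import Tuple


def fsid_from_fsname(fsname: str) -> Tuple[int, int]:
    """Derive a stable two-word fsid from the fsname alone.

    Assumption: SHA-1 is an arbitrary stand-in hash; only the first
    8 bytes of the digest are used, split into two 32-bit words.
    """
    digest = hashlib.sha1(fsname.encode("utf-8")).digest()
    return (
        int.from_bytes(digest[0:4], "big"),
        int.from_bytes(digest[4:8], "big"),
    )
```

The same fsname always yields the same pair, regardless of lock protocol or mount order.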




Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-05 Thread david m. richter
On Thu, Dec 4, 2008 at 5:38 PM, David Teigland [EMAIL PROTECTED] wrote:
 On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
 ah, so just to make sure i'm with you here: (1) gfs_controld is
 generating this id-which-is-the-mountgroup-id, and (2) gfs_kernel
 will no longer receive this in the hostdata string, so (3) i can just
 rip out my in-kernel hostdata-parsing gunk and instead send in the
 mountgroup id on my own (i have my own up/downcall channel)?  if i've
 got it right, then everything's a cinch and i'll shut up :)

 Yep.  Generally, the best way to uniquely identify and refer to a gfs
 filesystem is using the fsname string (specified during mkfs with -t and
 saved in the superblock).  But, sometimes it's just a lot easier to have a
 numerical identifier instead.  I expect this is why you're using the id,
 and it's why we were using it for communicating about plocks.

yes, the numerical id gets used a lot in my pNFS stuff, where the
kernel needs to make upcalls, of which some then get relayed over
multicast -- so, I've just been stashing that in the superblock.
thanks for clearing up my questions.


 In cluster1 and cluster2 the cluster infrastructure dynamically selected a
 unique id when needed, and it never worked great.  In cluster3 the id is
 just a crc of the fsname string.

 Now that I think about this a bit more, there may be a reason to keep the
 id in the string.  There was some interest on linux-kernel about better
 using the statfs fsid field, and this id is what gfs should be putting
 there.

interesting; that'd be cool.  i've been meaning to look at statfs more
often in my stuff anyway.


 say, one tangential question (i won't be offended if you skip it -
 heh): is there a particular reason that you folks went with the uevent
 mechanism for doing upcalls?  i'm just curious, given the
 seeming-complexity and possible overhead of using the whole layered
 netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
 (don't let the rpc fool you; it's a barebones, dead-simple pipe).
 -- and no, i'm not selling anything :)  my boss was asking for a list
 of differences between rpc_pipefs and uevents and the best i could
 come up with is the former's bidirectional.  Trond mentioned the
 netlink overhead and i wondered if that was actually a significant
 factor or just lost in the noise in most cases.

 The uevents looked pretty simple when I was initially designing how the
 kernel/user interactions would work, and they fit well with sysfs files
 which I was using too.  I don't think the overhead of using uevents is too
 bad.  Sysfs files and uevents definitely don't work great if you need any
 kind of sophisticated bi-directional interface.

great, thanks -- always good to get folks' anecdotal advice and keep
it in my toolbag for later.

cheers,

  d
  .


 Dave





Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-04 Thread david m. richter
On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
 Here are the compatibility aspects to the recent ideas about changes to
 the user/kernel interface between gfs (1 & 2) and gfs_controld.

 . gfs_controld can remove id from hostdata string in mount options

hi david,

I know I'm a peripheral consumer of the cluster suite, but I thought
I'd chime in and say that I am currently using the id as passed into
the kernel in the hostdata string (I believe by mount.gfs2?) in my
pNFS work.  does the above "gfs_controld can remove id from hostdata
string" comment refer to something orthogonal, or would it affect what
gets stored in the superblock's hostdata at mount time?

..hm, sorry, I don't have the code right in front of me, but is that
id in the hostdata string the same thing as the mountgroup id?  if
so, then my above worry about the hostdata string is moot, because if
gfs_controld still has that info I can just make a downcall.

thanks,

  d
  .


  - no compat issues AFAICT

 . getting rid of id sysfs file from lock_dlm

  - new gfs_controld, old gfs-kernel
    old kernel provides both block and id sysfs files
    new daemon looks for block instead of id in sysfs

  - old gfs_controld, new gfs-kernel
    old daemon looks for id sysfs file
    new kernel needs to provide id as well as block sysfs files

  Once everyone is using the new daemon, we can remove the id sysfs
  file from the kernel.

 . uevent strings to replace recover_done/recover_status sysfs files

  - new gfs_controld, old gfs-kernel
    old kernel has recover sysfs files, and no new uevent strings
    new daemon needs to look for either sysfs files or uevent strings

  - old gfs_controld, new gfs-kernel
    old daemon looks for recover sysfs files, not new uevent strings
    new kernel needs to provide both sysfs files and uevent strings

  Once everyone is using new kernel and new daemon, we can remove
  the recover sysfs files from kernel, and daemon can stop looking for
  recover sysfs files.
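
The "new daemon, old kernel" case for recovery can be sketched roughly as follows: prefer the uevent string when the kernel sent one, otherwise fall back to the sysfs file. The RECOVERY key name, the directory argument and the function names are assumptions for illustration; only the recover_status file name comes from the list above.

```python
import os
from typing import Dict, Optional, Tuple


def read_sysfs(fs_dir: str, name: str) -> Optional[str]:
    """Return the stripped contents of a sysfs-style file, or None if absent."""
    try:
        with open(os.path.join(fs_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return None


def recovery_result(uevent_env: Dict[str, str],
                    fs_dir: str) -> Optional[Tuple[str, str]]:
    """Find the recovery result from whichever interface the kernel offers.

    New kernels: a uevent string (hypothetical key "RECOVERY").
    Old kernels: the recover_status sysfs file.
    Returns (source, value), or None if neither is available.
    """
    if "RECOVERY" in uevent_env:
        return ("uevent", uevent_env["RECOVERY"])
    status = read_sysfs(fs_dir, "recover_status")
    if status is not None:
        return ("sysfs", status)
    return None
```

Once only new kernels remain, the sysfs branch is the part that can be deleted from the daemon.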






Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-04 Thread David Teigland
On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
 On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
  Here are the compatibility aspects to the recent ideas about changes to
  the user/kernel interface between gfs (1 & 2) and gfs_controld.
 
  . gfs_controld can remove id from hostdata string in mount options
 
 hi david,
 
 I know I'm a peripheral consumer of the cluster suite, but I thought
 I'd chime in and say that I am currently using the id as passed into
 the kernel in the hostdata string (I believe by mount.gfs2?) in my
 pNFS work.  does the above "gfs_controld can remove id from hostdata
 string" comment refer to something orthogonal, or would it affect what
 gets stored in the superblock's hostdata at mount time?

yes

 ..hm, sorry, I don't have the code right in front of me, but is that
 id in the hostdata string the same thing as the mountgroup id?  if
 so, then my above worry about the hostdata string is moot, because if
 gfs_controld still has that info I can just make a downcall.

Yes, it's created in gfs_controld, and passed to mount.gfs via the
hostdata string which is then passed into the kernel during mount(2).

Previously, gfs-kernel (lock_dlm actually) would pass this id back up to
gfs_controld within the plock op structures.  This was because plock ops
for all gfs fs's were funnelled to gfs_controld through a single misc
device.  gfs_controld would match the op to a particular fs using the id.

The dlm does this now, using the lockspace id.
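
The earlier arrangement, where ops for every filesystem arrived on one device and were matched to a filesystem by id, amounts to a lookup like the sketch below. The PlockOp fields and the function name are hypothetical and much simpler than the real op structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PlockOp:
    fsid: int    # id identifying the filesystem the op belongs to
    inode: int   # hypothetical field: inode the lock applies to


def route_ops(ops: List[PlockOp],
              mounts: Dict[int, str]) -> Tuple[Dict[str, List[PlockOp]], List[PlockOp]]:
    """Group ops from a single shared channel by filesystem name.

    mounts maps id -> fsname. Ops whose id matches no mounted
    filesystem are returned separately instead of being dropped.
    """
    routed: Dict[str, List[PlockOp]] = {}
    unknown: List[PlockOp] = []
    for op in ops:
        name = mounts.get(op.fsid)
        if name is None:
            unknown.append(op)
        else:
            routed.setdefault(name, []).append(op)
    return routed, unknown
```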

Dave



Re: [Cluster-devel] gfs uevent and sysfs changes

2008-12-04 Thread david m. richter
On Thu, Dec 4, 2008 at 4:07 PM, David Teigland [EMAIL PROTECTED] wrote:
 On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
 On Mon, Dec 1, 2008 at 12:31 PM, David Teigland [EMAIL PROTECTED] wrote:
  Here are the compatibility aspects to the recent ideas about changes to
  the user/kernel interface between gfs (1 & 2) and gfs_controld.
 
  . gfs_controld can remove id from hostdata string in mount options

 hi david,

 I know I'm a peripheral consumer of the cluster suite, but I thought
 I'd chime in and say that I am currently using the id as passed into
 the kernel in the hostdata string (I believe by mount.gfs2?) in my
 pNFS work.  does the above "gfs_controld can remove id from hostdata
 string" comment refer to something orthogonal, or would it affect what
 gets stored in the superblock's hostdata at mount time?

 yes

 ..hm, sorry, I don't have the code right in front of me, but is that
 id in the hostdata string the same thing as the mountgroup id?  if
 so, then my above worry about the hostdata string is moot, because if
 gfs_controld still has that info I can just make a downcall.

 Yes, it's created in gfs_controld, and passed to mount.gfs via the
 hostdata string which is then passed into the kernel during mount(2).

ah, so just to make sure i'm with you here: (1) gfs_controld is
generating this id-which-is-the-mountgroup-id, and (2) gfs_kernel
will no longer receive this in the hostdata string, so (3) i can just
rip out my in-kernel hostdata-parsing gunk and instead send in the
mountgroup id on my own (i have my own up/downcall channel)?  if i've
got it right, then everything's a cinch and i'll shut up :)

say, one tangential question (i won't be offended if you skip it -
heh): is there a particular reason that you folks went with the uevent
mechanism for doing upcalls?  i'm just curious, given the
seeming-complexity and possible overhead of using the whole layered
netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
(don't let the rpc fool you; it's a barebones, dead-simple pipe).
-- and no, i'm not selling anything :)  my boss was asking for a list
of differences between rpc_pipefs and uevents and the best i could
come up with is the former's bidirectional.  Trond mentioned the
netlink overhead and i wondered if that was actually a significant
factor or just lost in the noise in most cases.

thanks again,

  d
  .

 Previously, gfs-kernel (lock_dlm actually) would pass this id back up to
 gfs_controld within the plock op structures.  This was because plock ops
 for all gfs fs's were funnelled to gfs_controld through a single misc
 device.  gfs_controld would match the op to a particular fs using the id.

 The dlm does this now, using the lockspace id.

 Dave