Re: [UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
On Fri, 2006-10-03 at 22:09 +0530, Balbir Singh wrote: On Fri, Mar 10, 2006 at 09:53:53AM -0500, jamal wrote: On kernel-user (in the case of response to #a or async notifiation as in #b) you really dont need to specify the TG/PID since they appear in the STATS etc. I see your point now. I am looking at other users of netlink like rtnetlink and I see the classical usage. We can implement TLV's in our code, but for the most part the data we exchange between the user - kernel has all the TLV's listed in the enum above. The major differnece is the type (pid/tgid). Hence we created a structure (taskstats) instead of using TLV's. Something to remember: 1) TLVs are essentially giving you the flexibility to send optionally appearing elements. It is up to the receiver (in the kernel or user space) to check for the presence of mandatory elements or execute things depending on the presence of certain TLVs. Example in your case: if the tgid TLV appears then the user is requesting for that TLV if the pid appears then they are requesting for that if both appear then it is the of the two. You should always ignore TLVs you dont understand - to allow for forward compatibility. 2) The T part is essentially also encoding (semantically) what size the value is; the L part is useful for validation. So the receiver will always know what the size of the TLV is by definition and uses the L to make sure it is the right size. Reject what is of the wrong size. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
On Thu, 2006-09-03 at 20:07 +0530, Balbir Singh wrote: Please find the latest version of the patch for review. The genetlink code has been updated as per your review comments. The changelog is provided below 1. Eliminated TASKSTATS_CMD_LISTEN and TASKSTATS_CMD_IGNORE 2. Provide generic functions called genlmsg_data() and genlmsg_len() in linux/net/genetlink.h 3. Do not multicast all replies, multicast only events generated due to task exit. 4. The taskstats and taskstats_reply structures are now 64 bit aligned. 5. Family id is dynamically generated. Please let us know if we missed something out. Design still shaky IMO - now that i think i may understand what your end goal is. Using the principles i described in earlier email, the problem you are trying to solve is: a) shipping of the taskstats from kernel to user-space asynchronously to all listeners on multicast channel/group TASKSTATS_LISTEN_GRP at the point when some process exits. b) responding to queries issued by the user to the kernel for taskstats of a particular defined tgid and/or pid combination. Did i summarize your goals correctly? So lets stat with #b: i) the message is multicast; there has to be a user space app registered to the multicast group otherwise nothing goes to user space. ii) user space issues a GET and it seems to me the appropriate naming for the response is a NEW. Lets go to #a: The issued multicast messages are also NEW and no different from the ones sent in response to a GET. Having said that then, you have the following commands: enum { TASKSTATS_CMD_UNSPEC, /* Reserved */ TASKSTATS_CMD_GET, /* user - kernel query*/ TASKSTATS_CMD_NEW, /* kernel - user update */ }; You also need the following TLVs enum { TASKSTATS_TYPE_UNSPEC, /* Reserved */ TASKSTATS_TYPE_TGID, /* The TGID */ TASKSTATS_TYPE_PID, /* The PID */ TASKSTATS_TYPE_STATS,/* carries the taskstats */ TASKSTATS_TYPE_VERS, /* carries the version */ }; Refer to the doc i passed you and provide feedback if how to use the above is not obvious. The use of TLVs above implies that any of these can be optionally appearing. So when you are going from user-kernel with a GET a in #a above, then you can specify the PID and/or TGID and you dont need to specify the STATS and this would be perfectly legal. On kernel-user (in the case of response to #a or async notifiation as in #b) you really dont need to specify the TG/PID since they appear in the STATS etc. I take it you dont want to configure the values of taskstats from user space, otherwise user-kernel will send a NEW as well. I also take it dumping doesnt apply to you, so you dont need a dump callback in your kernel code. From what i described so far, you dont really need a header for yourself either (maybe you need one to just store the VERSION?) I didnt understand the point to the err field you had in the reply. Netlink already does issue errors which can be read via perror. If this is different from what you need, it may be an excuse to have your own header. I hope this helps. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
On Fri, 2006-10-03 at 09:53 -0500, jamal wrote: a) shipping of the taskstats from kernel to user-space asynchronously to all listeners on multicast channel/group TASKSTATS_LISTEN_GRP at the point when some process exits. b) responding to queries issued by the user to the kernel for taskstats of a particular defined tgid and/or pid combination. Did i summarize your goals correctly? So lets stat with #b: i) the message is multicast; there has to be a user space app registered to the multicast group otherwise nothing goes to user space. I mispoke: The above applies to #a. For #b, the message from/to kernel to user is unicast. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
On Fri, Mar 10, 2006 at 09:53:53AM -0500, jamal wrote: On Thu, 2006-09-03 at 20:07 +0530, Balbir Singh wrote: Please let us know if we missed something out. Design still shaky IMO - now that i think i may understand what your end goal is. Using the principles i described in earlier email, the problem you are trying to solve is: a) shipping of the taskstats from kernel to user-space asynchronously to all listeners on multicast channel/group TASKSTATS_LISTEN_GRP at the point when some process exits. b) responding to queries issued by the user to the kernel for taskstats of a particular defined tgid and/or pid combination. Did i summarize your goals correctly? Yes, you did. So lets stat with #b: i) the message is multicast; there has to be a user space app registered to the multicast group otherwise nothing goes to user space. ii) user space issues a GET and it seems to me the appropriate naming for the response is a NEW. Lets go to #a: The issued multicast messages are also NEW and no different from the ones sent in response to a GET. Having said that then, you have the following commands: enum { TASKSTATS_CMD_UNSPEC, /* Reserved */ TASKSTATS_CMD_GET, /* user - kernel query*/ TASKSTATS_CMD_NEW, /* kernel - user update */ }; You also need the following TLVs enum { TASKSTATS_TYPE_UNSPEC, /* Reserved */ TASKSTATS_TYPE_TGID, /* The TGID */ TASKSTATS_TYPE_PID, /* The PID */ TASKSTATS_TYPE_STATS, /* carries the taskstats */ TASKSTATS_TYPE_VERS, /* carries the version */ }; Refer to the doc i passed you and provide feedback if how to use the above is not obvious. I will look at the document, just got hold of it. The use of TLVs above implies that any of these can be optionally appearing. So when you are going from user-kernel with a GET a in #a above, then you can specify the PID and/or TGID and you dont need to specify the STATS and this would be perfectly legal. On kernel-user (in the case of response to #a or async notifiation as in #b) you really dont need to specify the TG/PID since they appear in the STATS etc. I see your point now. I am looking at other users of netlink like rtnetlink and I see the classical usage. We can implement TLV's in our code, but for the most part the data we exchange between the user - kernel has all the TLV's listed in the enum above. The major differnece is the type (pid/tgid). Hence we created a structure (taskstats) instead of using TLV's. I take it you dont want to configure the values of taskstats from user space, otherwise user-kernel will send a NEW as well. Your understanding is correct. I also take it dumping doesnt apply to you, so you dont need a dump callback in your kernel code. Yes, this is correct as well. From what i described so far, you dont really need a header for yourself either (maybe you need one to just store the VERSION?) True, we do not need a header. I didnt understand the point to the err field you had in the reply. Netlink already does issue errors which can be read via perror. If this is different from what you need, it may be an excuse to have your own header. Hmm.. Will look into this. I hope this helps. Yes, it does immensely. Thanks for the detailed feedback. cheers, jamal Warm Regards, Balbir - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
Thanks for the clarification of the usage model. While our needs are certainly much less complex, it is useful to know the range of options. There are no hard rules on what you need to be multicasting and as an example you could send periodic(aka time based) samples from the kernel on a multicast channel and that would be received by all. It did seem odd that you want to have a semi-promiscous mode where a response to a GET is multicast. If that is still what you want to achieve, then you should. Also if you can provide feedback whether the doc i sent was any use and what wasnt clear etc. also take a look at the excellent documentation Thomas Graf has put in the kernel for all the utilities for manipulating netlink messages and tell me if that should also be put in this doc (It is listed as a TODO). Hello, Jamal, Please find the latest version of the patch for review. The genetlink code has been updated as per your review comments. The changelog is provided below 1. Eliminated TASKSTATS_CMD_LISTEN and TASKSTATS_CMD_IGNORE 2. Provide generic functions called genlmsg_data() and genlmsg_len() in linux/net/genetlink.h 3. Do not multicast all replies, multicast only events generated due to task exit. 4. The taskstats and taskstats_reply structures are now 64 bit aligned. 5. Family id is dynamically generated. Please let us know if we missed something out. Thanks, Balbir Signed-off-by: Shailabh Nagar [EMAIL PROTECTED] Signed-off-by: Balbir Singh [EMAIL PROTECTED] --- include/linux/delayacct.h |2 include/linux/taskstats.h | 128 include/net/genetlink.h | 20 +++ init/Kconfig | 16 ++- kernel/Makefile |1 kernel/delayacct.c| 56 ++ kernel/taskstats.c| 244 ++ 7 files changed, 464 insertions(+), 3 deletions(-) diff -puN include/linux/delayacct.h~delayacct-genetlink include/linux/delayacct.h --- linux-2.6.16-rc5/include/linux/delayacct.h~delayacct-genetlink 2006-03-09 17:15:31.0 +0530 +++ linux-2.6.16-rc5-balbir/include/linux/delayacct.h 2006-03-09 17:15:31.0 +0530 @@ -15,6 +15,7 @@ #define _LINUX_TASKDELAYS_H #include linux/sched.h +#include linux/taskstats.h #ifdef CONFIG_TASK_DELAY_ACCT extern int delayacct_on; /* Delay accounting turned on/off */ @@ -24,6 +25,7 @@ extern void __delayacct_tsk_init(struct extern void __delayacct_tsk_exit(struct task_struct *); extern void __delayacct_blkio(void); extern void __delayacct_swapin(void); +extern int delayacct_add_tsk(struct taskstats_reply *, struct task_struct *); static inline void delayacct_tsk_init(struct task_struct *tsk) { diff -puN /dev/null include/linux/taskstats.h --- /dev/null 2004-06-24 23:34:38.0 +0530 +++ linux-2.6.16-rc5-balbir/include/linux/taskstats.h 2006-03-09 19:28:54.0 +0530 @@ -0,0 +1,128 @@ +/* taskstats.h - exporting per-task statistics + * + * Copyright (C) Shailabh Nagar, IBM Corp. 2006 + * (C) Balbir Singh, IBM Corp. 2006 + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + */ + +#ifndef _LINUX_TASKSTATS_H +#define _LINUX_TASKSTATS_H + +/* Format for per-task data returned to userland when + * - a task exits + * - listener requests stats for a task + * + * The struct is versioned. Newer versions should only add fields to + * the bottom of the struct to maintain backward compatibility. + * + * To create the next version, bump up the taskstats_version variable + * and delineate the start of newly added fields with a comment indicating + * the version number. + */ + +#define TASKSTATS_VERSION 1 + +struct taskstats { + /* Maintain 64-bit alignment while extending */ + + /* Version 1 */ +#define TASKSTATS_NOPID-1 + __s64 pid; + __s64 tgid; + + /* XXX_count is number of delay values recorded. +* XXX_total is corresponding cumulative delay in nanoseconds +*/ + +#define TASKSTATS_NOCPUSTATS 1 + __u64 cpu_count; + __u64 cpu_delay_total;/* wait, while runnable, for cpu */ + __u64 blkio_count; + __u64 blkio_delay_total; /* sync,block io completion wait*/ + __u64 swapin_count; + __u64 swapin_delay_total; /* swapin page fault wait*/ + + __u64 cpu_run_total; /* cpu running time +* no count available/provided */ +}; + + +#define TASKSTATS_LISTEN_GROUP 0x1 + +/* + * Commands sent from userspace + * Not versioned. New commands should only be inserted at the enum's end +
Re: [UPDATED PATCH] Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
Balbir Singh wrote: snip Hello, Jamal, Please find the latest version of the patch for review. The genetlink code has been updated as per your review comments. The changelog is provided below 1. Eliminated TASKSTATS_CMD_LISTEN and TASKSTATS_CMD_IGNORE 2. Provide generic functions called genlmsg_data() and genlmsg_len() in linux/net/genetlink.h Balbir, it might be a good idea to split 2. out separately, since it has generic value beyond the delay accounting patches (just like we did for the timespec_diff_ns change) Thanks, Shailabh 3. Do not multicast all replies, multicast only events generated due to task exit. 4. The taskstats and taskstats_reply structures are now 64 bit aligned. 5. Family id is dynamically generated. Please let us know if we missed something out. Thanks, Balbir Signed-off-by: Shailabh Nagar [EMAIL PROTECTED] Signed-off-by: Balbir Singh [EMAIL PROTECTED] --- include/linux/delayacct.h |2 include/linux/taskstats.h | 128 include/net/genetlink.h | 20 +++ init/Kconfig | 16 ++- kernel/Makefile |1 kernel/delayacct.c| 56 ++ kernel/taskstats.c| 244 ++ 7 files changed, 464 insertions(+), 3 deletions(-) snip - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html