Re: [3/5] pseries: Create device hotplug entry point
On 09/22/2014 08:15 PM, Tyrel Datwyler wrote: On 09/17/2014 12:15 PM, Nathan Fontenot wrote: On 09/17/2014 02:07 AM, Michael Ellerman wrote: On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote: For pseries system the kernel will be notified of hotplug requests in the form of rtas hotplug events. Can you flesh that design out a bit for me, I don't entirely get how it's going to work. The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to userspace, which then writes them back in ? This patch creates a common routine that can handle these requests in both the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also ^ creates the initial memory hotplug request handling stub. For PowerVM this patch also creates a new /proc file that the drmgr command will use to write rtas hotplug events to. Why is this different between phyp and KVM? For future PowerKVM handling the rtas check-exception code can pass any rtas hotplug events received to handle_dlpar_errorlog(). Internally to the kernel you mean? Perhaps a better explanation of how things work today and where I see them going is needed. I was trying to avoid a long explanation and I don't think my shortened explanation worked. I'll include this in v2 of the patchset too. The current hotplug (or dlpar) of devices (the process is generally the same for memory, cpu, and pci) on PowerVM systems is initiated from the HMC, which communicates the request to the partitions through the RSCT framework. The RSCT framework then invokes the drmgr command. The drmgr command performs the hotplug operation by doing some pieces, such as most of the rtas calls and device tree parsing, in userspace and make requests to the kernel to online/offline the device, update the device tree and add/remove the device. For PowerKVM the approach is to follow what is currently being done for pci hotplug. A hotplug request is initiated from the host. QEMU then sends an EPOW interrupt to the guest which causes the guest to make the rtas,check-exception call. In QEMU, the rtas,check-exception call returns a rtas hotplug event to the guest. I was using this same framework to also enable memory (and next cpu) hotplug. You are correct that the current pci hotplug path for PowerKVM involves the kernel receiving the rtas event, passing it to rtas_errd in userspace, and having rtas_errd invoke drmgr. The drmgr command then handles the request as described above for PowerVM systems. There is no need for this circuitous route, we should just handle the entire hotplug of devices in the kernel. What I am hoping to do is to enable this by moving the code to handle hotplug from drmgr into the kernel and provide a single path for handling hotplug for PowerVM and PowerKVM. To make this work for PowerKVM we will update the kernel rtas code to recognize rtas hotplug events returned from rtas,check-exception calls and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out to userspace. Wouldn't we still want the event surfaced to userspace so that it can at least be logged? The only logging of hotplug/dlpar events we do is putting a notification iv /var/log/messages. This is done today by the drmgr command. I can add a pr_info message to log the hotplug/dlpar request and it's success/failure. Also, I believe one of the longer term goals is to not require the rtas_errd daemon for PowerKVM. -Nathan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [3/5] pseries: Create device hotplug entry point
On 09/17/2014 12:15 PM, Nathan Fontenot wrote: On 09/17/2014 02:07 AM, Michael Ellerman wrote: On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote: For pseries system the kernel will be notified of hotplug requests in the form of rtas hotplug events. Can you flesh that design out a bit for me, I don't entirely get how it's going to work. The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to userspace, which then writes them back in ? This patch creates a common routine that can handle these requests in both the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also ^ creates the initial memory hotplug request handling stub. For PowerVM this patch also creates a new /proc file that the drmgr command will use to write rtas hotplug events to. Why is this different between phyp and KVM? For future PowerKVM handling the rtas check-exception code can pass any rtas hotplug events received to handle_dlpar_errorlog(). Internally to the kernel you mean? Perhaps a better explanation of how things work today and where I see them going is needed. I was trying to avoid a long explanation and I don't think my shortened explanation worked. I'll include this in v2 of the patchset too. The current hotplug (or dlpar) of devices (the process is generally the same for memory, cpu, and pci) on PowerVM systems is initiated from the HMC, which communicates the request to the partitions through the RSCT framework. The RSCT framework then invokes the drmgr command. The drmgr command performs the hotplug operation by doing some pieces, such as most of the rtas calls and device tree parsing, in userspace and make requests to the kernel to online/offline the device, update the device tree and add/remove the device. For PowerKVM the approach is to follow what is currently being done for pci hotplug. A hotplug request is initiated from the host. QEMU then sends an EPOW interrupt to the guest which causes the guest to make the rtas,check-exception call. In QEMU, the rtas,check-exception call returns a rtas hotplug event to the guest. I was using this same framework to also enable memory (and next cpu) hotplug. You are correct that the current pci hotplug path for PowerKVM involves the kernel receiving the rtas event, passing it to rtas_errd in userspace, and having rtas_errd invoke drmgr. The drmgr command then handles the request as described above for PowerVM systems. There is no need for this circuitous route, we should just handle the entire hotplug of devices in the kernel. What I am hoping to do is to enable this by moving the code to handle hotplug from drmgr into the kernel and provide a single path for handling hotplug for PowerVM and PowerKVM. To make this work for PowerKVM we will update the kernel rtas code to recognize rtas hotplug events returned from rtas,check-exception calls and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out to userspace. Wouldn't we still want the event surfaced to userspace so that it can at least be logged? -Tyrel For PowerVM systems, I created the /proc/powerpc/dlpar file that a rtas hotplug event can be written to and passed to handle_dlpar_errorlog(). There is no chance of updating how we receive hotplug requests on PowerVM systems. Hopefully that explains the design better. diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index a2450b8..574ec73 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -16,7 +16,9 @@ #include linux/cpu.h #include linux/slab.h #include linux/of.h +#include linux/proc_fs.h #include offline_states.h +#include pseries.h #include asm/prom.h #include asm/machdep.h @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count) return count; } +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ That is really confusing, but I think it's just a diff artifact? Yes, diff artifact. +static int handle_dlpar_errorlog(struct rtas_error_log *error_log) +{ + struct pseries_errorlog *pseries_log; + struct pseries_hp_errorlog *hp_elog; + int rc = -EINVAL; + + pseries_log = get_pseries_errorlog(error_log, + PSERIES_ELOG_SECT_ID_HOTPLUG); + if (!pseries_log) + return rc; + + hp_elog = (struct pseries_hp_errorlog *)pseries_log-data; + if (!hp_elog) + return rc; I don't see how that can happen? struct pseries_errorlog { __be16 id; /* 0x00 2-byte ASCII section ID */ __be16 length; /* 0x02 Section length in bytes */ uint8_t version;/* 0x04 Section version */ uint8_t subtype;/* 0x05 Section subtype */ __be16 creator_component; /* 0x06 Creator component ID*/ uint8_t
Re: [3/5] pseries: Create device hotplug entry point
On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote: For pseries system the kernel will be notified of hotplug requests in the form of rtas hotplug events. Can you flesh that design out a bit for me, I don't entirely get how it's going to work. The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to userspace, which then writes them back in ? This patch creates a common routine that can handle these requests in both the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also ^ creates the initial memory hotplug request handling stub. For PowerVM this patch also creates a new /proc file that the drmgr command will use to write rtas hotplug events to. Why is this different between phyp and KVM? For future PowerKVM handling the rtas check-exception code can pass any rtas hotplug events received to handle_dlpar_errorlog(). Internally to the kernel you mean? diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index a2450b8..574ec73 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -16,7 +16,9 @@ #include linux/cpu.h #include linux/slab.h #include linux/of.h +#include linux/proc_fs.h #include offline_states.h +#include pseries.h #include asm/prom.h #include asm/machdep.h @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count) return count; } +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ That is really confusing, but I think it's just a diff artifact? +static int handle_dlpar_errorlog(struct rtas_error_log *error_log) +{ + struct pseries_errorlog *pseries_log; + struct pseries_hp_errorlog *hp_elog; + int rc = -EINVAL; + + pseries_log = get_pseries_errorlog(error_log, +PSERIES_ELOG_SECT_ID_HOTPLUG); + if (!pseries_log) + return rc; + + hp_elog = (struct pseries_hp_errorlog *)pseries_log-data; + if (!hp_elog) + return rc; I don't see how that can happen? struct pseries_errorlog { __be16 id; /* 0x00 2-byte ASCII section ID */ __be16 length; /* 0x02 Section length in bytes */ uint8_t version;/* 0x04 Section version */ uint8_t subtype;/* 0x05 Section subtype */ __be16 creator_component; /* 0x06 Creator component ID*/ uint8_t data[]; /* 0x08 Start of section data */ }; Should you be checking for length == 0 instead ? Also I think the code will probably end up cleaner if you do the endian conversions immediately when you read the hp_elog, rather than passing it around in BE and having to remember to convert at all the usages. + switch (hp_elog-resource) { + case PSERIES_HP_ELOG_RESOURCE_MEM: + rc = dlpar_memory(hp_elog); + break; Please add: default: pr_warn_ratelimited(Unknown resource ..) Or something. + } + + return rc; +} + +static ssize_t dlpar_write(struct file *file, const char __user *buf, +size_t count, loff_t *offset) +{ + char *event_buf; + int rc; + + event_buf = kmalloc(count + 1, GFP_KERNEL); Why + 1 ? It's not null-terminated AFAICS. + if (!event_buf) + return -ENOMEM; + + rc = copy_from_user(event_buf, buf, count); + if (rc) { + kfree(event_buf); + return rc; + } + + rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf); If you start with a struct rtas_error_log * you shouldn't need any casts. + kfree(event_buf); + return rc ? rc : count; +} + +static const struct file_operations dlpar_fops = { + .write = dlpar_write, + .llseek = noop_llseek, +}; + static int __init pseries_dlpar_init(void) { + struct proc_dir_entry *proc_ent; + + proc_ent = proc_create(powerpc/dlpar, S_IWUSR, NULL, dlpar_fops); + if (proc_ent) + proc_set_size(proc_ent, 0); else error message at least please Why are we putting it in /proc, can't it go in /sys/kernel like the mobility stuff? diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 24abc5c..0e60e15 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -20,6 +22,9 @@ #include asm/machdep.h #include asm/prom.h #include asm/sparsemem.h +#include asm/rtas.h + +DEFINE_MUTEX(dlpar_mem_mutex); static ? diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index b94516b..28bd994 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct
Re: [3/5] pseries: Create device hotplug entry point
On 09/17/2014 02:07 AM, Michael Ellerman wrote: On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote: For pseries system the kernel will be notified of hotplug requests in the form of rtas hotplug events. Can you flesh that design out a bit for me, I don't entirely get how it's going to work. The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to userspace, which then writes them back in ? This patch creates a common routine that can handle these requests in both the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also ^ creates the initial memory hotplug request handling stub. For PowerVM this patch also creates a new /proc file that the drmgr command will use to write rtas hotplug events to. Why is this different between phyp and KVM? For future PowerKVM handling the rtas check-exception code can pass any rtas hotplug events received to handle_dlpar_errorlog(). Internally to the kernel you mean? Perhaps a better explanation of how things work today and where I see them going is needed. I was trying to avoid a long explanation and I don't think my shortened explanation worked. I'll include this in v2 of the patchset too. The current hotplug (or dlpar) of devices (the process is generally the same for memory, cpu, and pci) on PowerVM systems is initiated from the HMC, which communicates the request to the partitions through the RSCT framework. The RSCT framework then invokes the drmgr command. The drmgr command performs the hotplug operation by doing some pieces, such as most of the rtas calls and device tree parsing, in userspace and make requests to the kernel to online/offline the device, update the device tree and add/remove the device. For PowerKVM the approach is to follow what is currently being done for pci hotplug. A hotplug request is initiated from the host. QEMU then sends an EPOW interrupt to the guest which causes the guest to make the rtas,check-exception call. In QEMU, the rtas,check-exception call returns a rtas hotplug event to the guest. I was using this same framework to also enable memory (and next cpu) hotplug. You are correct that the current pci hotplug path for PowerKVM involves the kernel receiving the rtas event, passing it to rtas_errd in userspace, and having rtas_errd invoke drmgr. The drmgr command then handles the request as described above for PowerVM systems. There is no need for this circuitous route, we should just handle the entire hotplug of devices in the kernel. What I am hoping to do is to enable this by moving the code to handle hotplug from drmgr into the kernel and provide a single path for handling hotplug for PowerVM and PowerKVM. To make this work for PowerKVM we will update the kernel rtas code to recognize rtas hotplug events returned from rtas,check-exception calls and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out to userspace. For PowerVM systems, I created the /proc/powerpc/dlpar file that a rtas hotplug event can be written to and passed to handle_dlpar_errorlog(). There is no chance of updating how we receive hotplug requests on PowerVM systems. Hopefully that explains the design better. diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index a2450b8..574ec73 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -16,7 +16,9 @@ #include linux/cpu.h #include linux/slab.h #include linux/of.h +#include linux/proc_fs.h #include offline_states.h +#include pseries.h #include asm/prom.h #include asm/machdep.h @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count) return count; } +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ That is really confusing, but I think it's just a diff artifact? Yes, diff artifact. +static int handle_dlpar_errorlog(struct rtas_error_log *error_log) +{ +struct pseries_errorlog *pseries_log; +struct pseries_hp_errorlog *hp_elog; +int rc = -EINVAL; + +pseries_log = get_pseries_errorlog(error_log, + PSERIES_ELOG_SECT_ID_HOTPLUG); +if (!pseries_log) +return rc; + +hp_elog = (struct pseries_hp_errorlog *)pseries_log-data; +if (!hp_elog) +return rc; I don't see how that can happen? struct pseries_errorlog { __be16 id; /* 0x00 2-byte ASCII section ID */ __be16 length; /* 0x02 Section length in bytes */ uint8_t version;/* 0x04 Section version */ uint8_t subtype;/* 0x05 Section subtype */ __be16 creator_component; /* 0x06 Creator component ID*/ uint8_t data[]; /* 0x08 Start of section data */ }; Should you be checking for length == 0 instead ? You are correct. Also I think the code will
[PATCH 3/5] pseries: Create device hotplug entry point
For pseries system the kernel will be notified of hotplug requests in the form of rtas hotplug events. This patch creates a common routine that can handle these requests in both the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also creates the initial memory hotplug request handling stub. For PowerVM this patch also creates a new /proc file that the drmgr command will use to write rtas hotplug events to. For future PowerKVM handling the rtas check-exception code can pass any rtas hotplug events received to handle_dlpar_errorlog(). Signed-off-by: Nathan Fontenot nf...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/dlpar.c | 63 +++ arch/powerpc/platforms/pseries/hotplug-memory.c | 22 arch/powerpc/platforms/pseries/pseries.h| 10 3 files changed, 94 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index a2450b8..574ec73 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -16,7 +16,9 @@ #include linux/cpu.h #include linux/slab.h #include linux/of.h +#include linux/proc_fs.h #include offline_states.h +#include pseries.h #include asm/prom.h #include asm/machdep.h @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count) return count; } +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ + +static int handle_dlpar_errorlog(struct rtas_error_log *error_log) +{ + struct pseries_errorlog *pseries_log; + struct pseries_hp_errorlog *hp_elog; + int rc = -EINVAL; + + pseries_log = get_pseries_errorlog(error_log, + PSERIES_ELOG_SECT_ID_HOTPLUG); + if (!pseries_log) + return rc; + + hp_elog = (struct pseries_hp_errorlog *)pseries_log-data; + if (!hp_elog) + return rc; + + switch (hp_elog-resource) { + case PSERIES_HP_ELOG_RESOURCE_MEM: + rc = dlpar_memory(hp_elog); + break; + } + + return rc; +} + +static ssize_t dlpar_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + char *event_buf; + int rc; + + event_buf = kmalloc(count + 1, GFP_KERNEL); + if (!event_buf) + return -ENOMEM; + + rc = copy_from_user(event_buf, buf, count); + if (rc) { + kfree(event_buf); + return rc; + } + + rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf); + kfree(event_buf); + return rc ? rc : count; +} + +static const struct file_operations dlpar_fops = { + .write = dlpar_write, + .llseek = noop_llseek, +}; + static int __init pseries_dlpar_init(void) { + struct proc_dir_entry *proc_ent; + + proc_ent = proc_create(powerpc/dlpar, S_IWUSR, NULL, dlpar_fops); + if (proc_ent) + proc_set_size(proc_ent, 0); + +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE ppc_md.cpu_probe = dlpar_cpu_probe; ppc_md.cpu_release = dlpar_cpu_release; +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ return 0; } machine_device_initcall(pseries, pseries_dlpar_init); -#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 24abc5c..0e60e15 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -9,6 +9,8 @@ * 2 of the License, or (at your option) any later version. */ +#define pr_fmt(fmt)pseries-hotplug: fmt + #include linux/of.h #include linux/of_address.h #include linux/memblock.h @@ -20,6 +22,9 @@ #include asm/machdep.h #include asm/prom.h #include asm/sparsemem.h +#include asm/rtas.h + +DEFINE_MUTEX(dlpar_mem_mutex); unsigned long pseries_memory_block_size(void) { @@ -150,6 +155,23 @@ static inline int pseries_remove_mem_node(struct device_node *np) } #endif /* CONFIG_MEMORY_HOTREMOVE */ +int dlpar_memory(struct pseries_hp_errorlog *hp_elog) +{ + int rc = 0; + + mutex_lock(dlpar_mem_mutex); + + switch (hp_elog-action) { + default: + pr_err(Invalid action (%d) specified\n, hp_elog-action); + rc = -EINVAL; + break; + } + + mutex_unlock(dlpar_mem_mutex); + return rc; +} + static int pseries_add_mem_node(struct device_node *np) { const char *type; diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index b94516b..28bd994 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -11,6 +11,7 @@ #define _PSERIES_PSERIES_H #include linux/interrupt.h +#include asm/rtas.h struct device_node; @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node