[PATCH v2 07/10] pwm: pwm-tiehrpwm: pinctrl support
Enable pinctrl for pwm-tiehrpwm

Signed-off-by: Philip, Avinash <avinashphi...@ti.com>
---
:100644 100644 fba7f9b... 07911e6... M	drivers/pwm/pwm-tiehrpwm.c
 drivers/pwm/pwm-tiehrpwm.c |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/pwm/pwm-tiehrpwm.c b/drivers/pwm/pwm-tiehrpwm.c
index fba7f9b..07911e6 100644
--- a/drivers/pwm/pwm-tiehrpwm.c
+++ b/drivers/pwm/pwm-tiehrpwm.c
@@ -26,6 +26,7 @@
 #include <linux/clk.h>
 #include <linux/pm_runtime.h>
 #include <linux/of_device.h>
+#include <linux/pinctrl/consumer.h>
 
 #include "tipwmss.h"
 
@@ -418,6 +419,11 @@ static int __devinit ehrpwm_pwm_probe(struct platform_device *pdev)
 	struct resource *r;
 	struct clk *clk;
 	struct ehrpwm_pwm_chip *pc;
+	struct pinctrl *pinctrl;
+
+	pinctrl = devm_pinctrl_get_select_default(&pdev->dev);
+	if (IS_ERR(pinctrl))
+		dev_warn(&pdev->dev, "failed to configure pins from driver\n");
 
 	pc = devm_kzalloc(&pdev->dev, sizeof(*pc), GFP_KERNEL);
 	if (!pc) {
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 08/10] pwm: pwm-tiehrpwm: Adding TBCLK gating support.
Some platforms (like AM33XX) require explicit clock gating from the
control module for TBCLK. This clock must be enabled for the time base
submodule in the EHRPWM module to function. So add optional TBCLK
handling if the DT node is populated with tbclkgating. This lets the
driver coexist with Davinci platforms.

Signed-off-by: Philip, Avinash <avinashphi...@ti.com>
Cc: Grant Likely <grant.lik...@secretlab.ca>
Cc: Rob Herring <rob.herr...@calxeda.com>
Cc: Rob Landley <r...@landley.net>
---
Changes since v1:
	- Moved TBCLK enable from probe to .pwm_enable and disable from
	  remove to .pwm_disable

:100644 100644 07911e6... 927a8ed... M	drivers/pwm/pwm-tiehrpwm.c
 drivers/pwm/pwm-tiehrpwm.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/pwm/pwm-tiehrpwm.c b/drivers/pwm/pwm-tiehrpwm.c
index 07911e6..927a8ed 100644
--- a/drivers/pwm/pwm-tiehrpwm.c
+++ b/drivers/pwm/pwm-tiehrpwm.c
@@ -126,6 +126,7 @@ struct ehrpwm_pwm_chip {
 	void __iomem	*mmio_base;
 	unsigned long period_cycles[NUM_PWM_CHANNEL];
 	enum pwm_polarity polarity[NUM_PWM_CHANNEL];
+	struct	clk	*tbclk;
 };
 
 static inline struct ehrpwm_pwm_chip *to_ehrpwm_pwm_chip(struct pwm_chip *chip)
@@ -346,6 +347,13 @@ static int ehrpwm_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
 	/* Channels polarity can be configured from action qualifier module */
 	configure_polarity(pc, pwm->hwpwm);
 
+	/*
+	 * Platforms require explicit clock enabling of TBCLK has
+	 * to enable TBCLK explicitly before enabling PWM device
+	 */
+	if (pc->tbclk)
+		clk_enable(pc->tbclk);
+
 	/* Enable time counter for free_run */
 	ehrpwm_modify(pc->mmio_base, TBCTL, TBCTL_RUN_MASK, TBCTL_FREE_RUN);
 	return 0;
@@ -374,6 +382,10 @@ static void ehrpwm_pwm_disable(struct pwm_chip *chip, struct pwm_device *pwm)
 
 	ehrpwm_modify(pc->mmio_base, AQCSFRC, aqcsfrc_mask, aqcsfrc_val);
 
+	/* Disabling TBCLK on PWM disable */
+	if (pc->tbclk)
+		clk_disable(pc->tbclk);
+
 	/* Stop Time base counter */
 	ehrpwm_modify(pc->mmio_base, TBCTL, TBCTL_RUN_MASK, TBCTL_STOP_NEXT);
 
@@ -464,6 +476,16 @@ static int __devinit ehrpwm_pwm_probe(struct platform_device *pdev)
 		dev_err(&pdev->dev, "pwmchip_add() failed: %d\n", ret);
 		return ret;
 	}
+
+	/* Some platforms require explicit tbclk gating */
+	if (of_property_read_bool(pdev->dev.of_node, "tbclkgating")) {
+		pc->tbclk = clk_get(&pdev->dev, "tbclk");
+		if (IS_ERR(pc->tbclk)) {
+			dev_err(&pdev->dev, "Could not get EHRPWM TBCLK\n");
+			return PTR_ERR(pc->tbclk);
+		}
+	}
+
 	pm_runtime_enable(&pdev->dev);
 	pm_runtime_get_sync(&pdev->dev);
 	if (!(pwmss_submodule_state_change(pdev->dev.parent, EPWMCLK_EN) &
-- 
1.7.0.4
[PATCH v2 10/10] ARM: dts: AM33XX: Add PWM backlight DT data to am335x-evm
PWM output from ecap0 uses as backlight source. Also adds low threshold value to have a uniform divisions in brightness-levels scales. Signed-off-by: Philip, Avinash avinashphi...@ti.com --- :100644 100644 185d632... 9857050... M arch/arm/boot/dts/am335x-evm.dts arch/arm/boot/dts/am335x-evm.dts | 21 + 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/arch/arm/boot/dts/am335x-evm.dts b/arch/arm/boot/dts/am335x-evm.dts index 185d632..9857050 100644 --- a/arch/arm/boot/dts/am335x-evm.dts +++ b/arch/arm/boot/dts/am335x-evm.dts @@ -18,6 +18,14 @@ reg = 0x8000 0x1000; /* 256 MB */ }; + am33xx_pinmux: pinmux@44e10800 { + ecap0_pins: backlight_pins { + pinctrl-single,pins = + 0x164 0x0 /* eCAP0_in_PWM0_out.eCAP0_in_PWM0_out MODE0 */ + ; + }; + }; + ocp { uart1: serial@44e09000 { status = okay; @@ -31,6 +39,12 @@ reg = 0x2d; }; }; + + ecap0: ecap@48300100 { + status = okay; + pinctrl-names = default; + pinctrl-0 = ecap0_pins; + }; }; vbat: fixedregulator@0 { @@ -40,6 +54,13 @@ regulator-max-microvolt = 500; regulator-boot-on; }; + + backlight { + compatible = pwm-backlight; + pwms = ecap0 0 5 0; + brightness-levels = 0 51 53 56 62 75 101 152 255; + default-brightness-level = 8; + }; }; /include/ tps65910.dtsi -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 09/10] ARM: dts: AM33XX: Add PWMSS device tree nodes
Add PWMSS device tree nodes in relation with ECAP EHRPWM DT nodes to AM33XX SoC family. Also populates device tree nodes for ECAP EHRPWM by adding necessary properties like pwm-cells, base reg set disabled as status. Signed-off-by: Philip, Avinash avinashphi...@ti.com --- :100644 100644 bb31bff... cf5e049... M arch/arm/boot/dts/am33xx.dtsi arch/arm/boot/dts/am33xx.dtsi | 90 + 1 files changed, 90 insertions(+), 0 deletions(-) diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi index bb31bff..cf5e049 100644 --- a/arch/arm/boot/dts/am33xx.dtsi +++ b/arch/arm/boot/dts/am33xx.dtsi @@ -210,5 +210,95 @@ interrupt-parent = intc; interrupts = 91; }; + + epwmss0: epwmss@4830 { + compatible = ti,am33xx-pwmss; + reg = 0x4830 0x10 + 0x48300100 0x80 + 0x48300180 0x80 + 0x48300200 0x80; + ti,hwmods = epwmss0; + #address-cells = 1; + #size-cells = 1; + status = disabled; + ranges; + + ecap0: ecap@48300100 { + compatible = ti,am33xx-ecap; + #pwm-cells = 3; + reg = 0x48300100 0x80; + ti,hwmods = ecap0; + status = disabled; + }; + + ehrpwm0: ehrpwm@48300200 { + compatible = ti,am33xx-ehrpwm; + #pwm-cells = 3; + reg = 0x48300200 0x80; + ti,hwmods = ehrpwm0; + status = disabled; + tbclkgating; + }; + }; + + epwmss1: epwmss@48302000 { + compatible = ti,am33xx-pwmss; + reg = 0x48302000 0x10 + 0x48302100 0x80 + 0x48302180 0x80 + 0x48302200 0x80; + ti,hwmods = epwmss1; + #address-cells = 1; + #size-cells = 1; + status = disabled; + ranges; + + ecap1: ecap@48302100 { + compatible = ti,am33xx-ecap; + #pwm-cells = 3; + reg = 0x48302100 0x80; + ti,hwmods = ecap1; + status = disabled; + }; + + ehrpwm1: ehrpwm@48302200 { + compatible = ti,am33xx-ehrpwm; + #pwm-cells = 3; + reg = 0x48302200 0x80; + ti,hwmods = ehrpwm1; + status = disabled; + tbclkgating; + }; + }; + + epwmss2: epwmss@48304000 { + compatible = ti,am33xx-pwmss; + reg = 0x48304000 0x10 + 0x48304100 0x80 + 0x48304180 0x80 + 0x48304200 0x80; + ti,hwmods = epwmss2; + #address-cells = 1; + #size-cells = 1; + 
status = disabled; + ranges; + + ecap2: ecap@48304100 { + compatible = ti,am33xx-ecap; + #pwm-cells = 3; + reg = 0x48304100 0x80; + ti,hwmods = ecap2; + status = disabled; + }; + + ehrpwm2: ehrpwm@48304200 { + compatible = ti,am33xx-ehrpwm; + #pwm-cells = 3; + reg = 0x48304200 0x80; + ti,hwmods = ehrpwm2; + status = disabled; + tbclkgating; + }; + }; }; }; -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched/fair: Set se->vruntime directly in place_entity()
We are first storing the new vruntime in a variable and then storing it in
se->vruntime. Simply update se->vruntime directly.

Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org>
---
 kernel/sched/fair.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a319d56c..820a2f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1454,9 +1454,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 	}
 
 	/* ensure we never gain time by being placed backwards. */
-	vruntime = max_vruntime(se->vruntime, vruntime);
-
-	se->vruntime = vruntime;
+	se->vruntime = max_vruntime(se->vruntime, vruntime);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
-- 
1.7.12.rc2.18.g61b472e
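The two forms in this patch behave the same because max_vruntime() only returns a value and has no side effects. A minimal userspace sketch of that equivalence follows. The names max_vruntime_sketch(), place_old(), place_new() and equivalence_check() are made up for this illustration, and the wraparound-safe comparison is modelled on the helper in kernel/sched/fair.c rather than copied from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe maximum: compare through a signed difference. */
static uint64_t max_vruntime_sketch(uint64_t max_vruntime, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;
	return max_vruntime;
}

/* Old form: compute into the local variable, then store it. */
static void place_old(uint64_t *se_vruntime, uint64_t vruntime)
{
	vruntime = max_vruntime_sketch(*se_vruntime, vruntime);
	*se_vruntime = vruntime;
}

/* New form: store the result directly. */
static void place_new(uint64_t *se_vruntime, uint64_t vruntime)
{
	*se_vruntime = max_vruntime_sketch(*se_vruntime, vruntime);
}

/* Both forms must leave the same value behind for any input pair. */
static void equivalence_check(uint64_t current, uint64_t candidate)
{
	uint64_t a = current, b = current;

	place_old(&a, candidate);
	place_new(&b, candidate);
	assert(a == b);
}
```

The only thing the old form added was a write to the local vruntime, which is dead once the function returns right after the store.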
Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website
At Thu, 08 Nov 2012 08:31:35 +0100,
Daniel Mack wrote:
(snip)
> >> We can't simply stop both endpoints in the prepare callback.
> >
> > The new function doesn't stop the stream by itself but it just syncs
> > if the stream is being stopped beforehand.  So, it's safe to call it
> > there.
> >
> > Maybe the name was confusing.  It should have been like
> > snd_usb_endpoint_sync_pending_stop() or such.
>
> Ah, right. I was erroneously looking closer to Alan's patch but then
> replied to yours. Alright then - thanks for explaining :)

OK, thanks for checking.

FWIW, below is the patch I applied now to for-linus branch.
Renamed the function, added the comment and put NULL check to the
function to simplify.


Takashi

---
From: Takashi Iwai <ti...@suse.de>
Subject: [PATCH] ALSA: usb-audio: Fix crash at re-preparing the PCM stream

There are bug reports of a crash with USB-audio devices when PCM
prepare is performed immediately after the stream is stopped via
trigger callback.  It turned out that the problem is that we don't
wait until all URBs are killed.

This patch adds a new function to synchronize the pending stop
operation on an endpoint, and calls in the prepare callback for
avoiding the crash above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=49181
Reported-and-tested-by: Artem S. Tashkinov <t.ar...@lycos.com>
Cc: <sta...@vger.kernel.org> [v3.6]
Signed-off-by: Takashi Iwai <ti...@suse.de>
---
 sound/usb/endpoint.c | 13 +
 sound/usb/endpoint.h |  1 +
 sound/usb/pcm.c      |  3 +++
 3 files changed, 17 insertions(+)

diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index 7f78c6d..34de6f2 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -35,6 +35,7 @@
 
 #define EP_FLAG_ACTIVATED	0
 #define EP_FLAG_RUNNING		1
+#define EP_FLAG_STOPPING	2
 
 /*
  * snd_usb_endpoint is a model that abstracts everything related to an
@@ -502,10 +503,20 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
 	if (alive)
 		snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
 					alive, ep->ep_num);
+	clear_bit(EP_FLAG_STOPPING, &ep->flags);
 
 	return 0;
 }
 
+/* sync the pending stop operation;
+ * this function itself doesn't trigger the stop operation
+ */
+void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep)
+{
+	if (ep && test_bit(EP_FLAG_STOPPING, &ep->flags))
+		wait_clear_urbs(ep);
+}
+
 /*
  * unlink active urbs.
  */
@@ -918,6 +929,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
 
 		if (wait)
 			wait_clear_urbs(ep);
+		else
+			set_bit(EP_FLAG_STOPPING, &ep->flags);
 	}
 }
 
diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
index 6376ccf..3d4c970 100644
--- a/sound/usb/endpoint.h
+++ b/sound/usb/endpoint.h
@@ -19,6 +19,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
 int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
 void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
 			   int force, int can_sleep, int wait);
+void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
 int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
 int  snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
 void snd_usb_endpoint_free(struct list_head *head);
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 37428f7..5c12a3f 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -568,6 +568,9 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 		goto unlock;
 	}
 
+	snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
+	snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
+
 	ret = set_format(subs, subs->cur_audiofmt);
 	if (ret < 0)
 		goto unlock;
-- 
1.8.0
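The pending-stop idea in this patch can be modelled outside the kernel. The sketch below is a userspace stand-in, not the driver code: struct endpoint_sketch and every *_sketch function are hypothetical, a plain bool replaces the EP_FLAG_STOPPING bit, and "waiting" is reduced to zeroing a counter. It only shows the control flow: a stop without wait leaves a marker, and prepare later syncs on that marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct endpoint_sketch {
	bool stopping;		/* models EP_FLAG_STOPPING */
	int active_urbs;	/* URBs still in flight */
	int waits;		/* how many times we actually waited */
};

/* Stand-in for waiting until all URBs have completed. */
static void wait_clear_urbs_sketch(struct endpoint_sketch *ep)
{
	assert(ep != NULL);
	ep->active_urbs = 0;
	ep->waits++;
	ep->stopping = false;	/* the stop is no longer pending */
}

/* Stop either synchronously or by leaving a pending-stop marker. */
static void endpoint_stop_sketch(struct endpoint_sketch *ep, bool wait)
{
	if (wait)
		wait_clear_urbs_sketch(ep);
	else
		ep->stopping = true;
}

/* Called from "prepare": waits only if a stop is still pending. */
static void sync_pending_stop_sketch(struct endpoint_sketch *ep)
{
	if (ep && ep->stopping)
		wait_clear_urbs_sketch(ep);
}
```

Calling sync_pending_stop_sketch() twice in a row waits at most once, and passing NULL is harmless, which mirrors the NULL check mentioned in the message above.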
Re: [PATCH RESEND] leds: leds-gpio: Defer probing in case of deferred gpio probing
Hi Bryan,

On 08/11/12 01:28, Bryan Wu wrote:
> On Wed, Nov 7, 2012 at 5:06 AM, Roland Stigge <sti...@antcom.de> wrote:
>> This patch makes leds-gpio's probe() return -EPROBE_DEFER if any of the
>> gpios to register are deferred themselves. This makes a change of
>> gpio_leds_create_of()'s return value necessary: Instead of returning NULL
>> on error, we now use ERR_PTR() error coding.
>>
>> Signed-off-by: Roland Stigge <sti...@antcom.de>
>
> Sorry about this, actually I've already merged it into my for-next
> branch and forgot to reply to this email.
>
> It's here:
> http://git.kernel.org/?p=linux/kernel/git/cooloney/linux-leds.git;a=commitdiff;h=424f58e4cebd76458b44e69ed31f7297808770cd
>
> So is it the same one as here?

Thanks, still the same patch. Could have seen this in your tree. ;-)

Roland
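For readers unfamiliar with the ERR_PTR() convention mentioned in the patch description: an error code is folded into the pointer return value, so the caller can tell "deferred" apart from other failures, which a plain NULL cannot express. The following is a userspace sketch of that idea under stated assumptions. err_ptr(), is_err(), ptr_err(), create_leds_sketch() and probe_sketch() are illustrative stand-ins, not the kernel helpers or the leds-gpio functions; 4095 and 517 mirror the kernel's MAX_ERRNO and EPROBE_DEFER values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH	4095
#define EPROBE_DEFER_SKETCH	517

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top MAX_ERRNO_SKETCH addresses. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Decode the errno value; only meaningful for error pointers. */
static long ptr_err(const void *ptr)
{
	assert(is_err(ptr));
	return (long)(intptr_t)ptr;
}

static int dummy_priv;

/* Hypothetical create function: defers when the GPIO is not ready. */
static void *create_leds_sketch(int gpio_ready)
{
	if (!gpio_ready)
		return err_ptr(-EPROBE_DEFER_SKETCH);
	return &dummy_priv;
}

/* Hypothetical probe: passes the encoded error code through. */
static long probe_sketch(int gpio_ready)
{
	void *priv = create_leds_sketch(gpio_ready);

	if (is_err(priv))
		return ptr_err(priv);
	return 0;
}
```

Note that NULL is not an error pointer under this encoding, so a function switched from NULL returns to ERR_PTR() needs its callers switched to an IS_ERR()-style check at the same time.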
Re: [Xen-devel] [PATCH] add tpm_xenu.ko: Xen Virtual TPM frontend driver
On 07.11.12 at 19:14, Matthew Fioravante <matthew.fiorava...@jhuapl.edu> wrote:
> On 11/07/2012 09:46 AM, Kent Yoder wrote:
>>> --- a/drivers/char/tpm/tpm.h
>>> +++ b/drivers/char/tpm/tpm.h
>>> @@ -130,6 +130,9 @@ struct tpm_chip {
>>>  	struct list_head list;
>>>
>>>  	void (*release) (struct device *);
>>> +#if CONFIG_XEN
>>> +	void *priv;
>>> +#endif
>>
>> Can you use the chip->vendor.data pointer here instead? tpm_ibmvtpm
>> is already using that as a priv pointer. I should probably change that
>> name to make it more obvious what that's used for.
>
> That makes more sense. I'm guessing your data pointer didn't exist
> during the 2.6.18 kernel which is why they added their own priv pointer.

It got introduced with 3.7-rc.

>>> @@ -310,6 +313,18 @@ struct tpm_cmd_t {
>>>
>>>  ssize_t	tpm_getcap(struct device *, __be32, cap_t *, const char *);
>>>
>>> +#ifdef CONFIG_XEN
>>> +static inline void *chip_get_private(const struct tpm_chip *chip)
>>> +{
>>> +	return chip->priv;
>>> +}
>>> +
>>> +static inline void chip_set_private(struct tpm_chip *chip, void *priv)
>>> +{
>>> +	chip->priv = priv;
>>> +}
>>> +#endif
>>
>> Can you put these in tpm_vtpm.c please? One less #define. :-)
>
> Agreed, I'd rather not have to modify your shared tpm.h interface at all.

Either such accessors should be defined here, for everyone to use (and
tpm_ibmvtpm.c get changed accordingly), or the Xen code should access the
field without wrappers too (for consistency).

Jan
Re: [PATCH v2] memcg: oom: fix totalpages calculation for memory.swappiness==0
On Wed 07-11-12 14:53:40, Andrew Morton wrote:
> On Wed, 7 Nov 2012 23:46:40 +0100 Michal Hocko <mho...@suse.cz> wrote:
>
> > > Realistically, is anyone likely to hurt from this?
> >
> > The primary motivation for the fix was a real report by a customer.
>
> Describe it please and I'll copy it to the changelog.

The original issue (a wrong task gets killed in a small group with memcg
swappiness=0) was reported on top of our 3.0 based kernel (with
fe35004f backported). I have tried to replicate it with the test case
mentioned in https://lkml.org/lkml/2012/10/10/223.

As David correctly pointed out (https://lkml.org/lkml/2012/10/10/418),
a significant role was played by the fact that all the processes in the
group have CAP_SYS_ADMIN, but oom_score_adj has a similar effect.
Say there is 2G of swap space, which is 524288 pages. If you add the
CAP_SYS_ADMIN bonus then you have a -15728 score for the bias. This means
that all tasks with less than 60M get the minimum score, and it is the
ordering of tasks which determines who gets killed as a result.

To summarize: users of small groups (relative to the swap size) with
CAP_SYS_ADMIN tasks or oom_score_adj are affected the most; others might
see an unexpected oom_badness calculation.
Whether this workload is representative, I don't know, but I think that
it is worth fixing and pushing to stable as well.
-- 
Michal Hocko
SUSE Labs
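The figures quoted in this message can be reproduced with a few lines of arithmetic. The sketch below assumes 4 KiB pages and a 3% CAP_SYS_ADMIN bonus computed as 30 * totalpages / 1000. Both are assumptions made here because they match the quoted numbers, and the helper names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL	/* assumed 4 KiB pages */

/* Convert a size in bytes to a page count. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return bytes / PAGE_SIZE_SKETCH;
}

/* Assumed 3% bonus, expressed in pages of the total. */
static uint64_t root_bonus_pages(uint64_t totalpages)
{
	assert(totalpages <= UINT64_MAX / 30);	/* keep 30 * n from overflowing */
	return 30 * totalpages / 1000;
}

/* Convert a page count to whole MiB, rounding down. */
static uint64_t pages_to_mib(uint64_t pages)
{
	return (pages * PAGE_SIZE_SKETCH) >> 20;
}
```

With 2 GiB of swap this gives 524288 pages, a bias of 15728 pages, and roughly 61 MiB, in line with the "less than 60M" threshold described above: any task smaller than the bias ends up at the minimum score.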
[GIT PULL] pinctrl fixes for v3.7 rc:s
Hi Linus,

here is a set of pinctrl fixes for the current -rc series. Details in
the signed tag. Please pull it in!

Yours,
Linus Walleij

The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64:

  Linux 3.7-rc3 (2012-10-28 12:24:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git tags/pinctrl-for-v3.7-rc5

for you to fetch changes up to 924da31416f20a8ee7a9008dd4e6e6054bc36b1b:

  pinctrl: samsung and exynos need to depend on OF && GPIOLIB (2012-11-06 10:02:14 +0100)

Some pinctrl fixes for the v3.7 series:
- A set of SPEAr pinctrl fixes that recently arrived
- A fixup for the Samsung/Exynos Kconfig deps

Axel Lin (1):
      pinctrl: samsung and exynos need to depend on OF && GPIOLIB

Deepak Sikri (2):
      pinctrl: SPEAr320: Correct pad mux entries for rmii/smii
      pinctrl: SPEAr1340: Make DDR reset & clock pads as gpio

Shiraz Hashim (3):
      pinctrl: SPEAr3xx: correct register space to configure pwm
      pinctrl: SPEAr1310: fix clcd high resolution pin group name
      pinctrl: SPEAr1310: add register entries for enabling pad direction

Vipul Kumar Samar (3):
      pinctrl: SPEAr1310: Fix value of PERIP_CFG reigster and MCIF_SEL_SHIFT
      pinctrl: SPEAr1310: Separate out pci pins from pcie_sata pin group
      pinctrl: SPEAr1340: Add clcd sleep mode pin configuration

Viresh Kumar (1):
      pinctrl: SPEAr: Don't update all non muxreg bits on pinctrl_disable

 drivers/pinctrl/Kconfig                   |   2 +
 drivers/pinctrl/spear/pinctrl-spear.c     |   2 +-
 drivers/pinctrl/spear/pinctrl-spear1310.c | 365 ++
 drivers/pinctrl/spear/pinctrl-spear1340.c |  41 +++-
 drivers/pinctrl/spear/pinctrl-spear320.c  |   8 +-
 drivers/pinctrl/spear/pinctrl-spear3xx.h  |   1 +
 6 files changed, 369 insertions(+), 50 deletions(-)
Re: [Xen-devel] [PATCH] add tpm_xenu.ko: Xen Virtual TPM frontend driver
> > > > +typedef struct tpmif_tx_request tpmif_tx_request_t;
> > >
> > > checkpatch warned on this new typedef - please run through checkpatch
> > > and fix up that stuff.
> >
> > tpmif.h has a couple of typedefs which do trigger checkpatch warnings.
> > However it looks like the paradigm for xen is to have these
> > interface/io/devif.h files and all of them have typedefs. I think in
> > this case the typedef should probably stay.
> >
> > Konrad your thoughts here?
>
> Rip them out plea

This is somewhere that Linux coding style and Xen coding style differ,
so the typedefs should be removed from the Linux copy of these
interfaces to match the Linux coding style, but they should stay in the
Xen side canonical copy though.

Ian.
Re: [PATCH resend] virtio_console: Free buffers from out-queue upon close
On (Thu) 08 Nov 2012 [10:28:53], Rusty Russell wrote:
> sjur.brandel...@stericsson.com writes:
>
> > From: Sjur Brændeland <sjur.brandel...@stericsson.com>
> >
> > Free pending output buffers from the virtio out-queue when
> > host has acknowledged port_close. Also removed WARN_ON()
> > in remove_port_data().
> >
> > Signed-off-by: Sjur Brændeland <sjur.brandel...@stericsson.com>
> > ---
> >
> > Resending, this time including a proper Subject...
> > --
> >
> > Hi Amit,
> >
> > Note: This patch is compile tested only. I have done the removal
> > of buffers from out-queue in handle_control_message()
> > when host has acked the close request. This seems less
> > racy than doing it in the release function.
>
> This confuses me... why are we doing this in case
> VIRTIO_CONSOLE_PORT_OPEN:?
>
> We can't pull unconsumed buffers out of the ring when the other side may
> still access it, and this seems to be doing that.

Yes -- and it's my fault; I asked Sjur to do that in the close fops
function.

We should only do this in the port remove case (unplug or device remove)
-- so the original patch, with just the WARN_ON removed is the right
way.

I'll send the revised 3/3 patch for you.

		Amit
[PATCH v2] memory-hotplug: fix NR_FREE_PAGES mismatch's fix
When a page is freed and put into pcp list, get_freepage_migratetype()
doesn't return MIGRATE_ISOLATE even if this pageblock is isolated.
So we should use get_pageblock_migratetype() instead of mt to check
whether it is isolated.
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027afd0..795875f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-			if (likely(mt != MIGRATE_ISOLATE)) {
+			if (likely(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) {
 				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
 				if (is_migrate_cma(mt))
 					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
-- 
1.8.0
[PATCH v1 hot_track 02/18] vfs: add init and cleanup functions
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Add initialization function to create some key data structures when hot tracking is enabled; Clean up them when hot tracking is disabled Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c| 124 ++ include/linux/fs.h |4 ++ include/linux/hot_tracking.h |3 + 3 files changed, 131 insertions(+), 0 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index 806fbb0..3118c0b 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -76,12 +76,103 @@ static void hot_inode_item_init(struct hot_inode_item *he, he-hot_inode_tree = hot_inode_tree; kref_init(he-hot_inode.refs); spin_lock_init(he-hot_inode.lock); + INIT_LIST_HEAD(he-hot_inode.n_list); he-hot_inode.hot_freq_data.avg_delta_reads = (u64) -1; he-hot_inode.hot_freq_data.avg_delta_writes = (u64) -1; he-hot_inode.hot_freq_data.flags = FREQ_DATA_TYPE_INODE; hot_range_tree_init(he); } +static void hot_range_item_free(struct kref *kref) +{ + struct hot_comm_item *comm_item = container_of(kref, + struct hot_comm_item, refs); + struct hot_range_item *hr = container_of(comm_item, + struct hot_range_item, hot_range); + + radix_tree_delete(hr-hot_inode-hot_range_tree, hr-start); + kmem_cache_free(hot_range_item_cachep, hr); +} + +/* + * Drops the reference out on hot_range_item by one + * and free the structure if the reference count hits zero + */ +static void hot_range_item_put(struct hot_range_item *hr) +{ + kref_put(hr-hot_range.refs, hot_range_item_free); +} + +/* Frees the entire hot_range_tree. 
*/ +static void hot_range_tree_free(struct hot_inode_item *he) +{ + struct hot_range_item *hr_nodes[8]; + loff_t start = 0; + int i, n; + + while (1) { + spin_lock(he-lock); + n = radix_tree_gang_lookup(he-hot_range_tree, + (void **)hr_nodes, start, + ARRAY_SIZE(hr_nodes)); + if (!n) { + spin_unlock(he-lock); + break; + } + + start = hr_nodes[n - 1]-start + 1; + for (i = 0; i n; i++) + hot_range_item_put(hr_nodes[i]); + spin_unlock(he-lock); + } +} + +static void hot_inode_item_free(struct kref *kref) +{ + struct hot_comm_item *comm_item = container_of(kref, + struct hot_comm_item, refs); + struct hot_inode_item *he = container_of(comm_item, + struct hot_inode_item, hot_inode); + + hot_range_tree_free(he); + radix_tree_delete(he-hot_inode_tree, he-i_ino); + kmem_cache_free(hot_inode_item_cachep, he); +} + +/* + * Drops the reference out on hot_inode_item by one + * and free the structure if the reference count hits zero + */ +void hot_inode_item_put(struct hot_inode_item *he) +{ + kref_put(he-hot_inode.refs, hot_inode_item_free); +} +EXPORT_SYMBOL_GPL(hot_inode_item_put); + +/* Frees the entire hot_inode_tree. */ +static void hot_inode_tree_exit(struct hot_info *root) +{ + struct hot_inode_item *hi_nodes[8]; + unsigned long ino = 0; + int i, n; + + while (1) { + spin_lock(root-lock); + n = radix_tree_gang_lookup(root-hot_inode_tree, + (void **)hi_nodes, ino, + ARRAY_SIZE(hi_nodes)); + if (!n) { + spin_unlock(root-lock); + break; + } + + ino = hi_nodes[n - 1]-i_ino + 1; + for (i = 0; i n; i++) + hot_inode_item_put(hi_nodes[i]); + spin_unlock(root-lock); + } +} + /* * Initialize kmem cache for hot_inode_item and hot_range_item. */ @@ -107,3 +198,36 @@ err: kmem_cache_destroy(hot_inode_item_cachep); } EXPORT_SYMBOL_GPL(hot_cache_init); + +/* + * Initialize the data structures for hot data tracking. 
+ */ +int hot_track_init(struct super_block *sb) +{ + struct hot_info *root; + int ret = -ENOMEM; + + root = kzalloc(sizeof(struct hot_info), GFP_NOFS); + if (!root) { + printk(KERN_ERR %s: Failed to malloc memory for + hot_info\n, __func__); + return ret; + } + + sb-s_hot_root = root; + hot_inode_tree_init(root); + + printk(KERN_INFO VFS: Turning on hot data tracking\n); + + return 0; +} +EXPORT_SYMBOL_GPL(hot_track_init); + +void hot_track_exit(struct super_block *sb) +{ + struct hot_info *root = sb-s_hot_root; + + hot_inode_tree_exit(root); + kfree(root); +} +EXPORT_SYMBOL_GPL(hot_track_exit); diff --git a/include/linux/fs.h
[PATCH v1 hot_track 03/18] vfs: add I/O frequency update function
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Add some util helpers to update access frequencies for one file or its range. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c| 174 ++ fs/hot_tracking.h|5 + include/linux/hot_tracking.h |4 + 3 files changed, 183 insertions(+), 0 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index 3118c0b..1142ef1 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -173,6 +173,131 @@ static void hot_inode_tree_exit(struct hot_info *root) } } +struct hot_inode_item +*hot_inode_item_lookup(struct hot_info *root, unsigned long ino) +{ + struct hot_inode_item *he; + int ret; + +again: + spin_lock(root-lock); + he = radix_tree_lookup(root-hot_inode_tree, ino); + if (he) { + kref_get(he-hot_inode.refs); + spin_unlock(root-lock); + return he; + } + spin_unlock(root-lock); + + he = kmem_cache_zalloc(hot_inode_item_cachep, GFP_NOFS); + if (!he) + return ERR_PTR(-ENOMEM); + + hot_inode_item_init(he, ino, root-hot_inode_tree); + + ret = radix_tree_preload(GFP_NOFS ~__GFP_HIGHMEM); + if (ret) { + kmem_cache_free(hot_inode_item_cachep, he); + return ERR_PTR(ret); + } + + spin_lock(root-lock); + ret = radix_tree_insert(root-hot_inode_tree, ino, he); + if (ret == -EEXIST) { + kmem_cache_free(hot_inode_item_cachep, he); + spin_unlock(root-lock); + radix_tree_preload_end(); + goto again; + } + spin_unlock(root-lock); + radix_tree_preload_end(); + + kref_get(he-hot_inode.refs); + return he; +} +EXPORT_SYMBOL_GPL(hot_inode_item_lookup); + +static struct hot_range_item +*hot_range_item_lookup(struct hot_inode_item *he, + loff_t start) +{ + struct hot_range_item *hr; + int ret; + +again: + spin_lock(he-lock); + hr = radix_tree_lookup(he-hot_range_tree, start); + if (hr) { + kref_get(hr-hot_range.refs); + spin_unlock(he-lock); + return hr; + } + spin_unlock(he-lock); + + hr = kmem_cache_zalloc(hot_range_item_cachep, GFP_NOFS); + if (!hr) + return ERR_PTR(-ENOMEM); + + hot_range_item_init(hr, start, he); + + ret = 
radix_tree_preload(GFP_NOFS ~__GFP_HIGHMEM); + if (ret) { + kmem_cache_free(hot_range_item_cachep, hr); + return ERR_PTR(ret); + } + + spin_lock(he-lock); + ret = radix_tree_insert(he-hot_range_tree, start, hr); + if (ret == -EEXIST) { + kmem_cache_free(hot_range_item_cachep, hr); + spin_unlock(he-lock); + radix_tree_preload_end(); + goto again; + } + spin_unlock(he-lock); + radix_tree_preload_end(); + + kref_get(hr-hot_range.refs); + return hr; +} + +/* + * This function does the actual work of updating + * the frequency numbers, whatever they turn out to be. + */ +static void hot_rw_freq_calc(struct timespec old_atime, + struct timespec cur_time, u64 *avg) +{ + struct timespec delta_ts; + u64 new_delta; + + delta_ts = timespec_sub(cur_time, old_atime); + new_delta = timespec_to_ns(delta_ts) FREQ_POWER; + + *avg = (*avg FREQ_POWER) - *avg + new_delta; + *avg = *avg FREQ_POWER; +} + +static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write) +{ + struct timespec cur_time = current_kernel_time(); + + if (write) { + freq_data-nr_writes += 1; + hot_rw_freq_calc(freq_data-last_write_time, + cur_time, + freq_data-avg_delta_writes); + freq_data-last_write_time = cur_time; + } else { + freq_data-nr_reads += 1; + hot_rw_freq_calc(freq_data-last_read_time, + freq_data-last_read_time, + cur_time, + freq_data-avg_delta_reads); + freq_data-last_read_time = cur_time; + } +} + /* * Initialize kmem cache for hot_inode_item and hot_range_item. */ @@ -200,6 +325,55 @@ err: EXPORT_SYMBOL_GPL(hot_cache_init); /* + * Main function to update access frequency from read/writepage(s) hooks + */ +void hot_update_freqs(struct inode *inode, loff_t start, + size_t len, int rw) +{ + struct hot_info *root = inode-i_sb-s_hot_root; + struct hot_inode_item *he; + struct hot_range_item *hr; + loff_t cur, end; + + if (!root || (len == 0)) + return; + + he = hot_inode_item_lookup(root, inode-i_ino); + if (IS_ERR(he)) { +
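The averaging step in hot_rw_freq_calc() above can be tried out in userspace. This is a sketch under two assumptions: the shift operators, which did not survive in the text above, are taken to be right, left, right in that order, and FREQ_POWER is taken to be 4. The names rw_freq_calc_sketch() and FREQ_POWER_SKETCH are made up here, and a plain nanosecond delta replaces the timespec arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FREQ_POWER_SKETCH 4	/* assumed value; not given in this message */

/*
 * Fold one new access-time delta into a running average using shifts only:
 *   avg' = (avg * 2^P - avg + (delta >> P)) >> P
 */
static void rw_freq_calc_sketch(uint64_t delta_ns, uint64_t *avg)
{
	uint64_t new_delta;

	assert(avg != NULL);
	new_delta = delta_ns >> FREQ_POWER_SKETCH;
	*avg = (*avg << FREQ_POWER_SKETCH) - *avg + new_delta;
	*avg = *avg >> FREQ_POWER_SKETCH;
}
```

With P = 4 each update keeps 15/16 of the previous average and adds 1/16 of the already scaled-down sample, so a single unusual delta moves the average only slightly.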
[PATCH v1 hot_track 06/18] vfs: add temp calculation function
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/hot_tracking.c |   74 +
 1 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 7101b73..3fd6255 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -25,6 +25,14 @@
 static struct kmem_cache *hot_inode_item_cachep __read_mostly;
 static struct kmem_cache *hot_range_item_cachep __read_mostly;
 
+static u64 hot_raw_shift(u64 counter, u32 bits, bool dir)
+{
+	if (dir)
+		return counter << bits;
+	else
+		return counter >> bits;
+}
+
 /*
  * Initialize the inode tree. Should be called for each new inode
  * access or other user of the hot_inode interface.
@@ -319,6 +327,72 @@ static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
 }
 
 /*
+ * hot_temp_calc() is responsible for distilling the six heat
+ * criteria down into a single temperature value for the data,
+ * which is an integer between 0 and HEAT_MAX_VALUE.
+ */
+static u32 hot_temp_calc(struct hot_freq_data *freq_data)
+{
+	u32 result = 0;
+
+	struct timespec ckt = current_kernel_time();
+	u64 cur_time = timespec_to_ns(&ckt);
+
+	u32 nrr_heat = (u32)hot_raw_shift((u64)freq_data->nr_reads,
+					NRR_MULTIPLIER_POWER, true);
+	u32 nrw_heat = (u32)hot_raw_shift((u64)freq_data->nr_writes,
+					NRW_MULTIPLIER_POWER, true);
+
+	u64 ltr_heat =
+	hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_read_time)),
+			LTR_DIVIDER_POWER, false);
+	u64 ltw_heat =
+	hot_raw_shift((cur_time - timespec_to_ns(&freq_data->last_write_time)),
+			LTW_DIVIDER_POWER, false);
+
+	u64 avr_heat =
+		hot_raw_shift((((u64) -1) - freq_data->avg_delta_reads),
+			AVR_DIVIDER_POWER, false);
+	u64 avw_heat =
+		hot_raw_shift((((u64) -1) - freq_data->avg_delta_writes),
+			AVW_DIVIDER_POWER, false);
+
+	/* ltr_heat is now guaranteed to be u32 safe */
+	if (ltr_heat >= hot_raw_shift((u64) 1, 32, true))
+		ltr_heat = 0;
+	else
+		ltr_heat = hot_raw_shift((u64) 1, 32, true) - ltr_heat;
+
+	/* ltw_heat is now guaranteed to be u32 safe */
+	if (ltw_heat >= hot_raw_shift((u64) 1, 32, true))
+		ltw_heat = 0;
+	else
+		ltw_heat = hot_raw_shift((u64) 1, 32, true) - ltw_heat;
+
+	/* avr_heat is now guaranteed to be u32 safe */
+	if (avr_heat >= hot_raw_shift((u64) 1, 32, true))
+		avr_heat = (u32) -1;
+
+	/* avw_heat is now guaranteed to be u32 safe */
+	if (avw_heat >= hot_raw_shift((u64) 1, 32, true))
+		avw_heat = (u32) -1;
+
+	nrr_heat = (u32)hot_raw_shift((u64)nrr_heat,
+		(3 - NRR_COEFF_POWER), false);
+	nrw_heat = (u32)hot_raw_shift((u64)nrw_heat,
+		(3 - NRW_COEFF_POWER), false);
+	ltr_heat = hot_raw_shift(ltr_heat, (3 - LTR_COEFF_POWER), false);
+	ltw_heat = hot_raw_shift(ltw_heat, (3 - LTW_COEFF_POWER), false);
+	avr_heat = hot_raw_shift(avr_heat, (3 - AVR_COEFF_POWER), false);
+	avw_heat = hot_raw_shift(avw_heat, (3 - AVW_COEFF_POWER), false);
+
+	result = nrr_heat + nrw_heat + (u32) ltr_heat +
+		(u32) ltw_heat + (u32) avr_heat + (u32) avw_heat;
+
+	return result;
+}
+
+/*
  * Initialize inode and range map info.
  */
 static void hot_map_init(struct hot_info *root)
-- 
1.7.6.5
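For readers following the arithmetic in hot_temp_calc(), the shift helper and the "time since last access" clamp can be tried out in user space. What follows is only a minimal sketch under stated assumptions: last_access_heat() is a hypothetical name that does not appear in the patch, the divider power is passed in as a parameter because the LTR/LTW divider values are not shown in this message, and the kernel types are replaced with <stdint.h> ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the patch's hot_raw_shift(): dir selects left or right shift. */
static uint64_t hot_raw_shift(uint64_t counter, uint32_t bits, bool dir)
{
	assert(bits < 64);	/* shifting a 64-bit value by 64 or more is undefined in C */
	return dir ? counter << bits : counter >> bits;
}

/*
 * Hypothetical helper (not in the patch): map "nanoseconds since the last
 * access" to a heat value that is large for recent accesses and 0 once the
 * shifted age no longer fits below 2^32. The result stays 64-bit, as in the
 * patch, because an age of 0 yields exactly 2^32.
 */
static uint64_t last_access_heat(uint64_t age_ns, uint32_t divider_power)
{
	uint64_t limit = hot_raw_shift(1, 32, true);	/* 2^32 */
	uint64_t heat = hot_raw_shift(age_ns, divider_power, false);

	if (heat >= limit)
		return 0;
	return limit - heat;
}
```

Keeping the intermediate value in 64 bits matters here: casting the 2^32 case straight to a 32-bit type would wrap to 0, so the patch applies the coefficient shift before narrowing.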
[PATCH v1 hot_track 04/18] vfs: add two map info arrays
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Add two map arrays, each made up of a number of lists, that are used
to efficiently look up the data temperature of a file or its ranges.
Each entry in a map array heads one list and records the temperature
that the list stands for.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/hot_tracking.c            |   60 ++
 include/linux/hot_tracking.h |   16 +++
 2 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 1142ef1..7101b73 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -58,6 +58,7 @@ static void hot_range_item_init(struct hot_range_item *hr, loff_t start,
 	hr->hot_inode = he;
 	kref_init(&hr->hot_range.refs);
 	spin_lock_init(&hr->hot_range.lock);
+	INIT_LIST_HEAD(&hr->hot_range.n_list);
 	hr->hot_range.hot_freq_data.avg_delta_reads = (u64) -1;
 	hr->hot_range.hot_freq_data.avg_delta_writes = (u64) -1;
 	hr->hot_range.hot_freq_data.flags = FREQ_DATA_TYPE_RANGE;
@@ -89,6 +90,16 @@ static void hot_range_item_free(struct kref *kref)
 				struct hot_comm_item, refs);
 	struct hot_range_item *hr = container_of(comm_item,
 				struct hot_range_item, hot_range);
+	struct hot_info *root = container_of(
+				hr->hot_inode->hot_inode_tree,
+				struct hot_info, hot_inode_tree);
+
+	spin_lock(&hr->hot_range.lock);
+	if (!list_empty(&hr->hot_range.n_list)) {
+		list_del_init(&hr->hot_range.n_list);
+		root->hot_map_nr--;
+	}
+	spin_unlock(&hr->hot_range.lock);
 
 	radix_tree_delete(&hr->hot_inode->hot_range_tree, hr->start);
 	kmem_cache_free(hot_range_item_cachep, hr);
@@ -133,6 +144,15 @@ static void hot_inode_item_free(struct kref *kref)
 				struct hot_comm_item, refs);
 	struct hot_inode_item *he = container_of(comm_item,
 				struct hot_inode_item, hot_inode);
+	struct hot_info *root = container_of(he->hot_inode_tree,
+				struct hot_info, hot_inode_tree);
+
+	spin_lock(&he->hot_inode.lock);
+	if (!list_empty(&he->hot_inode.n_list)) {
+		list_del_init(&he->hot_inode.n_list);
+		root->hot_map_nr--;
+	}
+	spin_unlock(&he->hot_inode.lock);
 
 	hot_range_tree_free(he);
 	radix_tree_delete(he->hot_inode_tree, he->i_ino);
@@ -299,6 +319,44 @@ static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write)
 }
 
 /*
+ * Initialize inode and range map info.
+ */
+static void hot_map_init(struct hot_info *root)
+{
+	int i;
+	for (i = 0; i < HEAT_MAP_SIZE; i++) {
+		INIT_LIST_HEAD(&root->heat_inode_map[i].node_list);
+		INIT_LIST_HEAD(&root->heat_range_map[i].node_list);
+		root->heat_inode_map[i].temp = i;
+		root->heat_range_map[i].temp = i;
+	}
+}
+
+static void hot_map_list_free(struct list_head *node_list,
+				struct hot_info *root)
+{
+	struct list_head *pos, *next;
+	struct hot_comm_item *node;
+
+	list_for_each_safe(pos, next, node_list) {
+		node = list_entry(pos, struct hot_comm_item, n_list);
+		list_del_init(&node->n_list);
+		root->hot_map_nr--;
+	}
+
+}
+
+/* Free inode and range map info */
+static void hot_map_exit(struct hot_info *root)
+{
+	int i;
+	for (i = 0; i < HEAT_MAP_SIZE; i++) {
+		hot_map_list_free(&root->heat_inode_map[i].node_list, root);
+		hot_map_list_free(&root->heat_range_map[i].node_list, root);
+	}
+}
+
+/*
  * Initialize kmem cache for hot_inode_item and hot_range_item.
  */
 void __init hot_cache_init(void)
@@ -390,6 +448,7 @@ int hot_track_init(struct super_block *sb)
 	sb->s_hot_root = root;
 
 	hot_inode_tree_init(root);
+	hot_map_init(root);
 
 	printk(KERN_INFO "VFS: Turning on hot data tracking\n");
 
@@ -401,6 +460,7 @@ void hot_track_exit(struct super_block *sb)
 {
 	struct hot_info *root = sb->s_hot_root;
 
+	hot_map_exit(root);
 	hot_inode_tree_exit(root);
 	kfree(root);
 }
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 3797cf0..5227e89 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -20,6 +20,9 @@
 #include <linux/kref.h>
 #include <linux/fs.h>
 
+#define HEAT_MAP_BITS 8
+#define HEAT_MAP_SIZE (1 << HEAT_MAP_BITS)
+
 /*
  * A frequency data struct holds values that are used to
  * determine temperature of files and file ranges. These structs
@@ -36,11 +39,18 @@ struct hot_freq_data {
 	u32 last_temp;
 };
 
+/* List heads in hot map array */
+struct hot_map_head {
+	struct list_head node_list;
+	u8 temp;
+};
+
 /* The common info for both following structures */
 struct hot_comm_item {
 	struct hot_freq_data hot_freq_data; /* frequency data */
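The bucket layout set up by hot_map_init() can be pictured with a small user-space model. This is a sketch only: struct demo_map_head keeps an item count where the kernel structure keeps a struct list_head, and demo_bucket_for() is a hypothetical mapping from a 32-bit temperature to a bucket index (the real mapping belongs to a later patch in the series and is not shown in this message).

```c
#include <assert.h>
#include <stdint.h>

#define HEAT_MAP_BITS 8
#define HEAT_MAP_SIZE (1 << HEAT_MAP_BITS)

/* Simplified bucket: the kernel version holds a struct list_head, not a count. */
struct demo_map_head {
	unsigned int nr_items;
	uint8_t temp;
};

/* Mirrors hot_map_init(): bucket i represents temperature i. */
static void demo_map_init(struct demo_map_head map[HEAT_MAP_SIZE])
{
	for (int i = 0; i < HEAT_MAP_SIZE; i++) {
		map[i].nr_items = 0;
		map[i].temp = (uint8_t)i;	/* 0..255 fits exactly in 8 bits */
	}
}

/* Assumed mapping (not from this patch): keep the top HEAT_MAP_BITS bits. */
static unsigned int demo_bucket_for(uint32_t temperature)
{
	unsigned int idx = temperature >> (32 - HEAT_MAP_BITS);

	assert(idx < HEAT_MAP_SIZE);
	return idx;
}
```

With HEAT_MAP_BITS set to 8 there are 256 buckets, which is why a u8 is enough for the temp field in struct hot_map_head.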
[PATCH v1 hot_track 12/18] vfs: add one ioctl interface
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

FS_IOC_GET_HEAT_INFO: return a struct containing the various
metrics collected in hot_freq_data structs, and also return a
calculated data temperature based on those metrics. Optionally, retrieve
the temperature from the hot data hash list instead of recalculating it.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/compat_ioctl.c            |    5 +++
 fs/ioctl.c                   |   74 ++
 include/linux/hot_tracking.h |   19 +++
 3 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 4c6285f..ad1d603 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -57,6 +57,7 @@
 #include <linux/i2c-dev.h>
 #include <linux/atalk.h>
 #include <linux/gfp.h>
+#include <linux/hot_tracking.h>
 
 #include <net/bluetooth/bluetooth.h>
 #include <net/bluetooth/hci.h>
@@ -1400,6 +1401,9 @@ COMPATIBLE_IOCTL(TIOCSTART)
 COMPATIBLE_IOCTL(TIOCSTOP)
 #endif
 
+/*Hot data tracking*/
+COMPATIBLE_IOCTL(FS_IOC_GET_HEAT_INFO)
+
 /* fat 'r' ioctls. These are handled by fat with ->compat_ioctl,
    but we don't want warnings on other file systems. So declare
    them as compatible here. */
@@ -1579,6 +1583,7 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, unsigned int cmd,
 	case FIBMAP:
 	case FIGETBSZ:
 	case FIONREAD:
+	case FS_IOC_GET_HEAT_INFO:
 		if (S_ISREG(f.file->f_path.dentry->d_inode->i_mode))
 			break;
 		/*FALL THROUGH*/
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 3bdad6d..79fe81f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -15,6 +15,7 @@
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
 #include <linux/falloc.h>
+#include <linux/hot_tracking.h>
 
 #include <asm/ioctls.h>
 
@@ -537,6 +538,76 @@ static int ioctl_fsthaw(struct file *filp)
 }
 
 /*
+ * Retrieve information about access frequency for the given file. Return it in
+ * a userspace-friendly struct for btrfsctl (or another tool) to parse.
+ *
+ * The temperature that is returned can be live -- that is, recalculated when
+ * the ioctl is called -- or it can be returned from the hashtable, reflecting
+ * the (possibly old) value that the system will use when considering files
+ * for migration. This behavior is determined by hot_heat_info->live.
+ */
+static int ioctl_heat_info(struct file *file, void __user *argp)
+{
+	struct inode *inode = file->f_dentry->d_inode;
+	struct hot_heat_info heat_info;
+	struct hot_inode_item *he;
+	int ret = 0;
+
+	if (copy_from_user((void *)&heat_info,
+			argp,
+			sizeof(struct hot_heat_info)) != 0) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	he = hot_inode_item_lookup(inode->i_sb->s_hot_root, inode->i_ino);
+	if (!he) {
+		/* we don't have any info on this file yet */
+		ret = -ENODATA;
+		goto err;
+	}
+
+	spin_lock(&he->hot_inode.lock);
+	heat_info.avg_delta_reads =
+		(__u64) he->hot_inode.hot_freq_data.avg_delta_reads;
+	heat_info.avg_delta_writes =
+		(__u64) he->hot_inode.hot_freq_data.avg_delta_writes;
+	heat_info.last_read_time =
+	(__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_read_time);
+	heat_info.last_write_time =
+	(__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_write_time);
+	heat_info.num_reads =
+		(__u32) he->hot_inode.hot_freq_data.nr_reads;
+	heat_info.num_writes =
+		(__u32) he->hot_inode.hot_freq_data.nr_writes;
+
+	if (heat_info.live > 0) {
+		/*
+		 * got a request for live temperature,
+		 * call hot_hash_calc_temperature to recalculate
+		 */
+		heat_info.temp =
+		inode->i_sb->s_hot_root->hot_type->ops.hot_temp_calc_fn(
+				&he->hot_inode.hot_freq_data);
+	} else {
+		/* not live temperature, get it from the hashlist */
+		heat_info.temp = he->hot_inode.hot_freq_data.last_temp;
+	}
+	spin_unlock(&he->hot_inode.lock);
+
+	hot_inode_item_put(he);
+
+	if (copy_to_user(argp, (void *)&heat_info,
+			sizeof(struct hot_heat_info))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+err:
+	return ret;
+}
+
+/*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
  *
@@ -591,6 +662,9 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 	case FIGETBSZ:
 		return put_user(inode->i_sb->s_blocksize, argp);
 
+	case FS_IOC_GET_HEAT_INFO:
+		return ioctl_heat_info(filp, argp);
+
 	default:
 		if (S_ISREG(inode->i_mode))
 			error = file_ioctl(filp, cmd, arg);
diff --git
[PATCH v1 hot_track 11/18] vfs: register one shrinker
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Register a shrinker to control the amount of memory that
is used in tracking hot regions - if we are throwing inodes
out of memory due to memory pressure, we most definitely are
going to need to reduce the amount of memory the tracking
code is using, even if it means losing useful information
(i.e. the shrinker accelerates the aging process).

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/hot_tracking.c            |   61 ++
 include/linux/hot_tracking.h |    1 +
 2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 06127cf..3d96512 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -648,6 +648,61 @@ err:
 }
 EXPORT_SYMBOL_GPL(hot_cache_init);
 
+static int hot_track_prune_map(struct hot_map_head *map_head,
+				bool type, int nr)
+{
+	struct hot_comm_item *node;
+	int i;
+
+	for (i = 0; i < HEAT_MAP_SIZE; i++) {
+		while (!list_empty(&(map_head + i)->node_list)) {
+			if (nr-- <= 0)
+				break;
+
+			node = list_first_entry(&(map_head + i)->node_list,
+					struct hot_comm_item, n_list);
+			if (type) {
+				struct hot_inode_item *hot_inode =
+					container_of(node,
+					struct hot_inode_item, hot_inode);
+				hot_inode_item_put(hot_inode);
+			} else {
+				struct hot_range_item *hot_range =
+					container_of(node,
+					struct hot_range_item, hot_range);
+				hot_range_item_put(hot_range);
+			}
+		}
+	}
+
+	return nr;
+}
+
+/* The shrinker callback function */
+static int hot_track_prune(struct shrinker *shrink,
+				struct shrink_control *sc)
+{
+	struct hot_info *root =
+		container_of(shrink, struct hot_info, hot_shrink);
+	int ret;
+
+	if (sc->nr_to_scan == 0)
+		return root->hot_map_nr;
+
+	if (!(sc->gfp_mask & __GFP_FS))
+		return -1;
+
+	ret = hot_track_prune_map(root->heat_range_map,
+				false, sc->nr_to_scan);
+	if (ret > 0)
+		ret = hot_track_prune_map(root->heat_inode_map,
+				true, ret);
+	if (ret > 0)
+		root->hot_map_nr -= (sc->nr_to_scan - ret);
+
+	return root->hot_map_nr;
+}
+
 /*
  * Main function to update access frequency from read/writepage(s) hooks
  */
@@ -745,6 +800,11 @@ int hot_track_init(struct super_block *sb)
 	queue_delayed_work(root->update_wq, &root->update_work,
 		msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
 
+	/* Register a shrinker callback */
+	root->hot_shrink.shrink = hot_track_prune;
+	root->hot_shrink.seeks = DEFAULT_SEEKS;
+	register_shrinker(&root->hot_shrink);
+
 	printk(KERN_INFO "VFS: Turning on hot data tracking\n");
 
 	return 0;
@@ -761,6 +821,7 @@ void hot_track_exit(struct super_block *sb)
 {
 	struct hot_info *root = sb->s_hot_root;
 
+	unregister_shrinker(&root->hot_shrink);
 	cancel_delayed_work_sync(&root->update_work);
 	destroy_workqueue(root->update_wq);
 	hot_map_exit(root);
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 67fc0bf..43d9b53 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -104,6 +104,7 @@ struct hot_info {
 	struct workqueue_struct *update_wq;
 	struct delayed_work update_work;
 	struct hot_type *hot_type;
+	struct shrinker hot_shrink;
 };
 
 extern void __init hot_cache_init(void);
-- 
1.7.6.5
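The pruning order used by the shrinker callback (ranges first, then inodes, walking buckets from index 0 upward under a scan budget) can be modelled in user space. This is a simplified sketch, not the patch's code: struct bucket_counts, prune_map() and prune_all() are hypothetical names, each bucket is reduced to an item count, and the budget handling is deliberately cleaner than the `nr-- <= 0` test in hot_track_prune_map(), which lets the counter go negative.

```c
#include <assert.h>
#include <stddef.h>

#define HEAT_MAP_SIZE 256

/* Simplified stand-in for a heat map: only the item count per bucket. */
struct bucket_counts {
	int nr_items[HEAT_MAP_SIZE];
};

/*
 * Drop up to `budget` items, lowest-temperature bucket (index 0) first.
 * Returns the unused part of the budget, which is never negative here.
 */
static int prune_map(struct bucket_counts *map, int budget)
{
	assert(map != NULL && budget >= 0);
	for (size_t i = 0; i < HEAT_MAP_SIZE && budget > 0; i++) {
		while (map->nr_items[i] > 0 && budget > 0) {
			map->nr_items[i]--;
			budget--;
		}
	}
	return budget;
}

/* Ranges are pruned before inodes, as in hot_track_prune(). Returns items freed. */
static int prune_all(struct bucket_counts *ranges,
		     struct bucket_counts *inodes, int nr_to_scan)
{
	int left = prune_map(ranges, nr_to_scan);

	if (left > 0)
		left = prune_map(inodes, left);
	return nr_to_scan - left;
}
```

Returning the number of items actually freed makes it straightforward to keep a total such as hot_map_nr in step with the lists, whichever map the budget runs out in.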
[PATCH v1 hot_track 15/18] btrfs: add hot tracking support
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Introduce one new mount option '-o hot_track',
and add its parsing support.
Its usage looks like:
   mount -o hot_track
   mount -o nouser,hot_track
   mount -o nouser,hot_track,loop
   mount -o hot_track,nouser

Reviewed-by: David Sterba <dste...@suse.cz>
Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/btrfs/ctree.h |    1 +
 fs/btrfs/super.c |   22 +-
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c72ead8..4703178 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1756,6 +1756,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
+#define BTRFS_MOUNT_HOT_TRACK		(1 << 23)
 
 #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 915ac14..0bcc62b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -41,6 +41,7 @@
 #include <linux/slab.h>
 #include <linux/cleancache.h>
 #include <linux/ratelimit.h>
+#include <linux/hot_tracking.h>
 #include "compat.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -299,6 +300,10 @@ static void btrfs_put_super(struct super_block *sb)
 	 * last process that kept it busy. Or segfault in the aforementioned
 	 * process... Whom would you report that to?
 	 */
+
+	/* Hot data tracking */
+	if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
+		hot_track_exit(sb);
 }
 
 enum {
@@ -311,7 +316,7 @@ enum {
 	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
-	Opt_check_integrity_print_mask, Opt_fatal_errors,
+	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_hot_track,
 	Opt_err,
 };
 
@@ -352,6 +357,7 @@ static match_table_t tokens = {
 	{Opt_check_integrity_including_extent_data, "check_int_data"},
 	{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
 	{Opt_fatal_errors, "fatal_errors=%s"},
+	{Opt_hot_track, "hot_track"},
 	{Opt_err, NULL},
 };
 
@@ -614,6 +620,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				goto out;
 			}
 			break;
+		case Opt_hot_track:
+			btrfs_set_opt(info->mount_opt, HOT_TRACK);
+			break;
 		case Opt_err:
 			printk(KERN_INFO "btrfs: unrecognized mount option "
 			       "'%s'\n", p);
@@ -841,11 +850,20 @@ static int btrfs_fill_super(struct super_block *sb,
 		goto fail_close;
 	}
 
+	if (btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
+		err = hot_track_init(sb);
+		if (err)
+			goto fail_hot;
+	}
+
 	save_mount_options(sb, data);
 	cleancache_init_fs(sb);
 	sb->s_flags |= MS_ACTIVE;
 	return 0;
 
+fail_hot:
+	dput(sb->s_root);
+	sb->s_root = NULL;
 fail_close:
 	close_ctree(fs_info->tree_root);
 	return err;
@@ -941,6 +959,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",skip_balance");
 	if (btrfs_test_opt(root, PANIC_ON_FATAL_ERROR))
 		seq_puts(seq, ",fatal_errors=panic");
+	if (btrfs_test_opt(root, HOT_TRACK))
+		seq_puts(seq, ",hot_track");
 	return 0;
 }
 
-- 
1.7.6.5
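The mount option boils down to one bit in a flags word that is set while parsing and tested at mount, unmount and show_options time. A minimal user-space sketch of that pattern follows; the demo_* names are hypothetical stand-ins, and unlike the real btrfs_test_opt(), which takes a btrfs_root, demo_test_opt() works directly on the flags word.

```c
#include <assert.h>
#include <string.h>

/* Same bit position as BTRFS_MOUNT_HOT_TRACK in the patch. */
#define DEMO_MOUNT_HOT_TRACK	(1UL << 23)

#define demo_set_opt(o, opt)	((o) |= DEMO_MOUNT_##opt)
#define demo_clear_opt(o, opt)	((o) &= ~DEMO_MOUNT_##opt)
#define demo_test_opt(o, opt)	(((o) & DEMO_MOUNT_##opt) != 0)

/* Hypothetical one-token parser: only "hot_track" is recognised. */
static unsigned long demo_parse_token(unsigned long opts, const char *token)
{
	assert(token != NULL);
	if (strcmp(token, "hot_track") == 0)
		demo_set_opt(opts, HOT_TRACK);
	return opts;
}
```

Because the option is a plain bit, `mount -o nouser,hot_track` and `mount -o hot_track,nouser` end up with the same flags word regardless of token order.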
[PATCH v1 hot_track 17/18] ext4: add hot tracking support
From: Zheng Liu <wenqing...@taobao.com>

Define a new mount option to add VFS hot tracking support in order to use it in
ext4.

CC: Zhi Yong Wu <zwu.ker...@gmail.com>
Signed-off-by: Zheng Liu <wenqing...@taobao.com>
---
 fs/ext4/ext4.h  |    3 +++
 fs/ext4/super.c |   13 -
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c20de1..f6cff1e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1298,6 +1298,9 @@ struct ext4_sb_info {
 
 	/* Precomputed FS UUID checksum for seeding other checksums */
 	__u32 s_csum_seed;
+
+	/* Enable hot tracking or not */
+	int s_hottrack_enable;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 80928f7..ba9f376 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -864,6 +864,8 @@ static void ext4_put_super(struct super_block *sb)
 	ext4_ext_release(sb);
 	ext4_xattr_put_super(sb);
 
+	if (sbi->s_hottrack_enable)
+		hot_track_exit(sb);
 	if (!(sb->s_flags & MS_RDONLY)) {
 		EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
 		es->s_state = cpu_to_le16(sbi->s_mount_state);
@@ -1222,7 +1224,7 @@ enum {
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
-	Opt_max_dir_size_kb,
+	Opt_max_dir_size_kb, Opt_hottrack,
 };
 
 static const match_table_t tokens = {
@@ -1297,6 +1299,7 @@ static const match_table_t tokens = {
 	{Opt_init_itable, "init_itable"},
 	{Opt_noinit_itable, "noinit_itable"},
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
+	{Opt_hottrack, "hot_track"},
 	{Opt_removed, "check=none"},	/* mount option from ext2/3 */
 	{Opt_removed, "nocheck"},	/* mount option from ext2/3 */
 	{Opt_removed, "reservation"},	/* mount option from ext2/3 */
@@ -1595,6 +1598,14 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		sbi->s_li_wait_mult = arg;
 	} else if (token == Opt_max_dir_size_kb) {
 		sbi->s_max_dir_size_kb = arg;
+	} else if (token == Opt_hottrack) {
+		if (hot_track_init(sb)) {
+			ext4_msg(sb, KERN_ERR,
+				"EXT4-fs: hot tracking initialization "
+				"failed");
+			return -1;
+		}
+		sbi->s_hottrack_enable = 1;
 	} else if (token == Opt_stripe) {
 		sbi->s_stripe = arg;
 	} else if (m->flags & MOPT_DATAJ) {
-- 
1.7.6.5
[PATCH v1 hot_track 00/18] vfs: hot data tracking
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Hi guys,

Any comments or ideas are appreciated, thanks.

NOTE:

The patchset can be obtained via my kernel dev git on github:
g...@github.com:wuzhy/kernel.git hot_tracking
If you're interested, you can also review them via
https://github.com/wuzhy/kernel/commits/hot_tracking

For more info, please check hot_tracking.txt in Documentation

TODO List:

 1.) Need to do scalability or performance tests. - Required
 2.) Need one simpler but efficient temp calculation function
 3.) How to save file temperatures across umount so that they are
     preserved after a reboot - Optional

Changelog:

 - Embed struct hot_type in struct file_system_type [Darrick J. Wong]
 - Clean up some issues [David Sterba]
 - Use a static hot debugfs root [Greg KH]
 - Rewritten debugfs support based on seq_file operation. [Dave Chinner]
 - Refactored workqueue support. [Dave Chinner]
 - Make some macros tunable [Zhiyong, Zheng Liu]
       TIME_TO_KICK, and HEAT_UPDATE_DELAY
 - Introduce hot func registering framework [Zhiyong]
 - Remove global variable for hot tracking [Zhiyong]
 - Add xfs hot tracking support [Dave Chinner]
 - Add ext4 hot tracking support [Zheng Liu]
 - Cleaned up a lot of other issues [Dave Chinner]
 - Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
 - Added memory shrinker [Dave Chinner]
 - Converted to one workqueue to update map info periodically [Dave Chinner]
 - Cleaned up a lot of other issues [Dave Chinner]
 - Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
 - Add btrfs hot tracking support [Zhiyong]
 - The first three patches can probably just be flattened into one.
   [Marco Stornelli, Dave Chinner]

Dave Chinner (1):
  xfs: add hot tracking support

Zheng Liu (1):
  ext4: add hot tracking support

Zhi Yong Wu (16):
  vfs: introduce some data structures
  vfs: add init and cleanup functions
  vfs: add I/O frequency update function
  vfs: add two map info arrays
  vfs: add hooks to enable hot tracking
  vfs: add temp calculation function
  vfs: add map info update function
  vfs: add aging function
  vfs: add one work queue
  vfs: add FS hot type support
  vfs: register one shrinker
  vfs: add one ioctl interface
  vfs: add debugfs support
  procfs: add two hot_track proc files
  btrfs: add hot tracking support
  vfs: add documentation

 Documentation/filesystems/00-INDEX         |    2 +
 Documentation/filesystems/hot_tracking.txt |  263 ++
 fs/Makefile                                |    2 +-
 fs/btrfs/ctree.h                           |    1 +
 fs/btrfs/super.c                           |   22 +-
 fs/compat_ioctl.c                          |    5 +
 fs/dcache.c                                |    2 +
 fs/direct-io.c                             |    6 +
 fs/ext4/ext4.h                             |    3 +
 fs/ext4/super.c                            |   13 +-
 fs/hot_tracking.c                          | 1321
 fs/hot_tracking.h                          |   53 ++
 fs/ioctl.c                                 |   74 ++
 fs/xfs/xfs_mount.h                         |    1 +
 fs/xfs/xfs_super.c                         |   16 +
 include/linux/fs.h                         |    5 +
 include/linux/hot_tracking.h               |  146 +++
 kernel/sysctl.c                            |   14 +
 mm/filemap.c                               |    6 +
 mm/page-writeback.c                        |   12 +
 mm/readahead.c                             |    7 +
 21 files changed, 1971 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt
 create mode 100644 fs/hot_tracking.c
 create mode 100644 fs/hot_tracking.h
 create mode 100644 include/linux/hot_tracking.h

-- 
1.7.6.5
[PATCH v1 hot_track 18/18] vfs: add documentation
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Add one doc for VFS hot tracking feature

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 Documentation/filesystems/00-INDEX         |    2 +
 Documentation/filesystems/hot_tracking.txt |  263
 2 files changed, 265 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8c624a1..b68bdff 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -118,3 +118,5 @@ xfs.txt
 	- info and mount options for the XFS filesystem.
 xip.txt
 	- info on execute-in-place for file mappings.
+hot_tracking.txt
+	- info on hot data tracking in VFS layer
diff --git a/Documentation/filesystems/hot_tracking.txt b/Documentation/filesystems/hot_tracking.txt
new file mode 100644
index 000..0adc524
--- /dev/null
+++ b/Documentation/filesystems/hot_tracking.txt
@@ -0,0 +1,263 @@
+Hot Data Tracking
+
+September, 2012		Zhi Yong Wu <wu...@linux.vnet.ibm.com>
+
+CONTENTS
+
+1. Introduction
+2. Motivation
+3. The Design
+4. How to Calc Frequency of Reads/Writes & Temperature
+5. Git Development Tree
+6. Usage Example
+
+
+1. Introduction
+
+  The feature adds experimental support for tracking data temperature
+information in VFS layer.  Essentially, this means maintaining some key
+stats (like number of reads/writes, last read/write time, frequency of
+reads/writes), then distilling those numbers down to a single
+"temperature" value that reflects what data is "hot", and using that
+temperature to move data to SSDs.
+
+  The long-term goal of the feature is to allow some FSs,
+e.g. Btrfs, to intelligently utilize SSDs in a heterogeneous volume.
+Incidentally, this project has been motivated by
+the Project Ideas page on the Btrfs wiki.
+
+  Of course, users are warned not to run this code outside of development
+environments. These patches are EXPERIMENTAL, and as such they might eat
+your data and/or memory. That said, the code should be relatively safe
+when the hot_track mount option is disabled.
+
+
+2. Motivation
+
+  The overall goal of enabling hot data relocation to SSD has been
+motivated by the Project Ideas page on the Btrfs wiki at
+https://btrfs.wiki.kernel.org/index.php/Project_ideas.
+The work is divided into two steps: VFS provides the hot data tracking
+function, while each specific FS provides the hot data relocation function.
+So as the first step of this goal, it is hoped that the patchset
+for hot data tracking will eventually mature into VFS.
+
+  This is essentially the traditional cache argument: SSD is fast and
+expensive; HDD is cheap but slow. ZFS, for example, can already take
+advantage of SSD caching. Btrfs should also be able to take advantage of
+hybrid storage without many broad, sweeping changes to existing code.
+
+
+3. The Design
+
+The design includes the following parts:
+
+* Hooks in existing vfs functions to track data access frequency
+
+* New radix-trees for tracking access frequency of inodes and sub-file
+ranges
+The relationship between super_block and radix-tree is as below:
+hot_info.hot_inode_tree
+Each FS instance can find its hot tracking info via s_hot_root in its
+super_block. This hot_info stores the hot tracking state, such as
+hot_inode_tree and the inode and range lists.
+
+* A list for indexing data by its temperature
+
+* A debugfs interface for dumping data from the radix-trees
+
+* A background kthread for updating inode heat info
+
+* Mount options for enabling temperature tracking (-o hot_track;
+disabled by default)
+* An ioctl to retrieve the frequency information collected for a certain
+file
+* Ioctls to enable/disable frequency tracking per inode.
+
+Let us see their relationship as below:
+
+* hot_info.hot_inode_tree indexes hot_inode_items, one per inode
+
+* hot_inode_item contains access frequency data for that inode
+
+* hot_inode_item holds a heat list node to index the access
+frequency data for that inode
+
+* hot_inode_item.hot_range_tree indexes hot_range_items for that inode
+
+* hot_range_item contains access frequency data for that range
+
+* hot_range_item holds a heat list node to index the access
+frequency data for that range
+
+* hot_info.heat_inode_map indexes per-inode heat list nodes
+
+* hot_info.heat_range_map indexes per-range heat list nodes
+
+  How about some ascii art? :) Just looking at the hot inode item case
+(the range item case is the same pattern, though), we have:
+
+[ASCII diagram, garbled and truncated in this copy: heat_inode_map and
+ hot_inode_tree both lead to hot_comm_item nodes, each holding frequency
+ data and a list_head]
[PATCH v1 hot_track 16/18] xfs: add hot tracking support
From: Dave Chinner <dchin...@redhat.com>

Connect up the VFS hot tracking support so XFS filesystems can make
use of it.

Signed-off-by: Dave Chinner <dchin...@redhat.com>
---
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |   16
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index deee09e..96d93c2 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -217,6 +217,7 @@ typedef struct xfs_mount {
 #define XFS_MOUNT_WSYNC		(1ULL << 0)	/* for nfs - all metadata ops
 						   must be synchronous except
 						   for space allocations */
+#define XFS_MOUNT_HOTTRACK	(1ULL << 1)	/* hot inode tracking */
 #define XFS_MOUNT_WAS_CLEAN	(1ULL << 3)
 #define XFS_MOUNT_FS_SHUTDOWN	(1ULL << 4)	/* atomic stop of all filesystem
 						   operations, typically for
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 26a09bd..48b3bed 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -61,6 +61,7 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 #include <linux/parser.h>
+#include <linux/hot_tracking.h>
 
 static const struct super_operations xfs_super_operations;
 static kmem_zone_t *xfs_ioend_zone;
@@ -114,6 +115,7 @@ mempool_t *xfs_ioend_pool;
 #define MNTOPT_NODELAYLOG  "nodelaylog"	/* Delayed logging disabled */
 #define MNTOPT_DISCARD	   "discard"	/* Discard unused blocks */
 #define MNTOPT_NODISCARD   "nodiscard"	/* Do not discard unused blocks */
+#define MNTOPT_HOTTRACK    "hot_track"	/* hot inode tracking */
 
 /*
  * Table driven mount option parser.
@@ -371,6 +373,8 @@ xfs_parseargs(
 			mp->m_flags |= XFS_MOUNT_DISCARD;
 		} else if (!strcmp(this_char, MNTOPT_NODISCARD)) {
 			mp->m_flags &= ~XFS_MOUNT_DISCARD;
+		} else if (!strcmp(this_char, MNTOPT_HOTTRACK)) {
+			mp->m_flags |= XFS_MOUNT_HOTTRACK;
 		} else if (!strcmp(this_char, "ihashsize")) {
 			xfs_warn(mp,
 	"ihashsize no longer used, option is deprecated.");
@@ -1005,6 +1009,9 @@ xfs_fs_put_super(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
+	if (mp->m_flags & XFS_MOUNT_HOTTRACK)
+		hot_track_exit(sb);
+
 	xfs_filestream_unmount(mp);
 	cancel_delayed_work_sync(&mp->m_sync_work);
 	xfs_unmountfs(mp);
@@ -1407,7 +1414,16 @@ xfs_fs_fill_super(
 		goto out_unmount;
 	}
 
+	if (mp->m_flags & XFS_MOUNT_HOTTRACK) {
+		error = hot_track_init(sb);
+		if (error)
+			goto out_free_root;
+	}
+
 	return 0;
+ out_free_root:
+	dput(sb->s_root);
+	sb->s_root = NULL;
  out_syncd_stop:
 	xfs_syncd_stop(mp);
  out_filestream_unmount:
-- 
1.7.6.5
[PATCH v1 hot_track 14/18] procfs: add two hot_track proc files
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Add two proc files hot-kick-time and hot-update-delay
under the dir /proc/sys/fs/ so that TIME_TO_KICK and
HEAT_UPDATE_DELAY become tunable.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/hot_tracking.c            |   12 +---
 fs/hot_tracking.h            |    9 -
 include/linux/hot_tracking.h |    7 +++
 kernel/sysctl.c              |   14 ++
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index db430e8..2e5bf6c 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -27,6 +27,12 @@
 
 static struct dentry *hot_debugfs_root;
 
+int sysctl_hot_kick_time __read_mostly = 300;
+EXPORT_SYMBOL_GPL(sysctl_hot_kick_time);
+
+int sysctl_hot_update_delay __read_mostly = 300;
+EXPORT_SYMBOL_GPL(sysctl_hot_update_delay);
+
 /* kmem_cache pointers for slab caches */
 static struct kmem_cache *hot_inode_item_cachep __read_mostly;
 static struct kmem_cache *hot_range_item_cachep __read_mostly;
@@ -413,7 +419,7 @@ static bool hot_is_obsolete(struct hot_freq_data *freq_data)
 	(cur_time - timespec_to_ns(&freq_data->last_read_time));
 	u64 last_write_ns =
 	(cur_time - timespec_to_ns(&freq_data->last_write_time));
-	u64 kick_ns =  TIME_TO_KICK * NSEC_PER_SEC;
+	u64 kick_ns =  sysctl_hot_kick_time * NSEC_PER_SEC;
 
 	if ((last_read_ns > kick_ns) && (last_write_ns > kick_ns))
 		ret = 1;
@@ -622,7 +628,7 @@ static void hot_update_worker(struct work_struct *work)
 
 	/* Instert next delayed work */
 	queue_delayed_work(root->update_wq, &root->update_work,
-		msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+		msecs_to_jiffies(sysctl_hot_update_delay * MSEC_PER_SEC));
 }
 
 /*
@@ -1270,7 +1276,7 @@ int hot_track_init(struct super_block *sb)
 	/* Initialize hot tracking wq and arm one delayed work */
 	INIT_DELAYED_WORK(&root->update_work, hot_update_worker);
 	queue_delayed_work(root->update_wq, &root->update_work,
-		msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC));
+		msecs_to_jiffies(sysctl_hot_update_delay * MSEC_PER_SEC));
 
 	/* Register a shrinker callback */
 	root->hot_shrink.shrink = hot_track_prune;
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index 1a445a6..50c5681 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -24,15 +24,6 @@
 #define RANGE_BITS 20
 #define FREQ_POWER 4
 
-/*
- * time to quit keeping track of
- * tracking data (seconds)
- */
-#define TIME_TO_KICK 300
-
-/* set how often to update temperatures (seconds) */
-#define HEAT_UPDATE_DELAY 300
-
 /* NRR/NRW heat unit = 2^X accesses */
 #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
 #define NRR_COEFF_POWER 0
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index d9a49f9..96c91e3 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -120,6 +120,13 @@ struct hot_info {
 };
 
 /*
+ * Two variables have meanings as below:
+ * 1. time to quit keeping track of tracking data (seconds)
+ * 2. set how often to update temperatures (seconds)
+ */
+extern int sysctl_hot_kick_time, sysctl_hot_update_delay;
+
+/*
  * Hot data tracking ioctls:
  *
  * HOT_INFO - retrieve info on frequency of access
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 26f65ea..37624fb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1545,6 +1545,20 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= &pipe_proc_fn,
 		.extra1		= &pipe_min_size,
 	},
+	{
+		.procname	= "hot-kick-time",
+		.data		= &sysctl_hot_kick_time,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "hot-update-delay",
+		.data		= &sysctl_hot_update_delay,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
 
-- 
1.7.6.5
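Once the kick time is a run-time int rather than a compile-time constant, the seconds-to-nanoseconds conversion in hot_is_obsolete() deserves a second look, because the width of the multiplication then depends on the types involved. The sketch below shows one way to keep the arithmetic in 64 bits; it is an assumption-laden user-space model, not the patch's code: kick_time_to_ns() and is_obsolete() are hypothetical names, NSEC_PER_SEC is redefined locally as an unsigned 64-bit constant, and the non-negativity check reflects the fact that proc_dointvec on its own does not enforce a range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert a tunable number of seconds to nanoseconds in 64-bit arithmetic. */
static uint64_t kick_time_to_ns(int kick_time_sec)
{
	assert(kick_time_sec >= 0);	/* a plain int sysctl could also hold a negative value */
	return (uint64_t)kick_time_sec * NSEC_PER_SEC;
}

/* Same test as hot_is_obsolete(): stale only if both the last read and the last write are old. */
static bool is_obsolete(uint64_t now_ns, uint64_t last_read_ns,
			uint64_t last_write_ns, int kick_time_sec)
{
	uint64_t kick_ns = kick_time_to_ns(kick_time_sec);

	return (now_ns - last_read_ns > kick_ns) &&
	       (now_ns - last_write_ns > kick_ns);
}
```

With the default of 300 seconds the threshold is 300,000,000,000 ns, which already exceeds what a 32-bit integer can hold.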
Re: [PATCHv8 0/3]virtio_console: Add rproc_serial driver
On (Tue) 30 Oct 2012 [09:51:50], Sjur Brændeland wrote:
> From: Sjur Brændeland <sjur.brandel...@stericsson.com>
>
> This patch-set introduces a new virtio type "rproc_serial" for
> communicating with remote processors over shared memory. The driver
> depends on the remoteproc framework. As preparation for introducing
> "rproc_serial" I've done a refactoring of the transmit buffer handling.

Thanks, Sjur.

Please pick the virtio spec from

https://github.com/rustyrussell/virtio-spec

and update the spec with info for remote-proc.

Thanks,
		Amit
[PATCH v1 hot_track 13/18] vfs: add debugfs support
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Add a /sys/kernel/debug/hot_track/device_name/ directory for each volume that contains two files. The first, `inode_stats', contains the heat information for inodes that have been brought into the hot data map structures. The second, `range_stats', contains similar information for subfile ranges. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c| 484 ++ fs/hot_tracking.h|5 + include/linux/hot_tracking.h |1 + 3 files changed, 490 insertions(+), 0 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index 3d96512..db430e8 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -21,9 +21,12 @@ #include linux/blkdev.h #include linux/types.h #include linux/list_sort.h +#include linux/debugfs.h #include linux/limits.h #include hot_tracking.h +static struct dentry *hot_debugfs_root; + /* kmem_cache pointers for slab caches */ static struct kmem_cache *hot_inode_item_cachep __read_mostly; static struct kmem_cache *hot_range_item_cachep __read_mostly; @@ -623,6 +626,475 @@ static void hot_update_worker(struct work_struct *work) } /* + * take the inode, find ranges associated with inode + * and print each range data struct + */ +static struct hot_range_item +*hot_range_tree_walk(struct hot_inode_item *he, + loff_t *pos, loff_t start, bool flag) +{ + struct hot_range_item *hr_nodes[8]; + loff_t l = *pos; + int i, n; + + /* Walk the hot_range_tree for inode */ + while (1) { + spin_lock(he-lock); + n = radix_tree_gang_lookup(he-hot_range_tree, + (void **)hr_nodes, start, + ARRAY_SIZE(hr_nodes)); + if (!n) { + spin_unlock(he-lock); + break; + } + spin_unlock(he-lock); + + start = hr_nodes[n - 1]-start + 1; + for (i = 0; i n; i++) { + if ((!flag !l--) || (flag)) { + if (flag) + (*pos)++; + kref_get(hr_nodes[i]-hot_range.refs); + return hr_nodes[i]; + } + } + } + + return NULL; +} + +static void +*hot_inode_tree_walk(struct seq_file *seq, loff_t *pos, + unsigned long ino, bool type, bool flag) +{ + 
struct hot_info *root = seq-private; + struct hot_inode_item *hi_nodes[8]; + struct hot_range_item *hr; + loff_t l = *pos; + int i, n; + + while (1) { + spin_lock(root-lock); + n = radix_tree_gang_lookup(root-hot_inode_tree, + (void **)hi_nodes, ino, + ARRAY_SIZE(hi_nodes)); + if (!n) { + spin_unlock(root-lock); + break; + } + spin_unlock(root-lock); + + ino = hi_nodes[n - 1]-i_ino + 1; + for (i = 0; i n; i++) { + if (!type) { + hr = hot_range_tree_walk(hi_nodes[i], + pos, 0, flag); + if (hr) + return hr; + } else { + if ((!flag !l--) || (flag)) { + if (flag) + (*pos)++; + kref_get(hi_nodes[i]-hot_inode.refs); + return hi_nodes[i]; + } + } + } + } + + return NULL; +} + +static void *hot_range_seq_start(struct seq_file *seq, loff_t *pos) +{ + return hot_inode_tree_walk(seq, pos, 0, false, false); +} + +static void *hot_range_seq_next(struct seq_file *seq, + void *v, loff_t *pos) +{ + struct hot_range_item *hr_next, *hr = v; + loff_t start = hr-start + 1; + + /* Walk the hot_range_tree for inode */ + hr_next = hot_range_tree_walk(hr-hot_inode, pos, start, true); + if (hr_next) + return hr_next; + + return hot_inode_tree_walk(seq, pos, + hr-hot_inode-i_ino + 1, false, true); +} + +static void hot_range_seq_stop(struct seq_file *seq, void *v) +{ + struct hot_range_item *hr = v; + + if (hr) + hot_range_item_put(hr); +} + +static int hot_range_seq_show(struct seq_file *seq, void *v) +{ + struct hot_range_item *hr = v; + struct hot_inode_item *he = hr-hot_inode; + struct hot_freq_data *freq_data = hr-hot_range.hot_freq_data;
[PATCH v1 hot_track 09/18] vfs: add one work queue
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Add a per-superblock workqueue and a delayed_work to run periodic work to update map info on each superblock. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c| 85 ++ fs/hot_tracking.h|3 + include/linux/hot_tracking.h |3 + 3 files changed, 91 insertions(+), 0 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index e85eaa6..552bee0 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -15,9 +15,12 @@ #include linux/module.h #include linux/spinlock.h #include linux/hardirq.h +#include linux/kthread.h +#include linux/freezer.h #include linux/fs.h #include linux/blkdev.h #include linux/types.h +#include linux/list_sort.h #include linux/limits.h #include hot_tracking.h @@ -553,6 +556,67 @@ static void hot_map_exit(struct hot_info *root) } } +/* Temperature compare function*/ +static int hot_temp_cmp(void *priv, struct list_head *a, + struct list_head *b) +{ + struct hot_comm_item *ap = + container_of(a, struct hot_comm_item, n_list); + struct hot_comm_item *bp = + container_of(b, struct hot_comm_item, n_list); + + int diff = ap-hot_freq_data.last_temp + - bp-hot_freq_data.last_temp; + if (diff 0) + return -1; + if (diff 0) + return 1; + return 0; +} + +/* + * Every sync period we update temperatures for + * each hot inode item and hot range item for aging + * purposes. 
+ */ +static void hot_update_worker(struct work_struct *work) +{ + struct hot_info *root = container_of(to_delayed_work(work), + struct hot_info, update_work); + struct hot_inode_item *hi_nodes[8]; + unsigned long ino = 0; + int i, n; + + while (1) { + n = radix_tree_gang_lookup(root-hot_inode_tree, + (void **)hi_nodes, ino, + ARRAY_SIZE(hi_nodes)); + if (!n) + break; + + ino = hi_nodes[n - 1]-i_ino + 1; + for (i = 0; i n; i++) { + kref_get(hi_nodes[i]-hot_inode.refs); + hot_map_update( + hi_nodes[i]-hot_inode.hot_freq_data, root); + hot_range_update(hi_nodes[i], root); + hot_inode_item_put(hi_nodes[i]); + } + } + + /* Sort temperature map info */ + for (i = 0; i HEAT_MAP_SIZE; i++) { + list_sort(NULL, root-heat_inode_map[i].node_list, + hot_temp_cmp); + list_sort(NULL, root-heat_range_map[i].node_list, + hot_temp_cmp); + } + + /* Instert next delayed work */ + queue_delayed_work(root-update_wq, root-update_work, + msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC)); +} + /* * Initialize kmem cache for hot_inode_item and hot_range_item. 
*/ @@ -647,9 +711,28 @@ int hot_track_init(struct super_block *sb) hot_inode_tree_init(root); hot_map_init(root); + root-update_wq = alloc_workqueue( + hot_update_wq, WQ_NON_REENTRANT, 0); + if (!root-update_wq) { + printk(KERN_ERR %s: Failed to create + hot update workqueue\n, __func__); + goto failed_wq; + } + + /* Initialize hot tracking wq and arm one delayed work */ + INIT_DELAYED_WORK(root-update_work, hot_update_worker); + queue_delayed_work(root-update_wq, root-update_work, + msecs_to_jiffies(HEAT_UPDATE_DELAY * MSEC_PER_SEC)); + printk(KERN_INFO VFS: Turning on hot data tracking\n); return 0; + +failed_wq: + hot_map_exit(root); + hot_inode_tree_exit(root); + kfree(root); + return ret; } EXPORT_SYMBOL_GPL(hot_track_init); @@ -657,6 +740,8 @@ void hot_track_exit(struct super_block *sb) { struct hot_info *root = sb-s_hot_root; + cancel_delayed_work_sync(root-update_work); + destroy_workqueue(root-update_wq); hot_map_exit(root); hot_inode_tree_exit(root); kfree(root); diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h index 0d28e6f..6dbd27a 100644 --- a/fs/hot_tracking.h +++ b/fs/hot_tracking.h @@ -31,6 +31,9 @@ */ #define TIME_TO_KICK 300 +/* set how often to update temperatures (seconds) */ +#define HEAT_UPDATE_DELAY 300 + /* NRR/NRW heat unit = 2^X accesses */ #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */ #define NRR_COEFF_POWER 0 diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h index 5227e89..af49078 100644 --- a/include/linux/hot_tracking.h +++ b/include/linux/hot_tracking.h @@ -82,6 +82,9 @@ struct
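The comparator hot_temp_cmp() in the patch above stores the difference of two temperatures in an int. If last_temp is an unsigned 32-bit value (the posted hunks assign a u32 to it, but its declaration is not shown here, so this is an assumption), that difference can wrap and report the wrong order for values that are far apart. A minimal userspace sketch of a three-way compare that avoids the subtraction follows. The struct temp_item type and the "hotter sorts first" direction are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the temperature field of a tracked item. */
struct temp_item {
	uint32_t last_temp;
};

/*
 * Three-way compare without subtraction: returns -1, 0 or 1.
 * Sorting hotter items first is an assumption of this sketch.
 */
static int temp_cmp(const struct temp_item *a, const struct temp_item *b)
{
	assert(a != NULL && b != NULL);

	if (a->last_temp > b->last_temp)
		return -1;
	if (a->last_temp < b->last_temp)
		return 1;
	return 0;
}
```

With the values 0xF0000000 and 0x10000000, a subtraction stored in a 32-bit int would come out negative even though the first value is larger, which is the case the explicit comparisons handle.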
Re: [PATCH 1/2] memcg, oom: provide more precise dump info while memcg oom happening
(2012/11/07 17:41), Sha Zhengju wrote:
> From: Sha Zhengju <handai@taobao.com>
>
> Currently, when a memcg oom is happening the oom dump messages are still
> global state and provide little useful info for users. This patch prints
> more pointed memcg page statistics for memcg-oom.
>
> Signed-off-by: Sha Zhengju <handai@taobao.com>
> Cc: Michal Hocko <mho...@suse.cz>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> Cc: David Rientjes <rient...@google.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> ---
>  mm/memcontrol.c |   71 ---
>  mm/oom_kill.c   |    6 +++-
>  2 files changed, 66 insertions(+), 11 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0eab7d5..2df5e72 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -118,6 +118,14 @@ static const char * const mem_cgroup_events_names[] = {
>  	"pgmajfault",
>  };
>  
> +static const char * const mem_cgroup_lru_names[] = {
> +	"inactive_anon",
> +	"active_anon",
> +	"inactive_file",
> +	"active_file",
> +	"unevictable",
> +};
> +

Are these the same strings as in show_free_areas()?

>  /*
>   * Per memcg event counter is incremented at every pagein/pageout. With THP,
>   * it will be incremated by the number of pages. This counter is used for
> @@ -1501,8 +1509,59 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
>  	spin_unlock_irqrestore(&memcg->move_lock, *flags);
>  }
>  
> +#define K(x) ((x) << (PAGE_SHIFT-10))
> +static void mem_cgroup_print_oom_stat(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *mi;
> +	unsigned int i;
> +
> +	if (!memcg->use_hierarchy && memcg != root_mem_cgroup) {

Why do you need this condition check?

> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			printk(KERN_CONT "%s:%ldKB ", mem_cgroup_stat_names[i],
> +				K(mem_cgroup_read_stat(memcg, i)));

Hm, how about using the same style as show_free_areas()?

> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
> +			printk(KERN_CONT "%s:%lu ", mem_cgroup_events_names[i],
> +				mem_cgroup_read_events(memcg, i));
> +

I don't think EVENTS info is useful for oom.

> +		for (i = 0; i < NR_LRU_LISTS; i++)
> +			printk(KERN_CONT "%s:%luKB ", mem_cgroup_lru_names[i],
> +				K(mem_cgroup_nr_lru_pages(memcg, BIT(i))));

How different is the format of your new information from the usual oom
output? Could you show a sample and the difference in the changelog?

Of course, I prefer that both of them have a similar format.

> +	} else {
> +
> +		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> +			long long val = 0;
> +
> +			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> +				continue;
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_stat(mi, i);
> +			printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
> +		}
> +
> +		for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_read_events(mi, i);
> +			printk(KERN_CONT "%s:%llu ",
> +				mem_cgroup_events_names[i], val);
> +		}
> +
> +		for (i = 0; i < NR_LRU_LISTS; i++) {
> +			unsigned long long val = 0;
> +
> +			for_each_mem_cgroup_tree(mi, memcg)
> +				val += mem_cgroup_nr_lru_pages(mi, BIT(i));
> +			printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
> +		}
> +	}
> +	printk(KERN_CONT "\n");
> +}
>  /**
> - * mem_cgroup_print_oom_info: Called from OOM with tasklist_lock held in read mode.
>   * @memcg: The memory cgroup that went over limit
>   * @p: Task that is going to be killed
>   *
> @@ -1569,6 +1628,8 @@ done:
>  		res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
>  		res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
>  		res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
> +
> +	mem_cgroup_print_oom_stat(memcg);
>  }

Please put this directly in print_oom_info().

Thanks,
-Kame
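For readers unfamiliar with the K() macro used in the quoted patch: it turns a page count into kilobytes by shifting left by PAGE_SHIFT - 10, since one page is 2^PAGE_SHIFT bytes and one KB is 2^10 bytes. A small userspace sketch, assuming 4 KiB pages (PAGE_SHIFT of 12, which is architecture dependent) and using a hypothetical function name pages_to_kb():

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption for this sketch: 4 KiB pages */

/* Convert a page count to KB: one page is 2^(PAGE_SHIFT - 10) KB. */
static unsigned long long pages_to_kb(unsigned long long pages)
{
	/* Precondition: the shifted result must still fit in 64 bits. */
	assert(pages <= (~0ULL >> (PAGE_SHIFT - 10)));

	return pages << (PAGE_SHIFT - 10);
}
```

So with this assumed page size, 256 pages are reported as 1024KB.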
[PATCH v1 hot_track 10/18] vfs: add FS hot type support
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Introduce one way to enable that specific FS can inject its own hot tracking type. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c| 43 +++-- fs/hot_tracking.h|1 - include/linux/fs.h |1 + include/linux/hot_tracking.h | 19 ++ 4 files changed, 52 insertions(+), 12 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index 552bee0..06127cf 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -64,8 +64,11 @@ void hot_range_tree_init(struct hot_inode_item *he) static void hot_range_item_init(struct hot_range_item *hr, loff_t start, struct hot_inode_item *he) { + struct hot_info *root = container_of(he-hot_inode_tree, + struct hot_info, hot_inode_tree); + hr-start = start; - hr-len = RANGE_SIZE; + hr-len = hot_raw_shift(1, root-hot_type-range_bits, true); hr-hot_inode = he; kref_init(hr-hot_range.refs); spin_lock_init(hr-hot_range.lock); @@ -309,19 +312,21 @@ static void hot_rw_freq_calc(struct timespec old_atime, *avg = *avg FREQ_POWER; } -static void hot_freq_data_update(struct hot_freq_data *freq_data, bool write) +static void hot_freq_data_update(struct hot_info *root, + struct hot_freq_data *freq_data, bool write) { struct timespec cur_time = current_kernel_time(); if (write) { freq_data-nr_writes += 1; - hot_rw_freq_calc(freq_data-last_write_time, + root-hot_type-ops.hot_rw_freq_calc_fn( + freq_data-last_write_time, cur_time, freq_data-avg_delta_writes); freq_data-last_write_time = cur_time; } else { freq_data-nr_reads += 1; - hot_rw_freq_calc(freq_data-last_read_time, + root-hot_type-ops.hot_rw_freq_calc_fn( freq_data-last_read_time, cur_time, freq_data-avg_delta_reads); @@ -425,7 +430,7 @@ static void hot_map_update(struct hot_freq_data *freq_data, struct hot_comm_item *comm_item; struct hot_inode_item *he; struct hot_range_item *hr; - u32 temp = hot_temp_calc(freq_data); + u32 temp = root-hot_type-ops.hot_temp_calc_fn(freq_data); u8 a_temp = (u8)hot_raw_shift((u64)temp, (32 - 
HEAT_MAP_BITS), false); u8 b_temp = (u8)hot_raw_shift((u64)freq_data-last_temp, (32 - HEAT_MAP_BITS), false); @@ -507,7 +512,7 @@ static void hot_range_update(struct hot_inode_item *he, hr_nodes[i]-hot_range.hot_freq_data, root); spin_lock(hr_nodes[i]-hot_range.lock); - obsolete = hot_is_obsolete( + obsolete = root-hot_type-ops.hot_is_obsolete_fn( hr_nodes[i]-hot_range.hot_freq_data); spin_unlock(hr_nodes[i]-hot_range.lock); @@ -652,6 +657,7 @@ void hot_update_freqs(struct inode *inode, loff_t start, struct hot_info *root = inode-i_sb-s_hot_root; struct hot_inode_item *he; struct hot_range_item *hr; + u64 range_size; loff_t cur, end; if (!root || (len == 0)) @@ -664,15 +670,19 @@ void hot_update_freqs(struct inode *inode, loff_t start, } spin_lock(he-hot_inode.lock); - hot_freq_data_update(he-hot_inode.hot_freq_data, rw); + hot_freq_data_update(root, he-hot_inode.hot_freq_data, rw); spin_unlock(he-hot_inode.lock); /* -* Align ranges on RANGE_SIZE boundary +* Align ranges on range size boundary * to prevent proliferation of range structs */ - end = (start + len + RANGE_SIZE - 1) RANGE_BITS; - for (cur = (start RANGE_BITS); cur end; cur++) { + range_size = hot_raw_shift(1, + root-hot_type-range_bits, true); + end = hot_raw_shift((start + len + range_size - 1), + root-hot_type-range_bits, false); + cur = hot_raw_shift(start, root-hot_type-range_bits, false); + for (; cur end; cur++) { hr = hot_range_item_lookup(he, cur); if (IS_ERR(hr)) { WARN(1, hot_range_item_lookup returns %ld\n, @@ -682,7 +692,7 @@ void hot_update_freqs(struct inode *inode, loff_t start, } spin_lock(hr-hot_range.lock); - hot_freq_data_update(hr-hot_range.hot_freq_data, rw); + hot_freq_data_update(root, hr-hot_range.hot_freq_data, rw); spin_unlock(hr-hot_range.lock); hot_range_item_put(hr); @@ -711,6
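The range alignment done in hot_update_freqs() above can be tried out in userspace. The sketch below uses the fixed RANGE_BITS value of 20 that the series defines in fs/hot_tracking.h, whereas this patch reads the shift from root->hot_type->range_bits. The function names range_first() and range_end() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RANGE_BITS 20	/* value used elsewhere in the series: 1 MiB ranges */

/* Index of the first range touched by an access starting at byte 'start'. */
static uint64_t range_first(uint64_t start)
{
	return start >> RANGE_BITS;
}

/* One past the index of the last range touched by [start, start + len). */
static uint64_t range_end(uint64_t start, uint64_t len)
{
	uint64_t range_size = 1ULL << RANGE_BITS;

	assert(len > 0);	/* the kernel function bails out for len == 0 */
	return (start + len + range_size - 1) >> RANGE_BITS;
}
```

An access of 2 bytes starting at the last byte of the first range therefore covers range indices 0 and 1, so two range items are looked up and updated.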
[PATCH v1 hot_track 08/18] vfs: add aging function
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/hot_tracking.c |   56 +
 fs/hot_tracking.h |    6 +
 2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 4b143a4..e85eaa6 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -392,6 +392,24 @@ static u32 hot_temp_calc(struct hot_freq_data *freq_data)
 	return result;
 }
 
+static bool hot_is_obsolete(struct hot_freq_data *freq_data)
+{
+	int ret = 0;
+	struct timespec ckt = current_kernel_time();
+
+	u64 cur_time = timespec_to_ns(&ckt);
+	u64 last_read_ns =
+		(cur_time - timespec_to_ns(&freq_data->last_read_time));
+	u64 last_write_ns =
+		(cur_time - timespec_to_ns(&freq_data->last_write_time));
+	u64 kick_ns = TIME_TO_KICK * NSEC_PER_SEC;
+
+	if ((last_read_ns > kick_ns) && (last_write_ns > kick_ns))
+		ret = 1;
+
+	return ret;
+}
+
 /*
  * Calculate a new temperature and, if necessary,
  * move the list_head corresponding to this inode or range
@@ -459,6 +477,44 @@ static void hot_map_update(struct hot_freq_data *freq_data,
 	}
 }
 
+/* Update temperatures for each range item for aging purposes */
+static void hot_range_update(struct hot_inode_item *he,
+					struct hot_info *root)
+{
+	struct hot_range_item *hr_nodes[8];
+	loff_t start = 0;
+	bool obsolete;
+	int i, n;
+
+	while (1) {
+		spin_lock(&he->lock);
+		n = radix_tree_gang_lookup(&he->hot_range_tree,
+				(void **)hr_nodes, start,
+				ARRAY_SIZE(hr_nodes));
+		if (!n) {
+			spin_unlock(&he->lock);
+			break;
+		}
+		spin_unlock(&he->lock);
+
+		start = hr_nodes[n - 1]->start + 1;
+		for (i = 0; i < n; i++) {
+			kref_get(&hr_nodes[i]->hot_range.refs);
+			hot_map_update(
+				&hr_nodes[i]->hot_range.hot_freq_data, root);
+
+			spin_lock(&hr_nodes[i]->hot_range.lock);
+			obsolete = hot_is_obsolete(
+				&hr_nodes[i]->hot_range.hot_freq_data);
+			spin_unlock(&hr_nodes[i]->hot_range.lock);
+
+			hot_range_item_put(hr_nodes[i]);
+			if (obsolete)
+				hot_range_item_put(hr_nodes[i]);
+		}
+	}
+}
+
 /*
  * Initialize inode and range map info.
  */
diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
index 2f209b6..0d28e6f 100644
--- a/fs/hot_tracking.h
+++ b/fs/hot_tracking.h
@@ -25,6 +25,12 @@
 #define RANGE_SIZE (1 << RANGE_BITS)
 #define FREQ_POWER 4
 
+/*
+ * time to quit keeping track of
+ * tracking data (seconds)
+ */
+#define TIME_TO_KICK 300
+
 /* NRR/NRW heat unit = 2^X accesses */
 #define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */
 #define NRR_COEFF_POWER 0
-- 
1.7.6.5
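The aging rule introduced by hot_is_obsolete() above can be sketched outside the kernel. The archive dropped the comparison operators from the posted hunk, so this sketch reads the test as "both the last read and the last write are older than TIME_TO_KICK seconds", which is an assumption. The function name is_obsolete() and the plain nanosecond timestamps in place of struct timespec are also illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define TIME_TO_KICK 300	/* seconds, as in the patch */

/*
 * True when both the last read and the last write happened more than
 * TIME_TO_KICK seconds before 'now_ns'. All arguments are nanoseconds.
 */
static bool is_obsolete(uint64_t now_ns, uint64_t last_read_ns,
			uint64_t last_write_ns)
{
	uint64_t kick_ns = TIME_TO_KICK * NSEC_PER_SEC;

	/* Timestamps in the future would make the subtractions wrap. */
	assert(now_ns >= last_read_ns && now_ns >= last_write_ns);

	return (now_ns - last_read_ns > kick_ns) &&
	       (now_ns - last_write_ns > kick_ns);
}
```

Under this reading, a range that was written 10 seconds ago stays tracked even if it has not been read for much longer than the kick time.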
[PATCH v1 hot_track 07/18] vfs: add map info update function
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/hot_tracking.c | 67 + fs/hot_tracking.h | 21 2 files changed, 88 insertions(+), 0 deletions(-) diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c index 3fd6255..4b143a4 100644 --- a/fs/hot_tracking.c +++ b/fs/hot_tracking.c @@ -393,6 +393,73 @@ static u32 hot_temp_calc(struct hot_freq_data *freq_data) } /* + * Calculate a new temperature and, if necessary, + * move the list_head corresponding to this inode or range + * to the proper list with the new temperature + */ +static void hot_map_update(struct hot_freq_data *freq_data, + struct hot_info *root) +{ + struct hot_map_head *buckets, *cur_bucket; + struct hot_comm_item *comm_item; + struct hot_inode_item *he; + struct hot_range_item *hr; + u32 temp = hot_temp_calc(freq_data); + u8 a_temp = (u8)hot_raw_shift((u64)temp, (32 - HEAT_MAP_BITS), false); + u8 b_temp = (u8)hot_raw_shift((u64)freq_data-last_temp, + (32 - HEAT_MAP_BITS), false); + + comm_item = container_of(freq_data, + struct hot_comm_item, hot_freq_data); + + if (freq_data-flags FREQ_DATA_TYPE_INODE) { + he = container_of(comm_item, + struct hot_inode_item, hot_inode); + buckets = root-heat_inode_map; + + if (he == NULL) + return; + + spin_lock(he-hot_inode.lock); + if (list_empty(he-hot_inode.n_list) || (a_temp != b_temp)) { + if (!list_empty(he-hot_inode.n_list)) { + list_del_init(he-hot_inode.n_list); + root-hot_map_nr--; + } + + cur_bucket = buckets + a_temp; + list_add_tail(he-hot_inode.n_list, + cur_bucket-node_list); + root-hot_map_nr++; + freq_data-last_temp = temp; + } + spin_unlock(he-hot_inode.lock); + } else if (freq_data-flags FREQ_DATA_TYPE_RANGE) { + hr = container_of(comm_item, + struct hot_range_item, hot_range); + buckets = root-heat_range_map; + + if (hr == NULL) + return; + + spin_lock(hr-hot_range.lock); + if (list_empty(hr-hot_range.n_list) || (a_temp != b_temp)) { + if (!list_empty(hr-hot_range.n_list)) { + 
list_del_init(hr-hot_range.n_list); + root-hot_map_nr--; + } + + cur_bucket = buckets + a_temp; + list_add_tail(hr-hot_range.n_list, + cur_bucket-node_list); + root-hot_map_nr++; + freq_data-last_temp = temp; + } + spin_unlock(hr-hot_range.lock); + } +} + +/* * Initialize inode and range map info. */ static void hot_map_init(struct hot_info *root) diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h index 7f279af..2f209b6 100644 --- a/fs/hot_tracking.h +++ b/fs/hot_tracking.h @@ -25,4 +25,25 @@ #define RANGE_SIZE (1 RANGE_BITS) #define FREQ_POWER 4 +/* NRR/NRW heat unit = 2^X accesses */ +#define NRR_MULTIPLIER_POWER 20 /* NRR - number of reads since mount */ +#define NRR_COEFF_POWER 0 +#define NRW_MULTIPLIER_POWER 20 /* NRW - number of writes since mount */ +#define NRW_COEFF_POWER 0 + +/* LTR/LTW heat unit = 2^X ns of age */ +#define LTR_DIVIDER_POWER 30 /* LTR - time elapsed since last read(ns) */ +#define LTR_COEFF_POWER 1 +#define LTW_DIVIDER_POWER 30 /* LTW - time elapsed since last write(ns) */ +#define LTW_COEFF_POWER 1 + +/* + * AVR/AVW cold unit = 2^X ns of average delta + * AVR/AVW heat unit = HEAT_MAX_VALUE - cold unit + */ +#define AVR_DIVIDER_POWER 40 /* AVR - average delta between recent reads(ns) */ +#define AVR_COEFF_POWER 0 +#define AVW_DIVIDER_POWER 40 /* AVW - average delta between recent writes(ns) */ +#define AVW_COEFF_POWER 0 + #endif /* __HOT_TRACKING__ */ -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
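The bucket selection in hot_map_update() above keeps only the top HEAT_MAP_BITS bits of the 32-bit temperature and uses them as an index into the heat map. A userspace sketch follows. HEAT_MAP_BITS is not defined in this patch, so the value 8 is an assumption (suggested by the u8 casts), and the names heat_bucket() and needs_move() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HEAT_MAP_BITS 8	/* assumption: the value is not shown in this patch */

/* Map a 32-bit temperature to a bucket index by keeping its top bits. */
static uint8_t heat_bucket(uint32_t temp)
{
	assert(HEAT_MAP_BITS > 0 && HEAT_MAP_BITS <= 8);

	return (uint8_t)(temp >> (32 - HEAT_MAP_BITS));
}

/* An already listed item changes lists only when its bucket index changes. */
static int needs_move(uint32_t old_temp, uint32_t new_temp)
{
	return heat_bucket(old_temp) != heat_bucket(new_temp);
}
```

The patch additionally inserts an item whose list node is still empty, which this sketch leaves out. The practical effect is that small temperature changes inside one bucket cause no list manipulation at all.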
[PATCH v1 hot_track 05/18] vfs: add hooks to enable hot tracking
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Miscellaneous features that implement hot data tracking and generally make the hot data functions a bit more friendly. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/direct-io.c |6 ++ mm/filemap.c|6 ++ mm/page-writeback.c | 12 mm/readahead.c |7 +++ 4 files changed, 31 insertions(+), 0 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index f86c720..51f13f4 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -37,6 +37,7 @@ #include linux/uio.h #include linux/atomic.h #include linux/prefetch.h +#include hot_tracking.h /* * How many user pages to map in one call to get_user_pages(). This determines @@ -1297,6 +1298,11 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, prefetch(bdev-bd_queue); prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES); + /* Hot data tracking */ + hot_update_freqs(inode, offset, + iov_length(iov, nr_segs), + rw WRITE); + return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, nr_segs, get_block, end_io, submit_io, flags); diff --git a/mm/filemap.c b/mm/filemap.c index 83efee7..49a1da9 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -33,6 +33,7 @@ #include linux/hardirq.h /* for BUG_ON(!in_atomic()) only */ #include linux/memcontrol.h #include linux/cleancache.h +#include linux/hot_tracking.h #include internal.h /* @@ -1224,6 +1225,11 @@ readpage: * PG_error will be set again if readpage fails. */ ClearPageError(page); + + /* Hot data tracking */ + hot_update_freqs(inode, page-index PAGE_CACHE_SHIFT, + PAGE_CACHE_SIZE, 0); + /* Start the actual read. The read will unlock the page. 
*/ error = mapping-a_ops-readpage(filp, page); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 830893b..27db4c1 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -35,6 +35,7 @@ #include linux/buffer_head.h /* __set_page_dirty_buffers */ #include linux/pagevec.h #include linux/timer.h +#include linux/hot_tracking.h #include trace/events/writeback.h /* @@ -1903,13 +1904,24 @@ EXPORT_SYMBOL(generic_writepages); int do_writepages(struct address_space *mapping, struct writeback_control *wbc) { int ret; + pgoff_t start = 0; + size_t count = 0; if (wbc-nr_to_write = 0) return 0; + + start = mapping-writeback_index PAGE_CACHE_SHIFT; + count = wbc-nr_to_write; + if (mapping-a_ops-writepages) ret = mapping-a_ops-writepages(mapping, wbc); else ret = generic_writepages(mapping, wbc); + + /* Hot data tracking */ + hot_update_freqs(mapping-host, (loff_t)start, + (count - (u64)wbc-nr_to_write) * PAGE_CACHE_SIZE, 1); + return ret; } diff --git a/mm/readahead.c b/mm/readahead.c index 7963f23..40a0e7f 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -19,6 +19,7 @@ #include linux/pagemap.h #include linux/syscalls.h #include linux/file.h +#include linux/hot_tracking.h /* * Initialise a struct file's readahead state. Assumes that the caller has @@ -138,6 +139,12 @@ static int read_pages(struct address_space *mapping, struct file *filp, out: blk_finish_plug(plug); + /* Hot data tracking */ + hot_update_freqs(mapping-host, + (list_entry(pages-prev, struct page, lru)-index) +PAGE_CACHE_SHIFT, + nr_pages * PAGE_CACHE_SIZE, 0); + return ret; } -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: Tree for Nov 8
On Thu, Nov 8, 2012 at 5:49 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> Hi all,
>
> Changes since 20121107:
>
[...]
> The v4l-dvb tree still has its build failure so I used the version from
> next-20121026.

Hi,

I am just wondering why these v4l-dvb issues are still present...
...since 26-Oct-2012...
...Maintainer on holidays :-)?

I know most Linux kernel people are visiting Barcelona this week.

Regards,
- Sedat -
[PATCH -next] staging: comedi: usbduxfast: remove unused variable in usbduxfastsub_ai_Irq()
From: Wei Yongjun <yongjun_...@trendmicro.com.cn>

The variable 'p' is initialized but never used otherwise,
so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_...@trendmicro.com.cn>
---
 drivers/staging/comedi/drivers/usbduxfast.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/usbduxfast.c b/drivers/staging/comedi/drivers/usbduxfast.c
index b4e987b..574e4b7 100644
--- a/drivers/staging/comedi/drivers/usbduxfast.c
+++ b/drivers/staging/comedi/drivers/usbduxfast.c
@@ -312,7 +312,6 @@ static void usbduxfastsub_ai_Irq(struct urb *urb)
 	struct usbduxfastsub_s *udfs;
 	struct comedi_device *this_comedidev;
 	struct comedi_subdevice *s;
-	uint16_t *p;
 
 	/* sanity checks - is the urb there? */
 	if (!urb) {
@@ -379,7 +378,6 @@ static void usbduxfastsub_ai_Irq(struct urb *urb)
 		return;
 	}
 
-	p = urb->transfer_buffer;
 	if (!udfs->ignore) {
 		if (!udfs->ai_continous) {
 			/* not continuous, fixed number of samples */
[PATCH v1 hot_track 01/18] vfs: introduce some data structures
From: Zhi Yong Wu wu...@linux.vnet.ibm.com One root structure hot_info is defined, is hooked up in super_block, and will be used to hold radix tree root, hash list root and some other information, etc. Adds hot_inode_tree struct to keep track of frequently accessed files, and be keyed by {inode, offset}. Trees contain hot_inode_items representing those files and ranges. Having these trees means that vfs can quickly determine the temperature of some data by doing some calculations on the hot_freq_data struct that hangs off of the tree item. Define two items hot_inode_item and hot_range_item, one of them represents one tracked file to keep track of its access frequency and the tree of ranges in this file, while the latter represents a file range of one inode. Each of the two structures contains a hot_freq_data struct with its frequency of access metrics (number of {reads, writes}, last {read,write} time, frequency of {reads,writes}). Also, each hot_inode_item contains one hot_range_tree struct which is keyed by {inode, offset, length} and used to keep track of all the ranges in this file. 
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/Makefile |2 +- fs/dcache.c |2 + fs/hot_tracking.c| 109 ++ fs/hot_tracking.h| 23 + include/linux/hot_tracking.h | 73 5 files changed, 208 insertions(+), 1 deletions(-) create mode 100644 fs/hot_tracking.c create mode 100644 fs/hot_tracking.h create mode 100644 include/linux/hot_tracking.h diff --git a/fs/Makefile b/fs/Makefile index 1d7af79..f966dea 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -11,7 +11,7 @@ obj-y := open.o read_write.o file_table.o super.o \ attr.o bad_inode.o file.o filesystems.o namespace.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o drop_caches.o splice.o sync.o utimes.o \ - stack.o fs_struct.o statfs.o + stack.o fs_struct.o statfs.o hot_tracking.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o diff --git a/fs/dcache.c b/fs/dcache.c index 3a463d0..7d5be16 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -37,6 +37,7 @@ #include linux/rculist_bl.h #include linux/prefetch.h #include linux/ratelimit.h +#include linux/hot_tracking.h #include internal.h #include mount.h @@ -3172,4 +3173,5 @@ void __init vfs_caches_init(unsigned long mempages) mnt_init(); bdev_cache_init(); chrdev_init(); + hot_cache_init(); } diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c new file mode 100644 index 000..806fbb0 --- /dev/null +++ b/fs/hot_tracking.c @@ -0,0 +1,109 @@ +/* + * fs/hot_tracking.c + * + * Copyright (C) 2012 IBM Corp. All rights reserved. + * Written by Zhi Yong Wu wu...@linux.vnet.ibm.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. 
+ */ + +#include linux/list.h +#include linux/err.h +#include linux/slab.h +#include linux/module.h +#include linux/spinlock.h +#include linux/hardirq.h +#include linux/fs.h +#include linux/blkdev.h +#include linux/types.h +#include linux/limits.h +#include hot_tracking.h + +/* kmem_cache pointers for slab caches */ +static struct kmem_cache *hot_inode_item_cachep __read_mostly; +static struct kmem_cache *hot_range_item_cachep __read_mostly; + +/* + * Initialize the inode tree. Should be called for each new inode + * access or other user of the hot_inode interface. + */ +static void hot_inode_tree_init(struct hot_info *root) +{ + INIT_RADIX_TREE(root-hot_inode_tree, GFP_ATOMIC); + spin_lock_init(root-lock); +} + +/* + * Initialize the hot range tree. Should be called for each new inode + * access or other user of the hot_range interface. + */ +void hot_range_tree_init(struct hot_inode_item *he) +{ + INIT_RADIX_TREE(he-hot_range_tree, GFP_ATOMIC); + spin_lock_init(he-lock); +} + +/* + * Initialize a new hot_range_item structure. The new structure is + * returned with a reference count of one and needs to be + * freed using free_range_item() + */ +static void hot_range_item_init(struct hot_range_item *hr, loff_t start, + struct hot_inode_item *he) +{ + hr-start = start; + hr-len = RANGE_SIZE; + hr-hot_inode = he; + kref_init(hr-hot_range.refs); + spin_lock_init(hr-hot_range.lock); + hr-hot_range.hot_freq_data.avg_delta_reads = (u64) -1; + hr-hot_range.hot_freq_data.avg_delta_writes = (u64) -1; + hr-hot_range.hot_freq_data.flags = FREQ_DATA_TYPE_RANGE; +} + +/* + * Initialize a new hot_inode_item structure. The new structure is + * returned with a reference count of one and needs to be + * freed using hot_free_inode_item() + */ +static
Re: ACPI errors with 3.7-rc3
Here is mine - https://gist.github.com/4037687 To Greg: acpidump 20100513-3.1 And I don't have pmtools installed On Thu, Nov 8, 2012 at 8:47 AM, Greg KH gre...@linuxfoundation.org wrote: On Wed, Nov 07, 2012 at 10:49:40PM +0100, Rafael J. Wysocki wrote: On Tuesday, November 06, 2012 01:48:26 PM Greg KH wrote: On Tue, Nov 06, 2012 at 04:42:24PM +0400, Azat Khuzhin wrote: I'v also have such errors on my macbook pro. $ dmesg | tail [17056.008564] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88026547ea10), AE_TIME (20120711/psparse-536) [17056.011194] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88026547e678), AE_TIME (20120711/psparse-536) [17056.013793] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88026547e740), AE_TIME (20120711/psparse-536) [17056.016383] ACPI Exception: AE_TIME, Evaluating _BST (20120711/battery-464) [17056.511373] ACPI: EC: input buffer is not empty, aborting transaction [17056.512672] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120711/evregion-501) [17056.515256] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88026547ea10), AE_TIME (20120711/psparse-536) [17056.517886] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88026547e678), AE_TIME (20120711/psparse-536) [17056.520479] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88026547e740), AE_TIME (20120711/psparse-536) [17056.523070] ACPI Exception: AE_TIME, Evaluating _BST (20120711/battery-464) I'm seeing this again right now. 
I'm wondering if it's because I'm running on battery power at the moment: [41694.309264] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120913/evregion-501) [41694.309282] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88045cc64618), AE_TIME (20120913/psparse-536) [41694.309300] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88045cc64988), AE_TIME (20120913/psparse-536) [41694.309310] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88045cc648c0), AE_TIME (20120913/psparse-536) [41694.309324] ACPI Exception: AE_TIME, Evaluating _BST (20120913/battery-464) [41694.809093] ACPI: EC: input buffer is not empty, aborting transaction ec_storm_threshold is still set to 8 in /sys/module/acpi/parameters/ so that's not the issue here. And also loadavg is too high ~ 10 While there is no process that load CPU up to 100% or like that. I think that this because of processes that is done in kernel space. (basically that one who write such errors) $ uname -a Linux macbook-pro-sq 3.6.5macbook-pro-custom-v0.1 #4 SMP Sun Nov 4 12:39:03 UTC 2012 x86_64 GNU/Linux Ah, ok, that means it's not something new in 3.7-rc, so maybe it's just never worked properly for this hardware :) So it's not a regression, just an ACPI issue, any ACPI developer have an idea about this? Can you please send the output of acpidump from the affected machine(s)? # ./acpidump ACPI tables were not found. If you know location of RSD PTR table (from dmesg, etc), supply it with either --addr or -a option What am I doing wrong here? Is there a newer version of pmtools than the one labled pmtools-20071116 that I should be using? A link to download it would be appreciated. 
thanks, greg k-h -- Azat Khuzhin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] memory-hotplug: fix NR_FREE_PAGES mismatch's fix
At 11/08/2012 05:06 PM, Wen Congyang Wrote:

When a page is freed and put into pcp list, get_freepage_migratetype() doesn't return MIGRATE_ISOLATE even if this pageblock is isolated. So we should use get_pageblock_migratetype() instead of mt to check whether it is isolated.

In my local tree, there are some patches from isimatu, so I didn't add the -s option when generating the patch. So I forgot to add:

Signed-off-by: Wen Congyang we...@cn.fujitsu.com

Reported-by: Jianguo Wu wujianguo...@gmail.com
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027afd0..795875f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-			if (likely(mt != MIGRATE_ISOLATE)) {
+			if (likely(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) {
 				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
 				if (is_migrate_cma(mt))
 					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
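The point of the change above can be illustrated with a small userspace sketch. It only models the idea that a migratetype cached when the page was freed can go stale once the pageblock is isolated; every demo_* name is hypothetical and none of this is kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's migratetypes. */
enum demo_migratetype { DEMO_MOVABLE, DEMO_ISOLATE };

/* A pageblock carries the current (live) migratetype. */
struct demo_pageblock {
	enum demo_migratetype mt;
};

/* A page remembers the type it had when it was put on the pcp list. */
struct demo_page {
	struct demo_pageblock *block;
	enum demo_migratetype cached_mt;
};

/*
 * Returns 1 if draining this page would bump the free-page counter.
 * use_pageblock == 0 models the check against the cached value,
 * use_pageblock != 0 models the check against the live pageblock type.
 */
int demo_drain_counts_free(const struct demo_page *page, int use_pageblock)
{
	enum demo_migratetype mt;

	assert(page && page->block);
	mt = use_pageblock ? page->block->mt : page->cached_mt;
	return mt != DEMO_ISOLATE;
}
```

If the pageblock is isolated after the page was freed, the cached check still counts the page as free while the live check does not, which is the mismatch the patch is about.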
[[PATCH v9 3/3] 1/1] virtio_console: Remove buffers from out_vq at port removal
From: Sjur Brændeland sjur.brandel...@stericsson.com

Remove buffers from the out-queue when a port is removed. Rproc_serial communicates with remote processors that may crash and leave buffers in the out-queue. The virtio serial ports may have buffers in the out-queue as well, e.g. for non-blocking ports when the host has not consumed them yet.

[Amit: Remove WARN_ON for generic ports case.]

Signed-off-by: Sjur Brændeland sjur.brandel...@stericsson.com
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 drivers/char/virtio_console.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 9ebadcb..5ff3b3e 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1521,6 +1521,10 @@ static void remove_port_data(struct port *port)
 	/* Remove buffers we queued up for the Host to send us data in. */
 	while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
 		free_buf(buf, true);
+
+	/* Remove buffers we queued up for the Host to consume */
+	while ((buf = virtqueue_detach_unused_buf(port->out_vq)))
+		free_buf(buf, true);
 }
 
 /*
-- 
1.7.7.6
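The drain-both-queues pattern from the patch above can be sketched in plain userspace C. This is only an illustration under assumptions: the demo_* types and functions are hypothetical stand-ins, with a simple linked list playing the role of a virtqueue's unused buffers.

```c
#include <stdlib.h>

/* Hypothetical buffer and queue types; a linked list stands in for a virtqueue. */
struct demo_buf {
	struct demo_buf *next;
};

struct demo_vq {
	struct demo_buf *unused;
};

/* Queue one freshly allocated buffer; returns 0 on success, -1 on failure. */
int demo_add_buf(struct demo_vq *vq)
{
	struct demo_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return -1;
	buf->next = vq->unused;
	vq->unused = buf;
	return 0;
}

/* Detach one unused buffer, or return NULL when the queue is empty. */
struct demo_buf *demo_detach_unused_buf(struct demo_vq *vq)
{
	struct demo_buf *buf = vq->unused;

	if (buf)
		vq->unused = buf->next;
	return buf;
}

/* Free every buffer still sitting in the queue; returns how many were freed. */
unsigned int demo_drain(struct demo_vq *vq)
{
	struct demo_buf *buf;
	unsigned int freed = 0;

	while ((buf = demo_detach_unused_buf(vq))) {
		free(buf);
		freed++;
	}
	return freed;
}

/* On port removal, drain the in-queue and the out-queue alike. */
unsigned int demo_remove_port_data(struct demo_vq *in_vq, struct demo_vq *out_vq)
{
	return demo_drain(in_vq) + demo_drain(out_vq);
}
```

Draining only the in-queue would leave the out-queue buffers allocated after the port is gone, which is the leak the extra loop avoids.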
Re: ACPI errors with 3.7-rc3
On Thursday, November 08, 2012 05:47:15 AM Greg KH wrote: On Wed, Nov 07, 2012 at 10:49:40PM +0100, Rafael J. Wysocki wrote: On Tuesday, November 06, 2012 01:48:26 PM Greg KH wrote: On Tue, Nov 06, 2012 at 04:42:24PM +0400, Azat Khuzhin wrote: I'v also have such errors on my macbook pro. $ dmesg | tail [17056.008564] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88026547ea10), AE_TIME (20120711/psparse-536) [17056.011194] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88026547e678), AE_TIME (20120711/psparse-536) [17056.013793] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88026547e740), AE_TIME (20120711/psparse-536) [17056.016383] ACPI Exception: AE_TIME, Evaluating _BST (20120711/battery-464) [17056.511373] ACPI: EC: input buffer is not empty, aborting transaction [17056.512672] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120711/evregion-501) [17056.515256] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88026547ea10), AE_TIME (20120711/psparse-536) [17056.517886] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88026547e678), AE_TIME (20120711/psparse-536) [17056.520479] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88026547e740), AE_TIME (20120711/psparse-536) [17056.523070] ACPI Exception: AE_TIME, Evaluating _BST (20120711/battery-464) I'm seeing this again right now. 
I'm wondering if it's because I'm running on battery power at the moment: [41694.309264] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120913/evregion-501) [41694.309282] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88045cc64618), AE_TIME (20120913/psparse-536) [41694.309300] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88045cc64988), AE_TIME (20120913/psparse-536) [41694.309310] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88045cc648c0), AE_TIME (20120913/psparse-536) [41694.309324] ACPI Exception: AE_TIME, Evaluating _BST (20120913/battery-464) [41694.809093] ACPI: EC: input buffer is not empty, aborting transaction ec_storm_threshold is still set to 8 in /sys/module/acpi/parameters/ so that's not the issue here. And also loadavg is too high ~ 10 While there is no process that load CPU up to 100% or like that. I think that this because of processes that is done in kernel space. (basically that one who write such errors) $ uname -a Linux macbook-pro-sq 3.6.5macbook-pro-custom-v0.1 #4 SMP Sun Nov 4 12:39:03 UTC 2012 x86_64 GNU/Linux Ah, ok, that means it's not something new in 3.7-rc, so maybe it's just never worked properly for this hardware :) So it's not a regression, just an ACPI issue, any ACPI developer have an idea about this? Can you please send the output of acpidump from the affected machine(s)? # ./acpidump ACPI tables were not found. If you know location of RSD PTR table (from dmesg, etc), supply it with either --addr or -a option What am I doing wrong here? Is there a newer version of pmtools than the one labled pmtools-20071116 that I should be using? A link to download it would be appreciated. On my system, which is a reasonably current Tumbleweed, acpidump is in the acpica-20120518-7.1.2.x86_64 package. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. 
Re: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.
Hi Jonghwa Lee, I tested this patch and it looks good. I have some minor comments below, Reviewed-by: Amit Daniel Kachhap amit.kach...@linaro.org Thanks, Amit Daniel On 2 November 2012 07:54, Jonghwa Lee jonghwa3@samsung.com wrote: This patch supports exynos's emulation mode with newly created sysfs node. Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit. Thermal emulation mode supports software debug for TMU's operation. User can set temperature manually with software code and TMU will read current temperature from user value not from sensor's value. This patch includes also documentary placed under Documentation/thermal/. Signed-off-by: Jonghwa Lee jonghwa3@samsung.com --- v4 - Fix Typo. - Remove unnecessary codes. - Add comments about feature of exynos emulation operation to the document. v3 - Remove unnecessay variables. - Do some code clean in exynos_tmu_emulation_store(). - Make wrapping function of sysfs node creation function to use #ifdefs in minimum. v2 exynos_thermal.c - Fix build error occured by wrong emulation control register name. - Remove exynos5410 dependent codes. exynos_thermal_emulation - Align indentation. Documentation/thermal/exynos_thermal_emulation | 56 +++ drivers/thermal/Kconfig|9 +++ drivers/thermal/exynos_thermal.c | 91 3 files changed, 156 insertions(+), 0 deletions(-) create mode 100644 Documentation/thermal/exynos_thermal_emulation diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation new file mode 100644 index 000..a6ea06f --- /dev/null +++ b/Documentation/thermal/exynos_thermal_emulation @@ -0,0 +1,56 @@ +EXYNOS EMULATION MODE + + +Copyright (C) 2012 Samsung Electronics + +Written by Jonghwa Lee jonghwa3@samsung.com + +Description +--- + +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit. +Thermal emulation mode supports software debug for TMU's operation. 
User can set temperature +manually with software code and TMU will read current temperature from user value not from +sensor's value. + +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support in available. +When it's enabled, sysfs node will be created under +/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'. + +The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any +temperature you want to update to sysfs node, it automatically enable emulation mode and +current temperature will be changed into it. +(Exynos also supports user changable delay time which would be used to delay of + changing temperature. However, this node only uses same delay of real sensing time, 938us.) + +Exynos emulation mode requires synchronous of value changing and enabling. It means when you +want to update the any value of delay or next temperature, then you have to enable emulation +mode at the same time. (Or you have to keep the mode enabling.) If you don't, it fails to +change the value to updated one and just use last succeessful value repeatedly. That's why +this node gives users the right to change termerpature only. Just one interface makes it more +simply to use. + +Disabling emulation mode only requires writing value 0 to sysfs node. + + +TEMP 120 | + | + 100 | + | +80 | + |+--- +60 || | + | +-| | +40 | | | | + | | | | +20 | | | +-- + | | | | | + 0 |__|_|__|__|_ + A A A A TIME + |-| |-| |-| | + | 938us | | | | | | +emulation: 0 50 | 70 | 20 | 0 +current temp : sensor 5070 20sensor + + + diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index e1cb6bd..c02a66c 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -55,3 +55,12 @@ config EXYNOS_THERMAL help If you say yes here you get support for TMU (Thermal Managment Unit) on SAMSUNG EXYNOS series of SoC. 
+ +config EXYNOS_THERMAL_EMUL + bool EXYNOS TMU emulation mode support + depends on !CPU_EXYNOS4210 EXYNOS_THERMAL Instead of using CPU_EXYNOS4210 here it is better to use data-soc == SOC_ARCH_EXYNOS4210 inside the
Re: [PATCH resend] virtio_console: Free buffers from out-queue upon close
Note: This patch is compile tested only. I have done the removal of buffers from out-queue in handle_control_message() when host has acked the close request. This seems less racy than doing it in the release function.

This confuses me... why are we doing this in case VIRTIO_CONSOLE_PORT_OPEN:? We can't pull unconsumed buffers out of the ring when the other side may still access it, and this seems to be doing that.

Yes -- and it's my fault; I asked Sjur to do that in the close fops function.

Thanks Amit :-), but this was really my bad.

We should only do this in the port remove case (unplug or device remove) -- so the original patch, with just the WARN_ON removed is the right way. I'll send the revised 3/3 patch for you.

Thank you.

Regards,
Sjur
[patch 2/2] mm, oom: fix race when specifying a thread as the oom origin
test_set_oom_score_adj() and compare_swap_oom_score_adj() are used to specify that current should be killed first if an oom condition occurs in between the two calls. The usage is short oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX); ... compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj); to store the thread's oom_score_adj, temporarily change it to the maximum score possible, and then restore the old value if it is still the same. This happens to still be racy, however, if the user writes OOM_SCORE_ADJ_MAX to /proc/pid/oom_score_adj in between the two calls. The compare_swap_oom_score_adj() will then incorrectly reset the old value prior to the write of OOM_SCORE_ADJ_MAX. To fix this, introduce a new oom_flags_t member in struct signal_struct that will be used for per-thread oom killer flags. KSM and swapoff can now use a bit in this member to specify that threads should be killed first in oom conditions without playing around with oom_score_adj. This also allows the correct oom_score_adj to always be shown when reading /proc/pid/oom_score. 
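The race described above can be replayed deterministically in userspace. The sketch below is single-threaded and simply performs the problematic interleaving in order; the demo_* names and the DEMO_OOM_FLAG_ORIGIN constant are hypothetical stand-ins, and the locking that the kernel helpers use is left out.

```c
#define OOM_SCORE_ADJ_MAX	1000
#define DEMO_OOM_FLAG_ORIGIN	0x1u

/* Hypothetical cut-down stand-in for the relevant signal_struct fields. */
struct demo_signal {
	unsigned int oom_flags;
	short oom_score_adj;
};

/* Model of test_set_oom_score_adj(): store new_val, return the old value. */
short demo_test_set(struct demo_signal *sig, short new_val)
{
	short old_val = sig->oom_score_adj;

	sig->oom_score_adj = new_val;
	return old_val;
}

/* Model of compare_swap_oom_score_adj(): write new_val only if still old_val. */
void demo_compare_swap(struct demo_signal *sig, short old_val, short new_val)
{
	if (sig->oom_score_adj == old_val)
		sig->oom_score_adj = new_val;
}

/* Old scheme: the user writes OOM_SCORE_ADJ_MAX between the two calls. */
short demo_old_scheme(short initial)
{
	struct demo_signal sig = { 0, initial };
	short saved = demo_test_set(&sig, OOM_SCORE_ADJ_MAX);

	sig.oom_score_adj = OOM_SCORE_ADJ_MAX;	/* user write via /proc */
	demo_compare_swap(&sig, OOM_SCORE_ADJ_MAX, saved);
	return sig.oom_score_adj;		/* user's value has been undone */
}

/* New scheme: a flag marks the oom origin, oom_score_adj is never touched. */
short demo_new_scheme(short initial)
{
	struct demo_signal sig = { 0, initial };

	sig.oom_flags |= DEMO_OOM_FLAG_ORIGIN;
	sig.oom_score_adj = OOM_SCORE_ADJ_MAX;	/* user write via /proc */
	sig.oom_flags &= ~DEMO_OOM_FLAG_ORIGIN;
	return sig.oom_score_adj;		/* user's value survives */
}
```

With an initial value of 0, the old scheme ends up back at 0 even though the user asked for the maximum, while the flag-based scheme keeps the user's write.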
Signed-off-by: David Rientjes rient...@google.com --- include/linux/oom.h | 19 +-- include/linux/sched.h |1 + include/linux/types.h |1 + mm/ksm.c |7 ++- mm/oom_kill.c | 49 +++-- mm/swapfile.c |5 ++--- 6 files changed, 30 insertions(+), 52 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -29,8 +29,23 @@ enum oom_scan_t { OOM_SCAN_SELECT,/* always select this thread first */ }; -extern void compare_swap_oom_score_adj(short old_val, short new_val); -extern short test_set_oom_score_adj(short new_val); +/* Thread is the potential origin of an oom condition; kill first on oom */ +#define OOM_FLAG_ORIGIN((__force oom_flags_t)0x1) + +static inline void set_current_oom_origin(void) +{ + current-signal-oom_flags |= OOM_FLAG_ORIGIN; +} + +static inline void clear_current_oom_origin(void) +{ + current-signal-oom_flags = ~OOM_FLAG_ORIGIN; +} + +static inline bool oom_task_origin(const struct task_struct *p) +{ + return !!(p-signal-oom_flags OOM_FLAG_ORIGIN); +} extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, const nodemask_t *nodemask, diff --git a/include/linux/sched.h b/include/linux/sched.h --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -631,6 +631,7 @@ struct signal_struct { struct rw_semaphore group_rwsem; #endif + oom_flags_t oom_flags; short oom_score_adj;/* OOM kill score adjustment */ short oom_score_adj_min;/* OOM kill score adjustment min value. * Only settable by CAP_SYS_RESOURCE. 
*/ diff --git a/include/linux/types.h b/include/linux/types.h --- a/include/linux/types.h +++ b/include/linux/types.h @@ -156,6 +156,7 @@ typedef u32 dma_addr_t; #endif typedef unsigned __bitwise__ gfp_t; typedef unsigned __bitwise__ fmode_t; +typedef unsigned __bitwise__ oom_flags_t; #ifdef CONFIG_PHYS_ADDR_T_64BIT typedef u64 phys_addr_t; diff --git a/mm/ksm.c b/mm/ksm.c --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1929,12 +1929,9 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr, if (ksm_run != flags) { ksm_run = flags; if (flags KSM_RUN_UNMERGE) { - short oom_score_adj; - - oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX); + set_current_oom_origin(); err = unmerge_and_remove_all_rmap_items(); - compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, - oom_score_adj); + clear_current_oom_origin(); if (err) { ksm_run = KSM_RUN_STOP; count = err; diff --git a/mm/oom_kill.c b/mm/oom_kill.c --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -44,48 +44,6 @@ int sysctl_oom_kill_allocating_task; int sysctl_oom_dump_tasks = 1; static DEFINE_SPINLOCK(zone_scan_lock); -/* - * compare_swap_oom_score_adj() - compare and swap current's oom_score_adj - * @old_val: old oom_score_adj for compare - * @new_val: new oom_score_adj for swap - * - * Sets the oom_score_adj value for current to @new_val iff its present value is - * @old_val. Usually used to reinstate a previous value to prevent racing with - * userspacing tuning the value in the interim. - */ -void compare_swap_oom_score_adj(short old_val, short new_val) -{ - struct sighand_struct *sighand = current-sighand; - - spin_lock_irq(sighand-siglock); - if (current-signal-oom_score_adj == old_val) - current-signal-oom_score_adj
[patch 1/2] mm, oom: change type of oom_score_adj to short
The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000, so this range can be represented by the signed short type with no functional change. The extra space this frees up in struct signal_struct will be used for per-thread oom kill flags in the next patch. Signed-off-by: David Rientjes rient...@google.com --- drivers/staging/android/lowmemorykiller.c | 16 fs/proc/base.c| 10 +- include/linux/oom.h |4 ++-- include/linux/sched.h |6 +++--- include/trace/events/oom.h|4 ++-- include/trace/events/task.h |8 mm/ksm.c |2 +- mm/oom_kill.c | 10 +- mm/swapfile.c |2 +- 9 files changed, 31 insertions(+), 31 deletions(-) diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c --- a/drivers/staging/android/lowmemorykiller.c +++ b/drivers/staging/android/lowmemorykiller.c @@ -40,7 +40,7 @@ #include linux/notifier.h static uint32_t lowmem_debug_level = 2; -static int lowmem_adj[6] = { +static short lowmem_adj[6] = { 0, 1, 6, @@ -70,9 +70,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc) int rem = 0; int tasksize; int i; - int min_score_adj = OOM_SCORE_ADJ_MAX + 1; + short min_score_adj = OOM_SCORE_ADJ_MAX + 1; int selected_tasksize = 0; - int selected_oom_score_adj; + short selected_oom_score_adj; int array_size = ARRAY_SIZE(lowmem_adj); int other_free = global_page_state(NR_FREE_PAGES); int other_file = global_page_state(NR_FILE_PAGES) - @@ -90,7 +90,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc) } } if (sc-nr_to_scan 0) - lowmem_print(3, lowmem_shrink %lu, %x, ofree %d %d, ma %d\n, + lowmem_print(3, lowmem_shrink %lu, %x, ofree %d %d, ma %hd\n, sc-nr_to_scan, sc-gfp_mask, other_free, other_file, min_score_adj); rem = global_page_state(NR_ACTIVE_ANON) + @@ -107,7 +107,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc) rcu_read_lock(); for_each_process(tsk) { struct task_struct *p; - int oom_score_adj; + short oom_score_adj; if (tsk-flags 
PF_KTHREAD) continue; @@ -141,11 +141,11 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc) selected = p; selected_tasksize = tasksize; selected_oom_score_adj = oom_score_adj; - lowmem_print(2, select %d (%s), adj %d, size %d, to kill\n, + lowmem_print(2, select %d (%s), adj %hd, size %d, to kill\n, p-pid, p-comm, oom_score_adj, tasksize); } if (selected) { - lowmem_print(1, send sigkill to %d (%s), adj %d, size %d\n, + lowmem_print(1, send sigkill to %d (%s), adj %hd, size %d\n, selected-pid, selected-comm, selected_oom_score_adj, selected_tasksize); lowmem_deathpending_timeout = jiffies + HZ; @@ -176,7 +176,7 @@ static void __exit lowmem_exit(void) } module_param_named(cost, lowmem_shrinker.seeks, int, S_IRUGO | S_IWUSR); -module_param_array_named(adj, lowmem_adj, int, lowmem_adj_size, +module_param_array_named(adj, lowmem_adj, short, lowmem_adj_size, S_IRUGO | S_IWUSR); module_param_array_named(minfree, lowmem_minfree, uint, lowmem_minfree_size, S_IRUGO | S_IWUSR); diff --git a/fs/proc/base.c b/fs/proc/base.c --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -878,7 +878,7 @@ static ssize_t oom_score_adj_read(struct file *file, char __user *buf, { struct task_struct *task = get_proc_task(file-f_path.dentry-d_inode); char buffer[PROC_NUMBUF]; - int oom_score_adj = OOM_SCORE_ADJ_MIN; + short oom_score_adj = OOM_SCORE_ADJ_MIN; unsigned long flags; size_t len; @@ -889,7 +889,7 @@ static ssize_t oom_score_adj_read(struct file *file, char __user *buf, unlock_task_sighand(task, flags); } put_task_struct(task); - len = snprintf(buffer, sizeof(buffer), %d\n, oom_score_adj); + len = snprintf(buffer, sizeof(buffer), %hd\n, oom_score_adj); return simple_read_from_buffer(buf, count, ppos, buffer, len); } @@ -936,15 +936,15 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, goto err_task_lock; } - if (oom_score_adj task-signal-oom_score_adj_min + if ((short)oom_score_adj
[PATCH -next] mtip32xx: fix potential NULL pointer dereference in mtip_timeout_function()
From: Wei Yongjun yongjun_...@trendmicro.com.cn

The dereference of port should be moved below the NULL test.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/block/mtip32xx/mtip32xx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index adc6f36..fe16b32 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -559,7 +559,7 @@ static void mtip_timeout_function(unsigned long int data)
 	struct mtip_cmd *command;
 	int tag, cmdto_cnt = 0;
 	unsigned int bit, group;
-	unsigned int num_command_slots = port->dd->slot_groups * 32;
+	unsigned int num_command_slots;
 	unsigned long to, tagaccum[SLOTBITS_IN_LONGS];
 
 	if (unlikely(!port))
@@ -572,6 +572,7 @@ static void mtip_timeout_function(unsigned long int data)
 	}
 	/* clear the tag accumulator */
 	memset(tagaccum, 0, SLOTBITS_IN_LONGS * sizeof(long));
+	num_command_slots = port->dd->slot_groups * 32;
 
 	for (tag = 0; tag < num_command_slots; tag++) {
 		/*
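As a rough illustration of why the initializer has to move, here is a minimal userspace sketch of the same ordering. The demo_* structures are hypothetical and much smaller than the real driver's; only the "test for NULL first, dereference afterwards" shape is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical cut-down versions of the driver's data structures. */
struct demo_driver_data {
	unsigned int slot_groups;
};

struct demo_port {
	struct demo_driver_data *dd;
};

/*
 * Returns the number of command slots, or 0 when port is NULL.
 * The variable is declared without an initializer so that port is
 * not dereferenced before the NULL test has run.
 */
unsigned int demo_num_command_slots(const struct demo_port *port)
{
	unsigned int num_command_slots;

	if (!port)
		return 0;

	/* Safe: port is known to be non-NULL from here on. */
	num_command_slots = port->dd->slot_groups * 32;
	return num_command_slots;
}
```

Initializing num_command_slots at its declaration would read port->dd before the test, so a NULL port would fault before the check could reject it.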
Re: [PATCH v7 0/3] Add modules to support realtek PCIE card reader
Hi Wei,

On Mon, Oct 29, 2012 at 01:49:28PM +0800, wei_w...@realsil.com.cn wrote:

From: Wei WANG wei_w...@realsil.com.cn

Support for Realtek PCI-Express driver-based card readers including rts5209, rts5229 and rtl8411.

All 3 patches applied now, thanks a lot. I also fixed the Kconfig entry where you forgot to select MFD_CORE.

Cheers,
Samuel.

-- 
Intel Open Source Technology Centre
http://oss.intel.com/
Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
On Sat 03-11-12 01:38:29, Tejun Heo wrote: Currently, cgroup doesn't provide any generic helper for walking a given cgroup's children or descendants. This patch adds the following three macros. * cgroup_for_each_child() - walk immediate children of a cgroup. * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup in pre-order tree traversal. * cgroup_for_each_descendant_post() - visit all descendants of a cgroup in post-order tree traversal. All three only require the user to hold RCU read lock during traversal. Verifying that each iterated cgroup is online is the responsibility of the user. When used with proper synchronization, cgroup_for_each_descendant_pre() can be used to propagate config updates to descendants in reliable way. See comments for details. Signed-off-by: Tejun Heo t...@kernel.org I will convert mem_cgroup_iter to use this rather than css_get_next after this gets into the next tree so that it can fly via Andrew. Reviewed-by: Michal Hocko mho...@suse.cz Just a minor knit. You are talking about a config propagation while I would consider state propagation more clear and less confusing. Config is usually stable enough so that post_create is not necessary for syncing (e.g. memcg.swappiness). It is a state which must be consistent throughout the hierarchy which matters here. Thanks! --- include/linux/cgroup.h | 82 +++ kernel/cgroup.c| 86 ++ 2 files changed, 168 insertions(+) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 90c33eb..0020329 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -534,6 +534,88 @@ static inline struct cgroup* task_cgroup(struct task_struct *task, return task_subsys_state(task, subsys_id)-cgroup; } +/** + * cgroup_for_each_child - iterate through children of a cgroup + * @pos: the cgroup * to use as the loop cursor + * @cgroup: cgroup whose children to walk + * + * Walk @cgroup's children. Must be called under rcu_read_lock(). 
A child + * cgroup which hasn't finished -post_create() or already has finished + * -pre_destroy() may show up during traversal and it's each subsystem's + * responsibility to verify that each @pos is alive. + * + * If a subsystem synchronizes against the parent in its -post_create() + * and before starting iterating, a cgroup which finished -post_create() + * is guaranteed to be visible in the future iterations. + */ +#define cgroup_for_each_child(pos, cgroup) \ + list_for_each_entry_rcu(pos, (cgroup)-children, sibling) + +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos, + struct cgroup *cgroup); + +/** + * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants + * @pos: the cgroup * to use as the loop cursor + * @cgroup: cgroup whose descendants to walk + * + * Walk @cgroup's descendants. Must be called under rcu_read_lock(). A + * descendant cgroup which hasn't finished -post_create() or already has + * finished -pre_destroy() may show up during traversal and it's each + * subsystem's responsibility to verify that each @pos is alive. + * + * If a subsystem synchronizes against the parent in its -post_create() + * and before starting iterating, and synchronizes against @pos on each + * iteration, any descendant cgroup which finished -post_create() is + * guaranteed to be visible in the future iterations. + * + * In other words, the following guarantees that a descendant can't escape + * configuration of its ancestors. + * + * my_post_create(@cgrp) + * { + * Lock @cgrp-parent and @cgrp; + * Inherit config from @cgrp-parent; + * Unlock both. 
+ * } + * + * my_update_config(@cgrp) + * { + * Lock @cgrp; + * Update @cgrp's config; + * Unlock @cgrp; + * + * cgroup_for_each_descendant_pre(@pos, @cgrp) { + * Lock @pos; + * Verify @pos is alive and inherit config from @pos-parent; + * Unlock @pos; + * } + * } + * + * Alternatively, a subsystem may choose to use a single global lock to + * synchronize -post_create() and -pre_destroy() against tree-walking + * operations. + */ +#define cgroup_for_each_descendant_pre(pos, cgroup) \ + for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos); \ + pos = cgroup_next_descendant_pre((pos), (cgroup))) + +struct cgroup *cgroup_next_descendant_post(struct cgroup *pos, +struct cgroup *cgroup); + +/** + * cgroup_for_each_descendant_post - post-order walk of a cgroup's descendants + * @pos: the cgroup * to use as the loop cursor + * @cgroup: cgroup whose descendants to walk + * + * Similar to cgroup_for_each_descendant_pre() but performs post-order + * traversal instead. Note that the walk visibility
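The pre-order walk that the quoted macro builds on can be sketched on a plain tree in userspace. This is an assumption-laden illustration, not the kernel implementation: the demo_* names are hypothetical, the tree uses first-child and next-sibling pointers instead of RCU-protected lists, and there is no locking or liveness check.

```c
#include <stddef.h>

/* Hypothetical tree node; the kernel's struct cgroup uses list_heads instead. */
struct demo_node {
	struct demo_node *parent;
	struct demo_node *first_child;
	struct demo_node *next_sibling;
	int id;
};

/* Append child as the last child of parent. */
void demo_add_child(struct demo_node *parent, struct demo_node *child)
{
	struct demo_node **link = &parent->first_child;

	while (*link)
		link = &(*link)->next_sibling;
	child->parent = parent;
	child->next_sibling = NULL;
	*link = child;
}

/*
 * Return the descendant of root that follows pos in pre-order,
 * or NULL when the walk is complete. Pass pos == NULL to start.
 */
struct demo_node *demo_next_descendant_pre(struct demo_node *pos,
					   struct demo_node *root)
{
	if (!pos)
		pos = root;		/* first step: begin below root */

	if (pos->first_child)
		return pos->first_child;	/* descend first */

	/* No children: climb until a next sibling exists, stopping at root. */
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

#define demo_for_each_descendant_pre(pos, root)				\
	for ((pos) = demo_next_descendant_pre(NULL, (root)); (pos);	\
	     (pos) = demo_next_descendant_pre((pos), (root)))

/* Store up to max visited ids in ids[]; returns the number of descendants. */
size_t demo_collect_pre(struct demo_node *root, int *ids, size_t max)
{
	struct demo_node *pos;
	size_t n = 0;

	demo_for_each_descendant_pre(pos, root) {
		if (n < max)
			ids[n] = pos->id;
		n++;
	}
	return n;
}
```

Because a parent is always visited before its children, an update applied in this order reaches each node only after its ancestors have already been updated, which is the property the quoted comment relies on for propagation.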
Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
On Thursday, November 08, 2012 10:04:36 AM Huang Ying wrote: On Thu, 2012-11-08 at 02:35 +0100, Rafael J. Wysocki wrote: On Thursday, November 08, 2012 09:15:08 AM Huang Ying wrote: On Thu, 2012-11-08 at 00:09 +0100, Rafael J. Wysocki wrote: [...] I think the patch can fix the issue in a better way. I'm not sure what you mean. I mean your patch can fix the driver-less VGA issue. And it is better than my original patch. The following discussion is not about this specific issue. Just about PM core logic. OK Do we still need to clarify state about disabled and forbidden? When a device is forbidden and the usage_count 0, Forbidden always means usage_count 0. Yes. is it a good idea to allow to set device state to SUSPENDED if the device is disabled? No, it is not. The status should always be ACTIVE as long as usage_count 0. However, in some cases we actually would like to change the status to SUSPENDED when usage_count becomes equal to 0, because that means we can suspend (I mean really suspend) the parents of the devices in question (and we want to notify the parents in those cases). So do you think Alan Stern's suggestion about forbidden and disabled is the right way to go? I'm not really sure about that. My original idea was that the runtime PM status and usage counter would only matter when runtime PM of a device was enabled. That leads to problems, though, when we enable runtime PM of a device whose usage counter is greater from zero and status is SUSPENDED. Also when the device's status is ACTIVE, but its parent's child count is 0. It's not very easy to fix this at the core level, though, because we depend on the current behavior in some places. I'm thinking that perhaps pm_runtime_enable() should just WARN() if things are obviously inconsistent (although there still may be problems, for example, if the parent's child count is 2 when we enable runtime PM for its child, but that child is the only one it actually has). Thanks, Rafael -- I speak only for myself. 
Rafael J. Wysocki, Intel Open Source Technology Center.
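The two "obviously inconsistent" cases named in the reply above can be written down as a small user-space model. This is only a sketch: the struct and function names are made up for illustration and are not the PM core's data structures, and the model deliberately cannot detect the stale child count case mentioned at the end of the mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the runtime PM fields discussed in the thread. */
enum rpm_status_model { MODEL_ACTIVE, MODEL_SUSPENDED };

struct pm_dev_model {
	struct pm_dev_model *parent;
	int usage_count;		/* models the usage counter */
	int child_count;		/* models the count of active children */
	enum rpm_status_model status;	/* models the runtime PM status */
};

/*
 * Return true if the state looks obviously inconsistent at the moment
 * runtime PM would be enabled for @dev (hypothetical check, not kernel code).
 */
static bool pm_model_inconsistent(const struct pm_dev_model *dev)
{
	/* Case 1: references are held, yet the device claims to be suspended. */
	if (dev->usage_count > 0 && dev->status == MODEL_SUSPENDED)
		return true;

	/* Case 2: device is active, but its parent counts no active children. */
	if (dev->status == MODEL_ACTIVE && dev->parent &&
	    dev->parent->child_count == 0)
		return true;

	return false;
}
```

A real check would sit where the mail suggests, in pm_runtime_enable(), and would only warn rather than change any state.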
Re: [PATCH] pwm: lpc32xx - Fix the PWM polarity
On 07/11/12 16:25, Alban Bedel wrote:
> Signed-off-by: Alban Bedel <alban.be...@avionic-design.de>
> ---
>  drivers/pwm/pwm-lpc32xx.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c
> index adb87f0..0dc278d 100644
> --- a/drivers/pwm/pwm-lpc32xx.c
> +++ b/drivers/pwm/pwm-lpc32xx.c
> @@ -51,7 +51,11 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>
>  	c = 256 * duty_ns;
>  	do_div(c, period_ns);
> -	duty_cycles = c;
> +	if (c == 0)
> +		c = 256;
> +	if (c > 255)
> +		c = 255;
> +	duty_cycles = 256 - c;

Except for the range check (for the original c > 255), this results in:

duty_cycles = 256 - c

except for (c == 0) where

duty_cycles = 1

which actually is

duty_cycles = (256 - c) - 255	(think with the original c)

i.e. nearly a polarity inversion in the case of (c == 0).

Why is the case (c == 0) so special here? Maybe you can document this,
if it is really intended?

>
>  	writel(PWM_ENABLE | PWM_RELOADV(period_cycles) | PWM_DUTY(duty_cycles),
>  		lpc32xx->base + (pwm->hwpwm << 2));
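To make the (c == 0) question concrete, here is a user-space model of the arithmetic in the patch hunk. It is a sketch only: the function name is invented, plain 64-bit division stands in for do_div(), and period_ns is assumed to be non-zero.

```c
#include <stdint.h>

/*
 * Model of the duty computation from the patch under review.
 * Returns the value that would be passed to PWM_DUTY().
 */
static unsigned int lpc32xx_duty_model(uint64_t duty_ns, uint64_t period_ns)
{
	uint64_t c = 256 * duty_ns / period_ns;

	if (c == 0)
		c = 256;	/* the special case the review asks about */
	if (c > 255)
		c = 255;	/* range check */
	return (unsigned int)(256 - c);
}
```

With a 1000 ns period, a duty of 0 ns yields 1, while a duty of 4 ns yields 255: the result jumps from one end of the range to the other between two neighbouring inputs, which is the near inversion pointed out in the reply.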
Re: [PATCH 4/9] cgroup_freezer: trivial cleanups
On Sat 03-11-12 01:38:30, Tejun Heo wrote: * Clean-up indentation and line-breaks. Drop the invalid comment about freezer-lock. * Make all internal functions take @freezer instead of both @cgroup and @freezer. Signed-off-by: Tejun Heo t...@kernel.org Looks reasonable Reviewed-by: Michal Hocko mho...@suse.cz --- kernel/cgroup_freezer.c | 41 +++-- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c index bedefd9..975b3d8 100644 --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -29,17 +29,15 @@ enum freezer_state { }; struct freezer { - struct cgroup_subsys_state css; - enum freezer_state state; - spinlock_t lock; /* protects _writes_ to state */ + struct cgroup_subsys_state css; + enum freezer_state state; + spinlock_t lock; }; -static inline struct freezer *cgroup_freezer( - struct cgroup *cgroup) +static inline struct freezer *cgroup_freezer(struct cgroup *cgroup) { - return container_of( - cgroup_subsys_state(cgroup, freezer_subsys_id), - struct freezer, css); + return container_of(cgroup_subsys_state(cgroup, freezer_subsys_id), + struct freezer, css); } static inline struct freezer *task_freezer(struct task_struct *task) @@ -180,8 +178,9 @@ out: * migrated into or out of @cgroup, so we can't verify task states against * @freezer state here. See freezer_attach() for details. 
*/ -static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer) +static void update_if_frozen(struct freezer *freezer) { + struct cgroup *cgroup = freezer-css.cgroup; struct cgroup_iter it; struct task_struct *task; @@ -211,12 +210,11 @@ notyet: static int freezer_read(struct cgroup *cgroup, struct cftype *cft, struct seq_file *m) { - struct freezer *freezer; + struct freezer *freezer = cgroup_freezer(cgroup); enum freezer_state state; - freezer = cgroup_freezer(cgroup); spin_lock_irq(freezer-lock); - update_if_frozen(cgroup, freezer); + update_if_frozen(freezer); state = freezer-state; spin_unlock_irq(freezer-lock); @@ -225,8 +223,9 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft, return 0; } -static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer) +static void freeze_cgroup(struct freezer *freezer) { + struct cgroup *cgroup = freezer-css.cgroup; struct cgroup_iter it; struct task_struct *task; @@ -236,8 +235,9 @@ static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer) cgroup_iter_end(cgroup, it); } -static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer) +static void unfreeze_cgroup(struct freezer *freezer) { + struct cgroup *cgroup = freezer-css.cgroup; struct cgroup_iter it; struct task_struct *task; @@ -247,11 +247,9 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer) cgroup_iter_end(cgroup, it); } -static void freezer_change_state(struct cgroup *cgroup, +static void freezer_change_state(struct freezer *freezer, enum freezer_state goal_state) { - struct freezer *freezer = cgroup_freezer(cgroup); - /* also synchronizes against task migration, see freezer_attach() */ spin_lock_irq(freezer-lock); @@ -260,13 +258,13 @@ static void freezer_change_state(struct cgroup *cgroup, if (freezer-state != CGROUP_THAWED) atomic_dec(system_freezing_cnt); freezer-state = CGROUP_THAWED; - unfreeze_cgroup(cgroup, freezer); + unfreeze_cgroup(freezer); break; 
case CGROUP_FROZEN: if (freezer-state == CGROUP_THAWED) atomic_inc(system_freezing_cnt); freezer-state = CGROUP_FREEZING; - freeze_cgroup(cgroup, freezer); + freeze_cgroup(freezer); break; default: BUG(); @@ -275,8 +273,7 @@ static void freezer_change_state(struct cgroup *cgroup, spin_unlock_irq(freezer-lock); } -static int freezer_write(struct cgroup *cgroup, - struct cftype *cft, +static int freezer_write(struct cgroup *cgroup, struct cftype *cft, const char *buffer) { enum freezer_state goal_state; @@ -288,7 +285,7 @@ static int freezer_write(struct cgroup *cgroup, else return -EINVAL; - freezer_change_state(cgroup, goal_state); + freezer_change_state(cgroup_freezer(cgroup), goal_state); return 0; } -- 1.7.11.7 -- Michal Hocko SUSE Labs -- To unsubscribe from this
[PATCH] tools: hv: Netlink source address validation allows DoS
The source code without this patch caused hypervkvpd to exit when it processed
a spoofed Netlink packet which has been sent from an untrusted local user.
Now Netlink messages with a non-zero nl_pid source address are ignored
and a warning is printed into the syslog.

Signed-off-by: Tomas Hozza <tho...@redhat.com>
---
 tools/hv/hv_kvp_daemon.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 13c2a14..c1d9102 100755
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1486,13 +1486,19 @@ int main(void)
 		len = recvfrom(fd, kvp_recv_buffer, sizeof(kvp_recv_buffer), 0,
 				addr_p, &addr_l);
 
-		if (len < 0 || addr.nl_pid) {
+		if (len < 0) {
 			syslog(LOG_ERR, "recvfrom failed; pid:%u error:%d %s",
 					addr.nl_pid, errno, strerror(errno));
 			close(fd);
 			return -1;
 		}
 
+		if (addr.nl_pid) {
+			syslog(LOG_WARNING, "Received packet from untrusted pid:%u",
+					addr.nl_pid);
+			continue;
+		}
+
 		incoming_msg = (struct nlmsghdr *)kvp_recv_buffer;
 		incoming_cn_msg = (struct cn_msg *)NLMSG_DATA(incoming_msg);
 		hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
-- 
1.7.11.7
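The decision the patch splits into two branches can be sketched as a small classifier. The enum and function names below are hypothetical and do not appear in hv_kvp_daemon.c; only struct sockaddr_nl and its nl_pid field come from the real Netlink headers.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* What the receive loop should do with one recvfrom() result. */
enum recv_action {
	RECV_FATAL,	/* recvfrom() itself failed: log and exit */
	RECV_IGNORE,	/* sender is not the kernel: warn and keep looping */
	RECV_PROCESS	/* message from the kernel: handle it */
};

/*
 * Messages that originate in the kernel carry nl_pid == 0 as their
 * source address; any other value means a user-space sender.
 */
static enum recv_action classify_recv(ssize_t len,
				      const struct sockaddr_nl *addr)
{
	if (len < 0)
		return RECV_FATAL;
	if (addr->nl_pid != 0)
		return RECV_IGNORE;
	return RECV_PROCESS;
}
```

Before the patch, the second case was folded into the first, so a single spoofed packet was enough to make the daemon exit.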
Re: [PATCH 2/4] ARM: EXYNOS: PL330 MDMA1 fix for revision 0 of Exynos4210 SOC
On Thursday 08 November 2012 05:49:47 Kukjin Kim wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > > Hmm...above change and adding definition of EXYNOS_PA_S_MDMA1 address
> > > can fix the problem you commented on EXYNOS4210 Rev0 without others?...
> >
> > The problem is affecting only EXYNOS4210 Rev0 and the fix is applied
> > only for case when soc_is_exynos4210() && samsung_rev() == EXYNOS4210_REV_0,
> > or did you mean something else?
>
> Yeah, I know. I mean just adding secure mdma1 address is enough for
> exynos4210 rev0.
>
> 8<---
> @@ -275,6 +275,9 @@ static int __init exynos_dma_init(void)
>  		exynos_pdma1_pdata.nr_valid_peri =
>  			ARRAY_SIZE(exynos4210_pdma1_peri);
>  		exynos_pdma1_pdata.peri_id = exynos4210_pdma1_peri;
> +
> +		if (samsung_rev() == EXYNOS4210_REV_0)
> +			exynos_mdma1_device.res.start = EXYNOS4_PA_S_MDMA1;
>  	} else if (soc_is_exynos4212() || soc_is_exynos4412()) {
>  		exynos_pdma0_pdata.nr_valid_peri =
>  			ARRAY_SIZE(exynos4212_pdma0_peri);
> diff --git a/arch/arm/mach-exynos/include/mach/map.h b/arch/arm/mach-exynos/include/mach/map.h
> index 8480849..0abfe78 100644
> --- a/arch/arm/mach-exynos/include/mach/map.h
> +++ b/arch/arm/mach-exynos/include/mach/map.h
> @@ -90,6 +90,7 @@
>
>  #define EXYNOS4_PA_MDMA0		0x1081
>  #define EXYNOS4_PA_MDMA1		0x1285
> +#define EXYNOS4_PA_S_MDMA1		0x1284
>  #define EXYNOS4_PA_PDMA0		0x1268
>  #define EXYNOS4_PA_PDMA1		0x1269
>  #define EXYNOS5_PA_MDMA0		0x1080
> 8<---

Ah, okay.  Here is the full simplified patch.

From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH v2] ARM: EXYNOS: PL330 MDMA1 fix for revision 0 of Exynos4210 SOC

Commit 8214513 ("ARM: EXYNOS: fix address for EXYNOS4 MDMA1")
changed EXYNOS specific setup of PL330 DMA engine to use 'non-secure'
mdma1 address instead of 'secure' one (from 0x1284 to 0x1285)
to fix issue with some Exynos4212 SOCs.  Unfortunately it breaks
PL330 setup for revision 0 of Exynos4210 SOC (mdma1 device cannot
be found at 'non-secure' address):

[    0.566245] dma-pl330 dma-pl330.2: PERIPH_ID 0x0, PCELL_ID 0x0 !
[    0.566278] dma-pl330: probe of dma-pl330.2 failed with error -22

Fix it by using 'secure' mdma1 address on Exynos4210 revision 0
SOC.

Reviewed-by: Tomasz Figa <t.f...@samsung.com>
Cc: Kukjin Kim <kgene@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 arch/arm/mach-exynos/dma.c              |    3 +++
 arch/arm/mach-exynos/include/mach/map.h |    1 +
 2 files changed, 4 insertions(+)

Index: b/arch/arm/mach-exynos/dma.c
===================================================================
--- a/arch/arm/mach-exynos/dma.c	2012-11-07 18:20:36.561743865 +0100
+++ b/arch/arm/mach-exynos/dma.c	2012-11-08 10:48:23.445067606 +0100
@@ -275,6 +275,9 @@ static int __init exynos_dma_init(void)
 		exynos_pdma1_pdata.nr_valid_peri =
 			ARRAY_SIZE(exynos4210_pdma1_peri);
 		exynos_pdma1_pdata.peri_id = exynos4210_pdma1_peri;
+
+		if (samsung_rev() == EXYNOS4210_REV_0)
+			exynos_mdma1_device.res.start = EXYNOS4_PA_S_MDMA1;
 	} else if (soc_is_exynos4212() || soc_is_exynos4412()) {
 		exynos_pdma0_pdata.nr_valid_peri =
 			ARRAY_SIZE(exynos4212_pdma0_peri);
Index: b/arch/arm/mach-exynos/include/mach/map.h
===================================================================
--- a/arch/arm/mach-exynos/include/mach/map.h	2012-11-07 18:20:44.801743862 +0100
+++ b/arch/arm/mach-exynos/include/mach/map.h	2012-11-08 10:48:40.597067605 +0100
@@ -92,6 +92,7 @@
 
 #define EXYNOS4_PA_MDMA0		0x1081
 #define EXYNOS4_PA_MDMA1		0x1285
+#define EXYNOS4_PA_S_MDMA1		0x1284
 #define EXYNOS4_PA_PDMA0		0x1268
 #define EXYNOS4_PA_PDMA1		0x1269
 #define EXYNOS5_PA_MDMA0		0x1080
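The selection rule agreed on in the thread (secure MDMA1 address only on Exynos4210 revision 0, non-secure address everywhere else) can be modelled in a few lines. This is a sketch under assumptions: the function and macro names are invented, and the two address values are copied as they appear in this archive, where they may be truncated, so treat them as placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholders: values as printed in the archived mail, possibly truncated. */
#define MODEL_PA_MDMA1		0x1285u	/* 'non-secure' address */
#define MODEL_PA_S_MDMA1	0x1284u	/* 'secure' address */

/* Pick the MDMA1 base the way the simplified patch does. */
static uint32_t mdma1_base_model(bool is_exynos4210, bool is_rev0)
{
	if (is_exynos4210 && is_rev0)
		return MODEL_PA_S_MDMA1;
	return MODEL_PA_MDMA1;
}
```

In the patch itself the SoC check is implied by the enclosing branch of exynos_dma_init(), so only the revision test is added.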
[PATCH] virtio_scsi: fix memory leak on full queue condition.
virtscsi_queuecommand was leaking memory when the virtio queue was full.

Tested: Guest operates correctly even with very small queue sizes, validated
we're not leaking kmalloc-192 sized allocations anymore.

Signed-off-by: Eric Northup <digitale...@google.com>
---
 drivers/scsi/virtio_scsi.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 595af1a..dd8dc27 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -469,6 +469,8 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
 			      GFP_ATOMIC) >= 0)
 		ret = 0;
+	else
+		mempool_free(cmd, virtscsi_cmd_pool);
 
 out:
 	return ret;
-- 
1.7.7.3
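The pattern behind the fix (release the command when the queue refuses it) can be shown with a user-space model. Everything here is hypothetical scaffolding: malloc()/free() stand in for the mempool, a fixed array stands in for the virtqueue, and live_allocs exists only so the leak can be observed.

```c
#include <stdlib.h>

#define QUEUE_CAP 2

struct cmd_model { int tag; };

struct queue_model {
	struct cmd_model *slots[QUEUE_CAP];
	int used;
	int live_allocs;	/* allocations not yet freed, for leak checking */
};

/* Returns 0 on success, -1 if the queue is full (like a failed kick). */
static int queue_add(struct queue_model *q, struct cmd_model *cmd)
{
	if (q->used == QUEUE_CAP)
		return -1;
	q->slots[q->used++] = cmd;
	return 0;
}

static int queue_command(struct queue_model *q, int tag)
{
	struct cmd_model *cmd = malloc(sizeof(*cmd));

	if (!cmd)
		return -1;
	q->live_allocs++;
	cmd->tag = tag;

	if (queue_add(q, cmd) == 0)
		return 0;

	/* The queue rejected the command: release it instead of leaking it. */
	free(cmd);
	q->live_allocs--;
	return -1;
}
```

Without the final free(), every rejected command would stay allocated, which matches the kmalloc-192 growth described in the commit message.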
Re: [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
On Sat 03-11-12 01:38:31, Tejun Heo wrote:
> * Make freezer_change_state() take bool @freeze instead of enum
>   freezer_state.
>
> * Separate out freezer_apply_state() out of freezer_change_state().
>   This makes freezer_change_state() a rather silly thin wrapper.  It
>   will be filled with hierarchy handling later on.
>
> This patch doesn't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <t...@kernel.org>

Makes sense
Reviewed-by: Michal Hocko <mho...@suse.cz>

> ---
>  kernel/cgroup_freezer.c | 48 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 975b3d8..2690830 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -247,45 +247,57 @@ static void unfreeze_cgroup(struct freezer *freezer)
>  	cgroup_iter_end(cgroup, &it);
>  }
>
> -static void freezer_change_state(struct freezer *freezer,
> -				 enum freezer_state goal_state)
> +/**
> + * freezer_apply_state - apply state change to a single cgroup_freezer
> + * @freezer: freezer to apply state change to
> + * @freeze: whether to freeze or unfreeze
> + */
> +static void freezer_apply_state(struct freezer *freezer, bool freeze)
>  {
>  	/* also synchronizes against task migration, see freezer_attach() */
> -	spin_lock_irq(&freezer->lock);
> +	lockdep_assert_held(&freezer->lock);
>
> -	switch (goal_state) {
> -	case CGROUP_THAWED:
> -		if (freezer->state != CGROUP_THAWED)
> -			atomic_dec(&system_freezing_cnt);
> -		freezer->state = CGROUP_THAWED;
> -		unfreeze_cgroup(freezer);
> -		break;
> -	case CGROUP_FROZEN:
> +	if (freeze) {
>  		if (freezer->state == CGROUP_THAWED)
>  			atomic_inc(&system_freezing_cnt);
>  		freezer->state = CGROUP_FREEZING;
>  		freeze_cgroup(freezer);
> -		break;
> -	default:
> -		BUG();
> +	} else {
> +		if (freezer->state != CGROUP_THAWED)
> +			atomic_dec(&system_freezing_cnt);
> +		freezer->state = CGROUP_THAWED;
> +		unfreeze_cgroup(freezer);
>  	}
> +}
>
> +/**
> + * freezer_change_state - change the freezing state of a cgroup_freezer
> + * @freezer: freezer of interest
> + * @freeze: whether to freeze or thaw
> + *
> + * Freeze or thaw @cgroup according to @freeze.
> + */
> +static void freezer_change_state(struct freezer *freezer, bool freeze)
> +{
> +	/* update @freezer */
> +	spin_lock_irq(&freezer->lock);
> +	freezer_apply_state(freezer, freeze);
>  	spin_unlock_irq(&freezer->lock);
>  }
>
>  static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>  			 const char *buffer)
>  {
> -	enum freezer_state goal_state;
> +	bool freeze;
>
>  	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> -		goal_state = CGROUP_THAWED;
> +		freeze = false;
>  	else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> -		goal_state = CGROUP_FROZEN;
> +		freeze = true;
>  	else
>  		return -EINVAL;
>
> -	freezer_change_state(cgroup_freezer(cgroup), goal_state);
> +	freezer_change_state(cgroup_freezer(cgroup), freeze);
>  	return 0;
>  }
>
> -- 
> 1.7.11.7

-- 
Michal Hocko
SUSE Labs
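The state and counter bookkeeping that freezer_apply_state() performs in the quoted patch can be modelled in user space as follows. This is only a sketch: the type and variable names are stand-ins, a plain int replaces the atomic counter, and locking as well as the actual freezing of tasks are left out.

```c
#include <stdbool.h>

enum freezer_state_model { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

struct freezer_model {
	enum freezer_state_model state;
};

/* Stand-in for system_freezing_cnt: number of freezers not thawed. */
static int system_freezing_cnt_model;

static void freezer_apply_state_model(struct freezer_model *f, bool freeze)
{
	if (freeze) {
		/* Count the freezer only on the THAWED -> FREEZING edge. */
		if (f->state == MODEL_THAWED)
			system_freezing_cnt_model++;
		f->state = MODEL_FREEZING;
	} else {
		/* Drop the count only if it was taken earlier. */
		if (f->state != MODEL_THAWED)
			system_freezing_cnt_model--;
		f->state = MODEL_THAWED;
	}
}
```

Applying the same request twice leaves the counter unchanged, which is why the bool interface does not alter behaviour compared with the old switch statement.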
Re: [PATCH 4/4] Staging: winbond: wb35rx_s: Fixed coding style issue
On Thu, Nov 8, 2012 at 12:59 PM, Dan Carpenter <dan.carpen...@oracle.com> wrote:
> It's better to use more descriptive subjects on the patches.  This
> one could probably have been broken into smaller patches
>
> [patch 4/x] Staging: winbond: wb35rx_s: fix white space
> [patch 5/x] Staging: winbond: wb35rx_s: fix comments
> [patch 6/x] Staging: winbond: wb35rx_s: allow header to be included twice
>
> It's small enough that I don't have strong feelings about it, but in
> general that's how you should do it.

Thanks Dan for your comment. I'll keep this in mind during my future work.

Regards,
Adil

> regards,
> dan carpenter
Re: [PATCH V3] firmware loader: Fix the race FW_STATUS_DONE is followed by class_timeout
On Thu, Nov 8, 2012 at 7:14 PM, Chuansheng Liu <chuansheng@intel.com> wrote:
> There is a race as below when calling request_firmware():
> CPU1                                   CPU2
> write 0 > loading
> mutex_lock(&fw_lock)
> ...
> set_bit FW_STATUS_DONE                 class_timeout is coming
>                                        set_bit FW_STATUS_ABORT
> complete_all &completion
> ...
> mutex_unlock(&fw_lock)
>
> In this time, the bit FW_STATUS_DONE and FW_STATUS_ABORT are set,
> and request_firmware() will return failure due to condition in
> _request_firmware_load():
> 	if (!buf->size || test_bit(FW_STATUS_ABORT, &buf->status))
> 		retval = -ENOENT;
>
> But from the above scenario, it should be a successful requesting.
> So we need judge if the bit FW_STATUS_DONE is already set before
> calling fw_load_abort() in timeout function.
>
> As Ming's proposal, we need change the timer into sched_work to
> benefit from using &fw_lock mutex also.
>
> Signed-off-by: liu chuansheng <chuansheng@intel.com>

Acked-by: Ming Lei <ming@canonical.com>

Thanks,
-- 
Ming Lei
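The check proposed in the quoted changelog (do not abort once FW_STATUS_DONE is set) can be modelled as below. This is a sketch with invented names and bit values, not the firmware loader's code, and it assumes both paths already run under the same lock, as the changelog proposes with fw_lock.

```c
#include <stdbool.h>

#define FW_STATUS_DONE_BIT	(1u << 0)
#define FW_STATUS_ABORT_BIT	(1u << 1)

struct fw_buf_model {
	unsigned int status;
	unsigned long size;
};

/* Timeout path with the proposed check: a finished load is never aborted. */
static void fw_timeout_model(struct fw_buf_model *buf)
{
	if (buf->status & FW_STATUS_DONE_BIT)
		return;
	buf->status |= FW_STATUS_ABORT_BIT;
}

/* Mirrors the failure condition quoted from _request_firmware_load(). */
static bool fw_load_failed_model(const struct fw_buf_model *buf)
{
	return buf->size == 0 || (buf->status & FW_STATUS_ABORT_BIT) != 0;
}
```

In this model a timeout that arrives after a completed load no longer turns a successful request into -ENOENT, while a timeout during an unfinished load still aborts it.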
[PATCH] DMA: add cpu_relax() to busy-loop in dma_sync_wait()
From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH] DMA: add cpu_relax() to busy-loop in dma_sync_wait()

Removal of the busy-loop from dma_sync_wait() is not a trivial task
so just add cpu_relax() to the loop for now.

Cc: Vinod Koul <vinod.k...@intel.com>
Cc: Dan Williams <d...@fb.com>
Cc: Tomasz Figa <t.f...@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 drivers/dma/dmaengine.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: b/drivers/dma/dmaengine.c
===================================================================
--- a/drivers/dma/dmaengine.c	2012-11-07 16:12:41.776876102 +0100
+++ b/drivers/dma/dmaengine.c	2012-11-07 16:13:04.956876097 +0100
@@ -266,7 +266,10 @@ enum dma_status dma_sync_wait(struct dma
 			pr_err("%s: timeout!\n", __func__);
 			return DMA_ERROR;
 		}
-	} while (status == DMA_IN_PROGRESS);
+		if (status != DMA_IN_PROGRESS)
+			break;
+		cpu_relax();
+	} while (1);
 
 	return status;
 }
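The reshaped loop (poll, check the timeout, leave on completion, otherwise relax and poll again) can be tried out in user space. This is a sketch with assumed names: a poll-count limit replaces the jiffies-based timeout, cpu_relax_model() merely counts calls, and countdown_poll() is a fake status source added for demonstration.

```c
#include <stddef.h>

enum dma_status_model {
	MODEL_DMA_SUCCESS,
	MODEL_DMA_IN_PROGRESS,
	MODEL_DMA_ERROR
};

typedef enum dma_status_model (*poll_fn)(void *ctx);

static unsigned long relax_calls;

/* Stand-in for cpu_relax(); only records that it ran. */
static void cpu_relax_model(void)
{
	relax_calls++;
}

static enum dma_status_model sync_wait_model(poll_fn poll, void *ctx,
					     unsigned long max_polls)
{
	enum dma_status_model status;
	unsigned long polls = 0;

	do {
		status = poll(ctx);
		if (++polls > max_polls)
			return MODEL_DMA_ERROR;	/* models the timeout path */
		if (status != MODEL_DMA_IN_PROGRESS)
			break;
		cpu_relax_model();		/* only while still waiting */
	} while (1);

	return status;
}

/* Fake transfer: reports "in progress" for *ctx polls, then success. */
static enum dma_status_model countdown_poll(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return MODEL_DMA_IN_PROGRESS;
	}
	return MODEL_DMA_SUCCESS;
}
```

The relax call sits after the completion check, so it runs once per unsuccessful poll and never on the final iteration.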
[PATCH] async_tx: fix checking of dma_wait_for_async_tx() return value
From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH] async_tx: fix checking of dma_wait_for_async_tx() return value

dma_wait_for_async_tx() can also return DMA_PAUSED (which
should be considered as error).

Cc: Vinod Koul <vinod.k...@intel.com>
Cc: Dan Williams <d...@fb.com>
Cc: Tomasz Figa <t.f...@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 crypto/async_tx/async_tx.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Index: b/crypto/async_tx/async_tx.c
===================================================================
--- a/crypto/async_tx/async_tx.c	2012-11-07 16:30:47.940875970 +0100
+++ b/crypto/async_tx/async_tx.c	2012-11-07 16:31:34.75965 +0100
@@ -128,8 +128,8 @@ async_tx_channel_switch(struct dma_async
 		}
 		device->device_issue_pending(chan);
 	} else {
-		if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
-			panic("%s: DMA_ERROR waiting for depend_tx\n",
+		if (dma_wait_for_async_tx(depend_tx) != DMA_SUCCESS)
+			panic("%s: DMA error waiting for depend_tx\n",
 			      __func__);
 		tx->tx_submit(tx);
 	}
@@ -280,8 +280,9 @@ void async_tx_quiesce(struct dma_async_t
 		 * we are referring to the correct operation
 		 */
 		BUG_ON(async_tx_test_ack(*tx));
-		if (dma_wait_for_async_tx(*tx) == DMA_ERROR)
-			panic("DMA_ERROR waiting for transaction\n");
+		if (dma_wait_for_async_tx(*tx) != DMA_SUCCESS)
+			panic("%s: DMA error waiting for transaction\n",
+			      __func__);
 		async_tx_ack(*tx);
 		*tx = NULL;
 	}
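Why the comparison direction matters can be seen with a tiny model of the status values. The enum below is a renamed stand-in for enum dma_status as used in the patch, and the two helper functions are hypothetical.

```c
#include <stdbool.h>

enum dma_status_model {
	MODEL_DMA_SUCCESS,
	MODEL_DMA_IN_PROGRESS,
	MODEL_DMA_PAUSED,
	MODEL_DMA_ERROR
};

/* Old check: only an explicit error counts as failure. */
static bool wait_failed_old(enum dma_status_model s)
{
	return s == MODEL_DMA_ERROR;
}

/* New check: anything other than success counts as failure. */
static bool wait_failed_new(enum dma_status_model s)
{
	return s != MODEL_DMA_SUCCESS;
}
```

A paused channel slips through the old test and is caught by the new one; the same holds for any status value added later.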
[PATCH 1/2] DMA: remove dma_async_memcpy_pending() macro
From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH 1/2] DMA: remove dma_async_memcpy_pending() macro

Just use dma_async_issue_pending() directly.

Cc: Vinod Koul <vinod.k...@intel.com>
Cc: Dan Williams <d...@fb.com>
Cc: Tomasz Figa <t.f...@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 drivers/misc/carma/carma-fpga-program.c |    2 +-
 drivers/misc/carma/carma-fpga.c         |    2 +-
 include/linux/dmaengine.h               |    2 --
 net/ipv4/tcp.c                          |    6 +++---
 4 files changed, 5 insertions(+), 7 deletions(-)

Index: b/drivers/misc/carma/carma-fpga-program.c
===================================================================
--- a/drivers/misc/carma/carma-fpga-program.c	2012-11-07 16:00:59.680876184 +0100
+++ b/drivers/misc/carma/carma-fpga-program.c	2012-11-07 16:01:05.612876185 +0100
@@ -546,7 +546,7 @@ static noinline int fpga_program_dma(str
 		goto out_dma_unmap;
 	}
 
-	dma_async_memcpy_issue_pending(chan);
+	dma_async_issue_pending(chan);
 
 	/* Set the total byte count */
 	fpga_set_byte_count(priv->regs, priv->bytes);
Index: b/drivers/misc/carma/carma-fpga.c
===================================================================
--- a/drivers/misc/carma/carma-fpga.c	2012-11-07 16:01:17.304876183 +0100
+++ b/drivers/misc/carma/carma-fpga.c	2012-11-07 16:01:23.568876183 +0100
@@ -749,7 +749,7 @@ static irqreturn_t data_irq(int irq, voi
 	submitted = true;
 
 	/* Start the DMA Engine */
-	dma_async_memcpy_issue_pending(priv->chan);
+	dma_async_issue_pending(priv->chan);
 
 out:
 	/* If no DMA was submitted, re-enable interrupts */
Index: b/include/linux/dmaengine.h
===================================================================
--- a/include/linux/dmaengine.h	2012-11-07 16:01:31.720876180 +0100
+++ b/include/linux/dmaengine.h	2012-11-07 16:01:45.072876180 +0100
@@ -902,8 +902,6 @@ static inline void dma_async_issue_pendi
 	chan->device->device_issue_pending(chan);
 }
 
-#define dma_async_memcpy_issue_pending(chan) dma_async_issue_pending(chan)
-
 /**
  * dma_async_is_tx_complete - poll for transaction completion
  * @chan: DMA channel
Index: b/net/ipv4/tcp.c
===================================================================
--- a/net/ipv4/tcp.c	2012-11-07 16:01:52.496876179 +0100
+++ b/net/ipv4/tcp.c	2012-11-07 16:02:13.924876175 +0100
@@ -1410,7 +1410,7 @@ static void tcp_service_net_dma(struct s
 		return;
 
 	last_issued = tp->ucopy.dma_cookie;
-	dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+	dma_async_issue_pending(tp->ucopy.dma_chan);
 
 	do {
 		if (dma_async_memcpy_complete(tp->ucopy.dma_chan,
@@ -1744,7 +1744,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 					tcp_service_net_dma(sk, true);
 					tcp_cleanup_rbuf(sk, copied);
 				} else
-					dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+					dma_async_issue_pending(tp->ucopy.dma_chan);
 			}
 #endif
 			if (copied >= target) {
@@ -1837,7 +1837,7 @@ do_prequeue:
 					break;
 				}
 
-				dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+				dma_async_issue_pending(tp->ucopy.dma_chan);
 
 				if ((offset + used) == skb->len)
 					copied_early = true;
[PATCH 2/2] DMA: remove dma_async_memcpy_complete() macro
From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH 2/2] DMA: remove dma_async_memcpy_complete() macro

Just use dma_async_is_tx_complete() directly.

Cc: Vinod Koul <vinod.k...@intel.com>
Cc: Dan Williams <d...@fb.com>
Cc: Tomasz Figa <t.f...@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 include/linux/dmaengine.h |    5 +----
 net/ipv4/tcp.c            |    2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

Index: b/include/linux/dmaengine.h
===================================================================
--- a/include/linux/dmaengine.h	2012-11-07 16:04:41.028876159 +0100
+++ b/include/linux/dmaengine.h	2012-11-07 16:05:22.004876153 +0100
@@ -927,16 +927,13 @@ static inline enum dma_status dma_async_
 	return status;
 }
 
-#define dma_async_memcpy_complete(chan, cookie, last, used)\
-	dma_async_is_tx_complete(chan, cookie, last, used)
-
 /**
  * dma_async_is_complete - test a cookie against chan state
  * @cookie: transaction identifier to test status of
  * @last_complete: last know completed transaction
  * @last_used: last cookie value handed out
  *
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_is_tx_complete()
  * the test logic is separated for lightweight testing of multiple cookies
  */
 static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
Index: b/net/ipv4/tcp.c
===================================================================
--- a/net/ipv4/tcp.c	2012-11-07 16:04:11.700876163 +0100
+++ b/net/ipv4/tcp.c	2012-11-07 16:04:26.444876161 +0100
@@ -1413,7 +1413,7 @@ static void tcp_service_net_dma(struct s
 	dma_async_issue_pending(tp->ucopy.dma_chan);
 
 	do {
-		if (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+		if (dma_async_is_tx_complete(tp->ucopy.dma_chan,
 					      last_issued, &done,
 					      &used) == DMA_SUCCESS) {
 			/* Safe to free early-copied skbs now */
[PATCH] raid5: panic() on dma_wait_for_async_tx() error
From: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Subject: [PATCH] raid5: panic() on dma_wait_for_async_tx() error

There is not much we can do on dma_wait_for_async_tx() error
so just panic() for now.

Cc: Neil Brown <ne...@suse.de>
Cc: Vinod Koul <vinod.k...@intel.com>
Cc: Dan Williams <d...@fb.com>
Cc: Tomasz Figa <t.f...@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnier...@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.p...@samsung.com>
---
 drivers/md/raid5.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: b/drivers/md/raid5.c
===================================================================
--- a/drivers/md/raid5.c	2012-11-07 16:25:19.480876012 +0100
+++ b/drivers/md/raid5.c	2012-11-07 16:27:46.244875992 +0100
@@ -3223,7 +3223,9 @@ static void handle_stripe_expansion(stru
 	/* done submitting copies, wait for them to complete */
 	if (tx) {
 		async_tx_ack(tx);
-		dma_wait_for_async_tx(tx);
+		if (dma_wait_for_async_tx(tx) != DMA_SUCCESS)
+			panic("%s: DMA error waiting for transaction\n",
+			      __func__);
 	}
 }
 
[PATCH] DMA: remove unused support for MEMSET operations
From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com Subject: [PATCH] DMA: remove unused support for MEMSET operations There have never been any real users of MEMSET operations since they have been introduced in January 2007 (commit 7405f74badf46b5d023c5d2b670b4471525f6c91 dmaengine: refactor dmaengine around dma_async_tx_descriptor). Therefore remove support for them for now, it can be always brought back when needed. Cc: Vinod Koul vinod.k...@intel.com Cc: Dan Williams d...@fb.com Cc: Tomasz Figa t.f...@samsung.com Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- Documentation/crypto/async-tx-api.txt |1 arch/arm/mach-iop13xx/setup.c |3 - arch/arm/plat-iop/adma.c |2 arch/arm/plat-orion/common.c |5 - crypto/async_tx/Kconfig |4 - crypto/async_tx/Makefile |1 crypto/async_tx/async_memset.c| 88 --- drivers/dma/dmaengine.c |7 -- drivers/dma/ioat/dma.c|1 drivers/dma/ioat/dma_v3.c | 94 -- drivers/dma/iop-adma.c| 66 --- drivers/dma/mv_xor.c | 60 + drivers/dma/mv_xor.h |1 drivers/dma/ppc4xx/adma.c | 47 - include/linux/async_tx.h |4 - include/linux/dmaengine.h |5 - 16 files changed, 4 insertions(+), 385 deletions(-) Index: b/Documentation/crypto/async-tx-api.txt === --- a/Documentation/crypto/async-tx-api.txt 2012-11-07 15:00:06.208876620 +0100 +++ b/Documentation/crypto/async-tx-api.txt 2012-11-07 15:00:15.864876621 +0100 @@ -222,5 +222,4 @@ drivers/dma/: location for offload engin include/linux/async_tx.h: core header file for the async_tx api crypto/async_tx/async_tx.c: async_tx interface to dmaengine and common code crypto/async_tx/async_memcpy.c: copy offload -crypto/async_tx/async_memset.c: memory fill offload crypto/async_tx/async_xor.c: xor and xor zero sum offload Index: b/arch/arm/mach-iop13xx/setup.c === --- a/arch/arm/mach-iop13xx/setup.c 2012-11-07 15:15:11.000876512 +0100 +++ b/arch/arm/mach-iop13xx/setup.c 2012-11-07 15:15:21.068876510 +0100 @@ -469,7 +469,6 @@ void __init 
iop13xx_platform_init(void) dma_cap_set(DMA_MEMCPY, plat_data-cap_mask); dma_cap_set(DMA_XOR, plat_data-cap_mask); dma_cap_set(DMA_XOR_VAL, plat_data-cap_mask); - dma_cap_set(DMA_MEMSET, plat_data-cap_mask); dma_cap_set(DMA_INTERRUPT, plat_data-cap_mask); break; case IOP13XX_INIT_ADMA_1: @@ -479,7 +478,6 @@ void __init iop13xx_platform_init(void) dma_cap_set(DMA_MEMCPY, plat_data-cap_mask); dma_cap_set(DMA_XOR, plat_data-cap_mask); dma_cap_set(DMA_XOR_VAL, plat_data-cap_mask); - dma_cap_set(DMA_MEMSET, plat_data-cap_mask); dma_cap_set(DMA_INTERRUPT, plat_data-cap_mask); break; case IOP13XX_INIT_ADMA_2: @@ -489,7 +487,6 @@ void __init iop13xx_platform_init(void) dma_cap_set(DMA_MEMCPY, plat_data-cap_mask); dma_cap_set(DMA_XOR, plat_data-cap_mask); dma_cap_set(DMA_XOR_VAL, plat_data-cap_mask); - dma_cap_set(DMA_MEMSET, plat_data-cap_mask); dma_cap_set(DMA_INTERRUPT, plat_data-cap_mask); dma_cap_set(DMA_PQ, plat_data-cap_mask); dma_cap_set(DMA_PQ_VAL, plat_data-cap_mask); Index: b/arch/arm/plat-iop/adma.c === --- a/arch/arm/plat-iop/adma.c 2012-11-07 15:15:28.968876510 +0100 +++ b/arch/arm/plat-iop/adma.c 2012-11-07 15:15:44.540876510 +0100 @@ -192,12 +192,10 @@ static int __init iop3xx_adma_cap_init(v #ifdef CONFIG_ARCH_IOP32X /* the 32x AAU does not perform zero sum */ dma_cap_set(DMA_XOR, iop3xx_aau_data.cap_mask); - dma_cap_set(DMA_MEMSET, iop3xx_aau_data.cap_mask); dma_cap_set(DMA_INTERRUPT, iop3xx_aau_data.cap_mask); #else dma_cap_set(DMA_XOR, iop3xx_aau_data.cap_mask); dma_cap_set(DMA_XOR_VAL, iop3xx_aau_data.cap_mask); - dma_cap_set(DMA_MEMSET, iop3xx_aau_data.cap_mask); dma_cap_set(DMA_INTERRUPT, iop3xx_aau_data.cap_mask); #endif Index: b/arch/arm/plat-orion/common.c === --- a/arch/arm/plat-orion/common.c 2012-11-07 15:15:50.744876507
[GIT PULL] s390 patches for the 3.7-rc5
Hi Linus, please pull from the 'for-linus' branch of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus to receive a couple of bug fixes. I keep the fingers crossed that we now got transparent huge pages ready for prime time. Cornelia Huck (1): s390: Move css limits from drivers/s390/cio/ to include/asm/. Gerald Schaefer (2): s390/mm: use pmd_large() instead of pmd_huge() s390/thp: respect page protection in pmd_none() and pmd_present() Heiko Carstens (1): s390/sclp: fix addressing mode clobber Sebastian Ott (2): s390/cio: suppress 2nd path verification during resume s390/cio: fix length calculation in idset.c arch/s390/include/asm/cio.h |2 ++ arch/s390/include/asm/pgtable.h | 35 ++- arch/s390/kernel/sclp.S |8 +++- arch/s390/lib/uaccess_pt.c |2 +- arch/s390/mm/gup.c |2 +- drivers/s390/cio/css.h |3 --- drivers/s390/cio/device.c |8 +--- drivers/s390/cio/idset.c|3 +-- 8 files changed, 35 insertions(+), 28 deletions(-) diff --git a/arch/s390/include/asm/cio.h b/arch/s390/include/asm/cio.h index 55bde60..ad2b924 100644 --- a/arch/s390/include/asm/cio.h +++ b/arch/s390/include/asm/cio.h @@ -9,6 +9,8 @@ #define LPM_ANYPATH 0xff #define __MAX_CSSID 0 +#define __MAX_SUBCHANNEL 65535 +#define __MAX_SSID 3 #include asm/scsw.h diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index dd647c9..2d3b7cb 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -506,12 +506,15 @@ static inline int pud_bad(pud_t pud) static inline int pmd_present(pmd_t pmd) { - return (pmd_val(pmd) _SEGMENT_ENTRY_ORIGIN) != 0UL; + unsigned long mask = _SEGMENT_ENTRY_INV | _SEGMENT_ENTRY_RO; + return (pmd_val(pmd) mask) == _HPAGE_TYPE_NONE || + !(pmd_val(pmd) _SEGMENT_ENTRY_INV); } static inline int pmd_none(pmd_t pmd) { - return (pmd_val(pmd) _SEGMENT_ENTRY_INV) != 0UL; + return (pmd_val(pmd) _SEGMENT_ENTRY_INV) + !(pmd_val(pmd) _SEGMENT_ENTRY_RO); } static inline int pmd_large(pmd_t pmd) @@ -1223,6 +1226,11 @@ static inline 
void __pmd_idte(unsigned long address, pmd_t *pmdp) } #ifdef CONFIG_TRANSPARENT_HUGEPAGE + +#define SEGMENT_NONE __pgprot(_HPAGE_TYPE_NONE) +#define SEGMENT_RO __pgprot(_HPAGE_TYPE_RO) +#define SEGMENT_RW __pgprot(_HPAGE_TYPE_RW) + #define __HAVE_ARCH_PGTABLE_DEPOSIT extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable); @@ -1242,16 +1250,15 @@ static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr, static inline unsigned long massage_pgprot_pmd(pgprot_t pgprot) { - unsigned long pgprot_pmd = 0; - - if (pgprot_val(pgprot) _PAGE_INVALID) { - if (pgprot_val(pgprot) _PAGE_SWT) - pgprot_pmd |= _HPAGE_TYPE_NONE; - pgprot_pmd |= _SEGMENT_ENTRY_INV; - } - if (pgprot_val(pgprot) _PAGE_RO) - pgprot_pmd |= _SEGMENT_ENTRY_RO; - return pgprot_pmd; + /* +* pgprot is PAGE_NONE, PAGE_RO, or PAGE_RW (see __Pxxx / __Sxxx) +* Convert to segment table entry format. +*/ + if (pgprot_val(pgprot) == pgprot_val(PAGE_NONE)) + return pgprot_val(SEGMENT_NONE); + if (pgprot_val(pgprot) == pgprot_val(PAGE_RO)) + return pgprot_val(SEGMENT_RO); + return pgprot_val(SEGMENT_RW); } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) @@ -1269,7 +1276,9 @@ static inline pmd_t pmd_mkhuge(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd) { - pmd_val(pmd) = ~_SEGMENT_ENTRY_RO; + /* Do not clobber _HPAGE_TYPE_NONE pages! 
*/ + if (!(pmd_val(pmd) _SEGMENT_ENTRY_INV)) + pmd_val(pmd) = ~_SEGMENT_ENTRY_RO; return pmd; } diff --git a/arch/s390/kernel/sclp.S b/arch/s390/kernel/sclp.S index bf05389..b6506ee 100644 --- a/arch/s390/kernel/sclp.S +++ b/arch/s390/kernel/sclp.S @@ -44,6 +44,12 @@ _sclp_wait_int: #endif mvc .LoldpswS1-.LbaseS1(16,%r13),0(%r8) mvc 0(16,%r8),0(%r9) +#ifdef CONFIG_64BIT + epsw%r6,%r7 # set current addressing mode + nill%r6,0x1 # in new psw (31 or 64 bit mode) + nilh%r7,0x8000 + stm %r6,%r7,0(%r8) +#endif lhi %r6,0x0200 # cr mask for ext int (cr0.54) ltr %r2,%r2 jz .LsetctS1 @@ -87,7 +93,7 @@ _sclp_wait_int: .long 0x0008, 0x8000+.LwaitS1 # PSW to handle ext int #ifdef CONFIG_64BIT .LextpswS1_64: - .quad 0x00018000, .LwaitS1# PSW to handle ext int, 64 bit + .quad 0, .LwaitS1 # PSW to handle ext int, 64 bit #endif .LwaitpswS1: .long 0x010a,
Re: [RFC] Second attempt at kernel secure boot support
Hi,

The basis for any secure boot is a way to detect whether the system has been tampered with: tamper evidence.

There are two main vectors for tampering: someone local to the machine, and remote users who can access the machine across a network interface (this includes the local user installing a program from a remote source). You have a fair chance of using physical means (locked rooms, background checks on users, etc.) to prevent a user with malicious intent from accessing the local machine.

The first thing a computer does when switched on is run its first code instructions, commonly referred to as the BIOS. It is therefore a requirement that the BIOS cannot be tampered with by any method other than physical presence at the machine. Once you have base code that cannot be tampered with, you can trust it. From that point on, you can use digital signatures to build the chain of trust.

Normally, a digital signature check examines the binary, ensures the signature matches, and only then runs the code contained in it. It is vital that the private key used to sign binaries cannot be found on the local machine; otherwise a malicious user could use it to sign malicious code and break the trust. The binary files must therefore be signed on a separate computer that is trusted and protected from malicious users.

There is one known use case where the normal digital signature checks do not work: the hibernate file. The files were checked when they were loaded into the previously running machine, and the state of that machine was then saved to a file. The problem is how to check that the hibernate file has not been tampered with in the interim. As explained above, we cannot store a private key on the local machine, so some other method of checking the hibernate file is required.
I would suggest that the fix for this problem is to work out a way to check the signatures of binary files while they are in RAM, or even on a running machine. This is the format of the hibernate file: it is basically a RAM image. When starting from a hibernate image, the file would have to be scanned and checked so that all the executable code in the image was sourced from correctly signed binaries. In fact, this last point, if done correctly, could replace virus scanners. We would then have a system that scans for tampering rather than for viruses.

Remaining problems:
1) Deciding who you trust and, from that, which digital signatures/certificates you trust.
2) Handling compromised or expired signatures/certificates.

For 2, if the signatures are attached to each binary file, then distributing a new set of signatures means redistributing all the binary files. That is not a good idea because of the download size. I would therefore suggest that the signatures are distributed separately from the binary files, so that you can change the signatures without having to redistribute the binaries.

Summary:
1) The BIOS code and the certificate it uses to check subsequently loaded binaries should only be changeable by a user local to the machine, or not changeable at all without changing hardware. For example, on some ARM-based mobile phones, the BIOS and the certificate it uses are in a ROM, so they are not changeable at all. The phone then uses a multi-stage boot loader, with each stage able to supply a different certificate to the next stage. This permits the certificates used to sign the Linux kernel to be changed without having to change the certificate in the ROM. For secure boot for Linux, the BIOS and certificate should probably be controlled by the user who controls physical access to the machine.
Multi-stage boot loaders can then be used to extend the chain of trust to other certificates, such as the Debian, Red Hat or Microsoft ones, if the user chooses to trust them. With the user using their own BIOS certificate, it is very unlikely that a remote malicious user could obtain the private key and thus compromise the security of the system.
2) When tampering is detected, the system should revert to a stable, safe state. This probably means preventing the system from booting and presenting the local user with the evidence of tampering, letting the user choose the next step.

On 7 November 2012 14:55, Matthew Garrett <mj...@srcf.ucam.org> wrote:
> On Wed, Nov 07, 2012 at 09:19:35AM +0100, Olivier Galibert wrote:
> > On Tue, Nov 6, 2012 at 11:47 PM, Matthew Garrett <mj...@srcf.ucam.org> wrote:
> > > Sure, and scripts run as root can wipe your files too. That's really not
> > > what this is all about.
> >
> > What it is about then? What is secure boot supposed to do for the owner of
> > the computer in a linux context? I've not been able to understand it
> > through this discussion.
>
> It provides a chain of trust that allows you to ensure that a platform
> boots a trusted kernel. That's a pre-requisite for implementing any
Re: [PATCH v2 2/2] thermal: exynos: Supports thermal tripping
On 31 October 2012 12:17, Jonghwan Choi <jhbird.c...@samsung.com> wrote:
> TMU urgently sends active-high signal (thermal trip) to PMU, and thermal
> tripping by hardware logic i.e PMU is performed. Thermal tripping means
> that PMU cut off the whole power of SoC by controlling external voltage
> regulator.
>
> Signed-off-by: Jonghwan Choi <jhbird.c...@samsung.com>
> ---
>  drivers/thermal/exynos_thermal.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
> index 6ce6667..5672e95 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -53,6 +53,7 @@
>  #define EXYNOS_TMU_TRIM_TEMP_MASK	0xff
>  #define EXYNOS_TMU_GAIN_SHIFT		8
>  #define EXYNOS_TMU_REF_VOLTAGE_SHIFT	24
> +#define EXYNOS_TMU_TRIP_EN		BIT(12)
>  #define EXYNOS_TMU_CORE_ON	1
>  #define EXYNOS_TMU_CORE_OFF	0
>  #define EXYNOS_TMU_DEF_CODE_TO_TEMP_OFFSET	50
> @@ -656,6 +657,9 @@ static void exynos_tmu_control(struct platform_device *pdev, bool on)
>  	if (data->soc == SOC_ARCH_EXYNOS) {
>  		con |= pdata->noise_cancel_mode << EXYNOS_TMU_TRIP_MODE_SHIFT;
>  		con |= (EXYNOS_MUX_ADDR_VALUE << EXYNOS_MUX_ADDR_SHIFT);
> +
> +		if (pdata->trigger_level3_en)
> +			con |= EXYNOS_TMU_TRIP_EN;

Hi Jonghwan Choi,

IMO, you also need to write the 4th trigger level. Currently only 3 trigger
levels are stored in the register THD_TEMP_RISE.

Thanks,
Amit Daniel

>  	}
>
>  	if (on) {
> --
> 1.7.4.1
Re: [PATCH] pwm: lpc32xx - Fix the PWM polarity
On Thu, 08 Nov 2012 10:51:35 +0100
Roland Stigge <sti...@antcom.de> wrote:

> On 07/11/12 16:25, Alban Bedel wrote:
> > Signed-off-by: Alban Bedel <alban.be...@avionic-design.de>
> > ---
> >  drivers/pwm/pwm-lpc32xx.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c
> > index adb87f0..0dc278d 100644
> > --- a/drivers/pwm/pwm-lpc32xx.c
> > +++ b/drivers/pwm/pwm-lpc32xx.c
> > @@ -51,7 +51,11 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >
> >  	c = 256 * duty_ns;
> >  	do_div(c, period_ns);
> > -	duty_cycles = c;
> > +	if (c == 0)
> > +		c = 256;
> > +	if (c > 255)
> > +		c = 255;
> > +	duty_cycles = 256 - c;
>
> Except for the range check (for the original c > 255), this results in:
>
> duty_cycles = 256 - c
>
> except for (c == 0) where duty_cycles = 1

No, it leads to duty_cycles = 0

> which actually is duty_cycles = (256 - c) - 255 (think with the original c)
> i.e. nearly a polarity inversion in the case of (c == 0).
>
> Why is the case (c == 0) so special here? Maybe you can document this,
> if it is really intended?

It is intended; the formula for the duty value in the register is:

duty = (256 - 256*duty_ns/period_ns) % 256

But the code avoids the modulo by clamping '256*duty_ns/period_ns' to 1-256.
Perhaps something like:

if (c > 255)
	c = 255;
duty_cycles = (256 - c) % 256;

would be easier to understand.

Alban
Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)
+ Peter Hi Stephen, On 11/7/2012 6:25 PM, Stephen Warren wrote: On 11/07/2012 03:19 AM, Benoit Cousson wrote: Hi Panto, On 11/07/2012 09:13 AM, Pantelis Antoniou wrote: Hi Grant On Nov 6, 2012, at 9:45 PM, Grant Likely wrote: On Tue, Nov 6, 2012 at 7:34 PM, Pantelis Antoniou pa...@antoniou-consulting.com wrote: [ snip ] g. Since we've started talking about longer term goals, and the versioning provision seems to stand, I hope we address how much the fragment versioning thing is similar to the way board revisions progress. If a versioning syntax is available then one could create a single DT file for a bunch of 'almost' similar board and board revisions. I even think that the version issue is probably much more important for the short term than the overlay aspect. Well at least as important. We start having as well a bunch a panda board version with different HW setup. Having a single board-XXX.dts that will support all these versions is probably the best approach to avoid choosing that from the bootloader. We need to figure out a format + mechanism compatible with the current non-versioned format to allow filtering the nodes at runtime to keep only the relevant one. Something that can find the driver that will provide the proper board version or subsystem version or whatever like that: compatible-version = ti,panda-version, panda; Then at runtime we should create only the node with the correct match between the driver version and the string version. 
/* regular panda audio routing */ sound: sound { compatible = ti,abe-twl6040; ti,model = PandaBoard; compatible-version = ti,panda-version, panda; /* Audio routing */ ti,audio-routing = Headset Stereophone, HSOL, Headset Stereophone, HSOR, Ext Spk, HFL, Ext Spk, HFR, Line Out, AUXL, Line Out, AUXR, HSMIC, Headset Mic, Headset Mic, Headset Mic Bias, AFML, Line In, AFMR, Line In; }; /* Audio routing is different between PandaBoard4430 and PandaBoardES */ sound { ti,model = PandaBoardES; compatible-version = ti,panda-version, panda-es; /* Audio routing */ ti,audio-routing = Headset Stereophone, HSOL, Headset Stereophone, HSOR, Ext Spk, HFL, Ext Spk, HFR, Line Out, AUXL, Line Out, AUXR, AFML, Line In, AFMR, Line In; }; Maybe some extra version match table can just be passed during the board machine_init of_platform_populate(NULL, omap_dt_match_table, NULL, NULL, panda_version_match_table); Is the only difference here the content of the ti,audio-routing property? If so, then I don't think there's any need for infra-structure for this; the driver code already reads that property and adjusts its behaviour based upon it. That was just an example, and maybe not the best one. It could be any kind of HW differences, like a different GPIO line, a different I2C peripheral, an extra DCDC... The point was just that you might have several version of the same node with different attributes depending of the board revision. Regards, Benoit -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
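The version-matching idea in this thread can be sketched in plain C. This is only an illustration under stated assumptions: `struct dt_node_sketch`, `node_matches_version()`, `select_node()` and `panda_sound_nodes` are hypothetical names, not device-tree or kernel APIs, and `compatible-version` is the property proposed in the mail, not an existing binding. The sketch keeps the first node whose version string equals the detected board version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a versioned node (not a kernel struct). */
struct dt_node_sketch {
	const char *model;          /* e.g. "PandaBoard" */
	const char *version_key;    /* e.g. "ti,panda-version" */
	const char *version_value;  /* e.g. "panda" or "panda-es" */
};

/* Return 1 if the node applies to the detected board version. */
static int node_matches_version(const struct dt_node_sketch *node,
				const char *key, const char *detected)
{
	return strcmp(node->version_key, key) == 0 &&
	       strcmp(node->version_value, detected) == 0;
}

/* Pick the first matching node, or NULL if none applies. */
static const struct dt_node_sketch *
select_node(const struct dt_node_sketch *nodes, size_t count,
	    const char *key, const char *detected)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (node_matches_version(&nodes[i], key, detected))
			return &nodes[i];
	return NULL;
}

/* The two sound nodes from the example, reduced to their version data. */
static const struct dt_node_sketch panda_sound_nodes[] = {
	{ "PandaBoard",   "ti,panda-version", "panda" },
	{ "PandaBoardES", "ti,panda-version", "panda-es" },
};
```

A caller would pass the version string reported by whatever code detects the board revision; how that detection happens is exactly the open question in the thread and is not modelled here.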
[PATCH] ARM: plat-versatile: move secondary CPU startup code out of .init for hotplug
Using __CPUINIT instead of __INIT puts the secondary CPU startup code
into the right section: it will not be freed in hotplug configurations,
allowing hot-add of cpus, while still getting freed in non-hotplug configs.

Signed-off-by: Claudio Fontana <claudio.font...@huawei.com>
---
 arch/arm/plat-versatile/headsmp.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/plat-versatile/headsmp.S b/arch/arm/plat-versatile/headsmp.S
index dd703ef..19fe180 100644
--- a/arch/arm/plat-versatile/headsmp.S
+++ b/arch/arm/plat-versatile/headsmp.S
@@ -11,7 +11,7 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 
-	__INIT
+	__CPUINIT
 
 /*
  * Realview/Versatile Express specific entry point for secondary CPUs.
-- 
1.7.12.1
Re: [PATCH 6/9] cgroup_freezer: make freezer-state mask of flags
On Sat 03-11-12 01:38:32, Tejun Heo wrote: freezer-state was an enum value - one of THAWED, FREEZING and FROZEN. As the scheduled full hierarchy support requires more than one freezing condition, switch it to mask of flags. If FREEZING is not set, it's thawed. FREEZING is set if freezing or frozen. If frozen, both FREEZING and FROZEN are set. Now that tasks can be attached to an already frozen cgroup, this also makes freezing condition checks more natural. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo t...@kernel.org I think it would be nicer to use freezer_state_flags enum rather than unsigned int for the state. I would even expect gcc to complain about that but it looks like -fstrict-enums is c++ specific (so long enum safety). Anyway Reviewed-by: Michal Hocko mho...@suse.cz --- kernel/cgroup_freezer.c | 60 ++--- 1 file changed, 27 insertions(+), 33 deletions(-) diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c index 2690830..e76aa9f 100644 --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -22,15 +22,14 @@ #include linux/freezer.h #include linux/seq_file.h -enum freezer_state { - CGROUP_THAWED = 0, - CGROUP_FREEZING, - CGROUP_FROZEN, +enum freezer_state_flags { + CGROUP_FREEZING = (1 1), /* this freezer is freezing */ + CGROUP_FROZEN = (1 3), /* this and its descendants frozen */ }; struct freezer { struct cgroup_subsys_state css; - enum freezer_state state; + unsigned intstate; spinlock_t lock; }; @@ -48,12 +47,10 @@ static inline struct freezer *task_freezer(struct task_struct *task) bool cgroup_freezing(struct task_struct *task) { - enum freezer_state state; bool ret; rcu_read_lock(); - state = task_freezer(task)-state; - ret = state == CGROUP_FREEZING || state == CGROUP_FROZEN; + ret = task_freezer(task)-state CGROUP_FREEZING; rcu_read_unlock(); return ret; @@ -63,10 +60,13 @@ bool cgroup_freezing(struct task_struct *task) * cgroups_write_string() limits the size of freezer state strings to * 
CGROUP_LOCAL_BUFFER_SIZE */ -static const char *freezer_state_strs[] = { - THAWED, - FREEZING, - FROZEN, +static const char *freezer_state_strs(unsigned int state) +{ + if (state CGROUP_FROZEN) + return FROZEN; + if (state CGROUP_FREEZING) + return FREEZING; + return THAWED; }; /* @@ -91,7 +91,6 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup) return ERR_PTR(-ENOMEM); spin_lock_init(freezer-lock); - freezer-state = CGROUP_THAWED; return freezer-css; } @@ -99,7 +98,7 @@ static void freezer_destroy(struct cgroup *cgroup) { struct freezer *freezer = cgroup_freezer(cgroup); - if (freezer-state != CGROUP_THAWED) + if (freezer-state CGROUP_FREEZING) atomic_dec(system_freezing_cnt); kfree(freezer); } @@ -129,15 +128,13 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset) * Tasks in @tset are on @new_cgrp but may not conform to its * current state before executing the following - !frozen tasks may * be visible in a FROZEN cgroup and frozen tasks in a THAWED one. - * This means that, to determine whether to freeze, one should test - * whether the state equals THAWED. */ cgroup_taskset_for_each(task, new_cgrp, tset) { - if (freezer-state == CGROUP_THAWED) { + if (!(freezer-state CGROUP_FREEZING)) { __thaw_task(task); } else { freeze_task(task); - freezer-state = CGROUP_FREEZING; + freezer-state = ~CGROUP_FROZEN; } } @@ -159,11 +156,7 @@ static void freezer_fork(struct task_struct *task) goto out; spin_lock_irq(freezer-lock); - /* - * @task might have been just migrated into a FROZEN cgroup. Test - * equality with THAWED. Read the comment in freezer_attach(). 
- */ - if (freezer-state != CGROUP_THAWED) + if (freezer-state CGROUP_FREEZING) freeze_task(task); spin_unlock_irq(freezer-lock); out: @@ -184,7 +177,8 @@ static void update_if_frozen(struct freezer *freezer) struct cgroup_iter it; struct task_struct *task; - if (freezer-state != CGROUP_FREEZING) + if (!(freezer-state CGROUP_FREEZING) || + (freezer-state CGROUP_FROZEN)) return; cgroup_iter_start(cgroup, it); @@ -202,7 +196,7 @@ static void update_if_frozen(struct freezer *freezer) } } - freezer-state = CGROUP_FROZEN;
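The flag scheme described in the commit message can be tried out in user space. The following is a minimal sketch, not the kernel code: the two flag names come from the quoted patch, the shift operators are reconstructed on the assumption that the archive dropped `<<`, and `freezer_state_str()` and `is_freezing()` are hypothetical helper names. It shows that a zero state reads as thawed and that a frozen state also tests as freezing.

```c
#include <string.h>

/* Flag names as in the quoted patch; shift amounts are an assumption. */
enum freezer_state_flags {
	CGROUP_FREEZING = (1 << 1), /* this freezer is freezing */
	CGROUP_FROZEN   = (1 << 3), /* this and its descendants frozen */
};

/* Map a flag mask to the state string, checking FROZEN first. */
static const char *freezer_state_str(unsigned int state)
{
	if (state & CGROUP_FROZEN)
		return "FROZEN";
	if (state & CGROUP_FREEZING)
		return "FREEZING";
	return "THAWED";
}

/* Same test as in cgroup_freezing(): true while freezing or frozen. */
static int is_freezing(unsigned int state)
{
	return (state & CGROUP_FREEZING) != 0;
}
```

Because the state is a plain `unsigned int`, nothing stops a caller from storing bits outside the enum, which is the type-safety concern raised in the review above.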
Fix perf DSOs' map address if .text is not the first section of vmlinux
From 1bacfabf8369764126758bbbea1d3963ac778cce Mon Sep 17 00:00:00 2001
From: Lu Zhigang <z...@tilera.com>
Date: Thu, 8 Nov 2012 04:31:05 -0500
Subject: [PATCH 1/1] perf symbol: Don't assume .text section is the first section of vmlinux

The start address derived from /proc/kallsyms is the start address of the
kernel, not the start address of the kernel's .text section. If the .text
section is not at the beginning of vmlinux, perf will mess up the sections'
address ranges, thus failing to resolve the kernel symbols.

Verified on the TILE architecture, whose kernel sections are as follows.

Sections:
Idx Name Size VMA LMA File off Algn
0 .intrpt1 3fe8 fff7 0001 2**3
  CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .text 008485a0 fff70002 0002 0002 2**6
  CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .init.text 00047e88 fff70087 0087 0087 2**3
...

Signed-off-by: Lu Zhigang <z...@tilera.com>
---
 tools/perf/util/symbol-elf.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index db0cc92..7fc219b 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -645,6 +645,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	Elf_Scn *sec, *sec_strndx;
 	Elf *elf;
 	int nr = 0;
+	u64 kernel_start = map->start;
 
 	dso->symtab_type = syms_ss->type;
 
@@ -746,6 +747,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 				goto new_symbol;
 
 			if (strcmp(section_name, ".text") == 0) {
+				map->start = kernel_start + shdr.sh_offset;
 				curr_map = map;
 				curr_dso = dso;
 				goto new_symbol;
@@ -759,7 +761,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 			u64 start = sym.st_value;
 
 			if (kmodule)
-				start += map->start + shdr.sh_offset;
+				start += kernel_start + shdr.sh_offset;
 
 			curr_dso = dso__new(dso_name);
 			if (curr_dso == NULL)
-- 
1.7.10.3
Re: [PATCH] smack: SMACK_MAGIC to include/uapi/linux/magic.h
On Tue, Nov 6, 2012 at 11:59 PM, Casey Schaufler <ca...@schaufler-ca.com> wrote:
> On 11/6/2012 12:17 AM, Jarkko Sakkinen wrote:
> > SMACK_MAGIC moved to a proper place for easy user space access
> > (i.e. libsmack).
> >
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakki...@iki.fi>
> > ---
> >  include/uapi/linux/magic.h |    1 +
> >  security/smack/smack.h     |    5 -----
> >  2 files changed, 1 insertion(+), 5 deletions(-)
>
> Will security/smack/smack_lsm.c and security/smack/smackfs.c
> compile after this change?

Sorry I haven't replied earlier. Anyway, I made a sanity check: I retried
the build from clean, and it works. I also checked that vmlinux contains
SMACK symbols. It does.

> > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > index e15192c..12735ad 100644
> > --- a/include/uapi/linux/magic.h
> > +++ b/include/uapi/linux/magic.h
> > @@ -11,6 +11,7 @@
> >  #define DEBUGFS_MAGIC          0x64626720
> >  #define SECURITYFS_MAGIC	0x73636673
> >  #define SELINUX_MAGIC		0xf97cff8c
> > +#define SMACK_MAGIC		0x43415d53	/* "SMAC" */
> >  #define RAMFS_MAGIC		0x858458f6	/* some random number */
> >  #define TMPFS_MAGIC		0x01021994
> >  #define HUGETLBFS_MAGIC 	0x958458f6	/* some random number */
> > diff --git a/security/smack/smack.h b/security/smack/smack.h
> > index 99b3612..8ad3095 100644
> > --- a/security/smack/smack.h
> > +++ b/security/smack/smack.h
> > @@ -149,11 +149,6 @@ struct smack_known {
> >  #define SMACK_CIPSO_SOCKET	1
> >
> >  /*
> > - * smackfs magic number
> > - */
> > -#define SMACK_MAGIC	0x43415d53 /* "SMAC" */
> > -
> > -/*
> >   * CIPSO defaults.
> >   */
> >  #define SMACK_CIPSO_DOI_DEFAULT		3	/* Historical */

/Jarkko
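As an illustration of why a user-space-visible SMACK_MAGIC is convenient, here is a minimal sketch of a filesystem-type check built on statfs(2). It is an assumption about how a user-space consumer might use the constant, not code taken from libsmack; `path_is_on_smackfs()` is a hypothetical helper name, and the constant is defined locally (with the value from the patch) so the sketch also builds against headers that do not export it yet.

```c
#include <sys/vfs.h>

#ifndef SMACK_MAGIC
#define SMACK_MAGIC 0x43415d53 /* "SMAC", value taken from the patch */
#endif

/*
 * Hypothetical helper: returns 1 if path lives on a smackfs mount,
 * 0 if it lives on some other filesystem, -1 if statfs() fails.
 */
static int path_is_on_smackfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;
	return sfs.f_type == SMACK_MAGIC;
}
```

With the patch applied, a program could include <linux/magic.h> instead of carrying its own copy of the number.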
[PATCH] virtio: Don't access index after unregister.
Virtio wants to release used indices after the corresponding
virtio device has been unregistered. However, virtio does not
hold an extra reference, giving up its last reference with
device_unregister(), making accessing dev->index afterwards
invalid.

I actually saw problems when testing my (not-yet-merged)
virtio-ccw code:

- device_add virtio-net,id=xxx
-> creates device virtio<n> with n>0

- device_del xxx
-> deletes virtio<n>, but calls ida_simple_remove with an
   index of 0

- device_add virtio-net,id=xxx
-> tries to add virtio0, which is still in use...

So let's save the index we want to release before calling
device_unregister().

Signed-off-by: Cornelia Huck <cornelia.h...@de.ibm.com>
---
 drivers/virtio/virtio.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 1e8659c..809b0de 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -225,8 +225,10 @@ EXPORT_SYMBOL_GPL(register_virtio_device);
 
 void unregister_virtio_device(struct virtio_device *dev)
 {
+	int index = dev->index; /* save for after device release */
+
 	device_unregister(&dev->dev);
-	ida_simple_remove(&virtio_index_ida, dev->index);
+	ida_simple_remove(&virtio_index_ida, index);
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_device);
-- 
1.7.12.4
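The pattern behind this fix (copy what you still need before dropping the last reference) can be shown with a small user-space model. This is a sketch under stated assumptions, not the virtio code: `struct fake_dev`, `fake_dev_put()`, `fake_index_release()`, `fake_unregister()` and `demo_unregister()` are all hypothetical stand-ins for the device, device_unregister() and ida_simple_remove().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device; not the kernel's struct. */
struct fake_dev {
	int refcount;
	int index;
};

/* Records which index was handed back, standing in for the ID allocator. */
static int last_released_index = -1;

static void fake_index_release(int index)
{
	last_released_index = index;
}

/* Drop one reference; the object is freed when the count reaches zero. */
static void fake_dev_put(struct fake_dev *dev)
{
	if (--dev->refcount == 0)
		free(dev);
}

static void fake_unregister(struct fake_dev *dev)
{
	int index = dev->index;    /* copy while dev is still valid */

	fake_dev_put(dev);         /* may free dev; do not touch it afterwards */
	fake_index_release(index); /* uses the saved copy only */
}

/* Create a device with the given index, unregister it, report the result. */
static int demo_unregister(int index)
{
	struct fake_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return -1;
	dev->refcount = 1;
	dev->index = index;
	fake_unregister(dev);
	return last_released_index;
}
```

Reading `dev->index` after `fake_dev_put()` would be a use-after-free in this model, which matches the symptom described in the commit message, where the wrong index ended up being released.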
Re: [PATCH] pwm: lpc32xx - Fix the PWM polarity
On 08/11/12 11:33, Alban Bedel wrote:
> On Thu, 08 Nov 2012 10:51:35 +0100
> Roland Stigge <sti...@antcom.de> wrote:
>
> > On 07/11/12 16:25, Alban Bedel wrote:
> > > Signed-off-by: Alban Bedel <alban.be...@avionic-design.de>
> > > ---
> > >  drivers/pwm/pwm-lpc32xx.c |    6 +++++-
> > >  1 files changed, 5 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c
> > > index adb87f0..0dc278d 100644
> > > --- a/drivers/pwm/pwm-lpc32xx.c
> > > +++ b/drivers/pwm/pwm-lpc32xx.c
> > > @@ -51,7 +51,11 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> > >
> > >  	c = 256 * duty_ns;
> > >  	do_div(c, period_ns);
> > > -	duty_cycles = c;
> > > +	if (c == 0)
> > > +		c = 256;
> > > +	if (c > 255)
> > > +		c = 255;
> > > +	duty_cycles = 256 - c;
> >
> > Except for the range check (for the original c > 255), this results in:
> >
> > duty_cycles = 256 - c
> >
> > except for (c == 0) where duty_cycles = 1
>
> No it lead to duty_cycles = 0

Let's do it step by step with the above code:

c == 0

+	if (c == 0)
+		c = 256;

c == 256

+	if (c > 255)
+		c = 255;

c == 255

+	duty_cycles = 256 - c;

duty_cycles == 1

See?

> > which actually is duty_cycles = (256 - c) - 255 (think with the original c)
> > i.e. nearly a polarity inversion in the case of (c == 0).
> >
> > Why is the case (c == 0) so special here? Maybe you can document this,
> > if it is really intended?
>
> It is intended, the formular for duty value in the register is:
>
> duty = (256 - 256*duty_ns/period_ns) % 256

Where is this modulo defined? In the manual, there is something like this
defined for RELOADV (tables 606+607), but not for DUTY. Maybe I missed
something in the manual. Link or hint appreciated!

Thanks,
Roland
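The two variants discussed in this thread can be compared directly in user space. The sketch below is only an illustration: `duty_clamped()` mirrors the hunk as posted, `duty_modulo()` mirrors the alternative suggested in the earlier reply, and both function names are hypothetical. It makes no claim about which register value the hardware actually expects.

```c
/* c is 256 * duty_ns / period_ns, as computed just before the patched lines. */

/* The hunk as posted: remap 0 to 256, clamp to 255, then invert. */
static unsigned int duty_clamped(unsigned long long c)
{
	if (c == 0)
		c = 256;
	if (c > 255)
		c = 255;
	return (unsigned int)(256 - c);
}

/* The suggested alternative: clamp to 255, invert, wrap with a modulo. */
static unsigned int duty_modulo(unsigned long long c)
{
	if (c > 255)
		c = 255;
	return (unsigned int)((256 - c) % 256);
}
```

Following the walk-through above, the posted hunk yields 1 for c == 0 because the second check clamps the remapped 256 back to 255, while the modulo variant yields 0. For c from 1 to 255 the two functions agree, and for larger c both yield 1.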
[Patch v4 0/7] acpi,memory-hotplug : implement framework for hot removing memory
The memory device can be removed in 2 ways:
1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called.
In the 2nd case, acpi_memory_device_remove() will be called.
acpi_memory_device_remove() will also be called when we unbind the
memory device from the driver acpi_memhotplug or when driver
initialization fails.

acpi_memory_disable_device() already implements code which offlines
memory and releases the acpi_memory_info struct, but
acpi_memory_device_remove() does not implement it yet.

So this patchset prepares the framework for hot removing memory and adds
the framework into acpi_memory_device_remove().

The last version of this patchset is here:
https://lkml.org/lkml/2012/10/26/175

Note:
patch1-2 are in the pm tree now. There is a bug in patch1, so I am
resending them. The commits in the pm tree are:
patch1: 85fcb3758c10e063a2a30dfad75017097999deed
patch2: d0fbb400b6f3a6191bdc5024f8733b2e2b86338f

Changes from v3 to v4:
1. patch1: unlock list_lock when removing memory fails.
2. patch2: just rebase them
3. patch3-7: these patches were in the -mm tree, and they conflict with
   this patchset, so Andrew Morton dropped them from the -mm tree. I
   rebased and merged them into this patchset.

Wen Congyang (6):
  acpi,memory-hotplug: introduce a mutex lock to protect the list in acpi_memory_device
  acpi_memhotplug.c: fix memory leak when memory device is unbound from the module acpi_memhotplug
  acpi_memhotplug.c: free memory device if acpi_memory_enable_device() failed
  acpi_memhotplug.c: don't allow to eject the memory device if it is being used
  acpi_memhotplug.c: bind the memory device when the driver is being loaded
  acpi_memhotplug.c: auto bind the memory device which is hotplugged before the driver is loaded

Yasuaki Ishimatsu (1):
  acpi,memory-hotplug : add memory offline code to acpi_memory_device_remove()

 drivers/acpi/acpi_memhotplug.c | 168
 1 file changed, 133 insertions(+), 35 deletions(-)

-- 
1.8.0
[Patch v4 2/7] acpi,memory-hotplug : add memory offline code to acpi_memory_device_remove()
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com The memory device can be removed by 2 ways: 1. send eject request by SCI 2. echo 1 /sys/bus/pci/devices/PNP0C80:XX/eject In the 1st case, acpi_memory_disable_device() will be called. In the 2nd case, acpi_memory_device_remove() will be called. acpi_memory_device_remove() will also be called when we unbind the memory device from the driver acpi_memhotplug or a driver initialization fails. acpi_memory_disable_device() has already implemented a code which offlines memory and releases acpi_memory_info struct. But acpi_memory_device_remove() has not implemented it yet. So the patch move offlining memory and releasing acpi_memory_info struct codes to a new function acpi_memory_remove_memory(). And it is used by both acpi_memory_device_remove() and acpi_memory_disable_device(). CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Rafael J. 
Wysocki r...@sisk.pl CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- The commit for pm tree is d0fbb400 drivers/acpi/acpi_memhotplug.c | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 4c18ee3..e573e87 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -316,16 +316,11 @@ static int acpi_memory_powerdown_device(struct acpi_memory_device *mem_device) return 0; } -static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) +static int acpi_memory_remove_memory(struct acpi_memory_device *mem_device) { int result; struct acpi_memory_info *info, *n; - - /* -* Ask the VM to offline this memory range. -* Note: Assume that this function returns zero on success -*/ mutex_lock(mem_device-list_lock); list_for_each_entry_safe(info, n, mem_device-res_list, list) { if (info-enabled) { @@ -335,10 +330,27 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) return result; } } + + list_del(info-list); kfree(info); } mutex_unlock(mem_device-list_lock); + return 0; +} + +static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) +{ + int result; + + /* +* Ask the VM to offline this memory range. 
+* Note: Assume that this function returns zero on success +*/ + result = acpi_memory_remove_memory(mem_device); + if (result) + return result; + /* Power-off and eject the device */ result = acpi_memory_powerdown_device(mem_device); if (result) { @@ -489,12 +501,17 @@ static int acpi_memory_device_add(struct acpi_device *device) static int acpi_memory_device_remove(struct acpi_device *device, int type) { struct acpi_memory_device *mem_device = NULL; - + int result; if (!device || !acpi_driver_data(device)) return -EINVAL; mem_device = acpi_driver_data(device); + + result = acpi_memory_remove_memory(mem_device); + if (result) + return result; + kfree(mem_device); return 0; -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Patch v4 3/7] acpi_memhotplug.c: fix memory leak when memory device is unbound from the module acpi_memhotplug
We allocate memory to store acpi_memory_info, so we should free it before
freeing mem_device.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Rafael J. Wysocki r...@sisk.pl
CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e573e87..5e5ac80 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -131,12 +131,22 @@ acpi_memory_get_resource(struct acpi_resource *resource, void *context)
 	return AE_OK;
 }
 
+static void
+acpi_memory_free_device_resources(struct acpi_memory_device *mem_device)
+{
+	struct acpi_memory_info *info, *n;
+
+	mutex_lock(&mem_device->list_lock);
+	list_for_each_entry_safe(info, n, &mem_device->res_list, list)
+		kfree(info);
+	INIT_LIST_HEAD(&mem_device->res_list);
+	mutex_unlock(&mem_device->list_lock);
+}
+
 static int
 acpi_memory_get_device_resources(struct acpi_memory_device *mem_device)
 {
 	acpi_status status;
-	struct acpi_memory_info *info, *n;
-
 
 	if (!list_empty(&mem_device->res_list))
 		return 0;
@@ -144,11 +154,7 @@ acpi_memory_get_device_resources(struct acpi_memory_device *mem_device)
 	status = acpi_walk_resources(mem_device->device->handle, METHOD_NAME__CRS,
 				     acpi_memory_get_resource, mem_device);
 	if (ACPI_FAILURE(status)) {
-		mutex_lock(&mem_device->list_lock);
-		list_for_each_entry_safe(info, n, &mem_device->res_list, list)
-			kfree(info);
-		INIT_LIST_HEAD(&mem_device->res_list);
-		mutex_unlock(&mem_device->list_lock);
+		acpi_memory_free_device_resources(mem_device);
 		return -EINVAL;
 	}
 
@@ -447,6 +453,15 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
 	return;
 }
 
+static void acpi_memory_device_free(struct acpi_memory_device *mem_device)
+{
+	if (!mem_device)
+		return;
+
+	acpi_memory_free_device_resources(mem_device);
+	kfree(mem_device);
+}
+
 static int acpi_memory_device_add(struct acpi_device *device)
 {
 	int result;
@@ -512,7 +527,7 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
 	if (result)
 		return result;
 
-	kfree(mem_device);
+	acpi_memory_device_free(mem_device);
 
 	return 0;
 }
-- 
1.8.0
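The ownership rule this patch enforces (free every list entry before freeing the structure that owns the list) can be sketched in user space. This is only an illustration under stated assumptions: `struct mem_info`, `struct mem_device`, and the `mem_device_*` helpers are hypothetical stand-ins, a plain singly linked list replaces the kernel's `list_head`, and the locking from the patch is left out.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct acpi_memory_info. */
struct mem_info {
	struct mem_info *next;
	unsigned long long start_addr;
	unsigned long long length;
};

/* Hypothetical user-space stand-in for struct acpi_memory_device. */
struct mem_device {
	struct mem_info *res_list;	/* entries are owned by the device */
};

/* Prepend one range; returns 0 on success, -1 on allocation failure. */
static int mem_device_add_range(struct mem_device *dev,
				unsigned long long start,
				unsigned long long length)
{
	struct mem_info *info = calloc(1, sizeof(*info));

	if (!info)
		return -1;
	info->start_addr = start;
	info->length = length;
	info->next = dev->res_list;
	dev->res_list = info;
	return 0;
}

/* Free every entry and leave the list empty; returns how many were freed. */
static int mem_device_free_resources(struct mem_device *dev)
{
	int freed = 0;

	while (dev->res_list) {
		struct mem_info *next = dev->res_list->next;

		free(dev->res_list);
		dev->res_list = next;
		freed++;
	}
	return freed;
}

/* Mirrors the shape of acpi_memory_device_free(): entries first, then the device. */
static void mem_device_free(struct mem_device *dev)
{
	if (!dev)
		return;
	mem_device_free_resources(dev);
	free(dev);
}
```

Calling only `free(dev)` here would leak every `mem_info` still on the list, which is the leak the patch description refers to.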
[Patch v4 1/7] acpi,memory-hotplug: introduce a mutex lock to protect the list in acpi_memory_device
The memory device can be removed by 2 ways: 1. send eject request by SCI 2. echo 1 /sys/bus/pci/devices/PNP0C80:XX/eject This 2 events may happen at the same time, so we may touch acpi_memory_device.res_list at the same time. This patch introduce a lock to protect this list. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com CC: Rafael J. Wysocki r...@sisk.pl CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- The commit in pm tree is 85fcb375 drivers/acpi/acpi_memhotplug.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 1e90e8f..4c18ee3 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -83,7 +83,8 @@ struct acpi_memory_info { struct acpi_memory_device { struct acpi_device * device; unsigned int state; /* State of the memory device */ - struct list_head res_list; + struct mutex list_lock; + struct list_head res_list; /* protected by list_lock */ }; static int acpi_hotmem_initialized; @@ -101,19 +102,23 @@ acpi_memory_get_resource(struct acpi_resource *resource, void *context) (address64.resource_type != ACPI_MEMORY_RANGE)) return AE_OK; + mutex_lock(mem_device-list_lock); list_for_each_entry(info, mem_device-res_list, list) { /* Can we combine the resource range information? 
*/ if ((info-caching == address64.info.mem.caching) (info-write_protect == address64.info.mem.write_protect) (info-start_addr + info-length == address64.minimum)) { info-length += address64.address_length; + mutex_unlock(mem_device-list_lock); return AE_OK; } } new = kzalloc(sizeof(struct acpi_memory_info), GFP_KERNEL); - if (!new) + if (!new) { + mutex_unlock(mem_device-list_lock); return AE_ERROR; + } INIT_LIST_HEAD(new-list); new-caching = address64.info.mem.caching; @@ -121,6 +126,7 @@ acpi_memory_get_resource(struct acpi_resource *resource, void *context) new-start_addr = address64.minimum; new-length = address64.address_length; list_add_tail(new-list, mem_device-res_list); + mutex_unlock(mem_device-list_lock); return AE_OK; } @@ -138,9 +144,11 @@ acpi_memory_get_device_resources(struct acpi_memory_device *mem_device) status = acpi_walk_resources(mem_device-device-handle, METHOD_NAME__CRS, acpi_memory_get_resource, mem_device); if (ACPI_FAILURE(status)) { + mutex_lock(mem_device-list_lock); list_for_each_entry_safe(info, n, mem_device-res_list, list) kfree(info); INIT_LIST_HEAD(mem_device-res_list); + mutex_unlock(mem_device-list_lock); return -EINVAL; } @@ -236,6 +244,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) * We don't have memory-hot-add rollback function,now. * (i.e. memory-hot-remove function) */ + mutex_lock(mem_device-list_lock); list_for_each_entry(info, mem_device-res_list, list) { if (info-enabled) { /* just sanity check...*/ num_enabled++; @@ -256,6 +265,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) info-enabled = 1; num_enabled++; } + mutex_unlock(mem_device-list_lock); if (!num_enabled) { printk(KERN_ERR PREFIX add_memory failed\n); mem_device-state = MEMORY_INVALID_STATE; @@ -316,14 +326,18 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) * Ask the VM to offline this memory range. 
* Note: Assume that this function returns zero on success */ + mutex_lock(mem_device-list_lock); list_for_each_entry_safe(info, n, mem_device-res_list, list) { if (info-enabled) { result = remove_memory(info-start_addr, info-length); - if (result) + if (result) { + mutex_unlock(mem_device-list_lock); return result; + } } kfree(info); } +
[Patch v4 6/7] acpi_memhotplug.c: bind the memory device when the driver is being loaded
We introduced acpi_hotmem_initialized to avoid a strange add_memory failure
message. But the memory device may not be in use by the kernel, and the device
should be bound when the driver is being loaded. Remove
acpi_hotmem_initialized so that the device can be bound when the driver is
being loaded.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Rafael J. Wysocki r...@sisk.pl
CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 1fb1342..8a8716f 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -88,8 +88,6 @@ struct acpi_memory_device {
 	struct list_head res_list;	/* protected by list_lock */
 };
 
-static int acpi_hotmem_initialized;
-
 static acpi_status
 acpi_memory_get_resource(struct acpi_resource *resource, void *context)
 {
@@ -520,15 +518,6 @@ static int acpi_memory_device_add(struct acpi_device *device)
 
 	printk(KERN_DEBUG "%s \n", acpi_device_name(device));
 
-	/*
-	 * Early boot code has recognized memory area by EFI/E820.
-	 * If DSDT shows these memory devices on boot, hotplug is not necessary
-	 * for them. So, it just returns until completion of this driver's
-	 * start up.
-	 */
-	if (!acpi_hotmem_initialized)
-		return 0;
-
 	if (!acpi_memory_check_device(mem_device)) {
 		/* call add_memory func */
 		result = acpi_memory_enable_device(mem_device);
@@ -644,7 +633,6 @@ static int __init acpi_memory_device_init(void)
 		return -ENODEV;
 	}
 
-	acpi_hotmem_initialized = 1;
 	return 0;
 }
 
-- 
1.8.0
[Patch v4 4/7] acpi_memhotplug.c: free memory device if acpi_memory_enable_device() failed
If acpi_memory_enable_device() fails, acpi_memory_device_add() returns a
non-zero value, which means we failed to bind the memory device to this
driver. So we should free the memory device before acpi_memory_device_add()
returns.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Rafael J. Wysocki r...@sisk.pl
CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 5e5ac80..8914399 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -506,9 +506,11 @@ static int acpi_memory_device_add(struct acpi_device *device)
 	if (!acpi_memory_check_device(mem_device)) {
 		/* call add_memory func */
 		result = acpi_memory_enable_device(mem_device);
-		if (result)
+		if (result) {
 			printk(KERN_ERR PREFIX
 				"Error in acpi_memory_enable_device\n");
+			acpi_memory_device_free(mem_device);
+		}
 	}
 	return result;
 }
-- 
1.8.0
[Patch v4 5/7] acpi_memhotplug.c: don't allow to eject the memory device if it is being used
We eject the memory device even if it is in use. This is very dangerous and
will cause the kernel to panic.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Rafael J. Wysocki r...@sisk.pl
CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8914399..1fb1342 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -78,6 +78,7 @@ struct acpi_memory_info {
 	unsigned short caching;	/* memory cache attribute */
 	unsigned short write_protect;	/* memory read/write attribute */
 	unsigned int enabled:1;
+	unsigned int failed:1;
 };
 
 struct acpi_memory_device {
@@ -266,9 +267,23 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
 		result = add_memory(node, info->start_addr, info->length);
-		if (result)
+
+		/*
+		 * If the memory block has been used by the kernel, add_memory()
+		 * returns -EEXIST. If add_memory() returns the other error, it
+		 * means that this memory block is not used by the kernel.
+		 */
+		if (result && result != -EEXIST) {
+			info->failed = 1;
 			continue;
-		info->enabled = 1;
+		}
+
+		if (!result)
+			info->enabled = 1;
+		/*
+		 * Add num_enable even if add_memory() returns -EEXIST, so the
+		 * device is bound to this driver.
+		 */
 		num_enabled++;
 	}
 	mutex_unlock(&mem_device->list_lock);
@@ -324,25 +339,36 @@ static int acpi_memory_powerdown_device(struct acpi_memory_device *mem_device)
 
 static int acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
 {
-	int result;
+	int result = 0;
 	struct acpi_memory_info *info, *n;
 
 	mutex_lock(&mem_device->list_lock);
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
-		if (info->enabled) {
-			result = remove_memory(info->start_addr, info->length);
-			if (result) {
-				mutex_unlock(&mem_device->list_lock);
-				return result;
-			}
+		if (info->failed)
+			/* The kernel does not use this memory block */
+			continue;
+
+		if (!info->enabled) {
+			/*
+			 * The kernel uses this memory block, but it may be not
+			 * managed by us.
+			 */
+			result = -EBUSY;
+			goto out;
 		}
 
+		result = remove_memory(info->start_addr, info->length);
+		if (result)
+			goto out;
+
 		list_del(&info->list);
 		kfree(info);
 	}
+
+out:
 	mutex_unlock(&mem_device->list_lock);
 
-	return 0;
+	return result;
 }
 
 static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
-- 
1.8.0
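The three-way decision that acpi_memory_remove_memory() makes per entry (skip ranges that failed to add, refuse with -EBUSY for ranges the kernel uses but this driver did not add, remove the rest) can be sketched outside the kernel. The names below are hypothetical: `fake_remove_memory()` stands in for remove_memory() and always succeeds, an array replaces the list, and a `removed` bit stands in for list_del() plus kfree().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct acpi_memory_info. */
struct mem_info {
	unsigned int enabled:1;	/* add_memory() succeeded for this range */
	unsigned int failed:1;	/* add_memory() failed with something other than -EEXIST */
	unsigned int removed:1;	/* stands in for list_del() + kfree() */
};

/* Hypothetical stand-in for remove_memory(); always succeeds in this sketch. */
static int fake_remove_memory(struct mem_info *info)
{
	(void)info;
	return 0;
}

/* Returns 0 if every range could be removed, or a negative errno value. */
static int remove_all(struct mem_info *infos, size_t count)
{
	size_t i;
	int result;

	for (i = 0; i < count; i++) {
		struct mem_info *info = &infos[i];

		if (info->removed || info->failed)
			continue;	/* the kernel does not use this range */

		if (!info->enabled)
			return -EBUSY;	/* in use, but not added by us */

		result = fake_remove_memory(info);
		if (result)
			return result;

		info->removed = 1;
	}
	return 0;
}
```

As in the patch, a failure part-way through leaves earlier ranges already removed; the sketch does not attempt a rollback either.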
[Patch v4 7/7] acpi_memhotplug.c: auto bind the memory device which is hotplugged before the driver is loaded
If the memory device is hotplugged before the driver is loaded, the user
cannot see this device under the directory /sys/bus/acpi/devices/, and the
user cannot bind it by hand after the driver is loaded. This patch introduces
a new feature to bind such a device when the driver is being loaded.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Rafael J. Wysocki r...@sisk.pl
CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8a8716f..24bfa6e 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -52,6 +52,9 @@ MODULE_LICENSE("GPL");
 #define MEMORY_POWER_ON_STATE	1
 #define MEMORY_POWER_OFF_STATE	2
 
+static bool auto_probe;
+module_param(auto_probe, bool, S_IRUGO | S_IWUSR);
+
 static int acpi_memory_device_add(struct acpi_device *device);
 static int acpi_memory_device_remove(struct acpi_device *device, int type);
 
@@ -581,12 +584,44 @@ acpi_memory_register_notify_handler(acpi_handle handle,
 				    u32 level, void *ctxt, void **retv)
 {
 	acpi_status status;
-
+	struct acpi_memory_device *mem_device = NULL;
+	unsigned long long current_status;
 
 	status = is_memory_device(handle);
 	if (ACPI_FAILURE(status))
 		return AE_OK;	/* continue */
 
+	if (auto_probe) {
+		/* Get device present/absent information from the _STA */
+		status = acpi_evaluate_integer(handle, "_STA", NULL,
+					       &current_status);
+		if (ACPI_FAILURE(status))
+			goto install;
+
+		/*
+		 * Check for device status. Device should be
+		 * present/enabled/functioning.
+		 */
+		if (!(current_status &
+		      (ACPI_STA_DEVICE_PRESENT | ACPI_STA_DEVICE_ENABLED |
+		       ACPI_STA_DEVICE_FUNCTIONING)))
+			goto install;
+
+		if (acpi_memory_get_device(handle, &mem_device))
+			goto install;
+
+		/* We have bound this device while we register the driver */
+		if (mem_device->state == MEMORY_POWER_ON_STATE)
+			goto install;
+
+		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
+				  "\nauto probe memory device\n"));
+
+		if (acpi_memory_enable_device(mem_device))
+			pr_err(PREFIX "Cannot enable memory device\n");
+	}
+
+install:
 	status = acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
 					     acpi_memory_device_notify, NULL);
 	/* continue */
-- 
1.8.0
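The comment in the hunk says the device should be present, enabled and functioning, while a test of the form `!(status & (A | B | C))` only fires when none of the bits is set. A small sketch makes the difference between "any bit" and "all bits" concrete. The helper names are made up for this illustration; the bit values are the _STA bit positions from the ACPI specification (bit 0 present, bit 1 enabled, bit 3 functioning), and nothing here claims which behaviour the patch author intended.

```c
#include <stdbool.h>

/* _STA bits as laid out in the ACPI specification. */
#define STA_DEVICE_PRESENT	0x01ULL
#define STA_DEVICE_ENABLED	0x02ULL
#define STA_DEVICE_FUNCTIONING	0x08ULL

#define STA_REQUIRED \
	(STA_DEVICE_PRESENT | STA_DEVICE_ENABLED | STA_DEVICE_FUNCTIONING)

/* True if at least one of the three bits is set. */
static bool sta_any_required(unsigned long long sta)
{
	return (sta & STA_REQUIRED) != 0;
}

/* True only if all three bits are set. */
static bool sta_all_required(unsigned long long sta)
{
	return (sta & STA_REQUIRED) == STA_REQUIRED;
}
```

For a device that reports only the present bit, `sta_any_required()` is true and `sta_all_required()` is false, so the two forms disagree exactly on partially ready devices.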
Re: [PATCH v6 19/29] memcg: infrastructure to match an allocation to the right cache
On Tue 06-11-12 09:03:54, Michal Hocko wrote:
> On Mon 05-11-12 16:28:37, Andrew Morton wrote:
> > On Thu, 1 Nov 2012 16:07:35 +0400
> > Glauber Costa glom...@parallels.com wrote:
> >
> > > +static __always_inline struct kmem_cache *
> > > +memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
> >
> > I still don't understand why this code uses __always_inline so much.
>
> AFAIU, __always_inline (resp. __attribute__((always_inline))) is the
> same thing as inline if optimizations are enabled
> (http://ohse.de/uwe/articles/gcc-attributes.html#func-always_inline).

And this doesn't tell the whole story because there is -fearly-inlining,
which is enabled by default and makes a difference when optimizations
are enabled, so __always_inline really enforces inlining.
-- 
Michal Hocko
SUSE Labs
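For readers following the inline discussion, a small stand-alone sketch of the two spellings under GCC or Clang. The macro name `my_always_inline` and the helper functions are invented for this example; the kernel's own `__always_inline` is, to my understanding, defined along the same lines as `inline __attribute__((always_inline))`.

```c
/* Hypothetical local spelling of the kernel's __always_inline (GCC/Clang). */
#define my_always_inline inline __attribute__((always_inline))

/* The compiler must inline this, even when built without optimization. */
static my_always_inline int add_one(int x)
{
	return x + 1;
}

/* Plain inline is only a hint; the compiler's heuristics decide. */
static inline int add_two(int x)
{
	return x + 2;
}

/* Both helpers compute the same result whether or not they are inlined. */
int use_both(int x)
{
	return add_one(x) + add_two(x);
}
```

The observable result is identical either way; the difference only shows up in the generated code, for example when comparing the disassembly of `use_both()` at different optimization levels.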
Re: [PATCH] DMA: remove unused support for MEMSET operations
On Thu, 08 Nov 2012 10:58:17 +0100
Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:

> From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
> Subject: [PATCH] DMA: remove unused support for MEMSET operations
>
> There have never been any real users of MEMSET operations

In-tree users. And is it broken? If not, why do you want to break it?

Alan
Re: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
On Thu, 08 Nov 2012 11:06:29 +0100
Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:

> From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
> Subject: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
>
> There is not much we can do on dma_wait_for_async_tx() error
> so just panic() for now.
>
> Cc: Neil Brown ne...@suse.de
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Dan Williams d...@fb.com
> Cc: Tomasz Figa t.f...@samsung.com
> Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
> Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
> ---
>  drivers/md/raid5.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: b/drivers/md/raid5.c
> ===================================================================
> --- a/drivers/md/raid5.c	2012-11-07 16:25:19.480876012 +0100
> +++ b/drivers/md/raid5.c	2012-11-07 16:27:46.244875992 +0100
> @@ -3223,7 +3223,9 @@ static void handle_stripe_expansion(stru
>  	/* done submitting copies, wait for them to complete */
>  	if (tx) {
>  		async_tx_ack(tx);
> -		dma_wait_for_async_tx(tx);
> +		if (dma_wait_for_async_tx(tx) != DMA_SUCCESS)
> +			panic("%s: DMA error waiting for transaction\n",
> +			      __func__);

That's a really horrible place to panic.

Alan
Re: [RFC] Second attempt at kernel secure boot support
> You have a fair chance of protecting via physical means (Locked rooms,
> Background checks on users etc.) of preventing a user with malicious
> intent to access the local machine.

So-called secure boot doesn't deal with any kind of physical access, which
also means it's useless if a device is lost and returned and you don't know
if it was in the hands of a third party.

> The first thing a computer does when switched on is run its first code
> instructions. Commonly referred to as the BIOS.

A good deal more complicated than that. However, the signing in hardware and
early boot-up on a lot of devices already goes as far as the BIOS if the
system has a BIOS, or EFI if it doesn't. You also have all the devices to
deal with.

> Normally digital signatures would examine the binary, ensure the
> signature matches, and then run the code contained in it.

No - it's a good deal more complicated than that too.
SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
Hello,

I installed KVM and tried to use SR-IOV virtualization with an 82599EB (Intel
X520-T2) dual-port card, using the latest ixgbe driver (version 3.11.33) on
kernel 2.6.32-279.14.1 (OS: CentOS 6.3). After configuration and a reboot, it
seems that only the VFs of the card's first port work; the VFs of the second
port do not.

I found some errors in /var/log/messages:

Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: not enough MMIO resources for SR-IOV
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: (unregistered net_device): Failed to enable PCI sriov: -12

How can I fix this? Any help would be greatly appreciated.

Below is the related information:

Server: R710
OS: centos6.3
NIC: X520-T2 (dual port)
kernel: 2.6.32-279.14.1.el6.x86_64
BIOS version: 6.3.0 (latest)
BIOS: Intel VT/VT-d or SR-IOV (enabled)
ixgbe: 3.11.33 (latest)
ixgbevf: 2.7.12 (latest)
grub config: intel_iommu=on appended

/var/log/messages:

Nov 8 14:56:54 12 kernel: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.11.33
Nov 8 14:56:54 12 kernel: Copyright (c) 1999-2012 Intel Corporation.
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: PCI INT B - GSI 50 (level, low) - IRQ 50
Nov 8 14:56:54 12 kernel: ixgbe: I/O Virtualization (IOV) set to 2
Nov 8 14:56:54 12 kernel: ixgbe: :07:00.0: ixgbe_check_options: FCoE Offload feature enabled
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: (unregistered net_device): SR-IOV enabled with 2 VFs
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: FCoE offload feature is not available. Disabling FCoE offload feature
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: (PCI Express:5.0GT/s:Width x8) 68:05:ca:0c:7a:e2
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: eth4: MAC: 2, PHY: 2, PBA No: G21371-003
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: eth4: Enabled Features: RxQ: 1 TxQ: 1 LRO
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: eth4: IOV: VF 0 is enabled mac 0E:15:B9:26:B3:7D
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: eth4: IOV: VF 1 is enabled mac 32:69:2D:16:B9:40
Nov 8 14:56:54 12 kernel: ixgbe :07:00.0: eth4: Intel(R) 10 Gigabit Network Connection
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: PCI INT A - GSI 40 (level, low) - IRQ 40
Nov 8 14:56:54 12 kernel: ixgbe: I/O Virtualization (IOV) set to 2
Nov 8 14:56:54 12 kernel: ixgbe: :07:00.1: ixgbe_check_options: FCoE Offload feature enabled
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: not enough MMIO resources for SR-IOV
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: (unregistered net_device): Failed to enable PCI sriov: -12
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: FCoE offload feature is not available. Disabling FCoE offload feature
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: (PCI Express:5.0GT/s:Width x8) 68:05:ca:0c:7a:e3
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: eth5: MAC: 2, PHY: 2, PBA No: G21371-003
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: eth5: Enabled Features: RxQ: 16 TxQ: 16 FdirHash RSC
Nov 8 14:56:54 12 kernel: ixgbe :07:00.1: eth5: Intel(R) 10 Gigabit Network Connection

# dmesg |grep -E 'DMA|IOMMU'
ACPI: DMAR bf3b3668 001C0 (v01 DELL PE_SC3 0001 DELL 0001)
DMA 0x0001 - 0x1000
DMA32 0x1000 - 0x0010
DMA zone: 56 pages used for memmap
DMA zone: 102 pages reserved
DMA zone: 3839 pages, LIFO batch:0
DMA32 zone: 14280 pages used for memmap
DMA32 zone: 764849 pages, LIFO batch:31
Intel-IOMMU: enabled
DMAR: Host address width 40
DMAR: DRHD base: 0x00fed9 flags: 0x1
IOMMU fed9: ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: RMRR base: 0x00bf4c8000 end: 0x00bf4d
DMAR: RMRR base: 0x00bf4b1000 end: 0x00bf4b
DMAR: RMRR base: 0x00bf4a1000 end: 0x00bf4a1fff
DMAR: RMRR base: 0x00bf4a3000 end: 0x00bf4a3fff
DMAR: RMRR base: 0x00bf4a5000 end: 0x00bf4a5fff
DMAR: RMRR base: 0x00bf4a7000 end: 0x00bf4a7fff
DMAR: RMRR base: 0x00bf4c end: 0x00bf4c0fff
DMAR: RMRR base: 0x00bf4c2000 end: 0x00bf4c2fff
DMAR: ATSR flags: 0x0
DMAR: Device scope device [:00:1a.02] not found
DMAR: Device scope device [:00:1a.02] not found
DMAR: Device scope device [:00:1d.02] not found
DMAR: Device scope device [:00:1d.02] not found
IOMMU 0xfed9: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device :00:1d.7 [0xbf4c2000 - 0xbf4c3000]
IOMMU: Setting identity map for device :00:1a.7 [0xbf4c - 0xbf4c1000]
IOMMU: Setting identity map for device :00:1d.1 [0xbf4a7000 - 0xbf4a8000]
IOMMU: Setting identity map for device :00:1d.0 [0xbf4a5000 - 0xbf4a6000]
IOMMU: Setting identity map for device :00:1a.1 [0xbf4a3000 - 0xbf4a4000]
IOMMU: Setting identity map for device :00:1a.0 [0xbf4a1000 - 0xbf4a2000]
IOMMU: Setting identity map for device :00:1a.0 [0xbf4b1000 - 0xbf4c]
IOMMU: Setting identity map for device :00:1a.1 [0xbf4b1000 - 0xbf4c]
IOMMU: Setting identity map for device :00:1d.0 [0xbf4b1000 -
Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
On 11/08/2012 03:15 AM, pkill.2012 wrote:
> [snip: full quote of the original report and logs]
Re: [PATCH v2 1/2] thermal: exynos: Fix wrong bit to control tmu core
Hi

On 31 October 2012 12:17, Jonghwan Choi jhbird.c...@samsung.com wrote:
> [0]bit is used to enable/disable tmu core. [1] bit is a reserved bit.
>
> Signed-off-by: Jonghwan Choi jhbird.c...@samsung.com
> ---
>  drivers/thermal/exynos_thermal.c | 4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
> index fd03e85..6ce6667 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -53,8 +53,8 @@
>  #define EXYNOS_TMU_TRIM_TEMP_MASK 0xff
>  #define EXYNOS_TMU_GAIN_SHIFT 8
>  #define EXYNOS_TMU_REF_VOLTAGE_SHIFT 24
> -#define EXYNOS_TMU_CORE_ON 3
> -#define EXYNOS_TMU_CORE_OFF 2
> +#define EXYNOS_TMU_CORE_ON 1
> +#define EXYNOS_TMU_CORE_OFF 0

Hi Jonghwan,

This change alone is not sufficient. You also need to do something like the below:

diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
index eebd4e5..4575144 100644
--- a/drivers/thermal/exynos_thermal.c
+++ b/drivers/thermal/exynos_thermal.c
@@ -52,9 +52,11 @@
 #define EXYNOS_TMU_TRIM_TEMP_MASK 0xff
 #define EXYNOS_TMU_GAIN_SHIFT 8
+#define EXYNOS_TMU_GAIN_MASK (0xF << 8)
 #define EXYNOS_TMU_REF_VOLTAGE_SHIFT 24
-#define EXYNOS_TMU_CORE_ON 3
-#define EXYNOS_TMU_CORE_OFF 2
+#define EXYNOS_TMU_REF_VOLTAGE_MASK (0x1F << 24)
+#define EXYNOS_TMU_CORE_ON 1
+#define EXYNOS_TMU_CORE_OFF 0
 #define EXYNOS_TMU_DEF_CODE_TO_TEMP_OFFSET 50
 /* Exynos4210 specific registers */
@@ -85,7 +87,9 @@
 #define EXYNOS_TMU_CLEAR_FALL_INT (0x111 << 16)
 #define EXYNOS_MUX_ADDR_VALUE 6
 #define EXYNOS_MUX_ADDR_SHIFT 20
+#define EXYNOS_MUX_ADDR_MASK (0xFF << 16)
 #define EXYNOS_TMU_TRIP_MODE_SHIFT 13
+#define EXYNOS_TMU_TRIP_MODE_MASK (0x7 << 13)
 #define EFUSE_MIN_VALUE 40
 #define EFUSE_MAX_VALUE 100
@@ -658,10 +662,13 @@ static void exynos_tmu_control(struct platform_device *pdev, bool on)
 	mutex_lock(&data->lock);
 	clk_enable(data->clk);
-	con = pdata->reference_voltage << EXYNOS_TMU_REF_VOLTAGE_SHIFT |
+	con = readl(data->base + EXYNOS_TMU_REG_CONTROL);
+	con &= ~(EXYNOS_TMU_REF_VOLTAGE_MASK | EXYNOS_TMU_GAIN_MASK);
+	con |= pdata->reference_voltage << EXYNOS_TMU_REF_VOLTAGE_SHIFT |
 		pdata->gain << EXYNOS_TMU_GAIN_SHIFT;
 	if (data->soc == SOC_ARCH_EXYNOS) {
+		con &= ~(EXYNOS_TMU_TRIP_MODE_MASK | EXYNOS_MUX_ADDR_MASK);
 		con |= pdata->noise_cancel_mode << EXYNOS_TMU_TRIP_MODE_SHIFT;
 		con |= (EXYNOS_MUX_ADDR_VALUE << EXYNOS_MUX_ADDR_SHIFT);
 	}

Thanks,
Amit Daniel

>  #define EXYNOS_TMU_DEF_CODE_TO_TEMP_OFFSET 50
>  /* Exynos4210 specific registers */
> --
> 1.7.4.1
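For readers following the thread, the read-modify-write pattern suggested above can be sketched as stand-alone C. This is only an illustration under stated assumptions: tmu_update_control() and the shortened TMU_* names are hypothetical, the function works on a plain 32-bit value instead of readl()/writel(), and the explicit clearing of bit 0 is this sketch's choice, not something the thread confirms about the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the patch quoted in this thread. */
#define TMU_GAIN_SHIFT        8
#define TMU_GAIN_MASK         (0xFu << TMU_GAIN_SHIFT)
#define TMU_REF_VOLTAGE_SHIFT 24
#define TMU_REF_VOLTAGE_MASK  (0x1Fu << TMU_REF_VOLTAGE_SHIFT)
#define TMU_CORE_ON           1u	/* bit 0 enables the TMU core */

/*
 * Hypothetical helper: compute a new control value from the current one.
 * Only the reference-voltage field, the gain field and bit 0 are changed;
 * every other bit (including the reserved bit 1) is preserved.
 */
static uint32_t tmu_update_control(uint32_t con, uint32_t ref_voltage,
				   uint32_t gain, int on)
{
	assert(ref_voltage <= 0x1F && gain <= 0xF);

	/* Clear the fields first so stale bits cannot leak through the OR. */
	con &= ~(TMU_REF_VOLTAGE_MASK | TMU_GAIN_MASK);
	con |= (ref_voltage << TMU_REF_VOLTAGE_SHIFT) |
	       (gain << TMU_GAIN_SHIFT);

	if (on)
		con |= TMU_CORE_ON;
	else
		con &= ~TMU_CORE_ON;

	return con;
}
```

Calling tmu_update_control(0x2, 0, 0, 1) leaves the reserved bit 1 untouched and only sets bit 0, which is the property the mask-and-clear step is meant to guarantee.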
Re: [PATCH] pwm: lpc32xx - Fix the PWM polarity
On Thu, 08 Nov 2012 11:44:48 +0100 Roland Stigge sti...@antcom.de wrote:
> On 08/11/12 11:33, Alban Bedel wrote:
>> On Thu, 08 Nov 2012 10:51:35 +0100 Roland Stigge sti...@antcom.de wrote:
>>> On 07/11/12 16:25, Alban Bedel wrote:
>>>> Signed-off-by: Alban Bedel alban.be...@avionic-design.de
>>>> ---
>>>>  drivers/pwm/pwm-lpc32xx.c | 6 +-
>>>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c
>>>> index adb87f0..0dc278d 100644
>>>> --- a/drivers/pwm/pwm-lpc32xx.c
>>>> +++ b/drivers/pwm/pwm-lpc32xx.c
>>>> @@ -51,7 +51,11 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>>>>  	c = 256 * duty_ns;
>>>>  	do_div(c, period_ns);
>>>> -	duty_cycles = c;
>>>> +	if (c == 0)
>>>> +		c = 256;
>>>> +	if (c > 255)
>>>> +		c = 255;
>>>> +	duty_cycles = 256 - c;
>>>
>>> Except for the range check (for the original c > 255), this results in:
>>>
>>> duty_cycles = 256 - c
>>>
>>> except for (c == 0) where duty_cycles = 1
>>
>> No, it leads to duty_cycles = 0
>
> Let's do it step by step with the above code:
>
> c == 0
>
> +	if (c == 0)
> +		c = 256;
>
> c == 256
>
> +	if (c > 255)
> +		c = 255;
>
> c == 255
>
> +	duty_cycles = 256 - c;
>
> c == 1
>
> See?

Right, my bad.

>>> which actually is
>>>
>>> duty_cycles = (256 - c) - 255 (think with the original c)
>>>
>>> i.e. nearly a polarity inversion in the case of (c == 0).
>>>
>>> Why is the case (c == 0) so special here? Maybe you can document this,
>>> if it is really intended?
>>
>> It is intended, the formula for the duty value in the register is:
>>
>> duty = (256 - 256*duty_ns/period_ns) % 256
>
> Where is this modulo defined? In the Manual, there is sth. like this
> defined for RELOADV (tables 606+607), but not for DUTY. Maybe I missed
> sth. in the manual. Link or hint appreciated!

The manual doesn't mention this explicitly, but you can see that without the modulo, when duty_ns==0, DUTY would be 256, but the register is only 8 bits wide (i.e. modulo 256). I made a few tests and looked at the PWM output on a scope; they confirm this:

DUTY    HIGH LEVEL
   1    99.9%
  25    90.0%
 128    50.0%
 220    10.0%
 255     0.1%
   0     0.0%

I'll resubmit the patch with the clamping in the correct order.

Alban
Re: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
On Thursday 08 November 2012 12:15:26 Alan Cox wrote:
> On Thu, 08 Nov 2012 11:06:29 +0100 Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:
>> From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
>> Subject: [PATCH] raid5: panic() on dma_wait_for_async_tx() error
>>
>> There is not much we can do on dma_wait_for_async_tx() error
>> so just panic() for now.
>>
>> Cc: Neil Brown ne...@suse.de
>> Cc: Vinod Koul vinod.k...@intel.com
>> Cc: Dan Williams d...@fb.com
>> Cc: Tomasz Figa t.f...@samsung.com
>> Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
>> Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
>> ---
>>  drivers/md/raid5.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> Index: b/drivers/md/raid5.c
>> ===
>> --- a/drivers/md/raid5.c	2012-11-07 16:25:19.480876012 +0100
>> +++ b/drivers/md/raid5.c	2012-11-07 16:27:46.244875992 +0100
>> @@ -3223,7 +3223,9 @@ static void handle_stripe_expansion(stru
>>  	/* done submitting copies, wait for them to complete */
>>  	if (tx) {
>>  		async_tx_ack(tx);
>> -		dma_wait_for_async_tx(tx);
>> +		if (dma_wait_for_async_tx(tx) != DMA_SUCCESS)
>> +			panic("%s: DMA error waiting for transaction\n",
>> +			      __func__);
>
> That's a really horrible place to panic.

Still, it seems a better thing to do than silently ignoring errors and trying to continue operations with inconsistent data. Unfortunately higher layers don't support error conditions, and fixing them seems to be a non-trivial task.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center
Re: [PATCH] DMA: remove unused support for MEMSET operations
On Thursday 08 November 2012 12:12:31 Alan Cox wrote:
> On Thu, 08 Nov 2012 10:58:17 +0100 Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:
>> From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
>> Subject: [PATCH] DMA: remove unused support for MEMSET operations
>>
>> There have never been any real users of MEMSET operations
>
> In tree users.

Please show me them. There were no users except the self-test one (which this patch also removes); the whole memset code has been dead since its introduction in January 2007.

> And is it broken, if not why do you want to break it ?

Well, it is partially broken, as async_memset.c doesn't even build currently in next (I've posted a fix for that before noticing that the whole code can be removed).

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center
Re: [PATCH] DMA: remove unused support for MEMSET operations
On Thu, 08 Nov 2012 12:22:05 +0100 Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:
> On Thursday 08 November 2012 12:12:31 Alan Cox wrote:
>> On Thu, 08 Nov 2012 10:58:17 +0100 Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com wrote:
>>> From: Bartlomiej Zolnierkiewicz b.zolnier...@samsung.com
>>> Subject: [PATCH] DMA: remove unused support for MEMSET operations
>>>
>>> There have never been any real users of MEMSET operations
>>
>> In tree users.
>
> Please show me them.

Sorry ? There are potentially lots of out of tree users in both the ARM and x86 communities.

>> And is it broken, if not why do you want to break it ?
>
> Well, it is partially broken as async_memset.c doesn't even build
> currently in next (I've posted a fix for that before noticing that
> the whole code can be removed).

Ok, that does suggest it's not being used at all, but it would be wise to check more widely.

Alan
[PATCH] pwm: lpc32xx - Fix the PWM polarity
The duty cycles value goes from 1 (99% HIGH) to 256 (0% HIGH) but it is
stored modulo 256 in the register as it is only 8 bits wide.

Signed-off-by: Alban Bedel alban.be...@avionic-design.de
---
 drivers/pwm/pwm-lpc32xx.c | 4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/pwm/pwm-lpc32xx.c b/drivers/pwm/pwm-lpc32xx.c
index adb87f0..2590f8d 100644
--- a/drivers/pwm/pwm-lpc32xx.c
+++ b/drivers/pwm/pwm-lpc32xx.c
@@ -51,7 +51,9 @@ static int lpc32xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 	c = 256 * duty_ns;
 	do_div(c, period_ns);
-	duty_cycles = c;
+	if (c > 255)
+		c = 255;
+	duty_cycles = 256 - c;
 	writel(PWM_ENABLE | PWM_RELOADV(period_cycles) | PWM_DUTY(duty_cycles),
 		lpc32xx->base + (pwm->hwpwm << 2));
--
1.7.0.4
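As a quick way to check the arithmetic in this patch outside the kernel, the computation can be modelled in user-space C. This is a sketch under assumptions: lpc32xx_duty_reg() is a hypothetical name, plain 64-bit division stands in for do_div(), and the cast to uint8_t stands in for the 8-bit wide DUTY field described in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the duty computation in the patch.
 * Returns the value that would end up in the 8-bit DUTY field.
 */
static uint8_t lpc32xx_duty_reg(uint64_t duty_ns, uint64_t period_ns)
{
	uint64_t c;

	assert(period_ns > 0 && duty_ns <= period_ns);

	c = 256 * duty_ns / period_ns;	/* 0..256 */
	if (c > 255)
		c = 255;		/* 100% duty is clamped to DUTY = 1 */

	/* 256 - 0 = 256 does not fit in 8 bits and wraps to 0 (0% HIGH). */
	return (uint8_t)(256 - c);
}
```

With a period of 1000 ns, a duty of 0 ns gives 0, 500 ns gives 128 and 1000 ns gives 1, which matches the "1 (99% HIGH) to 256 (0% HIGH), stored modulo 256" description.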
Re: [PATCH] virtio: Don't access index after unregister.
On Thu, Nov 8, 2012 at 11:43 AM, Cornelia Huck cornelia.h...@de.ibm.com wrote:
> Virtio wants to release used indices after the corresponding
> virtio device has been unregistered. However, virtio does not
> hold an extra reference, giving up its last reference with
> device_unregister(), making accessing dev->index afterwards
> invalid.
>
> I actually saw problems when testing my (not-yet-merged)
> virtio-ccw code:
>
> - device_add virtio-net,id=xxx
>   -> creates device virtio<n> with n>0
>
> - device_del xxx
>   -> deletes virtio<n>, but calls ida_simple_remove with an
>      index of 0
>
> - device_add virtio-net,id=xxx
>   -> tries to add virtio0, which is still in use...
>
> So let's save the index we want to release before calling
> device_unregister().
>
> Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
> ---
>  drivers/virtio/virtio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 1e8659c..809b0de 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -225,8 +225,10 @@ EXPORT_SYMBOL_GPL(register_virtio_device);
>
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
> +	int index = dev->index; /* save for after device release */
> +
>  	device_unregister(&dev->dev);
> -	ida_simple_remove(&virtio_index_ida, dev->index);
> +	ida_simple_remove(&virtio_index_ida, index);
>  }
>  EXPORT_SYMBOL_GPL(unregister_virtio_device);

Acked-by: Sjur Brændeland sjur.brandel...@stericsson.com

Great minds think alike! I discovered issues with this implementation a while back and Michael suggested an identical patch:

https://lkml.org/lkml/2012/9/4/173
https://lkml.org/lkml/2012/9/7/105

The issue I ran into was that when virtio devices are created by remoteproc, the device memory might be freed when calling device_unregister(), and the value of dev->index is then undefined. So this bug bites when unregistering a virtio device from remoteproc with CONFIG_DEBUG_SLAB enabled.

However, this bug is not triggered by virtio_pci, as it implements a non-standard device release function that does not free the device memory.

Thanks,
Sjur
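The ordering problem discussed in this thread (reading a field after the call that may free the object) can be illustrated with a small user-space model. Everything here is a hypothetical stand-in: fake_device, fake_device_unregister() and release_index() only mimic the roles of the virtio device, device_unregister() and ida_simple_remove(); none of this is kernel API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device that owns an allocated index. */
struct fake_device {
	int index;
};

/* Records the last index handed back, standing in for the IDA release. */
static int last_released_index = -1;

static void release_index(int index)
{
	last_released_index = index;
}

/* Stands in for dropping the last reference: the memory is gone afterwards. */
static void fake_device_unregister(struct fake_device *dev)
{
	free(dev);
}

static struct fake_device *fake_register(int index)
{
	struct fake_device *dev = malloc(sizeof(*dev));

	if (dev)
		dev->index = index;
	return dev;
}

static void fake_unregister(struct fake_device *dev)
{
	int index;

	assert(dev != NULL);
	index = dev->index;		/* copy out before the object may be freed */
	fake_device_unregister(dev);
	release_index(index);		/* dev must not be touched here any more */
}
```

Reading dev->index after fake_device_unregister() would be a use-after-free in this model, which is the same class of bug the patch avoids by saving the index first.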
Re: [PATCH -next] mtip32xx: fix potential NULL pointer dereference in mtip_timeout_function()
On 2012-11-08 10:35, Wei Yongjun wrote:
> From: Wei Yongjun yongjun_...@trendmicro.com.cn
>
> The dereference to port should be moved below the NULL test.
>
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)

Thanks, it definitely doesn't make sense to check for !port after having dereferenced it. Applied.

--
Jens Axboe
Re: [PATCH 2/3] pwm: New driver to support PWMs on TWL4030/6030 series of PMICs
On Thu, Nov 8, 2012 at 9:14 AM, Péter Ujfalusi peter.ujfal...@ti.com wrote:
> On 11/07/2012 06:50 PM, Grazvydas Ignotas wrote:
>>> +	if (pwm->hwpwm) {
>>> +		/* PWM 1 */
>>> +		mask = TWL4030_GPIO7_VIBRASYNC_PWM1_MASK;
>>> +		bits = TWL4030_GPIO7_VIBRASYNC_PWM1_PWM1;
>>> +	} else {
>>> +		/* PWM 0 */
>>> +		mask = TWL4030_GPIO6_PWM0_MUTE_MASK;
>>> +		bits = TWL4030_GPIO6_PWM0_MUTE_PWM0;
>>> +	}
>>> +
>>> +	/* Save the current MUX configuration for the PWM */
>>> +	twl->twl4030_pwm_mux &= ~mask;
>>> +	twl->twl4030_pwm_mux |= (val & mask);
>>
>> Do we really need this mask clearing here? After probe twl4030_pwm_mux
>> should be zero, and if twl4030_pwm_request is called twice you don't
>> clear the important bits before |=, I think
>> 'twl4030_pwm_mux = val & mask' would be better here.
>
> I'm storing both PWM's state in the same variable, but in different offsets:
> PWM0: bits 2-3
> PWM1: bits 4-5
>
> Probably it is over engineering to clear the relevant bits in the backup
> storage, but better to be safe IMHO. I would leave this part as it is.

Oh, it should be good then.

--
Gražvydas
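To make the point about sharing one backup variable concrete, here is a small stand-alone sketch. The bit positions (PWM0 in bits 2-3, PWM1 in bits 4-5) come from the reply above; the names PWM0_MUX_MASK, PWM1_MUX_MASK and save_pwm_mux() are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as described in the thread; the names are illustrative. */
#define PWM0_MUX_MASK (0x3u << 2)	/* PWM0: bits 2-3 */
#define PWM1_MUX_MASK (0x3u << 4)	/* PWM1: bits 4-5 */

/*
 * Hypothetical helper: store one PWM's mux bits from 'val' into the shared
 * backup byte without disturbing the other PWM's saved bits.
 */
static uint8_t save_pwm_mux(uint8_t backup, uint8_t val, unsigned int hwpwm)
{
	uint8_t mask;

	assert(hwpwm <= 1);
	mask = hwpwm ? PWM1_MUX_MASK : PWM0_MUX_MASK;

	backup &= (uint8_t)~mask;	/* drop any stale bits for this PWM */
	backup |= val & mask;		/* keep only this PWM's field from val */
	return backup;
}
```

A plain assignment 'backup = val & mask' would wipe the other channel's saved bits, and an OR without the clear would keep stale bits when the same channel is requested twice; the clear-then-OR form avoids both.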
[PATCH 1/2] ARM: ux500: add PRCM register base for pinctrl
From: Jonas Aaberg jonas.ab...@stericsson.com

This adds the PRCM register range base as a resource to the
pinctrl driver so we can break the dependency on the PRCMU
driver and handle these registers in the driver alone.

Cc: a...@kernel.org
Signed-off-by: Jonas Aaberg jonas.ab...@stericsson.com
Signed-off-by: Linus Walleij linus.wall...@linaro.org
---
ARM SoC guys: this patch is better contained in the pinctrl
tree, can I have your ACK to push it through pinctrl? Thanks.
---
 arch/arm/mach-ux500/cpu-db8500.c     | 2 +-
 arch/arm/mach-ux500/devices-common.h | 8 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-ux500/cpu-db8500.c b/arch/arm/mach-ux500/cpu-db8500.c
index 87a8f9f..113d9c4 100644
--- a/arch/arm/mach-ux500/cpu-db8500.c
+++ b/arch/arm/mach-ux500/cpu-db8500.c
@@ -158,7 +158,7 @@ static void __init db8500_add_gpios(struct device *parent)
 	dbx500_add_gpios(parent, ARRAY_AND_SIZE(db8500_gpio_base),
 			 IRQ_DB8500_GPIO0, &pdata);
-	dbx500_add_pinctrl(parent, "pinctrl-db8500");
+	dbx500_add_pinctrl(parent, "pinctrl-db8500", U8500_PRCMU_BASE);
 }

 static int usb_db8500_rx_dma_cfg[] = {
diff --git a/arch/arm/mach-ux500/devices-common.h b/arch/arm/mach-ux500/devices-common.h
index 7fbf0ba..96fa4ac 100644
--- a/arch/arm/mach-ux500/devices-common.h
+++ b/arch/arm/mach-ux500/devices-common.h
@@ -129,12 +129,18 @@ void dbx500_add_gpios(struct device *parent, resource_size_t *base, int num,
 		      int irq, struct nmk_gpio_platform_data *pdata);

 static inline void
-dbx500_add_pinctrl(struct device *parent, const char *name)
+dbx500_add_pinctrl(struct device *parent, const char *name,
+		   resource_size_t base)
 {
+	struct resource res[] = {
+		DEFINE_RES_MEM(base, SZ_8K),
+	};
 	struct platform_device_info pdevinfo = {
 		.parent = parent,
 		.name = name,
 		.id = -1,
+		.res = res,
+		.num_res = ARRAY_SIZE(res),
 	};

 	platform_device_register_full(&pdevinfo);
--
1.7.11.3
[PATCH 2/2] pinctrl/nomadik: make independent of prcmu driver
From: Jonas Aaberg jonas.ab...@stericsson.com

Currently there are some unnecessary criss-cross
dependencies between the PRCMU driver in MFD and a lot of
other drivers, mainly because other drivers need to poke
around in the PRCM register range.

In cases like this there are actually just a few select
registers that the pinctrl driver needs to read/modify/write,
and it turns out that no other driver is actually using
these registers, so there are no concurrency issues
whatsoever.

So: don't let the location of the register range complicate
things, just poke into these registers directly and skip
a layer of indirection.

Cc: Loic Pallardy loic.palla...@st.com
Signed-off-by: Jonas Aaberg jonas.ab...@stericsson.com
Signed-off-by: Linus Walleij linus.wall...@linaro.org
---
 drivers/pinctrl/pinctrl-nomadik-db8500.c  | 4 +--
 drivers/pinctrl/pinctrl-nomadik-db8540.c  | 4 +--
 drivers/pinctrl/pinctrl-nomadik-stn8815.c | 4 +--
 drivers/pinctrl/pinctrl-nomadik.c         | 52 ++-
 drivers/pinctrl/pinctrl-nomadik.h         | 14 +
 5 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-nomadik-db8500.c b/drivers/pinctrl/pinctrl-nomadik-db8500.c
index 6de52e7..e73d75e 100644
--- a/drivers/pinctrl/pinctrl-nomadik-db8500.c
+++ b/drivers/pinctrl/pinctrl-nomadik-db8500.c
@@ -1230,7 +1230,7 @@ static const u16 db8500_prcm_gpiocr_regs[] = {
 	[PRCM_IDX_GPIOCR2] = 0x574,
 };

-static const struct nmk_pinctrl_soc_data nmk_db8500_soc = {
+static struct nmk_pinctrl_soc_data nmk_db8500_soc = {
 	.gpio_ranges = nmk_db8500_ranges,
 	.gpio_num_ranges = ARRAY_SIZE(nmk_db8500_ranges),
 	.pins = nmk_db8500_pins,
@@ -1245,7 +1245,7 @@ static const struct nmk_pinctrl_soc_data nmk_db8500_soc = {
 };

 void __devinit
-nmk_pinctrl_db8500_init(const struct nmk_pinctrl_soc_data **soc)
+nmk_pinctrl_db8500_init(struct nmk_pinctrl_soc_data **soc)
 {
 	*soc = &nmk_db8500_soc;
 }
diff --git a/drivers/pinctrl/pinctrl-nomadik-db8540.c b/drivers/pinctrl/pinctrl-nomadik-db8540.c
index 52fc301..1276ba3 100644
--- a/drivers/pinctrl/pinctrl-nomadik-db8540.c
+++ b/drivers/pinctrl/pinctrl-nomadik-db8540.c
@@ -1240,7 +1240,7 @@ static const u16 db8540_prcm_gpiocr_regs[] = {
 	[PRCM_IDX_GPIOCR3] = 0x2bc,
 };

-static const struct nmk_pinctrl_soc_data nmk_db8540_soc = {
+static struct nmk_pinctrl_soc_data nmk_db8540_soc = {
 	.gpio_ranges = nmk_db8540_ranges,
 	.gpio_num_ranges = ARRAY_SIZE(nmk_db8540_ranges),
 	.pins = nmk_db8540_pins,
@@ -1255,7 +1255,7 @@ static const struct nmk_pinctrl_soc_data nmk_db8540_soc = {
 };

 void __devinit
-nmk_pinctrl_db8540_init(const struct nmk_pinctrl_soc_data **soc)
+nmk_pinctrl_db8540_init(struct nmk_pinctrl_soc_data **soc)
 {
 	*soc = &nmk_db8540_soc;
 }
diff --git a/drivers/pinctrl/pinctrl-nomadik-stn8815.c b/drivers/pinctrl/pinctrl-nomadik-stn8815.c
index 7d432c3..ed5b144 100644
--- a/drivers/pinctrl/pinctrl-nomadik-stn8815.c
+++ b/drivers/pinctrl/pinctrl-nomadik-stn8815.c
@@ -339,7 +339,7 @@ static const struct nmk_function nmk_stn8815_functions[] = {
 	FUNCTION(i2cusb),
 };

-static const struct nmk_pinctrl_soc_data nmk_stn8815_soc = {
+static struct nmk_pinctrl_soc_data nmk_stn8815_soc = {
 	.gpio_ranges = nmk_stn8815_ranges,
 	.gpio_num_ranges = ARRAY_SIZE(nmk_stn8815_ranges),
 	.pins = nmk_stn8815_pins,
@@ -351,7 +351,7 @@ static const struct nmk_pinctrl_soc_data nmk_stn8815_soc = {
 };

 void __devinit
-nmk_pinctrl_stn8815_init(const struct nmk_pinctrl_soc_data **soc)
+nmk_pinctrl_stn8815_init(struct nmk_pinctrl_soc_data **soc)
 {
 	*soc = &nmk_stn8815_soc;
 }
diff --git a/drivers/pinctrl/pinctrl-nomadik.c b/drivers/pinctrl/pinctrl-nomadik.c
index 22f6937..33c614e 100644
--- a/drivers/pinctrl/pinctrl-nomadik.c
+++ b/drivers/pinctrl/pinctrl-nomadik.c
@@ -30,20 +30,6 @@
 #include <linux/pinctrl/pinconf.h>
 /* Since we request GPIOs from ourself */
 #include <linux/pinctrl/consumer.h>
-/*
- * For the U8500 archs, use the PRCMU register interface, for the older
- * Nomadik, provide some stubs. The functions using these will only be
- * called on the U8500 series.
- */
-#ifdef CONFIG_ARCH_U8500
-#include <linux/mfd/dbx500-prcmu.h>
-#else
-static inline u32 prcmu_read(unsigned int reg) {
-	return 0;
-}
-static inline void prcmu_write(unsigned int reg, u32 value) {}
-static inline void prcmu_write_masked(unsigned int reg, u32 mask, u32 value) {}
-#endif
 #include <linux/platform_data/pinctrl-nomadik.h>

 #include <asm/mach/irq.h>
@@ -85,7 +71,7 @@ struct nmk_gpio_chip {
 struct nmk_pinctrl {
 	struct device *dev;
 	struct pinctrl_dev *pctl;
-	const struct nmk_pinctrl_soc_data *soc;
+	struct nmk_pinctrl_soc_data *soc;
 };

 static struct nmk_gpio_chip *
@@ -247,6 +233,15 @@ nmk_gpio_disable_lazy_irq(struct nmk_gpio_chip *nmk_chip, unsigned offset)
 	dev_dbg(nmk_chip->chip.dev, "%d: clearing interrupt mask\n", gpio);
 }
Re: [headache] 3.7.0-rc2 can't handle mutt (with 3.7G mail file) +FF (4 tabs) on a 4G memory+4 core system ?
To me this looks like an issue with swap. Can you try without swap (swapoff)?

Ortwin
Re: PT_EXITKILL (Was: pdeath_signal)
On 11/07/2012 03:09 PM, Oleg Nesterov wrote:
>> What I would IDEALLY like to have is a call, probably a ptrace option,
>> where the parent can request:
>> If I am ever to terminate or be killed, then my ptraced son MUST die as well.
>
> Perhaps this makes sense... Chris, iirc you also suggested something
> like this? And the patch is trivial.

(...)

> OK. Please see the untested/uncompiled (but trivial) patch below
>
>	- it adds PTRACE_O_EXITKILL. A better name?
>
>	- A better numeric value?
>
>	  Note that the new option is not equal to the last-ptrace-option << 1.
>	  Because currently all options have the event, and the new one starts
>	  the eventless group. 1 << 16 means we have the room for 8 more events.
>
>	- it needs the convincing changelog for akpm

If this isn't inherited by the ptrace child's children, a fork child can end up detached if the tracer dies before it had a chance of setting the PTRACE_O_EXITKILL on the new auto-attached child. Which sounds like another argument for PTRACE_O_INHERIT, as in:

http://sourceware.org/ml/archer/2011-q1/msg00026.html

(it sounds like you need to use PTRACE_SEIZE+options too to plug the race between PTRACE_ME/PTRACE_ATTACH and setting PTRACE_SETOPTIONS).

(For completeness, Windows' age old equivalent, DebugSetProcessKillOnExit, is a tracer option, not a tracee option, though that's not as flexible.)

--
Pedro Alves