Re: [PATCH 1/2] kfifo: round up the fifo size power of 2

2012-10-30 Thread Yuanhan Liu
On Mon, Oct 29, 2012 at 01:59:35PM -0700, Andrew Morton wrote:
> On Fri, 26 Oct 2012 15:56:57 +0800
> Yuanhan Liu  wrote:
> 
> > Say, if we want to allocate a fifo with size of 6 bytes, it would be safer
> > to allocate 8 bytes instead of 4 bytes.
> >
> > ...
> >
> > --- a/kernel/kfifo.c
> > +++ b/kernel/kfifo.c
> > @@ -39,11 +39,11 @@ int __kfifo_alloc(struct __kfifo *fifo, unsigned int size,
> > size_t esize, gfp_t gfp_mask)
> >  {
> > /*
> > -* round down to the next power of 2, since our 'let the indices
> > +* round up to the next power of 2, since our 'let the indices
> >  * wrap' technique works only in this case.
> >  */
> > if (!is_power_of_2(size))
> > -   size = rounddown_pow_of_two(size);
> > +   size = roundup_pow_of_two(size);
> >  
> > fifo->in = 0;
> > fifo->out = 0;
> > @@ -84,7 +84,7 @@ int __kfifo_init(struct __kfifo *fifo, void *buffer,
> > size /= esize;
> >  
> > if (!is_power_of_2(size))
> > -   size = rounddown_pow_of_two(size);
> > +   size = roundup_pow_of_two(size);
> >  
> > fifo->in = 0;
> > fifo->out = 0;
> 
> hm, well, if the user asked for a 100-element fifo then it is a bit
> strange and unexpected to give them a 128-element one.

Hi Andrew,

Yes, and I guess the same applies to giving them a 64-element one.

> 
> If there's absolutely no prospect that the kfifo code will ever support
> 100-byte fifos then I guess we should rework the API so that the caller
> has to pass in log2 of the size, not the size itself.  That way there
> will be no surprises and no mistakes.
> 
> That being said, the power-of-2 limitation isn't at all intrinsic to a
> fifo, so we shouldn't do this.  Ideally, we'd change the kfifo
> implementation so it does what the caller asked it to do!

I'm fine with removing the power-of-2 limitation. Stefani, what's your
comment on that?

--yliu
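
For readers following the power-of-2 discussion, here is a minimal userspace sketch of why the rounding direction matters and why the mask-based index wrap needs a power-of-2 size. The helpers roundup_pow2(), rounddown_pow2() and slot() are hypothetical stand-ins written for this illustration only; they are not the kernel's roundup_pow_of_two()/rounddown_pow_of_two() or the kfifo internals, and the sketch assumes a 32-bit unsigned int.

```c
#include <assert.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31 (hypothetical helper). */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/* Largest power of two <= n, for n >= 1 (hypothetical helper). */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1);
	while (p <= n / 2)
		p <<= 1;
	return p;
}

/*
 * Map a free-running index onto a buffer slot with a mask.
 * This equals index % size only when size is a power of two, which is
 * why the 'let the indices wrap' technique carries that restriction.
 */
static unsigned int slot(unsigned int index, unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return index & (size - 1);
}
```

With these helpers a request for 6 elements becomes 8 when rounding up and 4 when rounding down, and a request for 100 becomes 128 or 64, which are the two surprises discussed above. A free-running 32-bit index wraps modulo 2^32, which is a multiple of every smaller power of two but not of 6 or 100, so lifting the restriction would presumably need a different wrapping scheme.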
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/4] AMD64 EDAC: Use appropriate name for NB indexing

2012-10-30 Thread Daniel J Blueman
Use the same 'amd' prefix as related functions for clarity.

Signed-off-by: Daniel J Blueman 
---
 arch/x86/include/asm/amd_nb.h |2 +-
 drivers/edac/amd64_edac.c |6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 0cc1045..39b5ddd 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -81,7 +81,7 @@ static inline struct amd_northbridge *node_to_amd_nb(u16 node)
return (node < amd_northbridges.num) ? &amd_northbridges.nb[node] : NULL;
 }
 
-static inline u16 get_node_id(struct pci_dev *pdev)
+static inline u16 amd_get_node_id(struct pci_dev *pdev)
 {
int i;
 
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 12cd675..59658b9 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2558,7 +2558,7 @@ static int amd64_init_one_instance(struct pci_dev *F2)
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
int err = 0, ret;
-   u16 nid = get_node_id(F2);
+   u16 nid = amd_get_node_id(F2);
 
ret = -ENOMEM;
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
@@ -2649,7 +2649,7 @@ err_ret:
 static int __devinit amd64_probe_one_instance(struct pci_dev *pdev,
 const struct pci_device_id 
*mc_type)
 {
-   u16 nid = get_node_id(pdev);
+   u16 nid = amd_get_node_id(pdev);
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct ecc_settings *s;
int ret = 0;
@@ -2699,7 +2699,7 @@ static void __devexit amd64_remove_one_instance(struct 
pci_dev *pdev)
 {
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
-   u16 nid = get_node_id(pdev);
+   u16 nid = amd_get_node_id(pdev);
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct ecc_settings *s = ecc_stngs[nid];
 
-- 
1.7.9.5



[PATCH 3/4] AMD64 EDAC: Cleanup type usage to be consistent

2012-10-30 Thread Daniel J Blueman
As the Northbridge IDs are at most 16-bits, use the same type
consistently.

Signed-off-by: Daniel J Blueman 
---
 arch/x86/include/asm/amd_nb.h|2 +-
 arch/x86/include/asm/processor.h |2 +-
 arch/x86/kernel/cpu/amd.c|4 ++--
 drivers/edac/amd64_edac.c|   26 ++
 drivers/edac/amd64_edac.h|2 +-
 5 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index b88fc7a..0cc1045 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -76,7 +76,7 @@ static inline bool amd_nb_has_feature(unsigned feature)
return ((amd_northbridges.flags & feature) == feature);
 }
 
-static inline struct amd_northbridge *node_to_amd_nb(int node)
+static inline struct amd_northbridge *node_to_amd_nb(u16 node)
 {
return (node < amd_northbridges.num) ? &amd_northbridges.nb[node] : NULL;
 }
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ad1fc85..eb3ba58 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -934,7 +934,7 @@ extern void start_thread(struct pt_regs *regs, unsigned 
long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);
 
-extern int amd_get_nb_id(int cpu);
+extern u16 amd_get_nb_id(int cpu);
 
 struct aperfmperf {
u64 aperf, mperf;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f7e98a2..52cab1f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -364,9 +364,9 @@ static void __cpuinit amd_detect_cmp(struct cpuinfo_x86 *c)
 #endif
 }
 
-int amd_get_nb_id(int cpu)
+u16 amd_get_nb_id(int cpu)
 {
-   int id = 0;
+   u16 id = 0;
 #ifdef CONFIG_SMP
id = per_cpu(cpu_llc_id, cpu);
 #endif
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9920dfd..12cd675 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -239,7 +239,7 @@ static int amd64_get_scrub_rate(struct mem_ctl_info *mci)
  * DRAM base/limit associated with node_id
  */
 static bool amd64_base_limit_match(struct amd64_pvt *pvt, u64 sys_addr,
-  unsigned nid)
+  u16 nid)
 {
u64 addr;
 
@@ -265,7 +265,7 @@ static struct mem_ctl_info *find_mc_by_sys_addr(struct 
mem_ctl_info *mci,
u64 sys_addr)
 {
struct amd64_pvt *pvt;
-   unsigned node_id;
+   u16 node_id;
u32 intlv_en, bits;
 
/*
@@ -613,7 +613,8 @@ static u64 sys_addr_to_input_addr(struct mem_ctl_info *mci, 
u64 sys_addr)
 static u64 input_addr_to_dram_addr(struct mem_ctl_info *mci, u64 input_addr)
 {
struct amd64_pvt *pvt;
-   unsigned node_id, intlv_shift;
+   u16 node_id;
+   unsigned intlv_shift;
u64 bits, dram_addr;
u32 intlv_sel;
 
@@ -1337,7 +1338,7 @@ static u8 f1x_determine_channel(struct amd64_pvt *pvt, 
u64 sys_addr,
 }
 
 /* Convert the sys_addr to the normalized DCT address */
-static u64 f1x_get_norm_dct_addr(struct amd64_pvt *pvt, unsigned range,
+static u64 f1x_get_norm_dct_addr(struct amd64_pvt *pvt, u16 range,
 u64 sys_addr, bool hi_rng,
 u32 dct_sel_base_addr)
 {
@@ -1413,7 +1414,7 @@ static int f10_process_possible_spare(struct amd64_pvt 
*pvt, u8 dct, int csrow)
  * -EINVAL:  NOT FOUND
  * 0..csrow = Chip-Select Row
  */
-static int f1x_lookup_addr_in_dct(u64 in_addr, u32 nid, u8 dct)
+static int f1x_lookup_addr_in_dct(u64 in_addr, u16 nid, u8 dct)
 {
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
@@ -1491,7 +1492,7 @@ static u64 f1x_swap_interleaved_region(struct amd64_pvt 
*pvt, u64 sys_addr)
 
 /* For a given @dram_range, check if @sys_addr falls within it. */
 static int f1x_match_to_this_node(struct amd64_pvt *pvt, unsigned range,
- u64 sys_addr, int *nid, int *chan_sel)
+ u64 sys_addr, u16 *nid, int *chan_sel)
 {
int cs_found = -EINVAL;
u64 chan_addr;
@@ -1572,10 +1573,10 @@ static int f1x_match_to_this_node(struct amd64_pvt 
*pvt, unsigned range,
 }
 
 static int f1x_translate_sysaddr_to_cs(struct amd64_pvt *pvt, u64 sys_addr,
-  int *node, int *chan_sel)
+  u16 *node, int *chan_sel)
 {
int cs_found = -EINVAL;
-   unsigned range;
+   u16 range;
 
for (range = 0; range < DRAM_RANGES; range++) {
 
@@ -1607,7 +1608,8 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info 
*mci, u64 sys_addr,
 {
struct amd64_pvt *pvt = mci->pvt_info;
u32 page, offset;
-   int nid, csrow, chan = 0;
+   int csrow, chan = 0;
+   u16 nid;
 
error_address_to_page_and_offset(sys_addr, &page, &offset);
 
@@ -2065,7 +2067,7 @@ static void 

[PATCH 2/4] AMD64 EDAC: Add support for >255 memory controllers

2012-10-30 Thread Daniel J Blueman
As the AMD64 last-level-cache ID is 16-bits and federated systems
eg using Numascale's NumaConnect/NumaChip can have more than 255 memory
controllers, use 16-bits to store the ID.

Signed-off-by: Daniel J Blueman 
---
 drivers/edac/amd64_edac.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 18d404a..9920dfd 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -942,7 +942,7 @@ static u64 get_error_address(struct mce *m)
struct amd64_pvt *pvt;
u64 cc6_base, tmp_addr;
u32 tmp;
-   u8 mce_nid, intlv_en;
+   u16 mce_nid, intlv_en;
 
if ((addr & GENMASK(24, 47)) >> 24 != 0x00fdf7)
return addr;
@@ -1499,7 +1499,7 @@ static int f1x_match_to_this_node(struct amd64_pvt *pvt, 
unsigned range,
u8 channel;
bool high_range = false;
 
-   u8 node_id= dram_dst_node(pvt, range);
+   u16 node_id   = dram_dst_node(pvt, range);
u8 intlv_en   = dram_intlv_en(pvt, range);
u32 intlv_sel = dram_intlv_sel(pvt, range);
 
@@ -2306,7 +2306,7 @@ out:
return ret;
 }
 
-static int toggle_ecc_err_reporting(struct ecc_settings *s, u8 nid, bool on)
+static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on)
 {
cpumask_var_t cmask;
int cpu;
@@ -2344,7 +2344,7 @@ static int toggle_ecc_err_reporting(struct ecc_settings 
*s, u8 nid, bool on)
return 0;
 }
 
-static bool enable_ecc_error_reporting(struct ecc_settings *s, u8 nid,
+static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid,
   struct pci_dev *F3)
 {
bool ret = true;
@@ -2396,7 +2396,7 @@ static bool enable_ecc_error_reporting(struct 
ecc_settings *s, u8 nid,
return ret;
 }
 
-static void restore_ecc_error_reporting(struct ecc_settings *s, u8 nid,
+static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
struct pci_dev *F3)
 {
u32 value, mask = 0x3;  /* UECC/CECC enable */
@@ -2435,7 +2435,7 @@ static const char *ecc_msg =
"'ecc_enable_override'.\n"
" (Note that use of the override may cause unknown side effects.)\n";
 
-static bool ecc_enabled(struct pci_dev *F3, u8 nid)
+static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 {
u32 value;
u8 ecc_en = 0;
@@ -2556,7 +2556,7 @@ static int amd64_init_one_instance(struct pci_dev *F2)
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
int err = 0, ret;
-   u8 nid = get_node_id(F2);
+   u16 nid = get_node_id(F2);
 
ret = -ENOMEM;
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
@@ -2647,7 +2647,7 @@ err_ret:
 static int __devinit amd64_probe_one_instance(struct pci_dev *pdev,
 const struct pci_device_id 
*mc_type)
 {
-   u8 nid = get_node_id(pdev);
+   u16 nid = get_node_id(pdev);
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct ecc_settings *s;
int ret = 0;
@@ -2697,7 +2697,7 @@ static void __devexit amd64_remove_one_instance(struct 
pci_dev *pdev)
 {
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
-   u8 nid = get_node_id(pdev);
+   u16 nid = get_node_id(pdev);
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct ecc_settings *s = ecc_stngs[nid];
 
-- 
1.7.9.5
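
A small sketch of the truncation this patch avoids, assuming node IDs above 255 as described in the changelog. The two helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Keep a node ID in 8 bits, as the driver did before this patch. */
static uint8_t store_nid_u8(uint16_t nid)
{
	return (uint8_t)nid;	/* values above 255 are reduced modulo 256 */
}

/* Keep a node ID in 16 bits, as the driver does after this patch. */
static uint16_t store_nid_u16(uint16_t nid)
{
	return nid;
}
```

In this sketch node 300 stored in 8 bits reads back as 44, so it would alias an unrelated node when used as an array index such as ecc_stngs[nid].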



[PATCH 1/4, v4] AMD64 EDAC: Add multi-domain support to AMD EDAC

2012-10-30 Thread Daniel J Blueman
Fix the handling of memory controller detection to index the array
of detected Northbridges, allowing memory controllers over multiple
PCI domains in federated systems eg using Numascale's NumaConnect/
NumaChip.

v4: Generate linear Northbridge ID by indexing detected Northbridges

Signed-off-by: Daniel J Blueman 
---
 arch/x86/include/asm/amd_nb.h |   12 
 drivers/edac/amd64_edac.c |   18 ++
 drivers/edac/amd64_edac.h |6 --
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index b3341e9..b88fc7a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -81,6 +81,19 @@ static inline struct amd_northbridge *node_to_amd_nb(int node)
return (node < amd_northbridges.num) ? &amd_northbridges.nb[node] : NULL;
 }
 
+static inline u16 get_node_id(struct pci_dev *pdev)
+{
+   int i;
+
+   for (i = 0; i != amd_nb_num(); i++)
+   if (pci_domain_nr(node_to_amd_nb(i)->misc->bus) == 
pci_domain_nr(pdev->bus) &&
+   PCI_SLOT(node_to_amd_nb(i)->misc->devfn) == 
PCI_SLOT(pdev->devfn))
+   return i;
+
+   WARN(1, "Unable to find AMD Northbridge identifier\n");
+   return 0;
+}
+
 #else
 
 #define amd_nb_num(x)  0
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index cc8e7c7..18d404a 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -982,6 +982,9 @@ static u64 get_error_address(struct mce *m)
return addr;
 }
 
+static struct amd64_family_type *amd64_per_family_init(struct amd64_pvt *pvt);
+static struct pci_dev *pci_get_related_function(unsigned int vendor, unsigned 
int device, struct pci_dev *related);
+
 static void read_dram_base_limit_regs(struct amd64_pvt *pvt, unsigned range)
 {
struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -1001,11 +1004,17 @@ static void read_dram_base_limit_regs(struct amd64_pvt 
*pvt, unsigned range)
 
/* Factor in CC6 save area by reading dst node's limit reg */
if (c->x86 == 0x15) {
-   struct pci_dev *f1 = NULL;
-   u8 nid = dram_dst_node(pvt, range);
+   struct pci_dev *misc, *f1 = NULL;
+   struct amd64_family_type *fam_type;
+   u16 nid = dram_dst_node(pvt, range);
u32 llim;
 
-   f1 = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0x18 + nid, 
1));
+   misc = node_to_amd_nb(nid)->misc;
+   fam_type = amd64_per_family_init(pvt);
+   if (WARN_ON(!fam_type))
+   return;
+
+   f1 = pci_get_related_function(misc->vendor, fam_type->f1_id, 
misc);
if (WARN_ON(!f1))
return;
 
@@ -1720,7 +1729,8 @@ static struct pci_dev *pci_get_related_function(unsigned 
int vendor,
 
dev = pci_get_device(vendor, device, dev);
while (dev) {
-   if ((dev->bus->number == related->bus->number) &&
+   if (pci_domain_nr(dev->bus) == pci_domain_nr(related->bus) &&
+   (dev->bus->number == related->bus->number) &&
(PCI_SLOT(dev->devfn) == PCI_SLOT(related->devfn)))
break;
dev = pci_get_device(vendor, device, dev);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 8d48047..90cae61 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -290,12 +290,6 @@
 /* MSRs */
 #define MSR_MCGCTL_NBE BIT(4)
 
-/* AMD sets the first MC device at device ID 0x18. */
-static inline u8 get_node_id(struct pci_dev *pdev)
-{
-   return PCI_SLOT(pdev->devfn) - 0x18;
-}
-
 enum amd_families {
K8_CPUS = 0,
F10_CPUS,
-- 
1.7.9.5
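
The lookup introduced here can be sketched in plain C as follows. This is a simplified illustration, not the kernel code: struct nb_location, find_nb_index() and slot_based_id() are hypothetical names, and unlike the patch (which warns and returns 0) the sketch returns -1 when nothing matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the location of a Northbridge misc function. */
struct nb_location {
	int domain;		/* PCI domain (segment) number */
	unsigned int slot;	/* PCI slot number */
};

/* Old scheme: derive the ID from the slot alone, first device at 0x18. */
static unsigned int slot_based_id(unsigned int slot)
{
	return slot - 0x18;
}

/*
 * New scheme: the ID is the index of the detected Northbridge whose
 * domain and slot both match. Returns -1 when there is no match.
 */
static int find_nb_index(const struct nb_location *nbs, size_t count,
			 int domain, unsigned int slot)
{
	for (size_t i = 0; i < count; i++)
		if (nbs[i].domain == domain && nbs[i].slot == slot)
			return (int)i;
	return -1;
}
```

With two domains that each have devices at slots 0x18 and 0x19, the slot-based scheme gives ID 0 to both slot-0x18 devices, while the index-based lookup yields a distinct linear ID for each.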



Re: [PATCH] Input: select INPUT_MATRIXKMAP for lpc32xx-keys

2012-10-30 Thread Dmitry Torokhov
On Sat, Oct 27, 2012 at 09:40:48AM +0200, Roland Stigge wrote:
> This patch adds a "select" dependency of KEYBOARD_LPC32XX on INPUT_MATRIXKMAP,
> as the other drivers are doing in this regard. This fixes the following 
> compile
> error if KEYBOARD_LPC32XX is enabled but INPUT_MATRIXKMAP is not:
> 
> drivers/input/keyboard/lpc32xx-keys.c:230: undefined reference to
> `matrix_keypad_build_keymap'
> 
> Signed-off-by: Roland Stigge 

Applied, thank you Roland.

> 
> ---
>  drivers/input/keyboard/Kconfig |1 +
>  1 file changed, 1 insertion(+)
> 
> --- linux-2.6.orig/drivers/input/keyboard/Kconfig
> +++ linux-2.6/drivers/input/keyboard/Kconfig
> @@ -335,6 +335,7 @@ config KEYBOARD_LOCOMO
>  config KEYBOARD_LPC32XX
>   tristate "LPC32XX matrix key scanner support"
>   depends on ARCH_LPC32XX && OF
> + select INPUT_MATRIXKMAP
>   help
> Say Y here if you want to use NXP LPC32XX SoC key scanner interface,
> connected to a key matrix.

-- 
Dmitry


Re: [Patch v1 07/10] perf tools: add mem access sampling core support

2012-10-30 Thread Namhyung Kim
On Mon, 29 Oct 2012 16:15:49 +0100, Stephane Eranian wrote:
> This patch adds the sorting and histogram support
> functions to enable profiling of memory accesses.
>
> The following sorting orders are added:
>  - symbol_daddr: data address symbol (or raw address)
>  - dso_daddr: data address shared object
>  - cost: access cost
>  - locked: access uses locked transaction
>  - tlb : TLB access
>  - mem : memory level of the access (L1, L2, L3, RAM, ...)
>  - snoop: access snoop mode
>
> Signed-off-by: Stephane Eranian 
> ---
[snip]
> +/* --sort daddr_sym */
> +static int64_t
> +sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + struct addr_map_symbol *l = &left->mem_info->daddr;
> + struct addr_map_symbol *r = &right->mem_info->daddr;
> +
> + return (int64_t)(r->addr - l->addr);
> +}

Doesn't it need to compare symbol (start address) if any, before doing
it with raw addresses?

Thanks,
Namhyung
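
One way the suggestion could look, as a sketch under stated assumptions: struct sym, struct data_addr, sort_key() and daddr_cmp() are simplified, hypothetical stand-ins and not perf's actual types or functions. Each entry is keyed by its symbol start address when a symbol is known and by the raw address otherwise, and the keys are compared explicitly instead of subtracting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified versions of the perf structures. */
struct sym {
	uint64_t start;			/* symbol start address */
};

struct data_addr {
	uint64_t addr;			/* raw data address */
	const struct sym *sym;		/* resolved symbol, or NULL */
};

/* Use the symbol start when resolved, else fall back to the raw address. */
static uint64_t sort_key(const struct data_addr *d)
{
	return d->sym ? d->sym->start : d->addr;
}

/* Same ordering direction as the quoted code (right versus left). */
static int64_t daddr_cmp(const struct data_addr *l, const struct data_addr *r)
{
	uint64_t kl = sort_key(l);
	uint64_t kr = sort_key(r);

	return (int64_t)(kr > kl) - (int64_t)(kr < kl);
}
```

Comparing explicitly also sidesteps a property of the subtraction form: casting a 64-bit unsigned difference to int64_t gives the wrong sign once the two addresses are more than 2^63 apart.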


Re: [PATCH] ti_tscadc: Match mfd sub devices to regmap interface

2012-10-30 Thread Dmitry Torokhov
On Tue, Oct 30, 2012 at 09:41:00PM -0700, Russ Dill wrote:
> On Wed, Oct 31, 2012 at 8:55 AM, Pantelis Antoniou
>  wrote:
> > The MFD parent device now uses a regmap, instead of direct
> > memory access. Use the same method in the sub devices to avoid
> > nasty surprises.
> >
> > Also rework the channel initialization of tiadc a bit.
> >
> > Signed-off-by: Pantelis Antoniou 
> > ---
> >  drivers/iio/adc/ti_am335x_adc.c   | 27 +++
> >  drivers/input/touchscreen/ti_am335x_tsc.c | 16 +---
> >  drivers/mfd/ti_am335x_tscadc.c|  7 +--
> >  3 files changed, 37 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/iio/adc/ti_am335x_adc.c 
> > b/drivers/iio/adc/ti_am335x_adc.c
> > index d48fd79..5f325c1 100644
> > --- a/drivers/iio/adc/ti_am335x_adc.c
> > +++ b/drivers/iio/adc/ti_am335x_adc.c
> > @@ -23,7 +23,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -36,13 +38,17 @@ struct tiadc_device {
> >
> >  static unsigned int tiadc_readl(struct tiadc_device *adc, unsigned int reg)
> >  {
> > -   return readl(adc->mfd_tscadc->tscadc_base + reg);
> > +   unsigned int val;
> > +
> > +   val = (unsigned int)-1;
> > +   regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, &val);
> > +   return val;
> >  }
> 
> Would it be cleaner to instead do:
> 
> static unsigned int tiadc_readl(struct tiadc_device *adc, unsigned int reg)
> {
>unsigned int val;
> 
>    return regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, &val) ? : val;
> }
> 
> or
>int ret;
> 
>    ret = regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, &val);
>    return ret < 0 ? ret : val;

Also the function should not be returning unsigned int if it returns
errors.

Thanks.

-- 
Dmitry
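
To illustrate the point about the return type, here is a minimal sketch that reports status through an int return value and hands the register value back through a pointer. fake_regmap_read() is a hypothetical stand-in that only mimics the convention of returning 0 on success and a negative errno on failure; read_reg() is likewise an invented name, not a function from the driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical register backend: registers above 0xff do not exist. */
static int fake_regmap_read(unsigned int reg, unsigned int *val)
{
	if (reg > 0xff)
		return -EINVAL;
	*val = reg * 2u;	/* arbitrary fake register contents */
	return 0;
}

/*
 * Status goes in the return value, data goes through 'val'.
 * On failure '*val' is left untouched, so a register value can never
 * be mistaken for an error code or the other way around.
 */
static int read_reg(unsigned int reg, unsigned int *val)
{
	unsigned int tmp;
	int ret = fake_regmap_read(reg, &tmp);

	if (ret < 0)
		return ret;
	*val = tmp;
	return 0;
}
```

A caller then checks the return value first and uses the value only on success. Folding both into one unsigned int, as in the variants quoted above, would make a register that legitimately reads 0xffffffea look the same as -EINVAL.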


Re: [PATCH] tools: Allow tools to be installed in a user specified location

2012-10-30 Thread Len Brown
Applied.

thanks,
Len Brown, Intel Open Source Technology Center




RE: [PATCH] omap2-clk: Add missing lcdc clock definition

2012-10-30 Thread Paul Walmsley
On Wed, 31 Oct 2012, Hiremath, Vaibhav wrote:

> As far as lck clock node is concerned, we had deliberately dropped all leaf-
> node clocks from the clock tree, please refer to the description mentioned 
> in -
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101987.html

Ach, should have remembered that :-(  Indeed there is an LCDC hwmod:

static struct omap_hwmod am33xx_lcdc_hwmod = {
.name   = "lcdc",
.class  = &am33xx_lcdc_hwmod_class,
.clkdm_name = "lcdc_clkdm",
.mpu_irqs   = am33xx_lcdc_irqs,
.flags  = HWMOD_SWSUP_SIDLE | HWMOD_SWSUP_MSTANDBY,
.main_clk   = "lcd_gclk",
.prcm   = {
.omap4  = {
.clkctrl_offs   = AM33XX_CM_PER_LCDC_CLKCTRL_OFFSET,
.modulemode = MODULEMODE_SWCTRL,
},
},
};

> From the LCDC driver's perspective, the driver is using,
> 
> fb_clk = clk_get(>dev, NULL);
> 
> This I feel needs to be corrected for valid name as per Spec (mostly I would 
> vote for "fck") and then every platform should make sure that it returns 
> valid clock-node for it.
> 
> Change in Driver would be,
> 
> fb_clk = clk_get(>dev, "fck");

Indeed.


- Paul


Incomplete vmcore file

2012-10-30 Thread Ritesh Majumdar
Hello,

I am using kernel version 2.6.34.4 and trying to enable kernel crash dumps using
kexec (kdump). Every time a crash occurs, the dump file (vmcore) is incomplete
(around 20 MB), although the machine has 8 GB of RAM. Because of this,
gdb/crash fails to analyze the core dump.

Does anyone know where to look for this issue? Is there any known issue
with this kernel version or any fix available?

Regards,
Ritesh.


[PATCH] cpufreq: remove the unnecessary initialization of a local variable

2012-10-30 Thread Jingoo Han
This patch removes unnecessary initializer for the 'ret' variable.

Signed-off-by: Jingoo Han 
---
 drivers/cpufreq/cpufreq.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 261ef65..4e9fcc5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -404,7 +404,7 @@ static int __cpufreq_set_policy(struct cpufreq_policy *data,
 static ssize_t store_##file_name   \
 (struct cpufreq_policy *policy, const char *buf, size_t count) \
 {  \
-   unsigned int ret = -EINVAL; \
+   unsigned int ret;   \
struct cpufreq_policy new_policy;   \
\
ret = cpufreq_get_policy(&new_policy, policy->cpu); \
@@ -459,7 +459,7 @@ static ssize_t show_scaling_governor(struct cpufreq_policy 
*policy, char *buf)
 static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
const char *buf, size_t count)
 {
-   unsigned int ret = -EINVAL;
+   unsigned int ret;
charstr_governor[16];
struct cpufreq_policy new_policy;
 
-- 
1.7.1




Re: [PATCH v4 9/9] bug.h: Convert BUILD_BUG{,_ON} to use BUILD_BUG_ON_MSG

2012-10-30 Thread Daniel Santos
On 10/30/2012 08:02 PM, Josh Triplett wrote:
> On Tue, Oct 30, 2012 at 08:19:05PM +0100, Borislav Petkov wrote:
>> On Sun, Oct 28, 2012 at 03:57:15PM -0500, danielfsan...@att.net wrote:
>>> Remove duplicate code by converting BUILD_BUG and BUILD_BUG_ON to just
>>> call BUILD_BUG_ON_MSG.  This not only reduces source code bloat, but
>>> also prevents the possibility of code being changed for one macro and
>>> not for the other (which was previously the case for BUILD_BUG and
>>> BUILD_BUG_ON).
>>>
>>> Signed-off-by: Daniel Santos 
>>> ---
>>>  include/linux/bug.h |   17 +++--
>>>  1 files changed, 3 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/linux/bug.h b/include/linux/bug.h
>>> index 3bc1ddf..b58ba51 100644
>>> --- a/include/linux/bug.h
>>> +++ b/include/linux/bug.h
>>> @@ -81,14 +81,8 @@ struct pt_regs;
>>>  #ifndef __OPTIMIZE__
>>>  #define BUILD_BUG_ON(condition) __compiletime_error_fallback(condition)
>>>  #else
>>> -#define BUILD_BUG_ON(condition)
>>> \
>>> -   do {\
>>> -   extern void __build_bug_on_failed(void) \
>>> -   __compiletime_error("BUILD_BUG_ON failed"); \
>>> -   __compiletime_error_fallback(condition);\
>>> -   if (condition)  \
>>> -   __build_bug_on_failed();\
>>> -   } while(0)
>>> +#define BUILD_BUG_ON(condition) \
>>> +   BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>> Concatenating "condition" might not be very informative in all cases.
>> For example:
>>
>> BUILD_BUG_ON(1);
>>
>> Having __LINE__ is good enough IMHO.

Honestly, __LINE__ is only used to keep the function name unique.  If
anything, I think that having it creates more confusion rather than adds
clarity since the error message will indicate the file and line number
anyway.  So in other words, it is redundant without it being apparent
why.  Of course, it's a very simple and portable mechanism to keep the
symbols unique.  IMO, using __COUNTER__ would be better for clarity
(since the number wouldn't relate to anything real), but it is not
portable across versions of gcc (introduced in 4.4 or some such), so we
are using __LINE__.

> While it doesn't always help, it may help sometimes.  Worst case,
> BUILD_BUG_ON(1) gives you no less information than it did before; best
> case, it gives you useful data.

Yeah, and depending upon what it's fed (and how much pre-processing had
been done on that) it can actually prove helpful because stringifying
condition can reveal pre-processing errors as well.  So if you passed
some macro as the condition and it didn't expand the way you expected,
this error message will print out exactly how it expanded, up to the
point of having been passed to BUILD_BUG_ON.  I don't know how much that
could potentially help, but for troubleshooting, I find extra
information helpful, more often than harmful.

Daniel
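
A userspace sketch of the stringification behaviour described above. The MY_* macros, LIMIT and CHECK_LIMIT are hypothetical names for this illustration; they use C11 _Static_assert rather than the kernel's __compiletime_error machinery, so this shows only the preprocessor side of the argument.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical analogue of BUILD_BUG_ON_MSG built on C11 _Static_assert. */
#define MY_BUILD_BUG_ON_MSG(cond, msg) _Static_assert(!(cond), msg)

/* The condition text is appended to the message with the # operator. */
#define MY_BUILD_BUG_ON(cond) \
	MY_BUILD_BUG_ON_MSG(cond, "BUILD_BUG_ON failed: " #cond)

/* Expose the message text so that it can be inspected at run time. */
#define MY_BUILD_BUG_TEXT(cond) ("BUILD_BUG_ON failed: " #cond)

#define LIMIT 16

/* A wrapper macro: its argument is expanded before being passed on. */
#define CHECK_LIMIT(x) MY_BUILD_BUG_TEXT(x)

/* Condition is false, so this compiles without a diagnostic. */
MY_BUILD_BUG_ON(LIMIT > 32);
```

Used directly, MY_BUILD_BUG_TEXT(LIMIT > 32) yields the text as spelled at the call site, "LIMIT > 32". Going through CHECK_LIMIT(LIMIT > 32) yields "16 > 32", because the argument is macro-expanded before it reaches the # operator. That is the sense in which the message shows how the condition had expanded up to the point where it was passed in.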


RE: [PATCH] ARM: OMAP2+: AM33XX: clock data: fix mcasp entries

2012-10-30 Thread Hebbar, Gururaja
On Wed, Oct 31, 2012 at 01:58:32, Joel A Fernandes wrote:
> Hi Gururaja,
> 
> On Mon, Oct 29, 2012 at 10:45 AM, Hebbar, Gururaja
>  wrote:
> > Matt,
> >
> > On Wed, Oct 10, 2012 at 20:00:49, Porter, Matt wrote:
> >> 6ea74cb ARM: OMAP2+: hwmod: get rid of all omap_clk_get_by_name usage
> >> exposes a bug in the AM33XX clock data for mcasp. After moving to
> >> clk_get() usage, the _init() of all registered hwmods fails on mcasp0
> >> due to incorrect clock data causing clk_get() to fail. This causes all
> >> successive hwmods to fail to _init() leaving them in a bad state.
> >>
> >> This patch updates the mcasp clock entries so clk_get() will succeed.
> >> It is tested on BeagleBone and is needed for 3.7-rc1 to fix AM33xx
> >> boot.
> >
> >
> > I want to test Audio on AM335x Evm with your EDMA patches. I have few
> > patches for AM335x.
> > Can you share the link to the repo & branch on which I need to rebase?
> > The patches are related to mcasp dt node, mcasp pinmux in dt, etc...
> >
> 
> I was wondering about the status of following patches you wrote, not
> added to mainline yet:
> 
> (1)
>  ASoC: Davinci: machine: Add device tree binding
> https://patchwork.kernel.org/patch/1380511/  - will this be resubmitted?

There were no review comments on the V3 I submitted.

> 
> (2)
> ASoC: AM33XX: Add support for AM33xx SoC Audio
> https://github.com/joelagnel/linux-kernel/commit/973cfb48bdb70018b3869a21595bde8630efb29d

I want to re-submit both patches along with 2 more patch sets [1]. I am
waiting for Matt Porter to reply with his recent branch, so that I can do
a final test and re-submit.

[1].
arm/dts: Add tlv320aic3x codec DT data to am335x-evm.dts
arm/dts: add mcasp1 dt node to am335x-evm.dt
ASoC: davinci-mcasp: Add pinctrl support
arm/dts: AM33XX: setup pinctrl for mcasp1 on am335x-evm

I need Mark Brown's Ack for the 3rd patch. There was some discussion
about adding pinctrl support for Audio drivers. I couldn't get the final
decision taken on it. I will rebase on to brownie-asoc/for-next and submit
it today.

> 
> Are you planning on sending/resending these patches again? I could do this 
> too.
> 
> I guess all other audio patches except for audio dts stuff is already in.
> 
> Thanks,
> Joel
> 


Regards, 
Gururaja


Re: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread H. Peter Anvin

On 10/30/2012 10:22 PM, Zhang, Jun wrote:

Hello, Anvin
   You are right. Thanks!

Hello, All
   Please review it again. Thanks!

 From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
  memory type address information in order to do I/O. so only
  remove all RAM ranges which need to be dumped.

Signed-off-by: jzha144 
---
  arch/x86/kernel/e820.c |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..77be839 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 * reset.
 */
saved_max_pfn = e820_end_of_ram_pfn();
+
+   /*
+* We are doing a crash dump, so remove all RAM ranges
+* as they are the ones that need to be dumped.
+* We still need all non-RAM information in order to do I/O.
+*/
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   userdef = 1;
+   return 0;
  #endif
e820.nr_map = 0;
userdef = 1;



The code is still wrong...

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [ANNOUNCE] 3.6.4-rt11

2012-10-30 Thread Paul Gortmaker
[[ANNOUNCE] 3.6.4-rt11] On 31/10/2012 (Wed 02:19) Thomas Gleixner wrote:

> Dear RT Folks,
> 
> I'm pleased to announce the 3.6.4-rt11 release.

The rt11 content is present on master in the 3.6-rt patch repo:
  http://git.kernel.org/?p=linux/kernel/git/paulg/3.6-rt-patches.git

I've also created a v3.6.4-rt11-fixes branch, which contains a fix for
the preempt-lazy TIF test on x86_32 (please check this; I really should
not be left unattended, mucking around in .S files while on vacation).
It is queued in the series right after the preempt-lazy-support.patch.

There is also a trivial fix to remove a needless whitespace change (also
appeared in preempt-lazy-support.patch) that triggers a git nag during
a "git am" of it.

Passes quick boot test on x86_32 UP (Note: I didn't try booting without
the TIF %cl patch; not sure what would have happened there...)

[ quick link to TIF patch @kernel.org: http://goo.gl/7Gbtg ]

Paul.
--

> 
> Changes since 3.6.3-rt10:
> 
>* Crypto wreckage fix (Milan Broz)
> 
>  Another proof why copy and paste should be forbidden, but if that
>  would happen most of us would be serving time.
> 
>* Another attempt to tame SLUB
> 
>  My previous approach turned out to be too naive though this one
>  has at least held up against massive memory stress tests. It's a
>  very simple and straightforward approach now and while I'm quite
>  sure that it will not fall over as it did before, there might be
>  hidden latency issues with that new version.
> 
>   So please give it a proper testing!
> 
>* Lazy preemption
> 
>  It has become an obsession to mitigate the determinism
>  vs. throughput loss of RT. Looking at the mainline semantics of
>  preemption points gives a hint why RT sucks throughput wise for
>  ordinary SCHED_OTHER tasks. One major issue is the wakeup of
>  tasks which are right away preempting the waking task while the
>  waking task holds a lock on which the woken task will block right
>  after having preempted the waker. In mainline this is prevented
>  due to the implicit preemption disable of spin/rw_lock held
>  regions. On RT this is not possible due to the fully preemptible
>  nature of sleeping spinlocks.
> 
>  Though for a SCHED_OTHER task preempting another SCHED_OTHER task
>  this is really not a correctness issue. RT folks are concerned
>  about SCHED_FIFO/RR tasks preemption and not about the purely
>  fairness driven SCHED_OTHER preemption latencies.
> 
>  So I introduced a lazy preemption mechanism which only applies to
>  SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from
>  the existing preempt_count each task now sports a
>  preempt_lazy_count which is manipulated on lock acquisition and
>  release. This is slightly incorrect as for laziness reasons I
>  coupled this on migrate_disable/enable so some other mechanisms
>  get the same treatment (e.g. get_cpu_light).
> 
>  Now on the scheduler side instead of setting NEED_RESCHED this
>  sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER
>  preemption and therefore allows the waking task to exit the lock
>  held region before the woken task preempts. That also works
>  better for cross CPU wakeups as the other side can stay in the
>  adaptive spinning loop.
> 
>  For RT class preemption there is no change. This simply sets
>  NEED_RESCHED and forgoes the lazy preemption counter.
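
The flag selection described above can be sketched as a toy model. This is only an illustration of the announcement's wording, not the -rt scheduler code: every sketch_ name is made up here, and the model ignores preempt_count entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for scheduling policies and resched flags. */
enum sketch_policy { SKETCH_SCHED_OTHER, SKETCH_SCHED_FIFO, SKETCH_SCHED_RR };
enum sketch_flag { SKETCH_NEED_RESCHED, SKETCH_NEED_RESCHED_LAZY };

struct sketch_task {
	enum sketch_policy policy;
	int preempt_lazy_count;	/* raised while inside a lock held region */
};

/* Which flag a wakeup sets when "woken" is to preempt "curr". */
static enum sketch_flag sketch_resched_flag(const struct sketch_task *curr,
					    const struct sketch_task *woken)
{
	assert(curr && woken);
	/* only SCHED_OTHER preempting SCHED_OTHER is handled lazily */
	if (curr->policy == SKETCH_SCHED_OTHER &&
	    woken->policy == SKETCH_SCHED_OTHER)
		return SKETCH_NEED_RESCHED_LAZY;
	return SKETCH_NEED_RESCHED;
}

/* Whether a pending flag may preempt "curr" right now. */
static bool sketch_may_preempt(const struct sketch_task *curr,
			       enum sketch_flag flag)
{
	assert(curr);
	if (flag == SKETCH_NEED_RESCHED)
		return true;	/* RT class preemption ignores the lazy counter */
	return curr->preempt_lazy_count == 0;
}
```

In this model a SCHED_OTHER waker that still holds a lock (preempt_lazy_count > 0) keeps running until it leaves the lock held region, while a SCHED_FIFO wakeup preempts it immediately.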
> 
>  Initial tests do not expose any observable latency increase,
>  but history shows that I've been proven wrong before :)
> 
>  The lazy preemption mode is on by default, but with
>  CONFIG_SCHED_DEBUG enabled it can be disabled via:
> 
>  # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
> 
>  and reenabled via
> 
>  # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
> 
>  The test results so far are very machine and workload dependent,
>  but there is a clear trend that it enhances the non RT workload
>  performance.
> 
>  Please give it a try and share your experience!
> 
> Known issues:
> 
>   There is still some "softirq pending xx" fallout which I have
>   not been able to investigate yet, but that's on my top priority
>   list. It's not a critical issue and only annoys people with
>   CONFIG_NO_HZ=y configurations.
> 
> 
> The delta patch against 3.6.4-rt10 is appended below and can be found
> here:
> 
>   
> http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/incr/patch-3.6.4-rt10-rt11.patch.xz
> 
> 
> The RT patch against 3.6.4 can be found here:
> 
>   
> http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patch-3.6.4-rt11.patch.xz
> 
> The split quilt queue is available at:
> 
>   
> http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patches-3.6.4-rt11.tar.xz
> 
> Enjoy,
> 
>   tglx
> 
> ->
> 

[PART1 Patch 3/3] memory_hotplug: ensure every online node has NORMAL memory

2012-10-30 Thread Wen Congyang
From: Lai Jiangshan 

The old memory hotplug code and the new online/movable support may leave an
online node without any normal memory, but memory management behaves badly
when a node is online but has no normal memory.
Example: a task bound to such a node may fail every kernel allocation, so
it can't create new tasks or other kernel objects.

So we disallow nodes without normal memory here; we will enable them
when we are prepared.

Signed-off-by: Lai Jiangshan 
---
 mm/memory_hotplug.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e6ec8c2..b557218 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -589,6 +589,12 @@ static int online_pages_range(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
 }
 
+/* ensure every online node has NORMAL memory */
+static bool can_online_high_movable(struct zone *zone)
+{
+   return node_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+}
+
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
@@ -654,6 +660,12 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages, int online_typ
 */
zone = page_zone(pfn_to_page(pfn));
 
+   if ((zone_idx(zone) > ZONE_NORMAL || online_type == ONLINE_MOVABLE) &&
+   !can_online_high_movable(zone)) {
+   unlock_memory_hotplug();
+   return -1;
+   }
+
if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
unlock_memory_hotplug();
@@ -1058,6 +1070,30 @@ check_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn)
return offlined;
 }
 
+/* ensure the node has NORMAL memory if it is still online */
+static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
+{
+   struct pglist_data *pgdat = zone->zone_pgdat;
+   unsigned long present_pages = 0;
+   enum zone_type zt;
+
+   for (zt = 0; zt <= ZONE_NORMAL; zt++)
+   present_pages += pgdat->node_zones[zt].present_pages;
+
+   if (present_pages > nr_pages)
+   return true;
+
+   present_pages = 0;
+   for (; zt <= ZONE_MOVABLE; zt++)
+   present_pages += pgdat->node_zones[zt].present_pages;
+
+   /*
+* we can't offline the last normal memory until all
+* higher memory is offlined.
+*/
+   return present_pages == 0;
+}
+
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
@@ -1145,6 +1181,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
node = zone_to_nid(zone);
nr_pages = end_pfn - start_pfn;
 
+   ret = -EINVAL;
+   if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, 
nr_pages))
+   goto out;
+
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
if (ret)
-- 
1.8.0
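
A standalone sketch of the offline rule in the patch above, for readers who want to try the arithmetic outside the kernel. The zone enum and the present[] array are simplified assumptions standing in for pgdat->node_zones[].present_pages; the sketch_ names are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone indices, ordered from low to high like the real zones. */
enum sketch_zone { SK_DMA, SK_NORMAL, SK_HIGHMEM, SK_MOVABLE, SK_NR_ZONES };

/*
 * present[] holds the pages currently present in each zone of one node.
 * Returns true if nr_pages of normal (or lower) memory may go offline.
 */
static bool sketch_can_offline_normal(const unsigned long present[SK_NR_ZONES],
				      unsigned long nr_pages)
{
	unsigned long low = 0, high = 0;
	int zt;

	assert(present);
	for (zt = 0; zt <= SK_NORMAL; zt++)
		low += present[zt];
	if (low > nr_pages)
		return true;	/* some normal memory stays behind */

	for (zt = SK_NORMAL + 1; zt <= SK_MOVABLE; zt++)
		high += present[zt];
	/* the last normal memory may go only once all higher memory is gone */
	return high == 0;
}
```

With 100 normal and 50 movable pages present, offlining 40 normal pages is allowed but offlining all 100 is refused until the movable pages are offlined first.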



[PART1 Patch 0/3] mm, memory-hotplug: allow to online movable memory

2012-10-30 Thread Wen Congyang
From: Lai Jiangshan 

This patch is part1 of the following patchset:
https://lkml.org/lkml/2012/10/29/319

The patchset is based on Linus's tree with these three patches already applied:
https://lkml.org/lkml/2012/10/24/151
https://lkml.org/lkml/2012/10/26/150

Movable memory is a very important concept in memory management;
we need to consolidate it and make use of it on real systems.

Movable memory is needed for:
anti-fragmentation (hugepages, high-order allocations...)
logical hot-remove (virtualization, Memory Capacity on Demand)
physical hot-remove (power saving, hardware partitioning, hardware fault
management)

All of these require configuring memory dynamically and making better and
safer use of it. We also need physical hot-remove, so we need movable nodes too.
(Although some systems support physical memory migration, so that not all
memory on a physical node has to be movable, a movable node is still needed
for the logical node if we want physical migration to be transparent.)

We add the dynamic configuration commands "online_movable" and "online_kernel"
in this patchset. You can't make a movable node yet (that will be implemented
in part4).

Usage:
1. online_movable:
   echo online_movable >/sys/devices/system/memory/memoryX/state
   The memory must be offlined before doing this.
2. online_kernel:
   echo online_kernel >/sys/devices/system/memory/memoryX/state
   The memory must be offlined before doing this.
3. online:
   echo online >/sys/devices/system/memory/memoryX/state
   The memory must be offline before doing this. This operation doesn't change
   the memory's attribute: movable or normal/high

Note:
   You can only move the highest memory in the normal/high zone to the movable
   zone, and only the lowest memory in the movable zone to the normal/high zone.
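
For scripting the usage above from a program rather than a shell, a minimal sketch follows. It assumes only that the state file accepts the command as plain text, as the echo examples do; sketch_set_memory_state is a made-up helper name, and the list of accepted commands is taken from this patchset's description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one state command to a memory block's state file.
 * Returns 0 on success, -1 on failure or on an unknown command.
 */
static int sketch_set_memory_state(const char *state_path, const char *command)
{
	FILE *f;
	int ok;

	assert(state_path && command);
	/* accept only the commands named in this patchset */
	if (strcmp(command, "online") && strcmp(command, "online_kernel") &&
	    strcmp(command, "online_movable") && strcmp(command, "offline"))
		return -1;

	f = fopen(state_path, "w");
	if (!f)
		return -1;
	ok = fputs(command, f) >= 0;
	if (fclose(f) != 0)	/* a rejected write may only show up at close */
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass a path such as /sys/devices/system/memory/memoryX/state with X filled in, after offlining the block first.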

Lai Jiangshan (3):
  mm, memory-hotplug: dynamic configure movable memory and portion
memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  memory_hotplug: ensure every online node has NORMAL memory

 Documentation/memory-hotplug.txt |  14 ++-
 drivers/base/memory.c|  27 +++---
 include/linux/memory_hotplug.h   |  13 ++-
 mm/memory_hotplug.c  | 180 ++-
 4 files changed, 221 insertions(+), 13 deletions(-)

-- 
1.8.0



[PART1 Patch 2/3] memory_hotplug: handle empty zone when online_movable/online_kernel

2012-10-30 Thread Wen Congyang
From: Lai Jiangshan 

Make online_movable/online_kernel able to empty a zone
or to move memory into an empty zone.

Signed-off-by: Lai Jiangshan 
---
 mm/memory_hotplug.c | 51 +--
 1 file changed, 45 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4900025..e6ec8c2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -227,8 +227,17 @@ static void resize_zone(struct zone *zone, unsigned long 
start_pfn,
 
zone_span_writelock(zone);
 
-   zone->zone_start_pfn = start_pfn;
-   zone->spanned_pages = end_pfn - start_pfn;
+   if (end_pfn - start_pfn) {
+   zone->zone_start_pfn = start_pfn;
+   zone->spanned_pages = end_pfn - start_pfn;
+   } else {
+   /*
+* make it consist as free_area_init_core(),
+* if spanned_pages = 0, then keep start_pfn = 0
+*/
+   zone->zone_start_pfn = 0;
+   zone->spanned_pages = 0;
+   }
 
zone_span_writeunlock(zone);
 }
@@ -244,10 +253,19 @@ static void fix_zone_id(struct zone *zone, unsigned long 
start_pfn,
set_page_links(pfn_to_page(pfn), zid, nid, pfn);
 }
 
-static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_left(struct zone *z1, struct zone *z2,
unsigned long start_pfn, unsigned long end_pfn)
 {
+   int ret;
unsigned long flags;
+   unsigned long z1_start_pfn;
+
+   if (!z1->wait_table) {
+   ret = init_currently_empty_zone(z1, start_pfn,
+   end_pfn - start_pfn, MEMMAP_HOTPLUG);
+   if (ret)
+   return ret;
+   }
 
pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -261,7 +279,13 @@ static int move_pfn_range_left(struct zone *z1, struct 
zone *z2,
if (end_pfn <= z2->zone_start_pfn)
goto out_fail;
 
-   resize_zone(z1, z1->zone_start_pfn, end_pfn);
+   /* use start_pfn for z1's start_pfn if z1 is empty */
+   if (z1->spanned_pages)
+   z1_start_pfn = z1->zone_start_pfn;
+   else
+   z1_start_pfn = start_pfn;
+
+   resize_zone(z1, z1_start_pfn, end_pfn);
resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
 
pgdat_resize_unlock(z1->zone_pgdat, &flags);
@@ -274,10 +298,19 @@ out_fail:
return -1;
 }
 
-static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+static int __meminit move_pfn_range_right(struct zone *z1, struct zone *z2,
unsigned long start_pfn, unsigned long end_pfn)
 {
+   int ret;
unsigned long flags;
+   unsigned long z2_end_pfn;
+
+   if (!z2->wait_table) {
+   ret = init_currently_empty_zone(z2, start_pfn,
+   end_pfn - start_pfn, MEMMAP_HOTPLUG);
+   if (ret)
+   return ret;
+   }
 
pgdat_resize_lock(z1->zone_pgdat, &flags);
 
@@ -291,8 +324,14 @@ static int move_pfn_range_right(struct zone *z1, struct 
zone *z2,
if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
goto out_fail;
 
+   /* use end_pfn for z2's end_pfn if z2 is empty */
+   if (z2->spanned_pages)
+   z2_end_pfn = z2->zone_start_pfn + z2->spanned_pages;
+   else
+   z2_end_pfn = end_pfn;
+
resize_zone(z1, z1->zone_start_pfn, start_pfn);
-   resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+   resize_zone(z2, start_pfn, z2_end_pfn);
 
pgdat_resize_unlock(z1->zone_pgdat, &flags);
 
-- 
1.8.0
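
The empty-zone handling in the patch above can be tried in isolation with a lock-free toy model. The struct and the sketch_ functions are invented for illustration, and the range check is a simplifying assumption (the range must start exactly at the bottom of z2); only the "keep start_pfn = 0 for an empty zone" and "use start_pfn if z1 is empty" rules are taken from the patch.

```c
#include <assert.h>

/* Hypothetical model of a zone's pfn span. */
struct sketch_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

static void sketch_resize(struct sketch_zone *z, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	if (end_pfn - start_pfn) {
		z->start_pfn = start_pfn;
		z->spanned_pages = end_pfn - start_pfn;
	} else {
		/* an empty zone keeps start_pfn = 0 */
		z->start_pfn = 0;
		z->spanned_pages = 0;
	}
}

/* Move [start_pfn, end_pfn) from the bottom of z2 into its left neighbour z1. */
static int sketch_move_left(struct sketch_zone *z1, struct sketch_zone *z2,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long z2_end, z1_start;

	assert(z1 && z2 && start_pfn < end_pfn);
	z2_end = z2->start_pfn + z2->spanned_pages;
	/* assumed check: the range must be the lowest part of z2 */
	if (start_pfn != z2->start_pfn || end_pfn > z2_end)
		return -1;

	/* use start_pfn for z1's start if z1 is empty */
	z1_start = z1->spanned_pages ? z1->start_pfn : start_pfn;
	sketch_resize(z1, z1_start, end_pfn);
	sketch_resize(z2, end_pfn, z2_end);
	return 0;
}
```

Starting with an empty z1 and z2 spanning pfns 100..150, moving 100..120 gives z1 the span 100..120; moving the remaining 120..150 empties z2, which then reports start_pfn 0 and spanned_pages 0.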



[PART1 Patch 1/3] mm, memory-hotplug: dynamic configure movable memory and portion memory

2012-10-30 Thread Wen Congyang
From: Lai Jiangshan 

Add online_movable and online_kernel for logical memory hotplug.
This is the dynamic version of "movablecore" & "kernelcore".

The motivation is the same as for "movablecore" & "kernelcore", except that
the configuration happens dynamically at run time:

o   We can configure memory as kernelcore or movablecore after boot.

When userspace workload increases and we need more hugepages, we can
use "online_movable" to add memory and allow the system to use more
THP (transparent huge pages), and vice versa when kernel workload increases.

It also helps virtualization to dynamically configure host/guest memory,
to save memory (reduce waste).

Memory capacity on Demand

o   When a new node is physically onlined after boot, we need to use
"online_movable" or "online_kernel" to configure/partition it
as expected when we logically online it.

This configuration also helps with physical memory migration.

o   All the benefits of the existing "movablecore" & "kernelcore".

o   Preparing for movable nodes, which are very important for power saving,
hardware partitioning and high-availability systems (hardware fault
management).

(Note, we don't introduce movable-node here.)

Action behavior:
When a memoryblock/memorysection is onlined by "online_movable", the kernel
will not hold direct references to the pages of the memoryblock,
thus we can remove that memory at any time when needed.

When it is onlined by "online_kernel", the kernel can use it.
When it is onlined by "online", the zone type doesn't change.

Current constraints:
Only a memoryblock which is adjacent to ZONE_MOVABLE
can be onlined from ZONE_NORMAL to ZONE_MOVABLE.

Signed-off-by: Lai Jiangshan 
---
 Documentation/memory-hotplug.txt |  14 +-
 drivers/base/memory.c|  27 +++
 include/linux/memory_hotplug.h   |  13 -
 mm/memory_hotplug.c  | 101 ++-
 4 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..c6f993d 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
in the memory block.
 'state'   : read-write
 at read:  contains online/offline state of memory.
-at write: user can specify "online", "offline" command
+at write: user can specify "online_kernel",
+"online_movable", "online", "offline" command
 which will be performed on al sections in the block.
 'phys_device' : read-only: designed to show the name of physical memory
 device.  This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's 
state file as:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
+This onlining will not change the ZONE type of the target memory section,
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
 After this, section memoryXXX's state will be 'online' and the amount of
 available memory will be increased.
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..15a1dd7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long 
start_pfn,
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int 
online_type)
 {
unsigned long start_pfn;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -261,7 +261,7 @@ memory_block_action(unsigned long phys_index, unsigned long 
action)
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
 
-   ret = online_pages(start_pfn, nr_pages);
+   ret = online_pages(start_pfn, nr_pages, online_type);
break;
case MEM_OFFLINE:
ret = offline_pages(start_pfn, nr_pages);
@@ -276,7 +276,8 @@ memory_block_action(unsigned long phys_index, unsigned long 
action)
 }
 
 static int __memory_block_change_state(struct memory_block *mem,
-   unsigned long 

Re: [PATCH v4 6/9] compiler.h, bug.h: Prevent double error messages with BUILD_BUG{,_ON}

2012-10-30 Thread Daniel Santos
On 10/30/2012 11:19 AM, Borislav Petkov wrote:
> On Sun, Oct 28, 2012 at 03:57:12PM -0500, danielfsan...@att.net wrote:
>> Prior to the introduction of __attribute__((error("msg"))) in gcc 4.3,
>> creating compile-time errors required a little trickery.
>> BUILD_BUG{,_ON} uses this attribute when available to generate
>> compile-time errors, but also uses the negative-sized array trick for
>> older compilers, resulting in two error messages in some cases.  The
>> reason it's "some" cases is that as of gcc 4.4, the negative-sized array
>> will not create an error in some situations, like inline functions.
>>
>> This patch replaces the negative-sized array code with the new
>> __compiletime_error_fallback() macro which expands to the same thing
>> unless the error attribute is available, in which case it expands to
>> do{}while(0), resulting in exactly one compile-time error on all
>> versions of gcc.
>>
>> Signed-off-by: Daniel Santos 
>> ---
>>  include/linux/bug.h  |4 ++--
>>  include/linux/compiler.h |7 +++
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/bug.h b/include/linux/bug.h
>> index 03259d7..da03dc1 100644
>> --- a/include/linux/bug.h
>> +++ b/include/linux/bug.h
>> @@ -57,13 +57,13 @@ struct pt_regs;
>>   * track down.
>>   */
>>  #ifndef __OPTIMIZE__
>> -#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
>> +#define BUILD_BUG_ON(condition) __compiletime_error_fallback(condition)
>>  #else
>>  #define BUILD_BUG_ON(condition) 
>> \
>>  do {\
>>  extern void __build_bug_on_failed(void) \
>>  __compiletime_error("BUILD_BUG_ON failed"); \
>> -((void)sizeof(char[1 - 2*!!(condition)]));  \
>> +__compiletime_error_fallback(condition);\
>>  if (condition)  \
>>  __build_bug_on_failed();\
> If we're defining a fallback, shouldn't it come second? I.e.:
>
>   if (condition)
>   __build_bug_on_failed();
>   __compiletime_error_fallback(condition);
>
> Also, the error message from __build_bug_on_failed is much more
> informative:
>
> arch/x86/kernel/cpu/amd.c: In function ‘early_init_amd’:
> arch/x86/kernel/cpu/amd.c:486:2: error: call to ‘__build_bug_on_failed’ 
> declared with attribute error: BUILD_BUG_ON failed
> make[1]: *** [arch/x86/kernel/cpu/amd.o] Error 1
> make[1]: *** Waiting for unfinished jobs
> make: *** [arch/x86/kernel/cpu/] Error 2
>
> than
>
> arch/x86/kernel/cpu/amd.c: In function ‘early_init_amd’:
> arch/x86/kernel/cpu/amd.c:486:2: error: size of unnamed array is negative
> make[1]: *** [arch/x86/kernel/cpu/amd.o] Error 1
> make[1]: *** Waiting for unfinished jobs
> make: *** [arch/x86/kernel/cpu/] Error 2
Yes, the __build_bug_on_failed message is much more informative.  This
will only increase with these patches.  For example, the line

BUILD_BUG_ON(sizeof(*c) != 4);

emits this error:

arch/x86/kernel/cpu/amd.c: In function ‘early_init_amd’:
arch/x86/kernel/cpu/amd.c:486:2: error: call to
‘__build_bug_on_failed_486’ declared with attribute error: BUILD_BUG_ON
failed: sizeof(*c) != 4
make[1]: *** [arch/x86/kernel/cpu/amd.o] Error 1
make: *** [arch/x86/kernel/cpu/amd.o] Error 2

It's true that there is some redundancy in there as well as the
gibberish line number embedded in the function name, but the end of the
line spits out the exact statement that failed.

But whether the fallback comes first or the __compiletime_error
function does is a matter of aesthetics, since it's really an either/or
situation.  Either the __build_bug_on_failedxxx function will be
declared with __attribute__((error(message))) and the fallback will
expand to a no-op, or the fallback will produce code that (presumably
always?) breaks the build.  For insurance, a link-time error will occur
if the fallback code fails to break the build.

Realistically, a single macro could be defined in compiler*.h that
encapsulates the entirety of this mechanism and only exposes a "black
box" macro, that will simply expand to something that breaks the build
in the most appropriate fashion based upon the version of gcc.  In
essence, the new BUILD_BUG_ON_MSG macro attempts to fill that role.
>
> Finally, you need to do:
>
>   bool __cond = !!(condition);
>
> and use __cond so that condition doesn't get evaluated multiple times
> (possibly with side effects).
>
> Thanks.
Big problem!  Very good catch, thank you!  All good programmers know not
to use expressions that can have side effects in an assert-type macro, but
this should certainly be as dummy-proof as possible.  That will force
others to get a really *really* good dummy if they want to break it!
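
To make the double-evaluation concern concrete, here is a small userspace sketch. It is not the macro from this patch series: the SKETCH_ names, the counters and the failure hook are invented for illustration, and only the negative-sized array expression is copied from the quoted bug.h code.

```c
#include <assert.h>
#include <stdbool.h>

/* The negative-sized array trick: the build breaks if cond is true. */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

static int sketch_evals;	/* how often the condition was evaluated */
static int sketch_failures;	/* how often the failure hook ran */

/* A condition with a side effect, to expose repeated evaluation. */
static bool sketch_cond(bool value)
{
	sketch_evals++;
	return value;
}

static void sketch_failed(void)
{
	sketch_failures++;
}

/* Naive shape: the condition text appears twice, so it runs twice. */
#define SKETCH_CHECK_TWICE(condition)		\
	do {					\
		(void)(condition);		\
		if (condition)			\
			sketch_failed();	\
	} while (0)

/* Suggested shape: evaluate once into a local, then reuse the local. */
#define SKETCH_CHECK_ONCE(condition)		\
	do {					\
		bool cond_ = !!(condition);	\
		if (cond_)			\
			sketch_failed();	\
	} while (0)
```

Passing sketch_cond(...) to SKETCH_CHECK_TWICE bumps sketch_evals by two, while SKETCH_CHECK_ONCE bumps it by one, which is the behaviour the bool __cond suggestion is after. SKETCH_BUILD_BUG_ON(sizeof(char) != 1) compiles, since the condition is a false constant.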

Thank you for this! I suppose another 

Re: [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains

2012-10-30 Thread Daniel J Blueman

On 29/10/2012 18:32, Borislav Petkov wrote:

+ Andreas.

Dude, look at this boot log below:

http://quora.org/2012/16-server-boot-2.txt

That's 192 F10h's!


We were booting 384 a while back, but I'll let you know when we reach 4096!


On Mon, Oct 29, 2012 at 04:54:59PM +0800, Daniel J Blueman wrote:

A number of other callers look up the PCI device based on index
0..amd_nb_num(), but we can't easily allocate contiguous northbridge IDs
from the PCI device in the first place.


OTOH we can simplify this code by changing amd_get_node_id to generate a
linear northbridge ID from the index of the matching entry in the
northbridge array.

I'll get a patch together to see if there are any snags.


I suspected that after we have this nice approach, you guys would come
with non-contiguous node numbers. Maan, can't you build your systems so
that software people can have it easy at least for once??!


It depends on the definition of node, of course. The only change we're
considering is compliance with the Intel x2apic spec by using the
upper 16 bits of the APIC ID as the server ("cluster") ID, since there
are optimisations in Linux for this.



This really is a lot less intrusive [1] and boots well on top of
3.7-rc3 on one of our 16-server/192-core/512GB systems [2].

If you're happy with this simpler approach for now, I'll present
this and a separate patch cleaning up the inconsistent use of
unsigned and u8 node ID variables to u16?


Sure, bring it on.


Yes, I've prepared a patch series and it tests out well.


diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index b3341e9..b88fc7a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -81,6 +81,18 @@ static inline struct amd_northbridge
*node_to_amd_nb(int node)
 return (node < amd_northbridges.num) ?
&amd_northbridges.nb[node] : NULL;
  }

+static inline u8 get_node_id(struct pci_dev *pdev)
+{
+   int i;
+
+   for (i = 0; i != amd_nb_num(); i++)
+   if (pci_domain_nr(node_to_amd_nb(i)->misc->bus) ==
pci_domain_nr(pdev->bus) &&
+   PCI_SLOT(node_to_amd_nb(i)->misc->devfn) ==
PCI_SLOT(pdev->devfn))
+   return i;


Looks ok, can you send the whole patch please?


+   BUG();


I'm not sure about this - maybe WARN()? Are we absolutely sure we
unconditionally should panic after not finding an NB descriptor?


It looks like the only way we could be looking up a non-existent NB
descriptor is if the array or variable in hand was corrupted. Maybe it is
better to panic immediately than to have the debugging be elusive later.


I've tweaked this to warn and return the first Northbridge ID to avoid 
further issues, but even that isn't ideal.
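
The warn-and-fall-back variant mentioned here could look roughly like the userspace sketch below. It is an assumption about the shape, not the patch that was sent: struct sketch_nb stands in for the northbridge descriptor, and a message on stderr stands in for WARN().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one northbridge's PCI location. */
struct sketch_nb {
	int domain;	/* PCI domain of the northbridge's misc device */
	int slot;	/* PCI slot of that device */
};

/*
 * Return the index of the entry matching (domain, slot); that index
 * serves as the linear node ID. On a miss, warn and fall back to 0.
 */
static size_t sketch_get_node_id(const struct sketch_nb *nbs, size_t num,
				 int domain, int slot)
{
	size_t i;

	assert(nbs && num > 0);
	for (i = 0; i < num; i++)
		if (nbs[i].domain == domain && nbs[i].slot == slot)
			return i;

	fprintf(stderr, "no northbridge for domain %d slot %d\n", domain, slot);
	return 0;
}
```

The fallback keeps the caller running but silently aliases an unknown device to node 0, which is why it isn't ideal either.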



Btw, this shouldn't happen on those CPUs:

[   39.279131] TSC synchronization [CPU#0 -> CPU#12]:
[   39.287223] Measured 22750019569 cycles TSC warp between CPUs, turning off 
TSC clock.
[0.03] tsc: Marking TSC unstable due to check_tsc_sync_source failed

I guess TSCs are not starting at the same moment on all boards.


As these are physically separate servers (off-the-shelf servers in fact, 
a key benefit of NumaConnect), the TSC clocks diverge. Later, I'll be 
cooking up a patch series to keep them in sync, allowing fast TSC use.



You definitely need ucode on those too:

[  113.392460] microcode: CPU0: patch_level=0x


Good tip!

Thanks,
  Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia


RE: [PATCH] da8xx: Allow use by am33xx based devices

2012-10-30 Thread Manjunathappa, Prakash
Hi,

On Wed, Oct 31, 2012 at 21:26:08, Pantelis Antoniou wrote:
> This driver can be used for AM33xx devices, like the popular beaglebone.
> 
> Signed-off-by: Pantelis Antoniou 
> ---
>  drivers/video/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index 9791d10..e7868d8 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -2202,7 +2202,7 @@ config FB_SH7760
>  
>  config FB_DA8XX
>   tristate "DA8xx/OMAP-L1xx Framebuffer support"
> - depends on FB && ARCH_DAVINCI_DA8XX
> + depends on FB && (ARCH_DAVINCI_DA8XX || SOC_AM33XX)

Agreed, this is present on da8xx and am33xx, but moving forward to
support DT we should avoid these dependencies. So instead,
change this to remove the machine dependencies.

Thanks,
Prakash

>   select FB_CFB_FILLRECT
>   select FB_CFB_COPYAREA
>   select FB_CFB_IMAGEBLIT
> -- 
> 1.7.12
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
Hello, Anvin
  You are right. Thanks!

Hello, All
  Please review it again. Thanks!

From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
 memory type address information in order to do I/O. so only
 remove all RAM ranges which need to be dumped.

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..77be839 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 * reset.
 */
saved_max_pfn = e820_end_of_ram_pfn();
+
+   /*
+* We are doing a crash dump, so remove all RAM ranges
+* as they are the ones that need to be dumped.
+* We still need all non-RAM information in order to do I/O.
+*/
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   userdef = 1;
+   return 0;
 #endif
e820.nr_map = 0;
userdef = 1;
-- 
1.7.6

Best Regards!
Zhang, jun

-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 12:38 PM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 08:39 PM, Zhang, Jun wrote:
> Hello, Anvin
> Thanks!
>
> Hello, all
> Next is my latest version, please review it.
> Thanks!

You're still starting at the wrong end, which is confusing for the reader.

What you probably want to say is something more like:

"We are doing a crash dump, so remove all RAM ranges as they are the ones that 
need to be dumped.  We still need all non-RAM information in order to do I/O."

At that point it should be pretty obvious that the patch is wrong.  What if we 
are *not* doing a crash dump?  Just because crash dump is compiled in doesn't 
mean that that is what we are doing right now.
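
One way to read that objection, sketched as a toy model: whether we are in a crash dump is a runtime fact, so it has to be a runtime input rather than something implied by the config option. Everything below is hypothetical (the sketch_ names, the entry layout, and the in_crash_dump parameter); it is not a proposed fix and not the kernel's e820 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_type { SKETCH_RAM = 1, SKETCH_RESERVED = 2 };

struct sketch_entry {
	unsigned long long addr;
	unsigned long long size;
	enum sketch_type type;
};

/* Drop every entry of the given type in place; returns the new count. */
static size_t sketch_remove_type(struct sketch_entry *map, size_t n,
				 enum sketch_type type)
{
	size_t i, kept = 0;

	assert(map || n == 0);
	for (i = 0; i < n; i++)
		if (map[i].type != type)
			map[kept++] = map[i];
	return kept;
}

/*
 * Model of resetting the map before user-defined ranges are added.
 * in_crash_dump must be decided at run time by the caller.
 */
static size_t sketch_reset_map(struct sketch_entry *map, size_t n,
			       bool in_crash_dump)
{
	if (in_crash_dump)
		return sketch_remove_type(map, n, SKETCH_RAM);	/* keep non-RAM */
	return 0;	/* otherwise start from an empty map, as before */
}
```

With the flag false the map is emptied as in the unpatched code; with it true only the RAM entries go away and the reserved ranges needed for I/O survive.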

-hpa

>  From 141546c77ff7be523a9e72f5259df4a6827f2c1a Mon Sep 17 00:00:00 
> 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] If we are doing a crash dump, we still need non-E820_RAM
>   memory type address information, which come from BIOS or
>   firmware. for example: PCI_MMCONFIG check this address.
>
> Signed-off-by: jzha144 
> ---
>   arch/x86/kernel/e820.c |9 +
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 
> df06ade..f8672d0 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
>* reset.
>*/
>   saved_max_pfn = e820_end_of_ram_pfn();
> +
> + /*
> +  * If we are doing a crash dump, we still need non-E820_RAM
> +  * memory type address information. so we only remove
> +  * E820_RAM type.
> +  */
> + e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
> + userdef = 1;
> + return 0;
>   #endif
>   e820.nr_map = 0;
>   userdef = 1;
>


--
H. Peter Anvin, Intel Open Source Technology Center I work for Intel.  I don't 
speak on their behalf.


Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store

2012-10-30 Thread Namhyung Kim
On Mon, 29 Oct 2012 16:15:48 +0100, Stephane Eranian wrote:
> This patch adds support for PEBS Precise Store
> which is available on Intel Sandy Bridge and
> Ivy Bridge processors.
>
> To use Precise store, the proper PEBS event
> must be used: mem_trans_retired:precise_stores.
> For the perf tool, the generic mem-stores event
> exported via sysfs can be used directly.

Just trivial nitpicks..

>
> Signed-off-by: Stephane Eranian 
> ---
[snip]
> @@ -486,6 +524,7 @@ struct event_constraint 
> intel_snb_pebs_event_constraints[] = {
>   INTEL_EVENT_CONSTRAINT(0xc4, 0xf),/* BR_INST_RETIRED.* */
>   INTEL_EVENT_CONSTRAINT(0xc5, 0xf),/* BR_MISP_RETIRED.* */
>   INTEL_PLD_CONSTRAINT(0x01cd, 0x8),/* 
> MEM_TRANS_RETIRED.LAT_ABOVE_THR */
> + INTEL_PST_CONSTRAINT(0x02cd, 0x8),/* 
> MEM_TRANS_RETIRED.PRECISE_STORES */
>   INTEL_EVENT_CONSTRAINT(0xd0, 0xf),/* MEM_UOP_RETIRED.* */
>   INTEL_EVENT_CONSTRAINT(0xd1, 0xf),/* MEM_LOAD_UOPS_RETIRED.* */
>   INTEL_EVENT_CONSTRAINT(0xd2, 0xf),/* 
> MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
> @@ -500,6 +539,7 @@ struct event_constraint 
> intel_ivb_pebs_event_constraints[] = {
>  INTEL_EVENT_CONSTRAINT(0xc4, 0xf),/* BR_INST_RETIRED.* */
>  INTEL_EVENT_CONSTRAINT(0xc5, 0xf),/* BR_MISP_RETIRED.* */
>  INTEL_PLD_CONSTRAINT(0x01cd, 0x8),/* 
> MEM_TRANS_RETIRED.LAT_ABOVE_THR */
> + INTEL_PST_CONSTRAINT(0x02cd, 0x8),/* 
> MEM_TRANS_RETIRED.PRECISE_STORES */

Whitespace damaged?  Oh, it seems the existing entries are already indented with spaces.


>  INTEL_EVENT_CONSTRAINT(0xd0, 0xf),/* MEM_UOP_RETIRED.* */
>  INTEL_EVENT_CONSTRAINT(0xd1, 0xf),/* MEM_LOAD_UOPS_RETIRED.* */
>  INTEL_EVENT_CONSTRAINT(0xd2, 0xf),/* 
> MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
[snip]
> @@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event 
> *event,
>   /*
>* if PEBS-LL or PreciseStore
>*/
> - if (fll) {
> + if (fll || fst) {
>   if (sample_type & PERF_SAMPLE_ADDR)
>   data.addr = pebs->dla;
>  
> @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event 
> *event,
>   if (sample_type & PERF_SAMPLE_DSRC) {
>   if (fll)
>   data.dsrc.val = load_latency_data(pebs->dse);
> + else if (fst)

Looks like it can be converted to a plain 'else'.
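
To spell out the reasoning, here is a small user-space sketch (not the
perf code itself).  It assumes the PERF_SAMPLE_DSRC check sits inside the
`if (fll || fst)` block from the earlier hunk; the two sketch_*_dsrc()
helpers are made-up stand-ins for the real decoders.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the load-latency and precise-store decoders. */
uint64_t sketch_load_dsrc(uint64_t dse)  { return dse | 0x100; }
uint64_t sketch_store_dsrc(uint64_t dse) { return dse | 0x200; }

/* As posted: the inner test repeats what the outer condition guarantees. */
uint64_t dsrc_else_if(int fll, int fst, uint64_t dse)
{
	uint64_t val = 0;

	if (fll || fst) {
		if (fll)
			val = sketch_load_dsrc(dse);
		else if (fst)
			val = sketch_store_dsrc(dse);
	}
	return val;
}

/* Suggested: inside the block, !fll implies fst, so a plain else is enough. */
uint64_t dsrc_else(int fll, int fst, uint64_t dse)
{
	uint64_t val = 0;

	if (fll || fst) {
		if (fll)
			val = sketch_load_dsrc(dse);
		else
			val = sketch_store_dsrc(dse);
	}
	return val;
}
```

Both functions return the same value for every combination of fll and fst,
which is why the extra test looks redundant to me.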

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the arm-soc tree with the staging tree

2012-10-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/mach-omap2/drm.c between commit 5e3b08749951 ("staging:
drm/omap: add support for ARCH_MULTIPLATFORM") from the staging tree and
commit 2a296c8f89bc ("ARM: OMAP: Make plat/omap_hwmod.h local to
mach-omap2") from the arm-soc tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/arm/mach-omap2/drm.c
index 49a7ffb,6282cc8..000
--- a/arch/arm/mach-omap2/drm.c
+++ b/arch/arm/mach-omap2/drm.c
@@@ -23,11 -23,9 +23,11 @@@
  #include 
  #include 
  #include 
 +#include 
  
- #include 
- #include 
+ #include "omap_device.h"
+ #include "omap_hwmod.h"
 +#include 
  
  #if defined(CONFIG_DRM_OMAP) || (CONFIG_DRM_OMAP_MODULE)
  




RE: [PATCH] omap2-clk: Add missing lcdc clock definition

2012-10-30 Thread Hiremath, Vaibhav
On Wed, Oct 31, 2012 at 04:56:40, Paul Walmsley wrote:
> + Vaibhav Hiremath
> 
> On Tue, 30 Oct 2012, Tony Lindgren wrote:
> 
> > * Pantelis Antoniou  [121030 11:04]:
> > > Looks like the lcdc clock definition got dropped.
> > > It is required for the LCD controller to work. Reintroduce.
> > 
> > This looks like a regression, can you also add the commit
> > causing it?
> 
> Looks like probably a new "feature," in that this clock didn't exist in 
> the original check-in.  Would be good to get Vaibhav's opinion on this; 
> also the common clock patches will need to be updated.
> 

Thanks, Paul, for looping me in. Something went wrong with my l-o subscription,
so I didn't receive these patches.

As far as the lcdc clock node is concerned, we deliberately dropped all
leaf-node clocks from the clock tree; please refer to the description in:
http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/101987.html


From the LCDC driver's perspective, the driver is using,

fb_clk = clk_get(>dev, NULL);

I feel this needs to be corrected to use a valid name as per the spec (I would
mostly vote for "fck"), and then every platform should make sure that it
returns a valid clock node for it.

The change in the driver would be,

fb_clk = clk_get(>dev, "fck");


Thanks,
Vaibhav

> 
> - Paul
> 



Re: [PATCH 2/5] mm, highmem: remove useless pool_lock

2012-10-30 Thread Minchan Kim
Hi Andrew,

On Tue, Oct 30, 2012 at 02:31:07PM -0700, Andrew Morton wrote:
> On Mon, 29 Oct 2012 04:12:53 +0900
> Joonsoo Kim  wrote:
> 
> > The pool_lock protects the page_address_pool from concurrent access.
> > But, access to the page_address_pool is already protected by kmap_lock.
> > So remove it.
> 
> Well, there's a set_page_address() call in mm/page_alloc.c which
> doesn't have lock_kmap().  it doesn't *need* lock_kmap() because it's
> init-time code and we're running single-threaded there.  I hope!
> 
> But this exception should be double-checked and mentioned in the
> changelog, please.  And it's a reason why we can't add
> assert_spin_locked(&kmap_lock) to set_page_address(), which is
> unfortunate.
> 

The exception is valid only on m68k and sparc, and they use page->virtual
rather than the set_page_address() in highmem.c. So I think we can add
such a lock check to set_page_address() in highmem.c.

But I'm not sure we really need it: set_page_address() is used in only a
few places, so wouldn't a comment like this be enough, avoiding
unnecessary overhead?

/* NOTE : Caller should hold kmap_lock by lock_kmap() */

> 
> The irq-disabling in this code is odd.  If ARCH_NEEDS_KMAP_HIGH_GET=n,
> we didn't need irq-safe locking in set_page_address().  I guess we'll

Which lock do you mean in set_page_address()?
There are two locks in there, pool_lock and pas->lock.
With this patchset we no longer need pool_lock.
What remains is pas->lock.

If we make that lock irq-unsafe, it could deadlock with page_address()
if that is called in irq context. Currently, page_address() is used in
lots of places and I'm not sure it is called only from process context.
Is there any rule that page_address() may only be used in
process context?

> need to retain it in page_address() - I expect some callers have IRQs
> disabled.
> 
> 
> ARCH_NEEDS_KMAP_HIGH_GET is a nasty looking thing.  It's ARM:
> 
> /*
>  * The reason for kmap_high_get() is to ensure that the currently kmap'd
>  * page usage count does not decrease to zero while we're using its
>  * existing virtual mapping in an atomic context.  With a VIVT cache this
>  * is essential to do, but with a VIPT cache this is only an optimization
>  * so not to pay the price of establishing a second mapping if an existing
>  * one can be used.  However, on platforms without hardware TLB maintenance
>  * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
>  * the locking involved must also disable IRQs which is incompatible with
>  * the IPI mechanism used by global TLB operations.
>  */
> #define ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
> #undef ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
> #error "The sum of features in your kernel config cannot be supported 
> together"
> #endif
> #endif
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim


Re: [PATCH]Documentation: Chinese translation of Documentation/arm/kernel_user_helpers.txt

2012-10-30 Thread Dongsheng Song
Hi,

There is a misleading passage in the translation:

 origin  
User space is expected to bypass those helpers and implement those things
inline (either in the code emitted directly by the compiler, or part of
the implementation of a library call) when optimizing for a recent enough
processor that has the necessary native support, but only if resulting
binaries are already to be incompatible with earlier ARM processors due to
usage of similar native instructions for other things.  In other words
don't make binaries unable to run on earlier processors just for the sake
of not using these kernel helpers if your compiled code is not going to
use new instructions for other purpose.

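 Fu Wei (rendered back into English)  
When optimizing code for recent processors that have basic native support,
user space had better bypass these helpers and implement these operations
in inline functions (either placed directly in the code by the compiler, or
as part of the implementation of a library call), and use these helpers
only when other processors' use of similar native instructions has already
made the resulting binaries incompatible with earlier ARM processors.  In
other words, if your compiled code is not going to use new instructions for
other purposes, do not make binaries unable to run on earlier processors
just for the sake of not using these kernel helpers.
 me (rendered back into English)  
When optimizing code for recent processors that have native support, user
space had better bypass these helpers and implement these operations inline
(either emitted directly in the code by the compiler, or as part of the
implementation of a library call), provided that similar new instructions
are already used in other code, so that the resulting binaries are already
incompatible with earlier ARM processors.  In other words, if your compiled
code is not going to use new instructions for other purposes, do not make
binaries unable to run on earlier processors just for the sake of not using
these kernel helpers.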

The attachment is my review result.

--
Dongsheng
Chinese translated version of Documentation/arm/kernel_user_helpers.txt

If you have any comment or update to the content, please contact the
original document maintainer directly.  However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help.  Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.

Maintainer: Nicolas Pitre 
Dave Martin 
Chinese maintainer: Fu Wei 
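---------------------------------------------------------------------
Chinese translation of Documentation/arm/kernel_user_helpers.txt

If you would like to comment on or update the content of this document,
please contact the maintainer of the original document directly.  If you
have difficulty communicating in English, you can also ask the maintainer
of the Chinese version for help.  If this translation is out of date or
has problems, please contact the maintainer of the Chinese version.
English maintainers: Nicolas Pitre 
                     Dave Martin 
Chinese maintainer: 傅炜 Fu Wei 
Chinese translator: 傅炜 Fu Wei 
Chinese proofreader: 宋冬生 Dongsheng Song 


The main text follows
---------------------------------------------------------------------
Kernel-provided User Helpers
============================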

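These are segments of kernel-provided user code reachable from user space
at a fixed address in kernel memory.  They provide user space with some
operations that require kernel help because of unimplemented native
features and/or instructions in many ARM CPUs.  The idea is for this code
to be executed directly in user mode for best efficiency, but it is too
intimate with its kernel counterpart to be left to user libraries.  In
fact, this code might even differ from one CPU to another depending on the
available instruction set, or on whether it is an SMP system.  In other
words, the kernel reserves the right to change this code as needed without
warning.  Only the entry points and their results as documented here are
guaranteed to be stable.

This is different from (but does not preclude) a full-blown VDSO
implementation; however, a VDSO would prevent some assembly tricks with
constants that allow for efficient branching to those code segments.  And
since those code segments only use a few cycles before returning to user
code, a VDSO indirect far call would add a measurable overhead to such
minimalistic operations.

User space is expected to bypass those helpers and implement those things
inline (either in the code emitted directly by the compiler, or as part of
the implementation of a library call) when optimizing for a recent enough
processor that has the necessary native support, but only if the resulting
binaries are already going to be incompatible with earlier ARM processors
due to the use of similar native instructions for other things.  In other
words, don't make binaries unable to run on earlier processors just for the
sake of not using these kernel helpers if your compiled code is not going
to use new instructions for other purposes.

New helpers may be added over time, so an older kernel may be missing some
helpers present in a newer kernel.  For this reason, programs must check
the value of __kuser_helper_version (see below) before assuming that it is
safe to call any particular helper.  This check should ideally be performed
only once at process startup time, and execution aborted early if the
required helpers are not provided by the kernel version the process is
running on.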

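kuser_helper_version
--------------------

Location:	0xffff0ffc

Reference declaration:

  extern int32_t __kuser_helper_version;

Definition:

  This field contains the version number of the helpers implemented by the
  running kernel.  User space may read it to determine whether a
  particular helper is available.

Usage example:

#define __kuser_helper_version (*(int32_t *)0xffff0ffc)

void check_kuser_version(void)
{
	if (__kuser_helper_version < 2) {
		fprintf(stderr, "can't do atomic operations, kernel too old\n");
		abort();
	}
}

Notes:

  User space may assume that the value of this field never changes
  during the lifetime of any single process.  This means that this
  field can be read once during the initialisation of a library or
  the startup phase of a program.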

kuser_get_tls
-------------

Location:	0xffff0fe0

Reference prototype:

  void * __kuser_get_tls(void);

Input:

  lr = return address

Output:

  r0 = TLS value

Clobbered registers:

  none

Definition:

  Get the TLS value as previously set via the __ARM_NR_set_tls syscall.

Usage example:

typedef void * (__kuser_get_tls_t)(void);
#define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0)

void foo()
{
	void *tls = __kuser_get_tls();
	printf("TLS = %p\n", tls);
}

Notes:

  - Valid only if __kuser_helper_version >= 1 (from kernel version 2.6.12).

kuser_cmpxchg
-------------

Location:	0xffff0fc0

Reference prototype:

  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);

Input:

  r0 = oldval
  r1 = newval
  r2 = ptr
  lr = return address

Output:

  r0 = success code (zero or non-zero)
  C flag = set if r0 == 0, clear if r0 != 0

Clobbered registers:

  r3, ip, flags

Definition:

  Atomically store newval in *ptr only if *ptr is equal to oldval.
  Return zero if *ptr was changed or non-zero if no exchange happened.
  The C flag is also set if *ptr was changed to allow for assembly
  optimization in the calling code.

Usage example:

typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)

int atomic_add(volatile int *ptr, int val)
{
	int old, new;

	do {
		old = *ptr;
		new = old + val;
	} while(__kuser_cmpxchg(old, new, ptr));

	return new;
}

Notes:

  - This routine already includes memory barriers as needed.

  - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).

kuser_memory_barrier
--------------------

Location:	0xffff0fa0

Reference prototype:

  void __kuser_memory_barrier(void);

Input:

  lr = return address


Re: [PATCH 2/2] therma: exynos: Supports thermal tripping

2012-10-30 Thread jonghwa3 . lee
On 2012-10-30 14:32, Jonghwan Choi wrote:
> TMU urgently sends active-high signal (thermal trip) to PMU,
> and thermal tripping by hardware logic i.e PMU is performed.
> Thermal tripping means that PMU cut off the whole power of SoC
> by controlling external voltage regulator.
>
> Signed-off-by: Jonghwan Choi 
> ---
>  drivers/thermal/exynos_thermal.c |7 ++-
>  include/linux/platform_data/exynos_thermal.h |4 
>  2 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/thermal/exynos_thermal.c
> b/drivers/thermal/exynos_thermal.c
> index 6ce6667..edac601 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -53,6 +53,7 @@
>  #define EXYNOS_TMU_TRIM_TEMP_MASK  0xff
>  #define EXYNOS_TMU_GAIN_SHIFT  8
>  #define EXYNOS_TMU_REF_VOLTAGE_SHIFT   24
> +#define EXYNOS_TMU_TRIP_EN BIT(12)
>  #define EXYNOS_TMU_CORE_ON 1
>  #define EXYNOS_TMU_CORE_OFF0
>  #define EXYNOS_TMU_DEF_CODE_TO_TEMP_OFFSET 50
> @@ -656,6 +657,8 @@ static void exynos_tmu_control(struct platform_device
> *pdev, bool on)
> if (data->soc == SOC_ARCH_EXYNOS) {
> con |= pdata->noise_cancel_mode <<
> EXYNOS_TMU_TRIP_MODE_SHIFT;
> con |= (EXYNOS_MUX_ADDR_VALUE << EXYNOS_MUX_ADDR_SHIFT);
> +   if (pdata->trip_en)
> +   con |= EXYNOS_THERMAL_TRIP_EN;
> }
>
> if (on) {
> @@ -762,10 +765,12 @@ static struct exynos_tmu_platform_data const
> exynos_default_tmu_data = {
> .trigger_levels[0] = 85,
> .trigger_levels[1] = 103,
> .trigger_levels[2] = 110,
> +   .trigger_levels[3] = 120,
> .trigger_level0_en = 1,
> .trigger_level1_en = 1,
> .trigger_level2_en = 1,
> -   .trigger_level3_en = 0,
> +   .trigger_level3_en = 1,
> +   .trip_en = 1,
> .gain = 8,
> .reference_voltage = 16,
> .noise_cancel_mode = 4,
> diff --git a/include/linux/platform_data/exynos_thermal.h
> b/include/linux/platform_data/exynos_thermal.h
> index a7bdb2f..9e44aac 100644
> --- a/include/linux/platform_data/exynos_thermal.h
> +++ b/include/linux/platform_data/exynos_thermal.h
> @@ -79,6 +79,9 @@ struct freq_clip_table {
>   * @trigger_level3_en:
>   * 1 = enable trigger_level3 interrupt,
>   * 0 = disable trigger_level3 interrupt
> + * @trip_en:
> + * 1 = enable thermal tripping
> + * 0 = disable thermal tripping
>   * @gain: gain of amplifier in the positive-TC generator block
>   * 0 <= gain <= 15
>   * @reference_voltage: reference voltage of amplifier
> @@ -102,6 +105,7 @@ struct exynos_tmu_platform_data {
> bool trigger_level1_en;
> bool trigger_level2_en;
> bool trigger_level3_en;
> +   bool trip_en;
I don't think this new variable is necessary. The trigger_level3_en
variable is enough to decide whether hardware thermal tripping is enabled.
The exynos4210 has a different register map from the other Exynos SoCs, so
the trigger level 3 enable bit lives in a different place: the exynos4210
has it in the interrupt enable register, while the others have it in the
TMU core control register (exynos4x12 and exynos5 have no
trigger_level3_en bit in the interrupt enable register).
So, for compatibility, it is better to use the trigger_level3_en variable
to enable hardware thermal tripping support.
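
A rough user-space sketch of what I mean, not a tested driver change.
Only EXYNOS_TMU_TRIP_EN (bit 12) is taken from the patch; the sketch_*
names are made up for illustration, and the exynos4210 interrupt enable
register handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)              (1u << (n))
#define EXYNOS_TMU_TRIP_EN  BIT(12)	/* value taken from the patch */

/* Hypothetical SoC split: exynos4210 versus everything else. */
enum sketch_soc { SKETCH_SOC_EXYNOS4210, SKETCH_SOC_EXYNOS };

/* Hypothetical, cut-down platform data. */
struct sketch_pdata {
	bool trigger_level3_en;
};

/* Return the TMU core control value with hardware tripping applied. */
uint32_t sketch_apply_trip(enum sketch_soc soc,
			   const struct sketch_pdata *pdata, uint32_t con)
{
	/*
	 * On exynos4210 the level 3 enable lives in the interrupt enable
	 * register, so the core control value is left untouched there.
	 */
	if (soc == SKETCH_SOC_EXYNOS && pdata->trigger_level3_en)
		con |= EXYNOS_TMU_TRIP_EN;
	return con;
}
```

With this shape, trip_en is not needed in the platform data at all.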

Thanks,
Jonghwa Lee.
>
> u8 gain;
> u8 reference_voltage;
> --
> 1.7.4.1
>



Re: [PATCH RFT] leds: blinkm: Don't init a workqueue per brightness_set call

2012-10-30 Thread Bryan Wu
On Tue, Oct 30, 2012 at 8:49 PM, Axel Lin  wrote:
>
>
> 2012/10/29 Axel Lin 
>>
>> Calling INIT_WORK in blinkm_led_common_set() means we init a workqueue
>> every time
>> when brightness_set callback is called.
>> Move INIT_WORK to blinkm_probe() so we only need to init the workqueue
>> once.
>>
>> So we only need to init a workqueue per blinkm led rather than init a
>> workqueue
>> per brightness_set call.
>
>
> Ah, the commit log and subject line are wrong.
> We are using the global workqueue.
> What the patch does is avoid initializing the work item, rather than a
> workqueue, on every brightness_set call.
> We just need to call INIT_WORK once, the first time the structure is set
> up.
>
> There should be no functional change with this patch.
>
> Regards,
> Axel

Right, I think I made the same review comment when this driver was first
posted to the list. It is probably a hardware issue, so let's have
Jan-Simon give us an update.

If the hardware requires this specific handling, I suggest we add some
comments or documents to clarify that.

Thanks,
-Bryan


Re: [GIT PULL] target fixes for v3.7-rc4

2012-10-30 Thread Nicholas A. Bellinger
On Tue, 2012-10-30 at 21:29 -0700, Nicholas A. Bellinger wrote:
> Hello Linus!
> 
> The following are the current target pending fixes headed for v3.7-rc4
> code.  This includes the following highlights:
> 
> - Fix long-standing qla2xxx target bug where certain fc_port_t state
> transitions could cause the internal session b-tree list to become
> out-of-sync. (Roland)
> - Fix task management double free of se_cmd descriptor in exception path
> for users of target_submit_tmr(). (nab)
> - Re-introduce simple NOP emulation of REZERO_UNIT, SEEK_6, and SEEK_10
> SCSI-2 commands in order to support legacy initiators that still require
> them.  (Bernhard)
> 
> Note these three patches are also CC'ed to stable.
> 
> Also, there are a couple of outstanding (external) regressions that are
> still being tracked down for the tcm_fc (FCoE) and tcm_vhost fabrics in
> v3.7.0 code, so please expect another PULL once these issues are
> identified and resolved.
> 

Whooops, forgot to include the usual location for an rc-fixes PULL:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master

Thanks !

--nab



Re: [PATCH] ti_tscadc: Match mfd sub devices to regmap interface

2012-10-30 Thread Russ Dill
On Wed, Oct 31, 2012 at 8:55 AM, Pantelis Antoniou
 wrote:
> The MFD parent device now uses a regmap, instead of direct
> memory access. Use the same method in the sub devices to avoid
> nasty surprises.
>
> Also rework the channel initialization of tiadc a bit.
>
> Signed-off-by: Pantelis Antoniou 
> ---
>  drivers/iio/adc/ti_am335x_adc.c   | 27 +++
>  drivers/input/touchscreen/ti_am335x_tsc.c | 16 +---
>  drivers/mfd/ti_am335x_tscadc.c|  7 +--
>  3 files changed, 37 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c
> index d48fd79..5f325c1 100644
> --- a/drivers/iio/adc/ti_am335x_adc.c
> +++ b/drivers/iio/adc/ti_am335x_adc.c
> @@ -23,7 +23,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
> +#include 
>  #include 
>  #include 
>
> @@ -36,13 +38,17 @@ struct tiadc_device {
>
>  static unsigned int tiadc_readl(struct tiadc_device *adc, unsigned int reg)
>  {
> -   return readl(adc->mfd_tscadc->tscadc_base + reg);
> +   unsigned int val;
> +
> +   val = (unsigned int)-1;
> +   regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, &val);
> +   return val;
>  }

Would it be cleaner to instead do:

static unsigned int tiadc_readl(struct tiadc_device *adc, unsigned int reg)
{
   unsigned int val;

   return regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, &val) ? : val;
}

or
   int ret;

   ret = regmap_read(adc->mfd_tscadc->regmap_tscadc, reg, );
   return ret < 0 ret ? : val;
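
For what it's worth, here is the explicit form of the suggestion above
written out as a small user-space sketch.  sketch_regmap_read() is a
made-up stand-in for regmap_read() (0 on success, negative errno on
failure), and the register contents are invented.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for regmap_read(): returns 0 on success or a
 * negative errno, and stores the register contents through val.
 */
int sketch_regmap_read(unsigned int reg, unsigned int *val)
{
	if (reg > 0xff)
		return -EINVAL;
	*val = 0x1000 + reg;	/* made-up register contents */
	return 0;
}

/* Read a register, handing back the error code if the read failed. */
unsigned int sketch_readl(unsigned int reg)
{
	unsigned int val;
	int ret;

	ret = sketch_regmap_read(reg, &val);
	/* The return type is unsigned, so the errno is converted on the way out. */
	return ret < 0 ? (unsigned int)ret : val;
}
```

Note that the error comes back as a large unsigned value, so a caller that
cares would have to compare against (unsigned int)-EINVAL and friends.
Whether that is acceptable for these helpers is a judgment call.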



>  static void tiadc_writel(struct tiadc_device *adc, unsigned int reg,
> unsigned int val)
>  {
> -   writel(val, adc->mfd_tscadc->tscadc_base + reg);
> +   regmap_write(adc->mfd_tscadc->regmap_tscadc, reg, val);
>  }
>
>  static void tiadc_step_config(struct tiadc_device *adc_dev)
> @@ -75,22 +81,24 @@ static void tiadc_step_config(struct tiadc_device 
> *adc_dev)
> tiadc_writel(adc_dev, REG_SE, STPENB_STEPENB);
>  }
>
> -static int tiadc_channel_init(struct iio_dev *indio_dev, int channels)
> +static int tiadc_channel_init(struct iio_dev *indio_dev,
> +   struct tiadc_device *adc_dev)
>  {
> struct iio_chan_spec *chan_array;
> struct iio_chan_spec *chan;
> char *s;
> int i, len, size, ret;
> +   int channels = adc_dev->channels;
>
> -   size = indio_dev->num_channels * (sizeof(struct iio_chan_spec) + 6);
> +   size = channels * (sizeof(struct iio_chan_spec) + 6);
> chan_array = kzalloc(size, GFP_KERNEL);
> if (chan_array == NULL)
> return -ENOMEM;
>
> /* buffer space is after the array */
> -   s = (char *)(chan_array + indio_dev->num_channels);
> +   s = (char *)(chan_array + channels);
> chan = chan_array;
> -   for (i = 0; i < indio_dev->num_channels; i++, chan++, s += len + 1) {
> +   for (i = 0; i < channels; i++, chan++, s += len + 1) {
>
> len = sprintf(s, "AIN%d", i);
>
> @@ -105,8 +113,9 @@ static int tiadc_channel_init(struct iio_dev *indio_dev, 
> int channels)
> }
>
> indio_dev->channels = chan_array;
> +   indio_dev->num_channels = channels;
>
> -   size = (indio_dev->num_channels + 1) * sizeof(struct iio_map);
> +   size = (channels + 1) * sizeof(struct iio_map);
> adc_dev->map = kzalloc(size, GFP_KERNEL);
> if (adc_dev->map == NULL) {
> kfree(chan_array);
> @@ -203,7 +212,7 @@ static int __devinit tiadc_probe(struct platform_device 
> *pdev)
>
> tiadc_step_config(adc_dev);
>
> -   err = tiadc_channel_init(indio_dev, adc_dev->channels);
> +   err = tiadc_channel_init(indio_dev, adc_dev);
> if (err < 0)
> goto err_free_device;
>
> @@ -213,6 +222,8 @@ static int __devinit tiadc_probe(struct platform_device 
> *pdev)
>
> platform_set_drvdata(pdev, indio_dev);
>
> +   dev_info(&pdev->dev, "Initialized\n");
> +
> return 0;
>
>  err_free_channels:
> diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
> b/drivers/input/touchscreen/ti_am335x_tsc.c
> index 7a26810..d09e1a7 100644
> --- a/drivers/input/touchscreen/ti_am335x_tsc.c
> +++ b/drivers/input/touchscreen/ti_am335x_tsc.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>
> @@ -64,13 +65,17 @@ struct titsc {
>
>  static unsigned int titsc_readl(struct titsc *ts, unsigned int reg)
>  {
> -   return readl(ts->mfd_tscadc->tscadc_base + reg);
> +   unsigned int val;
> +
> +   val = (unsigned int)-1;
> +   regmap_read(ts->mfd_tscadc->regmap_tscadc, reg, &val);
> +   return val;
>  }
>
>  static void titsc_writel(struct titsc *tsc, unsigned int reg,
> unsigned int val)
>  {
> -   writel(val, tsc->mfd_tscadc->tscadc_base + reg);
> +   regmap_write(tsc->mfd_tscadc->regmap_tscadc, reg, val);
>  }
>
>  /*
> @@ -455,10 

Re: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread H. Peter Anvin

On 10/30/2012 08:39 PM, Zhang, Jun wrote:

Hello, Anvin
Thanks!

Hello, all
Below is my latest version; please review it.
Thanks!


You're still starting at the wrong end, which is confusing for the reader.

What you probably want to say is something more like:

"We are doing a crash dump, so remove all RAM ranges as they are the 
ones that need to be dumped.  We still need all non-RAM information in 
order to do I/O."


At that point it should be pretty obvious that the patch is wrong.  What 
if we are *not* doing a crash dump?  Just because crash dump is compiled 
in doesn't mean that that is what we are doing right now.
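
To make that objection concrete, here is a rough user-space sketch (not
kernel code) of the behaviour the description implies.  The sketch_*
names and the doing_crash_dump flag are made up for illustration; how the
kernel would actually detect a crash-dump boot is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory map; these types are hypothetical, not the kernel's e820 code. */
enum sketch_type { SKETCH_RAM = 1, SKETCH_RESERVED = 2 };

struct sketch_entry {
	unsigned long long addr;
	unsigned long long size;
	enum sketch_type type;
};

/*
 * Drop every entry of the given type, keeping the rest in order.
 * Returns the new number of entries.
 */
size_t sketch_remove_type(struct sketch_entry *map, size_t n,
			  enum sketch_type type)
{
	size_t i, kept = 0;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (map[i].type != type)
			map[kept++] = map[i];
	}
	return kept;
}

/* Only strip RAM when this boot really is a crash-dump kernel. */
size_t sketch_prepare_map(struct sketch_entry *map, size_t n,
			  int doing_crash_dump)
{
	if (!doing_crash_dump)
		return 0;	/* existing behaviour: discard the whole map */
	return sketch_remove_type(map, n, SKETCH_RAM);
}
```

The point of the second function is that the RAM-only removal is gated on
a run-time condition, not on whether crash dump support was compiled in.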


-hpa


 From 141546c77ff7be523a9e72f5259df4a6827f2c1a Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] If we are doing a crash dump, we still need non-E820_RAM
  memory type address information, which come from BIOS or
  firmware. for example: PCI_MMCONFIG check this address.

Signed-off-by: jzha144 
---
  arch/x86/kernel/e820.c |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..f8672d0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 * reset.
 */
saved_max_pfn = e820_end_of_ram_pfn();
+
+   /*
+* If we are doing a crash dump, we still need non-E820_RAM
+* memory type address information. so we only remove
+* E820_RAM type.
+*/
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   userdef = 1;
+   return 0;
  #endif
e820.nr_map = 0;
userdef = 1;




--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



[git pull] drm nouveau fixes

2012-10-30 Thread Dave Airlie

Hi Linus,

Just a nouveau set. Since we have a couple of reports on lkml/dri-devel of 
regressions that this should fix, I'm sending it along on its own.

Dave.

The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64:

  Linux 3.7-rc3 (2012-10-28 12:24:48 -0700)

are available in the git repository at:
  git://people.freedesktop.org/~airlied/linux drm-fixes

Ben Skeggs (6):
  drm/nouveau: silence modesetting spam on pre-gf8 chipsets
  drm/nouveau/i2c: fix typo when checking nvio i2c port validity
  drm/nouveau: allow creation of zero-sized mm
  drm/nv50/fb: prevent oops on chipsets without compression tags
  drm/nouveau: resurrect headless mode since rework
  drm/nouveau: headless mode by default if pci class != vga display

Dave Airlie (1):
  Merge branch 'drm-nouveau-fixes' of 
git://people.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

 drivers/gpu/drm/nouveau/core/core/mm.c |9 --
 drivers/gpu/drm/nouveau/core/include/core/mm.h |1 -
 drivers/gpu/drm/nouveau/core/subdev/fb/nv50.c  |   10 ++
 drivers/gpu/drm/nouveau/core/subdev/i2c/base.c |2 +-
 drivers/gpu/drm/nouveau/nouveau_display.c  |   36 ++--
 drivers/gpu/drm/nouveau/nouveau_drm.c  |   36 ++--
 drivers/gpu/drm/nouveau/nouveau_drm.h  |2 +
 drivers/gpu/drm/nouveau/nouveau_irq.c  |   16 ++
 drivers/gpu/drm/nouveau/nv04_dac.c |   16 +-
 drivers/gpu/drm/nouveau/nv04_dfp.c |   14 
 drivers/gpu/drm/nouveau/nv04_tv.c  |9 ++---
 11 files changed, 83 insertions(+), 68 deletions(-)


Re: [PATCH] exynos: mmc: use correct variable for MODULE_DEVICE_TABLE

2012-10-30 Thread Jaehoon Chung
Looks good to me.

Acked-by: Jaehoon Chung 

On 10/31/2012 07:21 AM, Sergei Trofimovich wrote:
> From: Sergei Trofimovich 
> 
> Found by gcc:
> 
> linux-2.6/drivers/mmc/host/dw_mmc-exynos.c: At top level:
> linux-2.6/drivers/mmc/host/dw_mmc-exynos.c:226:1: error: 
> '__mod_of_device_table' aliased to undefined symbol 'dw_mci_pltfm_match'
> 
> CC: Chris Ball 
> CC: Thomas Abraham 
> CC: Will Newton 
> CC: linux-...@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 
> ---
>  drivers/mmc/host/dw_mmc-exynos.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/dw_mmc-exynos.c 
> b/drivers/mmc/host/dw_mmc-exynos.c
> index 660bbc5..0147ac3a 100644
> --- a/drivers/mmc/host/dw_mmc-exynos.c
> +++ b/drivers/mmc/host/dw_mmc-exynos.c
> @@ -223,7 +223,7 @@ static const struct of_device_id dw_mci_exynos_match[] = {
>   .data = (void *)_drv_data, },
>   {},
>  };
> -MODULE_DEVICE_TABLE(of, dw_mci_pltfm_match);
> +MODULE_DEVICE_TABLE(of, dw_mci_exynos_match);
>  
>  int dw_mci_exynos_probe(struct platform_device *pdev)
>  {
> 



[GIT PULL] target fixes for v3.7-rc4

2012-10-30 Thread Nicholas A. Bellinger
Hello Linus!

The following are the current target pending fixes headed for v3.7-rc4
code.  This includes the following highlights:

- Fix long-standing qla2xxx target bug where certain fc_port_t state
transitions could cause the internal session b-tree list to become
out-of-sync. (Roland)
- Fix task management double free of se_cmd descriptor in exception path
for users of target_submit_tmr(). (nab)
- Re-introduce simple NOP emulation of REZERO_UNIT, SEEK_6, and SEEK_10
SCSI-2 commands in order to support legacy initiators that still require
them.  (Bernhard)

Note these three patches are also CC'ed to stable.

Also, there are a couple of outstanding (external) regressions that are
still being tracked down for the tcm_fc (FCoE) and tcm_vhost fabrics in
v3.7.0 code, so please expect another PULL once these issues are
identified and resolved.

Thank you,

--nab

Bernhard Kohl (1):
  target: reintroduce some obsolete SCSI-2 commands

Nicholas Bellinger (2):
  qla2xxx: Add missing ->vport_slock while calling qlt_update_vp_map
  target: Fix double-free of se_cmd in target_complete_tmr_failure

Roland Dreier (2):
  tcm_qla2xxx: Format VPD page 83h SCSI name string according to SPC
  qla2xxx: Update target lookup session tables when a target session
changes

 drivers/scsi/qla2xxx/qla_mid.c |3 +
 drivers/scsi/qla2xxx/qla_target.c  |   25 +--
 drivers/scsi/qla2xxx/qla_target.h  |1 +
 drivers/scsi/qla2xxx/tcm_qla2xxx.c |   77 +++-
 drivers/scsi/qla2xxx/tcm_qla2xxx.h |2 +
 drivers/target/target_core_sbc.c   |   18 +++
 drivers/target/target_core_transport.c |1 -
 7 files changed, 111 insertions(+), 16 deletions(-)



[PATCHSET] cgroup: simplify cgroup removal path

2012-10-30 Thread Tejun Heo
Hello, guys.

cgroup removal path is quite ugly.  A lot of the ugliness comes from
the weird design which allows ->pre_destroy() to fail and the feature
to drain existing CSS reference counts before committing to removal.
Both mean that it should be possible to roll-back cgroup destruction
after some or all ->pre_destroy() invocations.

This weird design has never really worked.  To list a couple of examples:

 * Some ->pre_destroy() implementations aren't side-effect free.
   Roll-back happens after a lot of state is already lost.

 * Some ->pre_destroy() implementations (naturally) assume that the
   cgroup being destroyed would stay quiescent between successful
   ->pre_destroy() and its destruction.  Unfortunately, any operation
   can happen in between and the cgroup could be in a very different
   state by the time it actually gets destroyed.

It's an unusual design which unnecessarily contains weird code path
combinations that are tricky to hit, reproduce and expect.
Moreover, the design's deficiencies attract kludges on top as
workarounds and we end up with stuff like cgroup_exclude_rmdir() and
cgroup_release_and_wakeup_rmdir() which really make me want to cry.

Now that memcg has moved away from failable ->pre_destroy(), we can do
away with all these.  I tested some basic operations and some corner
cases but am still a bit scared.  Would love to get acks from Li and
memcg people.

This patchset contains the following eight patches.

 0001-cgroup-kill-cgroup_subsys-__DEPRECATED_clear_css_ref.patch
 0002-cgroup-kill-CSS_REMOVED.patch
 0003-cgroup-use-cgroup_lock_live_group-parent-in-cgroup_c.patch
 0004-cgroup-deactivate-CSS-s-and-mark-cgroup-dead-before-.patch
 0005-cgroup-remove-CGRP_WAIT_ON_RMDIR-cgroup_exclude_rmdi.patch
 0006-memcg-make-mem_cgroup_reparent_charges-non-failing.patch
 0007-hugetlb-do-not-fail-in-hugetlb_cgroup_pre_destroy.patch
 0008-cgroup-make-pre_destroy-return-void.patch

0001-0002 remove now unused ->pre_destroy() failure handling and do
follow-up simplification.

0003-0004 update removal path such that each ->pre_destroy() is
guaranteed to be invoked once per removal and the cgroup being
destroyed stays quiescent until destruction is complete.

0005 removes the scary CGRP_WAIT_ON_RMDIR mechanism.

0006-0008 are follow-up clean-ups.  0006 and 0007 are from Michal's
patchset[1].

This patchset is on top of

  v3.6 (a0d271cbfe)
+ [1] the first three patches of
  "memcg/cgroup: do not fail fail on pre_destroy callbacks" patchset

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-rmdir-updates

Thanks.

 block/blk-cgroup.c |3 
 include/linux/cgroup.h |   41 ---
 kernel/cgroup.c|  256 +++--
 mm/hugetlb_cgroup.c|   11 --
 mm/memcontrol.c|   51 +
 5 files changed, 75 insertions(+), 287 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel.cgroups/4757


[PATCH 2/8] cgroup: kill CSS_REMOVED

2012-10-30 Thread Tejun Heo
CSS_REMOVED is one of the several contortions which were necessary to
support css reference draining on cgroup removal.  All css->refcnts
which needed draining had to be deactivated and verified to equal zero
atomically w.r.t. css_tryget().  If any one wasn't zero, all refcnts
needed to be re-activated and css_tryget() couldn't fail in the
process.

This was achieved by letting css_tryget() busy-loop until either the
refcnt is reactivated (failed removal attempt) or CSS_REMOVED is set
(committing to removal).

Now that css refcnt draining is no longer used, there's no need for
an atomic rollback mechanism.  css_tryget() can simply look at the
reference count and fail if it's deactivated - it's never getting
re-activated.

This patch removes CSS_REMOVED and updates __css_tryget() to fail if
the refcnt is deactivated.
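
The resulting tryget behaviour can be modelled in userspace roughly as
below.  This is only a sketch using C11 atomics, not the kernel code:
struct ref, ref_tryget(), ref_deactivate() and DEACT_BIAS are made-up
stand-ins for the css refcnt, __css_tryget() and CSS_DEACT_BIAS, and
the bias value is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for CSS_DEACT_BIAS: adding it makes the counter negative. */
#define DEACT_BIAS INT_MIN

struct ref {
	atomic_int cnt;		/* >= 0: live; < 0: deactivated */
};

/* Number of references held, ignoring the deactivation bias. */
static int ref_count(struct ref *r)
{
	int v = atomic_load(&r->cnt);

	return v >= 0 ? v : v - DEACT_BIAS;
}

/* Take a reference unless the counter has been deactivated. */
static bool ref_tryget(struct ref *r)
{
	int v = atomic_load(&r->cnt);

	while (v >= 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->cnt, &v, v + 1))
			return true;
	}
	return false;	/* deactivated, and it never gets re-activated */
}

/* Block all future ref_tryget() calls; existing references stay valid. */
static void ref_deactivate(struct ref *r)
{
	assert(atomic_load(&r->cnt) >= 0);
	atomic_fetch_add(&r->cnt, DEACT_BIAS);
}
```

Because deactivation is never undone in this model, ref_tryget() has
nothing to wait for and can fail as soon as it sees a negative value.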

Note that this removes css_is_removed() whose only user is VM_BUG_ON()
in memcontrol.c.  We can replace it with a check on the refcnt but
given that the only use case is a debug assert, I think it's better to
simply unexport it.

Signed-off-by: Tejun Heo 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Balbir Singh 
Cc: KAMEZAWA Hiroyuki 
---
 include/linux/cgroup.h |  6 --
 kernel/cgroup.c| 31 ---
 mm/memcontrol.c|  7 +++
 3 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 02e09c0..a309804 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -85,7 +85,6 @@ struct cgroup_subsys_state {
 /* bits in struct cgroup_subsys_state flags field */
 enum {
CSS_ROOT, /* This CSS is the root of the subsystem */
-   CSS_REMOVED, /* This CSS is dead */
 };
 
 /* Caller must verify that the css is not for root cgroup */
@@ -108,11 +107,6 @@ static inline void css_get(struct cgroup_subsys_state *css)
__css_get(css, 1);
 }
 
-static inline bool css_is_removed(struct cgroup_subsys_state *css)
-{
-   return test_bit(CSS_REMOVED, &css->flags);
-}
-
 /*
  * Call css_tryget() to take a reference on a css if your existing
  * (known-valid) reference isn't already ref-counted. Returns false if
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 033bf4b..a49cdbc 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -170,8 +170,8 @@ struct css_id {
 * The css to which this ID points. This pointer is set to valid value
 * after cgroup is populated. If cgroup is removed, this will be NULL.
 * This pointer is expected to be RCU-safe because destroy()
-* is called after synchronize_rcu(). But for safe use, css_is_removed()
-* css_tryget() should be used for avoiding race.
+* is called after synchronize_rcu(). But for safe use, css_tryget()
+* should be used for avoiding race.
 */
struct cgroup_subsys_state __rcu *css;
/*
@@ -4088,8 +4088,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
}
prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
 
-   local_irq_disable();
-
/* block new css_tryget() by deactivating refcnt */
for_each_subsys(cgrp->root, ss) {
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
@@ -4099,21 +4097,14 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
}
 
/*
-* Set REMOVED.  All in-progress css_tryget() will be released.
 * Put all the base refs.  Each css holds an extra reference to the
 * cgroup's dentry and cgroup removal proceeds regardless of css
 * refs.  On the last put of each css, whenever that may be, the
 * extra dentry ref is put so that dentry destruction happens only
 * after all css's are released.
 */
-   for_each_subsys(cgrp->root, ss) {
-   struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
-
-   set_bit(CSS_REMOVED, &css->flags);
-   css_put(css);
-   }
-
-   local_irq_enable();
+   for_each_subsys(cgrp->root, ss)
+   css_put(cgrp->subsys[ss->subsys_id]);
 
finish_wait(&cgroup_rmdir_waitq, &wait);
clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
@@ -4837,15 +4828,17 @@ static void check_for_release(struct cgroup *cgrp)
 /* Caller must verify that the css is not for root cgroup */
 bool __css_tryget(struct cgroup_subsys_state *css)
 {
-   do {
-   int v = css_refcnt(css);
+   while (true) {
+   int t, v;
 
-   if (atomic_cmpxchg(&css->refcnt, v, v + 1) == v)
+   v = css_refcnt(css);
+   t = atomic_cmpxchg(&css->refcnt, v, v + 1);
+   if (likely(t == v))
return true;
+   else if (t < 0)
+   return false;
cpu_relax();
-   } while (!test_bit(CSS_REMOVED, &css->flags));
-
-   return false;
+   }
 }
 EXPORT_SYMBOL_GPL(__css_tryget);
 
diff --git 

[PATCH 4/8] cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()

2012-10-30 Thread Tejun Heo
Because ->pre_destroy() could fail and can't be called under
cgroup_mutex, cgroup destruction did something very ugly.

  1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise.

  2. Release cgroup_mutex and call ->pre_destroy().

  3. Re-grab cgroup_mutex and verify it can still be destroyed; fail
 otherwise.

  4. Continue destroying.

In addition to being ugly, it has always been broken in various ways.
For example, memcg ->pre_destroy() expects the cgroup to be inactive
after it's done but tasks can be attached and detached between #2 and
#3 and the conditions that memcg verified in ->pre_destroy() might no
longer hold by the time control reaches #3.

Now that ->pre_destroy() is no longer allowed to fail, we can switch
to the following.

  1. Grab cgroup_mutex and verify it can be destroyed; fail
     otherwise.

  2. Deactivate CSS's and mark the cgroup removed thus preventing any
 further operations which can invalidate the verification from #1.

  3. Release cgroup_mutex and call ->pre_destroy().

  4. Re-grab cgroup_mutex and continue destroying.

After this change, controllers can safely assume that ->pre_destroy()
will be called only once for a given cgroup and, once
->pre_destroy() is called, the cgroup will stay dormant till it's
destroyed.
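
The new ordering can be sketched in userspace as follows.  This is a
simplified model under stated assumptions, not the kernel
implementation: struct group, group_attach_task(), group_rmdir() and
probe_pre_destroy() are hypothetical names, a pthread mutex stands in
for cgroup_mutex, and the -ENODEV return for a removed group is an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct group {
	int nr_tasks;
	int nr_children;
	bool removed;
	void (*pre_destroy)(struct group *grp);	/* may be NULL */
};

static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Attaching takes the same mutex and refuses groups marked removed. */
static int group_attach_task(struct group *grp)
{
	int ret = 0;

	pthread_mutex_lock(&group_mutex);
	if (grp->removed)
		ret = -ENODEV;
	else
		grp->nr_tasks++;
	pthread_mutex_unlock(&group_mutex);
	return ret;
}

static int group_rmdir(struct group *grp)
{
	/* 1. Verify under the mutex that the group can be destroyed. */
	pthread_mutex_lock(&group_mutex);
	if (grp->nr_tasks || grp->nr_children) {
		pthread_mutex_unlock(&group_mutex);
		return -EBUSY;
	}
	assert(!grp->removed);	/* removal runs at most once per group */

	/* 2. Mark it dead so the check above cannot be invalidated. */
	grp->removed = true;

	/* 3. Call the callback with the mutex released. */
	pthread_mutex_unlock(&group_mutex);
	if (grp->pre_destroy)
		grp->pre_destroy(grp);

	/* 4. Re-take the mutex and finish; no re-verification is needed. */
	pthread_mutex_lock(&group_mutex);
	/* ... final teardown would go here ... */
	pthread_mutex_unlock(&group_mutex);
	return 0;
}

/* Example callback: records whether an attach attempt mid-removal succeeds. */
static int last_attach_result;

static void probe_pre_destroy(struct group *grp)
{
	last_attach_result = group_attach_task(grp);
}
```

With probe_pre_destroy() installed, the attach attempt made while the
mutex is dropped is rejected, which is the property that lets the
callback assume the group stays dormant.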

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 41 +++--
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b3010ae..66204a6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4058,18 +4058,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
struct cgroup_event *event, *tmp;
struct cgroup_subsys *ss;
 
-   /* the vfs holds both inode->i_mutex already */
-   mutex_lock(&cgroup_mutex);
-   if (atomic_read(&cgrp->count) != 0) {
-   mutex_unlock(&cgroup_mutex);
-   return -EBUSY;
-   }
-   if (!list_empty(&cgrp->children)) {
-   mutex_unlock(&cgroup_mutex);
-   return -EBUSY;
-   }
-   mutex_unlock(&cgroup_mutex);
-
/*
 * In general, subsystem has no css->refcnt after pre_destroy(). But
 * in racy cases, subsystem may have to get css->refcnt after
@@ -4081,14 +4069,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 */
set_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
 
-   /*
-* Call pre_destroy handlers of subsys. Notify subsystems
-* that rmdir() request comes.
-*/
-   for_each_subsys(cgrp->root, ss)
-   if (ss->pre_destroy)
-   WARN_ON_ONCE(ss->pre_destroy(cgrp));
-
+   /* the vfs holds both inode->i_mutex already */
mutex_lock(&cgroup_mutex);
parent = cgrp->parent;
if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
@@ -4098,13 +4079,30 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
}
prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
 
-   /* block new css_tryget() by deactivating refcnt */
+   /*
+* Block new css_tryget() by deactivating refcnt and mark @cgrp
+* removed.  This makes future css_tryget() and child creation
+* attempts fail thus maintaining the removal conditions verified
+* above.
+*/
for_each_subsys(cgrp->root, ss) {
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
 
WARN_ON(atomic_read(&css->refcnt) < 0);
atomic_add(CSS_DEACT_BIAS, &css->refcnt);
}
+   set_bit(CGRP_REMOVED, &cgrp->flags);
+
+   /*
+* Tell subsystems to initate destruction.  pre_destroy() should be
+* called with cgroup_mutex unlocked.  See 3fa59dfbc3 ("cgroup: fix
+* potential deadlock in pre_destroy") for details.
+*/
+   mutex_unlock(&cgroup_mutex);
+   for_each_subsys(cgrp->root, ss)
+   if (ss->pre_destroy)
+   WARN_ON_ONCE(ss->pre_destroy(cgrp));
+   mutex_lock(&cgroup_mutex);
 
/*
 * Put all the base refs.  Each css holds an extra reference to the
@@ -4120,7 +4118,6 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
 
raw_spin_lock(&release_list_lock);
-   set_bit(CGRP_REMOVED, &cgrp->flags);
if (!list_empty(&cgrp->release_list))
list_del_init(&cgrp->release_list);
raw_spin_unlock(&release_list_lock);
-- 
1.7.11.7



[PATCH 5/8] cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir()

2012-10-30 Thread Tejun Heo
CGRP_WAIT_ON_RMDIR is another kludge which was added to make cgroup
destruction rollback somewhat work.  cgroup_rmdir() used to drain
CSS references, and CGRP_WAIT_ON_RMDIR and the associated waitqueue and
helpers were used to allow the task performing rmdir to wait for the
next relevant event.

Unfortunately, the wait is visible to controllers too and the
mechanism got exposed to memcg by 887032670d ("cgroup avoid permanent
sleep at rmdir").

Now that the draining and retries are gone, CGRP_WAIT_ON_RMDIR is
unnecessary.  Remove it and all the mechanisms supporting it.  Note
that the memcontrol.c changes are essentially a revert of 887032670d
("cgroup avoid permanent sleep at rmdir").

Signed-off-by: Tejun Heo 
Cc: Michal Hocko 
Cc: Balbir Singh 
Cc: KAMEZAWA Hiroyuki 
---
 include/linux/cgroup.h | 21 -
 kernel/cgroup.c| 51 --
 mm/memcontrol.c| 24 +---
 3 files changed, 1 insertion(+), 95 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a309804..47868a8 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -145,10 +145,6 @@ enum {
/* Control Group requires release notifications to userspace */
CGRP_NOTIFY_ON_RELEASE,
/*
-* A thread in rmdir() is wating for this cgroup.
-*/
-   CGRP_WAIT_ON_RMDIR,
-   /*
 * Clone cgroup values when creating a new child cgroup
 */
CGRP_CLONE_CHILDREN,
@@ -412,23 +408,6 @@ int cgroup_task_count(const struct cgroup *cgrp);
 int cgroup_is_descendant(const struct cgroup *cgrp, struct task_struct *task);
 
 /*
- * When the subsys has to access css and may add permanent refcnt to css,
- * it should take care of racy conditions with rmdir(). Following set of
- * functions, is for stop/restart rmdir if necessary.
- * Because these will call css_get/put, "css" should be alive css.
- *
- *  cgroup_exclude_rmdir();
- *  ...do some jobs which may access arbitrary empty cgroup
- *  cgroup_release_and_wakeup_rmdir();
- *
- *  When someone removes a cgroup while cgroup_exclude_rmdir() holds it,
- *  it sleeps and cgroup_release_and_wakeup_rmdir() will wake him up.
- */
-
-void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
-void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
-
-/*
  * Control Group taskset, used to pass around set of tasks to cgroup_subsys
  * methods.
  */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 66204a6..c5f6fb2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -966,33 +966,6 @@ static void cgroup_d_remove_dir(struct dentry *dentry)
 }
 
 /*
- * A queue for waiters to do rmdir() cgroup. A tasks will sleep when
- * cgroup->count == 0 && list_empty(&cgroup->children) && subsys has some
- * reference to css->refcnt. In general, this refcnt is expected to goes down
- * to zero, soon.
- *
- * CGRP_WAIT_ON_RMDIR flag is set under cgroup's inode->i_mutex;
- */
-static DECLARE_WAIT_QUEUE_HEAD(cgroup_rmdir_waitq);
-
-static void cgroup_wakeup_rmdir_waiter(struct cgroup *cgrp)
-{
-   if (unlikely(test_and_clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags)))
-   wake_up_all(&cgroup_rmdir_waitq);
-}
-
-void cgroup_exclude_rmdir(struct cgroup_subsys_state *css)
-{
-   css_get(css);
-}
-
-void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
-{
-   cgroup_wakeup_rmdir_waiter(css->cgroup);
-   css_put(css);
-}
-
-/*
  * Call with cgroup_mutex held. Drops reference counts on modules, including
  * any duplicate ones that parse_cgroupfs_options took. If this function
  * returns an error, no reference counts are touched.
@@ -1963,12 +1936,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
}
 
synchronize_rcu();
-
-   /*
-* wake up rmdir() waiter. the rmdir should fail since the cgroup
-* is no longer empty.
-*/
-   cgroup_wakeup_rmdir_waiter(cgrp);
 out:
if (retval) {
for_each_subsys(root, ss) {
@@ -2138,7 +2105,6 @@ static int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 * step 5: success! and cleanup
 */
synchronize_rcu();
-   cgroup_wakeup_rmdir_waiter(cgrp);
retval = 0;
 out_put_css_set_refs:
if (retval) {
@@ -4058,26 +4024,13 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
struct cgroup_event *event, *tmp;
struct cgroup_subsys *ss;
 
-   /*
-* In general, subsystem has no css->refcnt after pre_destroy(). But
-* in racy cases, subsystem may have to get css->refcnt after
-* pre_destroy() and it makes rmdir return with -EBUSY. This sometimes
-* make rmdir return -EBUSY too often. To avoid that, we use waitqueue
-* for cgroup's rmdir. CGRP_WAIT_ON_RMDIR is for synchronizing rmdir
-* and subsystem's reference count handling. Please see css_get/put
-* 

[PATCH 3/8] cgroup: use cgroup_lock_live_group(parent) in cgroup_create()

2012-10-30 Thread Tejun Heo
This patch makes cgroup_create() fail if @parent is marked removed.
This is to prepare for further updates to cgroup_rmdir() path.

Note that this change isn't strictly necessary.  cgroup can only be
created via mkdir and the removed marking and dentry removal happen
without releasing cgroup_mutex, so cgroup_create() can never race with
cgroup_rmdir().  Even after the scheduled updates to cgroup_rmdir(),
cgroup_mkdir() and cgroup_rmdir() are synchronized by i_mutex
rendering the added liveliness check unnecessary.

Do it anyway such that locking is contained inside cgroup proper and
we don't get nasty surprises if we ever grow another caller of
cgroup_create().
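
The lock-and-check pattern this relies on can be sketched in userspace
as below.  It is an illustration only, assuming a single global mutex
and a removed flag; lock_live_node() and node_create() are hypothetical
stand-ins for cgroup_lock_live_group() and cgroup_create(), not their
actual implementations.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
	struct node *parent;
	bool removed;
};

static pthread_mutex_t tree_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns true with tree_mutex held if @n is alive; false with it released. */
static bool lock_live_node(struct node *n)
{
	pthread_mutex_lock(&tree_mutex);
	if (n->removed) {
		pthread_mutex_unlock(&tree_mutex);
		return false;
	}
	return true;
}

/* Create a child under @parent; *out is set only on success. */
static int node_create(struct node *parent, struct node **out)
{
	struct node *child;
	int err;

	assert(parent && out);
	child = calloc(1, sizeof(*child));
	if (!child)
		return -ENOMEM;

	/* Only live parents can have children. */
	if (!lock_live_node(parent)) {
		err = -ENODEV;
		goto err_free;
	}

	child->parent = parent;
	pthread_mutex_unlock(&tree_mutex);
	*out = child;
	return 0;

err_free:
	free(child);
	return err;
}
```

Folding the liveness check into the lock helper means a caller cannot
take the mutex and forget the check, which is the "locking contained
inside cgroup proper" point made above.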

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a49cdbc..b3010ae 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3906,6 +3906,18 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (!cgrp)
return -ENOMEM;
 
+   /*
+* Only live parents can have children.  Note that the liveliness
+* check isn't strictly necessary because cgroup_mkdir() and
+* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
+* anyway so that locking is contained inside cgroup proper and we
+* don't get nasty surprises if we ever grow another caller.
+*/
+   if (!cgroup_lock_live_group(parent)) {
+   err = -ENODEV;
+   goto err_free;
+   }
+
/* Grab a reference on the superblock so the hierarchy doesn't
 * get deleted on unmount if there are child cgroups.  This
 * can be done outside cgroup_mutex, since the sb can't
@@ -3913,8 +3925,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 * fs */
atomic_inc(&sb->s_active);
 
-   mutex_lock(&cgroup_mutex);
-
init_cgroup_housekeeping(cgrp);
 
cgrp->parent = parent;
@@ -3985,7 +3995,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 
/* Release the reference count that we took on the superblock */
deactivate_super(sb);
-
+err_free:
kfree(cgrp);
return err;
 }
-- 
1.7.11.7



[PATCH 1/8] cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refs

2012-10-30 Thread Tejun Heo
2ef37d3fe4 ("memcg: Simplify mem_cgroup_force_empty_list error
handling") removed the last user of __DEPRECATED_clear_css_refs.  This
patch removes __DEPRECATED_clear_css_refs and mechanisms to support
it.

* Conditionals dependent on __DEPRECATED_clear_css_refs removed.

* ->pre_destroy() now can only fail if a new task is attached or child
  cgroup is created while ->pre_destroy()s are being called.  As the
  condition is checked again after re-acquiring cgroup_mutex
  afterwards, we don't need to take any immediate action on
  ->pre_destroy() failures.  This reduces cgroup_call_pre_destroy() to
  a simple loop surrounding ->pre_destroy().  Remove
  cgroup_call_pre_destroy() and open-code the loop into
  cgroup_rmdir().

* cgroup_clear_css_refs() can no longer fail.  All that needs to be
  done is deactivating refcnts, setting CSS_REMOVED and putting the
  base reference on each css.  Remove cgroup_clear_css_refs() and the
  failure path, and open-code the loops into cgroup_rmdir().

Note that cgroup_rmdir() will see more cleanup soon.

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup.h |  12 
 kernel/cgroup.c| 159 -
 2 files changed, 38 insertions(+), 133 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c90eaa8..02e09c0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -86,7 +86,6 @@ struct cgroup_subsys_state {
 enum {
CSS_ROOT, /* This CSS is the root of the subsystem */
CSS_REMOVED, /* This CSS is dead */
-   CSS_CLEAR_CSS_REFS, /* @ss->__DEPRECATED_clear_css_refs */
 };
 
 /* Caller must verify that the css is not for root cgroup */
@@ -485,17 +484,6 @@ struct cgroup_subsys {
 */
bool use_id;
 
-   /*
-* If %true, cgroup removal will try to clear css refs by retrying
-* ss->pre_destroy() until there's no css ref left.  This behavior
-* is strictly for backward compatibility and will be removed as
-* soon as the current user (memcg) is updated.
-*
-* If %false, ss->pre_destroy() can't fail and cgroup removal won't
-* wait for css refs to drop to zero before proceeding.
-*/
-   bool __DEPRECATED_clear_css_refs;
-
 #define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7981850..033bf4b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -851,30 +851,6 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
return inode;
 }
 
-/*
- * Call subsys's pre_destroy handler.
- * This is called before css refcnt check.
- */
-static int cgroup_call_pre_destroy(struct cgroup *cgrp)
-{
-   struct cgroup_subsys *ss;
-   int ret = 0;
-
-   for_each_subsys(cgrp->root, ss) {
-   if (!ss->pre_destroy)
-   continue;
-
-   ret = ss->pre_destroy(cgrp);
-   if (ret) {
-   /* ->pre_destroy() failure is being deprecated */
-   WARN_ON_ONCE(!ss->__DEPRECATED_clear_css_refs);
-   break;
-   }
-   }
-
-   return ret;
-}
-
 static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 {
/* is dentry a directory ? if so, kfree() associated cgroup */
@@ -3901,14 +3877,12 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
cgrp->subsys[ss->subsys_id] = css;
 
/*
-* If !clear_css_refs, css holds an extra ref to @cgrp->dentry
-* which is put on the last css_put().  dput() requires process
-* context, which css_put() may be called without.  @css->dput_work
-* will be used to invoke dput() asynchronously from css_put().
+* css holds an extra ref to @cgrp->dentry which is put on the last
+* css_put().  dput() requires process context, which css_put() may
+* be called without.  @css->dput_work will be used to invoke
+* dput() asynchronously from css_put().
 */
INIT_WORK(&css->dput_work, css_dput_fn);
-   if (ss->__DEPRECATED_clear_css_refs)
-   set_bit(CSS_CLEAR_CSS_REFS, &css->flags);
 }
 
 /*
@@ -3978,10 +3952,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (err < 0)
goto err_remove;
 
-   /* If !clear_css_refs, each css holds a ref to the cgroup's dentry */
+   /* each css holds a ref to the cgroup's dentry */
for_each_subsys(root, ss)
-   if (!ss->__DEPRECATED_clear_css_refs)
-   dget(dentry);
+   dget(dentry);
 
/* The cgroup directory was pre-locked for us */
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
@@ -4066,71 +4039,6 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
return 0;
 }
 
-/*
- * Atomically mark all (or else none) of the cgroup's CSS objects as
- * CSS_REMOVED. Return 

[PATCH 7/8] hugetlb: do not fail in hugetlb_cgroup_pre_destroy

2012-10-30 Thread Tejun Heo
From: Michal Hocko 

Now that pre_destroy callbacks are called from a context where no task
can attach to the group and no child group can be added, there is no
other way for hugetlb_cgroup_pre_destroy to fail.

Signed-off-by: Michal Hocko 
Reviewed-by: Tejun Heo 
Reviewed-by: Glauber Costa 
Signed-off-by: Tejun Heo 
---
 mm/hugetlb_cgroup.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index a3f358f..dc595c6 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -159,14 +159,9 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 {
struct hstate *h;
struct page *page;
-   int ret = 0, idx = 0;
+   int idx = 0;
 
do {
-   if (cgroup_task_count(cgroup) ||
-   !list_empty(&cgroup->children)) {
-   ret = -EBUSY;
-   goto out;
-   }
for_each_hstate(h) {
spin_lock(&hugetlb_lock);
list_for_each_entry(page, &h->hugepage_activelist, lru)
@@ -177,8 +172,8 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
}
cond_resched();
} while (hugetlb_cgroup_have_usage(cgroup));
-out:
-   return ret;
+
+   return 0;
 }
 
 int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-- 
1.7.11.7



[PATCH 8/8] cgroup: make ->pre_destroy() return void

2012-10-30 Thread Tejun Heo
All ->pre_destroy() implementations return 0 now, which is the only
allowed return value.  Make it return void.
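
As a rough illustration of the interface after this change, the sketch
below shows an optional void callback and a caller that simply invokes
it.  All names here (struct subsys, struct group, call_pre_destroy(),
count_pre_destroy()) are hypothetical and only mirror the shape of the
kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct group {
	int pre_destroy_calls;
};

struct subsys {
	const char *name;
	void (*pre_destroy)(struct group *grp);	/* optional, cannot fail */
};

/* Invoke every registered callback; there is no result to check. */
static void call_pre_destroy(struct subsys **subsys, size_t n,
			     struct group *grp)
{
	assert(grp);
	for (size_t i = 0; i < n; i++)
		if (subsys[i]->pre_destroy)
			subsys[i]->pre_destroy(grp);
}

/* Example callback that just counts its invocations. */
static void count_pre_destroy(struct group *grp)
{
	grp->pre_destroy_calls++;
}
```

With a void return type the compiler rejects any callback that tries to
report failure, so the caller needs neither an error path nor a
WARN_ON_ONCE()-style check.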

Signed-off-by: Tejun Heo 
Cc: Michal Hocko 
Cc: Balbir Singh 
Cc: KAMEZAWA Hiroyuki 
Cc: Vivek Goyal 
---
 block/blk-cgroup.c | 3 +--
 include/linux/cgroup.h | 2 +-
 kernel/cgroup.c| 2 +-
 mm/hugetlb_cgroup.c| 4 +---
 mm/memcontrol.c| 3 +--
 5 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f3b44a6..a7816f3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -600,7 +600,7 @@ struct cftype blkcg_files[] = {
  *
  * This is the blkcg counterpart of ioc_release_fn().
  */
-static int blkcg_pre_destroy(struct cgroup *cgroup)
+static void blkcg_pre_destroy(struct cgroup *cgroup)
 {
struct blkcg *blkcg = cgroup_to_blkcg(cgroup);
 
@@ -622,7 +622,6 @@ static int blkcg_pre_destroy(struct cgroup *cgroup)
}
 
spin_unlock_irq(&blkcg->lock);
-   return 0;
 }
 
 static void blkcg_destroy(struct cgroup *cgroup)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 47868a8..adb2adc 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -436,7 +436,7 @@ int cgroup_taskset_size(struct cgroup_taskset *tset);
 
 struct cgroup_subsys {
struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
-   int (*pre_destroy)(struct cgroup *cgrp);
+   void (*pre_destroy)(struct cgroup *cgrp);
void (*destroy)(struct cgroup *cgrp);
int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
void (*cancel_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c5f6fb2..83cd7d0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4054,7 +4054,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
mutex_unlock(&cgroup_mutex);
for_each_subsys(cgrp->root, ss)
if (ss->pre_destroy)
-   WARN_ON_ONCE(ss->pre_destroy(cgrp));
+   ss->pre_destroy(cgrp);
mutex_lock(&cgroup_mutex);
 
/*
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index dc595c6..0d3a1a3 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -155,7 +155,7 @@ out:
  * Force the hugetlb cgroup to empty the hugetlb resources by moving them to
  * the parent cgroup.
  */
-static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
+static void hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
 {
struct hstate *h;
struct page *page;
@@ -172,8 +172,6 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
}
cond_resched();
} while (hugetlb_cgroup_have_usage(cgroup));
-
-   return 0;
 }
 
 int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 47c4680..af05a60 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5002,12 +5002,11 @@ free_out:
return ERR_PTR(error);
 }
 
-static int mem_cgroup_pre_destroy(struct cgroup *cont)
+static void mem_cgroup_pre_destroy(struct cgroup *cont)
 {
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
mem_cgroup_reparent_charges(memcg);
-   return 0;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
-- 
1.7.11.7



[PATCH 6/8] memcg: make mem_cgroup_reparent_charges non failing

2012-10-30 Thread Tejun Heo
From: Michal Hocko 

Now that pre_destroy callbacks are called from a context where no task
can attach to the group and no child group can be added, there is no
other way for mem_cgroup_pre_destroy to fail.
mem_cgroup_pre_destroy doesn't have to take a reference to memcg's css
because all css's are marked dead already.
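
The shape of the non-failing loop can be sketched as below.  This is a
toy model, not memcg code: struct counter, move_some_charges() and the
batch size of 32 are assumptions made for illustration.  The loop is
only guaranteed to terminate because nothing can add new charges to the
group once it is being removed.

```c
#include <assert.h>
#include <stddef.h>

struct counter {
	struct counter *parent;
	unsigned long usage;
};

/* Hypothetical helper: move at most @batch units of charge to the parent. */
static void move_some_charges(struct counter *c, unsigned long batch)
{
	unsigned long n = c->usage < batch ? c->usage : batch;

	c->usage -= n;
	c->parent->usage += n;
}

/* Cannot fail: loops until nothing is charged to @c any more. */
static void reparent_charges(struct counter *c)
{
	assert(c->parent);	/* the root has nowhere to reparent to */
	do {
		move_some_charges(c, 32);
	} while (c->usage > 0);
}
```

Compared with the old version, there is no -EBUSY bail-out inside the
loop, so callers have no return value to propagate.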

tj: Remove now unused local variable @cgrp from
mem_cgroup_reparent_charges().

Signed-off-by: Michal Hocko 
Reviewed-by: Glauber Costa 
Signed-off-by: Tejun Heo 
---
 mm/memcontrol.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1033b2b..47c4680 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3740,14 +3740,11 @@ static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
  *
  * Caller is responsible for holding css reference on the memcg.
  */
-static int mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
+static void mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
 {
-   struct cgroup *cgrp = memcg->css.cgroup;
int node, zid;
 
do {
-   if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
-   return -EBUSY;
/* This is for making all *used* pages to be on LRU. */
lru_add_drain_all();
drain_all_stock_sync(memcg);
@@ -3773,8 +3770,6 @@ static int mem_cgroup_reparent_charges(struct mem_cgroup *memcg)
 * charge before adding to the LRU.
 */
} while (res_counter_read_u64(&memcg->res, RES_USAGE) > 0);
-
-   return 0;
 }
 
 /*
@@ -3811,7 +3806,9 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 
}
lru_add_drain();
-   return mem_cgroup_reparent_charges(memcg);
+   mem_cgroup_reparent_charges(memcg);
+
+   return 0;
 }
 
 static int mem_cgroup_force_empty_write(struct cgroup *cont, unsigned int event)
@@ -5008,13 +5005,9 @@ free_out:
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-   int ret;
 
-   css_get(&memcg->css);
-   ret = mem_cgroup_reparent_charges(memcg);
-   css_put(&memcg->css);
-
-   return ret;
+   mem_cgroup_reparent_charges(memcg);
+   return 0;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
-- 
1.7.11.7



[PATCH v2] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.

2012-10-30 Thread Jonghwa Lee
This patch adds support for exynos's emulation mode through a newly
created sysfs node. Exynos 4x12 (4212, 4412) and 5 series provide an
emulation mode for the thermal management unit (TMU). Thermal emulation
mode supports software debugging of the TMU's operation. The user can
set the temperature manually from software, and the TMU will then read
the current temperature from the user value instead of the sensor's value.
This patch also includes documentation placed under Documentation/thermal/.
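
From userspace, driving the node would amount to writing a number to
it.  The helper below is a minimal sketch under stated assumptions:
set_emulated_temp() is a made-up name, the caller supplies the full
path to the 'emulation' node (the device name part is board specific
and not given here), and the value is assumed to be written as a plain
decimal integer.

```c
#include <assert.h>
#include <stdio.h>

/* Write @temp to the emulation node at @path; 0 on success, -1 on error. */
static int set_emulated_temp(const char *path, int temp)
{
	FILE *f;
	int ok;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", temp) > 0;
	if (fclose(f) != 0)	/* buffered write errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing 0 through the same helper would correspond to disabling
emulation mode, as described in the documentation added by this patch.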

Signed-off-by: Jonghwa Lee 
---
v2
 exynos_thermal.c
 - Fix build error caused by wrong emulation control register name.
 - Remove exynos5410 dependent code.
 exynos_thermal_emulation
 - Align indentation.

 Documentation/thermal/exynos_thermal_emulation |   49 +
 drivers/thermal/Kconfig|9 +++
 drivers/thermal/exynos_thermal.c   |   88 
 3 files changed, 146 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/thermal/exynos_thermal_emulation

diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation
new file mode 100644
index 000..062d867
--- /dev/null
+++ b/Documentation/thermal/exynos_thermal_emulation
@@ -0,0 +1,49 @@
+EXYNOS EMULATION MODE
+
+
+Copyright (C) 2012 Samsung Electronics
+
+Written by Jonghwa Lee
+
+Description
+---
+
+Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit.
+Thermal emulation mode supports software debug for TMU's operation. User can set temperature
+manually with software code and TMU will read current temperature from user value not from
+sensor's value.
+
+Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support available.
+When it's enabled, sysfs node will be created under
+/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'.
+
+The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any
+temperature you want to update to sysfs node, it automatically enables emulation mode and
+current temperature will be changed into it.
+(Exynos also supports user changeable delay time which would be used to delay of
+ changing temperature. However, this node only uses same delay of real sensing time, 938us.)
+
+Disabling emulation mode only requires writing value 0 to sysfs node.
+
+
+  TEMP   120 |
+             |
+         100 |
+             |
+          80 |
+             |                      +-----------
+          60 |                      |          |
+             |          +-----------|          |
+          40 |          |           |          |
+             |          |           |          |
+          20 |          |           |          +----------
+             |          |           |          |         |
+           0 |__________|___________|__________|_________|_________
+                        A           A          A         A     TIME
+                |<----->|   |<----->|  |<----->|         |
+                | 938us |   |       |  |       |         |
+emulation    : 0  50    |   70      |  20      |    0
+current temp :  sensor       50          70        20       sensor
+
+
+
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index e1cb6bd..c02a66c 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -55,3 +55,12 @@ config EXYNOS_THERMAL
help
  If you say yes here you get support for TMU (Thermal Managment
  Unit) on SAMSUNG EXYNOS series of SoC.
+
+config EXYNOS_THERMAL_EMUL
+   bool "EXYNOS TMU emulation mode support"
+   depends on !CPU_EXYNOS4210 && EXYNOS_THERMAL
+   help
+ Exynos 4212 and 4412 and 5 series have emulation mode on TMU.
+ Enabling this option creates a sysfs node in the exynos thermal platform
+ device directory to support emulation mode. With the emulation mode sysfs
+ node, you can manually input a temperature to the TMU for simulation purposes.
diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
index fd03e85..9e3c150 100644
--- a/drivers/thermal/exynos_thermal.c
+++ b/drivers/thermal/exynos_thermal.c
@@ -99,6 +99,15 @@
 #define IDLE_INTERVAL 1
 #define MCELSIUS   1000
 
+#ifdef CONFIG_EXYNOS_THERMAL_EMUL
+#define EXYNOS_EMUL_TIME   0x57F0
+#define EXYNOS_EMUL_TIME_SHIFT 16
+#define EXYNOS_EMUL_DATA_SHIFT 8
+#define EXYNOS_EMUL_DATA_MASK  0xFF
+#define EXYNOS_EMUL_DISABLE0x0
+#define EXYNOS_EMUL_ENABLE 0x1
+#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
+
 /* CPU Zone information */
 #define PANIC_ZONE  4
 #define WARN_ZONE   3
@@ -832,6 +841,82 @@ static inline struct  exynos_tmu_platform_data 
*exynos_get_driver_data(
return (struct exynos_tmu_platform_data *)
platform_get_device_id(pdev)->driver_data;
 }
+
+#ifdef CONFIG_EXYNOS_THERMAL_EMUL
+static ssize_t exynos_tmu_emulation_show(struct device *dev,
+

Re: [PATCH 1/2] zram: factor-out zram_decompress_page() function

2012-10-30 Thread Nitin Gupta

On 10/30/2012 02:04 PM, Sergey Senozhatsky wrote:

On (10/29/12 10:14), Nitin Gupta wrote:

==
zram: Fix use-after-free in partial I/O case

When the compressed size of a page exceeds a threshold, the page is
stored as-is i.e. in uncompressed form. In the partial I/O i.e.
non-PAGE_SIZE'ed I/O case, however, the uncompressed memory was being
freed before it could be copied into the zsmalloc pool resulting in
use-after-free bug.
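
The ordering problem described above can be shown in a few lines of userspace C. This is only a sketch under stated assumptions: store_then_free() is a hypothetical stand-in, with malloc() playing the role of the zsmalloc pool and a plain heap buffer playing the role of the temporary uncompressed page. It is not the zram code itself.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for storing a page: copy `len` bytes from a
 * temporary buffer into a freshly allocated slot, then free the
 * temporary buffer. Returns the slot, or NULL on allocation failure.
 */
static unsigned char *store_then_free(unsigned char *tmp, size_t len)
{
    unsigned char *slot = malloc(len);

    if (slot)
        memcpy(slot, tmp, len);   /* copy while tmp is still valid */

    free(tmp);                    /* release only after the copy */
    return slot;
}
```

The bug being fixed corresponds to calling free(tmp) before the memcpy(); placing the release on a single exit path after the copy avoids that.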



Hello Nitin,
hope you are fine.

How about the following one? I moved some of the code to zram_compress_page()
(very similar to zram_decompress_page()), so errors are easier to handle in
zram_bvec_write(). Now we handle both use-after-kfree (as you did in your patch)
and use-after-kunmap.

please review.

Signed-off-by: Sergey Senozhatsky 

---

  drivers/staging/zram/zram_drv.c | 91 +
  1 file changed, 46 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 47f2e3a..5f37be1 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -213,6 +213,44 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
return 0;
  }

+static int zram_compress_page(struct zram *zram, char *uncmem, u32 index)
+{
+   int ret;
+   size_t clen;
+   unsigned long handle;
+   unsigned char *cmem, *src;
+
+   src = zram->compress_buffer;
+   ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
+  zram->compress_workmem);
+   if (unlikely(ret != LZO_E_OK)) {
+   pr_err("Page compression failed: err=%d\n", ret);
+   return ret;
+   }
+
+   if (unlikely(clen > max_zpage_size)) {
+   zram_stat_inc(&zram->stats.bad_compress);
+   src = uncmem;
+   clen = PAGE_SIZE;
+   }
+
+   handle = zs_malloc(zram->mem_pool, clen);
+   if (!handle) {
+   pr_info("Error allocating memory for compressed "
+   "page: %u, size=%zu\n", index, clen);
+   return -ENOMEM;
+   }
+
+   cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
+   memcpy(cmem, src, clen);
+   zs_unmap_object(zram->mem_pool, handle);
+
+   zram->table[index].handle = handle;
+   zram->table[index].size = clen;
+
+   return 0;
+}
+
  static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
  u32 index, int offset, struct bio *bio)
  {
@@ -267,13 +305,10 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
  {
int ret;
size_t clen;
-   unsigned long handle;
struct page *page;
-   unsigned char *user_mem, *cmem, *src, *uncmem = NULL;
+   unsigned char *user_mem, *uncmem = NULL;

page = bvec->bv_page;
-   src = zram->compress_buffer;
-
if (is_partial_io(bvec)) {
/*
 * This is a partial IO. We need to read the full page
@@ -286,10 +321,8 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
goto out;
}
ret = zram_decompress_page(zram, uncmem, index);
-   if (ret) {
-   kfree(uncmem);
+   if (ret)
goto out;
-   }
}

/*
@@ -309,58 +342,26 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
uncmem = user_mem;

if (page_zero_filled(uncmem)) {
-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
zram_stat_inc(&zram->stats.pages_zero);
zram_set_flag(zram, index, ZRAM_ZERO);
ret = 0;
goto out;
}

-   ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
-  zram->compress_workmem);
-
-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
-
-   if (unlikely(ret != LZO_E_OK)) {
-   pr_err("Compression failed! err=%d\n", ret);
-   goto out;
-   }
-
-   if (unlikely(clen > max_zpage_size)) {
-   zram_stat_inc(&zram->stats.bad_compress);
-   src = uncmem;
-   clen = PAGE_SIZE;
-   }
-
-   handle = zs_malloc(zram->mem_pool, clen);
-   if (!handle) {
-   pr_info("Error allocating memory for compressed "
-   "page: %u, size=%zu\n", index, clen);
-   ret = -ENOMEM;
+   ret = zram_compress_page(zram, uncmem, index);
+   if (ret)
goto out;
-   }
-   cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
-
-   memcpy(cmem, src, clen);
-
-   zs_unmap_object(zram->mem_pool, handle);
-
-   zram->table[index].handle = handle;
-   zram->table[index].size = clen;

+   

RE: [PATCH] da8xx: Fix revision check on the da8xx driver

2012-10-30 Thread Manjunathappa, Prakash
On Wed, Oct 31, 2012 at 21:26:24, Pantelis Antoniou wrote:
> The revision check fails for the beaglebone; Add new revision ID.
> 
> Signed-off-by: Pantelis Antoniou 
> ---
>  drivers/video/da8xx-fb.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
> index 80665f6..866d804 100644
> --- a/drivers/video/da8xx-fb.c
> +++ b/drivers/video/da8xx-fb.c
> @@ -1283,6 +1283,7 @@ static int __devinit fb_probe(struct platform_device 
> *device)
>   lcd_revision = LCD_VERSION_1;
>   break;
>   case 0x4F200800:
> + case 0x4F201000:

Thanks for correcting this. This is the LCDC revision on am335x silicon, as
opposed to the one read (0x4F200800) on the emulator platform.

Acked-by: Manjunathappa, Prakash 

>   lcd_revision = LCD_VERSION_2;
>   break;
>   default:
> -- 
> 1.7.12
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2012 at 8:24 PM, Al Viro  wrote:
>
> Oh, well... there go my blackmail plans ;-)  Seriously, though, I'm at loss
> regarding several embedded architectures - arch/score, in particular,
> seems to be completely orphaned.

Don't worry about it. Do a best-effort, and if nobody ever reacts
about some odd-ball architecture, whatever.

We won't start deleting architectures over something like this, but it
might be another sign down the road that some arch code can be removed
entirely.

So it's not arch/score I'd worry about. It's all the *other* architectures..

 Linus


[PATCH tip/core/rcu 1/2] rcu: Add callback-free CPUs

2012-10-30 Thread Paul E. McKenney
From: "Paul E. McKenney" 

RCU callback execution can add significant OS jitter and also can degrade
scheduling latency.  This commit therefore adds the ability for selected
CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded to
kthreads.  If the "rcu_nocb_poll" boot parameter is also specified, these
kthreads will do polling, removing the need for the offloaded CPUs to do
wakeups.  At least one CPU must be doing normal callback processing:
currently CPU 0 cannot be selected as a no-CBs CPU.  In addition, attempts
to offline the last normal-CBs CPU will fail.

This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
this commit includes fixes to problems located by Fengguang Wu's
kbuild test robot.

[ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]

Signed-off-by: Paul E. McKenney 
---
 Documentation/kernel-parameters.txt |5 +
 include/trace/events/rcu.h  |1 +
 init/Kconfig|   19 ++
 kernel/rcutree.c|   63 +-
 kernel/rcutree.h|   47 
 kernel/rcutree_plugin.h |  397 ++-
 kernel/rcutree_trace.c  |7 +-
 7 files changed, 523 insertions(+), 16 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 9776f06..dfd03272 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2394,6 +2394,11 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
ramdisk_size=   [RAM] Sizes of RAM disks in kilobytes
See Documentation/blockdev/ramdisk.txt.
 
+   rcu_nocbs=  [KNL,BOOT]
+   Set the specified list of CPUs to be no-callback
+   CPUs.  Invocation of these CPUs' RCU callbacks will
+   be offloaded to kthreads created for that purpose.
+
rcutree.blimit= [KNL,BOOT]
Set maximum number of finished RCU callbacks to process
in one batch.
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5bde94d..d4f559b 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -549,6 +549,7 @@ TRACE_EVENT(rcu_torture_read,
  * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
  * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
  * "Offline": rcu_barrier_callback() found offline CPU
+ * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
  * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
  * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
  * "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
diff --git a/init/Kconfig b/init/Kconfig
index ec62139..e2343d4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -654,6 +654,25 @@ config RCU_BOOST_DELAY
 
  Accept the default if unsure.
 
+config RCU_NOCB_CPU
+   bool "Offload RCU callback processing from boot-selected CPUs"
+   depends on TREE_RCU || TREE_PREEMPT_RCU
+   default n
+   help
+ Use this option to reduce OS jitter for aggressive HPC or
+ real-time workloads.
+
+ This option offloads callback invocation from the set of CPUs
+ specified at boot time by the rcu_nocbs parameter.  For each
+ such CPU, a kthread ("rcuoN") will be created to invoke callbacks.
+ Nothing prevents this kthread from running on one of the specified
+ CPUs, but (1) the kthreads may be preempted between each callback
+ and (2) affinity or cgroups can be used to force the kthreads off
+ of those CPUs if desired.
+
+ Say Y here if you want reduced OS jitter on selected CPUs.
+ Say N here if you are unsure.
+
 endmenu # "RCU Subsystem"
 
 config IKCONFIG
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 9ce19c9..1523c47 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -297,7 +297,8 @@ EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
 static int
 cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
 {
-   return &rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL];
+   return &rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL] &&
+  rdp->nxttail[RCU_DONE_TAIL] != NULL;
 }
 
 /*
@@ -306,8 +307,11 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
 static int
 cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 {
-   return *rdp->nxttail[RCU_DONE_TAIL +
-(ACCESS_ONCE(rsp->completed) != rdp->completed)] &&
+   struct rcu_head **ntp;
+
+   ntp = rdp->nxttail[RCU_DONE_TAIL +
+  (ACCESS_ONCE(rsp->completed) != rdp->completed)];
+   return rdp->nxttail[RCU_DONE_TAIL] && ntp && *ntp &&
   !rcu_gp_in_progress(rsp);
 }
 
@@ -1084,6 +1088,7 @@ static void 

[PATCH tip/core/rcu 2/2] rcu: Separate accounting of callbacks from callback-free CPUs

2012-10-30 Thread Paul E. McKenney
From: "Paul E. McKenney" 

Currently, callback invocations from callback-free CPUs are accounted to
the CPU that registered the callback, but using the same field that is
used for normal callbacks.  This makes it impossible to determine from
debugfs output whether callbacks are in fact being diverted.  This commit
therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
in which diverted callback invocations are counted.  RCU's debugfs tracing
still displays normal callback invocations using ci=, but displays
diverted callbacks with nci=.
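
A minimal userspace sketch of the accounting split described above follows. The struct is a hypothetical, cut-down stand-in for struct rcu_data, and account_invoked() and format_stats() are illustrative names that do not appear in the patch; only the two field names and the ci=/nci= labels are taken from this commit.

```c
#include <stdio.h>

/* Hypothetical, cut-down stand-in for the counters in struct rcu_data. */
struct cb_stats {
    unsigned long n_cbs_invoked;    /* callbacks invoked normally */
    unsigned long n_nocbs_invoked;  /* callbacks invoked by an offload thread */
};

/* Credit `count` invocations to the counter that matches how they ran. */
static void account_invoked(struct cb_stats *st, unsigned long count,
                            int offloaded)
{
    if (offloaded)
        st->n_nocbs_invoked += count;
    else
        st->n_cbs_invoked += count;
}

/* Format both counters using the ci= and nci= labels from the commit. */
static int format_stats(const struct cb_stats *st, char *buf, size_t len)
{
    return snprintf(buf, len, "ci=%lu nci=%lu",
                    st->n_cbs_invoked, st->n_nocbs_invoked);
}
```

With a single shared counter the two call sites would be indistinguishable in the output, which is the problem the commit message describes.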

Signed-off-by: Paul E. McKenney 
---
 kernel/rcutree.h|1 +
 kernel/rcutree_plugin.h |2 +-
 kernel/rcutree_trace.c  |5 +++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 79954bb..db9bec8 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -287,6 +287,7 @@ struct rcu_data {
longqlen_last_fqs_check;
/* qlen at last check for QS forcing */
unsigned long   n_cbs_invoked;  /* count of RCU cbs invoked. */
+   unsigned long   n_nocbs_invoked; /* count of no-CBs RCU cbs invoked. */
unsigned long   n_cbs_orphaned; /* RCU cbs orphaned by dying CPU */
unsigned long   n_cbs_adopted;  /* RCU cbs adopted from dying CPU */
unsigned long   n_force_qs_snap;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ea960c4..31d3a95 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2406,7 +2406,7 @@ static int rcu_nocb_kthread(void *arg)
trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
ACCESS_ONCE(rdp->nocb_p_count) -= c;
ACCESS_ONCE(rdp->nocb_p_count_lazy) -= cl;
-   rdp->n_cbs_invoked += c;
+   rdp->n_nocbs_invoked += c;
}
return 0;
 }
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 5e9ca3e..167375d 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -146,8 +146,9 @@ static void print_one_rcu_data(struct seq_file *m, struct 
rcu_data *rdp)
   per_cpu(rcu_cpu_kthread_loops, rdp->cpu) & 0x);
 #endif /* #ifdef CONFIG_RCU_BOOST */
seq_printf(m, " b=%ld", rdp->blimit);
-   seq_printf(m, " ci=%lu co=%lu ca=%lu\n",
-  rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted);
+   seq_printf(m, " ci=%lu nci=%lu co=%lu ca=%lu\n",
+  rdp->n_cbs_invoked, rdp->n_nocbs_invoked,
+  rdp->n_cbs_orphaned, rdp->n_cbs_adopted);
 }
 
 static int show_rcudata(struct seq_file *m, void *v)
-- 
1.7.8



[PATCH tip/core/rcu 0/2] v2 Add callback-free CPUs

2012-10-30 Thread Paul E. McKenney
Hello!

RCU callback execution can add significant OS jitter and also can
degrade scheduling latency.  This commit therefore adds the ability
for selected CPUs ("rcu_nocbs=" boot parameter) to have their callbacks
offloaded to kthreads, inspired by Joe Korty's and Jim Houston's JRCU.
If the "rcu_nocb_poll" boot parameter is also specified, these kthreads
will do polling, removing the need for the offloaded CPUs to do wakeups.
At least one CPU must be doing normal callback processing: currently CPU
0 cannot be selected as a no-CBs CPU.  In addition, attempts to offline
the last normal-CBs CPU will fail.

Changes since v1 (https://lkml.org/lkml/2012/9/5/572):

1.  Contains fixes for a few problems located by Fengguang Wu's
kbuild test robot.
2.  Counters are now atomic_long_t rather than atomic_t, as
suggested by Peter Zijlstra.
3.  The rcu_nocbs= kernel boot parameter is now documented.
4.  Fixed a bug introduced by commit c96ea7cf (Avoid spurious
RCU CPU stall warnings) that resulted in boot-time NULL-pointer
dereferences (reported by Paul Gortmaker).
5.  Account for normal and offloaded callbacks separately, so that
offloading is represented in debugfs output.

The patches in this series are as follows:

1.  Offload RCU callbacks based on boot-time kernel parameter.
2.  Account for normal and offloaded callbacks separately, so that
offloading is represented in debugfs output.

Thanx, Paul

 b/Documentation/kernel-parameters.txt |5 
 b/include/trace/events/rcu.h  |1 
 b/init/Kconfig|   19 +
 b/kernel/rcutree.c|   63 -
 b/kernel/rcutree.h|   48 
 b/kernel/rcutree_plugin.h |  399 +-
 b/kernel/rcutree_trace.c  |   12 -
 7 files changed, 528 insertions(+), 19 deletions(-)



RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
Hello, Anvin
Thanks!

Hello, all
Below is my latest version; please review it.
Thanks!
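
As far as the description and code comment in the patch below indicate, the intent is to drop only the RAM entries from the firmware memory map and keep every other entry type. A userspace sketch of that filtering step, with a hypothetical struct mem_entry and remove_type() standing in for the kernel's e820 map and e820_remove_range(), might look like this:

```c
#include <stddef.h>

enum { MEM_RAM = 1, MEM_RESERVED = 2 };   /* illustrative type codes */

/* Hypothetical stand-in for one firmware memory-map entry. */
struct mem_entry {
    unsigned long long addr;
    unsigned long long size;
    int type;
};

/*
 * Drop every entry of type `type` from map[0..n) in place, preserving
 * the order of the remaining entries. Returns the new entry count.
 */
static size_t remove_type(struct mem_entry *map, size_t n, int type)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i].type != type)
            map[kept++] = map[i];
    }
    return kept;
}
```

This differs from clearing the whole map (the e820.nr_map = 0 path) in that reserved ranges remain visible to later code.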

>From 141546c77ff7be523a9e72f5259df4a6827f2c1a Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] If we are doing a crash dump, we still need non-E820_RAM
 memory type address information, which come from BIOS or
 firmware. for example: PCI_MMCONFIG check this address.

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..f8672d0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 * reset.
 */
saved_max_pfn = e820_end_of_ram_pfn();
+
+   /*
+* If we are doing a crash dump, we still need non-E820_RAM
+* memory type address information. so we only remove
+* E820_RAM type.
+*/
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   userdef = 1;
+   return 0;
 #endif
e820.nr_map = 0;
userdef = 1;
-- 
1.7.6


Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com


-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 10:47 AM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 06:26 PM, Zhang, Jun wrote:
> From aebc336baa7ec2d4ccb6f21166770c7d2ee26cba Mon Sep 17 00:00:00 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] To crash dump, we need keep other memory type except  
> E820_RAM, because other type come from BIOS or firmware is  used by 
> other code(for example: PCI_MMCONFIG).

I'm sorry, I can't quite parse the description or the comment... could you 
clarify it a bit?  I think I know what you mean, but there is clearly risk for 
misunderstandings.

-hpa



[PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel module

2012-10-30 Thread zhangyanfei
Signed-off-by: Zhang Yanfei 
---
 arch/x86/kvm/vmx.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ff0ab9..f6a16b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "trace.h"
 
@@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
if (r)
goto out3;
 
+#ifdef CONFIG_KEXEC
+   crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
+#endif
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
free_page((unsigned long)vmx_io_bitmap_b);
free_page((unsigned long)vmx_io_bitmap_a);
 
+#ifdef CONFIG_KEXEC
+   crash_clear_loaded_vmcss = NULL;
+#endif
+
kvm_exit();
 }
 
-- 
1.7.1



[PATCH v3 1/2] x86/kexec: VMCLEAR vmcss on all cpus if necessary

2012-10-30 Thread zhangyanfei
This patch provides a way to VMCLEAR the VMCSs related to guests
on all cpus before executing VMXOFF when doing kdump. This
ensures that the VMCSs in the vmcore are up to date and
not corrupted.

Signed-off-by: Zhang Yanfei 
---
 arch/x86/include/asm/kexec.h |2 ++
 arch/x86/kernel/crash.c  |   25 +
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..fc05440 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -163,6 +163,8 @@ struct kimage_arch {
 };
 #endif
 
+extern void (*crash_clear_loaded_vmcss)(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..9ed65c1 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -30,6 +31,20 @@
 
 int in_crash_kexec;
 
+/*
+ * This is used to VMCLEAR vmcss loaded on all
+ * cpus. And when loading kvm_intel module, the
+ * function pointer will be made valid.
+ */
+void (*crash_clear_loaded_vmcss)(void) = NULL;
+EXPORT_SYMBOL_GPL(crash_clear_loaded_vmcss);
+
+static void cpu_emergency_clear_loaded_vmcss(void)
+{
+   if (crash_clear_loaded_vmcss)
+   crash_clear_loaded_vmcss();
+}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
@@ -46,6 +61,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 #endif
crash_save_cpu(regs, cpu);
 
+   /*
+* VMCLEAR vmcss loaded on all cpus if needed.
+*/
+   cpu_emergency_clear_loaded_vmcss();
+
/* Disable VMX or SVM if needed.
 *
 * We need to disable virtualization on all CPUs.
@@ -88,6 +108,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
kdump_nmi_shootdown_cpus();
 
+   /*
+* VMCLEAR vmcss loaded on this cpu if needed.
+*/
+   cpu_emergency_clear_loaded_vmcss();
+
/* Booting kdump kernel with VMX or SVM enabled won't work,
 * because (among other limitations) we can't disable paging
 * with the virt flags.
-- 
1.7.1



Re: [RFC][PATCH 00/23] Load keys from signed PE binaries

2012-10-30 Thread Rusty Russell
David Howells  writes:

> Hi Rusty,
>
> Here's a set of patches to load a key out of a signed PE format binary:
>
>   
> http://git.kernel.org/?p=linux/kernel/git/dhowells/linux-modsign.git;a=shortlog;h=refs/heads/devel-pekey

AFAICT this is no longer a module issue, so I'm not going to take
these.  Perhaps via the crypto people, or direct to Linus?

Cheers,
Rusty.


Re: Linux-next changes for module and virtio trees.

2012-10-30 Thread Rusty Russell
Stephen Rothwell  writes:
> Hi Rusty,
>
> On Tue, 02 Oct 2012 15:56:56 +0930 Rusty Russell  
> wrote:
>>
>> Please remove my quilt tree
>> http://ozlabs.org/~rusty/kernel/rr-latest/ from linux-next, and use my
>> git trees from now on:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git
>> Branches:
>> modules-next
>> virtio-next
>
> Done.

And please add my 'fixes' branch from the same tree.

Thanks,
Rusty.


[PATCH v3 0/2] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-30 Thread zhangyanfei
Currently, kdump just makes all the logical processors leave VMX operation by
executing the VMXOFF instruction, so any VMCSs active on the logical processors
may be corrupted. But sometimes we need the VMCSs to debug guest images contained
in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs
before executing the VMXOFF instruction.

This patch set provides a way to VMCLEAR the VMCSs related to guests on all cpus
before executing VMXOFF when doing kdump. This ensures that the VMCSs in the
vmcore are up to date and not corrupted.
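
The mechanism used by the series is an optional function-pointer hook: core crash code calls it only if a module has installed it. The pattern can be sketched in plain C as below. All names here (crash_hook, set_crash_hook(), run_crash_hook(), example_clear()) are hypothetical stand-ins, not the identifiers used in the patches.

```c
#include <stddef.h>

/* Hook that an optional component may install; NULL means "nothing to do". */
static void (*crash_hook)(void) = NULL;

/* Hypothetical counter so the effect of the hook can be observed. */
static int hook_calls;

/* Example hook body; a real one would VMCLEAR the loaded VMCSs. */
static void example_clear(void)
{
    hook_calls++;
}

/* Install (or, with NULL, remove) the hook, as a module init/exit pair might. */
static void set_crash_hook(void (*fn)(void))
{
    crash_hook = fn;
}

/* Called on the crash path: run the hook only if one is installed. */
static void run_crash_hook(void)
{
    if (crash_hook)
        crash_hook();
}
```

The NULL check keeps the crash path safe when the module that provides the hook was never loaded or has already been unloaded.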

Changelog from v2 to v3:
1. remove unnecessary conditions in function
   cpu_emergency_clear_loaded_vmcss as Marcelo suggested.

Changelog from v1 to v2:
1. remove the sysctl and clear VMCSs unconditionally.

Zhang Yanfei (2):
  x86/kexec: VMCLEAR vmcss on all cpus if necessary
  KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel
module

 arch/x86/include/asm/kexec.h |2 ++
 arch/x86/kernel/crash.c  |   25 +
 arch/x86/kvm/vmx.c   |9 +
 3 files changed, 36 insertions(+), 0 deletions(-)


Re: [PATCH 1/9] random32: introduce random32_get_bytes() and prandom32_get_bytes()

2012-10-30 Thread Theodore Ts'o
On Tue, Oct 30, 2012 at 08:12:39PM +0900, Akinobu Mita wrote:
> >>
> >> How about prandom32_get_bytes_state() and prandom32_get_bytes() instead?
> >
> > I agree with your suggestion.  I'll rename them and try again.
> >
> > By the way, should we also rename the existing random32() and
> > prandom32() in the future?

I suppose the other way to go is to just use random32 as the common
prefix, and just have random32() and random32_state().  My concern was
that people might assume that prandom32() and random32() might imply
that only prandom32() was the one using a pseudo-random number
generator.  This might be easier since there are large number of uses
of random32() in the source tree, but only a relative few using
prandom32().

 - Ted


Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Al Viro
On Tue, Oct 30, 2012 at 07:48:19PM -0700, Linus Torvalds wrote:
> On Tue, Oct 30, 2012 at 7:24 PM, Al Viro  wrote:
> >
> > BTW, how serious have you been back at KS when you were talking about
> > pull requests killing a thousand of lines of code being acceptable
> > at any point in the cycle?
> 
> Well... I'm absolutely a lot more open to pull requests that kill code
> than not, but I have to admit to being a bit more worried about stuff
> like your execve/fork patches that touch very low-level code.
> 
> So I think I'll punt that for 3.8 anyway.

Oh, well... there go my blackmail plans ;-)  Seriously, though, I'm at loss
regarding several embedded architectures - arch/score, in particular,
seems to be completely orphaned.  As far as I can see, it's
* abandoned by hw vendor (seems like they were planning to push
it game consoles, but that was just before the recession, and...)
* abandoned by primary maintainer, who isn't employed by said
hw vendor anymore, so his old address had been bouncy for several years.
He had bothered to update it in gcc tree, but hadn't been active there
either for almost as long.  And new address in gcc tree is of form
+g...@gmail.com, so using it for kernel-related mail would seem to
be a lousy idea.
* the second maintainer seems to be nearly MIA as well - all I can
find is Acked-by on one commit.  Cc'ed on the kernel_execve() thread, but...
no signs of life whatsoever.
* a lot of asm glue is in "apparently never worked" state, starting
with ptrace hookup (it's clearly started its life as a mips clone, but uses
different registers for passing return value, etc.  TIF_SYSCALL_TRACE side of
that thing still assumes MIPS ABI *and* is suffering obvious bitrot)

Sigh...


Re: [PATCH] staging: csr: Remove struct CsrEvent

2012-10-30 Thread SeongJae Park
Sorry about that, and thank you.

I will not forget it next time.

Thanks and Regards.
SeongJae Park.

On Wed, Oct 31, 2012 at 2:38 AM, Greg KH  wrote:
>
> On Tue, Oct 30, 2012 at 11:26:13AM +0900, SeongJae Park wrote:
> > Nobody use struct CsrEvent. So, remove it.
> > Signed-off-by: SeongJae Park 
>
> Please put a blank line between your changelog text and your
> signed-off-by line, otherwise I have to edit it and do it by hand.  I've
> fixed it up this time.
>
> thanks,
>
> greg k-h


Re: [PATCH] thermal: solve compilation errors in rcar_thermal

2012-10-30 Thread Kuninori Morimoto

Hi Zhang, Andrew

This patch is needed on the latest linus/master branch.
Please re-check this patch.

Also, a similar patch was added to the linux-next/master branch,
b5da4e6d5603633835a1da267e0e699eea66f317
(Thermal: Pass zone parameters as argument to tzd_register),
but it seems wrong (?)

At Tue, 21 Aug 2012 22:01:36 +0530,
Devendra Naga wrote:
> 
> following were the errors reported
> 
> drivers/thermal/rcar_thermal.c: In function ‘rcar_thermal_probe’:
> drivers/thermal/rcar_thermal.c:214:10: warning: passing argument 3 of ‘thermal_zone_device_register’ makes integer from pointer without a cast [enabled by default]
> include/linux/thermal.h:166:29: note: expected ‘int’ but argument is of type ‘struct rcar_thermal_priv *’
> drivers/thermal/rcar_thermal.c:214:10: error: too few arguments to function ‘thermal_zone_device_register’
> include/linux/thermal.h:166:29: note: declared here
> make[1]: *** [drivers/thermal/rcar_thermal.o] Error 1
> make: *** [drivers/thermal/rcar_thermal.o] Error 2
> 
> with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> 
> Signed-off-by: Devendra Naga 
> ---
>  drivers/thermal/rcar_thermal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> index d445271..f7a1b57 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -210,7 +210,7 @@ static int rcar_thermal_probe(struct platform_device *pdev)
>                  goto error_free_priv;
>          }
>  
> -        zone = thermal_zone_device_register("rcar_thermal", 0, priv,
> +        zone = thermal_zone_device_register("rcar_thermal", 0, 0, priv,
>                                  &rcar_thermal_zone_ops, 0, 0);
>          if (IS_ERR(zone)) {
>                  dev_err(&pdev->dev, "thermal zone device is NULL\n");
> -- 
> 1.7.9.5
> 


Best regards
---
Kuninori Morimoto


Re: [PATCH 2/9] uuid: use random32_get_bytes()

2012-10-30 Thread Huang Ying
On Tue, 2012-10-30 at 22:38 -0400, Theodore Ts'o wrote:
> On Wed, Oct 31, 2012 at 09:35:37AM +0800, Huang Ying wrote:
> > 
> > The intention of lib/uuid.c is to unify various UUID related code, and
> > put them in same place.  In addition to UUID generation, it provide some
> > other utility and may provide/collect more in the future.  So do you
> > think it is a good idea to put generate_rand_uuid/guid into lib/uuid.c
> > and maybe change the name/prototype to make it consistent with other
> > uuid definitions?
> 
> I had trouble understanding why lib/uuid.c existed, since the only
> thing I saw was the uuid generation function.  After some more
> looking, I see you also created inline functions which wrapped
> memcmp().
> 
> The problem I have with your abstractions is that it just makes life
> more complicated for the callers.  All of the current places which use
> generate_random_uuid() merely want to fill in a unsigned char array.
> This includes btrfs, by the way, which is already using
> generate_random_uuid in some places, and I'm not sure why they are
> using uuid_le_gen(), since there doesn't seem to be any need for a
> little-endian uuid/guid here (it's just used as unique bag of bits
> which is 16 bytes long), and using uuid_le_gen() means extra memory
> has to be allocated on the stack, and then an extra memory copy is
> required.  Contrast (in fs/btrfs/root-tree.c):
> 
>   uuid_le uuid;
>   ...
>   uuid_le_gen(&uuid);
>   memcpy(item->uuid, uuid.b, BTRFS_UUID_SIZE);
> 
> versus, simply doing (fs/btrfs/volumes.c):
> 
>   generate_random_uuid(fs_devices->fsid);
> 
> see which one is easier?  And after the uuid is generated, none of the
> current callers ever do any manipulation of the uuid, so there's no
> real point to play fancy typedef games; it just adds more work for no
> real gain.

If we use uuid_le when we define the data structure, life will be easier:

struct btrfs_root_item {
...
uuid_le uuid;
...
};

Then it is quite easy to use it.

uuid_le_gen(&item->uuid);

That is the intended usage model.
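
The two calling patterns being contrasted in this thread can be put side by side in a small userspace sketch. Everything here is an assumption made for illustration: the `_sketch` types and functions are hypothetical stand-ins rather than the kernel's uuid_le, uuid_le_gen() or btrfs structures, and rand() is only a placeholder for a real random source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a typed little-endian UUID. */
typedef struct {
	uint8_t b[16];
} uuid_le_sketch;

/* Fill with random bytes, then stamp version 4 and the RFC 4122 variant. */
static void uuid_le_gen_sketch(uuid_le_sketch *lu)
{
	size_t i;

	for (i = 0; i < sizeof(lu->b); i++)
		lu->b[i] = (uint8_t)rand();	/* placeholder, not a real entropy source */
	/* In the little-endian layout the version nibble sits in byte 7. */
	lu->b[7] = (lu->b[7] & 0x0F) | 0x40;
	lu->b[8] = (lu->b[8] & 0x3F) | 0x80;
}

/* Pattern 1: the structure embeds the typed UUID, so it is generated in place. */
struct root_item_typed_sketch {
	uuid_le_sketch uuid;
};

static void fill_typed_sketch(struct root_item_typed_sketch *item)
{
	uuid_le_gen_sketch(&item->uuid);
}

/* Pattern 2: the structure holds a raw byte array, so a temporary and a copy are needed. */
struct root_item_raw_sketch {
	uint8_t uuid[16];
};

static void fill_raw_sketch(struct root_item_raw_sketch *item)
{
	uuid_le_sketch tmp;

	uuid_le_gen_sketch(&tmp);
	memcpy(item->uuid, tmp.b, sizeof(item->uuid));
}
```

With the typed field, the stack temporary and the memcpy() disappear; whether that is worth changing existing structure definitions is exactly what is being debated here.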

The UUID_LE() macro definition has some users.  It makes it easier to
construct UUIDs/GUIDs defined in some specs.

> > > Using UUID vs. GUID I think makes things much clearer, since the EFI
> > > specification talks about GUID's, not UUID's, and that way we don't
> > > have to worry about people getting confused about whether they should
> > > be using the little-endian versus big-endian variant.  (And I'd love
> > > to ask to whoever wrote the EFI specification what on *Earth* were
> > > they thinking when they decided to diverge from the rest of the
> > > world)
> > 
> > I think that is a good idea.  From Wikipedia, GUID is in native byte
> > order, while UUID is in internet byte order.
> 
> Well, technically GUID is "intel/little-endian byte order".  If someone
> tried to implement the GPT on a big-endian system, such as PowerPC,
> they would still have to use the little-endian byte order, even though
> it's not the native byte order for that architecture.  Otherwise
> devices wouldn't be portable between those systems.  (This is why I
> think the GUID was such a bad idea; everyone basically treats them as
> 16 byte octet strings, so this whole idea of "native byte order" just
> to save a few byte swaps at UUID generation time was really, IMHO, a
> very bad idea.)

Yes.  Explicit byte order is better.

Best Regards,
Huang Ying




Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()

2012-10-30 Thread Ming Lei
On Wed, Oct 31, 2012 at 10:08 AM, Ming Lei  wrote:
>> I am afraid it is, because a disk may just have been probed as the device is
>> being reset.
>
> Yes, it is probable, and sounds like similar with 'root_wait' problem, see
> prepare_namespace(): init/do_mounts.c, so looks no good solution
> for the problem, and maybe we have to set the flag always before resetting
> usb device.

The idea below may help with the problem that the 'memalloc_noio' flag isn't set during
usb_reset_device().

- for usb mass storage device, call pm_runtime_set_memalloc_noio(true)
  inside usb_stor_probe2() and uas_probe(), and call
  pm_runtime_set_memalloc_noio(false) inside uas_disconnect()
  and usb_stor_disconnect().

- for usb network device, register_netdev() is always called inside the usb
  interface's probe(), so there looks to be no such problem.

Thanks,
--
Ming Lei


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012-10-30 Thread Frederic Weisbecker
2012/10/31 Steven Rostedt :
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
>
> /*
>  * Claim the entry so that no one else will poke at it.
>  */
> static bool irq_work_claim(struct irq_work *work)
> {
>         unsigned long flags, nflags;
>
>         for (;;) {
>                 flags = work->flags;
>                 if (flags & IRQ_WORK_PENDING)
>                         return false;
>                 nflags = flags | IRQ_WORK_FLAGS;
>                 if (cmpxchg(&work->flags, flags, nflags) == flags)
>                         break;
>                 cpu_relax();
>         }
>
>         return true;
> }
>
> and
>
>         llnode = llist_del_all(this_list);
>         while (llnode != NULL) {
>                 work = llist_entry(llnode, struct irq_work, llnode);
>
>                 llnode = llist_next(llnode);
>
>                 /*
>                  * Clear the PENDING bit, after this point the @work
>                  * can be re-used.
>                  */
>                 work->flags = IRQ_WORK_BUSY;
>                 work->func(work);
>                 /*
>                  * Clear the BUSY bit and return to the free state if
>                  * no-one else claimed it meanwhile.
>                  */
>                 (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
>         }
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
> CPU 1                                   CPU 2
> -----                                   -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>
>                                         if (flags & IRQ_WORK_PENDING)
>                                                 return false
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>
> The above shows the normal case where CPU 2 doesn't need to queue work,
> because it's going to be done for it by CPU 1. But...
>
>
>
> CPU 1                                   CPU 2
> -----                                   -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>                                         (sees flags = 3)
>                                         if (flags & IRQ_WORK_PENDING)
>                                                 return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
> CPU 1                                   CPU 2
> -----                                   -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>                                         (fetch flags)
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
>                                         (sees flags = 3)
>                                         if (flags & IRQ_WORK_PENDING)
>                                                 return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
>         flags = work->flags & ~IRQ_WORK_PENDING;
>         for (;;) {
>                 nflags = flags | IRQ_WORK_FLAGS;
>                 oflags = cmpxchg(&work->flags, flags, nflags);
>                 if (oflags == flags)
>                         break;
>                 if (oflags & IRQ_WORK_PENDING)
>                         return false;
>                 flags = oflags;
>                 cpu_relax();
>         }
>
>
> This now does:
>
> CPU 1                                   CPU 2
> -----                                   -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
>                                         oflags = cmpxchg(&flags, flags, nflags);
>                                         (sees flags = 2)
>                                         if (flags & IRQ_WORK_PENDING)
>                                                 (not true)
>                                         (loop)
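
The claim/run protocol walked through in the quoted diagrams can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions, not the kernel implementation: `struct work_sketch`, `claim_sketch()`, `run_sketch()` and `count_sketch()` are hypothetical names, atomic_exchange() stands in for the kernel's xchg(), atomic_compare_exchange_strong() stands in for cmpxchg(), and the flag values 1, 2 and 3 are taken from the diagrams.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SK_PENDING	1UL
#define SK_BUSY		2UL
#define SK_FLAGS	(SK_PENDING | SK_BUSY)

struct work_sketch {
	_Atomic unsigned long flags;
	void (*func)(struct work_sketch *);
};

/*
 * Claim the work. The decision to give up is based on the value the
 * compare-and-swap actually observed, never on an earlier plain load.
 */
static bool claim_sketch(struct work_sketch *work)
{
	unsigned long flags, nflags;

	/* Start from a guess with PENDING cleared; a stale guess is harmless. */
	flags = atomic_load(&work->flags) & ~SK_PENDING;
	for (;;) {
		nflags = flags | SK_FLAGS;
		/* On failure, 'flags' is updated to the value found in memory. */
		if (atomic_compare_exchange_strong(&work->flags, &flags, nflags))
			return true;
		if (flags & SK_PENDING)
			return false;	/* someone else will run it for us */
	}
}

/* Run one claimed work item. */
static void run_sketch(struct work_sketch *work)
{
	unsigned long expected = SK_BUSY;

	/* Clear PENDING with a full atomic exchange instead of a plain store. */
	atomic_exchange(&work->flags, SK_BUSY);
	work->func(work);
	/* Return to the free state unless the work was claimed again meanwhile. */
	atomic_compare_exchange_strong(&work->flags, &expected, 0UL);
}

/* Example callback: counts how often it ran. */
static int runs_sketch;

static void count_sketch(struct work_sketch *work)
{
	(void)work;
	runs_sketch++;
}
```

A claim made while flags is 3 returns false, while a claim made after the exchange has set flags to 2 succeeds and sets it back to 3, which matches the last diagram above.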

Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2012 at 7:24 PM, Al Viro  wrote:
>
> BTW, how serious have you been back at KS when you were talking about
> pull requests killing a thousand lines of code being acceptable
> at any point in the cycle?

Well... I'm absolutely a lot more open to pull requests that kill code
than not, but I have to admit to being a bit more worried about stuff
like your execve/fork patches that touch very low-level code.

So I think I'll punt that for 3.8 anyway.

 Linus


Re: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread H. Peter Anvin
On 10/30/2012 06:26 PM, Zhang, Jun wrote:
> From aebc336baa7ec2d4ccb6f21166770c7d2ee26cba Mon Sep 17 00:00:00 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] To crash dump, we need keep other memory type except
>  E820_RAM, because other type come from BIOS or firmware is
>  used by other code(for example: PCI_MMCONFIG).

I'm sorry, I can't quite parse the description or the comment... could
you clarify it a bit?  I think I know what you mean, but there is
clearly risk for misunderstandings.

-hpa



Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-30 Thread Greg Kroah-Hartman
On Wed, Oct 31, 2012 at 11:39:48AM +0900, Minchan Kim wrote:
> Greg, what do you think about LTSI?
> Is it a proper feature to add to it? For that, do I still need an ACK from the mm developers?

It's already in LTSI, as it's in the 3.4 kernel, right?



Re: [PATCH 2/9] uuid: use random32_get_bytes()

2012-10-30 Thread Theodore Ts'o
On Wed, Oct 31, 2012 at 09:35:37AM +0800, Huang Ying wrote:
> 
> The intention of lib/uuid.c is to unify various UUID related code, and
> put them in same place.  In addition to UUID generation, it provide some
> other utility and may provide/collect more in the future.  So do you
> think it is a good idea to put generate_rand_uuid/guid into lib/uuid.c
> and maybe change the name/prototype to make it consistent with other
> uuid definitions?

I had trouble understanding why lib/uuid.c existed, since the only
thing I saw was the uuid generation function.  After some more
looking, I see you also created inline functions which wrapped
memcmp().

The problem I have with your abstractions is that it just makes life
more complicated for the callers.  All of the current places which use
generate_random_uuid() merely want to fill in a unsigned char array.
This includes btrfs, by the way, which is already using
generate_random_uuid in some places, and I'm not sure why they are
using uuid_le_gen(), since there doesn't seem to be any need for a
little-endian uuid/guid here (it's just used as unique bag of bits
which is 16 bytes long), and using uuid_le_gen() means extra memory
has to be allocated on the stack, and then an extra memory copy is
required.  Contrast (in fs/btrfs/root-tree.c):

   uuid_le uuid;
   ...
   uuid_le_gen(&uuid);
   memcpy(item->uuid, uuid.b, BTRFS_UUID_SIZE);

versus, simply doing (fs/btrfs/volumes.c):

generate_random_uuid(fs_devices->fsid);

see which one is easier?  And after the uuid is generated, none of the
current callers ever do any manipulation of the uuid, so there's no
real point to play fancy typedef games; it just adds more work for no
real gain.

> > Using UUID vs. GUID I think makes things much clearer, since the EFI
> > specification talks about GUID's, not UUID's, and that way we don't
> > have to worry about people getting confused about whether they should
> > be using the little-endian versus big-endian variant.  (And I'd love
> > to ask to whoever wrote the EFI specification what on *Earth* were
> > they thinking when they decided to diverge from the rest of the
> > world)
> 
> I think that is a good idea.  From Wikipedia, GUID is in native byte
> order, while UUID is in internet byte order.

Well, technically GUID is "intel/little-endian byte order".  If someone
tried to implement the GPT on a big-endian system, such as PowerPC,
they would still have to use the little-endian byte order, even though
it's not the native byte order for that architecture.  Otherwise
devices wouldn't be portable between those systems.  (This is why I
think the GUID was such a bad idea; everyone basically treats them as
16 byte octet strings, so this whole idea of "native byte order" just
to save a few byte swaps at UUID generation time was really, IMHO, a
very bad idea.)
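
To make the byte-order point concrete, here is a small userspace sketch of the difference between the two layouts. It assumes the usual description of a GUID, in which the first three fields (4, 2 and 2 bytes) are stored little-endian while the remaining 8 bytes are plain octets; `uuid_swap_leading_fields()` is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>
#include <string.h>

/*
 * Convert 16 bytes between the big-endian (RFC 4122) layout and the
 * GUID layout, in place. The operation is its own inverse.
 */
static void uuid_swap_leading_fields(uint8_t u[16])
{
	uint8_t t[16];

	memcpy(t, u, sizeof(t));

	/* time_low: 4 bytes, reversed */
	u[0] = t[3];
	u[1] = t[2];
	u[2] = t[1];
	u[3] = t[0];
	/* time_mid: 2 bytes, reversed */
	u[4] = t[5];
	u[5] = t[4];
	/* time_hi_and_version: 2 bytes, reversed */
	u[6] = t[7];
	u[7] = t[6];
	/* clock_seq and node (bytes 8..15) are identical in both layouts */
}
```

The conversion never looks at the host's endianness: the on-disk layout is fixed, which is why a big-endian machine reading a GPT has to perform the same swaps as everyone else.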

Regards,

- Ted


Re: [PATCH V3 4/5] Thermal: Add ST-Ericsson DB8500 thermal driver.

2012-10-30 Thread Viresh Kumar
Sorry for late comments :(

On 30 October 2012 22:19, hongbo.zhang  wrote:
> From: "hongbo.zhang" 
>
> This diver is based on the thermal management framework in thermal_sys.c. A

s/diver/driver

> thermal zone device is created with the trip points to which cooling devices
> can be bound, the current cooling device is cpufreq, e.g. CPU frequency is
> clipped down to cool the CPU, and other cooling devices can be added and bound
> to the trip points dynamically.  The platform specific PRCMU interrupts are
> used to active thermal update when trip points are reached.
>
> Signed-off-by: hongbo.zhang 
> ---
>  .../devicetree/bindings/thermal/db8500-thermal.txt |  40 ++
>  drivers/thermal/Kconfig                            |  20 +
>  drivers/thermal/Makefile                           |   2 +
>  drivers/thermal/db8500_cpufreq_cooling.c           | 108 +
>  drivers/thermal/db8500_thermal.c                   | 531 +
>  include/linux/platform_data/db8500_thermal.h       |  38 ++
>  6 files changed, 739 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/thermal/db8500-thermal.txt
>  create mode 100644 drivers/thermal/db8500_cpufreq_cooling.c
>  create mode 100644 drivers/thermal/db8500_thermal.c
>  create mode 100644 include/linux/platform_data/db8500_thermal.h
>
> diff --git a/Documentation/devicetree/bindings/thermal/db8500-thermal.txt b/Documentation/devicetree/bindings/thermal/db8500-thermal.txt
> new file mode 100644
> index 000..cab6916
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/db8500-thermal.txt
> @@ -0,0 +1,40 @@
> +* ST-Ericsson DB8500 Thermal
> +
> +** Thermal node properties:
> +
> +- compatible : "stericsson,db8500-thermal";
> +- reg : address range of the thermal sensor registers;
> +- interrupts : interrupts generated from PRCMU;
> +- interrupt-names : "IRQ_HOTMON_LOW" and "IRQ_HOTMON_HIGH";

Just mention here whether the properties below are optional or required.

> +- num-trips : number of total trip points;
> +- tripN-temp : temperature of trip point N, should be in ascending order;
> +- tripN-type : type of trip point N, should be one of "active" "passive" "hot" "critical";
> +- tripN-cdev-num : number of the cooling devices which can be bound to trip point N;
> +- tripN-cdev-nameM : name of the No. M cooling device of trip point N;

> diff --git a/drivers/thermal/db8500_thermal.c b/drivers/thermal/db8500_thermal.c

> +static int db8500_thermal_match_cdev(struct thermal_cooling_device *cdev,
> +   struct db8500_trip_point *trip_points)
> +{
> +   int i;
> +   char *cdev_name;
> +
> +   if (!strlen(cdev->type))
> +   return -EINVAL;
> +
> +   for (i = 0; i < COOLING_DEV_MAX; i++) {
> +   cdev_name = trip_points->cdev_name[i];
> +   if (!strcmp(cdev_name, cdev->type))

You can actually remove the cdev_name variable and use
if (!strcmp(trip_points->cdev_name[i], cdev->type))

> +   return 0;
> +   }
> +
> +   return -ENODEV;
> +}

> +#ifdef CONFIG_OF
> +static struct db8500_thsens_platform_data*
> +   db8500_thermal_parse_dt(struct platform_device *pdev)
> +{

> +   for (j = 0; j < tmp_data; j++) {
> +   sprintf(prop_name, "trip%d-cdev-name%d", i, j);
> +   if (of_property_read_string(np, prop_name, &tmp_str))
> +   goto err_parse_dt;
> +
> +   if (strlen(tmp_str) > THERMAL_NAME_LENGTH)
> +   goto err_parse_dt;
> +
> +   strcpy(ptrips->trip_points[i].cdev_name[j], tmp_str);

Do you want to check whether it was copied or not?

> +   }
> +   }
> +   return ptrips;
> +
> +err_parse_dt:
> +   dev_err(&pdev->dev, "Parsing device tree data error.\n");
> +   return NULL;
> +}
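
The two code comments above (dropping the temporary in the match loop, and checking the name copy) can be tried out in a small userspace sketch. All names ending in `_sketch` are hypothetical, the structure is a stand-in for the driver's trip point data, and the array sizes are assumed values chosen for illustration, not taken from the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN_SKETCH		20	/* assumed buffer size, including the NUL */
#define COOLING_DEV_MAX_SKETCH	8	/* assumed number of slots per trip point */

struct trip_point_sketch {
	char cdev_name[COOLING_DEV_MAX_SKETCH][NAME_LEN_SKETCH];
};

/* Match a cooling device type against a trip point, with no temporary. */
static int match_cdev_sketch(const char *type,
			     const struct trip_point_sketch *tp)
{
	int i;

	if (!type[0])
		return -EINVAL;

	for (i = 0; i < COOLING_DEV_MAX_SKETCH; i++) {
		if (!strcmp(tp->cdev_name[i], type))
			return 0;
	}

	return -ENODEV;
}

/* Copy a name and report whether it fit, instead of copying blindly. */
static int copy_name_sketch(char *dst, const char *src)
{
	int n = snprintf(dst, NAME_LEN_SKETCH, "%s", src);

	if (n < 0 || n >= NAME_LEN_SKETCH) {
		dst[0] = '\0';	/* do not leave a truncated name behind */
		return -EINVAL;
	}

	return 0;
}
```

Bounding the copy by the destination size also avoids having to reason separately about whether a length check and the buffer size agree on the terminating NUL.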

After these please add my:

Reviewed-by: Viresh Kumar 


Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-30 Thread Minchan Kim
On Tue, Oct 30, 2012 at 07:16:18PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Oct 31, 2012 at 11:04:43AM +0900, Minchan Kim wrote:
> > Hi Greg,
> > 
> > On Tue, Oct 30, 2012 at 06:42:09PM -0700, Greg Kroah-Hartman wrote:
> > > On Wed, Oct 31, 2012 at 10:06:42AM +0900, Minchan Kim wrote:
> > > > Thanks all,
> > > > 
> > > > At last, everybody who contributes to zsmalloc want to put it under 
> > > > /lib.
> > > > 
> > > > Greg,
> > > > What should I do for promoting this dragging patchset?
> > > 
> > > You need to get the -mm developers to agree that this is something that
> > > > is worth accepting.  I have yet to see any compelling argument why this
> > 
> > I'm one of mm developers. :)
> > Yes. I hope Andrew have a time to take a look.
> > 
> > > even needs to be in the kernel in the first place.
> > 
> > Confused. what do you mean "this"? "zsmalloc" or "zram" or "both"?
> > If you mean "zsmalloc", I guess there were some lengthy thread about
> > "why we need a new another allocator". Unfortunately, I didn't follow it
> > at that time. Nitin, Pekka, Could you point out that thread? or summarize
> > the result.
> > 
> > > 
> > > I'm not moving this anywhere until you get their acceptance.
> > 
> > I understand you.
> > 
> > It's one of problem in current mm mailing list.
> > As you know, many mm guys works for server, not embedded so they don't have
> > > big interest about embedded feature so priority of the feature was always
> > low. CMA proved it and next turn is zram. Even new-comer in mm is few so
> > review bandwidth is always low, too. :(
> > 
> > How can I poke them?
> 
> You just did.  If they ignore this, wait a week, and resend.
> Persistence is key.
> 
> good luck,

Okay. I will wait.
Greg, what do you think about LTSI?
Is it a proper feature to add to it? For that, do I still need an ACK from the mm developers?

> 
> greg k-h
> 

-- 
Kind regards,
Minchan Kim


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-10-30 Thread Sasha Levin
On Tue, Oct 30, 2012 at 9:48 PM, Wen Congyang  wrote:
> At 10/31/2012 09:12 AM, Marcelo Tosatti Wrote:
>> It has been asked earlier why a simple virtio device is not usable
>> for this (with no response IIRC).
>
> 1. We can't use virtio device when the kernel is booting.

So the issue here is the small window between the point the guest
becomes "self aware" and to the point virtio drivers are loaded,
right?

I agree that if something happens during that interval, a
"virtio-notifier" driver won't catch that, but anything beyond that is
better done with a virtio driver, so how is the generic infrastructure
added in this patch useful to anything beyond detecting panics in that
initial interval?

> 2. The virtio's driver can be built as a module, and if it is not loaded
>and the kernel is panicked, there is no way to notify the host.

Even if the suggested virtio-notifier driver is built as a module, it
would get auto-loaded when the guest is booting, so I'm not sure about
this point?

> 3. I/O port is more reliable than virtio device.
>If virtio's driver has some bug, and it cause kernel panicked, we can't
>use it. The I/O port is more reliable because it only depends on notifier
>chain(If we use virtio device, it also depends on notifier chain).

This is like suggesting that we let KVM emulate virtio-blk on its
own, parallel to the virtio implementation, so that even if there's a
problem with virtio-blk, KVM can emulate a virtio-blk on its own.

Furthermore, why stop at virtio? What if the KVM code has a bug and it
doesn't pass IO properly? Or the x86 code? we still want panic
notifications if that happens...


Thanks,
Sasha


Re: Re: [linux-next PATCH] PM / devfreq: documentation cleanups for devfreq header

2012-10-30 Thread 함명주
> On Tuesday, October 30, 2012 08:09:09 AM MyungJoo Ham wrote:
[]
> > 
> > Acked-by: MyungJoo Ham 
> > 
> > Applying to git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git 
> > for-rafael, which is based on rafael's linux-pm.git / linux-next.
> > 
> > http://git.kernel.org/?p=linux/kernel/git/mzx/devfreq.git;a=shortlog;h=refs/heads/for-rafael
> > 
> > I'll apply your "Add sysfs node ..." patch after refactoring with Jonghwa's 
> > (devfreq trans_stat) as the two patches use the same data (list of 
> > available freqs).
> 
> May I assume that you'll handle all of the subsequent devfreq patches too?
> 
> Rafael

Yes, you may. I'll apply the patches (currently at 
http://git.kernel.org/?p=linux/kernel/git/mzx/devfreq.git;a=shortlog;h=refs/heads/for-rafael
 ) and send pull requests to you or Linus.

Anyway, do you want me to keep sending pull requests to you, as you told
me last time?


Cheers,
MyungJoo



Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Al Viro
On Tue, Oct 30, 2012 at 06:25:46PM -0700, Linus Torvalds wrote:

> But whatever. This series has gotten way too much bike-shedding
> anyway. I think it should just be applied, since it does remove lines
> of code overall. I'd even possibly apply it to mainline, but it seems
> to be against linux-next.

BTW, how serious have you been back at KS when you were talking about
pull requests killing a thousand lines of code being acceptable
at any point in the cycle?  Because right now I'm sitting on a pile that
removes 2-3 times as much (~-2KLoC for stuff that got considerable
testing for most of the architectures, -3KLoC if I include fork/clone/vfork
unification series) and seeing how maintainers of a bunch of embedded
architectures seem to be MIA...  The idea of saying "screw them" and sending
a pull request becomes more and more tempting every day ;-)


Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2012 at 6:36 PM, Sasha Levin  wrote:
>
> I can either rebase that on top of mainline, or we can ask maintainers
> to take it to their own trees if you take only 01/16 into mainline.
> What would you prefer?

I don't really care deeply. The only reason to merge it now would be
to avoid any pain with it during the next merge window. Just taking
01/16 might be the sanest way to do that, then the rest can trickle in
independently at their own leisure.

 Linus


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012-10-30 Thread Steven Rostedt
On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> 2012/10/30 anish kumar :
> > As I understand without the memory barrier proposed by you the situation
> > would be as below:
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg  execute_work (sees data from CPU 0)
> >on flags in claim)
> > _success_ in claiming and goes
> > ahead and execute the work(wrong?)
> >  cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag==IRQ_WORK_BUSY
> >
> > Am I right?
> 
> (Adding Paul in Cc because I'm again confused with memory barriers)
> 
> Actually what I had in mind is rather that CPU 0 fails its claim
> because it's not seeing the IRQ_WORK_BUSY flag as it should:
> 
> 
> CPU 0                         CPU 1
> 
> data = something              flags = IRQ_WORK_BUSY
> cmpxchg() for claim           execute_work (sees data from CPU 0)
> 
> CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> value in a non-atomic way.
> 
> Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> It seems we can't reliably use memory barriers here because we would
> be in the following case:
> 
> CPU 0                  CPU 1
> 
> store(work data)       store(flags)
> smp_mb()               smp_mb()
> load(flags)            load(work data)
> 
> On top of this barrier pairing, we can't make the assumption that, for
> example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> the flags stored in CPU 1.
> 
> So now I wonder if cmpxchg() can give us more confidence:

More confidence over what? The xchg()? They are equivalent (wrt memory
barriers).

Here's the issue that currently exists. Let's look at the code:


/*
 * Claim the entry so that no one else will poke at it.
 */
static bool irq_work_claim(struct irq_work *work)
{
        unsigned long flags, nflags;

        for (;;) {
                flags = work->flags;
                if (flags & IRQ_WORK_PENDING)
                        return false;
                nflags = flags | IRQ_WORK_FLAGS;
                if (cmpxchg(&work->flags, flags, nflags) == flags)
                        break;
                cpu_relax();
        }

        return true;
}

and

        llnode = llist_del_all(this_list);
        while (llnode != NULL) {
                work = llist_entry(llnode, struct irq_work, llnode);

                llnode = llist_next(llnode);

                /*
                 * Clear the PENDING bit, after this point the @work
                 * can be re-used.
                 */
                work->flags = IRQ_WORK_BUSY;
                work->func(work);
                /*
                 * Clear the BUSY bit and return to the free state if
                 * no-one else claimed it meanwhile.
                 */
                (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
        }

The irq_work_claim() will only queue its work if it's not already
pending. If it is pending, then someone is going to process it anyway.
But once we start running the function, new work needs to be processed
again.

Thus we have:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]

                                        if (flags & IRQ_WORK_PENDING)
                                                return false
flags = IRQ_WORK_BUSY
(flags = 2)
func()

The above shows the normal case where CPU 2 doesn't need to queue work,
because it's going to be done for it by CPU 1. But...



CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
flags = IRQ_WORK_BUSY
(flags = 2)
func()
                                        (sees flags = 3)
                                        if (flags & IRQ_WORK_PENDING)
                                                return false
cmpxchg(flags, 2, 0);
(flags = 0)


Here, because we did not do a memory barrier after 
flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
and missed the opportunity. Now if you had this fix with the xchg() as
you have in your patch, then CPU 2 would not see the stale flags.
Except, with the code I showed above it still can!

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
                                        (fetch flags)
xchg(&flags, IRQ_WORK_BUSY)
(flags = 2)
func()

Re: [PATCH V3 5/5] Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.

2012-10-30 Thread viresh kumar
On Tue, Oct 30, 2012 at 10:19 PM, hongbo.zhang  wrote:
> From: "hongbo.zhang" 

Just a minor comment below.

> This patch adds device tree properties for ST-Ericsson DB8500 thermal driver,
> also adds the platform data to support the old fashion.
>
> Signed-off-by: hongbo.zhang 

Reviewed-by: Viresh Kumar 

> ---
>  arch/arm/boot/dts/dbx5x0.dtsi  | 14 +
>  arch/arm/boot/dts/snowball.dts | 31 ++
>  arch/arm/configs/u8500_defconfig   |  4 +++
>  arch/arm/mach-ux500/board-mop500.c | 64 ++
>  4 files changed, 113 insertions(+)

> diff --git a/arch/arm/configs/u8500_defconfig b/arch/arm/configs/u8500_defconfig
> index cc5e7a8..34918c4 100644
> --- a/arch/arm/configs/u8500_defconfig
> +++ b/arch/arm/configs/u8500_defconfig
> @@ -118,3 +118,7 @@ CONFIG_DEBUG_KERNEL=y
>  CONFIG_DEBUG_INFO=y
>  # CONFIG_FTRACE is not set
>  CONFIG_DEBUG_USER=y
> +CONFIG_THERMAL=y
> +CONFIG_CPU_THERMAL=y
> +CONFIG_DB8500_THERMAL=y
> +CONFIG_DB8500_CPUFREQ_COOLING=y

Have you entered these manually?? Or used make savedefconfig?


Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-30 Thread Greg Kroah-Hartman
On Wed, Oct 31, 2012 at 11:04:43AM +0900, Minchan Kim wrote:
> Hi Greg,
> 
> On Tue, Oct 30, 2012 at 06:42:09PM -0700, Greg Kroah-Hartman wrote:
> > On Wed, Oct 31, 2012 at 10:06:42AM +0900, Minchan Kim wrote:
> > > Thanks all,
> > > 
> > > At last, everybody who contributes to zsmalloc want to put it under /lib.
> > > 
> > > Greg,
> > > What should I do for promoting this dragging patchset?
> > 
> > You need to get the -mm developers to agree that this is something that
> > is worth accepting.  I have yet to see any compeling argument why this
> 
> I'm one of mm developers. :)
> Yes. I hope Andrew have a time to take a look.
> 
> > even needs to be in the kernel in the first place.
> 
> Confused. what do you mean "this"? "zsmalloc" or "zram" or "both"?
> If you mean "zsmalloc", I guess there were some lengthy thread about
> "why we need a new another allocator". Unfortunately, I didn't follow it
> at that time. Nitin, Pekka, Could you point out that thread? or summarize
> the result.
> 
> > 
> > I'm not moving this anywhere until you get their acceptance.
> 
> I understand you.
> 
> It's one of problem in current mm mailing list.
> As you know, many mm guys works for server, not embedded so they don't have
> big interest about embedded feature so prioirty of the feature was always
> low. CMA proved it and next turn is zram. Even new-comer in mm is few so
> review bandwidth is always low, too. :(
> 
> How can I poke them?

You just did.  If they ignore this, wait a week, and resend.
Persistence is key.

good luck,

greg k-h


Re: Re: [for-next PATCH V2] PM / devfreq: Add sysfs node to expose available frequencies

2012-10-30 Thread MyungJoo Ham
> On Friday, October 26, 2012 06:16:36 AM MyungJoo Ham wrote:
> > > devfreq governors such as ondemand are controlled by a min and
> > > max frequency, while governors like userspace governor allow us
> > > to set a specific frequency.
> > > However, for the same specific device, depending on the SoC, the
> > > available frequencies can vary.
> > > 
> > > So expose the available frequencies as a snapshot over sysfs to
> > > allow informed decisions.
> > > 
> > > This was inspired by cpufreq framework's equivalent for similar
> > > usage sysfs node: scaling_available_frequencies.
> > > 
> > > Cc: Rajagopal Venkat 
> > > Cc: MyungJoo Ham 
> > > Cc: Kyungmin Park 
> > > Cc: "Rafael J. Wysocki" 
> > > Cc: Kevin Hilman 
> > > Cc: linux...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > 
> > > Signed-off-by: Nishanth Menon 
> > 
> > Acked-by: MyungJoo Ham 
> 
> Are you going to handle this patch?

Yes, I've just set up the git repository this week
and I'm willing to handle this one.

It is applied at
http://git.kernel.org/?p=linux/kernel/git/mzx/devfreq.git;a=shortlog;h=refs/heads/for-rafael

Thanks,

MyungJoo

> 
> Rafael
> 
> 
> -- 
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()

2012-10-30 Thread Ming Lei
On Wed, Oct 31, 2012 at 12:30 AM, Oliver Neukum  wrote:
>> If the USB mass-storage device is being reseted, the flag should be set
>> already generally.  If the flag is still unset, that means the disk/network
>> device isn't added into system(or removed just now), so memory allocation
>> with block I/O should be allowed during the reset. Looks it isn't one problem,
>> isn't it?
>
> I am afraid it is, because a disk may just have been probed as the device is
> being reset.

Yes, that is possible, and it sounds similar to the 'root_wait' problem (see
prepare_namespace() in init/do_mounts.c). There seems to be no good solution
for the problem, so maybe we always have to set the flag before resetting
a USB device.


Thanks,
--
Ming Lei


Re: [PATCH V3 0/5] Fix thermal bugs and Upstream ST-Ericsson thermal driver

2012-10-30 Thread viresh kumar
On Tue, Oct 30, 2012 at 10:18 PM, hongbo.zhang  wrote:
> From: "hongbo.zhang" 
>
> V2->V3 Changes:
>
> 2. Update ST-Ericsson thermal driver due to review comments from Viresh Kumar
> and Francesco Lavra.

Do you expect people who want to know what has changed to go and check our
comments? That never happens, so the statement above should have been
more descriptive, detailing all the fixes you have made.

--
viresh


linux-next: manual merge of the block tree with Linus' tree

2012-10-30 Thread Stephen Rothwell
Hi Jens,

Today's linux-next merge of the block tree got a conflict in
drivers/block/floppy.c between a set of common patches from Linus' tree
and commit b33d002f4b6b ("genhd: Make put_disk() safe for disks that have
not been registered") from the block tree.

I fixed it up (by using the block tree version) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




[PATCH] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.

2012-10-30 Thread Jonghwa Lee
This patch adds support for Exynos's emulation mode through a newly created
sysfs node. Exynos 4x12 (4212, 4412) and 5 series provide an emulation mode
for the thermal management unit (TMU). Thermal emulation mode supports
software debugging of the TMU's operation: the user can set a temperature
manually from software, and the TMU will then read the current temperature
from the user value instead of from the sensor.
This patch also includes documentation placed under Documentation/thermal/.

Signed-off-by: Jonghwa Lee 
---
 Documentation/thermal/exynos_thermal_emulation |   49 +
 drivers/thermal/Kconfig|9 +++
 drivers/thermal/exynos_thermal.c   |   89 
 3 files changed, 147 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/thermal/exynos_thermal_emulation

diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation
new file mode 100644
index 000..daf3216
--- /dev/null
+++ b/Documentation/thermal/exynos_thermal_emulation
@@ -0,0 +1,49 @@
+EXYNOS EMULATION MODE
+
+
+Copyright (C) 2012 Samsung Electronics
+
+Written by Jonghwa Lee 
+
+Description
+---
+
+Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit.
+Thermal emulation mode supports software debug for TMU's operation. User can set temperature
+manually with software code and TMU will read current temperature from user value not from
+sensor's value.
+
+Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support available.
+When it's enabled, sysfs node will be created under
+/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'.
+
+The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any
+temperature you want to update to sysfs node, it automatically enables emulation mode and
+current temperature will be changed into it.
+(Exynos also supports user changeable delay time which would be used to delay of
+ changing temperature. However, this node only uses same delay of real sensing time, 938us.)
+
+Disabling emulation mode only requires writing value 0 to sysfs node.
+
+
+  TEMP   120 |
+             |
+         100 |
+             |
+          80 |
+             |                            +-----------
+          60 |                            |          |
+             |              +-------------|          |
+          40 |              |             |          |
+             |              |             |          |
+          20 |              |             |          +----------
+             |              |             |          |          |
+           0 |______________|_____________|__________|__________|_________
+                            A             A          A          A     TIME
+                            |<----->|     |<----->|  |<----->|  |
+                            | 938us |     |       |  |       |  |
+  emulation    :  0         50            70         20         0
+  current temp :   sensor      50            70         20        sensor
+
+
+
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index e1cb6bd..c02a66c 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -55,3 +55,12 @@ config EXYNOS_THERMAL
help
  If you say yes here you get support for TMU (Thermal Managment
  Unit) on SAMSUNG EXYNOS series of SoC.
+
+config EXYNOS_THERMAL_EMUL
+   bool "EXYNOS TMU emulation mode support"
+   depends on !CPU_EXYNOS4210 && EXYNOS_THERMAL
+   help
+ Exynos 4412 and 4414 and 5 series has emulation mode on TMU.
+ Enabling this option will create a sysfs node in the exynos thermal platform
+ device directory to support emulation mode. With the emulation mode sysfs
+ node, you can manually input a temperature to the TMU for simulation purposes.
diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
index fd03e85..baa9108 100644
--- a/drivers/thermal/exynos_thermal.c
+++ b/drivers/thermal/exynos_thermal.c
@@ -99,6 +99,15 @@
 #define IDLE_INTERVAL 1
 #define MCELSIUS   1000
 
+#ifdef CONFIG_EXYNOS_THERMAL_EMUL
+#define EXYNOS_EMUL_TIME   0x57F0
+#define EXYNOS_EMUL_TIME_SHIFT 16
+#define EXYNOS_EMUL_DATA_SHIFT 8
+#define EXYNOS_EMUL_DATA_MASK  0xFF
+#define EXYNOS_EMUL_DISABLE0x0
+#define EXYNOS_EMUL_ENABLE 0x1
+#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
+
 /* CPU Zone information */
 #define PANIC_ZONE  4
 #define WARN_ZONE   3
@@ -832,6 +841,83 @@ static inline struct  exynos_tmu_platform_data *exynos_get_driver_data(
return (struct exynos_tmu_platform_data *)
platform_get_device_id(pdev)->driver_data;
 }
+
+#ifdef CONFIG_EXYNOS_THERMAL_EMUL
+static ssize_t exynos_tmu_emulation_show(struct device *dev,
+struct device_attribute *attr,
+char *buf)
+{
+   struct platform_device *pdev = 
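
The patch is cut off here in the archive, so the code that actually
programs the emulation register is not shown. As a rough sketch of how
the constants defined above could fit together, the helper below packs a
delay, a temperature code and the enable bit into one 32-bit value. The
helper names and the field layout are assumptions drawn only from the
SHIFT/MASK names, and the 24 MHz tick rate is an assumption as well; it
is used because 0x57F0 ticks at that rate come out at the 938us mentioned
in the documentation.

```c
#include <stdint.h>

/* Constants copied from the patch above. */
#define EXYNOS_EMUL_TIME        0x57F0
#define EXYNOS_EMUL_TIME_SHIFT  16
#define EXYNOS_EMUL_DATA_SHIFT  8
#define EXYNOS_EMUL_DATA_MASK   0xFF
#define EXYNOS_EMUL_DISABLE     0x0
#define EXYNOS_EMUL_ENABLE      0x1

/* Assumption: the delay counter ticks at 24 MHz; the patch does not say. */
#define ASSUMED_TICKS_PER_US    24

/* Hypothetical helper: pack delay, temperature code and enable bit. */
static uint32_t emul_reg_value(uint8_t temp_code)
{
    /* Assumption: a code of 0 means "turn emulation off". */
    if (temp_code == 0)
        return EXYNOS_EMUL_DISABLE;

    return ((uint32_t)EXYNOS_EMUL_TIME << EXYNOS_EMUL_TIME_SHIFT) |
           ((uint32_t)(temp_code & EXYNOS_EMUL_DATA_MASK)
                << EXYNOS_EMUL_DATA_SHIFT) |
           EXYNOS_EMUL_ENABLE;
}

/* Delay encoded by EXYNOS_EMUL_TIME under the assumed clock, in us. */
static unsigned int emul_delay_us(void)
{
    return EXYNOS_EMUL_TIME / ASSUMED_TICKS_PER_US;
}
```

How a temperature in degrees maps to the 8-bit code is driver specific
and is deliberately left out of this sketch.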

Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-30 Thread Minchan Kim
Hi Greg,

On Tue, Oct 30, 2012 at 06:42:09PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Oct 31, 2012 at 10:06:42AM +0900, Minchan Kim wrote:
> > Thanks all,
> > 
> > At last, everybody who contributes to zsmalloc want to put it under /lib.
> > 
> > Greg,
> > What should I do for promoting this dragging patchset?
> 
> You need to get the -mm developers to agree that this is something that
> is worth accepting.  I have yet to see any compeling argument why this

I'm one of the mm developers. :)
Yes, I hope Andrew has time to take a look.

> even needs to be in the kernel in the first place.

Confused. What do you mean by "this"? "zsmalloc", "zram", or both?
If you mean "zsmalloc", I guess there was a lengthy thread about
"why we need yet another allocator". Unfortunately, I didn't follow it
at that time. Nitin, Pekka, could you point out that thread, or summarize
the result?

> 
> I'm not moving this anywhere until you get their acceptance.

I understand you.

It's one of the problems with the current mm mailing list.
As you know, many mm people work on servers, not embedded systems, so they
don't have much interest in embedded features, and the priority of such
features has always been low. CMA proved it, and zram is next in line. There
are also few newcomers in mm, so review bandwidth is always low, too. :(

How can I poke them?
The only thing I can do is just (wait, repost) * 5?
Sigh. :(

> 
> greg k-h

-- 
Kind regards,
Minchan Kim


Re: [PATCH 1/2] x86/kexec: VMCLEAR vmcss on all cpus if necessary

2012-10-30 Thread zhangyanfei
On 2012-10-31 08:18, Marcelo Tosatti wrote:
> On Fri, Oct 19, 2012 at 01:44:31PM +0800, Zhang Yanfei wrote:
>> This patch provides a way to VMCLEAR vmcss related to guests
>> on all cpus before executing the VMXOFF when doing kdump. This
>> is used to ensure the VMCSs in the vmcore updated and
>> non-corrupted.
>>
>> Signed-off-by: zhangyanfei 
>> ---
>>  arch/x86/include/asm/kexec.h |2 ++
>>  arch/x86/kernel/crash.c  |   27 +++
>>  2 files changed, 29 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 317ff17..fc05440 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -163,6 +163,8 @@ struct kimage_arch {
>>  };
>>  #endif
>>  
>> +extern void (*crash_clear_loaded_vmcss)(void);
>> +
>>  #endif /* __ASSEMBLY__ */
>>  
>>  #endif /* _ASM_X86_KEXEC_H */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 13ad899..7289976 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -16,6 +16,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  #include 
>> @@ -30,6 +31,22 @@
>>  
>>  int in_crash_kexec;
>>  
>> +/*
>> + * This is used to VMCLEAR vmcss loaded on all
>> + * cpus. And when loading kvm_intel module, the
>> + * function pointer will be made valid.
>> + */
>> +void (*crash_clear_loaded_vmcss)(void) = NULL;
>> +EXPORT_SYMBOL_GPL(crash_clear_loaded_vmcss);
>> +
>> +static void cpu_emergency_clear_loaded_vmcss(void)
>> +{
>> +if (crash_clear_loaded_vmcss &&
>> +cpu_has_vmx() && cpu_vmx_enabled()) {
>> +crash_clear_loaded_vmcss();
>> +}
>> +}
>> +
> 
> Are all this checks necessary? 
> 
> if (crash_clear_loaded_vmcss)
>   crash_clear_loaded_vmcss();
> 
> Should be enough ? (callback only set if kvm-vmx module loaded).

Hmm, it is enough. Thanks.

> 
>>  #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
>>  
>>  static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>> @@ -46,6 +63,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>>  #endif
>>  crash_save_cpu(regs, cpu);
>>  
>> +/*
>> + * VMCLEAR vmcss loaded on all cpus if needed.
>> + */
>> +cpu_emergency_clear_loaded_vmcss();
>> +
>>  /* Disable VMX or SVM if needed.
>>   *
>>   * We need to disable virtualization on all CPUs.
>> @@ -88,6 +110,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>  
>>  kdump_nmi_shootdown_cpus();
>>  
>> +/*
>> + * VMCLEAR vmcss loaded on this cpu if needed.
>> + */
>> +cpu_emergency_clear_loaded_vmcss();
>> +
>>  /* Booting kdump kernel with VMX or SVM enabled won't work,
>>   * because (among other limitations) we can't disable paging
>>   * with the virt flags.
>> -- 
>> 1.7.1
>>



ACPI errors with 3.7-rc3

2012-10-30 Thread Greg KH
Hi Len and Rafael,

With 3.7-rc3, I'm seeing a constant stream of these errors in the kernel
log for my MacBook Pro:

[30443.430133] ACPI: EC: input buffer is not empty, aborting transaction
[30443.430145] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20120913/evregion-501)
[30443.430162] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.SMB0.SBRW] (Node 88045cc64618), AE_TIME (20120913/psparse-536)
[30443.430179] ACPI Error: Method parse/execution failed [\_SB_.BAT0.UBST] (Node 88045cc64988), AE_TIME (20120913/psparse-536)
[30443.430188] ACPI Error: Method parse/execution failed [\_SB_.BAT0._BST] (Node 88045cc648c0), AE_TIME (20120913/psparse-536)
[30443.430202] ACPI Exception: AE_TIME, Evaluating _BST (20120913/battery-464)

They never showed up before in 3.7-rc2.

Anything I should try out to resolve this?

thanks,

greg k-h


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-10-30 Thread Wen Congyang
At 10/31/2012 09:12 AM, Marcelo Tosatti Wrote:
> On Thu, Oct 25, 2012 at 11:42:32AM +0800, Hu Tao wrote:
>> We can know the guest is panicked when the guest runs on xen.
>> But we do not have such feature on kvm.
>>
>> Another purpose of this feature is: management app(for example:
>> libvirt) can do auto dump when the guest is panicked. If management
>> app does not do auto dump, the guest's user can do dump by hand if
>> he sees the guest is panicked.
>>
>> We have three solutions to implement this feature:
>> 1. use vmcall
>> 2. use I/O port
>> 3. use virtio-serial.
>>
>> We have decided to avoid touching hypervisor. The reason why I choose
>> choose the I/O port is:
>> 1. it is easier to implememt
>> 2. it does not depend any virtual device
>> 3. it can work when starting the kernel
> 
> It has been asked earlier why a simple virtio device is not usable
> for this (with no response IIRC).

1. We can't use a virtio device while the kernel is booting.
2. The virtio driver can be built as a module; if it is not loaded
   when the kernel panics, there is no way to notify the host.
3. An I/O port is more reliable than a virtio device.
   If the virtio driver itself has a bug that causes the kernel to panic, we
   can't use it. The I/O port only depends on the notifier chain (a virtio
   device would depend on the notifier chain as well).

Thanks
Wen Congyang

> 
> Also, there is no high level documentation: purpose of the interface,
> how a management application should use it, etc.
> 
> 



Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-30 Thread Greg Kroah-Hartman
On Wed, Oct 31, 2012 at 10:06:42AM +0900, Minchan Kim wrote:
> Thanks all,
> 
> At last, everybody who contributes to zsmalloc want to put it under /lib.
> 
> Greg,
> What should I do for promoting this dragging patchset?

You need to get the -mm developers to agree that this is something that
is worth accepting.  I have yet to see any compelling argument why this
even needs to be in the kernel in the first place.

I'm not moving this anywhere until you get their acceptance.

greg k-h


Console font corruption on tty1 after using xorg in 3.6 with i915

2012-10-30 Thread Ken Moffat
Hi, since 3.6-rc7 I'm *sometimes* seeing console font corruption on
tty1 after I leave xorg [ I'm old enough to use 'startx' ].  This is
with a 512-glyph font.  What seems to be happening is that many
lower-case ASCII letters, and also '0', are replaced by other
glyphs.  Many of these other glyphs happen to be stored at ASCII
values below 'space' in my font, but I doubt that is important.

 Last week I tried to determine where/when this happened, and
managed to get it by using the 3.4 epiphany browser, perhaps only
when accessing googlemail.  But it didn't happen all the time.  I
never saw it with 3.6-rc3, but I did see it with all of -rc7, 3.6.0,
3.6.1.  I then upgraded to 3.6.3 and the problem seemed to have gone.
Unfortunately, tonight it happened again in 3.6.3.

 Is anyone else seeing anything like this ?

 Just to be clear, I normally log in on tty1.  If the problem
occurs, tty2 to tty6 are fine.  The only way I've found to fix the
corruption is to reboot.  I'm mentioning i915 in the subject because
my r600 radeon doesn't have this problem.

ken
-- 
das eine Mal als Tragödie, das andere Mal als Farce


Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Sasha Levin
Hi Linus,

> But whatever. This series has gotten way too much bike-shedding
> anyway. I think it should just be applied, since it does remove lines
> of code overall. I'd even possibly apply it to mainline, but it seems
> to be against linux-next.

Yup, I switched to using -next because I've been running my
trinity/KVM tools tests with it.

I can either rebase that on top of mainline, or we can ask maintainers
to take it to their own trees if you take only 01/16 into mainline.
What would you prefer?


Thanks,
Sasha


Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Steven Rostedt
On Tue, 2012-10-30 at 18:25 -0700, Linus Torvalds wrote:
> On Tue, Oct 30, 2012 at 6:16 PM, Steven Rostedt  wrote:
> >
> > ({\
> > sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits); \
> > })
> >
> > Is the better way to go. We are C programmers, we like to see the ?: on
> > a single line if possible. The way you have it, looks like three
> > statements run consecutively.
> 
> If we're C programmers, why use the non-standard statement-expression
> at all? And split it onto three lines when it's just a single one?

I like the blue color over the pink. Anyway, I was just expressing an
opinion and really didn't care if it was changed or not.


> 
> But whatever. This series has gotten way too much bike-shedding
> anyway. I think it should just be applied, since it does remove lines
> of code overall. I'd even possibly apply it to mainline, but it seems
> to be against linux-next.

I would think this change is a bit too big for an -rc4 release, but
you're the boss.  I've already given my ack for my code that this set
touches. Let it go to Stephen's repo then.

-- Steve




Re: [PATCH 2/9] uuid: use random32_get_bytes()

2012-10-30 Thread Huang Ying
On Tue, 2012-10-30 at 00:48 -0400, Theodore Ts'o wrote:
> On Tue, Oct 30, 2012 at 09:49:58AM +0800, Huang Ying wrote:
> > The uuid_le/be_gen() in lib/uuid.c has set UUID variants to be DCE,
> > that is done in __uuid_gen_common() with "b[8] = (b[8] & 0x3F) | 0x80".
> 
> Oh, I see, I missed that.
> 
> > To deal with random number generation issue, how about use
> > get_random_bytes() in __uuid_gen_common()?
> 
> We already have generate_random_uuid() in drivers/char/random.c, and
> no users for lib/uuid.c's equivalent uuid_be_gen().  So here's a
> counter-proposal, why don't we drop lib/uuid.c, and include in
> drivers/char/random.c:
> 
> /*
>  * Generate random GUID
>  *
>  * GUIDs are like UUIDs, but they use the non-standard little-endian
>  * layout, compared to what is defined in RFC 4122; it is primarily
>  * used by the EFI specification.
>  */
> void generate_random_guid(unsigned char uuid_out[16])
> {
>   get_random_bytes(uuid_out, 16);
>   /* Set UUID version to 4 --- truly random generation */
>   uuid_out[7] = (uuid_out[7] & 0x0F) | 0x40;
>   /* Set the UUID variant to DCE */
>   uuid_out[8] = (uuid_out[8] & 0x3F) | 0x80;
> }
> EXPORT_SYMBOL(generate_random_guid);
> 
> I really don't think it's worth it to have a __uuid_gen_common once we
> are using get_random_bytes(), since there isn't much code to be
> factored out, and it's simpler just to have two functions in one place.

The intention of lib/uuid.c is to unify the various pieces of UUID-related
code and put them in the same place.  In addition to UUID generation, it
provides some other utilities and may collect more in the future.  So do you
think it is a good idea to put generate_rand_uuid/guid into lib/uuid.c
and maybe change the name/prototype to make it consistent with the other
uuid definitions?

> Using UUID vs. GUID I think makes things much clearer, since the EFI
> specification talks about GUID's, not UUID's, and that way we don't
> have to worry about people getting confused about whether they should
> be using the little-endian versus big-endian variant.  (And I'd love
> to ask to whoever wrote the EFI specification what on *Earth* were
> they thinking when they decided to diverge from the rest of the
> world)

I think that is a good idea.  From Wikipedia, GUID is in native byte
order, while UUID is in internet byte order.
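
To make the byte-order point concrete, here is a small userspace sketch
of where the version and variant bits land in the two layouts. The
function names are made up for illustration, and the caller is assumed
to have filled the 16 bytes with random data already (in the kernel that
would be get_random_bytes(), which is not available in userspace).

```c
/* Big-endian (RFC 4122) layout: version nibble is the top of byte 6. */
static void uuid_be_set_v4(unsigned char b[16])
{
    b[6] = (b[6] & 0x0F) | 0x40;    /* version 4: random */
    b[8] = (b[8] & 0x3F) | 0x80;    /* variant: DCE */
}

/*
 * Little-endian GUID layout: the 16-bit time_hi_and_version field is
 * byte-swapped, so the version nibble is the top of byte 7 instead.
 * The variant byte is not part of a swapped field and stays at byte 8.
 */
static void guid_le_set_v4(unsigned char b[16])
{
    b[7] = (b[7] & 0x0F) | 0x40;    /* version 4: random */
    b[8] = (b[8] & 0x3F) | 0x80;    /* variant: DCE */
}
```

The only difference between the two is the index of the version byte,
which matches the uuid_out[7] line in the generate_random_guid() proposal
quoted above.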

Best Regards,
Huang Ying




Re: [PATCH 1/2] x86/kexec: VMCLEAR vmcss on all cpus if necessary

2012-10-30 Thread Marcelo Tosatti
On Fri, Oct 19, 2012 at 01:44:31PM +0800, Zhang Yanfei wrote:
> This patch provides a way to VMCLEAR vmcss related to guests
> on all cpus before executing the VMXOFF when doing kdump. This
> is used to ensure the VMCSs in the vmcore updated and
> non-corrupted.
> 
> Signed-off-by: zhangyanfei 
> ---
>  arch/x86/include/asm/kexec.h |2 ++
>  arch/x86/kernel/crash.c  |   27 +++
>  2 files changed, 29 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 317ff17..fc05440 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -163,6 +163,8 @@ struct kimage_arch {
>  };
>  #endif
>  
> +extern void (*crash_clear_loaded_vmcss)(void);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_KEXEC_H */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 13ad899..7289976 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -30,6 +31,22 @@
>  
>  int in_crash_kexec;
>  
> +/*
> + * This is used to VMCLEAR vmcss loaded on all
> + * cpus. And when loading kvm_intel module, the
> + * function pointer will be made valid.
> + */
> +void (*crash_clear_loaded_vmcss)(void) = NULL;
> +EXPORT_SYMBOL_GPL(crash_clear_loaded_vmcss);
> +
> +static void cpu_emergency_clear_loaded_vmcss(void)
> +{
> + if (crash_clear_loaded_vmcss &&
> + cpu_has_vmx() && cpu_vmx_enabled()) {
> + crash_clear_loaded_vmcss();
> + }
> +}
> +

Are all these checks necessary?

if (crash_clear_loaded_vmcss)
crash_clear_loaded_vmcss();

Should be enough? (The callback is only set if the kvm-vmx module is loaded.)
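
As a userspace illustration of the simplified check, the sketch below
models an optional hook as a function pointer that stays NULL until some
module installs it. All names here are hypothetical stand-ins; nothing in
it touches VMX, and it only shows the calling pattern.

```c
#include <stddef.h>

/* Hypothetical stand-in for the exported hook; NULL until a module sets it. */
static void (*clear_loaded_vmcss_hook)(void) = NULL;

/* Counts invocations, for illustration only. */
static int vmclear_calls;

/* Example callback a module might register (the name is made up). */
static void fake_vmclear_all(void)
{
    vmclear_calls++;
}

/* Crash path: the NULL check is the only guard, as suggested above. */
static void emergency_clear_loaded_vmcss(void)
{
    if (clear_loaded_vmcss_hook)
        clear_loaded_vmcss_hook();
}
```

The reasoning is that a non-NULL pointer already implies the module that
knows about VMX is loaded, so repeating the capability checks in the
caller adds nothing.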

>  #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
>  
>  static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> @@ -46,6 +63,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>  #endif
>   crash_save_cpu(regs, cpu);
>  
> + /*
> +  * VMCLEAR vmcss loaded on all cpus if needed.
> +  */
> + cpu_emergency_clear_loaded_vmcss();
> +
>   /* Disable VMX or SVM if needed.
>*
>* We need to disable virtualization on all CPUs.
> @@ -88,6 +110,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  
>   kdump_nmi_shootdown_cpus();
>  
> + /*
> +  * VMCLEAR vmcss loaded on this cpu if needed.
> +  */
> + cpu_emergency_clear_loaded_vmcss();
> +
>   /* Booting kdump kernel with VMX or SVM enabled won't work,
>* because (among other limitations) we can't disable paging
>* with the virt flags.
> -- 
> 1.7.1
> 


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-10-30 Thread Marcelo Tosatti
On Thu, Oct 25, 2012 at 11:42:32AM +0800, Hu Tao wrote:
> We can know the guest is panicked when the guest runs on xen.
> But we do not have such feature on kvm.
> 
> Another purpose of this feature is: management app(for example:
> libvirt) can do auto dump when the guest is panicked. If management
> app does not do auto dump, the guest's user can do dump by hand if
> he sees the guest is panicked.
> 
> We have three solutions to implement this feature:
> 1. use vmcall
> 2. use I/O port
> 3. use virtio-serial.
> 
> We have decided to avoid touching hypervisor. The reason why I choose
> choose the I/O port is:
> 1. it is easier to implememt
> 2. it does not depend any virtual device
> 3. it can work when starting the kernel

It has been asked earlier why a simple virtio device is not usable
for this (with no response IIRC).

Also, there is no high level documentation: purpose of the interface,
how a management application should use it, etc.



[PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
From aebc336baa7ec2d4ccb6f21166770c7d2ee26cba Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] To crash dump, we need keep other memory type except
 E820_RAM, because other type come from BIOS or firmware is
 used by other code(for example: PCI_MMCONFIG).

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..8760427 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 * reset.
 */
saved_max_pfn = e820_end_of_ram_pfn();
+
+   /*
+* To CRASH DUMP, only remove E820_RAM.
+*  some other memory typecome from BIOS or firmware,
+* it must be same with system kernel.
+*/
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   userdef = 1;
+   return 0;
 #endif
e820.nr_map = 0;
userdef = 1;
-- 
1.7.6
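
The effect the patch is after (drop the RAM entries, keep everything
else the firmware reported) can be pictured with a small userspace
sketch. This is a simplification, not the kernel's e820_remove_range():
the struct and function names are hypothetical, and because the range in
the patch covers the whole address space, partial overlaps are not
modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Type numbers chosen to mirror E820_RAM and E820_RESERVED. */
enum { MEM_RAM = 1, MEM_RESERVED = 2 };

/* Hypothetical, simplified memory map entry. */
struct mem_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Drop every entry of 'type', compacting in place; returns the new count. */
static size_t remove_type(struct mem_entry *map, size_t n, uint32_t type)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i].type != type)
            map[kept++] = map[i];
    }
    return kept;
}
```

After such a pass the reserved regions are still described, and the RAM
regions for the crash kernel would then come from the memmap= options
being parsed.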



Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2012 at 6:16 PM, Steven Rostedt  wrote:
>
> ({\
> sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits); \
> })
>
> Is the better way to go. We are C programmers, we like to see the ?: on
> a single line if possible. The way you have it, looks like three
> statements run consecutively.

If we're C programmers, why use the non-standard statement-expression
at all? And split it onto three lines when it's just a single one?
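
As an illustration of that remark, the size-based selection can be
written as one plain conditional expression, with no statement
expression. This is only a sketch and not a macro proposed in this
thread: toy_hash_32(), toy_hash_64() and toy_hash_min() are
hypothetical stand-ins, and the multipliers are arbitrary odd
constants rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit multiplicative hash; the constant is arbitrary. */
static inline uint32_t toy_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	/* Keep the top 'bits' bits of the 32-bit product. */
	return (uint32_t)(val * 0x9e3779b1u) >> (32 - bits);
}

/* Stand-in for a 64-bit multiplicative hash; the constant is arbitrary. */
static inline uint32_t toy_hash_64(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	/* Keep the top 'bits' bits of the 64-bit product. */
	return (uint32_t)((val * 0x9e3779b97f4a7c15ull) >> (64 - bits));
}

/*
 * Plain conditional expression: sizeof(val) is a compile-time constant,
 * so the compiler can discard the branch that is not taken. The
 * arguments are parenthesized where they are expanded.
 */
#define toy_hash_min(val, bits) \
	(sizeof(val) <= 4 ? toy_hash_32((val), (bits)) : toy_hash_64((val), (bits)))
```

With a uint32_t argument toy_hash_min() yields the toy_hash_32()
result, and with a uint64_t argument the toy_hash_64() result; both
branches have the same type, so the whole thing stays an ordinary
expression.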

But whatever. This series has gotten way too much bike-shedding
anyway. I think it should just be applied, since it does remove lines
of code overall. I'd even possibly apply it to mainline, but it seems
to be against linux-next.

 Linus


[ANNOUNCE] 3.6.4-rt11

2012-10-30 Thread Thomas Gleixner
Dear RT Folks,

I'm pleased to announce the 3.6.4-rt11 release.

Changes since 3.6.3-rt10:

   * Crypto wreckage fix (Milan Broz)

 Another proof why copy and paste should be forbidden, but if that
 would happen most of us would be serving time.

   * Another attempt to tame SLUB

 My previous approach turned out to be too naive though this one
 has at least held up against massive memory stress tests. It's a
 very simple and straightforward approach now and while I'm quite
 sure that it will not fall over as it did before, there might be
 hidden latency issues with that new version.

  So please give it a proper testing!

   * Lazy preemption

 It has become an obsession to mitigate the determinism
 vs. throughput loss of RT. Looking at the mainline semantics of
 preemption points gives a hint why RT sucks throughput wise for
 ordinary SCHED_OTHER tasks. One major issue is the wakeup of
 tasks which are right away preempting the waking task while the
 waking task holds a lock on which the woken task will block right
 after having preempted the waker. In mainline this is prevented
 due to the implicit preemption disable of spin/rw_lock held
 regions. On RT this is not possible due to the fully preemptible
 nature of sleeping spinlocks.

 Though for a SCHED_OTHER task preempting another SCHED_OTHER task
 this is really not a correctness issue. RT folks are concerned
 about SCHED_FIFO/RR tasks preemption and not about the purely
 fairness driven SCHED_OTHER preemption latencies.

 So I introduced a lazy preemption mechanism which only applies to
 SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from
 the existing preempt_count each task now sports a
 preempt_lazy_count which is manipulated on lock acquisition and
 release. This is slightly incorrect as for laziness reasons I
 coupled this to migrate_disable/enable so some other mechanisms
 get the same treatment (e.g. get_cpu_light).

 Now on the scheduler side instead of setting NEED_RESCHED this
 sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER
 preemption and therefore allows the waking task to exit the lock
 held region before the woken task preempts. That also works
 better for cross CPU wakeups as the other side can stay in the
 adaptive spinning loop.

 For RT class preemption there is no change. This simply sets
 NEED_RESCHED and forgoes the lazy preemption counter.

 Initial tests do not expose any observable latency increase,
 but history shows that I've been proven wrong before :)

 The lazy preemption mode is on by default, but with
 CONFIG_SCHED_DEBUG enabled it can be disabled via:

 # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

 and reenabled via

 # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

 The test results so far are very machine and workload dependent,
 but there is a clear trend that it enhances the non RT workload
 performance.

 Please give it a try and share your experience!

Known issues:

  There is still some "softirq pending xx" fallout which I have
  not been able to investigate yet, but that's on my top priority
  list. It's not a critical issue and only annoys people with
  CONFIG_NO_HZ=y configurations.


The delta patch against 3.6.4-rt10 is appended below and can be found
here:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/incr/patch-3.6.4-rt10-rt11.patch.xz


The RT patch against 3.6.4 can be found here:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patch-3.6.4-rt11.patch.xz

The split quilt queue is available at:

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patches-3.6.4-rt11.tar.xz

Enjoy,

tglx

->

Index: linux-stable/arch/x86/Kconfig
===
--- linux-stable.orig/arch/x86/Kconfig
+++ linux-stable/arch/x86/Kconfig
@@ -97,6 +97,7 @@ config X86
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
+   select HAVE_PREEMPT_LAZY
 
 config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS || UPROBES)
Index: linux-stable/arch/x86/kernel/entry_64.S
===
--- linux-stable.orig/arch/x86/kernel/entry_64.S
+++ linux-stable/arch/x86/kernel/entry_64.S
@@ -1003,9 +1003,15 @@ retint_signal:
 ENTRY(retint_kernel)
cmpl $0,TI_preempt_count(%rcx)
jnz  retint_restore_args
-   bt  $TIF_NEED_RESCHED,TI_flags(%rcx)
+   bt   $TIF_NEED_RESCHED,TI_flags(%rcx)
+   jc   1f
+
+   cmpl $0,TI_preempt_lazy_count(%rcx)
+   jnz  retint_restore_args
+   bt   $TIF_NEED_RESCHED_LAZY,TI_flags(%rcx)
jnc  retint_restore_args
-   bt   $9,EFLAGS-ARGOFFSET(%rsp)  /* interrupts off? */
+
+1: bt   

Re: [PATCH 5/6] power: export opp cpufreq functions

2012-10-30 Thread Nishanth Menon
On 16:04-20121030, Mark Langsdorf wrote:
$subject
PM / OPP:
Also adding info that this allows cpufreq drivers to be used as module
might be helpful.
> Signed-off-by: Mark Langsdorf 
> Cc: linux...@vger.kernel.org
Side note:
Applies on v3.7-rc3
on rafael's linux-next branch:
linux-next  2b7f449 Merge branch 'pm-opp-next' into linux-next
on git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
needs a rebase as it probably conflicts with
https://patchwork.kernel.org/patch/1582091/

Otherwise, approach:
Acked-by: Nishanth Menon 

-- 
Regards,
Nishanth Menon


Re: [PATCH v8 01/16] hashtable: introduce a small and naive hashtable

2012-10-30 Thread Steven Rostedt
On Tue, 2012-10-30 at 20:33 -0400, Sasha Levin wrote:
> On Tue, Oct 30, 2012 at 5:42 PM, Tejun Heo  wrote:
> > Hello,
> >
> > Just some nitpicks.
> >
> > On Tue, Oct 30, 2012 at 02:45:57PM -0400, Sasha Levin wrote:
> >> +/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
> >> +#define hash_min(val, bits)                             \
> >> +({                                                      \
> >> +        sizeof(val) <= 4 ?                              \
> >> +        hash_32(val, bits) :                            \
> >> +        hash_long(val, bits);                           \
> >> +})
> >
> > Doesn't the above fit in 80 columns?  Why is it broken into multiple
> > lines?  Also, you probably want () around at least @val.  In general,
> > it's a good idea to add () around any macro argument to avoid nasty
> > surprises.
> 
> It was broken to multiple lines because it looks nicer that way (IMO).
> 
> If we wrap it with () it's going to go over 80, so it's going to stay
> broken down either way :)

({                                                                      \
        sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits);   \
})

Is the better way to go. We are C programmers, we like to see the ?: on
a single line if possible. The way you have it, looks like three
statements run consecutively.

-- Steve



