[PATCH] regulator: core: use regulator name for sysfs

2013-02-20 Thread Shawn Joo
Regulators are named by number on sysfs, e.g. regulator.0, regulator.1,
which makes it hard to find the desired regulator without counting the order.
Add an option, use_name_onsysfs, to use the regulator name instead.
If it is true and name is not NULL, the desc's name is used as the name,
e.g. if name in desc is "LDO0", the device is regulator.LDO0 on sysfs.
Otherwise the original numbering is kept.

Signed-off-by: Shawn Joo 
---
 drivers/regulator/core.c |7 ++-
 include/linux/regulator/driver.h |1 +
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 2785843..4dde54d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -3399,7 +3399,12 @@ regulator_register(const struct regulator_desc *regulator_desc,
 	rdev->dev.class = &regulator_class;
 	rdev->dev.of_node = config->of_node;
 	rdev->dev.parent = dev;
-	dev_set_name(&rdev->dev, "regulator.%d",
+	if (regulator_desc->use_name_onsysfs &&
+	    regulator_desc->name != NULL)
+		dev_set_name(&rdev->dev, "regulator.%s",
+			     regulator_desc->name);
+	else
+		dev_set_name(&rdev->dev, "regulator.%d",
 		     atomic_inc_return(&regulator_no) - 1);
 	ret = device_register(&rdev->dev);
if (ret != 0) {
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index d10bb0f..597f8dd 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -224,6 +224,7 @@ struct regulator_desc {
unsigned int bypass_mask;
 
unsigned int enable_time;
+   bool use_name_onsysfs;
 };
 
 /**
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
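
The naming rule proposed in the patch above can be tried out in user space.
This is only a minimal sketch: struct desc_sketch and format_sysfs_name() are
hypothetical stand-ins, not kernel API, and a plain int replaces the atomic
counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct regulator_desc fields involved. */
struct desc_sketch {
	const char *name;
	bool use_name_onsysfs;
};

/*
 * Format the sysfs device name into buf. The numeric scheme is the
 * fallback, and *next_no only advances when a number is actually used.
 * Returns the snprintf() result.
 */
static int format_sysfs_name(char *buf, size_t len,
			     const struct desc_sketch *desc, int *next_no)
{
	if (desc->use_name_onsysfs && desc->name != NULL)
		return snprintf(buf, len, "regulator.%s", desc->name);

	return snprintf(buf, len, "regulator.%d", (*next_no)++);
}
```

One consequence worth keeping in mind: sysfs cannot hold two entries with the
same name in one directory, so a name-based scheme relies on the names being
unique among the regulators that opt in.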


Re: [PATCH v4 00/32] ldisc patchset

2013-02-20 Thread Sebastian Andrzej Siewior
On 02/20/2013 09:02 PM, Peter Hurley wrote:
> Sebastian, please re-test your g_nokia+dummy_hcd testcase with
> this series.

I've seen your first series but I did not have the time yet. I hope
this will change this weekend.

Sebastian


Re: prctl(PR_SET_MM)

2013-02-20 Thread Amnon Shiloh
Cyrill Gorcunov wrote:

>> Another possibility is to have a dual #if:
>>
>> #if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
>
> Thus this approach looks preferred. And MM_FIELDS_SETTING will be y by default.
> Mind cooking a patch so we can see whether the community accepts it? Don't forget to
> CC Andrew Morton.

Very well, patch attached.

Amnon.
diff -Naur before/init/Kconfig after/init/Kconfig
--- before/init/Kconfig 2013-02-19 10:28:34.0 +1030
+++ after/init/Kconfig  2013-02-21 18:03:48.0 +1030
@@ -999,6 +999,22 @@
 
  If unsure, say N here.
 
+config MM_FIELDS_SETTING
+   bool "Allow modifying per-process memory-region fields"
+   default y
+   help
+  Support "prctl(PR_SET_MM)" which allows applications to modify
+  the following in their "mm_struct":
+
+ start_code, end_code, start_data, end_data, start_brk, brk,
+ start_stack, arg_start, arg_end, env_start, env_end.
+
+   Also to modify their executable file ("/proc/self/exe").
+
+   This option is needed for reconstructing processes (such as when
+   restoring a process from a checkpoint; duplicating a process;
+   or migrating it to another computer).
+
 menuconfig NAMESPACES
bool "Namespaces support" if EXPERT
default !EXPERT
diff -Naur before/kernel/sys.c after/kernel/sys.c
--- before/kernel/sys.c 2013-02-19 10:28:34.0 +1030
+++ after/kernel/sys.c  2013-02-21 17:19:10.0 +1030
@@ -1788,7 +1788,7 @@
return mask;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
struct fd exe;
@@ -1981,18 +1981,22 @@
 	up_read(&mm->mmap_sem);
return error;
 }
+#else /* CONFIG_CHECKPOINT_RESTORE || CONFIG_MM_FIELDS_SETTING */
 
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
-{
-   return put_user(me->clear_child_tid, tid_addr);
-}
-
-#else /* CONFIG_CHECKPOINT_RESTORE */
 static int prctl_set_mm(int opt, unsigned long addr,
unsigned long arg4, unsigned long arg5)
 {
return -EINVAL;
 }
+#endif
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+{
+   return put_user(me->clear_child_tid, tid_addr);
+}
+
+#else
 static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 {
return -EINVAL;
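
The effect of the dual #if can be seen in isolation with a small user-space
sketch. The CONFIG_* names below are ordinary preprocessor symbols standing in
for the Kconfig options, and set_mm_sketch() and get_tid_address_sketch() are
hypothetical placeholders for the real handlers.

```c
#include <errno.h>

/*
 * Build with e.g. -DCONFIG_MM_FIELDS_SETTING or -DCONFIG_CHECKPOINT_RESTORE
 * to switch the branches.
 */
#if defined(CONFIG_CHECKPOINT_RESTORE) || defined(CONFIG_MM_FIELDS_SETTING)
#define SET_MM_ENABLED 1
static int set_mm_sketch(void)
{
	return 0;		/* real code would modify the mm fields */
}
#else
#define SET_MM_ENABLED 0
static int set_mm_sketch(void)
{
	return -EINVAL;		/* compiled out: reject the request */
}
#endif

/* The tid-address helper stays tied to checkpoint/restore alone. */
#ifdef CONFIG_CHECKPOINT_RESTORE
#define GET_TID_ENABLED 1
static int get_tid_address_sketch(void)
{
	return 0;
}
#else
#define GET_TID_ENABLED 0
static int get_tid_address_sketch(void)
{
	return -EINVAL;
}
#endif
```

With only CONFIG_MM_FIELDS_SETTING defined, the set path is available while
the tid-address path still returns -EINVAL, which is the split the patch is
aiming for.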


Re: [PATCH 01/11] ARM: disable virt_to_bus/virt_to_bus almost everywhere

2013-02-20 Thread Vineet Gupta
>> I guess you'll have to do something similar for arch/metag, and Vineet
>> will do it for arch/arc.

After getting the tip-bot msg about Stephen's patch for -mm, I never saw it in
-next and thus was not sure how or when it would start showing up in -next.

> 
> Yes, I have a little series of patches to fix this sort of harmless
> thing in arch/metag, removing GENERIC_SIGALTSTACK, HAVE_IRQ_WORK,
> ARCH_NO_VIRT_TO_BUS, and CONFIG_EXPERIMENTAL from defconfigs. I wasn't
> sure how it's normally handled though, so I was just going to wait until
> the other things go in before requesting a second pull.

Yes I also have a TODO list which certainly had most of those - but thanks for
the complete list James :-)


>> Stephen's patch is currently in Andrew's tree, and I don't see an easy
>> way to coordinate this. The patch we will need once both are merged
>> is below.
>>
>> Signed-off-by: Arnd Bergmann 
> 
> FWIW:
> Acked-by: James Hogan  (for arch/metag)
> 
> Cheers
> James

Acked-by: Vineet Gupta  (for arch/arc)

So this means Stephen's patch, once merged, will take care of existing in-tree
arches and yours will take care of metag/arc.

Thx,
-Vineet


Re: io_apic.c --> "nr_ioapics" not initialized !

2013-02-20 Thread Armin Steinhoff

Thomas Gleixner wrote:
> On Wed, 20 Feb 2013, Armin Steinhoff wrote:
>
>> after a walk through the module "io_apic.c" in
>> "/usr/src/linux/arch/x86/kernel/apic" I got the impression that the variable
>> "nr_ioapics" is used but isn't initialized !
>> Could it be the source of boot problems ?
>
> Well no, unless your compiler is silly.
>
> arch/x86/kernel/apic/io_apic.c:int nr_ioapics;
>
> That's initialized to 0

My "silly compiler" does it only during load time

--Armin

>> Dangerous coding style in "static struct resource * __init
>> ioapic_setup_resources(int nr_ioapics)"
>
> Though the brilliant brain who decided to name the argument of
> ioapic_setup_resources() the same as a global variable and of course
> the call site of it to do:
>
>  ioapic_res = ioapic_setup_resources(nr_ioapics);
>
> Brilliant. Unfortunately that's completely correct C code. Though it's
> confusing as hell and definitely worth fixing. Patch below.
>
> Thanks,
>
> tglx

Index: linux-2.6/arch/x86/kernel/apic/io_apic.c
===
--- linux-2.6.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-2.6/arch/x86/kernel/apic/io_apic.c
@@ -3637,25 +3637,25 @@ void __init setup_ioapic_dest(void)
 
 static struct resource *ioapic_resources;
 
-static struct resource * __init ioapic_setup_resources(int nr_ioapics)
+static struct resource * __init ioapic_setup_resources(int cnt)
 {
 	unsigned long n;
 	struct resource *res;
 	char *mem;
 	int i;
 
-	if (nr_ioapics <= 0)
+	if (cnt <= 0)
 		return NULL;
 
 	n = IOAPIC_RESOURCE_NAME_SIZE + sizeof(struct resource);
-	n *= nr_ioapics;
+	n *= cnt;
 
 	mem = alloc_bootmem(n);
 	res = (void *)mem;
 
-	mem += sizeof(struct resource) * nr_ioapics;
+	mem += sizeof(struct resource) * cnt;
 
-	for (i = 0; i < nr_ioapics; i++) {
+	for (i = 0; i < cnt; i++) {
 		res[i].name = mem;
 		res[i].flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 		snprintf(mem, IOAPIC_RESOURCE_NAME_SIZE, "IOAPIC %u", i);
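
Both points from this exchange fit in a few lines of standalone C. The sketch
below uses made-up names (nr_items, count_plus_one) rather than the kernel's:
a file-scope int without an initializer has static storage duration and
therefore starts at 0, and a parameter with the same name as a global is legal
and simply hides the global inside the function.

```c
/* File-scope object: static storage duration, so it starts at 0. */
int nr_items;

/* The parameter deliberately shadows the global, as in the original code. */
static int count_plus_one(int nr_items)
{
	return nr_items + 1;	/* refers to the parameter, not the global */
}

/* Same behaviour with an unambiguous name, mirroring the rename to cnt. */
static int count_plus_one_renamed(int cnt)
{
	return cnt + 1;
}
```

Compilers such as gcc can report this kind of shadowing with -Wshadow, which
is one way to catch it before it confuses a reader.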





[PATCH V2] spi: tegra114: add spi driver

2013-02-20 Thread Laxman Dewangan
Add SPI driver for NVIDIA's Tegra114 SPI controller. This controller
differs from the SPI controller on older SoCs in internal design as
well as register interface.

This driver supports:
- non-DMA transfers for smaller transfers, i.e. less than the FIFO depth.
- APB DMA based transfers for larger transfers, i.e. more than the FIFO depth.
- Clock gating through runtime PM callbacks.
- Registration through DT only.
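
The split between the two transfer paths can be pictured with a small sketch.
Everything here is illustrative: FIFO_DEPTH_WORDS, enum xfer_mode and
choose_xfer_mode() are hypothetical names, the depth value is an assumption,
and sending a transfer of exactly FIFO depth down the non-DMA path is a guess,
since the description above only covers "less than" and "more than".

```c
#include <stddef.h>

/* Assumed FIFO depth in words; the real value is a property of the hardware. */
#define FIFO_DEPTH_WORDS 64

enum xfer_mode { XFER_PIO, XFER_DMA };

/* Pick the non-DMA path for short transfers and DMA for anything longer. */
static enum xfer_mode choose_xfer_mode(size_t len_words)
{
	return len_words > FIFO_DEPTH_WORDS ? XFER_DMA : XFER_PIO;
}
```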

Signed-off-by: Laxman Dewangan 
---
Changes from V1:
- All nit cleanups for nomenclature like Nvidia to NVIDIA, spi to SPI etc.
- Remove bits_per_word check as all transfers have this parameter valid.
- Cleanups in DT parsing and remove DT node validity check.
- Use devm_ioremap_resource for mapping physical address.

 .../bindings/spi/nvidia,tegra114-spi.txt   |   25 +
 drivers/spi/Kconfig|8 +
 drivers/spi/Makefile   |1 +
 drivers/spi/spi-tegra114.c | 1246 
 4 files changed, 1280 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt
 create mode 100644 drivers/spi/spi-tegra114.c

diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt
new file mode 100644
index 000..c6457e9
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt
@@ -0,0 +1,25 @@
+NVIDIA Tegra114 SPI controller.
+
+Required properties:
+- compatible : should be "nvidia,tegra114-spi".
+- reg: Should contain SPI registers location and length.
+- interrupts: Should contain SPI interrupts.
+- nvidia,dma-request-selector : The Tegra DMA controller's phandle and
+  request selector for this SPI controller.
+
+Recommended properties:
+- spi-max-frequency: Definition as per
+ Documentation/devicetree/bindings/spi/spi-bus.txt
+Example:
+
+spi@7000d600 {
+   compatible = "nvidia,tegra114-spi";
+   reg = <0x7000d600 0x200>;
+   interrupts = <0 82 0x04>;
+   nvidia,dma-request-selector = < 16>;
+   spi-max-frequency = <2500>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   status = "disabled";
+};
+
diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index f80eee7..10699bb 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -405,6 +405,14 @@ config SPI_TEGRA20_SFLASH
  The main usecase of this controller is to use spi flash as boot
  device.
 
+config SPI_TEGRA114
+   tristate "NVIDIA Tegra114 SPI Controller"
+   depends on ARCH_TEGRA && TEGRA20_APB_DMA
+   help
+	  SPI driver for NVIDIA Tegra114 SPI Controller interface. This controller
+	  is different than the older SoCs SPI controller and also register interface
+	  get changed with this controller.
+
 config SPI_TEGRA20_SLINK
tristate "Nvidia Tegra20/Tegra30 SLINK Controller"
depends on ARCH_TEGRA && TEGRA20_APB_DMA
diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile
index e53c309..ee2a119 100644
--- a/drivers/spi/Makefile
+++ b/drivers/spi/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPI_SH_SCI)  += spi-sh-sci.o
 obj-$(CONFIG_SPI_SIRF) += spi-sirf.o
 obj-$(CONFIG_SPI_TEGRA20_SFLASH)   += spi-tegra20-sflash.o
 obj-$(CONFIG_SPI_TEGRA20_SLINK)+= spi-tegra20-slink.o
+obj-$(CONFIG_SPI_TEGRA114) += spi-tegra114.o
 obj-$(CONFIG_SPI_TI_SSP)   += spi-ti-ssp.o
 obj-$(CONFIG_SPI_TLE62X0)  += spi-tle62x0.o
 obj-$(CONFIG_SPI_TOPCLIFF_PCH) += spi-topcliff-pch.o
diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
new file mode 100644
index 000..35b1e94
--- /dev/null
+++ b/drivers/spi/spi-tegra114.c
@@ -0,0 +1,1246 @@
+/*
+ * SPI driver for NVIDIA's Tegra114 SPI Controller.
+ *
+ * Copyright (c) 2013, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SPI_COMMAND1   0x000
+#define SPI_BIT_LENGTH(x)  (((x) & 0x1f) << 0)
+#define SPI_PACKED (1 << 5)
+#define SPI_TX_EN  (1 << 

Re: [Bug fix PATCH 0/2] Make whatever node kernel resides in un-hotpluggable.

2013-02-20 Thread Tang Chen

Hi Andrew,

Please see below. :)

On 02/21/2013 05:36 AM, Andrew Morton wrote:


Also, please review the changelogging for these:

page_alloc-add-movable_memmap-kernel-parameter.patch
page_alloc-add-movable_memmap-kernel-parameter-fix.patch
page_alloc-add-movable_memmap-kernel-parameter-fix-fix.patch
page_alloc-add-movable_memmap-kernel-parameter-fix-fix-checkpatch-fixes.patch
page_alloc-add-movable_memmap-kernel-parameter-fix-fix-fix.patch
page_alloc-add-movable_memmap-kernel-parameter-rename-movablecore_map-to-movablemem_map.patch


**
Add functions to parse movablemem_map boot option.  Since the option
could be specified more than once, all the maps will be stored in the
global variable movablemem_map.map array.

And also, we keep the array in monotonic increasing order by start_pfn,
and merge all overlapped ranges.
**
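
The "sorted by start_pfn, overlaps merged" invariant described in this
changelog can be sketched in plain C. This is not the code from the patch:
struct range_map, MAX_RANGES and range_map_insert() are hypothetical names,
ranges are taken as half-open [start, end), and ranges that merely touch are
merged as well, which is an assumption.

```c
#include <stddef.h>

#define MAX_RANGES 32	/* hypothetical capacity, not the kernel's limit */

struct pfn_range {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

struct range_map {
	size_t nr;
	struct pfn_range map[MAX_RANGES];
};

/*
 * Insert [start, end) keeping map[] sorted by start_pfn and merging
 * any ranges that overlap or touch the new one.
 * Returns 0 on success, -1 if the range is empty or the array is full.
 */
static int range_map_insert(struct range_map *m,
			    unsigned long start, unsigned long end)
{
	size_t pos, last, i, removed;

	if (start >= end)
		return -1;

	/* First existing range that is not entirely below the new one. */
	for (pos = 0; pos < m->nr; pos++)
		if (m->map[pos].end_pfn >= start)
			break;

	/* Absorb every range that overlaps or touches [start, end). */
	for (last = pos; last < m->nr; last++) {
		if (m->map[last].start_pfn > end)
			break;
		if (m->map[last].start_pfn < start)
			start = m->map[last].start_pfn;
		if (m->map[last].end_pfn > end)
			end = m->map[last].end_pfn;
	}

	removed = last - pos;	/* ranges replaced by the merged one */
	if (removed == 0) {
		if (m->nr == MAX_RANGES)
			return -1;
		/* Shift the tail up by one to open a slot at pos. */
		for (i = m->nr; i > pos; i--)
			m->map[i] = m->map[i - 1];
		m->nr++;
	} else if (removed > 1) {
		/* Close the gap left by the absorbed ranges. */
		for (i = last; i < m->nr; i++)
			m->map[i - (removed - 1)] = m->map[i];
		m->nr -= removed - 1;
	}

	m->map[pos].start_pfn = start;
	m->map[pos].end_pfn = end;
	return 0;
}
```

Merging at insertion time keeps the array valid after every parsed option, so
no separate sort or merge pass is needed once parsing is done.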



memory-hotplug-remove-sys-firmware-memmap-x-sysfs.patch
memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix.patch
memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix.patch
memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix.patch
memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix-fix.patch
memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix-fix-fix.patch


**
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start,
type} sysfs files are created.  But there is no code to remove these
files.  This patch implements the function to remove them.

We cannot free firmware_map_entry which is allocated by bootmem because
there is no way to do so when the system is up. But we can at least remember
the address of that memory and reuse the storage when the memory is added
next time.

This patch also introduces a new list map_entries_bootmem to link the map
entries allocated by bootmem when they are removed, and a lock to protect it.

And these entries will be reused when the memory is hot-added again.

The idea was suggested by Andrew Morton.

NOTE: It is unsafe to return an entry pointer and release the map_entries_lock.

      So we should not hold the map_entries_lock separately in
      firmware_map_find_entry() and firmware_map_remove_entry(). Hold the
      map_entries_lock across find and remove /sys/firmware/memmap/X operation.

      And also, users of these two functions need to be careful to hold the lock
      when using these two functions.
**



memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap.patch
memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix.patch
memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix.patch
memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix-fix.patch
memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix-fix-fix.patch


**
For removing memmap region of sparse-vmemmap which is allocated by bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem(). So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().

NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
      and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert
      register_page_bootmem_info_node() when platform doesn't support it.

      It's implemented by adding a new Kconfig option named
      CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by
      archs that fully support the memory-hotplug feature (currently only x86_64).

      Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE
      used for memory hot-add and hot-remove separately, and the code in
      register_page_bootmem_info_node() is only used for collecting information
      for hot-remove, place it under MEMORY_HOTREMOVE.

      Besides, page_isolation.c, selected by MEMORY_ISOLATION under MEMORY_HOTPLUG,
      is also such a case, so move it too.
**



memory-hotplug-common-apis-to-support-page-tables-hot-remove.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix-fix-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix-fix-fix-fix.patch
memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix-fix-fix-fix-fix.patch


**
When memory is removed, the corresponding pagetables should also be removed.
This patch introduces some common APIs to support removing vmemmap pagetables
and x86_64 architecture direct mapping pagetables.

All pages of virtual mapping in removed memory cannot be freed if some pages
used as PGD/PUD include not only removed memory 

Re: [PATCH 7/7] ACPI / scan: Make memory hotplug driver use struct acpi_scan_handler

2013-02-20 Thread Yasuaki Ishimatsu

Hi Vasilis,

2013/02/20 19:42, Vasilis Liaskovitis wrote:

Hi Yasuaki,

On Wed, Feb 20, 2013 at 12:35:48PM +0900, Yasuaki Ishimatsu wrote:

Hi Vasilis,

2013/02/20 3:11, Vasilis Liaskovitis wrote:

Hi,

On Sun, Feb 17, 2013 at 04:27:18PM +0100, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Make the ACPI memory hotplug driver use struct acpi_scan_handler
for representing the object used to set up ACPI memory hotplug
functionality and to remove hotplug memory ranges and data
structures used by the driver before unregistering ACPI device
nodes representing memory.  Register the new struct acpi_scan_handler
object with the help of acpi_scan_add_handler_with_hotplug() to allow
user space to manipulate the attributes of the memory hotplug
profile.


Let's consider an example where we want acpi memory device ejection to be safely
handled by userspace. We do the following:

echo 0 > /sys/firmware/acpi/hotplug/memory/autoeject
echo 1 > /sys/firmware/acpi/hotplug/memory/uevents

We successfully hotplug the acpi device:
/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00
and its corresponding memblocks /sys/devices/system/memory/memoryXX are
also successfully onlined.

On an eject request, since uevents == 1, the kernel will emit KOBJ_OFFLINE for:
/sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00

Can userspace know which memblocks in /sys/devices/system/memory/memoryXX/
correspond to the acpi device /sys/devices/LNXSYSTM:00/LNXSYSBUS:00/PNP0C80:00 ?
This will be needed so that userspace tries to offline the memblocks (and only
if successful, issue the eject operation on the acpi device). As far as I see,
we don't create any sysfs links or files for this scenario - can userspace get
this info somehow?




/sys/devices/system/memory/memoryXX/phys_device needs to be properly implemented
for this to work I think, see Documentation/ABI/testing/sysfs-memory

The following test patch works toward that direction. Let me know if it's of
interest or if there are better ideas /comments.


How about using the ../PNP0C80:00/physical_node/resources file?
In my system, the file shows the following information.

$ cat /sys/bus/acpi/devices/PNP0C80\:00/physical_node/resources
state = active
mem 0x0-0x8000
mem 0x1-0x8

It means PNP0C80:00's memory ranges are "0x0-0x7fff" and
"0x1-0x7". On x86, the memory section size is
128MiB. So, if these memory ranges are divided by 128MiB, you can
calculate the memory section numbers as follows:

0x0-0x7fff => 0x0-0x10
0x1-0x7 => 0x20-0xff

But there is one problem: the resources file of added memory is not
created. If that problem is fixed, I think you can use this approach.
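
The division by 128MiB is just a right shift by 27 bits. A minimal sketch,
with SECTION_SIZE_SHIFT and addr_to_section_nr() as hypothetical names and the
addresses in the note below chosen only for illustration:

```c
#include <stdint.h>

#define SECTION_SIZE_SHIFT 27	/* 2^27 bytes = 128MiB */

/* Map a physical address to the number of the 128MiB section holding it. */
static uint64_t addr_to_section_nr(uint64_t phys_addr)
{
	return phys_addr >> SECTION_SIZE_SHIFT;
}
```

For example, a range running from 0x100000000 to 0x7ffffffff would cover
sections 0x20 through 0xff under this rule. Whether section N maps one-to-one
onto a memoryN directory depends on the memory block size of the system, so
that part should be checked rather than assumed.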





thanks for your suggestion. Is this resources file a property of the
physical_node or of each acpi device?

If it's a node-specific file, could there be a chance that adjacent memory
ranges get merged? We'd like these not to get merged.


This information is created by pnpacpi_init().
It seems that:
  - a resources file is created for each acpi_device.
  - the memory ranges do not get merged.

Thanks,
Yasuaki Ishimatsu


I will look more into this property. I don't see it currently in my system
(probably because initial memory is not backed by acpi devices in my
  seabios/virtual machine).




[...]

+int acpi_memory_phys_device(unsigned long start_pfn)
+{
+   struct acpi_memory_device *mem_dev;
+   struct acpi_memory_info *info;
+   unsigned long start_addr = start_pfn << PAGE_SHIFT;
+   int id = 0;
+
+   list_for_each_entry(mem_dev, _mem_device_list, mem_device_list) {
+   list_for_each_entry(info, &mem_dev->res_list, list) {
+   if ((info->start_addr <= start_addr) &&
+   (info->start_addr + info->length > start_addr))
+   return id;
+   }
+   id++;
+   }


I don't think this solves your problem.

When hot adding memory device in my system, consecutive index number is
applied to PNP0C80 as follows:

$ ls /sys/bus/acpi/devices/ |grep PNP0C80
PNP0C80:00
PNP0C80:01  => hot added memory device
PNP0C80:02  => hot added memory device

In this case, we can know PNP0C80:YY by memoryXX/phys_device file.
But if hot removing and adding the same device, index number is changed
as follows:

$ ls /sys/bus/acpi/devices/
PNP0C80:00
PNP0C80:03  => hot added memory device
PNP0C80:04  => hot added memory device

In this case, we cannot know PNP0C80:YY by memoryXX/phys_device file.



thanks, yes you are right. I forgot each new hotplug event will create a new
PNP0C80:XX device where XX always increases. So the hot-add/hot-remove/hot-add
scenario would have a problem.
Then it would be enough to be able to return this monotonically increasing
counter from phys_device instead of the current list iterator. Is this counter
available somewhere in drivers/acpi/scan.c or bus.c? I'll take a look.

thanks,

- Vasilis





Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Michael Wang
On 02/21/2013 02:11 PM, Mike Galbraith wrote:
> On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote: 
>> On 02/20/2013 06:49 PM, Ingo Molnar wrote:
>> [snip]
[snip]
>>
>>  if wake_affine()
>>  new_cpu = select_idle_sibling(curr_cpu)
>>  else
>>  new_cpu = select_idle_sibling(prev_cpu)
>>
>>  return new_cpu
>>
>> Actually that doesn't make sense.
>>
>> I think wake_affine() is trying to check whether move a task from
>> prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
>> won't break balance means curr_cpu is better than prev_cpu for searching
>> the idle cpu?
> 
> You could argue that it's impossible to break balance by moving any task
> to any idle cpu, but that would mean bouncing tasks cross node on every
> wakeup is fine, which it isn't.

I don't get it... could you please give me more detail on how
wake_affine() is related to bouncing?

> 
>> So the new logical in this patch set is:
>>
>>  new_cpu = select_idle_sibling(prev_cpu)
>>  if idle_cpu(new_cpu)
>>  return new_cpu
> 
> So you tilted the scales in favor of leaving tasks in their current
> package, which should benefit large footprint tasks, but should also
> penalize light communicating tasks.

Yes, I'd prefer to wake up the task on a cpu which is:
1. idle
2. close to prev_cpu

So if both curr_cpu and prev_cpu have an idle cpu in their topology, which
one is better? That depends on how the task benefits from cache and on the
balance situation. Whatever the answer, I don't think the benefit is worth the
high cost of wake_affine() in most cases...

Regards,
Michael Wang

> 
> I suspect that much of the pgbench improvement comes from the preemption
> mitigation from keeping 1:N load maximally spread, which is the perfect
> thing to do with such loads.  In all the testing I ever did with it in
> 1:N mode, preemption dominated performance numbers.  Keep server away
> from clients, it has fewer fair competition worries, can consume more
> CPU preemption free, pushing the load collapse point strongly upward.
> 
> -Mike
> 


Re: [PATCH 17/16 V2] virtio-scsi: use virtqueue_add_sgs for command buffers

2013-02-20 Thread Rusty Russell
Asias He  writes:
> On 02/20/2013 05:47 PM, Wanlong Gao wrote:
>> Using the new virtqueue_add_sgs function lets us simplify the queueing
>> path.  In particular, all data protected by the tgt_lock is just gone
>> (multiqueue will find a new use for the lock).
>> 
>> Signed-off-by: Paolo Bonzini 
>> Signed-off-by: Wanlong Gao 
>
> Reviewed-by: Asias He 

Applied.

Unfortunately these won't be in until *next* merge window...

Cheers,
Rusty.


Re: [PATCH 04/16] virtio-blk: use virtqueue_start_buf on bio path

2013-02-20 Thread Rusty Russell
Asias He  writes:

> On 02/19/2013 03:56 PM, Rusty Russell wrote:
>> (This is a respin of Paolo Bonzini's patch, but it calls
>> virtqueue_add_sgs() instead of his multi-part API).
...
> (This subject needs to be changed to reflect using of virtqueue_add_sgs)

Thanks, done.

>> -static inline int __virtblk_add_req(struct virtqueue *vq,
>> - struct virtblk_req *vbr,
>> - unsigned long out,
>> - unsigned long in)
>> +static int __virtblk_add_req(struct virtqueue *vq,
>> + struct virtblk_req *vbr)
>>  {
>> -return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
>> +struct scatterlist hdr, tailer, *sgs[3];
>
> 'status' might be better than 'tailer'. We are using status in other
> places.

Indeed, done.

> Reviewed-by: Asias He 

Thanks,
Rusty.


Re: [PATCHv2 vringh 1/3] remoteproc: Add support for vringh (Host vrings)

2013-02-20 Thread Rusty Russell
Ohad Ben-Cohen  writes:
> Hi Sjur,
>
> On Tue, Feb 12, 2013 at 1:49 PM,   wrote:
>> From: Sjur Brændeland 
>>
>> Add functions for creating, deleting and kicking host-side virtio rings.
>>
>> The host ring is not integrated with virtiqueues and cannot be managed
>> through virtio-config.
>
> Is that an inherent design/issue of vringh or just a description of
> the current vringh code ?

It's by design.  The producer (virtqueue) and consumer (vringh) are two
sides of the same coin, but they do different things.

virtqueue is a slightly higher level abstraction which assumes a
virtio_device, because every user so far has had one.  vringh doesn't,
because it's also aimed to underlie vhost.c which doesn't really have
one.

> This is possible of course thanks to the abstraction provided by
> virtio: remoteproc only implements a set of callbacks which virtio
> invokes when needed.
>
> Do we not want to follow a similar design scheme with vringh ?

Hmm... I clearly jumped the gun, assuming consensus was already reached.
I have put these patches *back* into pending-rebases, and they will not
be merged this merge window.

Cheers,
Rusty.


Re: [PATCH 10/16] virtio_net: use virtqueue_add_sgs[] for command buffers.

2013-02-20 Thread Rusty Russell
Wanlong Gao  writes:
> On 02/19/2013 03:56 PM, Rusty Russell wrote:
>> It's a bit cleaner to hand multiple sgs, rather than one big one.
>> 
>> Signed-off-by: Rusty Russell 
...
>> +BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>>  
>> -ctrl.class = class;
>> -ctrl.cmd = cmd;
>
> The class and cmd assignment of ctrl header is forgotten?
>
> Thanks,
> Wanlong Gao

Good catch, fixed.

Thanks,
Rusty.


Re: [PATCH 00/16] virtio ring rework.

2013-02-20 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 19/02/2013 08:56, Rusty Russell ha scritto:
>> OK, this is (ab)uses some of Paolo's patches.  The first 7 are
>> candidates for this merge window (maybe), the rest I'm not so sure
>> about.
>
> Cool, thanks.
>
>> Thanks,
>> Rusty.
>> 
>> Paolo Bonzini (3):
>>   scatterlist: introduce sg_unmark_end
>>   virtio-blk: reorganize virtblk_add_req
>>   virtio-blk: use virtqueue_add_sgs on req path
>> 
>> Rusty Russell (13):
>>   virtio_ring: virtqueue_add_sgs, to add multiple sgs.
>>   virtio-blk: use virtqueue_start_buf on bio path
>
> Something wrong with author and commit message in this patch.

Re: author.  I mangled your patch pretty badly in that case, so I wasn't
sure you wanted the blame.

I have restored your authorship.

Thanks,
Rusty.


Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Kim Phillips
On Wed, 20 Feb 2013 23:29:58 -0500
Nicolas Pitre  wrote:

> On Wed, 20 Feb 2013, Kim Phillips wrote:
> 
> > On Wed, 20 Feb 2013 10:43:18 -0500
> > Nicolas Pitre  wrote:
> > 
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > > enforcing -O2, the rest being equal.
> > > > 
> > > > For today's compilers, unless the wind changes.
> > > 
> > > We'll adapt if necessary.  Going with -O2 should remain pretty safe 
> > > anyway.
> > 
> > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > 
> > https://patchwork.kernel.org/patch/2101491/
> > 
> > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > __bswap[sd]i2, even with -march=armv6k+.

argh, sorry - that script was testing support for 
__builtin_bswap{16,32,64} directly, which isn't the same as testing
code generation of a byte swap pattern in C.
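
To make the distinction concrete, here are the two things side by side as a
small sketch. swab32_pattern() and swab32_builtin() are names invented for
this illustration; __builtin_bswap32 is the GCC/Clang builtin, so it is
guarded accordingly. What a given compiler emits for either form is exactly
the open question in this thread and is not asserted here.

```c
#include <stdint.h>

/* Open-coded byte swap; the compiler may or may not recognise the pattern. */
static uint32_t swab32_pattern(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x & 0x00ff0000u) >> 8)  |
	       ((x & 0xff000000u) >> 24);
}

#if defined(__GNUC__)
/* Explicit request for the builtin, available in GCC and Clang. */
static uint32_t swab32_builtin(uint32_t x)
{
	return __builtin_bswap32(x);
}
#endif
```

Compiling this file with -S at the optimization level and -march setting of
interest shows which of the two, if either, ends up as a library call.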

> Crap.  OK, assembly code is the way to go then.
> 
> > I'll try working on an assembly version given it probably
> > makes more sense, future-gcc-immunity-wise.
> 
> Agreed.

I'll still try the assembly approach - gcc 4.4's armv6 output looks
worse than both the pre-armv6 and post-armv6 __arch_swab32
implementations currently in use:

    mov     ip, sp
    push    {fp, ip, lr, pc}
    sub     fp, ip, #4
    and     r2, r0, #65280          ; 0xff00
    lsl     ip, r0, #24
    orr     r1, ip, r0, lsr #24
    and     r0, r0, #16711680       ; 0xff0000
    orr     r3, r1, r2, lsl #8
    orr     r0, r3, r0, lsr #8
    ldm     sp, {fp, sp, pc}

Kim
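
Illustration only, not part of the original message: the two things the thread distinguishes are an open-coded byte swap pattern in C and a direct call to the compiler builtin. A minimal sketch of both follows; the function names are hypothetical, and whether a given GCC turns either form into a single `rev` instruction or into a libgcc call is exactly the open question in the thread, so nothing is claimed about the generated code.

```c
#include <stdint.h>

/*
 * Open-coded 32-bit byte swap (shift-and-mask pattern).  A compiler may
 * or may not recognise this idiom and emit one instruction for it.
 */
static inline uint32_t swab32_generic(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/*
 * Same operation through the GCC/Clang builtin.  Depending on compiler
 * version and -march, this may be inlined or become a library call.
 */
static inline uint32_t swab32_builtin(uint32_t x)
{
	return __builtin_bswap32(x);
}
```

Both functions compute the same value; they differ only in how much the result depends on the compiler's pattern matching.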



Re: [PATCH] pci: do not try to assign irq 255

2013-02-20 Thread Hannes Reinecke

On 02/20/2013 05:57 PM, Yinghai Lu wrote:
> On Tue, Feb 19, 2013 at 11:58 PM, Hannes Reinecke  wrote:
>> Apparently this device is meant to use MSI _only_ so the BIOS developer
>> didn't feel the need to assign an INTx here.
>>
>> According to PCI-3.0, section 6.8 (Message Signalled Interrupts):
>>
>>   It is recommended that devices implement interrupt pins to
>>   provide compatibility in systems that do not support MSI
>>   (devices default to interrupt pins). However, it is expected
>>   that the need for interrupt pins will diminish over time.
>>   Devices that do not support interrupt pins due to pin
>>   constraints (rely on polling for device service) may implement
>>   messages to increase performance without adding additional pins.
>>   Therefore, system configuration software must not assume that a
>>   message capable device has an interrupt pin.
>>
>> Which sounds to me as if the implementation is valid...
>
> it seems you mess pin with interrupt line.
>
> current code:
>  unsigned char irq;
>
>  pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
>  dev->pin = irq;
>  if (irq)
>          pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
>  dev->irq = irq;
>
> so if the device does not have interrupt pin implemented, pin should be zero.
> and  pin and irq in dev should
> be all 0.


But the device _has_ an interrupt pin implemented.
The whole point here is that the interrupt line is _NOT_ zero.

00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 
Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04) 
(prog-if 30 [XHCI])

Subsystem: Hewlett-Packard Company Device [103c:179b]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
Interrupt: pin A routed to IRQ 255
Region 0: Memory at d472 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA 
PME(D0-,D1-,D2-,D3hot+,D3cold+)

Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+
Address:   Data: 

So at one point we have to decide that ->irq is not valid, despite 
it being not set to zero.

An alternative fix would be this:

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 68a921d..4a480cb 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -469,6 +469,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
} else {
dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
 pin_name(pin));
+   dev->irq = 0;
}
return 0;
}

Which probably is a better solution, as here ->irq is _definitely_
not valid, so we should reset it to '0' to avoid confusion on upper
layers.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
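
Illustration only, not part of the original message: a userspace sketch of the pin/line logic discussed in this thread. It assumes, as the thread title suggests, that an Interrupt Line value of 255 should be read as "not connected" and mapped to 0. `struct fake_pci_dev`, `fake_pci_read_irq()` and `IRQ_LINE_UNKNOWN` are hypothetical names made up for this sketch; only the two register offsets are the standard config-space offsets. Neither posted patch looks like this.

```c
#include <stdint.h>

#define PCI_INTERRUPT_LINE	0x3c	/* standard config-space offset */
#define PCI_INTERRUPT_PIN	0x3d	/* standard config-space offset */
#define IRQ_LINE_UNKNOWN	0xff	/* hypothetical name for "no connection" */

/* Hypothetical stand-in for struct pci_dev plus its config space. */
struct fake_pci_dev {
	uint8_t config[256];
	uint8_t pin;
	unsigned int irq;
};

/*
 * Mirror of the quoted "current code", with one extra step: a line value
 * of 255 is treated as "no IRQ assigned" and stored as 0.
 */
static void fake_pci_read_irq(struct fake_pci_dev *dev)
{
	uint8_t pin = dev->config[PCI_INTERRUPT_PIN];
	uint8_t line = 0;

	dev->pin = pin;
	if (pin)
		line = dev->config[PCI_INTERRUPT_LINE];
	if (line == IRQ_LINE_UNKNOWN)
		line = 0;	/* assumption: 255 means not routed */
	dev->irq = line;
}
```

With the lspci output above (pin A, line 255), this sketch would leave `pin` set and `irq` at 0, which is the state the alternative ACPI fix also aims for.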


Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Michal Nazarewicz
On Thu, Feb 21 2013, Maxin B. John wrote:
> Hi,
>
> On Thu, Feb 21, 2013 at 2:06 AM, Greg KH  wrote:
>> On Thu, Feb 21, 2013 at 01:57:51AM +0200, maxin.j...@gmail.com wrote:
>>> From: "Maxin B. John" 
>>>
>>> Fixes this build failure:
>>> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
>>> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
>>> In file included from ffs-test.c:41:0:
>>> ../../include/linux/usb/functionfs.h:4:39: fatal error:
>>> uapi/linux/usb/functionfs.h: No such file or directory
>>> compilation terminated.
>>> make: *** [ffs-test] Error 1
>>
>> This is a build failure where, 3.8, or linux-next, or somewhere else?
>
> It is in 3.8

This also happens in 3.7.  [commit
5e1ddb481776a487b15b40579a000b279ce527c9: UAPI: (Scripted) Disintegrate
include/linux/usb] is the culprit.

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--



Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Michal Nazarewicz
On Thu, Feb 21 2013, maxin.j...@gmail.com wrote:
> From: "Maxin B. John" 
>
> Fixes this build failure:
> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
> In file included from ffs-test.c:41:0:
> ../../include/linux/usb/functionfs.h:4:39: fatal error:
> uapi/linux/usb/functionfs.h: No such file or directory
> compilation terminated.
> make: *** [ffs-test] Error 1
>
> Signed-off-by: Maxin B. John 

Acked-by: Michal Nazarewicz 

> ---
>  tools/usb/ffs-test.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/tools/usb/ffs-test.c b/tools/usb/ffs-test.c
> index 8674b9e..fe1e66b 100644
> --- a/tools/usb/ffs-test.c
> +++ b/tools/usb/ffs-test.c
> @@ -38,7 +38,7 @@
>  #include 
>  #include 
>  
> -#include "../../include/linux/usb/functionfs.h"
> +#include "../../include/uapi/linux/usb/functionfs.h"
>  
>  
>  / Little Endian Handling 
> /
> -- 
> 1.7.7
>

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--



RE: linux-next: build failure after merge of the xen-two tree

2013-02-20 Thread Liu, Jinsong
Konrad Rzeszutek Wilk wrote:
 commit 3757b94802fb65d8f696597a74053cf21738da0b
 Author: Rafael J. Wysocki 
 Date:   Wed Feb 13 14:36:47 2013 +0100
 
 ACPI / hotplug: Fix concurrency issues and memory leaks
 
 after which acpi_bus_scan() and acpi_bus_trim() have to be run
 under acpi_scan_lock (new in my tree as well).
>>> 
>>> Yes, we noticed that and only need minor updates at xen side, will
>>> send out 2 xen patches later accordingly, for cleanup and adding
>>> lock. 
>> 
>> Thanks, but those new changes will only make sense after merging the
>> Xen tree with the PM tree.  Why don't we queue them up for merging
>> later after both the Xen and PM trees have been pulled from?
> 
> OK, I've created a branch
> (http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/linux-next-resolved)
> that has your branch and my branch - along with the fix from Stephan
> and then  
> the three updates from Jinsong. Jinsong, please check that I've got
> all the 
> right patches. I will rebase it once Linus has merged both of the Xen
> and PM trees. 

Check done, it's OK.

Thanks,
Jinsong


Re: [PATCH 2/2] block: partition: optimize memory allocation in check_partition

2013-02-20 Thread Yasuaki Ishimatsu
2013/02/21 14:22, Ming Lei wrote:
> Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch,
> so it is easy to trigger page allocation failure by check_partition,
> especially in hotplug block device situation(such as, USB mass storage,
> MMC card, ...), and Felipe Balbi has observed the failure.
> 
> This patch does below optimizations on the allocation of struct
> parsed_partitions to try to address the issue:
> 
> - make parsed_partitions.parts as pointer so that the pointed memory
> can fit in 32KB buffer, then approximate 32KB memory can be saved
> 
> - vmalloc the buffer pointed by parsed_partitions.parts because
> 32KB is still a bit big for kmalloc
> 
> - given that many devices have the partition count limit, so only
> allocate disk_max_parts() partitions instead of 256 partitions always
> 
> Reported-by: Felipe Balbi 
> Signed-off-by: Ming Lei 
> ---

Reviewed-by: Yasuaki Ishimatsu 

Thanks,
Yasuaki Ishimatsu

>   block/partition-generic.c |4 ++--
>   block/partitions/check.c  |   37 -
>   block/partitions/check.h  |4 +++-
>   3 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/block/partition-generic.c b/block/partition-generic.c
> index 1cb4dec..789cdea 100644
> --- a/block/partition-generic.c
> +++ b/block/partition-generic.c
> @@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
> block_device *bdev)
>   int p, highest, res;
>   rescan:
>   if (state && !IS_ERR(state)) {
> - kfree(state);
> + free_partitions(state);
>   state = NULL;
>   }
>   
> @@ -525,7 +525,7 @@ rescan:
>   md_autodetect_dev(part_to_dev(part)->devt);
>   #endif
>   }
> - kfree(state);
> + free_partitions(state);
>   return 0;
>   }
>   
> diff --git a/block/partitions/check.c b/block/partitions/check.c
> index bc90867..19ba207 100644
> --- a/block/partitions/check.c
> +++ b/block/partitions/check.c
> @@ -14,6 +14,7 @@
>*/
>   
>   #include <linux/slab.h>
> +#include <linux/vmalloc.h>
>   #include <linux/ctype.h>
>   #include <linux/genhd.h>
>   
> @@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) 
> = {
>   NULL
>   };
>   
> +static struct parsed_partitions *allocate_partitions(struct gendisk *hd)
> +{
> + struct parsed_partitions *state;
> + int nr;
> +
> + state = kzalloc(sizeof(*state), GFP_KERNEL);
> + if (!state)
> + return NULL;
> +
> + nr = disk_max_parts(hd);
> + state->parts = vzalloc(nr * sizeof(state->parts[0]));
> + if (!state->parts) {
> + kfree(state);
> + return NULL;
> + }
> +
> + state->limit = nr;
> +
> + return state;
> +}
> +
> +void free_partitions(struct parsed_partitions *state)
> +{
> + vfree(state->parts);
> + kfree(state);
> +}
> +
>   struct parsed_partitions *
>   check_partition(struct gendisk *hd, struct block_device *bdev)
>   {
>   struct parsed_partitions *state;
>   int i, res, err;
>   
> - state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
> + state = allocate_partitions(hd);
>   if (!state)
>   return NULL;
>   state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
>   if (!state->pp_buf) {
> - kfree(state);
> + free_partitions(state);
>   return NULL;
>   }
>   state->pp_buf[0] = '\0';
> @@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device 
> *bdev)
>   if (isdigit(state->name[strlen(state->name)-1]))
>   sprintf(state->name, "p");
>   
> - state->limit = disk_max_parts(hd);
>   i = res = err = 0;
>   while (!res && check_part[i]) {
> - memset(&state->parts, 0, sizeof(state->parts));
> + memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
>   res = check_part[i++](state);
>   if (res < 0) {
>   /* We have hit an I/O error which we don't report now.
> @@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device 
> *bdev)
>   printk(KERN_INFO "%s", state->pp_buf);
>   
>   free_page((unsigned long)state->pp_buf);
> - kfree(state);
> + free_partitions(state);
>   return ERR_PTR(res);
>   }
> diff --git a/block/partitions/check.h b/block/partitions/check.h
> index 52b1003..eade17e 100644
> --- a/block/partitions/check.h
> +++ b/block/partitions/check.h
> @@ -15,13 +15,15 @@ struct parsed_partitions {
>   int flags;
>   bool has_info;
>   struct partition_meta_info info;
> - } parts[DISK_MAX_PARTS];
> + } *parts;
>   int next;
>   int limit;
>   bool access_beyond_eod;
>   char *pp_buf;
>   };
>   
> +void free_partitions(struct parsed_partitions *state);
> +
>   struct parsed_partitions *
>   check_partition(struct gendisk *, struct block_device *);
>   
> 
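
Illustration only, not part of the original message: the allocation scheme in the quoted patch (small header allocated first, partition array allocated separately and sized to the device limit, both released by one helper) can be sketched in userspace as follows. `calloc()`/`free()` stand in for `kzalloc()`/`vzalloc()`/`kfree()`/`vfree()`, and all type and function names here are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for one parsed partition slot. */
struct part_entry {
	unsigned long long from;
	unsigned long long size;
	int flags;
};

/* Header holds a pointer instead of a fixed 256-entry array. */
struct parsed_parts {
	struct part_entry *parts;
	int limit;
};

/* Allocate header and array separately; undo the first on failure. */
static struct parsed_parts *parts_alloc(int nr)
{
	struct parsed_parts *state;

	if (nr <= 0)
		return NULL;

	state = calloc(1, sizeof(*state));
	if (!state)
		return NULL;

	state->parts = calloc((size_t)nr, sizeof(state->parts[0]));
	if (!state->parts) {
		free(state);
		return NULL;
	}

	state->limit = nr;
	return state;
}

/* Single release helper so callers never free only half of the object. */
static void parts_free(struct parsed_parts *state)
{
	if (!state)
		return;
	free(state->parts);
	free(state);
}
```

The point of the paired helper is the same as in the patch: once the array is no longer embedded, every former `kfree(state)` site has to release both pieces.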



PATCH: freezer: add fake signal clearing back when thaw task

2013-02-20 Thread Lianwei Wang
Hi Tejun Heo and all,

Commit "34b087e freezer: kill unused set_freezable_with_signal()"
removed the recalc_sigpending*() calls from the freezer, so user tasks
keep the fake TIF_SIGPENDING signal that is set when freezing userspace
processes. The fake signal is left pending for userspace, which causes
wait_event_freezable() and friends in such a task to return a spurious
ERESTARTSYS. This is not good because it wastes CPU time handling the
fake signal.

Can we just call recalc_sigpending() to clear the fake signal for
userspace tasks, as the patch below does?

>From 176fccee178bc0185d92853dd2f521c9166b0853 Mon Sep 17 00:00:00 2001
From: Lianwei Wang 
Date: Mon, 21 Jan 2013 18:21:26 +0800
Subject: [PATCH] freezer: add fake signal clearing back when thaw task

The fake TIF_SIGPENDING is set while freezing userspace processes, but it
is not cleared when thawing tasks since the commit below:
  34b087e freezer: kill unused set_freezable_with_signal()

This causes wait_event_freezable() and friends in a userspace task to
return a spurious ERESTARTSYS. This is not good because it wastes CPU time
handling the fake signal.

Clear the TIF_SIGPENDING flag for userspace tasks when waking up the
frozen task to fix this issue.

Change-Id: I91c90ad2ee9a46c42e3b39a7384ec81e97bc0394
Signed-off-by: Lianwei Wang 
---
 kernel/freezer.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c38893b..09557f6 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -46,6 +46,16 @@ bool freezing_slow_path(struct task_struct *p)
 }
 EXPORT_SYMBOL(freezing_slow_path);

+static void fake_signal_clear(struct task_struct *p)
+{
+ unsigned long flags;
+
+ if (lock_task_sighand(p, &flags)) {
+ recalc_sigpending();
+ unlock_task_sighand(p, &flags);
+ }
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
@@ -74,6 +84,10 @@ bool __refrigerator(bool check_kthr_stop)

  pr_debug("%s left refrigerator\n", current->comm);

+ if (!(current->flags & PF_KTHREAD))
+ if (test_tsk_thread_flag(current, TIF_SIGPENDING))
+ fake_signal_clear(current);
+
  /*
  * Restore saved task state before returning.  The mb'd version
  * needs to be used; otherwise, it might silently break
--
1.7.4.1
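
Illustration only, not part of the posted patch: a toy userspace model of why recomputing the pending state clears a fake signal. It assumes a much simplified task with one pending mask and one blocked mask; the real recalc_sigpending() considers more state than this. All names prefixed with `toy_` are hypothetical.

```c
#include <stdbool.h>

/* Toy task: one flag standing in for TIF_SIGPENDING plus two masks. */
struct toy_task {
	unsigned long pending;	/* bitmask of really queued signals */
	unsigned long blocked;	/* bitmask of blocked signals */
	bool sigpending_flag;	/* stands in for TIF_SIGPENDING */
};

/* Freezer-style fake wakeup: set the flag without queueing a signal. */
static void toy_fake_signal_wake_up(struct toy_task *t)
{
	t->sigpending_flag = true;
}

/* Recompute the flag from what is actually deliverable. */
static void toy_recalc_sigpending(struct toy_task *t)
{
	t->sigpending_flag = (t->pending & ~t->blocked) != 0;
}
```

In this model a task that was only poked by the fake wakeup ends up with the flag cleared after the recalculation, while a task with a real unblocked signal keeps it set.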




Re: [resend] Timer broadcast question

2013-02-20 Thread Santosh Shilimkar

On Tuesday 19 February 2013 11:51 PM, Daniel Lezcano wrote:
> On 02/19/2013 07:10 PM, Thomas Gleixner wrote:
>> On Tue, 19 Feb 2013, Daniel Lezcano wrote:
>>> I am working on identifying the different wakeup sources from the
>>> interrupts and I have a question regarding the timer broadcast.
>>>
>>> The broadcast timer is setup to the next event and that will wake up any
>>> idle cpu belonging to the "broadcast cpumask", right ?
>>>
>>> The cpu which has been woken up will look for each cpu the next-event
>>> and send an IPI to wake it up.
>>>
>>> Although, it is possible the sender of this IPI may not be concerned by
>>> the timer expiration and has been woken up just for sending the IPI, right ?
>>
>> Correct.
>>
>>> If this is correct, is it possible to setup the timer irq affinity to a
>>> cpu which will be concerned by the timer expiration ? so we prevent an
>>> unnecessary wake up for a cpu.
>>
>> It is possible, but we never implemented it.
>>
>> If we go there, we want to make that conditional on a property flag,
>> because some interrupt controllers especially on x86 only allow to
>> move the affinity from interrupt context, which is pointless.
>
> Thanks Thomas for your quick answer. I will write a RFC patchset.


Last year I implemented the affinity hook for the broadcast code and
experimented with it. Since the system I was using was dual core,
it wasn't of much benefit, so I gave up on it later. I do remember
discussing the approach with a few folks at the conference.

The patch for the generic broadcast code is at the end of this email
(also attached). I didn't look at all the corner cases though. In the
arch code you then need to set up a "broadcast_affinity" hook which
should be able to get a handle on the arch irqchip and call the
respective affinity handler. A three-line function should do the trick.

As Thomas said, the effectiveness of such an optimization depends solely
on how well affinity (in low power states) is supported by your IRQ chip.

Hope this is helpful for you.

Regards,
Santosh


From d70f2d48ec08a3f1d73187c49b16e4e60f81a50c Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar 
Date: Wed, 25 Jul 2012 03:42:33 +0530
Subject: [PATCH] tick-broadcast: Add tick broadcast affinity support

The current tick broadcast code has its affinity set to the boot CPU, and
hence the boot CPU will always wake up from low power states when the
broadcast timer is armed, even if the next expiry event doesn't belong to it.

This patch adds broadcast affinity functionality to avoid the above and
lets the tick framework set the affinity of the event to the CPU it belongs to.

Signed-off-by: Santosh Shilimkar 
---
 include/linux/clockchips.h   |2 ++
 kernel/time/tick-broadcast.c |   13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8a7096f..5488cdc 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -95,6 +95,8 @@ struct clock_event_device {
unsigned long   retries;

void(*broadcast)(const struct cpumask *mask);
+   void(*broadcast_affinity)
+   (const struct cpumask *mask, int irq);
void(*set_mode)(enum clock_event_mode mode,
struct clock_event_device *);
void(*suspend)(struct clock_event_device *);
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..2ec2425 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -39,6 +39,8 @@ static void tick_broadcast_clear_oneshot(int cpu);
 static inline void tick_broadcast_clear_oneshot(int cpu) { }
 #endif

+static inline void dummy_broadcast_affinity(const struct cpumask *mask,
+   int irq) { }
 /*
  * Debugging: see timer_list.c
  */
@@ -485,14 +487,19 @@ void tick_broadcast_oneshot_control(unsigned long 
reason)

if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
-   if (dev->next_event.tv64 < bc->next_event.tv64)
+   if (dev->next_event.tv64 < bc->next_event.tv64) {
tick_broadcast_set_event(dev->next_event, 1);
+   bc->broadcast_affinity(
+   tick_get_broadcast_oneshot_mask(), bc->irq);
+   }
}
} else {
if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
cpumask_clear_cpu(cpu,
  tick_get_broadcast_oneshot_mask());
clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+   bc->broadcast_affinity(
+   tick_get_broadcast_oneshot_mask(), bc->irq);
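
Illustration only, not part of the posted patch: one way an arch-side affinity hook could choose a target is to pick, among the CPUs in the broadcast mask, the one whose next event expires first. That selection policy is an assumption of this sketch; the patch above simply hands the whole oneshot mask to the hook. `TOY_NR_CPUS` and `toy_pick_affinity_cpu()` are hypothetical names.

```c
#include <stdint.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/*
 * Return the CPU in 'mask' (bit n = CPU n) with the earliest next_event,
 * or -1 if the mask is empty.  Ties go to the lowest-numbered CPU.
 */
static int toy_pick_affinity_cpu(unsigned int mask,
				 const int64_t next_event[TOY_NR_CPUS])
{
	int cpu, best = -1;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		if (best < 0 || next_event[cpu] < next_event[best])
			best = cpu;
	}
	return best;
}
```

Steering the broadcast interrupt to that CPU would avoid waking a CPU whose own timer has not expired, which is the unnecessary wakeup Daniel asks about.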
  

Re: [PATCH v2] X.509: Support parse long form of length octets in Authority Key Identifier

2013-02-20 Thread Rusty Russell
joeyli  writes:
> 於 三,2013-02-20 於 12:49 +,David Howells 提到:
>> Acked-by: David Howells 
>> 
>
> Thanks for David's review and confirm.

Should this be CC stable?

Thanks,
Rusty.


Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-20 Thread Jason Liu
2013/2/20 Thomas Gleixner :
> On Wed, 20 Feb 2013, Jason Liu wrote:
>> void arch_idle(void)
>> {
>> 
>> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>>
>> enter_the_wait_mode();
>>
>> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>> }
>>
>> when the broadcast timer interrupt arrives(this interrupt just wakeup
>> the ARM, and ARM has no chance
>> to handle it since local irq is disabled. In fact it's disabled in
>> cpu_idle() of arch/arm/kernel/process.c)
>>
>> the broadcast timer interrupt will wake up the CPU and run:
>>
>> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);->
>> tick_broadcast_oneshot_control(...);
>> ->
>> tick_program_event(dev->next_event, 1);
>> ->
>> tick_dev_program_event(dev, expires, force);
>> ->
>> for (i = 0;;) {
>> int ret = clockevents_program_event(dev, expires, now);
>> if (!ret || !force)
>> return ret;
>>
>> dev->retries++;
>> 
>> now = ktime_get();
>> expires = ktime_add_ns(now, dev->min_delta_ns);
>> }
>> clockevents_program_event(dev, expires, now);
>>
>> delta = ktime_to_ns(ktime_sub(expires, now));
>>
>> if (delta <= 0)
>> return -ETIME;
>>
>> when the bc timer interrupt arrives,  which means the last local timer
>> expires too. so,
>> clockevents_program_event will return -ETIME, which will cause the
>> dev->retries++
>> when retry to program the expired timer.
>>
>> Even under the worst case, after the re-program the expired timer,
>> then CPU enter idle
>> quickly before the re-progam timer expired, it will make system
>> ping-pang forever,
>
> That's nonsense.

I don't think so.

>
> The timer IPI brings the core out of the deep idle state.
>
> So after returning from enter_wait_mode() and after calling
> clockevents_notify() it returns from arch_idle() to cpu_idle().
>
> In cpu_idle() interrupts are reenabled, so the timer IPI handler is
> invoked. That calls the event_handler of the per cpu local clockevent
> device (the one which stops in C3). That ends up in the generic timer
> code which expires timers and reprograms the local clock event device
> with the next pending timer.
>
> So you cannot go idle again, before the expired timers of this event
> are handled and their callbacks invoked.

That's true for the CPUs which do not respond to the global timer interrupt.
Take our platform as an example: we have 4 CPUs (CPU0, CPU1, CPU2, CPU3).
The global timer device keeps running even in deep idle mode, so it can be
used as the broadcast timer device. The interrupt of this device is raised
only to CPU0 when the timer expires; CPU0 then broadcasts the timer IPI to
the other CPUs which are in deep idle mode.

So for CPU1, CPU2 and CPU3 you are right: the timer IPI brings them out of
the idle state, and after running clockevents_notify() they return from
arch_idle() to cpu_idle(). After local_irq_enable() the IPI handler is
invoked, handles the expired timers and re-programs the next pending timer.

But that's not true for CPU0. The flow for CPU0 is:
the global timer interrupt wakes up CPU0, which then calls:
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);

which does cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
in the function tick_broadcast_oneshot_control().

After returning from clockevents_notify(), it returns from arch_idle() to
cpu_idle(); after local_irq_enable(), CPU0 responds to the global timer
interrupt and calls the interrupt handler tick_handle_oneshot_broadcast():

static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
{
struct tick_device *td;
ktime_t now, next_event;
int cpu;

raw_spin_lock(&tick_broadcast_lock);
again:
dev->next_event.tv64 = KTIME_MAX;
next_event.tv64 = KTIME_MAX;
cpumask_clear(to_cpumask(tmpmask));
now = ktime_get();
/* Find all expired events */
for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
td = &per_cpu(tick_cpu_device, cpu);
if (td->evtdev->next_event.tv64 <= now.tv64)
cpumask_set_cpu(cpu, to_cpumask(tmpmask));
else if (td->evtdev->next_event.tv64 < next_event.tv64)
next_event.tv64 = td->evtdev->next_event.tv64;
}

/*
 * Wakeup the cpus which have an expired event.
 */
tick_do_broadcast(to_cpumask(tmpmask));
...
}

Since CPU0 has been removed from tick_get_broadcast_oneshot_mask(), if all
the other CPUs (1/2/3) stay idle and have no expired timers, tmpmask will
be empty when tick_do_broadcast() is called.

static void tick_do_broadcast(struct cpumask *mask)
{
int cpu = smp_processor_id();
struct tick_device *td;

/*
 * Check, if the current cpu is in the mask
 */
if (cpumask_test_cpu(cpu, mask)) {
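
Illustration only, not part of the original message: a toy userspace model of the forced reprogram loop quoted earlier in this message, showing how an already expired event makes the retry counter go up. Time is frozen in the model (the quoted loop re-reads it on every pass), `min_delta_ns` is assumed to be positive, and every `toy_` name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Toy clock event device: just the fields the retry loop touches. */
struct toy_clockevent {
	int64_t min_delta_ns;	/* assumed > 0 */
	unsigned long retries;
	int64_t programmed;	/* last expiry accepted by the "hardware" */
};

/* Refuse to program an event that is not in the future. */
static int toy_program_event(struct toy_clockevent *dev,
			     int64_t expires, int64_t now)
{
	if (expires - now <= 0)
		return -ETIME;
	dev->programmed = expires;
	return 0;
}

/*
 * Forced variant: on -ETIME, count a retry and try again at
 * now + min_delta_ns.  'now' stays fixed in this toy model.
 */
static int toy_program_forced(struct toy_clockevent *dev,
			      int64_t expires, int64_t now, int force)
{
	for (;;) {
		int ret = toy_program_event(dev, expires, now);

		if (!ret || !force)
			return ret;

		dev->retries++;
		expires = now + dev->min_delta_ns;
	}
}
```

In this model, each wakeup that arrives with an already expired local event costs exactly one retry, which matches the counter growth described in the thread.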

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Mike Galbraith
On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote: 
> On 02/20/2013 06:49 PM, Ingo Molnar wrote:
> [snip]
> > 
> > The changes look clean and reasoable, any ideas exactly *why* it 
> > speeds up?
> > 
> > I.e. are there one or two key changes in the before/after logic 
> > and scheduling patterns that you can identify as causing the 
> > speedup?
> 
> Hi, Ingo
> 
> Thanks for your reply, please let me point out the key changes here
> (forgive me for haven't wrote a good description in cover).
> 
> The performance improvement from this patch set is:
> 1. delay the invoke on wake_affine().
> 2. save the circle to gain proper sd.
> 
> The second point is obviously, and will benefit a lot when the sd
> topology is deep (NUMA is suppose to make it deeper on large system).
> 
> So in my testing on a 12 cpu box, actually most of the benefit comes
> from the first point, and please let me introduce it in detail.
> 
> The old logical when locate affine_sd is:
> 
>   if prev_cpu != curr_cpu
>   if wake_affine()
>   prev_cpu = curr_cpu
>   new_cpu = select_idle_sibling(prev_cpu)
>   return new_cpu
> 
> The new logical is same to the old one if prev_cpu == curr_cpu, so let's
> simplify the old logical like:
> 
>   if wake_affine()
>   new_cpu = select_idle_sibling(curr_cpu)
>   else
>   new_cpu = select_idle_sibling(prev_cpu)
> 
>   return new_cpu
> 
> Actually that doesn't make sense.
> 
> I think wake_affine() is trying to check whether move a task from
> prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
> won't break balance means curr_cpu is better than prev_cpu for searching
> the idle cpu?

You could argue that it's impossible to break balance by moving any task
to any idle cpu, but that would mean bouncing tasks cross node on every
wakeup is fine, which it isn't.

> So the new logical in this patch set is:
> 
>   new_cpu = select_idle_sibling(prev_cpu)
>   if idle_cpu(new_cpu)
>   return new_cpu

So you tilted the scales in favor of leaving tasks in their current
package, which should benefit large footprint tasks, but should also
penalize light communicating tasks.

I suspect that much of the pgbench improvement comes from the preemption
mitigation from keeping 1:N load maximally spread, which is the perfect
thing to do with such loads.  In all the testing I ever did with it in
1:N mode, preemption dominated performance numbers.  Keep server away
from clients, it has fewer fair competition worries, can consume more
CPU preemption free, pushing the load collapse point strongly upward.

-Mike
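
Illustration only, not part of the original message: the "old logic" from the quoted pseudocode and its simplified form, written out in C to show that they choose the same starting CPU. `wake_affine()` is reduced to a boolean argument and `select_idle_sibling()` to a callback, so this models only the control flow quoted in the thread, not the scheduler. All `toy_` names are hypothetical.

```c
/* Callback standing in for select_idle_sibling(). */
typedef int (*toy_idle_sibling_fn)(int cpu);

/* Trivial stand-in: pretend the CPU we start from is itself idle. */
static int toy_same_cpu(int cpu)
{
	return cpu;
}

/* Old logic as quoted: an affine wakeup moves the search to curr_cpu. */
static int toy_select_old(int prev_cpu, int curr_cpu, int affine_ok,
			  toy_idle_sibling_fn select_idle_sibling)
{
	if (prev_cpu != curr_cpu && affine_ok)
		prev_cpu = curr_cpu;
	return select_idle_sibling(prev_cpu);
}

/* Simplified form from the thread: same choice, written as if/else. */
static int toy_select_old_simplified(int prev_cpu, int curr_cpu,
				     int affine_ok,
				     toy_idle_sibling_fn select_idle_sibling)
{
	if (affine_ok)
		return select_idle_sibling(curr_cpu);
	return select_idle_sibling(prev_cpu);
}
```

Written this way, the only thing the affine check decides is which CPU the idle-sibling search starts from, which is the property Michael questions and Mike defends.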



Re: linux-next: manual merge of the signal tree with the powerpc tree

2013-02-20 Thread Benjamin Herrenschmidt
On Thu, 2013-02-21 at 15:52 +1100, Stephen Rothwell wrote:
> Hi Al,
> 
> Today's linux-next merge of the signal tree got conflicts in
> arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
> between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
> to the signal context") from the powerpc tree and commit 7cce246557bf
> ("powerpc: switch to generic sigaltstack") from the signal tree.
> 
> I fixed it up (I think - see below) and can carry the fix as necessary
> (no action is required).

Mikey, can you check everything's all right ?

I'm happy to wait for Al stuff to go in first & fixup the conflict
before I send the pull request to Linus. I'm off travelling around but I
should be able to get stuff out this week-end.

Cheers,
Ben.




Re: [PATCH v3 2/2] cpufreq: Convert the cpufreq_driver_lock to use the rcu

2013-02-20 Thread Viresh Kumar
On 21 February 2013 05:26, Nathan Zimmer  wrote:
> In general rwlocks are discourged so we are moving it to use the rcu instead.
> This does require a bit of care since the cpufreq_driver_lock protects both
> the cpufreq_driver and the cpufreq_cpu_data array.
> Also since many of the function pointers on cpufreq_driver may sleep when
> called we have to grab them under the rcu_read_lock but call them after
> rcu_read_unlock();

Even I have started reading the RCU documentation now :)

> Cc: Viresh Kumar 
> Cc: "Rafael J. Wysocki" 
> Signed-off-by: Nathan Zimmer 
> ---
>  drivers/cpufreq/cpufreq.c | 312 
> +-
>  1 file changed, 224 insertions(+), 88 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> @@ -255,20 +258,21 @@ static inline void adjust_jiffies(unsigned long val, 
> struct cpufreq_freqs *ci)
>  void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int 
> state)
>  {
> struct cpufreq_policy *policy;
> -   unsigned long flags;
> +   u8 flags;

I think you can get rid of flags.

> BUG_ON(irqs_disabled());
>
> if (cpufreq_disabled())
> return;
>
> -   freqs->flags = cpufreq_driver->flags;
> pr_debug("notification %u of frequency transition to %u kHz\n",
> state, freqs->new);
>
> -   read_lock_irqsave(&cpufreq_driver_lock, flags);
> +   rcu_read_lock();
> +   flags = rcu_dereference(cpufreq_driver)->flags;

use freqs->flags here ...

> policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
> -   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   rcu_read_unlock();
> +   freqs->flags = flags;
>
> switch (state) {
>
> @@ -277,7 +281,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs 
> *freqs, unsigned int state)
>  * which is not equal to what the cpufreq core thinks is
>  * "old frequency".
>  */
> -   if (!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
> +   if (!(flags & CPUFREQ_CONST_LOOPS)) {

and here.

> if ((policy) && (policy->cpu == freqs->cpu) &&
> (policy->cur) && (policy->cur != freqs->old)) {
> pr_debug("Warning: CPU frequency is"


> @@ -742,35 +773,39 @@ static int cpufreq_add_dev_interface(unsigned int cpu,

> -   write_lock_irqsave(&cpufreq_driver_lock, flags);
> +   spin_lock_irqsave(&cpufreq_driver_lock, flags);
> for_each_cpu(j, policy->cpus) {
> per_cpu(cpufreq_cpu_data, j) = policy;
> per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
> }
> -   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   synchronize_rcu();

I don't think (but I could be wrong too :) ) that we need a synchronize_rcu()
here. We need it only at places where we have updated the cpufreq_driver
pointer.

As we aren't doing any rcu specific read/update for cpufreq_cpu_data.
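
Illustration only, not part of the original message: the patch description says function pointers are fetched inside the read-side section and called after leaving it, because the callbacks may sleep. The shape of that pattern can be sketched in userspace with a C11 atomic pointer. This is not RCU: there is no grace period here, so the sketch says nothing about when an old driver structure may be freed. All `toy_` names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, minimal driver structure with one callback. */
struct toy_driver {
	int (*target)(unsigned int freq_khz);
};

/* Published driver pointer; NULL means no driver registered. */
static _Atomic(struct toy_driver *) toy_cpufreq_driver;

/* Example callback used to exercise the sketch. */
static int toy_target_mhz(unsigned int freq_khz)
{
	return (int)(freq_khz / 1000);
}

static int toy_set_target(unsigned int freq_khz)
{
	struct toy_driver *drv;
	int (*target)(unsigned int);

	/* Read side: where rcu_read_lock()/rcu_dereference() would sit. */
	drv = atomic_load_explicit(&toy_cpufreq_driver, memory_order_acquire);
	if (!drv)
		return -1;
	target = drv->target;	/* snapshot the pointer only */
	/* End of read side: where rcu_read_unlock() would sit. */

	/* The possibly sleeping call happens outside the read side. */
	return target ? target(freq_khz) : -1;
}
```

The sketch shows why only a copy of the function pointer leaves the read-side section; keeping the code behind that pointer alive after the section ends is a separate problem that this model does not address.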


Re: [PATCH] spi: tegra114: add spi driver

2013-02-20 Thread Laxman Dewangan

On Wednesday 20 February 2013 11:30 PM, Stephen Warren wrote:

On 02/20/2013 10:57 AM, Mark Brown wrote:

On Wed, Feb 20, 2013 at 10:36:41AM -0700, Stephen Warren wrote:

On 02/20/2013 10:31 AM, Mark Brown wrote:

Since we can extend the list of clocks it doesn't seem like
there's much issue here, especially if some of them are
optional?

Yes, there's certainly a way to extend the binding in a
backwards-compatible way.
However, I have seen Rob and/or Grant push back on not fully
defining bindings in the past - i.e. actively planning to
initially create a minimal binding and extend it in the future,
rather than completely defining it up-front.

That sounds like the current stuff with a minimal definition is
OK?

I'm personally OK with defining a minimal binding first and extending
it later. But I'm worried that when we actually try to extend the
binding later, we'll get push-back.


Yes, a given controller has many input clock sources that can be muxed,
but we cannot use all of the options, because some sources change
based on DVFS policy or other constraints. For example, one controller
can take PLLC as its input clock, but PLLC is also used by the CPU and
its rate varies with the requested CPU frequency. In that case we would
prefer not to choose PLLC as the clock source for that controller.

So we may need to provide the list of valid clock sources/options in the
DT file, and clock muxing should be done only from that list rather than
from the superset supported by the SoC.




Re: [patch 2/2] arm: Wire up kcmp syscall

2013-02-20 Thread Cyrill Gorcunov
On Wed, Feb 20, 2013 at 03:17:23PM -0800, Andrew Morton wrote:
> On Tue, 19 Feb 2013 11:07:03 +0400
> Cyrill Gorcunov  wrote:
> 
> > From: Alexander Kartashov 
> > Subject: arm: Wire up kcmp syscall
> > 
> > Signed-off-by: Alexander Kartashov 
> > Cc: Russell King 
> 
> This should have had signed-off-by:you, as you were on the patch's
> delivery path.

Ouch, sorry Andrew! Thanks!


[PATCH 2/2] block: partition: optimize memory allocation in check_partition

2013-02-20 Thread Ming Lei
Currently, sizeof(struct parsed_partitions) may be 64KB on 32-bit
architectures, so check_partition() can easily trigger a page allocation
failure, especially when block devices are hotplugged (USB mass storage,
MMC cards, ...). Felipe Balbi has observed this failure.

This patch makes the following changes to the allocation of struct
parsed_partitions to address the issue:

- make parsed_partitions.parts a pointer so that the memory it points
to fits in a 32KB buffer, which saves approximately 32KB

- vmalloc the buffer pointed to by parsed_partitions.parts, because
32KB is still rather large for kmalloc

- since many devices have a limit on the partition count, allocate only
disk_max_parts() partitions instead of always allocating 256

Reported-by: Felipe Balbi 
Signed-off-by: Ming Lei 
---
 block/partition-generic.c |4 ++--
 block/partitions/check.c  |   37 -
 block/partitions/check.h  |4 +++-
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index 1cb4dec..789cdea 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
block_device *bdev)
int p, highest, res;
 rescan:
if (state && !IS_ERR(state)) {
-   kfree(state);
+   free_partitions(state);
state = NULL;
}
 
@@ -525,7 +525,7 @@ rescan:
md_autodetect_dev(part_to_dev(part)->devt);
 #endif
}
-   kfree(state);
+   free_partitions(state);
return 0;
 }
 
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..19ba207 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
@@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) = {
NULL
 };
 
+static struct parsed_partitions *allocate_partitions(struct gendisk *hd)
+{
+   struct parsed_partitions *state;
+   int nr;
+
+   state = kzalloc(sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   nr = disk_max_parts(hd);
+   state->parts = vzalloc(nr * sizeof(state->parts[0]));
+   if (!state->parts) {
+   kfree(state);
+   return NULL;
+   }
+
+   state->limit = nr;
+
+   return state;
+}
+
+void free_partitions(struct parsed_partitions *state)
+{
+   vfree(state->parts);
+   kfree(state);
+}
+
 struct parsed_partitions *
 check_partition(struct gendisk *hd, struct block_device *bdev)
 {
struct parsed_partitions *state;
int i, res, err;
 
-   state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
+   state = allocate_partitions(hd);
if (!state)
return NULL;
state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
if (!state->pp_buf) {
-   kfree(state);
+   free_partitions(state);
return NULL;
}
state->pp_buf[0] = '\0';
@@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
 
-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(>parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
printk(KERN_INFO "%s", state->pp_buf);
 
free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   free_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..eade17e 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };
 
+void free_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);
 
-- 
1.7.9.5
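
For readers following the allocation change in this patch, here is a minimal
userspace sketch of the same two-level pattern. It is not the kernel code:
calloc()/free() stand in for kzalloc()/vzalloc() and kfree()/vfree(), and
struct part_entry, struct parsed_state, alloc_state() and free_state() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slot of parsed_partitions.parts[]. */
struct part_entry {
	unsigned long long from;
	unsigned long long size;
	int flags;
};

/* Hypothetical stand-in for struct parsed_partitions after the patch. */
struct parsed_state {
	struct part_entry *parts;	/* was a fixed-size array before */
	int limit;			/* number of entries in parts[] */
};

/* Allocate the small header first, then only max_parts zeroed entries. */
static struct parsed_state *alloc_state(int max_parts)
{
	struct parsed_state *state;

	assert(max_parts > 0);

	state = calloc(1, sizeof(*state));
	if (!state)
		return NULL;

	state->parts = calloc((size_t)max_parts, sizeof(state->parts[0]));
	if (!state->parts) {
		/* Undo the first allocation so nothing leaks on failure. */
		free(state);
		return NULL;
	}

	state->limit = max_parts;
	return state;
}

/* Release both allocations: the entries first, then the header. */
static void free_state(struct parsed_state *state)
{
	if (!state)
		return;
	free(state->parts);
	free(state);
}
```

The point of the split is that the header stays tiny and the entry array
scales with the device's real partition limit, so every exit path has to go
through one helper that frees both pieces.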



[PATCH 1/2] block: partitions: mac: obey the state->limit constraint

2013-02-20 Thread Ming Lei
It isn't necessary to read information for partitions whose number is
equal to or greater than state->limit, since at most state->limit
partitions will be added inside rescan_partitions().

The other partition parsers already behave this way.

Signed-off-by: Ming Lei 
---
 block/partitions/mac.c |4 
 1 file changed, 4 insertions(+)

diff --git a/block/partitions/mac.c b/block/partitions/mac.c
index 11f688b..76d8ba6 100644
--- a/block/partitions/mac.c
+++ b/block/partitions/mac.c
@@ -63,6 +63,10 @@ int mac_partition(struct parsed_partitions *state)
put_dev_sector(sect);
return 0;
}
+
+   if (blocks_in_map >= state->limit)
+   blocks_in_map = state->limit - 1;
+
strlcat(state->pp_buf, " [mac]", PAGE_SIZE);
for (slot = 1; slot <= blocks_in_map; ++slot) {
int pos = slot * secsize;
-- 
1.7.9.5
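
As a rough illustration of the clamp added above, the following userspace
sketch counts how many slots the parsing loop would visit before and after
clamping. The function names clamp_blocks_in_map() and count_visited_slots()
are hypothetical; only the comparison against the limit and the 1-based loop
are taken from the patch.

```c
#include <assert.h>

/* Same clamp as in the patch: never scan slot numbers >= limit. */
static int clamp_blocks_in_map(int blocks_in_map, int limit)
{
	assert(limit > 0);

	if (blocks_in_map >= limit)
		blocks_in_map = limit - 1;
	return blocks_in_map;
}

/*
 * Count the iterations of a loop shaped like the one in mac_partition():
 * for (slot = 1; slot <= blocks_in_map; ++slot)
 */
static int count_visited_slots(int blocks_in_map, int limit)
{
	int slot;
	int visited = 0;

	blocks_in_map = clamp_blocks_in_map(blocks_in_map, limit);
	for (slot = 1; slot <= blocks_in_map; ++slot)
		visited++;
	return visited;
}
```

Because slots are numbered from 1, clamping to limit - 1 keeps the highest
slot number strictly below the limit.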



Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Michael Wang
On 02/20/2013 10:05 PM, Mike Galbraith wrote:
> On Wed, 2013-02-20 at 14:32 +0100, Peter Zijlstra wrote: 
>> On Wed, 2013-02-20 at 11:49 +0100, Ingo Molnar wrote:
>>
>>> The changes look clean and reasonable, 
>>
>> I don't necessarily agree, note that O(n^2) storage requirement that
>> Michael failed to highlight ;-)
> 
> (yeah, I mentioned that needs to shrink.. a lot)

Exactly, and I'm going to apply the suggestion now :)

> 
>>> any ideas exactly *why* it speeds up?
>>
>> That is indeed the most interesting part.. There's two parts to
>> select_task_rq_fair(), the 'regular' affine wakeup path, and the
>> fork/exec find_idlest_group() path. At the very least we need to quantify
>> which of these two parts contributes most to the speedup.
>>
>> In the power balancing discussion we already noted that the
>> find_idlest_group() is in need of attention.
> 
> Yup, even little stuff like break off the search when load is zero..

Agree, searching in a bunch of idle cpus and their subsets doesn't make
sense...

Regards,
Michael Wang

> unless someone is planning on implementing anti-idle 'course ;-)
> 
> -Mike
> 
> 


[GIT PULL] exynos-dp updates for v3.9

2013-02-20 Thread Jingoo Han
Hi Linus,

Florian, the fbdev maintainer, has been very busy lately, so I am sending the
pull request for exynos-dp for this merge window.

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

 Linux 3.8 (Mon Feb 18 15:58:34 2013 -0800)

are available in the git repository at:
  git://github.com/jingoo/linux.git tags/exynos-dp-3.9

for you to fetch changes up to bb80934325dab97b479815aed237ebec33ed1c57:

 video: exynos_dp: move disable_irq() to exynos_dp_suspend() (Tue Jan 29 18:26:05 2013 +0900)


exynos-dp updates for v3.9:

- The missing function calls are fixed.


Ajay Kumar (1):
  video: exynos_dp: move disable_irq() to exynos_dp_suspend()

Jingoo Han (1):
  video: exynos_dp: add missing of_node_put()

drivers/video/exynos/exynos_dp_core.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)

--
Best regards,
Jingoo Han



[GIT PULL] samsung-fb updates for v3.9

2013-02-20 Thread Jingoo Han
Hi Linus,

Florian, the fbdev maintainer, has been very busy lately, so I am sending the
pull request for samsung-fb for this merge window.

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

 Linux 3.8 (Mon Feb 18 15:58:34 2013 -0800)

are available in the git repository at:
  git://github.com/jingoo/linux.git tags/samsung-fb-3.9

for you to fetch changes up to 5a415ae252d5922de9eadefabe8510115395fbc6:

 video: s3c-fb: Fix typo in definition of VIDCON1_VSTATUS_FRONTPORCH value (Sat Nov 17 21:31:00 2012 +0000)


samsung-fb updates for v3.9:

- The bit definitions in the header file are updated.
- The dependency is fixed.


Jingoo Han (4):
  video: s3c-fb: use ARCH_ dependancy
  video: s3c-fb: remove duplicated S3C_FB_MAX_WIN
  video: s3c-fb: remove unnecessary brackets
  video: s3c-fb: add the bit definitions for CSC EQ709 and EQ601

Tomasz Figa (1):
  video: s3c-fb: Fix typo in definition of VIDCON1_VSTATUS_FRONTPORCH value

 drivers/video/Kconfig|3 +-
 include/video/samsung_fimd.h |  205 -
 2 files changed, 102 insertions(+), 106 deletions(-)

--
Best regards,
Jingoo Han



Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Michael Wang
On 02/20/2013 09:32 PM, Peter Zijlstra wrote:
> On Wed, 2013-02-20 at 11:49 +0100, Ingo Molnar wrote:
> 
>> The changes look clean and reasonable, 
> 
> I don't necessarily agree, note that O(n^2) storage requirement that
> Michael failed to highlight ;-)

Forgive me for not explaining this point in the cover letter, but it's
really not a big deal in my opinion...

And I'm going to apply Mike's suggestion and do the allocation when a cpu
becomes active, which will save some space :)

Regards,
Michael Wang

> 
>> any ideas exactly *why* it speeds up?
> 
> That is indeed the most interesting part.. There's two parts to
> select_task_rq_fair(), the 'regular' affine wakeup path, and the
> fork/exec find_idlest_group() path. At the very least we need to quantify
> which of these two parts contributes most to the speedup.
> 
> In the power balancing discussion we already noted that the
> find_idlest_group() is in need of attention.
> 


Re: general protection fault in do_msgrcv [3.8]

2013-02-20 Thread Stanislav Kinsbursky

On 20.02.2013 22:24, Dave Jones wrote:

On Wed, Feb 20, 2013 at 12:23:22PM +0400, Stanislav Kinsbursky wrote:

  > > Pid: 887, comm: trinity-child2 Not tainted 3.8.0+ #57 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
  > > RIP: 0010:[]  [] do_msgrcv+0x22a/0x670
  > > ...
  > > Looks like Stanislav recently changed this code, so problem was likely introduced
  > > in those changes.
  > >
  >
  > Is it easy to reproduce? Do you use KVM?

Only hit it once so far, no KVM

  > There is a NULL selinux handler bug fix by Stephen Smalley here:
  > https://lkml.org/lkml/2013/2/6/663
  >
  > But anyway, this bug fix affects only the case, when MSG_COPY flag is set.
  >
  > And this is not your case, I suppose?

 From my reading of the traces, I'd say not. It looks like I'm oopsing before
we even get to the SELinux hooks.



Thanks, Dave. I've seen a couple of issues somewhere in the same place when
running trinity in KVM.
It looks like the message queue itself was destroyed at some earlier point.
I have no idea yet how this can happen, but I'm still searching and will let
you know about any fixes.


Dave




--
Best regards,
Stanislav Kinsbursky


Re: [PATCH linux-next] cpufreq: ondemand: Calculate gradient of CPU load to early increase frequency

2013-02-20 Thread Viresh Kumar
Hi Stratos,

On Thu, Feb 21, 2013 at 2:20 AM, Stratos Karafotis
 wrote:
> Instead of checking only the absolute value of CPU load_freq to increase
> frequency, we detect forthcoming CPU load rise and increase frequency
> earlier.
>
> Every sampling rate, we calculate the gradient of load_freq.
> If it is too steep we assume that the load most probably will
> go over up_threshold in next iteration(s). We reduce up_threshold
> by early_differential to achieve frequency increase in the current
> iteration.
>
> A new tuner early_demand is introduced to enable this functionality
> (disabled by default). Also we use new tuners to control early demand:
>
> - early_differential: controls the final up_threshold
> - grad_up_threshold: over this gradient of load we will decrease
> up_threshold by early_differential.
>
> Signed-off-by: Stratos Karafotis 

Sorry for this, but I already have a patchset which has changed these files
to some extent. Can you please rebase over it? Actually my patchset has
already been accepted, it's just that Rafael didn't want it for 3.9.

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-for-3.10

Back to your patch:

Following is what I understood about this patch:
- The only case where this code comes into the picture is when the load is
below up_threshold.
- And we see a steep rise in the load from the previous request..

i.e. (with the default values)

UP_THRESHOLD   (80)
GRAD_UP_THESHOLD  (50)
EARLY_DIFFERENTIAL (45)

If the load was 10 previously and it went to 80 > load >= 60, we will
set up_threshold to 80-45 = 35, which is lower than grad_up_threshold :)

Isn't it strange?

So, probably you just don't need this tunable: early_differential.
Rather just increase the frequency without doing this calculation:

if (load_freq > od_tuners.up_threshold * policy->cur) {

> diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
> b/drivers/cpufreq/cpufreq_ondemand.c
> index f3eb26c..458806f 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -30,6 +30,8 @@
>  #define DEF_FREQUENCY_DOWN_DIFFERENTIAL(10)
>  #define DEF_FREQUENCY_UP_THRESHOLD (80)
>  #define DEF_SAMPLING_DOWN_FACTOR   (1)
> +#define DEF_GRAD_UP_THESHOLD   (50)

s/THESHOLD/THRESHOLD

> @@ -170,11 +175,29 @@ static void od_check_cpu(int cpu, unsigned int 
> load_freq)
>  {
> struct od_cpu_dbs_info_s *dbs_info = _cpu(od_cpu_dbs_info, cpu);
> struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
> +   unsigned int up_threshold = od_tuners.up_threshold;
> +   unsigned int grad;
>
> dbs_info->freq_lo = 0;
>
> +   /*
> +* Calculate the gradient of load_freq. If it is too steep we assume
> +* that the load will go over up_threshold in next iteration(s). We
> +* reduce up_threshold by early_differential to achieve frequency
> +* increase earlier
> +*/
> +   if (od_tuners.early_demand) {
> +   if (load_freq > dbs_info->prev_load_freq) {

&& (load_freq < od_tuners.up_threshold * policy->cur) ??

> +   grad = load_freq - dbs_info->prev_load_freq;
> +
> +   if (grad > od_tuners.grad_up_threshold * policy->cur)
> +   up_threshold -= od_tuners.early_differential;
> +   }
> +   dbs_info->prev_load_freq = load_freq;
> +   }
> +
> /* Check for frequency increase */
> -   if (load_freq > od_tuners.up_threshold * policy->cur) {
> +   if (load_freq > up_threshold * policy->cur) {
> /* If switching to max speed, apply sampling_down_factor */
> if (policy->cur < policy->max)
> dbs_info->rate_mult =
> @@ -438,12 +461,26 @@ static ssize_t store_powersave_bias(struct kobject *a, 
> struct attribute *b,
> return count;
>  }

> +show_one(od, early_demand, early_demand);

What about making the other two tunables rw?
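
To make the threshold arithmetic discussed in this review concrete, here is a
small userspace sketch of the early-demand check. It is only a model under
stated assumptions: struct od_tuners_sketch, effective_up_threshold() and
should_increase() are hypothetical names, and the values used in the usage
note are the defaults quoted in this mail (80, 50, 45).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the tunables discussed in the review. */
struct od_tuners_sketch {
	unsigned int up_threshold;		/* percent, e.g. 80 */
	unsigned int grad_up_threshold;		/* percent, e.g. 50 */
	unsigned int early_differential;	/* percent, e.g. 45 */
	bool early_demand;
};

/*
 * Return the up_threshold (in percent) that applies to this sample.
 * load_freq and prev_load_freq are "load percent * frequency" products,
 * as in the quoted patch.
 */
static unsigned int effective_up_threshold(const struct od_tuners_sketch *t,
					   unsigned int load_freq,
					   unsigned int prev_load_freq,
					   unsigned int cur_freq)
{
	unsigned int up_threshold = t->up_threshold;

	/* Guard against unsigned wrap-around in the subtraction below. */
	assert(t->early_differential <= t->up_threshold);

	if (t->early_demand && load_freq > prev_load_freq) {
		unsigned int grad = load_freq - prev_load_freq;

		/* Steep rise: lower the bar by early_differential. */
		if (grad > t->grad_up_threshold * cur_freq)
			up_threshold -= t->early_differential;
	}
	return up_threshold;
}

/* Would the governor raise the frequency for this sample? */
static bool should_increase(const struct od_tuners_sketch *t,
			    unsigned int load_freq,
			    unsigned int prev_load_freq,
			    unsigned int cur_freq)
{
	unsigned int thr = effective_up_threshold(t, load_freq,
						  prev_load_freq, cur_freq);

	return load_freq > thr * cur_freq;
}
```

With the defaults above and a load jump from 10 to 65 at a fixed frequency,
the gradient exceeds 50, the effective threshold drops to 80 - 45 = 35, and
the frequency is raised even though the load is still below 80.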


Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-20 Thread Michael Wang
On 02/20/2013 09:25 PM, Peter Zijlstra wrote:
> On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote:
>> +struct sched_balance_map {
>> +   struct sched_domain **sd[SBM_MAX_TYPE];
>> +   int top_level[SBM_MAX_TYPE];
>> +   struct sched_domain *affine_map[NR_CPUS];
>> +};
> 
> Argh.. affine_map is O(n^2) in nr_cpus, that's not cool.

You are right, it costs space in order to accelerate the system. I
calculated the cost once before (I'm really not good at this, please let
me know if I made any silly mistakes...). The size of the struct is:

SBM_MAX_TYPE * size of pointer * domain level
SBM_MAX_TYPE * size of int
NR_CPUS * size of pointer
padding

So for my 64-bit box, which has 12 cpus and 3 domain levels, the struct
size is:

3 * size of pointer * 3 = 9 pointer
3 * size of int = 3 int
12 * size of pointer= 12 pointer
padding

= 3 int + 21 pointer + padding

And the final cost is 36 int and 252 pointer, plus some padding, which
won't exceed 5K, not a big deal.

Now suppose a big 64-bit system with 1000 cpus and 10 levels (I have no
idea how to calculate the level count from nodes, 10 is big in my mind...),
the struct size is:

3 * size of pointer * 10 = 30 pointer
3 * size of int = 3 int
1000 * size of pointer  = 1000 pointer
padding

= 3 int + 1030 pointer + padding

And the final cost is 3000 int and 1030000 pointer, plus some padding,
but it won't be bigger than 10M, not a big deal for a system with 1000
cpus either.

Regards,
Michael Wang
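
The estimate above can be reproduced with a few lines of C. This is a sketch
that follows the accounting used in this mail and ignores padding; the value
3 for SBM_MAX_TYPE is inferred from the figures above rather than from the
patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: three balance types, as implied by "3 * size of int". */
#define SBM_MAX_TYPE 3

/*
 * Bytes needed for one CPU's sched_balance_map, counted as in the mail:
 * SBM_MAX_TYPE * levels pointers for the sd arrays, SBM_MAX_TYPE ints for
 * top_level, and nr_cpus pointers for affine_map. Padding is ignored.
 */
static size_t sbm_bytes_per_cpu(size_t nr_cpus, size_t levels,
				size_t int_size, size_t ptr_size)
{
	size_t ints = SBM_MAX_TYPE;
	size_t ptrs = SBM_MAX_TYPE * levels + nr_cpus;

	assert(nr_cpus > 0 && levels > 0);
	return ints * int_size + ptrs * ptr_size;
}

/* One map per CPU, so the total grows with nr_cpus squared. */
static size_t sbm_bytes_total(size_t nr_cpus, size_t levels,
			      size_t int_size, size_t ptr_size)
{
	return nr_cpus * sbm_bytes_per_cpu(nr_cpus, levels,
					   int_size, ptr_size);
}
```

With 4-byte ints and 8-byte pointers this gives 2160 bytes for 12 cpus and
3 levels, and 8252000 bytes (a bit under 8M) for 1000 cpus and 10 levels,
which matches the bounds given above.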

> 


Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

2013-02-20 Thread Michael Wang
On 02/20/2013 09:21 PM, Peter Zijlstra wrote:
> On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote:
>> +   for_each_possible_cpu(cpu) {
>> +   sbm = _cpu(sbm_array, cpu);
>> +   node = cpu_to_node(cpu);
>> +   size = sizeof(struct sched_domain *) * sbm_max_level;
>> +
>> +   for (type = 0; type < SBM_MAX_TYPE; type++) {
>> +   sbm->sd[type] = kmalloc_node(size, GFP_KERNEL,
>> node);
>> +   WARN_ON(!sbm->sd[type]);
>> +   if (!sbm->sd[type])
>> +   goto failed;
>> +   }
>> +   }
> 
> You can't readily use kmalloc_node() here, cpu_to_node() might return an
> invalid node for offline cpus here.
> 
> Also see: 2ea45800d8e1c3c51c45a233d6bd6289a297a386

Hi, Peter

Thanks for your reply, I had not noticed this point. Mike has suggested
doing the allocation in a notifier when the cpu comes online, and I will
try to use that idea in the formal patch set.

Regards,
Michael Wang

> 


linux-next: manual merge of the signal tree with the powerpc tree

2013-02-20 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got conflicts in
arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
to the signal context") from the powerpc tree and commit 7cce246557bf
("powerpc: switch to generic sigaltstack") from the signal tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/powerpc/kernel/signal_32.c
index e4a88d3,802ab5e..000
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@@ -817,223 -513,7 +742,140 @@@ static long restore_user_regs(struct pt
return 0;
  }
  
 +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 +/*
 + * Restore the current user register values from the user stack, except for
 + * MSR, and recheckpoint the original checkpointed register state for processes
 + * in transactions.
 + */
 +static long restore_tm_user_regs(struct pt_regs *regs,
 +   struct mcontext __user *sr,
 +   struct mcontext __user *tm_sr)
 +{
 +  long err;
 +  unsigned long msr;
 +#ifdef CONFIG_VSX
 +  int i;
 +#endif
 +
 +  /*
 +   * restore general registers but not including MSR or SOFTE. Also
 +   * take care of keeping r2 (TLS) intact if not a signal.
 +   * See comment in signal_64.c:restore_tm_sigcontexts();
 +   * TFHAR is restored from the checkpointed NIP; TEXASR and TFIAR
 +   * were set by the signal delivery.
 +   */
 +  err = restore_general_regs(regs, tm_sr);
 +  err |= restore_general_regs(>thread.ckpt_regs, sr);
 +
 +  err |= __get_user(current->thread.tm_tfhar, >mc_gregs[PT_NIP]);
 +
 +  err |= __get_user(msr, >mc_gregs[PT_MSR]);
 +  if (err)
 +  return 1;
 +
 +  /* Restore the previous little-endian mode */
 +  regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 +
 +  /*
 +   * Do this before updating the thread state in
 +   * current->thread.fpr/vr/evr.  That way, if we get preempted
 +   * and another task grabs the FPU/Altivec/SPE, it won't be
 +   * tempted to save the current CPU state into the thread_struct
 +   * and corrupt what we are writing there.
 +   */
 +  discard_lazy_cpu_state();
 +
 +#ifdef CONFIG_ALTIVEC
 +  regs->msr &= ~MSR_VEC;
 +  if (msr & MSR_VEC) {
 +  /* restore altivec registers from the stack */
 +  if (__copy_from_user(current->thread.vr, >mc_vregs,
 +   sizeof(sr->mc_vregs)) ||
 +  __copy_from_user(current->thread.transact_vr,
 +   _sr->mc_vregs,
 +   sizeof(sr->mc_vregs)))
 +  return 1;
 +  } else if (current->thread.used_vr) {
 +  memset(current->thread.vr, 0, ELF_NVRREG * sizeof(vector128));
 +  memset(current->thread.transact_vr, 0,
 + ELF_NVRREG * sizeof(vector128));
 +  }
 +
 +  /* Always get VRSAVE back */
 +  if (__get_user(current->thread.vrsave,
 + (u32 __user *)>mc_vregs[32]) ||
 +  __get_user(current->thread.transact_vrsave,
 + (u32 __user *)_sr->mc_vregs[32]))
 +  return 1;
 +#endif /* CONFIG_ALTIVEC */
 +
 +  regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
 +
 +  if (copy_fpr_from_user(current, >mc_fregs) ||
 +  copy_transact_fpr_from_user(current, _sr->mc_fregs))
 +  return 1;
 +
 +#ifdef CONFIG_VSX
 +  regs->msr &= ~MSR_VSX;
 +  if (msr & MSR_VSX) {
 +  /*
 +   * Restore altivec registers from the stack to a local
 +   * buffer, then write this out to the thread_struct
 +   */
 +  if (copy_vsx_from_user(current, >mc_vsregs) ||
 +  copy_transact_vsx_from_user(current, _sr->mc_vsregs))
 +  return 1;
 +  } else if (current->thread.used_vsr)
 +  for (i = 0; i < 32 ; i++) {
 +  current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 +  current->thread.transact_fpr[i][TS_VSRLOWOFFSET] = 0;
 +  }
 +#endif /* CONFIG_VSX */
 +
 +#ifdef CONFIG_SPE
 +  /* SPE regs are not checkpointed with TM, so this section is
 +   * simply the same as in restore_user_regs().
 +   */
 +  regs->msr &= ~MSR_SPE;
 +  if (msr & MSR_SPE) {
 +  if (__copy_from_user(current->thread.evr, >mc_vregs,
 +   ELF_NEVRREG * sizeof(u32)))
 +  return 1;
 +  } else if (current->thread.used_spe)
 +  memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
 +
 +  /* Always get SPEFSCR back */
 +  if (__get_user(current->thread.spefscr, (u32 __user *)>mc_vregs
 + + 

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Michael Wang
On 02/20/2013 06:49 PM, Ingo Molnar wrote:
[snip]
> 
> The changes look clean and reasonable, any ideas exactly *why* it 
> speeds up?
> 
> I.e. are there one or two key changes in the before/after logic 
> and scheduling patterns that you can identify as causing the 
> speedup?

Hi, Ingo

Thanks for your reply, please let me point out the key changes here
(forgive me for not having written a good description in the cover letter).

The performance improvement from this patch set comes from:
1. delaying the invocation of wake_affine().
2. saving the loop needed to find the proper sd.

The second point is obvious, and will benefit a lot when the sd
topology is deep (NUMA is supposed to make it deeper on large systems).

In my testing on a 12 cpu box, though, most of the benefit actually comes
from the first point, so please let me explain it in detail.

The old logic for locating affine_sd is:

if prev_cpu != curr_cpu
if wake_affine()
prev_cpu = curr_cpu
new_cpu = select_idle_sibling(prev_cpu)
return new_cpu

The new logic is the same as the old one if prev_cpu == curr_cpu, so let's
simplify the old logic to:

if wake_affine()
new_cpu = select_idle_sibling(curr_cpu)
else
new_cpu = select_idle_sibling(prev_cpu)

return new_cpu

Actually that doesn't make sense.

I think wake_affine() is trying to check whether moving a task from
prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
should "won't break the balance" mean that curr_cpu is better than prev_cpu
for searching for an idle cpu?

So the new logic in this patch set is:

new_cpu = select_idle_sibling(prev_cpu)
if idle_cpu(new_cpu)
return new_cpu

new_cpu = select_idle_sibling(curr_cpu)
if idle_cpu(new_cpu) {
if wake_affine()
return new_cpu
}

return prev_cpu

And now, unless we are really going to move load from prev_cpu to
curr_cpu, we won't use wake_affine() any more.

So we avoid wake_affine() when the system load is low or high. For medium
load, the worst case is when we fail to locate an idle cpu in prev_cpu's
topology but succeed in locating one in curr_cpu's, but that rarely
happens, and the benchmark results bear that out.

Some comparison below:

1. system load is low
old logic cost:
wake_affine()
select_idle_sibling()
new logic cost:
select_idle_sibling()

2. system load is high
old logic cost:
wake_affine()
select_idle_sibling()
new logic cost:
select_idle_sibling()
select_idle_sibling()

3. system load is medium
don't know

For 1, we save the cost of wake_affine(); for 3, the benchmarks show that
there is at least no regression.

For 2, it's a comparison between wake_affine() and
select_idle_sibling(): since the system load is high, wake_affine() costs
far more than select_idle_sibling(), and we save a lot according to the
benchmark results.
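
The difference between the two flows described above can be sketched as a toy
userspace model that simply counts helper calls. Everything here is an
assumption made for illustration: the four-CPU layout, the stubbed
wake_affine() that always returns true, and the simplified
select_idle_sibling() are not the scheduler's real implementations.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy state: which CPUs are idle, plus call counters for the helpers. */
static bool cpu_idle[NR_CPUS];
static int wake_affine_calls;
static int select_idle_sibling_calls;

/* Stub: pretend the balance check always allows the move, and count it. */
static bool wake_affine(void)
{
	wake_affine_calls++;
	return true;
}

/*
 * Stub: CPUs {0,1} and {2,3} form two toy domains. Return an idle CPU
 * from the domain of @cpu, or @cpu itself if none is idle.
 */
static int select_idle_sibling(int cpu)
{
	int base = cpu & ~1;
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	select_idle_sibling_calls++;

	if (cpu_idle[cpu])
		return cpu;
	for (i = base; i < base + 2; i++)
		if (cpu_idle[i])
			return i;
	return cpu;
}

/* Old flow: wake_affine() runs whenever prev_cpu != curr_cpu. */
static int old_select(int prev_cpu, int curr_cpu)
{
	if (prev_cpu != curr_cpu && wake_affine())
		prev_cpu = curr_cpu;
	return select_idle_sibling(prev_cpu);
}

/* New flow: wake_affine() runs only once an idle CPU near curr_cpu exists. */
static int new_select(int prev_cpu, int curr_cpu)
{
	int new_cpu = select_idle_sibling(prev_cpu);

	if (cpu_idle[new_cpu])
		return new_cpu;

	new_cpu = select_idle_sibling(curr_cpu);
	if (cpu_idle[new_cpu] && wake_affine())
		return new_cpu;

	return prev_cpu;
}

/* Mark every CPU idle or busy and reset the counters. */
static void reset_model(bool all_idle)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		cpu_idle[i] = all_idle;
	wake_affine_calls = 0;
	select_idle_sibling_calls = 0;
}
```

In this model the new flow never calls wake_affine() when all CPUs are idle
or all are busy, at the price of a second select_idle_sibling() call in the
busy case, which mirrors cases 1 and 2 in the comparison above.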

> 
> Such changes also typically have a chance to cause regressions 
> in other workloads - when that happens we need this kind of 
> information to be able to enact plan-B.

The benefit comes from avoiding unnecessary work, and the patch set is
supposed to only reduce the cost of a key function with minimal logic
changes. I cannot promise it benefits all workloads, but so far
I've not found any regression.

Regards,
Michael Wang

> 
> Thanks,
> 
>   Ingo
> 



Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Nicolas Pitre
On Wed, 20 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 10:43:18 -0500
> Nicolas Pitre  wrote:
> 
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > enforcing -O2, the rest being equal.
> > > 
> > > For today's compilers, unless the wind changes.
> > 
> > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> 
> Alas, not so for gcc 4.4 - I had forgotten I had tested
> Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> 
> https://patchwork.kernel.org/patch/2101491/
> 
> add -O2 to that test script and gcc 4.4 *always* emits calls to
> __bswap[sd]i2, even with -march=armv6k+.

Crap.  OK, assembly code is the way to go then.

> I'll try working on an assembly version given it probably
> makes more sense, future-gcc-immunity-wise.

Agreed.


Nicolas


Re: [GIT] Networking

2013-02-20 Thread Paul Gortmaker
On Wed, Feb 20, 2013 at 10:05 PM, Linus Torvalds
 wrote:
> On Wed, Feb 20, 2013 at 2:09 PM, David Miller  wrote:
>>
>> 15) Orphan and delete a bunch of pre-historic networking drivers from
>> Paul Gortmaker.
>
> Nooo You killed the 3c501 and 3c503 drivers! Snif.

Not true!  They were dead long ago, and here we were just providing
the service of a coroner, by removing the bodies vs. having them left to
decompose on the side of the street.

Paul.
--

>
> I wonder if they still worked..
>
>  Linus


Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Kim Phillips
On Wed, 20 Feb 2013 10:43:18 -0500
Nicolas Pitre  wrote:

> On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > ... in which case there is no harm shipping a .c file and trivially 
> > > enforcing -O2, the rest being equal.
> > 
> > For today's compilers, unless the wind changes.
> 
> We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.

Alas, not so for gcc 4.4 - I had forgotten I had tested
Ubuntu/Linaro 4.4.7-1ubuntu2 here:

https://patchwork.kernel.org/patch/2101491/

add -O2 to that test script and gcc 4.4 *always* emits calls to
__bswap[sd]i2, even with -march=armv6k+.

I'll try working on an assembly version given it probably
makes more sense, future-gcc-immunity-wise.

Otherwise we're back to the old 'if GCC_VERSION >= 40500' in
arch/arm/include/asm/swab.h...

Kim
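
For illustration only, the version-gated fallback mentioned above could look roughly like the sketch below. The helper names swab32_generic() and swab32() and the local GCC_VERSION definition are assumptions made for this example and are not the contents of arch/arm/include/asm/swab.h; __builtin_bswap32() is the GCC builtin under discussion, and the 40500 threshold is the one quoted in the message.

```c
#include <stdint.h>

/* Assumed local definition, computed the usual way from the GCC macros. */
#if defined(__GNUC__)
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define GCC_VERSION 0
#endif

/* Portable fallback: plain shifts and masks, no compiler support needed. */
static inline uint32_t swab32_generic(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Use the builtin only on compilers new enough to pass the version check. */
static inline uint32_t swab32(uint32_t x)
{
#if GCC_VERSION >= 40500
	return __builtin_bswap32(x);
#else
	return swab32_generic(x);
#endif
}
```

Both paths return the same value; the check only decides whether the compiler is trusted to expand the builtin inline rather than emit a library call.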



Re: [PATCH 6/6] ubifs: Wait for page writeback to provide stable pages

2013-02-20 Thread Darrick J. Wong
On Wed, Jan 23, 2013 at 01:43:12PM -0800, Andrew Morton wrote:
> On Fri, 18 Jan 2013 17:13:16 -0800
> "Darrick J. Wong"  wrote:
> 
> > When stable pages are required, we have to wait if the page is just
> > going to disk and we want to modify it. Add proper callback to
> > ubifs_vm_page_mkwrite().
> > 
> > CC: Artem Bityutskiy 
> > CC: Adrian Hunter 
> > CC: linux-...@lists.infradead.org
> > From: Jan Kara 
> > Signed-off-by: Jan Kara 
> > Signed-off-by: Darrick J. Wong 
> 
> A couple of these patches had this From:Jan strangely embedded in the
> signoff area.  I have assumed that they were indeed authored by Jan.
> 
> Please note that authorship is indicated by putting the From: line
> right at the start of the changelog.
> 
> 
> I grabbed the patches.  They should appear in linux-next tomorrow if I
> can get the current pooppile to build.

Well... these patches have been banging around in -next for a month or so now.
As far as I know there haven't been any complaints.  Can we push these for 3.9?

--D


[GIT PULL] Please pull NFS client bugfixes

2013-02-20 Thread Myklebust, Trond
Hi Linus,

The following changes since commit 88b62b915b0b7e25870eb0604ed9a92ba4bfc9f7:

  Linux 3.8-rc6 (2013-02-01 12:08:14 +1100)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.9-1

for you to fetch changes up to 666b3d803a511fbc9bc5e5ea8ce66010cf03ea13:

  NLM: Ensure that we resend all pending blocking locks after a reclaim (2013-02-19 12:18:27 -0500)


NFS client bugfixes for Linux 3.9

- Fix an Oops in the pNFS layoutget code
- Fix a number of NFSv4 and v4.1 state recovery deadlocks and hangs
  due to the interaction of the session drain lock and state management
  locks.
- Remove task->tk_xprt, which was hiding a lot of RCU dereferencing bugs
- Fix a long standing NFSv3 posix lock recovery bug.
- Revert commit 324d003b0cd82151adbaecefef57b73f7959a469. It turned out
  that the root cause of the deadlock was due to interactions with the
  workqueues that have now been resolved.


Jeff Layton (1):
  sunrpc: silence build warning in gss_fill_context

Tim Gardner (1):
  nfs: remove kfree() redundant null checks

Trond Myklebust (18):
  SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()
  SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback
  SUNRPC: Fix an RCU dereference in xs_local_rpcbind
  SUNRPC: Pass pointers to struct rpc_xprt to the congestion window
  SUNRPC: Fix an RCU dereference in xprt_reserve
  SUNRPC: Avoid RCU dereferences in the transport bind and connect code
  SUNRPC: Nuke the tk_xprt macro
  Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
  SUNRPC: Add missing static declaration to _gss_mech_get_by_name
  NFSv4: Allow the state manager to mark an open_owner as being recovered
  NFSv4.1: Prevent deadlocks between state recovery and file locking
  NFSv4.1: Don't lose locks when a server reboots during delegation return
  NFSv4: Fix up the return values of nfs4_open_delegation_recall
  NFSv4: Ensure delegation recall and byte range lock removal don't conflict
  NFSv4: Fix a reboot recovery race when opening a file
  NFSv4.1: Fix an ABBA locking issue with session and state serialisation
  NFSv4.1: Fix bulk recall and destroy of layouts
  NLM: Ensure that we resend all pending blocking locks after a reclaim

Weston Andros Adamson (1):
  NFSv4.1: Don't decode skipped layoutgets

fanchaoting (1):
  umount oops when remove blocklayoutdriver first

 fs/lockd/clntproc.c                   |   3 +
 fs/nfs/blocklayout/blocklayout.c      |   1 +
 fs/nfs/callback_proc.c                |  61 ++
 fs/nfs/delegation.c                   | 154 --
 fs/nfs/delegation.h                   |   1 +
 fs/nfs/getroot.c                      |   3 +-
 fs/nfs/inode.c                        |   5 +-
 fs/nfs/internal.h                     |   1 -
 fs/nfs/nfs4_fs.h                      |   4 +
 fs/nfs/nfs4proc.c                     | 133 -
 fs/nfs/nfs4state.c                    |  11 ++-
 fs/nfs/objlayout/objio_osd.c          |   1 +
 fs/nfs/pnfs.c                         | 150 -
 fs/nfs/pnfs.h                         |   7 +-
 fs/nfs/super.c                        |  49 ---
 fs/nfs/unlink.c                       |   5 +-
 include/linux/sunrpc/sched.h          |   1 -
 include/linux/sunrpc/xprt.h           |   6 +-
 net/sunrpc/auth_gss/auth_gss.c        |   5 +-
 net/sunrpc/auth_gss/gss_mech_switch.c |   4 +-
 net/sunrpc/clnt.c                     |  16 ++--
 net/sunrpc/xprt.c                     |  21 +++--
 net/sunrpc/xprtrdma/rpc_rdma.c        |   4 +-
 net/sunrpc/xprtrdma/transport.c       |   7 +-
 net/sunrpc/xprtrdma/xprt_rdma.h       |   6 +-
 net/sunrpc/xprtsock.c                 |  16 ++--
 26 files changed, 415 insertions(+), 260 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com


Re: [GIT] Networking

2013-02-20 Thread David Miller
From: Linus Torvalds 
Date: Wed, 20 Feb 2013 19:12:37 -0800

> On Wed, Feb 20, 2013 at 7:05 PM, Linus Torvalds
>  wrote:
>>
>> Nooo You killed the 3c501 and 3c503 drivers! Snif.
> 
> .. but thank gods, the 3c509 still exists in the tree. I was worried
> for a minute.

Don't worry, the 3c509 will have its day of reckoning too at
some point. :-)



Re: [PATCH v2] X.509: Support parse long form of length octets in Authority Key Identifier

2013-02-20 Thread joeyli
On Wed, 2013-02-20 at 12:49 +, David Howells wrote:
> Chun-Yi Lee  wrote:
> 
> > Per X.509 spec in 4.2.1.1 section, the structure of Authority Key
> > Identifier Extension is:
> > 
> >AuthorityKeyIdentifier ::= SEQUENCE {
> >   keyIdentifier [0] KeyIdentifier   OPTIONAL,
> >   authorityCertIssuer   [1] GeneralNamesOPTIONAL,
> >   authorityCertSerialNumber [2] CertificateSerialNumber OPTIONAL  }
> > 
> >KeyIdentifier ::= OCTET STRING
> > 
> > When a certificate also provides authorityCertIssuer and
> > authorityCertSerialNumber, the length of the AuthorityKeyIdentifier
> > SEQUENCE is likely to be encoded in long form, e.g. in the example
> > certificate demos/tunala/A-server.pem in the openssl source:
> > 
> > X509v3 Authority Key Identifier:
> > keyid:49:FB:45:72:12:C4:CC:E1:45:A1:D3:08:9E:95:C4:2C:6D:55:3F:17
> > DirName:/C=NZ/L=Wellington/O=Really Irresponsible Authorisation 
> > Authority (RIAA)/OU=Cert-stamping/CN=Jackov 
> > al-Trades/emailAddress=none@fake.domain
> > serial:00
> > 
> > The current parsing rule for OID_authorityKeyIdentifier only handles the
> > short form, which causes loading the certificate into modsign_keyring to fail:
> > 
> > [   12.061147] X.509: Extension: 47
> > [   12.075121] MODSIGN: Problem loading in-kernel X.509 certificate (-74)
> > 
> > So this patch adds a parsing rule that supports the long form for the
> > Authority Key Identifier.
> > 
> > v2:
> >  - Removed comma from author's name.
> >  - Moved 'Short Form length' comment inside the if-body.
> >  - Changed the type of sub to size_t.
> >  - Use ASN1_INDEFINITE_LENGTH rather than writing 0x80 and 127.
> >  - Moved the key_len's value assignment before alter v.
> >  - Fixed the typo of octets.
> >  - Add 2 to v before entering the loop for calculate the length.
> >  - Removed the comment of check vlen.
> > 
> > Cc: Rusty Russell 
> > Cc: Josh Boyer 
> > Cc: Randy Dunlap 
> > Cc: Herbert Xu 
> > Cc: "David S. Miller" 
> > Signed-off-by: Chun-Yi Lee 
> 
> Acked-by: David Howells 
> 

Thanks to David for the review and confirmation.

Joey Lee
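
As a rough illustration of the short form versus long form distinction discussed in the quoted patch description, here is a minimal user-space sketch of decoding a DER length field. The function name der_read_length() and its signature are assumptions made for this example; it is not the code from the patch, and it simply rejects the indefinite form (0x80) rather than handling it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a DER length field starting at buf[0].
 * Short form: one octet below 0x80 holding the length itself.
 * Long form:  first octet is 0x80 | n, followed by n big-endian length octets.
 * Returns 0 on success and fills *len_out (content length) and *used_out
 * (octets consumed by the length field); returns -1 on malformed input.
 */
static int der_read_length(const uint8_t *buf, size_t avail,
			   size_t *len_out, size_t *used_out)
{
	size_t n, i, len = 0;

	if (avail < 1)
		return -1;

	if (buf[0] < 0x80) {		/* short form */
		*len_out = buf[0];
		*used_out = 1;
		return 0;
	}

	if (buf[0] == 0x80)		/* indefinite length: not handled here */
		return -1;

	n = buf[0] & 0x7f;		/* number of length octets that follow */
	if (n > sizeof(size_t) || n + 1 > avail)
		return -1;

	for (i = 0; i < n; i++)
		len = (len << 8) | buf[1 + i];

	*len_out = len;
	*used_out = n + 1;
	return 0;
}
```

A SEQUENCE whose content exceeds 127 bytes, as in the certificate quoted above, must use the long form, so a parser that only reads a single length octet misinterprets the header.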




RE: [PATCH V3] i2c: davinci: update to devm_* API

2013-02-20 Thread Vishwanathrao Badarkhe, Manish
Hi Wolfram

On Sat, Feb 16, 2013 at 00:39:43, Wolfram Sang wrote:
> On Thu, Feb 07, 2013 at 06:22:00PM +0530, Vishwanathrao Badarkhe, Manish 
> wrote:
> > Update the code to use devm_* API so that driver core will manage 
> > resources.
> > Also, if "devm_request_and_ioremap" fails return -EADDRNOTAVAIL 
> > instead of -EBUSY.
> > 
> > Signed-off-by: Vishwanathrao Badarkhe, Manish 
> 
> Basically OK, please resend when devm_ioremap_resource hits mainline in 3.9.

Thanks for pointing this out. Sure, I will resend this patch once 
devm_ioremap_resource hits mainline in 3.9.

Regards, 
Manish


Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Mandeep Singh Baines
On Wed, Feb 20, 2013 at 4:42 PM, Andrew Morton
 wrote:
> On Wed, 20 Feb 2013 16:28:07 -0800
> Mandeep Singh Baines  wrote:
>
>> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
>> > the two callsites triggered the trace.
>> >
>>
>> Let's say there was a try_to_freeze() that got inlined indirectly
>> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
>> the regular exit check and the try_to_freeze check be identical except
>> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
>> an object file you wouldn't know which check you hit.
>
> Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
> wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
> __func__.
>

Fair enough. I'll avoid using a macro unless/until it's actually needed.

Regards,
Mandeep

> Or uninline try_to_freeze().  If anything's calling that at high
> frequency, we have a problem.  And given the number of callsites,
> getting it into icache might result in a faster kernel...
>
> (Someone needs to teach __might_sleep() about __ratelimit())
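
For reference, the __func__ idea floated in the quoted text could be sketched as below. This is a user-space illustration under assumed names (report_held_locks(), CHECK_NO_LOCKS_HELD(), exit_path(), freeze_path() are all hypothetical); v5 of the patch does not use such a macro.

```c
#include <stddef.h>

/* Records which call site reported the problem (hypothetical helper). */
static const char *last_report_site;

static void report_held_locks(const char *site)
{
	last_report_site = site;
}

/*
 * Because this is a macro, __func__ expands to the name of the function
 * that contains the check, regardless of how the compiler inlines it.
 */
#define CHECK_NO_LOCKS_HELD(depth)			\
	do {						\
		if ((depth) > 0)			\
			report_held_locks(__func__);	\
	} while (0)

static void exit_path(int depth)
{
	CHECK_NO_LOCKS_HELD(depth);
}

static void freeze_path(int depth)
{
	CHECK_NO_LOCKS_HELD(depth);
}
```

The trade-off is that the check has to be a macro rather than a plain function, which is the cost being weighed against relying on the backtrace alone.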


[PATCH v5] lockdep: check that no locks held at freeze time

2013-02-20 Thread Mandeep Singh Baines
We shouldn't try_to_freeze if locks are held.

Changes since v1:
* LKML: <20130215111635.ga26...@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.gd30...@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.gc4...@redhat.com> Oleg Nesterov
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.
Changes since v3:
* LKML: <20130220223013.ga15...@redhat.com> Oleg Nesterov
  * Remove stale vfork comment from commit message.
Changes since v4:
* LKML: <20130220152446.a65ff84f.a...@linux-foundation.org> Andrew Morton
  * Remove tsk param since tsk is always current.
  * Remove msg param, dump_stack() should tell us all we need to know.

Signed-off-by: Mandeep Singh Baines 
CC: Ingo Molnar 
CC: Oleg Nesterov 
CC: Tejun Heo 
CC: Andrew Morton 
CC: Rafael J. Wysocki 
---
 include/linux/debug_locks.h |  4 ++--
 include/linux/freezer.h     |  3 +++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 16 +++-
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..c5bd118 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+	if (!(current->flags & PF_NOFREEZE))
+		debug_check_no_locks_held();
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..aff5bdb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
 	/*
 	 * Make sure we are holding no locks:
 	 */
-	debug_check_no_locks_held(tsk);
+	debug_check_no_locks_held();
 	/*
 	 * We can do this unlocked here. The futex code uses this flag
 	 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..8e28f56 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
 {
 	if (!debug_locks_off())
 		return;
@@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
 
 	printk("\n");
 	printk("=\n");
-	printk("[ BUG: lock held at task exit time! ]\n");
+	printk("[ BUG: %s/%d still has locks held! ]\n",
+	       current->comm, task_pid_nr(current));
 	print_kernel_ident();
 	printk("-\n");
-	printk("%s/%d is exiting with locks still held!\n",
-		curr->comm, task_pid_nr(curr));
-	lockdep_print_held_locks(curr);
-
+	lockdep_print_held_locks(current);
 	printk("\nstack backtrace:\n");
 	dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
 {
-	if (unlikely(task->lockdep_depth > 0))
-		print_held_locks_bug(task);
+	if (unlikely(current->lockdep_depth > 0))
+		print_held_locks_bug();
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4
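
To make the control flow of the patch above easier to follow outside the kernel, here is a small user-space model of the same idea. It is only a sketch: held_depth stands in for current->lockdep_depth, no_freeze for the PF_NOFREEZE flag, and all demo_* names and the violations counter are hypothetical.

```c
/* Stand-ins for per-task state (assumptions for this sketch). */
static int held_depth;	/* models current->lockdep_depth */
static int no_freeze;	/* models (current->flags & PF_NOFREEZE) */
static int violations;	/* counts how often the check fired */

static void demo_lock(void)
{
	held_depth++;
}

static void demo_unlock(void)
{
	held_depth--;
}

/* Models debug_check_no_locks_held(): complain if any lock is still held. */
static void demo_check_no_locks_held(void)
{
	if (held_depth > 0)
		violations++;
}

/*
 * Models try_to_freeze(): the lock check runs on every call for freezable
 * tasks, even when no freeze is in progress, so bad call sites are caught
 * early. Returns 1 if the task would have been frozen, 0 otherwise.
 */
static int demo_try_to_freeze(int freezing)
{
	if (!no_freeze)
		demo_check_no_locks_held();
	if (!freezing)
		return 0;
	return 1;
}
```

Note that the check is placed before the freezing test, mirroring the patch, so a task holding a lock is reported even if it never actually freezes at that point.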



Re: [GIT] Networking

2013-02-20 Thread Linus Torvalds
On Wed, Feb 20, 2013 at 7:05 PM, Linus Torvalds
 wrote:
>
> Nooo You killed the 3c501 and 3c503 drivers! Snif.

.. but thank gods, the 3c509 still exists in the tree. I was worried
for a minute.

  Linus


Re: [GIT] Networking

2013-02-20 Thread Linus Torvalds
On Wed, Feb 20, 2013 at 2:09 PM, David Miller  wrote:
>
> 15) Orphan and delete a bunch of pre-historic networking drivers from
> Paul Gortmaker.

Nooo You killed the 3c501 and 3c503 drivers! Snif.

I wonder if they still worked..

 Linus


Re: [Bug fix PATCH 0/2] Make whatever node kernel resides in un-hotpluggable.

2013-02-20 Thread Tang Chen

On 02/21/2013 05:36 AM, Andrew Morton wrote:

> On Wed, 20 Feb 2013 19:00:54 +0800
> Tang Chen  wrote:
>
> > As mentioned by HPA before, when we are using movablemem_map=acpi, if all the
> > memory in SRAT is hotpluggable, then the kernel will have no memory to use, and
> > will fail to boot.
> >
> > Before parsing SRAT, memblock has already reserved some memory in
> > memblock.reserve, which is used by the kernel, such as storing the kernel
> > image. We are not able to prevent the kernel from using these memory. So,
> > these 2 patches make the node which the kernel resides in un-hotpluggable.
>
> I'm planning to roll all these into a single commit:
>
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix-fix-fix.patch
>
> for reasons of tree-cleanliness and to avoid bisection holes.  They're
> at http://ozlabs.org/~akpm/mmots/broken-out/.
>
> Can you please check the changelog for
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch to see
> if it needs any updates due to all the fixup patches?  If so, please
> send me the new changelog, thanks.


Hi Andrew,

Please use the following changelog for
acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch

**

We now provide an option for users who don't want to specify physical memory
addresses on the kernel command line.

/*
 * For movablemem_map=acpi:
 *
 * SRAT:            |_| |_| |_| |_| ..
 * node id:          0   1   1   2
 * hotpluggable:     n   y   y   n
 * movablemem_map:       |_| |_|
 *
 * Using movablemem_map, we can prevent memblock from allocating memory
 * on ZONE_MOVABLE at boot time.
 */

So the user just specifies movablemem_map=acpi, and the kernel will use the
hotpluggable info in SRAT to determine which memory ranges should be set as
ZONE_MOVABLE.

If all the memory ranges in SRAT are hotpluggable, then no memory can be used
by the kernel. But before parsing SRAT, memblock has already reserved some
memory ranges for other purposes, such as for the kernel image. We cannot
prevent the kernel from using this memory, so we need to exclude these ranges
even if the memory is hotpluggable.

Furthermore, there could be several memory ranges in the single node in which
the kernel resides. We may skip one range that has memory reserved by
memblock, but if the rest of the memory is too small, the kernel will fail to
boot. So, make the whole node in which the kernel resides un-hotpluggable.
Then the kernel has enough memory to use.

NOTE: Using this option will reduce NUMA performance because the whole node
      will be set as ZONE_MOVABLE, and the kernel cannot use memory on it.
      Users who don't want to lose NUMA performance should not use it.

**
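
The rule described in the changelog above (treat every range of a node as un-hotpluggable once any of its ranges overlaps memory already reserved by memblock) can be sketched in plain C as follows. This is a user-space illustration only; struct mem_range, struct reserved_range and pin_kernel_nodes() are hypothetical names, not the kernel's data structures or the code in the patches.

```c
#include <stddef.h>

/* One SRAT-style memory range (hypothetical layout for this sketch). */
struct mem_range {
	unsigned long long start;	/* inclusive */
	unsigned long long end;		/* exclusive */
	int nid;			/* NUMA node id */
	int hotpluggable;		/* 1 = may become ZONE_MOVABLE */
};

/* One range already reserved at boot, e.g. for the kernel image. */
struct reserved_range {
	unsigned long long start;	/* inclusive */
	unsigned long long end;		/* exclusive */
};

static int overlaps(const struct mem_range *m, const struct reserved_range *r)
{
	return m->start < r->end && r->start < m->end;
}

/*
 * For every SRAT range that overlaps a reserved range, clear the
 * hotpluggable flag on all ranges that belong to the same node.
 */
static void pin_kernel_nodes(struct mem_range *srat, size_t nr_srat,
			     const struct reserved_range *resv, size_t nr_resv)
{
	size_t i, j, k;

	for (i = 0; i < nr_srat; i++) {
		for (j = 0; j < nr_resv; j++) {
			if (!overlaps(&srat[i], &resv[j]))
				continue;
			for (k = 0; k < nr_srat; k++)
				if (srat[k].nid == srat[i].nid)
					srat[k].hotpluggable = 0;
		}
	}
}
```

With the layout from the diagram (node 1 split into two hotpluggable ranges), a reservation inside either range of node 1 clears the flag on both, which is the "whole node" behaviour the changelog argues for.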



> Also, please review the changelogging for these:


The following xxx-fix-... patches will also be rolled, right ?
I'll post the changelogs later.

Thanks. :)



> page_alloc-add-movable_memmap-kernel-parameter.patch
> page_alloc-add-movable_memmap-kernel-parameter-fix.patch
> page_alloc-add-movable_memmap-kernel-parameter-fix-fix.patch
> page_alloc-add-movable_memmap-kernel-parameter-fix-fix-checkpatch-fixes.patch
> page_alloc-add-movable_memmap-kernel-parameter-fix-fix-fix.patch
> page_alloc-add-movable_memmap-kernel-parameter-rename-movablecore_map-to-movablemem_map.patch
>
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs.patch
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix.patch
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix.patch
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix.patch
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix-fix.patch
> memory-hotplug-remove-sys-firmware-memmap-x-sysfs-fix-fix-fix-fix-fix.patch
>
> memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap.patch
> memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix.patch
> memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix.patch
> memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix-fix.patch
> memory-hotplug-implement-register_page_bootmem_info_section-of-sparse-vmemmap-fix-fix-fix-fix.patch
>
> memory-hotplug-common-apis-to-support-page-tables-hot-remove.patch
> memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix.patch
> memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix.patch
> memory-hotplug-common-apis-to-support-page-tables-hot-remove-fix-fix-fix.patch

Re: [PATCH] DMI: Always call dmi_present with DMI structure

2013-02-20 Thread Zhenzhong Duan

Hi,
Ben has already sent a patch fixing this issue. Would you like to test his patch?
https://lkml.org/lkml/2013/2/16/102
zduan
On 2013-02-21 02:12, H.J. Lu wrote:

Hi,

This patch:

commit 9f9c9cbb60576a1518d0bf93fb8e499cffccf377
Author: Zhenzhong Duan 
Date:   Thu Dec 20 15:05:14 2012 -0800

 drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

 The right dmi version is in SMBIOS if it's zero in DMI region

 This issue was originally found from an oracle bug.
 One customer noticed system UUID doesn't match between dmidecode & uek2.

  - HP ProLiant BL460c G6 :
# cat /sys/devices/virtual/dmi/id/product_uuid
--4C48-3031-4D5030333531
# dmidecode | grep -i uuid
UUID: --484C-3031-4D5030333531

 From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
 network byte order.

 So we need to get dmi version to distinguish.  If version is 0.0, the
 real version is taken from the SMBIOS version.  This is part of original
 kernel comment in code.

causes a regression in 3.7, 3.8 and 3.9 kernels.   Before the change,
we only scan DMI structure.  Now smbios_present scans SMBIOS
entry point.  I have a machine which has invalid checksum in
SMBIOS entry point.  We wind up calling dmi_present with SMBIOS
entry point instead of DMI structure.  This patch changes smbios_present
to always call dmi_present with DMI structure.
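
The byte-order difference described in the quoted commit message can be illustrated with a short user-space sketch. It assumes the SMBIOS 2.6 convention that only the first three UUID fields are stored little-endian; the function name smbios_uuid_to_string() is hypothetical and this is not the dmi_scan.c code.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format 16 raw UUID bytes as text.
 * little_endian == 0: print the bytes in storage (network) order.
 * little_endian != 0: byte-swap the first three fields, as SMBIOS 2.6 and
 *                     later specify, and print the rest unchanged.
 * out must have room for 36 characters plus the terminating NUL.
 */
static void smbios_uuid_to_string(const uint8_t u[16], int little_endian,
				  char out[37])
{
	static const int be[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
				    8, 9, 10, 11, 12, 13, 14, 15 };
	static const int le[16] = { 3, 2, 1, 0, 5, 4, 7, 6,
				    8, 9, 10, 11, 12, 13, 14, 15 };
	const int *i = little_endian ? le : be;

	snprintf(out, 37,
		 "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
		 "%02X%02X-%02X%02X%02X%02X%02X%02X",
		 u[i[0]], u[i[1]], u[i[2]], u[i[3]],
		 u[i[4]], u[i[5]], u[i[6]], u[i[7]],
		 u[i[8]], u[i[9]], u[i[10]], u[i[11]],
		 u[i[12]], u[i[13]], u[i[14]], u[i[15]]);
}
```

Printing the same 16 bytes both ways changes only the first three groups, which matches the 4C48 versus 484C mismatch in the report above.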






[GIT PULL] dlm updates for 3.9

2013-02-20 Thread David Teigland
Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.9

This includes a single patch to avoid excessive and
unnecessary scanning of rsbs to free.  Patch copied below.

Thanks,
Dave


dlm: avoid scanning unchanged toss lists

Keep track of whether a toss list contains any
shrinkable rsbs.  If not, dlm_scand can avoid
scanning the list for rsbs to shrink.  Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).

Signed-off-by: David Teigland 
---
 fs/dlm/dlm_internal.h |  3 +++
 fs/dlm/lock.c         | 15 +++
 2 files changed, 18 insertions(+)

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 77c0f70..e7665c3 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -96,10 +96,13 @@ do { \
 }
 
 
+#define DLM_RTF_SHRINK	0x0001
+
 struct dlm_rsbtable {
 	struct rb_root		keep;
 	struct rb_root		toss;
 	spinlock_t		lock;
+	uint32_t		flags;
 };
 
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index a579f30..f750165 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1132,6 +1132,7 @@ static void toss_rsb(struct kref *kref)
 	rb_erase(&r->res_hashnode, &ls->ls_rsbtbl[r->res_bucket].keep);
 	rsb_insert(r, &ls->ls_rsbtbl[r->res_bucket].toss);
 	r->res_toss_time = jiffies;
+	ls->ls_rsbtbl[r->res_bucket].flags |= DLM_RTF_SHRINK;
 	if (r->res_lvbptr) {
 		dlm_free_lvb(r->res_lvbptr);
 		r->res_lvbptr = NULL;
@@ -1659,11 +1660,18 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 	char *name;
 	int our_nodeid = dlm_our_nodeid();
 	int remote_count = 0;
+	int need_shrink = 0;
 	int i, len, rv;
 
 	memset(&ls->ls_remove_lens, 0, sizeof(int) * DLM_REMOVE_NAMES_MAX);
 
 	spin_lock(&ls->ls_rsbtbl[b].lock);
+
+	if (!(ls->ls_rsbtbl[b].flags & DLM_RTF_SHRINK)) {
+		spin_unlock(&ls->ls_rsbtbl[b].lock);
+		return;
+	}
+
 	for (n = rb_first(&ls->ls_rsbtbl[b].toss); n; n = next) {
 		next = rb_next(n);
 		r = rb_entry(n, struct dlm_rsb, res_hashnode);
@@ -1679,6 +1687,8 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 			continue;
 		}
 
+		need_shrink = 1;
+
 		if (!time_after_eq(jiffies, r->res_toss_time +
 				   dlm_config.ci_toss_secs * HZ)) {
 			continue;
@@ -1710,6 +1720,11 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 		rb_erase(&r->res_hashnode, &ls->ls_rsbtbl[b].toss);
 		dlm_free_rsb(r);
 	}
+
+	if (need_shrink)
+		ls->ls_rsbtbl[b].flags |= DLM_RTF_SHRINK;
+	else
+		ls->ls_rsbtbl[b].flags &= ~DLM_RTF_SHRINK;
 	spin_unlock(&ls->ls_rsbtbl[b].lock);
 
 	/*
-- 
1.8.1.rc1.5.g7e0651a
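
The flag-based shortcut in the patch above can be modelled in user space as follows. This is a simplified sketch with hypothetical names (struct demo_bucket, demo_toss(), demo_shrink_bucket()); it uses a plain array instead of an rb-tree and omits locking, and the return value (entries visited) exists only to make the saved work observable.

```c
#include <stddef.h>

#define DEMO_SHRINK 0x1u	/* models DLM_RTF_SHRINK */

struct demo_entry {
	int in_use;		/* entry is currently on the toss list */
	int shrinkable;		/* 0 models a directory record that must stay */
	int expired;		/* toss timeout has elapsed */
};

struct demo_bucket {
	unsigned int flags;
	struct demo_entry *entries;
	size_t count;
};

/* Models toss_rsb(): adding a shrinkable entry marks the bucket. */
static void demo_toss(struct demo_bucket *b, size_t idx)
{
	b->entries[idx].in_use = 1;
	b->entries[idx].shrinkable = 1;
	b->flags |= DEMO_SHRINK;
}

/* Models shrink_bucket(); returns how many entries had to be visited. */
static size_t demo_shrink_bucket(struct demo_bucket *b)
{
	size_t i, visited = 0;
	int need_shrink = 0;

	if (!(b->flags & DEMO_SHRINK))
		return 0;	/* nothing shrinkable: skip the scan */

	for (i = 0; i < b->count; i++) {
		struct demo_entry *e = &b->entries[i];

		visited++;
		if (!e->in_use || !e->shrinkable)
			continue;
		need_shrink = 1;	/* saw at least one candidate */
		if (!e->expired)
			continue;
		e->in_use = 0;		/* "free" the entry */
	}

	if (need_shrink)
		b->flags |= DEMO_SHRINK;
	else
		b->flags &= ~DEMO_SHRINK;
	return visited;
}
```

As in the patch, the flag is only cleared by a full pass that finds no shrinkable entry, so the pass that frees the last candidate still leaves the flag set and one more scan is needed before the shortcut takes effect.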



[PATCH 4/4] tracing/syscalls: Annotate field-defining functions with __init

2013-02-20 Thread Li Zefan
These two functions are called during kernel boot only.

Signed-off-by: Li Zefan 
---
 kernel/trace/trace_syscalls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5329e13..a70fa19 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -232,7 +232,7 @@ static void free_syscall_print_fmt(struct ftrace_event_call *call)
 		kfree(call->print_fmt);
 }
 
-static int syscall_enter_define_fields(struct ftrace_event_call *call)
+static int __init syscall_enter_define_fields(struct ftrace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
@@ -255,7 +255,7 @@ static int syscall_enter_define_fields(struct ftrace_event_call *call)
 	return ret;
 }
 
-static int syscall_exit_define_fields(struct ftrace_event_call *call)
+static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
 {
 	struct syscall_trace_exit trace;
 	int ret;
-- 
1.8.0.2


[PATCH 3/4] tracing: Annotate event field-defining functions with __init

2013-02-20 Thread Li Zefan
Those functions are called either during kernel boot or module init.

Before:

$ dmesg | grep 'Freeing unused kernel memory'
Freeing unused kernel memory: 1208k freed
Freeing unused kernel memory: 1360k freed
Freeing unused kernel memory: 1960k freed

After:

$ dmesg | grep 'Freeing unused kernel memory'
Freeing unused kernel memory: 1236k freed
Freeing unused kernel memory: 1388k freed
Freeing unused kernel memory: 1960k freed

Signed-off-by: Li Zefan 
---
 include/trace/ftrace.h      | 2 +-
 kernel/trace/trace_export.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 20b6005..dc18af3 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -324,7 +324,7 @@ static struct trace_event_functions ftrace_event_type_funcs_##call = { \
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)   \
-static int notrace \
+static int notrace __init  \
 ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index e039906..4f6a91c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -129,7 +129,7 @@ static void __always_unused ftrace_check_##name(void) \
 
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)\
-int\
+static int __init  \
 ftrace_define_fields_##name(struct ftrace_event_call *event_call)  \
 {  \
struct struct_name field;   \
-- 
1.8.0.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 00/32] ldisc patchset

2013-02-20 Thread Shawn Guo
On Wed, Feb 20, 2013 at 03:02:47PM -0500, Peter Hurley wrote:
> [-cc Alan Cox]
> 
> Sebastian, please re-test your g_nokia+dummy_hcd testcase with
> this series.
> 
> Sasha and Dave, my trinity testbeds die in other areas right now;
> I would really appreciate if you would please re-test this series.
> 
> Michael and Shawn, I'd appreciate if you test with this series
> although I know it won't WARN because this patchset removes it.

On imx51 and imx6q:

Tested-by: Shawn Guo 



[PATCH 2/4] tracing: Add a helper function for event print functions

2013-02-20 Thread Li Zefan
Move duplicate code in event print functions to a helper function.

This shrinks the size of the kernel by ~13K.

   text    data      bss      dec     hex filename
6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new

Signed-off-by: Li Zefan 
---
 include/linux/ftrace_event.h |  8 ++--
 include/trace/ftrace.h   | 23 ++-
 kernel/trace/trace_output.c  | 26 ++
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a3d4895..d54d458 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -38,6 +38,12 @@ const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 const unsigned char *buf, int len);
 
+struct trace_iterator;
+struct trace_event;
+
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+  struct trace_event *event);
+
 /*
  * The trace entry - the most basic unit of tracing. This is what
  * is printed in the end as a single line in the trace output, such as:
@@ -93,8 +99,6 @@ enum trace_iter_flags {
 };
 
 
-struct trace_event;
-
 typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
  int flags, struct trace_event *event);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 40dc5e8..20b6005 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -227,29 +227,18 @@ static notrace enum print_line_t  
\
 ftrace_raw_output_##call(struct trace_iterator *iter, int flags,   \
 struct trace_event *trace_event)   \
 {  \
-   struct ftrace_event_call *event;\
struct trace_seq *s = &iter->seq;   \
+   struct trace_seq __maybe_unused *p = &iter->tmp_seq;\
struct ftrace_raw_##call *field;\
-   struct trace_entry *entry;  \
-   struct trace_seq *p = &iter->tmp_seq;   \
int ret;\
\
-   event = container_of(trace_event, struct ftrace_event_call, \
-event);\
-   \
-   entry = iter->ent;  \
+   field = (typeof(field))iter->ent;   \
\
-   if (entry->type != event->event.type) { \
-   WARN_ON_ONCE(1);\
-   return TRACE_TYPE_UNHANDLED;\
-   }   \
-   \
-   field = (typeof(field))entry;   \
-   \
-   trace_seq_init(p);  \
-   ret = trace_seq_printf(s, "%s: ", event->name); \
+   ret = ftrace_raw_output_prep(iter, trace_event);\
if (ret)\
-   ret = trace_seq_printf(s, print);   \
+   return ret; \
+   \
+   ret = trace_seq_printf(s, print);   \
if (!ret)   \
return TRACE_TYPE_PARTIAL_LINE; \
\
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 194d796..aa92ac3 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -397,6 +397,32 @@ ftrace_print_hex_seq(struct trace_seq *p, const unsigned 
char *buf, int buf_len)
 }
 EXPORT_SYMBOL(ftrace_print_hex_seq);
 
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+  struct trace_event *trace_event)
+{
+   struct ftrace_event_call *event;
+   struct trace_seq *s = &iter->seq;
+   struct trace_seq *p = &iter->tmp_seq;
+   struct trace_entry *entry;
+   int ret;
+
+   event = container_of(trace_event, struct ftrace_event_call, event);
+   entry = iter->ent;
+
+   if 

[PATCH 1/4] tracing/syscalls: Annotate some functions static

2013-02-20 Thread Li Zefan

Signed-off-by: Li Zefan 
---
 kernel/trace/trace_syscalls.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 7609dd6..5329e13 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -77,7 +77,7 @@ static struct syscall_metadata *syscall_nr_to_meta(int nr)
return syscalls_metadata[nr];
 }
 
-enum print_line_t
+static enum print_line_t
 print_syscall_enter(struct trace_iterator *iter, int flags,
struct trace_event *event)
 {
@@ -130,7 +130,7 @@ end:
return TRACE_TYPE_HANDLED;
 }
 
-enum print_line_t
+static enum print_line_t
 print_syscall_exit(struct trace_iterator *iter, int flags,
   struct trace_event *event)
 {
@@ -270,7 +270,7 @@ static int syscall_exit_define_fields(struct 
ftrace_event_call *call)
return ret;
 }
 
-void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
+static void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
@@ -305,7 +305,7 @@ void ftrace_syscall_enter(void *ignore, struct pt_regs 
*regs, long id)
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }
 
-void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
+static void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 {
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
@@ -337,7 +337,7 @@ void ftrace_syscall_exit(void *ignore, struct pt_regs 
*regs, long ret)
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }
 
-int reg_event_syscall_enter(struct ftrace_event_call *call)
+static int reg_event_syscall_enter(struct ftrace_event_call *call)
 {
int ret = 0;
int num;
@@ -356,7 +356,7 @@ int reg_event_syscall_enter(struct ftrace_event_call *call)
return ret;
 }
 
-void unreg_event_syscall_enter(struct ftrace_event_call *call)
+static void unreg_event_syscall_enter(struct ftrace_event_call *call)
 {
int num;
 
@@ -371,7 +371,7 @@ void unreg_event_syscall_enter(struct ftrace_event_call 
*call)
mutex_unlock(&syscall_trace_lock);
 }
 
-int reg_event_syscall_exit(struct ftrace_event_call *call)
+static int reg_event_syscall_exit(struct ftrace_event_call *call)
 {
int ret = 0;
int num;
@@ -390,7 +390,7 @@ int reg_event_syscall_exit(struct ftrace_event_call *call)
return ret;
 }
 
-void unreg_event_syscall_exit(struct ftrace_event_call *call)
+static void unreg_event_syscall_exit(struct ftrace_event_call *call)
 {
int num;
 
@@ -459,7 +459,7 @@ unsigned long __init __weak arch_syscall_addr(int nr)
return (unsigned long)sys_call_table[nr];
 }
 
-int __init init_ftrace_syscalls(void)
+static int __init init_ftrace_syscalls(void)
 {
struct syscall_metadata *meta;
unsigned long addr;
-- 
1.8.0.2


[PATCH] staging: fix all sparse warnings in silicom/bypasslib/

2013-02-20 Thread Randy Dunlap
From: Randy Dunlap 

Fix all sparse warnings in drivers/staging/silicom/bypasslib/,
e.g.:


drivers/staging/silicom/bypasslib/bypass.c:471:21: warning: non-ANSI function 
declaration of function 'init_lib_module'
drivers/staging/silicom/bypasslib/bypass.c:478:25: warning: non-ANSI function 
declaration of function 'cleanup_lib_module'
drivers/staging/silicom/bypasslib/bypass.c:137:5: warning: symbol 
'is_bypass_dev' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:182:5: warning: symbol 'is_bypass' 
was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:192:5: warning: symbol 
'get_bypass_slave' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:197:5: warning: symbol 
'get_bypass_caps' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:202:5: warning: symbol 
'get_wd_set_caps' was not declared. Should it be static?
etc.

Signed-off-by: Randy Dunlap 
---
 drivers/staging/silicom/bypasslib/bypass.c |   94 +--
 1 file changed, 47 insertions(+), 47 deletions(-)

--- lnx-38.orig/drivers/staging/silicom/bypasslib/bypass.c
+++ lnx-38/drivers/staging/silicom/bypasslib/bypass.c
@@ -134,7 +134,7 @@ static int is_dev_sd(int if_index)
return (ret >= 0 ? 1 : 0);
 }
 
-int is_bypass_dev(int if_index)
+static int is_bypass_dev(int if_index)
 {
struct pci_dev *pdev = NULL;
struct net_device *dev = NULL;
@@ -179,7 +179,7 @@ int is_bypass_dev(int if_index)
return (ret < 0 ? -1 : ret);
 }
 
-int is_bypass(int if_index)
+static int is_bypass(int if_index)
 {
int ret = 0;
SET_BPLIB_INT_FN(is_bypass, int, if_index, ret);
@@ -189,70 +189,70 @@ int is_bypass(int if_index)
return ret;
 }
 
-int get_bypass_slave(int if_index)
+static int get_bypass_slave(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_slave, GET_BYPASS_SLAVE, if_index);
 }
 
-int get_bypass_caps(int if_index)
+static int get_bypass_caps(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_caps, GET_BYPASS_CAPS, if_index);
 }
 
-int get_wd_set_caps(int if_index)
+static int get_wd_set_caps(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_wd_set_caps, GET_WD_SET_CAPS, if_index);
 }
 
-int set_bypass(int if_index, int bypass_mode)
+static int set_bypass(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass, SET_BYPASS, if_index, bypass_mode);
 }
 
-int get_bypass(int if_index)
+static int get_bypass(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass, GET_BYPASS, if_index);
 }
 
-int get_bypass_change(int if_index)
+static int get_bypass_change(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_change, GET_BYPASS_CHANGE, if_index);
 }
 
-int set_dis_bypass(int if_index, int dis_bypass)
+static int set_dis_bypass(int if_index, int dis_bypass)
 {
DO_BPLIB_SET_ARG_FN(set_dis_bypass, SET_DIS_BYPASS, if_index,
dis_bypass);
 }
 
-int get_dis_bypass(int if_index)
+static int get_dis_bypass(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_dis_bypass, GET_DIS_BYPASS, if_index);
 }
 
-int set_bypass_pwoff(int if_index, int bypass_mode)
+static int set_bypass_pwoff(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass_pwoff, SET_BYPASS_PWOFF, if_index,
bypass_mode);
 }
 
-int get_bypass_pwoff(int if_index)
+static int get_bypass_pwoff(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_pwoff, GET_BYPASS_PWOFF, if_index);
 }
 
-int set_bypass_pwup(int if_index, int bypass_mode)
+static int set_bypass_pwup(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass_pwup, SET_BYPASS_PWUP, if_index,
bypass_mode);
 }
 
-int get_bypass_pwup(int if_index)
+static int get_bypass_pwup(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_pwup, GET_BYPASS_PWUP, if_index);
 }
 
-int set_bypass_wd(int if_index, int ms_timeout, int *ms_timeout_set)
+static int set_bypass_wd(int if_index, int ms_timeout, int *ms_timeout_set)
 {
int data = ms_timeout, ret = 0;
if (is_dev_sd(if_index))
@@ -268,7 +268,7 @@ int set_bypass_wd(int if_index, int ms_t
return ret;
 }
 
-int get_bypass_wd(int if_index, int *ms_timeout_set)
+static int get_bypass_wd(int if_index, int *ms_timeout_set)
 {
int *data = ms_timeout_set, ret = 0;
if (is_dev_sd(if_index))
@@ -279,7 +279,7 @@ int get_bypass_wd(int if_index, int *ms_
return ret;
 }
 
-int get_wd_expire_time(int if_index, int *ms_time_left)
+static int get_wd_expire_time(int if_index, int *ms_time_left)
 {
int *data = ms_time_left, ret = 0;
if (is_dev_sd(if_index))
@@ -293,144 +293,144 @@ int get_wd_expire_time(int if_index, int
return ret;
 }
 
-int reset_bypass_wd_timer(int if_index)
+static int reset_bypass_wd_timer(int if_index)
 {
DO_BPLIB_GET_ARG_FN(reset_bypass_wd_timer, RESET_BYPASS_WD_TIMER,

Re: What does the PG_swapbacked of page flags actually mean?

2013-02-20 Thread common An
On Wed, Feb 20, 2013 at 6:43 PM, common An  wrote:
> PG_swapbacked is a bit for page->flags.
>
> In kernel code, its comment is "page is backed by RAM/swap". But I couldn't
> understand it.
> 1. Does the RAM mean DRAM? How page is backed by RAM?
> 2. When the page is page-out to swap file, the bit PG_swapbacked will be set
> to demonstrate this page is backed by swap. Is it right?
> 3. In general, when will call SetPageSwapBacked() to set the bit?

From: http://www.gossamer-threads.com/lists/linux/kernel/840692#840692

Every anonymous, tmpfs or shared memory segment page is potentially
swap backed. That is the whole point of the PG_swapbacked flag.

A page from a filesystem like ext3 or NFS cannot suddenly turn into
a swap backed page. This page "nature" is not changed during the
lifetime of a page.

But I am still a little confused.

>
> Could anybody kindly explain for me?
>
> Thanks very much.


Re: linux-next: build failure after merge of the tip tree

2013-02-20 Thread Stephen Rothwell
Hi all,

On Thu, 14 Feb 2013 13:30:16 +1100 Stephen Rothwell  
wrote:
>
> After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> drivers/thermal/intel_powerclamp.c: In function 'clamp_thread':
> drivers/thermal/intel_powerclamp.c:360:21: error: 'MAX_USER_RT_PRIO' 
> undeclared (first use in this function)
> 
> Caused by commit 8bd75c77b7c6 ("sched/rt: Move rt specific bits into new
> header file") interacting with commit d6d71ee4a14a ("PM: Introduce Intel
> PowerClamp Driver") from the thermal tree.
> 
> I applied this merge fix patch and can carry it as necessary:
> 
> From: Stephen Rothwell 
> Date: Thu, 14 Feb 2013 13:26:22 +1100
> Subject: [PATCH] sched/rt: fix PowerClamp Driver for define move
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/thermal/intel_powerclamp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/thermal/intel_powerclamp.c 
> b/drivers/thermal/intel_powerclamp.c
> index ab3ed90..b40b37c 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 

The above fix is now needed when the thermal tree is merged with Linus'
tree ...

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




[PATCH] ALSA: usb: Fix Processing Unit Descriptor parsers

2013-02-20 Thread Pawel Moll
Commit 99fc86450c439039d2ef88d06b222fd51a779176 "ALSA: usb-mixer:
parse descriptors with structs" introduced a set of useful parsers
for descriptors. Unfortunately the parsers for the Processing Unit
Descriptor came with a very subtle bug...

Functions uac_processing_unit_iProcessing() and
uac_processing_unit_specific() were indexing the baSourceID array
forgetting the fields before the iProcessing and process-specific
descriptors.

The problem was observed with the Sound Blaster Extigy mixer,
where nNrModes in the Up/Down-mix Processing Unit Descriptor
was accessed at offset 10 of the descriptor (value 0)
instead of offset 15 (value 7). As a result, the
control had interesting limit values:

Simple mixer control 'Channel Routing Mode Select',0
  Capabilities: volume volume-joined penum
  Playback channels: Mono
  Capture channels: Mono
  Limits: 0 - -1
  Mono: -1 [100%]

Fixed by starting from the bmControls, which was calculated
correctly, instead of baSourceID.

Now the mentioned control is fine:

Simple mixer control 'Channel Routing Mode Select',0
  Capabilities: volume volume-joined penum
  Playback channels: Mono
  Capture channels: Mono
  Limits: 0 - 6
  Mono: 0 [0%]
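
For illustration, the offset arithmetic behind the fix can be sketched as
below. This is only a sketch under assumptions: it uses the UAC1 Processing
Unit Descriptor layout (baSourceID starting at byte 7), the helper names are
made up for this example, and bNrInPins = 1 with bControlSize = 1 is an
assumption chosen because it reproduces the offsets 10 and 15 quoted above.

```c
#include <stddef.h>

/*
 * Assumed UAC1 Processing Unit Descriptor layout (byte offsets):
 *   0 bLength, 1 bDescriptorType, 2 bDescriptorSubtype, 3 bUnitID,
 *   4-5 wProcessType, 6 bNrInPins, 7.. baSourceID[p],
 *   then bNrChannels, wChannelConfig (2 bytes), iChannelNames,
 *   bControlSize, bmControls[n], iProcessing, process-specific bytes.
 */
#define PU_BASOURCEID_OFFSET 7

/* Offset the old code computed for the process-specific area. */
static size_t pu_specific_offset_old(size_t nr_in_pins, size_t control_size)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + control_size + 1;
}

/* Offset of bmControls: skip baSourceID[p] and the five bytes after it. */
static size_t pu_bmcontrols_offset(size_t nr_in_pins)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + 1 + 2 + 1 + 1;
}

/* Offset after the fix: skip bmControls[n] and iProcessing. */
static size_t pu_specific_offset_fixed(size_t nr_in_pins, size_t control_size)
{
	return pu_bmcontrols_offset(nr_in_pins) + control_size + 1;
}
```

With one input pin and a one-byte bmControls, the old helper lands on
offset 10 while the fixed one lands on offset 15.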

Signed-off-by: Pawel Moll 
---
 include/uapi/linux/usb/audio.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/usb/audio.h b/include/uapi/linux/usb/audio.h
index ac90037..d2314be 100644
--- a/include/uapi/linux/usb/audio.h
+++ b/include/uapi/linux/usb/audio.h
@@ -384,14 +384,16 @@ static inline __u8 uac_processing_unit_iProcessing(struct 
uac_processing_unit_de
   int protocol)
 {
__u8 control_size = uac_processing_unit_bControlSize(desc, protocol);
-   return desc->baSourceID[desc->bNrInPins + control_size];
+   return *(uac_processing_unit_bmControls(desc, protocol)
+   + control_size);
 }
 
 static inline __u8 *uac_processing_unit_specific(struct 
uac_processing_unit_descriptor *desc,
 int protocol)
 {
__u8 control_size = uac_processing_unit_bControlSize(desc, protocol);
-   return &desc->baSourceID[desc->bNrInPins + control_size + 1];
+   return uac_processing_unit_bmControls(desc, protocol)
+   + control_size + 1;
 }
 
 /* 4.5.2 Class-Specific AS Interface Descriptor */
-- 
1.7.10.4



linux-next: manual merge of the net-next tree with the mips tree

2013-02-20 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in
include/linux/ssb/ssb_driver_gige.h between commit 111bd981e221 ("MIPS:
BCM47XX: add bcm47xx prefix in front of nvram function names") from the
mips tree and commit 180996c30517 ("ssb: get mac address from sprom
struct for gige driver") from the net-next tree.

I fixed it up (the latter seems to supersede the former, so I used that)
and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




RE: [PATCH] libertas sdio: remove CMD_FUNC_INIT call

2013-02-20 Thread Bing Zhao
Hi Lubomir,

> > > @@ -825,20 +825,6 @@ static void if_sdio_finish_power_on(struct 
> > > if_sdio_card *card)
> > >
> > >   sdio_release_host(func);
> > >
> > > - /*
> > > -  * FUNC_INIT is required for SD8688 WLAN/BT multiple functions
> > > -  */
> > > - if (card->model == MODEL_8688) {
> > > - struct cmd_header cmd;
> > > -
> > > - memset(&cmd, 0, sizeof(cmd));
> > > -
> > > - lbs_deb_sdio("send function INIT command\n");
> > > - if (__lbs_cmd(priv, CMD_FUNC_INIT, &cmd, sizeof(cmd),
> > > - lbs_cmd_copyback, (unsigned long) &cmd))
> > > - netdev_alert(priv->dev, "CMD_FUNC_INIT cmd failed\n");
> > > - }
> > > -
> >
> > Removing FUNC_INIT could break things in some scenarios.
> > Could you please test the following case?
> >
> > 1. insmod libertas -> download firmware, send FUNC_INIT, ...
> > 2. rmmod libertas -> send FUNC_SHUTDOWN command to firmware; BT is still 
> > working.
> > 3. insmod libertas -> skip firmware downloading, send FUNC_INIT, ...
> >
> > If FUNC_INIT is removed, I don't expect step 3 to work.
> 
> In case btmrvl_sdio is loaded, the driver always locks up in FUNC_INIT
> upon probe time, thus I'm not able to proceed to further steps.
> 
> [  209.338953] [] (__schedule+0x610/0x764) from [] 
> (__lbs_cmd+0xb8/0x130
> [libertas])
> [  209.348340] [] (__lbs_cmd+0xb8/0x130 [libertas]) from 
> []
> (if_sdio_finish_power_on+0xec/0x1b0 [libertas_sdio])
> [  209.360136] [] (if_sdio_finish_power_on+0xec/0x1b0 
> [libertas_sdio]) from []
> (if_sdio_power_on+0x18c/0x20c [libertas_sdio])
> [  209.373052] [] (if_sdio_power_on+0x18c/0x20c [libertas_sdio]) 
> from []
> (if_sdio_probe+0x200/0x31c [libertas_sdio])
> [  209.385316] [] (if_sdio_probe+0x200/0x31c [libertas_sdio]) from 
> []
> (sdio_bus_probe+0x94/0xfc [mmc_core])
> [  209.396748] [] (sdio_bus_probe+0x94/0xfc [mmc_core]) from 
> []
> (driver_probe_device+0x12c/0x348)
> [  209.407214] [] (driver_probe_device+0x12c/0x348) from 
> []
> (__driver_attach+0x78/0x9c)
> [  209.416798] [] (__driver_attach+0x78/0x9c) from [] 
> (bus_for_each_dev+0x50/0x88)
> [  209.425946] [] (bus_for_each_dev+0x50/0x88) from []
> (bus_add_driver+0x108/0x268)
> [  209.435180] [] (bus_add_driver+0x108/0x268) from []
> (driver_register+0xa4/0x134)
> [  209.26] [] (driver_register+0xa4/0x134) from []
> (if_sdio_init_module+0x1c/0x3c [libertas_sdio])
> [  209.455339] [] (if_sdio_init_module+0x1c/0x3c [libertas_sdio]) 
> from []
> (do_one_initcall+0x98/0x174)
> [  209.466236] [] (do_one_initcall+0x98/0x174) from [] 
> (load_module+0x1c5c/0x1f80)
> [  209.475390] [] (load_module+0x1c5c/0x1f80) from []
> (sys_init_module+0x104/0x128)
> [  209.484632] [] (sys_init_module+0x104/0x128) from []
> (ret_fast_syscall+0x0/0x38)
> 
> In case btmrvl_sdio is _not_ loaded, insmod returns, but driver locks up
> waiting for FUNC_INIT to finish:
> 
> [  300.538859] [] (__schedule+0x610/0x764) from [] 
> (__lbs_cmd+0xb8/0x130
> [libertas])
> [  300.548600] [] (__lbs_cmd+0xb8/0x130 [libertas]) from 
> []
> (if_sdio_finish_power_on+0xec/0x1b0 [libertas_sdio])
> [  300.560398] [] (if_sdio_finish_power_on+0xec/0x1b0 
> [libertas_sdio]) from []
> (if_sdio_do_prog_firmware+0x414/0x454 [libertas_sdio])
> [  300.574052] [] (if_sdio_do_prog_firmware+0x414/0x454 
> [libertas_sdio]) from []
> (lbs_fw_loaded+0x24/0x58 [libertas])
> [  300.586907] [] (lbs_fw_loaded+0x24/0x58 [libertas]) from 
> []
> (request_firmware_work_func+0xb0/0xf4)
> [  300.597746] [] (request_firmware_work_func+0xb0/0xf4) from 
> []
> (process_one_work+0x348/0x6a8)
> [  300.608288] [] (process_one_work+0x348/0x6a8) from []
> (worker_thread+0x268/0x390)
> [  300.617630] [] (worker_thread+0x268/0x390) from [] 
> (kthread+0xc0/0xd4)
> [  300.625947] [] (kthread+0xc0/0xd4) from [] 
> (ret_from_fork+0x14/0x20)
> [  300.634135] 2 locks held by kworker/0:1/19:
> [  300.638383]  #0:  (events){.+.+.+}, at: [] 
> process_one_work+0x208/0x6a8
> [  300.646512]  #1:  ((&fw_work->work)){+.+.+.}, at: [] 
> process_one_work+0x208/0x6a8

There seems to be a race condition in lbs_thread().

At line 582:
 582 if (!priv->fw_ready)
 583 continue;

The fw_ready is 0, so you never get the chance to execute the FUNC_INIT command.

 617 /* Execute the next command */
 618 if (!priv->dnld_sent && !priv->cur_cmd)
 619 lbs_execute_next_command(priv);
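
To illustrate the suspected problem, here is a much simplified userspace
model of one pass through the worker loop. It is only a sketch: the struct
and function names are hypothetical, cur_cmd is reduced to a boolean, and
the command queue is just a counter.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the driver's private state. */
struct lbs_sketch {
	bool fw_ready;   /* set once the firmware is up */
	bool dnld_sent;  /* a download is in flight */
	bool cur_cmd;    /* a command is currently being processed */
	int queued;      /* commands waiting to run */
	int executed;    /* commands started so far */
};

/* One pass of the loop; returns true if a queued command was started. */
static bool lbs_thread_pass(struct lbs_sketch *p)
{
	/* Mirrors the check at line 582: nothing runs before fw_ready. */
	if (!p->fw_ready)
		return false;

	/* Mirrors the check at line 618: start the next command if idle. */
	if (!p->dnld_sent && !p->cur_cmd && p->queued > 0) {
		p->queued--;
		p->executed++;
		return true;
	}
	return false;
}
```

In this model a queued FUNC_INIT stays queued for as long as fw_ready is 0,
which is why the caller waiting on it would never return.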


Could you try the following change?

diff --git a/drivers/net/wireless/libertas/if_sdio.c b/drivers/net/wireless/libe
index 739309e..8f5d977 100644
--- a/drivers/net/wireless/libertas/if_sdio.c
+++ b/drivers/net/wireless/libertas/if_sdio.c
@@ -825,6 +825,8 @@ static void if_sdio_finish_power_on(struct if_sdio_card *car

sdio_release_host(func);

+   priv->fw_ready = 1;
+
/*
 * FUNC_INIT is required for SD8688 WLAN/BT multiple functions
 */
@@ -839,7 +841,6 @@ static void if_sdio_finish_power_on(struct if_sdio_card *car

[PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates

2013-02-20 Thread Samu Kallio
In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.
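
As a rough userspace model of why the flush matters (this is not kernel
code, and every name in it is hypothetical), the sketch below queues writes
while in "lazy" mode, so a read-back before the flush still sees the old
value:

```c
#include <stdbool.h>
#include <stddef.h>

#define LAZY_QUEUE_MAX 8

/* Hypothetical model of a batched page-table update queue. */
struct lazy_mmu_sketch {
	bool lazy;                            /* true while batching updates */
	size_t pending;                       /* number of queued writes */
	unsigned long *slot[LAZY_QUEUE_MAX];  /* where each write goes */
	unsigned long value[LAZY_QUEUE_MAX];  /* what each write stores */
};

/* Apply all queued writes, in order. */
static void sketch_flush(struct lazy_mmu_sketch *m)
{
	for (size_t i = 0; i < m->pending; i++)
		*m->slot[i] = m->value[i];
	m->pending = 0;
}

/* Write an entry: immediate when not lazy, queued when lazy. */
static void sketch_set_entry(struct lazy_mmu_sketch *m,
			     unsigned long *slot, unsigned long val)
{
	if (!m->lazy) {
		*slot = val;
		return;
	}
	if (m->pending == LAZY_QUEUE_MAX)
		sketch_flush(m);  /* queue full: apply what is pending */
	m->slot[m->pending] = slot;
	m->value[m->pending] = val;
	m->pending++;
}
```

A consistency check that runs between sketch_set_entry() and sketch_flush()
compares against the stale value, which corresponds to the mismatch
described above.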

Signed-off-by: Samu Kallio 
---
 arch/x86/mm/fault.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 027088f..3ba3dba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long 
address)
if (pgd_none(*pgd_ref))
return -1;
 
-   if (pgd_none(*pgd))
+   if (pgd_none(*pgd)) {
set_pgd(pgd, *pgd_ref);
-   else
+   arch_flush_lazy_mmu_mode();
+   } else {
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   }
 
/*
 * Below here mismatches are bugs because these lower tables
-- 
1.8.1.3



Re: [patch v5 04/15] sched: add sched balance policies in kernel

2013-02-20 Thread Alex Shi
On 02/20/2013 11:41 PM, Ingo Molnar wrote:
> 
> * Alex Shi  wrote:
> 
>> Now there are just 2 types of policy: performance and 
>> powersaving(with 2 degrees, powersaving and balance).
> 
> I don't think we really want to have 'degrees' to the policies 
> at this point - we want each policy to be extremely good at what 
> it aims to do:
> 
>  - 'performance' should finish jobs in the least amount of 
> time possible. No ifs and whens.
> 
>  - 'power saving' should finish jobs with the least amount of 
> watts consumed. No ifs and whens.
> 
>> powersaving policy will try to assign one task to each LCPU, 
>> whether the LCPU is an SMT thread or a core. The balance policy 
>> is also a kind of powersaving policy, just a bit less 
>> aggressive. It will try to assign tasks according to group 
>> capacity, one task to one capacity.
> 
> The thing is, 'a bit less aggressive' is an awfully vague 
> concept to maintain on a long term basis - while the two 
> definitions above are reasonably deterministic which can be 
> measured and improved upon.
> 
> Those two policies and definitions are also much easier to 
> communicate to user-space and to users - it's much easier to 
> explain what each policy is supposed to do.
> 
> I'd be totally glad if we got so far that those two policies 
> work really well. Any further nuance visible at the ABI level is 
> I think many years down the road - if at all. Simple things 
> first - those are complex enough already.


Thanks for comments!
I will remove the 'balance' policy.

> 
> Thanks,
> 
>   Ingo
> 


-- 
Thanks Alex


Re: [PATCH v2 8/9] pps: Use a single cdev

2013-02-20 Thread Peter Hurley
On Tue, 2013-02-12 at 02:02 -0500, George Spelvin wrote:
> One per device just seems wasteful, when we already maintain a
> data structure to map minor numbers to devices, and we already have
> a PPS_MAX_SOURCES #define.
> 
> This is also a more comprehensive fix to the use-after-free bug
> that has already received a minimal patch.
> ---
>  drivers/pps/pps.c  | 66 
> --
>  include/linux/pps_kernel.h |  1 -
>  2 files changed, 34 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
> index 6437703..754b0b5 100644
> --- a/drivers/pps/pps.c
> +++ b/drivers/pps/pps.c
> @@ -41,6 +41,8 @@
>  
>  static dev_t pps_devt;
>  static struct class *pps_class;
> +static struct cdev pps_cdev;
> +
>  
>  static DEFINE_MUTEX(pps_idr_lock);
>  static DEFINE_IDR(pps_idr);
> @@ -244,17 +246,23 @@ static long pps_cdev_ioctl(struct file *file,
>  
>  static int pps_cdev_open(struct inode *inode, struct file *file)
>  {
> - struct pps_device *pps = container_of(inode->i_cdev,
> - struct pps_device, cdev);
> - file->private_data = pps;
> - kobject_get(&pps->dev->kobj);
> - return 0;
> + int err = -ENXIO;
> + struct pps_device *pps;
> +
> + rcu_read_lock();
> + pps = idr_find(&pps_idr, iminor(inode));
> + if (pps) {
> + file->private_data = pps;
> + kobject_get(&pps->dev->kobj);
> + err = 0;
> + }
> + rcu_read_unlock();

This should be:
	rcu_read_lock();
	pps = idr_find(&pps_idr, iminor(inode));
	rcu_read_unlock();
	if (pps) {
		file->private_data = pps;
		kobject_get(&pps->dev->kobj);
		err = 0;
	}

It's only the internal structures of idr that need rcu barriers.

> + return err;
>  }
>  
>  static int pps_cdev_release(struct inode *inode, struct file *file)
>  {
> - struct pps_device *pps = container_of(inode->i_cdev,
> - struct pps_device, cdev);
> + struct pps_device *pps = file->private_data;
>   kobject_put(&pps->dev->kobj);
>   return 0;
>  }
> @@ -277,8 +285,6 @@ static void pps_device_destruct(struct device *dev)
>  {
>   struct pps_device *pps = dev_get_drvdata(dev);
>  
> - cdev_del(&pps->cdev);
> -
>   /* Now we can release the ID for re-use */
>   pr_debug("deallocating pps%d\n", pps->id);
>   mutex_lock(&pps_idr_lock);
> @@ -295,17 +301,14 @@ int pps_register_cdev(struct pps_device *pps)
>   dev_t devt;
>  
>   mutex_lock(&pps_idr_lock);
> - /* Get new ID for the new PPS source */
> - if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0) {
> - mutex_unlock(&pps_idr_lock);
> - return -ENOMEM;
> - }
> -
> - /* Now really allocate the PPS source.
> + /* Get new ID for the new PPS source.
>* After idr_get_new() calling the new source will be freely available
>* into the kernel.
>*/
> - err = idr_get_new(&pps_idr, pps, &pps->id);
> + if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0)
> + err = -ENOMEM;
> + else
> + err = idr_get_new(&pps_idr, pps, &pps->id);

Your maintainer should be letting you know about this:

 Forwarded Message 
From: Tejun Heo 
To: a...@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, ru...@rustcorp.com.au, bfie...@fieldses.org,
skinsbur...@parallels.com, ebied...@xmission.com, jmor...@namei.org, 
ax...@kernel.dk,
Tejun Heo , Rodolfo Giometti 
Subject: [PATCH 41/62] pps: convert to idr_alloc()
Date: Sat, 2 Feb 2013 17:20:42 -0800

Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Rodolfo Giometti 
---
This patch depends on an earlier idr changes and I think it would be
best to route these together through -mm.  Please holler if there's
any objection.  Thanks.

 drivers/pps/kapi.c |  2 +-
 drivers/pps/pps.c  | 36 ++--
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
index f197e8e..cdad4d9 100644
--- a/drivers/pps/kapi.c
+++ b/drivers/pps/kapi.c
@@ -102,7 +102,7 @@ struct pps_device *pps_register_source(struct 
pps_source_info *info,
goto pps_register_source_exit;
}
 
-   /* These initializations must be done before calling idr_get_new()
+   /* These initializations must be done before calling idr_alloc()
 * in order to avoid reces into pps_event().
 */
pps->params.api_version = PPS_API_VERS;
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index 2420d5a..de8e663 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -290,29 +290,21 @@ int pps_register_cdev(struct pps_device *pps)
dev_t devt;
 
mutex_lock(&pps_idr_lock);
-   /* Get new ID for the new PPS source */
-   if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0) {
-   mutex_unlock(&pps_idr_lock);
-   return 

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Alex Shi
On 02/20/2013 11:20 PM, Peter Zijlstra wrote:
> On Wed, 2013-02-20 at 22:33 +0800, Alex Shi wrote:
>>> There's generally a better value than 100 when using computers..
>> seeing
>>> how 100 is 64+32+4.
>>
>> I didn't find a good example for this. and no idea of your suggestion,
>> would you like to explain a bit more?
> 
> Basically what you're doing ends up being fixed point math, using 100 as
> unit is inefficient, pick a power-of-2 and everything reduces to
> bit-shifts.
> 
> http://en.wikipedia.org/wiki/Fixed-point_arithmetic
> 
> So use 128 or 1024 or whatever and you don't need mult and div
> instructions to represent [0,1]
> 

got it. will reconsider this.
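
A minimal sketch of the power-of-2 idea, assuming a scale of 1024 for 1.0.
The macro and function names are hypothetical and not taken from the patch
set; the point is only that dividing by the unit becomes a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: 1 << 10 = 1024 represents 1.0 (fully busy). */
#define UTIL_SHIFT 10
#define UTIL_SCALE (1u << UTIL_SHIFT)

/*
 * Convert busy time within a period to a fixed-point fraction in
 * [0, UTIL_SCALE]. The period is arbitrary, so one division remains here.
 */
static inline uint32_t util_from_time(uint64_t busy_ns, uint64_t period_ns)
{
	assert(period_ns != 0 && busy_ns <= period_ns);
	return (uint32_t)((busy_ns << UTIL_SHIFT) / period_ns);
}

/*
 * Scale a value by a fixed-point utilization. With a unit of 100 this
 * would need a divide by 100; with 1024 it is a right shift.
 */
static inline uint64_t scale_by_util(uint64_t value, uint32_t util)
{
	return (value * util) >> UTIL_SHIFT;
}
```

For example, 500 ns busy out of 1000 ns gives 512, and scaling 2048 by 512
gives 1024.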

-- 
Thanks Alex


Re: [PATCH] of/i2c: don't register disabled devices

2013-02-20 Thread Dmitry Eremin-Solenikov
On 20/02/13 23:09, Rob Herring wrote:
> On 02/20/2013 12:28 PM, Dmitry Eremin-Solenikov wrote:
>> Don't register i2c slave device tree nodes which have
>> status = "disabled" property.
>>
> 
> This is already in 3.8.

Ah, true. Sorry for the noise then.

-- 
With best wishes
Dmitry
--


Re: [patch v5 11/15] sched: add power/performance balance allow flag

2013-02-20 Thread Alex Shi
On 02/20/2013 11:22 PM, Borislav Petkov wrote:
> On Wed, Feb 20, 2013 at 10:20:19PM +0800, Alex Shi wrote:
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 2e8131d..0047856 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4053,6 +4053,8 @@ struct lb_env {
>>>>   unsigned int loop;
>>>>   unsigned int loop_break;
>>>>   unsigned int loop_max;
>>>> + int power_lb;  /* if power balance needed */
>>>> + int perf_lb;   /* if performance balance needed */
>>>
>>> Those look like they're used like simple boolean flags. Why not make
>>> them such, i.e. bitfields? See struct perf_event_attr for an example.
>>
>> there are 11 long words in struct lb_env now. use boolean or bitfields
>> can't save much space.
> Now now maybe.
> 
> Btw, there's a ->flags variable there which simply cries to get another
> LBF_* flag or two. This way you don't add any new members at all and
> don't enlarge the struct.
> 

Yes, using flags can save 2 int variables; I will change that.

Just curious: considering the size of lb_env, that it is only used on the
stack, the large cacheline size of modern CPUs, and the alignment gcc
applies in the kernel, it seems no arch needs more cache lines. Are there
any platforms where performance is impacted by these 2 int variables?
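
As an illustration of the ->flags suggestion quoted above, here is a
minimal sketch in C of keeping two booleans as bits in an existing flags
word. It is not taken from the patch: struct lb_env_sketch, the helper
names, and the bit values chosen for LBF_POWER_BAL and LBF_PERF_BAL are
assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real LBF_* values live in kernel/sched/fair.c. */
#define LBF_POWER_BAL 0x10u
#define LBF_PERF_BAL  0x20u

/* Stand-in for struct lb_env, reduced to the one member that matters here. */
struct lb_env_sketch {
	unsigned int flags;   /* existing member, reused for the new bits */
};

static inline void lb_set(struct lb_env_sketch *env, unsigned int bit)
{
	env->flags |= bit;
}

static inline void lb_clear(struct lb_env_sketch *env, unsigned int bit)
{
	env->flags &= ~bit;
}

static inline bool lb_test(const struct lb_env_sketch *env, unsigned int bit)
{
	return (env->flags & bit) != 0;
}
```

No new struct members are needed; each yes/no state costs one bit of the
word that is already there.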

-- 
Thanks Alex


Re: [Update][PATCH 2/7] ACPI / scan: Introduce common code for ACPI-based device hotplug

2013-02-20 Thread Toshi Kani
On Wed, 2013-02-20 at 23:49 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
 :
> +
> +/**
> + * acpi_bus_hot_remove_device: hot-remove a device and its children
> + * @context: struct acpi_eject_event pointer (freed in this func)
> + *
> + * Hot-remove a device and its children. This function frees up the
> + * memory space passed by arg context, so that the caller may call
> + * this function asynchronously through acpi_os_hotplug_execute().
> + */
> +void acpi_bus_hot_remove_device(void *context)
> +{
> + struct acpi_eject_event *ej_event = context;
> + struct acpi_device *device = ej_event->device;
> + acpi_handle handle = device->handle;
> + u32 ost_code = ACPI_OST_SC_SUCCESS;
> + int error;
> +
> + mutex_lock(&acpi_scan_lock);
> +
> + error = acpi_scan_hot_remove(device);
> + if (error)
> + ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
> +
> + acpi_evaluate_hotplug_ost(handle, ej_event->event, ost_code, NULL);

Thanks for the quick update.  It fixed the deadlock issue. :-)  As it
now completes an eject operation, I found a new issue.  When the OS
called _EJ0, it is not supposed to call _OST since FW has already
received the completion status from _EJ0.  That is, the OS calls either
_EJ0 (success case) or _OST (failure case) for hot-delete. 
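
The either/or rule described above can be sketched in C roughly as
follows. This is a stand-alone model, not the ACPI code: fw_eject() stands
in for evaluating _EJ0, fw_report_failure() for evaluating _OST, and all
names here are hypothetical.

```c
/* Which firmware method the model called last. */
enum fw_call { FW_CALL_NONE, FW_CALL_EJ0, FW_CALL_OST };

static enum fw_call last_fw_call = FW_CALL_NONE;
static int ej0_calls, ost_calls;

/* Stand-in for evaluating _EJ0; firmware learns of success from this call. */
static void fw_eject(void)
{
	last_fw_call = FW_CALL_EJ0;
	ej0_calls++;
}

/* Stand-in for evaluating _OST with a failure status. */
static void fw_report_failure(void)
{
	last_fw_call = FW_CALL_OST;
	ost_calls++;
}

/* Hot-remove: report via _OST only on failure, otherwise eject via _EJ0. */
static int hot_remove_sketch(int prepare_error)
{
	if (prepare_error) {
		fw_report_failure();
		return prepare_error;
	}
	fw_eject();
	return 0;
}
```

In this model a successful removal never reaches the _OST stand-in, which
is the behaviour the comment above asks for.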

-Toshi




Re: [PATCH EDAC 03/13] ghes: add the needed hooks for EDAC error report

2013-02-20 Thread Huang Ying
Sorry for the late reply!

On Fri, 2013-02-15 at 10:44 -0200, Mauro Carvalho Chehab wrote:
> In order to allow reporting errors via EDAC, add hooks for:
> 
> 1) register an EDAC driver;
> 2) unregister an EDAC driver;
> 3) report errors via EDAC.
> 
> As the EDAC driver will need to access the ghes structure, adds it
> as one of the parameters for ghes_do_proc.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  drivers/acpi/apei/ghes.c | 17 ++---
>  include/acpi/ghes.h  | 27 +++
>  2 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 6d0e146..a21d7da 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -409,7 +409,8 @@ static void ghes_clear_estatus(struct ghes *ghes)
>   ghes->flags &= ~GHES_TO_CLEAR;
>  }
>  
> -static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
> +static void ghes_do_proc(struct ghes *ghes,
> +  const struct acpi_hest_generic_status *estatus)
>  {
>   int sev, sec_sev;
>   struct acpi_hest_generic_data *gdata;
> @@ -421,6 +422,8 @@ static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
>CPER_SEC_PLATFORM_MEM)) {
>   struct cper_sec_mem_err *mem_err;
>   mem_err = (struct cper_sec_mem_err *)(gdata+1);
> + ghes_edac_report_mem_error(ghes, sev, mem_err);
> +
>  #ifdef CONFIG_X86_MCE
>   apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
> mem_err);
> @@ -639,7 +642,7 @@ static int ghes_proc(struct ghes *ghes)
>   if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
>   ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>   }
> - ghes_do_proc(ghes->estatus);
> + ghes_do_proc(ghes, ghes->estatus);
>  out:
>   ghes_clear_estatus(ghes);
>   return 0;
> @@ -732,7 +735,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
>   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
>   len = apei_estatus_len(estatus);
>   node_len = GHES_ESTATUS_NODE_LEN(len);
> - ghes_do_proc(estatus);
> + ghes_do_proc(estatus_node->ghes, estatus);
>   if (!ghes_estatus_cached(estatus)) {
>   generic = estatus_node->generic;
>   if (ghes_print_estatus(NULL, generic, estatus))
> @@ -821,6 +824,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>   estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool,
> node_len);
>   if (estatus_node) {
> + estatus_node->ghes = ghes;
>   estatus_node->generic = ghes->generic;
>   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
>   memcpy(estatus, ghes->estatus, len);
> @@ -942,6 +946,10 @@ static int ghes_probe(struct platform_device *ghes_dev)
>   }
>   platform_set_drvdata(ghes_dev, ghes);
>  
> + rc = ghes_edac_register(ghes, &ghes_dev->dev);
> + if (rc < 0)
> + goto err;
> +

If ghes_edac_register() failed, we need to do some cleanup such as
unregister from hed etc.

Or just move ghes_edac_register() before switch?
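
To make the cleanup concern above concrete, here is a small stand-alone C
sketch of goto-based unwinding, where a failure in a late registration
step undoes the earlier one. It does not reproduce ghes_probe(): the
functions and the two boolean state variables are hypothetical stand-ins.

```c
#include <stdbool.h>

static bool hed_registered;    /* stands in for "registered with hed" */
static bool edac_registered;   /* stands in for "registered with EDAC" */

static int register_notifier(void)
{
	hed_registered = true;
	return 0;
}

static void unregister_notifier(void)
{
	hed_registered = false;
}

/* The caller decides whether this step fails, to exercise the error path. */
static int register_edac(int fail)
{
	if (fail)
		return -1;
	edac_registered = true;
	return 0;
}

static int probe_sketch(int edac_fails)
{
	int rc;

	rc = register_notifier();
	if (rc)
		goto err;

	rc = register_edac(edac_fails);
	if (rc)
		goto err_unregister_notifier;   /* undo the earlier step */

	return 0;

err_unregister_notifier:
	unregister_notifier();
err:
	return rc;
}
```

The alternative raised above, moving the EDAC registration before the
notifier setup, avoids the extra label because nothing needs undoing yet
when it fails.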

>   return 0;
>  err:
>   if (ghes) {
> @@ -995,6 +1003,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>   }
>  
>   ghes_fini(ghes);
> +
> + ghes_edac_unregister(ghes);
> +
>   kfree(ghes);
>  
>   platform_set_drvdata(ghes_dev, NULL);
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 3eb8dc4..c6fef72 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -22,11 +22,14 @@ struct ghes {
>   struct timer_list timer;
>   unsigned int irq;
>   };
> +
> + struct mem_ctl_info *mci;

Why do we need this?  It is not used by ghes.[hc].

>  };
>  
>  struct ghes_estatus_node {
>   struct llist_node llnode;
>   struct acpi_hest_generic *generic;
> + struct ghes *ghes;
>  };
>  
>  struct ghes_estatus_cache {
> @@ -43,3 +46,27 @@ enum {
>   GHES_SEV_RECOVERABLE = 0x2,
>   GHES_SEV_PANIC = 0x3,
>  };
> +
> +#ifdef CONFIG_EDAC_GHES
> +void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> + struct cper_sec_mem_err *mem_err);
> +
> +int ghes_edac_register(struct ghes *ghes, struct device *dev);
> +
> +void ghes_edac_unregister(struct ghes *ghes);
> +
> +#else
> +static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> +struct cper_sec_mem_err *mem_err)
> +{
> +}
> +
> +static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +{
> + return 0;
> +}
> +
> +static inline void ghes_edac_unregister(struct ghes *ghes)

Re: [PATCH 0/3] posix timers: Extend kernel API to report more info about timers

2013-02-20 Thread Matthew Helsley
On Thu, Feb 14, 2013 at 8:18 AM, Pavel Emelyanov  wrote:
> Hi.
>
> I'm working on the checkpoint-restore project (http://criu.org), briefly
> it's aim is to collect information about process' state and saving it so
> that later it is possible to recreate the processes in the very same state
> as they were, using the collected information.
>
> One part of the task's state is the posix timers that this task has created.
> Currently kernel doesn't provide any API for getting information about
> what timers are currently created by process and in which state they are.
> I'd like to extend the posix timers API to provide more information about
> timers.
>
> Another problem with timers is the timer ID. Currently IDs are generated
> from global IDR and this makes it impossible to restore a timer from
> the saved state in general, as the required ID may be already busy at the
> time of restore.
>
> That said, I propose to
>
> 1. Change the way timer IDs are generated. This was done some time ago, so
>I'm just re-sending this patch;

Seems fine in principle. Aside: I noticed there were some
important-looking patches to the idr usage in timer id allocation
today...

> 2. Add a system call that will list timer IDs created by the calling process;

If timers were listed in /proc like fds then you wouldn't need this
syscall. If we keep adding new syscalls like this CRIU will be
needlessly x86-specific when it could have been written more portably.

> 3. Add a system call that will allow to get the sigevent information about
>particular timer in the sigaction-like manner.

You mentioned "extending the POSIX timer API". Isn't that something
best left to standards bodies lest your changes conflict with theirs?
Again, if this were a /proc interface you wouldn't have that issue
(you'll have others ;)).

>
> This is actually an RFC to start discussion about how the described problems
> can be addressed. Thus, if the approach with new system calls is not acceptable,
> I'm OK to implement this in any other form.

My preference is for "other form" for the reasons above.

Cheers,
-Matt Helsley


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-20 Thread Bjorn Helgaas
On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)        ((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  ((((u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>
>  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>  #define PCIIOC_BASE            ('P' << 24 | 'C' << 16 | 'I' << 8)

David, can you point me at a description of include/uapi ... what is
there and why, and how we should decide what new things go in
include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
there should be something in Documentation/?

I'm guessing it's something to do with being exported to userland, but
I'm not sure the things in this patch (PCI_DEVID, PCI_BUS) are really
exportable in the sense of being used for syscalls, etc.
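
As a worked example of what the quoted macros compute, here is a
stand-alone C sketch that packs and unpacks the address 03:1f.7. The macro
bodies follow the patch, with two assumptions for user-space use: uint16_t
stands in for the kernel's u16, and the bus and devfn arguments of
PCI_DEVID are parenthesised. example_devid() is a made-up helper.

```c
#include <stdint.h>

#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)        ((devfn) & 0x07)
/* uint16_t replaces the kernel's u16 so this builds outside the kernel */
#define PCI_DEVID(bus, devfn)  ((((uint16_t)(bus)) << 8) | (devfn))
/* recover the bus number from a devid built by PCI_DEVID() */
#define PCI_BUS(x)             (((x) >> 8) & 0xff)

/* Bus 0x03, slot 0x1f, function 7: devfn is 0xff, devid is 0x03ff. */
static inline unsigned int example_devid(void)
{
	return PCI_DEVID(0x03, PCI_DEVFN(0x1f, 7));
}
```

The devid is simply the bus number in the high byte and devfn in the low
byte, so PCI_BUS() is the inverse of the bus half of PCI_DEVID().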

Bjorn


[GIT PULL] KVM updates for the 3.9 merge window

2013-02-20 Thread Marcelo Tosatti


Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1

to receive the KVM updates for the 3.9 merge window, including x86 real
mode emulation fixes, stronger memory slot interface restrictions, 
mmu_lock spinlock hold time reduction, improved handling of large 
page faults on shadow, initial APICv HW acceleration support, 
s390 channel IO based virtio, amongst others.

--

Alex Williamson (13):
  KVM: Restrict non-existing slot state transitions
  KVM: Check userspace_addr when modifying a memory slot
  KVM: Fix iommu map/unmap to handle memory slot moves
  KVM: Minor memory slot optimization
  KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
  KVM: Make KVM_PRIVATE_MEM_SLOTS optional
  KVM: struct kvm_memory_slot.user_alloc -> bool
  KVM: struct kvm_memory_slot.flags -> u32
  KVM: struct kvm_memory_slot.id -> short
  KVM: Increase user memory slots on x86 to 125
  kvm: Fix memory slot generation updates
  kvm: Force IOMMU remapping on memory slot read-only flag changes
  kvm: Obey read-only mappings in iommu

Alexander Graf (17):
  KVM: PPC: Only WARN on invalid emulation
  KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1
  KVM: PPC: BookE: Allow irq deliveries to inject requests
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync
  KVM: PPC: E500: Move write_stlbe higher
  KVM: PPC: E500: Explicitly mark shadow maps invalid
  KVM: PPC: E500: Propagate errors when shadow mapping
  KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
  KVM: PPC: E500: Split host and guest MMU parts
  KVM: PPC: e500: Implement TLB1-in-TLB0 mapping
  KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static
  KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code
  Merge commit 'origin/next' into kvm-ppc-next
  KVM: PPC: BookE: Handle alignment interrupts
  Merge commit 'origin/next' into kvm-ppc-next

Avi Kivity (16):
  KVM: x86 emulator: framework for streamlining arithmetic opcodes
  KVM: x86 emulator: Support for declaring single operand fastops
  KVM: x86 emulator: introduce NoWrite flag
  KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
  KVM: x86 emulator: convert NOT, NEG to fastop
  KVM: x86 emulator: add macros for defining 2-operand fastop emulation
  KVM: x86 emulator: convert basic ALU ops to fastop
  KVM: x86 emulator: Convert SHLD, SHRD to fastop
  KVM: x86 emulator: convert shift/rotate instructions to fastop
  KVM: x86 emulator: covert SETCC to fastop
  KVM: x86 emulator: convert INC/DEC to fastop
  KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop
  KVM: x86 emulator: convert 2-operand IMUL to fastop
  KVM: x86 emulator: rearrange fastop definitions
  KVM: x86 emulator: convert a few freestanding emulations to fastop
  KVM: x86 emulator: fix test_cc() build failure on i386

Bharat Bhushan (3):
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: PPC: booke: Allow multiple exception types
  booke: Added DBCR4 SPR number

Christian Borntraeger (3):
  KVM: s390: Gracefully handle busy conditions on ccw_device_start
  s390/kvm: Fix store status for ACRS/FPRS
  s390/kvm: Fix instruction decoding

Cong Ding (1):
  KVM: s390: kvm/sigp.c: fix memory leakage

Cornelia Huck (14):
  KVM: s390: Handle hosts not supporting s390-virtio.
  s390/ccwdev: Include asm/schid.h.
  KVM: s390: Add a channel I/O based virtio transport driver.
  KVM: s390: Constify intercept handler tables.
  KVM: s390: Decoding helper functions.
  KVM: s390: Support for I/O interrupts.
  KVM: s390: Add support for machine checks.
  KVM: s390: In-kernel handling of I/O instructions.
  KVM: s390: Base infrastructure for enabling capabilities.
  KVM: s390: Add support for channel I/O instructions.
  KVM: s390: Dynamic allocation of virtio-ccw I/O data.
  KVM: trace: Fix exit decoding.
  s390/virtio-ccw: Fix setup_vq error handling.
  KVM: s390: Fix handling of iscs.

Dongxiao Xu (1):
  KVM: VMX: disable SMEP feature when guest is in non-paging mode

Geoff Levand (1):
  KVM: Remove duplicate text in api.txt

Gleb Natapov (39):
  KVM: emulator: implement AAD instruction
  KVM: inject ExtINT interrupt before APIC interrupts
  KVM: remove unused variable.
  KVM: VMX: cleanup rmode_segment_valid()
  KVM: VMX: relax check for CS register in rmode_segment_valid()
  KVM: VMX: return correct segment limit and flags for CS/SS registers in real mode
  KVM: VMX: use fix_rmode_seg() to fix all code/data segments
  KVM: VMX: remove redundant code from vmx_set_segment()
  KVM: VMX: clean-up vmx_set_segment()
  KVM: VMX: remove unneeded temporary variable from vmx_set_segment()
  

Re: [PATCH v2] vt: add init_hide parameter to suppress boot output

2013-02-20 Thread Greg Kroah-Hartman
On Wed, Feb 20, 2013 at 02:08:25PM -0800, Andy Ross wrote:
> On 02/20/2013 12:57 PM, Pavel Machek wrote:
> >I'm sure something creative can be done with fake init that shuts
> >the console up then execs previous init. No need to add more kernel
> >knobs, I'd say.
> 
> Fair enough, but some last words:
> 
> That argument is the "it's about logging" hypothesis again.  Even if
> it were possible to completely shut up console output (something
> that's awfully hard in the general case when running on PC hardware,
> and IMHO from a developer's perspective not even a good thing), that's
> not the whole problem.  The framebuffer console initialization does a
> buffer clear and mode set, and that clobbers anything the bootloader
> might have left on the screen prematurely, before userspace is ready
> to throw up its own splash.  Splash screens may be a silly
> requirement, but they're still a requirement.

Yes, they are a requirement in some situations, and if you look most
distros have already solved this issue for you, by not using a
framebuffer at all.  Why not just do the same thing in your Android
system as you do have full control over the hardware and the boot
process.

> And the suspend console problem is likewise at work: ideally you'd
> like to know, for example, that the panel backlight is off before
> suspending.  But what happens in practice is that the kernel does a VT
> switch to/from console 63 and the backlight wakes up (I'm not going to
> pretend I have this bit completely figured out, but the problem is/was
> real and this patch fixed it by suppressing the console visibility).

My systems don't drop down to the framebuffer when suspending, I think
you need to look at using a better distro :)

> Now, the point that an in-kernel console is "going away" and thus not
> worth augmenting with new APIs is valid.  And this is a small patch
> that's unlikely to be difficult to maintain in a custom tree.  And as
> we all agree there are other mechanisms that can be used here (even if
> AFAICT they don't completely solve the problem), and indeed I'd love
> to get surfaceflinger working with VT_ACTIVATE et. al. if I get a
> chance.  So I'm not going to cry if this isn't worth mainline.

I don't see why this is even needed for surfaceflinger systems, as
again, you have full control over the hardware and system so you don't
even need a framebuffer console at all.

thanks,

greg k-h


Re: [PATCH] Documentation: update top level 00-INDEX file with new additions

2013-02-20 Thread Rob Landley

On 02/18/2013 09:57:36 AM, Randy Dunlap wrote:
> On 02/18/13 01:39, Jiri Kosina wrote:
> > On Thu, 14 Feb 2013, Paul Gortmaker wrote:
> >
> >> It seems there are about 80 new, but undocumented additions at
> >> the top level Documentation directory.  This fixes up the top
> >> level 00-INDEX by adding new entries and deleting a couple orphans.
> >> Some subdirs could probably still use a check/cleanup too though.
>
> After this patch, I would prefer to see a requirement that each
> Documentation/ file contain a "topic" line and then generate INDEX
> files from those automatically...
>
> comments?


I actually have a script that can audit the 00-INDEX files, as part of
my kernel.org/doc build stuff:

  http://landley.net/hg/kdocs/file/tip/make

Manually auditing these isn't hard for me, it's just that since
kernel.org went all-in on locking the barn door after the horses
escaped, I haven't had access to my old kernel.org account (I need to
meet kernel developers in person to get keys signed, which doesn't
happen much).

And even if I did get a new ssh key, you don't get shell access anymore,
you get "kup", which is a git wrapper you can't rsync through. So fixing
problem 1 opens up problem 2 and I still can't do anything useful.
(Navigating the new bureaucracy is on my todo list, but not really
something I sit down and go "oh boy, I should work on THIS" on any
given evening.)

So I haven't been able to update kernel.org/doc since the breakin, and
my tools for auditing the 00-INDEX files and htmldocs and menuconfig
and so on are all tied up with that.

Rob


Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Andrew Morton
On Wed, 20 Feb 2013 16:28:07 -0800
Mandeep Singh Baines  wrote:

> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
> > the two callsites triggered the trace.
> >
> 
> Let's say there was a try_to_freeze() that got inlined indirectly
> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
> the regular exit check and the try_to_freeze check be identical except
> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
> an object file you wouldn't know which check you hit.

Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
__func__.

Or uninline try_to_freeze().  If anything's calling that at high
frequency, we have a problem.  And given the number of callsites,
getting it into icache might result in a faster kernel...
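
A minimal C sketch of the __FILE__/__LINE__/__func__ idea mentioned above.
It is illustrative only: struct callsite, report_locks_held() and the
check_no_locks_held() macro are hypothetical names, not the lockdep
interface. The check is written as a macro because __func__ inside an
inline helper would name the helper, not the function it was inlined into.

```c
#include <stddef.h>

/* Where the most recent check fired. */
struct callsite {
	const char *file;
	const char *func;
	int line;
};

static struct callsite last_report;

static void report_locks_held(const char *file, const char *func, int line)
{
	last_report.file = file;
	last_report.func = func;
	last_report.line = line;
}

/* Expands at the call site, so __func__ and __LINE__ identify the caller. */
#define check_no_locks_held(nr_held)					\
	do {								\
		if ((nr_held) != 0)					\
			report_locks_held(__FILE__, __func__, __LINE__);	\
	} while (0)

/* Two call sites that would look alike in a backtrace if both were inlined. */
static void exit_path(int nr_held)
{
	check_no_locks_held(nr_held);
}

static void freeze_path(int nr_held)
{
	check_no_locks_held(nr_held);
}
```

Each report then carries the name and line of the site that triggered it,
without needing the object file to decode offsets.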

(Someone needs to teach __might_sleep() about __ratelimit())


[GIT PULL] user namespace and namespace infrastructure changes for 3.9

2013-02-20 Thread Eric W. Biederman

Linus,

Please pull the for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

   HEAD: 139321c65c0584cd65c4c87a5eb3fdb4fdbd0e19 cifs: Enable building with user namespaces enabled.

   This tree is against v3.8-rc1 with the first few bug-fix commits
   already merged into v3.8.

This set of changes starts with a few small enhancements to the user
namespace: reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user
namespace root.

I do my best to document that, if you care about limiting your
unprivileged users, you will need to enable memory control groups when
user namespace support is enabled.

There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.

The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems.  These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use.  The filesystems converted for 3.9
are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The changes
for these filesystems were a little more involved so I split the changes
into smaller hopefully obviously correct changes.
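
To show what "typesafe" means for uids and gids here, a small stand-alone
C sketch: wrapping the raw number in a one-member struct makes the
compiler reject accidental mixing of raw numbers, uids and gids. The
*_sketch names are hypothetical and only modelled on the kuid_t/uid_eq
naming used in the commit titles below.

```c
#include <stdbool.h>

typedef unsigned int raw_id_t;                    /* plain numeric id */
typedef struct { raw_id_t val; } kuid_sketch_t;   /* wrapped uid */
typedef struct { raw_id_t val; } kgid_sketch_t;   /* wrapped gid, distinct type */

static inline kuid_sketch_t make_kuid_sketch(raw_id_t v)
{
	kuid_sketch_t k = { v };
	return k;
}

static inline kgid_sketch_t make_kgid_sketch(raw_id_t v)
{
	kgid_sketch_t g = { v };
	return g;
}

/* Comparison goes through a helper because structs cannot be compared with ==. */
static inline bool uid_eq_sketch(kuid_sketch_t a, kuid_sketch_t b)
{
	return a.val == b.val;
}

static inline bool gid_eq_sketch(kgid_sketch_t a, kgid_sketch_t b)
{
	return a.val == b.val;
}

/*
 * These would fail to compile, which is the point:
 *   uid_eq_sketch(make_kuid_sketch(0), 1000);                 raw number as a uid
 *   uid_eq_sketch(make_kuid_sketch(0), make_kgid_sketch(0));  gid passed as a uid
 */
```

Code that still passes plain integers around stops building until every
conversion is made explicit, which is how unconverted spots get found.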

XFS is the only filesystem that remains.  I was hoping I could get that
in this release so that user namespace support would be enabled with an
allyesconfig or an allmodconfig but it looks like the xfs changes need
another couple of days before they are ready.

Eric W. Biederman (91):
  userns: Avoid recursion in put_user_ns
  userns: Allow any uid or gid mappings that don't overlap.
  userns: Recommend use of memory control groups.
  userns: Allow the userns root to mount of devpts
  userns: Allow the userns root to mount ramfs.
  userns: Allow the userns root to mount tmpfs.
  ceph: Only allow mounts in the initial network namespace
  ceph: Translate between uid and gids in cap messages and kuids and kgids
  ceph: Translate inode uid and gid attributes to/from kuids and kgids.
  ceph: Convert struct ceph_mds_request to use kuid_t and kgid_t
  ceph: Convert kuids and kgids before printing them.
  ceph: Enable building when user namespaces are enabled.
  9p: Add 'u' and 'g' format specifies for kuids and kgids
  9p: Transmit kuid and kgid values
  9p: Modify the stat structures to use kuid_t and kgid_t
  9p: Modify struct 9p_fid to use a kuid_t not a uid_t
  9p: Modify struct v9fs_session_info to use a kuids and kgids
  9p: Modify v9fs_get_fsgid_for_create to return a kgid
  9p: Allow building 9p with user namespaces enabled.
  afs: Remove unused structure afs_store_status
  afs: Only allow mounting afs in the intial network namespace
  afs: Support interacting with multiple user namespaces
  coda: Restrict coda messages to the initial pid namespace
  coda: Restrict coda messages to the initial user namespace
  coda: Cache permisions in struct coda_inode_info in a kuid_t.
  coda: Allow coda to be built when user namespace support is enabled
  ocfs2: Handle kuids and kgids in acl/xattr conversions.
  ocfs2: convert between kuids and kgids and DLM locks
  ocfs2: Convert uid and gids between in core and on disk inodes
  ocfs2: For tracing report the uid and gid values in the initial user namespace
  ocfs2: Compare kuids and kgids using uid_eq and gid_eq
  ocfs2: Enable building with user namespaces enabled
  gfs2: Remove improper checks in gfs2_set_dqblk.
  gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
  gfs2: Report quotas in the caller's user namespace.
  gfs2: Introduce qd2index
  gfs2: Modify struct gfs2_quota_change_host to use struct kqid
  gfs2: Modify qdsb_get to take a struct kqid
  gfs2: Convert gfs2_quota_refresh to take a kqid
  gfs2: Store qd_id in struct gfs2_quota_data as a struct kqid
  gfs2: Remove the QUOTA_USER and QUOTA_GROUP defines
  gfs2: Use kuid_t and kgid_t types where appropriate.
  gfs2: Use uid_eq and gid_eq where appropriate
  gfs2: Convert uids and gids between dinodes and vfs inodes.
  gfs2: Enable building with user namespaces enabled
  ncpfs: Support interacting with multiple user namespaces
  nfs_common: Update the translation between nfsv3 acls linux posix acls
  sunrpc: Use userns friendly constants.
  sunrpc: Use kuid_t and kgid_t where appropriate
  sunrpc: Use uid_eq and gid_eq where appropriate
  sunrpc: Simplify auth_unix now that everything is a kgid_t
  sunrpc: Convert kuids and kgids to uids and gids for printing
  sunrpc: Use gid_valid to test for gid != INVALID_GID
  sunrpc: Update gss uid to security context mapping.
  sunrpc: Update svcgss xdr handle to rpsec_contect cache
  sunrpc: Hash uids by 

Re: sched: Fix signedness bug in yield_to()

2013-02-20 Thread Shuah Khan
On Tue, Feb 19, 2013 at 7:27 PM, Linux Kernel Mailing List
 wrote:
> Gitweb: http://git.kernel.org/linus/;a=commit;h=c3c186403c6abd32e719f005f0af950155a9e54d
> Commit: c3c186403c6abd32e719f005f0af950155a9e54d
> Parent: e0a79f529d5ba2507486d498b25da40911d95cf6
> Author: Dan Carpenter 
> AuthorDate: Tue Feb 5 14:37:51 2013 +0300
> Committer:  Ingo Molnar 
> CommitDate: Tue Feb 5 12:59:29 2013 +0100
>
> sched: Fix signedness bug in yield_to()
>
> In 7b270f6099 "sched: Bail out of yield_to when source and
> target runqueue has one task" we changed this to store -ESRCH so
> it needs to be signed.

Dan, Ingo,

I can't find 7b270f6099 "sched: Bail out of yield_to when source
and target runqueue has one task" in Linus's latest git tree. Am I
missing something?

The current kernel/sched/core.c doesn't have the code from the
associated patch https://patchwork.kernel.org/patch/2016651/

>  bool __sched yield_to(struct task_struct *p, bool preempt)
>  {
> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
>
>  again:
>   p_rq = task_rq(p);
> + /*
> +  * If we're the only runnable task on the rq and target rq also
> +  * has only one task, there's absolutely no point in yielding.
> +  */
> + if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> + yielded = -ESRCH;
> + goto out_irq;
> + }

Without the 7b270f6099 "sched: Bail out of yield_to when source and
target runqueue has one task", do you need this change?

Am I missing something?
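
For context on the signedness bug being discussed, a stand-alone C sketch
of what happens when a negative errno is stored in a bool versus an int.
The function names are hypothetical; only -ESRCH comes from the quoted
patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Storing an error code in a bool collapses any nonzero value to 1. */
static int as_stored_in_bool(int rc)
{
	bool yielded = rc;
	return yielded;
}

/* Storing it in an int keeps the sign and the value. */
static int as_stored_in_int(int rc)
{
	int yielded = rc;
	return yielded;
}
```

With the bool variant a caller can no longer tell -ESRCH apart from a
successful yield, which is why the variable has to be a signed integer
once the code assigns -ESRCH to it.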

-- Shuah


Re: Corrupt packets with ath5k

2013-02-20 Thread JA Magallón

On 02/20/2013 08:58 PM, JA Magallón wrote:
> Hi all...
>
> I have a strange problem with latest kernels. When I update my netbook
> I get many "Installation failed, bad rpms:" messages. First I thought
> that the oldie cheap ssd was failing (it is an Aspire AOA110).
> But the updates always work fine when plugged to ethernet.
>
> Now I tried to copy a couple RPMS via ssh over wifi and got:
>
> mplayer-1.1-11.r35916.1.mga3.tainted.i586.rpm  100% 2192KB   2.1MB/s   00:00
> Received disconnect from 192.168.1.51: 2: Packet corrupt
> lost connection
>
> Hardware/driver are these:
>
> 03:00.0 Ethernet controller: Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express) (rev 01)
>  Subsystem: Foxconn International, Inc. Device e008
>  Kernel driver in use: ath5k
>
> Is that hardware failing ? Is driver failing ? Other laptop/androids
> update fine with the router via wifi.
> Any idea ?
>
> I currently have 3.8.0, distro build.



I tried with 3.7.1 and everything works fine.
Possible clues:
- 3.7.1: works fine
- warm boot in 3.8.0: transfer stalls and scp appears hung, speed
  drops to zero, but no error message
- cold boot in 3.8.0: transfer stops with Packet corrupt error or sometimes
  stalls.


TIA




--
J.A. Magallon \   Winter is coming...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Mandeep Singh Baines
On Wed, Feb 20, 2013 at 4:20 PM, Andrew Morton
 wrote:
> On Wed, 20 Feb 2013 16:17:39 -0800
> Mandeep Singh Baines  wrote:
>
>> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
>>  wrote:
>> > On Wed, 20 Feb 2013 15:17:16 -0800
>> > Mandeep Singh Baines  wrote:
>> >
>> >> We shouldn't try_to_freeze if locks are held.
>> >>
>> >> ...
>> >>
>> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>> >>
>> >> + if (!(current->flags & PF_NOFREEZE))
>> >> + debug_check_no_locks_held(current,
>> >> +         "lock held while trying to freeze");
>> >> ...
>> >>
>> >> + debug_check_no_locks_held(tsk, "lock held at task exit time");
>> >
>> > There doesn't seem much point in adding the `msg' to
>> > debug_check_no_locks_held() - the dump_stack() in
>> > print_held_locks_bug() will tell us the same thing.  Maybe just change
>>
>> dump_stack() can be confusing when there is inlining. On occasion I've
>> looked at the wrong mutex_lock, for example, when there was another
>> mutex_lock that was inlined. Of course, you can start objdump and
>> verify the offsets. But that requires that you have the object file.
>> You could have a try_to_freeze added to do_exit. I was thinking of
>> adding another locks_held in the return from syscall path.
>
> Backtraces aren't *that* bad.  We'll easily be able to tell which of
> the two callsites triggered the trace.
>

Let's say there was a try_to_freeze() that got inlined indirectly
(multiple levels of inline) into do_exit. Wouldn't the backtraces for
the regular exit check and the try_to_freeze check be identical except
for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
an object file you wouldn't know which check you hit.

Regards,
Mandeep


Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Maxin B. John
Hi,

On Thu, Feb 21, 2013 at 2:06 AM, Greg KH  wrote:
> On Thu, Feb 21, 2013 at 01:57:51AM +0200, maxin.j...@gmail.com wrote:
>> From: "Maxin B. John" 
>>
>> Fixes this build failure:
>> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
>> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
>> In file included from ffs-test.c:41:0:
>> ../../include/linux/usb/functionfs.h:4:39: fatal error:
>> uapi/linux/usb/functionfs.h: No such file or directory
>> compilation terminated.
>> make: *** [ffs-test] Error 1
>
> This is a build failure where, 3.8, or linux-next, or somewhere else?

It is in 3.8

> thanks,
>
> greg k-h

Best Regards,
Maxin


Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Andrew Morton
On Wed, 20 Feb 2013 16:17:39 -0800
Mandeep Singh Baines  wrote:

> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
>  wrote:
> > On Wed, 20 Feb 2013 15:17:16 -0800
> > Mandeep Singh Baines  wrote:
> >
> >> We shouldn't try_to_freeze if locks are held.
> >>
> >> ...
> >>
> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
> >>
> >> + if (!(current->flags & PF_NOFREEZE))
> >> + debug_check_no_locks_held(current,
> >> + "lock held while trying to freeze");
> >> ...
> >>
> >> + debug_check_no_locks_held(tsk, "lock held at task exit time");
> >
> > There doesn't seem much point in adding the `msg' to
> > debug_check_no_locks_held() - the dump_stack() in
> > print_held_locks_bug() will tell us the same thing.  Maybe just change
> 
> dump_stack() can be confusing when there is inlining. On occasion I've
> looked at the wrong mutex_lock, for example, when there was another
> mutex_lock that was inlined. Of course, you can start objdump and
> verify the offsets. But that requires that you have the object file.
> You could have a try_to_freeze added to do_exit. I was thinking of
> adding another locks_held in the return from syscall path.

Backtraces aren't *that* bad.  We'll easily be able to tell which of
the two callsites triggered the trace.



Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Mandeep Singh Baines
On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
 wrote:
> On Wed, 20 Feb 2013 15:17:16 -0800
> Mandeep Singh Baines  wrote:
>
>> We shouldn't try_to_freeze if locks are held.
>>
>> ...
>>
>> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>>
>> + if (!(current->flags & PF_NOFREEZE))
>> + debug_check_no_locks_held(current,
>> + "lock held while trying to freeze");
>> ...
>>
>> + debug_check_no_locks_held(tsk, "lock held at task exit time");
>
> There doesn't seem much point in adding the `msg' to
> debug_check_no_locks_held() - the dump_stack() in
> print_held_locks_bug() will tell us the same thing.  Maybe just change

dump_stack() can be confusing when there is inlining. On occasion I've
looked at the wrong mutex_lock, for example, when there was another
mutex_lock that was inlined. Of course, you can start objdump and
verify the offsets. But that requires that you have the object file.
You could have a try_to_freeze added to do_exit. I was thinking of
adding another locks_held in the return from syscall path.

How about if we did some inlining and printed out the function, file
and line number where the check was placed:

#define debug_check_no_locks_held() do { \
	if (unlikely(current->lockdep_depth > 0)) { \
		printk("BUG: locks held at %s:%d/%s()!\n", \
		       __FILE__, __LINE__, __func__); \
		print_held_locks_bug(); \
	} \
} while (0)

That way we avoid any potential confusion.
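
As a rough user-space illustration of why a macro helps here (this is not kernel code; `held_depth`, `last_report`, `check_no_locks_held()`, `fake_try_to_freeze()` and `fake_do_exit()` are made-up stand-ins), the macro body is expanded where the check is written, so `__FILE__`, `__LINE__` and `__func__` name that spot even if the enclosing function later gets inlined into its caller:

```c
#include <stdio.h>

/* Hypothetical stand-in for current->lockdep_depth. */
static int held_depth;

/* Keeps the most recent report so a caller can inspect it. */
static char last_report[256];

/*
 * Because this is a macro, __FILE__, __LINE__ and __func__ are
 * expanded at the place the check is written, not inside a
 * shared helper function.
 */
#define check_no_locks_held() do { \
	if (held_depth > 0) { \
		snprintf(last_report, sizeof(last_report), \
			 "BUG: locks held at %s:%d/%s()!", \
			 __FILE__, __LINE__, __func__); \
		fprintf(stderr, "%s\n", last_report); \
	} \
} while (0)

/* Even if the compiler inlines this, the report names this function. */
static inline void fake_try_to_freeze(void)
{
	check_no_locks_held();
}

static void fake_do_exit(void)
{
	fake_try_to_freeze();	/* report names fake_try_to_freeze() */
	check_no_locks_held();	/* report names fake_do_exit() */
}
```

With `held_depth` set to a non-zero value, calling `fake_do_exit()` produces two reports that differ in function name and line number, which is the distinction a bare `do_exit+0x45` versus `do_exit+0x88` backtrace would not give without the object file.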

> the print_held_locks_bug() messages so they stop assuming they were
> called from do_exit()?
>
> Also, I wonder if the `tsk' arg is needed.  In both callers
> tsk==current.  Is it likely that we'll ever call
> debug_check_no_locks_held() for any task other than `current'?
>

I agree. I'll add that change to the patch once we decide what to do about msg.

Regards,
Mandeep


Re: [tip:x86/urgent] x86, efi: Make "noefi" really disable EFI runtime serivces

2013-02-20 Thread Yinghai Lu
On Wed, Feb 20, 2013 at 3:25 PM, tip-bot for Matt Fleming
 wrote:
> Commit-ID:  fb834c7acc5e140cf4f9e86da93a66de8c0514da
> Gitweb: http://git.kernel.org/tip/fb834c7acc5e140cf4f9e86da93a66de8c0514da
> Author: Matt Fleming 
> AuthorDate: Wed, 20 Feb 2013 20:36:12 +
> Committer:  H. Peter Anvin 
> CommitDate: Wed, 20 Feb 2013 13:18:36 -0800
>
> x86, efi: Make "noefi" really disable EFI runtime serivces
>
> commit 1de63d60cd5b ("efi: Clear EFI_RUNTIME_SERVICES rather than
> EFI_BOOT by "noefi" boot parameter") attempted to make "noefi" true to
> its documentation and disable EFI runtime services to prevent the
> bricking bug described in commit e0094244e41c ("samsung-laptop:
> Disable on EFI hardware"). However, it's not possible to clear
> EFI_RUNTIME_SERVICES from an early param function because
> EFI_RUNTIME_SERVICES is set in efi_init() *after* parse_early_param().
>
> This resulted in "noefi" effectively becoming a no-op and no longer
> providing users with a way to disable EFI, which is bad for those
> users that have buggy machines.
>
> Reported-by: Walt Nelson Jr 
> Cc: Satoru Takeuchi 
> Cc: 
> Signed-off-by: Matt Fleming 
> Link: http://lkml.kernel.org/r/1361392572-25657-1-git-send-email-m...@console-pimps.org
> Signed-off-by: H. Peter Anvin 
> ---
>  arch/x86/platform/efi/efi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 928bf83..e2cd38f 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -85,9 +85,10 @@ int efi_enabled(int facility)
>  }
>  EXPORT_SYMBOL(efi_enabled);
>
> +static bool disable_runtime = false;

__initdata please.

>  static int __init setup_noefi(char *arg)
>  {
> -   clear_bit(EFI_RUNTIME_SERVICES, &x86_efi_facility);
> +   disable_runtime = true;
> return 0;
>  }
>  early_param("noefi", setup_noefi);
> @@ -734,7 +735,7 @@ void __init efi_init(void)
> if (!efi_is_native())
> pr_info("No EFI runtime due to 32/64-bit mismatch with kernel\n");
> else {
> -   if (efi_runtime_init())
> +   if (disable_runtime || efi_runtime_init())
> return;
> set_bit(EFI_RUNTIME_SERVICES, &x86_efi_facility);
> }
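
To spell out the ordering problem the commit message describes, here is a small user-space sketch. All names in it (`FAKE_RUNTIME_SERVICES`, `fake_facility`, `early_param_old()`, `early_param_new()`, `fake_efi_init()`) are invented for the illustration; only the sequence "early parameter first, init later" is taken from the patch description.

```c
#include <stdbool.h>

/* Hypothetical facility bit, standing in for EFI_RUNTIME_SERVICES. */
#define FAKE_RUNTIME_SERVICES (1UL << 0)

static unsigned long fake_facility;
static bool fake_disable_runtime;

/* Old approach: clear the bit from the early parameter handler. */
static void early_param_old(void)
{
	fake_facility &= ~FAKE_RUNTIME_SERVICES;
}

/* New approach: only remember the request. */
static void early_param_new(void)
{
	fake_disable_runtime = true;
}

/* Runs after early parameters are parsed, like efi_init(). */
static void fake_efi_init(void)
{
	if (fake_disable_runtime)
		return;
	/* This unconditionally undoes an earlier clear. */
	fake_facility |= FAKE_RUNTIME_SERVICES;
}

static bool fake_runtime_enabled(void)
{
	return (fake_facility & FAKE_RUNTIME_SERVICES) != 0;
}
```

Calling `early_param_old()` and then `fake_efi_init()` leaves the bit set, which is the no-op behaviour being fixed; calling `early_param_new()` first keeps it clear.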


Re: Wonky PS2-USB converter issues...

2013-02-20 Thread Valdis . Kletnieks
On Wed, 20 Feb 2013 16:07:49 -0800, Greg Kroah-Hartman said:

> PS-2 connectors can not normally handle hotplugging, the protocol
> doesn't allow it, and for some unlucky devices, it could actually fry
> the motherboard or the PS-2 device.
>
> So that's probably the issue here, the device just doesn't support it,
> sorry.

You misunderstood the problem.

This works:

PS/2 keyboard plugged into this device:
Bus 006 Device 002: ID 0e8f:0020 GreenAsia Inc. USB to PS/2 Adapter
USB side of device plugged into USB port on the Latitude laptop.
Power up, boot - the keyboard works.

If you plug the USB side of the GreenAsia adapter into a USB slot on
the dock, the keyboard is dead and not recognized by the system.  However,
after replugging the USB side, or unplugging and replugging the PS/2 side,
it is recognized and starts working.

This tells me that the dock is doing something busted with USB that the laptop
does correctly, and not enumerating devices until something happens to whack it
upside the head. I was hoping to identify it and maybe quirk it.





Re: Wonky PS2-USB converter issues...

2013-02-20 Thread Greg Kroah-Hartman
On Wed, Feb 20, 2013 at 05:48:38PM -0500, Valdis Kletnieks wrote:
> Quite some time ago, I posted about a problematic PS2-USB converter
> that I used to connect an old PS2-connector keyboard to my laptop
> dock, where the keyboard wouldn't be recognized at boot unless
> I unplugged and reconnected it.
> 
> Well, I've recently figured out (partly by obtaining a converter
> from a different vendor) that the converters both work correctly
> when plugged into a USB port that's physically on my Dell Latitude
> laptop - but they have issues and require a reconnect cycle when
> plugged into the docking station for the laptop. (It took a long time
> to figure this out, because *of course* you plug this sort of stuff
> into the docking station specifically so you don't have to plug it
> in every morning).
> 
> So obviously, it isn't the converter that is the problem, but the docking
> station is doing "something stupid".  Anybody have suggestions on figuring out
> what it's doing wrong?

PS-2 connectors can not normally handle hotplugging, the protocol
doesn't allow it, and for some unlucky devices, it could actually fry
the motherboard or the PS-2 device.

So that's probably the issue here, the device just doesn't support it,
sorry.

greg k-h


Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Greg KH
On Thu, Feb 21, 2013 at 01:57:51AM +0200, maxin.j...@gmail.com wrote:
> From: "Maxin B. John" 
> 
> Fixes this build failure:
> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
> In file included from ffs-test.c:41:0:
> ../../include/linux/usb/functionfs.h:4:39: fatal error:
> uapi/linux/usb/functionfs.h: No such file or directory
> compilation terminated.
> make: *** [ffs-test] Error 1

This is a build failure where, 3.8, or linux-next, or somewhere else?

thanks,

greg k-h


[PATCH v3 1/2] cpufreq: Convert the cpufreq_driver_lock to a rwlock

2013-02-20 Thread Nathan Zimmer
This eliminates the contention I am seeing in __cpufreq_cpu_get.
It also nicely stages the lock to be replaced by the rcu.

Cc: Viresh Kumar 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Nathan Zimmer 
---
 drivers/cpufreq/cpufreq.c | 52 +++
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b02824d..c5996fe 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -45,7 +45,7 @@ static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 /* This one keeps track of the previously set governor of a removed CPU */
 static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
 #endif
-static DEFINE_SPINLOCK(cpufreq_driver_lock);
+static DEFINE_RWLOCK(cpufreq_driver_lock);
 
 /*
  * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
@@ -137,7 +137,7 @@ static struct cpufreq_policy *__cpufreq_cpu_get(unsigned int cpu, bool sysfs)
goto err_out;
 
/* get the cpufreq driver */
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   read_lock_irqsave(&cpufreq_driver_lock, flags);
 
if (!cpufreq_driver)
goto err_out_unlock;
@@ -155,13 +155,13 @@ static struct cpufreq_policy *__cpufreq_cpu_get(unsigned int cpu, bool sysfs)
if (!sysfs && !kobject_get(&data->kobj))
goto err_out_put_module;
 
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
return data;
 
 err_out_put_module:
module_put(cpufreq_driver->owner);
 err_out_unlock:
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 err_out:
return NULL;
 }
@@ -266,9 +266,9 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
pr_debug("notification %u of frequency transition to %u kHz\n",
state, freqs->new);
 
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   read_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
switch (state) {
 
@@ -765,12 +765,12 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
goto err_out_kobj_put;
}
 
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   write_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_cpu(j, policy->cpus) {
per_cpu(cpufreq_cpu_data, j) = policy;
per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
}
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
ret = cpufreq_add_dev_symlink(cpu, policy);
if (ret)
@@ -813,12 +813,12 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
 
lock_policy_rwsem_write(sibling);
 
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   write_lock_irqsave(&cpufreq_driver_lock, flags);
 
cpumask_set_cpu(cpu, policy->cpus);
per_cpu(cpufreq_policy_cpu, cpu) = policy->cpu;
per_cpu(cpufreq_cpu_data, cpu) = policy;
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
unlock_policy_rwsem_write(sibling);
 
@@ -871,15 +871,15 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 
 #ifdef CONFIG_HOTPLUG_CPU
/* Check if this cpu was hot-unplugged earlier and has siblings */
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   read_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_online_cpu(sibling) {
struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
return cpufreq_add_policy_cpu(cpu, sibling, dev);
}
}
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
 #endif
 #endif
 
@@ -952,10 +952,10 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
return 0;
 
 err_out_unregister:
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   write_lock_irqsave(&cpufreq_driver_lock, flags);
for_each_cpu(j, policy->cpus)
per_cpu(cpufreq_cpu_data, j) = NULL;
-   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
kobject_put(&policy->kobj);
wait_for_completion(&policy->kobj_unregister);
@@ -1008,12 +1008,12 @@ static int __cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif
 
pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
 
-   spin_lock_irqsave(&cpufreq_driver_lock, flags);
+   

[PATCH v3 2/2] cpufreq: Convert the cpufreq_driver_lock to use the rcu

2013-02-20 Thread Nathan Zimmer
In general rwlocks are discouraged, so we are moving it to use RCU instead.
This does require a bit of care since the cpufreq_driver_lock protects both
the cpufreq_driver and the cpufreq_cpu_data array.
Also, since many of the function pointers on cpufreq_driver may sleep when
called, we have to grab them under rcu_read_lock() but call them after
rcu_read_unlock().
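
The "grab the pointer inside the read-side section, call it afterwards" pattern can be sketched in plain user-space C. This is only an analogy: a C11 atomic load stands in for `rcu_dereference()`, there is no grace-period or module-lifetime handling, and every name (`fake_driver`, `fake_cpufreq_driver`, `fake_register()`, `call_target()`) is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical driver with one callback that might sleep. */
struct fake_driver {
	int (*target)(unsigned int freq_khz);
};

/* Published driver pointer; NULL until a driver registers. */
static _Atomic(struct fake_driver *) fake_cpufreq_driver;

/* Example callback: converts kHz to MHz. */
static int fake_target(unsigned int freq_khz)
{
	return (int)(freq_khz / 1000);
}

static struct fake_driver demo_driver = { .target = fake_target };

/* Writer side: publish the driver (stand-in for rcu_assign_pointer()). */
static void fake_register(struct fake_driver *drv)
{
	atomic_store_explicit(&fake_cpufreq_driver, drv,
			      memory_order_release);
}

static int call_target(unsigned int freq_khz)
{
	int (*target)(unsigned int) = NULL;
	struct fake_driver *drv;

	/* Read-side section would start here (rcu_read_lock()). */
	drv = atomic_load_explicit(&fake_cpufreq_driver,
				   memory_order_acquire);
	if (drv)
		target = drv->target;	/* copy the pointer only */
	/* Read-side section would end here (rcu_read_unlock()). */

	if (!target)
		return -1;
	/* The possibly sleeping call happens outside the section. */
	return target(freq_khz);
}
```

Before `fake_register(&demo_driver)` is called, `call_target()` returns -1; afterwards it invokes the callback through the local copy.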

Cc: Viresh Kumar 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Nathan Zimmer 
---
 drivers/cpufreq/cpufreq.c | 312 +-
 1 file changed, 224 insertions(+), 88 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c5996fe..110ec02 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -39,13 +39,13 @@
  * level driver of CPUFreq support, and its spinlock. This lock
  * also protects the cpufreq_cpu_data array.
  */
-static struct cpufreq_driver *cpufreq_driver;
+static struct cpufreq_driver __rcu *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 #ifdef CONFIG_HOTPLUG_CPU
 /* This one keeps track of the previously set governor of a removed CPU */
 static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
 #endif
-static DEFINE_RWLOCK(cpufreq_driver_lock);
+static DEFINE_SPINLOCK(cpufreq_driver_lock);
 
 /*
  * cpu_policy_rwsem is a per CPU reader-writer semaphore designed to cure
@@ -131,18 +131,19 @@ static DEFINE_MUTEX(cpufreq_governor_mutex);
 static struct cpufreq_policy *__cpufreq_cpu_get(unsigned int cpu, bool sysfs)
 {
struct cpufreq_policy *data;
-   unsigned long flags;
+   struct cpufreq_driver *driver;
 
if (cpu >= nr_cpu_ids)
goto err_out;
 
/* get the cpufreq driver */
-   read_lock_irqsave(&cpufreq_driver_lock, flags);
+   rcu_read_lock();
+   driver = rcu_dereference(cpufreq_driver);
 
-   if (!cpufreq_driver)
+   if (!driver)
goto err_out_unlock;
 
-   if (!try_module_get(cpufreq_driver->owner))
+   if (!try_module_get(driver->owner))
goto err_out_unlock;
 
 
@@ -155,13 +156,13 @@ static struct cpufreq_policy *__cpufreq_cpu_get(unsigned int cpu, bool sysfs)
if (!sysfs && !kobject_get(&data->kobj))
goto err_out_put_module;
 
-   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   rcu_read_unlock();
return data;
 
 err_out_put_module:
-   module_put(cpufreq_driver->owner);
+   module_put(driver->owner);
 err_out_unlock:
-   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   rcu_read_unlock();
 err_out:
return NULL;
 }
@@ -184,7 +185,9 @@ static void __cpufreq_cpu_put(struct cpufreq_policy *data, bool sysfs)
 {
if (!sysfs)
kobject_put(&data->kobj);
-   module_put(cpufreq_driver->owner);
+   rcu_read_lock();
+   module_put(rcu_dereference(cpufreq_driver)->owner);
+   rcu_read_unlock();
 }
 
 void cpufreq_cpu_put(struct cpufreq_policy *data)
@@ -255,20 +258,21 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
 void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 {
struct cpufreq_policy *policy;
-   unsigned long flags;
+   u8 flags;
 
BUG_ON(irqs_disabled());
 
if (cpufreq_disabled())
return;
 
-   freqs->flags = cpufreq_driver->flags;
pr_debug("notification %u of frequency transition to %u kHz\n",
state, freqs->new);
 
-   read_lock_irqsave(&cpufreq_driver_lock, flags);
+   rcu_read_lock();
+   flags = rcu_dereference(cpufreq_driver)->flags;
policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
-   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+   rcu_read_unlock();
+   freqs->flags = flags;
 
switch (state) {
 
@@ -277,7 +281,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 * which is not equal to what the cpufreq core thinks is
 * "old frequency".
 */
-   if (!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
+   if (!(flags & CPUFREQ_CONST_LOOPS)) {
if ((policy) && (policy->cpu == freqs->cpu) &&
(policy->cur) && (policy->cur != freqs->old)) {
pr_debug("Warning: CPU frequency is"
@@ -329,11 +333,23 @@ static int cpufreq_parse_governor(char *str_governor, unsigned int *policy,
struct cpufreq_governor **governor)
 {
int err = -EINVAL;
-
-   if (!cpufreq_driver)
+   struct cpufreq_driver *driver;
+   int (*setpolicy)(struct cpufreq_policy *policy);
+   int (*target)(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation);
+
+   rcu_read_lock();
+   driver = rcu_dereference(cpufreq_driver);
+   if (!driver) {
+   

[PATCH v3 0/2] cpufreq: cpufreq_driver_lock is hot on large systems

2013-02-20 Thread Nathan Zimmer
I am noticing the cpufreq_driver_lock is quite hot.
On an idle 512 system perf shows me most of the system time is spent on this
lock.  This is quite significant as top shows 5% of time in system time.
My solution was to first convert the lock to a rwlock and then to the rcu.

v2: Rebase

v3: Read the RCU documentation instead of skimming it.  Also, I based this on
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm+acpi-3.9-rc1,
which I assumed is what you would prefer, Rafael.

Nathan Zimmer (2):
  cpufreq: Convert the cpufreq_driver_lock to a rwlock
  cpufreq: Convert the cpufreq_driver_lock to use the rcu

 drivers/cpufreq/cpufreq.c | 286 ++
 1 file changed, 211 insertions(+), 75 deletions(-)

-- 
1.8.1.2



[PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread maxin . john
From: "Maxin B. John" 

Fixes this build failure:
gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
In file included from ffs-test.c:41:0:
../../include/linux/usb/functionfs.h:4:39: fatal error:
uapi/linux/usb/functionfs.h: No such file or directory
compilation terminated.
make: *** [ffs-test] Error 1

Signed-off-by: Maxin B. John 
---
 tools/usb/ffs-test.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/usb/ffs-test.c b/tools/usb/ffs-test.c
index 8674b9e..fe1e66b 100644
--- a/tools/usb/ffs-test.c
+++ b/tools/usb/ffs-test.c
@@ -38,7 +38,7 @@
 #include 
 #include 
 
-#include "../../include/linux/usb/functionfs.h"
+#include "../../include/uapi/linux/usb/functionfs.h"
 
 
 / Little Endian Handling /
-- 
1.7.7



Re: [PATCH] x86, efi: Make "noefi" really disable EFI runtime serivces

2013-02-20 Thread H. Peter Anvin
On 02/20/2013 03:46 PM, Satoru Takeuchi wrote:
> 
> Sorry, my patch was imperfect. This patch looks good to me.
> 

Perfection is something very rarely achieved.  That's why we work with
gradual improvements.

-hpa



Re: [PATCH] x86, efi: Make "noefi" really disable EFI runtime serivces

2013-02-20 Thread Satoru Takeuchi
Hi Matt,

(2013/02/21 5:36), Matt Fleming wrote:
> From: Matt Fleming 
> 
> commit 1de63d60cd5b ("efi: Clear EFI_RUNTIME_SERVICES rather than
> EFI_BOOT by "noefi" boot parameter") attempted to make "noefi" true to
> its documentation and disable EFI runtime services to prevent the
> bricking bug described in commit e0094244e41c ("samsung-laptop:
> Disable on EFI hardware"). However, it's not possible to clear
> EFI_RUNTIME_SERVICES from an early param function because
> EFI_RUNTIME_SERVICES is set in efi_init() *after* parse_early_param().
> 
> This resulted in "noefi" effectively becoming a no-op and no longer
> providing users with a way to disable EFI, which is bad for those
> users that have buggy machines.

Sorry, my patch was imperfect. This patch looks good to me.

Reviewed-by: Satoru Takeuchi 

Thanks,
Satoru

> 
> Reported-by: Walt Nelson Jr 
> Cc: Satoru Takeuchi 
> Cc: H. Peter Anvin 
> Cc: 
> Signed-off-by: Matt Fleming 
> ---
>  arch/x86/platform/efi/efi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 928bf83..e2cd38f 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -85,9 +85,10 @@ int efi_enabled(int facility)
>  }
>  EXPORT_SYMBOL(efi_enabled);
>  
> +static bool disable_runtime = false;
>  static int __init setup_noefi(char *arg)
>  {
> - clear_bit(EFI_RUNTIME_SERVICES, &x86_efi_facility);
> + disable_runtime = true;
>   return 0;
>  }
>  early_param("noefi", setup_noefi);
> @@ -734,7 +735,7 @@ void __init efi_init(void)
>   if (!efi_is_native())
>   pr_info("No EFI runtime due to 32/64-bit mismatch with kernel\n");
>   else {
> - if (efi_runtime_init())
> + if (disable_runtime || efi_runtime_init())
>   return;
>   set_bit(EFI_RUNTIME_SERVICES, &x86_efi_facility);
>   }
> 




Re: [PATCH V5 4/5] drivers/amba: add support for a PCI bridge

2013-02-20 Thread H. Peter Anvin
On 02/20/2013 03:35 PM, Russell King - ARM Linux wrote:
> On Wed, Feb 20, 2013 at 02:50:17PM -0800, H. Peter Anvin wrote:
>> On 02/20/2013 02:45 PM, Alessandro Rubini wrote:
>>> [meanwhile I posted V6 with the acked-by of linusw and others, that
>>> were missing in V5]
>>>
>>> rmk:
>>>> I'm happy to take it through my tree if everyone is now happy with this.
>>>
>>> hpa: 
>>>> I am okay with that, although I would like to make sure we do a bunch of
>>>> x86 randconfigs on it before pushing it to Linus.
>>>
>>> I did like this:
>>>   - disable STA2X11 (and thus AMBA) and build
>>>   - enable STA2X11, answer y to all new questions and build
>>>
>>> So there's nothing left (you'll have two unrelated warnings, that I'm
>>> working on and I'll post a fix tomorrow).  Sure, Peter, first time I
>>> didn't do that test and missed some of the drivers.
>>>
>>
>> I was just concerned that rmk wouldn't necessarily do those tests as a
>> matter of process.
>>
>> So Russell -- how do you want to handle this?  Should I take them (and
>> ask Ingo to put them through his test machinery) or do you want to (and
>> run x86 randconfigs as part of your testing)?
> 
> Well, I'm happy to take the non-x86 bits if that's what others want (for
> the _next_ merge window, not this one.)  That _should_ result in x86 not
> seeing this stuff until it gets the ARM_AMBA definition enabled, and
> giving it a full cycle of testing.
> 
> However, if we want to keep the patch set together and route it via
> another tree, I'm also fine with that too.
> 

Actually, between linux-next and Fengguang's zeroday testbot I suspect
we'll get all the coverage we need.  So yes, go ahead and take them.

Acked-by: H. Peter Anvin 

-hpa



[PATCH] idr: explain WARN_ON_ONCE() on negative IDs out-of-range ID

2013-02-20 Thread Tejun Heo
Until recently, when a negative ID is specified, idr functions used
to ignore the sign bit and proceeded with the operation with the rest
of bits, which is bizarre and error-prone.  The behavior recently got
changed so that negative IDs are treated as invalid but we're
triggering WARN_ON_ONCE() on negative IDs just in case somebody was
depending on the sign bit being ignored, so that those can be detected
and fixed easily.

We only need this for a while.  Explain why WARN_ON_ONCE()s are there
and that they can be removed later.

Signed-off-by: Tejun Heo 
Cc: Thomas Gleixner 
---
 lib/idr.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/idr.c b/lib/idr.c
index 5c772dc..134a61a 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -569,6 +569,7 @@ void idr_remove(struct idr *idp, int id)
struct idr_layer *p;
struct idr_layer *to_free;
 
+   /* see comment in idr_find_slowpath() */
if (WARN_ON_ONCE(id < 0))
return;
 
@@ -666,6 +667,14 @@ void *idr_find_slowpath(struct idr *idp, int id)
int n;
struct idr_layer *p;
 
+   /*
+* If @id is negative, idr_find() used to ignore the sign bit and
+* performed lookup with the rest of bits, which is weird and can
+* lead to very obscure bugs.  We're now returning NULL for all
+* negative IDs but just in case somebody was depending on the sign
+* bit being ignored, let's trigger WARN_ON_ONCE() so that they can
+* be detected and fixed.  WARN_ON_ONCE() can later be removed.
+*/
if (WARN_ON_ONCE(id < 0))
return NULL;
 
@@ -815,6 +824,7 @@ void *idr_replace(struct idr *idp, void *ptr, int id)
int n;
struct idr_layer *p, *old_p;
 
+   /* see comment in idr_find_slowpath() */
if (WARN_ON_ONCE(id < 0))
return ERR_PTR(-EINVAL);
 
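
As a user-space sketch of the behaviour change described above (not the idr code itself; `fake_table`, `find_ignoring_sign()` and `find_rejecting_negative()` are made-up names, and masking with `INT_MAX` is just one assumed way of "ignoring the sign bit"), the two lookups differ only for negative IDs:

```c
#include <limits.h>
#include <stddef.h>

#define FAKE_SLOTS 8

/* Hypothetical flat table standing in for the idr tree. */
static void *fake_table[FAKE_SLOTS];

/* Old behaviour as described: the sign bit is silently dropped. */
static void *find_ignoring_sign(int id)
{
	unsigned int idx = (unsigned int)id & INT_MAX;

	return idx < FAKE_SLOTS ? fake_table[idx] : NULL;
}

/* New behaviour: every negative ID is invalid. */
static void *find_rejecting_negative(int id)
{
	if (id < 0)
		return NULL;
	return id < FAKE_SLOTS ? fake_table[id] : NULL;
}
```

With an entry stored at index 3, `find_ignoring_sign(INT_MIN + 3)` quietly returns that entry, while `find_rejecting_negative(INT_MIN + 3)` returns NULL. A caller that relied on the first result is the kind of user the WARN_ON_ONCE() is meant to flush out.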

