date:20120727

Re: [Xen-devel] [PATCH 10/24] xen: do not compile manage, balloon, pci, acpi and cpu_hotplug on ARM

2012-07-27 Thread Jan Beulich

>>> On 26.07.12 at 17:33, Stefano Stabellini  
>>> wrote:
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -1,11 +1,15 @@
> -obj-y+= grant-table.o features.o events.o manage.o balloon.o
> +ifneq ($(CONFIG_ARM),y)
> +obj-y+= manage.o balloon.o

While I assume that this part (and the cpu_hotplug one below) is
temporary, ...

> +obj-$(CONFIG_XEN_DOM0)   += pci.o acpi.o

... at least this one should, imo, be solved with a proper long-term
mechanism, i.e. the usual var-$(CONFIG_...) approach:

dom0-$(CONFIG_PCI) += pci.o
dom0-$(CONFIG_ACPI) += acpi.o
obj-$(CONFIG_XEN_DOM0)  += $(dom0-y)

Jan

> +obj-$(CONFIG_HOTPLUG_CPU)+= cpu_hotplug.o
> +endif
> +obj-y+= grant-table.o features.o events.o
>  obj-y+= xenbus/
>  
>  nostackp := $(call cc-option, -fno-stack-protector)
>  CFLAGS_features.o:= $(nostackp)
>  
>  obj-$(CONFIG_BLOCK)  += biomerge.o
> -obj-$(CONFIG_HOTPLUG_CPU)+= cpu_hotplug.o
>  obj-$(CONFIG_XEN_XENCOMM)+= xencomm.o
>  obj-$(CONFIG_XEN_BALLOON)+= xen-balloon.o
>  obj-$(CONFIG_XEN_SELFBALLOONING) += xen-selfballoon.o
> @@ -17,7 +21,6 @@ obj-$(CONFIG_XEN_SYS_HYPERVISOR)+= sys-hypervisor.o
>  obj-$(CONFIG_XEN_PVHVM)  += platform-pci.o
>  obj-$(CONFIG_XEN_TMEM)   += tmem.o
>  obj-$(CONFIG_SWIOTLB_XEN)+= swiotlb-xen.o
> -obj-$(CONFIG_XEN_DOM0)   += pci.o acpi.o
>  obj-$(CONFIG_XEN_PCIDEV_BACKEND) += xen-pciback/
>  obj-$(CONFIG_XEN_PRIVCMD)+= xen-privcmd.o
>  obj-$(CONFIG_XEN_ACPI_PROCESSOR) += xen-acpi-processor.o
> -- 
> 1.7.2.5
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org 
> http://lists.xen.org/xen-devel 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Xen-devel] [PATCH 17/24] xen: allow privcmd for HVM guests

2012-07-27 Thread Jan Beulich

>>> On 26.07.12 at 17:33, Stefano Stabellini  
>>> wrote:
> In order for privcmd mmap to work correctly, xen_remap_domain_mfn_range
> needs to be implemented for HVM guests.
> If it is not, mmap is going to fail later on.

Somehow, for me at least, this description doesn't connect to the
actual change.

> Signed-off-by: Stefano Stabellini 
> ---
>  drivers/xen/privcmd.c |4 
>  1 files changed, 0 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index ccee0f1..85226cb 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -380,10 +380,6 @@ static struct vm_operations_struct privcmd_vm_ops = {
>  
>  static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
>  {
> - /* Unsupported for auto-translate guests. */
> - if (xen_feature(XENFEAT_auto_translated_physmap))
> - return -ENOSYS;
> -

Is this safe on x86?

Jan

>   /* DONTCOPY is essential for Xen because copy_page_range doesn't know
>* how to recreate these mappings */
>   vma->vm_flags |= VM_RESERVED | VM_IO | VM_DONTCOPY | VM_PFNMAP;



Re: [PATCH 00/13] UAPI header file split

2012-07-27 Thread Michael Kerrisk

On Thu, Jul 26, 2012 at 12:46 PM, David Howells  wrote:
> Michael Kerrisk  wrote:
>
>> >> >> 3. HEADER COMMENTS NOT RETAINED IN KAPI FILES
>> >
>> > How about the attached changes?  This is a delta to the disintegrate
>> > markers diff I sent earlier.
>>
>> That looks about right to me.
>>
>> Acked-by: Michael Kerrisk 
>
> Excellent, thanks.  The question is where can I attach your ack?  I'm not
> going to simply tag these on the end as a new patch, but rather the changes
> are included in the regenerated patches as I changed the scripts.
>
> So just the main set of scripted commits?

Modulo the following statements:
* The conceptual approach of your scripts makes sense to me.
* I checked a significant, but far from complete, sample of the output
files, and the results look correct.
* My comparator scripts support the belief that no content is being
lost in the resulting header files (once you've made the changes noted
yesterday).

then you could add my Acked-by: for the commits generated by the scripts.

Acked-by: Michael Kerrisk 

Cheers,

Michael

-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

Re: [PATCH 1/6] w1: omap-hdq: add section annotation to remove

2012-07-27 Thread Felipe Balbi

Hi,

On Thu, Jul 26, 2012 at 06:45:26PM +0400, Evgeniy Polyakov wrote:
> Hi all
> 
> On Wed, Jul 25, 2012 at 03:05:27PM +0300, Felipe Balbi (ba...@ti.com) wrote:
> > trivial patch, no functional changes.
> > 
> > Signed-off-by: Felipe Balbi 
> 
> Looks good to me
> Who should pick it up?
> 
> Feel free to add my acked-by: Evgeniy Polyakov 

I thought you would :-p Then I guess Tony, maybe?

-- 
balbi



Re: Issue with block I/O cgroup in case of threads

2012-07-27 Thread naveen yadav

We are using the 3.0.33 kernel, and verification is done on an ARM Cortex-A9.


On Fri, Jul 27, 2012 at 12:33 PM, naveen yadav  wrote:
> Hi All,
>
>
> I am testing the cgroup block IO attributes in a multi-threaded scenario.
> I tried testing the throttling policy (max read/write bytes per second per
> device) that can be set using the following attributes:
> "blkio.throttle.write_bps_device"
> "blkio.throttle.read_bps_device"
> but I am not getting the expected bandwidth readings for processes and
> threads.
>
> The following is my kernel configuration-
> # CONFIG_RCU_BOOST is not set
> # CONFIG_IKCONFIG is not set
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> # CONFIG_CGROUP_FREEZER is not set
> CONFIG_CGROUP_DEVICE=y
> # CONFIG_CPUSETS is not set
> # CONFIG_CGROUP_CPUACCT is not set
> CONFIG_RESOURCE_COUNTERS=y
> CONFIG_CGROUP_MEM_RES_CTLR=y
> CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
> CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED=y
> # CONFIG_CGROUP_PERF is not set
> CONFIG_CGROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_RT_GROUP_SCHED=y
> CONFIG_BLK_CGROUP=y
> CONFIG_DEBUG_BLK_CGROUP=y
> # CONFIG_NAMESPACES is not set
> CONFIG_BLK_DEV_THROTTLING=y
> CONFIG_CFQ_GROUP_IOSCHED=y
>
> Below is the procedure I followed for testing.
> First, I mounted the blkio cgroup on /mnt:
> $mount -t cgroup -o blkio none /mnt
> $mount | grep "cgroup"
> ==> output
> none on /sys/fs/cgroup type cgroup (rw,relatime,cpu)
> none on /mnt type cgroup (rw,relatime,blkio)
>
> The baseline (unthrottled) readings were taken with dd:
> $dd if=linux.3.0.20.tgz of=/dev/null bs=4096 count=51200
> 28419+1 records in
> 28419+1 records out
> 116405611 bytes (111.0MB) copied, 6.041795 seconds, 18.4MB/s
> $dd if=/dev/zero of=test1 bs=4096 count=51200
> 51200+0 records in
> 51200+0 records out
> 209715200 bytes (200.0MB) copied, 31.203680 seconds, 6.4MB/s
>
> Then I created two groups, g1 and g2, in /mnt and set their bandwidth limits:
> $echo "8:0 2097152" > /mnt/g1/blkio.throttle.read_bps_device    # 2 MB/s
> $echo "8:0 16777216" > /mnt/g2/blkio.throttle.read_bps_device   # 16 MB/s
> $echo "8:0 1048576" > /mnt/g1/blkio.throttle.write_bps_device   # 1 MB/s
> $echo "8:0 5242880" > /mnt/g2/blkio.throttle.write_bps_device   # 5 MB/s
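The throttle values above are MiB counts expressed in bytes per second (2 x 1048576 = 2097152, and so on), written in the `major:minor bytes_per_second` form. A minimal C sketch of building and applying such a rule; `format_bps_rule()` and `write_bps_rule()` are hypothetical helpers, and `8:0` is simply the device used in this report:

```c
#include <assert.h>
#include <stdio.h>

/* One MiB in bytes; the throttle files take bytes per second. */
#define MIB (1024UL * 1024UL)

/*
 * Hypothetical helper: format "major:minor bytes_per_second" for
 * blkio.throttle.read_bps_device / blkio.throttle.write_bps_device.
 * Returns the string length, or -1 if buf is too small.
 */
int format_bps_rule(char *buf, size_t len, unsigned int major,
                    unsigned int minor, unsigned long mib_per_sec)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%u:%u %lu", major, minor, mib_per_sec * MIB);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: write one rule to a cgroup file; 0 on success. */
int write_bps_rule(const char *path, const char *rule)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", rule) > 0;
    /* stdio buffers the text, so a rejected rule may only show up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `format_bps_rule(buf, sizeof buf, 8, 0, 2)` produces the g1 read rule `8:0 2097152`, which `write_bps_rule("/mnt/g1/blkio.throttle.read_bps_device", buf)` would then apply.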
>
> Test program -
> 
> #include <pthread.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/prctl.h>
> #include <unistd.h>
>
> #define MAX_NAME_LEN 16
> #define NO_THREADS   2
>
> /* Set from the SIGUSR1 handler, polled by main(). */
> volatile sig_atomic_t flag = 0;
> struct sigaction sigact;
> static int count = 0;
>
> void *threadFunc(void *arg)
> {
>     char cmd[100];
>     /* Atomic increment: both threads update count concurrently. */
>     snprintf(cmd, sizeof(cmd),
>              "dd if=/dev/zero of=ThreadTest%d bs=4096 count=51200",
>              __sync_add_and_fetch(&count, 1));
>     while (flag != 2)
>         ;
>     system("echo 3 > /proc/sys/vm/drop_caches");
>
>     system("dd if=Linux.3.0.20.tgz of=/dev/null bs=4096 count=51200");
>     system("echo 3 > /proc/sys/vm/drop_caches");
>     system(cmd);
>     while (1)
>         ;
>     return NULL;
> }
>
> static void signal_handler(int sig)
> {
>     /* printf() is not async-signal-safe; kept only as a debug aid. */
>     printf("Caught signal SIGUSR1 : %d\n", sig);
>     flag = 1;
> }
>
> void init_signals(void)
> {
>     sigact.sa_handler = signal_handler;
>     sigemptyset(&sigact.sa_mask);
>     sigact.sa_flags = 0;
>     sigaction(SIGUSR1, &sigact, NULL);
> }
>
> int main(int argc, char *argv[])
> {
>     pid_t pid;
>     pthread_t pth[NO_THREADS];
>     struct sched_param mysched;
>     char name[MAX_NAME_LEN + 1];
>     int i;
>     int j;
>
>     init_signals();
>     mysched.sched_priority = 19;
>
>     for (i = 0; i < 3; ++i) {
>         pid = fork();
>         if (pid == 0) {
>             snprintf(name, sizeof(name), "%d", i);
>             prctl(PR_SET_NAME, (unsigned long)name);
>
>             if (argc == 2 && !strcmp(argv[1], "FIFO")) {
>                 sched_setscheduler(0, SCHED_FIFO, &mysched);
>             } else if (argc == 2 && !strcmp(argv[1], "RR")) {
>                 sched_setscheduler(0, SCHED_RR, &mysched);
>             }
>             /* sched_getscheduler() must be called, not passed as a pointer. */
>             printf("\nPID=%d, Sched Policy=%d\n",
>                    getpid(), sched_getscheduler(0));
>
>             sleep(30);
>             printf("Starting Thread Creation\n");
>
>             for (j = 0; j < NO_THREADS; j++) {
>                 pthread_create(&pth[j], NULL, threadFunc, NULL);
>             }
>
>             while (!flag)
>                 ;
>             printf("InProcess\n");
>             system("echo 3 > /proc/sys/vm/drop_caches");
>             system("dd if=Linux.3.0.20.tg

[RFC PATCH] netconsole.txt: "nc" needs "-p" to specify the listening port

2012-07-27 Thread Dirk Gouders

Hi Jesse,

I would like to ask you to check if the documentation of "nc" in
netconsole.txt is still correct.  I tried two different netcat packages
and both require "-p" to specify the listening port.  I am wondering if
that changed after the use of "nc" has been documented.

Best regards,

Dirk

Signed-off-by: Dirk Gouders 
---
 Documentation/networking/netconsole.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index 8d02207..ffe30a7 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -52,7 +52,7 @@ initialized and attempts to bring up the supplied dev at the supplied
 address.
 
 The remote host can run either 'netcat -u -l -p ',
-'nc -l -u ' or syslogd.
+'nc -l -u -p ' or syslogd.
 
 Dynamic reconfiguration:
 
-- 
1.7.8.6
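Whichever netcat variant is installed, the receiving end only has to bind a UDP socket to the netconsole target port and print what arrives. A minimal C sketch of that listener, roughly what `nc -l -u -p PORT` does; `open_udp_listener()` and `recv_message()` are hypothetical names, not part of netconsole or netcat:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Bind a UDP socket to PORT on all local addresses.
 * Returns the socket fd, or -1 on error.
 */
int open_udp_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Receive one datagram (netconsole delivers log text over UDP) into
 * buf and NUL-terminate it. Returns the payload length, or -1.
 */
ssize_t recv_message(int fd, char *buf, size_t len)
{
    assert(len > 0);
    ssize_t n = recv(fd, buf, len - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

A real receiver would loop on `recv_message()` and write each payload to stdout, which is all netcat does here.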

Re: Oops after merge of tty-next

2012-07-27 Thread Thierry Reding

On Tue, Jul 24, 2012 at 11:05:27PM +0100, Alan Cox wrote:
> On Mon, 23 Jul 2012 15:51:03 +0100
> Ian Abbott  wrote:
> 
> > On 2012-07-21 23:41, Alan Cox wrote:
> > > On Fri, 20 Jul 2012 23:07:06 +0100
> > > Ian Abbott  wrote:
> > >
> > >> I'm getting an Oops in the linux-next tree today after the merge
> > >> of the remote-tracking branch 'tty/tty-next'.  I bisected it down
> > >> to commit 36b3c070d2346c890d690d71f6eab02f8c511137 in
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git :
> 
> Ok Greg you can you leave that one in -next but not push it for 3.6.
> 
> I need to go over this in some detail and figure out the remaining
> race, and worse yet how to fix it without the mess of existing locks
> turning it into something nasty.
> 
> I think I understand what is needed but I don't want to be doing it as
> a mad panic for 3.6. On the bright side I think it explains the other
> tty lock splitting mysteries.

I've also been able to reproduce this, or at least a very similar issue,
on ARM (Tegra). The system boots into the initrd, which works as usual,
but it hangs when executing the switch_root that starts systemd within
the final root filesystem. With the above-mentioned commit reverted, the
system successfully boots to the login prompt.

Thierry



Re: [Xen-devel] [PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-27 Thread Jan Beulich

>>> On 26.07.12 at 22:43, Konrad Rzeszutek Wilk  wrote:
> If we boot a 64-bit guest with more than 4GB memory, the SWIOTLB
> gets turned on:
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at 
> [8800fb43d000-8800ff43cfff]
> 
> which is OK if we had PCI devices, but not if we did not. In a PV
> guest the SWIOTLB ends up asking the hypervisor for precious lowmem
> memory - and 64MB of it per guest. On a 32GB machine, this limits the
> number of 4GB guests that can be started, due to lowmem exhaustion.
> 
> What we do is detect whether the user supplied e820_hole=1
> parameter, which is used to construct an E820 that is similar to
> the machine  - so that the PCI regions do not overlap with RAM regions.
> We check for that by looking at the E820 and seeing if it diverges
> from the standard - and if so (and if iommu=soft was not turned on),
> we disable the check pci_swiotlb_detect_4gb code.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |   26 ++
>  1 files changed, 26 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 967633a..56f373e 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -8,6 +8,10 @@
>  #include 
>  #include 
>  
> +#include 
> +#include 
> +#include 
> +
>  int xen_swiotlb __read_mostly;
>  
>  static struct dma_map_ops xen_swiotlb_dma_ops = {
> @@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   .unmap_page = xen_swiotlb_unmap_page,
>   .dma_supported = xen_swiotlb_dma_supported,
>  };
> +bool __init e820_has_acpi(void)
> +{
> + int i;
>  
> + /* Check if the user supplied the e820_hole parameter
> +  * which would create a machine looking E820 region. */
> + for (i = 0; i < e820.nr_map; i++) {
> + if ((e820.map[i].type == E820_ACPI) ||
> + (e820.map[i].type == E820_NVS))
> + return true;

Tying this decision to the presence of ACPI regions in E820 is
problematic for two reasons imo: For one, it precludes cleaning
up this (bogus!) construct where it gets produced (PV DomU-s
really shouldn't ever see such E820 entries, they should get
converted to simple reserved entries, to wipe any notion of
ACPI presence). And second it ties you to running on systems
that actually have ACPI, whereas it is my rudimentary
understanding that systems with e.g. SFI would not have any
ACPI).

Jan

> + }
> + return false;
> +}
>  /*
>   * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
>   *
> @@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   */
>  int __init pci_xen_swiotlb_detect(void)
>  {
> +#ifdef CONFIG_X86_64
>  
> + /* Having more than 4GB triggers the native SWIOTLB to activate.
> +  * The way to turn it off is to set no_iommu. */
> + printk(KERN_INFO "swiotlb: %d\n", swiotlb);
> + if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
> + /* Normal PV guests only have E820_RSV and E820_RAM regions */
> + if (!e820_has_acpi())
> + no_iommu = 1;
> + }
> +#endif
>   /* If running as PV guest, either iommu=soft, or swiotlb=force will
>* activate this IOMMU. If running as PV privileged, activate it
>* irregardless.
> -- 
> 1.7.7.6
> 
> 
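Jan's first objection can be made concrete with a small model of the check. The sketch below uses a simplified stand-in for the kernel's E820 entry (an assumption; it is not the kernel's struct) and shows that once ACPI/NVS entries are demoted to reserved ones where the map is produced, the ACPI-based heuristic can no longer fire. Both helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* E820 type codes with the values Linux uses. */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

/* Simplified stand-in for one E820 map entry. */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Same test as the quoted e820_has_acpi(), over an explicit table. */
bool map_has_acpi(const struct e820_entry_sketch *map, size_t nr)
{
    assert(map != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_ACPI || map[i].type == E820_NVS)
            return true;
    return false;
}

/*
 * The cleanup Jan suggests: a PV DomU sees only RAM and reserved
 * entries. After this runs, map_has_acpi() is always false, so the
 * heuristic cannot tell an e820_hole=1 layout from a default one.
 */
void demote_acpi_to_reserved(struct e820_entry_sketch *map, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_ACPI || map[i].type == E820_NVS)
            map[i].type = E820_RESERVED;
}
```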




Re: How to use the generic thermal sysfs.

2012-07-27 Thread Jean Delvare

On Fri, 27 Jul 2012 10:58:21 +0800, Wei Ni wrote:
> On Fri, 2012-07-27 at 09:21 +0800, Zhang Rui wrote:
> > is it possible to program the sensor at this time, in your own thermal
> > driver?
> 
> Since we are using the generic thermal driver lm90.c, I'm not sure if we
> could program these limits in the generic driver, I think it's better to
> have a generic interface to set the limits, so I wish to add a
> callback .set_limits() in the generic thermal framework.

I can confirm that hwmon drivers do not set limits, it is up to
user-space to do it if they want. So if there is a need to do so in the
kernel itself, a proper interface at the generic thermal framework
level seems appropriate.
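A minimal sketch of what such a framework-level hook could look like, assuming a hypothetical `set_limits()` member in a driver ops table (this is not an existing thermal or hwmon API). Temperatures are in millidegrees Celsius, as elsewhere in the thermal framework; `fake_sensor` stands in for an lm90-style driver:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table: the framework asks the driver to program limits. */
struct sensor_limit_ops {
    int (*set_limits)(void *drvdata, int low_mc, int high_mc);
};

/* Framework-side wrapper: validate the window, then delegate. */
int thermal_set_limits(const struct sensor_limit_ops *ops, void *drvdata,
                       int low_mc, int high_mc)
{
    if (ops == NULL || ops->set_limits == NULL)
        return -EOPNOTSUPP;     /* driver cannot program limits */
    if (low_mc > high_mc)
        return -EINVAL;
    return ops->set_limits(drvdata, low_mc, high_mc);
}

/* A fake sensor driver that just records the programmed window. */
struct fake_sensor {
    int low_mc;
    int high_mc;
};

int fake_set_limits(void *drvdata, int low_mc, int high_mc)
{
    struct fake_sensor *s = drvdata;
    assert(s != NULL);
    s->low_mc = low_mc;
    s->high_mc = high_mc;
    return 0;
}
```

Keeping validation in the framework wrapper means each sensor driver only has to translate the window into its own register writes.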

-- 
Jean Delvare

arm interrupt handling

2012-07-27 Thread Qipeng Zha

Hi
While studying the interrupt-handling code in 2.6.39 for the OMAP SoC, I found
that CPSR.I is not cleared to re-enable IRQs until each ISR has finished.
Is this true, or am I missing something? It seems weird that the core would
not service any other IRQ until the current IRQ handling completes.


Best wishes
Qipeng


-----Original Message-----
From: linux-arm-kernel-boun...@lists.infradead.org [mailto:linux-arm-kernel-boun...@lists.infradead.org] On Behalf Of "Andy Green (林安廸)"
Sent: 10 July 2012 20:59
To: Florian Fainelli
Cc: s-...@ti.com; a...@arndb.de; patc...@linaro.org; t...@atomide.com; net...@vger.kernel.org; linux-kernel@vger.kernel.org; rost...@goodmis.org; linux-o...@vger.kernel.org; linux-arm-ker...@lists.infradead.org
Subject: Re: [PATCH 4 0/4] Add ability to set defaultless network device MAC addresses to deterministic computed locally administered values

On 10/07/12 20:37, the mail apparently from Florian Fainelli included:

Hi -

> On Thursday 05 July 2012 04:44:33, Andy Green wrote:
>> The following series adds some code to generate legal, locally administered
>> MAC addresses from OMAP4 CPU Die ID fuse data, and then adds a helper at
>> net/ethernet taking care of accepting device path / MAC mapping
>> registrations and running a notifier to enforce the requested MAC when the
>> matching network device turns up.
>
> This looks like something you can solve by user-space entirely. Expose the

That might seem so from an OpenWrt perspective, where you custom-cook the
whole userland per device, but it ain't so from a generic rootfs
perspective.

Why should Ubuntu, Fedora etc stink up their OSes with Panda-specific 
workarounds?  And Panda is not the only device with this issue.

-Andy
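The core idea of the quoted series, deriving a stable, locally administered unicast MAC from per-chip fuse data, can be sketched in C as follows. The FNV-1a hash is a hypothetical stand-in for whatever derivation the series actually uses; only the two address bits in the first octet follow the IEEE 802 convention:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Derive a stable MAC from per-chip ID bytes (e.g. die ID fuses).
 * The same ID always yields the same address.
 */
void mac_from_chip_id(const uint8_t *id, size_t id_len, uint8_t mac[6])
{
    assert(id != NULL || id_len == 0);

    uint64_t h = 0xcbf29ce484222325ULL;          /* FNV-1a offset basis */
    for (size_t i = 0; i < id_len; i++) {
        h ^= id[i];
        h *= 0x100000001b3ULL;                   /* FNV-1a prime */
    }
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)(h >> (8 * i));

    /* First octet: bit 0 clear = unicast, bit 1 set = locally administered. */
    mac[0] = (uint8_t)((mac[0] & ~0x01u) | 0x02u);
}
```

Because the result depends only on the fuse data, every boot of the same board gets the same address without any board-specific user-space code, which is the property being argued for.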

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Re: [Xen-devel] [RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-27 Thread Jan Beulich

>>> On 26.07.12 at 22:47, Konrad Rzeszutek Wilk  wrote:
>  2). Allocate a new array, copy the existing P2M into it,
> revector the P2M tree to use that, and return the old
> P2M to the memory allocator. This has the advantage that
> it sets the stage for using XEN_ELF_NOTE_INIT_P2M
> feature. That feature allows us to set the exact virtual
> address space we want for the P2M - and allows us to
> boot as initial domain on large machines.

And I would hope that the tools would get updated to recognize
this note too, so that huge DomU-s would become possible as
well.

Jan


Re: [Xen-devel] [PATCH 1/7] xen/mmu: use copy_page instead of memcpy.

2012-07-27 Thread Jan Beulich

>>> On 26.07.12 at 22:47, Konrad Rzeszutek Wilk  wrote:
> After all, this is what it is there for.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Jan Beulich 

> ---
>  arch/x86/xen/mmu.c |   13 ++---
>  1 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 6ba6100..7247e5a 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1754,14 +1754,14 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
> unsigned long max_pfn)
>* it will be also modified in the __ka space! (But if you just
>* modify the PMD table to point to other PTE's or none, then you
>* are OK - which is what cleanup_highmap does) */
> - memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
> + copy_page(level2_ident_pgt, l2);
>   /* Graft it onto L4[511][511] */
> - memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
> + copy_page(level2_kernel_pgt, l2);
>  
>   /* Get [511][510] and graft that in level2_fixmap_pgt */
>   l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
>   l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
> - memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
> + copy_page(level2_fixmap_pgt, l2);
>   /* Note that we don't do anything with level1_fixmap_pgt which
>* we don't need. */
>  
> @@ -1821,8 +1821,7 @@ static void __init xen_write_cr3_init(unsigned long 
> cr3)
>*/
>   swapper_kernel_pmd =
>   extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
> - memcpy(swapper_kernel_pmd, initial_kernel_pmd,
> -sizeof(pmd_t) * PTRS_PER_PMD);
> + copy_page(swapper_kernel_pmd, initial_kernel_pmd);
>   swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
>   __pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
>   set_page_prot(swapper_kernel_pmd, PAGE_KERNEL_RO);
> @@ -1851,11 +1850,11 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
> unsigned long max_pfn)
> 512*1024);
>  
>   kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
> - memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
> + copy_page(initial_kernel_pmd, kernel_pmd);
>  
>   xen_map_identity_early(initial_kernel_pmd, max_pfn);
>  
> - memcpy(initial_page_table, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
> + copy_page(initial_page_table, pgd);
>   initial_page_table[KERNEL_PGD_BOUNDARY] =
>   __pgd(__pa(initial_kernel_pmd) | _PAGE_PRESENT);
>  
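Why the substitution is length-safe on x86-64: a PMD table holds PTRS_PER_PMD = 512 entries of 8 bytes each, so the old memcpy() length is 512 x 8 = 4096 bytes, exactly one page. The sketch mirrors those constants locally (an assumption; it does not include kernel headers) and uses a hypothetical `sketch_copy_page()` in place of the kernel's copy_page():

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* x86-64 4-level paging parameters, mirrored locally for the sketch. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_PTRS_PER_PMD 512UL
typedef struct { uint64_t pmd; } sketch_pmd_t;   /* one 8-byte entry */

/* The old memcpy() length is exactly one page. */
_Static_assert(sizeof(sketch_pmd_t) * SKETCH_PTRS_PER_PMD == SKETCH_PAGE_SIZE,
               "a PMD table fills exactly one page");

/* Stand-in for copy_page(): copy one whole page-sized table. */
void sketch_copy_page(void *to, const void *from)
{
    assert(to != NULL && from != NULL);
    memcpy(to, from, SKETCH_PAGE_SIZE);
}
```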



[ANNOUNCE] util-linux v2.22-rc1

2012-07-27 Thread Karel Zak


The util-linux release v2.22-rc1 is available at

   ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.22/

Feedback and bug reports, as always, are welcomed.

Karel


Util-linux 2.22 Release Notes
=============================

 The cryptoloop support in the mount(8) and losetup(8) commands is DEPRECATED.
 This is the last release in which the encryption= mount option and the
 -e, -E, --encryption losetup options are supported.

Release highlights
------------------

mount(8), umount(8), swapon(8), blkid(8) and findmnt(8):
  - support PARTUUID= and PARTLABEL= tags to specify block devices by partition
UUID or LABEL (for example, for UEFI GPT). These tags are filesystem-independent
and provide persistent configuration (your /etc/fstab settings will not be
affected by mkfs/mkswap changes).
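With these tags, the first /etc/fstab field names a lookup key rather than a device path. A hypothetical C sketch of splitting such a field into tag and value; the real tools resolve tags through libblkid, which this does not attempt:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: split an fstab source such as
 * "PARTUUID=1234-abcd" into tag name and value. Returns false for
 * plain paths (e.g. /dev/sda1) and for unknown tags.
 */
bool split_source_tag(const char *spec, char *name, size_t name_len,
                      const char **value)
{
    static const char *const known[] = {
        "UUID", "LABEL", "PARTUUID", "PARTLABEL"
    };
    assert(spec != NULL && name != NULL && value != NULL);

    const char *eq = strchr(spec, '=');
    if (eq == NULL)
        return false;

    size_t len = (size_t)(eq - spec);
    if (len >= name_len)
        return false;
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strlen(known[i]) == len && strncmp(spec, known[i], len) == 0) {
            memcpy(name, spec, len);
            name[len] = '\0';
            *value = eq + 1;
            return true;
        }
    }
    return false;
}
```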

dmesg(1):
  - reads kernel messages from /dev/kmsg on kernel 3.5 and newer
  - supports new option --follow to wait for new messages (kernel 3.5 required)
  - supports new option --reltime to print human-readable deltas

su(1):
  - has been merged from coreutils into util-linux

sulogin(8):
  - has been merged from sysvinit into util-linux

utmpdump(1):
  - has been merged from sysvinit into util-linux

eject(1):
  - has been merged into util-linux from its inactive upstream (sf.net) and
from Fedora
  - supports new options --manualeject, --force and --no-partitions-unmount

lslocks(1):
  - this NEW COMMAND prints local system locks and is a replacement for the
long-unmaintained lslk(1)
 
wdctl(8):
  - this NEW COMMAND shows hardware watchdog status

mount(8):
  - pure libmount-based mount(8) and umount(8) commands are ENABLED BY DEFAULT
  - the old mount(8) and umount(8) implementation is DEPRECATED
  - the hybrid mount(8) [old mount linked with libmount] is no longer supported
  - supports new command line options --source and --target to avoid ambiguous
interpretation if only one argument is given

swapon(8):
  - supports new option --show to print information about swap areas in a
user-definable format

findmnt(8):
  - supports new option --task  to print private task mount table
  - supports new option --df to imitate df(1)

fdisk(8):
  - does not print geometry in 'p'rint output in non-DOS mode

libuuid:
  - does NOT EXECUTE uuidd on demand; the daemon has to be started by
init scripts / systemd

uuidd:
  - supports socket activation (for systemd)
  - supports new options --no-fork, --no-pid and --socket-activation

flock(1):
  - supports new option --conflict-exit-code to specify return code
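A sketch of the situation this option covers, assuming nonblocking mode (-n): try to take an exclusive flock(2) lock and, if another open file description already holds it, report a caller-chosen code instead of the generic failure status. `try_lock_or_code()` is a hypothetical helper, not flock(1)'s source:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 0 and stores the locked fd in *fd_out on success,
 * conflict_code if the lock is held elsewhere, 1 on other errors.
 */
int try_lock_or_code(const char *path, int conflict_code, int *fd_out)
{
    assert(path != NULL && fd_out != NULL);
    int fd = open(path, O_RDONLY | O_CREAT, 0644);
    if (fd < 0)
        return 1;                       /* generic failure */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        *fd_out = fd;                   /* caller keeps the lock */
        return 0;
    }
    int err = errno;
    close(fd);
    return err == EWOULDBLOCK ? conflict_code : 1;
}
```

Distinguishing "lock busy" from "could not open the lock file" is the point: a cron job can then treat the first case as a harmless skip.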

fsck(8):
  - supports new option -r to report memory and runtime statistics

lsblk(8):
  - supports inverse trees (new option -s) 

losetup(8):
  - supports option --detach-all to detach all loop devices


build-system changes:
  - login(1) enabled by default (see --disable-login)
  - partx(8) enabled by default (see --disable-partx)
  - new non-recursive build-system
  

Stable maintenance releases between v2.21 and v2.22
---------------------------------------------------

util-linux 2.21.1 [30-Mar-2012]

 * ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.21/v2.21.1-ReleaseNotes
   ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.21/v2.21.1-ChangeLog

util-linux 2.21.2 [25-May-2012]

 * ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.21/v2.21.2-ReleaseNotes
   ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.21/v2.21.2-ChangeLog


Changes between v2.21 and v2.22
-------------------------------

 For more details see ChangeLog files at:
 ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.22/


addpart:
   - align with util-linux coding standards  [Sami Kerola]
   - improve error messages  [Karel Zak]
agetty:
   - close tty before vhangup()  [Karel Zak]
   - make tcsetpgrp() optional  [Karel Zak]
   - more robust debug() macro, check ioctl result [coverity scan]  [Karel Zak]
   - move vc initialization to ttyutils.h  [Karel Zak]
   - remove unnecessary sleep(10)  [Mantas Mikulėnas]
   - use configured run state directory  [Sami Kerola]
arch, eject, elvtune:
   - Gracefully disable on non-Linux platforms.  [Thomas Schwinge]
blkdev:
   - add blkdev_scsi_type_to_name()  [Sami Kerola]
blkid:
   - add DEVNAME= to export output format  [Karel Zak]
   - add docs about PARTUUID= and PARTLABEL=  [Karel Zak]
   - add note about variable tags and devices order.  [Karel Zak]
   - fix realloc memory leak [cppcheck]  [Sami Kerola]
   - fix shadow declaration  [Sami Kerola]
   - introduce symbolic names for different blkid exit codes  [Petr Uzel]
   - stop device probing if error is detected  [Petr Uzel]
   - use err_exclusive_options()  [Karel Zak]
   - use exclusive_option()  [Sami Kerola]
   - use get_terminal_width() from ttyutils.h  [Petr Uzel]
   - use strtosize_or_err()  [Karel Zak]
   - use symbolic exit code  [Petr Uzel]
build:
   - fix redundant redeclaration warnings  [Sami Kerola]
   - fix unused parameter warnings  [Sami Kerola]
build-sys:
   - add --disable-sulogin (enabled by default)  [Karel Zak]
   - add --disab

Re: How to use the generic thermal sysfs.

2012-07-27 Thread Zhang Rui

On Fri, 2012-07-27 at 09:30 +0200, Jean Delvare wrote:
> On Fri, 27 Jul 2012 10:58:21 +0800, Wei Ni wrote:
> > On Fri, 2012-07-27 at 09:21 +0800, Zhang Rui wrote:
> > > is it possible to program the sensor at this time, in your own thermal
> > > driver?
> > 
> > Since we are using the generic thermal driver lm90.c, I'm not sure if we
> > could program these limits in the generic driver, I think it's better to
> > have a generic interface to set the limits, so I wish to add a
> > callback .set_limits() in the generic thermal framework.
> 
> I can confirm that hwmon drivers do not set limits, it is up to
> user-space to do it if they want. So if there is a need to do so in the
> kernel itself, a proper interface at the generic thermal framework
> level seems appropriate.
> 
oh, setting limits from userspace?
I think you can program the sensor when writing the trip point?
with this patch,
http://marc.info/?l=linux-acpi&m=134318814620429&w=2

thanks,
rui



[PATCH] x86: don't ever patch back to UP if we unplug cpus.

2012-07-27 Thread Rusty Russell

Paul McKenney points out:

 mean offline overhead is 6251/48=130.2 milliseconds.

 If I remove the alternatives_smp_switch() from the offline
 path [...] the mean offline overhead is 550/42=13.1 milliseconds

Basically, we're never going to get those 120ms back, and the code is
pretty messy.

We get rid of:
1) The "smp-alt-once" boot option.  It's actually "smp-alt-boot", the
   documentation is wrong.  It's now the default.
2) The skip_smp_alternatives flag used by suspend.
3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
   which were only used to set this one flag.

Signed-off-by: Rusty Russell 
---
 Documentation/kernel-parameters.txt |3 -
 arch/x86/include/asm/alternative.h  |4 -
 arch/x86/kernel/alternative.c   |  104 +++-
 arch/x86/kernel/smpboot.c   |   20 --
 kernel/cpu.c|   11 ---
 5 files changed, 27 insertions(+), 115 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2638,9 +2638,6 @@ bytes respectively. Such letter suffixes
smart2= [HW]
Format: [,[,...,]]
 
-   smp-alt-once[X86-32,SMP] On a hotplug CPU system, only
-   attempt to substitute SMP alternatives once at boot.
-
smsc-ircc2.nopnp[HW] Don't use PNP to discover SMC devices
smsc-ircc2.ircc_cfg=[HW] Device configuration I/O port
smsc-ircc2.ircc_sir=[HW] SIR base I/O port
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -60,7 +60,7 @@ extern void alternatives_smp_module_add(
void *locks, void *locks_end,
void *text, void *text_end);
 extern void alternatives_smp_module_del(struct module *mod);
-extern void alternatives_smp_switch(int smp);
+extern void alternatives_enable_smp(void);
 extern int alternatives_text_reserved(void *start, void *end);
 extern bool skip_smp_alternatives;
 #else
@@ -68,7 +68,7 @@ static inline void alternatives_smp_modu
   void *locks, void *locks_end,
   void *text, void *text_end) {}
 static inline void alternatives_smp_module_del(struct module *mod) {}
-static inline void alternatives_smp_switch(int smp) {}
+static inline void alternatives_enable_smp(void) {}
 static inline int alternatives_text_reserved(void *start, void *end)
 {
return 0;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -23,19 +23,6 @@
 
 #define MAX_PATCH_LEN (255-1)
 
-#ifdef CONFIG_HOTPLUG_CPU
-static int smp_alt_once;
-
-static int __init bootonly(char *str)
-{
-   smp_alt_once = 1;
-   return 1;
-}
-__setup("smp-alt-boot", bootonly);
-#else
-#define smp_alt_once 1
-#endif
-
 static int __initdata_or_module debug_alternative;
 
 static int __init debug_alt(char *str)
@@ -326,9 +313,6 @@ static void alternatives_smp_unlock(cons
 {
const s32 *poff;
 
-   if (noreplace_smp)
-   return;
-
mutex_lock(&text_mutex);
for (poff = start; poff < end; poff++) {
u8 *ptr = (u8 *)poff + *poff;
@@ -359,7 +343,7 @@ struct smp_alt_module {
 };
 static LIST_HEAD(smp_alt_modules);
 static DEFINE_MUTEX(smp_alt);
-static int smp_mode = 1;   /* protected by smp_alt */
+static bool uniproc_patched = false;   /* protected by smp_alt */
 
 void __init_or_module alternatives_smp_module_add(struct module *mod,
  char *name,
@@ -368,19 +352,18 @@ void __init_or_module alternatives_smp_m
 {
struct smp_alt_module *smp;
 
-   if (noreplace_smp)
-   return;
+   mutex_lock(&smp_alt);
+   if (!uniproc_patched)
+   goto unlock;
 
-   if (smp_alt_once) {
-   if (boot_cpu_has(X86_FEATURE_UP))
-   alternatives_smp_unlock(locks, locks_end,
-   text, text_end);
-   return;
-   }
+   if (num_possible_cpus() == 1)
+   /* Don't bother remembering, we'll never have to undo it. */
+   goto smp_unlock;
 
smp = kzalloc(sizeof(*smp), GFP_KERNEL);
if (NULL == smp)
-   return; /* we'll run the (safe but slow) SMP code then ... */
+   /* we'll run the (safe but slow) SMP code then ... */
+   goto unlock;
 
smp->mod= mod;
smp->name   = name;
@@ -392,11 +375,10 @@ void __init_or_module alternatives_smp_m
__func__, smp->locks, smp->locks_end,
smp->text, smp->text_end

Re: [RFC PATCH 0/6] CPU hotplug: Reverse invocation of notifiers during CPU hotplug

2012-07-27 Thread Rusty Russell

On Wed, 25 Jul 2012 18:30:41 +0200 (CEST), Thomas Gleixner wrote:
> The problem with the current notifiers is, that we only have ordering
> for a few specific callbacks, but we don't have the faintest idea in
> which order all other random stuff is brought up and torn down.
> 
> So I started experimenting with the following:
> 
> struct hotplug_event {
>int (*bring_up)(unsigned int cpu);
>int (*tear_down)(unsigned int cpu);
> };
> 
> enum hotplug_events {
>  CPU_HOTPLUG_START,
>  CPU_HOTPLUG_CREATE_THREADS,
>  CPU_HOTPLUG_INIT_TIMERS,
>  ...
>  CPU_HOTPLUG_KICK_CPU,
>  ...
>  CPU_HOTPLUG_START_THREADS,
>  ...
>  CPU_HOTPLUG_SET_ONLINE,
>  ...
>  CPU_HOTPLUG_MAX_EVENTS,
> };

This looks awfully like a hardcoded list of calls, without the
readability :)

OK, I finally got off my ass and looked at the different users of cpu
hotplug.  Some are just doing crazy stuff, but most seem to fall into
two types:

1) Hardware-style cpu callbacks (CPU_UP_PREPARE & CPU_DEAD)
2) Live cpu callbacks (CPU_ONLINE & CPU_DOWN_PREPARE)

I think this is what Srivatsa was referring to with "physical" and
"logical" parts.  Maybe we should explicitly split them, with the idea
that we'd automatically call the other one if we hit an error.

struct cpu_hotplug_physical {
   int (*coming)(unsigned int cpu);
   void (*gone)(unsigned int cpu);
};

struct cpu_hotplug_logical {
   void (*arrived)(unsigned int cpu);
   int (*going)(unsigned int cpu);
};
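To make the "automatically call the other one if we hit an error" idea concrete, here is a minimal sketch of the physical half only. It assumes the registered callbacks sit in a plain array in bring-up order and that a nonzero return from ->coming() means failure; `physical_bring_up()` is a hypothetical helper, not existing kernel code.

```c
#include <stddef.h>

struct cpu_hotplug_physical {
	int (*coming)(unsigned int cpu);
	void (*gone)(unsigned int cpu);
};

/*
 * Hypothetical helper: run every ->coming() in registration order.
 * If one fails, call ->gone() on the callbacks that already succeeded,
 * newest first, and return the error.
 */
static int physical_bring_up(const struct cpu_hotplug_physical *cbs,
			     size_t n, unsigned int cpu)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int err = cbs[i].coming(cpu);

		if (err) {
			/* undo entries i-1 .. 0; the failing one is not undone */
			while (i-- > 0)
				cbs[i].gone(cpu);
			return err;
		}
	}
	return 0;
}
```

Undoing newest-first gives the reverse-order teardown Thomas was after without a hand-maintained enum; a logical counterpart would presumably roll back a failed ->going() by calling ->arrived() on the callbacks already taken down.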

Several of the live cpu callbacks seem racy to me, since we could be
running userspace tasks before CPU_ONLINE.  It'd be nice to fix this,
too.

Anyway, if we get a model which fits 90%, we can always open-code the
tricky ones.

Cheers,
Rusty.

Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-27 Thread Rusty Russell

On Fri, 13 Jul 2012 16:38:51 +0800, Asias He  wrote:
> This patch introduces bio-based IO path for virtio-blk.

Acked-by: Rusty Russell 

I just hope we can do better than a module option in future.

Thanks,
Rusty.

Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-27 Thread Rusty Russell

On Thu, 26 Jul 2012 15:05:39 +0200, Paolo Bonzini  wrote:
> On 26/07/2012 09:58, Paolo Bonzini wrote:
> > 
> >> > Please CC me on the "convert to sg copy-less" patches, it looks
> >> > interesting
> > Sure.
> 
> Well, here is the gist of it (note it won't apply on any public tree,
> hence no SoB yet).  It should be split in multiple changesets and you
> can make more simplifications on top of it, because
> virtio_scsi_target_state is not anymore variable-sized, but that's
> secondary.

ISTR starting on such a patch years ago, but the primitives to
manipulate a chained sg_list were nonexistent, so I dropped it,
waiting for it to be fully-baked or replaced.  That hasn't happened:

> + /* Remove scatterlist terminator, we will tack more items soon.  */
> + vblk->sg[num + out - 1].page_link &= ~0x2;

I hate this interface:

> +int virtqueue_add_buf_sg(struct virtqueue *_vq,
> +  struct scatterlist *sg_out,
> +  unsigned int out,
> +  struct scatterlist *sg_in,
> +  unsigned int in,
> +  void *data,
> +  gfp_t gfp)

The point of chained scatterlists is they're self-terminated, so the
in & out counts should be calculated.

Counting them is not *that* bad, since we're about to read them all
anyway.

(Yes, the chained scatterlist stuff is complete crack, but I lost that
debate years ago.)
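Rusty's point that a self-terminated chain lets the in/out counts be computed can be illustrated with a user-space model. The flag values mirror the low `page_link` bits Linux uses (0x01 marks a chain entry, 0x02 the end marker, the same bit the quoted patch clears with `&= ~0x2`), but `struct mock_sg`, `mock_sg_next()` and `mock_sg_count()` are hypothetical stand-ins, not `<linux/scatterlist.h>`.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_SG_CHAIN ((uintptr_t)0x01)	/* entry points at the next array */
#define MOCK_SG_END   ((uintptr_t)0x02)	/* last entry of the whole list */

struct mock_sg {
	uintptr_t page_link;	/* low 2 bits: flags above */
	unsigned int length;
};

/* Same walk as the kernel's sg_next(): stop at the end marker,
 * and hop over chain entries to the array they point at. */
static struct mock_sg *mock_sg_next(struct mock_sg *sg)
{
	if (sg->page_link & MOCK_SG_END)
		return NULL;
	sg++;
	if (sg->page_link & MOCK_SG_CHAIN)
		sg = (struct mock_sg *)(sg->page_link &
					~(MOCK_SG_CHAIN | MOCK_SG_END));
	return sg;
}

/* Count data entries; chain entries themselves are not counted. */
static unsigned int mock_sg_count(struct mock_sg *sg)
{
	unsigned int n = 0;

	for (; sg; sg = mock_sg_next(sg))
		n++;
	return n;
}
```

The count costs one pass over entries that virtqueue_add_buf() has to visit anyway, which is the trade-off Rusty describes.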

Here's my variant.  Networking, console and block seem OK, at least
(ie. it booted!).

From: Rusty Russell 
Subject: virtio: use chained scatterlists.

Rather than handing a scatterlist[] and out and in numbers to
virtqueue_add_buf(), hand two separate ones which can be chained.

I shall refrain from ranting about what a disgusting hack chained
scatterlists are.  I'll just note that this doesn't make things
simpler (see diff).

The scatterlists we use can be too large for the stack, so we put them
in our device struct and reuse them.  But in many cases we don't want
to pay the cost of sg_init_table() as we don't know how many elements
we'll have and we'd have to initialize the entire table.

This means we have two choices: carefully reset the end markers after
we call virtqueue_add_buf(), which we do in virtio_net for the xmit
path where it's easy and we want to be optimal.  Elsewhere we
implement a helper to unset the end markers after we've filled the
array.

Signed-off-by: Rusty Russell 
---
 drivers/block/virtio_blk.c  |   37 +-
 drivers/char/hw_random/virtio-rng.c |2 -
 drivers/char/virtio_console.c   |6 +--
 drivers/net/virtio_net.c|   67 ++---
 drivers/virtio/virtio_balloon.c |6 +--
 drivers/virtio/virtio_ring.c|   71 ++--
 include/linux/virtio.h  |5 +-
 net/9p/trans_virtio.c   |   46 +--
 8 files changed, 159 insertions(+), 81 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -100,6 +100,14 @@ static void blk_done(struct virtqueue *v
spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }

+static void sg_unset_end_markers(struct scatterlist *sg, unsigned int num)
+{
+   unsigned int i;
+
+   for (i = 0; i < num; i++)
+   sg[i].page_link &= ~0x02;
+}
+
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
   struct request *req)
 {
@@ -140,6 +148,7 @@ static bool do_req(struct request_queue 
}
}

+   /* We lay out our scatterlist in a single array, out then in. */
sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));

/*
@@ -151,17 +160,8 @@ static bool do_req(struct request_queue 
if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC)
sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);

+   /* This marks the end of the sg list at vblk->sg[out]. */
num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
-
-   if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
-   sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
-  sizeof(vbr->in_hdr));
-   }
-
-   sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
-  sizeof(vbr->status));
-
if (num) {
if (rq_data_dir(vbr->req) == WRITE) {
vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
@@ -172,7 +172,22 @@ static bool do_req(struct request_queue 
}
}

-   if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
+   if (vbr->req->cmd_type == REQ_TYPE_BLOCK_PC) {
+   sg_set_buf(&vblk->sg[out + in++], vbr->req->sense,
+  SCSI_SENSE_BUFFERSIZE);
+
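The reuse problem in Rusty's commit message (a table that is not re-initialized with sg_init_table() can still carry an end marker from an earlier, shorter request) can be modelled in user space. `mock_unset_end_markers()` copies the loop of the sg_unset_end_markers() helper in the patch; `struct mock_sg` and `mock_flat_len()` are hypothetical, and only flat (unchained) arrays are modelled.

```c
#define MOCK_SG_END 0x02UL	/* same bit as the kernel's end marker */

struct mock_sg {
	unsigned long page_link;
};

/* Same loop as sg_unset_end_markers() in the virtio_blk patch. */
static void mock_unset_end_markers(struct mock_sg *sg, unsigned int num)
{
	unsigned int i;

	for (i = 0; i < num; i++)
		sg[i].page_link &= ~MOCK_SG_END;
}

/* Entries a walker would see in a flat array: it stops at the first
 * end marker, or after max entries if there is none. */
static unsigned int mock_flat_len(const struct mock_sg *sg, unsigned int max)
{
	unsigned int n;

	for (n = 0; n < max; n++)
		if (sg[n].page_link & MOCK_SG_END)
			return n + 1;
	return max;
}
```

Without the unset step, a stale marker at index 1 silently truncates a later 4-entry request to 2 entries; clearing only the entries actually used keeps the cost proportional to the request size rather than the table size.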

[RFC PATCH] fs/direct-io.c: Set bi_rw when alloc bio.

2012-07-27 Thread majianpeng

When bio_alloc() is executed, bi_rw is zero, but bio_add_page() later
uses bi_rw. For example, __bio_add_page() calls merge_bvec_fn(), and
the merge_bvec_fn of raid456 uses bi_rw to decide whether to merge:
>> if ((bvm->bi_rw & 1) == WRITE)
>>  return biovec->bv_len; /* always allow writes to be mergeable */

Signed-off-by: Jianpeng Ma 

There are many places like this in the kernel. If you think this patch
is OK, I will correct those.
---
 fs/direct-io.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 1faf4cb..77f0bbf 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -349,6 +349,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 
bio->bi_bdev = bdev;
bio->bi_sector = first_sector;
+   bio->bi_rw = dio->rw;
if (dio->is_async)
bio->bi_end_io = dio_bio_end_aio;
else
-- 
1.7.5.4
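Why a zero bi_rw matters can be seen with a simplified model of the raid456 check quoted in the commit message. Writes are always mergeable there; for reads the real function caps the merge at what fits in the current chunk, which this sketch reduces to a caller-supplied `read_room`. `MOCK_WRITE` assumes the kernel's WRITE value of 1, and the struct and function names are hypothetical.

```c
#define MOCK_WRITE 1UL	/* assumption: mirrors WRITE == 1 in bi_rw */

struct mock_bvec_merge_data {
	unsigned long bi_rw;
};

/* Simplified raid456-style merge check: writes are always mergeable,
 * reads are limited to read_room bytes. */
static unsigned int mock_merge_bvec(const struct mock_bvec_merge_data *bvm,
				    unsigned int bv_len, unsigned int read_room)
{
	if ((bvm->bi_rw & 1) == MOCK_WRITE)
		return bv_len;	/* always allow writes to be mergeable */
	return bv_len < read_room ? bv_len : read_room;
}
```

With bi_rw still zero, a direct-I/O write is treated like a read and its bvecs are limited; setting `bio->bi_rw = dio->rw` at allocation time lets the write branch apply.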

[RFC] PCI/PM: Add ABI document for sysfs file d3cold_allowed

2012-07-27 Thread Huang Ying

This patch adds ABI document for the following sysfs file:

/sys/bus/pci/devices/.../d3cold_allowed

Signed-off-by: Huang Ying 
---
 Documentation/ABI/testing/sysfs-bus-pci |   12 
 1 file changed, 12 insertions(+)

--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -210,3 +210,15 @@ Users:
firmware assigned instance number of the PCI
device that can help in understanding the firmware
intended order of the PCI device.
+
+What:  /sys/bus/pci/devices/.../d3cold_allowed
+Date:  July 2012
+Contact:   Huang Ying 
+Description:
+   d3cold_allowed is a bit that controls whether the corresponding
+   PCI device can be put into the D3Cold state.  If it is cleared,
+   the device will never be put into D3Cold.  If it is set, the
+   device may be put into D3Cold if all other requirements are also
+   satisfied.  Reading this attribute shows the current value of
+   the d3cold_allowed bit; writing this attribute sets the value of
+   the d3cold_allowed bit.
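A minimal user-space sketch of reading and writing the attribute described above. It assumes the attribute accepts "0"/"1" and reads back as a decimal integer; the caller supplies the full sysfs path for the device of interest, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Write "1" or "0" to a d3cold_allowed attribute; returns 0 on success. */
static int set_d3cold_allowed(const char *path, int allowed)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(allowed ? "1\n" : "0\n", f) >= 0;
	/* stdio buffers the data, so the actual write(2) happens at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value back; returns 0 or 1, or -1 on error. */
static int get_d3cold_allowed(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

Checking fclose() matters for sysfs because a rejected store is only reported when the buffered data is flushed.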

Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-27 Thread Paolo Bonzini

On 27/07/2012 08:27, Rusty Russell wrote:
>> > +int virtqueue_add_buf_sg(struct virtqueue *_vq,
>> > +   struct scatterlist *sg_out,
>> > +   unsigned int out,
>> > +   struct scatterlist *sg_in,
>> > +   unsigned int in,
>> > +   void *data,
>> > +   gfp_t gfp)
> The point of chained scatterlists is they're self-terminated, so the
> in & out counts should be calculated.
> 
> Counting them is not *that* bad, since we're about to read them all
> anyway.
> 
> (Yes, the chained scatterlist stuff is complete crack, but I lost that
> debate years ago.)
> 
> Here's my variant.  Networking, console and block seem OK, at least
> (ie. it booted!).

I hate the for loops, even though we are indeed about to read all the
scatterlists anyway... all they do is lengthen critical sections.  Also,
being the first user of chained scatterlists doesn't exactly give me warm
fuzzies.

I think it's simpler if we provide an API to add individual buffers to
the virtqueue, so that you can do multiple virtqueue_add_buf_more
(whatever) before kicking the virtqueue.  The idea is that I can still
use indirect buffers for the scatterlists that come from the block layer
or from an skb, but I will use direct buffers for the request/response
descriptors.  The direct buffers are always a small number (usually 2),
so you can balance the effect by making the virtqueue bigger.  And for
small reads and writes, you save a kmalloc on a very hot path.

(BTW, scatterlists will have separate entries for each page; we do not
need this in virtio buffers.  Collapsing physically-adjacent entries
will speed up QEMU and will also help avoid indirect buffers.)
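The collapsing Paolo mentions can be sketched as a single in-place pass over (address, length) pairs. `struct mock_buf` and `collapse_adjacent()` are hypothetical; the 32-bit length limit reflects the `len` field of a virtio ring descriptor, and the input is assumed to be in submission order.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_buf {
	uint64_t addr;	/* guest-physical address */
	uint32_t len;	/* virtio descriptors carry a 32-bit length */
};

/* Merge each entry into the previous one when it starts exactly where
 * the previous one ends. Works in place; returns the new count. */
static size_t collapse_adjacent(struct mock_buf *v, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++) {
		if (v[out].addr + v[out].len == v[i].addr &&
		    (uint64_t)v[out].len + v[i].len <= UINT32_MAX)
			v[out].len += v[i].len;	/* physically contiguous */
		else
			v[++out] = v[i];	/* start a new descriptor */
	}
	return out + 1;
}
```

Four page-sized entries backed by two contiguous regions become two descriptors, which is what lets a request fit in direct descriptors instead of needing an indirect table.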

Paolo




Re: [PATCH V2] dma: tegra: enable/disable dma clock

2012-07-27 Thread Laxman Dewangan


On Tuesday 24 July 2012 10:30 AM, Laxman Dewangan wrote:

On Tuesday 24 July 2012 10:38 AM, Vinod Koul wrote:

On Fri, 2012-07-20 at 13:31 +0530, Laxman Dewangan wrote:

Enable the DMA clock when allocating channel and
disable clock when freeing channels.

Signed-off-by: Laxman Dewangan
---
+   clk_disable_unprepare(tdma->dma_clk);

What if another channel is active? Disabling the clock could cause bad
behavior. You should check here whether all channels are idle before
disabling it, or is this handled by the clock API?

Yes, the clock driver keeps a reference count, so the client driver
need not take care of this.
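The reference-counting argument can be illustrated with a small model: the hardware is gated only when the last user disables the clock, so a channel freeing its reference cannot stop a clock another channel still needs. `struct mock_clk` and its helpers are hypothetical, not the Tegra clock driver, and an unbalanced disable is simply ignored here.

```c
#include <stdbool.h>

struct mock_clk {
	unsigned int enable_count;
	bool hw_on;
};

static void mock_clk_enable(struct mock_clk *c)
{
	if (c->enable_count++ == 0)
		c->hw_on = true;	/* first user: ungate the clock */
}

static void mock_clk_disable(struct mock_clk *c)
{
	if (c->enable_count == 0)
		return;			/* unbalanced disable: ignored in this model */
	if (--c->enable_count == 0)
		c->hw_on = false;	/* last user gone: gate the clock */
}
```

With one enable per allocated channel and one disable per freed channel, the clock stays on until the final channel is released.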


Hi Vinod,
Is there anything remaining from my side here?
Is it possible to make it for K3.6?

Thanks,
Laxman


Re: [PATCH 1/2] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up

2012-07-27 Thread Peter Zijlstra

On Fri, 2012-07-27 at 09:47 +0800, Alex Shi wrote:

> From 610515185d8a98c14c7c339c25381bc96cd99d93 Mon Sep 17 00:00:00 2001
> From: Alex Shi 
> Date: Thu, 26 Jul 2012 08:55:34 +0800
> Subject: [PATCH 1/3] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and
>  code clean up
> 
> Since the power saving code has been removed from sched, the code
> implementing it in this function is dead and even pollutes other logic:
> for example, 'want_sd' never has a chance to be set to 0, which removes
> the effect of SD_WAKE_AFFINE here.
> 
> So, clean up the obsolete code and some other unnecessary code.
> 
> Signed-off-by: Alex Shi 

I think your code leaves an unused definition of SD_PREFER_LOCAL around.

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-27 Thread Mel Gorman

On Thu, Jul 26, 2012 at 01:42:26PM -0400, Rik van Riel wrote:
> On 07/23/2012 12:04 AM, Hugh Dickins wrote:
> 
> >Please don't be upset if I say that I don't like either of your patches.
> >Mainly for obvious reasons - I don't like Mel's because anything with
> >trylock retries and nested spinlocks worries me before I can even start
> >to think about it; and I don't like Michal's for the same reason as Mel,
> >that it spreads more change around in common paths than we would like.
> 
> I have a naive question.
> 
> In huge_pmd_share, we protect ourselves by taking
> the mapping->i_mmap_mutex.
> 
> Is there any reason we could not take the i_mmap_mutex
> in the huge_pmd_unshare path?
> 

We do, in 3.4 at least - callers of __unmap_hugepage_range hold the
i_mmap_mutex. Locking changes in mmotm and there is a patch there that
needs to be reverted. What tree are you looking at?

-- 
Mel Gorman
SUSE Labs

Re: [PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables v2

2012-07-27 Thread Mel Gorman

On Thu, Jul 26, 2012 at 12:01:04PM -0400, Larry Woodman wrote:
> On 07/20/2012 09:49 AM, Mel Gorman wrote:
> >+retry:
> > mutex_lock(&mapping->i_mmap_mutex);
> > vma_prio_tree_foreach(svma,&iter,&mapping->i_mmap, idx, idx) {
> > if (svma == vma)
> > continue;
> >+if (svma->vm_mm == vma->vm_mm)
> >+continue;
> >+
> >+/*
> >+ * The target mm could be in the process of tearing down
> >+ * its page tables and the i_mmap_mutex on its own is
> >+ * not sufficient. To prevent races against teardown and
> >+ * pagetable updates, we acquire the mmap_sem and pagetable
> >+ * lock of the remote address space. down_read_trylock()
> >+ * is necessary as the other process could also be trying
> >+ * to share pagetables with the current mm. In the fork
> >+ * case, we are already both mm's so check for that
> >+ */
> >+if (locked_mm != svma->vm_mm) {
> >+if (!down_read_trylock(&svma->vm_mm->mmap_sem)) {
> >+mutex_unlock(&mapping->i_mmap_mutex);
> >+goto retry;
> >+}
> >+smmap_sem =&svma->vm_mm->mmap_sem;
> >+}
> >+
> >+spage_table_lock =&svma->vm_mm->page_table_lock;
> >+spin_lock_nested(spage_table_lock, SINGLE_DEPTH_NESTING);
> >
> > saddr = page_table_shareable(svma, vma, addr, idx);
> > if (saddr) {
> 
> Hi Mel, FYI I tried this and ran into a problem.  When there are
> multiple processes in huge_pmd_share() just faulting in the same i_map,
> they all have their mmap_sem down for write so the
> down_read_trylock(&svma->vm_mm->mmap_sem) never succeeds.  What am I
> missing?
> 

Probably nothing; this version of the patch is flawed. In the final
(unreleased) version of this approach, it had to check whether it had
been retrying this trylock for too long and, if so, bail out and not
share the page tables. I've dropped this approach to the problem as
better alternatives exist.
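The "bail out after too long" fix Mel describes amounts to bounding the trylock retries and falling back to not sharing. A hypothetical sketch with the lock abstracted behind a callback, so it says nothing about the real mmap_sem API:

```c
#include <stdbool.h>

/*
 * Try to take the remote lock at most max_tries times. Returns true if
 * it was acquired; false means "give up and do not share the page
 * table", so the fault proceeds with a private one instead of spinning.
 */
static bool trylock_bounded(bool (*trylock)(void *arg), void *arg,
			    unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		if (trylock(arg))
			return true;
		/* in the quoted patch, i_mmap_mutex is dropped here before retrying */
	}
	return false;
}
```

In Larry's scenario the trylock never succeeds, so an unbounded `goto retry` loops forever, while the bounded version returns false after `max_tries` attempts.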

Thanks Larry!

-- 
Mel Gorman
SUSE Labs

Re: [PATCH v3 2/2] powerpc: Uprobes port to powerpc

2012-07-27 Thread Srikar Dronamraju

* Ananth N Mavinakayanahalli  [2012-07-26 10:50:29]:

> From: Ananth N Mavinakayanahalli 
> 
> This is the port of uprobes to powerpc. Usage is similar to x86.
> 
> [root@ ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
> Added new event:
>   probe_libc:malloc(on 0xb4860)
> 
> You can now use it in all perf tools, such as:
> 
>   perf record -e probe_libc:malloc -aR sleep 1
> 
> [root@ ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
> [ perf record: Woken up 22 times to write data ]
> [ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
> [root@ ~]# ./bin/perf report --stdio
> ...
> 
> # Samples: 83K of event 'probe_libc:malloc'
> # Event count (approx.): 83484
> #
> # Overhead       Command  Shared Object  Symbol
> # ........  ............  .............  ..........
> #
>     69.05%           tar  libc-2.12.so   [.] malloc
>     28.57%            rm  libc-2.12.so   [.] malloc
>      1.32%  avahi-daemon  libc-2.12.so   [.] malloc
>      0.58%          bash  libc-2.12.so   [.] malloc
>      0.28%          sshd  libc-2.12.so   [.] malloc
>      0.08%    irqbalance  libc-2.12.so   [.] malloc
>      0.05%         bzip2  libc-2.12.so   [.] malloc
>      0.04%         sleep  libc-2.12.so   [.] malloc
>      0.03%    multipathd  libc-2.12.so   [.] malloc
>      0.01%      sendmail  libc-2.12.so   [.] malloc
>      0.01%     automount  libc-2.12.so   [.] malloc
> 
> Patch applies on the current master branch of Linus' tree (bdc0077af).
> The trap_nr addition patch is a prereq.
> 
> Signed-off-by: Ananth N Mavinakayanahalli 

Acked-by: Srikar Dronamraju  
 


Re: [PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables v2

2012-07-27 Thread Mel Gorman

On Thu, Jul 26, 2012 at 05:00:28PM -0400, Rik van Riel wrote:
> On 07/20/2012 09:49 AM, Mel Gorman wrote:
> >This V2 is still the mmap_sem approach that fixes a potential deadlock
> >problem pointed out by Michal.
> 
> Larry and I were looking around the hugetlb code some
> more, and found what looks like yet another race.
> 
> In hugetlb_no_page, we have the following code:
> 
> 
> spin_lock(&mm->page_table_lock);
> size = i_size_read(mapping->host) >> huge_page_shift(h);
> if (idx >= size)
> goto backout;
> 
> ret = 0;
> if (!huge_pte_none(huge_ptep_get(ptep)))
> goto backout;
> 
> if (anon_rmap)
> hugepage_add_new_anon_rmap(page, vma, address);
> else
> page_dup_rmap(page);
> new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
> && (vma->vm_flags & VM_SHARED)));
> set_huge_pte_at(mm, address, ptep, new_pte);
>   ...
>   spin_unlock(&mm->page_table_lock);
> 
> Notice how we check !huge_pte_none with our own
> mm->page_table_lock held.
> 
> This offers no protection at all against other
> processes, that also hold their own page_table_lock.
> 

Yes, the page_table_lock is close to useless once shared page tables are
involved. It's why if we ever wanted to make shared page tables a core MM
thing we'd have to revisit how PTE locking at any level that can share
page tables works.

> In short, it looks like it is possible for multiple
> processes to go through the above code simultaneously,
> potentially resulting in:
> 
> 1) one process overwriting the pte just created by
>another process
> 
> 2) data corruption, as one partially written page
>    gets superseded by a newly zeroed page, but no
>    TLB invalidates get sent to other CPUs
> 
> 3) a memory leak of a huge page
> 
> Is there anything that would make this race impossible,
> or is this a real bug?
> 

In this case it all happens under the hugetlb instantiation mutex in
hugetlb_fault(). It's yet another reason why removing that mutex would
be a serious undertaking and the gain for doing so is marginal.
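Mel's answer is that the check-then-install sequence Rik quotes is serialized by one global mutex in hugetlb_fault(), not by each process's own page_table_lock. A user-space model of that, with threads standing in for faulting processes and `shared_pte` for an entry in a shared page table; all names are hypothetical.

```c
#include <pthread.h>

#define NTHREADS 8

/* Stands in for the single global hugetlb instantiation mutex. */
static pthread_mutex_t instantiation_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_pte;		/* 0 means "none" */
static int instantiations;	/* how many faulters installed a page */

static void *fault_path(void *arg)
{
	int id = (int)(long)arg;

	/* One lock shared by every faulter makes "check none, then install"
	 * atomic. A private per-thread lock (like each mm's own
	 * page_table_lock) would not stop two faulters interleaving here. */
	pthread_mutex_lock(&instantiation_mutex);
	if (shared_pte == 0) {
		shared_pte = id;
		instantiations++;
	}
	pthread_mutex_unlock(&instantiation_mutex);
	return NULL;
}

/* Returns the number of installs, or -1 if not every thread started. */
static int run_faults(void)
{
	pthread_t t[NTHREADS];
	long i, created = 0;

	for (i = 0; i < NTHREADS; i++) {
		if (pthread_create(&t[i], NULL, fault_path, (void *)(i + 1)) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(t[i], NULL);
	return created == NTHREADS ? instantiations : -1;
}
```

Exactly one thread installs the entry regardless of scheduling, which is the guarantee that would have to be rebuilt some other way if the global mutex were removed.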

-- 
Mel Gorman
SUSE Labs

Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace

2012-07-27 Thread Yoshihiro YUNOMAE


Hi Amit,

Thank you for commenting on our work.

(2012/07/26 20:35), Amit Shah wrote:

On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:


[...]



Therefore, we propose a new system "virtio-trace", which uses enhanced
virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
tracing data. In this system, there are 5 main components:
  (1) Ring-buffer of ftrace in a guest
  - When trace agent reads ring-buffer, a page is removed from ring-buffer.
  (2) Trace agent in the guest
  - Splice the page of ring-buffer to read_pipe using splice() without
memory copying. Then, the page is spliced from write_pipe to virtio
without memory copying.


I really like the splicing idea.


Thanks. We will improve this patch set.


  (3) Virtio-console driver in the guest
  - Pass the page to virtio-ring
  (4) Virtio-serial bus in QEMU
  - Copy the page to kernel pipe
  (5) Reader in the host
  - Read guest tracing data via FIFO(named pipe)
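The trace agent's zero-copy path in steps (1) and (2) rests on splice(2) through a pipe. A minimal sketch of that pattern between two arbitrary file descriptors; in the agent, `in_fd` would presumably be a per-CPU trace_pipe_raw file and `out_fd` the virtio-serial port, but those paths are not shown and `splice_through_pipe()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd via a pipe, using splice()
 * so pages are not copied through a user-space buffer. Returns the
 * bytes moved (stopping early at EOF or an input error), or -1 if
 * the pipe cannot be created or the output side fails. */
static ssize_t splice_through_pipe(int in_fd, int out_fd, size_t len)
{
	int p[2];
	ssize_t total = 0;

	if (pipe(p) < 0)
		return -1;
	while ((size_t)total < len) {
		ssize_t n = splice(in_fd, NULL, p[1], NULL, len - total,
				   SPLICE_F_MOVE);
		if (n <= 0)
			break;			/* EOF or error on input */
		for (ssize_t left = n; left > 0;) {
			ssize_t m = splice(p[0], NULL, out_fd, NULL, left,
					   SPLICE_F_MOVE);
			if (m <= 0) {
				total = -1;
				goto out;
			}
			left -= m;
		}
		total += n;
	}
out:
	close(p[0]);
	close(p[1]);
	return total;
}
```

The pipe is drained after each fill, so a single pipe's capacity bounds how much is in flight at once.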


So will this be useful only if guest and host run the same kernel?

I'd like to see the host kernel not being used at all -- collect all
relevant info from the guest and send it out to qemu, where it can be
consumed directly by apps driving the tracing.


No, this patch set is used only for guest kernels, so guest and host
don't need to run the same kernel.


***Evaluation***
When a host collects tracing data of a guest, the performance of
virtio-trace is compared with that of native (just running ftrace),
IVRing, and virtio-serial (the normal read/write method).


Why is tracing performance-sensitive?  i.e. why try to optimise this
at all?


To minimize the impact on applications in guests while a host collects
their tracing data.
For example, suppose guests A and B are running on a host and sharing an
I/O device. An I/O delay problem occurs in guest A, but guest B still
meets its requirements. In this case, we need to collect tracing data
from both guests A and B, but the usual network-based method puts a
heavy load on the applications of guest B even though guest B is
running normally. Therefore, we try to decrease the load on guests.
We also use this feature for performance analysis on production
virtualization systems.

[...]



***Just enhancement ideas***
  - Support for trace-cmd
  - Support for 9pfs protocol
  - Support for non-blocking mode in QEMU


There were patches long back (by me) to make chardevs non-blocking but
they didn't make it upstream.  Fedora carries them, if you want to try
out.  Though we want to converge on a reasonable solution that's
acceptable upstream as well.  Just that no one's working on it
currently.  Any help here will be appreciated.


Thanks! In this case, since a guest stops running while the host reads
its trace data, the char device needs a non-blocking mode. I'll read
your patch series. Is the latest version 8?
http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html


  - Make "vhost-serial"


I need to understand a) why it's perf-critical, and b) why should the
host be involved at all, to comment on these.


a) To decrease the collection overhead for applications on a guest
   (see above).
b) The host kernel's trace data is not involved, even if we introduce
   this patch set.

Thank you,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com



Re: [Xen-devel] [PATCH 02/24] xen/arm: hypercalls

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 17:56 +0100, David Vrabel wrote:
> On 26/07/12 16:33, Stefano Stabellini wrote:
> > 
> > + * The hvc ISS is required to be 0xEA1, that is the Xen specific ARM
> > + * hypercall tag.
> 
> Is this number, 0xea1, assigned to Xen by some external body?

The value and semantics of the hvc instruction's immediate operand are
entirely up to the hypervisor authors. We could have chosen 0 or some
random number; we went for the latter because it increases the chances,
by some tiny amount, that we won't clash with some other hypervisor's
ABI, which makes supporting "foreign" guests that bit easier should it
ever come to it.

IOW it's arbitrary in the same way that Linux system calls used to use
int 0x80.
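Since the tag is just the 16-bit immediate, it may help to see where it lands in the instruction word. A sketch based on the ARMv7 A32 encoding of HVC (condition AL: 1110 0001 0100 imm12 0111 imm4, with imm16 = imm12:imm4); the helper names are hypothetical, not Xen or Linux code.

```c
#include <stdint.h>

/* Encode "HVC #imm16" as an A32 instruction word (condition AL). */
static uint32_t arm_hvc_encode(uint16_t imm16)
{
	return 0xE1400070u | ((uint32_t)(imm16 & 0xFFF0u) << 4) |
	       (imm16 & 0x000Fu);
}

/* Recover the 16-bit immediate from an encoded HVC instruction. */
static uint16_t arm_hvc_imm16(uint32_t insn)
{
	return (uint16_t)(((insn >> 4) & 0xFFF0u) | (insn & 0x000Fu));
}
```

For the Xen tag 0xEA1 this gives the word 0xE140EA71; any other 16-bit value would encode just as well, which is Ian's point about the choice being arbitrary.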

Ian.


[PATCH v4 4/7] scsi: sr: block events when runtime suspended

2012-07-27 Thread Aaron Lu

When the ODD is runtime suspended, there is no need to poll it for
events, so block event polling for it and unblock it when resumed.

Signed-off-by: Aaron Lu 
---
 block/genhd.c | 2 ++
 drivers/scsi/sr.c | 7 ---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9cf5583..bdb3682 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1458,6 +1458,7 @@ void disk_block_events(struct gendisk *disk)
 
mutex_unlock(&ev->block_mutex);
 }
+EXPORT_SYMBOL(disk_block_events);
 
 static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 {
@@ -1502,6 +1503,7 @@ void disk_unblock_events(struct gendisk *disk)
if (disk->ev)
__disk_unblock_events(disk, false);
 }
+EXPORT_SYMBOL(disk_unblock_events);
 
 /**
  * disk_flush_events - schedule immediate event checking and flushing
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index acfd10a..cbc14ea 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -205,6 +205,8 @@ static int sr_suspend(struct device *dev, pm_message_t msg)
return -EBUSY;
}
 
+   disk_block_events(cd->disk);
+
return 0;
 }
 
@@ -226,6 +228,8 @@ static int sr_resume(struct device *dev)
atomic_set(&cd->suspend_count, 1);
}
 
+   disk_unblock_events(cd->disk);
+
return 0;
 }
 
@@ -314,9 +318,6 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
if (CDSL_CURRENT != slot)
return 0;
 
-   if (pm_runtime_suspended(&cd->device->sdev_gendev))
-   return 0;
-
/* if the logical unit just finished loading/unloading, do a TUR */
if (cd->device->can_power_off && cd->dbml && sr_unit_load_done(cd)) {
events = 0;
-- 
1.7.11.3



[PATCH v4 3/7] scsi: sr: support zero power ODD(ZPODD)

2012-07-27 Thread Aaron Lu

The ODD will be placed into the suspend state when:
1 For a tray type ODD, there is no media inside and the door is closed;
2 For a slot type ODD, there is no media inside.
Together with ACPI, when we suspend the ODD's parent (the port it is
attached to), we remove its power altogether to reduce power
consumption (done in libata-acpi.c).
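The two power-off conditions map onto SCSI sense data: ASC 0x3A is MEDIUM NOT PRESENT and ASCQ 0x01 narrows it to TRAY CLOSED, the same values sr_suspend() in the patch tests after its TEST UNIT READY. A hypothetical stand-alone version of that decision:

```c
#include <stdbool.h>

enum odd_type { ODD_TRAY, ODD_SLOT };

/* Decide from TEST UNIT READY sense data whether the ODD may be
 * powered off. */
static bool odd_can_power_off(enum odd_type type, bool sense_valid,
			      unsigned char asc, unsigned char ascq)
{
	if (!sense_valid || asc != 0x3a)	/* 0x3a: MEDIUM NOT PRESENT */
		return false;
	if (type == ODD_SLOT)
		return true;			/* no media is enough */
	return ascq == 0x01;			/* tray: also require TRAY CLOSED */
}
```

A tray ODD reporting ASCQ 0x02 (tray open) stays powered, matching condition 1 above.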

The ODD can be resumed either by the user or by software.

For the user to resume the suspended ODD:
1 For a tray type ODD, press the eject button;
2 For a slot type ODD, insert a disc.
Once such an event happens, an ACPI notification is sent, and in our
handler we power up the ODD and set its status back to
active (again in libata-acpi.c).

For software to resume the suspended ODD, we do this in the ODD's
open/release functions: scsi_cd_get/put call scsi_autopm_get/put_device.

On old distros, the udisks daemon polls the ODD, so the ODD is
opened/closed every 2 seconds. To make use of ZPODD, udisks' polling has
to be inhibited:
$ udisks --inhibit-polling /dev/sr0

All of the above depends on whether the device can be powered off at
runtime, which is reflected by the can_power_off flag.

Signed-off-by: Aaron Lu 
---
 drivers/ata/libata-acpi.c  |   4 +-
 drivers/scsi/sr.c  | 131 -
 drivers/scsi/sr.h  |   2 +
 include/scsi/scsi_device.h |   1 +
 4 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 902b5a4..a2b16c9 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -985,8 +985,10 @@ static void ata_acpi_wake_dev(acpi_handle handle, u32 event, void *context)
struct ata_device *ata_dev = context;
 
if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev &&
-   pm_runtime_suspended(&ata_dev->sdev->sdev_gendev))
+   pm_runtime_suspended(&ata_dev->sdev->sdev_gendev)) {
+   ata_dev->sdev->wakeup_by_user = 1;
scsi_autopm_get_device(ata_dev->sdev);
+   }
 }
 
 static void ata_acpi_add_pm_notifier(struct ata_device *dev)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index abfefab..acfd10a 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -79,6 +80,8 @@ static DEFINE_MUTEX(sr_mutex);
 static int sr_probe(struct device *);
 static int sr_remove(struct device *);
 static int sr_done(struct scsi_cmnd *);
+static int sr_suspend(struct device *, pm_message_t msg);
+static int sr_resume(struct device *);
 
 static struct scsi_driver sr_template = {
.owner  = THIS_MODULE,
@@ -86,6 +89,8 @@ static struct scsi_driver sr_template = {
.name   = "sr",
.probe  = sr_probe,
.remove = sr_remove,
+   .suspend= sr_suspend,
+   .resume = sr_resume,
},
.done   = sr_done,
 };
@@ -147,8 +152,12 @@ static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk)
kref_get(&cd->kref);
if (scsi_device_get(cd->device))
goto out_put;
+   if (cd->device->can_power_off && scsi_autopm_get_device(cd->device))
+   goto out_pm;
goto out;
 
+ out_pm:
+   scsi_device_put(cd->device);
  out_put:
kref_put(&cd->kref, sr_kref_release);
cd = NULL;
@@ -164,9 +173,93 @@ static void scsi_cd_put(struct scsi_cd *cd)
mutex_lock(&sr_ref_mutex);
kref_put(&cd->kref, sr_kref_release);
scsi_device_put(sdev);
+   if (sdev->can_power_off)
+   scsi_autopm_put_device_autosuspend(sdev);
mutex_unlock(&sr_ref_mutex);
 }
 
+static int sr_suspend(struct device *dev, pm_message_t msg)
+{
+   int poweroff;
+   struct scsi_sense_hdr sshdr;
+   struct scsi_cd *cd = dev_get_drvdata(dev);
+
+   /* no action for system pm */
+   if (!PMSG_IS_AUTO(msg))
+   return 0;
+
+   /* do another TUR to see if the ODD is still ready to be powered off */
+   scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
+
+   if (cd->cdi.mask & CDC_CLOSE_TRAY)
+   /* no media for caddy/slot type ODD */
+   poweroff = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a;
+   else
+   /* no media and door closed for tray type ODD */
+   poweroff = scsi_sense_valid(&sshdr) && sshdr.asc == 0x3a &&
+   sshdr.ascq == 0x01;
+
+   if (!poweroff) {
+   pm_runtime_get_noresume(dev);
+   atomic_set(&cd->suspend_count, 1);
+   return -EBUSY;
+   }
+
+   return 0;
+}
+
+static int sr_resume(struct device *dev)
+{
+   struct scsi_cd *cd;
+   struct scsi_sense_hdr sshdr;
+
+   cd = dev_get_drvdata(dev);
+
+   /* get the disk ready */
+   scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
+
+   /* if user wa

[PATCH v4 6/7] scsi: sr: balance sr disk events block depth

2012-07-27 Thread Aaron Lu

When the ODD is resumed, disk_unblock_events should be called when:
1. the ODD is runtime resumed;
2. the system is resuming from S3 and the ODD was runtime suspended
   before S3;
but not when the system is resuming from S3 and the ODD was runtime
active before S3.

So separate the resume calls, one for system resume and one for runtime
resume, so that each does the right thing.
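The rule above can be written as a small decision function. This is only a sketch: enum resume_kind and should_unblock_events() are hypothetical, and the second argument stands for what sr_resume() checks with pm_runtime_suspended(dev).

```c
#include <assert.h>
#include <stdbool.h>

/* Which path is bringing the ODD back (hypothetical, for illustration). */
enum resume_kind {
	RESUME_RUNTIME, /* runtime PM resumes the ODD */
	RESUME_SYSTEM,  /* the system is coming back from S3 */
};

/*
 * Returns true when disk_unblock_events() should be called: always on
 * runtime resume, and on system resume only if the ODD was runtime
 * suspended before S3 (per the sr_resume() comment, system resume marks
 * the drive active, so the runtime resume path will not run for it).
 */
static bool should_unblock_events(enum resume_kind kind,
				  bool runtime_suspended_before_s3)
{
	if (kind == RESUME_RUNTIME)
		return true;
	return runtime_suspended_before_s3;
}
```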

Signed-off-by: Aaron Lu 
---
 drivers/scsi/sr.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index cbc14ea..f0c4aa2 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -82,6 +82,11 @@ static int sr_remove(struct device *);
 static int sr_done(struct scsi_cmnd *);
 static int sr_suspend(struct device *, pm_message_t msg);
 static int sr_resume(struct device *);
+static int sr_runtime_resume(struct device *);
+
+static struct dev_pm_ops sr_pm_ops = {
+   .runtime_resume = sr_runtime_resume,
+};
 
 static struct scsi_driver sr_template = {
.owner  = THIS_MODULE,
@@ -91,6 +96,7 @@ static struct scsi_driver sr_template = {
.remove = sr_remove,
.suspend= sr_suspend,
.resume = sr_resume,
+   .pm = &sr_pm_ops,
},
.done   = sr_done,
 };
@@ -213,6 +219,23 @@ static int sr_suspend(struct device *dev, pm_message_t msg)
 static int sr_resume(struct device *dev)
 {
struct scsi_cd *cd;
+
+   /*
+* If ODD is runtime suspended before system pm, unblock disk
+* events now since on system resume we will fully resume it
+* and set its runtime status to active.
+*/
+   if (pm_runtime_suspended(dev)) {
+   cd = dev_get_drvdata(dev);
+   disk_unblock_events(cd->disk);
+   }
+
+   return 0;
+}
+
+static int sr_runtime_resume(struct device *dev)
+{
+   struct scsi_cd *cd;
struct scsi_sense_hdr sshdr;
 
cd = dev_get_drvdata(dev);
-- 
1.7.11.3


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 5/7] scsi: pm: use runtime resume callback if available

2012-07-27 Thread Aaron Lu

When runtime resuming a SCSI device, use the driver's runtime resume
callback if it implements one.

The sr driver needs this to handle system resume and runtime resume
differently.
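The selection that scsi_dev_type_resume() makes after this patch can be modelled with simplified types. struct toy_driver and struct toy_pm_ops are hypothetical stand-ins for struct device_driver and struct dev_pm_ops, reduced to the two callbacks involved; legacy_resume() and rt_resume() exist only so the choice can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*resume_fn)(void *dev);

/* Simplified, hypothetical stand-ins for dev_pm_ops / device_driver. */
struct toy_pm_ops {
	resume_fn runtime_resume;
};

struct toy_driver {
	resume_fn resume;            /* legacy bus-level ->resume */
	const struct toy_pm_ops *pm; /* optional PM ops table */
};

/*
 * Same order as the patched scsi_dev_type_resume(): for a runtime
 * resume prefer pm->runtime_resume when the driver provides it,
 * otherwise fall back to ->resume (which may itself be NULL).
 */
static resume_fn pick_resume(const struct toy_driver *drv, bool runtime)
{
	if (runtime && drv && drv->pm && drv->pm->runtime_resume)
		return drv->pm->runtime_resume;
	return drv ? drv->resume : NULL;
}

/* Example callbacks so the selection can be exercised. */
static int legacy_resume(void *dev) { (void)dev; return 1; }
static int rt_resume(void *dev) { (void)dev; return 2; }
```

Keeping the legacy ->resume as the fallback means drivers without a dev_pm_ops table behave exactly as before.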

Signed-off-by: Aaron Lu 
---
 drivers/scsi/scsi_pm.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 83edb93..690136c 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -34,14 +34,19 @@ static int scsi_dev_type_suspend(struct device *dev, pm_message_t msg)
return err;
 }
 
-static int scsi_dev_type_resume(struct device *dev)
+static int scsi_dev_type_resume(struct device *dev, bool runtime)
 {
struct device_driver *drv;
int err = 0;
+   int (*resume)(struct device *);
 
drv = dev->driver;
-   if (drv && drv->resume)
-   err = drv->resume(dev);
+   if (runtime && drv && drv->pm && drv->pm->runtime_resume)
+   resume = drv->pm->runtime_resume;
+   else
+   resume = drv ? drv->resume : NULL;
+   if (resume)
+   err = resume(dev);
scsi_device_resume(to_scsi_device(dev));
dev_dbg(dev, "scsi resume: %d\n", err);
return err;
@@ -85,7 +90,7 @@ static int scsi_bus_resume_common(struct device *dev)
pm_runtime_get_sync(dev->parent);
 
if (scsi_is_sdev_device(dev))
-   err = scsi_dev_type_resume(dev);
+   err = scsi_dev_type_resume(dev, false);
if (err == 0) {
pm_runtime_disable(dev);
pm_runtime_set_active(dev);
@@ -160,7 +165,7 @@ static int scsi_runtime_resume(struct device *dev)
 
dev_dbg(dev, "scsi_runtime_resume\n");
if (scsi_is_sdev_device(dev))
-   err = scsi_dev_type_resume(dev);
+   err = scsi_dev_type_resume(dev, true);
 
/* Insert hooks here for targets, hosts, and transport classes */
 
-- 
1.7.11.3



[PATCH v4 1/7] scsi: sr: check support for device busy class events

2012-07-27 Thread Aaron Lu

Signed-off-by: Aaron Lu 
---
 drivers/scsi/sr.c | 23 +++
 drivers/scsi/sr.h |  1 +
 include/linux/cdrom.h | 43 +++
 3 files changed, 67 insertions(+)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 5fc97d2..abfefab 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -101,6 +101,7 @@ static DEFINE_MUTEX(sr_ref_mutex);
 static int sr_open(struct cdrom_device_info *, int);
 static void sr_release(struct cdrom_device_info *);
 
+static void check_dbml(struct scsi_cd *);
 static void get_sectorsize(struct scsi_cd *);
 static void get_capabilities(struct scsi_cd *);
 
@@ -728,6 +729,28 @@ fail:
return error;
 }
 
+static void check_dbml(struct scsi_cd *cd)
+{
+   struct packet_command cgc;
+   unsigned char buffer[16];
+   struct rm_feature_desc *rfd;
+
+   init_cdrom_command(&cgc, buffer, sizeof(buffer), CGC_DATA_READ);
+   cgc.cmd[0] = GPCMD_GET_CONFIGURATION;
+   cgc.cmd[3] = CDF_RM;
+   cgc.cmd[8] = sizeof(buffer);
+   cgc.quiet = 1;
+
+   if (cd->cdi.ops->generic_packet(&cd->cdi, &cgc))
+   return;
+
+   rfd = (struct rm_feature_desc *)&buffer[sizeof(struct feature_header)];
+   if (be16_to_cpu(rfd->feature_code) != CDF_RM)
+   return;
+
+   if (rfd->dbml)
+   cd->dbml = 1;
+}
 
 static void get_sectorsize(struct scsi_cd *cd)
 {
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 37c8f6b..7cc40ad 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -41,6 +41,7 @@ typedef struct scsi_cd {
unsigned readcd_known:1;/* drive supports READ_CD (0xbe) */
unsigned readcd_cdda:1; /* reading audio data using READ_CD */
unsigned media_present:1;   /* media is present */
+   unsigned dbml:1;/* generates device busy class events */
 
/* GET_EVENT spurious event handling, blk layer guarantees exclusion */
int tur_mismatch;   /* nr of get_event TUR mismatches */
diff --git a/include/linux/cdrom.h b/include/linux/cdrom.h
index dfd7f18..962be39 100644
--- a/include/linux/cdrom.h
+++ b/include/linux/cdrom.h
@@ -727,6 +727,7 @@ struct request_sense {
 /*
  * feature profile
  */
+#define CDF_RM 0x0003  /* "Removable Medium" */
 #define CDF_RWRT   0x0020  /* "Random Writable" */
 #define CDF_HWDM   0x0024  /* "Hardware Defect Management" */
 #define CDF_MRW0x0028
@@ -739,6 +740,48 @@ struct request_sense {
 #define CDM_MRW_BGFORMAT_ACTIVE2
 #define CDM_MRW_BGFORMAT_COMPLETE  3
 
+/* Removable medium feature descriptor */
+struct rm_feature_desc {
+   __be16 feature_code;
+#if defined(__BIG_ENDIAN_BITFIELD)
+   __u8 reserved1  : 2;
+   __u8 feature_version: 4;
+   __u8 persistent : 1;
+   __u8 curr   : 1;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+   __u8 curr   : 1;
+   __u8 persistent : 1;
+   __u8 feature_version: 4;
+   __u8 reserved1  : 2;
+#endif
+   __u8 add_len;
+#if defined(__BIG_ENDIAN_BITFIELD)
+   __u8 mech_type  : 3;
+   __u8 load   : 1;
+   __u8 eject  : 1;
+   __u8 pvnt_jmpr  : 1;
+   __u8 dbml   : 1;
+   __u8 lock   : 1;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+   __u8 lock   : 1;
+   __u8 dbml   : 1;
+   __u8 pvnt_jmpr  : 1;
+   __u8 eject  : 1;
+   __u8 load   : 1;
+   __u8 mech_type  : 3;
+#endif
+   __u8 reserved2;
+   __u8 reserved3;
+   __u8 reserved4;
+};
+
+struct device_busy_event_desc {
+   __u8 device_busy_event  : 4;
+   __u8 reserved1  : 4;
+   __u8 device_busy_status;
+   __u8 time;
+};
+
 /*
  * mrw address spaces
  */
-- 
1.7.11.3
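check_dbml() above asks GET CONFIGURATION for the Removable Medium feature only (cmd[3] = CDF_RM) and overlays struct rm_feature_desc right after the 8-byte feature header. The same test can be sketched with explicit byte offsets and masks instead of endian-dependent bitfields. rm_feature_has_dbml() is a hypothetical helper, and the offsets are taken from the layout of struct rm_feature_desc in this patch (feature code in bytes 0-1, big-endian; DBML in bit 1 of byte 4). Unlike check_dbml(), the sketch also checks that the buffer is long enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_HEADER_LEN 8      /* sizeof(struct feature_header) */
#define CDF_RM             0x0003 /* "Removable Medium" feature code */
#define RM_DBML_MASK       0x02   /* DBML: bit 1 of descriptor byte 4 */

/*
 * Returns true if a GET CONFIGURATION response (starting at the feature
 * header) begins with a Removable Medium descriptor that has DBML set.
 */
static bool rm_feature_has_dbml(const uint8_t *buf, size_t len)
{
	const uint8_t *desc;
	unsigned int code;

	assert(buf != NULL);
	/* need the header plus bytes 0..4 of the first descriptor */
	if (len < FEATURE_HEADER_LEN + 5)
		return false;

	desc = buf + FEATURE_HEADER_LEN;
	code = ((unsigned int)desc[0] << 8) | desc[1]; /* big-endian */
	if (code != CDF_RM)
		return false;

	return (desc[4] & RM_DBML_MASK) != 0;
}
```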



[PATCH v4 7/7] block: genhd: add an interface to set disk's poll interval

2012-07-27 Thread Aaron Lu

Set the ODD's in-kernel poll interval to 2s in case the user is
running an old distro on which udev does not set the system-wide
block parameter events_dfl_poll_msecs.
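The value handling can be sketched in two parts. check_poll_msecs() repeats the validation that moves into disk_events_set_poll_msecs(); effective_poll_msecs() is a simplified, assumed model of how block/genhd.c picks the interval (a per-disk value of 0 or more wins, with 0 meaning no polling, while -1 falls back to events_dfl_poll_msecs). Both names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Same validation as disk_events_set_poll_msecs(): -1 is accepted,
 * any other negative value is rejected with -EINVAL.
 */
static int check_poll_msecs(long intv)
{
	if (intv < 0 && intv != -1)
		return -EINVAL;
	return 0;
}

/*
 * Assumed, simplified interpretation of the stored value: a
 * non-negative per-disk setting wins (0 disables polling), -1 defers
 * to the system-wide default.
 */
static long effective_poll_msecs(long per_disk, long dfl_poll_msecs)
{
	return per_disk >= 0 ? per_disk : dfl_poll_msecs;
}
```

Under this model, if an old udev leaves events_dfl_poll_msecs at 0, the 2000 that sr_probe() now passes still gives the drive a 2s poll.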

Signed-off-by: Aaron Lu 
---
 block/genhd.c | 23 +--
 drivers/scsi/sr.c |  1 +
 include/linux/genhd.h |  1 +
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index bdb3682..de9b9d9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1619,6 +1619,19 @@ static void disk_events_workfn(struct work_struct *work)
kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
 }
 
+int disk_events_set_poll_msecs(struct gendisk *disk, long intv)
+{
+   if (intv < 0 && intv != -1)
+   return -EINVAL;
+
+   disk_block_events(disk);
+   disk->ev->poll_msecs = intv;
+   __disk_unblock_events(disk, true);
+
+   return 0;
+}
+EXPORT_SYMBOL(disk_events_set_poll_msecs);
+
 /*
  * A disk events enabled device has the following sysfs nodes under
  * its /sys/block/X/ directory.
@@ -1675,16 +1688,14 @@ static ssize_t disk_events_poll_msecs_store(struct device *dev,
 {
struct gendisk *disk = dev_to_disk(dev);
long intv;
+   int ret;
 
if (!count || !sscanf(buf, "%ld", &intv))
return -EINVAL;
 
-   if (intv < 0 && intv != -1)
-   return -EINVAL;
-
-   disk_block_events(disk);
-   disk->ev->poll_msecs = intv;
-   __disk_unblock_events(disk, true);
+   ret = disk_events_set_poll_msecs(disk, intv);
+   if (ret)
+   return ret;
 
return count;
 }
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index f0c4aa2..e6e5549 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -864,6 +864,7 @@ static int sr_probe(struct device *dev)
dev_set_drvdata(dev, cd);
disk->flags |= GENHD_FL_REMOVABLE;
add_disk(disk);
+   disk_events_set_poll_msecs(disk, 2000);
 
sdev_printk(KERN_DEBUG, sdev,
"Attached scsi CD-ROM %s\n", cd->cdi.name);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ae0aaa9..308d47e 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -417,6 +417,7 @@ extern void disk_block_events(struct gendisk *disk);
 extern void disk_unblock_events(struct gendisk *disk);
 extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
 extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);
+extern int disk_events_set_poll_msecs(struct gendisk *disk, long intv);
 
 /* drivers/char/random.c */
 extern void add_disk_randomness(struct gendisk *disk);
-- 
1.7.11.3



[PATCH v4 0/7] ZPODD patches

2012-07-27 Thread Aaron Lu

v4:
Rebase on top of Linus' tree; as a result, the missing flag problem
from v3 is gone;
Add a new function scsi_autopm_put_device_autosuspend to first mark
the device last busy and then put it with autosuspend, as suggested by
Oliver Neukum.
Typo fix as pointed out by Sergei Shtylyov.
Check can_power_off flag before any runtime pm operations in sr.

v3:
Rebase on top of scsi-misc tree;
Add the sr related patches previously in Jeff's libata tree;
Re-organize the sr patches.
A problem for now: for patch
scsi: sr: support zero power ODD(ZPODD)
I can't set a flag in libata-acpi.c since a related function is
missing in the scsi-misc tree. Will fix this when 3.6-rc1 is released.

v2:
Bug fix for v1;
Use scsi_autopm_* in sr driver instead of pm_runtime_*;

v1:
Here are some patches to make ZPODD easier to use for end users and
a fix for using ZPODD with system suspend.

Aaron Lu (7):
  scsi: sr: check support for device busy class events
  scsi: pm: add interface to autosuspend scsi device
  scsi: sr: support zero power ODD(ZPODD)
  scsi: sr: block events when runtime suspended
  scsi: pm: use runtime resume callback if available
  scsi: sr: balance sr disk events block depth
  block: genhd: add an interface to set disk's poll interval

 block/genhd.c  |  25 +--
 drivers/ata/libata-acpi.c  |   4 +-
 drivers/scsi/scsi_pm.c |  22 --
 drivers/scsi/sr.c  | 179 -
 drivers/scsi/sr.h  |   3 +
 include/linux/cdrom.h  |  43 +++
 include/linux/genhd.h  |   1 +
 include/scsi/scsi_device.h |   3 +
 8 files changed, 267 insertions(+), 13 deletions(-)

-- 
1.7.11.3



[PATCH v4 2/7] scsi: pm: add interface to autosuspend scsi device

2012-07-27 Thread Aaron Lu

Add a new interface, scsi_autopm_put_device_autosuspend, to mark the
device last busy and then put it with autosuspend.
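Why the helper marks the device busy before dropping the reference can be shown with a toy model of autosuspend bookkeeping. struct toy_dev and its functions are hypothetical and ignore locking, timers and the actual suspend; the only rule carried over is that autosuspend is allowed once the usage count is zero and the autosuspend delay has elapsed since last_busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of runtime-PM autosuspend bookkeeping, not kernel code. */
struct toy_dev {
	int usage_count;        /* outstanding "get" references */
	long last_busy;         /* ms timestamp of last activity */
	long autosuspend_delay; /* ms of idleness required */
};

static void toy_get(struct toy_dev *d)
{
	d->usage_count++;
}

/* Analogue of scsi_autopm_put_device_autosuspend(): mark, then put. */
static void toy_put_autosuspend(struct toy_dev *d, long now)
{
	assert(d->usage_count > 0);
	d->last_busy = now; /* pm_runtime_mark_last_busy() */
	d->usage_count--;   /* pm_runtime_put_autosuspend() */
}

static bool toy_may_suspend(const struct toy_dev *d, long now)
{
	return d->usage_count == 0 &&
	       now - d->last_busy >= d->autosuspend_delay;
}
```

Because last_busy is refreshed at the moment the last user goes away, the device stays up for the full delay instead of suspending immediately after, say, the drive is closed.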

Signed-off-by: Aaron Lu 
---
 drivers/scsi/scsi_pm.c | 7 +++
 include/scsi/scsi_device.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index dc0ad85..83edb93 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -201,6 +201,13 @@ void scsi_autopm_put_device(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL_GPL(scsi_autopm_put_device);
 
+void scsi_autopm_put_device_autosuspend(struct scsi_device *sdev)
+{
+   pm_runtime_mark_last_busy(&sdev->sdev_gendev);
+   pm_runtime_put_autosuspend(&sdev->sdev_gendev);
+}
+EXPORT_SYMBOL_GPL(scsi_autopm_put_device_autosuspend);
+
 void scsi_autopm_get_target(struct scsi_target *starget)
 {
pm_runtime_get_sync(&starget->dev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9895f69..3636146 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -395,9 +395,11 @@ extern int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
 #ifdef CONFIG_PM_RUNTIME
 extern int scsi_autopm_get_device(struct scsi_device *);
 extern void scsi_autopm_put_device(struct scsi_device *);
+extern void scsi_autopm_put_device_autosuspend(struct scsi_device *);
 #else
 static inline int scsi_autopm_get_device(struct scsi_device *d) { return 0; }
 static inline void scsi_autopm_put_device(struct scsi_device *d) {}
+static inline void scsi_autopm_put_device_autosuspend(struct scsi_device *d) {}
 #endif /* CONFIG_PM_RUNTIME */
 
 static inline int __must_check scsi_device_reprobe(struct scsi_device *sdev)
-- 
1.7.11.3



Re: [RFC PATCH] netconsole.txt: "nc" needs "-p" to specify the listening port

2012-07-27 Thread Cong Wang

On Fri, Jul 27, 2012 at 2:35 PM, Dirk Gouders
 wrote:
> Hi Jesse,
>
> I would like to ask you to check if the documentation of "nc" in
> netconsole.txt is still correct.  I tried two different netcat packages
> and both require "-p" to specify the listening port.  I am wondering if
> that changed after the use of "nc" has been documented.

On Fedora 16, `nc -u -l ` works fine.

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-27 Thread Michal Hocko

On Thu 26-07-12 14:31:50, Rik van Riel wrote:
> On 07/20/2012 10:36 AM, Michal Hocko wrote:
> 
> >--- a/arch/x86/mm/hugetlbpage.c
> >+++ b/arch/x86/mm/hugetlbpage.c
> >@@ -81,7 +81,12 @@ static void huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> > if (saddr) {
> > spte = huge_pte_offset(svma->vm_mm, saddr);
> > if (spte) {
> >-get_page(virt_to_page(spte));
> >+struct page *spte_page = virt_to_page(spte);
> >+if (!is_hugetlb_pmd_page_valid(spte_page)) {
> 
> What prevents somebody else from marking the hugetlb
> pmd invalid, between here...
> 
> >+spte = NULL;
> >+continue;
> >+}
> 
> ... and here?

huge_ptep_get_and_clear is (or rather should be) called under i_mmap,
which is not the case right now, as Mel already pointed out in another email.
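Rik's question is a classic check-then-act race. A toy model of the fix described here: the validity check and the reference grab happen under the same lock that the teardown path takes before invalidating. struct shared_pt and its helpers are hypothetical, and the C11 spinlock only stands in for the i_mmap lock; this is not the hugetlbfs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct shared_pt {
	atomic_flag lock; /* stands in for the i_mmap lock */
	bool valid;       /* cleared by teardown */
	int refcount;     /* users sharing the page-table page */
};

static void pt_lock(struct shared_pt *pt)
{
	while (atomic_flag_test_and_set_explicit(&pt->lock, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void pt_unlock(struct shared_pt *pt)
{
	atomic_flag_clear_explicit(&pt->lock, memory_order_release);
}

static void shared_pt_init(struct shared_pt *pt)
{
	atomic_flag_clear(&pt->lock);
	pt->valid = true;
	pt->refcount = 0;
}

/* Check validity and take a reference with no window in between. */
static bool shared_pt_try_get(struct shared_pt *pt)
{
	bool ok;

	assert(pt != NULL);
	pt_lock(pt);
	ok = pt->valid;
	if (ok)
		pt->refcount++;
	pt_unlock(pt);
	return ok;
}

/* Teardown takes the same lock, so it cannot run between check and get. */
static void shared_pt_teardown(struct shared_pt *pt)
{
	pt_lock(pt);
	pt->valid = false;
	pt_unlock(pt);
}
```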

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v2 1/1] mmc: block: Add write packing control

2012-07-27 Thread S, Venkatraman

On Fri, Jul 27, 2012 at 12:24 AM,   wrote:
>
> On Thu, July 26, 2012 8:28 am, S, Venkatraman wrote:
>> On Tue, Jul 24, 2012 at 2:14 PM,   wrote:
>>> On Mon, July 23, 2012 5:22 am, S, Venkatraman wrote:
 On Mon, Jul 23, 2012 at 5:13 PM,   wrote:
> On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
>> Hi,  [removing Jens and the documentation list, since now we're
>>> talking about the MMC side only]
>> On Wed, Jul 18 2012, me...@codeaurora.org wrote:
>>> Is there anything else that holds this patch from being pushed to
> mmc-next?
>> Yes, I'm still uncomfortable with the write packing patchsets for a
> couple of reasons, and I suspect that the sum of those reasons means
> that
> we should probably plan on holding off merging it until after 3.6.
>> Here are the open issues; please correct any misunderstandings: With
>>> Seungwon's patchset ("Support packed write command"):
>> * I still don't have a good set of representative benchmarks showing
>>   what kind of performance changes come with this patchset.  It seems
> like we've had a small amount of testing on one controller/eMMC part
> combo
> from Seungwon, and an entirely different test from Maya, and the
>>> results
> aren't documented fully anywhere to the level of describing what the
>>> hardware was, what the test was, and what the results were before and
>>> after the patchset.
> Currently, there is only one card vendor that supports packed
> commands. Following are our sequential write (LMDD) test results on
> 2 of our targets (in MB/s):
>
>                        No packing   Packing
> Target 1 (SDR 50MHz)       15          25
> Target 2 (DDR 50MHz)       20          30
>> With the reads-during-writes regression:
>> * Venkat still has open questions about the nature of the read
>>   regression, and thinks we should understand it with blktrace before
> trying to fix it.  Maya has a theory about writes overwhelming reads,
> but
> Venkat doesn't understand why this would explain the observed
> bandwidth drop.
> The degradation of reads due to writes is not a new behavior; it also
> exists without the write packing feature (which only increases the
> degradation). Our investigation of this phenomenon led us to the
> conclusion that a new scheduling policy should be used for mobile
> devices, but this is not related to the current discussion of the
> write packing feature.
> The write packing feature increases the degradation of reads due to
> writes since it allows the MMC to fetch many write requests in a row,
> instead of fetching only one at a time. Therefore some of the read
> requests will have to wait for the completion of more write requests
> before they can be issued.

 I am a bit puzzled by this claim. One thing I checked carefully when
>>> reviewing write packing patches from SJeon was that the code didn't
>>> plough through a mixed list of reads and writes and selected only
>>> writes.
 This section of the code in "mmc_blk_prep_packed_list()", from v8
>>> patchset..
 
 +   if (rq_data_dir(cur) != rq_data_dir(next)) {
 +   put_back = 1;
 +   break;
 +   }
 

 means that once a read is encountered in the middle of write packing,
>>> the packing is stopped at that point and it is executed. Then the next
>>> blk_fetch_request should get the next read and continue as before.

 IOW, the ordering of reads and writes is _not_ altered when using
 packed
>>> commands.
 For example if there were 5 write requests, followed by 1 read,
 followed by 5 more write requests in the request_queue, the first 5
>>> writes will be executed as one "packed command", then the read will be
>>> executed, and then the remaining 5 writes will be executed as one
>>> "packed command". So the read does not have to wait any more than it
>>> waited before the packing feature.
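The ordering argument can be checked with a toy model of the put_back loop quoted above: consecutive writes are packed, a read is issued on its own, and packing stops at the first direction change. split_batches() is hypothetical and ignores the other packing limits (number of requests, size). It models only the order of the queue, not Maya's point below about how many writes have already been fetched when the read arrives.

```c
#include <assert.h>
#include <stddef.h>

/*
 * 'W' = write, 'R' = read. Writes the size of each issued batch into
 * out[] (at most max_out entries) and returns the number of batches.
 */
static size_t split_batches(const char *queue, size_t out[], size_t max_out)
{
	size_t n = 0;

	assert(queue != NULL && out != NULL);
	while (*queue != '\0' && n < max_out) {
		size_t len = 1;

		/* pack a run of writes; stop at the first read (put_back) */
		if (queue[0] == 'W')
			while (queue[len] == 'W')
				len++;
		out[n++] = len;
		queue += len;
	}
	return n;
}
```

For the 5 writes, 1 read, 5 writes example this gives batches of 5, 1 and 5, so the read keeps its place in the queue.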
>>>
>>> Let me try to better explain with your example.
>>> Without packing the MMC layer will fetch 2 write requests and wait for
>>> the
>>> first write request completion before fetching another write request.
>>> During this time the read request could be inserted into the CFQ and
>>> since
>>> it has higher priority than the async write it will be dispatched in the
>>> next fetch. So, the result would be 2 write requests followed by one
>>> read
>>> request and the read would have to wait for completion of only 2 write
>>> requests.
>>> With packing, all the 5 write requests will be fetched in a row, and
>>> then
>>> the read will arrive and be dispatched in the next fetch. Then the read
>>> will have to wait for the completion of 5 write requests.
>>>
>>> A few more clarifications:
>>> Due to the plug list mechanism in the block layer the applications can
>>> "aggregate" several requests to be

Re: [Xen-devel] [PATCH 02/24] xen/arm: hypercalls

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 17:33 +0100, Konrad Rzeszutek Wilk wrote:
> On Thu, Jul 26, 2012 at 04:33:44PM +0100, Stefano Stabellini wrote:
> > Use r12 to pass the hypercall number to the hypervisor.
> > 
> > We need a register to pass the hypercall number because we might not
> > know it at compile time and HVC only takes an immediate argument.
> > 
> > Among the available registers r12 seems to be the best choice because it
> > is defined as "intra-procedure call scratch register".
> > 
> > Use the ISS to pass a hypervisor-specific tag.
> > 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >  arch/arm/include/asm/xen/hypercall.h |   50 ++
> >  arch/arm/xen/Makefile|2 +-
> >  arch/arm/xen/hypercall.S |   65 ++
> >  3 files changed, 116 insertions(+), 1 deletions(-)
> >  create mode 100644 arch/arm/include/asm/xen/hypercall.h
> >  create mode 100644 arch/arm/xen/hypercall.S
> > 
> > diff --git a/arch/arm/include/asm/xen/hypercall.h b/arch/arm/include/asm/xen/hypercall.h
> > new file mode 100644
> > index 000..4ac0624
> > --- /dev/null
> > +++ b/arch/arm/include/asm/xen/hypercall.h
> > @@ -0,0 +1,50 @@
> > +/**
> > + * hypercall.h
> > + *
> > + * Linux-specific hypervisor handling.
> > + *
> > + * Stefano Stabellini , Citrix, 2012
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License version 2
> > + * as published by the Free Software Foundation; or, when distributed
> > + * separately from the Linux kernel or incorporated into other
> > + * software packages, subject to the following license:
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this source file (the "Software"), to deal in the Software without
> > + * restriction, including without limitation the rights to use, copy, modify,
> > + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> > + * and to permit persons to whom the Software is furnished to do so, subject to
> > + * the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef _ASM_ARM_XEN_HYPERCALL_H
> > +#define _ASM_ARM_XEN_HYPERCALL_H
> > +
> > +#include 
> > +
> > +long privcmd_call(unsigned call, unsigned long a1,
> > +   unsigned long a2, unsigned long a3,
> > +   unsigned long a4, unsigned long a5);
> > +int HYPERVISOR_xen_version(int cmd, void *arg);
> > +int HYPERVISOR_console_io(int cmd, int count, char *str);
> > +int HYPERVISOR_grant_table_op(unsigned int cmd, void *uop, unsigned int count);
> > +int HYPERVISOR_sched_op(int cmd, void *arg);
> > +int HYPERVISOR_event_channel_op(int cmd, void *arg);
> > +unsigned long HYPERVISOR_hvm_op(int op, void *arg);
> > +int HYPERVISOR_memory_op(unsigned int cmd, void *arg);
> > +int HYPERVISOR_physdev_op(int cmd, void *arg);
> > +
> > +#endif /* _ASM_ARM_XEN_HYPERCALL_H */
> > diff --git a/arch/arm/xen/Makefile b/arch/arm/xen/Makefile
> > index 0bad594..b9d6acc 100644
> > --- a/arch/arm/xen/Makefile
> > +++ b/arch/arm/xen/Makefile
> > @@ -1 +1 @@
> > -obj-y  := enlighten.o
> > +obj-y  := enlighten.o hypercall.o
> > diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S
> > new file mode 100644
> > index 000..038cc5b
> > --- /dev/null
> > +++ b/arch/arm/xen/hypercall.S
> > @@ -0,0 +1,65 @@
> > +/**
> > + * hypercall.S
> > + *
> > + * Xen hypercall wrappers
> > + *
> > + * The Xen hypercall calling convention is very similar to the ARM
> > + * procedure calling convention: the first paramter is passed in r0, the
> > + * second in r1, the third in r2 and the third in r3. Considering that
> 
> I think you meant 'and the fourth in r3'.
> 
> So where does the similarity end?  Just in that we use r12?

The standard ARM function calling convention passes arguments 1-4 in
r0-r3 and arguments 5+ on the stack. r12 is a scratch register which can
be clobbered by the *linker* on a subroutine call (r12 is also called
"ip", the intra-procedure call scratch register).
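A rough model of the register assignment being discussed: arguments go in r0-r3 as in a normal AAPCS call, and the hypercall number goes in r12, since HVC itself only carries an immediate fixed at assembly time. struct arm_regs and marshal_hypercall() are hypothetical, and placing the fifth argument in r4 is an assumption of this sketch rather than something stated in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Core registers r0..r12 of a 32-bit ARM CPU (toy model). */
struct arm_regs {
	uint32_t r[13];
};

/*
 * Load a hypercall into the toy register file: a1-a4 in r0-r3 as for
 * an ordinary AAPCS call, a5 in r4 (assumption), and the hypercall
 * number in r12 (ip), because it may not be known at compile time.
 */
static void marshal_hypercall(struct arm_regs *regs, uint32_t nr,
			      uint32_t a1, uint32_t a2, uint32_t a3,
			      uint32_t a4, uint32_t a5)
{
	assert(regs != NULL);
	regs->r[0] = a1;
	regs->r[1] = a2;
	regs->r[2] = a3;
	regs->r[3] = a4;
	regs->r[4] = a5;  /* assumed slot for the fifth argument */
	regs->r[12] = nr; /* ip: hypercall number */
}
```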

The hypervisor doesn't

Re: [PATCH 02/24] xen/arm: hypercalls

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 20:19 +0100, Christopher Covington wrote:
> Hi Stefano,
> 
> On 07/26/2012 11:33 AM, Stefano Stabellini wrote:
> > Use r12 to pass the hypercall number to the hypervisor.
> > 
> > We need a register to pass the hypercall number because we might not
> > know it at compile time and HVC only takes an immediate argument.
> 
> You're not going to JIT assemble the appropriate HVC instruction? Darn.

;-)

> How many call numbers are there, though? 8?

The maximum currently defined hypercall number is 55, although there are
some small gaps so there's actually more like 45 in total.

>  It seems like it'd be
> reasonable to take the approach that seems to be favored for MRC/MCR
> instructions, using a function containing switch statement that chooses
> between several inline assembly instructions based off an enum passed to
> the function. See for example arch_timer_reg_read in
> arch/arm/kernel/arch_timer.c.

I don't think it is feasible with this number of hypercalls, even
accepting that in many cases the number will be a constant so gcc can
likely optimise almost all of it away.

Is there something wrong with the r12 based approach?

Ian.


Re: [RFC PATCH] netconsole.txt: "nc" needs "-p" to specify the listening port

2012-07-27 Thread Dirk Gouders

Cong Wang  writes:

> On Fri, Jul 27, 2012 at 2:35 PM, Dirk Gouders
>  wrote:
>> Hi Jesse,
>>
>> I would like to ask you to check if the documentation of "nc" in
>> netconsole.txt is still correct.  I tried two different netcat packages
>> and both require "-p" to specify the listening port.  I am wondering if
>> that changed after the use of "nc" has been documented.
>
> On Fedora 16, `nc -u -l ` works fine.

Thanks for checking that.

If the information I found is correct, Fedora uses OpenBSD's nc
codebase. The two netcat packages I tested on a Gentoo system differ
from it in that they require the -p switch to specify the port.

Dirk

Re: [Xen-devel] [PATCH 04/24] xen/arm: sync_bitops

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 17:37 +0100, Konrad Rzeszutek Wilk wrote:
> On Thu, Jul 26, 2012 at 04:33:46PM +0100, Stefano Stabellini wrote:
> > sync_bitops functions are equivalent to the SMP implementation of the
> > original functions, regardless of whether CONFIG_SMP is defined.
> 
> So why can't the code be changed to use that? Is it that
> the _set_bit, _clear_bit, etc are not available with !CONFIG_SMP?

_set_bit etc are not SMP safe if !CONFIG_SMP. But under Xen you might be
communicating with a completely external entity who might be on another
CPU (e.g. two uniprocessor guests communicating via event channels and
grant tables). So we need a variant of the bit ops which are SMP safe
even on a UP kernel.

The users are common code and the sync_foo vs foo distinction matters on
some platforms (e.g. x86 where a UP kernel would omit the LOCK prefix
for the normal ones).
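A portable analogue of what the sync_* variants guarantee, using C11 atomics: the read-modify-write is always one atomic operation, even in a build that would otherwise assume a single CPU, because the other party (another guest, in the example above) may be running elsewhere. The function names are hypothetical; the ARM patch instead maps sync_set_bit() and friends onto _set_bit() and the other underscore variants.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap at addr. */
static void sync_set_bit_c11(unsigned int nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or(&addr[nr / BITS_PER_ULONG],
			1UL << (nr % BITS_PER_ULONG));
}

/* Atomically clear bit nr and report whether it was set before. */
static bool sync_test_and_clear_bit_c11(unsigned int nr,
					_Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_ULONG);
	unsigned long old = atomic_fetch_and(&addr[nr / BITS_PER_ULONG], ~mask);

	return (old & mask) != 0;
}
```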

> 
> > 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >  arch/arm/include/asm/sync_bitops.h |   17 +
> >  1 files changed, 17 insertions(+), 0 deletions(-)
> >  create mode 100644 arch/arm/include/asm/sync_bitops.h
> > 
> > diff --git a/arch/arm/include/asm/sync_bitops.h b/arch/arm/include/asm/sync_bitops.h
> > new file mode 100644
> > index 000..d975092903
> > --- /dev/null
> > +++ b/arch/arm/include/asm/sync_bitops.h
> > @@ -0,0 +1,17 @@
> > +#ifndef __ASM_SYNC_BITOPS_H__
> > +#define __ASM_SYNC_BITOPS_H__
> > +
> > +#include 
> > +#include 
> > +
> > +#define sync_set_bit(nr, p)              _set_bit(nr, p)
> > +#define sync_clear_bit(nr, p)            _clear_bit(nr, p)
> > +#define sync_change_bit(nr, p)           _change_bit(nr, p)
> > +#define sync_test_and_set_bit(nr, p)     _test_and_set_bit(nr, p)
> > +#define sync_test_and_clear_bit(nr, p)   _test_and_clear_bit(nr, p)
> > +#define sync_test_and_change_bit(nr, p)  _test_and_change_bit(nr, p)
> > +#define sync_test_bit(nr, addr)          test_bit(nr, addr)
> > +#define sync_cmpxchg                     cmpxchg
> > +
> > +
> > +#endif
> > -- 
> > 1.7.2.5
> > 
> > 
> > ___
> > Xen-devel mailing list
> > xen-de...@lists.xen.org
> > http://lists.xen.org/xen-devel



Re: [PATCH 07/24] xen/arm: Xen detection and shared_info page mapping

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 16:33 +0100, Stefano Stabellini wrote:
> Check for a "/xen" node in the device tree, if it is present set
> xen_domain_type to XEN_HVM_DOMAIN and continue initialization.
> 
> Map the real shared info page using XENMEM_add_to_physmap with
> XENMAPSPACE_shared_info.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  arch/arm/xen/enlighten.c |   56 ++
>  1 files changed, 56 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index d27c2a6..8c923af 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -5,6 +5,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  struct start_info _xen_start_info;
>  struct start_info *xen_start_info = &_xen_start_info;
> @@ -33,3 +36,56 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>   return -ENOSYS;
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
> +
> +/*
> + * == Xen Device Tree format ==
> + * - /xen node;
> + * - compatible "arm,xen";
> + * - one interrupt for Xen event notifications;
> + * - one memory region to map the grant_table.
> + */
> +static int __init xen_guest_init(void)
> +{
> + int cpu;
> + struct xen_add_to_physmap xatp;
> + static struct shared_info *shared_info_page = 0;
> + struct device_node *node;
> +
> + node = of_find_compatible_node(NULL, NULL, "arm,xen");
> + if (!node) {
> + pr_info("No Xen support\n");
> + return 0;
> + }

This should either only print in the success case (to avoid spamming
everyone) or we need a little bit of infrastructure like on x86 so that
we print exactly one of:
"Booting natively on bare metal"
"Booting paravirtualised on %s", hypervisor->name

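A minimal sketch of the "print exactly one line" behaviour, assuming a hypothetical descriptor table (struct demo_hypervisor, demo_boot_banner()). It is loosely modelled on the idea behind the x86 code rather than copied from it. Detection runs once and the banner is chosen from its result, so a native boot prints a single line instead of "No Xen support".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor: a name plus a detection callback. */
struct demo_hypervisor {
    const char *name;
    bool (*detect)(void);
};

/* Stand-in for "is there an arm,xen node in the device tree?" */
static bool demo_fake_xen_present;
static bool demo_detect_xen(void) { return demo_fake_xen_present; }

static const struct demo_hypervisor demo_hypervisors[] = {
    { "Xen", demo_detect_xen },
};

/* Prints exactly one banner line; returns the detected hypervisor or NULL. */
static const struct demo_hypervisor *demo_boot_banner(void)
{
    const size_t n = sizeof(demo_hypervisors) / sizeof(demo_hypervisors[0]);

    for (size_t i = 0; i < n; i++) {
        assert(demo_hypervisors[i].detect != NULL);
        if (demo_hypervisors[i].detect()) {
            printf("Booting paravirtualised on %s\n", demo_hypervisors[i].name);
            return &demo_hypervisors[i];
        }
    }
    printf("Booting natively on bare metal\n");
    return NULL;
}
```

In a real port the stub detector would be replaced by the "arm,xen" device-tree lookup from the patch.
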
> + xen_domain_type = XEN_HVM_DOMAIN;
> +
> + if (!shared_info_page)
> + shared_info_page = (struct shared_info *)
> + get_zeroed_page(GFP_KERNEL);
> + if (!shared_info_page) {
> + pr_err("not enough memory");
> + return -ENOMEM;
> + }
> + xatp.domid = DOMID_SELF;
> + xatp.idx = 0;
> + xatp.space = XENMAPSPACE_shared_info;
> + xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> + BUG();
> +
> + HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +
> + /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> +  * page, we use it in the event channel upcall and in some pvclock
> +  * related functions. We don't need the vcpu_info placement
> +  * optimizations because we don't use any pv_mmu or pv_irq op on
> +  * HVM.
> +  * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
> +  * online but xen_hvm_init_shared_info is run at resume time too and
> +  * in that case multiple vcpus might be online. */
> + for_each_online_cpu(cpu) {
> + per_cpu(xen_vcpu, cpu) =
> + &HYPERVISOR_shared_info->vcpu_info[cpu];

On ARM the shared info contains exactly 1 CPU (the boot CPU). The guest
is required to use VCPUOP_register_vcpu_info to place vcpu info for
secondary CPUs as they are brought up.
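
A user-space simulation of that rule, with simplified and hypothetical layouts: the shared page provides a vcpu_info slot for the boot CPU only, and each secondary CPU gets private storage that is registered before its per-cpu pointer is set. In the kernel the registration step would be the VCPUOP_register_vcpu_info hypercall; here demo_register_vcpu_info() is a stub that only checks its argument.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

struct demo_vcpu_info { unsigned char evtchn_upcall_pending; };
struct demo_shared_info { struct demo_vcpu_info vcpu_info[1]; }; /* boot CPU only */

static struct demo_shared_info demo_shared;                  /* the mapped shared page */
static struct demo_vcpu_info demo_percpu_info[DEMO_NR_CPUS]; /* guest-owned storage */
static struct demo_vcpu_info *demo_xen_vcpu[DEMO_NR_CPUS];   /* per-cpu pointer */

/* Hypothetical stand-in for the VCPUOP_register_vcpu_info hypercall. */
static int demo_register_vcpu_info(int cpu, struct demo_vcpu_info *info)
{
    (void)cpu;
    return info ? 0 : -1;
}

static int demo_cpu_up(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    if (cpu == 0) {
        /* The only slot the shared page provides. */
        demo_xen_vcpu[0] = &demo_shared.vcpu_info[0];
        return 0;
    }
    /* Secondary CPUs must not index vcpu_info[cpu]; register private storage. */
    if (demo_register_vcpu_info(cpu, &demo_percpu_info[cpu]) != 0)
        return -1;
    demo_xen_vcpu[cpu] = &demo_percpu_info[cpu];
    return 0;
}
```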

Ian.


Re: [PATCH 12/24] xen/arm: Introduce xen_guest_init

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 16:33 +0100, Stefano Stabellini wrote:
> We used to rely on a core_initcall to initialize Xen on ARM; however,
> core_initcalls are actually called after early consoles are initialized.
> That means that hvc_xen.c is going to be initialized before Xen.
> 
> Given the lack of a better alternative, just call a new Xen
> initialization function (xen_guest_init) from xen_cons_init.

Can't we just arrange for this to be called super early on from
setup_arch? That's got to be better than calling it from some random
function which happens to get called early enough.

I presume that KVM is going to want some similarly early init hooks etc
and therefore ARM could benefit from the same sort of infrastructure as
is in arch/x86/include/asm/hypervisor.h?
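
One possible shape for such an early hook, sketched with hypothetical names (struct demo_hypervisor_ops, demo_setup_arch(), demo_guest_init()) rather than the actual contents of arch/x86/include/asm/hypervisor.h. The architecture runs detection and platform init once, very early, and a later caller such as the console driver just gets the cached result. The run-once flag plays the same role as the "already setup" check added in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_hypervisor_ops {
    const char *name;
    bool (*detect)(void);
    int (*init_platform)(void);
};

static int demo_init_calls;   /* counts real initialisations */

static bool demo_xen_detect(void) { return true; }   /* stub detection */
static int demo_xen_init(void) { demo_init_calls++; return 0; }

static const struct demo_hypervisor_ops demo_xen_ops = {
    "Xen", demo_xen_detect, demo_xen_init,
};

static const struct demo_hypervisor_ops *demo_active;   /* chosen once at boot */
static bool demo_init_done;
static int demo_init_result;

/* Safe to call from early arch setup and again from a console driver. */
static int demo_guest_init(void)
{
    if (demo_init_done)
        return demo_init_result;
    demo_init_done = true;
    if (demo_xen_ops.detect()) {
        demo_active = &demo_xen_ops;
        demo_init_result = demo_active->init_platform();
    }
    return demo_init_result;
}

/* Hypothetical early architecture hook, standing in for setup_arch(). */
static void demo_setup_arch(void)
{
    (void)demo_guest_init();
}
```

With this arrangement the console code no longer decides when Xen is initialised; it only observes the outcome.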


> 
> xen_guest_init has to be arch independent, so write both an ARM and an
> x86 implementation. The x86 implementation is currently empty because we
> can be sure that xen_hvm_guest_init is called early enough.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  arch/arm/xen/enlighten.c  |7 ++-
>  arch/x86/xen/enlighten.c  |8 
>  drivers/tty/hvc/hvc_xen.c |7 ++-
>  include/xen/xen.h |2 ++
>  4 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 8c923af..dc68074 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -44,7 +44,7 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>   * - one interrupt for Xen event notifications;
>   * - one memory region to map the grant_table.
>   */
> -static int __init xen_guest_init(void)
> +int __init xen_guest_init(void)
>  {
>   int cpu;
>   struct xen_add_to_physmap xatp;
> @@ -58,6 +58,10 @@ static int __init xen_guest_init(void)
>   }
>   xen_domain_type = XEN_HVM_DOMAIN;
>  
> + /* already setup */
> + if (shared_info_page != 0 && HYPERVISOR_shared_info == shared_info_page)
> + return 0;
> +
>   if (!shared_info_page)
>   shared_info_page = (struct shared_info *)
>   get_zeroed_page(GFP_KERNEL);
> @@ -88,4 +92,5 @@ static int __init xen_guest_init(void)
>   }
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(xen_guest_init);
>  core_initcall(xen_guest_init);
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..6131d43 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1567,4 +1567,12 @@ const struct hypervisor_x86 x86_hyper_xen_hvm __refconst = {
>   .init_platform  = xen_hvm_guest_init,
>  };
>  EXPORT_SYMBOL(x86_hyper_xen_hvm);
> +
> +int __init xen_guest_init(void)
> +{
> + /* do nothing: rely on x86_hyper_xen_hvm for the initialization */
> + return 0;
> + 
> +}
> +EXPORT_SYMBOL_GPL(xen_guest_init);
>  #endif
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index dc07f56..3c04fb8 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -577,6 +577,12 @@ static void __exit xen_hvc_fini(void)
>  static int xen_cons_init(void)
>  {
>   const struct hv_ops *ops;
> + int r;
> +
> + /* retrieve xen infos  */
> + r = xen_guest_init();
> + if (r < 0)
> + return r;
>  
>   if (!xen_domain())
>   return 0;
> @@ -584,7 +590,6 @@ static int xen_cons_init(void)
>   if (xen_initial_domain())
>   ops = &dom0_hvc_ops;
>   else {
> - int r;
>   ops = &domU_hvc_ops;
>  
>   if (xen_hvm_domain())
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 2c0d3a5..792a4d2 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -9,8 +9,10 @@ enum xen_domain_type {
>  
>  #ifdef CONFIG_XEN
>  extern enum xen_domain_type xen_domain_type;
> +int xen_guest_init(void);
>  #else
>  #define xen_domain_type  XEN_NATIVE
> +static inline int xen_guest_init(void) { return 0; }
>  #endif
>  
>  #define xen_domain() (xen_domain_type != XEN_NATIVE)



Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace

2012-07-27 Thread Amit Shah

On (Fri) 27 Jul 2012 [17:55:11], Yoshihiro YUNOMAE wrote:
> Hi Amit,
> 
> Thank you for commenting on our work.
> 
> (2012/07/26 20:35), Amit Shah wrote:
> >On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
> 
> [...]
> 
> >>
> >>Therefore, we propose a new system "virtio-trace", which uses enhanced
> >>virtio-serial and existing ring-buffer of ftrace, for collecting guest kernel
> >>tracing data. In this system, there are 5 main components:
> >>  (1) Ring-buffer of ftrace in a guest
> >>  - When trace agent reads ring-buffer, a page is removed from ring-buffer.
> >>  (2) Trace agent in the guest
> >>  - Splice the page of ring-buffer to read_pipe using splice() without
> >>memory copying. Then, the page is spliced from write_pipe to virtio
> >>without memory copying.
> >
> >I really like the splicing idea.
> 
> Thanks. We will improve this patch set.
> 
> >>  (3) Virtio-console driver in the guest
> >>  - Pass the page to virtio-ring
> >>  (4) Virtio-serial bus in QEMU
> >>  - Copy the page to kernel pipe
> >>  (5) Reader in the host
> >>  - Read guest tracing data via FIFO(named pipe)
> >
> >So will this be useful only if guest and host run the same kernel?
> >
> >I'd like to see the host kernel not being used at all -- collect all
> >relevant info from the guest and send it out to qemu, where it can be
> >consumed directly by apps driving the tracing.
> 
> No, this patch set is used only for guest kernels, so guest and host
> don't need to run the same kernel.

OK - that's good to know.
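
The zero-copy path in the proposal (ring-buffer page -> pipe -> virtio-serial port) rests on splice(2). Below is a minimal user-space sketch of that two-hop pattern. demo_splice_through_pipe() is a hypothetical helper that takes arbitrary descriptors. The actual trace agent would presumably read a per-CPU ftrace buffer file and write to a virtio-serial port device; neither path is given in the thread, so both are left out here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through an internal pipe using
 * splice(2), so the data never passes through a user-space buffer.
 * Returns the number of bytes moved, or -1 on error. */
static ssize_t demo_splice_through_pipe(int in_fd, int out_fd, size_t len)
{
    int pipefd[2];
    ssize_t total = 0;

    if (pipe(pipefd) < 0)
        return -1;

    while ((size_t)total < len) {
        /* Hop 1: source -> pipe. SPLICE_F_MOVE is only a hint to the kernel. */
        ssize_t in = splice(in_fd, NULL, pipefd[1], NULL,
                            len - (size_t)total, SPLICE_F_MOVE);
        if (in == 0)
            break;              /* end of input */
        if (in < 0) {
            total = -1;
            break;
        }
        /* Hop 2: pipe -> destination, draining everything hop 1 queued. */
        while (in > 0) {
            ssize_t out = splice(pipefd[0], NULL, out_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }
done:
    close(pipefd[0]);
    close(pipefd[1]);
    return total;
}
```

Any descriptor pair works as long as splice() accepts it. With two pipes, as in a quick test, the data goes through the internal pipe without ever landing in a user-space buffer.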

> >>***Evaluation***
> >>When a host collects tracing data of a guest, the performance of using
> >>virtio-trace is compared with that of using native (just running ftrace),
> >>IVRing, and virtio-serial (normal method of read/write).
> >
> >Why is tracing performance-sensitive?  i.e. why try to optimise this
> >at all?
> 
> To minimize the effect on applications in guests when a host collects
> tracing data from them.
> For example, assume guests A and B are running on a host and share an
> I/O device. An I/O delay problem occurs in guest A but not in guest B.
> In this case, we need to collect tracing data from both guests A and B,
> but the usual method of using the network puts a high load on the
> applications in guest B even though guest B is running normally.
> Therefore, we try to decrease the load on guests.
> We also use this feature for performance analysis on production
> virtualization systems.

OK, got it.

> 
> [...]
> 
> >>
> >>***Just enhancement ideas***
> >>  - Support for trace-cmd
> >>  - Support for 9pfs protocol
> >>  - Support for non-blocking mode in QEMU
> >
> >There were patches long back (by me) to make chardevs non-blocking but
> >they didn't make it upstream.  Fedora carries them, if you want to try
> >out.  Though we want to converge on a reasonable solution that's
> >acceptable upstream as well.  Just that no one's working on it
> >currently.  Any help here will be appreciated.
> 
> Thanks! In this case, since a guest stops running while the host reads
> its trace data, the char device needs a non-blocking mode.
> I'll read your patch series. Is the latest version 8?
> http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html

I suppose the latest version on-list is what you quote above.  The
objections to the patch series are mentioned in Anthony's mails.

Hans maintains a rebased version of the patches in his tree at

http://cgit.freedesktop.org/~jwrdegoede/qemu/

those patches are included in Fedora's qemu-kvm, so you can try that
out if it improves performance for you.
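
An ordinary pipe shows why a non-blocking chardev helps. Once O_NONBLOCK is set, a writer whose reader has fallen behind gets EAGAIN instead of stalling. This is only an analogy for the guest-stall problem discussed above. demo_fill_nonblocking() is hypothetical and says nothing about how QEMU's chardev layer or the referenced patches implement it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write 4 KiB chunks to fd until the kernel would block. Returns the number
 * of bytes accepted; *would_block is set to 1 if the writer got EAGAIN
 * (i.e. it was told "not now" instead of being put to sleep). */
static size_t demo_fill_nonblocking(int fd, int *would_block)
{
    char chunk[4096] = {0};
    size_t accepted = 0;
    int flags;

    *would_block = 0;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return 0;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            accepted += (size_t)n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            *would_block = 1;   /* reader is behind: drop or retry later */
        break;
    }
    return accepted;
}
```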

> >>  - Make "vhost-serial"
> >
> >I need to understand a) why it's perf-critical, and b) why should the
> >host be involved at all, to comment on these.
> 
> a) To decrease the collection overhead for applications on a guest.
>(see above)
> b) Trace data of the host kernel is not involved even if we introduce this
>patch set.

I see, so you suggested vhost-serial only because you saw the guest
stopping problem due to the absence of non-blocking code?  If so, it
now makes sense.  I don't think we need vhost-serial in any way yet.

BTW where do you parse the trace data obtained from guests?  On a
remote host?

Thanks,
Amit

Re: [PATCH 13/24] xen/arm: get privilege status

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 16:33 +0100, Stefano Stabellini wrote:
> Use Xen features to figure out if we are privileged.
> 
> XENFEAT_dom0 was introduced by 23735 in xen-unstable.hg.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  arch/arm/xen/enlighten.c |7 +++
>  include/xen/interface/features.h |3 +++
>  2 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index dc68074..2e013cf 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -2,6 +2,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,6 +59,12 @@ int __init xen_guest_init(void)
>   }
>   xen_domain_type = XEN_HVM_DOMAIN;
>  
> + xen_setup_features();
> + if (xen_feature(XENFEAT_dom0))
> + xen_start_info->flags |= SIF_INITDOMAIN|SIF_PRIVILEGED;
> + else
> + xen_start_info->flags &= ~(SIF_INITDOMAIN|SIF_PRIVILEGED);

What happens here on platforms prior to hypervisor changeset 23735?
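
The question comes down to how a feature bit is read. Xen reports features as 32-bit submaps, and a bit the hypervisor does not know about is simply clear. The sketch below uses hypothetical demo_ helpers and illustrative flag values; only XENFEAT_dom0 = 11 is taken from the patch. It shows the consequence under the code as written: on a hypervisor without the changeset the test yields 0, and the else branch clears SIF_INITDOMAIN|SIF_PRIVILEGED. A would-be Dom0 would therefore come up as an unprivileged guest rather than fail loudly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XENFEAT_dom0        11          /* value from the patch above */
#define DEMO_NR_SUBMAPS      1          /* mirrors XENFEAT_NR_SUBMAPS */
#define DEMO_SIF_PRIVILEGED (1u << 0)   /* illustrative flag values */
#define DEMO_SIF_INITDOMAIN (1u << 1)

/* One 32-bit word per submap, as reported by the hypervisor (here supplied
 * by the caller). Unknown or out-of-range features read as "not present". */
static bool demo_xen_feature(const uint32_t submaps[DEMO_NR_SUBMAPS], unsigned flag)
{
    unsigned idx = flag / 32;

    if (idx >= DEMO_NR_SUBMAPS)
        return false;
    return (submaps[idx] >> (flag % 32)) & 1u;
}

/* Same decision as the patch: set both flags if the bit is present, else clear them. */
static uint32_t demo_start_flags(const uint32_t submaps[DEMO_NR_SUBMAPS], uint32_t flags)
{
    if (demo_xen_feature(submaps, XENFEAT_dom0))
        return flags | DEMO_SIF_INITDOMAIN | DEMO_SIF_PRIVILEGED;
    return flags & ~(DEMO_SIF_INITDOMAIN | DEMO_SIF_PRIVILEGED);
}
```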

> +
>   /* already setup */
>   if (shared_info_page != 0 && HYPERVISOR_shared_info == shared_info_page)
>   return 0;
> diff --git a/include/xen/interface/features.h 
> b/include/xen/interface/features.h
> index b6ca39a..131a6cc 100644
> --- a/include/xen/interface/features.h
> +++ b/include/xen/interface/features.h
> @@ -50,6 +50,9 @@
>  /* x86: pirq can be used by HVM guests */
>  #define XENFEAT_hvm_pirqs   10
>  
> +/* operation as Dom0 is supported */
> +#define XENFEAT_dom0  11
> +
>  #define XENFEAT_NR_SUBMAPS 1
>  
>  #endif /* __XEN_PUBLIC_FEATURES_H__ */



Re: [PATCH 18/24] xen/arm: compile blkfront and blkback

2012-07-27 Thread Ian Campbell

On Thu, 2012-07-26 at 16:34 +0100, Stefano Stabellini wrote:
> 
> +#define XEN_IO_PROTO_ABI_ARM"arm-abi" 

I wonder if we ought to call this arm-aarch32-abi or something?

I wonder if we can also take the opportunity to fix the ABI cockup for
disks on ARM and make the structs the same for both 32 and 64 bit?
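
One common way to make the structs identical for 32- and 64-bit guests is to use only fixed-width types, order members by size, add explicit padding, and let the compiler verify the offsets. The struct below is a hypothetical example, not the real blkif request layout. Its static assertions hold on ILP32 and LP64 targets alike, including x86-32 where uint64_t members are only 4-byte aligned, because no member depends on implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring request: no long, no pointers, no implicit padding. */
struct demo_ring_request {
    uint8_t  operation;       /* offset 0 */
    uint8_t  nr_segments;     /* offset 1 */
    uint16_t handle;          /* offset 2 */
    uint32_t pad0;            /* offset 4: explicit, never left to the compiler */
    uint64_t id;              /* offset 8 */
    uint64_t sector_number;   /* offset 16 */
};

/* Build-time checks: a layout change on any ABI fails compilation. */
_Static_assert(offsetof(struct demo_ring_request, id) == 8, "id must be at 8");
_Static_assert(offsetof(struct demo_ring_request, sector_number) == 16,
               "sector_number must be at 16");
_Static_assert(sizeof(struct demo_ring_request) == 24, "no tail padding on any ABI");

static size_t demo_request_size(void)
{
    return sizeof(struct demo_ring_request);
}
```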

Ian.


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Alan Cox

> +enum {
> + VMCI_SUCCESS_QUEUEPAIR_ATTACH   =  5,
> + VMCI_SUCCESS_QUEUEPAIR_CREATE   =  4,
> + VMCI_SUCCESS_LAST_DETACH=  3,
> + VMCI_SUCCESS_ACCESS_GRANTED =  2,
> + VMCI_SUCCESS_ENTRY_DEAD =  1,

We've got a nice collection of Linux error codes, thank you, and it would
make the driver enormously more readable on the Linux side if, as low
level as possible, it started using Linux error codes.


> + VMCI_SUCCESS=  0,
> + VMCI_ERROR_INVALID_RESOURCE = (-1),
> + VMCI_ERROR_INVALID_ARGS = (-2),
> + VMCI_ERROR_NO_MEM   = (-3),
> + VMCI_ERROR_DATAGRAM_FAILED  = (-4),
> + VMCI_ERROR_MORE_DATA= (-5),
> + VMCI_ERROR_NO_MORE_DATAGRAMS= (-6),
> + VMCI_ERROR_NO_ACCESS= (-7),
> + VMCI_ERROR_NO_HANDLE= (-8),
> + VMCI_ERROR_DUPLICATE_ENTRY  = (-9),
> + VMCI_ERROR_DST_UNREACHABLE  = (-10),
> + VMCI_ERROR_PAYLOAD_TOO_LARGE= (-11),
> + VMCI_ERROR_INVALID_PRIV = (-12),
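
Here is a sketch of converting at the lowest level, assuming a single translation helper at the driver boundary. The enum values below are copied from the quoted header, but demo_vmci_to_errno() and every errno choice in it are illustrative assumptions, not taken from the driver. The positive "success with detail" codes are collapsed to 0 here for simplicity; a real conversion might need to preserve them for callers that distinguish them.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the quoted VMCI header. */
enum {
    VMCI_SUCCESS_QUEUEPAIR_ATTACH = 5,
    VMCI_SUCCESS_QUEUEPAIR_CREATE = 4,
    VMCI_SUCCESS_LAST_DETACH      = 3,
    VMCI_SUCCESS_ACCESS_GRANTED   = 2,
    VMCI_SUCCESS_ENTRY_DEAD       = 1,
    VMCI_SUCCESS                  = 0,
    VMCI_ERROR_INVALID_RESOURCE   = -1,
    VMCI_ERROR_INVALID_ARGS       = -2,
    VMCI_ERROR_NO_MEM             = -3,
    VMCI_ERROR_DATAGRAM_FAILED    = -4,
    VMCI_ERROR_MORE_DATA          = -5,
    VMCI_ERROR_NO_MORE_DATAGRAMS  = -6,
    VMCI_ERROR_NO_ACCESS          = -7,
    VMCI_ERROR_NO_HANDLE          = -8,
    VMCI_ERROR_DUPLICATE_ENTRY    = -9,
    VMCI_ERROR_DST_UNREACHABLE    = -10,
    VMCI_ERROR_PAYLOAD_TOO_LARGE  = -11,
    VMCI_ERROR_INVALID_PRIV       = -12,
};

/* Illustrative mapping to negative Linux errno values. */
static int demo_vmci_to_errno(int vmci_err)
{
    if (vmci_err >= VMCI_SUCCESS)
        return 0;
    switch (vmci_err) {
    case VMCI_ERROR_INVALID_ARGS:      return -EINVAL;
    case VMCI_ERROR_NO_MEM:            return -ENOMEM;
    case VMCI_ERROR_NO_ACCESS:         return -EACCES;
    case VMCI_ERROR_INVALID_PRIV:      return -EPERM;
    case VMCI_ERROR_DUPLICATE_ENTRY:   return -EEXIST;
    case VMCI_ERROR_DST_UNREACHABLE:   return -EHOSTUNREACH;
    case VMCI_ERROR_PAYLOAD_TOO_LARGE: return -EMSGSIZE;
    case VMCI_ERROR_NO_MORE_DATAGRAMS: return -EAGAIN;
    case VMCI_ERROR_INVALID_RESOURCE:
    case VMCI_ERROR_NO_HANDLE:         return -ENOENT;
    default:                           return -EIO;   /* anything unmapped */
    }
}
```
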

Re: [Xen-devel] [RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-27 Thread Ian Campbell

On Fri, 2012-07-27 at 08:34 +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 22:47, Konrad Rzeszutek Wilk  
> >>> wrote:
> >  2). Allocate a new array, copy the existing P2M into it,
> > revector the P2M tree to use that, and return the old
> > P2M to the memory allocate. This has the advantage that
> > it sets the stage for using XEN_ELF_NOTE_INIT_P2M
> > feature. That feature allows us to set the exact virtual
> > address space we want for the P2M - and allows us to
> > boot as initial domain on large machines.
> 
> And I would hope that the tools would get updated to recognize
> this note too, so that huge DomU-s would become possible as
> well.

Does this help us with >160GB 32 bit PV guests too? I'm guessing not
since the real limitation there is the relatively small amount of kernel
address space.
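
Step 2) in the quoted plan is an allocate, copy, switch, free sequence. A minimal sketch over a flat array follows. The real P2M is a multi-level tree, and in the kernel the pointer switch would need whatever ordering guarantees concurrent readers require, which this single-threaded sketch ignores. struct demo_p2m and demo_p2m_relocate() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_p2m {
    unsigned long *entries;   /* index: guest pfn, value: machine frame */
    size_t nr;
};

/* Allocate a new (same size or larger) array, copy the old entries, switch
 * the table over to it, and give the old array back to the allocator. */
static int demo_p2m_relocate(struct demo_p2m *p2m, size_t new_nr)
{
    unsigned long *fresh, *old;

    if (new_nr < p2m->nr)
        return -1;
    /* New tail starts zeroed; a real P2M would mark unpopulated slots invalid. */
    fresh = calloc(new_nr, sizeof(*fresh));
    if (!fresh)
        return -1;
    if (p2m->nr)
        memcpy(fresh, p2m->entries, p2m->nr * sizeof(*fresh));

    old = p2m->entries;
    p2m->entries = fresh;     /* "revector": lookups now use the new array */
    p2m->nr = new_nr;
    free(old);                /* return the old storage */
    return 0;
}
```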


Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-27 Thread Larry Woodman

On 07/26/2012 11:48 PM, Larry Woodman wrote:

Mel, did you see this???

Larry

This patch looks good to me.

Larry, does Hugh's patch survive your testing?

Like I said earlier, no. However, I finally set up a reproducer that
only takes a few seconds on a large system, and this totally fixes the
problem:

- 

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c36febb..cc023b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2151,7 +2151,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			goto nomem;
 
 		/* If the pagetables are shared don't copy or take references */
-		if (dst_pte == src_pte)
+		if (*(unsigned long *)dst_pte == *(unsigned long *)src_pte)
 			continue;
 
 		spin_lock(&dst->page_table_lock);
--- 



When we compare what src_pte and dst_pte point to instead of their
addresses, everything works.

I suspect there is a missing memory barrier somewhere ???
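
The two conditions in the diff are not equivalent. dst_pte == src_pte holds only when both pointers refer to the same page-table slot, which is the shared case the comment describes. Comparing the dereferenced values also holds when two distinct tables merely contain the same entry. Here is a minimal illustration, with plain unsigned long values standing in for huge-page PTEs; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Original condition: same slot, i.e. the page-table page is shared. */
static bool demo_same_slot(const unsigned long *dst_pte, const unsigned long *src_pte)
{
    return dst_pte == src_pte;
}

/* Modified condition: same contents, whether or not the slot is shared. */
static bool demo_same_value(const unsigned long *dst_pte, const unsigned long *src_pte)
{
    return *dst_pte == *src_pte;
}
```

So the modified test also skips the copy whenever a private destination entry already equals the source entry, for example when both are still empty.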

Larry




Re: [GIT PULL] PWM subsystem for v3.6

2012-07-27 Thread Mark Brown

On Fri, Jul 27, 2012 at 07:10:54AM +0200, Thierry Reding wrote:

> At least the patch that adds me as the maintainer is Acked-by: Sascha
> Hauer, who did the original work, and Arnd Bergmann who was involved in
> the review process. Other people such as Shawn Guo and Mark Brown have
> also been reviewing these patches and new patches have been contributed
> by Eric Bénard, Axel Lin, Sachin Kamat, Alexandre Courbot, Alexandre
> Pereira da Silva and Philip Avinash.

I'm happy with it - I'm intending to push at least one driver for it
fairly shortly (well, as time allows).  I'm also comfortable that
Thierry will look after the system longer term.

Acked-by: Mark Brown 

(I did ack quite a few of the patches individually too).


[RFC PATCH v5 00/19] memory-hotplug: hot-remove physical memory

2012-07-27 Thread Wen Congyang

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info  : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource: [RFC PATCH 9/19]
  - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory  : [RFC PATCH 12/19]
  - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

change log of v5:
 * merge the patchset that clears page tables and the patchset that hot-removes
   memory (from Ishimatsu) into one big patchset.

 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()

 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines pages,
 updates the memory block's state, and notifies userspace that the memory
 block's state has changed.

 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too.

 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things done
 in the function __add_zone().

 [RFC PATCH v5 18/19]
   * flush work before resetting node device.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being discussed
   on another thread. But the patch is needed for testing the patch series,
   so I added it as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check memory is online or not at remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
 getting node id
 
 [RFC PATCH v4 3/13]
   * create new patch : check memory is online or not at online_pages()

 [RFC PATCH v4 4/13]
   * add __ref section to remove_memory()
   * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

 [RFC PATCH v4 11/13]
   * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
   * remove extra kobject_put()

   * Wen commented on the patch. Wen's comment is
 "acpi_memory_device_remove() should ignore the return value of
 remove_memory() since the caller does not care about the return value".
 But I did not change it since I think the caller should care about the
 return value. I am trying to fix it as follows:

 https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
   * remove a firmware_memmap_entry allocated by kzalloc()

change log of v2:
 [RFC PATCH v2 2/13]
   * check whether memory block is offline or not before calling offline_memory()
   * check whether section is valid or not in is_memblk_offline()
   * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
   * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
   * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
  * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
  * fix wrong arguments of free_pages()


Wen Congyang (5):
  memory-hotplug: implement offline_memory()
  memory-hotplug: store the node id in acpi_memory_device
  memory-hotplug: export the function acpi_bus_remove()
  memory-hotplug: call acpi_bus_remove() to remove memory device
  memory-hotplug: introduce new function arch_remove_memory()

Yasuaki Ishimatsu (14):
  memory-hotplug: rename remove_memory() to
offline_memory()/offline_pages()
  memory-hotplug: offline and remove memory when removing the memory
device
  memory-hotplug: check whether memory is present or not
  memory-hotplug: remove /sys/firmware/memmap/X sysfs
  memory-hotplug: does not release memory region in PAGES_PER_SECTION
chunks
  memory-hotplug: add memory_block_release
  memory-hotplug: remove_memory calls __remove_pages
  memory-hotplug: check page type in get_page_bootmem
  memory-hotplug: move register_page_bootmem_info_node and
put_page_bootmem for sparse-vmemmap
  memory-hotplug: implement register_page_bootmem_info_section of
sparse-vmemmap
  memory-hotplug: free memmap of sparse-vmemmap
  memory_hotplug: clear zone when the memory is removed
  memory-hotplug: add node_device_release
  memory-hotplug: remove sysfs file of node

 arch/ia64/mm/init.c |   16 +
 arch/powerpc/mm/mem.c   |   14 +
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/s390/mm/init.c |8 +
 arch/sh/mm/init.c   |   15 +
 arch/tile/mm/init.c |8 +
 arch/x86/include/asm/pgtable_types.h|1 +
 arch/x86/mm/init_32.c   |   10 +
 arch/x86/mm/init_64.c   |  333 ++
 arch/x86/mm/pageattr.c

Re: [PATCH] sd: do not set changed flag on all unit attention conditions

2012-07-27 Thread Hannes Reinecke

On 07/17/2012 11:59 PM, James Bottomley wrote:
> On Tue, 2012-07-17 at 12:36 -0400, Christoph Hellwig wrote:
>> On Tue, Jul 17, 2012 at 10:11:57AM +0100, James Bottomley wrote:
>>> There's no such thing in the market today as a removable disk that's
>>> resizeable.  Removable disks are for things like backup cartridges and
>>> ageing jazz drives.  Worse: most removable devices today are USB card
>>> readers whose standards compliance varies from iffy to nonexistent.
>>> Resizeable disks are currently the province of storage arrays.
>>
>> The virtual disks exported by aacraid are both marked removable and
>> can be resized.
> 
> So what are the properties of these things? ... or is this just an instance
> of a RAID manufacturer hacking around a problem by adding a removable
> flag?
> 
Presumably.

The general intention is to automatically catch any disk resizing.
As the SCSI stack ignores (or used to ignore) these events, that was
their way of working around it.

Curiously, though, the aacraid driver is the only one doing this,
plus the process is quite involved (using a proprietary application
for doing so, etc.).

None of the FC drivers do this, despite the fact that resizing a disk
is even easier there.

I even tried to remove that line once, but then got told off by (then)
Adaptec that I would break their apps.
Since then there's a patch in the SLES kernel for adding a module
option switching off this behaviour.

We should ask Adaptec/PMC-Sierra here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


[RFC PATCH 0/19] firmware_map : unify argument of firmware_map_add_early/hotplug

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

There are two ways to create /sys/firmware/memmap/X sysfs:

  - firmware_map_add_early
When the system starts, it is called from e820_reserve_resources()
  - firmware_map_add_hotplug
When the memory is hot plugged, it is called from add_memory()

But these functions are called with inconsistent values for the end argument:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to "start + size". Even with the patch applied, the
content of the /sys/firmware/memmap/X/end file does not change.

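A small sketch of the convention the patch settles on: callers pass an exclusive end (start + size), and the entry stores an inclusive end (end - 1), which is why the sysfs end file is unchanged. struct demo_fw_entry and demo_fw_map_add() are hypothetical. Unlike the patch's BUG_ON(start > end), the sketch also rejects an empty range, because end - 1 would otherwise fall below start.

```c
#include <assert.h>
#include <stdint.h>

struct demo_fw_entry {
    uint64_t start;
    uint64_t end;   /* inclusive, as shown in /sys/firmware/memmap/X/end */
};

/* end_excl is exclusive (start + size), as both callers pass it after the patch. */
static int demo_fw_map_add(struct demo_fw_entry *e, uint64_t start, uint64_t end_excl)
{
    if (start >= end_excl)
        return -1;              /* empty or inverted range */
    e->start = start;
    e->end = end_excl - 1;      /* convert to the inclusive form that is exported */
    return 0;
}
```

For an e820 entry at 0x100000 of size 0x7ff00000, the caller passes 0x80000000 and the stored end is 0x7fffffff, the same value the early path exported before the change.
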
CC: Thomas Gleixner 
CC: Ingo Molnar 
CC: H. Peter Anvin 
CC: Tejun Heo 
CC: Andrew Morton 
Reviewed-by: Dave Hansen 
Signed-off-by: Yasuaki Ishimatsu 

---
 arch/x86/kernel/e820.c|2 +-
 drivers/firmware/memmap.c |8 
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-3.5-rc6/arch/x86/kernel/e820.c
===
--- linux-3.5-rc6.orig/arch/x86/kernel/e820.c	2012-07-18 17:19:38.391365260 +0900
+++ linux-3.5-rc6/arch/x86/kernel/e820.c	2012-07-18 17:19:43.616300222 +0900
@@ -944,7 +944,7 @@ void __init e820_reserve_resources(void)
for (i = 0; i < e820_saved.nr_map; i++) {
struct e820entry *entry = &e820_saved.map[i];
firmware_map_add_early(entry->addr,
-   entry->addr + entry->size - 1,
+   entry->addr + entry->size,
e820_type_to_string(entry->type));
}
 }
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c	2012-07-18 17:19:38.388365299 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c	2012-07-18 18:30:47.608390251 +0900
@@ -98,7 +98,7 @@ static LIST_HEAD(map_entries);
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
  * entry.
@@ -113,7 +113,7 @@ static int firmware_map_add_entry(u64 st
BUG_ON(start > end);
 
entry->start = start;
-   entry->end = end;
+   entry->end = end - 1;
entry->type = type;
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
@@ -148,7 +148,7 @@ static int add_sysfs_fw_map_entry(struct
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function is for memory hotplug, it is
@@ -175,7 +175,7 @@ int __meminit firmware_map_add_hotplug(u
 /**
  * firmware_map_add_early() - Adds a firmware mapping entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function uses the bootmem allocator




Re: [Xen-devel] [RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-27 Thread Jan Beulich

>>> On 27.07.12 at 12:00, Ian Campbell  wrote:
> On Fri, 2012-07-27 at 08:34 +0100, Jan Beulich wrote:
>> >>> On 26.07.12 at 22:47, Konrad Rzeszutek Wilk  
>> >>> wrote:
>> >  2). Allocate a new array, copy the existing P2M into it,
>> > revector the P2M tree to use that, and return the old
>> > P2M to the memory allocate. This has the advantage that
>> > it sets the stage for using XEN_ELF_NOTE_INIT_P2M
>> > feature. That feature allows us to set the exact virtual
>> > address space we want for the P2M - and allows us to
>> > boot as initial domain on large machines.
>> 
>> And I would hope that the tools would get updated to recognize
>> this note too, so that huge DomU-s would become possible as
>> well.
> 
> Does this help us with >160GB 32 bit PV guests too? I'm guessing not
> since the real limitation there is the relatively small amount of kernel
> address space.

Correct - 32-bit PV guests are limited anyway (and it's for a
reason the Dom0 support in the hypervisor only deals with
64-bit ones). And honestly, considering the huge page
information table such a memory amount would require, I
doubt this big a PV guest would even boot (or if it does, be
of any use).
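
The back-of-the-envelope numbers behind this point fit in a small C helper. The assumptions are 4 KiB pages, 4-byte P2M entries for a 32-bit guest, and 32 bytes of per-page metadata, which is a typical order of magnitude for struct page on 32-bit builds; the real size depends on the configuration. At 160 GiB this gives 40 Mi frames, a 160 MiB P2M and 1.25 GiB of per-page metadata. That already exceeds the kernel address space a 32-bit guest has with the usual 3G/1G split.

```c
#include <assert.h>
#include <stdint.h>

struct demo_footprint {
    uint64_t frames;        /* number of guest page frames */
    uint64_t p2m_bytes;     /* one P2M entry per frame */
    uint64_t memmap_bytes;  /* per-page metadata, e.g. a struct page array */
};

/* All sizes are caller-supplied assumptions, not values read from a kernel. */
static struct demo_footprint demo_estimate(uint64_t guest_bytes, uint64_t page_size,
                                           uint64_t p2m_entry_size, uint64_t page_meta_size)
{
    struct demo_footprint f;

    assert(page_size != 0);
    f.frames = guest_bytes / page_size;
    f.p2m_bytes = f.frames * p2m_entry_size;
    f.memmap_bytes = f.frames * page_meta_size;
    return f;
}
```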

Jan


[PATCH V2 0/6] Per-cgroup page stat accounting

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

Hi, list

This V2 patch series provides the ability for each memory cgroup to have
independent dirty/writeback page statistics, which can provide information
for per-cgroup direct reclaim and similar uses.

In the first three preparatory patches, we do some cleanup and rework the vfs
set-page-dirty routines so that "modify page info" and "dirty page accounting"
stay in one function as much as possible, for the sake of the bigger memcg lock
(test numbers are in the specific patch).

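The "modify page info and dirty page accounting in one function" idea can be sketched with hypothetical names (struct demo_page, struct demo_memcg, demo_set_page_dirty()) and C11 atomics in place of kernel primitives. The flag is test-and-set atomically, in the spirit of TestSetPageDirty(), and only the caller that makes the clean-to-dirty transition updates the global and per-cgroup counters. The sketch deliberately leaves out the memcg locking that keeps a page's cgroup stable during accounting, which the series also has to deal with.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_memcg {
    atomic_long nr_dirty;          /* per-cgroup dirty page count */
};

struct demo_page {
    atomic_bool dirty;
    struct demo_memcg *memcg;      /* owning cgroup */
};

static atomic_long demo_global_dirty;   /* stand-in for the global dirty counter */

/* Returns true if this call dirtied the page. */
static bool demo_set_page_dirty(struct demo_page *page)
{
    /* Atomically set the flag and learn its previous value. */
    if (atomic_exchange(&page->dirty, true))
        return false;              /* already dirty: nothing to account */
    /* Only the 0 -> 1 transition updates the statistics, in the same place. */
    atomic_fetch_add(&demo_global_dirty, 1);
    atomic_fetch_add(&page->memcg->nr_dirty, 1);
    return true;
}
```
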
Kame, I tested these patches on mainline v3.5, because I cannot boot the
kernel under linux-next :(. But these patches are cooked on top of your recent
memcg patches (I backported them to mainline), and I think there are no
conflicts with the mm tree. So if there's no other problem, I think it could
be considered for merging.



Following is a performance comparison before and after the series:

Test steps(Mem-24g, ext4):
drop_cache; sync
cat /proc/meminfo|grep Dirty (=4KB)
fio (buffered/randwrite/bs=4k/size=128m/filesize=1g/numjobs=8/sync) 
cat /proc/meminfo|grep Dirty(=648696kB)

We ran the test 10 times and averaged the numbers:
Before:
write: io=1024.0MB, bw=334678 KB/s, iops=83669.2 , runt=  3136 msec
lat (usec): min=1 , max=26203.1 , avg=81.473, stdev=275.754

After:
write: io=1024.0MB, bw=325219 KB/s, iops= 81304.1 , runt=  3226.9 msec
lat (usec): min=1 , max=17224 , avg=86.194, stdev=298.183



There is about a 2.8% performance decrease. But I notice that once memcg is
enabled, the root_memcg exists and all allocated pages belong to it, so they
go through the root memcg statistics routines, which brings some overhead.
Moreover, in the case of memcg_is_enable && no cgroups, we can get the root
memcg stats just from the global numbers, which avoids both the accounting
overhead and many if-test overheads. Later I'll look into it further.
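The "about 2.8%" figure can be checked against the averaged bandwidth and IOPS numbers above. A minimal C sketch of that relative-change arithmetic; pct_decrease() and print_fio_delta() are illustrative helpers, not part of the series:

```c
#include <stdio.h>

/* Relative slowdown, in percent, going from `before` to `after`. */
double pct_decrease(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print the slowdown for the averaged fio results quoted above. */
void print_fio_delta(void)
{
	printf("bw:   %.2f%%\n", pct_decrease(334678.0, 325219.0));
	printf("iops: %.2f%%\n", pct_decrease(83669.2, 81304.1));
}
```

Both the bandwidth and the IOPS figures work out to roughly 2.83%, consistent with the number given in the text.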

Any comments are welcome. :)



Change log:
v2 <-- v1:
1. add test numbers
2. some small fixes and comments

Sha Zhengju (6):
memcg-remove-MEMCG_NR_FILE_MAPPED.patch
Make-TestSetPageDirty-and-dirty-page-accounting-in-o.patch
Use-vfs-__set_page_dirty-interface-instead-of-doing-.patch
memcg-add-per-cgroup-dirty-pages-accounting.patch
memcg-add-per-cgroup-writeback-pages-accounting.patch
memcg-Document-cgroup-dirty-writeback-memory-statist.patch

 Documentation/cgroups/memory.txt |2 +
 fs/buffer.c  |   36 +++
 fs/ceph/addr.c   |   20 +
 include/linux/buffer_head.h  |2 +
 include/linux/memcontrol.h   |   30 ++-
 mm/filemap.c |9 ++
 mm/memcontrol.c  |   58 +++---
 mm/page-writeback.c  |   48 ---
 mm/rmap.c|4 +-
 mm/truncate.c|6 
 10 files changed, 141 insertions(+), 74 deletions(-)


[RFC PATCH v5 01/19] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline >/sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change the memory block's state and notify
userspace that the memory block's state has changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). In the 1st
case, offline_memory() will be used; it is not implemented yet. In the 2nd
case, offline_pages() will be used.
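After the rename, offline_pages() takes a start PFN and a page count directly, while the old remove_memory() took a byte range and converted it with PFN_DOWN(). A small user-space sketch of that conversion, assuming 4 KiB pages (a PAGE_SHIFT of 12); the DEMO_ macros and bytes_to_pfn_range() are hypothetical stand-ins that only mirror what the removed code computed:

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the real PAGE_SHIFT is architecture-dependent. */
#define DEMO_PAGE_SHIFT 12

/* Round a byte address down to a page frame number, like PFN_DOWN(). */
#define DEMO_PFN_DOWN(x) ((uint64_t)(x) >> DEMO_PAGE_SHIFT)

struct pfn_range {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/*
 * Convert a [start, start + size) byte range into the (start_pfn, nr_pages)
 * pair that offline_pages() expects, the same way the old remove_memory()
 * derived start_pfn and end_pfn (end_pfn = start_pfn + PFN_DOWN(size)).
 * A size that is not page-aligned is rounded down.
 */
struct pfn_range bytes_to_pfn_range(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = DEMO_PFN_DOWN(start);
	r.nr_pages = DEMO_PFN_DOWN(size);
	return r;
}
```

For example, a 128 MiB range at 4 GiB maps to start PFN 1048576 and 32768 pages.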

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |2 +-
 drivers/base/memory.c  |9 +++--
 include/linux/memory_hotplug.h |3 ++-
 mm/memory_hotplug.c|   22 ++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 81a9def..8957ed9 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
 static int
 memory_block_action(unsigned long phys_index, unsigned long action)
 {
-   unsigned long start_pfn, start_paddr;
+   unsigned long start_pfn;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
struct page *first_page;
int ret;
 
first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+   start_pfn = page_to_pfn(first_page);
 
switch (action) {
case MEM_ONLINE:
-   start_pfn = page_to_pfn(first_page);
-
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
 
ret = online_pages(start_pfn, nr_pages);
break;
case MEM_OFFLINE:
-   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_pages(start_pfn, nr_pages);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long pfn,
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 427bb29..7a6659f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -865,7 +865,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
return offlined;
 }
 
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
  unsigned long end_pfn, unsigned long timeout)
 {
unsigned long pfn, nr_pages, expire;
@@ -990,18 +990,24 @@ out:
return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-   unsigned long start_pfn, end_pfn;
+   return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
 
-   start_pfn = PFN_DOWN(start);
-   end_pfn = start_pfn + PFN_DOWN(size);
-   return offline_pages(start_pfn, end_pfn, 120 * HZ);
+int offline_memory(u64 start, u64 size)
+{
+   return -EINVAL;
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_pages(u

Re: [Xen-devel] [RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-27 Thread Ian Campbell

On Fri, 2012-07-27 at 11:17 +0100, Jan Beulich wrote:
> >>> On 27.07.12 at 12:00, Ian Campbell  wrote:
> > On Fri, 2012-07-27 at 08:34 +0100, Jan Beulich wrote:
> >> >>> On 26.07.12 at 22:47, Konrad Rzeszutek Wilk  
> >> >>> wrote:
> >> >  2). Allocate a new array, copy the existing P2M into it,
> >> > revector the P2M tree to use that, and return the old
> >> > P2M to the memory allocate. This has the advantage that
> >> > it sets the stage for using XEN_ELF_NOTE_INIT_P2M
> >> > feature. That feature allows us to set the exact virtual
> >> > address space we want for the P2M - and allows us to
> >> > boot as initial domain on large machines.
> >> 
> >> And I would hope that the tools would get updated to recognize
> >> this note too, so that huge DomU-s would become possible as
> >> well.
> > 
> > Does this help us with >160GB 32 bit PV guests too? I'm guessing not
> > since the real limitation there is the relatively small amount of kernel
> > address space.
> 
> Correct - 32-bit PV guests are limited anyway (and it's for a
> reason the Dom0 support in the hypervisor only deals with
> 64-bit ones). And honestly, considering the huge page
> information table such a memory amount would require, I
> doubt this big a PV guest would even boot (or if it does, be
> of any use).

Right.

I was actually thinking of the issue with 32 bit PV guests accessing MFN
space > 160G, even if they are themselves small, which is a separate
concern.

Ian.


[RFC PATCH v5 02/19] memory-hotplug: implement offline_memory()

2012-07-27 Thread Wen Congyang

The function offline_memory() will be called when hot removing a
memory device. The memory device may contain more than one memory
block. If the memory block has been offlined, __offline_pages()
will fail. So we should try to offline one memory block at a
time.

If the memory block is offlined in offline_memory(), we also
update its state and notify userspace that its state has
changed.

The function offline_memory() also checks each memory block's
state, so there is no need to check the memory block's state
before calling offline_memory().
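The section walk in offline_memory() visits the range one section at a time but acts on each memory block only once, skipping sections that fall inside the block it just handled. A user-space sketch of that de-duplication, assuming contiguous blocks of a fixed number of sections and treating every section as present; walk_blocks(), block_fn and count_block() are hypothetical, and the callback stands in for offline_memory_block(). This sketch stops at the first error, which is a choice made here rather than something the quoted patch shows:

```c
/* Hypothetical per-block action; returns 0 on success. */
typedef int (*block_fn)(unsigned long block_nr, void *ctx);

/*
 * Walk [start_pfn, end_pfn) one section at a time and call `fn` once per
 * memory block. pages_per_section and sections_per_block must be non-zero.
 */
int walk_blocks(unsigned long start_pfn, unsigned long end_pfn,
		unsigned long pages_per_section,
		unsigned long sections_per_block,
		block_fn fn, void *ctx)
{
	int have_block = 0;
	unsigned long cur_block = 0;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += pages_per_section) {
		unsigned long section_nr = pfn / pages_per_section;
		unsigned long block_nr = section_nr / sections_per_block;
		int ret;

		/* Same block as the previous section: already handled. */
		if (have_block && block_nr == cur_block)
			continue;

		cur_block = block_nr;
		have_block = 1;

		ret = fn(block_nr, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count how many distinct blocks were visited. */
int count_block(unsigned long block_nr, void *ctx)
{
	(void)block_nr;
	++*(int *)ctx;
	return 0;
}
```

With 8 sections and 2 sections per block, the callback runs 4 times rather than 8.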

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
CC: Vasilis Liaskovitis 
Signed-off-by: Wen Congyang 
---
 drivers/base/memory.c  |   31 +++
 include/linux/memory_hotplug.h |2 ++
 mm/memory_hotplug.c|   37 -
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 44e7de6..86c8821 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -275,13 +275,11 @@ memory_block_action(unsigned long phys_index, unsigned long action)
return ret;
 }
 
-static int memory_block_change_state(struct memory_block *mem,
+static int __memory_block_change_state(struct memory_block *mem,
unsigned long to_state, unsigned long from_state_req)
 {
int ret = 0;
 
-   mutex_lock(&mem->state_mutex);
-
if (mem->state != from_state_req) {
ret = -EINVAL;
goto out;
@@ -309,10 +307,20 @@ static int memory_block_change_state(struct memory_block *mem,
break;
}
 out:
-   mutex_unlock(&mem->state_mutex);
return ret;
 }
 
+static int memory_block_change_state(struct memory_block *mem,
+   unsigned long to_state, unsigned long from_state_req)
+{
+   int ret;
+
+   mutex_lock(&mem->state_mutex);
+   ret = __memory_block_change_state(mem, to_state, from_state_req);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
 static ssize_t
 store_mem_state(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
@@ -653,6 +661,21 @@ int unregister_memory_section(struct mem_section *section)
 }
 
 /*
+ * offline one memory block. If the memory block has been offlined, do nothing.
+ */
+int offline_memory_block(struct memory_block *mem)
+{
+   int ret = 0;
+
+   mutex_lock(&mem->state_mutex);
+   if (mem->state != MEM_OFFLINE)
+   ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
+
+/*
  * Initialize the sysfs support for memory devices...
  */
 int __init memory_dev_init(void)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c183f39..0b040bb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -10,6 +10,7 @@ struct page;
 struct zone;
 struct pglist_data;
 struct mem_section;
+struct memory_block;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 
@@ -234,6 +235,7 @@ extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory_block(struct memory_block *mem);
 extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a6659f..992454a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -997,7 +997,42 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 
 int offline_memory(u64 start, u64 size)
 {
-   return -EINVAL;
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+   int ret;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = start_pfn + PFN_DOWN(size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr >= mem->start_section_nr) &&
+   (section_nr <= mem->end_section_nr))
+   continue;
+
+   mem = find_memory_block_hinted(section, mem);
+   if (!mem)
+   continue;
+
+   ret = offline_memory_block(mem);
+

[RFC PATCH v5 03/19] memory-hotplug: store the node id in acpi_memory_device

2012-07-27 Thread Wen Congyang

The memory device has only one node id. Store the node id when
enabling the memory device so that we can reuse it when removing the
memory device.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
Reviewed-by: Yasuaki Ishimatsu 
---
 drivers/acpi/acpi_memhotplug.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8957ed9..293d718 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
 struct acpi_memory_device {
struct acpi_device * device;
unsigned int state; /* State of the memory device */
+   int nid;
struct list_head res_list;
 };
 
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
info->enabled = 1;
num_enabled++;
}
+
+   mem_device->nid = node;
+
if (!num_enabled) {
printk(KERN_ERR PREFIX "add_memory failed\n");
mem_device->state = MEMORY_INVALID_STATE;
-- 
1.7.1


[RFC PATCH v5 04/19] memory-hotplug: offline and remove memory when removing the memory device

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

We should offline and remove memory when removing the memory device.
The memory device can be removed in 2 ways:
1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called. In the 2nd
case, acpi_memory_device_remove() will be called. acpi_memory_device_remove()
will also be called when we unbind the memory device from the driver
acpi_memhotplug. If the type is ACPI_BUS_REMOVAL_EJECT, it means
that the user wants to eject the memory device, and we should offline
and remove memory in acpi_memory_device_remove().

The function remove_memory() is not implemented yet. For now it only checks
whether all memory has been offlined.
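The new acpi_memory_device_remove_memory() helper offlines and then removes each enabled range in turn, returning on the first failure. A user-space sketch of that control flow; struct mem_range, range_op, remove_ranges() and the op_* callbacks are hypothetical stand-ins for struct acpi_memory_info, offline_memory() and remove_memory(), and the sketch omits the list_del()/kfree() bookkeeping the patch does for each entry:

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry on mem_device->res_list. */
struct mem_range {
	unsigned long long start;
	unsigned long long length;
	int enabled;
};

typedef int (*range_op)(unsigned long long start, unsigned long long length);

/*
 * Offline, then remove, every enabled range. Returns the first non-zero
 * error and leaves later ranges untouched; disabled ranges are skipped.
 */
int remove_ranges(struct mem_range *ranges, size_t n,
		  range_op offline_op, range_op remove_op)
{
	size_t i;
	int result;

	for (i = 0; i < n; i++) {
		if (!ranges[i].enabled)
			continue;

		result = offline_op(ranges[i].start, ranges[i].length);
		if (result)
			return result;

		result = remove_op(ranges[i].start, ranges[i].length);
		if (result)
			return result;
	}
	return 0;
}

/* Example operation that always succeeds. */
int op_ok(unsigned long long start, unsigned long long length)
{
	(void)start;
	(void)length;
	return 0;
}

/* Example operation that always fails with -16 (-EBUSY on Linux). */
int op_busy(unsigned long long start, unsigned long long length)
{
	(void)start;
	(void)length;
	return -16;
}
```

Because the removal step only runs after a successful offline, a busy range is never removed while still online.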

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |   42 +--
 drivers/base/memory.c  |   39 +
 include/linux/memory.h |5 
 include/linux/memory_hotplug.h |5 
 mm/memory_hotplug.c|   22 
 5 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 293d718..ed37fc2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -310,26 +311,42 @@ static int acpi_memory_powerdown_device(struct acpi_memory_device *mem_device)
return 0;
 }
 
-static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+static int
+acpi_memory_device_remove_memory(struct acpi_memory_device *mem_device)
 {
int result;
struct acpi_memory_info *info, *n;
+   int node = mem_device->nid;
 
-
-   /*
-* Ask the VM to offline this memory range.
-* Note: Assume that this function returns zero on success
-*/
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
result = offline_memory(info->start_addr, info->length);
if (result)
return result;
+
+   result = remove_memory(node, info->start_addr,
+  info->length);
+   if (result)
+   return result;
}
+
list_del(&info->list);
kfree(info);
}
 
+   return 0;
+}
+
+static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+{
+   int result;
+
+   /*
+* Ask the VM to offline this memory range.
+* Note: Assume that this function returns zero on success
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+
/* Power-off and eject the device */
result = acpi_memory_powerdown_device(mem_device);
if (result) {
@@ -478,12 +495,23 @@ static int acpi_memory_device_add(struct acpi_device *device)
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
struct acpi_memory_device *mem_device = NULL;
-
+   int result;
 
if (!device || !acpi_driver_data(device))
return -EINVAL;
 
mem_device = acpi_driver_data(device);
+
+   if (type == ACPI_BUS_REMOVAL_EJECT) {
+   /*
+* offline and remove memory only when the memory device is
+* ejected.
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+   }
+
kfree(mem_device);
 
return 0;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..038be73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(start + size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr >= mem->start_section_nr) &&
+   (section_nr <

RE: [GIT PULL] PWM subsystem for v3.6

2012-07-27 Thread Philip, Avinash

On Fri, Jul 27, 2012 at 10:40:54, Thierry Reding wrote:
> On Thu, Jul 26, 2012 at 02:11:58PM -0700, Linus Torvalds wrote:
> > On Thu, Jul 26, 2012 at 12:16 AM, Thierry Reding
> >  wrote:
> > >
> > > The new PWM subsystem aims at collecting all implementations of the
> > > legacy PWM API and to eventually replace it completely. The subsystem
> > > has been in development for over half a year now and many drivers have
> > > already been converted. It has been in linux-next for a couple of weeks
> > > and there have been no major issues so I think it is ready for inclusion
> > > in your tree.
> > 
> > For new subsystems like this, I really want ack's from the people who
> > are expected to use it.
> 
> At least the patch that adds me as the maintainer is Acked-by: Sascha
> Hauer, who did the original work, and Arnd Bergmann who was involved in
> the review process. Other people such as Shawn Guo and Mark Brown have
> also been reviewing these patches and new patches have been contributed
> by Eric Bénard, Axel Lin, Sachin Kamat, Alexandre Courbot, Alexandre
> Pereira da Silva and Philip Avinash.
> 
> I'm adding all of them on Cc so they can ack this (I'm assuming acking
> this email will suffice).
> 

I found this framework very useful for supporting the eCAP and eHRPWM drivers
on TI's AM33xx platforms.

Acked-by: Philip, Avinash 

Avinash

[PATCH V2 1/6] memcg: remove MEMCG_NR_FILE_MAPPED

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

While accounting memcg page stats, it's not worth using MEMCG_NR_FILE_MAPPED
as an extra layer of indirection, because of the complexity and presumed
performance overhead. We can use MEM_CGROUP_STAT_FILE_MAPPED directly.
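With the indirection gone, the enum value indexes the per-cpu counter array directly, and the patch's comment asks that mem_cgroup_stat_index and mem_cgroup_stat_names stay in step. A user-space sketch of how the length half of that pairing could be enforced at compile time with a C11 static assertion (the kernel's own tool for this would be BUILD_BUG_ON()); every DEMO_/demo_ name below is hypothetical, not a kernel identifier, and the assertion checks only that the lists have the same length, not that they are in the same order:

```c
/* Hypothetical mirror of enum mem_cgroup_stat_index. */
enum demo_stat_index {
	DEMO_STAT_CACHE,        /* pages charged as cache */
	DEMO_STAT_RSS,          /* pages charged as anon rss */
	DEMO_STAT_FILE_MAPPED,  /* pages charged as file rss */
	DEMO_STAT_SWAP,         /* pages swapped out */
	DEMO_STAT_NSTATS,
};

/* Names kept in the same order as the enum. */
static const char *const demo_stat_names[] = {
	"cache",
	"rss",
	"mapped_file",
	"swap",
};

/* Fails to compile if the two lists drift apart in length. */
_Static_assert(sizeof(demo_stat_names) / sizeof(demo_stat_names[0]) ==
	       DEMO_STAT_NSTATS,
	       "demo_stat_names must have one entry per stat");

/* Counters indexed directly by the enum, with no translation switch. */
static long demo_stat_count[DEMO_STAT_NSTATS];

void demo_update_stat(enum demo_stat_index idx, int val)
{
	demo_stat_count[idx] += val;
}

long demo_read_stat(enum demo_stat_index idx)
{
	return demo_stat_count[idx];
}

const char *demo_stat_name(enum demo_stat_index idx)
{
	return demo_stat_names[idx];
}
```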

Signed-off-by: Sha Zhengju 
Acked-by: KAMEZAWA Hiroyuki 
Acked-by: Michal Hocko 
Acked-by: Fengguang Wu 
Reviewed-by: Greg Thelen 
---
 include/linux/memcontrol.h |   28 
 mm/memcontrol.c|   25 +++--
 mm/rmap.c  |4 ++--
 3 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83e7ba9..c1e2617 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -27,9 +27,21 @@ struct page_cgroup;
 struct page;
 struct mm_struct;
 
-/* Stats that can be updated by kernel. */
-enum mem_cgroup_page_stat_item {
-   MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
+/*
+ * Statistics for memory cgroup.
+ *
+ * The corresponding mem_cgroup_stat_names is defined in mm/memcontrol.c,
+ * These two lists should keep in accord with each other.
+ */
+enum mem_cgroup_stat_index {
+   /*
+* For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
+*/
+   MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
+   MEM_CGROUP_STAT_RSS,   /* # of pages charged as anon rss */
+   MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
+   MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
+   MEM_CGROUP_STAT_NSTATS,
 };
 
 struct mem_cgroup_reclaim_cookie {
@@ -164,17 +176,17 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
-enum mem_cgroup_page_stat_item idx,
+enum mem_cgroup_stat_index idx,
 int val);
 
 static inline void mem_cgroup_inc_page_stat(struct page *page,
-   enum mem_cgroup_page_stat_item idx)
+   enum mem_cgroup_stat_index idx)
 {
mem_cgroup_update_page_stat(page, idx, 1);
 }
 
 static inline void mem_cgroup_dec_page_stat(struct page *page,
-   enum mem_cgroup_page_stat_item idx)
+   enum mem_cgroup_stat_index idx)
 {
mem_cgroup_update_page_stat(page, idx, -1);
 }
@@ -349,12 +361,12 @@ static inline void mem_cgroup_end_update_page_stat(struct page *page,
 }
 
 static inline void mem_cgroup_inc_page_stat(struct page *page,
-   enum mem_cgroup_page_stat_item idx)
+   enum mem_cgroup_stat_index idx)
 {
 }
 
 static inline void mem_cgroup_dec_page_stat(struct page *page,
-   enum mem_cgroup_page_stat_item idx)
+   enum mem_cgroup_stat_index idx)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1940ba8..aef9fb0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,21 +76,10 @@ static int really_do_swap_account __initdata = 0;
 #define do_swap_account0
 #endif
 
-
 /*
- * Statistics for memory cgroup.
+ * The corresponding mem_cgroup_stat_index is defined in include/linux/memcontrol.h,
+ * These two lists should keep in accord with each other.
  */
-enum mem_cgroup_stat_index {
-   /*
-* For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
-*/
-   MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
-   MEM_CGROUP_STAT_RSS,   /* # of pages charged as anon rss */
-   MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
-   MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
-   MEM_CGROUP_STAT_NSTATS,
-};
-
 static const char * const mem_cgroup_stat_names[] = {
"cache",
"rss",
@@ -1926,7 +1915,7 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
-enum mem_cgroup_page_stat_item idx, int val)
+enum mem_cgroup_stat_index idx, int val)
 {
struct mem_cgroup *memcg;
struct page_cgroup *pc = lookup_page_cgroup(page);
@@ -1939,14 +1928,6 @@ void mem_cgroup_update_page_stat(struct page *page,
if (unlikely(!memcg || !PageCgroupUsed(pc)))
return;
 
-   switch (idx) {
-   case MEMCG_NR_FILE_MAPPED:
-   idx = MEM_CGROUP_STAT_FILE_MAPPED;
-   break;
-   default:
-   BUG();
-   }
-
this_cpu_add(memcg->stat->count[idx], val);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 0f3b7cd..cd7e54e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1148,7 +1148,7 @@ void page_add_file_rmap(struct page *page)
mem_cg

[RFC PATCH v5 05/19] memory-hotplug: check whether memory is present or not

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

If the system supports memory hot-remove, online_pages() may online removed
pages. So online_pages() needs to check whether the pages being onlined are
present or not.
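The check is meant to cover every page frame in [pfn, pfn + nr_pages), so each iteration has to test pfn + i. A user-space sketch of that loop; present_fn, pfns_present_sketch() and below_1000() are hypothetical, with the callback standing in for pfn_present(), and an unsigned loop counter is used so the comparison against nr_pages is not a signed/unsigned mix:

```c
/* Hypothetical presence predicate standing in for pfn_present(). */
typedef int (*present_fn)(unsigned long pfn);

/* Return 0 if every pfn in [pfn, pfn + nr_pages) is present, else -22 (-EINVAL). */
int pfns_present_sketch(unsigned long pfn, unsigned long nr_pages,
			present_fn is_present)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		if (!is_present(pfn + i))
			return -22; /* -EINVAL on Linux */
	return 0;
}

/* Example predicate: frames below 1000 are present, the rest were removed. */
int below_1000(unsigned long pfn)
{
	return pfn < 1000;
}
```

Since pfn_present() under SPARSEMEM is likely answered per section, a kernel version could probably step by PAGES_PER_SECTION instead of by single pages.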

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 include/linux/mmzone.h |   19 +++
 mm/memory_hotplug.c|   13 +
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 458988b..822f705 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1168,6 +1168,25 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   int i;
+   for (i = 0; i < nr_pages; i++) {
+   if (pfn_present(pfn + i))
+   continue;
+   else
+   return -EINVAL;
+   }
+   return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5af0a9f..d510be0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* If system supports memory hot-remove, the memory may have been
+* removed. So we check whether the memory has been removed or not.
+*
+* Note: When CONFIG_SPARSEMEM is defined, pfns_present() becomes
+*   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+*   always returns 0.
+*/
+   ret = pfns_present(pfn, nr_pages);
+   if (ret) {
+   unlock_memory_hotplug();
+   return ret;
+   }
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
-- 
1.7.1


[RFC PATCH v5 06/19] memory-hotplug: export the function acpi_bus_remove()

2012-07-27 Thread Wen Congyang

The function acpi_bus_remove() can remove an ACPI device from the ACPI bus.
When an ACPI device is removed, we need to call this function to remove
the ACPI device from the ACPI bus. So export this function.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/scan.c |3 ++-
 include/acpi/acpi_bus.h |1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d1ecca2..1cefc34 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1224,7 +1224,7 @@ static int acpi_device_set_context(struct acpi_device *device)
return -ENODEV;
 }
 
-static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
if (!dev)
return -EINVAL;
@@ -1246,6 +1246,7 @@ static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 
return 0;
 }
+EXPORT_SYMBOL(acpi_bus_remove);
 
 static int acpi_add_single_object(struct acpi_device **child,
  acpi_handle handle, int type,
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index bde976e..2ccf109 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -360,6 +360,7 @@ bool acpi_bus_power_manageable(acpi_handle handle);
 bool acpi_bus_can_wakeup(acpi_handle handle);
 int acpi_power_resource_register_device(struct device *dev, acpi_handle handle);
 void acpi_power_resource_unregister_device(struct device *dev, acpi_handle handle);
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice);
 #ifdef CONFIG_ACPI_PROC_EVENT
 int acpi_bus_generate_proc_event(struct acpi_device *device, u8 type, int data);
 int acpi_bus_generate_proc_event4(const char *class, const char *bid, u8 type, int data);
-- 
1.7.1


Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-27 Thread Mel Gorman

On Thu, Jul 26, 2012 at 11:48:56PM -0400, Larry Woodman wrote:
> On 07/26/2012 02:37 PM, Rik van Riel wrote:
> >On 07/23/2012 12:04 AM, Hugh Dickins wrote:
> >
> >>I spent hours trying to dream up a better patch, trying various
> >>approaches.  I think I have a nice one now, what do you think?  And
> >>more importantly, does it work?  I have not tried to test it at all,
> >>that I'm hoping to leave to you, I'm sure you'll attack it with gusto!
> >>
> >>If you like it, please take it over and add your comments and signoff
> >>and send it in.  The second part won't come up in your testing,
> >>and could
> >>be made a separate patch if you prefer: it's a related point that struck
> >>me while I was playing with a different approach.
> >>
> >>I'm sorely tempted to leave a dangerous pair of eyes off the Cc,
> >>but that too would be unfair.
> >>
> >>Subject-to-your-testing-
> >>Signed-off-by: Hugh Dickins 
> >
> >This patch looks good to me.
> >
> >Larry, does Hugh's patch survive your testing?
> >
> >
>
> Like I said earlier, no. 

That is a surprise. Can you try your test case on 3.4 and tell us if the
patch fixes the problem there? I would like to rule out the possibility
that the locking rules are slightly different in RHEL. If it hits on 3.4
then it's also possible you are seeing a different bug, more on this later.

> However, I finally set up a reproducer
> that only takes a few seconds
> on a large system and this totally fixes the problem:
> 

The other possibility is that your reproducer case is triggering a
different race to mine. Would it be possible to post it?

> -
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c36febb..cc023b8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2151,7 +2151,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> goto nomem;
> 
> /* If the pagetables are shared don't copy or take references */
> -   if (dst_pte == src_pte)
> +   if (*(unsigned long *)dst_pte == *(unsigned long *)src_pte)
> continue;
> 
> spin_lock(&dst->page_table_lock);
> ---
> 
> When we compare what the src_pte & dst_pte point to instead of their
> addresses everything works,

The dst_pte and src_pte are pointing to the PMD page though which is what
we're meant to be checking. Your patch appears to change that to check if
they are sharing data which is quite different. This is functionally
similar to if you just checked VM_MAYSHARE at the start of the function
and bailed if so. The PTEs would be populated at fault time instead.
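The distinction here is between comparing two pointers (do both refer to the same page-table slot, i.e. a shared page-table page?) and comparing what they point at (do two slots currently hold the same entry?). A user-space sketch with plain integers standing in for page-table entries; demo_pte_t, same_slot() and same_contents() are hypothetical and nothing here models real hugetlb page tables:

```c
#include <stdbool.h>

typedef unsigned long demo_pte_t; /* stand-in for a page-table entry */

/* True only if both pointers refer to the very same slot (shared table page). */
bool same_slot(const demo_pte_t *dst, const demo_pte_t *src)
{
	return dst == src;
}

/* True whenever the two slots hold equal values, even in separate tables. */
bool same_contents(const demo_pte_t *dst, const demo_pte_t *src)
{
	return *dst == *src;
}
```

The value test is true whenever two entries happen to match, which is a weaker statement than the two mappings sharing one page-table page.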

> I suspect there is a missing memory barrier somewhere ???
> 

Possibly, but it's hard to tell whether barriers are the real problem
during fork. The copy routine is suspicious.

On the barrier side: in the normal PTE alloc routines there is a write
barrier, which is documented in __pte_alloc(). If hugepage table sharing is
successful, there is no similar barrier in huge_pmd_share() before the PUD
is populated. By rights, there should be an smp_wmb() before the page table
spinlock is taken in huge_pmd_share().
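The pairing described above (a write barrier before publishing a new table, matched by a read-side barrier in whoever follows the pointer) can be sketched as a userspace C11 analogy. This is not the kernel's smp_wmb()/page-table code: a release store stands in for "initialise, smp_wmb(), populate the upper-level entry", and an acquire load stands in for the matching read barrier. `struct table` and `pud_slot` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES 8

/* Hypothetical lower-level table that is published through a pointer. */
struct table {
	unsigned long entry[ENTRIES];
};

/* Hypothetical upper-level slot (think: the PUD entry) that readers follow.
 * Static storage duration, so it starts out as NULL. */
static _Atomic(struct table *) pud_slot;

/* Writer: fully initialise the table, then publish it.  The release order
 * keeps the initialising stores from becoming visible after the pointer,
 * which is the job the write barrier in __pte_alloc() does. */
static struct table *publish_table(void)
{
	struct table *t = malloc(sizeof(*t));

	assert(t != NULL);
	for (size_t i = 0; i < ENTRIES; i++)
		t->entry[i] = 0;
	atomic_store_explicit(&pud_slot, t, memory_order_release);
	return t;
}

/* Reader: the acquire load pairs with the release store, so a reader that
 * sees the pointer also sees the initialised entries. */
static struct table *lookup_table(void)
{
	return atomic_load_explicit(&pud_slot, memory_order_acquire);
}
```

Without the release/acquire pair, a concurrent reader could observe the new pointer but stale table contents, which is the class of problem a missing barrier in huge_pmd_share() would allow.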

The lack of a write barrier leads to a possible snarl between fork()
and fault. Take three processes: parent, child and other. Parent is
forking to create child. Other is faulting.

Other faults:
hugetlb_fault() -> huge_pte_alloc() -> allocates a PMD (write barrier)
It is about to enter hugetlb_no_page().

Parent's fork() runs at the same time.
Child shares a page table page, but NOT with the forking process (dst_pte
!= src_pte), and calls huge_pte_offset().

As it's not reading the contents of the PMD page, there is no implicit read
barrier to pair with the write barrier from hugetlb_fault() that updates
the PMD page, and they are not serialised by the page table lock. It's hard
to see exactly where that would cause a problem, though.

The thing is, in this scenario I think it's possible that page table sharing
is not correctly detected by that dst_pte == src_pte check. dst_pte !=
src_pte does not mean it's not sharing with somebody! If it's sharing
and it falls through, then it copies the src PTE even though the dst PTE
could already be populated, and updates the mapcount accordingly.
That would be a mess in its own right.

There might be two bugs here.

-- 
Mel Gorman
SUSE Labs

[RFC PATCH v5 07/19] memory-hotplug: call acpi_bus_remove() to remove memory device

2012-07-27 Thread Wen Congyang

The memory device has been ejected and powered off, so we can call
acpi_bus_remove() to remove the memory device from the ACPI bus.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index ed37fc2..755cc31 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -423,8 +423,9 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
}
 
/*
-* TBD: Invoke acpi_bus_remove to cleanup data structures
+* Invoke acpi_bus_remove() to remove memory device
 */
+   acpi_bus_remove(device, 1);
 
/* _EJ0 succeeded; _OST is not necessary */
return;
-- 
1.7.1


[RFC PATCH v5 08/19] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

When memory is (hot-)added to the system, /sys/firmware/memmap/X/{end, start,
type} sysfs files are created, but there is no code to remove these files.
This patch implements a function to remove them.

Note: The code does not free a firmware_map_entry allocated by bootmem, since
      there is no way to free memory which is allocated by bootmem.
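A small userspace model of the lookup and unlink that the patch adds. Entries appear to keep an inclusive end, which is why firmware_map_remove() searches with `end - 1`. The sketch uses a hand-rolled singly linked list instead of the kernel's list_head, returns -1 where the kernel returns -EINVAL, and `struct map_entry`, find_entry() and remove_entry() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of struct firmware_map_entry: end is inclusive. */
struct map_entry {
	uint64_t start;
	uint64_t end;          /* last byte of the range, not one past it */
	const char *type;
	struct map_entry *next;
};

/* Return the entry that matches start, end and type exactly, or NULL. */
static struct map_entry *find_entry(struct map_entry *head, uint64_t start,
				    uint64_t end, const char *type)
{
	for (struct map_entry *e = head; e != NULL; e = e->next)
		if (e->start == start && e->end == end && strcmp(e->type, type) == 0)
			return e;
	return NULL;
}

/* Remove [start, end) given an exclusive end, mirroring the end - 1
 * conversion in firmware_map_remove().  Returns 0, or -1 if not found. */
static int remove_entry(struct map_entry **head, uint64_t start,
			uint64_t end, const char *type)
{
	struct map_entry *victim;

	assert(end > start);
	victim = find_entry(*head, start, end - 1, type);
	if (victim == NULL)
		return -1;
	for (struct map_entry **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;   /* unlink; the caller owns the memory */
			break;
		}
	}
	return 0;
}
```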

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 drivers/firmware/memmap.c|   78 +-
 include/linux/firmware-map.h |6 +++
 mm/memory_hotplug.c  |9 -
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index 1296605..e03e84f 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Data types 
--
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
 };
 
+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+   struct page *page;
+
+   page = virt_to_page(entry);
+   if (PageSlab(page) || PageCompound(page))
+   kfree(entry);
+
+   /* There is no way to free memory allocated from bootmem*/
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 start, u64 end,
return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   list_for_each_entry(entry, &map_entries, list)
+   if ((entry->start == start) && (entry->end == end) &&
+   (!strcmp(entry->type, type)))
+   return entry;
+
+   return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
return firmware_map_add_entry(start, end, type, entry);
 }
 
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end:   End of the memory range.
+ * @type:  Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   entry = find_firmware_map_entry(start, end - 1, type);
+   if (!entry)
+   return -EINVAL;
+
+   firmware_map_remove_entry(entry);
+
+   /* remove the memmap entry */
+   remove_sysfs_fw_map_entry(entry);
+
+   return 0;
+}
+
 /*
  * Sysfs functions 
-
  */
@@ -218,7 +295,6 @@ static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
 }
 
 #define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
-#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
 
 static ssize_t memmap_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
diff --git a/include/linux/firmware-map.h b/include/linux/firmware-map.h
index 43fe52f..71d4fa7 100644
--- a/include/linux/firmware-map.h
+++ b/include/linux/firmware-map.h
@@ -25,6 +25,7 @@
 
 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);
 
 #else /* CONFIG_FIRMWARE_MEMMAP */
 
@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotplug(u64 start, u64 end, const char *type)
return 0;
 }
 
+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   return 0;
+}
+
 #endif /

Re: [RESEND RFC 3/3] memory-hotplug: bug fix race between isolation and allocation

2012-07-27 Thread Kamezawa Hiroyuki

(2012/07/23 9:48), Minchan Kim wrote:
> Like below, memory-hotplug makes race between page-isolation
> and page-allocation so it can hit BUG_ON in __offline_isolated_pages.
> 
>   CPU A   CPU B
> 
> start_isolate_page_range
> set_migratetype_isolate
> spin_lock_irqsave(zone->lock)
> 
>   free_hot_cold_page(Page A)
>   /* without zone->lock */
>   migratetype = get_pageblock_migratetype(Page A);
>   /*
>* Page could be moved into MIGRATE_MOVABLE
>* of per_cpu_pages
>*/
>   list_add_tail(&page->lru, 
> &pcp->lists[migratetype]);
> 
> set_pageblock_isolate
> move_freepages_block
> drain_all_pages
> 
>   /* Page A could be in MIGRATE_MOVABLE of 
> free_list. */
> 
> check_pages_isolated
> __test_page_isolated_in_pageblock
> /*
>   * We can't catch freed page which
>   * is free_list[MIGRATE_MOVABLE]
>   */
> if (PageBuddy(page A))
>   pfn += 1 << page_order(page A);
> 
>   /* So, Page A could be allocated */
> 
> __offline_isolated_pages
> /*
>   * BUG_ON hit or offline page
>   * which is used by someone
>   */
> BUG_ON(!PageBuddy(page A));
> 
> Signed-off-by: Minchan Kim 

Ah, hm. So you are saying the page on MIGRATE_MOVABLE will not be isolated
and may be used again.
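A simplified userspace model of the check under discussion: walk a pfn range and accept a free (buddy) page only if it sits on the MIGRATE_ISOLATE free list, failing otherwise. `struct fake_page`, the enum values and range_isolated() are hypothetical stand-ins for struct page, PageBuddy(), page_order() and the migratetype handling; the real __test_page_isolated_in_pageblock() handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of migrate types. */
enum fake_migratetype { FAKE_MOVABLE, FAKE_ISOLATE };

/* Hypothetical per-pfn state: free buddy head?  order?  which free list? */
struct fake_page {
	bool buddy;
	unsigned int order;
	enum fake_migratetype list;
};

/* True only if every page in [pfn, end_pfn) is a free buddy page on the
 * isolate list.  A free page still on the MOVABLE list makes the check
 * fail, since the allocator could hand it out before offlining. */
static bool range_isolated(const struct fake_page *map, unsigned long pfn,
			   unsigned long end_pfn)
{
	while (pfn < end_pfn) {
		const struct fake_page *p = &map[pfn];

		if (!p->buddy || p->list != FAKE_ISOLATE)
			return false;
		pfn += 1UL << p->order;   /* skip the whole buddy block */
	}
	return true;
}
```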


> ---
> I found this problem during code review so please confirm it.
> Kame?
> 
>   mm/page_isolation.c |5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index acf65a7..4699d1f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>   continue;
>   }
>   page = pfn_to_page(pfn);
> - if (PageBuddy(page))
> + if (PageBuddy(page)) {
> + if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> + break;

Isn't this enough? The problem is a mismatch between the migratetype and the list_head the page is on.

Thanks,
-Kame
 



[PATCH V2 2/6] Make TestSetPageDirty and dirty page accounting in one func

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

Commit a8e7d49a (Fix race in create_empty_buffers() vs __set_page_dirty_buffers())
extracts TestSetPageDirty from __set_page_dirty, putting it far away from
account_page_dirtied. But it's better to do the two operations in one single
function to keep things modular. So, to avoid the potential race mentioned in
commit a8e7d49a, we can hold private_lock until __set_page_dirty completes.
From a quick look, I believe there is no deadlock between ->private_lock and
->tree_lock.
This is a preparatory patch for the following memcg dirty page accounting patches.
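The shape being aimed for is one function that sets the dirty bit and does the accounting, with the accounting done only on the clean-to-dirty transition. A userspace sketch, assuming a C11 atomic_bool in place of PG_dirty and a plain counter in place of account_page_dirtied(); locking and the truncate check are omitted, and every name is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct fake_mapping {
	unsigned long nr_dirty;       /* plays the role of dirty accounting */
};

struct fake_page {
	atomic_bool dirty;            /* plays the role of PG_dirty */
	struct fake_mapping *mapping;
};

/* Returns 1 if this call dirtied the page, 0 if it was already dirty.
 * With no mapping, only the flag is set; otherwise the accounting runs
 * exactly once, on the transition from clean to dirty. */
static int set_page_dirty_once(struct fake_page *page)
{
	if (page->mapping == NULL)
		return !atomic_exchange(&page->dirty, true);

	if (atomic_exchange(&page->dirty, true))
		return 0;

	page->mapping->nr_dirty++;
	return 1;
}
```

Returning the "newly dirtied" result from the same function lets callers such as __set_page_dirty_buffers() drop their separate test-and-set.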


Here are some test numbers from before and after this patch:

Test steps (Mem-24g, ext4):
drop_cache; sync
fio (buffered/randwrite/bs=4k/size=128m/filesize=1g/numjobs=8/sync)

We ran the test 10 times and averaged the numbers:
Before:
write: io=1024.0MB, bw=334678 KB/s, iops=83669.2 , runt=  3136 msec
lat (usec): min=1 , max=26203.1 , avg=81.473, stdev=275.754
After:
write: io=1024.0MB, bw=331583 KB/s, iops=82895.3 , runt=  3164.4 msec
lat (usec): min=1.1 , max=19001.6 , avg=83.544, stdev=272.704

Note that the impact is small (~1%).
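The ~1% figure can be checked from the averaged numbers above with a trivial helper (hypothetical, for illustration only) that computes the relative change in percent:

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

pct_change(334678, 331583) for bandwidth and pct_change(83669.2, 82895.3) for IOPS both come out at roughly -0.92%.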

Signed-off-by: Sha Zhengju 
Reviewed-by: Michal Hocko 
---
 fs/buffer.c |   25 +
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index c7062c8..5e0b0d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -610,9 +610,15 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  * If warn is true, then emit a warning if the page is not uptodate and has
  * not been truncated.
  */
-static void __set_page_dirty(struct page *page,
+static int __set_page_dirty(struct page *page,
struct address_space *mapping, int warn)
 {
+   if (unlikely(!mapping))
+   return !TestSetPageDirty(page);
+
+   if (TestSetPageDirty(page))
+   return 0;
+
spin_lock_irq(&mapping->tree_lock);
if (page->mapping) {/* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
@@ -622,6 +628,8 @@ static void __set_page_dirty(struct page *page,
}
spin_unlock_irq(&mapping->tree_lock);
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+   return 1;
 }
 
 /*
@@ -667,11 +675,9 @@ int __set_page_dirty_buffers(struct page *page)
bh = bh->b_this_page;
} while (bh != head);
}
-   newly_dirty = !TestSetPageDirty(page);
+   newly_dirty = __set_page_dirty(page, mapping, 1);
spin_unlock(&mapping->private_lock);
 
-   if (newly_dirty)
-   __set_page_dirty(page, mapping, 1);
return newly_dirty;
 }
 EXPORT_SYMBOL(__set_page_dirty_buffers);
@@ -1119,14 +1125,9 @@ void mark_buffer_dirty(struct buffer_head *bh)
return;
}
 
-   if (!test_set_buffer_dirty(bh)) {
-   struct page *page = bh->b_page;
-   if (!TestSetPageDirty(page)) {
-   struct address_space *mapping = page_mapping(page);
-   if (mapping)
-   __set_page_dirty(page, mapping, 0);
-   }
-   }
+   if (!test_set_buffer_dirty(bh))
+   __set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0);
+
 }
 EXPORT_SYMBOL(mark_buffer_dirty);
 
-- 
1.7.1


[RFC PATCH v5 09/19] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

Since commit de7f0cba96786c was applied, release_mem_region() has been called
in PAGES_PER_SECTION chunks, on the assumption that add_memory() calls
register_memory_resource() in PAGES_PER_SECTION chunks. But that depends on
the firmware. If the _CRS entries in the ACPI DSDT table are written in
PAGES_PER_SECTION chunks, register_memory_resource() is called in
PAGES_PER_SECTION chunks. But if they are written per DIMM,
register_memory_resource() is called per DIMM. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. This patch fixes it.
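Per-section release goes wrong for per-DIMM registrations because releasing a region only succeeds for a range that exactly matches what was registered; the kernel's __release_region() warns "Trying to free nonexistent resource" otherwise. A toy model of that rule, with hypothetical names and an illustrative 128 MiB section size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8

/* Hypothetical registry of busy regions, [start, end] inclusive. */
struct region {
	unsigned long long start, end;
	bool in_use;
};

static struct region regions[MAX_REGIONS];

static int register_region(unsigned long long start, unsigned long long size)
{
	assert(size > 0);
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].in_use) {
			regions[i] = (struct region){ start, start + size - 1, true };
			return 0;
		}
	}
	return -1;
}

/* Succeeds only for an exact match of a registered range. */
static int release_region(unsigned long long start, unsigned long long size)
{
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (regions[i].in_use && regions[i].start == start &&
		    regions[i].end == start + size - 1) {
			regions[i].in_use = false;
			return 0;
		}
	}
	return -1;   /* models "Trying to free nonexistent resource" */
}
```

Registering one DIMM-sized region and then releasing it one section at a time fails on the first call, while a single release of the whole range succeeds.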

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 11d8e05..dc0a035 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 {
unsigned long start, start_pfn;
struct zone *zone;
-   int ret;
+   int i, ret;
+   int sections_to_remove;
 
start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 * to sysfs "state" file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-   return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i < sections_to_remove; i++) {
+   unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+   if (ret)
+   return ret;
+   }
 
/*
 * Update memory regions for memory remove
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5237d49..d360c5c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+   release_mem_region(phys_start_pfn << PAGE_SHIFT,  nr_pages * PAGE_SIZE);
+
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-   release_mem_region(pfn << PAGE_SHIFT,
-  PAGES_PER_SECTION << PAGE_SHIFT);
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
-- 
1.7.1


[RFC PATCH v5 10/19] memory-hotplug: add memory_block_release

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

When remove_memory_block() is called, device_release() shows the following
message:

Device 'memory528' does not have a release() function, it is broken and must
be fixed.

remove_memory_block() calls kfree(mem). I think it should be called from
device_release() instead. So this patch implements release_memory_block() and
sets it as the device's release() callback.
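This is the usual driver-model pattern: the object that embeds the device is freed from the release() callback, which runs when the last reference is dropped, and the outer structure is recovered from the embedded one with container_of(). A userspace sketch, assuming a hand-written container_of and hypothetical miniature `fake_device` / `fake_memory_block` types with a plain reference count:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct device: a refcount and a release hook. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

/* Hypothetical miniature of struct memory_block embedding the device. */
struct fake_memory_block {
	unsigned long start_section_nr;
	struct fake_device dev;
};

static int blocks_freed;

static void release_fake_memory_block(struct fake_device *dev)
{
	struct fake_memory_block *mem =
		container_of(dev, struct fake_memory_block, dev);

	free(mem);
	blocks_freed++;
}

/* Drop a reference; only the final put calls release(). */
static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

Freeing in release() rather than right after unregistering means an object that someone still holds a reference to is not freed under them.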

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 drivers/base/memory.c |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 038be73..1cd3ef3 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long start, unsigned long size)
 }
 EXPORT_SYMBOL(is_memblk_offline);
 
+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+   struct memory_block *mem = to_memory_block(dev);
+
+   kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block *memory)
 
memory->dev.bus = &memory_subsys;
memory->dev.id = memory->start_section_nr / sections_per_block;
+   memory->dev.release = release_memory_block;
 
error = device_register(&memory->dev);
return error;
@@ -674,7 +684,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
-   kfree(mem);
} else
kobject_put(&mem->dev.kobj);
 
-- 
1.7.1


[RFC PATCH v5 11/19] memory-hotplug: remove_memory calls __remove_pages

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

This patch adds __remove_pages() to remove_memory(). The range given by the
phys_start_pfn and nr_pages arguments of __remove_pages() may then span
different zones, so the zone argument is removed from __remove_pages(), and
__remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap,
so __remove_section() only calls unregister_memory_section().
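Because the range may now cross a zone boundary, the zone is looked up per section inside the loop. A userspace sketch of that loop, with the section-alignment BUG_ON()s turned into an error return; PAGES_PER_SECTION (32768, i.e. 128 MiB sections with 4 KiB pages), zone_of_pfn() and remove_section() are hypothetical stand-ins:

```c
#include <assert.h>

#define PAGES_PER_SECTION 32768UL

/* Hypothetical zone map: pfns below the boundary are zone 0, the rest zone 1. */
static unsigned long zone_boundary_pfn = 3 * PAGES_PER_SECTION;
static int removed_from_zone[2];

static int zone_of_pfn(unsigned long pfn)
{
	return pfn < zone_boundary_pfn ? 0 : 1;
}

static int remove_section(int zone, unsigned long pfn)
{
	assert(pfn % PAGES_PER_SECTION == 0);
	removed_from_zone[zone]++;
	return 0;
}

/* Remove whole sections, computing the zone from each section's own pfn. */
static int remove_pages(unsigned long start_pfn, unsigned long nr_pages)
{
	int ret = 0;

	if (start_pfn % PAGES_PER_SECTION || nr_pages % PAGES_PER_SECTION)
		return -1;      /* the kernel BUG_ON()s here instead */

	for (unsigned long i = 0; i < nr_pages / PAGES_PER_SECTION; i++) {
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;

		ret = remove_section(zone_of_pfn(pfn), pfn);
		if (ret)
			break;
	}
	return ret;
}
```

Each iteration passes its own `pfn`; passing `start_pfn` in every iteration would act on the first section over and over.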

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |5 +
 include/linux/memory_hotplug.h  |3 +--
 mm/memory_hotplug.c |   18 +++---
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index dc0a035..cc14da4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(void)
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
unsigned long start, start_pfn;
-   struct zone *zone;
int i, ret;
int sections_to_remove;
 
@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
return 0;
}
 
-   zone = page_zone(pfn_to_page(start_pfn));
-
/*
 * Remove section mappings and sysfs entries for the
 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
-   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+   ret = __remove_pages(pfn, PAGES_PER_SECTION);
if (ret)
return ret;
}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index fd84ea9..8bf820d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -90,8 +90,7 @@ extern bool is_pageblock_removable_nolock(struct page *page);
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d360c5c..a9e1579 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -275,11 +275,14 @@ static int __meminit __add_section(int nid, struct zone *zone,
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+
+   return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
unsigned long i, ret = 0;
int sections_to_remove;
+   struct zone *zone;
 
/*
 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+   zone = page_zone(pfn_to_page(pfn));
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
-- 
1.7.1


[RFC PATCH v5 12/19] memory-hotplug: introduce new function arch_remove_memory()

2012-07-27 Thread Wen Congyang

We don't call __add_pages() directly in add_memory(), because some other
architecture-related things need to be done before or after calling
__add_pages(). So we introduce a new function, arch_remove_memory(), to
revert the things done in arch_add_memory().

Note: the function is not implemented for s390 (I don't know how to
implement it for s390).

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 arch/ia64/mm/init.c  |   16 
 arch/powerpc/mm/mem.c|   14 +++
 arch/s390/mm/init.c  |8 ++
 arch/sh/mm/init.c|   15 +++
 arch/tile/mm/init.c  |8 ++
 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/mm/init_32.c|   10 ++
 arch/x86/mm/init_64.c|  160 ++
 arch/x86/mm/pageattr.c   |   47 +-
 include/linux/memory_hotplug.h   |1 +
 mm/memory_hotplug.c  |1 +
 11 files changed, 259 insertions(+), 22 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0eab454..1e345ed 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -688,6 +688,22 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
return ret;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (ret)
+   pr_warn("%s: Problem encountered in __remove_pages() as"
+   " ret=%d\n", __func__,  ret);
+
+   return ret;
+}
+#endif
 #endif
 
 /*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index baaafde..249cef4 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -133,6 +133,20 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+
+   start = (unsigned long)__va(start);
+   if (remove_section_mapping(start, start + size))
+   return -EINVAL;
+
+   return __remove_pages(start_pfn, nr_pages);
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 6adbc08..ca4bc46 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -257,4 +257,12 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /* TODO */
+   return -EBUSY;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 82cc576..fc84491 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -558,4 +558,19 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (unlikely(ret))
+   pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+   ret);
+
+   return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index ef29d6c..2749515 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
 {
return -EINVAL;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /* TODO */
+   return -EBUSY;
+}
+#endif
 #endif
 
 struct kmem_cache *pgd_cache;
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 013286a..b725af2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -334,6 +334,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
  * as a pte too.
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 575d86f..a690153 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -842,6 +842,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMO

[PATCH V2 3/6] Use vfs __set_page_dirty interface instead of doing it inside filesystem

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

From now on, SetPageDirty and dirty page accounting will be treated as one
integrated operation. Filesystems should use the VFS interface directly to
avoid handling those details themselves.

Signed-off-by: Sha Zhengju 
Acked-by: Sage Weil 
---
 fs/buffer.c |3 ++-
 fs/ceph/addr.c  |   20 ++--
 include/linux/buffer_head.h |2 ++
 3 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 5e0b0d2..ffcfb87 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  * If warn is true, then emit a warning if the page is not uptodate and has
  * not been truncated.
  */
-static int __set_page_dirty(struct page *page,
+int __set_page_dirty(struct page *page,
struct address_space *mapping, int warn)
 {
if (unlikely(!mapping))
@@ -631,6 +631,7 @@ static int __set_page_dirty(struct page *page,
 
return 1;
 }
+EXPORT_SYMBOL(__set_page_dirty);
 
 /*
  * Add a page to the dirty page list.
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b67304..d028fbe 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include/* generic_writepages */
+#include 
 #include 
 #include 
 #include 
@@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
int undo = 0;
struct ceph_snap_context *snapc;
 
-   if (unlikely(!mapping))
-   return !TestSetPageDirty(page);
-
-   if (TestSetPageDirty(page)) {
-   dout("%p set_page_dirty %p idx %lu -- already dirty\n",
-mapping->host, page, page->index);
+   if (!__set_page_dirty(page, mapping, 1))
return 0;
-   }
 
inode = mapping->host;
ci = ceph_inode(inode);
@@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
 snapc, snapc->seq, snapc->num_snaps);
spin_unlock(&ci->i_ceph_lock);
 
-   /* now adjust page */
-   spin_lock_irq(&mapping->tree_lock);
if (page->mapping) {/* Race with truncate? */
-   WARN_ON_ONCE(!PageUptodate(page));
-   account_page_dirtied(page, page->mapping);
-   radix_tree_tag_set(&mapping->page_tree,
-   page_index(page), PAGECACHE_TAG_DIRTY);
-
/*
 * Reference snap context in page->private.  Also set
 * PagePrivate so that we get invalidatepage callback.
@@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page *page)
undo = 1;
}
 
-   spin_unlock_irq(&mapping->tree_lock);
-
if (undo)
/* whoops, we failed to dirty the page */
ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
 
-   __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
BUG_ON(!PageDirty(page));
return 1;
 }
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..0a331a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head *bh)
 }
 
 extern int __set_page_dirty_buffers(struct page *page);
+extern int __set_page_dirty(struct page *page,
+   struct address_space *mapping, int warn);
 
 #else /* CONFIG_BLOCK */
 
-- 
1.7.1


[RFC PATCH v5 13/19] memory-hotplug: check page type in get_page_bootmem

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

get_page_bootmem() may be called for the same page many times. So when
get_page_bootmem() is called for a page that has already been set up, the
function only increments page->_count.
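The intended behaviour is that the first call records the bootmem type and info, and later calls for a page that already carries a valid type only take another reference. A userspace sketch of that logic, testing the type already stored in the page rather than the `type` argument; the struct, its fields and the type range (illustrative values) are hypothetical stand-ins for struct page, page->lru.next, page_private(), page->_count and MEMORY_HOTPLUG_{MIN,MAX}_BOOTMEM_TYPE:

```c
#include <assert.h>

/* Hypothetical bootmem type range (illustrative values). */
enum { MIN_BOOTMEM_TYPE = 12, MAX_BOOTMEM_TYPE = 14 };

struct fake_page {
	unsigned long type;      /* stands in for page->lru.next */
	unsigned long info;      /* stands in for page_private() */
	int count;               /* stands in for page->_count */
};

static void fake_get_page_bootmem(unsigned long info, struct fake_page *page,
				  unsigned long type)
{
	unsigned long page_type = page->type;

	assert(type >= MIN_BOOTMEM_TYPE && type <= MAX_BOOTMEM_TYPE);
	/* Only initialise a page that does not already carry a bootmem type. */
	if (page_type < MIN_BOOTMEM_TYPE || page_type > MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->info = info;
	}
	page->count++;           /* every call takes a reference */
}
```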

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 mm/memory_hotplug.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c932e1..eae946b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long) page->lru.next;
+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+   page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+   page->lru.next = (struct list_head *) type;
+   SetPagePrivate(page);
+   set_page_private(page, info);
+   atomic_inc(&page->_count);
+   } else
+   atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid
-- 
1.7.1


[PATCH] regmap: Add regmap dummy driver

2012-07-27 Thread Dimitris Papastamos

Add a pseudo-driver for debugging and stress-testing the
regmap/regcache APIs.  A standard set of tools for working
with this driver (mainly sh scripts) will be put in a repo
at https://github.com/quantumdream/regmap-tools.

Change-Id: Ie6498f18d6f9a1f7a7cf813240e87ffed0d6f047
Signed-off-by: Dimitris Papastamos 
---
 This is an initial implementation of the regdummy driver for regmap.

 This is mainly useful for debugging/stress-testing regcache as it
 removes the need for real hardware and can be done in an emulated
 environment very easily.

 There will be incremental patches adding more features, such as
 support for configurable volatile/readable/etc. registers via
 debugfs entries.
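Why a fake device is enough to exercise caching logic can be seen in a tiny userspace model: a RAM array plays the device, and a counter records how often the "bus" is touched. This is a sketch under assumptions, not the regmap/regcache API; every demo_* name is hypothetical, and the default volatile callback mirrors regdummy_volatile_reg() returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_NREGS 16  /* hypothetical register count */

/* Hypothetical model: a RAM-backed "device" plus a register cache. */
struct demo_map {
	unsigned int hw[DEMO_NREGS];     /* contents of the fake device */
	unsigned int cache[DEMO_NREGS];  /* cached copies */
	bool cached[DEMO_NREGS];         /* which cache slots are valid */
	unsigned int hw_reads;           /* device accesses, for testing */
	bool (*volatile_reg)(unsigned int reg);
};

/* Default: nothing is volatile, like regdummy_volatile_reg(). */
static bool demo_never_volatile(unsigned int reg)
{
	(void)reg;
	return false;
}

static void demo_map_init(struct demo_map *m)
{
	memset(m, 0, sizeof(*m));
	m->volatile_reg = demo_never_volatile;
}

static void demo_write(struct demo_map *m, unsigned int reg, unsigned int val)
{
	assert(reg < DEMO_NREGS);
	m->hw[reg] = val;              /* write through to the device */
	if (!m->volatile_reg(reg)) {
		m->cache[reg] = val;
		m->cached[reg] = true;
	}
}

static unsigned int demo_read(struct demo_map *m, unsigned int reg)
{
	unsigned int val;

	assert(reg < DEMO_NREGS);
	if (!m->volatile_reg(reg) && m->cached[reg])
		return m->cache[reg];  /* cache hit: no device access */
	m->hw_reads++;
	val = m->hw[reg];
	if (!m->volatile_reg(reg)) {   /* fill the cache on a miss */
		m->cache[reg] = val;
		m->cached[reg] = true;
	}
	return val;
}
```

Swapping in a volatile_reg callback that returns true for some registers makes every read of those registers hit the fake device, which is the kind of behaviour a stress test can check without hardware.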

 drivers/base/regmap/Kconfig|   8 +
 drivers/base/regmap/Makefile   |   1 +
 drivers/base/regmap/regmap-dummy.c | 599 +
 3 files changed, 608 insertions(+)
 create mode 100644 drivers/base/regmap/regmap-dummy.c

diff --git a/drivers/base/regmap/Kconfig b/drivers/base/regmap/Kconfig
index 6be390b..5a1ab02 100644
--- a/drivers/base/regmap/Kconfig
+++ b/drivers/base/regmap/Kconfig
@@ -20,3 +20,11 @@ config REGMAP_MMIO
 
 config REGMAP_IRQ
bool
+
+config REGMAP_DUMMY
+   tristate
+   select REGMAP_MMIO
+   help
+ Say Y or M if you want to add the regdummy driver for regmap.
+ This is a pseudo-driver used for debugging and stress-testing
+ the regmap/regcache APIs.
diff --git a/drivers/base/regmap/Makefile b/drivers/base/regmap/Makefile
index 5e75d1b..c5d70f1 100644
--- a/drivers/base/regmap/Makefile
+++ b/drivers/base/regmap/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_REGMAP_I2C) += regmap-i2c.o
 obj-$(CONFIG_REGMAP_SPI) += regmap-spi.o
 obj-$(CONFIG_REGMAP_MMIO) += regmap-mmio.o
 obj-$(CONFIG_REGMAP_IRQ) += regmap-irq.o
+obj-$(CONFIG_REGMAP_DUMMY) += regmap-dummy.o
diff --git a/drivers/base/regmap/regmap-dummy.c b/drivers/base/regmap/regmap-dummy.c
new file mode 100644
index 000..76310db
--- /dev/null
+++ b/drivers/base/regmap/regmap-dummy.c
@@ -0,0 +1,599 @@
+/*
+ * Register map access API - Dummy regmap driver
+ *
+ * Copyright 2012 Wolfson Microelectronics PLC.
+ *
+ * Author: Dimitris Papastamos 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DEFAULT_REGS_SIZE 1024
+
+struct regdummy_dev {
+   struct device *dev;
+   struct mutex lock;
+
+   /* Set when regdummy defaults have been modified.
+* This is useful to know so we don't reinit the
+* cache if there is no reason to do so. */
+   unsigned int dirty:1;
+
+   void *regs;
+   unsigned int regs_size;
+   unsigned int regs_size_new;
+
+   struct regmap *map;
+   struct regmap_config *config;
+   struct reg_default *regdef;
+};
+
+static struct dentry *regdummy_debugfs_root;
+
+/* Default volatile register callback, this should
+ * normally be configured by the user via a debugfs
+ * entry */
+static bool regdummy_volatile_reg(struct device *dev,
+ unsigned int reg)
+{
+   return false;
+}
+
+/* Default readable register callback, this should
+ * normally be configured by the user via a debugfs
+ * entry */
+static bool regdummy_readable_reg(struct device *dev,
+ unsigned int reg)
+{
+   return true;
+}
+
+/* Default precious register callback, this should
+ * normally be configured by the user via a debugfs
+ * entry */
+static bool regdummy_precious_reg(struct device *dev,
+ unsigned int reg)
+{
+   return false;
+}
+
+/* Calculate the length of a fixed format  */
+static size_t regmap_calc_reg_len(int max_val, char *buf, size_t buf_size)
+{
+   snprintf(buf, buf_size, "%x", max_val);
+   return strlen(buf);
+}
+
+static ssize_t regdummy_defaults_read_file(struct file *file, char __user *user_buf,
+  size_t count, loff_t *ppos)
+{
+   int reg_len, val_len, tot_len;
+   size_t buf_pos = 0;
+   loff_t p = 0;
+   ssize_t ret;
+   int i;
+   struct regdummy_dev *rdevp = file->private_data;
+   struct regmap_config *config;
+   struct reg_default *regdef;
+   unsigned int val;
+   unsigned int j;
+   unsigned int regdef_num;
+   char *buf;
+
+   if (*ppos < 0 || !count)
+   return -EINVAL;
+
+   buf = kmalloc(count, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   mutex_lock(&rdevp->lock);
+
+   config = rdevp->config;
+   regdef = rdevp->regdef;
+   regdef_num = rdevp->regs_size / config->reg_stride;
+
+   /* Calculate the length of a fixed format  */
+   reg_len = regmap_calc_reg_len(config->max_r

[PATCH V2 4/6] memcg: add per cgroup dirty pages accounting

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

This patch adds memcg routines to count dirty pages, which allows the memory
controller to maintain an accurate view of the amount of its dirty memory and
can provide some information to users while the group's direct reclaim is
working.

After Kame's commit 89c06bd5 (memcg: use new logic for page stat accounting),
we can use 'struct page' flags to test page state instead of per-page_cgroup
flags. But memcg has a feature to move a page from one cgroup to another, so
there may be a race between "move" and "page stat accounting". To avoid the
race we have designed a bigger lock:

 mem_cgroup_begin_update_page_stat()
 modify page information-->(a)
 mem_cgroup_update_page_stat()  -->(b)
 mem_cgroup_end_update_page_stat()

This requires (a) and (b) (the dirty page accounting) to stay close together,
inside the same begin/end section.

In the previous two preparatory patches, we reworked the vfs set-page-dirty
routines, and now the interfaces are more explicit:
incrementing (2):
	__set_page_dirty
	__set_page_dirty_nobuffers
decrementing (2):
	clear_page_dirty_for_io
	cancel_dirty_page
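The reason (a) and (b) must share one critical section can be sketched in userspace: if the page flag and the per-cgroup counter change under the same lock that the "move" path takes, the mover never sees a dirty page whose dirtiness has not yet been charged. This is a simplified model, not memcg code: a C11 atomic_flag spinlock stands in for memcg->move_lock, the lock is taken unconditionally (the real begin/end helpers avoid it when no move is in progress), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a cgroup with a dirty counter and a page. */
struct demo_memcg {
	long nr_dirty;
};

struct demo_page {
	bool dirty;                 /* models PageDirty() */
	struct demo_memcg *memcg;   /* owning cgroup */
};

/* Models memcg->move_lock as a simple spinlock. */
static atomic_flag demo_move_lock = ATOMIC_FLAG_INIT;

static void demo_begin_update(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_move_lock,
						 memory_order_acquire))
		;  /* spin until the holder releases it */
}

static void demo_end_update(void)
{
	atomic_flag_clear_explicit(&demo_move_lock, memory_order_release);
}

/* (a) flip the flag and (b) account it, inside one critical section. */
static bool demo_set_page_dirty(struct demo_page *page)
{
	bool newly_dirty = false;

	demo_begin_update();
	if (!page->dirty) {
		page->dirty = true;          /* (a) */
		page->memcg->nr_dirty++;     /* (b) */
		newly_dirty = true;
	}
	demo_end_update();
	return newly_dirty;
}

static bool demo_clear_page_dirty(struct demo_page *page)
{
	bool was_dirty = false;

	demo_begin_update();
	if (page->dirty) {
		page->dirty = false;                /* (a) */
		assert(page->memcg->nr_dirty > 0);  /* (a)/(b) stayed paired */
		page->memcg->nr_dirty--;            /* (b) */
		was_dirty = true;
	}
	demo_end_update();
	return was_dirty;
}

/* The mover takes the same lock, so it never sees (a) without (b). */
static void demo_move_account(struct demo_page *page, struct demo_memcg *to)
{
	demo_begin_update();
	if (page->dirty) {
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;
	demo_end_update();
}
```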


Signed-off-by: Sha Zhengju 
Acked-by: KAMEZAWA Hiroyuki 
Acked-by: Fengguang Wu 
---
 fs/buffer.c|   16 +---
 include/linux/memcontrol.h |1 +
 mm/filemap.c   |9 +
 mm/memcontrol.c|   28 +---
 mm/page-writeback.c|   31 ++-
 mm/truncate.c  |6 ++
 6 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index ffcfb87..e7b5766 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -613,11 +613,19 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
 int __set_page_dirty(struct page *page,
struct address_space *mapping, int warn)
 {
+   bool locked;
+   unsigned long flags;
+   int ret = 1;
+
if (unlikely(!mapping))
return !TestSetPageDirty(page);
 
-   if (TestSetPageDirty(page))
-   return 0;
+   mem_cgroup_begin_update_page_stat(page, &locked, &flags);
+
+   if (TestSetPageDirty(page)) {
+   ret = 0;
+   goto out;
+   }
 
spin_lock_irq(&mapping->tree_lock);
if (page->mapping) {/* Race with truncate? */
@@ -629,7 +637,9 @@ int __set_page_dirty(struct page *page,
spin_unlock_irq(&mapping->tree_lock);
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 
-   return 1;
+out:
+   mem_cgroup_end_update_page_stat(page, &locked, &flags);
+   return ret;
 }
 EXPORT_SYMBOL(__set_page_dirty);
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c1e2617..8c6b8ca 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -41,6 +41,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_RSS,   /* # of pages charged as anon rss */
MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
+   MEM_CGROUP_STAT_FILE_DIRTY,  /* # of dirty pages in page cache */
MEM_CGROUP_STAT_NSTATS,
 };
 
diff --git a/mm/filemap.c b/mm/filemap.c
index a4a5260..7f53fb0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -62,6 +62,10 @@
  *  ->swap_lock(exclusive_swap_page, others)
  *->mapping->tree_lock
  *
+ *->private_lock   (__set_page_dirty_buffers)
+ *  ->memcg->move_lock (mem_cgroup_begin_update_page_stat->move_lock_mem_cgroup)
+ *->mapping->tree_lock
+ *
  *  ->i_mutex
  *->i_mmap_mutex   (truncate->unmap_mapping_range)
  *
@@ -112,6 +116,8 @@
 void __delete_from_page_cache(struct page *page)
 {
struct address_space *mapping = page->mapping;
+   bool locked;
+   unsigned long flags;
 
/*
 * if we're uptodate, flush out into the cleancache, otherwise
@@ -139,10 +145,13 @@ void __delete_from_page_cache(struct page *page)
 * Fix it up by doing a final dirty accounting check after
 * having removed the page entirely.
 */
+   mem_cgroup_begin_update_page_stat(page, &locked, &flags);
if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
+   mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_DIRTY);
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
}
+   mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /**
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aef9fb0..cdcd547 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -85,6 +85,7 @@ static const char * const mem_cgroup_stat_names[] = {
"rss",
"mapped_file",
"swap",
+   "dirty",
 };
 
 enum mem_cgroup_events_index {
@@ -2541,6 +2542,18 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 }
 #end

[RFC PATCH v5 14/19] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

To implement register_page_bootmem_info_node() for sparse-vmemmap,
register_page_bootmem_info_node() and put_page_bootmem() are moved out of the
!CONFIG_SPARSEMEM_VMEMMAP block in memory_hotplug.c.
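The resulting layout follows a common kernel pattern: the shared entry point is always compiled, and only the configuration-specific helper gets an empty static inline stub, so callers need no #ifdefs. A minimal sketch of that pattern, where DEMO_SPARSEMEM_VMEMMAP is a hypothetical switch standing in for CONFIG_SPARSEMEM_VMEMMAP and the demo_* functions are illustrative only:

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_SPARSEMEM_VMEMMAP;
 * define it to select the stub. */
/* #define DEMO_SPARSEMEM_VMEMMAP 1 */

static int demo_registered;  /* sections whose metadata was registered */

#ifndef DEMO_SPARSEMEM_VMEMMAP
static void demo_register_info_section(unsigned long start_pfn)
{
	(void)start_pfn;
	demo_registered++;       /* real work only in this configuration */
}
#else
static inline void demo_register_info_section(unsigned long start_pfn)
{
	(void)start_pfn;         /* empty stub: callers need no #ifdefs */
}
#endif

/* Always compiled, like register_page_bootmem_info_node() after the patch. */
static void demo_register_info_node(unsigned long start_pfn,
				    unsigned long nr_sections,
				    unsigned long pages_per_section)
{
	unsigned long i;

	assert(pages_per_section > 0);
	for (i = 0; i < nr_sections; i++)
		demo_register_info_section(start_pfn + i * pages_per_section);
}
```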

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 include/linux/memory_hotplug.h |9 -
 mm/memory_hotplug.c|8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 0d500be..fe50a9b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -162,17 +162,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index eae946b..180d555 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,7 +91,6 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page *page)
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(struct pglist_data *pgdat)
register_page_bootmem_info_section(pfn);
 
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
   unsigned long end_pfn)
-- 
1.7.1


[PATCH V2 5/6] memcg: add per cgroup writeback pages accounting

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

Similar to dirty pages, we add per-cgroup writeback page accounting. The
locking rule is still:
mem_cgroup_begin_update_page_stat()
modify page WRITEBACK stat
mem_cgroup_update_page_stat()
mem_cgroup_end_update_page_stat()

There are two writeback interfaces to modify: test_clear_page_writeback() and
test_set_page_writeback().
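Both interfaces account only on an actual flag transition, which is what keeps the counter balanced when a page is set or cleared twice. A minimal single-threaded sketch of that rule (the memcg lock is omitted here, and folding NR_WRITTEN into the same hypothetical per-cgroup struct is a simplification; in the kernel it is a zone counter). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cgroup counters (NR_WRITTEN is zone-level in the
 * kernel; it is folded in here only to keep the sketch small). */
struct demo_stats {
	long writeback;
	long written;
};

struct demo_page {
	bool writeback;             /* models PageWriteback() */
	struct demo_stats *memcg;   /* owning cgroup's counters */
};

/* Like TestSetPageWriteback(): returns the old flag value and only
 * accounts a 0 -> 1 transition. */
static bool demo_test_set_writeback(struct demo_page *page)
{
	bool ret = page->writeback;

	page->writeback = true;
	if (!ret)
		page->memcg->writeback++;
	return ret;
}

/* Like TestClearPageWriteback(): only a 1 -> 0 transition is accounted. */
static bool demo_test_clear_writeback(struct demo_page *page)
{
	bool ret = page->writeback;

	page->writeback = false;
	if (ret) {
		assert(page->memcg->writeback > 0);
		page->memcg->writeback--;
		page->memcg->written++;
	}
	return ret;
}
```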

Signed-off-by: Sha Zhengju 
---
 include/linux/memcontrol.h |1 +
 mm/memcontrol.c|5 +
 mm/page-writeback.c|   17 +
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8c6b8ca..0c8a699 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -42,6 +42,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
MEM_CGROUP_STAT_FILE_DIRTY,  /* # of dirty pages in page cache */
+   MEM_CGROUP_STAT_WRITEBACK,  /* # of pages under writeback */
MEM_CGROUP_STAT_NSTATS,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdcd547..de91d3d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -86,6 +86,7 @@ static const char * const mem_cgroup_stat_names[] = {
"mapped_file",
"swap",
"dirty",
+   "writeback",
 };
 
 enum mem_cgroup_events_index {
@@ -2607,6 +2608,10 @@ static int mem_cgroup_move_account(struct page *page,
mem_cgroup_move_account_page_stat(from, to,
MEM_CGROUP_STAT_FILE_DIRTY);
 
+   if (PageWriteback(page))
+   mem_cgroup_move_account_page_stat(from, to,
+   MEM_CGROUP_STAT_WRITEBACK);
+
mem_cgroup_charge_statistics(from, anon, -nr_pages);
 
/* caller should have done css_get */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 233e7ac..6b06d5e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1956,11 +1956,17 @@ EXPORT_SYMBOL(account_page_dirtied);
 
 /*
  * Helper function for set_page_writeback family.
+ *
+ * The caller must hold mem_cgroup_begin/end_update_page_stat() lock
+ * while modifying struct page state and accounting writeback pages.
+ * See test_set_page_writeback for example.
+ *
  * NOTE: Unlike account_page_dirtied this does not rely on being atomic
  * wrt interrupts.
  */
 void account_page_writeback(struct page *page)
 {
+   mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_WRITEBACK);
inc_zone_page_state(page, NR_WRITEBACK);
 }
 EXPORT_SYMBOL(account_page_writeback);
@@ -2192,7 +2198,10 @@ int test_clear_page_writeback(struct page *page)
 {
struct address_space *mapping = page_mapping(page);
int ret;
+   bool locked;
+   unsigned long flags;
 
+   mem_cgroup_begin_update_page_stat(page, &locked, &flags);
if (mapping) {
struct backing_dev_info *bdi = mapping->backing_dev_info;
unsigned long flags;
@@ -2213,9 +,12 @@ int test_clear_page_writeback(struct page *page)
ret = TestClearPageWriteback(page);
}
if (ret) {
+   mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_WRITEBACK);
dec_zone_page_state(page, NR_WRITEBACK);
inc_zone_page_state(page, NR_WRITTEN);
}
+
+   mem_cgroup_end_update_page_stat(page, &locked, &flags);
return ret;
 }
 
@@ -2223,7 +2235,10 @@ int test_set_page_writeback(struct page *page)
 {
struct address_space *mapping = page_mapping(page);
int ret;
+   bool locked;
+   unsigned long flags;
 
+   mem_cgroup_begin_update_page_stat(page, &locked, &flags);
if (mapping) {
struct backing_dev_info *bdi = mapping->backing_dev_info;
unsigned long flags;
@@ -2250,6 +2265,8 @@ int test_set_page_writeback(struct page *page)
}
if (!ret)
account_page_writeback(page);
+
+   mem_cgroup_end_update_page_stat(page, &locked, &flags);
return ret;
 
 }
-- 
1.7.1


[RFC PATCH v5 15/19] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

To remove the memmap region of sparse-vmemmap, which was allocated from
bootmem, that region needs to be registered by get_page_bootmem(). So this
patch walks the pages of the virtual mapping and registers them with
get_page_bootmem().
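The walk in register_page_bootmem_memmap() advances in one of two step sizes: page by page when there is no PSE (4 KiB PTE mappings), or PMD by PMD with pmd_addr_end() when PSE is available (2 MiB mappings on x86-64). A userspace sketch of just the address stepping; the constants assume x86-64 with 4 KiB pages, the helper mimics the shape of the kernel's pmd_addr_end() macro, and the demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))
#define DEMO_PMD_SIZE  (2UL * 1024 * 1024)   /* 2 MiB, x86-64 with PSE */
#define DEMO_PMD_MASK  (~(DEMO_PMD_SIZE - 1))

/* Same shape as pmd_addr_end(): the next PMD boundary, capped at end.
 * Comparing "boundary - 1" keeps it correct if the boundary wraps to 0. */
static unsigned long demo_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + DEMO_PMD_SIZE) & DEMO_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Count the iterations a walk over [addr, end) would take. */
static unsigned long demo_count_steps(unsigned long addr, unsigned long end,
				      int has_pse)
{
	unsigned long next, steps = 0;

	assert(addr <= end);
	for (; addr < end; addr = next) {
		if (!has_pse)
			next = (addr + DEMO_PAGE_SIZE) & DEMO_PAGE_MASK;
		else
			next = demo_pmd_addr_end(addr, end);
		steps++;
	}
	return steps;
}
```

For a 4 MiB range starting on a 2 MiB boundary this gives 2 steps with PSE and 1024 without, which is why the PSE branch registers the PMD-mapped page once per 2 MiB.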

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 arch/x86/mm/init_64.c  |   52 
 include/linux/memory_hotplug.h |2 +
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   23 +++--
 4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f1554a9..a151145 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,58 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   unsigned long addr = (unsigned long)start_page;
+   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long next;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+
+   for (; addr < end; addr = next) {
+   pte_t *pte = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   continue;
+   get_page_bootmem(section_nr, pmd_page(*pmd),
+MIX_SECTION_INFO);
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   continue;
+   get_page_bootmem(section_nr, pte_page(*pte),
+SECTION_INFO);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   continue;
+   get_page_bootmem(section_nr, pmd_page(*pmd),
+SECTION_INFO);
+   }
+   }
+}
+
 void __meminit vmemmap_populate_print_last(void)
 {
if (p_start) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index fe50a9b..e79d744 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -164,6 +164,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
+extern void get_page_bootmem(unsigned long info, struct page *page,
+unsigned long type);
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f9f279c..716f38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1586,7 +1586,8 @@ int vmemmap_populate_basepages(struct page *start_page,
unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+ unsigned long size);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 180d555..adcc93d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,8 +91,8 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static void get_page_bootmem(unsigned long info,  struct page *page,
-unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+ unsigned long type)
 {
unsigned long page_type;
 
@@ -164,8 +164,25 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 
 }
 #else
-static inline void register_page_bootmem_info_sect

[RFC PATCH v5 16/19] memory-hotplug: free memmap of sparse-vmemmap

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

Not all pages of the virtual mapping in removed memory can be freed, since
some pages used as PGD/PUD cover not only the removed memory but also other
memory. So the patch checks whether each page can be freed.

How to check whether a page can be freed:
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs in a page mapped by a PT/PMD entry are filled with
    0xFD, that entry can be cleared. In this case, the page can be freed.

With this patch applied, the two __remove_section() variants are integrated
into one, so the CONFIG_SPARSEMEM_VMEMMAP version of __remove_section() is
deleted.
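The freeability test boils down to "poison what is being removed, then check whether the whole backing page is poison". A userspace sketch: memchr_inv() is a kernel helper, so demo_memchr_inv() below is a hand-written equivalent with the same contract (first byte that differs, or NULL), and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_INUSE 0xFD  /* same poison value as the patch */

/* Userspace equivalent of the kernel's memchr_inv(): returns a pointer
 * to the first byte that is not c, or NULL if every byte equals c. */
static const void *demo_memchr_inv(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;
	size_t i;

	for (i = 0; i < bytes; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/* Mirrors the memset() step: poison the struct pages being removed. */
static void demo_poison(void *addr, size_t len)
{
	memset(addr, DEMO_PAGE_INUSE, len);
}

/* The backing page may be freed only when all of it is poisoned,
 * i.e. no struct page stored in it still describes live memory. */
static int demo_backing_page_freeable(const void *page, size_t size)
{
	assert(size > 0);
	return demo_memchr_inv(page, DEMO_PAGE_INUSE, size) == NULL;
}
```

Poisoning only the first half of a page leaves it in use; once the second half is removed too, the whole page reads as poison and can be freed.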

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 arch/x86/mm/init_64.c |  121 +
 include/linux/mm.h|2 +
 mm/memory_hotplug.c   |   17 +--
 mm/sparse.c   |5 +-
 4 files changed, 128 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a151145..ef83955 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,127 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+   struct page **pp, int *page_size)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   void *page_addr;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return pgd_addr_end(addr, end);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return pud_addr_end(addr, end);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   return next;
+
+   *page_size = PAGE_SIZE;
+   *pp = pte_page(*pte);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   *page_size = PMD_SIZE;
+   *pp = pmd_page(*pmd);
+   }
+
+   /*
+* Removed page structs are filled with 0xFD.
+*/
+   memset((void *)addr, PAGE_INUSE, next - addr);
+
+   page_addr = page_address(*pp);
+
+   /*
+* Check whether the page is filled with 0xFD.
+* If memchr_inv() returns an address, some struct page in it is
+* still in use by other memory, so we cannot clear the PTE/PMD
+* entry and cannot free the page.
+*
+* If memchr_inv() returns NULL, the page is not used by other
+* memory, so we can clear the PTE/PMD entry and free the page.
+   if (memchr_inv(page_addr, PAGE_INUSE, *page_size)) {
+   *pp = NULL;
+   return next;
+   }
+
+   if (!cpu_has_pse)
+   pte_clear(&init_mm, addr, pte);
+   else
+   pmd_clear(pmd);
+
+   return next;
+}
+
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+   unsigned long addr = (unsigned long)memmap;
+   unsigned long end = (unsigned long)(memmap + nr_pages);
+   unsigned long next;
+   struct page *page;
+   int page_size;
+
+   for (; addr < end; addr = next) {
+   page = NULL;
+   page_size = 0;
+   next = find_and_clear_pte_page(addr, end, &page, &page_size);
+   if (!page)
+   continue;
+
+   free_pages((unsigned long)page_address(page),
+   get_order(page_size));
+   __flush_tlb_one(addr);
+   }
+
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+   unsigned long addr = (unsigned long)memmap;
+   unsigned long end = (unsigned long)(memmap + nr_pages);
+   unsigned long next;
+   struct page *page;
+   int page_size;
+   unsigned long magic;
+
+   for (; addr < end; addr = next) {
+   page = NULL;
+   page_size = 0;
+   next = find_and_clear_pte_page(addr, end, &page, &page_size);
+   if (!page)
+   continue;
+
+   magic = (unsigned long) page->lru.next;
+   if (magic == SECTION_INFO)
+   put_page_bootmem(page);
+   flush_tlb_kernel_range(addr, end);
+   }
+
+}
+
 void register_page_bootmem_memmap(unsigned long

[RFC PATCH v5 17/19] memory_hotplug: clear zone when the memory is removed

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

When memory is added, we update the zone's and pgdat's start_pfn and
spanned_pages in __add_zone(). So we should revert these when the memory is
removed. Add a new function, __remove_zone(), to do this.
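The shrinking rule in shrink_zone_span() below can be exercised in userspace with a small table of valid sections: removing the first section moves the start up to the next valid section, and removing the last one trims spanned_pages back to the last pfn of the previous valid section. This sketch covers only those two cases (the patch's final all-holes check is left out), uses unsigned long for pfns throughout, keeps the patch's convention that 0 means "not found", and uses a deliberately tiny hypothetical section size; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGES_PER_SECTION 8UL  /* tiny, for illustration only */
#define DEMO_NR_SECTIONS 16

/* Hypothetical table standing in for valid_section(). */
static bool demo_section_valid[DEMO_NR_SECTIONS];

static bool demo_valid_pfn(unsigned long pfn)
{
	assert(pfn / DEMO_PAGES_PER_SECTION < DEMO_NR_SECTIONS);
	return demo_section_valid[pfn / DEMO_PAGES_PER_SECTION];
}

/* First pfn of the lowest valid section in [start, end); 0 if none. */
static unsigned long demo_find_smallest(unsigned long start, unsigned long end)
{
	for (; start < end; start += DEMO_PAGES_PER_SECTION)
		if (demo_valid_pfn(start))
			return start;
	return 0;
}

/* Last pfn of the highest valid section in [start, end); 0 if none. */
static unsigned long demo_find_biggest(unsigned long start, unsigned long end)
{
	unsigned long pfn;

	if (end <= start)
		return 0;
	for (pfn = end - 1; pfn >= start; pfn -= DEMO_PAGES_PER_SECTION) {
		if (demo_valid_pfn(pfn))
			return pfn;
		if (pfn < DEMO_PAGES_PER_SECTION)  /* next step would wrap */
			break;
	}
	return 0;
}

struct demo_zone {
	unsigned long start_pfn;
	unsigned long spanned;
};

/* Shrink the span after removing the section [start, end). */
static void demo_shrink_zone_span(struct demo_zone *z, unsigned long start,
				  unsigned long end)
{
	unsigned long zend = z->start_pfn + z->spanned;
	unsigned long pfn;

	if (z->start_pfn == start) {
		pfn = demo_find_smallest(end, zend);
		if (pfn) {
			z->start_pfn = pfn;
			z->spanned = zend - pfn;
		}
	} else if (zend == end) {
		pfn = demo_find_biggest(z->start_pfn, start);
		if (pfn)
			z->spanned = pfn - z->start_pfn + 1;
	}
}
```

Starting from a zone spanning sections 2..5 (pfns 16..47), removing section 2 and then section 5 leaves the zone at pfns 24..39.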

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 mm/memory_hotplug.c |  181 +++
 1 files changed, 181 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 859425c..5ac035f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -300,10 +300,187 @@ static int __meminit __add_section(int nid, struct zone *zone,
return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
+static unsigned long find_smallest_section_pfn(unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   struct mem_section *ms;
+
+   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+   ms = __pfn_to_section(start_pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   return start_pfn;
+   }
+
+   return 0;
+}
+
+/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
+static unsigned long find_biggest_section_pfn(unsigned long start_pfn,
+   unsigned long end_pfn)
+{
+   struct mem_section *ms;
+   unsigned long pfn;
+
+   /* pfn is the end pfn of a memory section. */
+   pfn = end_pfn - 1;
+   for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+   ms = __pfn_to_section(pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   return pfn;
+   }
+
+   return 0;
+}
+
+static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   unsigned long zone_start_pfn =  zone->zone_start_pfn;
+   unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+   unsigned long pfn;
+   struct mem_section *ms;
+
+   zone_span_writelock(zone);
+   if (zone_start_pfn == start_pfn) {
+   /*
+* If the section is the smallest section in the zone, we need
+* to shrink zone->zone_start_pfn and zone->spanned_pages.
+* In this case, we find the second smallest valid mem_section
+* for shrinking the zone.
+*/
+   pfn = find_smallest_section_pfn(end_pfn, zone_end_pfn);
+   if (pfn) {
+   zone->zone_start_pfn = pfn;
+   zone->spanned_pages = zone_end_pfn - pfn;
+   }
+   } else if (zone_end_pfn == end_pfn) {
+   /*
+* If the section is the biggest section in the zone, we need
+* to shrink zone->spanned_pages.
+* In this case, we find the second biggest valid mem_section
+* for shrinking the zone.
+*/
+   pfn = find_biggest_section_pfn(zone_start_pfn, start_pfn);
+   if (pfn)
+   zone->spanned_pages = pfn - zone_start_pfn + 1;
+   }
+
+   /*
+* The section is neither the biggest nor the smallest mem_section
+* in the zone; it only creates a hole in the zone, so we need not
+* change the zone. But the zone may now contain only holes, so
+* check whether it still has any valid section.
+*/
+   pfn = zone_start_pfn;
+   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+   ms = __pfn_to_section(pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   /* Skip the section that is being removed. */
+   if (start_pfn == pfn)
+   continue;
+
+   /* If we find valid section, we have nothing to do */
+   zone_span_writeunlock(zone);
+   return;
+   }
+
+   /* The zone has no valid section */
+   zone->zone_start_pfn = 0;
+   zone->spanned_pages = 0;
+   zone_span_writeunlock(zone);
+}
+
+static void shrink_pgdat_span(struct pglist_data *pgdat,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+   unsigned long pgdat_start_pfn =  pgdat->node_start_pfn;
+   unsigned long pgdat_end_pfn =
+   pgdat->node_start_pfn + pgdat->node_spanned_pages;
+   unsigned long pfn;
+   struct mem_section *ms;
+
+   if (pgdat_start_pfn == start_pfn) {
+   /*
+* If the section is the smallest section in the pgdat, we need to
+* shrink pgdat->node_start_pfn

[RFC PATCH v5 18/19] memory-hotplug: add node_device_release

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

When unregister_node() is called, device_release() shows the following
message:

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So this patch implements node_device_release().
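The fix relies on the usual driver-model idiom: the struct device is embedded in a larger object, the release hook recovers that object with container_of(), and the hook is installed before registration. A userspace sketch of the idiom; demo_container_of() re-creates container_of() with offsetof(), and demo_put_last_ref() returning -1 is only a stand-in for the driver core's warning about a missing release(). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct device with a release hook. */
struct demo_device {
	int id;
	void (*release)(struct demo_device *dev);
};

/* Hypothetical miniature of struct node, embedding its device. */
struct demo_node {
	int nr_pages;
	struct demo_device dev;
};

/* Models node_device_release(): recover the outer object and reset it,
 * as the patch's memset() does. */
static void demo_node_release(struct demo_device *dev)
{
	struct demo_node *node = demo_container_of(dev, struct demo_node, dev);

	memset(node, 0, sizeof(*node));
}

/* Models dropping the last reference. The real driver core warns when
 * ->release is missing; returning -1 here stands in for that warning. */
static int demo_put_last_ref(struct demo_device *dev)
{
	assert(dev != NULL);
	if (!dev->release)
		return -1;
	dev->release(dev);
	return 0;
}
```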

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/base/node.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..9bc2f57 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -252,6 +252,13 @@ static inline void hugetlb_register_node(struct node *node) {}
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif
 
+static void node_device_release(struct device *dev)
+{
+   struct node *node_dev = to_node(dev);
+
+   flush_work(&node_dev->node_work);
+   memset(node_dev, 0, sizeof(struct node));
+}
 
 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +272,7 @@ int register_node(struct node *node, int num, struct node *parent)
 
node->dev.id = num;
node->dev.bus = &node_subsys;
+   node->dev.release = node_device_release;
error = device_register(&node->dev);
 
if (!error){
-- 
1.7.1


[RFC PATCH v5 19/19] memory-hotplug: remove sysfs file of node

2012-07-27 Thread Wen Congyang

From: Yasuaki Ishimatsu 

This patch adds node_set_offline() and unregister_one_node() calls to
remove_memory() so that the node's sysfs files are removed once the node has
no present pages left.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 mm/memory_hotplug.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ac035f..5681968 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1267,6 +1267,11 @@ int __ref remove_memory(int nid, u64 start, u64 size)
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
 
+   if (!node_present_pages(nid)) {
+   node_set_offline(nid);
+   unregister_one_node(nid);
+   }
+
arch_remove_memory(start, size);
 out:
unlock_memory_hotplug();
-- 
1.7.1


[PATCH V2 6/6] memcg: Document cgroup dirty/writeback memory statistics

2012-07-27 Thread Sha Zhengju

From: Sha Zhengju 

Signed-off-by: Sha Zhengju 
Acked-by: KAMEZAWA Hiroyuki 
Acked-by: Michal Hocko 
Acked-by: Fengguang Wu 
---
 Documentation/cgroups/memory.txt |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index dd88540..f4b5778 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -420,6 +420,8 @@ pgpgin  - # of charging events to the memory cgroup. The charging
 pgpgout- # of uncharging events to the memory cgroup. The uncharging
event happens each time a page is unaccounted from the cgroup.
 swap   - # of bytes of swap usage
+dirty  - # of bytes of file cache that are not in sync with the disk copy.
+writeback  - # of bytes of file/anon cache that are queued for syncing to disk.
 inactive_anon  - # of bytes of anonymous memory and swap cache memory on
LRU list.
 active_anon- # of bytes of anonymous and swap cache memory on active
-- 
1.7.1
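For reference, userspace reads these counters from a cgroup's memory.stat file, which lists one "name value" pair per line. A minimal C sketch of looking up a field follows; memcg_stat_lookup() is a hypothetical helper, not an existing library call, and the dirty and writeback fields only appear on kernels that carry this patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan memory.stat-style text ("name value" per
 * line) for @key and store its value in @out. Returns true if found.
 */
static bool memcg_stat_lookup(const char *text, const char *key,
			      unsigned long long *out)
{
	char name[64];
	unsigned long long val;
	int consumed;

	/* %n records how many characters this pair consumed */
	while (sscanf(text, "%63s %llu%n", name, &val, &consumed) == 2) {
		if (strcmp(name, key) == 0) {
			*out = val;
			return true;
		}
		text += consumed;	/* advance past this pair */
	}
	return false;
}
```

In practice the text would come from reading the cgroup's memory.stat file; the parser itself does not care where the buffer came from.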


Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime

2012-07-27 Thread Borislav Petkov

On Fri, Jul 27, 2012 at 09:30:57AM +0800, Ming Lei wrote:
> No, the comment above is misleading and not useless, and I think the below
> is good:
> 
>  *  Asynchronous variant of request_firmware() for user contexts where
>  *  it is not possible to sleep for long time or can't sleep at all, depends


depending

>  *  on the @gfp flag passed.
> 
> Anyway, the original part of 'It can't be called in atomic contexts.' is wrong
> and should be removed.

I still don't much like the "not possible to sleep for long time"
expression.

Maybe change it to "should sleep for periods as short as possible, since
sleeping increases the boot time of device drivers requesting firmware in
their ->probe() methods."

This way you explain exactly why, and people who don't know the
code will know exactly what the comment means and what the intention
was.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Re: [GIT PULL] PWM subsystem for v3.6

2012-07-27 Thread Alexandre Pereira da Silva

On Fri, Jul 27, 2012 at 2:10 AM, Thierry Reding
 wrote:
> On Thu, Jul 26, 2012 at 02:11:58PM -0700, Linus Torvalds wrote:
>> On Thu, Jul 26, 2012 at 12:16 AM, Thierry Reding
>>  wrote:
>> >
>> > The new PWM subsystem aims at collecting all implementations of the
>> > legacy PWM API and to eventually replace it completely. The subsystem
>> > has been in development for over half a year now and many drivers have
>> > already been converted. It has been in linux-next for a couple of weeks
>> > and there have been no major issues so I think it is ready for inclusion
>> > in your tree.
>>
>> For new subsystems like this, I really want ack's from the people who
>> are expected to use it.
>
> At least the patch that adds me as the maintainer is Acked-by: Sascha
> Hauer, who did the original work, and Arnd Bergmann who was involved in
> the review process. Other people such as Shawn Guo and Mark Brown have
> also been reviewing these patches and new patches have been contributed
> by Eric Bénard, Axel Lin, Sachin Kamat, Alexandre Courbot, Alexandre
> Pereira da Silva and Philip Avinash.
>
> I'm adding all of them on Cc so they can ack this (I'm assuming acking
> this email will suffice).

I'm using this on LPC32XX.

Acked-By: Alexandre Pereira da Silva 

Re: [Xen-devel] [RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-27 Thread Jan Beulich

>>> On 27.07.12 at 12:21, Ian Campbell  wrote:
> I was actually thinking of the issue with 32 bit PV guests accessing MFN
> space > 160G, even if they are themselves small, which is a separate
> concern.

That can be made to work if really needed, but not via the
mechanism we're talking about here. The question is whether
it's worth it.

Jan


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Sam Ravnborg

Hi Andrew.

A few things I noticed below.

> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 2661f6e..fe38c7a 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig"
>  source "drivers/misc/carma/Kconfig"
>  source "drivers/misc/altera-stapl/Kconfig"
>  source "drivers/misc/mei/Kconfig"
> +source "drivers/misc/vmw_vmci/Kconfig"
>  endmenu
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 456972f..af9e413 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -51,3 +51,4 @@ obj-y   += carma/
>  obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
>  obj-$(CONFIG_ALTERA_STAPL)   +=altera-stapl/
>  obj-$(CONFIG_INTEL_MEI)  += mei/
> +obj-y+= vmw_vmci/

Please use obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/

like we do in the other cases. This prevents us from visiting the directory
when this feature is not enabled.

> +++ b/drivers/misc/vmw_vmci/Makefile
> @@ -0,0 +1,43 @@
> +
> +#
> +# Linux driver for VMware's VMCI device.
> +#
> +# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by the
> +# Free Software Foundation; version 2 of the License and no later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> +# NON INFRINGEMENT.  See the GNU General Public License for more
> +# details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Maintained by: Andrew Stiegmann 
> +#
> +
Lots of boilerplate noise for such a simple file...

> +
> +#
> +# Makefile for the VMware VMCI
> +#
> +
> +obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
> +
> +vmw_vmci-objs += vmci_context.o
> +vmw_vmci-objs += vmci_datagram.o
> +vmw_vmci-objs += vmci_doorbell.o
> +vmw_vmci-objs += vmci_driver.o
> +vmw_vmci-objs += vmci_event.o
> +vmw_vmci-objs += vmci_handle_array.o
> +vmw_vmci-objs += vmci_hash_table.o
> +vmw_vmci-objs += vmci_queue_pair.o
> +vmw_vmci-objs += vmci_resource.o
> +vmw_vmci-objs += vmci_route.o

please use:
vmw_vmci-y += vmci_context.o
vmw_vmci-y += vmci_datagram.o
vmw_vmci-y += vmci_doorbell.o

This is recommended these days and allows you to enable/disable
single files later using a config option.



> diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h
> +
> +#ifndef _VMCI_COMMONINT_H_
> +#define _VMCI_COMMONINT_H_
> +
> +#include 
> +#include 

Use an inverse Christmas tree order here:
longer include lines first, sorted alphabetically when
lines are of the same length.
This likely applies in many places.
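The ordering rule can be expressed as a qsort() comparator: longer lines sort first, and lines of equal length sort alphabetically. This is only an illustration of the rule; xmas_tree_cmp() and sort_includes() are hypothetical names, not an existing kernel tool.

```c
#include <stdlib.h>
#include <string.h>

/*
 * "Inverse Christmas tree" comparator: longer lines sort first;
 * lines of equal length sort alphabetically.
 */
static int xmas_tree_cmp(const void *a, const void *b)
{
	const char *la = *(const char *const *)a;
	const char *lb = *(const char *const *)b;
	size_t na = strlen(la), nb = strlen(lb);

	if (na != nb)
		return na > nb ? -1 : 1;	/* longer line first */
	return strcmp(la, lb);			/* same length: alphabetical */
}

/* Sort an array of include lines in place. */
static void sort_includes(const char **lines, size_t n)
{
	qsort(lines, n, sizeof(*lines), xmas_tree_cmp);
}
```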

> +#include "vmci_handle_array.h"
> +
> +#define ASSERT(cond) BUG_ON(!(cond))
> +
> +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK))
> +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED)

Looks like poor obfuscation.
Use a static inline function if you need a helper for this.
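A minimal sketch of what such helpers might look like. The helper names and the flag values are assumptions for illustration; the real VMCI_QPFLAG_* values come from the VMCI headers. The u32 typedef stands in for the kernel's types header so the sketch builds in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-in for the kernel's u32 */

/* Assumed flag values, for illustration only. */
#define VMCI_QPFLAG_NONBLOCK	0x2u
#define VMCI_QPFLAG_PINNED	0x4u

/* Hypothetical: true if a queue pair created with @flags may block. */
static inline bool vmci_qp_can_block(u32 flags)
{
	return !(flags & VMCI_QPFLAG_NONBLOCK);
}

/* Hypothetical: true if the queue pair's memory is pinned. */
static inline bool vmci_qp_pinned(u32 flags)
{
	return flags & VMCI_QPFLAG_PINNED;
}
```

Unlike the macros, these helpers type-check their argument and evaluate it exactly once.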

> +
> +/*
> + * Utility function that checks whether two entities are allowed
> + * to interact. If one of them is restricted, the other one must
> + * be trusted.
> + */
> +static inline bool vmci_deny_interaction(uint32_t partOne,
> +  uint32_t partTwo)

The kernel types are u32, not uint32_t; the latter belong in user space.
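With kernel types and lower-case names, the quoted helper might look like the sketch below. The body is reconstructed from its comment (the quoted hunk does not show it), and the VMCI_PRIVILEGE_FLAG_* names and values are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-in for the kernel's u32 */

/* Assumed privilege flag values, for illustration only. */
#define VMCI_PRIVILEGE_FLAG_RESTRICTED	0x01u
#define VMCI_PRIVILEGE_FLAG_TRUSTED	0x02u

/*
 * Returns true if the two entities must NOT interact: one side is
 * restricted and the other side is not trusted.
 */
static inline bool vmci_deny_interaction(u32 part_one, u32 part_two)
{
	return ((part_one & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
		!(part_two & VMCI_PRIVILEGE_FLAG_TRUSTED)) ||
	       ((part_two & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
		!(part_one & VMCI_PRIVILEGE_FLAG_TRUSTED));
}
```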

> +++ b/include/linux/vmw_vmci_api.h
> +
> +#ifndef __VMW_VMCI_API_H__
> +#define __VMW_VMCI_API_H__
> +
> +#include 
> +
> +#undef  VMCI_KERNEL_API_VERSION
> +#define VMCI_KERNEL_API_VERSION_2 2
> +#define VMCI_KERNEL_API_VERSION   VMCI_KERNEL_API_VERSION_2
> +
> +typedef void (VMCI_DeviceShutdownFn) (void *deviceRegistration, void *userData);
> +
> +bool VMCI_DeviceGet(uint32_t *apiVersion,
> + VMCI_DeviceShutdownFn *deviceShutdownCB,
> + void *userData, void **deviceRegistration);

The kernel style is to use lower_case for everything.
So this would become:

vmci_device_get()

This is obviously a very general comment and applies everywhere.

Sam

Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware

2012-07-27 Thread Borislav Petkov

On Fri, Jul 27, 2012 at 09:54:25AM +0800, Ming Lei wrote:
> On Fri, Jul 27, 2012 at 1:54 AM, Borislav Petkov  wrote:
> 
> >> No, it is not what I was saying.
> 
> I just mean the point is not mentioned in my commit log, but I admit it should
> be an appropriate cause.
> 
> >
> > Ok, maybe I'm not understanding this then. So explain to me this: why
> > do you need that timeout value of 10, how did we decide it to be 10
> 
> If one firmware image was loaded successfully before, the probability of
> loading it successfully at this time should be much higher than the 1st time
> because something crazy (for example, the firmware is deleted) happens
> with low probability.

Believe it or not, I'm addressing exactly the possibility of the
firmware disappearing from under us in the AMD microcode driver
currently :) (and some other annoyances, of course).

> Choosing 10 secs is just an estimation for loading time because the maximum
> size of firmware in current distributions is about 2M bytes, since we know
> it has been loaded successfully before.

This is exactly the comment we want over the code to explain to others
why we're choosing 10 secs. Simply add that sentence above the 10s
assignment and we're perfect! :-)
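One way to follow that suggestion is to carry the explanation in a comment next to the constant. The macro and function names are hypothetical; 60 seconds mirrors the firmware loader's default loading timeout of this era, but treat it as an assumption here.

```c
#include <stdbool.h>

/* Assumed default timeout (seconds) for a normal firmware request. */
#define FW_DEFAULT_TIMEOUT_SEC	60

/*
 * When caching firmware that has already been loaded successfully once,
 * use a shorter timeout. 10 seconds is an estimate of the loading time:
 * the largest firmware images in current distributions are about 2 MB,
 * and this image is known to have loaded before.
 */
#define FW_CACHE_TIMEOUT_SEC	10

/* Hypothetical helper: pick the timeout for a request. */
static int fw_loading_timeout(bool caching)
{
	return caching ? FW_CACHE_TIMEOUT_SEC : FW_DEFAULT_TIMEOUT_SEC;
}
```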

> > (and not 20 or 30 or whatever)? Generally, why do we need to reprogram
> > the timer to a smaller timeout instead of simply doing the completion
> > without a timeout?
> 
> No, it should be crazy without a timeout, and it can be triggered in init call
> easily.

Ok.

Thanks.

-- 
Regards/Gruss,
Boris.


Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-27 Thread Larry Woodman


On 07/27/2012 06:23 AM, Mel Gorman wrote:

On Thu, Jul 26, 2012 at 11:48:56PM -0400, Larry Woodman wrote:

On 07/26/2012 02:37 PM, Rik van Riel wrote:

On 07/23/2012 12:04 AM, Hugh Dickins wrote:


I spent hours trying to dream up a better patch, trying various
approaches.  I think I have a nice one now, what do you think?  And
more importantly, does it work?  I have not tried to test it at all,
that I'm hoping to leave to you, I'm sure you'll attack it with gusto!

If you like it, please take it over and add your comments and signoff
and send it in.  The second part won't come up in your testing, and could
be made a separate patch if you prefer: it's a related point that struck
me while I was playing with a different approach.

I'm sorely tempted to leave a dangerous pair of eyes off the Cc,
but that too would be unfair.

Subject-to-your-testing-
Signed-off-by: Hugh Dickins

This patch looks good to me.

Larry, does Hugh's patch survive your testing?



Like I said earlier, no.

That is a surprise. Can you try your test case on 3.4 and tell us if the
patch fixes the problem there? I would like to rule out the possibility
that the locking rules are slightly different in RHEL. If it hits on 3.4
then it's also possible you are seeing a different bug, more on this later.

Sure, it will take me a little while because the machine is shared between
several users.



However, I finally set up a reproducer that only takes a few seconds
on a large system, and this totally fixes the problem:


The other possibility is that your reproducer case is triggering a
different race to mine. Would it be possible to post?

Let me ask; I only have the binary and don't know if it's OK to distribute,
so I don't know exactly what is going on.  I did some tracing and saw forking,
group exits, multi-threading, hugetlbfs file creation, and mmap'ing, munmap'ing
and deleting of the hugetlbfs files.




-
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c36febb..cc023b8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2151,7 +2151,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 goto nomem;

 /* If the pagetables are shared don't copy or take references */
-   if (dst_pte == src_pte)
+   if (*(unsigned long *)dst_pte == *(unsigned long *)src_pte)
 continue;

 spin_lock(&dst->page_table_lock);
---

When we compare what src_pte and dst_pte point to instead of their
addresses, everything works.

The dst_pte and src_pte are pointing to the PMD page though which is what
we're meant to be checking. Your patch appears to change that to check if
they are sharing data which is quite different. This is functionally
similar to if you just checked VM_MAYSHARE at the start of the function
and bailed if so. The PTEs would be populated at fault time instead.
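A userspace illustration of the distinction: the original check compares the two pointers (are both entries in the same, shared page-table page?), while the proposed change compares the values stored behind them (do two entries happen to hold the same data?). The type and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_val_t;	/* hypothetical stand-in for a page-table entry */

/* Pointer test: both refer to the very same entry, i.e. a shared page. */
static bool same_pte_slot(const pte_val_t *dst, const pte_val_t *src)
{
	return dst == src;
}

/* Value test: two possibly distinct entries hold equal contents. */
static bool same_pte_value(const pte_val_t *dst, const pte_val_t *src)
{
	return *dst == *src;
}
```

Two distinct entries can compare equal by value (for example, both still empty), so the value test answers a different question from the pointer test.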


I suspect there is a missing memory barrier somewhere ???


Possibly but hard to tell whether it's barriers that are the real
problem during fork. The copy routine is suspicious.

On the barrier side - in normal PTE alloc routines there is a write
barrier which is documented in __pte_alloc. If hugepage table sharing is
successful, there is no similar barrier in huge_pmd_share before the PUD
is populated. By rights, there should be a smp_wmb() before the page table
spinlock is taken in huge_pmd_share().

The lack of a write barrier could lead to snarls between fork()
and fault. Take three processes: parent, child and other. Parent is
forking to create child. Other is calling fault.

Other faults
hugetlb_fault()->huge_pte_alloc->allocate a PMD (write barrier)
It is about to enter hugetlb_no_fault()

Parent's fork() runs at the same time.
Child shares a page table page but NOT with the forking process (dst_pte
!= src_pte) and calls huge_pte_offset.

As it's not reading the contents of the PMD page, there is no implicit read
barrier to pair with the write barrier from hugetlb_fault that updates
the PMD page and they are not serialised by the page table lock. Hard to
see exactly where that would cause a problem though.
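In userspace C11 terms, the write-barrier pairing under discussion looks roughly like this sketch: the writer fully initialises a page before making it reachable, and the reader's load pairs with that ordering. It is an analogue only; kernel code would use smp_wmb() and its read-side counterpart rather than stdatomic.h, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical page-table page: a small array of entries. */
struct pt_page {
	unsigned long entry[4];
};

/* Slot through which a populated page is published to other threads. */
static _Atomic(struct pt_page *) published;

/*
 * Writer: initialise the entries, then publish the pointer with release
 * ordering (playing the role of smp_wmb() before the pointer store), so a
 * reader that sees the pointer also sees the initialised entries.
 */
static void publish_page(struct pt_page *pg, unsigned long val)
{
	for (size_t i = 0; i < 4; i++)
		pg->entry[i] = val;
	atomic_store_explicit(&published, pg, memory_order_release);
}

/* Reader: the acquire load pairs with the writer's release store. */
static struct pt_page *lookup_page(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}
```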

Thing is, in this scenario I think it's possible that page table sharing
is not correctly detected by that dst_pte == src_pte check.  dst_pte !=
src_pte but that does not mean it's not sharing with somebody! If it's
sharing and it falls through then it copies the src PTE even though the
dst PTE could already be populated and updates the mapcount accordingly.
That would be a mess in its own right.

I think this is exactly what is happening.  I'll put more cave-man debugging
code in and let you know.

Larry



There might be two bugs here.




Re: [RFC PATCH v5 00/19] memory-hotplug: hot-remove physical memory

2012-07-27 Thread Yasuaki Ishimatsu


Hi Wen,

2012/07/27 19:20, Wen Congyang wrote:

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

change log of v5:
  * merge the patchset to clear page tables and the patchset to hot-remove
memory (from Ishimatsu) into one big patchset.


Thank you for merging the patches. I'll review them next Monday.

Thanks,
Yasuaki Ishimatsu


  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  updates the memory block's state, and notifies userspace that the memory
  block's state has changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting the node device.

change log of v4:
  * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
from the patch series, since the patch is a bugfix. It is being discussed
in another thread. But the patch is needed for testing the patch series,
so I added it as [PATCH 0/13].

  [RFC PATCH v4 2/13]
* check memory is online or not at remove_memory()
* add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
  getting node id

  [RFC PATCH v4 3/13]
* create new patch : check memory is online or not at online_pages()

  [RFC PATCH v4 4/13]
* add __ref section to remove_memory()
* call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

  [RFC PATCH v4 11/13]
* rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
  * rebase to 3.5.0-rc6

  [RFC PATCH v2 2/13]
* remove extra kobject_put()

* Wen commented on the patch. Wen's comment was:
  "acpi_memory_device_remove() should ignore the return value of
  remove_memory() since the caller does not care about the return value".
  But I did not change it, since I think the caller should care about the
  return value. I am trying to fix it as follows:

  https://lkml.org/lkml/2012/7/5/624

  [RFC PATCH v2 4/13]
* remove a firmware_memmap_entry allocated by kzalloc()

change log of v2:
  [RFC PATCH v2 2/13]
* check whether memory block is offline or not before calling offline_memory()
* check whether section is valid or not in is_memblk_offline()
* call kobject_put() for each memory_block in is_memblk_offline()

  [RFC PATCH v2 3/13]
* unify the end argument of firmware_map_add_early/hotplug

  [RFC PATCH v2 4/13]
* add release_firmware_map_entry() for freeing firmware_map_entry

  [RFC PATCH v2 6/13]
   * add release_memory_block() for freeing memory_block

  [RFC PATCH v2 11/13]
   * fix wrong arguments of free_pages()


Wen Congyang (5):
   memory-hotplug: implement offline_memory()
   memory-hotplug: store the node id in acpi_memory_device
   memory-hotplug: export the function acpi_bus_remove()
   memory-hotplug: call acpi_bus_remove() to remove memory device
   memory-hotplug: introduce new function arch_remove_memory()

Yasuaki Ishimatsu (14):
   memory-hotplug: rename remove_memory() to
 offline_memory()/offline_pages()
   memory-hotplug: offline and remove memory when removing the memory
 device
   memory-hotplug: check whether memory is present or not
   memory-hotplug: remove /sys/firmware/memmap/X sysfs
   memory-hotplug: does not release memory region in PAGES_PER_SECTION
 chunks
   memory-hotplug: add memory_block_release
   memory-hotplug: remove_memory calls __remove_pages
   memory-hotplug: check page type in get_page_bootmem
   memory-hotplug: move register_page_bootmem_info_node and
 put_page_bootmem for sparse-vmemmap
   memory-hotplug: implement register_page_bootmem_info_section of
 sparse-vmemmap
   memory-hotplug: free memmap of sparse-vmemmap
   memory_hotplug: clear zone when the memory is removed
   memory-hotplug: add node_device_release
   memory-hotplug: remove sysfs file of node

  arch/ia64/mm/init.c |   16 +
  arch/powerpc/mm/mem.c   |   14 +
  arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
  arch/s390/mm/init.c |8 +
  arch/sh/mm/init.c   |   15 +
  arch/tile/mm/init.c |8 +
  arc

Re: [PATCH 3/4] xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything.

2012-07-27 Thread Stefano Stabellini

On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> We don't need to return the new PGD - as we do not use it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 


Acked-by: Stefano Stabellini 

>  arch/x86/xen/enlighten.c |5 +
>  arch/x86/xen/mmu.c   |   10 ++
>  arch/x86/xen/xen-ops.h   |2 +-
>  3 files changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 9b1afa4..2b67948 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1295,7 +1295,6 @@ asmlinkage void __init xen_start_kernel(void)
>  {
>   struct physdev_set_iopl set_iopl;
>   int rc;
> - pgd_t *pgd;
>  
>   if (!xen_start_info)
>   return;
> @@ -1387,8 +1386,6 @@ asmlinkage void __init xen_start_kernel(void)
>   acpi_numa = -1;
>  #endif
>  
> - pgd = (pgd_t *)xen_start_info->pt_base;
> -
>   /* Don't do the full vcpu_info placement stuff until we have a
>  possible map and a non-dummy shared_info. */
>   per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
> @@ -1397,7 +1394,7 @@ asmlinkage void __init xen_start_kernel(void)
>   early_boot_irqs_disabled = true;
>  
>   xen_raw_console_write("mapping kernel into physical memory\n");
> - pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
> + xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base, xen_start_info->nr_pages);
>  
>   xen_reserve_internals();
>   /* Allocate and initialize top and mid mfn levels for p2m structure */
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 3a73785..4ac21a4 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1719,8 +1719,7 @@ static void convert_pfn_mfn(void *v)
>   * of the physical mapping once some sort of allocator has been set
>   * up.
>   */
> -pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
> -  unsigned long max_pfn)
> +void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  {
>   pud_t *l3;
>   pmd_t *l2;
> @@ -1781,8 +1780,6 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
>  
>   memblock_reserve(__pa(xen_start_info->pt_base),
>xen_start_info->nr_pt_frames * PAGE_SIZE);
> -
> - return pgd;
>  }
>  #else/* !CONFIG_X86_64 */
>  static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
> @@ -1825,8 +1822,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
>   pv_mmu_ops.write_cr3 = &xen_write_cr3;
>  }
>  
> -pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
> -  unsigned long max_pfn)
> +void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  {
>   pmd_t *kernel_pmd;
>  
> @@ -1858,8 +1854,6 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
>  
>   memblock_reserve(__pa(xen_start_info->pt_base),
>xen_start_info->nr_pt_frames * PAGE_SIZE);
> -
> - return initial_page_table;
>  }
>  #endif   /* CONFIG_X86_64 */
>  
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..2230f57 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -27,7 +27,7 @@ void xen_setup_mfn_list_list(void);
>  void xen_setup_shared_info(void);
>  void xen_build_mfn_list_list(void);
>  void xen_setup_machphys_mapping(void);
> -pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
> +void xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
>  void xen_reserve_top(void);
>  extern unsigned long xen_max_p2m_pfn;
>  
> -- 
> 1.7.7.6
> 

Re: bluetooth not working since kernel 3.4.4

2012-07-27 Thread Jiri Slaby

FWD to upstream. For him, bluetooth in 3.5 vanilla does not work:

On 07/27/2012 12:02 AM, Alin M Elena wrote:
> since updating to kernel 3.4.4 my bluetooth stopped working...
> KDE claims there is no adapter available, and so does GNOME...
> 
> 
> the device is
> Bus 004 Device 004: ID 0a5c:4500 Broadcom Corp. BCM2046B1 USB 2.0 Hub (part of BCM2046 Bluetooth)
> [alin@abbaton:~]: dmesg | grep bt
> [   15.862126] usbcore: registered new interface driver btusb
> [alin@abbaton:~]: hcitool dev
> Devices:
> 
> I have updated since to the new 3.5.0 kernel and the situation is the same.
> 
> 
> [alin@abbaton:~]: uname -a
> Linux abbaton.ucd.ie 3.5.0-1-desktop #1 SMP PREEMPT Tue Jul 24 13:38:05 UTC 2012 (fb9c50b) x86_64 x86_64 x86_64 GNU/Linux
> 
> Anyone else seeing the same? any advice?



On 07/27/2012 11:33 AM, Alin M Elena wrote:
>> What was the last working kernel?
> 3.4.2 iirc
>
>> Does kernel-vanilla work? I suppose
>> not, if you confirm that, we could discuss this with upstream...
> just tried the latest kernel-vanilla from Kernel:HEAD 3.5.0-2.1 and no bluetooth
> adapter found..
>
> I forgot to mention earlier.
>
> rfkill list
> 0: hci0: Bluetooth
> Soft blocked: no
> Hard blocked: no
>
> and I have tried the device in mac OSX and it works...
> the machine is a mac book pro 7,1.

regards,
-- 
js
suse labs

Re: [PATCH v2] Input: synaptics - use firmware data for Cr-48

2012-07-27 Thread Daniel Kurtz

On Sat, Jul 21, 2012 at 2:31 AM, Chase Douglas
 wrote:
>
> On 07/20/2012 02:03 AM, Daniel Kurtz wrote:
>>
>> * Leave the device as SEMI_MT, but provide the real locations, and
>> allow userspace to determine the device vendor/model/etc. If
>> userspace knows that a specific device behaves in a specific way, it
>> can do its own quirking handling. Given the specificity of this
>> behavior to only some devices of one brand, this would be my
>> suggested resolution to the issue.
>>
>>
>> This is essentially what this patch does.  It sets the SEMI_MT flag to
>> indicate that the kernel data cannot be totally trusted, and then
>> provides real MT-B (including per-finger pressures), instead of a
>> fixed bounding box.  It leaves it to userspace to treat the two slots
>> worth of coordinates as a bounding box or as actual fingers using its
>> own heuristics.  By limiting to only one hardware type (using DMI),
>> any breakage caused by this alternative use of the SEMI_MT flag is
>> limited.
>
>
> So I was worried that you were trying to remove the SEMI_MT flag, and I 
> apologise for not looking closely enough to notice that wasn't the case. The 
> documentation for the flag says:
>
> """
> Some touchpads, most common between 2008 and 2011, can detect the presence of 
> multiple contacts without resolving the individual positions; only the number 
> of contacts and a rectangular shape is known. For such touchpads, the semi-mt 
> property should be set.
>
> Depending on the device, the rectangle may enclose all touches, like a 
> bounding box, or just some of them, for instance the two most recent touches. 
> The diversity makes the rectangle of limited use, but some gestures can 
> normally be extracted from it.
> """
>
> Since the documentation doesn't say the data must be provided as min/max 
> values, this patch actually appears to be perfectly fine as is.
>
> My next question is: how are you going to tell from userspace if the hardware 
> actually provides correct data? IIRC, it was decided that we wouldn't provide 
> sysfs nodes for the device IDs.
>

Excellent question.  We haven't solved this in any elegant way.  When
building images for this particular hardware platform, we set a flag
in our user-space touchpad driver.  It then knows to process this
device's data as "non-bounding box semi-mt".
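A sketch of the kind of userspace handling described: with a per-platform quirk flag set, the two slots are taken as real finger positions; otherwise they are treated as opposite corners of a bounding box and normalised to min/max. The struct, function and flag names are hypothetical and not taken from any real touchpad driver.

```c
#include <stdbool.h>

struct touch_point {
	int x, y;
};

/*
 * Hypothetical interpretation of a two-slot SEMI_MT report.
 * real_positions_quirk: set at build time for hardware known to report
 * actual finger locations rather than a bounding box.
 */
static void interpret_semi_mt(const struct touch_point in[2],
			      struct touch_point out[2],
			      bool real_positions_quirk)
{
	if (real_positions_quirk) {
		/* Trust the slots as two real fingers. */
		out[0] = in[0];
		out[1] = in[1];
		return;
	}
	/* Bounding box: out[0] is the min corner, out[1] the max corner. */
	out[0].x = in[0].x < in[1].x ? in[0].x : in[1].x;
	out[0].y = in[0].y < in[1].y ? in[0].y : in[1].y;
	out[1].x = in[0].x > in[1].x ? in[0].x : in[1].x;
	out[1].y = in[0].y > in[1].y ? in[0].y : in[1].y;
}
```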

-Daniel

> -- Chase
>

Re: [PATCH] efi: Build EFI stub with EFI-appropriate options

2012-07-27 Thread Matt Fleming

On Thu, 2012-07-26 at 18:00 -0400, Matthew Garrett wrote:
> We can't assume the presence of the red zone while we're still in a boot
> services environment, so we should build with -fno-red-zone to avoid
> problems. Change the size of wchar at the same time to make string handling
> simpler.
> 
> Signed-off-by: Matthew Garrett 
> ---
>  arch/x86/boot/compressed/Makefile |3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Matt Fleming 


Re: [PATCH] X86: Improve GOP detection in the EFI boot stub

2012-07-27 Thread Matt Fleming

On Thu, 2012-07-26 at 18:00 -0400, Matthew Garrett wrote:
> We currently use the PCI IO protocol as a proxy for a functional GOP. This
> is less than ideal, since some platforms will put the GOP on output devices
> rather than the GPU itself. Move to using the conout protocol. This is not
> guaranteed per-spec, but is part of the consplitter implementation that
> causes this problem in the first place and so should be reliable.
> 
> Signed-off-by: Matthew Garrett 
> ---
>  arch/x86/boot/compressed/eboot.c |   29 -
>  arch/x86/boot/compressed/eboot.h |4 
>  2 files changed, 20 insertions(+), 13 deletions(-)

Acked-by: Matt Fleming 


