date:20130220

On 02/21/2013 02:11 PM, Mike Galbraith wrote:
> On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote: 
>> On 02/20/2013 06:49 PM, Ingo Molnar wrote:
>> [snip]
[snip]
>>
>>  if wake_affine()
>>          new_cpu = select_idle_sibling(curr_cpu)
>>  else
>>          new_cpu = select_idle_sibling(prev_cpu)
>>
>>  return new_cpu
>>
>> Actually that doesn't make sense.
>>
>> I think wake_affine() is trying to check whether moving a task from
>> prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
>> does not breaking the balance mean curr_cpu is better than prev_cpu for
>> searching for an idle cpu?
> 
> You could argue that it's impossible to break balance by moving any task
> to any idle cpu, but that would mean bouncing tasks cross node on every
> wakeup is fine, which it isn't.

I don't get it... could you please give me more detail on how
wake_affine() is related to bouncing?

> 
>> So the new logic in this patch set is:
>>
>>  new_cpu = select_idle_sibling(prev_cpu)
>>  if idle_cpu(new_cpu)
>>          return new_cpu
> 
> So you tilted the scales in favor of leaving tasks in their current
> package, which should benefit large footprint tasks, but should also
> penalize light communicating tasks.

Yes, I'd prefer to wake up the task on a cpu which is:
1. idle
2. close to prev_cpu

So if both curr_cpu and prev_cpu have an idle cpu in their topology, which
one is better? That depends on how much the task benefits from cache and on
the balance situation. Either way, I don't think the benefit is worth the
high cost of wake_affine() in most cases...
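
To make the two flows concrete, here is a toy userspace model (not kernel
code). The names toy_select_idle_sibling(), old_pick() and new_pick() are
made up for illustration, the idle state is passed in as a bitmask, and a
package is assumed to be 4 adjacent CPUs. The fallback in new_pick() to the
old decision when nothing idle is found near prev_cpu is also an assumption,
since that part of the patch is snipped above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PKG_SIZE 4  /* assumption: 4 adjacent CPUs share a package */

/* Toy idle_cpu(): bit N of idle_mask set means CPU N is idle. */
static bool toy_idle_cpu(unsigned int idle_mask, int cpu)
{
        return (idle_mask >> cpu) & 1u;
}

/*
 * Toy select_idle_sibling(): prefer target itself, then any idle CPU
 * in target's package, otherwise give target back unchanged.
 */
static int toy_select_idle_sibling(unsigned int idle_mask, int target)
{
        int first = (target / TOY_PKG_SIZE) * TOY_PKG_SIZE;

        if (toy_idle_cpu(idle_mask, target))
                return target;
        for (int cpu = first; cpu < first + TOY_PKG_SIZE; cpu++)
                if (toy_idle_cpu(idle_mask, cpu))
                        return cpu;
        return target;
}

/* Old flow: the wake_affine() verdict picks which package is searched. */
static int old_pick(unsigned int idle_mask, int prev_cpu, int curr_cpu,
                    bool affine)
{
        return toy_select_idle_sibling(idle_mask, affine ? curr_cpu : prev_cpu);
}

/* New flow: look near prev_cpu first and stop if an idle CPU turns up. */
static int new_pick(unsigned int idle_mask, int prev_cpu, int curr_cpu,
                    bool affine)
{
        int cpu = toy_select_idle_sibling(idle_mask, prev_cpu);

        if (toy_idle_cpu(idle_mask, cpu))
                return cpu;
        /* Assumption: otherwise fall back to the old decision. */
        return old_pick(idle_mask, prev_cpu, curr_cpu, affine);
}

/* CPUs 2 and 6 idle (mask 0x44), prev_cpu = 1, curr_cpu = 5. */
void toy_pick_demo(void)
{
        assert(old_pick(0x44, 1, 5, true) == 6);  /* pulled to curr's package */
        assert(new_pick(0x44, 1, 5, true) == 2);  /* stays in prev's package */
}
```

In this model the new flow never consults the affine verdict as long as
something near prev_cpu is idle, which is the "tilted scales" effect Mike
describes.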

Regards,
Michael Wang

> 
> I suspect that much of the pgbench improvement comes from the preemption
> mitigation from keeping 1:N load maximally spread, which is the perfect
> thing to do with such loads.  In all the testing I ever did with it in
> 1:N mode, preemption dominated performance numbers.  Keep server away
> from clients, it has fewer fair competition worries, can consume more
> CPU preemption free, pushing the load collapse point strongly upward.
> 
> -Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 17/16 V2] virtio-scsi: use virtqueue_add_sgs for command buffers

Asias He  writes:
> On 02/20/2013 05:47 PM, Wanlong Gao wrote:
>> Using the new virtqueue_add_sgs function lets us simplify the queueing
>> path.  In particular, all data protected by the tgt_lock is just gone
>> (multiqueue will find a new use for the lock).
>> 
>> Signed-off-by: Paolo Bonzini 
>> Signed-off-by: Wanlong Gao 
>
> Reviewed-by: Asias He 

Applied.

Unfortunately these won't be in until *next* merge window...

Cheers,
Rusty.

Re: [PATCH 04/16] virtio-blk: use virtqueue_start_buf on bio path

Asias He  writes:

> On 02/19/2013 03:56 PM, Rusty Russell wrote:
>> (This is a respin of Paolo Bonzini's patch, but it calls
>> virtqueue_add_sgs() instead of his multi-part API).
...
> (This subject needs to be changed to reflect using of virtqueue_add_sgs)

Thanks, done.

>> -static inline int __virtblk_add_req(struct virtqueue *vq,
>> - struct virtblk_req *vbr,
>> - unsigned long out,
>> - unsigned long in)
>> +static int __virtblk_add_req(struct virtqueue *vq,
>> + struct virtblk_req *vbr)
>>  {
>> -return virtqueue_add_buf(vq, vbr->sg, out, in, vbr, GFP_ATOMIC);
>> +struct scatterlist hdr, tailer, *sgs[3];
>
> 'status' might be better than 'tailer'. We are using status in other
> places.

Indeed, done.

> Reviewed-by: Asias He 

Thanks,
Rusty.

Re: [PATCHv2 vringh 1/3] remoteproc: Add support for vringh (Host vrings)

Ohad Ben-Cohen  writes:
> Hi Sjur,
>
> On Tue, Feb 12, 2013 at 1:49 PM,   wrote:
>> From: Sjur Brændeland 
>>
>> Add functions for creating, deleting and kicking host-side virtio rings.
>>
>> The host ring is not integrated with virtqueues and cannot be managed
>> through virtio-config.
>
> Is that an inherent design/issue of vringh or just a description of
> the current vringh code ?

It's by design.  The producer (virtqueue) and consumer (vringh) are two
sides of the same coin, but they do different things.

virtqueue is a slightly higher level abstraction which assumes a
virtio_device, because every user so far has had one.  vringh doesn't,
because it's also aimed to underlie vhost.c which doesn't really have
one.

> This is possible of course thanks to the abstraction provided by
> virtio: remoteproc only implements a set of callbacks which virtio
> invokes when needed.
>
> Do we not want to follow a similar design scheme with vringh ?

Hmm... I clearly jumped the gun, assuming consensus was already reached.
I have put these patches *back* into pending-rebases, and they will not
be merged this merge window.

Cheers,
Rusty.

Re: [PATCH 10/16] virtio_net: use virtqueue_add_sgs[] for command buffers.

Wanlong Gao  writes:
> On 02/19/2013 03:56 PM, Rusty Russell wrote:
>> It's a bit cleaner to hand multiple sgs, rather than one big one.
>> 
>> Signed-off-by: Rusty Russell 
...
>> +BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>>  
>> -ctrl.class = class;
>> -ctrl.cmd = cmd;
>
> The class and cmd assignment of ctrl header is forgotten?
>
> Thanks,
> Wanlong Gao

Good catch, fixed.

Thanks,
Rusty.

Re: [PATCH 00/16] virtio ring rework.

Paolo Bonzini  writes:
> Il 19/02/2013 08:56, Rusty Russell ha scritto:
>> OK, this (ab)uses some of Paolo's patches.  The first 7 are
>> candidates for this merge window (maybe), the rest I'm not so sure
>> about.
>
> Cool, thanks.
>
>> Thanks,
>> Rusty.
>> 
>> Paolo Bonzini (3):
>>   scatterlist: introduce sg_unmark_end
>>   virtio-blk: reorganize virtblk_add_req
>>   virtio-blk: use virtqueue_add_sgs on req path
>> 
>> Rusty Russell (13):
>>   virtio_ring: virtqueue_add_sgs, to add multiple sgs.
>>   virtio-blk: use virtqueue_start_buf on bio path
>
> Something wrong with author and commit message in this patch.

Re: author.  I mangled your patch pretty badly in that case, so I wasn't
sure you wanted the blame.

I have restored your authorship.

Thanks,
Rusty.

Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Kim Phillips

On Wed, 20 Feb 2013 23:29:58 -0500
Nicolas Pitre  wrote:

> On Wed, 20 Feb 2013, Kim Phillips wrote:
> 
> > On Wed, 20 Feb 2013 10:43:18 -0500
> > Nicolas Pitre  wrote:
> > 
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > > enforcing -O2, the rest being equal.
> > > > 
> > > > For today's compilers, unless the wind changes.
> > > 
> > > We'll adapt if necessary.  Going with -O2 should remain pretty safe 
> > > anyway.
> > 
> > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > 
> > https://patchwork.kernel.org/patch/2101491/
> > 
> > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > __bswap[sd]i2, even with -march=armv6k+.

argh, sorry - that script was testing support for 
__builtin_bswap{16,32,64} directly, which isn't the same as testing
code generation of a byte swap pattern in C.

> Crap.  OK, assembly code is the way to go then.
> 
> > I'll try working on an assembly version given it probably
> > makes more sense, future-gcc-immunity-wise.
> 
> Agreed.

I'll still try the assembly approach - gcc 4.4's armv6 output looks
worse than both the pre-armv6 and post-armv6 __arch_swab32
implementations currently in use:

        mov     ip, sp
        push    {fp, ip, lr, pc}
        sub     fp, ip, #4
        and     r2, r0, #65280          ; 0xff00
        lsl     ip, r0, #24
        orr     r1, ip, r0, lsr #24
        and     r0, r0, #16711680       ; 0xff0000
        orr     r3, r1, r2, lsl #8
        orr     r0, r3, r0, lsr #8
        ldm     sp, {fp, sp, pc}
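
For reference, the distinction drawn above between the builtin and a byte
swap pattern written in C can be sketched as follows. This is an
illustration only: swab32_pattern() and swab32_builtin() are made-up names,
not the kernel's __arch_swab32. Whether a given gcc turns the pattern into a
single rev instruction, open-codes it as in the listing above, or calls a
libgcc helper depends on the compiler version and -march, as this thread
shows.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded byte swap: the kind of pattern the compiler has to recognise. */
static inline uint32_t swab32_pattern(uint32_t x)
{
        return ((x & 0x000000ffu) << 24) |
               ((x & 0x0000ff00u) <<  8) |
               ((x & 0x00ff0000u) >>  8) |
               ((x & 0xff000000u) >> 24);
}

/* Direct use of the gcc/clang builtin, which is what the test script probed. */
static inline uint32_t swab32_builtin(uint32_t x)
{
        return __builtin_bswap32(x);
}

/* Both forms must agree on the result, whatever code is generated. */
void swab32_demo(void)
{
        assert(swab32_pattern(0x12345678u) == 0x78563412u);
        assert(swab32_builtin(0x12345678u) == swab32_pattern(0x12345678u));
}
```

Comparing the output of "gcc -O2 -S" for the two functions at a given -march
is one way to see which of the cases discussed here a particular compiler
falls into.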

Kim


Re: [PATCH] pci: do not try to assign irq 255

2013-02-20 Thread Hannes Reinecke


On 02/20/2013 05:57 PM, Yinghai Lu wrote:
> On Tue, Feb 19, 2013 at 11:58 PM, Hannes Reinecke wrote:
>> Apparently this device is meant to use MSI _only_ so the BIOS developer
>> didn't feel the need to assign an INTx here.
>>
>> According to PCI-3.0, section 6.8 (Message Signalled Interrupts):
>>
>>   It is recommended that devices implement interrupt pins to
>>   provide compatibility in systems that do not support MSI
>>   (devices default to interrupt pins). However, it is expected
>>   that the need for interrupt pins will diminish over time.
>>   Devices that do not support interrupt pins due to pin
>>   constraints (rely on polling for device service) may implement
>>   messages to increase performance without adding additional pins.
>>   Therefore, system configuration software must not assume that a
>>   message capable device has an interrupt pin.
>>
>> Which sounds to me as if the implementation is valid...
>
> It seems you are confusing the interrupt pin with the interrupt line.
>
> current code:
>         unsigned char irq;
>
>         pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
>         dev->pin = irq;
>         if (irq)
>                 pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
>         dev->irq = irq;
>
> So if the device does not have an interrupt pin implemented, pin should be
> zero, and pin and irq in dev should both be 0.


But the device _has_ an interrupt pin implemented.
The whole point here is that the interrupt line is _NOT_ zero.

00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04) (prog-if 30 [XHCI])
        Subsystem: Hewlett-Packard Company Device [103c:179b]
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at d472 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [70] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address:   Data: 

So at one point we have to decide that ->irq is not valid, despite 
it being not set to zero.

An alternative fix would be this:

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 68a921d..4a480cb 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -469,6 +469,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
                } else {
                        dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
                                 pin_name(pin));
+                       dev->irq = 0;
                }
                return 0;
        }

Which probably is a better solution, as here ->irq is _definitely_
not valid, so we should reset it to '0' to avoid confusion on upper
layers.
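
The rule being argued for can be written down as a small standalone helper.
This is a sketch, not the proposed kernel change: toy_effective_irq() is a
hypothetical name, its first half mirrors the pin/line reading quoted above,
and treating an Interrupt Line value of 255 as "not routed" is the
assumption behind the patch subject.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: firmware left 255 in Interrupt Line to mean "not routed". */
#define TOY_IRQ_NOT_ROUTED 0xffu

/*
 * cfg_pin and cfg_line stand for the raw PCI_INTERRUPT_PIN and
 * PCI_INTERRUPT_LINE config bytes. Returns the irq a driver should see,
 * with 0 meaning "no usable legacy interrupt".
 */
static unsigned int toy_effective_irq(uint8_t cfg_pin, uint8_t cfg_line)
{
        unsigned int irq = cfg_pin;

        if (irq)                        /* only read the line if a pin exists */
                irq = cfg_line;
        if (irq == TOY_IRQ_NOT_ROUTED)  /* pin present but never routed */
                irq = 0;
        return irq;
}

void toy_irq_demo(void)
{
        assert(toy_effective_irq(1, 255) == 0);  /* the xHCI case above */
        assert(toy_effective_irq(1, 11) == 11);  /* normally routed INTA */
        assert(toy_effective_irq(0, 11) == 0);   /* no pin implemented */
}
```

The acpi_pci_irq_enable() variant above reaches the same end state from the
other direction: it resets ->irq once ACPI reports that no GSI exists for
the pin, rather than by looking at the raw register value.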

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Michal Nazarewicz

On Thu, Feb 21 2013, Maxin B. John wrote:
> Hi,
>
> On Thu, Feb 21, 2013 at 2:06 AM, Greg KH  wrote:
>> On Thu, Feb 21, 2013 at 01:57:51AM +0200, maxin.j...@gmail.com wrote:
>>> From: "Maxin B. John" 
>>>
>>> Fixes this build failure:
>>> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
>>> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
>>> In file included from ffs-test.c:41:0:
>>> ../../include/linux/usb/functionfs.h:4:39: fatal error:
>>> uapi/linux/usb/functionfs.h: No such file or directory
>>> compilation terminated.
>>> make: *** [ffs-test] Error 1
>>
>> This is a build failure where, 3.8, or linux-next, or somewhere else?
>
> It is in 3.8

This also happens in 3.7.  [commit
5e1ddb481776a487b15b40579a000b279ce527c9: UAPI: (Scripted) Disintegrate
include/linux/usb] is the culprit.

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Michal Nazarewicz

On Thu, Feb 21 2013, maxin.j...@gmail.com wrote:
> From: "Maxin B. John" 
>
> Fixes this build failure:
> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
> In file included from ffs-test.c:41:0:
> ../../include/linux/usb/functionfs.h:4:39: fatal error:
> uapi/linux/usb/functionfs.h: No such file or directory
> compilation terminated.
> make: *** [ffs-test] Error 1
>
> Signed-off-by: Maxin B. John 

Acked-by: Michal Nazarewicz 

> ---
>  tools/usb/ffs-test.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/tools/usb/ffs-test.c b/tools/usb/ffs-test.c
> index 8674b9e..fe1e66b 100644
> --- a/tools/usb/ffs-test.c
> +++ b/tools/usb/ffs-test.c
> @@ -38,7 +38,7 @@
>  #include 
>  #include 
>  
> -#include "../../include/linux/usb/functionfs.h"
> +#include "../../include/uapi/linux/usb/functionfs.h"
>  
>  
>  / Little Endian Handling 
> /
> -- 
> 1.7.7
>

-- 
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of  o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz(o o)
ooo +--ooO--(_)--Ooo--


RE: linux-next: build failure after merge of the xen-two tree

2013-02-20 Thread Liu, Jinsong

Konrad Rzeszutek Wilk wrote:
 commit 3757b94802fb65d8f696597a74053cf21738da0b
 Author: Rafael J. Wysocki 
 Date:   Wed Feb 13 14:36:47 2013 +0100
 
 ACPI / hotplug: Fix concurrency issues and memory leaks
 
 after which acpi_bus_scan() and acpi_bus_trim() have to be run
 under acpi_scan_lock (new in my tree as well).
>>> 
>>> Yes, we noticed that and only need minor updates at xen side, will
>>> send out 2 xen patches later accordingly, for cleanup and adding
>>> lock. 
>> 
>> Thanks, but those new changes will only make sense after merging the
>> Xen tree with the PM tree.  Why don't we queue them up for merging
>> later after both the Xen and PM trees have been pulled from?
> 
> OK, I've created a branch
> (http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/linux-next-resolved)
> that has your branch and my branch - along with the fix from Stephan
> and then the three updates from Jinsong. Jinsong, please check that
> I've got all the right patches. I will rebase it once Linus has merged
> both of the Xen and PM trees.

Check done, it's OK.

Thanks,
Jinsong

Re: [PATCH 2/2] block: partition: optimize memory allocation in check_partition

2013-02-20 Thread Yasuaki Ishimatsu

2013/02/21 14:22, Ming Lei wrote:
> Currently, sizeof(struct parsed_partitions) may be 64KB on 32-bit arches,
> so it is easy to trigger a page allocation failure in check_partition,
> especially when block devices are hotplugged (USB mass storage,
> MMC card, ...), and Felipe Balbi has observed the failure.
> 
> This patch makes the following optimizations to the allocation of struct
> parsed_partitions to try to address the issue:
> 
> - make parsed_partitions.parts a pointer so that the pointed-to memory
> can fit in a 32KB buffer, saving approximately 32KB of memory
> 
> - vmalloc the buffer pointed to by parsed_partitions.parts because
> 32KB is still a bit big for kmalloc
> 
> - given that many devices have a partition count limit, only
> allocate disk_max_parts() partitions instead of always 256 partitions
> 
> Reported-by: Felipe Balbi 
> Signed-off-by: Ming Lei 
> ---

Reviewed-by: Yasuaki Ishimatsu 

Thanks,
Yasuaki Ishimatsu

>   block/partition-generic.c |4 ++--
>   block/partitions/check.c  |   37 -
>   block/partitions/check.h  |4 +++-
>   3 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/block/partition-generic.c b/block/partition-generic.c
> index 1cb4dec..789cdea 100644
> --- a/block/partition-generic.c
> +++ b/block/partition-generic.c
> @@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
> block_device *bdev)
>   int p, highest, res;
>   rescan:
>   if (state && !IS_ERR(state)) {
> - kfree(state);
> + free_partitions(state);
>   state = NULL;
>   }
>   
> @@ -525,7 +525,7 @@ rescan:
>   md_autodetect_dev(part_to_dev(part)->devt);
>   #endif
>   }
> - kfree(state);
> + free_partitions(state);
>   return 0;
>   }
>   
> diff --git a/block/partitions/check.c b/block/partitions/check.c
> index bc90867..19ba207 100644
> --- a/block/partitions/check.c
> +++ b/block/partitions/check.c
> @@ -14,6 +14,7 @@
>*/
>   
>   #include 
> +#include 
>   #include 
>   #include 
>   
> @@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) 
> = {
>   NULL
>   };
>   
> +static struct parsed_partitions *allocate_partitions(struct gendisk *hd)
> +{
> + struct parsed_partitions *state;
> + int nr;
> +
> + state = kzalloc(sizeof(*state), GFP_KERNEL);
> + if (!state)
> + return NULL;
> +
> + nr = disk_max_parts(hd);
> + state->parts = vzalloc(nr * sizeof(state->parts[0]));
> + if (!state->parts) {
> + kfree(state);
> + return NULL;
> + }
> +
> + state->limit = nr;
> +
> + return state;
> +}
> +
> +void free_partitions(struct parsed_partitions *state)
> +{
> + vfree(state->parts);
> + kfree(state);
> +}
> +
>   struct parsed_partitions *
>   check_partition(struct gendisk *hd, struct block_device *bdev)
>   {
>   struct parsed_partitions *state;
>   int i, res, err;
>   
> - state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
> + state = allocate_partitions(hd);
>   if (!state)
>   return NULL;
>   state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
>   if (!state->pp_buf) {
> - kfree(state);
> + free_partitions(state);
>   return NULL;
>   }
>   state->pp_buf[0] = '\0';
> @@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device 
> *bdev)
>   if (isdigit(state->name[strlen(state->name)-1]))
>   sprintf(state->name, "p");
>   
> - state->limit = disk_max_parts(hd);
>   i = res = err = 0;
>   while (!res && check_part[i]) {
> - memset(&state->parts, 0, sizeof(state->parts));
> + memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
>   res = check_part[i++](state);
>   if (res < 0) {
>   /* We have hit an I/O error which we don't report now.
> @@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device 
> *bdev)
>   printk(KERN_INFO "%s", state->pp_buf);
>   
>   free_page((unsigned long)state->pp_buf);
> - kfree(state);
> + free_partitions(state);
>   return ERR_PTR(res);
>   }
> diff --git a/block/partitions/check.h b/block/partitions/check.h
> index 52b1003..eade17e 100644
> --- a/block/partitions/check.h
> +++ b/block/partitions/check.h
> @@ -15,13 +15,15 @@ struct parsed_partitions {
>   int flags;
>   bool has_info;
>   struct partition_meta_info info;
> - } parts[DISK_MAX_PARTS];
> + } *parts;
>   int next;
>   int limit;
>   bool access_beyond_eod;
>   char *pp_buf;
>   };
>   
> +void free_partitions(struct parsed_partitions *state);
> +
>   struct parsed_partitions *
>   check_partition(struct gendisk *, struct block_device *);
>   
> 
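
The allocation scheme in the patch above can be mimicked in userspace to see
its shape. This is only a sketch: calloc()/free() stand in for
kzalloc()/vzalloc() and kfree()/vfree(), the toy_ names and the trimmed
struct layout are made up, and the caller is assumed to pass the value that
disk_max_parts() would return.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_part {
        unsigned long long from;
        unsigned long long size;
        int flags;
};

struct toy_parsed_partitions {
        struct toy_part *parts;  /* separately allocated, no longer parts[256] */
        int limit;               /* number of entries behind parts */
};

/* Release the table first, then the header that points at it. */
static void toy_free_partitions(struct toy_parsed_partitions *state)
{
        if (!state)
                return;
        free(state->parts);
        free(state);
}

/* Two-step allocation: small header, then a table sized for this disk. */
static struct toy_parsed_partitions *toy_allocate_partitions(int nr)
{
        struct toy_parsed_partitions *state;

        assert(nr > 0);
        state = calloc(1, sizeof(*state));
        if (!state)
                return NULL;

        state->parts = calloc((size_t)nr, sizeof(state->parts[0]));
        if (!state->parts) {
                free(state);  /* do not leak the header on failure */
                return NULL;
        }

        state->limit = nr;
        return state;
}
```

Unlike free_partitions() in the patch, the toy version tolerates a NULL
argument; that is a convenience of the sketch, not something the patch does.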



PATCH: freezer: add fake signal clearing back when thaw task

2013-02-20 Thread Lianwei Wang

Hi Tejun Heo and all,

Commit "34b087e freezer: kill unused set_freezable_with_signal()"
removed the recalc_sigpending*() calls from the freezer, so user tasks
keep the fake TIF_SIGPENDING signal that is set when freezing userspace
processes. The fake signal is left pending for userspace, which causes
userspace tasks in wait_event_freezable() and friends to return a
spurious ERESTARTSYS. This is not good because it wastes cpu time
handling the fake signal.

Can we just call recalc_sigpending() to clear the fake signal for
userspace tasks, as the patch below does?

From 176fccee178bc0185d92853dd2f521c9166b0853 Mon Sep 17 00:00:00 2001
From: Lianwei Wang 
Date: Mon, 21 Jan 2013 18:21:26 +0800
Subject: [PATCH] freezer: add fake signal clearing back when thaw task

The fake TIF_SIGPENDING is set while freezing userspace processes, but it
is not cleared when thawing tasks since the commit below:
  34b087e freezer: kill unused set_freezable_with_signal()

This causes userspace tasks in wait_event_freezable() and friends to
return a spurious ERESTARTSYS. This is not good because it wastes cpu
time handling the fake signal.

Clear the TIF_SIGPENDING flag for userspace tasks when waking up the
frozen task to fix this issue.

Change-Id: I91c90ad2ee9a46c42e3b39a7384ec81e97bc0394
Signed-off-by: Lianwei Wang 
---
 kernel/freezer.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index c38893b..09557f6 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -46,6 +46,16 @@ bool freezing_slow_path(struct task_struct *p)
 }
 EXPORT_SYMBOL(freezing_slow_path);

+static void fake_signal_clear(struct task_struct *p)
+{
+        unsigned long flags;
+
+        if (lock_task_sighand(p, &flags)) {
+                recalc_sigpending();
+                unlock_task_sighand(p, &flags);
+        }
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
@@ -74,6 +84,10 @@ bool __refrigerator(bool check_kthr_stop)

         pr_debug("%s left refrigerator\n", current->comm);

+        if (!(current->flags & PF_KTHREAD))
+                if (test_tsk_thread_flag(current, TIF_SIGPENDING))
+                        fake_signal_clear(current);
+
         /*
          * Restore saved task state before returning.  The mb'd version
          * needs to be used; otherwise, it might silently break
--
1.7.4.1



Re: [resend] Timer broadcast question

2013-02-20 Thread Santosh Shilimkar


On Tuesday 19 February 2013 11:51 PM, Daniel Lezcano wrote:
> On 02/19/2013 07:10 PM, Thomas Gleixner wrote:
>> On Tue, 19 Feb 2013, Daniel Lezcano wrote:
>>> I am working on identifying the different wakeup sources from the
>>> interrupts and I have a question regarding the timer broadcast.
>>>
>>> The broadcast timer is setup to the next event and that will wake up any
>>> idle cpu belonging to the "broadcast cpumask", right ?
>>>
>>> The cpu which has been woken up will look for each cpu the next-event
>>> and send an IPI to wake it up.
>>>
>>> Although, it is possible the sender of this IPI may not be concerned by
>>> the timer expiration and has been woken up just for sending the IPI, right ?
>>
>> Correct.
>>
>>> If this is correct, is it possible to setup the timer irq affinity to a
>>> cpu which will be concerned by the timer expiration ? so we prevent an
>>> unnecessary wake up for a cpu.
>>
>> It is possible, but we never implemented it.
>>
>> If we go there, we want to make that conditional on a property flag,
>> because some interrupt controllers especially on x86 only allow to
>> move the affinity from interrupt context, which is pointless.
>
> Thanks Thomas for your quick answer. I will write a RFC patchset.


Last year I implemented the affinity hook for the broadcast code and
experimented with it. Since the system I was using was dual core,
it wasn't of much benefit, so I gave up on it later. I remember
discussing the approach with a few folks at the conference.

The patch for the generic broadcast code is at the end of this email
(also attached). I didn't look at all the corner cases, though. In arch
code you then need to set up the "broadcast_affinity" hook, which should
be able to get a handle on the arch irqchip and call the respective
affinity handler. A three-line function should do the trick.

As Thomas said, the effectiveness of such an optimization depends solely
on how well affinity (in low power states) is supported by your IRQ chip.

Hope this is helpful for you.

Regards,
Santosh


From d70f2d48ec08a3f1d73187c49b16e4e60f81a50c Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar 
Date: Wed, 25 Jul 2012 03:42:33 +0530
Subject: [PATCH] tick-broadcast: Add tick road-cast affinity suport

Current tick broadcast code has the affinity set to the boot CPU, and hence
the boot CPU will always wake up from low power states when the broadcast
timer is armed, even if the next expiry event doesn't belong to it.

This patch adds broadcast affinity functionality to avoid the above and lets
the tick framework set the affinity of the event to the CPU it belongs to.

Signed-off-by: Santosh Shilimkar 
---
 include/linux/clockchips.h   |2 ++
 kernel/time/tick-broadcast.c |   13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8a7096f..5488cdc 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -95,6 +95,8 @@ struct clock_event_device {
        unsigned long           retries;

        void                    (*broadcast)(const struct cpumask *mask);
+       void                    (*broadcast_affinity)
+                               (const struct cpumask *mask, int irq);
        void                    (*set_mode)(enum clock_event_mode mode,
                                            struct clock_event_device *);
        void                    (*suspend)(struct clock_event_device *);
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..2ec2425 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -39,6 +39,8 @@ static void tick_broadcast_clear_oneshot(int cpu);
 static inline void tick_broadcast_clear_oneshot(int cpu) { }
 #endif

+static inline void dummy_broadcast_affinity(const struct cpumask *mask,
+   int irq) { }
 /*
  * Debugging: see timer_list.c
  */
@@ -485,14 +487,19 @@ void tick_broadcast_oneshot_control(unsigned long reason)

                if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
                        cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
                        clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
-                       if (dev->next_event.tv64 < bc->next_event.tv64)
+                       if (dev->next_event.tv64 < bc->next_event.tv64) {
                                tick_broadcast_set_event(dev->next_event, 1);
+                               bc->broadcast_affinity(
+                                       tick_get_broadcast_oneshot_mask(), bc->irq);
+                       }
                }
        } else {
                if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
                        cpumask_clear_cpu(cpu,
                                          tick_get_broadcast_oneshot_mask());
                        clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+                       bc->broadcast_affinity(
+                               tick_get_broadcast_oneshot_mask(), bc->irq);

Re: [PATCH v2] X.509: Support parse long form of length octets in Authority Key Identifier

joeyli  writes:
> On Wed, 2013-02-20 at 12:49 +, David Howells wrote:
>> Acked-by: David Howells 
>> 
>
> Thanks for David's review and confirmation.

Should this be CC stable?

Thanks,
Rusty.

Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-20 Thread Jason Liu

2013/2/20 Thomas Gleixner :
> On Wed, 20 Feb 2013, Jason Liu wrote:
>> void arch_idle(void)
>> {
>> 
>>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>>
>>         enter_the_wait_mode();
>>
>>         clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>> }
>>
>> When the broadcast timer interrupt arrives (this interrupt just wakes up
>> the ARM core, and the core has no chance to handle it since local irqs
>> are disabled; in fact they are disabled in cpu_idle() of
>> arch/arm/kernel/process.c),
>>
>> the broadcast timer interrupt will wake up the CPU and run:
>>
>> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>>   -> tick_broadcast_oneshot_control(...);
>>     -> tick_program_event(dev->next_event, 1);
>>       -> tick_dev_program_event(dev, expires, force);
>>         ->
>>         for (i = 0;;) {
>>                 int ret = clockevents_program_event(dev, expires, now);
>>                 if (!ret || !force)
>>                         return ret;
>>
>>                 dev->retries++;
>> 
>>                 now = ktime_get();
>>                 expires = ktime_add_ns(now, dev->min_delta_ns);
>>         }
>> clockevents_program_event(dev, expires, now);
>>
>>         delta = ktime_to_ns(ktime_sub(expires, now));
>>
>>         if (delta <= 0)
>>                 return -ETIME;
>>
>> When the bc timer interrupt arrives, it means the last local timer
>> has expired too. So clockevents_program_event() will return -ETIME,
>> which causes dev->retries++ when retrying to program the expired timer.
>>
>> In the worst case, after re-programming the expired timer, the CPU
>> enters idle again before the re-programmed timer expires, which makes
>> the system ping-pong forever,
>
> That's nonsense.

I don't think so.

>
> The timer IPI brings the core out of the deep idle state.
>
> So after returning from enter_wait_mode() and after calling
> clockevents_notify() it returns from arch_idle() to cpu_idle().
>
> In cpu_idle() interrupts are reenabled, so the timer IPI handler is
> invoked. That calls the event_handler of the per cpu local clockevent
> device (the one which stops in C3). That ends up in the generic timer
> code which expires timers and reprograms the local clock event device
> with the next pending timer.
>
> So you cannot go idle again, before the expired timers of this event
> are handled and their callbacks invoked.

That's true for the CPUs which do not respond to the global timer interrupt.
Take our platform as an example: we have 4 CPUs (CPU0, CPU1, CPU2, CPU3).
The global timer device keeps running even in deep idle mode, so it
can be used as the broadcast timer device. The interrupt of this device
is raised only to CPU0 when the timer expires; CPU0 then broadcasts the
timer IPI to the other CPUs which are in deep idle mode.

So for CPU1, CPU2 and CPU3, you are right: the timer IPI brings the CPU out
of the idle state, and after running clockevents_notify() it returns from
arch_idle() to cpu_idle(). Then, after local_irq_enable(), the IPI handler
is invoked; it handles the expired timers and re-programs the next pending
timer.

But that's not true for CPU0. The flow for CPU0 is:
the global timer interrupt wakes up CPU0, which then calls:
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);

which does cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
in the function tick_broadcast_oneshot_control().

After returning from clockevents_notify(), it returns from arch_idle() to
cpu_idle(). Then, after local_irq_enable(), CPU0 responds to the global
timer interrupt and calls the interrupt handler,
tick_handle_oneshot_broadcast():

static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
{
	struct tick_device *td;
	ktime_t now, next_event;
	int cpu;

	raw_spin_lock(&tick_broadcast_lock);
again:
	dev->next_event.tv64 = KTIME_MAX;
	next_event.tv64 = KTIME_MAX;
	cpumask_clear(to_cpumask(tmpmask));
	now = ktime_get();
	/* Find all expired events */
	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
		td = &per_cpu(tick_cpu_device, cpu);
		if (td->evtdev->next_event.tv64 <= now.tv64)
			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
		else if (td->evtdev->next_event.tv64 < next_event.tv64)
			next_event.tv64 = td->evtdev->next_event.tv64;
	}

	/*
	 * Wakeup the cpus which have an expired event.
	 */
	tick_do_broadcast(to_cpumask(tmpmask));
	...
}

Since CPU0 has been removed from tick_get_broadcast_oneshot_mask(), if all
the other CPUs (CPU1/2/3) stay idle and have no expired timers, tmpmask
will be empty when tick_do_broadcast() is called.
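
The mask arithmetic described above can be tried out in a small userspace model. This is only a sketch under stated assumptions: the names (demo_find_expired, DEMO_NR_CPUS) are hypothetical, the cpumask is modelled as a plain bitmask, and it says nothing about whether the claimed endless loop can really happen in the kernel. It only shows that a CPU which has been cleared from the oneshot mask is never added to tmpmask, even if its own event has expired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4          /* assumption: the 4-CPU platform described above */
#define DEMO_KTIME_MAX INT64_MAX

/*
 * Hypothetical userspace model of the "find all expired events" loop.
 * oneshot_mask: bit n set means CPU n is in the broadcast oneshot mask.
 * Returns the mask of CPUs whose next event has expired (tmpmask) and
 * stores the earliest still-pending event in *next_out.
 */
static unsigned int demo_find_expired(unsigned int oneshot_mask,
                                      const int64_t next_event[DEMO_NR_CPUS],
                                      int64_t now, int64_t *next_out)
{
    unsigned int tmpmask = 0;
    int64_t next = DEMO_KTIME_MAX;
    int cpu;

    assert(next_out != NULL);

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        if (!(oneshot_mask & (1u << cpu)))
            continue;           /* CPU is not in the broadcast mask */
        if (next_event[cpu] <= now)
            tmpmask |= 1u << cpu;
        else if (next_event[cpu] < next)
            next = next_event[cpu];
    }

    *next_out = next;
    return tmpmask;
}
```

Calling it with CPU0 cleared from the mask (0xE) and only CPU0's event expired yields an empty tmpmask; with CPU0 still in the mask (0xF) the result contains bit 0.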

static void tick_do_broadcast(struct cpumask *mask)
{
int cpu = smp_processor_id();
struct tick_device *td;

/*
 * Check, if the current cpu is in the mask
 */
if (cpumask_test_cpu(cpu, mask)) {

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

2013-02-20 Thread Mike Galbraith

On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote: 
> On 02/20/2013 06:49 PM, Ingo Molnar wrote:
> [snip]
> > 
> > The changes look clean and reasonable, any ideas exactly *why* it 
> > speeds up?
> > 
> > I.e. are there one or two key changes in the before/after logic 
> > and scheduling patterns that you can identify as causing the 
> > speedup?
> 
> Hi, Ingo
> 
> Thanks for your reply, please let me point out the key changes here
> (forgive me for not having written a good description in the cover letter).
> 
> The performance improvement from this patch set comes from:
> 1. delaying the invocation of wake_affine().
> 2. saving the loop needed to find the proper sd.
> 
> The second point is obvious, and will help a lot when the sd
> topology is deep (NUMA is supposed to make it deeper on large systems).
> 
> In my testing on a 12-cpu box, most of the benefit actually comes
> from the first point, so please let me introduce it in detail.
> 
> The old logic when locating affine_sd is:
> 
>   if prev_cpu != curr_cpu
>           if wake_affine()
>                   prev_cpu = curr_cpu
>   new_cpu = select_idle_sibling(prev_cpu)
>   return new_cpu
> 
> The new logic is the same as the old one if prev_cpu == curr_cpu, so let's
> simplify the old logic to:
> 
>   if wake_affine()
>           new_cpu = select_idle_sibling(curr_cpu)
>   else
>           new_cpu = select_idle_sibling(prev_cpu)
> 
>   return new_cpu
> 
> Actually that doesn't make sense.
> 
> I think wake_affine() is trying to check whether moving a task from
> prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
> would not breaking the balance mean that curr_cpu is better than prev_cpu
> for searching for an idle cpu?

You could argue that it's impossible to break balance by moving any task
to any idle cpu, but that would mean bouncing tasks cross node on every
wakeup is fine, which it isn't.

> So the new logic in this patch set is:
> 
>   new_cpu = select_idle_sibling(prev_cpu)
>   if idle_cpu(new_cpu)
>           return new_cpu

So you tilted the scales in favor of leaving tasks in their current
package, which should benefit large footprint tasks, but should also
penalize light communicating tasks.
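
The difference between the two orderings quoted above can be sketched in userspace C. Everything here is a hypothetical stand-in: the demo_ names, the flat "package" topology and the counter are assumptions made for illustration, and the fallback taken when prev_cpu has no idle sibling is not shown in the quoted text, so the sketch simply falls back to the old logic there. It only illustrates how checking around prev_cpu first can avoid calling wake_affine() at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the wakeup path looks at. */
struct demo_wake_ctx {
    int prev_cpu;
    int curr_cpu;
    bool affine_ok;          /* what the (costly) wake_affine() check returns */
    int wake_affine_calls;   /* how often that check was invoked */
    const bool *cpu_idle;    /* cpu_idle[n] is true when CPU n is idle */
    int nr_cpus;
    int pkg_size;            /* assumption: CPUs are grouped in equal packages */
};

static bool demo_wake_affine(struct demo_wake_ctx *c)
{
    c->wake_affine_calls++;
    return c->affine_ok;
}

/* Return target if idle, else the first idle CPU in its package, else target. */
static int demo_select_idle_sibling(const struct demo_wake_ctx *c, int target)
{
    int first = (target / c->pkg_size) * c->pkg_size;
    int cpu;

    if (c->cpu_idle[target])
        return target;
    for (cpu = first; cpu < first + c->pkg_size && cpu < c->nr_cpus; cpu++)
        if (c->cpu_idle[cpu])
            return cpu;
    return target;
}

/* Old ordering: decide with wake_affine() first, then search for an idle CPU. */
static int demo_old_select(struct demo_wake_ctx *c)
{
    int target = c->prev_cpu;

    if (c->prev_cpu != c->curr_cpu && demo_wake_affine(c))
        target = c->curr_cpu;
    return demo_select_idle_sibling(c, target);
}

/* New ordering: try around prev_cpu first; wake_affine() only if that fails. */
static int demo_new_select(struct demo_wake_ctx *c)
{
    int new_cpu = demo_select_idle_sibling(c, c->prev_cpu);

    if (c->cpu_idle[new_cpu])
        return new_cpu;
    return demo_old_select(c);  /* assumption: fallback is not in the quote */
}
```

With prev_cpu = 0, curr_cpu = 2 and CPUs 1 and 3 idle, the old ordering calls demo_wake_affine() once and lands on CPU 3, while the new ordering returns CPU 1 without calling it, which matches the "tilted the scales in favor of the current package" observation.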

I suspect that much of the pgbench improvement comes from the preemption
mitigation from keeping 1:N load maximally spread, which is the perfect
thing to do with such loads.  In all the testing I ever did with it in
1:N mode, preemption dominated performance numbers.  Keep server away
from clients, it has fewer fair competition worries, can consume more
CPU preemption free, pushing the load collapse point strongly upward.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the signal tree with the powerpc tree

2013-02-20 Thread Benjamin Herrenschmidt

On Thu, 2013-02-21 at 15:52 +1100, Stephen Rothwell wrote:
> Hi Al,
> 
> Today's linux-next merge of the signal tree got conflicts in
> arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
> between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
> to the signal context") from the powerpc tree and commit 7cce246557bf
> ("powerpc: switch to generic sigaltstack") from the signal tree.
> 
> I fixed it up (I think - see below) and can carry the fix as necessary
> (no action is required).

Mikey, can you check everything's all right ?

I'm happy to wait for Al's stuff to go in first & fix up the conflict
before I send the pull request to Linus. I'm off travelling around but I
should be able to get stuff out this weekend.

Cheers,
Ben.



Re: [PATCH v3 2/2] cpufreq: Convert the cpufreq_driver_lock to use the rcu

2013-02-20 Thread Viresh Kumar

On 21 February 2013 05:26, Nathan Zimmer  wrote:
> In general rwlocks are discouraged, so we are moving it to use RCU instead.
> This does require a bit of care since the cpufreq_driver_lock protects both
> the cpufreq_driver and the cpufreq_cpu_data array.
> Also, since many of the function pointers on cpufreq_driver may sleep when
> called, we have to grab them under the rcu_read_lock but call them after
> rcu_read_unlock();

Even I have started reading the RCU documentation now :)

> Cc: Viresh Kumar 
> Cc: "Rafael J. Wysocki" 
> Signed-off-by: Nathan Zimmer 
> ---
>  drivers/cpufreq/cpufreq.c | 312 
> +-
>  1 file changed, 224 insertions(+), 88 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> @@ -255,20 +258,21 @@ static inline void adjust_jiffies(unsigned long val, 
> struct cpufreq_freqs *ci)
>  void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int 
> state)
>  {
> struct cpufreq_policy *policy;
> -   unsigned long flags;
> +   u8 flags;

I think you can get rid of flags.

> BUG_ON(irqs_disabled());
>
> if (cpufreq_disabled())
> return;
>
> -   freqs->flags = cpufreq_driver->flags;
> pr_debug("notification %u of frequency transition to %u kHz\n",
> state, freqs->new);
>
> -   read_lock_irqsave(&cpufreq_driver_lock, flags);
> +   rcu_read_lock();
> +   flags = rcu_dereference(cpufreq_driver)->flags;

use freqs->flags here ...

> policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
> -   read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   rcu_read_unlock();
> +   freqs->flags = flags;
>
> switch (state) {
>
> @@ -277,7 +281,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs 
> *freqs, unsigned int state)
>  * which is not equal to what the cpufreq core thinks is
>  * "old frequency".
>  */
> -   if (!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
> +   if (!(flags & CPUFREQ_CONST_LOOPS)) {

and here.

> if ((policy) && (policy->cpu == freqs->cpu) &&
> (policy->cur) && (policy->cur != freqs->old)) {
> pr_debug("Warning: CPU frequency is"


> @@ -742,35 +773,39 @@ static int cpufreq_add_dev_interface(unsigned int cpu,

> -   write_lock_irqsave(&cpufreq_driver_lock, flags);
> +   spin_lock_irqsave(&cpufreq_driver_lock, flags);
> for_each_cpu(j, policy->cpus) {
> per_cpu(cpufreq_cpu_data, j) = policy;
> per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
> }
> -   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +   synchronize_rcu();

I don't think (but I could be wrong too :)) that we need a synchronize_rcu()
here. We need it only at places where we have updated the cpufreq_driver
pointer, as we aren't doing any RCU-specific read/update for
cpufreq_cpu_data.
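
The "grab the function pointer under the read lock, call it after unlocking" pattern from the patch description can be sketched in userspace. This is not RCU: the demo_ names are hypothetical, the read-side markers are empty stand-ins for rcu_read_lock()/rcu_read_unlock(), and a C11 atomic load stands in for rcu_dereference(). It only shows the shape of the code, where nothing that may sleep runs inside the read-side section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical driver structure, loosely modelled on the discussion above. */
struct demo_driver {
    unsigned int flags;
    unsigned int (*get)(unsigned int cpu);  /* may sleep in the real kernel */
};

/* Published driver pointer; NULL until a driver registers. */
static _Atomic(struct demo_driver *) demo_driver_ptr;

static void demo_read_lock(void)   { /* stand-in for rcu_read_lock() */ }
static void demo_read_unlock(void) { /* stand-in for rcu_read_unlock() */ }

static void demo_register(struct demo_driver *drv)
{
    atomic_store_explicit(&demo_driver_ptr, drv, memory_order_release);
}

/* Example callback used for illustration only. */
static unsigned int demo_fixed_get(unsigned int cpu)
{
    (void)cpu;
    return 1000000u;  /* pretend every CPU runs at 1 GHz, in kHz */
}

/* Snapshot the callback inside the read-side section, call it outside. */
static int demo_get_freq(unsigned int cpu, unsigned int *freq)
{
    unsigned int (*get)(unsigned int) = NULL;
    struct demo_driver *drv;

    assert(freq != NULL);

    demo_read_lock();
    drv = atomic_load_explicit(&demo_driver_ptr, memory_order_acquire);
    if (drv)
        get = drv->get;
    demo_read_unlock();

    if (!get)
        return -1;       /* no driver registered */
    *freq = get(cpu);    /* called after the read-side section has ended */
    return 0;
}
```

The same shape applies to plain data such as the flags field: copy the value into a local inside the section and use the local afterwards.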

Re: [PATCH] spi: tegra114: add spi driver

2013-02-20 Thread Laxman Dewangan


On Wednesday 20 February 2013 11:30 PM, Stephen Warren wrote:
> On 02/20/2013 10:57 AM, Mark Brown wrote:
>> On Wed, Feb 20, 2013 at 10:36:41AM -0700, Stephen Warren wrote:
>>> On 02/20/2013 10:31 AM, Mark Brown wrote:
>>>> Since we can extend the list of clocks it doesn't seem like
>>>> there's much issue here, especially if some of them are
>>>> optional?
>>>
>>> Yes, there's certainly a way to extend the binding in a
>>> backwards-compatible way.
>>> However, I have seen Rob and/or Grant push back on not fully
>>> defining bindings in the past - i.e. actively planning to
>>> initially create a minimal binding and extend it in the future,
>>> rather than completely defining it up-front.
>>
>> That sounds like the current stuff with a minimal definition is
>> OK?
>
> I'm personally OK with defining a minimal binding first and extending
> it later. But, I'm worried if when we actually try to extend the
> binding later, we'll get push-back.


Yes, for a given controller there are many input clock sources that can be
muxed, but we cannot use every option, because some of the sources change
based on DVFS policy or other constraints. For example, one controller has
PLLC as an input clock source, but PLLC is also used by the CPU and varies
with the requested CPU frequency. In that case we would not want to choose
PLLC as the clock source for that controller.

So we may need to provide the list of valid clock sources/options in the DT
file, and clock muxing should pick only from that list rather than from the
superset supported by the SoC.



Re: [patch 2/2] arm: Wire up kcmp syscall

2013-02-20 Thread Cyrill Gorcunov

On Wed, Feb 20, 2013 at 03:17:23PM -0800, Andrew Morton wrote:
> On Tue, 19 Feb 2013 11:07:03 +0400
> Cyrill Gorcunov  wrote:
> 
> > From: Alexander Kartashov 
> > Subject: arm: Wire up kcmp syscall
> > 
> > Signed-off-by: Alexander Kartashov 
> > Cc: Russell King 
> 
> This should have had signed-off-by:you, as you were on the patch's
> delivery path.

Ouch, sorry Andrew! Thanks!

[PATCH 2/2] block: partition: optimize memory allocation in check_partition

2013-02-20 Thread Ming Lei

Currently, sizeof(struct parsed_partitions) may be 64KB on 32-bit
architectures, so check_partition() can easily trigger a page allocation
failure, especially when block devices are hotplugged (USB mass storage,
MMC cards, ...). Felipe Balbi has observed the failure.

This patch makes the following optimizations to the allocation of struct
parsed_partitions to address the issue:

- make parsed_partitions.parts a pointer so that the memory it points to
can fit in a 32KB buffer, saving approximately 32KB

- vmalloc the buffer pointed to by parsed_partitions.parts, because
32KB is still a bit big for kmalloc

- since many devices have a partition count limit, allocate only
disk_max_parts() partitions instead of always allocating 256

Reported-by: Felipe Balbi 
Signed-off-by: Ming Lei 
---
 block/partition-generic.c |4 ++--
 block/partitions/check.c  |   37 -
 block/partitions/check.h  |4 +++-
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/partition-generic.c b/block/partition-generic.c
index 1cb4dec..789cdea 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -418,7 +418,7 @@ int rescan_partitions(struct gendisk *disk, struct 
block_device *bdev)
int p, highest, res;
 rescan:
if (state && !IS_ERR(state)) {
-   kfree(state);
+   free_partitions(state);
state = NULL;
}
 
@@ -525,7 +525,7 @@ rescan:
md_autodetect_dev(part_to_dev(part)->devt);
 #endif
}
-   kfree(state);
+   free_partitions(state);
return 0;
 }
 
diff --git a/block/partitions/check.c b/block/partitions/check.c
index bc90867..19ba207 100644
--- a/block/partitions/check.c
+++ b/block/partitions/check.c
@@ -14,6 +14,7 @@
  */
 
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <linux/ctype.h>
 #include <linux/genhd.h>
 
@@ -106,18 +107,45 @@ static int (*check_part[])(struct parsed_partitions *) = {
NULL
 };
 
+static struct parsed_partitions *allocate_partitions(struct gendisk *hd)
+{
+   struct parsed_partitions *state;
+   int nr;
+
+   state = kzalloc(sizeof(*state), GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   nr = disk_max_parts(hd);
+   state->parts = vzalloc(nr * sizeof(state->parts[0]));
+   if (!state->parts) {
+   kfree(state);
+   return NULL;
+   }
+
+   state->limit = nr;
+
+   return state;
+}
+
+void free_partitions(struct parsed_partitions *state)
+{
+   vfree(state->parts);
+   kfree(state);
+}
+
 struct parsed_partitions *
 check_partition(struct gendisk *hd, struct block_device *bdev)
 {
struct parsed_partitions *state;
int i, res, err;
 
-   state = kzalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
+   state = allocate_partitions(hd);
if (!state)
return NULL;
state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
if (!state->pp_buf) {
-   kfree(state);
+   free_partitions(state);
return NULL;
}
state->pp_buf[0] = '\0';
@@ -128,10 +156,9 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
 
-   state->limit = disk_max_parts(hd);
i = res = err = 0;
while (!res && check_part[i]) {
-   memset(&state->parts, 0, sizeof(state->parts));
+   memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
res = check_part[i++](state);
if (res < 0) {
/* We have hit an I/O error which we don't report now.
@@ -161,6 +188,6 @@ check_partition(struct gendisk *hd, struct block_device 
*bdev)
printk(KERN_INFO "%s", state->pp_buf);
 
free_page((unsigned long)state->pp_buf);
-   kfree(state);
+   free_partitions(state);
return ERR_PTR(res);
 }
diff --git a/block/partitions/check.h b/block/partitions/check.h
index 52b1003..eade17e 100644
--- a/block/partitions/check.h
+++ b/block/partitions/check.h
@@ -15,13 +15,15 @@ struct parsed_partitions {
int flags;
bool has_info;
struct partition_meta_info info;
-   } parts[DISK_MAX_PARTS];
+   } *parts;
int next;
int limit;
bool access_beyond_eod;
char *pp_buf;
 };
 
+void free_partitions(struct parsed_partitions *state);
+
 struct parsed_partitions *
 check_partition(struct gendisk *, struct block_device *);
 
-- 
1.7.9.5
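
For readers who want to try the allocation scheme outside the kernel, the two-step allocation in the patch above can be sketched in userspace C. The demo_ names and the reduced struct are hypothetical, and calloc()/free() stand in for kzalloc()/vzalloc() and kfree()/vfree(); this is an illustration of the idea, not the patch itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced version of one partition slot. */
struct demo_part {
    unsigned long long from;
    unsigned long long size;
    int flags;
};

/* The slot array is now a pointer sized at run time, not a fixed array. */
struct demo_parsed_partitions {
    struct demo_part *parts;
    int limit;
};

/* Allocate the small header and the slot array separately. */
static struct demo_parsed_partitions *demo_allocate_partitions(int max_parts)
{
    struct demo_parsed_partitions *state;

    if (max_parts <= 0)
        return NULL;

    state = calloc(1, sizeof(*state));          /* kzalloc() in the patch */
    if (!state)
        return NULL;

    /* Only as many zeroed slots as the device can have (vzalloc() there). */
    state->parts = calloc((size_t)max_parts, sizeof(state->parts[0]));
    if (!state->parts) {
        free(state);
        return NULL;
    }

    state->limit = max_parts;
    return state;
}

/* Release both allocations; mirrors free_partitions() in the patch. */
static void demo_free_partitions(struct demo_parsed_partitions *state)
{
    if (!state)
        return;
    free(state->parts);
    free(state);
}
```

A caller would pair demo_allocate_partitions(16) with demo_free_partitions() on every exit path, which is why the patch replaces each kfree(state) with free_partitions(state).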


[PATCH 1/2] block: partitions: mac: obey the state->limit constraint

2013-02-20 Thread Ming Lei

It isn't necessary to read information for partitions whose number is
equal to or greater than state->limit, since at most state->limit
partitions will be added inside rescan_partitions().

That is also what the other partition types do.

Signed-off-by: Ming Lei 
---
 block/partitions/mac.c |4 
 1 file changed, 4 insertions(+)

diff --git a/block/partitions/mac.c b/block/partitions/mac.c
index 11f688b..76d8ba6 100644
--- a/block/partitions/mac.c
+++ b/block/partitions/mac.c
@@ -63,6 +63,10 @@ int mac_partition(struct parsed_partitions *state)
put_dev_sector(sect);
return 0;
}
+
+   if (blocks_in_map >= state->limit)
+   blocks_in_map = state->limit - 1;
+
strlcat(state->pp_buf, " [mac]", PAGE_SIZE);
for (slot = 1; slot <= blocks_in_map; ++slot) {
int pos = slot * secsize;
-- 
1.7.9.5
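
The clamp added by the hunk above is small enough to check in isolation. The helper name below is hypothetical; the body mirrors the two added lines, and the precondition that limit is positive is an assumption of this sketch.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the added lines: slots are numbered from 1,
 * so the highest slot that may be visited is limit - 1.
 */
static int demo_clamp_blocks_in_map(int blocks_in_map, int limit)
{
    assert(limit > 0);  /* assumption: a disk always allows at least one slot */

    if (blocks_in_map >= limit)
        blocks_in_map = limit - 1;
    return blocks_in_map;
}
```

With a limit of 256, a map claiming 300 entries is clamped to 255, so the loop `for (slot = 1; slot <= blocks_in_map; ++slot)` never reaches slot 256.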


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

On 02/20/2013 10:05 PM, Mike Galbraith wrote:
> On Wed, 2013-02-20 at 14:32 +0100, Peter Zijlstra wrote: 
>> On Wed, 2013-02-20 at 11:49 +0100, Ingo Molnar wrote:
>>
>>> The changes look clean and reasonable, 
>>
>> I don't necessarily agree, note that O(n^2) storage requirement that
>> Michael failed to highlight ;-)
> 
> (yeah, I mentioned that needs to shrink.. a lot)

Exactly, and I'm going to apply the suggestion now :)

> 
>>> any ideas exactly *why* it speeds up?
>>
>> That is indeed the most interesting part.. There's two parts to
>> select_task_rq_fair(), the 'regular' affine wakeup path, and the
>> fork/exec find_idlest_goo() path. At the very least we need to quantify
>> which of these two parts contributes most to the speedup.
>>
>> In the power balancing discussion we already noted that the
>> find_idlest_goo() is in need of attention.
> 
> Yup, even little stuff like break off the search when load is zero..

Agreed, searching through a bunch of idle cpus and their subsets doesn't
make sense...

Regards,
Michael Wang

> unless someone is planning on implementing anti-idle 'course ;-)
> 
> -Mike
> 
> 

[GIT PULL] exynos-dp updates for v3.9

2013-02-20 Thread Jingoo Han

Hi Linus,

Florian, the fbdev maintainer, has been very busy lately, so I am sending the
pull request for exynos-dp for this merge window.

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

 Linux 3.8 (Mon Feb 18 15:58:34 2013 -0800)

are available in the git repository at:
  git://github.com/jingoo/linux.git tags/exynos-dp-3.9

for you to fetch changes up to bb80934325dab97b479815aed237ebec33ed1c57:

 video: exynos_dp: move disable_irq() to exynos_dp_suspend() (Tue Jan 29 18:26:05 2013 +0900)


exynos-dp updates for the v3.9:

- The missing function calls are fixed.


Ajay Kumar (1):
  video: exynos_dp: move disable_irq() to exynos_dp_suspend()

Jingoo Han (1):
  video: exynos_dp: add missing of_node_put()

drivers/video/exynos/exynos_dp_core.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)

--
Best regards,
Jingoo Han


[GIT PULL] samsung-fb updates for v3.9

2013-02-20 Thread Jingoo Han

Hi Linus,

Florian, the fbdev maintainer, has been very busy lately, so I am sending the
pull request for samsung-fb for this merge window.

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

 Linux 3.8 (Mon Feb 18 15:58:34 2013 -0800)

are available in the git repository at:
  git://github.com/jingoo/linux.git tags/samsung-fb-3.9

for you to fetch changes up to 5a415ae252d5922de9eadefabe8510115395fbc6:

 video: s3c-fb: Fix typo in definition of VIDCON1_VSTATUS_FRONTPORCH value (Sat Nov 17 21:31:00 2012 +0000)


samsung-fb updates for the v3.9:

- The bit definitions in the header file are updated.
- The dependency is fixed.


Jingoo Han (4):
  video: s3c-fb: use ARCH_ dependancy
  video: s3c-fb: remove duplicated S3C_FB_MAX_WIN
  video: s3c-fb: remove unnecessary brackets
  video: s3c-fb: add the bit definitions for CSC EQ709 and EQ601

Tomasz Figa (1):
  video: s3c-fb: Fix typo in definition of VIDCON1_VSTATUS_FRONTPORCH value

 drivers/video/Kconfig|3 +-
 include/video/samsung_fimd.h |  205 -
 2 files changed, 102 insertions(+), 106 deletions(-)

--
Best regards,
Jingoo Han


Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

On 02/20/2013 09:32 PM, Peter Zijlstra wrote:
> On Wed, 2013-02-20 at 11:49 +0100, Ingo Molnar wrote:
> 
>> The changes look clean and reasonable, 
> 
> I don't necessarily agree, note that O(n^2) storage requirement that
> Michael failed to highlight ;-)

Forgive me for not explaining this point in the cover letter, but it's
really not a big deal in my opinion...

And I'm going to apply Mike's suggestion and do the allocation when a cpu
becomes active, which will save some space :)

Regards,
Michael Wang

> 
>> any ideas exactly *why* it speeds up?
> 
> That is indeed the most interesting part.. There's two parts to
> select_task_rq_fair(), the 'regular' affine wakeup path, and the
> fork/exec find_idlest_goo() path. At the very least we need to quantify
> which of these two parts contributes most to the speedup.
> 
> In the power balancing discussion we already noted that the
> find_idlest_goo() is in need of attention.
> 

Re: general protection fault in do_msgrcv [3.8]

2013-02-20 Thread Stanislav Kinsbursky


20.02.2013 22:24, Dave Jones wrote:
> On Wed, Feb 20, 2013 at 12:23:22PM +0400, Stanislav Kinsbursky wrote:
>
>  > > Pid: 887, comm: trinity-child2 Not tainted 3.8.0+ #57 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
>  > > RIP: 0010:[]  [] do_msgrcv+0x22a/0x670
>  > > ...
>  > > Looks like Stanislav recently changed this code, so problem was likely introduced
>  > > in those changes.
>  > >
>  >
>  > Is it easy to reproduce? Do you use KVM?
>
> Only hit it once so far, no KVM
>
>  > There is a NULL selinux handler bug fix by Stephen Smalley here:
>  > https://lkml.org/lkml/2013/2/6/663
>  >
>  > But anyway, this bug fix affects only the case, when MSG_COPY flag is set.
>  >
>  > And this is not your case, I suppose?
>
> From my reading of the traces, I'd say not. It looks like I'm oopsing before
> we even get to the SELinux hooks.

Thanks, Dave. I've seen a couple of issues in roughly the same place when
running trinity in KVM.
It looks like the message queue itself was destroyed at some earlier point.
I have no idea yet how this can happen, but I'm still searching and will
let you know about any fixes.

> Dave




--
Best regards,
Stanislav Kinsbursky

Re: [PATCH linux-next] cpufreq: ondemand: Calculate gradient of CPU load to early increase frequency

2013-02-20 Thread Viresh Kumar

Hi Stratos,

On Thu, Feb 21, 2013 at 2:20 AM, Stratos Karafotis
 wrote:
> Instead of checking only the absolute value of CPU load_freq to increase
> frequency, we detect forthcoming CPU load rise and increase frequency
> earlier.
>
> Every sampling rate, we calculate the gradient of load_freq.
> If it is too steep we assume that the load most probably will
> go over up_threshold in next iteration(s). We reduce up_threshold
> by early_differential to achieve frequency increase in the current
> iteration.
>
> A new tuner early_demand is introduced to enable this functionality
> (disabled by default). Also we use new tuners to control early demand:
>
> - early_differential: controls the final up_threshold
> - grad_up_threshold: over this gradient of load we will decrease
> up_threshold by early_differential.
>
> Signed-off-by: Stratos Karafotis 

Sorry for this, but I already have a patchset which has changed these files
to some extent. Can you please rebase on top of it? Actually my patchset
is already accepted; it's just that Rafael didn't want to take it for 3.9.

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-for-3.10

Back to your patch:

Following is what I understood about this patch:
- The only case where this code comes into the picture is when the load is
below up_threshold.
- And we see a steep rise in the load from the previous sample.

i.e. (with the default values)

UP_THRESHOLD   (80)
GRAD_UP_THESHOLD  (50)
EARLY_DIFFERENTIAL (45)

If the load was 10 previously and it rose to 80 > load >= 60, we will
make up_threshold 80 - 45 = 35, which is lower than grad_up_threshold :)

Isn't it strange?

So you probably don't need the early_differential tunable at all.
Rather, just increase the frequency without doing this calculation:

if (load_freq > od_tuners.up_threshold * policy->cur) {

> diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
> b/drivers/cpufreq/cpufreq_ondemand.c
> index f3eb26c..458806f 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -30,6 +30,8 @@
>  #define DEF_FREQUENCY_DOWN_DIFFERENTIAL (10)
>  #define DEF_FREQUENCY_UP_THRESHOLD (80)
>  #define DEF_SAMPLING_DOWN_FACTOR   (1)
> +#define DEF_GRAD_UP_THESHOLD   (50)

s/THESHOLD/THRESHOLD

> @@ -170,11 +175,29 @@ static void od_check_cpu(int cpu, unsigned int 
> load_freq)
>  {
> struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
> struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
> +   unsigned int up_threshold = od_tuners.up_threshold;
> +   unsigned int grad;
>
> dbs_info->freq_lo = 0;
>
> +   /*
> +* Calculate the gradient of load_freq. If it is too steep we assume
> +* that the load will go over up_threshold in next iteration(s). We
> +* reduce up_threshold by early_differential to achieve frequency
> +* increase earlier
> +*/
> +   if (od_tuners.early_demand) {
> +   if (load_freq > dbs_info->prev_load_freq) {

&& (load_freq < od_tuners.up_threshold * policy->cur) ??

> +   grad = load_freq - dbs_info->prev_load_freq;
> +
> +   if (grad > od_tuners.grad_up_threshold * policy->cur)
> +   up_threshold -= od_tuners.early_differential;
> +   }
> +   dbs_info->prev_load_freq = load_freq;
> +   }
> +
> /* Check for frequency increase */
> -   if (load_freq > od_tuners.up_threshold * policy->cur) {
> +   if (load_freq > up_threshold * policy->cur) {
> /* If switching to max speed, apply sampling_down_factor */
> if (policy->cur < policy->max)
> dbs_info->rate_mult =
> @@ -438,12 +461,26 @@ static ssize_t store_powersave_bias(struct kobject *a, 
> struct attribute *b,
> return count;
>  }

> +show_one(od, early_demand, early_demand);

What about making the other two tunables rw?
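
The arithmetic questioned in this review (80 - 45 = 35 with the default values) can be reproduced with a userspace sketch of the proposed check. The demo_ names and the flat structs are hypothetical; the logic follows the quoted hunk of od_check_cpu(), with load_freq taken as load percentage times current frequency, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copies of the tunables named in the patch. */
struct demo_tuners {
    unsigned int up_threshold;        /* default 80 */
    unsigned int grad_up_threshold;   /* default 50 */
    unsigned int early_differential;  /* default 45 */
    bool early_demand;
};

struct demo_cpu_state {
    unsigned int prev_load_freq;
};

/*
 * Return the up_threshold that would be used for this sample: lowered by
 * early_differential when load_freq rose by more than
 * grad_up_threshold * cur_freq since the previous sample.
 */
static unsigned int demo_effective_up_threshold(const struct demo_tuners *t,
                                                struct demo_cpu_state *s,
                                                unsigned int load_freq,
                                                unsigned int cur_freq)
{
    unsigned int up_threshold = t->up_threshold;

    if (t->early_demand) {
        if (load_freq > s->prev_load_freq) {
            unsigned int grad = load_freq - s->prev_load_freq;

            if (grad > t->grad_up_threshold * cur_freq)
                up_threshold -= t->early_differential;
        }
        s->prev_load_freq = load_freq;
    }
    return up_threshold;
}

/* The frequency-increase test from the hunk, using the effective threshold. */
static bool demo_should_increase(unsigned int load_freq,
                                 unsigned int up_threshold,
                                 unsigned int cur_freq)
{
    return load_freq > up_threshold * cur_freq;
}
```

With the defaults and a jump from 10% to 70% load at a current frequency of 1000, the effective threshold becomes 35 and the increase fires; a second sample at the same 70% load has no gradient, so the threshold is back at 80 and the increase does not fire.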

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

On 02/20/2013 09:25 PM, Peter Zijlstra wrote:
> On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote:
>> +struct sched_balance_map {
>> +   struct sched_domain **sd[SBM_MAX_TYPE];
>> +   int top_level[SBM_MAX_TYPE];
>> +   struct sched_domain *affine_map[NR_CPUS];
>> +};
> 
> Argh.. affine_map is O(n^2) in nr_cpus, that's not cool.

You are right, it costs space in order to accelerate the system. I've
calculated the cost once before (I'm really not good at this, please let
me know if I made any silly mistakes...). The size of the struct is:

        SBM_MAX_TYPE * size of pointer * domain level
        SBM_MAX_TYPE * size of int
        NR_CPUS * size of pointer
        padding

So for my 64-bit box, which has 12 cpus and 3 domain levels, the struct
size is:

        3 * size of pointer * 3 = 9 pointer
        3 * size of int         = 3 int
        12 * size of pointer    = 12 pointer
        padding

        = 3 int + 21 pointer + padding

With one struct per cpu, the final cost across 12 cpus is 36 int and
252 pointer; add some padding and it won't exceed 5K, not a big deal.

Now suppose a big 64-bit system with 1000 cpus and 10 levels (I have no
idea how to calculate the level from nodes, 10 is big in my mind...); the
struct size is:

        3 * size of pointer * 10 = 30 pointer
        3 * size of int          = 3 int
        1000 * size of pointer   = 1000 pointer
        padding

        = 3 int + 1030 pointer + padding

With one struct per cpu, the final cost across 1000 cpus is 3000 int and
1030000 pointer, plus some padding, but it won't be bigger than 10M, which
is not a big deal for a system with 1000 cpus either.
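
The counting above can be double-checked with a few lines of C. This is a sketch that follows the message's own accounting (three ints, SBM_MAX_TYPE * levels pointers and NR_CPUS pointers per cpu, padding ignored); the demo_ names are hypothetical and the value 3 for SBM_MAX_TYPE is taken from the arithmetic in the message, not from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SBM_MAX_TYPE 3  /* assumption: the "3 *" factor used above */

struct demo_sbm_cost {
    size_t ints;
    size_t pointers;
};

/* Cost of one per-cpu map, counted the way the message counts it. */
static struct demo_sbm_cost demo_sbm_per_cpu(size_t nr_cpus, size_t levels)
{
    struct demo_sbm_cost c;

    c.ints = DEMO_SBM_MAX_TYPE;
    c.pointers = DEMO_SBM_MAX_TYPE * levels + nr_cpus;
    return c;
}

/* One map per cpu, so the total grows as nr_cpus squared in pointers. */
static struct demo_sbm_cost demo_sbm_total(size_t nr_cpus, size_t levels)
{
    struct demo_sbm_cost c = demo_sbm_per_cpu(nr_cpus, levels);

    c.ints *= nr_cpus;
    c.pointers *= nr_cpus;
    return c;
}

/* Convert a count into bytes on the machine running the sketch. */
static size_t demo_sbm_bytes(struct demo_sbm_cost c)
{
    assert(sizeof(void *) > 0);
    return c.ints * sizeof(int) + c.pointers * sizeof(void *);
}
```

For 12 cpus and 3 levels this gives 36 ints and 252 pointers; for 1000 cpus and 10 levels it gives 3000 ints and 1030000 pointers, which is where the quadratic term Peter objected to shows up.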

Regards,
Michael Wang

> 

Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation

On 02/20/2013 09:21 PM, Peter Zijlstra wrote:
> On Tue, 2013-01-29 at 17:09 +0800, Michael Wang wrote:
>> +   for_each_possible_cpu(cpu) {
>> +   sbm = &per_cpu(sbm_array, cpu);
>> +   node = cpu_to_node(cpu);
>> +   size = sizeof(struct sched_domain *) * sbm_max_level;
>> +
>> +   for (type = 0; type < SBM_MAX_TYPE; type++) {
>> +   sbm->sd[type] = kmalloc_node(size, GFP_KERNEL,
>> node);
>> +   WARN_ON(!sbm->sd[type]);
>> +   if (!sbm->sd[type])
>> +   goto failed;
>> +   }
>> +   }
> 
> You can't readily use kmalloc_node() here, cpu_to_node() might return an
> invalid node for offline cpus here.
> 
> Also see: 2ea45800d8e1c3c51c45a233d6bd6289a297a386

Hi, Peter

Thanks for your reply. I hadn't noticed this point. Mike suggested doing
the allocation in a notifier when a cpu comes online; I will try to use
that idea in the formal patch set.

Regards,
Michael Wang

> 

linux-next: manual merge of the signal tree with the powerpc tree

2013-02-20 Thread Stephen Rothwell

Hi Al,

Today's linux-next merge of the signal tree got conflicts in
arch/powerpc/kernel/signal_32.c and arch/powerpc/kernel/signal_64.c
between commit 2b0a576d15e0 ("powerpc: Add new transactional memory state
to the signal context") from the powerpc tree and commit 7cce246557bf
("powerpc: switch to generic sigaltstack") from the signal tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/powerpc/kernel/signal_32.c
index e4a88d3,802ab5e..0000000
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@@ -817,223 -513,7 +742,140 @@@ static long restore_user_regs(struct pt
return 0;
  }
  
 +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 +/*
 + * Restore the current user register values from the user stack, except for
 + * MSR, and recheckpoint the original checkpointed register state for processes
 + * in transactions.
 + */
 +static long restore_tm_user_regs(struct pt_regs *regs,
 +   struct mcontext __user *sr,
 +   struct mcontext __user *tm_sr)
 +{
 +  long err;
 +  unsigned long msr;
 +#ifdef CONFIG_VSX
 +  int i;
 +#endif
 +
 +  /*
 +   * restore general registers but not including MSR or SOFTE. Also
 +   * take care of keeping r2 (TLS) intact if not a signal.
 +   * See comment in signal_64.c:restore_tm_sigcontexts();
 +   * TFHAR is restored from the checkpointed NIP; TEXASR and TFIAR
 +   * were set by the signal delivery.
 +   */
 +  err = restore_general_regs(regs, tm_sr);
 +  err |= restore_general_regs(&current->thread.ckpt_regs, sr);
 +
 +  err |= __get_user(current->thread.tm_tfhar, &sr->mc_gregs[PT_NIP]);
 +
 +  err |= __get_user(msr, &sr->mc_gregs[PT_MSR]);
 +  if (err)
 +  return 1;
 +
 +  /* Restore the previous little-endian mode */
 +  regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 +
 +  /*
 +   * Do this before updating the thread state in
 +   * current->thread.fpr/vr/evr.  That way, if we get preempted
 +   * and another task grabs the FPU/Altivec/SPE, it won't be
 +   * tempted to save the current CPU state into the thread_struct
 +   * and corrupt what we are writing there.
 +   */
 +  discard_lazy_cpu_state();
 +
 +#ifdef CONFIG_ALTIVEC
 +  regs->msr &= ~MSR_VEC;
 +  if (msr & MSR_VEC) {
 +  /* restore altivec registers from the stack */
 +  if (__copy_from_user(current->thread.vr, &sr->mc_vregs,
 +   sizeof(sr->mc_vregs)) ||
 +  __copy_from_user(current->thread.transact_vr,
 +   &tm_sr->mc_vregs,
 +   sizeof(sr->mc_vregs)))
 +  return 1;
 +  } else if (current->thread.used_vr) {
 +  memset(current->thread.vr, 0, ELF_NVRREG * sizeof(vector128));
 +  memset(current->thread.transact_vr, 0,
 + ELF_NVRREG * sizeof(vector128));
 +  }
 +
 +  /* Always get VRSAVE back */
 +  if (__get_user(current->thread.vrsave,
 + (u32 __user *)&sr->mc_vregs[32]) ||
 +  __get_user(current->thread.transact_vrsave,
 + (u32 __user *)&tm_sr->mc_vregs[32]))
 +  return 1;
 +#endif /* CONFIG_ALTIVEC */
 +
 +  regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
 +
 +  if (copy_fpr_from_user(current, &sr->mc_fregs) ||
 +  copy_transact_fpr_from_user(current, &tm_sr->mc_fregs))
 +  return 1;
 +
 +#ifdef CONFIG_VSX
 +  regs->msr &= ~MSR_VSX;
 +  if (msr & MSR_VSX) {
 +  /*
 +   * Restore altivec registers from the stack to a local
 +   * buffer, then write this out to the thread_struct
 +   */
 +  if (copy_vsx_from_user(current, &sr->mc_vsregs) ||
 +  copy_transact_vsx_from_user(current, &tm_sr->mc_vsregs))
 +  return 1;
 +  } else if (current->thread.used_vsr)
 +  for (i = 0; i < 32 ; i++) {
 +  current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 +  current->thread.transact_fpr[i][TS_VSRLOWOFFSET] = 0;
 +  }
 +#endif /* CONFIG_VSX */
 +
 +#ifdef CONFIG_SPE
 +  /* SPE regs are not checkpointed with TM, so this section is
 +   * simply the same as in restore_user_regs().
 +   */
 +  regs->msr &= ~MSR_SPE;
 +  if (msr & MSR_SPE) {
 +  if (__copy_from_user(current->thread.evr, &sr->mc_vregs,
 +   ELF_NEVRREG * sizeof(u32)))
 +  return 1;
 +  } else if (current->thread.used_spe)
 +  memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
 +
 +  /* Always get SPEFSCR back */
 +  if (__get_user(current->thread.spefscr, (u32 __user *)&sr->mc_vregs
 + +

Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()

On 02/20/2013 06:49 PM, Ingo Molnar wrote:
[snip]
> 
> The changes look clean and reasonable, any ideas exactly *why* it 
> speeds up?
> 
> I.e. are there one or two key changes in the before/after logic 
> and scheduling patterns that you can identify as causing the 
> speedup?

Hi, Ingo

Thanks for your reply. Let me point out the key changes here (sorry for
not writing a good description in the cover letter).

The performance improvement from this patch set comes from:
1. delaying the call to wake_affine().
2. saving the loop that finds the proper sd.

The second point is obvious, and will help a lot when the sd topology is
deep (NUMA is supposed to make it deeper on large systems).

In my testing on a 12 cpu box, though, most of the benefit comes from the
first point, so let me describe it in detail.

The old logic when locating affine_sd is:

	if prev_cpu != curr_cpu
		if wake_affine()
			prev_cpu = curr_cpu
	new_cpu = select_idle_sibling(prev_cpu)
	return new_cpu

The new logic is the same as the old one if prev_cpu == curr_cpu, so let's
simplify the old logic to:

	if wake_affine()
		new_cpu = select_idle_sibling(curr_cpu)
	else
		new_cpu = select_idle_sibling(prev_cpu)

	return new_cpu

Actually that doesn't make sense.

I think wake_affine() is trying to check whether moving a task from
prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
does not breaking the balance mean that curr_cpu is better than prev_cpu
for searching for an idle cpu?

So the new logic in this patch set is:

	new_cpu = select_idle_sibling(prev_cpu)
	if idle_cpu(new_cpu)
		return new_cpu

	new_cpu = select_idle_sibling(curr_cpu)
	if idle_cpu(new_cpu) {
		if wake_affine()
			return new_cpu
	}

	return prev_cpu

And now, unless we are really going to move load from prev_cpu to
curr_cpu, we won't use wake_affine() any more.

So we avoid wake_affine() when the system load is low or high. For middle
load, the worst case is when we fail to locate an idle cpu in prev_cpu's
topology but succeed in locating one in curr_cpu's, but that rarely
happens, and the benchmark results support that.
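
To make the difference concrete, here is a minimal user-space sketch of the
two decision orders. It is only a toy model of the pseudocode above, not the
scheduler code: struct wake_model, model_wake_affine(), old_select() and
new_select() are hypothetical names, and the results of
select_idle_sibling(), idle_cpu() and wake_affine() are supplied as plain
fields so that the number of wake_affine() calls can be counted.

```c
#include <stdbool.h>

/*
 * Toy model of one wakeup. All names are hypothetical; the fields stand in
 * for what the real scheduler would compute.
 */
struct wake_model {
	int  prev_pick;          /* result of select_idle_sibling(prev_cpu) */
	int  curr_pick;          /* result of select_idle_sibling(curr_cpu) */
	bool prev_pick_idle;     /* idle_cpu(prev_pick) */
	bool curr_pick_idle;     /* idle_cpu(curr_pick) */
	bool affine_ok;          /* what wake_affine() would answer */
	int  wake_affine_calls;  /* how often the expensive check ran */
};

static bool model_wake_affine(struct wake_model *m)
{
	m->wake_affine_calls++;
	return m->affine_ok;
}

/* Old order: wake_affine() runs whenever prev_cpu != curr_cpu. */
int old_select(struct wake_model *m, int prev_cpu, int curr_cpu)
{
	if (prev_cpu != curr_cpu && model_wake_affine(m))
		return m->curr_pick;
	return m->prev_pick;
}

/*
 * New order: wake_affine() runs only when prev_cpu's side has no idle cpu
 * and curr_cpu's side does.
 */
int new_select(struct wake_model *m, int prev_cpu)
{
	if (m->prev_pick_idle)
		return m->prev_pick;
	if (m->curr_pick_idle && model_wake_affine(m))
		return m->curr_pick;
	return prev_cpu;
}
```

With both candidate cpus idle (the low-load case), old_select() calls the
modelled wake_affine() once, while new_select() returns the prev-side pick
without calling it at all.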

Some comparison below:

1. system load is low
	old logic cost:
		wake_affine()
		select_idle_sibling()
	new logic cost:
		select_idle_sibling()

2. system load is high
	old logic cost:
		wake_affine()
		select_idle_sibling()
	new logic cost:
		select_idle_sibling()
		select_idle_sibling()

3. system load is middle
	don't know

Case 1 saves the cost of wake_affine(); for case 3, the benchmarks show
at least no regression.

For case 2, it's a comparison between wake_affine() and
select_idle_sibling(): since the system load is high, wake_affine() costs
far more than select_idle_sibling(), and we save a lot according to the
benchmark results.

> 
> Such changes also typically have a chance to cause regressions 
> in other workloads - when that happens we need this kind of 
> information to be able to enact plan-B.

The benefit comes from avoiding unnecessary work, and the patch set is
supposed to only reduce the cost of the key function with minimal logic
changes. I cannot promise it benefits all workloads, but so far I have
not found a regression.

Regards,
Michael Wang

> 
> Thanks,
> 
>   Ingo
> 


Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Nicolas Pitre

On Wed, 20 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 10:43:18 -0500
> Nicolas Pitre  wrote:
> 
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > enforcing -O2, the rest being equal.
> > > 
> > > For today's compilers, unless the wind changes.
> > 
> > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> 
> Alas, not so for gcc 4.4 - I had forgotten I had tested
> Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> 
> https://patchwork.kernel.org/patch/2101491/
> 
> add -O2 to that test script and gcc 4.4 *always* emits calls to
> __bswap[sd]i2, even with -march=armv6k+.

Crap.  OK, assembly code is the way to go then.

> I'll try working on an assembly version given it probably
> makes more sense, future-gcc-immunity-wise.

Agreed.


Nicolas

Re: [GIT] Networking

2013-02-20 Thread Paul Gortmaker

On Wed, Feb 20, 2013 at 10:05 PM, Linus Torvalds
 wrote:
> On Wed, Feb 20, 2013 at 2:09 PM, David Miller  wrote:
>>
>> 15) Orphan and delete a bunch of pre-historic networking drivers from
>> Paul Gortmaker.
>
> Nooo You killed the 3c501 and 3c503 drivers! Snif.

Not true!  They were dead long ago, and here we were just providing
the service of a coroner, by removing the bodies vs. having them left to
decompose on the side of the street.

Paul.
--

>
> I wonder if they still worked..
>
>  Linus

Re: [RFC] arm: use built-in byte swap function

2013-02-20 Thread Kim Phillips

On Wed, 20 Feb 2013 10:43:18 -0500
Nicolas Pitre  wrote:

> On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > ... in which case there is no harm shipping a .c file and trivially 
> > > enforcing -O2, the rest being equal.
> > 
> > For today's compilers, unless the wind changes.
> 
> We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.

Alas, not so for gcc 4.4 - I had forgotten I had tested
Ubuntu/Linaro 4.4.7-1ubuntu2 here:

https://patchwork.kernel.org/patch/2101491/

add -O2 to that test script and gcc 4.4 *always* emits calls to
__bswap[sd]i2, even with -march=armv6k+.

I'll try working on an assembly version given it probably
makes more sense, future-gcc-immunity-wise.

Otherwise we're back to the old 'if GCC_VERSION >= 40500' in
arch/arm/include/asm/swab.h...

Kim
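
For reference, a version-gated byte swap of the kind mentioned above could
look roughly like the sketch below. This is not the contents of
arch/arm/include/asm/swab.h; SKETCH_GCC_VERSION, sketch_swab32_generic() and
sketch_swab32() are hypothetical names, and the 40500 threshold is simply
the one quoted in this thread. Only __builtin_bswap32() is a real GCC
builtin.

```c
#include <stdint.h>

/* GCC_VERSION-style number; 0 when the compiler does not define __GNUC__. */
#if defined(__GNUC__)
#define SKETCH_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define SKETCH_GCC_VERSION 0
#endif

/* Open-coded fallback that does not name the builtin. */
static inline uint32_t sketch_swab32_generic(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Use the builtin only on compilers at or above the quoted threshold. */
static inline uint32_t sketch_swab32(uint32_t x)
{
#if SKETCH_GCC_VERSION >= 40500
	return __builtin_bswap32(x);
#else
	return sketch_swab32_generic(x);
#endif
}
```

The version gate only selects which source form is compiled. Whether the
compiler then emits an inline instruction or a library call still depends on
the compiler version and flags, which is the concern raised above.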


Re: [PATCH 6/6] ubifs: Wait for page writeback to provide stable pages

2013-02-20 Thread Darrick J. Wong

On Wed, Jan 23, 2013 at 01:43:12PM -0800, Andrew Morton wrote:
> On Fri, 18 Jan 2013 17:13:16 -0800
> "Darrick J. Wong"  wrote:
> 
> > When stable pages are required, we have to wait if the page is just
> > going to disk and we want to modify it. Add proper callback to
> > ubifs_vm_page_mkwrite().
> > 
> > CC: Artem Bityutskiy 
> > CC: Adrian Hunter 
> > CC: linux-...@lists.infradead.org
> > From: Jan Kara 
> > Signed-off-by: Jan Kara 
> > Signed-off-by: Darrick J. Wong 
> 
> A couple of these patches had this From:Jan strangely embedded in the
> signoff area.  I have assumed that they were indeed authored by Jan.
> 
> Please note that authorship is indicated by putting the From: line
> right at the start of the changelog.
> 
> 
> I grabbed the patches.  They should appear in linux-next tomorrow if I
> can get the current pooppile to build.

Well... these patches have been banging around in -next for a month or so now.
As far as I know there haven't been any complaints.  Can we push these for 3.9?

--D

[GIT PULL] Please pull NFS client bugfixes

2013-02-20 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 88b62b915b0b7e25870eb0604ed9a92ba4bfc9f7:

  Linux 3.8-rc6 (2013-02-01 12:08:14 +1100)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.9-1

for you to fetch changes up to 666b3d803a511fbc9bc5e5ea8ce66010cf03ea13:

  NLM: Ensure that we resend all pending blocking locks after a reclaim (2013-02-19 12:18:27 -0500)


NFS client bugfixes for Linux 3.9

- Fix an Oops in the pNFS layoutget code
- Fix a number of NFSv4 and v4.1 state recovery deadlocks and hangs
  due to the interaction of the session drain lock and state management
  locks.
- Remove task->tk_xprt, which was hiding a lot of RCU dereferencing bugs
- Fix a long standing NFSv3 posix lock recovery bug.
- Revert commit 324d003b0cd82151adbaecefef57b73f7959a469. It turned out
  that the root cause of the deadlock was due to interactions with the
  workqueues that have now been resolved.


Jeff Layton (1):
  sunrpc: silence build warning in gss_fill_context

Tim Gardner (1):
  nfs: remove kfree() redundant null checks

Trond Myklebust (18):
  SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()
  SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback
  SUNRPC: Fix an RCU dereference in xs_local_rpcbind
  SUNRPC: Pass pointers to struct rpc_xprt to the congestion window
  SUNRPC: Fix an RCU dereference in xprt_reserve
  SUNRPC: Avoid RCU dereferences in the transport bind and connect code
  SUNRPC: Nuke the tk_xprt macro
  Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
  SUNRPC: Add missing static declaration to _gss_mech_get_by_name
  NFSv4: Allow the state manager to mark an open_owner as being recovered
  NFSv4.1: Prevent deadlocks between state recovery and file locking
  NFSv4.1: Don't lose locks when a server reboots during delegation return
  NFSv4: Fix up the return values of nfs4_open_delegation_recall
  NFSv4: Ensure delegation recall and byte range lock removal don't conflict
  NFSv4: Fix a reboot recovery race when opening a file
  NFSv4.1: Fix an ABBA locking issue with session and state serialisation
  NFSv4.1: Fix bulk recall and destroy of layouts
  NLM: Ensure that we resend all pending blocking locks after a reclaim

Weston Andros Adamson (1):
  NFSv4.1: Don't decode skipped layoutgets

fanchaoting (1):
  umount oops when remove blocklayoutdriver first

 fs/lockd/clntproc.c                   |   3 +
 fs/nfs/blocklayout/blocklayout.c      |   1 +
 fs/nfs/callback_proc.c                |  61 ++
 fs/nfs/delegation.c                   | 154 --
 fs/nfs/delegation.h                   |   1 +
 fs/nfs/getroot.c                      |   3 +-
 fs/nfs/inode.c                        |   5 +-
 fs/nfs/internal.h                     |   1 -
 fs/nfs/nfs4_fs.h                      |   4 +
 fs/nfs/nfs4proc.c                     | 133 -
 fs/nfs/nfs4state.c                    |  11 ++-
 fs/nfs/objlayout/objio_osd.c          |   1 +
 fs/nfs/pnfs.c                         | 150 -
 fs/nfs/pnfs.h                         |   7 +-
 fs/nfs/super.c                        |  49 ---
 fs/nfs/unlink.c                       |   5 +-
 include/linux/sunrpc/sched.h          |   1 -
 include/linux/sunrpc/xprt.h           |   6 +-
 net/sunrpc/auth_gss/auth_gss.c        |   5 +-
 net/sunrpc/auth_gss/gss_mech_switch.c |   4 +-
 net/sunrpc/clnt.c                     |  16 ++--
 net/sunrpc/xprt.c                     |  21 +++--
 net/sunrpc/xprtrdma/rpc_rdma.c        |   4 +-
 net/sunrpc/xprtrdma/transport.c       |   7 +-
 net/sunrpc/xprtrdma/xprt_rdma.h       |   6 +-
 net/sunrpc/xprtsock.c                 |  16 ++--
 26 files changed, 415 insertions(+), 260 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [GIT] Networking

2013-02-20 Thread David Miller

From: Linus Torvalds 
Date: Wed, 20 Feb 2013 19:12:37 -0800

> On Wed, Feb 20, 2013 at 7:05 PM, Linus Torvalds
>  wrote:
>>
>> Nooo You killed the 3c501 and 3c503 drivers! Snif.
> 
> .. but thank gods, the 3c509 still exists in the tree. I was worried
> for a minute.

Don't worry, the 3c509 will have its day of reckoning too at
some point. :-)


Re: [PATCH v2] X.509: Support parse long form of length octets in Authority Key Identifier

2013-02-20 Thread joeyli

On Wed, 2013-02-20 at 12:49 +, David Howells wrote:
> Chun-Yi Lee  wrote:
> 
> > Per X.509 spec in 4.2.1.1 section, the structure of Authority Key
> > Identifier Extension is:
> > 
> >    AuthorityKeyIdentifier ::= SEQUENCE {
> >       keyIdentifier             [0] KeyIdentifier           OPTIONAL,
> >       authorityCertIssuer       [1] GeneralNames            OPTIONAL,
> >       authorityCertSerialNumber [2] CertificateSerialNumber OPTIONAL  }
> > 
> >KeyIdentifier ::= OCTET STRING
> > 
> > When a certificate also provides
> > authorityCertIssuer and authorityCertSerialNumber then the length of
> > AuthorityKeyIdentifier SEQUENCE is likely to long form format.
> > e.g.
> >The example certificate demos/tunala/A-server.pem in openssl source:
> > 
> > X509v3 Authority Key Identifier:
> > keyid:49:FB:45:72:12:C4:CC:E1:45:A1:D3:08:9E:95:C4:2C:6D:55:3F:17
> > DirName:/C=NZ/L=Wellington/O=Really Irresponsible Authorisation 
> > Authority (RIAA)/OU=Cert-stamping/CN=Jackov 
> > al-Trades/emailAddress=none@fake.domain
> > serial:00
> > 
> > Current parsing rule of OID_authorityKeyIdentifier only take care the
> > short form format, it causes load certificate to modsign_keyring fail:
> > 
> > [   12.061147] X.509: Extension: 47
> > [   12.075121] MODSIGN: Problem loading in-kernel X.509 certificate (-74)
> > 
> > So, this patch add the parsing rule for support long form format against
> > Authority Key Identifier.
> > 
> > v2:
> >  - Removed comma from author's name.
> >  - Moved 'Short Form length' comment inside the if-body.
> >  - Changed the type of sub to size_t.
> >  - Use ASN1_INDEFINITE_LENGTH rather than writing 0x80 and 127.
> >  - Moved the key_len's value assignment before alter v.
> >  - Fixed the typo of octets.
> >  - Add 2 to v before entering the loop for calculate the length.
> >  - Removed the comment of check vlen.
> > 
> > Cc: Rusty Russell 
> > Cc: Josh Boyer 
> > Cc: Randy Dunlap 
> > Cc: Herbert Xu 
> > Cc: "David S. Miller" 
> > Signed-off-by: Chun-Yi Lee 
> 
> Acked-by: David Howells 
> 

Thanks to David for the review and confirmation.

Joey Lee
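
Since the patch body is not quoted in this message, the following is only a
sketch of what "short form" versus "long form" length octets mean in ASN.1
BER/DER encoding. It is not the code from the patch: sketch_asn1_length() and
SKETCH_INDEFINITE_LENGTH are hypothetical names (the changelog mentions the
kernel's ASN1_INDEFINITE_LENGTH, which plays the role of the 0x80 marker
here).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INDEFINITE_LENGTH 0x80	/* stand-in for the 0x80 marker */

/*
 * Decode one length field starting at buf[0].
 * Returns the number of octets the length field occupies, or 0 on error.
 * On success *len_out holds the decoded content length.
 */
size_t sketch_asn1_length(const uint8_t *buf, size_t buflen, size_t *len_out)
{
	size_t n, i, len = 0;

	if (buflen < 1)
		return 0;

	if (buf[0] < SKETCH_INDEFINITE_LENGTH) {
		/* Short form: the octet itself is the length (0..127). */
		*len_out = buf[0];
		return 1;
	}

	if (buf[0] == SKETCH_INDEFINITE_LENGTH)
		return 0;	/* indefinite length: rejected in this sketch */

	/* Long form: low 7 bits give the count of big-endian length octets. */
	n = buf[0] & 0x7f;
	if (n > sizeof(size_t) || n > buflen - 1)
		return 0;

	for (i = 0; i < n; i++)
		len = (len << 8) | buf[1 + i];

	*len_out = len;
	return 1 + n;
}
```

A SEQUENCE whose content is longer than 127 bytes, as in the certificate
above that carries authorityCertIssuer and authorityCertSerialNumber, has to
use the long form, so a parser that only handles the single-octet case will
misread it.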



RE: [PATCH V3] i2c: davinci: update to devm_* API

2013-02-20 Thread Vishwanathrao Badarkhe, Manish

Hi Wolfram

On Sat, Feb 16, 2013 at 00:39:43, Wolfram Sang wrote:
> On Thu, Feb 07, 2013 at 06:22:00PM +0530, Vishwanathrao Badarkhe, Manish 
> wrote:
> > Update the code to use devm_* API so that driver core will manage 
> > resources.
> > Also, if "devm_request_and_ioremap" fails return -EADDRNOTAVAIL 
> > instead of -EBUSY.
> > 
> > Signed-off-by: Vishwanathrao Badarkhe, Manish 
> 
> Basically OK, please resend when devm_ioremap_resource hits mainline in 3.9.

Thanks for pointing this out. Sure, I will resend this patch once 
devm_ioremap_resource hits mainline in 3.9.

Regards, 
Manish

Re: [PATCH v4] lockdep: check that no locks held at freeze time

On Wed, Feb 20, 2013 at 4:42 PM, Andrew Morton
 wrote:
> On Wed, 20 Feb 2013 16:28:07 -0800
> Mandeep Singh Baines  wrote:
>
>> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
>> > the two callsites triggered the trace.
>> >
>>
>> Let's say there was a try_to_freeze() that got inlined indirectly
>> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
>> the regular exit check and the try_to_freeze check be identical except
>> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
>> an object file you wouldn't know which check you hit.
>
> Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
> wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
> __func__.
>

Fair enough. I'll avoid using a macro unless/until it's actually needed.

Regards,
Mandeep

> Or uninline try_to_freeze().  If anything's calling that at high
> frequency, we have a problem.  And given the number of callsites,
> getting it into icache might result in a faster kernel...
>
> (Someone needs to teach __might_sleep() about __ratelimit())

[PATCH v5] lockdep: check that no locks held at freeze time

We shouldn't try_to_freeze if locks are held.

Changes since v1:
* LKML: <20130215111635.ga26...@gmail.com> Ingo Molnar
  * Added a msg string that gets passed in.
* LKML: <20130215154449.gd30...@redhat.com> Oleg Nesterov
  * Check PF_NOFREEZE in try_to_freeze().
Changes since v2:
* LKML: <20130216170605.gc4...@redhat.com> Oleg Nesterov
  * Avoid unnecessary PF_NOFREEZE check when !CONFIG_LOCKDEP.
* Mandeep Singh Baines
  * Generalize an exit specific printk.
Changes since v3:
* LKML: <20130220223013.ga15...@redhat.com> Oleg Nesterov
  * Remove stale vfork comment from commit message.
Changes since v4:
* LKML: <20130220152446.a65ff84f.a...@linux-foundation.org> Andrew Morton
  * Remove tsk param since tsk is always current.
  * Remove msg param, dump_stack() should tell us all we need to know.

Signed-off-by: Mandeep Singh Baines 
CC: Ingo Molnar 
CC: Oleg Nesterov 
CC: Tejun Heo 
CC: Andrew Morton 
CC: Rafael J. Wysocki 
---
 include/linux/debug_locks.h |  4 ++--
 include/linux/freezer.h     |  3 +++
 kernel/exit.c               |  2 +-
 kernel/lockdep.c            | 16 +++-
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 3bd46f7..a975de1 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -51,7 +51,7 @@ struct task_struct;
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
-extern void debug_check_no_locks_held(struct task_struct *task);
+extern void debug_check_no_locks_held(void);
 #else
 static inline void debug_show_all_locks(void)
 {
@@ -67,7 +67,7 @@ debug_check_no_locks_freed(const void *from, unsigned long len)
 }
 
 static inline void
-debug_check_no_locks_held(struct task_struct *task)
+debug_check_no_locks_held(void)
 {
 }
 #endif
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index e4238ce..c5bd118 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -3,6 +3,7 @@
 #ifndef FREEZER_H_INCLUDED
 #define FREEZER_H_INCLUDED
 
+#include <linux/debug_locks.h>
 #include 
 #include 
 #include 
@@ -43,6 +44,8 @@ extern void thaw_kernel_threads(void);
 
 static inline bool try_to_freeze(void)
 {
+   if (!(current->flags & PF_NOFREEZE))
+   debug_check_no_locks_held();
might_sleep();
if (likely(!freezing(current)))
return false;
diff --git a/kernel/exit.c b/kernel/exit.c
index b4df219..aff5bdb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -833,7 +833,7 @@ void do_exit(long code)
/*
 * Make sure we are holding no locks:
 */
-   debug_check_no_locks_held(tsk);
+   debug_check_no_locks_held();
/*
 * We can do this unlocked here. The futex code uses this flag
 * just to verify whether the pi state cleanup has been done
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 7981e5b..8e28f56 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -4083,7 +4083,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len)
 }
 EXPORT_SYMBOL_GPL(debug_check_no_locks_freed);
 
-static void print_held_locks_bug(struct task_struct *curr)
+static void print_held_locks_bug(void)
 {
if (!debug_locks_off())
return;
@@ -4092,21 +4092,19 @@ static void print_held_locks_bug(struct task_struct *curr)
 
printk("\n");
printk("=\n");
-   printk("[ BUG: lock held at task exit time! ]\n");
+   printk("[ BUG: %s/%d still has locks held! ]\n",
+  current->comm, task_pid_nr(current));
print_kernel_ident();
printk("-\n");
-   printk("%s/%d is exiting with locks still held!\n",
-   curr->comm, task_pid_nr(curr));
-   lockdep_print_held_locks(curr);
-
+   lockdep_print_held_locks(current);
printk("\nstack backtrace:\n");
dump_stack();
 }
 
-void debug_check_no_locks_held(struct task_struct *task)
+void debug_check_no_locks_held(void)
 {
-   if (unlikely(task->lockdep_depth > 0))
-   print_held_locks_bug(task);
+   if (unlikely(current->lockdep_depth > 0))
+   print_held_locks_bug();
 }
 
 void debug_show_all_locks(void)
-- 
1.7.12.4
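
The effect of the try_to_freeze() hunk can be illustrated with a small
user-space model. This is a sketch only: struct sketch_task,
SKETCH_PF_NOFREEZE (with a placeholder value), sketch_bug_reports and the
sketch_* functions are hypothetical stand-ins for current, PF_NOFREEZE and
the lockdep helpers, and returning true stands in for actually entering the
refrigerator.

```c
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_PF_NOFREEZE 0x1u	/* placeholder value, not the kernel's */

struct sketch_task {
	unsigned int flags;
	int lockdep_depth;	/* number of locks currently held */
	bool freezing;		/* has a freeze been requested? */
};

int sketch_bug_reports;		/* counts how often the check fired */

/* Models debug_check_no_locks_held(): complain if any lock is held. */
static void sketch_check_no_locks_held(const struct sketch_task *t)
{
	if (t->lockdep_depth > 0) {
		sketch_bug_reports++;
		fprintf(stderr, "BUG: task still has %d lock(s) held\n",
			t->lockdep_depth);
	}
}

/* Models the patched try_to_freeze(): check first, then maybe freeze. */
bool sketch_try_to_freeze(const struct sketch_task *t)
{
	/* Tasks that can never freeze are exempt from the check. */
	if (!(t->flags & SKETCH_PF_NOFREEZE))
		sketch_check_no_locks_held(t);
	if (!t->freezing)
		return false;
	return true;	/* the real code would enter the refrigerator here */
}
```

Note that in this model, as in the hunk above, the check runs on every call
for a freezable task, not only when a freeze is actually pending, so a
caller holding a lock is reported even if no freeze ever happens.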


Re: [GIT] Networking

2013-02-20 Thread Linus Torvalds

On Wed, Feb 20, 2013 at 7:05 PM, Linus Torvalds
 wrote:
>
> Nooo You killed the 3c501 and 3c503 drivers! Snif.

.. but thank gods, the 3c509 still exists in the tree. I was worried
for a minute.

  Linus

Re: [GIT] Networking

2013-02-20 Thread Linus Torvalds

On Wed, Feb 20, 2013 at 2:09 PM, David Miller  wrote:
>
> 15) Orphan and delete a bunch of pre-historic networking drivers from
> Paul Gortmaker.

Nooo You killed the 3c501 and 3c503 drivers! Snif.

I wonder if they still worked..

 Linus

Re: [Bug fix PATCH 0/2] Make whatever node kernel resides in un-hotpluggable.

2013-02-20 Thread Tang Chen

On 02/21/2013 05:36 AM, Andrew Morton wrote:
> On Wed, 20 Feb 2013 19:00:54 +0800
> Tang Chen wrote:
>
>> As mentioned by HPA before, when we are using movablemem_map=acpi, if all the
>> memory in SRAT is hotpluggable, then the kernel will have no memory to use, and
>> will fail to boot.
>>
>> Before parsing SRAT, memblock has already reserved some memory in
>> memblock.reserve, which is used by the kernel, such as storing the kernel
>> image. We are not able to prevent the kernel from using these memory. So,
>> these 2 patches make the node which the kernel resides in un-hotpluggable.
>
> I'm planning to roll all these into a single commit:
>
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix-fix.patch
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat-fix-fix-fix-fix-fix.patch
>
> for reasons of tree-cleanliness and to avoid bisection holes. They're
> at http://ozlabs.org/~akpm/mmots/broken-out/.
>
> Can you please check the changelog for
> acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch to see
> if it needs any updates due to all the fixup patches? If so, please
> send me the new changelog, thanks.

Hi Andrew,

Please use the following changelog for
acpi-memory-hotplug-support-getting-hotplug-info-from-srat.patch

We now provide an option for users who don't want to specify physical
memory addresses on the kernel command line.

/*
 * For movablemem_map=acpi:
 *
 * SRAT:           |_| |_| |_| |_| ..
 * node id:         0   1   1   2
 * hotpluggable:    n   y   y   n
 * movablemem_map:     |_| |_|
 *
 * Using movablemem_map, we can prevent memblock from allocating memory
 * on ZONE_MOVABLE at boot time.
 */

So users just specify movablemem_map=acpi, and the kernel will use the
hotpluggable info in SRAT to determine which memory ranges should be set
as ZONE_MOVABLE.

If all the memory ranges in SRAT are hotpluggable, then no memory can be
used by the kernel. But before parsing SRAT, memblock has already reserved
some memory ranges for other purposes, such as for the kernel image. We
cannot prevent the kernel from using this memory, so we need to exclude
these ranges even if the memory is hotpluggable.

Furthermore, there could be several memory ranges in the single node
which the kernel resides in. We may skip one range that has memory
reserved by memblock, but if the rest of the memory is too small, then the
kernel will fail to boot. So, make the whole node which the kernel resides
in un-hotpluggable. Then the kernel has enough memory to use.

NOTE: Using this option will reduce NUMA performance, because the whole
      node will be set as ZONE_MOVABLE and the kernel cannot use memory
      on it. If users don't want to lose NUMA performance, just don't
      use it.

> Also, please review the changelogging for these:

The following xxx-fix-... patches will also be rolled in, right?
I'll post the changelogs later.

Thanks. :)

Re: [PATCH] DMI: Always call dmi_present with DMI structure

2013-02-20 Thread Zhenzhong Duan


Hi
Ben had sent a patch fixing this issue. Would you like to test his patch?
https://lkml.org/lkml/2013/2/16/102
zduan
On 2013-02-21 02:12, H.J. Lu wrote:

Hi,

This patch:

commit 9f9c9cbb60576a1518d0bf93fb8e499cffccf377
Author: Zhenzhong Duan 
Date:   Thu Dec 20 15:05:14 2012 -0800

 drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists

 The right dmi version is in SMBIOS if it's zero in DMI region

 This issue was originally found from an oracle bug.
 One customer noticed system UUID doesn't match between dmidecode & uek2.

  - HP ProLiant BL460c G6 :
# cat /sys/devices/virtual/dmi/id/product_uuid
--4C48-3031-4D5030333531
# dmidecode | grep -i uuid
UUID: --484C-3031-4D5030333531

 From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
 network byte order.

 So we need to get dmi version to distinguish.  If version is 0.0, the
 real version is taken from the SMBIOS version.  This is part of original
 kernel comment in code.

causes a regression in 3.7, 3.8 and 3.9 kernels.   Before the change,
we only scan DMI structure.  Now smbios_present scans SMBIOS
entry point.  I have a machine which has invalid checksum in
SMBIOS entry point.  We wind up calling dmi_present with SMBIOS
entry point instead of DMI structure.  This patch changes smbios_present
to always call dmi_present with DMI structure.
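
The reason the DMI version matters for the UUID can be shown with a short
sketch. It is not the dmi_scan.c code: sketch_format_uuid() is a
hypothetical helper, and the version is assumed to be encoded as
(major << 8) | minor, so 2.6 is 0x0206. From that version on, the first
three UUID fields are treated as little-endian, which swaps the bytes
within each of those fields when printing.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format 16 raw UUID bytes as text. out must hold at least 37 bytes.
 * version is assumed to be (major << 8) | minor.
 */
void sketch_format_uuid(const uint8_t d[16], unsigned int version,
			char out[37])
{
	/* Order in which the raw bytes are printed for each encoding. */
	static const uint8_t be[16] = { 0, 1, 2, 3,  4, 5,  6, 7,
					8, 9, 10, 11, 12, 13, 14, 15 };
	static const uint8_t le[16] = { 3, 2, 1, 0,  5, 4,  7, 6,
					8, 9, 10, 11, 12, 13, 14, 15 };
	const uint8_t *order = (version >= 0x0206) ? le : be;
	char *p = out;
	int i;

	for (i = 0; i < 16; i++) {
		/* Dashes separate the 4-2-2-2-6 byte groups. */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		p += sprintf(p, "%02X", (unsigned int)d[order[i]]);
	}
}
```

If the version is read as 0.0, the big-endian branch is taken and the
second field comes out byte-swapped relative to dmidecode, which matches
the 4C48 versus 484C difference in the report quoted above.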





[GIT PULL] dlm updates for 3.9

2013-02-20 Thread David Teigland

Hi Linus,

Please pull dlm updates from tag:

git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git dlm-3.9

This includes a single patch to avoid excessive and
unnecessary scanning of rsbs to free.  Patch copied below.

Thanks,
Dave


dlm: avoid scanning unchanged toss lists

Keep track of whether a toss list contains any
shrinkable rsbs.  If not, dlm_scand can avoid
scanning the list for rsbs to shrink.  Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).
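
The idea can be sketched outside the kernel with a toy bucket. This is
only a model under stated assumptions: the toy_* types and functions are
invented here, and an array stands in for the rb-tree toss list.

```c
#include <stddef.h>

#define RTF_SHRINK 0x0001	/* mirrors DLM_RTF_SHRINK from the patch */

/* Toy stand-ins for an rsb and a hash bucket; not the real dlm types. */
struct toy_rsb {
	int is_dir_record;	/* non-shrinkable directory record */
	int expired;		/* toss time has elapsed */
	int freed;		/* already released by an earlier scan */
};

struct toy_bucket {
	struct toy_rsb *toss;
	size_t count;
	unsigned int flags;
};

/* An rsb was moved to the toss list: the bucket may now be shrinkable. */
void toy_toss(struct toy_bucket *b)
{
	b->flags |= RTF_SHRINK;
}

/* Returns how many entries were examined, so skipped scans are visible. */
size_t toy_shrink_bucket(struct toy_bucket *b)
{
	size_t i, examined = 0;
	int need_shrink = 0;

	if (!(b->flags & RTF_SHRINK))
		return 0;	/* last scan found nothing shrinkable */

	for (i = 0; i < b->count; i++) {
		struct toy_rsb *r = &b->toss[i];

		examined++;
		if (r->freed || r->is_dir_record)
			continue;
		need_shrink = 1;	/* found a shrink candidate */
		if (r->expired)
			r->freed = 1;
	}

	/* Remember whether the next scan is worth doing at all. */
	if (need_shrink)
		b->flags |= RTF_SHRINK;
	else
		b->flags &= ~RTF_SHRINK;
	return examined;
}
```

A bucket holding only directory records is walked once after a toss and
then skipped until something is tossed into it again.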

Signed-off-by: David Teigland 
---
 fs/dlm/dlm_internal.h |  3 +++
 fs/dlm/lock.c | 15 +++
 2 files changed, 18 insertions(+)

diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index 77c0f70..e7665c3 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -96,10 +96,13 @@ do { \
 }
 
 
+#define DLM_RTF_SHRINK 0x0001
+
 struct dlm_rsbtable {
 	struct rb_root		keep;
 	struct rb_root		toss;
 	spinlock_t		lock;
+	uint32_t		flags;
 };
 
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index a579f30..f750165 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1132,6 +1132,7 @@ static void toss_rsb(struct kref *kref)
 	rb_erase(&r->res_hashnode, &ls->ls_rsbtbl[r->res_bucket].keep);
 	rsb_insert(r, &ls->ls_rsbtbl[r->res_bucket].toss);
 	r->res_toss_time = jiffies;
+	ls->ls_rsbtbl[r->res_bucket].flags |= DLM_RTF_SHRINK;
 	if (r->res_lvbptr) {
 		dlm_free_lvb(r->res_lvbptr);
 		r->res_lvbptr = NULL;
@@ -1659,11 +1660,18 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 	char *name;
 	int our_nodeid = dlm_our_nodeid();
 	int remote_count = 0;
+	int need_shrink = 0;
 	int i, len, rv;
 
 	memset(&ls->ls_remove_lens, 0, sizeof(int) * DLM_REMOVE_NAMES_MAX);
 
 	spin_lock(&ls->ls_rsbtbl[b].lock);
+
+	if (!(ls->ls_rsbtbl[b].flags & DLM_RTF_SHRINK)) {
+		spin_unlock(&ls->ls_rsbtbl[b].lock);
+		return;
+	}
+
 	for (n = rb_first(&ls->ls_rsbtbl[b].toss); n; n = next) {
 		next = rb_next(n);
 		r = rb_entry(n, struct dlm_rsb, res_hashnode);
@@ -1679,6 +1687,8 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 			continue;
 		}
 
+		need_shrink = 1;
+
 		if (!time_after_eq(jiffies, r->res_toss_time +
 				   dlm_config.ci_toss_secs * HZ)) {
 			continue;
@@ -1710,6 +1720,11 @@ static void shrink_bucket(struct dlm_ls *ls, int b)
 		rb_erase(&r->res_hashnode, &ls->ls_rsbtbl[b].toss);
 		dlm_free_rsb(r);
 	}
+
+	if (need_shrink)
+		ls->ls_rsbtbl[b].flags |= DLM_RTF_SHRINK;
+	else
+		ls->ls_rsbtbl[b].flags &= ~DLM_RTF_SHRINK;
 	spin_unlock(&ls->ls_rsbtbl[b].lock);
 
 	/*
-- 
1.8.1.rc1.5.g7e0651a


[PATCH 4/4] tracing/syscalls: Annotate field-defining functions with __init

These two functions are called during kernel boot only.

Signed-off-by: Li Zefan 
---
 kernel/trace/trace_syscalls.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5329e13..a70fa19 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -232,7 +232,7 @@ static void free_syscall_print_fmt(struct ftrace_event_call *call)
kfree(call->print_fmt);
 }
 
-static int syscall_enter_define_fields(struct ftrace_event_call *call)
+static int __init syscall_enter_define_fields(struct ftrace_event_call *call)
 {
struct syscall_trace_enter trace;
struct syscall_metadata *meta = call->data;
@@ -255,7 +255,7 @@ static int syscall_enter_define_fields(struct ftrace_event_call *call)
return ret;
 }
 
-static int syscall_exit_define_fields(struct ftrace_event_call *call)
+static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
 {
struct syscall_trace_exit trace;
int ret;
-- 
1.8.0.2

[PATCH 3/4] tracing: Annotate event field-defining functions with __init

Those functions are called either during kernel boot or module init.

Before:

$ dmesg | grep 'Freeing unused kernel memory'
Freeing unused kernel memory: 1208k freed
Freeing unused kernel memory: 1360k freed
Freeing unused kernel memory: 1960k freed

After:

$ dmesg | grep 'Freeing unused kernel memory'
Freeing unused kernel memory: 1236k freed
Freeing unused kernel memory: 1388k freed
Freeing unused kernel memory: 1960k freed

Signed-off-by: Li Zefan 
---
 include/trace/ftrace.h  | 2 +-
 kernel/trace/trace_export.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 20b6005..dc18af3 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -324,7 +324,7 @@ static struct trace_event_functions ftrace_event_type_funcs_##call = {  \
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)   \
-static int notrace \
+static int notrace __init  \
 ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index e039906..4f6a91c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -129,7 +129,7 @@ static void __always_unused ftrace_check_##name(void)   \
 
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)\
-int\
+static int __init  \
 ftrace_define_fields_##name(struct ftrace_event_call *event_call)  \
 {  \
struct struct_name field;   \
-- 
1.8.0.2

Re: [PATCH v4 00/32] ldisc patchset

2013-02-20 Thread Shawn Guo

On Wed, Feb 20, 2013 at 03:02:47PM -0500, Peter Hurley wrote:
> [-cc Alan Cox]
> 
> Sebastian, please re-test your g_nokia+dummy_hcd testcase with
> this series.
> 
> Sasha and Dave, my trinity testbeds die in other areas right now;
> I would really appreciate if you would please re-test this series.
> 
> Michael and Shawn, I'd appreciate if you test with this series
> although I know it won't WARN because this patchset removes it.

On imx51 and imx6q:

Tested-by: Shawn Guo 


[PATCH 2/4] tracing: Add a helper function for event print functions

Move duplicate code in event print functions to a helper function.

This shrinks the size of the kernel by ~13K.

   text    data      bss      dec     hex filename
6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new

Signed-off-by: Li Zefan 
---
 include/linux/ftrace_event.h |  8 ++--
 include/trace/ftrace.h   | 23 ++-
 kernel/trace/trace_output.c  | 26 ++
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index a3d4895..d54d458 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -38,6 +38,12 @@ const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 const unsigned char *buf, int len);
 
+struct trace_iterator;
+struct trace_event;
+
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+  struct trace_event *event);
+
 /*
  * The trace entry - the most basic unit of tracing. This is what
  * is printed in the end as a single line in the trace output, such as:
@@ -93,8 +99,6 @@ enum trace_iter_flags {
 };
 
 
-struct trace_event;
-
 typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
  int flags, struct trace_event *event);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 40dc5e8..20b6005 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -227,29 +227,18 @@ static notrace enum print_line_t  \
 ftrace_raw_output_##call(struct trace_iterator *iter, int flags,   \
 struct trace_event *trace_event)   \
 {  \
-   struct ftrace_event_call *event;\
struct trace_seq *s = &iter->seq;   \
+   struct trace_seq __maybe_unused *p = &iter->tmp_seq;\
struct ftrace_raw_##call *field;\
-   struct trace_entry *entry;  \
-   struct trace_seq *p = &iter->tmp_seq;   \
int ret;\
\
-   event = container_of(trace_event, struct ftrace_event_call, \
-event);\
-   \
-   entry = iter->ent;  \
+   field = (typeof(field))iter->ent;   \
\
-   if (entry->type != event->event.type) { \
-   WARN_ON_ONCE(1);\
-   return TRACE_TYPE_UNHANDLED;\
-   }   \
-   \
-   field = (typeof(field))entry;   \
-   \
-   trace_seq_init(p);  \
-   ret = trace_seq_printf(s, "%s: ", event->name); \
+   ret = ftrace_raw_output_prep(iter, trace_event);\
if (ret)\
-   ret = trace_seq_printf(s, print);   \
+   return ret; \
+   \
+   ret = trace_seq_printf(s, print);   \
if (!ret)   \
return TRACE_TYPE_PARTIAL_LINE; \
\
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 194d796..aa92ac3 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -397,6 +397,32 @@ ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
 }
 EXPORT_SYMBOL(ftrace_print_hex_seq);
 
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+  struct trace_event *trace_event)
+{
+   struct ftrace_event_call *event;
+   struct trace_seq *s = &iter->seq;
+   struct trace_seq *p = &iter->tmp_seq;
+   struct trace_entry *entry;
+   int ret;
+
+   event = container_of(trace_event, struct ftrace_event_call, event);
+   entry = iter->ent;
+
+   if

[PATCH 1/4] tracing/syscalls: Anotate some functions static


Signed-off-by: Li Zefan 
---
 kernel/trace/trace_syscalls.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 7609dd6..5329e13 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -77,7 +77,7 @@ static struct syscall_metadata *syscall_nr_to_meta(int nr)
return syscalls_metadata[nr];
 }
 
-enum print_line_t
+static enum print_line_t
 print_syscall_enter(struct trace_iterator *iter, int flags,
struct trace_event *event)
 {
@@ -130,7 +130,7 @@ end:
return TRACE_TYPE_HANDLED;
 }
 
-enum print_line_t
+static enum print_line_t
 print_syscall_exit(struct trace_iterator *iter, int flags,
   struct trace_event *event)
 {
@@ -270,7 +270,7 @@ static int syscall_exit_define_fields(struct ftrace_event_call *call)
return ret;
 }
 
-void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
+static void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
@@ -305,7 +305,7 @@ void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }
 
-void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
+static void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 {
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
@@ -337,7 +337,7 @@ void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }
 
-int reg_event_syscall_enter(struct ftrace_event_call *call)
+static int reg_event_syscall_enter(struct ftrace_event_call *call)
 {
int ret = 0;
int num;
@@ -356,7 +356,7 @@ int reg_event_syscall_enter(struct ftrace_event_call *call)
return ret;
 }
 
-void unreg_event_syscall_enter(struct ftrace_event_call *call)
+static void unreg_event_syscall_enter(struct ftrace_event_call *call)
 {
int num;
 
@@ -371,7 +371,7 @@ void unreg_event_syscall_enter(struct ftrace_event_call *call)
 	mutex_unlock(&syscall_trace_lock);
 }
 
-int reg_event_syscall_exit(struct ftrace_event_call *call)
+static int reg_event_syscall_exit(struct ftrace_event_call *call)
 {
int ret = 0;
int num;
@@ -390,7 +390,7 @@ int reg_event_syscall_exit(struct ftrace_event_call *call)
return ret;
 }
 
-void unreg_event_syscall_exit(struct ftrace_event_call *call)
+static void unreg_event_syscall_exit(struct ftrace_event_call *call)
 {
int num;
 
@@ -459,7 +459,7 @@ unsigned long __init __weak arch_syscall_addr(int nr)
return (unsigned long)sys_call_table[nr];
 }
 
-int __init init_ftrace_syscalls(void)
+static int __init init_ftrace_syscalls(void)
 {
struct syscall_metadata *meta;
unsigned long addr;
-- 
1.8.0.2

[PATCH] staging: fix all sparse warnings in silicom/bypasslib/

2013-02-20 Thread Randy Dunlap

From: Randy Dunlap 

Fix all sparse warnings in drivers/staging/silicom/bypasslib/,
e.g.:

drivers/staging/silicom/bypasslib/bypass.c:471:21: warning: non-ANSI function declaration of function 'init_lib_module'
drivers/staging/silicom/bypasslib/bypass.c:478:25: warning: non-ANSI function declaration of function 'cleanup_lib_module'
drivers/staging/silicom/bypasslib/bypass.c:137:5: warning: symbol 'is_bypass_dev' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:182:5: warning: symbol 'is_bypass' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:192:5: warning: symbol 'get_bypass_slave' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:197:5: warning: symbol 'get_bypass_caps' was not declared. Should it be static?
drivers/staging/silicom/bypasslib/bypass.c:202:5: warning: symbol 'get_wd_set_caps' was not declared. Should it be static?
etc.

Signed-off-by: Randy Dunlap 
---
 drivers/staging/silicom/bypasslib/bypass.c |   94 +--
 1 file changed, 47 insertions(+), 47 deletions(-)

--- lnx-38.orig/drivers/staging/silicom/bypasslib/bypass.c
+++ lnx-38/drivers/staging/silicom/bypasslib/bypass.c
@@ -134,7 +134,7 @@ static int is_dev_sd(int if_index)
return (ret >= 0 ? 1 : 0);
 }
 
-int is_bypass_dev(int if_index)
+static int is_bypass_dev(int if_index)
 {
struct pci_dev *pdev = NULL;
struct net_device *dev = NULL;
@@ -179,7 +179,7 @@ int is_bypass_dev(int if_index)
return (ret < 0 ? -1 : ret);
 }
 
-int is_bypass(int if_index)
+static int is_bypass(int if_index)
 {
int ret = 0;
SET_BPLIB_INT_FN(is_bypass, int, if_index, ret);
@@ -189,70 +189,70 @@ int is_bypass(int if_index)
return ret;
 }
 
-int get_bypass_slave(int if_index)
+static int get_bypass_slave(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_slave, GET_BYPASS_SLAVE, if_index);
 }
 
-int get_bypass_caps(int if_index)
+static int get_bypass_caps(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_caps, GET_BYPASS_CAPS, if_index);
 }
 
-int get_wd_set_caps(int if_index)
+static int get_wd_set_caps(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_wd_set_caps, GET_WD_SET_CAPS, if_index);
 }
 
-int set_bypass(int if_index, int bypass_mode)
+static int set_bypass(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass, SET_BYPASS, if_index, bypass_mode);
 }
 
-int get_bypass(int if_index)
+static int get_bypass(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass, GET_BYPASS, if_index);
 }
 
-int get_bypass_change(int if_index)
+static int get_bypass_change(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_change, GET_BYPASS_CHANGE, if_index);
 }
 
-int set_dis_bypass(int if_index, int dis_bypass)
+static int set_dis_bypass(int if_index, int dis_bypass)
 {
DO_BPLIB_SET_ARG_FN(set_dis_bypass, SET_DIS_BYPASS, if_index,
dis_bypass);
 }
 
-int get_dis_bypass(int if_index)
+static int get_dis_bypass(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_dis_bypass, GET_DIS_BYPASS, if_index);
 }
 
-int set_bypass_pwoff(int if_index, int bypass_mode)
+static int set_bypass_pwoff(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass_pwoff, SET_BYPASS_PWOFF, if_index,
bypass_mode);
 }
 
-int get_bypass_pwoff(int if_index)
+static int get_bypass_pwoff(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_pwoff, GET_BYPASS_PWOFF, if_index);
 }
 
-int set_bypass_pwup(int if_index, int bypass_mode)
+static int set_bypass_pwup(int if_index, int bypass_mode)
 {
DO_BPLIB_SET_ARG_FN(set_bypass_pwup, SET_BYPASS_PWUP, if_index,
bypass_mode);
 }
 
-int get_bypass_pwup(int if_index)
+static int get_bypass_pwup(int if_index)
 {
DO_BPLIB_GET_ARG_FN(get_bypass_pwup, GET_BYPASS_PWUP, if_index);
 }
 
-int set_bypass_wd(int if_index, int ms_timeout, int *ms_timeout_set)
+static int set_bypass_wd(int if_index, int ms_timeout, int *ms_timeout_set)
 {
int data = ms_timeout, ret = 0;
if (is_dev_sd(if_index))
@@ -268,7 +268,7 @@ int set_bypass_wd(int if_index, int ms_t
return ret;
 }
 
-int get_bypass_wd(int if_index, int *ms_timeout_set)
+static int get_bypass_wd(int if_index, int *ms_timeout_set)
 {
int *data = ms_timeout_set, ret = 0;
if (is_dev_sd(if_index))
@@ -279,7 +279,7 @@ int get_bypass_wd(int if_index, int *ms_
return ret;
 }
 
-int get_wd_expire_time(int if_index, int *ms_time_left)
+static int get_wd_expire_time(int if_index, int *ms_time_left)
 {
int *data = ms_time_left, ret = 0;
if (is_dev_sd(if_index))
@@ -293,144 +293,144 @@ int get_wd_expire_time(int if_index, int
return ret;
 }
 
-int reset_bypass_wd_timer(int if_index)
+static int reset_bypass_wd_timer(int if_index)
 {
DO_BPLIB_GET_ARG_FN(reset_bypass_wd_timer, RESET_BYPASS_WD_TIMER,

Re: What does the PG_swapbacked of page flags actually mean?

2013-02-20 Thread common An

On Wed, Feb 20, 2013 at 6:43 PM, common An  wrote:
> PG_swapbacked is a bit for page->flags.
>
> In kernel code, its comment is "page is backed by RAM/swap". But I couldn't
> understand it.
> 1. Does the RAM mean DRAM? How page is backed by RAM?
> 2. When the page is page-out to swap file, the bit PG_swapbacked will be set
> to demonstrate this page is backed by swap. Is it right?
> 3. In general, when will call SetPageSwapBacked() to set the bit?

From: http://www.gossamer-threads.com/lists/linux/kernel/840692#840692

Every anonymous, tmpfs or shared memory segment page is potentially
swap backed. That is the whole point of the PG_swapbacked flag.

A page from a filesystem like ext3 or NFS cannot suddenly turn into
a swap backed page. This page "nature" is not changed during the
lifetime of a page.

But I am still a little confused.

>
> Could anybody kindly explain for me?
>
> Thanks very much.
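
One way to read the quoted explanation is as a property fixed when the
page is set up, based on what backs it. The toy model below only
illustrates that reading; the toy_* names and the enum are invented and
do not exist in the kernel.

```c
/* Hypothetical page kinds for illustration; the kernel has no such enum. */
enum toy_page_kind {
	TOY_ANON,	/* anonymous memory */
	TOY_TMPFS,	/* tmpfs page */
	TOY_SHM,	/* shared memory segment */
	TOY_FILE_EXT3,	/* page cache page of an ext3 file */
	TOY_FILE_NFS	/* page cache page of an NFS file */
};

#define TOY_PG_SWAPBACKED 0x1u

struct toy_page {
	enum toy_page_kind kind;
	unsigned int flags;
};

/* Decide the flag once, at setup time, from what backs the page. */
void toy_page_init(struct toy_page *p, enum toy_page_kind kind)
{
	p->kind = kind;
	p->flags = 0;
	if (kind == TOY_ANON || kind == TOY_TMPFS || kind == TOY_SHM)
		p->flags |= TOY_PG_SWAPBACKED;
}

/* 1 if the page would have to go to swap when evicted, 0 if it has a file. */
int toy_evicts_to_swap(const struct toy_page *p)
{
	return (p->flags & TOY_PG_SWAPBACKED) != 0;
}
```

In this reading the flag says where the page could go if evicted, not
whether it has been swapped out, which matches "this page nature is not
changed during the lifetime of a page".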

Re: linux-next: build failure after merge of the tip tree

2013-02-20 Thread Stephen Rothwell

Hi all,

On Thu, 14 Feb 2013 13:30:16 +1100 Stephen Rothwell wrote:
>
> After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> drivers/thermal/intel_powerclamp.c: In function 'clamp_thread':
> drivers/thermal/intel_powerclamp.c:360:21: error: 'MAX_USER_RT_PRIO' undeclared (first use in this function)
> 
> Caused by commit 8bd75c77b7c6 ("sched/rt: Move rt specific bits into new
> header file") interacting with commit d6d71ee4a14a ("PM: Introduce Intel
> PowerClamp Driver") from the thermal tree.
> 
> I applied this merge fix patch and can carry it as necessary:
> 
> From: Stephen Rothwell 
> Date: Thu, 14 Feb 2013 13:26:22 +1100
> Subject: [PATCH] sched/rt: fix PowerClamp Driver for define move
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/thermal/intel_powerclamp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index ab3ed90..b40b37c 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 

The above fix is now needed when the thermal tree is merged with Linus'
tree ...

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



[PATCH] ALSA: usb: Fix Processing Unit Descriptor parsers

2013-02-20 Thread Pawel Moll

Commit 99fc86450c439039d2ef88d06b222fd51a779176 "ALSA: usb-mixer:
parse descriptors with structs" introduced a set of useful parsers
for descriptors. Unfortunately the parsers for the Processing Unit
Descriptor came with a very subtle bug...

Functions uac_processing_unit_iProcessing() and
uac_processing_unit_specific() were indexing the baSourceID array
forgetting the fields before the iProcessing and process-specific
descriptors.

The problem was observed with Sound Blaster Extigy mixer,
where nNrModes in Up/Down-mix Processing Unit Descriptor
was accessed at offset 10 of the descriptor (value 0)
instead of offset 15 (value 7). As a result, the
control had interesting limit values:

Simple mixer control 'Channel Routing Mode Select',0
  Capabilities: volume volume-joined penum
  Playback channels: Mono
  Capture channels: Mono
  Limits: 0 - -1
  Mono: -1 [100%]

Fixed by starting from the bmControls, which was calculated
correctly, instead of baSourceID.

Now the mentioned control is fine:

Simple mixer control 'Channel Routing Mode Select',0
  Capabilities: volume volume-joined penum
  Playback channels: Mono
  Capture channels: Mono
  Limits: 0 - 6
  Mono: 0 [0%]
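
The offset arithmetic can be checked with a small sketch. It assumes the
UAC1 Processing Unit layout as I read it from the USB Audio Class 1.0
spec, and that the Extigy descriptor has one input pin and a one-byte
control field (an assumption that is consistent with offsets 10 and 15
above). The pu_* names are made up for this example.

```c
/*
 * Assumed UAC1 Processing Unit descriptor layout (byte offsets):
 *   0 bLength, 1 bDescriptorType, 2 bDescriptorSubtype, 3 bUnitID,
 *   4-5 wProcessType, 6 bNrInPins, 7.. baSourceID[bNrInPins],
 *   then bNrChannels, wChannelConfig (2 bytes), iChannelNames,
 *   bControlSize, bmControls[bControlSize], iProcessing,
 *   and finally the process-specific bytes.
 */
#define PU_BASOURCEID_OFFSET	7
#define PU_UAC1_FIXED_FIELDS	5	/* the five bytes between baSourceID and bmControls */

/* Offset of bmControls: skip the pins AND the fixed fields after them. */
unsigned int pu_bmcontrols_offset(unsigned int nr_in_pins)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + PU_UAC1_FIXED_FIELDS;
}

/* Offset of the process-specific part: after bmControls and iProcessing. */
unsigned int pu_specific_offset(unsigned int nr_in_pins,
				unsigned int control_size)
{
	return pu_bmcontrols_offset(nr_in_pins) + control_size + 1;
}

/* The pre-fix arithmetic: indexes baSourceID and forgets the fixed fields. */
unsigned int pu_specific_offset_buggy(unsigned int nr_in_pins,
				      unsigned int control_size)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + control_size + 1;
}
```

With one input pin and a control size of one, the buggy version lands on
offset 10 and the fixed version on offset 15.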

Signed-off-by: Pawel Moll 
---
 include/uapi/linux/usb/audio.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/usb/audio.h b/include/uapi/linux/usb/audio.h
index ac90037..d2314be 100644
--- a/include/uapi/linux/usb/audio.h
+++ b/include/uapi/linux/usb/audio.h
@@ -384,14 +384,16 @@ static inline __u8 uac_processing_unit_iProcessing(struct uac_processing_unit_de
 						   int protocol)
 {
 	__u8 control_size = uac_processing_unit_bControlSize(desc, protocol);
-	return desc->baSourceID[desc->bNrInPins + control_size];
+	return *(uac_processing_unit_bmControls(desc, protocol)
+			+ control_size);
 }
 
 static inline __u8 *uac_processing_unit_specific(struct uac_processing_unit_descriptor *desc,
 						 int protocol)
 {
 	__u8 control_size = uac_processing_unit_bControlSize(desc, protocol);
-	return &desc->baSourceID[desc->bNrInPins + control_size + 1];
+	return uac_processing_unit_bmControls(desc, protocol)
+			+ control_size + 1;
 }
 
 /* 4.5.2 Class-Specific AS Interface Descriptor */
-- 
1.7.10.4


linux-next: manual merge of the net-next tree with the mips tree

2013-02-20 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
include/linux/ssb/ssb_driver_gige.h between commit 111bd981e221 ("MIPS:
BCM47XX: add bcm47xx prefix in front of nvram function names") from the
mips tree and commit 180996c30517 ("ssb: get mac address from sprom
struct for gige driver") from the net-next tree.

I fixed it up (the latter seems to supersede the former, so I used that)
and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



RE: [PATCH] libertas sdio: remove CMD_FUNC_INIT call

2013-02-20 Thread Bing Zhao

Hi Lubomir,

> > > @@ -825,20 +825,6 @@ static void if_sdio_finish_power_on(struct if_sdio_card *card)
> > >
> > >   sdio_release_host(func);
> > >
> > > - /*
> > > -  * FUNC_INIT is required for SD8688 WLAN/BT multiple functions
> > > -  */
> > > - if (card->model == MODEL_8688) {
> > > - struct cmd_header cmd;
> > > -
> > > - memset(&cmd, 0, sizeof(cmd));
> > > -
> > > - lbs_deb_sdio("send function INIT command\n");
> > > - if (__lbs_cmd(priv, CMD_FUNC_INIT, &cmd, sizeof(cmd),
> > > - lbs_cmd_copyback, (unsigned long) &cmd))
> > > - netdev_alert(priv->dev, "CMD_FUNC_INIT cmd failed\n");
> > > - }
> > > -
> >
> > Removing FUNC_INIT could break things in some scenarios.
> > Could you please test the following case?
> >
> > 1. insmod libertas -> download firmware, send FUNC_INIT, ...
> > 2. rmmod libertas -> send FUNC_SHUTDOWN command to firmware; BT is still working.
> > 3. insmod libertas -> skip firmware downloading, send FUNC_INIT, ...
> >
> > If FUNC_INIT is removed, I don't expect step 3 to work.
> 
> In case btmrvl_sdio is loaded, the driver always locks up in FUNC_INIT
> upon probe time, thus I'm not able to proceed to further steps.
> 
> [  209.338953] [] (__schedule+0x610/0x764) from [] 
> (__lbs_cmd+0xb8/0x130
> [libertas])
> [  209.348340] [] (__lbs_cmd+0xb8/0x130 [libertas]) from 
> []
> (if_sdio_finish_power_on+0xec/0x1b0 [libertas_sdio])
> [  209.360136] [] (if_sdio_finish_power_on+0xec/0x1b0 
> [libertas_sdio]) from []
> (if_sdio_power_on+0x18c/0x20c [libertas_sdio])
> [  209.373052] [] (if_sdio_power_on+0x18c/0x20c [libertas_sdio]) 
> from []
> (if_sdio_probe+0x200/0x31c [libertas_sdio])
> [  209.385316] [] (if_sdio_probe+0x200/0x31c [libertas_sdio]) from 
> []
> (sdio_bus_probe+0x94/0xfc [mmc_core])
> [  209.396748] [] (sdio_bus_probe+0x94/0xfc [mmc_core]) from 
> []
> (driver_probe_device+0x12c/0x348)
> [  209.407214] [] (driver_probe_device+0x12c/0x348) from 
> []
> (__driver_attach+0x78/0x9c)
> [  209.416798] [] (__driver_attach+0x78/0x9c) from [] 
> (bus_for_each_dev+0x50/0x88)
> [  209.425946] [] (bus_for_each_dev+0x50/0x88) from []
> (bus_add_driver+0x108/0x268)
> [  209.435180] [] (bus_add_driver+0x108/0x268) from []
> (driver_register+0xa4/0x134)
> [  209.26] [] (driver_register+0xa4/0x134) from []
> (if_sdio_init_module+0x1c/0x3c [libertas_sdio])
> [  209.455339] [] (if_sdio_init_module+0x1c/0x3c [libertas_sdio]) 
> from []
> (do_one_initcall+0x98/0x174)
> [  209.466236] [] (do_one_initcall+0x98/0x174) from [] 
> (load_module+0x1c5c/0x1f80)
> [  209.475390] [] (load_module+0x1c5c/0x1f80) from []
> (sys_init_module+0x104/0x128)
> [  209.484632] [] (sys_init_module+0x104/0x128) from []
> (ret_fast_syscall+0x0/0x38)
> 
> In case btmrvl_sdio is _not_ loaded, insmod returns, but driver locks up
> waiting for FUNC_INIT to finish:
> 
> [  300.538859] [] (__schedule+0x610/0x764) from [] 
> (__lbs_cmd+0xb8/0x130
> [libertas])
> [  300.548600] [] (__lbs_cmd+0xb8/0x130 [libertas]) from 
> []
> (if_sdio_finish_power_on+0xec/0x1b0 [libertas_sdio])
> [  300.560398] [] (if_sdio_finish_power_on+0xec/0x1b0 
> [libertas_sdio]) from []
> (if_sdio_do_prog_firmware+0x414/0x454 [libertas_sdio])
> [  300.574052] [] (if_sdio_do_prog_firmware+0x414/0x454 
> [libertas_sdio]) from []
> (lbs_fw_loaded+0x24/0x58 [libertas])
> [  300.586907] [] (lbs_fw_loaded+0x24/0x58 [libertas]) from 
> []
> (request_firmware_work_func+0xb0/0xf4)
> [  300.597746] [] (request_firmware_work_func+0xb0/0xf4) from 
> []
> (process_one_work+0x348/0x6a8)
> [  300.608288] [] (process_one_work+0x348/0x6a8) from []
> (worker_thread+0x268/0x390)
> [  300.617630] [] (worker_thread+0x268/0x390) from [] 
> (kthread+0xc0/0xd4)
> [  300.625947] [] (kthread+0xc0/0xd4) from [] 
> (ret_from_fork+0x14/0x20)
> [  300.634135] 2 locks held by kworker/0:1/19:
> [  300.638383]  #0:  (events){.+.+.+}, at: [] 
> process_one_work+0x208/0x6a8
> [  300.646512]  #1:  ((_work->work)){+.+.+.}, at: [] 
> process_one_work+0x208/0x6a8

There seems to be a race condition in lbs_thread().

At line 582:
 582 if (!priv->fw_ready)
 583 continue;

The fw_ready is 0, so you never get the chance to execute the FUNC_INIT command.

 617 /* Execute the next command */
 618 if (!priv->dnld_sent && !priv->cur_cmd)
 619 lbs_execute_next_command(priv);
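
To make the suspected ordering problem concrete, here is a toy model of
one pass through that loop. It is only a sketch: the struct is a cut-down
stand-in for the driver's private data, and the queued/executed counters
are invented for illustration.

```c
/* Cut-down stand-in for the driver state; not the real struct lbs_private. */
struct toy_priv {
	int fw_ready;	/* set once firmware is considered up */
	int dnld_sent;	/* a download is outstanding */
	int cur_cmd;	/* nonzero while a command is in flight */
	int queued;	/* commands waiting to run (invented counter) */
	int executed;	/* commands started so far (invented counter) */
};

/* One pass of the worker loop. Returns 1 if a command was started. */
int toy_thread_pass(struct toy_priv *priv)
{
	if (!priv->fw_ready)
		return 0;	/* the "continue": nothing below ever runs */

	if (!priv->dnld_sent && !priv->cur_cmd && priv->queued > 0) {
		priv->queued--;
		priv->executed++;	/* stands in for lbs_execute_next_command() */
		return 1;
	}
	return 0;
}
```

If FUNC_INIT is queued while fw_ready is still 0, every pass returns
early and the caller waiting for the command never wakes up; setting
fw_ready before queuing lets the next pass pick it up.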


Could you try the following change?

diff --git a/drivers/net/wireless/libertas/if_sdio.c b/drivers/net/wireless/libe
index 739309e..8f5d977 100644
--- a/drivers/net/wireless/libertas/if_sdio.c
+++ b/drivers/net/wireless/libertas/if_sdio.c
@@ -825,6 +825,8 @@ static void if_sdio_finish_power_on(struct if_sdio_card *car

sdio_release_host(func);

+   priv->fw_ready = 1;
+
/*
 * FUNC_INIT is required for SD8688 WLAN/BT multiple functions
 */
@@ -839,7 +841,6 @@ static void if_sdio_finish_power_on(struct if_sdio_card *car

[PATCH] x86: mm: Fix vmalloc_fault oops during lazy MMU updates

2013-02-20 Thread Samu Kallio

In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.

One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:

- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
  which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
  PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries

Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.
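
The effect of the deferral can be modelled in a few lines of userspace C.
This is a toy under stated assumptions: a single slot stands in for the
PGD entry, a small array stands in for the hypervisor's batch queue, and
all toy_* names are invented.

```c
#define TOY_MAX_PENDING 8

/* Toy page-table slot plus a queue of deferred writes. */
struct toy_mmu {
	unsigned long pgd;	/* value a reader of the table sees */
	int lazy;		/* nonzero while lazy MMU mode is on */
	unsigned long pending[TOY_MAX_PENDING];
	int npending;
};

/* Apply all deferred writes in order (the arch_flush_lazy_mmu_mode() role). */
void toy_flush_lazy(struct toy_mmu *m)
{
	int i;

	for (i = 0; i < m->npending; i++)
		m->pgd = m->pending[i];
	m->npending = 0;
}

/* Write the slot, or queue the write when lazy mode is on. */
void toy_set_pgd(struct toy_mmu *m, unsigned long val)
{
	if (!m->lazy) {
		m->pgd = val;
		return;
	}
	if (m->npending == TOY_MAX_PENDING)
		toy_flush_lazy(m);	/* make room, preserving order */
	m->pending[m->npending++] = val;
}
```

A check that reads the slot right after toy_set_pgd() in lazy mode still
sees the old value; it only sees the new one after toy_flush_lazy(),
which is why the flush has to come before the consistency checks.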

Signed-off-by: Samu Kallio 
---
 arch/x86/mm/fault.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 027088f..3ba3dba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
if (pgd_none(*pgd_ref))
return -1;
 
-   if (pgd_none(*pgd))
+   if (pgd_none(*pgd)) {
set_pgd(pgd, *pgd_ref);
-   else
+   arch_flush_lazy_mmu_mode();
+   } else {
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+   }
 
/*
 * Below here mismatches are bugs because these lower tables
-- 
1.8.1.3


Re: [patch v5 04/15] sched: add sched balance policies in kernel

2013-02-20 Thread Alex Shi

On 02/20/2013 11:41 PM, Ingo Molnar wrote:
> 
> * Alex Shi  wrote:
> 
>> Now there is just 2 types policy: performance and 
>> powersaving(with 2 degrees, powersaving and balance).
> 
> I don't think we really want to have 'degrees' to the policies 
> at this point - we want each policy to be extremely good at what 
> it aims to do:
> 
>  - 'performance' should finish jobs in the least amount of 
> time possible. No ifs and whens.
> 
>  - 'power saving' should finish jobs with the least amount of 
> watts consumed. No ifs and whens.
> 
>> powersaving policy will try to assign one task to each LCPU, 
>> whichever the LCPU is SMT thread or a core. The balance policy 
>> is also a kind of powersaving policy, just a bit less 
>> aggressive. It will try to assign tasks according group 
>> capacity, one task to one capacity.
> 
> The thing is, 'a bit less aggressive' is an awfully vague 
> concept to maintain on a long term basis - while the two 
> definitions above are reasonably deterministic which can be 
> measured and improved upon.
> 
> Those two policies and definitions are also much easier to 
> communicate to user-space and to users - it's much easier to 
> explain what each policy is supposed to do.
> 
> I'd be totally glad if we got so far that those two policies 
> work really well. Any further nuance visible at the ABI level is 
> I think many years down the road - if at all. Simple things 
> first - those are complex enough already.


Thanks for comments!
I will remove the 'balance' policy.

> 
> Thanks,
> 
>   Ingo
> 


-- 
Thanks Alex

Re: [PATCH v2 8/9] pps: Use a single cdev

2013-02-20 Thread Peter Hurley

On Tue, 2013-02-12 at 02:02 -0500, George Spelvin wrote:
> One per device just seems wasteful, when we already maintain a
> data structure to map minor numbers to devices, and we already have
> a PPS_MAX_SOURCES #define.
> 
> This is also a more comprehensive fix to the use-after-free bug
> that has already received a minimal patch.
> ---
>  drivers/pps/pps.c  | 66 
> --
>  include/linux/pps_kernel.h |  1 -
>  2 files changed, 34 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
> index 6437703..754b0b5 100644
> --- a/drivers/pps/pps.c
> +++ b/drivers/pps/pps.c
> @@ -41,6 +41,8 @@
>  
>  static dev_t pps_devt;
>  static struct class *pps_class;
> +static struct cdev pps_cdev;
> +
>  
>  static DEFINE_MUTEX(pps_idr_lock);
>  static DEFINE_IDR(pps_idr);
> @@ -244,17 +246,23 @@ static long pps_cdev_ioctl(struct file *file,
>  
>  static int pps_cdev_open(struct inode *inode, struct file *file)
>  {
> - struct pps_device *pps = container_of(inode->i_cdev,
> - struct pps_device, cdev);
> - file->private_data = pps;
> - kobject_get(&pps->dev->kobj);
> - return 0;
> + int err = -ENXIO;
> + struct pps_device *pps;
> +
> + rcu_read_lock();
> + pps = idr_find(&pps_idr, iminor(inode));
> + if (pps) {
> + file->private_data = pps;
> + kobject_get(&pps->dev->kobj);
> + err = 0;
> + }
> + rcu_read_unlock();

This should be:

	rcu_read_lock();
	pps = idr_find(&pps_idr, iminor(inode));
	rcu_read_unlock();
	if (pps) {
		file->private_data = pps;
		kobject_get(&pps->dev->kobj);
		err = 0;
	}

It's only the internal structures of idr that need rcu barriers.

> + return err;
>  }
>  
>  static int pps_cdev_release(struct inode *inode, struct file *file)
>  {
> - struct pps_device *pps = container_of(inode->i_cdev,
> - struct pps_device, cdev);
> + struct pps_device *pps = file->private_data;
>   kobject_put(&pps->dev->kobj);
>   return 0;
>  }
> @@ -277,8 +285,6 @@ static void pps_device_destruct(struct device *dev)
>  {
>   struct pps_device *pps = dev_get_drvdata(dev);
>  
> - cdev_del(&pps->cdev);
> -
>   /* Now we can release the ID for re-use */
>   pr_debug("deallocating pps%d\n", pps->id);
>   mutex_lock(&pps_idr_lock);
> @@ -295,17 +301,14 @@ int pps_register_cdev(struct pps_device *pps)
>   dev_t devt;
>  
>   mutex_lock(&pps_idr_lock);
> - /* Get new ID for the new PPS source */
> - if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0) {
> - mutex_unlock(&pps_idr_lock);
> - return -ENOMEM;
> - }
> -
> - /* Now really allocate the PPS source.
> + /* Get new ID for the new PPS source.
>* After idr_get_new() calling the new source will be freely available
>* into the kernel.
>*/
> - err = idr_get_new(&pps_idr, pps, &pps->id);
> + if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0)
> + err = -ENOMEM;
> + else
> + err = idr_get_new(&pps_idr, pps, &pps->id);

Your maintainer should be letting you know about this:

 Forwarded Message 
From: Tejun Heo 
To: a...@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, ru...@rustcorp.com.au, bfie...@fieldses.org,
skinsbur...@parallels.com, ebied...@xmission.com, jmor...@namei.org, 
ax...@kernel.dk,
Tejun Heo , Rodolfo Giometti 
Subject: [PATCH 41/62] pps: convert to idr_alloc()
Date: Sat, 2 Feb 2013 17:20:42 -0800

Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Rodolfo Giometti 
---
This patch depends on an earlier idr changes and I think it would be
best to route these together through -mm.  Please holler if there's
any objection.  Thanks.

 drivers/pps/kapi.c |  2 +-
 drivers/pps/pps.c  | 36 ++--
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
index f197e8e..cdad4d9 100644
--- a/drivers/pps/kapi.c
+++ b/drivers/pps/kapi.c
@@ -102,7 +102,7 @@ struct pps_device *pps_register_source(struct 
pps_source_info *info,
goto pps_register_source_exit;
}
 
-   /* These initializations must be done before calling idr_get_new()
+   /* These initializations must be done before calling idr_alloc()
 * in order to avoid reces into pps_event().
 */
pps->params.api_version = PPS_API_VERS;
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index 2420d5a..de8e663 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -290,29 +290,21 @@ int pps_register_cdev(struct pps_device *pps)
dev_t devt;
 
mutex_lock(&pps_idr_lock);
-   /* Get new ID for the new PPS source */
-   if (idr_pre_get(&pps_idr, GFP_KERNEL) == 0) {
-   mutex_unlock(&pps_idr_lock);
-   return

Re: [patch v5 06/15] sched: log the cpu utilization at rq

2013-02-20 Thread Alex Shi

On 02/20/2013 11:20 PM, Peter Zijlstra wrote:
> On Wed, 2013-02-20 at 22:33 +0800, Alex Shi wrote:
>>> There's generally a better value than 100 when using computers.. seeing
>>> how 100 is 64+32+4.
>>
>> I didn't find a good example for this. and no idea of your suggestion,
>> would you like to explain a bit more?
> 
> Basically what you're doing ends up being fixed point math, using 100 as
> unit is inefficient, pick a power-of-2 and everything reduces to
> bit-shifts.
> 
> http://en.wikipedia.org/wiki/Fixed-point_arithmetic
> 
> So use 128 or 1024 or whatever and you don't need mult and div
> instructions to represent [0,1]
> 

got it. will reconsider this.

-- 
Thanks Alex
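
The power-of-2 suggestion above can be sketched roughly as follows. This is only an illustration, not code from the patch set: UTIL_SHIFT, UTIL_SCALE, util_ratio() and util_apply() are hypothetical names, and 1024 is just one possible unit. Computing the ratio still needs one division by the period; what goes away is the multiply and divide by 100 when the ratio is later applied.

```c
#include <stdint.h>

/* Hypothetical names for illustration; none of these come from the patch. */
#define UTIL_SHIFT 10
#define UTIL_SCALE (1u << UTIL_SHIFT)	/* 1024 stands for 1.0 */

/*
 * Express busy/period as a fixed-point fraction in [0, UTIL_SCALE].
 * Assumes period != 0 and busy <= period.
 */
static inline uint32_t util_ratio(uint64_t busy, uint64_t period)
{
	return (uint32_t)((busy << UTIL_SHIFT) / period);
}

/* Scale a value by a fixed-point ratio: one multiply and one shift. */
static inline uint64_t util_apply(uint64_t value, uint32_t ratio)
{
	return (value * ratio) >> UTIL_SHIFT;
}
```

With a unit of 100 the last step would be (value * percent) / 100, which needs a real division; with a power-of-2 unit it is a shift.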

Re: [PATCH] of/i2c: don't register disabled devices

2013-02-20 Thread Dmitry Eremin-Solenikov

On 20/02/13 23:09, Rob Herring wrote:
> On 02/20/2013 12:28 PM, Dmitry Eremin-Solenikov wrote:
>> Don't register i2c slave device tree nodes which have
>> status = "disabled" property.
>>
> 
> This is already in 3.8.

Ah, true. Sorry for the noise then.

-- 
With best wishes
Dmitry

Re: [patch v5 11/15] sched: add power/performance balance allow flag

2013-02-20 Thread Alex Shi

On 02/20/2013 11:22 PM, Borislav Petkov wrote:
> On Wed, Feb 20, 2013 at 10:20:19PM +0800, Alex Shi wrote:
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 2e8131d..0047856 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4053,6 +4053,8 @@ struct lb_env {
>>>>   unsigned int loop;
>>>>   unsigned int loop_break;
>>>>   unsigned int loop_max;
>>>> + int power_lb;  /* if power balance needed */
>>>> + int perf_lb;   /* if performance balance needed */
>>>
>>> Those look like they're used like simple boolean flags. Why not make
>>> them such, i.e. bitfields? See struct perf_event_attr for an example.
>>
>> there are 11 long words in struct lb_env now. use boolean or bitfields
>> can't save much space.
> Now now maybe.
> 
> Btw, there's a ->flags variable there which simply cries to get another
> LBF_* flag or two. This way you don't add any new members at all and
> don't enlarge the struct.
> 

Yes, using flags can save 2 int variables, I will change that.

Just curious: considering the lb_env size, that it is only used on the
stack, the big cacheline size of modern cpus, and the alignment gcc flags
used for the kernel, it seems no arch needs more cache lines. Are there
any platforms whose performance is impacted by these 2 int variables?

-- 
Thanks Alex
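
A minimal sketch of the flags approach discussed above, assuming a struct with an existing flags word. The struct name, the LBF_SKETCH_* names and their bit values are hypothetical; the real LBF_* values in kernel/sched/fair.c are not shown in this thread and may differ.

```c
/* Hypothetical flag bits for illustration only. */
#define LBF_SKETCH_POWER	0x10	/* power balance needed */
#define LBF_SKETCH_PERF		0x20	/* performance balance needed */

/* Stand-in for struct lb_env; only the flags member matters here. */
struct lb_env_sketch {
	unsigned int flags;
};

static inline void lb_set(struct lb_env_sketch *env, unsigned int flag)
{
	env->flags |= flag;
}

static inline void lb_clear(struct lb_env_sketch *env, unsigned int flag)
{
	env->flags &= ~flag;
}

static inline int lb_test(const struct lb_env_sketch *env, unsigned int flag)
{
	return (env->flags & flag) != 0;
}
```

Both conditions then live in the existing word, so the struct gains no new members.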

Re: [Update][PATCH 2/7] ACPI / scan: Introduce common code for ACPI-based device hotplug

2013-02-20 Thread Toshi Kani

On Wed, 2013-02-20 at 23:49 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
 :
> +
> +/**
> + * acpi_bus_hot_remove_device: hot-remove a device and its children
> + * @context: struct acpi_eject_event pointer (freed in this func)
> + *
> + * Hot-remove a device and its children. This function frees up the
> + * memory space passed by arg context, so that the caller may call
> + * this function asynchronously through acpi_os_hotplug_execute().
> + */
> +void acpi_bus_hot_remove_device(void *context)
> +{
> + struct acpi_eject_event *ej_event = context;
> + struct acpi_device *device = ej_event->device;
> + acpi_handle handle = device->handle;
> + u32 ost_code = ACPI_OST_SC_SUCCESS;
> + int error;
> +
> + mutex_lock(&acpi_scan_lock);
> +
> + error = acpi_scan_hot_remove(device);
> + if (error)
> + ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
> +
> + acpi_evaluate_hotplug_ost(handle, ej_event->event, ost_code, NULL);

Thanks for the quick update.  It fixed the deadlock issue. :-)  As it
now completes an eject operation, I found a new issue.  When the OS
called _EJ0, it is not supposed to call _OST since FW has already
received the completion status from _EJ0.  That is, the OS calls either
_EJ0 (success case) or _OST (failure case) for hot-delete. 

-Toshi



Re: [PATCH EDAC 03/13] ghes: add the needed hooks for EDAC error report

2013-02-20 Thread Huang Ying

Sorry for the late reply!

On Fri, 2013-02-15 at 10:44 -0200, Mauro Carvalho Chehab wrote:
> In order to allow reporting errors via EDAC, add hooks for:
> 
> 1) register an EDAC driver;
> 2) unregister an EDAC driver;
> 3) report errors via EDAC.
> 
> As the EDAC driver will need to access the ghes structure, adds it
> as one of the parameters for ghes_do_proc.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  drivers/acpi/apei/ghes.c | 17 ++---
>  include/acpi/ghes.h  | 27 +++
>  2 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 6d0e146..a21d7da 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -409,7 +409,8 @@ static void ghes_clear_estatus(struct ghes *ghes)
>   ghes->flags &= ~GHES_TO_CLEAR;
>  }
>  
> -static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
> +static void ghes_do_proc(struct ghes *ghes,
> +  const struct acpi_hest_generic_status *estatus)
>  {
>   int sev, sec_sev;
>   struct acpi_hest_generic_data *gdata;
> @@ -421,6 +422,8 @@ static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
>CPER_SEC_PLATFORM_MEM)) {
>   struct cper_sec_mem_err *mem_err;
>   mem_err = (struct cper_sec_mem_err *)(gdata+1);
> + ghes_edac_report_mem_error(ghes, sev, mem_err);
> +
>  #ifdef CONFIG_X86_MCE
>   apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
> mem_err);
> @@ -639,7 +642,7 @@ static int ghes_proc(struct ghes *ghes)
>   if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
>   ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>   }
> - ghes_do_proc(ghes->estatus);
> + ghes_do_proc(ghes, ghes->estatus);
>  out:
>   ghes_clear_estatus(ghes);
>   return 0;
> @@ -732,7 +735,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
>   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
>   len = apei_estatus_len(estatus);
>   node_len = GHES_ESTATUS_NODE_LEN(len);
> - ghes_do_proc(estatus);
> + ghes_do_proc(estatus_node->ghes, estatus);
>   if (!ghes_estatus_cached(estatus)) {
>   generic = estatus_node->generic;
>   if (ghes_print_estatus(NULL, generic, estatus))
> @@ -821,6 +824,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>   estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool,
> node_len);
>   if (estatus_node) {
> + estatus_node->ghes = ghes;
>   estatus_node->generic = ghes->generic;
>   estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
>   memcpy(estatus, ghes->estatus, len);
> @@ -942,6 +946,10 @@ static int ghes_probe(struct platform_device *ghes_dev)
>   }
>   platform_set_drvdata(ghes_dev, ghes);
>  
> + rc = ghes_edac_register(ghes, &ghes_dev->dev);
> + if (rc < 0)
> + goto err;
> +

If ghes_edac_register() failed, we need to do some cleanup such as
unregister from hed etc.

Or just move ghes_edac_register() before switch?

>   return 0;
>  err:
>   if (ghes) {
> @@ -995,6 +1003,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>   }
>  
>   ghes_fini(ghes);
> +
> + ghes_edac_unregister(ghes);
> +
>   kfree(ghes);
>  
>   platform_set_drvdata(ghes_dev, NULL);
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 3eb8dc4..c6fef72 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -22,11 +22,14 @@ struct ghes {
>   struct timer_list timer;
>   unsigned int irq;
>   };
> +
> + struct mem_ctl_info *mci;

Why do we need this?  This is not used by ghes.[hc].

>  };
>  
>  struct ghes_estatus_node {
>   struct llist_node llnode;
>   struct acpi_hest_generic *generic;
> + struct ghes *ghes;
>  };
>  
>  struct ghes_estatus_cache {
> @@ -43,3 +46,27 @@ enum {
>   GHES_SEV_RECOVERABLE = 0x2,
>   GHES_SEV_PANIC = 0x3,
>  };
> +
> +#ifdef CONFIG_EDAC_GHES
> +void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> + struct cper_sec_mem_err *mem_err);
> +
> +int ghes_edac_register(struct ghes *ghes, struct device *dev);
> +
> +void ghes_edac_unregister(struct ghes *ghes);
> +
> +#else
> +static inline void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
> +struct cper_sec_mem_err *mem_err)
> +{
> +}
> +
> +static inline int ghes_edac_register(struct ghes *ghes, struct device *dev)
> +{
> + return 0;
> +}
> +
> +static inline void ghes_edac_unregister(struct ghes *ghes)

Re: [PATCH 0/3] posix timers: Extend kernel API to report more info about timers

2013-02-20 Thread Matthew Helsley

On Thu, Feb 14, 2013 at 8:18 AM, Pavel Emelyanov  wrote:
> Hi.
>
> I'm working on the checkpoint-restore project (http://criu.org), briefly
> its aim is to collect information about a process's state and save it so
> that later it is possible to recreate the processes in the very same state
> as they were, using the collected information.
>
> One part of the task's state is the posix timers that this task has created.
> Currently kernel doesn't provide any API for getting information about
> what timers are currently created by process and in which state they are.
> I'd like to extend the posix timers API to provide more information about
> timers.
>
> Another problem with timers is the timer ID. Currently IDs are generated
> from global IDR and this makes it impossible to restore a timer from
> the saved state in general, as the required ID may be already busy at the
> time of restore.
>
> That said, I propose to
>
> 1. Change the way timer IDs are generated. This was done some time ago, so
>I'm just re-sending this patch;

Seems fine in principle. Aside: I noticed there were some
important-looking patches to the idr usage in timer id allocation
today...

> 2. Add a system call that will list timer IDs created by the calling process;

If timers were listed in /proc like fds then you wouldn't need this
syscall. If we keep adding new syscalls like this CRIU will be
needlessly x86-specific when it could have been written more portably.

> 3. Add a system call that will allow to get the sigevent information about
>particular timer in the sigaction-like manner.

You mentioned "extending the POSIX timer API". Isn't that something
best left to standards bodies lest your changes conflict with theirs?
Again, if this were a /proc interface you wouldn't have that issue
(you'll have others ;)).

>
> This is actually an RFC to start discussion about how the described problems
> can be addressed. Thus, if the approach with new system calls is not 
> acceptable,
> I'm OK to implement this in any other form.

My preference is for "other form" for the reasons above.

Cheers,
-Matt Helsley

Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-20 Thread Bjorn Helgaas

On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)        ((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  ((((u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>
>  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>  #define PCIIOC_BASE            ('P' << 24 | 'C' << 16 | 'I' << 8)

David, can you point me at a description of include/uapi ... what is
there and why, and how we should decide what new things go in
include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
there should be something in Documentation/?

I'm guessing it's something to do with being exported to userland, but
I'm not sure the things in this patch (PCI_DEVID, PCI_BUS) are really
exportable in the sense of being used for syscalls, etc.

Bjorn
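
For reference, the encoding the quoted macros describe can be sketched as a round trip. This is an illustration only: the SKETCH_ prefix and sketch_devid() are hypothetical names chosen so the copies cannot clash with the real header, and uint16_t stands in for the kernel's u16.

```c
#include <stdint.h>

/* Local copies of the macros from the quoted patch, renamed for illustration. */
#define SKETCH_PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn)		((devfn) & 0x07)
#define SKETCH_PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))
#define SKETCH_PCI_BUS(x)		(((x) >> 8) & 0xff)

/* Pack bus, slot and function into a 16-bit device id: bus[15:8] slot[7:3] func[2:0]. */
static inline uint16_t sketch_devid(uint8_t bus, uint8_t slot, uint8_t func)
{
	return (uint16_t)SKETCH_PCI_DEVID(bus, SKETCH_PCI_DEVFN(slot, func));
}
```

The bus number occupies the high byte and devfn the low byte, so SKETCH_PCI_BUS() recovers the bus from a device id, and SKETCH_PCI_SLOT()/SKETCH_PCI_FUNC() recover the rest from the low byte.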

[GIT PULL] KVM updates for the 3.9 merge window

2013-02-20 Thread Marcelo Tosatti



Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1

to receive the KVM updates for the 3.9 merge window, including x86 real
mode emulation fixes, stronger memory slot interface restrictions, 
mmu_lock spinlock hold time reduction, improved handling of large 
page faults on shadow, initial APICv HW acceleration support, 
s390 channel IO based virtio, amongst others.

--

Alex Williamson (13):
  KVM: Restrict non-existing slot state transitions
  KVM: Check userspace_addr when modifying a memory slot
  KVM: Fix iommu map/unmap to handle memory slot moves
  KVM: Minor memory slot optimization
  KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
  KVM: Make KVM_PRIVATE_MEM_SLOTS optional
  KVM: struct kvm_memory_slot.user_alloc -> bool
  KVM: struct kvm_memory_slot.flags -> u32
  KVM: struct kvm_memory_slot.id -> short
  KVM: Increase user memory slots on x86 to 125
  kvm: Fix memory slot generation updates
  kvm: Force IOMMU remapping on memory slot read-only flag changes
  kvm: Obey read-only mappings in iommu

Alexander Graf (17):
  KVM: PPC: Only WARN on invalid emulation
  KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1
  KVM: PPC: BookE: Allow irq deliveries to inject requests
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync
  KVM: PPC: E500: Move write_stlbe higher
  KVM: PPC: E500: Explicitly mark shadow maps invalid
  KVM: PPC: E500: Propagate errors when shadow mapping
  KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping
  KVM: PPC: E500: Split host and guest MMU parts
  KVM: PPC: e500: Implement TLB1-in-TLB0 mapping
  KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static
  KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code
  Merge commit 'origin/next' into kvm-ppc-next
  KVM: PPC: BookE: Handle alignment interrupts
  Merge commit 'origin/next' into kvm-ppc-next

Avi Kivity (16):
  KVM: x86 emulator: framework for streamlining arithmetic opcodes
  KVM: x86 emulator: Support for declaring single operand fastops
  KVM: x86 emulator: introduce NoWrite flag
  KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite
  KVM: x86 emulator: convert NOT, NEG to fastop
  KVM: x86 emulator: add macros for defining 2-operand fastop emulation
  KVM: x86 emulator: convert basic ALU ops to fastop
  KVM: x86 emulator: Convert SHLD, SHRD to fastop
  KVM: x86 emulator: convert shift/rotate instructions to fastop
  KVM: x86 emulator: covert SETCC to fastop
  KVM: x86 emulator: convert INC/DEC to fastop
  KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop
  KVM: x86 emulator: convert 2-operand IMUL to fastop
  KVM: x86 emulator: rearrange fastop definitions
  KVM: x86 emulator: convert a few freestanding emulations to fastop
  KVM: x86 emulator: fix test_cc() build failure on i386

Bharat Bhushan (3):
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: PPC: booke: Allow multiple exception types
  booke: Added DBCR4 SPR number

Christian Borntraeger (3):
  KVM: s390: Gracefully handle busy conditions on ccw_device_start
  s390/kvm: Fix store status for ACRS/FPRS
  s390/kvm: Fix instruction decoding

Cong Ding (1):
  KVM: s390: kvm/sigp.c: fix memory leakage

Cornelia Huck (14):
  KVM: s390: Handle hosts not supporting s390-virtio.
  s390/ccwdev: Include asm/schid.h.
  KVM: s390: Add a channel I/O based virtio transport driver.
  KVM: s390: Constify intercept handler tables.
  KVM: s390: Decoding helper functions.
  KVM: s390: Support for I/O interrupts.
  KVM: s390: Add support for machine checks.
  KVM: s390: In-kernel handling of I/O instructions.
  KVM: s390: Base infrastructure for enabling capabilities.
  KVM: s390: Add support for channel I/O instructions.
  KVM: s390: Dynamic allocation of virtio-ccw I/O data.
  KVM: trace: Fix exit decoding.
  s390/virtio-ccw: Fix setup_vq error handling.
  KVM: s390: Fix handling of iscs.

Dongxiao Xu (1):
  KVM: VMX: disable SMEP feature when guest is in non-paging mode

Geoff Levand (1):
  KVM: Remove duplicate text in api.txt

Gleb Natapov (39):
  KVM: emulator: implement AAD instruction
  KVM: inject ExtINT interrupt before APIC interrupts
  KVM: remove unused variable.
  KVM: VMX: cleanup rmode_segment_valid()
  KVM: VMX: relax check for CS register in rmode_segment_valid()
  KVM: VMX: return correct segment limit and flags for CS/SS registers in 
real mode
  KVM: VMX: use fix_rmode_seg() to fix all code/data segments
  KVM: VMX: remove redundant code from vmx_set_segment()
  KVM: VMX: clean-up vmx_set_segment()
  KVM: VMX: remove unneeded temporary variable from vmx_set_segment()

Re: [PATCH v2] vt: add init_hide parameter to suppress boot output

2013-02-20 Thread Greg Kroah-Hartman

On Wed, Feb 20, 2013 at 02:08:25PM -0800, Andy Ross wrote:
> On 02/20/2013 12:57 PM, Pavel Machek wrote:
> >I'm sure something creative can be done with fake init that shuts
> >the console up then execs previous init. No need to add more kernel
> >knobs, I'd say.
> 
> Fair enough, but some last words:
> 
> That argument is the "it's about logging" hypothesis again.  Even if
> it were possible to completely shut up console output (something
> that's awfully hard in the general case when running on PC hardware,
> and IMHO from a developer's perspective not even a good thing), that's
> not the whole problem.  The framebuffer console initialization does a
> buffer clear and mode set, and that clobbers anything the bootloader
> might have left on the screen prematurely, before userspace is ready
> to throw up its own splash.  Splash screens may be a silly
> requirement, but they're still a requirement.

Yes, they are a requirement in some situations, and if you look most
distros have already solved this issue for you, by not using a
framebuffer at all.  Why not just do the same thing in your Android
system as you do have full control over the hardware and the boot
process.

> And the suspend console problem is likewise at work: ideally you'd
> like to know, for example, that the panel backlight is off before
> suspending.  But what happens in practice is that the kernel does a VT
> switch to/from console 63 and the backlight wakes up (I'm not going to
> pretend I have this bit completely figured out, but the problem is/was
> real and this patch fixed it by suppressing the console visibility).

My systems don't drop down to the framebuffer when suspending, I think
you need to look at using a better distro :)

> Now, the point that an in-kernel console is "going away" and thus not
> worth augmenting with new APIs is valid.  And this is a small patch
> that's unlikely to be difficult to maintain in a custom tree.  And as
> we all agree there are other mechanisms that can be used here (even if
> AFAICT they don't completely solve the problem), and indeed I'd love
> to get surfaceflinger working with VT_ACTIVATE et. al. if I get a
> chance.  So I'm not going to cry if this isn't worth mainline.

I don't see why this is even needed for surfaceflinger systems, as
again, you have full control over the hardware and system so you don't
even need a framebuffer console at all.

thanks,

greg k-h

Re: [PATCH] Documentation: update top level 00-INDEX file with new additions

2013-02-20 Thread Rob Landley

On 02/18/2013 09:57:36 AM, Randy Dunlap wrote:

On 02/18/13 01:39, Jiri Kosina wrote:
> On Thu, 14 Feb 2013, Paul Gortmaker wrote:
>
>> It seems there are about 80 new, but undocumented additions at
>> the top level Documentation directory.  This fixes up the top
>> level 00-INDEX by adding new entries and deleting a couple orphans.
>> Some subdirs could probably still use a check/cleanup too though.

After this patch, I would prefer to see a requirement that each
Documentation/ file contain a "topic" line and then generate INDEX files
from those automatically...

comments?

I actually have a script that can audit the 00-INDEX files, as part of  
my kernel.org/doc build stuff:

  http://landley.net/hg/kdocs/file/tip/make

Manually auditing these isn't hard for me, it's just that since  
kernel.org went all-in on locking the barn door after the horses  
escaped, I haven't had access to my old kernel.org account (I need to  
meet kernel developers in person to get keys signed, which doesn't  
happen much).

And even if I did get a new ssh key, you don't get shell access anymore  
you get "kup" which is a git wrapper you can't rsync through. So fixing  
problem 1 opens up problem 2 and I still can't do anything useful.  
(Navigating the new bureaucracy is on my todo list, but not really  
something I sit down and go "oh boy, I should work on THIS" on any  
given evening.)

So I haven't been able to update kernel.org/doc since the breakin, and  
my tools for auditing the 00-INDEX files and htmldocs and menuconfig  
and so on are all tied up with that.

Rob

Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Andrew Morton

On Wed, 20 Feb 2013 16:28:07 -0800
Mandeep Singh Baines  wrote:

> > Backtraces aren't *that* bad.  We'll easily be able to tell which of
> > the two callsites triggered the trace.
> >
> 
> Let's say there was a try_to_freeze() that got inlined indirectly
> (multiple levels of inline) into do_exit. Wouldn't the backtraces for
> the regular exit check and the try_to_freeze check be identical except
> for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
> an object file you wouldn't know which check you hit.

Mutter.  Spose so.  Vaguely possible.  Yes, if we want to avoid a
wont-happen, use __FILE__ and __LINE__.  Or, probably more sanely,
__func__.

Or uninline try_to_freeze().  If anything's calling that at high
frequency, we have a problem.  And given the number of callsites,
getting it into icache might result in a faster kernel...

(Someone needs to teach __might_sleep() about __ratelimit())
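
A rough userspace sketch of the __func__ idea, assuming a reporting helper that takes the caller's name. report_held_locks(), CHECK_NO_LOCKS() and the two *_path_sketch() functions are hypothetical and are not the lockdep interface. Because __func__ is expanded where the macro is used, each inline function reports its own name even when both end up inlined into the same parent.

```c
#include <stdio.h>

/* Hypothetical helper: format a message naming the function that ran the check. */
static inline int report_held_locks(char *buf, size_t len,
				    const char *func, int nr_held)
{
	if (nr_held == 0)
		return 0;	/* nothing held, nothing to report */
	return snprintf(buf, len, "%s: %d lock(s) held at freeze time",
			func, nr_held);
}

/* __func__ is evaluated at the macro's use site, not inside the helper. */
#define CHECK_NO_LOCKS(buf, len, nr_held) \
	report_held_locks((buf), (len), __func__, (nr_held))

/* Two stand-in call sites that a backtrace offset alone might not tell apart. */
static inline int exit_path_sketch(char *buf, size_t len, int nr_held)
{
	return CHECK_NO_LOCKS(buf, len, nr_held);
}

static inline int freeze_path_sketch(char *buf, size_t len, int nr_held)
{
	return CHECK_NO_LOCKS(buf, len, nr_held);
}
```

The message then identifies the check by name, without needing the object file to decode an offset.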

[GIT PULL] user namespace and namespace infrastructure changes for 3.9

2013-02-20 Thread Eric W. Biederman


Linus,

Please pull the for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

   HEAD: 139321c65c0584cd65c4c87a5eb3fdb4fdbd0e19 cifs: Enable building with user namespaces enabled.

   This tree is against v3.8-rc1 with the first few bug-fix commits
   already merged into v3.8.

This set of changes starts with a few small enhancements to the user
namespace: reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user
namespace root.

I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support enabled
you will need to enable memory control groups.

There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.

The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems.  These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use.  The filesystems converted for 3.9
are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The changes
for these filesystems were a little more involved so I split the changes
into smaller hopefully obviously correct changes.

XFS is the only filesystem that remains.  I was hoping I could get that
in this release so that user namespace support would be enabled with an
allyesconfig or an allmodconfig but it looks like the xfs changes need
another couple of days before they are ready.

Eric W. Biederman (91):
  userns: Avoid recursion in put_user_ns
  userns: Allow any uid or gid mappings that don't overlap.
  userns: Recommend use of memory control groups.
  userns: Allow the userns root to mount of devpts
  userns: Allow the userns root to mount ramfs.
  userns: Allow the userns root to mount tmpfs.
  ceph: Only allow mounts in the initial network namespace
  ceph: Translate between uid and gids in cap messages and kuids and kgids
  ceph: Translate inode uid and gid attributes to/from kuids and kgids.
  ceph: Convert struct ceph_mds_request to use kuid_t and kgid_t
  ceph: Convert kuids and kgids before printing them.
  ceph: Enable building when user namespaces are enabled.
  9p: Add 'u' and 'g' format specifies for kuids and kgids
  9p: Transmit kuid and kgid values
  9p: Modify the stat structures to use kuid_t and kgid_t
  9p: Modify struct 9p_fid to use a kuid_t not a uid_t
  9p: Modify struct v9fs_session_info to use a kuids and kgids
  9p: Modify v9fs_get_fsgid_for_create to return a kgid
  9p: Allow building 9p with user namespaces enabled.
  afs: Remove unused structure afs_store_status
  afs: Only allow mounting afs in the intial network namespace
  afs: Support interacting with multiple user namespaces
  coda: Restrict coda messages to the initial pid namespace
  coda: Restrict coda messages to the initial user namespace
  coda: Cache permisions in struct coda_inode_info in a kuid_t.
  coda: Allow coda to be built when user namespace support is enabled
  ocfs2: Handle kuids and kgids in acl/xattr conversions.
  ocfs2: convert between kuids and kgids and DLM locks
  ocfs2: Convert uid and gids between in core and on disk inodes
  ocfs2: For tracing report the uid and gid values in the initial user namespace
  ocfs2: Compare kuids and kgids using uid_eq and gid_eq
  ocfs2: Enable building with user namespaces enabled
  gfs2: Remove improper checks in gfs2_set_dqblk.
  gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
  gfs2: Report quotas in the caller's user namespace.
  gfs2: Introduce qd2index
  gfs2: Modify struct gfs2_quota_change_host to use struct kqid
  gfs2: Modify qdsb_get to take a struct kqid
  gfs2: Convert gfs2_quota_refresh to take a kqid
  gfs2: Store qd_id in struct gfs2_quota_data as a struct kqid
  gfs2: Remove the QUOTA_USER and QUOTA_GROUP defines
  gfs2: Use kuid_t and kgid_t types where appropriate.
  gfs2: Use uid_eq and gid_eq where appropriate
  gfs2: Convert uids and gids between dinodes and vfs inodes.
  gfs2: Enable building with user namespaces enabled
  ncpfs: Support interacting with multiple user namespaces
  nfs_common: Update the translation between nfsv3 acls linux posix acls
  sunrpc: Use userns friendly constants.
  sunrpc: Use kuid_t and kgid_t where appropriate
  sunrpc: Use uid_eq and gid_eq where appropriate
  sunrpc: Simplify auth_unix now that everything is a kgid_t
  sunrpc: Convert kuids and kgids to uids and gids for printing
  sunrpc: Use gid_valid to test for gid != INVALID_GID
  sunrpc: Update gss uid to security context mapping.
  sunrpc: Update svcgss xdr handle to rpsec_contect cache
  sunrpc: Hash uids by

Re: sched: Fix signedness bug in yield_to()

2013-02-20 Thread Shuah Khan

On Tue, Feb 19, 2013 at 7:27 PM, Linux Kernel Mailing List
 wrote:
> Gitweb: 
> http://git.kernel.org/linus/;a=commit;h=c3c186403c6abd32e719f005f0af950155a9e54d
> Commit: c3c186403c6abd32e719f005f0af950155a9e54d
> Parent: e0a79f529d5ba2507486d498b25da40911d95cf6
> Author: Dan Carpenter 
> AuthorDate: Tue Feb 5 14:37:51 2013 +0300
> Committer:  Ingo Molnar 
> CommitDate: Tue Feb 5 12:59:29 2013 +0100
>
> sched: Fix signedness bug in yield_to()
>
> In 7b270f6099 "sched: Bail out of yield_to when source and
> target runqueue has one task" we changed this to store -ESRCH so
> it needs to be signed.
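
A minimal userspace sketch of the class of bug the commit message
describes. The function names are hypothetical, and treating the variable
as originally a bool is an assumption drawn from the commit subject, since
the declaration is not shown in this thread. Assigning a negative errno
value to a bool collapses it to 1, so a caller can no longer see the error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: what survives when the code is kept in a bool. */
int stored_as_bool(int code)
{
	bool yielded = code;	/* any non-zero value converts to true (1) */
	return yielded;
}

/* Hypothetical helper: the same assignment with a signed type. */
int stored_as_int(int code)
{
	int yielded = code;	/* sign and magnitude are preserved */
	return yielded;
}

/* Caller-side test in the usual "negative means error" style. */
bool looks_like_error(int ret)
{
	return ret < 0;
}
```

With these helpers, stored_as_bool(-ESRCH) yields 1, which a caller would
read as success, while stored_as_int(-ESRCH) keeps the negative error code.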

Dan, Ingo,

I can't find 7b270f6099 "sched: Bail out of yield_to when source
and target runqueue has one task" in Linus's latest git tree. Am I
missing something?

The current kernel/sched/core.c doesn't have the code from the
associated patch https://patchwork.kernel.org/patch/2016651/

>  bool __sched yield_to(struct task_struct *p, bool preempt)
>  {
> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
>
>  again:
>   p_rq = task_rq(p);
> + /*
> +  * If we're the only runnable task on the rq and target rq also
> +  * has only one task, there's absolutely no point in yielding.
> +  */
> + if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> +         yielded = -ESRCH;
> +         goto out_irq;
> + }

Without the 7b270f6099 "sched: Bail out of yield_to when source and
target runqueue has one task", do you need this change?

Am I missing something?

-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Corrupt packets with ath5k

2013-02-20 Thread JA Magallón


On 02/20/2013 08:58 PM, JA Magallón wrote:

Hi all...

I have a strange problem with the latest kernels. When I update my netbook
I get many "Installation failed, bad rpms:" messages. At first I thought
that the old, cheap SSD was failing (it is an Aspire AOA110).
But the updates always work fine when plugged into ethernet.

Now I tried to copy a couple of RPMs via ssh over wifi and got:

mplayer-1.1-11.r35916.1.mga3.tainted.i586.rpm  100% 2192KB   2.1MB/s   00:00
Received disconnect from 192.168.1.51: 2: Packet corrupt
lost connection

Hardware/driver are these:

03:00.0 Ethernet controller: Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express) (rev 01)
 Subsystem: Foxconn International, Inc. Device e008
 Kernel driver in use: ath5k

Is the hardware failing? Is the driver failing? Other laptops/Androids
update fine through the router over wifi.
Any ideas?

I currently have 3.8.0, distro build.



I tried with 3.7.1 and everything works fine.
Possible clues:
- 3.7.1: works fine
- warm boot in 3.8.0: transfer stalls and scp appears to hang; speed
  drops to zero, but no error message
- cold boot in 3.8.0: transfer stops with a "Packet corrupt" error, or
  sometimes stalls.


TIA




--
J.A. Magallon \   Winter is coming...

Re: [PATCH v4] lockdep: check that no locks held at freeze time

On Wed, Feb 20, 2013 at 4:20 PM, Andrew Morton
 wrote:
> On Wed, 20 Feb 2013 16:17:39 -0800
> Mandeep Singh Baines  wrote:
>
>> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
>>  wrote:
>> > On Wed, 20 Feb 2013 15:17:16 -0800
>> > Mandeep Singh Baines  wrote:
>> >
>> >> We shouldn't try_to_freeze if locks are held.
>> >>
>> >> ...
>> >>
>> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
>> >>
>> >> + if (!(current->flags & PF_NOFREEZE))
>> >> +         debug_check_no_locks_held(current,
>> >> +                 "lock held while trying to freeze");
>> >> ...
>> >>
>> >> + debug_check_no_locks_held(tsk, "lock held at task exit time");
>> >
>> > There doesn't seem much point in adding the `msg' to
>> > debug_check_no_locks_held() - the dump_stack() in
>> > print_held_locks_bug() will tell us the same thing.  Maybe just change
>>
>> dump_stack() can be confusing when there is inlining. On occasion I've
>> looked at the wrong mutex_lock, for example, when there was another
>> mutex_lock that was inlined. Of course, you can start objdump and
>> verify the offsets. But that requires that you have the object file.
>> You could have a try_to_freeze added to do_exit. I was thinking of
>> adding another locks_held in the return from syscall path.
>
> Backtraces aren't *that* bad.  We'll easily be able to tell which of
> the two callsites triggered the trace.
>

Let's say there was a try_to_freeze() that got inlined indirectly
(multiple levels of inline) into do_exit. Wouldn't the backtraces for
the regular exit check and the try_to_freeze check be identical except
for the offset (do_exit+0x45 versus do_exit+0x88)? So unless you had
an object file you wouldn't know which check you hit.
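
To make the scenario concrete, here is a small userspace model, not
kernel code. All names ending in _model are hypothetical, and locks_held
is an assumed stand-in for lockdep's bookkeeping. Both checks end up
inside the same outer function, so a symbol+offset backtrace would name
only that function, while the msg string identifies the check directly:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct task_model {
	int locks_held;	/* hypothetical count of currently held locks */
	int nofreeze;	/* models the PF_NOFREEZE test in the quoted patch */
};

static char last_report[128];

/* Models debug_check_no_locks_held(task, msg); returns 1 if it reported. */
int check_no_locks_held_model(const struct task_model *t, const char *msg)
{
	if (t->locks_held == 0)
		return 0;
	snprintf(last_report, sizeof(last_report),
		 "BUG: %s (%d locks held)", msg, t->locks_held);
	return 1;
}

/* Likely inlined into its caller, so a backtrace would show only the caller. */
static inline int try_to_freeze_model(const struct task_model *t)
{
	if (t->nofreeze)
		return 0;
	return check_no_locks_held_model(t, "lock held while trying to freeze");
}

/* Both checks live in the same outer function, as in the do_exit example. */
int do_exit_model(const struct task_model *t, int freeze_first)
{
	if (freeze_first && try_to_freeze_model(t))
		return 1;
	return check_no_locks_held_model(t, "lock held at task exit time");
}

/* Accessor for the most recent report text. */
const char *last_lock_report(void)
{
	return last_report;
}
```

In this model the report text alone says which check fired, without
needing the object file to decode an offset.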

Regards,
Mandeep

Re: [PATCH] tools: usb: ffs-test: Fix build failure

2013-02-20 Thread Maxin B. John

Hi,

On Thu, Feb 21, 2013 at 2:06 AM, Greg KH  wrote:
> On Thu, Feb 21, 2013 at 01:57:51AM +0200, maxin.j...@gmail.com wrote:
>> From: "Maxin B. John" 
>>
>> Fixes this build failure:
>> gcc -Wall -Wextra -g -lpthread -I../include -o testusb testusb.c
>> gcc -Wall -Wextra -g -lpthread -I../include -o ffs-test ffs-test.c
>> In file included from ffs-test.c:41:0:
>> ../../include/linux/usb/functionfs.h:4:39: fatal error:
>> uapi/linux/usb/functionfs.h: No such file or directory
>> compilation terminated.
>> make: *** [ffs-test] Error 1
>
> This is a build failure where, 3.8, or linux-next, or somewhere else?

It is in 3.8

> thanks,
>
> greg k-h

Best Regards,
Maxin

Re: [PATCH v4] lockdep: check that no locks held at freeze time

2013-02-20 Thread Andrew Morton

On Wed, 20 Feb 2013 16:17:39 -0800
Mandeep Singh Baines  wrote:

> On Wed, Feb 20, 2013 at 3:24 PM, Andrew Morton
>  wrote:
> > On Wed, 20 Feb 2013 15:17:16 -0800
> > Mandeep Singh Baines  wrote:
> >
> >> We shouldn't try_to_freeze if locks are held.
> >>
> >> ...
> >>
> >> @@ -43,6 +44,9 @@ extern void thaw_kernel_threads(void);
> >>
> >> + if (!(current->flags & PF_NOFREEZE))
> >> +         debug_check_no_locks_held(current,
> >> +                 "lock held while trying to freeze");
> >> ...
> >>
> >> + debug_check_no_locks_held(tsk, "lock held at task exit time");
> >
> > There doesn't seem much point in adding the `msg' to
> > debug_check_no_locks_held() - the dump_stack() in
> > print_held_locks_bug() will tell us the same thing.  Maybe just change
> 
> dump_stack() can be confusing when there is inlining. On occasion I've
> looked at the wrong mutex_lock, for example, when there was another
> mutex_lock that was inlined. Of course, you can start objdump and
> verify the offsets. But that requires that you have the object file.
> You could have a try_to_freeze added to do_exit. I was thinking of
> adding another locks_held in the return from syscall path.

Backtraces aren't *that* bad.  We'll easily be able to tell which of
the two callsites triggered the trace.


Re: [PATCH v4] lockdep: check that no locks held at freeze time