date:20140511

linux-next: build failure after merge of the fsl tree

2014-05-11 Thread Stephen Rothwell

Hi Scott,

After merging the fsl tree, today's linux-next build (powerpc
allyesconfig) failed like this:

arch/powerpc/kernel/epapr_paravirt.c: In function 'epapr_idle_init':
arch/powerpc/kernel/epapr_paravirt.c:77:23: error: 'epapr_ev_idle' undeclared (first use in this function)
   ppc_md.power_save = epapr_ev_idle;
   ^

Caused by commit 7762b1ed7aae ("powerpc: move epapr paravirt init of
power_save to an initcall").

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall

2014-05-11 Thread Xufeng Zhang


On 05/12/2014 01:11 PM, David Miller wrote:
>> So it makes sense to return errors for send() syscall.
>>
>> Signed-off-by: Xufeng Zhang
>
> I disagree.
>
> If pfkey_error() is successful, the error will be reported in the AF_KEY
> message that is broadcast, there is no reason for sendmsg to return an
> error.  The message was successfully sent, there was no problem with its
> passage into the AF_KEY layer.
>
> Like netlink, operational responses come in packets, not error codes.
>
> However, if pfkey_error() fails, we must pass back the original
> error code because it's a last ditch effort to prevent information
> from being lost.
>
> That's why 'err' must be preserved when pfkey_error() returns zero.

I know what you mean, but isn't the kernel API meant to make user space
easier to implement?
Since sending the message to the kernel and receiving the error report
message are asynchronous, I don't think it's easy to recover from the EINTR
error by parsing the error report message.


Thanks,
Xufeng
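
For illustration, the retry pattern Xufeng has in mind only works when the send call itself reports the interruption. Below is a minimal user-space sketch, assuming a sender with the same contract as send(2). The names sketch_send_fn, sketch_send_retry() and sketch_flaky_send() are hypothetical and exist only for this example; they are not taken from racoon, strongswan or the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical sender with the contract of send(2): returns the number of
 * bytes sent, or -1 with errno set. */
typedef ssize_t (*sketch_send_fn)(const void *buf, size_t len);

/* Retry only while the call reports that a signal interrupted it. */
ssize_t sketch_send_retry(sketch_send_fn do_send, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = do_send(buf, len);
	} while (n < 0 && errno == EINTR);

	return n;
}

/* Test double: fails with EINTR twice, then accepts the whole buffer. */
int sketch_interruptions = 2;

ssize_t sketch_flaky_send(const void *buf, size_t len)
{
	(void)buf;
	if (sketch_interruptions > 0) {
		sketch_interruptions--;
		errno = EINTR;
		return -1;
	}
	return (ssize_t)len;
}
```

If the call always returned success, the loop condition could never become true and the caller would have to recover the error from a later message instead.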




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] hrtimer: reprogram event for expires=KTIME_MAX in hrtimer_force_reprogram()

2014-05-11 Thread Viresh Kumar

On 10 May 2014 21:47, Preeti U Murthy  wrote:
> On 05/09/2014 04:27 PM, Viresh Kumar wrote:
>> On 9 May 2014 16:04, Preeti U Murthy  wrote:

>> Ideally, the device should have stopped events as we programmed it in
>> ONESHOT mode. And should have waited for kernel to set it again..
>>
>> But probably that device doesn't have a ONESHOT mode and is firing
>> again and again. Anyway the real problem I was trying to solve wasn't
>> infinite interrupts coming from event dev, but the first extra event that
>> we should have got rid of .. It just happened that we got more problems
>> on this particular board.
>
> So on a timer interrupt the tick device, irrespective of if it is in
> ONESHOT mode or not, is in an expired state. Thus it will continue to
> fire. What has ONESHOT mode got to do with this?

So, the arch-specific timer handler must be clearing it, I suppose, and it
shouldn't have fired again after 5 ms as it is not reprogrammed.

That's probably implementation-specific. I have seen timers which have two
modes: periodic, where they fire continuously, and oneshot, where they get
disabled after firing and have to be reprogrammed.
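
The difference between the two modes can be shown with a small user-space model. This is only a sketch: struct toy_timer, toy_timer_advance() and toy_timer_program() are hypothetical names and do not correspond to the kernel's clockevents API or to any particular hardware.

```c
#include <stdbool.h>

/* Hypothetical timer modes for illustration; not the kernel's enum. */
enum toy_timer_mode { TOY_TIMER_PERIODIC, TOY_TIMER_ONESHOT };

struct toy_timer {
	enum toy_timer_mode mode;
	unsigned long period;	/* reload value in ticks, must be non-zero */
	unsigned long next;	/* absolute expiry time in ticks */
	bool armed;		/* false once a oneshot timer has fired */
};

/*
 * Advance the model to time 'now' and return how often the timer fired.
 * A periodic timer re-arms itself; a oneshot timer fires once and then
 * stays silent until software programs it again.
 */
unsigned int toy_timer_advance(struct toy_timer *t, unsigned long now)
{
	unsigned int fired = 0;

	while (t->armed && t->next <= now) {
		fired++;
		if (t->mode == TOY_TIMER_PERIODIC)
			t->next += t->period;
		else
			t->armed = false;
	}
	return fired;
}

/* Reprogram the timer for a new absolute expiry. */
void toy_timer_program(struct toy_timer *t, unsigned long expires)
{
	t->next = expires;
	t->armed = true;
}
```

A device that behaves like the periodic case while the core believes it is in oneshot mode would produce the repeated interrupts described above.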

>>> The reason this got exposed in NOHZ_FULL config is because in a normal
>>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>>> scenario that you have described.
>>
>> I haven't tried but it looks like this problem will exist there as well.. Who is
>> disabling the event device in that case when tick_sched timer goes off ?
>> The same question that is applicable in this case as well..
>>
>>>But we don't get continuous interrupts then because the first time we
>>> get an interrupt, we queue the tick_sched_timer and program the tick
>>> device to the time of its expiry and therefore *push* the time at which
>>> your tick device should fire further.
>>
>> Probably not.. We don't get continuous interrupts because that's a special
>> case for my platform. But I am quite sure you would be getting one extra
>> interrupt after tick period, but because we didn't had anything to service
>
> Hmm? I didn't get this. Why would we?  We ensure that if there are no
> pending timers in timer_list the tick_sched_timer is cancelled. We
> cannot get spurious interrupts when there are no pending timers in NOHZ
> mode.

Okay, there are no pending timers to fire and we have even disabled
tick_sched_timer. But the event dev isn't SHUTDOWN or reprogrammed.
So it must fire after the tick interval? That is exactly the same issue we
are seeing here with NO_HZ_FULL.

And the worst part is that these interrupts don't show up in traces either.
Somebody probably needs to revisit the trace_irq_handler_entry part as well
to catch such problems.

> Hmm yeah looking at the problem that you are trying to solve, that being
> completely disabling timer interrupts on cpus that are running just one
> process, it appears to me that setting the tick device in SHUTDOWN mode
> is the only way to do so. And you are right. We use SHUTDOWN mode to
> imply that the device can be switched off. Its upto the arch to react to
> it appropriately.

So, from the mail where tglx blasted me off, we have a better solution to
implement now :)

> My concern is on powerpc today when we set the device to SHUTDOWN mode
> we set the decrementer to a MAX value. Which means we will get
> interrupts only spaced out more widely in time. But on NOHZ_FULL mode if
> you are looking at completely disabling tick_sched_timer as long as a
> single process runs then we might need to change the semantics here.

Lets see if we can do some nice stuff with ONESHOT_STOPPED state..

Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt

On Sun, 2014-05-11 at 21:52 -0700, Guenter Roeck wrote:
> Oh well, it was worth a try. Can you give me an example for a failing
> configuration ?

My g5 config which is close to g5_defconfig with PR KVM enabled.

In any case, see my other messages. I'm waiting for all my test builders
to come back and if it's clear I'll post a new patch.

Cheers,
Ben.


Re: bug: acpi ata_bay dock reminds undocked

2014-05-11 Thread Pali Rohár

On Monday 12 May 2014 02:00:29 Rafael J. Wysocki wrote:
> On Sunday, May 11, 2014 05:49:17 PM Pali Rohár wrote:
> > On Wednesday 30 April 2014 11:24:50 Pali Rohár wrote:
> > > On Tuesday 29 April 2014 23:35:42 Rafael J. Wysocki wrote:
> > > > On Tuesday, April 29, 2014 11:00:01 PM Pali Rohár wrote:
> > > > > On Tuesday 29 April 2014 22:55:07 Rafael J. Wysocki wrote:
> > > > > > Which kernel version(s) have you tried?
> > > > >
> > > > > 3.15-rc3
> > > >
> > > > Does it work with 3.14(.x) by chance?
> > >
> > > Tested with 3.14 and 3.8. Same problem, not working.
> > 
> > BUMP!
> > 
> > Rafael, do you need some other information?
> 
> I'll take care of this when I have the time, OK?

Ok, I will wait.

-- 
Pali Rohár
pali.ro...@gmail.com



Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Sascha Hauer

On Mon, May 12, 2014 at 09:03:26AM +0400, Alexander Shiyan wrote:
> Mon, 12 May 2014 06:51:13 +0200 from Sascha Hauer :
> > On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > > This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> > > 
> > > Signed-off-by: Alexander Shiyan 
> > > ---
> > >  drivers/pinctrl/Kconfig|   7 ++
> > >  drivers/pinctrl/Makefile   |   1 +
> > >  drivers/pinctrl/pinctrl-imx1.c | 279 
> > > +
> > >  3 files changed, 287 insertions(+)
> > >  create mode 100644 drivers/pinctrl/pinctrl-imx1.c
> > 
> > Nice. I thought about adding devicetree support for i.MX1 as well.
> > 
> > Don't we need an imx1-pinfunc.h file to make use of this patch?
> 
> It will be added along with the DTS template for that CPU architecture.

Ok.

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Guenter Roeck


On 05/11/2014 10:37 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
>>> Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
>>> allyesconfig build by moving machine_check_common to a different location.
>>> While this fixes most of the errors, both allmodconfig and allyesconfig still
>>> fail as follows.
>>>
>>> arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
>>>
>>> Fix by moving machine_check_common after the offending address.
>>
>> This suffers from the same problem as previous attempts, on some of my
>> test configs I get:
>>
>> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
>> make[1]: *** [vmlinux] Error 1
>> make: *** [sub-make] Error 2
>>
>> IE, it breaks currently working configs.
>>
>> So we need to move more things around and I haven't had a chance to
>> sort it out.
>
> Ok, I think I sorted it out for now. It's a mess and likely to break
> again until we do something more drastic like moving everything that's
> after 0x8000 to a separate file but for now that will do. Patch on its
> way, I'll also shoot it to Linus today along with a few other things.

Great, thanks a lot!

Guenter


Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt

On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> > Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
> > allyesconfig build by moving machine_check_common to a different location.
> > While this fixes most of the errors, both allmodconfig and allyesconfig still
> > fail as follows.
> > 
> > arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
> > 
> > Fix by moving machine_check_common after the offending address.
> 
> This suffers from the same problem as previous attempts, on some of my
> test configs I get:
> 
> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
> make[1]: *** [vmlinux] Error 1
> make: *** [sub-make] Error 2
> 
> IE, it breaks currently working configs.
> 
> So we need to move more things around and I haven't had a chance to
> sort it out.

Ok, I think I sorted it out for now. It's a mess and likely to break
again until we do something more drastic like moving everything that's
after 0x8000 to a separate file but for now that will do. Patch on its
way, I'll also shoot it to Linus today along with a few other things.

Cheers,
Ben.
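
As background to the "relocation truncated to fit: R_PPC64_REL14" error quoted above: a conditional branch carries a 14-bit displacement counted in 4-byte words, so the target must lie within a signed 16-bit byte offset of the branch. The check below is a user-space sketch of that range test; sketch_fits_rel14() is a hypothetical helper, not linker or kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A 14-bit word displacement covers byte offsets -32768 .. +32764,
 * and the offset must be a multiple of 4.
 */
bool sketch_fits_rel14(uint64_t from, uint64_t to)
{
	int64_t delta = (int64_t)(to - from);

	if (delta & 3)
		return false;	/* branch targets are word aligned */
	return delta >= -32768 && delta <= 32764;
}
```

Moving code so that a branch and its target end up more than 32 KiB apart is enough to push a previously working configuration over this limit.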



Re: [PATCH 2/2] tick: SHUTDOWN event-dev if no events are required for KTIME_MAX

2014-05-11 Thread Viresh Kumar

Thanks for blasting me off, it might be very helpful going forward :)

On 10 May 2014 01:39, Thomas Gleixner  wrote:
> On Fri, 9 May 2014, Viresh Kumar wrote:

>> diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c

>>  int tick_program_event(ktime_t expires, int force)
>>  {
>>   struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>> + int ret = 0;
>>
>> - return clockevents_program_event(dev, expires, force);
>> + /* Shut down event device if it is not required for long */
>> + if (unlikely(expires.tv64 == KTIME_MAX)) {
>> + dev->last_mode = dev->mode;
>> + clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>
> No, we are not doing a state change behind the scene and a magic
> restore. And I know at least one way to make this fall flat on its
> nose, because you are blindly doing dev->last_mode = dev->mode on
> every invocation. So if that gets called twice without a restore in
> between, the device is going to be in shutdown mode forever.

During my tests I had this as well:

if (unlikely(expires.tv64 == KTIME_MAX)) {
+   WARN_ON(dev->mode == CLOCK_EVT_MODE_SHUTDOWN);

But it never got to it and I thought it might never happen, so removed it.
But yes, there should be some check here for that.
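
The failure mode Thomas describes, and the kind of check mentioned here, can be reproduced in a toy model. This is a sketch only: struct evt_dev and the evt_*() helpers are hypothetical stand-ins for the patch's logic, and the guarded variant is just one possible check, not the approach the thread settles on.

```c
/* Hypothetical stand-ins for the two modes that matter here. */
enum evt_mode { EVT_SHUTDOWN, EVT_ONESHOT };

struct evt_dev {
	enum evt_mode mode;
	enum evt_mode last_mode;
};

/* Mirrors the patch: save the current mode unconditionally, then stop. */
void evt_stop(struct evt_dev *dev)
{
	dev->last_mode = dev->mode;
	dev->mode = EVT_SHUTDOWN;
}

/* Guarded variant: only save the mode while the device is still active. */
void evt_stop_checked(struct evt_dev *dev)
{
	if (dev->mode == EVT_SHUTDOWN)
		return;
	dev->last_mode = dev->mode;
	dev->mode = EVT_SHUTDOWN;
}

/* The "magic restore": put back whatever was saved last. */
void evt_restore(struct evt_dev *dev)
{
	dev->mode = dev->last_mode;
}
```

Calling evt_stop() twice overwrites last_mode with EVT_SHUTDOWN, so a later evt_restore() leaves the device shut down.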

> It's moronic anyway as the clock event device has the state
> CLOCK_EVT_MODE_ONESHOT if its active, otherwise we would not be in
> that code path.

Yeah, Missed that earlier.

> But what's even worse: you just define that it's the best way for all
> implementations of clockevents to handle this.
>
> It's definitely NOT. Some startup/shutdown implementations are rather
> complex, so that would burden them with rather big latencies and some
> of them will even outright break.
>
> There is a world outside of YOUR favourite subarch.

:)

> We do not hijack stuff just because we can and it works on some
> machines. We think about it proper.

Agreed..

> If we hijack some existing facility then we audit ALL implementation
> sites and document that we did so and why we are sure that it won't
> break stuff. It still might break some oddball case, but that's not a
> big issue.

Because SHUTDOWN was an existing old API, I thought it would work
without breaking stuff. Yes, I should have done some auditing or at least
made this an RFC series to get the discussion going.

> In the clockevents case we do not even need a new interface, but this
> must be made OPT-in and not a flagday change for all users.
>
> And no we are not going to abuse a feature flag for this. It's not a
> feature.

Okay.

> I'd rather have a new state for this, simply because it is NOT
> shutdown. It is in ONESHOT_STOPPED state. Whether a specific
> implementation will use the SHUTDOWN code for it or not does not
> matter.

Correct.

> That requires a full tree update of all implementations because most
> of them have a switch case for the mode. And adding a state will cause
> all of them which do not have a default clause to emit warnings
> because the mode is an enum for this very reason.
>
> And even if all of them would have a default clause, you'd need a way
> to OPT-In, because some of the defaults have a BUG() in there. Again,
> no feature flag exclusion. See above.

Okay..

> So the right thing to do this is:
>
> 1A) Change the prototype of the set_mode callback to return int and
> fixup all users. Either add the missing default clause or remove
> the existing BUG()/ pr_err()/whatever handling in the existing
> default clause and return a UNIQUE error code.
>
> I know I should have done that from the very beginning, but in
> hindsight one could have done everything better.
>
> coccinelle is your friend (if you need help ask me or Julia
> Lawall). But it's going to be quite some manual work on top.

Sure.

> 1B) Audit the changes and look at the implementations. If the patch is
> just adding the default clause or replacing some BUG/printk error
> handling goto #1C
>
> If it looks like it needs some preparatory care or if you find
> bugs in a particular implementation, roll back the changes and do
> the bug fixes and preparatory changes first as separate patches.
>
> Go back to #1A until the coccinelle patches are just squeaky
> clean.
>
> 1C) Add proper error handling for the various modes to the set_mode
> callback call sites, only two AFAIK.
>
> 2A) Add a new mode ONESHOT_STOPPED. That's safe now as all error
> handling will be done in the core code.
>
> 2B) Implement the ONESHOT_STOPPED logic and make sure all of the core
> code is aware of it.
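
The plan in steps 1A to 2B can be sketched in user space as follows. Everything here is illustrative: the sketch_* names are hypothetical, and -ENOSYS is an assumed choice, since the mail only asks for a unique error code.

```c
#include <errno.h>

/* Hypothetical mode enum modelled on the discussion; not the kernel's. */
enum sketch_mode {
	SKETCH_SHUTDOWN,
	SKETCH_PERIODIC,
	SKETCH_ONESHOT,
	SKETCH_ONESHOT_STOPPED,	/* the new state from step 2A */
};

struct sketch_dev {
	enum sketch_mode mode;
	int (*set_mode)(enum sketch_mode mode, struct sketch_dev *dev);
};

/* Step 1A: a driver that does not know a mode returns an error code. */
int sketch_legacy_set_mode(enum sketch_mode mode, struct sketch_dev *dev)
{
	(void)dev;
	switch (mode) {
	case SKETCH_SHUTDOWN:
	case SKETCH_PERIODIC:
	case SKETCH_ONESHOT:
		return 0;
	default:
		return -ENOSYS;	/* assumed error code */
	}
}

/* A driver opts in simply by handling the new case. */
int sketch_optin_set_mode(enum sketch_mode mode, struct sketch_dev *dev)
{
	(void)dev;
	switch (mode) {
	case SKETCH_SHUTDOWN:
	case SKETCH_PERIODIC:
	case SKETCH_ONESHOT:
	case SKETCH_ONESHOT_STOPPED:
		return 0;
	default:
		return -ENOSYS;
	}
}

/* Steps 1C and 2B: the core records a mode only if the driver took it. */
int sketch_core_set_mode(struct sketch_dev *dev, enum sketch_mode mode)
{
	int ret = dev->set_mode(mode, dev);

	if (ret == 0)
		dev->mode = mode;
	return ret;
}
```

With error handling in the core, a driver that has not opted in stays in its previous mode instead of hitting a BUG() or silently misbehaving.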

Okay..

> And don't tell me it can't be done.

No way :)

> I've done it I don't know how many
> times with interrupts, timers, locking and some more. It's hard work,
> but it's valuable and way better than the brainless "make it work for
> me" hackery.

I didn't mean that actually. I just pin pointed how badly things can

Re: [PATCH V4 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-05-11 Thread Fabian Frederick

On Mon, 12 May 2014 11:24:26 +0800
Ming Lei  wrote:

> On Sun, May 11, 2014 at 1:06 AM, Fabian Frederick  wrote:
> > generic_file_fsync has been updated to issue a flush for
> > older filesystems.
> >
> > This patch tests for barrier flag in ext4 mount flags
> > and calls the right function.
> >
> > Suggested-by: Jan Kara 
> > Suggested-by: Christoph Hellwig 
> > Cc: Jan Kara 
> > Cc: Christoph Hellwig 
> > Cc: Alexander Viro 
> > Cc: "Theodore Ts'o" 
> > Cc: Andrew Morton 
> > Signed-off-by: Fabian Frederick 
> > ---
> >  fs/ext4/fsync.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> > index a8bc47f..fa82c0a 100644
> > --- a/fs/ext4/fsync.c
> > +++ b/fs/ext4/fsync.c
> > @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, 
> > loff_t end, int datasync)
> >
> > if (!journal) {
> > ret = generic_file_fsync(file, start, end, datasync);
> 
> Forget to remove above line?
Oops, of course ! Thanks a lot, I've sent a new version :)

Regards,
Fabian

> 
> > > +   if (test_opt(inode->i_sb, BARRIER))
> > > +   ret = generic_file_fsync(file, start, end, datasync);
> > > +   else
> > > +   ret = __generic_file_fsync(file, start, end, datasync);
> > > if (!ret && !hlist_empty(&inode->i_dentry))
> > > ret = ext4_sync_parent(inode);
> > > goto out;
> 
> 
> 
> Thanks,
> -- 
> Ming Lei

[PATCH V5 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-05-11 Thread Fabian Frederick

generic_file_fsync has been updated to issue a flush for
older filesystems.

This patch tests for barrier flag in ext4 mount flags
and calls the right function.

Suggested-by: Jan Kara 
Suggested-by: Christoph Hellwig 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Alexander Viro 
Cc: "Theodore Ts'o" 
Cc: Andrew Morton 
Signed-off-by: Fabian Frederick 
---
 fs/ext4/fsync.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index a8bc47f..5b6e9f2 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -107,7 +107,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	}
 
 	if (!journal) {
-		ret = generic_file_fsync(file, start, end, datasync);
+		if (test_opt(inode->i_sb, BARRIER))
+			ret = generic_file_fsync(file, start, end, datasync);
+		else
+			ret = __generic_file_fsync(file, start, end, datasync);
 		if (!ret && !hlist_empty(&inode->i_dentry))
 			ret = ext4_sync_parent(inode);
 		goto out;
-- 
1.8.4.5


[PATCH V5 1/2] FS: Add generic data flush to fsync

2014-05-11 Thread Fabian Frederick

This patch issues a flush in generic_file_fsync.
(Modern filesystems already do it)

Behaviour can be reversed using /sys/devices/.../cache_type
or by calling __generic_file_fsync

Suggested-by: Jan Kara 
Suggested-by: Christoph Hellwig 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Alexander Viro 
Cc: "Theodore Ts'o" 
Cc: Andrew Morton 
Signed-off-by: Fabian Frederick 
---
v5: patch2/2 ext4 patch fix (Thanks to Ming Lei)
V4: update description
V3: __generic_file_fsync = no flush
V2: No additional flag
V1: First version with MS_BARRIER flag

 fs/libfs.c | 36 +---
 include/linux/fs.h |  1 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a184424..4877906 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -3,6 +3,7 @@
  * Library for filesystems writers.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -923,16 +924,19 @@ struct dentry *generic_fh_to_parent(struct super_block *sb, struct fid *fid,
 EXPORT_SYMBOL_GPL(generic_fh_to_parent);
 
 /**
- * generic_file_fsync - generic fsync implementation for simple filesystems
+ * __generic_file_fsync - generic fsync implementation for simple filesystems
+ *
  * @file:  file to synchronize
+ * @start: start offset in bytes
+ * @end:   end offset in bytes (inclusive)
  * @datasync:  only synchronize essential metadata if true
  *
  * This is a generic implementation of the fsync method for simple
  * filesystems which track all non-inode metadata in the buffers list
  * hanging off the address_space structure.
  */
-int generic_file_fsync(struct file *file, loff_t start, loff_t end,
-  int datasync)
+int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
+int datasync)
 {
struct inode *inode = file->f_mapping->host;
int err;
@@ -952,10 +956,36 @@ int generic_file_fsync(struct file *file, loff_t start, loff_t end,
 	err = sync_inode_metadata(inode, 1);
 	if (ret == 0)
 		ret = err;
+
 out:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
 }
+EXPORT_SYMBOL(__generic_file_fsync);
+
+/**
+ * generic_file_fsync - generic fsync implementation for simple filesystems
+ * with flush
+ * @file:  file to synchronize
+ * @start: start offset in bytes
+ * @end:   end offset in bytes (inclusive)
+ * @datasync:  only synchronize essential metadata if true
+ *
+ */
+
+int generic_file_fsync(struct file *file, loff_t start, loff_t end,
+  int datasync)
+{
+   struct inode *inode = file->f_mapping->host;
+   int err;
+
+   err = __generic_file_fsync(file, start, end, datasync);
+   if (err)
+   return err;
+
+   return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
+
+}
 EXPORT_SYMBOL(generic_file_fsync);
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..c3f46e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2590,6 +2590,7 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
const void __user *from, size_t count);
 
+extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
 extern int generic_check_addressable(unsigned, u64);
-- 
1.8.4.5
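
The split made by this patch, together with the caller-side choice from patch 2/2, can be modelled in user space. This is a sketch under stated assumptions: struct sketch_bdev and the sketch_fsync_*() functions are hypothetical stand-ins that only count calls; they are not the kernel functions and perform no I/O.

```c
#include <stdbool.h>

/* Toy block device that records what was asked of it. */
struct sketch_bdev {
	int syncs;	/* writeback requests */
	int flushes;	/* cache flush requests */
	int sync_err;	/* error to inject into writeback, 0 for success */
};

/* Stands in for __generic_file_fsync(): write back, but issue no flush. */
int sketch_fsync_noflush(struct sketch_bdev *bdev)
{
	bdev->syncs++;
	return bdev->sync_err;
}

/* Stands in for generic_file_fsync(): write back, then flush on success. */
int sketch_fsync_flush(struct sketch_bdev *bdev)
{
	int err = sketch_fsync_noflush(bdev);

	if (err)
		return err;
	bdev->flushes++;
	return 0;
}

/* Caller-side choice, in the way ext4 keys it off its barrier option. */
int sketch_fs_fsync(struct sketch_bdev *bdev, bool barrier)
{
	return barrier ? sketch_fsync_flush(bdev) : sketch_fsync_noflush(bdev);
}
```

Keeping the no-flush variant as a separate exported helper lets a filesystem decide per mount whether the flush is wanted, without adding a flag to the common path.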


linux-next: manual merge of the gpio tree with the net-next tree

2014-05-11 Thread Stephen Rothwell

Hi Linus,

Today's linux-next merge of the gpio tree got a conflict in
Documentation/driver-model/devres.txt between commit 6d48f44b7b2a
("mdio_bus: implement devm_mdiobus_alloc/devm_mdiobus_free") from the
net-next tree and commit f9748ef13b6a ("gpio: Add missing
device-managed documentation") from the gpio tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc Documentation/driver-model/devres.txt
index d483f2cf221b,8ff1167cfedf..
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@@ -310,7 -309,7 +310,12 @@@ SLAVE DMA ENGIN
  SPI
devm_spi_register_master()
  
 +MDIO
 +  devm_mdiobus_alloc()
 +  devm_mdiobus_alloc_size()
 +  devm_mdiobus_free()
++
+ GPIO
+   devm_gpiod_get()
+   devm_gpiod_get_index()
+   devm_gpiod_put()



Re: [PATCHv2 0/2] remap_file_pages() decommission

2014-05-11 Thread Konstantin Khlebnikov

On Mon, May 12, 2014 at 7:36 AM, Andi Kleen  wrote:
> Armin Rigo  writes:
>
>> Here is a note from the PyPy project (mentioned earlier in this
>> thread, and at https://lwn.net/Articles/587923/ ).
>
> Your use is completely bogus. remap_file_pages() pins everything
> and disables any swapping for the area.

Wait, what's wrong with swapping pages from non-linear vmas?
try_to_unmap() can handle them, though not very effectively.

Some time ago I was thinking about tracking rmap for non-linear vmas, something
like a second-level tree of sub-vmas stored in the non-linear vma. This could
be done using the existing vm_area_struct, and in the rmap tree everything
would look just as normal. We'd waste some kernel memory, but it would also
remove complexity from rmap and make non-linear vmas usable for all
filesystems, not just for shmem.

But it's not worth it. I ACK killing it.

Maybe we should keep a flag on the vma and hide/merge them in proc/maps.
Bloating files/dirs in proc might be a bigger problem than a non-existent
performance regression.

>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"d...@kvack.org"> em...@kvack.org </a>

Re: [PATCH] zram: remove global tb_lock by using lock-free CAS

2014-05-11 Thread Minchan Kim

On Sat, May 10, 2014 at 02:10:08PM +0800, Weijie Yang wrote:
> On Thu, May 8, 2014 at 2:24 PM, Minchan Kim  wrote:
> > On Wed, May 07, 2014 at 11:52:59PM +0900, Joonsoo Kim wrote:
> >> >> Most popular use of zram is the in-memory swap for small embedded system
> >> >> so I don't want to increase memory footprint without good reason although
> >> >> it makes synthetic benchmark. Although it's 1M for 1G, it isn't small if we
> >> >> consider compression ratio and real free memory after boot
> >>
> >> We can use bit spin lock and this would not increase memory footprint for 32 bit
> >> platform.
> >
> > Sounds like a idea.
> > Weijie, Do you mind testing with bit spin lock?
> 
> Yes, I re-test them.
> This time, I test each case 10 times, and take the average(KS/s).
> (the test machine and method are same like previous mail's)
> 
> Iozone test result:
> 
>            Test      BASE       CAS  spinlock    rwlock  bit_spinlock
> ---------------------------------------------------------------------
>   Initial write   1381094   1425435   1422860   1423075       1421521
>         Rewrite   1529479   1641199   1668762   1672855       1654910
>            Read   8468009  11324979  11305569      7273      10997202
>         Re-read   8467476  11260914  11248059  11145336      10906486
>    Reverse Read   6821393   8106334   8282174   8279195       8109186
>     Stride read   7191093   8994306   9153982   8961224       9004434
>     Random read   7156353   8957932   9167098   8980465       8940476
>  Mixed workload   4172747   5680814   5927825   5489578       5972253
>    Random write   1483044   1605588   1594329   1600453       1596010
>          Pwrite   1276644   1303108   1311612   1314228       1300960
>           Pread   4324337   4632869   4618386   4457870       4500166
> 
> Fio test result:
> 
>            Test      base       CAS  spinlock    rwlock  bit_spinlock
> ---------------------------------------------------------------------
>       seq-write    933789    999357   1003298    995961       1001958
>        seq-read   5634130   6577930   6380861   6243912       6230006
>          seq-rw   1405687   1638117   1640256   1633903       1634459
>         rand-rw   1386119   1614664   1617211   1609267       1612471
> 
> 
> The base is v3.15.0-rc3, the others are per-meta entry lock.
> Every optimization method shows higher performance than the base, however,
> it is hard to say which method is the most appropriate.

The difference between CAS and bit_spinlock is not too big, so I prefer the general method.

> 
> To bit_spinlock, the modified code is mainly like this:
> 
> +#define ZRAM_FLAG_SHIFT 16
> +
> enum zram_pageflags {
>   /* Page consists entirely of zeros */
> - ZRAM_ZERO,
> + ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,
> + ZRAM_ACCESS,
>  
>   __NR_ZRAM_PAGEFLAGS,
>  };
>  
>  /* Allocated for each disk page */
>  struct table {
>   unsigned long handle;
> - u16 size;   /* object size (excluding header) */
> - u8 flags;
> + unsigned long value;

Why do we need to change flags and size into "unsigned long value"?
Couldn't we use the existing flags and just add a new ZRAM_TABLE_LOCK?


>  } __aligned(4);
> 
> The lower ZRAM_FLAG_SHIFT bits of table.value is size, the higher bits
> is for zram_pageflags. By this means, it doesn't increase any memory
> overhead on both 32-bit and 64-bit system.
> 
> Any complaint or suggestions are welcomed.

Anyway, I'd like to go this way.
Please resend a formal patch with the numbers.

Thanks!
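
The packing Weijie describes, with the object size in the low ZRAM_FLAG_SHIFT bits and the flags above them, can be sketched as plain bit arithmetic. The sketch_* names are hypothetical, the flag positions merely mirror the quoted snippet, and real code would take the lock bit with an atomic bit spin lock, which is left out here.

```c
#define SKETCH_FLAG_SHIFT 16
#define SKETCH_SIZE_MASK  ((1UL << SKETCH_FLAG_SHIFT) - 1)

/* Flag bit positions, placed above the size field as in the quoted code. */
enum sketch_pageflags {
	SKETCH_ZERO = SKETCH_FLAG_SHIFT + 1,	/* page is all zeros */
	SKETCH_ACCESS,				/* bit a bit spin lock would use */
};

/* Extract the object size from the low bits. */
unsigned long sketch_get_size(unsigned long value)
{
	return value & SKETCH_SIZE_MASK;
}

/* Replace the size while leaving the flag bits untouched. */
unsigned long sketch_set_size(unsigned long value, unsigned long size)
{
	return (value & ~SKETCH_SIZE_MASK) | (size & SKETCH_SIZE_MASK);
}

int sketch_test_flag(unsigned long value, enum sketch_pageflags flag)
{
	return (int)((value >> flag) & 1UL);
}

unsigned long sketch_set_flag(unsigned long value, enum sketch_pageflags flag)
{
	return value | (1UL << flag);
}

unsigned long sketch_clear_flag(unsigned long value, enum sketch_pageflags flag)
{
	return value & ~(1UL << flag);
}
```

Because both fields share one unsigned long, the per-entry footprint does not grow on 32-bit systems, which is the property the thread cares about.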

> 
> >>
> >> Thanks.
> >>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> 

-- 
Kind regards,
Minchan Kim

Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall

2014-05-11 Thread David Miller

From: Xufeng Zhang 
Date: Fri, 9 May 2014 13:47:35 +0800

> Current implementation for pfkey_sendmsg() always return success
> no matter whether or not error happens during this syscall,
> this is incompatible with the general send()/sendmsg() API:
>   man send
> RETURN VALUE
>   On success, these calls return the number of characters sent.
>   On error, -1 is returned, and errno is set appropriately.
> 
> One side effect this problem introduces is that we can't determine
> when to resend the message when the previous send() fails because
> it was interrupted by signals.
> We detect such a problem when racoon is sending SADBADD message to
> add SAD entry in the kernel, but sometimes kernel is responding with
> "Interrupted system call"(-EINTR) error.
> 
> Check the send implementation of strongswan, it has below logic:
>   pfkey_send_socket()
>   {
>           ...
>           while (TRUE)
>           {
>                   len = send(socket, in, in_len, 0);
> 
>                   if (len != in_len)
>                   {
>                           switch (errno)
>                           {
>                                   case EINTR:
>                                           /* interrupted, try again */
>                                           continue;
>                                   ...
>                           }
>                   }
>           }
>           ...
>   }
> So it makes sense to return errors for send() syscall.  
> 
> Signed-off-by: Xufeng Zhang 

I disagree.

If pfkey_error() is successful, the error will be reported in the AF_KEY
message that is broadcast, there is no reason for sendmsg to return an
error.  The message was successfully sent, there was no problem with its
passage into the AF_KEY layer.

Like netlink, operational responses come in packets, not error codes.

However, if pfkey_error() fails, we must pass back the original
error code because it's a last ditch effort to prevent information
from being lost.

That's why 'err' must be preserved when pfkey_error() returns zero.
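
[Illustration, not part of the original thread.] The userspace retry logic
discussed above can be sketched as follows. This is only a minimal sketch:
send_retry_eintr() is a hypothetical helper name, not code from racoon or
strongswan, and it only covers errors reported by the syscall itself. As noted
above, AF_KEY operational errors arrive as messages and would still have to be
read from the socket separately.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: retry send() for as long as it fails with EINTR. */
static ssize_t send_retry_eintr(int fd, const void *buf, size_t len)
{
	for (;;) {
		ssize_t n = send(fd, buf, len, 0);

		if (n >= 0)
			return n;	/* bytes accepted by the socket layer */
		if (errno == EINTR)
			continue;	/* interrupted by a signal, try again */
		return -1;		/* other error: errno is left for the caller */
	}
}
```

A caller would compare the return value with the request length and treat -1
as a hard failure, since EINTR never reaches it.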

Re: [PATCH net-next,v3] Add support for netvsc build without CONFIG_SYSFS flag

2014-05-11 Thread David Miller

From: Haiyang Zhang 
Date: Thu,  8 May 2014 15:14:10 -0700

> This change ensures the driver can be built successfully without the
> CONFIG_SYSFS flag.
> MS-TFS: 182270
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: K. Y. Srinivasan 

Applied, thanks.

Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Alexander Shiyan

Mon, 12 May 2014 06:51:13 +0200 from Sascha Hauer:
> On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> > 
> > Signed-off-by: Alexander Shiyan 
> > ---
> >  drivers/pinctrl/Kconfig|   7 ++
> >  drivers/pinctrl/Makefile   |   1 +
> >  drivers/pinctrl/pinctrl-imx1.c | 279 
> > +
> >  3 files changed, 287 insertions(+)
> >  create mode 100644 drivers/pinctrl/pinctrl-imx1.c
> 
> Nice. I thought about adding devicetree support for i.MX1 as well.
> 
> Don't we need a imx1-pinfunc.h file to make use of this patch?

It will be added along with the DTS template for that CPU architecture.


Re: [PATCH 3.14 27/83] ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe

2014-05-11 Thread Vineet Gupta


On Monday 12 May 2014 12:51 AM, Greg Kroah-Hartman wrote:
> 3.14-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Vineet Gupta 
>
> commit 8aa9e85adac609588eeec356e5a85059b3b819ba upstream.

Hi Greg,

This one was also marked for stable 3.10. However, because the 2 pre-req patches
were not in yet, applying it would have failed, and AFAICR I did describe the
state of things in that failure report. Anyhow, can you please queue this one up
for the next 3.10 stable?

Thx,
-Vineet

>
> There was a very small race window where resume to kernel mode from a
> Exception Path (or pure kernel mode which is true for most of ARC
> exceptions anyways), was not disabling interrupts in restore_regs,
> clobbering the exception regs
>
> Anton found the culprit call flow (after many sleepless nights)
>
> | 1. we got a Trap from user land
> | 2. started to service it.
> | 3. While doing some stuff on user-land memory (I think it is padzero()),
> | we got a DataTlbMiss
> | 4. On return from it we are taking "resume_kernel_mode" path
> | 5. NEED_RESHED is not set, so we go to "return from exception" path in
> | restore regs.
> | 6. there seems to be IRQ happening
>
> Signed-off-by: Vineet Gupta 
> Cc: Anton Kolesov 
> Cc: Francois Bedard 
> Signed-off-by: Linus Torvalds 
> Signed-off-by: Greg Kroah-Hartman 
>
> ---
>  arch/arc/kernel/entry.S |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> --- a/arch/arc/kernel/entry.S
> +++ b/arch/arc/kernel/entry.S
> @@ -614,11 +614,13 @@ resume_user_mode_begin:
>  
>  resume_kernel_mode:
>  
> -#ifdef CONFIG_PREEMPT
> -
> - ; This is a must for preempt_schedule_irq()
> + ; Disable Interrupts from this point on
> + ; CONFIG_PREEMPT: This is a must for preempt_schedule_irq()
> + ; !CONFIG_PREEMPT: To ensure restore_regs is intr safe
>   IRQ_DISABLE r9
>  
> +#ifdef CONFIG_PREEMPT
> +
>   ; Can't preempt if preemption disabled
>   GET_CURR_THR_INFO_FROM_SP   r10
>   ld  r8, [r10, THREAD_INFO_PREEMPT_COUNT]
>
>
>


[PATCH v3 0/3] TI CPSW Cleanup

2014-05-11 Thread George Cherian

This series does some minimal cleanups.
-Conversion of pr_*() to dev_*()
-Convert kzalloc to devm_kzalloc.

No functional changes.

v1 -> v2 Address review comments.
v2 -> v3 Remove a stale commit comment.

George Cherian (3):
  driver net: cpsw: Convert pr_*() to dev_*() calls
  net: davinci_mdio: Convert pr_err() to dev_err() call
  drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().

 drivers/net/ethernet/ti/cpsw.c  | 50 -
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 ---
 drivers/net/ethernet/ti/davinci_mdio.c  |  2 +-
 3 files changed, 38 insertions(+), 49 deletions(-)

-- 
1.8.3.1


[PATCH v3 1/3] driver net: cpsw: Convert pr_*() to dev_*() calls

2014-05-11 Thread George Cherian

Convert all pr_*() calls to dev_*() calls.
No functional changes.

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/cpsw.c | 50 +-
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d14c8da..9512738 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1808,25 +1808,25 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
return -EINVAL;
 
if (of_property_read_u32(node, "slaves", &prop)) {
-   pr_err("Missing slaves property in the DT.\n");
+   dev_err(&pdev->dev, "Missing slaves property in the DT.\n");
return -EINVAL;
}
data->slaves = prop;
 
if (of_property_read_u32(node, "active_slave", &prop)) {
-   pr_err("Missing active_slave property in the DT.\n");
+   dev_err(&pdev->dev, "Missing active_slave property in the DT.\n");
return -EINVAL;
}
data->active_slave = prop;
 
if (of_property_read_u32(node, "cpts_clock_mult", &prop)) {
-   pr_err("Missing cpts_clock_mult property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpts_clock_mult property in the DT.\n");
return -EINVAL;
}
data->cpts_clock_mult = prop;
 
if (of_property_read_u32(node, "cpts_clock_shift", &prop)) {
-   pr_err("Missing cpts_clock_shift property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpts_clock_shift property in the DT.\n");
return -EINVAL;
}
data->cpts_clock_shift = prop;
@@ -1838,31 +1838,31 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
return -ENOMEM;
 
if (of_property_read_u32(node, "cpdma_channels", &prop)) {
-   pr_err("Missing cpdma_channels property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpdma_channels property in the DT.\n");
return -EINVAL;
}
data->channels = prop;
 
if (of_property_read_u32(node, "ale_entries", &prop)) {
-   pr_err("Missing ale_entries property in the DT.\n");
+   dev_err(&pdev->dev, "Missing ale_entries property in the DT.\n");
return -EINVAL;
}
data->ale_entries = prop;
 
if (of_property_read_u32(node, "bd_ram_size", &prop)) {
-   pr_err("Missing bd_ram_size property in the DT.\n");
+   dev_err(&pdev->dev, "Missing bd_ram_size property in the DT.\n");
return -EINVAL;
}
data->bd_ram_size = prop;
 
if (of_property_read_u32(node, "rx_descs", &prop)) {
-   pr_err("Missing rx_descs property in the DT.\n");
+   dev_err(&pdev->dev, "Missing rx_descs property in the DT.\n");
return -EINVAL;
}
data->rx_descs = prop;
 
if (of_property_read_u32(node, "mac_control", &prop)) {
-   pr_err("Missing mac_control property in the DT.\n");
+   dev_err(&pdev->dev, "Missing mac_control property in the DT.\n");
return -EINVAL;
}
data->mac_control = prop;
@@ -1876,7 +1876,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
ret = of_platform_populate(node, NULL, NULL, &pdev->dev);
/* We do not want to force this, as in some cases may not have child */
if (ret)
-   pr_warn("Doesn't have any child node\n");
+   dev_warn(&pdev->dev, "Doesn't have any child node\n");
 
for_each_child_of_node(node, slave_node) {
struct cpsw_slave_data *slave_data = data->slave_data + i;
@@ -1893,7 +1893,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 
parp = of_get_property(slave_node, "phy_id", &lenp);
if ((parp == NULL) || (lenp != (sizeof(void *) * 2))) {
-   pr_err("Missing slave[%d] phy_id property\n", i);
+   dev_err(&pdev->dev, "Missing slave[%d] phy_id property\n", i);
return -EINVAL;
}
mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
@@ -1918,18 +1918,18 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 
slave_data->phy_if = of_get_phy_mode(slave_node);
if (slave_data->phy_if < 0) {
-   pr_err("Missing or malformed slave[%d] phy-mode property\n",
-  i);
+   dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
+   i);
return slave_data->phy_if;
}
 
if (data->dual_emac) {
if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 &prop)) {
-   pr_err("Missing dual_emac_res_vlan in DT.\n");

[PATCH v3 2/3] net: davinci_mdio: Convert pr_err() to dev_err() call

2014-05-11 Thread George Cherian

Convert the lone pr_err() to dev_err() call.

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 34e97ec..735dc53 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -303,7 +303,7 @@ static int davinci_mdio_probe_dt(struct mdio_platform_data *data,
return -EINVAL;
 
if (of_property_read_u32(node, "bus_freq", &prop)) {
-   pr_err("Missing bus_freq property in the DT.\n");
+   dev_err(&pdev->dev, "Missing bus_freq property in the DT.\n");
return -EINVAL;
}
data->bus_freq = prop;
-- 
1.8.3.1


[PATCH v3 3/3] drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().

2014-05-11 Thread George Cherian

Convert kzalloc() to devm_kzalloc().

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 +++--
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 88ef270..539dbde 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -158,9 +158,9 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
int bitmap_size;
struct cpdma_desc_pool *pool;
 
-   pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+   pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
if (!pool)
-   return NULL;
+   goto fail;
 
spin_lock_init(&pool->lock);
 
@@ -170,7 +170,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
pool->num_desc  = size / pool->desc_size;
 
bitmap_size  = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-   pool->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+   pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
if (!pool->bitmap)
goto fail;
 
@@ -187,10 +187,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
 
if (pool->iomap)
return pool;
-
 fail:
-   kfree(pool->bitmap);
-   kfree(pool);
return NULL;
 }
 
@@ -203,7 +200,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
 
spin_lock_irqsave(&pool->lock, flags);
WARN_ON(pool->used_desc);
-   kfree(pool->bitmap);
if (pool->cpumap) {
dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
  pool->phys);
@@ -211,7 +207,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool *pool)
iounmap(pool->iomap);
}
spin_unlock_irqrestore(&pool->lock, flags);
-   kfree(pool);
 }
 
 static inline dma_addr_t desc_phys(struct cpdma_desc_pool *pool,
@@ -276,7 +271,7 @@ struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params *params)
 {
struct cpdma_ctlr *ctlr;
 
-   ctlr = kzalloc(sizeof(*ctlr), GFP_KERNEL);
+   ctlr = devm_kzalloc(params->dev, sizeof(*ctlr), GFP_KERNEL);
if (!ctlr)
return NULL;
 
@@ -468,7 +463,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 
cpdma_desc_pool_destroy(ctlr->pool);
spin_unlock_irqrestore(&ctlr->lock, flags);
-   kfree(ctlr);
return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
@@ -507,21 +501,22 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
 cpdma_handler_fn handler)
 {
struct cpdma_chan *chan;
-   int ret, offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
+   int offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
unsigned long flags;
 
if (__chan_linear(chan_num) >= ctlr->num_chan)
return NULL;
 
-   ret = -ENOMEM;
-   chan = kzalloc(sizeof(*chan), GFP_KERNEL);
+   chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
if (!chan)
-   goto err_chan_alloc;
+   return ERR_PTR(-ENOMEM);
 
spin_lock_irqsave(&ctlr->lock, flags);
-   ret = -EBUSY;
-   if (ctlr->channels[chan_num])
-   goto err_chan_busy;
+   if (ctlr->channels[chan_num]) {
+   spin_unlock_irqrestore(&ctlr->lock, flags);
+   devm_kfree(ctlr->dev, chan);
+   return ERR_PTR(-EBUSY);
+   }
 
chan->ctlr  = ctlr;
chan->state = CPDMA_STATE_IDLE;
@@ -551,12 +546,6 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
ctlr->channels[chan_num] = chan;
spin_unlock_irqrestore(&ctlr->lock, flags);
return chan;
-
-err_chan_busy:
-   spin_unlock_irqrestore(&ctlr->lock, flags);
-   kfree(chan);
-err_chan_alloc:
-   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(cpdma_chan_create);
 
-- 
1.8.3.1


Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Guenter Roeck


On 05/11/2014 09:12 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> > Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
> > allyesconfig build by moving machine_check_common to a different location.
> > While this fixes most of the errors, both allmodconfig and allyesconfig still
> > fail as follows.
> >
> > arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
> >
> > Fix by moving machine_check_common after the offending address.
>
> This suffers from the same problem as previous attempts, on some of my
> test configs I get:
>
> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
> make[1]: *** [vmlinux] Error 1
> make: *** [sub-make] Error 2
>
> IE, it breaks currently working configs.


Oh well, it was worth a try. Can you give me an example of a failing
configuration?

Thanks,
Guenter


Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Sascha Hauer

On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> 
> Signed-off-by: Alexander Shiyan 
> ---
>  drivers/pinctrl/Kconfig|   7 ++
>  drivers/pinctrl/Makefile   |   1 +
>  drivers/pinctrl/pinctrl-imx1.c | 279 
> +
>  3 files changed, 287 insertions(+)
>  create mode 100644 drivers/pinctrl/pinctrl-imx1.c

Nice. I thought about adding devicetree support for i.MX1 as well.

Don't we need a imx1-pinfunc.h file to make use of this patch?

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: icmp: account for ICMP out errors because of socket limit

2014-05-11 Thread zhuyj


Hi, Eric && David

This patch is similar to the following patch.

commit 1f8438a853667d48055ad38384c63e94b32c6578
Author: Eric Dumazet 
Date:   Sat Apr 3 15:09:04 2010 -0700

icmp: Account for ICMP out errors

When ip_append() fails because of socket limit or memory shortage,
increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report
these errors.

LANG=C netstat -s | grep "ICMP messages failed"
0 ICMP messages failed

For IPV6, implement ICMP6_MIB_OUTERRORS counter as well.

# grep Icmp6OutErrors /proc/net/dev_snmp6/*
/proc/net/dev_snmp6/eth0:Icmp6OutErrors 0
/proc/net/dev_snmp6/lo:Icmp6OutErrors   0

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 

Best Regards!
Zhu Yanjun

On 05/12/2014 11:19 AM, zhuyj wrote:
> Hi, Eric && David
>
>  ____       ______________
> |    |     |              |
> | PC |<--->| MIPS 32 core |
> |____|     |______________|
>
> When pinging a board (MIPS 32 core) from a PC, the ping echo fails because
> of the socket limit, but the ICMP_MIB_OUTERRORS counter is not incremented.
> In this case, "netstat -s" cannot report these errors.
>
> This patch will fix this problem. Now it is in the attachment. Please
> check it.
>
> Best Regards!
> Zhu Yanjun



Re: [PATCH 1/4] clk: samsung: out: Add infrastructure to register CLKOUT

2014-05-11 Thread Tushar Behera

On 05/10/2014 09:21 AM, Pankaj Dubey wrote:
> On 05/09/2014 10:00 PM, Tushar Behera wrote:
>> All SoC in Exynos-series have a clock with name XCLKOUT to provide
>> debug information about various clocks available in the SoC. The register
>> controlling the MUX and GATE of this clock is provided within PMU domain.
>> Since PMU domain can't be dedicatedly mapped by every driver, the
>> register
>> needs to be handled through a regmap handle provided by PMU syscon
>> controller. Right now, CCF doesn't allow regmap based MUX and GATE
>> clocks,
>> hence a dedicated clock provider for XCLKOUT is added here.
>>
>> Signed-off-by: Tushar Behera 
>> CC: Tomasz Figa 
>> ---
>>   drivers/clk/samsung/Makefile  |2 +-
>>   drivers/clk/samsung/clk-out.c |  181
>> +
>>   drivers/clk/samsung/clk.h |   33 
>>   3 files changed, 215 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/clk/samsung/clk-out.c
>>

[ ... ]

>> +/**
>> + * struct samsung_clkout_soc_data: SoC specific register details
>> + * @reg: Offset of CLKOUT register from PMU base
> 
> how about naming this variable as "offset" instead of "reg".
> 

Okay, I will change that.

[ ... ]

>> +u8 samsung_clkout_get_parent(struct clk_hw *hw)
>> +{
>> +struct samsung_clkout *clkout = to_clk_out(hw);
>> +const struct samsung_clkout_soc_data *soc_data = clkout->soc_data;
>> +unsigned int parent_mask = BIT(soc_data->mux_width) - 1;
>> +unsigned int val;
>> +int ret;
>> +
>> +ret = regmap_read(clkout->regmap, soc_data->reg, &val);
> 
> Do we really need to keep return value in "ret" as I can't see you are
> using it anywhere?
> 

Right, we are not using that, so it can be removed.

>> +
>> +return (val >> soc_data->mux_shift) & parent_mask;
>> +}
>> +

[ ... ]

>> +/* All existing Exynos serial of SoCs have common values for this
>> offsets. */
> typo: serial/series/

Sure. Thanks for your review.

-- 
Tushar Behera
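
[Illustration, not part of the original thread.] The shift-and-mask idiom in
samsung_clkout_get_parent() above, (val >> mux_shift) & (BIT(mux_width) - 1),
can be tried out in userspace with a small sketch. field_get() is a
hypothetical name used only here, and the register values in the usage note
are made up. The caller is assumed to pass shift < 32.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract a 'width'-bit field that starts at bit
 * 'shift' of a 32-bit register value. Assumes shift < 32.
 */
static uint32_t field_get(uint32_t val, unsigned int shift, unsigned int width)
{
	/* Avoid shifting by 32, which is undefined for a 32-bit type. */
	uint32_t mask = (width >= 32) ? UINT32_MAX
				      : ((UINT32_C(1) << width) - 1);

	return (val >> shift) & mask;
}
```

For example, with a made-up register value of 0x00000A00, a 4-bit mux field
at shift 8 reads back as parent index 10.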

Re: [PATCH 0/4] Add framework to support clkout

2014-05-11 Thread Tushar Behera

On 05/10/2014 09:09 AM, Pankaj Dubey wrote:
> Hi Tushar,
> 
[ ... ]
>> Also we need to find a suitable place to call early_syscon_init(), after
>> the device tree has been unflattened and before clock initialization.
>>
>> While testing, I called this before of_clk_init() in
>> arch/arm/kernel/time.c,
>> but that place is too generic. Calling anywhere from exynos.c is not
>> working ATM.
> 
> IMO we do not need to, or if I am not wrong we should not change time.c.
> 

The above solution is definitely a hack and just to test my stuff. The
below solution looks good.

> It's possible if we have an exynos specific init_time with the following changes.
> FYI, in my patch series for Exynos PMU [1], currently I am handling this in
> exynos_dt_machine_init. But it can definitely be handled as below; it works
> without any side effect and I have tested it. The only reason I did not adopt
> this is that I had other options for the Exynos PMU patch support. But if
> required, and if the following change is acceptable, I can include this in my
> next version of the Exynos PMU patch series.
> [1]: https://lkml.org/lkml/2014/4/30/18
> 
> 
> +static void __init exynos_init_time(void)
> +{
> +/* Nothing to do timer specific
> + * as early_syscon_init requires DT to be unflattened and
> + * system should be able to allocate memory we need to
> + * postpone until init_time, but it should be done before
> + * init_machine. Because before init_machine, secondary
> + * core boot starts and it uses PMU registers.
> + */
> +
> +exynos_map_pmu();
> +

Instead of calling early_syscon_init() from within exynos_map_pmu(), it
would be good to call it explicitly here before exynos_map_pmu().

> +of_clk_init(NULL);
> +clocksource_of_init();
> +
> +}
> +

-- 
Tushar Behera

RE: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator device node

2014-05-11 Thread Yang, Wenyou



> -Original Message-
> From: Ferre, Nicolas
> Sent: Friday, May 09, 2014 11:31 PM
> To: Yang, Wenyou; Alexandre Belloni
> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> robh...@kernel.org; broo...@kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator
> device node
> 
> On 22/04/2014 03:37, Yang, Wenyou :
> > Hi,
> >
> >> -Original Message-
> >> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> >> Sent: Monday, April 21, 2014 8:22 PM
> >> To: Yang, Wenyou
> >> Cc: devicet...@vger.kernel.org; Ferre, Nicolas; linux-
> >> ker...@vger.kernel.org; robh...@kernel.org; broo...@kernel.org;
> >> linux- arm-ker...@lists.infradead.org
> >> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the
> >> regulator device node
> >>
> >> On 21/04/2014 at 11:54:43 +0200, Alexandre Belloni wrote :
> >>> Hi,
> >>>
> >>> On 21/04/2014 at 12:29:07 +0800, Wenyou Yang wrote :
> >>>> +
> >>>> +                vddana_reg: LDO_REG2 {
> >>>> +                        regulator-name = "VDDANA";
> >>>> +                        regulator-min-microvolt = <3300000>;
> >>>> +                        regulator-max-microvolt = <3300000>;
> >>>> +                        regulator-always-on;
> >>>
> >>> I'm pretty sure that one is not always on as you actually have to
> >>> configure it to get any voltage. Are you sure you want to set the
> >>> regulator-always-on property here ?
> >>>
> >>
> >> Just to clarify my thought, wouldn't it be better to make the ADC
> >> driver handle that regulator instead of using regulator-always-on ?
> > Yes, you are right.
> > It should not use regulator-always-on property for this regulator.
> > It is up to the ADC driver and the ISI driver to handle it (the ISI takes
> > PCK for its clock).
> 
> Hi Wenyou and Alexandre,
> 
> After talking to our system engineers, it is not usual to leave the VDDANA
> rail unpowered. In fact, it will prevent you from using all the pads that
> are powered by VDDANA: PD20-PD31. Moreover, even if you do not activate
> the ADC output on these lines you won't be able to use them as plain
> GPIO... (Cf. package and pinout section of the datasheet).
> 
> As the ADVREF pin of the SoC is connected to the VDDANA on this board
> (even if this default configuration can be modified with a soldering
> iron), we have to note that we may consume a little bit more power.
> 
> But still, I would recommend to keep the "regulator-always-on" property
> on this node. Do you agree and allow me to take your first revision of
> the patch?
I agree.

> 
> 
> Bye,
> --
> Nicolas Ferre

Best Regards,
Wenyou Yang

Re: [PATCH v2] cpufreq: powernow-k8: Suppress checkpatch warnings

2014-05-11 Thread Viresh Kumar

On 11 May 2014 22:56, Stratos Karafotis  wrote:
> Suppress the following checkpatch.pl warnings:
>
> - WARNING: Prefer pr_err(... to printk(KERN_ERR ...
> - WARNING: Prefer pr_info(... to printk(KERN_INFO ...
> - WARNING: Prefer pr_warn(... to printk(KERN_WARNING ...
> - WARNING: quoted string split across lines
> - WARNING: please, no spaces at the start of a line
>
> Also, define the pr_fmt macro instead of PFX for the module name.
>
> Signed-off-by: Stratos Karafotis 
> ---
>
> Changes v1 -> v2
> - Use pr_err_once instead of printk_once
> - Change missing_pss_msg to macro (because pr_err_once
> doesn't compile otherwise)
> - Put one pr_err message in a single line instead of two
> - Ignore "line over 80 characters" warnings
> - Change the word "Fix" in the subject of the patch to
> "Suppress" as the patch doesn't really fix anything
>
>  drivers/cpufreq/powernow-k8.c | 180 
> +-
>  drivers/cpufreq/powernow-k8.h |   2 +-
>  2 files changed, 74 insertions(+), 108 deletions(-)

Acked-by: Viresh Kumar 

Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt

On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
> allyesconfig build by moving machine_check_common to a different location.
> While this fixes most of the errors, both allmodconfig and allyesconfig still
> fail as follows.
> 
> arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
> 
> Fix by moving machine_check_common after the offending address.

This suffers from the same problem as previous attempts, on some of my
test configs I get:

arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

IE, it breaks currently working configs.

So we need to move more things around and I haven't had a chance to
sort it out.

Cheers,
Ben.



Re: [PATCH] fs: cifs: new helper: file_inode(file)

2014-05-11 Thread Steve French

merged into cifs-2.6.git for-next

On Tue, Dec 10, 2013 at 9:02 PM, Libo Chen  wrote:
>
> Signed-off-by: Libo Chen 
> ---
>  fs/cifs/ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> index 7749230..45cb59b 100644
> --- a/fs/cifs/ioctl.c
> +++ b/fs/cifs/ioctl.c
> @@ -85,7 +85,7 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
> goto out_fput;
> }
>
> -   src_inode = src_file.file->f_dentry->d_inode;
> +   src_inode = file_inode(src_file.file);
>
> /*
>  * Note: cifs case is easier than btrfs since server responsible for
> --
> 1.8.2.2
>



-- 
Thanks,

Steve

3.14.3 i915 dead display under X11

2014-05-11 Thread Carbonated Beverage

Hi all,

I rarely upgrade kernels these days -- so when updating to 3.14.3, I found
the X display was blank -- switching to a text console appears to work, but
I still have to type blind.

Symptoms:

Text mode and KMS works correctly to come up with the text console.  Running
X (whether through xdm or /usr/bin/Xorg) causes the display to go blank
and apparently turn off.  Switching to a text console via Control-Alt-F#
leaves a mostly blank screen up, but there are brief flashes where it looks
like the contents of the text console gets rendered once every 5 seconds or so,
but so fast no words or letters can be recognized.

System:

* Thinkpad R61
* 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c)
* Debian/wheezy
* xserver-xorg-video-intel 2:2.19.0-6

A diff of the Xorg.0.log (with timestamps removed, as it made almost every line
show up in the diff) trimmed down shows:

@@ -53,7 +53,7 @@
  (==) |-->Input Device ""
  (==) The core keyboard device wasn't specified explicitly in the layout.
Using the default keyboard configuration.
- (II) Loader magic: 0x7f2813451ae0
+ (II) Loader magic: 0x7f492c35aae0
  (II) Module ABI versions:
X.Org ANSI C Emulation: 0.4
X.Org Video Driver: 12.1
@@ -170,15 +170,17 @@
Sandybridge Server, Ivybridge Mobile (GT1), Ivybridge Mobile (GT2),
Ivybridge Desktop (GT1), Ivybridge Desktop (GT2), Ivybridge Server,
Ivybridge Server (GT2)
- (--) using VT number 7
+ (++) using VT number 7
 
+ (WW) xf86OpenConsole: setpgid failed: Operation not permitted
+ (WW) xf86OpenConsole: setsid failed: Operation not permitted
  (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
  drmOpenDevice: node name is /dev/dri/card0
- drmOpenDevice: open result is 10, (OK)
+ drmOpenDevice: open result is 8, (OK)
  drmOpenByBusid: Searching for BusID pci:0000:00:02.0
  drmOpenDevice: node name is /dev/dri/card0
- drmOpenDevice: open result is 10, (OK)
- drmOpenByBusid: drmOpenMinor returns 10
+ drmOpenDevice: open result is 8, (OK)
+ drmOpenByBusid: drmOpenMinor returns 8
  drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
  (**) intel(0): Depth 16, (--) framebuffer bpp 16
  (==) intel(0): RGB weight 565
@@ -387,7 +389,17 @@
  (II) AutoAddDevices is off - not adding device.
  (II) config/udev: Adding input device ThinkPad Extra Buttons (/dev/input/event5)
  (II) AutoAddDevices is off - not adding device.
- (II) AIGLX: Suspending AIGLX clients for VT switch
- (II) UnloadModule: "kbd"
- (II) UnloadModule: "mouse"
- Server terminated successfully (0). Closing log file.
+ (II) intel(0): EDID vendor "LEN", prod id 16435
+ (II) intel(0): Printing DDC gathered Modelines:
+ (II) intel(0): Modeline "1440x900"x0.0   97.78  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (55.6 kHz eP)
+ (II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz e)
+ (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
+ (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
+ (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
+ (II) intel(0): EDID vendor "LEN", prod id 16435
+ (II) intel(0): Printing DDC gathered Modelines:
+ (II) intel(0): Modeline "1440x900"x0.0   97.78  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (55.6 kHz eP)
+ (II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 909 926 -hsync -vsync (46.3 kHz e)
+ (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
+ (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
+ (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)

Bisecting from 3.13.6 (good) to 3.14.3 (bad) ended up with...

commit b35684b8fa94e04f55fd38bf672b737741d2f9e2
Author: Jani Nikula 
Date:   Thu Nov 14 12:13:41 2013 +0200

drm/i915: do full backlight setup at enable time

We should now have all the information we need to do a full
initialization of the backlight registers.

v2: Keep QUIRK_NO_PCH_PWM_ENABLE for now (Imre).

Signed-off-by: Jani Nikula 
Reviewed-by: Imre Deak 
Signed-off-by: Daniel Vetter 

Which is in 3.12.0

I'm not sure how that came to be.  Does that look right?  What other
information would be required to track this down?

Thanks,

-- DN
Daniel Nobuto
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> But in our current predicament, not everybody trusts a few potentially easily 
> manipulated gates that have no other purpose than to produce white noise and 
> that are developed by the biggest chip vendor in the US. Gates which have other 
> purposes may not be that easily manipulated.
>

Incidentally, I disagree with the "easily manipulated" bit.  Yes, I have
seen the paper which says that you can do it in such a way that it
doesn't show up on *visual* examination.  However, put an electrical
probe on it and it shows up immediately.

-hpa



Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> Ohh, ok, thanks for fixing that. :-) 
> 
> Though what makes me wonder is the following: why are some RNGs forced to use 
> the hw_random framework whereas some others are not? What is the driver for 
> that?
> 
> The current state of random.c vs. drivers/char/hw_random and the strong in-
> kernel separation between both makes me wonder. Isn't that all kind of 
> inconsistent?
> 

The main differences are speed of access, trivial interface, and
architectural guarantees.  You also don't have to deal with enumeration,
DMA engines, interrupts, indirect access, or bus drivers, which all are
utterly unacceptable on a synchronous path.

That being said, it is getting clear that we most likely would be better
off with the kernel directly feeding from at least a subset of the
hw_random drivers, rather than waiting for user space to come along and
launch a daemon... after $DEITY knows how many other processes have
already been launched.  There are patches being worked on to make that
happen, although there are a fair number of potential issues, including
the fact that some of the hw_random drivers are believed to be dodgy --
for example, the TPM driver: some TPMs are believed to not contain any
entropy element and simply rely on a factory-seeded nonvolatile counter
(since the TPM has to have support for nonvolatile counters anyway, this
hardware is already present.)

-hpa


[PATCH] staging/lustre: fix sparse warnings in o2iblnd_cb.c

2014-05-11 Thread Zi Shen Lim

This patch fixes the following sparse warnings:

drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:44:1: warning: symbol 
'kiblnd_tx_done' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:102:10: warning: symbol 
'kiblnd_get_idle_tx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:131:1: warning: symbol 
'kiblnd_drop_rx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:212:10: warning: symbol 
'kiblnd_find_waiting_tx_locked' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:238:1: warning: symbol 
'kiblnd_handle_completion' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:277:1: warning: symbol 
'kiblnd_send_completion' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:296:1: warning: symbol 
'kiblnd_handle_rx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:457:1: warning: symbol 
'kiblnd_rx_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:527:13: warning: symbol 
'kiblnd_kvaddr_to_page' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:699:1: warning: symbol 
'kiblnd_setup_rd_iov' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:752:1: warning: symbol 
'kiblnd_setup_rd_kiov' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:792:1: warning: symbol 
'kiblnd_post_tx_locked' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:996:1: warning: symbol 
'kiblnd_tx_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1270:1: warning: symbol 
'kiblnd_connect_peer' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1627:1: warning: symbol 
'kiblnd_reply' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1814:1: warning: symbol 
'kiblnd_thread_fini' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1828:1: warning: symbol 
'kiblnd_peer_notify' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1934:1: warning: symbol 
'kiblnd_handle_early_rxs' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1957:1: warning: symbol 
'kiblnd_abort_txs' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1993:1: warning: symbol 
'kiblnd_finalise_conn' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2167:1: warning: symbol 
'kiblnd_reject' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2178:1: warning: symbol 
'kiblnd_passive_connect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2452:1: warning: symbol 
'kiblnd_reconnect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2516:1: warning: symbol 
'kiblnd_rejected' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2655:1: warning: symbol 
'kiblnd_check_connreply' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2754:1: warning: symbol 
'kiblnd_active_connect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3025:1: warning: symbol 
'kiblnd_check_conns' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3108:1: warning: symbol 
'kiblnd_disconnect_conn' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3247:1: warning: symbol 
'kiblnd_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:904:20: warning: context 
imbalance in 'kiblnd_post_tx_locked' - unexpected unlock

Signed-off-by: Zi Shen Lim 
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 60 +++---
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9bf6c94..dfd16e7 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -40,7 +40,7 @@
 
 #include "o2iblnd.h"
 
-void
+static void
 kiblnd_tx_done (lnet_ni_t *ni, kib_tx_t *tx)
 {
lnet_msg_t *lntmsg[2];
@@ -99,7 +99,7 @@ kiblnd_txlist_done (lnet_ni_t *ni, struct list_head *txlist, 
int status)
}
 }
 
-kib_tx_t *
+static kib_tx_t *
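
For readers unfamiliar with this class of sparse warning: it is reported when a function is defined with external linkage but no prototype is visible, which usually means the function is only used inside its own file. The following is a minimal sketch of the fix pattern, not part of the posted patch. The names `struct tx`, `tx_pool`, `get_idle_tx` and `tx_done` are hypothetical stand-ins and are not the Lustre symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit descriptor; not the Lustre kib_tx_t. */
struct tx {
	int busy;
};

static struct tx tx_pool[4];

/*
 * File-local helper: 'static' gives it internal linkage, so no prototype
 * in a header is needed and sparse has nothing to warn about.
 */
static struct tx *get_idle_tx(void)
{
	size_t i;

	for (i = 0; i < sizeof(tx_pool) / sizeof(tx_pool[0]); i++) {
		if (!tx_pool[i].busy) {
			tx_pool[i].busy = 1;
			return &tx_pool[i];
		}
	}
	return NULL;	/* pool exhausted */
}

/* Also file-local: returns a descriptor to the pool. */
static void tx_done(struct tx *tx)
{
	assert(tx != NULL);
	tx->busy = 0;
}
```

The alternative fix, for a function that really is called from other files, is to add its prototype to a shared header rather than marking it static.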

Re: arch_random_refill

2014-05-11 Thread Stephan Mueller

On Sunday, 11 May 2014, 20:22:28, H. Peter Anvin wrote:

Hi Peter,
> 
> > Note, I do not see an issue with the patch that adds RDSEED as part of
> > add_interrupt_randomness outlined in [2]. The reason is that this patch
> > does not monopolize the noise sources.
> > 
> > I do not want to imply that Intel (or any other chip manufacturer that
> > will hook into arch_random_refill) intentionally provides bad entropy
> > (and this email shall not start a discussion about entropy again), but I
> > would like to be able to only use noise sources that I can fully audit.
> > As it is with hardware, I am not able to see what it is doing.
> 
> I have to point out the irony in this given your previous proposals,
> however...

I guess that is the funny nature of entropy :-)

But in our current predicament, not everybody trusts a few potentially easily 
manipulated gates that have no other purpose than to produce white noise and 
that are developed by the biggest chip vendor in the US. Gates which have other 
purposes may not be that easily manipulated.
> 
> > Thus, may I ask that arch_random_refill is revised such that it will not
> > monopolize the noise sources? If somebody wants that, he can easily use
> > rngd.
> Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the
> "nordrand" option to the kernel.  These options are there for a reason.
> 
> Now when you mention it, though, the nordrand option should turn off
> RDSEED as well as RDRAND.  It currently doesn't; that is a bug, plain
> and simple.

Ohh, ok, thanks for fixing that. :-) 

Though what makes me wonder is the following: why are some RNGs forced to use 
the hw_random framework whereas some others are not? What is the driver for 
that?

The current state of random.c vs. drivers/char/hw_random and the strong in-
kernel separation between both makes me wonder. Isn't that all kind of 
inconsistent?

Ciao
Stephan
-- 
| Cui bono? |

Re: [PATCHv2 0/2] remap_file_pages() decommission

2014-05-11 Thread Andi Kleen

Armin Rigo  writes:

> Here is a note from the PyPy project (mentioned earlier in this
> thread, and at https://lwn.net/Articles/587923/ ).

Your use is completely bogus. remap_file_pages() pins everything 
and disables any swapping for the area.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only

Re: kmemcheck: got WARNING when dynamicly adjust /proc/sys/kernel/kmemcheck to 0/1

2014-05-11 Thread Xishi Qiu

On 2014/5/9 18:02, Vegard Nossum wrote:

> On 05/09/2014 11:52 AM, Xishi Qiu wrote:
>> On 2014/5/9 15:57, Xishi Qiu wrote:
>>
>>> The OS booted with kmemcheck=0; I then set it to 1, did something, set it
>>> to 0, did something, set it to 1, and so on. Then I got the WARNING log
>>> below. Does kmemcheck support being adjusted dynamically?
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>> [   20.200305] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
>>> Control: RX
>>> [   20.208652] ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   20.216504] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [   22.647385] auditd (3116): /proc/3116/oom_adj is deprecated, please use 
>>> /proc/3116/oom_score_adj instead.
>>> [   24.845214] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
>>> [   30.434764] eth0: no IPv6 routers present
>>> [  340.154608] NOHZ: local_softirq_pending 01
>>> [  340.154639] WARNING: kmemcheck: Caught 64-bit read from uninitialized 
>>> memory (88083f43a550)
>>> [  340.154644] 
>>> c20080ff5d0100c9400ed34e0888
>>> [  340.154667]  u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u 
>>> u u
>>> [  340.154687]  ^
>>> [  340.154690]
>>> [  340.154694] Pid: 3, comm: ksoftirqd/0 Tainted: G C   
>>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. Tecal 
>>> RH2285 V2-24S/BC11SRSC1
>>> [  340.154702] RIP: 0010:[]  [] 
>>> d_namespace_path+0x132/0x270
>>> [  340.154714] RSP: 0018:8808515a1c88  EFLAGS: 00010202
>>> [  340.154718] RAX: 88083f43a540 RBX: 880852e718f3 RCX: 
>>> 0001
>>> [  340.154721] RDX: 8808515a1d28 RSI:  RDI: 
>>> 881053855a60
>>> [  340.154725] RBP: 8808515a1ce8 R08: 8808515a1c50 R09: 
>>> 880852e75800
>>> [  340.154728] R10: 000156f0 R11:  R12: 
>>> 0001
>>> [  340.154731] R13: 0100 R14: 880852e71510 R15: 
>>> 880852e71800
>>> [  340.154736] FS:  () GS:88085f60() 
>>> knlGS:
>>> [  340.154740] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [  340.154743] CR2: 880852e71570 CR3: 0008513f2000 CR4: 
>>> 000407f0
>>> [  340.154746] DR0:  DR1:  DR2: 
>>> 
>>> [  340.154750] DR3:  DR6: 4ff0 DR7: 
>>> 0400
>>> [  340.154753]  [] aa_path_name+0x85/0x180
>>> [  340.154758]  [] apparmor_bprm_set_creds+0x126/0x520
>>> [  340.154763]  [] security_bprm_set_creds+0xe/0x10
>>> [  340.154771]  [] prepare_binprm+0xa5/0x100
>>> [  340.154777]  [] do_execve_common+0x232/0x430
>>> [  340.154781]  [] do_execve+0x3a/0x40
>>> [  340.154785]  [] sys_execve+0x49/0x70
>>> [  340.154793]  [] stub_execve+0x6c/0xc0
>>> [  340.154801]  [] 0x
>>> [  340.154813] WARNING: kmemcheck: Caught 64-bit read from uninitialized 
>>> memory (88083f43a570)
>>> [  340.154817] 
>>> 746f730078a5433f0888f86d433f0888746f7073
>>> [  340.154839]  u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u 
>>> u u
>>> [  340.154858]  ^
>>> [  340.154861]
>>> [  340.154864] Pid: 3, comm: ksoftirqd/0 Tainted: G C   
>>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. Tecal 
>>> RH2285 V2-24S/BC11SRSC1
>>> [  340.154871] RIP: 0010:[]  [] 
>>> rw_verify_area+0x24/0x100
>>> [  340.154880] RSP: 0018:8808515a1dc8  EFLAGS: 00010202
>>> [  340.154883] RAX: 88083f43a540 RBX: 0080 RCX: 
>>> 0080
>>> [  340.154887] RDX: 8808515a1e30 RSI: 880852e71500 RDI: 
>>> 
>>> [  340.154890] RBP: 8808515a1de8 R08: 880852e73200 R09: 
>>> 88085f004900
>>> [  340.154894] R10: 880852e72600 R11:  R12: 
>>> 880852e71500
>>> [  340.154897] R13:  R14: 880852e73200 R15: 
>>> 0001
>>> [  340.154901] FS:  () GS:88085f60() 
>>> knlGS:
>>> [  340.154905] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [  340.154908] CR2: 880852e71570 CR3: 0008513f2000 CR4: 
>>> 000407f0
>>> [  340.154911] DR0:  DR1:  DR2: 
>>> 
>>> [  340.154914] DR3:  DR6: 4ff0 DR7: 
>>> 0400
>>> [  340.154917]  [] vfs_read+0xa4/0x130
>>> [  340.154922]  [] kernel_read+0x44/0x60
>>> [  340.154926]  [] prepare_binprm+0xd0/0x100
>>> [  340.154931]  [] do_execve_common+0x232/0x430
>>> [  340.154935]  [] do_execve+0x3a/0x40
>>> [  340.154939]  [] sys_execve+0x49/0x70
>>> [  340.154944]  [] stub_execve+0x6c/0xc0
>>> [  340.154950]  [] 0x
>>> [  340.154955] WARNING: kmemcheck: Caught 32-bit read from uninitialized 
>>> memory (88083f43a540)
>>> [  340.154959] 
>>> c20080ff5d0100c9400ed34e0888
>>> [  340.154981]  u u u u u u u u u u u u u u u u i i i

[tip:x86/urgent] x86, rdrand: When nordrand is specified, disable RDSEED as well

2014-05-11 Thread tip-bot for H. Peter Anvin

Commit-ID:  7a5091d58419b4e5222abce58a40c072786ea1d6
Gitweb: http://git.kernel.org/tip/7a5091d58419b4e5222abce58a40c072786ea1d6
Author: H. Peter Anvin 
AuthorDate: Sun, 11 May 2014 20:25:20 -0700
Committer:  H. Peter Anvin 
CommitDate: Sun, 11 May 2014 20:25:20 -0700

x86, rdrand: When nordrand is specified, disable RDSEED as well

One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.

Reported-by: Stephan Mueller 
Cc: Theodore Ts'o 
Link: http://lkml.kernel.org/r/21542339.0lfnpsy...@myon.chronox.de
Signed-off-by: H. Peter Anvin 
---
 Documentation/kernel-parameters.txt | 8 
 arch/x86/kernel/cpu/rdrand.c| 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 4384217..30a8ad0d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2218,10 +2218,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
noreplace-smp   [X86-32,SMP] Don't replace SMP instructions
with UP alternatives
 
-   nordrand[X86] Disable the direct use of the RDRAND
-   instruction even if it is supported by the
-   processor.  RDRAND is still available to user
-   space applications.
+   nordrand[X86] Disable kernel use of the RDRAND and
+   RDSEED instructions even if they are supported
+   by the processor.  RDRAND and RDSEED are still
+   available to user space applications.
 
noresume[SWSUSP] Disables resume and restores original swap
space.
diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
index 384df51..136ac74 100644
--- a/arch/x86/kernel/cpu/rdrand.c
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -27,6 +27,7 @@
 static int __init x86_rdrand_setup(char *s)
 {
setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+   setup_clear_cpu_cap(X86_FEATURE_RDSEED);
return 1;
 }
 __setup("nordrand", x86_rdrand_setup);
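
The effect of the one-line change above can be modelled in user space. The sketch below is an illustration only and is not kernel code: the `FEATURE_*` bit values, `cpu_caps`, `clear_cpu_cap_model()`, `nordrand_setup_model()` and `kernel_may_use()` are hypothetical stand-ins, not the kernel's `X86_FEATURE_*` constants or `setup_clear_cpu_cap()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the real X86_FEATURE_* values differ. */
#define FEATURE_RDRAND (1u << 0)
#define FEATURE_RDSEED (1u << 1)
#define FEATURE_OTHER  (1u << 2)

/* Model of the boot CPU's capability mask. */
static uint32_t cpu_caps = FEATURE_RDRAND | FEATURE_RDSEED | FEATURE_OTHER;

/* Stand-in for setup_clear_cpu_cap(): drop one capability bit. */
static void clear_cpu_cap_model(uint32_t bit)
{
	cpu_caps &= ~bit;
}

/* Model of the "nordrand" handler after the patch: both bits are cleared. */
static int nordrand_setup_model(void)
{
	clear_cpu_cap_model(FEATURE_RDRAND);
	clear_cpu_cap_model(FEATURE_RDSEED);
	return 1;	/* option consumed, mirroring the handler in the diff */
}

/* Kernel-side users would check the mask before using either instruction. */
static int kernel_may_use(uint32_t bit)
{
	return (cpu_caps & bit) != 0;
}
```

In this model, unrelated capability bits are left untouched, which matches the intent stated in the commit message: only the CPU random number generator is disabled for kernel use.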

Re: [PATCH V4 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-05-11 Thread Ming Lei

On Sun, May 11, 2014 at 1:06 AM, Fabian Frederick  wrote:
> generic_file_fsync has been updated to issue a flush for
> older filesystems.
>
> This patch tests for barrier flag in ext4 mount flags
> and calls the right function.
>
> Suggested-by: Jan Kara 
> Suggested-by: Christoph Hellwig 
> Cc: Jan Kara 
> Cc: Christoph Hellwig 
> Cc: Alexander Viro 
> Cc: "Theodore Ts'o" 
> Cc: Andrew Morton 
> Signed-off-by: Fabian Frederick 
> ---
>  fs/ext4/fsync.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index a8bc47f..fa82c0a 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>
> if (!journal) {
> ret = generic_file_fsync(file, start, end, datasync);

Forgot to remove the above line?

> +   if (test_opt(inode->i_sb, BARRIER))
> +   ret = generic_file_fsync(file, start, end, datasync);
> +   else
> +   ret = __generic_file_fsync(file, start, end, datasync);
> if (!ret && !hlist_empty(&inode->i_dentry))
> ret = ext4_sync_parent(inode);
> goto out;



Thanks,
-- 
Ming Lei
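
The control flow that the review comment asks for can be sketched in user space as follows. This is a model under stated assumptions, not the ext4 code: `fsync_with_flush_model()` and `fsync_no_flush_model()` are hypothetical stand-ins for `generic_file_fsync()` and `__generic_file_fsync()`, and the `barrier` argument stands in for `test_opt(inode->i_sb, BARRIER)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Call counters so the control flow can be observed. */
static int flushing_calls;
static int plain_calls;

/* Stand-in for generic_file_fsync(): syncs and issues a cache flush. */
static int fsync_with_flush_model(void)
{
	flushing_calls++;
	return 0;
}

/* Stand-in for __generic_file_fsync(): syncs without the flush. */
static int fsync_no_flush_model(void)
{
	plain_calls++;
	return 0;
}

/*
 * No-journal path with the leftover unconditional call removed:
 * exactly one of the two helpers runs, chosen by the barrier option.
 */
static int sync_nojournal_model(bool barrier)
{
	if (barrier)
		return fsync_with_flush_model();
	return fsync_no_flush_model();
}
```

With the unconditional call left in place, as in the quoted hunk, the flushing variant would run on every call regardless of the mount option, and the file would be synced twice.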

[RFC][PATCH 2/2] ARM: ioremap: Add IO mapping space reused support.

2014-05-11 Thread Richard Lee

With I/O mappings, the same physical address range may be mapped more
than once. For example, on some SoCs the range 0x2000 ~ 0x20001000 is
the global control I/O region, and it is used by many drivers.
If each driver performs the same ioremap operation, we waste too much
vmalloc virtual address space.

This patch adds support for reusing an existing I/O mapping.

Signed-off-by: Richard Lee 
---
 arch/arm/mm/ioremap.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f9c32ba..26a3744 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -260,7 +260,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
 {
const struct mem_type *type;
int err;
-   unsigned long addr;
+   unsigned long addr, off;
struct vm_struct *area;
phys_addr_t paddr = __pfn_to_phys(pfn);
 
@@ -301,6 +301,12 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
if (WARN_ON(pfn_valid(pfn)))
return NULL;
 
+   area = find_vm_area_paddr(paddr, size, &off, VM_IOREMAP);
+   if (area) {
+   addr = (unsigned long)area->addr;
+   return (void __iomem *)(offset + off + addr);
+   }
+
area = get_vm_area_caller(size, VM_IOREMAP, caller);
if (!area)
return NULL;
@@ -410,6 +416,9 @@ void __iounmap(volatile void __iomem *io_addr)
if (svm)
return;
 
+   if (!vm_area_is_aready_to_free((unsigned long)addr))
+   return;
+
 #if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
{
struct vm_struct *vm;
-- 
1.8.4


Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin

On 05/11/2014 04:01 PM, Stephan Mueller wrote:
> Hi Peter,
> 
> some time back when the RDRAND instruction was debated, a patch was offered 
> for drivers/char/random.c that in essence turned /dev/random into a frontend 
> for RDRAND in case that instruction was available. The patch kind of 
> monopolized the noise sources such that if a user space "random number hog" 
> pulled data from /dev/random endlessly, the (almost) only noise source was 
> RDRAND. As that patch treated RDRAND as providing entropy, the blocking 
> behavior went away for /dev/random.
> 

This is false in a number of ways.

First of all... we NEVER pulled either /dev/random or /dev/urandom
directly from RDRAND.  We used RDRAND directly for kernel internal
randomness uses.  Users did object to this.

> That patch did not sit well with some developers and it got finally changed 
> such that the output of RDRAND is now just XORed with the output of the 
> "classic" /dev/random behavior -- /dev/random is still blocking. 

Mixing in RDRAND into /dev/random and /dev/urandom is actually

> With the current development cycle for 3.15, the function arch_random_refill 
> is added as presented in [1]. It now uses RDSEED instead of RDRAND. Yet, the 
> way this function is called in random_read seems (as I have no system with an 
> RDSEED, I cannot test) to show the very same behavior as the aforementioned 
> RDRAND patch: the blocking behavior of /dev/random will be gone and RDSEED 
> will monopolize the noise sources in case of a user space hog.

There is a huge difference between this and what people objected to
earlier: we filter everything through the kernel random number pool
system, which would require a herculean mathematical effort to reverse
even if the output of RDSEED was 100% predictable.

> Note, I do not see an issue with the patch that adds RDSEED as part of 
> add_interrupt_randomness outlined in [2]. The reason is that this patch does 
> not monopolize the noise sources.
> 
> I do not want to imply that Intel (or any other chip manufacturer that will 
> hook into arch_random_refill) intentionally provides bad entropy (and this 
> email shall not start a discussion about entropy again), but I would like to 
> be able to only use noise sources that I can fully audit. As it is with 
> hardware, I am not able to see what it is doing.

I have to point out the irony in this given your previous proposals,
however...

> Thus, may I ask that arch_random_refill is revised such that it will not 
> monopolize the noise sources? If somebody wants that, he can easily use rngd.

Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the
"nordrand" option to the kernel.  These options are there for a reason.

Now when you mention it, though, the nordrand option should turn off
RDSEED as well as RDRAND.  It currently doesn't; that is a bug, plain
and simple.

-hpa
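
The XOR arrangement described in the quoted text can be illustrated with a toy sketch. It is not the code in drivers/char/random.c, and `combine()` and `uncombine()` are hypothetical names. It shows one narrow property only: for any fixed hardware word, even one an attacker knows, XOR maps distinct pool outputs to distinct results and can be undone, so the hardware word by itself cannot erase what the pool contributed. It makes no claim about the strength of the pool itself.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a software pool output with a hardware RNG word by XOR. */
static uint64_t combine(uint64_t pool_out, uint64_t hw_word)
{
	return pool_out ^ hw_word;
}

/* Undo the combination when hw_word is known: XOR is its own inverse. */
static uint64_t uncombine(uint64_t combined, uint64_t hw_word)
{
	return combined ^ hw_word;
}
```

Because the mapping is invertible for a fixed `hw_word`, anyone who could predict the combined output could also predict the pool output, which is why the pool's unpredictability carries over in this simple model.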


[RFC][PATCH 0/2] Add IO mapping space reused support

2014-05-11 Thread Richard Lee


Richard Lee (2):
  mm/vmalloc: Add IO mapping space reused interface.
  ARM: ioremap: Add IO mapping space reused support.

 arch/arm/mm/ioremap.c   | 11 -
 include/linux/vmalloc.h |  5 
 mm/vmalloc.c| 63 +
 3 files changed, 78 insertions(+), 1 deletion(-)

-- 
1.8.4


icmp: account for ICMP out errors because of socket limit

2014-05-11 Thread zhuyj


Hi, Eric && David

 ______         ______________
|      |       |              |
|  PC  |<----->| MIPS 32 core |
|______|       |______________|

When pinging from a PC to a board (MIPS 32 core), the ping echo can fail
because of the socket limit, but the ICMP_MIB_OUTERRORS counter is not
incremented. In this case, "netstat -s" cannot report these errors.

This patch fixes the problem. It is attached; please review it.


Best Regards!
Zhu Yanjun
From af800d0f123cf9c66a9ae167baa7dc1d25d0cd1f Mon Sep 17 00:00:00 2001
From: Zhu Yanjun 
Date: Mon, 12 May 2014 11:07:20 +0800
Subject: [PATCH 1/1] icmp: account for ICMP out errors because of socket
 limit

When icmp_xmit_lock fails because of socket limit or memory shortage,
increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report
these errors.

netstat -s | grep "ICMP messages failed"
0 ICMP messages failed

Signed-off-by: Zhu Yanjun 
---
 net/ipv4/icmp.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 0134663..9a0bd7c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -342,8 +342,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 		return;
 
 	sk = icmp_xmit_lock(net);
-	if (sk == NULL)
+	if (sk == NULL){
+		ICMP_INC_STATS_BH(net, ICMP_MIB_OUTERRORS);
 		return;
+	}
 	inet = inet_sk(sk);
 
 	icmp_param->data.icmph.checksum = 0;
-- 
1.7.9.5
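
The accounting change in the patch above can be modelled in user space. This is a sketch under stated assumptions, not kernel code: `struct icmp_stats`, `struct fake_sock`, `xmit_lock_model()` and `reply_model()` are hypothetical stand-ins for the kernel's MIB counters, the ICMP socket, `icmp_xmit_lock()` and `icmp_reply()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; the kernel keeps these in per-CPU MIB arrays. */
struct icmp_stats {
	unsigned long out_msgs;
	unsigned long out_errors;
};

struct fake_sock {
	bool locked;
};

static struct fake_sock the_sock;

/* Model of icmp_xmit_lock(): returns NULL when the socket cannot be taken. */
static struct fake_sock *xmit_lock_model(bool available)
{
	if (!available)
		return NULL;
	the_sock.locked = true;
	return &the_sock;
}

/* Model of the patched reply path: a failed lock is counted, not silent. */
static void reply_model(struct icmp_stats *st, bool sock_available)
{
	struct fake_sock *sk = xmit_lock_model(sock_available);

	if (sk == NULL) {
		st->out_errors++;	/* the accounting the patch adds */
		return;
	}
	st->out_msgs++;
	sk->locked = false;	/* stands in for icmp_xmit_unlock() */
}
```

Without the increment on the early-return path, the two failed replies in a run like the one tested here would leave no trace in the statistics, which is the situation the patch description reports.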

Re: [PATCH 1/5] irq_work: Architecture support for remote irq work raise

2014-05-11 Thread Benjamin Herrenschmidt

On Mon, 2014-05-12 at 10:08 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-12 at 01:33 +0200, Frederic Weisbecker wrote:
> > We are going to extend irq work to support remote queuing.
> > 
> > So let's add a cpu argument to arch_irq_work_raise(). The architectures
> > willing to support that must then provide the backend to raise irq work
> > IPIs remotely.
> > 
> > Initial support is provided for x86 and ARM since they are easily
> > extended. The other archs that overwrite arch_irq_work_raise() seem
> > to use local clock interrupts and therefore need a deeper rewrite of
> > their irq work support to implement remote raising.
> 
> Well, looks like it's time to turn it into an IPI... It gets a bit more
> tricky because whether whacking the interrupt controller from an NMI
> is safe or not might depend on that irq controller
> implementation...
> 
> It looks like XICS and MPIC should be safe though, so at least we
> should be able to cover ppc64, but I'll leave ppc32 alone.

Correction... that's actually a bit more tricky. We might need an MMIO
to trigger the IPI. That means potentially having to take a hash miss,
and we certainly can't do that at NMI time at the moment.

We *could* hard disable interrupts (which blocks our NMIs since they
aren't real NMIs, they are just a way to bypass our soft-disable state
for perf interrupts) for hash_page, but that still makes me somewhat
nervous.

Another option would be to add an ioremap flag of some description to
be able to install bolted hash entries. (It already does so if called
early enough during boot, so it might actually just work by accident but
that's an undebuggable horror show waiting to happen if we ever change
that).

So needs a bit more thinking on our side.

Cheers,
Ben.


[RFC][PATCH 1/2] mm/vmalloc: Add IO mapping space reused interface.

2014-05-11 Thread Richard Lee

With I/O mappings, the same physical address range may be mapped more
than once. For example, on some SoCs the range 0x2000 ~ 0x20001000 is
the global control I/O region, and it is used by many drivers.
If each driver performs the same ioremap operation, we waste too much
vmalloc virtual address space.

This patch adds the interface for reusing I/O mapping space:
- find_vm_area_paddr: finds an existing vmalloc area by its I/O
  physical address.
- vm_area_is_aready_to_free: called before vfree of an I/O mapped
  area to check whether the area is still used by more than one
  consumer.

Signed-off-by: Richard Lee 
---
 include/linux/vmalloc.h |  5 
 mm/vmalloc.c| 63 +
 2 files changed, 68 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4b8a891..2b811f6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -34,6 +34,7 @@ struct vm_struct {
struct page **pages;
unsigned intnr_pages;
phys_addr_t phys_addr;
+   unsigned intused;
const void  *caller;
 };
 
@@ -100,6 +101,10 @@ static inline size_t get_vm_area_size(const struct 
vm_struct *area)
return area->size - PAGE_SIZE;
 }
 
+extern int vm_area_is_aready_to_free(phys_addr_t addr);
+struct vm_struct *find_vm_area_paddr(phys_addr_t paddr, size_t size,
+unsigned long *offset,
+unsigned long flags);
 extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags);
 extern struct vm_struct *get_vm_area_caller(unsigned long size,
unsigned long flags, const void 
*caller);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bf233b2..f75b7b3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1293,6 +1293,7 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct 
vmap_area *va,
vm->addr = (void *)va->va_start;
vm->size = va->va_end - va->va_start;
vm->caller = caller;
+   vm->used = 1;
va->vm = vm;
va->flags |= VM_VM_AREA;
spin_unlock(&vmap_area_lock);
@@ -1383,6 +1384,68 @@ struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
  NUMA_NO_NODE, GFP_KERNEL, caller);
 }
 
+int vm_area_is_aready_to_free(phys_addr_t addr)
+{
+   struct vmap_area *va;
+
+   va = find_vmap_area((unsigned long)addr);
+   if (!va || !(va->flags & VM_VM_AREA) || !va->vm)
+   return 1;
+
+   if (va->vm->used <= 1)
+   return 1;
+
+   --va->vm->used;
+
+   return 0;
+}
+
+/**
+ * find_vm_area_paddr  -  find a continuous kernel virtual area using the
+ * physical address.
+ * @paddr: base physical address
+ * @size:  size of the physical area range
+ * @offset: the start offset of the vm area
+ * @flags: %VM_IOREMAP for I/O mappings
+ *
+ * Search for a kernel VM area whose physical address range covers @paddr.
+ * If the existing VM area is large enough, return it; otherwise
+ * return NULL.
+ */
+struct vm_struct *find_vm_area_paddr(phys_addr_t paddr, size_t size,
+unsigned long *offset,
+unsigned long flags)
+{
+   struct vmap_area *va;
+
+   if (!(flags & VM_IOREMAP))
+   return NULL;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(va, &vmap_area_list, list) {
+   phys_addr_t phys_addr;
+
+   if (!va || !(va->flags & VM_VM_AREA) || !va->vm)
+   continue;
+
+   phys_addr = va->vm->phys_addr;
+
+   if (paddr < phys_addr || paddr + size > phys_addr + va->vm->size)
+   continue;
+
+   *offset = paddr - phys_addr;
+
+   if (va->vm->flags & VM_IOREMAP && va->vm->size >= size) {
+   va->vm->used++;
+   rcu_read_unlock();
+   return va->vm;
+   }
+   }
+   rcu_read_unlock();
+
+   return NULL;
+}
+
 /**
  * find_vm_area  -  find a continuous kernel virtual area
  * @addr:  base address
-- 
1.8.4
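
The reuse scheme described in this patch can be modelled in userspace as a reference-counted lookup by physical range. The following is only a sketch: `struct io_area`, `area_find_paddr()` and `area_put()` are hypothetical names invented for illustration, not kernel interfaces, and the model ignores locking and RCU entirely. Unlike the patch, it also drops the count to zero on the final release.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct vm_struct; not kernel code. */
struct io_area {
    uint64_t phys_addr;  /* start of the mapped physical range */
    size_t size;         /* length of the mapping in bytes */
    unsigned int used;   /* number of consumers sharing the mapping */
};

/* Return the first area fully covering [paddr, paddr + size), taking a reference. */
static struct io_area *area_find_paddr(struct io_area *areas, size_t n,
                                       uint64_t paddr, size_t size,
                                       size_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        struct io_area *a = &areas[i];

        if (a->used == 0)
            continue;                       /* slot is not mapped */
        if (paddr < a->phys_addr ||
            paddr + size > a->phys_addr + a->size)
            continue;                       /* request not covered */
        *offset = (size_t)(paddr - a->phys_addr);
        a->used++;
        return a;
    }
    return NULL;
}

/* Drop one reference; return 1 when the caller held the last one and may unmap. */
static int area_put(struct io_area *a)
{
    if (a->used <= 1) {
        a->used = 0;
        return 1;
    }
    a->used--;
    return 0;
}
```

A second consumer asking for a sub-range of an existing area gets the same area back with a non-zero offset, and only the last `area_put()` reports that the mapping can be torn down.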


Re: [PATCH 0/6] ipc/sem.c: Fix semctl(,,{GETNCNT,GETZCNT})

2014-05-11 Thread Davidlohr Bueso

On Sat, 2014-05-10 at 12:03 +0200, Manfred Spraul wrote:
> Hi all,
> 
> According to the man page of semop(), semzcnt or semncnt are increased
> exactly for the operation that couldn't proceed.
> 
> The Linux implementation always tried to be clever and to increase the
> counters for all operations that might be the reason why a task sleeps.

... and I hate the fact that we do so on demand, instead of modifying
the values when semop is called. This makes the semctl calls less
accurate, and in fact it's mentioned in the code.

> The following patches fix that and make the code conform to the
> documentation.
> 
> The series got fairly long, because I also noticed that semzcnt was calculated
> incorrectly.
> 
> What do you think?

I'm still going through the changes, sems make my brain hurt. But
conceptually they do make sense... and hey, if semctl(GETNCNT,GETZCNT)
calls are currently incomplete, then yeah, we should fix it.

> I ran a few test cases, and the semncnt and semzcnt counts now match
> the expectation.
> 
> Is anyone aware of an application that uses GETNCNT or GETZCNT?

Given how Oracle uses sysv semaphores I wouldn't be surprised if they
make use of these, especially GETNCNT, for something like "get the number
of waiters" as opposed to "are there waiters"... but I'm just
speculating here.

I did find that LTP does some calls to GETZCNT and GETNCNT, and these
patches do not break those tests. However, they are pretty bogus since
they always test for zero. That reminds me, it might be worthwhile
adding some more tests in the selftests/ipc dir; we only have a
trivial msgq program, and for the rest I pretty much rely on LTP for
correctness runs.
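
A test that expects a non-zero count could look roughly like the sketch below. It is not taken from LTP or the selftests; `observe_semncnt()` is a hypothetical helper that blocks one child on a decrement of a zero-valued semaphore and returns what the parent then reads through GETNCNT (one waiter is expected). The polling interval and timeout are arbitrary choices.

```c
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* On Linux the caller has to define union semun itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Block one child on a decrement of a zero-valued semaphore and return the
 * GETNCNT value seen by the parent, or -1 if the setup fails.
 */
static int observe_semncnt(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union semun arg = { .val = 0 };
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        semop(semid, &op, 1);   /* sleeps until killed or the set is removed */
        _exit(0);
    }

    int ncnt = 0;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    for (int i = 0; i < 200 && ncnt < 1; i++) {   /* poll for roughly 2 s */
        nanosleep(&delay, NULL);
        ncnt = semctl(semid, 0, GETNCNT);
    }

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    semctl(semid, 0, IPC_RMID);
    return ncnt;
}
```

A GETZCNT variant would set the semaphore to a non-zero value and have the child wait with `sem_op = 0` instead.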

Thanks,
Davidlohr


[ 000/143] 2.6.32.62-longterm review

2014-05-11 Thread Willy Tarreau

This is the start of the longterm review cycle for the 2.6.32.62 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it.

Responses should be made before Friday 16th 8PM UTC. Anything received
after that time might be too late. If someone wants a bit more time for
a deeper review, please let me know.

The whole patch series can be found in one patch at:
 kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.62-rc1.gz

The shortlog and diffstat are appended below.

--

Andreas Henriksson (1):
  net: Fix "ip rule delete table 256"

Andy Honig (2):
  KVM: Improve create VCPU parameter (CVE-2013-4587)
  KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)

Ben Greear (1):
  Fix lockup related to stop_machine being stuck in __do_softirq.

Changli Gao (2):
  net: Swap ver and type in pppoe_hdr
  net: drop_monitor: fix the value of maxattr

Chris Healy (1):
  resubmit bridge: fix message_age_timer calculation

Dan Carpenter (13):
  cciss: fix info leak in cciss_ioctl32_passthru()
  cpqarray: fix info leak in ida_locked_ioctl()
  net: heap overflow in __audit_sockaddr()
  arcnet: cleanup sizeof parameter
  af_key: more info leaks in pfkey messages
  net_sched: info leak in atm_tc_dump_class()
  isdnloop: use strlcpy() instead of strcpy()
  net: clamp ->msg_namelen instead of returning an error
  isdnloop: several buffer overflows
  libertas: potential oops in debugfs
  uml: check length in exitcode_proc_write()
  xfs: underflow bug in xfs_attrlist_by_handle()
  aacraid: missing capable() check in compat ioctl

Daniel Borkmann (8):
  net: sctp: fix NULL pointer dereference in socket destruction
  packet: packet_getname_spkt: make sure string is always 0-terminated
  random32: fix off-by-one in seeding requirement
  net: llc: fix use after free in llc_ui_recvmsg
  net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
  net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable
  net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk
  netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages

Dave Kleikamp (1):
  sunvnet: vnet_port_remove must call unregister_netdev

David S. Miller (1):
  net_sched: Fix stack info leak in cbq_dump_wrr().

Ding Tianhong (1):
  bridge: flush br's address entry in fdb when remove the bridge dev

Duan Jiong (1):
  ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv

Eric Dumazet (12):
  ipv6: ip6_sk_dst_check() must not assume ipv6 dst
  ipv6: tcp: fix panic in SYN processing
  tcp: must unclone packets before mangling them
  net: do not call sock_put() on TIMEWAIT sockets
  tcp: fix tcp_md5_hash_skb_data()
  ipv6: fix possible crashes in ip6_cork_release()
  ip_tunnel: fix kernel panic with icmp_dest_unreach
  neighbour: fix a race in neigh_destroy()
  vlan: fix a race in egress prio management
  tcp: cubic: fix bug in bictcp_acked()
  ipv4: fix possible seqlock deadlock
  inet: fix possible seqlock deadlocks

Fan Du (1):
  sctp: Use software crc32 checksum when xfrm transform will happen.

Florian Westphal (1):
  net: rose: restore old recvmsg behavior

Hannes Frederic Sowa (12):
  ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not match
  ipv6: remove max_addresses check from ipv6_create_tempaddr
  ipv6: drop packets with multiple fragmentation headers
  inet: prevent leakage of uninitialized memory to user in recv syscalls
  net: rework recvmsg handler msg_name and msg_namelen logic
  net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage)
  inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions
  ipv6: fix leaking uninitialized port number of offender sockaddr
  ipv6: fix possible seqlock deadlock in ip6_finish_output2
  ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  inet: fix possible memory corruption with UDP_CORK and UFO
  ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data

Ian Abbott (1):
  staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice

Jason Wang (1):
  virtio-net: alloc big buffers also when guest can receive UFO

Jiri Bohac (2):
  ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO
  bonding: 802.3ad: make aggregator_identifier bond-private

Jitendra Bhivare (1):
  intel-iommu: Flush unmaps at domain_exit

Jonathan Salwan (1):
  drivers/cdrom/cdrom.c: use kzalloc() for failing hardware

Julian Anastasov (1):
  ipvs: fix CHECKSUM_PARTIAL for TCP, UDP

[ 030/143] proc connector: fix info leaks

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

[ Upstream commit e727ca82e0e9616ab4844301e6bae60ca7327682 ]

Initialize event_data for all possible message types to prevent leaking
kernel stack contents to userland (up to 20 bytes). Also set the flags
member of the connector message to 0 to prevent leaking two more stack
bytes this way.

Cc: sta...@vger.kernel.org  
Signed-off-by: Mathias Krause 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/connector/cn_proc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 6069790..3a2587a 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -59,6 +59,7 @@ void proc_fork_connector(struct task_struct *task)
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -71,6 +72,7 @@ void proc_fork_connector(struct task_struct *task)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
/*  If cn_netlink_send() failed, the data is not sent */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
@@ -87,6 +89,7 @@ void proc_exec_connector(struct task_struct *task)
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -97,6 +100,7 @@ void proc_exec_connector(struct task_struct *task)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -113,6 +117,7 @@ void proc_id_connector(struct task_struct *task, int which_id)
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
ev->what = which_id;
ev->event_data.id.process_pid = task->pid;
ev->event_data.id.process_tgid = task->tgid;
@@ -136,6 +141,7 @@ void proc_id_connector(struct task_struct *task, int which_id)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -151,6 +157,7 @@ void proc_sid_connector(struct task_struct *task)
 
msg = (struct cn_msg *)buffer;
ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -161,6 +168,7 @@ void proc_sid_connector(struct task_struct *task)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -176,8 +184,10 @@ void proc_exit_connector(struct task_struct *task)
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
ev->what = PROC_EVENT_EXIT;
ev->event_data.exit.process_pid = task->pid;
@@ -188,6 +198,7 @@ void proc_exit_connector(struct task_struct *task)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -211,6 +222,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
msg->seq = rcvd_seq;
ktime_get_ts(&ts); /* get high res monotonic timestamp */
put_unaligned(timespec_to_ns(&ts), (__u64 *)&ev->timestamp_ns);
@@ -220,6 +232,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = rcvd_ack + 1;
msg->len = sizeof(*ev);
+   msg->flags = 0; /* not used */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
 }
 
@@ -249,6
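
The kind of leak this patch closes can be reproduced in a small userspace model. The sketch below assumes a made-up `struct demo_event` with a union whose members differ in size; it is loosely modelled on `struct proc_event` but is not the kernel layout. Filling only a small member leaves the rest of the union holding whatever the buffer contained before, unless the union is cleared first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event layout, loosely modelled on struct proc_event. */
struct demo_event {
    uint32_t what;
    union {
        struct { uint32_t pid, tgid; } exec_ev;                     /* 8 bytes */
        struct { uint32_t pid, tgid, ppid, ptgid, code; } fork_ev;  /* 20 bytes */
    } event_data;
};

/* Build an exec event in caller-provided storage that may hold stale data. */
static void fill_exec_event(struct demo_event *ev, uint32_t pid, uint32_t tgid,
                            int clear_first)
{
    if (clear_first)
        memset(&ev->event_data, 0, sizeof(ev->event_data));
    ev->what = 2;   /* arbitrary tag for this demo */
    ev->event_data.exec_ev.pid = pid;
    ev->event_data.exec_ev.tgid = tgid;
}

/* Count bytes of event_data beyond the exec member that are not zero. */
static size_t stale_bytes(const struct demo_event *ev)
{
    const unsigned char *p = (const unsigned char *)&ev->event_data;
    size_t n = 0;

    for (size_t i = sizeof(ev->event_data.exec_ev); i < sizeof(ev->event_data); i++)
        n += (p[i] != 0);
    return n;
}
```

If the buffer is pre-filled with a non-zero pattern, the uncleared variant leaves every byte past the small member untouched, which is what a receiver of the full-sized message would see.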

[ 005/143] ipvs: fix CHECKSUM_PARTIAL for TCP, UDP

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Julian Anastasov 

Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP only;
UDP was not tested because it needs a network card with HW CSUM support.
This may fix the problem where IPVS cannot be used in virtual boxes.
The problem appears with DNAT to a local address when the local stack
sends the reply in CHECKSUM_PARTIAL mode.

Fix tcp_dnat_handler and udp_dnat_handler to provide
vaddr and daddr in right order (old and new IP) when calling
tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).

Signed-off-by: Julian Anastasov 
Signed-off-by: Simon Horman 
(cherry picked from commit 5bc9068e9d962ca6b8bec3f0eb6f60ab4dee1d04)
Signed-off-by: Willy Tarreau 
---
 net/netfilter/ipvs/ip_vs_proto_tcp.c | 10 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 91d28e0..d462b0d 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -147,15 +147,15 @@ tcp_partial_csum_update(int af, struct tcphdr *tcph,
 #ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6)
tcph->check =
-   csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6,
+   ~csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6,
 ip_vs_check_diff2(oldlen, newlen,
-   ~csum_unfold(tcph->check))));
+   csum_unfold(tcph->check))));
else
 #endif
tcph->check =
-   csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip,
+   ~csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip,
ip_vs_check_diff2(oldlen, newlen,
-   ~csum_unfold(tcph->check))));
+   csum_unfold(tcph->check))));
 }
 
 
@@ -269,7 +269,7 @@ tcp_dnat_handler(struct sk_buff *skb,
 *  Adjust TCP checksums
 */
if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   tcp_partial_csum_update(cp->af, tcph, &cp->daddr, &cp->vaddr,
+   tcp_partial_csum_update(cp->af, tcph, &cp->vaddr, &cp->daddr,
htons(oldlen),
htons(skb->len - tcphoff));
} else if (!cp->app) {
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index e7a6885..c1781f5 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -154,15 +154,15 @@ udp_partial_csum_update(int af, struct udphdr *uhdr,
 #ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6)
uhdr->check =
-   csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6,
+   ~csum_fold(ip_vs_check_diff16(oldip->ip6, newip->ip6,
 ip_vs_check_diff2(oldlen, newlen,
-   ~csum_unfold(uhdr->check))));
+   csum_unfold(uhdr->check))));
else
 #endif
uhdr->check =
-   csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip,
+   ~csum_fold(ip_vs_check_diff4(oldip->ip, newip->ip,
ip_vs_check_diff2(oldlen, newlen,
-   ~csum_unfold(uhdr->check))));
+   csum_unfold(uhdr->check))));
 }
 
 
@@ -205,7 +205,7 @@ udp_snat_handler(struct sk_buff *skb,
 *  Adjust UDP checksums
 */
if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   udp_partial_csum_update(cp->af, udph, &cp->daddr, &cp->vaddr,
+   udp_partial_csum_update(cp->af, udph, &cp->vaddr, &cp->daddr,
htons(oldlen),
htons(skb->len - udphoff));
} else if (!cp->app && (udph->check != 0)) {
-- 
1.7.12.2.21.g234cd45.dirty
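
Why the position of the `~` matters can be seen from a plain one's-complement model. This is a userspace sketch with hypothetical helper names, not the kernel's `csum_fold()`/`csum_unfold()` API. It assumes, as the patch implies, that in CHECKSUM_PARTIAL mode the check field holds an uninverted partial sum, whereas a completed checksum is stored inverted (RFC 1624, equation 3).

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Uninverted one's-complement sum over an array of 16-bit words. */
static uint16_t sum_words(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return fold16(sum);
}

/* Completed checksum: stored inverted, so update as ~(~HC + ~m + m'). */
static uint16_t csum_replace16(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}

/* Partial sum: stored uninverted, so no inversion on the way in or out. */
static uint16_t psum_replace16(uint16_t psum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = psum;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return fold16(sum);
}
```

Both helpers apply the same "subtract old, add new" step; they differ only in whether the stored value is inverted, which corresponds to where the `~` sits in the patched expressions.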




[ 007/143] staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Ian Abbott 

Commit 677a31565692d596ef42ea589b53ba289abf4713 upstream.

The `insn_bits` handler `ni_65xx_dio_insn_bits()` has a `for` loop that
currently writes (optionally) and reads back up to 5 "ports" consisting
of 8 channels each.  It reads up to 32 1-bit channels but can only read
and write a whole port at once - it needs to handle up to 5 ports as the
first channel it reads might not be aligned on a port boundary.  It
breaks out of the loop early if the next port it handles is beyond the
final port on the card.  It also breaks out early on the 5th port in the
loop if the first channel was aligned.  Unfortunately, it doesn't check
that the current port it is dealing with belongs to the comedi subdevice
the `insn_bits` handler is acting on.  That's a bug.

Redo the `for` loop to terminate after the final port belonging to the
subdevice, changing the loop variable in the process to simplify things
a bit.  The `for` loop could now try and handle more than 5 ports if the
subdevice has more than 40 channels, but the test `if (bitshift >= 32)`
ensures it will break out early after 4 or 5 ports (depending on whether
the first channel is aligned on a port boundary).  (`bitshift` will be
between -7 and 7 inclusive on the first iteration, increasing by 8 for
each subsequent operation.)

Signed-off-by: Ian Abbott 
Signed-off-by: Willy Tarreau 
---
 drivers/staging/comedi/drivers/ni_65xx.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/comedi/drivers/ni_65xx.c b/drivers/staging/comedi/drivers/ni_65xx.c
index bbf75eb..bb23291 100644
--- a/drivers/staging/comedi/drivers/ni_65xx.c
+++ b/drivers/staging/comedi/drivers/ni_65xx.c
@@ -410,28 +410,25 @@ static int ni_65xx_dio_insn_bits(struct comedi_device *dev,
 struct comedi_subdevice *s,
 struct comedi_insn *insn, unsigned int *data)
 {
-   unsigned base_bitfield_channel;
-   const unsigned max_ports_per_bitfield = 5;
+   int base_bitfield_channel;
unsigned read_bits = 0;
-   unsigned j;
+   int last_port_offset = ni_65xx_port_by_channel(s->n_chan - 1);
+   int port_offset;
+
if (insn->n != 2)
return -EINVAL;
base_bitfield_channel = CR_CHAN(insn->chanspec);
-   for (j = 0; j < max_ports_per_bitfield; ++j) {
-   const unsigned port_offset = ni_65xx_port_by_channel(base_bitfield_channel) + j;
-   const unsigned port =
-   sprivate(s)->base_port + port_offset;
-   unsigned base_port_channel;
+   for (port_offset = ni_65xx_port_by_channel(base_bitfield_channel);
+port_offset <= last_port_offset; port_offset++) {
+   unsigned port = sprivate(s)->base_port + port_offset;
+   int base_port_channel = port_offset * ni_65xx_channels_per_port;
unsigned port_mask, port_data, port_read_bits;
-   int bitshift;
-   if (port >= ni_65xx_total_num_ports(board(dev)))
+   int bitshift = base_port_channel - base_bitfield_channel;
+
+   if (bitshift >= 32)
break;
-   base_port_channel = port_offset * ni_65xx_channels_per_port;
port_mask = data[0];
port_data = data[1];
-   bitshift = base_port_channel - base_bitfield_channel;
-   if (bitshift >= 32 || bitshift <= -32)
-   break;
if (bitshift > 0) {
port_mask >>= bitshift;
port_data >>= bitshift;
-- 
1.7.12.2.21.g234cd45.dirty
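
The termination behaviour of the reworked loop can be checked with a small model. The sketch below assumes 8 channels per port, as stated in the description, and uses a hypothetical `ports_touched()` helper that only counts iterations; it performs no register access.

```c
enum { CHANNELS_PER_PORT = 8 };

/* Count the ports one 32-bit insn_bits access visits in the reworked loop. */
static int ports_touched(int base_channel, int n_chan)
{
    int last_port = (n_chan - 1) / CHANNELS_PER_PORT;
    int count = 0;

    for (int port = base_channel / CHANNELS_PER_PORT; port <= last_port; port++) {
        /* Between -7 and 0 on the first pass, then grows by 8 per port. */
        int bitshift = port * CHANNELS_PER_PORT - base_channel;

        if (bitshift >= 32)
            break;
        count++;
    }
    return count;
}
```

With a large subdevice the loop stops after 4 ports for an aligned first channel and after 5 for an unaligned one; with a small subdevice it stops at the subdevice's last port, which is the case the old code missed.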




[ 008/143] kernel/kmod.c: check for NULL in call_usermodehelper_exec()

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Tetsuo Handa 

If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
dereference happens upon core dump because argv_split("") returns
argv[0] == NULL.

This bug was once fixed by commit 264b83c07a84 ("usermodehelper: check
subprocess_info->path != NULL") but was mistakenly reintroduced by commit
7f57cfa4e2aa ("usermodehelper: kill the sub_info->path[0] check").

This bug seems to have existed since 2.6.19 (the version in which core dump
to pipe was added).  Depending on kernel version and config, some side
effects might happen immediately after this oops (e.g.  kernel panic with
2.6.32-358.18.1.el6).

Signed-off-by: Tetsuo Handa 
Acked-by: Oleg Nesterov 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
(cherry picked from commit 4c1c7be95c345cf2ad537a0c48e9aeadc7304527)
Signed-off-by: Willy Tarreau 
---
 kernel/kmod.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 8ecc509..3da09a9 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -560,6 +560,10 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
BUG_ON(atomic_read(&sub_info->cred->usage) != 1);
validate_creds(sub_info->cred);
 
+   if (!sub_info->path) {
+   call_usermodehelper_freeinfo(sub_info);
+   return -EINVAL;
+   }
helper_lock();
if (sub_info->path[0] == '\0')
goto out;
-- 
1.7.12.2.21.g234cd45.dirty
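
The ordering of the two checks is the point of this fix: the NULL test has to come before anything reads `path[0]`. A minimal userspace sketch, with a hypothetical `struct helper_info` standing in for `struct subprocess_info` and made-up return conventions for the non-error cases:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct subprocess_info. */
struct helper_info {
    const char *path;
};

/* Reject a missing path before any code inspects path[0]. */
static int helper_exec_check(const struct helper_info *info)
{
    if (!info->path)
        return -EINVAL;     /* e.g. argv[0] == NULL after splitting "" */
    if (info->path[0] == '\0')
        return 0;           /* empty path: nothing to run */
    return 1;               /* a real helper would be started here */
}
```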




[ 024/143] HID: provide a helper for validating hid reports

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 331415ff16a12147d57d5c953f3a961b7ede348b upstream

Many drivers need to validate the characteristics of their HID report
during initialization to avoid misusing the reports. This adds a common
helper to validate that the report exists, that the field exists, and
that the field holds the expected number of values.

Signed-off-by: Kees Cook 
Cc: sta...@vger.kernel.org
Reviewed-by: Benjamin Tissoires 
Signed-off-by: Jiri Kosina 

[jmm: backported to 2.6.32]
[wt: dev_err() in 2.6.32 instead of hid_err()]
Signed-off-by: Willy Tarreau 
---
 drivers/hid/hid-core.c | 58 ++
 include/linux/hid.h|  4 
 2 files changed, 62 insertions(+)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index a222cbb..e7e28b5 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -808,6 +808,64 @@ static __inline__ int search(__s32 *array, __s32 value, unsigned n)
return -1;
 }
 
+static const char * const hid_report_names[] = {
+   "HID_INPUT_REPORT",
+   "HID_OUTPUT_REPORT",
+   "HID_FEATURE_REPORT",
+};
+/**
+ * hid_validate_values - validate existing device report's value indexes
+ *
+ * @device: hid device
+ * @type: which report type to examine
+ * @id: which report ID to examine (0 for first)
+ * @field_index: which report field to examine
+ * @report_counts: expected number of values
+ *
+ * Validate the number of values in a given field of a given report, after
+ * parsing.
+ */
+struct hid_report *hid_validate_values(struct hid_device *hid,
+  unsigned int type, unsigned int id,
+  unsigned int field_index,
+  unsigned int report_counts)
+{
+   struct hid_report *report;
+
+   if (type > HID_FEATURE_REPORT) {
+   dev_err(&hid->dev, "invalid HID report type %u\n", type);
+   return NULL;
+   }
+
+   if (id >= HID_MAX_IDS) {
+   dev_err(&hid->dev, "invalid HID report id %u\n", id);
+   return NULL;
+   }
+
+   /*
+* Explicitly not using hid_get_report() here since it depends on
+* ->numbered being checked, which may not always be the case when
+* drivers go to access report values.
+*/
+   report = hid->report_enum[type].report_id_hash[id];
+   if (!report) {
+   dev_err(&hid->dev, "missing %s %u\n", hid_report_names[type], id);
+   return NULL;
+   }
+   if (report->maxfield <= field_index) {
+   dev_err(&hid->dev, "not enough fields in %s %u\n",
+   hid_report_names[type], id);
+   return NULL;
+   }
+   if (report->field[field_index]->report_count < report_counts) {
+   dev_err(&hid->dev, "not enough values in %s %u field %u\n",
+   hid_report_names[type], id, field_index);
+   return NULL;
+   }
+   return report;
+}
+EXPORT_SYMBOL_GPL(hid_validate_values);
+
 /**
  * hid_match_report - check if driver's raw_event should be called
  *
diff --git a/include/linux/hid.h b/include/linux/hid.h
index 481080d..e5db8e5 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -693,6 +693,10 @@ int hidinput_find_field(struct hid_device *hid, unsigned int type, unsigned int
 void hid_output_report(struct hid_report *report, __u8 *data);
 struct hid_device *hid_allocate_device(void);
 int hid_parse_report(struct hid_device *hid, __u8 *start, unsigned size);
+struct hid_report *hid_validate_values(struct hid_device *hid,
+  unsigned int type, unsigned int id,
+  unsigned int field_index,
+  unsigned int report_counts);
 int hid_check_keys_pressed(struct hid_device *hid);
 int hid_connect(struct hid_device *hid, unsigned int connect_mask);
 void hid_disconnect(struct hid_device *hid);
-- 
1.7.12.2.21.g234cd45.dirty
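
The sequence of checks in the helper (id in range, report present, field present, enough values) can be exercised outside the kernel with a reduced model. Everything named `demo_*` below is hypothetical; the report type dimension and the logging are left out, and the table size of 256 is an assumption for this sketch.

```c
#include <stddef.h>

/* Hypothetical reduced versions of struct hid_field / struct hid_report. */
struct demo_field {
    unsigned int report_count;
};

struct demo_report {
    unsigned int maxfield;
    struct demo_field **field;
};

enum { DEMO_MAX_IDS = 256 };   /* assumed table size for this sketch */

/* Return the report only if field_index exists and holds enough values. */
static struct demo_report *demo_validate_values(struct demo_report **by_id,
                                                unsigned int id,
                                                unsigned int field_index,
                                                unsigned int report_counts)
{
    struct demo_report *report;

    if (id >= DEMO_MAX_IDS)
        return NULL;
    report = by_id[id];
    if (!report)
        return NULL;
    if (report->maxfield <= field_index)
        return NULL;
    if (report->field[field_index]->report_count < report_counts)
        return NULL;
    return report;
}
```

A driver-style caller would loop over the field indexes it intends to write and bail out on the first NULL, before touching any `value[]` slot.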




Re: [PATCH 4/5] intel_pstate: Remove C0 tracking

2014-05-11 Thread Stratos Karafotis

Hi,

On 12/05/2014 05:14 AM, Stratos Karafotis wrote:
> From: Dirk Brandewie 
> 
> Commit fcb6a15c intel_pstate: Take core C0 time into account for core busy
> introduced a regression referenced below.  The issue with "lockup"
> after suspend that this commit was addressing is now dealt with in the
> suspend path.
> 
> References:
>https://bugzilla.kernel.org/show_bug.cgi?id=66581
>https://bugzilla.kernel.org/show_bug.cgi?id=75121
> 
> Reported-by: Doug Smythies 
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/intel_pstate.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index bb20881..4c26faf 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -59,7 +59,6 @@ struct sample {
> int32_t core_pct_busy;
> u64 aperf;
> u64 mperf;
> -   unsigned long long tsc;
> int freq;
>  };
> 
> @@ -100,7 +99,6 @@ struct cpudata {
> 
> u64 prev_aperf;
> u64 prev_mperf;
> -   unsigned long long prev_tsc;
> struct sample sample;
>  };
> 
> @@ -561,46 +559,37 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu,
> struct sample *sample)
>  {
> int32_t core_pct;
> -   int32_t c0_pct;
> 
> core_pct = div_fp(int_tofp((sample->aperf)),
> int_tofp((sample->mperf)));
> core_pct = mul_fp(core_pct, int_tofp(100));
> FP_ROUNDUP(core_pct);
> 
> -   c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc));
> -
> sample->freq = fp_toint(
> mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
> 
> -   sample->core_pct_busy = mul_fp(core_pct, c0_pct);
> +   sample->core_pct_busy = core_pct;
>  }
> 
>  static inline void intel_pstate_sample(struct cpudata *cpu)
>  {
> u64 aperf, mperf;
> -   unsigned long long tsc;
> 
> rdmsrl(MSR_IA32_APERF, aperf);
> rdmsrl(MSR_IA32_MPERF, mperf);
> -   tsc = native_read_tsc();
> 
> aperf = aperf >> FRAC_BITS;
> mperf = mperf >> FRAC_BITS;
> -   tsc = tsc >> FRAC_BITS;
> 
> cpu->sample.aperf = aperf;
> cpu->sample.mperf = mperf;
> -   cpu->sample.tsc = tsc;
> cpu->sample.aperf -= cpu->prev_aperf;
> cpu->sample.mperf -= cpu->prev_mperf;
> -   cpu->sample.tsc -= cpu->prev_tsc;
> 
> intel_pstate_calc_busy(cpu, &cpu->sample);
> 
> cpu->prev_aperf = aperf;
> cpu->prev_mperf = mperf;
> -   cpu->prev_tsc = tsc;
>  }
> 
>  static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
> --
> 1.9.0
> 

With this patch, my CPU (Core i7-3770 @ 3.90GHz) never seems to use its
lowest frequencies. Even on an idle system I always get ~2GHz. Normally,
on an idle system, it used to be 1.6GHz.
On very small loads (mp3 decoding) the CPU goes up to 2.7GHz (it used to
be 1.6GHz).

After reverting this patch on my local build, the problem is resolved.


Thanks,
Stratos Karafotis
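
One possible way to see the effect of dropping the C0 term is to redo the arithmetic of the quoted `intel_pstate_calc_busy()` in userspace. The sketch below is an approximation: the fixed-point helpers are reimplemented here, `FRAC_BITS` is assumed to be 8, and `FP_ROUNDUP` is omitted. Since APERF and MPERF only count while the core is in C0, their ratio alone says nothing about how much of the sample interval was spent idle.

```c
#include <stdint.h>

#define FRAC_BITS 8   /* assumption for this sketch */

static int64_t int_tofp(int64_t x) { return x << FRAC_BITS; }
static int64_t fp_toint(int64_t x) { return x >> FRAC_BITS; }
static int64_t mul_fp(int64_t x, int64_t y) { return (x * y) >> FRAC_BITS; }
static int64_t div_fp(int64_t x, int64_t y) { return (x << FRAC_BITS) / y; }

/* Busy percentage as an integer, with or without scaling by C0 residency. */
static int64_t busy_pct(int64_t aperf, int64_t mperf, int64_t tsc, int use_c0)
{
    int64_t core_pct = div_fp(int_tofp(aperf), int_tofp(mperf));

    core_pct = mul_fp(core_pct, int_tofp(100));
    if (use_c0) {
        /* Fraction of the interval spent in C0: mperf / tsc. */
        int64_t c0_pct = div_fp(int_tofp(mperf), int_tofp(tsc));

        core_pct = mul_fp(core_pct, c0_pct);
    }
    return fp_toint(core_pct);
}
```

For a core that was in C0 for a quarter of the interval (mperf = 100, tsc = 400) and ran at the reference rate while awake (aperf = 100), the unscaled value is 100 while the C0-scaled value is 25. That would be consistent with the higher idle frequencies reported above, though this model is not a confirmed diagnosis.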


[ 035/143] net: dst: provide accessor function to dst->xfrm

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

[ Upstream commit e87b3998d795123b4139bc3f25490dd236f68212 ]

dst->xfrm is conditionally defined.  Provide an accessor function that
is always available.

Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 include/net/dst.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..49f443b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -286,11 +286,22 @@ static inline int __xfrm_lookup(struct net *net, struct dst_entry **dst_p,
 {
return 0;
 }
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+   return NULL;
+}
+
 #else
 extern int xfrm_lookup(struct net *net, struct dst_entry **dst_p,
   struct flowi *fl, struct sock *sk, int flags);
 extern int __xfrm_lookup(struct net *net, struct dst_entry **dst_p,
 struct flowi *fl, struct sock *sk, int flags);
+
+/* skb attached with this dst needs transformation if dst->xfrm is valid */
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+   return dst->xfrm;
+}
 #endif
 #endif
 
-- 
1.7.12.2.21.g234cd45.dirty
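
The pattern used here, an accessor that exists in both configurations so callers never need their own `#ifdef`, can be shown in isolation. The names below (`DEMO_HAVE_XFRM`, `struct demo_dst`, `demo_dst_xfrm()`) are hypothetical stand-ins, not the kernel's `CONFIG_XFRM` or `struct dst_entry`.

```c
#include <stddef.h>

struct demo_state {
    int id;
};

struct demo_dst {
#ifdef DEMO_HAVE_XFRM
    struct demo_state *xfrm;   /* only present when the feature is built in */
#endif
    int refcnt;
};

/* Always available: callers test the result instead of the config symbol. */
static inline struct demo_state *demo_dst_xfrm(const struct demo_dst *dst)
{
#ifdef DEMO_HAVE_XFRM
    return dst->xfrm;
#else
    (void)dst;
    return NULL;
#endif
}
```

A caller can then write `if (demo_dst_xfrm(dst))` unconditionally; when the feature is compiled out the compiler can discard the branch.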




[ 020/143] HID: zeroplus: validate output report details

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 78214e81a1bf43740ce89bb5efda78eac2f8ef83 upstream

The zeroplus HID driver was not checking the size of allocated values
in fields it used. A HID device could send a malicious output report
that would cause the driver to write beyond the output report allocation
during initialization, causing a heap overflow:

[ 1442.728680] usb 1-1: New USB device found, idVendor=0c12, idProduct=0005
...
[ 1466.243173] BUG kmalloc-192 (Tainted: GW   ): Redzone overwritten

CVE-2013-2889

Signed-off-by: Kees Cook 
Cc: sta...@vger.kernel.org
Reviewed-by: Benjamin Tissoires 
Signed-off-by: Jiri Kosina 
[jmm: backport to 2.6.32]
Signed-off-by: Willy Tarreau 
---
 drivers/hid/hid-zpff.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/hid/hid-zpff.c b/drivers/hid/hid-zpff.c
index a79f0d7..5617ea9 100644
--- a/drivers/hid/hid-zpff.c
+++ b/drivers/hid/hid-zpff.c
@@ -68,21 +68,13 @@ static int zpff_init(struct hid_device *hid)
struct hid_report *report;
struct hid_input *hidinput = list_entry(hid->inputs.next,
struct hid_input, list);
-   struct list_head *report_list =
-   &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
-   int error;
+   int i, error;
 
-   if (list_empty(report_list)) {
-   dev_err(&hid->dev, "no output report found\n");
-   return -ENODEV;
-   }
-
-   report = list_entry(report_list->next, struct hid_report, list);
-
-   if (report->maxfield < 4) {
-   dev_err(&hid->dev, "not enough fields in report\n");
-   return -ENODEV;
+   for (i = 0; i < 4; i++) {
+   report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, i, 1);
+   if (!report)
+   return -ENODEV;
}
 
zpff = kzalloc(sizeof(struct zpff_device), GFP_KERNEL);
-- 
1.7.12.2.21.g234cd45.dirty
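
The bounds check this patch relies on can be illustrated in user space. The sketch below is not the kernel's hid_validate_values(); the types and names (toy_field, toy_report, toy_validate_values, TOY_MAX_FIELDS) are hypothetical stand-ins, assuming a report is simply an array of fields that each hold report_count values.

```c
#include <stddef.h>

#define TOY_MAX_FIELDS 8	/* hypothetical array size for this sketch */

/* Hypothetical, simplified stand-ins for the kernel's HID structures. */
struct toy_field {
	unsigned int report_count;	/* number of values in this field */
};

struct toy_report {
	unsigned int maxfield;		/* number of valid entries in field[] */
	struct toy_field *field[TOY_MAX_FIELDS];
};

/*
 * Return the report if field `field_index` exists and holds at least
 * `report_counts` values; otherwise return NULL so the caller can bail
 * out before it ever writes to the field.
 */
static struct toy_report *toy_validate_values(struct toy_report *report,
					      unsigned int field_index,
					      unsigned int report_counts)
{
	if (!report)
		return NULL;
	if (field_index >= report->maxfield || field_index >= TOY_MAX_FIELDS)
		return NULL;
	if (!report->field[field_index])
		return NULL;
	if (report->field[field_index]->report_count < report_counts)
		return NULL;
	return report;
}
```

A caller would loop over the four fields it intends to use, as the patched zpff_init() does, and give up on the first NULL.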




[ 015/143] af_key: fix info leaks in notify messages

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mathias Krause 

commit a5cc68f3d63306d0d288f31edfc2ae6ef8ecd887 upstream

key_notify_sa_flush() and key_notify_policy_flush() fail to initialize
the sadb_msg_reserved member of the broadcast message and thereby
leak 2 bytes of heap memory to listeners. Fix that.

Signed-off-by: Mathias Krause 
Cc: Steffen Klassert 
Cc: "David S. Miller" 
Cc: Herbert Xu 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/key/af_key.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 4e98193..03d626f 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1726,6 +1726,7 @@ static int key_notify_sa_flush(struct km_event *c)
hdr->sadb_msg_version = PF_KEY_V2;
hdr->sadb_msg_errno = (uint8_t) 0;
hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t));
+   hdr->sadb_msg_reserved = 0;
 
pfkey_broadcast(skb, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net);
 
@@ -2694,6 +2695,7 @@ static int key_notify_policy_flush(struct km_event *c)
hdr->sadb_msg_version = PF_KEY_V2;
hdr->sadb_msg_errno = (uint8_t) 0;
hdr->sadb_msg_len = (sizeof(struct sadb_msg) / sizeof(uint64_t));
+   hdr->sadb_msg_reserved = 0;
pfkey_broadcast(skb_out, GFP_ATOMIC, BROADCAST_ALL, NULL, c->net);
return 0;
 
-- 
1.7.12.2.21.g234cd45.dirty
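
To see why one missing assignment matters, here is a minimal user-space sketch. The struct name toy_sadb_msg and the helper toy_fill_flush_msg are hypothetical; the member layout is an assumption modelled on the PF_KEY v2 base header, and only the four members touched in the diff are taken from the patch itself.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 16-byte layout; the kernel's struct sadb_msg is authoritative. */
struct toy_sadb_msg {
	uint8_t  sadb_msg_version;
	uint8_t  sadb_msg_type;
	uint8_t  sadb_msg_errno;
	uint8_t  sadb_msg_satype;
	uint16_t sadb_msg_len;		/* length in 64-bit words */
	uint16_t sadb_msg_reserved;
	uint32_t sadb_msg_seq;
	uint32_t sadb_msg_pid;
};

/*
 * Fill a header that lives in a buffer holding stale data, as freshly
 * allocated heap memory may. Every member is written, so no stale bytes
 * reach whoever receives the message.
 */
static void toy_fill_flush_msg(struct toy_sadb_msg *hdr, uint8_t type,
			       uint8_t satype, uint32_t seq, uint32_t pid)
{
	hdr->sadb_msg_version = 2;	/* stand-in for PF_KEY_V2 */
	hdr->sadb_msg_type = type;
	hdr->sadb_msg_errno = 0;
	hdr->sadb_msg_satype = satype;
	hdr->sadb_msg_len = sizeof(*hdr) / sizeof(uint64_t);
	hdr->sadb_msg_reserved = 0;	/* the assignment the patch adds */
	hdr->sadb_msg_seq = seq;
	hdr->sadb_msg_pid = pid;
}
```

Without the sadb_msg_reserved line, those two bytes would keep whatever the buffer contained before, which is the leak described above.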




[ 039/143] davinci_emac.c: Fix IFF_ALLMULTI setup

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mariusz Ceier 

[ Upstream commit d69e0f7ea95fef8059251325a79c004bac01f018 ]

When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't,
emac_dev_mcast_set should only enable RX of multicasts and reset
MACHASH registers.

It does this, but afterwards it either sets up multicast MACs
filtering or disables RX of multicasts and resets MACHASH registers
again, rendering IFF_ALLMULTI flag useless.

This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and
disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set.

Tested with kernel 2.6.37.

Signed-off-by: Mariusz Ceier 
Acked-by: Mugunthan V N 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/net/davinci_emac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c
index e347831..eafd1e4 100644
--- a/drivers/net/davinci_emac.c
+++ b/drivers/net/davinci_emac.c
@@ -960,7 +960,7 @@ static void emac_dev_mcast_set(struct net_device *ndev)
mbp_enable = (mbp_enable | EMAC_MBP_RXMCAST);
emac_add_mcast(priv, EMAC_ALL_MULTI_SET, NULL);
}
-   if (ndev->mc_count > 0) {
+   else if (ndev->mc_count > 0) {
struct dev_mc_list *mc_ptr;
mbp_enable = (mbp_enable | EMAC_MBP_RXMCAST);
emac_add_mcast(priv, EMAC_ALL_MULTI_CLR, NULL);
-- 
1.7.12.2.21.g234cd45.dirty
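
The effect of turning the second `if` into `else if` can be shown with a small decision function. This is only a sketch of the control flow described in the changelog; the enum and function names (toy_mcast_mode, toy_mode_before, toy_mode_after) are hypothetical, and the real driver programs hardware registers instead of returning a value.

```c
#include <stdbool.h>

enum toy_mcast_mode {
	TOY_MCAST_OFF,		/* RX of multicasts disabled */
	TOY_MCAST_FILTER,	/* only listed multicast MACs accepted */
	TOY_MCAST_ALL,		/* all multicasts accepted */
};

/* Before the fix: the second test always runs and overrides the first. */
static enum toy_mcast_mode toy_mode_before(bool allmulti, int mc_count)
{
	enum toy_mcast_mode mode = TOY_MCAST_OFF;

	if (allmulti)
		mode = TOY_MCAST_ALL;
	if (mc_count > 0)
		mode = TOY_MCAST_FILTER;
	else
		mode = TOY_MCAST_OFF;
	return mode;
}

/* After the fix: filtering and disabling are skipped when allmulti is set. */
static enum toy_mcast_mode toy_mode_after(bool allmulti, int mc_count)
{
	enum toy_mcast_mode mode = TOY_MCAST_OFF;

	if (allmulti)
		mode = TOY_MCAST_ALL;
	else if (mc_count > 0)
		mode = TOY_MCAST_FILTER;
	else
		mode = TOY_MCAST_OFF;
	return mode;
}
```

With allmulti set and an empty multicast list, the first version ends up with reception disabled while the second keeps all multicasts enabled.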




[ 025/143] crypto: api - Fix race condition in larval lookup

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Nikola Pajkovsky 

https://bugzilla.redhat.com/1016108

64z is missing rhel6 commit 3af031a395c0 ("[crypto] algboss: Hold ref
count on larval") which is causing cosmetic fuzz, because crypto_alg_get
was moved from crypto/api.c to crypto/internal.h.

From: Herbert Xu 

[ upstream commit 77dbd7a95e4a4f15264c333a9e9ab97ee27dc2aa ]

crypto_larval_lookup should only return a larval if it created one.
Any larval created by another entity must be processed through
crypto_larval_wait before being returned.

Otherwise this will lead to a larval being killed twice, which
will most likely lead to a crash.

Cc: sta...@vger.kernel.org
Reported-by: Kees Cook 
Tested-by: Kees Cook 
Signed-off-by: Herbert Xu 
Signed-off-by: Nikola Pajkovsky 
Signed-off-by: Willy Tarreau 
---
 crypto/api.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/api.c b/crypto/api.c
index 798526d..f4be65f 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -40,6 +40,8 @@ static inline struct crypto_alg *crypto_alg_get(struct crypto_alg *alg)
return alg;
 }
 
+static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg);
+
 struct crypto_alg *crypto_mod_get(struct crypto_alg *alg)
 {
return try_module_get(alg->cra_module) ? crypto_alg_get(alg) : NULL;
@@ -150,8 +152,11 @@ static struct crypto_alg *crypto_larval_add(const char *name, u32 type,
}
up_write(&crypto_alg_sem);
 
-   if (alg != &larval->alg)
+   if (alg != &larval->alg) {
kfree(larval);
+   if (crypto_is_larval(alg))
+   alg = crypto_larval_wait(alg);
+   }
 
return alg;
 }
-- 
1.7.12.2.21.g234cd45.dirty




[RFC PATCH 04/12 v2] CPU ConCurrency tracking

2014-05-11 Thread Yuyang Du

CC can only change when a task is enqueued on or dequeued from the CPU's rq.
We also update it at the scheduler tick and at idle enter/exit, in case no
enqueue or dequeue happens for a long time.

Therefore, we track CC in and only in these four points:

1. dequeue
2. enqueue
3. scheduler tick
4. idle enter and exit

TODO: use existing load tracking framework

Signed-off-by: Yuyang Du 
---
 kernel/sched/core.c |3 +++
 kernel/sched/fair.c |2 ++
 2 files changed, 5 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7958a47..0236455 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -767,6 +767,7 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
update_rq_clock(rq);
sched_info_queued(rq, p);
p->sched_class->enqueue_task(rq, p, flags);
+   update_cpu_concurrency(rq);
 }
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -774,6 +775,7 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
update_rq_clock(rq);
sched_info_dequeued(rq, p);
p->sched_class->dequeue_task(rq, p, flags);
+   update_cpu_concurrency(rq);
 }
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)
@@ -2428,6 +2430,7 @@ void scheduler_tick(void)
update_rq_clock(rq);
curr->sched_class->task_tick(rq, curr, 0);
update_cpu_load_active(rq);
+   update_cpu_concurrency(rq);
raw_spin_unlock(&rq->lock);
 
perf_event_task_tick();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd9..e7153ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2563,6 +2563,7 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 void idle_enter_fair(struct rq *this_rq)
 {
update_rq_runnable_avg(this_rq, 1);
+   update_cpu_concurrency(this_rq);
 }
 
 /*
@@ -2573,6 +2574,7 @@ void idle_enter_fair(struct rq *this_rq)
 void idle_exit_fair(struct rq *this_rq)
 {
update_rq_runnable_avg(this_rq, 0);
+   update_cpu_concurrency(this_rq);
 }
 
 static int idle_balance(struct rq *this_rq);
-- 
1.7.9.5
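
As an illustration of why updating at these four points is enough, the sketch below keeps a plain time-weighted average of nr_running. It is an assumption made for clarity, not the series' implementation (which, per the other patches, uses a decayed sum); the names toy_cc, toy_cc_update and toy_cc_avg are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-CPU tracker; not the struct used in the series. */
struct toy_cc {
	uint64_t last_ts;		/* time of the previous update */
	uint64_t weighted_sum;		/* sum of nr_running * elapsed time */
	uint64_t total_time;		/* total time observed so far */
	unsigned int nr_running;	/* runnable tasks since last_ts */
};

/*
 * Call at enqueue, dequeue, scheduler tick, and idle enter/exit.
 * The interval since the previous call is charged at the task count
 * that was in effect during that interval.
 */
static void toy_cc_update(struct toy_cc *cc, uint64_t now,
			  unsigned int nr_running_now)
{
	uint64_t delta = now - cc->last_ts;

	cc->weighted_sum += (uint64_t)cc->nr_running * delta;
	cc->total_time += delta;
	cc->last_ts = now;
	cc->nr_running = nr_running_now;
}

/* Average concurrency, scaled by 1024 to avoid floating point. */
static uint64_t toy_cc_avg(const struct toy_cc *cc)
{
	if (!cc->total_time)
		return 0;
	return (cc->weighted_sum << 10) / cc->total_time;
}
```

Because nr_running only changes at enqueue and dequeue, the tick and idle hooks merely bring the accounted time up to date; they never change the count themselves.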


[RFC PATCH 01/12 v2] CONFIG for CPU ConCurrency

2014-05-11 Thread Yuyang Du

Add CONFIG_CPU_CONCURRENCY in arch/x86/Kconfig. This CONFIG enables/disables
CPU ConCurrency load metric tracking.

Signed-off-by: Yuyang Du 
---
 arch/x86/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 25d2c6f..9bfac8d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -797,6 +797,17 @@ config SCHED_MC
  making when dealing with multi-core CPU chips at a cost of slightly
  increased overhead in some places. If unsure say N here.
 
+config CPU_CONCURRENCY
+   bool "CPU ConCurrency (CC)"
+   default n
+   depends on SMP
+   ---help---
+ CPU ConCurrency (CC) is a new CPU load metric that measures the CPU
+ load by averaging the number of running tasks. Using CC, the scheduler
+ can evaluate the load of CPUs and may consolidate workloads on CPUs in
+ load balancing for power efficiency without sacrificing performance.
+ If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
-- 
1.7.9.5


[RFC PATCH 07/12 v2] CPU ConCurrency API for Workload Consolidation

2014-05-11 Thread Yuyang Du

Currently, CC is per CPU. To consolidate, the formula is based on a heuristic.
Suppose we have 2 CPUs, their task concurrency over time is ('-' means no
task, 'x' having tasks):

1)
CPU0: ----- (CC[0])
CPU1: - (CC[1])

2)
CPU0: ----- (CC[0])
CPU1: ----- (CC[1])

If we consolidate CPU0 and CPU1, the consolidated CC will be: CC' = CC[0] +
CC[1] for case 1 and CC'' = (CC[0] + CC[1]) * 2 for case 2. For the cases in
between case 1 and 2 in terms of how xxx overlaps, the CC should be between
CC' and CC''. So, we uniformly use this condition for consolidation (suppose
we consolidate m CPUs to n CPUs, m > n):

(CC[0] + CC[1] + ... + CC[m-2] + CC[m-1]) * (n + log(m-n)) >=
---
 kernel/sched/concurrency.c |  562 
 kernel/sched/sched.h   |   13 +
 2 files changed, 575 insertions(+)

diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c
index da26dd7..21e5631 100644
--- a/kernel/sched/concurrency.c
+++ b/kernel/sched/concurrency.c
@@ -28,6 +28,25 @@ unsigned int sysctl_concurrency_decay_rate = 1UL;
  */
 static unsigned int cc_contrib_period = 10UL;
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+/*
+ * whether we use concurrency to select cpu to run
+ * the woken up task
+ */
+static unsigned int wc_wakeup = 1UL;
+
+/*
+ * concurrency lower than percentage of this number
+ * is capable of running wakee
+ */
+static unsigned int wc_wakeup_threshold = 80UL;
+
+/*
+ * aggressively push the task even if it is hot
+ */
+static unsigned int wc_push_hot_task = 1UL;
+#endif
+
 /*
  * the concurrency is scaled up for decaying,
  * thus, concurrency 1 is effectively 2^cc_resolution (1024),
@@ -343,6 +362,9 @@ void init_cpu_concurrency(struct rq *rq)
rq->concurrency.nr_running = 0;
rq->concurrency.sum_timestamp = ULLONG_MAX;
rq->concurrency.contrib_timestamp = ULLONG_MAX;
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   rq->concurrency.unload = 0;
+#endif
 }
 
 /*
@@ -364,3 +386,543 @@ void update_cpu_concurrency(struct rq *rq)
 }
 
 #endif
+
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+/*
+ * whether cpu is capable of having more concurrency
+ */
+static int cpu_cc_capable(int cpu)
+{
+   u64 sum = cpu_rq(cpu)->concurrency.sum_now;
+   u64 threshold = cc_weight(1);
+
+   sum *= 100;
+   sum *= cpu_rq(cpu)->cpu_power;
+
+   threshold *= wc_wakeup_threshold;
+   threshold <<= SCHED_POWER_SHIFT;
+
+   if (sum <= threshold)
+   return 1;
+
+   return 0;
+}
+
+/*
+ * we do not select idle, if the cc of the
+ * wakee and waker (in this order) is capable
+ * of handling the wakee task
+ */
+int workload_consolidation_wakeup(int prev, int target)
+{
+   if (!wc_wakeup) {
+   if (idle_cpu(target))
+   return target;
+
+   return nr_cpu_ids;
+   }
+
+   if (idle_cpu(prev) || cpu_cc_capable(prev))
+   return prev;
+
+   if (prev != target && (idle_cpu(target) || cpu_cc_capable(target)))
+   return target;
+
+   return nr_cpu_ids;
+}
+
+static inline u64 sched_group_cc(struct sched_group *sg)
+{
+   u64 sg_cc = 0;
+   int i;
+
+   for_each_cpu(i, sched_group_cpus(sg))
+   sg_cc += cpu_rq(i)->concurrency.sum_now *
+   cpu_rq(i)->cpu_power;
+
+   return sg_cc;
+}
+
+static inline u64 sched_domain_cc(struct sched_domain *sd)
+{
+   struct sched_group *sg = sd->groups;
+   u64 sd_cc = 0;
+
+   do {
+   sd_cc += sched_group_cc(sg);
+   sg = sg->next;
+   } while (sg != sd->groups);
+
+   return sd_cc;
+}
+
+static inline struct sched_group *
+find_lowest_cc_group(struct sched_group *sg, int span)
+{
+   u64 grp_cc, min = ULLONG_MAX;
+   struct sched_group *lowest = NULL;
+   int i;
+
+   for (i = 0; i < span; ++i) {
+   grp_cc = sched_group_cc(sg);
+
+   if (grp_cc < min) {
+   min = grp_cc;
+   lowest = sg;
+   }
+
+   sg = sg->next;
+   }
+
+   return lowest;
+}
+
+static inline u64 __calc_cc_thr(int cpus, unsigned int asym_cc)
+{
+   u64 thr = cpus;
+
+   thr *= cc_weight(1);
+   thr *= asym_cc;
+   thr <<= SCHED_POWER_SHIFT;
+
+   return thr;
+}
+
+/*
+ * can @src_cc of @src_nr cpus be consolidated
+ * to @dst_cc of @dst_nr cpus
+ */
+static inline int
+__can_consolidate_cc(u64 src_cc, int src_nr, u64 dst_cc, int dst_nr)
+{
+   dst_cc *= dst_nr;
+   src_nr -= dst_nr;
+
+   if (unlikely(src_nr <= 0))
+   return 0;
+
+   src_nr = ilog2(src_nr);
+   src_nr += dst_nr;
+   src_cc *= src_nr;
+
+   if (src_cc > dst_cc)
+   return 0;
+
+   return 1;
+}
+
+/*
+ * find the group for asymmetric concurrency
+ * problem to address: traverse sd from top to down
+ */
+struct sched_group *
+workload_consolidation_find_group(struct
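
The consolidation test can be tried out in user space. The sketch below follows the arithmetic of __can_consolidate_cc() as it appears in the diff: the source concurrency is multiplied by (n + log2(m - n)) and compared against the destination capacity multiplied by n. The names toy_ilog2 and toy_can_consolidate are hypothetical, and toy_ilog2 is a simple stand-in for the kernel's ilog2().

```c
#include <stdint.h>

/* Floor of log2(v) for v > 0; user-space stand-in for ilog2(). */
static int toy_ilog2(unsigned int v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/*
 * Can the concurrency src_cc, currently spread over src_nr CPUs, be
 * consolidated onto dst_nr CPUs whose capacity threshold is dst_cc?
 * Returns 1 if yes, 0 if not (including when nothing would be removed).
 */
static int toy_can_consolidate(uint64_t src_cc, int src_nr,
			       uint64_t dst_cc, int dst_nr)
{
	int removed = src_nr - dst_nr;
	uint64_t factor;

	if (removed <= 0)
		return 0;

	factor = (uint64_t)(dst_nr + toy_ilog2((unsigned int)removed));
	return src_cc * factor <= dst_cc * (uint64_t)dst_nr;
}
```

For example, going from 4 CPUs to 2 gives a factor of 2 + log2(2) = 3, so a summed concurrency of 100 passes against a threshold of 200 (300 <= 400), while a summed concurrency of 200 does not (600 > 400).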

[RFC PATCH 10/12 v2] Intercept periodic nohz idle balancing

2014-05-11 Thread Yuyang Du

We intercept load balancing to contain the load and load balancing in
the consolidated CPUs according to our consolidating mechanism.

In periodic nohz idle balance, we skip the idle but non-consolidated
CPUs from load balancing.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |   50 +++---
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 94c7a6a..9bb1304 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6867,10 +6867,46 @@ static struct {
 
 static inline int find_new_ilb(void)
 {
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask);
+   int ilb, weight;
+   int this_cpu = smp_processor_id();
+
+   /*
+* Optimize for the case when we have no idle CPUs or only one
+* idle CPU. Don't walk the sched_domain hierarchy in such cases
+*/
+   if (cpumask_weight(nohz.idle_cpus_mask) < 2)
+   return nr_cpu_ids;
+
+   ilb = cpumask_first(nohz.idle_cpus_mask);
+
+   if (ilb < nr_cpu_ids && idle_cpu(ilb)) {
+
+   cpumask_copy(nonshielded, nohz.idle_cpus_mask);
+
+   rcu_read_lock();
+   workload_consolidation_nonshielded_mask(this_cpu, nonshielded);
+   rcu_read_unlock();
+
+   weight = cpumask_weight(nonshielded);
+
+   if (weight < 2)
+   return nr_cpu_ids;
+
+   /*
+* get idle load balancer again
+*/
+   ilb = cpumask_first(nonshielded);
+   if (ilb < nr_cpu_ids && idle_cpu(ilb))
+   return ilb;
+   }
+#else
int ilb = cpumask_first(nohz.idle_cpus_mask);
 
if (ilb < nr_cpu_ids && idle_cpu(ilb))
return ilb;
+#endif
 
return nr_cpu_ids;
 }
@@ -7107,7 +7143,7 @@ out:
  * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
  * rebalancing for all the cpus for whom scheduler ticks are stopped.
  */
-static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
+static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle, struct cpumask *mask)
 {
int this_cpu = this_rq->cpu;
struct rq *rq;
@@ -7117,7 +7153,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
goto end;
 
-   for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
+   for_each_cpu(balance_cpu, mask) {
if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
continue;
 
@@ -7165,10 +7201,10 @@ static inline int nohz_kick_needed(struct rq *rq)
if (unlikely(rq->idle_balance))
return 0;
 
-   /*
-   * We may be recently in ticked or tickless idle mode. At the first
-   * busy tick after returning from idle, we will update the busy stats.
-   */
+   /*
+* We may be recently in ticked or tickless idle mode. At the first
+* busy tick after returning from idle, we will update the busy stats.
+*/
set_cpu_sd_state_busy();
nohz_balance_exit_idle(cpu);
 
@@ -7211,7 +7247,7 @@ need_kick:
return 1;
 }
 #else
-static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
+static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle, struct cpumask *mask) { }
 #endif
 
 /*
-- 
1.7.9.5
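
The selection done in the patched find_new_ilb() can be sketched with plain bitmasks. This is a simplified model under stated assumptions: CPUs are bits in a 32-bit word, the set of shielded CPUs is handed in directly instead of being computed by workload_consolidation_nonshielded_mask(), and the idle_cpu() re-check is left out. All toy_* names are hypothetical.

```c
#define TOY_NR_CPUS 32

/* Index of the lowest set bit, or TOY_NR_CPUS if the mask is empty. */
static int toy_mask_first(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Number of set bits in the mask. */
static int toy_mask_weight(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * idle:     mask of tickless idle CPUs
 * shielded: CPUs the consolidation logic wants left alone
 * Returns the chosen idle load balancer, or TOY_NR_CPUS for "none".
 */
static int toy_find_new_ilb(unsigned int idle, unsigned int shielded)
{
	unsigned int nonshielded;

	if (toy_mask_weight(idle) < 2)
		return TOY_NR_CPUS;

	nonshielded = idle & ~shielded;
	if (toy_mask_weight(nonshielded) < 2)
		return TOY_NR_CPUS;

	return toy_mask_first(nonshielded);
}
```

With idle CPUs {1, 2, 3} and CPU 1 shielded, the balancer becomes CPU 2; if only one non-shielded idle CPU remains, no balancer is picked.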


[RFC PATCH 08/12 v2] Intercept wakeup/fork/exec load balancing

2014-05-11 Thread Yuyang Du

We intercept load balancing to contain the load and load balancing in
the consolidated CPUs according to our consolidating mechanism.

In wakeup load balancing, we do not select an idle CPU if the CPU of the
wakee or the waker (checked in this order if SD_WAKE_AFFINE) has a CC low
enough to handle the wakee task. And in fork/exec load balancing, when
finding the sched_group, we find the consolidated group.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |   15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e7153ff..c7a6347 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4365,9 +4365,16 @@ static int select_idle_sibling(struct task_struct *p, int target)
struct sched_domain *sd;
struct sched_group *sg;
int i = task_cpu(p);
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   int ret;
 
+   ret = workload_consolidation_wakeup(i, target);
+   if (ret < nr_cpu_ids)
+   return ret;
+#else
if (idle_cpu(target))
return target;
+#endif
 
/*
 * If the prevous cpu is cache affine and idle, don't be stupid.
@@ -4460,7 +4467,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
}
 
while (sd) {
-   struct sched_group *group;
+   struct sched_group *group = NULL;
int weight;
 
if (!(sd->flags & sd_flag)) {
@@ -4468,6 +4475,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
continue;
}
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   if (sd->flags & SD_WORKLOAD_CONSOLIDATION)
+   group = workload_consolidation_find_group(sd, p, cpu);
+
+   if (!group)
+#endif
group = find_idlest_group(sd, p, cpu, sd_flag);
if (!group) {
sd = sd->child;
-- 
1.7.9.5
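
The wakeup decision can be modelled in a few lines. The sketch below follows the ordering used by workload_consolidation_wakeup() in patch 07 (previous CPU first, then the target), but replaces the cpu_power-scaled comparison with a plain percentage. The struct toy_cpu, the 80 percent constant as a fixed value, and the function names are assumptions made for illustration.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

struct toy_cpu {
	bool idle;
	unsigned int cc_percent;	/* concurrency as % of one full task */
};

/* Mirrors the default of wc_wakeup_threshold in patch 07. */
static const unsigned int toy_wakeup_threshold = 80;

/* A CPU can take the wakee if it is idle or lightly loaded. */
static bool toy_cpu_ok(const struct toy_cpu *c)
{
	return c->idle || c->cc_percent <= toy_wakeup_threshold;
}

/*
 * Prefer the wakee's previous CPU, then the target CPU. Returning
 * TOY_NR_CPUS means "no decision", so the caller falls through to its
 * normal idle-sibling search.
 */
static int toy_wakeup_cpu(const struct toy_cpu *cpus, int prev, int target)
{
	if (toy_cpu_ok(&cpus[prev]))
		return prev;
	if (prev != target && toy_cpu_ok(&cpus[target]))
		return target;
	return TOY_NR_CPUS;
}
```

The point of the design is that a busy but not saturated CPU is treated as good enough, so the task stays put instead of waking an idle CPU elsewhere.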


[RFC PATCH 09/12 v2] Intercept idle balancing

2014-05-11 Thread Yuyang Du

We intercept load balancing to contain the load and load balancing in
the consolidated CPUs according to our consolidating mechanism.

In idle balancing, we do two things:

1) Skip pulling task to the idle non-consolidated CPUs.

2) In addition, for consolidated Idle CPU, we aggressively pull tasks from
non-consolidated CPUs.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |   31 +++
 1 file changed, 31 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7a6347..94c7a6a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6655,6 +6655,10 @@ out:
return ld_moved;
 }
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
+#endif
+
 /*
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
@@ -6666,6 +6670,9 @@ static int idle_balance(struct rq *this_rq)
unsigned long next_balance = jiffies + HZ;
u64 curr_cost = 0;
int this_cpu = this_rq->cpu;
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask);
+#endif
 
idle_enter_fair(this_rq);
/*
@@ -6684,6 +6691,19 @@ static int idle_balance(struct rq *this_rq)
 
update_blocked_averages(this_cpu);
rcu_read_lock();
+
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   cpumask_copy(nonshielded, cpu_active_mask);
+
+   /*
+* if we encounter shielded cpus here, don't do balance on them
+*/
+   workload_consolidation_nonshielded_mask(this_cpu, nonshielded);
+   if (!cpumask_test_cpu(this_cpu, nonshielded))
+   goto unlock;
+   workload_consolidation_unload(nonshielded);
+#endif
+
for_each_domain(this_cpu, sd) {
unsigned long interval;
int continue_balancing = 1;
@@ -6716,6 +6736,9 @@ static int idle_balance(struct rq *this_rq)
if (pulled_task)
break;
}
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+unlock:
+#endif
rcu_read_unlock();
 
raw_spin_lock(&this_rq->lock);
@@ -7709,6 +7732,14 @@ void print_cfs_stats(struct seq_file *m, int cpu)
 __init void init_sched_fair_class(void)
 {
 #ifdef CONFIG_SMP
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   unsigned int i;
+   for_each_possible_cpu(i) {
+   zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
+   GFP_KERNEL, cpu_to_node(i));
+   }
+#endif
+
open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
 
 #ifdef CONFIG_NO_HZ_COMMON
-- 
1.7.9.5


[RFC PATCH 05/12 v2] CONFIG for Workload Consolidation

2014-05-11 Thread Yuyang Du

Add CONFIG_WORKLOAD_CONSOLIDATION in arch/x86/Kconfig. This CONFIG enables
and disables CPU workload consolidation in scheduler's load balancing.

Signed-off-by: Yuyang Du 
---
 arch/x86/Kconfig |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9bfac8d..0999c16 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -808,6 +808,16 @@ config CPU_CONCURRENCY
  load balancing for power efficiency without sacrificing performance.
  If unsure say N here.
 
+config WORKLOAD_CONSOLIDATION
+   bool "CPU Workload Consolidation"
+   default n
+   depends on CPU_CONCURRENCY
+   ---help---
+ CPU Workload Consolidation is a new CPU PM module, which uses the CPU
+ concurrency of the CPU, and allows asymmetric concurrency across CPUs to
+ reduce the SW and HW overhead to increase load balance efficiency and
+ conserve energy. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
-- 
1.7.9.5


[RFC PATCH 12/12 v2] Intercept RT scheduler

2014-05-11 Thread Yuyang Du

We intercept load balancing to contain the load and load balancing in
the consolidated CPUs according to our consolidating mechanism.

In the RT scheduler, we also skip pulling or selecting tasks onto idle
non-consolidated CPUs. This is pretty provocative.

Signed-off-by: Yuyang Du 
---
 kernel/sched/rt.c |   25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index bd2267a..f8141fb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1217,6 +1217,9 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 {
struct task_struct *curr;
struct rq *rq;
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   int do_find = 0;
+#endif
 
if (p->nr_cpus_allowed == 1)
goto out;
@@ -1230,6 +1233,11 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
rcu_read_lock();
curr = ACCESS_ONCE(rq->curr); /* unlocked access */
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   if (workload_consolidation_cpu_shielded(cpu))
+   do_find = 1;
+#endif
+
/*
 * If the current task on @p's runqueue is an RT task, then
 * try to see if we can wake this RT task up on another
@@ -1252,9 +1260,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 * This test is optimistic, if we get it wrong the load-balancer
 * will have to sort it out.
 */
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   if (do_find || (curr && unlikely(rt_task(curr)) &&
+   (curr->nr_cpus_allowed < 2 ||
+curr->prio <= p->prio))) {
+#else
if (curr && unlikely(rt_task(curr)) &&
(curr->nr_cpus_allowed < 2 ||
 curr->prio <= p->prio)) {
+#endif
int target = find_lowest_rq(p);
 
if (target != -1)
@@ -1460,6 +1474,12 @@ static int find_lowest_rq(struct task_struct *task)
if (!cpupri_find(&task_rq(task)->rd->cpupri, task, lowest_mask))
return -1; /* No targets found */
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   workload_consolidation_nonshielded_mask(this_cpu, lowest_mask);
+   if (!cpumask_weight(lowest_mask))
+   return -1;
+#endif
+
/*
 * At this point we have built a mask of cpus representing the
 * lowest priority tasks in the system.  Now we want to elect
@@ -1687,6 +1707,11 @@ static int pull_rt_task(struct rq *this_rq)
if (likely(!rt_overloaded(this_rq)))
return 0;
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   if (workload_consolidation_cpu_shielded(this_cpu))
+   return 0;
+#endif
+
/*
 * Match the barrier from rt_set_overloaded; this guarantees that if we
 * see overloaded we must also see the rto_mask bit.
-- 
1.7.9.5


[RFC PATCH 06/12 v2] Attach CPU topology to specify each sched_domain's workload consolidation

2014-05-11 Thread Yuyang Du

Define the SD_WORKLOAD_CONSOLIDATION flag in sched_domain. When this flag is
set, workload consolidation applies to that domain. In addition, a
consolidating_coeff is defined in sched_domain to specify the degree of
consolidation in that domain.

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h|   13 +
 include/linux/topology.h |   16 
 kernel/sched/core.c  |   41 +
 3 files changed, 70 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25f54c7..f3f7d4a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -877,6 +877,12 @@ enum cpu_idle_type {
 #define SD_OVERLAP 0x2000  /* sched_domains of this level overlap */
 #define SD_NUMA0x4000  /* cross-node balancing */
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+#define SD_WORKLOAD_CONSOLIDATION  0x8000  /* Higher concurrency in front */
+#else
+#define SD_WORKLOAD_CONSOLIDATION  0
+#endif
+
 extern int __weak arch_sd_sibiling_asym_packing(void);
 
 struct sched_domain_attr {
@@ -960,6 +966,13 @@ struct sched_domain {
struct rcu_head rcu;/* used during destruction */
};
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   unsigned int total_groups;  /* total groups number */
+   unsigned int group_number;  /* this CPU's group sequence */
+   unsigned int consolidating_coeff;   /* consolidating coefficient */
+   struct sched_group *first_group;/* ordered by CPU number */
+#endif
+
unsigned int span_weight;
/*
 * Span of all CPUs in this domain.
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 7062330..334f83e 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -66,6 +66,16 @@ int arch_update_cpu_topology(void);
 #define PENALTY_FOR_NODE_WITH_CPUS (1)
 #endif
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+#ifndef WORKLOAD_CONSOLIDATION_INIT
+#define WORKLOAD_CONSOLIDATION_INIT(n) .consolidating_coeff = (n),
+#endif
+#else
+#ifndef WORKLOAD_CONSOLIDATION_INIT
+#define WORKLOAD_CONSOLIDATION_INIT(n)
+#endif
+#endif
+
 /*
  * Below are the 3 major initializers used in building sched_domains:
  * SD_SIBLING_INIT, for SMT domains
@@ -102,12 +112,14 @@ int arch_update_cpu_topology(void);
| 0*SD_SERIALIZE\
| 0*SD_PREFER_SIBLING   \
| arch_sd_sibling_asym_packing()\
+   | 0*SD_WORKLOAD_CONSOLIDATION   \
,   \
.last_balance   = jiffies,  \
.balance_interval   = 1,\
.smt_gain   = 1178, /* 15% */   \
.max_newidle_lb_cost= 0,\
.next_decay_max_lb_cost = jiffies,  \
+   WORKLOAD_CONSOLIDATION_INIT(0)  \
 }
 #endif
 #endif /* CONFIG_SCHED_SMT */
@@ -134,11 +146,13 @@ int arch_update_cpu_topology(void);
| 0*SD_SHARE_CPUPOWER   \
| 1*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
+   | 1*SD_WORKLOAD_CONSOLIDATION   \
,   \
.last_balance   = jiffies,  \
.balance_interval   = 1,\
.max_newidle_lb_cost= 0,\
.next_decay_max_lb_cost = jiffies,  \
+   WORKLOAD_CONSOLIDATION_INIT(180)\
 }
 #endif
 #endif /* CONFIG_SCHED_MC */
@@ -167,11 +181,13 @@ int arch_update_cpu_topology(void);
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
| 1*SD_PREFER_SIBLING   \
+   | 1*SD_WORKLOAD_CONSOLIDATION   \
,   \
.last_balance   = jiffies,  \
.balance_interval   = 1,\
.max_newidle_lb_cost= 0,\
.next_decay_max_lb_cost = jiffies,  \
+   WORKLOAD_CONSOLIDATION_INIT(180)\
 }
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0236455..cd92f2d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4843,7 +4843,11 @@
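
Two idioms in this patch are worth a small stand-alone illustration: defining the flag as 0 when the feature is compiled out, and the `0*FLAG | 1*FLAG` style used in the sched_domain initializers. The sketch below uses hypothetical TOY_* names and made-up flag values, with a plain macro standing in for the Kconfig option.

```c
#define TOY_CONSOLIDATION_ENABLED 1	/* stand-in for the Kconfig option */

#define TOY_SD_LOAD_BALANCE	0x0001	/* made-up values for the sketch */
#define TOY_SD_SERIALIZE	0x0400

#if TOY_CONSOLIDATION_ENABLED
#define TOY_SD_WORKLOAD_CONSOLIDATION	0x8000
#else
/* With the feature off the flag is 0, so every test on it is false. */
#define TOY_SD_WORKLOAD_CONSOLIDATION	0
#endif

/* Multiplying by 0 or 1 lists every flag, set or not, in one table. */
static const unsigned int toy_smt_flags =
	  1 * TOY_SD_LOAD_BALANCE
	| 0 * TOY_SD_SERIALIZE
	| 0 * TOY_SD_WORKLOAD_CONSOLIDATION;

static const unsigned int toy_mc_flags =
	  1 * TOY_SD_LOAD_BALANCE
	| 0 * TOY_SD_SERIALIZE
	| 1 * TOY_SD_WORKLOAD_CONSOLIDATION;

/* Callers need no #ifdef: the mask is simply empty when compiled out. */
static int toy_consolidates(unsigned int flags)
{
	return (flags & TOY_SD_WORKLOAD_CONSOLIDATION) != 0;
}
```

This matches the choice visible in the diff of disabling consolidation at the SMT level and enabling it for the MC and CPU levels.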

[RFC PATCH 11/12 v2] Intercept periodic load balancing

2014-05-11 Thread Yuyang Du

We intercept load balancing to contain the load and load balancing in
the consolidated CPUs according to our consolidating mechanism.

In periodic load balancing, we do two things:

1) Skip pulling task to the non-consolidated CPUs.

2) In addition, for consolidated Idle CPU, we aggressively pull tasks from
non-consolidated CPUs.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |   33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9bb1304..1c7a3d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7260,6 +7260,36 @@ static void run_rebalance_domains(struct softirq_action *h)
enum cpu_idle_type idle = this_rq->idle_balance ?
CPU_IDLE : CPU_NOT_IDLE;
 
+#ifdef CONFIG_WORKLOAD_CONSOLIDATION
+   struct cpumask *nonshielded = __get_cpu_var(local_cpu_mask);
+   int this_cpu = cpu_of(this_rq);
+
+   /*
+* if we encounter shielded cpus here, don't do balance on them
+*/
+   cpumask_copy(nonshielded, cpu_active_mask);
+
+   rcu_read_lock();
+   workload_consolidation_nonshielded_mask(this_cpu, nonshielded);
+   rcu_read_unlock();
+
+   /*
+* aggressively unload the shielded cpus to unshielded cpus
+*/
+   workload_consolidation_unload(nonshielded);
+
+   if (cpumask_test_cpu(this_cpu, nonshielded)) {
+   rebalance_domains(this_rq, idle);
+
+   /*
+* If this cpu has a pending nohz_balance_kick, then do the
+* balancing on behalf of the other idle cpus whose ticks are
+* stopped.
+*/
+   cpumask_and(nonshielded, nonshielded, nohz.idle_cpus_mask);
+   nohz_idle_balance(this_rq, idle, nonshielded);
+   }
+#else
rebalance_domains(this_rq, idle);
 
/*
@@ -7267,7 +7297,8 @@ static void run_rebalance_domains(struct softirq_action *h)
 * balancing on behalf of the other idle cpus whose ticks are
 * stopped.
 */
-   nohz_idle_balance(this_rq, idle);
+   nohz_idle_balance(this_rq, idle, nohz.idle_cpus_mask);
+#endif
 }
 
 /*
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC PATCH 02/12 v2] Init CPU ConCurrency

2014-05-11 Thread Yuyang Du

CPU ConCurrency (CC) is inserted as a member in each CPU's rq, and
initialized at the same time as the rq. Updating CC is protected by the rq's lock.

Signed-off-by: Yuyang Du 
---
 kernel/sched/Makefile  |1 +
 kernel/sched/concurrency.c |   22 ++
 kernel/sched/core.c|2 ++
 kernel/sched/sched.h   |   21 +
 4 files changed, 46 insertions(+)
 create mode 100644 kernel/sched/concurrency.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index ab32b7b..e67f7e3 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_CPU_CONCURRENCY) += concurrency.o
diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c
new file mode 100644
index 000..50e08a2
--- /dev/null
+++ b/kernel/sched/concurrency.c
@@ -0,0 +1,22 @@
+/*
+ * CPU ConCurrency (CC) measures the CPU load by averaging
+ * the number of running tasks. Using CC, the scheduler can
+ * evaluate the load of CPUs to improve load balance for power
+ * efficiency without sacrificing performance.
+ *
+ */
+
+#ifdef CONFIG_CPU_CONCURRENCY
+
+#include "sched.h"
+
+void init_cpu_concurrency(struct rq *rq)
+{
+   rq->concurrency.sum = 0;
+   rq->concurrency.sum_now = 0;
+   rq->concurrency.contrib = 0;
+   rq->concurrency.nr_running = 0;
+   rq->concurrency.sum_timestamp = ULLONG_MAX;
+   rq->concurrency.contrib_timestamp = ULLONG_MAX;
+}
+#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..7958a47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6884,6 +6884,8 @@ void __init sched_init(void)
 #endif
init_rq_hrtick(rq);
atomic_set(&rq->nr_iowait, 0);
+
+   init_cpu_concurrency(rq);
}
 
set_load_weight(&init_task);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..f1c9235 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -508,6 +508,17 @@ extern struct root_domain def_root_domain;
 
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_CPU_CONCURRENCY
+struct cpu_concurrency_t {
+   u64 sum;
+   u64 sum_now;
+   u64 contrib;
+   u64 sum_timestamp;
+   u64 contrib_timestamp;
+   unsigned int nr_running;
+};
+#endif
+
 /*
  * This is the main, per-CPU runqueue data structure.
  *
@@ -643,6 +654,10 @@ struct rq {
 #ifdef CONFIG_SMP
struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_CONCURRENCY
+   struct cpu_concurrency_t concurrency;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1203,6 +1218,12 @@ extern void init_sched_dl_class(void);
 extern void resched_task(struct task_struct *p);
 extern void resched_cpu(int cpu);
 
+#ifdef CONFIG_CPU_CONCURRENCY
+extern void init_cpu_concurrency(struct rq *rq);
+#else
+static inline void init_cpu_concurrency(struct rq *rq) {}
+#endif
+
 extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 
-- 
1.7.9.5


[RFC PATCH 03/12 v2] CPU ConCurrency calculation

2014-05-11 Thread Yuyang Du

It is natural to use task concurrency (running tasks in the rq) as a load
indicator. We calculate CC from task concurrency in two steps:

1) Divide continuous time into periods of time, and average task concurrency
in period, for tolerating the transient bursts:

a = sum(concurrency * time) / period

2) Exponentially decay past periods, and synthesize them all, for hysteresis
to load drops or resilience to load rises (let f be decaying factor, and a_x
the xth period average since period 0):

s = a_n + f^1 * a_n-1 + f^2 * a_n-2 +, ..., + f^(n-1) * a_1 + f^n * a_0

Signed-off-by: Yuyang Du 
---
 include/linux/sched/sysctl.h |8 +
 kernel/sched/concurrency.c   |  344 ++
 kernel/sched/sched.h |2 +
 kernel/sysctl.c  |   16 ++
 4 files changed, 370 insertions(+)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 8045a55..ec52b3f 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -36,6 +36,14 @@ extern unsigned int sysctl_sched_min_granularity;
 extern unsigned int sysctl_sched_wakeup_granularity;
 extern unsigned int sysctl_sched_child_runs_first;
 
+#ifdef CONFIG_CPU_CONCURRENCY
+extern unsigned int sysctl_concurrency_sum_period;
+extern unsigned int sysctl_concurrency_decay_rate;
+extern int concurrency_decay_rate_handler(struct ctl_table *table, int write,
+   void __user *buffer,
+   size_t *lenp, loff_t *ppos);
+#endif
+
 enum sched_tunable_scaling {
SCHED_TUNABLESCALING_NONE,
SCHED_TUNABLESCALING_LOG,
diff --git a/kernel/sched/concurrency.c b/kernel/sched/concurrency.c
index 50e08a2..da26dd7 100644
--- a/kernel/sched/concurrency.c
+++ b/kernel/sched/concurrency.c
@@ -10,6 +10,331 @@
 
 #include "sched.h"
 
+/*
+ * the sum period of time is 2^26 ns (~64) by default
+ */
+unsigned int sysctl_concurrency_sum_period = 26UL;
+
+/*
+ * the number of sum periods, after which the original
+ * will be reduced/decayed to half
+ */
+unsigned int sysctl_concurrency_decay_rate = 1UL;
+
+/*
+ * the contrib period of time is 2^10 (~1us) by default,
+ * us has better precision than ms, and
+ * 1024 makes use of faster shift than div
+ */
+static unsigned int cc_contrib_period = 10UL;
+
+/*
+ * the concurrency is scaled up for decaying,
+ * thus, concurrency 1 is effectively 2^cc_resolution (1024),
+ * which can be halved by 10 half-life periods
+ */
+static unsigned int cc_resolution = 10UL;
+
+/*
+ * after this number of half-life periods, even
+ * (1>>32)-1 (which is sufficiently large) is less than 1
+ */
+static unsigned int cc_decay_max_pds = 32UL;
+
+static inline u32 cc_scale_up(unsigned int c)
+{
+   return c << cc_resolution;
+}
+
+static inline u32 cc_scale_down(unsigned int c)
+{
+   return c >> cc_resolution;
+}
+
+/* from nanoseconds to sum periods */
+static inline u64 cc_sum_pds(u64 n)
+{
+   return n >> sysctl_concurrency_sum_period;
+}
+
+/* from sum period to timestamp in ns */
+static inline u64 cc_timestamp(u64 p)
+{
+   return p << sysctl_concurrency_sum_period;
+}
+
+/*
+ * from nanoseconds to contrib periods, because
+ * ns so risky that can overflow cc->contrib
+ */
+static inline u64 cc_contrib_pds(u64 n)
+{
+   return n >> cc_contrib_period;
+}
+
+/*
+ * cc_decay_factor only works for 32bit integer,
+ * cc_decay_factor_x, x indicates the number of periods
+ * as half-life (sysctl_concurrency_decay_rate)
+ */
+static const u32 cc_decay_factor_1[] = {
+   0xFFFFFFFF,
+};
+
+static const u32 cc_decay_factor_2[] = {
+   0xFFFFFFFF, 0xB504F333,
+};
+
+static const u32 cc_decay_factor_4[] = {
+   0xFFFFFFFF, 0xD744FCCA, 0xB504F333, 0x9837F051,
+};
+
+static const u32 cc_decay_factor_8[] = {
+   0xFFFFFFFF, 0xEAC0C6E7, 0xD744FCCA, 0xC5672A11,
+   0xB504F333, 0xA5FED6A9, 0x9837F051, 0x8B95C1E3,
+};
+
+/* by default sysctl_concurrency_decay_rate */
+static const u32 *cc_decay_factor =
+   cc_decay_factor_1;
+
+/*
+ * cc_decayed_sum depends on cc_resolution (fixed 10),
+ * cc_decayed_sum_x, x indicates the number of periods
+ * as half-life (sysctl_concurrency_decay_rate)
+ */
+static const u32 cc_decayed_sum_1[] = {
+   0, 512, 768, 896, 960, 992,
+   1008, 1016, 1020, 1022, 1023,
+};
+
+static const u32 cc_decayed_sum_2[] = {
+   0, 724, 1235, 1597, 1853, 2034, 2162, 2252,
+   2316, 2361, 2393, 2416, 2432, 2443, 2451,
+   2457, 2461, 2464, 2466, 2467, 2468, 2469,
+};
+
+static const u32 cc_decayed_sum_4[] = {
+   0, 861, 1585, 2193, 2705, 3135, 3497, 3801, 4057,
+   4272, 4453, 4605, 4733, 4840, 4930, 5006, 5070,
+   5124, 5169, 5207, 5239, 5266, 5289, 5308, 5324,
+   5337, 5348, 5358, 5366, 5373, 5379, 5384, 5388,
+   5391, 5394, 5396, 5398, 5400, 5401, 5402, 5403,
+   5404, 5405, 5406,
+};
+
+static const u32 cc_decayed_sum_8[] = {
+   0, 939, 1800, 2589, 3313, 3977, 4585, 5143,
+   5655, 6124, 6554,

[RFC PATCH 00/12 v2] A new CPU load metric for power-efficient scheduler: CPU ConCurrency

2014-05-11 Thread Yuyang Du

Hi Ingo, PeterZ, Rafael, and others,

The current scheduler’s load balancing is completely work-conserving. In some
workloads, with generally low CPU utilization but interspersed with CPU bursts
of transient tasks, migrating tasks to engage all available CPUs for
work-conserving can lead to significant overhead: cache locality loss,
idle/active HW state transition latency and power, shallower idle states,
etc. These are both power and performance inefficient, especially for today’s
low-power processors in mobile devices.

This RFC introduces a sense of idleness-conserving into work-conserving (by
all means, we really don’t want to be overwhelming in only one way). But to
what extent the idleness-conserving should be, bearing in mind that we don’t
want to sacrifice performance? We first need a load/idleness indicator to that
end.

Thanks to CFS’s “model an ideal, precise multi-tasking CPU”, tasks can be seen
as concurrently running (the tasks in the runqueue). So it is natural to use
task concurrency as load indicator. Having said that, we do two things:

1) Divide continuous time into periods of time, and average task concurrency
in period, for tolerating the transient bursts:
a = sum(concurrency * time) / period
2) Exponentially decay past periods, and synthesize them all, for hysteresis
to load drops or resilience to load rises (let f be decaying factor, and a_x
the xth period average since period 0):
s = a_n + f^1 * a_n-1 + f^2 * a_n-2 +, ..., + f^(n-1) * a_1 + f^n * a_0

We name this load indicator as CPU ConCurrency (CC): task concurrency
determines how many CPUs are needed to be running concurrently.
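
The two steps above can be sketched in user-space C roughly as follows. This
is only an illustration under stated assumptions: the names cc_state,
cc_account and cc_close_period are hypothetical, and floating point is used for
readability, whereas the patches work on fixed-point u64 fields.

```c
/* Hypothetical per-CPU state for the sketch (not the patch's struct). */
struct cc_state {
    double contrib;   /* sum(concurrency * time) within the current period */
    double sum;       /* decayed sum s over all closed periods */
};

/* Step 1: account nr_running runnable tasks lasting for delta time units. */
static void cc_account(struct cc_state *cc, unsigned int nr_running, double delta)
{
    cc->contrib += nr_running * delta;
}

/*
 * Step 2: close a period of length `period` and fold its average into s:
 *   s_n = a_n + f * s_(n-1)
 * which expands to a_n + f*a_(n-1) + ... + f^n*a_0.
 */
static void cc_close_period(struct cc_state *cc, double period, double f)
{
    double a = cc->contrib / period;   /* period average a */

    cc->sum = a + f * cc->sum;
    cc->contrib = 0.0;
}
```

With f = 0.5, a period in which two tasks ran for half of the period gives
a = 1, and an idle period afterwards halves s.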

Two other ways to interpret CC:

1) the current work-conserving load balance also uses CC, but instantaneous
CC.

2) CC vs. CPU utilization. CC is runqueue-length-weighted CPU utilization. If
we change "a = sum(concurrency * time) / period" to "a' = sum(1 * time) /
period", then a' is just the CPU utilization. And the way we weight
runqueue-length is the simplest one (excluding the exponential decays; you
may have other ways).

To track CC, we intercept the scheduler in 1) enqueue, 2) dequeue, 3)
scheduler tick, and 4) enter/exit idle.

After CC, in the consolidation part, we do 1) attach the CPU topology to be
adaptive beyond our experimental platforms, and 2) intercept the current load
balance for load and load balancing containment.

Currently, CC is per CPU. To consolidate, the formula is based on a heuristic.
Suppose we have 2 CPUs, their task concurrency over time is ('-' means no
task, 'x' having tasks):

1)
CPU0: ----- (CC[0])
CPU1: - (CC[1])

2)
CPU0: ----- (CC[0])
CPU1: ----- (CC[1])

If we consolidate CPU0 and CPU1, the consolidated CC will be: CC' = CC[0] +
CC[1] for case 1 and CC'' = (CC[0] + CC[1]) * 2 for case 2. For the cases in
between case 1 and 2 in terms of how xxx overlaps, the CC should be between
CC' and CC''. So, we uniformly use this condition for consolidation (suppose
we consolidate m CPUs to n CPUs, m > n):

(CC[0] + CC[1] + ... + CC[m-2] + CC[m-1]) * (n + log(m-n)) >=

[ 013/143] sctp: Use correct sideffect command in duplicate cookie handling

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

commit f2815633504b442ca0b0605c16bf3d88a3a0fcea upstream

When SCTP is done processing a duplicate cookie chunk, it tries
to delete a newly created association.  For that, it has to set
the right association for the side-effect processing to work.
However, when it uses the SCTP_CMD_NEW_ASOC command, that performs
more work than really needed (like hashing the association and
assigning it an id) and there is no point to do that only to
delete the association as a next step.  In fact, it also creates
an impossible condition where an association may be found by
the getsockopt() call, and that association is empty.  This
causes a crash in some sctp getsockopts.

The solution is rather simple.  We simply use SCTP_CMD_SET_ASOC
command that doesn't have all the overhead and does exactly
what we need.

Reported-by: Karl Heiss 
Tested-by: Karl Heiss 
CC: Neil Horman 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/sctp/sm_statefuns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 9e4e846..486df56 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2045,7 +2045,7 @@ sctp_disposition_t sctp_sf_do_5_2_4_dupcook(const struct sctp_endpoint *ep,
}
 
/* Delete the tempory new association. */
-   sctp_add_cmd_sf(commands, SCTP_CMD_NEW_ASOC, SCTP_ASOC(new_asoc));
+   sctp_add_cmd_sf(commands, SCTP_CMD_SET_ASOC, SCTP_ASOC(new_asoc));
sctp_add_cmd_sf(commands, SCTP_CMD_DELETE_TCB, SCTP_NULL());
 
/* Restore association pointer to provide SCTP command interpeter
-- 
1.7.12.2.21.g234cd45.dirty




[ 036/143] sctp: Use software crc32 checksum when xfrm transform will happen.

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Fan Du 

[ Upstream commit 27127a82561a2a3ed955ce207048e1b066a80a2a ]

igb/ixgbe have hardware SCTP checksum support. When this feature is enabled
and IPsec is also armed to protect SCTP traffic, things go wrong because
xfrm_output checks CHECKSUM_PARTIAL to do the checksum operation (sum everything
up and pack the 16-bit result in the checksum field). As a result, SCTP
communication fails to establish.

Cc: Neil Horman 
Cc: Steffen Klassert 
Signed-off-by: Fan Du 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index d494100..8d4eacf 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -506,7 +506,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 */
if (!sctp_checksum_disable &&
-   !(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM))) {
+   (!(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM)) ||
+(dst_xfrm(dst) != NULL))) {
__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
/* 3) Put the resultant value into the checksum field in the
-- 
1.7.12.2.21.g234cd45.dirty




Re: [lm-sensors] [PATCH] drivers/hwmon/emc1403.c: add support for emc1412

2014-05-11 Thread Guenter Roeck


On 05/11/2014 03:40 PM, Guenter Roeck wrote:
[ ... ]



  id = i2c_smbus_read_byte_data(client, THERMAL_REVISION_REG);
-if (id != 0x01)
+if (id != 0x01 && id != 0x04) {
  return -ENODEV;


This should be a separate patch, as it applies to emc1403/emc1404 as well,
so we can backport it into -stable.



Also, the chip datasheet suggests that chip revision 3 exists as well.
Given that, I would suggest to replace the revision number check with
something like
if (id < 0x01 || id > 0x04)
return -ENODEV;

Guenter


[ 031/143] can: dev: fix nlmsg size calculation in can_get_size()

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Marc Kleine-Budde 

[ Upstream commit fe119a05f8ca481623a8d02efcc984332e612528 ]

This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().
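
To see why the bare sizeof() undercounts, here is a small user-space
restatement of the attribute size arithmetic. The SK_/sk_ names are sketch
names chosen for this example, not kernel symbols; the arithmetic assumes the
usual 4-byte attribute header and 4-byte alignment.

```c
#include <stddef.h>

/* Sketch-only equivalents of the netlink attribute size helpers. */
#define SK_NLA_ALIGNTO    4
#define SK_NLA_ALIGN(len) (((len) + SK_NLA_ALIGNTO - 1) & ~(size_t)(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN     SK_NLA_ALIGN(4)   /* header: two 16-bit fields */

/* Space one attribute occupies in the message: header + payload, padded. */
static size_t sk_nla_total_size(size_t payload)
{
    return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}
```

A 4-byte payload therefore needs 8 bytes in the message and a 5-byte payload
needs 12, so summing raw structure sizes leaves the buffer short by at least
one header per attribute.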

Signed-off-by: Marc Kleine-Budde 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/net/can/dev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 2868fe8..ea2749f9 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -595,12 +595,12 @@ static size_t can_get_size(const struct net_device *dev)
size_t size;
 
size = nla_total_size(sizeof(u32));   /* IFLA_CAN_STATE */
-   size += sizeof(struct can_ctrlmode);  /* IFLA_CAN_CTRLMODE */
+   size += nla_total_size(sizeof(struct can_ctrlmode));  /* IFLA_CAN_CTRLMODE */
size += nla_total_size(sizeof(u32));  /* IFLA_CAN_RESTART_MS */
-   size += sizeof(struct can_bittiming); /* IFLA_CAN_BITTIMING */
-   size += sizeof(struct can_clock); /* IFLA_CAN_CLOCK */
+   size += nla_total_size(sizeof(struct can_bittiming)); /* IFLA_CAN_BITTIMING */
+   size += nla_total_size(sizeof(struct can_clock)); /* IFLA_CAN_CLOCK */
if (priv->bittiming_const)/* IFLA_CAN_BITTIMING_CONST */
-   size += sizeof(struct can_bittiming_const);
+   size += nla_total_size(sizeof(struct can_bittiming_const));
 
return size;
 }
-- 
1.7.12.2.21.g234cd45.dirty




[ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Michal Tesar 

[ Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da ]

Limit the min/max value passed to the
/proc/sys/net/ipv4/tcp_syn_retries.
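
As a rough sketch of what a min/max-checked store does, assuming a
hypothetical helper sk_store_minmax (how a rejected write is reported to user
space depends on the kernel version, so the -EINVAL return here is an
assumption of the sketch):

```c
#include <errno.h>

#define SK_MAX_TCP_SYNCNT 127   /* mirrors MAX_TCP_SYNCNT in include/net/tcp.h */

/* Hypothetical stand-in for the range check a minmax handler performs. */
static int sk_store_minmax(int *dst, int val, int min, int max)
{
    if (val < min || val > max)
        return -EINVAL;   /* out of range: leave *dst untouched */
    *dst = val;
    return 0;
}
```

With min = 1 and max = SK_MAX_TCP_SYNCNT, writing 0 or 128 is rejected and the
previous value stays in place.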

Signed-off-by: Michal Tesar 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/ipv4/sysctl_net_ipv4.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 2dcf04d..910fa54 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -23,6 +23,8 @@
 
 static int zero;
 static int tcp_retr1_max = 255;
+static int tcp_syn_retries_min = 1;
+static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_local_port_range_min[] = { 1, 1 };
 static int ip_local_port_range_max[] = { 65535, 65535 };
 
@@ -237,7 +239,9 @@ static struct ctl_table ipv4_table[] = {
.data   = &ipv4_config.no_pmtu_disc,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &tcp_syn_retries_min,
+   .extra2 = &tcp_syn_retries_max
},
{
.ctl_name   = NET_IPV4_NONLOCAL_BIND,
-- 
1.7.12.2.21.g234cd45.dirty




[ 038/143] wanxl: fix info leak in ioctl

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Salva Peiró 

[ Upstream commit 2b13d06c9584b4eb773f1e80bbaedab9a1c344e1 ]

The wanxl_ioctl() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.
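
The pattern can be illustrated in user space with a look-alike structure.
struct sk_settings and sk_fill_settings are hypothetical names for this
sketch; the two bytes of tail padding are an assumption that holds on common
ABIs with 4-byte int and 2-byte short.

```c
#include <stddef.h>
#include <string.h>

/* Look-alike of sync_serial_settings, for illustration only. */
struct sk_settings {
    unsigned int clock_rate;
    unsigned int clock_type;
    unsigned short loopback;   /* typically followed by 2 padding bytes */
};

/* Fill the structure so that every byte, padding included, is defined. */
static void sk_fill_settings(struct sk_settings *s, unsigned int clock_type)
{
    memset(s, 0, sizeof(*s));   /* clears padding that member stores never touch */
    s->clock_type = clock_type;
    s->clock_rate = 0;
    s->loopback = 0;
}
```

Without the memset(), the padding would keep whatever was on the stack and be
copied out to user space along with the members.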

Signed-off-by: Salva Peiró 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/net/wan/wanxl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wan/wanxl.c b/drivers/net/wan/wanxl.c
index daee8a0..b52b378 100644
--- a/drivers/net/wan/wanxl.c
+++ b/drivers/net/wan/wanxl.c
@@ -354,6 +354,7 @@ static int wanxl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
ifr->ifr_settings.size = size; /* data size wanted */
return -ENOBUFS;
}
+   memset(&line, 0, sizeof(line));
line.clock_type = get_status(port)->clocking;
line.clock_rate = 0;
line.loopback = 0;
-- 
1.7.12.2.21.g234cd45.dirty




[ 040/143] resubmit bridge: fix message_age_timer calculation

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Chris Healy 

[ Upstream commit 9a0620133ccce9dd35c00a96405c8d80938c2cc0 ]

This changes the message_age_timer calculation to use the BPDU's max age as
opposed to the local bridge's max age.  This is in accordance with section
8.6.2.3.2 Step 2 of the 802.1D-1998 specification.

With the current implementation, when running with very large bridge
diameters, convergence will not always occur even if a root bridge is
configured to have a longer max age.

Tested successfully on bridge diameters of ~200.

Signed-off-by: Chris Healy 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/bridge/br_stp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index c7d6bfc..a67e6ce 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -192,7 +192,7 @@ static inline void br_record_config_information(struct net_bridge_port *p,
p->designated_age = jiffies + bpdu->message_age;
 
mod_timer(&p->message_age_timer, jiffies
- + (p->br->max_age - bpdu->message_age));
+ + (bpdu->max_age - bpdu->message_age));
 }
 
 /* called under bridge lock */
-- 
1.7.12.2.21.g234cd45.dirty




[ 047/143] ipv6: fix possible crashes in ip6_cork_release()

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 284041ef21fdf2e0d216ab6b787bc9072b4eb58a ]

commit 0178b695fd6b4 ("ipv6: Copy cork options in ip6_append_data")
added some code duplication and bad error recovery, leading to potential
crash in ip6_cork_release() as kfree() could be called with garbage.

Use kzalloc() to make sure this won't happen.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Cc: Herbert Xu 
Cc: Hideaki YOSHIFUJI 
Cc: Neal Cardwell 
Signed-off-by: Willy Tarreau 
---
 net/ipv6/ip6_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index bba91a1..bb63ffc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1174,7 +1174,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
if (WARN_ON(np->cork.opt))
return -EINVAL;
 
-   np->cork.opt = kmalloc(opt->tot_len, sk->sk_allocation);
+   np->cork.opt = kzalloc(opt->tot_len, sk->sk_allocation);
if (unlikely(np->cork.opt == NULL))
return -ENOBUFS;
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 089/143] ipv6: fix possible seqlock deadlock in ip6_finish_output2

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Hannes Frederic Sowa 

[ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ]

IPv6 stats are 64 bits and thus are protected with a seqlock. If we don't
disable bottom halves here, we could deadlock when a softirq reentrantly
updates the same mib.

Cc: Eric Dumazet 
Signed-off-by: Hannes Frederic Sowa 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/ipv6/ip6_output.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index bb63ffc..6ff4d07 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -91,8 +91,8 @@ static int ip6_output_finish(struct sk_buff *skb)
else if (dst->neighbour)
return dst->neighbour->output(skb);
 
-   IP6_INC_STATS_BH(dev_net(dst->dev),
-ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+   IP6_INC_STATS(dev_net(dst->dev),
+ ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
kfree_skb(skb);
return -EINVAL;
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 002/143] Fix lockup related to stop_machine being stuck in __do_softirq.

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Ben Greear 

The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.

To work around this, re-add the max_restart counter in __do_softirq and stop
processing softirqs after 10 restarts.
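
The effect of the extra counter can be shown with a small user-space
simulation. sk_run_softirq_loop is a hypothetical function for this sketch:
`now` stands in for jiffies and never advances (the lockup case), work is
assumed to be always pending, and the need_resched() check is left out.

```c
#define SK_MAX_RESTART 10

/* Returns how many passes over the pending work are made before giving up. */
static int sk_run_softirq_loop(unsigned long now, unsigned long end)
{
    int max_restart = SK_MAX_RESTART;
    int passes = 0;

    for (;;) {
        passes++;   /* one pass over the pending work */
        /* restart only while time remains AND the counter has not run out */
        if (now < end && --max_restart)
            continue;
        break;
    }
    return passes;
}
```

With a frozen clock the time check alone would loop forever; the counter ends
the loop after ten passes, while an already expired deadline still ends it
after the first pass.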

Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.

This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
latencies").

It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.

The hang stack traces look something like this:

[ cut here ]
WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Watchdog detected hard LOCKUP on cpu 2
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 
auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen 
lockd sunrpc]
Pid: 23, comm: migration/2 Tainted: G C   3.9.4+ #11
Call Trace:
 <NMI>  warn_slowpath_common+0x85/0x9f
  warn_slowpath_fmt+0x46/0x48
  watchdog_overflow_callback+0x9c/0xa7
  __perf_event_overflow+0x137/0x1cb
  perf_event_overflow+0x14/0x16
  intel_pmu_handle_irq+0x2dc/0x359
  perf_event_nmi_handler+0x19/0x1b
  nmi_handle+0x7f/0xc2
  do_nmi+0xbc/0x304
  end_repeat_nmi+0x1e/0x2e
 <<EOE>>
  cpu_stopper_thread+0xae/0x162
  smpboot_thread_fn+0x258/0x260
  kthread+0xc7/0xcf
  ret_from_fork+0x7c/0xb0
---[ end trace 4947dfa9b0a4cec3 ]---
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 
auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen 
lockd sunrpc]
irq event stamp: 835637905
hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: GWC   3.9.4+ #11 To be filled 
by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: tasklet_hi_action+0xf0/0xf0
Process migration/1
Call Trace:
 <IRQ>
  __do_softirq+0x117/0x257
  irq_exit+0x5f/0xbb
  smp_apic_timer_interrupt+0x8a/0x98
  apic_timer_interrupt+0x72/0x80
 <EOI>
  printk+0x4d/0x4f
  stop_machine_cpu_stop+0x22c/0x274
  cpu_stopper_thread+0xae/0x162
  smpboot_thread_fn+0x258/0x260
  kthread+0xc7/0xcf
  ret_from_fork+0x7c/0xb0

Signed-off-by: Ben Greear 
Acked-by: Tejun Heo 
Acked-by: Pekka Riikonen 
Cc: Eric Dumazet 
Cc: sta...@kernel.org
Cc: Ben Hutchings 
Signed-off-by: Linus Torvalds 
(cherry picked from commit 34376a50fb1fa095b9d0636fa41ed2e73125f214)
Signed-off-by: Willy Tarreau 
---
 kernel/softirq.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d75c136..e4d5d8c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -194,8 +194,12 @@ void local_bh_enable_ip(unsigned long ip)
 EXPORT_SYMBOL(local_bh_enable_ip);
 
 /*
- * We restart softirq processing for at most 2 ms,
- * and if need_resched() is not set.
+ * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
+ * but break the loop if need_resched() is set or after 2 ms.
+ * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
+ * certain cases, such as stop_machine(), jiffies may cease to
+ * increment and so we need the MAX_SOFTIRQ_RESTART limit as
+ * well to make sure we eventually return from this method.
  *
  * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
@@ -203,6 +207,7 @@ EXPORT_SYMBOL(local_bh_enable_ip);
  * should not be able to lock up the box.
  */
 #define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_RESTART 10
 
 asmlinkage void __do_softirq(void)
 {
@@ -210,6 +215,7 @@ asmlinkage void __do_softirq(void)
__u32 pending;
unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
int cpu;
+   int max_restart = MAX_SOFTIRQ_RESTART;
 
pending = local_softirq_pending();
account_system_vtime(current);
@@ -254,7 +260,8 @@ restart:
 
pending = local_softirq_pending();
if (pending) {
-   if (time_before(jiffies, end) && !need_resched())
+   if (time_before(jiffies, end) && !need_resched() &&
+   --max_restart)
goto restart;
 
wakeup_softirqd();
-- 
1.7.12.2.21.g234cd45.dirty




[ 113/143] aacraid: prevent invalid pointer dereference

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mahesh Rajashekhara 

It appears that driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.

It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data.  So, we would recommend to add the following
(instead of test for fibsize == 0).
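
The shape of that check can be sketched in user space as follows. The
structures and sk_fibsize_ok are hypothetical stand-ins for this example; the
real layouts live in the aacraid driver and differ from these.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures. */
struct sk_sgentry { unsigned int addr; unsigned int count; };
struct sk_srb {
    unsigned int function;
    unsigned int sg_count;
    struct sk_sgentry sg[1];   /* one entry is already part of the struct */
};

/*
 * Returns 1 when fibsize covers the header up to (but not including) the
 * first sg entry and does not exceed the caller-supplied maximum.
 */
static int sk_fibsize_ok(size_t fibsize, size_t max_payload)
{
    size_t min = sizeof(struct sk_srb) - sizeof(struct sk_sgentry);

    return fibsize >= min && fibsize <= max_payload;
}
```

A command without data is accepted with exactly the header size, while a size
of zero or one above the maximum is rejected.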

Signed-off-by: Mahesh Rajashekhara 
Reported-by: Nico Golde 
Reported-by: Fabian Yamaguchi 
Signed-off-by: Linus Torvalds 
(cherry picked from commit b4789b8e6be3151a955ade74872822f30e8cd914)

CVE-2013-6380
Signed-off-by: Willy Tarreau 
---
 drivers/scsi/aacraid/commctrl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index a5b8e7b..c895174 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -507,7 +507,8 @@ static int aac_send_raw_srb(struct aac_dev* dev, void 
__user * arg)
goto cleanup;
}
 
-   if (fibsize > (dev->max_fib_size - sizeof(struct aac_fibhdr))) {
+   if ((fibsize < (sizeof(struct user_aac_srb) - sizeof(struct user_sgentry))) ||
+   (fibsize > (dev->max_fib_size - sizeof(struct aac_fibhdr)))) {
rcode = -EINVAL;
goto cleanup;
}
-- 
1.7.12.2.21.g234cd45.dirty




[ 100/143] tg3: Fix deadlock in tg3_change_mtu()

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Nithin Sujir 

[ Upstream commit c6993dfd7db9b0c6b7ca7503a56fda9236a4710f ]

Quoting David Vrabel -
"5780 cards cannot have jumbo frames and TSO enabled together.  When
jumbo frames are enabled by setting the MTU, the TSO feature must be
cleared.  This is done indirectly by calling netdev_update_features()
which will call tg3_fix_features() to actually clear the flags.

netdev_update_features() will also trigger a new netlink message for the
feature change event which will result in a call to tg3_get_stats64()
which deadlocks on the tg3 lock."

tg3_set_mtu() does not need to be under the tg3 lock since the flags
were converted to use set_bit(). Move it out to after tg3_netif_stop().

Reported-by: David Vrabel 
Tested-by: David Vrabel 
Signed-off-by: Michael Chan 
Signed-off-by: Nithin Nayak Sujir 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/net/tg3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 89aa69c..56648b4 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -5583,12 +5583,12 @@ static int tg3_change_mtu(struct net_device *dev, int 
new_mtu)
 
tg3_netif_stop(tp);
 
+   tg3_set_mtu(dev, tp, new_mtu);
+
tg3_full_lock(tp, 1);
 
tg3_halt(tp, RESET_KIND_SHUTDOWN, 1);
 
-   tg3_set_mtu(dev, tp, new_mtu);
-
err = tg3_restart_hw(tp, 0);
 
if (!err)
-- 
1.7.12.2.21.g234cd45.dirty




[ 065/143] net: check net.core.somaxconn sysctl values

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Roman Gushchin 

[ Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509 ]

It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there are no checks at all.

The sk_max_ack_backlog field of the sock structure is defined as
unsigned short. Therefore, the backlog argument in inet_listen()
shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
is truncated to the somaxconn value. So, the somaxconn value shouldn't
exceed 65535 (USHRT_MAX).
Also, negative values of somaxconn are meaningless.
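
As a rough userspace sketch of the reasoning above (the helper names are
hypothetical and this is not the kernel's sysctl code), the range check
and the truncation it guards against look like this:

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: accept only values that fit an unsigned short. */
static int somaxconn_check(long value)
{
    if (value < 0 || value > USHRT_MAX)
        return -EINVAL;
    return 0;
}

/*
 * Hypothetical helper: what an unsigned short field would hold if an
 * unchecked value were stored into it (conversion is modulo USHRT_MAX + 1).
 */
static unsigned short backlog_as_stored(int value)
{
    return (unsigned short)value;
}
```

With a 16-bit unsigned short, storing USHRT_MAX + 1 wraps to 0, which is
the kind of silent truncation the min/max handler is meant to prevent.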

before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100

after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"

Based on a prior patch from Changli Gao.

Signed-off-by: Roman Gushchin 
Reported-by: Changli Gao 
Suggested-by: Eric Dumazet 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/core/sysctl_net_core.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 7db1de0..e2eaf29 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -14,6 +14,9 @@
 #include 
 #include 
 
+static int zero = 0;
+static int ushort_max = 65535;
+
 static struct ctl_table net_core_table[] = {
 #ifdef CONFIG_NET
{
@@ -116,7 +119,9 @@ static struct ctl_table netns_core_table[] = {
.data   = &init_net.core.sysctl_somaxconn,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec
+   .extra1 = &zero,
+   .extra2 = &ushort_max,
+   .proc_handler   = proc_dointvec_minmax
},
{ .ctl_name = 0 }
 };
-- 
1.7.12.2.21.g234cd45.dirty




[ 037/143] sctp: Perform software checksum if packet has to be fragmented.

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Vlad Yasevich 

[ Upstream commit d2dbbba77e95dff4b4f901fee236fef6d9552072 ]

IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum.
This causes problems if SCTP packets have to be fragmented and
ip_summed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discovery,
or when INIT or other control chunks are larger than MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.
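
The decision described above can be sketched as a small predicate. The
struct and field names below are hypothetical simplifications of the
inputs tested in the patch, not kernel definitions:

```c
#include <stdbool.h>

/* Hypothetical flags mirroring the inputs of the checksum decision. */
struct csum_inputs {
    bool checksum_disabled;   /* checksumming switched off entirely */
    bool dev_offloads_csum;   /* device can compute the checksum itself */
    bool has_xfrm;            /* packet passes through a transformation */
    bool may_fragment;        /* packet may be fragmented by IP */
};

/* Return true when the checksum has to be computed in software. */
static bool need_software_csum(const struct csum_inputs *in)
{
    if (in->checksum_disabled)
        return false;
    return !in->dev_offloads_csum || in->has_xfrm || in->may_fragment;
}
```

The only new input is may_fragment: even with an offload-capable device,
a packet that may be fragmented falls back to the software path.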

CC: Fan Du 
Signed-off-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/sctp/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 8d4eacf..54bc011 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -507,7 +507,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 */
if (!sctp_checksum_disable &&
(!(dst->dev->features & (NETIF_F_NO_CSUM | NETIF_F_SCTP_CSUM)) ||
-(dst_xfrm(dst) != NULL))) {
+(dst_xfrm(dst) != NULL) || packet->ipfragok)) {
__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
/* 3) Put the resultant value into the checksum field in the
-- 
1.7.12.2.21.g234cd45.dirty




[ 106/143] net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Daniel Borkmann 

[ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ]

While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's a skb
memory leakage in the error path.

Running the same reproducer as in ec0223ec48a9 and by unconditionally
jumping to the error label (to simulate an error condition) in
sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
the unfreed chunk->auth_chunk skb clone:

Unreferenced object 0x8800b8f3a000 (size 256):
  comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00  ..u^..X.
  backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] kmem_cache_alloc+0xc8/0x210
[] skb_clone+0x49/0xb0
[] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
[] sctp_inq_push+0x4c/0x70 [sctp]
[] sctp_rcv+0x82e/0x9a0 [sctp]
[] ip_local_deliver_finish+0xa8/0x210
[] nf_reinject+0xbf/0x180
[] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
[] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
[] netlink_rcv_skb+0xa9/0xc0
[] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
[] netlink_unicast+0x168/0x250
[] netlink_sendmsg+0x2e1/0x3f0
[] sock_sendmsg+0x8b/0xc0
[] ___sys_sendmsg+0x369/0x380

What happens is that commit bbd0d59809f9 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
that an endpoint requires COOKIE-ECHO chunks to be authenticated:

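  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  -------------------- AUTH; COOKIE-ECHO -------------->
  <-------------------- COOKIE-ACK ---------------------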

When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around w/o
freeing it.

The fix is to centrally free such clones in sctp_chunk_destroy()
handler that is invoked from sctp_chunk_free() after all refs have
dropped; and also move both kfree_skb(chunk->auth_chunk) there,
so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocs new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handled.

While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named as we are not
a device but a protocol. This also effectively replaces both
kfree_skb() invocations with consume_skb(). The two functions are
the same, except that kfree_skb() assumes the frame is being
dropped after a failure (e.g. for tools like drop monitor); using
consume_skb() seems more appropriate in function
sctp_chunk_destroy() though.
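
The idea of freeing both buffers centrally in the destructor can be
sketched in plain C. Everything here (struct chunk_sketch, chunk_put(),
the release counter) is a hypothetical userspace analogue, not the SCTP
code:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a chunk owning up to two buffers. */
struct chunk_sketch {
    void *skb;          /* always set by chunk_new() */
    void *auth_chunk;   /* NULL unless a clone was stored */
    int refcnt;
};

static int buffers_released;    /* counts releases, for illustration only */

static void buffer_release(void *buf)
{
    if (!buf)
        return;                 /* releasing NULL is a no-op */
    free(buf);
    buffers_released++;
}

static struct chunk_sketch *chunk_new(void)
{
    /* calloc() zeroes the struct, so auth_chunk starts out NULL */
    struct chunk_sketch *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->skb = malloc(16);
    c->refcnt = 1;
    return c;
}

/* Single place where both buffers are released, whatever path led here. */
static void chunk_destroy(struct chunk_sketch *c)
{
    buffer_release(c->skb);
    buffer_release(c->auth_chunk);
    free(c);
}

static void chunk_put(struct chunk_sketch *c)
{
    if (--c->refcnt == 0)
        chunk_destroy(c);
}
```

Because the destructor runs on every path once the last reference is
dropped, an early error exit can no longer skip the release of the clone.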

Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH 
chunk")
Signed-off-by: Daniel Borkmann 
Cc: Vlad Yasevich 
Cc: Neil Horman 
Acked-by: Vlad Yasevich 
Acked-by: Neil Horman 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/sctp/sm_make_chunk.c | 4 ++--
 net/sctp/sm_statefuns.c  | 4 
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index feedee7..22d4ed8 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1356,8 +1356,8 @@ static void sctp_chunk_destroy(struct sctp_chunk *chunk)
BUG_ON(!list_empty(&chunk->list));
list_del_init(&chunk->transmitted_list);
 
-   /* Free the chunk skb data and the SCTP_chunk stub itself. */
-   dev_kfree_skb(chunk->skb);
+   consume_skb(chunk->skb);
+   consume_skb(chunk->auth_chunk);
 
SCTP_DBG_OBJCNT_DEC(chunk);
kmem_cache_free(sctp_chunk_cachep, chunk);
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d43002b..6da0171 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -762,10 +762,6 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const struct 
sctp_endpoint *ep,
auth.transport = chunk->transport;
 
ret = sctp_sf_authenticate(ep, new_asoc, type, &auth);
-
-   /* We can now safely free the auth_chunk clone */
-   kfree_skb(chunk->auth_chunk);
-
if (ret != SCTP_IERROR_NO_ERROR) {
sctp_association_free(new_asoc);
return sctp_sf_pdiscard(ep, asoc, type, arg, commands);
-- 
1.7.12.2.21.g234cd45.dirty




[ 091/143] net: drop_monitor: fix the value of maxattr

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Changli Gao 

[ Upstream commit d323e92cc3f4edd943610557c9ea1bb4bb5056e8 ]

maxattr in genl_family should be used to save the max attribute
type, but not the max command type. Drop monitor doesn't support
any attributes, so we should leave it as zero.

Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/core/drop_monitor.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index 0a113f2..e65fa2f 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -63,7 +63,6 @@ static struct genl_family net_drop_monitor_family = {
.hdrsize= 0,
.name   = "NET_DM",
.version= 2,
-   .maxattr= NET_DM_CMD_MAX,
 };
 
 static DEFINE_PER_CPU(struct per_cpu_dm_data, dm_cpu_data);
-- 
1.7.12.2.21.g234cd45.dirty




[ 028/143] net: do not call sock_put() on TIMEWAIT sockets

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ]

commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/ipv4/inet_hashtables.c  | 2 +-
 net/ipv6/inet6_hashtables.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index d717267..03fd04a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -247,7 +247,7 @@ begintw:
}
if (unlikely(!INET_TW_MATCH(sk, net, hash, acookie,
 saddr, daddr, ports, dif))) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 093e9b2..93765577 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -104,7 +104,7 @@ begintw:
goto out;
}
if (!INET6_TW_MATCH(sk, net, hash, saddr, daddr, ports, 
dif)) {
-   sock_put(sk);
+   inet_twsk_put(inet_twsk(sk));
goto begintw;
}
goto out;
-- 
1.7.12.2.21.g234cd45.dirty




[ 019/143] HID: validate HID report id size

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 43622021d2e2b82ea03d883926605bdd0525e1d1 upstream

The "Report ID" field of a HID report is used to build indexes of
reports. The kernel's index of these is limited to 256 entries, so any
malicious device that sets a Report ID greater than 255 will trigger
memory corruption on the host:

[ 1347.156239] BUG: unable to handle kernel paging request at 88094958a878
[ 1347.156261] IP: [] hid_register_report+0x2a/0x8b
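
A minimal sketch of the bounds check, assuming a fixed table indexed by
Report ID. The type and function names are hypothetical; only the limit
of 256 entries comes from the description above:

```c
#include <stddef.h>

#define MAX_IDS_SKETCH 256      /* size of the index, as described above */

struct report_sketch {
    unsigned int id;
};

struct report_table_sketch {
    struct report_sketch *by_id[MAX_IDS_SKETCH];
};

/* Look up a report, refusing any ID that would index past the table. */
static struct report_sketch *lookup_report(struct report_table_sketch *t,
                                           unsigned int id)
{
    if (id >= MAX_IDS_SKETCH)
        return NULL;
    return t->by_id[id];
}

/* Parser-side check: ID 0 is invalid, and so is anything out of range. */
static int report_id_valid(unsigned int id)
{
    return id != 0 && id < MAX_IDS_SKETCH;
}
```

Naming the limit once and using it for the array size, the lookup and
the parser check keeps the three places from drifting apart.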

CVE-2013-2888

Signed-off-by: Kees Cook 
Cc: sta...@kernel.org
Signed-off-by: Jiri Kosina 
[jmm: backport to 2.6.32]
Signed-off-by: Willy Tarreau 
---
 drivers/hid/hid-core.c | 10 +++---
 include/linux/hid.h|  4 +++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 11f8069..e40e3c4 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -58,6 +58,8 @@ static struct hid_report *hid_register_report(struct 
hid_device *device, unsigne
struct hid_report_enum *report_enum = device->report_enum + type;
struct hid_report *report;
 
+   if (id >= HID_MAX_IDS)
+   return NULL;
if (report_enum->report_id_hash[id])
return report_enum->report_id_hash[id];
 
@@ -368,8 +370,10 @@ static int hid_parser_global(struct hid_parser *parser, 
struct hid_item *item)
 
case HID_GLOBAL_ITEM_TAG_REPORT_ID:
parser->global.report_id = item_udata(item);
-   if (parser->global.report_id == 0) {
-   dbg_hid("report_id 0 is invalid\n");
+   if (parser->global.report_id == 0 ||
+   parser->global.report_id >= HID_MAX_IDS) {
+   dbg_hid("report_id %u is invalid\n",
+   parser->global.report_id);
return -1;
}
return 0;
@@ -545,7 +549,7 @@ static void hid_device_release(struct device *dev)
for (i = 0; i < HID_REPORT_TYPES; i++) {
struct hid_report_enum *report_enum = device->report_enum + i;
 
-   for (j = 0; j < 256; j++) {
+   for (j = 0; j < HID_MAX_IDS; j++) {
struct hid_report *report = 
report_enum->report_id_hash[j];
if (report)
hid_free_report(report);
diff --git a/include/linux/hid.h b/include/linux/hid.h
index 8709365..481080d 100644
--- a/include/linux/hid.h
+++ b/include/linux/hid.h
@@ -410,10 +410,12 @@ struct hid_report {
struct hid_device *device;  /* associated device */
 };
 
+#define HID_MAX_IDS 256
+
 struct hid_report_enum {
unsigned numbered;
struct list_head report_list;
-   struct hid_report *report_id_hash[256];
+   struct hid_report *report_id_hash[HID_MAX_IDS];
 };
 
 #define HID_REPORT_TYPES 3
-- 
1.7.12.2.21.g234cd45.dirty




[ 139/143] SELinux: Fix kernel BUG on empty security contexts.

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Stephen Smalley 

commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream

Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
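
The early rejection can be sketched as follows. The function name is
hypothetical and the sketch only covers the length check, not any real
context parsing:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical entry check: a zero-length context is rejected before
 * any of its fields could be looked at.
 */
static int context_len_check(const char *scontext, size_t scontext_len)
{
    (void)scontext;             /* contents are not inspected in this sketch */
    if (scontext_len == 0)
        return -EINVAL;
    return 0;
}
```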

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] [ cut here ]
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode:  [#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G  D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: 8805f50cd010 ti: 8805f50cd488 task.ti:
8805f50cd488
[  474.183707] RIP: 0010:[]  []
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX:  RBX: 8805c0ac3d94 RCX:
0100
[  474.287018] RDX: 8805e8aac000 RSI:  RDI:
8805e8aaa000
[  474.321199] RBP: 8805c0ac3cb8 R08: 0010 R09:
0006
[  474.357446] R10:  R11: 8805c567a000 R12:
0006
[  474.419191] R13: 8805c2b74e88 R14: 01da R15:

[  474.453816] FS:  7f2e75220800() GS:88061fc0()
knlGS:
[  474.489254] CS:  0010 DS:  ES:  CR0: 80050033
[  474.522215] CR2: 7f2e74716090 CR3: 0005c085e000 CR4:
000207f0
[  474.556058] Stack:
[  474.584325]  8805c0ac3c98 811b549b 8805c0ac3c98
8805f1190a40
[  474.618913]  8805a6202f08 8805c2b74e88 00068800d0464990
8805e8aac860
[  474.653955]  8805c0ac3cb8 000700068113833a 880606c75060
8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [] security_compute_av+0xf4/0x20b
[  474.811398]  [] avc_compute_av+0x2a/0x179
[  474.843813]  [] avc_has_perm+0x45/0xf4
[  474.875694]  [] inode_has_perm+0x2a/0x31
[  474.907370]  [] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [] security_inode_getattr+0x1b/0x22
[  474.970036]  [] vfs_getattr+0x19/0x2d
[  475.000618]  [] vfs_fstatat+0x54/0x91
[  475.030402]  [] vfs_lstat+0x19/0x1b
[  475.061097]  [] SyS_newlstat+0x15/0x30
[  475.094595]  [] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  []
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP 
[  475.328734] ---[ end trace f076482e9d754adc ]---

Reported-by:  Matthew Thode 
Signed-off-by: Stephen Smalley 
Cc: sta...@vger.kernel.org
Signed-off-by: Paul Moore 
Signed-off-by: Willy Tarreau 
---
 security/selinux/ss/services.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index ff17820..dee7177 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1074,6 +1074,10 @@ static int security_context_to_sid_core(const char 
*scontext, u32 scontext_len,
struct context context;
int rc = 0;
 
+   /* An empty security context is never valid. */
+   if (!scontext_len)
+   return -EINVAL;
+
if (!ss_initialized) {
int i;
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 142/143] floppy: ignore kernel-only members in FDRAWCMD ioctl input

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Matthew Daley 

Always clear out these floppy_raw_cmd struct members after copying the
entire structure from userspace so that the in-kernel version is always
valid and never left in an indeterminate state.
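
The pattern can be sketched in userspace as below. The struct is a
hypothetical, reduced stand-in, and memcpy() merely plays the role of the
copy from userspace:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the raw command structure. */
struct raw_cmd_sketch {
    unsigned int flags;
    void *kernel_data;              /* kernel-only member */
    struct raw_cmd_sketch *next;    /* kernel-only member */
    long buffer_length;             /* kernel-only member */
    unsigned char cmd_count;
};

/*
 * Copy the whole structure from an untrusted source, then overwrite the
 * kernel-only members so they never carry caller-supplied values.
 */
static void raw_cmd_copyin_sketch(struct raw_cmd_sketch *dst,
                                  const struct raw_cmd_sketch *untrusted)
{
    memcpy(dst, untrusted, sizeof(*dst));
    dst->next = NULL;
    dst->buffer_length = 0;
    dst->kernel_data = NULL;
}
```

Clearing the members unconditionally, before any error is acted on, is
what keeps the structure valid even when the copy itself failed part way.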

Signed-off-by: Matthew Daley 
Signed-off-by: Linus Torvalds 
(cherry picked from commit ef87dbe7614341c2e7bfe8d32fcb7028cc97442c)
[wt: be careful in 2.6.32 we still have the ugly macros everywhere]
Signed-off-by: Willy Tarreau 
---
 drivers/block/floppy.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 5c01f74..19d45e6 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3209,9 +3209,12 @@ static inline int raw_cmd_copyin(int cmd, char __user 
*param,
if (!ptr)
return -ENOMEM;
*rcmd = ptr;
-   COPYIN(*ptr);
+   ret = copy_from_user(ptr, (void __user *)param, sizeof(*ptr));
ptr->next = NULL;
ptr->buffer_length = 0;
+   ptr->kernel_data = NULL;
+   if (ret)
+   return -EFAULT;
param += sizeof(struct floppy_raw_cmd);
if (ptr->cmd_count > 33)
/* the command may now also take up the space
-- 
1.7.12.2.21.g234cd45.dirty




[ 043/143] dm9601: fix IFF_ALLMULTI handling

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Peter Korsgaard 

[ Upstream commit bf0ea6380724beb64f27a722dfc4b0edabff816e ]

Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
(pass undersized frames).
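
Spelled out as bit masks, the mix-up is easy to see. The macro and
function names are hypothetical; the bit positions are the ones named
above and in the patch:

```c
/* Hypothetical names for the RX control bits discussed above. */
#define RX_CTL_PROMISC_SKETCH   (1u << 1)   /* 0x02, set in the promiscuous branch */
#define RX_CTL_RUNT_SKETCH      (1u << 2)   /* 0x04, pass undersized frames */
#define RX_CTL_ALLMULTI_SKETCH  (1u << 3)   /* 0x08, pass all multicast */

/* Pick the RX control value; promiscuous mode takes precedence. */
static unsigned int rx_ctl_sketch(unsigned int base, int promisc, int allmulti)
{
    unsigned int rx_ctl = base;

    if (promisc)
        rx_ctl |= RX_CTL_PROMISC_SKETCH;
    else if (allmulti)
        rx_ctl |= RX_CTL_ALLMULTI_SKETCH;
    return rx_ctl;
}
```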

Reported-by: Joseph Chang 
Signed-off-by: Peter Korsgaard 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 drivers/net/usb/dm9601.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index 9a6eede..498681a 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -382,7 +382,7 @@ static void dm9601_set_multicast(struct net_device *net)
if (net->flags & IFF_PROMISC) {
rx_ctl |= 0x02;
} else if (net->flags & IFF_ALLMULTI || net->mc_count > DM_MAX_MCAST) {
-   rx_ctl |= 0x04;
+   rx_ctl |= 0x08;
} else if (net->mc_count) {
struct dev_mc_list *mc_list = net->mc_list;
int i;
-- 
1.7.12.2.21.g234cd45.dirty




[ 136/143] qeth: avoid buffer overflow in snmp ioctl

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Ursula Braun 

commit 6fb392b1a63ae36c31f62bc3fc8630b49d602b62 upstream

Check user-defined length in snmp ioctl request and allow request
only if it fits into a qeth command buffer.
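
A sketch of the length check, with made-up sizes standing in for the
buffer and header sizes (the constants and the function name are
hypothetical). Treating the length as unsigned means a negative value
supplied by the caller shows up as a very large one and is rejected:

```c
#include <errno.h>

/* Hypothetical sizes; the real limits come from the driver's headers. */
#define BUFSIZE_SKETCH  4096u
#define HEADERS_SKETCH  128u

/* Allow the request only if it fits into what is left of the buffer. */
static int check_req_len(unsigned int req_len)
{
    if (req_len > BUFSIZE_SKETCH - HEADERS_SKETCH)
        return -EINVAL;
    return 0;
}
```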

Signed-off-by: Ursula Braun 
Signed-off-by: Frank Blaschka 
Reviewed-by: Heiko Carstens 
Reported-by: Nico Golde 
Reported-by: Fabian Yamaguchi 
Cc: 
Signed-off-by: David S. Miller 
[jmm: backport 2.6.32]
Signed-off-by: Willy Tarreau 
---
 drivers/s390/net/qeth_core_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index c4a42d9..29afd6c 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3557,7 +3557,7 @@ int qeth_snmp_command(struct qeth_card *card, char __user 
*udata)
struct qeth_cmd_buffer *iob;
struct qeth_ipa_cmd *cmd;
struct qeth_snmp_ureq *ureq;
-   int req_len;
+   unsigned int req_len;
struct qeth_arp_query_info qinfo = {0, };
int rc = 0;
 
@@ -3573,6 +3573,10 @@ int qeth_snmp_command(struct qeth_card *card, char 
__user *udata)
/* skip 4 bytes (data_len struct member) to get req_len */
if (copy_from_user(&req_len, udata + sizeof(int), sizeof(int)))
return -EFAULT;
+   if (req_len > (QETH_BUFSIZE - IPA_PDU_HEADER_SIZE -
+  sizeof(struct qeth_ipacmd_hdr) -
+  sizeof(struct qeth_ipacmd_setadpparms_hdr)))
+   return -EINVAL;
ureq = kmalloc(req_len+sizeof(struct qeth_snmp_ureq_hdr), GFP_KERNEL);
if (!ureq) {
QETH_DBF_TEXT(TRACE, 2, "snmpnome");
-- 
1.7.12.2.21.g234cd45.dirty




[ 072/143] tipc: fix lockdep warning during bearer initialization

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Ying Xue 

[ Upstream commit 4225a398c1352a7a5c14dc07277cb5cc4473983b ]

When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
-
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(ptype_lock);
                               local_irq_disable();
                               lock(tipc_net_lock);
                               lock(ptype_lock);
  <Interrupt>
    lock(tipc_net_lock);

 *** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
  -> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
  [] __lock_acquire+0x528/0x13e0
  [] lock_acquire+0x90/0x100
  [] _raw_spin_lock+0x38/0x50
  [] dev_add_pack+0x3a/0x60
  [] arp_init+0x1a/0x48
  [] inet_init+0x181/0x27e
  [] do_one_initcall+0x34/0x170
  [] kernel_init+0x110/0x1b2
  [] kernel_thread_helper+0x6/0x10
[...]
   ... key  at: [] ptype_lock+0x10/0x20
   ... acquired at:
[] lock_acquire+0x90/0x100
[] _raw_spin_lock+0x38/0x50
[] dev_add_pack+0x3a/0x60
[] enable_bearer+0xf2/0x140 [tipc]
[] tipc_enable_bearer+0x1ba/0x450 [tipc]
[] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
[] handle_cmd+0x42/0xd0 [tipc]
[] genl_rcv_msg+0x232/0x280
[] netlink_rcv_skb+0x86/0xb0
[] genl_rcv+0x1c/0x30
[] netlink_unicast+0x174/0x1f0
[] netlink_sendmsg+0x1eb/0x2d0
[] sock_aio_write+0x161/0x170
[] do_sync_write+0xac/0xf0
[] vfs_write+0x156/0x170
[] sys_write+0x42/0x70
[] sysenter_do_call+0x12/0x38
[...]
}
  -> (tipc_net_lock){+..-..} ops: 4 {
[...]
IN-SOFTIRQ-R at:
 [] __lock_acquire+0x64a/0x13e0
 [] lock_acquire+0x90/0x100
 [] _raw_read_lock_bh+0x3d/0x50
 [] tipc_recv_msg+0x1d/0x830 [tipc]
 [] recv_msg+0x3f/0x50 [tipc]
 [] __netif_receive_skb+0x22a/0x590
 [] netif_receive_skb+0x2b/0xf0
 [] pcnet32_poll+0x292/0x780
 [] net_rx_action+0xfa/0x1e0
 [] __do_softirq+0xae/0x1e0
[...]
}

From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack.  But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.  At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled.  Until enable_bearer(), running on
CPU1, has taken ptype_lock and finished dev_add_pack(), the TIPC
receive handler (i.e. recv_msg()) is not yet registered, so
tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue 
Signed-off-by: Jon Maloy 
Signed-off-by: Paul Gortmaker 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/tipc/eth_media.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 524ba56..22453a8 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -56,6 +56,7 @@ struct eth_bearer {
struct tipc_bearer *bearer;
struct net_device *dev;
struct packet_type tipc_packet_type;
+   struct work_struct setup;
 };
 
 static struct eth_bearer eth_bearers[MAX_ETH_BEARERS];
@@ -122,6 +123,17 @@ static int recv_msg(struct sk_buff *buf, struct net_device 
*dev,
 }
 
 /**
+ * setup_bearer -

[ 022/143] HID: LG: validate HID output report details

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Kees Cook 

commit 0fb6bd06e06792469acc15bbe427361b56ada528 upstream

A HID device could send a malicious output report that would cause the
lg, lg3, and lg4 HID drivers to write beyond the output report allocation
during an event, causing a heap overflow:

[  325.245240] usb 1-1: New USB device found, idVendor=046d, idProduct=c287
...
[  414.518960] BUG kmalloc-4096 (Not tainted): Redzone overwritten

Additionally, while lg2 did correctly validate the report details, it was
cleaned up and shortened.

CVE-2013-2893

Signed-off-by: Kees Cook 
Cc: sta...@vger.kernel.org
Reviewed-by: Benjamin Tissoires 
Signed-off-by: Jiri Kosina 

[jmm: backported to 2.6.32]
Signed-off-by: Willy Tarreau 
---
 drivers/hid/hid-lg2ff.c | 19 +++
 drivers/hid/hid-lgff.c  | 17 ++---
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/hid/hid-lg2ff.c b/drivers/hid/hid-lg2ff.c
index 4e6dc6e..a260a8c 100644
--- a/drivers/hid/hid-lg2ff.c
+++ b/drivers/hid/hid-lg2ff.c
@@ -65,26 +65,13 @@ int lg2ff_init(struct hid_device *hid)
struct hid_report *report;
struct hid_input *hidinput = list_entry(hid->inputs.next,
struct hid_input, list);
-   struct list_head *report_list =
-   &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
int error;
 
-   if (list_empty(report_list)) {
-   dev_err(&hid->dev, "no output report found\n");
+   /* Check that the report looks ok */
+   report = hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7);
+   if (!report)
return -ENODEV;
-   }
-
-   report = list_entry(report_list->next, struct hid_report, list);
-
-   if (report->maxfield < 1) {
-   dev_err(&hid->dev, "output report is empty\n");
-   return -ENODEV;
-   }
-   if (report->field[0]->report_count < 7) {
-   dev_err(&hid->dev, "not enough values in the field\n");
-   return -ENODEV;
-   }
 
lg2ff = kmalloc(sizeof(struct lg2ff_device), GFP_KERNEL);
if (!lg2ff)
diff --git a/drivers/hid/hid-lgff.c b/drivers/hid/hid-lgff.c
index 987abeb..df26abb 100644
--- a/drivers/hid/hid-lgff.c
+++ b/drivers/hid/hid-lgff.c
@@ -135,27 +135,14 @@ static void hid_lgff_set_autocenter(struct input_dev *dev, u16 magnitude)
 int lgff_init(struct hid_device* hid)
 {
struct hid_input *hidinput = list_entry(hid->inputs.next, struct hid_input, list);
-   struct list_head *report_list = &hid->report_enum[HID_OUTPUT_REPORT].report_list;
struct input_dev *dev = hidinput->input;
-   struct hid_report *report;
-   struct hid_field *field;
const signed short *ff_bits = ff_joystick;
int error;
int i;
 
-   /* Find the report to use */
-   if (list_empty(report_list)) {
-   err_hid("No output report found");
-   return -1;
-   }
-
/* Check that the report looks ok */
-   report = list_entry(report_list->next, struct hid_report, list);
-   field = report->field[0];
-   if (!field) {
-   err_hid("NULL field");
-   return -1;
-   }
+   if (!hid_validate_values(hid, HID_OUTPUT_REPORT, 0, 0, 7))
+   return -ENODEV;
 
for (i = 0; i < ARRAY_SIZE(devices); i++) {
if (dev->id.vendor == devices[i].idVendor &&
-- 
1.7.12.2.21.g234cd45.dirty




[ 041/143] ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Salam Noureddine 

[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ]

It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1
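
The difference between the two helpers can be modelled in a few lines of userspace C. This is a sketch only: `toy_dev`, `toy_dev_put_raw()` and `toy_dev_put()` are hypothetical names, and the `destroyed` flag stands in for the real cleanup path. The point it illustrates is that a plain decrement can take the count to zero without ever running the destructor.

```c
#include <assert.h>

/* Hypothetical toy object standing in for struct inet6_dev. */
struct toy_dev {
	int refcnt;
	int destroyed;	/* set once the cleanup path has run */
};

/* Models __in6_dev_put(): drop a reference, never run cleanup. */
void toy_dev_put_raw(struct toy_dev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Models in6_dev_put(): drop a reference and clean up on the last one. */
void toy_dev_put(struct toy_dev *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		d->destroyed = 1;	/* the kernel would destroy the inet6_dev here */
}
```

If a timer handler holds the last reference and uses the raw variant, the object ends up with a zero count but is never destroyed, which matches the leaked net_device reference described above.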

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/ipv6/mcast.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index f9fcf69..99ae9e3 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -2208,7 +2208,7 @@ static void mld_gq_timer_expire(unsigned long data)
 
idev->mc_gq_running = 0;
mld_send_report(idev, NULL);
-   __in6_dev_put(idev);
+   in6_dev_put(idev);
 }
 
 static void mld_ifc_timer_expire(unsigned long data)
@@ -2221,7 +2221,7 @@ static void mld_ifc_timer_expire(unsigned long data)
if (idev->mc_ifc_count)
mld_ifc_start_timer(idev, idev->mc_maxdelay);
}
-   __in6_dev_put(idev);
+   in6_dev_put(idev);
 }
 
 static void mld_ifc_event(struct inet6_dev *idev)
-- 
1.7.12.2.21.g234cd45.dirty




[ 070/143] ipv6: Don't depend on per socket memory for neighbour discovery messages

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Thomas Graf 

[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ]

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus shares a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will consume
all of the wmem space and will thus prevent any more skbs from
being allocated, even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.
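
As a rough illustration of the accounting difference, here is a toy userspace model. Everything in it is an assumption made for the sketch: `toy_sock`, `toy_alloc_checked()` and `toy_alloc_unchecked()` are hypothetical names, and the real kernel paths do considerably more. The checked variant refuses new allocations once the shared budget is used up; the unchecked variant still charges the socket but never refuses.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared per-namespace ndisc socket. */
struct toy_sock {
	unsigned int wmem_alloc;	/* bytes currently charged to the socket */
	unsigned int sndbuf;		/* send buffer limit */
};

/* Models a non-blocking sock_alloc_send_skb(): fails once the budget is used. */
int toy_alloc_checked(struct toy_sock *sk, unsigned int size)
{
	assert(size > 0);
	if (sk->wmem_alloc >= sk->sndbuf)
		return -1;		/* budget exhausted by queued messages */
	sk->wmem_alloc += size;
	return 0;
}

/* Models alloc_skb() plus skb_set_owner_w(): charged, but never limited. */
int toy_alloc_unchecked(struct toy_sock *sk, unsigned int size)
{
	assert(size > 0);
	sk->wmem_alloc += size;		/* ownership is still recorded */
	return 0;
}
```

With one stalled device holding the budget, the checked path fails for every other device sharing the socket, while the unchecked path keeps working.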

This patch has originally been posted by Eric Dumazet in a modified
form.

Signed-off-by: Thomas Graf 
Cc: Eric Dumazet 
Cc: Hannes Frederic Sowa 
Cc: Stephen Warren 
Cc: Fabio Estevam 
Tested-by: Fabio Estevam 
Tested-by: Stephen Warren 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Willy Tarreau 
---
 net/ipv6/ndisc.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f74e4e2..752da21 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -449,7 +449,6 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
struct sk_buff *skb;
struct icmp6hdr *hdr;
int len;
-   int err;
u8 *opt;
 
if (!dev->addr_len)
@@ -459,14 +458,12 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
if (llinfo)
len += ndisc_opt_addr_space(dev);
 
-   skb = sock_alloc_send_skb(sk,
- (MAX_HEADER + sizeof(struct ipv6hdr) +
-  len + LL_ALLOCATED_SPACE(dev)),
- 1, &err);
+   skb = alloc_skb((MAX_HEADER + sizeof(struct ipv6hdr) +
+len + LL_ALLOCATED_SPACE(dev)), GFP_ATOMIC);
if (!skb) {
ND_PRINTK0(KERN_ERR
-  "ICMPv6 ND: %s() failed to allocate an skb, err=%d.\n",
-  __func__, err);
+  "ICMPv6 ND: %s() failed to allocate an skb.\n",
+  __func__);
return NULL;
}
 
@@ -494,6 +491,11 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
   csum_partial(hdr,
len, 0));
 
+   /* Manually assign socket ownership as we avoid calling
+* sock_alloc_send_pskb() to bypass wmem buffer limits
+*/
+   skb_set_owner_w(skb, sk);
+
return skb;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty




[ 131/143] dm snapshot: fix data corruption

2014-05-11 Thread Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

--

From: Mikulas Patocka 

CVE-2013-4299

BugLink: http://bugs.launchpad.net/bugs/1241769

This patch fixes a particular type of data corruption that has been
encountered when loading a snapshot's metadata from disk.

When we allocate a new chunk in persistent_prepare, we increment
ps->next_free and we make sure that it doesn't point to a metadata area
by further incrementing it if necessary.

When we load metadata from disk on device activation, ps->next_free is
positioned after the last used data chunk. However, if this last used
data chunk is followed by a metadata area, ps->next_free is positioned
erroneously to the metadata area. A newly-allocated chunk is placed at
the same location as the metadata area, resulting in data or metadata
corruption.

This patch changes the code so that ps->next_free skips the metadata
area when metadata are loaded in function read_exceptions.

The patch also moves a piece of code from persistent_prepare_exception
to a separate function skip_metadata to avoid code duplication.
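
The arithmetic behind the skip can be sketched in userspace as below. This is an illustration under assumptions, not the kernel code: `toy_skip_metadata()` is a hypothetical name, a plain `%` replaces sector_div(), and the layout is taken from area_location() in the diff (chunk 0 is the header, and each area is one metadata chunk followed by `exceptions_per_area` data chunks).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Metadata chunks sit at 1, 1 + stride, 1 + 2 * stride, ... where
 * stride = exceptions_per_area + 1. If next_free lands on one of
 * them, advance it by one chunk; otherwise leave it alone.
 */
uint64_t toy_skip_metadata(uint64_t next_free, uint32_t exceptions_per_area)
{
	uint32_t stride = exceptions_per_area + 1;

	assert(exceptions_per_area > 0);
	if (next_free % stride == 1)
		next_free++;
	return next_free;
}
```

For example, with three exceptions per area the stride is 4 and the metadata chunks are 1, 5, 9 and so on. A `next_free` of 5 is moved to 6, while 4 and 6 are left unchanged.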

CVE-2013-4299

Signed-off-by: Mikulas Patocka 
Cc: sta...@vger.kernel.org
Cc: Mike Snitzer 
Signed-off-by: Alasdair G Kergon 
(back ported from commit e9c6a182649f4259db704ae15a91ac820e63b0ca)
Signed-off-by: Luis Henriques 
Acked-by: Stefan Bader 
Signed-off-by: Tim Gardner 
Signed-off-by: Willy Tarreau 
---
 drivers/md/dm-snap-persistent.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index 0c74642..97c3f06 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -252,6 +252,14 @@ static chunk_t area_location(struct pstore *ps, chunk_t area)
return 1 + ((ps->exceptions_per_area + 1) * area);
 }
 
+static void skip_metadata(struct pstore *ps)
+{
+   uint32_t stride = ps->exceptions_per_area + 1;
+   chunk_t next_free = ps->next_free;
+   if (sector_div(next_free, stride) == 1)
+   ps->next_free++;
+}
+
 /*
  * Read or write a metadata area.  Remembering to skip the first
  * chunk which holds the header.
@@ -481,6 +489,8 @@ static int read_exceptions(struct pstore *ps,
 
ps->current_area--;
 
+   skip_metadata(ps);
+
return 0;
 }
 
@@ -587,8 +597,6 @@ static int persistent_prepare_exception(struct dm_exception_store *store,
struct dm_snap_exception *e)
 {
struct pstore *ps = get_info(store);
-   uint32_t stride;
-   chunk_t next_free;
sector_t size = get_dev_size(store->cow->bdev);
 
/* Is there enough room ? */
@@ -601,10 +609,8 @@ static int persistent_prepare_exception(struct dm_exception_store *store,
 * Move onto the next free pending, making sure to take
 * into account the location of the metadata chunks.
 */
-   stride = (ps->exceptions_per_area + 1);
-   next_free = ++ps->next_free;
-   if (sector_div(next_free, stride) == 1)
-   ps->next_free++;
+   ps->next_free++;
+   skip_metadata(ps);
 
atomic_inc(&ps->pending_count);
return 0;
-- 
1.7.12.2.21.g234cd45.dirty



