date:20140725

Re: ARC fails to boot on linux-next of 20140711

2014-07-25 Thread Grant Likely

On Fri, 25 Jul 2014 09:15:22 -0500, Rob Herring  wrote:
> On Fri, Jul 25, 2014 at 6:02 AM, Vineet Gupta
>  wrote:
> > Hi Grant,
> >
> > linux-next has a series for arc_uart (via tty tree) which converts it to
> > generic earlycon and specifies the console via /chosen/stdout-path vs. an
> > explicit param in /chosen/bootargs
> >
> > 2014-06-24 9da433c0a0b5 ARC: [arcfpga] stdout-path now suffices for earlycon/console
> >
> > This relied on previous commits of yours (from linux-next of 20140711),
> > which seem to have disappeared now.
> >
> > 2014-03-27 a9296cf2d0b6 of: Create of_console_check() for selecting a console specified in /chosen
> > 2014-03-27 cfa9cacc5dd3 of: Enable console on serial ports specified by /chosen/stdout-path
> >
> > Is there a specific reason for dropping these patches (or is there perhaps
> > a merge still pending)? I cherry-picked both, but it still doesn't work.
> >
> > Can you please advise on the next step forward, before I go off debugging
> > with those patches in?
> 
> There's an issue that if you have stdout-path and "earlycon" on the
> command line, the kernel will switch to tty0 and disable the earlycon.
> 
> This is the "fix", but I don't like adding the DT dependency into generic
> code:

Yes, I'm not fond of it either. I've not been able to test it though and
work out a proper bug fix. As far as I can understand, the earlycon code
only works on aarch64, correct? I haven't been able to get an aarch64
boot working in QEMU yet.

g.

> 
> @@ -2382,7 +2386,7 @@ void register_console(struct console *newcon)
> if (newcon->setup == NULL ||
> newcon->setup(newcon, NULL) == 0) {
> newcon->flags |= CON_ENABLED;
> -   if (newcon->device) {
> +   if (newcon->device  && !of_stdout) {
> newcon->flags |= CON_CONSDEV;
> preferred_console = 0;
> }
> 
> Rob

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/2] dirreadahead system call

2014-07-25 Thread Andreas Dilger

Is there a time when this doesn't get called to prefetch entries in
readdir() order?  It isn't clear to me what benefit there is of returning
the entries to userspace instead of just doing the statahead implicitly
in the kernel?

The Lustre client has had what we call "statahead" for a while,
and similar to regular file readahead it detects the sequential access
pattern for readdir() + stat() in readdir() order (taking into account if ".*"
entries are being processed or not) and starts fetching the inode
attributes asynchronously with a worker thread.

This syscall might be more useful if userspace called readdir() to get
the dirents and then passed the kernel the list of inode numbers
to prefetch before starting on the stat() calls. That way, userspace
could generate an arbitrary list of inodes (e.g. names matching a
regexp) and the kernel doesn't need to guess if every inode is needed. 
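
The userspace half of that idea can be sketched with POSIX calls alone. The helper below, whose name `collect_matching_inodes` is hypothetical, gathers the inode numbers of entries whose names match a regular expression; handing that list to the kernel would need an interface that does not exist in this proposal, so the sketch stops at building the list.

```c
#include <dirent.h>
#include <regex.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Collect the inode numbers of directory entries whose names match a
 * POSIX extended regular expression. On success returns the number of
 * matches and stores a malloc()ed array in *out (caller frees it);
 * returns -1 on error.
 */
long collect_matching_inodes(const char *path, const char *pattern, ino_t **out)
{
	regex_t re;
	DIR *dir;
	struct dirent *de;
	ino_t *list = NULL;
	size_t used = 0, cap = 0;

	*out = NULL;
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	dir = opendir(path);
	if (!dir) {
		regfree(&re);
		return -1;
	}

	while ((de = readdir(dir)) != NULL) {
		if (regexec(&re, de->d_name, 0, NULL, 0) != 0)
			continue;	/* name does not match, skip it */
		if (used == cap) {
			size_t ncap = cap ? cap * 2 : 16;
			ino_t *tmp = realloc(list, ncap * sizeof(*tmp));

			if (!tmp) {
				free(list);
				closedir(dir);
				regfree(&re);
				return -1;
			}
			list = tmp;
			cap = ncap;
		}
		list[used++] = de->d_ino;
	}

	closedir(dir);
	regfree(&re);
	*out = list;
	return (long)used;
}
```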

As it stands, this syscall doesn't help in anything other than readdir
order (or if the directory is small enough to be handled in one
syscall), which could be handled by the kernel internally already,
and it may fetch a considerable number of extra inodes from
disk if not every inode needs to be touched. 

Cheers, Andreas

> On Jul 25, 2014, at 11:37, Abhi Das  wrote:
> 
> This system call takes 3 arguments:
> fd  - file descriptor of the directory being readahead
> *offset - offset in dir from which to resume. This is updated
>  as we move along in the directory
> count   - The max number of entries to readahead
> 
> The syscall is supposed to read up to 'count' entries starting at
> '*offset' and cache the inodes corresponding to those entries. It
> returns a negative error code or a positive number indicating
> the number of inodes it has issued readaheads for. It also
> updates the '*offset' value so that repeated calls to dirreadahead
> can resume at the right location. Returns 0 when there are no more
> entries left.
> 
> Abhi Das (2):
>  fs: Add dirreadahead syscall and VFS hooks
>  gfs2: GFS2's implementation of the dir_readahead file operation
> 
> arch/x86/syscalls/syscall_32.tbl |   1 +
> arch/x86/syscalls/syscall_64.tbl |   1 +
> fs/gfs2/Makefile |   3 +-
> fs/gfs2/dir.c|  49 ++---
> fs/gfs2/dir.h|  15 +++
> fs/gfs2/dir_readahead.c  | 209 +++
> fs/gfs2/file.c   |   2 +
> fs/gfs2/main.c   |  10 +-
> fs/gfs2/super.c  |   1 +
> fs/readdir.c |  49 +
> include/linux/fs.h   |   3 +
> 11 files changed, 328 insertions(+), 15 deletions(-)
> create mode 100644 fs/gfs2/dir_readahead.c
> 
> -- 
> 1.8.1.4
> 

Re: general protection fault on 3.15.6

2014-07-25 Thread Steven Noonan

On Fri, Jul 25, 2014 at 9:42 PM, Steven Noonan  wrote:
> On Thu, Jul 24, 2014 at 12:06 AM, Alexander Holler  
> wrote:
>> Am 23.07.2014 19:50, schrieb Steven Noonan:
>>
>>> (Oops, LKML doesn't like rich text, resending. Was trying to avoid
>>> GMail's bad line wrapping. Going to use Mutt instead.)
>>>
>>> I'm starting to wonder if it's bad RAM or something. Just got a couple of
>>> worrying warnings on boot from the same system (after it spontaneously
>>> rebooted, with nothing revealing in the previous boot's logs).
>
> So the spontaneous reboot was apparently caused by a power outage. All
> my boxes had identical uptimes of less than a couple days when I checked
> them.
>
>>
>>
>> I once had such too and since then I'm using memtest=3 in my kernel command
>> line on x86* machines. Depending on the amount of RAM it will slow down boot
>> by a few seconds, but if you don't care if your machine comes up in 5 or 10
>> seconds, it is a no-brainer.
>>
>
> However, I got another general protection fault. This time it happened
> when doing 'find' on an NFS mount point. Tried booting with 'memtest=16'
> to see if that would catch anything, but it passed without finding any
> bad regions. I'm running memtest86 right now to be a bit more thorough
> and ensure it's not just bad hardware, but so far it's not found
> anything (1 full pass done so far).
>
> Here's the latest backtraces. I only managed to copy/paste this before
> the system hung and I had to reboot it, but there should be a more
> complete kernel log in the systemd journal that I can grab once it's
> done with memtest86.
>
> [212326.408380] general protection fault:  [#1] SMP
> [212326.409183] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry 
> nfsv4 nfs lockd fscache sunrpc macvlan xt_nat sit tunnel4 ip_tunnel sch_sfq 
> ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT 
> xt_limit 8021q nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG xt_tcpudp bridge 
> ip6t_rt nf_conntrack_ipv6 stp llc nf_defrag_ipv6 xt_conntrack nf_conntrack 
> iptable_filter ip6table_filter ip6_tables ip_tables x_tables it87 hwmon_vid 
> nls_cp437 vfat fat x86_pkg_temp_thermal iTCO_wdt intel_powerclamp raid1 
> iTCO_vendor_support raid0 coretemp crct10dif_pclmul md_mod snd_hda_codec_hdmi 
> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
> snd_hda_codec_realtek glue_helper ablk_helper cryptd snd_hda_codec_generic 
> snd_hda_intel snd_hda_controller microcode i2c_i801 r8169 snd_hda_codec
> [212326.411879]  snd_hwdep mii snd_pcm snd_timer thermal fan snd acpi_cpufreq 
> battery soundcore lpc_ich mfd_core evdev processor zfs(PO) zunicode(PO) 
> zavl(PO) zcommon(PO) znvpair(PO) spl(O) tun usbip_host(C) usbip_core(C) msr 
> loop kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
> crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel ehci_pci 
> libata xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 video intel_gtt 
> i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
> ipmi_msghandler button
> [212326.414577] CPU: 5 PID: 30360 Comm: find Tainted: PWC O  
> 3.15.6-1-ec2 #1
> [212326.415457] Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
> [212326.416352] task: 8801275bbb00 ti: 88030f80c000 task.ti: 
> 88030f80c000
> [212326.417261] RIP: 0010:[]  [] 
> __kmalloc_track_caller+0x86/0x260
> [212326.418194] RSP: 0018:88030f80fb78  EFLAGS: 00010282
> [212326.419130] RAX:  RBX: 0004 RCX: 
> 35ee
> [212326.420081] RDX: 35ed RSI:  RDI: 
> 
> [212326.421021] RBP: 88030f80fbb0 R08: 000173c0 R09: 
> 8801eb6ae160
> [212326.421958] R10: 88040e803e00 R11: 0004 R12: 
> ff0074726f707262
> [212326.422887] R13: 00d0 R14: 0004 R15: 
> 88040e803e00
> [212326.423808] FS:  7f3b98919700() GS:88041f34() 
> knlGS:
> [212326.424752] CS:  0010 DS:  ES:  CR0: 80050033
> [212326.425698] CR2: 00ef0010 CR3: 0003ffd3c000 CR4: 
> 001407e0
> [212326.426659] Stack:
> [212326.427620]  88040e803e00 a0211d75 0004 
> 8803607f0558
> [212326.428609]  0009 8801eb6ae000 8801eb6ae140 
> 88030f80fbd0
> [212326.429630]  8116fb60 88030f80fd40 88030f80fe58 
> 88030f80fcc8
> [212326.430640] Call Trace:
> [212326.431651]  [] ? nfs_permission+0x405/0xfb0 [nfs]
> [212326.432681]  [] kmemdup+0x20/0x50
> [212326.433717]  [] nfs_permission+0x405/0xfb0 [nfs]
> [212326.434760]  [] nfs_permission+0x907/0xfb0 [nfs]
> [212326.435810]  [] ? nfs_permission+0x9e0/0xfb0 [nfs]
> [212326.436863]  [] nfs_permission+0xa02/0xfb0 [nfs]
> [212326.437924]  [] do_read_cache_page+0x7e/0x1a0
> [212326.438990]  [] read_cache_page+0x1c/0x20
> [212326.440078]  [] nfs_permission+0xbbb/0xfb0 [nfs]
> [212326.441159]  [] ?

Re: general protection fault on 3.15.6

2014-07-25 Thread Steven Noonan

On Thu, Jul 24, 2014 at 12:06 AM, Alexander Holler  wrote:
> Am 23.07.2014 19:50, schrieb Steven Noonan:
>
>> (Oops, LKML doesn't like rich text, resending. Was trying to avoid
>> GMail's bad line wrapping. Going to use Mutt instead.)
>>
>> I'm starting to wonder if it's bad RAM or something. Just got a couple of
>> worrying warnings on boot from the same system (after it spontaneously
>> rebooted, with nothing revealing in the previous boot's logs).

So the spontaneous reboot was apparently caused by a power outage. All
my boxes had identical uptimes of less than a couple days when I checked
them.

>
>
> I once had such too and since then I'm using memtest=3 in my kernel command
> line on x86* machines. Depending on the amount of RAM it will slow down boot
> by a few seconds, but if you don't care if your machine comes up in 5 or 10
> seconds, it is a no-brainer.
>

However, I got another general protection fault. This time it happened
when doing 'find' on an NFS mount point. Tried booting with 'memtest=16'
to see if that would catch anything, but it passed without finding any
bad regions. I'm running memtest86 right now to be a bit more thorough
and ensure it's not just bad hardware, but so far it's not found
anything (1 full pass done so far).

Here's the latest backtraces. I only managed to copy/paste this before
the system hung and I had to reboot it, but there should be a more
complete kernel log in the systemd journal that I can grab once it's
done with memtest86.

[212326.408380] general protection fault:  [#1] SMP
[212326.409183] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry 
nfsv4 nfs lockd fscache sunrpc macvlan xt_nat sit tunnel4 ip_tunnel sch_sfq 
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT 
xt_limit 8021q nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG xt_tcpudp bridge ip6t_rt 
nf_conntrack_ipv6 stp llc nf_defrag_ipv6 xt_conntrack nf_conntrack 
iptable_filter ip6table_filter ip6_tables ip_tables x_tables it87 hwmon_vid 
nls_cp437 vfat fat x86_pkg_temp_thermal iTCO_wdt intel_powerclamp raid1 
iTCO_vendor_support raid0 coretemp crct10dif_pclmul md_mod snd_hda_codec_hdmi 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
snd_hda_codec_realtek glue_helper ablk_helper cryptd snd_hda_codec_generic 
snd_hda_intel snd_hda_controller microcode i2c_i801 r8169 snd_hda_codec
[212326.411879]  snd_hwdep mii snd_pcm snd_timer thermal fan snd acpi_cpufreq 
battery soundcore lpc_ich mfd_core evdev processor zfs(PO) zunicode(PO) 
zavl(PO) zcommon(PO) znvpair(PO) spl(O) tun usbip_host(C) usbip_core(C) msr 
loop kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel ehci_pci 
libata xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 video intel_gtt 
i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
ipmi_msghandler button
[212326.414577] CPU: 5 PID: 30360 Comm: find Tainted: PWC O  
3.15.6-1-ec2 #1
[212326.415457] Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
[212326.416352] task: 8801275bbb00 ti: 88030f80c000 task.ti: 
88030f80c000
[212326.417261] RIP: 0010:[]  [] 
__kmalloc_track_caller+0x86/0x260
[212326.418194] RSP: 0018:88030f80fb78  EFLAGS: 00010282
[212326.419130] RAX:  RBX: 0004 RCX: 
35ee
[212326.420081] RDX: 35ed RSI:  RDI: 

[212326.421021] RBP: 88030f80fbb0 R08: 000173c0 R09: 
8801eb6ae160
[212326.421958] R10: 88040e803e00 R11: 0004 R12: 
ff0074726f707262
[212326.422887] R13: 00d0 R14: 0004 R15: 
88040e803e00
[212326.423808] FS:  7f3b98919700() GS:88041f34() 
knlGS:
[212326.424752] CS:  0010 DS:  ES:  CR0: 80050033
[212326.425698] CR2: 00ef0010 CR3: 0003ffd3c000 CR4: 
001407e0
[212326.426659] Stack:
[212326.427620]  88040e803e00 a0211d75 0004 
8803607f0558
[212326.428609]  0009 8801eb6ae000 8801eb6ae140 
88030f80fbd0
[212326.429630]  8116fb60 88030f80fd40 88030f80fe58 
88030f80fcc8
[212326.430640] Call Trace:
[212326.431651]  [] ? nfs_permission+0x405/0xfb0 [nfs]
[212326.432681]  [] kmemdup+0x20/0x50
[212326.433717]  [] nfs_permission+0x405/0xfb0 [nfs]
[212326.434760]  [] nfs_permission+0x907/0xfb0 [nfs]
[212326.435810]  [] ? nfs_permission+0x9e0/0xfb0 [nfs]
[212326.436863]  [] nfs_permission+0xa02/0xfb0 [nfs]
[212326.437924]  [] do_read_cache_page+0x7e/0x1a0
[212326.438990]  [] read_cache_page+0x1c/0x20
[212326.440078]  [] nfs_permission+0xbbb/0xfb0 [nfs]
[212326.441159]  [] ? nfs4_proc_secinfo+0x63a0/0x63a0 [nfsv4]
[212326.442251]  [] iterate_dir+0xa6/0xe0
[212326.443347]  [] SyS_getdents+0x89/0x100
[212326.48]  [] ? fillonedir+0xd0/0xd0
[212326.445552]  [] ? __audit_syscall_exit+0x236/0x2e0
[212326.44]  []

[PATCH] timekeeping: Fixup typo in update_vsyscall_old definition

2014-07-25 Thread John Stultz

In commit 4a0e637738f0 ("clocksource: Get rid of cycle_last"),
currently in the -tip tree, there was a small typo where cycles_t
was used instead of cycle_t. This broke ppc64 builds.

Fix this by using the proper cycle_t type for this usage, in
both the definition and the ia64 implementation.

Now, having both cycle_t and cycles_t types seems like a very
bad idea just asking for these sorts of issues. But that
will be a cleanup for another day.
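
Why a one-letter difference breaks only some builds can be shown outside the kernel. The sketch below assumes typedefs resembling a 64-bit powerpc build, where cycle_t would be unsigned long long and cycles_t would be unsigned long; these typedefs are illustrative stand-ins, not copied from the kernel headers. Both are 64 bits wide in that setup, yet C treats them as distinct types, so a prototype using one and a definition using the other conflict.

```c
/* Illustrative stand-ins, assumed to resemble a 64-bit powerpc build. */
typedef unsigned long long cycle_t;	/* clocksource cycle counter type */
typedef unsigned long cycles_t;		/* arch-specific cycle counter type */

/* Evaluates to 1 if the expression has exactly the type cycle_t, else 0. */
#define IS_CYCLE_T(expr) _Generic((expr), cycle_t: 1, default: 0)

/* Returns 0: cycles_t is a different type even when the width matches. */
int cycles_t_is_cycle_t(void)
{
	cycles_t c = 0;

	return IS_CYCLE_T(c);
}

/* Returns 1: sanity check that the macro recognises cycle_t itself. */
int cycle_t_is_cycle_t(void)
{
	cycle_t c = 0;

	return IS_CYCLE_T(c);
}
```

On an architecture where both typedefs resolve to the same underlying type the mismatch would compile silently, which is presumably how the typo went unnoticed at first.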

Cc: Stephen Rothwell 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Reported-by: Stephen Rothwell 
Signed-off-by: John Stultz 
---

Note: This should be visibly correct, and I've test built on ppc64,
but I don't have an ia64 toolchain, so if anyone could give this a
build whirl on ia64, I'd appreciate it.

 arch/ia64/kernel/time.c | 2 +-
 include/linux/timekeeper_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 11dc42d..3e71ef8 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -441,7 +441,7 @@ void update_vsyscall_tz(void)
 }
 
 void update_vsyscall_old(struct timespec *wall, struct timespec *wtm,
-struct clocksource *c, u32 mult, cycles_t cycle_last)
+struct clocksource *c, u32 mult, cycle_t cycle_last)
 {
 	write_seqcount_begin(&fsyscall_gtod_data.seq);
 
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e9660e5..95640dc 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -113,7 +113,7 @@ extern void update_vsyscall_tz(void);
 
 extern void update_vsyscall_old(struct timespec *ts, struct timespec *wtm,
struct clocksource *c, u32 mult,
-   cycles_t cycle_last);
+   cycle_t cycle_last);
 extern void update_vsyscall_tz(void);
 
 #else
-- 
1.9.1


[PATCH] kthread_work: wake up worker only when the worker is idle

2014-07-25 Thread Lai Jiangshan

If the worker task is not idle, it may be sleeping on some condition at the
request of the work it is executing.  An unconditional wakeup from
insert_kthread_work() may then confuse the worker.
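
The patched condition can be modelled in a few lines of plain C. This is a toy sketch: `toy_worker` and `should_wake` are made-up names that carry only the two fields the check reads, not the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields the check reads. */
struct toy_work {
	int id;
};

struct toy_worker {
	void *task;			/* non-NULL once the worker thread exists */
	struct toy_work *current_work;	/* work being executed, or NULL if idle */
};

/* Mirrors the patched condition: wake only an existing, idle worker. */
bool should_wake(const struct toy_worker *worker)
{
	return !worker->current_work && worker->task != NULL;
}
```

A busy worker is expected to find the newly queued work by itself when it returns to its work list after the current item finishes, so skipping the wakeup should not lose work.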

Signed-off-by: Lai Jiangshan 
---
 kernel/kthread.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index c2390f4..ef48322 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -591,7 +591,7 @@ static void insert_kthread_work(struct kthread_worker *worker,
 
 	list_add_tail(&work->node, pos);
work->worker = worker;
-   if (likely(worker->task))
+   if (!worker->current_work && likely(worker->task))
wake_up_process(worker->task);
 }
 
-- 
1.7.4.4


[PATCH] kthread_work: add cancel_kthread_work[_sync]()

2014-07-25 Thread Lai Jiangshan

When an object or a subsystem shuts down, it needs to destroy the
kthread_work it uses.  Until now flush_kthread_work() was used for this,
but flush_kthread_work() makes no guarantee that the work has actually
stopped (it may have been requeued); that duty is pushed onto the users.

So introduce cancel_kthread_work_sync(), which gives a strict guarantee
like cancel_work_sync() does for workqueues.  Also introduce
cancel_kthread_work(), which users can call directly under some conditions
and which keeps the implementation of cancel_kthread_work_sync() simpler:
kthread_flush_work_fn() owns the running state of the kthread_worker
and calls cancel_kthread_work() to cancel work that may have been requeued.

Both cancel_kthread_work_sync() and cancel_kthread_work() share their
code with flush_kthread_work(), which also keeps the implementation simple.
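
How the three entry points differ can be summarised by modelling the shared helper's decisions. The sketch below is a userspace toy derived from the diff in this mail; `cancel_or_flush` and `struct outcome` are made-up names, and locking and the requeue handling in kthread_flush_work_fn() are deliberately left out.

```c
#include <stdbool.h>

struct outcome {
	bool dequeued;	/* pending work was removed from the list */
	bool waits;	/* caller blocks until a flush marker completes */
};

/*
 * queued/executing describe the work's state when the call is made;
 * cancel/sync are the flags each wrapper passes to the shared helper.
 */
struct outcome cancel_or_flush(bool queued, bool executing, bool cancel, bool sync)
{
	struct outcome out = { false, false };

	/* Neither flag set is a caller bug; the patch warns and returns. */
	if (!cancel && !sync)
		return out;

	/* Cancelling removes a pending instance from the work list. */
	if (cancel && queued) {
		out.dequeued = true;
		queued = false;
	}

	/* Syncing waits only if something is still pending or running. */
	if (sync && (queued || executing))
		out.waits = true;

	return out;
}
```

In this model flush_kthread_work() corresponds to (cancel=false, sync=true), cancel_kthread_work() to (true, false) and cancel_kthread_work_sync() to (true, true).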

Signed-off-by: Lai Jiangshan 
---
 include/linux/kthread.h |2 +
 kernel/kthread.c|   78 ++
 2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 790e49c..3cc3377 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -129,6 +129,8 @@ int kthread_worker_fn(void *worker_ptr);
 bool queue_kthread_work(struct kthread_worker *worker,
struct kthread_work *work);
 void flush_kthread_work(struct kthread_work *work);
+void cancel_kthread_work(struct kthread_work *work);
+void cancel_kthread_work_sync(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index ef48322..b5d6844 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -622,6 +622,7 @@ EXPORT_SYMBOL_GPL(queue_kthread_work);
 
 struct kthread_flush_work {
struct kthread_work work;
+   struct kthread_work *cancel_work;
struct completion   done;
 };
 
@@ -629,24 +630,25 @@ static void kthread_flush_work_fn(struct kthread_work *work)
 {
struct kthread_flush_work *fwork =
container_of(work, struct kthread_flush_work, work);
+
+   /* cancel the possible requeued work for cancel_kthread_work_sync() */
+   if (fwork->cancel_work)
+   cancel_kthread_work(fwork->cancel_work);
 	complete(&fwork->done);
 }
 
-/**
- * flush_kthread_work - flush a kthread_work
- * @work: work to flush
- *
- * If @work is queued or executing, wait for it to finish execution.
- */
-void flush_kthread_work(struct kthread_work *work)
+static void __cancel_work_sync(struct kthread_work *work, bool cancel, bool sync)
 {
struct kthread_flush_work fwork = {
-   KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
-   COMPLETION_INITIALIZER_ONSTACK(fwork.done),
+   .work = KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
+   .done = COMPLETION_INITIALIZER_ONSTACK(fwork.done),
};
struct kthread_worker *worker;
bool noop = false;
 
+   if (WARN_ON(!cancel && !sync))
+   return;
+
 retry:
worker = work->worker;
if (!worker)
@@ -658,21 +660,69 @@ retry:
goto retry;
}
 
-	if (!list_empty(&work->node))
+	/* cancel the queued work */
+	if (cancel && !list_empty(&work->node))
+		list_del_init(&work->node);
+
+	/* cancel the work during flushing it if it is requeued */
+	if (cancel && sync)
+		fwork.cancel_work = work;
+
+	/* insert the kthread_flush_work when sync */
+	if (sync && !list_empty(&work->node))
 		insert_kthread_work(worker, &fwork.work, work->node.next);
-	else if (worker->current_work == work)
+	else if (sync && worker->current_work == work)
 		insert_kthread_work(worker, &fwork.work, worker->work_list.next);
 	else
 		noop = true;
 
 	spin_unlock_irq(&worker->lock);
 
-	if (!noop)
+	if (sync && !noop)
 		wait_for_completion(&fwork.done);
 }
+
+/**
+ * flush_kthread_work - flush a kthread_work
+ * @work: work to flush
+ *
+ * If @work is queued or executing, wait for it to finish execution.
+ */
+void flush_kthread_work(struct kthread_work *work)
+{
+   __cancel_work_sync(work, false, true);
+}
 EXPORT_SYMBOL_GPL(flush_kthread_work);
 
 /**
+ * cancel_kthread_work - cancel a kthread_work
+ * @work: work to cancel
+ *
+ * If @work is queued, cancel it. Note, the work maybe still
+ * be executing after it returns.
+ */
+void cancel_kthread_work(struct kthread_work *work)
+{
+   __cancel_work_sync(work, true, false);
+}
+EXPORT_SYMBOL_GPL(cancel_kthread_work);
+
+/**
+ * cancel_kthread_work_sync - cancel a kthread_work and sync it
+ * @work: work to cancel
+ *
+ * If @work is queued or executing, cancel the queued work and
+ * wait for the executing work to finish execution. It ensures
+ * that there is at least one point that the work is not queued
+ * nor executing.
+ */

[PATCH] kthread_work: remove the unused wait_queue_head

2014-07-25 Thread Lai Jiangshan

The wait_queue_head_t done has been totally unused since flush_kthread_work()
was re-implemented.  So remove it, including its initialization code.  Some
LOCKDEP code also depends on this wait_queue_head, so that LOCKDEP code is
cleaned up as well.

Signed-off-by: Lai Jiangshan 
---
 include/linux/kthread.h |   16 +---
 1 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 7dcef33..790e49c 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -73,7 +73,6 @@ struct kthread_worker {
 struct kthread_work {
struct list_headnode;
kthread_work_func_t func;
-   wait_queue_head_t   done;
struct kthread_worker   *worker;
 };
 
@@ -85,7 +84,6 @@ struct kthread_work {
 #define KTHREAD_WORK_INIT(work, fn){   \
.node = LIST_HEAD_INIT((work).node),\
.func = (fn),   \
-   .done = __WAIT_QUEUE_HEAD_INITIALIZER((work).done), \
}
 
 #define DEFINE_KTHREAD_WORKER(worker)  \
@@ -95,24 +93,21 @@ struct kthread_work {
struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
 
 /*
- * kthread_worker.lock and kthread_work.done need their own lockdep class
- * keys if they are defined on stack with lockdep enabled.  Use the
- * following macros when defining them on stack.
+ * kthread_worker.lock need its own lockdep class key if it is defined
+ * on stack with lockdep enabled.  Use the following macros when defining
+ * it on stack.
  */
 #ifdef CONFIG_LOCKDEP
 # define KTHREAD_WORKER_INIT_ONSTACK(worker)   \
 	({ init_kthread_worker(&worker); worker; })
 # define DEFINE_KTHREAD_WORKER_ONSTACK(worker) \
struct kthread_worker worker = KTHREAD_WORKER_INIT_ONSTACK(worker)
-# define KTHREAD_WORK_INIT_ONSTACK(work, fn)   \
-	({ init_kthread_work((&work), fn); work; })
-# define DEFINE_KTHREAD_WORK_ONSTACK(work, fn) \
-   struct kthread_work work = KTHREAD_WORK_INIT_ONSTACK(work, fn)
 #else
 # define DEFINE_KTHREAD_WORKER_ONSTACK(worker) DEFINE_KTHREAD_WORKER(worker)
-# define DEFINE_KTHREAD_WORK_ONSTACK(work, fn) DEFINE_KTHREAD_WORK(work, fn)
 #endif
 
+# define DEFINE_KTHREAD_WORK_ONSTACK(work, fn) DEFINE_KTHREAD_WORK(work, fn)
+
 extern void __init_kthread_worker(struct kthread_worker *worker,
const char *name, struct lock_class_key *key);
 
@@ -127,7 +122,6 @@ extern void __init_kthread_worker(struct kthread_worker *worker,
memset((work), 0, sizeof(struct kthread_work)); \
INIT_LIST_HEAD(&(work)->node);  \
(work)->func = (fn);\
-   init_waitqueue_head(&(work)->done); \
} while (0)
 
 int kthread_worker_fn(void *worker_ptr);
-- 
1.7.4.4


Re: [PATCH v2 06/10] Input - wacom: prepare the driver to include BT devices

2014-07-25 Thread Dmitry Torokhov

On Fri, Jul 25, 2014 at 11:25:17PM -0400, Benjamin Tissoires wrote:
> On Jul 25 2014 or thereabouts, Dmitry Torokhov wrote:
> > Hi Benjamin,
> > 
> > On Thu, Jul 24, 2014 at 02:14:01PM -0400, Benjamin Tissoires wrote:
> > > Now that wacom is a hid driver, there is no point in having a separate
> > > driver for bluetooth devices.
> > > This patch prepares the common paths of Bluetooth devices in the
> > > common wacom driver.
> > > It also adds the sysfs file "speed" used by Bluetooth devices.
> > > 
> > > Signed-off-by: Benjamin Tissoires 
> > > ---
> > > 
> > > new in v2
> > > 
> > >  drivers/hid/wacom_sys.c | 70 ++---
> > >  drivers/hid/wacom_wac.h |  2 ++
> > >  2 files changed, 69 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
> > > index d0d06b8..add76ec 100644
> > > --- a/drivers/hid/wacom_sys.c
> > > +++ b/drivers/hid/wacom_sys.c
> > > @@ -262,6 +262,12 @@ static int wacom_set_device_mode(struct hid_device *hdev, int report_id,
> > >   return error < 0 ? error : 0;
> > >  }
> > >  
> > > +static int wacom_bt_query_tablet_data(struct hid_device *hdev, u8 speed,
> > > + struct wacom_features *features)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > >  /*
> > >   * Switch the tablet into its most-capable mode. Wacom tablets are
> > >   * typically configured to power-up in a mode which sends mouse-like
> > > @@ -272,6 +278,9 @@ static int wacom_set_device_mode(struct hid_device *hdev, int report_id,
> > >  static int wacom_query_tablet_data(struct hid_device *hdev,
> > >   struct wacom_features *features)
> > >  {
> > > + if (hdev->bus == BUS_BLUETOOTH)
> > > + return wacom_bt_query_tablet_data(hdev, 1, features);
> > > +
> > >   if (features->device_type == BTN_TOOL_FINGER) {
> > >   if (features->type > TABLETPC) {
> > >   /* MT Tablet PC touch */
> > > @@ -890,6 +899,38 @@ static void wacom_destroy_battery(struct wacom *wacom)
> > >   }
> > >  }
> > >  
> > > +static ssize_t wacom_show_speed(struct device *dev,
> > > + struct device_attribute
> > > + *attr, char *buf)
> > > +{
> > > + struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> > > + struct wacom *wacom = hid_get_drvdata(hdev);
> > > +
> > > + return snprintf(buf, PAGE_SIZE, "%i\n", wacom->wacom_wac.bt_high_speed);
> > > +}
> > > +
> > > +static ssize_t wacom_store_speed(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> > > + struct wacom *wacom = hid_get_drvdata(hdev);
> > > + int new_speed;
> > > +
> > > + if (sscanf(buf, "%1d", &new_speed ) != 1)
> > 
> > Checkpatch is unhappy with ')' placement and I agree with it.
> > 
> 
> ouch
> 
> > > + return -EINVAL;
> > 
> > kstrtou8?
> 
> re-ouch
> 
> > 
> > > +
> > > + if (new_speed == 0 || new_speed == 1) {
> > > + wacom_bt_query_tablet_data(hdev, new_speed,
> > > + &wacom->wacom_wac.features);
> > > + return strnlen(buf, PAGE_SIZE);
> > 
> > This is weird. Normally you want to return count since you should refuse
> > input with excessive data.
> 
> indeed
> 
> > 
> > > + } else
> > > + return -EINVAL;
> > 
> > Need braces on both branches.
> > 
> 
> Grmblmbl. I should not have blindly copied the code from one driver to
> another. I will send a v3 of the rest of the series on top of your
> wacom branch, at some point next week. Other people can still try to
> find other mistakes meanwhile ;)

Great, will be waiting for it and then will merge everything into next for
3.17.
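
Taken together, the review comments above point at a store handler shaped roughly like the following sketch. It is a userspace analogue rather than the driver code: `parse_speed` is a made-up name, and strtoul() stands in for the kernel's kstrtou8(), which is not available outside the kernel. The buffer is assumed to be NUL-terminated.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Userspace analogue of a sysfs store handler for a 0/1 attribute.
 * Returns count on success (the whole write is consumed) or -EINVAL
 * if the input is not exactly "0" or "1" plus an optional newline.
 */
ssize_t parse_speed(const char *buf, size_t count, unsigned char *speed)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;

	/* Allow one trailing newline, as "echo 1 > speed" would write. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;	/* refuse input with excess data */

	if (val == 0 || val == 1) {
		*speed = (unsigned char)val;
		return (ssize_t)count;
	} else {
		return -EINVAL;
	}
}
```

Returning count rather than the string length tells the caller the entire write was accepted, and both branches of the final if carry braces as the review asks.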

Thanks.

-- 
Dmitry

Re: [PATCH v2 00/10] Input - wacom: conversion to HID driver, series 2

2014-07-25 Thread Dmitry Torokhov

On Fri, Jul 25, 2014 at 11:21:27PM -0400, Benjamin Tissoires wrote:
> On Jul 25 2014 or thereabouts, Dmitry Torokhov wrote:
> > Hi Benjamin,
> > 
> > On Thu, Jul 24, 2014 at 02:13:55PM -0400, Benjamin Tissoires wrote:
> > > Hi Dmitry,
> > > 
> > > this is the second series I told you about for wacom.ko. This series also
> > > has a good number of removed lines of code. \o/
> > > 
> > > The first patch is Jason's one that I finally decided to take with me. His
> > > previous submission still applied correctly even after the moving of the
> > > files (git is definitely awesome).
> > > 
> > > The second one is a patch I sent earlier and forgot to include in the v2
> > > of the first series. It might have been dropped during my many rebases. So
> > > here it is.
> > > 
> > > The rest is for one part enhancing the battery reporting system (to make
> > > it equal to the one in hid-wacom, and even slightly better). The other part
> > > is the actual merge of hid-wacom into wacom which gives the same user
> > > space API for bluetooth and USB devices, fixes the
> > > pad-in-a-separate-input-dev, and fixes the missing tools not supported in
> > > the previous implementation of hid-wacom for Intuos 4 BT.
> > > 
> > 
> > I ended up taking 3.16-rc6 and applying your first series and the first
> > 5 patches of this series to it. You should be able to see the result on
> > kernel.org in wacom branch.
> 
> Cool. The resolution of the hid-core.c conflict is in fact better than
> what I proposed. It should not conflict with Jiri's pull request IMO. And
> when the two branches will hit Linus' we can then un-split hid-rmi and
> wacom processes.
> 
> I also noticed a few differences. Some of them you obviously made, I am
> fine with them BTW, and some other I think because your current next
> branch already has some wacom patches queued. I guess you will handle
> this just fine, as usual, but I'll try to keep an eye on it just in case
> git messed up the merge and forgot one branch.
> 
> Thanks again Dmitry. And sorry for having pushed you regarding that.

No worries, I need to be pushed sometimes ;)

-- 
Dmitry

Re: [PATCH v2 06/10] Input - wacom: prepare the driver to include BT devices

2014-07-25 Thread Benjamin Tissoires

On Jul 25 2014 or thereabouts, Dmitry Torokhov wrote:
> Hi Benjamin,
> 
> On Thu, Jul 24, 2014 at 02:14:01PM -0400, Benjamin Tissoires wrote:
> > Now that wacom is a hid driver, there is no point in having a separate
> > driver for bluetooth devices.
> > This patch prepares the common paths of Bluetooth devices in the
> > common wacom driver.
> > It also adds the sysfs file "speed" used by Bluetooth devices.
> > 
> > Signed-off-by: Benjamin Tissoires 
> > ---
> > 
> > new in v2
> > 
> >  drivers/hid/wacom_sys.c | 70 
> > ++---
> >  drivers/hid/wacom_wac.h |  2 ++
> >  2 files changed, 69 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
> > index d0d06b8..add76ec 100644
> > --- a/drivers/hid/wacom_sys.c
> > +++ b/drivers/hid/wacom_sys.c
> > @@ -262,6 +262,12 @@ static int wacom_set_device_mode(struct hid_device 
> > *hdev, int report_id,
> > return error < 0 ? error : 0;
> >  }
> >  
> > +static int wacom_bt_query_tablet_data(struct hid_device *hdev, u8 speed,
> > +   struct wacom_features *features)
> > +{
> > +   return 0;
> > +}
> > +
> >  /*
> >   * Switch the tablet into its most-capable mode. Wacom tablets are
> >   * typically configured to power-up in a mode which sends mouse-like
> > @@ -272,6 +278,9 @@ static int wacom_set_device_mode(struct hid_device 
> > *hdev, int report_id,
> >  static int wacom_query_tablet_data(struct hid_device *hdev,
> > struct wacom_features *features)
> >  {
> > +   if (hdev->bus == BUS_BLUETOOTH)
> > +   return wacom_bt_query_tablet_data(hdev, 1, features);
> > +
> > if (features->device_type == BTN_TOOL_FINGER) {
> > if (features->type > TABLETPC) {
> > /* MT Tablet PC touch */
> > @@ -890,6 +899,38 @@ static void wacom_destroy_battery(struct wacom *wacom)
> > }
> >  }
> >  
> > +static ssize_t wacom_show_speed(struct device *dev,
> > +   struct device_attribute
> > +   *attr, char *buf)
> > +{
> > +   struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> > +   struct wacom *wacom = hid_get_drvdata(hdev);
> > +
> > +   return snprintf(buf, PAGE_SIZE, "%i\n", wacom->wacom_wac.bt_high_speed);
> > +}
> > +
> > +static ssize_t wacom_store_speed(struct device *dev,
> > +   struct device_attribute *attr,
> > +   const char *buf, size_t count)
> > +{
> > +   struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> > +   struct wacom *wacom = hid_get_drvdata(hdev);
> > +   int new_speed;
> > +
> > > +   if (sscanf(buf, "%1d", &new_speed ) != 1)
> 
> Checkpatch is unhappy with ')' placement and I agree with it.
> 

ouch

> > +   return -EINVAL;
> 
> kstrtou8?

re-ouch

> 
> > +
> > +   if (new_speed == 0 || new_speed == 1) {
> > > +   wacom_bt_query_tablet_data(hdev, new_speed,
> > > +   &wacom->wacom_wac.features);
> > +   return strnlen(buf, PAGE_SIZE);
> 
> This is weird. Normally you want to return count since you should refuse
> input with excessive data.

indeed

> 
> > +   } else
> > +   return -EINVAL;
> 
> Need braces on both branches.
> 

Grmblmbl. I should not have blindly copied the code from one driver to
another. I will send a v3 of the rest of the series on top of your
wacom branch at some point next week. Other people can still try to
find other mistakes in the meantime ;)

Thanks for the review.

Cheers,
Benjamin
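
The review points above (parse with a kstrto*-style helper, return count, reject everything else) can be sketched in userspace C. This is only an illustration, not the v3 patch: parse_u8() is a simplified stand-in for the kernel's kstrtou8(), and store_speed(), query_tablet_data_stub() and last_speed are hypothetical names invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical: records the speed the stub below was last asked to set. */
static int last_speed = -1;

/* Hypothetical stand-in for wacom_bt_query_tablet_data(). */
static int query_tablet_data_stub(unsigned char speed)
{
	last_speed = speed;
	return 0;
}

/*
 * Simplified userspace stand-in for kstrtou8(): the whole string must be
 * one decimal number, optionally followed by a single trailing newline.
 */
static int parse_u8(const char *s, unsigned char *res)
{
	char *end;
	unsigned long val;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val > 255)
		return -ERANGE;

	*res = (unsigned char)val;
	return 0;
}

/* Shape of a sysfs-style store handler: consume all of count or fail. */
static ssize_t store_speed(const char *buf, size_t count)
{
	unsigned char new_speed;

	assert(buf != NULL);

	if (parse_u8(buf, &new_speed))
		return -EINVAL;

	if (new_speed != 0 && new_speed != 1)
		return -EINVAL;

	query_tablet_data_stub(new_speed);
	return count;
}
```

With this shape, "1\n" is accepted and reported as fully consumed, while "2", "1abc" and an empty string are all refused with -EINVAL.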


Re: [PATCH v2 00/10] Input - wacom: conversion to HID driver, series 2

2014-07-25 Thread Benjamin Tissoires

On Jul 25 2014 or thereabouts, Dmitry Torokhov wrote:
> Hi Benjamin,
> 
> On Thu, Jul 24, 2014 at 02:13:55PM -0400, Benjamin Tissoires wrote:
> > Hi Dmitry,
> > 
> > this is the second series I told you about for wacom.ko. This series also
> > has a good number of removed lines of code. \o/
> > 
> > The first patch is Jason's one that I finally decided to take with me. His
> > previous submission still applied correctly even after the files were
> > moved (git is definitely awesome).
> > 
> > The second one is a patch I sent earlier and forgot to include in the v2 of
> > the first series. It might have been dropped during my many rebases. So here
> > it is.
> > 
> > The rest is for one part enhancing the battery reporting system (to make it
> > equal to the one in hid-wacom, and even slightly better). The other part
> > is the actual merge of hid-wacom into wacom which gives the same user space
> > API for bluetooth and USB devices, fixes the pad-in-a-separate-input-dev,
> > and fixes the missing tools not supported in the previous implementation of
> > hid-wacom for Intuos 4 BT.
> > 
> 
> I ended up taking 3.16-rc6 and applying your first series and the first
> 5 patches of this series to it. You should be able to see the result on
> kernel.org in wacom branch.

Cool. The resolution of the hid-core.c conflict is in fact better than
what I proposed. It should not conflict with Jiri's pull request IMO. And
once the two branches hit Linus' tree we can then un-split the hid-rmi and
wacom processes.

I also noticed a few differences. Some of them you obviously made (I am
fine with them, BTW), and some others I think are there because your current
next branch already has some wacom patches queued. I guess you will handle
this just fine, as usual, but I'll try to keep an eye on it just in case
git messed up the merge and forgot one branch.

Thanks again Dmitry. And sorry for having pushed you regarding that.

Cheers,
Benjamin


security: oops on boot in __key_link_begin

2014-07-25 Thread Sasha Levin

Hi all,

I'm (sometimes) seeing the following when booting the most recent -next kernel:

[   31.319902] Loading compiled-in X.509 certificates
[   31.328118] BUG: unable to handle kernel paging request at 8b49ff42
[   31.328981] IP: assoc_array_insert (lib/assoc_array.c:480 
lib/assoc_array.c:1021)
[   31.329703] PGD 25a25067 PUD 25a26063 PMD 0
[   31.330218] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   31.330473] Dumping ftrace buffer:
[   31.330473](ftrace buffer empty)
[   31.330473] Modules linked in:
[   31.330473] CPU: 29 PID: 1 Comm: swapper/0 Tainted: GW  
3.16.0-rc6-next-20140725-sasha-00048-ga713fc0-dirty #937
[   31.330473] task: 8805f9c4 ti: 88030bcd4000 task.ti: 
88030bcd4000
[   31.330473] RIP: assoc_array_insert (lib/assoc_array.c:480 
lib/assoc_array.c:1021)
[   31.330473] RSP: :88030bcd7bc8  EFLAGS: 00010246
[   31.330473] RAX:  RBX: 8805f56ac208 RCX: dfff970a48e0
[   31.330473] RDX: 880a375b9b38 RSI: 00fc RDI: 880a375b9a50
[   31.330473] RBP: 88030bcd7cc8 R08: 0004 R09: 0004
[   31.330473] R10: 880b078d5854 R11:  R12: 8805f56ac209
[   31.330473] R13: a41c0880 R14: 88030bcd7d30 R15: 880a375b9a40
[   31.330473] FS:  () GS:8805ffc0() 
knlGS:
[   31.330473] CS:  0010 DS:  ES:  CR0: 8005003b
[   31.330473] CR2: 8b49ff42 CR3: 25a22000 CR4: 06a0
[   31.330473] Stack:
[   31.330473]  88030bcd7cc8 8805f9c4 a7b8d5a0 
0001
[   31.330473]  8803095f8090 88030bcd7c08 9f2115cd 
0001
[   31.330473]  88030bcd7c20 9f2117b1  
88030bcd7c30
[   31.330473] Call Trace:
[   31.330473] ? get_parent_ip (kernel/sched/core.c:2561)
[   31.330473] ? preempt_count_sub (kernel/sched/core.c:2617)
[   31.330473] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 
kernel/locking/lockdep.c:254)
[   31.330473] ? __key_link_begin (./arch/x86/include/asm/bitops.h:311 
security/keys/keyring.c:1073)
[   31.330473] ? __key_link_begin (./arch/x86/include/asm/bitops.h:311 
security/keys/keyring.c:1073)
[   31.330473] __key_link_begin (security/keys/keyring.c:1088)
[   31.330473] key_create_or_update (security/keys/key.c:840 (discriminator 4))
[   31.330473] load_system_certificate_list (kernel/system_keyring.c:88)
[   31.330473] ? system_trusted_keyring_init (kernel/system_keyring.c:56)
[   31.330473] do_one_initcall (init/main.c:792)
[   31.330473] kernel_init_freeable (init/main.c:858 init/main.c:866 
init/main.c:885 init/main.c:1006)
[   31.330473] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
[   31.330473] ? do_one_initcall (init/main.c:933)
[   31.330473] kernel_init (init/main.c:938)
[   31.330473] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[   31.330473] ? do_one_initcall (init/main.c:933)
[ 31.330473] Code: 67 40 e8 19 ba 42 ff 48 8d 43 10 49 89 47 30 49 8d bf f0 00 
00 00 e8 05 ba 42 ff 48 8b bc 24 88 00 00 00 49 89 9f f0 00 00 00 55  b8 42 
ff 49 8b 5f 10 49 8d bf 00 01 00 00 e8 e1 b9 42 ff 49
All code
========
   0:   67 40 e8 19 ba 42 ff    addr32 rex callq 0xff42ba20
   7:   48 8d 43 10             lea    0x10(%rbx),%rax
   b:   49 89 47 30             mov    %rax,0x30(%r15)
   f:   49 8d bf f0 00 00 00    lea    0xf0(%r15),%rdi
  16:   e8 05 ba 42 ff          callq  0xff42ba20
  1b:   48 8b bc 24 88 00 00    mov    0x88(%rsp),%rdi
  22:   00
  23:   49 89 9f f0 00 00 00    mov    %rbx,0xf0(%r15)
  2a:   55                      push   %rbp
  2b:*  c1 b8 42 ff 49 8b 5f    sarl   $0x5f,-0x74b600be(%rax)  <-- trapping instruction
  32:   10 49 8d                adc    %cl,-0x73(%rcx)
  35:   bf 00 01 00 00          mov    $0x100,%edi
  3a:   e8 e1 b9 42 ff          callq  0xff42ba20
  3f:   49                      rex.WB
...

Code starting with the faulting instruction
===========================================
   0:   c1 b8 42 ff 49 8b 5f    sarl   $0x5f,-0x74b600be(%rax)
   7:   10 49 8d                adc    %cl,-0x73(%rcx)
   a:   bf 00 01 00 00          mov    $0x100,%edi
   f:   e8 e1 b9 42 ff          callq  0xff42b9f5
  14:   49                      rex.WB
...
[   31.330473] RIP assoc_array_insert (lib/assoc_array.c:480 
lib/assoc_array.c:1021)
[   31.330473]  RSP 
[   31.330473] CR2: 8b49ff42


Thanks,
Sasha

[PATCH 3/3] workqueue: cleanup may_start_working()

2014-07-25 Thread Lai Jiangshan

The name may_start_working() became misleading because the semantics of
"!pool->nr_idle" changed: any worker can now start working regardless of
the value of pool->nr_idle.

So remove may_start_working() and use "!pool->nr_idle" directly.
need_to_create_worker() is removed along with it.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   37 +
 1 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ce8e3fc..e1ab4f9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -735,12 +735,6 @@ static bool need_more_worker(struct worker_pool *pool)
return !list_empty(&pool->worklist) && __need_more_worker(pool);
 }
 
-/* Can I start working?  Called from busy but !running workers. */
-static bool may_start_working(struct worker_pool *pool)
-{
-   return pool->nr_idle;
-}
-
 /* Do I need to keep working?  Called from currently running workers. */
 static bool keep_working(struct worker_pool *pool)
 {
@@ -748,12 +742,6 @@ static bool keep_working(struct worker_pool *pool)
atomic_read(&pool->nr_running) <= 1;
 }
 
-/* Do we need a new worker?  Called from manager. */
-static bool need_to_create_worker(struct worker_pool *pool)
-{
-   return need_more_worker(pool) && !may_start_working(pool);
-}
-
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
@@ -1815,19 +1803,20 @@ static void pool_mayday_timeout(unsigned long __pool)
spin_lock_irq(&wq_mayday_lock); /* for wq->maydays */
spin_lock(&pool->lock);
 
-   if (need_to_create_worker(pool)) {
-   /*
-* We've been trying to create a new worker but
-* haven't been successful.  We might be hitting an
-* allocation deadlock.  Send distress signals to
-* rescuers.
-*/
-   list_for_each_entry(work, &pool->worklist, entry)
-   send_mayday(work);
-   }
+   if (!pool->nr_idle) {
+   if (need_more_worker(pool)) {
+   /*
+* We've been trying to create a new worker but
+* haven't been successful.  We might be hitting an
+* allocation deadlock.  Send distress signals to
+* rescuers.
+*/
+   list_for_each_entry(work, &pool->worklist, entry)
+   send_mayday(work);
+   }
 
-   if (!pool->nr_idle)
mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
+   }
 
spin_unlock(&pool->lock);
spin_unlock_irq(&wq_mayday_lock);
@@ -2097,7 +2086,7 @@ woke_up:
goto sleep;
 
/* do we need to create worker? */
-   if (unlikely(!may_start_working(pool)))
+   if (unlikely(!pool->nr_idle))
start_creater_work(pool);
 
/*
-- 
1.7.4.4
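
The restructured pool_mayday_timeout() hunk above is meant to be behavior-preserving. A small userspace model can make that easier to see. This is only a sketch: struct outcome, timeout_before(), timeout_after() and same_outcome() are hypothetical names, and the two booleans stand in for "send_mayday() was called" and "mod_timer() was called".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what the timer callback decided to do. */
struct outcome {
	bool mayday_sent;
	bool timer_rearmed;
};

/* Old shape: need_to_create_worker() followed by a separate nr_idle test. */
static struct outcome timeout_before(int nr_idle, bool need_more_worker)
{
	struct outcome o = { false, false };
	bool may_start_working = nr_idle != 0;

	/* need_to_create_worker() == need_more_worker() && !may_start_working() */
	if (need_more_worker && !may_start_working)
		o.mayday_sent = true;

	if (!nr_idle)
		o.timer_rearmed = true;

	return o;
}

/* New shape: both decisions nested under a single !nr_idle test. */
static struct outcome timeout_after(int nr_idle, bool need_more_worker)
{
	struct outcome o = { false, false };

	if (!nr_idle) {
		if (need_more_worker)
			o.mayday_sent = true;
		o.timer_rearmed = true;
	}

	return o;
}

/* True when both shapes make the same decisions for the given state. */
static bool same_outcome(int nr_idle, bool need_more_worker)
{
	struct outcome a = timeout_before(nr_idle, need_more_worker);
	struct outcome b = timeout_after(nr_idle, need_more_worker);

	return a.mayday_sent == b.mayday_sent &&
	       a.timer_rearmed == b.timer_rearmed;
}
```

In this model the two shapes agree for every combination of nr_idle and need_more_worker, which is consistent with the patch being described as a cleanup.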


[PATCH 1/3] workqueue: migrate the new worker before add it to idle_list

2014-07-25 Thread Lai Jiangshan

There is an undocumented requirement for create_worker(): except for the
first call, it can only be called from an existing worker (i.e. the manager).

The reason is that the current create_worker() first queues the new worker
to the idle_list and then wakes it up.  But the new worker is not
guaranteed to be migrated until it is woken up.  Thus
wq_worker_sleeping() may see the new non-local worker on the idle_list
if this block of code is not executed on the local CPU (which would keep
wq_worker_sleeping() from running).  An existing worker is guaranteed to run
on the local CPU when !DISASSOCIATED, so create_worker() is currently
required to be called from an existing worker/manager only.

But we are planning to allow create_worker() to be called from outside
its workers, so the requirement should be relaxed.  So we swap
the order of the code: the new worker is woken up before it is queued
to the idle_list.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 370f947..1d44d8d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1708,8 +1708,13 @@ static struct worker *create_worker(struct worker_pool 
*pool)
/* start the newly created worker */
spin_lock_irq(&pool->lock);
worker->pool->nr_workers++;
-   worker_enter_idle(worker);
+   /*
+* Wake up the worker first and then queue it to the idle_list,
+* so that it is ensured that wq_worker_sleeping() sees that the worker
+* has been migrated properly when it sees this worker in the idle_list.
+*/
wake_up_process(worker->task);
+   worker_enter_idle(worker);
spin_unlock_irq(&pool->lock);
 
return worker;
-- 
1.7.4.4


[PATCH 2/3] workqueue: use dedicated creater kthread for all pools

2014-07-25 Thread Lai Jiangshan

There are some problems with the managers:
  1) The last idle worker prefers managing to processing.
 It is better for the processing of work items to be the first
 priority so that the whole system makes progress earlier.
  2) Managers of different pools can run in parallel, but their
 major work is actually serialized on the kernel thread "kthreadd".
 These managers sleep and are wasted when the system is short
 of memory.

This patch introduces a dedicated creater kthread which offloads the
managing from the workers, so every worker spends its effort on processing
work rather than creating workers, and no manager is wasted sleeping
when the system is short of memory.  This dedicated creater kthread causes
a little more work to be serialized than before, but that is acceptable.

Implementation details:
  1) the creater_work: does the creation, creation exclusion
  2) the cooldown timer: cools down on failure
  3) the mayday timer: dismisses itself when pool->nr_idle > 0
  4) the semantics of "pool->nr_idle == 0": activate the creater_work, cooldown and mayday timer
  5) the routine of the creater_work: creates a worker on two conditions
  6) worker_thread(): starts management without unlock/wait/recheck/retry
  7) put_unbound_pool(): groups destruction code by functionality
  8) init_workqueues(): inits the creater kthread earlier than pools/workers

  1) the creater_work
Every pool has a struct kthread_work creater_work to create workers, and
the dedicated creater kthread processes the creater_works of
all pools. struct kthread_work has its own execution exclusion, so we don't
need the manager_arb to handle the creation exclusion any more.
put_unbound_pool() uses flush_kthread_work() to synchronize with
the creation rather than using the manager_arb.

  2) the cooldown timer
The cooldown timer is introduced to implement the cool-down mechanism
rather than causing the creater to sleep.  When create_worker()
fails, the cooldown timer is armed and it will restart the creater_work.

  3) the mayday timer
The mayday timer is changed so that it doesn't restart itself when
pool->nr_idle > 0.  If it always restarted itself as before, we would add
a lot of complication to the creater_work.  The creater_work would need to
call del_timer_sync() after a successful creation and grab the pool->lock
to check and restart the timer when pool->nr_idle becomes zero.
We don't want that complication, so let the timer dismiss itself.

  4) the semantics of "pool->nr_idle == 0"
At any moment when pool->nr_idle == 0, the pool must be in the creating state.
The mayday timer must be active (pending or running) to ensure the mayday
can be sent, and at least one of the creater_work or the cooldown timer
must be active to ensure the creation is in progress or on standby.  So the
last worker that causes pool->nr_idle to drop to 0 has the responsibility
to kick the mayday timer and the creater_work.  may_start_working()
becomes misleading because the meaning of "!pool->nr_idle" has changed and
any worker should start working regardless of the value of pool->nr_idle.
may_start_working() will be cleaned up in the next patch so that
this patch is shorter to review.

  5) the routine of the creater_work
The creater_work creates a worker under these two conditions:
  A) pool->nr_idle == 0
 A new worker obviously needs to be created even if there is no
 work item pending.  The busy workers may sleep and the pool can't
 serve future work items if no new worker is on standby or
 being created.
  B) pool->nr_idle == 1 && pool->nr_running == 0
 It should create a worker, although this is not strictly needed since we
 still have a free idle worker, which can restart the creation when it
 becomes busy.  But if we don't create a worker in this condition, the
 condition may occur frequently: a work item is queued and the
 last idle worker starts the creater_work, but if the work item
 finishes before the creater_work runs, the condition happens again,
 and it can happen again and again in this way.  So we had better
 create a worker in this condition.
The creater_work quits directly in all other conditions.
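
The two conditions A) and B) above can be written as a single predicate. This is a minimal sketch of the decision as described in the text, not the code from the patch; creater_should_create() is a hypothetical name, and the plain int arguments stand in for pool->nr_idle and the value of pool->nr_running.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the creater_work should create a
 * worker, following conditions A) and B) from the description above.
 */
static bool creater_should_create(int nr_idle, int nr_running)
{
	assert(nr_idle >= 0 && nr_running >= 0);

	/* A) no idle worker left: always create one */
	if (nr_idle == 0)
		return true;

	/* B) last idle worker and nothing running: create one eagerly */
	if (nr_idle == 1 && nr_running == 0)
		return true;

	/* all other conditions: quit directly */
	return false;
}
```

For example, a pool with one idle worker and one running worker would not trigger a creation in this sketch, while the same pool with no running worker would.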

  6) worker_thread()
There is no recheck or retry when creation is required in worker_thread();
it just kicks the mayday timer and the creater work and goes on to
process the work items directly.

  7) put_unbound_pool()
put_unbound_pool() groups code by functionality, not by name: it
stops the creation activity (creater_work, cooldown_timer, mayday_timer)
first and then stops the idle workers and the idle_timer.

  8) init_workqueues()
The struct kthread_worker kworker_creater is initialized earlier than the
worker_pools in init_workqueues() so that kworker_creater_thread is
created earlier than all early kworkers.  Although the early kworkers do not
depend on kworker_creater_thread, this initialization order makes
the pid of

[PATCH 0/3] workqueue: offload the worker-management out from kworker

2014-07-25 Thread Lai Jiangshan

Currently a kworker prefers creating a worker (if required) to processing work
items; we would like processing to be the first priority.

The jobs in managers are serialized, so having multiple managers is just
wasteful; a single worker-creater is enough.

Implementing the manager inside the worker causes a lot of complication and
trickiness; using a dedicated creater makes things more flexible.

So we offload the worker management from the kworkers into a single
dedicated creater kthread.  This is done in patch 2.  Patch 1 is
preparation and patch 3 is a cleanup patch.

Lai Jiangshan (3):
  workqueue: migrate the new worker before add it to idle_list
  workqueue: use dedicated creater kthread for all pools
  workqueue: cleanup may_start_working()

 kernel/workqueue.c |  228 ++--
 1 files changed, 96 insertions(+), 132 deletions(-)

-- 
1.7.4.4


[RFC PATCH 00/11] Refactor MSI to support Non-PCI device

2014-07-25 Thread Yijing Wang

Hi all,
The series is a draft of a generic MSI driver that supports PCI
and non-PCI devices which have MSI capability. If you're not interested
in it, sorry for the noise.

The series is based on Linux-3.16-rc1.

MSI was introduced in PCI Spec 2.2. Currently, the kernel MSI
driver code is bound to PCI devices. Because MSI has a lot of
advantages in its design, more and more non-PCI devices want to
use MSI as their default interrupt. Existing MSI devices
include HPET. The HPET driver provides its own MSI code to initialize
and process MSI interrupts. In the latest GIC v3 spec, a legacy device
can deliver MSI with the help of a relay device named the consolidator.
The consolidator can translate the legacy interrupts connected to it
to MSI/MSI-X. And new non-PCI devices will be designed to
support MSI in the future. So making the MSI driver code generic will
help non-PCI devices use MSI more simply.

The new data struct for the generic MSI driver:
struct msi_irqs {
	u8 msi_enabled:1; /* Enable flag */
	u8 msix_enabled:1;
	struct list_head msi_list; /* MSI desc list */
	void *data; /* help to find the MSI device */
	struct msi_ops *ops; /* MSI device specific hook */
};
struct msi_irqs is used to manage MSI-related information. Every device that
supports MSI should contain this data struct and allocate it.

struct msi_ops {
	struct msi_desc *(*msi_setup_entry)(struct msi_irqs *msi, struct msi_desc *entry);
	int (*msix_setup_entries)(struct msi_irqs *msi, struct msix_entry *entries);
	u32 (*msi_mask_irq)(struct msi_desc *desc, u32 mask, u32 flag);
	u32 (*msix_mask_irq)(struct msi_desc *desc, u32 flag);
	void (*msi_read_message)(struct msi_desc *desc, struct msi_msg *msg);
	void (*msi_write_message)(struct msi_desc *desc, struct msi_msg *msg);
	void (*msi_set_intx)(struct msi_irqs *msi, int enable);
};
struct msi_ops provides several hook functions; the generic MSI driver will call
the hook functions to access device-specific registers. PCI devices will share
the same msi_ops, because they access the MSI hardware registers in the same way.

The generic MSI layer exports the msi_capability_init() and
msix_capability_init() functions to drivers. msi/x_capability_init() will
initialize the MSI capability data struct msi_desc and allocate the irq, then
write the MSI address/data value to the hardware registers.
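
To illustrate the hook idea, here is a userspace sketch of how a non-PCI device might supply its own read/write message hooks while a generic layer only goes through the ops table. Everything here is an assumption made for illustration: struct msi_ops_sketch carries only two of the proposed hooks, struct msi_desc_sketch and struct fake_dev_regs are invented, and generic_program_msi() and roundtrip_data() are hypothetical helpers, not functions from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct msi_msg. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical register block of a non-PCI device with MSI support. */
struct fake_dev_regs {
	uint32_t msi_addr_lo;
	uint32_t msi_addr_hi;
	uint32_t msi_data;
};

/* Hypothetical, heavily reduced descriptor: just a pointer to the device. */
struct msi_desc_sketch {
	struct fake_dev_regs *regs;
};

/* Two of the hooks from the proposed struct msi_ops, nothing more. */
struct msi_ops_sketch {
	void (*msi_read_message)(struct msi_desc_sketch *desc, struct msi_msg *msg);
	void (*msi_write_message)(struct msi_desc_sketch *desc, struct msi_msg *msg);
};

/* Device-specific hook: copy the message into the device registers. */
static void fake_write_message(struct msi_desc_sketch *desc, struct msi_msg *msg)
{
	desc->regs->msi_addr_lo = msg->address_lo;
	desc->regs->msi_addr_hi = msg->address_hi;
	desc->regs->msi_data = msg->data;
}

/* Device-specific hook: read the message back from the device registers. */
static void fake_read_message(struct msi_desc_sketch *desc, struct msi_msg *msg)
{
	msg->address_lo = desc->regs->msi_addr_lo;
	msg->address_hi = desc->regs->msi_addr_hi;
	msg->data = desc->regs->msi_data;
}

static const struct msi_ops_sketch fake_dev_msi_ops = {
	.msi_read_message = fake_read_message,
	.msi_write_message = fake_write_message,
};

/* Generic-layer-style helper: touches the device only through the hooks. */
static void generic_program_msi(const struct msi_ops_sketch *ops,
				struct msi_desc_sketch *desc,
				uint32_t addr_lo, uint32_t data)
{
	struct msi_msg msg = { addr_lo, 0, data };

	assert(ops != NULL && ops->msi_write_message != NULL);
	ops->msi_write_message(desc, &msg);
}

/* Program a fresh fake device and return the data value read back. */
static uint32_t roundtrip_data(uint32_t addr_lo, uint32_t data)
{
	struct fake_dev_regs regs = { 0, 0, 0 };
	struct msi_desc_sketch desc = { &regs };
	struct msi_msg readback;

	generic_program_msi(&fake_dev_msi_ops, &desc, addr_lo, data);
	fake_dev_msi_ops.msi_read_message(&desc, &readback);
	return readback.data;
}
```

The point of the split is that generic_program_msi() never needs to know whether the registers live in PCI config space or in a memory-mapped block; only the ops table changes per device type.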

This series has only been compile tested; we will test it on x86 and arm
platforms later.

Any comments are welcome.

Thanks!
Yijing.




Yijing Wang (11):
  PCI/MSI: Use pci_dev->msi_cap instead of msi_desc->msi_attrib.pos
  PCI/MSI: Use new MSI type macro instead of PCI MSI flags
  PCI/MSI: Refactor pci_dev_msi_enabled()
  PCI/MSI: Move MSIX table address mapping out of msix_capability_init
  PCI/MSI: Move populate_msi_sysfs() out of msi_capability_init()
  PCI/MSI: Save MSI irq in PCI MSI layer
  PCI/MSI: Mask MSI-X entry in msix_setup_entries()
  PCI/MSI: Introduce new struct msi_irqs and struct msi_ops
  PCI/MSI: refactor PCI MSI driver
  PCI/MSI: Split the generic MSI code into new file
  x86/MSI: Refactor x86 MSI code

 arch/cris/arch-v32/drivers/pci/bios.c |2 +-
 arch/frv/mb93090-mb00/pci-vdk.c   |2 +-
 arch/ia64/pci/pci.c   |4 +-
 arch/mips/pci/msi-octeon.c|8 +-
 arch/powerpc/kernel/eeh_driver.c  |2 +-
 arch/powerpc/kernel/msi.c |2 +-
 arch/powerpc/platforms/pseries/msi.c  |8 +-
 arch/s390/pci/pci.c   |2 +-
 arch/x86/include/asm/io_apic.h|2 +-
 arch/x86/include/asm/irq_remapping.h  |4 +-
 arch/x86/include/asm/pci.h|6 +-
 arch/x86/include/asm/x86_init.h   |   10 +-
 arch/x86/kernel/apic/io_apic.c|   25 +-
 arch/x86/kernel/x86_init.c|   12 +-
 arch/x86/pci/common.c |5 +-
 arch/x86/pci/xen.c|   24 +-
 drivers/Kconfig   |1 +
 drivers/Makefile  |1 +
 drivers/block/nvme-core.c |4 +-
 drivers/dma/ioat/dma.c|2 +-
 drivers/firewire/ohci.c   |2 +-
 drivers/gpu/drm/i915/i915_dma.c   |4 +-
 drivers/iommu/amd_iommu.c |   16 +-
 drivers/iommu/intel_irq_remapping.c   |9 +-
 drivers/iommu/irq_remapping.c |   53 ++--
 drivers/iommu/irq_remapping.h |6 +-
 drivers/irqchip/irq-armada-370-xp.c   |2 +-
 drivers/misc/mei/hw-me.c  |2 +-
 drivers/misc/mei/hw-txe.c |2 +-
 drivers/misc/mei/pci-me.c |4 +-
 drivers/misc/mei/pci-txe.c|4 +-
 drivers/misc/mic/host/mic_debugfs.c   |4 +-
 drivers/misc/mic/host/mic_intr.c  |8 +-
 drivers/msi/Kconfig   |8 +
 drivers/msi/Makefile  |1 +
 drivers/msi/msi.c

[RFC PATCH 06/11] PCI/MSI: Save MSI irq in PCI MSI layer

2014-07-25 Thread Yijing Wang

Save MSI irq in PCI MSI layer, this is preparation
for generic MSI.

Signed-off-by: Yijing Wang 
---
 drivers/pci/msi.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 21b16e0..f96dd38 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -650,8 +650,6 @@ static int msi_capability_init(struct pci_dev *dev, int 
nvec)
pci_intx_for_msi(dev, 0);
msi_set_enable(dev, 1);
dev->msi_enabled = 1;
-
-   dev->irq = entry->irq;
return 0;
 }
 
@@ -1059,6 +1057,7 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, 
int maxvec)
 {
int nvec;
int rc;
+   struct msi_desc *entry;
 
if (dev->current_state != PCI_D0)
return -EINVAL;
@@ -1114,6 +1113,8 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, 
int maxvec)
return rc;
}
 
+   entry = list_entry(dev->msi_list.next, struct msi_desc, list);
+   dev->irq = entry->irq;
return nvec;
 }
 EXPORT_SYMBOL(pci_enable_msi_range);
-- 
1.7.1
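
The added line "entry = list_entry(dev->msi_list.next, struct msi_desc, list)" picks the first descriptor on the device's MSI list. A userspace sketch of that lookup follows. The list helpers are simplified re-implementations written for this example, struct msi_desc_sketch is a reduced hypothetical descriptor, and first_irq() is an invented helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list head, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reduced descriptor: only the fields this example needs. */
struct msi_desc_sketch {
	unsigned int irq;
	struct list_head list;
};

static void list_init_sketch(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append an entry at the tail of the list. */
static void list_add_tail_sketch(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Build a two-entry list and return the irq of the first descriptor. */
static unsigned int first_irq(unsigned int irq_a, unsigned int irq_b)
{
	struct list_head msi_list;
	struct msi_desc_sketch a = { .irq = irq_a };
	struct msi_desc_sketch b = { .irq = irq_b };
	struct msi_desc_sketch *entry;

	list_init_sketch(&msi_list);
	list_add_tail_sketch(&a.list, &msi_list);
	list_add_tail_sketch(&b.list, &msi_list);

	/* Same shape as the lookup in pci_enable_msi_range() above. */
	assert(msi_list.next != &msi_list);
	entry = list_entry_sketch(msi_list.next, struct msi_desc_sketch, list);
	return entry->irq;
}
```

Note that this kind of lookup is only meaningful on a non-empty list; on an empty list, head.next points back at the head itself and the computed entry pointer would not refer to a real descriptor.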


[RFC PATCH 02/11] PCI/MSI: Use new MSI type macro instead of PCI MSI flags

2014-07-25 Thread Yijing Wang

Add new MSI type macros (MSI_TYPE and MSIX_TYPE) to support
the future generic MSI driver. The coming generic MSI driver
will be used by PCI and non-PCI devices that have MSI capability.

Signed-off-by: Yijing Wang 
---
 arch/mips/pci/msi-octeon.c   |4 ++--
 arch/powerpc/kernel/msi.c|2 +-
 arch/powerpc/platforms/pseries/msi.c |8 
 arch/s390/pci/pci.c  |2 +-
 arch/x86/kernel/apic/io_apic.c   |2 +-
 arch/x86/pci/xen.c   |   24 
 drivers/iommu/irq_remapping.c|2 +-
 drivers/irqchip/irq-armada-370-xp.c  |2 +-
 drivers/pci/msi.c|   10 +-
 include/linux/msi.h  |3 +++
 10 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/mips/pci/msi-octeon.c b/arch/mips/pci/msi-octeon.c
index 6a6a99f..8105610 100644
--- a/arch/mips/pci/msi-octeon.c
+++ b/arch/mips/pci/msi-octeon.c
@@ -192,14 +192,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, 
int type)
/*
 * MSI-X is not supported.
 */
-   if (type == PCI_CAP_ID_MSIX)
+   if (type == MSIX_TYPE)
return -EINVAL;
 
/*
 * If an architecture wants to support multiple MSI, it needs to
 * override arch_setup_msi_irqs()
 */
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   if (type == MSI_TYPE && nvec > 1)
return 1;
 
list_for_each_entry(entry, &dev->msi_list, list) {
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..05b3133 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -21,7 +21,7 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int 
type)
}
 
/* PowerPC doesn't support multiple MSI yet */
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   if (type == MSI_TYPE && nvec > 1)
return 1;
 
if (ppc_md.msi_check_device) {
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 0c882e8..e2f27d6 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -339,7 +339,7 @@ static int rtas_msi_check_device(struct pci_dev *pdev, int 
nvec, int type)
 {
int quota, rc;
 
-   if (type == PCI_CAP_ID_MSIX)
+   if (type == MSIX_TYPE)
rc = check_req_msix(pdev, nvec);
else
rc = check_req_msi(pdev, nvec);
@@ -406,14 +406,14 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
if (!pdn)
return -ENODEV;
 
-   if (type == PCI_CAP_ID_MSIX && check_msix_entries(pdev))
+   if (type == MSIX_TYPE && check_msix_entries(pdev))
return -EINVAL;
 
/*
 * Firmware currently refuse any non power of two allocation
 * so we round up if the quota will allow it.
 */
-   if (type == PCI_CAP_ID_MSIX) {
+   if (type == MSIX_TYPE) {
int m = roundup_pow_of_two(nvec);
int quota = msi_quota_for_device(pdev, m);
 
@@ -427,7 +427,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
 * return MSI-Xs.
 */
 again:
-   if (type == PCI_CAP_ID_MSI) {
+   if (type == MSI_TYPE) {
if (pdn->force_32bit_msi) {
rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
if (rc < 0) {
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 9ddc51e..fe3a40c 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -407,7 +407,7 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int 
type)
struct msi_msg msg;
int rc, irq;
 
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   if (type == MSI_TYPE && nvec > 1)
return 1;
msi_vecs = min(nvec, ZPCI_MSI_VEC_MAX);
msi_vecs = min_t(unsigned int, msi_vecs, CONFIG_PCI_NR_MSI);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 81e08ef..b833042 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3069,7 +3069,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, 
int type)
int node, ret;
 
/* Multiple MSI vectors only supported with interrupt remapping */
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   if (type == MSI_TYPE && nvec > 1)
return 1;
 
node = dev_to_node(&dev->dev);
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..c19a8de 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -162,14 +162,14 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int 
nvec, int type)
struct msi_desc *msidesc;
int *v;
 
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   if (type == MSI_TYPE && nvec > 1)
return 1;
 
v = kzalloc(sizeof(int) * max(1, nvec), GFP_KERNEL);

[RFC PATCH 05/11] PCI/MSI: Move populate_msi_sysfs() out of msi_capability_init()

2014-07-25 Thread Yijing Wang

Some non-PCI devices don't need to create sysfs objects,
so move populate_msi_sysfs() out of the generic MSI functions
msi/x_capability_init().

Signed-off-by: Yijing Wang 
---
 drivers/pci/msi.c |   31 ++-
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 116383c..21b16e0 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -646,13 +646,6 @@ static int msi_capability_init(struct pci_dev *dev, int 
nvec)
return ret;
}
 
-   ret = populate_msi_sysfs(dev);
-   if (ret) {
-   msi_mask_irq(entry, mask, ~mask);
-   free_msi_irqs(dev);
-   return ret;
-   }
-
/* Set MSI enabled bits  */
pci_intx_for_msi(dev, 0);
msi_set_enable(dev, 1);
@@ -760,10 +753,6 @@ static int msix_capability_init(struct pci_dev *dev, void 
__iomem *base,
 
msix_program_entries(dev, entries);
 
-   ret = populate_msi_sysfs(dev);
-   if (ret)
-   goto out_free;
-
/* Set MSI-X enabled bits and unmask the function */
pci_intx_for_msi(dev, 0);
dev->msix_enabled = 1;
@@ -789,7 +778,6 @@ out_avail:
ret = avail;
}
 
-out_free:
free_msi_irqs(dev);
 
return ret;
@@ -939,7 +927,7 @@ EXPORT_SYMBOL(pci_msix_vec_count);
 int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
 {
int status, nr_entries;
-   int i, j;
+   int i, j, ret;
void __iomem *base;
u16 control;
 
@@ -980,6 +968,14 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry 
*entries, int nvec)
return -ENOMEM;
 
status = msix_capability_init(dev, base, entries, nvec);
+   if (!status) {
+   ret = populate_msi_sysfs(dev);
+   if (ret) {
+   dev->msix_enabled = 0;
+   pci_intx_for_msi(dev, 1);
+   free_msi_irqs(dev);
+   }
+   }
return status;
 }
 EXPORT_SYMBOL(pci_enable_msix);
@@ -1109,6 +1105,15 @@ int pci_enable_msi_range(struct pci_dev *dev, int 
minvec, int maxvec)
}
} while (rc);
 
+   rc = populate_msi_sysfs(dev);
+   if (rc) {
+   msi_set_enable(dev, 0);
+   pci_intx_for_msi(dev, 1);
+   dev->msi_enabled = 0;
+   free_msi_irqs(dev);
+   return rc;
+   }
+
return nvec;
 }
 EXPORT_SYMBOL(pci_enable_msi_range);
-- 
1.7.1


[RFC PATCH 04/11] PCI/MSI: Move MSIX table address mapping out of msix_capability_init

2014-07-25 Thread Yijing Wang

Move the MSI-X table address mapping work to the PCI MSI-X layer.
Some non-PCI MSI devices do their address mapping before enabling
the MSI-X capability, or their MSI-X table address lies within the
device address block. So move the address mapping out of the
generic MSI-X core. This is preparation for a generic MSI driver.

Suggested-by: Yun Wu 
Signed-off-by: Yijing Wang 
---
 drivers/pci/msi.c |   25 +
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5c8e56..116383c 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -668,8 +668,8 @@ static void __iomem *msix_map_region(struct pci_dev *dev, 
unsigned nr_entries)
u32 table_offset;
u8 bir;
 
-   pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE,
- &table_offset);
+   pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE, 
+   &table_offset);
bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR);
table_offset &= PCI_MSIX_TABLE_OFFSET;
phys_addr = pci_resource_start(dev, bir) + table_offset;
@@ -734,22 +734,14 @@ static void msix_program_entries(struct pci_dev *dev,
  * single MSI-X irq. A return of zero indicates the successful setup of
  * requested MSI-X entries with allocated irqs or non-zero for otherwise.
  **/
-static int msix_capability_init(struct pci_dev *dev,
+static int msix_capability_init(struct pci_dev *dev, void __iomem *base,
struct msix_entry *entries, int nvec)
 {
int ret;
-   u16 control;
-   void __iomem *base;
 
/* Ensure MSI-X is disabled while it is set up */
msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 
-   pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
-   /* Request & Map MSI-X table region */
-   base = msix_map_region(dev, msix_table_size(control));
-   if (!base)
-   return -ENOMEM;
-
ret = msix_setup_entries(dev, base, entries, nvec);
if (ret)
return ret;
@@ -948,6 +940,8 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry 
*entries, int nvec)
 {
int status, nr_entries;
int i, j;
+   void __iomem *base;
+   u16 control;
 
if (!entries || !dev->msix_cap || dev->current_state != PCI_D0)
return -EINVAL;
@@ -978,7 +972,14 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry 
*entries, int nvec)
dev_info(&dev->dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
return -EINVAL;
}
-   status = msix_capability_init(dev, entries, nvec);
+
+   /* Request & Map MSI-X table region */
+   pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
+   base = msix_map_region(dev, msix_table_size(control));
+   if (!base)
+   return -ENOMEM;
+
+   status = msix_capability_init(dev, base, entries, nvec);
return status;
 }
 EXPORT_SYMBOL(pci_enable_msix);
-- 
1.7.1


[RFC PATCH 03/11] PCI/MSI: Refactor pci_dev_msi_enabled()

2014-07-25 Thread Yijing Wang

pci_dev_msi_enabled() checks whether a device has MSI or MSI-X
enabled. Refactor it to support checking for MSI only, MSI-X
only, or either of the two.

Signed-off-by: Yijing Wang 
---
 arch/cris/arch-v32/drivers/pci/bios.c |2 +-
 arch/frv/mb93090-mb00/pci-vdk.c   |2 +-
 arch/ia64/pci/pci.c   |4 ++--
 arch/powerpc/kernel/eeh_driver.c  |2 +-
 arch/x86/pci/common.c |5 +++--
 drivers/block/nvme-core.c |4 ++--
 drivers/dma/ioat/dma.c|2 +-
 drivers/firewire/ohci.c   |2 +-
 drivers/gpu/drm/i915/i915_dma.c   |4 ++--
 drivers/misc/mei/hw-me.c  |2 +-
 drivers/misc/mei/hw-txe.c |2 +-
 drivers/misc/mei/pci-me.c |4 ++--
 drivers/misc/mei/pci-txe.c|4 ++--
 drivers/misc/mic/host/mic_debugfs.c   |4 ++--
 drivers/misc/mic/host/mic_intr.c  |8 
 drivers/ntb/ntb_hw.c  |2 +-
 drivers/pci/irq.c |4 ++--
 drivers/pci/msi.c |   15 +--
 drivers/pci/pci.c |6 +++---
 drivers/pci/pcie/portdrv_core.c   |4 ++--
 drivers/scsi/esas2r/esas2r_init.c |4 ++--
 drivers/scsi/esas2r/esas2r_ioctl.c|4 ++--
 drivers/scsi/hpsa.c   |4 ++--
 drivers/staging/crystalhd/crystalhd_lnx.c |2 +-
 drivers/xen/xen-pciback/pciback_ops.c |   12 ++--
 include/linux/pci.h   |   12 ++--
 virt/kvm/assigned-dev.c   |2 +-
 27 files changed, 67 insertions(+), 55 deletions(-)
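
For illustration only (not part of the patch): a minimal userspace sketch of
how a type-mask argument like the one used in the hunks below could behave.
The flag values chosen for MSI_TYPE and MSIX_TYPE are assumptions (their
definitions are not visible in this excerpt), and struct mock_pci_dev and
mock_pci_dev_msi_enabled() are hypothetical stand-ins for the real pci_dev
and helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values; the real definitions are not shown in this thread. */
#define MSI_TYPE	0x01
#define MSIX_TYPE	0x02

/* Stand-in for the two enable bits kept in struct pci_dev. */
struct mock_pci_dev {
	unsigned int msi_enabled:1;
	unsigned int msix_enabled:1;
};

/*
 * Return true if any interrupt mode selected by @type is enabled.
 * @type is a mask of MSI_TYPE and/or MSIX_TYPE.
 */
static bool mock_pci_dev_msi_enabled(const struct mock_pci_dev *dev, int type)
{
	assert((type & ~(MSI_TYPE | MSIX_TYPE)) == 0);

	if ((type & MSI_TYPE) && dev->msi_enabled)
		return true;
	if ((type & MSIX_TYPE) && dev->msix_enabled)
		return true;
	return false;
}
```

With this shape, a caller that previously tested dev->msi_enabled passes
MSI_TYPE, and a caller that previously used the one-argument helper passes
MSI_TYPE | MSIX_TYPE.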

diff --git a/arch/cris/arch-v32/drivers/pci/bios.c 
b/arch/cris/arch-v32/drivers/pci/bios.c
index 64a5fb9..d9d8332 100644
--- a/arch/cris/arch-v32/drivers/pci/bios.c
+++ b/arch/cris/arch-v32/drivers/pci/bios.c
@@ -93,7 +93,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
if ((err = pcibios_enable_resources(dev, mask)) < 0)
return err;
 
-   if (!dev->msi_enabled)
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE))
pcibios_enable_irq(dev);
return 0;
 }
diff --git a/arch/frv/mb93090-mb00/pci-vdk.c b/arch/frv/mb93090-mb00/pci-vdk.c
index efa5d65..b96c128 100644
--- a/arch/frv/mb93090-mb00/pci-vdk.c
+++ b/arch/frv/mb93090-mb00/pci-vdk.c
@@ -409,7 +409,7 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 
if ((err = pci_enable_resources(dev, mask)) < 0)
return err;
-   if (!dev->msi_enabled)
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE))
pcibios_enable_irq(dev);
return 0;
 }
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 291a582..da8ddff 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -568,7 +568,7 @@ pcibios_enable_device (struct pci_dev *dev, int mask)
if (ret < 0)
return ret;
 
-   if (!dev->msi_enabled)
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE))
return acpi_pci_irq_enable(dev);
return 0;
 }
@@ -577,7 +577,7 @@ void
 pcibios_disable_device (struct pci_dev *dev)
 {
BUG_ON(atomic_read(&dev->enable_cnt));
-   if (!dev->msi_enabled)
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE))
acpi_pci_irq_disable(dev);
 }
 
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 420da61..e3f2074 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -123,7 +123,7 @@ static void eeh_disable_irq(struct pci_dev *dev)
 * effectively disabled by the DMA Stopped state
 * when an EEH error occurs.
 */
-   if (dev->msi_enabled || dev->msix_enabled)
+   if (pci_dev_msi_enabled(dev, MSI_TYPE | MSIX_TYPE))
return;
 
if (!irq_has_action(dev->irq))
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 059a76c..4597940 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -662,14 +662,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
if ((err = pci_enable_resources(dev, mask)) < 0)
return err;
 
-   if (!pci_dev_msi_enabled(dev))
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE | MSIX_TYPE))
return pcibios_enable_irq(dev);
return 0;
 }
 
 void pcibios_disable_device (struct pci_dev *dev)
 {
-   if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
+   if (!pci_dev_msi_enabled(dev, MSI_TYPE | MSIX_TYPE) 
+   && pcibios_disable_irq)
pcibios_disable_irq(dev);
 }
 
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 02351e2..f96b90f 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -2325,9 +2325,9 @@ static int nvme_dev_map(struct nvme_dev *dev)
 
 static void nvme_dev_unmap(struct nvme_dev *dev)
 {
-   if

[RFC PATCH 11/11] x86/MSI: Refactor x86 MSI code

2014-07-25 Thread Yijing Wang

Signed-off-by: Yijing Wang 
---
 arch/x86/include/asm/io_apic.h   |2 +-
 arch/x86/include/asm/irq_remapping.h |4 +-
 arch/x86/include/asm/pci.h   |6 ++--
 arch/x86/include/asm/x86_init.h  |   10 +++---
 arch/x86/kernel/apic/io_apic.c   |   23 +++
 arch/x86/kernel/x86_init.c   |   12 
 drivers/iommu/amd_iommu.c|   16 ++
 drivers/iommu/intel_irq_remapping.c  |9 --
 drivers/iommu/irq_remapping.c|   51 -
 drivers/iommu/irq_remapping.h|6 ++--
 drivers/msi/msi.c|3 +-
 11 files changed, 72 insertions(+), 70 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 90f97b4..692a90f 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -158,7 +158,7 @@ extern int native_setup_ioapic_entry(int, struct 
IO_APIC_route_entry *,
 struct io_apic_irq_attr *);
 extern void eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg);
 
-extern void native_compose_msi_msg(struct pci_dev *pdev,
+extern void native_compose_msi_msg(struct msi_irqs *msi,
   unsigned int irq, unsigned int dest,
   struct msi_msg *msg, u8 hpet_id);
 extern void native_eoi_ioapic_pin(int apic, int pin, int vector);
diff --git a/arch/x86/include/asm/irq_remapping.h 
b/arch/x86/include/asm/irq_remapping.h
index b7747c4..a10003d 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -47,7 +47,7 @@ extern int setup_ioapic_remapped_entry(int irq,
   int vector,
   struct io_apic_irq_attr *attr);
 extern void free_remapped_irq(int irq);
-extern void compose_remapped_msi_msg(struct pci_dev *pdev,
+extern void compose_remapped_msi_msg(struct msi_irqs *msi,
 unsigned int irq, unsigned int dest,
 struct msi_msg *msg, u8 hpet_id);
 extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
@@ -77,7 +77,7 @@ static inline int setup_ioapic_remapped_entry(int irq,
return -ENODEV;
 }
 static inline void free_remapped_irq(int irq) { }
-static inline void compose_remapped_msi_msg(struct pci_dev *pdev,
+static inline void compose_remapped_msi_msg(struct msi_irqs *msi,
unsigned int irq, unsigned int dest,
struct msi_msg *msg, u8 hpet_id)
 {
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 0892ea0..04c9ef6 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -96,10 +96,10 @@ extern void pci_iommu_alloc(void);
 #ifdef CONFIG_PCI_MSI
 /* implemented in arch/x86/kernel/apic/io_apic. */
 struct msi_desc;
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
+int native_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type);
 void native_teardown_msi_irq(unsigned int irq);
-void native_restore_msi_irqs(struct pci_dev *dev);
-int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
+void native_restore_msi_irqs(struct msi_irqs *msi);
+int setup_msi_irq(struct msi_irqs *msi, struct msi_desc *msidesc,
  unsigned int irq_base, unsigned int irq_offset);
 #else
 #define native_setup_msi_irqs  NULL
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index e45e4da..8e42f17 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -170,18 +170,18 @@ struct x86_platform_ops {
void (*apic_post_init)(void);
 };
 
-struct pci_dev;
+struct msi_irqs;
 struct msi_msg;
 struct msi_desc;
 
 struct x86_msi_ops {
-   int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-   void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
+   int (*setup_msi_irqs)(struct msi_irqs *msi, int nvec, int type);
+   void (*compose_msi_msg)(struct msi_irqs *msi, unsigned int irq,
unsigned int dest, struct msi_msg *msg,
   u8 hpet_id);
void (*teardown_msi_irq)(unsigned int irq);
-   void (*teardown_msi_irqs)(struct pci_dev *dev);
-   void (*restore_msi_irqs)(struct pci_dev *dev);
+   void (*teardown_msi_irqs)(struct msi_irqs *msi);
+   void (*restore_msi_irqs)(struct msi_irqs *msi);
int  (*setup_hpet_msi)(unsigned int irq, unsigned int id);
u32 (*msi_mask_irq)(struct msi_desc *desc, u32 mask, u32 flag);
u32 (*msix_mask_irq)(struct msi_desc *desc, u32 flag);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b833042..3cb4a6a 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2939,7 +2939,7 @@ void arch_teardown_hwirq(unsigned int irq)
 /*

[RFC PATCH 08/11] PCI/MSI: Introduce new struct msi_irqs and struct msi_ops

2014-07-25 Thread Yijing Wang

Currently the MSI driver is tightly bound to PCI everywhere.
Introduce a new struct msi_irqs to manage all MSI-related
information in an MSI-capable device. In addition,
introduce struct msi_ops to hook all device-specific
MSI operations. The MSI driver can then be decoupled from
PCI.

Signed-off-by: Yijing Wang 
---
 include/linux/msi.h |   30 +-
 include/linux/pci.h |7 +--
 2 files changed, 30 insertions(+), 7 deletions(-)
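
For illustration only (not part of the patch): a minimal userspace sketch of
the decoupling idea, in which generic code reaches the device only through an
ops table and an opaque data pointer. The mock_* types are cut-down,
hypothetical versions of struct msi_irqs and struct msi_ops, and the
"platform device" with a single control register is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct mock_msi_irqs;

/* Cut-down version of the msi_ops idea: a single hook. */
struct mock_msi_ops {
	void (*msi_set_enable)(struct mock_msi_irqs *msi, int enable);
};

/* Cut-down version of struct msi_irqs. */
struct mock_msi_irqs {
	unsigned int msi_enabled:1;
	void *data;			/* device-private pointer */
	const struct mock_msi_ops *ops;	/* device-specific operations */
};

/* A hypothetical non-PCI device with one control register. */
struct mock_platform_dev {
	unsigned int ctrl_reg;
};

static void platform_msi_set_enable(struct mock_msi_irqs *msi, int enable)
{
	struct mock_platform_dev *pdev = msi->data;

	pdev->ctrl_reg = enable ? 1u : 0u;
}

static const struct mock_msi_ops platform_msi_ops = {
	.msi_set_enable = platform_msi_set_enable,
};

/* Generic code: it never learns what kind of device sits behind @msi. */
static int generic_msi_enable(struct mock_msi_irqs *msi)
{
	assert(msi != NULL);

	if (!msi->ops || !msi->ops->msi_set_enable)
		return -1;

	msi->ops->msi_set_enable(msi, 1);
	msi->msi_enabled = 1;
	return 0;
}
```

A PCI implementation would supply its own ops whose hooks cast msi->data back
to a struct pci_dev and write the config-space registers.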

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 3ad8416..5a672d3 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -10,6 +10,34 @@ struct msi_msg {
u32 data;   /* 16 bits of msi message data */
 };
 
+struct msi_ops;
+
+struct msi_irqs {
+   u8 msi_enabled:1;
+   u8 msix_enabled:1;
+   int node;
+   struct list_head msi_list;
+   void *data;
+   struct msi_ops *ops;
+};
+
+struct msix_entry {
+   u32 vector; /* kernel uses to write allocated vector */
+   u16 entry;  /* driver uses to specify entry, OS writes */
+};
+
+struct msi_ops {
+   void (*msi_set_enable)(struct msi_irqs *msi, int enable, int type);
+   struct msi_desc *(*msi_setup_entry)(struct msi_irqs *msi);
+   int (*msix_setup_entries)(struct msi_irqs *msi, void __iomem *base,
+   struct msix_entry *entries, int nvec);
+   u32 (*msi_mask_irq)(struct msi_desc *desc, u32 mask, u32 flag);
+   u32 (*msix_mask_irq)(struct msi_desc *desc, u32 flag);
+   void (*msi_read_message)(struct msi_desc *desc, struct msi_msg *msg);
+   void (*msi_write_message)(struct msi_desc *desc, struct msi_msg *msg);
+   void (*msi_set_intx)(struct msi_irqs *msi, int enable);
+};
+
 /* Helper functions */
 struct irq_data;
 struct msi_desc;
@@ -42,7 +70,7 @@ struct msi_desc {
void __iomem *mask_base;
u8 mask_pos;
};
-   struct pci_dev *dev;
+   struct msi_irqs *msi;
 
/* Last set MSI message */
struct msi_msg msg;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c6c01ae..c7bca1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -32,8 +32,8 @@
 #include 
 
 #include 
-
 #include 
+
 /*
  * The PCI interface treats multi-function devices as independent
  * devices.  The slot/function address of each device is encoded
@@ -1182,11 +1182,6 @@ enum pci_dma_burst_strategy {
   strategy_parameter byte boundaries */
 };
 
-struct msix_entry {
-   u32 vector; /* kernel uses to write allocated vector */
-   u16 entry;  /* driver uses to specify entry, OS writes */
-};
-
 
 #ifdef CONFIG_PCI_MSI
 int pci_msi_vec_count(struct pci_dev *dev);
-- 
1.7.1


[RFC PATCH 10/11] PCI/MSI: Split the generic MSI code into new file

2014-07-25 Thread Yijing Wang

MSI interrupts are no longer used only by PCI devices; more
and more non-PCI devices also want to use MSI. According to the
ARM GIC v3 spec, on ARM platforms with a GIC v3 controller
non-PCI devices can also be designed to support MSI to
simplify interrupt wiring, and for existing non-PCI
devices a consolidator is designed and used to translate
legacy interrupts to MSI. So to support non-PCI MSI
devices, a generic MSI driver is needed. Split the generic
MSI code into a new location, drivers/msi/msi.c, so that
the MSI driver no longer depends on PCI.

Signed-off-by: Yijing Wang 
---
 drivers/Kconfig  |1 +
 drivers/Makefile |1 +
 drivers/msi/Kconfig  |8 +
 drivers/msi/Makefile |1 +
 drivers/msi/msi.c|  540 ++
 drivers/pci/Kconfig  |6 +-
 drivers/pci/msi.c|  500 ---
 include/linux/msi.h  |   31 +++-
 8 files changed, 617 insertions(+), 471 deletions(-)
 create mode 100644 drivers/msi/Kconfig
 create mode 100644 drivers/msi/Makefile
 create mode 100644 drivers/msi/msi.c

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0e87a34..4d05749 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -176,4 +176,5 @@ source "drivers/powercap/Kconfig"
 
 source "drivers/mcb/Kconfig"
 
+source "drivers/msi/Kconfig"
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index f98b50d..47ae3d1 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -158,3 +158,4 @@ obj-$(CONFIG_NTB)   += ntb/
 obj-$(CONFIG_FMC)  += fmc/
 obj-$(CONFIG_POWERCAP) += powercap/
 obj-$(CONFIG_MCB)  += mcb/
+obj-$(CONFIG_MSI)  += msi/
diff --git a/drivers/msi/Kconfig b/drivers/msi/Kconfig
new file mode 100644
index 000..739bd13
--- /dev/null
+++ b/drivers/msi/Kconfig
@@ -0,0 +1,8 @@
+config MSI
+   bool "Message Signaled Interrupts (MSI and MSI-X)"
+   default y
+   help
+   This allows device drivers to use generic MSI (Message
+   Signaled Interrupt). Message Signaled Interrupts enable 
+   a device to generate an interrupt using an inbound Memory 
+   Write to a specific target address.
diff --git a/drivers/msi/Makefile b/drivers/msi/Makefile
new file mode 100644
index 000..39cb026
--- /dev/null
+++ b/drivers/msi/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_MSI) += msi.o
diff --git a/drivers/msi/msi.c b/drivers/msi/msi.c
new file mode 100644
index 000..3fbd539
--- /dev/null
+++ b/drivers/msi/msi.c
@@ -0,0 +1,540 @@
+/*
+ * File:   msi.c
+ * Purpose:Message Signaled Interrupt (MSI)
+ *
+ * Copyright (C) 2014 Huawei Ltd.
+ * Copyright (C) Yijing Wang  
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Arch hooks */
+
+int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
+{
+   struct pci_dev *dev = msi->data;
+   struct msi_chip *chip = dev->bus->msi; //TO BE DONE: rework msi_chip to support Non-PCI MSI
+   int err;
+
+   if (!chip || !chip->setup_irq)
+   return -EINVAL;
+
+   err = chip->setup_irq(chip, dev, desc);
+   if (err < 0)
+   return err;
+
+   irq_set_chip_data(desc->irq, chip);
+   return 0;
+}
+
+void __weak arch_teardown_msi_irq(unsigned int irq)
+{
+   struct msi_chip *chip = irq_get_chip_data(irq);
+
+   if (!chip || !chip->teardown_irq)
+   return;
+
+   chip->teardown_irq(chip, irq);
+}
+
+int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
+{
+   struct pci_dev *dev = msi->data;
+   struct msi_chip *chip = dev->bus->msi; //TO BE DONE: rework msi_chip to support Non-PCI MSI
+
+   if (!chip || !chip->check_device)
+   return 0;
+
+   return chip->check_device(chip, dev, nvec, type);
+}
+
+int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
+{
+   struct msi_desc *entry;
+   int ret;
+
+   /*
+* If an architecture wants to support multiple MSI, it needs to
+* override arch_setup_msi_irqs()
+*/
+   if (type == MSI_TYPE && nvec > 1)
+   return 1;
+
+   list_for_each_entry(entry, &msi->msi_list, list) {
+   ret = arch_setup_msi_irq(msi, entry);
+   if (ret < 0)
+   return ret;
+   if (ret > 0)
+   return -ENOSPC;
+   }
+   return 0;
+}
+
+
+void __weak arch_teardown_msi_irqs(struct msi_irqs *msi)
+{
+   return default_teardown_msi_irqs(msi);
+}
+
+/*
+ * We have a default implementation available as a separate non-weak
+ * function, as it is used by the Xen x86 PCI code
+ */
+void default_teardown_msi_irqs(struct msi_irqs *msi)
+{
+   struct msi_desc *entry;
+
+   list_for_each_entry(entry, &msi->msi_list, list) {
+   int i, nvec;
+   if (entry->irq == 0)
+

[RFC PATCH 09/11] PCI/MSI: refactor PCI MSI driver

2014-07-25 Thread Yijing Wang

Use struct msi_ops to hook PCI MSI operations,
and use struct msi_irqs to refactor the PCI MSI driver.

Signed-off-by: Yijing Wang 
---
 drivers/pci/msi.c   |  351 ++-
 include/linux/msi.h |   14 +-
 include/linux/pci.h |   11 +-
 3 files changed, 222 insertions(+), 154 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 41c33da..f0c5989 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -29,8 +29,9 @@ static int pci_msi_enable = 1;
 
 /* Arch hooks */
 
-int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+int __weak arch_setup_msi_irq(struct msi_irqs *msi, struct msi_desc *desc)
 {
+   struct pci_dev *dev = msi->data; //TO BE DONE: rework msi_chip to support Non-PCI
struct msi_chip *chip = dev->bus->msi;
int err;
 
@@ -56,8 +57,9 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
chip->teardown_irq(chip, irq);
 }
 
-int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
+int __weak arch_msi_check_device(struct msi_irqs *msi, int nvec, int type)
 {
+   struct pci_dev *dev = msi->data; //TO BE DONE: rework msi_chip to support Non-PCI
struct msi_chip *chip = dev->bus->msi;
 
if (!chip || !chip->check_device)
@@ -66,7 +68,7 @@ int __weak arch_msi_check_device(struct pci_dev *dev, int 
nvec, int type)
return chip->check_device(chip, dev, nvec, type);
 }
 
-int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int __weak arch_setup_msi_irqs(struct msi_irqs *msi, int nvec, int type)
 {
struct msi_desc *entry;
int ret;
@@ -78,8 +80,8 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, 
int type)
if (type == MSI_TYPE && nvec > 1)
return 1;
 
-   list_for_each_entry(entry, &dev->msi_list, list) {
-   ret = arch_setup_msi_irq(dev, entry);
+   list_for_each_entry(entry, &msi->msi_list, list) {
+   ret = arch_setup_msi_irq(msi, entry);
if (ret < 0)
return ret;
if (ret > 0)
@@ -93,11 +95,11 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int 
nvec, int type)
  * We have a default implementation available as a separate non-weak
  * function, as it is used by the Xen x86 PCI code
  */
-void default_teardown_msi_irqs(struct pci_dev *dev)
+void default_teardown_msi_irqs(struct msi_irqs *msi)
 {
struct msi_desc *entry;
 
-   list_for_each_entry(entry, &dev->msi_list, list) {
+   list_for_each_entry(entry, &msi->msi_list, list) {
int i, nvec;
if (entry->irq == 0)
continue;
@@ -110,22 +112,22 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
}
 }
 
-void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
+void __weak arch_teardown_msi_irqs(struct msi_irqs *msi)
 {
-   return default_teardown_msi_irqs(dev);
+   return default_teardown_msi_irqs(msi);
 }
 
-static void default_restore_msi_irq(struct pci_dev *dev, int irq)
+static void default_restore_msi_irq(struct msi_irqs *msi, int irq)
 {
struct msi_desc *entry;
 
entry = NULL;
-   if (dev->msix_enabled) {
-   list_for_each_entry(entry, &dev->msi_list, list) {
+   if (msi->msix_enabled) {
+   list_for_each_entry(entry, &msi->msi_list, list) {
if (irq == entry->irq)
break;
}
-   } else if (pci_dev_msi_enabled(dev, MSI_TYPE))  {
+   } else if (msi->msi_enabled)  {
entry = irq_get_msi_desc(irq);
}
 
@@ -133,20 +135,9 @@ static void default_restore_msi_irq(struct pci_dev *dev, 
int irq)
write_msi_msg(irq, &entry->msg);
 }
 
-void __weak arch_restore_msi_irqs(struct pci_dev *dev)
+void __weak arch_restore_msi_irqs(struct msi_irqs *msi)
 {
-   return default_restore_msi_irqs(dev);
-}
-
-static void msi_set_enable(struct pci_dev *dev, int enable)
-{
-   u16 control;
-
-   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
-   control &= ~PCI_MSI_FLAGS_ENABLE;
-   if (enable)
-   control |= PCI_MSI_FLAGS_ENABLE;
-   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
+   return default_restore_msi_irqs(msi);
 }
 
 static void msix_clear_and_set_ctrl(struct pci_dev *dev, u16 clear, u16 set)
@@ -159,6 +150,25 @@ static void msix_clear_and_set_ctrl(struct pci_dev *dev, 
u16 clear, u16 set)
pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
 }
 
+static void msi_set_enable(struct msi_irqs *msi, int enable, int type)
+{
+   u16 control;
+   struct pci_dev *dev = msi->data;
+
+   if (type == MSI_TYPE) {
+   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
+   control &= ~PCI_MSI_FLAGS_ENABLE;
+   if (enable)
+   control |= PCI_MSI_FLAGS_ENABLE;
+

[RFC PATCH 07/11] PCI/MSI: Mask MSI-X entry in msix_setup_entries()

2014-07-25 Thread Yijing Wang

Save each MSI-X entry's initial mask state in
msix_setup_entries() and mask the entry there as well.
This is preparation for generic MSI.

Signed-off-by: Yijing Wang 
---
 drivers/pci/msi.c |   21 +++--
 1 files changed, 11 insertions(+), 10 deletions(-)
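
For illustration only (not part of the patch): a minimal userspace sketch of
the "save the initial Vector Control word, then mask the entry" step, run
against an in-memory copy of an MSI-X table. The three PCI_MSIX_ENTRY_*
constants match the values in the kernel's pci_regs.h; mock_readl(),
mock_writel() and save_and_mask_entry() are hypothetical helpers, and the
real code additionally toggles MASKALL/ENABLE around the access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_MSIX_ENTRY_SIZE		16
#define PCI_MSIX_ENTRY_VECTOR_CTRL	12
#define PCI_MSIX_ENTRY_CTRL_MASKBIT	1

/* Userspace stand-ins for readl()/writel() on a byte buffer. */
static uint32_t mock_readl(const uint8_t *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

static void mock_writel(uint32_t v, uint8_t *addr)
{
	memcpy(addr, &v, sizeof(v));
}

/*
 * Read the Vector Control word of table entry @entry_nr, set its mask
 * bit, and return the value that was there before masking.
 */
static uint32_t save_and_mask_entry(uint8_t *table, unsigned int entry_nr)
{
	size_t offset = (size_t)entry_nr * PCI_MSIX_ENTRY_SIZE +
			PCI_MSIX_ENTRY_VECTOR_CTRL;
	uint32_t saved = mock_readl(table + offset);

	mock_writel(saved | PCI_MSIX_ENTRY_CTRL_MASKBIT, table + offset);
	assert(mock_readl(table + offset) & PCI_MSIX_ENTRY_CTRL_MASKBIT);
	return saved;
}
```

The returned value plays the role of entry->masked in the patch: it records
the state the entry was in before setup masked it.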

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f96dd38..41c33da 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -672,7 +672,7 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
  struct msix_entry *entries, int nvec)
 {
struct msi_desc *entry;
-   int i;
+   int i, offset;
 
for (i = 0; i < nvec; i++) {
entry = alloc_msi_entry(dev);
@@ -691,6 +691,15 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
entry->msi_attrib.default_irq   = dev->irq;
entry->mask_base= base;
 
+   msix_clear_and_set_ctrl(dev, 0, 
+   PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE);
+   offset = entries[i].entry * PCI_MSIX_ENTRY_SIZE +
+   PCI_MSIX_ENTRY_VECTOR_CTRL;
+   entry->masked = readl(entry->mask_base + offset);
+   msix_mask_irq(entry, 1);
+   msix_clear_and_set_ctrl(dev, 
+   PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
+
list_add_tail(&entry->list, &dev->msi_list);
}
 
@@ -704,13 +713,8 @@ static void msix_program_entries(struct pci_dev *dev,
int i = 0;
 
list_for_each_entry(entry, &dev->msi_list, list) {
-   int offset = entries[i].entry * PCI_MSIX_ENTRY_SIZE +
-   PCI_MSIX_ENTRY_VECTOR_CTRL;
-
entries[i].vector = entry->irq;
irq_set_msi_desc(entry->irq, entry);
-   entry->masked = readl(entry->mask_base + offset);
-   msix_mask_irq(entry, 1);
i++;
}
 }
@@ -746,16 +750,13 @@ static int msix_capability_init(struct pci_dev *dev, void 
__iomem *base,
 * MSI-X registers.  We need to mask all the vectors to prevent
 * interrupts coming in before they're fully set up.
 */
-   msix_clear_and_set_ctrl(dev, 0,
-   PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE);
-
msix_program_entries(dev, entries);
 
/* Set MSI-X enabled bits and unmask the function */
pci_intx_for_msi(dev, 0);
dev->msix_enabled = 1;
 
-   msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
+   msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_ENABLE);
 
return 0;
 
-- 
1.7.1


[RFC PATCH 01/11] PCI/MSI: Use pci_dev->msi_cap instead of msi_desc->msi_attrib.pos

2014-07-25 Thread Yijing Wang

PCI devices save the msi and msix capability offset in pci_dev->msi_cap
and pci_dev->msix_cap. When we access PCI device MSI and MSIX
registers, we can use msi_cap and msix_cap in pci_dev directly.
Remove the pos member in msi_attrib.

Signed-off-by: Yijing Wang 
---
 arch/mips/pci/msi-octeon.c |4 ++--
 drivers/pci/host/pcie-designware.c |2 +-
 drivers/pci/msi.c  |2 --
 include/linux/msi.h|1 -
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/mips/pci/msi-octeon.c b/arch/mips/pci/msi-octeon.c
index ab0c5d1..6a6a99f 100644
--- a/arch/mips/pci/msi-octeon.c
+++ b/arch/mips/pci/msi-octeon.c
@@ -73,7 +73,7 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc 
*desc)
 * wants.  Most devices only want 1, which will give
 * configured_private_bits and request_private_bits equal 0.
 */
-   pci_read_config_word(dev, desc->msi_attrib.pos + PCI_MSI_FLAGS,
+   pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS,
 &control);
 
/*
@@ -176,7 +176,7 @@ msi_irq_allocated:
/* Update the number of IRQs the device has available to it */
control &= ~PCI_MSI_FLAGS_QSIZE;
control |= request_private_bits << 4;
-   pci_write_config_word(dev, desc->msi_attrib.pos + PCI_MSI_FLAGS,
+   pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS,
  control);
 
irq_set_msi_desc(irq, desc);
diff --git a/drivers/pci/host/pcie-designware.c 
b/drivers/pci/host/pcie-designware.c
index 1eaf4df..04339cd 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -335,7 +335,7 @@ static int dw_msi_setup_irq(struct msi_chip *chip, struct 
pci_dev *pdev,
return -EINVAL;
}
 
-   pci_read_config_word(pdev, desc->msi_attrib.pos+PCI_MSI_FLAGS,
+   pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
&msg_ctr);
msgvec = (msg_ctr & PCI_MSI_FLAGS_QSIZE) >> 4;
if (msgvec == 0)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 5a40516..e67acd1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -595,7 +595,6 @@ static struct msi_desc *msi_setup_entry(struct pci_dev *dev)
entry->msi_attrib.entry_nr  = 0;
entry->msi_attrib.maskbit   = !!(control & PCI_MSI_FLAGS_MASKBIT);
entry->msi_attrib.default_irq   = dev->irq; /* Save IOAPIC IRQ */
-   entry->msi_attrib.pos   = dev->msi_cap;
entry->msi_attrib.multi_cap = (control & PCI_MSI_FLAGS_QMASK) >> 1;
 
if (control & PCI_MSI_FLAGS_64BIT)
@@ -699,7 +698,6 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
entry->msi_attrib.is_64 = 1;
entry->msi_attrib.entry_nr  = entries[i].entry;
entry->msi_attrib.default_irq   = dev->irq;
-   entry->msi_attrib.pos   = dev->msix_cap;
entry->mask_base= base;
 
list_add_tail(&entry->list, &dev->msi_list);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8103f32..ce88c5b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -29,7 +29,6 @@ struct msi_desc {
__u8multi_cap : 3;  /* log2 num of messages supported */
__u8maskbit : 1;/* mask-pending bit supported ? */
__u8is_64   : 1;/* Address size: 0=32bit 1=64bit */
-   __u8pos;/* Location of the msi capability */
__u16   entry_nr;   /* specific enabled entry */
unsigned default_irq;   /* default pre-assigned irq */
} msi_attrib;
-- 
1.7.1


Re: perf: invalid memory access in perf_swevent_del

2014-07-25 Thread Sasha Levin

On 05/10/2014 07:34 PM, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew:

Ping? I'm still seeing corruption on perf_swevent_del and perf_swevent_init:

[  488.092839] AddressSanitizer: use after free in perf_swevent_del+0x33/0x70 
at addr 8805f430ea48
[  488.09] page:ea0017d0c380 count:0 mapcount:0 mapping:  
(null) index:0x0
[  488.095681] page flags: 0x6f80008000(tail)
[  488.096407] page dumped because: kasan error
[  488.097116] CPU: 17 PID: 9306 Comm: trinity-main Not tainted 
3.16.0-rc6-next-20140725-sasha-00048-ga713fc0-dirty #937
[  488.098736]  00fb  ea0017d0c380 
8805f444b740
[  488.099933]  b6dc96f3 8805f444b810 8805f444b800 
b242d17c
[  488.100020]  880be215f448 880be215f45d 8805ff7e2dc0 
8805ff7e2dd0
[  488.100020] Call Trace:
[  488.100020] dump_stack (lib/dump_stack.c:52)
[  488.100020] kasan_report_error (mm/kasan/report.c:98 mm/kasan/report.c:166)
[  488.100020] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 
arch/x86/kernel/kvmclock.c:86)
[  488.100020] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 
arch/x86/kernel/tsc.c:304)
[  488.100020] ? sched_clock_local (kernel/sched/clock.c:214)
[  488.100020] __asan_store8 (mm/kasan/kasan.c:400)
[  488.100020] ? perf_swevent_del (include/linux/list.h:618 
include/linux/rculist.h:345 kernel/events/core.c:5758)
[  488.100020] perf_swevent_del (include/linux/list.h:618 
include/linux/rculist.h:345 kernel/events/core.c:5758)
[  488.100020] event_sched_out.isra.49 (kernel/events/core.c:1416)
[  488.100020] group_sched_out (kernel/events/core.c:1442)
[  488.100020] ctx_sched_out (kernel/events/core.c:2185 (discriminator 3))
[  488.100020] __perf_event_task_sched_out (kernel/events/core.c:2360 
kernel/events/core.c:2385)
[  488.100020] ? __perf_event_task_sched_out (include/linux/rcupdate.h:806 
kernel/events/core.c:2314 kernel/events/core.c:2385)
[  488.100020] ? update_stats_wait_end (kernel/sched/fair.c:760)
[  488.100020] perf_event_task_sched_out (include/linux/perf_event.h:702)
[  488.100020] ? __schedule (kernel/sched/core.c:2773)
[  488.100020] ? __schedule (kernel/sched/core.c:2773)
[  488.100020] __schedule (kernel/sched/core.c:2146 kernel/sched/core.c:2184 
kernel/sched/core.c:2308 kernel/sched/core.c:2810)
[  488.100020] preempt_schedule_irq (./arch/x86/include/asm/paravirt.h:814 
kernel/sched/core.c:2927)
[  488.100020] retint_kernel (arch/x86/kernel/entry_64.S:935)
[  488.100020] ? __asan_load4 (mm/kasan/kasan.c:358)
[  488.100020] ? debug_lockdep_rcu_enabled (kernel/rcu/update.c:134)
[  488.100020] __fget_light (include/linux/fdtable.h:77 fs/file.c:684)
[  488.100020] __fdget_raw (fs/file.c:704)
[  488.100020] path_init (include/linux/file.h:60 fs/namei.c:1873)
[  488.100020] ? path_lookupat (fs/namei.c:1937)
[  488.100020] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[  488.100020] path_lookupat (fs/namei.c:1937)
[  488.100020] ? poison_shadow (mm/kasan/kasan.c:76)
[  488.100020] ? unpoison_shadow (mm/kasan/kasan.c:82)
[  488.100020] ? kasan_slab_alloc (mm/kasan/kasan.c:206)
[  488.100020] ? strncpy_from_user (./arch/x86/include/asm/word-at-a-time.h:48 
lib/strncpy_from_user.c:44 lib/strncpy_from_user.c:109)
[  488.100020] filename_lookup (fs/namei.c:1984)
[  488.100020] user_path_at_empty (fs/namei.c:2135)
[  488.100020] ? check_chain_key (kernel/locking/lockdep.c:2188)
[  488.100020] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 
kernel/locking/lockdep.c:254)
[  488.100020] ? get_parent_ip (kernel/sched/core.c:2561)
[  488.100020] user_path_at (fs/namei.c:2146)
[  488.100020] vfs_fstatat (fs/stat.c:107)
[  488.100020] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
[  488.100020] SYSC_newfstatat (fs/stat.c:298)
[  488.100020] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 
(discriminator 2))
[  488.100020] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
[  488.100020] ? tracesys (arch/x86/kernel/entry_64.S:530)
[  488.100020] SyS_newfstatat (fs/stat.c:291)
[  488.100020] tracesys (arch/x86/kernel/entry_64.S:541)
[  488.100020] Write of size 8 by thread T9306:
[  488.100020] Memory state around the buggy address:
[  488.100020]  8805f430e780: fc fc fc fc fc fc fb fb fb fb fb fb fb fb fb 
fb
[  488.100020]  8805f430e800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  488.100020]  8805f430e880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  488.100020]  8805f430e900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  488.100020]  8805f430e980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  488.100020] >8805f430ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  488.100020]   ^
[  488.100020]  8805f430ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
f

Re: [PATCH RFC tip/core/rcu] Fix attempt to avoid offloading callbacks unless requested

2014-07-25 Thread Frederic Weisbecker

On Fri, Jul 25, 2014 at 08:10:57PM -0400, Pranith Kumar wrote:
> On 07/25/2014 07:36 PM, Paul E. McKenney wrote:
> > [ Note: This applies on top of commit 187497fa5e9e (rcu: Allow for NULL
> > tick_nohz_full_mask when nohz_full= missing) in -tip
> > or -rcu.  To make this work on top of rcu/next, move the
> > call to rcu_organize_nocb_kthreads(rsp) to the end of the
> > for_each_rcu_flavor(rsp) loop in rcu_init_nohz(). ]
> >
> > Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically
> > requested) failed to adjust the callback lists of the CPUs that are
> > known to be no-CBs CPUs only because they are also nohz_full= CPUs.
> > This failure can result in callbacks that are posted during early boot
> > getting stranded on nxtlist for CPUs whose no-CBs property becomes
> > apparent late, and there can also be spurious warnings about offline
> > CPUs posting callbacks.
> >
> > This commit fixes these problems by adding an early-boot rcu_init_nohz()
> > that properly initializes the no-CBs CPUs.
> >
> > Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
> > CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
> > booted without the nohz_full= boot parameter.
> >
> > Signed-off-by: Paul E. McKenney 
> 
> Please find two points below.
> 
> 
> >  #ifdef CONFIG_TREE_PREEMPT_RCU
> > @@ -2451,6 +2424,66 @@ static void do_nocb_deferred_wakeup(struct rcu_data 
> > *rdp)
> >  trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 
> > TPS("DeferredWakeEmpty"));
> >  }
> >  
> > +void rcu_init_nohz(void)
> > +{
> > +int cpu;
> > +bool need_rcu_nocb_mask = true;
> > +struct rcu_state *rsp;
> > +
> > +#ifdef CONFIG_RCU_NOCB_CPU_NONE
> > +need_rcu_nocb_mask = false;
> > +#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
> > +
> > +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> > +if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
> > +need_rcu_nocb_mask = true;
> > +#endif /* #if defined(CONFIG_NO_HZ_FULL) && 
> > !defined(CONFIG_NO_HZ_FULL_ALL) */
> > +
> > +if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
> > +zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);
> 
> Please check the return value unless you want to increase my commit count ;)
> 
> >
> > +have_rcu_nocb_mask = true;
> > +}
> > +if (!have_rcu_nocb_mask)
> > +return;
> > +
> > +#ifdef CONFIG_RCU_NOCB_CPU_ZERO
> > +pr_info("\tOffload RCU callbacks from CPU 0\n");
> > +cpumask_set_cpu(0, rcu_nocb_mask);
> > +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */
> > +#ifdef CONFIG_RCU_NOCB_CPU_ALL
> > +pr_info("\tOffload RCU callbacks from all CPUs\n");
> > +cpumask_copy(rcu_nocb_mask, cpu_possible_mask);
> > +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
> > +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> > +cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> > +#endif /* #if defined(CONFIG_NO_HZ_FULL) && 
> > !defined(CONFIG_NO_HZ_FULL_ALL) */
> 
> I understand that if CONFIG_NO_HZ_FULL_ALL is set then CONFIG_NOCB_CPU_ALL
> will also be set and there is no need for this cpumask_or().
> 
> Is there any reason for the coupling between CONFIG_NO_HZ_FULL_ALL
> and CONFIG_NOCB_CPU_ALL?

Yeah, for any nohz full CPU, we need the corresponding CPU to be rcu_nocb.
So if all CPUs are full dynticks, all CPUs must be rcunocb.

That said with this patch, the dependency is perhaps not needed anymore.

> 
> I ask because a user can override CONFIG_NO_HZ_FULL_ALL=y at boot time
> using the nohz_full= boot time parameter.

No, the content of nohz_full= is ignored with CONFIG_NO_HZ_FULL_ALL=y.

That said you made me check and I realize that when that happens, we alloc
the mask two times and we leak the first. I need to fix that.

Thanks.
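
The constraint discussed in this reply (every nohz_full CPU must also be a no-CBs CPU) can be sketched in plain C. This is only an illustration under stated assumptions: `cpu_bits` and both helper names are hypothetical stand-ins for the kernel's cpumask API, limited to 64 CPUs, and the sketch is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bits;

/* Mirrors the cpumask_or() step: every nohz_full CPU becomes a no-CBs CPU. */
static cpu_bits fold_nohz_full_into_nocb(cpu_bits nocb, cpu_bits nohz_full)
{
    return nocb | nohz_full;
}

/* True when no nohz_full CPU is missing from the no-CBs set. */
static bool nohz_full_is_subset_of_nocb(cpu_bits nocb, cpu_bits nohz_full)
{
    return (nohz_full & ~nocb) == 0;
}
```

With only CPU 0 offloaded and CPUs 1-2 marked nohz_full, the subset check fails until the two masks are folded together, which is the situation the cpumask_or() in the quoted patch is there to handle.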

Re: vmstat: On demand vmstat workers V8

2014-07-25 Thread Sasha Levin

On 07/10/2014 10:04 AM, Christoph Lameter wrote:
> This patch creates a vmstat shepherd worker that monitors the
> per cpu differentials on all processors. If there are differentials
> on a processor then a vmstat worker local to the processors
> with the differentials is created. That worker will then start
> folding the diffs in regular intervals. Should the worker
> find that there is no work to be done then it will make the shepherd
> worker monitor the differentials again.

Hi Christoph, all,

This patch doesn't interact well with my fuzzing setup. I'm seeing
the following:

[  490.446927] BUG: using __this_cpu_read() in preemptible [] code: 
kworker/16:1/7368
[  490.447909] caller is __this_cpu_preempt_check+0x13/0x20
[  490.448596] CPU: 8 PID: 7368 Comm: kworker/16:1 Not tainted 
3.16.0-rc6-next-20140725-sasha-00047-g9eb9a52 #933
[  490.449847] Workqueue: events vmstat_update
[  490.450558]  97383bb6  9727df83 
8803077cfb68
[  490.451520]  95dc96b3 0008 8803077cfba0 
92002438
[  490.452475]  8803077cfc80 880be21ea138 8803077cfc80 
001e6a48
[  490.453459] Call Trace:
[  490.453776] dump_stack (lib/dump_stack.c:52)
[  490.454394] check_preemption_disabled (lib/smp_processor_id.c:46)
[  490.455161] __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  490.455927] refresh_cpu_vm_stats (mm/vmstat.c:492)
[  490.456753] vmstat_update (mm/vmstat.c:1252)
[  490.457463] process_one_work (kernel/workqueue.c:2022 
include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 
kernel/workqueue.c:2027)
[  490.458159] ? process_one_work (include/linux/workqueue.h:185 
kernel/workqueue.c:598 kernel/workqueue.c:625 kernel/workqueue.c:2015)
[  490.458887] worker_thread (include/linux/list.h:188 kernel/workqueue.c:2154)
[  490.459555] ? __schedule (./arch/x86/include/asm/bitops.h:311 
include/linux/thread_info.h:91 include/linux/sched.h:2854 
kernel/sched/core.c:2825)
[  490.460370] ? process_one_work (kernel/workqueue.c:2098)
[  490.461177] kthread (kernel/kthread.c:207)
[  490.461792] ? flush_kthread_work (kernel/kthread.c:176)
[  490.462529] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[  490.463181] ? flush_kthread_work (kernel/kthread.c:176)
[  490.464008] ------------[ cut here ]------------
[  490.464613] kernel BUG at mm/vmstat.c:1278!
[  490.465116] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  490.465981] Dumping ftrace buffer:
[  490.466585](ftrace buffer empty)
[  490.467030] Modules linked in:
[  490.467429] CPU: 8 PID: 7368 Comm: kworker/16:1 Not tainted 
3.16.0-rc6-next-20140725-sasha-00047-g9eb9a52 #933
[  490.468641] Workqueue: events vmstat_update
[  490.469163] task: 88030772b000 ti: 8803077cc000 task.ti: 
8803077cc000
[  490.470033] RIP: vmstat_update (mm/vmstat.c:1278)
[  490.470269] RSP: :8803077cfcb8  EFLAGS: 00010287
[  490.470269] RAX: 87ff RBX: 0008 RCX: 
[  490.470269] RDX: 88030772bcf8 RSI: 972e5fd0 RDI: 986fa5d0
[  490.470269] RBP: 8803077cfcd0 R08: 0002 R09: 
[  490.470269] R10:  R11:  R12: 8805fa7e34d0
[  490.470269] R13: 8803117e2240 R14: 0800 R15: 
[  490.470269] FS:  () GS:88031120() 
knlGS:
[  490.470269] CS:  0010 DS:  ES:  CR0: 8005003b
[  490.470269] CR2: 7fffc67fef1a CR3: 17a22000 CR4: 06a0
[  490.470269] Stack:
[  490.470269]  8803117dd240 88030af938e0 8803117e2240 
8803077cfd88
[  490.470269]  911f45f5 911f455d 88030af93928 
88031081e900
[  490.470269]  88030af938f0 88030af93900 88030af938e8 
88030af938f8
[  490.470269] Call Trace:
[  490.470269] process_one_work (kernel/workqueue.c:2022 
include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 
kernel/workqueue.c:2027)
[  490.470269] ? process_one_work (include/linux/workqueue.h:185 
kernel/workqueue.c:598 kernel/workqueue.c:625 kernel/workqueue.c:2015)
[  490.470269] worker_thread (include/linux/list.h:188 kernel/workqueue.c:2154)
[  490.470269] ? __schedule (./arch/x86/include/asm/bitops.h:311 
include/linux/thread_info.h:91 include/linux/sched.h:2854 
kernel/sched/core.c:2825)
[  490.470269] ? process_one_work (kernel/workqueue.c:2098)
[  490.470269] kthread (kernel/kthread.c:207)
[  490.470269] ? flush_kthread_work (kernel/kthread.c:176)
[  490.470269] ret_from_fork (arch/x86/kernel/entry_64.S:348)
[  490.470269] ? flush_kthread_work (kernel/kthread.c:176)
[ 490.470269] Code: c7 d0 a5 6f 98 89 c3 e8 9f 9e 08 00 3b 1d 89 8b 35 07 73 7f 
f0 49 0f ab 1c 24 72 0f 5b 41 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 <0f> 0b 66 
0f 1f 44 00 00 48 63 3d f1 be 36 07 48 c7 c3 40 d2 1d
All code

   0:   c7  (bad)
   1:   d0 a5 6f 98 89

Re: [PATCH v2 2/4] pinctrl: qpnp: Qualcomm PMIC pin controller driver

2014-07-25 Thread David Collins

On 07/17/2014 08:25 AM, Ivan T. Ivanov wrote:
> From: "Ivan T. Ivanov" 
> 
> This is the pinctrl, pinmux, pinconf and gpiolib driver for the
> Qualcomm GPIO and MPP sub-function blocks found in the PMIC chips.
> QPNP_REG_STATUS1_GPIO_EN_REV0_MASK
> Signed-off-by: Ivan T. Ivanov 

(...)
> +static int qpnp_conv_to_pin(struct qpnp_pinctrl *qctrl,
> +struct qpnp_padinfo *pad, unsigned param,
> +unsigned val)
(...)
> + switch (param) {
(...)
> + case PIN_CONFIG_OUTPUT:
> + nattrs = 3;
> + attr[0].addr  = QPNP_REG_MODE_CTL;
> + attr[0].shift = QPNP_REG_OUT_SRC_SEL_SHIFT;
> + attr[0].mask  = QPNP_REG_OUT_SRC_SEL_MASK;
> + attr[0].val   = !!val;

It seems that this patch provides no means to configure the output source
select bits to be anything besides 0 (constant low) or 1 (constant high).
 Some non-generic property is needed to configure this for both GPIOs and
MPPs.  Passing the value in via the output-high property does not seem
like a good approach since that is a generic pin config property that is
defined to take no value.  The special functions available for GPIOs (e.g.
PWM/LPG, clock, keypad, etc.) which are configured via this register are
used by many boards.

Something else to consider is that QPNP_REG_OUT_SRC_SEL_MASK is being
defined as 0xf which would imply that there are 16 possible output source
select options.  While technically true, this makes the situation more
complicated since half of those options are the inverted version of the
other half.  In the GPIO hardware this corresponds to an 8-way mux
followed by an XOR gate to conditionally invert the mux output.  If output
source select is handled this way, then the following values would need to
be supported in device tree for GPIOs:
* 0:  constant low (already supported via output-low;)
* 1:  constant high (already supported via output-high;)
* 2:  paired GPIO
* 3:  inverted paired GPIO
* 4:  special function 1
* 5:  inverted special function 1
* 6:  special function 2
* 7:  inverted special function 2
* 8:  dtest1
* 9:  inverted dtest1
* 10: dtest2
* 11: inverted dtest2
* 12: dtest3
* 13: inverted dtest3
* 14: dtest4
* 15: inverted dtest4
The same options are supported by MPPs except for special function 1,
inverted special function 1, special function 2, and inverted special
function 2.

If the output source select register parameter is instead treated as a
3-bit value along with an inversion bit, then the list of output selection
options that needs to be supported in device tree is cut in half:
* 0:  constant (already supported)
* 1:  paired GPIO
* 2:  special function 1
* 3:  special function 2
* 4:  dtest1
* 5:  dtest2
* 6:  dtest3
* 7:  dtest4
Another DT pin configuration property would then need to be used to
specify if the signal should be inverted or not.
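
The relationship between the two numbering schemes above can be sketched as follows. This is a minimal illustration, assuming the inversion flag is the least significant bit of the 4-bit field (which is what the first list implies, since every odd value is the inverted form of the preceding even one). The struct and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of the 4-bit output source select field. */
struct out_src {
    unsigned mux;   /* 0..7: constant, paired GPIO, special function 1, ... */
    bool invert;    /* XOR stage that follows the 8-way mux */
};

/* Split a 4-bit select value (0..15) into mux index and inversion flag. */
static struct out_src out_src_decode(uint8_t sel)
{
    struct out_src s = { .mux = (sel & 0xf) >> 1, .invert = (sel & 1) != 0 };
    return s;
}

/* Recombine a mux index and inversion flag into the 4-bit select value. */
static uint8_t out_src_encode(struct out_src s)
{
    return (uint8_t)(((s.mux & 0x7) << 1) | (s.invert ? 1 : 0));
}
```

Under that assumption, value 3 (inverted paired GPIO) decodes to mux 1 with inversion set, and value 8 (dtest1) decodes to mux 4 with inversion clear, matching the shorter list.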

> + attr[1].addr  = QPNP_REG_MODE_CTL;
> + attr[1].shift = QPNP_REG_MODE_SEL_SHIFT;
> + attr[1].mask  = QPNP_REG_MODE_SEL_MASK;
> + attr[1].val   = QPNP_PIN_MODE_DIG_OUT;
> + attr[2].addr  = QPNP_REG_EN_CTL;
> + attr[2].shift = QPNP_REG_MASTER_EN_SHIFT;
> + attr[2].mask  = QPNP_REG_MASTER_EN_MASK;
> + attr[2].val   = 1;
> + break;

(...)

> +static int qpnp_of_xlate(struct gpio_chip *chip,
> +const struct of_phandle_args *gpio_desc, u32 *flags)
> +{
> + struct qpnp_pinctrl *qctrl = to_qpnp_pinctrl(chip);
> + struct qpnp_padinfo *pad;
> +
> + if (chip->of_gpio_n_cells < 2) {
> + dev_err(qctrl->dev, "of_gpio_n_cells < 2\n");
> + return -EINVAL;
> + }
> +
> + pad = qpnp_get_desc(qctrl, gpio_desc->args[0]);
> + if (!pad)
> + return -EINVAL;
> +
> + if (flags)
> + *flags = gpio_desc->args[1];
> +
> + return gpio_desc->args[0];
> +}

This of_xlate callback function will result in the following situation:
If for example, a device tree consumer node wishes to use PM8941 GPIO 7
within gpiolib, then it would need to specify a gpiospec like this:
<_gpio 6 0>
There is an off-by-one issue with the indexing between the hardware GPIO
numbers (1-based) and the gpiolib gpio offsets (0-based).  Do you agree
that the indexing used within the device tree gpiospecs should match the
hardware numbering scheme?  I feel like this would be much less confusing
for users to work with.  If so, I think that a change to qpnp_of_xlate
like this would achieve it:

+#define QPNP_PIN_PHYSICAL_OFFSET   1

 static int qpnp_of_xlate(struct gpio_chip *chip,
 const struct of_phandle_args *gpio_desc, u32 *flags)
 {
struct qpnp_pinctrl *qctrl = to_qpnp_pinctrl(chip);

Re: [PATCH v2 00/10] Input - wacom: conversion to HID driver, series 2

2014-07-25 Thread Dmitry Torokhov

Hi Benjamin,

On Thu, Jul 24, 2014 at 02:13:55PM -0400, Benjamin Tissoires wrote:
> Hi Dmitry,
> 
> this is the second series I told you about for wacom.ko. This series also has
> a good number of removed lines of code. \o/
> 
> The first patch is Jason's one that I finally decided to take with me. His
> previous submission still applied correctly even after the moving of the files
> (git is definitely awesome).
> 
> The second one is a patch I sent earlier and forgot to include in the v2 of
> the first series. It might have been dropped during my many rebases. So here
> it is.
> 
> The rest is, for one part, enhancing the battery reporting system (to make it
> equal to the one in hid-wacom, and even slightly better). The other part
> is the actual merge of hid-wacom into wacom, which gives the same user space
> API for bluetooth and USB devices, fixes the pad-in-a-separate-input-dev, and
> fixes the missing tools not supported in the previous implementation of
> hid-wacom for Intuos 4 BT.
> 

I ended up taking 3.16-rc6 and applying your first series and the first
5 patches of this series to it. You should be able to see the result on
kernel.org in wacom branch.

Thanks.

-- 
Dmitry

Bug on Boot of Ubuntu 14.04 with kernel 3.16 r6 release

2014-07-25 Thread Nick Krause

Hey Guys,
After building my first rc kernel, I am sad to say it doesn't boot. It
reports that it can't find the root UUID for my SSD boot drive. I am
using GSP with UEFI on a Sandy Bridge i5 2500K build I have had for a
few years. The kernel boots and after a few seconds drops to a busybox
ash shell, stating that it can't find my root device. I am using the
default Ubuntu build for this; it did boot before with x86_64
defconfig, though without my sound. This seems to me to be an issue
with the Ubuntu-built kernel and not upstream, at least from what I can
tell. I will attach my .config to help, though, or my sound drivers.
Nick


config
Description: Binary data

[PATCH 2/3] x86,vdso: Make the PER_CPU segment start out accessed

2014-07-25 Thread Andy Lutomirski

The first userspace attempt to read or write the PER_CPU segment
will write the accessed bit to the GDT.  This is visible to
userspace using the LAR instruction.

Set the segment's accessed bit at boot to keep all userspace GDT
access idempotent.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/vsyscall_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 158cdff..0e2c229 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -305,7 +305,7 @@ static void vsyscall_set_cpu(int cpu)
memset(&d, 0, sizeof(d));
d.limit0 = cpu | ((node & 0xf) << 12);
d.limit = node >> 4;
-   d.type = 4; /* RO data, expand down */
+   d.type = 5; /* RO data, expand down, accessed */
d.dpl = 3;  /* Visible to user code */
d.s = 1;/* Non a system segment */
d.p = 1;/* Present */
-- 
1.9.3


[PATCH 1/3] x86,vdso: Change the PER_CPU segment to use struct desc_struct

2014-07-25 Thread Andy Lutomirski

This makes it easier to see what's going on.  It produces exactly
the same segment descriptor as the old code.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/vsyscall_64.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index e1e1e80..158cdff 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -289,7 +289,7 @@ sigsegv:
  */
 static void vsyscall_set_cpu(int cpu)
 {
-   unsigned long d;
+   struct desc_struct d;
unsigned long node = 0;
 #ifdef CONFIG_NUMA
node = cpu_to_node(cpu);
@@ -298,13 +298,17 @@ static void vsyscall_set_cpu(int cpu)
write_rdtscp_aux((node << 12) | cpu);
 
/*
-* Store cpu number in limit so that it can be loaded quickly
-* in user space in vgetcpu. (12 bits for the CPU and 8 bits for the node)
+* Store cpu number in limit so that it can be loaded
+* quickly in user space in vgetcpu. (12 bits for the CPU
+* and 8 bits for the node)
 */
-   d = 0x0f40000000000ULL;
-   d |= cpu;
-   d |= (node & 0xf) << 12;
-   d |= (node >> 4) << 48;
+   memset(&d, 0, sizeof(d));
+   d.limit0 = cpu | ((node & 0xf) << 12);
+   d.limit = node >> 4;
+   d.type = 4; /* RO data, expand down */
+   d.dpl = 3;  /* Visible to user code */
+   d.s = 1;/* Non a system segment */
+   d.p = 1;/* Present */
 
write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
 }
-- 
1.9.3
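
The claim that the field-by-field form produces the same descriptor as the old magic constant can be checked outside the kernel. The sketch below is an assumption-laden illustration, not kernel code: it rebuilds the 64-bit descriptor with explicit shifts using the standard x86 segment descriptor bit positions (limit 15:0 in bits 0-15, type in bits 40-43, S in bit 44, DPL in bits 45-46, P in bit 47, limit 19:16 in bits 48-51), and both function names are hypothetical. It assumes cpu < 4096 and node < 256.

```c
#include <assert.h>
#include <stdint.h>

/* Old style: start from the magic constant and OR in the limit bits. */
static uint64_t percpu_desc_old(unsigned cpu, unsigned long node)
{
    uint64_t d = 0x0f40000000000ULL;

    d |= cpu;
    d |= (uint64_t)(node & 0xf) << 12;
    d |= (uint64_t)(node >> 4) << 48;
    return d;
}

/* New style: set each descriptor field, then place it at its bit offset. */
static uint64_t percpu_desc_fields(unsigned cpu, unsigned long node)
{
    uint64_t limit0 = (cpu | ((node & 0xf) << 12)) & 0xffff; /* limit 15:0 */
    uint64_t limit  = (node >> 4) & 0xf;                     /* limit 19:16 */
    uint64_t type = 4;  /* RO data, expand down */
    uint64_t s    = 1;  /* code/data (non-system) segment */
    uint64_t dpl  = 3;  /* visible to user code */
    uint64_t p    = 1;  /* present */

    return limit0 | (type << 40) | (s << 44) | (dpl << 45) | (p << 47) |
           (limit << 48);
}
```

Changing `type` from 4 to 5 in the second function sets only bit 40, the accessed bit, which is the change made by patch 2/3.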


[PATCH 3/3] x86,vdso: Make the PER_CPU segment 32 bits

2014-07-25 Thread Andy Lutomirski

IMO users ought not to be able to use 16-bit segments without using
modify_ldt.  Fortunately, it's impossible to break espfix64 by
loading the PER_CPU segment into SS because it's a read-only
segment, but marking it 32-bit seems less fragile.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/vsyscall_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 0e2c229..87ab841 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -309,6 +309,7 @@ static void vsyscall_set_cpu(int cpu)
d.dpl = 3;  /* Visible to user code */
d.s = 1;/* Non a system segment */
d.p = 1;/* Present */
+   d.d = 1;/* 32-bit */
 
write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
 }
-- 
1.9.3


[PATCH 0/3] x86: PER_CPU segment improvements

2014-07-25 Thread Andy Lutomirski

x86 sets up a per-cpu GDT entry so that vgetcpu can use LSL on it
to determine the CPU number and node.

This series, in little baby steps, cleans up that code and sets
the accessed and 32-bit flags on the segment.

The accessed bit prevents user code from setting the accessed bit
on its own, and making the segment 32-bit prevents concerns about
shenanigans involving CPU oddities with 16-bit data segments.

The latter isn't a real problem -- if it were a 16-bit read/write
segment, it could be used to bypass espfix64, but fortunately
RO segments can't be loaded into SS.

Andy Lutomirski (3):
  x86,vdso: Change the PER_CPU segment to use struct desc_struct
  x86,vdso: Make the PER_CPU segment start out accessed
  x86,vdso: Make the PER_CPU segment 32 bits

 arch/x86/kernel/vsyscall_64.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

-- 
1.9.3
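
As a rough illustration of how the CPU number and node travel in the segment limit: the 20-bit limit holds 12 bits of CPU number and 8 bits of node, as the comment in patch 1/3 states. The sketch below only models the packing and unpacking arithmetic. It does not execute LSL, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack as the series describes: CPU in bits 0-11, node in bits 12-19. */
static uint32_t percpu_limit_pack(unsigned cpu, unsigned node)
{
    return (cpu & 0xfff) | ((uint32_t)(node & 0xff) << 12);
}

/* What a reader of the limit would do with the value LSL returns. */
static unsigned percpu_limit_cpu(uint32_t limit)
{
    return limit & 0xfff;
}

static unsigned percpu_limit_node(uint32_t limit)
{
    return limit >> 12;
}
```

For example, CPU 5 on node 0x23 packs to a limit of 0x23005, from which both values can be recovered with one mask and one shift.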


[uclinux-dist-devel] [GIT PULL] Blackfin fixes for v3.16-rc7

2014-07-25 Thread Steven Miao

Hi Linus,

Please pull the Blackfin fixes for v3.16: an SMC NOR flash PM fix, a pinctrl
group fix, a defconfig update, and build fixes.

The following changes since commit 9a3c4145af32125c5ee39c0272662b47307a8323:

  Linux 3.16-rc6 (2014-07-20 21:04:16 -0700)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux.git 
tags/blackfin-3.16-fixes

for you to fetch changes up to b76f98236a23f808d6e3a27f7292670bc1d2c21b:

  blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section 
for XIP kernel (2014-07-26 08:32:50 +0800)


blackfin fixes for v3.16


Sonic Zhang (1):
  blackfin: bind different groups of one pinmux function to different state 
name

Steven Miao (5):
  pm: bf609: cleanup smc nor flash
  blackfin: fix some bf5xx boards build for missing 
  irq: blackfin sec: drop duplicated sec priority set
  defconfig: BF609: update spi config name
  blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data 
section for XIP kernel

 arch/blackfin/configs/BF609-EZKIT_defconfig  |2 +-
 arch/blackfin/kernel/vmlinux.lds.S   |2 +-
 arch/blackfin/mach-bf533/boards/blackstamp.c |1 +
 arch/blackfin/mach-bf537/boards/cm_bf537e.c  |1 +
 arch/blackfin/mach-bf537/boards/cm_bf537u.c  |1 +
 arch/blackfin/mach-bf537/boards/tcm_bf537.c  |1 +
 arch/blackfin/mach-bf548/boards/ezkit.c  |6 --
 arch/blackfin/mach-bf561/boards/acvilon.c|1 +
 arch/blackfin/mach-bf561/boards/cm_bf561.c   |1 +
 arch/blackfin/mach-bf561/boards/ezkit.c  |1 +
 arch/blackfin/mach-bf609/boards/ezkit.c  |   20 
 arch/blackfin/mach-bf609/include/mach/pm.h   |5 +++--
 arch/blackfin/mach-bf609/pm.c|4 ++--
 arch/blackfin/mach-common/ints-priority.c|2 --
 14 files changed, 26 insertions(+), 22 deletions(-)

Re: [PATCH RFC tip/core/rcu] Fix attempt to avoid offloading callbacks unless requested

2014-07-25 Thread Paul E. McKenney

On Fri, Jul 25, 2014 at 08:10:57PM -0400, Pranith Kumar wrote:
> On 07/25/2014 07:36 PM, Paul E. McKenney wrote:
> > [ Note: This applies on top of commit 187497fa5e9e (rcu: Allow for NULL
> > tick_nohz_full_mask when nohz_full= missing) in -tip
> > or -rcu.  To make this work on top of rcu/next, move the
> > call to rcu_organize_nocb_kthreads(rsp) to the end of the
> > for_each_rcu_flavor(rsp) loop in rcu_init_nohz(). ]
> >
> > Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically
> > requested) failed to adjust the callback lists of the CPUs that are
> > known to be no-CBs CPUs only because they are also nohz_full= CPUs.
> > This failure can result in callbacks that are posted during early boot
> > getting stranded on nxtlist for CPUs whose no-CBs property becomes
> > apparent late, and there can also be spurious warnings about offline
> > CPUs posting callbacks.
> >
> > This commit fixes these problems by adding an early-boot rcu_init_nohz()
> > that properly initializes the no-CBs CPUs.
> >
> > Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
> > CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
> > booted without the nohz_full= boot parameter.
> >
> > Signed-off-by: Paul E. McKenney 
> 
> Please find two points below.
> 
> 
> >  #ifdef CONFIG_TREE_PREEMPT_RCU
> > @@ -2451,6 +2424,66 @@ static void do_nocb_deferred_wakeup(struct rcu_data 
> > *rdp)
> >  trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, 
> > TPS("DeferredWakeEmpty"));
> >  }
> >  
> > +void rcu_init_nohz(void)
> > +{
> > +int cpu;
> > +bool need_rcu_nocb_mask = true;
> > +struct rcu_state *rsp;
> > +
> > +#ifdef CONFIG_RCU_NOCB_CPU_NONE
> > +need_rcu_nocb_mask = false;
> > +#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
> > +
> > +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> > +if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
> > +need_rcu_nocb_mask = true;
> > +#endif /* #if defined(CONFIG_NO_HZ_FULL) && 
> > !defined(CONFIG_NO_HZ_FULL_ALL) */
> > +
> > +if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
> > +zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);
> 
> Please check the return value unless you want to increase my commit count ;)

I have already queued an adapted version of your patch on top of this
one, see below.  ;-)

> > +have_rcu_nocb_mask = true;
> > +}
> > +if (!have_rcu_nocb_mask)
> > +return;
> > +
> > +#ifdef CONFIG_RCU_NOCB_CPU_ZERO
> > +pr_info("\tOffload RCU callbacks from CPU 0\n");
> > +cpumask_set_cpu(0, rcu_nocb_mask);
> > +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */
> > +#ifdef CONFIG_RCU_NOCB_CPU_ALL
> > +pr_info("\tOffload RCU callbacks from all CPUs\n");
> > +cpumask_copy(rcu_nocb_mask, cpu_possible_mask);
> > +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
> > +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> > +cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> > +#endif /* #if defined(CONFIG_NO_HZ_FULL) && 
> > !defined(CONFIG_NO_HZ_FULL_ALL) */
> 
> I understand that if CONFIG_NO_HZ_FULL_ALL is set then CONFIG_NOCB_CPU_ALL
> will also be set and there is no need for this cpumask_or().
> 
> Is there any reason for the coupling between CONFIG_NO_HZ_FULL_ALL
> and CONFIG_NOCB_CPU_ALL?
> 
> I ask because a user can override CONFIG_NO_HZ_FULL_ALL=y at boot time
> using the nohz_full= boot time parameter. In this case even if a user marks
> CPU 0 as the only nohz_full cpu, we will offload call backs from all CPUs.
> Is this behavior what you have in mind?

Yep.  The normal setup will be CONFIG_NO_HZ_FULL=y and
CONFIG_NO_HZ_FULL_ALL=n, and that works as you advocate.
If someone builds with CONFIG_NO_HZ_FULL_ALL=y, then they get
all CPUs offloaded.

Longer term, the hope is that offloading is unconditional, so that
CONFIG_NOCB_CPU and friends disappear, but the current code is most
definitely not up to that task yet.

Thanx, Paul



rcu: Check the return value of zalloc_cpumask_var()

This commit checks the return value of the zalloc_cpumask_var() used for
allocating cpumask for rcu_nocb_mask.

Signed-off-by: Pranith Kumar 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 095d6e4d2fd7..5a6398e21bfc 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2440,7 +2440,10 @@ void rcu_init_nohz(void)
 #endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */
 
if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
-   zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);
+   if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
+   pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
+   return;
+   }

Re: [PATCH v2 06/10] Input - wacom: prepare the driver to include BT devices

2014-07-25 Thread Dmitry Torokhov

Hi Benjamin,

On Thu, Jul 24, 2014 at 02:14:01PM -0400, Benjamin Tissoires wrote:
> Now that wacom is a hid driver, there is no point in having a separate
> driver for bluetooth devices.
> This patch prepares the common paths of Bluetooth devices in the
> common wacom driver.
> It also adds the sysfs file "speed" used by Bluetooth devices.
> 
> Signed-off-by: Benjamin Tissoires 
> ---
> 
> new in v2
> 
>  drivers/hid/wacom_sys.c | 70 ++---
>  drivers/hid/wacom_wac.h |  2 ++
>  2 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
> index d0d06b8..add76ec 100644
> --- a/drivers/hid/wacom_sys.c
> +++ b/drivers/hid/wacom_sys.c
> @@ -262,6 +262,12 @@ static int wacom_set_device_mode(struct hid_device *hdev, int report_id,
>   return error < 0 ? error : 0;
>  }
>  
> +static int wacom_bt_query_tablet_data(struct hid_device *hdev, u8 speed,
> + struct wacom_features *features)
> +{
> + return 0;
> +}
> +
>  /*
>   * Switch the tablet into its most-capable mode. Wacom tablets are
>   * typically configured to power-up in a mode which sends mouse-like
> @@ -272,6 +278,9 @@ static int wacom_set_device_mode(struct hid_device *hdev, int report_id,
>  static int wacom_query_tablet_data(struct hid_device *hdev,
>   struct wacom_features *features)
>  {
> + if (hdev->bus == BUS_BLUETOOTH)
> + return wacom_bt_query_tablet_data(hdev, 1, features);
> +
>   if (features->device_type == BTN_TOOL_FINGER) {
>   if (features->type > TABLETPC) {
>   /* MT Tablet PC touch */
> @@ -890,6 +899,38 @@ static void wacom_destroy_battery(struct wacom *wacom)
>   }
>  }
>  
> +static ssize_t wacom_show_speed(struct device *dev,
> + struct device_attribute
> + *attr, char *buf)
> +{
> + struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> + struct wacom *wacom = hid_get_drvdata(hdev);
> +
> + return snprintf(buf, PAGE_SIZE, "%i\n", wacom->wacom_wac.bt_high_speed);
> +}
> +
> +static ssize_t wacom_store_speed(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct hid_device *hdev = container_of(dev, struct hid_device, dev);
> + struct wacom *wacom = hid_get_drvdata(hdev);
> + int new_speed;
> +
> + if (sscanf(buf, "%1d", &new_speed ) != 1)

Checkpatch is unhappy with ')' placement and I agree with it.

> + return -EINVAL;

kstrtou8?

> +
> + if (new_speed == 0 || new_speed == 1) {
> + wacom_bt_query_tablet_data(hdev, new_speed,
> + &wacom->wacom_wac.features);
> + return strnlen(buf, PAGE_SIZE);

This is weird. Normally you want to return count since you should refuse
input with excessive data.

> + } else
> + return -EINVAL;

Need braces on both branches.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
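
The review points about kstrtou8() and returning count can be modelled outside the kernel. What follows is a minimal userspace sketch, not the driver's code: parse_u8() and model_store_speed() are hypothetical names, and parse_u8() only approximates kstrtou8() (base 10, one optional trailing newline, anything else rejected).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for kstrtou8(s, 10, res). */
static int parse_u8(const char *s, unsigned char *res)
{
	char *end;
	unsigned long v;

	if (*s < '0' || *s > '9')	/* no sign, no leading whitespace */
		return -EINVAL;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno)
		return -ERANGE;
	if (*end == '\n')		/* tolerate the newline that echo appends */
		end++;
	if (*end != '\0')		/* refuse input with excessive data */
		return -EINVAL;
	if (v > 255)
		return -ERANGE;
	*res = (unsigned char)v;
	return 0;
}

/* Model of a sysfs store handler for an attribute that accepts only 0 or 1. */
ssize_t model_store_speed(const char *buf, size_t count, unsigned char *speed)
{
	unsigned char v;
	int err;

	assert(buf != NULL && speed != NULL);

	err = parse_u8(buf, &v);
	if (err)
		return err;
	if (v > 1)
		return -EINVAL;

	*speed = v;
	return (ssize_t)count;		/* the whole write was consumed */
}
```

Because trailing garbage is rejected up front, returning count on success is accurate: the handler either consumes the entire write or fails it.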

Re: [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-25 Thread Dave Chinner

On Fri, Jul 25, 2014 at 10:52:57AM -0700, Zach Brown wrote:
> On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote:
> > Hi all,
> > 
> > The topic of a readdirplus-like syscall had come up for discussion at last year's
> > LSF/MM collab summit. I wrote a couple of syscalls with their GFS2 implementations
> > to get at a directory's entries as well as stat() info on the individual inodes.
> > I'm presenting these patches and some early test results on a single-node GFS2
> > filesystem.
> > 
> > 1. dirreadahead() - This patchset is very simple compared to the xgetdents() system
> > call below and scales very well for large directories in GFS2. dirreadahead() is
> > designed to be called prior to getdents+stat operations.
> 
> Hmm.  Have you tried plumbing these read-ahead calls in under the normal
> getdents() syscalls?

The issue is not directory block readahead (which some filesystems
like XFS already have), but issuing inode readahead during the
getdents() syscall.

It's the semi-random, interleaved inode IO that is being optimised
here (i.e. queued, ordered, issued, cached), not the directory
blocks themselves. As such, why does this need to be done in the
kernel?  This can all be done in userspace, and even hidden within
the readdir() or ftw()/nftw() implementations themselves so it's OS,
kernel and filesystem independent.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
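
One way to picture the userspace approach is to collect the directory entries first, order them, and only then issue the stat calls. The sketch below is an illustration under assumptions, not a proposal from the thread: sorting by inode number is an assumed ordering heuristic whose benefit depends on the filesystem, stat_dir_in_inode_order() is a hypothetical name, and the stats are issued synchronously rather than queued as real readahead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

struct ent {
	ino_t ino;
	char *name;
};

static int cmp_ino(const void *a, const void *b)
{
	const struct ent *x = a, *y = b;

	return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Read every entry of 'path', sort the entries by inode number, then
 * stat them in that order. Returns the number of entries stat'ed, or
 * -1 on error.
 */
long stat_dir_in_inode_order(const char *path)
{
	struct ent *v = NULL, *tmp;
	size_t n = 0, cap = 0, i;
	struct dirent *de;
	struct stat st;
	long done = 0;
	DIR *d;

	assert(path != NULL);

	d = opendir(path);
	if (!d)
		return -1;

	/* Pass 1: collect names and inode numbers (directory blocks only). */
	while ((de = readdir(d)) != NULL) {
		if (n == cap) {
			cap = cap ? cap * 2 : 64;
			tmp = realloc(v, cap * sizeof(*v));
			if (!tmp) {
				done = -1;
				goto out;
			}
			v = tmp;
		}
		v[n].name = strdup(de->d_name);
		if (!v[n].name) {
			done = -1;
			goto out;
		}
		v[n].ino = de->d_ino;
		n++;
	}

	/* Pass 2: order the inode lookups instead of taking readdir order. */
	if (n > 1)
		qsort(v, n, sizeof(*v), cmp_ino);

	for (i = 0; i < n; i++)
		if (fstatat(dirfd(d), v[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0)
			done++;
out:
	for (i = 0; i < n; i++)
		free(v[i].name);
	free(v);
	closedir(d);
	return done;
}
```

A library could hide the same two-pass structure behind readdir() or nftw(), which is what keeps it independent of kernel and filesystem.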

Re: [PATCH 1/1] neighbour : fix ndm_type type error issue

2014-07-25 Thread Jun Zhao

On Sat, 2014-07-26 at 01:24 +0200, Hannes Frederic Sowa wrote:
> On Fri, Jul 25, 2014, at 18:38, Jun Zhao wrote:
> > ndm_type means L3 address type, in neighbour proxy and vxlan, it's
> > RTN_UNICAST.
> > NDA_DST is for netlink TLV type, hence it's not right value in this
> > context.
> 
> The value of NDA_DST == RTN_UNICAST, otherwise we couldn't do this
> change as it would alter e.g. arpd behavior.
> 
> Acked-by: Hannes Frederic Sowa 
> 
> Thanks,
> Hannes

But I think NDA_DST and RTN_UNICAST have different meanings in this context,
even though their values are equal.

In the ARP proxy/NDP proxy context, ndm_type means the peer's L3 address type,
so RTN_UNICAST is the right value. vxlan has similar semantics for the
remote IP.

BTW: implicitly relying on NDA_DST == RTN_UNICAST in the source code may
not be a good idea when there is no comment or other explanation.
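
One way to make the numeric coincidence explicit is a compile-time check next to the assignment. This is a userspace sketch against the exported uapi headers, assuming they are installed; make_proxy_ndmsg() is a hypothetical helper, not a function from the kernel or from iproute2.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/neighbour.h>
#include <linux/rtnetlink.h>

/*
 * ndm_type carries an RTN_* address type, not an NDA_* attribute type.
 * The two constants merely share a value; state that here so a change
 * to either enum breaks the build instead of going unnoticed.
 */
_Static_assert(NDA_DST == RTN_UNICAST,
	       "NDA_DST and RTN_UNICAST are expected to share a value");

/* Hypothetical helper: header for a proxy neighbour entry. */
struct ndmsg make_proxy_ndmsg(unsigned char family, int ifindex)
{
	struct ndmsg ndm = {0};

	assert(ifindex > 0);	/* interface indexes start at 1 */

	ndm.ndm_family = family;
	ndm.ndm_ifindex = ifindex;
	ndm.ndm_flags = NTF_PROXY;
	ndm.ndm_type = RTN_UNICAST;	/* address type of the peer */
	return ndm;
}
```

The assertion documents why swapping one constant for the other does not alter behaviour for existing users such as arpd, while the assignment itself states the intended meaning.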


[PATCH] Input: synaptics - properly initialize slots for semi-MT

2014-07-25 Thread Dmitry Torokhov

Semi-MT devices are pointers too, so let's tell that to
input_mt_init_slots(), as well as let it set up the devices as semi-MT,
instead of us doing it manually.

Reviewed-by: Daniel Kurtz 
Reviewed-by: Benson Leung 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/mouse/synaptics.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
index ef9e0b8..fe607e9 100644
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -1371,11 +1371,11 @@ static void set_input_params(struct psmouse *psmouse,
__set_bit(BTN_TOOL_QUADTAP, dev->keybit);
__set_bit(BTN_TOOL_QUINTTAP, dev->keybit);
} else if (SYN_CAP_ADV_GESTURE(priv->ext_cap_0c)) {
-   /* Non-image sensors with AGM use semi-mt */
-   __set_bit(INPUT_PROP_SEMI_MT, dev->propbit);
-   input_mt_init_slots(dev, 2, 0);
set_abs_position_params(dev, priv, ABS_MT_POSITION_X,
ABS_MT_POSITION_Y);
+   /* Non-image sensors with AGM use semi-mt */
+   input_mt_init_slots(dev, 2,
+   INPUT_MT_POINTER | INPUT_MT_SEMI_MT);
}
 
if (SYN_CAP_PALMDETECT(priv->capabilities))
-- 
2.0.0.526.g5318336


-- 
Dmitry

Re: [GIT PULL] Keyrings: PKCS#7 fixup

2014-07-25 Thread James Morris

On Fri, 25 Jul 2014, David Howells wrote:

> Hi James,
> 
> Here's a fixup for the problem that Stephen spotted.
> 
> David
> ---
> The following changes since commit 633706a2ee81637be37b6bc02c5336950cc163b5:
> 
>   Merge branch 'keys-fixes' into keys-next (2014-07-22 21:55:45 +0100)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/keys-next-20140725
> 
> for you to fetch changes up to 8f3438ccea149647ad1849651d1e14c7b8b85e63:
> 
>   PKCS#7: Missing inclusion of linux/err.h (2014-07-25 11:33:53 +0100)

Thanks, pulled.


-- 
James Morris



Re: [PATCH RFC tip/core/rcu] Fix attempt to avoid offloading callbacks unless requested

2014-07-25 Thread Pranith Kumar

On 07/25/2014 07:36 PM, Paul E. McKenney wrote:
> [ Note: This applies on top of commit 187497fa5e9e (rcu: Allow for NULL
> tick_nohz_full_mask when nohz_full= missing) in -tip
> or -rcu.  To make this work on top of rcu/next, move the
> call to rcu_organize_nocb_kthreads(rsp) to the end of the
> for_each_rcu_flavor(rsp) loop in rcu_init_nohz(). ]
>
> Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically
> requested) failed to adjust the callback lists of the CPUs that are
> known to be no-CBs CPUs only because they are also nohz_full= CPUs.
> This failure can result in callbacks that are posted during early boot
> getting stranded on nxtlist for CPUs whose no-CBs property becomes
> apparent late, and there can also be spurious warnings about offline
> CPUs posting callbacks.
>
> This commit fixes these problems by adding an early-boot rcu_init_nohz()
> that properly initializes the no-CBs CPUs.
>
> Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
> CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
> booted without the nohz_full= boot parameter.
>
> Signed-off-by: Paul E. McKenney 

Please find two points below.


>  #ifdef CONFIG_TREE_PREEMPT_RCU
> @@ -2451,6 +2424,66 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
>  trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty"));
>  }
>  
> +void rcu_init_nohz(void)
> +{
> +int cpu;
> +bool need_rcu_nocb_mask = true;
> +struct rcu_state *rsp;
> +
> +#ifdef CONFIG_RCU_NOCB_CPU_NONE
> +need_rcu_nocb_mask = false;
> +#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
> +
> +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> +if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
> +need_rcu_nocb_mask = true;
> +#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */
> +
> +if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
> +zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);

Please check the return value unless you want to increase my commit count ;)

>
> +have_rcu_nocb_mask = true;
> +}
> +if (!have_rcu_nocb_mask)
> +return;
> +
> +#ifdef CONFIG_RCU_NOCB_CPU_ZERO
> +pr_info("\tOffload RCU callbacks from CPU 0\n");
> +cpumask_set_cpu(0, rcu_nocb_mask);
> +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */
> +#ifdef CONFIG_RCU_NOCB_CPU_ALL
> +pr_info("\tOffload RCU callbacks from all CPUs\n");
> +cpumask_copy(rcu_nocb_mask, cpu_possible_mask);
> +#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
> +#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> +cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> +#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */

I understand that if CONFIG_NO_HZ_FULL_ALL is set then CONFIG_NOCB_CPU_ALL
will also be set and there is no need for this cpumask_or().

Is there any reason for the coupling between CONFIG_NO_HZ_FULL_ALL
and CONFIG_NOCB_CPU_ALL?

I ask because a user can override CONFIG_NO_HZ_FULL_ALL=y at boot time
using the nohz_full= boot time parameter. In this case even if a user marks
CPU 0 as the only nohz_full cpu, we will offload call backs from all CPUs.
Is this behavior what you have in mind?

--
Pranith

>
> +
> +if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
> +pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n");
> +cpumask_and(rcu_nocb_mask, cpu_possible_mask,
> +rcu_nocb_mask);
> +}
> +cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask);
> +pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf);
> +if (rcu_nocb_poll)
> +pr_info("\tPoll for callbacks from no-CBs CPUs.\n");
> +
> +for_each_rcu_flavor(rsp) {
> +for_each_cpu(cpu, rcu_nocb_mask) {
> +struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> +
> +/*
> + * If there are early callbacks, they will need
> + * to be moved to the nocb lists.
> + */
> +WARN_ON_ONCE(rdp->nxttail[RCU_NEXT_TAIL] !=
> + &rdp->nxtlist &&
> + rdp->nxttail[RCU_NEXT_TAIL] != NULL);
> +init_nocb_callback_list(rdp);
> +}
> +}
> +}
> +
>  /* Initialize per-rcu_data variables for no-CBs CPUs. */
>  static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
>  {
> @@ -2479,10 +2512,6 @@ static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp)
>  
>  if (rcu_nocb_mask == NULL)
>  return;
> -#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
> -if (tick_nohz_full_running)
> -cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
> -#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */
>  if (ls == -1) {
>  ls = int_sqrt(nr_cpu_ids);
>  rcu_nocb_leader_stride =
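
The way the quoted rcu_init_nohz() assembles the no-CBs mask can be followed in a small bitmask model. This is a simplified sketch, not the kernel code: one bit stands for one CPU, enum nocb_choice and model_nocb_mask() are hypothetical names, a zero boot_mask is taken to mean that rcu_nocbs= was not given, and the #ifdef that compiles out the OR step for CONFIG_NO_HZ_FULL_ALL is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CONFIG_RCU_NOCB_CPU_{NONE,ZERO,ALL} choice. */
enum nocb_choice { NOCB_NONE, NOCB_ZERO, NOCB_ALL };

/*
 * boot_mask models rcu_nocbs=, nohz_full models tick_nohz_full_mask and
 * possible models cpu_possible_mask. Returns the set of offloaded CPUs.
 */
uint64_t model_nocb_mask(enum nocb_choice choice, uint64_t boot_mask,
			 uint64_t nohz_full, uint64_t possible)
{
	bool have_mask = boot_mask != 0;
	bool need_mask = choice != NOCB_NONE;
	uint64_t mask;

	assert(possible != 0);

	/* nohz_full= CPUs need offloading even with the NONE choice. */
	if (nohz_full)
		need_mask = true;

	if (!have_mask && !need_mask)
		return 0;

	mask = boot_mask;
	if (choice == NOCB_ZERO)
		mask |= 1;		/* cpumask_set_cpu(0, ...) */
	if (choice == NOCB_ALL)
		mask = possible;	/* cpumask_copy(..., cpu_possible_mask) */

	mask |= nohz_full;		/* cpumask_or() with the nohz_full CPUs */
	mask &= possible;		/* drop nonexistent CPUs */
	return mask;
}
```

With four possible CPUs (0xF), the NONE choice plus nohz_full CPUs 1 and 2 (0x6) offloads exactly those two, while the ALL choice offloads all four regardless of nohz_full=, which is the coupling the question above is about.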

[PATCH] parport: fix menu breakage

2014-07-25 Thread Randy Dunlap

From: Randy Dunlap 

Fixes: d90c3eb31535 "Kconfig cleanup (PARPORT_PC dependencies)"

Do not split the PARPORT-related symbols with the new kconfig
symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect
display of these symbols -- they were not being displayed together
as they should be.

Signed-off-by: Randy Dunlap 
Cc: Mark Salter 
Cc: Ingo Molnar 
Cc: sta...@vger.kernel.org # for 3.13, 3.14, 3.15
---
 drivers/parport/Kconfig |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: lnx-316-rc6/drivers/parport/Kconfig
===
--- lnx-316-rc6.orig/drivers/parport/Kconfig
+++ lnx-316-rc6/drivers/parport/Kconfig
@@ -5,6 +5,12 @@
 # Parport configuration.
 #
 
+config ARCH_MIGHT_HAVE_PC_PARPORT
+   bool
+   help
+ Select this config option from the architecture Kconfig if
+ the architecture might have PC parallel port hardware.
+
 menuconfig PARPORT
tristate "Parallel port support"
depends on HAS_IOMEM
@@ -31,12 +37,6 @@ menuconfig PARPORT
 
  If unsure, say Y.
 
-config ARCH_MIGHT_HAVE_PC_PARPORT
-   bool
-   help
- Select this config option from the architecture Kconfig if
- the architecture might have PC parallel port hardware.
-
 if PARPORT
 
 config PARPORT_PC

Re: Kernel 3.16-rc6 Bug with Sound?

2014-07-25 Thread Valdis . Kletnieks

On Fri, 25 Jul 2014 18:57:33 -0400, Nick Krause said:

> Hey guys after compiling and running the kernel in the subject line I
> get no sound
> and a message of no sound codec could be found. I am new so this may be
> a  missed needed config for sound or it's a bug. I am attaching my config
> to help you guys out :).

You want to *really* be helpful, you could tell us what lspci/lsusb/etc
say your sound card is, so we can tell you which driver you forgot to
include in your kernel config. :)





Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread Andy Lutomirski

On Fri, Jul 25, 2014 at 4:43 PM, H. Peter Anvin  wrote:
> On 07/25/2014 11:30 AM, Andy Lutomirski wrote:
>> - 32-bit GDT code segments [huge attack surface]
>> - 64-bit GDT code segments [probably pointless]
>
> I presume you mean s/GDT/LDT/.
>
> We already don't allow 64-bit LDT code segments.  Also, it is unclear to
> me how 32-bit LDT segments have a huge attack surface, given that there
> will realistically always be a 32-bit *GDT* segment present.

I really did mean GDT :)  Setting the 32-bit code segment to "not
present" (and using seccomp to block modify_ldt) prevents any attempt
to exploit bugs in the sysenter and cstar code.  It also might prevent
exploiting CPU bugs, although I've never heard of a relevant CPU bug
in this area.

If I actually tried to implement this (which wouldn't be part of the
initial implementation), I'd split out the unusual things in
__switch_to and friends to a slow path that's only used if weird
settings are present (e.g. this, TSC restrictions, etc).  But
twiddling the present bit on a GDT entry is very fast, I assume --
it's just memory, and I don't think that any flush is needed.

Also, if I implement this, I will curse Xen.  I might even go so far
as to disable the feature entirely if there's a paravirt GDT.

Hmm.  A separate flag to turn int $0x80 into GPF could have some value, too.

--Andy
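
For reference, the "present" bit discussed here is bit 47 of an 8-byte x86 segment descriptor (bit 7 of the access byte). The sketch below only manipulates a descriptor value in ordinary memory; it does not touch a real GDT, and desc_set_present() and desc_is_present() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* P (segment present) flag: bit 47 of the 64-bit descriptor. */
#define DESC_PRESENT (UINT64_C(1) << 47)

/* Return a copy of 'desc' with the present bit set or cleared. */
uint64_t desc_set_present(uint64_t desc, bool present)
{
	return present ? (desc | DESC_PRESENT) : (desc & ~DESC_PRESENT);
}

bool desc_is_present(uint64_t desc)
{
	return (desc & DESC_PRESENT) != 0;
}
```

For a flat ring-3 32-bit code descriptor such as 0x00cffb000000ffff, clearing the bit gives 0x00cf7b000000ffff: only the access byte changes, from 0xfb to 0x7b, which is why the update amounts to a single store.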

Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread H. Peter Anvin

On 07/25/2014 11:30 AM, Andy Lutomirski wrote:
> - 32-bit GDT code segments [huge attack surface]
> - 64-bit GDT code segments [probably pointless]

I presume you mean s/GDT/LDT/.

We already don't allow 64-bit LDT code segments.  Also, it is unclear to
me how 32-bit LDT segments have a huge attack surface, given that there
will realistically always be a 32-bit *GDT* segment present.

-hpa


[tip:x86/vdso] x86_64/vsyscall: Fix warn_bad_vsyscall log output

2014-07-25 Thread tip-bot for Andy Lutomirski

Commit-ID:  53b884ac3745353de220d92ef792515c3ae692f0
Gitweb: http://git.kernel.org/tip/53b884ac3745353de220d92ef792515c3ae692f0
Author: Andy Lutomirski 
AuthorDate: Fri, 25 Jul 2014 16:30:27 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 25 Jul 2014 16:34:15 -0700

x86_64/vsyscall: Fix warn_bad_vsyscall log output

This commit in Linux 3.6:

commit c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655
Author: Joe Perches 
Date:   Mon May 21 19:50:07 2012 -0700

    x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>

caused warn_bad_vsyscall to output garbage in the middle of the
line.  Revert the bad part of it.

The printk in question isn't actually bare; the level is "%s".

The bug this fixes is purely cosmetic; backports are optional.

Cc: sta...@vger.kernel.org # v3.6+
Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/kernel/vsyscall_64.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index ea5b570..e1e1e80 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -81,10 +81,10 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
 	if (!show_unhandled_signals)
 		return;
 
-	pr_notice_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
-			      level, current->comm, task_pid_nr(current),
-			      message, regs->ip, regs->cs,
-			      regs->sp, regs->ax, regs->si, regs->di);
+	printk_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+			   level, current->comm, task_pid_nr(current),
+			   message, regs->ip, regs->cs,
+			   regs->sp, regs->ax, regs->si, regs->di);
 }
 
 static int addr_to_vsyscall_nr(unsigned long addr)

[tip:x86/vdso] x86/vdso: Set VM_MAYREAD for the vvar vma

2014-07-25 Thread tip-bot for Andy Lutomirski

Commit-ID:  ac379835e820de27429b5c4eadf4c1b40320cff4
Gitweb: http://git.kernel.org/tip/ac379835e820de27429b5c4eadf4c1b40320cff4
Author: Andy Lutomirski 
AuthorDate: Fri, 25 Jul 2014 16:27:01 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 25 Jul 2014 16:32:53 -0700

x86/vdso: Set VM_MAYREAD for the vvar vma

The VVAR area can, obviously, be read; that is kind of the point.

AFAIK this has no effect whatsoever unless x86 suddenly turns into a
nommu architecture.  Nonetheless, not setting it is suspicious.

Reported-by: Nathan Lynch 
Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/e4c8bf4bc2725bda22c4a4b7d0c82adcd8f8d9b8.1406330779.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/vma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index dbef622..970463b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -138,7 +138,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 	vma = _install_special_mapping(mm,
 				       addr,
 				       -image->sym_vvar_start,
-				       VM_READ,
+				       VM_READ|VM_MAYREAD,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {

[PATCH RFC tip/core/rcu] Fix attempt to avoid offloading callbacks unless requested

2014-07-25 Thread Paul E. McKenney

[ Note: This applies on top of commit 187497fa5e9e (rcu: Allow for NULL
tick_nohz_full_mask when nohz_full= missing) in -tip
or -rcu.  To make this work on top of rcu/next, move the
call to rcu_organize_nocb_kthreads(rsp) to the end of the
for_each_rcu_flavor(rsp) loop in rcu_init_nohz(). ]

Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically
requested) failed to adjust the callback lists of the CPUs that are
known to be no-CBs CPUs only because they are also nohz_full= CPUs.
This failure can result in callbacks that are posted during early boot
getting stranded on nxtlist for CPUs whose no-CBs property becomes
apparent late, and there can also be spurious warnings about offline
CPUs posting callbacks.

This commit fixes these problems by adding an early-boot rcu_init_nohz()
that properly initializes the no-CBs CPUs.

Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
booted without the nohz_full= boot parameter.

Signed-off-by: Paul E. McKenney 

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d231aa17b1d7..cc7bed1c90dc 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -269,6 +269,14 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
 					 struct task_struct *next) { }
 #endif /* CONFIG_RCU_USER_QS */
 
+#ifdef CONFIG_RCU_NOCB_CPU
+void rcu_init_nohz(void);
+#else /* #ifdef CONFIG_RCU_NOCB_CPU */
+static inline void rcu_init_nohz(void)
+{
+}
+#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
+
 /**
  * RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
  * @a: Code that RCU needs to pay attention to.
diff --git a/init/main.c b/init/main.c
index e8ae1fef0908..5d8c83ae6c55 100644
--- a/init/main.c
+++ b/init/main.c
@@ -577,6 +577,7 @@ asmlinkage __visible void __init start_kernel(void)
idr_init_cache();
rcu_init();
tick_nohz_init();
+   rcu_init_nohz();
context_tracking_init();
radix_tree_init();
/* init some links before init_ISA_irqs() */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 00dc411e9676..095d6e4d2fd7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -85,33 +85,6 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tBoot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf);
if (nr_cpu_ids != NR_CPUS)
pr_info("\tRCU restricting CPUs from NR_CPUS=%d to nr_cpu_ids=%d.\n", NR_CPUS, nr_cpu_ids);
-#ifdef CONFIG_RCU_NOCB_CPU
-#ifndef CONFIG_RCU_NOCB_CPU_NONE
-   if (!have_rcu_nocb_mask) {
-   zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);
-   have_rcu_nocb_mask = true;
-   }
-#ifdef CONFIG_RCU_NOCB_CPU_ZERO
-   pr_info("\tOffload RCU callbacks from CPU 0\n");
-   cpumask_set_cpu(0, rcu_nocb_mask);
-#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */
-#ifdef CONFIG_RCU_NOCB_CPU_ALL
-   pr_info("\tOffload RCU callbacks from all CPUs\n");
-   cpumask_copy(rcu_nocb_mask, cpu_possible_mask);
-#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
-#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
-   if (have_rcu_nocb_mask) {
-   if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
-   pr_info("\tNote: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.\n");
-   cpumask_and(rcu_nocb_mask, cpu_possible_mask,
-   rcu_nocb_mask);
-   }
-   cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask);
-   pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf);
-   if (rcu_nocb_poll)
-   pr_info("\tPoll for callbacks from no-CBs CPUs.\n");
-   }
-#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 }
 
 #ifdef CONFIG_TREE_PREEMPT_RCU
@@ -2451,6 +2424,66 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty"));
 }
 
+void rcu_init_nohz(void)
+{
+   int cpu;
+   bool need_rcu_nocb_mask = true;
+   struct rcu_state *rsp;
+
+#ifdef CONFIG_RCU_NOCB_CPU_NONE
+   need_rcu_nocb_mask = false;
+#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
+
+#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
+   if (tick_nohz_full_running && cpumask_weight(tick_nohz_full_mask))
+   need_rcu_nocb_mask = true;
+#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */
+
+   if (!have_rcu_nocb_mask && need_rcu_nocb_mask) {
+   zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL);
+   have_rcu_nocb_mask = true;
+   }
+   if (!have_rcu_nocb_mask)
+   return;
+
+#ifdef CONFIG_RCU_NOCB_CPU_ZERO
+   pr_info("\tOffload RCU callbacks from CPU 0\n");
+   cpumask_set_cpu(0,

[PATCH] x86_64,vsyscall: Fix warn_bad_vsyscall log output

2014-07-25 Thread Andy Lutomirski

This commit in Linux 3.6:

commit c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655
Author: Joe Perches 
Date:   Mon May 21 19:50:07 2012 -0700

    x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>

caused warn_bad_vsyscall to output garbage in the middle of the
line.  Revert the bad part of it.

The printk in question isn't actually bare; the level is "%s".

The bug this fixes is purely cosmetic; backports are optional.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/vsyscall_64.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index ea5b570..e1e1e80 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -81,10 +81,10 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
 	if (!show_unhandled_signals)
 		return;
 
-	pr_notice_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
-			      level, current->comm, task_pid_nr(current),
-			      message, regs->ip, regs->cs,
-			      regs->sp, regs->ax, regs->si, regs->di);
+	printk_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+			   level, current->comm, task_pid_nr(current),
+			   message, regs->ip, regs->cs,
+			   regs->sp, regs->ax, regs->si, regs->di);
 }
 
 static int addr_to_vsyscall_nr(unsigned long addr)
-- 
1.9.3
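
Why a fixed-level helper clashes with a "%s" level argument can be seen in a small userspace model. This is a simplified sketch under assumptions, not the printk implementation: it assumes a level prefix is recognised only at the very start of the formatted text, the model_* functions are hypothetical names, and "bad vsyscall" is a placeholder message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Level prefixes as defined by the kernel: SOH followed by a digit. */
#define KERN_SOH	"\001"
#define KERN_NOTICE	KERN_SOH "5"
#define KERN_INFO	KERN_SOH "6"

/* Model: strip a single level prefix, and only at the start of the text. */
const char *model_skip_level(const char *text)
{
	if (text[0] == '\001' && text[1] >= '0' && text[1] <= '7')
		return text + 2;
	return text;
}

/* Count the control bytes that would still reach the log output. */
int model_stray_soh(const char *text)
{
	int n = 0;

	for (text = model_skip_level(text); *text; text++)
		if (*text == '\001')
			n++;
	return n;
}

/* Shape of the pr_notice_ratelimited() call: fixed prefix plus "%s" level. */
int model_format_pr_notice(char *out, size_t len, const char *level,
			   const char *comm, int pid)
{
	return snprintf(out, len, KERN_NOTICE "%s%s[%d] bad vsyscall\n",
			level, comm, pid);
}

/* Shape of the printk_ratelimited() call: the "%s" level comes first. */
int model_format_printk(char *out, size_t len, const char *level,
			const char *comm, int pid)
{
	return snprintf(out, len, "%s%s[%d] bad vsyscall\n", level, comm, pid);
}
```

In the pr_notice shape the caller-supplied level lands behind KERN_NOTICE, so it is no longer at the start and its bytes are emitted as text. In the plain printk shape it is the first thing in the formatted string and is consumed as the level.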


Re: [RFC PATCH 1/1] rcu: Use rcu_gp_kthread_wake() to wake up kthreads

2014-07-25 Thread Pranith Kumar


On 07/25/2014 07:15 PM, Paul E. McKenney wrote:
> On Fri, Jul 25, 2014 at 06:23:41PM -0400, Pranith Kumar wrote:
>> Here total is the total number of times we enter the function rcu_report_qs_rsp()
>> and unnecessary is the times we call wake_up() unnecessarily.
>> case1, 2, 3 are the cases I listed above.
>>
>> Note that the frequency has gone way up compared to before, I am not sure why that is.
>>
>> *ALL* the wakeups seem to be unnecessary from that location. And the
>> main reason is that gp_flags is 0.
>>
>> My rcugp file has the following:
>>
>> completed=257515  gpnum=257516  age=1  max=1684
>>
>> Thoughts?
> Hard to believe in the rcutorture case.  My guess was that rcutorture was
> doing about 9000 wakeups, 2000 of which were unnecessary.  Which would
> of course still tilt things very much in favor of your patch.
>
> I am not surprised in the mostly-idle case, as the RCU grace-period
> kthread would most likely be the one ending the grace period, which
> would therefore almost always be a self-wakeup.
>
> Any chance of a peek at your debugging code?
>
>   Thanx, Paul
>

Sure, I am also attaching my dmesg output. Hope it helps!

--
Pranith

---
 kernel/rcu/tree.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 946d47b..10ac44e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1936,8 +1936,27 @@ static bool rcu_start_gp(struct rcu_state *rsp)
 static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
 __releases(rcu_get_root(rsp)->lock)
 {
+	static unsigned long total_wakeups = 0, unnecessary_wakeups = 0;
+	static unsigned long case1 = 0, case2 = 0, case3 = 0;
+
 	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
+	total_wakeups++;
+
+	if (current == rsp->gp_kthread ||
+	    !ACCESS_ONCE(rsp->gp_flags) ||
+	    !rsp->gp_kthread) {
+
+		unnecessary_wakeups++;
+		if (current == rsp->gp_kthread) case1++;
+		if (!ACCESS_ONCE(rsp->gp_flags)) case2++;
+		if (!rsp->gp_kthread) case3++;
+
+		if (unnecessary_wakeups % 2000 == 0)
+			pr_info("Total:%lu, unnecessary:%lu, case1:%lu, case2:%lu, case3:%lu\n",
+				total_wakeups, unnecessary_wakeups, case1, case2, case3);
+
+	}
 	wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
 }
 
-- 
2.0.1
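
The three conditions counted by the debugging code can be folded into a single predicate that decides whether a wakeup is worth issuing. The following is a userspace model with hypothetical types and names (model_task, model_rcu_state, model_wakeup_needed()); it sketches the test only and leaves out locking and the ACCESS_ONCE() annotations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and rcu_state. */
struct model_task {
	int id;
};

struct model_rcu_state {
	const struct model_task *gp_kthread;	/* grace-period kthread, if any */
	unsigned int gp_flags;			/* pending requests for it */
};

/* Returns true only when waking the grace-period kthread can do useful work. */
bool model_wakeup_needed(const struct model_rcu_state *rsp,
			 const struct model_task *current_task)
{
	assert(rsp != NULL);

	if (current_task == rsp->gp_kthread)	/* case 1: self-wakeup */
		return false;
	if (!rsp->gp_flags)			/* case 2: nothing requested */
		return false;
	if (!rsp->gp_kthread)			/* case 3: kthread not yet spawned */
		return false;
	return true;
}
```

A caller would invoke wake_up() only when the predicate returns true, which skips every wakeup the counters above classify as unnecessary.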
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.0-rc6-next-20140725+ (pranith@homedesk) (gcc version 4.9.1 (Debian 4.9.1-1) ) #10 SMP Fri Jul 25 16:53:16 EDT 2014
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-rc6-next-20140725+ root=UUID=3a4fa5bb-491b-41db-829c-a075c262c775 ro quiet
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009c3ff] usable
[0.00] BIOS-e820: [mem 0x0009c400-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xde881fff] usable
[0.00] BIOS-e820: [mem 0xde882000-0xdea7] reserved
[0.00] BIOS-e820: [mem 0xdea8-0xdea8] ACPI data
[0.00] BIOS-e820: [mem 0xdea9-0xdeba0fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdeba1000-0xdee8efff] reserved
[0.00] BIOS-e820: [mem 0xdee8f000-0xdee8] usable
[0.00] BIOS-e820: [mem 0xdee9-0xdeed2fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdeed3000-0xdf7f] usable
[0.00] BIOS-e820: [mem 0xf000-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00041eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI:  /DZ77GA-70K, BIOS GAZ7711H.86A.0059.2012.1106.1053 11/06/2012
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x41f000 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[

Re: [PATCH] x86,vdso: Set VM_MAYREAD for the vvar vma

2014-07-25 Thread Andy Lutomirski

On Fri, Jul 25, 2014 at 4:27 PM, Andy Lutomirski  wrote:
> AFAIK this has no effect whatsoever unless x86 suddenly turns into a
> nommu architecture.  Nonetheless, not setting it is suspicious.

Sorry, forgot to mention: this is based on tip/x86/vdso.

--Andy

>
> Reported-by: Nathan Lynch 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/vdso/vma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
> index dbef622..970463b 100644
> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -138,7 +138,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
> vma = _install_special_mapping(mm,
>addr,
>-image->sym_vvar_start,
> -  VM_READ,
> +  VM_READ|VM_MAYREAD,
>_mapping);
>
> if (IS_ERR(vma)) {
> --
> 1.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86,vdso: Set VM_MAYREAD for the vvar vma

2014-07-25 Thread Andy Lutomirski

AFAIK this has no effect whatsoever unless x86 suddenly turns into a
nommu architecture.  Nonetheless, not setting it is suspicious.

Reported-by: Nathan Lynch 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/vdso/vma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index dbef622..970463b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -138,7 +138,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
vma = _install_special_mapping(mm,
   addr,
   -image->sym_vvar_start,
-  VM_READ,
+  VM_READ|VM_MAYREAD,
   _mapping);
 
if (IS_ERR(vma)) {
-- 
1.9.3


Re: [PATCH 1/1] neighbour : fix ndm_type type error issue

2014-07-25 Thread Hannes Frederic Sowa

On Fri, Jul 25, 2014, at 18:38, Jun Zhao wrote:
> ndm_type means L3 address type, in neighbour proxy and vxlan, it's
> RTN_UNICAST.
> NDA_DST is for netlink TLV type, hence it's not right value in this
> context.

The value of NDA_DST == RTN_UNICAST, otherwise we couldn't do this
change as it would alter e.g. arpd behavior.

Acked-by: Hannes Frederic Sowa 

Thanks,
Hannes
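
The equality Hannes relies on can be checked against the uapi headers.
What follows is a minimal sketch, assuming a Linux build environment with
the kernel uapi headers installed; the helper name
ndm_type_values_match() is hypothetical and not part of any kernel API.

```c
#include <linux/neighbour.h>   /* NDA_DST */
#include <linux/rtnetlink.h>   /* RTN_UNICAST */

/* Build-time check: compilation fails if the two constants ever diverge. */
_Static_assert((int)NDA_DST == (int)RTN_UNICAST,
               "NDA_DST and RTN_UNICAST are expected to share one value");

/* Hypothetical helper: returns 1 when both enumerators have the same value. */
static int ndm_type_values_match(void)
{
    return (int)NDA_DST == (int)RTN_UNICAST;
}
```

Because the two values coincide, replacing NDA_DST with RTN_UNICAST for
ndm_type changes what the source says without changing the bytes that
userspace consumers such as arpd receive.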

Re: net, phonet, rcu: rcu hang within gprs_attach

2014-07-25 Thread Sasha Levin

On 07/25/2014 07:19 PM, Paul E. McKenney wrote:
> On Thu, Jul 24, 2014 at 07:28:35PM -0400, Sasha Levin wrote:
>> On 07/24/2014 06:54 PM, Paul E. McKenney wrote:
>>> On Thu, Jul 24, 2014 at 06:19:11PM -0400, Sasha Levin wrote:
>>>> Hi all,
>>>>
>>>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>>>> kernel I've stumbled on the following stack trace (full log attached):
>>>>
>>>> [  370.662014] INFO: task trinity-main:8727 blocked for more than 120 seconds.
>>>> [  370.662891]   Not tainted 3.16.0-rc6-next-20140724-sasha-00046-g7324c87-dirty #932
>>>> [  370.663655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [  370.664562] trinity-mainD 88053cc8 13064  8727   8714 0x
>>>> [  370.665328]  88053da6fc10 0002 8805483e2dc8 880541873000
>>>> [  370.666147]  00276ed30787 88053da6c010 88053da6c000 8805452a
>>>> [  370.667243]  880541873000  7fff b3ec51d8
>>>> [  370.668788] Call Trace:
>>>> [  370.669118] schedule (kernel/sched/core.c:2847)
>>>> [  370.670538] schedule_timeout (kernel/time/timer.c:1476)
>>>> [  370.671524] ? mark_lock (kernel/locking/lockdep.c:2894)
>>>> [  370.672299] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
>>>> [  370.673227] ? get_parent_ip (kernel/sched/core.c:2561)
>>>> [  370.674085] wait_for_completion (include/linux/spinlock.h:328 kernel/sched/completion.c:76 kernel/sched/completion.c:93 kernel/sched/completion.c:101 kernel/sched/completion.c:122)
>>>> [  370.674960] ? wake_up_state (kernel/sched/core.c:2942)
>>>> [  370.675576] _rcu_barrier (kernel/rcu/tree.c:3325 (discriminator 8))
>>>> [  370.676109] rcu_barrier (kernel/rcu/tree_plugin.h:920)
>>>> [  370.676627] netdev_run_todo (net/core/dev.c:6323)
>>>> [  370.677202] rtnl_unlock (net/core/rtnetlink.c:80)
>>>> [  370.677714] unregister_netdev (net/core/dev.c:6687)
>>>> [  370.678266] gprs_attach (net/phonet/pep-gprs.c:311)
>>>> [  370.679641] pep_setsockopt (net/phonet/pep.c:1016)
>>>> [  370.681082] sock_common_setsockopt (net/core/sock.c:2603)
>>>> [  370.682048] SyS_setsockopt (net/socket.c:1914 net/socket.c:1894)
>>>> [  370.682854] tracesys (arch/x86/kernel/entry_64.S:541)
>>>> [  370.683586] 1 lock held by trinity-main/8727:
>>>> [  370.684232] #0: (rcu_preempt_state.barrier_mutex){+.+...}, at: _rcu_barrier (kernel/rcu/tree.c:3233)
>>>>
>>>> This has reproduced couple of times, and has always originated from gprs_attach. I don't see any obvious
>>>> issues with the code there, so I'm not sure if it's a fault of the phonet or the rcu code.
>>>
>>> Can't tell much from this.  Any chance of a .config?
>>>
>>> Thanx, Paul
>>>
>>
>> Attached.
> If you were doing partial nohz_full= CPUs, there is a recent RCU bug
> that would result in these symptoms.  No idea how you would make it
> happen without specifying the nohz_full= boot parameter, but I should
> be getting the fix into -next in a few days.
> 
> But you never know.  So if you are interested in testing sooner, and if
> my local tests pass, I could send you a modified patch that applies on
> top of rcu/next.  If you would like such a patch, let me know.

Sure, if you Cc me on it I'll be happy to test it out, just don't go out
of your way since I've disabled phonet for now anyways, so it's not really
delaying me.


Thanks,
Sasha

Re: Kernel 3.16-rc6 Bug with Sound?

2014-07-25 Thread Nick Krause

On Fri, Jul 25, 2014 at 7:19 PM,   wrote:
> On Fri, 25 Jul 2014 18:57:33 -0400, Nick Krause said:
>
>> Hey guys after compiling and running the kernel in the subject line I
>> get no sound
>> and a message of no sound codec could be found. I am new so this may be
>> a  missed needed config for sound or it's a bug. I am attaching my config
>> to help you guys out :).
>
> You want to *really* be helpful, you could tell us what lspci/lsusb/etc
> say your sound card is, so we can tell you which driver you forgot to
> include in your kernel config. :)
>
>
Sorry again, I need to get used to not having my hand held :).
Sure, I will paste below the lsmod output I get on an Ubuntu
distro kernel that works.
eeepc_wmi                13151  0
asus_wmi                 24191  1 eeepc_wmi
sparse_keymap            13948  1 asus_wmi
snd_hda_codec_hdmi       46254  1
dm_multipath             22873  0
scsi_dh                  14882  1 dm_multipath
intel_rapl               18773  0
x86_pkg_temp_thermal     14205  0
intel_powerclamp         14705  0
kvm_intel               143060  0
kvm                     451511  1 kvm_intel
crct10dif_pclmul         14289  0
crc32_pclmul             13113  0
ghash_clmulni_intel      13216  0
aesni_intel              55624  0
snd_hda_codec_realtek    61438  1
aes_x86_64               17131  1 aesni_intel
lrw                      13286  1 aesni_intel
gf128mul                 14951  1 lrw
glue_helper              13990  1 aesni_intel
ablk_helper              13597  1 aesni_intel
cryptd                   20359  3 ghash_clmulni_intel,aesni_intel,ablk_helper
rfcomm                   69160  4
bnep                     19624  2
snd_hda_intel            52355  5
snd_hda_codec           192906  3 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_intel
serio_raw                13462  0
snd_hwdep                13602  1 snd_hda_codec
bluetooth               391196  10 bnep,rfcomm
snd_pcm                 102099  3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel
snd_page_alloc           18710  2 snd_pcm,snd_hda_intel
snd_seq_midi             13324  0
snd_seq_midi_event       14899  1 snd_seq_midi
snd_rawmidi              30144  1 snd_seq_midi
lpc_ich                  21080  0
snd_seq                  61560  2 snd_seq_midi_event,snd_seq_midi
snd_seq_device           14497  3 snd_seq,snd_rawmidi,snd_seq_midi
snd_timer                29482  2 snd_pcm,snd_seq
mac_hid                  13205  0
parport_pc               32701  0
binfmt_misc              17468  1
ppdev                    17671  0
snd                      69238  21 snd_hda_codec_realtek,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_seq,snd_rawmidi,snd_hda_codec,snd_hda_intel,snd_seq_device,snd_seq_midi
nct6775                  55222  0
hwmon_vid                12783  1 nct6775
fglrx                  8815330  97
mei_me                   18627  0
coretemp                 13435  0
lp                       17759  0
amd_iommu_v2             19054  1 fglrx
parport                  42348  3 lp,ppdev,parport_pc
mei                      82276  1 mei_me
soundcore                12680  1 snd
nls_iso8859_1            12713  1
btrfs                   835954  0
ses                      17363  0
enclosure                15368  1 ses
xor                      21411  1 btrfs
raid6_pq                 97812  1 btrfs
libcrc32c                12644  1 btrfs
dm_mirror                22135  0
dm_region_hash           20862  1 dm_mirror
dm_log                   18411  2 dm_region_hash,dm_mirror
hid_generic              12548  0
usbhid                   52570  0
hid                     106148  2 hid_generic,usbhid
usb_storage              62209  0
ahci                     25819  4
r8169                    67581  0
libahci                  32560  1 ahci
mii                      13934  1 r8169
video                    19476  1 asus_wmi
wmi                      19177  1 asus_wmi

Re: [PATCH 1/1] neighbour : fix ndm_type type error issue

2014-07-25 Thread Cong Wang

On Fri, Jul 25, 2014 at 9:38 AM, Jun Zhao  wrote:
> ndm_type means L3 address type, in neighbour proxy and vxlan, it's 
> RTN_UNICAST.
> NDA_DST is for netlink TLV type, hence it's not right value in this context.
>

Looks correct to me, at least libnl uses RTN_* for ndm_type.

[PATCH 11/10] Input - wacom: Check for bluetooth protocol while setting OLEDs

2014-07-25 Thread Benjamin Tissoires

Bluetooth Intuos 4 tablets use a 1-bit definition while the USB ones use
a 4-bit definition. This changes the size of the raw image we receive, so
the kernel will only accept 1-bit images for Bluetooth and 4-bit images
for USB.

Signed-off-by: Benjamin Tissoires 
---
 drivers/hid/wacom_sys.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 3adc6ef..42f139f 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -531,12 +531,14 @@ static int wacom_led_control(struct wacom *wacom)
return retval;
 }
 
-static int wacom_led_putimage(struct wacom *wacom, int button_id, const void *img)
+static int wacom_led_putimage(struct wacom *wacom, int button_id,
+   const unsigned len, const void *img)
 {
unsigned char *buf;
int i, retval;
+   const unsigned chunk_len = len / 4; /* 4 chunks are needed to be sent */
 
-   buf = kzalloc(259, GFP_KERNEL);
+   buf = kzalloc(chunk_len + 3 , GFP_KERNEL);
if (!buf)
return -ENOMEM;
 
@@ -552,11 +554,11 @@ static int wacom_led_putimage(struct wacom *wacom, int button_id, const void *im
buf[1] = button_id & 0x07;
for (i = 0; i < 4; i++) {
buf[2] = i;
-   memcpy(buf + 3, img + i * 256, 256);
+   memcpy(buf + 3, img + i * chunk_len, chunk_len);
 
retval = wacom_set_report(wacom->hdev, HID_FEATURE_REPORT,
  WAC_CMD_ICON_XFER,
- buf, 259, WAC_CMD_RETRIES);
+ buf, chunk_len + 3, WAC_CMD_RETRIES);
if (retval < 0)
break;
}
@@ -657,13 +659,14 @@ static ssize_t wacom_button_image_store(struct device *dev, int button_id,
struct hid_device *hdev = container_of(dev, struct hid_device, dev);
struct wacom *wacom = hid_get_drvdata(hdev);
int err;
+   const unsigned len = hdev->bus == BUS_BLUETOOTH ? 256 : 1024;
 
-   if (count != 1024)
+   if (count != len)
return -EINVAL;
 
mutex_lock(&wacom->lock);
 
-   err = wacom_led_putimage(wacom, button_id, buf);
+   err = wacom_led_putimage(wacom, button_id, len, buf);
 
mutex_unlock(&wacom->lock);
 
-- 
2.0.1
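
The size arithmetic in the patch above can be followed in a small
user-space sketch. It assumes only what the diff shows: 256-byte images
on Bluetooth, 1024-byte images on USB, four chunks per image, and three
bytes in front of each chunk (the memcpy starts at buf + 3). All names
beginning with sketch_ are hypothetical and do not exist in the driver.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the bus types the driver distinguishes. */
enum sketch_bus { SKETCH_BUS_USB, SKETCH_BUS_BLUETOOTH };

/* The image is always sent in four chunks, as in the patch's loop. */
#define SKETCH_NUM_CHUNKS 4
/* Bytes placed in front of each chunk: buf[0], buf[1], buf[2]. */
#define SKETCH_HEADER_LEN 3

/* Expected raw image size in bytes: 256 for Bluetooth, 1024 for USB. */
static size_t sketch_image_len(enum sketch_bus bus)
{
    return bus == SKETCH_BUS_BLUETOOTH ? 256 : 1024;
}

/* Payload bytes carried by one chunk. */
static size_t sketch_chunk_len(size_t image_len)
{
    return image_len / SKETCH_NUM_CHUNKS;
}

/* Size of the buffer allocated and sent for one chunk. */
static size_t sketch_report_len(size_t image_len)
{
    return sketch_chunk_len(image_len) + SKETCH_HEADER_LEN;
}

/* Byte offset into the image at which chunk i starts. */
static size_t sketch_chunk_offset(size_t image_len, unsigned int i)
{
    return (size_t)i * sketch_chunk_len(image_len);
}
```

For USB this reproduces the previously hard-coded 259-byte buffer
(1024 / 4 + 3); for Bluetooth the same formula gives 67 bytes.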


Re: [PATCH] staging: vt6655: fix direct dereferencing of user pointer

2014-07-25 Thread Malcolm Priestley


Hi Guillaume

On 25/07/14 13:47, Guillaume Clement wrote:

> Sparse reported that the data from tagSCmdRequest is given by
> userspace, so it should be tagged as such.

extra is not in user space

All Wireless Extensions ioctl extra calls originate from
ioctl_standard_iw_point in wext-core.

Either through ioctl or iw_handler

All these functions should have been converted to iw_handler.

Regards

Malcolm

> Later, we were memcomparing and dereferencing it without first copying
> it, fix that as well.

Signed-off-by: Guillaume Clement 
---
  drivers/staging/vt6655/iocmd.h |  2 +-
  drivers/staging/vt6655/iwctl.c | 32 ++--
  drivers/staging/vt6655/iwctl.h |  6 +++---
  3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/vt6655/iocmd.h b/drivers/staging/vt6655/iocmd.h
index e499f1b..dd12498 100644
--- a/drivers/staging/vt6655/iocmd.h
+++ b/drivers/staging/vt6655/iocmd.h
@@ -100,7 +100,7 @@ typedef enum tagWZONETYPE {
  #pragma pack(1)
  typedef struct tagSCmdRequest {
u8  name[16];
-   void*data;
+   void __user *data;
u16 wResult;
u16 wCmdCode;
  } SCmdRequest, *PSCmdRequest;
diff --git a/drivers/staging/vt6655/iwctl.c b/drivers/staging/vt6655/iwctl.c
index 501cd64..7ce23b5 100644
--- a/drivers/staging/vt6655/iwctl.c
+++ b/drivers/staging/vt6655/iwctl.c
@@ -1621,17 +1621,24 @@ int iwctl_giwauth(struct net_device *dev,
  int iwctl_siwgenie(struct net_device *dev,
   struct iw_request_info *info,
   struct iw_point *wrq,
-  char *extra)
+  char __user *extra)
  {
PSDevicepDevice = (PSDevice)netdev_priv(dev);
PSMgmtObjectpMgmt = &(pDevice->sMgmtObj);
int ret = 0;
+   char length;

if (wrq->length) {
-   if ((wrq->length < 2) || (extra[1]+2 != wrq->length)) {
-   ret = -EINVAL;
-   goto out;
-   }
+   if (wrq->length < 2)
+   return -EINVAL;
+
+   ret = get_user(length, extra + 1);
+   if (ret)
+   return ret;
+
+   if (length + 2 != wrq->length)
+   return -EINVAL;
+
if (wrq->length > MAX_WPA_IE_LEN) {
ret = -ENOMEM;
goto out;
@@ -1654,7 +1661,7 @@ out://not completely ...not necessary in wpa_supplicant 
0.5.8
  int iwctl_giwgenie(struct net_device *dev,
   struct iw_request_info *info,
   struct iw_point *wrq,
-  char *extra)
+  char __user *extra)
  {
PSDevicepDevice = (PSDevice)netdev_priv(dev);
PSMgmtObjectpMgmt = &(pDevice->sMgmtObj);
@@ -1801,18 +1808,23 @@ int iwctl_giwencodeext(struct net_device *dev,
  int iwctl_siwmlme(struct net_device *dev,
  struct iw_request_info *info,
  struct iw_point *wrq,
- char *extra)
+ char __user *extra)
  {
PSDevicepDevice = (PSDevice)netdev_priv(dev);
PSMgmtObjectpMgmt = &(pDevice->sMgmtObj);
-   struct iw_mlme *mlme = (struct iw_mlme *)extra;
+   struct iw_mlme mime;
+
int ret = 0;

-   if (memcmp(pMgmt->abyCurrBSSID, mlme->addr.sa_data, ETH_ALEN)) {
+   ret = copy_from_user(&mime, extra, sizeof(mime));
+   if (ret)
+   return -EFAULT;
+
+   if (memcmp(pMgmt->abyCurrBSSID, mime.addr.sa_data, ETH_ALEN)) {
ret = -EINVAL;
return ret;
}
-   switch (mlme->cmd) {
+   switch (mime.cmd) {
case IW_MLME_DEAUTH:
//this command seems to be not complete,please test it 
--einsnliu
//bScheduleCommand((void *) pDevice, WLAN_CMD_DEAUTH, (unsigned 
char *));
diff --git a/drivers/staging/vt6655/iwctl.h b/drivers/staging/vt6655/iwctl.h
index de0a337..7dd6310 100644
--- a/drivers/staging/vt6655/iwctl.h
+++ b/drivers/staging/vt6655/iwctl.h
@@ -176,12 +176,12 @@ int iwctl_giwauth(struct net_device *dev,
  int iwctl_siwgenie(struct net_device *dev,
   struct iw_request_info *info,
   struct iw_point *wrq,
-  char *extra);
+  char __user *extra);

  int iwctl_giwgenie(struct net_device *dev,
   struct iw_request_info *info,
   struct iw_point *wrq,
-  char *extra);
+  char __user *extra);

  int iwctl_siwencodeext(struct net_device *dev,
   struct iw_request_info *info,
@@ -196,7 +196,7 @@ int iwctl_giwencodeext(struct net_device *dev,
  int iwctl_siwmlme(struct net_device *dev,
  struct iw_request_info *info,
  struct iw_point *wrq,
- char *extra);
+ char __user *extra);
  #endif

Re: [RFC PATCH 1/1] rcu: Use rcu_gp_kthread_wake() to wake up kthreads

2014-07-25 Thread Paul E. McKenney

On Fri, Jul 25, 2014 at 06:23:41PM -0400, Pranith Kumar wrote:
> On Fri, Jul 25, 2014 at 11:02 AM, Paul E. McKenney
>  wrote:
> > On Fri, Jul 25, 2014 at 02:24:34AM -0400, Pranith Kumar wrote:
> >> On Fri, Jul 25, 2014 at 1:06 AM, Pranith Kumar  
> >> wrote:
> >>
> >> >
> >> > In rcu_report_qs_rsp(), I added a pr_info() call testing if any of the 
> >> > above
> >> > conditions is true, in which case we can avoid calling wake_up(). It 
> >> > turns out
> >> > that quite a few actually are. Most of the cases where we can avoid is 
> >> > condition 2
> >> > above and condition 1 also occurs quite often. Condition 3 never happens.
> >> >
> >>
> >> A little more data. On an idle system there are about 2000 unnecessary
> >> wake_up() calls every 5 minutes with the most common trace being as
> >> follows:
> >>
> >> [Fri Jul 25 02:05:49 2014]  [] 
> >> rcu_report_qs_rnp+0x285/0x2c0
> >> [Fri Jul 25 02:05:49 2014]  [] ? 
> >> schedule_timeout+0x159/0x270
> >> [Fri Jul 25 02:05:49 2014]  [] force_qs_rnp+0x111/0x190
> >> [Fri Jul 25 02:05:49 2014]  [] ? 
> >> synchronize_rcu_bh+0x50/0x50
> >> [Fri Jul 25 02:05:49 2014]  [] rcu_gp_kthread+0x85f/0xa70
> >> [Fri Jul 25 02:05:49 2014]  [] ? __wake_up_sync+0x20/0x20
> >> [Fri Jul 25 02:05:49 2014]  [] ? rcu_barrier+0x20/0x20
> >> [Fri Jul 25 02:05:49 2014]  [] kthread+0xdb/0x100
> >>   []?kthread_create_on_node+0x180/0x180
> >> [Fri Jul 25 02:05:49 2014]  [] ret_from_fork+0x7c/0xb0
> >>   [] ?kthread_create_on_node+0x180/0x180
> >>
> >> With rcutorture, there are about 2000 unnecessary wake_ups() every 3
> >> minutes with the most common trace being:
> >>
> >> [Fri Jul 25 02:18:30 2014]  [] 
> >> rcu_report_qs_rnp+0x285/0x2c0
> >> [Fri Jul 25 02:18:30 2014]  [] ? 
> >> __update_cpu_load+0xe5/0x140
> >>  [] ?rcu_read_delay+0x50/0x80 
> >> [rcutorture]
> >>  []rcu_process_callbacks+0x6b8/0x7e0
> >
> > Good to see the numbers!!!
> >
> > But to evaluate this analytically, we should compare the overhead of the
> > wake_up() with the overhead of the extra checks in rcu_gp_kthread_wake(),
> > and then compare the number of unnecessary wake_up()s to the number of
> > calls to rcu_gp_kthread_wake() added by this patch.  This means that we
> > need more numbers.
> >
> > For example, suppose that the extra checks cost 10ns on average, and that
> > an unnecessary wake_up() costs 1us on average, so that each wake_up()
> > is on average 100 times more expensive than the extra checks.  Then it
> > makes sense to ask whether the saved wake_up()s save more time than the
> > extra tests cost.  Turning the arithmetic crank says that if more than 1%
> > of the wake_up()s are unnecessary, we should add the checks.
> >
> > This means that if there are fewer than 200,000 grace periods in each
> > of the time periods, then your patch really would provide performance
> > benefits.  I bet that there are -way- fewer than 200,000 grace periods in
> > each of the time periods, but why don't you build with RCU_TRACE and look
> > at the "rcugp" file in RCU's debugfs hierarchy?  Or just periodically
> > print out the rcu_state ->completed field?
> >
> 
> I put some debugging code to see how many unnecessary wake ups were
> being generated in rcu_report_qs_rsp(). I ran both with and without
> rcutorture running. Here are the results
> 
> Without rcutorture:
> 
> [   14.839214] Total:2000, unnecessary:2000, case1:1741, case2:2000, case3:0
> [  224.284633] Total:4000, unnecessary:4000, case1:3652, case2:4000, case3:0
> [  244.159021] Total:6000, unnecessary:6000, case1:5539, case2:6000, case3:0
> [  260.522175] Total:8000, unnecessary:8000, case1:7447, case2:8000, case3:0
> [  268.293058] Total:10000, unnecessary:10000, case1:9317, case2:10000, case3:0
> [  275.962033] Total:12000, unnecessary:12000, case1:11159, case2:12000, case3:0
> [  287.411032] Total:14000, unnecessary:14000, case1:13008, case2:14000, case3:0
> [  304.868334] Total:16000, unnecessary:16000, case1:14885, case2:16000, case3:0
> [  318.090930] Total:18000, unnecessary:18000, case1:16747, case2:18000, case3:0
> [  333.423876] Total:20000, unnecessary:20000, case1:18631, case2:20000, case3:0
> [  346.775399] Total:22000, unnecessary:22000, case1:20502, case2:22000, case3:0
> [  362.867751] Total:24000, unnecessary:24000, case1:22386, case2:24000, case3:0
> [  376.777817] Total:26000, unnecessary:26000, case1:24251, case2:26000, case3:0
> [  391.839994] Total:28000, unnecessary:28000, case1:26118, case2:28000, case3:0
> [  406.559406] Total:30000, unnecessary:30000, case1:27983, case2:30000, case3:0
> [  419.973867] Total:32000, unnecessary:32000, case1:29855, case2:32000, case3:0
> [  435.080002] Total:34000, unnecessary:34000, case1:31740, case2:34000, case3:0
> [  449.077018] Total:36000, unnecessary:36000, case1:33588, case2:36000, case3:0
> [  464.418942] Total:38000, unnecessary:38000, case1:35460, case2:38000, case3:0
> [
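
The break-even reasoning quoted above (10ns per extra check, 1us per
unnecessary wake_up(), roughly 2000 unnecessary wake-ups per measurement
period) can be worked through in a few lines. This is only a sketch of
that arithmetic: the costs are the supposed figures from the mail, not
measurements, and all function names are hypothetical.

```c
/* Time saved by skipping the unnecessary wake-ups, in nanoseconds. */
static long long saved_ns(long long unnecessary_wakeups,
                          long long wakeup_cost_ns)
{
    return unnecessary_wakeups * wakeup_cost_ns;
}

/* Time spent on the extra checks, one check per call, in nanoseconds. */
static long long spent_ns(long long checked_calls, long long check_cost_ns)
{
    return checked_calls * check_cost_ns;
}

/* Number of checked calls at which savings and overhead are equal. */
static long long break_even_calls(long long unnecessary_wakeups,
                                  long long wakeup_cost_ns,
                                  long long check_cost_ns)
{
    return saved_ns(unnecessary_wakeups, wakeup_cost_ns) / check_cost_ns;
}

/* Returns 1 when the checks save more time than they cost, else 0. */
static int checks_pay_off(long long checked_calls,
                          long long unnecessary_wakeups,
                          long long wakeup_cost_ns,
                          long long check_cost_ns)
{
    return saved_ns(unnecessary_wakeups, wakeup_cost_ns) >
           spent_ns(checked_calls, check_cost_ns);
}
```

With 2000 unnecessary wake-ups at 1000ns each and 10ns per check,
break_even_calls() gives 200,000, which is the grace-period threshold
named in the quoted mail; any period with fewer checked calls than that
comes out ahead.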

Re: WARNING: CPU: 0 PID: 2623 at drivers/pnp/pnpacpi/core.c:96 pnpacpi_set_resource

2014-07-25 Thread Vinson Lee

On Thu, May 29, 2014 at 4:14 AM, Rafael J. Wysocki  wrote:
> On Thursday, May 29, 2014 10:41:43 AM Zdenek Kabelac wrote:
>> Hi
>>
>>
>> I've noticed this message in my dmesg:
>> (Possibly related to this commit?:
>> a8d22396302b7e4e5f0a594c1c1594388c29edaf)
>
> Well, does reverting that commit make the warning go away?
>
> Rafael
>
>
>> (My vanilla git commit number for my kernel:
>> cd79bde29f00a346eec3fe17c1c5073c37ed95e7)
>>
>> Zdenek
>>
>>
>> [ 2174.058615] ata5: port disabled--ignoring
>> [ 2174.059460] sd 0:0:0:0: [sda] Starting disk
>> [ 2174.076342] [ cut here ]
>> [ 2174.076350] WARNING: CPU: 0 PID: 2623 at drivers/pnp/pnpacpi/core.c:96
>> pnpacpi_set_resources+0x14f/0x160()
>> [ 2174.076412] Modules linked in: dm_raid raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx raid1 raid10 dm_mod md_mod xor
>> raid6_pq i915 i2c_algo_bit drm_kms_helper drm xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 
>> nf_defrag_ipv4
>> xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables
>> x_tables tun bridge stp llc ipv6 hid_generic usbhid hid snd_hda_codec_analog
>> snd_hda_codec_generic iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm
>> microcode psmouse serio_raw i2c_i801 i2c_core arc4 lpc_ich mfd_core r852
>> iwl3945 sm_common nand r592 nand_ecc nand_ids iwlegacy mtd memstick mac80211
>> sdhci_pci pcmcia sdhci snd_hda_intel mmc_core snd_hda_controller 
>> snd_hda_codec
>> snd_hwdep snd_seq snd_seq_device snd_pcm cfg80211 e1000e ptp snd_timer
>> [ 2174.076433]  pps_core wmi thinkpad_acpi nvram snd soundcore evdev nfsd
>> auth_rpcgss oid_registry nfs_acl lockd binfmt_misc loop sunrpc uhci_hcd 
>> sr_mod
>> cdrom yenta_socket ehci_pci ehci_hcd usbcore usb_common video backlight 
>> autofs4
>> [ 2174.076436] CPU: 0 PID: 2623 Comm: systemd-sleep Not tainted
>> 3.15.0-rc7-00044-g887210a #209
>> [ 2174.076437] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 )
>> 03/18/2011
>> [ 2174.076440]  0009 880095d55c40 815db694
>> 
>> [ 2174.076444]  880095d55c78 8104e78d 
>> 880136c3cd98
>> [ 2174.076447]  8800bac2b000 8180cb5a 
>> 880095d55c88
>> [ 2174.076448] Call Trace:
>> [ 2174.076453]  [] dump_stack+0x4e/0x7a
>> [ 2174.076643]  [] warn_slowpath_common+0x7d/0xa0
>> [ 2174.076646]  [] warn_slowpath_null+0x1a/0x20
>> [ 2174.076648]  [] pnpacpi_set_resources+0x14f/0x160
>> [ 2174.076651]  [] pnp_start_dev+0x42/0x80
>> [ 2174.076655]  [] pnp_bus_resume+0x88/0xa0
>> [ 2174.076658]  [] ? pnp_bus_suspend+0x20/0x20
>> [ 2174.076662]  [] dpm_run_callback+0x49/0xa0
>> [ 2174.076664]  [] device_resume+0xc8/0x1f0
>> [ 2174.076667]  [] dpm_resume+0x119/0x250
>> [ 2174.076670]  [] dpm_resume_end+0x11/0x20
>> [ 2174.076673]  [] suspend_devices_and_enter+0xff/0x680
>> [ 2174.076676]  [] pm_suspend+0x1e7/0x2a0
>> [ 2174.076678]  [] state_store+0x7c/0xf0
>> [ 2174.076683]  [] kobj_attr_store+0xf/0x20
>> [ 2174.076686]  [] sysfs_kf_write+0x45/0x60
>> [ 2174.076690]  [] kernfs_fop_write+0xf9/0x180
>> [ 2174.076694]  [] vfs_write+0xbd/0x1e0
>> [ 2174.076696]  [] SyS_write+0x49/0xb0
>> [ 2174.076700]  [] system_call_fastpath+0x1a/0x1f
>> [ 2174.081587] ---[ end trace 32ffe1e61f685f01 ]---
>> [ 2174.082221] serial 00:09: activated
>> [ 2174.233290] thinkpad_acpi: ACPI backlight control delay disabled
>> [ 2174.323685] ata4.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) 
>> filtered out
>> [ 2174.323688] ata4.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) 
>> filtered out
>> [ 2174.324671] ata4.00: ACPI cmd e3/00:79:00:00:00:a0 (IDLE) succeeded
>> [ 2174.325651] ata4.00: ACPI cmd e3/00:01:00:00:00:a0 (IDLE) succeeded
>> [ 2174.345266] ata4.00: configured for UDMA/33
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.


Hi.

I hit a similar stack trace with 3.14.8.

[ cut here ]
 WARNING: CPU: 23 PID: 1 at drivers/pnp/pnpacpi/core.c:96
pnpacpi_set_resources+0x9a/0x134()
 Modules linked in:
 CPU: 23 PID: 1 Comm: swapper/0 Not tainted 3.14.8 #1
   881fd29f5c78 814e27e5 
  881fd29f5cb0 8105c4b1 812f32f3 
  883fd1f0a800 881fff047348  881fd29f5cc0
 Call Trace:
  [] dump_stack+0x45/0x56
  [] warn_slowpath_common+0x7f/0x98
  [] ?

Re: [PATCH v2 00/10] Input - wacom: conversion to HID driver, series 2

2014-07-25 Thread Benjamin Tissoires

Hi Przemo,

On Jul 25 2014 or thereabouts, Przemo Firszt wrote:
> Dnia 2014-07-24, czw o godzinie 14:13 -0400, Benjamin Tissoires pisze:
> [..]
> Hi Benjamin,
> I'm testing the whole series including the OLED patch that's not on the
> list yet.
> 
> Hardware: 2 x Intuos4 Wireless tested on usb and bluetooth until noted
> otherwise.
> 
> What works:
> 1. Tablet in general, pressure, tilt, buttons etc.
> 2. Battery reporting (including gnome). The double wireless tablet bug
> is gone:
> 
> $ ls /sys/class/power_supply/
> AC  BAT0  wacom_ac_2  wacom_ac_3  wacom_battery_2  wacom_battery_3
> 
> 3. Setting LED selector value
> 4. Setting LED selector brightness (default and pressed)
> 5. Rendering images to button displays works on usb ONLY.
> 
> $ i4oled -d /sys/bus/hid/drivers/wacom/0003\:056A\:00BC.0009/wacom_led/button0_rawimg -t Linux
> 
> On bluetooth writing image goes fine (no error), but there is nothing showing 
> up,
> so I suspect the brightness of OLED displays is not set properly.
> 
> That's the code before changes:
> 
> led = wdata->led_selector | 0x04;
> buf = kzalloc(9, GFP_KERNEL);
> if (buf) {
> buf[0] = WAC_CMD_LED_CONTROL;
> buf[1] = led;
> buf[2] = value >> 2;
> buf[3] = value;
> /* use fixed brightness for OLEDs */
> buf[4] = 0x08;
> hid_hw_raw_request(hdev, buf[0], buf, 9, HID_FEATURE_REPORT,
>HID_REQ_SET_REPORT);
> kfree(buf);
> }
> 
> I don't remember for sure, but I think the range of brightness might be 
> different
> over usb and over bluetooth.

Maybe you can try setting the sysfs file "buttons_luminance" with the
value 8 to check if this will solve the bug.

The bug might also be linked to the slight difference while setting up
the transfer of the image (WAC_CMD_ICON_START) with the value of buf[1]
set to 1 in USB, while it was 0 on bluetooth.

The weird thing is that I remember having set the OLED (though
scrambled) with these patches applied. I guess the scrambling was due to
the 4-bit vs 1-bit difference. But I definitely had some results.

Anyway. Przemo, Dmitry, can we consider that this *will* be fixed by
next week, and so we can apply the series for 3.17?
I will have the hardware next week and be able to figure out the
differences between the 2 communication modes.

> 
> TL;DR: the only thing that needs to be fixed is image-over-bluetooth, 
> probably caused by not
> setting or incorrect setting of OLED brightness. 

Thanks for the extensive testing. I'll send the "OLED patch that's not
on the list yet" as 11/10 so everybody can have a look.

Cheers,
Benjamin

Re: [PATCH] cpufreq: Fix latency for cpufreq_info

2014-07-25 Thread Nick Krause

On Fri, Jul 25, 2014 at 1:36 AM, pramod gurav
 wrote:
> Viresh,
> Be careful when you ACK Nick's patches. He has confessed he has no
> idea how to build test a kernel. His patches are NOT AT ALL build
> tested. And some of his patches are being reverted for causing
> problems in build and all. He looks for FIXME and removes/edits the
> code as per the comments. Just be careful.
>
> On Mon, Jul 14, 2014 at 12:00 PM, Viresh Kumar  
> wrote:
>> On 14 July 2014 11:58, Nicholas Krause  wrote:
>>> This fixes the latency for the cpufreq policy to 1 million nanoseconds
>>> that calls the function pxa_cpu_init for the member of the structure
>>> called cpuinfo.transition_latency.
>>>
>>> Signed-off-by: Nicholas Krause 
>>> ---
>>>  drivers/cpufreq/pxa2xx-cpufreq.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cpufreq/pxa2xx-cpufreq.c 
>>> b/drivers/cpufreq/pxa2xx-cpufreq.c
>>> index e24269a..e08bb98 100644
>>> --- a/drivers/cpufreq/pxa2xx-cpufreq.c
>>> +++ b/drivers/cpufreq/pxa2xx-cpufreq.c
>>> @@ -372,7 +372,7 @@ static int pxa_cpufreq_init(struct cpufreq_policy 
>>> *policy)
>>> init_sdram_rows();
>>>
>>> /* set default policy and cpuinfo */
>>> -   policy->cpuinfo.transition_latency = 1000; /* FIXME: 1 ms, assumed 
>>> */
>>> +   policy->cpuinfo.transition_latency = 100;
>>>
>>> /* Generate pxa25x the run cpufreq_frequency_table struct */
>>> for (i = 0; i < NUM_PXA25x_RUN_FREQS; i++) {
>>
>> Acked-by: Viresh Kumar 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
>
>
> --
> Thanks and Regards
> Pramod


Pramod,
I learned how to do it today. Viresh checked this patch and it didn't build,
so I sent him a fixed one :). I am learning fast and now know how to
properly test my patches.
Nick

Kernel 3.16-rc6 Bug with Sound?

2014-07-25 Thread Nick Krause

Hey guys, after compiling and running the kernel in the subject line I
get no sound and a message saying that no sound codec could be found.
I am new, so this may be a missing config option for sound, or it may be
a bug. I am attaching my config to help you guys out :).
Nick


config
Description: Binary data

[PATCH 08/11] f2fs: fix wrong condition for unlikely

2014-07-25 Thread Jaegeuk Kim

This patch fixes the wrongly used unlikely condition.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 42a16c1..36b0d47 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -932,7 +932,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool 
is_umount)
/* Here, we only have one bio having CP pack */
sync_meta_pages(sbi, META_FLUSH, LONG_MAX);
 
-   if (unlikely(!is_set_ckpt_flags(ckpt, CP_ERROR_FLAG))) {
+   if (!is_set_ckpt_flags(ckpt, CP_ERROR_FLAG)) {
clear_prefree_segments(sbi);
release_dirty_inode(sbi);
F2FS_RESET_SB_DIRT(sbi);
-- 
1.8.5.2 (Apple Git-48)
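
For readers unfamiliar with the annotation being removed: likely() and
unlikely() are branch-prediction hints built on the compiler's
__builtin_expect(). They never change the value of the condition, only the
code layout, so marking the common (no error) path as unlikely is a
performance mistake rather than a logic bug. A minimal user-space sketch,
assuming GCC or Clang; the names my_likely, my_unlikely and should_cleanup
are hypothetical and not part of f2fs:

```c
#include <assert.h>

/* Hypothetical user-space stand-ins for the kernel's likely()/unlikely(). */
#define my_likely(x)   __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Returns 1 when post-checkpoint cleanup should run, 0 otherwise.
 * An error flag being set is the rare case, so only that test carries
 * the unlikely() hint; the common no-error path is left unannotated.
 */
static int should_cleanup(unsigned int ckpt_flags, unsigned int error_bit)
{
	assert(error_bit < 32);

	if (my_unlikely(ckpt_flags & (1u << error_bit)))
		return 0;	/* rare path: the checkpoint hit an error */

	return 1;		/* common path: run the cleanup */
}
```

The hint wraps the rare event (error flag set), which is the opposite of
the original unlikely(!is_set_ckpt_flags(...)) form.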


[PATCH 11/11] f2fs: avoid retrying wrong recovery routine when error was occurred

2014-07-25 Thread Jaegeuk Kim

This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  3 ++-
 fs/f2fs/f2fs.h   |  2 +-
 fs/f2fs/recovery.c   | 20 +++-
 fs/f2fs/segment.c|  5 +
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 36b0d47..765cc51 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -795,6 +795,7 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info 
*sbi)
 static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 {
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
nid_t last_nid = 0;
block_t start_blk;
struct page *cp_page;
@@ -808,7 +809,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool 
is_umount)
 * This avoids to conduct wrong roll-forward operations and uses
 * metapages, so should be called prior to sync_meta_pages below.
 */
-   discard_next_dnode(sbi);
+   discard_next_dnode(sbi, NEXT_FREE_BLKADDR(sbi, curseg));
 
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 475f97c..14b9f74 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1225,7 +1225,7 @@ void destroy_flush_cmd_control(struct f2fs_sb_info *);
 void invalidate_blocks(struct f2fs_sb_info *, block_t);
 void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
 void clear_prefree_segments(struct f2fs_sb_info *);
-void discard_next_dnode(struct f2fs_sb_info *);
+void discard_next_dnode(struct f2fs_sb_info *, block_t);
 int npages_for_summary_flush(struct f2fs_sb_info *);
 void allocate_new_segments(struct f2fs_sb_info *);
 struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 90d7e80..8ef78f1 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -434,7 +434,9 @@ next:
 
 int recover_fsync_data(struct f2fs_sb_info *sbi)
 {
+   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
struct list_head inode_list;
+   block_t blkaddr;
int err;
 
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
@@ -446,6 +448,9 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
 
/* step #1: find fsynced inode numbers */
sbi->por_doing = true;
+
+   blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
+
err = find_fsync_dnodes(sbi, &inode_list);
if (err)
goto out;
@@ -459,8 +464,21 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
 out:
destroy_fsync_dnodes(&inode_list);
kmem_cache_destroy(fsync_entry_slab);
+
+   if (err) {
+   truncate_inode_pages_final(NODE_MAPPING(sbi));
+   truncate_inode_pages_final(META_MAPPING(sbi));
+   }
+
sbi->por_doing = false;
-   if (!err)
+   if (!err) {
write_checkpoint(sbi, false);
+   } else {
+   discard_next_dnode(sbi, blkaddr);
+
+   /* Flush all the NAT/SIT pages */
+   while (get_pages(sbi, F2FS_DIRTY_META))
+   sync_meta_pages(sbi, META, LONG_MAX);
+   }
return err;
 }
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9fce0f47..e016b97 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -379,11 +379,8 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
return blkdev_issue_discard(sbi->sb->s_bdev, start, len, GFP_NOFS, 0);
 }
 
-void discard_next_dnode(struct f2fs_sb_info *sbi)
+void discard_next_dnode(struct f2fs_sb_info *sbi, block_t blkaddr)
 {
-   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
-   block_t blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
-
if (f2fs_issue_discard(sbi, blkaddr, 1)) {
struct page *page = grab_meta_page(sbi, blkaddr);
/* zero-filled page */
-- 
1.8.5.2 (Apple Git-48)


[PATCH 07/11] f2fs: enable in-place-update for fdatasync

2014-07-25 Thread Jaegeuk Kim

This patch enforces in-place updates only when fdatasync is requested.
If we use in-place updates for fdatasync, we can skip writing the
recovery information.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h| 1 +
 fs/f2fs/file.c| 7 +++
 fs/f2fs/segment.h | 4 
 3 files changed, 12 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ab36025..8f8685e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -998,6 +998,7 @@ enum {
FI_INLINE_DATA, /* used for inline data*/
FI_APPEND_WRITE,/* inode has appended data */
FI_UPDATE_WRITE,/* inode has in-place-update data */
+   FI_NEED_IPU,/* used for ipu for fdatasync */
 };
 
 static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 121689a..e339856 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, 
loff_t end, int datasync)
return 0;
 
trace_f2fs_sync_file_enter(inode);
+
+   /* if fdatasync is triggered, let's do in-place-update */
+   if (datasync)
+   set_inode_flag(fi, FI_NEED_IPU);
+
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret) {
trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
return ret;
}
+   if (datasync)
+   clear_inode_flag(fi, FI_NEED_IPU);
 
/*
 * if there is no written data, don't waste time to write recovery info.
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index ee5c75e..55973f7 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode *inode)
if (S_ISDIR(inode->i_mode))
return false;
 
+   /* this is only set during fdatasync */
+   if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU))
+   return true;
+
switch (SM_I(sbi)->ipu_policy) {
case F2FS_IPU_FORCE:
return true;
-- 
1.8.5.2 (Apple Git-48)


[PATCH 10/11] f2fs: avoid checkpoint when error was occurred

2014-07-25 Thread Jaegeuk Kim

There is no need to write a checkpoint when any error was detected.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/recovery.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index a112368..90d7e80 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -436,7 +436,6 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
 {
struct list_head inode_list;
int err;
-   bool need_writecp = false;
 
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
sizeof(struct fsync_inode_entry));
@@ -454,8 +453,6 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
if (list_empty(&inode_list))
goto out;
 
-   need_writecp = true;
-
/* step #2: recover data */
err = recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
f2fs_bug_on(!list_empty(&inode_list));
@@ -463,7 +460,7 @@ out:
destroy_fsync_dnodes(&inode_list);
kmem_cache_destroy(fsync_entry_slab);
sbi->por_doing = false;
-   if (!err && need_writecp)
+   if (!err)
write_checkpoint(sbi, false);
return err;
 }
-- 
1.8.5.2 (Apple Git-48)


[PATCH 09/11] f2fs: test before set/clear bits

2014-07-25 Thread Jaegeuk Kim

If the bit is already set, we don't need to set it again, and likewise for
clearing a bit that is already clear. This avoids dirtying the cache line
needlessly.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8f8685e..475f97c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1003,7 +1003,8 @@ enum {
 
 static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag)
 {
-   set_bit(flag, &fi->flags);
+   if (!test_bit(flag, &fi->flags))
+   set_bit(flag, &fi->flags);
 }
 
 static inline int is_inode_flag_set(struct f2fs_inode_info *fi, int flag)
@@ -1013,7 +1014,8 @@ static inline int is_inode_flag_set(struct 
f2fs_inode_info *fi, int flag)
 
 static inline void clear_inode_flag(struct f2fs_inode_info *fi, int flag)
 {
-   clear_bit(flag, &fi->flags);
+   if (test_bit(flag, &fi->flags))
+   clear_bit(flag, &fi->flags);
 }
 
 static inline void set_acl_inode(struct f2fs_inode_info *fi, umode_t mode)
-- 
1.8.5.2 (Apple Git-48)
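
The reasoning in the commit message can be illustrated outside the kernel.
An atomic read-modify-write needs exclusive ownership of the cache line even
when it would not change the value, whereas a plain read can be served from a
shared copy. The sketch below uses C11 atomics rather than the kernel's
set_bit()/test_bit(); the function names flag_test, flag_set and flag_clear
are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Read-only check: does not require exclusive ownership of the cache line. */
static bool flag_test(atomic_ulong *flags, unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	return (atomic_load_explicit(flags, memory_order_relaxed) >> bit) & 1UL;
}

/* Only issue the atomic OR when the bit is not set yet. */
static void flag_set(atomic_ulong *flags, unsigned int bit)
{
	if (!flag_test(flags, bit))
		atomic_fetch_or(flags, 1UL << bit);
}

/* Only issue the atomic AND when the bit is currently set. */
static void flag_clear(atomic_ulong *flags, unsigned int bit)
{
	if (flag_test(flags, bit))
		atomic_fetch_and(flags, ~(1UL << bit));
}
```

The test and the update are not atomic as a pair. That is acceptable here
because setting an already-set bit (or clearing an already-clear one) is
idempotent, so skipping it is indistinguishable from performing it.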


[PATCH 06/11] f2fs: skip unnecessary data writes during fsync

2014-07-25 Thread Jaegeuk Kim

This patch improves fsync performance by skipping the write of recovery
information when there is no data that we should recover.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/file.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c652b3..121689a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -133,6 +133,17 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t 
end, int datasync)
return ret;
}
 
+   /*
+* if there is no written data, don't waste time to write recovery info.
+*/
+   if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
+   !exist_written_data(sbi, inode->i_ino, APPEND_INO)) {
+   if (is_inode_flag_set(fi, FI_UPDATE_WRITE) &&
+   exist_written_data(sbi, inode->i_ino, UPDATE_INO))
+   goto flush_out;
+   goto out;
+   }
+
/* guarantee free sections for fsync */
f2fs_balance_fs(sbi);
 
@@ -188,6 +199,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t 
end, int datasync)
ret = wait_on_node_pages_writeback(sbi, inode->i_ino);
if (ret)
goto out;
+
+   /* once recovery info is written, don't need to track this */
+   remove_dirty_inode(sbi, inode->i_ino, APPEND_INO);
+flush_out:
+   remove_dirty_inode(sbi, inode->i_ino, UPDATE_INO);
ret = f2fs_issue_flush(F2FS_SB(inode->i_sb));
}
 out:
-- 
1.8.5.2 (Apple Git-48)
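
The early-exit logic added to f2fs_sync_file() above can be read as a small
decision table. The sketch below restates the posted hunk as a pure function
so that the three outcomes are easy to see; it mirrors the conditions exactly
as written in the diff. The struct, enum and function names are hypothetical
and exist only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fsync_action {
	FSYNC_NOTHING,		/* corresponds to "goto out" */
	FSYNC_FLUSH_ONLY,	/* corresponds to "goto flush_out" */
	FSYNC_WRITE_RECOVERY,	/* fall through: write recovery info */
};

struct ino_state {
	bool append_flag;	/* FI_APPEND_WRITE set on the inode */
	bool append_listed;	/* ino found in the APPEND_INO list */
	bool update_flag;	/* FI_UPDATE_WRITE set on the inode */
	bool update_listed;	/* ino found in the UPDATE_INO list */
};

static enum fsync_action fsync_decide(const struct ino_state *s)
{
	assert(s != NULL);

	/* No appended data anywhere: recovery info is not needed. */
	if (!s->append_flag && !s->append_listed) {
		/* In-place updates still need a device cache flush. */
		if (s->update_flag && s->update_listed)
			return FSYNC_FLUSH_ONLY;
		return FSYNC_NOTHING;
	}
	return FSYNC_WRITE_RECOVERY;
}
```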


[PATCH 04/11] f2fs: use radix_tree for ino management

2014-07-25 Thread Jaegeuk Kim

For better ino management, this patch replaces the list walk used for ino
lookup with a radix tree.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 48 ++--
 fs/f2fs/f2fs.h   |  1 +
 2 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index f93d154..d35094a 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -284,24 +284,26 @@ const struct address_space_operations f2fs_meta_aops = {
 
 static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
-   struct ino_entry *new, *e;
-
-   new = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_ATOMIC);
-   new->ino = ino;
-
+   struct ino_entry *e;
+retry:
spin_lock(>ino_lock[type]);
-   list_for_each_entry(e, >ino_list[type], list) {
-   if (e->ino == ino) {
+
+   e = radix_tree_lookup(>ino_root[type], ino);
+   if (!e) {
+   e = kmem_cache_alloc(ino_entry_slab, GFP_ATOMIC);
+   if (!e) {
spin_unlock(>ino_lock[type]);
-   kmem_cache_free(ino_entry_slab, new);
-   return;
+   goto retry;
}
-   if (e->ino > ino)
-   break;
-   }
+   if (radix_tree_insert(>ino_root[type], ino, e)) {
+   spin_unlock(>ino_lock[type]);
+   goto retry;
+   }
+   memset(e, 0, sizeof(struct ino_entry));
+   e->ino = ino;
 
-   /* add new entry into list which is sorted by inode number */
-   list_add_tail(>list, >list);
+   list_add_tail(>list, >ino_list[type]);
+   }
spin_unlock(>ino_lock[type]);
 }
 
@@ -310,14 +312,15 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino, int type)
struct ino_entry *e;
 
spin_lock(&sbi->ino_lock[type]);
-   list_for_each_entry(e, &sbi->ino_list[type], list) {
-   if (e->ino == ino) {
-   list_del(&e->list);
+   e = radix_tree_lookup(&sbi->ino_root[type], ino);
+   if (e) {
+   list_del(&e->list);
+   radix_tree_delete(&sbi->ino_root[type], ino);
+   if (type == ORPHAN_INO)
sbi->n_orphans--;
-   spin_unlock(&sbi->ino_lock[type]);
-   kmem_cache_free(ino_entry_slab, e);
-   return;
-   }
+   spin_unlock(&sbi->ino_lock[type]);
+   kmem_cache_free(ino_entry_slab, e);
+   return;
}
spin_unlock(&sbi->ino_lock[type]);
 }
@@ -346,7 +349,7 @@ void release_orphan_inode(struct f2fs_sb_info *sbi)
 
 void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
-   /* add new orphan entry into list which is sorted by inode number */
+   /* add new orphan ino entry into list */
__add_ino_entry(sbi, ino, ORPHAN_INO);
 }
 
@@ -943,6 +946,7 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
int i;
 
for (i = 0; i < MAX_INO_ENTRY; i++) {
+   INIT_RADIX_TREE(&sbi->ino_root[i], GFP_ATOMIC);
spin_lock_init(&sbi->ino_lock[i]);
INIT_LIST_HEAD(&sbi->ino_list[i]);
}
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b6fa6ec..4454caa 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -456,6 +456,7 @@ struct f2fs_sb_info {
wait_queue_head_t cp_wait;
 
/* for inode management */
+   struct radix_tree_root ino_root[MAX_INO_ENTRY]; /* ino entry array */
spinlock_t ino_lock[MAX_INO_ENTRY]; /* for ino entry lock */
struct list_head ino_list[MAX_INO_ENTRY];   /* inode list head */
 
-- 
1.8.5.2 (Apple Git-48)
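
To show why a keyed index helps here: with a list alone, finding an ino means
walking every entry, while a radix tree reaches the slot in a fixed number of
pointer hops determined by the key width. The sketch below is a deliberately
tiny two-level tree over 16-bit keys written for user space. It is not the
kernel's radix tree API, and every name in it (radix16, radix16_lookup and so
on) is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 256u	/* 8 key bits consumed per level */

struct leaf { void *slot[FANOUT]; };
struct radix16 { struct leaf *child[FANOUT]; };

/* Two pointer hops regardless of how many keys are stored. */
static void *radix16_lookup(const struct radix16 *t, unsigned int key)
{
	assert(key <= 0xffffu);
	const struct leaf *l = t->child[key >> 8];

	return l ? l->slot[key & 0xffu] : NULL;
}

/* Returns 0 on success, -1 if the key exists or allocation fails. */
static int radix16_insert(struct radix16 *t, unsigned int key, void *item)
{
	assert(key <= 0xffffu && item != NULL);
	struct leaf **lp = &t->child[key >> 8];

	if (!*lp) {
		*lp = calloc(1, sizeof(**lp));	/* leaf created on demand */
		if (!*lp)
			return -1;
	}
	if ((*lp)->slot[key & 0xffu])
		return -1;
	(*lp)->slot[key & 0xffu] = item;
	return 0;
}

/* Removes the key and returns the stored item, or NULL if absent. */
static void *radix16_delete(struct radix16 *t, unsigned int key)
{
	assert(key <= 0xffffu);
	struct leaf *l = t->child[key >> 8];
	void *old;

	if (!l)
		return NULL;
	old = l->slot[key & 0xffu];
	l->slot[key & 0xffu] = NULL;
	return old;
}
```

As in the patch, such an index can sit next to a list: the tree answers
"is this ino present?", and the list is kept for iterating over all entries.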


[PATCH 05/11] f2fs: add info of appended or updated data writes

2014-07-25 Thread Jaegeuk Kim

This patch introduces an inode number list that represents inodes having
appended data writes or updated data writes since the last checkpoint.
This will be used at fsync to determine whether the recovery information
should be written or not.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 39 +++
 fs/f2fs/data.c   |  2 ++
 fs/f2fs/f2fs.h   |  7 +++
 fs/f2fs/inode.c  |  4 
 4 files changed, 52 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d35094a..42a16c1 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -325,6 +325,44 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino, int type)
spin_unlock(&sbi->ino_lock[type]);
 }
 
+void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
+{
+   /* add new dirty ino entry into list */
+   __add_ino_entry(sbi, ino, type);
+}
+
+void remove_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
+{
+   /* remove dirty ino entry from list */
+   __remove_ino_entry(sbi, ino, type);
+}
+
+/* mode should be APPEND_INO or UPDATE_INO */
+bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode)
+{
+   struct ino_entry *e;
+   spin_lock(&sbi->ino_lock[mode]);
+   e = radix_tree_lookup(&sbi->ino_root[mode], ino);
+   spin_unlock(&sbi->ino_lock[mode]);
+   return e ? true : false;
+}
+
+static void release_dirty_inode(struct f2fs_sb_info *sbi)
+{
+   struct ino_entry *e, *tmp;
+   int i;
+
+   for (i = APPEND_INO; i <= UPDATE_INO; i++) {
+   spin_lock(&sbi->ino_lock[i]);
+   list_for_each_entry_safe(e, tmp, &sbi->ino_list[i], list) {
+   list_del(&e->list);
+   radix_tree_delete(&sbi->ino_root[i], e->ino);
+   kmem_cache_free(ino_entry_slab, e);
+   }
+   spin_unlock(&sbi->ino_lock[i]);
+   }
+}
+
 int acquire_orphan_inode(struct f2fs_sb_info *sbi)
 {
int err = 0;
@@ -896,6 +934,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool 
is_umount)
 
if (unlikely(!is_set_ckpt_flags(ckpt, CP_ERROR_FLAG))) {
clear_prefree_segments(sbi);
+   release_dirty_inode(sbi);
F2FS_RESET_SB_DIRT(sbi);
}
 }
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 482313d..ec3c886 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -789,9 +789,11 @@ int do_write_data_page(struct page *page, struct 
f2fs_io_info *fio)
!is_cold_data(page) &&
need_inplace_update(inode))) {
rewrite_data_page(page, old_blkaddr, fio);
+   set_inode_flag(F2FS_I(inode), FI_UPDATE_WRITE);
} else {
write_data_page(page, , _blkaddr, fio);
update_extent_cache(new_blkaddr, );
+   set_inode_flag(F2FS_I(inode), FI_APPEND_WRITE);
}
 out_writepage:
f2fs_put_dnode();
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4454caa..ab36025 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -103,6 +103,8 @@ enum {
 /* for the list of ino */
 enum {
ORPHAN_INO, /* for orphan ino list */
+   APPEND_INO, /* for append ino list */
+   UPDATE_INO, /* for update ino list */
MAX_INO_ENTRY,  /* max. list */
 };
 
@@ -994,6 +996,8 @@ enum {
FI_NO_EXTENT,   /* not to use the extent cache */
FI_INLINE_XATTR,/* used for inline xattr */
FI_INLINE_DATA, /* used for inline data*/
+   FI_APPEND_WRITE,/* inode has appended data */
+   FI_UPDATE_WRITE,/* inode has in-place-update data */
 };
 
 static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag)
@@ -1252,6 +1256,9 @@ struct page *grab_meta_page(struct f2fs_sb_info *, 
pgoff_t);
 struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
 int ra_meta_pages(struct f2fs_sb_info *, int, int, int);
 long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
+void add_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
+void remove_dirty_inode(struct f2fs_sb_info *, nid_t, int type);
+bool exist_written_data(struct f2fs_sb_info *, nid_t, int);
 int acquire_orphan_inode(struct f2fs_sb_info *);
 void release_orphan_inode(struct f2fs_sb_info *);
 void add_orphan_inode(struct f2fs_sb_info *, nid_t);
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index cafba3c..0e69aa9 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -296,6 +296,10 @@ void f2fs_evict_inode(struct inode *inode)
sb_end_intwrite(inode->i_sb);
 no_delete:
invalidate_mapping_pages(NODE_MAPPING(sbi), inode->i_ino, inode->i_ino);
+   if (is_inode_flag_set(F2FS_I(inode), FI_APPEND_WRITE))
+   add_dirty_inode(sbi, inode->i_ino, APPEND_INO);
+   if (is_inode_flag_set(F2FS_I(inode), FI_UPDATE_WRITE))
+   add_dirty_inode(sbi, inode->i_ino, UPDATE_INO);

Re: [RFC][PATCH] irq: Rework IRQF_NO_SUSPENDED

2014-07-25 Thread Rafael J. Wysocki

On Saturday, July 26, 2014 12:25:29 AM Rafael J. Wysocki wrote:
> On Friday, July 25, 2014 11:00:12 PM Thomas Gleixner wrote:
> > On Fri, 25 Jul 2014, Rafael J. Wysocki wrote:
> > > On Friday, July 25, 2014 03:25:41 PM Peter Zijlstra wrote:
> > > > OK, so Rafael said there's devices that keep on raising their interrupt
> > > > until they get attention. Ideally this won't happen because the device
> > > > is suspended etc.. But I'm sure there's some broken piece of hardware
> > > > out there that'll make it go boom.
> > > 
> > > So here's an idea.
> > > 
> > > What about returning IRQ_NONE rather than IRQ_HANDLED for "suspended"
> > > interrupts (after all, that's what a sane driver would do for a
> > > suspended device I suppose)?
> > > 
> > > If the line is really shared and the interrupt is taken care of by
> > > the other guy sharing the line, we'll be all fine.
> > > 
> > > If that is not the case, on the other hand, and something's really
> > > broken, we'll end up disabling the interrupt and marking it as
> > > IRQS_SPURIOUS_DISABLED (if I understand things correctly).
> > 
> > We should not wait 100k unhandled interrupts in that case. We know
> > already at the first unhandled interrupt that the shit hit the fan.
> 
> The first one may be a bus glitch or some such.  Also I guess we still need to
> allow the legitimate "no suspend" guy to handle his interrupts until it gets
> too worse.

s/worse/bad/ (ah, grammar).

> Also does it really hurt to rely on the generic mechanism here?  We regard
> it as fine at all other times after all.
> 
> > I'll have a deeper look how we can sanitize the whole wake/no_suspend
> > logic vs. shared interrupts.
> 
> Cool, thanks!
> 
> > Need to look at the usage sites first.
> 
> There will be more of them, like this:
> 
> https://patchwork.kernel.org/patch/4618531/
> 
> Essentially, all wakeup interrupts will need at least one no_suspend irqaction
> going forward.
> 
> Below is my take on this (untested) in case it is useful for anything.
> 
> It is targeted at the problematic case (that is, a shared interrupt with at 
> least
> one irqaction that has IRQF_NO_SUSPEND set and at least one that doesn't) 
> only and
> is not supposed to change behavior in the other cases (the do_irqaction thing
> shamelessly stolen from the Peter's patch).  It drops the IRQD_WAKEUP_STATE 
> check,
> because that has the same problem with shared interrupts as no_suspend.

Self-correction ->

> ---
>  kernel/irq/handle.c |   21 ++---
>  kernel/irq/manage.c |   30 +-
>  2 files changed, 43 insertions(+), 8 deletions(-)
> 
> Index: linux-pm/kernel/irq/manage.c
> ===
> --- linux-pm.orig/kernel/irq/manage.c
> +++ linux-pm/kernel/irq/manage.c

[cut]

> @@ -446,7 +459,15 @@ EXPORT_SYMBOL(disable_irq);
>  void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
>  {
>   if (resume) {
> - if (!(desc->istate & IRQS_SUSPENDED)) {
> + if (desc->istate & IRQS_SUSPENDED) {
> + desc->istate &= ~IRQS_SUSPENDED;
> + if (desc->istate & IRQS_SPURIOUS_DISABLED) {
> + pr_err("WARNING! Unhandled events during 
> suspend for IRQ %d\n", irq);

-> This should be printed for desc->irqs_unhandled > 0 I suppose.  That will
cover the cases when we don't have to disable it too.  The value of
desc->irqs_unhandled can be included into the warning too.

Rafael
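
For context on the "generic mechanism" and the "100k unhandled interrupts"
mentioned in the thread: the core IRQ code counts interrupts for which every
handler returned IRQ_NONE and disables the line once nearly all interrupts in
a large window went unhandled. The following is a simplified user-space model
of that idea, not the kernel code. The thresholds are assumptions chosen to
match the "100k" figure quoted above, the time-based reset of the counter is
left out, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW          100000u	/* interrupts per evaluation window (assumed) */
#define UNHANDLED_LIMIT  99900u	/* unhandled count that trips the disable (assumed) */

struct irq_model {
	unsigned int count;		/* interrupts seen in this window */
	unsigned int unhandled;		/* of those, how many nobody handled */
	bool spurious_disabled;		/* models IRQS_SPURIOUS_DISABLED */
};

/* Account for one interrupt; handled is false when all handlers said IRQ_NONE. */
static void model_note_interrupt(struct irq_model *d, bool handled)
{
	assert(d != NULL);

	if (d->spurious_disabled)
		return;
	if (!handled)
		d->unhandled++;
	if (++d->count < WINDOW)
		return;

	/* End of window: decide, then start counting afresh. */
	if (d->unhandled > UNHANDLED_LIMIT)
		d->spurious_disabled = true;
	d->count = 0;
	d->unhandled = 0;
}
```

In this model a single unhandled interrupt, or even a few thousand, never
trips the disable, which is the behaviour Thomas objects to for a line that
is known to be suspended.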


[PATCH 03/11] f2fs: add infra for ino management

2014-07-25 Thread Jaegeuk Kim

This patch renames the orphan-related data structures so that they can manage
inode numbers globally.
Later, we can use this facility for managing any inode number lists.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 72 +++-
 fs/f2fs/debug.c  |  2 +-
 fs/f2fs/f2fs.h   | 19 +-
 fs/f2fs/super.c  |  2 +-
 4 files changed, 53 insertions(+), 42 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 3e3c2c3..f93d154 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -22,7 +22,7 @@
 #include "segment.h"
 #include 
 
-static struct kmem_cache *orphan_entry_slab;
+static struct kmem_cache *ino_entry_slab;
 static struct kmem_cache *inode_entry_slab;
 
 /*
@@ -282,19 +282,18 @@ const struct address_space_operations f2fs_meta_aops = {
.set_page_dirty = f2fs_set_meta_page_dirty,
 };
 
-static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino)
+static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
-   struct list_head *head;
struct ino_entry *new, *e;
 
-   new = f2fs_kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
+   new = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_ATOMIC);
new->ino = ino;
 
-   spin_lock(&sbi->orphan_inode_lock);
-   list_for_each_entry(e, &sbi->orphan_inode_list, list) {
+   spin_lock(&sbi->ino_lock[type]);
+   list_for_each_entry(e, &sbi->ino_list[type], list) {
if (e->ino == ino) {
-   spin_unlock(&sbi->orphan_inode_lock);
-   kmem_cache_free(orphan_entry_slab, new);
+   spin_unlock(&sbi->ino_lock[type]);
+   kmem_cache_free(ino_entry_slab, new);
return;
}
if (e->ino > ino)
@@ -303,58 +302,58 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino)
 
/* add new entry into list which is sorted by inode number */
list_add_tail(&new->list, &e->list);
-   spin_unlock(&sbi->orphan_inode_lock);
+   spin_unlock(&sbi->ino_lock[type]);
 }
 
-static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino)
+static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
struct ino_entry *e;
 
-   spin_lock(&sbi->orphan_inode_lock);
-   list_for_each_entry(e, &sbi->orphan_inode_list, list) {
+   spin_lock(&sbi->ino_lock[type]);
+   list_for_each_entry(e, &sbi->ino_list[type], list) {
if (e->ino == ino) {
list_del(&e->list);
sbi->n_orphans--;
-   spin_unlock(&sbi->orphan_inode_lock);
-   kmem_cache_free(orphan_entry_slab, e);
+   spin_unlock(&sbi->ino_lock[type]);
+   kmem_cache_free(ino_entry_slab, e);
return;
}
}
-   spin_unlock(&sbi->orphan_inode_lock);
+   spin_unlock(&sbi->ino_lock[type]);
 }
 
 int acquire_orphan_inode(struct f2fs_sb_info *sbi)
 {
int err = 0;
 
-   spin_lock(&sbi->orphan_inode_lock);
+   spin_lock(&sbi->ino_lock[ORPHAN_INO]);
if (unlikely(sbi->n_orphans >= sbi->max_orphans))
err = -ENOSPC;
else
sbi->n_orphans++;
-   spin_unlock(&sbi->orphan_inode_lock);
+   spin_unlock(&sbi->ino_lock[ORPHAN_INO]);
 
return err;
 }
 
 void release_orphan_inode(struct f2fs_sb_info *sbi)
 {
-   spin_lock(&sbi->orphan_inode_lock);
+   spin_lock(&sbi->ino_lock[ORPHAN_INO]);
f2fs_bug_on(sbi->n_orphans == 0);
sbi->n_orphans--;
-   spin_unlock(&sbi->orphan_inode_lock);
+   spin_unlock(&sbi->ino_lock[ORPHAN_INO]);
 }
 
 void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
/* add new orphan entry into list which is sorted by inode number */
-   __add_ino_entry(sbi, ino);
+   __add_ino_entry(sbi, ino, ORPHAN_INO);
 }
 
 void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
/* remove orphan entry from orphan list */
-   __remove_ino_entry(sbi, ino);
+   __remove_ino_entry(sbi, ino, ORPHAN_INO);
 }
 
 static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
@@ -408,14 +407,14 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, 
block_t start_blk)
unsigned short orphan_blocks = (unsigned short)((sbi->n_orphans +
(F2FS_ORPHANS_PER_BLOCK - 1)) / F2FS_ORPHANS_PER_BLOCK);
struct page *page = NULL;
-   struct orphan_inode_entry *orphan = NULL;
+   struct ino_entry *orphan = NULL;
 
for (index = 0; index < orphan_blocks; index++)
grab_meta_page(sbi, start_blk + index);
 
index = 1;
-   spin_lock(&sbi->orphan_inode_lock);
-   head = &sbi->orphan_inode_list;
+   spin_lock(&sbi->ino_lock[ORPHAN_INO]);
+   head = &sbi->ino_list[ORPHAN_INO];
 
/* loop for each orphan inode entry and write them in Jornal block */
list_for_each_entry(orphan, head, list) {
@@ -455,7

Re: [RFC PATCH 1/1] rcu: Use rcu_gp_kthread_wake() to wake up kthreads

2014-07-25 Thread Paul E. McKenney

On Fri, Jul 25, 2014 at 04:19:43PM -0400, Pranith Kumar wrote:
> On Fri, Jul 25, 2014 at 10:44 AM, Paul E. McKenney
>  wrote:
> > On Fri, Jul 25, 2014 at 01:06:58AM -0400, Pranith Kumar wrote:
> >> The rcu_gp_kthread_wake() function checks for three conditions before 
> >> waking up
> >> grace period kthreads:
> >>
> >> *  Is the thread we are trying to wake up the current thread?
> >> *  Are the gp_flags zero? (all threads wait on non-zero gp_flags condition)
> >> *  Is there no thread created for this flavour, hence nothing to wake up?
> >>
> >> If any one of these conditions is true, we do not call wake_up().
> >>
> >> In rcu_report_qs_rsp(), I added a pr_info() call testing if any of the
> >> above conditions is true, in which case we can avoid calling wake_up().
> >> It turns out that quite a few actually are. Most of the avoidable calls
> >> fall under condition 2 above, and condition 1 also occurs quite often.
> >> Condition 3 never happens.
> >>
> >> I could not test the wake_up() in force_quiescent_state() as that is not
> >> triggered trivially, but I am assuming we can replace wake_up() there too.
> >>
> >> Hence this commit tries to avoid calling wake_up() whenever we can by using
> >> rcu_gp_kthread_wake() function.
> >
> > This one does sound much more plausible than the earlier one.  I have
> > a few more questions that I will ask in your follow-up message.
> >
> >> One concern is the comment which states that we need a memory barrier at
> >> that location, which is implied by the wake_up(). Should we put in an
> >> smp_mb() and not rely on the barrier provided by wake_up()? Thoughts?
> >
> > Let's see...  The memory barriers are unnecessary for your case 1
> > and case 3.  That leaves your case 2, which is all about ->gp_flags.
> > It is quite possible that this case is now fully covered by locking,
> > so that the comment is obsolete.  But why don't you check?
> 
> I checked all the locations where gp_flags is being updated and the
> root node lock is held in all the cases.
> So I guess we can remove the comment too.

And the accesses that matter (for some definition of "that matter") are
also similarly protected?

An example of an access that doesn't matter is one that is followed up
by an access under the appropriate lock.

Anyway, if it is all locked properly, then yes, we should get rid of
the comment -- or replace it with a comment saying that barriers are
not needed due to locking.

Thanx, Paul

> >> Signed-off-by: Pranith Kumar 
> >> ---
> >>  kernel/rcu/tree.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 72e0b1f..d0e0d6e 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -1938,7 +1938,8 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
> >>  {
> >>   WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
> >>   raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
> >> - wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
> >> + /* Memory barrier implied by wake_up() path. */
> >> + rcu_gp_kthread_wake(rsp);
> >>  }
> >>
> >>  /*
> >> @@ -2516,7 +2517,8 @@ static void force_quiescent_state(struct rcu_state *rsp)
> >>   ACCESS_ONCE(rsp->gp_flags) =
> >>   ACCESS_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS;
> >>   raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
> >> - wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
> >> + /* Memory barrier implied by wake_up() path. */
> >> + rcu_gp_kthread_wake(rsp);
> >>  }
> >>
> >>  /*
> >> --
> >> 2.0.1
> >>
> >
> 
> 
> 
> -- 
> Pranith
> 
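
The three checks listed in the quoted commit message can be sketched in user space as follows. This is only an illustrative model, not the kernel's rcu_gp_kthread_wake(): struct task, struct gp_state and gp_kthread_wake() are hypothetical stand-ins, and the wake-up itself is replaced by a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct task {
    int id;
};

struct gp_state {
    struct task *gp_kthread;  /* grace-period kthread, NULL if never created */
    unsigned int gp_flags;    /* the kthread sleeps until this is non-zero */
    unsigned int wakeups;     /* how many wake-ups would have been issued */
};

/* Returns true if a wake-up was issued, false if it was skipped. */
static bool gp_kthread_wake(struct gp_state *st, const struct task *current_task)
{
    if (current_task == st->gp_kthread)  /* case 1: we are the kthread */
        return false;
    if (st->gp_flags == 0)               /* case 2: nothing to wait for */
        return false;
    if (st->gp_kthread == NULL)          /* case 3: no kthread to wake */
        return false;

    st->wakeups++;                       /* stands in for wake_up() */
    return true;
}
```

In this model a wake-up is only counted when the caller is some other task, the flags are non-zero and a kthread exists, which is the situation the patch wants to single out.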


[PATCH 02/11] f2fs: punch the core function for inode management

2014-07-25 Thread Jaegeuk Kim

This patch punches out the core functions to manage the inode numbers.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 81 
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 0b4710c..3e3c2c3 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -282,6 +282,47 @@ const struct address_space_operations f2fs_meta_aops = {
.set_page_dirty = f2fs_set_meta_page_dirty,
 };
 
+static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino)
+{
+   struct list_head *head;
+   struct ino_entry *new, *e;
+
+   new = f2fs_kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
+   new->ino = ino;
+
+   spin_lock(&sbi->orphan_inode_lock);
+   list_for_each_entry(e, &sbi->orphan_inode_list, list) {
+   if (e->ino == ino) {
+   spin_unlock(&sbi->orphan_inode_lock);
+   kmem_cache_free(orphan_entry_slab, new);
+   return;
+   }
+   if (e->ino > ino)
+   break;
+   }
+
+   /* add new entry into list which is sorted by inode number */
+   list_add_tail(&new->list, &e->list);
+   spin_unlock(&sbi->orphan_inode_lock);
+}
+
+static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino)
+{
+   struct ino_entry *e;
+
+   spin_lock(&sbi->orphan_inode_lock);
+   list_for_each_entry(e, &sbi->orphan_inode_list, list) {
+   if (e->ino == ino) {
+   list_del(&e->list);
+   sbi->n_orphans--;
+   spin_unlock(&sbi->orphan_inode_lock);
+   kmem_cache_free(orphan_entry_slab, e);
+   return;
+   }
+   }
+   spin_unlock(&sbi->orphan_inode_lock);
+}
+
 int acquire_orphan_inode(struct f2fs_sb_info *sbi)
 {
int err = 0;
@@ -306,48 +347,14 @@ void release_orphan_inode(struct f2fs_sb_info *sbi)
 
 void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
-   struct list_head *head;
-   struct orphan_inode_entry *new, *orphan;
-
-   new = f2fs_kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
-   new->ino = ino;
-
-   spin_lock(&sbi->orphan_inode_lock);
-   head = &sbi->orphan_inode_list;
-   list_for_each_entry(orphan, head, list) {
-   if (orphan->ino == ino) {
-   spin_unlock(&sbi->orphan_inode_lock);
-   kmem_cache_free(orphan_entry_slab, new);
-   return;
-   }
-
-   if (orphan->ino > ino)
-   break;
-   }
-
/* add new orphan entry into list which is sorted by inode number */
-   list_add_tail(&new->list, &orphan->list);
-   spin_unlock(&sbi->orphan_inode_lock);
+   __add_ino_entry(sbi, ino);
 }
 
 void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
-   struct list_head *head;
-   struct orphan_inode_entry *orphan;
-
-   spin_lock(&sbi->orphan_inode_lock);
-   head = &sbi->orphan_inode_list;
-   list_for_each_entry(orphan, head, list) {
-   if (orphan->ino == ino) {
-   list_del(&orphan->list);
-   f2fs_bug_on(sbi->n_orphans == 0);
-   sbi->n_orphans--;
-   spin_unlock(&sbi->orphan_inode_lock);
-   kmem_cache_free(orphan_entry_slab, orphan);
-   return;
-   }
-   }
-   spin_unlock(&sbi->orphan_inode_lock);
+   /* remove orphan entry from orphan list */
+   __remove_ino_entry(sbi, ino);
 }
 
 static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
-- 
1.8.5.2 (Apple Git-48)
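
For readers who want to try the idea outside the kernel, the sorted insert with duplicate rejection that the patch factors out can be sketched roughly like this. It is a simplified user-space model and makes several assumptions: struct ino_node, ino_list_add() and ino_list_remove() are hypothetical names, a singly linked list replaces the kernel's list_head, and the spinlock and slab cache are left out entirely.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for an inode-number list entry. */
struct ino_node {
    unsigned int ino;
    struct ino_node *next;
};

/*
 * Insert ino so that the list stays sorted in ascending order.
 * Returns false if ino is already present or allocation fails.
 */
static bool ino_list_add(struct ino_node **head, unsigned int ino)
{
    struct ino_node **link = head;
    struct ino_node *n;

    /* Walk to the first entry that is not smaller than ino. */
    while (*link && (*link)->ino < ino)
        link = &(*link)->next;

    if (*link && (*link)->ino == ino)
        return false;               /* duplicate: nothing to add */

    n = malloc(sizeof(*n));
    if (!n)
        return false;
    n->ino = ino;
    n->next = *link;                /* new entry goes before the larger one */
    *link = n;
    return true;
}

/* Remove ino if present. Returns true if an entry was removed. */
static bool ino_list_remove(struct ino_node **head, unsigned int ino)
{
    struct ino_node **link = head;
    struct ino_node *victim;

    while (*link && (*link)->ino < ino)
        link = &(*link)->next;

    if (!*link || (*link)->ino != ino)
        return false;

    victim = *link;
    *link = victim->next;
    free(victim);
    return true;
}
```

Unlike this sketch, the patch allocates the new entry before taking the lock and frees it again when a duplicate is found.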


[PATCH 01/11] f2fs: add nobarrier mount option

2014-07-25 Thread Jaegeuk Kim

This patch adds a mount option, nobarrier, in f2fs.
The assumption in here is that file system keeps the IO ordering, but
doesn't care about cache flushes inside the storages.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c    | 5 ++++-
 fs/f2fs/f2fs.h    | 1 +
 fs/f2fs/segment.c | 3 +++
 fs/f2fs/super.c   | 7 +++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c77c667..482313d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -139,7 +139,10 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
/* change META to META_FLUSH in the checkpoint procedure */
if (type >= META_FLUSH) {
io->fio.type = META_FLUSH;
-   io->fio.rw = WRITE_FLUSH_FUA | REQ_META | REQ_PRIO;
+   if (test_opt(sbi, NOBARRIER))
+   io->fio.rw = WRITE_FLUSH | REQ_META | REQ_PRIO;
+   else
+   io->fio.rw = WRITE_FLUSH_FUA | REQ_META | REQ_PRIO;
}
__submit_merged_bio(io);
up_write(&io->io_rwsem);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8f507d4..e999eec 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -41,6 +41,7 @@
 #define F2FS_MOUNT_INLINE_XATTR0x0080
 #define F2FS_MOUNT_INLINE_DATA 0x0100
 #define F2FS_MOUNT_FLUSH_MERGE 0x0200
+#define F2FS_MOUNT_NOBARRIER   0x0400
 
 #define clear_opt(sbi, option) (sbi->mount_opt.opt &= ~F2FS_MOUNT_##option)
 #define set_opt(sbi, option)   (sbi->mount_opt.opt |= F2FS_MOUNT_##option)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8a6e57d..9fce0f47 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -239,6 +239,9 @@ int f2fs_issue_flush(struct f2fs_sb_info *sbi)
struct flush_cmd_control *fcc = SM_I(sbi)->cmd_control_info;
struct flush_cmd cmd;
 
+   if (test_opt(sbi, NOBARRIER))
+   return 0;
+
if (!test_opt(sbi, FLUSH_MERGE))
return blkdev_issue_flush(sbi->sb->s_bdev, GFP_KERNEL, NULL);
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 34649aa..eec89a2 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -52,6 +52,7 @@ enum {
Opt_inline_xattr,
Opt_inline_data,
Opt_flush_merge,
+   Opt_nobarrier,
Opt_err,
 };
 
@@ -69,6 +70,7 @@ static match_table_t f2fs_tokens = {
{Opt_inline_xattr, "inline_xattr"},
{Opt_inline_data, "inline_data"},
{Opt_flush_merge, "flush_merge"},
+   {Opt_nobarrier, "nobarrier"},
{Opt_err, NULL},
 };
 
@@ -339,6 +341,9 @@ static int parse_options(struct super_block *sb, char *options)
case Opt_flush_merge:
set_opt(sbi, FLUSH_MERGE);
break;
+   case Opt_nobarrier:
+   set_opt(sbi, NOBARRIER);
+   break;
default:
f2fs_msg(sb, KERN_ERR,
"Unrecognized mount option \"%s\" or missing value",
@@ -544,6 +549,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
seq_puts(seq, ",inline_data");
if (!f2fs_readonly(sbi->sb) && test_opt(sbi, FLUSH_MERGE))
seq_puts(seq, ",flush_merge");
+   if (!f2fs_readonly(sbi->sb) && test_opt(sbi, NOBARRIER))
+   seq_puts(seq, ",nobarrier");
seq_printf(seq, ",active_logs=%u", sbi->active_logs);
 
return 0;
-- 
1.8.5.2 (Apple Git-48)
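
The option-flag pattern the patch relies on (one bit per mount option, with token-pasting macros to set, clear and test it) can be sketched in isolation. The demo_ names and struct demo_sb_info below are hypothetical, and demo_issue_flush() only models the early return that the patch adds to f2fs_issue_flush(); it does not issue any I/O.

```c
/* Hypothetical flag values, chosen to match the two bits shown in the patch. */
#define DEMO_MOUNT_FLUSH_MERGE 0x0200
#define DEMO_MOUNT_NOBARRIER   0x0400

/* Hypothetical stand-in for the superblock info holding the option bits. */
struct demo_sb_info {
    unsigned int opt;
};

/* The ## operator pastes the option name onto the DEMO_MOUNT_ prefix. */
#define demo_clear_opt(sbi, option) ((sbi)->opt &= ~DEMO_MOUNT_##option)
#define demo_set_opt(sbi, option)   ((sbi)->opt |= DEMO_MOUNT_##option)
#define demo_test_opt(sbi, option)  ((sbi)->opt & DEMO_MOUNT_##option)

/*
 * Models the early return in the patch: with NOBARRIER set, no cache
 * flush is issued. Returns 1 if a flush would be issued, 0 if skipped.
 */
static int demo_issue_flush(struct demo_sb_info *sbi)
{
    if (demo_test_opt(sbi, NOBARRIER))
        return 0;
    return 1;
}
```

Setting NOBARRIER on a zeroed demo_sb_info makes demo_issue_flush() return 0, and clearing it restores the flush.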


[PATCH] isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()

2014-07-25 Thread Alexey Khoroshilov

There is a lack of usb_put_dev(udev) on failure path in gigaset_probe().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/isdn/gigaset/bas-gigaset.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index c44950d3eb7b..b7ae0a0dd5b6 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -2400,6 +2400,7 @@ allocerr:
 error:
freeurbs(cs);
usb_set_intfdata(interface, NULL);
+   usb_put_dev(udev);
gigaset_freecs(cs);
return rc;
 }
-- 
1.9.1
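
The class of bug fixed here, a reference taken early in a probe routine and not dropped on the error path, can be illustrated with a small model. This is a sketch under assumptions: struct demo_dev, demo_get(), demo_put() and demo_probe() are hypothetical and only mimic the get/put pairing, not the USB core API.

```c
/* Hypothetical reference-counted device; not the USB core's structure. */
struct demo_dev {
    int refcount;
};

static void demo_get(struct demo_dev *d) { d->refcount++; }
static void demo_put(struct demo_dev *d) { d->refcount--; }

/*
 * Takes a reference, then either succeeds and keeps it, or fails and
 * drops it again on the shared error path. `fail` simulates a later
 * setup step going wrong.
 */
static int demo_probe(struct demo_dev *d, int fail)
{
    int rc = 0;

    demo_get(d);

    if (fail) {
        rc = -1;
        goto error;
    }
    return 0;

error:
    demo_put(d);    /* without this line the reference would leak */
    return rc;
}
```

After a failed demo_probe() the count is back at its starting value; after a successful one it is one higher, to be dropped later by the matching teardown.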


Re: [PATCH v8 00/13] Add Maxim 77802 PMIC support

2014-07-25 Thread Mike Turquette

Quoting Javier Martinez Canillas (2014-07-14 04:35:56)
> This series is based on drivers added by Simon Glass to the Chrome OS
> kernel and adds support for the Maxim 77802 Power Management IC, its
> regulators, clocks, RTC and i2c interface.
> 
> This is a v8 of the patch-set that addresses issues pointed out in v7.
> Individual changes are added on each patch but the biggest changes are:
> 
> * Patches 1-7 from v7 are not included since those were improvements to
> the max77686 mfd driver and can be applied independently. Lee Jones said
> that he is going to pick them from the posted v7 series.
> 
> I've created a patchwork bundle with 1-7 from v7 to make it easy to apply:
> 
> https://patchwork.kernel.org/bundle/javier/max77686-improvements/
> 
> * The Dynamic Voltage Scaling support has been removed since that can be
> added in a follow up series and shouldn't block the minimum PMIC support.
> 
> The patch-set has been tested on both Daisy/Snow (max77686) and Peach
> Pit (max77802) Chromebooks and it's composed of the following patches:
> 
> [PATCH v8 01/13] mfd: max77686: Add Maxim 77802 PMIC support
> [PATCH v8 02/13] mfd: max77802: Add DT binding documentation
> [PATCH v8 03/13] regulator: Add driver for Maxim 77802 PMIC regulators
> [PATCH v8 04/13] clk: max77686: Add DT include for MAX77686 PMIC clock
> [PATCH v8 05/13] clk: Add generic driver for Maxim PMIC clocks
> [PATCH v8 06/13] clk: max77686: Convert to the generic max clock driver
> [PATCH v8 07/13] clk: max77686: Improve Maxim 77686 PMIC clocks binding
> [PATCH v8 08/13] clk: Add driver for Maxim 77802 PMIC clocks
> [PATCH v8 09/13] clk: max77802: Add DT binding documentation

For patches 4-9:

Acked-by: Mike Turquette 

Regards,
Mike

> [PATCH v8 10/13] rtc: max77686: Allow the max77686 rtc to wakeup the system
> [PATCH v8 11/13] rtc: max77686: Remove dead code for SMPL and WTSR
> [PATCH v8 12/13] rtc: Add driver for Maxim 77802 PMIC Real-Time-Clock
> [PATCH v8 13/13] ARM: dts: Add max77802 to exynos5420-peach-pit and 
> exynos5800-peach-pi
> 
> Patch 01/13 extends the max77686 mfd driver to also support the max77802
> PMIC and patch 02/13 adds the DT binding documentation for this PMIC.
> 
> Patch 03/13 adds support for the regulators found in the PMIC.
> 
> Patch 04/13 to 07/13 are improvements and refactoring to the max77686 clock
> driver to avoid code duplication when adding max77802 clocks support in patch
> 08/13. Patch 09/13 adds the DT binding document for the max77802 clock driver.
> 
> Patches 10/13 and 11/13 are improvements to max77686 RTC driver and patch
> 12/13 adds support for the RTC found in the max77802 PMIC.
> 
> Finally patch 13/13 adds the required device node to the Peach Pit and Pi
> exynos5 based boards.
> 
> Since there are cross-subsystems dependencies, I think that the best way to
> sort this out is if relevant maintainers ack the patches so 01/13 to 12/13
> can be merged through the mfd tree. The patches and the relevant acks are:
> 
> Patch 03/13 (regulator - Mark Brown)
> Patches 04/13 to 09/13 (clk - Mike Turquette)
> Patches 10/13 to 12/13 (rtc - Alessandro Zummo)
> 
> Patch 13/13 is only a DTS change so it can be picked by Kukjin Kim once the
> other patches are picked by Lee Jones.
> 
> Since we are in 3.16-rc5 already, it would be great if I can get your acks
> or feedback since I was hoping this series would make it to 3.17. This is
> because other series that were already posted depend on this one.
> 
> Also, the series have been reviewed and tested by Samsung folks and most of
> the patches already collected Reviewed-by and Tested-by tags.
> 
> Thanks a lot and best regards,
> Javier
> 

Re: [RFC PATCH 1/1] rcu: Use rcu_gp_kthread_wake() to wake up kthreads

2014-07-25 Thread Pranith Kumar

On Fri, Jul 25, 2014 at 11:02 AM, Paul E. McKenney
 wrote:
> On Fri, Jul 25, 2014 at 02:24:34AM -0400, Pranith Kumar wrote:
>> On Fri, Jul 25, 2014 at 1:06 AM, Pranith Kumar  wrote:
>>
>> >
>> > In rcu_report_qs_rsp(), I added a pr_info() call testing if any of the
>> > above conditions is true, in which case we can avoid calling wake_up().
>> > It turns out that quite a few actually are. Most of the avoidable calls
>> > fall under condition 2 above, and condition 1 also occurs quite often.
>> > Condition 3 never happens.
>> >
>>
>> A little more data. On an idle system there are about 2000 unnecessary
>> wake_up() calls every 5 minutes, with the most common trace being the
>> following:
>>
>> [Fri Jul 25 02:05:49 2014]  [] rcu_report_qs_rnp+0x285/0x2c0
>> [Fri Jul 25 02:05:49 2014]  [] ? schedule_timeout+0x159/0x270
>> [Fri Jul 25 02:05:49 2014]  [] force_qs_rnp+0x111/0x190
>> [Fri Jul 25 02:05:49 2014]  [] ? synchronize_rcu_bh+0x50/0x50
>> [Fri Jul 25 02:05:49 2014]  [] rcu_gp_kthread+0x85f/0xa70
>> [Fri Jul 25 02:05:49 2014]  [] ? __wake_up_sync+0x20/0x20
>> [Fri Jul 25 02:05:49 2014]  [] ? rcu_barrier+0x20/0x20
>> [Fri Jul 25 02:05:49 2014]  [] kthread+0xdb/0x100
>>   [] ? kthread_create_on_node+0x180/0x180
>> [Fri Jul 25 02:05:49 2014]  [] ret_from_fork+0x7c/0xb0
>>   [] ? kthread_create_on_node+0x180/0x180
>>
>> With rcutorture, there are about 2000 unnecessary wake_ups() every 3
>> minutes with the most common trace being:
>>
>> [Fri Jul 25 02:18:30 2014]  [] rcu_report_qs_rnp+0x285/0x2c0
>> [Fri Jul 25 02:18:30 2014]  [] ? __update_cpu_load+0xe5/0x140
>>  [] ? rcu_read_delay+0x50/0x80 [rcutorture]
>>  [] rcu_process_callbacks+0x6b8/0x7e0
>
> Good to see the numbers!!!
>
> But to evaluate this analytically, we should compare the overhead of the
> wake_up() with the overhead of the extra checks in rcu_gp_kthread_wake(),
> and then compare the number of unnecessary wake_up()s to the number of
> calls to rcu_gp_kthread_wake() added by this patch.  This means that we
> need more numbers.
>
> For example, suppose that the extra checks cost 10ns on average, and that
> an unnecessary wake_up() costs 1us on average, so that each wake_up()
> is on average 100 times more expensive than the extra checks.  Then it
> makes sense to ask whether the saved wake_up()s save more time than the
> extra tests cost.  Turning the arithmetic crank says that if more than 1%
> of the wake_up()s are unnecessary, we should add the checks.
>
> This means that if there are fewer than 200,000 grace periods in each
> of the time periods, then your patch really would provide performance
> benefits.  I bet that there are -way- fewer than 200,000 grace periods in
> each of the time periods, but why don't you build with RCU_TRACE and look
> at the "rcugp" file in RCU's debugfs hierarchy?  Or just periodically
> print out the rcu_state ->completed field?
>
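
The break-even arithmetic in the quoted paragraphs can be written down as a small helper. This is only a sketch: net_saving_ns() is a hypothetical name, and the 10ns and 1us costs are the supposed figures from the quoted text, not measurements.

```c
/*
 * Net time saved, in nanoseconds, when every one of `calls` invocations
 * pays `check_ns` for the extra checks and `avoided` of them skip a
 * wake-up costing `wakeup_ns`. A positive result means the checks pay off.
 */
static long long net_saving_ns(long long calls, long long avoided,
                               long long check_ns, long long wakeup_ns)
{
    return avoided * wakeup_ns - calls * check_ns;
}
```

With the supposed costs, net_saving_ns(200000, 2000, 10, 1000) comes out at zero: 2000 avoided wake-ups break even at 200,000 calls, and any smaller number of calls is a net win.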

I added some debugging code to see how many unnecessary wake-ups were
being generated in rcu_report_qs_rsp(). I ran both with and without
rcutorture running. Here are the results:

Without rcutorture:

[   14.839214] Total:2000, unnecessary:2000, case1:1741, case2:2000, case3:0
[  224.284633] Total:4000, unnecessary:4000, case1:3652, case2:4000, case3:0
[  244.159021] Total:6000, unnecessary:6000, case1:5539, case2:6000, case3:0
[  260.522175] Total:8000, unnecessary:8000, case1:7447, case2:8000, case3:0
[  268.293058] Total:10000, unnecessary:10000, case1:9317, case2:10000, case3:0
[  275.962033] Total:12000, unnecessary:12000, case1:11159, case2:12000, case3:0
[  287.411032] Total:14000, unnecessary:14000, case1:13008, case2:14000, case3:0
[  304.868334] Total:16000, unnecessary:16000, case1:14885, case2:16000, case3:0
[  318.090930] Total:18000, unnecessary:18000, case1:16747, case2:18000, case3:0
[  333.423876] Total:20000, unnecessary:20000, case1:18631, case2:20000, case3:0
[  346.775399] Total:22000, unnecessary:22000, case1:20502, case2:22000, case3:0
[  362.867751] Total:24000, unnecessary:24000, case1:22386, case2:24000, case3:0
[  376.777817] Total:26000, unnecessary:26000, case1:24251, case2:26000, case3:0
[  391.839994] Total:28000, unnecessary:28000, case1:26118, case2:28000, case3:0
[  406.559406] Total:30000, unnecessary:30000, case1:27983, case2:30000, case3:0
[  419.973867] Total:32000, unnecessary:32000, case1:29855, case2:32000, case3:0
[  435.080002] Total:34000, unnecessary:34000, case1:31740, case2:34000, case3:0
[  449.077018] Total:36000, unnecessary:36000, case1:33588, case2:36000, case3:0
[  464.418942] Total:38000, unnecessary:38000, case1:35460, case2:38000, case3:0
[  478.654755] Total:40000, unnecessary:40000, case1:37326, case2:40000, case3:0
[  494.650198] Total:42000, unnecessary:42000, case1:39232, case2:42000, case3:0
[  508.594240] Total:44000, unnecessary:44000, case1:41134, case2:44000, case3:0
[  524.273907] Total:46000, unnecessary:46000, case1:43039,

Re: [PATCH net-next] net: filter: rename 'struct sk_filter' to 'struct bpf_prog'

2014-07-25 Thread Pablo Neira Ayuso

On Fri, Jul 25, 2014 at 02:50:32PM -0400, Willem de Bruijn wrote:
> On Fri, Jul 25, 2014 at 2:43 PM, Alexei Starovoitov  wrote:
> > On Fri, Jul 25, 2014 at 11:32 AM, Willem de Bruijn wrote:
> >>>> This follows a convention in include/uapi/linux/netfilter/*.h that
> >>>> likely predates the introduction of uapi. A search for "Used
> >>>> internally by the kernel" shows many more examples. I should not have
> >>>> included filter.h, however. The common behavior when using pointers
> >>>> to kernel-internal structures is to have a forward declaration. I suggest
> >>>> making that change, instead of changing to void *. This avoids having
> >>>> to add casts where xt_bpf_info is used in net/netfilter/xt_bpf.c:
> >>>
> >>> that will not avoid typecast.
> >>> Either 'void *' approach or extra 'struct sk_filter;' approach, both need
> >>> type casts to 'struct bpf_prog' in xt_bpf.c
> >>> (because of SK_RUN_FILTER macro)
> >>> Therefore I prefer extra 'struct sk_filter;' approach.
> >>
> >> I hadn't noticed that your patch makes the same change that I
> >> proposed. Nothing in userspace should touch that pointer, so it is
> >> fine to change its type to struct bpf_prog* at the same time. No need
> >> for typecasts.
> >
> > really? I don't think it's a good idea to expose kernel struct type
> > to user space. How is it even going to compile?
> 
> a forward declaration.
> 
> > #include <linux/filter.h> brings different files in kernel and in user space.
> > struct bpf_prog is undefined in user space and compiler will complain.
> > Adding 'struct bpf_prog;' will be ugly.
> > imo the lesser evil is adding 'struct sk_filter;' and doing type casts
> > in kernel.
> 
> but the exact same argument applies to sk_filter. If that struct is
> renamed everywhere else, then the result will only be more confusing.
> A forward declaration is the standard workaround to all such cases in
> include/uapi/linux/netfilter. See for instance xt_connlimit.h. This is
> sufficient to allow userspace build to succeed, without exposing any
> kernel structure detail. If you don't even want to leak the name, then
> let's make it void *. Keeping a declaration for sk_filter, while
> sk_filter is renamed everywhere else is the least good option, in my
> opinion.

Please send me a patch to remove that include <linux/filter.h> from the
uapi header and define struct sk_filter; so we save the typecast in
xt_bpf.c.

The struct sk_filter; doesn't expose anything relevant since, even
assuming userspace knows the layout, it can *not* do anything useful
with that.

Thanks.
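
As a rough illustration of the forward-declaration approach discussed in this thread, the sketch below shows a header-style struct that carries a pointer to a type whose layout only one side knows. The demo_ names are hypothetical and do not reproduce the real xt_bpf or filter definitions.

```c
#include <stddef.h>

/* --- what a userspace-visible header could contain --- */

struct demo_prog;               /* forward declaration: layout stays private */

struct demo_info {
    unsigned short len;
    struct demo_prog *filter;   /* opaque to anyone who only sees the header */
};

/* --- what only the kernel side would see --- */

struct demo_prog {
    unsigned int len;
};

/* Only code that sees the full definition can dereference the pointer. */
static unsigned int demo_prog_len(const struct demo_info *info)
{
    return info->filter ? info->filter->len : 0;
}
```

Code that only includes the header part can store and pass the pointer around, but it cannot look inside it, and no cast is needed on the side that has the full definition.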

[PATCH] hwrng: Pass entropy to add_hwgenerator_randomness() in bits, not bytes

2014-07-25 Thread Stephen Boyd

rng_get_data() returns the number of bytes read from the hardware.
The entropy argument to add_hwgenerator_randomness() is passed
directly to credit_entropy_bits() so we should be passing the
number of bits, not bytes here.

Fixes: be4000bc464 "hwrng: create filler thread"
Cc: Torsten Duwe 
Signed-off-by: Stephen Boyd 
---
 drivers/char/hw_random/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index dc80cdab733d..6e02ec103cc7 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -362,7 +362,7 @@ static int hwrng_fillfn(void *unused)
continue;
}
add_hwgenerator_randomness((void *)rng_fillbuf, rc,
-  (rc*current_quality)>>10);
+  rc * current_quality * 8 >> 10);
}
hwrng_fill = NULL;
return 0;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
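
To make the unit change concrete, here is a small stand-alone sketch of the two expressions from the diff. The helper names are hypothetical, and `quality` is assumed to be scaled in parts per 1024, which is what the shift by 10 suggests.

```c
#include <stddef.h>

/* Pre-patch expression: the result is effectively a byte count. */
static size_t credited_before(size_t bytes_read, unsigned int quality)
{
    return (bytes_read * quality) >> 10;
}

/*
 * Post-patch expression: multiplying by 8 converts bytes to bits.
 * Multiplication binds tighter than >>, so the shift applies to the
 * whole product.
 */
static size_t credited_after(size_t bytes_read, unsigned int quality)
{
    return bytes_read * quality * 8 >> 10;
}
```

For a 64-byte read at a quality of 1024, credited_before() yields 64 while credited_after() yields 512, the number of bits actually read.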


Re: [PATCH net-next] net: filter: rename 'struct sk_filter' to 'struct bpf_prog'

2014-07-25 Thread Pablo Neira Ayuso

On Fri, Jul 25, 2014 at 10:24:29AM -0700, Alexei Starovoitov wrote:
> On Fri, Jul 25, 2014 at 6:00 AM, Daniel Borkmann  wrote:
> > On 07/25/2014 01:54 PM, Pablo Neira Ayuso wrote:
> >>
> >> On Fri, Jul 25, 2014 at 01:25:35PM +0200, Daniel Borkmann wrote:
> >>>
> >>> [ also Cc'ing Willem, Pablo ]
> >>>
> >>> On 07/25/2014 10:04 AM, Alexei Starovoitov wrote:
> >>>>
> >>>> 'sk_filter' name is used as 'struct sk_filter', function sk_filter() and
> >>>> as variable 'sk_filter', which makes code hard to read.
> >>>> Also it's easily confused with 'struct sock_filter'
> >>>> Rename 'struct sk_filter' to 'struct bpf_prog' to clarify semantics and
> >>>> align the name with generic BPF use model.
> >>>
> >>>
> >>> Agreed, as we went for kernel/bpf/, renaming makes absolute sense.
> >>
> >>
> >> My nft socket filtering changes are accommodated into struct sk_filter,
> >> and will still be, so I still need some generic name there...
> >
> >
> > All the parts from filter.c which is BPF's core engine have been moved
> > into kernel/bpf/ to get it ready for tracing et al, since there is not
> > always a socket context anymore. The *whole* infrastructure around struct
> > sk_filter is [e]BPF and used in non-net related contexts as well, whereas
> > nft socket filtering is *only* for sockets. Due to the socket-only specific
> > use case why doesn't it make more sense to have a union in struct sock
> > around sk_filter (or however we name it) and only allow one of the two
> > being loaded on a socket?
> 
> yep.
> Adding nft specific things to struct sk_filter/bpf_prog is not correct,
> since this struct is already part of seccomp and will be used
> in net-less configurations. SK_RUN_FILTER() macro will also be
> renamed into something like RUN_BPF_PROG(). It's the one and only
> way to invoke eBPF programs. Adding nft selector cannot work,
> since eBPF is used with generic context whereas nft is skb specific.
> If you want to add nft filtering capabilities to sockets, you'd need
> to add union around 'struct bpf_prog' inside 'struct sock', which will be
> much cleaner way.

struct sk_filter almost provides the generic framework already; it
just needs to be generalized. A quick layout for it:

struct sk_filter {
        struct sk_filter_cb     *cb;
        atomic_t                refcnt;
        struct rcu_head         head;
        char                    data[0]; /* here, your specific struct bpf_prog */
};

The refcnt is required by sk_filter_{charge,uncharge,release}. The struct
rcu_head is also needed by sk_filter_release().

struct sk_filter_cb {
        int type;
        struct module *me;
        void (*charge)(struct sock *sk, struct sk_filter *fp);
        void (*uncharge)(struct sock *sk, struct sk_filter *fp);
        unsigned int (*run_filter)(struct sk_filter *fp, struct sk_buff *skb);
};

We have to provide the register/unregister functions for the specific
callbacks depending on the socket filtering approach. But I'll have to
introduce this myself when I come up with the nft patches again.

So meanwhile, you should just encapsulate what really belongs to
struct bpf_prog, ie. size, bytecode, jitted, etc. and leave struct
sk_filter in place.

struct sk_filter {
        atomic_t                refcnt;
        struct rcu_head         head;
        u32                     len;
        struct bpf_prog         bpf_prog;
};

The len will go into struct bpf_prog once the generic infrastructure
above is introduced, since the semantics (number of blocks) is
different from nft.

If you simply rename the entire structure, you'll take along things
that are not specific to bpf, such as refcnt and rcu_head.

Re: net: socket: NULL ptr deref in sendmsg

2014-07-25 Thread Hannes Frederic Sowa

On Fri, 2014-07-25 at 19:23 +0400, Andrey Ryabinin wrote:
> On 07/14/14 01:50, Sasha Levin wrote:
> 
> > 
> > I've tried debugging it, but I don't see a code path that could lead to 
> > that.
> > 
> 
> I finally found some time to take a look at this and I've found where the
> problem is.
>
> Sasha, I suppose there was no usual "Unable to handle kernel NULL pointer
> dereference" after KASAN's report, right?
>
> This gave me a clue that address 0 is actually mapped and contains a valid
> socket address structure.
> I've managed to write a simple program (in attachment), which easily
> reproduces this bug.
> 
> I've fixed it with the following patch, please take a look.
> 
> 
> From: Andrey Ryabinin 
> Subject: [PATCH] net: sendmsg: fix NULL pointer dereference
> 
> Sasha's report:
>   > While fuzzing with trinity inside a KVM tools guest running the latest -next
>   > kernel with the KASAN patchset, I've stumbled on the following spew:
>   >
>   > [ 4448.949424] ==
>   > [ 4448.951737] AddressSanitizer: user-memory-access on address 0
>   > [ 4448.952988] Read of size 2 by thread T19638:
>   > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
>   > [ 4448.956823]  88046d86ca40  880082f37e78 880082f37a40
>   > [ 4448.958233]  b6e47068 880082f37a68 880082f37a58 b242708d
>   > [ 4448.959552]   880082f37a88 b24255b1
>   > [ 4448.961266] Call Trace:
>   > [ 4448.963158] dump_stack (lib/dump_stack.c:52)
>   > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
>   > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
>   > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
>   > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
>   > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
>   > [ 4448.970103] sock_sendmsg (net/socket.c:654)
>   > [ 4448.971584] ? might_fault (mm/memory.c:3741)
>   > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
>   > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
>   > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
>   > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
>   > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
>   > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
>   > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
>   > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
>   > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
>   > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
>   > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
>   > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
>   > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
>   > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
>   > [ 4448.988929] ==
> 
> This report means that we've come to netlink_sendmsg() with msg->msg_name ==
> NULL and msg->msg_namelen > 0.
>
> After this report there was no usual "Unable to handle kernel NULL pointer
> dereference", and this gave me a clue that address 0 is mapped and contains
> a valid socket address structure.
> 
> This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
> (net: rework recvmsg handler msg_name and msg_namelen logic).
> Commit message states that:
>   "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
>non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
>affect sendto as it would bail out earlier while trying to copy-in the
>address."
> But in fact this affects sendto when address 0 is mapped and contains a
> socket address structure. In such a case copying in the address will
> succeed, and the verify_iovec() function will successfully exit with
> msg->msg_namelen > 0 and msg->msg_name == NULL.
>
> This patch fixes it by assigning address to m->msg_name if
> move_addr_to_kernel() was successful.
> 
> Cc: Hannes Frederic Sowa 
> Cc: Eric Dumazet 
> Cc: 
> Reported-by: Sasha Levin 
> Signed-off-by: Andrey Ryabinin 
> ---
>  net/compat.c | 6 --
>  net/core/iovec.c | 6 --
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/compat.c b/net/compat.c
> index 9a76eaf..eefd989 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -92,9 +92,11 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
>

Re: net: socket: NULL ptr deref in sendmsg

2014-07-25 Thread Hannes Frederic Sowa

On Fri, 2014-07-25 at 16:52 -0400, Sasha Levin wrote:
> On 07/25/2014 11:23 AM, Andrey Ryabinin wrote:
> > After this report there was no usual "Unable to handle kernel NULL pointer
> > dereference", and this gave me a clue that address 0 is mapped and contains
> > a valid socket address structure.
> 
> Interesting. Does it mean that all network protocols that check it for being
> NULL instead of checking the length are incorrect?

I would rather not go down this route, and would instead keep
msg->msg_namelen and msg->msg_name in sync after verify_iovec.

> (such as:)
> 
> 	if (msg->msg_name) {
> 		DECLARE_SOCKADDR(struct sockaddr_can *, addr, msg->msg_name);
> 
> [...]
> 

Thanks,
Hannes
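
The mismatch discussed in this thread can be illustrated with a small
user-space model. This is only a sketch under stated assumptions:
struct model_msghdr, copy_in_addr(), verify_buggy() and verify_fixed() are
hypothetical stand-ins, not the kernel's verify_iovec() or
move_addr_to_kernel(), and the null_page_mapped flag merely simulates a
process that has mapped address 0. The "buggy" variant is an assumed shape
of the logic, reconstructed from the description in the patch text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ADDR_MAX 128

/* Hypothetical, simplified stand-in for the msghdr fields that matter here. */
struct model_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* Simulated copy-in: a NULL user pointer normally faults, but succeeds
 * when the process has mapped address 0 (null_page_mapped != 0). */
static int copy_in_addr(const void *uaddr, int len, char *kaddr,
			int null_page_mapped)
{
	assert(kaddr != NULL);
	if (len <= 0 || len > MODEL_ADDR_MAX)
		return -1;
	if (uaddr == NULL) {
		if (!null_page_mapped)
			return -1;		/* models -EFAULT */
		memset(kaddr, 0, (size_t)len);	/* pretend page 0 held zeros */
		return 0;
	}
	memcpy(kaddr, uaddr, (size_t)len);
	return 0;
}

/* Assumed shape of the problematic logic: msg_name is only redirected
 * to the kernel copy when the user pointer was non-NULL. */
static int verify_buggy(struct model_msghdr *m, char *address, int mapped)
{
	if (m->msg_namelen) {
		if (copy_in_addr(m->msg_name, m->msg_namelen, address, mapped) < 0)
			return -1;
		if (m->msg_name)
			m->msg_name = address;
	} else {
		m->msg_name = NULL;
	}
	return 0;
}

/* Fix as described in the patch text: once the copy-in has succeeded,
 * always point msg_name at the kernel copy. */
static int verify_fixed(struct model_msghdr *m, char *address, int mapped)
{
	if (m->msg_namelen) {
		if (copy_in_addr(m->msg_name, m->msg_namelen, address, mapped) < 0)
			return -1;
		m->msg_name = address;
	} else {
		m->msg_name = NULL;
	}
	return 0;
}
```

With a NULL name, a length of 16 and the null page "mapped", verify_buggy()
returns success while leaving msg_name == NULL and msg_namelen == 16, which
is the inconsistent pair a protocol handler then trips over. verify_fixed()
leaves the two fields in sync in the same situation.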


Re: [PATCH v2 00/10] Input - wacom: conversion to HID driver, series 2

2014-07-25 Thread Przemo Firszt

On Thu, 2014-07-24 at 14:13 -0400, Benjamin Tissoires wrote:
[..]
Hi Benjamin,
I'm testing the whole series including the OLED patch that's not on the
list yet.

Hardware: 2 x Intuos4 Wireless, tested on USB and bluetooth unless noted
otherwise.

What works:
1. Tablet in general, pressure, tilt, buttons etc.
2. Battery reporting (including gnome). The double wireless tablet bug
is gone:

$ ls /sys/class/power_supply/
AC  BAT0  wacom_ac_2  wacom_ac_3  wacom_battery_2  wacom_battery_3

3. Setting LED selector value
4. Setting LED selector brightness (default and pressed)
5. Rendering images to button displays works on usb ONLY.

$ i4oled -d /sys/bus/hid/drivers/wacom/0003\:056A\:00BC.0009/wacom_led/button0_rawimg -t Linux

On bluetooth, writing the image goes fine (no error), but nothing shows up,
so I suspect the brightness of the OLED displays is not set properly.

That's the code before changes:

	led = wdata->led_selector | 0x04;
	buf = kzalloc(9, GFP_KERNEL);
	if (buf) {
		buf[0] = WAC_CMD_LED_CONTROL;
		buf[1] = led;
		buf[2] = value >> 2;
		buf[3] = value;
		/* use fixed brightness for OLEDs */
		buf[4] = 0x08;
		hid_hw_raw_request(hdev, buf[0], buf, 9, HID_FEATURE_REPORT,
				   HID_REQ_SET_REPORT);
		kfree(buf);
	}

I don't remember for sure, but I think the range of brightness might be
different over USB and over bluetooth.

TL;DR: the only thing that needs to be fixed is image-over-bluetooth, probably
caused by the OLED brightness not being set or being set incorrectly.

-- 
Kind regards,
Przemo Firszt
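
One way to test the brightness theory would be to turn the fixed 0x08 into
a parameter. The sketch below only models how the 9-byte buffer from the
quoted code is filled; build_oled_report(), MODEL_REPORT_LEN and
MODEL_CMD_LED_CONTROL (including its value) are placeholders, not the
driver's actual definitions, and nothing here says which brightness range
bluetooth really expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_REPORT_LEN 9
/* Placeholder for WAC_CMD_LED_CONTROL; the real value lives in the driver. */
#define MODEL_CMD_LED_CONTROL 0x20

/* Fill the LED control report the same way the quoted code does, but take
 * the OLED brightness from the caller instead of hard-coding it. */
static void build_oled_report(uint8_t buf[MODEL_REPORT_LEN],
			      uint8_t led_selector, unsigned int value,
			      uint8_t oled_brightness)
{
	assert(buf != NULL);
	memset(buf, 0, MODEL_REPORT_LEN);	/* kzalloc() zeroes the buffer */
	buf[0] = MODEL_CMD_LED_CONTROL;
	buf[1] = led_selector | 0x04;
	buf[2] = (uint8_t)(value >> 2);
	buf[3] = (uint8_t)value;
	buf[4] = oled_brightness;		/* was fixed at 0x08 */
}
```

Calling it with oled_brightness = 0x08 reproduces the quoted behaviour, so
different values could be tried over bluetooth without touching the rest of
the report.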



Re: [RFC][PATCH] irq: Rework IRQF_NO_SUSPENDED

2014-07-25 Thread Rafael J. Wysocki

On Friday, July 25, 2014 11:00:12 PM Thomas Gleixner wrote:
> On Fri, 25 Jul 2014, Rafael J. Wysocki wrote:
> > On Friday, July 25, 2014 03:25:41 PM Peter Zijlstra wrote:
> > > OK, so Rafael said there's devices that keep on raising their interrupt
> > > until they get attention. Ideally this won't happen because the device
> > > is suspended etc.. But I'm sure there's some broken piece of hardware
> > > out there that'll make it go boom.
> > 
> > So here's an idea.
> > 
> > What about returning IRQ_NONE rather than IRQ_HANDLED for "suspended"
> > interrupts (after all, that's what a sane driver would do for a
> > suspended device I suppose)?
> > 
> > If the line is really shared and the interrupt is taken care of by
> > the other guy sharing the line, we'll be all fine.
> > 
> > If that is not the case, on the other hand, and something's really
> > broken, we'll end up disabling the interrupt and marking it as
> > IRQS_SPURIOUS_DISABLED (if I understand things correctly).
> 
> We should not wait 100k unhandled interrupts in that case. We know
> already at the first unhandled interrupt that the shit hit the fan.

The first one may be a bus glitch or some such.  Also I guess we still need to
allow the legitimate "no suspend" guy to handle his interrupts until things
get too bad.

Also does it really hurt to rely on the generic mechanism here?  We regard
it as fine at all other times after all.

> I'll have a deeper look how we can sanitize the whole wake/no_suspend
> logic vs. shared interrupts.

Cool, thanks!

> Need to look at the usage sites first.

There will be more of them, like this:

https://patchwork.kernel.org/patch/4618531/

Essentially, all wakeup interrupts will need at least one no_suspend irqaction
going forward.

Below is my take on this (untested) in case it is useful for anything.

It is targeted only at the problematic case (that is, a shared interrupt with
at least one irqaction that has IRQF_NO_SUSPEND set and at least one that
doesn't) and is not supposed to change behavior in the other cases (the
do_irqaction thing is shamelessly stolen from Peter's patch).  It drops the
IRQD_WAKEUP_STATE check, because that has the same problem with shared
interrupts as no_suspend.

Rafael


---
 kernel/irq/handle.c |   21 ++---
 kernel/irq/manage.c |   30 +-
 2 files changed, 43 insertions(+), 8 deletions(-)

Index: linux-pm/kernel/irq/manage.c
===
--- linux-pm.orig/kernel/irq/manage.c
+++ linux-pm/kernel/irq/manage.c
@@ -385,10 +385,23 @@ setup_affinity(unsigned int irq, struct
 void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
 {
 	if (suspend) {
-		if (!desc->action || (desc->action->flags & IRQF_NO_SUSPEND)
-		    || irqd_has_set(&desc->irq_data, IRQD_WAKEUP_STATE))
+		struct irqaction *action = desc->action;
+		unsigned int no_suspend, flags;
+
+		if (!action)
+			return;
+		no_suspend = IRQF_NO_SUSPEND;
+		flags = 0;
+		do {
+			no_suspend &= action->flags;
+			flags |= action->flags;
+			action = action->next;
+		} while (action);
+		if (no_suspend)
 			return;
 		desc->istate |= IRQS_SUSPENDED;
+		if (flags & IRQF_NO_SUSPEND)
+			return;
 	}
 
 	if (!desc->depth++)
@@ -446,7 +459,15 @@ EXPORT_SYMBOL(disable_irq);
 void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
 {
 	if (resume) {
-		if (!(desc->istate & IRQS_SUSPENDED)) {
+		if (desc->istate & IRQS_SUSPENDED) {
+			desc->istate &= ~IRQS_SUSPENDED;
+			if (desc->istate & IRQS_SPURIOUS_DISABLED) {
+				pr_err("WARNING! Unhandled events during suspend for IRQ %d\n", irq);
+				desc->istate &= ~IRQS_SPURIOUS_DISABLED;
+			} else if (desc->depth == 0) {
+				return;
+			}
+		} else {
 			if (!desc->action)
 				return;
 			if (!(desc->action->flags & IRQF_FORCE_RESUME))
@@ -454,7 +475,6 @@ void __enable_irq(struct irq_desc *desc,
 			/* Pretend that it got disabled ! */
 			desc->depth++;
 		}
-		desc->istate &= ~IRQS_SUSPENDED;
 	}
 
 	switch (desc->depth) {
@@ -1079,7 +1099,7 @@ __setup_irq(unsigned int irq, struct irq
 */
 
 #define IRQF_MISMATCH \
-   (IRQF_TRIGGER_MASK | IRQF_ONESHOT | IRQF_NO_SUSPEND)
+   (IRQF_TRIGGER_MASK | IRQF_ONESHOT)
 
if (!((old->flags & new->flags) & IRQF_SHARED) ||
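
The three-way decision made by the new do/while loop in __disable_irq() can
be modelled in user space as follows. This is a sketch, not kernel code:
struct model_action, MODEL_NO_SUSPEND, enum suspend_class and classify() are
hypothetical names that mirror only the flag arithmetic of the patch above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_SUSPEND 0x1u	/* hypothetical stand-in for IRQF_NO_SUSPEND */

/* Minimal stand-in for a chain of irqactions sharing one line. */
struct model_action {
	unsigned int flags;
	struct model_action *next;
};

enum suspend_class {
	ALL_NO_SUSPEND,		/* leave the line completely alone */
	MIXED,			/* mark suspended, but keep the line enabled */
	NONE_NO_SUSPEND		/* mark suspended and disable the line */
};

static enum suspend_class classify(const struct model_action *action)
{
	unsigned int all = MODEL_NO_SUSPEND;	/* AND over every action */
	unsigned int any = 0;			/* OR over every action */

	assert(action != NULL);
	do {
		all &= action->flags;
		any |= action->flags;
		action = action->next;
	} while (action);

	if (all)
		return ALL_NO_SUSPEND;
	return (any & MODEL_NO_SUSPEND) ? MIXED : NONE_NO_SUSPEND;
}
```

The AND accumulator stays set only if every action carries the flag, while
the OR accumulator is set if at least one does. The MIXED result is the
problematic shared-interrupt case the patch targets.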

[PATCH v2] perf: fix arm64 build error

2014-07-25 Thread Mark Salter

I'm seeing the following build error on arm64:

  In file included from util/event.c:3:0:
  util/event.h:95:17: error: 'PERF_REGS_MAX' undeclared here (not in a function)
    u64 cache_regs[PERF_REGS_MAX];
                   ^

This patch adds a PERF_REGS_MAX definition for arm64.

Signed-off-by: Mark Salter 
Acked-by: Jean Pihet 
---
 tools/perf/arch/arm64/include/perf_regs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h
index e9441b9..1d3f39c 100644
--- a/tools/perf/arch/arm64/include/perf_regs.h
+++ b/tools/perf/arch/arm64/include/perf_regs.h
@@ -6,6 +6,8 @@
 #include 
 
 #define PERF_REGS_MASK ((1ULL << PERF_REG_ARM64_MAX) - 1)
+#define PERF_REGS_MAX  PERF_REG_ARM64_MAX
+
 #define PERF_REG_IP	PERF_REG_ARM64_PC
 #define PERF_REG_SP	PERF_REG_ARM64_SP
 
-- 
1.9.0
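
The dependency behind the build error can be reproduced in miniature. In
this sketch the register enum is deliberately shortened and every MODEL_*
name, struct model_regs_dump and model_cache_slots() are hypothetical; only
the two macros follow the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shortened register enum; the real arm64 list is longer. */
enum model_arm64_regs {
	MODEL_REG_LR,
	MODEL_REG_SP,
	MODEL_REG_PC,
	MODEL_REG_MAX
};

/* What the per-architecture header provides. */
#define PERF_REGS_MASK	((1ULL << MODEL_REG_MAX) - 1)
#define PERF_REGS_MAX	MODEL_REG_MAX

/* What generic code relies on: without PERF_REGS_MAX this does not build. */
struct model_regs_dump {
	uint64_t cache_regs[PERF_REGS_MAX];
};

/* Number of slots the generic array ends up with. */
static unsigned int model_cache_slots(void)
{
	struct model_regs_dump d;

	return (unsigned int)(sizeof(d.cache_regs) / sizeof(d.cache_regs[0]));
}
```

Removing the PERF_REGS_MAX line from the sketch yields the same kind of
"undeclared here (not in a function)" error as in the report.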


Re: [PATCH 1/3] mmu_notifier: Add mmu_notifier_invalidate_range()

2014-07-25 Thread Joerg Roedel

On Fri, Jul 25, 2014 at 02:42:13PM -0700, Jesse Barnes wrote:
> On Fri, 25 Jul 2014 23:38:06 +0200
> Joerg Roedel  wrote:
> > I thought about removing the need for invalidate_range_end too when
> > writing the patches, and possible solutions are
> > 
> > 1) Add mmu_notifier_invalidate_range() to all places where
> >start/end is called too. This might add some unnecessary
> >overhead.
> > 
> > 2) Call the invalidate_range() call-back from the
> >mmu_notifier_invalidate_range_end too.
> > 
> > 3) Just let the user register the same function for
> >invalidate_range and invalidate_range_end
> > 
> > I thought that option 1) adds overhead that is not needed (but it might
> > not be too bad, the overhead is an additional iteration over the
> > mmu_notifier list when there are no call-backs registered).
> > 
> > Option 2) might also be overhead if a user registers different functions
> > for invalidate_range() and invalidate_range_end(). In the end I came to
> > the conclusion that option 3) is the best one from an overhead POV.
> > 
> > But probably targeting better usability with one of the other options is
> > a better choice? I am open for thoughts and suggestions on that.
> 
> Making the _end callback just do another TLB flush is fine too, but it
> would be nice to have the consistency of (1).  I can live with either
> though, as long as the callbacks are well documented.

You are right, having this consistency would be good. The more I think
about it, the more it makes sense to go with option 2). Option 1) would
mean that invalidate_range is explicitly called right before
invalidate_range_end at some places. Doing this implicitly like in
option 2) is cleaner and less error-prone. And the list of mmu_notifiers
needs only be traversed once in invalidate_range_end(), so additional
overhead is minimal. I'll update patch 3 for this, unless there are
other opinions.


Joerg
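
Option 2) could look roughly like the user-space model below. It is a sketch
under stated assumptions, not the updated patch 3: struct model_notifier,
struct model_notifier_ops, model_invalidate_range_end() and the counting
callbacks are hypothetical names, and the singly linked list stands in for
the real mmu_notifier list.

```c
#include <assert.h>
#include <stddef.h>

struct model_notifier;

struct model_notifier_ops {
	void (*invalidate_range)(struct model_notifier *mn,
				 unsigned long start, unsigned long end);
	void (*invalidate_range_end)(struct model_notifier *mn,
				     unsigned long start, unsigned long end);
};

struct model_notifier {
	const struct model_notifier_ops *ops;
	struct model_notifier *next;
	int range_calls;	/* bookkeeping for the demo only */
	int end_calls;
};

/* Option 2: a single pass over the list; each notifier gets its
 * invalidate_range() implicitly, right before invalidate_range_end(). */
static void model_invalidate_range_end(struct model_notifier *list,
				       unsigned long start, unsigned long end)
{
	struct model_notifier *mn;

	assert(start <= end);
	for (mn = list; mn; mn = mn->next) {
		if (mn->ops->invalidate_range)
			mn->ops->invalidate_range(mn, start, end);
		if (mn->ops->invalidate_range_end)
			mn->ops->invalidate_range_end(mn, start, end);
	}
}

/* Demo callbacks that only count how often they were invoked. */
static void count_range(struct model_notifier *mn,
			unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	mn->range_calls++;
}

static void count_end(struct model_notifier *mn,
		      unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	mn->end_calls++;
}
```

A user that registers only invalidate_range() is still flushed at _end time,
and the list is walked once, which matches the overhead argument made above.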


Re: linux-next: build failure after merge of the v4l-dvb tree

2014-07-25 Thread Mauro Carvalho Chehab

On Fri, 25 Jul 2014 07:25:05 +0200
David Härdeman wrote:

> Mauro,
> 
> On July 25, 2014 4:23:17 AM CEST, Stephen Rothwell wrote:
> >Hi Mauro,
> >
> >After merging the v4l-dvb tree, today's linux-next build (x86_64
> >allmodconfig) failed like this:
> >
> >drivers/hid/hid-picolcd_cir.c: In function 'picolcd_init_cir':
> >drivers/hid/hid-picolcd_cir.c:117:6: error: 'struct rc_dev' has no
> >member named 'allowed_protos'
> >  rdev->allowed_protos   = RC_BIT_ALL;
> >  ^

Sorry for not noticing. I generally don't do full builds, as doing it
for all archs would require a bigger compilation system than what I
have here. I generally do only partial builds for (almost) all archs
before pushing to linux-next, because it is very rare that a change
on media would break anything outside it.

> I'll have time to look at it on Monday/Tuesday unless you beat me to it.
> 

Well, I can beat you on that ;) it should be allowed_protocols. I'll
fix it here and do a full build to see if you made this mistake on
some other file, as this is the second file where you typed it wrong ;)

Regards,
Mauro

Re: [PATCH 1/3] mmu_notifier: Add mmu_notifier_invalidate_range()

2014-07-25 Thread Jerome Glisse

On Fri, Jul 25, 2014 at 11:38:06PM +0200, Joerg Roedel wrote:
> On Fri, Jul 25, 2014 at 01:16:39PM -0700, Jesse Barnes wrote:
> > > To allow managing external TLBs the MMU-notifiers need to
> > > catch the moment when pages are unmapped but not yet freed.
> > > This new notifier catches that moment and notifies the
> > > interested subsystem when pages that were unmapped are about
> > > to be freed. The new notifier will only be called between
> > > invalidate_range_start()/end().
> > 
> > So if we were actually sharing page tables, we should be able to make
> > start/end no-ops and just use this new callback, assuming we didn't
> > need to do any other serialization or debug stuff, right?
> 
> Well, not completely. What you need with this patch-set is an
> invalidate_range and an invalidate_end call-back. There are call sites
> of the start/end functions where the TLB flush happens after the _end
> notifier (or at least can wait until _end is called). I did not add
> invalidate_range calls to these places (yet). But you can easily discard
> invalidate_range_start, any flush done in there is useless with shared
> page-tables.
> 
> I thought about removing the need for invalidate_range_end too when
> writing the patches, and possible solutions are
> 
>   1) Add mmu_notifier_invalidate_range() to all places where
>  start/end is called too. This might add some unnecessary
>  overhead.
> 
>   2) Call the invalidate_range() call-back from the
>  mmu_notifier_invalidate_range_end too.
> 
>   3) Just let the user register the same function for
>  invalidate_range and invalidate_range_end
> 
> I thought that option 1) adds overhead that is not needed (but it might
> not be too bad, the overhead is an additional iteration over the
> mmu_notifier list when there are no call-backs registered).
> 
> Option 2) might also be overhead if a user registers different functions
> for invalidate_range() and invalidate_range_end(). In the end I came to
> the conclusion that option 3) is the best one from an overhead POV.
> 
> But probably targeting better usability with one of the other options is
> a better choice? I am open for thoughts and suggestions on that.
> 

I should add that for hmm it is crucial that calls to start and end match
exactly, i.e. hmm needs to know when it can start doing cpu page table
lookups again and expect valid content.

Cheers,
Jérôme
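
The requirement can be pictured with a counter of in-flight invalidations.
This is one possible way a consumer might track it and is not taken from the
hmm patches; struct model_mirror and the model_*() helpers are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical consumer state; not taken from the hmm patches. */
struct model_mirror {
	int active_invalidations;	/* start() calls not yet matched by end() */
};

static void model_range_start(struct model_mirror *m)
{
	m->active_invalidations++;
}

static void model_range_end(struct model_mirror *m)
{
	assert(m->active_invalidations > 0);	/* every end needs a start */
	m->active_invalidations--;
}

/* CPU page table lookups are only trusted with no invalidation in flight. */
static bool model_lookup_allowed(const struct model_mirror *m)
{
	return m->active_invalidations == 0;
}
```

If an _end call were ever skipped or issued twice, the counter would either
never return to zero or trip the assertion, which is why exact pairing
matters for such a user.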

Re: [PATCH] spi/pxa2xx-pci: Enable DMA binding through device name

2014-07-25 Thread One Thousand Gnomes

> The current plan I think is to convert all platforms to use DT
> or ACPI so they get the right data from tables passed by the
> platform. I'm a bit puzzled about why Intel wants to support the
> non-ACPI non-DT case again. If we have to support this case anyway,
> what good will ACPI do us on those platforms?

Because some industries move very very slowly so still don't believe in
requiring ACPI or ACPI capable operating systems.

Alan

Re: [PATCH 1/3] mmu_notifier: Add mmu_notifier_invalidate_range()

2014-07-25 Thread Jesse Barnes

On Fri, 25 Jul 2014 23:38:06 +0200
Joerg Roedel  wrote:

> On Fri, Jul 25, 2014 at 01:16:39PM -0700, Jesse Barnes wrote:
> > > To allow managing external TLBs the MMU-notifiers need to
> > > catch the moment when pages are unmapped but not yet freed.
> > > This new notifier catches that moment and notifies the
> > > interested subsystem when pages that were unmapped are about
> > > to be freed. The new notifier will only be called between
> > > invalidate_range_start()/end().
> > 
> > So if we were actually sharing page tables, we should be able to make
> > start/end no-ops and just use this new callback, assuming we didn't
> > need to do any other serialization or debug stuff, right?
> 
> Well, not completely. What you need with this patch-set is an
> invalidate_range and an invalidate_end call-back. There are call sites
> of the start/end functions where the TLB flush happens after the _end
> notifier (or at least can wait until _end is called). I did not add
> invalidate_range calls to these places (yet). But you can easily discard
> invalidate_range_start, any flush done in there is useless with shared
> page-tables.
> 
> I thought about removing the need for invalidate_range_end too when
> writing the patches, and possible solutions are
> 
>   1) Add mmu_notifier_invalidate_range() to all places where
>  start/end is called too. This might add some unnecessary
>  overhead.
> 
>   2) Call the invalidate_range() call-back from the
>  mmu_notifier_invalidate_range_end too.
> 
>   3) Just let the user register the same function for
>  invalidate_range and invalidate_range_end
> 
> I thought that option 1) adds overhead that is not needed (but it might
> not be too bad, the overhead is an additional iteration over the
> mmu_notifier list when there are no call-backs registered).
> 
> Option 2) might also be overhead if a user registers different functions
> for invalidate_range() and invalidate_range_end(). In the end I came to
> the conclusion that option 3) is the best one from an overhead POV.
> 
> But probably targeting better usability with one of the other options is
> a better choice? I am open for thoughts and suggestions on that.

Making the _end callback just do another TLB flush is fine too, but it
would be nice to have the consistency of (1).  I can live with either
though, as long as the callbacks are well documented.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

Re: [PATCH/RFC V8 1/1] clk: Support for clock parents and rates assigned from device tree

2014-07-25 Thread Mike Turquette

Quoting Sylwester Nawrocki (2014-07-03 10:25:53)
> On 18/06/14 17:29, Sylwester Nawrocki wrote:
> > This patch adds helper functions to configure clock parents and rates
> > as specified through 'assigned-clock-parents', 'assigned-clock-rates'
> > DT properties for a clock provider or clock consumer device.
> > The helpers are now being called by the bus code for the platform, I2C
> > and SPI busses, before the driver probing and also in the clock core
> > after registration of a clock provider.
> > 
> > Signed-off-by: Sylwester Nawrocki 
> > Acked-by: Kyungmin Park 
> 
> Could someone please take a look and review that patch ?
> Any further suggestions, ACKs/NAKs ?

Patch looks good to me. I'm happy to take it.

> 
> I would appreciate a DT, SPI or the I2C maintainer opinions.

Yes, Acks from SPI and I2C maintainers would be good. I might need to
drop those parts of this patch if they don't come through :-(

Regards,
Mike

> 
> Thanks,
> Sylwester
> > ---
> > Changes since v6:
> >  - use a set of separate DT properties to specify the default parent
> >clocks and rates;
> >  - the clock defaults setting extended to the I2C and SPI busses.
> > 
> > Changes since v5:
> >  - updated the DT binding description (dropped 'assigned-clocks' node);
> >  - fixed detecting of null phandles (ENOENT error handling);
> >  - modified of_clk_init() to account for that the clocks property may now
> >contain a clock specifier with a phandle that points to our node;
> > 
> > Changes since v4:
> >  - added note explaining how to skip setting parent and rate
> >of a clock,
> >  - moved of_clk_dev_init() calls to the platform bus,
> >  - added missing call to of_node_put(),
> >  - dropped debug traces.
> > 
> > Changes since v3:
> >  - added detailed description of the assigned-clocks subnode,
> >  - added missing 'static inline' to the function stub definition,
> >  - clk-conf.c is now excluded when CONFIG_OF is not set,
> >  - s/of_clk_device_setup/of_clk_device_init.
> > 
> > Changes since v2:
> >  - edited in clock-bindings.txt, added note about 'assigned-clocks'
> >subnode which may be used to specify "global" clocks configuration
> >at a clock provider node,
> >  - moved of_clk_device_setup() function declaration from clk-provider.h
> >to clk-conf.h so required function stubs are available when
> >CONFIG_COMMON_CLK is not enabled,
> > 
> > Changes since v1:
> >  - the helper function to parse and set assigned clock parents and
> >rates made public so it is available to clock providers to call
> >directly;
> >  - dropped the platform bus notification; of_clk_device_setup()
> >    is now called from the driver core, rather than from the
> >    notification callback;
> >  - s/of_clk_get_list_entry/of_clk_get_by_property.
> > ---
> >  .../devicetree/bindings/clock/clock-bindings.txt   |   36 +
> >  drivers/base/platform.c|5 +
> >  drivers/clk/Makefile   |3 +
> >  drivers/clk/clk-conf.c |  143 
> >  drivers/clk/clk.c  |   12 +-
> >  drivers/i2c/i2c-core.c |5 +
> >  drivers/spi/spi.c  |5 +
> >  include/linux/clk/clk-conf.h   |   20 +++
> >  8 files changed, 227 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/clk/clk-conf.c
> >  create mode 100644 include/linux/clk/clk-conf.h
> > 
> > diff --git a/Documentation/devicetree/bindings/clock/clock-bindings.txt b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > index f1578781..06fc6d5 100644
> > --- a/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > +++ b/Documentation/devicetree/bindings/clock/clock-bindings.txt
> > @@ -131,3 +131,39 @@ clock signal, and a UART.
> >("pll" and "pll-switched").
> >  * The UART has its baud clock connected the external oscillator and its
> >register clock connected to the PLL clock (the "pll-switched" signal)
> > +
> > +==Assigned clock parents and rates==
> > +
> > +Some platforms may require initial configuration of default parent clocks
> > +and clock frequencies. Such a configuration can be specified in a device tree
> > +node through assigned-clocks, assigned-clock-parents and assigned-clock-rates
> > +properties. The assigned-clock-parents property should contain a list of parent
> > +clocks in form of phandle and clock specifier pairs, the assigned-clock-rates
> > +property the list of assigned clock frequency values - corresponding to clocks
> > +listed in the assigned-clocks property.
> > +
> > +To skip setting parent or rate of a clock its corresponding entry should be
> > +set to 0, or can be omitted if it is not followed by any non-zero entry.
> > +
> > +uart@a000 {
> > +compatible = "fsl,imx-uart";
> > +reg = <0xa000 0x1000>;
> > +...
> > +
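
The skip rule from the binding text (a 0 entry, or a trailing entry that is
simply omitted, leaves that clock alone) can be sketched as below. This is a
user-space illustration only: model_apply_assigned_rates() is a hypothetical
helper operating on plain arrays, not the clk-conf.c code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply rates[i] to clk_rate[i]. A 0 entry, or a rate list shorter than
 * the clock list, leaves the corresponding clock untouched.
 * Returns the number of clocks whose rate was set. */
static size_t model_apply_assigned_rates(const uint32_t *rates, size_t n_rates,
					 uint32_t *clk_rate, size_t n_clocks)
{
	size_t i, set = 0;

	assert(clk_rate != NULL);
	for (i = 0; i < n_clocks && i < n_rates; i++) {
		if (rates[i] == 0)
			continue;	/* 0 means "skip this clock" */
		clk_rate[i] = rates[i];
		set++;
	}
	return set;
}
```

With three clocks and a two-entry rate list of { 0, 24000000 }, only the
second clock is reprogrammed: the first is skipped explicitly and the third
because its entry was omitted.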
