Re: yama_ptrace_access_check(): possible recursive locking detected

2012-08-14 Thread Kees Cook
On Tue, Aug 14, 2012 at 8:01 PM, Fengguang Wu  wrote:
> On Tue, Aug 14, 2012 at 02:16:52PM -0700, Kees Cook wrote:
>> On Thu, Aug 9, 2012 at 6:52 PM, Fengguang Wu  wrote:
>> > On Thu, Aug 09, 2012 at 06:39:34PM -0700, Kees Cook wrote:
>> >> Hi,
>> >>
>> >> So, after taking a closer look at this, I cannot understand how it's
>> >> possible. Yama's task_lock call is against "current", not "child",
>> >> which is what ptrace_may_access() is locking. And the same code makes
>> >> sure that current != child. Yama would never get called if current ==
>> >> child.
>> >>
>> >> How did you reproduce this situation?
>> >
>> > This warning can be triggered with Dave Jones' trinity tool:
>> >
>> > git://git.codemonkey.org.uk/trinity
>> >
>> > That's a very dangerous tool, please only run it as normal user in a
>> > backed up and chrooted test box. I personally run it inside an initrd.
>> > If you are interested in reproducing this, I can send you the ready
>> > made initrd in private email.
>>
>> Well, even with your initrd, I can't reproduce this. You're running
>> this against a stock kernel? I can't see how the path you've shown can
>
> Yes, it happens on 3.6-rc1.
>
> >> possibly happen. It could only happen if "task" was "current", but
>> there is an explicit test for that in ptrace_may_access(). Based on
>> the traceback, this is from reading /proc/$pid/stack (or
>> /proc/$pid/task/$tid/stack), rather than a direct ptrace() call, but
>> the code path for task != current still stands.
>>
>> I've tried both normal and "trinity -c read" and I haven't seen the
>> trace you found. :(
>>
>> If you can isolate the case further, I'm happy to fix it, but
>> currently, I don't see a path where this can deadlock.
>
> Even if it's proved to be a false warning, it's still very worthwhile
> to apply Oleg's fix to quiet the warning. Such warnings will mislead
> my bisect script. The sooner it's fixed, the better. And I like Oleg's
> fix because it makes things more simple and a little bit faster.
>
> btw, I see some different warnings when digging through the boot logs:
>
> (x86_64-randconfig-b050)
> [  128.725667]
> [  128.728649] =
> [  128.733989] [ INFO: possible recursive locking detected ]
> [  128.733989] 3.6.0-rc1 #1 Not tainted
> [  128.733989] -
> [  128.733989] trinity-child0/523 is trying to acquire lock:
> [  128.733989]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] get_task_comm+0x20/0x47
> [  128.733989]
> [  128.733989] but task is already holding lock:
> [  128.733989]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] sys_ptrace+0x158/0x313
> [  128.733989]
> [  128.733989] other info that might help us debug this:
> [  128.733989]  Possible unsafe locking scenario:
> [  128.733989]
> [  128.733989]CPU0
> [  128.733989]
> [  128.733989]   lock(&(&p->alloc_lock)->rlock);
> [  128.733989]   lock(&(&p->alloc_lock)->rlock);
> [  128.733989]
> [  128.733989]  *** DEADLOCK ***
> [  128.733989]
> [  128.733989]  May be due to missing lock nesting notation
> [  128.733989]
> [  128.733989] 2 locks held by trinity-child0/523:
> [  128.733989]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [] sys_ptrace+0x13d/0x313
> [  128.733989]  #1:  (&(&p->alloc_lock)->rlock){+.+...}, at: [] sys_ptrace+0x158/0x313
> [  128.733989]
> [  128.733989] stack backtrace:
> [  128.733989] Pid: 523, comm: trinity-child0 Not tainted 3.6.0-rc1 #1
> [  128.733989] Call Trace:
> [  128.733989]  [] __lock_acquire+0xbe0/0xcfb
> [  128.733989]  [] ? mark_lock+0x2d/0x212
> [  128.733989]  [] ? mark_lock+0x2d/0x212
> [  128.733989]  [] lock_acquire+0x82/0x9d
> [  128.733989]  [] ? get_task_comm+0x20/0x47
> [  128.733989]  [] _raw_spin_lock+0x3b/0x4a
> [  128.733989]  [] ? get_task_comm+0x20/0x47
> [  128.733989]  [] get_task_comm+0x20/0x47
> [  128.733989]  [] yama_ptrace_access_check+0x16a/0x1c7
> [  128.733989]  [] ? lock_release+0x12b/0x157
> [  128.733989]  [] security_ptrace_access_check+0xe/0x10
> [  128.733989]  [] __ptrace_may_access+0x109/0x11b
> [  128.733989]  [] sys_ptrace+0x165/0x313
> [  128.733989]  [] system_call_fastpath+0x16/0x1b
> [  128.823670] ptrace of pid 522 was attempted by: trinity-child0 (pid 523)

Okay, I've now managed to reproduce this locally. I added a bunch of
debugging, and I think I understand what's going on. This warning is,
actually, a false positive. It is correct in that the _class_ of locks
get used recursively (the task_struct->alloc_lock), but they are
separate instantiations ("task" is never "current").

So Oleg's suggestion of removing the locking around the reading of
->comm is wrong since it really does need the lock. I've read the bit
on declaring nested locking, but it doesn't seem to apply here. I have
no idea what the correct solution for this is since the code already
verifies that the same task_struct instance will never be locked
twice. How can I teach the lockdep checker about this?
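
The distinction drawn above between a lock class and a lock instance can be sketched in userspace C. This is only a toy model under stated assumptions: `toy_lock`, `toy_held` and the helper functions are hypothetical names and are not lockdep code. It shows why a checker that compares classes reports recursion even when the two locks belong to different tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_lock {
    int class_id;   /* shared by every lock initialised at the same site */
};

#define TOY_MAX_HELD 8

struct toy_held {
    const struct toy_lock *locks[TOY_MAX_HELD];
    size_t count;
};

/* Record that the current context now holds this lock. */
static void toy_acquire(struct toy_held *held, const struct toy_lock *lock)
{
    if (held->count < TOY_MAX_HELD)
        held->locks[held->count++] = lock;
}

/* Class-based check: true if any held lock shares the class. */
static bool toy_class_already_held(const struct toy_held *held,
                                   const struct toy_lock *lock)
{
    for (size_t i = 0; i < held->count; i++)
        if (held->locks[i]->class_id == lock->class_id)
            return true;
    return false;
}

/* Instance-based check: true only if this exact lock is held. */
static bool toy_instance_already_held(const struct toy_held *held,
                                      const struct toy_lock *lock)
{
    for (size_t i = 0; i < held->count; i++)
        if (held->locks[i] == lock)
            return true;
    return false;
}
```

With one lock standing for the tracee's alloc_lock and another for the tracer's, both in the same class, the class check fires after the first acquire while the instance check does not, which mirrors the false positive described in this message.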

-Kees

-- 
Kees Cook

Re: [PATCH V2 0/5] x86: Create direct mappings for E820_RAM only

2012-08-14 Thread WANG Chao

On 08/15/2012 06:54 AM, Jacob Shin wrote:

On Tue, Aug 14, 2012 at 04:34:39PM +0800, Dave Young wrote:

On 08/14/2012 05:46 AM, Jacob Shin wrote:


Currently kernel direct mappings are created for all pfns between
[ 0 to max_low_pfn ) and [ 4GB to max_pfn ). When we introduce memory
holes, we end up mapping memory ranges that are not backed by physical
DRAM. This is fine for lower memory addresses which can be marked as UC
by fixed/variable range MTRRs, however we run in to trouble with high
addresses.

The following patchset creates direct mappings only for E820_RAM regions
between 0 ~ max_low_pfn and 4GB ~ max_pfn. And leaves non-E820_RAM and
memory holes unmapped.
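
The selection rule described in the cover letter above can be sketched in userspace C. This is a minimal illustration, not the patchset's code: the `toy_` types and functions are hypothetical, ranges are half-open pfn intervals, and 4 KiB pages are assumed so that 4GB corresponds to pfn `1 << 20`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_e820_type { TOY_E820_RAM = 1, TOY_E820_RESERVED = 2 };

struct toy_e820_entry {
    uint64_t start_pfn, end_pfn;    /* half-open: [start_pfn, end_pfn) */
    enum toy_e820_type type;
};

struct toy_range {
    uint64_t start_pfn, end_pfn;
};

#define TOY_PFN_4GB (1ULL << 20)    /* 4 GiB with 4 KiB pages (assumption) */

/* Intersect [s, e) with [lo, hi); returns 1 and fills *out if non-empty. */
static size_t toy_clip(uint64_t s, uint64_t e, uint64_t lo, uint64_t hi,
                       struct toy_range *out)
{
    if (s < lo)
        s = lo;
    if (e > hi)
        e = hi;
    if (s >= e)
        return 0;
    out->start_pfn = s;
    out->end_pfn = e;
    return 1;
}

/* Collect only RAM ranges inside [0, max_low_pfn) and [4GB, max_pfn). */
static size_t toy_ram_ranges_to_map(const struct toy_e820_entry *map, size_t n,
                                    uint64_t max_low_pfn, uint64_t max_pfn,
                                    struct toy_range *out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_range r;

        if (map[i].type != TOY_E820_RAM)
            continue;   /* holes and non-RAM stay unmapped */
        if (toy_clip(map[i].start_pfn, map[i].end_pfn, 0, max_low_pfn, &r) &&
            count < out_cap)
            out[count++] = r;
        if (toy_clip(map[i].start_pfn, map[i].end_pfn, TOY_PFN_4GB, max_pfn, &r) &&
            count < out_cap)
            out[count++] = r;
    }
    return count;
}
```

Reserved entries and the gap below 4GB never reach the output array, so nothing outside RAM would be handed to the mapping code in this model.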



Hi,

Chaowang did some kdump tests in a kvm guest with this patchset; the 2nd
kernel just reboots after some ACPI printk, see below dmesg of 2nd kernel:


Hello, thanks for testing, since I have not tested under KVM .. I also have
not tested passing in user supplied memory maps as your kernel log suggests.

Looking into this, it seems like we get a page fault while trying to set up
fixmap for the APIC. I think the fixmap is set up even before we get to
setup_arch(), and it is sitting in memory that is not marked as usable by
your user supplied e820.

Could you give V3 a try? I just sent it out a minute ago, this version
won't try to remap what has already been mapped as part of the boot process
before we get to setup_arch, it'll just take what its given.



Hi, Jacob

I just tried v3 patchset in my x86_64 kvm guest, it was booting 
successfully and the issue mentioned is gone.


-WANG Chao

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] select GENERIC_ATOMIC64 for c6x/score/unicore32 archs

2012-08-14 Thread guanxuetao
> Sorry I have no compilers for build testing these changes, however the
> risk looks low and it's much better than to leave the arch broken,
> considering that Eric will do atomic64_t in the core fs/namespace.c code.
>
> CC: "Eric W. Biederman" 
> Signed-off-by: Fengguang Wu 
It looks ok for unicore32.

Signed-off-by: Guan Xuetao 



Re: [PATCH 1/2] debugfs: Allow debugfs_create_dir() to take data

2012-08-14 Thread Hiroshi Doyu
On Thu, 9 Aug 2012 14:56:24 +0200
Hiroshi Doyu  wrote:

> Hi Greg, Felipe,
> 
> On Wed, 8 Aug 2012 15:34:27 +0200
> Greg Kroah-Hartman  wrote:
> 
> > On Wed, Aug 08, 2012 at 09:24:32AM +0300, Hiroshi Doyu wrote:
> > > Add __debugfs_create_dir(), which takes data passed from caller.
> > 
> > Why?
> > 
> > > Signed-off-by: Hiroshi Doyu 
> > > ---
> > >  fs/debugfs/inode.c  |7 ---
> > >  include/linux/debugfs.h |9 -
> > >  2 files changed, 12 insertions(+), 4 deletions(-)
> .
> > What are you trying to do here?  This patch doesn't look right at all.
> 
> I forgot to send the cover letter of this patch series to LKML, which
> explained the background. I attached that cover letter below. Please
> read the following explanation too.

Any chance to get some feedback on this?

I'm also sending another version of the patch, which just uses
debugfs_create_dir() as Felipe suggested, in-reply-to this mail.


RE: [PATCH] net: add new QCA alx ethernet driver

2012-08-14 Thread Ren, Cloud
From: David Miller [mailto:da...@davemloft.net]
Sent: Wednesday, August 15, 2012 1:33 PM

>From: "Ren, Cloud" 
>Date: Wed, 15 Aug 2012 03:29:26 +
>
 +  strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
 ...
 +  strcpy(netdev->name, "eth%d");
 +  retval = register_netdev(netdev);
>>>
>>>The strcpy is unnecessary, alloc_etherdev already sets that.
>>
>> The strcpy is useful. netdev->name is set as pci_name in front. So the strcpy
>restores it.
>
>Are you doing this just to influence the initial driver log messages?

 Yes.

>
>Don't do that, it's gross.

Ok, I will remove it.


Re: [PATCH] net: add new QCA alx ethernet driver

2012-08-14 Thread David Miller
From: "Ren, Cloud" 
Date: Wed, 15 Aug 2012 03:29:26 +

>>> +   strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
>>> ...
>>> +   strcpy(netdev->name, "eth%d");
>>> +   retval = register_netdev(netdev);
>>
>>The strcpy is unnecessary, alloc_etherdev already sets that.
> 
> The strcpy is useful. netdev->name is set as pci_name in front. So the strcpy 
> restores it.

Are you doing this just to influence the initial driver log messages?

Don't do that, it's gross.
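
The point under discussion can be illustrated with a small userspace model. It is a sketch under assumptions: `toy_alloc_etherdev()` and `toy_register_netdev()` are hypothetical stand-ins, not the kernel functions, and only mimic the behaviour stated in this thread, namely that allocation already leaves the "eth%d" template in the name field. If the driver never overwrites the name, nothing has to be restored before registration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ 16

struct toy_netdev {
    char name[TOY_IFNAMSIZ];
};

/* Stand-in for allocation: the name template is preset here. */
static void toy_alloc_etherdev(struct toy_netdev *dev)
{
    memset(dev, 0, sizeof(*dev));
    strcpy(dev->name, "eth%d");
}

/* Stand-in for registration: expands a "%d" template to a unit number. */
static int toy_register_netdev(struct toy_netdev *dev, int unit)
{
    char prefix[TOY_IFNAMSIZ];
    char *p = strstr(dev->name, "%d");
    size_t len;
    int n;

    if (!p)
        return 0;   /* fixed name, nothing to expand */

    len = (size_t)(p - dev->name);
    memcpy(prefix, dev->name, len);
    prefix[len] = '\0';

    n = snprintf(dev->name, sizeof(dev->name), "%s%d", prefix, unit);
    return (n < 0 || (size_t)n >= sizeof(dev->name)) ? -1 : 0;
}
```

In this model the extra strcpy of "eth%d" is only needed because of the earlier overwrite with the PCI name; dropping both leaves the same final interface name.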


Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v2

2012-08-14 Thread Frederic Weisbecker
On Tue, Aug 14, 2012 at 04:16:46PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> No fundamental change in this release but a rebase to solve conflicts
> against latest tip:/sched/core commits.
> 
> Thanks.

This can be pulled from:

git://github.com/fweisbec/linux-dynticks.git
virt-cputime-v2

This patchset, besides being a desired consolidation and
cleanup IMO, is necessary for the adaptive nohz feature
(see: http://comments.gmane.org/gmane.linux.kernel/1337690)

Thanks.

> 
> Frederic Weisbecker (4):
>   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
>   sched: Move cputime code to its own file
>   cputime: Consolidate vtime handling on context switch
>   s390: Remove leftover account_tick_vtime() header
> 
>  arch/Kconfig   |3 +
>  arch/ia64/Kconfig  |   12 +-
>  arch/ia64/include/asm/switch_to.h  |8 -
>  arch/ia64/kernel/time.c|4 +-
>  arch/powerpc/include/asm/time.h|6 -
>  arch/powerpc/kernel/process.c  |3 -
>  arch/powerpc/kernel/time.c |6 +
>  arch/powerpc/platforms/Kconfig.cputype |   16 +-
>  arch/s390/Kconfig  |5 +-
>  arch/s390/include/asm/switch_to.h  |4 -
>  arch/s390/kernel/vtime.c   |4 +-
>  include/linux/kernel_stat.h|6 +
>  init/Kconfig   |   13 +
>  kernel/sched/Makefile  |2 +-
>  kernel/sched/core.c|  558 +---
>  kernel/sched/cputime.c |  503 
>  kernel/sched/sched.h   |   63 
>  17 files changed, 606 insertions(+), 610 deletions(-)
>  create mode 100644 kernel/sched/cputime.c
> 
> -- 
> 1.7.5.4
> 


Re: [PATCH 06/21] userns: Print out socket uids in a user namespace aware fashion.

2012-08-14 Thread Eric W. Biederman
"Rémi Denis-Courmont"  writes:

> Le lundi 13 août 2012 23:18:20, vous avez écrit :
>> From: "Eric W. Biederman" 
>> 
>> Cc: David Miller 
>> Cc: Alexey Kuznetsov 
>> Cc: James Morris 
>> Cc: Hideaki YOSHIFUJI 
>> Cc: Patrick McHardy 
>> Cc: Arnaldo Carvalho de Melo 
>> Cc: Vlad Yasevich 
>> Cc: Sridhar Samudrala 
>> Acked-by: Serge Hallyn 
>> Signed-off-by: Eric W. Biederman 
>
> FWIW, ...

Well I will take every bit of review I can get.  It can be terribly
embarrassing to discover you messed up uid/gid handling and no one
noticed the security hole for 3 kernel releases...

> Acked-By: Rémi Denis-Courmont 


Re: [PATCH] UDF: During mount free lvid_bh before rescanning with different blocksize

2012-08-14 Thread Ashish Sangwan
Hi Jan,

>   Yeah, I don't think this happens in practice but in theory it could. BTW,
> did you check whether we don't need to free other information (like VAT
> inode etc.) when rescanning the filesystem? I think we do but currently I'm
> catching up after a long vacation and this doesn't have high priority.
>
Actually, it did happen in practice. That's how we discovered it.
Currently, it seems a bug.
We formatted a usb stick, sector size 512bytes, using mkudffs with
default block size 2KB.
While writing to this media we unplug the USB which left lvid in
corrupted state.
On remounting, first UDF tries to mount the media with sector size and
somehow it managed to fill lvid bh
but failed to load metadata/mirror fe because of wrong block size.
While rescanning with 2KB block size it failed to load the correct
lvid as it was corrupted earlier
and ended up using the wrong one.
After noticing this problem, we did check other info too. Everything
else seems to be correct.

> Anyway, I've added your patch to my tree. Thanks.
>
 Thanks for looking into the patch.

Ashish


Re: [PATCH] add support bluetooth usb 0489:e046 Foxconn / Hon Hai

2012-08-14 Thread Gustavo Padovan
* aborigines <7aborigin...@gmail.com> [2012-08-12 14:41:44 +0700]:

> $ usb-devices
> T: Bus=01 Lev=02 Prnt=02 Port=02 Cnt=03 Dev#= 5 Spd=12 MxCh= 0
> D: Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
> P: Vendor=0489 ProdID=e046 Rev=01.12
> S: Manufacturer=Broadcom Corp
> S: Product=BCM20702A0
> S: SerialNumber=C01885F67F9E
> C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=0mA
> I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
> I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
> I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
> I: If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)
> 
> $ lsusb
> Bus 001 Device 005: ID 0489:e046 Foxconn / Hon Hai
> ---
>  drivers/bluetooth/btusb.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> index fc4bcd6..fb54c70 100644
> --- a/drivers/bluetooth/btusb.c
> +++ b/drivers/bluetooth/btusb.c
> @@ -110,6 +110,7 @@ static struct usb_device_id btusb_table[] = {
>  
>   /* Foxconn - Hon Hai */
>   { USB_DEVICE(0x0489, 0xe033) },
> + { USB_DEVICE(0x0489, 0xe046) },
>  
>   { } /* Terminating entry */
>  };

Can you try the following patch and see if it works for you.

Gustavo

---
commit 8d2952f43af434f0aa5d109a9ae50c5886a9322c (HEAD, master)
Author: Gustavo Padovan 
Date:   Wed Aug 15 01:38:11 2012 -0300

Bluetooth: Add USB_VENDOR_AND_INTERFACE_INFO() for Broadcom/Foxconn

Foxconn devices have a vendor specific class of device, we will match them
differently now.

Signed-off-by: Gustavo Padovan 

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index f637c25..f077f4d 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -101,7 +101,7 @@ static struct usb_device_id btusb_table[] = {
{ USB_DEVICE(0x413c, 0x8197) },
 
/* Foxconn - Hon Hai */
-   { USB_DEVICE(0x0489, 0xe033) },
+   { USB_VENDOR_AND_INTERFACE_INFO(0x0489, 0xff, 0x01, 0x01) },
 
{ } /* Terminating entry */
 };
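
The difference between the two table entries in the patches above can be sketched as follows. This is a userspace illustration with hypothetical `toy_` names, not the USB core's matching code: a device-style entry compares vendor and product, while a vendor-and-interface entry compares the vendor plus the interface class, subclass and protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the toy matcher knows about one interface of a plugged-in device. */
struct toy_usb_intf {
    uint16_t vendor, product;
    uint8_t cls, sub, proto;
};

enum toy_match_kind {
    TOY_MATCH_DEVICE,           /* vendor + product */
    TOY_MATCH_VENDOR_AND_INTF   /* vendor + interface class triple */
};

struct toy_usb_id {
    enum toy_match_kind kind;
    uint16_t vendor, product;
    uint8_t cls, sub, proto;
};

static bool toy_usb_match(const struct toy_usb_id *id,
                          const struct toy_usb_intf *intf)
{
    if (id->vendor != intf->vendor)
        return false;

    switch (id->kind) {
    case TOY_MATCH_DEVICE:
        return id->product == intf->product;
    case TOY_MATCH_VENDOR_AND_INTF:
        return id->cls == intf->cls &&
               id->sub == intf->sub &&
               id->proto == intf->proto;
    }
    return false;
}
```

Using the descriptor values quoted earlier in this message, interface 0 of the 0489:e046 device (ff/01/01) is missed by a 0489:e033 device entry but caught by the vendor-and-interface entry, whereas interface 2 (ff/ff/ff) is not.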



Re: [ 10/65] ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+

2012-08-14 Thread Ben Hutchings
On Mon, 2012-08-13 at 15:13 -0700, Greg Kroah-Hartman wrote:
> From: Greg KH 
> 
> 3.4-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Will Deacon 
> 
> commit a76d7bd96d65fa5119adba97e1b58d95f2e78829 upstream.
> 
> The open-coded mutex implementation for ARMv6+ cores suffers from a
> severe lack of barriers, so in the uncontended case we don't actually
> protect any accesses performed during the critical section.
> 
> Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
> code but optimised to remove a branch instruction, as the mutex fastpath
> was previously inlined. Now that this is executed out-of-line, we can
> reuse the atomic access code for the locking (in fact, we use the xchg
> code as this produces shorter critical sections).
> 
> This patch uses the generic xchg based implementation for mutexes on
> ARMv6+, which introduces barriers to the lock/unlock operations and also
> has the benefit of removing a fair amount of inline assembly code.
[...]
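
For readers unfamiliar with the idea, an exchange-based fast path can be sketched with C11 atomics. This is a toy under assumptions, not the kernel's generic implementation: the `toy_` names are hypothetical, the count convention (1 unlocked, 0 locked) is chosen for illustration, and there is no slow path or waiter handling. The relevant property is that `atomic_exchange` uses sequentially consistent ordering by default, so the barriers come with the operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy convention: 1 = unlocked, 0 = locked. No waiters, no slow path. */
struct toy_mutex {
    atomic_int count;
};

/*
 * Fast path: swap in "locked" and look at what was there before.
 * atomic_exchange() is seq_cst by default, so it also orders the
 * accesses made inside the critical section.
 */
static bool toy_mutex_trylock_fast(struct toy_mutex *m)
{
    return atomic_exchange(&m->count, 0) == 1;
}

/* Release: swap "unlocked" back in, again with seq_cst ordering. */
static void toy_mutex_unlock_fast(struct toy_mutex *m)
{
    atomic_exchange(&m->count, 1);
}
```

A real mutex would fall back to a sleeping slow path when the exchange does not return the unlocked value; the sketch simply reports failure.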

I understand that a further fix is needed on top of this but it's not in
Linus's tree yet.  Is it better to apply this on its own or to wait for
the complete fix?

Ben.

-- 
Ben Hutchings
I say we take off; nuke the site from orbit.  It's the only way to be sure.




Re: [PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-14 Thread Rusty Russell
On Tue, 14 Aug 2012 11:33:20 +0300, "Michael S. Tsirkin"  
wrote:
> On Tue, Aug 14, 2012 at 09:29:49AM +0930, Rusty Russell wrote:
> > On Mon, 13 Aug 2012 11:41:23 +0300, "Michael S. Tsirkin"  
> > wrote:
> > > On Fri, Aug 10, 2012 at 02:55:15PM -0300, Rafael Aquini wrote:
> > > > +/*
> > > > + * Populate balloon_mapping->a_ops->freepage method to help compaction 
> > > > on
> > > > + * re-inserting an isolated page into the balloon page list.
> > > > + */
> > > > +void virtballoon_putbackpage(struct page *page)
> > > > +{
> > > > +   spin_lock(_lock);
> > > > +   list_add(>lru, _ptr->pages);
> > > > +   spin_unlock(_lock);
> > > 
> > > Could the following race trigger:
> > > migration happens while module unloading is in progress,
> > > module goes away between here and when the function
> > > returns, then code for this function gets overwritten?
> > > If yes we need locking external to module to prevent this.
> > > Maybe add a spinlock to struct address_space?
> > 
> > The balloon module cannot be unloaded until it has leaked all its pages,
> > so I think this is safe:
> > 
> > static void remove_common(struct virtio_balloon *vb)
> > {
> > /* There might be pages left in the balloon: free them. */
> > while (vb->num_pages)
> > leak_balloon(vb, vb->num_pages);
> > 
> > Cheers,
> > Rusty.
> 
> I know I meant something else.
> Let me lay this out:
> 
> CPU1 executes:
> void virtballoon_putbackpage(struct page *page)
> {
>   spin_lock(_lock);
>   list_add(>lru, _ptr->pages);
>   spin_unlock(_lock);
> 
> 
>   at this point CPU2 unloads module:
>   leak_balloon
>   ..
> 
>   next CPU2 loads another module so code memory gets overwritten
> 
> now CPU1 executes the next instruction:
> 
> }
> 
> which would normally return to function's caller,
> but it has been overwritten by CPU2 so we get corruption.

Actually, I have no idea.

Where does virtballoon_putbackpage get called from?  It's some weird mm
thing, and I stay out of that mess.

The vb thread is stopped before we spin checking vb->num_pages, so it's
not touching pages; who would be calling this?

Confused,
Rusty.



Re: [PATCH] Init: main: fixed a brace coding style issue

2012-08-14 Thread Rusty Russell
On Tue, 14 Aug 2012 12:24:33 +0200, Valerio Baudo wrote:
> From: Valerio Baudo 
> 
> Fixed a coding style issue.
> 
> Signed-off-by: Valerio Baudo 

Please forward to triv...@kernel.org.

Thanks,
Rusty.


Re: [PATCH] MIPS: fix module.c build for 32 bit

2012-08-14 Thread Rusty Russell
On Tue, 14 Aug 2012 15:08:10 +0100, David Howells  wrote:
> Rusty Russell  wrote:
> 
> > Yep, thanks.  And might as well send them straight to Linus; since
> > linux-next didn't catch this, there's little point baking them there if
> > we have some acks.
> > 
> > If he misses it, I'll grab them.
> 
> It might have to wait for the next merge window.

For a build fix

Confused,
Rusty.



Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Mathieu Desnoyers
* Eric W. Biederman (ebied...@xmission.com) wrote:
> Sasha Levin  writes:
> 
> > On 08/15/2012 03:08 AM, Eric W. Biederman wrote:
> >>> I can offer the following: I'll write a small module that will hash 
> >>> 1...1
> >>> > into a hashtable which uses 7 bits (just like user_ns) and post the 
> >>> > distribution
> >>> > we'll get.
> >> That won't hurt.  I think 1-100 then 1000-1100 may actually be more
> >> representative.  Not that I would mind seeing the larger range.
> >> Especially since I am in the process of encouraging the use of more
> >> uids.
> >> 
> >
> > Alrighty, the results are in (numbers are objects in bucket):
> >
> > For the 0...1 range:
> >
> > Average: 78.125
> > Std dev: 1.4197704151
> > Min: 75
> > Max: 80
> >
> >
> > For the 1...100 range:
> >
> > Average: 0.78125
> > Std dev: 0.5164613088
> > Min: 0
> > Max: 2
> >
> >
> > For the 1000...1100 range:
> >
> > Average: 0.7890625
> > Std dev: 0.4964812206
> > Min: 0
> > Max: 2
> >
> >
> > Looks like hash_32 is pretty good with small numbers.
> 
> Yes hash_32 seems reasonable for the uid hash.   With those long hash
> chains I wouldn't like to be on a machine with 10,000 processes, each
> with a different uid, and a process calling setuid in the fast
> path.
> 
> The uid hash that we are playing with is one that I sort of wish that
> the hash table could grow in size, so that we could scale up better.
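
The bucket-count experiment quoted above can be sketched in userspace C. Assumptions are labeled: `toy_hash_32()` is a hypothetical re-implementation of a multiplicative 32-bit hash, and the constant `0x9e370001` is assumed to be the multiplier in use at the time. The sketch only computes per-bucket counts and their average for a 7-bit table; it does not claim to reproduce the min, max or standard deviation figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_BITS 7
#define TOY_BUCKETS (1u << TOY_HASH_BITS)           /* 128 buckets */
#define TOY_GOLDEN_RATIO_PRIME_32 0x9e370001u       /* assumed multiplier */

/* Multiplicative hash: keep the top `bits` bits of the 32-bit product. */
static uint32_t toy_hash_32(uint32_t val, unsigned bits)
{
    uint32_t hash = val * TOY_GOLDEN_RATIO_PRIME_32;

    return hash >> (32 - bits);
}

/* Count how many values in [first, last] land in each bucket.
 * Note: `last` must be below UINT32_MAX or the loop never terminates. */
static void toy_fill(uint32_t first, uint32_t last,
                     unsigned counts[TOY_BUCKETS])
{
    for (size_t i = 0; i < TOY_BUCKETS; i++)
        counts[i] = 0;
    for (uint32_t v = first; v <= last; v++)
        counts[toy_hash_32(v, TOY_HASH_BITS)]++;
}

/* Average number of objects per bucket. */
static double toy_average(const unsigned counts[TOY_BUCKETS])
{
    unsigned long total = 0;

    for (size_t i = 0; i < TOY_BUCKETS; i++)
        total += counts[i];
    return (double)total / TOY_BUCKETS;
}
```

The average depends only on how many values are hashed: 100 values over 128 buckets give 0.78125, and the 101 values of 1000...1100 give 0.7890625, matching the averages reported above.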

Hi Eric,

If you want to try out something that has more features than a basic
hash table, already exists and is available for you to play with, you
might want to have a look at the RCU lock-free resizable hash table.
It's initially done in userspace, but shares the same RCU semantic as
the kernel, and has chunk-based kernel-friendly index backends (thanks
to Lai Jiangshan), very useful to integrate with the kernel page
allocator.

It has the following properties that might make this container a good
fit for uid hashing:

- Real-time friendly lookups: Lookups are RCU and wait-free.
- Fast and real-time friendly updates: Use cmpxchg for update, and RCU
  to deal with ABA.
- Resize (expand/shrink) for each power of two size, performed
  concurrently with ongoing updates and lookups.
- Has add_unique (uniquify), add_replace, and also duplicate semantics.
- Provide uniqueness guarantees for RCU traversals of the hash table
  with respect to add_unique and add_replace.

So if you are looking for a fast, RT-friendly, resizable hash table to
play with, you might want to have a look at the userspace RCU
implementation, which now features this hash table:

https://lttng.org/urcu

See urcu/rculfhash.h for the API.

Best regards,

Mathieu

> 
> Aw well.  Most of the time we only have a very small number of uids
> in play, so it doesn't matter at this point.
> 
> Eric
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


RE: [PATCH] net: add new QCA alx ethernet driver

2012-08-14 Thread Ren, Cloud
>> +strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
>> ...
>> +strcpy(netdev->name, "eth%d");
>> +retval = register_netdev(netdev);
>
>The strcpy is unnecessary, alloc_etherdev already sets that.

The strcpy is useful. netdev->name is set as pci_name in front. So the strcpy 
restores it.

Thanks
cloud



Re: [PATCH 00/13] SCTP: Enable netns

2012-08-14 Thread Vlad Yasevich

On 08/04/2012 05:30 PM, Jan Ariyasu wrote:

The following set of patches enable network-namespaces for the SCTP protocol.

The multitude of global parameters are stored in a net_generic
structure, and the bulk of the patches enable the protocol to access
the parameters on a per-namespace basis.  The first five patches
enable netns handling of the protocol, procfs and sysfs.

Signed-off-by: Jan Ariyasu 
---



NACK.  Sorry Jan, but Eric's patches are much cleaner and do everything 
that's needed.


-vlad


Re: [PATCH 06/21] userns: Print out socket uids in a user namespace aware fashion.

2012-08-14 Thread Vlad Yasevich

On 08/13/2012 04:18 PM, Eric W. Biederman wrote:

From: "Eric W. Biederman" 

Cc: David Miller 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Cc: Remi Denis-Courmont 
Cc: Arnaldo Carvalho de Melo 
Cc: Vlad Yasevich 
Cc: Sridhar Samudrala 
Acked-by: Serge Hallyn 
Signed-off-by: Eric W. Biederman 


ACK sctp parts

Acked-by: Vlad Yasevich 


---
  include/net/tcp.h  |3 ++-
  init/Kconfig   |6 --
  net/appletalk/atalk_proc.c |3 ++-
  net/ipv4/ping.c|4 +++-
  net/ipv4/raw.c |4 +++-
  net/ipv4/tcp_ipv4.c|6 +++---
  net/ipv4/udp.c |4 +++-
  net/ipv6/raw.c |3 ++-
  net/ipv6/tcp_ipv6.c|6 +++---
  net/ipv6/udp.c |3 ++-
  net/ipx/ipx_proc.c |3 ++-
  net/key/af_key.c   |2 +-
  net/llc/llc_proc.c |2 +-
  net/packet/af_packet.c |2 +-
  net/phonet/socket.c|6 --
  net/sctp/proc.c|6 --
  16 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e19124b..91e7467 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1509,7 +1509,8 @@ struct tcp_iter_state {
sa_family_t family;
enum tcp_seq_states state;
struct sock *syn_wait_sk;
-   int bucket, offset, sbucket, num, uid;
+   int bucket, offset, sbucket, num;
+   kuid_t  uid;
loff_t  last_pos;
  };

diff --git a/init/Kconfig b/init/Kconfig
index 80fae19..25a6ebb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -942,10 +942,7 @@ config UIDGID_CONVERTED
depends on PROC_EVENTS = n

# Networking
-   depends on PACKET = n
depends on NET_9P = n
-   depends on IPX = n
-   depends on PHONET = n
depends on NET_CLS_FLOW = n
depends on NETFILTER_XT_MATCH_OWNER = n
depends on NETFILTER_XT_MATCH_RECENT = n
@@ -953,14 +950,11 @@ config UIDGID_CONVERTED
depends on NETFILTER_NETLINK_LOG = n
depends on INET = n
depends on IPV6 = n
-   depends on IP_SCTP = n
depends on AF_RXRPC = n
-   depends on LLC2 = n
depends on NET_KEY = n
depends on INET_DIAG = n
depends on DNS_RESOLVER = n
depends on AX25 = n
-   depends on ATALK = n

# Filesystems
depends on USB_GADGETFS = n
diff --git a/net/appletalk/atalk_proc.c b/net/appletalk/atalk_proc.c
index b5b1a22..c30f3a0 100644
--- a/net/appletalk/atalk_proc.c
+++ b/net/appletalk/atalk_proc.c
@@ -183,7 +183,8 @@ static int atalk_seq_socket_show(struct seq_file *seq, void 
*v)
   ntohs(at->dest_net), at->dest_node, at->dest_port,
   sk_wmem_alloc_get(s),
   sk_rmem_alloc_get(s),
-  s->sk_state, SOCK_INODE(s->sk_socket)->i_uid);
+  s->sk_state,
+  from_kuid_munged(seq_user_ns(seq), sock_i_uid(s)));
  out:
return 0;
  }
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 6232d47..bee5eeb 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -845,7 +845,9 @@ static void ping_format_sock(struct sock *sp, struct 
seq_file *f,
bucket, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
sk_rmem_alloc_get(sp),
-   0, 0L, 0, sock_i_uid(sp), 0, sock_i_ino(sp),
+   0, 0L, 0,
+   from_kuid_munged(seq_user_ns(f), sock_i_uid(sp)),
+   0, sock_i_ino(sp),
atomic_read(&sp->sk_refcnt), sp,
atomic_read(&sp->sk_drops), len);
  }
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff0f071..f242578 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -992,7 +992,9 @@ static void raw_sock_seq_show(struct seq_file *seq, struct 
sock *sp, int i)
i, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
sk_rmem_alloc_get(sp),
-   0, 0L, 0, sock_i_uid(sp), 0, sock_i_ino(sp),
+   0, 0L, 0,
+   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)),
+   0, sock_i_ino(sp),
atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
  }

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 42b2a6a..642be8a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2382,7 +2382,7 @@ void tcp_proc_unregister(struct net *net, struct 
tcp_seq_afinfo *afinfo)
  EXPORT_SYMBOL(tcp_proc_unregister);

  static void get_openreq4(const struct sock *sk, const struct request_sock 
*req,
-struct seq_file *f, int i, int uid, int *len)
+struct seq_file *f, int i, kuid_t uid, int *len)
  {
const struct inet_request_sock *ireq = inet_rsk(req);
int ttd = req->expires - jiffies;
@@ -2399,7 +2399,7 @@ 
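
The hunks above replace raw integers with kuid_t values and translate them with from_kuid_munged() before printing. The idea can be sketched in userspace C under loud assumptions: every `toy_` name is hypothetical, the namespace holds a single mapping extent, and 65534 is assumed as the overflow uid shown for ids that have no mapping. Wrapping the id in a struct means the compiler rejects accidental mixing with plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapper type: a kernel-internal uid cannot be passed where an int is expected. */
typedef struct {
    uint32_t val;
} toy_kuid_t;

#define TOY_OVERFLOWUID 65534u  /* assumed value shown for unmapped ids */

/* A namespace with one extent: ns uids [first_ns_uid, first_ns_uid + count)
 * correspond to kernel uids [first_kuid, first_kuid + count). */
struct toy_user_ns {
    uint32_t first_ns_uid;
    uint32_t first_kuid;
    uint32_t count;
};

/* Translate for display; never fails, falls back to the overflow uid. */
static uint32_t toy_from_kuid_munged(const struct toy_user_ns *ns,
                                     toy_kuid_t kuid)
{
    if (kuid.val >= ns->first_kuid && kuid.val - ns->first_kuid < ns->count)
        return ns->first_ns_uid + (kuid.val - ns->first_kuid);
    return TOY_OVERFLOWUID;
}
```

A reader of a proc file inside such a namespace would then see ids relative to its own mapping, and a placeholder for sockets owned by users it cannot name.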

Re: [PATCH net-next 0/7] sctp: network namespace support Part 2: per net tunables

2012-08-14 Thread Vlad Yasevich

On 08/07/2012 01:17 PM, Eric W. Biederman wrote:


Since I am motivated to get things done, and since there has been much
grumbling about my patches not implementing tunables, I have added
tunable support on top of my last patchset.

I have performed basic testing on these patches and nothing
appears amiss.

The sm statemachine is a major tease as it has all of these association
and endpoint pointers in the common set of function parameters that turn
out to be NULL at the most inconvenient times.  So I added to the common
parameter list a struct net pointer, that is never NULL.
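
The design choice described above can be sketched in userspace C. The names are hypothetical (`toy_net`, `toy_endpoint`, `toy_assoc`, and the `rto_initial` field are illustrative only, not the SCTP structures): deriving the namespace from pointers that may be NULL can fail, while an explicit, always-valid parameter cannot.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net {
    int rto_initial;    /* stands in for any per-namespace tunable */
};

struct toy_endpoint {
    const struct toy_net *net;
};

struct toy_assoc {
    const struct toy_net *net;
};

/* Before: the namespace is derived from arguments that may both be NULL. */
static const struct toy_net *toy_net_from_args(const struct toy_endpoint *ep,
                                               const struct toy_assoc *asoc)
{
    if (asoc)
        return asoc->net;
    if (ep)
        return ep->net;
    return NULL;    /* nothing to derive the namespace from */
}

/* After: the namespace is an explicit parameter that callers always supply. */
static int toy_state_fn(const struct toy_net *net,
                        const struct toy_endpoint *ep,
                        const struct toy_assoc *asoc)
{
    (void)ep;
    (void)asoc;
    return net->rto_initial;
}
```

The second form costs one more argument on every state function but removes the need for NULL checks on the lookup path.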

  include/net/netns/sctp.h   |   96 +++-
  include/net/sctp/sctp.h|   16 +-
  include/net/sctp/sm.h  |8 +-
  include/net/sctp/structs.h |  126 +-
  net/sctp/associola.c   |   18 +-
  net/sctp/auth.c|   20 ++-
  net/sctp/bind_addr.c   |6 +-
  net/sctp/endpointola.c |   13 +-
  net/sctp/input.c   |6 +-
  net/sctp/primitive.c   |4 +-
  net/sctp/protocol.c|  137 +-
  net/sctp/sm_make_chunk.c   |   61 +++--
  net/sctp/sm_sideeffect.c   |   26 ++-
  net/sctp/sm_statefuns.c|  631 
  net/sctp/sm_statetable.c   |   17 +-
  net/sctp/socket.c  |   92 ---
  net/sctp/sysctl.c  |  200 --
  net/sctp/transport.c   |   23 +-
  18 files changed, 817 insertions(+), 683 deletions(-)

Eric W. Biederman (7):
   sctp: Add infrastructure for per net sysctls
   sctp: Push struct net down to sctp_chunk_event_lookup
   sctp: Push struct net down into sctp_transport_init
   sctp: Push struct net down into sctp_in_scope
   sctp: Push struct net down into all of the state machine functions
   sctp: Push struct net down into sctp_verify_ext_param
   sctp: Making sysctl tunables per net

Eric




Acked-by: Vlad Yasevich 

To this entire follow-on series.  This is much better.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH net-next 1/7] sctp: Add infrastructure for per net sysctls

2012-08-14 Thread Vlad Yasevich

On 08/07/2012 01:23 PM, Eric W. Biederman wrote:


Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.
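
As a rough userspace illustration of why each namespace gets its own
copy of the table: the sketch below duplicates a shared template and
points the copied entry at per-namespace storage. All names (demo_ctl,
demo_net, demo_memdup, demo_net_register, demo_net_unregister) are
hypothetical, and the way the copy is populated is an assumption for
illustration, not what the later patches in the series do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct demo_ctl {
	const char *name;	/* a NULL name marks the sentinel entry */
	int *data;
};

/* Hypothetical per-namespace state. */
struct demo_net {
	int rto_initial;
	struct demo_ctl *table;	/* this namespace's private copy */
};

/* Template shared by all namespaces; data is filled in per copy. */
const struct demo_ctl demo_template[] = {
	{ "rto_initial", NULL },
	{ NULL, NULL }		/* sentinel */
};

/* Userspace analogue of kmemdup(): allocate and copy in one step. */
void *demo_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Returns 0 on success, -1 if the copy could not be allocated. */
int demo_net_register(struct demo_net *net)
{
	struct demo_ctl *table;

	assert(net != NULL);
	table = demo_memdup(demo_template, sizeof(demo_template));
	if (!table)
		return -1;

	/* Point the copied entry at this namespace's own variable. */
	table[0].data = &net->rto_initial;
	net->table = table;
	return 0;
}

/* Release the private copy made by demo_net_register(). */
void demo_net_unregister(struct demo_net *net)
{
	free(net->table);
	net->table = NULL;
}
```

Because every namespace owns a separate copy, a write through one table
cannot change the value seen in another namespace.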

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/netns/sctp.h |6 +-
  include/net/sctp/sctp.h  |4 
  net/sctp/protocol.c  |7 +++
  net/sctp/sysctl.c|   21 +
  4 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 06ccddf..9576b60 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -4,6 +4,7 @@
  struct sock;
  struct proc_dir_entry;
  struct sctp_mib;
+struct ctl_table_header;

  struct netns_sctp {
DEFINE_SNMP_STAT(struct sctp_mib, sctp_statistics);
@@ -11,7 +12,9 @@ struct netns_sctp {
  #ifdef CONFIG_PROC_FS
struct proc_dir_entry *proc_net_sctp;
  #endif
-
+#ifdef CONFIG_SYSCTL
+   struct ctl_table_header *sysctl_header;
+#endif
/* This is the global socket data structure used for responding to
 * the Out-of-the-blue (OOTB) packets.  A control sock will be created
 * for this socket at the initialization time.
@@ -32,6 +35,7 @@ struct netns_sctp {

/* Lock that protects the local_addr_list writers */
spinlock_t local_addr_lock;
+
  };

  #endif /* __NETNS_SCTP_H__ */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index b0e6fe5..15037e7 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -375,9 +375,13 @@ static inline void sctp_dbg_objcnt_exit(void) { return; }
  #if defined CONFIG_SYSCTL
  void sctp_sysctl_register(void);
  void sctp_sysctl_unregister(void);
+int sctp_sysctl_net_register(struct net *net);
+void sctp_sysctl_net_unregister(struct net *net);
  #else
  static inline void sctp_sysctl_register(void) { return; }
  static inline void sctp_sysctl_unregister(void) { return; }
+static inline int sctp_sysctl_net_register(struct net *net) { return 0; }
+static inline void sctp_sysctl_net_unregister(struct net *net) { return; }
  #endif

  /* Size of Supported Address Parameter for 'x' address types. */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 69bdc72..de25d9c 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1169,6 +1169,10 @@ static int sctp_net_init(struct net *net)
  {
int status;

+   status = sctp_sysctl_net_register(net);
+   if (status)
+   goto err_sysctl_register;
+
/* Allocate and initialise sctp mibs.  */
status = init_sctp_mibs(net);
if (status)
@@ -1205,6 +1209,8 @@ err_ctl_sock_init:
  err_init_proc:
cleanup_sctp_mibs(net);
  err_init_mibs:
+   sctp_sysctl_net_unregister(net);
+err_sysctl_register:
return status;
  }

@@ -1219,6 +1225,7 @@ static void sctp_net_exit(struct net *net)

sctp_proc_exit(net);
cleanup_sctp_mibs(net);
+   sctp_sysctl_net_unregister(net);
  }

  static struct pernet_operations sctp_net_ops = {
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 2b2bfe9..7528d59 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -284,6 +284,27 @@ static ctl_table sctp_table[] = {
{ /* sentinel */ }
  };

+static ctl_table sctp_net_table[] = {
+   { /* sentinel */ }  
+};
+
+int sctp_sysctl_net_register(struct net *net)
+{
+   struct ctl_table *table;
+
+   table = kmemdup(sctp_net_table, sizeof(sctp_net_table), GFP_KERNEL);
+   if (!table)
+   return -ENOMEM;
+
+   net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
+   return 0;
+}
+
+void sctp_sysctl_net_unregister(struct net *net)
+{
+   unregister_net_sysctl_table(net->sctp.sysctl_header);
+}
+
  static struct ctl_table_header * sctp_sysctl_header;

  /* Sysctl registration.  */





Re: [PATCH net-next 9/9] sctp: Make the mib per network namespace

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:47 PM, Eric W. Biederman wrote:


Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 

---
  include/net/netns/sctp.h |3 +
  include/net/sctp/sctp.h  |9 +--
  net/sctp/associola.c |2 +-
  net/sctp/chunk.c |2 +-
  net/sctp/endpointola.c   |2 +-
  net/sctp/input.c |   22 +++---
  net/sctp/ipv6.c  |4 +-
  net/sctp/output.c|2 +-
  net/sctp/outqueue.c  |   18 +++--
  net/sctp/proc.c  |5 +-
  net/sctp/protocol.c  |   27 
  net/sctp/sm_statefuns.c  |  163 +++--
  net/sctp/ulpqueue.c  |   18 --
  13 files changed, 158 insertions(+), 119 deletions(-)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 9c20a82..06ccddf 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -3,8 +3,11 @@

  struct sock;
  struct proc_dir_entry;
+struct sctp_mib;

  struct netns_sctp {
+   DEFINE_SNMP_STAT(struct sctp_mib, sctp_statistics);
+
  #ifdef CONFIG_PROC_FS
struct proc_dir_entry *proc_net_sctp;
  #endif
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index ca716da..b0e6fe5 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -221,11 +221,10 @@ extern struct kmem_cache *sctp_bucket_cachep 
__read_mostly;
  #define sctp_bh_unlock_sock(sk)  bh_unlock_sock(sk)

  /* SCTP SNMP MIB stats handlers */
-DECLARE_SNMP_STAT(struct sctp_mib, sctp_statistics);
-#define SCTP_INC_STATS(field)  SNMP_INC_STATS(sctp_statistics, field)
-#define SCTP_INC_STATS_BH(field)   SNMP_INC_STATS_BH(sctp_statistics, field)
-#define SCTP_INC_STATS_USER(field) SNMP_INC_STATS_USER(sctp_statistics, field)
-#define SCTP_DEC_STATS(field)  SNMP_DEC_STATS(sctp_statistics, field)
+#define SCTP_INC_STATS(net, field)  SNMP_INC_STATS((net)->sctp.sctp_statistics, field)
+#define SCTP_INC_STATS_BH(net, field)   SNMP_INC_STATS_BH((net)->sctp.sctp_statistics, field)
+#define SCTP_INC_STATS_USER(net, field) SNMP_INC_STATS_USER((net)->sctp.sctp_statistics, field)
+#define SCTP_DEC_STATS(net, field)  SNMP_DEC_STATS((net)->sctp.sctp_statistics, field)

  #endif /* !TEST_FRAME */

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index ed4930b..8a1f27a 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1150,7 +1150,7 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
if (sctp_chunk_is_data(chunk))
asoc->peer.last_data_from = chunk->transport;
else
-   SCTP_INC_STATS(SCTP_MIB_INCTRLCHUNKS);
+   SCTP_INC_STATS(sock_net(asoc->base.sk), SCTP_MIB_INCTRLCHUNKS);

if (chunk->transport)
chunk->transport->last_time_heard = jiffies;
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 6c85564..7c2df9c 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -257,7 +257,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
sctp_association *asoc,
offset = 0;

if ((whole > 1) || (whole && over))
-   SCTP_INC_STATS_USER(SCTP_MIB_FRAGUSRMSGS);
+   SCTP_INC_STATS_USER(sock_net(asoc->base.sk), SCTP_MIB_FRAGUSRMSGS);

/* Create chunks for all the full sized DATA chunks. */
for (i=0, len=first_len; i < whole; i++) {
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 6b76393..3edca80 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -478,7 +478,7 @@ normal:
if (asoc && sctp_chunk_is_data(chunk))
asoc->peer.last_data_from = chunk->transport;
else
-   SCTP_INC_STATS(SCTP_MIB_INCTRLCHUNKS);
+   SCTP_INC_STATS(sock_net(ep->base.sk), SCTP_MIB_INCTRLCHUNKS);

if (chunk->transport)
chunk->transport->last_time_heard = jiffies;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index c9a0449..5308301 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -83,7 +83,7 @@ static int sctp_add_backlog(struct sock *sk, struct sk_buff 
*skb);


  /* Calculate the SCTP checksum of an SCTP packet.  */
-static inline int sctp_rcv_checksum(struct sk_buff *skb)
+static inline int sctp_rcv_checksum(struct net *net, struct sk_buff *skb)
  {
struct sctphdr *sh = sctp_hdr(skb);
__le32 cmp = sh->checksum;
@@ -99,7 +99,7 @@ static inline int sctp_rcv_checksum(struct sk_buff *skb)

if (val != cmp) {
/* CRC failure, dump it. */
-   SCTP_INC_STATS_BH(SCTP_MIB_CHECKSUMERRORS);
+   SCTP_INC_STATS_BH(net, SCTP_MIB_CHECKSUMERRORS);
return -1;
}
return 0;
@@ -137,7 +137,7 @@ int sctp_rcv(struct sk_buff *skb)
if (skb->pkt_type!=PACKET_HOST)
goto discard_it;

-   SCTP_INC_STATS_BH(SCTP_MIB_INSCTPPACKS);
+   SCTP_INC_STATS_BH(net, 

Re: [PATCH net-next 8/9] sctp: Enable sctp in all network namespaces

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:46 PM, Eric W. Biederman wrote:


- Fix the sctp_af operations to work in all namespaces
- Enable sctp socket creation in all network namespaces.

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  net/sctp/ipv6.c |   12 ++--
  net/sctp/protocol.c |8 +---
  2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index bbf1534..a18cda6 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -582,7 +582,7 @@ static int sctp_v6_available(union sctp_addr *addr, struct 
sctp_sock *sp)
if (!(type & IPV6_ADDR_UNICAST))
return 0;

-   return ipv6_chk_addr(&init_net, in6, NULL, 0);
+   return ipv6_chk_addr(sock_net(&sp->inet.sk), in6, NULL, 0);
  }

  /* This function checks if the address is a valid address to be used for
@@ -859,14 +859,14 @@ static int sctp_inet6_bind_verify(struct sctp_sock *opt, 
union sctp_addr *addr)
struct net_device *dev;

if (type & IPV6_ADDR_LINKLOCAL) {
+   struct net *net;
if (!addr->v6.sin6_scope_id)
return 0;
+   net = sock_net(&opt->inet.sk);
rcu_read_lock();
-   dev = dev_get_by_index_rcu(&init_net,
-  addr->v6.sin6_scope_id);
+   dev = dev_get_by_index_rcu(net, addr->v6.sin6_scope_id);
if (!dev ||
-   !ipv6_chk_addr(&init_net, &addr->v6.sin6_addr,
-  dev, 0)) {
+   !ipv6_chk_addr(net, &addr->v6.sin6_addr, dev, 0)) {
rcu_read_unlock();
return 0;
}
@@ -899,7 +899,7 @@ static int sctp_inet6_send_verify(struct sctp_sock *opt, 
union sctp_addr *addr)
if (!addr->v6.sin6_scope_id)
return 0;
rcu_read_lock();
-   dev = dev_get_by_index_rcu(&init_net,
+   dev = dev_get_by_index_rcu(sock_net(&opt->inet.sk),
   addr->v6.sin6_scope_id);
rcu_read_unlock();
if (!dev)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 72b3aa7..ab35691 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -367,7 +367,8 @@ static int sctp_v4_addr_valid(union sctp_addr *addr,
  /* Should this be available for binding?   */
  static int sctp_v4_available(union sctp_addr *addr, struct sctp_sock *sp)
  {
-   int ret = inet_addr_type(&init_net, addr->v4.sin_addr.s_addr);
+   struct net *net = sock_net(&sp->inet.sk);
+   int ret = inet_addr_type(net, addr->v4.sin_addr.s_addr);


if (addr->v4.sin_addr.s_addr != htonl(INADDR_ANY) &&
@@ -454,7 +455,7 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union 
sctp_addr *saddr,
SCTP_DEBUG_PRINTK("%s: DST:%pI4, SRC:%pI4 - ",
  __func__, &fl4->daddr, &fl4->saddr);

-   rt = ip_route_output_key(&init_net, fl4);
+   rt = ip_route_output_key(sock_net(sk), fl4);
if (!IS_ERR(rt))
dst = &rt->dst;

@@ -500,7 +501,7 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union 
sctp_addr *saddr,
(AF_INET == laddr->a.sa.sa_family)) {
fl4->saddr = laddr->a.v4.sin_addr.s_addr;
fl4->fl4_sport = laddr->a.v4.sin_port;
-   rt = ip_route_output_key(&init_net, fl4);
+   rt = ip_route_output_key(sock_net(sk), fl4);
if (!IS_ERR(rt)) {
dst = &rt->dst;
goto out_unlock;
@@ -1033,6 +1034,7 @@ static const struct net_protocol sctp_protocol = {
.handler = sctp_rcv,
.err_handler = sctp_v4_err,
.no_policy   = 1,
+   .netns_ok= 1,
  };

  /* IPv4 address related functions.  */





Re: [PATCH net-next 7/9] sctp: Make the proc files per network namespace.

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:45 PM, Eric W. Biederman wrote:


- Convert all of the files under /proc/net/sctp to be per
   network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
   the initial network namespaces as the snmp counters still
   have to be converted to be per network namespace.
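
The per-namespace view of a table that is still shared can be sketched
in userspace C as follows. This is a minimal illustration under
assumptions: demo_net, demo_ep, demo_net_eq and demo_show are
hypothetical names, and a flat array stands in for the hash table that
the real seq_file code walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and an endpoint entry. */
struct demo_net { int id; };
struct demo_ep  { const struct demo_net *net; int port; };

/* Userspace analogue of net_eq(): namespaces are compared by identity. */
int demo_net_eq(const struct demo_net *a, const struct demo_net *b)
{
	return a == b;
}

/*
 * Walk a table shared by all namespaces and copy out only the ports
 * whose endpoint belongs to the namespace of the reader (viewer).
 * Returns the number of ports written to out.
 */
size_t demo_show(const struct demo_ep *eps, size_t n,
		 const struct demo_net *viewer, int *out, size_t out_len)
{
	size_t i, written = 0;

	for (i = 0; i < n && written < out_len; i++) {
		if (!demo_net_eq(eps[i].net, viewer))
			continue;	/* entry belongs to another namespace */
		out[written++] = eps[i].port;
	}
	return written;
}
```

A reader in one namespace sees only its own entries even though all
entries live in the same table.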

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/netns/sctp.h |5 +++
  include/net/sctp/sctp.h  |   16 +-
  net/sctp/proc.c  |   59 +++
  net/sctp/protocol.c  |   77 +++--
  4 files changed, 85 insertions(+), 72 deletions(-)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 29e36b4..9c20a82 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -2,8 +2,13 @@
  #define __NETNS_SCTP_H__

  struct sock;
+struct proc_dir_entry;

  struct netns_sctp {
+#ifdef CONFIG_PROC_FS
+   struct proc_dir_entry *proc_net_sctp;
+#endif
+
/* This is the global socket data structure used for responding to
 * the Out-of-the-blue (OOTB) packets.  A control sock will be created
 * for this socket at the initialization time.
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 550a81b..ca716da 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -172,14 +172,14 @@ void sctp_backlog_migrate(struct sctp_association *assoc,
  /*
   * sctp/proc.c
   */
-int sctp_snmp_proc_init(void);
-void sctp_snmp_proc_exit(void);
-int sctp_eps_proc_init(void);
-void sctp_eps_proc_exit(void);
-int sctp_assocs_proc_init(void);
-void sctp_assocs_proc_exit(void);
-int sctp_remaddr_proc_init(void);
-void sctp_remaddr_proc_exit(void);
+int sctp_snmp_proc_init(struct net *net);
+void sctp_snmp_proc_exit(struct net *net);
+int sctp_eps_proc_init(struct net *net);
+void sctp_eps_proc_exit(struct net *net);
+int sctp_assocs_proc_init(struct net *net);
+void sctp_assocs_proc_exit(struct net *net);
+int sctp_remaddr_proc_init(struct net *net);
+void sctp_remaddr_proc_exit(struct net *net);


  /*
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 1e2eee8..dc79a3a 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -80,8 +80,12 @@ static const struct snmp_mib sctp_snmp_list[] = {
  /* Display sctp snmp mib statistics(/proc/net/sctp/snmp). */
  static int sctp_snmp_seq_show(struct seq_file *seq, void *v)
  {
+   struct net *net = seq->private;
int i;

+   if (!net_eq(net, &init_net))
+   return 0;
+
for (i = 0; sctp_snmp_list[i].name != NULL; i++)
seq_printf(seq, "%-32s\t%ld\n", sctp_snmp_list[i].name,
   snmp_fold_field((void __percpu **)sctp_statistics,
@@ -93,7 +97,7 @@ static int sctp_snmp_seq_show(struct seq_file *seq, void *v)
  /* Initialize the seq file operations for 'snmp' object. */
  static int sctp_snmp_seq_open(struct inode *inode, struct file *file)
  {
-   return single_open(file, sctp_snmp_seq_show, NULL);
+   return single_open_net(inode, file, sctp_snmp_seq_show);
  }

  static const struct file_operations sctp_snmp_seq_fops = {
@@ -105,11 +109,12 @@ static const struct file_operations sctp_snmp_seq_fops = {
  };

  /* Set up the proc fs entry for 'snmp' object. */
-int __init sctp_snmp_proc_init(void)
+int __net_init sctp_snmp_proc_init(struct net *net)
  {
struct proc_dir_entry *p;

-   p = proc_create("snmp", S_IRUGO, proc_net_sctp, &sctp_snmp_seq_fops);
+   p = proc_create("snmp", S_IRUGO, net->sctp.proc_net_sctp,
+   &sctp_snmp_seq_fops);
if (!p)
return -ENOMEM;

@@ -117,9 +122,9 @@ int __init sctp_snmp_proc_init(void)
  }

  /* Cleanup the proc fs entry for 'snmp' object. */
-void sctp_snmp_proc_exit(void)
+void sctp_snmp_proc_exit(struct net *net)
  {
-   remove_proc_entry("snmp", proc_net_sctp);
+   remove_proc_entry("snmp", net->sctp.proc_net_sctp);
  }

  /* Dump local addresses of an association/endpoint. */
@@ -197,6 +202,7 @@ static void * sctp_eps_seq_next(struct seq_file *seq, void 
*v, loff_t *pos)
  /* Display sctp endpoints (/proc/net/sctp/eps). */
  static int sctp_eps_seq_show(struct seq_file *seq, void *v)
  {
+   struct seq_net_private *priv = seq->private;
struct sctp_hashbucket *head;
struct sctp_ep_common *epb;
struct sctp_endpoint *ep;
@@ -213,6 +219,8 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v)
sctp_for_each_hentry(epb, node, &head->chain) {
ep = sctp_ep(epb);
sk = epb->sk;
+   if (!net_eq(sock_net(sk), priv->net))
+   continue;
seq_printf(seq, "%8pK %8pK %-3d %-3d %-4d %-5d %5d %5lu ", ep, 
sk,
   sctp_sk(sk)->type, sk->sk_state, hash,
   epb->bind_addr.port,
@@ -238,7 +246,8 @@ static const struct seq_operations sctp_eps_ops = {
  /* Initialize the seq file operations for 

Re: [PATCH net-next 6/9] sctp: Move the percpu sockets counter out of sctp_proc_init

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:44 PM, Eric W. Biederman wrote:


The percpu sctp socket counter has nothing at all to do with the sctp
proc files, and having it in the wrong initialization is confusing,
and makes network namespace support a pain.
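
The reordering relies on the usual goto-based unwind, where each error
label undoes the steps that completed before the failure. A minimal
userspace sketch, with hypothetical names (demo_state, demo_init) and
flags in place of real resources:

```c
#include <assert.h>

/* Hypothetical flags recording which resources are currently set up. */
struct demo_state {
	int mibs;
	int counter;
	int proc;
};

/*
 * fail_at selects which step fails:
 * 0 = none, 1 = mibs, 2 = counter, 3 = proc.
 * Returns 0 on success, -1 on failure with everything undone.
 */
int demo_init(struct demo_state *s, int fail_at)
{
	int status;

	status = (fail_at == 1) ? -1 : 0;
	if (status)
		goto err_init_mibs;
	s->mibs = 1;

	/* The counter is its own step, independent of the proc setup. */
	status = (fail_at == 2) ? -1 : 0;
	if (status)
		goto err_counter_init;
	s->counter = 1;

	status = (fail_at == 3) ? -1 : 0;
	if (status)
		goto err_init_proc;
	s->proc = 1;

	return 0;

	/* Labels run in reverse order of setup and fall through. */
err_init_proc:
	s->counter = 0;
err_counter_init:
	s->mibs = 0;
err_init_mibs:
	return status;
}
```

Keeping the counter as a separate step means its label can be placed
exactly between the mibs and proc cleanup, without touching the proc
code.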

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  net/sctp/protocol.c |   13 +++--
  1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 6193d20..976d765 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -93,8 +93,6 @@ int sysctl_sctp_wmem[3];
  /* Set up the proc fs entry for the SCTP protocol. */
  static __init int sctp_proc_init(void)
  {
-   if (percpu_counter_init(&sctp_sockets_allocated, 0))
-   goto out_nomem;
  #ifdef CONFIG_PROC_FS
if (!proc_net_sctp) {
proc_net_sctp = proc_mkdir("sctp", init_net.proc_net);
@@ -125,12 +123,9 @@ out_snmp_proc_init:
remove_proc_entry("sctp", init_net.proc_net);
}
  out_free_percpu:
-   percpu_counter_destroy(&sctp_sockets_allocated);
  #else
return 0;
  #endif /* CONFIG_PROC_FS */
-
-out_nomem:
return -ENOMEM;
  }

@@ -151,7 +146,6 @@ static void sctp_proc_exit(void)
remove_proc_entry("sctp", init_net.proc_net);
}
  #endif
-   percpu_counter_destroy(&sctp_sockets_allocated);
  }

  /* Private helper to extract ipv4 address and stash them in
@@ -1261,6 +1255,10 @@ SCTP_STATIC __init int sctp_init(void)
if (status)
goto err_init_mibs;

+   status = percpu_counter_init(&sctp_sockets_allocated, 0);
+   if (status)
+   goto err_percpu_counter_init;
+
/* Initialize proc fs directory.  */
status = sctp_proc_init();
if (status)
@@ -1481,6 +1479,8 @@ err_ahash_alloc:
sctp_dbg_objcnt_exit();
sctp_proc_exit();
  err_init_proc:
+   percpu_counter_destroy(&sctp_sockets_allocated);
+err_percpu_counter_init:
cleanup_sctp_mibs();
  err_init_mibs:
kmem_cache_destroy(sctp_chunk_cachep);
@@ -1521,6 +1521,7 @@ SCTP_STATIC __exit void sctp_exit(void)
 sizeof(struct sctp_bind_hashbucket)));

sctp_dbg_objcnt_exit();
+   percpu_counter_destroy(&sctp_sockets_allocated);
sctp_proc_exit();
cleanup_sctp_mibs();






Re: [PATCH net-next 5/9] sctp: Make the ctl_sock per network namespace

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:43 PM, Eric W. Biederman wrote:


- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/netns/sctp.h |8 +++
  include/net/sctp/sctp.h  |1 -
  net/sctp/input.c |4 +-
  net/sctp/protocol.c  |   47 ++---
  net/sctp/sm_statefuns.c  |   45 ++-
  5 files changed, 60 insertions(+), 45 deletions(-)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index cbd684e..29e36b4 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -1,7 +1,15 @@
  #ifndef __NETNS_SCTP_H__
  #define __NETNS_SCTP_H__

+struct sock;
+
  struct netns_sctp {
+   /* This is the global socket data structure used for responding to
+* the Out-of-the-blue (OOTB) packets.  A control sock will be created
+* for this socket at the initialization time.
+*/
+   struct sock *ctl_sock;
+
/* This is the global local address list.
 * We actively maintain this complete list of addresses on
 * the system by catching address add/delete events.
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 00c9205..550a81b 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -114,7 +114,6 @@
  /*
   * sctp/protocol.c
   */
-extern struct sock *sctp_get_ctl_sock(void);
  extern int sctp_copy_local_addr_list(struct net *, struct sctp_bind_addr *,
 sctp_scope_t, gfp_t gfp,
 int flags);
diff --git a/net/sctp/input.c b/net/sctp/input.c
index a7e9a85..c9a0449 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -204,7 +204,7 @@ int sctp_rcv(struct sk_buff *skb)
sctp_endpoint_put(ep);
ep = NULL;
}
-   sk = sctp_get_ctl_sock();
+   sk = net->sctp.ctl_sock;
ep = sctp_sk(sk)->ep;
sctp_endpoint_hold(ep);
rcvr = &ep->base;
@@ -795,7 +795,7 @@ static struct sctp_endpoint 
*__sctp_rcv_lookup_endpoint(struct net *net,
goto hit;
}

-   ep = sctp_sk((sctp_get_ctl_sock()))->ep;
+   ep = sctp_sk(net->sctp.ctl_sock)->ep;

  hit:
sctp_endpoint_hold(ep);
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 291e682..6193d20 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -78,12 +78,6 @@ struct proc_dir_entry*proc_net_sctp;
  struct idr sctp_assocs_id;
  DEFINE_SPINLOCK(sctp_assocs_id_lock);

-/* This is the global socket data structure used for responding to
- * the Out-of-the-blue (OOTB) packets.  A control sock will be created
- * for this socket at the initialization time.
- */
-static struct sock *sctp_ctl_sock;
-
  static struct sctp_pf *sctp_pf_inet6_specific;
  static struct sctp_pf *sctp_pf_inet_specific;
  static struct sctp_af *sctp_af_v4_specific;
@@ -96,12 +90,6 @@ long sysctl_sctp_mem[3];
  int sysctl_sctp_rmem[3];
  int sysctl_sctp_wmem[3];

-/* Return the address of the control sock. */
-struct sock *sctp_get_ctl_sock(void)
-{
-   return sctp_ctl_sock;
-}
-
  /* Set up the proc fs entry for the SCTP protocol. */
  static __init int sctp_proc_init(void)
  {
@@ -822,7 +810,7 @@ static int sctp_inetaddr_event(struct notifier_block *this, 
unsigned long ev,
   * Initialize the control inode/socket with a control endpoint data
   * structure.  This endpoint is reserved exclusively for the OOTB processing.
   */
-static int sctp_ctl_sock_init(void)
+static int sctp_ctl_sock_init(struct net *net)
  {
int err;
sa_family_t family = PF_INET;
@@ -830,14 +818,14 @@ static int sctp_ctl_sock_init(void)
if (sctp_get_pf_specific(PF_INET6))
family = PF_INET6;

-   err = inet_ctl_sock_create(&sctp_ctl_sock, family,
-  SOCK_SEQPACKET, IPPROTO_SCTP, &init_net);
+   err = inet_ctl_sock_create(&net->sctp.ctl_sock, family,
+  SOCK_SEQPACKET, IPPROTO_SCTP, net);

/* If IPv6 socket could not be created, try the IPv4 socket */
if (err < 0 && family == PF_INET6)
-   err = inet_ctl_sock_create(&sctp_ctl_sock, AF_INET,
+   err = inet_ctl_sock_create(&net->sctp.ctl_sock, AF_INET,
   SOCK_SEQPACKET, IPPROTO_SCTP,
-  &init_net);
+  net);

if (err < 0) {
pr_err("Failed to create the SCTP control socket\n");
@@ -1196,6 +1184,14 @@ static void sctp_v4_del_protocol(void)

  static int sctp_net_init(struct net *net)
  {
+   int status;
+
+   /* Initialize the control inode/socket for handling OOTB packets.  */
+   if ((status = sctp_ctl_sock_init(net))) {
+   

Re: [PATCH net-next 4/9] sctp: Make the address lists per network namespace

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:42 PM, Eric W. Biederman wrote:


- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
   to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/net_namespace.h |4 +
  include/net/netns/sctp.h|   21 +++
  include/net/sctp/sctp.h |4 +-
  include/net/sctp/structs.h  |   22 +---
  net/sctp/associola.c|3 +-
  net/sctp/bind_addr.c|   14 ++--
  net/sctp/ipv6.c |   17 +++---
  net/sctp/protocol.c |  141 +--
  net/sctp/socket.c   |7 +-
  9 files changed, 131 insertions(+), 102 deletions(-)
  create mode 100644 include/net/netns/sctp.h

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ae1cd6c..8ab5250 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -15,6 +15,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
@@ -80,6 +81,9 @@ struct net {
  #if IS_ENABLED(CONFIG_IPV6)
struct netns_ipv6   ipv6;
  #endif
+#if defined(CONFIG_IP_SCTP) || defined(CONFIG_IP_SCTP_MODULE)
+   struct netns_sctp   sctp;
+#endif
  #if defined(CONFIG_IP_DCCP) || defined(CONFIG_IP_DCCP_MODULE)
struct netns_dccp   dccp;
  #endif
diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
new file mode 100644
index 000..cbd684e
--- /dev/null
+++ b/include/net/netns/sctp.h
@@ -0,0 +1,21 @@
+#ifndef __NETNS_SCTP_H__
+#define __NETNS_SCTP_H__
+
+struct netns_sctp {
+   /* This is the global local address list.
+* We actively maintain this complete list of addresses on
+* the system by catching address add/delete events.
+*
+* It is a list of sctp_sockaddr_entry.
+*/
+   struct list_head local_addr_list;
+   struct list_head addr_waitq;
+   struct timer_list addr_wq_timer;
+   struct list_head auto_asconf_splist;
+   spinlock_t addr_wq_lock;
+
+   /* Lock that protects the local_addr_list writers */
+   spinlock_t local_addr_lock;
+};
+
+#endif /* __NETNS_SCTP_H__ */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 640915a..00c9205 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -115,12 +115,12 @@
   * sctp/protocol.c
   */
  extern struct sock *sctp_get_ctl_sock(void);
-extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
+extern int sctp_copy_local_addr_list(struct net *, struct sctp_bind_addr *,
 sctp_scope_t, gfp_t gfp,
 int flags);
  extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
  extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
-extern void sctp_addr_wq_mgmt(struct sctp_sockaddr_entry *, int);
+extern void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int);

  /*
   * sctp/socket.c
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0563d1..6bdfcab 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -205,21 +205,7 @@ extern struct sctp_globals {
int port_hashsize;
struct sctp_bind_hashbucket *port_hashtable;

-   /* This is the global local address list.
-* We actively maintain this complete list of addresses on
-* the system by catching address add/delete events.
-*
-* It is a list of sctp_sockaddr_entry.
-*/
-   struct list_head local_addr_list;
int default_auto_asconf;
-   struct list_head addr_waitq;
-   struct timer_list addr_wq_timer;
-   struct list_head auto_asconf_splist;
-   spinlock_t addr_wq_lock;
-
-   /* Lock that protects the local_addr_list writers */
-   spinlock_t addr_list_lock;

/* Flag to indicate if addip is enabled. */
int addip_enable;
@@ -278,12 +264,6 @@ extern struct sctp_globals {
  #define sctp_assoc_hashtable  (sctp_globals.assoc_hashtable)
  #define sctp_port_hashsize(sctp_globals.port_hashsize)
  #define sctp_port_hashtable   (sctp_globals.port_hashtable)
-#define sctp_local_addr_list   (sctp_globals.local_addr_list)
-#define sctp_local_addr_lock   (sctp_globals.addr_list_lock)
-#define sctp_auto_asconf_splist (sctp_globals.auto_asconf_splist)
-#define sctp_addr_waitq(sctp_globals.addr_waitq)
-#define sctp_addr_wq_timer (sctp_globals.addr_wq_timer)
-#define sctp_addr_wq_lock  (sctp_globals.addr_wq_lock)
  #define sctp_default_auto_asconf  (sctp_globals.default_auto_asconf)
  #define sctp_scope_policy (sctp_globals.ipv4_scope_policy)
  

Re: [PATCH net-next 3/9] sctp: Make the association hashtable handle multiple network namespaces

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:41 PM, Eric W. Biederman wrote:


- Use struct net in the hash calculation
- Use sock_net(association.base.sk) in the association lookups.
- On receive calculate the network namespace from skb->dev.
- Pass struct net from receive down to the functions that actually
   do the association lookup.
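
The first two points can be sketched in userspace C. This is an
illustration under assumptions: demo_net, demo_assoc, DEMO_HASHSIZE and
the demo_* functions are hypothetical, the hash_mix field merely stands
in for whatever net_hash_mix() returns, and unsigned arithmetic is used
here instead of the int used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HASHSIZE 64u	/* must be a power of two */

/* Hypothetical stand-in for struct net. */
struct demo_net { uint32_t hash_mix; };

/* Hypothetical association: owning namespace plus port pair. */
struct demo_assoc {
	const struct demo_net *net;
	uint16_t lport;
	uint16_t rport;
};

/* Stand-in for net_hash_mix(): a per-namespace value folded into the hash. */
uint32_t demo_net_hash_mix(const struct demo_net *net)
{
	return net->hash_mix;
}

/* Same shape as the patched hash function: ports plus namespace mix. */
uint32_t demo_assoc_hashfn(const struct demo_net *net,
			   uint16_t lport, uint16_t rport)
{
	uint32_t h = ((uint32_t)lport << 16) + rport + demo_net_hash_mix(net);

	h ^= h >> 8;
	return h & (DEMO_HASHSIZE - 1);
}

/*
 * The mix only spreads entries across buckets. Two namespaces can still
 * land in the same bucket, so the match must compare the namespace too.
 */
int demo_assoc_is_match(const struct demo_assoc *asoc,
			const struct demo_net *net,
			uint16_t lport, uint16_t rport)
{
	return asoc->lport == lport && asoc->rport == rport &&
	       asoc->net == net;
}
```

The same port pair in two namespaces may hash to different buckets, and
even when it does not, the namespace comparison keeps the lookups apart.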

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/sctp/sctp.h|6 ++--
  include/net/sctp/structs.h |3 +-
  net/sctp/associola.c   |4 ++-
  net/sctp/endpointola.c |6 +++-
  net/sctp/input.c   |   64 +++
  net/sctp/ipv6.c|3 +-
  6 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 87b119f..640915a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -156,7 +156,7 @@ void sctp_hash_established(struct sctp_association *);
  void sctp_unhash_established(struct sctp_association *);
  void sctp_hash_endpoint(struct sctp_endpoint *);
  void sctp_unhash_endpoint(struct sctp_endpoint *);
-struct sock *sctp_err_lookup(int family, struct sk_buff *,
+struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
 struct sctphdr *, struct sctp_association **,
 struct sctp_transport **);
  void sctp_err_finish(struct sock *, struct sctp_association *);
@@ -644,9 +644,9 @@ static inline int sctp_ep_hashfn(struct net *net, __u16 
lport)
  }

  /* This is the hash function for the association hash table. */
-static inline int sctp_assoc_hashfn(__u16 lport, __u16 rport)
+static inline int sctp_assoc_hashfn(struct net *net, __u16 lport, __u16 rport)
  {
-   int h = (lport << 16) + rport;
+   int h = (lport << 16) + rport + net_hash_mix(net);
h ^= h>>8;
return h & (sctp_assoc_hashsize - 1);
  }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 9f9de55..c0563d1 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1427,7 +1427,7 @@ int sctp_endpoint_is_peeled_off(struct sctp_endpoint *,
const union sctp_addr *);
  struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *,
struct net *, const union sctp_addr *);
-int sctp_has_association(const union sctp_addr *laddr,
+int sctp_has_association(struct net *net, const union sctp_addr *laddr,
 const union sctp_addr *paddr);

  int sctp_verify_init(const struct sctp_association *asoc, sctp_cid_t,
@@ -2014,6 +2014,7 @@ void sctp_assoc_control_transport(struct sctp_association 
*,
  sctp_transport_cmd_t, sctp_sn_error_t);
  struct sctp_transport *sctp_assoc_lookup_tsn(struct sctp_association *, 
__u32);
  struct sctp_transport *sctp_assoc_is_match(struct sctp_association *,
+  struct net *,
   const union sctp_addr *,
   const union sctp_addr *);
  void sctp_assoc_migrate(struct sctp_association *, struct sock *);
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index ebaef3e..a3601f3 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1089,13 +1089,15 @@ out:

  /* Is this the association we are looking for? */
  struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
+  struct net *net,
   const union sctp_addr *laddr,
   const union sctp_addr *paddr)
  {
struct sctp_transport *transport;

if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
-   (htons(asoc->peer.port) == paddr->v4.sin_port)) {
+   (htons(asoc->peer.port) == paddr->v4.sin_port) &&
+   net_eq(sock_net(asoc->base.sk), net)) {
transport = sctp_assoc_lookup_paddr(asoc, paddr);
if (!transport)
goto out;
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 50c87b4..6b76393 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -345,7 +345,8 @@ static struct sctp_association 
*__sctp_endpoint_lookup_assoc(

rport = ntohs(paddr->v4.sin_port);

-   hash = sctp_assoc_hashfn(ep->base.bind_addr.port, rport);
+   hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port,
+rport);
head = &sctp_assoc_hashtable[hash];
read_lock(&head->lock);
sctp_for_each_hentry(epb, node, &head->chain) {
@@ -388,13 +389,14 @@ int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
  {
struct sctp_sockaddr_entry *addr;
struct sctp_bind_addr *bp;
+   struct net *net = sock_net(ep->base.sk);

bp = &ep->base.bind_addr;
/* This function is called with the 

Re: [PATCH net-next 2/9] sctp: Make the endpoint hashtable handle multiple network namespaces

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:40 PM, Eric W. Biederman wrote:


- Use struct net in the hash calculation
- Use sock_net(endpoint.base.sk) in the endpoint lookups.
- On receive calculate the network namespace from skb->dev.

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 


---
  include/net/sctp/sctp.h|4 ++--
  include/net/sctp/structs.h |2 +-
  net/sctp/endpointola.c |4 +++-
  net/sctp/input.c   |   19 ---
  4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 7c05040..87b119f 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -638,9 +638,9 @@ static inline int sctp_phashfn(struct net *net, __u16 lport)
  }

  /* This is the hash function for the endpoint hash table. */
-static inline int sctp_ep_hashfn(__u16 lport)
+static inline int sctp_ep_hashfn(struct net *net, __u16 lport)
  {
-   return lport & (sctp_ep_hashsize - 1);
+   return (net_hash_mix(net) + lport) & (sctp_ep_hashsize - 1);
  }

  /* This is the hash function for the association hash table. */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c089bb1..9f9de55 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1426,7 +1426,7 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
  int sctp_endpoint_is_peeled_off(struct sctp_endpoint *,
const union sctp_addr *);
  struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *,
-   const union sctp_addr *);
+   struct net *, const union sctp_addr *);
  int sctp_has_association(const union sctp_addr *laddr,
 const union sctp_addr *paddr);

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 68a385d..50c87b4 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -302,11 +302,13 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)

  /* Is this the endpoint we are looking for?  */
  struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
+  struct net *net,
   const union sctp_addr *laddr)
  {
struct sctp_endpoint *retval = NULL;

-   if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
+   if ((htons(ep->base.bind_addr.port) == laddr->v4.sin_port) &&
+   net_eq(sock_net(ep->base.sk), net)) {
if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
 sctp_sk(ep->base.sk)))
retval = ep;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index e64d521..c0ca893 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -70,7 +70,8 @@ static struct sctp_association *__sctp_rcv_lookup(struct 
sk_buff *skb,
  const union sctp_addr *laddr,
  const union sctp_addr *paddr,
  struct sctp_transport **transportp);
-static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(const union sctp_addr 
*laddr);
+static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
+   const union sctp_addr *laddr);
  static struct sctp_association *__sctp_lookup_association(
const union sctp_addr *local,
const union sctp_addr *peer,
@@ -129,6 +130,7 @@ int sctp_rcv(struct sk_buff *skb)
union sctp_addr dest;
int family;
struct sctp_af *af;
+   struct net *net = dev_net(skb->dev);

if (skb->pkt_type!=PACKET_HOST)
goto discard_it;
@@ -181,7 +183,7 @@ int sctp_rcv(struct sk_buff *skb)
asoc = __sctp_rcv_lookup(skb, &src, &dest, &transport);

if (!asoc)
-   ep = __sctp_rcv_lookup_endpoint(&dest);
+   ep = __sctp_rcv_lookup_endpoint(net, &dest);

/* Retrieve the common input handling substructure. */
rcvr = asoc ? &asoc->base : &ep->base;
@@ -723,12 +725,13 @@ discard:
  /* Insert endpoint into the hash table.  */
  static void __sctp_hash_endpoint(struct sctp_endpoint *ep)
  {
+   struct net *net = sock_net(ep->base.sk);
struct sctp_ep_common *epb;
struct sctp_hashbucket *head;

epb = &ep->base;

-   epb->hashent = sctp_ep_hashfn(epb->bind_addr.port);
+   epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
head = &sctp_ep_hashtable[epb->hashent];

sctp_write_lock(&head->lock);
@@ -747,12 +750,13 @@ void sctp_hash_endpoint(struct sctp_endpoint *ep)
  /* Remove endpoint from the hash table.  */
  static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
  {
+   struct net *net = sock_net(ep->base.sk);
struct sctp_hashbucket *head;
struct sctp_ep_common *epb;

epb = &ep->base;

-   epb->hashent = 

Re: [PATCH net-next 1/9] sctp: Make the port hash table use struct net in it's key.

2012-08-14 Thread Vlad Yasevich

On 08/06/2012 02:39 PM, Eric W. Biederman wrote:


- Add struct net into the port hash table hash calculation
- Add struct net into the struct sctp_bind_bucket so there
   is a memory of which network namespace a port is allocated in.
   No need for a ref count because sctp_bind_bucket only exists
   when there are sockets in the hash table and sockets can not
   change their network namespace, and sockets already ref count
   their network namespace.
- Add struct net into the key comparison when we are testing
   to see if we have found the port hash table entry we are
   looking for.

With these changes, lookups in the port hash table become
safe to use in multiple network namespaces.
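
The three points above can be sketched in user space as follows. This is only a rough illustration, not the kernel code: struct net_ns, its mix field (standing in for net_hash_mix()), PORT_HASHSIZE, and the port_* helpers are all made-up names, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_HASHSIZE 64u	/* hypothetical size; must be a power of two */

static_assert((PORT_HASHSIZE & (PORT_HASHSIZE - 1)) == 0,
	      "hash size must be a power of two");

/* Hypothetical stand-in for struct net; 'mix' plays the role of net_hash_mix(). */
struct net_ns {
	uint32_t mix;
};

/* Simplified bucket: remembers which namespace the port was allocated in. */
struct bind_bucket {
	uint16_t port;
	const struct net_ns *net;
	struct bind_bucket *next;
};

static struct bind_bucket *port_table[PORT_HASHSIZE];

/* The namespace is mixed into the hash, as in the patched sctp_phashfn(). */
static unsigned int port_hashfn(const struct net_ns *net, uint16_t lport)
{
	return (net->mix + lport) & (PORT_HASHSIZE - 1);
}

static void port_insert(struct bind_bucket *pp)
{
	unsigned int idx = port_hashfn(pp->net, pp->port);

	pp->next = port_table[idx];
	port_table[idx] = pp;
}

/* The namespace is also part of the key comparison, not just the hash. */
static struct bind_bucket *port_lookup(const struct net_ns *net, uint16_t port)
{
	struct bind_bucket *pp;

	for (pp = port_table[port_hashfn(net, port)]; pp; pp = pp->next)
		if (pp->port == port && pp->net == net)
			return pp;
	return NULL;
}
```

The key comparison matters because two namespaces can still land in the same chain; mixing the namespace into the hash only spreads the load and does not by itself keep entries apart.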

Signed-off-by: "Eric W. Biederman" 


Acked-by: Vlad Yasevich 



---
  include/net/sctp/sctp.h|4 ++--
  include/net/sctp/structs.h |1 +
  net/sctp/socket.c  |   22 +-
  3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index ff49964..7c05040 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -632,9 +632,9 @@ static inline int sctp_sanity_check(void)

  /* Warning: The following hash functions assume a power of two 'size'. */
  /* This is the hash function for the SCTP port hash table. */
-static inline int sctp_phashfn(__u16 lport)
+static inline int sctp_phashfn(struct net *net, __u16 lport)
  {
-   return lport & (sctp_port_hashsize - 1);
+   return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1);
  }

  /* This is the hash function for the endpoint hash table. */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index fc5e600..c089bb1 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -102,6 +102,7 @@ struct sctp_bind_bucket {
unsigned short  fastreuse;
struct hlist_node   node;
struct hlist_head   owner;
+   struct net  *net;
  };

  struct sctp_bind_hashbucket {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 5e25981..4316b0f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5769,7 +5769,7 @@ static void sctp_unhash(struct sock *sk)
   * a fastreuse flag (FIXME: NPI ipg).
   */
  static struct sctp_bind_bucket *sctp_bucket_create(
-   struct sctp_bind_hashbucket *head, unsigned short snum);
+   struct sctp_bind_hashbucket *head, struct net *, unsigned short snum);

  static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
  {
@@ -5799,11 +5799,12 @@ static long sctp_get_port_local(struct sock *sk, union 
sctp_addr *addr)
rover = low;
if (inet_is_reserved_local_port(rover))
continue;
-   index = sctp_phashfn(rover);
+   index = sctp_phashfn(sock_net(sk), rover);
head = &sctp_port_hashtable[index];
sctp_spin_lock(&head->lock);
sctp_for_each_hentry(pp, node, &head->chain)
-   if (pp->port == rover)
+   if ((pp->port == rover) &&
+   net_eq(sock_net(sk), pp->net))
goto next;
break;
next:
@@ -5827,10 +5828,10 @@ static long sctp_get_port_local(struct sock *sk, union 
sctp_addr *addr)
 * to the port number (snum) - we detect that with the
 * port iterator, pp being NULL.
 */
-   head = &sctp_port_hashtable[sctp_phashfn(snum)];
+   head = &sctp_port_hashtable[sctp_phashfn(sock_net(sk), snum)];
sctp_spin_lock(&head->lock);
sctp_for_each_hentry(pp, node, &head->chain) {
-   if (pp->port == snum)
+   if ((pp->port == snum) && net_eq(pp->net, sock_net(sk)))
goto pp_found;
}
}
@@ -5881,7 +5882,7 @@ pp_found:
  pp_not_found:
/* If there was a hash table miss, create a new port.  */
ret = 1;
-   if (!pp && !(pp = sctp_bucket_create(head, snum)))
+   if (!pp && !(pp = sctp_bucket_create(head, sock_net(sk), snum)))
goto fail_unlock;

/* In either case (hit or miss), make sure fastreuse is 1 only
@@ -6113,7 +6114,7 @@ unsigned int sctp_poll(struct file *file, struct socket 
*sock, poll_table *wait)
   /

  static struct sctp_bind_bucket *sctp_bucket_create(
-   struct sctp_bind_hashbucket *head, unsigned short snum)
+   struct sctp_bind_hashbucket *head, struct net *net, unsigned short snum)
  {
struct sctp_bind_bucket *pp;

@@ -6123,6 +6124,7 @@ static struct sctp_bind_bucket *sctp_bucket_create(
pp->port = snum;
pp->fastreuse = 0;
INIT_HLIST_HEAD(&pp->owner);
+   

Re: [PATCH net-next 0/7] sctp: network namespace support Part 2: per net tunables

2012-08-14 Thread Vlad Yasevich

On 08/14/2012 05:14 PM, David Miller wrote:


Come on Vlad, please review this stuff some time this century.  If you
want inclusion to be dependent upon your review, then the onus is on
you to review it in a timely manner.  And you are not doing so here.

I'm not letting Eric's patches rot in patchwork for more than a week,
this is completely unacceptable.




I swear I sent an ACK 2 days ago, but I now see it sitting in my draft 
folder.  My bad.  I'll go now and dust off the ACK...


-vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Eric W. Biederman
Sasha Levin  writes:

> On 08/15/2012 03:08 AM, Eric W. Biederman wrote:
>>> I can offer the following: I'll write a small module that will hash 
>>> 1...1
>>> > into a hashtable which uses 7 bits (just like user_ns) and post the 
>>> > distribution
>>> > we'll get.
>> That won't hurt.  I think 1-100 then 1000-1100 may actually be more
>> representative.  Not that I would mind seeing the larger range.
>> Especially since I am in the process of encouraging the use of more
>> uids.
>> 
>
> Alrighty, the results are in (numbers are objects in bucket):
>
> For the 0...1 range:
>
> Average: 78.125
> Std dev: 1.4197704151
> Min: 75
> Max: 80
>
>
> For the 1...100 range:
>
> Average: 0.78125
> Std dev: 0.5164613088
> Min: 0
> Max: 2
>
>
> For the 1000...1100 range:
>
> Average: 0.7890625
> Std dev: 0.4964812206
> Min: 0
> Max: 2
>
>
> Looks like hash_32 is pretty good with small numbers.

Yes hash_32 seems reasonable for the uid hash.   With those long hash
chains I wouldn't like to be on a machine with 10,000 processes,
each with a different uid, and a process calling setuid in the fast
path.

The uid hash that we are playing with is one where I sort of wish
the hash table could grow in size, so that we could scale up better.

Aw well.  Most of the time we only have a very small number of uids
in play, so it doesn't matter at this point.

Eric
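
For anyone who wants to repeat the experiment, here is a rough user-space sketch of the measurement. It assumes hash_32() is a multiply by GOLDEN_RATIO_PRIME_32 followed by a shift, and that the constant is 0x9e370001 as in include/linux/hash.h of this era; struct bucket_stats and bucket_distribution() are made-up names. It is not claimed to reproduce the exact min/max figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the kernel's 32-bit golden ratio prime at the time. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001u
#define HASH_BITS 7			/* 7 bits, as for the user_ns uid hash */
#define NBUCKETS (1u << HASH_BITS)	/* 128 buckets */

static_assert(HASH_BITS > 0 && HASH_BITS < 32, "bits must fit in 32 bits");

/* Multiplicative hash: the high bits are the most thoroughly mixed. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * GOLDEN_RATIO_PRIME_32;

	return hash >> (32 - bits);
}

struct bucket_stats {
	unsigned int min, max;	/* fewest and most objects in any bucket */
	double avg;		/* objects per bucket */
};

/* Hash every value in [first, last] and summarise the bucket occupancy. */
static struct bucket_stats bucket_distribution(uint32_t first, uint32_t last)
{
	unsigned int counts[NBUCKETS] = { 0 };
	struct bucket_stats s;
	unsigned int i;
	uint32_t v;

	for (v = first; v <= last; v++)
		counts[hash_32(v, HASH_BITS)]++;

	s.min = s.max = counts[0];
	for (i = 1; i < NBUCKETS; i++) {
		if (counts[i] < s.min)
			s.min = counts[i];
		if (counts[i] > s.max)
			s.max = counts[i];
	}
	s.avg = (double)(last - first + 1) / NBUCKETS;
	return s;
}
```

The average is just the object count divided by 128, which is why 100 uids give 0.78125 and the 101 values in 1000...1100 give 0.7890625; only the min, max, and standard deviation say anything about the quality of the hash.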



Re: yama_ptrace_access_check(): possible recursive locking detected

2012-08-14 Thread Fengguang Wu
On Tue, Aug 14, 2012 at 02:16:52PM -0700, Kees Cook wrote:
> On Thu, Aug 9, 2012 at 6:52 PM, Fengguang Wu  wrote:
> > On Thu, Aug 09, 2012 at 06:39:34PM -0700, Kees Cook wrote:
> >> Hi,
> >>
> >> So, after taking a closer look at this, I cannot understand how it's
> >> possible. Yama's task_lock call is against "current", not "child",
> >> which is what ptrace_may_access() is locking. And the same code makes
> >> sure that current != child. Yama would never get called if current ==
> >> child.
> >>
> >> How did you reproduce this situation?
> >
> > This warning can be triggered with Dave Jones' trinity tool:
> >
> > git://git.codemonkey.org.uk/trinity
> >
> > That's a very dangerous tool, please only run it as normal user in a
> > backed up and chrooted test box. I personally run it inside an initrd.
> > If you are interested in reproducing this, I can send you the ready
> > made initrd in private email.
> 
> Well, even with your initrd, I can't reproduce this. You're running
> this against a stock kernel? I can't see how the path you've shown can

Yes, it happens on 3.6-rc1.

> possibly happen. It could only happen if "task" was "current", but
> there is an explicit test for that in ptrace_may_access(). Based on
> the traceback, this is from reading /proc/$pid/stack (or
> /proc/$pid/task/$tid/stack), rather than a direct ptrace() call, but
> the code path for task != current still stands.
> 
> I've tried both normal and "trinity -c read" and I haven't seen the
> trace you found. :(
> 
> If you can isolate the case further, I'm happy to fix it, but
> currently, I don't see a path where this can deadlock.

Even if it's proved to be a false warning, it's still very worthwhile
to apply Oleg's fix to quiet the warning. Such warnings will mislead
my bisect script. The sooner it's fixed, the better. And I like Oleg's
fix because it makes things more simple and a little bit faster.

btw, I see some different warnings when digging through the boot logs:

(x86_64-randconfig-b050)
[  128.725667]
[  128.728649] =============================================
[  128.733989] [ INFO: possible recursive locking detected ]
[  128.733989] 3.6.0-rc1 #1 Not tainted
[  128.733989] ---------------------------------------------
[  128.733989] trinity-child0/523 is trying to acquire lock:
[  128.733989]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] 
get_task_comm+0x20/0x47
[  128.733989]
[  128.733989] but task is already holding lock:
[  128.733989]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] 
sys_ptrace+0x158/0x313
[  128.733989]
[  128.733989] other info that might help us debug this:
[  128.733989]  Possible unsafe locking scenario:
[  128.733989]
[  128.733989]        CPU0
[  128.733989]        ----
[  128.733989]   lock(&(&p->alloc_lock)->rlock);
[  128.733989]   lock(&(&p->alloc_lock)->rlock);
[  128.733989]
[  128.733989]  *** DEADLOCK ***
[  128.733989]
[  128.733989]  May be due to missing lock nesting notation
[  128.733989]
[  128.733989] 2 locks held by trinity-child0/523:
[  128.733989]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [] 
sys_ptrace+0x13d/0x313
[  128.733989]  #1:  (&(&p->alloc_lock)->rlock){+.+...}, at: 
[] sys_ptrace+0x158/0x313
[  128.733989]
[  128.733989] stack backtrace:
[  128.733989] Pid: 523, comm: trinity-child0 Not tainted 3.6.0-rc1 #1
[  128.733989] Call Trace:
[  128.733989]  [] __lock_acquire+0xbe0/0xcfb
[  128.733989]  [] ? mark_lock+0x2d/0x212
[  128.733989]  [] ? mark_lock+0x2d/0x212
[  128.733989]  [] lock_acquire+0x82/0x9d
[  128.733989]  [] ? get_task_comm+0x20/0x47
[  128.733989]  [] _raw_spin_lock+0x3b/0x4a
[  128.733989]  [] ? get_task_comm+0x20/0x47
[  128.733989]  [] get_task_comm+0x20/0x47
[  128.733989]  [] yama_ptrace_access_check+0x16a/0x1c7
[  128.733989]  [] ? lock_release+0x12b/0x157
[  128.733989]  [] security_ptrace_access_check+0xe/0x10
[  128.733989]  [] __ptrace_may_access+0x109/0x11b
[  128.733989]  [] sys_ptrace+0x165/0x313
[  128.733989]  [] system_call_fastpath+0x16/0x1b
[  128.823670] ptrace of pid 522 was attempted by: trinity-child0 (pid 523)


(x86_64-randconfig-k056)
[   87.057392]
[   87.058009] =============================================
[   87.058009] [ INFO: possible recursive locking detected ]
[   87.058009] 3.6.0-rc1-00011-gf8cdda8 #2 Not tainted
[   87.058009] ---------------------------------------------
[   87.058009] trinity-child0/328 is trying to acquire lock:
[   87.058009]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] 
spin_lock+0x9/0xb
[   87.058009]
[   87.058009] but task is already holding lock:
[   87.058009]  (&(&p->alloc_lock)->rlock){+.+...}, at: [] 
ptrace_attach+0xa4/0x208
[   87.058009]
[   87.058009] other info that might help us debug this:
[   87.058009]  Possible unsafe locking scenario:
[   87.058009]
[   87.058009]        CPU0
[   87.058009]        ----
[   87.058009]   lock(&(&p->alloc_lock)->rlock);
[   87.058009]   lock(&(&p->alloc_lock)->rlock);
[   87.058009]
[   87.058009]  *** DEADLOCK ***
[   87.058009]
[   87.058009]  May be due to missing lock nesting 

Re: [PATCH] select GENERIC_ATOMIC64 for c6x/score/unicore32 archs

2012-08-14 Thread Fengguang Wu
> -#define L1_CACHE_BYTES        L2_CACHE_BYTES
> +#define L1_CACHE_SHIFT        L2_CACHE_SHIFT
> +#define L1_CACHE_BYTES        (1 << L2_CACHE_SHIFT)

Nitpick: the last line could better be:

+#define L1_CACHE_BYTES        (1 << L1_CACHE_SHIFT)

Reviewed-by: Fengguang Wu 

Thanks!


Re: [PATCH] select GENERIC_ATOMIC64 for c6x/score/unicore32 archs

2012-08-14 Thread Fengguang Wu
On Tue, Aug 14, 2012 at 12:22:49PM -0400, Mark Salter wrote:
> On Tue, 2012-08-14 at 23:34 +0800, Fengguang Wu wrote:
> > Sorry I have no compilers for build testing these changes, however the
> > risk looks low and it's much better than to leave the arch broken,
> > considering that Eric will do atomic64_t in the core fs/namespace.c
> > code.
> > 
> > CC: "Eric W. Biederman" 
> > Signed-off-by: Fengguang Wu 
> > ---
> > 
> > Andrew: the arch maintainers have been CCed. Best is the maintainers
> > respond, test and perhaps take the corresponding change. Let's see how
> > this will work out..
> > 
> > 
> >  arch/c6x/Kconfig   |1 +
> 
> The c6x port also needs this:
> 
> C6X: add L*_CACHE_SHIFT defines
> 
> C6X currently lacks L*_CACHE_SHIFT defines which are used in a few
> places in the generic kernel. This patch adds those missing defines.
> 
> Signed-off-by: Mark Salter 

Thanks for the quick fix! git grep shows this:

lib/atomic64.c: addr >>= L1_CACHE_SHIFT;

So this patch is a prerequisite for the GENERIC_ATOMIC64 patch.
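
To make the dependency concrete, here is a small user-space sketch of how an address can be hashed to one of a fixed set of locks, modelled on the lib/atomic64.c line quoted above. NR_LOCKS, lock_index(), the XOR folding step, and the L1_CACHE_SHIFT value of 5 are assumptions for illustration, not a copy of the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define L1_CACHE_SHIFT 5			/* example value only */
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)	/* derived from the shift */
#define NR_LOCKS 16				/* assumed; must be a power of two */

static_assert((NR_LOCKS & (NR_LOCKS - 1)) == 0,
	      "NR_LOCKS must be a power of two");

/*
 * Map the address of a 64-bit counter to a lock slot. Dropping the low
 * L1_CACHE_SHIFT bits first means all counters in one cache line share
 * a lock; the XOR folds higher address bits into the index.
 */
static unsigned int lock_index(const void *p)
{
	uintptr_t addr = (uintptr_t)p;

	addr >>= L1_CACHE_SHIFT;
	addr ^= (addr >> 8) ^ (addr >> 16);
	return (unsigned int)(addr & (NR_LOCKS - 1));
}
```

Without an L1_CACHE_SHIFT define the shift cannot be compiled at all, which is why an arch that only provides L1_CACHE_BYTES fails to build the generic implementation.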

git grep also shows

arch/score/include/asm/cache.h:#define L1_CACHE_SHIFT   4

arch/unicore32/include/asm/cache.h:#define L1_CACHE_SHIFT (5)

So the other two archs are fine.

Thanks,
Fengguang


RE: [PATCH] max17042_battery: add support for battery STATUS and CHARGE_TYPE

2012-08-14 Thread Pallala, Ramakrishna
> On Fri, Jul 27, 2012 at 10:26:21PM +0530, Ramakrishna Pallala wrote:
> > This patch adds the support to report the battery power supply
> > attributes STATUS and CHARGE_TYPE. This patch makes use of
> > power_supply_get_external_attr() API to get these attributes through power
> supply core.
> >
> > Signed-off-by: Ramakrishna Pallala 
> > ---
> [...]
> > switch (psp) {
> > +   case POWER_SUPPLY_PROP_STATUS:
> > +   ret = power_supply_get_external_attr();
> > +   if (ret < 0)
> > +   return ret;
> > +   val->intval = query.res.intval;
> > +   break;
> 
> First off, thanks a lot for your work. And sorry that it took me quite a
> while to review it, but these are fundamental changes, so I had to
> thoughtfully consider all this.
> 
> This exact code clearly shows that it is not the battery's property. The
> battery itself cannot report it, right? OK, here is what
> Documentation/power/power_supply_class.txt says:
> 
> A: Most likely, no. This class is designed to export properties which are
>directly measurable by the specific hardware available.
> 
>Inferring not available properties using some heuristics or mathematical
>model is not subject of work for a battery driver. Such functionality
>should be factored out
> 
> So, you basically try to infer battery's properties from the charger hw.
> This is surely doable, but no, I think we should not do this. Instead, what we
> want is to make use of "supplied_to" mechanism of the power supply class, and
> export it via sysfs. Then userland can see all the power supply hierarchy, and
> thus see which hardware provides which data.
> 
> See my thoughts about exporting "supplied_to" to the sysfs:
> 
>   http://lkml.org/lkml/2011/6/22/258
> 
> So, we'll have:
> 
> /sys/
>class/
>   power_supply/
>  charger/
> supplied_to/battery -> ../../battery
>  battery/
> 
> Sure, we'll have to teach userland to operate on this scheme, i.e.
> it must be aware that batteries might be connected to an external charger, and
> if so, userland must query* charging status from the charger.
> 
> Do you see any problem with this approach?
> 
> Thanks!

Got your point. I will try this approach.
Thanks for your time :-)

Thanks,
Ram


RE: [PATCH] smb347_charger: fix battery status reporting logic for charger faults

2012-08-14 Thread Pallala, Ramakrishna


> On Fri, Jul 20, 2012 at 07:22:48PM +0530, Ramakrishna Pallala wrote:
> > This patch checks for charger status register for determining the
> > battery charging status and reports Discharging/Charging/Not
> > Charging/Full accordingly.
> >
> > This patch also adds the interrupt support for Safety Timer Expiration.
> > This interrupt is helpful in debugging the cause for charger fault.
> >
> > Signed-off-by: Ramakrishna Pallala 
> 
> Few nitpicks below, otherwise looks good to me.

Thanks for the comments. I will fix them.

Thanks,
Ram



Re: [PATCH V5 09/18] MIPS: Loongson: Add swiotlb to support big memory (>4GB).

2012-08-14 Thread Huacai Chen
On Tue, Aug 14, 2012 at 1:54 AM, Konrad Rzeszutek Wilk
 wrote:
>> +static void *loongson_dma_alloc_coherent(struct device *dev, size_t size,
>> + dma_addr_t *dma_handle, gfp_t gfp, struct 
>> dma_attrs *attrs)
>> +{
>> + void *ret;
>> +
>> + if (dma_alloc_from_coherent(dev, size, dma_handle, &ret))
>> + return ret;
>> +
>> + /* ignore region specifiers */
>> + gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
>> +
>> +#ifdef CONFIG_ZONE_DMA
>> + if (dev == NULL)
>> + gfp |= __GFP_DMA;
>
> When would this happen? dev == NULL?
This can really happen, "grep dma_alloc_coherent drivers/ -rwI | grep
NULL" will get lots of information.

>
>> + else if (dev->coherent_dma_mask <= DMA_BIT_MASK(24))
>> + gfp |= __GFP_DMA;
>> + else
>> +#endif
>> +#ifdef CONFIG_ZONE_DMA32
>> + if (dev->coherent_dma_mask <= DMA_BIT_MASK(32))
>> + gfp |= __GFP_DMA32;
>> + else
>
> Why the 'else'
>> +#endif
>> + ;
>
> why?
>> + gfp |= __GFP_NORETRY;
>> +
>> + ret = swiotlb_alloc_coherent(dev, size, dma_handle, gfp);
>> + mb();
>
> Why the 'mb()' ? Can you just do
> return swiotlb_alloc_coherent(...)
>
>> + return ret;
>> +}
>> +
>> +static void loongson_dma_free_coherent(struct device *dev, size_t size,
>> + void *vaddr, dma_addr_t dma_handle, struct 
>> dma_attrs *attrs)
>> +{
>> + int order = get_order(size);
>> +
>> + if (dma_release_from_coherent(dev, order, vaddr))
>> + return;
>> +
>> + swiotlb_free_coherent(dev, size, vaddr, dma_handle);
>> +}
>> +
>> +static dma_addr_t loongson_dma_map_page(struct device *dev, struct page 
>> *page,
>> + unsigned long offset, size_t size,
>> + enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + dma_addr_t daddr = swiotlb_map_page(dev, page, offset, size,
>> + dir, attrs);
>> + mb();
>
> Please do 'return swiotlb_map_page(..)'..
mb() is needed for cache coherency: the CPU writes some data and then
maps the page for a device; without mb(), the device may read stale
data.

>
> But if you are doing that why don't you just set the dma_ops.map_page = 
> swiotlb_map_page
> ?
>
>
>> + return daddr;
>> +}
>> +
>> +static int loongson_dma_map_sg(struct device *dev, struct scatterlist *sg,
>> + int nents, enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, NULL);
>> + mb();
>> +
>> + return r;
>> +}
>> +
>> +static void loongson_dma_sync_single_for_device(struct device *dev,
>> + dma_addr_t dma_handle, size_t size,
>> + enum dma_data_direction dir)
>> +{
>> + swiotlb_sync_single_for_device(dev, dma_handle, size, dir);
>> + mb();
>> +}
>> +
>> +static void loongson_dma_sync_sg_for_device(struct device *dev,
>> + struct scatterlist *sg, int nents,
>> + enum dma_data_direction dir)
>> +{
>> + swiotlb_sync_sg_for_device(dev, sg, nents, dir);
>> + mb();
>> +}
>> +
>
> I am not really sure why you have these extra functions, when you could
> just modify the dma_ops to point to the swiotlb ones
>
>> +static dma_addr_t loongson_unity_phys_to_dma(struct device *dev, 
>> phys_addr_t paddr)
>> +{
>> + return (paddr < 0x1000) ?
>> + (paddr | 0x8000) : paddr;
>> +}
>> +
>> +static phys_addr_t loongson_unity_dma_to_phys(struct device *dev, 
>> dma_addr_t daddr)
>> +{
>> + return (daddr < 0x9000 && daddr >= 0x8000) ?
>> + (daddr & 0x0fff) : daddr;
>> +}
>> +
>> +struct loongson_dma_map_ops {
>> + struct dma_map_ops dma_map_ops;
>> + dma_addr_t (*phys_to_dma)(struct device *dev, phys_addr_t paddr);
>> + phys_addr_t (*dma_to_phys)(struct device *dev, dma_addr_t daddr);
>> +};
>> +
>> +dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
>> +{
>> + struct loongson_dma_map_ops *ops = container_of(get_dma_ops(dev),
>> + struct loongson_dma_map_ops, 
>> dma_map_ops);
>> +
>> + return ops->phys_to_dma(dev, paddr);
>> +}
>> +
>> +phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
>> +{
>> + struct loongson_dma_map_ops *ops = container_of(get_dma_ops(dev),
>> + struct loongson_dma_map_ops, 
>> dma_map_ops);
>> +
>> + return ops->dma_to_phys(dev, daddr);
>> +}
>> +
>> +static int loongson_dma_set_mask(struct device *dev, u64 mask)
>> +{
>> + /* Loongson doesn't support DMA above 32-bit */
>> + if (mask > DMA_BIT_MASK(32))
>> + return -EIO;
>> +
>> + *dev->dma_mask = mask;
>> +
>> + return 0;
>> +}
>> +
>> +static 

[UPDATE][PATCH -v3 3/4] PCI/PM: Fix config reg access for D3cold and bridge suspending

2012-08-14 Thread Huang Ying
This patch fixes the following bug:

http://marc.info/?l=linux-pci&m=134338059022620&w=2

There, lspci does not work properly if a device and the corresponding
parent bridge (such as a PCIe port) are suspended.  This is because the
device configuration space registers are not accessible if the
corresponding parent bridge is suspended or the device is put into
D3cold state.

To solve the issue, the bridge/PCIe port connected to the device is
put into active state before configuration space registers are read or
written.  If the device is in D3cold state, it will be put into active
state too.

To avoid resuming and suspending the PCIe port for each configuration
register read/write, a small delay is added before the PCIe port is
suspended.
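
The get/put bracketing described above can be pictured with a toy user-space model. This is only a sketch under simplifying assumptions: struct toy_dev, enum toy_state, and the toy_* helpers are invented for illustration, the bridge is suspended immediately on the last put (the patch delays this), and none of the real runtime PM API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_D0, TOY_D3HOT, TOY_D3COLD };

/* Hypothetical device: just a parent link, a usage count and a power state. */
struct toy_dev {
	struct toy_dev *parent;	/* upstream bridge, or NULL */
	int usage;		/* outstanding config-access references */
	enum toy_state state;
};

/*
 * Config space is reachable only while the parent bridge is active and
 * the device itself is not in D3cold.
 */
static bool toy_config_accessible(const struct toy_dev *d)
{
	if (d->parent && d->parent->state != TOY_D0)
		return false;
	return d->state != TOY_D3COLD;
}

/* Before an access: resume the bridge, pin the device, leave D3cold. */
static void toy_config_get(struct toy_dev *d)
{
	if (d->parent) {
		d->parent->usage++;
		d->parent->state = TOY_D0;
	}
	d->usage++;		/* pinned, but not resumed unless needed */
	if (d->state == TOY_D3COLD)
		d->state = TOY_D0;
}

/* After an access: drop both references; an idle bridge may suspend. */
static void toy_config_put(struct toy_dev *d)
{
	assert(d->usage > 0);
	d->usage--;
	if (d->parent) {
		assert(d->parent->usage > 0);
		if (--d->parent->usage == 0)
			d->parent->state = TOY_D3HOT;	/* the real patch delays this */
	}
}
```

In this model a device in D3hot behind an active bridge is left alone by toy_config_get(), matching the note that only D3cold devices need to be resumed.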

Reported-by: Bjorn Mork 
Signed-off-by: Huang Ying 
---
 drivers/pci/pci-sysfs.c|   42 +
 drivers/pci/pcie/portdrv_pci.c |9 
 2 files changed, 51 insertions(+)

--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -458,6 +458,40 @@ boot_vga_show(struct device *dev, struct
 }
 struct device_attribute vga_attr = __ATTR_RO(boot_vga);
 
+static void
+pci_config_pm_runtime_get(struct pci_dev *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct device *parent = dev->parent;
+
+   if (parent)
+   pm_runtime_get_sync(parent);
+   pm_runtime_get_noresume(dev);
+   /*
+* pdev->current_state is set to PCI_D3cold during suspending,
+* so wait until suspending completes
+*/
+   pm_runtime_barrier(dev);
+   /*
+* Only need to resume devices in D3cold, because config
+* registers are still accessible for devices suspended but
+* not in D3cold.
+*/
+   if (pdev->current_state == PCI_D3cold)
+   pm_runtime_resume(dev);
+}
+
+static void
+pci_config_pm_runtime_put(struct pci_dev *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct device *parent = dev->parent;
+
+   pm_runtime_put(dev);
+   if (parent)
+   pm_runtime_put_sync(parent);
+}
+
 static ssize_t
 pci_read_config(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr,
@@ -484,6 +518,8 @@ pci_read_config(struct file *filp, struc
size = count;
}
 
+   pci_config_pm_runtime_get(dev);
+
if ((off & 1) && size) {
u8 val;
pci_user_read_config_byte(dev, off, &val);
@@ -529,6 +565,8 @@ pci_read_config(struct file *filp, struc
--size;
}
 
+   pci_config_pm_runtime_put(dev);
+
return count;
 }
 
@@ -549,6 +587,8 @@ pci_write_config(struct file* filp, stru
count = size;
}

+   pci_config_pm_runtime_get(dev);
+
if ((off & 1) && size) {
pci_user_write_config_byte(dev, off, data[off - init_off]);
off++;
@@ -587,6 +627,8 @@ pci_write_config(struct file* filp, stru
--size;
}
 
+   pci_config_pm_runtime_put(dev);
+
return count;
 }
 
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -140,9 +140,17 @@ static int pcie_port_runtime_resume(stru
 {
return 0;
 }
+
+static int pcie_port_runtime_idle(struct device *dev)
+{
+   /* Delay for a short while to prevent too frequent suspend/resume */
+   pm_schedule_suspend(dev, 10);
+   return -EBUSY;
+}
 #else
 #define pcie_port_runtime_suspend  NULL
 #define pcie_port_runtime_resume   NULL
+#define pcie_port_runtime_idle NULL
 #endif
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
@@ -155,6 +163,7 @@ static const struct dev_pm_ops pcie_port
.resume_noirq   = pcie_port_resume_noirq,
.runtime_suspend = pcie_port_runtime_suspend,
.runtime_resume = pcie_port_runtime_resume,
+   .runtime_idle   = pcie_port_runtime_idle,
 };
 
 #define PCIE_PORTDRV_PM_OPS    (&pcie_portdrv_pm_ops)


Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked\

2012-08-14 Thread Marcelo Tosatti
On Tue, Aug 14, 2012 at 05:59:06PM -0500, Anthony Liguori wrote:
> Marcelo Tosatti  writes:
> 
> > On Tue, Aug 14, 2012 at 02:35:34PM -0500, Anthony Liguori wrote:
> >> Marcelo Tosatti  writes:
> >> 
> >> > On Tue, Aug 14, 2012 at 01:53:01PM -0500, Anthony Liguori wrote:
> >> >> Marcelo Tosatti  writes:
> >> >> 
> >> >> > On Tue, Aug 14, 2012 at 05:55:54PM +0300, Yan Vugenfirer wrote:
> >> >> >> 
> >> >> >> On Aug 14, 2012, at 1:42 PM, Jan Kiszka wrote:
> >> >> >> 
> >> >> >> > On 2012-08-14 10:56, Daniel P. Berrange wrote:
> >> >> >> >> On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
> >> >> >> >>> On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
> >> >> >>  We can know the guest is panicked when the guest runs on xen.
> >> >> >>  But we do not have such feature on kvm.
> >> >> >>  
> >> >> >>  Another purpose of this feature is: management app(for example:
> >> >> >>  libvirt) can do auto dump when the guest is panicked. If management
> >> >> >>  app does not do auto dump, the guest's user can do dump by hand if
> >> >> >>  he sees the guest is panicked.
> >> >> >>  
> >> >> >>  We have three solutions to implement this feature:
> >> >> >>  1. use vmcall
> >> >> >>  2. use I/O port
> >> >> >>  3. use virtio-serial.
> >> >> >>  
> >> >> >>  We have decided to avoid touching hypervisor. The reason why I choose
> >> >> >>  the I/O port is:
> >> >> >>  1. it is easier to implement
> >> >> >>  2. it does not depend on any virtual device
> >> >> >>  3. it can work when starting the kernel
> >> >> >> >>> 
> >> >> >> >>> How about searching for the "Kernel panic - not syncing" string
> >> >> >> >>> in the guest's serial output? Say libvirtd could take an action
> >> >> >> >>> upon that?
> >> >> >> >> 
> >> >> >> >> No, this is not satisfactory. It depends on the guest OS being
> >> >> >> >> configured to use the serial port for console output which we
> >> >> >> >> cannot mandate, since it may well be required for other purposes.
> >> >> >> > 
> >> >> >> Please don't forget Windows guests, there is no console and no 
> >> >> >> "Kernel Panic" string ;)
> >> >> >> 
> >> >> >> What I used for debugging purposes on Windows guest is to register a 
> >> >> >> bugcheck callback in virtio-net driver and write 1 to VIRTIO_PCI_ISR 
> >> >> >> register.
> >> >> >> 
> >> >> >> Yan. 
> >> >> >
> >> >> > Considering whether a "panic-device" should cover other OSes is also
> >> >> > something to consider. Even for Linux, is "panic" the only case which
> >> >> > should be reported via the mechanism? What about oopses without 
> >> >> > panic? 
> >> >> >
> >> >> > Is the mechanism general enough for supporting new events, etc.
> >> >> 
> >> >> Hi,
> >> >> 
> >> >> I think this discussion has gone off the deep end.
> >> >> 
> >> >> Forget about !x86 platforms.  They have their own way to do this sort of
> >> >> thing.  
> >> >
> >> > The panic function in kernel/panic.c has the following options, which
> >> > appear to be arch independent, on panic:
> >> >
> >> > - reboot 
> >> > - blink
> >> 
> >> Not sure the semantics of blink but that might be a good place for a
> >> pvops hook.
> >> 
> >> >
> >> > None are paravirtual interfaces however.
> >> >
> >> >> Think of this feature like a status LED on a motherboard.  These
> >> >> are very common and usually controlled by IO ports.
> >> >> 
> >> >> We're simply reserving a "status LED" for the guest to indicate that it
> >> >> has panicked.  Let's not over engineer this.
> >> >
> >> > My concern is that you end up with state that is dependent on x86.
> >> >
> >> > Subject: [PATCH v8 3/6] add a new runstate: RUN_STATE_GUEST_PANICKED
> >> >
> >> > Having the ability to stop/restart the guest (and even introducing a 
> >> > new VM runstate) is more than a status LED analogy.
> >> 
> >> I must admit, I don't know why a new runstate is necessary/useful.  The
> >> kernel shouldn't have to care about the difference between a halted guest
> >> and a panicked guest.  That level of information belongs in userspace IMHO.
> >> 
> >> > Can this new infrastructure be used by other architectures?
> >> 
> >> I guess I don't understand why the kernel side of this isn't anything
> >> more than a paravirt op hook that does a single outb() with the
> >> remaining logic handled 100% in QEMU.
> >
> > From the patch description:
> >
> > "Another purpose of this feature is: management app(for example:
> > libvirt) can do auto dump when the guest is panicked. If management
> > app does not do auto dump, the guest's user can do dump by hand if
> > he sees the guest is panicked."
> 
> Why does this mandate another runstate?

Good question.

> QEMU can simply mark the VCPUs as stopped and raise a QMP event.

Yes. As long as the management app is able to find out the reason the
VM has been stopped (that is, it's not an issue 

Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Sasha Levin
On 08/15/2012 03:08 AM, Eric W. Biederman wrote:
>> I can offer the following: I'll write a small module that will hash 1...1
>> into a hashtable which uses 7 bits (just like user_ns) and post the
>> distribution we'll get.
> That won't hurt.  I think 1-100 then 1000-1100 may actually be more
> representative.  Not that I would mind seeing the larger range.
> Especially since I am in the process of encouraging the use of more
> uids.
> 

Alrighty, the results are in (numbers are objects in bucket):

For the 0...1 range:

Average: 78.125
Std dev: 1.4197704151
Min: 75
Max: 80


For the 1...100 range:

Average: 0.78125
Std dev: 0.5164613088
Min: 0
Max: 2


For the 1000...1100 range:

Average: 0.7890625
Std dev: 0.4964812206
Min: 0
Max: 2


Looks like hash_32 is pretty good with small numbers.
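
For anyone who wants to repeat this kind of measurement without building a
module, here is a minimal userspace sketch. It is not the module that produced
the numbers above. It assumes hash_32() is a multiply by GOLDEN_RATIO_PRIME_32
(taken here to be 0x9e370001, which is an assumption about include/linux/hash.h
at the time) followed by a shift that keeps the top bits. The names
hash_range() and struct bucket_stats are made up for illustration.

```c
#include <stdint.h>

/* Assumed value of the kernel's 32-bit multiplier at the time of this thread. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001U

#define HASH_BITS 7                     /* 7 bits, as in user_ns */
#define NBUCKETS  (1U << HASH_BITS)     /* 128 buckets */

/* Userspace stand-in for hash_32(): multiply, then keep the top `bits` bits. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_PRIME_32) >> (32 - bits);
}

/* Hypothetical summary of how many objects landed in each bucket. */
struct bucket_stats {
	uint64_t total;         /* objects hashed */
	unsigned int min;       /* emptiest bucket */
	unsigned int max;       /* fullest bucket */
	double avg;             /* total / NBUCKETS */
	double variance;        /* population variance of the bucket counts */
};

/* Hash every value in [first, last] and summarise the bucket occupancy. */
static struct bucket_stats hash_range(uint32_t first, uint32_t last)
{
	unsigned int count[NBUCKETS] = { 0 };
	struct bucket_stats st = { 0, 0, 0, 0.0, 0.0 };
	uint64_t v;             /* 64-bit so that last == UINT32_MAX terminates */
	unsigned int i;

	for (v = first; v <= last; v++) {
		count[hash_32((uint32_t)v, HASH_BITS)]++;
		st.total++;
	}

	st.min = st.max = count[0];
	st.avg = (double)st.total / NBUCKETS;
	for (i = 0; i < NBUCKETS; i++) {
		double d = (double)count[i] - st.avg;

		if (count[i] < st.min)
			st.min = count[i];
		if (count[i] > st.max)
			st.max = count[i];
		st.variance += d * d / NBUCKETS;
	}
	return st;
}
```

hash_range(1, 100) puts 100 objects into 128 buckets, so its avg field is
100/128 = 0.78125, the same average as reported above. The standard deviation
is the square root of the variance field. Whether the figures above used the
population or the sample formula is not stated, so the spread may differ
slightly.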


[PATCH 5/7] staging: comedi: dyna_pci10xx: move boardinfo values into subdevice setup

2012-08-14 Thread H Hartley Sweeten
There is only one "boardtype" actually supported by this driver.
The second entry in the boardinfo is a dummy entry that would
result in an unusable device.

Remove the boardinfo fields and just use the open coded values
in the subdevice setup.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 32 ---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index 80bfae5..a23969e 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -57,24 +57,12 @@ static const char range_codes_pci1050_ai[] = { 0x00, 0x10, 
0x30 };
 struct boardtype {
const char *name;
int device_id;
-   int ai_chans;
-   int ao_chans;
-   int di_chans;
-   int do_chans;
-   const struct comedi_lrange *range_ai;
-   const char *range_codes_ai;
 };
 
 static const struct boardtype boardtypes[] = {
{
.name = "dyna_pci1050",
.device_id = 0x1050,
-   .ai_chans = 16,
-   .ao_chans = 16,
-   .di_chans = 16,
-   .do_chans = 16,
-   .range_ai = &range_pci1050_ai,
-   .range_codes_ai = range_codes_pci1050_ai,
},
/*  dummy entry corresponding to driver name */
{.name = DRV_NAME},
@@ -94,7 +82,6 @@ static int dyna_pci10xx_insn_read_ai(struct comedi_device 
*dev,
struct comedi_subdevice *s,
struct comedi_insn *insn, unsigned int *data)
 {
-   const struct boardtype *thisboard = comedi_board(dev);
struct dyna_pci10xx_private *devpriv = dev->private;
int n, counter;
u16 d = 0;
@@ -102,7 +89,7 @@ static int dyna_pci10xx_insn_read_ai(struct comedi_device 
*dev,
 
/* get the channel number and range */
chan = CR_CHAN(insn->chanspec);
-   range = thisboard->range_codes_ai[CR_RANGE((insn->chanspec))];
+   range = range_codes_pci1050_ai[CR_RANGE((insn->chanspec))];
 
mutex_lock(&devpriv->mutex);
/* convert n samples */
@@ -139,13 +126,12 @@ static int dyna_pci10xx_insn_write_ao(struct 
comedi_device *dev,
 struct comedi_subdevice *s,
 struct comedi_insn *insn, unsigned int *data)
 {
-   const struct boardtype *thisboard = comedi_board(dev);
struct dyna_pci10xx_private *devpriv = dev->private;
int n;
unsigned int chan, range;
 
chan = CR_CHAN(insn->chanspec);
-   range = thisboard->range_codes_ai[CR_RANGE((insn->chanspec))];
+   range = range_codes_pci1050_ai[CR_RANGE((insn->chanspec))];
 
mutex_lock(&devpriv->mutex);
for (n = 0; n < insn->n; n++) {
@@ -259,9 +245,9 @@ static int dyna_pci10xx_attach_pci(struct comedi_device 
*dev,
s = dev->subdevices + 0;
s->type = COMEDI_SUBD_AI;
s->subdev_flags = SDF_READABLE | SDF_GROUND | SDF_DIFF;
-   s->n_chan = thisboard->ai_chans;
+   s->n_chan = 16;
s->maxdata = 0x0FFF;
-   s->range_table = thisboard->range_ai;
+   s->range_table = &range_pci1050_ai;
s->len_chanlist = 16;
s->insn_read = dyna_pci10xx_insn_read_ai;
 
@@ -269,7 +255,7 @@ static int dyna_pci10xx_attach_pci(struct comedi_device 
*dev,
s = dev->subdevices + 1;
s->type = COMEDI_SUBD_AO;
s->subdev_flags = SDF_WRITABLE;
-   s->n_chan = thisboard->ao_chans;
+   s->n_chan = 16;
s->maxdata = 0x0FFF;
s->range_table = &range_unipolar10;
s->len_chanlist = 16;
@@ -279,20 +265,20 @@ static int dyna_pci10xx_attach_pci(struct comedi_device 
*dev,
s = dev->subdevices + 2;
s->type = COMEDI_SUBD_DI;
s->subdev_flags = SDF_READABLE | SDF_GROUND;
-   s->n_chan = thisboard->di_chans;
+   s->n_chan = 16;
s->maxdata = 1;
s->range_table = &range_digital;
-   s->len_chanlist = thisboard->di_chans;
+   s->len_chanlist = 16;
s->insn_bits = dyna_pci10xx_di_insn_bits;
 
/* digital output */
s = dev->subdevices + 3;
s->type = COMEDI_SUBD_DO;
s->subdev_flags = SDF_WRITABLE | SDF_GROUND;
-   s->n_chan = thisboard->do_chans;
+   s->n_chan = 16;
s->maxdata = 1;
s->range_table = &range_digital;
-   s->len_chanlist = thisboard->do_chans;
+   s->len_chanlist = 16;
s->state = 0;
s->insn_bits = dyna_pci10xx_do_insn_bits;
 
-- 
1.7.11



[PATCH 7/7] staging: comedi: dyna_pci10xx: remove unused DRV_NAME

2012-08-14 Thread H Hartley Sweeten
This define is not used in the driver. Remove it.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index 56fb35b..e852808 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -41,7 +41,6 @@
 #include 
 
 #define PCI_VENDOR_ID_DYNALOG  0x10b5
-#define DRV_NAME   "dyna_pci10xx"
 
 #define READ_TIMEOUT 50
 
-- 
1.7.11



[PATCH 6/7] staging: comedi: dyna_pci10xx: remove unneeded boardinfo code

2012-08-14 Thread H Hartley Sweeten
The boardinfo code is not needed by this driver. Only one board
type is supported.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 38 ++-
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index a23969e..56fb35b 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -54,20 +54,6 @@ static const struct comedi_lrange range_pci1050_ai = { 3, {
 
 static const char range_codes_pci1050_ai[] = { 0x00, 0x10, 0x30 };
 
-struct boardtype {
-   const char *name;
-   int device_id;
-};
-
-static const struct boardtype boardtypes[] = {
-   {
-   .name = "dyna_pci1050",
-   .device_id = 0x1050,
-   },
-   /*  dummy entry corresponding to driver name */
-   {.name = DRV_NAME},
-};
-
 struct dyna_pci10xx_private {
struct mutex mutex;
unsigned long BADR3;
@@ -194,35 +180,16 @@ static int dyna_pci10xx_do_insn_bits(struct comedi_device 
*dev,
return insn->n;
 }
 
-static const void *dyna_pci10xx_find_boardinfo(struct comedi_device *dev,
-  struct pci_dev *pcidev)
-{
-   const struct boardtype *thisboard;
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(boardtypes); ++i) {
-   thisboard = &boardtypes[i];
-   if (pcidev->device != thisboard->device_id)
-   return thisboard;
-   }
-   return NULL;
-}
-
 static int dyna_pci10xx_attach_pci(struct comedi_device *dev,
   struct pci_dev *pcidev)
 {
-   const struct boardtype *thisboard;
struct dyna_pci10xx_private *devpriv;
struct comedi_subdevice *s;
int ret;
 
comedi_set_hw_dev(dev, &pcidev->dev);
 
-   thisboard = dyna_pci10xx_find_boardinfo(dev, pcidev);
-   if (!thisboard)
-   return -ENODEV;
-   dev->board_ptr = thisboard;
-   dev->board_name = thisboard->name;
+   dev->board_name = dev->driver->driver_name;
 
ret = alloc_private(dev, sizeof(*devpriv));
if (ret)
@@ -282,8 +249,7 @@ static int dyna_pci10xx_attach_pci(struct comedi_device 
*dev,
s->state = 0;
s->insn_bits = dyna_pci10xx_do_insn_bits;
 
-   dev_info(dev->class_dev, "%s: %s attached\n",
-   dev->driver->driver_name, dev->board_name);
+   dev_info(dev->class_dev, "%s attached\n", dev->board_name);
 
return 0;
 }
-- 
1.7.11



[PATCH 3/7] staging: comedi: dyna_pci10xx: cleanup the analog output range

2012-08-14 Thread H Hartley Sweeten
The analog output channels on this board only support a single
range, 0-10V unipolar. This range is available as an exported
symbol from the comedi core as "range_unipolar10". Use that
instead of duplicating the range in this driver and remove
the information from the boardinfo.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index 15048aa..abfbb12 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -54,13 +54,6 @@ static const struct comedi_lrange range_pci1050_ai = { 3, {
 
 static const char range_codes_pci1050_ai[] = { 0x00, 0x10, 0x30 };
 
-static const struct comedi_lrange range_pci1050_ao = { 1, {
- UNI_RANGE(10)
- }
-};
-
-static const char range_codes_pci1050_ao[] = { 0x00 };
-
 struct boardtype {
const char *name;
int device_id;
@@ -74,8 +67,6 @@ struct boardtype {
int do_bits;
const struct comedi_lrange *range_ai;
const char *range_codes_ai;
-   const struct comedi_lrange *range_ao;
-   const char *range_codes_ao;
 };
 
 static const struct boardtype boardtypes[] = {
@@ -92,8 +83,6 @@ static const struct boardtype boardtypes[] = {
.do_bits = 16,
.range_ai = &range_pci1050_ai,
.range_codes_ai = range_codes_pci1050_ai,
-   .range_ao = &range_pci1050_ao,
-   .range_codes_ao = range_codes_pci1050_ao,
},
/*  dummy entry corresponding to driver name */
{.name = DRV_NAME},
@@ -290,7 +279,7 @@ static int dyna_pci10xx_attach_pci(struct comedi_device 
*dev,
s->subdev_flags = SDF_WRITABLE;
s->n_chan = thisboard->ao_chans;
s->maxdata = 0x0FFF;
-   s->range_table = thisboard->range_ao;
+   s->range_table = &range_unipolar10;
s->len_chanlist = 16;
s->insn_write = dyna_pci10xx_insn_write_ao;
 
-- 
1.7.11



[PATCH 4/7] staging: comedi: dyna_pci10xx: remove unused fields in the boardinfo

2012-08-14 Thread H Hartley Sweeten
The *_bits information in the boardinfo is not used by the driver.
Remove it.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index abfbb12..80bfae5 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -58,13 +58,9 @@ struct boardtype {
const char *name;
int device_id;
int ai_chans;
-   int ai_bits;
int ao_chans;
-   int ao_bits;
int di_chans;
-   int di_bits;
int do_chans;
-   int do_bits;
const struct comedi_lrange *range_ai;
const char *range_codes_ai;
 };
@@ -74,13 +70,9 @@ static const struct boardtype boardtypes[] = {
.name = "dyna_pci1050",
.device_id = 0x1050,
.ai_chans = 16,
-   .ai_bits = 12,
.ao_chans = 16,
-   .ao_bits = 12,
.di_chans = 16,
-   .di_bits = 16,
.do_chans = 16,
-   .do_bits = 16,
.range_ai = &range_pci1050_ai,
.range_codes_ai = range_codes_pci1050_ai,
},
-- 
1.7.11



[PATCH 2/7] staging: comedi: dyna_pci10xx: use attach_pci callback

2012-08-14 Thread H Hartley Sweeten
Convert this PCI driver to use the comedi PCI auto config attach
mechanism by adding an attach_pci callback function. Since the
driver does not require any external configuration options, disable
the legacy attach by making the attach simply return -ENOSYS. This
removes the need to walk the pci bus to find the pci_dev and the
need for the pci_dev_put() in the detach.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 88 +++
 1 file changed, 35 insertions(+), 53 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index 7884a94..15048aa 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -227,73 +227,49 @@ static int dyna_pci10xx_do_insn_bits(struct comedi_device 
*dev,
return insn->n;
 }
 
-static struct pci_dev *dyna_pci10xx_find_pci_dev(struct comedi_device *dev,
-struct comedi_devconfig *it)
+static const void *dyna_pci10xx_find_boardinfo(struct comedi_device *dev,
+  struct pci_dev *pcidev)
 {
-   struct pci_dev *pcidev = NULL;
-   int bus = it->options[0];
-   int slot = it->options[1];
+   const struct boardtype *thisboard;
int i;
 
-   for_each_pci_dev(pcidev) {
-   if (bus || slot) {
-   if (bus != pcidev->bus->number ||
-   slot != PCI_SLOT(pcidev->devfn))
-   continue;
-   }
-   if (pcidev->vendor != PCI_VENDOR_ID_DYNALOG)
-   continue;
-
-   for (i = 0; i < ARRAY_SIZE(boardtypes); ++i) {
-   if (pcidev->device != boardtypes[i].device_id)
-   continue;
-
-   dev->board_ptr = &boardtypes[i];
-   return pcidev;
-   }
+   for (i = 0; i < ARRAY_SIZE(boardtypes); ++i) {
+   thisboard = &boardtypes[i];
+   if (pcidev->device != thisboard->device_id)
+   return thisboard;
}
-   dev_err(dev->class_dev,
-   "No supported board found! (req. bus %d, slot %d)\n",
-   bus, slot);
return NULL;
 }
 
-static int dyna_pci10xx_attach(struct comedi_device *dev,
- struct comedi_devconfig *it)
+static int dyna_pci10xx_attach_pci(struct comedi_device *dev,
+  struct pci_dev *pcidev)
 {
const struct boardtype *thisboard;
struct dyna_pci10xx_private *devpriv;
-   struct pci_dev *pcidev;
struct comedi_subdevice *s;
int ret;
 
-   ret = alloc_private(dev, sizeof(*devpriv));
-   if (ret)
-   return ret;
-   devpriv = dev->private;
-
-   pcidev = dyna_pci10xx_find_pci_dev(dev, it);
-   if (!pcidev)
-   return -EIO;
comedi_set_hw_dev(dev, &pcidev->dev);
-   thisboard = comedi_board(dev);
 
+   thisboard = dyna_pci10xx_find_boardinfo(dev, pcidev);
+   if (!thisboard)
+   return -ENODEV;
+   dev->board_ptr = thisboard;
dev->board_name = thisboard->name;
-   dev->irq = 0;
-
-   if (comedi_pci_enable(pcidev, DRV_NAME)) {
-   printk(KERN_ERR "comedi: dyna_pci10xx: "
-   "failed to enable PCI device and request regions!");
-   return -EIO;
-   }
-
-   mutex_init(&devpriv->mutex);
 
-   printk(KERN_INFO "comedi: dyna_pci10xx: device found!\n");
+   ret = alloc_private(dev, sizeof(*devpriv));
+   if (ret)
+   return ret;
+   devpriv = dev->private;
 
+   ret = comedi_pci_enable(pcidev, dev->board_name);
+   if (ret)
+   return ret;
dev->iobase = pci_resource_start(pcidev, 2);
devpriv->BADR3 = pci_resource_start(pcidev, 3);
 
+   mutex_init(&devpriv->mutex);
+
ret = comedi_alloc_subdevices(dev, 4);
if (ret)
return ret;
@@ -339,10 +315,19 @@ static int dyna_pci10xx_attach(struct comedi_device *dev,
s->state = 0;
s->insn_bits = dyna_pci10xx_do_insn_bits;
 
-   printk(KERN_INFO "comedi: dyna_pci10xx: %s - device setup completed!\n",
-   thisboard->name);
+   dev_info(dev->class_dev, "%s: %s attached\n",
+   dev->driver->driver_name, dev->board_name);
+
+   return 0;
+}
+
+static int dyna_pci10xx_attach(struct comedi_device *dev,
+  struct comedi_devconfig *it)
+{
+   dev_warn(dev->class_dev,
+   "This driver does not support attach using comedi_config\n");
 
-   return 1;
+   return -ENOSYS;
 }
 
 static void dyna_pci10xx_detach(struct comedi_device *dev)
@@ -355,7 +340,6 @@ static void dyna_pci10xx_detach(struct comedi_device *dev)
if (pcidev) {
   

[PATCH 1/7] staging: comedi: dyna_pci10xx: remove thisboard and devpriv macros

2012-08-14 Thread H Hartley Sweeten
These macros rely on local variables having a specific name. Replace
them with local variables where used. Use the comedi_board() helper
to get the thisboard pointer.

Signed-off-by: H Hartley Sweeten 
Cc: Ian Abbott 
Cc: Greg Kroah-Hartman 
---
 drivers/staging/comedi/drivers/dyna_pci10xx.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dyna_pci10xx.c 
b/drivers/staging/comedi/drivers/dyna_pci10xx.c
index 064be9a..7884a94 100644
--- a/drivers/staging/comedi/drivers/dyna_pci10xx.c
+++ b/drivers/staging/comedi/drivers/dyna_pci10xx.c
@@ -104,9 +104,6 @@ struct dyna_pci10xx_private {
unsigned long BADR3;
 };
 
-#define thisboard ((const struct boardtype *)dev->board_ptr)
-#define devpriv ((struct dyna_pci10xx_private *)dev->private)
-
 
/**/
 /** READ WRITE FUNCTIONS 
**/
 
/**/
@@ -116,6 +113,8 @@ static int dyna_pci10xx_insn_read_ai(struct comedi_device 
*dev,
struct comedi_subdevice *s,
struct comedi_insn *insn, unsigned int *data)
 {
+   const struct boardtype *thisboard = comedi_board(dev);
+   struct dyna_pci10xx_private *devpriv = dev->private;
int n, counter;
u16 d = 0;
unsigned int chan, range;
@@ -159,6 +158,8 @@ static int dyna_pci10xx_insn_write_ao(struct comedi_device 
*dev,
 struct comedi_subdevice *s,
 struct comedi_insn *insn, unsigned int *data)
 {
+   const struct boardtype *thisboard = comedi_board(dev);
+   struct dyna_pci10xx_private *devpriv = dev->private;
int n;
unsigned int chan, range;
 
@@ -181,6 +182,7 @@ static int dyna_pci10xx_di_insn_bits(struct comedi_device 
*dev,
  struct comedi_subdevice *s,
  struct comedi_insn *insn, unsigned int *data)
 {
+   struct dyna_pci10xx_private *devpriv = dev->private;
u16 d = 0;
 
mutex_lock(&devpriv->mutex);
@@ -200,6 +202,8 @@ static int dyna_pci10xx_do_insn_bits(struct comedi_device 
*dev,
  struct comedi_subdevice *s,
  struct comedi_insn *insn, unsigned int *data)
 {
+   struct dyna_pci10xx_private *devpriv = dev->private;
+
/* The insn data is a mask in data[0] and the new data
 * in data[1], each channel corresponding to a bit.
 * s->state contains the previous write data
@@ -257,20 +261,22 @@ static struct pci_dev *dyna_pci10xx_find_pci_dev(struct 
comedi_device *dev,
 static int dyna_pci10xx_attach(struct comedi_device *dev,
  struct comedi_devconfig *it)
 {
+   const struct boardtype *thisboard;
+   struct dyna_pci10xx_private *devpriv;
struct pci_dev *pcidev;
struct comedi_subdevice *s;
int ret;
 
-   if (alloc_private(dev, sizeof(struct dyna_pci10xx_private)) < 0) {
-   printk(KERN_ERR "comedi: dyna_pci10xx: "
-   "failed to allocate memory!\n");
-   return -ENOMEM;
-   }
+   ret = alloc_private(dev, sizeof(*devpriv));
+   if (ret)
+   return ret;
+   devpriv = dev->private;
 
pcidev = dyna_pci10xx_find_pci_dev(dev, it);
if (!pcidev)
return -EIO;
comedi_set_hw_dev(dev, &pcidev->dev);
+   thisboard = comedi_board(dev);
 
dev->board_name = thisboard->name;
dev->irq = 0;
@@ -342,6 +348,7 @@ static int dyna_pci10xx_attach(struct comedi_device *dev,
 static void dyna_pci10xx_detach(struct comedi_device *dev)
 {
struct pci_dev *pcidev = comedi_to_pci_dev(dev);
+   struct dyna_pci10xx_private *devpriv = dev->private;
 
if (devpriv)
mutex_destroy(&devpriv->mutex);
-- 
1.7.11



Re: Re: O_DIRECT to md raid 6 is slow

2012-08-14 Thread kedacomkernel
On 2012-08-15 09:12 Andy Lutomirski  Wrote:
>Ubuntu's 3.2.0-27-generic.  I can test on a newer kernel tomorrow.
I guess it may be missing the blk_plug function.
Can you apply this patch and retest?

Move unplugging for direct I/O from around ->direct_IO() down to
do_blockdev_direct_IO(). This implicitly adds plugging for direct
writes.
 
CC: Li Shaohua 
Acked-by: Jeff Moyer 
Signed-off-by: Wu Fengguang 
---
 fs/direct-io.c |5 +
 mm/filemap.c   |4 
 2 files changed, 5 insertions(+), 4 deletions(-)
 
--- linux-next.orig/mm/filemap.c 2012-08-05 16:24:47.859465122 +0800
+++ linux-next/mm/filemap.c 2012-08-05 16:24:48.407465135 +0800
@@ -1412,12 +1412,8 @@ generic_file_aio_read(struct kiocb *iocb
  retval = filemap_write_and_wait_range(mapping, pos,
  pos + iov_length(iov, nr_segs) - 1);
  if (!retval) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
  retval = mapping->a_ops->direct_IO(READ, iocb,
  iov, pos, nr_segs);
- blk_finish_plug(&plug);
  }
  if (retval > 0) {
  *ppos = pos + retval;
--- linux-next.orig/fs/direct-io.c 2012-07-07 21:46:39.531508198 +0800
+++ linux-next/fs/direct-io.c 2012-08-05 16:24:48.411465136 +0800
@@ -1062,6 +1062,7 @@ do_blockdev_direct_IO(int rw, struct kio
  unsigned long user_addr;
  size_t bytes;
  struct buffer_head map_bh = { 0, };
+ struct blk_plug plug;
 
  if (rw & WRITE)
  rw = WRITE_ODIRECT;
@@ -1177,6 +1178,8 @@ do_blockdev_direct_IO(int rw, struct kio
  PAGE_SIZE - user_addr / PAGE_SIZE);
  }
 
+ blk_start_plug(&plug);
+
  for (seg = 0; seg < nr_segs; seg++) {
  user_addr = (unsigned long)iov[seg].iov_base;
  sdio.size += bytes = iov[seg].iov_len;
@@ -1235,6 +1238,8 @@ do_blockdev_direct_IO(int rw, struct kio
  if (sdio.bio)
  dio_bio_submit(dio, &sdio);
 
+ blk_finish_plug(&plug);
+
  /*
   * It is possible that, we return short IO due to end of file.
   * In that case, we need to release all the pages we got hold on.
 
 
--


[PATCH 0/7] staging: comedi: dyna_pci10xx: update driver

2012-08-14 Thread H Hartley Sweeten
Update the dyna_pci10xx driver to use the PCI Pnp auto config mechanism
of the comedi core and remove the unneeded boardinfo code.

H Hartley Sweeten (7):
  staging: comedi: dyna_pci10xx: remove thisboard and devpriv macros
  staging: comedi: dyna_pci10xx: use attach_pci callback
  staging: comedi: dyna_pci10xx: cleanup the analog output range
  staging: comedi: dyna_pci10xx: remove unused fields in the boardinfo
  staging: comedi: dyna_pci10xx: move boardinfo values into subdevice
    setup
  staging: comedi: dyna_pci10xx: remove unneeded boardinfo code
  staging: comedi: dyna_pci10xx: remove unused DRV_NAME

 drivers/staging/comedi/drivers/dyna_pci10xx.c | 161 +++---
 1 file changed, 41 insertions(+), 120 deletions(-)

-- 
1.7.11



Re: O_DIRECT to md raid 6 is slow

2012-08-14 Thread Andy Lutomirski
Ubuntu's 3.2.0-27-generic.  I can test on a newer kernel tomorrow.

--Andy

On Tue, Aug 14, 2012 at 6:07 PM, kedacomkernel  wrote:
> On 2012-08-15 08:49 Andy Lutomirski  Wrote:
>>If I do:
>># dd if=/dev/zero of=/dev/md0p1 bs=8M
>>then iostat -m 5 says:
>>
>>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           0.00    0.00   26.88   35.27    0.00   37.85
>>
>>Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>>sdb             265.20         1.16        54.79          5        273
>>sdc             266.20         1.47        54.73          7        273
>>sdd             264.20         1.38        54.54          6        272
>>sdf             286.00         1.84        54.74          9        273
>>sde             266.60         1.04        54.75          5        273
>>sdg             265.00         1.02        54.74          5        273
>>md0           55808.00         0.00       218.00          0       1090
>>
>>If I do:
>># dd if=/dev/zero of=/dev/md0p1 bs=8M oflag=direct
>>then iostat -m 5 says:
>>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           0.00    0.00   11.70   12.94    0.00   75.36
>>
>>Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>>sdb             831.00         8.58        30.42         42        152
>>sdc             832.80         8.05        29.99         40        149
>>sdd             832.00         9.10        29.78         45        148
>>sdf             838.40         9.11        29.72         45        148
>>sde             828.80         7.91        29.79         39        148
>>sdg             850.80         8.00        30.18         40        150
>>md0            1012.60         0.00       101.27          0        506
>>
>>It looks like md isn't recognizing that I'm writing whole stripes when
>>I'm in O_DIRECT mode.
>>
> kernel version?
>
>>--Andy
>>
>>--
>>Andy Lutomirski
>>AMA Capital Management, LLC
>>--
>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>the body of a message to majord...@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Eric W. Biederman
Sasha Levin  writes:

> On 08/15/2012 01:52 AM, Eric W. Biederman wrote:
>> Sasha Levin  writes:
>> 
>>> Switch user_ns to use the new hashtable implementation. This reduces the
>>> amount of generic unrelated code in user_ns.
>> 
>> Two concerns here.
>> 1) When adding a new entry you recompute the hash where previously that
>>was not done.  I believe that will slow down adding of new entries.
>
> I figured that the price for the extra hashing isn't significant since hash_32
> is just a multiplication and a shift.
>
> I'll modify the code to calculate the key just once.

Honestly I don't know either way, but it seemed a shame to give up a
common and trivial optimization.
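
To make the optimization concrete, here is a minimal userspace sketch of
computing the bucket once and reusing it for both the lookup and the insert.
It is not the user_ns code or the proposed hashtable API. The table layout and
the names struct entry, bucket_find() and find_or_insert() are hypothetical,
and hash_32() is a stand-in that assumes a multiplier of 0x9e370001.

```c
#include <stdint.h>
#include <stddef.h>

#define GOLDEN_RATIO_PRIME_32 0x9e370001U   /* assumed multiplier */
#define TABLE_BITS 7
#define TABLE_SIZE (1U << TABLE_BITS)

/* Hypothetical chained hash table entry keyed by a 32-bit id. */
struct entry {
	uint32_t key;
	struct entry *next;
};

static struct entry *table[TABLE_SIZE];

/* Userspace stand-in for hash_32(): multiply, then keep the top `bits` bits. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_PRIME_32) >> (32 - bits);
}

/* Walk one bucket looking for key; the caller supplies the bucket index. */
static struct entry *bucket_find(unsigned int bucket, uint32_t key)
{
	struct entry *e;

	for (e = table[bucket]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/*
 * Look up new_entry->key and, if it is absent, link new_entry in.
 * The bucket index is computed once and shared by the lookup and the
 * insert. Returns the entry that is in the table for that key.
 */
static struct entry *find_or_insert(struct entry *new_entry)
{
	unsigned int bucket = hash_32(new_entry->key, TABLE_BITS);
	struct entry *e = bucket_find(bucket, new_entry->key);

	if (e)
		return e;
	new_entry->next = table[bucket];
	table[bucket] = new_entry;
	return new_entry;
}
```

The alternative is an insert helper that takes only the key and hashes it
again internally, which costs one extra multiply and shift per new entry.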

>> 2) Using hash_32 for uids is an interesting choice.  hash_32 discards
>>the low bits.  Last I checked for uids the low bits were the bits
>>that were most likely to be different and had the most entropy.
>> 
>>I'm not certain how multiplying by the GOLDEN_RATIO_PRIME_32 will
>>affect things but I would be surprised if it shifted all of the
>>randomness from the low bits to the high bits.
>
> "Is hash_* good enough for our purpose?" - I was actually surprised that no one
> raised that question during the RFC and assumed it was because everybody agreed
> that it's indeed good enough.
>
> I can offer the following: I'll write a small module that will hash 1...1
> into a hashtable which uses 7 bits (just like user_ns) and post the
> distribution we'll get.

That won't hurt.  I think 1-100 then 1000-1100 may actually be more
representative.  Not that I would mind seeing the larger range.
Especially since I am in the process of encouraging the use of more
uids.

> If the results of the above will be satisfactory we can avoid the discussion
> about which hash function we should really be using. If not, I guess now is a
> good time for that :)

Yes.  A small empirical test sounds good.

Eric



Re: O_DIRECT to md raid 6 is slow

2012-08-14 Thread kedacomkernel
On 2012-08-15 08:49 Andy Lutomirski  Wrote:
>If I do:
># dd if=/dev/zero of=/dev/md0p1 bs=8M
>then iostat -m 5 says:
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.00    0.00   26.88   35.27    0.00   37.85
>
>Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>sdb             265.20         1.16        54.79          5        273
>sdc             266.20         1.47        54.73          7        273
>sdd             264.20         1.38        54.54          6        272
>sdf             286.00         1.84        54.74          9        273
>sde             266.60         1.04        54.75          5        273
>sdg             265.00         1.02        54.74          5        273
>md0           55808.00         0.00       218.00          0       1090
>
>If I do:
># dd if=/dev/zero of=/dev/md0p1 bs=8M oflag=direct
>then iostat -m 5 says:
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.00    0.00   11.70   12.94    0.00   75.36
>
>Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>sdb             831.00         8.58        30.42         42        152
>sdc             832.80         8.05        29.99         40        149
>sdd             832.00         9.10        29.78         45        148
>sdf             838.40         9.11        29.72         45        148
>sde             828.80         7.91        29.79         39        148
>sdg             850.80         8.00        30.18         40        150
>md0            1012.60         0.00       101.27          0        506
>
>It looks like md isn't recognizing that I'm writing whole stripes when
>I'm in O_DIRECT mode.
>
kernel version?

>--Andy
>
>-- 
>Andy Lutomirski
>AMA Capital Management, LLC
>--
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [tip:auto-latest 18/37] kernel/sched/core.c:6460:1: error: 'SD_PREFER_LOCAL' undeclared (first use in this function)

2012-08-14 Thread Alex Shi
On 08/15/2012 04:36 AM, Fengguang Wu wrote:

> Hi Alex,
> 
> FYI, kernel build failed on
> 
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git auto-latest
> head:   e42942691c8262b7fe2a7b88577623082b988217
> commit: f03542a7019c600163ac4441d8a826c92c1bd510 [18/37] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up
> config: tile-tilegx_defconfig (attached as .config)
> 
> All related error/warning messages:
> 
> kernel/sched/core.c: In function 'sd_init_CPU':
> kernel/sched/core.c:6460:1: error: 'SD_PREFER_LOCAL' undeclared (first use in this function)
> kernel/sched/core.c:6460:1: note: each undeclared identifier is reported only once for each function it appears in
> 


Sorry for the mistake!
The following patch fixes it.


>From 033c87bfde319e7b0bdd090a76e5731930332c41 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Wed, 15 Aug 2012 08:14:36 +0800
Subject: [PATCH] sched: remove SD_PREFER_LOCAL in tile arch

The commit "sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code
clean up" removed the SD_PREFER_LOCAL definition but left one use of
SD_PREFER_LOCAL in the tile arch code, which breaks the tile build.

This patch removes the dead code from the tile arch.

Reported-by: Fengguang Wu 
Signed-off-by: Alex Shi 
---
 arch/tile/include/asm/topology.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index 7a7ce39..d5e86c9 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -69,7 +69,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 0*SD_WAKE_AFFINE  \
-   | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER   \
| 0*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
-- 
1.7.5.4



Re: [PATCH v2 15/31] arm64: SMP support

2012-08-14 Thread Olof Johansson
Hi,

On Tue, Aug 14, 2012 at 06:52:16PM +0100, Catalin Marinas wrote:
> This patch adds SMP initialisation and spinlocks implementation for
> AArch64. The spinlock support uses the new load-acquire/store-release
> instructions to avoid explicit barriers. The architecture also specifies
> that an event is automatically generated when clearing the exclusive
> monitor state to wake up processors in WFE, so there is no need for an
> explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> set the exclusive monitor locally as there is no conditional WFE and a
> branch is more expensive.
> 
> For the SMP booting protocol, see Documentation/arm64/booting.txt.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Catalin Marinas 
> ---

> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> new file mode 100644
> index 000..34a37fb
> --- /dev/null
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -0,0 +1,199 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_SPINLOCK_H
> +#define __ASM_SPINLOCK_H
> +
> +#include 
> +#include 
> +
> +/*
> + * AArch64 Spin-locking.
> + *
> + * We exclusively read the old value.  If it is zero, we may have
> + * won the lock, so we try exclusively storing it.  A memory barrier
> + * is required after we get a lock, and before we release it, because
> + * V6 CPUs are assumed to have weakly ordered memory.

This comment should be updated, to mention the implicit locking and remove the
reference to V6?

Also, ignore previous questions on another reply about need for barriers,
obviously not needed given the load-acquire/store-release semantics.
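
To illustrate why no separate barriers are needed, here is a userspace sketch
in C11 atomics, not the arm64 assembly from the patch. The sketch_spin_* names
are hypothetical. The acquire ordering on the lock side and the release
ordering on the unlock side play the role that the load-acquire/store-release
instructions play in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_spinlock {
	atomic_flag locked;
};

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void sketch_spin_lock(struct sketch_spinlock *lock)
{
	/* Acquire ordering: later accesses cannot move above the lock. */
	while (atomic_flag_test_and_set_explicit(&lock->locked,
						 memory_order_acquire))
		; /* busy-wait; the arm64 code waits in WFE instead */
}

static inline bool sketch_spin_trylock(struct sketch_spinlock *lock)
{
	/* Returns true if the lock was free and is now held by the caller. */
	return !atomic_flag_test_and_set_explicit(&lock->locked,
						  memory_order_acquire);
}

static inline void sketch_spin_unlock(struct sketch_spinlock *lock)
{
	/* Release ordering: earlier accesses cannot move below the unlock. */
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The ordering is attached to the lock and unlock operations themselves, which
is the property that makes an explicit barrier after taking the lock and
before releasing it unnecessary.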



-Olof


Re: [PATCH 02/19] Input: Improve the events-per-packet estimate

2012-08-14 Thread Ping Cheng
On Tue, Aug 14, 2012 at 2:12 PM, Dmitry Torokhov
 wrote:
> >> On Sun, Aug 12, 2012 at 2:42 PM, Henrik Rydberg wrote:
> >> > Many MT devices send a number of keys along with the mt information.
> >> > This patch makes sure that there is room for them in the packet
> >> > buffer.
>>
>> So, what device are we talking about here? I thought it is a touch
>> device with a few extra buttons, which are reported as key events. Am
>> I missing something?
>
> I was talking about a bog-standard computer keyboard here.
>
>>
>> If it is a touch device, we won't have too many buttons. So,
>> test_bit(i, dev->keybit) won't be true for more than the number of
>> buttons that are declared by __set_bit().
>
> input_estimate_events_per_packet() is a generic routine that is used for
> all devices, not only [multi]touch.

I understand you are talking about standard keyboard. And I know this
routine is for all devices.

However, from the commit comments, the patch is to address an MT
issue. If it is not just for MT, we need either to make it clear in
the comments or to verify the type of the device in the code.

>> I would think we could play a keyboard (this keyboard does not have
>> letters on it ;-) with ten fingers.
>
> But even that keyboard would have more than 10 keys, right? So even
> though max_events should be 10 + 10 + 1 (10 keys, 10 msc, syn) your loop
> would produce what 88 + 88 + 1 for full size music keyboard?

No, I was not talking about implementing full music keyboard functions
in the kernel. My point was: why do we take 7 instead of 10, or
another number?

In fact, 7 works for me as long as we explain the rationale behind the
decision. I do not have a device that needs to post 10 button events
simultaneously, yet ;-).

Ping


Re: [PATCH] efi: add efi_runtime state checking

2012-08-14 Thread H. Peter Anvin

On 06/28/2012 10:02 AM, Olof Johansson wrote:

On Wed, Jun 27, 2012 at 2:52 PM, H. Peter Anvin  wrote:

On 06/27/2012 02:35 PM, Olof Johansson wrote:


This adds an efi_runtime variable indicating whether the
efi runtime services are available. The only time they are
expected to not be available is when a 32-bit kernel has been
booted using 64-bit EFI and vice versa.

It also adds checking to the two locations where functions are
called; x86 reboot and efivars.



OK, stupid question:

Why is this different from the efi_enabled variable, or rather: why is it
different from what the efi_enabled variable *should* be?  If runtime
services aren't available the only "EFI" that is available to the kernel are
the data structures passed in, and those can be checked directly...


Excellent question, and I think it would work to turn off efi_enabled
towards the end of setup_arch() for non-native boots. That'd solve all
these problems, I believe.

I'll try it out and revise this patch. It might take me a day or two
to get cycles for it.



Ping?

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



O_DIRECT to md raid 6 is slow

2012-08-14 Thread Andy Lutomirski
If I do:
# dd if=/dev/zero of=/dev/md0p1 bs=8M
then iostat -m 5 says:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   26.88   35.27    0.00   37.85

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             265.20         1.16        54.79          5        273
sdc             266.20         1.47        54.73          7        273
sdd             264.20         1.38        54.54          6        272
sdf             286.00         1.84        54.74          9        273
sde             266.60         1.04        54.75          5        273
sdg             265.00         1.02        54.74          5        273
md0           55808.00         0.00       218.00          0       1090

If I do:
# dd if=/dev/zero of=/dev/md0p1 bs=8M oflag=direct
then iostat -m 5 says:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   11.70   12.94    0.00   75.36

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             831.00         8.58        30.42         42        152
sdc             832.80         8.05        29.99         40        149
sdd             832.00         9.10        29.78         45        148
sdf             838.40         9.11        29.72         45        148
sde             828.80         7.91        29.79         39        148
sdg             850.80         8.00        30.18         40        150
md0            1012.60         0.00       101.27          0        506

It looks like md isn't recognizing that I'm writing whole stripes when
I'm in O_DIRECT mode.
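
For reference, a small sketch of the full-stripe arithmetic behind that
suspicion. The chunk size is not given in this report, so the 512 KiB used in
the note below is purely an assumption, as is the alignment of md0p1 within
md0; both helper names are hypothetical. With six members in RAID 6, four
chunks per stripe carry data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes of user data in one full stripe: data disks times chunk size. */
static uint64_t full_stripe_bytes(unsigned int disks, unsigned int parity,
				  uint64_t chunk_bytes)
{
	return (uint64_t)(disks - parity) * chunk_bytes;
}

/* True if [offset, offset + len) covers only whole stripes. */
static bool is_full_stripe_write(uint64_t offset, uint64_t len,
				 uint64_t stripe_bytes)
{
	return len != 0 && offset % stripe_bytes == 0 &&
	       len % stripe_bytes == 0;
}
```

Under the assumed 512 KiB chunk, a stripe holds 2 MiB of data, so an aligned
8 MiB write spans four whole stripes and should need no reads. The numbers
above fit that picture for the buffered case: the members write about 328 MB/s
in total, roughly 6/4 of the 218 MB/s seen on md0. In the O_DIRECT case the
members write about 180 MB/s against an expected 152 MB/s (101.27 x 6/4) and
also read 8 to 9 MB/s each, which is what read-modify-write cycles would look
like.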

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Sasha Levin
On 08/15/2012 01:52 AM, Eric W. Biederman wrote:
> Sasha Levin  writes:
> 
>> Switch user_ns to use the new hashtable implementation. This reduces the
>> amount of generic unrelated code in user_ns.
> 
> Two concerns here.
> 1) When adding a new entry you recompute the hash where previously that
>was not done.  I believe that will slow down adding of new entries.

I figured that the price for the extra hashing isn't significant since hash_32
is just a multiplication and a shift.

I'll modify the code to calculate the key just once.
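
A rough userspace sketch of what "calculate the key just once" could look
like. The table, the uid_entry type and uid_find_or_add() are hypothetical
stand-ins rather than the user_ns code, and locking is left out; hash_32() is
assumed to be the multiply-and-shift from include/linux/hash.h.

```c
#include <stddef.h>
#include <stdint.h>

#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
#define UID_HASH_BITS 7

struct uid_entry {
	uint32_t uid;
	struct uid_entry *next;
};

static struct uid_entry *uid_table[1u << UID_HASH_BITS];

static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * (uint32_t)GOLDEN_RATIO_PRIME_32) >> (32 - bits);
}

/* Look up a uid within one bucket that the caller already selected. */
static struct uid_entry *bucket_find(struct uid_entry *head, uint32_t uid)
{
	for (; head; head = head->next)
		if (head->uid == uid)
			return head;
	return NULL;
}

/*
 * Find an existing entry for new_entry->uid, or link new_entry in.
 * The bucket is computed once and shared by the lookup and the insertion.
 */
static struct uid_entry *uid_find_or_add(struct uid_entry *new_entry)
{
	struct uid_entry **bucket =
		&uid_table[hash_32(new_entry->uid, UID_HASH_BITS)];
	struct uid_entry *found = bucket_find(*bucket, new_entry->uid);

	if (found)
		return found;
	new_entry->next = *bucket;
	*bucket = new_entry;
	return new_entry;
}
```

The point is only that the bucket pointer is passed along instead of being
recomputed on the add path; whether the saved multiply and shift is
measurable is a separate question.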

> 2) Using hash_32 for uids is an interesting choice.  hash_32 discards
>the low bits.  Last I checked for uids the low bits were the bits
>that were most likely to be different and had the most entropy.
> 
>I'm not certain how multiplying by the GOLDEN_RATIO_PRIME_32 will
>affect things but I would be surprised if it shifted all of the
>randomness from the low bits to the high bits.

"Is hash_* good enough for our purpose?" - I was actually surprised that no one
raised that question during the RFC and assumed it was because everybody agreed
that it's indeed good enough.

I can offer the following: I'll write a small module that will hash 1...1
into a hashtable which uses 7 bits (just like user_ns) and post the distribution
we'll get.

If the results of the above will be satisfactory we can avoid the discussion
about which hash function we should really be using. If not, I guess now is a
good time for that :)



Re: [PATCH v2 14/31] arm64: DMA mapping API

2012-08-14 Thread Olof Johansson
Hi,


On Tue, Aug 14, 2012 at 06:52:15PM +0100, Catalin Marinas wrote:
> This patch adds support for the DMA mapping API. It uses dma_map_ops for
> flexibility and it currently supports swiotlb. This patch could be
> simplified further if the DMA accesses are coherent (not mandated by the
> architecture) or if corresponding hooks are placed in the generic
> swiotlb code to deal with cache maintenance.
> 
> Signed-off-by: Catalin Marinas 
> ---
>  arch/arm64/include/asm/dma-mapping.h |  124 
>  arch/arm64/mm/dma-mapping.c  |  208 
> ++
>  2 files changed, 332 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arm64/include/asm/dma-mapping.h
>  create mode 100644 arch/arm64/mm/dma-mapping.c
> 
> diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
> new file mode 100644
> index 000..538f4b4
> --- /dev/null
> +++ b/arch/arm64/include/asm/dma-mapping.h
> @@ -0,0 +1,124 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_DMA_MAPPING_H
> +#define __ASM_DMA_MAPPING_H
> +
> +#ifdef __KERNEL__
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#define ARCH_HAS_DMA_GET_REQUIRED_MASK
> +
> +extern struct dma_map_ops *dma_ops;
> 
> +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> +{
> + if (unlikely(!dev) || !dev->archdata.dma_ops)
> + return dma_ops;
> + else
> + return dev->archdata.dma_ops;
> +}

Does it make sense to add the concept of a global dma ops on arm64,
instead of requiring the dma ops pointer per device similar to how
some other platforms do it (including powerpc)? For devices that lack
archdata.dma_ops, dma_supported() should return 0 (and the other ops
should return error).
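
Roughly what I have in mind, as a userspace mock rather than a patch: the
struct layouts below are cut down to the one hook needed, the dma_ops field
stands in for archdata.dma_ops, and example_dma_supported() is an invented
hook for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

struct device;

/* Mock of the one hook this sketch needs. */
struct dma_map_ops {
	int (*dma_supported)(struct device *dev, uint64_t mask);
};

struct device {
	struct dma_map_ops *dma_ops;	/* stands in for archdata.dma_ops */
};

/* No global fallback: a device without ops simply has none. */
static struct dma_map_ops *get_dma_ops(struct device *dev)
{
	return dev ? dev->dma_ops : NULL;
}

static int dma_supported(struct device *dev, uint64_t mask)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	if (!ops || !ops->dma_supported)
		return 0;	/* no per-device ops: DMA not supported */
	return ops->dma_supported(dev, mask);
}

/* Example hook: accept any mask that covers at least 32 bits. */
static int example_dma_supported(struct device *dev, uint64_t mask)
{
	(void)dev;
	return mask >= 0xffffffffULL;
}
```

The other wrappers would follow the same pattern and return an error when
get_dma_ops() yields NULL.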



-Olof


Re: [PATCH v2 13/31] arm64: Device specific operations

2012-08-14 Thread Olof Johansson
On Tue, Aug 14, 2012 at 06:52:14PM +0100, Catalin Marinas wrote:
> This patch adds several definitions for device communication, including
> I/O accessors and ioremap(). The __raw_* accessors are implemented as
> inline asm to avoid compiler generation of post-indexed accesses (less
> efficient to emulate in a virtualised environment).
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Catalin Marinas 
> ---
>  arch/arm64/include/asm/device.h |   26 
>  arch/arm64/include/asm/fb.h |   34 +
>  arch/arm64/include/asm/io.h |  263 
> +++
>  arch/arm64/kernel/io.c  |   64 ++
>  arch/arm64/mm/ioremap.c |   84 +
>  5 files changed, 471 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arm64/include/asm/device.h
>  create mode 100644 arch/arm64/include/asm/fb.h
>  create mode 100644 arch/arm64/include/asm/io.h
>  create mode 100644 arch/arm64/kernel/io.c
>  create mode 100644 arch/arm64/mm/ioremap.c
> 
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> new file mode 100644
> index 000..48fa83f
> --- /dev/null
> +++ b/arch/arm64/include/asm/io.h

[...]

> +/*
> + *  I/O port access primitives.
> + */
> +#define IO_SPACE_LIMIT   0x
> +
> +/*
> + * We currently don't have any platform with PCI support, so just leave this
> + * defined to 0 until needed.
> + */
> +#define PCI_IOBASE   ((void __iomem *)0)

You could just leave out the PCI / I/O code altogether instead.

> diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
> new file mode 100644
> index 000..7d37ead
> --- /dev/null
> +++ b/arch/arm64/kernel/io.c
> @@ -0,0 +1,64 @@
> +/*
> + * Based on arch/arm/kernel/io.c
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Copy data from IO memory space to "real" memory space.
> + */
> +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
> +{
> + unsigned char *t = to;
> + while (count) {
> + count--;
> + *t = readb(from);
> + t++;
> + from++;
> + }
> +}
> +EXPORT_SYMBOL(__memcpy_fromio);
> +
> +/*
> + * Copy data from "real" memory space to IO memory space.
> + */
> +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
> +{
> + const unsigned char *f = from;
> + while (count) {
> + count--;
> + writeb(*f, to);
> + f++;
> + to++;
> + }
> +}
> +EXPORT_SYMBOL(__memcpy_toio);
> +
> +/*
> + * "memset" on IO memory space.
> + */
> +void __memset_io(volatile void __iomem *dst, int c, size_t count)
> +{
> + while (count) {
> + count--;
> + writeb(c, dst);
> + dst++;
> + }
> +}
> +EXPORT_SYMBOL(__memset_io);

Doing all of the above a byte at a time is horribly inefficient. Feel
free to borrow the implementations from arch/powerpc/kernel/io.c instead
of from ARM.
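
As a sketch of the general shape (written from scratch for userspace, not
copied from arch/powerpc/kernel/io.c): copy single bytes until the I/O address
is 8-byte aligned, then move 64 bits at a time, then finish the tail.
mock_readb() and mock_readq() are stand-ins for the real readb()/readq()
accessors, and the sketch assumes the device tolerates 64-bit accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock accessors: in the kernel these would be readb()/readq(). */
static inline uint8_t mock_readb(const volatile void *addr)
{
	return *(const volatile uint8_t *)addr;
}

static inline uint64_t mock_readq(const volatile void *addr)
{
	return *(const volatile uint64_t *)addr;
}

static void memcpy_fromio_sketch(void *to, const volatile void *from,
				 size_t count)
{
	uint8_t *t = to;
	const volatile uint8_t *f = from;

	/* Head: single bytes until the I/O address is 8-byte aligned. */
	while (count && ((uintptr_t)f & 7)) {
		*t++ = mock_readb(f++);
		count--;
	}

	/* Body: one 64-bit read per 8 bytes; memcpy copes with unaligned 'to'. */
	while (count >= 8) {
		uint64_t v = mock_readq(f);

		memcpy(t, &v, sizeof(v));
		t += 8;
		f += 8;
		count -= 8;
	}

	/* Tail: whatever is left, one byte at a time. */
	while (count) {
		*t++ = mock_readb(f++);
		count--;
	}
}
```

The same head/body/tail split applies to the toio and memset variants; for
large copies it cuts the number of device accesses by close to a factor of
eight.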


-Olof


Re: [PATCH 01/16] hashtable: introduce a small and naive hashtable

2012-08-14 Thread Tejun Heo
Hello,

(Sasha, would it be possible to change your MUA so that it breaks long
 lines.  It's pretty difficult to reply to.)

On Wed, Aug 15, 2012 at 02:24:49AM +0200, Sasha Levin wrote:
> The hashtable uses hlist. hlist provides us with an entire family of
> init functions which I'm supposed to use to initialize hlist heads.
> 
> So while a memset(0) will work perfectly here, I consider that
> cheating - it results in an uglier code that assumes to know about
> hlist internals, and will probably break as soon as someone tries to
> do something to hlist.

I think we should stick with INIT_HLIST_HEAD().  It's not a hot path
and we might add, say, debug fields or initialization magic
later.  If this really matters, the right thing to do would be adding
something like INIT_HLIST_HEAD_ARRAY().
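
Something along these lines, as a userspace sketch: the hlist types and
INIT_HLIST_HEAD() are mirrored from list.h so the snippet stands alone, while
init_hlist_head_array() is a hypothetical helper that does not exist today.

```c
#include <stddef.h>

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

/* Hypothetical block initializer; goes through the per-head init. */
static inline void init_hlist_head_array(struct hlist_head *heads, size_t n)
{
	for (size_t i = 0; i < n; i++)
		INIT_HLIST_HEAD(&heads[i]);
}
```

Because every head still goes through INIT_HLIST_HEAD(), any debug field or
magic value added there later is picked up without callers knowing about
hlist internals.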

Thanks.

-- 
tejun


Re: [ 36/44] x86, microcode: Sanitize per-cpu microcode reloading interface

2012-08-14 Thread Henrique de Moraes Holschuh
I believe the patch below, which was required on 3.2, will also be
necessary.

From: Kevin Winchester 
Subject: [PATCH] x86: Simplify code by removing a !SMP #ifdefs from 'struct
 cpuinfo_x86'

commit 141168c36cdee3ff23d9c7700b0edc47cb65479f and
commit 3f806e50981825fa56a7f1938f24c0680816be45 upstream.

Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space.  However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files.  The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:

   text    data     bss     dec     hex filename
4737168  506459  972040 6215667  5ed7f3 vmlinux.o.before
4737444  506459  972040 6215943  5ed907 vmlinux.o.after

for a difference of 276 bytes for an example UP config.

If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.

Signed-off-by: Kevin Winchester 
Cc: Steffen Persvold 
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinches...@gmail.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Borislav Petkov 
Signed-off-by: Ben Hutchings 
---
 arch/x86/include/asm/processor.h |2 --
 arch/x86/kernel/amd_nb.c |8 ++--
 arch/x86/kernel/cpu/amd.c|2 --
 arch/x86/kernel/cpu/common.c |5 -
 arch/x86/kernel/cpu/intel.c  |2 --
 arch/x86/kernel/cpu/mcheck/mce.c |2 --
 arch/x86/kernel/cpu/mcheck/mce_amd.c |5 +
 arch/x86/kernel/cpu/proc.c   |4 +---
 drivers/edac/sb_edac.c   |2 --
 drivers/hwmon/coretemp.c |7 +++
 10 files changed, 7 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index bb3ee36..f7c89e2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -99,7 +99,6 @@ struct cpuinfo_x86 {
u16 apicid;
u16 initial_apicid;
u16 x86_clflush_size;
-#ifdef CONFIG_SMP
/* number of cores as seen by the OS: */
u16 booted_cores;
/* Physical processor id: */
@@ -110,7 +109,6 @@ struct cpuinfo_x86 {
u8  compute_unit_id;
/* Index into per_cpu list: */
u16 cpu_index;
-#endif
u32 microcode;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index bae1efe..be16854 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -154,16 +154,14 @@ int amd_get_subcaches(int cpu)
 {
struct pci_dev *link = node_to_amd_nb(amd_get_nb_id(cpu))->link;
unsigned int mask;
-   int cuid = 0;
+   int cuid;
 
if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
return 0;
 
pci_read_config_dword(link, 0x1d4, &mask);
 
-#ifdef CONFIG_SMP
cuid = cpu_data(cpu).compute_unit_id;
-#endif
return (mask >> (4 * cuid)) & 0xf;
 }
 
@@ -172,7 +170,7 @@ int amd_set_subcaches(int cpu, int mask)
static unsigned int reset, ban;
struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu));
unsigned int reg;
-   int cuid = 0;
+   int cuid;
 
if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING) || mask > 0xf)
return -EINVAL;
@@ -190,9 +188,7 @@ int amd_set_subcaches(int cpu, int mask)
pci_write_config_dword(nb->misc, 0x1b8, reg & ~0x18);
}
 
-#ifdef CONFIG_SMP
cuid = cpu_data(cpu).compute_unit_id;
-#endif
mask <<= 4 * cuid;
mask |= (0xf ^ (1 << cuid)) << 26;
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 3524e1f..ff8557e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -148,7 +148,6 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
 
 static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_SMP
/* calling is from identify_secondary_cpu() ? */
if (!c->cpu_index)
return;
@@ -192,7 +191,6 @@ static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
 
 valid_k7:
;
-#endif
 }
 
 static void __cpuinit init_amd_k7(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index aa003b1..ca93cc7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -676,9 +676,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
if (this_cpu->c_early_init)
this_cpu->c_early_init(c);
 
-#ifdef CONFIG_SMP
c->cpu_index = 0;
-#endif
filter_cpuid_features(c, false);
 
setup_smep(c);
@@ -764,10 +762,7 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
c->apicid = 

Re: [PATCH 01/16] hashtable: introduce a small and naive hashtable

2012-08-14 Thread Sasha Levin
On 08/15/2012 01:25 AM, NeilBrown wrote:
> On Tue, 14 Aug 2012 18:24:35 +0200 Sasha Levin 
> wrote:
> 
> 
>> +static inline void hash_init_size(struct hlist_head *hashtable, int bits)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < HASH_SIZE(bits); i++)
>> +		INIT_HLIST_HEAD(hashtable + i);
>> +}
> 
> This seems like an inefficient way to do "memset(hashtable, 0, ...);".
> And in many cases it isn't needed as the hash table is static and initialised
> to zero.
> I note that in the SUNRPC/cache patch you call hash_init(), but in the lockd
> patch you don't.  You don't actually need to in either case.

Agreed that the code will run just fine if we didn't use hash_init().

> I realise that any optimisation here is for code that is only executed once
> per boot, so no big deal, and even the presence of extra code making the
> kernel bigger is unlikely to be an issue.  But I'd at least like to see
> consistency: Either use hash_init everywhere, even when not needed, or only
> use it where absolutely needed which might be no-where because static tables
> are already initialised, and dynamic tables can use GFP_ZERO.

This is a consistency problem. I didn't want to add a module_init() to modules
that didn't have it just to get hash_init() in there.

I'll get it fixed.

> And if you keep hash_init_size I would rather see a memset(0)

My concern with using a memset(0) is that I'm going to break layering.

The hashtable uses hlist. hlist provides us with an entire family of init
functions which I'm supposed to use to initialize hlist heads.

So while a memset(0) will work perfectly here, I consider that cheating - it
results in uglier code that assumes knowledge of hlist internals, and will
probably break as soon as someone tries to do something to hlist.

I can think of several alternatives here, and all of them involve changes to
hlist instead of the hashtable:

 - Remove INIT_HLIST_HEAD()/HLIST_HEAD()/HLIST_HEAD_INIT() and introduce a
   CLEAR_HLIST instead, documenting that it's enough to memset(0) the hlist to
   initialize it properly.
 - Add a block initializer INIT_HLIST_HEADS() or something similar that would
   initialize an array of heads.


Re: [PATCH v2 12/31] arm64: Atomic operations

2012-08-14 Thread Olof Johansson
Hi,

On Tue, Aug 14, 2012 at 06:52:13PM +0100, Catalin Marinas wrote:
> This patch introduces the atomic, mutex and futex operations. Many
> atomic operations use the load-acquire and store-release operations
> which imply barriers, avoiding the need for explicit DMB.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Catalin Marinas 
> ---
>  arch/arm64/include/asm/atomic.h |  306 
> +++
>  arch/arm64/include/asm/futex.h  |  134 +
>  2 files changed, 440 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arm64/include/asm/atomic.h
>  create mode 100644 arch/arm64/include/asm/futex.h
> 
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> new file mode 100644
> index 000..fa60c8b
> --- /dev/null
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -0,0 +1,306 @@
> +/*
> + * Based on arch/arm/include/asm/atomic.h
> + *
> + * Copyright (C) 1996 Russell King.
> + * Copyright (C) 2002 Deep Blue Solutions Ltd.
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_ATOMIC_H
> +#define __ASM_ATOMIC_H
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#define ATOMIC_INIT(i)   { (i) }
> +
> +#ifdef __KERNEL__
> +
> +/*
> + * On ARM, ordinary assignment (str instruction) doesn't clear the local
> + * strex/ldrex monitor on some implementations. The reason we can use it for
> + * atomic_set() is the clrex or dummy strex done on every exception return.
> + */
> +#define atomic_read(v)   (*(volatile int *)&(v)->counter)
> +#define atomic_set(v,i)  (((v)->counter) = (i))
> +
> +/*
> + * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
> + * store exclusive to ensure that these are atomic.  We may loop
> + * to ensure that the update happens.
> + */
> +static inline void atomic_add(int i, atomic_t *v)
> +{
> + unsigned long tmp;
> + int result;
> +
> + asm volatile("// atomic_add\n"
> +"1:  ldxr%w0, [%3]\n"
> +"add %w0, %w0, %w4\n"
> +"stxr%w1, %w0, [%3]\n"
> +"cbnz%w1,1b"

Nit: space before 1b
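
For readers less used to the exclusive-access loop quoted above, a portable
C11 analogue (a sketch, not the arm64 code; sketch_atomic_add() is a made-up
name): load the value, try to store the sum, and retry if the store did not
go through. Relaxed ordering is used because the plain ldxr/stxr pair in
atomic_add() implies no barrier either.

```c
#include <stdatomic.h>

static inline void sketch_atomic_add(int i, atomic_int *v)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);
	int new;

	do {
		/* Add as unsigned so the sum wraps instead of overflowing. */
		new = (int)((unsigned int)old + (unsigned int)i);
		/*
		 * The weak compare-exchange may fail spuriously, much like a
		 * store-exclusive; on failure it reloads 'old' and we retry.
		 */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

The retry branch corresponds to the cbnz back to label 1 in the quoted
assembly.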

[...]

> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> new file mode 100644
> index 000..0745e82
> --- /dev/null
> +++ b/arch/arm64/include/asm/futex.h
> @@ -0,0 +1,134 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_FUTEX_H
> +#define __ASM_FUTEX_H
> +
> +#ifdef __KERNEL__
> +
> +#include 
> +#include 
> +#include 
> +
> +#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg)		\
> + asm volatile(   \
> +"1:  ldaxr   %w1, %2\n"  \
> + insn "\n"   \
> +"2:  stlxr   %w3, %w0, %2\n" \
> +"cbnz%w3, 1b\n"  \
> +"3:  .pushsection __ex_table,\"a\"\n"\
> +".align  3\n"\
> +".quad   1b, 4f, 2b, 4f\n"   \
> +".popsection\n"  \
> +".pushsection .fixup,\"ax\"\n"   \

Moving the exception table below the body of the code makes the flow easier to
read, please do that.

Also, don't you need a barrier here?

> +"4:  mov %w0, %w5\n" \
> +"b   3b\n"   \
> +".popsection"\
> + : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp)   \
> + : "r" (oparg), "Ir" (-EFAULT)  

Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Tejun Heo
Hello, Thomas.

On Wed, Aug 15, 2012 at 01:33:09AM +0200, Thomas Gleixner wrote:
> To convince me to accept your patches you should start answering my
> questions and suggestions seriously in the first place and not
> discarding them upfront as lunatic visions.
> 
> As long as you can't provide a proper counter argument against
> maintaining the timer in the same context as the work, no matter what
> the underlying mechanism to achieve this will be, I'm not going to
> accept any of this hackery neither near next nor mainline.

Sure, that's exactly why the patches are posted for review.  So you're
suggesting that workqueue implement essentially its own timer list - be
that a simple sorted linked list or priority heap.  Am I understanding
you correctly?

If so, we're comparing the following two.

a. Adding IRQSAFE timer.  Runtime cost is one additional if() in timer
   execution path.

b. Implementing workqueue's own timer system which is driven by timer
   so that the timer part can also be protected by the usual workqueue
   synchronization.

To me, #a seems like the better choice here.

delayed_work is one of the more common constructs used widely in the
kernel.  It's often used in device drivers to defer processing to
process context and timer queueing (including modification) is a
frequent operation.

IRQ handlers schedule them, some drivers use it to poll the device,
block layer uses it for most of deferred handling - SCSI layer failing
to issue a command due to resource shortage reschedules delayed_work
repeatedly, and so on.

Essentially, delayed_work might be used for any purpose a timer is
used.  Timer users switch to delayed_work if process context becomes
necessary for whatever reason, so, I don't think we can get away with
simple sorted list implementation.  We might be okay under some
workloads but O(N^2) insertion complexity for something as commonly
used as delayed_work doesn't seem like a good idea to me.

We could go for more involved implementation, say a priority heap or
somewhat simplified version of tvec_base, but that seems like a bad
tradeoff to me.  We would be trading off fairly complex chunk of code
duplicating an existing capability to avoid adding fairly small
feature to the timer.  It will likely be worse than the proper timer
and we have to maintain two chunks of code doing about the same thing
to save single if() in the existing timer code.

Let's see if we can agree on the latter point first.  Do you agree
that it wouldn't be a good idea to implement relatively complex timer
subsystem inside workqueue?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/11] fblog: register one fblog object per framebuffer

2012-08-14 Thread Ryan Mallon
On 14/08/12 21:01, David Herrmann wrote:
> Hi Ryan
> 
> On Mon, Aug 13, 2012 at 1:54 AM, Ryan Mallon  wrote:
>> On 13/08/12 00:53, David Herrmann wrote:
>>>  drivers/video/console/fblog.c | 195 
>>> ++
>>>  1 file changed, 195 insertions(+)
>>>
>>> diff --git a/drivers/video/console/fblog.c b/drivers/video/console/fblog.c
>>> index fb39737..279f4d8 100644
>>> --- a/drivers/video/console/fblog.c
>>> +++ b/drivers/video/console/fblog.c
>>> @@ -23,15 +23,210 @@
>>>   * all fblog instances before running other graphics applications.
>>>   */
>>>
>>> +#define pr_fmt(_fmt) KBUILD_MODNAME ": " _fmt
>>> +
>>> +#include 
>>> +#include 
>>>  #include 
>>> +#include 
>>> +
>>> +enum fblog_flags {
>>> + FBLOG_KILLED,
>>> +};
>>> +
>>> +struct fblog_fb {
>>> + unsigned long flags;
>>
>> Are more flags added in later patches? If not, why not just have:
>>
>>   bool is_killed;
>>
>> ?
> 
> Yes, more are added in patch-6 and patch-8 which includes FBLOG_OPEN,
> FBLOG_SUSPENDED, FBLOG_BLANKED.
> 
>>> +static void fblog_release(struct device *dev)
>>> +{
>>> + struct fblog_fb *fb = to_fblog_dev(dev);
>>> +
>>> + kfree(fb);
>>> + module_put(THIS_MODULE);
>>> +}
>>> +
>>> +static void fblog_do_unregister(struct fb_info *info)
>>> +{
>>> + struct fblog_fb *fb;
>>> +
>>> + fb = fblog_fbs[info->node];
>>> + if (!fb || fb->info != info)
>>> + return;
>>> +
>>> + fblog_fbs[info->node] = NULL;
>>> +
>>> + device_del(&fb->dev);
>>> + put_device(&fb->dev);
>>
>> device_unregister?
> 
> Right, I will replace it.
> 
>>> +}
>>> +
>>> +static void fblog_do_register(struct fb_info *info, bool force)
>>> +{
>>> + struct fblog_fb *fb;
>>> + int ret;
>>> +
>>> + fb = fblog_fbs[info->node];
>>> + if (fb && fb->info != info) {
>>> + if (!force)
>>> + return;
>>> +
>>> + fblog_do_unregister(fb->info);
>>> + }
>>> +
>>> + fb = kzalloc(sizeof(*fb), GFP_KERNEL);
>>> + if (!fb)
>>> + return;
>>> +
>>> + fb->info = info;
>>> + __module_get(THIS_MODULE);
>>> + device_initialize(&fb->dev);
>>> + fb->dev.class = fb_class;
>>> + fb->dev.release = fblog_release;
>>> + dev_set_name(&fb->dev, "fblog%d", info->node);
>>> + fblog_fbs[info->node] = fb;
>>> +
>>> + ret = device_add(&fb->dev);
>>> + if (ret) {
>>> + fblog_fbs[info->node] = NULL;
>>> + set_bit(FBLOG_KILLED, &fb->flags);
>>> + put_device(&fb->dev);
>>
>> kfree(fb); ?
> 
> No. See device_initialize() in ./drivers/base/core.c. After a call to
> device_initialize() the object is ref-counted so put_device() will
> invoke the fblog_release() callback which will call kfree(fb) itself.
> 
>>> + return;
>>> + }
>>> +}
>>> +
>>> +static void fblog_register(struct fb_info *info, bool force)
>>> +{
>>> + mutex_lock(_registration_lock);
>>> + fblog_do_register(info, force);
>>> + mutex_unlock(_registration_lock);
>>> +}
>>> +
>>> +static void fblog_unregister(struct fb_info *info)
>>> +{
>>> + mutex_lock(_registration_lock);
>>> + fblog_do_unregister(info);
>>> + mutex_unlock(_registration_lock);
>>> +}
>>
>> This locking is needlessly heavy, and could easily be pushed down into the
>> fb_do_(un)register functions. It would also help make it clear exactly
>> what the lock is protecting.
> 
> I need to call fblog_do_unregister() from within fblog_do_register().
> I cannot release the locks while calling fblog_do_unregister() so I
> need the unlocked fblog_do_unregister() function. So the locking must
> be in a wrapper function.
> 
> See below for an explanation of the locks.

I meant something like the below. It doesn't actually make the lock much
more fine-grained, but (IMHO) it does make it a bit more clear how the
lock is being used. I also don't think you need to split
device_initialize and device_add, which can make the code a bit simpler:

static void __fblog_unregister(struct fblog_fb *fb)
{
fblog_fbs[fb->info->node] = NULL;   
device_unregister(&fb->dev);
}

static void fblog_unregister(struct fb_info *info)
{
struct fblog_fb *fb;

mutex_lock(_registration_lock);
fb = fblog_fbs[info->node];
if (!fb || fb->info != info) {
mutex_unlock(_registration_lock);
return;
}

__fblog_unregister(fb);
mutex_unlock(_registration_lock);
}

static int fblog_register(struct fb_info *info, bool force)
{
struct fblog_fb *fb;
int ret;

mutex_lock(_registration_lock);
fb = fblog_fbs[info->node];
if (fb && fb->info != info) {
if (!force) {
mutex_unlock(_registration_lock);
return -EEXIST;
}

__fblog_unregister(fb);
}

fb = kzalloc(sizeof(*fb), GFP_KERNEL);
if (!fb)
return;


Re: [PATCH v2 08/31] arm64: CPU support

2012-08-14 Thread Olof Johansson
Hi,

On Tue, Aug 14, 2012 at 06:52:09PM +0100, Catalin Marinas wrote:

> diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
> new file mode 100644
> index 000..ef54125
> --- /dev/null
> +++ b/arch/arm64/include/asm/cputype.h
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +#ifndef __ASM_CPUTYPE_H
> +#define __ASM_CPUTYPE_H
> +
> +#define ID_MIDR_EL1  "midr_el1"
> +#define ID_CTR_EL0   "ctr_el0"
> +
> +#define ID_AA64PFR0_EL1  "id_aa64pfr0_el1"
> +#define ID_AA64DFR0_EL1  "id_aa64dfr0_el1"
> +#define ID_AA64AFR0_EL1  "id_aa64afr0_el1"
> +#define ID_AA64ISAR0_EL1 "id_aa64isar0_el1"
> +#define ID_AA64MMFR0_EL1 "id_aa64mmfr0_el1"
> +
> +#define read_cpuid(reg) ({   \
> + u64 __val;  \
> + asm("mrs %0, " reg : "=r" (__val));  \
> + __val;  \
> +})
> +
> +/*
> + * The CPU ID never changes at run time, so we might as well tell the
> + * compiler that it's constant.  Use this function to read the CPU ID
> + * rather than directly reading processor_id or read_cpuid() directly.
> + */
> +static inline u32 __attribute_const__ read_cpuid_id(void)
> +{
> + return read_cpuid(ID_MIDR_EL1);
> +}
> +
> +static inline u32 __attribute_const__ read_cpuid_cachetype(void)
> +{
> + return read_cpuid(ID_CTR_EL0);
> +}

Is this perhaps a carry-over from arch/arm? Abstracting out read_cpuid()
doesn't seem to buy anything here, just opencode the one-line assembly
in each.

Might as well cleanup the naming a little too while you're at it, i.e.
read_cpu_id() and read_cpu_cachetype().


> diff --git a/arch/arm64/include/asm/procinfo.h b/arch/arm64/include/asm/procinfo.h
> new file mode 100644
> index 000..81fece9
> --- /dev/null
> +++ b/arch/arm64/include/asm/procinfo.h
> @@ -0,0 +1,44 @@
> +/*
> + * Based on arch/arm/include/asm/procinfo.h
> + *
> + * Copyright (C) 1996-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +#ifndef __ASM_PROCINFO_H
> +#define __ASM_PROCINFO_H
> +
> +#ifdef __KERNEL__
> +
> +/*
> + * Note!  struct processor is always defined if we're
> + * using MULTI_CPU, otherwise this entry is unused,
> + * but still exists.

Stale comment?

> + *
> + * NOTE! The following structure is defined by assembly
> + * language, NOT C code.  For more information, check:
> + *  arch/arm/mm/proc-*.S and arch/arm/kernel/head.S

Stale references. Also, no current arm64 implementation uses this. Premature
abstraction perhaps?

> +struct proc_info_list {
> + unsigned int    cpu_val;
> + unsigned int    cpu_mask;
> + unsigned long   __cpu_flush;    /* used by head.S */
> + const char  *cpu_name;
> +};
> +
> +#else/* __KERNEL__ */
> +#include 
> +#warning "Please include asm/elf.h instead"
> +#endif   /* __KERNEL__ */
> +#endif
> diff --git a/arch/arm64/mm/proc-syms.c b/arch/arm64/mm/proc-syms.c
> new file mode 100644
> index 000..2d99ef9
> --- /dev/null
> +++ b/arch/arm64/mm/proc-syms.c
> @@ -0,0 +1,31 @@
> +/*
> + * Based on arch/arm/mm/proc-syms.c
> + *
> + * Copyright (C) 2000-2002 Russell King
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU 

Re: [patch 3/8] procfs: Add ability to plug in auxiliary fdinfo providers

2012-08-14 Thread Al Viro
On Wed, Aug 15, 2012 at 02:21:47AM +0400, Cyrill Gorcunov wrote:
> > Hmm, in the very first versions I used a single ->show method, but
> > then I thought that this does not correlate well with the seq-file idea,
> > where a show/next sequence is called for each record. I'll update (this
> > will certainly make the code simpler, and I'll have to check for seq-file
> > overflow after the seq_printf call so as not to keep printing data
> > once the buffer is already out of space).
> 
> Al, I'll cook the whole series tomorrow and resend it for review,
> also I guess the new show_fdinfo() member in file-operations should
> be guarded with CONFIG_PROC_FS, right?

I seriously doubt that it's worth bothering.  If somebody cares, they
can add making it conditional later.


Re: pull request: wireless 2012-08-14

2012-08-14 Thread David Miller
From: "John W. Linville" 
Date: Tue, 14 Aug 2012 15:02:46 -0400

> Alexey Khoroshilov provides a fix for a potential memory leak in rndis_wlan.
> 
> Bob Copeland gives us an ath5k fix for a lockdep problem.
> 
> Dan Carpenter fixes a signedness mismatch in at76c50x.
> 
> Felix Fietkau corrects a regression caused by an earlier commit that can
> lead to an IRQ storm.
> 
> Lorenzo Bianconi offers a fix for a bad variable initialization in ath9k
> that can cause it to improperly mark decrypted frames.
> 
> Rajkumar Manoharan fixes ath9k to prevent the btcoex time from running
> when the hardware is asleep.
> 
> The remainder are Bluetooth fixes, about which Gustavo says:
> 
>   "Here goes some fixes for 3.6-rc1, there are a few fix to
>   the inquiry code by Ram Malovany, support for 2 new devices,
>   and few others fixes for NULL dereference, possible deadlock
>   and a memory leak."
> 
> Please let me know if there are problems!

Pulled, thanks John.


Re: [PATCH 4/5] drivers/net/ethernet/mellanox/mlx4/mcg.c: fix error return code

2012-08-14 Thread David Miller
From: Julia Lawall 
Date: Tue, 14 Aug 2012 14:58:34 +0200

> From: Julia Lawall 
> 
> Convert a 0 error return code to a negative one, as returned elsewhere in the
> function.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
 ...
> Signed-off-by: Julia Lawall 

Applied.


Re: [PATCH 3/5] drivers/net/ethernet/freescale/fs_enet: fix error return code

2012-08-14 Thread David Miller
From: Julia Lawall 
Date: Tue, 14 Aug 2012 14:58:33 +0200

> From: Julia Lawall 
> 
> Convert a 0 error return code to a negative one, as returned elsewhere in the
> function.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
 ...
> Signed-off-by: Julia Lawall 

Applied.



Re: [PATCH 4/5] drivers/net/ethernet/ti/davinci_cpdma.c: Remove potential NULL dereference

2012-08-14 Thread David Miller
From: Julia Lawall 
Date: Tue, 14 Aug 2012 17:49:47 +0200

> From: Julia Lawall 
> 
> If the NULL test is necessary, the initialization involving a dereference of
> the tested value should be moved after the NULL test.
> 
> The semantic patch that fixes this problem is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Signed-off-by: Julia Lawall 

Applied, thanks.


Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Tejun Heo
Hello,

On Tue, Aug 14, 2012 at 4:46 PM, Thomas Gleixner  wrote:
> Do you really expect that I follow all of kernel dev posts within a
> day of returning from a two weeks vacation?

The head message says what it's based on and names the git branch. I
can't read your mind or know your current state. You could have asked
using a proper sentence.

We can continue this, but I don't think this is leading any place
productive. I'm replying to your other reply about the suggestion
about implementing workqueue's own timerlist. Let's just talk about
that. Please.

Thanks.

-- 
tejun


Re: [PATCH 02/16] user_ns: use new hashtable implementation

2012-08-14 Thread Eric W. Biederman
Sasha Levin  writes:

> Switch user_ns to use the new hashtable implementation. This reduces the 
> amount of
> generic unrelated code in user_ns.

Two concerns here.
1) When adding a new entry you recompute the hash where previously that
   was not done.  I believe that will slow down adding of new entries.

2) Using hash_32 for uids is an interesting choice.  hash_32 discards
   the low bits.  Last I checked for uids the low bits were the bits
   that were most likely to be different and had the most entropy.

   I'm not certain how multiplying by the GOLDEN_RATIO_PRIME_32 will
   affect things but I would be surprised if it shifted all of the
   randomness from the low bits to the high bits.

And just a nit.  struct user is essentially orthogonal to the user namespace
at this point, making the description of the patch a little weird.

Eric

> Signed-off-by: Sasha Levin 
> ---
>  kernel/user.c |   33 +
>  1 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/user.c b/kernel/user.c
> index b815fef..d10c484 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * userns count is 1 for root user, 1 for init_uts_ns,
> @@ -52,13 +53,9 @@ EXPORT_SYMBOL_GPL(init_user_ns);
>   */
>  
>  #define UIDHASH_BITS (CONFIG_BASE_SMALL ? 3 : 7)
> -#define UIDHASH_SZ   (1 << UIDHASH_BITS)
> -#define UIDHASH_MASK (UIDHASH_SZ - 1)
> -#define __uidhashfn(uid) (((uid >> UIDHASH_BITS) + uid) & UIDHASH_MASK)
> -#define uidhashentry(uid) (uidhash_table + __uidhashfn((__kuid_val(uid))))
>  
>  static struct kmem_cache *uid_cachep;
> -struct hlist_head uidhash_table[UIDHASH_SZ];
> +static DEFINE_HASHTABLE(uidhash_table, UIDHASH_BITS)
>  
>  /*
>   * The uidhash_lock is mostly taken from process context, but it is
> @@ -84,22 +81,22 @@ struct user_struct root_user = {
>  /*
>   * These routines must be called with the uidhash spinlock held!
>   */
> -static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
> +static void uid_hash_insert(struct user_struct *up)
>  {
> - hlist_add_head(&up->uidhash_node, hashent);
> + hash_add(uidhash_table, &up->uidhash_node, __kuid_val(up->uid));
>  }
>  
>  static void uid_hash_remove(struct user_struct *up)
>  {
> - hlist_del_init(&up->uidhash_node);
> + hash_del(&up->uidhash_node);
>  }
>  
> -static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
> +static struct user_struct *uid_hash_find(kuid_t uid)
>  {
>   struct user_struct *user;
>   struct hlist_node *h;
>  
> - hlist_for_each_entry(user, h, hashent, uidhash_node) {
> + hash_for_each_possible(uidhash_table, user, h, uidhash_node, __kuid_val(uid)) {
>   if (uid_eq(user->uid, uid)) {
>   atomic_inc(&user->__count);
>   return user;
> @@ -135,7 +132,7 @@ struct user_struct *find_user(kuid_t uid)
>   unsigned long flags;
>  
>   spin_lock_irqsave(&uidhash_lock, flags);
> - ret = uid_hash_find(uid, uidhashentry(uid));
> + ret = uid_hash_find(uid);
>   spin_unlock_irqrestore(&uidhash_lock, flags);
>   return ret;
>  }
> @@ -156,11 +153,10 @@ void free_uid(struct user_struct *up)
>  
>  struct user_struct *alloc_uid(kuid_t uid)
>  {
> - struct hlist_head *hashent = uidhashentry(uid);
>   struct user_struct *up, *new;
>  
>   spin_lock_irq(&uidhash_lock);
> - up = uid_hash_find(uid, hashent);
> + up = uid_hash_find(uid);
>   spin_unlock_irq(&uidhash_lock);
>  
>   if (!up) {
> @@ -176,13 +172,13 @@ struct user_struct *alloc_uid(kuid_t uid)
>* on adding the same user already..
>*/
>   spin_lock_irq(&uidhash_lock);
> - up = uid_hash_find(uid, hashent);
> + up = uid_hash_find(uid);
>   if (up) {
>   key_put(new->uid_keyring);
>   key_put(new->session_keyring);
>   kmem_cache_free(uid_cachep, new);
>   } else {
> - uid_hash_insert(new, hashent);
> + uid_hash_insert(new);
>   up = new;
>   }
>   spin_unlock_irq(&uidhash_lock);
> @@ -196,17 +192,14 @@ out_unlock:
>  
>  static int __init uid_cache_init(void)
>  {
> - int n;
> -
>   uid_cachep = kmem_cache_create("uid_cache", sizeof(struct user_struct),
>   0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
>  
> - for(n = 0; n < UIDHASH_SZ; ++n)
> - INIT_HLIST_HEAD(uidhash_table + n);
> + hash_init(uidhash_table);
>  
>   /* Insert the root user immediately (init already runs as root) */
>   spin_lock_irq(&uidhash_lock);
> - uid_hash_insert(&root_user, uidhashentry(GLOBAL_ROOT_UID));
> + uid_hash_insert(&root_user);
>   spin_unlock_irq(&uidhash_lock);
>  
>   return 0;

Re: [PATCH] rtc: tps65910: Add RTC driver for TPS65910 PMIC RTC

2012-08-14 Thread Andrew Morton
On Thu, 26 Jul 2012 12:05:19 +0530
Venu Byravarasu  wrote:

> TPS65910 PMIC is an MFD with RTC as one of the devices.
> Adding RTC driver for supporting RTC device present
> inside TPS65910 PMIC.
> 
> Only support for RTC alarm is implemented as part of this patch.

It needs a build fix:

drivers/rtc/rtc-tps65910.c: In function 'tps65910_rtc_suspend':
drivers/rtc/rtc-tps65910.c:313: error: request for member 'irqstat' in 
something not a structure or union
drivers/rtc/rtc-tps65910.c: In function 'tps65910_rtc_resume':
drivers/rtc/rtc-tps65910.c:327: error: request for member 'irqstat' in 
something not a structure or union

--- a/drivers/rtc/rtc-tps65910.c~rtc-tps65910-add-rtc-driver-for-tps65910-pmic-rtc-fix
+++ a/drivers/rtc/rtc-tps65910.c
@@ -310,7 +310,7 @@ static int tps65910_rtc_suspend(struct p
 
/* Store current list of enabled interrupts*/
ret = regmap_read(tps->regmap, TPS65910_RTC_INTERRUPTS,
-   &tps->rtc.irqstat);
+   &tps->rtc->irqstat);
if (ret < 0)
return ret;
 
@@ -324,7 +324,7 @@ static int tps65910_rtc_resume(struct pl
 
/* Restore list of enabled interrupts before suspend */
return regmap_write(tps->regmap, TPS65910_RTC_INTERRUPTS,
-   tps->rtc.irqstat);
+   tps->rtc->irqstat);
 }
 
 static const struct dev_pm_ops tps65910_rtc_pm_ops = {


but it still has problems:

drivers/rtc/rtc-tps65910.c:331: warning: initialization from incompatible 
pointer type
drivers/rtc/rtc-tps65910.c:332: warning: initialization from incompatible 
pointer type

fix and resend, please?


Re: [PATCH v2 07/31] arm64: Process management

2012-08-14 Thread Olof Johansson
Hi,

On Tue, Aug 14, 2012 at 06:52:08PM +0100, Catalin Marinas wrote:

> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> new file mode 100644
> index 000..c4a4e1c
> --- /dev/null
> +++ b/arch/arm64/kernel/process.c
> @@ -0,0 +1,416 @@

[...]
> +/*
> + * Function pointers to optional machine specific functions
> + */
> +void (*pm_power_off)(void);
> +EXPORT_SYMBOL(pm_power_off);
> +
> +void (*pm_restart)(const char *cmd);
> +EXPORT_SYMBOL_GPL(pm_restart);
[...]
> +void (*pm_idle)(void) = default_idle;
> +EXPORT_SYMBOL(pm_idle);

Does it really make sense to export these to modules?

I find the powerpc way of having a machine descriptor structure with these
(and other) function pointers in it a bit cleaner, since it gives you
one place to plug it all in. I'd recommend that you consider doing that
here as well, for these three and potentially other cases in the future.

(See arch/powerpc/include/asm/machdep.h, struct machdep_calls).

> +void machine_halt(void)
> +{
> + machine_shutdown();
> + while (1);
> +}
> +
> +void machine_power_off(void)
> +{
> + machine_shutdown();
> + if (pm_power_off)
> + pm_power_off();
> +}

Printing something here along the lines of "System halted, OK to power off"
is useful.


-Olof


Re: linux-next: Tree for Aug 14 (drivers/input/touchscreen/edt-ft5x06.c)

2012-08-14 Thread Simon Budig
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/14/2012 07:59 PM, Randy Dunlap wrote:
> on x86_64, when CONFIG_DEBUG_FS is not enabled:
> 
> drivers/input/touchscreen/edt-ft5x06.c: In function
> 'edt_ft5x06_ts_remove': 
> drivers/input/touchscreen/edt-ft5x06.c:846:14: error: 'struct
> edt_ft5x06_ts_data' has no member named 'raw_buffer'

Yeah, unfortunate oversight on my part.

Guenter Roeck provided a patch for this issue which looks good to me.
(Aug 7th, linux-input mailinglist)

Dmitry: Can you upstream this? Is there any action required on my part?

Thanks,
Simon
- -- 
   Simon Budig                        kernel concepts GmbH
   simon.bu...@kernelconcepts.de      Sieghuetter Hauptweg 48
   +49-271-771091-17                  D-57072 Siegen

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlAq3C4ACgkQO2O/RXesiHAofACgsPG5EcF3fn0me1STCIK5nYwm
4QkAoL+Q30zA0NLNbUSPJIRt6CmR4o91
=y99v
-END PGP SIGNATURE-


Re: [PATCH v2 03/31] arm64: Exception handling

2012-08-14 Thread Thomas Gleixner
On Tue, 14 Aug 2012, Olof Johansson wrote:
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > new file mode 100644
> > index 000..8712a8e
> > --- /dev/null
> > +++ b/arch/arm64/kernel/traps.c
> [...]
> > +DEFINE_SPINLOCK(die_lock);
> 
> Should probably be static.

And RAW_


Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Thomas Gleixner
Tejun,

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote:
> > Just for the record. The thread evolved from here:
> > 
> >   * mod_delayed_work() can't be used from IRQ handlers.
> > 
> > My answer was:
> > 
> > This function does not exist. So what?
> > 
> > Which was completely appropriate as this function does not exist
> > though you used it as a primary argument for your patches.
> 
> I read it as "so, what's wrong with not having mod_delayed_work()?",
> so the response.

Oh well. Your interpretation of "So what?" starts to stress my
patience.

> It exists in wq/for-3.7 and cancel_delayed_work() (the one without
> preceding __) + queue() users have been already converted.
> 
>   http://thread.gmane.org/gmane.linux.kernel/1334546

Do you really expect that I follow all of kernel dev posts within a
day of returning from a two weeks vacation?

Thanks,

tglx


[PATCH V3 2/3] PPC64: Add support for instantiating SML from Open Firmware

2012-08-14 Thread Ashley Lai
This patch instantiates the Stored Measurement Log (SML) and puts the
log address and size in the device tree.

Signed-off-by: Ashley Lai 
---
 arch/powerpc/kernel/prom_init.c |   62 +++
 1 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 0794a30..e144498 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1624,6 +1624,63 @@ static void __init prom_instantiate_rtas(void)
 
 #ifdef CONFIG_PPC64
 /*
+ * Allocate room for and instantiate Stored Measurement Log (SML)
+ */
+static void __init prom_instantiate_sml(void)
+{
+   phandle ibmvtpm_node;
+   ihandle ibmvtpm_inst;
+   u32 entry = 0, size = 0;
+   u64 base;
+
+   prom_debug("prom_instantiate_sml: start...\n");
+
+   ibmvtpm_node = call_prom("finddevice", 1, 1, ADDR("/ibm,vtpm"));
+   prom_debug("ibmvtpm_node: %x\n", ibmvtpm_node);
+   if (!PHANDLE_VALID(ibmvtpm_node))
+   return;
+
+   ibmvtpm_inst = call_prom("open", 1, 1, ADDR("/ibm,vtpm"));
+   if (!IHANDLE_VALID(ibmvtpm_inst)) {
+   prom_printf("opening vtpm package failed (%x)\n", ibmvtpm_inst);
+   return;
+   }
+
+   if (call_prom_ret("call-method", 2, 2, &size,
+ ADDR("sml-get-handover-size"),
+ ibmvtpm_inst) != 0 || size == 0) {
+   prom_printf("SML get handover size failed\n");
+   return;
+   }
+
+   base = alloc_down(size, PAGE_SIZE, 0);
+   if (base == 0)
+   prom_panic("Could not allocate memory for sml\n");
+
+   prom_printf("instantiating sml at 0x%x...", base);
+
+   if (call_prom_ret("call-method", 4, 2, &entry,
+ ADDR("sml-handover"),
+ ibmvtpm_inst, size, base) != 0 || entry == 0) {
+   prom_printf("SML handover failed\n");
+   return;
+   }
+   prom_printf(" done\n");
+
+   reserve_mem(base, size);
+
+   prom_setprop(ibmvtpm_node, "/ibm,vtpm", "linux,sml-base",
+                &base, sizeof(base));
+   prom_setprop(ibmvtpm_node, "/ibm,vtpm", "linux,sml-size",
+                &size, sizeof(size));
+
+   prom_debug("sml base = 0x%x\n", base);
+   prom_debug("sml size = 0x%x\n", (long)size);
+
+   prom_debug("prom_instantiate_sml: end...\n");
+}
+
+/*
  * Allocate room for and initialize TCE tables
  */
 static void __init prom_initialize_tce_table(void)
@@ -2916,6 +2973,11 @@ unsigned long __init prom_init(unsigned long r3, 
unsigned long r4,
prom_instantiate_opal();
 #endif
 
+#ifdef CONFIG_PPC64
+   /* instantiate sml */
+   prom_instantiate_sml();
+#endif
+
/*
 * On non-powermacs, put all CPUs in spin-loops.
 *
-- 
1.7.1




[PATCH] irqdesc: delete now orphaned references to timer_rand_state

2012-08-14 Thread Paul Gortmaker
In commit c5857ccf293 ("random: remove rand_initialize_irq()")
the timer_rand_state was removed from struct irq_desc.  Hence
we can also remove the forward declaration of it and the kernel
doc information now too.

Cc: Jiri Kosina 
Signed-off-by: Paul Gortmaker 
---
 include/linux/irqdesc.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 9a323d1..0ba014c 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -10,12 +10,10 @@
 
 struct irq_affinity_notify;
 struct proc_dir_entry;
-struct timer_rand_state;
 struct module;
 /**
  * struct irq_desc - interrupt descriptor
  * @irq_data:  per irq and chip data passed down to chip functions
- * @timer_rand_state:  pointer to timer rand state struct
  * @kstat_irqs:irq stats per cpu
  * @handle_irq:highlevel irq-events handler
  * @preflow_handler:   handler called before the flow handler (currently used 
by sparc)
-- 
1.7.11.1



[PATCH V3 3/3] drivers/char/tpm: Add securityfs support for event log

2012-08-14 Thread Ashley Lai
This patch retrieves the event log data from the device tree
during file open. The event log data will then be displayed through
securityfs.

Signed-off-by: Ashley Lai 
---
 drivers/char/tpm/Makefile   |5 +++
 drivers/char/tpm/tpm.h  |   12 --
 drivers/char/tpm/tpm_eventlog.h |   15 
 drivers/char/tpm/tpm_of.c   |   73 +++
 4 files changed, 93 insertions(+), 12 deletions(-)
 create mode 100644 drivers/char/tpm/tpm_of.c

diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index 547509d..9080cc4 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -5,6 +5,11 @@ obj-$(CONFIG_TCG_TPM) += tpm.o
 ifdef CONFIG_ACPI
obj-$(CONFIG_TCG_TPM) += tpm_bios.o
tpm_bios-objs += tpm_eventlog.o tpm_acpi.o
+else
+ifdef CONFIG_TCG_IBMVTPM
+   obj-$(CONFIG_TCG_TPM) += tpm_bios.o
+   tpm_bios-objs += tpm_eventlog.o tpm_of.o
+endif
 endif
 obj-$(CONFIG_TCG_TIS) += tpm_tis.o
 obj-$(CONFIG_TCG_TIS_I2C_INFINEON) += tpm_i2c_infineon.o
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 870fde7..f1af738 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -327,15 +327,3 @@ extern int tpm_pm_suspend(struct device *);
 extern int tpm_pm_resume(struct device *);
 extern int wait_for_tpm_stat(struct tpm_chip *, u8, unsigned long,
 wait_queue_head_t *);
-#ifdef CONFIG_ACPI
-extern struct dentry ** tpm_bios_log_setup(char *);
-extern void tpm_bios_log_teardown(struct dentry **);
-#else
-static inline struct dentry ** tpm_bios_log_setup(char *name)
-{
-   return NULL;
-}
-static inline void tpm_bios_log_teardown(struct dentry **dir)
-{
-}
-#endif
diff --git a/drivers/char/tpm/tpm_eventlog.h b/drivers/char/tpm/tpm_eventlog.h
index 8e23ccd..e7da086 100644
--- a/drivers/char/tpm/tpm_eventlog.h
+++ b/drivers/char/tpm/tpm_eventlog.h
@@ -68,4 +68,19 @@ enum tcpa_pc_event_ids {
 };
 
 int read_log(struct tpm_bios_log *log);
+
+#if defined(CONFIG_TCG_IBMVTPM) || defined(CONFIG_TCG_IBMVTPM_MODULE) || \
+   defined(CONFIG_ACPI)
+extern struct dentry **tpm_bios_log_setup(char *);
+extern void tpm_bios_log_teardown(struct dentry **);
+#else
+static inline struct dentry **tpm_bios_log_setup(char *name)
+{
+   return NULL;
+}
+static inline void tpm_bios_log_teardown(struct dentry **dir)
+{
+}
+#endif
+
 #endif
diff --git a/drivers/char/tpm/tpm_of.c b/drivers/char/tpm/tpm_of.c
new file mode 100644
index 000..98ba2bd
--- /dev/null
+++ b/drivers/char/tpm/tpm_of.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2012 IBM Corporation
+ *
+ * Author: Ashley Lai 
+ *
+ * Maintained by: 
+ *
+ * Read the event log created by the firmware on PPC64
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include 
+#include 
+
+#include "tpm.h"
+#include "tpm_eventlog.h"
+
+int read_log(struct tpm_bios_log *log)
+{
+   struct device_node *np;
+   const u32 *sizep;
+   const __be64 *basep;
+
+   if (log->bios_event_log != NULL) {
+   pr_err("%s: ERROR - Eventlog already initialized\n", __func__);
+   return -EFAULT;
+   }
+
+   np = of_find_node_by_name(NULL, "ibm,vtpm");
+   if (!np) {
+   pr_err("%s: ERROR - IBMVTPM not supported\n", __func__);
+   return -ENODEV;
+   }
+
+   sizep = of_get_property(np, "linux,sml-size", NULL);
+   if (sizep == NULL) {
+   pr_err("%s: ERROR - SML size not found\n", __func__);
+   goto cleanup_eio;
+   }
+   if (*sizep == 0) {
+   pr_err("%s: ERROR - event log area empty\n", __func__);
+   goto cleanup_eio;
+   }
+
+   basep = of_get_property(np, "linux,sml-base", NULL);
+   if (basep == NULL) {
+   pr_err("%s: ERROR - SML not found\n", __func__);
+   goto cleanup_eio;
+   }
+
+   of_node_put(np);
+   log->bios_event_log = kmalloc(*sizep, GFP_KERNEL);
+   if (!log->bios_event_log) {
+   pr_err("%s: ERROR - Not enough memory for BIOS measurements\n",
+  __func__);
+   return -ENOMEM;
+   }
+
+   log->bios_event_log_end = log->bios_event_log + *sizep;
+
+   memcpy(log->bios_event_log, __va(be64_to_cpup(basep)), *sizep);
+
+   return 0;
+
+cleanup_eio:
+   of_node_put(np);
+   return -EIO;
+}
-- 
1.7.1
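
Editorial sketch for the read_log() hunk above: the "linux,sml-base"
property is read as a big-endian 64-bit value and converted with
be64_to_cpup() before it is used as an address. The following is a minimal
userspace sketch of that conversion, assuming the property arrives as eight
raw bytes. be64_decode() is a hypothetical stand-in written for this
illustration, not a kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's be64_to_cpup():
 * assemble a 64-bit value from eight big-endian bytes, independent of
 * the host byte order.
 */
static uint64_t be64_decode(const unsigned char *p)
{
	uint64_t v = 0;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];	/* most significant byte comes first */
	return v;
}
```

Because the bytes are combined by shifting rather than by casting the
pointer, the result is the same on little-endian and big-endian hosts.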




Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Thomas Gleixner
On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote:
> > And we have very well worked out mechanisms regarding cross tree
> > changes, i.e. providing minimal trees to pull for other maintainers.
> 
> If you look at the review branches, they're actually structured that
> way so that the timer part can be pulled separately.  If the
> maintainer wants to do that, sure.  If the maintainer thinks routing
> through another tree is fine, that's okay too.  Subsystem boundaries
> are all good and great but it's not some absolute barrier which can't
> be crossed at any cost.

That's not about any cost. You are trying to force stuff into -next
with a THREE workdays notice, just because you think that your stuff
is so important.

Have you ever tried to understand how the kernel development system
works?

> I probably should have written "if the maintainer doesn't object, I
> think it would be easier to route these through wq/for-3.7 as it will
> be the only user for now, blah blah blah" instead and maybe I
> misjudged the character of the changes or the subsystem.  That said, I
> think you're inferring too much.

No, I'm not inferring too much. It's not about what you should have
written.

It's about your general attitude. You really think that I accept all
that stuff just because you add some "blah, blah, blah" to some mail?

Far from it!

To convince me to accept your patches you should start answering my
questions and suggestions seriously in the first place and not
discarding them upfront as lunatic visions.

As long as you can't provide a proper counter argument against
maintaining the timer in the same context as the work, no matter what
the underlying mechanism to achieve this will be, I'm not going to
accept any of this hackery neither near next nor mainline.

Thanks,

tglx


[PATCH V3 1/3] drivers/char/tpm: Add new device driver to support IBM vTPM

2012-08-14 Thread Ashley Lai
This patch adds a new device driver to support IBM virtual TPM
(vTPM) for PPC64.  IBM vTPM is supported through the adjunct
partition with firmware release 740 or higher.  With vTPM
support, each lpar is able to have its own vTPM without the
physical TPM hardware.

This driver provides TPM functionalities by communicating with
the vTPM adjunct partition through Hypervisor calls (Hcalls)
and Command/Response Queue (CRQ) commands.

Signed-off-by: Ashley Lai 
---
 drivers/char/tpm/Kconfig   |8 +
 drivers/char/tpm/Makefile  |1 +
 drivers/char/tpm/tpm.h |1 +
 drivers/char/tpm/tpm_ibmvtpm.c |  749 
 drivers/char/tpm/tpm_ibmvtpm.h |   83 +
 5 files changed, 842 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/tpm/tpm_ibmvtpm.c
 create mode 100644 drivers/char/tpm/tpm_ibmvtpm.h

diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index c4aac48..915875e 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -73,4 +73,12 @@ config TCG_INFINEON
  Further information on this driver and the supported hardware
  can be found at 
http://www.trust.rub.de/projects/linux-device-driver-infineon-tpm/ 

+config TCG_IBMVTPM
+   tristate "IBM VTPM Interface"
+   depends on PPC64
+   ---help---
+ If you have IBM virtual TPM (VTPM) support say Yes and it
+ will be accessible from within Linux.  To compile this driver
+ as a module, choose M here; the module will be called tpm_ibmvtpm.
+
 endif # TCG_TPM
diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index beac52f..547509d 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -11,3 +11,4 @@ obj-$(CONFIG_TCG_TIS_I2C_INFINEON) += tpm_i2c_infineon.o
 obj-$(CONFIG_TCG_NSC) += tpm_nsc.o
 obj-$(CONFIG_TCG_ATMEL) += tpm_atmel.o
 obj-$(CONFIG_TCG_INFINEON) += tpm_infineon.o
+obj-$(CONFIG_TCG_IBMVTPM) += tpm_ibmvtpm.o
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 645136e..870fde7 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -100,6 +100,7 @@ struct tpm_vendor_specific {
bool timeout_adjusted;
unsigned long duration[3]; /* jiffies */
bool duration_adjusted;
+   void *data;
 
wait_queue_head_t read_queue;
wait_queue_head_t int_queue;
diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
new file mode 100644
index 000..efc4ab3
--- /dev/null
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -0,0 +1,749 @@
+/*
+ * Copyright (C) 2012 IBM Corporation
+ *
+ * Author: Ashley Lai 
+ *
+ * Maintained by: 
+ *
+ * Device driver for TCG/TCPA TPM (trusted platform module).
+ * Specifications at www.trustedcomputinggroup.org
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tpm.h"
+#include "tpm_ibmvtpm.h"
+
+static const char tpm_ibmvtpm_driver_name[] = "tpm_ibmvtpm";
+
+static struct vio_device_id tpm_ibmvtpm_device_table[] __devinitdata = {
+   { "IBM,vtpm", "IBM,vtpm"},
+   { "", "" }
+};
+MODULE_DEVICE_TABLE(vio, tpm_ibmvtpm_device_table);
+
+DECLARE_WAIT_QUEUE_HEAD(wq);
+
+/**
+ * ibmvtpm_send_crq - Send a CRQ request
+ * @vdev:  vio device struct
+ * @w1:first word
+ * @w2:second word
+ *
+ * Return value:
+ * 0 - Success
+ * Non-zero - Failure
+ */
+static int ibmvtpm_send_crq(struct vio_dev *vdev, u64 w1, u64 w2)
+{
+   return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, w1, w2);
+}
+
+/**
+ * ibmvtpm_get_data - Retrieve ibm vtpm data
+ * @dev:   device struct
+ *
+ * Return value:
+ * vtpm device struct
+ */
+static struct ibmvtpm_dev *ibmvtpm_get_data(const struct device *dev)
+{
+   struct tpm_chip *chip = dev_get_drvdata(dev);
+   if (chip)
+   return (struct ibmvtpm_dev *)chip->vendor.data;
+   return NULL;
+}
+
+/**
+ * tpm_ibmvtpm_recv - Receive data after send
+ * @chip:  tpm chip struct
+ * @buf:   buffer to read
+ * @count: size of buffer
+ *
+ * Return value:
+ * Number of bytes read
+ */
+static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
+{
+   struct ibmvtpm_dev *ibmvtpm;
+   u16 len;
+
+   ibmvtpm = (struct ibmvtpm_dev *)chip->vendor.data;
+
+   if (!ibmvtpm->rtce_buf) {
+   dev_err(ibmvtpm->dev, "ibmvtpm device is not ready\n");
+   return 0;
+   }
+
+   wait_event_interruptible(wq, ibmvtpm->crq_res.len != 0);
+
+   if (count < ibmvtpm->crq_res.len) {
+   dev_err(ibmvtpm->dev,
+   "Invalid size in recv: count=%ld, crq_size=%d\n",
+   count, 

Re: [PATCH 0/4] promote zcache from staging

2012-08-14 Thread Minchan Kim
Hi Seth,

On Tue, Aug 14, 2012 at 05:18:57PM -0500, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression.  The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5.  This patchset
> > promotes zcache from the staging tree to mainline.
> > 
> > Based on the level of activity and contributions we're seeing from a
> > diverse set of people and interests, I think zcache has matured to the
> > point where it makes sense to promote this out of staging.
> 
> I am wondering if there is any more discussion to be had on
> the topic of promoting zcache.  The discussion got dominated
> by performance concerns, but hopefully my latest performance
> metrics have alleviated those concerns for most and shown
> the continuing value of zcache in both I/O and runtime savings.
> 
> I'm not saying that zcache development is complete by any
> means. There are still many improvements that can be made.
> I'm just saying that I believe it is stable and beneficial
> enough to leave the staging tree.
> 
> Seth

I want to do some cleanup on zcache, but I'm okay with doing it after
promotion if Andrew merges it. I'm not sure he won't mind, though, given the
code quality (not enough comments, poor variable/function names, and a lot
of code duplicated with ramster).
Anyway, I think we should at least unify the common code between zcache and
ramster before promoting. Otherwise, refactoring will be hard because we
would always have to touch both sides for just a cleanup. That means a
zcache contributor doing the cleanup would also need to know ramster well,
which is not desirable.


> 


Re: [PATCH v2 03/31] arm64: Exception handling

2012-08-14 Thread Olof Johansson
Hi,

This one is a bit denser, so just a quick first pass with a couple of minor
comments. I'll revisit the rest.

On Tue, Aug 14, 2012 at 06:52:04PM +0100, Catalin Marinas wrote:

> +el1_sp_pc:
> + /*
> +  *Stack or PC alignment exception handling
> +  */
> + mrs x0, far_el1
> + mov x1, x25
> + mov x2, sp
> + b   do_sp_pc_abort
> +el1_undef:
> + /*
> +  *Undefined instruction
> +  */

Nit: Missing spaces in the comment here and the one above.

> +el0_undef:
> + /*
> +  *Undefined instruction
> +  */
> + mov x0, sp
> + b   do_undefinstr

Here too.

> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> new file mode 100644
> index 000..8712a8e
> --- /dev/null
> +++ b/arch/arm64/kernel/traps.c
[...]
> +DEFINE_SPINLOCK(die_lock);

Should probably be static.



Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Tejun Heo
Hello, Thomas.

On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote:
> Just for the record. The thread evolved from here:
> 
>   * mod_delayed_work() can't be used from IRQ handlers.
> 
> My answer was:
> 
> This function does not exist. So what?
> 
> Which was completely appropriate as this function does not exist
> though you used it as a primary argument for your patches.

I read it as "so, what's wrong with not having mod_delayed_work()?",
so the response.

It exists in wq/for-3.7 and cancel_delayed_work() (the one without
preceding __) + queue() users have been already converted.

  http://thread.gmane.org/gmane.linux.kernel/1334546

> Can you please sit down for a little while and think about your own
> snarkiness and your own tiring behaviour against other kernel
> maintainers?

Believe it or not, I tend to work pretty well with other maintainers
and developers.  You start responding with words like "mess" and
"crap" with condescension sprinkled and expect the conversation to not
escalate?  Let's just stay technical, okay?

Thanks.

-- 
tejun


Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-14 Thread Alex Williamson
On Wed, 2012-08-15 at 02:04 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 14, 2012 at 04:01:15PM -0600, Alex Williamson wrote:
> > On Tue, 2012-08-14 at 15:35 +0300, Avi Kivity wrote:
> > > On 08/12/2012 12:33 PM, Michael S. Tsirkin wrote:
> > > >> 
> > > >> Michael, would the interface be more acceptable to you if we added
> > > >> separate ioctls to allocate and free some representation of an irq
> > > >> source ID, gsi pair?  For instance, an ioctl might return an idr entry
> > > >> for an irq source ID/gsi object which would then be passed as a
> > > >> parameter in struct kvm_irqfd and struct kvm_eoifd so that the object
> > > >> representing the source id/gsi isn't magically freed on its own.  This
> > > >> would also allow us to deassign/close one end and reconfigure it later.
> > > >> Thanks,
> > > >> 
> > > >> Alex
> > > > 
> > > > It's acceptable to me either way. I was only pointing out that as
> > > > designed, the interface looks simple at first but then you find out some
> > > > subtle limitations which are implementation driven. This gives
> > > > an overall feeling the abstraction is too low level.
> > > > 
> > > > If we compare to the existing irqfd, isn't the difference
> > > > simply that irqfd deasserts immediately ATM, while we
> > > > want to delay this until later?
> > > > 
> > > > If yes, then along the lines that you proposed, and combining with my
> > > > idea of tracking deasserts, how do you like the following:
> > > > 
> > > > /* Keep line asserted until guest has handled the interrupt. */
> > > > #define KVM_IRQFD_FLAG_DEASSERT_ON_ACK (1 << 1)
> > > > /* Notify after line is deasserted. */
> > > > #define KVM_IRQFD_FLAG_DEASSERT_EVENTFD (2 << 1)
> > > > 
> > > > struct kvm_irqfd {
> > > > __u32 fd;
> > > > __u32 gsi;
> > > > __u32 flags;
> > > > /* eventfd to notify when line is deasserted */
> > > > __u32 deassert_eventfd;
> > > > __u8  pad[16];
> > > > };
> > > > 
> > > > now the only limitation is that KVM_IRQFD_FLAG_DEASSERT_ON_ACK is only
> > > > effective for level interrupts.
> > > > 
> > > > Notes about lifetime of objects:
> > > > - closing deassert_eventfd does nothing (we can keep
> > > >   reference to it from irqfd so no need for
> > > >   complex polling/flushing scheme)
> > > > - closing irqfd or deasserting dis-associates
> > > >   deassert_eventfd automatically
> > > > - source id is internal to irqfd and goes away with it
> > > > 
> > > > it looks harder to misuse and fits what we want to do nicely,
> > > > and needs less code to implement.
> > > > 
> > > > Avi, what do you think?
> > > 
> > > I think given all the complexity in the separate ioctl approach that
> > > this makes sense.  There are no lifetime issues or code to match the two
> > > eventfds.  Alex, would this API simplify the code?
> > 
> > It does though I'm concerned that it's a very specific solution that
> > only addresses this problem.  Generic userspace eoi/ack is not
> > addressed.  The latest version using separate ioctls does a lot of
> > simplification by exposing irq sourceids.  The bulk of the code there is
> > duplicating what irqfd does just so we can catch the POLLHUP for
> > cleanup.  If there was an easier way to do that, we don't care about
> > POLLIN/POLLOUT, much of the code could be removed.  Alternatively we
> > could make some common infrastructure to simplify both irqfd and
> > irq_ackfd, but how to frame the helpers isn't easy.
> 
> There is a much easier way with a single ioctl.  Don't you see?
> 
> As ack_eventfd pointer becomes part of the irqfd structure now, you
> simply drop the reference together with irqfd.
> In other words you do not care that ack eventfd goes
> away anymore. So no need for POLLHUP handlers, no
> separate DEASSERT that can race with that, etc.
> 
> So all this code just goes away, and it goes away completely, together
> with managing source IDs (source ID becomes an internal optimization to
> avoid spurious EOIs, so no need to expose it to userspace anymore).
> 
> So all we are left with is minimal:
> 1. change irqfds to use a separate source id (can do this
>unconditionally for all irqfds)
> 2. check deassert on ack, if set register ack notifier
> 3. in ack notifier check deassert eventfd, if set signal it
> 4. (optionally) add a flag in irqfd, set on assert, test and clear
>on deassert, and only signal eventfd if it was set
> 
> on top of that we could try to do
> 5. allocate some more source IDs and if they are free try to use them as
>an optimization to avoid atomics

Yes, I understand.  It's simple, it's also very specific to this
problem, and doesn't address generic ack notification.  All of which
I've noted before and I continue to note that v8 offers simplifications
while retaining flexibility.  Least amount of code doesn't really buy us
much if we end up needing to invent 

Re: [PATCH 01/16] hashtable: introduce a small and naive hashtable

2012-08-14 Thread NeilBrown
On Tue, 14 Aug 2012 18:24:35 +0200 Sasha Levin wrote:


> +static inline void hash_init_size(struct hlist_head *hashtable, int bits)
> +{
> + int i;
> +
> + for (i = 0; i < HASH_SIZE(bits); i++)
> + INIT_HLIST_HEAD(hashtable + i);
> +}

This seems like an inefficient way to do "memset(hashtable, 0, ...);".
And in many cases it isn't needed as the hash table is static and initialised
to zero.
I note that in the SUNRPC/cache patch you call hash_init(), but in the lockd
patch you don't.  You don't actually need to in either case.

I realise that any optimisation here is for code that is only executed once
per boot, so no big deal, and even the presence of extra code making the
kernel bigger is unlikely to be an issue.  But I'd at least like to see
consistency: Either use hash_init everywhere, even when not needed, or only
use it where absolutely needed, which might be nowhere because static tables
are already initialised, and dynamic tables can use __GFP_ZERO.

And if you keep hash_init_size I would rather see a memset(0).
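
The memset() point can be checked with a small userspace sketch. It assumes
the usual hlist_head layout (a single "first" pointer) and that a null
pointer is represented as all-zero bytes, which is true on the platforms
Linux runs on. The struct definitions and helper names below are simplified
stand-ins for illustration, not the kernel's own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel's hlist types (assumption). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define HASH_BITS 4
#define HASH_SIZE(bits) ((size_t)1 << (bits))

/* Loop-based initialisation, mirroring the quoted hash_init_size(). */
static void hash_init_loop(struct hlist_head *table, int bits)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < HASH_SIZE(bits); i++)
		table[i].first = NULL;
}

/* memset-based initialisation, as suggested in the reply. */
static void hash_init_memset(struct hlist_head *table, int bits)
{
	assert(table != NULL);
	memset(table, 0, HASH_SIZE(bits) * sizeof(*table));
}

/* Returns 1 if every bucket is empty, 0 otherwise. */
static int hash_all_empty(const struct hlist_head *table, int bits)
{
	size_t i;

	for (i = 0; i < HASH_SIZE(bits); i++)
		if (table[i].first != NULL)
			return 0;
	return 1;
}
```

Both initialisers leave every bucket empty, so the choice between them is
about code size and consistency rather than behaviour.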

Thanks,
NeilBrown




Re: [PATCH v2 11/31] arm64: IRQ handling

2012-08-14 Thread Aaro Koskinen
Hi,

On Tue, Aug 14, 2012 at 06:52:12PM +0100, Catalin Marinas wrote:
> +void handle_IRQ(unsigned int irq, struct pt_regs *regs)
> +{
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + irq_enter();
> +
> + /*
> +  * Some hardware gives randomly wrong interrupts.  Rather
> +  * than crashing, do something sensible.
> +  */
> + if (unlikely(irq >= nr_irqs)) {
> + if (printk_ratelimit())
> + pr_warning("Bad IRQ%u\n", irq);

I guess pr_warn_ratelimited() should be used for new code.

(See include/linux/printk.h, "Please don't use printk_ratelimit()...")
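
As far as I recall, the reason given in printk.h is that printk_ratelimit()
shares one rate-limit state across all of its callers, while the
*_ratelimited() helpers keep a private state per call site. The following is
a rough userspace model of that difference, not kernel code: struct rl_state,
rl_allow() and warn_ratelimited() are hypothetical names, and time is passed
in as an explicit tick so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of a rate-limit window; this is not the
 * kernel's struct ratelimit_state.
 */
struct rl_state {
	long window_start;	/* tick at which the current window began */
	int interval;		/* window length in ticks */
	int burst;		/* messages allowed per window */
	int printed;		/* messages emitted in the current window */
};

/* Returns 1 if a message may be emitted at tick "now", 0 if suppressed. */
static int rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL && rs->interval > 0);
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;	/* start a fresh window */
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return 0;
	rs->printed++;
	return 1;
}

/*
 * Per-call-site limiting: each expansion of this macro owns a private
 * static rl_state, so one noisy call site cannot silence another.
 */
#define warn_ratelimited(now, ...)					\
	do {								\
		static struct rl_state rs_ = { 0, 5, 2, 0 };		\
		if (rl_allow(&rs_, (now)))				\
			fprintf(stderr, __VA_ARGS__);			\
	} while (0)
```

With a single shared rl_state, a burst from one caller uses up the budget
for everyone; with one state per call site, each caller gets its own budget.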

A.


Re: [PATCH v2 1/2] ARM: kirkwood: DT board setup for Seagate FreeAgent Dockstar

2012-08-14 Thread Jason Cooper
On Tue, Aug 14, 2012 at 10:43:41PM +0200, Sebastian Hesselbarth wrote:
> This adds a DT compatible board specific setup for the Seagate
> FreeAgent Dockstar.
> 
> Signed-off-by: Sebastian Hesselbarth 
> ---
> Cc: Jason Cooper 
> Cc: Andrew Lunn 
> Cc: Russell King 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> 
> v2: rebased on git://git.infradead.org/users/jcooper/linux.git fixes-for-v3.6

patches [1/2] and [2/2] of this series applied to:

  git://git.infradead.org/users/jcooper/linux.git boards-for-v3.7

patch [2/2] depends upon:

  bda63c1 ARM: Kirkwood: fix Makefile.boot

which should be in an upcoming v3.6-rc

thx,

Jason.


[PATCH V3 0/3] tpm: Add new vTPM device driver for PPC64

2012-08-14 Thread Ashley Lai
Change log V3:
- Replaced TPM_NO_EVENT_LOG macro with stubs
- Removed tpm_noeventlog.c file 
- Called of_node_put() before return in tpm_of.c

Change log V2:
- Removed unnecessary tpm_bios_log_setup and tpm_bios_log_teardown
  functions in tpm_eventlog.h (patch 3/3).
- Added more descriptions on vTPM (patch 1/3).

These patches add support for IBM vTPM for PPC64. This new device driver
works on firmware that supports vTPM (firmware release 740 or higher).

Tested on Power7+ system with firmware level ZM770_001.

Applied to Kent Yoder tree at:
https://github.com/shpedoikal/linux/tree/v3.6-rc1-tpmdd-staging

Ashley Lai (3):
  drivers/char/tpm: Add new device driver to support IBM vTPM
  PPC64: Add support for instantiating SML from Open Firmware
  drivers/char/tpm: Add securityfs support for event log

 arch/powerpc/kernel/prom_init.c |   62 
 drivers/char/tpm/Kconfig|8 +
 drivers/char/tpm/Makefile   |6 +
 drivers/char/tpm/tpm.h  |   13 +-
 drivers/char/tpm/tpm_eventlog.h |   15 +
 drivers/char/tpm/tpm_ibmvtpm.c  |  749 +++
 drivers/char/tpm/tpm_ibmvtpm.h  |   83 +
 drivers/char/tpm/tpm_of.c   |   73 
 8 files changed, 997 insertions(+), 12 deletions(-)
 create mode 100644 drivers/char/tpm/tpm_ibmvtpm.c
 create mode 100644 drivers/char/tpm/tpm_ibmvtpm.h
 create mode 100644 drivers/char/tpm/tpm_of.c




Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Thomas Gleixner
Tejun,

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 14, 2012 at 10:43:30PM +0200, Thomas Gleixner wrote:
> > > It makes the workqueue users messy.  It's difficult to get completely
> > > correct and subtle errors are difficult to detect / verify.
> > 
> > Ah, the function which does not exist makes the users
> > messy. Interesting observation.
> 
> Can we get a little less snarky please?  It's tiring.

Can you please try to answer my questions instead of throwing random
blurb into my direction?

Just for the record. The thread evolved from here:

  * mod_delayed_work() can't be used from IRQ handlers.

My answer was:

This function does not exist. So what?

Which was completely appropriate as this function does not exist
though you used it as a primary argument for your patches.

Now your answer to my reply was:

  It makes the workqueue users messy.  It's difficult to get
  completely correct and subtle errors are difficult to
  detect / verify.

Can you please point out any relevance to my question which would have
prevented me from writing the following?

Ah, the function which does not exist makes the users
  messy. Interesting observation.

So instead of saying that you wrote an utter nonsense reply, you
accuse me of being obnoxious:

  Can we get a little less snarky please?  It's tiring.

Can you please sit down for a little while and think about your own
snarkiness and your own tiring behaviour against other kernel
maintainers?

Thanks,

tglx


[PATCH][RFC][Update] PM / Runtime: Introduce driver runtime PM work routine

2012-08-14 Thread Rafael J. Wysocki
On Tuesday, August 14, 2012, Rafael J. Wysocki wrote:
> On Monday, August 13, 2012, Alan Stern wrote:
> > On Mon, 13 Aug 2012, Rafael J. Wysocki wrote:
> > 
> > > > I guess the best we can say is that if you call pm_runtime_barrier()  
> > > > after updating the dev_pm_ops method pointers then after the barrier
> > > > returns, the old method pointers will not be invoked and the old method
> > > > routines will not be running.  So we need an equivalent guarantee with
> > > > regard to the pm_runtime_work pointer.  (Yes, we could use a better 
> > > > name for that pointer.)
> > > > 
> > > > Which means the code in the patch isn't quite right, because it saves 
> > > > the pm_runtime_work pointer before calling rpm_resume().  Maybe we 
> > > > should avoid looking at the pointer until rpm_resume() returns.
> > > 
> > > Yes, we can do that.
> > > 
> > > Alternatively, we can set power.work_in_progress before calling
> > > rpm_resume(dev, 0) (i.e. regard the resume as a part of the work) to make
> > > the barrier wait for all of it to complete.
> > 
> > Yep, that would work.  In fact, I did it that way in the proposed code 
> > posted earlier in this thread.  (But that was just on general 
> > principles, not because I had this particular race in mind.)
> 
> OK
> 
> I need to prepare a new patch now, but first I'll send a couple of (minor)
> fixes for the core runtime PM code.

The new patch is appended along with the "motivation" part of the changelog.
It is on top of the following three patches posted earlier:

https://patchwork.kernel.org/patch/1323641/
https://patchwork.kernel.org/patch/1323661/
https://patchwork.kernel.org/patch/1323651/

I changed the new callback's name to .pm_async_work() to avoid the name
conflict with the pm_runtime_work() function.  I don't have a better idea
for its name at the moment.

Thanks,
Rafael


---
Unfortunately, pm_runtime_get() is not a very useful interface,
because if the device is not in the "active" state already, it
only queues up a work item supposed to resume the device.  Then,
the caller doesn't really know when the device is going to be
resumed which makes it difficult to synchronize future device
accesses with the resume completion.

In that case, if the caller is the device's driver, the most
straightforward way for it to cope with the situation is to use its
.runtime_resume() callback for data processing unrelated to the
resume itself, but the correctness of this is questionable.  Namely,
the driver's .runtime_resume() callback need not be executed directly
by the core and it may be run from within a subsystem or PM domain
callback.  Then, the data processing carried out by the driver's
callback may disturb the subsystem's or PM domain's functionality
(for example, the subsystem may still be unready for the device to
process I/O when the driver's callback is about to return).  Besides,
the .runtime_resume() callback is not supposed to do anything beyond
what is needed to resume the device.

For this reason, it appears to be necessary to introduce a mechanism
by which device drivers may schedule the execution of certain code
(say a procedure) to occur when the device in question is known to be
in the "active" state (preferably, as soon as it has been resumed).
Thus add a new runtime PM callback, .pm_async_work(), to struct
device_driver that will be executed along with the asynchronous
resume if pm_runtime_get() returns 0 (it may be executed once for
multiple subsequent invocations of pm_runtime_get() for the same
device, but if at least one of them returns 0, .pm_async_work() will
be executed at least once).

Additionally, define pm_runtime_get_nowork() that won't cause
the driver's .pm_async_work() callback to be executed.
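
The intended behaviour described above can be modelled in userspace roughly
as follows. This is only a sketch under stated assumptions, not the patch's
implementation: every name prefixed with model_ is hypothetical, the work
queue is reduced to a single function the caller invokes by hand, and the
model assumes that the get operation returns 1 when the device is already
active and 0 when a resume has been queued.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { MODEL_SUSPENDED, MODEL_ACTIVE };

/* Hypothetical, heavily reduced model of a device with runtime PM. */
struct model_dev {
	enum model_state state;
	int resume_queued;	/* an asynchronous resume is pending */
	int run_async_work;	/* callback requested for that resume */
	int work_calls;		/* how often the callback has run */
	void (*pm_async_work)(struct model_dev *dev);
};

/* Models pm_runtime_get(): queue a resume and request the callback. */
static int model_pm_runtime_get(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->state == MODEL_ACTIVE)
		return 1;		/* nothing queued, no callback */
	dev->resume_queued = 1;
	dev->run_async_work = 1;
	return 0;
}

/* Models pm_runtime_get_nowork(): queue a resume, never the callback. */
static int model_pm_runtime_get_nowork(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->state == MODEL_ACTIVE)
		return 1;
	dev->resume_queued = 1;
	return 0;
}

/* Models the queued work item: resume first, then run the callback once. */
static void model_pm_work(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->resume_queued)
		return;
	dev->resume_queued = 0;
	dev->state = MODEL_ACTIVE;
	if (dev->run_async_work && dev->pm_async_work) {
		dev->run_async_work = 0;
		dev->pm_async_work(dev);	/* device is known to be active */
	}
}

/* Example callback: just count invocations. */
static void model_count_work(struct model_dev *dev)
{
	dev->work_calls++;
}
```

In this model, two get calls before the work item runs result in a single
callback invocation, which matches the "may be executed once for multiple
subsequent invocations" wording of the changelog.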

This version of the patch doesn't include any documentation updates.

No sign-off yet.
---
 drivers/base/power/runtime.c |  111 ++-
 include/linux/device.h   |2 
 include/linux/pm.h   |2 
 include/linux/pm_runtime.h   |6 ++
 4 files changed, 88 insertions(+), 33 deletions(-)

Index: linux/include/linux/device.h
===
--- linux.orig/include/linux/device.h
+++ linux/include/linux/device.h
@@ -203,6 +203,7 @@ extern struct klist *bus_get_device_klis
  * automatically.
  * @pm:Power management operations of the device which matched
  * this driver.
+ * @pm_async_work: Called after asynchronous runtime resume of the device.
  * @p: Driver core's private data, no one other than the driver
  * core can touch this.
  *
@@ -232,6 +233,7 @@ struct device_driver {
const struct attribute_group **groups;
 
const struct dev_pm_ops *pm;
+   void (*pm_async_work) (struct device *dev);
 
struct driver_private *p;
 };
Index: linux/include/linux/pm.h
===
--- 

Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

2012-08-14 Thread Olof Johansson
Hi,


On Tue, Aug 14, 2012 at 06:52:03PM +0100, Catalin Marinas wrote:

> +Before jumping into the kernel, the following conditions must be met:
> +
> +- Quiesce all DMA capable devices so that memory does not get
> +  corrupted by bogus network packets or disk data.  This will save
> +  you many hours of debug.
> +
> +- Primary CPU general-purpose register settings
> +  x0 = physical address of device tree blob (dtb) in system RAM.
> +
> +- CPU mode
> +  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
> +  IRQ and FIQ).
> +  The CPU must be in either EL2 (RECOMMENDED in order to have access to
> +  the virtualisation extensions) or non-secure EL1.
> +
> +- Caches, MMUs
> +  The MMU must be off.
> +  Instruction cache may be on or off.
> +  Data cache must be off and invalidated.
> +
> +- Architected timers
> +  CNTFRQ must be programmed with the timer frequency.
> +  If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0)
> +  set where available.
> +
> +- Coherency
> +  All CPUs to be booted by the kernel must be part of the same coherency
> +  domain on entry to the kernel.  This may require IMPLEMENTATION DEFINED
> +  initialisation to enable the receiving of maintenance operations on
> +  each CPU.
> +
> +- System registers
> +  All writable architected system registers at the exception level where
> +  the kernel image will be entered must be initialised by software at a
> +  higher exception level to prevent execution in an UNKNOWN state.

Given the recent development of ARM platforms, you might want to mandate
the state of IOMMUs as well (they should probably be off, since there
should be no active DMA activity). Graphics would be the exception to
this, since if you want to keep scanning out a splash screen, you'll
have to keep doing DMA...

> +- The primary CPU must jump directly to the first instruction of the
> +  kernel image.  The device tree blob passed by this CPU must contain
> +  for each CPU node:
> +
> +1. An 'enable-method' property. Currently, the only supported value
> +   for this field is the string "spin-table".
> +
> +2. A 'cpu-release-addr' property identifying a 64-bit,
> +   zero-initialised memory location.

These would be good to have documented in the
Documentation/devicetree/bindings hierarchy as well.
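
For readers unfamiliar with "spin-table", here is a rough sketch of what
a secondary CPU's holding pen does with the 'cpu-release-addr' location.
The quoted excerpt only says the location is 64-bit and zero-initialised;
the polling behaviour below follows the usual spin-table convention, and
the function name and the bounded poll count are assumptions made so the
example terminates in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical holding-pen loop for a secondary CPU using "spin-table".
 * Polls the 64-bit cpu-release-addr location until it is non-zero and
 * returns the value found there (the address the CPU would jump to).
 * Returns 0 if max_polls reads all saw zero; a real pen would not give up.
 */
static uint64_t toy_spin_table_wait(const volatile uint64_t *cpu_release_addr,
				    unsigned long max_polls)
{
	assert(cpu_release_addr != NULL);

	while (max_polls--) {
		uint64_t entry = *cpu_release_addr;

		if (entry != 0)
			return entry;
		/* A real implementation would wait (e.g. WFE) between reads. */
	}
	return 0;
}
```

The zero-initialisation requirement is what makes this safe: a secondary
CPU that starts polling before the kernel has written an entry point
simply keeps waiting.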

> index 000..d766493
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,26 @@
> +/*
> + * Based on arch/arm/include/asm/setup.h
> + *
> + * Copyright (C) 1997-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include 
> +
> +#define COMMAND_LINE_SIZE 1024

Probably not a huge deal, and other architectures seem to be all over
the map on this, but you might want to go with a larger value now rather
than later. 2048 or 4096 perhaps?

> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> new file mode 100644
> index 000..34ccdc0
> --- /dev/null
> +++ b/arch/arm64/kernel/head.S

[...]

> +/*
> + * Setup common bits before finally enabling the MMU. Essentially this is just
> + * loading the page table pointer and vector base registers.
> + *
> + * On entry to this code, x0 must contain the SCTLR_EL1 value for turning on
> + * the MMU.
> + */
> +__enable_mmu:

ENTRY()?

> + ldr x5, =vectors
> + msr vbar_el1, x5
> + msr ttbr0_el1, x25  // load TTBR0
> + msr ttbr1_el1, x26  // load TTBR1
> + isb
> + b   __turn_mmu_on
> +ENDPROC(__enable_mmu)

...or just END()? Same for a few of the other functions below.

> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> new file mode 100644
> index 000..f25186f
> --- /dev/null
> +++ b/arch/arm64/kernel/setup.c

[...]

> +static void __init setup_processor(void)
> +{
> + struct proc_info_list *list;
> +
> + /*
> +  * locate processor in the list of supported processor
> +  * types.  The linker builds this table for us from the
> +  * entries in arch/arm/mm/proc.S
> +  */

Probably from arch/arm64/... somewhere?


[...]

> + printk("CPU: %s [%08x] revision %d\n",
> +cpu_name, read_cpuid_id(), read_cpuid_id() & 15);
> +
> + sprintf(init_utsname()->machine, "aarch64");

> + initial_boot_params = devtree;
> + dt_root = 

Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-14 Thread Michael S. Tsirkin
On Tue, Aug 14, 2012 at 04:01:15PM -0600, Alex Williamson wrote:
> On Tue, 2012-08-14 at 15:35 +0300, Avi Kivity wrote:
> > On 08/12/2012 12:33 PM, Michael S. Tsirkin wrote:
> > >> 
> > >> Michael, would the interface be more acceptable to you if we added
> > >> separate ioctls to allocate and free some representation of an irq
> > >> source ID, gsi pair?  For instance, an ioctl might return an idr entry
> > >> for an irq source ID/gsi object which would then be passed as a
> > >> parameter in struct kvm_irqfd and struct kvm_eoifd so that the object
> > >> representing the source id/gsi isn't magically freed on it's own.  This
> > >> would also allow us to deassign/close one end and reconfigure it later.
> > >> Thanks,
> > >> 
> > >> Alex
> > > 
> > > It's acceptable to me either way. I was only pointing out that as
> > > designed, the interface looks simple at first but then you find out some
> > > subtle limitations which are implementation driven. This gives
> > > an overall feeling the abstraction is too low level.
> > > 
> > > If we compare to the existing irqfd, isn't the difference
> > > simply that irqfd deasserts immediately ATM, while we
> > > want to delay this until later?
> > > 
> > > If yes, then along the lines that you proposed, and combining with my
> > > idea of tracking deasserts, how do you like the following:
> > > 
> > > /* Keep line asserted until guest has handled the interrupt. */
> > > #define KVM_IRQFD_FLAG_DEASSERT_ON_ACK (1 << 1)
> > > /* Notify after line is deasserted. */
> > > #define KVM_IRQFD_FLAG_DEASSERT_EVENTFD (1 << 2)
> > > 
> > >   struct kvm_irqfd {
> > >   __u32 fd;
> > >   __u32 gsi;
> > >   __u32 flags;
> > >   /* eventfd to notify when line is deasserted */
> > >   __u32 deassert_eventfd;
> > >   __u8  pad[16];
> > >   };
> > > 
> > > now the only limitation is that KVM_IRQFD_FLAG_DEASSERT_ON_ACK is only
> > > effective for level interrupts.
> > > 
> > > Notes about lifetime of objects:
> > >   - closing deassert_eventfd does nothing (we can keep
> > > reference to it from irqfd so no need for
> > >   complex polling/flushing scheme)
> > >   - closing irqfd or deasserting dis-associates
> > > deassert_eventfd automatically
> > >   - source id is internal to irqfd and goes away with it
> > > 
> > > it looks harder to misuse and fits what we want to do nicely,
> > > and needs less code to implement.
> > > 
> > > Avi, what do you think?
> > 
> > I think given all the complexity in the separate ioctl approach that
> > this makes sense.  There are no lifetime issues or code to match the two
> > eventfds.  Alex, would this API simplify the code?
> 
> It does though I'm concerned that it's a very specific solution that
> only addresses this problem.  Generic userspace eoi/ack is not
> addressed.  The latest version using separate ioctls does a lot of
> simplification by exposing irq sourceids.  The bulk of the code there is
> duplicating what irqfd does just so we can catch the POLLHUP for
> cleanup.  If there was an easier way to do that, we don't care about
> POLLIN/POLLOUT, much of the code could be removed.  Alternatively we
> could make some common infrastructure to simplify both irqfd and
> irq_ackfd, but how to frame the helpers isn't easy.

There is an easier way: a single ioctl.  Don't you see?

As ack_eventfd pointer becomes part of the irqfd structure now, you
simply drop the reference together with irqfd.
In other words you do not care that ack eventfd goes
away anymore. So no need for POLLHUP handlers, no
separate DEASSERT that can race with that, etc.

So all this code just goes away, and it goes away completely, together
with managing source IDs (source ID becomes an internal optimization to
avoid spurious EOIs, so no need to expose it to userspace anymore).

So all we are left with is minimal:
1. change irqfds to use a separate source id (can do this
   unconditionally for all irqfds)
2. check deassert on ack, if set register ack notifier
3. in ack notifier check deassert eventfd, if set signal it
4. (optionally) add a flag in irqfd, set on assert, test and clear
   on deassert, and only signal eventfd if it was set

on top of that we could try to do
5. allocate some more source IDs and if they are free try to use them as
   an optimization to avoid atomics


> > Yet another option was raised in the past, and that was exiling ioapic
> > and pic to userspace.  This moves the entire issue to userspace.  The
> > cost is a new interface that implements the APIC bus (between APIC and
> > IOAPIC) and the INTACK sequence (between APIC and PIC), and potential
> > for performance regressions due to the PIC, IOAPIC, and PIT being in
> > userspace.  We would still have to keep the IOAPIC/PIC in the kernel,
> > but no new features would be added.
> 
> Doesn't this assure a performance regression or are we assuming anywhere
> we care about performance we're using MSI?  Thanks,
> 
> Alex

Re: [patch net-next v2 01/15] net: introduce upper device lists

2012-08-14 Thread Stephen Hemminger
On Tue, 14 Aug 2012 23:33:44 +0100
Ben Hutchings  wrote:

> On Tue, 2012-08-14 at 17:05 +0200, Jiri Pirko wrote:
> > These lists are supposed to store pointers to all upper devices.
> > Eventually they will replace the dev->master pointer, which is used for
> > bonding, bridge and team but cannot be used for vlan or macvlan, where
> > multiple upper devices might be present. If an upper link is a
> > replacement for dev->master, it is marked with the "master" flag.
> 
> Something I found interesting is that the dev->master pointer and now
> netdev_master_upper_dev_get{,_rcu}() are hardly used by the stacked
> drivers that set the master.  They also have to set an rx_handler on the
> lower device (which is itself mutually exclusive) which gets its own
> context pointer (rx_handler_data).
> 
> Instead, the master pointer is mostly used by device drivers to find out
> about a bridge or bonding device above *their* devices.  And that seems
> to work only for those specific device drivers, not e.g. openvswitch or
> team.  I wonder if we could find a better way to encapsulate the things
> they want to do, in a later step (not holding up this change!).

The concept of master is very useful to user-level config things like
Vyatta for seeing the parent/child relationship. Since it is in the ABI
now, it must stay.
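
To make the data structure under discussion concrete, here is a minimal
user-space sketch of a per-device upper list with a "master" flag.  It is
not the code from the patch: the toy_* types and helpers are hypothetical
and only loosely modelled on the names mentioned in the thread.  A lower
device may have many upper devices, but at most one marked as master.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_netdev;

/* One entry in a lower device's list of upper devices. */
struct toy_upper {
	struct toy_netdev *dev;
	bool master;		/* this link plays the role of dev->master */
	struct toy_upper *next;
};

struct toy_netdev {
	const char *name;
	struct toy_upper *upper_list;
};

/* Return the master upper device of @lower, or NULL if it has none. */
static struct toy_netdev *toy_master_upper_dev_get(const struct toy_netdev *lower)
{
	for (const struct toy_upper *u = lower->upper_list; u; u = u->next)
		if (u->master)
			return u->dev;
	return NULL;
}

/* Link @upper above @lower; only one master link is allowed per lower. */
static int toy_upper_dev_link(struct toy_netdev *lower,
			      struct toy_netdev *upper, bool master)
{
	struct toy_upper *u;

	assert(lower && upper && lower != upper);

	if (master && toy_master_upper_dev_get(lower))
		return -EBUSY;

	u = malloc(sizeof(*u));
	if (!u)
		return -ENOMEM;

	u->dev = upper;
	u->master = master;
	u->next = lower->upper_list;
	lower->upper_list = u;
	return 0;
}
```

In this model eth0 can carry several vlan uppers and one bond master at
the same time; a second master link is refused with -EBUSY, which keeps
the single parent/child view that user-level tools rely on.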

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers

2012-08-14 Thread Tejun Heo
Hello, Thomas.

On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote:
> And we have very well worked out mechanisms regarding cross tree
> changes, i.e. providing minimal trees to pull for other maintainers.

If you look at the review branches, they're actually structured that
way so that the timer part can be pulled separately.  If the
maintainer wants to do that, sure.  If the maintainer thinks routing
through another tree is fine, that's okay too.  Subsystem boundaries
are all good and great but it's not some absolute barrier which can't
be crossed at any cost.

> > If you're upset about the style of the ping, I apologize.  I'll try
> > to be more sensitive when pinging you.
> 
> It's not about me. You are trying to play the system.

Thomas, I wasn't trying to get it through behind your back.  You have
been notified clearly multiple times and have ample opportunities to
object and suggest different ways if you don't like whatever is going
on.

I probably should have written "if the maintainer doesn't object, I
think it would be easier to route these through wq/for-3.7 as it will
be the only user for now, blah blah blah" instead and maybe I
misjudged the character of the changes or the subsystem.  That said, I
think you're inferring too much.

Thanks.

-- 
tejun


Re: Backports mailing list

2012-08-14 Thread Luis R. Rodriguez
On Tue, Aug 14, 2012 at 9:33 AM, Luis R. Rodriguez  wrote:
> On Tue, Aug 14, 2012 at 9:00 AM, Luis R. Rodriguez  wrote:
>> For more details please see:
>>
>> http://www.do-not-panic.com/2012/08/automatically-backporting-linux-kernel.html
>> http://www.do-not-panic.com/2012/08/optimizing-backporting-collateral.html
>>
>>   Luis
>
> All that said, please use the shiny new mailing list:
> backpo...@vger.kernel.org for patches for compat / compat-drivers.

Furthermore, the project git tree has been renamed to
compat-drivers as well, and the new wiki page should be used
moving forward:

https://backports.wiki.kernel.org
git://github.com/mcgrof/compat-drivers.git

  Luis


Re: [Qemu-devel] [PATCH v8] kvm: notify host when the guest is panicked

2012-08-14 Thread Anthony Liguori
Marcelo Tosatti  writes:

> On Tue, Aug 14, 2012 at 02:35:34PM -0500, Anthony Liguori wrote:
>> Marcelo Tosatti  writes:
>> 
>> > On Tue, Aug 14, 2012 at 01:53:01PM -0500, Anthony Liguori wrote:
>> >> Marcelo Tosatti  writes:
>> >> 
>> >> > On Tue, Aug 14, 2012 at 05:55:54PM +0300, Yan Vugenfirer wrote:
>> >> >> 
>> >> >> On Aug 14, 2012, at 1:42 PM, Jan Kiszka wrote:
>> >> >> 
>> >> >> > On 2012-08-14 10:56, Daniel P. Berrange wrote:
>> >> >> >> On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
>> >> >> >>> On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
>> >> >>  We can know the guest is panicked when the guest runs on xen.
>> >> >>  But we do not have such feature on kvm.
>> >> >>  
>> >> >>  Another purpose of this feature is: management app(for example:
>> >> >>  libvirt) can do auto dump when the guest is panicked. If management
>> >> >>  app does not do auto dump, the guest's user can do dump by hand if
>> >> >>  he sees the guest is panicked.
>> >> >>  
>> >> >>  We have three solutions to implement this feature:
>> >> >>  1. use vmcall
>> >> >>  2. use I/O port
>> >> >>  3. use virtio-serial.
>> >> >>  
>> >> >>  We have decided to avoid touching hypervisor. The reason why I
>> >> >>  choose the I/O port is:
>> >> >>  1. it is easier to implement
>> >> >>  2. it does not depend on any virtual device
>> >> >>  3. it can work when starting the kernel
>> >> >> >>> 
>> >> >> >>> How about searching for the "Kernel panic - not syncing" string 
>> >> >> >>> in the guest's serial output? Say libvirtd could take an action upon
>> >> >> >>> that?
>> >> >> >> 
>> >> >> >> No, this is not satisfactory. It depends on the guest OS being
>> >> >> >> configured to use the serial port for console output which we
>> >> >> >> cannot mandate, since it may well be required for other purposes.
>> >> >> > 
>> >> >> Please don't forget Windows guests, there is no console and no "Kernel 
>> >> >> Panic" string ;)
>> >> >> 
>> >> >> What I used for debugging purposes on Windows guest is to register a 
>> >> >> bugcheck callback in virtio-net driver and write 1 to VIRTIO_PCI_ISR 
>> >> >> register.
>> >> >> 
>> >> >> Yan. 
>> >> >
>> >> > Considering whether a "panic-device" should cover other OSes is also
>> >> > something to consider. Even for Linux, is "panic" the only case which
>> >> > should be reported via the mechanism? What about oopses without panic? 
>> >> >
>> >> > Is the mechanism general enough for supporting new events, etc.
>> >> 
>> >> Hi,
>> >> 
>> >> I think this discussion has gone off the deep end.
>> >> 
>> >> Forget about !x86 platforms.  They have their own way to do this sort of
>> >> thing.  
>> >
>> > The panic function in kernel/panic.c has the following options, which
>> > appear to be arch independent, on panic:
>> >
>> > - reboot 
>> > - blink
>> 
>> Not sure the semantics of blink but that might be a good place for a
>> pvops hook.
>> 
>> >
>> > None are paravirtual interfaces however.
>> >
>> >> Think of this feature like a status LED on a motherboard.  These
>> >> are very common and usually controlled by IO ports.
>> >> 
>> >> We're simply reserving a "status LED" for the guest to indicate that it
>> >> has panicked.  Let's not over engineer this.
>> >
>> > My concern is that you end up with state that is dependent on x86.
>> >
>> > Subject: [PATCH v8 3/6] add a new runstate: RUN_STATE_GUEST_PANICKED
>> >
>> > Having the ability to stop/restart the guest (and even introducing a 
>> > new VM runstate) is more than a status LED analogy.
>> 
>> I must admit, I don't know why a new runstate is necessary/useful.  The
>> kernel shouldn't have to care about the difference between a halted guest
>> and a panicked guest.  That level of information belongs in userspace IMHO.
>> 
>> > Can this new infrastructure be used by other architectures?
>> 
>> I guess I don't understand why the kernel side of this isn't anything
>> more than a paravirt op hook that does a single outb() with the
>> remaining logic handled 100% in QEMU.
>
> From the patch description:
>
> "Another purpose of this feature is: management app(for example:
> libvirt) can do auto dump when the guest is panicked. If management
> app does not do auto dump, the guest's user can do dump by hand if
> he sees the guest is panicked."

Why does this mandate another runstate?  QEMU can simply mark the VCPUs
as stopped and raise a QMP event.  The kernel doesn't care if the VCPUs
are stopped or panicked.

> Wen, auto dump means dump of guest memory?
>
> In that case, the notification should obviously stop the guest 
> otherwise the guest might be reset by the time memdump from QEMU 
> monitor runs.
>
> But kexec supports dumping of memory already (i suppose it can 
> do automatic dump+{reboot,shutdown}).
>
>> > Do you consider allowing support for Windows as overengineering?
>> 
>> I don't think there is a 
