Re: [RFC PATCH 0/2] dirreadahead system call

2014-07-31 Thread Andreas Dilger
On Aug 1, 2014, at 1:53, Dave Chinner  wrote:
> On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
>> None of these issues are relevant in the API that I'm thinking about. 
>> The syscall just passes the list of inode numbers to be prefetched
>> into kernel memory, and then stat() is used to actually get the data into
>> userspace (or whatever other operation is to be done on them),
>> so there is no danger if the wrong inode is prefetched.  If the inode
>> number is bad the filesystem can just ignore it.
> 
> Which means the filesystem has to treat the inode number as
> potentially hostile. i.e. it can not be trusted to be correct and so
> must take slow paths to validate the inode numbers. This adds
> *significant* overhead to the readahead path for some filesystems:
> readahead is only a win if it is low cost.
> 
> For example, on XFS every untrusted inode number lookup requires an
> inode btree lookup to validate the inode is actually valid on disk
> and that it is allocated and has references. That lookup serialises
> against inode allocation/freeing as well as other lookups. In
> comparison, when using a trusted inode number from a directory
> lookup within the kernel, we only need to do a couple of shift and
> mask operations to convert it to a disk address and we are good to
> go.
> 
> i.e. the difference is at least 5 orders of magnitude higher CPU usage
> for an "inode number readahead" syscall versus a "directory
> readahead" syscall, it has significant serialisation issues and it
> can stall other modification/lookups going on at the same time.
> That's *horrible behaviour* for a speculative readahead operation,
> but because the inode numbers are untrusted, we can't avoid it.
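Dave's point that a trusted inode number needs only "a couple of shift and mask operations" can be illustrated with the C sketch below. It assumes an XFS-like layout in which an inode number packs, from high to low bits, the allocation group, the block within that group, and the inode slot within that block. `struct fs_geom`, `decode_ino()` and the field widths are hypothetical; real XFS uses its own macros. Note what the sketch does not do: it never checks that the inode is actually allocated, which is exactly the inode btree lookup an untrusted number would require.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, loosely modeled on XFS's sb_agblklog/sb_inopblog. */
struct fs_geom {
    unsigned agblklog;   /* bits for the block number within an AG */
    unsigned inopblog;   /* bits for the inode slot within a block */
};

struct ino_loc {
    uint64_t agno;       /* allocation group */
    uint64_t agbno;      /* block within that AG */
    uint64_t offset;     /* inode slot within that block */
};

/* Keep only the low n bits of v (n must be < 64). */
static inline uint64_t low_bits(uint64_t v, unsigned n)
{
    return v & ((UINT64_C(1) << n) - 1);
}

/* Decode a *trusted* inode number: shifts and masks only, no I/O, no locks. */
static struct ino_loc decode_ino(const struct fs_geom *g, uint64_t ino)
{
    struct ino_loc loc;

    assert(g->inopblog + g->agblklog < 64);
    loc.offset = low_bits(ino, g->inopblog);
    loc.agbno  = low_bits(ino >> g->inopblog, g->agblklog);
    loc.agno   = ino >> (g->inopblog + g->agblklog);
    return loc;
}
```

Turning `(agno, agbno)` into a sector address is a further multiply and add, so the whole trusted path stays in registers.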

For ext4 this is virtually free. The same needs to happen for any
inode number from NFS so it can't be that bad. 

Also, since this API would be prefetching inodes in bulk it
could presumably optimize this to some extent. 

> So, again, it's way more overhead than userspace just calling
> stat() asynchronously on many files at once as readdir/getdents
> returns dirents from the kernel to speed up cache population.

To me this seems like saying it is just as fast to submit hundreds of
256-byte random reads for a file as it is for large linear reads with
readahead.  Yes, it is possible for the kernel to optimize the random
read workload to some extent, but not as easily as getting reads
in order in the first place. 

> That's my main issue with this patchset - it's implementing
> something in kernelspace that can *easily* be done generically in
> userspace without introducing all sorts of nasty corner cases that
> we have to handle in the kernel. We only add functionality to the kernel
> if there's a compelling reason to do it in kernelspace, and right now I just
> don't see any numbers that justify adding readdir+stat() readahead
> or inode number based cache population in kernelspace.

The original patch showed there was definitely a significant win with
the prefetch case over the single threaded readdir+stat. There is
also something to be said for keeping complexity out of applications.
Sure it is possible for apps to get good performance from AIO,
but very few do so because of complexity. 

> Before we add *any* syscall for directory readahead, we need
> comparison numbers against doing the dumb multithreaded
> userspace readahead of stat() calls. If userspace can do this as
> fast as the kernel can

I'd be interested to see this also, but my prediction is that this will not
deliver the kind of improvements you are expecting. 
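For reference, the "dumb multithreaded userspace readahead of stat() calls" baseline Dave asks for could look roughly like the C sketch below, assuming POSIX threads and `fstatat()`. As a simplification it reads the whole directory first and then fans the names out to a fixed worker pool, rather than issuing stat() calls while getdents is still returning entries. `NWORKERS`, `struct stat_job` and `prefetch_dir_stats()` are hypothetical names; this is not code from the thread and has not been benchmarked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define NWORKERS 8  /* hypothetical pool size; would need tuning per device */

struct stat_job {
    int dfd;               /* fd of the directory being scanned */
    char **names;          /* entry names collected by readdir() */
    size_t count;          /* number of names */
    size_t next;           /* next name to hand to a worker */
    long done;             /* successful fstatat() calls */
    pthread_mutex_t lock;  /* protects next and done */
};

/* Each worker repeatedly claims the next unclaimed name and stat()s it. */
static void *stat_worker(void *arg)
{
    struct stat_job *job = arg;
    struct stat st;

    for (;;) {
        pthread_mutex_lock(&job->lock);
        size_t i = job->next;
        if (i < job->count)
            job->next++;
        pthread_mutex_unlock(&job->lock);
        if (i >= job->count)
            return NULL;
        /* Result discarded: the only goal is to populate the inode cache. */
        if (fstatat(job->dfd, job->names[i], &st, AT_SYMLINK_NOFOLLOW) == 0) {
            pthread_mutex_lock(&job->lock);
            job->done++;
            pthread_mutex_unlock(&job->lock);
        }
    }
}

/* Returns the number of entries stat()ed, or -1 on error. */
long prefetch_dir_stats(const char *path)
{
    struct stat_job job = { .dfd = -1 };
    pthread_t tids[NWORKERS];
    int nthreads = 0;
    size_t cap = 0;
    long ret = -1;
    struct dirent *de;
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (!dir)
        return -1;
    job.dfd = dirfd(dir);
    pthread_mutex_init(&job.lock, NULL);

    /* Phase 1: collect entry names. */
    while ((de = readdir(dir)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (job.count == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            char **tmp = realloc(job.names, ncap * sizeof(*tmp));
            if (!tmp)
                goto out;
            job.names = tmp;
            cap = ncap;
        }
        char *name = strdup(de->d_name);
        if (!name)
            goto out;
        job.names[job.count++] = name;
    }

    /* Phase 2: fan the names out to the worker pool. */
    for (int t = 0; t < NWORKERS; t++)
        if (pthread_create(&tids[nthreads], NULL, stat_worker, &job) == 0)
            nthreads++;
    if (nthreads == 0)
        stat_worker(&job);  /* no threads available: do the work inline */
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    ret = job.done;

out:
    for (size_t i = 0; i < job.count; i++)
        free(job.names[i]);
    free(job.names);
    pthread_mutex_destroy(&job.lock);
    closedir(dir);
    return ret;
}
```

Timing this against single-threaded readdir+stat and against the proposed syscall, on a cold cache, would give the comparison numbers both sides ask for.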

Cheers, Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] autofs: the documentation I wanted to read

2014-07-31 Thread Ian Kent
On Tue, 2014-07-29 at 12:00 +1000, NeilBrown wrote:
> 
> This documents autofs from the perspective of what the module actually
> supports rather than how automount is expected to use it.
> It is based mostly on code review and very little on testing so it
> may be inaccurate in some places.
> 
> The document assumes the functionality added by the RCU-walk patches
> that I posted recently.
> 
> It is formatted using "markdown" and works best with Markdown.pl
> (markdown_py doesn't like some constructs).
> 
> 
> Signed-off-by: NeilBrown 
Acked-by: Ian Kent 

There are a couple of places that might need more work but it's better
to have this now and to work with it in future than to hold it up. 

Especially since I can't quite nail down what it was that didn't quite
sound right when reading it.

Excellent job Neil, thanks very much.
Ian

> 
> diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt
> new file mode 100644
> index ..45f67c83d713
> --- /dev/null
> +++ b/Documentation/filesystems/autofs4.txt
> @@ -0,0 +1,503 @@
> +
> + p { max-width:50em} ol, ul {max-width: 40em}
> +
> +
> +autofs - how it works
> +=
> +
> +Purpose
> +---
> +
> +The goal of autofs is to provide on-demand mounting and race free
> +automatic unmounting of various other filesystems.  This provides two
> +key advantages:
> +
> +1. There is no need to delay boot until all filesystems that
> +   might be needed are mounted.  Processes that try to access those
> +   slow filesystems might be delayed but other processes can
> +   continue freely.  This is particularly important for
> +   network filesystems (e.g. NFS) or filesystems stored on
> +   media with a media-changing robot.
> +
> +2. The names and locations of filesystems can be stored in
> +   a remote database and can change at any time.  The content
> +   in that data base at the time of access will be used to provide
> +   a target for the access.  The interpretation of names in the
> +   filesystem can even be programmatic rather than database-backed,
> +   allowing wildcards for example, and can vary based on the user who
> +   first accessed a name.
> +
> +Context
> +---
> +
> +The "autofs4" filesystem module is only one part of an autofs system.
> +There also needs to be a user-space program which looks up names
> +and mounts filesystems.  This will often be the "automount" program,
> +though other tools including "systemd" can make use of "autofs4".
> +This document describes only the kernel module and the interactions
> +required with any user-space program.  Subsequent text refers to this
> +as the "automount daemon" or simply "the daemon".
> +
> +"autofs4" is a Linux kernel module which provides the "autofs"
> +filesystem type.  Several "autofs" filesystems can be mounted and they
> +can each be managed separately, or all managed by the same daemon.
> +
> +Content
> +---
> +
> +An autofs filesystem can contain 3 sorts of objects: directories,
> +symbolic links and mount traps.  Mount traps are directories with
> +extra properties as described in the next section.
> +
> +Objects can only be created by the automount daemon: symlinks are
> +created with a regular `symlink` system call, while directories and
> +mount traps are created with `mkdir`.  The determination of whether a
> +directory should be a mount trap or not is quite _ad hoc_, largely for
> +historical reasons, and is determined in part by the
> +*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
> +
> +If neither the *direct* nor *offset* mount options are given (so the
> +mount is considered to be *indirect*), then the root directory is
> +always a regular directory, otherwise it is a mount trap when it is
> +empty and a regular directory when not empty.  Note that *direct* and
> +*offset* are treated identically so a concise summary is that the root
> +directory is a mount trap only if the filesystem is mounted *direct*
> +and the root is empty.
> +
> +Directories created in the root directory are mount traps only if the
> +filesystem is mounted *indirect* and they are empty.
> +
> +Directories further down the tree depend on the *maxproto* mount
> +option and particularly whether it is less than five or not.
> +When *maxproto* is five, no directories further down the
> +tree are ever mount traps; they are always regular directories.  When
> +*maxproto* is four (or three), these directories are mount traps
> +precisely when they are empty.
> +
> +So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
> +directories are sometimes mount traps, and sometimes not depending on
> +where in the tree they are (root, top level, or lower), the *maxproto*,
> +and whether the mount was *indirect* or not.
> +
> +Mount Traps
> +---
> +
> +A core element of the implementation of autofs is the Mount Traps
> +which are provided by the Linux VFS.  Any directory provided by a
> 
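The mount-trap rules in the quoted document (root, top-level and lower directories; *direct*/*offset* versus *indirect*; *maxproto*) condense into a small decision function. This C sketch is derived only from the prose of the patch, which itself warns it may be inaccurate in places, not from the autofs4 source; the enum names and `is_mount_trap()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encodings of the mount options and tree position. */
enum trap_mount_type { TRAP_INDIRECT, TRAP_DIRECT, TRAP_OFFSET };
enum trap_depth { TRAP_ROOT, TRAP_TOP_LEVEL, TRAP_LOWER };

static bool is_mount_trap(enum trap_mount_type type, enum trap_depth depth,
                          int maxproto, bool empty)
{
    /* direct and offset are treated identically */
    bool indirect = (type == TRAP_INDIRECT);

    if (!empty)
        return false;          /* non-empty directories are never traps */

    switch (depth) {
    case TRAP_ROOT:
        return !indirect;      /* root traps only on direct/offset mounts */
    case TRAP_TOP_LEVEL:
        return indirect;       /* top level traps only on indirect mounts */
    case TRAP_LOWER:
        return maxproto < 5;   /* protocol 5: lower dirs are never traps */
    }
    return false;
}
```

Writing the rules this way makes it easy to see that emptiness is necessary in every case and that the remaining conditions depend on depth alone.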

Re: [PATCH v2 0/3 net-next] Lockless netlink_lookup() with new concurrent hash table

2014-07-31 Thread David Miller
From: David Miller 
Date: Thu, 31 Jul 2014 22:39:46 -0700 (PDT)

> Looks great, series applied, thanks!

Actually, this needs more work, reverted:

net/netfilter/nft_hash.c: In function ‘nft_hash_destroy’:
net/netfilter/nft_hash.c:183:3: error: ‘ht’ undeclared (first use in this function)
net/netfilter/nft_hash.c:183:3: note: each undeclared identifier is reported only once for each function it appears in


Re: perf tools: Question about kmem and kernel symbol resolution

2014-07-31 Thread Namhyung Kim
On Thu, 31 Jul 2014 11:27:11 -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 31, 2014 at 05:35:32PM +0900, Namhyung Kim escreveu:
>> I'm looking into a kernel symbol mismatch issue, and found something in perf
>> kmem code.  The commit e727ca73f85d ("perf kmem: Resolve kernel
>> symbols again") added perf_session__create_kernel_maps() but I don't
>> know why.  Why did it miss the MMAP event?
>  
>> I think if we create kernel maps at report time, they might not match
>> the samples in a perf.data file if it was recorded on a different kernel.
>> IMHO this is the main reason for the mismatch problem I'm currently
>> chasing.  What am I missing?
>
> From a quick look, nothing, i.e. we can not call
> perf_session__create_kernel_maps() at that point, as it will create the
> kernel maps from the running kernel and use it with events from the
> kernel that was in place when the perf.data file being processed was
> created.
>
> Perhaps that problem was fixed somewhere else and we should just revert
> that patch?
>
> Have you tried just reverting it and checking that the results are the
> expected ones? I.e. that there is the kernel MMAP event in perf.data
> file and that it gets properly processed?

Simply reverting ended up with no symbols but it contains MMAP event for
sure.

Then I found a reason - it's simply because the kmem tool doesn't register
mmap event handlers. :-/  Adding mmap[2] handlers + reverting ended up
with the expected output.

I'll send the fix soon.

Thanks,
Namhyung


Oops in scsi_put_host_cmd_pool

2014-07-31 Thread Juergen Gross

While testing the Xen pvSCSI frontend module I found the following issue:

When a passed-through SCSI device is unplugged, the SCSI host is removed.
When the driver calls the final scsi_host_put(), an Oops occurs:

[  219.816292] (file=drivers/scsi/xen-scsifront.c, line=808) scsifront_remove: device/vscsi/1 removed
[  219.816371] BUG: unable to handle kernel NULL pointer dereference at 0010
[  219.816380] IP: [] scsi_put_host_cmd_pool+0x38/0xb0
[  219.816390] PGD 3bd60067 PUD 3d353067 PMD 0
[  219.816396] Oops:  [#1] SMP
[  219.816400] Modules linked in: nls_utf8 sr_mod cdrom xen_scsifront 
xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT xt_tcpudp 
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw 
xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns 
nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables 
xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables 
x86_pkg_temp_thermal thermal_sys coretemp hwmon crc32_pclmul 
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw 
gf128mul glue_helper aes_x86_64 microcode pcspkr sg dm_mod autofs4 
scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh xen_blkfront 
xen_netfront
[  219.816458] CPU: 0 PID: 23 Comm: xenwatch Not tainted 3.16.0-rc6-11-xen+ #66
[  219.816463] task: 88003da985d0 ti: 88003da9c000 task.ti: 88003da9c000
[  219.816467] RIP: e030:[]  [] scsi_put_host_cmd_pool+0x38/0xb0
[  219.816474] RSP: e02b:88003da9fc20  EFLAGS: 00010202
[  219.816477] RAX: a01a50c0 RBX:  RCX: 0003
[  219.816481] RDX: 0240 RSI: 88003d242b80 RDI: 80c708c0
[  219.816485] RBP: 88003da9fc38 R08: 4f7e974a31ed0290 R09:
[  219.816488] R10: 7ff0 R11: 0001 R12: 8800038f8000
[  219.816491] R13: a01a50c0 R14:  R15:
[  219.816498] FS:  7fe2e2eeb700() GS:88003f80() knlGS:
[  219.816502] CS:  e033 DS:  ES:  CR0: 80050033
[  219.816505] CR2: 0010 CR3: 3d20c000 CR4: 00042660
[  219.816509] Stack:
[  219.816511]  8800038f8000 8800038f8030 880003ae3400 88003da9fc58
[  219.816516]  805fe78b 8800038f8000 88003bb82c40 88003da9fc80
[  219.816521]  805ff587 8800038f81a0 8800038f8190 880003ae3400
[  219.816527] Call Trace:
[  219.816533]  [] scsi_destroy_command_freelist+0x5b/0x60
[  219.816538]  [] scsi_host_dev_release+0x97/0xe0
[  219.816543]  [] device_release+0x2d/0xa0
[  219.816548]  [] kobject_cleanup+0x77/0x1b0
[  219.816553]  [] kobject_put+0x30/0x60
[  219.816556]  [] put_device+0x12/0x20
[  219.816560]  [] scsi_host_put+0x10/0x20
[  219.816565]  [] scsifront_free+0x42/0x90 [xen_scsifront]
[  219.816569]  [] scsifront_remove+0x1d/0x50 [xen_scsifront]
[  219.816576]  [] xenbus_dev_remove+0x50/0xa0
[  219.816580]  [] __device_release_driver+0x7a/0xf0
[  219.816584]  [] device_release_driver+0x1e/0x30
[  219.816588]  [] bus_remove_device+0x100/0x180
[  219.816593]  [] device_del+0x121/0x1b0
[  219.816596]  [] device_unregister+0x19/0x60
[  219.816601]  [] xenbus_dev_changed+0x9e/0x1e0
[  219.816606]  [] ? _raw_spin_unlock_irqrestore+0x25/0x50
[  219.816611]  [] ? unregister_xenbus_watch+0x200/0x200
[  219.816615]  [] frontend_changed+0x20/0x50
[  219.816619]  [] xenwatch_thread+0x9f/0x160
[  219.816624]  [] ? prepare_to_wait_event+0xf0/0xf0
[  219.816628]  [] kthread+0xcd/0xf0
[  219.816633]  [] ? kthread_create_on_node+0x170/0x170
[  219.816638]  [] ret_from_fork+0x7c/0xb0
[  219.816642]  [] ? kthread_create_on_node+0x170/0x170
[  219.816645] Code: 8b af c0 00 00 00 48 c7 c7 c0 08 c7 80 e8 d1 e0 19 00 49 8b 84 24 c0 00 00 00 8b 90 48 01 00 00 85 d2 74 2f 48 8b 98 50 01 00 00 <8b> 43 10 85 c0 74 68 83 e8 01 85 c0 89 43 10 74 37 48 c7 c7 c0
[  219.816732] RIP  [] scsi_put_host_cmd_pool+0x38/0xb0
[  219.816747]  RSP 
[  219.816750] CR2: 0010
[  219.816753] ---[ end trace c6915ea21a3d05f7 ]---

I should mention I've specified .cmd_len in the scsi_host_template. The
only other driver doing this seems to be virtio_scsi.c, so I assume the
same problem could occur with passed-through SCSI devices under KVM...


Juergen


Re: [PATCH v2 0/3 net-next] Lockless netlink_lookup() with new concurrent hash table

2014-07-31 Thread David Miller
From: Thomas Graf 
Date: Fri,  1 Aug 2014 00:56:00 +0200

> Netlink sockets are maintained in a hash table to allow efficient lookup
> via the port ID for unicast messages. However, lookups currently require
> a read lock to be taken. This series adds a new generic, resizable,
> scalable, concurrent hash table based on the paper referenced in the first
> patch. It then makes use of the new data type to implement lockless
> netlink_lookup().
> 
> Patch 3/3 to convert nft_hash is included for reference but should be
> merged via the netfilter tree. Inclusion in this series is to provide
> context for the suggested API.
> 
> Against net-next since the initial user of the new hash table is in net/
> 
> Changes:
> v1-v2:
>  - fixed traversal off-by-one as spotted by Tobias Klauser
>  - removed unlikely() from BUG_ON() as spotted by Josh Triplett
>  - new 3rd patch to convert nft_hash to rhashtable
>  - make rhashtable_insert() return void
>  - nl_sk_hash_lock must be a mutex
>  - fixed wrong name of rht_shrink_below_30()
>  - exported symbols rht_grow_above_75() and rht_shrink_below_30()
>  - allow table freeing with RCU callback
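The changelog mentions the exported helpers rht_grow_above_75() and rht_shrink_below_30(). As a rough illustration of the load-factor decisions those names imply, here is a userspace C sketch. The struct, the function bodies, and the doubling/halving policy in `next_size()` are assumptions based on the names, not code copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the table state the thresholds inspect. */
struct toy_table {
    size_t nelems;   /* entries currently stored */
    size_t size;     /* number of buckets */
};

/* Grow when the load factor exceeds 75%. */
static bool grow_above_75(const struct toy_table *t, size_t size)
{
    return t->nelems > (size / 4 * 3);
}

/* Shrink when the load factor drops below 30%. */
static bool shrink_below_30(const struct toy_table *t, size_t size)
{
    return t->nelems < (size * 3 / 10);
}

/* Hypothetical resize policy: double on high load, halve on low load. */
static size_t next_size(const struct toy_table *t, size_t min_size)
{
    assert(min_size > 0);
    if (grow_above_75(t, t->size))
        return t->size * 2;
    if (t->size > min_size && shrink_below_30(t, t->size))
        return t->size / 2;
    return t->size;
}
```

The gap between the two thresholds gives hysteresis: a table that just doubled lands near 37% load, well above the shrink point, so it does not oscillate.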

Looks great, series applied, thanks!


Re: [PATCH 5/6] ARM: DTS: da850-evm: Add node for tlv320aic3106 codec

2014-07-31 Thread Peter Ujfalusi
On 07/31/2014 05:24 PM, Sergei Shtylyov wrote:
> Hello.
> 
> On 07/31/2014 02:18 PM, Peter Ujfalusi wrote:
> 
>> The board uses aic3106 for audio.
> 
>> Signed-off-by: Peter Ujfalusi 
>> ---
>>   arch/arm/boot/dts/da850-evm.dts | 14 ++
>>   1 file changed, 14 insertions(+)
> 
>> diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
>> index 09118c72e83f..b9ef2be0b145 100644
>> --- a/arch/arm/boot/dts/da850-evm.dts
>> +++ b/arch/arm/boot/dts/da850-evm.dts
>> @@ -51,6 +51,20 @@
>>   tps: tps@48 {
>>   reg = <0x48>;
>>   };
>> +tlv320aic3106: tlv320aic3106@1b {
> 
> The "reg" property is <0x18>, so why is the unit-address part of the
> node name different?

True, I lifted the codec part from another dts file and overlooked the
unit-address.
I will resend the series with this fixed.
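Sergei's first point is that the unit-address after `@` in a node name should match the first address in the node's `reg` property (here `@1b` versus `<0x18>`). A minimal C sketch of that comparison, assuming a single-cell address written in hex; `unit_address_matches()` is a hypothetical helper, not part of dtc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Does the hex unit-address in node_name equal the first reg cell? */
static bool unit_address_matches(const char *node_name, unsigned long reg0)
{
    assert(node_name != NULL);
    const char *at = strchr(node_name, '@');
    if (!at || at[1] == '\0')
        return false;                 /* no unit-address to compare */

    char *end;
    unsigned long unit = strtoul(at + 1, &end, 16);
    if (*end != '\0' && *end != ',')
        return false;                 /* not a plain hex unit-address */
    return unit == reg0;
}
```

This only covers the simple case. Nodes whose addresses are translated through a parent's `ranges`, as with the da850 SoC nodes, compare the unit-address against the child-bus address in `reg`, not the CPU address.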


> Also, the ePAPR standard [1] says:
> 
> The name of a node should be somewhat generic, reflecting the function of the
> device and not its precise programming model.

True. This is why the node for the audio support is named 'sound'. For
components like this one, I do not see an issue with calling the audio codec
by its name.

> 
>> +#sound-dai-cells = <0>;
>> +compatible = "ti,tlv320aic3106";
>> +reg = <0x18>;
>> +status = "okay";
>> +
>> +/* Regulators */
>> +IOVDD-supply = <_reg>;
>> +/* Derived from VBAT: Baseboard 3.3V / 1.8V */
>> +AVDD-supply = <>;
>> +DRVDD-supply = <>;
>> +DVDD-supply = <>;
>> +};
>> +
> 
> [1] http://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.0.pdf

BTW: there's a newer version available:
https://www.power.org/wp-content/uploads/2012/06/Power_ePAPR_APPROVED_v1.1.pdf

> 
> WBR, Sergei
> 


-- 
Péter


Re: linux-next: manual merge of the kvm-arm tree with Linus' tree

2014-07-31 Thread Stephen Rothwell
Hi Christoffer,

On Thu, 31 Jul 2014 16:23:47 +0200 Christoffer Dall wrote:
>
> Stephen, did you pick up the resolution provided by Marc for the gicv2
> fix patch so that it applies to tomorrow's next/kvmarm merge?

Yes, I have.  You will need to produce the same for Linus eventually,
or do the merge yourself as I suggested.

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




Re: [PATCH 2/6] ARM: DTS: da850: Add node for edma0

2014-07-31 Thread Peter Ujfalusi
On 07/31/2014 05:26 PM, Sergei Shtylyov wrote:
> On 07/31/2014 02:18 PM, Peter Ujfalusi wrote:
> 
>> Add DT node for edma0.
> 
>> Signed-off-by: Peter Ujfalusi 
>> ---
>>   arch/arm/boot/dts/da850.dtsi | 6 ++
>>   1 file changed, 6 insertions(+)
> 
>> diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
>> index b695548dbb4e..41ce4e8bf227 100644
>> --- a/arch/arm/boot/dts/da850.dtsi
>> +++ b/arch/arm/boot/dts/da850.dtsi
>> @@ -150,6 +150,12 @@
>>   };
>>
>>   };
>> +edma0: edma@01c0 {
>> +compatible = "ti,edma3";
>> +reg =<0x0 0x1>;
> 
> Why the mismatch between the unit-address part of the node name and the
> "reg" property?

For some reason, the whole da850 dtsi uses offsets from 0x01c0 for the SoC IPs.
The nodes are under 'soc', which has the ranges property.
I do not really like this either.

> 
>> +interrupts = <11 13 12>;
>> +#dma-cells = <1>;
>> +};
> 
> WBR, Sergei
> 

-- 
Péter


Re: [PATCH 1/1] AX88179_178A: Add ethtool ops for EEE support

2014-07-31 Thread David Miller
From: fre...@asix.com.tw
Date: Thu, 31 Jul 2014 19:06:35 +0800

> From: Freddy Xin 
> 
> Add functions to support manipulating EEE via ethtool. EEE is
> disabled by default to improve compatibility with certain switches.
> 
> Signed-off-by: Freddy Xin 

Applied, thanks.


Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps

2014-07-31 Thread Hugh Dickins
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
> /proc//smaps for shmem mappings.
> 
> ShmOther: amount of memory that is currently resident in memory, not
> present in the page table of this process but present in the page
> table of another process.
> ShmOrphan: amount of memory that is currently resident in memory but
> not present in any process page table. This can happen when a process
> unmaps a shared mapping it has accessed before or exits. Despite being
> resident, this memory is not currently accounted to any process.
> ShmSwapCache: amount of memory currently in swap cache
> ShmSwap: amount of memory that is paged out on disk.
> 
> Signed-off-by: Jerome Marchand 

You will have to do a much better job of persuading me that these
numbers are of any interest.  Okay, maybe not me, I'm not that keen
on /proc//smaps at the best of times.  But you will need to show
plausible cases where having these numbers available would have made
a real difference, and drum up support for their inclusion from
/proc//smaps devotees.

Do you have a customer, who has underprovisioned with swap,
and wants these numbers to work out how much more is needed?

As it is, they appear to be numbers that you found you could provide,
and so you're adding them into /proc//smaps, but having great
difficulty in finding good names to describe them - which is itself
an indicator that they're probably not the most useful statistics
a sysadmin is wanting.

(Google is a /proc//smaps user: let's take a look to see if
we have been driven to add in stats of this kind: no, not at all.)

The more numbers we add to /proc//smaps, the longer it will take to
print, the longer mmap_sem will be held, and the more it will interfere
with proper system operation - that's the concern I more often see.

> ---
>  Documentation/filesystems/proc.txt | 11 
>  fs/proc/task_mmu.c | 56 
> +-
>  2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 1a15c56..a65ab59 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -422,6 +422,10 @@ Swap:  0 kB
>  KernelPageSize:4 kB
>  MMUPageSize:   4 kB
>  Locked:  374 kB
> +ShmOther:124 kB
> +ShmOrphan: 0 kB
> +ShmSwapCache: 12 kB
> +ShmSwap:  36 kB
>  VmFlags: rd ex mr mw me de
>  
>  the first of these lines shows the same information as is displayed for the
> @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>  swap.
> +The ShmXXX lines only appear for shmem mappings. They show the amount of memory
> +from the mapping that is currently:
> + - resident in RAM, not present in the page table of this process but present
> + in the page table of another process (ShmOther)

We don't show that for files of any other filesystem, why for shmem?
Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs.

It is a very specialized statistic, and therefore hard to name: I don't
think ShmOther is a good name, but doubt any would do.  ShmOtherMapped?

> + - resident in RAM but not present in the page table of any process (ShmOrphan)

We don't show that for files of any other filesystem, why for shmem?

Orphan?  We do use the word "orphan" to describe pages which have been
truncated off a file, but somehow not yet removed from pagecache.  We
don't use the word "orphan" to describe pagecache pages which are
not mapped into userspace - they are known as "pagecache pages which
are not mapped into userspace".  ShmNotMapped?

> + - in swap cache (ShmSwapCache)

Is this interesting?  It's a transitional state: either memory pressure
has forced the page to swapcache, but not yet freed it from memory; or
swapin_readahead has brought this page back in when bringing in a nearby
page of swap.

I can understand that we might want better stats on the behaviour of
swapin_readahead; better stats on shmem objects and swap; better stats
on duplication between pagecache and swap; but I'm not convinced that
/proc//smaps is the right place for those.

Against all that, of course, we do have mincore() showing these pages
as incore, where /proc//smaps does not.  But I think that is
justified by mincore()'s mission to show what's incore.

> + - paged out on swap (ShmSwap).

This one has the best case for inclusion: we do show Swap for the anon
pages which are out on swap, but not for the shmem areas, where swap
entry does not go into page table.  But there is good reason for that:
this is shared memory, files, objects commonly shared between
processes, so it's a poor fit 

Re: [PATCH 4/5] mm, shmem: Add shmem swap memory accounting

2014-07-31 Thread Hugh Dickins
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds get_mm_shswap() which computes the size of swapped-out shmem. It
> does so by pagewalking the mm and using the new shmem_locate() function
> to get the physical location of shmem pages.
> The result is displayed in the new VmShSw line of /proc//status.
> Use mm_walk and shmem_locate() to account paged out shmem pages.
> 
> It significantly slows down /proc//status access speed when
> there is a big shmem mapping. If that is an issue, we can drop this
> patch and only display this counter in the inherently slower
> /proc//smaps file (cf. next patch).
> 
> Signed-off-by: Jerome Marchand 

Definite NAK to this one.  As you guessed yourself, it is always a
mistake to add one potentially very slow-to-gather number to a stats
file showing a group of quickly gathered numbers.

Is there anything you could do instead?  I don't know if it's worth
the (little) extra mm_struct storage and maintenance, but you could
add a VmShmSize, which shows that subset of VmSize (total_vm) which
is occupied by shmem mappings.

It's ambiguous what to deduce when VmShm is less than VmShmSize:
the difference might be swapped out, it might be holes in the sparse
object, it might be instantiated in the object but never faulted
into the mapping: in general it will be a mix of all of those.
So, sometimes useful info, but easy to be misled by it.

As I say, I don't know if VmShmSize would be worth adding, given its
deficiencies; and it could be worked out from /proc//maps anyway.
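Hugh notes that a VmShmSize-style figure could be worked out from the maps file anyway. A rough userspace approximation in C follows. Which mappings count as shmem is an assumption here: shared mappings whose path starts with `/SYSV` (SysV SHM), `/dev/shm/` (tmpfs), or `/dev/zero` (shared anonymous memory). tmpfs mounted elsewhere would be missed, and `shm_size_kb()` is a hypothetical name. Like the proposed VmShmSize, it reports mapped size only, with the same ambiguity about swap, holes, and never-faulted pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Heuristic (assumption): does this mapping look like shmem? */
static bool looks_like_shmem(const char *perms, const char *path)
{
    if (perms[3] != 's')
        return false;   /* only shared mappings */
    return strncmp(path, "/SYSV", 5) == 0 ||
           strncmp(path, "/dev/shm/", 9) == 0 ||
           strncmp(path, "/dev/zero", 9) == 0;
}

/* Sum, in kB, the sizes of maps-format lines that look like shmem. */
static unsigned long shm_size_kb(FILE *maps)
{
    char line[4096];
    unsigned long total = 0;

    while (fgets(line, sizeof(line), maps)) {
        unsigned long start, end;
        char perms[5] = "";
        char path[4096] = "";
        /* Format: start-end perms offset dev inode [path] */
        int n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %4095[^\n]",
                       &start, &end, perms, path);
        if (n < 3 || strlen(perms) < 4)
            continue;
        if (n == 4 && looks_like_shmem(perms, path))
            total += (end - start) / 1024;
    }
    return total;
}
```

In practice one would pass `fopen("/proc/self/maps", "r")` (or another process's maps file) to `shm_size_kb()`.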

Hugh


Re: [PATCH 3/5] mm, shmem: Add shmem_vma() helper

2014-07-31 Thread Hugh Dickins
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Add a simple helper to check if a vm area belongs to shmem.
> 
> Signed-off-by: Jerome Marchand 
> ---
>  include/linux/mm.h | 6 ++
>  mm/shmem.c | 8 
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 34099fa..04a58d1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
>  
>  extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
>  bool shmem_mapping(struct address_space *mapping);
> +bool shmem_vma(struct vm_area_struct *vma);
> +
>  #else
>  static inline bool shmem_mapping(struct address_space *mapping)
>  {
>   return false;
>  }
> +static inline bool shmem_vma(struct vm_area_struct *vma)
> +{
> + return false;
> +}
>  #endif

I would prefer include/linux/shmem_fs.h for this (and one of us clean
up where the declarations of shmem_zero_setup and shmem_mapping live).

But if 4/5 goes away, then there will only be one user of shmem_vma(),
so in that case better just declare it (using shmem_mapping()) there
in task_mmu.c in the smaps patch.

>  
>  extern int can_do_mlock(void);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8aa4892..7d16227 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1483,6 +1483,14 @@ bool shmem_mapping(struct address_space *mapping)
>   return mapping->backing_dev_info == _backing_dev_info;
>  }
>  
> +bool shmem_vma(struct vm_area_struct *vma)
> +{
> + return (vma->vm_file &&
> + vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
> + == _backing_dev_info);
> +

I agree with Oleg that
vma->vm_file && shmem_mapping(file_inode(vma->vm_file)->i_mapping);
would be better.

Hugh


Re: [PATCH 2/5] mm, shmem: Add shmem_locate function

2014-07-31 Thread Hugh Dickins
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> The shmem subsystem is kind of a black box: the generic mm code can't

I'm happier with that black box than you are :)

> always know where a specific page physically is. This patch adds the
> shmem_locate() function to find out the physical location of shmem
> pages (resident, in swap or swapcache). If the optional argument count
> isn't NULL and the page is resident, it also returns the mapcount value
> of this page.
> This is intended to allow finer accounting of shmem/tmpfs pages.
> 
> Signed-off-by: Jerome Marchand 
> ---
>  include/linux/mm.h |  7 +++
>  mm/shmem.c | 29 +
>  2 files changed, 36 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e69ee9d..34099fa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1066,6 +1066,13 @@ extern bool skip_free_areas_node(unsigned int flags, int nid);
>  
>  int shmem_zero_setup(struct vm_area_struct *);
>  #ifdef CONFIG_SHMEM
> +
> +#define SHMEM_NOTPRESENT 1 /* page is not present in memory */
> +#define SHMEM_RESIDENT   2 /* page is resident in RAM */
> +#define SHMEM_SWAPCACHE  3 /* page is in swap cache */
> +#define SHMEM_SWAP   4 /* page is paged out */
> +
> +extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);

Please place these, or what's needed of them, in include/linux/shmem_fs.h,
rather than in the very overloaded include/linux/mm.h.
You will need a !CONFIG_SHMEM stub for shmem_locate(),
or whatever it ends up being called.

>  bool shmem_mapping(struct address_space *mapping);

Oh, you're following a precedent; that's already bad placement.
And it (but not its !CONFIG_SHMEM stub) is duplicated in shmem_fs.h.
Perhaps because we were moving shmem_zero_setup() from mm.h to shmem_fs.h
some time ago, but never got around to cleaning up the old location.

Well, please place the new ones in shmem_fs.h, and I ought to clean
up the rest at a time which does not interfere with you.

>  #else
>  static inline bool shmem_mapping(struct address_space *mapping)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b16d3e7..8aa4892 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1341,6 +1341,35 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>   return ret;
>  }
>  
> +int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count)

I don't find that a helpful name; but in 5/5 I question the info you're
gathering here - maybe a good name will be more obvious once we've cut
down what it's gathering.

I just noticed that in 5/5 you're using a walk->pte_hole across
empty extents: perhaps I'm prematurely optimizing, but that feels very
inefficient; maybe here you should use a radix_tree lookup of the extent.

If all we had to look up were the number of swap entries, in the vast
majority of cases shmem.c could just see info->swapped is 0 and spend
no time on radix_tree lookups at all.

But what happens here depends on what really needs to be shown in 5/5.

Hugh


Re: [PATCH 1/5] mm, shmem: Add shmem resident memory accounting

2014-07-31 Thread Hugh Dickins
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Currently looking at /proc//status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implication in
> actual memory use is quite different.
> This patch adds MM_SHMEMPAGES counter to mm_rss_stat. It keeps track of
> resident shmem memory size. Its value is exposed in the new VmShm line
> of /proc//status.

I like adding this info to /proc//status - thank you -
but I think you can make the patch much better in a couple of ways.

> 
> Signed-off-by: Jerome Marchand 
> ---
>  Documentation/filesystems/proc.txt |  2 ++
>  arch/s390/mm/pgtable.c |  2 +-
>  fs/proc/task_mmu.c |  9 ++---
>  include/linux/mm.h |  7 +++
>  include/linux/mm_types.h   |  7 ---
>  kernel/events/uprobes.c|  2 +-
>  mm/filemap_xip.c   |  2 +-
>  mm/memory.c| 37 +++--
>  mm/rmap.c  |  8 
>  9 files changed, 57 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index ddc531a..1c49957 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -171,6 +171,7 @@ read the file /proc/PID/status:
>VmLib:  1412 kB
>VmPTE:20 kb
>VmSwap:0 kB
> +  VmShm: 0 kB
>Threads:1
>SigQ:   0/28578
>SigPnd: 
> @@ -228,6 +229,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
>   VmLib   size of shared library code
>   VmPTE   size of page table entries
>   VmSwap  size of swap usage (the number of referred swapents)
> + VmShm size of resident shmem memory

Needs to say that includes mappings of tmpfs, and needs to say that
it's a subset of VmRSS.  Better placed immediately after VmRSS...

...but now that I look through what's in /proc//status, it appears
that we have to defer to /proc//statm to see MM_FILEPAGES (third
field) and MM_ANONPAGES (subtract third field from second field).

That's not a very friendly interface.  If you're going to help by
exposing MM_SHMPAGES separately, please help even more by exposing
VmFile and VmAnon here in /proc//status too.

VmRSS, VmAnon, VmShm, VmFile?  I'm not sure what's the best order:
here I'm thinking that anon comes before file in /proc/meminfo, and
shm should be halfway between anon and file.  You may have another idea.

And of course the VmFile count here should exclude VmShm: I think it
will work out least confusingly if you account MM_FILEPAGES separately
from MM_SHMPAGES, but add them together where needed e.g. for statm.

>   Threads number of threads
>   SigQnumber of signals queued/max. number for queue
>   SigPnd  bitmap of pending signals for the thread
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 37b8241..9fe31b0 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -612,7 +612,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>   if (PageAnon(page))
>   dec_mm_counter(mm, MM_ANONPAGES);
>   else
> - dec_mm_counter(mm, MM_FILEPAGES);
> + dec_mm_file_counters(mm, page);
>   }

That is a recurring pattern: please try putting

static inline int mm_counter(struct page *page)
{
	if (PageAnon(page))
		return MM_ANONPAGES;
	if (PageSwapBacked(page))
		return MM_SHMPAGES;
	return MM_FILEPAGES;
}

in include/linux/mm.h.

Then dec_mm_counter(mm, mm_counter(page)) here, and wherever you can,
use mm_counter(page) to simplify the code throughout.

I say "try" because I think factoring out mm_counter() will simplify
the most code, given the profusion of different accessors, particularly
in mm/memory.c.  But I'm not sure how much bloat having it as an inline
function will add, versus how much overhead it would add if not inline.

Hugh


Re: [PATCH v3 2/2] dma: Add Xilinx AXI Direct Memory Access Engine driver support

2014-07-31 Thread Srikanth Thokala
Hi,

Kindly review this patch and provide your input.

Thanks
Srikanth

On Mon, Jul 28, 2014 at 5:47 PM, Srikanth Thokala  wrote:
> This is the driver for the AXI Direct Memory Access (AXI DMA)
> core, which is a soft Xilinx IP core that provides high-
> bandwidth direct memory access between memory and AXI4-Stream
> type target peripherals.
>
> This module works on Zynq (ARM-based SoC) and Microblaze platforms.
>
> Signed-off-by: Srikanth Thokala 
> ---
> Changes in v3:
> - Rebased on 3.16-rc7
>
> Changes in v2:
> - Simplified the logic to set SOP and APP words in prep_slave_sg().
> - Corrected function description comments to match the return type.
> - Fixed some minor comments as suggested by Andy, Thanks.
> ---
>  drivers/dma/Kconfig |   13 +
>  drivers/dma/xilinx/Makefile |1 +
>  drivers/dma/xilinx/xilinx_dma.c | 1225 +++
>  include/linux/amba/xilinx_dma.h |   17 +
>  4 files changed, 1256 insertions(+)
>  create mode 100644 drivers/dma/xilinx/xilinx_dma.c
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 1eca7b9..b8e831e 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -375,6 +375,19 @@ config XILINX_VDMA
>   channels, Memory Mapped to Stream (MM2S) and Stream to
>   Memory Mapped (S2MM) for the data transfers.
>
> +config XILINX_DMA
> +   tristate "Xilinx AXI DMA Engine"
> +   depends on (ARCH_ZYNQ || MICROBLAZE)
> +   select DMA_ENGINE
> +   help
> + Enable support for Xilinx AXI DMA Soft IP.
> +
> + This engine provides high-bandwidth direct memory access
> + between memory and AXI4-Stream type target peripherals.
> + It has two stream interfaces/channels, Memory Mapped to
> + Stream (MM2S) and Stream to Memory Mapped (S2MM) for the
> + data transfers.
> +
>  config DMA_ENGINE
> bool
>
> diff --git a/drivers/dma/xilinx/Makefile b/drivers/dma/xilinx/Makefile
> index 3c4e9f2..6224a49 100644
> --- a/drivers/dma/xilinx/Makefile
> +++ b/drivers/dma/xilinx/Makefile
> @@ -1 +1,2 @@
>  obj-$(CONFIG_XILINX_VDMA) += xilinx_vdma.o
> +obj-$(CONFIG_XILINX_DMA) += xilinx_dma.o
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> new file mode 100644
> index 000..0500773
> --- /dev/null
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -0,0 +1,1225 @@
> +/*
> + * DMA driver for Xilinx DMA Engine
> + *
> + * Copyright (C) 2010 - 2014 Xilinx, Inc. All rights reserved.
> + *
> + * Based on the Freescale DMA driver.
> + *
> + * Description:
> + *  The AXI DMA is a soft IP that provides high-bandwidth Direct Memory
> + *  Access between memory and AXI4-Stream-type target peripherals. It can be
> + *  configured with one or two channels; with two channels, one transmits
> + *  data from memory to a device and the other receives data from a device.
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "../dmaengine.h"
> +
> +/* Register Offsets */
> +#define XILINX_DMA_REG_CONTROL 0x00
> +#define XILINX_DMA_REG_STATUS  0x04
> +#define XILINX_DMA_REG_CURDESC 0x08
> +#define XILINX_DMA_REG_TAILDESC 0x10
> +#define XILINX_DMA_REG_SRCADDR 0x18
> +#define XILINX_DMA_REG_DSTADDR 0x20
> +#define XILINX_DMA_REG_BTT 0x28
> +
> +/* Channel/Descriptor Offsets */
> +#define XILINX_DMA_MM2S_CTRL_OFFSET 0x00
> +#define XILINX_DMA_S2MM_CTRL_OFFSET 0x30
> +
> +/* General register bits definitions */
> +#define XILINX_DMA_CR_RUNSTOP_MASK BIT(0)
> +#define XILINX_DMA_CR_RESET_MASK   BIT(2)
> +
> +#define XILINX_DMA_CR_DELAY_SHIFT  24
> +#define XILINX_DMA_CR_COALESCE_SHIFT   16
> +
> +#define XILINX_DMA_CR_DELAY_MAX GENMASK(7, 0)
> +#define XILINX_DMA_CR_COALESCE_MAX GENMASK(7, 0)
> +
> +#define XILINX_DMA_SR_HALTED_MASK  BIT(0)
> +#define XILINX_DMA_SR_IDLE_MASK BIT(1)
> +
> +#define XILINX_DMA_XR_IRQ_IOC_MASK BIT(12)
> +#define XILINX_DMA_XR_IRQ_DELAY_MASK   BIT(13)
> +#define XILINX_DMA_XR_IRQ_ERROR_MASK   BIT(14)
> +#define XILINX_DMA_XR_IRQ_ALL_MASK GENMASK(14, 12)
> +
> +/* BD definitions */
> +#define XILINX_DMA_BD_STS_ALL_MASK GENMASK(31, 28)
> +#define XILINX_DMA_BD_SOP  BIT(27)
> +#define XILINX_DMA_BD_EOP  BIT(26)
> +
> +/* Hw specific definitions */
> +#define XILINX_DMA_MAX_CHANS_PER_DEVICE 0x2
> +#define XILINX_DMA_MAX_TRANS_LEN   GENMASK(22, 0)
> +
> +/* Delay loop counter to prevent hardware failure */
> +#define 

Re: [PATCH 1/1] Drivers: net-next: hyperv: Increase the size of the sendbuf region

2014-07-31 Thread David Miller
From: "K. Y. Srinivasan" 
Date: Wed, 30 Jul 2014 18:35:49 -0700

> For forwarding scenarios, it will be useful to allocate larger
> sendbuf. Make the necessary adjustments to permit this.
> 
> Signed-off-by: K. Y. Srinivasan 

This needs more information.

You're increasing the size by 16 times, 1MB --> 16MB, thus less
cache locality.

You're also now using vmalloc() memory, thus more TLB misses and
thrashing.

This must have a negative impact on performance, and you have to
test for that and quantify it when making a change as serious as
this one.

You also haven't gone into detail as to why forwarding scenarios
require more buffer space than, say, thousands of local sockets
sending bulk TCP data.

I'm not applying this, it needs a lot more work.


[PATCH v5 03/10] ARM: dts: Clean up exynos5250-snow

2014-07-31 Thread Andreas Färber
Use the new style of referencing inherited nodes and use symbolic names.
Reorder one pinctrl node in GPIO order.

Goal is the alignment of all exynos5250 based device trees for comparison.

Suggested-by: Doug Anderson 
Signed-off-by: Andreas Färber 
---
 v4 -> v5:
 * Introduced labels to consistently use new referencing style (Tomasz)
 * Use IRQ_TYPE_* constants
 * Use some more GPIO_ACTIVE_*

 v3 -> v4: Unchanged
 
 v3: New (Doug Anderson)

 arch/arm/boot/dts/exynos5250-snow.dts | 291 +-
 arch/arm/boot/dts/exynos5250.dtsi |  20 +--
 2 files changed, 155 insertions(+), 156 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index 1c36cd72905f..7680d5e03fb3 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -6,9 +6,12 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
-*/
+ */
 
 /dts-v1/;
+#include 
+#include 
+#include 
 #include "exynos5250.dtsi"
 
 / {
@@ -26,89 +29,19 @@
chosen {
};
 
-   rtc@101E {
-   status = "okay";
-   };
-
-   pinctrl@1140 {
-   ec_irq: ec-irq {
-   samsung,pins = "gpx1-6";
-   samsung,pin-function = <0>;
-   samsung,pin-pud = <0>;
-   samsung,pin-drv = <0>;
-   };
-
-   sd3_clk: sd3-clk {
-   samsung,pin-drv = <0>;
-   };
-
-   sd3_cmd: sd3-cmd {
-   samsung,pin-pud = <3>;
-   samsung,pin-drv = <0>;
-   };
-
-   sd3_bus4: sd3-bus-width4 {
-   samsung,pin-drv = <0>;
-   };
-
-   max98095_en: max98095-en {
-   samsung,pins = "gpx1-7";
-   samsung,pin-function = <0>;
-   samsung,pin-pud = <3>;
-   samsung,pin-drv = <0>;
-   };
-
-   tps65090_irq: tps65090-irq {
-   samsung,pins = "gpx2-6";
-   samsung,pin-function = <0>;
-   samsung,pin-pud = <0>;
-   samsung,pin-drv = <0>;
-   };
-
-   usb3_vbus_en: usb3-vbus-en {
-   samsung,pins = "gpx2-7";
-   samsung,pin-function = <1>;
-   samsung,pin-pud = <0>;
-   samsung,pin-drv = <0>;
-   };
-
-   hdmi_hpd_irq: hdmi-hpd-irq {
-   samsung,pins = "gpx3-7";
-   samsung,pin-function = <0>;
-   samsung,pin-pud = <1>;
-   samsung,pin-drv = <0>;
-   };
-   };
-
-   pinctrl@1340 {
-   arb_their_claim: arb-their-claim {
-   samsung,pins = "gpe0-4";
-   samsung,pin-function = <0>;
-   samsung,pin-pud = <3>;
-   samsung,pin-drv = <0>;
-   };
-
-   arb_our_claim: arb-our-claim {
-   samsung,pins = "gpf0-3";
-   samsung,pin-function = <1>;
-   samsung,pin-pud = <0>;
-   samsung,pin-drv = <0>;
-   };
-   };
-
gpio-keys {
compatible = "gpio-keys";
 
power {
label = "Power";
-   gpios = < 3 1>;
-   linux,code = <116>; /* KEY_POWER */
+   gpios = < 3 GPIO_ACTIVE_LOW>;
+   linux,code = ;
gpio-key,wakeup;
};
 
lid-switch {
label = "Lid";
-   gpios = < 5 1>;
+   gpios = < 5 GPIO_ACTIVE_LOW>;
linux,input-type = <5>; /* EV_SW */
linux,code = <0>; /* SW_LID */
debounce-interval = <1>;
@@ -129,8 +62,8 @@
 
i2c-parent = <&{/i2c@12CA}>;
 
-   our-claim-gpio = < 3 1>;
-   their-claim-gpios = < 4 1>;
+   our-claim-gpio = < 3 GPIO_ACTIVE_LOW>;
+   their-claim-gpios = < 4 GPIO_ACTIVE_LOW>;
slew-delay-us = <10>;
wait-retry-us = <3000>;
wait-free-us = <5>;
@@ -153,7 +86,7 @@
cros_ec: embedded-controller {
compatible = "google,cros-ec-i2c";
reg = <0x1e>;
-   interrupts = <6 0>;
+   interrupts = <6 IRQ_TYPE_NONE>;
interrupt-parent = <>;

[PATCH v5 05/10] ARM: dts: Move dp_hpd from exynos5250 into smdk5250 and snow

2014-07-31 Thread Andreas Färber
Spring uses a different GPIO, so this is not a generic SoC piece.

Suggested-by: Tomasz Figa 
Signed-off-by: Andreas Färber 
---
 v5: New (Tomasz Figa)
 Frees dp_hpd for Spring.

 arch/arm/boot/dts/exynos5250-pinctrl.dtsi | 7 ---
 arch/arm/boot/dts/exynos5250-smdk5250.dts | 9 +
 arch/arm/boot/dts/exynos5250-snow.dts | 7 +++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-pinctrl.dtsi b/arch/arm/boot/dts/exynos5250-pinctrl.dtsi
index 886cfca044ac..ed0e5230514b 100644
--- a/arch/arm/boot/dts/exynos5250-pinctrl.dtsi
+++ b/arch/arm/boot/dts/exynos5250-pinctrl.dtsi
@@ -581,13 +581,6 @@
samsung,pin-pud = <0>;
samsung,pin-drv = <0>;
};
-
-   dp_hpd: dp_hpd {
-   samsung,pins = "gpx0-7";
-   samsung,pin-function = <3>;
-   samsung,pin-pud = <0>;
-   samsung,pin-drv = <0>;
-   };
};
 
pinctrl@1340 {
diff --git a/arch/arm/boot/dts/exynos5250-smdk5250.dts b/arch/arm/boot/dts/exynos5250-smdk5250.dts
index aaa055ac0fe3..5d30fe1dcda4 100644
--- a/arch/arm/boot/dts/exynos5250-smdk5250.dts
+++ b/arch/arm/boot/dts/exynos5250-smdk5250.dts
@@ -414,3 +414,12 @@
};
};
 };
+
+_0 {
+   dp_hpd: dp_hpd {
+   samsung,pins = "gpx0-7";
+   samsung,pin-function = <3>;
+   samsung,pin-pud = <0>;
+   samsung,pin-drv = <0>;
+   };
+};
diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index c4b0c73c736d..a9a2f2743794 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -547,6 +547,13 @@
 };
 
 _0 {
+   dp_hpd: dp_hpd {
+   samsung,pins = "gpx0-7";
+   samsung,pin-function = <3>;
+   samsung,pin-pud = <0>;
+   samsung,pin-drv = <0>;
+   };
+
ec_irq: ec-irq {
samsung,pins = "gpx1-6";
samsung,pin-function = <0>;
-- 
1.9.3



[PATCH v5 06/10] ARM: dts: Clean up exynos5250-smdk5250

2014-07-31 Thread Andreas Färber
Use the new style for referencing inherited nodes and use symbolic names.

Goal is the alignment of all exynos5250 based device trees for comparison.

Signed-off-by: Andreas Färber 
---
 v5: New
 Follow-up after adding dp_hpd pinctrl node new-style.

 arch/arm/boot/dts/exynos5250-smdk5250.dts | 640 +++---
 arch/arm/boot/dts/exynos5250.dtsi |   4 +-
 2 files changed, 324 insertions(+), 320 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-smdk5250.dts b/arch/arm/boot/dts/exynos5250-smdk5250.dts
index 5d30fe1dcda4..81dc921a5e5e 100644
--- a/arch/arm/boot/dts/exynos5250-smdk5250.dts
+++ b/arch/arm/boot/dts/exynos5250-smdk5250.dts
@@ -7,9 +7,11 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
-*/
+ */
 
 /dts-v1/;
+#include 
+#include 
 #include "exynos5250.dtsi"
 
 / {
@@ -27,165 +29,6 @@
    bootargs = "root=/dev/ram0 rw ramdisk=8192 initrd=0x4100,8M console=ttySAC2,115200 init=/linuxrc";
};
 
-   rtc@101E {
-   status = "okay";
-   };
-
-   i2c@12C6 {
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <2>;
-   status = "okay";
-
-   eeprom@50 {
-   compatible = "samsung,s524ad0xd1";
-   reg = <0x50>;
-   };
-
-   max77686@09 {
-   compatible = "maxim,max77686";
-   reg = <0x09>;
-   interrupt-parent = <>;
-   interrupts = <2 0>;
-
-   voltage-regulators {
-   ldo1_reg: LDO1 {
-   regulator-name = "P1.0V_LDO_OUT1";
-   regulator-min-microvolt = <100>;
-   regulator-max-microvolt = <100>;
-   regulator-always-on;
-   };
-
-   ldo2_reg: LDO2 {
-   regulator-name = "P1.2V_LDO_OUT2";
-   regulator-min-microvolt = <120>;
-   regulator-max-microvolt = <120>;
-   regulator-always-on;
-   };
-
-   ldo3_reg: LDO3 {
-   regulator-name = "P1.8V_LDO_OUT3";
-   regulator-min-microvolt = <180>;
-   regulator-max-microvolt = <180>;
-   regulator-always-on;
-   };
-
-   ldo4_reg: LDO4 {
-   regulator-name = "P2.8V_LDO_OUT4";
-   regulator-min-microvolt = <280>;
-   regulator-max-microvolt = <280>;
-   };
-
-   ldo5_reg: LDO5 {
-   regulator-name = "P1.8V_LDO_OUT5";
-   regulator-min-microvolt = <180>;
-   regulator-max-microvolt = <180>;
-   };
-
-   ldo6_reg: LDO6 {
-   regulator-name = "P1.1V_LDO_OUT6";
-   regulator-min-microvolt = <110>;
-   regulator-max-microvolt = <110>;
-   regulator-always-on;
-   };
-
-   ldo7_reg: LDO7 {
-   regulator-name = "P1.1V_LDO_OUT7";
-   regulator-min-microvolt = <110>;
-   regulator-max-microvolt = <110>;
-   regulator-always-on;
-   };
-
-   ldo8_reg: LDO8 {
-   regulator-name = "P1.0V_LDO_OUT8";
-   regulator-min-microvolt = <100>;
-   regulator-max-microvolt = <100>;
-   };
-
-   ldo10_reg: LDO10 {
-   regulator-name = "P1.8V_LDO_OUT10";
-   regulator-min-microvolt = <180>;
-   regulator-max-microvolt = <180>;
-   };
-
-   ldo11_reg: LDO11 {
-   regulator-name = 

[PATCH v5 04/10] ARM: dts: Fill in bootargs for exynos5250-snow

2014-07-31 Thread Andreas Färber
exynos5250-cros-common.dtsi had an empty /chosen node.
Fill in exemplary boot arguments.

Signed-off-by: Andreas Färber 
---
 v5: New
 Cleanup for /chosen node moved into -snow.dts.

 arch/arm/boot/dts/exynos5250-snow.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index 7680d5e03fb3..c4b0c73c736d 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -27,6 +27,7 @@
};
 
chosen {
+   bootargs = "console=tty1";
};
 
gpio-keys {
-- 
1.9.3



[PATCH v5 07/10] ARM: dts: Clean up exynos5250-arndale

2014-07-31 Thread Andreas Färber
Use the new style of referencing inherited nodes, use symbolic names,
tidy indentation and reorder includes.

Goal is the alignment of all exynos5250 based device trees for comparison.

Signed-off-by: Andreas Färber 
---
 v5: New
 Aligns with SMDK.

 arch/arm/boot/dts/exynos5250-arndale.dts | 929 ---
 1 file changed, 466 insertions(+), 463 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts
index d0de1f50d15b..3a608f57f833 100644
--- a/arch/arm/boot/dts/exynos5250-arndale.dts
+++ b/arch/arm/boot/dts/exynos5250-arndale.dts
@@ -7,12 +7,13 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
-*/
+ */
 
 /dts-v1/;
-#include "exynos5250.dtsi"
+#include 
 #include 
 #include 
+#include "exynos5250.dtsi"
 
 / {
model = "Insignal Arndale evaluation board based on EXYNOS5250";
@@ -26,473 +27,52 @@
bootargs = "console=ttySAC2,115200";
};
 
-   rtc@101E {
-   status = "okay";
-   };
-
-   codec@1100 {
-   samsung,mfc-r = <0x4300 0x80>;
-   samsung,mfc-l = <0x5100 0x80>;
-   };
-
-   i2c@12C6 {
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <2>;
-   samsung,i2c-slave-addr = <0x66>;
-   status = "okay";
-
-   s5m8767_pmic@66 {
-   compatible = "samsung,s5m8767-pmic";
-   reg = <0x66>;
-   interrupt-parent = <>;
-   interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
-
-   vinb1-supply = <_dc_reg>;
-   vinb2-supply = <_dc_reg>;
-   vinb3-supply = <_dc_reg>;
-   vinb4-supply = <_dc_reg>;
-   vinb5-supply = <_dc_reg>;
-   vinb6-supply = <_dc_reg>;
-   vinb7-supply = <_dc_reg>;
-   vinb8-supply = <_dc_reg>;
-   vinb9-supply = <_dc_reg>;
-
-   vinl1-supply = <_reg>;
-   vinl2-supply = <_reg>;
-   vinl3-supply = <_reg>;
-   vinl4-supply = <_dc_reg>;
-   vinl5-supply = <_dc_reg>;
-   vinl6-supply = <_dc_reg>;
-   vinl7-supply = <_dc_reg>;
-   vinl8-supply = <_reg>;
-   vinl9-supply = <_reg>;
-
-   s5m8767,pmic-buck2-dvs-voltage = <130>;
-   s5m8767,pmic-buck3-dvs-voltage = <110>;
-   s5m8767,pmic-buck4-dvs-voltage = <120>;
-   s5m8767,pmic-buck-dvs-gpios = < 0 0>,
-   < 1 0>,
-   < 2 0>;
-   s5m8767,pmic-buck-ds-gpios = < 3 0>,
-   < 4 0>,
-   < 5 0>;
-   regulators {
-   ldo1_reg: LDO1 {
-   regulator-name = "VDD_ALIVE_1.0V";
-   regulator-min-microvolt = <110>;
-   regulator-max-microvolt = <110>;
-   regulator-always-on;
-   regulator-boot-on;
-   op_mode = <1>;
-   };
-
-   ldo2_reg: LDO2 {
-   regulator-name = "VDD_28IO_DP_1.35V";
-   regulator-min-microvolt = <120>;
-   regulator-max-microvolt = <120>;
-   regulator-always-on;
-   regulator-boot-on;
-   op_mode = <1>;
-   };
-
-   ldo3_reg: LDO3 {
-   regulator-name = "VDD_COMMON1_1.8V";
-   regulator-min-microvolt = <180>;
-   regulator-max-microvolt = <180>;
-   regulator-always-on;
-   regulator-boot-on;
-   op_mode = <1>;
-   };
-
-   ldo4_reg: LDO4 {
-   regulator-name = "VDD_IOPERI_1.8V";
-   regulator-min-microvolt = <180>;
-   

[PATCH v5 09/10] ARM: dts: Simplify USB3503 on exynos5250-arndale

2014-07-31 Thread Andreas Färber
There's no need for a simple-bus; place the smsc,usb3503a node directly into
the root node. That's what we're going to do on exynos5250-spring.

Reported-by: Tomasz Figa 
Signed-off-by: Andreas Färber 
---
 v5: New
 Aligns with Spring's new USB3503 node.

 arch/arm/boot/dts/exynos5250-arndale.dts | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts
index a04a875346aa..9912d27492db 100644
--- a/arch/arm/boot/dts/exynos5250-arndale.dts
+++ b/arch/arm/boot/dts/exynos5250-arndale.dts
@@ -108,18 +108,12 @@
};
};
 
-   usb_hub_bus {
-   compatible = "simple-bus";
-   #address-cells = <1>;
-   #size-cells = <0>;
+   // SMSC USB3503 connected in hardware only mode as a PHY
+   usb_hub: usb-hub {
+   compatible = "smsc,usb3503a";
 
-   // SMSC USB3503 connected in hardware only mode as a PHY
-   usb_hub: usb_hub {
-   compatible = "smsc,usb3503a";
-
-   reset-gpios = < 5 GPIO_ACTIVE_LOW>;
-   connect-gpios = < 7 GPIO_ACTIVE_LOW>;
-   };
+   reset-gpios = < 5 GPIO_ACTIVE_LOW>;
+   connect-gpios = < 7 GPIO_ACTIVE_LOW>;
};
 };
 
-- 
1.9.3



[PATCH v5 02/10] ARM: dts: Fold exynos5250-cros-common into exynos5250-snow

2014-07-31 Thread Andreas Färber
exynos5250-cros-common.dtsi was meant for sharing common pieces across
ChromeOS devices. This turned out to be premature, as several devices ended
up in the common file that are not common after all. Since the remaining
common ChromeOS pieces are fairly minor, exynos5250-cros-common.dtsi
was requested to be merged into the Snow device tree, sharing only the
keyboard controller for now. This may be re-evaluated as both device trees
mature.

Suggested-by: Doug Anderson 
Reviewed-by: Tomasz Figa 
Signed-off-by: Andreas Färber 
---
 v4 -> v5:
 * Extended commit message (Tomasz Figa)

 v3 -> v4: Unchanged
 
 v2 -> v3:
 * Renamed subject to match Kukjin's style
 * Rebased onto MMC pinctrl bug fix (Doug Anderson)
 
 v2: New (Doug Anderson)

 arch/arm/boot/dts/exynos5250-cros-common.dtsi | 164 --
 arch/arm/boot/dts/exynos5250-snow.dts | 164 +++---
 2 files changed, 145 insertions(+), 183 deletions(-)
 delete mode 100644 arch/arm/boot/dts/exynos5250-cros-common.dtsi

diff --git a/arch/arm/boot/dts/exynos5250-cros-common.dtsi b/arch/arm/boot/dts/exynos5250-cros-common.dtsi
deleted file mode 100644
index e603e9c70142..
--- a/arch/arm/boot/dts/exynos5250-cros-common.dtsi
+++ /dev/null
@@ -1,164 +0,0 @@
-/*
- * Common device tree include for all Exynos 5250 boards based off of Daisy.
- *
- * Copyright (c) 2012 Google, Inc
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
-*/
-
-/ {
-   aliases {
-   };
-
-   memory {
-   reg = <0x4000 0x8000>;
-   };
-
-   chosen {
-   };
-
-   pinctrl@1140 {
-   /*
-* Disabled pullups since external part has its own pullups and
-* double-pulling gets us out of spec in some cases.
-*/
-   i2c2_bus: i2c2-bus {
-   samsung,pin-pud = <0>;
-   };
-   };
-
-   i2c@12C6 {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <378000>;
-   };
-
-   i2c@12C7 {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <378000>;
-   };
-
-   i2c@12C8 {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <66000>;
-
-   hdmiddc@50 {
-   compatible = "samsung,exynos4210-hdmiddc";
-   reg = <0x50>;
-   };
-   };
-
-   i2c@12C9 {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <66000>;
-   };
-
-   i2c@12CA {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <66000>;
-   };
-
-   i2c@12CB {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <66000>;
-   };
-
-   i2c@12CD {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <66000>;
-   };
-
-   i2c@12CE {
-   status = "okay";
-   samsung,i2c-sda-delay = <100>;
-   samsung,i2c-max-bus-freq = <378000>;
-
-   hdmiphy: hdmiphy@38 {
-   compatible = "samsung,exynos4212-hdmiphy";
-   reg = <0x38>;
-   };
-   };
-
-   mmc@1220 {
-   num-slots = <1>;
-   supports-highspeed;
-   broken-cd;
-   card-detect-delay = <200>;
-   samsung,dw-mshc-ciu-div = <3>;
-   samsung,dw-mshc-sdr-timing = <2 3>;
-   samsung,dw-mshc-ddr-timing = <1 2>;
-   pinctrl-names = "default";
-   pinctrl-0 = <_clk _cmd _cd _bus4 _bus8>;
-
-   slot@0 {
-   reg = <0>;
-   bus-width = <8>;
-   };
-   };
-
-   mmc@1222 {
-   num-slots = <1>;
-   supports-highspeed;
-   card-detect-delay = <200>;
-   samsung,dw-mshc-ciu-div = <3>;
-   samsung,dw-mshc-sdr-timing = <2 3>;
-   samsung,dw-mshc-ddr-timing = <1 2>;
-   pinctrl-names = "default";
-   pinctrl-0 = <_clk _cmd _cd _bus4>;
-
-   slot@0 {
-   reg = <0>;
-   bus-width = <4>;
-   wp-gpios = < 1 0>;
-   };
-   };
-
-   mmc@1223 {
-   num-slots = <1>;
-   supports-highspeed;
-   broken-cd;
-   card-detect-delay = <200>;
-

[PATCH v5 10/10] ARM: dts: Add exynos5250-spring device tree

2014-07-31 Thread Andreas Färber
Adds initial support for the HP Chromebook 11.

Cc: Vincent Palatin 
Cc: Doug Anderson 
Cc: Stephan van Schaik 
Signed-off-by: Andreas Färber 
---
 v4 -> v5:
 * Dropped bogus USB3 regulator (Vincent Palatin, Tomasz Figa)
 * Fixed USB3503 reset GPIO (Tomasz Figa)
 * Introduced labels to use new referencing style consistently (Tomasz Figa)
 * Don't override dp_hpd, moved to pinctrl_0 instead (Tomasz Figa)
 * mmc_1: Added comment from Snow's mmc_3 (Tomasz Figa / Doug Anderson)
 * Override /codec samsung,mfc-{l,r} properties for alignment with Arndale
 * Use more GPIO_ACTIVE_* constants
 * Use IRQ_TYPE_* constants
 * Dropped s5m_ prefix for s5m8767 LDO regulator labels (max77686 is gone)
 * Also labeled all s5m8767 BUCK regulators
 
 v3 -> v4:
 * Fixed samsung,pin-function 1 -> 0 for dp-hpd-gpio
 * Replaced dp-hpd-gpio with existing dp_hpd, overriding it
 
 v2 -> v3:
 * Use GPIO_ACTIVE_{LOW,HIGH} (Doug Anderson)
 * Use symbolic KEY_POWER instead of comment
 * Moved hsic_reset to new USB3503 node's reset-gpios (Vincent Palatin)
 * Use dp_hpd_gpio for dp-controller (Doug Anderson, Ajay Kumar)
 * Override sd1_{clk,cmd,cd,bus4} pinctrl similar to Snow (Doug Anderson)
 * Added ec_irq pinctrl for cros_ec (Doug Anderson)
 * Reordered nodes to minimize diff against Snow (Doug Anderson)
 * Dropped obsolete mmc_2 override (Doug Anderson)
 * Added lid-switch node (Doug Anderson)
 * Added gpio-keys pinctrl (Doug Anderson)
 * Added bootargs to avoid empty /chosen node and to document console setting
 * Renamed s5m8767_pmic node to avoid underscore
 * Use new style for overriding inherited pinctrl nodes, too
 * Enable i2s0 node
 
 v1 -> v2:
 * Use label-based overriding/extension of nodes. (Doug Anderson)
 * Dropped tps65090 for now, until we know where to place it.
 * Dropped non-Spring nodes from -cros-common.dtsi rather than disabling them.
 * Enabled a missing MMC node for access to internal storage.
 * Dropped display-timings from dp-controller node. (Ajay Kumar)

 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/exynos5250-spring.dts | 536 
 2 files changed, 537 insertions(+)
 create mode 100644 arch/arm/boot/dts/exynos5250-spring.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 80a781f76e88..dec4c292f63d 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -76,6 +76,7 @@ dtb-$(CONFIG_ARCH_EXYNOS) += exynos4210-origen.dtb \
exynos5250-arndale.dtb \
exynos5250-smdk5250.dtb \
exynos5250-snow.dtb \
+   exynos5250-spring.dtb \
exynos5260-xyref5260.dtb \
exynos5410-smdk5410.dtb \
exynos5420-arndale-octa.dtb \
diff --git a/arch/arm/boot/dts/exynos5250-spring.dts b/arch/arm/boot/dts/exynos5250-spring.dts
new file mode 100644
index ..108e3a9002e7
--- /dev/null
+++ b/arch/arm/boot/dts/exynos5250-spring.dts
@@ -0,0 +1,536 @@
+/*
+ * Google Spring board device tree source
+ *
+ * Copyright (c) 2013 Google, Inc
+ * Copyright (c) 2014 SUSE LINUX Products GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/dts-v1/;
+#include 
+#include 
+#include 
+#include "exynos5250.dtsi"
+
+/ {
+   model = "Google Spring";
+   compatible = "google,spring", "samsung,exynos5250", "samsung,exynos5";
+
+   memory {
+   reg = <0x4000 0x8000>;
+   };
+
+   chosen {
+   bootargs = "console=tty1";
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+   pinctrl-names = "default";
+   pinctrl-0 = <_key_irq>, <_irq>;
+
+   power {
+   label = "Power";
+   gpios = < 3 GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   gpio-key,wakeup;
+   };
+
+   lid-switch {
+   label = "Lid";
+   gpios = < 5 GPIO_ACTIVE_LOW>;
+   linux,input-type = <5>; /* EV_SW */
+   linux,code = <0>; /* SW_LID */
+   debounce-interval = <1>;
+   gpio-key,wakeup;
+   };
+   };
+
+   usb-hub {
+   compatible = "smsc,usb3503a";
+   reset-gpios = < 0 GPIO_ACTIVE_LOW>;
+   };
+
+   fixed-rate-clocks {
+   xxti {
+   compatible = "samsung,clock-xxti";
+   clock-frequency = <2400>;
+   };
+   };
+};
+
+ {
+   samsung,mfc-r = <0x4300 0x80>;
+   samsung,mfc-l = <0x5100 0x80>;
+};
+
+ {
+   status = "okay";
+   pinctrl-names = "default";
+   pinctrl-0 = <_hpd>;
+   samsung,color-space = <0>;
+   samsung,dynamic-range = <0>;
+   samsung,ycbcr-coeff = <0>;
+   samsung,color-depth = 

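The gpio-keys node above reports the power button as a key and the lid as a switch (linux,input-type = <5> with the EV_SW and SW_LID comments). Below is a minimal userspace sketch of how a consumer might classify such events. It assumes the standard <linux/input.h> uapi header; the enum and function names are hypothetical and not part of any kernel or board API.

```c
#include <assert.h>
#include <linux/input.h>  /* struct input_event, EV_KEY, EV_SW, KEY_POWER, SW_LID */
#include <string.h>

/* Hypothetical classification of the events this board's gpio-keys may emit. */
enum board_event {
	BOARD_OTHER,
	BOARD_POWER_PRESS,
	BOARD_LID_CLOSED,
	BOARD_LID_OPENED,
};

enum board_event classify_event(const struct input_event *ev)
{
	if (ev->type == EV_KEY && ev->code == KEY_POWER && ev->value == 1)
		return BOARD_POWER_PRESS;               /* value 1 = key pressed */
	if (ev->type == EV_SW && ev->code == SW_LID)
		return ev->value ? BOARD_LID_CLOSED      /* switch set = lid shut */
				 : BOARD_LID_OPENED;
	return BOARD_OTHER;
}

/* Convenience wrapper that builds an event from raw type/code/value. */
enum board_event classify_raw(unsigned short type, unsigned short code, int value)
{
	struct input_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.type = type;
	ev.code = code;
	ev.value = value;
	return classify_event(&ev);
}
```

A real reader would obtain these events with read() on the matching /dev/input/event* node; that part is left out to keep the sketch self-contained.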
[PATCH v5 08/10] ARM: dts: Fix apparent GPIO typo in exynos5250-arndale

2014-07-31 Thread Andreas Färber
The GPIO flag 2 has no constant assigned, so this was probably active-low.

Signed-off-by: Andreas Färber 
---
 v5: New
 Spotted during cleanup.

 arch/arm/boot/dts/exynos5250-arndale.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts
index 3a608f57f833..a04a875346aa 100644
--- a/arch/arm/boot/dts/exynos5250-arndale.dts
+++ b/arch/arm/boot/dts/exynos5250-arndale.dts
@@ -164,7 +164,7 @@
 };
 
  {
-   hpd-gpio = < 7 2>;
+   hpd-gpio = < 7 GPIO_ACTIVE_LOW>;
vdd_osc-supply = <_reg>;
vdd_pll-supply = <_reg>;
vdd-supply = <_reg>;
-- 
1.9.3

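The commit message argues that a raw flags cell of 2 has no named constant, so it was probably meant as active-low. A small sketch of that reasoning follows. It assumes the dt-bindings values GPIO_ACTIVE_HIGH = 0 and GPIO_ACTIVE_LOW = 1 (copied here under hypothetical DT_ names) and a consumer that tests only the active-low bit. Under those assumptions, a literal 2 would silently be treated as active-high.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the dt-bindings GPIO flag values of the time. */
#define DT_GPIO_ACTIVE_HIGH 0u
#define DT_GPIO_ACTIVE_LOW  1u

/* Hypothetical helper: active-low according to the flags cell (bit 0 only). */
bool dt_gpio_is_active_low(uint32_t flags)
{
	return (flags & DT_GPIO_ACTIVE_LOW) != 0;
}

/* Hypothetical helper: does the cell use only bits that have a named constant? */
bool dt_gpio_flags_known(uint32_t flags)
{
	return (flags & ~DT_GPIO_ACTIVE_LOW) == 0;
}
```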
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 01/10] ARM: dts: Fix MMC pinctrl for exynos5250-snow

2014-07-31 Thread Andreas Färber
The pinctrl properties should be on the device directly and not on the
slot sub-node.

Reported-by: Doug Anderson 
Cc: Jaehoon Chung 
Reviewed-by: Tomasz Figa 
Signed-off-by: Andreas Färber 
---
 v3 -> v4 -> v5: Unchanged
 
 v3: New (Doug Anderson)
 Redundant with Jaehoon Chung's general slot@0 deprecation,
 in case that hits the tree earlier.

 arch/arm/boot/dts/exynos5250-snow.dts | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index f2b8c4116541..eb437f6afec1 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -240,10 +240,8 @@
 */
mmc@1223 {
status = "okay";
-   slot@0 {
-   pinctrl-names = "default";
-   pinctrl-0 = <_clk _cmd _bus4>;
-   };
+   pinctrl-names = "default";
+   pinctrl-0 = <_clk _cmd _bus4>;
};
 
i2c@12CD {
-- 
1.9.3



Re: [PATCH] xen-netback: Turn off the carrier if the guest is not able to receive

2014-07-31 Thread David Miller
From: Zoltan Kiss 
Date: Wed, 30 Jul 2014 20:50:49 +0100

> Currently, when the guest is not able to receive more packets, the qdisc
> layer starts a timer, and when it goes off, qdisc is started again to
> deliver a packet. This is a very slow way to drain the queues; it consumes
> unnecessary resources and slows down other guests' shutdown.
> This patch changes the behaviour by turning the carrier off when that timer
> fires, so all the packets that were stuck waiting for that vif are freed up.
> Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit
> to signal the thread that either the timeout happened or an RX interrupt
> arrived, so the thread can check what it should do. It also disables NAPI,
> so the guest can't transmit, but leaves the interrupts on, so it can
> resurrect.
> 
> Signed-off-by: Zoltan Kiss 
> Signed-off-by: David Vrabel 

When posting a multi-part patch set, number your patches and have a header
"[PATCH 0/N] " posting which describes at a high level what the patch
series is doing, and why.

> + for (i = 0; i < num_queues; ++i) {

Please use the more canonical "i++" increment.

Thanks.
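The quoted description replaces the rx_queue_purge bool with one VIF_STATUS_RX_PURGE_EVENT status bit. Either the stall timer or an RX interrupt sets that bit, and the kthread later consumes it. Below is a minimal userspace sketch of that set-then-test-and-clear pattern using C11 atomics. The bit position and function names are hypothetical; kernel code would use set_bit() and test_and_clear_bit() rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position standing in for VIF_STATUS_RX_PURGE_EVENT. */
#define RX_PURGE_EVENT_BIT 0u

static atomic_ulong vif_status = 0;

/* Called from the timer callback or the RX interrupt path. */
void signal_purge_event(void)
{
	atomic_fetch_or(&vif_status, 1ul << RX_PURGE_EVENT_BIT);
}

/* Called by the kthread: true if at least one signal arrived since the last call. */
bool consume_purge_event(void)
{
	unsigned long old = atomic_fetch_and(&vif_status, ~(1ul << RX_PURGE_EVENT_BIT));

	return (old & (1ul << RX_PURGE_EVENT_BIT)) != 0;
}
```

Several signals arriving before the thread runs coalesce into a single event, which is why the thread must re-check the queue state itself rather than count wakeups.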


linux-next: manual merge of the kvm tree with the ftrace tree

2014-07-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in
arch/x86/kvm/mmutrace.h between commit 7b039cb4c5a9 ("tracing: Add
trace_seq_buffer_ptr() helper function") from the ftrace tree and
commit 42cbc04fd3b5 ("x86/kvm: Resolve shadow warnings in macro
expansion") from the kvm tree.

I fixed it up (I dropped the ftrace tree's change to this file) and can
carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH net-next v3 4/4 RFC] pktgen: Allow sending IPv4 TCP packets

2014-07-31 Thread David Miller
From: Zoltan Kiss 
Date: Wed, 30 Jul 2014 17:20:12 +0100

> This is a prototype patch to enable sending IPv4 TCP packets with pktgen. The
> original motivation is to test TCP GSO with xen-netback/netfront, but I'm not
> sure about how the checksum should be set up, and also someone should verify
> the GSO settings I'm using.
> 
> Signed-off-by: Zoltan Kiss 
> Cc: "David S. Miller" 
> Cc: Thomas Graf 
> Cc: Joe Perches 
> Cc: net...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-de...@lists.xenproject.org
> ---
> v3:
> - mention explicitly that this is for IPv4
> - memset the TCP header and set up doff
> - rework of checksum handling and GSO setting in fill_packet_ipv4
> - bail out in pktgen_xmit if the device won't be able to handle GSO
> 
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 0d0aaac..9d93bda 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -162,6 +162,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #ifdef CONFIG_XFRM
> @@ -203,6 +204,7 @@
>  #define F_NODE  (1<<15)  /* Node memory alloc*/
>  #define F_UDPCSUM   (1<<16)  /* Include UDP checksum */
>  #define F_PATTERN   (1<<17)  /* Fill the payload with a pattern */
> +#define F_TCP   (1<<18)  /* Send TCP packet instead of UDP */
>  
>  /* Thread control flag bits */
>  #define T_STOP(1<<0) /* Stop run */
> @@ -664,6 +666,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
>   if (pkt_dev->flags & F_PATTERN)
>   seq_puts(seq, "PATTERN  ");
>  
> + if (pkt_dev->flags & F_TCP)
> + seq_puts(seq, "TCP  ");
> +
>   if (pkt_dev->flags & F_MPLS_RND)
>   seq_puts(seq,  "MPLS_RND  ");
>  
> @@ -1342,6 +1347,12 @@ static ssize_t pktgen_if_write(struct file *file,
>   else if (strcmp(f, "!PATTERN") == 0)
>   pkt_dev->flags &= ~F_PATTERN;
>  
> + else if (strcmp(f, "TCP") == 0)
> + pkt_dev->flags |= F_TCP;
> +
> + else if (strcmp(f, "!TCP") == 0)
> + pkt_dev->flags &= ~F_TCP;
> +
>   else {
>   sprintf(pg_result,
>   "Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
> @@ -2955,7 +2966,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
>  {
>   struct sk_buff *skb = NULL;
>   __u8 *eth;
> - struct udphdr *udph;
> + struct udphdr *udph = NULL;
> + struct tcphdr *tcph;
>   int datalen, iplen;
>   struct iphdr *iph;
>   __be16 protocol = htons(ETH_P_IP);
> @@ -3017,29 +3029,40 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
>   iph = (struct iphdr *) skb_put(skb, sizeof(struct iphdr));
>  
>   skb_set_transport_header(skb, skb->len);
> - udph = (struct udphdr *) skb_put(skb, sizeof(struct udphdr));
> +
> + if (pkt_dev->flags & F_TCP) {
> + datalen = pkt_dev->cur_pkt_size - ETH_HLEN - 20 -
> +   sizeof(struct tcphdr) - pkt_dev->pkt_overhead;
> + tcph = (struct tcphdr *)skb_put(skb, sizeof(struct tcphdr));
> + memset(tcph, 0, sizeof(*tcph));
> + tcph->source = htons(pkt_dev->cur_udp_src);
> + tcph->dest = htons(pkt_dev->cur_udp_dst);
> + tcph->doff = sizeof(struct tcphdr) >> 2;
> + } else {
> + datalen = pkt_dev->cur_pkt_size - ETH_HLEN - 20 -
> +   sizeof(struct udphdr) - pkt_dev->pkt_overhead;
> + udph = (struct udphdr *)skb_put(skb, sizeof(struct udphdr));
> + udph->source = htons(pkt_dev->cur_udp_src);
> + udph->dest = htons(pkt_dev->cur_udp_dst);
> + udph->len = htons(datalen + sizeof(struct udphdr));
> + udph->check = 0;
> + }
> +

As more protocols (SCTP, etc.) get supported, this is going to become
completely unmanageable.  Please use callbacks or something like that
so this function doesn't turn into even more spaghetti.
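One way to follow this suggestion is a table of per-protocol operations that fill_packet_ipv4() could index instead of growing another if/else branch for each protocol. Below is a minimal, hypothetical sketch in plain C: pktgen has no such structure, and the headers are simplified to ports, UDP length and TCP data offset. Each entry carries its header length and a fill callback.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-protocol ops; not an existing pktgen structure. */
struct l4_ops {
	const char *name;
	size_t hdr_len;
	/* Fill the transport header at 'hdr'; 'datalen' is the payload length. */
	void (*fill)(uint8_t *hdr, uint16_t sport, uint16_t dport, size_t datalen);
};

static void fill_udp(uint8_t *hdr, uint16_t sport, uint16_t dport, size_t datalen)
{
	uint16_t v;

	memset(hdr, 0, 8);                        /* 8-byte UDP header, checksum 0 */
	v = htons(sport);
	memcpy(hdr + 0, &v, 2);
	v = htons(dport);
	memcpy(hdr + 2, &v, 2);
	v = htons((uint16_t)(datalen + 8));       /* length includes the header */
	memcpy(hdr + 4, &v, 2);
}

static void fill_tcp(uint8_t *hdr, uint16_t sport, uint16_t dport, size_t datalen)
{
	uint16_t v;

	(void)datalen;                            /* TCP has no length field */
	memset(hdr, 0, 20);                       /* minimal TCP header, no options */
	v = htons(sport);
	memcpy(hdr + 0, &v, 2);
	v = htons(dport);
	memcpy(hdr + 2, &v, 2);
	hdr[12] = (20 / 4) << 4;                  /* data offset in 32-bit words */
}

static const struct l4_ops l4_table[] = {
	{ "udp", 8,  fill_udp },
	{ "tcp", 20, fill_tcp },
};

/* Look up ops by name; NULL means the protocol is not supported. */
const struct l4_ops *l4_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(l4_table) / sizeof(l4_table[0]); i++)
		if (strcmp(l4_table[i].name, name) == 0)
			return &l4_table[i];
	return NULL;
}
```

With this layout, supporting SCTP would mean adding one table entry and one fill function, leaving the common IPv4 path untouched.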

> + } else if (pkt_dev->flags & F_TCP) {
> + struct inet_sock inet;
> +
> + inet.inet_saddr = iph->saddr;
> + inet.inet_daddr = iph->daddr;
> + skb->ip_summed = CHECKSUM_NONE;
> + tcp_v4_send_check((struct sock *), skb);

Please don't do things like this.  Making fake sockets on the stack, don't
do it.

Do other non-socket contexts compute TCP checksums this way?  Check
netfilter or similar, see what they do.

Worst case, export __tcp_v4_send_check() or just duplicate its contents
in the tcp case here.
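Without a fake socket, the TCP checksum can be computed directly over the IPv4 pseudo-header followed by the segment. Below is a self-contained userspace sketch of that computation: the RFC 1071 one's-complement sum plus the RFC 793 pseudo-header (source, destination, zero, protocol 6, TCP length). It assumes addresses are given as network-order bytes. It does not reproduce the kernel's checksum-offload handling, and the function names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 'len' bytes to a running one's-complement sum as big-endian 16-bit words. */
uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)                        /* odd trailing byte is padded with zero */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Fold carries back into 16 bits and complement (RFC 1071). */
uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * TCP checksum over the IPv4 pseudo-header plus the TCP header and payload.
 * To compute a checksum, zero bytes 16-17 of 'seg' first and store the result
 * big-endian there; recomputing over a segment with a valid checksum gives 0.
 */
uint16_t tcp_v4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
			 const uint8_t *seg, size_t seg_len)
{
	uint32_t sum = 0;

	sum = csum_add_bytes(sum, saddr, 4);
	sum = csum_add_bytes(sum, daddr, 4);
	sum += 6;                       /* zero byte + IPPROTO_TCP */
	sum += (uint32_t)seg_len;       /* TCP length (header + data) */
	sum = csum_add_bytes(sum, seg, seg_len);
	return csum_fold(sum);
}
```

The first helper reproduces the worked example from RFC 1071: the bytes 00 01 f2 03 f4 f5 f6 f7 sum to 0xddf2, giving a checksum of 0x220d.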


Re: [PATCH 0/4] LLVMLinux: Patches to enable the kernel to be compiled with clang/LLVM

2014-07-31 Thread Behan Webster

On 07/31/14 03:33, Will Deacon wrote:

> On Thu, Jul 31, 2014 at 12:57:25AM +0100, beh...@converseincode.com wrote:
>> From: Behan Webster 
>>
>> This patch set moves from using locally defined named registers to access the
>> stack pointer to using a globally defined named register. This allows the code
>> to work both with gcc and clang.
>>
>> The LLVMLinux project aims to fully build the Linux kernel using both gcc and
>> clang (the C front end for the LLVM compiler infrastructure project).
>>
>> Behan Webster (4):
>>    arm64: LLVMLinux: Add current_stack_pointer() for arm64
>>    arm64: LLVMLinux: Use current_stack_pointer in save_stack_trace_tsk
>>    arm64: LLVMLinux: Calculate current_thread_info from current_stack_pointer
>>    arm64: LLVMLinux: Use current_stack_pointer in kernel/traps.c
>
> Once Andreas's comments have been addressed:
>
>    Acked-by: Will Deacon 
>
> Please can you send a new series after the merge window?

Pity. I was hoping to get it in this merge window.

However, will resubmit for 3.18.

Thanks,

Behan

--
Behan Webster
beh...@converseincode.com



[PATCH v3] kbuild, LLVMLinux: Suppress warnings unless W=1-3

2014-07-31 Thread behanw
From: Behan Webster 

clang has more warnings enabled by default. Turn them off unless W is set.
This patch fixes a logic bug where warnings in clang were disabled when W was set.

Signed-off-by: Behan Webster 
Signed-off-by: Jan-Simon Möller 
Signed-off-by: Mark Charlebois 
Cc: mma...@suse.cz
Cc: b...@alien8.de
---
 Makefile   |  1 +
 scripts/Makefile.extrawarn | 22 --
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/Makefile b/Makefile
index f6a7794..f343e17 100644
--- a/Makefile
+++ b/Makefile
@@ -668,6 +668,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, tautological-compare)
 # source of a reference will be _MergedGlobals and not on of the whitelisted names.
 # See modpost pattern 2
 KBUILD_CFLAGS += $(call cc-option, -mno-global-merge,)
+KBUILD_CFLAGS += $(call cc-option, -fcatch-undefined-behavior)
 else
 
 # This warning generated too much noise in a regular build.
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 6564350..4315d34 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -26,16 +26,6 @@ warning-1 += $(call cc-option, -Wmissing-include-dirs)
 warning-1 += $(call cc-option, -Wunused-but-set-variable)
 warning-1 += $(call cc-disable-warning, missing-field-initializers)
 
-# Clang
-warning-1 += $(call cc-disable-warning, initializer-overrides)
-warning-1 += $(call cc-disable-warning, unused-value)
-warning-1 += $(call cc-disable-warning, format)
-warning-1 += $(call cc-disable-warning, unknown-warning-option)
-warning-1 += $(call cc-disable-warning, sign-compare)
-warning-1 += $(call cc-disable-warning, format-zero-length)
-warning-1 += $(call cc-disable-warning, uninitialized)
-warning-1 += $(call cc-option, -fcatch-undefined-behavior)
-
 warning-2 := -Waggregate-return
 warning-2 += -Wcast-align
 warning-2 += -Wdisabled-optimization
@@ -64,4 +54,16 @@ ifeq ("$(strip $(warning))","")
 endif
 
 KBUILD_CFLAGS += $(warning)
+else
+
+ifeq ($(COMPILER),clang)
+KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
+KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
+KBUILD_CFLAGS += $(call cc-disable-warning, format)
+KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
+KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
+KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
+KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
 endif
+endif
+
-- 
1.9.1



Re: [PATCH v2] kbuild, LLVMLinux: Suppress warnings unless W=1-3

2014-07-31 Thread Behan Webster

On 07/31/14 13:46, Michal Marek wrote:
> On 31.7.2014 18:12, Behan Webster wrote:
>> On 07/31/14 01:18, Michal Marek wrote:
>>> On 31.7.2014 06:16, beh...@converseincode.com wrote:
>>>> @@ -55,6 +45,18 @@ warning-3 += -Wswitch-default
>>>>  warning-3 += $(call cc-option, -Wpacked-bitfield-compat)
>>>>  warning-3 += $(call cc-option, -Wvla)
>>>>  
>>>> +ifeq ($(COMPILER),clang)
>>>> +ifndef $(W)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, format)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
>>>> +endif
>>>> +endif
>>>> +
>>>
>>> Please remove this part, it has no effect. I assume that if it works for
>>> you, these warnings are not as annoying, so they do not need to be
>>> disabled?
>>
>> Actually they are annoying; that's why they're disabled normally. Most
>> of them complain about practices which are relatively common in kernel
>> code.
>>
>> clang warns about a lot more things than gcc does. This means that code
>> which compiles cleanly with gcc often doesn't with clang. This cuts out
>> the warnings which are unlikely to be fixed in kernel code anytime
>> soon, but which are probably worth exposing when W=1 is used.
>>
>> This part of the patch explicitly deals with complaints from some in the
>> kernel community that clang is too noisy with kernel code.
>>
>> This part of the patch needs to be somewhere. This seemed the best place.
>
> You placed it inside a branch that is only evaluated when W= is given.

Hmm. You're right. Will fix.

Behan

--
Behan Webster
beh...@converseincode.com



Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
> On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > > On Tue, 29 Jul 2014 13:24:05 +0800
> > > Aaron Lu  wrote:
> > > 
> > > > FYI, we noticed the below changes on
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure task_numa_migrate() checks the preferred node")
> > > > 
> > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e  
> > > > ---------------  -------------------------  
> > > >      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > >      67745 ~ 4%     +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > >     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > > 
> > > Hi Aaron,
> > > 
> > > Jirka Hladky has reported a regression with that changeset as
> > > well, and I have already spent some time debugging the issue.
> > 
> > So assuming those numbers above are the difference in
> 
> Yes, they are.
> 
> It means that, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 on the ivb42 machine and 67745 on the
> lkp-snb01 machine. The 3% and 4% following those numbers are the deviation
> of the different runs from their average (we usually run each test multiple
> times to smooth out outliers). We should probably remove those
> percentages, as they cause confusion without a detailed explanation and may
> not mean much to the commit author and others (if the deviation is big
> enough, we should simply drop that result).
> 
> The percentage in the middle is the change between the two commits.
> 
> Another thing is the meaning of the numbers: it isn't evident that
> they are for proc-vmstat.numa_hint_faults_local. Maybe something
> like this is better?

Instead of removing info, why not document what each piece of data
represents, or add headers to the table, etc.?

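Documenting the columns could be as simple as stating the formulas behind them. Below is a small sketch of how the figures appear to relate, based on Aaron's explanation. It assumes the middle column is the relative change of the mean between the two commits, and that "~ N%" is the run-to-run relative standard deviation (population form). Both definitions are inferred from the thread rather than taken from the LKP tooling, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage change from the base commit's mean to the new commit's mean. */
double pct_change(double base, double head)
{
	return (head - base) / base * 100.0;
}

/* Square root by Newton iteration, to avoid depending on libm. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;   /* start at or above sqrt(x) */

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 100; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Relative standard deviation (population) of n runs, in percent of the mean. */
double rel_stddev_pct(const double *runs, size_t n)
{
	double mean = 0.0, var = 0.0;

	for (size_t i = 0; i < n; i++)
		mean += runs[i];
	mean /= (double)n;
	if (mean == 0.0)
		return 0.0;
	for (size_t i = 0; i < n; i++)
		var += (runs[i] - mean) * (runs[i] - mean);
	var /= (double)n;
	return sqrt_newton(var) / mean * 100.0;
}
```

With the ivb42 figures, pct_change(94500, 203711) is about 115.57, which matches the +115.6% shown in the table.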
Thanks,
Davidlohr



[GIT PULL] ARM: Straggler SoC fix for 3.16

2014-07-31 Thread Olof Johansson
Hi Linus,

The following changes since commit a1ae5b128365f36a3fa2143cfa9de14fc71c51d8:

  Merge tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes (2014-07-29 13:04:27 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/fixes-for-linus

for you to fetch changes up to b779b88df8370feafa6f49d08a2cc88db4834992:

  MAINTAINERS: Update Tegra Git URL (2014-07-30 12:50:54 -0700)


ARM: Straggler SoC fix for 3.16

A DT bugfix for Nomadik that had an ambiguous double inversion of a GPIO
line, and one MAINTAINERS URL update that might as well go in now.

We could hold off until the merge window, but then we would just have to
mark the DT fix for stable, and in total that seems like more work.


Andreas Färber (1):
  MAINTAINERS: Update Tegra Git URL

Linus Walleij (1):
  ARM: nomadik: fix up double inversion in DT

 MAINTAINERS| 2 +-
 arch/arm/boot/dts/ste-nomadik-s8815.dts| 2 +-
 arch/arm/boot/dts/ste-nomadik-stn8815.dtsi | 7 ---
 3 files changed, 6 insertions(+), 5 deletions(-)


Help with btrfs_zero_range function

2014-07-31 Thread Nick Krause
Hey Guys,
I need to ask a question again. I am writing the above function and
basing it on the one for punch hole.
I have only started writing the function and have a few questions
about how to write it. Below this message are my questions so far;
I am also posting my code in case you guys want to give any feedback.
Regards and Thanks Again,
Nick
Questions
1. bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
   How do I change this to check for a zero range, or do I just remove
   this variable?
2. ret = find_first_non_hole(inode, , );
   How do I modify the called function for ret to work for a zero range?
The other parts of this function are pretty similar to the one for
punch holes, and it seems pretty easy to move the other parts over.
Code

static long btrfs_zero_range(struct inode *inode, loff_t offset,
			     loff_t len)
{
	struct btrfs_root *root = BTRFS_I(inode)->root;
	struct btrfs_path *path;
	struct btrfs_block_rsv *rsv;
	struct btrfs_trans_handle *trans;
	u64 lockstart;
	u64 lockend;
	u64 tail_start;
	u64 tail_len;
	u64 orig_start = offset;
	u64 cur_offset;
	u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
	u64 drop_end;
	int ret = 0;
	int err = 0;
	int rsv_count;
	bool same_page;
	bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
	u64 ino_size;

	/* Wait for ordered extents in the range before modifying it. */
	ret = btrfs_wait_ordered_range(inode, offset, len);
	if (ret)
		return ret;


Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average

2014-07-31 Thread Yuyang Du
Hi Vincent,

On Thu, Jul 31, 2014 at 11:56:13AM +0200, Vincent Guittot wrote:
> 
> load_sum is now the average runnable time before being weighted
 
So when the weight changes, load_avg will be computed entirely with the new
weight. My two cents:

1) Tasks do not change weight much, so in practice this is OK.

2) Group entities do change weight a lot, very likely back and forth,
   so I really think keeping the history intact will make everything
   more predictable and stable, prevent thrashing, etc.

3) If you do the same for cfs_rq->load.weight, then we simply abandon
   blocked entities, and the accumulated state no longer adds up. We would
   then need to maintain the blocked load average again, and we could no
   longer compute the cfs_rq load average as a whole, but would have to
   update it at the granularity of each entity...

Anyway, it does not seem to me you really need to change load_sum, no? So
could you please not change it?
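Point 2 can be illustrated numerically. Below is a simplified sketch that assumes PELT-style geometric decay with y^32 = 1/2 per period (y approximately 0.97857). It ignores the kernel's fixed-point arithmetic and partial periods. One accumulator sums decayed weighted runnable time, so each period keeps the weight in effect at that time. The other sums unweighted runnable time and multiplies by the current weight when read. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Approximate per-period decay factor y, chosen so that y^32 is about 0.5. */
#define DECAY_Y 0.97857206

struct load_track {
	double weighted_sum;    /* sum of decayed (weight * runnable) */
	double runnable_sum;    /* sum of decayed runnable; weight applied later */
};

/* Advance one period: decay the history, then add this period's contribution. */
void load_track_step(struct load_track *lt, bool runnable, double weight)
{
	lt->weighted_sum = lt->weighted_sum * DECAY_Y + (runnable ? weight : 0.0);
	lt->runnable_sum = lt->runnable_sum * DECAY_Y + (runnable ? 1.0 : 0.0);
}

/* Load when the weight is applied only at read-out time. */
double load_from_unweighted(const struct load_track *lt, double current_weight)
{
	return lt->runnable_sum * current_weight;
}

bool approx_eq(double a, double b)
{
	double d = a - b;

	return d < 1e-6 && d > -1e-6;
}
```

With a constant weight the two agree. After one period at a doubled weight, the read-out variant counts the whole decayed history at the new weight, while the weighted sum changes only by the newest period. This is the back-and-forth sensitivity described for group entities.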

> The sum of usage_sum of the tasks that are on a rq, is used to detect
> the overload of a rq.

I think you only need usage_sum for task and rq, but not cfs_rq. Others
are ok.
 
> Does something like the patch below, applied on top of your patchset, seem
> like a reasonable add-on?
> reasonable add-on?
> 

If you only add running statistics, I am all good, and indeed reasonable if
you can make good use of it. I am not at all against adding anything or
adding running average or unweighted anything...

Thanks,
Yuyang


Re: [PATCH v4 4/4] ARM: dts: Add exynos5250-spring device tree

2014-07-31 Thread Andreas Färber
On 31.07.2014 21:05, Tomasz Figa wrote:
>> +
>> +_2 {
>> +status = "okay";
>> +samsung,i2c-sda-delay = <100>;
>> +samsung,i2c-max-bus-freq = <66000>;
>> +
>> +hdmiddc@50 {
>> +compatible = "samsung,exynos4210-hdmiddc";
>> +reg = <0x50>;
>> +};
> 
> I don't think this matches current Exynos HDMI bindings, which I believe
> have been changed to just take a phandle to i2c bus instead.

Looks correct to me:

http://git.kernel.org/cgit/linux/kernel/git/kgene/linux-samsung.git/tree/Documentation/devicetree/bindings/video/exynos_hdmiddc.txt?h=for-next

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Documentation/devicetree/bindings/video/exynos_hdmiddc.txt

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [PATCH] ARM/ARM64: don't enter kgdb when userspace executes a kgdb break instruction.

2014-07-31 Thread Omar Sandoval
Hi, Will,

On Thu, Jul 31, 2014 at 11:46:53AM +0100, Will Deacon wrote:
> I'll merge the arm64 diff I proposed. Could you repost the ARM part please?
I've just reposted it, hopefully we can get that merged in soon as well.

> I think enabling and activating kgdb by default is a pretty crazy thing to
> do, but I agree that we shouldn't allow userspace to trap into it either.
Agreed. Hopefully the actual footprint of this bug isn't too large, but it's
worth fixing anyway.

> Once you repost the ARM patches, we can look at getting them merged via rmk.
> 
> Cheers,
> 
> Will

Thanks again!
Omar


[PATCH] ARM: don't enter kgdb when userspace executes a kgdb break instruction.

2014-07-31 Thread Omar Sandoval
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn) should only be
entered when a kgdb break instruction is executed from the kernel. Otherwise,
if kgdb is enabled, a userspace program can cause the kernel to drop into the
debugger by executing either KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Signed-off-by: Omar Sandoval 
---
On a kernel running with kgdb enabled, this program reproduces the problem:
.globl _start
_start:
udf #65006  @ KGDB_BREAKINST

The same problem has been fixed in ARM64.
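The fix works because an undef_hook is meant to fire only when both the instruction and the processor mode match. Below is a userspace model of that matching. It assumes the rule (instr & instr_mask) == instr_val && (cpsr & cpsr_mask) == cpsr_val, which is how I read the hook matching in arch/arm/kernel/traps.c, and uses the architectural mode values MODE_MASK = 0x1f, USR_MODE = 0x10 and SVC_MODE = 0x13. It also assumes KGDB_BREAKINST is 0xe7ffdefe and a full 32-bit instruction mask. As a cross-check, it encodes the "udf #65006" reproducer. The model_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MODE_MASK      0x1fu
#define MODEL_USR_MODE       0x10u
#define MODEL_SVC_MODE       0x13u
#define MODEL_KGDB_BREAKINST 0xe7ffdefeu   /* assumed value of KGDB_BREAKINST */

struct model_undef_hook {
	uint32_t instr_mask, instr_val;
	uint32_t cpsr_mask, cpsr_val;
};

/* Encode the ARM-state "UDF #imm16" instruction (condition always). */
uint32_t model_encode_udf(uint16_t imm16)
{
	return 0xe7f000f0u | ((uint32_t)(imm16 >> 4) << 8) | (imm16 & 0xfu);
}

/* A hook fires only if both the instruction and the CPSR bits match. */
bool model_hook_matches(const struct model_undef_hook *h, uint32_t instr, uint32_t cpsr)
{
	return (instr & h->instr_mask) == h->instr_val &&
	       (cpsr & h->cpsr_mask) == h->cpsr_val;
}

/* Hook as configured after the patch: kernel (SVC) mode only. */
const struct model_undef_hook model_kgdb_brkpt_hook = {
	.instr_mask = 0xffffffffu,
	.instr_val  = MODEL_KGDB_BREAKINST,
	.cpsr_mask  = MODEL_MODE_MASK,
	.cpsr_val   = MODEL_SVC_MODE,
};
```

Before the patch, cpsr_mask and cpsr_val were both zero, so (cpsr & 0) == 0 held in every mode, user mode included.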

 arch/arm/kernel/kgdb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 778c2f7..a74b53c 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -160,12 +160,16 @@ static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int instr)
 static struct undef_hook kgdb_brkpt_hook = {
.instr_mask = 0x,
.instr_val  = KGDB_BREAKINST,
+   .cpsr_mask  = MODE_MASK,
+   .cpsr_val   = SVC_MODE,
.fn = kgdb_brk_fn
 };
 
 static struct undef_hook kgdb_compiled_brkpt_hook = {
.instr_mask = 0x,
.instr_val  = KGDB_COMPILED_BREAK,
+   .cpsr_mask  = MODE_MASK,
+   .cpsr_val   = SVC_MODE,
.fn = kgdb_compiled_brk_fn
 };
 
-- 
2.0.3



[PATCH] iovec: make sure the caller actually wants anything in memcpy_fromiovecend

2014-07-31 Thread Sasha Levin
Check for cases when the caller requests 0 bytes instead of running off
and dereferencing potentially invalid iovecs.

Signed-off-by: Sasha Levin 
---
 lib/iovec.c |4 
 1 file changed, 4 insertions(+)

diff --git a/lib/iovec.c b/lib/iovec.c
index 7a7c2da..df3abd1 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -85,6 +85,10 @@ EXPORT_SYMBOL(memcpy_toiovecend);
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
int offset, int len)
 {
+   /* No data? Done! */
+   if (len == 0)
+   return 0;
+
/* Skip over the finished iovecs */
while (offset >= iov->iov_len) {
offset -= iov->iov_len;
-- 
1.7.10.4
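Below is a userspace sketch of the patched logic, using struct iovec from <sys/uio.h>. It returns early when len is 0; otherwise it skips the iovecs that lie entirely before offset and copies from there. The kernel's memcpy_fromiovecend() copies from user memory and trusts the caller to pass enough iovecs. This hypothetical copy_from_iovec_end() instead uses memcpy() and takes an explicit iovec count so it cannot walk off the array. Returning -1 when the iovecs hold too little data is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Copy 'len' bytes, starting 'offset' bytes into the iovec array, into kdata. */
int copy_from_iovec_end(unsigned char *kdata, const struct iovec *iov,
			size_t iovcnt, size_t offset, size_t len)
{
	/* No data? Done! Do not touch iov at all. */
	if (len == 0)
		return 0;

	/* Skip over the finished iovecs. */
	while (iovcnt > 0 && offset >= iov->iov_len) {
		offset -= iov->iov_len;
		iov++;
		iovcnt--;
	}

	while (len > 0 && iovcnt > 0) {
		size_t avail = iov->iov_len - offset;
		size_t copy = len < avail ? len : avail;

		memcpy(kdata, (const unsigned char *)iov->iov_base + offset, copy);
		kdata += copy;
		len -= copy;
		offset = 0;
		iov++;
		iovcnt--;
	}
	return len == 0 ? 0 : -1;   /* -1: ran out of iovecs */
}
```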



Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Mike Galbraith
On Thu, 2014-07-31 at 09:38 -0700, Paul E. McKenney wrote:

> Does building with CONFIG_NO_HZ_FULL_SYSIDLE=y slow things down even more?
> If so, that would give me a rough idea of the cost of RCU's dyntick-idle
> handling.

Nope.  Deltas are all down in the statistical frog hair.

-Mike



Re: [PATCH] ACPI/Processor: Add CPU_STARTING_FROZEN check in the acpi_cpu_soft_notify()

2014-07-31 Thread Lan Tianyu
On 2014-08-01 05:20, Rafael J. Wysocki wrote:
> On Thursday, July 31, 2014 05:20:26 PM Lan Tianyu wrote:
>> The callback for the CPU_STARTING event can't sleep, so acpi_cpu_soft_notify()
>> returns directly when the CPU_STARTING event is triggered. But CPU hotplug also
>> happens during S2RAM, where the action becomes CPU_STARTING_FROZEN. This
>> patch adds the missing check for the frozen event.
>>
>> Signed-off-by: Lan Tianyu 
> 
> There is work to restructure the handling of CPU_TASKS_FROZEN under way
> and Chen Gong is driving it.  That's likely to conflict with the last
> two patches from you.  Can you please coordinate with Gong?

Hi Rafael:

Thanks for the reminder. I just checked Chen Gong's patchset "Gloabl CPU
Hot-plug flag _FROZEN Clean up". There is no conflict between our
patches. Gong's patchset removes the following macros:

CPU_ONLINE_FROZEN
CPU_UP_PREPARE_FROZEN
CPU_UP_CANCELED_FROZEN
CPU_DOWN_PREPARE_FROZEN
CPU_DOWN_FAILED_FROZEN
CPU_DEAD_FROZEN
CPU_DYING_FROZEN
CPU_STARTING_FROZEN

CPU_TASKS_FROZEN is still available and the CPU events during S2RAM
are still (CPU_xxx | CPU_TASKS_FROZEN).
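Because suspend/resume events arrive as CPU_xxx | CPU_TASKS_FROZEN, a notifier can strip the flag once and then compare against the base events. Below is a standalone sketch of that idea. The numeric values are assumptions modelled on include/linux/cpu.h of that era (CPU_DYING 0x0008, CPU_STARTING 0x000A, CPU_TASKS_FROZEN 0x0010), and the MODEL_ prefix marks them as hypothetical. Masking once before the comparison covers the frozen variants of both CPU_STARTING and CPU_DYING.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CPU_DYING        0x0008UL
#define MODEL_CPU_STARTING     0x000AUL
#define MODEL_CPU_TASKS_FROZEN 0x0010UL

/* True for events whose notifier work must not sleep, frozen or not. */
bool model_must_not_sleep(unsigned long action)
{
	unsigned long base = action & ~MODEL_CPU_TASKS_FROZEN;

	return base == MODEL_CPU_STARTING || base == MODEL_CPU_DYING;
}
```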

BTW, in my opinion this is a bug fix and it should be backported to the
stable tree.

> 
> Rafael
> 
> 
>> ---
>>  drivers/acpi/processor_driver.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>> index 4fcbd67..66e2249 100644
>> --- a/drivers/acpi/processor_driver.c
>> +++ b/drivers/acpi/processor_driver.c
>> @@ -125,7 +125,7 @@ static int acpi_cpu_soft_notify(struct notifier_block *nfb,
>>   * CPU_STARTING and CPU_DYING must not sleep. Return here since
>>   * acpi_bus_get_device() may sleep.
>>   */
>> -if (action == CPU_STARTING || action == CPU_DYING)
>> +if ((action & ~CPU_TASKS_FROZEN) == CPU_STARTING || action == CPU_DYING)
>>  return NOTIFY_DONE;
>>  
>>  if (!pr || acpi_bus_get_device(pr->handle, ))
>>
> 


-- 
Best regards
Tianyu Lan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V9 2/6] Documentation: power: reset: Add documentation for generic SYSCON reboot driver

2014-07-31 Thread Feng Kan
Add documentation for generic SYSCON reboot driver.

Signed-off-by: Feng Kan 
---
 .../bindings/power/reset/syscon-reboot.txt | 23 ++
 1 file changed, 23 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/reset/syscon-reboot.txt

diff --git a/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt b/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
new file mode 100644
index 000..1190631
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
@@ -0,0 +1,23 @@
+Generic SYSCON mapped register reset driver
+
+This is a generic reset driver using syscon to map the reset register.
+The reset is generally performed by writing the mask defined in the
+reboot node to the reset register, located at the given offset within
+the register map pointed to by the syscon reference.
+
+Required properties:
+- compatible: should contain "syscon-reboot"
+- regmap: phandle to the register map node
+- offset: offset in the register map for the reboot register (in bytes)
+- mask: the reset value written to the reboot register (32 bit access)
+
+Default will be little endian mode, 32 bit access only.
+
+Examples:
+
+   reboot {
+  compatible = "syscon-reboot";
+  regmap = <>;
+  offset = <0x0>;
+  mask = <0x1>;
+   };
-- 
1.9.1
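
As a rough model of the binding above: on restart the driver performs one 32-bit write of `mask` at byte `offset` inside the register map referenced by `regmap`. The userspace sketch below replaces the kernel regmap with a hypothetical array-backed `fake_regmap` (sized like the 0x400-byte SCU window used later in this series) and ignores endianness; none of these names exist in the kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for a syscon regmap: 0x400 bytes of 32-bit registers. */
#define FAKE_MAP_BYTES 0x400

struct fake_regmap {
	uint32_t regs[FAKE_MAP_BYTES / sizeof(uint32_t)];
};

/* Values a "syscon-reboot" node would carry (hypothetical example). */
struct fake_reboot_node {
	uint32_t offset;	/* byte offset of the reboot register */
	uint32_t mask;		/* value written to trigger the reset */
};

/* Model of the reset: one aligned 32-bit write of @mask at @offset.
 * Returns 0 on success, -1 if the offset is unaligned or out of range. */
int fake_syscon_reboot(struct fake_regmap *map, const struct fake_reboot_node *node)
{
	if (node->offset % sizeof(uint32_t) != 0 ||
	    node->offset >= FAKE_MAP_BYTES)
		return -1;
	map->regs[node->offset / sizeof(uint32_t)] = node->mask;
	return 0;
}
```

With offset 0x14 and mask 0x1 (the values used in the X-Gene node of patch 4/6), the model sets register index 5 to 1.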



[PATCH V9 0/6] Add X-Gene platform reboot mechanism

2014-07-31 Thread Feng Kan
Enable reboot driver for the X-Gene platform. Add generic syscon reboot
driver.

V9 Change:
- Rebase on Guenter Roeck's V5 reset handler patch set. This
  allows a generic reset handler to be called rather than the
  ARM-specific reset handler.

V8 Change:
- change Kconfig to depend on ARM || ARM64 || COMPILE_TEST

V7 Change:
- From V3 on, the patches seemed not to be making it to the mailing list.
- Fix build errors produced on other architectures when this driver is
  included. Make it depend on ARM64 for now.

V6 Change:
- Add documentation for scu node.

V5 Change:
- Documentation update, endian and access size.

V4 Change:
- Remove old X-Gene reboot driver
- Add generic syscon reboot driver
- Add DTS and Kconfig for X-Gene reboot using syscon method

V3 Change:
- Remove the reboot driver's use of acpi resource patch.
- Change the reboot driver to use syscon to parse out 
  system clock register. Remove the old method of getting
  register from the reboot driver directly.
- Remove documentation since it's now simple.

V2 Change:
- Add support for using ACPI resource.


Feng Kan (6):
  power: reset: Add generic SYSCON register mapped reset
  Documentation: power: reset: Add documentation for generic SYSCON
reboot driver
  Documentation: arm64: add SCU dts binding documentation to linux
kernel
  arm64: dts: Add X-Gene reboot driver dts node
  arm64: Select reboot driver for X-Gene platform
  power: reset: Remove X-Gene reboot driver

 Documentation/devicetree/bindings/arm/apm/scu.txt  |  17 
 .../bindings/power/reset/syscon-reboot.txt |  23 +
 arch/arm64/Kconfig |   2 +
 arch/arm64/boot/dts/apm-storm.dtsi |  12 +++
 drivers/power/reset/Kconfig|  12 +--
 drivers/power/reset/Makefile   |   2 +-
 drivers/power/reset/syscon-reboot.c|  98 
 drivers/power/reset/xgene-reboot.c | 103 -
 8 files changed, 158 insertions(+), 111 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/apm/scu.txt
 create mode 100644 Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
 create mode 100644 drivers/power/reset/syscon-reboot.c
 delete mode 100644 drivers/power/reset/xgene-reboot.c

-- 
1.9.1



[PATCH V9 4/6] arm64: dts: Add X-Gene reboot driver dts node

2014-07-31 Thread Feng Kan
Add X-Gene platform reboot driver dts node.

Signed-off-by: Feng Kan 
---
 arch/arm64/boot/dts/apm-storm.dtsi | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index ccd150a..3dfd1f4 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -103,6 +103,11 @@
#size-cells = <2>;
ranges;
 
+   scu: system-clk-controller@1700 {
+   compatible = "apm,xgene-scu","syscon";
+   reg = <0x0 0x1700 0x0 0x400>;
+   };
+
clocks {
#address-cells = <2>;
#size-cells = <2>;
@@ -397,5 +402,12 @@
#clock-cells = <1>;
clocks = < 0>;
};
+
+   reboot: reboot@1714 {
+   compatible = "syscon-reboot";
+   regmap = <&scu>;
+   offset = <0x14>;
+   mask = <0x1>;
+   };
};
 };
-- 
1.9.1



[PATCH V9 3/6] Documentation: arm64: add SCU dts binding documentation to linux kernel

2014-07-31 Thread Feng Kan
This adds documentation for the SCU (system clock unit) device tree
binding to the kernel.

Signed-off-by: Feng Kan 
---
 Documentation/devicetree/bindings/arm/apm/scu.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/apm/scu.txt

diff --git a/Documentation/devicetree/bindings/arm/apm/scu.txt b/Documentation/devicetree/bindings/arm/apm/scu.txt
new file mode 100644
index 000..b45be06
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/apm/scu.txt
@@ -0,0 +1,17 @@
+APM X-GENE SoC series SCU Registers
+
+This system clock unit contains various registers that control block resets,
+clock enables/disables, clock divisors and other deep-sleep registers.
+
+Properties:
+ - compatible : should contain two values. First value must be:
+  - "apm,xgene-scu"
+   second value must always be "syscon".
+
+ - reg : offset and length of the register set.
+
+Example :
+   scu: system-clk-controller@1700 {
+   compatible = "apm,xgene-scu","syscon";
+   reg = <0x0 0x1700 0x0 0x400>;
+   };
-- 
1.9.1



[PATCH V9 5/6] arm64: Select reboot driver for X-Gene platform

2014-07-31 Thread Feng Kan
Select reboot driver for X-Gene platform.

Signed-off-by: Feng Kan 
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 839f48c..df6a646 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -141,6 +141,8 @@ config ARCH_VEXPRESS
 
 config ARCH_XGENE
bool "AppliedMicro X-Gene SOC Family"
+   select MFD_SYSCON
+   select POWER_RESET_SYSCON
help
  This enables support for AppliedMicro X-Gene SOC Family
 
-- 
1.9.1



[PATCH V9 6/6] power: reset: Remove X-Gene reboot driver

2014-07-31 Thread Feng Kan
Remove X-Gene reboot driver.

Signed-off-by: Feng Kan 
---
 drivers/power/reset/Kconfig|   7 ---
 drivers/power/reset/Makefile   |   1 -
 drivers/power/reset/xgene-reboot.c | 103 -
 3 files changed, 111 deletions(-)
 delete mode 100644 drivers/power/reset/xgene-reboot.c

diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig
index fd5f9e5..48b3cb3 100644
--- a/drivers/power/reset/Kconfig
+++ b/drivers/power/reset/Kconfig
@@ -66,13 +66,6 @@ config POWER_RESET_VEXPRESS
  Power off and reset support for the ARM Ltd. Versatile
  Express boards.
 
-config POWER_RESET_XGENE
-   bool "APM SoC X-Gene reset driver"
-   depends on ARM64
-   depends on POWER_RESET
-   help
- Reboot support for the APM SoC X-Gene Eval boards.
-
 config POWER_RESET_KEYSTONE
bool "Keystone reset driver"
depends on ARCH_KEYSTONE
diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile
index b1b5ab3..62088d8 100644
--- a/drivers/power/reset/Makefile
+++ b/drivers/power/reset/Makefile
@@ -6,6 +6,5 @@ obj-$(CONFIG_POWER_RESET_QNAP) += qnap-poweroff.o
 obj-$(CONFIG_POWER_RESET_RESTART) += restart-poweroff.o
 obj-$(CONFIG_POWER_RESET_SUN6I) += sun6i-reboot.o
 obj-$(CONFIG_POWER_RESET_VEXPRESS) += vexpress-poweroff.o
-obj-$(CONFIG_POWER_RESET_XGENE) += xgene-reboot.o
 obj-$(CONFIG_POWER_RESET_KEYSTONE) += keystone-reset.o
 obj-$(CONFIG_POWER_RESET_SYSCON) += syscon-reboot.o
diff --git a/drivers/power/reset/xgene-reboot.c b/drivers/power/reset/xgene-reboot.c
deleted file mode 100644
index ecd55f8..000
--- a/drivers/power/reset/xgene-reboot.c
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * AppliedMicro X-Gene SoC Reboot Driver
- *
- * Copyright (c) 2013, Applied Micro Circuits Corporation
- * Author: Feng Kan 
- * Author: Loc Ho 
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of
- * the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
- * MA 02111-1307 USA
- *
- * This driver provides system reboot functionality for APM X-Gene SoC.
- * For system shutdown, this is board specify. If a board designer
- * implements GPIO shutdown, use the gpio-poweroff.c driver.
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct xgene_reboot_context {
-   struct platform_device *pdev;
-   void *csr;
-   u32 mask;
-};
-
-static struct xgene_reboot_context *xgene_restart_ctx;
-
-static void xgene_restart(char str, const char *cmd)
-{
-   struct xgene_reboot_context *ctx = xgene_restart_ctx;
-   unsigned long timeout;
-
-   /* Issue the reboot */
-   if (ctx)
-   writel(ctx->mask, ctx->csr);
-
-   timeout = jiffies + HZ;
-   while (time_before(jiffies, timeout))
-   cpu_relax();
-
-   dev_emerg(&ctx->pdev->dev, "Unable to restart system\n");
-}
-
-static int xgene_reboot_probe(struct platform_device *pdev)
-{
-   struct xgene_reboot_context *ctx;
-
-   ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
-   if (!ctx) {
-   dev_err(&pdev->dev, "out of memory for context\n");
-   return -ENODEV;
-   }
-
-   ctx->csr = of_iomap(pdev->dev.of_node, 0);
-   if (!ctx->csr) {
-   devm_kfree(&pdev->dev, ctx);
-   dev_err(&pdev->dev, "can not map resource\n");
-   return -ENODEV;
-   }
-
-   if (of_property_read_u32(pdev->dev.of_node, "mask", &ctx->mask))
-   ctx->mask = 0x;
-
-   ctx->pdev = pdev;
-   arm_pm_restart = xgene_restart;
-   xgene_restart_ctx = ctx;
-
-   return 0;
-}
-
-static struct of_device_id xgene_reboot_of_match[] = {
-   { .compatible = "apm,xgene-reboot" },
-   {}
-};
-
-static struct platform_driver xgene_reboot_driver = {
-   .probe = xgene_reboot_probe,
-   .driver = {
-   .name = "xgene-reboot",
-   .of_match_table = xgene_reboot_of_match,
-   },
-};
-
-static int __init xgene_reboot_init(void)
-{
-   return platform_driver_register(&xgene_reboot_driver);
-}
-device_initcall(xgene_reboot_init);
-- 
1.9.1



[PATCH V9 1/6] power: reset: Add generic SYSCON register mapped reset

2014-07-31 Thread Feng Kan
Add a generic SYSCON register mapped reset mechanism.

Signed-off-by: Feng Kan 
---
 drivers/power/reset/Kconfig |  5 ++
 drivers/power/reset/Makefile|  1 +
 drivers/power/reset/syscon-reboot.c | 98 +
 3 files changed, 104 insertions(+)
 create mode 100644 drivers/power/reset/syscon-reboot.c

diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig
index bdcf517..fd5f9e5 100644
--- a/drivers/power/reset/Kconfig
+++ b/drivers/power/reset/Kconfig
@@ -80,3 +80,8 @@ config POWER_RESET_KEYSTONE
help
  Reboot support for the KEYSTONE SoCs.
 
+config POWER_RESET_SYSCON
+   bool "Generic SYSCON regmap reset driver"
+   depends on POWER_RESET && MFD_SYSCON && OF
+   help
+ Reboot support for generic SYSCON mapped register reset.
diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile
index dde2e8b..b1b5ab3 100644
--- a/drivers/power/reset/Makefile
+++ b/drivers/power/reset/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_POWER_RESET_SUN6I) += sun6i-reboot.o
 obj-$(CONFIG_POWER_RESET_VEXPRESS) += vexpress-poweroff.o
 obj-$(CONFIG_POWER_RESET_XGENE) += xgene-reboot.o
 obj-$(CONFIG_POWER_RESET_KEYSTONE) += keystone-reset.o
+obj-$(CONFIG_POWER_RESET_SYSCON) += syscon-reboot.o
diff --git a/drivers/power/reset/syscon-reboot.c b/drivers/power/reset/syscon-reboot.c
new file mode 100644
index 000..75d6eae
--- /dev/null
+++ b/drivers/power/reset/syscon-reboot.c
@@ -0,0 +1,98 @@
+/*
+ * Generic Syscon Reboot Driver
+ *
+ * Copyright (c) 2013, Applied Micro Circuits Corporation
+ * Author: Feng Kan 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct syscon_reboot_context {
+   struct regmap *map;
+   u32 offset;
+   u32 mask;
+   struct notifier_block restart_handler;
+};
+
+static struct syscon_reboot_context *syscon_reboot_ctx;
+
+static int syscon_restart_handle(struct notifier_block *this,
+   unsigned long mode, void *cmd)
+{
+   struct syscon_reboot_context *ctx = syscon_reboot_ctx;
+   unsigned long timeout;
+
+   /* Issue the reboot */
+   if (ctx->map)
+   regmap_write(ctx->map, ctx->offset, ctx->mask);
+
+   timeout = jiffies + HZ;
+   while (time_before(jiffies, timeout))
+   cpu_relax();
+
+   pr_emerg("Unable to restart system\n");
+   return NOTIFY_DONE;
+}
+
+static int syscon_reboot_probe(struct platform_device *pdev)
+{
+   struct syscon_reboot_context *ctx;
+   struct device *dev = &pdev->dev;
+   int err;
+
+   ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
+   if (!ctx) {
+   dev_err(&pdev->dev, "out of memory for context\n");
+   return -ENOMEM;
+   }
+
+   ctx->map = syscon_regmap_lookup_by_phandle(dev->of_node, "regmap");
+   if (IS_ERR(ctx->map))
+   return PTR_ERR(ctx->map);
+
+   if (of_property_read_u32(pdev->dev.of_node, "offset", &ctx->offset))
+   return -EINVAL;
+
+   if (of_property_read_u32(pdev->dev.of_node, "mask", &ctx->mask))
+   return -EINVAL;
+
+   ctx->restart_handler.notifier_call = syscon_restart_handle;
+   ctx->restart_handler.priority = 128;
+   err = register_restart_handler(&ctx->restart_handler);
+   if (err)
+   dev_err(dev, "can't register restart notifier (err=%d)\n", err);
+
+   syscon_reboot_ctx = ctx;
+
+   return 0;
+}
+
+static struct of_device_id syscon_reboot_of_match[] = {
+   { .compatible = "syscon-reboot" },
+   {}
+};
+
+static struct platform_driver syscon_reboot_driver = {
+   .probe = syscon_reboot_probe,
+   .driver = {
+   .name = "syscon-reboot",
+   .of_match_table = syscon_reboot_of_match,
+   },
+};
+module_platform_driver(syscon_reboot_driver);
-- 
1.9.1
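
This patch registers syscon_restart_handle() with priority 128 on the restart-handler chain from Guenter Roeck's series. As a rough userspace model of how such a chain behaves (handlers run from highest to lowest priority, and the walk stops once a handler returns a value with NOTIFY_STOP_MASK set), consider the sketch below. `struct nb`, chain_register() and chain_call() are simplified, hypothetical stand-ins for the kernel's notifier code, and the NOTIFY_* values are assumed to match include/linux/notifier.h.

```c
#include <stddef.h>

/* Return codes assumed to mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical, simplified notifier block: callback, priority, next link. */
struct nb {
	int (*call)(struct nb *self, unsigned long mode, void *cmd);
	int priority;
	struct nb *next;
};

/* Keep the chain sorted by descending priority; a new entry goes after
 * existing entries of equal priority. */
void chain_register(struct nb **head, struct nb *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Invoke handlers from highest to lowest priority until one sets
 * NOTIFY_STOP_MASK. Returns the number of handlers called. */
int chain_call(struct nb *head, unsigned long mode, void *cmd)
{
	int calls = 0;

	for (; head; head = head->next) {
		calls++;
		if (head->call(head, mode, cmd) & NOTIFY_STOP_MASK)
			break;
	}
	return calls;
}

/* Demo bookkeeping: priorities of the handlers in the order they ran. */
int ran[8];
int nran;

static void note_run(const struct nb *self)
{
	if (nran < 8)
		ran[nran++] = self->priority;
}

/* Like syscon_restart_handle() when the reset did not take effect:
 * returns NOTIFY_DONE so lower-priority handlers still get a turn. */
int fallthrough_handler(struct nb *self, unsigned long mode, void *cmd)
{
	(void)mode;
	(void)cmd;
	note_run(self);
	return NOTIFY_DONE;
}

/* A handler that ends the chain. */
int stopping_handler(struct nb *self, unsigned long mode, void *cmd)
{
	(void)mode;
	(void)cmd;
	note_run(self);
	return NOTIFY_STOP;
}
```

In this model, a handler that returns NOTIFY_DONE after its write failed to reset the machine (as syscon_restart_handle() does after its one-second wait) simply lets the next lower-priority handler try.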



[PATCH] ARM: l2x0: fix build warning without CONFIG_OF

2014-07-31 Thread Kefeng Wang
Commit cf9ea8f13 ("ARM: l2c: remove obsolete l2x0 ops for non-OF init")
removed some obsolete l2x0 ops. The remaining ops, l2x0_cache_sync,
l2x0_flush_all and l2x0_disable, are only used when CONFIG_OF is enabled,
so move them into the OF section; otherwise a "defined but not used"
warning is emitted.

Signed-off-by: Kefeng Wang 
---
 arch/arm/mm/cache-l2x0.c | 134 +++
 1 file changed, 67 insertions(+), 67 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7c3fb41..77e57b0 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -135,73 +135,6 @@ static void l2c_disable(void)
dsb(st);
 }
 
-#ifdef CONFIG_CACHE_PL310
-static inline void cache_wait(void __iomem *reg, unsigned long mask)
-{
-   /* cache operations by line are atomic on PL310 */
-}
-#else
-#define cache_wait l2c_wait_mask
-#endif
-
-static inline void cache_sync(void)
-{
-   void __iomem *base = l2x0_base;
-
-   writel_relaxed(0, base + sync_reg_offset);
-   cache_wait(base + L2X0_CACHE_SYNC, 1);
-}
-
-#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915)
-static inline void debug_writel(unsigned long val)
-{
-   l2c_set_debug(l2x0_base, val);
-}
-#else
-/* Optimised out for non-errata case */
-static inline void debug_writel(unsigned long val)
-{
-}
-#endif
-
-static void l2x0_cache_sync(void)
-{
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&l2x0_lock, flags);
-   cache_sync();
-   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
-static void __l2x0_flush_all(void)
-{
-   debug_writel(0x03);
-   __l2c_op_way(l2x0_base + L2X0_CLEAN_INV_WAY);
-   cache_sync();
-   debug_writel(0x00);
-}
-
-static void l2x0_flush_all(void)
-{
-   unsigned long flags;
-
-   /* clean all ways */
-   raw_spin_lock_irqsave(&l2x0_lock, flags);
-   __l2x0_flush_all();
-   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
-static void l2x0_disable(void)
-{
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&l2x0_lock, flags);
-   __l2x0_flush_all();
-   l2c_write_sec(0, l2x0_base, L2X0_CTRL);
-   dsb(st);
-   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
-}
-
 static void l2c_save(void __iomem *base)
 {
l2x0_saved_regs.aux_ctrl = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
@@ -945,6 +878,73 @@ static int l2_wt_override;
  * pass it though the device tree */
 static u32 cache_id_part_number_from_dt;
 
+#ifdef CONFIG_CACHE_PL310
+static inline void cache_wait(void __iomem *reg, unsigned long mask)
+{
+   /* cache operations by line are atomic on PL310 */
+}
+#else
+#define cache_wait l2c_wait_mask
+#endif
+
+static inline void cache_sync(void)
+{
+   void __iomem *base = l2x0_base;
+
+   writel_relaxed(0, base + sync_reg_offset);
+   cache_wait(base + L2X0_CACHE_SYNC, 1);
+}
+
+#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915)
+static inline void debug_writel(unsigned long val)
+{
+   l2c_set_debug(l2x0_base, val);
+}
+#else
+/* Optimised out for non-errata case */
+static inline void debug_writel(unsigned long val)
+{
+}
+#endif
+
+static void l2x0_cache_sync(void)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&l2x0_lock, flags);
+   cache_sync();
+   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
+static void __l2x0_flush_all(void)
+{
+   debug_writel(0x03);
+   __l2c_op_way(l2x0_base + L2X0_CLEAN_INV_WAY);
+   cache_sync();
+   debug_writel(0x00);
+}
+
+static void l2x0_flush_all(void)
+{
+   unsigned long flags;
+
+   /* clean all ways */
+   raw_spin_lock_irqsave(&l2x0_lock, flags);
+   __l2x0_flush_all();
+   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
+static void l2x0_disable(void)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&l2x0_lock, flags);
+   __l2x0_flush_all();
+   l2c_write_sec(0, l2x0_base, L2X0_CTRL);
+   dsb(st);
+   raw_spin_unlock_irqrestore(&l2x0_lock, flags);
+}
+
 static void __init l2x0_of_parse(const struct device_node *np,
 u32 *aux_val, u32 *aux_mask)
 {
-- 
1.7.12.4
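
The warning this patch addresses is ordinary C behaviour: GCC's -Wunused-function (enabled by -Wall) reports a static function that is defined but never referenced, which happens when all of its callers are compiled out by an #ifdef. A minimal sketch of the fix, with a hypothetical CONFIG_DEMO_OF standing in for CONFIG_OF and made-up function names, is to define the helper inside the same conditional block as its only caller:

```c
/* Build with and without -DCONFIG_DEMO_OF; neither emits
 * "defined but not used" under -Wall, because the helper only
 * exists when its caller does. */
static int sync_count;

#ifdef CONFIG_DEMO_OF
/* Only referenced by demo_of_init(), so it lives under the same #ifdef. */
static void demo_cache_sync(void)
{
	sync_count++;
}

int demo_of_init(void)
{
	demo_cache_sync();
	return sync_count;
}
#endif

/* Always built; does not depend on the OF-only helper. */
int demo_count(void)
{
	return sync_count;
}
```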




Re: [REVIEW][PATCH 0/4] /proc/thread-self

2014-07-31 Thread Davidlohr Bueso
On Thu, 2014-07-31 at 17:30 -0700, Eric W. Biederman wrote:
> There is a small chance that changing /proc/net and /proc/mounts will cause
> userspace regressions (although nothing has shown up in my testing). If
> that happens we can just revert the change that moves them from
> /proc/self/... to /proc/thread-self/...

Isn't breaking userspace a no-no, no matter what? At least some
util-linux programs make use of both /proc/mounts and /proc/net.



Re: [RFC PATCH 0/2] dirreadahead system call

2014-07-31 Thread Abhijith Das


- Original Message -
> From: "NeilBrown" 
> To: "Abhi Das" 
> Cc: linux-kernel@vger.kernel.org, linux-fsde...@vger.kernel.org, 
> cluster-de...@redhat.com
> Sent: Wednesday, July 30, 2014 10:18:05 PM
> Subject: Re: [RFC PATCH 0/2] dirreadahead system call
> 
> On Fri, 25 Jul 2014 12:37:29 -0500 Abhi Das  wrote:
> 
> > This system call takes 3 arguments:
> > fd  - file descriptor of the directory being readahead
> > *offset - offset in dir from which to resume. This is updated
> >   as we move along in the directory
> > count   - The max number of entries to readahead
> > 
> > The syscall is supposed to read up to 'count' entries starting at
> > '*offset' and cache the inodes corresponding to those entries. It
> > returns a negative error code or a positive number indicating
> > the number of inodes it has issued readaheads for. It also
> > updates the '*offset' value so that repeated calls to dirreadahead
> > can resume at the right location. Returns 0 when there are no more
> > entries left.
> 
> Hi Abhi,
> 
>  I like the idea of enhanced read-ahead on a directory.
>  It isn't clear to me why you have included these particular fields in the
>  interface though.
> 
>  - why have an 'offset'?  Why not just use the current offset of the
>directory 'fd'?

The idea was that we didn't want a syscall like readahead mucking with the
file pointer as the same fd might be used to do getdents().

>  - Why have a count?  How would a program choose what count to give?

If a program knows that it's only going to work on a subset of files at a time,
it can use the count value to only readahead a small number of inodes at once
instead of reading ahead the entire directory.

That said, this interface is not set in stone and we are exploring ways to
inform the kernel of the inodes we are interested in reading ahead.
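
The resume-at-offset semantics quoted above (read up to 'count' entries from '*offset', advance '*offset', return how many readaheads were issued, 0 when done) can be modelled in userspace as follows. mock_dirreadahead() and struct mock_dir are hypothetical: they count entries of an in-memory directory instead of issuing real inode reads, and are not the RFC's implementation.

```c
#include <stddef.h>

/* Hypothetical directory model: only the number of entries matters here. */
struct mock_dir {
	size_t nr_entries;
};

/* Models the RFC semantics: "read ahead" up to @count entries starting
 * at *@offset, advance *@offset past them, and return how many were
 * issued; 0 means no entries are left, negative means bad arguments. */
long mock_dirreadahead(const struct mock_dir *dir, size_t *offset,
		       unsigned int count)
{
	long issued = 0;

	if (!dir || !offset)
		return -1;	/* stands in for a negative errno */
	while (issued < (long)count && *offset < dir->nr_entries) {
		/* A real filesystem would start inode readahead here. */
		(*offset)++;
		issued++;
	}
	return issued;
}

/* Caller loop: repeated calls resume at *offset until 0 is returned.
 * Returns the total number of entries covered. */
size_t readahead_whole_dir(const struct mock_dir *dir, unsigned int batch)
{
	size_t offset = 0, total = 0;
	long n;

	if (batch == 0)
		return 0;	/* a zero count would look like end-of-directory */
	while ((n = mock_dirreadahead(dir, &offset, batch)) > 0)
		total += (size_t)n;
	return total;
}
```

Because the offset lives in a caller-owned variable rather than in the file position, a getdents() loop on the same fd is unaffected, which is the motivation given above for not using the fd's own offset.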

> 
>  Maybe you imagine using 'getdents' first to get a list of names, then
>  selectively calling 'dirreadahead'  on the offsets of the names you are
>  interested in?  That would be racy as names can be added and removed which
>  might change offsets.  So maybe you have another reason?
> 
>  I would like to suggest an alternate interface (I love playing the API
>  game).
> 
>  1/ Add a flag to 'fstatat'  AT_EXPECT_MORE.
> If the pathname does not contain a '/', then the 'dirfd' is marked
> to indicate that stat information for all names returned by getdents will
> be wanted.  The filesystem can choose to optimise that however it sees
> fit.
> 
>  2/ Add a flag to 'fstatat'  AT_NONBLOCK.
> This tells the filesystem that you want this information, so if it can
> return it immediately it should, and if not it should start pulling it
> into cache.  Possibly this should be two flags: AT_NONBLOCK just avoids
> any IO, and AT_ASYNC instigates IO even if NONBLOCK is set.
> 
>  Then an "ls -l" could use AT_EXPECT_MORE and then just stat each name.
>  An "ls -l *.c", might avoid AT_EXPECT_MORE, but would use AT_NONBLOCK
>  against all names, then try again with all the names that returned
>  EWOULDBLOCK the first time.
> 
> 
>  I would really like to see the 'xstat' syscall too, but there is no point
>  having both "xstat" and "fxstat".  Follow the model of "fstatat" and provide
>  just "fxstatat" which can do both.
>  With fxstatat, AT_EXPECT_MORE would tell the dirfd exactly which attributes
>  would be wanted so it can fetch only that which is desired.
> 
>  I'm not very keen on the xgetdents idea of including name information and
>  stat information into the one syscall - I would prefer getdents and xstat be
>  kept separate.  Of course if a genuine performance cost of the separation can
>  be demonstrated, I could well change my mind.
> 
>  It does, however, have the advantage that the kernel doesn't need to worry
>  about how long read-ahead data needs to be kept, and the application doesn't
>  need to worry about how soon to retry an fstatat which failed with
>  EWOULDBLOCK.
> 
> Thanks for raising this issue again.  I hope it gets fixed one day...
> 
> NeilBrown
> 
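
Neil's "ls -l *.c" pattern (stat every name with AT_NONBLOCK, then retry only those that failed with EWOULDBLOCK) can be sketched as below. AT_NONBLOCK is only a proposal in this thread, so the sketch does not call fstatat(); mock_stat() and struct mock_entry are hypothetical stand-ins in which a non-blocking miss kicks off I/O and reports EWOULDBLOCK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-name state standing in for the inode cache. */
struct mock_entry {
	const char *name;
	bool cached;	/* inode already in memory */
	bool done;	/* stat data obtained */
	bool retry;	/* first pass returned EWOULDBLOCK */
};

/* Models fstatat() with the proposed AT_NONBLOCK flag: a non-blocking
 * call on an uncached entry starts "I/O" (marks it cached) and fails
 * with EWOULDBLOCK; a blocking call always succeeds. */
int mock_stat(struct mock_entry *e, bool nonblock)
{
	if (nonblock && !e->cached) {
		e->cached = true;	/* readahead instigated for later */
		errno = EWOULDBLOCK;
		return -1;
	}
	e->cached = true;
	e->done = true;
	return 0;
}

/* Pass 1: non-blocking stat of every name. Pass 2: blocking stat of the
 * names that returned EWOULDBLOCK. Returns how many needed the retry. */
size_t stat_all_two_pass(struct mock_entry *ents, size_t n)
{
	size_t retries = 0;

	for (size_t i = 0; i < n; i++) {
		ents[i].retry = false;
		if (mock_stat(&ents[i], true) < 0 && errno == EWOULDBLOCK)
			ents[i].retry = true;
	}
	for (size_t i = 0; i < n; i++) {
		if (ents[i].retry) {
			(void)mock_stat(&ents[i], false);
			retries++;
		}
	}
	return retries;
}
```

In this model the first pass has already started I/O for every uncached name, so the blocking retries find their inodes cached; in a real kernel, how long that data stays cached is the open question Neil raises above.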


Re: [PATCH] lpfc: Avoid to disable pci_dev twice

2014-07-31 Thread Mike Qiu

On 07/17/2014 02:32 PM, Mike Qiu wrote:


Hi, all

How about this patch?

Any ideas?


On IBM Power servers, when a hardware error occurs during the probe
stage, the EEH subsystem calls the driver's error_detected interface,
which calls pci_disable_device(). But the driver's probe function also
calls pci_disable_device() in this situation.

So pci_dev will be disabled twice:

Device lpfc disabling already-disabled device
[ cut here ]
WARNING: at drivers/pci/pci.c:1407
CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW
3.10.42-2002.pkvm2_1_1.6.ppc64 #1
Workqueue: events .work_for_cpu_fn
task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000
NIP: c0471b8c LR: c0471b88 CTR: c043ebe0
REGS: c027d395b650 TRAP: 0700   Tainted: GW 
(3.10.42-2002.pkvm2_1_1.6.ppc64)
MSR: 900100029032   CR: 28b52b44  XER: 2000
CFAR: c0879ab8 SOFTE: 1
...
NIP .pci_disable_device+0xcc/0xe0
LR  .pci_disable_device+0xc8/0xe0
Call Trace:
.pci_disable_device+0xc8/0xe0 (unreliable)
.lpfc_disable_pci_dev+0x50/0x80 [lpfc]
.lpfc_pci_probe_one+0x870/0x21a0 [lpfc]
.local_pci_probe+0x68/0xb0
.work_for_cpu_fn+0x38/0x60
.process_one_work+0x1a4/0x4d0
.worker_thread+0x37c/0x490
.kthread+0xf0/0x100
.ret_from_kernel_thread+0x5c/0x80

Signed-off-by: Mike Qiu 
---
  drivers/scsi/lpfc/lpfc.h  |  1 +
  drivers/scsi/lpfc/lpfc_init.c | 59 +++
  2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 434e903..0c7bad9 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -813,6 +813,7 @@ struct lpfc_hba {
  #define VPD_MASK0xf /* mask for any vpd data */

uint8_t soft_wwn_enable;
+   uint8_t probe_done;

struct timer_list fcp_poll_timer;
struct timer_list eratt_poll;
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 06f9a5b..c2e67ae 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const struct pci_device_id *pid)
}
}

+   /* Set the probe flag */
+   phba->probe_done = 1;
+
/* Perform post initialization setup */
lpfc_post_init_setup(phba);

@@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba)
  static void
  lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba)
  {
+   if (!phba)
+   return;
+
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"2710 PCI channel disable preparing for reset\n");

@@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba)

/* Disable interrupt and pci device */
lpfc_sli_disable_intr(phba);
-   pci_disable_device(phba->pcidev);
+   if (phba->probe_done && phba->pcidev)
+   pci_disable_device(phba->pcidev);
  }

  /**
@@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
goto out_disable_intr;
}

+   /* Set probe_done flag */
+   phba->probe_done = 1;
+
/* Log the current active interrupt mode */
phba->intr_mode = intr_mode;
lpfc_log_intr_mode(phba, intr_mode);
@@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba)
  static void
  lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba)
  {
+   if (!phba)
+   return;
+
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"2826 PCI channel disable preparing for reset\n");

@@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba)
/* Disable interrupt and pci device */
lpfc_sli4_disable_intr(phba);
lpfc_sli4_queue_destroy(phba);
-   pci_disable_device(phba->pcidev);
+
+   if (phba->probe_done && phba->pcidev)
+   pci_disable_device(phba->pcidev);
  }

  /**
@@ -10893,9 +10908,21 @@ static pci_ers_result_t
  lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
  {
struct Scsi_Host *shost = pci_get_drvdata(pdev);
-   struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba;
+   struct lpfc_hba *phba;
pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT;

+   if (!shost)
+   /* Reaching here means we may still be in the probe
+    * stage: the Scsi_Host has not been created yet and we
+    * can do nothing in this state, so leave it to hotplug. */
+   return PCI_ERS_RESULT_NONE;
+
+   phba = ((struct lpfc_vport *)shost->hostdata)->phba;
+
+   if (!phba || !phba->probe_done)
+   /* Reaching here means we may still be in the probe stage */
+   return PCI_ERS_RESULT_NONE;
+
switch (phba->pci_dev_grp) {
case LPFC_PCI_DEV_LP:
rc = lpfc_io_error_detected_s3(pdev, state);
@@ -10930,9 +10957,20 @@ static 

Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Paul E. McKenney
On Fri, Aug 01, 2014 at 09:31:37AM +0800, Lai Jiangshan wrote:
> On 08/01/2014 05:55 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > This commit adds a new RCU-tasks flavor of RCU, which provides
> > call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> > context switch (not preemption!), userspace execution, and the idle loop.
> > Note that unlike other RCU flavors, these quiescent states occur in tasks,
> > not necessarily CPUs.  Includes fixes from Steven Rostedt.
> > 
> > This RCU flavor is assumed to have very infrequent latency-tolerant
> > updaters.  This assumption permits significant simplifications, including
> > a single global callback list protected by a single global lock, along
> > with a single linked list containing all tasks that have not yet passed
> > through a quiescent state.  If experience shows this assumption to be
> > incorrect, the required additional complexity will be added.
> > 
> > Suggested-by: Steven Rostedt 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  include/linux/init_task.h |   9 +++
> >  include/linux/rcupdate.h  |  36 ++
> >  include/linux/sched.h |  23 ---
> >  init/Kconfig  |  10 +++
> >  kernel/rcu/tiny.c |   2 +
> >  kernel/rcu/tree.c |   2 +
> >  kernel/rcu/update.c   | 171 ++
> >  7 files changed, 242 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 6df7f9fe0d01..78715ea7c30c 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,14 @@ extern struct group_info init_groups;
> >  #else
> >  #define INIT_TASK_RCU_PREEMPT(tsk)
> >  #endif
> > +#ifdef CONFIG_TASKS_RCU
> > +#define INIT_TASK_RCU_TASKS(tsk)   \
> > +   .rcu_tasks_holdout = false, \
> > +   .rcu_tasks_holdout_list =   \
> > +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +#else
> > +#define INIT_TASK_RCU_TASKS(tsk)
> > +#endif
> >  
> >  extern struct cred init_cred;
> >  
> > @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
> > INIT_FTRACE_GRAPH   \
> > INIT_TRACE_RECURSION\
> > INIT_TASK_RCU_PREEMPT(tsk)  \
> > +   INIT_TASK_RCU_TASKS(tsk)\
> > INIT_CPUSET_SEQ(tsk)\
> > INIT_RT_MUTEXES(tsk)\
> > INIT_VTIME(tsk) \
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6a94cc8b1ca0..829efc99df3e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
> >  
> >  void synchronize_sched(void);
> >  
> > +/**
> > + * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks() assumes
> > + * that the read-side critical sections end at a voluntary context
> > + * switch (not a preemption!), entry into idle, or transition to usermode
> > + * execution.  As such, there are no read-side primitives analogous to
> > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> > + * to determine that all tasks have passed through a safe state, not so
> > + * much for data-structure synchronization.
> > + *
> > + * See the description of call_rcu() for more detailed information on
> > + * memory ordering guarantees.
> > + */
> > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> > +
> >  #ifdef CONFIG_PREEMPT_RCU
> >  
> >  void __rcu_read_lock(void);
> > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
> > rcu_irq_exit(); \
> > } while (0)
> >  
> > +/*
> > + * Note a voluntary context switch for RCU-tasks' benefit.  This is a
> > + * macro rather than an inline function to avoid #include hell.
> > + */
> > +#ifdef CONFIG_TASKS_RCU
> > +#define rcu_note_voluntary_context_switch(t) \
> > +   do { \
> > +   preempt_disable(); /* Exclude synchronize_sched(); */ \
> > +   if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > +   ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > +   preempt_enable(); \
> 
> Why is the preempt_disable() needed here? The comments in rcu_tasks_kthread()
> don't persuade me.
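As a rough illustration of the holdout protocol in this patch, here is a minimal user-space sketch in C11. It assumes a toy task structure; `struct fake_task`, `mark_holdouts()`, `note_voluntary_context_switch()` and `holdouts_remain()` are hypothetical names, not kernel APIs. The grace-period side flags every task that is not voluntarily blocked, each task clears its own flag at a voluntary context switch (mirroring the check-then-clear in rcu_note_voluntary_context_switch()), and the grace period may end once no flag is set. The sketch does not model preemption at all, so it says nothing about whether the preempt_disable() is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only what the protocol needs. */
struct fake_task {
    atomic_bool holdout;  /* set by the grace-period scanner, cleared by the task */
    bool running;         /* true if not voluntarily blocked when the scan starts */
};

/* Grace-period side: flag every task that may still be in a "read-side"
 * region; returns how many tasks were flagged. */
static size_t mark_holdouts(struct fake_task *tasks, size_t n)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].running) {
            atomic_store(&tasks[i].holdout, true);
            marked++;
        }
    }
    return marked;
}

/* Task side, analogous to rcu_note_voluntary_context_switch():
 * only write the flag if it is actually set. */
static void note_voluntary_context_switch(struct fake_task *t)
{
    if (atomic_load(&t->holdout))
        atomic_store(&t->holdout, false);
}

/* The grace period can end once no task is still flagged. */
static bool holdouts_remain(struct fake_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (atomic_load(&tasks[i].holdout))
            return true;
    return false;
}
```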

Re: [RFC PATCH 0/2] dirreadahead system call

2014-07-31 Thread Abhijith Das


- Original Message -
> From: "Dave Chinner" 
> To: "Andreas Dilger" 
> Cc: "Abhijith Das" , "LKML" , "linux-fsdevel" , cluster-de...@redhat.com
> Sent: Thursday, July 31, 2014 6:53:06 PM
> Subject: Re: [RFC PATCH 0/2] dirreadahead system call
> 
> On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
> > On Jul 31, 2014, at 6:49, Dave Chinner  wrote:
> > > 
> > >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote:
> > >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das  wrote:
> > >>> On July 26, 2014 12:27:19 AM "Andreas Dilger"  wrote:
> >  Is there a time when this doesn't get called to prefetch entries in
> >  readdir() order?  It isn't clear to me what benefit there is of
> >  returning the entries to userspace instead of just doing the
> >  statahead implicitly in the kernel?
> >  
> >  The Lustre client has had what we call "statahead" for a while,
> >  and similar to regular file readahead it detects the sequential access
> >  pattern for readdir() + stat() in readdir() order (taking into account
> >  if ".*" entries are being processed or not) and starts fetching the
> >  inode attributes asynchronously with a worker thread.
> > >>> 
> > >>> Does this heuristic work well in practice? In the use case we were
> > >>> trying to address, a Samba server is aware beforehand if it is going
> > >>> to stat all the inodes in a directory.
> > >> 
> > >> Typically this works well for us, because this is done by the Lustre
> > >> client, so the statahead is hiding the network latency of the RPCs to
> > >> fetch attributes from the server.  I imagine the same could be seen with
> > >> GFS2. I don't know if this approach would help very much for local
> > >> filesystems because the latency is low.
> > >> 
> >  This syscall might be more useful if userspace called readdir() to get
> >  the dirents and then passed the kernel the list of inode numbers
> >  to prefetch before starting on the stat() calls. That way, userspace
> >  could generate an arbitrary list of inodes (e.g. names matching a
> >  regexp) and the kernel doesn't need to guess if every inode is needed.
> > >>> 
> > >>> Were you thinking arbitrary inodes across the filesystem or just a
> > >>> subset from a directory? Arbitrary inodes may potentially throw up
> > >>> locking issues.
> > >> 
> > >> I was thinking about inodes returned from readdir(), but the syscall
> > >> would be much more useful if it could handle arbitrary inodes.
> > > 
> > > I'm not sure we can do that. The only way to safely identify a
> > > specific inode in the filesystem from userspace is via a filehandle.
> > > Plain inode numbers are susceptible to TOCTOU race conditions that
> > > the kernel cannot resolve. Also, lookup by inode number bypasses
> > > directory access permissions, so is not something we would expose
> > > to arbitrary unprivileged users.
> > 
> > None of these issues are relevant in the API that I'm thinking about.
> > The syscall just passes the list of inode numbers to be prefetched
> > into kernel memory, and then stat() is used to actually get the data into
> > userspace (or whatever other operation is to be done on them),
> > so there is no danger if the wrong inode is prefetched.  If the inode
> > number is bad the filesystem can just ignore it.
> 
> Which means the filesystem has to treat the inode number as
> potentially hostile. i.e. it can not be trusted to be correct and so
> must take slow paths to validate the inode numbers. This adds
> *significant* overhead to the readahead path for some filesystems:
> readahead is only a win if it is low cost.
> 
> For example, on XFS every untrusted inode number lookup requires an
> inode btree lookup to validate the inode is actually valid on disk
> and that it is allocated and has references. That lookup serialises
> against inode allocation/freeing as well as other lookups. In
> comparison, when using a trusted inode number from a directory
> lookup within the kernel, we only need to do a couple of shift and
> mask operations to convert it to a disk address and we are good to
> go.
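To see why a trusted inode number is cheap on XFS, the sketch below shows the kind of shift-and-mask decode Dave describes. It assumes the usual XFS packing of an inode number as allocation group number, block within the AG, and inode index within that block. The geometry values (`agblklog`, `inopblog`) and all names here are hypothetical; on a real filesystem the geometry comes from the superblock. Note what the decode does not do: it never checks that the inode is allocated. That check is exactly the btree lookup an untrusted number would require.

```c
#include <stdint.h>

/* Hypothetical geometry; real values come from the superblock. */
struct ino_geom {
    unsigned agblklog;  /* log2(blocks per allocation group) */
    unsigned inopblog;  /* log2(inodes per filesystem block) */
};

struct ino_location {
    uint64_t agno;    /* allocation group number */
    uint64_t agbno;   /* block number within the allocation group */
    uint64_t offset;  /* inode index within that block */
};

/* Decode with shifts and masks only: no I/O, no locks, and no check
 * that the inode is actually allocated. */
static struct ino_location decode_ino(uint64_t ino, struct ino_geom g)
{
    struct ino_location loc;

    loc.offset = ino & ((UINT64_C(1) << g.inopblog) - 1);
    loc.agbno  = (ino >> g.inopblog) & ((UINT64_C(1) << g.agblklog) - 1);
    loc.agno   = ino >> (g.agblklog + g.inopblog);
    return loc;
}

/* Inverse of decode_ino(), useful for building test values. */
static uint64_t encode_ino(struct ino_location loc, struct ino_geom g)
{
    return (loc.agno << (g.agblklog + g.inopblog)) |
           (loc.agbno << g.inopblog) | loc.offset;
}
```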
> 
> i.e. the difference is at least 5 orders of magnitude higher CPU usage
> for an "inode number readahead" syscall versus a "directory
> readahead" syscall, it has significant serialisation issues and it
> can stall other modification/lookups going on at the same time.
> That's *horrible behaviour* for a speculative readahead operation,
> but because the inode numbers are untrusted, we can't avoid it.
> 
> So, again, it's way more overhead than userspace just calling
> stat() asynchronously on many files at once as readdir/getdents
> returns dirents from the kernel to speed up cache population.
> 
> That's my main issue with this patchset - it's implementing
> something in kernelspace that can *easily* be done generically in
> userspace without introducing all sorts of 

Re: [PATCH v2 tip/core/rcu 01/10] rcu: Add call_rcu_tasks()

2014-07-31 Thread Paul E. McKenney
On Fri, Aug 01, 2014 at 08:53:38AM +0800, Lai Jiangshan wrote:
> On 08/01/2014 12:09 AM, Paul E. McKenney wrote:
> 
> > 
> >>> + /*
> >>> +  * There were callbacks, so we need to wait for an
> >>> +  * RCU-tasks grace period.  Start off by scanning
> >>> +  * the task list for tasks that are not already
> >>> +  * voluntarily blocked.  Mark these tasks and make
> >>> +  * a list of them in rcu_tasks_holdouts.
> >>> +  */
> >>> + rcu_read_lock();
> >>> + for_each_process_thread(g, t) {
> >>> + if (t != current && ACCESS_ONCE(t->on_rq) &&
> >>> + !is_idle_task(t)) {
> >>
> >> What happens when the trampoline is on the idle task?
> >>
> >> I think we need to use schedule_on_each_cpu() to replace one of
> >> the synchronize_sched() in this function. (or other stuff which can
> >> cause real schedule for *ALL* online CPUs).
> > 
> > Well, that is one of the questions in the 0/10 cover letter.  If it turns
> > out to be necessary to worry about idle-task trampolines, it should be
> > possible to avoid hammering all idle CPUs in the common case.  Though maybe
> > battery-powered devices won't need RCU-tasks.
> 
> Trampolines on a NO_HZ idle CPU can run arbitrarily long (for example, if
> an SMI happens inside the trampoline), so to me only a real schedule on
> the idle CPU is reliable.

You might well be right, but first let's see if Steven needs this to
work in the idle task to begin with.  If he doesn't, then there is no
point in worrying about it.  If he does, I bet I can come up with a
trick or two.  ;-)

Thanx, Paul
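The task-list scan quoted above boils down to a filter. The sketch below restates it over a hypothetical `struct task_snapshot` (not the real task_struct), which makes the consequence easy to see. An idle task is never put on the holdout list, so this loop never waits for a trampoline executing in the idle task. That is the case Lai is concerned about.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the task fields the quoted loop looks at. */
struct task_snapshot {
    int  pid;
    bool on_rq;    /* runnable, i.e. not voluntarily blocked */
    bool is_idle;  /* a per-CPU idle task */
};

/* Mirrors: t != current && ACCESS_ONCE(t->on_rq) && !is_idle_task(t) */
static bool needs_holdout(const struct task_snapshot *t, int current_pid)
{
    return t->pid != current_pid && t->on_rq && !t->is_idle;
}

/* Store the indices of tasks that must be waited on in out[] (which must
 * have room for n entries); returns how many were stored. */
static size_t build_holdout_list(const struct task_snapshot *tasks, size_t n,
                                 int current_pid, size_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (needs_holdout(&tasks[i], current_pid))
            out[count++] = i;
    return count;
}
```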

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Paul E. McKenney
On Fri, Aug 01, 2014 at 01:57:50AM +0200, Frederic Weisbecker wrote:
> On Thu, Jul 31, 2014 at 02:55:01PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > This commit adds a new RCU-tasks flavor of RCU, which provides
> > call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> > context switch (not preemption!), userspace execution, and the idle loop.
> > Note that unlike other RCU flavors, these quiescent states occur in tasks,
> > not necessarily CPUs.  Includes fixes from Steven Rostedt.
> > 
> > This RCU flavor is assumed to have very infrequent latency-tolerant
> > updaters.  This assumption permits significant simplifications, including
> > a single global callback list protected by a single global lock, along
> > with a single linked list containing all tasks that have not yet passed
> > through a quiescent state.  If experience shows this assumption to be
> > incorrect, the required additional complexity will be added.
> > 
> > Suggested-by: Steven Rostedt 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  include/linux/init_task.h |   9 +++
> >  include/linux/rcupdate.h  |  36 ++
> >  include/linux/sched.h |  23 ---
> >  init/Kconfig  |  10 +++
> >  kernel/rcu/tiny.c |   2 +
> >  kernel/rcu/tree.c |   2 +
> >  kernel/rcu/update.c   | 171 ++
> >  7 files changed, 242 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 6df7f9fe0d01..78715ea7c30c 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,14 @@ extern struct group_info init_groups;
> >  #else
> >  #define INIT_TASK_RCU_PREEMPT(tsk)
> >  #endif
> > +#ifdef CONFIG_TASKS_RCU
> > +#define INIT_TASK_RCU_TASKS(tsk)   \
> > +   .rcu_tasks_holdout = false, \
> > +   .rcu_tasks_holdout_list =   \
> > +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +#else
> > +#define INIT_TASK_RCU_TASKS(tsk)
> > +#endif
> >  
> >  extern struct cred init_cred;
> >  
> > @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
> > INIT_FTRACE_GRAPH   \
> > INIT_TRACE_RECURSION\
> > INIT_TASK_RCU_PREEMPT(tsk)  \
> > +   INIT_TASK_RCU_TASKS(tsk)\
> > INIT_CPUSET_SEQ(tsk)\
> > INIT_RT_MUTEXES(tsk)\
> > INIT_VTIME(tsk) \
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6a94cc8b1ca0..829efc99df3e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
> >  
> >  void synchronize_sched(void);
> >  
> > +/**
> > + * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks() assumes
> > + * that the read-side critical sections end at a voluntary context
> > + * switch (not a preemption!), entry into idle, or transition to usermode
> > + * execution.  As such, there are no read-side primitives analogous to
> > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> > + * to determine that all tasks have passed through a safe state, not so
> > + * much for data-structure synchronization.
> > + *
> > + * See the description of call_rcu() for more detailed information on
> > + * memory ordering guarantees.
> > + */
> > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> > +
> >  #ifdef CONFIG_PREEMPT_RCU
> >  
> >  void __rcu_read_lock(void);
> > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
> > rcu_irq_exit(); \
> > } while (0)
> >  
> > +/*
> > + * Note a voluntary context switch for RCU-tasks' benefit.  This is a
> > + * macro rather than an inline function to avoid #include hell.
> > + */
> > +#ifdef CONFIG_TASKS_RCU
> > +#define rcu_note_voluntary_context_switch(t) \
> > +   do { \
> > +   preempt_disable(); /* Exclude synchronize_sched(); */ \
> > +   if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > +   ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > +   preempt_enable(); \
> > +   } while (0)
> > +#else /* #ifdef CONFIG_TASKS_RCU */
> > +#define 

Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Aaron Lu
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu  wrote:
> > 
> > > FYI, we noticed the below changes on
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure task_numa_migrate() checks the preferred node")
> > > 
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >  94500 ~ 3%    +115.6%    203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >  67745 ~ 4%     +64.1%    111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > 162245 ~ 3%     +94.1%    314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > 
> > Hi Aaron,
> > 
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
> 
> So assuming those numbers above are the difference in

Yes, they are.

It means that, for commit ebe06187bf2aec1, the value of
numa_hint_faults_local is 94500 on the ivb42 machine and 67745 on the
lkp-snb01 machine. The 3% and 4% following those numbers are the deviation
of the individual runs from their average (we usually run each test
multiple times to filter out outliers). We should probably remove those
percentages: without a detailed explanation they cause confusion and may
not mean much to the commit author and others (if the deviation is big
enough, we should simply drop that result).

The percentage in the middle is the change between the two commits.
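The middle column is the relative change from the first commit to the second. A small sketch with hypothetical helper names makes the arithmetic explicit and prints a row in the same base / change / new / test layout. For example, the TOTAL row is (314885 - 162245) / 162245, or about +94.1%.

```c
#include <stdio.h>

/* Relative change from the base commit's value to the new one, in percent. */
static double pct_change(double base, double now)
{
    return (now - base) / base * 100.0;
}

/* Print one row: base value, change, new value, test name. */
static void print_row(double base, double now, const char *test)
{
    printf("%15.0f  %+8.1f%%  %10.0f  %s\n",
           base, pct_change(base, now), now, test);
}
```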

Another issue is what the numbers measure: it isn't evident that they
are for proc-vmstat.numa_hint_faults_local. Maybe something like this
would be better?

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---------------  -------------------------  ----------------------------------
          94500    +115.6%    203711        ivb42/hackbench/50%-threads-pipe
          67745     +64.1%    111174        lkp-snb01/hackbench/50%-threads-socket
         162245     +94.1%    314885        TOTAL

Regards,
Aaron

> numa_hint_faults_local, the report is actually a significant
> _improvement_, not a regression.
> 
> On my IVB-EP I get similar numbers; using:
> 
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
> 
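Peter's shell snippet reads one counter from /proc/vmstat before and after the benchmark. A rough C equivalent is sketched below. It assumes the one "name value" pair per line format that `cut -d' ' -f2` also relies on, and `vmstat_lookup()` and `read_vmstat_counter()` are hypothetical names. Requiring the key to be followed by a space keeps `numa_hint_faults` from matching `numa_hint_faults_local`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find "key value" in vmstat-formatted text; return value or -1 if absent. */
static long long vmstat_lookup(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return strtoll(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}

/* Read the live counter from /proc/vmstat (Linux only); -1 on failure. */
static long long read_vmstat_counter(const char *key)
{
    static char buf[64 * 1024];
    FILE *f = fopen("/proc/vmstat", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_lookup(buf, key);
}
```

Usage mirrors the shell version: read the counter, run `perf bench sched messaging`, read it again, and print the difference.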
> 
> tip/master+origin/master   tip/master+origin/master-a43455a1d57
> 
>         local   total     local   total
>         faults  time      faults  time
> 
>         19971   51.384    10104   50.838
>         17193   50.564     9116   50.208
>         13435   49.057     8332   51.344
>         23794   50.795     9954   51.364
>         20255   49.463     9598   51.258
> 
> avg     18929.6 50.2526   9420.8  51.0024
> stdev   3863.61 0.96      717.78  0.49
> 
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
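Peter's summary rows and his "inside stdev overlap" remark can be checked with a few lines of C. This sketch (all names hypothetical) computes the mean and the sample variance with an n - 1 denominator, whose square root gives the stdev row. It also tests whether the mean ± stdev ranges of two configurations overlap.

```c
#include <stdbool.h>
#include <stddef.h>

static double mean(const double *x, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Sample variance with an n - 1 denominator; its square root is the
 * stdev row of the table. Requires n >= 2. */
static double sample_variance(const double *x, size_t n)
{
    double m = mean(x, n), acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* True if the ranges [m1 - s1, m1 + s1] and [m2 - s2, m2 + s2] intersect. */
static bool stdev_ranges_overlap(double m1, double s1, double m2, double s2)
{
    double diff = m1 > m2 ? m1 - m2 : m2 - m1;

    return diff <= s1 + s2;
}
```

With the local-fault runs above, the variances come out near 14927498.8 and 515213.2, whose square roots are about 3863.61 and 717.78. The runtime ranges overlap; the local-fault ranges do not.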
> 
> 
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
> 
> tip/master+origin/master+patch
> 
>         local   total
>         faults  time
> 
>         21296   50.541
>         12771   50.54
>         13872   52.224
>         23352   50.85
>         16516   50.705
> 
> avg     17561.4 50.972
> stdev   4613.32 0.71
> 
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
> 
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.




Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Paul E. McKenney
On Fri, Aug 01, 2014 at 09:15:34AM +0800, Lai Jiangshan wrote:
> On 08/01/2014 05:55 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > This commit adds a new RCU-tasks flavor of RCU, which provides
> > call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> > context switch (not preemption!), userspace execution, and the idle loop.
> > Note that unlike other RCU flavors, these quiescent states occur in tasks,
> > not necessarily CPUs.  Includes fixes from Steven Rostedt.
> > 
> > This RCU flavor is assumed to have very infrequent latency-tolerant
> > updaters.  This assumption permits significant simplifications, including
> > a single global callback list protected by a single global lock, along
> > with a single linked list containing all tasks that have not yet passed
> > through a quiescent state.  If experience shows this assumption to be
> > incorrect, the required additional complexity will be added.
> > 
> > Suggested-by: Steven Rostedt 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  include/linux/init_task.h |   9 +++
> >  include/linux/rcupdate.h  |  36 ++
> >  include/linux/sched.h |  23 ---
> >  init/Kconfig  |  10 +++
> >  kernel/rcu/tiny.c |   2 +
> >  kernel/rcu/tree.c |   2 +
> >  kernel/rcu/update.c   | 171 ++
> >  7 files changed, 242 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 6df7f9fe0d01..78715ea7c30c 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,14 @@ extern struct group_info init_groups;
> >  #else
> >  #define INIT_TASK_RCU_PREEMPT(tsk)
> >  #endif
> > +#ifdef CONFIG_TASKS_RCU
> > +#define INIT_TASK_RCU_TASKS(tsk)   \
> > +   .rcu_tasks_holdout = false, \
> > +   .rcu_tasks_holdout_list =   \
> > +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +#else
> > +#define INIT_TASK_RCU_TASKS(tsk)
> > +#endif
> >  
> >  extern struct cred init_cred;
> >  
> > @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
> > INIT_FTRACE_GRAPH   \
> > INIT_TRACE_RECURSION\
> > INIT_TASK_RCU_PREEMPT(tsk)  \
> > +   INIT_TASK_RCU_TASKS(tsk)\
> > INIT_CPUSET_SEQ(tsk)\
> > INIT_RT_MUTEXES(tsk)\
> > INIT_VTIME(tsk) \
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6a94cc8b1ca0..829efc99df3e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
> >  
> >  void synchronize_sched(void);
> >  
> > +/**
> > + * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks() assumes
> > + * that the read-side critical sections end at a voluntary context
> > + * switch (not a preemption!), entry into idle, or transition to usermode
> > + * execution.  As such, there are no read-side primitives analogous to
> > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> > + * to determine that all tasks have passed through a safe state, not so
> > + * much for data-structure synchronization.
> > + *
> > + * See the description of call_rcu() for more detailed information on
> > + * memory ordering guarantees.
> > + */
> > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> > +
> >  #ifdef CONFIG_PREEMPT_RCU
> >  
> >  void __rcu_read_lock(void);
> > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
> > rcu_irq_exit(); \
> > } while (0)
> >  
> > +/*
> > + * Note a voluntary context switch for RCU-tasks' benefit.  This is a
> > + * macro rather than an inline function to avoid #include hell.
> > + */
> > +#ifdef CONFIG_TASKS_RCU
> > +#define rcu_note_voluntary_context_switch(t) \
> > +   do { \
> > +   preempt_disable(); /* Exclude synchronize_sched(); */ \
> > +   if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> > +   ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> > +   preempt_enable(); \
> > +   } while (0)
> > +#else /* #ifdef CONFIG_TASKS_RCU */
> > +#define 

[PATCH 2/2] staging: comedi: addi_apci_1564: remove diagnostic interrupt support code

2014-07-31 Thread Chase Southwood
As per Ian, at this point in time it is not worth implementing an async
command interface for diagnostic interrupts for this board.  As this is
the case, this patch removes the code which supports such interrupts as it
is now unused.

This includes removing apci1564_do_read(), which was the insn_read
operation for the digital output subdevice, since all it was doing was
reading the current diagnostic interrupt type and returning it in 'data'.
This doesn't follow the comedi API and this operation can be emulated by
the comedi core anyway since the insn_bits operation follows the comedi
API.  So it is safe to simply remove this function.
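To illustrate how a read can be emulated from insn_bits, here is a stand-alone sketch. It calls a bits handler once and extracts one channel's bit. The callback type, `emulate_read_chan()` and `fixed_state_bits()` are hypothetical. They only mirror the comedi convention of returning channel states in data[1]; the emulation the driver actually relies on is the one in the comedi core.

```c
/* Hypothetical stand-in for an insn_bits handler: data[0] is a write mask
 * (unused here) and on return data[1] holds one bit per channel. */
typedef int (*bits_handler)(void *ctx, unsigned int *data);

/* Emulate a single-channel read: fetch all channel states, keep one bit.
 * Returns 1 (one sample) on success or the handler's negative error. */
static int emulate_read_chan(bits_handler bits, void *ctx, unsigned int chan,
                             unsigned int *value)
{
    unsigned int data[2] = { 0, 0 };  /* data[0] = 0: do not write outputs */
    int ret = bits(ctx, data);

    if (ret < 0)
        return ret;
    *value = (data[1] >> chan) & 1u;
    return 1;
}

/* Test double: reports a fixed channel state held in *ctx and, like a real
 * handler, returns the instruction length (2 for a bits instruction). */
static int fixed_state_bits(void *ctx, unsigned int *data)
{
    data[1] = *(const unsigned int *)ctx;
    return 2;
}
```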

Signed-off-by: Chase Southwood 
Cc: Ian Abbott 
Cc: H Hartley Sweeten 
---
 .../staging/comedi/drivers/addi-data/hwdrv_apci1564.c  | 14 --
 drivers/staging/comedi/drivers/addi_apci_1564.c| 18 --
 2 files changed, 32 deletions(-)

diff --git a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
index a1730e9..8a613ae 100644
--- a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
+++ b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
@@ -340,17 +340,3 @@ static int apci1564_timer_read(struct comedi_device *dev,
}
return insn->n;
 }
-
-/*
- * Reads the interrupt status register
- */
-static int apci1564_do_read(struct comedi_device *dev,
-   struct comedi_subdevice *s,
-   struct comedi_insn *insn,
-   unsigned int *data)
-{
-   struct apci1564_private *devpriv = dev->private;
-
-   *data = devpriv->do_int_type;
-   return insn->n;
-}
diff --git a/drivers/staging/comedi/drivers/addi_apci_1564.c b/drivers/staging/comedi/drivers/addi_apci_1564.c
index 819255b..543cb07 100644
--- a/drivers/staging/comedi/drivers/addi_apci_1564.c
+++ b/drivers/staging/comedi/drivers/addi_apci_1564.c
@@ -13,7 +13,6 @@ struct apci1564_private {
unsigned int mode1; /* riding-edge/high level channels */
unsigned int mode2; /* falling-edge/low level channels */
unsigned int ctrl;  /* interrupt mode OR (edge) . AND (level) */
-   unsigned int do_int_type;
unsigned char timer_select_mode;
unsigned char mode_select_register;
struct task_struct *tsk_current;
@@ -25,8 +24,6 @@ static int apci1564_reset(struct comedi_device *dev)
 {
struct apci1564_private *devpriv = dev->private;
 
-   devpriv->do_int_type = 0;
-
/* Disable the input interrupts and reset status register */
outl(0x0, devpriv->amcc_iobase + APCI1564_DI_IRQ_REG);
inl(devpriv->amcc_iobase + APCI1564_DI_INT_STATUS_REG);
@@ -83,20 +80,6 @@ static irqreturn_t apci1564_interrupt(int irq, void *d)
outl(status, devpriv->amcc_iobase + APCI1564_DI_IRQ_REG);
}
 
-   status = inl(devpriv->amcc_iobase + APCI1564_DO_IRQ_REG);
-   if (status & 0x01) {
-   /* Check for Digital Output interrupt Type */
-   /* 1: VCC interrupt*/
-   /* 2: CC interrupt */
-   devpriv->do_int_type = inl(devpriv->amcc_iobase +
- APCI1564_DO_INT_STATUS_REG) & 0x3;
-   /* Disable the  Interrupt */
-   outl(0x0, devpriv->amcc_iobase + APCI1564_DO_INT_CTRL_REG);
-
-   /* Sends signal to user space */
-   send_sig(SIGIO, devpriv->tsk_current, 0);
-   }
-
status = inl(devpriv->amcc_iobase + APCI1564_TIMER_IRQ_REG);
if (status & 0x01) {
/*  Disable Timer Interrupt */
@@ -407,7 +390,6 @@ static int apci1564_auto_attach(struct comedi_device *dev,
s->range_table = &range_digital;
s->insn_config = apci1564_do_config;
s->insn_bits = apci1564_do_insn_bits;
-   s->insn_read = apci1564_do_read;
 
/* Change-Of-State (COS) interrupt subdevice */
s = &dev->subdevices[2];
-- 
2.0.3



[PATCH 1/2] staging: comedi: addi_apci_1564: add subdevice to check diagnostic status

2014-07-31 Thread Chase Southwood
This board provides VCC/CC diagnostic information, and it also supports
diagnostic interrupts.  However, as per Ian, these interrupts aren't very
useful and it is enough to simply provide an interface for accessing the
diagnostic status on-demand.  This patch adds a 2-channel digital input
subdevice with an insn_bits handler to access this information.
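The new subdevice exposes the two low bits of APCI1564_DO_INT_STATUS_REG as channels 0 and 1. The sketch below decodes them. It assumes bit 0 is the VCC diagnostic and bit 1 the CC diagnostic, as suggested by the "1: VCC interrupt / 2: CC interrupt" comment in the interrupt code that patch 2/2 removes. `decode_diag()` and the DIAG_* names are hypothetical.

```c
#include <stdbool.h>

/* Assumed bit layout, taken from the "1: VCC interrupt / 2: CC interrupt"
 * comment in the interrupt handler removed by patch 2/2. */
#define DIAG_VCC_BIT 0x1u
#define DIAG_CC_BIT  0x2u

struct diag_status {
    bool vcc;  /* channel 0 of the diagnostic subdevice */
    bool cc;   /* channel 1 of the diagnostic subdevice */
};

/* Mask a raw status register value the same way the new insn_bits
 * handler does (& 3), then split it into named flags. */
static struct diag_status decode_diag(unsigned int reg)
{
    unsigned int bits = reg & (DIAG_VCC_BIT | DIAG_CC_BIT);
    struct diag_status s = {
        .vcc = (bits & DIAG_VCC_BIT) != 0,
        .cc  = (bits & DIAG_CC_BIT) != 0,
    };

    return s;
}
```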

Signed-off-by: Chase Southwood 
Cc: Ian Abbott 
Cc: H Hartley Sweeten 
---
 drivers/staging/comedi/drivers/addi_apci_1564.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/addi_apci_1564.c b/drivers/staging/comedi/drivers/addi_apci_1564.c
index 190b026..819255b 100644
--- a/drivers/staging/comedi/drivers/addi_apci_1564.c
+++ b/drivers/staging/comedi/drivers/addi_apci_1564.c
@@ -157,6 +157,18 @@ static int apci1564_do_insn_bits(struct comedi_device *dev,
return insn->n;
 }
 
+static int apci1564_diag_insn_bits(struct comedi_device *dev,
+  struct comedi_subdevice *s,
+  struct comedi_insn *insn,
+  unsigned int *data)
+{
+   struct apci1564_private *devpriv = dev->private;
+
+   data[1] = inl(devpriv->amcc_iobase + APCI1564_DO_INT_STATUS_REG) & 3;
+
+   return insn->n;
+}
+
 /*
  * Change-Of-State (COS) interrupt configuration
  *
@@ -373,7 +385,7 @@ static int apci1564_auto_attach(struct comedi_device *dev,
dev->irq = pcidev->irq;
}
 
-   ret = comedi_alloc_subdevices(dev, 5);
+   ret = comedi_alloc_subdevices(dev, 6);
if (ret)
return ret;
 
@@ -434,6 +446,15 @@ static int apci1564_auto_attach(struct comedi_device *dev,
if (ret)
return ret;
 
+   /* Initialize the diagnostic status subdevice */
+   s = &dev->subdevices[5];
+   s->type = COMEDI_SUBD_DI;
+   s->subdev_flags = SDF_READABLE;
+   s->n_chan = 2;
+   s->maxdata = 1;
+   s->range_table = &range_digital;
+   s->insn_bits = apci1564_diag_insn_bits;
+
return 0;
 }
 
-- 
2.0.3



[PATCH 0/2] staging: comedi: addi_apci_1564: provide interface to read diagnostic status

2014-07-31 Thread Chase Southwood
This patchset creates a simple subdevice to allow for reading of the
board's diagnostic status, and then removes any code which is related to
diagnostic interrupts, as the driver will not support these at this time.

Chase Southwood (2):
  staging: comedi: addi_apci_1564: add subdevice to check diagnostic
status
  staging: comedi: addi_apci_1564: remove diagnostic interrupt support
code

 .../comedi/drivers/addi-data/hwdrv_apci1564.c  | 14 
 drivers/staging/comedi/drivers/addi_apci_1564.c| 41 --
 2 files changed, 22 insertions(+), 33 deletions(-)

-- 
2.0.3



Re: scheduler crash on Power

2014-07-31 Thread Michael Ellerman
On Wed, 2014-07-30 at 00:22 -0700, Sukadev Bhattiprolu wrote:
> I am getting this crash on a Powerpc system using 3.16.0-rc7 kernel plus
> some patches related to perf (24x7 counters) that Cody Schafer posted here:
> 
>   https://lkml.org/lkml/2014/5/27/768
> 
> I don't get the crash on an unpatched kernel though.

You mean you don't get the crash on 3.16-rc7?

I find it hard to believe those 24x7 patches are causing this.
 
> I am also attaching the debug messages that Peterz added
> here: https://lkml.org/lkml/2014/7/17/288

I don't see any FAIL messages in your log, so it looks like you're not hitting
the case that patch was looking for?

> Appreciate any debug suggestions.

Reproduce on an unpatched kernel.

cheers




Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-07-31 Thread Nick Krause
On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills  wrote:
> On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
>> This adds checks for the stated modes, so that if they are crap we will
>> return a "not supported" error.
>
>You've just enabled two options, but you haven't actually
> implemented the code behind it. I would tell you *NOT* to do anything
> else on this work until you can answer the question: What happens if
> you apply this patch, create a large file called "foo.txt", and then a
> userspace program executes the following code?
>
> int fd = open("foo.txt", O_RDWR);
> fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);
>
>Try it on a btrfs filesystem, both with and without your patch.
> Also try it on an ext4 filesystem.
>
>Once you've done all of that, reply to this mail and tell me what
> the problem is with this patch. You need to make two answers: what are
> the technical problems with the patch? What errors have you made in
> the development process?
>
>*Only* if you can answer those questions sensibly, should you write
> any more patches, of any kind.
>
>Hugo.
>
>> Signed-off-by: Nicholas Krause 
>> ---
>>  fs/btrfs/file.c |3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 1f2b99c..599495a 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode,
>>   alloc_end = round_up(offset + len, blocksize);
>>
>>   /* Make sure we aren't being give some crap mode */
>> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE|
>> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
>>   return -EOPNOTSUPP;
>>
>>   if (mode & FALLOC_FL_PUNCH_HOLE)
>> --
>> 1.7.10.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>   --- The glass is neither half-full nor half-empty; it is twice as ---
> large as it needs to be.
The calls are there in btrfs, so it will either kernel panic or cause an oops.
I need to test this patch, as this is a very easy bug to catch.
Nick


Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-07-31 Thread Duncan
Nicholas Krause posted on Thu, 31 Jul 2014 13:53:33 -0400 as excerpted:

> This adds checks for the stated modes as if they are crap we will return
> error not supported.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  fs/btrfs/file.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 1f2b99c..599495a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode,
>   alloc_end = round_up(offset + len, blocksize);
>  
>   /* Make sure we aren't being give some crap mode */
> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE|
> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
>   return -EOPNOTSUPP;
>  
>   if (mode & FALLOC_FL_PUNCH_HOLE)

Is the supporting code already there?

You're removing the EOPNOTSUPP errors, but the patch doesn't add the
support; it only removes the errors from the check for it. If the support
is in fact already there and the implementing patch simply forgot to
remove those checks, your comment should say so, with a pointer to either
the commit that added the support or the functions that implement it.
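A minimal userspace model of the point above, assuming (from the quoted hunk only) that btrfs_fallocate() dispatches FALLOC_FL_PUNCH_HOLE after the mode check and treats every other accepted mode as an ordinary preallocation. model_fallocate(), OLD_MASK and NEW_MASK are hypothetical names; the flag values match <linux/falloc.h>. Widening the mask does not implement a mode, it only stops rejecting it:

```c
#include <assert.h>
#include <errno.h>

/* Values as defined in <linux/falloc.h>; guarded so the sketch builds
 * whether or not that header has been included. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE      0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE     0x02
#endif
#ifndef FALLOC_FL_COLLAPSE_RANGE
#define FALLOC_FL_COLLAPSE_RANGE 0x08
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE     0x10
#endif

/* Accepted-mode masks before and after the patch (hypothetical names). */
#define OLD_MASK (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)
#define NEW_MASK (OLD_MASK | FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE)

enum falloc_outcome { OUT_REJECTED, OUT_PUNCH_HOLE, OUT_PREALLOC };

/* Hypothetical model of the fallocate entry path. */
static enum falloc_outcome model_fallocate(int mode, int allowed, int *err)
{
	*err = 0;
	if (mode & ~allowed) {            /* the "crap mode" check */
		*err = -EOPNOTSUPP;
		return OUT_REJECTED;
	}
	if (mode & FALLOC_FL_PUNCH_HOLE)  /* the only special mode dispatched */
		return OUT_PUNCH_HOLE;
	return OUT_PREALLOC;              /* everything else: plain preallocation */
}
```

In this model FALLOC_FL_COLLAPSE_RANGE is rejected with -EOPNOTSUPP before the patch and silently treated as a preallocation after it, which is the kind of behaviour the questions in this thread are probing for.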

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH] swap: remove the struct cpumask has_work

2014-07-31 Thread Lai Jiangshan
On 08/01/2014 12:09 AM, Chris Metcalf wrote:
> On 7/31/2014 7:51 AM, Michal Hocko wrote:
>> On Thu 31-07-14 11:30:19, Lai Jiangshan wrote:
>>> It is suggested that cpumask_var_t and alloc_cpumask_var() should be used
>>> instead of struct cpumask.  But I don't want to add this complexity nor
>>> leave this unwelcome "static struct cpumask has_work;", so I just remove
>>> it and use flush_work() to perform on all online drain_work.  flush_work()
>>> performs very quickly on initialized but unused work item, thus we don't
>>> need the struct cpumask has_work for performance.
>> Why? Just because there is general recommendation for using
>> cpumask_var_t rather than cpumask?
>>
>> In this particular case cpumask shouldn't matter much as it is static.
>> Your code will work as well, but I do not see any strong reason to
>> change it just to get rid of cpumask which is not on stack.
> 
> The code uses for_each_cpu with a cpumask to avoid waking cpus that don't
> need to do work.  This is important for the nohz_full type functionality,
> power efficiency, etc.  So, nack for this change.
> 

On an initialized but unused work item, flush_work() just disables IRQs,
fetches work->data to test it, restores IRQs and returns.

The struct cpumask has_work is just a premature optimization.
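A rough userspace model of the trade-off being argued, assuming the patch still queues drain work only on CPUs that need it and only replaces the has_work mask with a flush of every online CPU. All names are hypothetical; flush_one() stands in for flush_work() on an idle item, which per the description above is essentially a flag test:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 8            /* hypothetical CPU count */

struct work_model {
	bool pending;              /* set when queued, cleared once it ran */
};

static struct work_model drain_work[NR_CPUS_MODEL];
static int cpus_woken;             /* work items queued (CPUs disturbed) */
static int flush_checks;           /* cheap flush-style tests performed */

static void queue_on(int cpu)
{
	drain_work[cpu].pending = true;
	cpus_woken++;
}

/* Model of flushing: on an idle item this is only a flag test. */
static void flush_one(int cpu)
{
	flush_checks++;
	if (drain_work[cpu].pending)
		drain_work[cpu].pending = false;   /* "wait" for it to run */
}

/* Variant A: remember which CPUs got work, flush only those. */
static void drain_with_mask(const bool need[NR_CPUS_MODEL])
{
	bool has_work[NR_CPUS_MODEL] = { false };

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (need[cpu]) {
			queue_on(cpu);
			has_work[cpu] = true;
		}
	}
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (has_work[cpu])
			flush_one(cpu);
}

/* Variant B: queue the same way, then flush every online CPU. */
static void drain_flush_all(const bool need[NR_CPUS_MODEL])
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (need[cpu])
			queue_on(cpu);
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		flush_one(cpu);
}
```

Under that assumption both variants disturb the same CPUs and variant B only pays for extra cheap checks; whether this holds for the real patch depends on how it queues the work, which is not shown in this thread.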


Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Lai Jiangshan
On 08/01/2014 05:55 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> This commit adds a new RCU-tasks flavor of RCU, which provides
> call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> context switch (not preemption!), userspace execution, and the idle loop.
> Note that unlike other RCU flavors, these quiescent states occur in tasks,
> not necessarily CPUs.  Includes fixes from Steven Rostedt.
> 
> This RCU flavor is assumed to have very infrequent latency-tolerant
> updaters.  This assumption permits significant simplifications, including
> a single global callback list protected by a single global lock, along
> with a single linked list containing all tasks that have not yet passed
> through a quiescent state.  If experience shows this assumption to be
> incorrect, the required additional complexity will be added.
> 
> Suggested-by: Steven Rostedt 
> Signed-off-by: Paul E. McKenney 
> ---
>  include/linux/init_task.h |   9 +++
>  include/linux/rcupdate.h  |  36 ++
>  include/linux/sched.h |  23 ---
>  init/Kconfig  |  10 +++
>  kernel/rcu/tiny.c |   2 +
>  kernel/rcu/tree.c |   2 +
>  kernel/rcu/update.c   | 171 ++
>  7 files changed, 242 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6df7f9fe0d01..78715ea7c30c 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,14 @@ extern struct group_info init_groups;
>  #else
>  #define INIT_TASK_RCU_PREEMPT(tsk)
>  #endif
> +#ifdef CONFIG_TASKS_RCU
> +#define INIT_TASK_RCU_TASKS(tsk) \
> + .rcu_tasks_holdout = false, \
> + .rcu_tasks_holdout_list =   \
> + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +#else
> +#define INIT_TASK_RCU_TASKS(tsk)
> +#endif
>  
>  extern struct cred init_cred;
>  
> @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
>   INIT_FTRACE_GRAPH   \
>   INIT_TRACE_RECURSION\
>   INIT_TASK_RCU_PREEMPT(tsk)  \
> + INIT_TASK_RCU_TASKS(tsk)\
>   INIT_CPUSET_SEQ(tsk)\
>   INIT_RT_MUTEXES(tsk)\
>   INIT_VTIME(tsk) \
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6a94cc8b1ca0..829efc99df3e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
>  
>  void synchronize_sched(void);
>  
> +/**
> + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual callback function to be invoked after the grace period
> + *
> + * The callback function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. call_rcu_tasks() assumes
> + * that the read-side critical sections end at a voluntary context
> + * switch (not a preemption!), entry into idle, or transition to usermode
> + * execution.  As such, there are no read-side primitives analogous to
> + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> + * to determine that all tasks have passed through a safe state, not so
> + * much for data-structure synchronization.
> + *
> + * See the description of call_rcu() for more detailed information on
> + * memory ordering guarantees.
> + */
> +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> +
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  void __rcu_read_lock(void);
> @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
>   rcu_irq_exit(); \
>   } while (0)
>  
> +/*
> + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> + * macro rather than an inline function to avoid #include hell.
> + */
> +#ifdef CONFIG_TASKS_RCU
> +#define rcu_note_voluntary_context_switch(t) \
> + do { \
> + preempt_disable(); /* Exclude synchronize_sched(); */ \
> + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> + preempt_enable(); \

Why is preempt_disable() needed here? The comments in rcu_tasks_kthread()
don't persuade me.  Maybe it could be removed?
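For readers following the question, a single-threaded sketch of the holdout scheme described in the changelog: every task starts a grace period as a holdout, a voluntary context switch clears its flag, and the kthread keeps unlinking tasks whose flag is clear until the list is empty. This is a hypothetical model, not the patch: it uses a singly linked list instead of the rcu_tasks_holdout_list list_head, and it deliberately leaves out concurrency, ACCESS_ONCE() and the preempt_disable() being questioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RCU-tasks fields in task_struct. */
struct task_model {
	bool rcu_tasks_holdout;            /* still owes a quiescent state */
	struct task_model *next_holdout;   /* singly linked holdout list */
};

/* Counterpart of rcu_note_voluntary_context_switch(): clear the flag,
 * storing only if it is currently set. */
static void note_voluntary_context_switch(struct task_model *t)
{
	if (t->rcu_tasks_holdout)
		t->rcu_tasks_holdout = false;
}

/* Start of a grace period: every task becomes a holdout. */
static struct task_model *start_gp(struct task_model *tasks, size_t n)
{
	struct task_model *head = NULL;

	for (size_t i = 0; i < n; i++) {
		tasks[i].rcu_tasks_holdout = true;
		tasks[i].next_holdout = head;
		head = &tasks[i];
	}
	return head;
}

/* One scan pass: unlink tasks that have cleared their flag.
 * The grace period has ended once this returns NULL. */
static struct task_model *scan_holdouts(struct task_model *head)
{
	struct task_model **pp = &head;

	while (*pp) {
		if (!(*pp)->rcu_tasks_holdout)
			*pp = (*pp)->next_holdout;   /* passed a quiescent state */
		else
			pp = &(*pp)->next_holdout;
	}
	return head;
}
```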

Re: [PATCH 2/5] raid: Require designated initialization of structures

2014-07-31 Thread Josh Triplett
On Fri, Aug 01, 2014 at 11:10:55AM +1000, NeilBrown wrote:
> On Thu, 31 Jul 2014 16:47:35 -0700 Josh Triplett 
> wrote:
> 
> > Mark raid6_calls and other structures containing function pointers with
> > __designated_init.  Fix implementations in lib/raid6/ to use designated
> > initializers; this also simplifies those initializers using the default
> > initialization of fields to 0.
> > 
> > Signed-off-by: Josh Triplett 
> 
> Looks like an excellent idea!
> Feel free to forward this upstream on my behalf, or remind me once the first
> patch is in -next, and I'll take this one myself - whichever you prefer.
> 
>  Acked-by: NeilBrown 

Thanks!  Ideally, I'd like to see the whole series go in through one
tree, which is why I CCed Andrew.  I can easily produce several dozen
more patches like these, but I just included enough examples to motivate
patch 1, and I can send more in any order once that one goes in.
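A small illustration of what the conversion buys, using a simplified stand-in for struct raid6_calls with the four members the positional initializers in lib/raid6/ rely on. The struct and function names are hypothetical, and __designated_init itself (defined in patch 1, not shown here) is not used; the sketch only shows that the designated form does not depend on member order and that omitted members, like prefer, are zero-initialized:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct raid6_calls. */
struct calls_model {
	void (*gen_syndrome)(int, size_t, void **);
	int  (*valid)(void);
	const char *name;
	int prefer;
};

static void gen_stub(int disks, size_t bytes, void **ptrs)
{
	(void)disks;   /* placeholder syndrome generator */
	(void)bytes;
	(void)ptrs;
}

static int always_valid(void)
{
	return 1;
}

/* Positional form: correctness depends on the member order. */
static const struct calls_model positional = {
	gen_stub,
	always_valid,
	"intx1",
	0
};

/* Designated form: members are named; 'prefer' is omitted and
 * therefore zero-initialized. */
static const struct calls_model designated = {
	.gen_syndrome = gen_stub,
	.valid = always_valid,
	.name = "intx1",
};
```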

- Josh Triplett


Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool

2014-07-31 Thread Tejun Heo
Hello, Andrew.

On Thu, Jul 31, 2014 at 06:16:56PM -0700, Andrew Morton wrote:
> Yet nowhere in either the changelog or the code comments is it even
> mentioned that this allocator is unreliable and that callers *must*
> implement (and test!) fallback paths.

Hmmm, yeah, somehow the atomic behavior seemed obvious to me.  I'll
try to make it clear that this thing can and does fail.

> > an obvious solution is adding a failure
> > injection for debugging, but really except for being a bit ghetto,
> > this is just the atomic allocation for percpu areas.
> 
> If it was a try-GFP_ATOMIC-then-fall-back-to-pool thing then it would
> work fairly well.  But it's not even that - a caller could trivially
> chew through that pool in a single timeslice.  Especially on !SMP. 
> Especially squared with !PREEMPT or SCHED_FIFO.

Yeap, occasional pool depletion would be a normal thing to happen,
which isn't a correctness issue and most likely not even a performance
issue.

> But please make very sure that this is how we position it.  I don't
> know how to do this.  Maybe prefix the names with "blk_" to signify
> that it is block-private (and won't even be there if !CONFIG_BLOCK).
> 
> Or rename percpu_pool_alloc() to percpu_pool_try_alloc() - that should
> wake people up.

Sounds good to me.  I'll rename it to percpu_pool_try_alloc() and make
it clear in the comment that the allocation is opportunistic.
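A minimal sketch of the calling convention being agreed on here: an opportunistic allocator that can fail at any time, a failure-injection knob of the kind mentioned above, and a caller that always has a working fallback. Everything in it (pool_model, pool_try_alloc(), account_io()) is hypothetical and not the proposed percpu API; each successful call consumes a slot and never returns it, which shows how quickly a small pool can drain within a single burst:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical, deliberately tiny */

struct pool_model {
	long storage[POOL_SIZE];
	int nr_free;
	bool inject_failure;   /* debug knob: make every try_alloc fail */
};

/* May fail at any time; callers must handle NULL. */
static long *pool_try_alloc(struct pool_model *p)
{
	if (p->inject_failure || p->nr_free == 0)
		return NULL;
	return &p->storage[--p->nr_free];
}

/* Caller pattern: take the fast path when the pool has something,
 * otherwise fall back to a path that does not need the pool.
 * Returns true if the fast path was used. */
static bool account_io(struct pool_model *p, long *shared_counter)
{
	long *slot = pool_try_alloc(p);

	if (!slot) {
		(*shared_counter)++;   /* fallback: slower shared counter */
		return false;
	}
	(*slot)++;
	return true;
}
```

Turning on inject_failure with the pool still full forces every caller onto its fallback path, which is the kind of testing Andrew asks callers to do.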

Thanks.

-- 
tejun


Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool

2014-07-31 Thread Andrew Morton
On Thu, 31 Jul 2014 20:44:38 -0400 Tejun Heo  wrote:

> Hello, Andrew.
> 
> On Thu, Jul 31, 2014 at 6:03 PM, Andrew Morton
>  wrote:
> > I don't think we should add facilities such as this.  Because if we do,
> > people will use them and thereby make the kernel less reliable, for
> > obvious reasons.
> >
> > It would be better to leave the nasty hack localized within
> > blk-throttle.c and hope that someone finds a way of fixing it.
> 
> The thing is we need similar facilities in the IO path in other places
> too. They share exactly the same characteristics - opportunistic
> percpu allocations during IO which are expected to fail from time to
> time and they will all implement fallback behavior on allocation
> failures. I'm not sure how this makes the kernel less reliable. This
> conceptually isn't different from atomic allocations which we also use
> in a similar way.

Atomic allocations are more robust than this thing.  But yes, they also
are unreliable and their use should be discouraged for the same
reasons.

> If you're worried that people might use this
> assuming that it won't fail,

That's precisely my concern.

Yet nowhere in either the changelog or the code comments is it even
mentioned that this allocator is unreliable and that callers *must*
implement (and test!) fallback paths.

> an obvious solution is adding a failure
> injection for debugging, but really except for being a bit ghetto,
> this is just the atomic allocation for percpu areas.

If it was a try-GFP_ATOMIC-then-fall-back-to-pool thing then it would
work fairly well.  But it's not even that - a caller could trivially
chew through that pool in a single timeslice.  Especially on !SMP. 
Especially squared with !PREEMPT or SCHED_FIFO.

And that's all OK, as long as this is positioned as "opportunistic
performance optimisation which is expected to be available most of the
time in non-stressful use cases".

But please make very sure that this is how we position it.  I don't
know how to do this.  Maybe prefix the names with "blk_" to signify
that it is block-private (and won't even be there if !CONFIG_BLOCK).

Or rename percpu_pool_alloc() to percpu_pool_try_alloc() - that should
wake people up.


Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Lai Jiangshan
On 08/01/2014 05:55 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> This commit adds a new RCU-tasks flavor of RCU, which provides
> call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> context switch (not preemption!), userspace execution, and the idle loop.
> Note that unlike other RCU flavors, these quiescent states occur in tasks,
> not necessarily CPUs.  Includes fixes from Steven Rostedt.
> 
> This RCU flavor is assumed to have very infrequent latency-tolerant
> updaters.  This assumption permits significant simplifications, including
> a single global callback list protected by a single global lock, along
> with a single linked list containing all tasks that have not yet passed
> through a quiescent state.  If experience shows this assumption to be
> incorrect, the required additional complexity will be added.
> 
> Suggested-by: Steven Rostedt 
> Signed-off-by: Paul E. McKenney 
> ---
>  include/linux/init_task.h |   9 +++
>  include/linux/rcupdate.h  |  36 ++
>  include/linux/sched.h |  23 ---
>  init/Kconfig  |  10 +++
>  kernel/rcu/tiny.c |   2 +
>  kernel/rcu/tree.c |   2 +
>  kernel/rcu/update.c   | 171 ++
>  7 files changed, 242 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6df7f9fe0d01..78715ea7c30c 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,14 @@ extern struct group_info init_groups;
>  #else
>  #define INIT_TASK_RCU_PREEMPT(tsk)
>  #endif
> +#ifdef CONFIG_TASKS_RCU
> +#define INIT_TASK_RCU_TASKS(tsk) \
> + .rcu_tasks_holdout = false, \
> + .rcu_tasks_holdout_list =   \
> + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +#else
> +#define INIT_TASK_RCU_TASKS(tsk)
> +#endif
>  
>  extern struct cred init_cred;
>  
> @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
>   INIT_FTRACE_GRAPH   \
>   INIT_TRACE_RECURSION\
>   INIT_TASK_RCU_PREEMPT(tsk)  \
> + INIT_TASK_RCU_TASKS(tsk)\
>   INIT_CPUSET_SEQ(tsk)\
>   INIT_RT_MUTEXES(tsk)\
>   INIT_VTIME(tsk) \
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6a94cc8b1ca0..829efc99df3e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
>  
>  void synchronize_sched(void);
>  
> +/**
> + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual callback function to be invoked after the grace period
> + *
> + * The callback function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. call_rcu_tasks() assumes
> + * that the read-side critical sections end at a voluntary context
> + * switch (not a preemption!), entry into idle, or transition to usermode
> + * execution.  As such, there are no read-side primitives analogous to
> + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> + * to determine that all tasks have passed through a safe state, not so
> + * much for data-structure synchronization.
> + *
> + * See the description of call_rcu() for more detailed information on
> + * memory ordering guarantees.
> + */
> +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> +
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  void __rcu_read_lock(void);
> @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
>   rcu_irq_exit(); \
>   } while (0)
>  
> +/*
> + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> + * macro rather than an inline function to avoid #include hell.
> + */
> +#ifdef CONFIG_TASKS_RCU
> +#define rcu_note_voluntary_context_switch(t) \
> + do { \
> + preempt_disable(); /* Exclude synchronize_sched(); */ \
> + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> + preempt_enable(); \
> + } while (0)
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +#define rcu_note_voluntary_context_switch(t) do { } while (0)
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
>  bool __rcu_is_watching(void);
>  #endif /* #if 

Re: [PATCH v2] ACPI / LPSS: add lpss device for Wildcat Point PCH

2014-07-31 Thread Rafael J. Wysocki
On Friday, August 01, 2014 09:06:35 AM Jie Yang wrote:
> INT3438 is the ADSP device on the Wildcat Point platform,
> with 2 DW DMA engines built in. The DMA engines are
> used for DSP FW loading and audio data transfer.
> Probing these DMA engines requires the clock; without it,
> probing may fail and cannot go forward.
> 
> Add lpss device "INT3438" for Wildcat Point PCH, to
> provide the clock for its ADSP DMA engine probing.
> 
> Signed-off-by: Jie Yang 

Looks good, queued up for 3.17, thanks!

> ---
>  drivers/acpi/acpi_lpss.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
> index 9cb65b0..ce06149 100644
> --- a/drivers/acpi/acpi_lpss.c
> +++ b/drivers/acpi/acpi_lpss.c
> @@ -113,6 +113,14 @@ static void lpss_i2c_setup(struct lpss_private_data *pdata)
>   writel(val, pdata->mmio_base + offset);
>  }
>  
> +static struct lpss_device_desc wpt_dev_desc = {
> + .clk_required = true,
> + .prv_offset = 0x800,
> + .ltr_required = true,
> + .clk_divider = true,
> + .clk_gate = true,
> +};
> +
>  static struct lpss_device_desc lpt_dev_desc = {
>   .clk_required = true,
>   .prv_offset = 0x800,
> @@ -226,6 +234,8 @@ static const struct acpi_device_id acpi_lpss_device_ids[] = {
>   { "INT3436", LPSS_ADDR(lpt_sdio_dev_desc) },
>   { "INT3437", },
>  
> + { "INT3438", LPSS_ADDR(wpt_dev_desc) },
> +
>   { }
>  };
>  
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH 2/5] raid: Require designated initialization of structures

2014-07-31 Thread NeilBrown
On Thu, 31 Jul 2014 16:47:35 -0700 Josh Triplett 
wrote:

> Mark raid6_calls and other structures containing function pointers with
> __designated_init.  Fix implementations in lib/raid6/ to use designated
> initializers; this also simplifies those initializers using the default
> initialization of fields to 0.
> 
> Signed-off-by: Josh Triplett 

Looks like an excellent idea!
Feel free to forward this upstream on my behalf, or remind me once the first
patch is in -next, and I'll take this one myself - whichever you prefer.

 Acked-by: NeilBrown 

Thanks,
NeilBrown

> ---
>  include/linux/raid/pq.h|  4 ++--
>  include/linux/raid/xor.h   |  2 +-
>  include/linux/raid_class.h |  2 +-
>  lib/raid6/altivec.uc   |  7 +++
>  lib/raid6/avx2.c   | 24 
>  lib/raid6/int.uc   |  6 ++
>  lib/raid6/mmx.c| 14 ++
>  lib/raid6/neon.c   |  7 +++
>  lib/raid6/sse1.c   | 16 
>  lib/raid6/sse2.c   | 24 
>  lib/raid6/tilegx.uc|  6 ++
>  11 files changed, 52 insertions(+), 60 deletions(-)
> 
> diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
> index 73069cb..2147bff 100644
> --- a/include/linux/raid/pq.h
> +++ b/include/linux/raid/pq.h
> @@ -75,7 +75,7 @@ struct raid6_calls {
>   int  (*valid)(void);/* Returns 1 if this routine set is usable */
>   const char *name;   /* Name of this routine set */
>   int prefer; /* Has special performance attribute */
> -};
> +} __designated_init;
>  
>  /* Selected algorithm */
>  extern struct raid6_calls raid6_call;
> @@ -109,7 +109,7 @@ struct raid6_recov_calls {
>   int  (*valid)(void);
>   const char *name;
>   int priority;
> -};
> +} __designated_init;
>  
>  extern const struct raid6_recov_calls raid6_recov_intx1;
>  extern const struct raid6_recov_calls raid6_recov_ssse3;
> diff --git a/include/linux/raid/xor.h b/include/linux/raid/xor.h
> index 5a21095..c7df59f 100644
> --- a/include/linux/raid/xor.h
> +++ b/include/linux/raid/xor.h
> @@ -17,6 +17,6 @@ struct xor_block_template {
>unsigned long *, unsigned long *);
>   void (*do_5)(unsigned long, unsigned long *, unsigned long *,
>unsigned long *, unsigned long *, unsigned long *);
> -};
> +} __designated_init;
>  
>  #endif
> diff --git a/include/linux/raid_class.h b/include/linux/raid_class.h
> index 31e1ff6..603af94 100644
> --- a/include/linux/raid_class.h
> +++ b/include/linux/raid_class.h
> @@ -16,7 +16,7 @@ struct raid_function_template {
>   int (*is_raid)(struct device *);
>   void (*get_resync)(struct device *);
>   void (*get_state)(struct device *);
> -};
> +} __designated_init;
>  
>  enum raid_state {
>   RAID_STATE_UNKNOWN = 0,
> diff --git a/lib/raid6/altivec.uc b/lib/raid6/altivec.uc
> index 7cc12b5..4ff138c 100644
> --- a/lib/raid6/altivec.uc
> +++ b/lib/raid6/altivec.uc
> @@ -118,10 +118,9 @@ int raid6_have_altivec(void)
>  #endif
>  
>  const struct raid6_calls raid6_altivec$# = {
> - raid6_altivec$#_gen_syndrome,
> - raid6_have_altivec,
> - "altivecx$#",
> - 0
> + .gen_syndrome = raid6_altivec$#_gen_syndrome,
> + .valid = raid6_have_altivec,
> + .name = "altivecx$#",
>  };
>  
>  #endif /* CONFIG_ALTIVEC */
> diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c
> index bc3b1dd..e56fa06 100644
> --- a/lib/raid6/avx2.c
> +++ b/lib/raid6/avx2.c
> @@ -88,10 +88,10 @@ static void raid6_avx21_gen_syndrome(int disks, size_t bytes, void **ptrs)
>  }
>  
>  const struct raid6_calls raid6_avx2x1 = {
> - raid6_avx21_gen_syndrome,
> - raid6_have_avx2,
> - "avx2x1",
> - 1   /* Has cache hints */
> + .gen_syndrome = raid6_avx21_gen_syndrome,
> + .valid = raid6_have_avx2,
> + .name = "avx2x1",
> + .prefer = 1,/* Has cache hints */
>  };
>  
>  /*
> @@ -149,10 +149,10 @@ static void raid6_avx22_gen_syndrome(int disks, size_t bytes, void **ptrs)
>  }
>  
>  const struct raid6_calls raid6_avx2x2 = {
> - raid6_avx22_gen_syndrome,
> - raid6_have_avx2,
> - "avx2x2",
> - 1   /* Has cache hints */
> + .gen_syndrome = raid6_avx22_gen_syndrome,
> + .valid = raid6_have_avx2,
> + .name = "avx2x2",
> + .prefer = 1,/* Has cache hints */
>  };
>  
>  #ifdef CONFIG_X86_64
> @@ -241,10 +241,10 @@ static void raid6_avx24_gen_syndrome(int disks, size_t bytes, void **ptrs)
>  }
>  
>  const struct raid6_calls raid6_avx2x4 = {
> - raid6_avx24_gen_syndrome,
> - raid6_have_avx2,
> - "avx2x4",
> - 1   /* Has cache hints */
> + .gen_syndrome = raid6_avx24_gen_syndrome,
> + .valid = raid6_have_avx2,
> + .name = "avx2x4",
> + .prefer = 1,/* Has cache hints */
>  };
>  #endif
>  
> diff --git a/lib/raid6/int.uc b/lib/raid6/int.uc
> index 5b50f8d..35ad01a 

Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area

2014-07-31 Thread Gioh Kim



On 2014-08-01 9:07 AM, Gioh Kim wrote:



On 2014-07-31 9:21 PM, Jan Kara wrote:

On Thu 31-07-14 09:37:15, Gioh Kim wrote:



On 2014-07-31 9:03 AM, Jan Kara wrote:

On Thu 31-07-14 08:54:40, Gioh Kim wrote:

On 2014-07-30 7:11 PM, Jan Kara wrote:

On Wed 30-07-14 16:44:24, Gioh Kim wrote:

On 2014-07-22 6:38 PM, Jan Kara wrote:

On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:

On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:

Hello,

This patch tries to solve a problem where a long-lasting page cache page holding
the ext4 superblock disturbs page migration.

I've been testing CMA feature on my ARM-based platform
and found some pages for page caches cannot be migrated.
Some of them are page caches of superblock of ext4 filesystem.

Current ext4 reads the superblock with sb_bread(). sb_bread() allocates the page
from the movable area. But the problem is that ext4 holds the page until
it is unmounted. If the root filesystem is ext4, the page can never be migrated.

I introduce a new API for allocating page from non-movable area.
It is useful for ext4 and others that want to hold page cache for a long time.


There's no word on why you can't teach ext4 to still migrate that page.
For all I know it might be impossible, but at least mention why.


I am very sorry for the lack of details.

In ext4_fill_super() the buffer-head of superblock is stored in sbi->s_sbh.
The page that the buffer-head belongs to is allocated from the movable area.
To migrate the page the buffer-head should be released via brelse().
But brelse() is not called until unmount.

   Hum, I don't see where in the code we check the buffer_head use count. Can
you please point me? Thanks.


Filesystem code does not check the buffer_head use count.  sb_bread() returns
the buffer_head that is included in bh_lru and has a non-zero use count.
You can see the bh_lru code in buffer.c: __find_get_block() and
lookup_bh_lru().  bh_lru_install() inserts the buffer_head into the
bh_lru.  It first calls get_bh() to increase the use count and then inserts
the bh into the lru array.

The buffer_head use count is non-zero until brelse() is called.

   So I probably didn't phrase the question precisely enough. What I was
asking about is where exactly *migration* code checks buffer use count?
Because as I'm looking at buffer_migrate_page() we lock the buffers on a
migrated page but we don't look at buffer use counts... So it seems to me
that migration of a page with buffers should succeed even if buffer head
has an elevated use count. Now I think that it *should* check the buffer
use counts (it is dangerous to migrate buffers someone holds reference to)
but I just cannot find that place. Or does CMA use some other migration
function for buffer pages than buffer_migrate_page()?


The CMA allocation function is cma_alloc().
The function flow is alloc_contig_range() -> __alloc_contig_migrate_range() ->
migrate_pages() -> unmap_and_move() -> __unmap_and_move() ->
try_to_free_buffers() -> drop_buffers() -> buffer_busy().

buffer_busy() checks b_count.
If a buffer is busy, the buffer-cache cannot be removed.
So the page that includes the buffer_head and the page that is referred to by
the buffer_head are not movable.

Is this what you need?
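A simplified model of why a long-held reference pins the page, following the check described above: buffer_busy() treats a buffer as busy if its use count is non-zero (in mainline it also counts dirty or locked buffers as busy), and drop_buffers() gives up if any buffer on the page is busy. The struct and helpers are hypothetical stand-ins, with the page's circular buffer list reduced to an array:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer_head: use count plus two state bits. */
struct bh_model {
	int b_count;
	bool dirty;
	bool locked;
};

static void get_bh_model(struct bh_model *bh)
{
	bh->b_count++;
}

static void brelse_model(struct bh_model *bh)
{
	if (bh->b_count > 0)
		bh->b_count--;
}

/* Same idea as buffer_busy(): referenced, dirty or locked. */
static bool buffer_busy_model(const struct bh_model *bh)
{
	return bh->b_count != 0 || bh->dirty || bh->locked;
}

/* Like drop_buffers(): succeed only if no buffer on the page is busy. */
static bool can_drop_buffers(const struct bh_model *bhs, int n)
{
	for (int i = 0; i < n; i++)
		if (buffer_busy_model(&bhs[i]))
			return false;
	return true;
}
```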

   Yes, this is what I was asking about. Thanks! But as I'm looking into
__unmap_and_move() it calls try_to_free_buffers() only if page->mapping ==
NULL. As the comment before that test states, this can happen only for swap
cache (not our case) or for pagecache pages that were truncated and not yet
fully cleaned up. But superblock page cannot really be truncated. So I
somewhat doubt you can hit the above path for a page holding superblock...


I printed the address of the busy buffer_head in drop_buffers(), which is called by
try_to_free_buffers().
And I printed the address of sb buffer_head.
They were the same.

I'm going to check page->mapping.


I'm very sorry. It's my fault.

The function path is as follows:

[   97.868304] [<8011a750>] (drop_buffers+0xfc/0x168) from [<8011bc64>] (try_to_free_buffers+0x50/0xbc)
[   97.877457] [<8011bc64>] (try_to_free_buffers+0x50/0xbc) from [<80121e40>] (blkdev_releasepage+0x38/0x48)
[   97.887093] [<80121e40>] (blkdev_releasepage+0x38/0x48) from [<800add8c>] (try_to_release_page+0x40/0x5c)
[   97.896728] [<800add8c>] (try_to_release_page+0x40/0x5c) from [<800bd9bc>] (shrink_page_list+0x508/0x8a4)
[   97.906334] [<800bd9bc>] (shrink_page_list+0x508/0x8a4) from [<800bde5c>] (reclaim_clean_pages_from_list+0x104/0x148)
[   97.917017] [<800bde5c>] (reclaim_clean_pages_from_list+0x104/0x148) from [<800b5dec>] (alloc_contig_range+0x114/0x2dc)
[   97.927856] [<800b5dec>] (alloc_contig_range+0x114/0x2dc) from [<802f6c04>] (dma_alloc_from_contiguous+0x8c/0x14c)
[   97.938264] [<802f6c04>] (dma_alloc_from_contiguous+0x8c/0x14c) from [<80017b6c>] (__alloc_from_contiguous+0x34/0xc0)
[   97.948926] [<80017b6c>] (__alloc_from_contiguous+0x34/0xc0) from [<80017d40>] (__dma_alloc+0xc4/0x2a0)
[   97.958362] [<80017d40>] (__dma_alloc+0xc4/0x2a0) from [<8001803c>] (arm_dma_alloc+0x80/0x98)
[   

[PATCH v2] ACPI / LPSS: add lpss device for Wildcat Point PCH

2014-07-31 Thread Jie Yang
INT3438 is the ADSP device on the Wildcat Point platform,
with 2 DW DMA engines built in. The DMA engines are
used for DSP FW loading and audio data transfer.
Probing these DMA engines requires the clock; without it,
probing may fail and cannot go forward.

Add lpss device "INT3438" for Wildcat Point PCH, to
provide the clock for its ADSP DMA engine probing.

Signed-off-by: Jie Yang 
---
 drivers/acpi/acpi_lpss.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index 9cb65b0..ce06149 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -113,6 +113,14 @@ static void lpss_i2c_setup(struct lpss_private_data *pdata)
writel(val, pdata->mmio_base + offset);
 }
 
+static struct lpss_device_desc wpt_dev_desc = {
+   .clk_required = true,
+   .prv_offset = 0x800,
+   .ltr_required = true,
+   .clk_divider = true,
+   .clk_gate = true,
+};
+
 static struct lpss_device_desc lpt_dev_desc = {
.clk_required = true,
.prv_offset = 0x800,
@@ -226,6 +234,8 @@ static const struct acpi_device_id acpi_lpss_device_ids[] = {
{ "INT3436", LPSS_ADDR(lpt_sdio_dev_desc) },
{ "INT3437", },
 
+   { "INT3438", LPSS_ADDR(wpt_dev_desc) },
+
{ }
 };
 
-- 
1.8.3.2
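A reduced userspace model may help show how the new table entry is consumed. In the patch, the "INT3438" entry stores the address of wpt_dev_desc through LPSS_ADDR(), while an entry such as "INT3437" carries no descriptor. The sketch below mimics that with hypothetical names (lpss_desc_sketch, acpi_id_sketch, DESC_ADDR, lookup_desc); it is not the kernel's ACPI matching code and keeps only a few descriptor fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced model of struct lpss_device_desc. */
struct lpss_desc_sketch {
	bool clk_required;
	unsigned int prv_offset;
	bool ltr_required;
};

/* Hypothetical model of an ACPI ID entry: ID string plus opaque driver data. */
struct acpi_id_sketch {
	const char *id;
	uintptr_t driver_data;
};

static const struct lpss_desc_sketch wpt_desc = {
	.clk_required = true,
	.prv_offset = 0x800,
	.ltr_required = true,
};

/* In the spirit of LPSS_ADDR(): store the descriptor's address as an integer. */
#define DESC_ADDR(desc) ((uintptr_t)&(desc))

static const struct acpi_id_sketch ids[] = {
	{ "INT3437", 0 },                   /* matched, but no descriptor */
	{ "INT3438", DESC_ADDR(wpt_desc) }, /* Wildcat Point ADSP */
	{ NULL, 0 },                        /* terminator */
};

/* Return the descriptor for hid, or NULL if the ID is unknown or the
 * matching entry carries no descriptor. */
static const struct lpss_desc_sketch *lookup_desc(const char *hid)
{
	assert(hid != NULL);
	for (const struct acpi_id_sketch *e = ids; e->id != NULL; e++)
		if (strcmp(e->id, hid) == 0)
			return (const struct lpss_desc_sketch *)e->driver_data;
	return NULL;
}
```

For example, lookup_desc("INT3438") yields a descriptor with clk_required set, while "INT3437" and unknown IDs yield NULL.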

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts

2014-07-31 Thread Rafael J. Wysocki
On Friday, August 01, 2014 02:08:12 AM Rafael J. Wysocki wrote:
> On Friday, August 01, 2014 12:16:23 AM Thomas Gleixner wrote:
> > On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
> > > On Thursday, July 31, 2014 12:44:24 PM Thomas Gleixner wrote:

[cut]

> Except for a couple of points where I'm not sure I understand you correctly
> (commented above), all of that sounds good to me.
> 
> I'm not sure about the ordering, though.  It would be good to have a working
> replacement for the IRQF_NO_SUSPEND things that we'll be removing in 1), for
> example.  So since we need to do 3) IRQF_SHARED for both IRQF_NO_SUSPEND and
> wakeup, as you said, would it be practical to start with that one?

I forgot about one case which, in my opinion, would be good to take into account
from the outset.  That is the case of runtime-suspended devices that we don't
want to touch during system suspend/resume.  If those are system wakeup
devices, their drivers should be able to configure them for system wakeup
at runtime suspend time rather than during system suspend.  Also, if their
interrupts are going to be used as system wakeup interrupts, the interface
that we're going to provide needs to handle that case.  That is, it should
be possible to use that interface at runtime suspend time, and it should
take care of everything from then on.

Triggering a lazy disable right at runtime suspend time may not work,
because in that case the interrupt is also used for the device's runtime
remote wakeup, so it has to stay functional until system suspend starts and
the core decides to leave the device in its current state.  This means that
the core will need to trigger the lazy disable for it at some point during
system suspend.

Rafael



RE: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation

2014-07-31 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-07-31:
> Il 31/07/2014 10:03, Wanpeng Li ha scritto:
>>> One thing:
>>> 
>>>> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>>> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>> 
>>> I think you can just do this write unconditionally, since most
>>> hypervisors will enable this.  Also, you probably can add the tpr
>> 
>> What will happen if a hypervisor doesn't enable it? I will make it
>> cleaner in version two.
> 
> TPR_THRESHOLD will be likely written as zero, but the processor will
> never use it anyway.  It's just a small optimization because
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Theoretically, you are right. But we should not expect all VMMs to follow it. It
is not worth violating the SDM just to save the cost of two or three instructions.

> 
> Paolo
> 
>>> threshold field to the read-write fields for shadow VMCS.
>> 
>> Agreed.
>> 
>> Regards,
>> Wanpeng Li


Best regards,
Yang
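The disagreement above is about whether the TPR_THRESHOLD copy should stay behind the TPR-shadow check. As a rough illustration of the conditional form from the patch, here is a userspace model; struct vmcs12_sketch, hw_tpr_threshold and prepare_tpr_threshold() are hypothetical stand-ins for KVM's vmcs12, the hardware VMCS field and vmcs_write32(). Only the value of CPU_BASED_TPR_SHADOW (bit 21 of the primary processor-based VM-execution controls) is taken from the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "Use TPR shadow": bit 21 of the primary processor-based controls. */
#define CPU_BASED_TPR_SHADOW 0x00200000u

/* Hypothetical, heavily reduced stand-in for KVM's struct vmcs12. */
struct vmcs12_sketch {
	uint32_t cpu_based_vm_exec_control;
	uint32_t tpr_threshold;
};

/* Hypothetical stand-in for the hardware TPR_THRESHOLD field. */
static uint32_t hw_tpr_threshold;
static bool hw_tpr_threshold_written;

/* Copy L1's threshold only when L1 enabled the TPR shadow, i.e. the
 * conditional write, not the unconditional one suggested in review. */
static void prepare_tpr_threshold(const struct vmcs12_sketch *v12)
{
	assert(v12 != NULL);
	if (v12->cpu_based_vm_exec_control & CPU_BASED_TPR_SHADOW) {
		hw_tpr_threshold = v12->tpr_threshold;
		hw_tpr_threshold_written = true;
	}
}
```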




Re: [PATCH] cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]

2014-07-31 Thread Saravana Kannan

On 07/31/2014 03:58 PM, Prarit Bhargava wrote:
> On 07/31/2014 06:13 PM, Saravana Kannan wrote:
>> On 07/31/2014 02:08 PM, Prarit Bhargava wrote:
>>> On 07/31/2014 04:38 PM, Saravana Kannan wrote:
>>>> On 07/31/2014 01:30 PM, Prarit Bhargava wrote:
>>>>> On 07/31/2014 04:24 PM, Saravana Kannan wrote:
>>>>>> Prarit,
>>>>>>
>>>>>> I'm not an expert on sysfs locking, but I would think the specific sysfs lock
>>>>>> would depend on the file/attribute group. So, can you please try to hotplug a
>>>>>> core in/out (to trigger the POLICY_EXIT) and then read a sysfs file exported by
>>>>>> the governor? scaling_governor doesn't cut it since that file is not removed on
>>>>>> a policy-exit event to the governor. If it's ondemand, try reading/writing its
>>>>>> sampling rate file.
>>>>>
>>>>> Thanks Saravana -- will do.  I will get back to you shortly on this.
>>>>
>>>> Thanks. Btw, in case you weren't already aware of it, you'll have to hotplug out
>>>> all the CPUs in a cluster to trigger a POLICY_EXIT for that cluster/policy.
>>>
>>> Yep -- the affected_cpus file should show all the cpus in the policy IIRC.  One
>>> of the systems I have has 1 cpu/policy and has 48 threads so the POLICY_EXIT is
>>> called.
>>>
>>> I'll put something like
>>>
>>> while true;
>>> do
>>> echo ondemand > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>>> cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>> echo 2 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>> cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
>>> echo 0 > /sys/devices/system/cpu/cpu1/online
>>> sleep 1
>>> echo 1 > /sys/devices/system/cpu/cpu1/online
>>> sleep 1
>>> done
>>
>> The actual race can only happen with 2 threads. I'm just trying to trigger a
>> lockdep warning here.
>
> I ran the above in two separate terminals with cpuset -c 0 and cpuset -c 1 to
> multi-thread it all.  No deadlock or LOCKDEP trace after about 1/2 hour, so I
> think we're in the clear on that concern.

I wasn't convinced. So, I took some help from Stephen to test it.

It's been a while, so I didn't remember the original issue clearly when 
I gave you some test suggestions. Now that I looked at the code more 
closely, I have a proper way to reproduce the original issue.


Nack for this patch for 2 reasons:
1. You seem to have accidentally removed a GOV_STOP in your patch. We
definitely can't do that. This broke changing governors, and that's why
your patch didn't cause any issues: all your governor echos were
failing.
2. When we fixed that and ran a proper test (not the one I
gave you), we reproduced the original issue.


To reproduce original issue:
Preconditions:
* lockdep is enabled
* governor per policy is enabled

Steps:
1. Set governor to ondemand.
2. Cat one of the ondemand sysfs files.
3. Change governor to conservative.

When you do that, there's an AB-BA deadlock between one thread
trying to cat a governor sysfs file and another thread trying to change
governors.


-Saravana
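The AB-BA pattern described above is what lockdep flags even when a given run does not hang. A much-simplified, single-threaded sketch of that idea: it records, for a hypothetical pair of locks, which one was taken while the other was held, and reports an inversion when the opposite order appears. LOCK_A and LOCK_B are placeholders (the report does not name the two kernel locks), and real lockdep tracks full dependency chains across many lock classes.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder lock identities for the two paths in the report. */
enum lock_id { LOCK_A, LOCK_B, NR_LOCKS };

/* seen[x][y] becomes true once some path takes y while holding x. */
static bool seen[NR_LOCKS][NR_LOCKS];

/* Record "acquired next while holding held". Returns true if the reverse
 * order was recorded earlier: an AB-BA inversion that can deadlock when the
 * two paths run concurrently, even if this particular run did not hang. */
static bool record_nested(enum lock_id held, enum lock_id next)
{
	assert(held != next);
	bool inversion = seen[next][held];
	seen[held][next] = true;
	return inversion;
}
```

Calling record_nested(LOCK_A, LOCK_B) for one path (for example reading a governor file) and then record_nested(LOCK_B, LOCK_A) for the other (for example switching governors) reports the inversion on the second call.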

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 01/10] rcu: Add call_rcu_tasks()

2014-07-31 Thread Lai Jiangshan
On 08/01/2014 12:09 AM, Paul E. McKenney wrote:

> 
>>> +   /*
>>> +* There were callbacks, so we need to wait for an
>>> +* RCU-tasks grace period.  Start off by scanning
>>> +* the task list for tasks that are not already
>>> +* voluntarily blocked.  Mark these tasks and make
>>> +* a list of them in rcu_tasks_holdouts.
>>> +*/
>>> +   rcu_read_lock();
>>> +   for_each_process_thread(g, t) {
>>> +   if (t != current && ACCESS_ONCE(t->on_rq) &&
>>> +   !is_idle_task(t)) {
>>
>> What happens when the trampoline is on the idle task?
>>
>> I think we need to use schedule_on_each_cpu() to replace one of
>> the synchronize_sched() in this function. (or other stuff which can
>> cause real schedule for *ALL* online CPUs).
> 
> Well, that is one of the questions in the 0/10 cover letter.  If it turns
> out to be necessary to worry about idle-task trampolines, it should be
> possible to avoid hammering all idle CPUs in the common case.  Though maybe
> battery-powered devices won't need RCU-tasks.
> 

Trampolines on a NO_HZ idle CPU can run for an arbitrarily long time (for
example, when an SMI happens inside the trampoline).  So only a real schedule
on the idle CPU is reliable, in my view.
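The holdout scan in the quoted patch can be modeled as a simple filter, which makes Lai's concern concrete: tasks flagged as idle are never marked, so nothing waits for them. This is a hypothetical userspace sketch; struct task_sketch and mark_holdouts() stand in for task_struct and the for_each_process_thread() loop, and the locking and grace-period machinery are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced task record. */
struct task_sketch {
	int pid;
	bool on_rq;   /* on a runqueue, i.e. not voluntarily blocked */
	bool is_idle; /* one of the per-CPU idle tasks */
	bool holdout; /* still needs to pass a quiescent state */
};

/* Mark holdouts with the same condition as the quoted loop:
 * t != current && t->on_rq && !is_idle_task(t).
 * Returns the number of tasks marked. */
static size_t mark_holdouts(struct task_sketch *tasks, size_t n,
			    const struct task_sketch *cur)
{
	size_t count = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct task_sketch *t = &tasks[i];

		/* Idle tasks are never marked, so a trampoline running in
		 * one is not waited for: the gap raised in this thread. */
		t->holdout = t != cur && t->on_rq && !t->is_idle;
		if (t->holdout)
			count++;
	}
	return count;
}
```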


Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool

2014-07-31 Thread Tejun Heo
Hello, Andrew.

On Thu, Jul 31, 2014 at 6:03 PM, Andrew Morton
 wrote:
> I don't think we should add facilities such as this.  Because if we do,
> people will use them and thereby make the kernel less reliable, for
> obvious reasons.
>
> It would be better to leave the nasty hack localized within
> blk-throttle.c and hope that someone finds a way of fixing it.

The thing is we need similar facilities in the IO path in other places
too. They share exactly the same characteristics - opportunistic
percpu allocations during IO which are expected to fail from time to
time and they will all implement fallback behavior on allocation
failures. I'm not sure how this makes the kernel less reliable. This
conceptually isn't different from atomic allocations which we also use
in a similar way. If you're worried that people might use this
assuming that it won't fail, an obvious solution is adding a failure
injection for debugging, but really except for being a bit ghetto,
this is just the atomic allocation for percpu areas.

Thanks.

-- 
tejun
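The usage pattern Tejun describes (opportunistic allocations expected to fail now and then, callers with a fallback, and optional failure injection for debugging) could look roughly like the sketch below. Everything here is hypothetical: pool_sketch, pool_get() and get_or_fallback() are not the percpu_pool API from the patch, and the pool refill path is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical pool of pre-allocated objects, refilled elsewhere. */
struct pool_sketch {
	void *slots[POOL_SIZE];
	size_t nr;               /* objects currently available */
	unsigned int fail_every; /* fault injection: fail every Nth get, 0 = off */
	unsigned int calls;
};

/* Never blocks; returning NULL is a normal outcome callers must handle. */
static void *pool_get(struct pool_sketch *p)
{
	assert(p != NULL && p->nr <= POOL_SIZE);
	p->calls++;
	if (p->fail_every && p->calls % p->fail_every == 0)
		return NULL; /* injected failure, to exercise fallback paths */
	if (p->nr == 0)
		return NULL; /* pool exhausted: expected from time to time */
	return p->slots[--p->nr];
}

/* Hypothetical caller: counts how often it took its fallback path
 * instead of failing the I/O outright. */
static unsigned long fallback_hits;

static void *get_or_fallback(struct pool_sketch *p)
{
	void *obj = pool_get(p);

	if (obj == NULL)
		fallback_hits++;
	return obj;
}
```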


Re: [PATCH v2 1/2] printk: Add function to return log buffer address and size

2014-07-31 Thread Joe Perches
On Thu, 2014-07-31 at 15:22 -0700, Andrew Morton wrote:

> Please include this in whatever tree carries "powerpc/powernv:
> Interface to register/unregister opal dump region".

At some point, I'd like to redo the patch series
that breaks up printk.c into more manageable blocks.

https://lkml.org/lkml/2012/10/17/41

Any suggestion for timing?



[REVIEW][PATCH 4/4] proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

2014-07-31 Thread Eric W. Biederman

In oddball cases where the thread has a different mount namespace than
the thread group leader, or, more likely, in cases where the thread
remains and the thread group leader has exited, this ensures that
/proc/mounts continues to work.

This should not cause any problems, but if it does, this patch can simply
be reverted.

Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/root.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/root.c b/fs/proc/root.c
index 48f1c03bc7ed..92c12c243ce3 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -173,7 +173,7 @@ void __init proc_root_init(void)
 
proc_self_init();
proc_thread_self_init();
-   proc_symlink("mounts", NULL, "self/mounts");
+   proc_symlink("mounts", NULL, "thread-self/mounts");
 
proc_net_init();
 
-- 
1.9.1



[REVIEW][PATCH 3/4] proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net

2014-07-31 Thread Eric W. Biederman

In oddball cases where the thread has a different network namespace
than the primary thread group leader, or, more likely, in cases where
the thread remains and the thread group leader has exited, this
ensures that /proc/net continues to work.

This should not cause any problems, but if it does, this patch can simply
be reverted.

Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/proc_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index a63af3e0a612..39481028ec08 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -226,7 +226,7 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
 
 int __init proc_net_init(void)
 {
-   proc_symlink("net", NULL, "self/net");
+   proc_symlink("net", NULL, "thread-self/net");
 
return register_pernet_subsys(&proc_net_ns_ops);
 }
-- 
1.9.1



[REVIEW][PATCH 1/4] proc: Have net show up under /proc//task/

2014-07-31 Thread Eric W. Biederman

Network namespaces are per task, so it makes sense for them to show up
in the task directory.

Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/base.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0c93bf..ed34e405c6b9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2895,6 +2895,9 @@ static const struct pid_entry tid_base_stuff[] = {
DIR("fd",S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
DIR("fdinfo",S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
DIR("ns",S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
+#ifdef CONFIG_NET
+   DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
+#endif
+#endif
REG("environ",   S_IRUSR, proc_environ_operations),
INF("auxv",  S_IRUSR, proc_pid_auxv),
ONE("status",S_IRUGO, proc_pid_status),
-- 
1.9.1



[REVIEW][PATCH 2/4] proc: Implement /proc/thread-self to point at the directory of the current thread

2014-07-31 Thread Eric W. Biederman

/proc/thread-self is derived from /proc/self.  /proc/thread-self
points to the directory in proc containing information about the
current thread.

This functionality has been missing for a long time, and is tricky to
implement in userspace as gettid() is not exported by glibc.  More
importantly, this allows fixing defects in /proc/mounts and /proc/net
where, in a threaded application today, they wind up being empty files
when only the initial pthread has exited, causing problems for other
threads.

Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/Makefile  |  1 +
 fs/proc/base.c| 15 +---
 fs/proc/inode.c   |  7 +++-
 fs/proc/internal.h|  6 +++
 fs/proc/root.c|  3 ++
 fs/proc/thread_self.c | 85 +++
 include/linux/pid_namespace.h |  1 +
 7 files changed, 112 insertions(+), 6 deletions(-)
 create mode 100644 fs/proc/thread_self.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 239493ec718e..7151ea428041 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -23,6 +23,7 @@ proc-y+= version.o
 proc-y += softirqs.o
 proc-y += namespaces.o
 proc-y += self.o
+proc-y += thread_self.o
 proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
 proc-$(CONFIG_NET) += proc_net.o
 proc-$(CONFIG_PROC_KCORE)  += kcore.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ed34e405c6b9..0131156ce7c9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2847,7 +2847,7 @@ retry:
return iter;
 }
 
-#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 1)
+#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 2)
 
 /* for the /proc/ directory itself, after non-process stuff has been done */
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
@@ -2859,14 +2859,19 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
return 0;
 
-   if (pos == TGID_OFFSET - 1) {
+   if (pos == TGID_OFFSET - 2) {
struct inode *inode = ns->proc_self->d_inode;
if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
return 0;
-   iter.tgid = 0;
-   } else {
-   iter.tgid = pos - TGID_OFFSET;
+   ctx->pos = pos = pos + 1;
+   }
+   if (pos == TGID_OFFSET - 1) {
+   struct inode *inode = ns->proc_thread_self->d_inode;
+   if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
+   return 0;
+   ctx->pos = pos = pos + 1;
}
+   iter.tgid = pos - TGID_OFFSET;
iter.task = NULL;
for (iter = next_tgid(ns, iter);
 iter.task;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 0adbc02d60e3..333080d7a671 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -442,6 +442,7 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 int proc_fill_super(struct super_block *s)
 {
struct inode *root_inode;
+   int ret;
 
s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
s->s_blocksize = 1024;
@@ -463,5 +464,9 @@ int proc_fill_super(struct super_block *s)
return -ENOMEM;
}
 
-   return proc_setup_self(s);
+   ret = proc_setup_self(s);
+   if (ret) {
+   return ret;
+   }
+   return proc_setup_thread_self(s);
 }
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14e71c5..ee04619173b2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -234,6 +234,12 @@ static inline int proc_net_init(void) { return 0; }
 extern int proc_setup_self(struct super_block *);
 
 /*
+ * proc_thread_self.c
+ */
+extern int proc_setup_thread_self(struct super_block *);
+extern void proc_thread_self_init(void);
+
+/*
  * proc_sysctl.c
  */
 #ifdef CONFIG_PROC_SYSCTL
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 5dbadecb234d..48f1c03bc7ed 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -149,6 +149,8 @@ static void proc_kill_sb(struct super_block *sb)
ns = (struct pid_namespace *)sb->s_fs_info;
if (ns->proc_self)
dput(ns->proc_self);
+   if (ns->proc_thread_self)
+   dput(ns->proc_thread_self);
kill_anon_super(sb);
put_pid_ns(ns);
 }
@@ -170,6 +172,7 @@ void __init proc_root_init(void)
return;
 
proc_self_init();
+   proc_thread_self_init();
proc_symlink("mounts", NULL, "self/mounts");
 
proc_net_init();
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
new file mode 100644
index ..59075b509df3
--- /dev/null
+++ b/fs/proc/thread_self.c
@@ -0,0 +1,85 @@
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+/*
+ * /proc/thread_self:
+ */
+static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
+ int buflen)
+{
+   struct pid_namespace *ns = 

[REVIEW][PATCH 0/4] /proc/thread-self

2014-07-31 Thread Eric W. Biederman

This patchset implements /proc/thread-self, a magic symlink that
solves a couple of problems.

- It makes it easy to get to a specific thread's directory in /proc;
  with gettid() not being exported by glibc, this is currently a pain.

- It allows fixing the problem present in /proc/mounts and /proc/net
  that when the thread group leader exits but the rest of the thread group
  remains, /proc/self/net and /proc/self/mounts, and thus /proc/mounts and
  /proc/net, become empty.

- As mount and network namespaces are per thread, it allows /proc/net and
  /proc/mounts to reflect this.

There is a small chance that changing /proc/net and /proc/mounts will cause
userspace regressions (although nothing has shown up in my testing). If
that happens, we can simply revert the change that moves them from
/proc/self/... to /proc/thread-self/...

Eric W. Biederman (4):
  proc: Have net show up under /proc//task/
  proc: Implement /proc/thread-self to point at the directory of the current thread
  proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
  proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

 fs/proc/Makefile  |  1 +
 fs/proc/base.c| 18 ++---
 fs/proc/inode.c   |  7 +++-
 fs/proc/internal.h|  6 +++
 fs/proc/proc_net.c|  2 +-
 fs/proc/root.c|  5 ++-
 fs/proc/thread_self.c | 85 +++
 include/linux/pid_namespace.h |  1 +
 8 files changed, 117 insertions(+), 8 deletions(-)

Eric
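For context on the gettid() point in the cover letter: without a glibc wrapper, a program had to build its own per-thread directory path from the raw system call, roughly as in this sketch. The helper name thread_proc_dir() is hypothetical; /proc/<tgid>/task/<tid> is the per-thread directory that /proc/thread-self is meant to make directly reachable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "/proc/<tgid>/task/<tid>" for the calling thread into buf.
 * Uses syscall(SYS_gettid) because no gettid() wrapper is assumed.
 * Returns the snprintf() result. */
static int thread_proc_dir(char *buf, size_t len)
{
	assert(buf != NULL);
	pid_t tgid = getpid();
	pid_t tid = (pid_t)syscall(SYS_gettid);

	return snprintf(buf, len, "/proc/%d/task/%d", (int)tgid, (int)tid);
}
```

In a single-threaded program the tid equals the tgid; in other threads it differs, which is exactly the case where /proc/self is not enough.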


Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts

2014-07-31 Thread Rafael J. Wysocki
On Friday, August 01, 2014 01:41:31 AM Thomas Gleixner wrote:
> On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
> > On Thursday, July 31, 2014 04:12:55 PM Alan Stern wrote:
> > > Pardon me for sticking my nose into the middle of the conversation, but
> > > here's what it looks like to me:
> > > 
> > > The entire no_irq phase of suspend/resume is starting to seem like a
> > > mistake.  We should never have done it.
> > 
> > In hindsight, I totally agree.  Question is what we can do about it now.
> 
> 
> 
> > So how can we eliminate the noirq phase in a workable way?
> 
> The straight way to do that is breaking the world and some more and
> then fix up a gazillion of device drivers by doing a massive voodoo
> debugging effort simply because in most cases we do not get any useful
> information out of the system once the shit hits the fan.
> 
> We could add instrumentation to the core code about interrupts which
> are coming in unexpectedly during suspend, but that does not solve
> anything.
> 
> We really cannot call any device handler at that point as clocks might
> be turned off already and any access to a device register might simply
> cause a full undebuggable stall of the CPU.
> 
> And there is no way to prove that there is no chance of a spurious
> interrupt for a given device. 
> 
> So if we cannot handle it at the infrastructure level, we need to make
> sure that every fricking device driver interrupt handler has a 
> 
>  if (dev->suspended)
>   return CRAP;
> 
> conditional as the first line of code in it.
> 
> What is that buying us? 
> 
> Nothing but a shitload of hard to understand problems, really. The
> only sensible way to handle this is at the core level.
> 
> #1 There is no way that you can rely on random drivers to do the Right
>Thing. 
> 
> #2 There is no way that all hardware is implemented in a sane way.
> 
> #3 You CANNOT educate the people who are tasked to implement something
>which "does the job" to understand all the subtle details of
>suspend/resume or whatever.

These are fair points.

However, if the driver implements ->runtime_suspend, it has to handle
the "my device is suspended" condition in its interrupt handler regardless.

For such a driver, doing the same over system suspend/resume shouldn't
be a real problem.

Rafael
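The per-driver guard Thomas describes, which Rafael notes runtime-PM drivers need anyway, might look like the following userspace model. The return codes and struct dev_sketch are hypothetical stand-ins for irqreturn_t values and driver state; a real driver would also need locking between its suspend path and the handler, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum irqreturn_sketch { IRQ_NONE_SKETCH, IRQ_HANDLED_SKETCH };

struct dev_sketch {
	bool suspended;       /* set by the (runtime) suspend callback */
	unsigned int handled; /* interrupts actually serviced */
};

/* Bail out before touching any registers when the device may be
 * unclocked or powered down. */
static enum irqreturn_sketch dev_irq_handler(int irq, void *data)
{
	struct dev_sketch *dev = data;

	(void)irq;
	assert(dev != NULL);
	if (dev->suspended)
		return IRQ_NONE_SKETCH;
	dev->handled++; /* stand-in for reading and acking device registers */
	return IRQ_HANDLED_SKETCH;
}
```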



Re: linux-next: manual merge of the driver-core tree with the tip tree

2014-07-31 Thread Greg KH
On Thu, Jul 31, 2014 at 05:07:35PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the driver-core tree got a conflict in
> lib/Kconfig.debug between commit e704f93af5a0 ("kernel: time: Add
> udelay_test module to validate udelay") from the tip tree and commit
> 0a8adf584759 ("test: add firmware_class loader test") from the
> driver-core tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good, thanks.

greg k-h


Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local

2014-07-31 Thread Davidlohr Bueso
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu  wrote:
> > 
> > > FYI, we noticed the below changes on
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure 
> > > task_numa_migrate() checks the preferred node")
> > > 
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >      67745 ~ 4%     +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > >     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > 
> > Hi Aaron,
> > 
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
> 
> So assuming those numbers above are the difference in
> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.
> 
> On my IVB-EP I get similar numbers; using:
> 
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
> 
> 
> tip/master+origin/master    tip/master+origin/master-a43455a1d57
> 
> local     total             local     total
> faults    time              faults    time
> 
> 19971     51.384            10104     50.838
> 17193     50.564            9116      50.208
> 13435     49.057            8332      51.344
> 23794     50.795            9954      51.364
> 20255     49.463            9598      51.258
> 
> 18929.6   50.2526           9420.8    51.0024
> 3863.61   0.96              717.78    0.49
> 
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
> 
> 
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
> 
> tip/master+origin/master+patch
> 
> local total
> faults  time
> 
> 21296 50.541
> 12771 50.54
> 13872 52.224
> 23352 50.85
> 16516 50.705
> 
> 17561.4   50.972
> 4613.32   0.71
> 
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
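The last two rows of Peter's tables are the mean and the sample standard deviation (n - 1 denominator) of the five runs. A small sketch to reproduce them; mean() and sample_variance() are helper names chosen here, and the standard deviation is the square root of the returned variance (for the first column, sqrt(14927498.8) is about 3863.61).

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
	assert(n > 0);
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Sample variance with an (n - 1) denominator; take its square root to
 * get the standard deviation row. */
static double sample_variance(const double *x, size_t n)
{
	assert(n > 1);
	double m = mean(x, n), ss = 0.0;
	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return ss / (double)(n - 1);
}
```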

It also seems to be the case on an 8-socket 80-core DL980:

tip/master baseline:
67276 169.590 [sec]
82400 188.406 [sec]
87827 201.122 [sec]
96659 228.243 [sec]
83180 192.422 [sec]

tip/master + a43455a1d57 reverted
36686 170.373 [sec]
52670 187.904 [sec]
55723 203.597 [sec]
41780 174.354 [sec]
36070 173.179 [sec]

Runtimes are pretty much all over the place, so I cannot really say whether
it's gotten slower or faster. However, on average, we nearly double the number
of local hint faults with the commit in question.

After adding the proposed fix (NUMA_SCALE/8 variant), it goes down
again, closer to the numbers without a43455a1d57:

tip/master + patch
50591 175.272 [sec]
57858 191.969 [sec]
77564 215.429 [sec]
50613 179.384 [sec]
61673 201.694 [sec]

> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

I'll try to dig it up as well.



Re: [PATCHv2] CMA/HOTPLUG: clear buffer-head lru before page migration

2014-07-31 Thread Gioh Kim



2014-08-01 7:57 AM, Andrew Morton wrote:

On Thu, 31 Jul 2014 11:22:35 +0900 Gioh Kim  wrote:


The previous patch inserted invalidate_bh_lrus() only into the CMA code.
HOTPLUG also needs to drop the bh LRU.
So v2 inserts invalidate_bh_lrus() into both the CMA and HOTPLUG paths.


 8< 
A bh must be released before the page it is mapped to can be migrated.
The bh reference count is increased when the bh is installed
into the LRU, so the bh references held by the LRU must be dropped before migrating the page.

This drops every bh in the LRU. We could drop only the bh of the migrating page,
but searching the LRU sometimes costs more than invalidating the entire LRU.

Signed-off-by: Gioh Kim 
Acked-by: Michal Nazarewicz 
---
  mm/memory_hotplug.c |1 +
  mm/page_alloc.c |2 ++
  2 files changed, 3 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a3797d3..1c5454f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1672,6 +1672,7 @@ repeat:
 lru_add_drain_all();
 cond_resched();
 drain_all_pages();
+   invalidate_bh_lrus();


Both of these calls should have a comment explaining why
invalidate_bh_lrus() is being called.


 }

 pfn = scan_movable_pages(start_pfn, end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b99643d4..c00dedf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6369,6 +6369,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 if (ret)
 return ret;

+   invalidate_bh_lrus();
+
 ret = __alloc_contig_migrate_range(&cc, start, end);
 if (ret)
 goto done;


I do feel that this change is likely to be beneficial, but I don't want
to apply such a patch until I know what its effects are upon all
alloc_contig_range() callers.  Especially hugetlb.


I'm very sorry to hear that.
How can I check the effects?
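The reference-count mechanism the patch relies on can be sketched as a tiny userspace model: installing a buffer head in the LRU takes a reference, so the buffer stays busy until the LRU entry is dropped. struct bh_sketch, lru_install(), lru_invalidate() and bh_busy() are hypothetical stand-ins for struct buffer_head, bh_lru_install(), invalidate_bh_lrus() and the b_count part of buffer_busy(); the real LRU is per-CPU and buffer_busy() also checks buffer state bits. In this model, any other reference, such as one a filesystem keeps for the lifetime of a mount, would still keep the buffer busy after invalidation.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 4 /* hypothetical size chosen for illustration */

/* Reduced stand-in for struct buffer_head: only the use count. */
struct bh_sketch {
	int b_count;
};

/* Stand-in for one CPU's bh LRU array. */
static struct bh_sketch *lru[LRU_SIZE];

/* Put bh at the front, taking a reference as get_bh() would; the entry
 * pushed off the end drops its reference. Assumes bh is not yet cached. */
static void lru_install(struct bh_sketch *bh)
{
	assert(bh != NULL);
	bh->b_count++;
	struct bh_sketch *evicted = lru[LRU_SIZE - 1];
	for (size_t i = LRU_SIZE - 1; i > 0; i--)
		lru[i] = lru[i - 1];
	lru[0] = bh;
	if (evicted != NULL)
		evicted->b_count--;
}

/* Drop every reference held by the LRU. */
static void lru_invalidate(void)
{
	for (size_t i = 0; i < LRU_SIZE; i++) {
		if (lru[i] != NULL)
			lru[i]->b_count--;
		lru[i] = NULL;
	}
}

/* A buffer with users cannot be freed, so its page cannot be migrated. */
static int bh_busy(const struct bh_sketch *bh)
{
	return bh->b_count != 0;
}
```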





Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area

2014-07-31 Thread Gioh Kim



2014-07-31 9:21 PM, Jan Kara wrote:

On Thu 31-07-14 09:37:15, Gioh Kim wrote:



2014-07-31 9:03 AM, Jan Kara wrote:

On Thu 31-07-14 08:54:40, Gioh Kim wrote:

2014-07-30 7:11 PM, Jan Kara wrote:

On Wed 30-07-14 16:44:24, Gioh Kim wrote:

2014-07-22 6:38 PM, Jan Kara wrote:

On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:

On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:

Hello,

This patch tries to solve the problem that a long-lasting page cache page of
the ext4 superblock disturbs page migration.

I've been testing the CMA feature on my ARM-based platform
and found that some page-cache pages cannot be migrated.
Some of them are page-cache pages of the ext4 filesystem superblock.

Currently ext4 reads the superblock with sb_bread(). sb_bread() allocates the page
from the movable area. But the problem is that ext4 holds the page until
it is unmounted. If the root filesystem is ext4, the page can never be migrated.

I introduce a new API for allocating pages from the non-movable area.
It is useful for ext4 and others that want to hold page cache for a long time.


There's no word on why you can't teach ext4 to still migrate that page.
For all I know it might be impossible, but at least mention why.


I am very sorry for the lack of details.

In ext4_fill_super() the buffer-head of the superblock is stored in sbi->s_sbh.
The page the buffer-head belongs to is allocated from the movable area.
To migrate the page, the buffer-head should be released via brelse().
But brelse() is not called until unmount.

   Hum, I don't see where in the code we check the buffer_head use count. Can
you please point me to it? Thanks.


Filesystem code does not check the buffer_head use count.  sb_bread() returns
a buffer_head that is included in bh_lru and has a non-zero use count.
You can see the bh_lru code in buffer.c: __find_get_block() and
lookup_bh_lru().  bh_lru_install() inserts the buffer_head into the
bh_lru.  It first calls get_bh() to increase the use count and then inserts
the bh into the LRU array.

The buffer_head use count is non-zero until brelse() is called.
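A user-space model of the reference counting described here may help. It is a sketch under stated assumptions, not kernel code: `struct model_bh`, `model_lru_install()`, `model_bread()` and the 8-entry LRU are hypothetical stand-ins; only the get_bh()/brelse() naming and the rule that an LRU slot holds its own reference follow fs/buffer.c as we read it. What it shows is that one brelse() from the filesystem leaves the count at 1 while the buffer is still cached in the LRU; the LRU's own reference is dropped only when the buffer is evicted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the use count matters. */
struct model_bh {
	int b_count;
};

#define MODEL_LRU_SIZE 8 /* assumed LRU size for this model */

static struct model_bh *model_lru[MODEL_LRU_SIZE];

static void get_bh(struct model_bh *bh)
{
	bh->b_count++;
}

static void brelse(struct model_bh *bh)
{
	assert(bh->b_count > 0); /* dropping a reference nobody holds is a bug */
	bh->b_count--;
}

/* Move bh to the front of the (model) LRU. The LRU slot holds its own
 * reference; the entry pushed off the end loses the LRU's reference. */
static void model_lru_install(struct model_bh *bh)
{
	struct model_bh *next[MODEL_LRU_SIZE] = { NULL };
	struct model_bh *evictee = NULL;
	int out = 0;

	if (model_lru[0] == bh)
		return; /* already most recently used: no extra reference */

	get_bh(bh);
	next[out++] = bh;
	for (int i = 0; i < MODEL_LRU_SIZE; i++) {
		struct model_bh *old = model_lru[i];

		if (old == NULL)
			continue;
		if (old == bh)
			brelse(old); /* bh was already cached: drop the duplicate ref */
		else if (out < MODEL_LRU_SIZE)
			next[out++] = old;
		else
			evictee = old;
	}
	for (int i = 0; i < MODEL_LRU_SIZE; i++)
		model_lru[i] = next[i];
	if (evictee)
		brelse(evictee);
}

/* Hypothetical read path: one reference for the caller, one for the LRU. */
static struct model_bh *model_bread(struct model_bh *bh)
{
	get_bh(bh);
	model_lru_install(bh);
	return bh;
}
```

In this model, reading eight other buffers afterwards pushes the superblock buffer out of the LRU, and only then does its count reach zero.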

   So I probably didn't phrase the question precisely enough. What I was
asking about is where exactly the *migration* code checks the buffer use count,
because as I'm looking at buffer_migrate_page(), we lock the buffers on a
migrated page but we don't look at buffer use counts... So it seems to me
that migration of a page with buffers should succeed even if a buffer head
has an elevated use count. Now I think that it *should* check the buffer
use counts (it is dangerous to migrate buffers someone holds a reference to)
but I just cannot find that place. Or does CMA use some other migration
function for buffer pages than buffer_migrate_page()?


The CMA allocation function is cma_alloc().
The call flow is alloc_contig_range() -> __alloc_contig_migrate_range() ->
migrate_pages() -> unmap_and_move() -> __unmap_and_move() ->
try_to_free_buffers() -> drop_buffers() -> buffer_busy().

buffer_busy() checks b_count.
If a buffer is busy, the buffer cache cannot be released,
so the page that includes the buffer_head and the page that is referred to by
the buffer_head are not movable.

Is this what you need?
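The last step of that call chain can be modeled in user space as follows. buffer_busy() is described above as checking b_count; in this sketch a dirty or locked buffer is also treated as busy, which is our reading of fs/buffer.c and should be checked against the tree in question. The struct, the flag values and the `model_*` names are hypothetical. Because drop_buffers() walks every buffer on the page through b_this_page, a single busy buffer makes the whole page unfreeable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model state bits; these are not the kernel's BH_Dirty/BH_Lock values. */
enum { MODEL_BH_DIRTY = 1 << 0, MODEL_BH_LOCK = 1 << 1 };

/* Hypothetical stand-in for struct buffer_head. */
struct model_bh {
	int b_count;                  /* use count, raised by get_bh() */
	unsigned long b_state;        /* dirty/locked bits */
	struct model_bh *b_this_page; /* circular list of buffers on one page */
};

/* Busy if anyone holds a reference, or the buffer is dirty or locked. */
static bool model_buffer_busy(const struct model_bh *bh)
{
	return bh->b_count != 0 ||
	       (bh->b_state & (MODEL_BH_DIRTY | MODEL_BH_LOCK)) != 0;
}

/* Walk all buffers attached to a page; any busy buffer means the
 * buffers cannot be dropped, so the page cannot be freed or migrated. */
static bool model_drop_buffers(struct model_bh *head)
{
	struct model_bh *bh = head;

	do {
		if (model_buffer_busy(bh))
			return false;
		bh = bh->b_this_page;
	} while (bh != head);
	return true;
}
```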

   Yes, this is what I was asking about. Thanks! But as I'm looking into
__unmap_and_move() it calls try_to_free_buffers() only if page->mapping ==
NULL. As the comment before that test states, this can happen only for swap
cache (not our case) or for pagecache pages that were truncated and not yet
fully cleaned up. But superblock page cannot really be truncated. So I
somewhat doubt you can hit the above path for a page holding superblock...


I printed the address of the busy buffer_head in drop_buffers(), which is called
by try_to_free_buffers(), and the address of the superblock buffer_head.
They were the same.

I'm going to check page->mapping.




Honza




Re: [x86_64,vsyscall] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

2014-07-31 Thread Fengguang Wu
> > Oops, github needs this link for downloading big files:
> >
> > https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/yocto-minimal-i386.cgz
> 
> Or 
> https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/yocto-minimal-x86_64.cgz,
> I guess?  The particular failure you're seeing here is only possible
> on 64-bit kernels.

You are right. Sorry for the mistake!

Thanks,
Fengguang


Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

2014-07-31 Thread Fengguang Wu
On Thu, Jul 31, 2014 at 07:57:25PM +0200, Stephane Eranian wrote:
> On Thu, Jul 31, 2014 at 4:32 AM, Fengguang Wu  wrote:
> > Hi Stephane,
> >
> > On Wed, Jul 30, 2014 at 07:56:11PM +0200, Stephane Eranian wrote:
> >> On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu wrote:
> >> > On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
> >> >> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu wrote:
> >> >> > Greetings,
> >> >> >
> >> >> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >> >> >
> >> >> Is this booting a guest kernel or native?
> >> >
> >> > It's a guest kernel.
> >> >
> >> >> What is the  host CPU?
> >> >
> >> > The host CPU is E5-2680, Sandy Bridge-EP.
> >> >
> >> I thought this problem had already been mentioned a while back.
> >>
> >> See https://lkml.org/lkml/2014/3/6/685
> >> And https://lkml.org/lkml/2014/4/23/512
> >>
> >> So what you are telling me here is that either those two fixes never
> >> made it in, or you are running an older kernel.
> >
> > I just checked linux-next and found that the bug in rapl_pmu_init() has
> > been fixed. linux-next happens to show the same "BUG: unable to handle
> > kernel NULL pointer dereference" message, but in another function,
> > validate_chain(). Attached is the dmesg from linux-next.
> >
> > Sorry for the noise!
> >
> Is it fixed with the two patches I referred you to?

Yes.

Thanks,
Fengguang



Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()

2014-07-31 Thread Frederic Weisbecker
On Thu, Jul 31, 2014 at 02:55:01PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> This commit adds a new RCU-tasks flavor of RCU, which provides
> call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> context switch (not preemption!), userspace execution, and the idle loop.
> Note that unlike other RCU flavors, these quiescent states occur in tasks,
> not necessarily CPUs.  Includes fixes from Steven Rostedt.
> 
> This RCU flavor is assumed to have very infrequent latency-tolerant
> updaters.  This assumption permits significant simplifications, including
> a single global callback list protected by a single global lock, along
> with a single linked list containing all tasks that have not yet passed
> through a quiescent state.  If experience shows this assumption to be
> incorrect, the required additional complexity will be added.
> 
> Suggested-by: Steven Rostedt 
> Signed-off-by: Paul E. McKenney 
> ---
>  include/linux/init_task.h |   9 +++
>  include/linux/rcupdate.h  |  36 ++
>  include/linux/sched.h |  23 ---
>  init/Kconfig  |  10 +++
>  kernel/rcu/tiny.c |   2 +
>  kernel/rcu/tree.c |   2 +
>  kernel/rcu/update.c   | 171 ++
>  7 files changed, 242 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6df7f9fe0d01..78715ea7c30c 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,14 @@ extern struct group_info init_groups;
>  #else
>  #define INIT_TASK_RCU_PREEMPT(tsk)
>  #endif
> +#ifdef CONFIG_TASKS_RCU
> +#define INIT_TASK_RCU_TASKS(tsk) \
> + .rcu_tasks_holdout = false, \
> + .rcu_tasks_holdout_list =   \
> + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +#else
> +#define INIT_TASK_RCU_TASKS(tsk)
> +#endif
>  
>  extern struct cred init_cred;
>  
> @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
>   INIT_FTRACE_GRAPH   \
>   INIT_TRACE_RECURSION\
>   INIT_TASK_RCU_PREEMPT(tsk)  \
> + INIT_TASK_RCU_TASKS(tsk)\
>   INIT_CPUSET_SEQ(tsk)\
>   INIT_RT_MUTEXES(tsk)\
>   INIT_VTIME(tsk) \
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6a94cc8b1ca0..829efc99df3e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
>  
>  void synchronize_sched(void);
>  
> +/**
> + * call_rcu_tasks() - Queue an RCU callback for invocation after a task-based grace period
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual callback function to be invoked after the grace period
> + *
> + * The callback function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. call_rcu_tasks() assumes
> + * that the read-side critical sections end at a voluntary context
> + * switch (not a preemption!), entry into idle, or transition to usermode
> + * execution.  As such, there are no read-side primitives analogous to
> + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> + * to determine that all tasks have passed through a safe state, not so
> + * much for data-structure synchronization.
> + *
> + * See the description of call_rcu() for more detailed information on
> + * memory ordering guarantees.
> + */
> +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> +
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  void __rcu_read_lock(void);
> @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
>   rcu_irq_exit(); \
>   } while (0)
>  
> +/*
> + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> + * macro rather than an inline function to avoid #include hell.
> + */
> +#ifdef CONFIG_TASKS_RCU
> +#define rcu_note_voluntary_context_switch(t) \
> + do { \
> + preempt_disable(); /* Exclude synchronize_sched(); */ \
> + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> + preempt_enable(); \
> + } while (0)
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +#define rcu_note_voluntary_context_switch(t) do { } while (0)
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
>  bool __rcu_is_watching(void);
>  #endif /* #if 
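To illustrate the design in the commit log (one global callback list, plus a per-task holdout flag that a voluntary context switch clears, as rcu_note_voluntary_context_switch() does above), here is a single-threaded user-space model. It is only a sketch: the `model_*` names, the explicit start/poll steps and the counting callback are assumptions, the kernel/rcu/update.c part of the patch is not visible in this excerpt, and locking plus the idle and usermode quiescent states are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rcu_head and the task_struct field. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

struct model_task {
	bool rcu_tasks_holdout; /* still blocking the current grace period? */
};

/* Single global callback list; no lock because the model is single-threaded. */
static struct model_rcu_head *cbs_head;
static struct model_rcu_head **cbs_tail = &cbs_head;
/* Callbacks waiting for the grace period currently in progress. */
static struct model_rcu_head *gp_cbs;

static void model_call_rcu_tasks(struct model_rcu_head *head,
				 void (*func)(struct model_rcu_head *))
{
	head->next = NULL;
	head->func = func;
	*cbs_tail = head;
	cbs_tail = &head->next;
}

/* Begin a grace period: take the queued callbacks and mark every task
 * as a holdout. Callbacks queued later wait for the next grace period. */
static void model_start_grace_period(struct model_task *tasks, size_t n)
{
	gp_cbs = cbs_head;
	cbs_head = NULL;
	cbs_tail = &cbs_head;
	for (size_t i = 0; i < n; i++)
		tasks[i].rcu_tasks_holdout = true;
}

/* A voluntary context switch is a quiescent state. A preemption is not,
 * so in this model a preempted task simply never calls this. */
static void model_note_voluntary_context_switch(struct model_task *t)
{
	t->rcu_tasks_holdout = false;
}

/* Returns false while any task is a holdout; otherwise the grace period
 * has ended, so invoke the callbacks that were waiting for it. */
static bool model_poll_grace_period(const struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].rcu_tasks_holdout)
			return false;
	while (gp_cbs) {
		struct model_rcu_head *next = gp_cbs->next;

		gp_cbs->func(gp_cbs);
		gp_cbs = next;
	}
	return true;
}

/* Example callback for the usage below: counts invocations. */
static int model_cb_count;
static void model_count_cb(struct model_rcu_head *head)
{
	(void)head;
	model_cb_count++;
}
```

The single list and the linear scan over tasks match the commit log's stated assumption that updaters are rare and latency-tolerant; nothing here tries to be scalable.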

Re: [RFC PATCH 0/2] dirreadahead system call

2014-07-31 Thread Dave Chinner
On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
> On Jul 31, 2014, at 6:49, Dave Chinner  wrote:
> > 
> >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote:
> >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das  wrote:
> >>> On July 26, 2014 12:27:19 AM "Andreas Dilger" wrote:
>  Is there a time when this doesn't get called to prefetch entries in
>  readdir() order?  It isn't clear to me what benefit there is of returning
>  the entries to userspace instead of just doing the statahead implicitly
>  in the kernel?
>  
>  The Lustre client has had what we call "statahead" for a while,
>  and similar to regular file readahead it detects the sequential access
>  pattern for readdir() + stat() in readdir() order (taking into account if
>  ".*" entries are being processed or not) and starts fetching the inode
>  attributes asynchronously with a worker thread.
> >>> 
> >>> Does this heuristic work well in practice? In the use case we were trying to
> >>> address, a Samba server is aware beforehand if it is going to stat all the
> >>> inodes in a directory.
> >> 
> >> Typically this works well for us, because this is done by the Lustre
> >> client, so the statahead is hiding the network latency of the RPCs to
> >> fetch attributes from the server.  I imagine the same could be seen with
> >> GFS2. I don't know if this approach would help very much for local
> >> filesystems because the latency is low.
> >> 
>  This syscall might be more useful if userspace called readdir() to get
>  the dirents and then passed the kernel the list of inode numbers
>  to prefetch before starting on the stat() calls. That way, userspace
>  could generate an arbitrary list of inodes (e.g. names matching a
>  regexp) and the kernel doesn't need to guess if every inode is needed.
> >>> 
> >>> Were you thinking arbitrary inodes across the filesystem or just a subset
> >>> from a directory? Arbitrary inodes may potentially throw up locking issues.
> >> 
> >> I was thinking about inodes returned from readdir(), but the syscall
> >> would be much more useful if it could handle arbitrary inodes.
> > 
> > I'm not sure we can do that. The only way to safely identify a
> > specific inode in the filesystem from userspace is via a filehandle.
> > Plain inode numbers are susceptible to TOCTOU race conditions that
> > the kernel cannot resolve. Also, lookup by inode number bypasses
> > directory access permissions, so is not something we would expose
> > to arbitrary unprivileged users.
> 
> None of these issues are relevant in the API that I'm thinking about. 
> The syscall just passes the list of inode numbers to be prefetched
> into kernel memory, and then stat() is used to actually get the data into
> userspace (or whatever other operation is to be done on them),
> so there is no danger if the wrong inode is prefetched.  If the inode
> number is bad the filesystem can just ignore it. 

Which means the filesystem has to treat the inode number as
potentially hostile. i.e. it can not be trusted to be correct and so
must take slow paths to validate the inode numbers. This adds
*significant* overhead to the readahead path for some filesystems:
readahead is only a win if it is low cost.

For example, on XFS every untrusted inode number lookup requires an
inode btree lookup to validate that the inode is actually valid on disk
and that it is allocated and has references. That lookup serialises
against inode allocation/freeing as well as other lookups. In
comparison, when using a trusted inode number from a directory
lookup within the kernel, we only need to do a couple of shift and
mask operations to convert it to a disk address and we are good to
go.
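The trusted path can be illustrated with a simplified model of how XFS packs an inode number into allocation-group number, block within the AG, and inode slot within the block. The layout is modeled on our reading of the XFS_INO_TO_AGNO/AGBNO/OFFSET macros; the struct, the function names and the geometry values are hypothetical, and inode chunk/cluster alignment is ignored. It shows only that decoding a trusted number costs a few shifts and masks. An untrusted number would additionally need the inode btree lookup described above, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified geometry (values are illustrative only). */
struct model_geom {
	unsigned inopblog; /* log2(inodes per block) */
	unsigned agblklog; /* log2(blocks per allocation group), rounded up */
};

static uint64_t mask_bits(unsigned k)
{
	return (1ULL << k) - 1;
}

/* Pack (AG number, AG block, slot in block) into an inode number. */
static uint64_t model_make_ino(const struct model_geom *g, uint64_t agno,
			       uint64_t agbno, uint64_t offset)
{
	unsigned agino_log = g->inopblog + g->agblklog;

	return (agno << agino_log) | (agbno << g->inopblog) | offset;
}

/* Trusted path: shifts and masks only. Nothing here checks that the
 * inode is actually allocated; that is what the btree lookup is for. */
static void model_ino_to_location(const struct model_geom *g, uint64_t ino,
				  uint64_t *fsblock, unsigned *offset)
{
	unsigned agino_log = g->inopblog + g->agblklog;
	uint64_t agno = ino >> agino_log;
	uint64_t agbno = (ino >> g->inopblog) & mask_bits(g->agblklog);

	*offset = (unsigned)(ino & mask_bits(g->inopblog));
	*fsblock = (agno << g->agblklog) | agbno;
}
```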

i.e. an "inode number readahead" syscall costs at least 5 orders of
magnitude more CPU than a "directory readahead" syscall, it has
significant serialisation issues, and it can stall other
modifications/lookups going on at the same time.
That's *horrible behaviour* for a speculative readahead operation,
but because the inode numbers are untrusted, we can't avoid it.

So, again, it's way more overhead than userspace just calling
stat() asynchronously on many files at once as readdir/getdents
returns dirents from the kernel to speed up cache population.

That's my main issue with this patchset - it's implementing
something in kernelspace that can *easily* be done generically in
userspace without introducing all sorts of nasty corner cases that
we have to handle in the kernel. We only add functionality to the kernel if there's a
compelling reason to do it in kernelspace, and right now I just
don't see any numbers that justify adding readdir+stat() readahead
or inode number based cache population in kernelspace.
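The userspace baseline Dave refers to, many stat() calls issued concurrently while walking a directory, can be sketched as below. This is a hedged illustration, not code from the thread: the thread count, the name cap and the function names are assumptions, and for simplicity it reads all names before starting the stat() workers rather than overlapping the two phases. It uses only standard POSIX calls (opendir/readdir, fstatat, pthreads).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_NAMES 4096 /* arbitrary cap for this sketch */
#define NTHREADS  4    /* arbitrary; more threads hide more latency */

struct stat_job {
	int dirfd;
	char **names;
	size_t count;
	atomic_size_t next; /* next name to stat, shared by all workers */
	atomic_size_t ok;   /* successful fstatat() calls */
};

static void *stat_worker(void *arg)
{
	struct stat_job *job = arg;
	struct stat st;
	size_t i;

	while ((i = atomic_fetch_add(&job->next, 1)) < job->count) {
		/* The result is discarded; the goal is only to populate the
		 * inode cache before the real consumer calls stat(). */
		if (fstatat(job->dirfd, job->names[i], &st, AT_SYMLINK_NOFOLLOW) == 0)
			atomic_fetch_add(&job->ok, 1);
	}
	return NULL;
}

/* Returns the number of entries stat()ed successfully, or -1 on error. */
static long prefetch_dir_stats(const char *path)
{
	DIR *dir = opendir(path);
	if (!dir)
		return -1;
	char **names = malloc(MAX_NAMES * sizeof(*names));
	if (!names) {
		closedir(dir);
		return -1;
	}

	/* Collect names as readdir() (getdents underneath) returns them. */
	size_t n = 0;
	struct dirent *de;
	while (n < MAX_NAMES && (de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		if ((names[n] = strdup(de->d_name)) != NULL)
			n++;
	}

	/* stat() them from several threads so cache misses overlap. */
	struct stat_job job = { .dirfd = dirfd(dir), .names = names, .count = n };
	atomic_init(&job.next, 0);
	atomic_init(&job.ok, 0);
	pthread_t tids[NTHREADS];
	int started = 0;
	for (int t = 0; t < NTHREADS; t++)
		if (pthread_create(&tids[started], NULL, stat_worker, &job) == 0)
			started++;
	if (started == 0)
		stat_worker(&job); /* no threads available: do it inline */
	for (int t = 0; t < started; t++)
		pthread_join(tids[t], NULL);

	long ok = (long)atomic_load(&job.ok);
	for (size_t i = 0; i < n; i++)
		free(names[i]);
	free(names);
	closedir(dir);
	return ok;
}
```

Because everything here is ordinary POSIX, it can be benchmarked against a proposed syscall without any kernel changes, which is the comparison asked for in the next paragraph.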

Before we add *any* syscall for directory readahead, we need
comparison numbers against doing the dumb multithreaded
userspace readahead of stat() calls. If userspace can do 

Re: [PATCH v3] Documentation: devicetree: Fix tps65090 typos in example

2014-07-31 Thread Javier Martinez Canillas
Hello Andreas,

On Wed, Jul 30, 2014 at 11:29 PM, Andreas Färber  wrote:
> Specification and existing device trees use vsys-l{1,2}-supply,
> not vsys_l{1,2}-supply. Fix the example to match the specification.
>
> Reviewed-by: Doug Anderson 
> Acked-by: Mark Rutland 
> Fixes: 21d2202158e9 ("mfd: tps65090: add DT support for tps65090")
> Signed-off-by: Andreas Färber 
> ---
>  v2 -> v3:
>  * Added Fixes header
>  * + regulator and mfd maintainers
>
>  v1 -> v2:
>  * More verbose commit message (requested by Mark Rutland)
>
>  Documentation/devicetree/bindings/regulator/tps65090.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/regulator/tps65090.txt b/Documentation/devicetree/bindings/regulator/tps65090.txt
> index 340980239ea9..ca69f5e3040c 100644
> --- a/Documentation/devicetree/bindings/regulator/tps65090.txt
> +++ b/Documentation/devicetree/bindings/regulator/tps65090.txt
> @@ -45,8 +45,8 @@ Example:
> infet5-supply = <_reg>;
> infet6-supply = <_reg>;
> infet7-supply = <_reg>;
> -   vsys_l1-supply = <_reg>;
> -   vsys_l2-supply = <_reg>;
> +   vsys-l1-supply = <_reg>;
> +   vsys-l2-supply = <_reg>;
>

True, these also match the .supply_name used when registering the
LDO[1-2] regulators in the tps65090 driver. So clearly the example was
wrong while the property specification is correct.

Reviewed-by: Javier Martinez Canillas 

Best regards,
Javier


Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts

2014-07-31 Thread Rafael J. Wysocki
On Friday, August 01, 2014 12:16:23 AM Thomas Gleixner wrote:
> On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
> > On Thursday, July 31, 2014 12:44:24 PM Thomas Gleixner wrote:
> > > What's this PCIe PME handler doing? Is it required functionality for
> > > the suspend/resume path or is it a wakeup/abort mechanism.
> > 
> > It is a wakeup/abort mechanism.
> 
> So why is it using IRQF_NO_SUSPEND in the first place?

It isn't in the current code.

It did in *some* of the prototype patches floating around, but they were
withdrawn.

In the last one ([2/3] in this series) it doesn't any more.

> Just because x86 does not have irq wake implemented or flagged its irq
> chips with IRQCHIP_SKIP_SET_WAKE.

The reason for using IRQF_NO_SUSPEND was to make it wake up from suspend-to-idle,
but that was a sledgehammer approach.

> Right, so instead of thinking about a proper solution driver folks just slap
> the next available thing on it w/o thinking about the consequences. But,
> that's partly our own fault due to the lack of proper documentation.

Prototyping with the next available thing is not generally wrong in my view
as long as this doesn't get to the code base without consideration.

> > And before we enter the wakeup handling slippery slope, let me make a note
> > that this problem is bothering me quite a bit at the moment.  In my opinion
> > we need to address it somehow regardless of the wakeup issues and I'm not sure
> > if failing __setup_irq() when there's a mismatch (that is, there are existing
> > actions for the given irq_desc and their IRQF_NO_SUSPEND settings are not
> > consistent with the new one) is the right way to do that, because it may make
> > things behave a bit randomly (it will always fail the second guy, but that need
> > not be the one who's requested IRQF_NO_SUSPEND and it depends on the ordering
> > between them).
> 
> I totally agree that we want to fix it and I'm going to help. Though I
> really wanted to have a clear picture of the stuff before making
> decisions.
>  
> > I had a couple of ideas, but none of them was particularly clean.  Ideally,
> > IRQF_NO_SUSPEND should always be requested without IRQF_SHARED, but I'm
> > afraid that we can't really do that for the ACPI SCI, because that may
> > cause problems to happen on some older systems where that interrupt is
> > actually shared.  On all systems I have immediate access to it isn't shared,
> > but I remember seeing some where it was.  On those systems the ACPI SCI itself
> > would not be affected, because it is requested quite early during system init,
> > but the other guys wanting to share the line with it would take a hit.
> > 
> > One thing I was thinking about was to return an error from suspend_device_irqs()
> > if there was a mismatch between IRQF_NO_SUSPEND settings for different irqactions
> > in the same irq_desc.  That would make system suspend fail on systems where it
> > is potentially unsafe, but at least any other functionality would not be affected.
> >
> 
> That's one possible solution. See below.
> 
> > > So many of them use it for wakeup purposes. Why so and how is that
> > > supposed to work?
> > 
> > Quite frankly, I'm not sure why they use it.  These are mostly drivers I'm not
> > familiar with on platforms I'm not familiar with.  My guess is that the lazy
> > disable mechanism is not sufficient for them for some reason.
> 
> Looking at a few of them I fear the reason is that the developer did
> not understand the wakeup mechanism at all. Again that's probably our
> fault, because the whole business including the irq part lacks proper
> in depth documentation.
> 
> > > The mechanism for wakeup sources is:
> > > 
> > > 1) Lazy disable the interrupt
> > > 
> > > 2) Do the transition into suspend with interrupts enabled
> > > 
> > > 3) Check whether one of the wakeup sources has triggered. If yes,
> > >abort. Otherwise suspend.
> > > 
> > > The ones marked IRQF_NO_SUSPEND are not part of the above scheme,
> > > because they are not checked. So they use different mechanisms to
> > > abort the suspend?
> > 
> > Well, if you look at the tegra_kbc driver, for example, it uses both
> > enable_irq_wake() and IRQF_NO_SUSPEND.  Why it does that, I don't know.
> 
> That doesn't make sense at all.
>  
> > Other ones seem to be using pm_wakeup_event(), but that will only abort
> > suspend when it is enabled upfront (it need not be).  Moreover, it wasn't
> > intended to be used that way.
> 
> Right. We should kill that before more people copy it blindly.
>  
> > It generally looks like things are used not as intended in the wakeup
> > area, sadly.  Perhaps that's my fault, because I wasn't looking carefully
> > enough every time, but I wasn't directly involved in any of them IIRC.
> > 
> > I guess that's an opportunity to clean that up ...
> 
> Agreed. I'm not frightened to do a tree wide sweep. Seems to become a
> habit :)
>  
> > And now 
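The suspend-time check Rafael proposes (refuse to suspend when the irqactions sharing one irq_desc disagree about IRQF_NO_SUSPEND) could look roughly like the following user-space model. The names, the flag values and the error-returning helper are assumptions for illustration; as far as we know, suspend_device_irqs() in the kernel under discussion does not return a value. Unlike failing __setup_irq() for whichever driver registers second, this check does not depend on request order: it only asks whether all actions on a line agree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative bit values, not the kernel's IRQF_* definitions. */
#define MODEL_IRQF_SHARED     (1u << 0)
#define MODEL_IRQF_NO_SUSPEND (1u << 1)

/* Hypothetical stand-ins for struct irqaction / struct irq_desc. */
struct model_irqaction {
	unsigned int flags;
	struct model_irqaction *next; /* other handlers sharing the line */
};

struct model_irq_desc {
	struct model_irqaction *action;
};

/* 0 if all actions on the line agree about NO_SUSPEND, -EINVAL if mixed. */
static int model_check_no_suspend(const struct model_irq_desc *desc)
{
	const struct model_irqaction *a = desc->action;

	if (!a)
		return 0;
	unsigned int first = a->flags & MODEL_IRQF_NO_SUSPEND;
	for (a = a->next; a; a = a->next)
		if ((a->flags & MODEL_IRQF_NO_SUSPEND) != first)
			return -EINVAL;
	return 0;
}

/* Hypothetical suspend step: fail the whole suspend on any mixed line,
 * leaving normal (non-suspend) operation unaffected. */
static int model_suspend_device_irqs(const struct model_irq_desc *descs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = model_check_no_suspend(&descs[i]);

		if (err)
			return err;
	}
	return 0;
}
```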

Re: [PATCH v4 3/5] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-31 Thread Saravana Kannan

On 07/31/2014 02:56 PM, Rafael J. Wysocki wrote:

On Thursday, July 24, 2014 06:07:26 PM Saravana Kannan wrote:

This patch simplifies a lot of the hotplug/suspend code by not
adding/removing/moving the policy/sysfs/kobj during hotplug, and instead leaves
the cpufreq directory and policy in place irrespective of whether the CPUs
are ONLINE/OFFLINE.


I'm still quite unsure how this is going to work with the real CPU hot-remove
that makes the entire sysfs cpu directories go away.  Can you please explain
that?


Sure. Not a problem. I just wanted to make sure you had a chance to look 
at the code first.


Physical hot-remove triggers a "remove" for all the  registered 
subsys_interfaces for that CPU (after going through a couple of 
functions). So, when that happens, the cpufreq subsys_interface remove 
for that CPU gets called. At that point, I clean up that CPU's SW states 
as if it was never plugged in from the start. If that CPU was the owner 
of the sysfs directory, I move it over to a different CPU.
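The remove path described here can be modeled as follows. The sketch is hypothetical: a 32-bit mask stands in for a cpumask, `kobj_cpu` records which CPU's sysfs directory owns the policy kobject, and the function name is invented. The real code would also move the kobject and tear down per-CPU state, which is not shown. Removing a CPU clears it from the policy; if it owned the sysfs directory, ownership passes to the lowest-numbered remaining CPU (an arbitrary choice in this model).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one policy shared by up to 32 CPUs. */
struct model_policy {
	uint32_t related_cpus; /* CPUs whose SW state belongs to this policy */
	int kobj_cpu;          /* owner of the sysfs directory, -1 if none */
};

/* Modeled subsys_interface remove callback for a physical hot-remove:
 * forget the CPU as if it had never been plugged in, and hand sysfs
 * ownership to another CPU if needed. Returns the owner afterwards. */
static int model_cpufreq_remove_dev(struct model_policy *p, int cpu)
{
	p->related_cpus &= ~(UINT32_C(1) << cpu);
	if (p->kobj_cpu == cpu) {
		p->kobj_cpu = -1;
		for (int c = 0; c < 32; c++) {
			if (p->related_cpus & (UINT32_C(1) << c)) {
				p->kobj_cpu = c; /* kernel would move the kobject here */
				break;
			}
		}
	}
	return p->kobj_cpu;
}
```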


-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


[PATCH 4/5] ftrace: Require designated initialization of structures

2014-07-31 Thread Josh Triplett
Mark various ftrace structures with __designated_init.  Fix some ftrace
macros to use designated initializers for those structures.

Signed-off-by: Josh Triplett 
---
 include/linux/ftrace.h   | 4 ++--
 include/linux/ftrace_event.h | 4 ++--
 include/linux/syscalls.h | 8 ++--
 include/trace/ftrace.h   | 8 ++--
 kernel/trace/trace_export.c  | 4 +---
 5 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 404a686..cb2d023 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -260,7 +260,7 @@ struct ftrace_func_command {
int (*func)(struct ftrace_hash *hash,
char *func, char *cmd,
char *params, int enable);
-};
+} __designated_init;
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
@@ -283,7 +283,7 @@ struct ftrace_probe_ops {
 unsigned long ip,
 struct ftrace_probe_ops *ops,
 void *data);
-};
+} __designated_init;
 
 extern int
 register_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops,
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cff3106..25af313 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -198,7 +198,7 @@ struct ftrace_event_class {
struct list_head*(*get_fields)(struct ftrace_event_call *);
struct list_head fields;
int (*raw_init)(struct ftrace_event_call *);
-};
+} __designated_init;
 
 extern int ftrace_event_reg(struct ftrace_event_call *event,
enum trace_reg type, void *data);
@@ -293,7 +293,7 @@ struct ftrace_event_call {
int (*perf_perm)(struct ftrace_event_call *,
 struct perf_event *);
 #endif
-};
+} __designated_init;
 
 static inline const char *
 ftrace_event_name(struct ftrace_event_call *call)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b0881a0..3002648 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -120,9 +120,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
static struct ftrace_event_call __used  \
  event_enter_##sname = {   \
.class  = &event_class_syscall_enter,   \
-   {   \
-   .name   = "sys_enter"#sname,\
-   },  \
+   .name   = "sys_enter"#sname,\
.event.funcs= &enter_syscall_print_funcs,   \
.data   = (void *)&__syscall_meta_##sname,\
.flags  = TRACE_EVENT_FL_CAP_ANY,   \
@@ -136,9 +134,7 @@ extern struct trace_event_functions exit_syscall_print_funcs;
static struct ftrace_event_call __used  \
  event_exit_##sname = {\
.class  = &event_class_syscall_exit,\
-   {   \
-   .name   = "sys_exit"#sname, \
-   },  \
+   .name   = "sys_exit"#sname, \
.event.funcs= &exit_syscall_print_funcs,\
.data   = (void *)&__syscall_meta_##sname,\
.flags  = TRACE_EVENT_FL_CAP_ANY,   \
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 26b4f2e..095aaca 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -699,9 +699,7 @@ static struct ftrace_event_class __used __refdata event_class_##call = { \
\
 static struct ftrace_event_call __used event_##call = { \
.class  = &event_class_##template,  \
-   {   \
-   .tp = &__tracepoint_##call, \
-   },  \
+   .tp = &__tracepoint_##call, \
.event.funcs= &ftrace_event_type_funcs_##template,  \
.print_fmt  = print_fmt_##template, \
.flags  = TRACE_EVENT_FL_TRACEPOINT,\
@@ -716,9 +714,7 @@ static const char print_fmt_##call[] = print;   \
\
 static 
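The syscalls.h and ftrace.h hunks above replace a brace-enclosed initializer for an anonymous union with a plain designator. A small, hypothetical struct shows the difference in standard C11: the brace block after `.class_id` initializes whatever member comes next in declaration order, so reordering fields silently changes its meaning, while `.name = ...` names the union member directly. The `__designated_init` annotation itself is defined elsewhere in this series and is not reproduced here; as we understand it, it maps to GCC's `designated_init` attribute where the compiler supports it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct ftrace_event_call, reduced to three members. */
struct model_event_call {
	int class_id;
	union {                 /* anonymous union (C11) */
		const char *name;
		const void *tp;
	};
	unsigned int flags;
};

/* Old style, as in the removed lines: the inner braces initialize the
 * member that follows .class_id in declaration order, i.e. the union. */
static const struct model_event_call ev_old = {
	.class_id = 1,
	{ .name = "sys_enter_open" },
	.flags = 4,
};

/* New style, as in the added lines: the union member is designated directly. */
static const struct model_event_call ev_new = {
	.class_id = 1,
	.name = "sys_enter_open",
	.flags = 4,
};
```

Both objects end up with the same contents; only the new form keeps working if a field is inserted between `class_id` and the union.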
