Re: [RFC PATCH 0/2] dirreadahead system call
On Aug 1, 2014, at 1:53, Dave Chinner wrote:
> On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote:
>> None of these issues are relevant in the API that I'm thinking about.
>> The syscall just passes the list of inode numbers to be prefetched
>> into kernel memory, and then stat() is used to actually get the data into
>> userspace (or whatever other operation is to be done on them),
>> so there is no danger if the wrong inode is prefetched.  If the inode
>> number is bad the filesystem can just ignore it.
>
> Which means the filesystem has to treat the inode number as
> potentially hostile. i.e. it can not be trusted to be correct and so
> must take slow paths to validate the inode numbers. This adds
> *significant* overhead to the readahead path for some filesystems:
> readahead is only a win if it is low cost.
>
> For example, on XFS every untrusted inode number lookup requires an
> inode btree lookup to validate the inode is actually valid on disk
> and that it is allocated and has references. That lookup serialises
> against inode allocation/freeing as well as other lookups. In
> comparison, when using a trusted inode number from a directory
> lookup within the kernel, we only need to do a couple of shift and
> mask operations to convert it to a disk address and we are good to
> go.
>
> i.e. the difference is at least 5 orders of magnitude higher CPU usage
> for an "inode number readahead" syscall versus a "directory
> readahead" syscall, it has significant serialisation issues and it
> can stall other modification/lookups going on at the same time.
> That's *horrible behaviour* for a speculative readahead operation,
> but because the inode numbers are untrusted, we can't avoid it.

For ext4 this is virtually free.  The same needs to happen for any
inode number from NFS so it can't be that bad.

Also, since this API would be prefetching inodes in bulk it could
presumably optimize this to some extent.
> So, again, it's way more overhead than userspace just calling
> stat() asynchronously on many files at once as readdir/getdents
> returns dirents from the kernel to speed up cache population.

To me this seems like saying it is just as fast to submit hundreds of
256-byte random reads for a file as it is for large linear reads with
readahead.  Yes, it is possible for the kernel to optimize the random
read workload to some extent, but not as easily as getting reads in
order in the first place.

> That's my main issue with this patchset - it's implementing
> something in kernelspace that can *easily* be done generically in
> userspace without introducing all sorts of nasty corner cases that
> we have to handle in the kernel. We only add functionality to the
> kernel if there's a compelling reason to do it in kernelspace, and
> right now I just don't see any numbers that justify adding
> readdir+stat() readahead or inode number based cache population in
> kernelspace.

The original patch showed there was definitely a significant win with
the prefetch case over the single threaded readdir+stat.  There is
also something to be said for keeping complexity out of applications.
Sure it is possible for apps to get good performance from AIO, but very
few do so because of complexity.

> Before we add *any* syscall for directory readahead, we need
> comparison numbers against doing the dumb multithreaded
> userspace readahead of stat() calls. If userspace can do this as
> fast as the kernel can

I'd be interested to see this also, but my prediction is that this will
not deliver the kind of improvements you are expecting.

Cheers, Andreas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
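For reference, the "dumb multithreaded userspace readahead of stat() calls" that the thread asks to have benchmarked could look roughly like the sketch below. It is not code from the patch set or from either poster: the function name prefetch_stat(), the struct stat_worker type and the striding scheme are assumptions made for illustration, and a real benchmark would also need to drop caches between runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper type; nothing here comes from the patch set. */
struct stat_worker {
    pthread_t tid;
    int dfd;             /* directory fd used by fstatat() */
    char **names;        /* entry names collected by readdir() */
    size_t count;        /* number of collected names */
    size_t first, step;  /* stats names[first], names[first + step], ... */
    long ok;             /* successful stat calls */
};

static void *stat_worker_fn(void *arg)
{
    struct stat_worker *w = arg;
    struct stat st;

    for (size_t i = w->first; i < w->count; i += w->step)
        if (fstatat(w->dfd, w->names[i], &st, AT_SYMLINK_NOFOLLOW) == 0)
            w->ok++;
    return NULL;
}

/*
 * Read every name in `path`, then stat the entries from `nthreads` threads
 * so that the inode cache is populated in parallel.
 * Returns the number of entries stat'ed, or -1 on error.
 */
long prefetch_stat(const char *path, size_t nthreads)
{
    char **names = NULL;
    size_t count = 0, cap = 0, started = 0;
    struct stat_worker *w = NULL;
    struct dirent *de;
    long total = -1;
    DIR *dir = opendir(path);

    if (!dir)
        return -1;
    if (nthreads == 0)
        nthreads = 1;

    /* Pass 1: collect the names, as readdir()/getdents would return them. */
    while ((de = readdir(dir)) != NULL) {
        if (count == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            char **tmp = realloc(names, ncap * sizeof(*tmp));
            if (!tmp)
                goto out;
            names = tmp;
            cap = ncap;
        }
        names[count] = strdup(de->d_name);
        if (!names[count])
            goto out;
        count++;
    }

    /* Pass 2: stat the entries, one stride of the list per worker. */
    w = calloc(nthreads, sizeof(*w));
    if (!w)
        goto out;
    for (size_t t = 0; t < nthreads; t++) {
        w[t].dfd = dirfd(dir);
        w[t].names = names;
        w[t].count = count;
        w[t].first = t;
        w[t].step = nthreads;
        /* Fall back to the calling thread if a worker cannot be created. */
        if (started == t &&
            pthread_create(&w[t].tid, NULL, stat_worker_fn, &w[t]) == 0)
            started++;
        else
            stat_worker_fn(&w[t]);
    }
    total = 0;
    for (size_t t = 0; t < nthreads; t++) {
        if (t < started)
            pthread_join(w[t].tid, NULL);
        total += w[t].ok;
    }
out:
    free(w);
    for (size_t i = 0; i < count; i++)
        free(names[i]);
    free(names);
    closedir(dir);
    return total;
}
```

Calling prefetch_stat("/some/dir", 8) before a single-threaded readdir+stat pass would give one data point for the comparison requested above; the thread count is a tunable, not a recommendation.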
Re: [PATCH/RFC] autofs: the documentation I wanted to read
On Tue, 2014-07-29 at 12:00 +1000, NeilBrown wrote: > > This documents autofs from the perspective of what the module actually > supports rather than how automount is expected to use it. > It is based mostly on code review and very little on testing so it > may be inaccurate in some places. > > The document assumes the functionality added by the RCU-walk patches > that I posted recently. > > It is formatted using "markdown" and works best with Markdown.pl > (markdown_py doesn't like some constructs). > > > Signed-off-by: NeilBrown Acked-by: Ian Kent There are a couple of places that might need more work but it's better to have this now and to work with it in future than to hold it up. Especially since I can't quite nail down what it was that didn't quite sound right when reading it. Excellent job Neil, thanks very much. Ian > > diff --git a/Documentation/filesystems/autofs4.txt > b/Documentation/filesystems/autofs4.txt > new file mode 100644 > index ..45f67c83d713 > --- /dev/null > +++ b/Documentation/filesystems/autofs4.txt > @@ -0,0 +1,503 @@ > + > + p { max-width:50em} ol, ul {max-width: 40em} > + > + > +autofs - how it works > += > + > +Purpose > +--- > + > +The goal of autofs is to provide on-demand mounting and race free > +automatic unmounting of various other filesystems. This provides two > +key advantages: > + > +1. There is no need to delay boot until all filesystems that > + might be needed are mounted. Processes that try to access those > + slow filesystems might be delayed but other processes can > + continue freely. This is particularly important for > + network filesystems (e.g. NFS) or filesystems stored on > + media with a media-changing robot. > + > +2. The names and locations of filesystems can be stored in > + a remote database and can change at any time. The content > + in that data base at the time of access will be used to provide > + a target for the access. 
The interpretation of names in the > + filesystem can even be programatic rather than database-backed, > + allowing wildcards for example, and can vary based on the user who > + first accessed a name. > + > +Context > +--- > + > +The "autofs4" filesystem module is only one part of an autofs system. > +There also needs to be a user-space program which looks up names > +and mounts filesystems. This will often be the "automount" program, > +though other tools including "systemd" can make use of "autofs4". > +This document describes only the kernel module and the interactions > +required with any user-space program. Subsequent text refers to this > +as the "automount daemon" or simply "the daemon". > + > +"autofs4" is a Linux kernel module with provides the "autofs" > +filesystem type. Several "autofs" filesystems can be mounted and they > +can each be managed separately, or all managed by the same daemon. > + > +Content > +--- > + > +An autofs filesystem can contain 3 sorts of objects: directories, > +symbolic links and mount traps. Mount traps are directories with > +extra properties as described in the next section. > + > +Objects can only be created by the automount daemon: symlinks are > +created with a regular `symlink` systemcall, while directories and > +mount traps are created with `mkdir`. The determination of whether a > +directory should be a mount trap or not is quite _ad hoc_, largely for > +historical reasons, and is determined in part the > +*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option. > + > +If neither the *direct* or *offset* mount options are given (so the > +mount is considered to be *indirect*), then the root directory is > +always a regular directory, otherwise it is a mount trap when it is > +empty and a regular directory when not empty. 
Note that *direct* and > +*offset* are treated identically so a concise summary is that the root > +directory is a mount trap only if the filesystem is mounted *direct* > +and the root is empty. > + > +Directories created in the root directory are mount traps only if the > +filesystem is mounted *indirect* and they are empty. > + > +Directories further down the tree depend on the *max_proto* mount > +option and particularly whether it is less than five or not. > +When *max_proto* is five, no directories further down the > +tree are ever mount traps, they are always regular directories. When > +the *max_proto* is four (or three), these directories are mount traps > +precisely when they are empty. > + > +So: non-empty (i.e. non-leaf) directories are never mount traps. Empty > +directories are sometimes mount traps, and sometimes not depending on > +where in the tree they are (root, top level, or lower) the *maxproto*, > +and whether the mount was *indirect* or not. > + > +Mount Traps > +--- > + > +A core element of the implementation of autofs is the Mount Traps > +which are provided by the Linux VFS. Any directory provided by a >
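The mount-trap rules spelled out in the quoted document (root, top level, lower levels, each depending on emptiness, the direct/indirect option and the protocol version) can be condensed into one predicate. The following is only a sketch of the rules as the document words them, not code from the autofs4 module; the enum, the function name and the parameter names are hypothetical.

```c
#include <stdbool.h>

/* Where a directory sits in the autofs tree (hypothetical names). */
enum autofs_level {
    AUTOFS_ROOT,       /* the root directory of the autofs mount */
    AUTOFS_TOP_LEVEL,  /* a directory created directly in the root */
    AUTOFS_LOWER       /* anything further down the tree */
};

/*
 * Decide whether a directory is a mount trap, following the quoted text.
 * `indirect` is true when neither the direct nor the offset option was
 * given; direct and offset are treated identically, so one flag suffices.
 */
bool autofs_is_mount_trap(enum autofs_level level, bool empty,
                          bool indirect, int maxproto)
{
    if (!empty)
        return false;            /* non-empty directories are never traps */

    switch (level) {
    case AUTOFS_ROOT:
        return !indirect;        /* only for direct (or offset) mounts */
    case AUTOFS_TOP_LEVEL:
        return indirect;         /* only for indirect mounts */
    case AUTOFS_LOWER:
        return maxproto < 5;     /* protocol 3 or 4 only */
    }
    return false;
}
```

Writing the rules this way makes the asymmetry visible: the root and the top level are governed by the mount type, while everything below is governed only by the protocol version.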
Re: [PATCH v2 0/3 net-next] Lockless netlink_lookup() with new concurrent hash table
From: David Miller
Date: Thu, 31 Jul 2014 22:39:46 -0700 (PDT)

> Looks great, series applied, thanks!

Actually, this needs more work, reverted:

net/netfilter/nft_hash.c: In function ‘nft_hash_destroy’:
net/netfilter/nft_hash.c:183:3: error: ‘ht’ undeclared (first use in this function)
net/netfilter/nft_hash.c:183:3: note: each undeclared identifier is reported only once for each function it appears in
Re: perf tools: Question about kmem and kernel symbol resolution
On Thu, 31 Jul 2014 11:27:11 -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Jul 31, 2014 at 05:35:32PM +0900, Namhyung Kim wrote:
>> I'm looking kernel symbol mismatch issue, and found something in perf
>> kmem code.  The commit e727ca73f85d ("perf kmem: Resolve kernel
>> symbols again") added perf_session__create_kernel_maps() but I don't
>> know why.  Why did it miss the MMAP event?
>
>> I think if we create a kernel maps at report time, it might not match
>> to samples in a perf.data if it's recorded on a different kernel.
>> This is the main reason of the mismatch problem I'm currently chasing
>> IMHO.  What am I missing?
>
> From a quick look, nothing, i.e. we can not call
> perf_session__create_kernel_maps() at that point, as it will create the
> kernel maps from the running kernel and use it with events from the
> kernel that was in place when the perf.data file being processed was
> created.
>
> Perhaps that problem was fixed somewhere else and we should just revert
> that patch?
>
> Have you tried just reverting it and checking that the results are the
> expected ones? I.e. that there is the kernel MMAP event in perf.data
> file and that it gets properly processed?

Simply reverting ended up with no symbols, although the file does
contain the MMAP event.  Then I found the reason: the kmem tool simply
doesn't register mmap event handlers. :-/

Adding mmap[2] handlers + reverting ended up with the expected output.
I'll send the fix soon.

Thanks,
Namhyung
Oops in scsi_put_host_cmd_pool
During test of Xen pvSCSI frontend module I found the following issue: When unplugging a passed-through SCSI-device the SCSI Host is removed. When calling the final scsi_host_put() from the driver an Oops is happening: [ 219.816292] (file=drivers/scsi/xen-scsifront.c, line=808) scsifront_remove: device/vscsi/1 removed [ 219.816371] BUG: unable to handle kernel NULL pointer dereference at 0010 [ 219.816380] IP: [] scsi_put_host_cmd_pool+0x38/0xb0 [ 219.816390] PGD 3bd60067 PUD 3d353067 PMD 0 [ 219.816396] Oops: [#1] SMP [ 219.816400] Modules linked in: nls_utf8 sr_mod cdrom xen_scsifront xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables x86_pkg_temp_thermal thermal_sys coretemp hwmon crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr sg dm_mod autofs4 scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh xen_blkfront xen_netfront [ 219.816458] CPU: 0 PID: 23 Comm: xenwatch Not tainted 3.16.0-rc6-11-xen+ #66 [ 219.816463] task: 88003da985d0 ti: 88003da9c000 task.ti: 88003da9c000 [ 219.816467] RIP: e030:[] [] scsi_put_host_cmd_pool+0x38/0xb0 [ 219.816474] RSP: e02b:88003da9fc20 EFLAGS: 00010202 [ 219.816477] RAX: a01a50c0 RBX: RCX: 0003 [ 219.816481] RDX: 0240 RSI: 88003d242b80 RDI: 80c708c0 [ 219.816485] RBP: 88003da9fc38 R08: 4f7e974a31ed0290 R09: [ 219.816488] R10: 7ff0 R11: 0001 R12: 8800038f8000 [ 219.816491] R13: a01a50c0 R14: R15: [ 219.816498] FS: 7fe2e2eeb700() GS:88003f80() knlGS: [ 219.816502] CS: e033 DS: ES: CR0: 80050033 [ 219.816505] CR2: 0010 CR3: 3d20c000 CR4: 00042660 [ 219.816509] Stack: [ 219.816511] 8800038f8000 8800038f8030 880003ae3400 88003da9fc58 [ 219.816516] 805fe78b 8800038f8000 88003bb82c40 
88003da9fc80 [ 219.816521] 805ff587 8800038f81a0 8800038f8190 880003ae3400 [ 219.816527] Call Trace: [ 219.816533] [] scsi_destroy_command_freelist+0x5b/0x60 [ 219.816538] [] scsi_host_dev_release+0x97/0xe0 [ 219.816543] [] device_release+0x2d/0xa0 [ 219.816548] [] kobject_cleanup+0x77/0x1b0 [ 219.816553] [] kobject_put+0x30/0x60 [ 219.816556] [] put_device+0x12/0x20 [ 219.816560] [] scsi_host_put+0x10/0x20 [ 219.816565] [] scsifront_free+0x42/0x90 [xen_scsifront] [ 219.816569] [] scsifront_remove+0x1d/0x50 [xen_scsifront] [ 219.816576] [] xenbus_dev_remove+0x50/0xa0 [ 219.816580] [] __device_release_driver+0x7a/0xf0 [ 219.816584] [] device_release_driver+0x1e/0x30 [ 219.816588] [] bus_remove_device+0x100/0x180 [ 219.816593] [] device_del+0x121/0x1b0 [ 219.816596] [] device_unregister+0x19/0x60 [ 219.816601] [] xenbus_dev_changed+0x9e/0x1e0 [ 219.816606] [] ? _raw_spin_unlock_irqrestore+0x25/0x50 [ 219.816611] [] ? unregister_xenbus_watch+0x200/0x200 [ 219.816615] [] frontend_changed+0x20/0x50 [ 219.816619] [] xenwatch_thread+0x9f/0x160 [ 219.816624] [] ? prepare_to_wait_event+0xf0/0xf0 [ 219.816628] [] kthread+0xcd/0xf0 [ 219.816633] [] ? kthread_create_on_node+0x170/0x170 [ 219.816638] [] ret_from_fork+0x7c/0xb0 [ 219.816642] [] ? kthread_create_on_node+0x170/0x170 [ 219.816645] Code: 8b af c0 00 00 00 48 c7 c7 c0 08 c7 80 e8 d1 e0 19 00 49 8b 84 24 c0 00 00 00 8b 90 48 01 00 00 85 d2 74 2f 48 8b 98 50 01 00 00 <8b> 43 10 85 c0 74 68 83 e8 01 85 c0 89 43 10 74 37 48 c7 c7 c0 [ 219.816732] RIP [] scsi_put_host_cmd_pool+0x38/0xb0 [ 219.816747] RSP [ 219.816750] CR2: 0010 [ 219.816753] ---[ end trace c6915ea21a3d05f7 ]--- I should mention I've specified .cmd_len in the scsi_host_template. The only other driver doing this seems to be virtio_scsi.c, so I assume the same problem could occur with passed-through SCSI devices under KVM... 
Juergen
Re: [PATCH v2 0/3 net-next] Lockless netlink_lookup() with new concurrent hash table
From: Thomas Graf
Date: Fri, 1 Aug 2014 00:56:00 +0200

> Netlink sockets are maintained in a hash table to allow efficient lookup
> via the port ID for unicast messages. However, lookups currently require
> a read lock to be taken. This series adds a new generic, resizable,
> scalable, concurrent hash table based on the paper referenced in the first
> patch. It then makes use of the new data type to implement lockless
> netlink_lookup().
>
> Patch 3/3 to convert nft_hash is included for reference but should be
> merged via the netfilter tree. Inclusion in this series is to provide
> context for the suggested API.
>
> Against net-next since the initial user of the new hash table is in net/
>
> Changes:
> v1-v2:
>  - fixed traversal off-by-one as spotted by Tobias Klauser
>  - removed unlikely() from BUG_ON() as spotted by Josh Triplett
>  - new 3rd patch to convert nft_hash to rhashtable
>  - make rhashtable_insert() return void
>  - nl_sk_hash_lock must be a mutex
>  - fixed wrong name of rht_shrink_below_30()
>  - exported symbols rht_grow_above_75() and rht_shrink_below_30()
>  - allow table freeing with RCU callback

Looks great, series applied, thanks!
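The changelog names two resize helpers, rht_grow_above_75() and rht_shrink_below_30(), whose names suggest load-factor thresholds of 75% and 30%. The sketch below shows what such a policy could look like in isolation. It is an assumption-laden illustration, not the code from the series: the function names, the integer arithmetic and the min_buckets floor are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers; they are not the rhashtable functions themselves. */

/* Grow once the table is more than 75% full. */
bool table_should_grow(size_t nelems, size_t nbuckets)
{
    return nelems > nbuckets / 4 * 3;
}

/* Shrink once the table is less than 30% full. */
bool table_should_shrink(size_t nelems, size_t nbuckets)
{
    return nelems < nbuckets * 3 / 10;
}

/* Pick the bucket count to use next, doubling or halving as needed. */
size_t table_next_size(size_t nelems, size_t nbuckets, size_t min_buckets)
{
    if (table_should_grow(nelems, nbuckets))
        return nbuckets * 2;
    if (table_should_shrink(nelems, nbuckets) && nbuckets / 2 >= min_buckets)
        return nbuckets / 2;
    return nbuckets;
}
```

The gap between the two thresholds gives hysteresis: doubling at just over 75% leaves the table about 37.5% full, which is above the shrink threshold, and halving at just under 30% leaves it about 60% full, which is below the grow threshold, so a resize never immediately triggers the opposite resize.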
Re: [PATCH 5/6] ARM: DTS: da850-evm: Add node for tlv320aic3106 codec
On 07/31/2014 05:24 PM, Sergei Shtylyov wrote:
> Hello.
>
> On 07/31/2014 02:18 PM, Peter Ujfalusi wrote:
>
>> The board uses aic3106 for audio.
>
>> Signed-off-by: Peter Ujfalusi
>> ---
>>  arch/arm/boot/dts/da850-evm.dts | 14 ++
>>  1 file changed, 14 insertions(+)
>
>> diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
>> index 09118c72e83f..b9ef2be0b145 100644
>> --- a/arch/arm/boot/dts/da850-evm.dts
>> +++ b/arch/arm/boot/dts/da850-evm.dts
>> @@ -51,6 +51,20 @@
>>  		tps: tps@48 {
>>  			reg = <0x48>;
>>  		};
>> +		tlv320aic3106: tlv320aic3106@1b {
>
>    The "reg" property is <0x18>, why the unit-address part of a name is
> different?

True, I have lifted the codec part from other dts file and overlooked
the unit-address. I will resend the series with this fixed.

> Also, the ePAPR standard [1] says:
>
> The name of a node should be somewhat generic, reflecting the function of the
> device and not its precise programming model.

True. This is why the node for the audio support is named 'sound'.
For the components, as in this case, I do not see an issue with
calling the audio codec by its name.

>
>> +			#sound-dai-cells = <0>;
>> +			compatible = "ti,tlv320aic3106";
>> +			reg = <0x18>;
>> +			status = "okay";
>> +
>> +			/* Regulators */
>> +			IOVDD-supply = <_reg>;
>> +			/* Derived from VBAT: Baseboard 3.3V / 1.8V */
>> +			AVDD-supply = <>;
>> +			DRVDD-supply = <>;
>> +			DVDD-supply = <>;
>> +		};
>> +
>
> [1] http://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.0.pdf

BTW: there's a newer version available:
https://www.power.org/wp-content/uploads/2012/06/Power_ePAPR_APPROVED_v1.1.pdf

>
> WBR, Sergei
>

--
Péter
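The review comment above boils down to a mechanical check: the hexadecimal unit-address after the '@' in a node name should equal the first address in the node's "reg" property. A small sketch of that check follows. The function name unit_address_matches() is hypothetical and this is not how any device tree tool implements it; it also simplifies by comparing against a single address cell.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true when the unit-address in `node_name` (the hex digits after
 * '@') equals `reg_base`, the first address given in the node's "reg"
 * property. Names without a unit-address never match.
 */
bool unit_address_matches(const char *node_name, unsigned long reg_base)
{
    const char *at = strchr(node_name, '@');
    char *end;
    unsigned long unit;

    if (!at || at[1] == '\0')
        return false;

    unit = strtoul(at + 1, &end, 16);   /* unit-addresses are hexadecimal */
    return *end == '\0' && unit == reg_base;
}
```

With the node from the patch, unit_address_matches("tlv320aic3106@1b", 0x18) is false, which is the mismatch Sergei pointed out, while "tlv320aic3106@18" would pass.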
Re: linux-next: manual merge of the kvm-arm tree with Linus' tree
Hi Christoffer,

On Thu, 31 Jul 2014 16:23:47 +0200 Christoffer Dall wrote:
>
> Stephen, did you pick up the resolution provided by Marc for the gicv2
> fix patch so that it applies to tomorrow's next/kvmarm merge?

Yes, I have.  You will need to produce the same for Linus eventually, or
do the merge yourself as I suggested.

--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
Re: [PATCH 2/6] ARM: DTS: da850: Add node for edma0
On 07/31/2014 05:26 PM, Sergei Shtylyov wrote:
> On 07/31/2014 02:18 PM, Peter Ujfalusi wrote:
>
>> Add DT node for edma0.
>
>> Signed-off-by: Peter Ujfalusi
>> ---
>>  arch/arm/boot/dts/da850.dtsi | 6 ++
>>  1 file changed, 6 insertions(+)
>
>> diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
>> index b695548dbb4e..41ce4e8bf227 100644
>> --- a/arch/arm/boot/dts/da850.dtsi
>> +++ b/arch/arm/boot/dts/da850.dtsi
>> @@ -150,6 +150,12 @@
>>  			};
>>
>>  		};
>> +		edma0: edma@01c0 {
>> +			compatible = "ti,edma3";
>> +			reg =<0x0 0x1>;
>
>    Why the mismatch between the unit-address part of the node name and the
> "reg" property?

For some reason the whole da850 uses offset from 0x01c0 for the SoC IPs.
The nodes are under 'soc' and that has the ranges attribute.
I do not really like this either.

>
>> +			interrupts = <11 13 12>;
>> +			#dma-cells = <1>;
>> +		};
>
> WBR, Sergei
>

--
Péter
Re: [PATCH 1/1] AX88179_178A: Add ethtool ops for EEE support
From: fre...@asix.com.tw
Date: Thu, 31 Jul 2014 19:06:35 +0800

> From: Freddy Xin
>
> Add functions to support ethtool EEE manipulating, and the EEE
> is disabled in default setting to enhance the compatibility
> with certain switch.
>
> Signed-off-by: Freddy Xin

Applied, thanks.
Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps
On Tue, 22 Jul 2014, Jerome Marchand wrote: > Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to > /proc//smaps for shmem mappings. > > ShmOther: amount of memory that is currently resident in memory, not > present in the page table of this process but present in the page > table of an other process. > ShmOrphan: amount of memory that is currently resident in memory but > not present in any process page table. This can happens when a process > unmaps a shared mapping it has accessed before or exits. Despite being > resident, this memory is not currently accounted to any process. > ShmSwapcache: amount of memory currently in swap cache > ShmSwap: amount of memory that is paged out on disk. > > Signed-off-by: Jerome Marchand You will have to do a much better job of persuading me that these numbers are of any interest. Okay, maybe not me, I'm not that keen on /proc//smaps at the best of times. But you will need to show plausible cases where having these numbers available would have made a real difference, and drum up support for their inclusion from /proc//smaps devotees. Do you have a customer, who has underprovisioned with swap, and wants these numbers to work out how much more is needed? As it is, they appear to be numbers that you found you could provide, and so you're adding them into /proc//smaps, but having great difficulty in finding good names to describe them - which is itself an indicator that they're probably not the most useful statistics a sysadmin is wanting. (Google is a /proc//smaps user: let's take a look to see if we have been driven to add in stats of this kind: no, not at all.) The more numbers we add to /proc//smaps, the longer it will take to print, the longer mmap_sem will be held, and the more it will interfere with proper system operation - that's the concern I more often see. 
> --- > Documentation/filesystems/proc.txt | 11 > fs/proc/task_mmu.c | 56 > +- > 2 files changed, 66 insertions(+), 1 deletion(-) > > diff --git a/Documentation/filesystems/proc.txt > b/Documentation/filesystems/proc.txt > index 1a15c56..a65ab59 100644 > --- a/Documentation/filesystems/proc.txt > +++ b/Documentation/filesystems/proc.txt > @@ -422,6 +422,10 @@ Swap: 0 kB > KernelPageSize:4 kB > MMUPageSize: 4 kB > Locked: 374 kB > +ShmOther:124 kB > +ShmOrphan: 0 kB > +ShmSwapCache: 12 kB > +ShmSwap: 36 kB > VmFlags: rd ex mr mw me de > > the first of these lines shows the same information as is displayed for the > @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous > pages: when MAP_PRIVATE > and a page is modified, the file page is replaced by a private anonymous > copy. > "Swap" shows how much would-be-anonymous memory is also used, but out on > swap. > +The ShmXXX lines only appears for shmem mapping. They show the amount of > memory > +from the mapping that is currently: > + - resident in RAM, not present in the page table of this process but present > + in the page table of an other process (ShmOther) We don't show that for files of any other filesystem, why for shmem? Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs. It is a very specialized statistic, and therefore hard to name: I don't think ShmOther is a good name, but doubt any would do. ShmOtherMapped? > + - resident in RAM but not present in the page table of any process > (ShmOrphan) We don't show that for files of any other filesystem, why for shmem? Orphan? We do use the word "orphan" to describe pages which have been truncated off a file, but somehow not yet removed from pagecache. We don't use the the word "orphan" to describe pagecache pages which are not mapped into userspace - they are known as "pagecache pages which are not mapped into userspace". ShmNotMapped? > + - in swap cache (ShmSwapCache) Is this interesting? 
It's a transitional state: either memory pressure has forced the page to swapcache, but not yet freed it from memory; or swapin_readahead has brought this page back in when bringing in a nearby page of swap. I can understand that we might want better stats on the behaviour of swapin_readahead; better stats on shmem objects and swap; better stats on duplication between pagecache and swap; but I'm not convinced that /proc//smaps is the right place for those. Against all that, of course, we do have mincore() showing these pages as incore, where /proc//smaps does not. But I think that is justified by mincore()'s mission to show what's incore. > + - paged out on swap (ShmSwap). This one has the best case for inclusion: we do show Swap for the anon pages which are out on swap, but not for the shmem areas, where swap entry does not go into page table. But there is good reason for that: this is shared memory, files, objects commonly shared between processes, so it's a poor fit
Re: [PATCH 4/5] mm, shmem: Add shmem swap memory accounting
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds get_mm_shswap() which computes the size of swapped out shmem. It
> does so by pagewalking the mm and using the new shmem_locate() function
> to get the physical location of shmem pages.
> The result is displayed in the new VmShSw line of /proc/<pid>/status.
> Use mm_walk and shmem_locate() to account paged out shmem pages.
>
> It significantly slows down /proc/<pid>/status access speed when
> there is a big shmem mapping. If that is an issue, we can drop this
> patch and only display this counter in the inherently slower
> /proc/<pid>/smaps file (cf. next patch).
>
> Signed-off-by: Jerome Marchand

Definite NAK to this one.  As you guessed yourself, it is always a
mistake to add one potentially very slow-to-gather number to a stats
file showing a group of quickly gathered numbers.

Is there anything you could do instead?  I don't know if it's worth
the (little) extra mm_struct storage and maintenance, but you could
add a VmShmSize, which shows that subset of VmSize (total_vm) which
is occupied by shmem mappings.

It's ambiguous what to deduce when VmShm is less than VmShmSize:
the difference might be swapped out, it might be holes in the sparse
object, it might be instantiated in the object but never faulted
into the mapping: in general it will be a mix of all of those.
So, sometimes useful info, but easy to be misled by it.

As I say, I don't know if VmShmSize would be worth adding, given its
deficiencies; and it could be worked out from /proc/<pid>/maps anyway.

Hugh
Re: [PATCH 3/5] mm, shmem: Add shmem_vma() helper
On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Add a simple helper to check if a vm area belongs to shmem.
>
> Signed-off-by: Jerome Marchand
> ---
>  include/linux/mm.h | 6 ++
>  mm/shmem.c         | 8
>  2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 34099fa..04a58d1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
>
>  extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
>  bool shmem_mapping(struct address_space *mapping);
> +bool shmem_vma(struct vm_area_struct *vma);
> +
>  #else
>  static inline bool shmem_mapping(struct address_space *mapping)
>  {
>  	return false;
>  }
> +static inline bool shmem_vma(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
>  #endif

I would prefer include/linux/shmem_fs.h for this (and one of us clean
up where the declarations of shmem_zero_setup and shmem_mapping live).

But if 4/5 goes away, then there will only be one user of shmem_vma(),
so in that case better just declare it (using shmem_mapping()) there in
task_mmu.c in the smaps patch.

>
>  extern int can_do_mlock(void);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8aa4892..7d16227 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1483,6 +1483,14 @@ bool shmem_mapping(struct address_space *mapping)
>  	return mapping->backing_dev_info == &shmem_backing_dev_info;
>  }
>
> +bool shmem_vma(struct vm_area_struct *vma)
> +{
> +	return (vma->vm_file &&
> +		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
> +		== &shmem_backing_dev_info);
> +

I agree with Oleg,

	vma->vm_file && shmem_mapping(file_inode(vma->vm_file)->i_mapping);

would be better,

Hugh
Re: [PATCH 2/5] mm, shmem: Add shmem_locate function
On Tue, 22 Jul 2014, Jerome Marchand wrote: > The shmem subsytem is kind of a black box: the generic mm code can't I'm happier with that black box than you are :) > always know where a specific page physically is. This patch adds the > shmem_locate() function to find out the physical location of shmem > pages (resident, in swap or swapcache). If the optional argument count > isn't NULL and the page is resident, it also returns the mapcount value > of this page. > This is intended to allow finer accounting of shmem/tmpfs pages. > > Signed-off-by: Jerome Marchand > --- > include/linux/mm.h | 7 +++ > mm/shmem.c | 29 + > 2 files changed, 36 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index e69ee9d..34099fa 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1066,6 +1066,13 @@ extern bool skip_free_areas_node(unsigned int flags, > int nid); > > int shmem_zero_setup(struct vm_area_struct *); > #ifdef CONFIG_SHMEM > + > +#define SHMEM_NOTPRESENT 1 /* page is not present in memory */ > +#define SHMEM_RESIDENT 2 /* page is resident in RAM */ > +#define SHMEM_SWAPCACHE 3 /* page is in swap cache */ > +#define SHMEM_SWAP 4 /* page is paged out */ > + > +extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int > *count); Please place these, or what's needed of them, in include/linux/shmem_fs.h, rather than in the very overloaded include/linux/mm.h. You will need a !CONFIG_SHMEM stub for shmem_locate(), or whatever it ends up being called. > bool shmem_mapping(struct address_space *mapping); Oh, you're following a precedent, that's already bad placement. And it (but not its !CONFIG_SHMEM stub) is duplicated in shmem_fs.h. Perhaps because we were moving shmem_zero_setup() from mm.h to shmem_fs.h some time ago, but never got around to cleaning up the old location. Well, please place the new ones in shmem_fs.h, and I ought to clean up the rest at a time which does not interfere with you. 
> #else
> static inline bool shmem_mapping(struct address_space *mapping)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b16d3e7..8aa4892 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1341,6 +1341,35 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	return ret;
>  }
>
> +int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count)

I don't find that a helpful name; but in 5/5 I question the info you're
gathering here - maybe a good name will be more obvious once we've cut
down what it's gathering.

I just noticed that in 5/5 you're using a walk->pte_hole across empty
extents: perhaps I'm prematurely optimizing, but that feels very
inefficient, maybe here you should use a radix_tree lookup of the
extent.

If all we had to look up were the number of swap entries, in the vast
majority of cases shmem.c could just see info->swapped is 0 and spend
no time on radix_tree lookups at all.

But what happens here depends on what really needs to be shown in 5/5.

Hugh
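The info->swapped shortcut mentioned above can be illustrated with a userspace mock. Everything in the sketch is hypothetical: struct mock_shmem_object stands in for the per-inode shmem info, a plain array stands in for the radix tree, and count_swapped() is not a kernel function. It only shows the shape of the optimization, namely checking a cheap counter before paying for per-slot lookups.

```c
#include <stddef.h>

enum slot_state { SLOT_HOLE, SLOT_RESIDENT, SLOT_SWAPPED };

/* Mock of a per-object record: a swapped-entry counter plus the slots. */
struct mock_shmem_object {
    unsigned long swapped;          /* how many slots are SLOT_SWAPPED */
    const enum slot_state *slots;   /* stand-in for the radix tree */
    size_t nslots;
};

/*
 * Count swapped slots in [start, end). *lookups is set to the number of
 * slots that had to be examined, so the cost of each path is visible.
 */
unsigned long count_swapped(const struct mock_shmem_object *obj,
                            size_t start, size_t end, size_t *lookups)
{
    unsigned long n = 0;

    *lookups = 0;
    if (obj->swapped == 0)
        return 0;                   /* fast path: nothing is swapped out */

    if (end > obj->nslots)
        end = obj->nslots;
    for (size_t i = start; i < end; i++) {
        (*lookups)++;
        if (obj->slots[i] == SLOT_SWAPPED)
            n++;
    }
    return n;
}
```

For an object with no swapped entries the function returns without touching a single slot, which is the common case the reply expects.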
Re: [PATCH 1/5] mm, shmem: Add shmem resident memory accounting
On Tue, 22 Jul 2014, Jerome Marchand wrote: > Currently looking at /proc//status or statm, there is no way to > distinguish shmem pages from pages mapped to a regular file (shmem > pages are mapped to /dev/zero), even though their implication in > actual memory use is quite different. > This patch adds MM_SHMEMPAGES counter to mm_rss_stat. It keeps track of > resident shmem memory size. Its value is exposed in the new VmShm line > of /proc//status. I like adding this info to /proc//status - thank you - but I think you can make the patch much better in a couple of ways. > > Signed-off-by: Jerome Marchand > --- > Documentation/filesystems/proc.txt | 2 ++ > arch/s390/mm/pgtable.c | 2 +- > fs/proc/task_mmu.c | 9 ++--- > include/linux/mm.h | 7 +++ > include/linux/mm_types.h | 7 --- > kernel/events/uprobes.c| 2 +- > mm/filemap_xip.c | 2 +- > mm/memory.c| 37 +++-- > mm/rmap.c | 8 > 9 files changed, 57 insertions(+), 19 deletions(-) > > diff --git a/Documentation/filesystems/proc.txt > b/Documentation/filesystems/proc.txt > index ddc531a..1c49957 100644 > --- a/Documentation/filesystems/proc.txt > +++ b/Documentation/filesystems/proc.txt > @@ -171,6 +171,7 @@ read the file /proc/PID/status: >VmLib: 1412 kB >VmPTE:20 kb >VmSwap:0 kB > + VmShm: 0 kB >Threads:1 >SigQ: 0/28578 >SigPnd: > @@ -228,6 +229,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7) > VmLib size of shared library code > VmPTE size of page table entries > VmSwap size of swap usage (the number of referred > swapents) > + VmShm size of resident shmem memory Needs to say that includes mappings of tmpfs, and needs to say that it's a subset of VmRSS. Better placed immediately after VmRSS... ...but now that I look through what's in /proc//status, it appears that we have to defer to /proc//statm to see MM_FILEPAGES (third field) and MM_ANONPAGES (subtract third field from second field). That's not a very friendly interface. 
If you're going to help by exposing MM_SHMPAGES separately, please help
even more by exposing VmFile and VmAnon here in /proc//status too.

VmRSS, VmAnon, VmShm, VmFile? I'm not sure what's the best order:
here I'm thinking that anon comes before file in /proc/meminfo, and
shm should be halfway between anon and file. You may have another idea.

And of course the VmFile count here should exclude VmShm: I think it
will work out least confusingly if you account MM_FILEPAGES separately
from MM_SHMPAGES, but add them together where needed e.g. for statm.

>  Threads	number of threads
>  SigQ		number of signals queued/max. number for queue
>  SigPnd	bitmap of pending signals for the thread
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 37b8241..9fe31b0 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -612,7 +612,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>  		if (PageAnon(page))
>  			dec_mm_counter(mm, MM_ANONPAGES);
>  		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +			dec_mm_file_counters(mm, page);
>  	}

That is a recurring pattern: please try putting

static inline int mm_counter(struct page *page)
{
	if (PageAnon(page))
		return MM_ANONPAGES;
	if (PageSwapBacked(page))
		return MM_SHMPAGES;
	return MM_FILEPAGES;
}

in include/linux/mm.h.

Then dec_mm_counter(mm, mm_counter(page)) here, and wherever you can,
use mm_counter(page) to simplify the code throughout.

I say "try" because I think factoring out mm_counter() will simplify
the most code, given the profusion of different accessors, particularly
in mm/memory.c. But I'm not sure how much bloat having it as an inline
function will add, versus how much overhead it would add if not inline.

Hugh
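To see how the suggested mm_counter() helper collapses the repeated if/else pattern, here is a userspace mock that can be compiled on its own. The `struct page` and `struct mm_struct` definitions, the boolean flags and the `rss_stat` array are simplified stand-ins invented for this sketch; only the body of mm_counter() follows the review, and MM_SHMPAGES is spelled as in the review rather than as in the patch description.

```c
#include <stdbool.h>

/* Counter indices; the names follow the review, the values are arbitrary. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SHMPAGES, NR_MM_COUNTERS };

/* Simplified stand-ins for the kernel structures (assumptions). */
struct page {
	bool anon;		/* stands in for the PageAnon() test */
	bool swap_backed;	/* stands in for the PageSwapBacked() test */
};

struct mm_struct {
	long rss_stat[NR_MM_COUNTERS];
};

static inline bool PageAnon(const struct page *page)
{
	return page->anon;
}

static inline bool PageSwapBacked(const struct page *page)
{
	return page->swap_backed;
}

/*
 * Pick the counter a page is accounted to. The anon test comes first
 * because anonymous pages are swap-backed as well.
 */
static inline int mm_counter(const struct page *page)
{
	if (PageAnon(page))
		return MM_ANONPAGES;
	if (PageSwapBacked(page))
		return MM_SHMPAGES;
	return MM_FILEPAGES;
}

static inline void inc_mm_counter(struct mm_struct *mm, int member)
{
	mm->rss_stat[member]++;
}

static inline void dec_mm_counter(struct mm_struct *mm, int member)
{
	mm->rss_stat[member]--;
}
```

A call site then shrinks to a single line such as dec_mm_counter(mm, mm_counter(page)), with no per-caller branching on the page type.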
Re: [PATCH v3 2/2] dma: Add Xilinx AXI Direct Memory Access Engine driver support
Hi, Kindly review this patch and please provide your inputs. Thanks Srikanth On Mon, Jul 28, 2014 at 5:47 PM, Srikanth Thokala wrote: > This is the driver for the AXI Direct Memory Access (AXI DMA) > core, which is a soft Xilinx IP core that provides high- > bandwidth direct memory access between memory and AXI4-Stream > type target peripherals. > > This module works on Zynq (ARM Based SoC) and Microblaze platforms. > > Signed-off-by: Srikanth Thokala > --- > Changes in v3: > - Rebased on 3.16-rc7 > > Changes in v2: > - Simplified the logic to set SOP and APP words in prep_slave_sg(). > - Corrected function description comments to match the return type. > - Fixed some minor comments as suggested by Andy, Thanks. > --- > drivers/dma/Kconfig | 13 + > drivers/dma/xilinx/Makefile |1 + > drivers/dma/xilinx/xilinx_dma.c | 1225 > +++ > include/linux/amba/xilinx_dma.h | 17 + > 4 files changed, 1256 insertions(+) > create mode 100644 drivers/dma/xilinx/xilinx_dma.c > > diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig > index 1eca7b9..b8e831e 100644 > --- a/drivers/dma/Kconfig > +++ b/drivers/dma/Kconfig > @@ -375,6 +375,19 @@ config XILINX_VDMA > channels, Memory Mapped to Stream (MM2S) and Stream to > Memory Mapped (S2MM) for the data transfers. > > +config XILINX_DMA > + tristate "Xilinx AXI DMA Engine" > + depends on (ARCH_ZYNQ || MICROBLAZE) > + select DMA_ENGINE > + help > + Enable support for Xilinx AXI DMA Soft IP. > + > + This engine provides high-bandwidth direct memory access > + between memory and AXI4-Stream type target peripherals. > + It has two stream interfaces/channels, Memory Mapped to > + Stream (MM2S) and Stream to Memory Mapped (S2MM) for the > + data transfers. 
> + > config DMA_ENGINE > bool > > diff --git a/drivers/dma/xilinx/Makefile b/drivers/dma/xilinx/Makefile > index 3c4e9f2..6224a49 100644 > --- a/drivers/dma/xilinx/Makefile > +++ b/drivers/dma/xilinx/Makefile > @@ -1 +1,2 @@ > obj-$(CONFIG_XILINX_VDMA) += xilinx_vdma.o > +obj-$(CONFIG_XILINX_DMA) += xilinx_dma.o > diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c > new file mode 100644 > index 000..0500773 > --- /dev/null > +++ b/drivers/dma/xilinx/xilinx_dma.c > @@ -0,0 +1,1225 @@ > +/* > + * DMA driver for Xilinx DMA Engine > + * > + * Copyright (C) 2010 - 2014 Xilinx, Inc. All rights reserved. > + * > + * Based on the Freescale DMA driver. > + * > + * Description: > + * The AXI DMA, is a soft IP, which provides high-bandwidth Direct Memory > + * Access between memory and AXI4-Stream-type target peripherals. It can be > + * configured to have one channel or two channels and if configured as two > + * channels, one is to transmit data from memory to a device and another is > + * to receive from a device. > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. 
> + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "../dmaengine.h" > + > +/* Register Offsets */ > +#define XILINX_DMA_REG_CONTROL 0x00 > +#define XILINX_DMA_REG_STATUS 0x04 > +#define XILINX_DMA_REG_CURDESC 0x08 > +#define XILINX_DMA_REG_TAILDESC0x10 > +#define XILINX_DMA_REG_SRCADDR 0x18 > +#define XILINX_DMA_REG_DSTADDR 0x20 > +#define XILINX_DMA_REG_BTT 0x28 > + > +/* Channel/Descriptor Offsets */ > +#define XILINX_DMA_MM2S_CTRL_OFFSET0x00 > +#define XILINX_DMA_S2MM_CTRL_OFFSET0x30 > + > +/* General register bits definitions */ > +#define XILINX_DMA_CR_RUNSTOP_MASK BIT(0) > +#define XILINX_DMA_CR_RESET_MASK BIT(2) > + > +#define XILINX_DMA_CR_DELAY_SHIFT 24 > +#define XILINX_DMA_CR_COALESCE_SHIFT 16 > + > +#define XILINX_DMA_CR_DELAY_MAXGENMASK(7, 0) > +#define XILINX_DMA_CR_COALESCE_MAX GENMASK(7, 0) > + > +#define XILINX_DMA_SR_HALTED_MASK BIT(0) > +#define XILINX_DMA_SR_IDLE_MASKBIT(1) > + > +#define XILINX_DMA_XR_IRQ_IOC_MASK BIT(12) > +#define XILINX_DMA_XR_IRQ_DELAY_MASK BIT(13) > +#define XILINX_DMA_XR_IRQ_ERROR_MASK BIT(14) > +#define XILINX_DMA_XR_IRQ_ALL_MASK GENMASK(14, 12) > + > +/* BD definitions */ > +#define XILINX_DMA_BD_STS_ALL_MASK GENMASK(31, 28) > +#define XILINX_DMA_BD_SOP BIT(27) > +#define XILINX_DMA_BD_EOP BIT(26) > + > +/* Hw specific definitions */ > +#define XILINX_DMA_MAX_CHANS_PER_DEVICE0x2 > +#define XILINX_DMA_MAX_TRANS_LEN GENMASK(22, 0) > + > +/* Delay loop counter to prevent hardware failure */ > +#define
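The quoted patch is cut off before any code uses these definitions, so the following is only a sketch of how shift/mask pairs of this kind are typically combined. The 32-bit `BIT()` and `GENMASK()` macros are userspace re-creations, and the helpers `xdma_cr_set_coalesce()` and `xdma_cr_set_delay()` are hypothetical names that do not come from the driver.

```c
#include <stdint.h>

/* Userspace re-creations of the kernel helpers, fixed at 32 bits. */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Definitions copied from the quoted patch. */
#define XILINX_DMA_CR_DELAY_SHIFT	24
#define XILINX_DMA_CR_COALESCE_SHIFT	16
#define XILINX_DMA_CR_DELAY_MAX		GENMASK(7, 0)
#define XILINX_DMA_CR_COALESCE_MAX	GENMASK(7, 0)

#define XILINX_DMA_XR_IRQ_IOC_MASK	BIT(12)
#define XILINX_DMA_XR_IRQ_DELAY_MASK	BIT(13)
#define XILINX_DMA_XR_IRQ_ERROR_MASK	BIT(14)
#define XILINX_DMA_XR_IRQ_ALL_MASK	GENMASK(14, 12)

/* Hypothetical helper: replace the 8-bit coalesce field of a control word. */
static uint32_t xdma_cr_set_coalesce(uint32_t cr, uint32_t count)
{
	cr &= ~(XILINX_DMA_CR_COALESCE_MAX << XILINX_DMA_CR_COALESCE_SHIFT);
	cr |= (count & XILINX_DMA_CR_COALESCE_MAX) << XILINX_DMA_CR_COALESCE_SHIFT;
	return cr;
}

/* Hypothetical helper: replace the 8-bit delay field of a control word. */
static uint32_t xdma_cr_set_delay(uint32_t cr, uint32_t delay)
{
	cr &= ~(XILINX_DMA_CR_DELAY_MAX << XILINX_DMA_CR_DELAY_SHIFT);
	cr |= (delay & XILINX_DMA_CR_DELAY_MAX) << XILINX_DMA_CR_DELAY_SHIFT;
	return cr;
}
```

Note that XILINX_DMA_XR_IRQ_ALL_MASK, written as GENMASK(14, 12), covers exactly the three individual interrupt bits defined just above it.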
Re: [PATCH 1/1] Drivers: net-next: hyperv: Increase the size of the sendbuf region
From: "K. Y. Srinivasan"
Date: Wed, 30 Jul 2014 18:35:49 -0700

> For forwarding scenarios, it will be useful to allocate larger
> sendbuf. Make the necessary adjustments to permit this.
>
> Signed-off-by: K. Y. Srinivasan

This needs more information.

You're increasing the size by 16 times, 1MB --> 16MB, thus less cache
locality. You're also now using vmalloc() memory, thus more TLB misses
and thrashing.

This must have a negative impact on performance, and you have to test
for that and quantify it when making a change as serious as this one.

You also haven't gone into detail as to why forwarding scenarios
require more buffer space than, say, thousands of local sockets sending
bulk TCP data.

I'm not applying this, it needs a lot more work.
[PATCH v5 03/10] ARM: dts: Clean up exynos5250-snow
Use the new style of referencing inherited nodes and use symbolic names. Reorder one pinctrl node in GPIO order. Goal is the alignment of all exynos5250 based device trees for comparison. Suggested-by: Doug Anderson Signed-off-by: Andreas Färber --- v4 -> v5: * Introduced labels to consistently use new referencing style (Tomasz) * Use IRQ_TYPE_* constants * Use some more GPIO_ACTIVE_* v3 -> v4: Unchanged v3: New (Doug Anderson) arch/arm/boot/dts/exynos5250-snow.dts | 291 +- arch/arm/boot/dts/exynos5250.dtsi | 20 +-- 2 files changed, 155 insertions(+), 156 deletions(-) diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts index 1c36cd72905f..7680d5e03fb3 100644 --- a/arch/arm/boot/dts/exynos5250-snow.dts +++ b/arch/arm/boot/dts/exynos5250-snow.dts @@ -6,9 +6,12 @@ * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. -*/ + */ /dts-v1/; +#include +#include +#include #include "exynos5250.dtsi" / { @@ -26,89 +29,19 @@ chosen { }; - rtc@101E { - status = "okay"; - }; - - pinctrl@1140 { - ec_irq: ec-irq { - samsung,pins = "gpx1-6"; - samsung,pin-function = <0>; - samsung,pin-pud = <0>; - samsung,pin-drv = <0>; - }; - - sd3_clk: sd3-clk { - samsung,pin-drv = <0>; - }; - - sd3_cmd: sd3-cmd { - samsung,pin-pud = <3>; - samsung,pin-drv = <0>; - }; - - sd3_bus4: sd3-bus-width4 { - samsung,pin-drv = <0>; - }; - - max98095_en: max98095-en { - samsung,pins = "gpx1-7"; - samsung,pin-function = <0>; - samsung,pin-pud = <3>; - samsung,pin-drv = <0>; - }; - - tps65090_irq: tps65090-irq { - samsung,pins = "gpx2-6"; - samsung,pin-function = <0>; - samsung,pin-pud = <0>; - samsung,pin-drv = <0>; - }; - - usb3_vbus_en: usb3-vbus-en { - samsung,pins = "gpx2-7"; - samsung,pin-function = <1>; - samsung,pin-pud = <0>; - samsung,pin-drv = <0>; - }; - - hdmi_hpd_irq: hdmi-hpd-irq { - samsung,pins = "gpx3-7"; - 
samsung,pin-function = <0>; - samsung,pin-pud = <1>; - samsung,pin-drv = <0>; - }; - }; - - pinctrl@1340 { - arb_their_claim: arb-their-claim { - samsung,pins = "gpe0-4"; - samsung,pin-function = <0>; - samsung,pin-pud = <3>; - samsung,pin-drv = <0>; - }; - - arb_our_claim: arb-our-claim { - samsung,pins = "gpf0-3"; - samsung,pin-function = <1>; - samsung,pin-pud = <0>; - samsung,pin-drv = <0>; - }; - }; - gpio-keys { compatible = "gpio-keys"; power { label = "Power"; - gpios = < 3 1>; - linux,code = <116>; /* KEY_POWER */ + gpios = < 3 GPIO_ACTIVE_LOW>; + linux,code = ; gpio-key,wakeup; }; lid-switch { label = "Lid"; - gpios = < 5 1>; + gpios = < 5 GPIO_ACTIVE_LOW>; linux,input-type = <5>; /* EV_SW */ linux,code = <0>; /* SW_LID */ debounce-interval = <1>; @@ -129,8 +62,8 @@ i2c-parent = <&{/i2c@12CA}>; - our-claim-gpio = < 3 1>; - their-claim-gpios = < 4 1>; + our-claim-gpio = < 3 GPIO_ACTIVE_LOW>; + their-claim-gpios = < 4 GPIO_ACTIVE_LOW>; slew-delay-us = <10>; wait-retry-us = <3000>; wait-free-us = <5>; @@ -153,7 +86,7 @@ cros_ec: embedded-controller { compatible = "google,cros-ec-i2c"; reg = <0x1e>; - interrupts = <6 0>; + interrupts = <6 IRQ_TYPE_NONE>; interrupt-parent = <>;
[PATCH v5 05/10] ARM: dts: Move dp_hpd from exynos5250 into smdk5250 and snow
Spring uses a different GPIO, so this is not a generic SoC piece. Suggested-by: Tomasz Figa Signed-off-by: Andreas Färber --- v5: New (Tomasz Figa) Frees dp_hpd for Spring. arch/arm/boot/dts/exynos5250-pinctrl.dtsi | 7 --- arch/arm/boot/dts/exynos5250-smdk5250.dts | 9 + arch/arm/boot/dts/exynos5250-snow.dts | 7 +++ 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/arm/boot/dts/exynos5250-pinctrl.dtsi b/arch/arm/boot/dts/exynos5250-pinctrl.dtsi index 886cfca044ac..ed0e5230514b 100644 --- a/arch/arm/boot/dts/exynos5250-pinctrl.dtsi +++ b/arch/arm/boot/dts/exynos5250-pinctrl.dtsi @@ -581,13 +581,6 @@ samsung,pin-pud = <0>; samsung,pin-drv = <0>; }; - - dp_hpd: dp_hpd { - samsung,pins = "gpx0-7"; - samsung,pin-function = <3>; - samsung,pin-pud = <0>; - samsung,pin-drv = <0>; - }; }; pinctrl@1340 { diff --git a/arch/arm/boot/dts/exynos5250-smdk5250.dts b/arch/arm/boot/dts/exynos5250-smdk5250.dts index aaa055ac0fe3..5d30fe1dcda4 100644 --- a/arch/arm/boot/dts/exynos5250-smdk5250.dts +++ b/arch/arm/boot/dts/exynos5250-smdk5250.dts @@ -414,3 +414,12 @@ }; }; }; + +_0 { + dp_hpd: dp_hpd { + samsung,pins = "gpx0-7"; + samsung,pin-function = <3>; + samsung,pin-pud = <0>; + samsung,pin-drv = <0>; + }; +}; diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts index c4b0c73c736d..a9a2f2743794 100644 --- a/arch/arm/boot/dts/exynos5250-snow.dts +++ b/arch/arm/boot/dts/exynos5250-snow.dts @@ -547,6 +547,13 @@ }; _0 { + dp_hpd: dp_hpd { + samsung,pins = "gpx0-7"; + samsung,pin-function = <3>; + samsung,pin-pud = <0>; + samsung,pin-drv = <0>; + }; + ec_irq: ec-irq { samsung,pins = "gpx1-6"; samsung,pin-function = <0>; -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v5 06/10] ARM: dts: Clean up exynos5250-smdk5250
Use the new style for referencing inherited nodes and use symbolic names. Goal is the alignment of all exynos5250 based device trees for comparison. Signed-off-by: Andreas Färber --- v5: New Follow-up after adding dp_hpd pinctrl node new-style. arch/arm/boot/dts/exynos5250-smdk5250.dts | 640 +++--- arch/arm/boot/dts/exynos5250.dtsi | 4 +- 2 files changed, 324 insertions(+), 320 deletions(-) diff --git a/arch/arm/boot/dts/exynos5250-smdk5250.dts b/arch/arm/boot/dts/exynos5250-smdk5250.dts index 5d30fe1dcda4..81dc921a5e5e 100644 --- a/arch/arm/boot/dts/exynos5250-smdk5250.dts +++ b/arch/arm/boot/dts/exynos5250-smdk5250.dts @@ -7,9 +7,11 @@ * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. -*/ + */ /dts-v1/; +#include +#include #include "exynos5250.dtsi" / { @@ -27,165 +29,6 @@ bootargs = "root=/dev/ram0 rw ramdisk=8192 initrd=0x4100,8M console=ttySAC2,115200 init=/linuxrc"; }; - rtc@101E { - status = "okay"; - }; - - i2c@12C6 { - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <2>; - status = "okay"; - - eeprom@50 { - compatible = "samsung,s524ad0xd1"; - reg = <0x50>; - }; - - max77686@09 { - compatible = "maxim,max77686"; - reg = <0x09>; - interrupt-parent = <>; - interrupts = <2 0>; - - voltage-regulators { - ldo1_reg: LDO1 { - regulator-name = "P1.0V_LDO_OUT1"; - regulator-min-microvolt = <100>; - regulator-max-microvolt = <100>; - regulator-always-on; - }; - - ldo2_reg: LDO2 { - regulator-name = "P1.2V_LDO_OUT2"; - regulator-min-microvolt = <120>; - regulator-max-microvolt = <120>; - regulator-always-on; - }; - - ldo3_reg: LDO3 { - regulator-name = "P1.8V_LDO_OUT3"; - regulator-min-microvolt = <180>; - regulator-max-microvolt = <180>; - regulator-always-on; - }; - - ldo4_reg: LDO4 { - regulator-name = "P2.8V_LDO_OUT4"; - regulator-min-microvolt = <280>; - regulator-max-microvolt = <280>; - }; - - ldo5_reg: LDO5 { - 
regulator-name = "P1.8V_LDO_OUT5"; - regulator-min-microvolt = <180>; - regulator-max-microvolt = <180>; - }; - - ldo6_reg: LDO6 { - regulator-name = "P1.1V_LDO_OUT6"; - regulator-min-microvolt = <110>; - regulator-max-microvolt = <110>; - regulator-always-on; - }; - - ldo7_reg: LDO7 { - regulator-name = "P1.1V_LDO_OUT7"; - regulator-min-microvolt = <110>; - regulator-max-microvolt = <110>; - regulator-always-on; - }; - - ldo8_reg: LDO8 { - regulator-name = "P1.0V_LDO_OUT8"; - regulator-min-microvolt = <100>; - regulator-max-microvolt = <100>; - }; - - ldo10_reg: LDO10 { - regulator-name = "P1.8V_LDO_OUT10"; - regulator-min-microvolt = <180>; - regulator-max-microvolt = <180>; - }; - - ldo11_reg: LDO11 { - regulator-name =
[PATCH v5 04/10] ARM: dts: Fill in bootargs for exynos5250-snow
exynos5250-cros-common.dtsi had an empty /chosen node.
Fill in exemplary boot arguments.

Signed-off-by: Andreas Färber
---
 v5: New
 Cleanup for /chosen node moved into -snow.dts.

 arch/arm/boot/dts/exynos5250-snow.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index 7680d5e03fb3..c4b0c73c736d 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -27,6 +27,7 @@
 	};

 	chosen {
+		bootargs = "console=tty1";
 	};

 	gpio-keys {
--
1.9.3
[PATCH v5 07/10] ARM: dts: Clean up exynos5250-arndale
Use the new style of referencing inherited nodes, use symbolic names, tidy indentation and reorder includes. Goal is the alignment of all exynos5250 based device trees for comparison. Signed-off-by: Andreas Färber --- v5: New Aligns with SMDK. arch/arm/boot/dts/exynos5250-arndale.dts | 929 --- 1 file changed, 466 insertions(+), 463 deletions(-) diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts index d0de1f50d15b..3a608f57f833 100644 --- a/arch/arm/boot/dts/exynos5250-arndale.dts +++ b/arch/arm/boot/dts/exynos5250-arndale.dts @@ -7,12 +7,13 @@ * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. -*/ + */ /dts-v1/; -#include "exynos5250.dtsi" +#include #include #include +#include "exynos5250.dtsi" / { model = "Insignal Arndale evaluation board based on EXYNOS5250"; @@ -26,473 +27,52 @@ bootargs = "console=ttySAC2,115200"; }; - rtc@101E { - status = "okay"; - }; - - codec@1100 { - samsung,mfc-r = <0x4300 0x80>; - samsung,mfc-l = <0x5100 0x80>; - }; - - i2c@12C6 { - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <2>; - samsung,i2c-slave-addr = <0x66>; - status = "okay"; - - s5m8767_pmic@66 { - compatible = "samsung,s5m8767-pmic"; - reg = <0x66>; - interrupt-parent = <>; - interrupts = <2 IRQ_TYPE_LEVEL_LOW>; - - vinb1-supply = <_dc_reg>; - vinb2-supply = <_dc_reg>; - vinb3-supply = <_dc_reg>; - vinb4-supply = <_dc_reg>; - vinb5-supply = <_dc_reg>; - vinb6-supply = <_dc_reg>; - vinb7-supply = <_dc_reg>; - vinb8-supply = <_dc_reg>; - vinb9-supply = <_dc_reg>; - - vinl1-supply = <_reg>; - vinl2-supply = <_reg>; - vinl3-supply = <_reg>; - vinl4-supply = <_dc_reg>; - vinl5-supply = <_dc_reg>; - vinl6-supply = <_dc_reg>; - vinl7-supply = <_dc_reg>; - vinl8-supply = <_reg>; - vinl9-supply = <_reg>; - - s5m8767,pmic-buck2-dvs-voltage = <130>; - s5m8767,pmic-buck3-dvs-voltage = <110>; - 
s5m8767,pmic-buck4-dvs-voltage = <120>; - s5m8767,pmic-buck-dvs-gpios = < 0 0>, - < 1 0>, - < 2 0>; - s5m8767,pmic-buck-ds-gpios = < 3 0>, - < 4 0>, - < 5 0>; - regulators { - ldo1_reg: LDO1 { - regulator-name = "VDD_ALIVE_1.0V"; - regulator-min-microvolt = <110>; - regulator-max-microvolt = <110>; - regulator-always-on; - regulator-boot-on; - op_mode = <1>; - }; - - ldo2_reg: LDO2 { - regulator-name = "VDD_28IO_DP_1.35V"; - regulator-min-microvolt = <120>; - regulator-max-microvolt = <120>; - regulator-always-on; - regulator-boot-on; - op_mode = <1>; - }; - - ldo3_reg: LDO3 { - regulator-name = "VDD_COMMON1_1.8V"; - regulator-min-microvolt = <180>; - regulator-max-microvolt = <180>; - regulator-always-on; - regulator-boot-on; - op_mode = <1>; - }; - - ldo4_reg: LDO4 { - regulator-name = "VDD_IOPERI_1.8V"; - regulator-min-microvolt = <180>; -
[PATCH v5 09/10] ARM: dts: Simplify USB3503 on exynos5250-arndale
There's no need for a simple-bus, place the smsc,usb3503a directly into
the root node. That's what we're going to do on exynos5250-spring.

Reported-by: Tomasz Figa
Signed-off-by: Andreas Färber
---
 v5: New
 Aligns with Spring's new USB3503 node.

 arch/arm/boot/dts/exynos5250-arndale.dts | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts
index a04a875346aa..9912d27492db 100644
--- a/arch/arm/boot/dts/exynos5250-arndale.dts
+++ b/arch/arm/boot/dts/exynos5250-arndale.dts
@@ -108,18 +108,12 @@
 		};
 	};

-	usb_hub_bus {
-		compatible = "simple-bus";
-		#address-cells = <1>;
-		#size-cells = <0>;
+	// SMSC USB3503 connected in hardware only mode as a PHY
+	usb_hub: usb-hub {
+		compatible = "smsc,usb3503a";

-		// SMSC USB3503 connected in hardware only mode as a PHY
-		usb_hub: usb_hub {
-			compatible = "smsc,usb3503a";
-
-			reset-gpios = < 5 GPIO_ACTIVE_LOW>;
-			connect-gpios = < 7 GPIO_ACTIVE_LOW>;
-		};
+		reset-gpios = < 5 GPIO_ACTIVE_LOW>;
+		connect-gpios = < 7 GPIO_ACTIVE_LOW>;
 	};
 };

--
1.9.3
[PATCH v5 02/10] ARM: dts: Fold exynos5250-cros-common into exynos5250-snow
exynos5250-cros-common.dtsi was meant for sharing common pieces across ChromeOS devices. This turned out premature, as several devices ended up in the common file that are not common after all. Since the remaining common ChromeOS pieces are fairly minor, exynos5250-cros-common.dtsi was requested to be merged into the Snow device tree, sharing only the keyboard controller for now. This may be re-evaluated as both mature. Suggested-by: Doug Anderson Reviewed-by: Tomasz Figa Signed-off-by: Andreas Färber --- v4 -> v5: * Extended commit message (Tomasz Figa) v3 -> v4: Unchanged v2 -> v3: * Renamed subject to match Kukjin's style * Rebased onto MMC pinctrl bug fix (Doug Anderson) v2: New (Doug Anderson) arch/arm/boot/dts/exynos5250-cros-common.dtsi | 164 -- arch/arm/boot/dts/exynos5250-snow.dts | 164 +++--- 2 files changed, 145 insertions(+), 183 deletions(-) delete mode 100644 arch/arm/boot/dts/exynos5250-cros-common.dtsi diff --git a/arch/arm/boot/dts/exynos5250-cros-common.dtsi b/arch/arm/boot/dts/exynos5250-cros-common.dtsi deleted file mode 100644 index e603e9c70142.. --- a/arch/arm/boot/dts/exynos5250-cros-common.dtsi +++ /dev/null @@ -1,164 +0,0 @@ -/* - * Common device tree include for all Exynos 5250 boards based off of Daisy. - * - * Copyright (c) 2012 Google, Inc - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. -*/ - -/ { - aliases { - }; - - memory { - reg = <0x4000 0x8000>; - }; - - chosen { - }; - - pinctrl@1140 { - /* -* Disabled pullups since external part has its own pullups and -* double-pulling gets us out of spec in some cases. 
-*/ - i2c2_bus: i2c2-bus { - samsung,pin-pud = <0>; - }; - }; - - i2c@12C6 { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <378000>; - }; - - i2c@12C7 { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <378000>; - }; - - i2c@12C8 { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <66000>; - - hdmiddc@50 { - compatible = "samsung,exynos4210-hdmiddc"; - reg = <0x50>; - }; - }; - - i2c@12C9 { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <66000>; - }; - - i2c@12CA { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <66000>; - }; - - i2c@12CB { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <66000>; - }; - - i2c@12CD { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <66000>; - }; - - i2c@12CE { - status = "okay"; - samsung,i2c-sda-delay = <100>; - samsung,i2c-max-bus-freq = <378000>; - - hdmiphy: hdmiphy@38 { - compatible = "samsung,exynos4212-hdmiphy"; - reg = <0x38>; - }; - }; - - mmc@1220 { - num-slots = <1>; - supports-highspeed; - broken-cd; - card-detect-delay = <200>; - samsung,dw-mshc-ciu-div = <3>; - samsung,dw-mshc-sdr-timing = <2 3>; - samsung,dw-mshc-ddr-timing = <1 2>; - pinctrl-names = "default"; - pinctrl-0 = <_clk _cmd _cd _bus4 _bus8>; - - slot@0 { - reg = <0>; - bus-width = <8>; - }; - }; - - mmc@1222 { - num-slots = <1>; - supports-highspeed; - card-detect-delay = <200>; - samsung,dw-mshc-ciu-div = <3>; - samsung,dw-mshc-sdr-timing = <2 3>; - samsung,dw-mshc-ddr-timing = <1 2>; - pinctrl-names = "default"; - pinctrl-0 = <_clk _cmd _cd _bus4>; - - slot@0 { - reg = <0>; - bus-width = <4>; - wp-gpios = < 1 0>; - }; - }; - - mmc@1223 { - num-slots = <1>; - supports-highspeed; - broken-cd; - card-detect-delay = <200>; -
[PATCH v5 10/10] ARM: dts: Add exynos5250-spring device tree
Adds initial support for the HP Chromebook 11. Cc: Vincent Palatin Cc: Doug Anderson Cc: Stephan van Schaik Signed-off-by: Andreas Färber --- v4 -> v5: * Dropped bogus USB3 regulator (Vincent Palatin, Tomasz Figa) * Fixed USB3503 reset GPIO (Tomasz Figa) * Introduced labels to use new referencing style consistently (Tomasz Figa) * Don't override dp_hpd, moved to pinctrl_0 instead (Tomasz Figa) * mmc_1: Added comment from Snow's mmc_3 (Tomasz Figa / Doug Anderson) * Override /codec samsung,mfc-{l,r} properties for alignment with Arndale * Use more GPIO_ACTIVE_* constants * Use IRQ_TYPE_* constants * Dropped s5m_ prefix for s5m8767 LDO regulator labels (max77686 is gone) * Labeled also all s5m8767 BUCK regulators v3 -> v4: * Fixed samsung,pin-function 1 -> 0 for dp-hpd-gpio * Replaced dp-hpd-gpio with existing dp_hpd, overriding it v2 -> v3: * Use GPIO_ACTIVE_{LOW,HIGH} (Doug Anderson) * Use symbolic KEY_POWER instead of comment * Moved hsic_reset to new USB3503 node's reset-gpios (Vincent Palatin) * Use dp_hpd_gpio for dp-controller (Doug Anderson, Ajay Kumar) * Override sd1_{clk,cmd,cd,bus4} pinctrl similar to Snow (Doug Anderson) * Added ec_irq pinctrl for cros_ec (Doug Anderson) * Reordered nodes to minimize diff against Snow (Doug Anderson) * Dropped obsolete mmc_2 override (Doug Anderson) * Added lid-switch node (Doug Anderson) * Added gpio-keys pinctrl (Doug Anderson) * Added bootargs to avoid empty /chosen node and to document console setting * Renamed s5m8767_pmic node to avoid underscore * Use new style for overriding inherited pinctrl nodes, too * Enable i2s0 node v1 -> v2: * Use label-based overriding/extension of nodes. (Doug Anderson) * Dropped tps65090 for now, until we know where to place it. * Dropped non-Spring nodes from -cros-common.dtsi rather than disabling them. * Enabled a missing MMC node for access to internal storage. * Dropped display-timings from dp-controller node. 
(Ajay Kumar) arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/exynos5250-spring.dts | 536 2 files changed, 537 insertions(+) create mode 100644 arch/arm/boot/dts/exynos5250-spring.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 80a781f76e88..dec4c292f63d 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -76,6 +76,7 @@ dtb-$(CONFIG_ARCH_EXYNOS) += exynos4210-origen.dtb \ exynos5250-arndale.dtb \ exynos5250-smdk5250.dtb \ exynos5250-snow.dtb \ + exynos5250-spring.dtb \ exynos5260-xyref5260.dtb \ exynos5410-smdk5410.dtb \ exynos5420-arndale-octa.dtb \ diff --git a/arch/arm/boot/dts/exynos5250-spring.dts b/arch/arm/boot/dts/exynos5250-spring.dts new file mode 100644 index ..108e3a9002e7 --- /dev/null +++ b/arch/arm/boot/dts/exynos5250-spring.dts @@ -0,0 +1,536 @@ +/* + * Google Spring board device tree source + * + * Copyright (c) 2013 Google, Inc + * Copyright (c) 2014 SUSE LINUX Products GmbH + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +/dts-v1/; +#include +#include +#include +#include "exynos5250.dtsi" + +/ { + model = "Google Spring"; + compatible = "google,spring", "samsung,exynos5250", "samsung,exynos5"; + + memory { + reg = <0x4000 0x8000>; + }; + + chosen { + bootargs = "console=tty1"; + }; + + gpio-keys { + compatible = "gpio-keys"; + pinctrl-names = "default"; + pinctrl-0 = <_key_irq>, <_irq>; + + power { + label = "Power"; + gpios = < 3 GPIO_ACTIVE_LOW>; + linux,code = ; + gpio-key,wakeup; + }; + + lid-switch { + label = "Lid"; + gpios = < 5 GPIO_ACTIVE_LOW>; + linux,input-type = <5>; /* EV_SW */ + linux,code = <0>; /* SW_LID */ + debounce-interval = <1>; + gpio-key,wakeup; + }; + }; + + usb-hub { + compatible = "smsc,usb3503a"; + reset-gpios = < 0 GPIO_ACTIVE_LOW>; + }; + + fixed-rate-clocks { + xxti { + compatible = "samsung,clock-xxti"; + clock-frequency = <2400>; + }; + }; +}; + + { + samsung,mfc-r = <0x4300 0x80>; + samsung,mfc-l = <0x5100 0x80>; +}; + + { + status = "okay"; + pinctrl-names = "default"; + pinctrl-0 = <_hpd>; + samsung,color-space = <0>; + samsung,dynamic-range = <0>; + samsung,ycbcr-coeff = <0>; + samsung,color-depth =
[PATCH v5 08/10] ARM: dts: Fix apparent GPIO typo in exynos5250-arndale
The GPIO flag 2 has no constant assigned, so this was probably active-low.

Signed-off-by: Andreas Färber
---
 v5: New
 Spotted during cleanup.

 arch/arm/boot/dts/exynos5250-arndale.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/exynos5250-arndale.dts b/arch/arm/boot/dts/exynos5250-arndale.dts
index 3a608f57f833..a04a875346aa 100644
--- a/arch/arm/boot/dts/exynos5250-arndale.dts
+++ b/arch/arm/boot/dts/exynos5250-arndale.dts
@@ -164,7 +164,7 @@
 };

  {
-	hpd-gpio = < 7 2>;
+	hpd-gpio = < 7 GPIO_ACTIVE_LOW>;
 	vdd_osc-supply = <_reg>;
 	vdd_pll-supply = <_reg>;
 	vdd-supply = <_reg>;
--
1.9.3
[PATCH v5 01/10] ARM: dts: Fix MMC pinctrl for exynos5250-snow
The pinctrl properties should be on the device directly and not on the
slot sub-node.

Reported-by: Doug Anderson
Cc: Jaehoon Chung
Reviewed-by: Tomasz Figa
Signed-off-by: Andreas Färber
---
 v3 -> v4 -> v5: Unchanged

 v3: New (Doug Anderson)
 Redundant with Jaehoon Chung's general slot@0 deprecation,
 in case that hits the tree earlier.

 arch/arm/boot/dts/exynos5250-snow.dts | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-snow.dts b/arch/arm/boot/dts/exynos5250-snow.dts
index f2b8c4116541..eb437f6afec1 100644
--- a/arch/arm/boot/dts/exynos5250-snow.dts
+++ b/arch/arm/boot/dts/exynos5250-snow.dts
@@ -240,10 +240,8 @@
 	 */
 	mmc@1223 {
 		status = "okay";
-		slot@0 {
-			pinctrl-names = "default";
-			pinctrl-0 = <_clk _cmd _bus4>;
-		};
+		pinctrl-names = "default";
+		pinctrl-0 = <_clk _cmd _bus4>;
 	};

 	i2c@12CD {
--
1.9.3
Re: [PATCH] xen-netback: Turn off the carrier if the guest is not able to receive
From: Zoltan Kiss
Date: Wed, 30 Jul 2014 20:50:49 +0100

> Currently when the guest is not able to receive more packets, qdisc layer starts
> a timer, and when it goes off, qdisc is started again to deliver a packet again.
> This is a very slow way to drain the queues, consumes unnecessary resources and
> slows down other guests shutdown.
> This patch change the behaviour by turning the carrier off when that timer
> fires, so all the packets are freed up which were stucked waiting for that vif.
> Instead of the rx_queue_purge bool it uses the VIF_STATUS_RX_PURGE_EVENT bit to
> signal the thread that either the timout happened or an RX interrupt arrived, so
> the thread can check what it should do. It also disables NAPI, so the guest
> can't transmit, but leaves the interrupts on, so it can resurrect.
>
> Signed-off-by: Zoltan Kiss
> Signed-off-by: David Vrabel

When posting a multi-part patch set, number your patches and have a header "[PATCH 0/N] " posting which describes at a high level what the patch series is doing, and why.

> +	for (i = 0; i < num_queues; ++i) {

Please use the more canonical "i++" increment.

Thanks.
linux-next: manual merge of the kvm tree with the ftrace tree
Hi all,

Today's linux-next merge of the kvm tree got a conflict in arch/x86/kvm/mmutrace.h between commit 7b039cb4c5a9 ("tracing: Add trace_seq_buffer_ptr() helper function") from the ftrace tree and commit 42cbc04fd3b5 ("x86/kvm: Resolve shadow warnings in macro expansion") from the kvm tree.

I fixed it up (I dropped the ftrace tree's change to this file) and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH net-next v3 4/4 RFC] pktgen: Allow sending IPv4 TCP packets
From: Zoltan Kiss
Date: Wed, 30 Jul 2014 17:20:12 +0100

> This is a prototype patch to enable sending IPv4 TCP packets with pktgen. The
> original motivation is to test TCP GSO with xen-netback/netfront, but I'm not
> sure about how the checksum should be set up, and also someone should verify the
> GSO settings I'm using.
>
> Signed-off-by: Zoltan Kiss
> Cc: "David S. Miller"
> Cc: Thomas Graf
> Cc: Joe Perches
> Cc: net...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-de...@lists.xenproject.org
> ---
> v3:
> - mention explicitly that this for IPv4
> - memset the TCP header and set up doff
> - rework of checksum handling and GSO setting in fill_packet_ipv4
> - bail out in pktgen_xmit if the device won't be able to handle GSO
>
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 0d0aaac..9d93bda 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -162,6 +162,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #ifdef CONFIG_XFRM
> @@ -203,6 +204,7 @@
>  #define F_NODE (1<<15) /* Node memory alloc*/
>  #define F_UDPCSUM (1<<16) /* Include UDP checksum */
>  #define F_PATTERN (1<<17) /* Fill the payload with a pattern */
> +#define F_TCP (1<<18) /* Send TCP packet instead of UDP */
>
>  /* Thread control flag bits */
>  #define T_STOP(1<<0) /* Stop run */
> @@ -664,6 +666,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
>  	if (pkt_dev->flags & F_PATTERN)
>  		seq_puts(seq, "PATTERN ");
>
> +	if (pkt_dev->flags & F_TCP)
> +		seq_puts(seq, "TCP ");
> +
>  	if (pkt_dev->flags & F_MPLS_RND)
>  		seq_puts(seq, "MPLS_RND ");
>
> @@ -1342,6 +1347,12 @@ static ssize_t pktgen_if_write(struct file *file,
>  	else if (strcmp(f, "!PATTERN") == 0)
>  		pkt_dev->flags &= ~F_PATTERN;
>
> +	else if (strcmp(f, "TCP") == 0)
> +		pkt_dev->flags |= F_TCP;
> +
> +	else if (strcmp(f, "!TCP") == 0)
> +		pkt_dev->flags &= ~F_TCP;
> +
>  	else {
>  		sprintf(pg_result,
>  			"Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
> @@ -2955,7 +2966,8 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
>  {
>  	struct sk_buff *skb = NULL;
>  	__u8 *eth;
> -	struct udphdr *udph;
> +	struct udphdr *udph = NULL;
> +	struct tcphdr *tcph;
>  	int datalen, iplen;
>  	struct iphdr *iph;
>  	__be16 protocol = htons(ETH_P_IP);
> @@ -3017,29 +3029,40 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
>  	iph = (struct iphdr *) skb_put(skb, sizeof(struct iphdr));
>
>  	skb_set_transport_header(skb, skb->len);
> -	udph = (struct udphdr *) skb_put(skb, sizeof(struct udphdr));
> +
> +	if (pkt_dev->flags & F_TCP) {
> +		datalen = pkt_dev->cur_pkt_size - ETH_HLEN - 20 -
> +			  sizeof(struct tcphdr) - pkt_dev->pkt_overhead;
> +		tcph = (struct tcphdr *)skb_put(skb, sizeof(struct tcphdr));
> +		memset(tcph, 0, sizeof(*tcph));
> +		tcph->source = htons(pkt_dev->cur_udp_src);
> +		tcph->dest = htons(pkt_dev->cur_udp_dst);
> +		tcph->doff = sizeof(struct tcphdr) >> 2;
> +	} else {
> +		datalen = pkt_dev->cur_pkt_size - ETH_HLEN - 20 -
> +			  sizeof(struct udphdr) - pkt_dev->pkt_overhead;
> +		udph = (struct udphdr *)skb_put(skb, sizeof(struct udphdr));
> +		udph->source = htons(pkt_dev->cur_udp_src);
> +		udph->dest = htons(pkt_dev->cur_udp_dst);
> +		udph->len = htons(datalen + sizeof(struct udphdr));
> +		udph->check = 0;
> +	}
> +

As more protocols (SCTP, etc.) get supported, this is going to become completely unmanageable.

Please use callbacks or something like that so this function doesn't turn into even more spaghetti.

> +	} else if (pkt_dev->flags & F_TCP) {
> +		struct inet_sock inet;
> +
> +		inet.inet_saddr = iph->saddr;
> +		inet.inet_daddr = iph->daddr;
> +		skb->ip_summed = CHECKSUM_NONE;
> +		tcp_v4_send_check((struct sock *), skb);

Please don't do things like this. Making fake sockets on the stack, don't do it.

Do other non-socket contexts compute TCP checksums this way? Check netfilter or similar, see what they do.

Worst case, export __tcp_v4_send_check() or just duplicate its contents in the TCP case here.
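As a rough illustration of the review comment above (not code from the thread or from the kernel): a TCP checksum needs only the two IPv4 addresses and the segment bytes, so no socket object is required to compute it. The sketch below is plain userspace C that follows the usual one's-complement Internet checksum over an IPv4 pseudo-header; the names sum16, fold16 and tcp4_checksum are hypothetical, and the kernel has its own checksum helpers that this sketch does not use.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running sum as big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t sum)
{
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)data[0] << 8;
	return sum;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical helper: checksum of a TCP segment (header + payload) with
 * the IPv4 pseudo-header (source, destination, protocol 6, TCP length).
 * Addresses are given as 4 bytes in network order. The checksum field in
 * the segment must be zero when computing a checksum to be sent.
 */
static uint16_t tcp4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
			      const uint8_t *seg, size_t seg_len)
{
	uint32_t sum = 0;

	sum = sum16(saddr, 4, sum);
	sum = sum16(daddr, 4, sum);
	sum += 6;			/* IPPROTO_TCP */
	sum += (uint32_t)seg_len;	/* TCP length */
	sum = sum16(seg, seg_len, sum);
	return fold16(sum);
}
```

Storing the result big-endian at byte offsets 16 and 17 of the TCP header and running the same function again over the segment yields 0, which is how a receiver would verify it.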
Re: [PATCH 0/4] LLVMLinux: Patches to enable the kernel to be compiled with clang/LLVM
On 07/31/14 03:33, Will Deacon wrote:
> On Thu, Jul 31, 2014 at 12:57:25AM +0100, beh...@converseincode.com wrote:
>> From: Behan Webster
>>
>> This patch set moves from using locally defined named registers to access the
>> stack pointer to using a globally defined named register. This allows the code
>> to work both with gcc and clang.
>>
>> The LLVMLinux project aims to fully build the Linux kernel using both gcc and
>> clang (the C front end for the LLVM compiler infrastructure project).
>>
>> Behan Webster (4):
>>   arm64: LLVMLinux: Add current_stack_pointer() for arm64
>>   arm64: LLVMLinux: Use current_stack_pointer in save_stack_trace_tsk
>>   arm64: LLVMLinux: Calculate current_thread_info from current_stack_pointer
>>   arm64: LLVMLinux: Use current_stack_pointer in kernel/traps.c
> Once Andreas's comments have been addressed:
>
>   Acked-by: Will Deacon
>
> Please can you send a new series after the merge window?

Pity. I was hoping to get it in this merge window. However, will resubmit for 3.18.

Thanks,

Behan

--
Behan Webster
beh...@converseincode.com
[PATCH v3] kbuild, LLVMLinux: Supress warnings unless W=1-3
From: Behan Webster

clang has more warnings enabled by default. Turn them off unless W is set.
This patch fixes a logic bug where warnings in clang were disabled when W was set.

Signed-off-by: Behan Webster
Signed-off-by: Jan-Simon Möller
Signed-off-by: Mark Charlebois
Cc: mma...@suse.cz
Cc: b...@alien8.de
---
 Makefile                   |  1 +
 scripts/Makefile.extrawarn | 22 --
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/Makefile b/Makefile
index f6a7794..f343e17 100644
--- a/Makefile
+++ b/Makefile
@@ -668,6 +668,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, tautological-compare)
 # source of a reference will be _MergedGlobals and not on of the whitelisted names.
 # See modpost pattern 2
 KBUILD_CFLAGS += $(call cc-option, -mno-global-merge,)
+KBUILD_CFLAGS += $(call cc-option, -fcatch-undefined-behavior)
 else

 # This warning generated too much noise in a regular build.
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 6564350..4315d34 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -26,16 +26,6 @@ warning-1 += $(call cc-option, -Wmissing-include-dirs)
 warning-1 += $(call cc-option, -Wunused-but-set-variable)
 warning-1 += $(call cc-disable-warning, missing-field-initializers)

-# Clang
-warning-1 += $(call cc-disable-warning, initializer-overrides)
-warning-1 += $(call cc-disable-warning, unused-value)
-warning-1 += $(call cc-disable-warning, format)
-warning-1 += $(call cc-disable-warning, unknown-warning-option)
-warning-1 += $(call cc-disable-warning, sign-compare)
-warning-1 += $(call cc-disable-warning, format-zero-length)
-warning-1 += $(call cc-disable-warning, uninitialized)
-warning-1 += $(call cc-option, -fcatch-undefined-behavior)
-
 warning-2 := -Waggregate-return
 warning-2 += -Wcast-align
 warning-2 += -Wdisabled-optimization
@@ -64,4 +54,16 @@ ifeq ("$(strip $(warning))","")
 endif

 KBUILD_CFLAGS += $(warning)
+else
+
+ifeq ($(COMPILER),clang)
+KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
+KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
+KBUILD_CFLAGS += $(call cc-disable-warning, format)
+KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
+KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
+KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
+KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
 endif
+endif
+
--
1.9.1
Re: [PATCH v2] kbuild, LLVMLinux: Supress warnings unless W=1-3
On 07/31/14 13:46, Michal Marek wrote:
> On 31.7.2014 18:12, Behan Webster wrote:
>> On 07/31/14 01:18, Michal Marek wrote:
>>> On 31.7.2014 06:16, beh...@converseincode.com wrote:
>>>> @@ -55,6 +45,18 @@ warning-3 += -Wswitch-default
>>>>  warning-3 += $(call cc-option, -Wpacked-bitfield-compat)
>>>>  warning-3 += $(call cc-option, -Wvla)
>>>>
>>>> +ifeq ($(COMPILER),clang)
>>>> +ifndef $(W)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, format)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
>>>> +KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
>>>> +endif
>>>> +endif
>>>> +
>>> Please remove this part, it has no effect. I assume that if it works for
>>> you, these warning are not as annoying so they do not need to be disabled?
>> Actually they are annoying, that's why they're disabled normally. Most
>> of them complain about practices which are relatively common in kernel code.
>>
>> clang warns about a lot more things than gcc does. It means that code
>> which compiles cleanly in gcc often doesn't with clang. This cuts out
>> the warnings which are unlikely to to be fixed in kernel code anytime
>> soon, but which are probably worth exposing when W=1 is used.
>>
>> This part of the patch explicitly deals with complaints from some in the
>> kernel community that clang is too noisy with kernel code.
>>
>> This part of the patch needs to be somewhere. This seemed the best place.
> You placed it inside a branch that is only evaluated when W= is given.

Hmm. You're right. Will fix.

Behan

--
Behan Webster
beh...@converseincode.com
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
> On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > > On Tue, 29 Jul 2014 13:24:05 +0800
> > > Aaron Lu wrote:
> > >
> > > > FYI, we noticed the below changes on
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure task_numa_migrate() checks the preferred node")
> > > >
> > > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > > ---------------  -------------------------
> > > >      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > > >      67745 ~ 4%     +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > > >     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> > >
> > > Hi Aaron,
> > >
> > > Jirka Hladky has reported a regression with that changeset as
> > > well, and I have already spent some time debugging the issue.
> >
> > So assuming those numbers above are the difference in
>
> Yes, they are.
>
> It means, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 for ivb42 machine and 67745 for lkp-snb01
> machine. The 3%, 4% following that number means the deviation of the
> different runs to their average(we usually run it multiple times to
> phase out possible sharp values). We should probably remove that
> percentage, as they cause confusion if no detailed explanation and may
> not mean much to the commit author and others(if the deviation is big
> enough, we should simply drop that result).
>
> The percentage in the middle is the change between the two commits.
>
> Another thing is the meaning of the numbers, it doesn't seem that
> evident they are for proc-vmstat.numa_hint_faults_local. Maybe something
> like this is better?

Instead of removing info, why not document what each piece of data represents? Or add headers to the table, etc.

Thanks,
Davidlohr
[GIT PULL] ARM: Straggler SoC fix for 3.16
Hi Linus,

The following changes since commit a1ae5b128365f36a3fa2143cfa9de14fc71c51d8:

  Merge tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes (2014-07-29 13:04:27 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/fixes-for-linus

for you to fetch changes up to b779b88df8370feafa6f49d08a2cc88db4834992:

  MAINTAINERS: Update Tegra Git URL (2014-07-30 12:50:54 -0700)

ARM: Straggler SoC fix for 3.16

A DT bugfix for Nomadik that had an ambiguous double-inversion of a gpio line, and one MAINTAINER URL update that might as well go in now.

We could hold off until the merge window, but then we'll just have to mark the DT fix for stable and it just seems like in total causing more work.

Andreas Färber (1):
      MAINTAINERS: Update Tegra Git URL

Linus Walleij (1):
      ARM: nomadik: fix up double inversion in DT

 MAINTAINERS                                | 2 +-
 arch/arm/boot/dts/ste-nomadik-s8815.dts    | 2 +-
 arch/arm/boot/dts/ste-nomadik-stn8815.dtsi | 7 ---
 3 files changed, 6 insertions(+), 5 deletions(-)
Help with btrfs_zero_range function
Hey Guys,

I need to ask a question again. I am writing the above function and basing it on the one for punch hole. I have only started writing the function and have a few questions about how to write it. Below this message are my questions so far, and I am also posting the code I have written in case you guys want to give any feedback.

Regards and Thanks Again,
Nick

Questions

1. bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
   How do I change this to check for a zero range, or do I just remove this variable?

2. ret = find_first_non_hole(inode, , );
   How do I modify the called function so that ret is for a zero range?

The other parts of this function are pretty similar to the one for punch hole, and it seems pretty easy to move over the other parts.

Code

static long btrfs_zero_range(struct inode *inode, loff_t offset, loff_t len)
{
	struct btrfs_root *root = BTRFS_I(inode)->root;
	struct btrfs_path *path;
	struct btrfs_block_rsv *rsv;
	struct btrfs_trans_handle *trans;
	u64 lockstart;
	u64 lockend;
	u64 tail_start;
	u64 tail_len;
	u64 orig_start = offset;
	u64 cur_offset;
	u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
	u64 drop_end;
	int ret = 0;
	int err = 0;
	int rsv_count;
	bool same_page;
	bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
	u64 ino_size;

	ret = btrfs_wait_ordered_range(inode, offset, len);
	if (ret)
		return ret;
Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average
Hi Vincent,

On Thu, Jul 31, 2014 at 11:56:13AM +0200, Vincent Guittot wrote:
>
> load_sum is now the average runnable time before being weighted

So when weight changes, load_avg will completely use new weight. I have some cents:

1) Task does not change weight much, so it is practically ok

2) Group entity does change weight much, and very likely back and forth, so I really think keeping the intact history will make everything more predictable/stable, prevent thrashing, etc.

3) If you do the same for cfs_rq->load.weight, then we simply abandoned blocked entities, and all states won't compute. So we then need to maintain blocked load average again, and we just can't do cfs_rq load average as a whole anymore, but must update at the granularity of an entity...

Anyway, it does not seem to me you really need to change load_sum, no? So could you please not change it?

> The sum of usage_sum of the tasks that are on a rq, is used to detect
> the overload of a rq.

I think you only need usage_sum for task and rq, but not cfs_rq. Others are ok.

> Does something like the patch below to be applied of top of your patchset, seem
> reasonable add-on?
>

If you only add running statistics, I am all good, and indeed reasonable if you can make good use of it. I am not at all against adding anything or adding running average or unweighted anything...

Thanks,
Yuyang
Re: [PATCH v4 4/4] ARM: dts: Add exynos5250-spring device tree
On 31.07.2014 21:05, Tomasz Figa wrote:
>> +
>> +_2 {
>> +	status = "okay";
>> +	samsung,i2c-sda-delay = <100>;
>> +	samsung,i2c-max-bus-freq = <66000>;
>> +
>> +	hdmiddc@50 {
>> +		compatible = "samsung,exynos4210-hdmiddc";
>> +		reg = <0x50>;
>> +	};
>
> I don't think this matches current Exynos HDMI bindings, which I believe
> have been changed to just take a phandle to i2c bus instead.

Looks correct to me:

http://git.kernel.org/cgit/linux/kernel/git/kgene/linux-samsung.git/tree/Documentation/devicetree/bindings/video/exynos_hdmiddc.txt?h=for-next

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Documentation/devicetree/bindings/video/exynos_hdmiddc.txt

Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [PATCH] ARM/ARM64: don't enter kgdb when userspace executes a kgdb break instruction.
Hi, Will,

On Thu, Jul 31, 2014 at 11:46:53AM +0100, Will Deacon wrote:
> I'll merge the arm64 diff I proposed. Could you repost the ARM part please?

I've just reposted it, hopefully we can get that merged in soon as well.

> I think enabling and activating kgdb by default is a pretty crazy thing to
> do, but I agree that we shouldn't allow userspace to trap into it either.

Agreed, hopefully the actual footprint of this bug isn't too large, but it's worth fixing anyways.

> Once you repost the ARM patches, we can look at getting them merged via rmk.
>
> Cheers,
>
> Will

Thanks again!

Omar
[PATCH] ARM: don't enter kgdb when userspace executes a kgdb break instruction.
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn) should only be entered when a kgdb break instruction is executed from the kernel. Otherwise, if kgdb is enabled, a userspace program can cause the kernel to drop into the debugger by executing either KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Signed-off-by: Omar Sandoval
---
On a kernel running with kgdb enabled, this program reproduces the problem:

.globl _start
_start:
	udf #65006	@ KGDB_BREAKINST

The same problem has been fixed in ARM64.

 arch/arm/kernel/kgdb.c | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index 778c2f7..a74b53c 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -160,12 +160,16 @@ static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int instr)

 static struct undef_hook kgdb_brkpt_hook = {
 	.instr_mask = 0x,
 	.instr_val = KGDB_BREAKINST,
+	.cpsr_mask = MODE_MASK,
+	.cpsr_val = SVC_MODE,
 	.fn = kgdb_brk_fn
 };

 static struct undef_hook kgdb_compiled_brkpt_hook = {
 	.instr_mask = 0x,
 	.instr_val = KGDB_COMPILED_BREAK,
+	.cpsr_mask = MODE_MASK,
+	.cpsr_val = SVC_MODE,
 	.fn = kgdb_compiled_brk_fn
 };

--
2.0.3
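To make the effect of the two added fields concrete, here is a minimal userspace sketch of how a hook with an instruction mask/value and a CPSR mask/value could be matched. It is an illustration, not the kernel's dispatch code: struct undef_hook_sketch and hook_matches are hypothetical names, and the numeric values (a full 32-bit instruction mask, 0xe7ffdefe as the encoding of `udf #65006`, 0x1f/0x10/0x13 for the mode mask, user mode and SVC mode) are assumptions chosen to mirror the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; check the kernel headers for the real ones. */
#define MODE_MASK	0x1fu		/* low CPSR bits select the processor mode */
#define USR_MODE	0x10u
#define SVC_MODE	0x13u
#define KGDB_BREAKINST	0xe7ffdefeu	/* encoding of "udf #65006" */

/* Hypothetical stand-in for the matching fields of struct undef_hook. */
struct undef_hook_sketch {
	uint32_t instr_mask;
	uint32_t instr_val;
	uint32_t cpsr_mask;
	uint32_t cpsr_val;
};

/*
 * A hook applies only if both the instruction and the CPSR of the
 * faulting context match. With cpsr_mask == 0 and cpsr_val == 0 the
 * CPSR test always passes, so the hook fires from any mode.
 */
static bool hook_matches(const struct undef_hook_sketch *h,
			 uint32_t instr, uint32_t cpsr)
{
	return (instr & h->instr_mask) == h->instr_val &&
	       (cpsr & h->cpsr_mask) == h->cpsr_val;
}
```

With the CPSR fields left at zero, a break instruction executed in user mode matches; with cpsr_mask = MODE_MASK and cpsr_val = SVC_MODE, the same instruction matches only when the CPSR mode bits say SVC.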
[PATCH] iovec: make sure the caller actually wants anything in memcpy_fromiovecend
Check for cases when the caller requests 0 bytes instead of running off and dereferencing potentially invalid iovecs.

Signed-off-by: Sasha Levin
---
 lib/iovec.c | 4
 1 file changed, 4 insertions(+)

diff --git a/lib/iovec.c b/lib/iovec.c
index 7a7c2da..df3abd1 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -85,6 +85,10 @@ EXPORT_SYMBOL(memcpy_toiovecend);
 int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
 			int offset, int len)
 {
+	/* No data? Done! */
+	if (len == 0)
+		return 0;
+
 	/* Skip over the finished iovecs */
 	while (offset >= iov->iov_len) {
 		offset -= iov->iov_len;
--
1.7.10.4
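The reason the early return matters can be seen in a small userspace sketch. This is an assumption-laden illustration, not the kernel function: copy_from_iovec_end is a hypothetical name, it uses memcpy instead of a user-space copy helper, and it takes size_t arguments. The point is the skip loop: when offset equals the total length of all iovecs and len is 0, the loop would step past the last array element before noticing there is nothing to copy.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical userspace analogue: copy len bytes into dst, starting
 * offset bytes into the data described by the iovec array.
 */
static int copy_from_iovec_end(unsigned char *dst, const struct iovec *iov,
			       size_t offset, size_t len)
{
	/* Nothing requested: return before touching any iovec. */
	if (len == 0)
		return 0;

	/* Skip over the iovecs that lie entirely before offset. */
	while (offset >= iov->iov_len) {
		offset -= iov->iov_len;
		iov++;
	}

	/* Copy from the remaining iovecs until len is satisfied. */
	while (len > 0) {
		size_t avail = iov->iov_len - offset;
		size_t n = len < avail ? len : avail;

		memcpy(dst, (const unsigned char *)iov->iov_base + offset, n);
		dst += n;
		len -= n;
		offset = 0;
		iov++;
	}
	return 0;
}
```

For two 3-byte iovecs, a call with offset 6 and len 0 returns immediately; without the guard, the skip loop would read the length of a third, nonexistent entry.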
Re: [PATCH RFC tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On Thu, 2014-07-31 at 09:38 -0700, Paul E. McKenney wrote:

> Does building with CONFIG_NO_HZ_FULL_SYSIDLE=y slow things down even more?
> If so, that would give me a rough idea of the cost of RCU's dyntick-idle
> handling.

Nope. Deltas are all down in the statistical frog hair.

-Mike
Re: [PATCH] ACPI/Processor: Add CPU_STARTING_FROZEN check in the acpi_cpu_soft_notify()
On 2014-08-01 05:20, Rafael J. Wysocki wrote:
> On Thursday, July 31, 2014 05:20:26 PM Lan Tianyu wrote:
>> The callback of CPU_STARTING event can't sleep and so acpi_cpu_soft_notify()
>> return directly when CPU_STARTING event is triggered. But cpu hotplug also
>> happens during S2RAM. The action will become CPU_STARTING_FROZEN. This
>> patch is to fix missing check the frozen event.
>>
>> Signed-off-by: Lan Tianyu
>
> There is work to restructure the handling of CPU_TASKS_FROZEN under way
> and Chen Gong is driving it. That's likely to conflict with the last
> two patches from you. Can you please coordinate with Gong?

Hi Rafael:

Thanks for the reminder. I just checked Chen Gong's patchset "Gloabl CPU Hot-plug flag _FROZEN Clean up". There is no conflict between our patches. Gong's patch removes the following macros:

CPU_ONLINE_FROZEN
CPU_UP_PREPARE_FROZEN
CPU_UP_CANCELED_FROZEN
CPU_DOWN_PREPARE_FROZEN
CPU_DOWN_FAILED_FROZEN
CPU_DEAD_FROZEN
CPU_DYING_FROZEN
CPU_STARTING_FROZEN

CPU_TASKS_FROZEN is still available and the CPU events during S2RAM are still (CPU_xxx | CPU_TASKS_FROZEN).

BTW, in my opinion this is a bug fix and it should be backported to the stable tree.

>
> Rafael
>
>
>> ---
>>  drivers/acpi/processor_driver.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>> index 4fcbd67..66e2249 100644
>> --- a/drivers/acpi/processor_driver.c
>> +++ b/drivers/acpi/processor_driver.c
>> @@ -125,7 +125,7 @@ static int acpi_cpu_soft_notify(struct notifier_block *nfb,
>>  	 * CPU_STARTING and CPU_DYING must not sleep. Return here since
>>  	 * acpi_bus_get_device() may sleep.
>>  	 */
>> -	if (action == CPU_STARTING || action == CPU_DYING)
>> +	if ((action & ~CPU_TASKS_FROZEN) == CPU_STARTING || action == CPU_DYING)
>>  		return NOTIFY_DONE;
>>
>>  	if (!pr || acpi_bus_get_device(pr->handle, ))
>>
>

--
Best regards
Tianyu Lan
[PATCH V9 2/6] Documentation: power: reset: Add documentation for generic SYSCON reboot driver
Add documentation for generic SYSCON reboot driver.

Signed-off-by: Feng Kan
---
 .../bindings/power/reset/syscon-reboot.txt | 23 ++
 1 file changed, 23 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/reset/syscon-reboot.txt

diff --git a/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt b/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
new file mode 100644
index 000..1190631
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
@@ -0,0 +1,23 @@
+Generic SYSCON mapped register reset driver
+
+This is a generic reset driver using syscon to map the reset register.
+The reset is generally performed with a write to the reset register
+defined by the register map pointed by syscon reference plus the offset
+with the mask defined in the reboot node.
+
+Required properties:
+- compatible: should contain "syscon-reboot"
+- regmap: this is phandle to the register map node
+- offset: offset in the register map for the reboot register (in bytes)
+- mask: the reset value written to the reboot register (32 bit access)
+
+Default will be little endian mode, 32 bit access only.
+
+Examples:
+
+	reboot {
+		compatible = "syscon-reboot";
+		regmap = <>;
+		offset = <0x0>;
+		mask = <0x1>;
+	};
--
1.9.1
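The binding above boils down to one 32-bit write: the value in `mask` goes to the register at `offset` bytes into the map that `regmap` points to. The following userspace sketch models that with an ordinary array standing in for the register block. It is an illustration under stated assumptions, not the driver from this series: struct syscon_reboot_sketch and syscon_reboot_trigger are hypothetical names, and a real driver would go through the kernel's regmap interface rather than a raw pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three properties in the reboot node. */
struct syscon_reboot_sketch {
	volatile uint32_t *regmap;	/* base of the syscon register map */
	uint32_t offset;		/* byte offset of the reboot register */
	uint32_t mask;			/* value written to trigger the reset */
};

/* Perform the single 32-bit write that the binding describes. */
static void syscon_reboot_trigger(const struct syscon_reboot_sketch *ctx)
{
	ctx->regmap[ctx->offset / sizeof(uint32_t)] = ctx->mask;
}
```

With offset = 0x14 and mask = 0x1, as in the X-Gene node later in this series, the write lands in the sixth 32-bit register of the block and leaves the others untouched.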
[PATCH V9 0/6] Add X-Gene platform reboot mechanism
Enable reboot driver for the X-Gene platform. Add generic syscon reboot driver.

V9 Change:
	- Rebase on Guenter Roeck's V5 reset handler patch set. This allows a generic reset to be called rather than the arm specific reset handler.

V8 Change:
	- Change Kconfig to depend on ARM || ARM64 || COMPILE_TEST

V7 Change:
	- It seems that from V3 on, the patches were not making it to the mailing list.
	- Fix build error produced by other ARCH while including this driver. Set to depend on arm64 ARCH for now.

V6 Change:
	- Add documentation for scu node.

V5 Change:
	- Documentation update, endian and access size.

V4 Change:
	- Remove old X-Gene reboot driver
	- Add generic syscon reboot driver
	- Add DTS and Kconfig for X-Gene reboot using syscon method

V3 Change:
	- Remove the reboot driver's use of acpi resource patch.
	- Change the reboot driver to use syscon to parse out system clock register. Remove the old method of getting register from the reboot driver directly.
	- Remove documentation since it's now simple.

V2 Change:
	- Add support for using ACPI resource.

Feng Kan (6):
  power: reset: Add generic SYSCON register mapped reset
  Documentation: power: reset: Add documentation for generic SYSCON reboot driver
  Documentation: arm64: add SCU dts binding documentation to linux kernel
  arm64: dts: Add X-Gene reboot driver dts node
  arm64: Select reboot driver for X-Gene platform
  power: reset: Remove X-Gene reboot driver

 Documentation/devicetree/bindings/arm/apm/scu.txt  |  17
 .../bindings/power/reset/syscon-reboot.txt         |  23 +
 arch/arm64/Kconfig                                 |   2 +
 arch/arm64/boot/dts/apm-storm.dtsi                 |  12 +++
 drivers/power/reset/Kconfig                        |  12 +--
 drivers/power/reset/Makefile                       |   2 +-
 drivers/power/reset/syscon-reboot.c                |  98
 drivers/power/reset/xgene-reboot.c                 | 103 -
 8 files changed, 158 insertions(+), 111 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/apm/scu.txt
 create mode 100644 Documentation/devicetree/bindings/power/reset/syscon-reboot.txt
 create mode 100644 drivers/power/reset/syscon-reboot.c
 delete mode 100644 drivers/power/reset/xgene-reboot.c

--
1.9.1
[PATCH V9 4/6] arm64: dts: Add X-Gene reboot driver dts node
Add X-Gene platform reboot driver dts node.

Signed-off-by: Feng Kan
---
 arch/arm64/boot/dts/apm-storm.dtsi | 12
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index ccd150a..3dfd1f4 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -103,6 +103,11 @@
 		#size-cells = <2>;
 		ranges;

+		scu: system-clk-controller@1700 {
+			compatible = "apm,xgene-scu","syscon";
+			reg = <0x0 0x1700 0x0 0x400>;
+		};
+
 		clocks {
 			#address-cells = <2>;
 			#size-cells = <2>;
@@ -397,5 +402,12 @@
 				#clock-cells = <1>;
 				clocks = < 0>;
 			};
+
+		reboot: reboot@1714 {
+			compatible = "syscon-reboot";
+			regmap = <>;
+			offset = <0x14>;
+			mask = <0x1>;
+		};
 	};
 };
--
1.9.1
[PATCH V9 3/6] Documentation: arm64: add SCU dts binding documentation to linux kernel
This adds documentation for the SCU system clock unit device tree binding to the kernel.

Signed-off-by: Feng Kan
---
 Documentation/devicetree/bindings/arm/apm/scu.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/apm/scu.txt

diff --git a/Documentation/devicetree/bindings/arm/apm/scu.txt b/Documentation/devicetree/bindings/arm/apm/scu.txt
new file mode 100644
index 000..b45be06
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/apm/scu.txt
@@ -0,0 +1,17 @@
+APM X-GENE SoC series SCU Registers
+
+This system clock unit contain various register that control block resets,
+clock enable/disables, clock divisors and other deepsleep registers.
+
+Properties:
+ - compatible : should contain two values. First value must be:
+	- "apm,xgene-scu"
+	second value must be always "syscon".
+
+ - reg : offset and length of the register set.
+
+Example :
+	scu: system-clk-controller@1700 {
+		compatible = "apm,xgene-scu","syscon";
+		reg = <0x0 0x1700 0x0 0x400>;
+	};
--
1.9.1
[PATCH V9 5/6] arm64: Select reboot driver for X-Gene platform
Select reboot driver for X-Gene platform.

Signed-off-by: Feng Kan
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 839f48c..df6a646 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -141,6 +141,8 @@ config ARCH_VEXPRESS

 config ARCH_XGENE
 	bool "AppliedMicro X-Gene SOC Family"
+	select MFD_SYSCON
+	select POWER_RESET_SYSCON
 	help
 	  This enables support for AppliedMicro X-Gene SOC Family

--
1.9.1
[PATCH V9 6/6] power: reset: Remove X-Gene reboot driver
Remove X-Gene reboot driver. Signed-off-by: Feng Kan --- drivers/power/reset/Kconfig| 7 --- drivers/power/reset/Makefile | 1 - drivers/power/reset/xgene-reboot.c | 103 - 3 files changed, 111 deletions(-) delete mode 100644 drivers/power/reset/xgene-reboot.c diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig index fd5f9e5..48b3cb3 100644 --- a/drivers/power/reset/Kconfig +++ b/drivers/power/reset/Kconfig @@ -66,13 +66,6 @@ config POWER_RESET_VEXPRESS Power off and reset support for the ARM Ltd. Versatile Express boards. -config POWER_RESET_XGENE - bool "APM SoC X-Gene reset driver" - depends on ARM64 - depends on POWER_RESET - help - Reboot support for the APM SoC X-Gene Eval boards. - config POWER_RESET_KEYSTONE bool "Keystone reset driver" depends on ARCH_KEYSTONE diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile index b1b5ab3..62088d8 100644 --- a/drivers/power/reset/Makefile +++ b/drivers/power/reset/Makefile @@ -6,6 +6,5 @@ obj-$(CONFIG_POWER_RESET_QNAP) += qnap-poweroff.o obj-$(CONFIG_POWER_RESET_RESTART) += restart-poweroff.o obj-$(CONFIG_POWER_RESET_SUN6I) += sun6i-reboot.o obj-$(CONFIG_POWER_RESET_VEXPRESS) += vexpress-poweroff.o -obj-$(CONFIG_POWER_RESET_XGENE) += xgene-reboot.o obj-$(CONFIG_POWER_RESET_KEYSTONE) += keystone-reset.o obj-$(CONFIG_POWER_RESET_SYSCON) += syscon-reboot.o diff --git a/drivers/power/reset/xgene-reboot.c b/drivers/power/reset/xgene-reboot.c deleted file mode 100644 index ecd55f8..000 --- a/drivers/power/reset/xgene-reboot.c +++ /dev/null @@ -1,103 +0,0 @@ -/* - * AppliedMicro X-Gene SoC Reboot Driver - * - * Copyright (c) 2013, Applied Micro Circuits Corporation - * Author: Feng Kan - * Author: Loc Ho - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation; either version 2 of - * the License, or (at your option) any later version. 
- * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, - * MA 02111-1307 USA - * - * This driver provides system reboot functionality for APM X-Gene SoC. - * For system shutdown, this is board specify. If a board designer - * implements GPIO shutdown, use the gpio-poweroff.c driver. - */ -#include -#include -#include -#include -#include -#include -#include - -struct xgene_reboot_context { - struct platform_device *pdev; - void *csr; - u32 mask; -}; - -static struct xgene_reboot_context *xgene_restart_ctx; - -static void xgene_restart(char str, const char *cmd) -{ - struct xgene_reboot_context *ctx = xgene_restart_ctx; - unsigned long timeout; - - /* Issue the reboot */ - if (ctx) - writel(ctx->mask, ctx->csr); - - timeout = jiffies + HZ; - while (time_before(jiffies, timeout)) - cpu_relax(); - - dev_emerg(>pdev->dev, "Unable to restart system\n"); -} - -static int xgene_reboot_probe(struct platform_device *pdev) -{ - struct xgene_reboot_context *ctx; - - ctx = devm_kzalloc(>dev, sizeof(*ctx), GFP_KERNEL); - if (!ctx) { - dev_err(>dev, "out of memory for context\n"); - return -ENODEV; - } - - ctx->csr = of_iomap(pdev->dev.of_node, 0); - if (!ctx->csr) { - devm_kfree(>dev, ctx); - dev_err(>dev, "can not map resource\n"); - return -ENODEV; - } - - if (of_property_read_u32(pdev->dev.of_node, "mask", >mask)) - ctx->mask = 0x; - - ctx->pdev = pdev; - arm_pm_restart = xgene_restart; - xgene_restart_ctx = ctx; - - return 0; -} - -static struct of_device_id xgene_reboot_of_match[] = { - { .compatible = "apm,xgene-reboot" }, - {} -}; - -static struct platform_driver xgene_reboot_driver = 
{ - .probe = xgene_reboot_probe, - .driver = { - .name = "xgene-reboot", - .of_match_table = xgene_reboot_of_match, - }, -}; - -static int __init xgene_reboot_init(void) -{ - return platform_driver_register(_reboot_driver); -} -device_initcall(xgene_reboot_init); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V9 1/6] power: reset: Add generic SYSCON register mapped reset
Add a generic SYSCON register mapped reset mechanism.

Signed-off-by: Feng Kan
---
 drivers/power/reset/Kconfig         |  5 ++
 drivers/power/reset/Makefile        |  1 +
 drivers/power/reset/syscon-reboot.c | 98 +
 3 files changed, 104 insertions(+)
 create mode 100644 drivers/power/reset/syscon-reboot.c

diff --git a/drivers/power/reset/Kconfig b/drivers/power/reset/Kconfig
index bdcf517..fd5f9e5 100644
--- a/drivers/power/reset/Kconfig
+++ b/drivers/power/reset/Kconfig
@@ -80,3 +80,8 @@ config POWER_RESET_KEYSTONE
 	help
 	  Reboot support for the KEYSTONE SoCs.

+config POWER_RESET_SYSCON
+	bool "Generic SYSCON regmap reset driver"
+	depends on POWER_RESET && MFD_SYSCON && OF
+	help
+	  Reboot support for generic SYSCON mapped register reset.
diff --git a/drivers/power/reset/Makefile b/drivers/power/reset/Makefile
index dde2e8b..b1b5ab3 100644
--- a/drivers/power/reset/Makefile
+++ b/drivers/power/reset/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_POWER_RESET_SUN6I) += sun6i-reboot.o
 obj-$(CONFIG_POWER_RESET_VEXPRESS) += vexpress-poweroff.o
 obj-$(CONFIG_POWER_RESET_XGENE) += xgene-reboot.o
 obj-$(CONFIG_POWER_RESET_KEYSTONE) += keystone-reset.o
+obj-$(CONFIG_POWER_RESET_SYSCON) += syscon-reboot.o
diff --git a/drivers/power/reset/syscon-reboot.c b/drivers/power/reset/syscon-reboot.c
new file mode 100644
index 0000000..75d6eae
--- /dev/null
+++ b/drivers/power/reset/syscon-reboot.c
@@ -0,0 +1,98 @@
+/*
+ * Generic Syscon Reboot Driver
+ *
+ * Copyright (c) 2013, Applied Micro Circuits Corporation
+ * Author: Feng Kan
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct syscon_reboot_context {
+	struct regmap *map;
+	u32 offset;
+	u32 mask;
+	struct notifier_block restart_handler;
+};
+
+static struct syscon_reboot_context *syscon_reboot_ctx;
+
+static int syscon_restart_handle(struct notifier_block *this,
+					unsigned long mode, void *cmd)
+{
+	struct syscon_reboot_context *ctx = syscon_reboot_ctx;
+	unsigned long timeout;
+
+	/* Issue the reboot */
+	if (ctx->map)
+		regmap_write(ctx->map, ctx->offset, ctx->mask);
+
+	timeout = jiffies + HZ;
+	while (time_before(jiffies, timeout))
+		cpu_relax();
+
+	pr_emerg("Unable to restart system\n");
+	return NOTIFY_DONE;
+}
+
+static int syscon_reboot_probe(struct platform_device *pdev)
+{
+	struct syscon_reboot_context *ctx;
+	struct device *dev = &pdev->dev;
+	int err;
+
+	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx) {
+		dev_err(&pdev->dev, "out of memory for context\n");
+		return -ENOMEM;
+	}
+
+	ctx->map = syscon_regmap_lookup_by_phandle(dev->of_node, "regmap");
+	if (IS_ERR(ctx->map))
+		return PTR_ERR(ctx->map);
+
+	if (of_property_read_u32(pdev->dev.of_node, "offset", &ctx->offset))
+		return -EINVAL;
+
+	if (of_property_read_u32(pdev->dev.of_node, "mask", &ctx->mask))
+		return -EINVAL;
+
+	ctx->restart_handler.notifier_call = syscon_restart_handle;
+	ctx->restart_handler.priority = 128;
+	err = register_restart_handler(&ctx->restart_handler);
+	if (err)
+		dev_err(dev, "can't register restart notifier (err=%d)\n", err);
+
+	syscon_reboot_ctx = ctx;
+
+	return 0;
+}
+
+static struct of_device_id syscon_reboot_of_match[] = {
+	{ .compatible = "syscon-reboot" },
+	{}
+};
+
+static struct platform_driver syscon_reboot_driver = {
+	.probe = syscon_reboot_probe,
+	.driver = {
+		.name = "syscon-reboot",
+		.of_match_table = syscon_reboot_of_match,
+	},
+};
+module_platform_driver(syscon_reboot_driver);
--
1.9.1
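For readers following the series, the restart path of the syscon-reboot driver boils down to a single register write: the value of the "mask" property is stored at byte "offset" inside the syscon block. The sketch below is a userspace model of that one step only, not the kernel code. A plain array stands in for the regmap, and the names fake_syscon_reboot and fake_syscon_restart are made up for illustration. The bounds check is an assumption of the model, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver context: an array replaces the regmap. */
struct fake_syscon_reboot {
	volatile uint32_t *regs; /* base of the simulated syscon register block */
	size_t size;             /* size of that block in bytes */
	uint32_t offset;         /* byte offset of the reset register ("offset") */
	uint32_t mask;           /* value written to request the reset ("mask") */
};

/*
 * Model of regmap_write(map, offset, mask) as used by the restart handler:
 * store the mask at the given byte offset. Returns 0 on success, or -1 if
 * the offset is unaligned or outside the block (a check added for the model).
 */
int fake_syscon_restart(struct fake_syscon_reboot *ctx)
{
	if (ctx->offset % 4 != 0 || ctx->offset + 4 > ctx->size)
		return -1;
	ctx->regs[ctx->offset / 4] = ctx->mask;
	return 0;
}
```

With the values from the X-Gene dts node (a 0x400-byte block, offset 0x14, mask 0x1), the model sets word 5 of the block to 1. In the real driver the hardware is then expected to reset, and the one-second busy-wait only runs out if it does not.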
[PATCH] ARM: l2x0: fix build warning without CONFIG_OF
Commit cf9ea8f13(ARM: l2c: remove obsolete l2x0 ops for non-OF init) remove some obsolete l2x0 ops, the rest of ops: l2x0_cache_sync, l2x0_cache_sync, l2x0_disable only use under OF enable, so move them into OF part, or "defined but not used" warning occurs. Signed-off-by: Kefeng Wang --- arch/arm/mm/cache-l2x0.c | 134 +++ 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index 7c3fb41..77e57b0 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -135,73 +135,6 @@ static void l2c_disable(void) dsb(st); } -#ifdef CONFIG_CACHE_PL310 -static inline void cache_wait(void __iomem *reg, unsigned long mask) -{ - /* cache operations by line are atomic on PL310 */ -} -#else -#define cache_wait l2c_wait_mask -#endif - -static inline void cache_sync(void) -{ - void __iomem *base = l2x0_base; - - writel_relaxed(0, base + sync_reg_offset); - cache_wait(base + L2X0_CACHE_SYNC, 1); -} - -#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915) -static inline void debug_writel(unsigned long val) -{ - l2c_set_debug(l2x0_base, val); -} -#else -/* Optimised out for non-errata case */ -static inline void debug_writel(unsigned long val) -{ -} -#endif - -static void l2x0_cache_sync(void) -{ - unsigned long flags; - - raw_spin_lock_irqsave(_lock, flags); - cache_sync(); - raw_spin_unlock_irqrestore(_lock, flags); -} - -static void __l2x0_flush_all(void) -{ - debug_writel(0x03); - __l2c_op_way(l2x0_base + L2X0_CLEAN_INV_WAY); - cache_sync(); - debug_writel(0x00); -} - -static void l2x0_flush_all(void) -{ - unsigned long flags; - - /* clean all ways */ - raw_spin_lock_irqsave(_lock, flags); - __l2x0_flush_all(); - raw_spin_unlock_irqrestore(_lock, flags); -} - -static void l2x0_disable(void) -{ - unsigned long flags; - - raw_spin_lock_irqsave(_lock, flags); - __l2x0_flush_all(); - l2c_write_sec(0, l2x0_base, L2X0_CTRL); - dsb(st); - raw_spin_unlock_irqrestore(_lock, flags); -} - static 
void l2c_save(void __iomem *base) { l2x0_saved_regs.aux_ctrl = readl_relaxed(l2x0_base + L2X0_AUX_CTRL); @@ -945,6 +878,73 @@ static int l2_wt_override; * pass it though the device tree */ static u32 cache_id_part_number_from_dt; +#ifdef CONFIG_CACHE_PL310 +static inline void cache_wait(void __iomem *reg, unsigned long mask) +{ + /* cache operations by line are atomic on PL310 */ +} +#else +#define cache_wait l2c_wait_mask +#endif + +static inline void cache_sync(void) +{ + void __iomem *base = l2x0_base; + + writel_relaxed(0, base + sync_reg_offset); + cache_wait(base + L2X0_CACHE_SYNC, 1); +} + +#if defined(CONFIG_PL310_ERRATA_588369) || defined(CONFIG_PL310_ERRATA_727915) +static inline void debug_writel(unsigned long val) +{ + l2c_set_debug(l2x0_base, val); +} +#else +/* Optimised out for non-errata case */ +static inline void debug_writel(unsigned long val) +{ +} +#endif + +static void l2x0_cache_sync(void) +{ + unsigned long flags; + + raw_spin_lock_irqsave(_lock, flags); + cache_sync(); + raw_spin_unlock_irqrestore(_lock, flags); +} + +static void __l2x0_flush_all(void) +{ + debug_writel(0x03); + __l2c_op_way(l2x0_base + L2X0_CLEAN_INV_WAY); + cache_sync(); + debug_writel(0x00); +} + +static void l2x0_flush_all(void) +{ + unsigned long flags; + + /* clean all ways */ + raw_spin_lock_irqsave(_lock, flags); + __l2x0_flush_all(); + raw_spin_unlock_irqrestore(_lock, flags); +} + +static void l2x0_disable(void) +{ + unsigned long flags; + + raw_spin_lock_irqsave(_lock, flags); + __l2x0_flush_all(); + l2c_write_sec(0, l2x0_base, L2X0_CTRL); + dsb(st); + raw_spin_unlock_irqrestore(_lock, flags); +} + static void __init l2x0_of_parse(const struct device_node *np, u32 *aux_val, u32 *aux_mask) { -- 1.7.12.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REVIEW][PATCH 0/4] /proc/thread-self
On Thu, 2014-07-31 at 17:30 -0700, Eric W. Biederman wrote:
> This is small chance changing /proc/net and /proc/mounts will cause
> userspace regressions (although nothing has shown up in my testing) if
> that happens we can just point the change that moves them from
> /proc/self/... to /proc/thread-self/...

Isn't breaking userspace a no-no, no matter what? At least some
util-linux programs make use of both /proc/mounts and /proc/net.
Re: [RFC PATCH 0/2] dirreadahead system call
----- Original Message -----
> From: "NeilBrown"
> To: "Abhi Das"
> Cc: linux-kernel@vger.kernel.org, linux-fsde...@vger.kernel.org,
> cluster-de...@redhat.com
> Sent: Wednesday, July 30, 2014 10:18:05 PM
> Subject: Re: [RFC PATCH 0/2] dirreadahead system call
>
> On Fri, 25 Jul 2014 12:37:29 -0500 Abhi Das wrote:
>
> > This system call takes 3 arguments:
> > fd      - file descriptor of the directory being readahead
> > *offset - offset in dir from which to resume. This is updated
> >           as we move along in the directory
> > count   - The max number of entries to readahead
> >
> > The syscall is supposed to read upto 'count' entries starting at
> > '*offset' and cache the inodes corresponding to those entries. It
> > returns a negative error code or a positive number indicating
> > the number of inodes it has issued readaheads for. It also
> > updates the '*offset' value so that repeated calls to dirreadahead
> > can resume at the right location. Returns 0 when there are no more
> > entries left.
>
> Hi Abhi,
>
> I like the idea of enhanced read-ahead on a directory.
> It isn't clear to me why you have included these particular fields in the
> interface though.
>
> - why have an 'offset'? Why not just use the current offset of the
>   directory 'fd'?

The idea was that we didn't want a syscall like readahead mucking with the
file pointer as the same fd might be used to do getdents().

> - Why have a count? How would a program choose what count to give?

If a program knows that it's only going to work on a subset of files at a
time, it can use the count value to only readahead a small number of inodes
at once instead of reading ahead the entire directory.

That said, this interface is not set in stone and we are exploring ways to
inform the kernel of the inodes we are interested in reading ahead.

>
> Maybe you imagine using 'getdents' first to get a list of names, then
> selectively calling 'dirreadahead' on the offsets of the names you are
> interested it? That would be racy as names can be added and removed which
> might change offsets. So maybe you have another reason?
>
> I would like to suggest an alternate interface (I love playing the API
> game).
>
> 1/ Add a flag to 'fstatat' AT_EXPECT_MORE.
> If the pathname does not contain a '/', then the 'dirfd' is marked
> to indicate that stat information for all names returned by getdents will
> be wanted. The filesystem can choose to optimise that however it sees
> fit.
>
> 2/ Add a flag to 'fstatat' AT_NONBLOCK.
> This tells the filesystem that you want this information, so if it can
> return it immediately it should, and if not it should start pulling it
> into cache. Possibly this should be two flags: AT_NONBLOCK just avoids
> any IO, and AT_ASYNC instigates IO even if NONBLOCK is set.
>
> Then an "ls -l" could use AT_EXPECT_MORE and then just stat each name.
> An "ls -l *.c", might avoid AT_EXPECT_MORE, but would use AT_NONBLOCK
> against all names, then try again with all the names that returned
> EWOULDBLOCK the first time.
>
>
> I would really like to see the 'xstat' syscall too, but there is no point
> having both "xstat" and "fxstat". Follow the model of "fstatat" and provide
> just "fxstatat" which can do both.
> With fxstatat, AT_EXPECT_MORE would tell the dirfd exactly which attributes
> would be wanted so it can fetch only that which is desired.
>
> I'm not very keen on the xgetdents idea of including name information and
> stat information into the one syscall - I would prefer getdents and xstat be
> kept separate. Of course if a genuine performance cost of the separate can
> be demonstrated, I could well change my mind.
>
> It does, however, have the advantage that the kernel doesn't need to worry
> about how long read-ahead data needs to be kept, and the application doesn't
> need to worry about how soon to retry an fstatat which failed with
> EWOULDBLOCK.
>
> Thanks for raising this issue again. I hope it gets fixed one day...
>
> NeilBrown
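The offset/count contract described in the quoted proposal (issue readahead for up to 'count' entries from '*offset', advance '*offset', return the number issued, return 0 at the end) can be sketched as a small userspace model. This is only an illustration of the calling convention under stated assumptions. It is not the proposed syscall: fake_dir, fake_dirreadahead and fake_readahead_all are hypothetical names, the "directory" is just an entry count, and the offset is modelled as an entry index rather than a real directory offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a directory: only the number of entries matters. */
struct fake_dir {
	size_t nentries;   /* entries in the simulated directory */
	size_t prefetched; /* inode readaheads "issued" so far */
};

/*
 * Model of the described contract: issue readahead for up to 'count'
 * entries starting at '*offset', advance '*offset', and return how many
 * were issued. Returns 0 when no entries are left, -1 on bad arguments.
 */
long fake_dirreadahead(struct fake_dir *dir, size_t *offset, size_t count)
{
	size_t left, n;

	if (!dir || !offset || *offset > dir->nentries)
		return -1;
	left = dir->nentries - *offset;
	n = count < left ? count : left;
	dir->prefetched += n;
	*offset += n;
	return (long)n;
}

/* Caller-side loop: keep calling with the same offset until 0 comes back. */
size_t fake_readahead_all(struct fake_dir *dir, size_t batch)
{
	size_t offset = 0, calls = 0;

	while (fake_dirreadahead(dir, &offset, batch) > 0)
		calls++;
	return calls;
}
```

Because the offset lives in the caller rather than in the fd's file position, the same descriptor could still be used for getdents() between calls, which is the reason given above for the separate 'offset' argument.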
Re: [PATCH] lpfc: Avoid to disable pci_dev twice
On 07/17/2014 02:32 PM, Mike Qiu wrote: Hi, all How about this patch ? Any idea ? In IBM Power servers, when hardware error occurs during probe state, EEH subsystem will call driver's error_detected interface, which will call pci_disable_device(). But driver's probe function also call pci_disable_device() in this situation. So pci_dev will be disabled twice: Device lpfc disabling already-disabled device [ cut here ] WARNING: at drivers/pci/pci.c:1407 CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW 3.10.42-2002.pkvm2_1_1.6.ppc64 #1 Workqueue: events .work_for_cpu_fn task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000 NIP: c0471b8c LR: c0471b88 CTR: c043ebe0 REGS: c027d395b650 TRAP: 0700 Tainted: GW (3.10.42-2002.pkvm2_1_1.6.ppc64) MSR: 900100029032 CR: 28b52b44 XER: 2000 CFAR: c0879ab8 SOFTE: 1 ... NIP .pci_disable_device+0xcc/0xe0 LR .pci_disable_device+0xc8/0xe0 Call Trace: .pci_disable_device+0xc8/0xe0 (unreliable) .lpfc_disable_pci_dev+0x50/0x80 [lpfc] .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] .local_pci_probe+0x68/0xb0 .work_for_cpu_fn+0x38/0x60 .process_one_work+0x1a4/0x4d0 .worker_thread+0x37c/0x490 .kthread+0xf0/0x100 .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Mike Qiu --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_init.c | 59 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 434e903..0c7bad9 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -813,6 +813,7 @@ struct lpfc_hba { #define VPD_MASK0xf /* mask for any vpd data */ uint8_t soft_wwn_enable; + uint8_t probe_done; struct timer_list fcp_poll_timer; struct timer_list eratt_poll; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 06f9a5b..c2e67ae 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const struct pci_device_id *pid) } } + /* Set the probe flag */ + 
phba->probe_done = 1; + /* Perform post initialization setup */ lpfc_post_init_setup(phba); @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) { + if (phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2710 PCI channel disable preparing for reset\n"); @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli_disable_intr(phba); - pci_disable_device(phba->pcidev); + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid) goto out_disable_intr; } + /* Set probe_done flag */ + phba->probe_done = 1; + /* Log the current active interrupt mode */ phba->intr_mode = intr_mode; lpfc_log_intr_mode(phba, intr_mode); @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) { + if (!phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2826 PCI channel disable preparing for reset\n"); @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli4_disable_intr(phba); lpfc_sli4_queue_destroy(phba); - pci_disable_device(phba->pcidev); + + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10893,9 +10908,21 @@ static pci_ers_result_t lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; + struct lpfc_hba *phba; pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; + if (!shost) + /* Run here means it may during probe state and +* Scsi_Host has not been created and We can do nothing +* in this state so call for hotplug*/ + return PCI_ERS_RESULT_NONE; + + phba = ((struct lpfc_vport 
*)shost->hostdata)->phba; + + if (!phba || !phba->probe_done) + /* Run here means it may during probe state */ + return PCI_ERS_RESULT_NONE; + switch (phba->pci_dev_grp) { case LPFC_PCI_DEV_LP: rc = lpfc_io_error_detected_s3(pdev, state); @@ -10930,9 +10957,20 @@ static
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On Fri, Aug 01, 2014 at 09:31:37AM +0800, Lai Jiangshan wrote: > On 08/01/2014 05:55 AM, Paul E. McKenney wrote: > > From: "Paul E. McKenney" > > > > This commit adds a new RCU-tasks flavor of RCU, which provides > > call_rcu_tasks(). This RCU flavor's quiescent states are voluntary > > context switch (not preemption!), userspace execution, and the idle loop. > > Note that unlike other RCU flavors, these quiescent states occur in tasks, > > not necessarily CPUs. Includes fixes from Steven Rostedt. > > > > This RCU flavor is assumed to have very infrequent latency-tolerate > > updaters. This assumption permits significant simplifications, including > > a single global callback list protected by a single global lock, along > > with a single linked list containing all tasks that have not yet passed > > through a quiescent state. If experience shows this assumption to be > > incorrect, the required additional complexity will be added. > > > > Suggested-by: Steven Rostedt > > Signed-off-by: Paul E. 
McKenney > > --- > > include/linux/init_task.h | 9 +++ > > include/linux/rcupdate.h | 36 ++ > > include/linux/sched.h | 23 --- > > init/Kconfig | 10 +++ > > kernel/rcu/tiny.c | 2 + > > kernel/rcu/tree.c | 2 + > > kernel/rcu/update.c | 171 > > ++ > > 7 files changed, 242 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > > index 6df7f9fe0d01..78715ea7c30c 100644 > > --- a/include/linux/init_task.h > > +++ b/include/linux/init_task.h > > @@ -124,6 +124,14 @@ extern struct group_info init_groups; > > #else > > #define INIT_TASK_RCU_PREEMPT(tsk) > > #endif > > +#ifdef CONFIG_TASKS_RCU > > +#define INIT_TASK_RCU_TASKS(tsk) \ > > + .rcu_tasks_holdout = false, \ > > + .rcu_tasks_holdout_list = \ > > + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), > > +#else > > +#define INIT_TASK_RCU_TASKS(tsk) > > +#endif > > > > extern struct cred init_cred; > > > > @@ -231,6 +239,7 @@ extern struct task_group root_task_group; > > INIT_FTRACE_GRAPH \ > > INIT_TRACE_RECURSION\ > > INIT_TASK_RCU_PREEMPT(tsk) \ > > + INIT_TASK_RCU_TASKS(tsk)\ > > INIT_CPUSET_SEQ(tsk)\ > > INIT_RT_MUTEXES(tsk)\ > > INIT_VTIME(tsk) \ > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 6a94cc8b1ca0..829efc99df3e 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, > > > > void synchronize_sched(void); > > > > +/** > > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period > > + * @head: structure to be used for queueing the RCU updates. > > + * @func: actual callback function to be invoked after the grace period > > + * > > + * The callback function will be invoked some time after a full grace > > + * period elapses, in other words after all currently executing RCU > > + * read-side critical sections have completed. 
call_rcu_tasks() assumes > > + * that the read-side critical sections end at a voluntary context > > + * switch (not a preemption!), entry into idle, or transition to usermode > > + * execution. As such, there are no read-side primitives analogous to > > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended > > + * to determine that all tasks have passed through a safe state, not so > > + * much for data-strcuture synchronization. > > + * > > + * See the description of call_rcu() for more detailed information on > > + * memory ordering guarantees. > > + */ > > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head > > *head)); > > + > > #ifdef CONFIG_PREEMPT_RCU > > > > void __rcu_read_lock(void); > > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct > > task_struct *prev, > > rcu_irq_exit(); \ > > } while (0) > > > > +/* > > + * Note a voluntary context switch for RCU-tasks benefit. This is a > > + * macro rather than an inline function to avoid #include hell. > > + */ > > +#ifdef CONFIG_TASKS_RCU > > +#define rcu_note_voluntary_context_switch(t) \ > > + do { \ > > + preempt_disable(); /* Exclude synchronize_sched(); */ \ > > + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ > > + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \ > > + preempt_enable(); \ > > Why the preempt_disable() is needed here? The comments in rcu_tasks_kthread() > can't persuade me.
Re: [RFC PATCH 0/2] dirreadahead system call
- Original Message - > From: "Dave Chinner" > To: "Andreas Dilger" > Cc: "Abhijith Das" , "LKML" , > "linux-fsdevel" > , cluster-de...@redhat.com > Sent: Thursday, July 31, 2014 6:53:06 PM > Subject: Re: [RFC PATCH 0/2] dirreadahead system call > > On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote: > > On Jul 31, 2014, at 6:49, Dave Chinner wrote: > > > > > >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote: > > >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das wrote: > > >>> OnJuly 26, 2014 12:27:19 AM "Andreas Dilger" wrote: > > Is there a time when this doesn't get called to prefetch entries in > > readdir() order? It isn't clear to me what benefit there is of > > returning > > the entries to userspace instead of just doing the statahead > > implicitly > > in the kernel? > > > > The Lustre client has had what we call "statahead" for a while, > > and similar to regular file readahead it detects the sequential access > > pattern for readdir() + stat() in readdir() order (taking into account > > if > > ".*" > > entries are being processed or not) and starts fetching the inode > > attributes asynchronously with a worker thread. > > >>> > > >>> Does this heuristic work well in practice? In the use case we were > > >>> trying to > > >>> address, a Samba server is aware beforehand if it is going to stat all > > >>> the > > >>> inodes in a directory. > > >> > > >> Typically this works well for us, because this is done by the Lustre > > >> client, so the statahead is hiding the network latency of the RPCs to > > >> fetch attributes from the server. I imagine the same could be seen with > > >> GFS2. I don't know if this approach would help very much for local > > >> filesystems because the latency is low. > > >> > > This syscall might be more useful if userspace called readdir() to get > > the dirents and then passed the kernel the list of inode numbers > > to prefetch before starting on the stat() calls. 
That way, userspace > > could generate an arbitrary list of inodes (e.g. names matching a > > regexp) and the kernel doesn't need to guess if every inode is needed. > > >>> > > >>> Were you thinking arbitrary inodes across the filesystem or just a > > >>> subset > > >>> from a directory? Arbitrary inodes may potentially throw up locking > > >>> issues. > > >> > > >> I was thinking about inodes returned from readdir(), but the syscall > > >> would be much more useful if it could handle arbitrary inodes. > > > > > > I'm not sure we can do that. The only way to safely identify a > > > specific inode in the filesystem from userspace is via a filehandle. > > > Plain inode numbers are susceptible to TOCTOU race conditions that > > > the kernel cannot resolve. Also, lookup by inode number bypasses > > > directory access permissions, so is not something we would expose > > > to arbitrary unprivileged users. > > > > None of these issues are relevant in the API that I'm thinking about. > > The syscall just passes the list of inode numbers to be prefetched > > into kernel memory, and then stat() is used to actually get the data into > > userspace (or whatever other operation is to be done on them), > > so there is no danger if the wrong inode is prefetched. If the inode > > number is bad the filesystem can just ignore it. > > Which means the filesystem has to treat the inode number as > potentially hostile. i.e. it can not be trusted to be correct and so > must take slow paths to validate the inode numbers. This adds > *significant* overhead to the readahead path for some filesystems: > readahead is only a win if it is low cost. > > For example, on XFS every untrusted inode number lookup requires an > inode btree lookup to validate the inode is actually valid on disk > and that is it allocated and has references. That lookup serialises > against inode allocation/freeing as well as other lookups. 
In > comparison, when using a trusted inode number from a directory > lookup within the kernel, we only need to do a couple of shift and > mask operations to convert it to a disk address and we are good to > go. > > i.e. the difference is at least 5 orders of magnitude higher CPU usage > for an "inode number readahead" syscall versus a "directory > readahead" syscall, it has significant serialisation issues and it > can stall other modification/lookups going on at the same time. > That's *horrible behaviour* for a speculative readahead operation, > but because the inodenumbers are untrusted, we can't avoid it. > > So, again, it's way more overhead than userspace just calling > stat() asycnhronously on many files at once as readdir/gentdents > returns dirents from the kernel to speed up cache population. > > That's my main issue with this patchset - it's implementing > something in kernelspace that can *easily* be done generically in > userspace without introducing all sorts of
Re: [PATCH v2 tip/core/rcu 01/10] rcu: Add call_rcu_tasks()
On Fri, Aug 01, 2014 at 08:53:38AM +0800, Lai Jiangshan wrote: > On 08/01/2014 12:09 AM, Paul E. McKenney wrote: > > > > >>> + /* > >>> + * There were callbacks, so we need to wait for an > >>> + * RCU-tasks grace period. Start off by scanning > >>> + * the task list for tasks that are not already > >>> + * voluntarily blocked. Mark these tasks and make > >>> + * a list of them in rcu_tasks_holdouts. > >>> + */ > >>> + rcu_read_lock(); > >>> + for_each_process_thread(g, t) { > >>> + if (t != current && ACCESS_ONCE(t->on_rq) && > >>> + !is_idle_task(t)) { > >> > >> What happen when the trampoline is on the idle task? > >> > >> I think we need to use schedule_on_each_cpu() to replace one of > >> the synchronize_sched() in this function. (or other stuff which can > >> cause real schedule for *ALL* online CPUs). > > > > Well, that is one of the questions in the 0/10 cover letter. If it turns > > out to be necessary to worry about idle-task trampolines, it should be > > possible to avoid hammering all idle CPUs in the common case. Though maybe > > battery-powered devices won't need RCU-tasks. > > trampolines on NO_HZ idle CPU can be arbitrary long, (example, SMI happens > inside the trampoline). So only the real schedule on idle CPU is reliable > to me. You might well be right, but first let's see if Steven needs this to work in the idle task to begin with. If he doesn't, then there is no point in worrying about it. If he does, I bet I can come up with a trick or two. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On Fri, Aug 01, 2014 at 01:57:50AM +0200, Frederic Weisbecker wrote: > On Thu, Jul 31, 2014 at 02:55:01PM -0700, Paul E. McKenney wrote: > > From: "Paul E. McKenney" > > > > This commit adds a new RCU-tasks flavor of RCU, which provides > > call_rcu_tasks(). This RCU flavor's quiescent states are voluntary > > context switch (not preemption!), userspace execution, and the idle loop. > > Note that unlike other RCU flavors, these quiescent states occur in tasks, > > not necessarily CPUs. Includes fixes from Steven Rostedt. > > > > This RCU flavor is assumed to have very infrequent latency-tolerate > > updaters. This assumption permits significant simplifications, including > > a single global callback list protected by a single global lock, along > > with a single linked list containing all tasks that have not yet passed > > through a quiescent state. If experience shows this assumption to be > > incorrect, the required additional complexity will be added. > > > > Suggested-by: Steven Rostedt > > Signed-off-by: Paul E. 
McKenney > > --- > > include/linux/init_task.h | 9 +++ > > include/linux/rcupdate.h | 36 ++ > > include/linux/sched.h | 23 --- > > init/Kconfig | 10 +++ > > kernel/rcu/tiny.c | 2 + > > kernel/rcu/tree.c | 2 + > > kernel/rcu/update.c | 171 > > ++ > > 7 files changed, 242 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > > index 6df7f9fe0d01..78715ea7c30c 100644 > > --- a/include/linux/init_task.h > > +++ b/include/linux/init_task.h > > @@ -124,6 +124,14 @@ extern struct group_info init_groups; > > #else > > #define INIT_TASK_RCU_PREEMPT(tsk) > > #endif > > +#ifdef CONFIG_TASKS_RCU > > +#define INIT_TASK_RCU_TASKS(tsk) \ > > + .rcu_tasks_holdout = false, \ > > + .rcu_tasks_holdout_list = \ > > + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), > > +#else > > +#define INIT_TASK_RCU_TASKS(tsk) > > +#endif > > > > extern struct cred init_cred; > > > > @@ -231,6 +239,7 @@ extern struct task_group root_task_group; > > INIT_FTRACE_GRAPH \ > > INIT_TRACE_RECURSION\ > > INIT_TASK_RCU_PREEMPT(tsk) \ > > + INIT_TASK_RCU_TASKS(tsk)\ > > INIT_CPUSET_SEQ(tsk)\ > > INIT_RT_MUTEXES(tsk)\ > > INIT_VTIME(tsk) \ > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 6a94cc8b1ca0..829efc99df3e 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, > > > > void synchronize_sched(void); > > > > +/** > > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period > > + * @head: structure to be used for queueing the RCU updates. > > + * @func: actual callback function to be invoked after the grace period > > + * > > + * The callback function will be invoked some time after a full grace > > + * period elapses, in other words after all currently executing RCU > > + * read-side critical sections have completed. 
call_rcu_tasks() assumes > > + * that the read-side critical sections end at a voluntary context > > + * switch (not a preemption!), entry into idle, or transition to usermode > > + * execution. As such, there are no read-side primitives analogous to > > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended > > + * to determine that all tasks have passed through a safe state, not so > > + * much for data-strcuture synchronization. > > + * > > + * See the description of call_rcu() for more detailed information on > > + * memory ordering guarantees. > > + */ > > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head > > *head)); > > + > > #ifdef CONFIG_PREEMPT_RCU > > > > void __rcu_read_lock(void); > > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct > > task_struct *prev, > > rcu_irq_exit(); \ > > } while (0) > > > > +/* > > + * Note a voluntary context switch for RCU-tasks benefit. This is a > > + * macro rather than an inline function to avoid #include hell. > > + */ > > +#ifdef CONFIG_TASKS_RCU > > +#define rcu_note_voluntary_context_switch(t) \ > > + do { \ > > + preempt_disable(); /* Exclude synchronize_sched(); */ \ > > + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ > > + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \ > > + preempt_enable(); \ > > + } while (0) > > +#else /* #ifdef CONFIG_TASKS_RCU */ > > +#define
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu wrote:
> >
> > > FYI, we noticed the below changes on
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure
> > > task_numa_migrate() checks the preferred node")
> > >
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >      67745 ~ 4%     +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > >     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> >
> > Hi Aaron,
> >
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
>
> So assuming those numbers above are the difference in

Yes, they are.

It means that for commit ebe06187bf2aec1, the value of
numa_hint_faults_local is 94500 on the ivb42 machine and 67745 on the
lkp-snb01 machine. The 3% and 4% following those numbers are the
deviation of the individual runs from their average (we usually run each
test multiple times to smooth out outliers). We should probably remove
those percentages, as they cause confusion without a detailed
explanation and may not mean much to the commit author and others (if
the deviation is big enough, we should simply drop that result).

The percentage in the middle is the change between the two commits.

Another issue is the meaning of the numbers: it is not that evident that
they are for proc-vmstat.numa_hint_faults_local. Maybe something like
this is better?

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---------------  -------------------------  ----------------------------------
      94500         +115.6%     203711      ivb42/hackbench/50%-threads-pipe
      67745          +64.1%     111174      lkp-snb01/hackbench/50%-threads-socket
     162245          +94.1%     314885      TOTAL

Regards,
Aaron

> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.
>
> On my IVB-EP I get similar numbers; using:
>
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
>
>
> tip/master+origin/master      tip/master+origin/master-a43455a1d57
>
> local     total               local     total
> faults    time                faults    time
>
> 19971     51.384              10104     50.838
> 17193     50.564              9116      50.208
> 13435     49.057              8332      51.344
> 23794     50.795              9954      51.364
> 20255     49.463              9598      51.258
>
> 18929.6   50.2526             9420.8    51.0024
> 3863.61   0.96                717.78    0.49
>
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
>
>
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
>
> tip/master+origin/master+patch
>
> local     total
> faults    time
>
> 21296     50.541
> 12771     50.54
> 13872     52.224
> 23352     50.85
> 16516     50.705
>
> 17561.4   50.972
> 4613.32   0.71
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
>
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.
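For readers who want to check the arithmetic behind the tables in this message, here is a minimal C sketch. The function names are invented for illustration and come from neither the LKP tooling nor perf. It assumes the last row of each of Peter's tables is a sample standard deviation (dividing by n - 1), which is consistent with the quoted 18929.6 and 3863.61 for the first column.

```c
#include <stddef.h>

/* Relative change from base to next, in percent (the middle column
 * of the LKP report). */
double pct_change(double base, double next)
{
    return (next - base) / base * 100.0;
}

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Sample variance (divides by n - 1). The standard deviation is the
 * square root of this value; the square root is left to the caller so
 * that the sketch does not need libm. */
double sample_variance(const double *v, size_t n)
{
    double m = mean(v, n);
    double acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += (v[i] - m) * (v[i] - m);
    return acc / (double)(n - 1);
}
```

Feeding the five "local faults" values from the first column into mean() and sample_variance(), and 94500 and 203711 into pct_change(), is enough to cross-check the summary rows and the +115.6% figure.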
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On Fri, Aug 01, 2014 at 09:15:34AM +0800, Lai Jiangshan wrote: > On 08/01/2014 05:55 AM, Paul E. McKenney wrote: > > From: "Paul E. McKenney" > > > > This commit adds a new RCU-tasks flavor of RCU, which provides > > call_rcu_tasks(). This RCU flavor's quiescent states are voluntary > > context switch (not preemption!), userspace execution, and the idle loop. > > Note that unlike other RCU flavors, these quiescent states occur in tasks, > > not necessarily CPUs. Includes fixes from Steven Rostedt. > > > > This RCU flavor is assumed to have very infrequent latency-tolerate > > updaters. This assumption permits significant simplifications, including > > a single global callback list protected by a single global lock, along > > with a single linked list containing all tasks that have not yet passed > > through a quiescent state. If experience shows this assumption to be > > incorrect, the required additional complexity will be added. > > > > Suggested-by: Steven Rostedt > > Signed-off-by: Paul E. 
McKenney > > --- > > include/linux/init_task.h | 9 +++ > > include/linux/rcupdate.h | 36 ++ > > include/linux/sched.h | 23 --- > > init/Kconfig | 10 +++ > > kernel/rcu/tiny.c | 2 + > > kernel/rcu/tree.c | 2 + > > kernel/rcu/update.c | 171 > > ++ > > 7 files changed, 242 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > > index 6df7f9fe0d01..78715ea7c30c 100644 > > --- a/include/linux/init_task.h > > +++ b/include/linux/init_task.h > > @@ -124,6 +124,14 @@ extern struct group_info init_groups; > > #else > > #define INIT_TASK_RCU_PREEMPT(tsk) > > #endif > > +#ifdef CONFIG_TASKS_RCU > > +#define INIT_TASK_RCU_TASKS(tsk) \ > > + .rcu_tasks_holdout = false, \ > > + .rcu_tasks_holdout_list = \ > > + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), > > +#else > > +#define INIT_TASK_RCU_TASKS(tsk) > > +#endif > > > > extern struct cred init_cred; > > > > @@ -231,6 +239,7 @@ extern struct task_group root_task_group; > > INIT_FTRACE_GRAPH \ > > INIT_TRACE_RECURSION\ > > INIT_TASK_RCU_PREEMPT(tsk) \ > > + INIT_TASK_RCU_TASKS(tsk)\ > > INIT_CPUSET_SEQ(tsk)\ > > INIT_RT_MUTEXES(tsk)\ > > INIT_VTIME(tsk) \ > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 6a94cc8b1ca0..829efc99df3e 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, > > > > void synchronize_sched(void); > > > > +/** > > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period > > + * @head: structure to be used for queueing the RCU updates. > > + * @func: actual callback function to be invoked after the grace period > > + * > > + * The callback function will be invoked some time after a full grace > > + * period elapses, in other words after all currently executing RCU > > + * read-side critical sections have completed. 
call_rcu_tasks() assumes > > + * that the read-side critical sections end at a voluntary context > > + * switch (not a preemption!), entry into idle, or transition to usermode > > + * execution. As such, there are no read-side primitives analogous to > > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended > > + * to determine that all tasks have passed through a safe state, not so > > + * much for data-strcuture synchronization. > > + * > > + * See the description of call_rcu() for more detailed information on > > + * memory ordering guarantees. > > + */ > > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head > > *head)); > > + > > #ifdef CONFIG_PREEMPT_RCU > > > > void __rcu_read_lock(void); > > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct > > task_struct *prev, > > rcu_irq_exit(); \ > > } while (0) > > > > +/* > > + * Note a voluntary context switch for RCU-tasks benefit. This is a > > + * macro rather than an inline function to avoid #include hell. > > + */ > > +#ifdef CONFIG_TASKS_RCU > > +#define rcu_note_voluntary_context_switch(t) \ > > + do { \ > > + preempt_disable(); /* Exclude synchronize_sched(); */ \ > > + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ > > + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \ > > + preempt_enable(); \ > > + } while (0) > > +#else /* #ifdef CONFIG_TASKS_RCU */ > > +#define
[PATCH 2/2] staging: comedi: addi_apci_1564: remove diagnostic interrupt support code
As per Ian, at this point in time it is not worth implementing an async command interface for diagnostic interrupts for this board. As this is the case, this patch removes the code which supports such interrupts as it is now unused. This includes removing apci1564_do_read(), which was the insn_read operation for the digital output subdevice, since all it was doing was reading the current diagnostic interrupt type and returning it in 'data'. This doesn't follow the comedi API and this operation can be emulated by the comedi core anyway since the insn_bits operation follows the comedi API. So it is safe to simply remove this function. Signed-off-by: Chase Southwood Cc: Ian Abbott Cc: H Hartley Sweeten --- .../staging/comedi/drivers/addi-data/hwdrv_apci1564.c | 14 -- drivers/staging/comedi/drivers/addi_apci_1564.c| 18 -- 2 files changed, 32 deletions(-) diff --git a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c index a1730e9..8a613ae 100644 --- a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c +++ b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c @@ -340,17 +340,3 @@ static int apci1564_timer_read(struct comedi_device *dev, } return insn->n; } - -/* - * Reads the interrupt status register - */ -static int apci1564_do_read(struct comedi_device *dev, - struct comedi_subdevice *s, - struct comedi_insn *insn, - unsigned int *data) -{ - struct apci1564_private *devpriv = dev->private; - - *data = devpriv->do_int_type; - return insn->n; -} diff --git a/drivers/staging/comedi/drivers/addi_apci_1564.c b/drivers/staging/comedi/drivers/addi_apci_1564.c index 819255b..543cb07 100644 --- a/drivers/staging/comedi/drivers/addi_apci_1564.c +++ b/drivers/staging/comedi/drivers/addi_apci_1564.c @@ -13,7 +13,6 @@ struct apci1564_private { unsigned int mode1; /* riding-edge/high level channels */ unsigned int mode2; /* falling-edge/low level channels */ unsigned int ctrl; /* interrupt mode OR 
(edge) . AND (level) */ - unsigned int do_int_type; unsigned char timer_select_mode; unsigned char mode_select_register; struct task_struct *tsk_current; @@ -25,8 +24,6 @@ static int apci1564_reset(struct comedi_device *dev) { struct apci1564_private *devpriv = dev->private; - devpriv->do_int_type = 0; - /* Disable the input interrupts and reset status register */ outl(0x0, devpriv->amcc_iobase + APCI1564_DI_IRQ_REG); inl(devpriv->amcc_iobase + APCI1564_DI_INT_STATUS_REG); @@ -83,20 +80,6 @@ static irqreturn_t apci1564_interrupt(int irq, void *d) outl(status, devpriv->amcc_iobase + APCI1564_DI_IRQ_REG); } - status = inl(devpriv->amcc_iobase + APCI1564_DO_IRQ_REG); - if (status & 0x01) { - /* Check for Digital Output interrupt Type */ - /* 1: VCC interrupt*/ - /* 2: CC interrupt */ - devpriv->do_int_type = inl(devpriv->amcc_iobase + - APCI1564_DO_INT_STATUS_REG) & 0x3; - /* Disable the Interrupt */ - outl(0x0, devpriv->amcc_iobase + APCI1564_DO_INT_CTRL_REG); - - /* Sends signal to user space */ - send_sig(SIGIO, devpriv->tsk_current, 0); - } - status = inl(devpriv->amcc_iobase + APCI1564_TIMER_IRQ_REG); if (status & 0x01) { /* Disable Timer Interrupt */ @@ -407,7 +390,6 @@ static int apci1564_auto_attach(struct comedi_device *dev, s->range_table = _digital; s->insn_config = apci1564_do_config; s->insn_bits = apci1564_do_insn_bits; - s->insn_read = apci1564_do_read; /* Change-Of-State (COS) interrupt subdevice */ s = >subdevices[2]; -- 2.0.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
This board provides VCC/CC diagnostic information, and it also supports diagnostic interrupts. However, as per Ian, these interrupts aren't very useful and it is enough to simply provide an interface for accessing the diagnostic status on-demand. This patch adds a 2-channel digital input subdevice with an insn_bits handler to access this information. Signed-off-by: Chase Southwood Cc: Ian Abbott Cc: H Hartley Sweeten --- drivers/staging/comedi/drivers/addi_apci_1564.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/staging/comedi/drivers/addi_apci_1564.c b/drivers/staging/comedi/drivers/addi_apci_1564.c index 190b026..819255b 100644 --- a/drivers/staging/comedi/drivers/addi_apci_1564.c +++ b/drivers/staging/comedi/drivers/addi_apci_1564.c @@ -157,6 +157,18 @@ static int apci1564_do_insn_bits(struct comedi_device *dev, return insn->n; } +static int apci1564_diag_insn_bits(struct comedi_device *dev, + struct comedi_subdevice *s, + struct comedi_insn *insn, + unsigned int *data) +{ + struct apci1564_private *devpriv = dev->private; + + data[1] = inl(devpriv->amcc_iobase + APCI1564_DO_INT_STATUS_REG) & 3; + + return insn->n; +} + /* * Change-Of-State (COS) interrupt configuration * @@ -373,7 +385,7 @@ static int apci1564_auto_attach(struct comedi_device *dev, dev->irq = pcidev->irq; } - ret = comedi_alloc_subdevices(dev, 5); + ret = comedi_alloc_subdevices(dev, 6); if (ret) return ret; @@ -434,6 +446,15 @@ static int apci1564_auto_attach(struct comedi_device *dev, if (ret) return ret; + /* Initialize the diagnostic status subdevice */ + s = >subdevices[5]; + s->type = COMEDI_SUBD_DI; + s->subdev_flags = SDF_READABLE; + s->n_chan = 2; + s->maxdata = 1; + s->range_table = _digital; + s->insn_bits = apci1564_diag_insn_bits; + return 0; } -- 2.0.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 
Please read the FAQ at http://www.tux.org/lkml/
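As a rough illustration of what a consumer of the new 2-channel subdevice would see, the sketch below decodes the two status bits in plain C. It is not driver code. The bit meanings (1 = VCC, 2 = CC) are an assumption taken from the comments in the interrupt handler that patch 2/2 removes, and all DEMO_/demo_ names are hypothetical.

```c
/* Hypothetical bit names; the 1 = VCC, 2 = CC meaning is assumed from
 * the comments in the interrupt code removed by patch 2/2. */
#define DEMO_DIAG_VCC 0x1u
#define DEMO_DIAG_CC  0x2u

struct demo_diag {
    int vcc; /* nonzero if the VCC diagnostic bit is set */
    int cc;  /* nonzero if the CC diagnostic bit is set */
};

/* Decode a raw status word, keeping only the low two bits, the same
 * mask that apci1564_diag_insn_bits() applies with "& 3". */
struct demo_diag demo_decode_diag(unsigned int reg)
{
    unsigned int bits = reg & 3u;
    struct demo_diag d = {
        .vcc = !!(bits & DEMO_DIAG_VCC),
        .cc  = !!(bits & DEMO_DIAG_CC),
    };

    return d;
}
```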
[PATCH 0/2] staging: comedi: addi_apci_1564: provide interface to read diagnostic status
This patchset creates a simple subdevice to allow for reading of the
board's diagnostic status, and then removes any code which is related to
diagnostic interrupts, as the driver will not support these at this time.

Chase Southwood (2):
  staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
  staging: comedi: addi_apci_1564: remove diagnostic interrupt support code

 .../comedi/drivers/addi-data/hwdrv_apci1564.c     | 14
 drivers/staging/comedi/drivers/addi_apci_1564.c   | 41 --
 2 files changed, 22 insertions(+), 33 deletions(-)

--
2.0.3
Re: scheduler crash on Power
On Wed, 2014-07-30 at 00:22 -0700, Sukadev Bhattiprolu wrote:
> I am getting this crash on a Powerpc system using 3.16.0-rc7 kernel plus
> some patches related to perf (24x7 counters) that Cody Schafer posted here:
>
> https://lkml.org/lkml/2014/5/27/768
>
> I don't get the crash on an unpatched kernel though.

You mean you don't get the crash on 3.16-rc7? I find it hard to believe
those 24x7 patches are causing this.

> I am also attaching the debug messages that Peterz added
> here: https://lkml.org/lkml/2014/7/17/288

I don't see any FAIL messages in your log, so it looks like you're not
hitting the case that patch was looking for?

> Appreciate any debug suggestions.

Reproduce on an unpatched kernel.

cheers
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills wrote: > On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote: >> This adds checks for the stated modes as if they are crap we will return >> error >> not supported. > >You've just enabled two options, but you haven't actually > implemented the code behind it. I would tell you *NOT* to do anything > else on this work until you can answer the question: What happens if > you apply this patch, create a large file called "foo.txt", and then a > userspace program executes the following code? > > int fd = open("foo.txt", O_RDWR); > fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50); > >Try it on a btrfs filesystem, both with and without your patch. > Also try it on an ext4 filesystem. > >Once you've done all of that, reply to this mail and tell me what > the problem is with this patch. You need to make two answers: what are > the technical problems with the patch? What errors have you made in > the development process? > >*Only* if you can answer those questions sensibly, should you write > any more patches, of any kind. > >Hugo. 
> >> Signed-off-by: Nicholas Krause >> --- >> fs/btrfs/file.c |3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >> index 1f2b99c..599495a 100644 >> --- a/fs/btrfs/file.c >> +++ b/fs/btrfs/file.c >> @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int >> mode, >> alloc_end = round_up(offset + len, blocksize); >> >> /* Make sure we aren't being give some crap mode */ >> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) >> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE| >> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE)) >> return -EOPNOTSUPP; >> >> if (mode & FALLOC_FL_PUNCH_HOLE) >> -- >> 1.7.10.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === > PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk > --- The glass is neither half-full nor half-empty; it is twice as --- > large as it needs to be. Calls are there in btrfs , therefore will either kernel panic or cause an oops. Need to test this patch as this is very easy to catch bug. Nick -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
Nicholas Krause posted on Thu, 31 Jul 2014 13:53:33 -0400 as excerpted:

> This adds checks for the stated modes as if they are crap we will return
> error not supported.
>
> Signed-off-by: Nicholas Krause
> ---
>  fs/btrfs/file.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 1f2b99c..599495a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode,
>  	alloc_end = round_up(offset + len, blocksize);
>
>  	/* Make sure we aren't being give some crap mode */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE|
> +		FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
>  		return -EOPNOTSUPP;
>
>  	if (mode & FALLOC_FL_PUNCH_HOLE)

Is the supporting code already there?

You're removing the EOPNOTSUPP errors for these modes, but the patch
doesn't add the support; it only removes the check that rejects them.
If the support really is already there and the implementing patch simply
forgot to remove those checks, your changelog should say so, with a
pointer to either the commit adding it or the functions supporting it.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
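To make the point about the mode check concrete, here is a small userspace sketch of the same mask test. It only models the gate and is not btrfs code. The DEMO_ names and demo_mode_ok() are illustrative; the numeric values mirror the FALLOC_FL_* definitions in include/uapi/linux/falloc.h.

```c
/* Values mirror include/uapi/linux/falloc.h; the DEMO_ names are
 * illustrative and not part of any kernel header. */
#define DEMO_FL_KEEP_SIZE      0x01
#define DEMO_FL_PUNCH_HOLE     0x02
#define DEMO_FL_COLLAPSE_RANGE 0x08
#define DEMO_FL_ZERO_RANGE     0x10

/* Returns nonzero when mode contains only bits present in supported.
 * This is the inverse of the "if (mode & ~(...)) return -EOPNOTSUPP"
 * test quoted above. */
int demo_mode_ok(int mode, int supported)
{
    return (mode & ~supported) == 0;
}
```

With the original mask (KEEP_SIZE | PUNCH_HOLE), demo_mode_ok(DEMO_FL_COLLAPSE_RANGE, mask) is 0 and the request is rejected. Once the mask is widened it returns 1, so the request falls through to whatever code follows the check, whether or not that code knows what to do with the new mode.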
Re: [PATCH] swap: remove the struct cpumask has_work
On 08/01/2014 12:09 AM, Chris Metcalf wrote:
> On 7/31/2014 7:51 AM, Michal Hocko wrote:
>> On Thu 31-07-14 11:30:19, Lai Jiangshan wrote:
>>> It is suggested that cpumask_var_t and alloc_cpumask_var() should be used
>>> instead of struct cpumask. But I don't want to add this complexity nor
>>> leave this unwelcome "static struct cpumask has_work;", so I just remove
>>> it and use flush_work() to perform on all online drain_work. flush_work()
>>> performs very quickly on initialized but unused work item, thus we don't
>>> need the struct cpumask has_work for performance.
>> Why? Just because there is general recommendation for using
>> cpumask_var_t rather than cpumask?
>>
>> In this particular case cpumask shouldn't matter much as it is static.
>> Your code will work as well, but I do not see any strong reason to
>> change it just to get rid of cpumask which is not on stack.
>
> The code uses for_each_cpu with a cpumask to avoid waking cpus that don't
> need to do work. This is important for the nohz_full type functionality,
> power efficiency, etc. So, nack for this change.
>

flush_work() on an initialized but unused work item just disables irqs,
fetches work->data to test it, restores irqs, and returns.

The struct cpumask has_work is just premature optimization.
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On 08/01/2014 05:55 AM, Paul E. McKenney wrote: > From: "Paul E. McKenney" > > This commit adds a new RCU-tasks flavor of RCU, which provides > call_rcu_tasks(). This RCU flavor's quiescent states are voluntary > context switch (not preemption!), userspace execution, and the idle loop. > Note that unlike other RCU flavors, these quiescent states occur in tasks, > not necessarily CPUs. Includes fixes from Steven Rostedt. > > This RCU flavor is assumed to have very infrequent latency-tolerate > updaters. This assumption permits significant simplifications, including > a single global callback list protected by a single global lock, along > with a single linked list containing all tasks that have not yet passed > through a quiescent state. If experience shows this assumption to be > incorrect, the required additional complexity will be added. > > Suggested-by: Steven Rostedt > Signed-off-by: Paul E. McKenney > --- > include/linux/init_task.h | 9 +++ > include/linux/rcupdate.h | 36 ++ > include/linux/sched.h | 23 --- > init/Kconfig | 10 +++ > kernel/rcu/tiny.c | 2 + > kernel/rcu/tree.c | 2 + > kernel/rcu/update.c | 171 > ++ > 7 files changed, 242 insertions(+), 11 deletions(-) > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > index 6df7f9fe0d01..78715ea7c30c 100644 > --- a/include/linux/init_task.h > +++ b/include/linux/init_task.h > @@ -124,6 +124,14 @@ extern struct group_info init_groups; > #else > #define INIT_TASK_RCU_PREEMPT(tsk) > #endif > +#ifdef CONFIG_TASKS_RCU > +#define INIT_TASK_RCU_TASKS(tsk) \ > + .rcu_tasks_holdout = false, \ > + .rcu_tasks_holdout_list = \ > + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), > +#else > +#define INIT_TASK_RCU_TASKS(tsk) > +#endif > > extern struct cred init_cred; > > @@ -231,6 +239,7 @@ extern struct task_group root_task_group; > INIT_FTRACE_GRAPH \ > INIT_TRACE_RECURSION\ > INIT_TASK_RCU_PREEMPT(tsk) \ > + INIT_TASK_RCU_TASKS(tsk)\ > INIT_CPUSET_SEQ(tsk)\ > INIT_RT_MUTEXES(tsk)\ > INIT_VTIME(tsk) \ > 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 6a94cc8b1ca0..829efc99df3e 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, > > void synchronize_sched(void); > > +/** > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period > + * @head: structure to be used for queueing the RCU updates. > + * @func: actual callback function to be invoked after the grace period > + * > + * The callback function will be invoked some time after a full grace > + * period elapses, in other words after all currently executing RCU > + * read-side critical sections have completed. call_rcu_tasks() assumes > + * that the read-side critical sections end at a voluntary context > + * switch (not a preemption!), entry into idle, or transition to usermode > + * execution. As such, there are no read-side primitives analogous to > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended > + * to determine that all tasks have passed through a safe state, not so > + * much for data-strcuture synchronization. > + * > + * See the description of call_rcu() for more detailed information on > + * memory ordering guarantees. > + */ > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head > *head)); > + > #ifdef CONFIG_PREEMPT_RCU > > void __rcu_read_lock(void); > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct > task_struct *prev, > rcu_irq_exit(); \ > } while (0) > > +/* > + * Note a voluntary context switch for RCU-tasks benefit. This is a > + * macro rather than an inline function to avoid #include hell. 
> + */ > +#ifdef CONFIG_TASKS_RCU > +#define rcu_note_voluntary_context_switch(t) \ > + do { \ > + preempt_disable(); /* Exclude synchronize_sched(); */ \ > + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ > + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \ > + preempt_enable(); \ Why the preempt_disable() is needed here? The comments in rcu_tasks_kthread() can't persuade me. Maybe it could be removed? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the
Re: [PATCH 2/5] raid: Require designated initialization of structures
On Fri, Aug 01, 2014 at 11:10:55AM +1000, NeilBrown wrote:
> On Thu, 31 Jul 2014 16:47:35 -0700 Josh Triplett wrote:
>
> > Mark raid6_calls and other structures containing function pointers with
> > __designated_init. Fix implementations in lib/raid6/ to use designated
> > initializers; this also simplifies those initializers using the default
> > initialization of fields to 0.
> >
> > Signed-off-by: Josh Triplett
>
> Looks like an excellent idea!
> Feel free to forward this upstream on my behalf, or remind me once the first
> patch is in -next, and I'll take this one myself - whichever you prefer.
>
> Acked-by: NeilBrown

Thanks! Ideally, I'd like to see the whole series go in through one tree,
which is why I CCed Andrew. I can easily produce several dozen more
patches like these, but I just included enough examples to motivate
patch 1, and I can send more in any order once that one goes in.

- Josh Triplett
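For readers following the thread, here is a minimal userspace sketch of what the commit message describes: the same structure initialized positionally and with designated initializers, where an omitted field defaults to 0. The struct and function names (`calls_sketch`, `sketch_valid`, and so on) are hypothetical stand-ins modeled on the quoted diff, not the kernel's definitions, and the `__designated_init` annotation itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct raid6_calls; field names follow the quoted diff. */
struct calls_sketch {
	void (*gen_syndrome)(int disks, size_t bytes, void **ptrs);
	int (*valid)(void);     /* Returns 1 if this routine set is usable */
	const char *name;       /* Name of this routine set */
	int prefer;             /* Has special performance attribute */
};

static void sketch_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
	(void)disks; (void)bytes; (void)ptrs;   /* no-op placeholder */
}

static int sketch_valid(void)
{
	return 1;
}

/* Positional form: correct only while the field order never changes. */
static const struct calls_sketch positional = {
	sketch_gen_syndrome,
	sketch_valid,
	"sketchx1",
	0
};

/* Designated form: order-independent; .prefer is omitted and defaults to 0. */
static const struct calls_sketch designated = {
	.name = "sketchx1",
	.valid = sketch_valid,
	.gen_syndrome = sketch_gen_syndrome,
};

/* Returns 1 when both initializers produced the same contents. */
int same_calls(const struct calls_sketch *a, const struct calls_sketch *b)
{
	assert(a != NULL && b != NULL);
	return a->gen_syndrome == b->gen_syndrome &&
	       a->valid == b->valid &&
	       strcmp(a->name, b->name) == 0 &&
	       a->prefer == b->prefer;
}
```

Calling `same_calls(&positional, &designated)` compares the two objects field by field; the designated form keeps working if fields are later reordered or added, which is the property the series relies on.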
Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool
Hello, Andrew.

On Thu, Jul 31, 2014 at 06:16:56PM -0700, Andrew Morton wrote:
> Yet nowhere in either the changelog or the code comments is it even
> mentioned that this allocator is unreliable and that callers *must*
> implement (and test!) fallback paths.

Hmmm, yeah, somehow the atomic behavior seemed obvious to me. I'll try
to make it clear that this thing can and does fail.

> > an obvious solution is adding a failure
> > injection for debugging, but really except for being a bit ghetto,
> > this is just the atomic allocation for percpu areas.
>
> If it was a try-GFP_ATOMIC-then-fall-back-to-pool thing then it would
> work fairly well. But it's not even that - a caller could trivially
> chew through that pool in a single timeslice. Especially on !SMP.
> Especially squared with !PREEMPT or SCHED_FIFO.

Yep, occasional pool depletion would be a normal thing to happen,
which isn't a correctness issue and most likely not even a performance
issue.

> But please make very sure that this is how we position it. I don't
> know how to do this. Maybe prefix the names with "blk_" to signify
> that it is block-private (and won't even be there if !CONFIG_BLOCK).
>
> Or rename percpu_pool_alloc() to percpu_pool_try_alloc() - that should
> wake people up.

Sounds good to me. I'll rename it to percpu_pool_try_alloc() and make
it clear in the comment that the allocation is opportunistic.

Thanks.

--
tejun
Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool
On Thu, 31 Jul 2014 20:44:38 -0400 Tejun Heo wrote: > Hello, Andrew. > > On Thu, Jul 31, 2014 at 6:03 PM, Andrew Morton > wrote: > > I don't think we should add facilities such as this. Because if we do, > > people will use them and thereby make the kernel less reliable, for > > obvious reasons. > > > > It would be better to leave the nasty hack localized within > > blk-throttle.c and hope that someone finds a way of fixing it. > > The thing is we need similar facilities in the IO path in other places > too. They share exactly the same characteristics - opportunistic > percpu allocations during IO which are expected to fail from time to > time and they will all implement fallback behavior on allocation > failures. I'm not sure how this makes the kernel less reliable. This > conceptually isn't different from atomic allocations which we also use > in a similar way. Atomic allocations are more robust than this thing. But yes, they also are unreliable and their use should be discouraged for the same reasons. > If you're worried that people might use this > assuming that it won't fail, That's precisely my concern. Yet nowhere in either the changelog or the code comments is it even mentioned that this allocator is unreliable and that callers *must* implement (and test!) fallback paths. > an obvious solution is adding a failure > injection for debugging, but really except for being a bit ghetto, > this is just the atomic allocation for percpu areas. If it was a try-GFP_ATOMIC-then-fall-back-to-pool thing then it would work fairly well. But it's not even that - a caller could trivially chew through that pool in a single timeslice. Especially on !SMP. Especially squared with !PREEMPT or SCHED_FIFO. And that's all OK, as long as this is positioned as "opportunistic performance optimisation which is expected to be available most of the time in non-stressful use cases". But please make very sure that this is how we position it. I don't know how to do this. 
Maybe prefix the names with "blk_" to signify that it is block-private (and won't even be there if !CONFIG_BLOCK). Or rename percpu_pool_alloc() to percpu_pool_try_alloc() - that should wake people up.
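To make the "callers must implement fallback paths" point concrete, here is a small userspace sketch of an opportunistic allocator with a mandatory fallback on the caller side. It is not the percpu_pool code from the patch: `pool_sketch`, `pool_try_alloc()`, `get_counter()` and the pool size are all hypothetical, chosen only to show the try-then-fall-back shape being discussed.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4            /* illustrative size, not taken from the patch */

/* Hypothetical pool: a small stack of pre-allocated slots. */
struct pool_sketch {
	int slots[POOL_SLOTS];  /* backing storage handed out to callers */
	size_t nr_free;         /* slots still available */
};

void pool_init(struct pool_sketch *p)
{
	assert(p != NULL);
	p->nr_free = POOL_SLOTS;
}

/* Opportunistic: never blocks, never refills, returns NULL once depleted. */
int *pool_try_alloc(struct pool_sketch *p)
{
	assert(p != NULL);
	if (p->nr_free == 0)
		return NULL;
	return &p->slots[--p->nr_free];
}

/* Caller-side state for the path taken when the pool is dry. */
struct stats_sketch {
	int shared_counter;     /* fallback target, always available */
	int fallbacks;          /* how often the fallback was taken */
};

/* Get a private counter if possible, otherwise fall back to the shared one. */
int *get_counter(struct pool_sketch *p, struct stats_sketch *s)
{
	int *c = pool_try_alloc(p);

	if (!c) {               /* the fallback path is not optional */
		s->fallbacks++;
		return &s->shared_counter;
	}
	*c = 0;
	return c;
}
```

A caller that loops over `get_counter()` drains the pool after `POOL_SLOTS` calls and then keeps working through the shared counter, which is the behavior the `_try_` naming is meant to advertise.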
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On 08/01/2014 05:55 AM, Paul E. McKenney wrote: > From: "Paul E. McKenney" > > This commit adds a new RCU-tasks flavor of RCU, which provides > call_rcu_tasks(). This RCU flavor's quiescent states are voluntary > context switch (not preemption!), userspace execution, and the idle loop. > Note that unlike other RCU flavors, these quiescent states occur in tasks, > not necessarily CPUs. Includes fixes from Steven Rostedt. > > This RCU flavor is assumed to have very infrequent latency-tolerate > updaters. This assumption permits significant simplifications, including > a single global callback list protected by a single global lock, along > with a single linked list containing all tasks that have not yet passed > through a quiescent state. If experience shows this assumption to be > incorrect, the required additional complexity will be added. > > Suggested-by: Steven Rostedt > Signed-off-by: Paul E. McKenney > --- > include/linux/init_task.h | 9 +++ > include/linux/rcupdate.h | 36 ++ > include/linux/sched.h | 23 --- > init/Kconfig | 10 +++ > kernel/rcu/tiny.c | 2 + > kernel/rcu/tree.c | 2 + > kernel/rcu/update.c | 171 > ++ > 7 files changed, 242 insertions(+), 11 deletions(-) > > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > index 6df7f9fe0d01..78715ea7c30c 100644 > --- a/include/linux/init_task.h > +++ b/include/linux/init_task.h > @@ -124,6 +124,14 @@ extern struct group_info init_groups; > #else > #define INIT_TASK_RCU_PREEMPT(tsk) > #endif > +#ifdef CONFIG_TASKS_RCU > +#define INIT_TASK_RCU_TASKS(tsk) \ > + .rcu_tasks_holdout = false, \ > + .rcu_tasks_holdout_list = \ > + LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), > +#else > +#define INIT_TASK_RCU_TASKS(tsk) > +#endif > > extern struct cred init_cred; > > @@ -231,6 +239,7 @@ extern struct task_group root_task_group; > INIT_FTRACE_GRAPH \ > INIT_TRACE_RECURSION\ > INIT_TASK_RCU_PREEMPT(tsk) \ > + INIT_TASK_RCU_TASKS(tsk)\ > INIT_CPUSET_SEQ(tsk)\ > INIT_RT_MUTEXES(tsk)\ > INIT_VTIME(tsk) \ > 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 6a94cc8b1ca0..829efc99df3e 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head, > > void synchronize_sched(void); > > +/** > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period > + * @head: structure to be used for queueing the RCU updates. > + * @func: actual callback function to be invoked after the grace period > + * > + * The callback function will be invoked some time after a full grace > + * period elapses, in other words after all currently executing RCU > + * read-side critical sections have completed. call_rcu_tasks() assumes > + * that the read-side critical sections end at a voluntary context > + * switch (not a preemption!), entry into idle, or transition to usermode > + * execution. As such, there are no read-side primitives analogous to > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended > + * to determine that all tasks have passed through a safe state, not so > + * much for data-strcuture synchronization. > + * > + * See the description of call_rcu() for more detailed information on > + * memory ordering guarantees. > + */ > +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head > *head)); > + > #ifdef CONFIG_PREEMPT_RCU > > void __rcu_read_lock(void); > @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct > task_struct *prev, > rcu_irq_exit(); \ > } while (0) > > +/* > + * Note a voluntary context switch for RCU-tasks benefit. This is a > + * macro rather than an inline function to avoid #include hell. 
> + */ > +#ifdef CONFIG_TASKS_RCU > +#define rcu_note_voluntary_context_switch(t) \ > + do { \ > + preempt_disable(); /* Exclude synchronize_sched(); */ \ > + if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \ > + ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \ > + preempt_enable(); \ > + } while (0) > +#else /* #ifdef CONFIG_TASKS_RCU */ > +#define rcu_note_voluntary_context_switch(t) do { } while (0) > +#endif /* #else #ifdef CONFIG_TASKS_RCU */ > + > #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || > defined(CONFIG_SMP) > bool __rcu_is_watching(void); > #endif /* #if
Re: [PATCH v2] ACPI / LPSS: add lpss device for Wildcat Point PCH
On Friday, August 01, 2014 09:06:35 AM Jie Yang wrote:
> INT3438 is the ADSP device on the Wildcat Point platform, with 2 DW DMA
> engines built in. The DMA engines are used for DSP FW loading and audio
> data transfer. Probing these DMA engines needs the clock; without it,
> probing may fail and cannot proceed.
>
> Add the LPSS device "INT3438" for Wildcat Point PCH to provide the
> clock for its ADSP DMA engine probing.
>
> Signed-off-by: Jie Yang

Looks good, queued up for 3.17, thanks!

> ---
>  drivers/acpi/acpi_lpss.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
> index 9cb65b0..ce06149 100644
> --- a/drivers/acpi/acpi_lpss.c
> +++ b/drivers/acpi/acpi_lpss.c
> @@ -113,6 +113,14 @@ static void lpss_i2c_setup(struct lpss_private_data *pdata)
>  	writel(val, pdata->mmio_base + offset);
>  }
>
> +static struct lpss_device_desc wpt_dev_desc = {
> +	.clk_required = true,
> +	.prv_offset = 0x800,
> +	.ltr_required = true,
> +	.clk_divider = true,
> +	.clk_gate = true,
> +};
> +
>  static struct lpss_device_desc lpt_dev_desc = {
>  	.clk_required = true,
>  	.prv_offset = 0x800,
> @@ -226,6 +234,8 @@ static const struct acpi_device_id acpi_lpss_device_ids[] = {
>  	{ "INT3436", LPSS_ADDR(lpt_sdio_dev_desc) },
>  	{ "INT3437", },
>
> +	{ "INT3438", LPSS_ADDR(wpt_dev_desc) },
> +
>  	{ }
>  };
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH 2/5] raid: Require designated initialization of structures
On Thu, 31 Jul 2014 16:47:35 -0700 Josh Triplett wrote: > Mark raid6_calls and other structures containing function pointers with > __designated_init. Fix implementations in lib/raid6/ to use designated > initializers; this also simplifies those initializers using the default > initialization of fields to 0. > > Signed-off-by: Josh Triplett Looks like an excellent idea! Feel free to forward this upstream on my behalf, or remind me once the first patch is in -next, and I'll take this one myself - whichever you prefer. Acked-by: NeilBrown Thanks, NeilBrown > --- > include/linux/raid/pq.h| 4 ++-- > include/linux/raid/xor.h | 2 +- > include/linux/raid_class.h | 2 +- > lib/raid6/altivec.uc | 7 +++ > lib/raid6/avx2.c | 24 > lib/raid6/int.uc | 6 ++ > lib/raid6/mmx.c| 14 ++ > lib/raid6/neon.c | 7 +++ > lib/raid6/sse1.c | 16 > lib/raid6/sse2.c | 24 > lib/raid6/tilegx.uc| 6 ++ > 11 files changed, 52 insertions(+), 60 deletions(-) > > diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h > index 73069cb..2147bff 100644 > --- a/include/linux/raid/pq.h > +++ b/include/linux/raid/pq.h > @@ -75,7 +75,7 @@ struct raid6_calls { > int (*valid)(void);/* Returns 1 if this routine set is usable */ > const char *name; /* Name of this routine set */ > int prefer; /* Has special performance attribute */ > -}; > +} __designated_init; > > /* Selected algorithm */ > extern struct raid6_calls raid6_call; > @@ -109,7 +109,7 @@ struct raid6_recov_calls { > int (*valid)(void); > const char *name; > int priority; > -}; > +} __designated_init; > > extern const struct raid6_recov_calls raid6_recov_intx1; > extern const struct raid6_recov_calls raid6_recov_ssse3; > diff --git a/include/linux/raid/xor.h b/include/linux/raid/xor.h > index 5a21095..c7df59f 100644 > --- a/include/linux/raid/xor.h > +++ b/include/linux/raid/xor.h > @@ -17,6 +17,6 @@ struct xor_block_template { >unsigned long *, unsigned long *); > void (*do_5)(unsigned long, unsigned long *, unsigned long *, >unsigned long *, 
unsigned long *, unsigned long *); > -}; > +} __designated_init; > > #endif > diff --git a/include/linux/raid_class.h b/include/linux/raid_class.h > index 31e1ff6..603af94 100644 > --- a/include/linux/raid_class.h > +++ b/include/linux/raid_class.h > @@ -16,7 +16,7 @@ struct raid_function_template { > int (*is_raid)(struct device *); > void (*get_resync)(struct device *); > void (*get_state)(struct device *); > -}; > +} __designated_init; > > enum raid_state { > RAID_STATE_UNKNOWN = 0, > diff --git a/lib/raid6/altivec.uc b/lib/raid6/altivec.uc > index 7cc12b5..4ff138c 100644 > --- a/lib/raid6/altivec.uc > +++ b/lib/raid6/altivec.uc > @@ -118,10 +118,9 @@ int raid6_have_altivec(void) > #endif > > const struct raid6_calls raid6_altivec$# = { > - raid6_altivec$#_gen_syndrome, > - raid6_have_altivec, > - "altivecx$#", > - 0 > + .gen_syndrome = raid6_altivec$#_gen_syndrome, > + .valid = raid6_have_altivec, > + .name = "altivecx$#", > }; > > #endif /* CONFIG_ALTIVEC */ > diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c > index bc3b1dd..e56fa06 100644 > --- a/lib/raid6/avx2.c > +++ b/lib/raid6/avx2.c > @@ -88,10 +88,10 @@ static void raid6_avx21_gen_syndrome(int disks, size_t > bytes, void **ptrs) > } > > const struct raid6_calls raid6_avx2x1 = { > - raid6_avx21_gen_syndrome, > - raid6_have_avx2, > - "avx2x1", > - 1 /* Has cache hints */ > + .gen_syndrome = raid6_avx21_gen_syndrome, > + .valid = raid6_have_avx2, > + .name = "avx2x1", > + .prefer = 1,/* Has cache hints */ > }; > > /* > @@ -149,10 +149,10 @@ static void raid6_avx22_gen_syndrome(int disks, size_t > bytes, void **ptrs) > } > > const struct raid6_calls raid6_avx2x2 = { > - raid6_avx22_gen_syndrome, > - raid6_have_avx2, > - "avx2x2", > - 1 /* Has cache hints */ > + .gen_syndrome = raid6_avx22_gen_syndrome, > + .valid = raid6_have_avx2, > + .name = "avx2x2", > + .prefer = 1,/* Has cache hints */ > }; > > #ifdef CONFIG_X86_64 > @@ -241,10 +241,10 @@ static void raid6_avx24_gen_syndrome(int disks, size_t > bytes, 
void **ptrs) > } > > const struct raid6_calls raid6_avx2x4 = { > - raid6_avx24_gen_syndrome, > - raid6_have_avx2, > - "avx2x4", > - 1 /* Has cache hints */ > + .gen_syndrome = raid6_avx24_gen_syndrome, > + .valid = raid6_have_avx2, > + .name = "avx2x4", > + .prefer = 1,/* Has cache hints */ > }; > #endif > > diff --git a/lib/raid6/int.uc b/lib/raid6/int.uc > index 5b50f8d..35ad01a
Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area
On 2014-08-01 at 9:07 AM, Gioh Kim wrote:
On 2014-07-31 at 9:21 PM, Jan Kara wrote:
On Thu 31-07-14 09:37:15, Gioh Kim wrote:
On 2014-07-31 at 9:03 AM, Jan Kara wrote:
On Thu 31-07-14 08:54:40, Gioh Kim wrote:
On 2014-07-30 at 7:11 PM, Jan Kara wrote:
On Wed 30-07-14 16:44:24, Gioh Kim wrote:
On 2014-07-22 at 6:38 PM, Jan Kara wrote:
On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:
On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:

Hello,

This patch tries to solve the problem that a long-lasting page cache of the ext4 superblock disturbs page migration. I've been testing the CMA feature on my ARM-based platform and found that some pages for page caches cannot be migrated. Some of them are page caches of the superblock of an ext4 filesystem. Current ext4 reads the superblock with sb_bread(). sb_bread() allocates the page from the movable area. But the problem is that ext4 holds the page until it is unmounted. If the root filesystem is ext4, the page cannot be migrated forever. I introduce a new API for allocating a page from the non-movable area. It is useful for ext4 and others that want to hold page cache for a long time.

There's no word on why you can't teach ext4 to still migrate that page. For all I know it might be impossible, but at least mention why.

I am very sorry for the lack of details. In ext4_fill_super() the buffer-head of the superblock is stored in sbi->s_sbh. The page that belongs to the buffer-head is allocated from the movable area. To migrate the page, the buffer-head should be released via brelse(). But brelse() is not called until unmount.

Hum, I don't see where in the code do we check buffer_head use count. Can you please point me? Thanks.

Filesystem code does not check buffer_head use count. sb_bread() returns the buffer_head that is included in bh_lru and has non-zero use count. You can see the bh_lru code in buffer.c: __find_get_block() and lookup_bh_lru(). bh_lru_install() inserts the buffer_head into the bh_lru. It first calls get_bh() to increase the use count and inserts the bh into the lru array.
The buffer_head use count is non-zero until brelse() is called. So I probably didn't phrase the question precisely enough. What I was asking about is where exactly *migration* code checks buffer use count? Because as I'm looking at buffer_migrate_page() we lock the buffers on a migrated page but we don't look at buffer use counts... So it seems to me that migration of a page with buffers should succeed even if buffer head has an elevated use count. Now I think that it *should* check the buffer use counts (it is dangerous to migrate buffers someone holds reference to) but I just cannot find that place. Or does CMA use some other migration function for buffer pages than buffer_migrate_page()? CMA allocation function is cma_alloc(). Function flow is alloc_contig_range() -> __alloc_contig_migrate_range() -> migrate_pages -> unmap_and_move -> __unmap_and_move -> try_to_free_buffers -> drop_buffers -> buffer_busy. The buffer_busy() is checking b_count. If buffer is busy buffer-cache cannot be removed. So the page that includes buffer_head and the page that is refered by buffer_head are not movable. Is this what you need? Yes, this is what I was asking about. Thanks! But as I'm looking into __unmap_and_move() it calls try_to_free_buffers() only if page->mapping == NULL. As the comment before that test states, this can happen only for swap cache (not our case) or for pagecache pages that were truncated and not yet fully cleaned up. But superblock page cannot really be truncated. So I somewhat doubt you can hit the above path for a page holding superblock... I printed the address of busy buffer_head in drop_buffers() that is called by try_to_free_buffers(). And I printed the address of sb buffer_head. They were the same. I'm going to check page->mapping. I'm very sorry. It's my fault. 
Function path is like followings: [ 97.868304] [<8011a750>] (drop_buffers+0xfc/0x168) from [<8011bc64>] (try_to_free_buffers+0x50/0xbc) [ 97.877457] [<8011bc64>] (try_to_free_buffers+0x50/0xbc) from [<80121e40>] (blkdev_releasepage+0x38/0x48) [ 97.887093] [<80121e40>] (blkdev_releasepage+0x38/0x48) from [<800add8c>] (try_to_release_page+0x40/0x5c) [ 97.896728] [<800add8c>] (try_to_release_page+0x40/0x5c) from [<800bd9bc>] (shrink_page_list+0x508/0x8a4) [ 97.906334] [<800bd9bc>] (shrink_page_list+0x508/0x8a4) from [<800bde5c>] (reclaim_clean_pages_from_list+0x104/0x148) [ 97.917017] [<800bde5c>] (reclaim_clean_pages_from_list+0x104/0x148) from [<800b5dec>] (alloc_contig_range+0x114/0x2dc) [ 97.927856] [<800b5dec>] (alloc_contig_range+0x114/0x2dc) from [<802f6c04>] (dma_alloc_from_contiguous+0x8c/0x14c) [ 97.938264] [<802f6c04>] (dma_alloc_from_contiguous+0x8c/0x14c) from [<80017b6c>] (__alloc_from_contiguous+0x34/0xc0) [ 97.948926] [<80017b6c>] (__alloc_from_contiguous+0x34/0xc0) from [<80017d40>] (__dma_alloc+0xc4/0x2a0) [ 97.958362] [<80017d40>] (__dma_alloc+0xc4/0x2a0) from [<8001803c>] (arm_dma_alloc+0x80/0x98) [
[PATCH v2] ACPI / LPSS: add lpss device for Wildcat Point PCH
INT3438 is the ADSP device on the Wildcat Point platform, with 2 DW DMA
engines built in. The DMA engines are used for DSP FW loading and audio
data transfer. Probing these DMA engines needs the clock; without it,
probing may fail and cannot proceed.

Add the LPSS device "INT3438" for Wildcat Point PCH to provide the
clock for its ADSP DMA engine probing.

Signed-off-by: Jie Yang
---
 drivers/acpi/acpi_lpss.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index 9cb65b0..ce06149 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -113,6 +113,14 @@ static void lpss_i2c_setup(struct lpss_private_data *pdata)
 	writel(val, pdata->mmio_base + offset);
 }

+static struct lpss_device_desc wpt_dev_desc = {
+	.clk_required = true,
+	.prv_offset = 0x800,
+	.ltr_required = true,
+	.clk_divider = true,
+	.clk_gate = true,
+};
+
 static struct lpss_device_desc lpt_dev_desc = {
 	.clk_required = true,
 	.prv_offset = 0x800,
@@ -226,6 +234,8 @@ static const struct acpi_device_id acpi_lpss_device_ids[] = {
 	{ "INT3436", LPSS_ADDR(lpt_sdio_dev_desc) },
 	{ "INT3437", },

+	{ "INT3438", LPSS_ADDR(wpt_dev_desc) },
+
 	{ }
 };

--
1.8.3.2
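As a rough illustration of how a sentinel-terminated ID table like the one patched above can be consulted, here is a self-contained userspace sketch. The types `desc_sketch` and `id_sketch` and the function `match_id()` are hypothetical; in particular this sketch ends the table with a NULL string pointer, which is an assumption made for simplicity and not a description of how `struct acpi_device_id` or the ACPI core work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down descriptor; field names follow the quoted diff. */
struct desc_sketch {
	bool clk_required;
	unsigned int prv_offset;
};

struct id_sketch {
	const char *id;                 /* NULL marks the end of the table */
	const struct desc_sketch *desc; /* NULL: matched, but no descriptor */
};

static const struct desc_sketch wpt_desc = {
	.clk_required = true,
	.prv_offset = 0x800,
};

static const struct id_sketch id_table[] = {
	{ "INT3437", NULL },
	{ "INT3438", &wpt_desc },
	{ NULL, NULL }                  /* sentinel, playing the role of `{ }` */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct id_sketch *match_id(const struct id_sketch *table, const char *hid)
{
	assert(table != NULL && hid != NULL);
	for (; table->id != NULL; table++)
		if (strcmp(table->id, hid) == 0)
			return table;
	return NULL;
}
```

In this model, an ID listed without a descriptor (like "INT3437") still matches but carries no per-device settings, while "INT3438" matches and yields a descriptor that says a clock is required.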
Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts
On Friday, August 01, 2014 02:08:12 AM Rafael J. Wysocki wrote:
> On Friday, August 01, 2014 12:16:23 AM Thomas Gleixner wrote:
> > On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
> > > On Thursday, July 31, 2014 12:44:24 PM Thomas Gleixner wrote:

[cut]

> Except for a couple of points where I'm not sure I understand you correctly
> (commented above), all of that sounds good to me.
>
> I'm not sure about the ordering, though. It would be good to have a working
> replacement for the IRQF_NO_SUSPEND things that we'll be removing in 1, for
> example. So since we need to do 3) IRQF_SHARED for both IRQF_NO_SUSPEND and
> wakeup, as you said, would it be practical to start with that one?

I forgot about one case which in my opinion would be good to take into
account from the outset. That is the case of runtime-suspended devices
that we don't want to touch during system suspend/resume.

If those are system wakeup devices, their drivers should be able to
configure them for system wakeup at the runtime suspend time rather than
during system suspend. Also if their interrupts are going to be used as
system wakeup interrupts, the interface that we're going to provide needs
to handle that case. That is, it should be possible to use that interface
at the runtime suspend time and it should take care of all things going
forward.

Triggering a lazy disable right at the runtime suspend time may not work,
because the interrupt will also be used for the device's runtime remote
wakeup in that case, so it has to be functional until system suspend is
started and the core decides to leave the device in its current state.
This means that the core will need to trigger the lazy disable for it at
one point during system suspend.

Rafael
RE: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation
Paolo Bonzini wrote on 2014-07-31:
> On 31/07/2014 10:03, Wanpeng Li wrote:
>>> One thing:
>>>
>>> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
>>> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
>>>
>>> I think you can just do this write unconditionally, since most
>>> hypervisors will enable this. Also, you probably can add the tpr
>>
>> What will happen if a hypervisor doesn't enable it? I make it more
>> cleaner in version two.
>
> TPR_THRESHOLD will be likely written as zero, but the processor will
> never use it anyway. It's just a small optimization because
> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.

Theoretically, you are right. But we should not expect all VMMs to
follow it. It is not worth violating the SDM just to save the cost of
two or three instructions.

>
> Paolo
>
>>> threshold field to the read-write fields for shadow VMCS.
>>
>> Agreed.
>>
>> Regards,
>> Wanpeng Li

Best regards,
Yang
Re: [PATCH] cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]
On 07/31/2014 03:58 PM, Prarit Bhargava wrote: On 07/31/2014 06:13 PM, Saravana Kannan wrote: On 07/31/2014 02:08 PM, Prarit Bhargava wrote: On 07/31/2014 04:38 PM, Saravana Kannan wrote: On 07/31/2014 01:30 PM, Prarit Bhargava wrote: On 07/31/2014 04:24 PM, Saravana Kannan wrote: Prarit, I'm not an expert on sysfs locking, but I would think the specific sysfs lock would depend on the file/attribute group. So, can you please try to hotplug a core in/out (to trigger the POLICY_EXIT) and then read a sysfs file exported by the governor? scaling_governor doesn't cut it since that file is not removed on policy exit event to governor. If it's ondemand, try reading/write it's sampling rate file. Thanks Saravana -- will do. I will get back to you shortly on this. Thanks. Btw, in case you weren't already aware of it. You'll have to hoplug out all the CPUs in a cluster to trigger a POLICY_EXIT for that cluster/policy. Yep -- the affected_cpus file should show all the cpus in the policy IIRC. One of the systems I have has 1 cpu/policy and has 48 threads so the POLICY_EXIT is called. I'll put something like while [1]; do echo ondemand > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate echo 2 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate echo 0 > /sys/devices/system/cpu/cpu1/online sleep 1 echo 1 > /sys/devices/system/cpu/cpu1/online sleep 1 done The actual race can only happen with 2 threads. I'm just trying to trigger a lockdep warning here. I ran the above in two separate terminals with cpuset -c 0 and cpuset -c 1 to multi-thread it all. No deadlock or LOCKDEP trace after about 1/2 hour, so I think we're in the clear on that concern. I wasn't convinced. So, I took some help from Stephen to test it. It's been a while, so I didn't remember the original issue clearly when I gave you some test suggestions. 
Now that I have looked at the code more closely, I have a proper way to reproduce the original issue.

Nack for this patch for 2 reasons:

1. You seem to have accidentally removed a GOV_STOP in your patch. We definitely can't do that. This broke changing governors, and that's why your patch didn't cause any issues: all your governor echos were failing.
2. When we fixed that and actually tried a proper test (not the one I gave you), we reproduced the original issue.

To reproduce the original issue:

Preconditions:
* lockdep is enabled
* governor per policy is enabled

Steps:
1. Set governor to ondemand.
2. Cat one of the ondemand sysfs files.
3. Change governor to conservative.

When you do that, there's an AB-BA deadlock between one thread trying to cat a governor sysfs file and another thread trying to change governors.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
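The AB-BA pattern reported above can be illustrated with a toy lock-order tracker. This is only a sketch in the spirit of what a lock validator checks, not lockdep's implementation: `order_sketch`, `order_acquire()` and `order_release()` are hypothetical names, the model is single-threaded, and the two lock identities used in the usage note are placeholders rather than the actual cpufreq and sysfs locks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LOCKS 4              /* illustrative number of lock classes */

/* Toy lock-order tracker; all names here are hypothetical. */
struct order_sketch {
	bool before[NR_LOCKS][NR_LOCKS]; /* before[a][b]: a was held while taking b */
	bool held[NR_LOCKS];             /* locks held by the simulated thread */
};

/*
 * Record that `lock` is being acquired. Returns false if some currently
 * held lock h was, at an earlier time, acquired while `lock` was held,
 * i.e. both orders lock->h and h->lock have now been observed.
 */
bool order_acquire(struct order_sketch *o, int lock)
{
	bool ok = true;

	assert(lock >= 0 && lock < NR_LOCKS);
	for (int h = 0; h < NR_LOCKS; h++) {
		if (!o->held[h])
			continue;
		if (o->before[lock][h])
			ok = false;     /* inverse order seen earlier */
		o->before[h][lock] = true;
	}
	o->held[lock] = true;
	return ok;
}

void order_release(struct order_sketch *o, int lock)
{
	assert(lock >= 0 && lock < NR_LOCKS);
	o->held[lock] = false;
}
```

Replaying "take lock 0, then lock 1" for the reader path and "take lock 1, then lock 0" for the governor-change path makes the second `order_acquire()` of lock 0 return false, even though neither replay blocks on its own. That is why the reproduction steps above ask for lockdep to be enabled: the inversion can be flagged without actually hitting the deadlock.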
Re: [PATCH v2 tip/core/rcu 01/10] rcu: Add call_rcu_tasks()
On 08/01/2014 12:09 AM, Paul E. McKenney wrote:
>
>>> +		/*
>>> +		 * There were callbacks, so we need to wait for an
>>> +		 * RCU-tasks grace period. Start off by scanning
>>> +		 * the task list for tasks that are not already
>>> +		 * voluntarily blocked. Mark these tasks and make
>>> +		 * a list of them in rcu_tasks_holdouts.
>>> +		 */
>>> +		rcu_read_lock();
>>> +		for_each_process_thread(g, t) {
>>> +			if (t != current && ACCESS_ONCE(t->on_rq) &&
>>> +			    !is_idle_task(t)) {
>>
>> What happen when the trampoline is on the idle task?
>>
>> I think we need to use schedule_on_each_cpu() to replace one of
>> the synchronize_sched() in this function. (or other stuff which can
>> cause real schedule for *ALL* online CPUs).
>
> Well, that is one of the questions in the 0/10 cover letter. If it turns
> out to be necessary to worry about idle-task trampolines, it should be
> possible to avoid hammering all idle CPUs in the common case. Though maybe
> battery-powered devices won't need RCU-tasks.
>

Trampolines on a NO_HZ idle CPU can run for an arbitrarily long time
(for example, when an SMI happens inside the trampoline). So only a real
schedule on the idle CPU looks reliable to me.
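The holdout scan quoted above can be modeled in a few lines of userspace C. This is a deliberately simplified, single-threaded sketch: `task_sketch` and the three functions are hypothetical, there is no locking or memory ordering, and it only mirrors the filter condition shown in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a task; not the kernel's task_struct. */
struct task_sketch {
	bool on_rq;     /* runnable, i.e. not voluntarily blocked */
	bool is_idle;   /* models is_idle_task() */
	bool holdout;   /* must still pass through a quiescent state */
};

/* Mark every task the grace period has to wait for; return how many. */
size_t mark_holdouts(struct task_sketch *tasks, size_t n,
		     const struct task_sketch *current)
{
	size_t marked = 0;

	assert(tasks != NULL);
	for (size_t i = 0; i < n; i++) {
		struct task_sketch *t = &tasks[i];

		/* Same filter as the quoted hunk: skip current, blocked and idle tasks. */
		t->holdout = (t != current && t->on_rq && !t->is_idle);
		if (t->holdout)
			marked++;
	}
	return marked;
}

/* Models rcu_note_voluntary_context_switch(): the task stops being a holdout. */
void note_voluntary_switch(struct task_sketch *t)
{
	if (t->holdout)
		t->holdout = false;
}

/* The grace period may end once no task is still marked. */
bool grace_period_done(const struct task_sketch *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].holdout)
			return false;
	return true;
}
```

In this model an idle task is never marked, so the grace period does not wait for it. That is the gap the reply points at: code running in a trampoline on the idle task would not be covered by the scan alone.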
Re: [PATCH percpu/for-3.17 1/2] percpu: implement percpu_pool
Hello, Andrew.

On Thu, Jul 31, 2014 at 6:03 PM, Andrew Morton wrote:
> I don't think we should add facilities such as this. Because if we do,
> people will use them and thereby make the kernel less reliable, for
> obvious reasons.
>
> It would be better to leave the nasty hack localized within
> blk-throttle.c and hope that someone finds a way of fixing it.

The thing is we need similar facilities in the IO path in other places
too. They share exactly the same characteristics - opportunistic
percpu allocations during IO which are expected to fail from time to
time and they will all implement fallback behavior on allocation
failures. I'm not sure how this makes the kernel less reliable. This
conceptually isn't different from atomic allocations which we also use
in a similar way. If you're worried that people might use this
assuming that it won't fail, an obvious solution is adding a failure
injection for debugging, but really except for being a bit ghetto,
this is just the atomic allocation for percpu areas.

Thanks.

--
tejun
Re: [PATCH v2 1/2] printk: Add function to return log buffer address and size
On Thu, 2014-07-31 at 15:22 -0700, Andrew Morton wrote:
> Please include this in whatever tree carries "powerpc/powernv:
> Interface to register/unregister opal dump region".

At some point, I'd like to redo the patch series that breaks up
printk.c into more manageable blocks.

https://lkml.org/lkml/2012/10/17/41

Any suggestion for timing?
[REVIEW][PATCH 4/4] proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
In oddball cases where the thread has a different mount namespace than
the thread group leader, or more likely in cases where the thread
remains and the thread group leader has exited, this ensures that
/proc/mounts continues to work.

This should not cause any problems but if it does this patch can just
be reverted.

Signed-off-by: "Eric W. Biederman"
---
 fs/proc/root.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/root.c b/fs/proc/root.c
index 48f1c03bc7ed..92c12c243ce3 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -173,7 +173,7 @@ void __init proc_root_init(void)
 
 	proc_self_init();
 	proc_thread_self_init();
-	proc_symlink("mounts", NULL, "self/mounts");
+	proc_symlink("mounts", NULL, "thread-self/mounts");
 
 	proc_net_init();
 
-- 
1.9.1
[REVIEW][PATCH 3/4] proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
In oddball cases where the thread has a different network namespace
than the primary thread group leader, or more likely in cases where
the thread remains and the thread group leader has exited, this
ensures that /proc/net continues to work.

This should not cause any problems but if it does this patch can just
be reverted.

Signed-off-by: "Eric W. Biederman"
---
 fs/proc/proc_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index a63af3e0a612..39481028ec08 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -226,7 +226,7 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
 
 int __init proc_net_init(void)
 {
-	proc_symlink("net", NULL, "self/net");
+	proc_symlink("net", NULL, "thread-self/net");
 
 	return register_pernet_subsys(&proc_net_ns_ops);
 }
-- 
1.9.1
[REVIEW][PATCH 1/4] proc: Have net show up under /proc/<tgid>/task/<tid>
Network namespaces are per task so it makes sense for them to show up
in the task directory.

Signed-off-by: "Eric W. Biederman"
---
 fs/proc/base.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0c93bf..ed34e405c6b9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2895,6 +2895,9 @@ static const struct pid_entry tid_base_stuff[] = {
 	DIR("fd",S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations),
 	DIR("fdinfo",S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations),
 	DIR("ns",S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations),
+#ifdef CONFIG_NET
+	DIR("net",S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations),
+#endif
 	REG("environ", S_IRUSR, proc_environ_operations),
 	INF("auxv", S_IRUSR, proc_pid_auxv),
 	ONE("status",S_IRUGO, proc_pid_status),
-- 
1.9.1
[REVIEW][PATCH 2/4] proc: Implement /proc/thread-self to point at the directory of the current thread
/proc/thread-self is derived from /proc/self.  /proc/thread-self
points to the directory in proc containing information about the
current thread.

This functionality has been missing for a long time, and is tricky to
implement in userspace as gettid() is not exported by glibc.  More
importantly this allows fixing defects in /proc/mounts and /proc/net
where in a threaded application today they wind up being empty files
when only the initial pthread has exited, causing problems for other
threads.

Signed-off-by: "Eric W. Biederman"
---
 fs/proc/Makefile              |  1 +
 fs/proc/base.c                | 15 +---
 fs/proc/inode.c               |  7 +++-
 fs/proc/internal.h            |  6 +++
 fs/proc/root.c                |  3 ++
 fs/proc/thread_self.c         | 85 +++
 include/linux/pid_namespace.h |  1 +
 7 files changed, 112 insertions(+), 6 deletions(-)
 create mode 100644 fs/proc/thread_self.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 239493ec718e..7151ea428041 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -23,6 +23,7 @@ proc-y	+= version.o
 proc-y	+= softirqs.o
 proc-y	+= namespaces.o
 proc-y	+= self.o
+proc-y	+= thread_self.o
 proc-$(CONFIG_PROC_SYSCTL)	+= proc_sysctl.o
 proc-$(CONFIG_NET)		+= proc_net.o
 proc-$(CONFIG_PROC_KCORE)	+= kcore.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ed34e405c6b9..0131156ce7c9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2847,7 +2847,7 @@ retry:
 	return iter;
 }
 
-#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 1)
+#define TGID_OFFSET (FIRST_PROCESS_ENTRY + 2)
 
 /* for the /proc/ directory itself, after non-process stuff has been done */
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
@@ -2859,14 +2859,19 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 	if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
 		return 0;
 
-	if (pos == TGID_OFFSET - 1) {
+	if (pos == TGID_OFFSET - 2) {
 		struct inode *inode = ns->proc_self->d_inode;
 		if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
 			return 0;
-		iter.tgid = 0;
-	} else {
-		iter.tgid = pos - TGID_OFFSET;
+		ctx->pos = pos = pos + 1;
+	}
+	if (pos == TGID_OFFSET - 1) {
+		struct inode *inode = ns->proc_thread_self->d_inode;
+		if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
+			return 0;
+		ctx->pos = pos = pos + 1;
 	}
+	iter.tgid = pos - TGID_OFFSET;
 	iter.task = NULL;
 	for (iter = next_tgid(ns, iter);
 	     iter.task;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 0adbc02d60e3..333080d7a671 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -442,6 +442,7 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 int proc_fill_super(struct super_block *s)
 {
 	struct inode *root_inode;
+	int ret;
 
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
@@ -463,5 +464,9 @@ int proc_fill_super(struct super_block *s)
 		return -ENOMEM;
 	}
 
-	return proc_setup_self(s);
+	ret = proc_setup_self(s);
+	if (ret) {
+		return ret;
+	}
+	return proc_setup_thread_self(s);
 }
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14e71c5..ee04619173b2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -234,6 +234,12 @@ static inline int proc_net_init(void) { return 0; }
 extern int proc_setup_self(struct super_block *);
 
 /*
+ * proc_thread_self.c
+ */
+extern int proc_setup_thread_self(struct super_block *);
+extern void proc_thread_self_init(void);
+
+/*
  * proc_sysctl.c
  */
 #ifdef CONFIG_PROC_SYSCTL
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 5dbadecb234d..48f1c03bc7ed 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -149,6 +149,8 @@ static void proc_kill_sb(struct super_block *sb)
 	ns = (struct pid_namespace *)sb->s_fs_info;
 	if (ns->proc_self)
 		dput(ns->proc_self);
+	if (ns->proc_thread_self)
+		dput(ns->proc_thread_self);
 	kill_anon_super(sb);
 	put_pid_ns(ns);
 }
@@ -170,6 +172,7 @@ void __init proc_root_init(void)
 		return;
 
 	proc_self_init();
+	proc_thread_self_init();
 	proc_symlink("mounts", NULL, "self/mounts");
 
 	proc_net_init();
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
new file mode 100644
index ..59075b509df3
--- /dev/null
+++ b/fs/proc/thread_self.c
@@ -0,0 +1,85 @@
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+/*
+ * /proc/thread_self:
+ */
+static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
+				     int buflen)
+{
+	struct pid_namespace *ns =
[REVIEW][PATCH 0/4] /proc/thread-self
This patchset implements /proc/thread-self, a magic symlink that solves
a couple of problems.

- It makes it easy to get to a specific thread's directory in /proc.
  With gettid() not being exported by glibc this is currently a pain.

- It allows fixing the problem present in /proc/mounts and /proc/net:
  when the thread group leader exits but the rest of the thread group
  remains, /proc/self/net and /proc/self/mounts, and thus /proc/mounts
  and /proc/net, become empty.

- As mount and network namespaces are per thread it allows /proc/net
  and /proc/mounts to reflect this.

There is a small chance that changing /proc/net and /proc/mounts will
cause userspace regressions (although nothing has shown up in my
testing).  If that happens we can just revert the change that moves
them from /proc/self/... to /proc/thread-self/...

Eric W. Biederman (4):
      proc: Have net show up under /proc/<tgid>/task/<tid>
      proc: Implement /proc/thread-self to point at the directory of the current thread
      proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
      proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

 fs/proc/Makefile              |  1 +
 fs/proc/base.c                | 18 ++---
 fs/proc/inode.c               |  7 +++-
 fs/proc/internal.h            |  6 +++
 fs/proc/proc_net.c            |  2 +-
 fs/proc/root.c                |  5 ++-
 fs/proc/thread_self.c         | 85 +++
 include/linux/pid_namespace.h |  1 +
 8 files changed, 117 insertions(+), 8 deletions(-)

Eric
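For comparison, here is a minimal sketch of what userspace has to do without /proc/thread-self: issue the raw gettid system call and build the per-thread path by hand. It assumes Linux and a libc that exposes syscall() and SYS_gettid; format_task_dir() and current_task_dir() are hypothetical helper names, not part of the patchset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Format "/proc/<tgid>/task/<tid>" into buf.  Returns the length
 * snprintf() reports (excluding the terminating NUL).
 */
static int format_task_dir(char *buf, size_t len, pid_t tgid, pid_t tid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "/proc/%d/task/%d", (int)tgid, (int)tid);
}

/*
 * Build the /proc directory of the calling thread.  The raw syscall is
 * used because, as the cover letter notes, glibc did not export a
 * gettid() wrapper at the time.
 */
static int current_task_dir(char *buf, size_t len)
{
	pid_t tid = (pid_t)syscall(SYS_gettid);

	return format_task_dir(buf, len, getpid(), tid);
}
```

With the patchset applied, the same directory should be reachable by simply opening /proc/thread-self, with no thread ID lookup in the application.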
Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts
On Friday, August 01, 2014 01:41:31 AM Thomas Gleixner wrote:
> On Thu, 31 Jul 2014, Rafael J. Wysocki wrote:
> > On Thursday, July 31, 2014 04:12:55 PM Alan Stern wrote:
> > > Pardon me for sticking my nose into the middle of the conversation, but
> > > here's what it looks like to me:
> > >
> > > The entire no_irq phase of suspend/resume is starting to seem like a
> > > mistake.  We should never have done it.
> >
> > In hindsight, I totally agree.  Question is what we can do about it now.
> >
> > > So how can we eliminate the noirq phase in a workable way?
>
> The straight way to do that is breaking the world and some more and
> then fix up a gazillion of device drivers by doing a massive voodoo
> debugging effort simply because in most cases we do not get any useful
> information out of the system once the shit hits the fan.
>
> We could add instrumentation to the core code about interrupts which
> are coming in unexpectedly during suspend, but that does not solve
> anything.
>
> We really cannot call any device handler at that point as clocks might
> be turned off already and any access to a device register might simply
> cause a full undebuggable stall of the CPU.
>
> And there is no way to prove that there is no chance of a spurious
> interrupt for a given device.
>
> So if we cannot handle it at the infrastructure level, we need to make
> sure that every fricking device driver interrupt handler has a
>
>	if (dev->suspended)
>		return CRAP;
>
> conditional as the first line of code in it.
>
> What is that buying us?
>
> Nothing than a shitload of hard to understand problems, really.  The
> only sensible way to handle this is at the core level.
>
> #1 There is no way that you can rely on random drivers to do the Right
>    Thing.
>
> #2 There is no way that all hardware is implemented in a sane way.
>
> #3 You CANNOT educate the people who are tasked to implement something
>    which "does the job" to understand all the subtle details of
>    suspend/resume or whatever.

These are fair points.

However, if the driver implements ->runtime_suspend, it has to handle
the "my device is suspended" condition in its interrupt handler
regardless.  For such a driver doing the same over system
suspend/resume shouldn't be a real problem.

Rafael
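The "my device is suspended" check both messages refer to can be modelled in a few lines of userspace C. This is only a sketch of the control flow, not driver code: struct demo_dev, demo_irq_handler() and the result enum are hypothetical names, and the counter increment stands in for the register access that would be unsafe while clocks are off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on an IRQ handler's return. */
enum demo_irq_result {
	DEMO_IRQ_NOT_HANDLED,
	DEMO_IRQ_WAS_HANDLED,
};

struct demo_dev {
	bool suspended;       /* set by the suspend path, cleared on resume */
	unsigned int events;  /* interrupts serviced while awake */
	unsigned int ignored; /* interrupts seen while suspended */
};

/* Bail out before touching the device if it is suspended. */
static enum demo_irq_result demo_irq_handler(struct demo_dev *dev)
{
	assert(dev != NULL);

	if (dev->suspended) {
		dev->ignored++;
		return DEMO_IRQ_NOT_HANDLED;
	}

	dev->events++; /* stands in for reading device registers */
	return DEMO_IRQ_WAS_HANDLED;
}
```

The disagreement in the thread is about where this check should live: repeated in every driver, as above, or handled once at the core level.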
Re: linux-next: manual merge of the driver-core tree with the tip tree
On Thu, Jul 31, 2014 at 05:07:35PM +1000, Stephen Rothwell wrote:
> Hi Greg,
>
> Today's linux-next merge of the driver-core tree got a conflict in
> lib/Kconfig.debug between commit e704f93af5a0 ("kernel: time: Add
> udelay_test module to validate udelay") from the tip tree and commit
> 0a8adf584759 ("test: add firmware_class loader test") from the
> driver-core tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good, thanks.

greg k-h
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > On Tue, 29 Jul 2014 13:24:05 +0800
> > Aaron Lu wrote:
> >
> > > FYI, we noticed the below changes on
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure
> > > task_numa_migrate() checks the preferred node")
> > >
> > > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > > ---------------  -------------------------
> > >      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> > >      67745 ~ 4%     +64.1%     111174 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> > >     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
> >
> > Hi Aaron,
> >
> > Jirka Hladky has reported a regression with that changeset as
> > well, and I have already spent some time debugging the issue.
>
> So assuming those numbers above are the difference in
> numa_hint_faults_local, the report is actually a significant
> _improvement_, not a regression.
>
> On my IVB-EP I get similar numbers; using:
>
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
>
>
>   tip/master+origin/master    tip/master+origin/master-a43455a1d57
>
>   local      total            local      total
>   faults     time             faults     time
>
>   19971      51.384           10104      50.838
>   17193      50.564            9116      50.208
>   13435      49.057            8332      51.344
>   23794      50.795            9954      51.364
>   20255      49.463            9598      51.258
>
>   18929.6    50.2526           9420.8    51.0024
>    3863.61    0.96              717.78    0.49
>
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
>
>
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
>
>   tip/master+origin/master+patch
>
>   local      total
>   faults     time
>
>   21296      50.541
>   12771      50.54
>   13872      52.224
>   23352      50.85
>   16516      50.705
>
>   17561.4    50.972
>    4613.32    0.71
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.

It also seems to be the case on an 8-socket, 80-core DL980:

tip/master baseline:
67276    169.590 [sec]
82400    188.406 [sec]
87827    201.122 [sec]
96659    228.243 [sec]
83180    192.422 [sec]

tip/master + a43455a1d57 reverted:
36686    170.373 [sec]
52670    187.904 [sec]
55723    203.597 [sec]
41780    174.354 [sec]
36070    173.179 [sec]

Runtimes are pretty much all over the place; I cannot really say if it's
gotten slower or faster.  However, on average, we nearly double the
number of local hint faults with the commit in question.

After adding the proposed fix (NUMA_SCALE/8 variant), it goes down
again, closer to the numbers without a43455a1d57:

tip/master + patch:
50591    175.272 [sec]
57858    191.969 [sec]
77564    215.429 [sec]
50613    179.384 [sec]
61673    201.694 [sec]

> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

I'll try to dig it up as well.
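The before/after measurement quoted above (grep a counter out of /proc/vmstat, run the benchmark, grep again, subtract) can also be done from C. The sketch below only covers the parsing step and works on text already read into memory; vmstat_lookup() is a hypothetical helper name, and it assumes the "name value" one-pair-per-line layout that the grep and cut pipeline in the message relies on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "<key> <value>" in vmstat-formatted text (one pair per line).
 * Returns 0 and stores the value in *out on success, -1 if the key is
 * not present.  The key must match the whole counter name, so
 * "numa_hint_faults" does not match "numa_hint_faults_local".
 */
static int vmstat_lookup(const char *text, const char *key,
			 unsigned long long *out)
{
	size_t klen;

	assert(text != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	while (*text) {
		const char *eol = strchr(text, '\n');

		if (strncmp(text, key, klen) == 0 && text[klen] == ' ' &&
		    sscanf(text + klen, "%llu", out) == 1)
			return 0;
		if (!eol)
			break;
		text = eol + 1;
	}
	return -1;
}
```

A caller would read /proc/vmstat into a buffer before and after the benchmark, look up numa_hint_faults_local in each snapshot, and report the difference, which is what the echo $((POST-PRE)) line computes.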
Re: [PATCHv2] CMA/HOTPLUG: clear buffer-head lru before page migration
On 2014-08-01 at 7:57 AM, Andrew Morton wrote:
> On Thu, 31 Jul 2014 11:22:35 +0900 Gioh Kim wrote:
>
> > The previous PATCH inserts invalidate_bh_lrus() only into CMA code.
> > HOTPLUG needs also dropping bh of lru.
> > So v2 inserts invalidate_bh_lrus() into both of CMA and HOTPLUG.
> >
> > 8<
> >
> > The bh must be free to migrate a page at which bh is mapped.
> > The reference count of bh is increased when it is installed
> > into lru so that the bh of lru must be freed before migrating the page.
> >
> > This frees every bh of lru. We could free only bh of migrating page.
> > But searching lru sometimes costs more than invalidating entire lru.
> >
> > Signed-off-by: Gioh Kim
> > Acked-by: Michal Nazarewicz
> > ---
> >  mm/memory_hotplug.c |1 +
> >  mm/page_alloc.c |2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index a3797d3..1c5454f 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1672,6 +1672,7 @@ repeat:
> >  		lru_add_drain_all();
> >  		cond_resched();
> >  		drain_all_pages();
> > +		invalidate_bh_lrus();
>
> Both of these calls should have a comment explaining why
> invalidate_bh_lrus() is being called.
>
> >  	}
> >
> >  	pfn = scan_movable_pages(start_pfn, end_pfn);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b99643d4..c00dedf 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6369,6 +6369,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >  	if (ret)
> >  		return ret;
> >
> > +	invalidate_bh_lrus();
> > +
> >  	ret = __alloc_contig_migrate_range(&cc, start, end);
> >  	if (ret)
> >  		goto done;
>
> I do feel that this change is likely to be beneficial, but I don't want
> to apply such a patch until I know what its effects are upon all
> alloc_contig_range() callers.  Especially hugetlb.

I'm very sorry to hear that.
How can I check the effects?
Re: [PATCH 0/2] new API to allocate buffer-cache for superblock in non-movable area
On 2014-07-31 at 9:21 PM, Jan Kara wrote:
> On Thu 31-07-14 09:37:15, Gioh Kim wrote:
> > On 2014-07-31 at 9:03 AM, Jan Kara wrote:
> > > On Thu 31-07-14 08:54:40, Gioh Kim wrote:
> > > > On 2014-07-30 at 7:11 PM, Jan Kara wrote:
> > > > > On Wed 30-07-14 16:44:24, Gioh Kim wrote:
> > > > > > On 2014-07-22 at 6:38 PM, Jan Kara wrote:
> > > > > > > On Tue 22-07-14 09:30:05, Peter Zijlstra wrote:
> > > > > > > > On Tue, Jul 22, 2014 at 02:18:47PM +0900, Gioh Kim wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > This patch tries to solve the problem that a long-lasting page cache of
> > > > > > > > > the ext4 superblock disturbs page migration.
> > > > > > > > >
> > > > > > > > > I've been testing the CMA feature on my ARM-based platform
> > > > > > > > > and found some pages for page caches cannot be migrated.
> > > > > > > > > Some of them are page caches of the superblock of the ext4 filesystem.
> > > > > > > > >
> > > > > > > > > Current ext4 reads the superblock with sb_bread(). sb_bread() allocates
> > > > > > > > > the page from the movable area. But the problem is that ext4 holds the
> > > > > > > > > page until it is unmounted. If the root filesystem is ext4 the page
> > > > > > > > > cannot be migrated forever.
> > > > > > > > >
> > > > > > > > > I introduce a new API for allocating a page from the non-movable area.
> > > > > > > > > It is useful for ext4 and others that want to hold page cache for a
> > > > > > > > > long time.
> > > > > > > >
> > > > > > > > There's no word on why you can't teach ext4 to still migrate that page.
> > > > > > > > For all I know it might be impossible, but at least mention why.
> > > > > >
> > > > > > I am very sorry for the lack of details.
> > > > > >
> > > > > > In ext4_fill_super() the buffer-head of the superblock is stored in
> > > > > > sbi->s_sbh. The page that belongs to the buffer-head is allocated from
> > > > > > the movable area. To migrate the page the buffer-head should be released
> > > > > > via brelse(). But brelse() is not called until unmount.
> > > > >
> > > > > Hum, I don't see where in the code do we check buffer_head use count. Can
> > > > > you please point me? Thanks.
> > > >
> > > > Filesystem code does not check the buffer_head use count. sb_bread() returns
> > > > the buffer_head that is included in bh_lru and has a non-zero use count.
> > > > You can see the bh_lru code in buffer.c: __find_get_block() and
> > > > lookup_bh_lru(). bh_lru_install() inserts the buffer_head into the bh_lru.
> > > > It first calls get_bh() to increase the use count and inserts the bh into
> > > > the lru array.
> > > >
> > > > The buffer_head use count is non-zero until brelse() is called.
> > >
> > > So I probably didn't phrase the question precisely enough. What I was
> > > asking about is where exactly *migration* code checks buffer use count?
> > > Because as I'm looking at buffer_migrate_page() we lock the buffers on a
> > > migrated page but we don't look at buffer use counts... So it seems to me
> > > that migration of a page with buffers should succeed even if buffer head
> > > has an elevated use count. Now I think that it *should* check the buffer
> > > use counts (it is dangerous to migrate buffers someone holds reference to)
> > > but I just cannot find that place. Or does CMA use some other migration
> > > function for buffer pages than buffer_migrate_page()?
> >
> > The CMA allocation function is cma_alloc(). The function flow is
> > alloc_contig_range() -> __alloc_contig_migrate_range() -> migrate_pages ->
> > unmap_and_move -> __unmap_and_move -> try_to_free_buffers -> drop_buffers ->
> > buffer_busy.
> >
> > buffer_busy() checks b_count. If the buffer is busy the buffer-cache
> > cannot be removed. So the page that includes the buffer_head and the page
> > that is referred to by the buffer_head are not movable.
> >
> > Is this what you need?
>
> Yes, this is what I was asking about. Thanks! But as I'm looking into
> __unmap_and_move() it calls try_to_free_buffers() only if page->mapping ==
> NULL. As the comment before that test states, this can happen only for swap
> cache (not our case) or for pagecache pages that were truncated and not yet
> fully cleaned up. But superblock page cannot really be truncated. So I
> somewhat doubt you can hit the above path for a page holding superblock...

I printed the address of the busy buffer_head in drop_buffers() that is
called by try_to_free_buffers(). And I printed the address of the sb
buffer_head. They were the same.

I'm going to check page->mapping.

> Honza
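The reference-count argument in this thread can be modelled in userspace C: a reader takes a reference on a buffer head and holds it, and the "can these buffers be dropped?" check fails for as long as any buffer on the page has a non-zero count. This is a simplified sketch under stated assumptions, not the fs/buffer.c code: the demo_ names are hypothetical, the per-page list is NULL-terminated rather than circular, and only the use count is modelled because that is the condition the message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_bh {
	int count;            /* models the buffer_head use count (b_count) */
	struct demo_bh *next; /* next buffer on the same page, NULL at the end */
};

/* Models get_bh(): take a reference. */
static void demo_get_bh(struct demo_bh *bh)
{
	assert(bh != NULL);
	bh->count++;
}

/* Models brelse(): drop a reference. */
static void demo_brelse(struct demo_bh *bh)
{
	assert(bh != NULL && bh->count > 0);
	bh->count--;
}

/* Models the busy test: a buffer with an elevated use count is busy. */
static bool demo_buffer_busy(const struct demo_bh *bh)
{
	return bh->count > 0;
}

/* Models the drop check: every buffer on the page must be idle. */
static bool demo_can_drop_buffers(const struct demo_bh *head)
{
	for (const struct demo_bh *bh = head; bh != NULL; bh = bh->next)
		if (demo_buffer_busy(bh))
			return false;
	return true;
}
```

In this model, a superblock buffer that is referenced at mount time and released only at unmount keeps demo_can_drop_buffers() false for the whole lifetime of the mount, which matches the behaviour reported for the ext4 superblock page.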
Re: [x86_64,vsyscall] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > Oops, github needs this link for downloading big files:
> >
> > https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/yocto-minimal-i386.cgz
>
> Or
> https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/yocto-minimal-x86_64.cgz,
> I guess?  The particular failure you're seeing here is only possible
> on 64-bit kernels.

You are right.  Sorry for the mistake!

Thanks,
Fengguang
Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028
On Thu, Jul 31, 2014 at 07:57:25PM +0200, Stephane Eranian wrote:
> On Thu, Jul 31, 2014 at 4:32 AM, Fengguang Wu wrote:
> > Hi Stephane,
> >
> > On Wed, Jul 30, 2014 at 07:56:11PM +0200, Stephane Eranian wrote:
> >> On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu wrote:
> >> > On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
> >> >> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu wrote:
> >> >> > Greetings,
> >> >> >
> >> >> > 0day kernel testing robot got the below dmesg and the first bad
> >> >> > commit is
> >> >> >
> >> >> Is this booting a guest kernel or native?
> >> >
> >> > It's a guest kernel.
> >> >
> >> >> What is the host CPU?
> >> >
> >> > The host CPU is E5-2680, Sandy Bridge-EP.
> >> >
> >> I thought this problem had already been mentioned a while back.
> >>
> >> See https://lkml.org/lkml/2014/3/6/685
> >> And https://lkml.org/lkml/2014/4/23/512
> >>
> >> So what you are telling me here is that those two fixes never made it
> >> or that you are running an older kernel.
> >
> > I just checked linux-next and found that the bug in rapl_pmu_init() has
> > been fixed. linux-next happens to have the same "BUG: unable to handle
> > kernel NULL pointer dereference" message but in another function,
> > validate_chain(). Attached is the dmesg in linux-next.
> >
> > Sorry for the noise!
> >
> Is it fixed with the two patches I referred you to?

Yes.

Thanks,
Fengguang
Re: [PATCH v3 tip/core/rcu 1/9] rcu: Add call_rcu_tasks()
On Thu, Jul 31, 2014 at 02:55:01PM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney"
>
> This commit adds a new RCU-tasks flavor of RCU, which provides
> call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> context switch (not preemption!), userspace execution, and the idle loop.
> Note that unlike other RCU flavors, these quiescent states occur in tasks,
> not necessarily CPUs.  Includes fixes from Steven Rostedt.
>
> This RCU flavor is assumed to have very infrequent latency-tolerant
> updaters.  This assumption permits significant simplifications, including
> a single global callback list protected by a single global lock, along
> with a single linked list containing all tasks that have not yet passed
> through a quiescent state.  If experience shows this assumption to be
> incorrect, the required additional complexity will be added.
>
> Suggested-by: Steven Rostedt
> Signed-off-by: Paul E. McKenney
> ---
>  include/linux/init_task.h |   9 +++
>  include/linux/rcupdate.h  |  36 ++
>  include/linux/sched.h     |  23 ---
>  init/Kconfig              |  10 +++
>  kernel/rcu/tiny.c         |   2 +
>  kernel/rcu/tree.c         |   2 +
>  kernel/rcu/update.c       | 171 ++
>  7 files changed, 242 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 6df7f9fe0d01..78715ea7c30c 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -124,6 +124,14 @@ extern struct group_info init_groups;
>  #else
>  #define INIT_TASK_RCU_PREEMPT(tsk)
>  #endif
> +#ifdef CONFIG_TASKS_RCU
> +#define INIT_TASK_RCU_TASKS(tsk)				\
> +	.rcu_tasks_holdout = false,				\
> +	.rcu_tasks_holdout_list =				\
> +		LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +#else
> +#define INIT_TASK_RCU_TASKS(tsk)
> +#endif
>
>  extern struct cred init_cred;
>
> @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
>  	INIT_FTRACE_GRAPH					\
>  	INIT_TRACE_RECURSION					\
>  	INIT_TASK_RCU_PREEMPT(tsk)				\
> +	INIT_TASK_RCU_TASKS(tsk)				\
>  	INIT_CPUSET_SEQ(tsk)					\
>  	INIT_RT_MUTEXES(tsk)					\
>  	INIT_VTIME(tsk)						\
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 6a94cc8b1ca0..829efc99df3e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
>
>  void synchronize_sched(void);
>
> +/**
> + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual callback function to be invoked after the grace period
> + *
> + * The callback function will be invoked some time after a full grace
> + * period elapses, in other words after all currently executing RCU
> + * read-side critical sections have completed. call_rcu_tasks() assumes
> + * that the read-side critical sections end at a voluntary context
> + * switch (not a preemption!), entry into idle, or transition to usermode
> + * execution.  As such, there are no read-side primitives analogous to
> + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> + * to determine that all tasks have passed through a safe state, not so
> + * much for data-structure synchronization.
> + *
> + * See the description of call_rcu() for more detailed information on
> + * memory ordering guarantees.
> + */
> +void call_rcu_tasks(struct rcu_head *head, void (*func)(struct rcu_head *head));
> +
>  #ifdef CONFIG_PREEMPT_RCU
>
>  void __rcu_read_lock(void);
> @@ -294,6 +314,22 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
>  		rcu_irq_exit(); \
>  	} while (0)
>
> +/*
> + * Note a voluntary context switch for RCU-tasks benefit.  This is a
> + * macro rather than an inline function to avoid #include hell.
> + */
> +#ifdef CONFIG_TASKS_RCU
> +#define rcu_note_voluntary_context_switch(t) \
> +	do { \
> +		preempt_disable(); /* Exclude synchronize_sched(); */ \
> +		if (ACCESS_ONCE((t)->rcu_tasks_holdout)) \
> +			ACCESS_ONCE((t)->rcu_tasks_holdout) = 0; \
> +		preempt_enable(); \
> +	} while (0)
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +#define rcu_note_voluntary_context_switch(t)	do { } while (0)
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
>  bool __rcu_is_watching(void);
>  #endif /* #if
Re: [RFC PATCH 0/2] dirreadahead system call
On Thu, Jul 31, 2014 at 01:19:45PM +0200, Andreas Dilger wrote: > On Jul 31, 2014, at 6:49, Dave Chinner wrote: > > > >> On Mon, Jul 28, 2014 at 03:19:31PM -0600, Andreas Dilger wrote: > >>> On Jul 28, 2014, at 6:52 AM, Abhijith Das wrote: > >>> OnJuly 26, 2014 12:27:19 AM "Andreas Dilger" wrote: > Is there a time when this doesn't get called to prefetch entries in > readdir() order? It isn't clear to me what benefit there is of returning > the entries to userspace instead of just doing the statahead implicitly > in the kernel? > > The Lustre client has had what we call "statahead" for a while, > and similar to regular file readahead it detects the sequential access > pattern for readdir() + stat() in readdir() order (taking into account if > ".*" > entries are being processed or not) and starts fetching the inode > attributes asynchronously with a worker thread. > >>> > >>> Does this heuristic work well in practice? In the use case we were trying > >>> to > >>> address, a Samba server is aware beforehand if it is going to stat all the > >>> inodes in a directory. > >> > >> Typically this works well for us, because this is done by the Lustre > >> client, so the statahead is hiding the network latency of the RPCs to > >> fetch attributes from the server. I imagine the same could be seen with > >> GFS2. I don't know if this approach would help very much for local > >> filesystems because the latency is low. > >> > This syscall might be more useful if userspace called readdir() to get > the dirents and then passed the kernel the list of inode numbers > to prefetch before starting on the stat() calls. That way, userspace > could generate an arbitrary list of inodes (e.g. names matching a > regexp) and the kernel doesn't need to guess if every inode is needed. > >>> > >>> Were you thinking arbitrary inodes across the filesystem or just a subset > >>> from a directory? Arbitrary inodes may potentially throw up locking > >>> issues. 
> >> > >> I was thinking about inodes returned from readdir(), but the syscall > >> would be much more useful if it could handle arbitrary inodes. > > > > I'm not sure we can do that. The only way to safely identify a > > specific inode in the filesystem from userspace is via a filehandle. > > Plain inode numbers are susceptible to TOCTOU race conditions that > > the kernel cannot resolve. Also, lookup by inode number bypasses > > directory access permissions, so is not something we would expose > > to arbitrary unprivileged users. > > None of these issues are relevant in the API that I'm thinking about. > The syscall just passes the list of inode numbers to be prefetched > into kernel memory, and then stat() is used to actually get the data into > userspace (or whatever other operation is to be done on them), > so there is no danger if the wrong inode is prefetched. If the inode > number is bad the filesystem can just ignore it. Which means the filesystem has to treat the inode number as potentially hostile. i.e. it can not be trusted to be correct and so must take slow paths to validate the inode numbers. This adds *significant* overhead to the readahead path for some filesystems: readahead is only a win if it is low cost. For example, on XFS every untrusted inode number lookup requires an inode btree lookup to validate the inode is actually valid on disk and that is it allocated and has references. That lookup serialises against inode allocation/freeing as well as other lookups. In comparison, when using a trusted inode number from a directory lookup within the kernel, we only need to do a couple of shift and mask operations to convert it to a disk address and we are good to go. i.e. the difference is at least 5 orders of magnitude higher CPU usage for an "inode number readahead" syscall versus a "directory readahead" syscall, it has significant serialisation issues and it can stall other modification/lookups going on at the same time. 
That's *horrible behaviour* for a speculative readahead operation, but because the inode numbers are untrusted, we can't avoid it.

So, again, it's way more overhead than userspace just calling stat() asynchronously on many files at once as readdir/getdents returns dirents from the kernel to speed up cache population.

That's my main issue with this patchset - it's implementing something in kernelspace that can *easily* be done generically in userspace without introducing all sorts of nasty corner cases that we have to handle in the kernel. We only add functionality to the kernel if there's a compelling reason to do it in kernelspace, and right now I just don't see any numbers that justify adding readdir+stat() readahead or inode number based cache population in kernelspace.

Before we add *any* syscall for directory readahead, we need comparison numbers against doing the dumb multithreaded userspace readahead of stat() calls. If userspace can do
Re: [PATCH v3] Documentation: devicetree: Fix tps65090 typos in example
Hello Andreas,

On Wed, Jul 30, 2014 at 11:29 PM, Andreas Färber wrote:
> Specification and existing device trees use vsys-l{1,2}-supply,
> not vsys_l{1,2}-supply. Fix the example to match the specification.
>
> Reviewed-by: Doug Anderson
> Acked-by: Mark Rutland
> Fixes: 21d2202158e9 ("mfd: tps65090: add DT support for tps65090")
> Signed-off-by: Andreas Färber
> ---
> v2 -> v3:
> * Added Fixes header
> * + regulator and mfd maintainers
>
> v1 -> v2:
> * More verbose commit message (requested by Mark Rutland)
>
> Documentation/devicetree/bindings/regulator/tps65090.txt | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/regulator/tps65090.txt
> b/Documentation/devicetree/bindings/regulator/tps65090.txt
> index 340980239ea9..ca69f5e3040c 100644
> --- a/Documentation/devicetree/bindings/regulator/tps65090.txt
> +++ b/Documentation/devicetree/bindings/regulator/tps65090.txt
> @@ -45,8 +45,8 @@ Example:
> infet5-supply = <_reg>;
> infet6-supply = <_reg>;
> infet7-supply = <_reg>;
> - vsys_l1-supply = <_reg>;
> - vsys_l2-supply = <_reg>;
> + vsys-l1-supply = <_reg>;
> + vsys-l2-supply = <_reg>;
>

True, this also matches the .supply_name used when registering the LDO[1-2] regulators in the tps65090 driver. So clearly the example was wrong while the property specification is correct.

Reviewed-by: Javier Martinez Canillas

Best regards,
Javier
Re: [PATCH 1/3] irq / PM: New driver interface for wakeup interrupts
On Friday, August 01, 2014 12:16:23 AM Thomas Gleixner wrote: > On Thu, 31 Jul 2014, Rafael J. Wysocki wrote: > > On Thursday, July 31, 2014 12:44:24 PM Thomas Gleixner wrote: > > > What's this PCIe PME handler doing? Is it required functionality for > > > the suspend/resume path or is it a wakeup/abort mechanism? > > > > It is a wakeup/abort mechanism. > > So why is it using IRQF_NO_SUSPEND in the first place? It isn't in the current code. It did in *some* of the prototype patches floating around, but they were withdrawn. In the last one ([2/3] in this series) it doesn't any more. > Just because x86 does not have irq wake implemented or flagged its irq > chips with IRQCHIP_SKIP_SET_WAKE. The reason for using IRQF_NO_SUSPEND was to make it wake up from suspend-to-idle, but that was a sledgehammer approach. > Right, so instead of thinking about a proper solution driver folks just slap > the next available thing on it w/o thinking about the consequences. But, > that's partly our own fault due to lack of proper documentation. Prototyping with the next available thing is not generally wrong in my view as long as this doesn't get to the code base without consideration. > > And before we enter the wakeup handling slippery slope, let me make a note > > that this problem is bothering me quite a bit at the moment. In my opinion > > we need to address it somehow regardless of the wakeup issues and I'm not > > sure > > if failing __setup_irq() when there's a mismatch (that is, there are > > existing > > actions for the given irq_desc and their IRQF_NO_SUSPEND settings are not > > consistent with the new one) is the right way to do that, because it may > > make > > things behave a bit randomly (it will always fail the second guy, but that > > need > > not be the one who's requested IRQF_NO_SUSPEND and it depends on the > > ordering > > between them). > > I totally agree that we want to fix it and I'm going to help.
Though I > really wanted to have a clear picture of the stuff before making > decisions. > > > I had a couple of ideas, but none of them was particularly clean. Ideally, > > IRQF_NO_SUSPEND should always be requested without IRQF_SHARED, but I'm > > afraid that we can't really do that for the ACPI SCI, because that may > > cause problems to happen on some older systems where that interrupt is > > actually shared. On all systems I have immediate access to it isn't shared, > > but I remember seeing some where it was. On those systems the ACPI SCI > > itself > > would not be affected, because it is requested quite early during system > > init, > > but the other guys wanting to share the line with it would take a hit. > > > > One thing I was thinking about was to return an error from > > suspend_device_irqs() > > if there was a mismatch between IRQF_NO_SUSPEND settings for different > > irqactions > > in the same irq_desc. That would make system suspend fail on systems where > > it > > is potentially unsafe, but at least any other functionality would not be > > affected. > > > > That's one possible solution. See below. > > > > So many of them use it for wakeup purposes. Why so and how is that > > > supposed to work? > > > > Quite frankly, I'm not sure why they use it. These are mostly drivers I'm > > not > > familiar with on platforms I'm not familiar with. My guess is that the lazy > > disable mechanism is not sufficient for them for some reason. > > Looking at a few of them I fear the reason is that the developer did > not understand the wakeup mechanism at all. Again that's probably our > fault, because the whole business including the irq part lacks proper > in depth documentation. > > > > The mechanism for wakeup sources is: > > > > > > 1) Lazy disable the interrupt > > > > > > 2) Do the transition into suspend with interrupts enabled > > > > > > 3) Check whether one of the wakeup sources has triggered. If yes, > > >abort. Otherwise suspend. 
> > > > > > The ones marked IRQF_NO_SUSPEND are not part of the above scheme, > > > because they are not checked. So they use different mechanisms to > > > abort the suspend? > > > > Well, if you look at the tegra_kbc driver, for example, it uses both > > enable_irq_wake() and IRQF_NO_SUSPEND. Why it does that, I don't know. > > That doesn't make sense at all. > > > Other ones seem to be using pm_wakeup_event(), but that will only abort > > suspend when it is enabled upfront (it need not be). Moreover, it wasn't > > intended to be used that way. > > Right. We should kill that before more people copy it blindly. > > > It generally looks like things are used not as intended in the wakeup > > area, sadly. Perhaps that's my fault, because I wasn't looking carefully > > enough every time, but I wasn't directly involved in any of them IIRC. > > > > I guess that's an opportunity to clean that up ... > > Agreed. I'm not frightened to do a tree wide sweep. Seems to become a > habit :) > > > And now
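The three-step wakeup scheme quoted above, and the remark that IRQF_NO_SUSPEND interrupts are not part of it "because they are not checked", can be illustrated with a toy userspace model. This is not kernel code and does not reproduce the real implementation: struct model_irq and the model_*() functions are hypothetical names invented for this sketch, and the behaviour is a simplified reading of the description in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt line; not the kernel's irq_desc. */
struct model_irq {
	bool wake_armed;	/* models a driver having armed it as a wakeup source */
	bool no_suspend;	/* models a line requested with IRQF_NO_SUSPEND */
	bool lazy_disabled;	/* set in step 1 */
	bool pending;		/* set if the line fires while lazily disabled */
};

/* Step 1: lazily disable every line that is not marked no_suspend. */
void model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!irqs[i].no_suspend)
			irqs[i].lazy_disabled = true;
}

/*
 * Step 2 happens with interrupts enabled, so a line may still fire.
 * A lazily disabled line only records that it fired; a no_suspend line
 * would just run its handler and leaves no trace for step 3.
 */
void model_irq_fires(struct model_irq *irq)
{
	if (irq->lazy_disabled)
		irq->pending = true;
}

/* Step 3: abort the suspend if an armed wakeup source has fired. */
bool model_wakeup_pending(const struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (irqs[i].wake_armed && irqs[i].lazy_disabled && irqs[i].pending)
			return true;
	return false;
}
```

In this model a line with no_suspend set never reaches the pending state, so firing it during the transition does not abort the suspend. That is the gap the thread is pointing at when it asks what other mechanism such drivers rely on.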
Re: [PATCH v4 3/5] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend
On 07/31/2014 02:56 PM, Rafael J. Wysocki wrote:
> On Thursday, July 24, 2014 06:07:26 PM Saravana Kannan wrote:
> > This patch simplifies a lot of the hotplug/suspend code by not
> > adding/removing/moving the policy/sysfs/kobj during hotplug and just
> > leaves the cpufreq directory and policy in place irrespective of
> > whether the CPUs are ONLINE/OFFLINE.
>
> I'm still quite unsure how this is going to work with the real CPU
> hot-remove that makes the entire sysfs cpu directories go away. Can you
> please explain that?

Sure. Not a problem. I just wanted to make sure you had a chance to look at the code first.

Physical hot-remove triggers a "remove" for all the registered subsys_interfaces for that CPU (after going through a couple of functions). So, when that happens, the cpufreq subsys_interface remove for that CPU gets called. At that point, I clean up that CPU's SW states as if it was never plugged in from the start. If that CPU was the owner of the sysfs directory, I move it over to a different CPU.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
[PATCH 4/5] ftrace: Require designated initialization of structures
Mark various ftrace structures with __designated_init. Fix some ftrace macros to use designated initializers for those structures. Signed-off-by: Josh Triplett --- include/linux/ftrace.h | 4 ++-- include/linux/ftrace_event.h | 4 ++-- include/linux/syscalls.h | 8 ++-- include/trace/ftrace.h | 8 ++-- kernel/trace/trace_export.c | 4 +--- 5 files changed, 9 insertions(+), 19 deletions(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 404a686..cb2d023 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -260,7 +260,7 @@ struct ftrace_func_command { int (*func)(struct ftrace_hash *hash, char *func, char *cmd, char *params, int enable); -}; +} __designated_init; #ifdef CONFIG_DYNAMIC_FTRACE @@ -283,7 +283,7 @@ struct ftrace_probe_ops { unsigned long ip, struct ftrace_probe_ops *ops, void *data); -}; +} __designated_init; extern int register_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops, diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index cff3106..25af313 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -198,7 +198,7 @@ struct ftrace_event_class { struct list_head*(*get_fields)(struct ftrace_event_call *); struct list_headfields; int (*raw_init)(struct ftrace_event_call *); -}; +} __designated_init; extern int ftrace_event_reg(struct ftrace_event_call *event, enum trace_reg type, void *data); @@ -293,7 +293,7 @@ struct ftrace_event_call { int (*perf_perm)(struct ftrace_event_call *, struct perf_event *); #endif -}; +} __designated_init; static inline const char * ftrace_event_name(struct ftrace_event_call *call) diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index b0881a0..3002648 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -120,9 +120,7 @@ extern struct trace_event_functions exit_syscall_print_funcs; static struct ftrace_event_call __used \ event_enter_##sname = { \ .class = _class_syscall_enter, \ - { \ - .name = 
"sys_enter"#sname,\ - }, \ + .name = "sys_enter"#sname,\ .event.funcs= _syscall_print_funcs, \ .data = (void *)&__syscall_meta_##sname,\ .flags = TRACE_EVENT_FL_CAP_ANY, \ @@ -136,9 +134,7 @@ extern struct trace_event_functions exit_syscall_print_funcs; static struct ftrace_event_call __used \ event_exit_##sname = {\ .class = _class_syscall_exit,\ - { \ - .name = "sys_exit"#sname, \ - }, \ + .name = "sys_exit"#sname, \ .event.funcs= _syscall_print_funcs,\ .data = (void *)&__syscall_meta_##sname,\ .flags = TRACE_EVENT_FL_CAP_ANY, \ diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 26b4f2e..095aaca 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -699,9 +699,7 @@ static struct ftrace_event_class __used __refdata event_class_##call = { \ \ static struct ftrace_event_call __used event_##call = { \ .class = _class_##template, \ - { \ - .tp = &__tracepoint_##call, \ - }, \ + .tp = &__tracepoint_##call, \ .event.funcs= _event_type_funcs_##template, \ .print_fmt = print_fmt_##template, \ .flags = TRACE_EVENT_FL_TRACEPOINT,\ @@ -716,9 +714,7 @@ static const char print_fmt_##call[] = print; \ \ static