Re: [LEDE-DEV] DHCP via bridge in case of IPv4
Hello,

On Mon, 2016-07-11 at 06:15 +, Alexey Brodkin wrote:
> Hi Russell,
>
> On Sun, 2016-07-10 at 00:19 -0700, Russell Senior wrote:
> > "Alexey" == Alexey Brodkin writes:
> >
> > Alexey> Hi Aaron,
> > Alexey> On Sat, 2016-07-09 at 07:47 -0400, Aaron Z wrote:
> > > > On Sat, Jul 9, 2016 at 4:37 AM, Alexey Brodkin wrote:
> > > > > Hello,
> > > > >
> > > > > I was playing with a quite simple bridged setup on different boards
> > > > > with very recent kernels (4.6.3 as of this writing) and found one
> > > > > interesting behavior that I cannot yet understand, and googling
> > > > > didn't help here either.
> > > > >
> > > > > My setup is pretty simple:
> > > > >
> > > > >   HOST (with DHCP server) <-> (eth0) "Dumb AP" [br0] (wlan0) <-> Wireless client (attempting to get settings via DHCP)
> > > > >
> > > > > * HOST is my laptop with a DHCP server that works for sure.
> > > > > * "Dumb AP" is a separate board (I tried ARM-based Wandboard and
> > > > >   ARC-based AXS10x boards, but the results are exactly the same) with
> > > > >   wired (eth0) and wireless (wlan0) network controllers bridged
> > > > >   together (br0). That "br0" bridge flawlessly gets its settings
> > > > >   from the DHCP server on HOST.
> > > > > * Wireless client could be either a smartphone or another laptop etc.,
> > > > >   but what's important is that it should be configured to get network
> > > > >   settings by DHCP as well.
> > > > >
> > > > > So what happens: "br0" always gets network settings from the DHCP
> > > > > server on HOST. That's fine. But the wireless client only reliably
> > > > > gets settings from the DHCP server if IPv6 is enabled on the "Dumb AP"
> > > > > board. If IPv6 is disabled, I may see that the wireless client sends
> > > > > "DHCP Discover", then the server replies with "DHCP Offer", but that
> > > > > offer never reaches the wireless client.
> > > >
> > > > Do you have WDS enabled? If not, DHCP has issues in that scenario:
> > > > https://wiki.openwrt.org/doc/howto/clientmode
> >
> > If the Dumb AP's wireless interface is in ap-mode, then this shouldn't
> > be an issue. It's only client-mode interfaces that have trouble with
> > bridging.
> >
> > I'd suggest running tcpdump on the Dumb AP's wireless interface and the
> > client's wireless interface and see which of them sees the various parts
> > of the DHCP handshake.
>
> So I did, but for the DHCP server and the wireless client (I had no tcpdump
> on the Dumb AP at the moment).
>
> That's what I see on the server:
> ->8---
> No.  Time          Source     Destination      Protocol  Length  Info
> 3    0.151181000   0.0.0.0    255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 11   2.760796000   10.42.0.1  10.42.0.13       DHCP      342     DHCP Offer    - Transaction ID 0x31dc321f
> 14   5.220985000   0.0.0.0    255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 15   5.22115       10.42.0.1  10.42.0.13       DHCP      342     DHCP Offer    - Transaction ID 0x31dc321f
> 23   15.649835000  0.0.0.0    255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 24   15.650017000  10.42.0.1  10.42.0.13       DHCP      342     DHCP Offer    - Transaction ID 0x31dc321f
> 32   25.648589000  0.0.0.0    255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 33   25.648758000  10.42.0.1  10.42.0.13       DHCP      342     DHCP Offer    - Transaction ID 0x31dc321f
> 43   35.864567000  0.0.0.0    255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 48   38.832837000  10.42.0.1  10.42.0.13       DHCP      342     DHCP Offer    - Transaction ID 0x31dc321f
> ->8---
>
> That's on the wireless client:
> ->8---
> No.   Time           Source   Destination      Protocol  Length  Info
> 1171  94.192971000   0.0.0.0  255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 1182  99.263686000   0.0.0.0  255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 1185  109.692642000  0.0.0.0  255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 1186  119.691474000  0.0.0.0  255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> 1190  129.907507000  0.0.0.0  255.255.255.255  DHCP      342     DHCP Discover - Transaction ID 0x31dc321f
> ->8---
>
> I'll try to capture data from the Dumb AP sometime soon and will reply to
> the thread.

So finally, after quite some time, I figured out what happens in my setup. Basically it all boils down to the
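For reference when reading the captures: the Discover goes out as a broadcast to 255.255.255.255, but every Offer is sent unicast to 10.42.0.13. RFC 2131, section 4.1, defines how a server picks that destination, and a minimal C sketch of those rules follows. The enum and function names are hypothetical, addresses are taken in host byte order for simplicity, and DHCPNAK handling is omitted. The message above is cut off before the actual root cause, so this only illustrates why an Offer can be unicast; it does not claim that this is what went wrong.

```c
#include <stdint.h>

/* Leftmost bit of the 16-bit BOOTP/DHCP 'flags' field (RFC 2131, section 2). */
#define DHCP_FLAG_BROADCAST 0x8000u

enum dhcp_reply_dest {
	DHCP_DEST_RELAY,          /* 'giaddr' != 0: send to the relay agent */
	DHCP_DEST_CIADDR,         /* unicast to the client's current address */
	DHCP_DEST_BROADCAST,      /* 255.255.255.255 */
	DHCP_DEST_UNICAST_YIADDR, /* unicast to chaddr/yiaddr, e.g. 10.42.0.13 */
};

/*
 * Destination for DHCPOFFER/DHCPACK following RFC 2131 section 4.1
 * (hypothetical helper; DHCPNAK rules are not modeled).
 */
static enum dhcp_reply_dest dhcp_offer_dest(uint32_t giaddr, uint32_t ciaddr,
					    uint16_t flags)
{
	if (giaddr != 0)
		return DHCP_DEST_RELAY;
	if (ciaddr != 0)
		return DHCP_DEST_CIADDR;
	if (flags & DHCP_FLAG_BROADCAST)
		return DHCP_DEST_BROADCAST;
	return DHCP_DEST_UNICAST_YIADDR;
}
```

A client that leaves the broadcast bit clear therefore receives its Offer as a unicast frame addressed to its own MAC, so delivery depends on the bridge and the AP forwarding that unicast frame to the wireless side, rather than on flooding a broadcast.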
Re: [RFC][PATCHSET v2] allowing exports in *.S
On 16.8.2016 at 07:48, Michal Marek wrote:
> On 2.8.2016 at 16:01, Michal Marek wrote:
>> On 2016-02-03 22:19, Al Viro wrote:
>>> Shortlog:
>>> Al Viro (13):
>>>       [kbuild] handle exports in lib-y objects reliably
>>>       EXPORT_SYMBOL() for asm
>>>       x86: move exports to actual definitions
>>>       alpha: move exports to actual definitions
>>>       m68k: move exports to definitions
>>>       s390: move exports to definitions
>>>       arm: move exports to definitions
>>>       ppc: move exports to definitions
>>>       ppc: get rid of unreachable abs() implementation
>>>       sparc: move exports to definitions
>>>       [sparc] unify 32bit and 64bit string.h
>>>       sparc32: debride memcpy.S a bit
>>>       ia64: move exports to definitions
>>
>> After several pings by Al (sorry about that!), I got around to reviewing a
>> rebased version of this patchset at
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git asm-exports
>>
>> The kbuild commits are good, but since we are close to the end of the
>> merge window, I will apply them to my kbuild branch after 4.8-rc1.
>
> The rebased patchset is now in kbuild.git#kbuild. Before pushing, I
> noticed one issue: for some reason,
> drivers/firmware/efi/libstub/lib-ksyms.o is regenerated each time,
> leading to a relink of vmlinux. I'm looking into this.

OK, it's the

$(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
	$(call if_changed_rule,cc_o_c)

rule in the drivers/firmware/efi/libstub/Makefile file that conflicts with the lib-ksyms.o rule. I need to find a better solution to this hack.

Michal
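The name clash Michal describes is that lib-ksyms.o falls inside the libstub pattern `$(obj)/lib-%.o`, with stem `ksyms`. The C sketch below only models GNU make's single-`%` stem matching (the stem must be non-empty) to show that overlap. `pattern_stem()` is a hypothetical helper; it does not model make's directory handling or its implicit-rule search, where prerequisite existence and explicit rules also decide which rule wins, so it does not explain the exact mechanism behind the constant rebuild.

```c
#include <stddef.h>
#include <string.h>

/*
 * Match 'target' against a make-style pattern with one '%', e.g. "lib-%.o".
 * On a match, copy the stem (what '%' stands for) into 'stem' and return 1;
 * otherwise return 0. As in GNU make, the stem must be non-empty.
 */
static int pattern_stem(const char *pattern, const char *target,
			char *stem, size_t stem_size)
{
	const char *pct = strchr(pattern, '%');
	if (pct == NULL)
		return 0;

	size_t prefix_len = (size_t)(pct - pattern);
	const char *suffix = pct + 1;
	size_t suffix_len = strlen(suffix);
	size_t target_len = strlen(target);

	/* Need room for the prefix, at least one stem character and the suffix. */
	if (target_len <= prefix_len + suffix_len)
		return 0;
	if (strncmp(target, pattern, prefix_len) != 0)
		return 0;
	if (strcmp(target + target_len - suffix_len, suffix) != 0)
		return 0;

	size_t stem_len = target_len - prefix_len - suffix_len;
	if (stem_len >= stem_size)
		return 0;
	memcpy(stem, target + prefix_len, stem_len);
	stem[stem_len] = '\0';
	return 1;
}
```

With `$(obj)` expanded to drivers/firmware/efi/libstub, the target drivers/firmware/efi/libstub/lib-ksyms.o matches with stem `ksyms`, so the pattern rule would name $(srctree)/lib/ksyms.c as its prerequisite.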
Re: [PATCH v6 05/11] mm, compaction: add the ultimate direct compaction priority
On Wed, Aug 10, 2016 at 11:12:20AM +0200, Vlastimil Babka wrote:
> During reclaim/compaction loop, it's desirable to get a final answer from
> unsuccessful compaction so we can either fail the allocation or invoke the OOM
> killer. However, heuristics such as deferred compaction or pageblock skip bits
> can cause compaction to skip parts or whole zones and lead to premature OOM's,
> failures or excessive reclaim/compaction retries.
>
> To remedy this, we introduce a new direct compaction priority called
> COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:
>
> - ignore deferred compaction status for a zone
> - ignore pageblock skip hints
> - ignore cached scanner positions and scan the whole zone
>
> The new priority should get eventually picked up by should_compact_retry() and
> this should improve success rates for costly allocations using __GFP_REPEAT,
> such as hugetlbfs allocations, and reduce some corner-case OOM's for non-costly
> allocations.
>
> Signed-off-by: Vlastimil Babka
> Acked-by: Michal Hocko
> ---
>  include/linux/compaction.h | 3 ++-
>  mm/compaction.c            | 5 ++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index e88c037afe47..a1fba9994728 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -6,8 +6,9 @@
>   * Lower value means higher priority, analogically to reclaim priority.
>   */
>  enum compact_priority {
> +	COMPACT_PRIO_SYNC_FULL,
> +	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
>  	COMPACT_PRIO_SYNC_LIGHT,
> -	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
>  	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
>  	COMPACT_PRIO_ASYNC,
>  	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a144f58f7193..ae4f40afcca1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1644,6 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.alloc_flags = alloc_flags,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
> +		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
>  	};
>  	INIT_LIST_HEAD();
>  	INIT_LIST_HEAD();
> @@ -1689,7 +1691,8 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  								ac->nodemask) {
>  		enum compact_result status;
>  
> -		if (compaction_deferred(zone, order)) {
> +		if (prio > COMPACT_PRIO_SYNC_FULL
> +					&& compaction_deferred(zone, order)) {
>  			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>  			continue;

Could we provide prio to compaction_deferred() and make the decision in that function?

BTW, in kcompactd, compaction_deferred() is checked but .ignore_skip_hint=true. Is there any reason? If we can remove compaction_deferred() for kcompactd, we can check .ignore_skip_hint to determine whether defer is needed or not.

Thanks.
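The reviewer's first suggestion, moving the priority check into compaction_deferred() itself, can be sketched as a small userspace model. The enum is copied from the patch; `struct zone_model`, `compaction_deferred_prio()` and `compact_flags_for()` are hypothetical names, not kernel API, and the real compaction_deferred() also takes the allocation order. Because COMPACT_PRIO_SYNC_FULL is the minimum value, testing `prio == COMPACT_PRIO_SYNC_FULL` inside the helper is equivalent to the patch's `prio > COMPACT_PRIO_SYNC_FULL &&` guard at the call site.

```c
#include <stdbool.h>

/* Priorities as reordered by the patch: lower value = higher priority. */
enum compact_priority {
	COMPACT_PRIO_SYNC_FULL,
	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
	COMPACT_PRIO_SYNC_LIGHT,
	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
	COMPACT_PRIO_ASYNC,
	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};

/* Hypothetical stand-in for a zone's deferred-compaction state. */
struct zone_model {
	bool deferred; /* what compaction_deferred(zone, order) would report */
};

/* Hypothetical priority-aware variant: the ultimate priority ignores deferral. */
static bool compaction_deferred_prio(const struct zone_model *zone,
				     enum compact_priority prio)
{
	if (prio == COMPACT_PRIO_SYNC_FULL)
		return false;
	return zone->deferred;
}

/* Per-run flags derived from prio, mirroring the compact_zone_order() hunk. */
struct compact_flags {
	bool whole_zone;
	bool ignore_skip_hint;
};

static struct compact_flags compact_flags_for(enum compact_priority prio)
{
	struct compact_flags f = {
		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
	};
	return f;
}
```

Keeping the decision in one helper means a caller such as try_to_compact_pages() no longer needs to know which priorities bypass deferral.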
Re: [Query] increased latency observed in cpu hotplug path
On 8/5/2016 12:49 PM, Khan, Imran wrote:
> On 8/1/2016 2:58 PM, Khan, Imran wrote:
>> On 7/30/2016 7:54 AM, Akinobu Mita wrote:
>>> 2016-07-28 22:18 GMT+09:00 Khan, Imran :
>>>> Hi,
>>>>
>>>> Recently we have observed some increased latency in CPU hotplug events
>>>> in the CPU online path. For online latency we see that the block layer
>>>> is executing a notification handler for the CPU_UP_PREPARE event, and
>>>> this in turn waits for an RCU grace period, resulting (sometimes) in an
>>>> execution time of 15-20 ms for this notification handler. This change was
>>>> not there in the 3.18 kernel but is present in the 4.4 kernel, and was
>>>> introduced by the following commit:
>>>>
>>>> commit 5778322e67ed34dc9f391a4a5cbcbb856071ceba
>>>> Author: Akinobu Mita
>>>> Date:   Sun Sep 27 02:09:23 2015 +0900
>>>>
>>>>     blk-mq: avoid inserting requests before establishing new mapping
>>>
>>> ...
>>>
>>>> Upon reverting this commit I could see an improvement of 15-20 ms in
>>>> online latency. So I am looking for some help in analyzing the effects
>>>> of reverting this, or whether some other approach to reduce the online
>>>> latency should be taken.
>>>
>>> Can you observe the difference in online latency by removing the
>>> get_online_cpus() and put_online_cpus() pair in
>>> blk_mq_init_allocated_queue() instead of fully reverting the commit?
>>>
>> Hi Akinobu,
>> I tried your suggestion but could not achieve any improvement. Actually, the
>> snippet that is causing the change in latency is the following one:
>>
>> 	list_for_each_entry(q, _q_list, all_q_node) {
>> 		blk_mq_freeze_queue_wait(q);
>>
>> 		/*
>> 		 * timeout handler can't touch hw queue during the
>> 		 * reinitialization
>> 		 */
>> 		del_timer_sync(>timeout);
>> 	}
>>
>> I understand that this is now getting executed for CPU_UP_PREPARE as well,
>> resulting in increased latency in the CPU online path. I am trying to reduce
>> this latency while keeping the purpose of this commit intact. I would welcome
>> further suggestions/feedback in this regard.
>>
> Hi Akinobu,
>
> I am not able to reduce the CPU online latency with this patch. Could you
> please let me know what functionality will be broken if we avoid this patch
> in our kernel? Also, if you have some other suggestions towards improving
> this patch, please let me know.
>

After moving the remapping of queues to the block layer's kworker, I see that online latency has improved while offline latency remains the same. As the freezing of queues happens in the context of the block layer's worker, I think it would be better to do the remapping in the same context and then go ahead with freezing. In this regard I have made the following change:

commit b2131b86eeef4c5b1f8adaf7a53606301aa6b624
Author: Imran Khan
Date:   Fri Aug 12 19:59:47 2016 +0530

    blk-mq: Move block queue remapping from cpu hotplug path

    During a cpu hotplug, the hardware and software contexts mappings
    need to be updated in order to take into account requests
    submitted for the hotadded CPU. But if this mapping is done
    in hotplug notifier, it deteriorates the hotplug latency.
    So move the block queue remapping to block layer worker which
    results in significant improvements in hotplug latency.

    Change-Id: I01ac83178ce95c3a4e3b7b1b286eda65ff34e8c4
    Signed-off-by: Imran Khan

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d6f8fe..06fcf89 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -22,7 +22,11 @@
 #include
 #include
 #include
-
+#include
+#include
+#include
+#include
+#include
 #include
 #include

@@ -32,10 +36,18 @@ static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
-
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
 /*
+ * New online cpumask which is going to be set in this hotplug event.
+ * Declare this cpumasks as global as cpu-hotplug operation is invoked
+ * one-by-one and dynamically allocating this could result in a failure.
+ */
+static struct cpumask online_new;
+
+static struct work_struct blk_mq_remap_work;
+
+/*
  * Check if any of the ctx's have pending work in this hardware queue
  */
 static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
@@ -2125,14 +2137,7 @@ static void blk_mq_queue_reinit(struct request_queue *q,
 static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
 				      unsigned long action, void *hcpu)
 {
-	struct request_queue *q;
 	int cpu = (unsigned long)hcpu;
-	/*
-	 * New online cpumask which is going to be set in this hotplug event.
-	 * Declare this cpumasks as global as cpu-hotplug operation is invoked
-	 * one-by-one and dynamically allocating this could result in a failure.
-	 */
-	static struct cpumask online_new;

 	/*
 	 * Before hotadded cpu starts handling requests, new mappings must
@@ -2155,43 +2160,17 @@ static int
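The split the patch makes (a cheap notifier that records the new mask, with the expensive remap done later by a worker) can be modeled in a single-threaded C sketch. This is an assumption-heavy illustration, not the patch's code: a `uint64_t` bitmask stands in for `struct cpumask`, a plain function call to the hypothetical `remap_work_run()` stands in for the kworker, and no locking or concurrency is modeled. It mirrors one kernel fact: `schedule_work()` returns false and queues nothing new if the work item is already pending, so several hotplug events before the worker runs collapse into one remap using the latest mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state shared by notifier and worker. */
struct remap_state {
	uint64_t online_new;  /* latest mask recorded by the notifier */
	uint64_t mapped_mask; /* mask the queues are currently mapped for */
	bool work_pending;    /* models a queued work_struct */
	unsigned int remaps;  /* how many times the expensive remap ran */
};

/*
 * Notifier side: cheap. Records the mask and "queues" the work; returns
 * false if the work was already pending (events coalesce).
 */
static bool hotplug_notify(struct remap_state *s, uint64_t new_online)
{
	s->online_new = new_online;
	if (s->work_pending)
		return false;
	s->work_pending = true;
	return true;
}

/* Worker side: freezing, remapping and unfreezing would happen here. */
static void remap_work_run(struct remap_state *s)
{
	if (!s->work_pending)
		return;
	s->work_pending = false;
	s->mapped_mask = s->online_new;
	s->remaps++;
}
```

One trade-off worth checking: between the notifier returning and the worker running, the hotadded CPU is online while the queues are still mapped for the old mask, which looks like the situation commit 5778322e ("avoid inserting requests before establishing new mapping") was meant to prevent. Whether that window is safe is not settled in this message.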
Re: [PATCH] Map in physical addresses in efi_map_region_fixed
On Mon, Aug 15, 2016 at 01:47:31PM -0500, Alex Thorlton wrote:
> The only thing we're adding here is the physical mappings, to match
> what is available in the primary kernel.

I can see what it does; I'm just questioning the reasoning for it, as we went to all that effort so that kexec can have stable virtual mappings.

I guess we still need a way to pass the virtual mappings to kexec, as they're immutable: some "smartass" decided to allow calling SetVirtualAddressMap only once.

> This is sort of a hand-wavey answer - I will investigate this further...

Yeah, it'll be interesting to know whether that is an issue, because if we do the 1:1 mappings in the kexec kernel too and there's an address conflict, then we'd better know upfront.

> It's not that we need it all of a sudden, necessarily, it's just that
> we've had to make other changes to make things work with the new,
> (almost) completely isolated, EFI page tables. We ended up choosing the
> lesser of two evils, and have decided to temporarily rely on the
> physical address of our runtime code, instead of continuing to rely on
> EFI_OLD_MEMMAP.

Well, if it starts to cause trouble, you'll probably have to revert.

> If there are strong objections to this change, I won't pursue it
> further.

I don't really care all that much as long as it doesn't break the existing situation. I've long given up on the hope that EFI and all its incarnations will adhere to some spec... :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
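Boris's concern, whether adding 1:1 mappings in the kexec kernel could collide with the existing virtual mappings, amounts to an interval-overlap check. The C sketch below is hypothetical (it is not how the kernel builds its EFI page tables): `struct efi_region` is a reduced descriptor assumed to carry only a physical address, a virtual address and a count of 4 KiB UEFI pages. A conflict exists when one region's identity range overlaps another region's virtual range and that other region translates differently (PA != VA); overflow of start + length is not checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12 /* UEFI memory map pages are 4 KiB */

/* Reduced memory descriptor: only the fields this check needs (assumption). */
struct efi_region {
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
};

static bool ranges_overlap(uint64_t a_start, uint64_t a_len,
			   uint64_t b_start, uint64_t b_len)
{
	return a_start < b_start + b_len && b_start < a_start + a_len;
}

/*
 * Find a region i whose 1:1 (VA == PA) range overlaps the virtual range of
 * a region j that maps those addresses elsewhere. Reports the first pair.
 */
static bool find_identity_conflict(const struct efi_region *r, size_t n,
				   size_t *phys_idx, size_t *virt_idx)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t len_i = r[i].num_pages << EFI_PAGE_SHIFT;

		for (size_t j = 0; j < n; j++) {
			uint64_t len_j = r[j].num_pages << EFI_PAGE_SHIFT;

			if (r[j].phys_addr == r[j].virt_addr)
				continue; /* j is itself 1:1: both mappings agree */
			if (ranges_overlap(r[i].phys_addr, len_i,
					   r[j].virt_addr, len_j)) {
				*phys_idx = i;
				*virt_idx = j;
				return true;
			}
		}
	}
	return false;
}
```

Running such a check over the memory map before installing the extra mappings would surface a conflict "upfront", as the reply asks, instead of at the first runtime service call.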
Re: [PATCH 1/2] ARM: dts: imx7d: move CPU operating points to imx7d.dtsi
On Thu, Aug 11, 2016 at 05:11:06PM -0700, Stefan Agner wrote:
> Only i.MX 7Dual SoC supports CPU frequencies of up to 1GHz. The i.MX
> 7Solo can run with up to 800MHz and usually does so without making use
> of DVFS. While the device tree clearly specified a too fast operating
> point for i.MX 7Solo, the kernel did not use it in practice so far
> because the CPUfreq driver does not get loaded on i.MX 7Solo devices
> (since the fsl,imx7s compatible string is not in the list of devices
> making use of the cpufreq-dt driver...).
>
> Signed-off-by: Stefan Agner
> ---
> Hi Shawn,
>
> This is based on my earlier patchset:
> ARM: dts: imx7d: move ARM platform peripherals inside soc
>
> These are kind of fixes too, so if possible I would like to see them
> in v4.8, what do you think?

Patch "ARM: dts: imx7d: move ARM platform peripherals inside soc node" is not really a fix, and the diffstat looks too dramatic for -rc material, so I queued it as a -next patch, and any patch based on it will have to go through -next as well.

Applied for -next, thanks.

Shawn

>
> --
> Stefan
>
>  arch/arm/boot/dts/imx7d.dtsi | 8 ++++++++
>  arch/arm/boot/dts/imx7s.dtsi | 5 -----
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
> index 3d77d95..d0b199c 100644
> --- a/arch/arm/boot/dts/imx7d.dtsi
> +++ b/arch/arm/boot/dts/imx7d.dtsi
> @@ -45,6 +45,14 @@
>  
>  / {
>  	cpus {
> +		cpu0: cpu@0 {
> +			operating-points = <
> +				/* KHz	uV */
> +				996000	1075000
> +				792000	975000
> +			>;
> +		};
> +
>  		cpu1: cpu@1 {
>  			compatible = "arm,cortex-a7";
>  			device_type = "cpu";
> diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
> index c63591c..5132e2f 100644
> --- a/arch/arm/boot/dts/imx7s.dtsi
> +++ b/arch/arm/boot/dts/imx7s.dtsi
> @@ -85,11 +85,6 @@
>  			compatible = "arm,cortex-a7";
>  			device_type = "cpu";
>  			reg = <0>;
> -			operating-points = <
> -				/* KHz	uV */
> -				996000	1075000
> -				792000	975000
> -			>;
>  			clock-latency = <61036>; /* two CLK32 periods */
>  			clocks = < IMX7D_CLK_ARM>;
>  		};
> -- 
> 2.9.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
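The legacy `operating-points` property is a list of `<frequency-kHz voltage-uV>` pairs. To see why the 996000 kHz entry is out of range for an i.MX 7Solo limited to 800 MHz (per the commit message), the C sketch below filters the patch's table against a SoC limit. `struct opp` and `fastest_opp_within()` are hypothetical names for illustration; this is not how cpufreq-dt or the kernel's OPP library selects frequencies.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a legacy "operating-points" table: <frequency-kHz voltage-uV>. */
struct opp {
	uint32_t freq_khz;
	uint32_t volt_uv;
};

/* Values from the i.MX 7 table in the patch. */
static const struct opp imx7_opps[] = {
	{ 996000, 1075000 },
	{ 792000,  975000 },
};

/* Index of the fastest OPP not exceeding max_khz, or -1 if none fits. */
static int fastest_opp_within(const struct opp *t, size_t n, uint32_t max_khz)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (t[i].freq_khz > max_khz)
			continue;
		if (best < 0 || t[i].freq_khz > t[(size_t)best].freq_khz)
			best = (int)i;
	}
	return best;
}
```

With an 800000 kHz limit only the 792000 kHz / 975000 uV entry survives, which matches moving the 996000 kHz point out of the shared imx7s.dtsi and into imx7d.dtsi.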
Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands
On Mon, Aug 15, 2016 at 11:00 PM, Damien Le Moal wrote:
>
> Shaun,
>
>> On Aug 14, 2016, at 09:09, Shaun Tancheff wrote:
> […]
>>> No, surely not.
>>> But one of the _big_ advantages for the RB tree is blkdev_discard().
>>> Without the RB tree any mkfs program will issue a 'discard' for every
>>> sector. We will be able to coalesce those into one discard per zone, but
>>> we still need to issue one for _every_ zone.
>>
>> How can you make coalescing work transparently in the
>> sd layer _without_ keeping some sort of a discard cache along
>> with the zone cache?
>>
>> Currently the block layer's blkdev_issue_discard() is breaking
>> large discards into nice granular and aligned chunks but it is
>> not preventing small discards nor coalescing them.
>>
>> In the sd layer would there be a way to persist or purge an
>> overly large discard cache? What about honoring
>> discard_zeroes_data? Once the discard is completed with
>> discard_zeroes_data you have to return zeroes whenever
>> a discarded sector is read. Isn't that a lot more than just
>> tracking a write pointer? Couldn't a zone have dozens of holes?
>
> My understanding of the standards regarding discard is that it is not
> mandatory and that it is a hint to the drive. The drive can completely
> ignore it if it thinks that is a better choice. I may be wrong on this
> though. Need to check again.

But you are currently setting discard_zeroes_data=1 in your current
patches. I believe that setting discard_zeroes_data=1 effectively
promotes discards to being mandatory.

I have a follow-on patch to my SCT Write Same series that handles the
CMR zone case in the sd_zbc_setup_discard() handler.

> For reset write pointer, the mapping to discard requires that the calls
> to blkdev_issue_discard be zone aligned for anything to happen. Specify
> less than a zone and nothing will be done. This I think preserves the
> discard semantic.

Oh. If that is the intent then there is just a bug in the handler.
I have pointed out where I believe it to be in my response to the
zone cache patch being posted.

> As for the “discard_zeroes_data” thing, I also think that is a drive
> feature, not mandatory. Drives may have it or not, which is consistent
> with the ZBC/ZAC standards regarding reading after write pointer (nothing
> says that zeros have to be returned). In any case, discard of CMR zones
> will be a nop, so for SMR drives, discard_zeroes_data=0 may be a better
> choice.

However I am still curious about discards being coalesced.

>>> Which is (as indicated) really slow, and easily takes several minutes.
>>> With the RB tree we can short-circuit discards to empty zones, and speed
>>> up processing time dramatically.
>>> Sure we could be moving the logic into mkfs and friends, but that would
>>> require us to change the programs and agree on a library (libzbc?) which
>>> should be handling that.
>>
>> F2FS's mkfs.f2fs is already reading the zone topology via SG_IO ...
>> so I'm not sure your argument is valid here.
>
> This initial SMR support patch is just that: a first try. Jaegeuk
> used SG_IO (in fact copy-paste of parts of libzbc) because the current
> ZBC patch-set has no ioctl API for zone information manipulation. We
> will fix this mkfs.f2fs once we agree on an ioctl interface.

Which again is my point. If mkfs.f2fs wants to speed up its discard
pass by _not_ sending unnecessary Reset WP commands for zones that are
already empty, it has all the information it needs to do so.

Here it seems to me that the zone cache is _at_best_ doing double work.
At worst the zone cache could be doing the wrong thing _if_ the zone
cache got out of sync. It is certainly possible (however unlikely) that
someone is doing some raw sg activity that is not seen by the sd path.

All I am trying to do is have a discussion about the reasons for and
against having a zone cache: where it works and where it breaks. This
should be entirely technical, but I understand that we have all spent a
lot of time _not_ discussing this for various non-technical reasons.

So far the only reason I've been able to ascertain is that Host Managed
drives really don't like being stuck with the URSWRZ and would like to
have a software hack to return MUD rather than ship drives with some
weird out-of-the-box config where the last zone is marked as FINISH'd,
thereby returning MUD on reads as per spec.

I understand that it would be a strange state to see on first boot, and
likely people would just do a ResetWP and have weird boot errors, which
would probably just make matters worse.

I just would rather the workaround be a bit cleaner and/or use less
memory. I would also like a path available that does not require SD_ZBC
or BLK_ZONED for Host Aware drives to work, hence this set of patches
and me begging for a single bit in struct bio.

>>
>> [..]
>>
> 3) Try to condense the blkzone data structure to save memory:
> I think that we can at the very least remove the zone length, and also
> may be
Re: [RFC][PATCHSET v2] allowing exports in *.S
On 2.8.2016 at 16:01, Michal Marek wrote:
> On 2016-02-03 22:19, Al Viro wrote:
>> Shortlog:
>> Al Viro (13):
>>       [kbuild] handle exports in lib-y objects reliably
>>       EXPORT_SYMBOL() for asm
>>       x86: move exports to actual definitions
>>       alpha: move exports to actual definitions
>>       m68k: move exports to definitions
>>       s390: move exports to definitions
>>       arm: move exports to definitions
>>       ppc: move exports to definitions
>>       ppc: get rid of unreachable abs() implementation
>>       sparc: move exports to definitions
>>       [sparc] unify 32bit and 64bit string.h
>>       sparc32: debride memcpy.S a bit
>>       ia64: move exports to definitions
>
> After several pings by Al (sorry about that!), I got around to reviewing
> a rebased version of this patchset at
>
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git asm-exports
>
> The kbuild commits are good, but since we are close to the end of the
> merge window, I will apply them to my kbuild branch after 4.8-rc1.

The rebased patchset is now in kbuild.git#kbuild. Before pushing, I
noticed one issue: for some reason,
drivers/firmware/efi/libstub/lib-ksyms.o is regenerated each time,
leading to a relink of vmlinux. I'm looking into this.

Michal
Re: [PATCH] KEYS: fix big_key dependency
On Tuesday, 16 August 2016, 00:45:39 CEST, Kirill Marinushkin wrote:

Hi Kirill,

> +	select CRYPTO_ANSI_CPRNG

This change enables an RNG that will no longer pass FIPS testing. Hence,
this selection could cause an issue in FIPS mode (i.e. booting the
kernel with fips=1).

May I suggest CRYPTO_DRBG?

Ciao
Stephan
Re: [PATCH v6 0/5] /dev/random - a new approach
On Monday, 15 August 2016, 13:42:54 CEST, H. Peter Anvin wrote:

Hi H,

> On 08/11/16 05:24, Stephan Mueller wrote:
> > * prevent fast noise sources from dominating slow noise sources
> >
> >   in case of /dev/random
>
> Can someone please explain if and why this is actually desirable, and if
> this assessment has been passed to someone who has actual experience
> with cryptography at the professional level?

There are two motivations for that:

- the current /dev/random is compliant with NTG.1 from AIS 20/31, which
  requires (in brief) that entropy comes from auditable noise sources.
  Currently, in my LRNG, RDRAND is the only fast noise source that is not
  auditable (and it is designed to cause a VM exit, making it even harder
  to assess). To make the LRNG comply with NTG.1, RDRAND can provide
  entropy but must not become the sole entropy provider, which is what
  this change ensures.

- the current /dev/random implementation follows the same concept, with
  the exception of 3.15 and 3.16, where RDRAND was not rate-limited. In
  later versions, this was changed.

Ciao
Stephan
Re: [PATCH] Map in physical addresses in efi_map_region_fixed
On Mon, Aug 15, 2016 at 02:52:22PM -0700, H. Peter Anvin wrote:
> So to answer the implicit question: we have found UEFI stacks in the
> field which fail without the physical mappings present, and we have
> found stacks which fail without a nontrivial SetAddressMapping.

You mean SetVirtualAddressMap.

Oh well, it's not like it matters all that much as we have our own
pagetable for EFI so we can go nuts there. Apparently.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
Re: [PATCH] perf/core: Fix the mask in perf_output_sample_regs
On Thursday 11 August 2016 05:57 PM, Peter Zijlstra wrote:
> Sorry, found it in my inbox while clearing out backlog..
>
> On Sun, Jul 03, 2016 at 11:31:58PM +0530, Madhavan Srinivasan wrote:
>> When decoding the perf_regs mask in perf_output_sample_regs(),
>> we loop through the mask using find_first_bit and find_next_bit functions.
>> While the existing code works fine in most cases,
>> the logic is broken for 32bit kernel (Big Endian).
>> When reading u64 mask using (u32 *)()[0],
>> find_*_bit() assumes it gets lower 32bits of u64
>> but instead gets upper 32bits which is wrong.
>> Proposed fix is to swap the words of the u64 to handle this case.
>> This is _not_ endianness swap.
> But it looks an awful lot like it..
>
>> Hit this issue when testing my perf_arch_regs patchset.

Yep, that is exactly the reason for adding that comment in the commit
message.

>> +++ b/kernel/events/core.c
>> @@ -5205,8 +5205,10 @@ perf_output_sample_regs(struct perf_output_handle *handle,
>>  			struct pt_regs *regs, u64 mask)
>>  {
>>  	int bit;
>> +	DECLARE_BITMAP(_mask, 64);
>>
>> -	for_each_set_bit(bit, (const unsigned long *) &mask,
>> +	bitmap_from_u64(_mask, mask);
>> +	for_each_set_bit(bit, _mask,
>>  			 sizeof(mask) * BITS_PER_BYTE) {
>>  		u64 val;
>>
>> +++ b/lib/bitmap.c
>> +void bitmap_from_u64(unsigned long *dst, u64 mask)
>> +{
>> +	dst[0] = mask & ULONG_MAX;
>> +
>> +	if (sizeof(mask) > sizeof(unsigned long))
>> +		dst[1] = mask >> 32;
>> +}
>> +EXPORT_SYMBOL(bitmap_from_u64);
>
> Looks small enough for an inline.
>
> Alternatively you can go all the way and add bitmap_from_u64array(), but
> that seems massive overkill.

Ok, will make it inline and resend.

Maddy

>
> Tedious stuff.. I can't come up with anything prettier :/
[PATCH v2 8/8] power: ds2760_battery: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces the
deprecated create_singlethread_workqueue(). This is the identity
conversion.

The workqueue "monitor_wqueue" is used to monitor the battery status.
It has been identity converted.

It queues multiple work items, viz. &di->monitor_work and
&di->set_charged_work, which require execution ordering. Hence,
alloc_ordered_workqueue() has been used to replace the deprecated
create_singlethread_workqueue() instance.

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/ds2760_battery.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/power/ds2760_battery.c b/drivers/power/ds2760_battery.c
index 80f73cc..ac92e80 100644
--- a/drivers/power/ds2760_battery.c
+++ b/drivers/power/ds2760_battery.c
@@ -566,7 +566,8 @@ static int ds2760_battery_probe(struct platform_device *pdev)
 	INIT_DELAYED_WORK(&di->monitor_work, ds2760_battery_work);
 	INIT_DELAYED_WORK(&di->set_charged_work,
 			  ds2760_battery_set_charged_work);
-	di->monitor_wqueue = create_singlethread_workqueue(dev_name(&pdev->dev));
+	di->monitor_wqueue = alloc_ordered_workqueue(dev_name(&pdev->dev),
+						     WQ_MEM_RECLAIM);
 	if (!di->monitor_wqueue) {
 		retval = -ESRCH;
 		goto workqueue_failed;
-- 
2.1.4
[PATCH v2 5/8] power: ab8500_charger: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces the
deprecated create_singlethread_workqueue(). This is the identity
conversion.

The workqueue "charger_wq" is used for the IRQs and for checking the HW
state of the charger. It has been identity converted.

It has multiple work items, viz. usb_charger_attached_work,
kick_wd_work, check_vbat_work, check_hw_failure_work, ac_work,
ac_charger_attached_work, attach_work and check_usbchgnotok_work, which
require execution ordering. Hence, a dedicated ordered workqueue has
been used here.

The WQ_MEM_RECLAIM flag has also been set to ensure forward progress
under memory pressure.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/ab8500_charger.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
index 30de5d4..5cee9aa 100644
--- a/drivers/power/ab8500_charger.c
+++ b/drivers/power/ab8500_charger.c
@@ -3540,8 +3540,8 @@ static int ab8500_charger_probe(struct platform_device *pdev)
 	di->usb_state.usb_current = -1;
 
 	/* Create a work queue for the charger */
-	di->charger_wq =
-		create_singlethread_workqueue("ab8500_charger_wq");
+	di->charger_wq = alloc_ordered_workqueue("ab8500_charger_wq",
+						 WQ_MEM_RECLAIM);
 	if (di->charger_wq == NULL) {
 		dev_err(di->dev, "failed to create work queue\n");
 		return -ENOMEM;
-- 
2.1.4
[PATCH v2 7/8] power: ab8500_fg: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces the
deprecated create_singlethread_workqueue(). This is the identity
conversion.

The workqueue "fg_wq" is used for running the FG algorithm
periodically. It has been identity converted.

It has multiple work items, viz. fg_periodic_work, fg_low_bat_work,
fg_reinit_work, fg_work, fg_acc_cur_work and fg_check_hw_failure_work,
which require execution ordering. Hence, a dedicated ordered workqueue
has been used here.

The WQ_MEM_RECLAIM flag has been set to guarantee forward progress under
memory pressure.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/ab8500_fg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/ab8500_fg.c b/drivers/power/ab8500_fg.c
index 5a36cf8..199f2db 100644
--- a/drivers/power/ab8500_fg.c
+++ b/drivers/power/ab8500_fg.c
@@ -3096,7 +3096,7 @@ static int ab8500_fg_probe(struct platform_device *pdev)
 	ab8500_fg_discharge_state_to(di, AB8500_FG_DISCHARGE_INIT);
 
 	/* Create a work queue for running the FG algorithm */
-	di->fg_wq = create_singlethread_workqueue("ab8500_fg_wq");
+	di->fg_wq = alloc_ordered_workqueue("ab8500_fg_wq", WQ_MEM_RECLAIM);
 	if (di->fg_wq == NULL) {
 		dev_err(di->dev, "failed to create work queue\n");
 		return -ENOMEM;
-- 
2.1.4
[PATCH v2 6/8] power: ipaq_micro_battery: Remove deprecated create_singlethread_workqueue
The workqueue "wq" is used for handling battery-related tasks. It has a
single work item, viz. &mb->update, and hence it doesn't require
execution ordering. Hence, alloc_workqueue() has been used to replace
the deprecated create_singlethread_workqueue() instance.

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure.

Since there is a single work item, an explicit concurrency limit is
unnecessary here.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/ipaq_micro_battery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/ipaq_micro_battery.c b/drivers/power/ipaq_micro_battery.c
index 35b01c7..4af7b77 100644
--- a/drivers/power/ipaq_micro_battery.c
+++ b/drivers/power/ipaq_micro_battery.c
@@ -235,7 +235,7 @@ static int micro_batt_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	mb->micro = dev_get_drvdata(pdev->dev.parent);
-	mb->wq = create_singlethread_workqueue("ipaq-battery-wq");
+	mb->wq = alloc_workqueue("ipaq-battery-wq", WQ_MEM_RECLAIM, 0);
 	if (!mb->wq)
 		return -ENOMEM;
 
-- 
2.1.4
[PATCH v2 4/8] power: intel_mid_battery: Remove deprecated create_singlethread_workqueue
The workqueue "monitor_wqueue" is used to monitor the PMIC battery status.
It queues a single work item (pbi->monitor_battery) and hence doesn't
require ordering. Hence, alloc_workqueue has been used to replace the
deprecated create_singlethread_workqueue instance.

Since PMIC battery status needs to be monitored for any change, the
WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory
pressure.

Since there is a single work item, an explicit concurrency limit is
unnecessary here.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/intel_mid_battery.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/power/intel_mid_battery.c b/drivers/power/intel_mid_battery.c
index 9fa4acc..dc7feef 100644
--- a/drivers/power/intel_mid_battery.c
+++ b/drivers/power/intel_mid_battery.c
@@ -689,8 +689,7 @@ static int probe(int irq, struct device *dev)
 	/* initialize all required framework before enabling interrupts */
 	INIT_WORK(&pbi->handler, pmic_battery_handle_intrpt);
 	INIT_DELAYED_WORK(&pbi->monitor_battery, pmic_battery_monitor);
-	pbi->monitor_wqueue =
-		create_singlethread_workqueue(dev_name(dev));
+	pbi->monitor_wqueue = alloc_workqueue(dev_name(dev), WQ_MEM_RECLAIM, 0);
 	if (!pbi->monitor_wqueue) {
 		dev_err(dev, "%s(): wqueue init failed\n", __func__);
 		retval = -ESRCH;
-- 
2.1.4
[PATCH v2 3/8] power: pm2301_charger: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces deprecated
create_singlethread_workqueue(). This is the identity conversion.

The workqueue "charger_wq" is used for running all the charger related
tasks. This involves charger detection, checking for HW failure and HW
status. This workqueue has been identity converted.

It queues multiple workitems viz &pm2->check_main_thermal_prot_work,
&pm2->check_hw_failure_work, &pm2->ac_work. Hence, the deprecated
create_singlethread_workqueue() instance has been replaced with a
dedicated ordered workqueue.

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/pm2301_charger.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/power/pm2301_charger.c b/drivers/power/pm2301_charger.c
index fb62ed3..78561b6 100644
--- a/drivers/power/pm2301_charger.c
+++ b/drivers/power/pm2301_charger.c
@@ -1054,7 +1054,8 @@ static int pm2xxx_wall_charger_probe(struct i2c_client *i2c_client,
 	pm2->ac_chg.external = true;
 
 	/* Create a work queue for the charger */
-	pm2->charger_wq = create_singlethread_workqueue("pm2xxx_charger_wq");
+	pm2->charger_wq = alloc_ordered_workqueue("pm2xxx_charger_wq",
+						  WQ_MEM_RECLAIM);
 	if (pm2->charger_wq == NULL) {
 		ret = -ENOMEM;
 		dev_err(pm2->dev, "failed to create work queue\n");
-- 
2.1.4
[PATCH v2 2/8] power: ab8500_btemp: Remove deprecated create_singlethread_workqueue
The workqueue "btemp_wq" is used for measuring the temperature
periodically. It queues a single workitem (btemp_periodic_work) and hence
doesn't require ordering. Thus, the deprecated
create_singlethread_workqueue() instance has been replaced with
alloc_workqueue().

The WQ_MEM_RECLAIM flag has been set to ensure forward progress under
memory pressure.

Since there is a single work item, an explicit concurrency limit is
unnecessary here.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/ab8500_btemp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/ab8500_btemp.c b/drivers/power/ab8500_btemp.c
index bf2e5dd..6ffdc18 100644
--- a/drivers/power/ab8500_btemp.c
+++ b/drivers/power/ab8500_btemp.c
@@ -1095,7 +1095,7 @@ static int ab8500_btemp_probe(struct platform_device *pdev)
 
 	/* Create a work queue for the btemp */
 	di->btemp_wq =
-		create_singlethread_workqueue("ab8500_btemp_wq");
+		alloc_workqueue("ab8500_btemp_wq", WQ_MEM_RECLAIM, 0);
 	if (di->btemp_wq == NULL) {
 		dev_err(di->dev, "failed to create work queue\n");
 		return -ENOMEM;
-- 
2.1.4
[PATCH v2 1/8] power: abx500_chargalg: Remove deprecated create_singlethread_workqueue
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces deprecated
create_singlethread_workqueue(). This is the identity conversion.

The workqueue "chargalg_wq" is used for running the charging algorithm.
It has multiple workitems viz &di->chargalg_periodic_work,
&di->chargalg_wd_work, &di->chargalg_work per abx500_chargalg, which
require ordering. It has been identity converted.

Also, WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.

Signed-off-by: Bhaktipriya Shridhar
---
 drivers/power/abx500_chargalg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
index d9104b1..a4411d6 100644
--- a/drivers/power/abx500_chargalg.c
+++ b/drivers/power/abx500_chargalg.c
@@ -2091,8 +2091,8 @@ static int abx500_chargalg_probe(struct platform_device *pdev)
 		abx500_chargalg_maintenance_timer_expired;
 
 	/* Create a work queue for the chargalg */
-	di->chargalg_wq =
-		create_singlethread_workqueue("abx500_chargalg_wq");
+	di->chargalg_wq = alloc_ordered_workqueue("abx500_chargalg_wq",
+						  WQ_MEM_RECLAIM);
 	if (di->chargalg_wq == NULL) {
 		dev_err(di->dev, "failed to create work queue\n");
 		return -ENOMEM;
-- 
2.1.4
[PATCH v2 0/8] power: Remove deprecated create_singlethread_workqueue
This patch set removes the instances of deprecated
create_singlethread_workqueues in drivers/power by making the appropriate
conversions.

Bhaktipriya Shridhar (8):
  power: abx500_chargalg: Remove deprecated create_singlethread_workqueue
  power: ab8500_btemp: Remove deprecated create_singlethread_workqueue
  power: pm2301_charger: Remove deprecated create_singlethread_workqueue
  power: intel_mid_battery: Remove deprecated create_singlethread_workqueue
  power: ab8500_charger: Remove deprecated create_singlethread_workqueue
  power: ipaq_micro_battery: Remove deprecated create_singlethread_workqueue
  power: ab8500_fg: Remove deprecated create_singlethread_workqueue
  power: ds2760_battery: Remove deprecated create_singlethread_workqueue

 drivers/power/ab8500_btemp.c       | 2 +-
 drivers/power/ab8500_charger.c     | 4 ++--
 drivers/power/ab8500_fg.c          | 2 +-
 drivers/power/abx500_chargalg.c    | 4 ++--
 drivers/power/ds2760_battery.c     | 3 ++-
 drivers/power/intel_mid_battery.c  | 3 +--
 drivers/power/ipaq_micro_battery.c | 2 +-
 drivers/power/pm2301_charger.c     | 3 ++-
 8 files changed, 12 insertions(+), 11 deletions(-)

-- 
2.1.4
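The "appropriate conversions" in the individual patches follow one rule: a queue with several work items that must not run out of order gets the identity conversion to an ordered workqueue, while a queue with a single work item gets a plain `alloc_workqueue()` with the default max_active of 0; both keep WQ_MEM_RECLAIM. The sketch below encodes that rule for the six drivers whose patch descriptions give their work-item counts (ab8500_charger and ds2760_battery are left out because their descriptions are not reproduced here). `replacement_for()`, `struct conversion` and `is_ordered()` are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the replacement call the series' rule selects. */
static const char *replacement_for(size_t n_work_items, bool needs_ordering)
{
	assert(n_work_items > 0);
	if (n_work_items > 1 && needs_ordering)
		/* identity conversion: one item in flight, FIFO order */
		return "alloc_ordered_workqueue(name, WQ_MEM_RECLAIM)";
	/* single item: no ordering needed; max_active 0 means default */
	return "alloc_workqueue(name, WQ_MEM_RECLAIM, 0)";
}

/* Work-item counts as stated in the individual patch descriptions. */
struct conversion {
	const char *driver;
	size_t n_work_items;
	bool needs_ordering;
};

static const struct conversion series[] = {
	{ "ab8500_fg",          6, true  },
	{ "pm2301_charger",     3, true  },
	{ "abx500_chargalg",    3, true  },
	{ "ab8500_btemp",       1, false },
	{ "intel_mid_battery",  1, false },
	{ "ipaq_micro_battery", 1, false },
};

static bool is_ordered(const char *call)
{
	return strstr(call, "ordered") != NULL;
}
```

For every entry in `series`, the rule yields an ordered workqueue exactly for the multi-item drivers, matching the diffs in the individual patches.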
Re: [PATCH v1 3/3] PM / AVS: rockchip-cpu-avs: add driver handling Rockchip cpu avs
Hi Finley,

[auto build test ERROR on battery/master]
[also build test ERROR on v4.8-rc2 next-20160815]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Finlye-Xiao/PM-AVS-add-Rockchip-cpu-avs/20160816-105228
base:   git://git.infradead.org/battery-2.6.git master
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All error/warnings (new ones prefixed by >>):

   drivers/power/avs/rockchip-cpu-avs.c: In function 'rockchip_cpu_avs_notifier':
>> drivers/power/avs/rockchip-cpu-avs.c:230:10: error: implicit declaration of function 'cpufreq_frequency_get_table' [-Werror=implicit-function-declaration]
     table = cpufreq_frequency_get_table(policy->cpu);
             ^
>> drivers/power/avs/rockchip-cpu-avs.c:230:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     table = cpufreq_frequency_get_table(policy->cpu);
           ^
   cc1: some warnings being treated as errors

vim +/cpufreq_frequency_get_table +230 drivers/power/avs/rockchip-cpu-avs.c

   224		dev = get_cpu_device(policy->cpu);
   225		if (!dev) {
   226			pr_err("cpu%d Failed to get device\n", policy->cpu);
   227			goto out;
   228		}
   229	
 > 230		table = cpufreq_frequency_get_table(policy->cpu);
   231		if (!table) {
   232			pr_err("cpu%d CPUFreq table not found\n", policy->cpu);
   233			goto out;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz  Description: Binary data
[PATCH] Bluetooth: btusb: Add support for 0cf3:e009
Device 0cf3:e009 is one of the QCA ROME family.

T:  Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#= 4 Spd=12 MxCh= 0
D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P:  Vendor=0cf3 ProdID=e009 Rev=00.01
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

Signed-off-by: Kai-Heng Feng
---
 drivers/bluetooth/btusb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index c58a00c..80ae854 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -248,6 +248,7 @@ static const struct usb_device_id blacklist_table[] = {
 
 	/* QCA ROME chipset */
 	{ USB_DEVICE(0x0cf3, 0xe007), .driver_info = BTUSB_QCA_ROME },
+	{ USB_DEVICE(0x0cf3, 0xe009), .driver_info = BTUSB_QCA_ROME },
 	{ USB_DEVICE(0x0cf3, 0xe300), .driver_info = BTUSB_QCA_ROME },
 	{ USB_DEVICE(0x0cf3, 0xe360), .driver_info = BTUSB_QCA_ROME },
 	{ USB_DEVICE(0x0489, 0xe092), .driver_info = BTUSB_QCA_ROME },
-- 
2.8.1
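The one-line patch works because btusb looks up the probed device's vendor/product pair in blacklist_table (via usb_match_id()) and uses the matching entry's driver_info flags, so without an entry the 0cf3:e009 part is not flagged as QCA ROME. The following is a simplified userspace model of that lookup, not the USB core's matching code (which compares more fields, selected by match_flags). `struct id_entry`, `lookup_driver_info()` and the flag value `MODEL_BTUSB_QCA_ROME` are hypothetical; the real BTUSB_QCA_ROME value is defined in btusb.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, standing in for BTUSB_QCA_ROME. */
#define MODEL_BTUSB_QCA_ROME 0x8000UL

struct id_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned long driver_info;
};

/* The QCA ROME block of the patched table; an all-zero entry ends it,
 * as in kernel usb_device_id tables. */
static const struct id_entry rome_ids[] = {
	{ 0x0cf3, 0xe007, MODEL_BTUSB_QCA_ROME },
	{ 0x0cf3, 0xe009, MODEL_BTUSB_QCA_ROME }, /* added by this patch */
	{ 0x0cf3, 0xe300, MODEL_BTUSB_QCA_ROME },
	{ 0x0cf3, 0xe360, MODEL_BTUSB_QCA_ROME },
	{ 0x0489, 0xe092, MODEL_BTUSB_QCA_ROME },
	{ 0 }
};

/* Return driver_info of the first matching entry, or 0 if none match. */
static unsigned long lookup_driver_info(uint16_t vendor, uint16_t product)
{
	for (const struct id_entry *e = rome_ids; e->vendor != 0; e++)
		if (e->vendor == vendor && e->product == product)
			return e->driver_info;
	return 0;
}
```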
Re: ASoC: sun4i-codec: playback stall and I/O error with DAPM paths all disabled
On Mon, Aug 15, 2016 at 7:42 PM, Mark Brown wrote:
> On Mon, Aug 15, 2016 at 05:43:55PM +0800, wens Tsai wrote:
>
>> What is unexpected is any attempt to play anything under this state makes
>> the playback software (in my case mpg321) stall, and later report an I/O
>> error. My guess is that the DAC is still disabled by DAPM, so it doesn't
>> send any DRQs, and thus the DMA engine is not consuming any data from
>> userspace.
>
> This is normal for ASoC - like you say it'll be because the hardware
> isn't powered up.
>
>> I think we should just enable the digital bits of the DAC/ADC all the
>> time. Or maybe transfer and then discard data if the DAC is off. Not
>> sure if this is doable though. I expect playback software to work, and
>> not block, regardless of the hardware status.
>
> Powering things up all the time will have a major effect on battery life
> for systems that care about that. The expectation is that systems with
> this sort of hardware won't normally be offering end users direct
> control of the routing, it'll be something that's handled during system
> integration.

Ok. So I guess one solution would be to move the mute controls out of
DAPM, and maybe change some other mux-like paths into actual muxes, so
there's at least one usable path at all times. IIRC there was a patch
doing something like this. I'll look into it.

Regards
ChenYu
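The stall follows from how DAPM decides what to power: roughly, a widget is powered only when it lies on a complete path to an endpoint through connected controls, so with every mute/mixer switch off the DAC stays down and never requests DMA. Below is a deliberately simplified userspace model of that path check; the real logic in the ASoC core also accounts for active streams, supply widgets and more. The widget set (`DAC`, `MIXER_L`, `MIXER_R`, `HP_OUT`, `LINE_OUT`), `struct route` and `dac_powered()` are hypothetical and do not correspond to sun4i-codec's actual widgets.

```c
#include <assert.h>
#include <stdbool.h>

enum { DAC, MIXER_L, MIXER_R, HP_OUT, LINE_OUT, N_WIDGETS };

struct route {
	int from, to;
	bool connected; /* state of the mixer/mute control on this edge */
};

static bool is_sink(int w)
{
	return w == HP_OUT || w == LINE_OUT;
}

/* Depth-first search: is there a connected path from w to any output? */
static bool reaches_output(int w, const struct route *r, int n, bool *seen)
{
	if (is_sink(w))
		return true;
	if (seen[w])
		return false;
	seen[w] = true;
	for (int i = 0; i < n; i++)
		if (r[i].from == w && r[i].connected &&
		    reaches_output(r[i].to, r, n, seen))
			return true;
	return false;
}

/* DAPM-style decision: power the DAC only if it can reach an output. */
static bool dac_powered(const struct route *r, int n)
{
	bool seen[N_WIDGETS] = { false };
	return reaches_output(DAC, r, n, seen);
}
```

Under this model, keeping at least one path permanently connected (the mux idea above) is what guarantees the DAC can be powered whenever a stream is playing.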
Re: [lkp] [usb] ad05399d68: BUG: unable to handle kernel NULL pointer dereference at 0000000000000012
On 08/16, Peter Chen wrote:
>On Mon, Aug 15, 2016 at 10:49:55PM +0800, Ye Xiaolong wrote:
>> On 08/15, Peter Chen wrote:
>> >
>> >>
>> >>
>> >>FYI, we noticed the following commit:
>> >>
>> >>https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git testing/next
>> >>commit
>> >>ad05399d68b6ae1649cdcfc82ce3ffea1a7c5104 ("usb: udc: core: fix error
>> >>handling")
>> >>
>> >
>> >Hi Xiaolong,
>> >
>> >You reported it one month ago, and said it is a false report. see below.
>> >Would you please double confirm it?
>>
>> Hi, peter
>>
>> Last time I reported stat "WARNING: CPU: 0 PID: 1 at
>> lib/list_debug.c:36" and it showed both in this commit and its parent,
>> this time, the observed change stat is "BUG: unable to handle kernel NULL
>> pointer dereference at 0012" and it doesn't show in parent
>> commit, however, the parent commit's dmesg would show kernel panic log
>> as:
>>
>> [ 10.338487] Kernel panic - not syncing: Attempted to kill init!
>> exitcode=0x000b
>> [ 10.338487]
>> [ 10.339911] CPU: 0 PID: 1 Comm: init Not tainted 4.8.0-rc1-00020-g0937a4d
>> #1
>> [ 10.341177] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> Debian-1.8.2-1 04/01/2014
>> [ 10.342798] 88001e53bc28 8168cf8a
>> 88001e534000
>> [ 10.345177] 8256ef20 88001e53bcb8 88001e50ca50
>> 88001e53bca8
>> [ 10.346739] 8114e062 8810 88001e53bcb8
>> 88001e53bc50
>> [ 10.347970] Call Trace:
>> [ 10.348690] [] dump_stack+0x83/0xb9
>> [ 10.351592] [] panic+0xf3/0x2a9
>> [ 10.352386] [] do_exit+0x601/0xde0
>> [ 10.352879] [] ? __sigqueue_free+0x43/0x50
>> [ 10.353511] [] ? __dequeue_signal+0x1f7/0x210
>> [ 10.354483] [] do_group_exit+0xa2/0x100
>> [ 10.355324] [] get_signal+0x68e/0x740
>> [ 10.356155] [] do_signal+0x23/0x670
>> [ 10.356983] [] ? do_syslog+0x2c0/0x6a0
>> [ 10.357832] [] ? bad_area_nosemaphore+0x33/0x40
>> [ 10.358825] [] ? __do_page_fault+0x407/0x4d0
>> [ 10.359738] [] exit_to_usermode_loop+0x69/0xc0
>> [ 10.360680] [] prepare_exit_to_usermode+0x3d/0x70
>> [ 10.361725] [] retint_user+0x8/0x10
>> [ 10.362650] Kernel Offset: disabled
>>
>> The whole parent dmesg is attached.
>>
>
>Then, what's the conclusion? Is this one is detect one or not?
>

It seems parent kernel lives longer than this commit, and the
sysfs_kf_write bug shows up consistently in 3 boot tests in LKP
environment.

% compare -at ad05399d68b6ae1649cdcfc82ce3ffea1a7c5104
tests: 3
testcase/path_params/tbox_group/run: boot/1/vm-kbuild-yocto-x86_64

0937a4d787539e2f  ad05399d68b6ae1649cdcfc82c
----------------  --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          6:6          -100%            :4    kmsg.stc):gdata/new_proto/recv_or_reg_complete_cb_not_ready
          6:6          -100%            :4    kmsg.fmdrv:st_unregister_failed
           :6           100%           4:4    kmsg.list_del_corruption.prev->next_should_be#,but_was
           :6           100%           4:4    dmesg.WARNING:at_lib/list_debug.c:#__list_del_entry
           :6           100%           4:4    dmesg.BUG:unable_to_handle_kernel
           :6           100%           4:4    dmesg.Oops
           :6           100%           4:4    dmesg.RIP:sysfs_kf_write
           :6           100%           4:4    dmesg.Kernel_panic-not_syncing:Fatal_exception
          6:6          -100%            :4    dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=

testcase/path_params/tbox_group/run: boot/1/vm-ivb41-yocto-ia32

0937a4d787539e2f  ad05399d68b6ae1649cdcfc82c
----------------  --------------------------
          2:2          -100%            :2    kmsg.stc):gdata/new_proto/recv_or_reg_complete_cb_not_ready
          2:2          -100%            :2    kmsg.fmdrv:st_unregister_failed
           :2           100%           2:2    dmesg.BUG:unable_to_handle_kernel
           :2           100%           2:2    dmesg.Oops
           :2           100%           2:2    dmesg.RIP:sysfs_kf_write
           :2           100%           2:2    dmesg.Kernel_panic-not_syncing:Fatal_exception
          2:2          -100%            :2    dmesg.BUG:kernel_test_hang

testcase/path_params/tbox_group/run: boot/1/vm-kbuild-1G

0937a4d787539e2f  ad05399d68b6ae1649cdcfc82c
----------------  --------------------------
           :4           100%           4:4    dmesg.WARNING:at_lib/list_debug.c:#__list_del_entry
           :4            75%           3:4    dmesg.BUG:unable_to_handle_kernel
           :4            75%           3:4    dmesg.Oops
           :4            75%           3:4    dmesg.RIP:sysfs_kf_write
           :4            75%           3:4    dmesg.Kernel_panic-not_syncing:Fatal_exception
          4:4          -100%            :4    dmesg.BUG:kernel_oversize_in_test_stage

>Peter
>
>> Thanks,
>> Xiaolong
>>
>> >
>> >On Wed, Jul 13, 2016 at
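Reading the compare output: each fail:runs cell appears to give failing boots over total boots (an empty left side such as ":4" meaning no failures), and %reproduction appears to be the difference between the patched and parent failure rates in percentage points. Every row above is consistent with that reading, e.g. 6:6 to :4 gives -100% and :4 to 3:4 gives 75%. The sketch below encodes this assumed formula; it is an interpretation of the table, not the LKP tool's actual code, and `fail_rate_pct()` and `reproduction_pct()` are hypothetical names.

```c
#include <assert.h>

/* Failure rate in percent (integer); 0 fails for an empty "":N" cell. */
static int fail_rate_pct(int fails, int runs)
{
	assert(runs > 0);
	return fails * 100 / runs;
}

/* Assumed %reproduction: patched failure rate minus parent failure rate. */
static int reproduction_pct(int base_fails, int base_runs,
			    int new_fails, int new_runs)
{
	return fail_rate_pct(new_fails, new_runs) -
	       fail_rate_pct(base_fails, base_runs);
}
```

Under this reading, "dmesg.RIP:sysfs_kf_write" going from :6 to 4:4 (100%) is what makes the bisected commit, rather than its parent, the one that introduces the sysfs_kf_write oops.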
Re: [PATCH v2 2/3] ses: use scsi_is_sas_rphy instead of is_sas_attached
Hi Johannes,

[auto build test ERROR on scsi/for-next]
[also build test ERROR on v4.8-rc2 next-20160815]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Fix-panic-when-a-SES-device-is-attached-to-a-hpsa-logical-volume/20160815-231901
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: i386-randconfig-h0-08161012 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

>> ERROR: "scsi_is_sas_rphy" [drivers/scsi/ses.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz  Description: Binary data
Re: [PATCH] exynos-drm: Fix display manager failing to start without IOMMU problem
Hi Shuah, 2016년 08월 13일 02:52에 Shuah Khan 이(가) 쓴 글: > On 08/12/2016 11:28 AM, Shuah Khan wrote: >> On 08/10/2016 05:05 PM, Shuah Khan wrote: >>> On 08/10/2016 04:59 PM, Inki Dae wrote: Hi Shuah, 2016년 08월 11일 02:30에 Shuah Khan 이(가) 쓴 글: > Fix exynos_drm_gem_create_ioctl() attempts to allocate non-contiguous GEM > memory without IOMMU. In this case, there is no point in attempting to DRM gem can be used for Non-DRM drivers such as GPU, V4L2 based Multimedia device and other DMA devices. Even though IOMMU support is disabled, other framework based DMA drivers can use IOMMU - i.e., GPU driver - and they can use non-contiguous GEM buffer through UMM. (DMABUF) So GEM allocation type is not dependent on IOMMU. >>> >>> Hi Inki, >>> >>> I am seeing the following failure without IOMMU and light dm fails >>> to start: >>> >>> [drm:exynos_drm_framebuffer_init] *ERROR* Non-continguous GEM memory is not >>> supported. >>> >>> The change I made fixed that problem and light dm starts without IOMMU. >>> Is there a better way to fix this problem? Currently without IOMMU, >>> light dm doesn't start. >>> >>> This is on linux_next >> >> Hi Inki, >> >> I am looking into this further and I am finding inconsistent >> commits with regards to GEM contiguous and non-contiguous >> buffers. >> >> Okay what you said is that: >> >> exymod-drm should support non-continguous and contiguous GEM memory >> type with or without IOMMU Right. >> >> However, the code currently isn't doing that. The following >> commit allocates non-contiguous buffers when IOMMU is enabled >> to handle contiguous allocation failures. >> >> There are other commits that removed checks for non-contig type. 
>> Let's look at the following cases to see what should be the driver
>> behavior in these cases:
>>
>> IOMMU is disabled:
>>
>> exynos_drm_gem_create_ioctl() gets called with NONCONTIG
>> - driver should try to allocate non-contig
>> - if it can't allocate non-contig, allocate contig
>> (this will avoid failures like the one I am seeing)
>>
>> exynos_drm_gem_create_ioctl() gets called with CONTIG
>> - driver should try to allocate contig
>> - if it can't allocate contig, allocate non-contig
>>
>> What is confusing is there are several code paths in the
>> GEM allocation and checking memory types are enforcing
>> non-contig with IOMMU. Check this routine:
>>
>> exynos_drm_framebuffer_init() will reject non-contig
>> memory type when check_fb_gem_memory_type() rejects
>> non-contig GEM memory type without IOMMU.

The GEM memory type needs to be checked only when the GEM buffer is used as a framebuffer, because then the display controller's DMA accesses the buffer, and without an IOMMU that DMA device cannot access a non-contiguous memory region.

That is why exynos_drm_framebuffer_init() checks the GEM memory type for the fb, not when the GEM buffer is created.

>
>
> okay the very first commit that added IOMMU support
> introduced the code that rejects non-contig gem memory
> type without IOMMU.
>
> commit 0519f9a12d0113caab78980c48a7902d2bd40c2c
> Author: Inki Dae
> Date: Sat Oct 20 07:53:42 2012 -0700
>
> drm/exynos: add iommu support for exynos drm framework
>
> Anyway, if it is the right change to fix check_fb_gem_memory_type()
> to not reject NONCONTIG_BUFFER, then I can make that change

No. As I mentioned above, the GEM buffer for the fb depends on IOMMU because it is accessed by a DMA device: FIMD, DECON or Mixer. Note that a GEM buffer can also be used for other purposes, such as 2D/3D or post-processing devices that don't use a framebuffer, unlike the display controller, which scans out from a framebuffer.

Thanks,
Inki Dae

> instead of this patch I sent.
>
>>
>> So there is inconsistency in the non-contig vs. contig
>> GEM support in exynos-drm. I think this needs to be cleaned
>> up to get the desired behavior.
>>
>> The following commit allocates non-contiguous buffers when IOMMU is
>> enabled to handle contiguous allocation failures.
>>
>> There are other commits that removed checks for non-contig type.
>> Let's look at the following cases to see what should be the driver
>> behavior in these cases:
>>
>> commit 122beea84bb90236b1ae545f08267af58591c21b
>> Author: Rahul Sharma
>> Date: Wed May 7 17:21:29 2014 +0530
>>
>> drm/exynos: allocate non-contigous buffers when iommu is enabled
>>
>> Allow to allocate non-contigous buffers when iommu is enabled.
>> Currently, it tries to allocates contigous buffer which consistently
>> fail for large buffers and then fall back to non contigous. Apart
>> from being slow, this implementation is also very noisy and fills
>> the screen with alloc fail logs.
>>
>> Signed-off-by: Rahul Sharma
>> Reviewed-by: Sachin Kamat
>> Signed-off-by: Inki Dae
>>
>>
>> commit ea6d66c3a797376d21b23dc8261733ce35970014
>> Author: Inki Dae
>> Date: Fri Nov 2 16:10:39 2012 +0900
>>
>>
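The two positions in this thread can be modelled in a few lines of standalone C. This is a minimal sketch under stated assumptions, not exynos-drm code: `GEM_NONCONTIG`, `struct gem_buf`, `try_alloc_fn` and all helper names are hypothetical. `gem_create_with_fallback()` follows the fallback order Shuah proposes for the IOMMU-disabled case, and `check_fb_memory_type()` models Inki's point that only framebuffer scanout (FIMD, DECON or Mixer DMA) needs contiguous memory when there is no IOMMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: set when the buffer is backed by non-contiguous pages. */
#define GEM_NONCONTIG (1u << 0)

struct gem_buf {
	unsigned int flags; /* memory type actually allocated */
};

/* Hypothetical allocator hook: 0 on success, negative errno on failure. */
typedef int (*try_alloc_fn)(struct gem_buf *buf, unsigned int flags);

/*
 * Shuah's proposed policy: try the requested memory type first and, if that
 * fails, try the other one. On success buf->flags records what was used.
 */
static int gem_create_with_fallback(struct gem_buf *buf, unsigned int requested,
				    try_alloc_fn try_alloc)
{
	unsigned int other = requested ^ GEM_NONCONTIG;
	int ret;

	ret = try_alloc(buf, requested);
	if (ret == 0) {
		buf->flags = requested;
		return 0;
	}

	ret = try_alloc(buf, other);
	if (ret == 0)
		buf->flags = other;
	return ret;
}

/*
 * Inki's point: only scanout by the display controller's DMA needs
 * contiguous memory when there is no IOMMU, so the restriction belongs in
 * the framebuffer path, not at GEM creation time.
 */
static int check_fb_memory_type(const struct gem_buf *buf, bool has_iommu)
{
	assert(buf != NULL);
	if (!has_iommu && (buf->flags & GEM_NONCONTIG))
		return -EINVAL;
	return 0;
}

/* Example allocator that can only provide contiguous memory. */
static int contig_only_alloc(struct gem_buf *buf, unsigned int flags)
{
	(void)buf;
	return (flags & GEM_NONCONTIG) ? -ENOMEM : 0;
}
```

With this split, a non-contiguous buffer can still be created without an IOMMU and shared with other DMA devices; only the framebuffer check rejects it, which is where Inki places the restriction.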
[PATCH v2] arc: Add "model" property in device tree description of all boards
As it was discussed quite some time ago (see https://lkml.org/lkml/2015/11/5/862)
it's a good practice to add "model" property in .dts.

Moreover as per ePAPR "model" property is required and should look like
"manufacturer,model" so we do here.

Signed-off-by: Alexey Brodkin
Cc: Vineet Gupta
Cc: Jonas Gorski
Cc: Arnd Bergmann
Cc: Rob Herring
Cc: Christian Ruppert
---

Changes v1 -> v2:
 * Added "hs" postfix for boards based on ARC HS core
 * Added "archs" postfix in VDK's .dts to distinguish VDKs for ARC cores
   from those for ARM cores

 arch/arc/boot/dts/abilis_tb100_dvk.dts | 1 +
 arch/arc/boot/dts/abilis_tb101_dvk.dts | 1 +
 arch/arc/boot/dts/axs101.dts           | 1 +
 arch/arc/boot/dts/axs103.dts           | 1 +
 arch/arc/boot/dts/axs103_idu.dts       | 1 +
 arch/arc/boot/dts/nsim_700.dts         | 1 +
 arch/arc/boot/dts/nsim_hs.dts          | 1 +
 arch/arc/boot/dts/nsim_hs_idu.dts      | 1 +
 arch/arc/boot/dts/nsimosci.dts         | 1 +
 arch/arc/boot/dts/nsimosci_hs.dts      | 1 +
 arch/arc/boot/dts/nsimosci_hs_idu.dts  | 1 +
 arch/arc/boot/dts/vdk_hs38.dts         | 1 +
 arch/arc/boot/dts/vdk_hs38_smp.dts     | 1 +
 13 files changed, 13 insertions(+)

diff --git a/arch/arc/boot/dts/abilis_tb100_dvk.dts b/arch/arc/boot/dts/abilis_tb100_dvk.dts
index 3dd6ed9..3acf04d 100644
--- a/arch/arc/boot/dts/abilis_tb100_dvk.dts
+++ b/arch/arc/boot/dts/abilis_tb100_dvk.dts
@@ -24,6 +24,7 @@
 /include/ "abilis_tb100.dtsi"
 
 / {
+	model = "abilis,tb100";
 	chosen {
 		bootargs = "earlycon=uart8250,mmio32,0xff10,9600n8 console=ttyS0,9600n8";
 	};
diff --git a/arch/arc/boot/dts/abilis_tb101_dvk.dts b/arch/arc/boot/dts/abilis_tb101_dvk.dts
index 1cf51c2..37d88c5 100644
--- a/arch/arc/boot/dts/abilis_tb101_dvk.dts
+++ b/arch/arc/boot/dts/abilis_tb101_dvk.dts
@@ -24,6 +24,7 @@
 /include/ "abilis_tb101.dtsi"
 
 / {
+	model = "abilis,tb101";
 	chosen {
 		bootargs = "earlycon=uart8250,mmio32,0xff10,9600n8 console=ttyS0,9600n8";
 	};
diff --git a/arch/arc/boot/dts/axs101.dts b/arch/arc/boot/dts/axs101.dts
index 3f9b058..d9b9b9d 100644
--- a/arch/arc/boot/dts/axs101.dts
+++ b/arch/arc/boot/dts/axs101.dts
@@ -13,6 +13,7 @@
 /include/ "axs10x_mb.dtsi"
 
 / {
+	model = "snps,axs101";
 	compatible = "snps,axs101", "snps,arc-sdp";
 
 	chosen {
diff --git a/arch/arc/boot/dts/axs103.dts b/arch/arc/boot/dts/axs103.dts
index e6d0e31..ec7fb27 100644
--- a/arch/arc/boot/dts/axs103.dts
+++ b/arch/arc/boot/dts/axs103.dts
@@ -16,6 +16,7 @@
 /include/ "axs10x_mb.dtsi"
 
 / {
+	model = "snps,axs103";
 	compatible = "snps,axs103", "snps,arc-sdp";
 
 	chosen {
diff --git a/arch/arc/boot/dts/axs103_idu.dts b/arch/arc/boot/dts/axs103_idu.dts
index f999fef..070c297 100644
--- a/arch/arc/boot/dts/axs103_idu.dts
+++ b/arch/arc/boot/dts/axs103_idu.dts
@@ -16,6 +16,7 @@
 /include/ "axs10x_mb.dtsi"
 
 / {
+	model = "snps,axs103-smp";
 	compatible = "snps,axs103", "snps,arc-sdp";
 
 	chosen {
diff --git a/arch/arc/boot/dts/nsim_700.dts b/arch/arc/boot/dts/nsim_700.dts
index 6397051..ce0ccd20 100644
--- a/arch/arc/boot/dts/nsim_700.dts
+++ b/arch/arc/boot/dts/nsim_700.dts
@@ -10,6 +10,7 @@
 /include/ "skeleton.dtsi"
 
 / {
+	model = "snps,nsim";
 	compatible = "snps,nsim";
 	#address-cells = <1>;
 	#size-cells = <1>;
diff --git a/arch/arc/boot/dts/nsim_hs.dts b/arch/arc/boot/dts/nsim_hs.dts
index bf05fe5..3772c40 100644
--- a/arch/arc/boot/dts/nsim_hs.dts
+++ b/arch/arc/boot/dts/nsim_hs.dts
@@ -10,6 +10,7 @@
 /include/ "skeleton_hs.dtsi"
 
 / {
+	model = "snps,nsim_hs";
 	compatible = "snps,nsim_hs";
 	#address-cells = <2>;
 	#size-cells = <2>;
diff --git a/arch/arc/boot/dts/nsim_hs_idu.dts b/arch/arc/boot/dts/nsim_hs_idu.dts
index 99eabe1..48434d7c 100644
--- a/arch/arc/boot/dts/nsim_hs_idu.dts
+++ b/arch/arc/boot/dts/nsim_hs_idu.dts
@@ -10,6 +10,7 @@
 /include/ "skeleton_hs_idu.dtsi"
 
 / {
+	model = "snps,nsim_hs-smp";
 	compatible = "snps,nsim_hs";
 
 	interrupt-parent = <_intc>;
diff --git a/arch/arc/boot/dts/nsimosci.dts b/arch/arc/boot/dts/nsimosci.dts
index e659a34..bcf6031 100644
--- a/arch/arc/boot/dts/nsimosci.dts
+++ b/arch/arc/boot/dts/nsimosci.dts
@@ -10,6 +10,7 @@
 /include/ "skeleton.dtsi"
 
 / {
+	model = "snps,nsimosci";
 	compatible = "snps,nsimosci";
 	#address-cells = <1>;
 	#size-cells = <1>;
diff --git a/arch/arc/boot/dts/nsimosci_hs.dts b/arch/arc/boot/dts/nsimosci_hs.dts
index 16ce5d6..14a727c 100644
--- a/arch/arc/boot/dts/nsimosci_hs.dts
+++ b/arch/arc/boot/dts/nsimosci_hs.dts
@@ -10,6 +10,7 @@
 /include/ "skeleton_hs.dtsi"
 
 / {
+	model = "snps,nsimosci_hs";
 	compatible = "snps,nsimosci_hs";
 	#address-cells = <1>;
 	#size-cells = <1>;
diff --git a/arch/arc/boot/dts/nsimosci_hs_idu.dts b/arch/arc/boot/dts/nsimosci_hs_idu.dts
index ce8dfbc..cbf65b6 100644
---
Re: [PATCH] powerpc/powernv: Initialise nest mmu
On 16/08/16 10:37, Alistair Popple wrote:
> Balbir,
>
>>> +	/* Update partition table control register on all Nest MMUs */
>>> +	opal_nmmu_set_ptcr(-1UL, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
>>> +
>>
>> Just wondering if
>>
>> 1. Instead of using -1 for all cpus, we should do
>> for_each_online_cpu() {
>> 	opal_nmmu_set_ptcr(...)
>> }
>
> Good question, but I don't think it makes sense to do that. The NMMU is
> per-chip/socket rather than per-cpu so it shouldn't be tied to
> onlining/offlining of individual CPUs.
>
>> 2. In cpu hotplug path do the same when onlining and set to NULL on
>> offlining?
>
> Again, the nmmu isn't tied to a specific CPU but rather a chip/socket. So in
> theory at least it's possible that all CPUs in a chip could be offline but
> other units on the chip could still be using the nmmu so we wouldn't want to
> disable the nmmu at that point.

Fair enough

Balbir Singh.
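The value passed to opal_nmmu_set_ptcr() in the quoted hunk combines two fields: the real address of the partition table and a size field equal to log2(table size) - 12. The sketch below reproduces only that arithmetic in userspace C; `make_ptcr()` is a hypothetical helper, and the 64 KiB table (size shift 16) and the alignment check are assumptions made for the example, not values taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted expression
 *     __pa(partition_tb) | (PATB_SIZE_SHIFT - 12)
 * table_phys: real address of the partition table
 * size_shift: log2 of the table size in bytes (assumed >= 12)
 */
static uint64_t make_ptcr(uint64_t table_phys, unsigned int size_shift)
{
	assert(size_shift >= 12 && size_shift < 64);
	/* Assumed: the base is aligned to the table size, so its low bits are free. */
	assert((table_phys & ((UINT64_C(1) << size_shift) - 1)) == 0);
	return table_phys | (uint64_t)(size_shift - 12);
}
```

For a 64 KiB table at real address 0x1000000 the size field is 16 - 12 = 4, giving 0x1000004. Because the register is per nest MMU (per chip, as Alistair explains) and the -1UL argument addresses all of them, the value is set once rather than per online CPU.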
Re: [PATCH V5 3/4] drm/bridge: Add driver for GE B850v3 LVDS/DP++ Bridge
Hi,

On 08/09/2016 10:11 PM, Peter Senna Tschudin wrote:
Add a driver that creates a drm_bridge and a drm_connector for the LVDS to DP++ display bridge of the GE B850v3.

There are two physical bridges on the video signal pipeline: a STDP4028 (LVDS to DP) and a STDP2690 (DP to DP++). The hardware and firmware made it complicated for this binding to comprise two device tree nodes, as the design goal is to configure both bridges based on the LVDS signal, which leaves the driver powerless to control the video processing pipeline. The two bridges behave as a single bridge, and the driver is only needed for telling the host about EDID / HPD, and for giving the host powers to ack interrupts.

The video signal pipeline is as follows:

  Host -> LVDS|--(STDP4028)--|DP -> DP|--(STDP2690)--|DP++ -> Video output

I'd commented on an earlier revision (v2) of this patch, but hadn't got a response on it. Pasting the query again:

Are these two chips always expected to be used together? I don't think it's right to pair up two encoder chips into one driver just for one board. Is one device @0x72 and the other @0x73? Or is only one of them an i2c slave?

What's preventing us from creating these as two different bridge drivers? The drm framework allows us to daisy chain encoder bridges.

The only problem I see is that we don't have a clear-cut way to tell the bridge driver whether we want it to create a connector for us or not, because it looks like both can potentially create connectors. This isn't a big problem either if we have DT. We just need to check whether our output port is connected to another bridge or a connector.
Thanks,
Archit

Cc: Martyn Welch
Cc: Martin Donnelly
Cc: Daniel Vetter
Cc: Enric Balletbo i Serra
Cc: Philipp Zabel
Cc: Rob Herring
Cc: Fabio Estevam
CC: David Airlie
CC: Thierry Reding
CC: Thierry Reding
Reviewed-by: Enric Balletbo
Signed-off-by: Peter Senna Tschudin
---
Changes from V4:
 - Check the output of the first call to i2c_smbus_write_word_data() and
   return its error code for failing gracefully on i2c issues
 - Renamed the i2c_driver.name from "ge,b850v3-lvds-dp" to "b850v3-lvds-dp"
   to remove the comma from the driver name

Changes from V3:
 - 3/4 instead of 4/5
 - Tested on next-20160804

Changes from V2:
 - Made it atomic to be applied on next-20160729 on top of Liu Ying changes
   that made imx-ldb atomic

Changes from V1:
 - New commit message
 - Removed 3 empty entry points
 - Removed memory leak from ge_b850v3_lvds_dp_get_modes()
 - Added a lock for mode setting
 - Removed a few blank lines
 - Changed the order at Makefile and Kconfig

 MAINTAINERS                                |   8 +
 drivers/gpu/drm/bridge/Kconfig             |  11 +
 drivers/gpu/drm/bridge/Makefile            |   1 +
 drivers/gpu/drm/bridge/ge_b850v3_lvds_dp.c | 405 +
 4 files changed, 425 insertions(+)
 create mode 100644 drivers/gpu/drm/bridge/ge_b850v3_lvds_dp.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a306795..e8d106a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5142,6 +5142,14 @@ W:	https://linuxtv.org
 S:	Maintained
 F:	drivers/media/radio/radio-gemtek*
 
+GENERAL ELECTRIC B850V3 LVDS/DP++ BRIDGE
+M:	Peter Senna Tschudin
+M:	Martin Donnelly
+M:	Martyn Welch
+S:	Maintained
+F:	drivers/gpu/drm/bridge/ge_b850v3_dp2.c
+F:	Documentation/devicetree/bindings/ge/b850v3_dp2_bridge.txt
+
 GENERIC GPIO I2C DRIVER
 M:	Haavard Skinnemoen
 S:	Supported
diff --git a/drivers/gpu/drm/bridge/Kconfig b/drivers/gpu/drm/bridge/Kconfig
index b590e67..b4b70fb 100644
--- a/drivers/gpu/drm/bridge/Kconfig
+++ b/drivers/gpu/drm/bridge/Kconfig
@@ -32,6 +32,17 @@ config DRM_DW_HDMI_AHB_AUDIO
 	  Designware HDMI block.  This is used in conjunction with
 	  the i.MX6 HDMI driver.
 
+config DRM_GE_B850V3_LVDS_DP
+	tristate "GE B850v3 LVDS to DP++ display bridge"
+	depends on OF
+	select DRM_KMS_HELPER
+	select DRM_PANEL
+	---help---
+	  This is a driver for the display bridge of
+	  GE B850v3 that convert dual channel LVDS
+	  to DP++. This is used with the i.MX6 imx-ldb
+	  driver.
+
 config DRM_NXP_PTN3460
 	tristate "NXP PTN3460 DP/LVDS bridge"
 	depends on OF
diff --git a/drivers/gpu/drm/bridge/Makefile b/drivers/gpu/drm/bridge/Makefile
index efdb07e..b9606f3 100644
--- a/drivers/gpu/drm/bridge/Makefile
+++ b/drivers/gpu/drm/bridge/Makefile
@@ -3,6 +3,7 @@ ccflags-y := -Iinclude/drm
 obj-$(CONFIG_DRM_ANALOGIX_ANX78XX) += analogix-anx78xx.o
 obj-$(CONFIG_DRM_DW_HDMI) += dw-hdmi.o
 obj-$(CONFIG_DRM_DW_HDMI_AHB_AUDIO) += dw-hdmi-ahb-audio.o
+obj-$(CONFIG_DRM_GE_B850V3_LVDS_DP) += ge_b850v3_lvds_dp.o
 obj-$(CONFIG_DRM_NXP_PTN3460) += nxp-ptn3460.o
 obj-$(CONFIG_DRM_PARADE_PS8622) += parade-ps8622.o
 obj-$(CONFIG_DRM_SII902X) += sii902x.o
diff --git a/drivers/gpu/drm/bridge/ge_b850v3_lvds_dp.c
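The V4 to V5 changelog mentions checking the result of the first i2c_smbus_write_word_data() call and returning its error code. That kernel function returns 0 on success or a negative errno; the userspace sketch below models the pattern with a hypothetical `fake_smbus_write_word()` stand-in. The register numbers and values are made up and are not taken from the STDP4028/STDP2690.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for an I2C client; 'present' simulates a responding chip. */
struct fake_client {
	int present;
	uint16_t regs[256];
};

/*
 * Models the convention of i2c_smbus_write_word_data(): 0 on success,
 * negative errno on failure (-ENXIO is just one possible value).
 */
static int fake_smbus_write_word(struct fake_client *c, uint8_t reg, uint16_t val)
{
	if (!c->present)
		return -ENXIO;
	c->regs[reg] = val;
	return 0;
}

/*
 * Hypothetical init step: the first write also tells us whether the chip
 * answers at all, so its error is returned instead of being ignored.
 */
static int bridge_enable_irqs(struct fake_client *c)
{
	int ret;

	ret = fake_smbus_write_word(c, 0x01, 0x0001);
	if (ret)
		return ret; /* fail probe gracefully on I2C problems */

	return fake_smbus_write_word(c, 0x02, 0x0001);
}
```

Propagating the first error lets probe fail cleanly when the bridge is absent or the bus is misbehaving, instead of continuing with a device that never acknowledged a write.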
Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
On Wed, Aug 10, 2016 at 04:59:39AM -0700, Andy Lutomirski wrote:
> On Sun, Jul 31, 2016 at 10:30 PM, Joonsoo Kim wrote:
> > On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote:
> >> -- Forwarded message --
> >> From: "Joonsoo Kim"
> >> Date: Jul 28, 2016 7:57 PM
> >> Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
> >> To: "Andy Lutomirski"
> >> Cc: "Xishi Qiu", "Michal Hocko", "Tejun Heo", "Ingo Molnar",
> >> "Peter Zijlstra", "LKML", "Linux MM", "Yisheng Xie"
> >>
> >> > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote:
> >> > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu wrote:
> >> > > > On 2016/7/28 17:43, Michal Hocko wrote:
> >> > > >
> >> > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote:
> >> > > >>> On 2016/7/28 15:58, Michal Hocko wrote:
> >> > > >>>
> >> > > On Thu 28-07-16 15:41:53, Xishi Qiu wrote:
> >> > > > On 2016/7/28 15:20, Michal Hocko wrote:
> >> > > >
> >> > > >> On Thu 28-07-16 15:08:26, Xishi Qiu wrote:
> >> > > >>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb continuous
> >> > > >>> physical memory during fork a new process.
> >> > > >>>
> >> > > >>> If the system's memory is very small, especially the smart phone, maybe there
> >> > > >>> is only 1G memory. So the free memory is very small and compaction is not
> >> > > >>> always success in slowpath(__alloc_pages_slowpath), then alloc thread stack
> >> > > >>> may be failed for memory fragment.
> >> > > >>
> >> > > >> Well, with the current implementation of the page allocator those
> >> > > >> requests will not fail in most cases. The oom killer would be invoked in
> >> > > >> order to free up some memory.
> >> > > >>
> >> > > >
> >> > > > Hi Michal,
> >> > > >
> >> > > > Yes, it success in most cases, but I did have seen this problem in some
> >> > > > stress-test.
> >> > > >
> >> > > > DMA free:470628kB, but alloc 2 order block failed during fork a new process.
> >> > > > There are so many memory fragments and the large block may be soon taken by
> >> > > > others after compact because of stress-test.
> >> > > >
> >> > > > --- dmesg messages ---
> >> > > > 07-13 08:41:51.341
> >> > > > <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService:
> >> > > > page allocation failure: order:2, mode:0x2000d1
> >> > >
> >> > > Yes but this is __GFP_DMA allocation. I guess you have already reported
> >> > > this failure and you've been told that this is quite unexpected for the
> >> > > kernel stack allocation. It is your out-of-tree patch which just makes
> >> > > things worse because DMA restricted allocations are considered "lowmem"
> >> > > and so they do not invoke OOM killer and do not retry like regular
> >> > > GFP_KERNEL allocations.
> >> > > >>>
> >> > > >>> Hi Michal,
> >> > > >>>
> >> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the problem.
> >> > > >>
> >> > > >> You are restricting the allocation request to a single zone which is
> >> > > >> definitely not good. Look at how many larger order pages are available
> >> > > >> in the Normal zone.
> >> > > >>
> >> > > >>> If we do oom-killer, maybe we will get a large block later, but there
> >> > > >>> is enough free memory before oom(although most of them are fragments).
> >> > > >>
> >> > > >> Killing a task is of course the last resort action. It would give you
> >> > > >> larger order blocks used for the victims thread.
> >> > > >>
> >> > > >>> I wonder if we can alloc success without kill any process in this situation.
> >> > > >>
> >> > > >> Sure it would be preferable to compact that memory but that might be
> >> > > >> hard with your restriction in place. Consider that DMA zone would tend
> >> > > >> to be less movable than normal zones as users would have to pin it for
> >> > > >> DMA. Your DMA is really large so this might turn out to just happen to
> >> > > >> work but note that the primary problem here is that you put a zone
> >> > > >> restriction for your allocations.
> >> > > >>
> >> > > >>> Maybe use vmalloc is a good way, but I don't know the influence.
> >> > > >>
> >> > > >> You can have a look at vmalloc patches posted by Andy. They are not that
> >> >
Re: [RFC] can we use vmalloc to alloc thread stack if compaction failed
On Wed, Aug 10, 2016 at 04:59:39AM -0700, Andy Lutomirski wrote: > On Sun, Jul 31, 2016 at 10:30 PM, Joonsoo Kim wrote: > > On Fri, Jul 29, 2016 at 12:47:38PM -0700, Andy Lutomirski wrote: > >> -- Forwarded message -- > >> From: "Joonsoo Kim" > >> Date: Jul 28, 2016 7:57 PM > >> Subject: Re: [RFC] can we use vmalloc to alloc thread stack if compaction > >> failed > >> To: "Andy Lutomirski" > >> Cc: "Xishi Qiu" , "Michal Hocko" > >> , "Tejun Heo" , "Ingo Molnar" > >> , "Peter Zijlstra" , "LKML" > >> , "Linux MM" , > >> "Yisheng Xie" > >> > >> > On Thu, Jul 28, 2016 at 08:07:51AM -0700, Andy Lutomirski wrote: > >> > > On Thu, Jul 28, 2016 at 3:51 AM, Xishi Qiu wrote: > >> > > > On 2016/7/28 17:43, Michal Hocko wrote: > >> > > > > >> > > >> On Thu 28-07-16 16:45:06, Xishi Qiu wrote: > >> > > >>> On 2016/7/28 15:58, Michal Hocko wrote: > >> > > >>> > >> > > On Thu 28-07-16 15:41:53, Xishi Qiu wrote: > >> > > > On 2016/7/28 15:20, Michal Hocko wrote: > >> > > > > >> > > >> On Thu 28-07-16 15:08:26, Xishi Qiu wrote: > >> > > >>> Usually THREAD_SIZE_ORDER is 2, it means we need to alloc 16kb > >> > > >>> continuous > >> > > >>> physical memory during fork a new process. > >> > > >>> > >> > > >>> If the system's memory is very small, especially the smart > >> > > >>> phone, maybe there > >> > > >>> is only 1G memory. So the free memory is very small and > >> > > >>> compaction is not > >> > > >>> always success in slowpath(__alloc_pages_slowpath), then alloc > >> > > >>> thread stack > >> > > >>> may be failed for memory fragment. > >> > > >> > >> > > >> Well, with the current implementation of the page allocator > >> > > >> those > >> > > >> requests will not fail in most cases. The oom killer would be > >> > > >> invoked in > >> > > >> order to free up some memory. > >> > > >> > >> > > > > >> > > > Hi Michal, > >> > > > > >> > > > Yes, it success in most cases, but I did have seen this problem > >> > > > in some > >> > > > stress-test. 
> >> > > > > >> > > > DMA free:470628kB, but alloc 2 order block failed during fork a > >> > > > new process. > >> > > > There are so many memory fragments and the large block may be > >> > > > soon taken by > >> > > > others after compact because of stress-test. > >> > > > > >> > > > --- dmesg messages --- > >> > > > 07-13 08:41:51.341 > >> > > > <4>[309805.658142s][pid:1361,cpu5,sManagerService]sManagerService: > >> > > > page allocation failure: order:2, mode:0x2000d1 > >> > > > >> > > Yes but this is __GFP_DMA allocation. I guess you have already > >> > > reported > >> > > this failure and you've been told that this is quite unexpected > >> > > for the > >> > > kernel stack allocation. It is your out-of-tree patch which just > >> > > makes > >> > > things worse because DMA restricted allocations are considered > >> > > "lowmem" > >> > > and so they do not invoke OOM killer and do not retry like regular > >> > > GFP_KERNEL allocations. > >> > > >>> > >> > > >>> Hi Michal, > >> > > >>> > >> > > >>> Yes, we add GFP_DMA, but I don't think this is the key for the > >> > > >>> problem. > >> > > >> > >> > > >> You are restricting the allocation request to a single zone which is > >> > > >> definitely not good. Look at how many larger order pages are > >> > > >> available > >> > > >> in the Normal zone. > >> > > >> > >> > > >>> If we do oom-killer, maybe we will get a large block later, but > >> > > >>> there > >> > > >>> is enough free memory before oom(although most of them are > >> > > >>> fragments). > >> > > >> > >> > > >> Killing a task is of course the last resort action. It would give > >> > > >> you > >> > > >> larger order blocks used for the victims thread. > >> > > >> > >> > > >>> I wonder if we can alloc success without kill any process in this > >> > > >>> situation. > >> > > >> > >> > > >> Sure it would be preferable to compact that memory but that might be > >> > > >> hard with your restriction in place. 
Consider that DMA zone would > >> > > >> tend > >> > > >> to be less movable than normal zones as users would have to pin it > >> > > >> for > >> > > >> DMA. Your DMA is really large so this might turn out to just happen > >> > > >> to > >> > > >> work but note that the primary problem here is that you put a zone > >> > > >> restriction for your allocations. > >> > > >> > >> > > >>> Maybe use vmalloc is a good way, but I don't know the influence. > >> > > >> > >> > > >> You can have a look at vmalloc patches posted by Andy. They are not > >> > > >> that > >> > > >> trivial. > >> > > >> > >> > > > > >> > > > Hi Michal, > >> > > > > >> > > > Thank you for your comment, could you give me the link? > >> > > > > >> > > > >> > > I've been keeping it mostly up to date in this branch: > >> > > > >> > >
Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer
[added Simon Glass to CC in case there's some input from u-boot] On Thu, Apr 28, 2016 at 05:09:04PM -0700, Kees Cook wrote: > From: Yinghai Lu > > This patch adds BP_init_size (which is the INIT_SIZE as passed in from > the boot_params) into asm-offsets.c to make it visible to the assembly > code. Then when moving the ZO, it calculates the starting position of > the copied ZO (via BP_init_size and the ZO run size) so that the VO__end > will be at the end of the decompression buffer. To make the position > calculation safe, the end of ZO is page aligned (and a comment is added > to the existing VO alignment for good measure). > diff --git a/arch/x86/boot/compressed/head_64.S > b/arch/x86/boot/compressed/head_64.S > index d43c30ed89ed..09cdc0c3ee7e 100644 > --- a/arch/x86/boot/compressed/head_64.S > +++ b/arch/x86/boot/compressed/head_64.S > @@ -338,7 +340,9 @@ preferred_addr: > 1: > > /* Target address to relocate to for decompression */ > - leaq z_extract_offset(%rbp), %rbx > + movl BP_init_size(%rsi), %ebx > + subl $_end, %ebx > + addq %rbp, %rbx > > /* Set up the stack */ > leaq boot_stack_end(%rbx), %rsp This appears to break booting on the Intel Edison platform, which uses u-boot as its bootloader. u-boot does not copy the init_size parameter when booting a bzImage: it copies a fixed-size setup_header [1], and its definition of setup_header doesn't include the parameters beyond setup_data [2]. With a zero value for init_size, this calculates a %rsp value of 0x101ff9600. This causes the boot process to hard-stop at the immediately-following pushq, as this platform has no usable physical addresses above 4G. What are the options for getting this type of platform to function again? For now, kexec from a working Linux system does seem to be a work-around, but there appears to be other x86 hardware using u-boot: the chromium.org folks seem to be maintaining the u-boot x86 tree.
[1] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/lib/zimage.c;h=1b33c771391f49ffe82864ff1582bdfd07e5e97d;hb=HEAD#l156 [2] http://git.denx.de/?p=u-boot.git;a=blob;f=arch/x86/include/asm/bootparam.h;h=140095117e5a2daef0a097c55f0ed10e08acc781;hb=HEAD#l24
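To see why a zero init_size pushes the stack above 4G, the three new instructions can be modeled in C. `movl` and `subl` operate on the 32-bit %ebx, and on x86-64 a 32-bit register write zero-extends into the full %rbx, so the subtraction wraps modulo 2^32 before %rbp is added as a 64-bit value. This is a minimal sketch of that arithmetic only. `relocation_target` and `below_4g` are illustrative names, and the load address and `_end` values in the checks are made up for illustration; they are not taken from the Edison boot and do not reproduce 0x101ff9600.

```c
#include <assert.h>
#include <stdint.h>

/* Model of:
 *   movl BP_init_size(%rsi), %ebx
 *   subl $_end, %ebx
 *   addq %rbp, %rbx
 * A 32-bit write to %ebx zero-extends into %rbx, so the subtraction is
 * performed modulo 2^32 and the result is then added as a 64-bit value. */
static uint64_t relocation_target(uint32_t init_size, uint32_t zo_end, uint64_t rbp)
{
    uint32_t ebx = init_size;    /* movl: init_size from boot_params */
    ebx -= zo_end;               /* subl: wraps when init_size < _end */
    return (uint64_t)ebx + rbp;  /* addq: zero-extended %rbx + load address */
}

/* True if the address is reachable on a board with no RAM above 4 GiB. */
static int below_4g(uint64_t addr)
{
    return addr < (UINT64_C(1) << 32);
}
```

With a hypothetical `_end` of 0x800000 and load address 0x1000000, init_size = 0 yields 0x100800000, just past 4 GiB, while a plausible init_size of 0x2000000 yields 0x2800000. Any init_size smaller than `_end`, including the zero left by a bootloader that never fills the field, wraps the same way.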
Re: [PATCH 4.7 00/41] 4.7.1-stable review
On 08/14/2016 02:38 PM, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.7.1 release. > There are 41 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Tue Aug 16 20:25:22 UTC 2016. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.7.1-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.7.y > and the diffstat can be found below. > > thanks, > > greg k-h > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah -- Shuah Khan Sr. Linux Kernel Developer Open Source Innovation Group Samsung Research America(Silicon Valley) shuah...@samsung.com
Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands
Shaun, > On Aug 14, 2016, at 09:09, Shaun Tancheff wrote: […] >>> >> No, surely not. >> But one of the _big_ advantages for the RB tree is blkdev_discard(). >> Without the RB tree any mkfs program will issue a 'discard' for every >> sector. We will be able to coalesce those into one discard per zone, but >> we still need to issue one for _every_ zone. > > How can you make coalesce work transparently in the > sd layer _without_ keeping some sort of a discard cache along > with the zone cache? > > Currently the block layer's blkdev_issue_discard() is breaking > large discard's into nice granular and aligned chunks but it is > not preventing small discards nor coalescing them. > > In the sd layer would there be way to persist or purge an > overly large discard cache? What about honoring > discard_zeroes_data? Once the discard is completed with > discard_zeroes_data you have to return zeroes whenever > a discarded sector is read. Isn't that a lot more than just > tracking a write pointer? Couldn't a zone have dozens of holes? My understanding of the standards regarding discard is that it is not mandatory and that it is a hint to the drive. The drive can completely ignore it if it thinks that is a better choice. I may be wrong on this, though; I need to check again. For reset write pointer, the mapping to discard requires that the calls to blkdev_issue_discard be zone aligned for anything to happen. Specify less than a zone and nothing will be done. This, I think, preserves the discard semantics. As for the “discard_zeroes_data” thing, I also think that is an optional drive feature, not a mandatory one. Drives may have it or not, which is consistent with the ZBC/ZAC standards regarding reading after the write pointer (nothing says that zeros have to be returned). In any case, discard of CMR zones will be a nop, so for SMR drives, discard_zeroes_data=0 may be a better choice. > >> Which is (as indicated) really slow, and easily takes several minutes.
>> With the RB tree we can short-circuit discards to empty zones, and speed >> up processing time dramatically. >> Sure we could be moving the logic into mkfs and friends, but that would >> require us to change the programs and agree on a library (libzbc?) which >> should be handling that. > > F2FS's mkfs.f2fs is already reading the zone topology via SG_IO ... > so I'm not sure your argument is valid here. This initial SMR support patch is just that: a first try. Jaegeuk used SG_IO (in fact, code copied from parts of libzbc) because the current ZBC patch-set has no ioctl API for zone information manipulation. We will fix mkfs.f2fs once we agree on an ioctl interface. > > [..] > 3) Try to condense the blkzone data structure to save memory: I think that we can at the very least remove the zone length, and also may be the per zone spinlock too (a single spinlock and proper state flags can be used). >>> >>> I have a variant that is an array of descriptors that roughly mimics the >>> api from blk-zoned.c that I did a few months ago as an example. >>> I should be able to update that to the current kernel + patches. >>> >> Okay. If we restrict the in-kernel SMR drive handling to devices with >> identical zone sizes of course we can remove the zone length. >> And we can do away with the per-zone spinlock and use a global one instead. > > I don't think dropping the zone length is a reasonable thing to do. > > What I propose is an array of _descriptors_ it doesn't drop _any_ > of the zone information that you are holding in an RB tree, it is > just a condensed format that _mostly_ plugs into your existing > API. I do not agree. The Seagate drive already has one zone (the last one) that is not the same length as the other zones. Sure, since it is the last one, we can add “if (last zone)” all over the place and make it work. But that is really ugly. Keeping the length field keeps the code generic and consistent with the standard, which has no restriction on the zone sizes.
We could do some memory optimisation using different types of blk_zone structs, the types mapping to the value of the SAME field: drives with a constant zone size can use a blk_zone type without the length field, and others use a different type that includes the field. Accessor functions can hide the different types in the zone manipulation code. Best regards. -- Damien Le Moal, Ph.D. Sr. Manager, System Software Group, HGST Research, HGST, a Western Digital brand damien.lem...@hgst.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.hgst.com
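Two points from this reply fit in one small sketch: a discard only becomes a write-pointer reset for zones it covers completely, and a compact layout for same-size drives can sit behind accessor functions. Everything below is hypothetical and written as plain userspace C. `struct zone_layout`, `zone_start`, `zone_len` and `zones_to_reset` are illustrative names, not the blk_zone API from the patch set, and sectors are modeled as `uint64_t`. The uniform layout stores one common length and derives a shorter last zone from the capacity, so that special case lives inside one accessor rather than in every caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical compact zone layout. If common_len != 0 every zone has that
 * length except possibly the last, which is clipped to the capacity.
 * Otherwise per-zone starts[] and lens[] arrays are used. */
struct zone_layout {
    size_t nr_zones;
    sector_t capacity;       /* total sectors on the device */
    sector_t common_len;     /* 0 means "use the per-zone arrays" */
    const sector_t *starts;  /* only used when common_len == 0 */
    const sector_t *lens;    /* only used when common_len == 0 */
};

/* Accessor: first sector of zone i, whichever layout is in use. */
static sector_t zone_start(const struct zone_layout *zl, size_t i)
{
    assert(i < zl->nr_zones);
    return zl->common_len ? (sector_t)i * zl->common_len : zl->starts[i];
}

/* Accessor: length of zone i; hides the shorter last zone. */
static sector_t zone_len(const struct zone_layout *zl, size_t i)
{
    assert(i < zl->nr_zones);
    if (zl->common_len == 0)
        return zl->lens[i];
    sector_t rem = zl->capacity - zone_start(zl, i);
    return rem < zl->common_len ? rem : zl->common_len;
}

/* Zones fully covered by the discard range [sector, sector + nr).
 * Partially covered zones are skipped, so a sub-zone discard is a no-op.
 * Writes up to out_max indices to out[] and returns the total count. */
static size_t zones_to_reset(const struct zone_layout *zl, sector_t sector,
                             sector_t nr, size_t *out, size_t out_max)
{
    sector_t end = sector + nr;
    size_t n = 0;

    for (size_t i = 0; i < zl->nr_zones; i++) {
        sector_t zs = zone_start(zl, i);
        sector_t ze = zs + zone_len(zl, i);
        if (zs >= sector && ze <= end) {
            if (n < out_max)
                out[n] = i;
            n++;
        }
    }
    return n;
}
```

A linear scan keeps the sketch short; with the RB tree or a sorted descriptor array discussed in the thread, the first candidate zone could be found by lookup instead.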
Re: [PATCH 4.6 00/56] 4.6.7-stable review
On 08/14/2016 02:37 PM, Greg Kroah-Hartman wrote: > * > NOTE > This is the LAST 4.6.y kernel that will be released. After this > release, it is end-of-life. You should be moving on to 4.7.y at this > point in time. You have been warned. > * > > This is the start of the stable review cycle for the 4.6.7 release. > There are 56 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Tue Aug 16 20:24:52 UTC 2016. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.6.7-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.6.y > and the diffstat can be found below. > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah -- Shuah Khan Sr. Linux Kernel Developer Open Source Innovation Group Samsung Research America(Silicon Valley) shuah...@samsung.com
Re: [PATCH 4.4 00/49] 4.4.18-stable review
On 08/14/2016 02:23 PM, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.4.18 release. > There are 49 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Tue Aug 16 20:22:43 UTC 2016. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.18-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.4.y > and the diffstat can be found below. > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah -- Shuah Khan Sr. Linux Kernel Developer Open Source Innovation Group Samsung Research America(Silicon Valley) shuah...@samsung.com
Re: [PATCH 3.14 00/29] 3.14.76-stable review
On 08/14/2016 02:07 PM, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 3.14.76 release. > There are 29 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Tue Aug 16 20:07:18 UTC 2016. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.14.76-rc1.gz > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-3.14.y > and the diffstat can be found below. > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah -- Shuah Khan Sr. Linux Kernel Developer Open Source Innovation Group Samsung Research America(Silicon Valley) shuah...@samsung.com
linux-next: Tree for Aug 16
Hi all, Changes since 20160815: Non-merge commits (relative to Linus' tree): 2149 2086 files changed, 87009 insertions(+), 30762 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc and sparc64 defconfig. Below is a summary of the state of the merge. I am currently merging 241 trees (counting Linus' and 35 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to adding more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (3684b03d8e9a Merge tag 'iommu-fixes-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu) Merging fixes/master (d3396e1e4ec4 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging kbuild-current/rc-fixes (b36fad65d61f kbuild: Initialize exported variables) Merging arc-current/for-curr (45c3b08a117e ARC: Elide redundant setup of DMA callbacks) Merging arm-current/fixes (87eed3c74d7c ARM: fix address limit restoration for undefined instructions) Merging m68k-current/for-linus (6bd80f372371 m68k/defconfig: Update defconfigs for v4.7-rc2) Merging metag-fixes/fixes (97b1d23f7bcb metag: Drop show_mem() from mem_init()) Merging powerpc-fixes/fixes (ca49e64f0cb1 selftests/powerpc: Specify we expect to build with std=gnu99) Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2) Merging sparc/master (4620a06e4b3c shmem: Fix link error if huge pages support is disabled) Merging net/master (d2fbdf76b85b tipc: fix NULL pointer dereference in shutdown()) Merging ipsec/master (1625f4529957 net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key) Merging netfilter/master (4b5b9ba553f9 openvswitch: do not ignore netdev errors when creating tunnel vports) Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes') Merging wireless-drivers/master (034fdd4a17ff Merge ath-current from ath.git) Merging mac80211/master (4d0bd46a4d55 Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel") Merging sound-current/for-linus (a52ff34e5ec6 ALSA: hda - Manage power well properly for resume) Merging pci-current/for-linus (8b078c603249 PCI: Update "pci=resource_alignment" documentation) Merging driver-core.current/driver-core-linus (694d0d0bb203 Linux 4.8-rc2) Merging tty.current/tty-linus (29b4817d4018 Linux 4.8-rc1) Merging usb.current/usb-linus (add125054b87 cdc-acm: fix wrong pipe type on rx interrupt 
xfers) Merging usb-gadget-fixes/fixes (a0ad85ae866f usb: dwc3: gadget: stop processing on HWO set) Merging usb-serial-fixes/usb-linus (3b7c7e52efda USB: serial: mos7840: fix non-atomic allocation in write path) Merging usb-chipidea-fixes/ci-for-usb-stable (ea1d39a31d3b usb: common: otg-fsm: add license to usb-otg-fsm) Merging staging.current/staging-linus (99f1c013194e staging/lustre/llite: Close atomic_open race with several openers) Merging char-misc.current/char-misc-linus (7b142d8fd0bd android: binder: fix dangling pointer comparison) Merging input-current/for-linus (22fe874f3803 Input: silead - remove some dead code) Merging crypto-current/master (a0118c8b2be9 crypto: caam - fix non-hmac hashes) Merging ide/master (797cee982eef Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit) Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms vs module insertion race.) Merging vfio-fixes/for-linus (c8952a707556 vfio/pci: Fix NULL pointer oops in error interrupt setup handling) Merging kselftest-fixes/fixes (29b4817d4018 Linux 4.8-rc1) Merging backli
linux-next: Tree for Aug 16
Hi all,

Changes since 20160815:

Non-merge commits (relative to Linus' tree): 2149
 2086 files changed, 87009 insertions(+), 30762 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log
files in the Next directory. Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 241 trees (counting Linus' and 35 trees of
patches pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to
adding more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.
[PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
We have observed on a few x86 machines with an rtc-cmos device that
hpet_rtc_interrupt() is called just after IRQ registration and before
cmos_do_probe() could call hpet_rtc_timer_init(). So neither
hpet_default_delta nor hpet_t1_cmp is initialized by the time the
interrupt is raised in this situation, and this results in an NMI
watchdog LOCKUP. It has only been observed sporadically on kdump
secondary kernels.

See the call trace:
---<-snip->---
[ 27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-342.el7.x86_64 #1
[ 27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
[ 27.919455] 8186a728 59c82488 880034e05af0 81637bd4
[ 27.921870] 880034e05b70 8163144a 0010 880034e05b80
[ 27.924257] 880034e05b20 59c82488
[ 27.926599] Call Trace:
[ 27.927352][] dump_stack+0x19/0x1b
[ 27.929080] [] panic+0xd8/0x1e7
[ 27.930588] [] ? restart_watchdog_hrtimer+0x50/0x50
[ 27.932502] [] watchdog_overflow_callback+0xc2/0xd0
[ 27.934427] [] __perf_event_overflow+0xa1/0x250
[ 27.936232] [] perf_event_overflow+0x14/0x20
[ 27.937957] [] intel_pmu_handle_irq+0x1e8/0x470
[ 27.939799] [] perf_event_nmi_handler+0x2b/0x50
[ 27.941649] [] nmi_handle.isra.0+0x69/0xb0
[ 27.943348] [] do_nmi+0x169/0x340
[ 27.944802] [] end_repeat_nmi+0x1e/0x2e
[ 27.946424] [] ? hpet_rtc_interrupt+0x85/0x380
[ 27.948197] [] ? hpet_rtc_interrupt+0x85/0x380
[ 27.949992] [] ? hpet_rtc_interrupt+0x85/0x380
[ 27.951816] <>[] ? run_timer_softirq+0x43/0x340
[ 27.954114] [] handle_irq_event_percpu+0x3e/0x1e0
[ 27.955962] [] handle_irq_event+0x3d/0x60
[ 27.957635] [] handle_edge_irq+0x77/0x130
[ 27.959332] [] handle_irq+0xbf/0x150
[ 27.960949] [] do_IRQ+0x4f/0xf0
[ 27.962434] [] common_interrupt+0x6d/0x6d
[ 27.964101][] ? _raw_spin_unlock_irqrestore+0x1b/0x40
[ 27.966308] [] __setup_irq+0x2a7/0x570
[ 28.067859] [] ? hpet_cpuhp_notify+0x140/0x140
[ 28.069709] [] request_threaded_irq+0xcc/0x170
[ 28.071585] [] cmos_do_probe+0x1e6/0x450
[ 28.073240] [] ? cmos_do_probe+0x450/0x450
[ 28.074911] [] cmos_pnp_probe+0xbb/0xc0
[ 28.076533] [] pnp_device_probe+0x65/0xd0
[ 28.078198] [] driver_probe_device+0x87/0x390
[ 28.079971] [] __driver_attach+0x93/0xa0
[ 28.081660] [] ? __device_attach+0x40/0x40
[ 28.083662] [] bus_for_each_dev+0x73/0xc0
[ 28.085370] [] driver_attach+0x1e/0x20
[ 28.086974] [] bus_add_driver+0x200/0x2d0
[ 28.088634] [] ? rtc_sysfs_init+0xe/0xe
[ 28.090349] [] driver_register+0x64/0xf0
[ 28.091989] [] pnp_register_driver+0x20/0x30
[ 28.093707] [] cmos_init+0x11/0x71
---<-snip->---

The previous patch split hpet_rtc_timer_init() into
hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable(). This patch
therefore moves hpet_rtc_timer_counter_init() before IRQ registration,
so that such spurious interrupts can be handled gracefully.

Without this patch, we were able to reproduce the problem within at
most 15 trials of kdump secondary kernel boot on an hp-dl160gen8
machine. With this patch applied, more than 35 trials went fine.

Signed-off-by: Pratyush Anand
[dzic...@redhat.com: edited the patch's summary]
Signed-off-by: Don Zickus
---
 drivers/rtc/rtc-cmos.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 43745cac0141..089d987f2638 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
 	return 0;
 }
 
+static inline int hpet_rtc_timer_counter_init(void)
+{
+	return 0;
+}
+
+static inline int hpet_rtc_timer_enable(void)
+{
+	return 0;
+}
+
 static inline int hpet_rtc_timer_init(void)
 {
 	return 0;
@@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 		goto cleanup1;
 	}
 
+	hpet_rtc_timer_counter_init();
 	if (is_valid_irq(rtc_irq)) {
 		irq_handler_t rtc_cmos_int_handler;
 
@@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 			goto cleanup1;
 		}
 	}
-	hpet_rtc_timer_init();
+	hpet_rtc_timer_enable();
 
 	/* export at least the first block of NVRAM */
 	nvram.size = address_space - NVRAM_OFFSET;
-- 
2.5.5
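Why uninitialized counters turn into a hard lockup can be seen in a small user-space model. This is a simplified, hypothetical sketch (the `hpet_model` struct and `model_*` functions are illustrative, not kernel API). It assumes the interrupt path advances the comparator by `hpet_default_delta` until the comparator is ahead of the main counter. If both values are still zero, the comparator never moves, so an unbounded loop would spin in hard-IRQ context until the NMI watchdog fires. The model bounds the loop so the two orderings can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the HPET RTC-emulation state. */
struct hpet_model {
	uint32_t counter;       /* free-running main counter */
	uint32_t default_delta; /* stands in for hpet_default_delta */
	uint32_t t1_cmp;        /* stands in for hpet_t1_cmp */
};

/* Nonzero if c1 is ahead of c2, using wrap-safe 32-bit arithmetic. */
static int cnt_ahead(uint32_t c1, uint32_t c2)
{
	return (int32_t)(c2 - c1) < 0;
}

/*
 * Models the interrupt path: advance the comparator by default_delta
 * until it is ahead of the counter. Returns the number of steps taken,
 * or -1 after max_steps (the modelled loop has no such bound, which is
 * what turns a zero delta into a lockup).
 */
static long model_rtc_interrupt(struct hpet_model *h, long max_steps)
{
	for (long step = 1; step <= max_steps; step++) {
		h->t1_cmp += h->default_delta;
		if (cnt_ahead(h->t1_cmp, h->counter))
			return step;
	}
	return -1;
}

/* Models hpet_rtc_timer_counter_init(): set delta and first deadline. */
static void model_counter_init(struct hpet_model *h, uint32_t delta)
{
	h->default_delta = delta;
	h->t1_cmp = h->counter + delta;
}
```

With the old ordering, the interrupt can arrive while both fields are still zero, and `model_rtc_interrupt()` never gets ahead of a nonzero counter. Calling `model_counter_init()` first, as the patch does with `hpet_rtc_timer_counter_init()`, lets the same interrupt finish in a bounded number of steps even if the counter has moved on.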
[PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init()
We need to be able to initialize the hpet_default_delta and hpet_t1_cmp
counters before the IRQ can be enabled. This patch splits
hpet_rtc_timer_init() into two functions, hpet_rtc_timer_counter_init()
and hpet_rtc_timer_enable(), so that this becomes possible. The next
patch explains the need in detail.

No functional change in this patch.

Signed-off-by: Pratyush Anand
[dzic...@redhat.com: edited the patch's summary]
Signed-off-by: Don Zickus
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c      | 41 +++--
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index cc285ec4b2c1..8eecb31bebcb 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -96,6 +96,8 @@ extern int hpet_set_alarm_time(unsigned char hrs, unsigned char min,
 				unsigned char sec);
 extern int hpet_set_periodic_freq(unsigned long freq);
 extern int hpet_rtc_dropped_irq(void);
+extern int hpet_rtc_timer_counter_init(void);
+extern int hpet_rtc_timer_enable(void);
 extern int hpet_rtc_timer_init(void);
 extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
 extern int hpet_register_irq_handler(rtc_irq_handler handler);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ed16e58658a4..6f6d21059b1b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1074,14 +1074,12 @@ void hpet_unregister_irq_handler(rtc_irq_handler handler)
 EXPORT_SYMBOL_GPL(hpet_unregister_irq_handler);
 
 /*
- * Timer 1 for RTC emulation. We use one shot mode, as periodic mode
- * is not supported by all HPET implementations for timer 1.
- *
- * hpet_rtc_timer_init() is called when the rtc is initialized.
+ * hpet_rtc_timer_counter_init() is called before interrupt can be
+ * registered
  */
-int hpet_rtc_timer_init(void)
+int hpet_rtc_timer_counter_init(void)
 {
-	unsigned int cfg, cnt, delta;
+	unsigned int cnt, delta;
 	unsigned long flags;
 
 	if (!is_hpet_enabled())
@@ -1106,6 +1104,22 @@ int hpet_rtc_timer_init(void)
 	hpet_writel(cnt, HPET_T1_CMP);
 	hpet_t1_cmp = cnt;
 
+	local_irq_restore(flags);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(hpet_rtc_timer_counter_init);
+
+/*
+ * hpet_rtc_timer_enable() is called during RTC initialization
+ */
+int hpet_rtc_timer_enable(void)
+{
+	unsigned int cfg;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
 	cfg = hpet_readl(HPET_T1_CFG);
 	cfg &= ~HPET_TN_PERIODIC;
 	cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
@@ -1115,6 +1129,21 @@ int hpet_rtc_timer_init(void)
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(hpet_rtc_timer_enable);
+
+/*
+ * Timer 1 for RTC emulation. We use one shot mode, as periodic mode
+ * is not supported by all HPET implementations for timer 1.
+ *
+ * hpet_rtc_timer_init() is called when the rtc is initialized.
+ */
+int hpet_rtc_timer_init(void)
+{
+	if (!hpet_rtc_timer_counter_init())
+		return 0;
+
+	return hpet_rtc_timer_enable();
+}
 EXPORT_SYMBOL_GPL(hpet_rtc_timer_init);
 
 static void hpet_disable_rtc_channel(void)
-- 
2.5.5
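The refactoring keeps hpet_rtc_timer_init() as a thin wrapper over the two new halves, which is why the patch can claim no functional change for existing callers. A minimal user-space sketch of that shape follows. The `model_*` names, the struct, and the `MODEL_TN_*` bit values are hypothetical stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative configuration bits for this sketch only. */
#define MODEL_TN_ENABLE   0x004u
#define MODEL_TN_PERIODIC 0x008u
#define MODEL_TN_32BIT    0x100u

struct model_timer {
	int hpet_enabled;       /* stands in for is_hpet_enabled() */
	uint32_t counter;       /* main counter value */
	uint32_t default_delta; /* stands in for hpet_default_delta */
	uint32_t t1_cmp;        /* stands in for hpet_t1_cmp */
	uint32_t cfg;           /* stands in for the timer 1 config register */
};

/* First half: software counters only, safe before any IRQ handler exists. */
static int model_counter_init(struct model_timer *t, uint32_t delta)
{
	if (!t->hpet_enabled)
		return 0;
	t->default_delta = delta;
	t->t1_cmp = t->counter + delta;
	return 1;
}

/* Second half: program the timer for one-shot, 32-bit operation. */
static int model_enable(struct model_timer *t)
{
	t->cfg &= ~MODEL_TN_PERIODIC;
	t->cfg |= MODEL_TN_ENABLE | MODEL_TN_32BIT;
	return 1;
}

/* Old entry point kept as a wrapper, so existing callers see no change. */
static int model_timer_init(struct model_timer *t, uint32_t delta)
{
	if (!model_counter_init(t, delta))
		return 0;
	return model_enable(t);
}
```

A caller that needs the ordering from the second patch would call `model_counter_init()`, register its interrupt handler, and only then call `model_enable()`; every other caller keeps using the wrapper.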
[PATCH 6/7] dax: define a unified inode/address_space for device-dax mappings
In support of enabling resize / truncate of device-dax instances, define
a pseudo-fs to provide a unified inode/address space for vm operations.

Cc: Al Viro
Signed-off-by: Dan Williams
---
 drivers/dax/dax.c          | 150 +++-
 fs/char_dev.c              |   1
 include/uapi/linux/magic.h |   1
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 17715773c097..e8b9319aeadb 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -13,7 +13,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -26,6 +28,9 @@ static struct class *dax_class;
 static DEFINE_IDA(dax_minor_ida);
 static int nr_dax = CONFIG_NR_DEV_DAX;
 module_param(nr_dax, int, S_IRUGO);
+static struct vfsmount *dax_mnt;
+static struct kmem_cache *dax_cache __read_mostly;
+static struct super_block *dax_superblock __read_mostly;
 MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
 
 /**
@@ -61,6 +66,7 @@ struct dax_region {
  */
 struct dax_dev {
 	struct dax_region *region;
+	struct inode *inode;
 	struct device dev;
 	struct cdev cdev;
 	bool alive;
@@ -69,6 +75,117 @@ struct dax_dev {
 	struct resource res[0];
 };
 
+static struct inode *dax_alloc_inode(struct super_block *sb)
+{
+	return kmem_cache_alloc(dax_cache, GFP_KERNEL);
+}
+
+static void dax_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+
+	kmem_cache_free(dax_cache, inode);
+}
+
+static void dax_destroy_inode(struct inode *inode)
+{
+	call_rcu(&inode->i_rcu, dax_i_callback);
+}
+
+static const struct super_operations dax_sops = {
+	.statfs = simple_statfs,
+	.alloc_inode = dax_alloc_inode,
+	.destroy_inode = dax_destroy_inode,
+	.drop_inode = generic_delete_inode,
+};
+
+static struct dentry *dax_mount(struct file_system_type *fs_type,
+		int flags, const char *dev_name, void *data)
+{
+	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
+}
+
+static struct file_system_type dax_type = {
+	.name = "dax",
+	.mount = dax_mount,
+	.kill_sb = kill_anon_super,
+};
+
+static int dax_test(struct inode *inode, void *data)
+{
+	return inode->i_cdev == data;
+}
+
+static int dax_set(struct inode *inode, void *data)
+{
+	inode->i_cdev = data;
+	return 0;
+}
+
+static struct inode *dax_inode_get(struct cdev *cdev, dev_t devt)
+{
+	struct inode *inode;
+
+	inode = iget5_locked(dax_superblock, hash_32(devt + DAXFS_MAGIC, 31),
+			dax_test, dax_set, cdev);
+
+	if (!inode)
+		return NULL;
+
+	if (inode->i_state & I_NEW) {
+		inode->i_mode = S_IFCHR;
+		inode->i_flags = S_DAX;
+		inode->i_rdev = devt;
+		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+		unlock_new_inode(inode);
+	}
+	return inode;
+}
+
+static void init_once(void *inode)
+{
+	inode_init_once(inode);
+}
+
+static int dax_inode_init(void)
+{
+	int rc;
+
+	dax_cache = kmem_cache_create("dax_cache", sizeof(struct inode), 0,
+			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+			init_once);
+	if (!dax_cache)
+		return -ENOMEM;
+
+	rc = register_filesystem(&dax_type);
+	if (rc)
+		goto err_register_fs;
+
+	dax_mnt = kern_mount(&dax_type);
+	if (IS_ERR(dax_mnt)) {
+		rc = PTR_ERR(dax_mnt);
+		goto err_mount;
+	}
+	dax_superblock = dax_mnt->mnt_sb;
+
+	return 0;
+
+ err_mount:
+	unregister_filesystem(&dax_type);
+ err_register_fs:
+	kmem_cache_destroy(dax_cache);
+
+	return rc;
+}
+
+static void dax_inode_exit(void)
+{
+	kern_unmount(dax_mnt);
+	unregister_filesystem(&dax_type);
+	kmem_cache_destroy(dax_cache);
+}
+
 static void dax_region_free(struct kref *kref)
 {
 	struct dax_region *dax_region;
@@ -379,6 +496,9 @@ static int dax_open(struct inode *inode, struct file *filp)
 
 	dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
 	dev_dbg(&dax_dev->dev, "%s\n", __func__);
+	inode->i_mapping = dax_dev->inode->i_mapping;
+	inode->i_mapping->host = dax_dev->inode;
+	filp->f_mapping = inode->i_mapping;
 	filp->private_data = dax_dev;
 	inode->i_flags = S_DAX;
 
@@ -410,6 +530,7 @@ static void dax_dev_release(struct device *dev)
 	ida_simple_remove(&dax_region->ida, dax_dev->id);
 	ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
 	dax_region_put(dax_region);
+	iput(dax_dev->inode);
 	kfree(dax_dev);
 }
 
@@ -459,6 +580,12 @@ int devm_create_dax_dev(struct dax_region *dax_region, struct resource *res,
 		goto err_minor;
 	}
+
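dax_inode_get() relies on iget5_locked() to find or create exactly one inode per character device: the hash narrows the search, dax_test() confirms a match by comparing `i_cdev`, and dax_set() stamps a freshly allocated inode. The user-space sketch below mimics that find-or-create contract with a tiny fixed-size table. `model_iget5()`, `struct model_inode`, and the table size are hypothetical; locking, inode reference counting, and eviction are deliberately left out, and clearing `is_new` stands in for unlock_new_inode().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_INODES 16

/* Hypothetical stand-in for struct inode: only what the lookup needs. */
struct model_inode {
	int in_use;
	unsigned long hashval;
	void *i_cdev;
	int is_new; /* models the I_NEW state bit */
};

static struct model_inode model_table[MODEL_MAX_INODES];

typedef int (*test_fn)(struct model_inode *inode, void *data);
typedef int (*set_fn)(struct model_inode *inode, void *data);

/* Same roles as dax_test() and dax_set() in the patch. */
static int model_test(struct model_inode *inode, void *data)
{
	return inode->i_cdev == data;
}

static int model_set(struct model_inode *inode, void *data)
{
	inode->i_cdev = data;
	return 0;
}

/*
 * Return an in-use inode whose hash matches and whose test() accepts
 * data; otherwise claim a free slot, run set() on it and mark it new.
 * Returns NULL if the table is full or set() fails.
 */
static struct model_inode *model_iget5(unsigned long hashval,
				       test_fn test, set_fn set, void *data)
{
	struct model_inode *free_slot = NULL;

	for (size_t i = 0; i < MODEL_MAX_INODES; i++) {
		struct model_inode *inode = &model_table[i];

		if (!inode->in_use) {
			if (!free_slot)
				free_slot = inode;
			continue;
		}
		if (inode->hashval == hashval && test(inode, data))
			return inode;
	}
	if (!free_slot || set(free_slot, data))
		return NULL;
	free_slot->in_use = 1;
	free_slot->hashval = hashval;
	free_slot->is_new = 1;
	return free_slot;
}
```

In this model, two opens of the same device land on the same inode, and therefore on the same address space, while a hash collision between two different devices is resolved by the test callback.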
[PATCH 5/7] dax: convert to the cdev api
A goal of the device-DAX interface is to be able to support many
exclusive allocations (partitions) of performance / feature
differentiated memory. This count may exceed the default minors limit
of 256.

As a result of switching to an embedded cdev, the inode-to-dax_dev
conversion is simplified, and so is reference counting, which can now
follow the cdev kobject lifetime.

Cc: Al Viro
Signed-off-by: Dan Williams
---
 drivers/dax/Kconfig |  5 +++
 drivers/dax/dax.c   | 82 ++-
 2 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index cedab7572de3..daadd20aa936 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -23,4 +23,9 @@ config DEV_DAX_PMEM
 
 	  Say Y if unsure
 
+config NR_DEV_DAX
+	int "Maximum number of Device-DAX instances"
+	default 32768
+	range 256 2147483647
+
 endif
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 181d2a5a21e4..17715773c097 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -14,15 +14,19 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include "dax.h"
 
-static int dax_major;
+static dev_t dax_devt;
 static struct class *dax_class;
 static DEFINE_IDA(dax_minor_ida);
+static int nr_dax = CONFIG_NR_DEV_DAX;
+module_param(nr_dax, int, S_IRUGO);
+MODULE_PARM_DESC(nr_dax, "max number of device-dax instances");
 
 /**
  * struct dax_region - mapping infrastructure for dax devices
@@ -49,6 +53,7 @@ struct dax_region {
  * struct dax_dev - subdivision of a dax region
  * @region - parent region
  * @dev - device backing the character device
+ * @cdev - core chardev data
  * @alive - !alive + rcu grace period == no new mappings can be established
  * @id - child id in the region
  * @num_resources - number of physical address extents in this device
@@ -57,6 +62,7 @@ struct dax_region {
 struct dax_dev {
 	struct dax_region *region;
 	struct device dev;
+	struct cdev cdev;
 	bool alive;
 	int id;
 	int num_resources;
@@ -367,29 +373,12 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
 	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
 }
 
-static int __match_devt(struct device *dev, const void *data)
-{
-	const dev_t *devt = data;
-
-	return dev->devt == *devt;
-}
-
-static struct device *dax_dev_find(dev_t dev_t)
-{
-	return class_find_device(dax_class, NULL, &dev_t, __match_devt);
-}
-
 static int dax_open(struct inode *inode, struct file *filp)
 {
-	struct dax_dev *dax_dev = NULL;
-	struct device *dev;
-
-	dev = dax_dev_find(inode->i_rdev);
-	if (!dev)
-		return -ENXIO;
+	struct dax_dev *dax_dev;
 
-	dax_dev = to_dax_dev(dev);
-	dev_dbg(dev, "%s\n", __func__);
+	dax_dev = container_of(inode->i_cdev, struct dax_dev, cdev);
+	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 	filp->private_data = dax_dev;
 	inode->i_flags = S_DAX;
 
@@ -399,11 +388,8 @@ static int dax_open(struct inode *inode, struct file *filp)
 static int dax_release(struct inode *inode, struct file *filp)
 {
 	struct dax_dev *dax_dev = filp->private_data;
-	struct device *dev = &dax_dev->dev;
-
-	dev_dbg(dev, "%s\n", __func__);
 
-	put_device(dev);
+	dev_dbg(&dax_dev->dev, "%s\n", __func__);
 	return 0;
 }
 
@@ -430,6 +416,7 @@ static void dax_dev_release(struct device *dev)
 static void unregister_dax_dev(void *dev)
 {
 	struct dax_dev *dax_dev = to_dax_dev(dev);
+	struct cdev *cdev = &dax_dev->cdev;
 
 	dev_dbg(dev, "%s\n", __func__);
 
@@ -442,6 +429,7 @@ static void unregister_dax_dev(void *dev)
 	 */
 	dax_dev->alive = false;
 	synchronize_rcu();
+	cdev_del(cdev);
 	device_unregister(dev);
 }
 
@@ -451,17 +439,13 @@ int devm_create_dax_dev(struct dax_region *dax_region, struct resource *res,
 	struct device *parent = dax_region->dev;
 	struct dax_dev *dax_dev;
 	struct device *dev;
+	struct cdev *cdev;
 	int rc, minor;
 	dev_t dev_t;
 
 	dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
 	if (!dax_dev)
 		return -ENOMEM;
-	memcpy(dax_dev->res, res, sizeof(*res) * count);
-	dax_dev->num_resources = count;
-	dax_dev->alive = true;
-	dax_dev->region = dax_region;
-	kref_get(&dax_region->kref);
 
 	dax_dev->id = ida_simple_get(&dax_region->ida, 0, 0, GFP_KERNEL);
 	if (dax_dev->id < 0) {
@@ -475,10 +459,26 @@ int devm_create_dax_dev(struct dax_region *dax_region, struct resource *res,
 		goto err_minor;
 	}
 
-	dev_t = MKDEV(dax_major, minor);
-
+	/* device_initialize() so cdev can reference kobj parent */
+	dev_t = MKDEV(MAJOR(dax_devt), minor);
 	dev = &dax_dev->dev;
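Embedding the struct cdev is what lets dax_open() drop the class-wide device search: given the `inode->i_cdev` pointer, container_of() recovers the enclosing struct dax_dev by pointer arithmetic alone. The user-space sketch below re-creates that pattern with `offsetof()`. The `MODEL_*` dev_t macros assume the kernel-internal split of 20 minor bits (MINORBITS in include/linux/kdev_t.h), and all struct layouts here are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of container_of() (simplified, no type check). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed kernel-internal dev_t split: 20 bits of minor number. */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1u << MODEL_MINORBITS) - 1)
#define MODEL_MAJOR(dev) ((unsigned int)((dev) >> MODEL_MINORBITS))
#define MODEL_MINOR(dev) ((unsigned int)((dev) & MODEL_MINORMASK))
#define MODEL_MKDEV(ma, mi) (((ma) << MODEL_MINORBITS) | (mi))

/* Simplified stand-ins for the kernel structures. */
struct model_cdev {
	unsigned int dev; /* dev_t covered by this cdev */
};

struct model_dax_dev {
	int id;
	struct model_cdev cdev; /* embedded, not pointed to */
};

/* Mirrors the new dax_open() lookup: no search, just pointer arithmetic. */
static struct model_dax_dev *model_to_dax_dev(struct model_cdev *i_cdev)
{
	return container_of(i_cdev, struct model_dax_dev, cdev);
}
```

The dev_t handling mirrors `dev_t = MKDEV(MAJOR(dax_devt), minor)` in the patch: the major comes from the base `dax_devt`, and each instance contributes its own minor allocated from `dax_minor_ida`.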
[PATCH 4/7] dax: embed a struct device in dax_dev
The kref in dax_dev can be made redundant if the final put_device() on the device associated with the dax_dev frees the dax_dev. This can be accomplished by embedding a struct device in struct dax_dev, open coding device_create() and specifying a custom release method.

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c |  130 ++---
 1 file changed, 45 insertions(+), 85 deletions(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 994dfa507dfb..181d2a5a21e4 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -49,7 +49,6 @@ struct dax_region {
  * struct dax_dev - subdivision of a dax region
  * @region - parent region
  * @dev - device backing the character device
- * @kref - enable this data to be tracked in filp->private_data
  * @alive - !alive + rcu grace period == no new mappings can be established
  * @id - child id in the region
  * @num_resources - number of physical address extents in this device
@@ -57,8 +56,7 @@ struct dax_region {
  */
 struct dax_dev {
 	struct dax_region *region;
-	struct device *dev;
-	struct kref kref;
+	struct device dev;
 	bool alive;
 	int id;
 	int num_resources;
@@ -79,20 +77,6 @@ void dax_region_put(struct dax_region *dax_region)
 }
 EXPORT_SYMBOL_GPL(dax_region_put);
 
-static void dax_dev_free(struct kref *kref)
-{
-	struct dax_dev *dax_dev;
-
-	dax_dev = container_of(kref, struct dax_dev, kref);
-	dax_region_put(dax_dev->region);
-	kfree(dax_dev);
-}
-
-static void dax_dev_put(struct dax_dev *dax_dev)
-{
-	kref_put(&dax_dev->kref, dax_dev_free);
-}
-
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct resource *res, unsigned int align, void *addr,
 		unsigned long pfn_flags)
@@ -117,10 +101,15 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
+static struct dax_dev *to_dax_dev(struct device *dev)
+{
+	return container_of(dev, struct dax_dev, dev);
+}
+
 static ssize_t size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
-	struct dax_dev *dax_dev = dev_get_drvdata(dev);
+	struct dax_dev *dax_dev = to_dax_dev(dev);
 	unsigned long long size = 0;
 	int i;
 
@@ -149,7 +138,7 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 		const char *func)
 {
 	struct dax_region *dax_region = dax_dev->region;
-	struct device *dev = dax_dev->dev;
+	struct device *dev = &dax_dev->dev;
 	unsigned long mask;
 
 	if (!dax_dev->alive)
@@ -214,7 +203,7 @@ static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 		struct vm_fault *vmf)
 {
 	unsigned long vaddr = (unsigned long) vmf->virtual_address;
-	struct device *dev = dax_dev->dev;
+	struct device *dev = &dax_dev->dev;
 	struct dax_region *dax_region;
 	int rc = VM_FAULT_SIGBUS;
 	phys_addr_t phys;
@@ -254,7 +243,7 @@ static int dax_dev_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
-	dev_dbg(dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
+	dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
 			current->comm, (vmf->flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
 	rcu_read_lock();
@@ -269,7 +258,7 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev,
 		unsigned int flags)
 {
 	unsigned long pmd_addr = addr & PMD_MASK;
-	struct device *dev = dax_dev->dev;
+	struct device *dev = &dax_dev->dev;
 	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pgoff_t pgoff;
@@ -311,7 +300,7 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 	struct file *filp = vma->vm_file;
 	struct dax_dev *dax_dev = filp->private_data;
 
-	dev_dbg(dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
+	dev_dbg(&dax_dev->dev, "%s: %s: %s (%#lx - %#lx)\n", __func__,
 			current->comm, (flags & FAULT_FLAG_WRITE)
 			? "write" : "read", vma->vm_start, vma->vm_end);
 
@@ -322,29 +311,9 @@ static int dax_dev_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 	return rc;
 }
 
-static void dax_dev_vm_open(struct vm_area_struct *vma)
-{
-	struct file *filp = vma->vm_file;
-	struct dax_dev *dax_dev = filp->private_data;
-
-	dev_dbg(dax_dev->dev, "%s\n", __func__);
-	kref_get(&dax_dev->kref);
-}
-
-static void dax_dev_vm_close(struct vm_area_struct *vma)
-{
-	struct file *filp = vma->vm_file;
-	struct dax_dev *dax_dev = filp->private_data;
-
-	dev_dbg(dax_dev->dev, "%s\n", __func__);
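Patch 4/7 drops the separate kref by letting the embedded struct device own the lifetime: the final put_device() invokes a custom release method, which uses container_of() to recover and free the enclosing dax_dev. The following is a minimal userspace model of that pattern, not driver-core code. `struct device` here is a hypothetical two-field stand-in, and `model_get_device()`, `model_put_device()` and `model_create_dax_dev()` are illustrative substitutes for the real get_device()/put_device()/device_initialize() path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace container_of(): step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device: a refcount plus a release hook. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

/* Simplified dax_dev: the device is embedded, not pointed to. */
struct dax_dev {
	int id;
	struct device dev;
};

static int dax_devs_freed; /* lets a caller observe when release ran */

static struct dax_dev *to_dax_dev(struct device *dev)
{
	return container_of(dev, struct dax_dev, dev);
}

/* Custom release method: dropping the last reference frees the whole dax_dev. */
static void dax_dev_release(struct device *dev)
{
	free(to_dax_dev(dev));
	dax_devs_freed++;
}

/* Illustrative substitutes for get_device()/put_device(). */
static void model_get_device(struct device *dev)
{
	dev->refcount++;
}

static void model_put_device(struct device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev); /* dev is freed here; do not touch it again */
}

static struct dax_dev *model_create_dax_dev(int id)
{
	struct dax_dev *dax_dev = calloc(1, sizeof(*dax_dev));

	if (!dax_dev)
		return NULL;
	dax_dev->id = id;
	dax_dev->dev.refcount = 1; /* initial reference, as after device_initialize() */
	dax_dev->dev.release = dax_dev_release;
	return dax_dev;
}
```

In this model, a user that needs the object to stay alive (for example an open file) takes a reference, and whichever put drops the count to zero runs the release method. There is no second counter to keep in sync with the device's own.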
[RFC PATCH] mmc: dw_mmc: avoid race condition of cpu and IDMAC
Testing shows an obvious race condition: the write the IDMAC issues to clear the OWN bit of a descriptor can land after the CPU has already reconfigured that same descriptor. Because this asynchronous IDMAC write clears the OWN bit of the new configuration, the IDMAC ends up in the SUSPEND state.

The bug is very easy to reproduce on RK3288 or similar SoCs by lowering the bus bandwidth and aggravating the QoS so that a large number of IP blocks fight for priority.

One possible alternative would be to allocate a double buffer for the descriptors, but the two could still race with each other in theory.

Signed-off-by: Shawn Lin
---
 drivers/mmc/host/dw_mmc.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 32380d5..7b01fab 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -490,6 +490,23 @@ static void dw_mci_translate_sglist(struct dw_mci *host, struct mmc_data *data,
 				length -= desc_len;
 
 				/*
+				 * The IDMAC should clear the OWN bit after
+				 * finishing the transfer. Wait for that
+				 * asynchronous IDMAC write to complete so
+				 * that we do not rely on the ordering that
+				 * the bus QoS and the architecture provide.
+				 * Otherwise we could see a race condition
+				 * here: the earlier IDMAC write (clearing
+				 * the OWN bit) may land right after the new
+				 * configuration of the desc, so the value of
+				 * the desc gets overwritten, leading to the
+				 * DMA_SUSPEND state as the IDMAC then fetches
+				 * the wrong desc.
+				 */
+				while ((readl(&desc->des0) & IDMAC_DES0_OWN))
+					;
+
+				/*
 				 * Set the OWN bit and disable interrupts
 				 * for this descriptor
 				 */
@@ -535,6 +552,23 @@ static void dw_mci_translate_sglist(struct dw_mci *host, struct mmc_data *data,
 				length -= desc_len;
 
 				/*
+				 * The IDMAC should clear the OWN bit after
+				 * finishing the transfer. Wait for that
+				 * asynchronous IDMAC write to complete so
+				 * that we do not rely on the ordering that
+				 * the bus QoS and the architecture provide.
+				 * Otherwise we could see a race condition
+				 * here: the earlier IDMAC write (clearing
+				 * the OWN bit) may land right after the new
+				 * configuration of the desc, so the value of
+				 * the desc gets overwritten, leading to the
+				 * DMA_SUSPEND state as the IDMAC then fetches
+				 * the wrong desc.
+				 */
+				while ((readl(&desc->des0) & IDMAC_DES0_OWN))
+					;
+
+				/*
 				 * Set the OWN bit and disable interrupts
 				 * for this descriptor
 				 */
-- 
2.3.7
[PATCH 7/7] dax: unmap/truncate on device shutdown
Invalidate all mappings of a device-dax instance when the device is unregistered.

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index e8b9319aeadb..0a7899d5c65c 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -550,6 +550,7 @@ static void unregister_dax_dev(void *dev)
 	 */
 	dax_dev->alive = false;
 	synchronize_rcu();
+	unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1);
 	cdev_del(cdev);
 	device_unregister(dev);
 }
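The new call `unmap_mapping_range(dax_dev->inode->i_mapping, 0, 0, 1)` asks for the whole file to be unmapped: a `holelen` of 0 means "through the end of the file", and `even_cows = 1` also zaps private copy-on-write pages, as on truncation. The sketch below models only how the byte arguments become an inclusive page-index range. `page_range()` is a hypothetical helper written for illustration (4 KiB pages assumed), not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */
#define MODEL_PAGE_SIZE (1ULL << MODEL_PAGE_SHIFT)

struct page_range {
	unsigned long first; /* first page index to unmap */
	unsigned long last;  /* last page index to unmap, inclusive */
};

/*
 * Model of turning (holebegin, holelen) into page indices: the length is
 * rounded up to whole pages, and a zero length makes the last index wrap
 * below the first, which is then treated as "to the end of the file".
 */
static struct page_range page_range(uint64_t holebegin, uint64_t holelen)
{
	struct page_range r;
	unsigned long hlen =
		(unsigned long)((holelen + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);

	r.first = (unsigned long)(holebegin >> MODEL_PAGE_SHIFT);
	r.last = r.first + hlen - 1; /* wraps when hlen == 0 */
	if (r.last < r.first)
		r.last = ULONG_MAX;
	assert(r.first <= r.last);
	return r;
}
```

For the call in the patch, `page_range(0, 0)` covers indices 0 through `ULONG_MAX`, i.e. every page of the device's shared inode. That shared inode is why the series introduces a single host inode per device.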
[PATCH 1/7] dax: cleanup needlessly global symbol warnings
drivers/dax/dax.c:75:6: warning: symbol 'dax_region_put' was not declared.
drivers/dax/dax.c:95:19: warning: symbol 'alloc_dax_region' was not declared.
drivers/dax/dax.c:173:5: warning: symbol 'devm_create_dax_dev' was not declared.
drivers/dax/pmem.c:27:17: warning: symbol 'to_dax_pmem' was not declared.

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c  |    1 +
 drivers/dax/pmem.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 803f3953b341..736c03830fd0 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include "dax.h"
 
 static int dax_major;
 static struct class *dax_class;
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index dfb168568af1..59b75c5972bb 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -24,7 +24,7 @@ struct dax_pmem {
 	struct completion cmp;
 };
 
-struct dax_pmem *to_dax_pmem(struct percpu_ref *ref)
+static struct dax_pmem *to_dax_pmem(struct percpu_ref *ref)
 {
 	return container_of(ref, struct dax_pmem, ref);
 }
[PATCH 3/7] dax: rename fops from dax_dev_ to dax_
Shorten the prefix of the file operations to distinguish them from operations on the struct device associated with the dax_dev.

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c |   16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 3774fc9709bb..994dfa507dfb 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -347,7 +347,7 @@ static const struct vm_operations_struct dax_dev_vm_ops = {
 	.close = dax_dev_vm_close,
 };
 
-static int dax_dev_mmap(struct file *filp, struct vm_area_struct *vma)
+static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct dax_dev *dax_dev = filp->private_data;
 	int rc;
@@ -365,7 +365,7 @@ static int dax_dev_mmap(struct file *filp, struct vm_area_struct *vma)
 }
 
 /* return an unmapped area aligned to the dax region specified alignment */
-static unsigned long dax_dev_get_unmapped_area(struct file *filp,
+static unsigned long dax_get_unmapped_area(struct file *filp,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags)
 {
@@ -411,7 +411,7 @@ static struct device *dax_dev_find(dev_t dev_t)
 	return class_find_device(dax_class, NULL, &dev_t, __match_devt);
 }
 
-static int dax_dev_open(struct inode *inode, struct file *filp)
+static int dax_open(struct inode *inode, struct file *filp)
 {
 	struct dax_dev *dax_dev = NULL;
 	struct device *dev;
@@ -437,7 +437,7 @@ static int dax_dev_open(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int dax_dev_release(struct inode *inode, struct file *filp)
+static int dax_release(struct inode *inode, struct file *filp)
 {
 	struct dax_dev *dax_dev = filp->private_data;
 	struct device *dev = dax_dev->dev;
@@ -452,10 +452,10 @@ static int dax_dev_release(struct inode *inode, struct file *filp)
 static const struct file_operations dax_fops = {
 	.llseek = noop_llseek,
 	.owner = THIS_MODULE,
-	.open = dax_dev_open,
-	.release = dax_dev_release,
-	.get_unmapped_area = dax_dev_get_unmapped_area,
-	.mmap = dax_dev_mmap,
+	.open = dax_open,
+	.release = dax_release,
+	.get_unmapped_area = dax_get_unmapped_area,
+	.mmap = dax_mmap,
 };
 
 static void unregister_dax_dev(void *_dev)
[PATCH 0/7] dax: unified host inode for device-dax mappings
There are two scenarios where we need mappings of a /dev/dax device to share a single host inode: invalidating mappings at device shutdown, and coordinating resize of an actively mapped device.

This series addresses the unmap-on-shutdown case and includes reworks, like the cdev api conversion, to prepare for a dynamic resize / allocation capability.

Recall that device-DAX, introduced in v4.7 [1], is a mechanism to provide deterministic mapping behavior for performance- / feature-differentiated memory ranges.

[1]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ab68f2622136

---

Dan Williams (7):
      dax: cleanup needlessly global symbol warnings
      dax: reorder dax_fops function definitions
      dax: rename fops from dax_dev_ to dax_
      dax: embed a struct device in dax_dev
      dax: convert to the cdev api
      dax: define a unified inode/address_space for device-dax mappings
      dax: unmap/truncate on device shutdown

 drivers/dax/Kconfig        |    5
 drivers/dax/dax.c          |  555 ++--
 drivers/dax/pmem.c         |    2
 fs/char_dev.c              |    1
 include/uapi/linux/magic.h |    1
 5 files changed, 337 insertions(+), 227 deletions(-)
[PATCH 2/7] dax: reorder dax_fops function definitions
In order to convert devm_create_dax_dev() to use cdev, it will need access to dax_fops. Move dax_fops and related function definitions before devm_create_dax_dev().

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c |  337 ++---
 1 file changed, 168 insertions(+), 169 deletions(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 736c03830fd0..3774fc9709bb 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -145,175 +145,6 @@ static const struct attribute_group *dax_attribute_groups[] = {
 	NULL,
 };
 
-static void unregister_dax_dev(void *_dev)
-{
-	struct device *dev = _dev;
-	struct dax_dev *dax_dev = dev_get_drvdata(dev);
-	struct dax_region *dax_region = dax_dev->region;
-
-	dev_dbg(dev, "%s\n", __func__);
-
-	/*
-	 * Note, rcu is not protecting the liveness of dax_dev, rcu is
-	 * ensuring that any fault handlers that might have seen
-	 * dax_dev->alive == true, have completed. Any fault handlers
-	 * that start after synchronize_rcu() has started will abort
-	 * upon seeing dax_dev->alive == false.
-	 */
-	dax_dev->alive = false;
-	synchronize_rcu();
-
-	get_device(dev);
-	device_unregister(dev);
-	ida_simple_remove(&dax_region->ida, dax_dev->id);
-	ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
-	put_device(dev);
-	dax_dev_put(dax_dev);
-}
-
-int devm_create_dax_dev(struct dax_region *dax_region, struct resource *res,
-		int count)
-{
-	struct device *parent = dax_region->dev;
-	struct dax_dev *dax_dev;
-	struct device *dev;
-	int rc, minor;
-	dev_t dev_t;
-
-	dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
-	if (!dax_dev)
-		return -ENOMEM;
-	memcpy(dax_dev->res, res, sizeof(*res) * count);
-	dax_dev->num_resources = count;
-	kref_init(&dax_dev->kref);
-	dax_dev->alive = true;
-	dax_dev->region = dax_region;
-	kref_get(&dax_region->kref);
-
-	dax_dev->id = ida_simple_get(&dax_region->ida, 0, 0, GFP_KERNEL);
-	if (dax_dev->id < 0) {
-		rc = dax_dev->id;
-		goto err_id;
-	}
-
-	minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
-	if (minor < 0) {
-		rc = minor;
-		goto err_minor;
-	}
-
-	dev_t = MKDEV(dax_major, minor);
-	dev = device_create_with_groups(dax_class, parent, dev_t, dax_dev,
-			dax_attribute_groups, "dax%d.%d", dax_region->id,
-			dax_dev->id);
-	if (IS_ERR(dev)) {
-		rc = PTR_ERR(dev);
-		goto err_create;
-	}
-	dax_dev->dev = dev;
-
-	rc = devm_add_action_or_reset(dax_region->dev, unregister_dax_dev, dev);
-	if (rc)
-		return rc;
-
-	return 0;
-
- err_create:
-	ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
-	ida_simple_remove(&dax_region->ida, dax_dev->id);
- err_id:
-	dax_dev_put(dax_dev);
-
-	return rc;
-}
-EXPORT_SYMBOL_GPL(devm_create_dax_dev);
-
-/* return an unmapped area aligned to the dax region specified alignment */
-static unsigned long dax_dev_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags)
-{
-	unsigned long off, off_end, off_align, len_align, addr_align, align;
-	struct dax_dev *dax_dev = filp ? filp->private_data : NULL;
-	struct dax_region *dax_region;
-
-	if (!dax_dev || addr)
-		goto out;
-
-	dax_region = dax_dev->region;
-	align = dax_region->align;
-	off = pgoff << PAGE_SHIFT;
-	off_end = off + len;
-	off_align = round_up(off, align);
-
-	if ((off_end <= off_align) || ((off_end - off_align) < align))
-		goto out;
-
-	len_align = len + align;
-	if ((off + len_align) < off)
-		goto out;
-
-	addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
-			pgoff, flags);
-	if (!IS_ERR_VALUE(addr_align)) {
-		addr_align += (off - addr_align) & (align - 1);
-		return addr_align;
-	}
- out:
-	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
-}
-
-static int __match_devt(struct device *dev, const void *data)
-{
-	const dev_t *devt = data;
-
-	return dev->devt == *devt;
-}
-
-static struct device *dax_dev_find(dev_t dev_t)
-{
-	return class_find_device(dax_class, NULL, &dev_t, __match_devt);
-}
-
-static int dax_dev_open(struct inode *inode, struct file *filp)
-{
-	struct dax_dev *dax_dev = NULL;
-	struct device *dev;
-
-	dev = dax_dev_find(inode->i_rdev);
-	if (!dev)
-		return -ENXIO;
-
-	device_lock(dev);
-	dax_dev = dev_get_drvdata(dev);
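The alignment logic in `dax_dev_get_unmapped_area()` searches for `len + align` bytes and then slides the returned address forward so that it is congruent to the file offset modulo `align`. That lets aligned file offsets land on aligned virtual addresses, which the PMD fault path needs. The sketch below reproduces only that arithmetic in userspace. `base` stands in for whatever `current->mm->get_unmapped_area()` would return for the padded length, `align_hint()` and `round_up_pow2()` are hypothetical names, and returning 0 models the driver's fallback to an ordinary, unaligned search.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Model of the alignment step:
 *   off  - file offset of the mapping (pgoff << PAGE_SHIFT)
 *   len  - requested mapping length
 *   base - address the mm would return for a search of len + align bytes
 * Returns an address congruent to off modulo align, or 0 to signal the
 * fallback to an unaligned search.
 */
static uint64_t align_hint(uint64_t off, uint64_t len, uint64_t align,
			   uint64_t base)
{
	uint64_t off_end = off + len;
	uint64_t off_align = round_up_pow2(off, align);

	assert(align != 0 && (align & (align - 1)) == 0);

	/* No fully aligned chunk inside [off, off_end): alignment cannot help. */
	if (off_end <= off_align || off_end - off_align < align)
		return 0;

	/* Padding the search by align must not overflow. */
	if (off + len + align < off)
		return 0;

	/* Slide forward by less than align so that addr % align == off % align. */
	return base + ((off - base) & (align - 1));
}
```

Because the slide is always smaller than `align`, the adjusted mapping still fits inside the padded `len + align` window the search reserved. For example, with a 2 MiB alignment and `base = 0x7f0000001000`, a mapping at offset 0 is moved to `0x7f0000200000`.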
[PATCH 0/7] dax: unified host inode for device-dax mappings
There are two scenarios where we need mappings of a /dev/dax device to share a single host inode, invalidating mappings at device shutdown, and coordinating resize of an actively mapped device. This series addresses the unmap-on-shutdown case and includes reworks, like the cdev api conversion, to prepare for a dynamic resize / allocation capability. Recall that device-DAX, introduced in v4.7 [1], is a mechanism to provide deterministic mapping behavior for performance- / feature-differentiated memory ranges. [1]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ab68f2622136 --- Dan Williams (7): dax: cleanup needlessly global symbol warnings dax: reorder dax_fops function definitions dax: rename fops from dax_dev_ to dax_ dax: embed a struct device in dax_dev dax: convert to the cdev api dax: define a unified inode/address_space for device-dax mappings dax: unmap/truncate on device shutdown drivers/dax/Kconfig|5 drivers/dax/dax.c | 555 ++-- drivers/dax/pmem.c |2 fs/char_dev.c |1 include/uapi/linux/magic.h |1 5 files changed, 337 insertions(+), 227 deletions(-)
[PATCH 2/7] dax: reorder dax_fops function definitions
In order to convert devm_create_dax_dev() to use cdev, it will need access to dax_fops. Move dax_fops and related function definitions before devm_create_dax_dev().

Signed-off-by: Dan Williams
---
 drivers/dax/dax.c | 337 ++---
 1 file changed, 168 insertions(+), 169 deletions(-)

diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 736c03830fd0..3774fc9709bb 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -145,175 +145,6 @@ static const struct attribute_group *dax_attribute_groups[] = {
 	NULL,
 };
 
-static void unregister_dax_dev(void *_dev)
-{
-	struct device *dev = _dev;
-	struct dax_dev *dax_dev = dev_get_drvdata(dev);
-	struct dax_region *dax_region = dax_dev->region;
-
-	dev_dbg(dev, "%s\n", __func__);
-
-	/*
-	 * Note, rcu is not protecting the liveness of dax_dev, rcu is
-	 * ensuring that any fault handlers that might have seen
-	 * dax_dev->alive == true, have completed. Any fault handlers
-	 * that start after synchronize_rcu() has started will abort
-	 * upon seeing dax_dev->alive == false.
-	 */
-	dax_dev->alive = false;
-	synchronize_rcu();
-
-	get_device(dev);
-	device_unregister(dev);
-	ida_simple_remove(&dax_region->ida, dax_dev->id);
-	ida_simple_remove(&dax_minor_ida, MINOR(dev->devt));
-	put_device(dev);
-	dax_dev_put(dax_dev);
-}
-
-int devm_create_dax_dev(struct dax_region *dax_region, struct resource *res,
-		int count)
-{
-	struct device *parent = dax_region->dev;
-	struct dax_dev *dax_dev;
-	struct device *dev;
-	int rc, minor;
-	dev_t dev_t;
-
-	dax_dev = kzalloc(sizeof(*dax_dev) + sizeof(*res) * count, GFP_KERNEL);
-	if (!dax_dev)
-		return -ENOMEM;
-	memcpy(dax_dev->res, res, sizeof(*res) * count);
-	dax_dev->num_resources = count;
-	kref_init(&dax_dev->kref);
-	dax_dev->alive = true;
-	dax_dev->region = dax_region;
-	kref_get(&dax_region->kref);
-
-	dax_dev->id = ida_simple_get(&dax_region->ida, 0, 0, GFP_KERNEL);
-	if (dax_dev->id < 0) {
-		rc = dax_dev->id;
-		goto err_id;
-	}
-
-	minor = ida_simple_get(&dax_minor_ida, 0, 0, GFP_KERNEL);
-	if (minor < 0) {
-		rc = minor;
-		goto err_minor;
-	}
-
-	dev_t = MKDEV(dax_major, minor);
-	dev = device_create_with_groups(dax_class, parent, dev_t, dax_dev,
-			dax_attribute_groups, "dax%d.%d", dax_region->id,
-			dax_dev->id);
-	if (IS_ERR(dev)) {
-		rc = PTR_ERR(dev);
-		goto err_create;
-	}
-	dax_dev->dev = dev;
-
-	rc = devm_add_action_or_reset(dax_region->dev, unregister_dax_dev, dev);
-	if (rc)
-		return rc;
-
-	return 0;
-
- err_create:
-	ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
-	ida_simple_remove(&dax_region->ida, dax_dev->id);
- err_id:
-	dax_dev_put(dax_dev);
-
-	return rc;
-}
-EXPORT_SYMBOL_GPL(devm_create_dax_dev);
-
-/* return an unmapped area aligned to the dax region specified alignment */
-static unsigned long dax_dev_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len, unsigned long pgoff,
-		unsigned long flags)
-{
-	unsigned long off, off_end, off_align, len_align, addr_align, align;
-	struct dax_dev *dax_dev = filp ? filp->private_data : NULL;
-	struct dax_region *dax_region;
-
-	if (!dax_dev || addr)
-		goto out;
-
-	dax_region = dax_dev->region;
-	align = dax_region->align;
-	off = pgoff << PAGE_SHIFT;
-	off_end = off + len;
-	off_align = round_up(off, align);
-
-	if ((off_end <= off_align) || ((off_end - off_align) < align))
-		goto out;
-
-	len_align = len + align;
-	if ((off + len_align) < off)
-		goto out;
-
-	addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
-			pgoff, flags);
-	if (!IS_ERR_VALUE(addr_align)) {
-		addr_align += (off - addr_align) & (align - 1);
-		return addr_align;
-	}
- out:
-	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
-}
-
-static int __match_devt(struct device *dev, const void *data)
-{
-	const dev_t *devt = data;
-
-	return dev->devt == *devt;
-}
-
-static struct device *dax_dev_find(dev_t dev_t)
-{
-	return class_find_device(dax_class, NULL, &dev_t, __match_devt);
-}
-
-static int dax_dev_open(struct inode *inode, struct file *filp)
-{
-	struct dax_dev *dax_dev = NULL;
-	struct device *dev;
-
-	dev = dax_dev_find(inode->i_rdev);
-	if (!dev)
-		return -ENXIO;
-
-	device_lock(dev);
-	dax_dev = dev_get_drvdata(dev);
-	if (dax_dev) {
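dax_dev_find() above resolves an inode's i_rdev to its device by handing class_find_device() a pointer to the dev_t together with the __match_devt() predicate. The sketch below models that lookup in userspace over a plain array. struct sketch_device and sketch_find_device() are hypothetical stand-ins for struct device and the driver-core class iterator, and the MKDEV()/MAJOR()/MINOR() definitions mirror the kernel-internal encoding in include/linux/kdev_t.h (20 minor bits). Unlike the real class_find_device(), which takes a reference on the returned device that the caller must drop with put_device(), the sketch does no reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal dev_t encoding, mirroring include/linux/kdev_t.h. */
#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)	(((ma) << MINORBITS) | (mi))

typedef unsigned int sketch_dev_t;

/* Hypothetical stand-in for struct device, reduced to what we use. */
struct sketch_device {
	sketch_dev_t devt;
	const char *name;
};

/* Same shape as __match_devt(): data points at the dev_t to look for. */
static int sketch_match_devt(const struct sketch_device *dev, const void *data)
{
	const sketch_dev_t *devt = data;

	return dev->devt == *devt;
}

/*
 * Hypothetical model of class_find_device(): walk the devices and return
 * the first one for which match() returns non-zero, or NULL if none does.
 */
static const struct sketch_device *
sketch_find_device(const struct sketch_device *devs, size_t n,
		   const void *data,
		   int (*match)(const struct sketch_device *, const void *))
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (match(&devs[i], data))
			return &devs[i];
	return NULL;
}
```

Passing the key through an opaque const void * keeps the iterator generic: the same walk can match on dev_t, name, or any other field just by swapping the predicate, which is why the driver only has to supply __match_devt().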
[PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation
We have observed on a few machines with rtc-cmos devices that an RTC interrupt is generated before the hpet_rtc_timer_init() call has finished. As a result, hpet_rtc_interrupt() is called before its state (hpet_default_delta, hpet_t1_cmp) is fully initialized, and the do-while loop in hpet_rtc_timer_reinit(), which keeps going until hpet_cnt_ahead() reports the comparator ahead of the counter, never completes. This leads to "NMI watchdog: Watchdog detected hard LOCKUP on cpu 0".

This patch set initializes hpet_default_delta and hpet_t1_cmp before the interrupt can be raised.

Changes since V2:
- Improved commit log further

Changes since RFC:
- Commit log of patches has been improved.

Pratyush Anand (2):
  rtc/hpet: Factorize hpet_rtc_timer_init()
  rtc/rtc-cmos: Initialize software counters before irq is registered

 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c      | 41 +++--
 drivers/rtc/rtc-cmos.c      | 13 -
 3 files changed, 49 insertions(+), 7 deletions(-)

--
2.5.5
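Why an uninitialized hpet_default_delta turns into a lockup can be seen from the shape of the catch-up loop in hpet_rtc_timer_reinit(), which repeatedly adds delta to hpet_t1_cmp until hpet_cnt_ahead() says the comparator is ahead of the main counter. The userspace model below is a sketch under stated assumptions, not the kernel code: the comparison is written as the wrap-safe signed difference that hpet_cnt_ahead() is understood to use, the counter is frozen, and the max_iters cap is an addition that the kernel loop does not have. cnt_ahead_sketch() and reinit_loop_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "c1 is ahead of c2", same shape as hpet_cnt_ahead(). */
static bool cnt_ahead_sketch(uint32_t c1, uint32_t c2)
{
	return (int32_t)(c2 - c1) < 0;
}

/*
 * Model of the catch-up loop in hpet_rtc_timer_reinit(): keep adding
 * delta to the comparator until it is ahead of the counter. The counter
 * is frozen here, and max_iters is an addition for this sketch: the
 * kernel loop has no cap, so with delta == 0 it can spin until the NMI
 * watchdog fires. Returns the number of iterations, or -1 on the cap.
 */
static long reinit_loop_sketch(uint32_t *t1_cmp, uint32_t counter,
			       uint32_t delta, long max_iters)
{
	long iters = 0;

	assert(t1_cmp != NULL && max_iters > 0);
	do {
		if (iters == max_iters)
			return -1;	/* the unbounded kernel loop keeps going */
		*t1_cmp += delta;
		iters++;
	} while (!cnt_ahead_sketch(*t1_cmp, counter));

	return iters;
}
```

With delta = 300 the comparator overtakes a counter value of 1000 after four steps, ending at 1200; with delta = 0, as when hpet_default_delta has not been set yet, the comparator never moves and only the sketch's cap stops the loop.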