date:20150416

boot hang after TSC init.

2015-04-16 Thread Dave Jones

After updating to today's tree, I have two machines
that stop booting after TSC init.

The last things printed are..

tsc: Refined TSC clocksource calibration: 2933.328 MHz
clocksource tsc: mask: 0x max_cycles: 0x2a483edfd02, max_idle_ns: 440795320705 ns
Switched to clocksource tsc

Then the cursor blinks, but booting doesn't continue.

I'll debug more in the morning..


Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] drivers/usb/chipidea/debug.c: avoid out of bound read

2015-04-16 Thread Heinrich Schuchardt

A string written by the user may not be zero terminated.

sscanf may read memory beyond the buffer if no zero byte
is found.

Signed-off-by: Heinrich Schuchardt 
---
 drivers/usb/chipidea/debug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/chipidea/debug.c b/drivers/usb/chipidea/debug.c
index dfb05ed..ef08af3 100644
--- a/drivers/usb/chipidea/debug.c
+++ b/drivers/usb/chipidea/debug.c
@@ -88,9 +88,13 @@ static ssize_t ci_port_test_write(struct file *file, const char __user *ubuf,
char buf[32];
int ret;
 
-   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+   count = min_t(size_t, sizeof(buf) - 1, count);
+   if (copy_from_user(buf, ubuf, count))
return -EFAULT;
 
+   /* sscanf requires a zero terminated string */
+   buf[count] = 0;
+
if (sscanf(buf, "%u", ) != 1)
return -EINVAL;
 
-- 
2.1.4
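The fix above follows a common pattern: clamp the copy to sizeof(buf) - 1 so there is always room for a terminator, then terminate before parsing. A minimal userspace sketch of the same pattern, assuming memcpy() as a stand-in for copy_from_user(); parse_port_test_mode() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model of ci_port_test_write()'s input handling.
 * 'ubuf' may not be zero terminated; memcpy() stands in for copy_from_user().
 */
int parse_port_test_mode(const char *ubuf, size_t count, unsigned *mode)
{
	char buf[32];

	/* Clamp the length so one byte is always left for the terminator. */
	if (count > sizeof(buf) - 1)
		count = sizeof(buf) - 1;
	memcpy(buf, ubuf, count);

	/* sscanf requires a zero terminated string */
	buf[count] = '\0';

	if (sscanf(buf, "%u", mode) != 1)
		return -1;
	return 0;
}
```

Without the buf[count] assignment, sscanf() would keep scanning past the copied bytes into whatever follows buf on the stack.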


RE: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Wang, Xiaoming

Dear Tejun

> -Original Message-
> From: hte...@gmail.com [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
> Sent: Friday, April 17, 2015 11:42 AM
> To: Wang, Xiaoming
> Cc: a...@linux-foundation.org; o...@redhat.com;
> andriy.shevche...@linux.intel.com; li...@rasmusvillemoes.dk;
> ebied...@xmission.com; epa...@redhat.com; chenhanx...@cn.fujitsu.com;
> t...@linutronix.de; linux-kernel@vger.kernel.org; Schallberger, Timothy M;
> Zhang, Dongxing
> Subject: Re: [PATCH] proc: move the adding option Ngid to the end of
> proc/PID/status
> 
> On Thu, Apr 16, 2015 at 11:37 PM, Wang, Xiaoming
>  wrote:
> >> git describe --contains says 3.13 and it's about 1.5 years ago.
> >>
> > Yes, this kernel change is 1.5 years old.
> > As we know, not all users update the kernel so frequently.
> > They just use the stable one.
> > We hit this issue when updating to 3.13 now.
> > A lot of applications that ran well previously failed to run.
> > Do you have any idea about this issue?
> 
> Not really. It's a sucky situation. How many applications are we talking about? I
> tried to find information on libsecuritysdk but couldn't find it anywhere.
> 
The libsecuritysdk library is bundled inside applications such as
Taobao, Weibo, Tmall, Alipay, etc.
It is related to security.
> --
> tejun

Re: [PATCH V3 2/2] pinctrl: Add Pistachio SoC pin control driver

2015-04-16 Thread Ezequiel Garcia


Hi Andrew,

On 04/07/2015 04:44 PM, Andrew Bresticker wrote:
[..]
> +static int pistachio_gpio_register(struct pistachio_pinctrl *pctl)
> +{
> + struct device_node *node = pctl->dev->of_node;
> + struct pistachio_gpio_bank *bank;
> + unsigned int i;
> + int irq, ret = 0;
> +
> + for (i = 0; i < pctl->nbanks; i++) {
> + char child_name[sizeof("gpioXX")];
> + struct device_node *child;

The first submission used for_each_child_of_node, and I can't find
any review comments explaining why you've changed it to a regular for
loop.

> +
> + snprintf(child_name, sizeof(child_name), "gpio%d", i);

This assumes the GPIO bank nodes are called gpio0, gpio1, ... and so on.
Do we really want to assume that?

Thanks,
-- 
Ezequiel
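The naming scheme in the quoted loop can be exercised in userspace to see what it encodes: only children literally named gpio0, gpio1, ... are found, and sizeof("gpioXX") (7 bytes) only fits a two-digit index plus the NUL. A minimal sketch; bank_child_name() is a hypothetical wrapper, and it uses "%u" because the loop index is unsigned.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical wrapper around the snprintf() call from
 * pistachio_gpio_register(). Returns 0 if the full name fit, -1 if
 * snprintf() had to truncate it.
 */
int bank_child_name(char *out, size_t len, unsigned int bank)
{
	int n = snprintf(out, len, "gpio%u", bank);

	/* snprintf() never writes past 'out', but it truncates silently and
	 * returns the length the untruncated name would have needed. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Iterating with for_each_child_of_node, as the first submission did, would avoid baking the gpioN naming convention into the driver.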

Re: [PATCH 13/13] thermal: of: implement .set_trips for device tree thermal zones

2015-04-16 Thread Sascha Hauer

Hi Brian,

On Wed, Apr 15, 2015 at 10:59:36AM -0700, Brian Norris wrote:
> On Mon, Apr 13, 2015 at 08:30:18AM +0200, Sascha Hauer wrote:
> > On Mon, Apr 06, 2015 at 07:43:08PM -0700, Eduardo Valentin wrote:
> > > On Thu, Mar 26, 2015 at 04:54:00PM +0100, Sascha Hauer wrote:
> > > > Signed-off-by: Sascha Hauer 
> > > > ---
> > > >  drivers/thermal/of-thermal.c | 12 ++++++++++++
> > > >  include/linux/thermal.h  |  1 +
> > > >  2 files changed, 13 insertions(+)
> > > 
> > > Can you please include at least one user of this callback in your patch
> > > series?
> > 
> > Thanks for the comments on this series. I'll address them in the next
> > round. The user for this callback is not yet ready to be posted. I'll
> > try and move this patch to the end of the series so that the series can
> > be applied without this one.
> 
> I'm actually interested in this patch, as I am developing a thermal
> sensor driver that has hardware-assisted temperature-trip interrupts.
> I'm testing out your patches now, and will probably end up using them on
> my local tree. With help of this (and a few other changes I need to
> make), my driver should be ready to post soon.
> 
> So, I'll try to post my driver soon as an example, if you'd like. I'd
> also appreciate a CC on any future revisions, if you don't mind. I'm
> subscribed to the list, but it's not always easy to pick out the signal
> from the noise.

Sure, I'll add you to Cc next round. I'm halfway through updating this
series according to the comments I received, but got distracted. I hope
to send an update next week.

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
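Brian's use case (a sensor with hardware-assisted trip interrupts) is what a .set_trips hook is meant to serve, as I understand it: the driver is given the nearest trip points around the current temperature and programs them as an interrupt window instead of being polled. A minimal sketch of computing that window; trip_window() is a hypothetical helper, not the of-thermal or thermal-core code, and temperatures are assumed to be in millidegrees Celsius.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: find the nearest trip at or below 'temp' (low) and
 * the nearest trip above it (high). INT_MIN / INT_MAX mean "no trip on that
 * side". Trips may be given in any order.
 */
void trip_window(int temp, const int *trips, size_t ntrips, int *low, int *high)
{
	size_t i;

	*low = INT_MIN;
	*high = INT_MAX;
	for (i = 0; i < ntrips; i++) {
		if (trips[i] <= temp && trips[i] > *low)
			*low = trips[i];
		if (trips[i] > temp && trips[i] < *high)
			*high = trips[i];
	}
}
```

With trips at 60000, 85000 and 95000 and a reading of 70000, the window is [60000, 85000], so the hardware only needs to interrupt when the temperature leaves that range.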

[PATCH] tty: Remove leftover dependencies on PPC_OF

2015-04-16 Thread Guenter Roeck

powerpc qemu runs fail with the current upstream kernel.
Bisect points to commit 52d996270032 ("powerpc: kill PPC_OF").
Unfortunately, that commit did not remove all instances of PPC_OF.
Practical impact is that the serial driver used by powerpc qemu
targets is no longer built into the test kernel.

Fixes: 52d996270032 ("powerpc: kill PPC_OF")
Cc: Kevin Hao 
Cc: Michael Ellerman 
Signed-off-by: Guenter Roeck 
---
 drivers/tty/serial/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index d2501f01cd03..77471d3db8d3 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -835,7 +835,7 @@ config SERIAL_MCF_CONSOLE
 
 config SERIAL_PMACZILOG
tristate "Mac or PowerMac z85c30 ESCC support"
-   depends on (M68K && MAC) || (PPC_OF && PPC_PMAC)
+   depends on (M68K && MAC) || PPC_PMAC
select SERIAL_CORE
help
  This driver supports the Zilog z85C30 serial ports found on
@@ -1153,7 +1153,7 @@ config SERIAL_OMAP_CONSOLE
 
 config SERIAL_OF_PLATFORM_NWPSERIAL
tristate "NWP serial port driver"
-   depends on PPC_OF && PPC_DCR
+   depends on PPC_DCR
select SERIAL_OF_PLATFORM
select SERIAL_CORE_CONSOLE
select SERIAL_CORE
-- 
2.1.0


Re: [PATCH] tracing: fix incorrect enabling of trace events by boot cmdline

2015-04-16 Thread Joonsoo Kim

On Thu, Apr 16, 2015 at 09:39:52AM -0400, Steven Rostedt wrote:
> On Thu, 16 Apr 2015 13:44:44 +0900
> Joonsoo Kim  wrote:
> 
> > There is a problem that trace events are not properly enabled with
> > the boot cmdline. The problem is that if we pass "trace_event=kmem:mm_page_alloc"
> > to the boot cmdline, it enables all kmem trace events.
> > 
> > It is caused by the parsing mechanism. When we parse the cmdline, the buffer
> > contents are modified due to tokenization. And, if we use this buffer
> > again, we will get the wrong result.
> > 
> > Unfortunately, this buffer has to be accessed three times
> > to set trace events properly at boot time. So, we need to handle
> > this situation.
> > 
> > There is already handling code for ",", but we need another for
> > ":". This patch adds it.
> 
> Thanks, but your patch has a bug in it. I'll fix it up.
> 
> > 
> > Signed-off-by: Joonsoo Kim 
> > ---
> >  kernel/trace/trace_events.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > index db54dda..ce5b194 100644
> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -565,6 +565,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
> >  static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
> >  {
> > char *event = NULL, *sub = NULL, *match;
> > +   int ret;
> >  
> > /*
> >  * The buf format can be :
> > @@ -590,7 +591,11 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
> > event = NULL;
> > }
> >  
> > -   return __ftrace_set_clr_event(tr, match, sub, event, set);
> > +   ret = __ftrace_set_clr_event(tr, match, sub, event, set);
> > +
> > +   /* Put back the colon to allow this to be called again */
> > +   if (buf)
> > +   *(buf - 1) = ':';
> 
> You forgot to add:
> 
>   return ret;
> 

Ah... my bad... :)
thanks for fixing it.

Thanks.
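The bug is easy to reproduce in isolation: once the ':' is overwritten with '\0', the buffer reads as just "kmem", and a second pass over it matches every kmem event. A minimal userspace model of the split and of the patch's "*(buf - 1) = ':'" restore step; split_sub_event() and restore_colon() are hypothetical helpers written with strchr() rather than the kernel's strsep().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the "<subsystem>:<event>" split. Like strsep(), it
 * overwrites the ':' in place, so the caller's buffer is modified.
 * Returns the event part; *sub is NULL if there was no ':'.
 */
char *split_sub_event(char *buf, char **sub)
{
	char *colon = strchr(buf, ':');

	if (!colon) {
		*sub = NULL;
		return buf;        /* plain match string, nothing split */
	}
	*colon = '\0';         /* tokenization modifies the buffer */
	*sub = buf;
	return colon + 1;      /* event name starts after the old ':' */
}

/* Equivalent of the patch's "*(buf - 1) = ':'": only valid after a split. */
void restore_colon(char *event)
{
	event[-1] = ':';
}
```

Restoring the separator (and then returning the saved result, which is the line the original patch forgot) lets the same buffer be parsed again with the same outcome.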

Re: [patch] mm: vmscan: invoke slab shrinkers from shrink_zone()

2015-04-16 Thread Joonsoo Kim

On Fri, Apr 17, 2015 at 09:17:53AM +1000, Dave Chinner wrote:
> On Thu, Apr 16, 2015 at 10:34:13AM -0400, Johannes Weiner wrote:
> > On Thu, Apr 16, 2015 at 12:57:36PM +0900, Joonsoo Kim wrote:
> > > This causes the following success rate regression in phases 1 and 2 of the
> > > stress-highalloc benchmark. In phases 1 and 2, many high-order allocations
> > > are requested while many threads do kernel builds in parallel.
> > 
> > Yes, the patch made the shrinkers on multi-zone nodes less aggressive.
> > From the changelog:
> > 
> > This changes kswapd behavior, which used to invoke the shrinkers for 
> > each
> > zone, but with scan ratios gathered from the entire node, resulting in
> > meaningless pressure quantities on multi-zone nodes.
> > 
> > So the previous code *did* apply more pressure on the shrinkers, but
> > it didn't make any sense.  The number of slab objects to scan for each
> > scanned LRU page depended on how many zones there were in a node, and
> > their relative sizes.  So a node with a large DMA32 and a small Normal
> > would receive vastly different relative slab pressure than a node with
> > only one big zone Normal.  That's not something we should revert to.
> > 
> > If we are too weak on objects compared to LRU pages then we should
> > adjust DEFAULT_SEEKS or individual shrinker settings.
> 
> Now this thread has my attention. Changing shrinker defaults will
> seriously upset the memory balance under load (in unpredictable
> ways) so I really don't think we should even consider changing
> DEFAULT_SEEKS.
> 
> If there's a shrinker imbalance, we need to understand which
> shrinker needs rebalancing, then modify that shrinker's
> configuration and then observe the impact this has on the rest of
> the system. This means looking at variance of the memory footprint
> in steady state, reclaim overshoot and damping rates before steady
> state is achieved, etc.  Balancing multiple shrinkers (especially
> those with dependencies on other caches) under memory
> load is a non-trivial undertaking.
> 
> I don't see any evidence that we have a shrinker imbalance, so I
> really suspect the problem is "shrinkers aren't doing enough work".
> In that case, we need to increase the pressure being generated, not
> start fiddling around with shrinker configurations.

Okay. I agree.

> 
> > If we think our pressure ratio is accurate but we don't reclaim enough
> > compared to our compaction efforts, then any adjustments to improve
> > huge page success rate should come from the allocator/compaction side.
> 
> Right - if compaction is failing, then the problem is more likely
> that it isn't generating enough pressure, and so the shrinkers
> aren't doing the work we are expecting them to do. That's a problem
> with compaction, not the shrinkers...

Yes, I agree with that. I will investigate compaction further.

Thanks.
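To make the "pressure ratio" discussion concrete: the number of objects a shrinker is asked to scan is, roughly, its freeable objects times the fraction of LRU pages scanned, scaled by 4/seeks. A simplified model, modeled on do_shrink_slab() as I recall it; treat the exact arithmetic as an assumption. DEFAULT_SEEKS is 2, as in include/linux/shrinker.h, and slab_scan_target() is a hypothetical name.

```c
#include <stdint.h>

#define DEFAULT_SEEKS 2 /* same value as include/linux/shrinker.h */

/*
 * Simplified model of a shrinker's scan target: proportional to the share
 * of eligible LRU pages that were scanned, scaled by 4 / seeks.
 */
uint64_t slab_scan_target(uint64_t nr_scanned, uint64_t nr_eligible,
			  uint64_t freeable, unsigned int seeks)
{
	uint64_t delta = (4 * nr_scanned) / seeks;

	delta *= freeable;
	return delta / (nr_eligible + 1);   /* +1 avoids division by zero */
}
```

In this model, halving seeks doubles the target for every shrinker that uses DEFAULT_SEEKS at once, which is why Dave prefers raising the pressure that is generated over retuning the default. Likewise, a shrinker invoked once per zone with node-wide ratios would have its target added up once per zone.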

Re: [patch] mm: vmscan: invoke slab shrinkers from shrink_zone()

2015-04-16 Thread Joonsoo Kim

On Thu, Apr 16, 2015 at 10:34:13AM -0400, Johannes Weiner wrote:
> Hi Joonsoo,
> 
> On Thu, Apr 16, 2015 at 12:57:36PM +0900, Joonsoo Kim wrote:
> > Hello, Johannes.
> > 
> > Ccing Vlastimil, because this patch causes some regression on the
> > stress-highalloc test in mmtests, and he is an expert on compaction
> > and would be interested in it. :)
> > 
> > On Fri, Nov 28, 2014 at 07:06:37PM +0300, Vladimir Davydov wrote:
> > > If the highest zone (zone_idx=requested_highidx) is not populated, we
> > > won't scan slab caches on direct reclaim, which may result in OOM kill
> > > even if there are plenty of freeable dentries available.
> > > 
> > > It's especially relevant for VMs, which often have less than 4G of RAM,
> > > in which case we will only have ZONE_DMA and ZONE_DMA32 populated and
> > > empty ZONE_NORMAL on x86_64.
> > 
> > I got a similar problem to the one Vladimir mentioned above when I tested
> > the stress-highalloc benchmark. My test system has ZONE_DMA, ZONE_DMA32 and
> > ZONE_NORMAL zones, as follows.
> > 
> > Node 0, zone      DMA
> >         spanned  4095
> >         present  3998
> >         managed  3977
> > Node 0, zone    DMA32
> >         spanned  1044480
> >         present  782333
> >         managed  762561
> > Node 0, zone   Normal
> >         spanned  262144
> >         present  262144
> >         managed  245318
> > 
> > Perhaps requested_highidx would be ZONE_NORMAL for almost all normal
> > allocation requests.
> > 
> > When I test the stress-highalloc benchmark, shrink_zone() on the requested_highidx
> > zone in kswapd_shrink_zone() is frequently skipped because this zone is
> > already balanced. But another zone, for example DMA32, which has more memory,
> > isn't balanced, so kswapd tries to reclaim from that zone. But
> > zone_idx(zone) == classzone_idx isn't true for that zone, so
> > shrink_slab() is skipped and we can't age slab objects at the same ratio
> > as LRU pages.
> 
> No, kswapd_shrink_zone() has the highest *unbalanced* zone as the
> classzone.  When Normal is balanced but DMA32 is not, then kswapd
> scans DMA and DMA32 and invokes the shrinkers for DMA32.

Hmm... there are some corner cases related to compaction.
kswapd checks the highest *unbalanced* zone with sc->order, but
in kswapd_shrink_zone(), testorder would be changed to 0 if running
compaction is possible. In this case, the following condition could be true
and kswapd skips shrinking that highest unbalanced zone.

  if (!lowmem_pressure &&
zone_balanced(zone, testorder, balance_gap, classzone_idx))

If this happens and a lower zone is unbalanced, shrink_zone() would be
called but shrink_slab() could be skipped.

Anyway, I suspected that this was the cause of the regression, but I found
that it isn't. When shrinking the highest unbalanced zone is skipped because
compaction is possible, the lower zones are also balanced and all shrinks
are skipped in my test. See below.

> 
> > This could be also possible on direct reclaim path as Vladimir
> > mentioned.
> 
> Direct reclaim ignores watermarks and always scans a zone.  The
> problem is only with completely unpopulated zones, but Vladimir
> addressed that.

There is also a similar corner case related to compaction in the direct
reclaim path.

> > This causes the following success rate regression in phases 1 and 2 of the
> > stress-highalloc benchmark. In phases 1 and 2, many high-order allocations
> > are requested while many threads do kernel builds in parallel.
> 
> Yes, the patch made the shrinkers on multi-zone nodes less aggressive.
> From the changelog:
> 
> This changes kswapd behavior, which used to invoke the shrinkers for each
> zone, but with scan ratios gathered from the entire node, resulting in
> meaningless pressure quantities on multi-zone nodes.
> 
> So the previous code *did* apply more pressure on the shrinkers, but
> it didn't make any sense.  The number of slab objects to scan for each
> scanned LRU page depended on how many zones there were in a node, and
> their relative sizes.  So a node with a large DMA32 and a small Normal
> would receive vastly different relative slab pressure than a node with
> only one big zone Normal.  That's not something we should revert to.

Yes, I agree that previous code didn't make any sense.

> If we are too weak on objects compared to LRU pages then we should
> adjust DEFAULT_SEEKS or individual shrinker settings.
> 
> If we think our pressure ratio is accurate but we don't reclaim enough
> compared to our compaction efforts, then any adjustments to improve
> huge page success rate should come from the allocator/compaction side.

Yes, I agree. Before moving on to the compaction side, I'd like to
confirm how the shrinker works and that it has no problem.

> > Base: Run 1
> > Ops 1   33.00 (  0.00%)
> > Ops 2   43.00 (  0.00%)
> > Ops 3   80.00 (  0.00%)
> > Base: Run 2
> > Ops 1   33.00 (  0.00%)
> > Ops 2   44.00 (  0.00%)
> > Ops 3   80.00 (  0.00%)
> > Base: Run 3
> > Ops 1   30.00 (  0.00%)
> > Ops 2

[PATCH] kasan: Show gcc version requirements in Kconfig and Documentation

2015-04-16 Thread Joe Perches

The documentation shows a need for gcc > 4.9.2, but it's
really >=.  The Kconfig entries don't show the required versions,
so add them.  Correct a latter/later typo too.

Signed-off-by: Joe Perches 
---

(dropping Ingo from cc's)

On Thu, 2015-04-16 at 23:59 -0400, Sasha Levin wrote:
> On 04/16/2015 11:21 PM, Steven Rostedt wrote:
> > I have no idea what event caused the issue :-/
> 4.9.2+ is needed for KASan.

Perhaps the documentation and Kconfig entries could
describe that a bit better.

 Documentation/kasan.txt | 6 +++---
 lib/Kconfig.kasan       | 6 ++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index 092fc10..3038b4b 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -9,7 +9,7 @@ a fast and comprehensive solution for finding use-after-free and out-of-bounds
 bugs.
 
 KASan uses compile-time instrumentation for checking every memory access,
-therefore you will need a certain version of GCC > 4.9.2
+therefore you will need a gcc version of 4.9.2 or later.
 
 Currently KASan is supported only for x86_64 architecture and requires that the
 kernel be built with the SLUB allocator.
@@ -23,8 +23,8 @@ To enable KASAN configure kernel with:
 
 and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline/inline
 is compiler instrumentation types. The former produces smaller binary the
-latter is 1.1 - 2 times faster. Inline instrumentation requires GCC 5.0 or
-latter.
+latter is 1.1 - 2 times faster. Inline instrumentation requires a gcc version
+of 5.0 or later.
 
 Currently KASAN works only with the SLUB memory allocator.
 For better bug detection and nicer report, enable CONFIG_STACKTRACE and put
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 4fecaedc..1e1c23e 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -10,8 +10,9 @@ config KASAN
help
  Enables kernel address sanitizer - runtime memory debugger,
  designed to find out-of-bounds accesses and use-after-free bugs.
- This is strictly debugging feature. It consumes about 1/8
- of available memory and brings about ~x3 performance slowdown.
+ This is strictly a debugging feature and it requires a gcc version
+ of 4.9.2 or later.  It consumes about 1/8 of available memory and
+ brings about ~x3 performance slowdown.
  For better error detection enable CONFIG_STACKTRACE,
  and add slub_debug=U to boot cmdline.
 
@@ -40,6 +41,7 @@ config KASAN_INLINE
  memory accesses. This is faster than outline (in some workloads
  it gives about x2 boost over outline instrumentation), but
  make kernel's .text size much bigger.
+ This requires a gcc version of 5.0 or later.
 
 endchoice
 


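The ">" versus ">=" distinction in this patch is exactly what a packed version comparison expresses. A minimal sketch, assuming the same packing as the kernel's GCC_VERSION macro in include/linux/compiler-gcc.h (major * 10000 + minor * 100 + patchlevel); the two check functions are hypothetical.

```c
#include <stdbool.h>

/* Same packing as GCC_VERSION: assumes minor and patchlevel stay below 100. */
static inline unsigned int version_code(unsigned int major, unsigned int minor,
					unsigned int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/* "4.9.2 or later" is >=, so 4.9.2 itself qualifies for KASan. */
bool kasan_gcc_ok(unsigned int major, unsigned int minor, unsigned int patch)
{
	return version_code(major, minor, patch) >= version_code(4, 9, 2);
}

/* Inline instrumentation needs gcc 5.0 or later. */
bool kasan_inline_gcc_ok(unsigned int major, unsigned int minor, unsigned int patch)
{
	return version_code(major, minor, patch) >= version_code(5, 0, 0);
}
```

Under this check, the 4.9.0 compiler Steven mentions in the TRACE_DEFINE_ENUM thread is too old, while 4.9.2 qualifies for outline instrumentation but not for inline.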

[PATCH] IOMMU: Removed checkpatch warnings for Missing a blank line after declarations

2015-04-16 Thread Robert Callicotte

Fixed checkpatch warnings for missing blank line after declaration of
struct.

Signed-off-by: Robert Callicotte 
---
 drivers/iommu/iova.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9dd8208..b7c3d92 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -227,6 +227,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova)
/* Figure out where to put new node */
while (*new) {
struct iova *this = container_of(*new, struct iova, node);
+
parent = *new;
 
if (iova->pfn_lo < this->pfn_lo)
@@ -350,6 +351,7 @@ void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
struct iova *iova = find_iova(iovad, pfn);
+
if (iova)
__free_iova(iovad, iova);
 
@@ -369,6 +371,7 @@ void put_iova_domain(struct iova_domain *iovad)
node = rb_first(>rbroot);
while (node) {
struct iova *iova = container_of(node, struct iova, node);
+
rb_erase(node, >rbroot);
free_iova_mem(iova);
node = rb_first(>rbroot);
@@ -482,6 +485,7 @@ copy_reserved_iova(struct iova_domain *from, struct iova_domain *to)
for (node = rb_first(>rbroot); node; node = rb_next(node)) {
struct iova *iova = container_of(node, struct iova, node);
struct iova *new_iova;
+
new_iova = reserve_iova(to, iova->pfn_lo, iova->pfn_hi);
if (!new_iova)
printk(KERN_ERR "Reserve iova range %lx@%lx failed\n",
-- 
2.2.2


Re: [PATCH 07/18 v3] tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values

2015-04-16 Thread Sasha Levin

On 04/16/2015 11:21 PM, Steven Rostedt wrote:
> On Wed, 15 Apr 2015 20:58:31 -0400
> Sasha Levin  wrote:
> 
>> On 04/15/2015 10:05 AM, Steven Rostedt wrote:
>>> On Wed, 15 Apr 2015 09:22:37 -0400
>>> Sasha Levin  wrote:
>>> 
>>>> Hey Steven,
>>>>
>>>> I'm seeing the following when booting:
>>>>
>>>> [   10.678876] BUG: KASan: out of bounds access in trace_event_enum_update+0xb1d/0xb70 at addr a6c4dc68
>>> 
>>> Thanks for the report. Mind sending me over your config, and which git
>>> commit was your HEAD. I can't seem to find 7858a62 from your output.
>> 
>> It reproduces on the latest -next kernel. I've attached my config.
>> 
> What version of gcc is required to run KASan? The highest version I
> have to build kernels with is 4.9.0.
> 
> I have no idea what event caused the issue :-/

4.9.2+ is needed for KASan.


Thanks,
Sasha

RE: [V8 PATCH 1/3] ACPICA: Add ACPI _CLS processing

2015-04-16 Thread Moore, Robert

Yes, this is a good question that we should think about.
Bob


> -Original Message-
> From: Zheng, Lv
> Sent: Thursday, April 16, 2015 6:46 PM
> To: Suravee Suthikulpanit; r...@rjwysocki.net;
> mika.westerb...@linux.intel.com; Moore, Robert; hanjun@linaro.org
> Cc: l...@kernel.org; hdego...@redhat.com; t...@kernel.org;
> mj...@srcf.ucam.org; gre...@linuxfoundation.org; al.st...@linaro.org;
> graeme.greg...@linaro.org; leo.du...@amd.com; linux-...@vger.kernel.org;
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linaro-
> a...@lists.linaro.org
> Subject: RE: [V8 PATCH 1/3] ACPICA: Add ACPI _CLS processing
> 
> Before backporting this to ACPICA, let me ask one simple question.
> According to the spec, _CLS is optional and PCI-specific.
> So why should we implement it in the ACPICA core rather than in OSPM-specific modules?
> If this needs to be implemented in ACPICA, then what about the following
> device identification objects?
> _DDN, _HRV, _MLS, _PLD, _STR, _SUN
> 
> Thanks and best regards
> -Lv
> 
> > From: Suravee Suthikulpanit [mailto:suravee.suthikulpa...@amd.com]
> > Sent: Tuesday, March 31, 2015 5:56 AM
> >
> > ACPI device configuration often contains a _CLS object to supply the
> > PCI-defined class code for the device. This patch introduces logic to
> > process the _CLS object.
> >
> > Acked-by: Mika Westerberg 
> > Reviewed-by: Hanjun Guo 
> > Signed-off-by: Suravee Suthikulpanit 
> > ---
> >  drivers/acpi/acpica/acutils.h  |  3 ++
> >  drivers/acpi/acpica/nsxfname.c | 21 ++--
> >  drivers/acpi/acpica/utids.c    | 73 ++
> >  3 files changed, 95 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/acutils.h b/drivers/acpi/acpica/acutils.h
> > index c2f03e8..2aef850 100644
> > --- a/drivers/acpi/acpica/acutils.h
> > +++ b/drivers/acpi/acpica/acutils.h
> > @@ -430,6 +430,9 @@ acpi_status
> >  acpi_ut_execute_CID(struct acpi_namespace_node *device_node,
> > struct acpi_pnp_device_id_list ** return_cid_list);
> >
> > +acpi_status
> > +acpi_ut_execute_CLS(struct acpi_namespace_node *device_node,
> > +   struct acpi_pnp_device_id **return_id);
> >  /*
> >   * utlock - reader/writer locks
> >   */
> > diff --git a/drivers/acpi/acpica/nsxfname.c b/drivers/acpi/acpica/nsxfname.c
> > index d66c326..590ef06 100644
> > --- a/drivers/acpi/acpica/nsxfname.c
> > +++ b/drivers/acpi/acpica/nsxfname.c
> > @@ -276,11 +276,12 @@ acpi_get_object_info(acpi_handle handle,
> > struct acpi_pnp_device_id *hid = NULL;
> > struct acpi_pnp_device_id *uid = NULL;
> > struct acpi_pnp_device_id *sub = NULL;
> > +   struct acpi_pnp_device_id *cls = NULL;
> > char *next_id_string;
> > acpi_object_type type;
> > acpi_name name;
> > u8 param_count = 0;
> > -   u8 valid = 0;
> > +   u16 valid = 0;
> > u32 info_size;
> > u32 i;
> > acpi_status status;
> > @@ -320,7 +321,7 @@ acpi_get_object_info(acpi_handle handle,
> > if ((type == ACPI_TYPE_DEVICE) || (type == ACPI_TYPE_PROCESSOR)) {
> > /*
> >  * Get extra info for ACPI Device/Processor objects only:
> > -* Run the Device _HID, _UID, _SUB, and _CID methods.
> > +* Run the Device _HID, _UID, _SUB, _CID and _CLS methods.
> >  *
> >  * Note: none of these methods are required, so they may or may
> >  * not be present for this device. The Info->Valid bitfield is used
> > @@ -351,6 +352,14 @@ acpi_get_object_info(acpi_handle handle,
> > valid |= ACPI_VALID_SUB;
> > }
> >
> > +   /* Execute the Device._CLS method */
> > +
> > +   status = acpi_ut_execute_CLS(node, );
> > +   if (ACPI_SUCCESS(status)) {
> > +   info_size += cls->length;
> > +   valid |= ACPI_VALID_CLS;
> > +   }
> > +
> > /* Execute the Device._CID method */
> >
> > status = acpi_ut_execute_CID(node, _list);
> > @@ -468,6 +477,11 @@ acpi_get_object_info(acpi_handle handle,
> > sub, next_id_string);
> > }
> >
> > +   if (cls) {
> > +   next_id_string = acpi_ns_copy_device_id(>cls,
> > +   cls, next_id_string);
> > +   }
> > +
> > if (cid_list) {
> > info->compatible_id_list.count = cid_list->count;
> > info->compatible_id_list.list_size = cid_list->list_size;
> > @@ -507,6 +521,9 @@ cleanup:
> > if (sub) {
> > ACPI_FREE(sub);
> > }
> > +   if (cls) {
> > +   ACPI_FREE(cls);
> > +   }
> > if (cid_list) {
> > ACPI_FREE(cid_list);
> > }
> > diff --git a/drivers/acpi/acpica/utids.c b/drivers/acpi/acpica/utids.c
> > index 27431cf..9745065 100644
> > --- a/drivers/acpi/acpica/utids.c
> > +++ b/drivers/acpi/acpica/utids.c
> > @@ -416,3 +416,76 @@ cleanup:
> >
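For context on what "PCI-defined class code" means here: per the ACPI specification, _CLS returns a package of three integers (base class, subclass, programming interface), which together form the familiar six-hex-digit PCI class code. A minimal sketch of turning those three values into such a string; cls_to_string() is a hypothetical helper, not the ACPICA acpi_ut_execute_CLS() routine, and lowercase hex is an arbitrary choice.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the three _CLS package elements as a
 * "CCSSPP" class-code string. 'out' must hold at least 7 bytes.
 * Returns the number of characters written (6).
 */
int cls_to_string(uint8_t base, uint8_t sub, uint8_t prog_if, char out[7])
{
	return snprintf(out, 7, "%02x%02x%02x",
			(unsigned)base, (unsigned)sub, (unsigned)prog_if);
}
```

For example, an AHCI SATA controller (base 0x01, subclass 0x06, interface 0x01) yields "010601".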

Re: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Tejun Heo

On Thu, Apr 16, 2015 at 11:37 PM, Wang, Xiaoming
 wrote:
>> git describe --contains says 3.13 and it's about 1.5 years ago.
>>
> Yes, this kernel change is 1.5 years old.
> As we know, not all users update the kernel so frequently.
> They just use the stable one.
> We hit this issue when updating to 3.13 now.
> A lot of applications that ran well previously failed to run.
> Do you have any idea about this issue?

Not really. It's a sucky situation. How many applications are we
talking about? I tried to find information on libsecuritysdk but
couldn't find it anywhere.

-- 
tejun

RE: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Wang, Xiaoming

Dear Tejun

> -Original Message-
> From: hte...@gmail.com [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
> Sent: Friday, April 17, 2015 11:26 AM
> To: Wang, Xiaoming
> Cc: a...@linux-foundation.org; o...@redhat.com;
> andriy.shevche...@linux.intel.com; li...@rasmusvillemoes.dk;
> ebied...@xmission.com; epa...@redhat.com; chenhanx...@cn.fujitsu.com;
> t...@linutronix.de; linux-kernel@vger.kernel.org; Schallberger, Timothy M;
> Zhang, Dongxing
> Subject: Re: [PATCH] proc: move the adding option Ngid to the end of
> proc/PID/status
> 
> Hello,
> 
> On Thu, Apr 16, 2015 at 11:15 PM, Wang, Xiaoming
>  wrote:
> > I am not sure exactly which kernel introduced this Ngid.
> > I check the git and found it added in
> > commit e29cf08b05dc0b8151d65704d96d525a9e179a6b
> 
> git describe --contains says 3.13 and it's about 1.5 years ago.
> 
Yes, this kernel change is 1.5 years old.
As we know, not all users update the kernel so frequently.
They just use the stable one.
We hit this issue when updating to 3.13 now.
A lot of applications that ran well previously failed to run.
Do you have any idea about this issue?
> > And this change has already destroyed the order.
> > Moving the added Ngid field to the end of proc/PID/status would restore the
> > original order.
> 
> There isn't any assurance that changing the order won't break a different set of
> programs which make different ordering assumptions.
> The new order is already out there. This isn't a legal dispute where the original
> owner is "right".
> 
> --
> tejun
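Tejun's point is that any fixed line order is an assumption some program can break. A parser that looks fields up by name is unaffected when a new line such as "Ngid:" is inserted. A minimal sketch; status_field() is a hypothetical helper that works on a text buffer so it does not depend on a live /proc, and the sample text in the usage is an illustrative excerpt, not captured output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "<key>:" at the start of a line in
 * /proc/<pid>/status text and copy its value (leading blanks skipped)
 * into 'out' (outlen must be > 0). Returns 0 on success, -1 if absent.
 */
int status_field(const char *status, const char *key, char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *line = status;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* Match the whole key followed by ':' so "Pid" != "PPid". */
		if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *val = line + klen + 1;
			size_t vlen;

			while (val < line + len && (*val == ' ' || *val == '\t'))
				val++;
			vlen = (size_t)(line + len - val);
			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

Usage: with the excerpt "Tgid:\t42\nNgid:\t0\nPid:\t42\nPPid:\t1\n", status_field(text, "PPid", buf, sizeof(buf)) yields "1" whether or not the Ngid line is present.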

Re: [git pull] vfs part 3

2015-04-16 Thread Linus Torvalds

On Wed, Apr 15, 2015 at 9:04 PM, Al Viro  wrote:
>
> That leaves only the actual d_inode annotations series out of the
> stuff already in for-next; there are several piles of stuff from various
> folks I'm going to add there tonight, leave it all to stew until the middle
> of the next week or so and send the final pull request then.  I hoped to do
> that last bit on Monday, but since there won't be -next on Thursday and
> Friday, this will probably have to happen a couple of days later ;-/

We can just leave that for next time.

I abhor this "feed things in chunks _during_ the merge window".

If this had been four different and separate branches that did
separate cleanups and had each been in linux-next independently of
each other since before the merge window, and had been in a "ready to
merge" state, that would be one thing. But this kind of "let's feed
Linus one chunk, then work on the next one" is not how it is supposed
to work. Not during the merge window.

Linus

Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler

2015-04-16 Thread Jason Low

On Thu, 2015-04-16 at 20:24 +0200, Ingo Molnar wrote:
> Would it make sense to add a few comments to the seq field definition 
> site(s), about how it's supposed to be accessed - or to the 
> READ_ONCE()/WRITE_ONCE() sites, to keep people from wondering?

How about this:

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5a44371..63fa87f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1794,6 +1794,11 @@ static void task_numa_placement(struct task_struct *p)
u64 runtime, period;
spinlock_t *group_lock = NULL;
 
+   /*
+    * The p->mm->numa_scan_seq gets updated without
+    * exclusive access. Use READ_ONCE() here to ensure
+    * that the field is read in a single access.
+    */
seq = READ_ONCE(p->mm->numa_scan_seq);
if (p->numa_scan_seq == seq)
return;
@@ -2107,6 +2112,13 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 
 static void reset_ptenuma_scan(struct task_struct *p)
 {
+   /*
+    * We only did a read acquisition of the mmap sem, so
+    * p->mm->numa_scan_seq is written to without exclusive access.
+    * That's not much of an issue though, since this is just used
+    * for statistical sampling. Use WRITE_ONCE and READ_ONCE, which
+    * are not expensive, to avoid load/store tearing.
+    */
WRITE_ONCE(p->mm->numa_scan_seq, READ_ONCE(p->mm->numa_scan_seq) + 1);
p->mm->numa_scan_offset = 0;
 }
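A rough userspace approximation of what the two macros guarantee in this patch, for readers outside the kernel: each access goes through a volatile-qualified lvalue, so the compiler must emit one load or one store of the whole field instead of splitting, merging, or re-reading it. This is a sketch assuming the GCC/Clang `__typeof__` extension; the `*_SKETCH` macros and `struct mm_sketch` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE() (assumption: GCC/Clang
 * __typeof__). Each expands to exactly one volatile access. */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the mm fields touched by the patch. */
struct mm_sketch {
	int numa_scan_seq;
	unsigned long numa_scan_offset;
};

/* Mirrors reset_ptenuma_scan(): one load and one store of the sequence
 * number. Neither access can be torn, but the read-modify-write as a
 * whole is NOT atomic, so a racing updater can make an increment vanish. */
static void reset_scan_sketch(struct mm_sketch *mm)
{
	WRITE_ONCE_SKETCH(mm->numa_scan_seq,
			  READ_ONCE_SKETCH(mm->numa_scan_seq) + 1);
	mm->numa_scan_offset = 0;
}

/* Mirrors the check in task_numa_placement(): take a single snapshot of
 * the shared counter and compare the task's private copy against it. */
static int scan_seq_changed(struct mm_sketch *mm, int task_seq)
{
	int seq = READ_ONCE_SKETCH(mm->numa_scan_seq);

	return task_seq != seq;
}
```

As the new comment says, losing the occasional increment is tolerable for a statistical sampling counter; what the macros rule out is a torn or duplicated access.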



Re: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Tejun Heo

Hello,

On Thu, Apr 16, 2015 at 11:15 PM, Wang, Xiaoming
 wrote:
> I am not sure exactly which kernel introduced this Ngid.
> I check the git and found it added in
> commit e29cf08b05dc0b8151d65704d96d525a9e179a6b

git describe --contains says 3.13 and it's about 1.5 years ago.

> And this change has destroyed the order already.
> Moving the adding option Ngid to the end of proc/PID/status
> is to keep order .

There isn't any assurance that changing the order won't break a
different set of programs which make different ordering assumptions.
The new order is already out there. This isn't a legal dispute where
the original owner is "right".

-- 
tejun

Re: [PATCH 07/18 v3] tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values

2015-04-16 Thread Steven Rostedt

On Wed, 15 Apr 2015 20:58:31 -0400
Sasha Levin  wrote:

> On 04/15/2015 10:05 AM, Steven Rostedt wrote:
> > On Wed, 15 Apr 2015 09:22:37 -0400
> > Sasha Levin  wrote:
> > 
> >> Hey Steven,
> >>
> >> I'm seeing the following when booting:
> >>
> >> [   10.678876] BUG: KASan: out of bounds access in 
> >> trace_event_enum_update+0xb1d/0xb70 at addr a6c4dc68
> > 
> > Thanks for the report. Mind sending me over your config, and which git
> > commit was your HEAD. I can't seem to find 7858a62 from your output.
> 
> It reproduces on the latest -next kernel. I've attached my config.
> 

What version of gcc is required to run KASan? The highest version I
have to build kernels with is 4.9.0.

I have no idea what event caused the issue :-/


-- Steve

RE: [Adi-buildroot-devel] [Consult] blackfin: About one building break issue for STACKTRACE

2015-04-16 Thread Zhang, Sonic

Hi Gang,

Please only use the GCC for Blackfin 2013R1 or 2014R1 from 
https://sourceforge.net/projects/adi-buildroot/files/ . Upstream GCC5 isn't 
ported to Blackfin properly.

Regards,

Sonic

>-Original Message-
>From: Chen Gang [mailto:xili_gchen_5...@hotmail.com]
>Sent: Thursday, April 16, 2015 11:22 PM
>To: real...@gmail.com; Richard Weinberger
>Cc: adi-buildroot-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
>Subject: [Adi-buildroot-devel] [Consult] blackfin: About one building break issue for STACKTRACE
>
>Hello Maintainers:
>
>I'd like to ask about a build issue on blackfin; the error is:
>
>CC  arch/blackfin/kernel/stacktrace.o
>  arch/blackfin/kernel/stacktrace.c: In function 'save_stack_trace':
>  arch/blackfin/kernel/stacktrace.c:23:6: error: frame pointer required, but reserved
>   void save_stack_trace(struct stack_trace *trace)
>^
>  arch/blackfin/kernel/stacktrace.c:13:24: note: for 'current_frame_pointer'
>   register unsigned long current_frame_pointer asm("FP");
>  ^
>
>For me:
>
> - Originally, I treated it as a gcc issue, but after thinking it over,
>   I believe gcc is OK:
>
>"-fomit-frame-pointer" is needed by extern "FP" pointer.
>
>"-fomit-frame-pointer" is against "-pg" (they can not be together).
>
> - For kernel:
>
>STACKTRACE needs "-fomit-frame-pointer", and FUNCTION_TRACER will
>enable "-pg",
>
>FUNCTION_TRACER and STACKTRACE are related.
>
>The related commit: "1c873be Blackfin: initial support for ftrace"
>(the commit time point is Jun 9 2009).
>
> - Since this commit, the build has never passed with upstream
>   blackfin gcc5:
>
>make defconfig && make menuconfig
>
>enable FUNCTION_TRACER (which also enable STACKTRACE)
>
>make (which will cause building break)
>
>So I'd like to ask:
>
> - Is it OK to use upstream gcc5 for blackfin? (Or which gcc version is
>   suitable for building the blackfin Linux kernel?)
>
> - Did the original commit build successfully (e.g. with an older gcc
>   version)?
>
> - How should this issue be fixed?
>
>
>Any ideas, suggestions, and additions from members are welcome.
>
>Thanks.
>--
>Chen Gang
>
>Open, share, and attitude like air, water, and life which God blessed
>


RE: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Wang, Xiaoming

Dear Tejun,

> -Original Message-
> From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
> Sent: Friday, April 17, 2015 10:56 AM
> To: Wang, Xiaoming
> Cc: a...@linux-foundation.org; o...@redhat.com;
> andriy.shevche...@linux.intel.com; li...@rasmusvillemoes.dk;
> ebied...@xmission.com; epa...@redhat.com; chenhanx...@cn.fujitsu.com;
> t...@linutronix.de; linux-kernel@vger.kernel.org; Schallberger, Timothy M;
> Zhang, Dongxing
> Subject: Re: [PATCH] proc: move the adding option Ngid to the end of
> proc/PID/status
> 
> On Fri, Apr 17, 2015 at 10:13:15AM +0800, Wang Xiaoming wrote:
> > Move debugging has been done and the following Kernel issue was found
> > with a number of applications.
> > Take a look at: (even though the comments are for Weibo.browser they
> > also pertain to other apps that use Libsecuritysdk-x.x.x.so
> >
> > In kernel(3.14) is a little different than before it will generate
> > /proc/PID/status in this way:
> > Name: a.weibo.browser
> > State: T (stopped)
> > Tgid: 8487
> > Ngid: 0  add in kernel after (3.11 maybe)
> 
> Well, that's kinda hilarious and I don't know.  3.11 is way back and what if there
> are others depending on the current ordering?  Both situations kinda suck so
> what's the point of changing?
> 
I am not sure exactly which kernel introduced Ngid.
I checked git and found it was added in
commit e29cf08b05dc0b8151d65704d96d525a9e179a6b.
That change already broke the original order.
Moving Ngid to the end of /proc/PID/status
restores the original order.
> --
> tejun

Re: linux-next: manual merge of the drm tree with the v4l-dvb tree

2015-04-16 Thread Dave Airlie

> Am Mittwoch, den 15.04.2015, 13:33 +1000 schrieb Stephen Rothwell:
> > Hi Dave,
> > 
> > Today's linux-next merge of the drm tree got a conflict in
> > Documentation/DocBook/media/v4l/subdev-formats.xml between commit
> > 7b0fd4568bee ("[media] v4l: Add RBG and RGB 8:8:8 media bus formats on
> > 24 and 32 bit busses") and e8b2d7a565ae ("[media] v4l: Sort YUV formats
> > of v4l2_mbus_pixelcode") from the v4l-dvb tree and commits 08c38458be7e
> > ("Add BGR888_1X24 and GBR888_1X24 media bus formats"), 0fc63eb104d7
> > ("Add YUV8_1X24 media bus format") and 203508ef52e3 ("Add
> > RGB666_1X24_CPADHI media bus format") from the drm tree.
> > 
> > I fixed it up (almost certainly incorrectly - see below) and can carry
> > the fix as necessary.  Please sort out who "owns" this file and try to
> > coordinate updates to it.
> 
> Together with the corresponding fixup for 
> include/uapi/linux/media-bus-format.h,
> how about this:

This should never have gone into my tree unless someone in the
v4l tree knew it was coming.

In future, please merge the media-bus-formats changes through both trees,
providing a stable git tree for both maintainers to pull from. Though this
may not avoid all bad cases, it will hopefully avoid this sort of mess.

I'm not really sure how best to clean this one up. I think I'd want
patches to my tree that just use the correct values; then it would just
be a normal conflict on merging, instead of renumbering userspace-visible
values.

Dave.

Re: [PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Tejun Heo

On Fri, Apr 17, 2015 at 10:13:15AM +0800, Wang Xiaoming wrote:
> Move debugging has been done and the following Kernel issue
> was found with a number of applications.
> Take a look at: (even though the comments are for Weibo.browser
> they also pertain to other apps that use Libsecuritysdk-x.x.x.so
> 
> In kernel(3.14) is a little different than before
> it will generate /proc/PID/status in this way:
> Name: a.weibo.browser
> State: T (stopped)
> Tgid: 8487
> Ngid: 0  add in kernel after (3.11 maybe)

Well, that's kinda hilarious and I don't know.  3.11 is way back and
what if there are others depending on the current ordering?  Both
situations kinda suck so what's the point of changing?

-- 
tejun

Re: [RFC][PATCH v2 08/13] usb: otg: hub: Notify OTG fsm when A device sets b_hnp_enable

2015-04-16 Thread Peter Chen

On Tue, Apr 14, 2015 at 01:41:55PM +0300, Roger Quadros wrote:
> This is the a_set_b_hnp_enable flag in the OTG state machine
> diagram and must be set when the A-Host has successfully set
> the b_hnp_enable feature of the OTG-B-Peripheral attached to it.
> 
> When this bit changes we kick our OTG FSM to make note of the
> change and act accordingly.
> 
> Signed-off-by: Roger Quadros 
> ---
>  drivers/usb/core/hub.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index d7c3d5a..ab0d498 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2270,6 +2271,9 @@ static int usb_enumerate_device_otg(struct usb_device *udev)
>   "can't set HNP mode: %d\n",
>   err);
>   bus->b_hnp_enable = 0;
> + } else {
> + /* notify OTG fsm about a_set_b_hnp_enable */
> + usb_otg_kick_fsm(udev->bus->controller);
>   }
>   }
>   }
> @@ -4244,8 +4248,13 @@ hub_port_init (struct usb_hub *hub, struct usb_device *udev, int port1,
>*/
>   if (!hdev->parent) {
>   delay = HUB_ROOT_RESET_TIME;
> - if (port1 == hdev->bus->otg_port)
> + if (port1 == hdev->bus->otg_port) {
>   hdev->bus->b_hnp_enable = 0;
> +#ifdef CONFIG_USB_OTG
> + /* notify OTG fsm about a_set_b_hnp_enable change */
> + usb_otg_kick_fsm(hdev->bus->controller);
> +#endif
> + }
>   }
>  
>   /* Some low speed devices have problems with the quick delay, so */
> -- 
> 2.1.0
> 

Since the fsm has already been kicked, the only thing we need is to update
fsm.a_set_b_hnp_enable, but this flag is missing from the current fsm
structure.
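One possible reading of the suggestion, sketched with hypothetical names (`struct otg_fsm_sketch` and `otg_set_a_set_b_hnp_enable()` do not exist in the kernel): add the missing input to the state-machine structure and let the hub code record it, re-evaluating the state machine only when the value actually changes, so that a redundant kick becomes a no-op. Whether any kick is needed at all is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of an OTG state-machine input structure, including
 * the a_set_b_hnp_enable input the current structure lacks. */
struct otg_fsm_sketch {
	bool a_set_b_hnp_enable;
	unsigned int kick_count;	/* counts state-machine re-evaluations */
};

/* Hypothetical: re-run the state machine (here it only counts calls). */
static void otg_fsm_kick_sketch(struct otg_fsm_sketch *fsm)
{
	fsm->kick_count++;
}

/* Hypothetical helper for the hub code: record the new input and
 * re-evaluate only when the value actually changes. */
static void otg_set_a_set_b_hnp_enable(struct otg_fsm_sketch *fsm, bool on)
{
	if (fsm->a_set_b_hnp_enable == on)
		return;
	fsm->a_set_b_hnp_enable = on;
	otg_fsm_kick_sketch(fsm);
}
```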

-- 

Best Regards,
Peter Chen

Re: [RFC][PATCH v2 06/13] usb: hcd: Add hcd add/remove functions for OTG use

2015-04-16 Thread Peter Chen

On Tue, Apr 14, 2015 at 01:41:53PM +0300, Roger Quadros wrote:
> The existing usb_add/remove_hcd() functionality
> remains unchanged for non-OTG devices. For OTG
> devices they only register the HCD with the OTG core.
> 
> Introduce _usb_add/remove_hcd() for use by OTG core.
> These functions actually add/remove the HCD.

Would you please explain why additional _usb_add/remove_hcd are needed?

Peter
> 
> Signed-off-by: Roger Quadros 
> ---
>  drivers/usb/common/usb-otg.c | 14 +++---
>  drivers/usb/core/hcd.c   | 24 ++--
>  include/linux/usb/hcd.h  |  3 +++
>  3 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/usb/common/usb-otg.c b/drivers/usb/common/usb-otg.c
> index e848e08..860e2e7 100644
> --- a/drivers/usb/common/usb-otg.c
> +++ b/drivers/usb/common/usb-otg.c
> @@ -198,18 +198,18 @@ static int usb_otg_start_host(struct otg_fsm *fsm, int on)
>   otgd->start_host(fsm, on);
>  
>   /* start host */
> - usb_add_hcd(otgd->primary_hcd.hcd, otgd->primary_hcd.irqnum,
> - otgd->primary_hcd.irqflags);
> + _usb_add_hcd(otgd->primary_hcd.hcd, otgd->primary_hcd.irqnum,
> +  otgd->primary_hcd.irqflags);
>   if (otgd->shared_hcd.hcd) {
> - usb_add_hcd(otgd->shared_hcd.hcd,
> - otgd->shared_hcd.irqnum,
> - otgd->shared_hcd.irqflags);
> + _usb_add_hcd(otgd->shared_hcd.hcd,
> +  otgd->shared_hcd.irqnum,
> +  otgd->shared_hcd.irqflags);
>   }
>   } else {
>   /* stop host */
>   if (otgd->shared_hcd.hcd)
> - usb_remove_hcd(otgd->shared_hcd.hcd);
> - usb_remove_hcd(otgd->primary_hcd.hcd);
> + _usb_remove_hcd(otgd->shared_hcd.hcd);
> + _usb_remove_hcd(otgd->primary_hcd.hcd);
>  
>   /* OTG device operations */
>   if (otgd->start_host)
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 45a915c..9a9c0f7 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -46,6 +46,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "usb.h"
>  
> @@ -2622,7 +2623,7 @@ static void usb_put_invalidate_rhdev(struct usb_hcd *hcd)
>   * buffers of consistent memory, register the bus, request the IRQ line,
>   * and call the driver's reset() and start() routines.
>   */
> -int usb_add_hcd(struct usb_hcd *hcd,
> +int _usb_add_hcd(struct usb_hcd *hcd,
>   unsigned int irqnum, unsigned long irqflags)
>  {
>   int retval;
> @@ -2827,6 +2828,17 @@ err_phy:
>   }
>   return retval;
>  }
> +EXPORT_SYMBOL_GPL(_usb_add_hcd);
> +
> +int usb_add_hcd(struct usb_hcd *hcd,
> + unsigned int irqnum, unsigned long irqflags)
> +{
> + /* If OTG device, OTG core takes care of adding HCD */
> + if (usb_otg_register_hcd(hcd, irqnum, irqflags))
> + return _usb_add_hcd(hcd, irqnum, irqflags);
> +
> + return 0;
> +}
>  EXPORT_SYMBOL_GPL(usb_add_hcd);
>  
>  /**
> @@ -2837,7 +2849,7 @@ EXPORT_SYMBOL_GPL(usb_add_hcd);
>   * Disconnects the root hub, then reverses the effects of usb_add_hcd(),
>   * invoking the HCD's stop() method.
>   */
> -void usb_remove_hcd(struct usb_hcd *hcd)
> +void _usb_remove_hcd(struct usb_hcd *hcd)
>  {
>   struct usb_device *rhdev = hcd->self.root_hub;
>  
> @@ -2911,6 +2923,14 @@ void usb_remove_hcd(struct usb_hcd *hcd)
>  
>   usb_put_invalidate_rhdev(hcd);
>  }
> +EXPORT_SYMBOL_GPL(_usb_remove_hcd);
> +
> +void usb_remove_hcd(struct usb_hcd *hcd)
> +{
> + /* If OTG device, OTG core takes care of stopping HCD */
> + if (usb_otg_unregister_hcd(hcd))
> + _usb_remove_hcd(hcd);
> +}
>  EXPORT_SYMBOL_GPL(usb_remove_hcd);
>  
>  void
> diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
> index 68b1e83..7993ae7 100644
> --- a/include/linux/usb/hcd.h
> +++ b/include/linux/usb/hcd.h
> @@ -433,7 +433,10 @@ extern void usb_put_hcd(struct usb_hcd *hcd);
>  extern int usb_hcd_is_primary_hcd(struct usb_hcd *hcd);
>  extern int usb_add_hcd(struct usb_hcd *hcd,
>   unsigned int irqnum, unsigned long irqflags);
> +extern int _usb_add_hcd(struct usb_hcd *hcd,
> + unsigned int irqnum, unsigned long irqflags);
>  extern void usb_remove_hcd(struct usb_hcd *hcd);
> +extern void _usb_remove_hcd(struct usb_hcd *hcd);
>  extern int usb_hcd_find_raw_port_number(struct usb_hcd *hcd, int port1);
>  
>  struct platform_device;
> -- 
> 2.1.0
> 
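Reading the quoted diff, the split appears to exist because usb_add_hcd() now diverts OTG controllers to the OTG core, so the core needs an entry point that really starts the HCD without being diverted again. A toy model of that dispatch, with hypothetical `*_sketch` names and a boolean "controller" instead of struct usb_hcd; it mirrors the return-value convention visible in the patch (a nonzero result from the registration call means "not OTG, add directly") but is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy controller state; the real functions take struct usb_hcd * plus
 * IRQ arguments. All names here are hypothetical stand-ins. */
struct hcd_sketch {
	bool is_otg;			/* managed by the OTG core? */
	bool running;			/* HCD actually added and started */
	bool registered_with_otg;	/* handed over to the OTG core */
};

/* Plays the role of _usb_add_hcd(): really add and start the HCD. */
static int add_hcd_direct_sketch(struct hcd_sketch *hcd)
{
	hcd->running = true;
	return 0;
}

/* Plays the role of usb_otg_register_hcd(): following the patch, a
 * nonzero return means "not an OTG controller, add it directly". */
static int otg_register_hcd_sketch(struct hcd_sketch *hcd)
{
	if (!hcd->is_otg)
		return -1;
	hcd->registered_with_otg = true;	/* started later by the FSM */
	return 0;
}

/* Plays the role of usb_add_hcd(): unchanged for non-OTG drivers. */
static int add_hcd_sketch(struct hcd_sketch *hcd)
{
	if (otg_register_hcd_sketch(hcd))
		return add_hcd_direct_sketch(hcd);
	return 0;
}

/* Plays the role of usb_otg_start_host(fsm, 1). Calling add_hcd_sketch()
 * here would only register the HCD with the OTG core again, so the core
 * has to use the direct variant. */
static int otg_start_host_sketch(struct hcd_sketch *hcd)
{
	return add_hcd_direct_sketch(hcd);
}
```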

-- 

Best Regards,
Peter Chen

Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck


hi Vivien,

On 04/16/2015 03:05 PM, Vivien Didelot wrote:

Hi Guenter,


  switch (index) {
+case 0: /* temp1_input */
+if (drv->get_temp)
+mode |= S_IRUGO;


This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real world example where a chip would support setting
a limit but not reading it ?


I have no such example. I just did not see why this couldn't be allowed
(e.g. setting only set_temp_limit and get_temp_alarm looks fine to me).
But if you say that get_temp should be mandatory, I'm OK with that.


write-only attributes are not defined in the hwmon ABI. If the 'sensors'
command encounters such an attribute, it will create an error message
each time it executes. That doesn't sound very useful to me.

If a chip - for whatever reason - does not have a limit register
but an alarm register or flag, its temperature limit is usually hard-coded
and can be reported this way (the AMD temperature sensor driver does this,
for example). If there is ever a need to support the alarm-register-only
situation for some odd reason, we can add the code at the time.
For now, it just seems to me that you are adding complexity to solve
some theoretic problem which is very unlikely to occur in the real world.
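A sketch of the visibility policy described above, under stated assumptions: the callback struct is a hypothetical, simplified subset of the DSA switch driver operations, and the hwmon device is assumed to be registered only when get_temp exists. temp1_input is always readable, a limit is exposed only if it can be read, and nothing is ever write-only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Userspace spellings of the kernel's S_IRUGO and S_IWUSR. */
#define MODE_RO (S_IRUSR | S_IRGRP | S_IROTH)
#define MODE_W  (S_IWUSR)

/* Hypothetical subset of the driver callbacks discussed here; only
 * whether each pointer is set matters, so signatures are simplified. */
struct temp_ops_sketch {
	int (*get_temp)(int *temp);
	int (*get_temp_limit)(int *temp);
	int (*set_temp_limit)(int temp);
	int (*get_temp_alarm)(int *alarm);
};

enum { TEMP1_INPUT, TEMP1_MAX, TEMP1_MAX_ALARM };

/* Placeholder callbacks so the example can build non-NULL ops tables. */
static int stub_read(int *val) { *val = 0; return 0; }
static int stub_write(int val) { (void)val; return 0; }

/* Returns the sysfs mode for one attribute, or 0 to hide it. */
static unsigned int temp_attr_mode(const struct temp_ops_sketch *ops, int index)
{
	switch (index) {
	case TEMP1_INPUT:
		return MODE_RO;		/* get_temp assumed mandatory */
	case TEMP1_MAX:
		if (!ops->get_temp_limit)
			return 0;	/* never expose a write-only limit */
		return ops->set_temp_limit ? (MODE_RO | MODE_W) : MODE_RO;
	case TEMP1_MAX_ALARM:
		return ops->get_temp_alarm ? MODE_RO : 0;
	default:
		return 0;
	}
}
```

A chip whose limit is hard-coded would simply provide a get_temp_limit that returns the constant, giving a read-only temp1_max.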


> The primary goal of this patchset was to use DEVICE_ATTR_RW to declare
> temp1_max, instead of reflecting the minimal permissions needed.


Then why don't you just do that and nothing else ? The goal should be
to simplify code, not to make it more complicated. If the result isn't
less code, I don't think it is worth it.

Thanks,
Guenter


[PATCH] proc: move the adding option Ngid to the end of proc/PID/status

2015-04-16 Thread Wang Xiaoming

More debugging has been done, and the following kernel issue
was found with a number of applications.
Take a look at the example below (even though the comments are for Weibo.browser,
they also pertain to other apps that use Libsecuritysdk-x.x.x.so).

In kernel 3.14, /proc/PID/status is generated a little differently
than before:
Name: a.weibo.browser
State: T (stopped)
Tgid: 8487
Ngid: 0  add in kernel after (3.11 maybe)
Pid: 8487
PPid: 139
TracerPid: 0 -=> line 7
……

But on the previous kernel (3.11), it normally looks like this:
Name: a.weibo.browser
State: S (sleeping)
Tgid: 2109
Pid: 2109
PPid: 231
TracerPid: 0 ---=> line 6
……

Weibo always assumes that "TracerPid" is on line 6 of the status file.
On 3.14 it therefore reads "PPid: 139" instead of "TracerPid: 0",
sees a nonzero tracer PID, and kills the process because it believes
a debugger is attached.
Other applications hit the same issue.

Since Ngid was added later, it should be added at the end of task_state().
It is better to keep the format compatible to avoid such issues.
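The failure can be reproduced in userspace. A minimal sketch (the application's real parser is not shown here, so its behaviour is an assumption) contrasting a reader that takes the value from a fixed line number with one that looks the field up by its name; the sample text in the usage mirrors the two layouts above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fragile: parse the number after the ':' on a fixed (1-based) line.
 * Returns -1 if the line does not exist or has no ':'. */
static long value_at_line(const char *status, int lineno)
{
	const char *p = status;
	const char *colon, *nl;

	for (int i = 1; i < lineno; i++) {
		p = strchr(p, '\n');
		if (!p)
			return -1;
		p++;
	}
	colon = strchr(p, ':');
	nl = strchr(p, '\n');
	if (!colon || (nl && colon > nl))
		return -1;
	return strtol(colon + 1, NULL, 10);
}

/* Order-independent: find the line that starts with "key:" and parse its
 * value. Returns -1 if the key is absent. */
static long value_for_key(const char *status, const char *key)
{
	size_t len = strlen(key);
	const char *p = status;

	while (p && *p) {
		if (strncmp(p, key, len) == 0 && p[len] == ':')
			return strtol(p + len + 1, NULL, 10);
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```

With the 3.14 layout, value_at_line(status, 6) returns 139 (the PPid), while value_for_key(status, "TracerPid") returns 0 on both layouts and keeps working if further fields are added.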

Signed-off-by: Schallberger, Timothy M 
Signed-off-by: Dongxing Zhang 
Signed-off-by: Wang Xiaoming 
---
 fs/proc/array.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index fd02a9e..86dcd2b 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -163,15 +163,15 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
seq_printf(m,
"State:\t%s\n"
"Tgid:\t%d\n"
-   "Ngid:\t%d\n"
"Pid:\t%d\n"
"PPid:\t%d\n"
"TracerPid:\t%d\n"
"Uid:\t%d\t%d\t%d\t%d\n"
"Gid:\t%d\t%d\t%d\t%d\n"
+   "Ngid:\t%d\n"
"FDSize:\t%d\nGroups:\t",
get_task_state(p),
-   tgid, ngid, pid_nr_ns(pid, ns), ppid, tpid,
+   tgid, pid_nr_ns(pid, ns), ppid, tpid,
from_kuid_munged(user_ns, cred->uid),
from_kuid_munged(user_ns, cred->euid),
from_kuid_munged(user_ns, cred->suid),
@@ -180,6 +180,7 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
from_kgid_munged(user_ns, cred->egid),
from_kgid_munged(user_ns, cred->sgid),
from_kgid_munged(user_ns, cred->fsgid),
+   ngid,
max_fds);
 
group_info = cred->group_info;
-- 
1.7.9.5


[PATCH 2/2] mm/hwpoison-inject: check PageLRU of hpage

2015-04-16 Thread Naoya Horiguchi

Hwpoison injector checks PageLRU of the raw target page to find out whether
the page is an appropriate target, but current code now filters out thp tail
pages, which prevents us from testing for such cases via this interface.
So let's check hpage instead of p.
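Why checking `p` misses THP tail pages: in a compound page, page-level state such as the LRU flag is kept on the head page, so a tail page has to be mapped back to its head first (the kernel's compound_head() does this). A toy model with hypothetical types, not the kernel's struct page, showing the raw check rejecting a tail page that the head-based check accepts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy compound page: only the head carries the LRU flag; tail pages
 * point back to their head. Illustrative only, not struct page. */
struct page_sketch {
	bool lru;			/* PG_lru, meaningful on the head only */
	struct page_sketch *head;	/* NULL for a head or a normal page */
};

/* Analogue of compound_head(): map any page to the page that holds the
 * flags for the whole compound page. */
static struct page_sketch *head_of(struct page_sketch *p)
{
	return p->head ? p->head : p;
}

/* The filter before and after the patch. */
static bool injectable_raw(struct page_sketch *p)  { return p->lru; }
static bool injectable_head(struct page_sketch *p) { return head_of(p)->lru; }
```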

Signed-off-by: Naoya Horiguchi 
---
 mm/hwpoison-inject.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git v4.0.orig/mm/hwpoison-inject.c v4.0/mm/hwpoison-inject.c
index 2b3f933e3282..4ca5fe0042e1 100644
--- v4.0.orig/mm/hwpoison-inject.c
+++ v4.0/mm/hwpoison-inject.c
@@ -34,12 +34,12 @@ static int hwpoison_inject(void *data, u64 val)
if (!hwpoison_filter_enable)
goto inject;
 
-   if (!PageLRU(p) && !PageHuge(p))
-   shake_page(p, 0);
+   if (!PageLRU(hpage) && !PageHuge(p))
+   shake_page(hpage, 0);
/*
 * This implies unable to support non-LRU pages.
 */
-   if (!PageLRU(p) && !PageHuge(p))
+   if (!PageLRU(hpage) && !PageHuge(p))
goto put_out;
 
/*
-- 
2.1.0

[PATCH 1/2] mm/hwpoison-inject: fix refcounting in no-injection case

2015-04-16 Thread Naoya Horiguchi

Hwpoison injection via debugfs:hwpoison/corrupt-pfn takes a refcount of
the target page. But current code doesn't release it if the target page
is not supposed to be injected, which results in a memory leak.
This patch simply adds the refcount releasing code.

Signed-off-by: Naoya Horiguchi 
---
 mm/hwpoison-inject.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git v4.0.orig/mm/hwpoison-inject.c v4.0/mm/hwpoison-inject.c
index 329caf56df22..2b3f933e3282 100644
--- v4.0.orig/mm/hwpoison-inject.c
+++ v4.0/mm/hwpoison-inject.c
@@ -40,7 +40,7 @@ static int hwpoison_inject(void *data, u64 val)
 * This implies unable to support non-LRU pages.
 */
if (!PageLRU(p) && !PageHuge(p))
-   return 0;
+   goto put_out;
 
/*
 * do a racy check with elevated page count, to make sure PG_hwpoison
@@ -52,11 +52,14 @@ static int hwpoison_inject(void *data, u64 val)
err = hwpoison_filter(hpage);
unlock_page(hpage);
if (err)
-   return 0;
+   goto put_out;
 
 inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
return memory_failure(pfn, 18, MF_COUNT_INCREASED);
+put_out:
+   put_page(hpage);
+   return 0;
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
-- 
2.1.0

Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck


On 04/16/2015 11:51 AM, Geert Uytterhoeven wrote:

> On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck  wrote:
>> On 04/15/2015 10:12 PM, Guenter Roeck wrote:
>>>
>>> Return correct error code if _mv88e6xxx_reg_read returns an error.
>>>
>>> Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging support")
>>> Signed-off-by: Guenter Roeck 
>>
>> I should have given proper credit.
>>
>> Reported-by: kbuild test robot 
>>
>> For the curious, neither W=1, W=2, C=1, C=2, nor sparse report this problem,
>> at least not with gcc 4.9.1.
>
> Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime
> soon as it finds real bugs in every new kernel release) says:
>
>     drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
>     drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in this function
>
> Even after your patch, as it missed one case (patch sent).
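For readers wondering what gcc 4.1.2 is flagging: the typical shape is an error path that jumps to a common exit label before the variable returned there has been assigned. A minimal sketch of the safe shape, using a hypothetical fake switch and register accessors (not the driver's code): every path that reaches `out` has assigned `ret`, and a failed read returns the read's own negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PORT_STATE_MASK 0x0003	/* hypothetical field layout */

/* Hypothetical device model standing in for the switch registers. */
struct fake_switch {
	int port_ctrl;		/* register contents */
	bool read_fails;	/* inject a read error */
};

/* Returns the register value (>= 0) or a negative errno. */
static int fake_reg_read(struct fake_switch *sw)
{
	return sw->read_fails ? -EIO : sw->port_ctrl;
}

static int fake_reg_write(struct fake_switch *sw, int val)
{
	sw->port_ctrl = val;
	return 0;
}

/* Read-modify-write with a single exit label. 'ret' is assigned before
 * every jump to 'out', so no path returns an uninitialized value. */
static int set_port_state_sketch(struct fake_switch *sw, int state)
{
	int ret;

	ret = fake_reg_read(sw);
	if (ret < 0)
		goto out;	/* ret already holds the read's error code */

	ret = (ret & ~PORT_STATE_MASK) | (state & PORT_STATE_MASK);
	ret = fake_reg_write(sw, ret);
out:
	return ret;
}
```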



Grumble. So much for trusting tools :-(

Thanks for fixing this!

Guenter


Re: AM335x OMAP2 common clock external fixed-clock registration

2015-04-16 Thread Michael Welling

On Fri, Apr 17, 2015 at 01:23:50AM +0200, Sebastian Hesselbarth wrote:
> On 17.04.2015 00:09, Michael Welling wrote:
> >On Thu, Apr 16, 2015 at 10:37:19PM +0200, Sebastian Hesselbarth wrote:
> >>On 16.04.2015 18:17, Michael Welling wrote:
> >>>On Thu, Apr 16, 2015 at 07:32:32AM +0300, Tero Kristo wrote:
> On 04/15/2015 11:51 PM, Michael Welling wrote:
> >On Wed, Apr 15, 2015 at 01:45:53PM -0700, Mike Turquette wrote:
> >>On Wed, Apr 15, 2015 at 12:47 PM, Michael Welling  
> >>wrote:
> >>[...]
> >>>There is still an issue with the si5351.
> >>>
> >>>I had to comment out the clk_put here for the frequency to show up:
> >>>http://lxr.free-electrons.com/source/drivers/clk/clk-si5351.c#L1133
> >>>
> >>>Ideas?
> >>
> >>What is the most recent upstream commit that you are based on?
> >
> >I am working from 4.0.0-rc7.
> >
> >7b43b47373d40d557cd7e1a84a0bd8ebc4d745ab
> 
> Hmm, I wonder why si5351 calls clk_put immediately after of_clk_get
> in the first place, as far as I understand this destroys the clock
> handle, which is still being used later in the code.
> >>>
> >>>Not sure how this ever worked. This has been in the code since the
> >>>initial commit.
> >>
> >>The reason it worked before may be related with recent rework of
> >>clk_put() itself and clk cookies instead of pointers. I lost track on
> >>the recent clk subsystem changes here, sorry.
> >>
> >>However, droping the clk immediately surely isn't right.
> >>The thing is, we can remove the clk_put() just because there is no
> >>_remove() for that driver. I remember that back in the days the driver
> >>was mainlined, clk removal wasn't too easy.
> >>
> >>FWIW, as soon as _remove() support will be added by someone, we'll have
> >>to rethink passing struct clk* by platform_data or at least
> >>double-check if we ever used [of_]clk_get() to obtain it.
> >>
> >>Mind to send a patch removing the clk_put() on !IS_ERR and add a proper
> >>error path instead? While of_clk_get() is the only calls that need
> >>cleanup on error in si5351_dt_parse() we should probably move that
> >>calls to the end of this function. Otherwise we'd also have to cleanup
> >>on every of_parse_foo() failure.
> >
> >What would be the proper error path?
> >What cleanup is required?
> 
> A proper error path would be to release any claimed resource
> on any error. If you look at the code, the only resources that
> need to be released are the two clocks in question.

So for every error return in the probe function and in si5351_dt_parse,
it needs to call clk_put first, right?

See attached patch to see if we are on the same page.
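A sketch of the error-path ordering Sebastian proposes above (parse the properties first, take the clocks last), using hypothetical stand-ins for of_clk_get()/clk_put() and the platform data; it is not the attached patch, and for simplicity it treats both clocks as required.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted clock; 'available' simulates a lookup failure. */
struct clk_sketch {
	int refs;
	bool available;
};

/* Hypothetical platform data holding the two input clocks. */
struct pdata_sketch {
	struct clk_sketch *xtal;
	struct clk_sketch *clkin;
};

static struct clk_sketch *clk_get_sketch(struct clk_sketch *c)
{
	if (!c->available)
		return NULL;
	c->refs++;
	return c;
}

static void clk_put_sketch(struct clk_sketch *c)
{
	c->refs--;
}

/* Parse first, take clocks last: a property-parsing failure needs no
 * cleanup, and the only unwinding left is dropping the first clock if
 * the second lookup fails. On success the references are kept; they
 * must be dropped on any later probe error or on removal. */
static int dt_parse_sketch(bool props_ok, struct clk_sketch *xtal,
			   struct clk_sketch *clkin, struct pdata_sketch *pdata)
{
	if (!props_ok)			/* stands in for of_property_* failures */
		return -EINVAL;

	pdata->xtal = clk_get_sketch(xtal);
	if (!pdata->xtal)
		return -ENOENT;

	pdata->clkin = clk_get_sketch(clkin);
	if (!pdata->clkin) {
		clk_put_sketch(pdata->xtal);	/* undo the first lookup */
		pdata->xtal = NULL;
		return -ENOENT;
	}
	return 0;
}
```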

> 
> >It should be noted that there are more deep rooted issues with the driver
> >that I have noticed. For one the driver behaves differently if the debugging
> >is on and when it is off.
> 
> I guess you mean #define DEBUG in the driver?

Yes.

> 
> >Here is what the kernel reports with debugging off:
> 
> Do you have any measurement equipment to check what is actually set?

Yes, I have an oscilloscope here at my desk.
The reported numbers do not always correspond to the actual output.

The ms2 output appears to have stopped working altogether at some point while
testing. I may have to solder a new chip on there.

Could misconfiguration damage the chip?

> 
> >root@som3517-som200:~# cat /sys/kernel/debug/clk/clk_summary
> >clock enable_cnt  prepare_cntrate   
> > accuracy   phase
> >
> >  ref27002700
> >   0 0
> > xtal  002700
> >   0 0
> >pllb   00   59994
> >   0 0
> >   ms0 001249
> >   0 0
> >  clk0 001249
> >   0 0
> >plla   00   59994
> >   0 0
> >   ms2 00 8219178
> >   0 0
> >  clk2 00 8219178
> >   0 0
> >   ms1 0094117646
> >   0 0
> >  clk1 0094117646
> >   0 0
> >
> >Here is what the kernel reports with debugging on:
> >clock enable_cnt  prepare_cntrate   
> > accuracy   phase
> >
> >  ref27002700
> >   0 0
> > xtal  002700
> >   0 0
> >pllb   0

Re: [RFC PATCH] Security: ignore private inode from security_file_receive

2015-04-16 Thread Seung-Woo Kim

Hello,

On 2015-04-16 22:48, Stephen Smalley wrote:
> On 04/16/2015 09:40 AM, Seung-Woo Kim wrote:
>> The dma-buf fd from anon_inode can be shared across processes, but
>> there is no way to set a security permission for the fd. So this
>> patch simply ignores private inodes in security_file_receive.
>>
>> Signed-off-by: Seung-Woo Kim 
>> ---
>>
>> If a security module such as Smack is enabled, the dma-buf fd cannot be
>> shared between processes via a Unix domain socket. I am not familiar with
>> security, so I am not sure whether this kind of patch is acceptable.
>>
>> Is there another way to share a dma-buf fd via a socket with a security check?
>>
>> Best Regards,
>> - Seung-Woo Kim
>>
>> ---
>>  security/security.c |3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/security/security.c b/security/security.c
>> index 730ac65..c57354c 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -810,6 +810,9 @@ int security_file_send_sigiotask(struct task_struct *tsk,
>>  
>>  int security_file_receive(struct file *file)
>>  {
>> +
>> +if (unlikely(IS_PRIVATE(file->f_path.dentry->d_inode)))
>> +return 0;
>>  return security_ops->file_receive(file);
>>  }
> 
> SELinux handles this internally; see its inode_has_perm() function.
> Doing it here would prevent any security module checking at all, even of
> the struct file, which SELinux does presently do (selinux_file_receive
> calls file_has_perm which applies the fd use check and then calls
> inode_has_perm on the inode).  Unless you are saying that the
> file->f_security field is also not being set correctly.

Thanks for the suggestion. I will try to handle it on the Smack side.

Best Regards,
- Seung-Woo Kim

> 
> 
> 

-- 
Seung-Woo Kim
Samsung Software R&D Center
--

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 6/6] Input: elan_i2c - Add hover detection flag

2015-04-16 Thread duson

When a hover event arrives, set ABS_MT_DISTANCE to 1; otherwise clear it to 0.
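A minimal userspace sketch of the decoding this patch adds, for illustration only: bit 6 (0x40) of the byte at ETP_HOVER_INFO_OFFSET marks a hover, which is reported as ABS_MT_DISTANCE = 1 with the pressure forced to 0. ETP_MAX_FINGERS = 5 and the contact_report struct are assumptions, scaled_pressure stands in for the driver's per-finger pressure computation, and the real driver only emits these values for valid contacts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets and the hover bit follow the patch; ETP_MAX_FINGERS is assumed. */
#define ETP_TOUCH_INFO_OFFSET 3
#define ETP_HOVER_INFO_OFFSET 30
#define ETP_MAX_REPORT_LEN    34
#define ETP_MAX_FINGERS       5
#define ETP_HOVER_BIT         0x40

/* Hypothetical container for the values the driver would report. */
struct contact_report {
	bool valid;
	int distance;   /* value for ABS_MT_DISTANCE */
	int pressure;   /* value for ABS_MT_PRESSURE */
};

static void decode_report(const uint8_t packet[ETP_MAX_REPORT_LEN],
			  int scaled_pressure,
			  struct contact_report out[ETP_MAX_FINGERS])
{
	uint8_t tp_info = packet[ETP_TOUCH_INFO_OFFSET];
	bool hover_event = packet[ETP_HOVER_INFO_OFFSET] & ETP_HOVER_BIT;

	for (int i = 0; i < ETP_MAX_FINGERS; i++) {
		/* Finger i is present when bit (3 + i) of tp_info is set. */
		out[i].valid = tp_info & (1U << (3 + i));
		out[i].distance = hover_event;   /* 1 while hovering */
		out[i].pressure = hover_event ? 0 : scaled_pressure;
	}
}
```

The hover flag is per report rather than per finger, which is why the patch reads it once before the contact loop.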

Signed-off-by: Duson Lin 
---
 drivers/input/mouse/elan_i2c_core.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c
index 6333ba6..da7893f 100644
--- a/drivers/input/mouse/elan_i2c_core.c
+++ b/drivers/input/mouse/elan_i2c_core.c
@@ -52,6 +52,7 @@
 #define ETP_REPORT_ID_OFFSET   2
 #define ETP_TOUCH_INFO_OFFSET  3
 #define ETP_FINGER_DATA_OFFSET 4
+#define ETP_HOVER_INFO_OFFSET  30
 #define ETP_MAX_REPORT_LEN 34
 
 /* The main device structure */
@@ -725,7 +726,7 @@ static const struct attribute_group *elan_sysfs_groups[] = {
  */
 static void elan_report_contact(struct elan_tp_data *data,
int contact_num, bool contact_valid,
-   u8 *finger_data)
+   bool hover_event, u8 *finger_data)
 {
struct input_dev *input = data->input;
unsigned int pos_x, pos_y;
@@ -769,7 +770,9 @@ static void elan_report_contact(struct elan_tp_data *data,
input_mt_report_slot_state(input, MT_TOOL_FINGER, true);
input_report_abs(input, ABS_MT_POSITION_X, pos_x);
input_report_abs(input, ABS_MT_POSITION_Y, data->max_y - pos_y);
-   input_report_abs(input, ABS_MT_PRESSURE, scaled_pressure);
+   input_report_abs(input, ABS_MT_DISTANCE, hover_event);
+   input_report_abs(input, ABS_MT_PRESSURE,
+hover_event ? 0 : scaled_pressure);
input_report_abs(input, ABS_TOOL_WIDTH, mk_x);
input_report_abs(input, ABS_MT_TOUCH_MAJOR, major);
input_report_abs(input, ABS_MT_TOUCH_MINOR, minor);
@@ -785,11 +788,14 @@ static void elan_report_absolute(struct elan_tp_data *data, u8 *packet)
u8 *finger_data = &packet[ETP_FINGER_DATA_OFFSET];
int i;
u8 tp_info = packet[ETP_TOUCH_INFO_OFFSET];
-   bool contact_valid;
+   u8 hover_info = packet[ETP_HOVER_INFO_OFFSET];
+   bool contact_valid, hover_event;
 
+   hover_event = hover_info & 0x40;
for (i = 0; i < ETP_MAX_FINGERS; i++) {
contact_valid = tp_info & (1U << (3 + i));
-   elan_report_contact(data, i, contact_valid, finger_data);
+   elan_report_contact(data, i, contact_valid, hover_event,
+   finger_data);
 
if (contact_valid)
finger_data += ETP_FINGER_DATA_LEN;
@@ -883,6 +889,7 @@ static int elan_setup_input_device(struct elan_tp_data *data)
 ETP_FINGER_WIDTH * max_width, 0, 0);
input_set_abs_params(input, ABS_MT_TOUCH_MINOR, 0,
 ETP_FINGER_WIDTH * min_width, 0, 0);
+   input_set_abs_params(input, ABS_MT_DISTANCE, 0, 1, 0, 0);
 
data->input = input;
 







RE: [V8 PATCH 1/3] ACPICA: Add ACPI _CLS processing

2015-04-16 Thread Zheng, Lv

Before backporting this to ACPICA, let me ask one simple question.
According to the spec, the _CLS is optional and PCI-specific.
So why should we implement it in the ACPICA core rather than in OSPM-specific modules?
If this needs to be implemented in ACPICA, then what about the following device
identification objects?
_DDN, _HRV, _MLS, _PLD, _STR, _SUN

Thanks and best regards
-Lv

> From: Suravee Suthikulpanit [mailto:suravee.suthikulpa...@amd.com]
> Sent: Tuesday, March 31, 2015 5:56 AM
> 
> ACPI device configurations often contain a _CLS object to supply the
> PCI-defined class code for the device. This patch introduces logic to process
> the _CLS object.
> 
> Acked-by: Mika Westerberg 
> Reviewed-by: Hanjun Guo 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/acpi/acpica/acutils.h  |  3 ++
>  drivers/acpi/acpica/nsxfname.c | 21 ++--
>  drivers/acpi/acpica/utids.c| 73 ++
>  3 files changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/acutils.h b/drivers/acpi/acpica/acutils.h
> index c2f03e8..2aef850 100644
> --- a/drivers/acpi/acpica/acutils.h
> +++ b/drivers/acpi/acpica/acutils.h
> @@ -430,6 +430,9 @@ acpi_status
>  acpi_ut_execute_CID(struct acpi_namespace_node *device_node,
>   struct acpi_pnp_device_id_list ** return_cid_list);
> 
> +acpi_status
> +acpi_ut_execute_CLS(struct acpi_namespace_node *device_node,
> + struct acpi_pnp_device_id **return_id);
>  /*
>   * utlock - reader/writer locks
>   */
> diff --git a/drivers/acpi/acpica/nsxfname.c b/drivers/acpi/acpica/nsxfname.c
> index d66c326..590ef06 100644
> --- a/drivers/acpi/acpica/nsxfname.c
> +++ b/drivers/acpi/acpica/nsxfname.c
> @@ -276,11 +276,12 @@ acpi_get_object_info(acpi_handle handle,
>   struct acpi_pnp_device_id *hid = NULL;
>   struct acpi_pnp_device_id *uid = NULL;
>   struct acpi_pnp_device_id *sub = NULL;
> + struct acpi_pnp_device_id *cls = NULL;
>   char *next_id_string;
>   acpi_object_type type;
>   acpi_name name;
>   u8 param_count = 0;
> - u8 valid = 0;
> + u16 valid = 0;
>   u32 info_size;
>   u32 i;
>   acpi_status status;
> @@ -320,7 +321,7 @@ acpi_get_object_info(acpi_handle handle,
>   if ((type == ACPI_TYPE_DEVICE) || (type == ACPI_TYPE_PROCESSOR)) {
>   /*
>* Get extra info for ACPI Device/Processor objects only:
> -  * Run the Device _HID, _UID, _SUB, and _CID methods.
> +  * Run the Device _HID, _UID, _SUB, _CID and _CLS methods.
>*
>* Note: none of these methods are required, so they may or may
>* not be present for this device. The Info->Valid bitfield is used
> @@ -351,6 +352,14 @@ acpi_get_object_info(acpi_handle handle,
>   valid |= ACPI_VALID_SUB;
>   }
> 
> + /* Execute the Device._CLS method */
> +
> + status = acpi_ut_execute_CLS(node, &cls);
> + if (ACPI_SUCCESS(status)) {
> + info_size += cls->length;
> + valid |= ACPI_VALID_CLS;
> + }
> +
>   /* Execute the Device._CID method */
> 
>   status = acpi_ut_execute_CID(node, &cid_list);
> @@ -468,6 +477,11 @@ acpi_get_object_info(acpi_handle handle,
>   sub, next_id_string);
>   }
> 
> + if (cls) {
> + next_id_string = acpi_ns_copy_device_id(&info->cls,
> + cls, next_id_string);
> + }
> +
>   if (cid_list) {
>   info->compatible_id_list.count = cid_list->count;
>   info->compatible_id_list.list_size = cid_list->list_size;
> @@ -507,6 +521,9 @@ cleanup:
>   if (sub) {
>   ACPI_FREE(sub);
>   }
> + if (cls) {
> + ACPI_FREE(cls);
> + }
>   if (cid_list) {
>   ACPI_FREE(cid_list);
>   }
> diff --git a/drivers/acpi/acpica/utids.c b/drivers/acpi/acpica/utids.c
> index 27431cf..9745065 100644
> --- a/drivers/acpi/acpica/utids.c
> +++ b/drivers/acpi/acpica/utids.c
> @@ -416,3 +416,76 @@ cleanup:
>   acpi_ut_remove_reference(obj_desc);
>   return_ACPI_STATUS(status);
>  }
> +
> +/***
> + *
> + * FUNCTION:acpi_ut_execute_CLS
> + *
> + * PARAMETERS:  device_node - Node for the device
> + *  return_id   - Where the class code string is returned
> + *
> + * RETURN:  Status
> + *
> + * DESCRIPTION: Executes the _CLS control method that returns PCI-defined
> + *  class code of the device. The ACPI spec defines _CLS as a
> + *  package with three integers. The returned string has format:
> + *
> + *  "bbsspp"
> + *  where:
> + *  bb = Base-class code
> + *  ss = Sub-class code
> + *
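As a rough illustration of the "bbsspp" format described in the comment above: _CLS returns three integers (base class, sub-class, and programming interface, as in a PCI class code), and each becomes two hex digits. This is a minimal userspace sketch, not the ACPICA implementation; the helper name cls_to_string(), the use of uppercase hex digits, and truncating each value to 8 bits are all assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the three _CLS package integers as a
 * six-character "bbsspp" string plus NUL. Uppercase hex and masking
 * each value to 8 bits are assumptions of this sketch.
 */
static void cls_to_string(uint64_t base, uint64_t sub, uint64_t prog_if,
			  char out[7])
{
	snprintf(out, 7, "%02X%02X%02X",
		 (unsigned int)(base & 0xFF),
		 (unsigned int)(sub & 0xFF),
		 (unsigned int)(prog_if & 0xFF));
}
```

For example, a SATA controller in AHCI mode (base 0x01, sub-class 0x06, programming interface 0x01) would be reported as "010601".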

Re: [PATCH] zram: enable compaction support in zram

2015-04-16 Thread Minchan Kim

On Fri, Apr 17, 2015 at 10:34:57AM +0900, Sergey Senozhatsky wrote:
> >> Commit c72c6160d967 ("zram: move compact_store() to sysfs functions area")
> >> was intended to be a cosmetic change that moved function around w/o any
> >> functional change:
> >> http://lkml.iu.edu/hypermail/linux/kernel/1503.1/01818.html
> >>
> >> Unfortunately, on its way from mmotm to Linus's tree it was altered and turned
> >> into "remove compaction support from zram" commit. I've managed to create a
> >> mess of commits and Andrew had to pick some of the commits and hand edit them.
> >>
> >> Fix it and learn my lessons.
> >>
> >> Signed-off-by: Sergey Senozhatsky 
> > Acked-by: Minchan Kim 
> 
> 
> let's just revert c72c6160d967ed26a0b136dbab337f821d233509 instead.
> this commit doesn't make a lot of sense.

Agree.

-- 
Kind regards,
Minchan Kim

Re: [PATCH v8 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz

2015-04-16 Thread Chai Wen

On 04/15/2015 03:37 AM, Chris Metcalf wrote:

> Change the default behavior of watchdog so it only runs on the
> housekeeping cores when nohz_full is enabled at build and boot time.
> Allow modifying the set of cores the watchdog is currently running
> on with a new kernel.watchdog_cpumask sysctl.
> 
> If we allowed the watchdog to run on nohz_full cores, the timer
> interrupts and scheduler work would prevent the desired tickless
> operation on those cores.  But if we disable the watchdog globally,
> then the housekeeping cores can't benefit from the watchdog
> functionality.  So we allow disabling it only on some cores.
> See Documentation/lockup-watchdogs.txt for more information.
> 
> Acked-by: Don Zickus 
> Signed-off-by: Chris Metcalf 
> ---
> v8: use new semantics of smpboot_update_cpumask_percpu_thread() [Frederic]
> improve documentation in "Documentation/" and in changelong [akpm]


s/changelong/changelog/

> 
> v7: use cpumask field instead of valid_cpu() callback
> 
> v6: use alloc_cpumask_var() [Sasha Levin]
> switch from watchdog_exclude to watchdog_cpumask [Frederic]
> simplify the smp_hotplug_thread API to watchdog [Frederic]
> add Don's Acked-by
> 
>  Documentation/lockup-watchdogs.txt | 18 ++
>  Documentation/sysctl/kernel.txt| 15 
>  include/linux/nmi.h|  3 +++
>  kernel/sysctl.c|  7 ++
>  kernel/watchdog.c  | 49 ++
>  5 files changed, 92 insertions(+)
> 
> diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt
> index ab0baa692c13..22dd6af2e4bd 100644
> --- a/Documentation/lockup-watchdogs.txt
> +++ b/Documentation/lockup-watchdogs.txt
> @@ -61,3 +61,21 @@ As explained above, a kernel knob is provided that allows
>  administrators to configure the period of the hrtimer and the perf
>  event. The right value for a particular environment is a trade-off
>  between fast response to lockups and detection overhead.
> +
> +By default, the watchdog runs on all online cores.  However, on a
> +kernel configured with NO_HZ_FULL, by default the watchdog runs only
> +on the housekeeping cores, not the cores specified in the "nohz_full"
> +boot argument.  If we allowed the watchdog to run by default on
> +the "nohz_full" cores, we would have to run timer ticks to activate
> +the scheduler, which would prevent the "nohz_full" functionality
> +from protecting the user code on those cores from the kernel.
> +Of course, disabling it by default on the nohz_full cores means that
> +when those cores do enter the kernel, by default we will not be
> +able to detect if they lock up.  However, allowing the watchdog
> +to continue to run on the housekeeping (non-tickless) cores means
> +that we will continue to detect lockups properly on those cores.
> +
> +In either case, the set of cores excluded from running the watchdog
> +may be adjusted via the kernel.watchdog_cpumask sysctl.  For
> +nohz_full cores, this may be useful for debugging a case where the
> +kernel seems to be hanging on the nohz_full cores.
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index c831001c45f1..f1697858d71c 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -923,6 +923,21 @@ and nmi_watchdog.
>  
>  ==
>  
> +watchdog_cpumask:
> +
> +This value can be used to control on which cpus the watchdog may run.
> +The default cpumask is all possible cores, but if NO_HZ_FULL is
> +enabled in the kernel config, and cores are specified with the
> +nohz_full= boot argument, those cores are excluded by default.
> +Offline cores can be included in this mask, and if the core is later
> +brought online, the watchdog will be started based on the mask value.
> +
> +Typically this value would only be touched in the nohz_full case
> +to re-enable cores that by default were not running the watchdog,
> +if a kernel lockup was suspected on those cores.
> +
> +==
> +
>  watchdog_thresh:
>  
>  This value can be used to control the frequency of hrtimer and NMI
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 3d46fb4708e0..f94da0e65dea 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -67,6 +67,7 @@ extern int nmi_watchdog_enabled;
>  extern int soft_watchdog_enabled;
>  extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
> +extern unsigned long *watchdog_cpumask_bits;
>  extern int sysctl_softlockup_all_cpu_backtrace;
>  struct ctl_table;
>  extern int proc_watchdog(struct ctl_table *, int ,
> @@ -77,6 +78,8 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
> void __user *, size_t *, loff_t *);
>  extern int proc_watchdog_thresh(struct ctl_table *, int ,
>   void __user *, size_t *,
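The default mask described in the documentation hunks above can be sketched in userspace as follows. This is illustrative only: a 64-bit word stands in for struct cpumask (bit n is CPU n), and default_watchdog_mask() and watchdog_should_run() are hypothetical names, not functions from the patch.

```c
#include <stdint.h>

/*
 * Default mask: all possible CPUs, minus the nohz_full= CPUs when the
 * kernel is built with NO_HZ_FULL.
 */
static uint64_t default_watchdog_mask(uint64_t possible, uint64_t nohz_full,
				      int no_hz_full_configured)
{
	if (!no_hz_full_configured)
		return possible;
	return possible & ~nohz_full;
}

/*
 * Offline CPUs may be in the mask; the watchdog only starts on them
 * once they are brought online.
 */
static int watchdog_should_run(uint64_t mask, uint64_t online, int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	return (mask & bit) && (online & bit);
}
```

Setting a nohz_full CPU's bit back in the mask (via kernel.watchdog_cpumask) is the debugging case the documentation mentions for suspected lockups on those cores.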

Re: [PATCH] zram: enable compaction support in zram

2015-04-16 Thread Minchan Kim

On Fri, Apr 17, 2015 at 10:07:40AM +0900, Sergey Senozhatsky wrote:
> Commit c72c6160d967 ("zram: move compact_store() to sysfs functions area")
> was intended to be a cosmetic change that moved function around w/o any
> functional change:
> http://lkml.iu.edu/hypermail/linux/kernel/1503.1/01818.html
> 
> Unfortunately, on its way from mmotm to Linus's tree it was altered and turned
> into "remove compaction support from zram" commit. I've managed to create a
> mess of commits and Andrew had to pick some of the commits and hand edit them.
> 
> Fix it and learn my lessons.
> 
> Signed-off-by: Sergey Senozhatsky 
Acked-by: Minchan Kim 


-- 
Kind regards,
Minchan Kim

Re: [PATCHv2 00/10] cleaned up on-demand device creation

2015-04-16 Thread Sergey Senozhatsky

On (04/17/15 10:00), Sergey Senozhatsky wrote:
> Date: Fri, 17 Apr 2015 10:00:24 +0900
> From: Sergey Senozhatsky 
> To: Andrew Morton , Minchan Kim
>  
> Cc: Nitin Gupta , linux-kernel@vger.kernel.org, Sergey
>  Senozhatsky 
> Subject: Re: [PATCHv2 00/10] cleaned up on-demand device creation
> User-Agent: Mutt/1.5.23 (2014-03-12)
> 
> On (04/16/15 20:55), Sergey Senozhatsky wrote:
> > unfortunately, commit c72c6160d967ed26a0b136dbab337f821d233509
> >   Author: Sergey Senozhatsky 
> >   Date:   Wed Apr 15 16:15:55 2015 -0700
> > 
> >  zram: move compact_store() to sysfs functions area
> >
> 
> Andrew, please ignore the whole series.
> 
> 
> I will send a separate patch to address the compaction issue shortly.
> 

or simply revert c72c6160d967ed26a0b136dbab337f821d233509, which is
probably even better. there is no sense in moving compact_store to
a sysfs handlers logical code block anyway. there is simply no such
block there yet (logical code reorganization happens in a separate patch,
which was not merged into 4.1).

so I think reverting it is the best option for now.

$ git revert c72c6160d967ed26a0b136dbab337f821d233509
[master 211073e] Revert "zram: move compact_store() to sysfs functions area"

-ss

[PATCH] zram: enable compaction support in zram

2015-04-16 Thread Sergey Senozhatsky

Commit c72c6160d967 ("zram: move compact_store() to sysfs functions area")
was intended to be a cosmetic change that moved function around w/o any
functional change:
http://lkml.iu.edu/hypermail/linux/kernel/1503.1/01818.html

Unfortunately, on its way from mmotm to Linus's tree it was altered and turned
into "remove compaction support from zram" commit. I've managed to create a
mess of commits and Andrew had to pick some of the commits and hand edit them.

Fix it and learn my lessons.

Signed-off-by: Sergey Senozhatsky 
---
 Documentation/blockdev/zram.txt |  2 +-
 drivers/block/zram/zram_drv.c   | 23 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 48a183e..bef4998 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -126,7 +126,7 @@ mem_used_max  RW    the maximum amount memory zram have consumed to
 mem_limit     RW    the maximum amount of memory ZRAM can use to store
                     the compressed data
 num_migrated  RO    the number of objects migrated migrated by compaction
-
+compact       WO    trigger memory compaction
 
 WARNING
 ===
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c94386a..f474cbe 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -261,6 +261,27 @@ static ssize_t comp_algorithm_store(struct device *dev,
return len;
 }
 
+static ssize_t compact_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t len)
+{
+   unsigned long nr_migrated;
+   struct zram *zram = dev_to_zram(dev);
+   struct zram_meta *meta;
+
+   down_read(&zram->init_lock);
+   if (!init_done(zram)) {
+       up_read(&zram->init_lock);
+       return -EINVAL;
+   }
+
+   meta = zram->meta;
+   nr_migrated = zs_compact(meta->mem_pool);
+   atomic64_add(nr_migrated, &zram->stats.num_migrated);
+   up_read(&zram->init_lock);
+
+   return len;
+}
+
 /* flag operations needs meta->tb_lock */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
enum zram_pageflags flag)
@@ -1038,6 +1059,7 @@ static const struct block_device_operations zram_devops = {
.owner = THIS_MODULE
 };
 
+static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
 static DEVICE_ATTR_WO(reset);
@@ -1114,6 +1136,7 @@ static struct attribute *zram_disk_attrs[] = {
    &dev_attr_num_writes.attr,
    &dev_attr_failed_reads.attr,
    &dev_attr_failed_writes.attr,
+   &dev_attr_compact.attr,
    &dev_attr_invalid_io.attr,
    &dev_attr_notify_free.attr,
    &dev_attr_zero_pages.attr,
-- 
2.4.0.rc1.29.gecc46a1
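With this patch applied, compaction is triggered from userspace by writing to the write-only compact attribute. A minimal sketch, assuming the device is zram0 so the attribute lives at /sys/block/zram0/compact; zram_trigger_compact() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write to the zram "compact" attribute. compact_store() ignores the
 * written bytes, so "1" is arbitrary. Returns 0 on success, -1 on
 * failure with errno set by open(), write() or close().
 */
static int zram_trigger_compact(const char *attr_path)
{
	static const char val[] = "1";
	int ret = 0;
	int fd = open(attr_path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0)
		ret = -1;   /* e.g. EINVAL if the device is not initialized */
	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

Usage would be zram_trigger_compact("/sys/block/zram0/compact"). Since compact_store() returns -EINVAL until init_done() is true, the write fails on a device whose disksize has not been set yet.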


[PATCH v2] mfd: rtsx: add support for rts522A

2015-04-16 Thread micky_ching

From: Micky Ching 

rts522a (rts5227s) is derived from rts5227 and is mostly the same as rts5227.
Add it to mfd/rts5227.c to support this chip.
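In the patch, the IC revision is read from the low nibble of DUMMY_REG_RESET_0, and the extra PHY settings in rts522a_optimize_phy() apply only to revision A of the 0x522A part. A minimal userspace sketch of that gating, for illustration; IC_VER_A being 0 and needs_522a_phy_fixup() standing in for is_version() are assumptions, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of IC_VER_A for this sketch. */
#define IC_VER_A 0

/* Mirrors rts5227_get_ic_version(): the revision is the low nibble. */
static uint8_t ic_version_from_reg(uint8_t dummy_reg_reset_0)
{
	return dummy_reg_reset_0 & 0x0F;
}

/* Hypothetical equivalent of is_version(pcr, 0x522A, IC_VER_A). */
static bool needs_522a_phy_fixup(uint16_t pci_device_id, uint8_t ic_version)
{
	return pci_device_id == 0x522A && ic_version == IC_VER_A;
}
```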

Signed-off-by: Micky Ching 
---
 drivers/mfd/Kconfig  |  7 ++--
 drivers/mfd/rts5227.c| 77 ++--
 drivers/mfd/rtsx_pcr.c   |  5 +++
 drivers/mfd/rtsx_pcr.h   |  3 ++
 include/linux/mfd/rtsx_pci.h |  6 
 5 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 38356e3..2c52f93 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -646,9 +646,10 @@ config MFD_RTSX_PCI
select MFD_CORE
help
  This supports for Realtek PCI-Express card reader including rts5209,
- rts5229, rtl8411, etc. Realtek card reader supports access to many
- types of memory cards, such as Memory Stick, Memory Stick Pro,
- Secure Digital and MultiMediaCard.
+ rts5227, rts522A, rts5229, rts5249, rts524A, rts525A, rtl8411, etc.
+ Realtek card reader supports access to many types of memory cards,
+ such as Memory Stick, Memory Stick Pro, Secure Digital and
+ MultiMediaCard.
 
 config MFD_RT5033
tristate "Richtek RT5033 Power Management IC"
diff --git a/drivers/mfd/rts5227.c b/drivers/mfd/rts5227.c
index ce012d7..cf13e66 100644
--- a/drivers/mfd/rts5227.c
+++ b/drivers/mfd/rts5227.c
@@ -26,6 +26,14 @@
 
 #include "rtsx_pcr.h"
 
+static u8 rts5227_get_ic_version(struct rtsx_pcr *pcr)
+{
+   u8 val;
+
+   rtsx_pci_read_register(pcr, DUMMY_REG_RESET_0, &val);
+   return val & 0x0F;
+}
+
 static void rts5227_fill_driving(struct rtsx_pcr *pcr, u8 voltage)
 {
u8 driving_3v3[4][3] = {
@@ -88,7 +96,7 @@ static void rts5227_force_power_down(struct rtsx_pcr *pcr, u8 pm_state)
rtsx_pci_write_register(pcr, AUTOLOAD_CFG_BASE + 3, 0x01, 0);
 
if (pm_state == HOST_ENTER_S3)
-   rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x10);
+   rtsx_pci_write_register(pcr, pcr->reg_pm_ctrl3, 0x10, 0x10);
 
rtsx_pci_write_register(pcr, FPDCTL, 0x03, 0x03);
 }
@@ -121,7 +129,7 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG, 0xB8, 0xB8);
else
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG, 0xB8, 0x88);
-   rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PM_CTRL3, 0x10, 0x00);
+   rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, pcr->reg_pm_ctrl3, 0x10, 0x00);
 
return rtsx_pci_send_cmd(pcr, 100);
 }
@@ -298,8 +306,73 @@ void rts5227_init_params(struct rtsx_pcr *pcr)
pcr->tx_initial_phase = SET_CLOCK_PHASE(27, 27, 15);
pcr->rx_initial_phase = SET_CLOCK_PHASE(30, 7, 7);
 
+   pcr->ic_version = rts5227_get_ic_version(pcr);
pcr->sd_pull_ctl_enable_tbl = rts5227_sd_pull_ctl_enable_tbl;
pcr->sd_pull_ctl_disable_tbl = rts5227_sd_pull_ctl_disable_tbl;
pcr->ms_pull_ctl_enable_tbl = rts5227_ms_pull_ctl_enable_tbl;
pcr->ms_pull_ctl_disable_tbl = rts5227_ms_pull_ctl_disable_tbl;
+
+   pcr->reg_pm_ctrl3 = PM_CTRL3;
+}
+
+static int rts522a_optimize_phy(struct rtsx_pcr *pcr)
+{
+   int err;
+
+   err = rtsx_pci_write_register(pcr, RTS522A_PM_CTRL3, D3_DELINK_MODE_EN,
+   0x00);
+   if (err < 0)
+   return err;
+
+   if (is_version(pcr, 0x522A, IC_VER_A)) {
+   err = rtsx_pci_write_phy_register(pcr, PHY_RCR2,
+   PHY_RCR2_INIT_27S);
+   if (err)
+   return err;
+
+   rtsx_pci_write_phy_register(pcr, PHY_RCR1, PHY_RCR1_INIT_27S);
+   rtsx_pci_write_phy_register(pcr, PHY_FLD0, PHY_FLD0_INIT_27S);
+   rtsx_pci_write_phy_register(pcr, PHY_FLD3, PHY_FLD3_INIT_27S);
+   rtsx_pci_write_phy_register(pcr, PHY_FLD4, PHY_FLD4_INIT_27S);
+   }
+
+   return 0;
+}
+
+static int rts522a_extra_init_hw(struct rtsx_pcr *pcr)
+{
+   rts5227_extra_init_hw(pcr);
+
+   rtsx_pci_write_register(pcr, FUNC_FORCE_CTL, FUNC_FORCE_UPME_XMT_DBG,
+   FUNC_FORCE_UPME_XMT_DBG);
+   rtsx_pci_write_register(pcr, PCLK_CTL, 0x04, 0x04);
+   rtsx_pci_write_register(pcr, PM_EVENT_DEBUG, PME_DEBUG_0, PME_DEBUG_0);
+   rtsx_pci_write_register(pcr, PM_CLK_FORCE_CTL, 0xFF, 0x11);
+
+   return 0;
+}
+
+/* rts522a operations mainly derived from rts5227, except phy/hw init setting.
+ */
+static const struct pcr_ops rts522a_pcr_ops = {
+   .fetch_vendor_settings = rts5227_fetch_vendor_settings,
+   .extra_init_hw = rts522a_extra_init_hw,
+   .optimize_phy = rts522a_optimize_phy,
+   .turn_on_led = rts5227_turn_on_led,
+   .turn_off_led = rts5227_turn_off_led,
+   .enable_auto_blink = rts5227_enable_auto_blink,
+   .disable_auto_blink = rts5227_disable_auto_blink,
+   .card_power_on = rts5227_card_power_on,
+   .card_power_off =

Re: [PATCH] firmware: dmi_scan: Fix ordering of product_uuid

2015-04-16 Thread Zhenzhong Duan


On 2015/4/16 17:30, Jean Delvare wrote:

On Thursday 16 April 2015 at 16:46 +0800, Zhenzhong Duan wrote:

On 2015/4/16 15:09, Jean Delvare wrote:

On Thursday 16 April 2015 at 14:22 +0800, Zhenzhong Duan wrote:

The basic idea is right, but you ignore the case where dmi_walk_early may
fail, though that looks impossible at boot.

It would be better to add the following for robustness.

@@ -521,6 +521,6 @@ static int __init dmi_present(const u8 *

return 0;
}
}
+dmi_ver = 0;
return 1;
}


What is the value of this? dmi_ver will never be accessed after this
point anyway, as far as I can see.

Same as above: a future commit that wants to use dmi_ver may not realize
that you left it in a faulty state.

Why do you think this is "faulty"? The value in dmi_ver is correct
whether dmi_walk_early() succeeded or not. There's no rationale for
resetting dmi_ver on error and not dmi_num, dmi_len and dmi_base. Note
that dmi_smbios3_present() doesn't reset any of these either. These
values are all correct.

If other modules need to check whether DMI was successfully initialized,
they must check dmi_available rather than any of the variables above
(which are all static anyway.)
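The pattern Jean describes can be sketched in a few lines: internal state such as the version is recorded as parsed whether or not the walk succeeds, and consumers test a single success flag. All names below are hypothetical stand-ins (table_ver for dmi_ver, table_available for dmi_available, walk_table() for dmi_walk_early()); this is an illustration, not the dmi_scan code.

```c
#include <stdbool.h>

static int table_ver;          /* like dmi_ver: internal, not checked by users */
static bool table_available;   /* like dmi_available: set only on success */

/* Hypothetical stand-in for dmi_walk_early(). */
static int walk_table(bool walk_ok)
{
	return walk_ok ? 0 : -1;
}

/* Returns 0 on success, 1 on failure, like dmi_present(). */
static int table_present(int ver, bool walk_ok)
{
	table_ver = ver;               /* value is correct either way */
	if (walk_table(walk_ok) == 0) {
		table_available = true;
		return 0;
	}
	return 1;                      /* table_available stays false */
}

/* What other code should check instead of table_ver. */
static bool table_usable(void)
{
	return table_available;
}
```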


You are right, dmi_available should be used here. Sorry for the noise.

zduan


Re: [PATCHv2 00/10] cleaned up on-demand device creation

2015-04-16 Thread Sergey Senozhatsky

On (04/16/15 20:55), Sergey Senozhatsky wrote:
> unfortunately, commit c72c6160d967ed26a0b136dbab337f821d233509
>   Author: Sergey Senozhatsky 
>   Date:   Wed Apr 15 16:15:55 2015 -0700
> 
>  zram: move compact_store() to sysfs functions area
>

Andrew, please ignore the whole series.


I will send a separate patch to address the compaction issue shortly.


-ss

Re: [tip:x86/pmem] x86/mm: Add support for the non-standard protected e820 type

2015-04-16 Thread Andy Lutomirski

On Thu, Apr 16, 2015 at 5:55 PM, Elliott, Robert (Server Storage)
 wrote:
>
>
>> -Original Message-
>> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
>> ow...@vger.kernel.org] On Behalf Of Andy Lutomirski
>> Sent: Thursday, April 16, 2015 5:31 PM
>> To: Ingo Molnar
>> Subject: Re: [tip:x86/pmem] x86/mm: Add support for the non-standard
>> protected e820 type
>>
>> On Thu, Apr 2, 2015 at 12:51 PM, Andy Lutomirski 
>> wrote:
>> > On Thu, Apr 2, 2015 at 12:13 PM, Ingo Molnar  wrote:
>> >>
>> >> * Andy Lutomirski  wrote:
>> >>
>> >>> On Thu, Apr 2, 2015 at 5:31 AM, tip-bot for Christoph Hellwig
>> >>>  wrote:
>> >>> > Commit-ID:  ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
>> >>> > Gitweb:
>> http://git.kernel.org/tip/ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
>> >>> > Author: Christoph Hellwig 
>> >>> > AuthorDate: Wed, 1 Apr 2015 09:12:18 +0200
>> >>> > Committer:  Ingo Molnar 
>> >>> > CommitDate: Wed, 1 Apr 2015 17:02:43 +0200
>> >>> >
>> >>> > x86/mm: Add support for the non-standard protected e820 type
>> >>> >
>> >>> > Various recent BIOSes support NVDIMMs or ADR using a
>> >>> > non-standard e820 memory type, and Intel supplied reference
>> >>> > Linux code using this type to various vendors.
>> >>> >
>> >>> > Wire this e820 table type up to export platform devices for the
>> >>> > pmem driver so that we can use it in Linux.
>> >>>
>> >>> This scares me a bit.  Do we know that the upcoming ACPI 6.0
>> >>> enumeration mechanism *won't* use e820 type 12? [...]
>> >>
>> >> So I know nothing about it, but I'd be surprised if e820 was touched
>> >> at all, as e820 isn't really well suited to enumerate more complex
>> >> resources, and it appears pmem wants to grow into complex directions?
>> >
>> > I hope so, but I have no idea what the ACPI committee's schemes are.
>> >
>> > We could require pmem.enable_legacy_e820=Y to load the driver for now
>> > if we're concerned about it.
>> >
>>
>> ACPI 6.0 is out:
>>
>> http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
>>
>> AFAICT from a quick read, ACPI 6.0 systems will show EFI type 14
>> (EfiPersistentMemory), ACPI type 7 (AddressRangePersistentMemory) and
>> e820 type 7.
>>
>> Type 12 is still "OEM defined".  See table 15-312.  Maybe I'm reading
>> this wrong.
>>
>> *However*, ACPI 6.0 unsurprisingly also has a real enumeration
>> mechanism for NVDIMMs and such, and those should take precedence.
>>
>> So this driver could plausibly be safe even on ACPI 6.0 systems.
>> Someone from one of the relevant vendors should probably confirm that.
>> I'm still a bit nervous, though.
>
> That value was set aside on behalf of this pre-standard usage, to
> keep future ACPI revisions from standardizing it for anything else.
> The kernel should be just as safe in continuing to recognize that
> value as it is now.
>
> New legacy BIOS systems should follow ACPI 6.0 going forward and
> report type 7 in their E820 tables, along with meeting the other
> ACPI 6.0 requirements.
>
> UEFI systems don't provide an E820 table; that's an artificial
> creation by the bootloader (e.g., grub2) or the kernel with its
> add_efi_memmap parameter. These systems will just report the
> EFI memory type 14. There was no pre-standard EFI type
> identified that needed to be blocked out.
>
> Any software creating a fake E820 table (grub2, etc.) should
> map EFI type 14 to E820 type 7.
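A simplified sketch of the mapping a fake-E820 generator would apply under this scheme, with EFI type 14 (EfiPersistentMemory) becoming E820 type 7 and the legacy type 12 still recognized as persistent memory. The type codes are the standard UEFI and E820 values; the function names are hypothetical, and treating boot-services regions as RAM assumes the table is built for use after ExitBootServices(). Real implementations in grub2 or the kernel's add_efi_memmap path may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* E820 address range types (type 12 is the pre-standard pmem type). */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4,
       E820_UNUSABLE = 5, E820_PMEM = 7, E820_PRAM_LEGACY = 12 };

/* UEFI memory types 0..14, in specification order. */
enum { EfiReservedMemoryType, EfiLoaderCode, EfiLoaderData,
       EfiBootServicesCode, EfiBootServicesData, EfiRuntimeServicesCode,
       EfiRuntimeServicesData, EfiConventionalMemory, EfiUnusableMemory,
       EfiACPIReclaimMemory, EfiACPIMemoryNVS, EfiMemoryMappedIO,
       EfiMemoryMappedIOPortSpace, EfiPalCode, EfiPersistentMemory };

static uint32_t efi_to_e820(uint32_t efi_type)
{
	switch (efi_type) {
	case EfiLoaderCode:
	case EfiLoaderData:
	case EfiBootServicesCode:
	case EfiBootServicesData:
	case EfiConventionalMemory:
		return E820_RAM;
	case EfiACPIReclaimMemory:
		return E820_ACPI;
	case EfiACPIMemoryNVS:
		return E820_NVS;
	case EfiUnusableMemory:
		return E820_UNUSABLE;
	case EfiPersistentMemory:
		return E820_PMEM;          /* EFI type 14 -> E820 type 7 */
	default:
		return E820_RESERVED;      /* runtime services, MMIO, etc. */
	}
}

/* Kernel-side check: both the ACPI 6.0 type and the legacy type. */
static bool e820_is_pmem(uint32_t e820_type)
{
	return e820_type == E820_PMEM || e820_type == E820_PRAM_LEGACY;
}
```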

Great!

Off-topic: do you know what an NVDIMM control block is?  Are we really
supposed to access NVDIMM registers using MMIO?  If so, that must have
been a beast to get right in the memory controller.  Do you know if
any vendor has published docs for this stuff?

(I'm trying to figure out whether we're supposed to use SMBUS cycles
for any purpose so I can either fix my driver or relegate it to
legacy-only status.)

--Andy

RE: [tip:x86/pmem] x86/mm: Add support for the non-standard protected e820 type

2015-04-16 Thread Elliott, Robert (Server Storage)



> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Andy Lutomirski
> Sent: Thursday, April 16, 2015 5:31 PM
> To: Ingo Molnar
> Subject: Re: [tip:x86/pmem] x86/mm: Add support for the non-standard
> protected e820 type
> 
> On Thu, Apr 2, 2015 at 12:51 PM, Andy Lutomirski 
> wrote:
> > On Thu, Apr 2, 2015 at 12:13 PM, Ingo Molnar  wrote:
> >>
> >> * Andy Lutomirski  wrote:
> >>
> >>> On Thu, Apr 2, 2015 at 5:31 AM, tip-bot for Christoph Hellwig
> >>>  wrote:
> >>> > Commit-ID:  ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
> >>> > Gitweb:
> http://git.kernel.org/tip/ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
> >>> > Author: Christoph Hellwig 
> >>> > AuthorDate: Wed, 1 Apr 2015 09:12:18 +0200
> >>> > Committer:  Ingo Molnar 
> >>> > CommitDate: Wed, 1 Apr 2015 17:02:43 +0200
> >>> >
> >>> > x86/mm: Add support for the non-standard protected e820 type
> >>> >
> >>> > Various recent BIOSes support NVDIMMs or ADR using a
> >>> > non-standard e820 memory type, and Intel supplied reference
> >>> > Linux code using this type to various vendors.
> >>> >
> >>> > Wire this e820 table type up to export platform devices for the
> >>> > pmem driver so that we can use it in Linux.
> >>>
> >>> This scares me a bit.  Do we know that the upcoming ACPI 6.0
> >>> enumeration mechanism *won't* use e820 type 12? [...]
> >>
> >> So I know nothing about it, but I'd be surprised if e820 was touched
> >> at all, as e820 isn't really well suited to enumerate more complex
> >> resources, and it appears pmem wants to grow into complex directions?
> >
> > I hope so, but I have no idea what the ACPI committee's schemes are.
> >
> > We could require pmem.enable_legacy_e820=Y to load the driver for now
> > if we're concerned about it.
> >
> 
> ACPI 6.0 is out:
> 
> http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> 
> AFAICT from a quick read, ACPI 6.0 systems will show EFI type 14
> (EfiPersistentMemory), ACPI type 7 (AddressRangePersistentMemory) and
> e820 type 7.
> 
> Type 12 is still "OEM defined".  See table 15-312.  Maybe I'm reading
> this wrong.
> 
> *However*, ACPI 6.0 unsurprisingly also has a real enumeration
> mechanism for NVDIMMs and such, and those should take precedence.
> 
> So this driver could plausibly be safe even on ACPI 6.0 systems.
> Someone from one of the relevant vendors should probably confirm that.
> I'm still a bit nervous, though.

That value was set aside on behalf of this pre-standard usage, to
keep future ACPI revisions from standardizing it for anything else.
The kernel should be just as safe in continuing to recognize that
value as it is now.

New legacy BIOS systems should follow ACPI 6.0 going forward and
report type 7 in their E820 tables, along with meeting the other
ACPI 6.0 requirements.

UEFI systems don't provide an E820 table; that's an artificial
creation by the bootloader (e.g., grub2) or the kernel with its
add_efi_memmap parameter. These systems will just report the
EFI memory type 14. There was no pre-standard EFI type
identified that needed to be blocked out.

Any software creating a fake E820 table (grub2, etc.) should 
map EFI type 14 to E820 type 7.

---
Robert Elliott, HP Server Storage

[PATCH] Tracing/Profiling: Ftrace: exported additional symbols after de7b2973 change

2015-04-16 Thread Ron Rechenmacher



After commit de7b2973 changed tracepoint registration to use a struct
pointer instead of a name hash, additional tracepoint symbols need to be
exported with the EXPORT_TRACEPOINT_SYMBOL_GPL macro so that modules can
attach probes to them.

Signed-off-by: Ron Rechenmacher 
---
 kernel/irq/handle.c               |    3 +++
 kernel/softirq.c                  |    3 +++
 kernel/trace/trace_sched_switch.c |    2 ++
 kernel/trace/trace_syscalls.c     |    3 +++
 4 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 6354802..241e165 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -195,3 +195,6 @@ irqreturn_t handle_irq_event(struct irq_desc *desc)
irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
return ret;
 }
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_exit);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 479e443..adce531 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -785,3 +785,6 @@ unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
 {
return from;
 }
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(softirq_entry);
+EXPORT_TRACEPOINT_SYMBOL_GPL(softirq_exit);
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 419ca37..c5b96d2 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -99,3 +99,5 @@ void tracing_stop_cmdline_record(void)
 {
tracing_stop_sched_switch();
 }
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_switch);
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index f97f6e3..377c712 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -748,3 +748,6 @@ static int syscall_exit_register(struct ftrace_event_call *event,
}
return 0;
 }
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(sys_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sys_enter);

[GIT PULL] ACPICA updates for v4.1-rc1

2015-04-16 Thread Rafael J. Wysocki

Hi Linus,

Please pull from

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 acpica-4.1-rc1

to receive ACPICA material for v4.1-rc1 with top-most commit
0ee0d34985ceffe4036319e1e46df8bff591b9e3

 ACPICA: Store GPE register enable masks upfront

on top of commit 39a8804455fb23f09157341d3ba7db6d7ae6ee76

 Linux 4.0

This updates the kernel's ACPICA code to upstream revision 20150410
and adds a fix for a GPE handling regression introduced during the
3.19 cycle on top of that.

Included are two stable-candidate bug fixes (one of them fixing a
3.16 regression), multiple other fixes and a bunch of cleanups.

Specifics:

 - Fix for a GPE handling regression on Dell Latitude D600 that
   caused GPE signaling to stop working on that machine.  The cause
   appears to be a hardware glitch, but GPE signaling used to work
   there and it can be made to work again in a relatively
   straightforward way (Rafael J Wysocki).

 - Fix for a mutex unlock regression related to the handling of ACPI
   tables introduced during the 3.16 development cycle (Octavian Purdila).

 - _REV modification to always return 2, as all versions of Windows
   since NT have done; firmware vendors had started to use _REV to
   distinguish between OSes in their AML and to do some silly and
   wrong things on that basis (Bob Moore).

 - Fixes and cleanups related to the acpi_physical_address data type,
   including one stable-candidate fix for an issue occasionally occurring
   on 64-bit machines running 32-bit kernels, where using offsets provided
   by the firmware may lead to address overflows (Lv Zheng).

 - External() opcode support infrastructure, needed in some cases for
   recompiling disassembled ACPI tables, including an interpreter
   modification to ignore that opcode (Bob Moore).

 - Support for the "Windows 2015" string in _OSI (Bob Moore).

 - GPE debug interface change to return values read from hardware
   registers (Lv Zheng).

 - Removal of the __DATE__ macro usage in tools (Rasmus Villemoes).

 - Assorted minor fixes and cleanups (Lv Zheng, Rickard Strandqvist,
   Bob Moore).

Thanks!


---

Bob Moore (15):
  ACPICA: Casting changes around acpi_physical_address/acpi_size.
  ACPICA: Fix a sscanf format string.
  ACPICA: Update Resource descriptor dump module.
  ACPICA: Update AML Debugger global variables.
  ACPICA: iASL/Disassembler: Add option to assume table contains valid AML.
  ACPICA: iASL: Enhancement for constant folding.
  ACPICA: Add infrastructure for External() opcode.
  ACPICA: Add "Windows 2015" string to _OSI support.
  ACPICA: Permanently set _REV to the value '2'.
  ACPICA: Remove unused internal AML opcode.
  ACPICA: Add "//" before ascii output of buffers.
  ACPICA: Update for SLIC ACPI table.
  ACPICA: iASL: Add support for MSDM ACPI table.
  ACPICA: Disassembler: Some cleanup of the table dump module.
  ACPICA: Update version to 20150410.

Lv Zheng (13):
  ACPICA: Linuxize: Reduce divergences for 20150410 release.
  ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.
  ACPICA: Unix: Cleanup to use ACPI_TO_INTEGER() to calc page offset.
  ACPICA: Executer: Cleanup to remove an unnecessary conversion.
  ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR().
  ACPICA: Utilities: Cleanup to convert physical address printing formats.
  ACPICA: Utilities: Cleanup to remove useless ACPI_PRINTF/FORMAT_xxx helpers.
  ACPICA: Utilities: split IO address types from data type models.
  ACPICA: Events: Add support to return both enable/status register values for GPE and fixed event.
  ACPICA: Tables: Move an iasl specific table function to iasl source file.
  ACPICA: Utilities: Correct conditional compilation definitions.
  ACPICA: Resources: Correct conditional compilation definitions.
  ACPICA: Fix a couple issues with the local printf module.

Octavian Purdila (1):
  ACPICA: Tables: Don't release ACPI_MTX_TABLES in acpi_tb_install_standard_table().

Rafael J. Wysocki (1):
  ACPICA: Store GPE register enable masks upfront

Rasmus Villemoes (1):
  ACPICA: Applications: Remove use of __DATE__ macro.

Rickard Strandqvist (1):
  ACPICA: Utilities: Remove unused acpi_ut_create_pkg_state_and_push().

---

 drivers/acpi/acpica/acapps.h   |   8 +-
 drivers/acpi/acpica/acglobal.h |   5 +-
 drivers/acpi/acpica/aclocal.h  |   2 +-
 drivers/acpi/acpica/acmacros.h |  13 +-
 drivers/acpi/acpica/acopcode.h |   2 +
 drivers/acpi/acpica/acresrc.h  |   6 +-
 drivers/acpi/acpica/acstruct.h |   5 -
 drivers/acpi/acpica/actables.h |   9 +-
 drivers/acpi/acpica/acutils.h  |  22 +-
 drivers/acpi/acpica/amlcode.h  |   2 +-

[GIT PULL] Block drivers bits for 4.1

2015-04-16 Thread Jens Axboe

Hi Linus,

This is the block driver pull request for 4.1. As with the core bits,
this is a relatively slow round. This pull request contains:

- Various fixes and cleanups for NVMe, from Alexey Khoroshilov, Chong
  Yuan, myself, Keith Busch, and Murali Iyer.

- Documentation and code cleanups for nbd from Markus Pargmann.

- Change of brd maintainer to me, from Ross Zwisler. At least the email
  doesn't bounce anymore then.

- Two xen-blkback fixes from Tao Chen.

Please pull!

  git://git.kernel.dk/linux-block.git for-4.1/drivers



Alexey Khoroshilov (1):
  NVMe: Fix error handling of class_create("nvme")

Chong Yuan (1):
  NVMe: embedded iod mask cleanup

David Rientjes (2):
  block, drbd: fix drbd_req_new() initialization
  block, drbd: use mempool_create_slab_pool()

Jens Axboe (2):
  NVMe: increase depth of admin queue
  Merge branch 'stable/for-jens-4.1' of git://git.kernel.org/.../konrad/xen into for-4.1/drivers

Keith Busch (5):
  NVMe: Freeze admin queue on device failure
  NVMe: Fix blk-mq hot cpu notification
  NVMe: Remove check for null
  NVMe: Add translation for block limits
  NVMe: Meta data handling through submit io ioctl

Markus Pargmann (9):
  Documentation: nbd: Reformat to allow more documentation
  Documentation: nbd: Add list of module parameters
  nbd: Remove kernel internal header
  nbd: Replace kthread_create with kthread_run
  nbd: Fix device bytesize type
  nbd: Restructure debugging prints
  nbd: Remove fixme that was already fixed
  nbd: Return error code directly
  nbd: Return error pointer directly

Murali Iyer (1):
  nvme: Fix PRP list calculation for non-4k system page size

Ross Zwisler (1):
  brd: update maintainer to be Jens Axboe

Tao Chen (2):
  xen-blkback: enlarge the array size of blkback name
  xen-blkback: define pr_fmt macro to avoid the duplication of DRV_PFX

 Documentation/blockdev/nbd.txt      |  48 +++
 MAINTAINERS                         |   2 +-
 drivers/block/drbd/drbd_main.c      |   7 +-
 drivers/block/drbd/drbd_req.c       |   3 +-
 drivers/block/nbd.c                 | 140 +--
 drivers/block/nvme-core.c           | 159 +++-
 drivers/block/nvme-scsi.c           |  28 ++-
 drivers/block/xen-blkback/blkback.c |  62 +++---
 drivers/block/xen-blkback/common.h  |   6 --
 drivers/block/xen-blkback/xenbus.c  |  38 +
 include/linux/nbd.h                 |  46 ---
 include/linux/nvme.h                |   5 +-
 12 files changed, 242 insertions(+), 302 deletions(-)
 delete mode 100644 include/linux/nbd.h

-- 
Jens Axboe


Re: [LKP] [crypto] 9c521a200bc: +11.7% vmstat.system.in

2015-04-16 Thread Herbert Xu

On Fri, Apr 17, 2015 at 08:28:50AM +0800, Huang Ying wrote:
> FYI, we noticed the below changes on
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit 9c521a200bc3c12bd724e48a75c57d5358f672be ("crypto: api - remove 
> instance when test failed")

Should already be fixed by 34c9a0ffc75ad25b6a60f61e27c4a4b1189b8085.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCHv2 00/10] cleaned up on-demand device creation

2015-04-16 Thread Sergey Senozhatsky

On (04/17/15 08:23), Minchan Kim wrote:
> Probably, I will have time to review next week so I feel it's too late
> to merge it into 4.1 but I think there is no urgency to merge it.
> 

Minchan,

I will resubmit the whole patch set. It has things like

[..]
  What:   /sys/class/zram-control/zram_add
  Date:   August 2015
  KernelVersion:  4.1
[..]

I will have to do it anyway, so no rush; take your time.

-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Gracely handle GPE dispatch failure

2015-04-16 Thread Rafael J. Wysocki

On Thursday, April 16, 2015 04:37:45 PM H.J. Lu wrote:
> When GPE dispatch fails due to a missing handler, the kernel issues
> 
> ACPI Error: No handler for Region
> ACPI Error: Region EmbeddedControl
> ACPI Error: Method parse/execution failed
> ACPI Exception: AE_NOT_EXIST, while evaluating GPE method
> 
> and goes ahead and enables the GPE anyway.  After that, the kernel goes
> into an infinite loop:
> 
> ACPI Error: No handler for Region
> ACPI Error: Region EmbeddedControl
> ACPI Error: Method parse/execution failed
> ACPI Exception: AE_NOT_EXIST, while evaluating GPE method
> 
> and these messages repeat.  This patch avoids the infinite loop by not
> enabling the GPE if GPE dispatch fails.

First, please send patches inline, not as attachments.

Second, please CC this one to linux-a...@vger.kernel.org and to the ACPICA
maintainers as per the information in the MAINTAINERS file.

Thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

[GIT PULL] Block core bits for 4.1-rc1

2015-04-16 Thread Jens Axboe

Hi Linus,

This is the core pull request for 4.1. Not a lot of stuff in here for
this round, mostly little fixes or optimizations. This pull request
contains:

- An optimization that speeds up queue runs on blk-mq, especially for
  the case where there's a large difference between nr_cpu_ids and
  the actual mapped software queues on a hardware queue. From Chong
  Yuan.

- Honor node local allocations for requests on legacy devices. From
  David Rientjes.

- Cleanup of blk_mq_rq_to_pdu() from me.

- exit_aio() fixup from me, greatly speeding up exiting multiple
  IO contexts off exit_group(). For my particular test case, fio
  took ~6 seconds to exit. A typical case of both exposing RCU grace
  periods to user space and serializing the exit of each context.

- Make blk_mq_queue_enter() honor the gfp mask passed in, so we only
  wait if __GFP_WAIT is set. From Keith Busch.

- blk-mq exports and two added helpers from Mike Snitzer, which will be
  used by the dm-mq code.

- Cleanups of blk-mq queue init from Wei Fang and Xiaoguang Wang.

Please pull!

  git://git.kernel.dk/linux-block.git for-4.1/core


Chong Yuan (1):
  blk-mq: reduce unnecessary software queue looping

David Rientjes (1):
  block: allocate request memory local to request queue

Jens Axboe (3):
  Merge branch 'for-linus' into for-4.1/core
  blk-mq: cleanup blk_mq_rq_to_pdu()
  aio: fix serial draining in exit_aio()

Keith Busch (1):
  blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set

Mike Snitzer (2):
  blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  blk-mq: export blk_mq_run_hw_queues

Wei Fang (1):
  blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()

Xiaoguang Wang (1):
  block: remove redundant check about 'set->nr_hw_queues' in blk_mq_alloc_tag_set()

 block/blk-core.c       | 19 ---
 block/blk-mq-sysfs.c   |  1 +
 block/blk-mq.c         | 67
 fs/aio.c               | 45 +++
 include/linux/blk-mq.h |  7 --
 5 files changed, 93 insertions(+), 46 deletions(-)

-- 
Jens Axboe


Re: [PATCHv2 00/10] cleaned up on-demand device creation

2015-04-16 Thread Sergey Senozhatsky

On (04/17/15 08:23), Minchan Kim wrote:
> > resending on-demand device creation patch set. sadly, I managed to create a
> > mess; so here is my take to clean it up, fold patches and, hopefully, see
> > them in 4.1.
> 
> Thanks for handling this quickly. I acknowledge dynamic device management part
> but I want to review your patchset carefully one more time because you changed
> a lot although it's just refactoring. Really sorry for late review. It's totally
> my bad. Probably, I will have time to review next week so I feel it's too late
> to merge it into 4.1 but I think there is no urgency to merge it.

OK, 4.2 then... we ended up missing the 4.1 merge window.


Andrew, please ignore 0002 - 0010. 0001 is important, please pick it up.

-ss

update README

2015-04-16 Thread Justin Keller

The README file specifies 3.x kernels in several places. The kernel
is now 4.x, no longer 3.x.

Justin Keller

diff --git a/README b/linux/README
index a24ec89..67fc7a8 100644
--- a/README
+++ b/linux/README
@@ -1,6 +1,6 @@
-Linux kernel release 3.x 
+Linux kernel release 4.x 

-These are the release notes for Linux version 3.  Read them carefully,
+These are the release notes for Linux version 4.  Read them carefully,
 as they tell you what this is all about, explain how to install the
 kernel, and what to do if something goes wrong.

@@ -62,11 +62,11 @@ INSTALLING the kernel source:
directory where you have permissions (eg. your home directory) and
unpack it:

- gzip -cd linux-3.X.tar.gz | tar xvf -
+ gzip -cd linux-4.X.tar.gz | tar xvf -

or

- bzip2 -dc linux-3.X.tar.bz2 | tar xvf -
+ bzip2 -dc linux-4.X.tar.bz2 | tar xvf -

Replace "X" with the version number of the latest kernel.

@@ -75,16 +75,16 @@ INSTALLING the kernel source:
files.  They should match the library, and not get messed up by
whatever the kernel-du-jour happens to be.

- - You can also upgrade between 3.x releases by patching.  Patches are
+ - You can also upgrade between 4.x releases by patching.  Patches are
distributed in the traditional gzip and the newer bzip2 format.  To
install by patching, get all the newer patch files, enter the
-   top level directory of the kernel source (linux-3.X) and execute:
+   top level directory of the kernel source (linux-4.X) and execute:

- gzip -cd ../patch-3.x.gz | patch -p1
+ gzip -cd ../patch-4.x.gz | patch -p1

or

- bzip2 -dc ../patch-3.x.bz2 | patch -p1
+ bzip2 -dc ../patch-4.x.bz2 | patch -p1

Replace "x" for all versions bigger than the version "X" of your current
source tree, _in_order_, and you should be ok.  You may want to remove
@@ -92,13 +92,13 @@ INSTALLING the kernel source:
that there are no failed patches (some-file-name# or some-file-name.rej).
If there are, either you or I have made a mistake.

-   Unlike patches for the 3.x kernels, patches for the 3.x.y kernels
+   Unlike patches for the 4.x kernels, patches for the 4.x.y kernels
(also known as the -stable kernels) are not incremental but instead apply
-   directly to the base 3.x kernel.  For example, if your base kernel is 3.0
-   and you want to apply the 3.0.3 patch, you must not first apply the 3.0.1
-   and 3.0.2 patches. Similarly, if you are running kernel version 3.0.2 and
-   want to jump to 3.0.3, you must first reverse the 3.0.2 patch (that is,
-   patch -R) _before_ applying the 3.0.3 patch. You can read more on this in
+   directly to the base 4.x kernel.  For example, if your base kernel is 4.0
+   and you want to apply the 4.0.3 patch, you must not first apply the 4.0.1
+   and 4.0.2 patches. Similarly, if you are running kernel version 4.0.2 and
+   want to jump to 4.0.3, you must first reverse the 4.0.2 patch (that is,
+   patch -R) _before_ applying the 4.0.3 patch. You can read more on this in
Documentation/applying-patches.txt

Alternatively, the script patch-kernel can be used to automate this
@@ -120,7 +120,7 @@ INSTALLING the kernel source:

 SOFTWARE REQUIREMENTS

-   Compiling and running the 3.x kernels requires up-to-date
+   Compiling and running the 4.x kernels requires up-to-date
versions of various software packages.  Consult
Documentation/Changes for the minimum version numbers required
and how to get updates for these packages.  Beware that using
@@ -137,12 +137,12 @@ BUILD directory for the kernel:
place for the output files (including .config).
Example:

- kernel source code: /usr/src/linux-3.X
+ kernel source code: /usr/src/linux-4.X
  build directory:/home/name/build/kernel

To configure and build the kernel, use:

- cd /usr/src/linux-3.X
+ cd /usr/src/linux-4.X
  make O=/home/name/build/kernel menuconfig
  make O=/home/name/build/kernel
  sudo make O=/home/name/build/kernel modules_install install

[PATCH] Gracely handle GPE dispatch failure

2015-04-16 Thread H.J. Lu

When GPE dispatch fails due to a missing handler, the kernel issues

ACPI Error: No handler for Region
ACPI Error: Region EmbeddedControl
ACPI Error: Method parse/execution failed
ACPI Exception: AE_NOT_EXIST, while evaluating GPE method

and goes ahead and enables the GPE anyway.  After that, the kernel goes
into an infinite loop:

ACPI Error: No handler for Region
ACPI Error: Region EmbeddedControl
ACPI Error: Method parse/execution failed
ACPI Exception: AE_NOT_EXIST, while evaluating GPE method

and these messages repeat.  This patch avoids the infinite loop by not
enabling the GPE if GPE dispatch fails.


-- 
H.J.
From e892b19f5100427b8c89a74186b90675141ceaab Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 12 Nov 2011 11:03:38 -0800
Subject: [PATCH] Gracely handle GPE dispatch failure

When GPE dispatch fails due to a missing handler, the kernel issues

ACPI Error: No handler for Region
ACPI Error: Region EmbeddedControl
ACPI Error: Method parse/execution failed
ACPI Exception: AE_NOT_EXIST, while evaluating GPE method

and goes ahead and enables the GPE anyway.  After that, the kernel goes
into an infinite loop:

ACPI Error: No handler for Region
ACPI Error: Region EmbeddedControl
ACPI Error: Method parse/execution failed
ACPI Exception: AE_NOT_EXIST, while evaluating GPE method

and these messages repeat.  This patch avoids the infinite loop by not
enabling the GPE if GPE dispatch fails.

Signed-off-by: H.J. Lu 
---
 drivers/acpi/acpica/evgpe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/acpica/evgpe.c b/drivers/acpi/acpica/evgpe.c
index 5ed064e..ca1ec02 100644
--- a/drivers/acpi/acpica/evgpe.c
+++ b/drivers/acpi/acpica/evgpe.c
@@ -589,6 +589,7 @@ static void ACPI_SYSTEM_XFACE acpi_ev_asynch_execute_gpe_method(void *context)
 	acpi_ut_get_node_name(gpe_event_info->
 			  dispatch.
 			  method_node)));
+			return_VOID;
 		}
 		break;
 
-- 
1.9.3

Re: [PATCH 2/4] mm: Send a single IPI to TLB flush multiple pages when unmapping

2015-04-16 Thread Minchan Kim

Hello Mel,

On Thu, Apr 16, 2015 at 10:19:22AM +0100, Mel Gorman wrote:
> On Thu, Apr 16, 2015 at 05:29:55PM +0900, Minchan Kim wrote:
> > On Thu, Apr 16, 2015 at 09:07:22AM +0100, Mel Gorman wrote:
> > > On Thu, Apr 16, 2015 at 03:38:26PM +0900, Minchan Kim wrote:
> > > > Hello Mel,
> > > > 
> > > > On Wed, Apr 15, 2015 at 10:28:55PM +0100, Mel Gorman wrote:
> > > > > On Wed, Apr 15, 2015 at 02:16:49PM -0700, Hugh Dickins wrote:
> > > > > > On Wed, 15 Apr 2015, Rik van Riel wrote:
> > > > > > > On 04/15/2015 06:42 AM, Mel Gorman wrote:
> > > > > > > > An IPI is sent to flush remote TLBs when a page is unmapped that was
> > > > > > > > recently accessed by other CPUs. There are many circumstances where this
> > > > > > > > happens but the obvious one is kswapd reclaiming pages belonging to a
> > > > > > > > running process as kswapd and the task are likely running on separate CPUs.
> > > > > > > > 
> > > > > > > > On small machines, this is not a significant problem but as machine
> > > > > > > > gets larger with more cores and more memory, the cost of these IPIs can
> > > > > > > > be high. This patch uses a structure similar in principle to a pagevec
> > > > > > > > to collect a list of PFNs and CPUs that require flushing. It then sends
> > > > > > > > one IPI to flush the list of PFNs. A new TLB flush helper is required for
> > > > > > > > this and one is added for x86. Other architectures will need to decide if
> > > > > > > > batching like this is both safe and worth the memory overhead. Specifically
> > > > > > > > the requirement is;
> > > > > > > > 
> > > > > > > > If a clean page is unmapped and not immediately flushed, the
> > > > > > > > architecture must guarantee that a write to that page from a CPU
> > > > > > > > with a cached TLB entry will trap a page fault.
> > > > > > > > 
> > > > > > > > This is essentially what the kernel already depends on but the window is
> > > > > > > > much larger with this patch applied and is worth highlighting.
> > > > > > > 
> > > > > > > This means we already have a (hard to hit?) data corruption
> > > > > > > issue in the kernel.  We can lose data if we unmap a writable
> > > > > > > but not dirty pte from a file page, and the task writes before
> > > > > > > we flush the TLB.
> > > > > > 
> > > > > > I don't think so.  IIRC, when the CPU needs to set the dirty bit,
> > > > > > it doesn't just do that in its TLB entry, but has to fetch and update
> > > > > > the actual pte entry - and at that point discovers it's no longer
> > > > > > valid so traps, as Mel says.
> > > > > > 
> > > > > 
> > > > > This is what I'm expecting i.e. clean->dirty transition is write-through
> > > > > to the PTE which is now unmapped and it traps. I'm assuming there is an
> > > > > architectural guarantee that it happens but could not find an explicit
> > > > > statement in the docs. I'm hoping Dave or Andi can check with the relevant
> > > > > people on my behalf.
> > > > 
> > > > A dumb question. It's not related to your patch but MADV_FREE.
> > > > 
> > > > clean->dirty transition is *atomic* as well as write-through?
> > > 
> > > This is the TLB cache clean->dirty transition so it's not 100% clear what you
> > > are asking. It both needs to be write-through and the TLB updates must happen
> > > before the actual data write to cache or memory and it must be ordered.
> > 
> > Sorry for not clear. I will try again.
> > 
> > In try_to_unmap_one,
> > 
> > 
> > pteval = ptep_clear_flush(vma, address, pte);
> > {
> > pte = ptep_get_and_clear(mm, address, ptep);
> > <-- A application write on other CPU.
> > flush_tlb_page(vma, address);
> > } 
> >  
> > /* Move the dirty bit to the physical page now the pte is gone. */
> > dirty = pte_dirty(pteval);
> > if (dirty)
> > set_page_dirty(page);
> > ...
> > 
> > 
> > In above, ptep_clear_flush just does xchg operation to make pte zero
> > in ptep_get_and_clear and return old pte_val but didn't flush TLB yet.
> 
> Correct.
> 
> > Let's assume old pte_val doesn't have dirty bit(ie, it was clean).
> > If application on other CPU does write the memory at the same time,
> > what happens?
> 
> The comments describe the architectural guarantee I'm looking for. Dave
> says he's asking the relevant people within Intel. I revised the comment
> in the unreleased V2 so it reads
> 
> /*
>  * We clear the PTE but do not flush so potentially a remote
>  * CPU could still be writing to the page. If the entry was
>  * previously clean then

Re: AM335x OMAP2 common clock external fixed-clock registration

2015-04-16 Thread Sebastian Hesselbarth


On 17.04.2015 00:09, Michael Welling wrote:

On Thu, Apr 16, 2015 at 10:37:19PM +0200, Sebastian Hesselbarth wrote:

On 16.04.2015 18:17, Michael Welling wrote:

On Thu, Apr 16, 2015 at 07:32:32AM +0300, Tero Kristo wrote:

On 04/15/2015 11:51 PM, Michael Welling wrote:

On Wed, Apr 15, 2015 at 01:45:53PM -0700, Mike Turquette wrote:

On Wed, Apr 15, 2015 at 12:47 PM, Michael Welling  wrote:

[...]

There is still an issue with the si5351.

I had to comment out the clk_put here for the frequency to show up:
http://lxr.free-electrons.com/source/drivers/clk/clk-si5351.c#L1133

Ideas?


What is the most recent upstream commit that you are based on?


I am working from 4.0.0-rc7.

7b43b47373d40d557cd7e1a84a0bd8ebc4d745ab


Hmm, I wonder why si5351 calls clk_put immediately after of_clk_get
in the first place; as far as I understand, this destroys the clock
handle, which is still being used later in the code.


Not sure how this ever worked. This has been in the code since the
initial commit.


The reason it worked before may be related to the recent rework of
clk_put() itself and to clk cookies instead of pointers. I lost track of
the recent clk subsystem changes here, sorry.

However, dropping the clk immediately surely isn't right.
The thing is, we can remove the clk_put() simply because there is no
_remove() for that driver. I remember that back in the days when the
driver was mainlined, clk removal wasn't too easy.

FWIW, as soon as someone adds _remove() support, we'll have
to rethink passing struct clk* by platform_data, or at least
double-check whether we ever used [of_]clk_get() to obtain it.

Would you mind sending a patch that removes the clk_put() on !IS_ERR and
adds a proper error path instead? Since the of_clk_get() calls are the only
ones that need cleanup on error in si5351_dt_parse(), we should probably
move those calls to the end of this function. Otherwise we'd also have to
clean up on every of_parse_foo() failure.


What would be the proper error path?
What cleanup is required?


A proper error path would be to release any claimed resource
on any error. If you look at the code, the only resources that
need to be released are the two clocks in question.


It should be noted that there are more deep-rooted issues with the driver
that I have noticed. For one, the driver behaves differently when debugging
is on and when it is off.


I guess you mean #define DEBUG in the driver?


Here is what the kernel reports with debugging off:


Do you have any measurement equipment to check what is actually set?


root@som3517-som200:~# cat /sys/kernel/debug/clk/clk_summary
clock enable_cnt  prepare_cntrate   
accuracy   phase

  ref27002700  
0 0
 xtal  002700  
0 0
pllb   00   59994  
0 0
   ms0 001249  
0 0
  clk0 001249  
0 0
plla   00   59994  
0 0
   ms2 00 8219178  
0 0
  clk2 00 8219178  
0 0
   ms1 0094117646  
0 0
  clk1 0094117646  
0 0

Here is what the kernel reports with debugging on:
clock enable_cnt  prepare_cntrate   
accuracy   phase

  ref27002700  
0 0
 xtal  002700  
0 0
pllb   00   884736000  
0 0
   ms0 0018432000  
0 0
  clk0 0018432000  
0 0


Is this what you expect for clk0?


plla   00   897023997  
0 0
   ms2 0012287999  
0 0
  clk2 0012287999  
0 0


ditto for clk2?


   ms1 00   140709646  
0 0
  clk1 00   140709646  
0 0


This is wrong, I agree. Looks like round_rate()/recalc_rate() of msynth
or clkout is broken with respect to non-pll-master clocks.

I had a quick look at drivers/clk/clk.c too; there has been a lot of churn
in the clk API since I last booted my device using si5351.

Is

Re: [PATCHv2 00/10] cleaned up on-demand device creation

2015-04-16 Thread Minchan Kim

Hello,

On Thu, Apr 16, 2015 at 08:55:46PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> resending on-demand device creation patch set. sadly, I managed to create a
> mess; so here is my take to clean it up, fold patches and, hopefully, see
> them in 4.1.

Thanks for handling this quickly. I acknowledge the dynamic device management
part, but I want to review your patchset carefully one more time because you
changed a lot, even though it's just refactoring. Really sorry for the late
review; it's totally my bad. I will probably have time to review it next week,
so I feel it's too late to merge it into 4.1, but I don't think there is any
urgency to merge it.

> 
> this mess will not happen again.
> 
> 
> Andrew picked up some of the commits lined up for 4.1, which required manual
> editing. sorry for that inconvenience.
> 
> 
> unfortunately, commit c72c6160d967ed26a0b136dbab337f821d233509
>   Author: Sergey Senozhatsky 
>   Date:   Wed Apr 15 16:15:55 2015 -0700
> 
>  zram: move compact_store() to sysfs functions area
> 
> 
> ended up being different: it turned from a cosmetic change into
> a functional change.
> 
> I fix it in 0001-zram-enable-compaction-support-in-zram.patch.
> 

So Andrew, could you pick 0001 in this merge window? Without it, zram cannot
use the compaction feature of zsmalloc, so zsmalloc's compaction feature will
be void.

> 
> the rest is functionally identical to what we had in linux-next and mmotm for
> quite some time: in linux-next since Wed Apr 8 09:44:43 2015 +1000
> (commit 273b0791dae2f0b).
> 
> it would be nice to see it in 4.1, if possible.
> 
> no functional change in zram_drv.c file, compared to zram_drv.c from
> linux-next-20150415 (yes, actually checked). just a couple of additional
> comment tweaks.
> 
> like:
> 
> -/* allocate and initialize new zram device. the function returns
> - * '>= 0' device_id upon success, and negative value otherwise. */
> +/*
> + * Allocate and initialize new zram device. the function returns
> + * '>= 0' device_id upon success, and negative value otherwise.
> + */
> 
> or
> 
> /*
>  * First, make ->disksize device attr RO, closing
> -* ZRAM_CTL_REMOVE vs disksize_store() race window
> +* zram_remove() vs disksize_store() race window
>  */
> 
> 
> I also picked up the remaining part of Julia Lawall's
>  ("zram: fix error return code") commit.
> 
> Documentation is identical to linux-next-20150415 version.
> 
> 
> 8<---
> 
> We currently don't support zram on-demand device creation.  The only way
> to have N zram devices is to specify the num_devices module parameter
> (default value 1).  That means that if, for some reason, at some point, the
> user wants to have N + 1 devices, he/she must umount all the existing
> devices, unload the module, and load it again passing num_devices equal to
> N + 1.  And do this again, if needed.
> 
> This patchset introduces zram-control sysfs class, which has two sysfs
> attrs:
> 
>  - zram_add -- add a new zram device
>  - zram_remove  -- remove a specific (device_id) zram device
> 
> Usage example:
> # add a new zram device (prints the new device_id)
> cat /sys/class/zram-control/zram_add
> 1
> 
> # remove a specific zram device
> echo 4 > /sys/class/zram-control/zram_remove
> 
> The patchset also does some cleanups and a large code reorganization.
> 
> 
> -ss
> 
> 
> Sergey Senozhatsky (10):
>   zram: enable compaction support in zram
>   zram: cosmetic ZRAM_ATTR_RO code formatting tweak
>   zram: use idr instead of `zram_devices' array
>   zram: factor out device reset from reset_store()
>   zram: reorganize code layout
>   zram: remove max_num_devices limitation
>   zram: report every added and removed device
>   zram: trivial: correct flag operations comment
>   zram: return zram device_id value from zram_add()
>   zram: add dynamic device add/remove functionality
> 
>  Documentation/ABI/testing/sysfs-class-zram |  24 +
>  Documentation/blockdev/zram.txt|  31 +-
>  drivers/block/zram/zram_drv.c  | 939 
> +
>  drivers/block/zram/zram_drv.h  |   6 -
>  4 files changed, 597 insertions(+), 403 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-zram
> 
> -- 
> 2.4.0.rc1.29.gecc46a1
> 

-- 
Kind regards,
Minchan Kim

Re: [net:r8169] kernel 4.0.0 double "link down" on boot

2015-04-16 Thread Francois Romieu

rh_  :
> I see these double "link down" now in 4.0
> 
> [   19.927880] r8169 :03:00.2 eth0: link down
> [   19.927902] r8169 :03:00.2 eth0: link down
> [   22.32] r8169 :03:00.2 eth0: link up
> 
> Everything seems to be working fine. Is this an indicator of a problem?

No.

> Safe to ignore?

Mostly.

> If so why?

The driver is notified of a link event. It schedules a link check.
The link may be back to its original state before the driver notices
the change.
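To make the explanation above concrete, here is a toy model, not r8169 code: every link interrupt queues a deferred link check, and the check logs the link state as it is when the check runs, not when the event fired. Two events queued while the link is down therefore print "down" twice, and a short down/up flap that completes before the check runs prints only the current state. The struct and function names are hypothetical.

```c
#include <string.h>

/* Toy model: one log line per queued check, reporting the state at check time. */
enum { LINK_DOWN, LINK_UP };

struct toy_nic {
	int phy_state;       /* current physical link state */
	int pending_checks;  /* deferred checks queued by link events */
	char log[64];        /* what the "driver" printed */
};

/* Interrupt side: just schedule a check, do not read the state yet. */
static void toy_link_event(struct toy_nic *nic)
{
	nic->pending_checks++;
}

/* Deferred side: each queued check samples the state when it runs. */
static void toy_run_checks(struct toy_nic *nic)
{
	while (nic->pending_checks > 0) {
		const char *msg = nic->phy_state == LINK_UP ? "up " : "down ";

		nic->pending_checks--;
		strncat(nic->log, msg, sizeof(nic->log) - strlen(nic->log) - 1);
	}
}
```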

As long as you don't experience high frequency link events for an
extended duration, you should not care.

[...]
> Does it make sense to check 3.19 ?

No.

-- 
Ueimor

Re: [patch] mm: vmscan: invoke slab shrinkers from shrink_zone()

2015-04-16 Thread Dave Chinner

On Thu, Apr 16, 2015 at 10:34:13AM -0400, Johannes Weiner wrote:
> On Thu, Apr 16, 2015 at 12:57:36PM +0900, Joonsoo Kim wrote:
> > This causes the following success rate regression in phases 1 and 2 of
> > the stress-highalloc benchmark. In phases 1 and 2, many high-order
> > allocations are requested while many threads build the kernel in parallel.
> 
> Yes, the patch made the shrinkers on multi-zone nodes less aggressive.
> From the changelog:
> 
> This changes kswapd behavior, which used to invoke the shrinkers for each
> zone, but with scan ratios gathered from the entire node, resulting in
> meaningless pressure quantities on multi-zone nodes.
> 
> So the previous code *did* apply more pressure on the shrinkers, but
> it didn't make any sense.  The number of slab objects to scan for each
> scanned LRU page depended on how many zones there were in a node, and
> their relative sizes.  So a node with a large DMA32 and a small Normal
> would receive vastly different relative slab pressure than a node with
> only one big zone Normal.  That's not something we should revert to.
> 
> If we are too weak on objects compared to LRU pages then we should
> adjust DEFAULT_SEEKS or individual shrinker settings.

Now this thread has my attention. Changing shrinker defaults will
seriously upset the memory balance under load (in unpredictable
ways) so I really don't think we should even consider changing
DEFAULT_SEEKS.
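For context on why a DEFAULT_SEEKS change would ripple through every cache at once, here is a simplified model of the per-shrinker scan target that do_shrink_slab() in mm/vmscan.c computed around v4.0. Treat it as an approximation: it ignores batch sizes, deferred work carried between calls, and the upper clamp. Pressure is proportional to the fraction of LRU pages scanned, scaled by 4/seeks, so every shrinker that uses DEFAULT_SEEKS shifts together.

```c
#include <stdint.h>

#define DEFAULT_SEEKS 2  /* value from include/linux/shrinker.h */

/*
 * Approximate number of objects a shrinker is asked to scan, given how many
 * LRU pages were scanned out of how many eligible, and how many objects the
 * shrinker reports as freeable.
 */
static uint64_t slab_scan_target(uint64_t nr_scanned, uint64_t nr_lru_pages,
				 uint64_t freeable, unsigned int seeks)
{
	uint64_t delta = (4 * nr_scanned) / seeks;

	delta *= freeable;
	return delta / (nr_lru_pages + 1);
}
```

With 1000 of 100000 LRU pages scanned and 50000 freeable objects, the target is 999 objects at DEFAULT_SEEKS. Halving seeks roughly doubles the target (1999), and doubling it roughly halves the target (499).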

If there's a shrinker imbalance, we need to understand which
shrinker needs rebalancing, then modify that shrinker's
configuration and then observe the impact this has on the rest of
the system. This means looking at variance of the memory footprint
in steady state, reclaim overshoot and damping rates before steady
state is achieved, etc.  Balancing multiple shrinkers (especially
those with dependencies on other caches) under memory
load is a non-trivial undertaking.

I don't see any evidence that we have a shrinker imbalance, so I
really suspect the problem is "shrinkers aren't doing enough work".
In that case, we need to increase the pressure being generated, not
start fiddling around with shrinker configurations.

> If we think our pressure ratio is accurate but we don't reclaim enough
> compared to our compaction efforts, then any adjustments to improve
> huge page success rate should come from the allocator/compaction side.

Right - if compaction is failing, then the problem is more likely
that it isn't generating enough pressure, and so the shrinkers
aren't doing the work we are expecting them to do. That's a problem
with compaction, not the shrinkers...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

[PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq()

2015-04-16 Thread Tejun Heo

Implement two functions to access log messages by their sequence
numbers.

* log_seq_range() determines the currently available sequence number
  range.

* ext_log_from_seq() outputs log message identified by a given
  sequence number in the extended format.

These will be used to implement reliable netconsole.
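A userspace toy model of the cached seek that ext_log_from_seq() performs, under stated assumptions. Variable-length records sit back to back in a buffer, and the first byte of each record holds its length. The names loosely mirror printk.c, but the buffer layout, toy_log_init(), seek_seq() and last_steps are hypothetical. The model shows why in-order access is cheap: the lookup resumes from the cached (seq, idx) pair when that pair lies between the oldest record and the target.

```c
#include <stdint.h>

/* Toy log: record lengths stored in the first byte of each record. */
static uint8_t log_buf[256];
static uint64_t log_first_seq, log_next_seq;
static uint32_t log_first_idx;

static uint64_t seq_hint;          /* last looked-up seq */
static uint32_t idx_hint;          /* and its buffer index */
static unsigned int last_steps;    /* records walked by the last lookup */

static void toy_log_init(const uint8_t *lens, int nr)
{
	uint32_t off = 0;

	for (int i = 0; i < nr; i++) {
		log_buf[off] = lens[i];
		off += lens[i];
	}
	log_first_seq = 0;
	log_first_idx = 0;
	log_next_seq = nr;
	seq_hint = 0;
	idx_hint = 0;
}

/* Stand-in for log_next(): step over the record at @idx. */
static uint32_t log_next(uint32_t idx)
{
	return idx + log_buf[idx];
}

/* Returns the buffer index of @log_seq, or -1 if it is out of range. */
static int64_t seek_seq(uint64_t log_seq)
{
	uint64_t seq = log_first_seq;
	uint32_t idx = log_first_idx;

	if (log_seq < log_first_seq || log_seq >= log_next_seq)
		return -1;

	/* Resume from the cached position if it lies between start and target. */
	if (seq < seq_hint && seq_hint <= log_seq) {
		seq = seq_hint;
		idx = idx_hint;
	}

	last_steps = 0;
	while (seq < log_seq) {
		idx = log_next(idx);
		seq++;
		last_steps++;
	}

	seq_hint = seq;
	idx_hint = idx;
	return idx;
}
```

Looking up seq 3 and then seq 5 walks 3 records and then only 2, while going back to seq 1 restarts from the oldest record.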

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/printk.h | 14 
 kernel/printk/printk.c | 96 ++
 2 files changed, 110 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 58b1fec..4fd3bb4 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern const char linux_banner[];
 extern const char linux_proc_banner[];
@@ -169,6 +170,8 @@ void __init setup_log_buf(int early);
 void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+void log_seq_range(u64 *beginp, u64 *endp);
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -228,6 +231,17 @@ static inline void dump_stack_print_info(const char 
*log_lvl)
 static inline void show_regs_print_info(const char *log_lvl)
 {
 }
+
+static inline void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+   *begin_seqp = 0;
+   *end_seqp = 0;
+}
+
+static inline ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+   return -ERANGE;
+}
 #endif
 
 extern asmlinkage void dump_stack(void) __cold;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 349a37b..904413e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -617,6 +617,102 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
return p - buf;
 }
 
+/**
+ * log_seq_range - get the range of available log sequence numbers
+ * @begin_seqp: output parameter for the beginning of the range
+ * @end_seqp: output parameter for the end of the range
+ *
+ * Returns the log sequence number range [*@begin_seqp, *@end_seqp) which
+ * is currently available to be retrieved using ext_log_from_seq().  Note
+ * that the range may shift at any time.
+ */
+void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&logbuf_lock, flags);
+   *begin_seqp = log_first_seq;
+   *end_seqp = log_next_seq;
+   raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+}
+EXPORT_SYMBOL_GPL(log_seq_range);
+
+/**
+ * ext_log_from_seq - obtain log message in extended format from its seq number
+ * @buf: output buffer
+ * @size: size of @buf
+ * @log_seq: target log sequence number
+ *
+ * Print the log message with @log_seq as its sequence number into @buf
+ * using the extended format.  On success, returns the length of formatted
+ * output excluding the terminating '\0'.  If @log_seq doesn't match any
+ * message, -ERANGE is returned.
+ *
+ * Due to the way messages are stored, accessing @log_seq requires seeking
+ * the log buffer sequentially.  As the last looked up position is cached,
+ * accessing messages in ascending sequence order is recommended.
+ */
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+   static u64 seq_hint;
+   static u32 idx_hint;
+   static enum log_flags prev_flags_hint;
+   struct printk_log *msg;
+   enum log_flags prev_flags = 0;
+   unsigned long flags;
+   ssize_t len;
+   u64 seq;
+   u32 prev_idx, idx;
+
+   raw_spin_lock_irqsave(&logbuf_lock, flags);
+
+   if (log_seq < log_first_seq || log_seq >= log_next_seq) {
+   len = -ERANGE;
+   goto out_unlock;
+   }
+
+   /*
+* Seek to @log_seq to determine its index. The last looked up seq
+* and index are cached to accelerate in-order accesses.  For now,
+* @prev_flags is best effort.  It'd be better to restructure it so
+* that the current entry carries all the relevant information
+* without depending on the previous one.
+*/
+   seq = log_first_seq;
+   idx = log_first_idx;
+
+   if (seq < seq_hint && seq_hint <= log_seq) {
+   seq = seq_hint;
+   idx = idx_hint;
+   prev_flags = prev_flags_hint;
+   }
+
+   prev_idx = idx;
+   while (seq < log_seq) {
+   seq++;
+   prev_idx = idx;
+   idx = log_next(idx);
+   }
+
+   if (prev_idx != idx)
+   prev_flags = log_from_idx(prev_idx)->flags;
+   msg = log_from_idx(idx);
+
+   /* entry located, format it */
+   len = msg_print_ext_header(buf, size, msg, seq, prev_flags);
+   len += msg_print_ext_body(buf + len, size - len,
+ log_dict(msg), msg->dict_len,
+ log_text(msg), msg->text_len);
+
+   seq_hint = seq;

[PATCH 14/16] netconsole: implement ack handling and emergency transmission

2015-04-16 Thread Tejun Heo

While the retransmission support added by the previous patch goes a
long way toward enabling a reliable remote logger, it depends on a
much larger part of the kernel working, and there is no non-polling
way of discovering whether the latest messages have been lost.

This patch implements optional ack handling.  An extended remote
logger can respond with a message formatted like the following.

 nca[]  ...

The optional  enables ack handling.  Whenever the ack lags
transmission by more than ACK_TIMEOUT (10s), the netconsole target
retransmits all unacked messages with an increasing interval (100ms
between messages at the beginning, backing off exponentially to 1s).
This gives the remote logger a very high chance of receiving all the
messages even after severe failures, as long as the timer and netpoll
are working.
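A sketch of the timing just described, reusing the constant names from the patch (ACK_TIMEOUT, EMG_TX_MIN_INTV, EMG_TX_MAX_INTV). HZ is assumed to be 1000 so that jiffies read as milliseconds; the real HZ is a kernel build option. ack_overdue() and next_emg_tx_intv() are hypothetical helpers that model the behavior, not functions from the patch.

```c
/* ASSUMPTION: HZ = 1000, so one jiffy is one millisecond in this model. */
#define HZ 1000UL
#define ACK_TIMEOUT     (10 * HZ)
#define EMG_TX_MIN_INTV (HZ / 10)
#define EMG_TX_MAX_INTV HZ

/* Has the pending ack been outstanding for longer than ACK_TIMEOUT?
 * Unsigned subtraction keeps this correct across counter wraparound. */
static int ack_overdue(unsigned long now, unsigned long ack_tstmp)
{
	return now - ack_tstmp > ACK_TIMEOUT;
}

/* Next per-message interval: start at the minimum, double, clamp at the max. */
static unsigned long next_emg_tx_intv(unsigned long cur)
{
	if (cur == 0)
		return EMG_TX_MIN_INTV;
	cur *= 2;
	return cur > EMG_TX_MAX_INTV ? EMG_TX_MAX_INTV : cur;
}
```

Starting from 0, successive calls yield 100, 200, 400, 800 and then stay at 1000 jiffies (1s under the assumed HZ).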

This doesn't interfere with the normal transmission path.  Even if
something goes wrong with the timer, the normal transmission path should
keep working.

Emergency transmissions are marked with "emg=1" header.
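A sketch of how an oversized message body is split into chunks that carry the "ncfrag" field. The field layout ncfrag=<byte offset>@<chunk index>/<number of chunks> is inferred from the example in the documentation patch of this series (the second chunk of "the first chunk,the second chunk" starts at offset 16). build_chunk() and nr_chunks() are hypothetical helpers, not netconsole's send_ext_msg_udp(), and the tiny chunk size is only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Number of chunks needed for @body_len bytes at @chunk_max bytes each. */
static int nr_chunks(int body_len, int chunk_max)
{
	return (body_len + chunk_max - 1) / chunk_max;
}

/* Format chunk @i of @body as "<header>,ncfrag=<off>@<i>/<n>;<slice>". */
static int build_chunk(char *out, size_t size, const char *header,
		       const char *body, int chunk_max, int i)
{
	int body_len = (int)strlen(body);
	int off = i * chunk_max;
	int len = body_len - off < chunk_max ? body_len - off : chunk_max;

	return snprintf(out, size, "%s,ncfrag=%d@%d/%d;%.*s", header, off, i,
			nr_chunks(body_len, chunk_max), len, body + off);
}
```

With a 16-byte chunk size, the body "the first chunk,the second chunk" yields the two chunks shown in the documentation example. A receiver can reassemble them by placing each slice at its byte offset.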

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 136 +++
 1 file changed, 127 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index b2763e0..82c8be0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -59,6 +60,10 @@ MODULE_LICENSE("GPL");
 #define MAX_PARAM_LENGTH   256
 #define MAX_PRINT_CHUNK1000
 
+#define ACK_TIMEOUT(10 * HZ)
+#define EMG_TX_MIN_INTV(HZ / 10)
+#define EMG_TX_MAX_INTVHZ
+
 static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " 
netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@/[tgt-macaddr]");
@@ -137,6 +142,13 @@ struct netconsole_target {
wait_queue_trx_wait;/* ditto */
wait_queue_head_t   *rx_waitq;  /* ditto */
struct work_struct  rx_work;/* receive & process packets */
+
+   /* ack handling and emergency transmission for extended netconsoles */
+   unsigned long   ack_tstmp;  /* pending ack timestamp */
+   u64 ack_seq;/* last acked sequence */
+   struct timer_list   ack_timer;  /* ack timeout */
+   unsigned long   emg_tx_intv;/* emergency tx interval */
+   u64 emg_tx_seq; /* current emg tx sequence */
 };
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -223,16 +235,17 @@ static struct netconsole_target 
*alloc_netconsole_target(void)
  * @nt: target to send message to
  * @msg: extended log message to send
  * @msg_len: length of message
+ * @emg_tx: is it for emergency transmission?
  *
  * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
  * and transmitted in multiple chunks with ncfrag header field added to
- * enable correct reassembly.
+ * enable correct reassembly.  If @emg_tx, emg header field is added.
  */
 static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
-int msg_len)
+int msg_len, bool emg_tx)
 {
static char buf[MAX_PRINT_CHUNK];
-   const int max_extra_len = sizeof(",ncfrag=@00/00");
+   const int max_extra_len = sizeof(",emg=1,ncfrag=@00/00");
const char *header, *body;
int header_len = msg_len, body_len = 0;
int chunk_len, nr_chunks, i;
@@ -240,7 +253,7 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
if (!nt->enabled || !netif_running(nt->np.dev))
return;
 
-   if (msg_len <= MAX_PRINT_CHUNK) {
+   if (!emg_tx && msg_len <= MAX_PRINT_CHUNK) {
netpoll_send_udp(&nt->np, msg, msg_len);
return;
}
@@ -261,6 +274,8 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
/*
 * Transfer possibly multiple chunks with extra header fields.
 *
+* For emergency transfers due to missing acks, add "emg=1".
+*
 * If @msg needs to be split to fit MAX_PRINT_CHUNK, add
 * "ncfrag=@<0-based-chunk-index>/" to
 * enable proper reassembly on receiver side.
@@ -272,6 +287,10 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
int this_header = header_len;
int this_chunk;
 
+   if (emg_tx)
+   this_header += scnprintf(buf + this_header,
+sizeof(buf) - this_header,
+",emg=1");
if (nr_chunks > 1)
this_header += scnprintf(buf + this_header,

[PATCH 04/16] printk: implement support for extended console drivers

2015-04-16 Thread Tejun Heo

printk log_buf keeps various metadata for each message, including its
sequence number and timestamp.  The metadata is currently available
only through /dev/kmsg and is stripped out before messages are passed
on to console drivers.  We want this metadata to be available to console
drivers too.  The immediate use is implementing reliable netconsole, but
it may be useful for other console devices too.

This patch implements support for extended console drivers.  Consoles
can indicate that they process extended messages by setting the new
CON_EXTENDED flag, and they'll be fed messages formatted the same way as
/dev/kmsg output, as follows.

 ,,,;

One special case is fragments.  Message fragments are output
immediately to consoles to avoid losing them in case of crashes.  For
normal consoles, this is handled by later suppressing the assembled
result, and /dev/kmsg only shows the fully assembled message; however,
extended consoles need both the fragments, to avoid losing the
message in case of a crash, and the assembled result, to tell how the
fragments were assembled and which sequence number was assigned to it.

To help matching up the fragments with the resulting message,
fragments are emitted in the following format.

 ,-,,-,fragid=;

And later when the assembly is complete, the following is transmitted,

 fragid=;

* Extended message formatting for console drivers is enabled iff there
  are registered extended consoles.

* Comment describing extended message formats updated to help
  distinguish variables from verbatim terms.
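A toy model of the bookkeeping described in the bullets above, assuming only what the patch states: consoles opt in with CON_EXTENDED (value 64), printk keeps a count in nr_ext_console_drivers, and extended formatting is needed only while that count is non-zero. struct toy_console and the toy_* functions are hypothetical and are not the kernel's struct console or register_console().

```c
#define CON_EXTENDED (64)  /* flag value from the patch */

/* Hypothetical, stripped-down console descriptor. */
struct toy_console {
	const char *name;
	int flags;
};

static int nr_ext_console_drivers;

static void toy_register_console(const struct toy_console *con)
{
	if (con->flags & CON_EXTENDED)
		nr_ext_console_drivers++;
}

static void toy_unregister_console(const struct toy_console *con)
{
	if (con->flags & CON_EXTENDED)
		nr_ext_console_drivers--;
}

/* Extended formatting work is only needed while an extended console exists. */
static int need_ext_format(void)
{
	return nr_ext_console_drivers > 0;
}

/* Each console is handed the variant matching its flags. */
static const char *text_for_console(const struct toy_console *con,
				    const char *text, const char *ext_text)
{
	return (con->flags & CON_EXTENDED) ? ext_text : text;
}
```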

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/console.h |   1 +
 kernel/printk/printk.c  | 141 +---
 2 files changed, 123 insertions(+), 19 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16..04bbd09 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -115,6 +115,7 @@ static inline int con_debug_leave(void)
 #define CON_BOOT   (8)
 #define CON_ANYTIME(16) /* Safe to call when cpu is offline */
 #define CON_BRL(32) /* Used for a braille device */
+#define CON_EXTENDED   (64) /* Use the extended output format a la /dev/kmsg */
 
 struct console {
charname[16];
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0175c46..349a37b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -84,6 +84,8 @@ static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+static int nr_ext_console_drivers;
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -195,14 +197,28 @@ static int console_may_schedule;
  * need to be changed in the future, when the requirements change.
  *
  * /dev/kmsg exports the structured data in the following line format:
- *   "level,sequnum,timestamp;\n"
+ *   ",,,;\n"
  *
  * The optional key/value pairs are attached as continuation lines starting
  * with a space character and terminated by a newline. All possible
  * non-prinatable characters are escaped in the "\xff" notation.
  *
  * Users of the export format should ignore possible additional values
- * separated by ',', and find the message after the ';' character.
+ * separated by ',', and find the message after the ';' character. All
+ * optional header fields should have the form "=".
+ *
+ * For consoles with CON_EXTENDED set, a message formatted like the
+ * following may also be printed. This is a continuation fragment that is
+ * being assembled and will be re-transmitted with a normal header once
+ * assembly finishes. Fragments are sent out immediately to avoid
+ * losing them in a crash.
+ *   ",-,,-,fragid=;\n"
+ *
+ * On completion of assembly, the following is transmitted.
+ *   "fragid=;\n"
+ *
+ * Extended consoles should identify and handle duplicates by matching the
+ * fragids of the fragments and assembled messages.
  */
 
 enum log_flags {
@@ -210,6 +226,7 @@ enum log_flags {
LOG_NEWLINE = 2,/* text ended with a newline */
LOG_PREFIX  = 4,/* text started with a prefix */
LOG_CONT= 8,/* text is a fragment of a continuation line */
+   LOG_DICT_META   = 16,   /* dict contains console meta information */
 };
 
 struct printk_log {
@@ -292,6 +309,12 @@ static char *log_dict(const struct printk_log *msg)
return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
+/* if LOG_DICT_META is set, the dict buffer carries printk internal info */
+static size_t log_dict_len(const struct printk_log *msg)
+{
+   return (msg->flags & LOG_DICT_META) ? 0 : msg->dict_len;
+}
+
 /* get record by index; idx must point to valid msg */
 static struct printk_log *log_from_idx(u32 idx)
 {
@@ -516,7 +539,9 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
enum log_flags prev_flags)
 {
u64

[PATCH 16/16] netconsole: update documentation for extended netconsole

2015-04-16 Thread Tejun Heo

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 Documentation/networking/netconsole.txt | 95 -
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index a5d574a..e7a57c2 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -2,6 +2,7 @@
 started by Ingo Molnar , 2001.09.17
 2.6 port and netpoll api by Matt Mackall , Sep 9 2003
 IPv6 support by Cong Wang , Jan 1 2013
+Reliable extended console support by Tejun Heo , Apr 16 2015
 
 Please send bug reports to Matt Mackall 
 Satyam Sharma , and Cong Wang 

@@ -24,9 +25,10 @@ Sender and receiver configuration:
 It takes a string configuration parameter "netconsole" in the
 following format:
 
- netconsole=[src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]
+ netconsole=[+][src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]
 
where
++ if present, enable reliable extended console support
 src-port  source for UDP packets (defaults to 6665)
 src-ipsource IP to use (interface address)
 dev   network interface (eth0)
@@ -107,6 +109,7 @@ To remove a target:
 The interface exposes these parameters of a netconsole target to userspace:
 
enabled Is this target currently enabled?   (read-write)
+   extendedExtended mode enabled   (read-write)
dev_nameLocal network interface name(read-write)
local_port  Source UDP port to use  (read-write)
remote_port Remote agent's UDP port (read-write)
@@ -132,6 +135,96 @@ You can also update the local interface dynamically. This 
is especially
 useful if you want to use interfaces that have newly come up (and may not
 have existed when netconsole was loaded / initialized).
 
+Reliable extended console:
+==
+
+If '+' is prefixed to the configuration line or "extended" config file
+is set to 1, reliable extended console support is enabled. An example
+boot param follows.
+
+ linux netconsole=+@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
+
+Reliable extended console support consists of three parts.
+
+1. Extended format messages
+---
+
+Log messages are transmitted with an extended metadata header in the
+following format, which is the same as /dev/kmsg.
+
+ ,,,;
+
+Non-printable characters in  are escaped using "\xff"
+notation. If the message contains an optional dictionary, a verbatim
+newline is used as the delimiter. printk fragments from KERN_CONT
+messages are transmitted in the following form.
+
+ ,-,,-,fragid=;
+
+Where  uniquely identifies the message being assembled. Later
+when the assembly is complete, the completed message with sequence
+number is transmitted. The receiver is responsible for handling the
+overlaps between fragments and the final assembled message.
+
+If a message doesn't fit in 1000 bytes, the message is split into
+multiple fragments by netconsole. These fragments are transmitted with
+"ncfrag" header field added.
+
+ ncfrag=@<0-based-chunk-index>/
+
+For example,
+
+ 6,416,1758426,-,ncfrag=0@0/2;the first chunk,
+ 6,416,1758426,-,ncfrag=16@1/2;the second chunk
+
+2. Retransmission requests
+--
+
+If extended console support is enabled, netconsole listens for packets
+arriving on the source port. The receiver can send packets of the
+following form to request messages with specific sequence numbers.
+
+ nca  ...
+
+There can be any number of  as long as they fit inside
+1000 bytes. If the messages with the matching sequence numbers exist,
+netconsole will retransmit them immediately.
+
+3. ACK and emergency transmission
+-
+
+If the receiver sends back a packet with the following format,
+netconsole enables ACK support.
+
+ nca  ...
+
+If the ack sequence lags the current sequence by more than 10s,
+emergency transmission is started and all the unacked messages are
+retransmitted periodically with increasing interval. In the first
+round, each unacked message is transmitted every 100ms. In the next,
+every 200ms. The interval doubles until it reaches 1s and stays there.
+
+Each emergency transmitted message has "emg=1" header field added to
+it. For example,
+
+ 6,416,1758426,-,emg=1;netpoll: netconsole: local port 
+
+This exists to ensure that messages reach the destination even when
+things go pear-shaped. As long as netpoll and the timer are working, the
+logging target has a pretty good chance of receiving all messages.
+
+Receiver
+
+
+libncrx is a library that implements a reliable remote logger using the
+above features. The library is available under tools/lib/netconsole.
+The library is a pure state machine, with packet and timer plumbing
+left to the library user to ease integration into larger frameworks.
+See

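The retransmission request described in the documentation patch above can be parsed roughly as follows. The request's placeholders did not survive in this archive, so this sketch ASSUMES the payload is the literal "nca" followed by whitespace-separated decimal sequence numbers, capped at the 1000 bytes the documentation mentions. The ack-carrying variant is not handled, and parse_nca() is a hypothetical helper, not netconsole code. Like the chipidea fix elsewhere in this archive, the sketch copies the packet and NUL-terminates it before handing it to string functions, because UDP payloads are not terminated.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "nca <seq> <seq> ..." into @seqs.
 * Returns the number of sequence numbers parsed, or -1 if malformed.
 */
static int parse_nca(const char *pkt, size_t len, uint64_t *seqs, int max)
{
	char buf[1001];   /* requests must fit in 1000 bytes, plus NUL */
	char *p, *end;
	int n = 0;

	if (len > 1000 || len < 3 || memcmp(pkt, "nca", 3))
		return -1;
	memcpy(buf, pkt, len);
	buf[len] = '\0';  /* payload is not NUL-terminated */

	p = buf + 3;
	if (*p && !isspace((unsigned char)*p))
		return -1;    /* e.g. an ack-carrying form: not handled here */

	while (*p) {
		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		if (!isdigit((unsigned char)*p) || n == max)
			return -1;
		seqs[n++] = strtoull(p, &end, 10);
		p = end;
		if (*p && !isspace((unsigned char)*p))
			return -1;
	}
	return n;
}
```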
[PATCH 15/16] netconsole: implement netconsole receiver library

2015-04-16 Thread Tejun Heo

This patch implements libncrx.a and a simple receiver program ncrx
using the library.  The library makes use of the extended header,
retransmission and ack support to implement reliable netconsole
receiver.  The library is structured as a pure state machine leaving
all IO and timing handling to the library user and should be easy to
fit into any environment.

The reordering and retransmission mechanisms are fairly simple.  Each
receiver instance has a ring buffer where messages are slotted
according to their sequence numbers.  A hole in the sequence numbers
triggers retransmission after the sequence has progressed a certain
number of slots or a certain amount of time has passed.
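A toy version of the slot-based reordering just described, under stated assumptions. It is not libncrx's data structure, and RING_SIZE, HOLE_SLOTS and the toy_ring functions are hypothetical. Messages are slotted by seq % RING_SIZE and delivered in order from the tail. A missing seq is reported for retransmission once the head has moved more than HOLE_SLOTS past it. The time-based trigger is omitted.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE  16  /* slots in the reorder window */
#define HOLE_SLOTS 4   /* how far the head must pass a hole before asking */

struct toy_ring {
	uint64_t head_seq;          /* one past the highest seq seen */
	uint64_t tail_seq;          /* oldest undelivered seq */
	int present[RING_SIZE];     /* slot occupied? */
};

static void ring_init(struct toy_ring *r, uint64_t first_seq)
{
	memset(r, 0, sizeof(*r));
	r->head_seq = r->tail_seq = first_seq;
}

/* Slot an arriving message; anything outside the window is ignored here. */
static void ring_receive(struct toy_ring *r, uint64_t seq)
{
	if (seq < r->tail_seq || seq >= r->tail_seq + RING_SIZE)
		return;
	r->present[seq % RING_SIZE] = 1;
	if (seq >= r->head_seq)
		r->head_seq = seq + 1;
}

/* Deliver the in-order prefix; returns how many messages were released. */
static int ring_deliver(struct toy_ring *r)
{
	int n = 0;

	while (r->tail_seq < r->head_seq && r->present[r->tail_seq % RING_SIZE]) {
		r->present[r->tail_seq % RING_SIZE] = 0;
		r->tail_seq++;
		n++;
	}
	return n;
}

/* Collect holes the head has passed by more than HOLE_SLOTS. */
static int ring_holes_to_request(const struct toy_ring *r, uint64_t *out, int max)
{
	int n = 0;

	for (uint64_t s = r->tail_seq; s + HOLE_SLOTS < r->head_seq && n < max; s++)
		if (!r->present[s % RING_SIZE])
			out[n++] = s;
	return n;
}
```

If 100, 101 and 103 arrive, 100 and 101 are delivered and the hole at 102 is not yet requested. Once 104 to 108 arrive, 102 is requested, and when it shows up, 102 to 108 are delivered together.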

See tools/lib/netconsole/ncrx.h for information on usage.

Tested with simulated heavy packet loss (50%).  Messages are
transferred without loss and all the common scenarios including
reboots are handled correctly.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 tools/Makefile|  16 +-
 tools/lib/netconsole/Makefile |  36 ++
 tools/lib/netconsole/ncrx.c   | 906 ++
 tools/lib/netconsole/ncrx.h   | 204 ++
 tools/ncrx/Makefile   |  14 +
 tools/ncrx/ncrx.c | 143 +++
 6 files changed, 1318 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/netconsole/Makefile
 create mode 100644 tools/lib/netconsole/ncrx.c
 create mode 100644 tools/lib/netconsole/ncrx.h
 create mode 100644 tools/ncrx/Makefile
 create mode 100644 tools/ncrx/ncrx.c

diff --git a/tools/Makefile b/tools/Makefile
index 9a617ad..9f588f7 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -9,6 +9,7 @@ help:
@echo '  firewire   - the userspace part of nosy, an IEEE-1394 traffic 
sniffer'
@echo '  hv - tools used when in Hyper-V clients'
@echo '  lguest - a minimal 32-bit x86 hypervisor'
+   @echo '  ncrx   - simple reliable netconsole logger'
@echo '  perf   - Linux performance measurement and analysis tool'
@echo '  selftests  - various kernel selftests'
@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
@@ -50,6 +51,12 @@ liblockdep: FORCE
 libapikfs: FORCE
$(call descend,lib/api)
 
+libncrx: FORCE
+   $(call descend,lib/netconsole)
+
+ncrx: libncrx FORCE
+   $(call descend,$@)
+
 perf: libapikfs FORCE
$(call descend,$@)
 
@@ -100,6 +107,12 @@ liblockdep_clean:
 libapikfs_clean:
$(call descend,lib/api,clean)
 
+libncrx_clean:
+   $(call descend,lib/netconsole,clean)
+
+ncrx_clean: libncrx_clean
+   $(call descend,$(@:_clean=),clean)
+
 perf_clean: libapikfs_clean
$(call descend,$(@:_clean=),clean)
 
@@ -114,6 +127,7 @@ tmon_clean:
 
 clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean 
lguest_clean \
perf_clean selftests_clean turbostat_clean usb_clean 
virtio_clean \
-   vm_clean net_clean x86_energy_perf_policy_clean tmon_clean
+   vm_clean net_clean x86_energy_perf_policy_clean tmon_clean \
+   ncrx_clean
 
 .PHONY: FORCE
diff --git a/tools/lib/netconsole/Makefile b/tools/lib/netconsole/Makefile
new file mode 100644
index 000..6bc7997
--- /dev/null
+++ b/tools/lib/netconsole/Makefile
@@ -0,0 +1,36 @@
+include ../../scripts/Makefile.include
+
+CC = $(CROSS_COMPILE)gcc
+AR = $(CROSS_COMPILE)ar
+
+# guard against environment variables
+LIB_H=
+LIB_OBJS=
+
+LIB_H += ncrx.h
+LIB_OBJS += $(OUTPUT)ncrx.o
+
+LIBFILE = libncrx.a
+
+CFLAGS = -g -Wall -O2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -fPIC
+ALL_CFLAGS = $(CFLAGS) $(BASIC_CFLAGS)
+ALL_LDFLAGS = $(LDFLAGS)
+
+RM = rm -f
+
+$(LIBFILE): $(LIB_OBJS)
+   $(QUIET_AR)$(RM) $@ && $(AR) rcs $(OUTPUT)$@ $(LIB_OBJS)
+
+$(LIB_OBJS): $(LIB_H)
+
+$(OUTPUT)%.o: %.c
+   $(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+$(OUTPUT)%.s: %.c
+   $(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $<
+$(OUTPUT)%.o: %.S
+   $(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+
+clean:
+   $(call QUIET_CLEAN, libncrx) $(RM) $(LIB_OBJS) $(LIBFILE)
+
+.PHONY: clean
diff --git a/tools/lib/netconsole/ncrx.c b/tools/lib/netconsole/ncrx.c
new file mode 100644
index 000..8a87e6b
--- /dev/null
+++ b/tools/lib/netconsole/ncrx.c
@@ -0,0 +1,906 @@
+/*
+ * ncrx - extended netconsole receiver library
+ *
+ * Copyright (C) 2015  Facebook, Inc
+ * Copyright (C) 2015  Tejun Heo 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ncrx.h"
+
+/* max payload len for responses, this is what netconsole uses on tx side */
+#define NCRX_RESP_MAX  1000
+/* oos history is tracked with a uint32_t */
+#define NCRX_OOS_MAX   32
+
+struct ncrx_msg_list {
+   struct ncrx_list    head;
+   int nr; /* number of msgs on the list */
+};
+
+struct ncrx_slot {
+   struct ncrx_msg *msg;
+   uint64_t            timestamp;

[PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths

2015-04-16 Thread Tejun Heo

netconsole management paths are scattered around in different places.
This patch reorganizes them so that

* All enable logic is in netconsole_enable() and disable in
  netconsole_disable().  Both should be called with netconsole_mutex
  held and netconsole_disable() may be invoked without intervening
  enable.

* alloc_param_target() now also handles linking the new target to
  target_list.  It's renamed to create_param_target() and now returns
  errno.

* store_enabled() now uses netconsole_enable/disable() instead of
  open-coding the logic.  This also fixes the missing synchronization
  against write path when manipulating ->enabled flag.

* drop_netconsole_target() and netconsole_deferred_disable_work_fn()
  updated to use netconsole_disable().

* init/cleanup_netconsole()'s destruction paths are consolidated into
  netconsole_destroy_all() which uses netconsole_disable().
  free_param_target() is no longer used and removed.

While the conversions aren't strictly one-to-one, this patch
shouldn't cause any visible behavior differences and makes extending
the enable/disable logic a lot easier.
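The enable/disable contract described above (both run with the management mutex held, the flag is flipped under the writer-side lock, and disable is a harmless no-op on a target that isn't enabled) can be modeled in userspace. This is a hypothetical sketch, not kernel code: `mgmt_mutex` stands in for netconsole_mutex, `write_lock` for console_lock, and the two counters for netpoll_setup()/netpoll_cleanup().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model:
 *   mgmt_mutex ~ netconsole_mutex (serializes enable/disable)
 *   write_lock ~ console_lock (what the message write path takes)
 */
static pthread_mutex_t mgmt_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

struct target {
	bool enabled;
	int setups;	/* stand-in for netpoll_setup() calls */
	int cleanups;	/* stand-in for netpoll_cleanup() calls */
};

/* Caller holds mgmt_mutex.  Returns 0 on success, -1 if already enabled. */
static int target_enable(struct target *t)
{
	if (t->enabled)
		return -1;	/* the patch WARN_ON_ONCE()s here instead */

	t->setups++;		/* set up the transport first... */

	pthread_mutex_lock(&write_lock);
	t->enabled = true;	/* ...then publish it to the write path */
	pthread_mutex_unlock(&write_lock);
	return 0;
}

/* Caller holds mgmt_mutex.  A no-op if the target is not enabled. */
static void target_disable(struct target *t)
{
	if (!t->enabled)
		return;

	/* Hide the target from the write path before tearing it down. */
	pthread_mutex_lock(&write_lock);
	t->enabled = false;
	pthread_mutex_unlock(&write_lock);

	t->cleanups++;
}
```

Calling target_disable() twice, or before any enable, leaves the cleanup count untouched, which is the property that lets every teardown path share one function.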

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 147 +--
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f0ac9f6..d72d902 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,8 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static struct console netconsole;
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -192,8 +194,43 @@ static struct netconsole_target *alloc_netconsole_target(void)
return nt;
 }
 
+static int netconsole_enable(struct netconsole_target *nt)
+{
+   int err;
+
+   lockdep_assert_held(&netconsole_mutex);
+   WARN_ON_ONCE(nt->enabled);
+
+   err = netpoll_setup(&nt->np);
+   if (err)
+   return err;
+
+   console_lock();
+   nt->enabled = true;
+   console_unlock();
+   return 0;
+}
+
+static void netconsole_disable(struct netconsole_target *nt)
+{
+   lockdep_assert_held(&netconsole_mutex);
+
+   /*
+* We need to disable the netconsole before cleaning it up
+* otherwise we might end up in write_msg() with !nt->np.dev &&
+* nt->enabled.
+*/
+   if (nt->enabled) {
+   console_lock();
+   nt->enabled = false;
+   console_unlock();
+
+   netpoll_cleanup(&nt->np);
+   }
+}
+
 /* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+static int create_param_target(char *target_config)
 {
int err = -ENOMEM;
struct netconsole_target *nt;
@@ -209,24 +246,26 @@ static struct netconsole_target *alloc_param_target(char *target_config)
if (err)
goto fail;
 
-   err = netpoll_setup(&nt->np);
+   console_lock();
+   list_add(&nt->list, &target_list);
+   console_unlock();
+
+   err = netconsole_enable(nt);
if (err)
-   goto fail;
+   goto fail_del;
 
-   nt->enabled = true;
+   /* Dump existing printks when we register */
+   netconsole.flags |= CON_PRINTBUFFER;
 
-   return nt;
+   return 0;
 
+fail_del:
+   console_lock();
+   list_del(&nt->list);
+   console_unlock();
 fail:
kfree(nt);
-   return ERR_PTR(err);
-}
-
-/* Cleanup netpoll for given target (from boot/module param) and free it */
-static void free_param_target(struct netconsole_target *nt)
-{
-   netpoll_cleanup(&nt->np);
-   kfree(nt);
+   return err;
 }
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -350,24 +389,15 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 */
netpoll_print_options(&nt->np);
 
-   err = netpoll_setup(&nt->np);
+   err = netconsole_enable(nt);
if (err)
return err;
 
pr_info("netconsole: network logging started\n");
} else {/* false */
-   /* We need to disable the netconsole before cleaning it up
-* otherwise we might end up in write_msg() with
-* nt->np.dev == NULL and nt->enabled == true
-*/
-   console_lock();
-   nt->enabled = false;
-   console_unlock();
-   netpoll_cleanup(&nt->np);
+   netconsole_disable(nt);
}
 
-   nt->enabled = enabled;
-
return strnlen(buf, count);
 }
 
@@ -633,13 +663,7 @@ static void drop_netconsole_target(struct config_group *group,
list_del(>list);
console_unlock();
 
-   /*
-* The target may have never been

[PATCH 12/16] netconsole: implement extended console support

2015-04-16 Thread Tejun Heo

netconsole transmits raw console messages using one or multiple UDP
packets and there's no way to find out whether the packets are lost in
transit or received out of order.  Depending on the setup, this can
make the logging significantly unreliable and untrustworthy.

With the new extended console support, printk now can be told to
expose log metadata including the message sequence number to console
drivers which can be used by log receivers to determine whether and
which messages are missing and reorder messages received out of order.

This patch implements extended console support for netconsole which
can be enabled by either prepending "+" to a netconsole boot param
entry or echoing 1 to "extended" file in configfs.  When enabled,
netconsole transmits extended log messages with headers identical to
/dev/kmsg output.

netconsole may have to split a single message into multiple fragments.
In this case, if the extended mode is enabled, an optional header of
the form "ncfrag=OFF@IDX/NR" is added to each fragment where OFF is
the byte offset of the message body, IDX is the 0-based fragment index
and NR is the number of total fragments for this message.

To avoid unnecessarily forcing printk to format extended messages,
extended netconsole is registered with printk iff it's actually used.
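The fragmentation rule can be illustrated with a userspace sketch, assuming the record is "HEADER;BODY" as in /dev/kmsg output. `ext_msg_fragment()` is a hypothetical helper that builds one fragment per call, rather than sending them in a loop the way send_ext_msg_udp() does in the diff, and the worst-case allowance for the inserted field (`",ncfrag=0000@00/00"`) is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build fragment @idx of extended record @msg ("HEADER;BODY") such that
 * no fragment exceeds @max_chunk bytes.  Returns the fragment length,
 * 0 once @idx is past the last fragment, or -1 if it can't be split.
 */
static int ext_msg_fragment(const char *msg, int msg_len, int max_chunk,
			    int idx, char *out, int out_size)
{
	/* assumed worst case: 4-digit offset, 2-digit index and count */
	const int max_extra = (int)sizeof(",ncfrag=0000@00/00");
	const char *semi;
	int header_len, body_len, chunk_len, nr, off, this_chunk, hdr;

	if (msg_len <= max_chunk) {	/* fits: pass through, no ncfrag */
		if (idx > 0)
			return 0;
		if (msg_len > out_size)
			return -1;
		memcpy(out, msg, msg_len);
		return msg_len;
	}

	semi = memchr(msg, ';', msg_len);
	if (!semi)
		return -1;
	header_len = (int)(semi - msg);
	body_len = msg_len - header_len - 1;
	chunk_len = max_chunk - header_len - max_extra;
	if (chunk_len <= 0)
		return -1;

	nr = (body_len + chunk_len - 1) / chunk_len;	/* round up */
	if (idx >= nr)
		return 0;

	off = idx * chunk_len;		/* byte offset into the body */
	this_chunk = body_len - off < chunk_len ? body_len - off : chunk_len;

	/* header, then ",ncfrag=OFF@IDX/NR", then ';' and the body slice */
	hdr = snprintf(out, (size_t)out_size, "%.*s,ncfrag=%d@%d/%d;",
		       header_len, msg, off, idx, nr);
	if (hdr < 0 || hdr + this_chunk > max_chunk ||
	    hdr + this_chunk > out_size)
		return -1;
	memcpy(out + hdr, semi + 1 + off, this_chunk);
	return hdr + this_chunk;
}
```

With this bound, a record that doesn't fit always yields at least two fragments, so every fragment of a split record carries the ncfrag field, and a receiver can reassemble the body by placing each slice at OFF once all NR pieces have arrived.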

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 159 ++-
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d72d902..626d9f0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,10 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static bool netconsole_ext_used_during_init;
+static bool netconsole_ext_registered;
+static struct console netconsole_ext;
+
 static struct console netconsole;
 
 /**
@@ -112,6 +116,7 @@ struct netconsole_target {
 #endif
bool            enabled;
bool            disable_scheduled;
+   bool            extended;
struct netpoll  np;
 };
 
@@ -194,6 +199,81 @@ static struct netconsole_target *alloc_netconsole_target(void)
return nt;
 }
 
+/**
+ * send_ext_msg_udp - send extended log message to target
+ * @nt: target to send message to
+ * @msg: extended log message to send
+ * @msg_len: length of message
+ *
+ * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
+ * and transmitted in multiple chunks with ncfrag header field added to
+ * enable correct reassembly.
+ */
+static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
+int msg_len)
+{
+   static char buf[MAX_PRINT_CHUNK];
+   const int max_extra_len = sizeof(",ncfrag=@00/00");
+   const char *header, *body;
+   int header_len = msg_len, body_len = 0;
+   int chunk_len, nr_chunks, i;
+
+   if (!nt->enabled || !netif_running(nt->np.dev))
+   return;
+
+   if (msg_len <= MAX_PRINT_CHUNK) {
+   netpoll_send_udp(&nt->np, msg, msg_len);
+   return;
+   }
+
+   /* need to insert extra header fields, detect header and body */
+   header = msg;
+   body = memchr(msg, ';', msg_len);
+   if (body) {
+   header_len = body - header;
+   body_len = msg_len - header_len - 1;
+   body++;
+   }
+
+   chunk_len = MAX_PRINT_CHUNK - header_len - max_extra_len;
+   if (WARN_ON_ONCE(chunk_len <= 0))
+   return;
+
+   /*
+* Transfer possibly multiple chunks with extra header fields.
+*
+* If @msg needs to be split to fit MAX_PRINT_CHUNK, add
+* "ncfrag=@<0-based-chunk-index>/" to
+* enable proper reassembly on receiver side.
+*/
+   memcpy(buf, header, header_len);
+   nr_chunks = DIV_ROUND_UP(body_len, chunk_len);
+
+   for (i = 0; i < nr_chunks; i++) {
+   int this_header = header_len;
+   int this_chunk;
+
+   if (nr_chunks > 1)
+   this_header += scnprintf(buf + this_header,
+sizeof(buf) - this_header,
+",ncfrag=%d@%d/%d",
+i * chunk_len, i, nr_chunks);
+   if (this_header < sizeof(buf))
+   buf[this_header++] = ';';
+
+   if (WARN_ON_ONCE(this_header + chunk_len > MAX_PRINT_CHUNK))
+   return;
+
+   this_chunk = min(body_len, chunk_len);
+   memcpy(buf + this_header, body, this_chunk);
+
+   netpoll_send_udp(&nt->np, buf, this_header + this_chunk);
+
+   body += this_chunk;
+   body_len -= this_chunk;
+   }
+}
+
 static int netconsole_enable(struct

[PATCH 13/16] netconsole: implement retransmission support for extended consoles

2015-04-16 Thread Tejun Heo

With extended netconsole, the logger can reliably determine which
messages are missing.  This patch implements receiver for extended
netconsole which accepts retransmission request packets coming into
the netconsole source port and sends back the requested messages.

The receiving socket is automatically created when an extended console
is enabled.  A poll_table with a custom callback is queued on the
socket.  When a packet is received on the socket, a work item is
scheduled which reads and parses the payload and sends back the
requested messages.  A retransmission request packet looks like the following.

 nca  ...

The receiver doesn't interfere with the normal transmission path.
Even if something goes wrong with the network stack or receiver
itself, the normal transmission path should keep working.

This can be used to implement a reliable netconsole logger.
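The request fields following "nca" did not survive in this copy of the message. Purely for illustration, the sketch below assumes every field after the "nca" tag is a whitespace-separated decimal sequence number; `parse_nca()` and `RX_MAX` are hypothetical names, not the patch's parser. The 1000-byte cap matches the response size ncrx.c says netconsole uses on the tx side.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RX_MAX 1000	/* payload cap, same as netconsole's MAX_PRINT_CHUNK */

/*
 * Parse a request of the ASSUMED form "nca N1 N2 ...", where each field
 * after the tag is a decimal sequence number.  Stores at most @max numbers
 * in @seqs and returns how many were stored, or -1 on a malformed payload.
 */
static int parse_nca(const char *payload, size_t len, uint64_t *seqs, int max)
{
	char buf[RX_MAX + 1];
	char *pos, *end;
	int n = 0;

	if (len > RX_MAX)
		return -1;
	memcpy(buf, payload, len);
	buf[len] = '\0';	/* UDP payloads are not NUL-terminated */

	if (strncmp(buf, "nca", 3) != 0 ||
	    (buf[3] != '\0' && !isspace((unsigned char)buf[3])))
		return -1;

	pos = buf + 3;
	while (n < max) {
		unsigned long long val;

		while (isspace((unsigned char)*pos))
			pos++;
		if (*pos == '\0')
			break;
		if (!isdigit((unsigned char)*pos))
			return -1;	/* rejects signs and garbage */

		errno = 0;
		val = strtoull(pos, &end, 10);
		if (errno == ERANGE ||
		    (*end != '\0' && !isspace((unsigned char)*end)))
			return -1;
		seqs[n++] = val;
		pos = end;
	}
	return n;
}
```

Copying into a local NUL-terminated buffer before parsing mirrors the rx path's `MAX_PRINT_CHUNK + 1` allocation: the received datagram is not a C string.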

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 238 +++
 1 file changed, 238 insertions(+)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 626d9f0..b2763e0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -45,7 +45,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -89,6 +92,14 @@ static struct console netconsole_ext;
 
 static struct console netconsole;
 
+union sockaddr_in46 {
+   struct sockaddr addr;
+   struct sockaddr_in6 in6;
+   struct sockaddr_in  in4;
+};
+
+static char *netconsole_ext_tx_buf;    /* CONSOLE_EXT_LOG_MAX */
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -118,6 +129,14 @@ struct netconsole_target {
bool            disable_scheduled;
bool            extended;
struct netpoll  np;
+
+   /* response packet handling for extended netconsoles */
+   union sockaddr_in46 raddr;  /* logging target address */
+   struct file *rx_file;   /* sock to receive responses */
+   poll_table  rx_ptable;  /* for polling the socket */
+   wait_queue_t        rx_wait;    /* ditto */
+   wait_queue_head_t   *rx_waitq;  /* ditto */
+   struct work_struct  rx_work;    /* receive & process packets */
 };
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -274,6 +293,219 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
}
 }
 
+static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
+{
+   if (a->addr.sa_family != b->addr.sa_family)
+   return false;
+   if (a->in4.sin_port != b->in4.sin_port)
+   return false;
+   if (a->addr.sa_family == AF_INET &&
+   memcmp(&a->in4.sin_addr, &b->in4.sin_addr, sizeof(a->in4.sin_addr)))
+   return false;
+   if (a->addr.sa_family == AF_INET6 &&
+   memcmp(&a->in6.sin6_addr, &b->in6.sin6_addr, sizeof(a->in6.sin6_addr)))
+   return false;
+   return true;
+}
+
+/**
+ * netconsole_rx_work_fn - work function to handle netconsole ack packets
+ * @work: netconsole_target->rx_work
+ *
+ * This function is scheduled when packets are received on the source port
+ * of extended netconsole target and responds to ack messages from the
+ * remote logger.  An ack message has the following format.
+ *
+ * nca   ...
+ *
+ * There can be any number of missing-seq's as long as the whole payload
+ * fits inside MAX_PRINT_CHUNK.  This function re-transmits each message
+ * matching the missing-seq's if available.  This can be used to implement
+ * reliable logging from the receiver side.
+ *
+ * The rx path doesn't interfere with the normal tx path except for when it
+ * actually sends out the messages, and that path doesn't depend on
+ * anything more working compared to the normal tx path.  As such, this
+ * shouldn't make netconsole any less robust when things start going south.
+ */
+static void netconsole_rx_work_fn(struct work_struct *work)
+{
+   struct netconsole_target *nt =
+   container_of(work, struct netconsole_target, rx_work);
+   union sockaddr_in46 raddr;
+   struct msghdr msgh = { .msg_name = &raddr, };
+   struct kvec iov;
+   char *tx_buf = netconsole_ext_tx_buf;
+   char *rx_buf, *pos, *tok;
+   u64 seq;
+   int len;
+
+   rx_buf = kmalloc(MAX_PRINT_CHUNK + 1, GFP_KERNEL);
+   if (!rx_buf) {
+   if (printk_ratelimit())
+   pr_warning("failed to allocate RX buffer\n");
+   return;
+
+   iov.iov_base = rx_buf;
+   iov.iov_len = MAX_PRINT_CHUNK;
+repeat:
+   msgh.msg_namelen = sizeof(raddr);
+   len = kernel_recvmsg(sock_from_file(nt->rx_file, ), , , 1,
+MAX_PRINT_CHUNK, MSG_DONTWAIT);
+   if (len < 0) {
+   if (len != -EAGAIN &&

[PATCH 06/16] netconsole: make netconsole_target->enabled a bool

2015-04-16 Thread Tejun Heo

We'll add more bools to netconsole_target.  Let's convert ->enabled
to bool first so that things stay consistent.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 15731d1..09d4e12 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -104,7 +104,7 @@ struct netconsole_target {
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
struct config_item  item;
 #endif
-   int enabled;
+   bool            enabled;
struct mutex    mutex;
struct netpoll  np;
 };
@@ -197,7 +197,7 @@ static struct netconsole_target *alloc_param_target(char *target_config)
if (err)
goto fail;
 
-   nt->enabled = 1;
+   nt->enabled = true;
 
return nt;
 
@@ -322,13 +322,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
return err;
if (enabled < 0 || enabled > 1)
return -EINVAL;
-   if (enabled == nt->enabled) {
+   if ((bool)enabled == nt->enabled) {
pr_info("network logging has already %s\n",
nt->enabled ? "started" : "stopped");
return -EINVAL;
}
 
-   if (enabled) {  /* 1 */
+   if (enabled) {  /* true */
/*
 * Skip netpoll_parse_options() -- all the attributes are
 * already configured via configfs. Just print them out.
@@ -340,13 +340,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
return err;
 
pr_info("netconsole: network logging started\n");
-   } else {/* 0 */
+   } else {/* false */
/* We need to disable the netconsole before cleaning it up
 * otherwise we might end up in write_msg() with
-* nt->np.dev == NULL and nt->enabled == 1
+* nt->np.dev == NULL and nt->enabled == true
 */
spin_lock_irqsave(&target_list_lock, flags);
-   nt->enabled = 0;
+   nt->enabled = false;
spin_unlock_irqrestore(&target_list_lock, flags);
netpoll_cleanup(&nt->np);
}
@@ -594,7 +594,7 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 
/*
 * Allocate and initialize with defaults.
-* Target is disabled at creation (enabled == 0).
+* Target is disabled at creation (!enabled).
 */
nt = kzalloc(sizeof(*nt), GFP_KERNEL);
if (!nt)
@@ -695,7 +695,7 @@ restart:
spin_lock_irqsave(&target_list_lock, flags);
dev_put(nt->np.dev);
nt->np.dev = NULL;
-   nt->enabled = 0;
+   nt->enabled = false;
stopped = true;
netconsole_target_put(nt);
goto restart;
-- 
2.1.0

[PATCH 10/16] netconsole: introduce netconsole_mutex

2015-04-16 Thread Tejun Heo

console_lock protects the target_list itself and setting and clearing
of its ->enabled flag; however, nothing protects the overall
enable/disable operations which we'll need to make netconsole
management more dynamic for the scheduled reliable transmission
support.  Also, an earlier patch introduced a small race window where
dynamic console disable may compete against asynchronous disable
kicked off from netdevice_notifier.

This patch adds netconsole_mutex which protects all target
create/destroy and enable/disable operations.  It also replaces
netconsole_target->mutex used by dynamic consoles.  The above
mentioned race is removed by this change.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 57c02ab..f0ac9f6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,10 +75,14 @@ __setup("netconsole=", option_setup);
 
 /*
  * Linked list of all configured targets.  The list and each target's
- * enable/disable state are protected by console_lock.
+ * enable/disable state are protected by both netconsole_mutex and
+ * console_lock.
  */
 static LIST_HEAD(target_list);
 
+/* protects target creation/destruction and enable/disable */
+static DEFINE_MUTEX(netconsole_mutex);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -106,7 +110,6 @@ struct netconsole_target {
 #endif
bool            enabled;
bool            disable_scheduled;
-   struct mutex    mutex;
struct netpoll  np;
 };
 
@@ -184,7 +187,6 @@ static struct netconsole_target *alloc_netconsole_target(void)
strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
nt->np.local_port = 6665;
nt->np.remote_port = ;
-   mutex_init(&nt->mutex);
eth_broadcast_addr(nt->np.remote_mac);
 
return nt;
@@ -196,6 +198,8 @@ static struct netconsole_target *alloc_param_target(char *target_config)
int err = -ENOMEM;
struct netconsole_target *nt;
 
+   lockdep_assert_held(&netconsole_mutex);
+
nt = alloc_netconsole_target();
if (!nt)
goto fail;
@@ -573,10 +577,10 @@ static ssize_t netconsole_target_attr_store(struct config_item *item,
struct netconsole_target_attr *na =
container_of(attr, struct netconsole_target_attr, attr);
 
-   mutex_lock(&nt->mutex);
+   mutex_lock(&netconsole_mutex);
if (na->store)
ret = na->store(nt, buf, count);
-   mutex_unlock(&nt->mutex);
+   mutex_unlock(&netconsole_mutex);
 
return ret;
 }
@@ -610,9 +614,11 @@ static struct config_item *make_netconsole_target(struct config_group *group,
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
/* Adding, but it is disabled */
+   mutex_lock(&netconsole_mutex);
console_lock();
list_add(&nt->list, &target_list);
console_unlock();
+   mutex_unlock(&netconsole_mutex);
 
return &nt->item;
 }
@@ -622,6 +628,7 @@ static void drop_netconsole_target(struct config_group *group,
 {
struct netconsole_target *nt = to_target(item);
 
+   mutex_lock(&netconsole_mutex);
console_lock();
list_del(&nt->list);
console_unlock();
@@ -634,6 +641,7 @@ static void drop_netconsole_target(struct config_group *group,
netpoll_cleanup(&nt->np);
 
config_item_put(&nt->item);
+   mutex_unlock(&netconsole_mutex);
 }
 
 static struct configfs_group_operations netconsole_subsys_group_ops = {
@@ -662,6 +670,7 @@ static void netconsole_deferred_disable_work_fn(struct work_struct *work)
 {
struct netconsole_target *nt, *to_disable;
 
+   mutex_lock(&netconsole_mutex);
 repeat:
to_disable = NULL;
console_lock();
@@ -685,6 +694,8 @@ repeat:
netconsole_target_put(to_disable);
goto repeat;
}
+
+   mutex_unlock(&netconsole_mutex);
 }
 
 static DECLARE_WORK(netconsole_deferred_disable_work,
@@ -791,6 +802,8 @@ static int __init init_netconsole(void)
char *target_config;
char *input = config;
 
+   mutex_lock(&netconsole_mutex);
+
if (strnlen(input, MAX_PARAM_LENGTH)) {
while ((target_config = strsep(&input, ";"))) {
nt = alloc_param_target(target_config);
@@ -818,24 +831,21 @@ static int __init init_netconsole(void)
register_console(&netconsole);
pr_info("network logging started\n");
 
+   mutex_unlock(&netconsole_mutex);
return err;
 
 undonotifier:
unregister_netdevice_notifier(&netconsole_netdev_notifier);
-   cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
pr_err("cleaning up\n");
 
-   /*
-* Remove all targets and destroy them (only targets created
-* from the boot/module option exist here). Skipping the console
-* lock is safe here.
-*/
+

[PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier

2015-04-16 Thread Tejun Heo

The netdevice_notifier callback, netconsole_netdev_event(), needs to
perform netpoll_cleanup() for the affected targets; however, the
notifier is called with rtnl_lock held which the netpoll_cleanup()
path also grabs.  To avoid deadlock, the path uses __netpoll_cleanup()
instead, making the code path different from the others.

The planned reliable netconsole support will add more logic to the
cleanup path, making the slightly different paths painful.  Let's punt
netconsole_target disabling to workqueue so that it can later share
the same cleanup path.  This would also allow ditching
target_list_lock and depending on console_lock for synchronization.

Note that this introduces a narrow race window where the asynchronous
disabling may race against disabling from configfs ending up executing
netpoll_cleanup() more than once on the same instance.  The follow up
patches will fix it by introducing a mutex to protect overall enable /
disable operations; unfortunately, the locking update couldn't be
ordered before this change due to the locking order with rtnl_lock.
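The deferral scheme can be reduced to a userspace model: the notifier side only marks affected targets (it must not perform the cleanup itself while rtnl_lock is held), and the worker repeatedly scans under the list lock, claims one marked target, drops the lock, and performs the sleeping cleanup. `list_lock`, `netdev_event()` and `deferred_disable_work()` are hypothetical names; the counter stands in for netpoll_cleanup().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct target {
	int dev;		/* device this target sends through */
	bool enabled;
	bool disable_scheduled;
	int cleanups;		/* stand-in for netpoll_cleanup() calls */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Notifier side: only marks enabled targets on @dev.  Returns true if
 * the caller should schedule the deferred work.
 */
static bool netdev_event(struct target *ts, size_t n, int dev)
{
	bool schedule = false;

	pthread_mutex_lock(&list_lock);
	for (size_t i = 0; i < n; i++) {
		if (ts[i].dev == dev && ts[i].enabled &&
		    !ts[i].disable_scheduled) {
			ts[i].disable_scheduled = true;
			schedule = true;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return schedule;
}

/* Worker side: claim one marked target at a time, clean up unlocked. */
static void deferred_disable_work(struct target *ts, size_t n)
{
	for (;;) {
		struct target *to_disable = NULL;

		pthread_mutex_lock(&list_lock);
		for (size_t i = 0; i < n; i++) {
			if (!ts[i].disable_scheduled)
				continue;
			ts[i].disable_scheduled = false;
			if (!ts[i].enabled)
				continue;	/* already disabled elsewhere */
			ts[i].enabled = false;
			to_disable = &ts[i];
			break;
		}
		pthread_mutex_unlock(&list_lock);

		if (!to_disable)
			break;
		to_disable->cleanups++;	/* sleeping cleanup, lock dropped */
	}
}
```

The kernel version also pins the target with netconsole_target_get()/put() across the unlocked cleanup; the model leaves reference counting out.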

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 58 ++--
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 17692b8..d355776 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -105,6 +105,7 @@ struct netconsole_target {
struct config_item  item;
 #endif
bool            enabled;
+   bool            disable_scheduled;
struct mutex    mutex;
struct netpoll  np;
 };
@@ -660,6 +661,39 @@ static struct configfs_subsystem netconsole_subsys = {
 
 #endif /* CONFIG_NETCONSOLE_DYNAMIC */
 
+static void netconsole_deferred_disable_work_fn(struct work_struct *work)
+{
+   struct netconsole_target *nt, *to_disable;
+   unsigned long flags;
+
+repeat:
+   to_disable = NULL;
+   spin_lock_irqsave(&target_list_lock, flags);
+   list_for_each_entry(nt, &target_list, list) {
+   if (!nt->disable_scheduled)
+   continue;
+   nt->disable_scheduled = false;
+
+   if (!nt->enabled)
+   continue;
+
+   netconsole_target_get(nt);
+   nt->enabled = false;
+   to_disable = nt;
+   break;
+   }
+   spin_unlock_irqrestore(&target_list_lock, flags);
+
+   if (to_disable) {
+   netpoll_cleanup(&to_disable->np);
+   netconsole_target_put(to_disable);
+   goto repeat;
+   }
+}
+
+static DECLARE_WORK(netconsole_deferred_disable_work,
+   netconsole_deferred_disable_work_fn);
+
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event, void *ptr)
@@ -674,9 +708,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
goto done;
 
spin_lock_irqsave(&target_list_lock, flags);
-restart:
list_for_each_entry(nt, &target_list, list) {
-   netconsole_target_get(nt);
if (nt->np.dev == dev) {
switch (event) {
case NETDEV_CHANGENAME:
@@ -685,23 +717,14 @@ restart:
case NETDEV_RELEASE:
case NETDEV_JOIN:
case NETDEV_UNREGISTER:
-   /* rtnl_lock already held
-* we might sleep in __netpoll_cleanup()
-*/
-   spin_unlock_irqrestore(&target_list_lock, flags);
-
-   __netpoll_cleanup(&nt->np);
-
-   spin_lock_irqsave(&target_list_lock, flags);
-   dev_put(nt->np.dev);
-   nt->np.dev = NULL;
-   nt->enabled = false;
+   /* rtnl_lock already held, punt to workqueue */
+   if (nt->enabled && !nt->disable_scheduled) {
+   nt->disable_scheduled = true;
+   schedule_work(&netconsole_deferred_disable_work);
+   }
stopped = true;
-   netconsole_target_put(nt);
-   goto restart;
}
}
-   netconsole_target_put(nt);
}
spin_unlock_irqrestore(&target_list_lock, flags);
if (stopped) {
@@ -810,7 +833,7 @@ static int __init init_netconsole(void)
 
 undonotifier:
unregister_netdevice_notifier(&netconsole_netdev_notifier);
-
+   cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
pr_err("cleaning up\n");
 
@@ -834,6 +857,7 @@ static void __exit cleanup_netconsole(void)

[PATCH 07/16] netconsole: factor out alloc_netconsole_target()

2015-04-16 Thread Tejun Heo

alloc_param_target() and make_netconsole_target() were duplicating the
same allocation and init logic.  Factor it out to
alloc_netconsole_target().

This is pure reorganization.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 09d4e12..17692b8 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -167,19 +167,17 @@ static void netconsole_target_put(struct netconsole_target *nt)
 
 #endif /* CONFIG_NETCONSOLE_DYNAMIC */
 
-/* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+/*
+ * Allocate and initialize with defaults.  Note that these targets get
+ * their config_item fields zeroed-out.
+ */
+static struct netconsole_target *alloc_netconsole_target(void)
 {
-   int err = -ENOMEM;
struct netconsole_target *nt;
 
-   /*
-* Allocate and initialize with defaults.
-* Note that these targets get their config_item fields zeroed-out.
-*/
nt = kzalloc(sizeof(*nt), GFP_KERNEL);
if (!nt)
-   goto fail;
+   return NULL;
 
nt->np.name = "netconsole";
strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
@@ -188,6 +186,19 @@ static struct netconsole_target *alloc_param_target(char *target_config)
mutex_init(&nt->mutex);
eth_broadcast_addr(nt->np.remote_mac);
 
+   return nt;
+}
+
+/* Allocate new target (from boot/module param) and setup netpoll for it */
+static struct netconsole_target *alloc_param_target(char *target_config)
+{
+   int err = -ENOMEM;
+   struct netconsole_target *nt;
+
+   nt = alloc_netconsole_target();
+   if (!nt)
+   goto fail;
+
/* Parse parameters and setup netpoll */
err = netpoll_parse_options(&nt->np, target_config);
if (err)
@@ -592,21 +603,10 @@ static struct config_item *make_netconsole_target(struct config_group *group,
unsigned long flags;
struct netconsole_target *nt;
 
-   /*
-* Allocate and initialize with defaults.
-* Target is disabled at creation (!enabled).
-*/
-   nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+   nt = alloc_netconsole_target();
if (!nt)
return ERR_PTR(-ENOMEM);
 
-   nt->np.name = "netconsole";
-   strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
-   nt->np.local_port = 6665;
-   nt->np.remote_port = ;
-   mutex_init(&nt->mutex);
-   eth_broadcast_addr(nt->np.remote_mac);
-
/* Initialize the config_item member */
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
-- 
2.1.0

[PATCH 09/16] netconsole: replace target_list_lock with console_lock

2015-04-16 Thread Tejun Heo

netconsole has been using a spinlock - target_list_lock - to protect
the list of configured netconsole targets and their enable/disable
states.  With the disabling from netdevice_notifier moved off to a
workqueue by the previous patch and thus outside of rtnl_lock,
target_list_lock can be replaced with console_lock, which allows us to
avoid grabbing an extra lock in the log write path and can simplify
locking when involving other subsystems as console_lock is only
trylocked from printk path.

This patch replaces target_list_lock with console_lock.  The
conversion is one-to-one except for write_msg().  The function is
called with console_lock() already held so no further locking is
necessary; however, as netpoll_send_udp() expects irq to be disabled,
explicit irq save/restore pair is added around it.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 50 ++--
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d355776..57c02ab 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -73,12 +73,12 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
-/* Linked list of all configured targets */
+/*
+ * Linked list of all configured targets.  The list and each target's
+ * enable/disable state are protected by console_lock.
+ */
 static LIST_HEAD(target_list);
 
-/* This needs to be a spinlock because write_msg() cannot sleep */
-static DEFINE_SPINLOCK(target_list_lock);
-
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -325,7 +325,6 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 const char *buf,
 size_t count)
 {
-   unsigned long flags;
int enabled;
int err;
 
@@ -357,9 +356,9 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 * otherwise we might end up in write_msg() with
 * nt->np.dev == NULL and nt->enabled == true
 */
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
nt->enabled = false;
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
netpoll_cleanup(&nt->np);
}
 
@@ -601,7 +600,6 @@ static struct config_item_type netconsole_target_type = {
 static struct config_item *make_netconsole_target(struct config_group *group,
  const char *name)
 {
-   unsigned long flags;
struct netconsole_target *nt;
 
nt = alloc_netconsole_target();
@@ -612,9 +610,9 @@ static struct config_item *make_netconsole_target(struct config_group *group,
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
/* Adding, but it is disabled */
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_add(&nt->list, &target_list);
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
return &nt->item;
 }
@@ -622,12 +620,11 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 static void drop_netconsole_target(struct config_group *group,
   struct config_item *item)
 {
-   unsigned long flags;
struct netconsole_target *nt = to_target(item);
 
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_del(&nt->list);
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
/*
 * The target may have never been enabled, or was manually disabled
@@ -664,11 +661,10 @@ static struct configfs_subsystem netconsole_subsys = {
 static void netconsole_deferred_disable_work_fn(struct work_struct *work)
 {
struct netconsole_target *nt, *to_disable;
-   unsigned long flags;
 
 repeat:
to_disable = NULL;
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_for_each_entry(nt, &target_list, list) {
if (!nt->disable_scheduled)
continue;
@@ -682,7 +678,7 @@ repeat:
to_disable = nt;
break;
}
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
if (to_disable) {
netpoll_cleanup(&to_disable->np);
@@ -698,7 +694,6 @@ static DECLARE_WORK(netconsole_deferred_disable_work,
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event, void *ptr)
 {
-   unsigned long flags;
struct netconsole_target *nt;
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
bool stopped = false;
@@ -707,7 +702,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
  event ==

[PATCH 02/16] printk: factor out message formatting from devkmsg_read()

2015-04-16 Thread Tejun Heo

The extended message formatting used for /dev/kmsg will be used to
implement extended consoles.  Factor out msg_print_ext_header() and
msg_print_ext_body() from devkmsg_read().

This is pure restructuring.
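For reference, the header that msg_print_ext_header() emits has the form "prefix,seq,usec,cont;", where prefix is (facility << 3) | level and usec is ts_nsec / 1000. Below is a minimal userspace sketch of that formatting, including the 'c' / '+' / '-' continuation marker described in the comment. The EX_LOG_* flag values and the function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in flag bits; the kernel's enum log_flags may differ. */
enum { EX_LOG_PREFIX = 1u << 0, EX_LOG_CONT = 1u << 1 };

/* 'c' marks the first fragment of a continuation line, '+' a following one. */
static char ext_cont_char(unsigned int flags, unsigned int prev_flags)
{
	if ((flags & EX_LOG_CONT) && !(prev_flags & EX_LOG_CONT))
		return 'c';
	if ((flags & EX_LOG_CONT) ||
	    ((prev_flags & EX_LOG_CONT) && !(flags & EX_LOG_PREFIX)))
		return '+';
	return '-';
}

/* Format "prefix,seq,usec,cont;" into buf; returns snprintf()'s result. */
static int ext_header(char *buf, size_t size, unsigned int facility,
		      unsigned int level, uint64_t seq, uint64_t ts_nsec,
		      unsigned int flags, unsigned int prev_flags)
{
	assert(level < 8); /* the level occupies the low three bits */
	return snprintf(buf, size, "%u,%llu,%llu,%c;",
			(facility << 3) | level,
			(unsigned long long)seq,
			(unsigned long long)(ts_nsec / 1000),
			ext_cont_char(flags, prev_flags));
}
```

For example, a kernel (facility 0) message at level 6 with sequence number 42, logged 1.5 ms after boot, gets the header "6,42,1500,-;".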

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 kernel/printk/printk.c | 157 ++---
 1 file changed, 85 insertions(+), 72 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b6e24af..5ea6709 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -505,6 +505,86 @@ int check_syslog_permissions(int type, bool from_file)
return security_syslog(type);
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+   if (*pp < e)
+   *(*pp)++ = c;
+}
+
+static ssize_t msg_print_ext_header(char *buf, size_t size,
+   struct printk_log *msg, u64 seq,
+   enum log_flags prev_flags)
+{
+   u64 ts_usec = msg->ts_nsec;
+   char cont = '-';
+
+   do_div(ts_usec, 1000);
+
+   /*
+* If we couldn't merge continuation line fragments during the print,
+* export the stored flags to allow an optional external merge of the
+* records. Merging the records isn't always neccessarily correct, like
+* when we hit a race during printing. In most cases though, it produces
+* better readable output. 'c' in the record flags mark the first
+* fragment of a line, '+' the following.
+*/
+   if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT))
+   cont = 'c';
+   else if ((msg->flags & LOG_CONT) ||
+((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
+   cont = '+';
+
+   return scnprintf(buf, size, "%u,%llu,%llu,%c;",
+  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+}
+
+static ssize_t msg_print_ext_body(char *buf, size_t size,
+ char *dict, size_t dict_len,
+ char *text, size_t text_len)
+{
+   char *p = buf, *e = buf + size;
+   size_t i;
+
+   /* escape non-printable characters */
+   for (i = 0; i < text_len; i++) {
+   unsigned char c = text[i];
+
+   if (c < ' ' || c >= 127 || c == '\\')
+   p += scnprintf(p, e - p, "\\x%02x", c);
+   else
+   append_char(&p, e, c);
+   }
+   append_char(&p, e, '\n');
+
+   if (dict_len) {
+   bool line = true;
+
+   for (i = 0; i < dict_len; i++) {
+   unsigned char c = dict[i];
+
+   if (line) {
+   append_char(&p, e, ' ');
+   line = false;
+   }
+
+   if (c == '\0') {
+   append_char(&p, e, '\n');
+   line = true;
+   continue;
+   }
+
+   if (c < ' ' || c >= 127 || c == '\\') {
+   p += scnprintf(p, e - p, "\\x%02x", c);
+   continue;
+   }
+
+   append_char(&p, e, c);
+   }
+   append_char(&p, e, '\n');
+   }
+
+   return p - buf;
+}
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
@@ -565,30 +645,17 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
return ret;
 }
 
-static void append_char(char **pp, char *e, char c)
-{
-   if (*pp < e)
-   *(*pp)++ = c;
-}
-
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
struct devkmsg_user *user = file->private_data;
struct printk_log *msg;
-   char *p, *e;
-   u64 ts_usec;
-   size_t i;
-   char cont = '-';
size_t len;
ssize_t ret;
 
if (!user)
return -EBADF;
 
-   p = user->buf;
-   e = user->buf + sizeof(user->buf);
-
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
@@ -618,71 +685,17 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
}
 
msg = log_from_idx(user->idx);
-   ts_usec = msg->ts_nsec;
-   do_div(ts_usec, 1000);
-
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CONT && !(user->prev

[PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-16 Thread Tejun Heo

In a lot of configurations, netconsole is a useful way to collect
system logs; however, all netconsole does is emit UDP packets for the
raw messages, and there's no way for the receiver to
find out whether the packets were lost and/or reordered in flight.

printk already keeps log metadata which contains enough information to
make netconsole reliable.  This patchset does the following.

* Make printk metadata available to console drivers.  A console driver
  can request this mode by setting CON_EXTENDED.  The metadata is
  emitted in the same format as /dev/kmsg.  This also makes all
  logging metadata including facility, loglevel and dictionary
  available to console receivers.

* Implement extended mode support in netconsole.  When enabled,
  netconsole transmits messages with an extended header, which is enough
  for the receiver to detect missing messages.

* Implement netconsole retransmission support.  A matching rx socket on
  the source port is automatically created for extended targets, and
  the log receiver can request retransmission by sending response
  packets.  This is completely decoupled from the main write path and
  doesn't make netconsole less robust when things start to go south.

* Implement netconsole ack support.  The response packet can
  optionally contain an ack, which enables the emergency transmission
  timer.  If the acked sequence lags the current sequence for over 10s,
  netconsole repeatedly re-sends unacked messages at increasing
  intervals.  This ensures that the receiver has the latest messages
  and also that all messages are transferred even while the kernel is
  failing, as long as the timer and netpoll are operational.  This too
  is completely decoupled from the main write path and doesn't make
  netconsole less robust.

* Implement the receiver library and a simple receiver using it, in
  tools/lib/netconsole/libncrx.a and tools/ncrx/ncrx respectively.
  In a simulated test with heavy packet loss (50%), ncrx logs all
  messages reliably and handles exceptional conditions, including
  reboots, as expected.
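On the receiving side, the sequence number in each extended header is what makes loss detectable. A minimal sketch of that idea follows: parse the header and count gaps in the sequence. The function and struct names are hypothetical; this is not libncrx's API, and unlike a real receiver it does not request retransmission, track acks or handle reboots (late records are simply not allowed to move the expected sequence backwards).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the leading "prefix,seq,usec,cont;" of an extended record.
 * Returns 0 on success, -1 if the header is malformed. */
static int parse_ext_header(const char *rec, unsigned int *facility,
			    unsigned int *level, uint64_t *seq)
{
	unsigned int prefix;
	unsigned long long s, usec;
	char cont;

	assert(rec != NULL);
	if (sscanf(rec, "%u,%llu,%llu,%c", &prefix, &s, &usec, &cont) != 4)
		return -1;
	if (cont != '-' && cont != 'c' && cont != '+')
		return -1;
	*facility = prefix >> 3;	/* prefix = (facility << 3) | level */
	*level = prefix & 7;
	*seq = s;
	return 0;
}

struct gap_tracker {
	int have_seq;		/* seen at least one record? */
	uint64_t next_seq;	/* sequence number expected next */
	uint64_t missing;	/* records skipped so far (late arrivals not subtracted) */
};

/* Record an arrival; returns how many records were skipped just before it. */
static uint64_t track_seq(struct gap_tracker *t, uint64_t seq)
{
	uint64_t gap = 0;

	if (t->have_seq && seq > t->next_seq)
		gap = seq - t->next_seq;
	t->missing += gap;
	/* Late or duplicate records (seq < next_seq) don't move us backwards. */
	if (!t->have_seq || seq >= t->next_seq)
		t->next_seq = seq + 1;
	t->have_seq = 1;
	return gap;
}
```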

An obvious alternative for reliable logging would be to use a separate
TCP connection in addition to the UDP packets; however, I decided on
UDP-based retransmission and ack mechanisms for the following reasons.

* The kernel side doesn't get simpler by using TCP.  It'd still need to
  transmit extended format messages, which BTW are useful regardless
  of reliable transmission, to match up UDP and TCP messages and to
  detect messages dropped when the TCP send buffer fills up.  Also, the
  timeout and emergency transmission support would still be necessary
  to ensure that messages are transmitted in case of, e.g., a network
  stack failure.  It'd be at least about the same amount of code as the
  UDP based implementation.

* Receiver side might be a bit simpler but not by much.  It'd still
  need to keep track of the UDP based messages and then match them up
  with TCP messages and put messages from both sources in order (each
  stream may miss different ones) and would have to deal with
  reestablishing connections after reboots.  The only part which can
  completely go away would be the actual ack and retransmission part
  and that isn't a lot of logic.

* When the network condition is good, the only thing the UDP based
  implementation adds is occasional ack messages.  A TCP based
  implementation would end up transmitting all messages twice, which
  still isn't much but kinda silly given that using TCP doesn't lower
  the complexity in meaningful ways.

This patchset contains the following 16 patches.

 0001-printk-guard-the-amount-written-per-line-by-devkmsg_.patch
 0002-printk-factor-out-message-formatting-from-devkmsg_re.patch
 0003-printk-move-LOG_NOCONS-skipping-into-call_console_dr.patch
 0004-printk-implement-support-for-extended-console-driver.patch
 0005-printk-implement-log_seq_range-and-ext_log_from_seq.patch
 0006-netconsole-make-netconsole_target-enabled-a-bool.patch
 0007-netconsole-factor-out-alloc_netconsole_target.patch
 0008-netconsole-punt-disabling-to-workqueue-from-netdevic.patch
 0009-netconsole-replace-target_list_lock-with-console_loc.patch
 0010-netconsole-introduce-netconsole_mutex.patch
 0011-netconsole-consolidate-enable-disable-and-create-des.patch
 0012-netconsole-implement-extended-console-support.patch
 0013-netconsole-implement-retransmission-support-for-exte.patch
 0014-netconsole-implement-ack-handling-and-emergency-tran.patch
 0015-netconsole-implement-netconsole-receiver-library.patch
 0016-netconsole-update-documentation-for-extended-netcons.patch

0001-0005 implement extended console support in printk.

0006-0011 are prep patches for netconsole.

0012-0014 implement extended mode, retransmission and ack support.

0015 implements receiver library, libncrx, and a simple receiver using
the library, ncrx.

0016 updates documentation.

As the patchset touches both printk and netconsole, I'm not sure how
these patches should be routed once acked.  Either -mm or net

[RFC PATCH 4/4] mm: madvise allow remove operation for hugetlbfs

2015-04-16 Thread Mike Kravetz

Now that we have hole punching support for hugetlbfs, we can
also support the MADV_REMOVE interface to it.
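From userspace, the interface this enables is an madvise(2) call with MADV_REMOVE on a shared mapping of the file. A minimal sketch, assuming a glibc-style <sys/mman.h>; remove_range() is a hypothetical helper name. With this patch, madvise_remove() no longer rejects VM_HUGETLB vmas, and since the removal goes through fallocate(), only whole huge pages inside the range would be released.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Free the backing store behind [addr, addr + len) of a shared file
 * mapping.  addr must be page aligned.  Returns 0 or -errno. */
static int remove_range(void *addr, size_t len)
{
	assert(len > 0);
	if (madvise(addr, len, MADV_REMOVE) != 0)
		return -errno;
	return 0;
}
```

A caller would mmap() a hugetlbfs file with MAP_SHARED and pass a range of that mapping; the mount point and file are up to the application.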

Signed-off-by: Dave Hansen 
Signed-off-by: Mike Kravetz 
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index d551475..c4a1027 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -299,7 +299,7 @@ static long madvise_remove(struct vm_area_struct *vma,
 
*prev = NULL;   /* tell sys_madvise we drop mmap_sem */
 
-   if (vma->vm_flags & (VM_LOCKED | VM_HUGETLB))
+   if (vma->vm_flags & VM_LOCKED)
return -EINVAL;
 
f = vma->vm_file;
-- 
2.1.0

[PATCH 01/16] printk: guard the amount written per line by devkmsg_read()

2015-04-16 Thread Tejun Heo

devkmsg_read() uses an 8k buffer and assumes that the formatted output
message won't overrun it, which seems safe given LOG_LINE_MAX, the
current use of dict and the escaping method being used; however, we're
planning to use devkmsg formatting more widely, and accounting for the
buffer size properly isn't that complicated.

This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates
devkmsg_read() so that it limits output accordingly.
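The reason for switching from sprintf() to scnprintf() is the return value: snprintf() returns the length the output would have had, so "p += snprintf(...)" can walk past the end of the buffer, whereas scnprintf() returns the number of characters actually stored. A userspace sketch of the bounded escaping loop follows; ex_scnprintf() is a hypothetical stand-in for the kernel's scnprintf() built on vsnprintf(), and escape_bounded() mirrors only the text-escaping part of devkmsg_read().

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for scnprintf(): returns the number of characters actually
 * stored (excluding the NUL), never more than size - 1; 0 if size == 0. */
static size_t ex_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Same idea as append_char(): silently drop characters past the end. */
static void append_char(char **pp, char *e, char c)
{
	if (*pp < e)
		*(*pp)++ = c;
}

/* Escape text into [buf, buf + size) and append '\n'; returns bytes used.
 * The result is length-delimited, not necessarily NUL-terminated. */
static size_t escape_bounded(char *buf, size_t size, const char *text, size_t len)
{
	char *p = buf, *e = buf + size;
	size_t i;

	assert(size > 0);
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)text[i];

		if (c < ' ' || c >= 127 || c == '\\')
			p += ex_scnprintf(p, e - p, "\\x%02x", c);
		else
			append_char(&p, e, c);
	}
	append_char(&p, e, '\n');
	return p - buf;
}
```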

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/printk.h |  2 ++
 kernel/printk/printk.c | 35 +++
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9b30871..58b1fec 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -30,6 +30,8 @@ static inline const char *printk_skip_level(const char *buffer)
return buffer;
 }
 
+#define CONSOLE_EXT_LOG_MAX     8192
+
 /* printk's without a loglevel use this.. */
 #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 879edfc..b6e24af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -512,7 +512,7 @@ struct devkmsg_user {
u32 idx;
enum log_flags prev;
struct mutex lock;
-   char buf[8192];
+   char buf[CONSOLE_EXT_LOG_MAX];
 };
 
 static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
@@ -565,11 +565,18 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
return ret;
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+   if (*pp < e)
+   *(*pp)++ = c;
+}
+
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
struct devkmsg_user *user = file->private_data;
struct printk_log *msg;
+   char *p, *e;
u64 ts_usec;
size_t i;
char cont = '-';
@@ -579,6 +586,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
if (!user)
return -EBADF;
 
+   p = user->buf;
+   e = user->buf + sizeof(user->buf);
+
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
@@ -625,9 +635,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
cont = '+';
 
-   len = sprintf(user->buf, "%u,%llu,%llu,%c;",
- (msg->facility << 3) | msg->level,
- user->seq, ts_usec, cont);
+   p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
+  (msg->facility << 3) | msg->level,
+  user->seq, ts_usec, cont);
user->prev = msg->flags;
 
/* escape non-printable characters */
@@ -635,11 +645,11 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
unsigned char c = log_text(msg)[i];
 
if (c < ' ' || c >= 127 || c == '\\')
-   len += sprintf(user->buf + len, "\\x%02x", c);
+   p += scnprintf(p, e - p, "\\x%02x", c);
else
-   user->buf[len++] = c;
+   append_char(&p, e, c);
}
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
 
if (msg->dict_len) {
bool line = true;
@@ -648,30 +658,31 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
unsigned char c = log_dict(msg)[i];
 
if (line) {
-   user->buf[len++] = ' ';
+   append_char(&p, e, ' ');
line = false;
}
 
if (c == '\0') {
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
line = true;
continue;
}
 
if (c < ' ' || c >= 127 || c == '\\') {
-   len += sprintf(user->buf + len, "\\x%02x", c);
+   p += scnprintf(p, e - p, "\\x%02x", c);
continue;
}
 
-   user->buf[len++] = c;
+   append_char(&p, e, c);
}
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
}
 
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(&logbuf_lock);
 
+   len = p - user->buf;
if (len > count) {
ret = -EINVAL;
goto out;
-- 
2.1.0

[RFC PATCH 3/4] hugetlbfs: add hugetlbfs_fallocate()

2015-04-16 Thread Mike Kravetz

This is based on the shmem version, but it has diverged quite
a bit.  We have no swap to worry about, nor the new file sealing.

What this allows us to do is move physical memory in and out of
a hugetlbfs file without having it mapped.  This also gives us
the ability to support MADV_REMOVE since it is currently
implemented using fallocate().  MADV_REMOVE lets us remove data
from the middle of a hugetlbfs file, which wasn't possible before.

hugetlbfs fallocate only operates on whole huge pages.
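Working on whole huge pages means the two modes round in opposite directions: a hole punch may only release pages lying entirely inside [offset, offset + len), so it rounds the start up and the end down, while preallocation must cover every page the range touches, so it rounds the start down and the end up. A small sketch of both, assuming a mask of the form ~(hpage_size - 1) like the one huge_page_mask() returns; punch_range() and prealloc_range() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hole punch: byte range of the huge pages fully inside the request. */
static void punch_range(uint64_t offset, uint64_t len, uint64_t hpage_size,
			uint64_t *hole_start, uint64_t *hole_end)
{
	uint64_t mask = ~(hpage_size - 1);	/* assumed huge_page_mask() shape */

	assert(hpage_size && (hpage_size & (hpage_size - 1)) == 0);
	*hole_start = (offset + hpage_size - 1) & mask;	/* round up */
	*hole_end = (offset + len) & mask;		/* round down */
}

/* Preallocation: page indices [*start, *end) of every page touched. */
static void prealloc_range(uint64_t offset, uint64_t len, unsigned int hpage_shift,
			   uint64_t *start, uint64_t *end)
{
	uint64_t hpage_size = UINT64_C(1) << hpage_shift;

	*start = offset >> hpage_shift;				/* round down */
	*end = (offset + len + hpage_size - 1) >> hpage_shift;	/* round up */
}
```

With 2 MiB pages, punching [1 MiB, 7 MiB) releases only [2 MiB, 6 MiB), whereas preallocating the same range allocates page indices 0 through 3. If the rounded hole end is not past the rounded start, there is nothing to punch.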

Based-on code-by: Dave Hansen 
Signed-off-by: Mike Kravetz 
---
 fs/hugetlbfs/inode.c| 139 
 include/linux/hugetlb.h |   3 ++
 mm/hugetlb.c|   2 +-
 3 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d5b67fd..6d48c8f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include/* remove ASAP */
+#include 
 #include 
 #include 
 #include 
@@ -377,6 +378,143 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart, loff_t lend)
hugetlb_unreserve_pages(inode, start, freed);
 }
 
+static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
+{
+   struct hstate *h = hstate_inode(inode);
+   unsigned long hpage_size = huge_page_size(h);
+   loff_t hole_start, hole_end;
+
+   /*
+* For hole punch round up the beginning offset of the hole and
+* round down the end.
+*/
+   hole_start = (offset + hpage_size - 1) & huge_page_mask(h);
+   hole_end = (offset + len) & huge_page_mask(h);
+
+   if ((u64)hole_end > (u64)hole_start) {
+   struct address_space *mapping = &inode->i_data;
+
+   mutex_lock(&inode->i_mutex);
+   unmap_mapping_range(mapping, hole_start, hole_end, 0);
+   truncate_hugepages(inode, hole_start, hole_end);
+   mutex_unlock(&inode->i_mutex);
+   }
+
+   return 0;
+}
+
+static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
+   loff_t len)
+{
+   struct inode *inode = file_inode(file);
+   struct address_space *mapping = inode->i_mapping;
+   struct hstate *h = hstate_inode(inode);
+   struct vm_area_struct pseudo_vma;
+   unsigned long hpage_size = huge_page_size(h);
+   unsigned long hpage_shift = huge_page_shift(h);
+   pgoff_t start, index, end;
+   unsigned long addr;
+   int error;
+
+   if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+   return -EOPNOTSUPP;
+
+   if (mode & FALLOC_FL_PUNCH_HOLE)
+   return hugetlbfs_punch_hole(inode, offset, len);
+
+   /*
+* Default preallocate case.
+* For this range, start is rounded down and end is rounded up.
+*/
+   start = offset >> hpage_shift;
+   end = (offset + len + hpage_size - 1) >> hpage_shift;
+
+   mutex_lock(&inode->i_mutex);
+
+   /* We need to check rlimit even when FALLOC_FL_KEEP_SIZE */
+   error = inode_newsize_ok(inode, offset + len);
+   if (error)
+   goto out;
+
+   /*
+* Initialize a pseudo vma that just contains the policy used
+* when allocating the huge pages.  The actual policy field
+* (vm_policy) is determined based on the index in the loop below.
+*/
+   memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
+   pseudo_vma.vm_start = 0;
+   pseudo_vma.vm_flags |= (VM_HUGETLB | VM_MAYSHARE);
+   pseudo_vma.vm_file = file;
+
+   /* addr is the offset within the file (zero based) */
+   addr = start * hpage_size;
+   for (index = start; index < end; index++) {
+   /*
+* This is supposed to be the vaddr where the page is being
+* faulted in, but we have no vaddr here.
+*/
+   struct page *page;
+   int avoid_reserve = 1;
+
+   cond_resched();
+
+   /*
+* fallocate(2) manpage permits EINTR; we may have been
+* interrupted because we are using up too much memory.
+*/
+   if (signal_pending(current)) {
+   error = -EINTR;
+   break;
+   }
+   page = find_get_page(mapping, index);
+   if (page) {
+   put_page(page);
+   continue;
+   }
+
+   /* Get policy based on index */
+   pseudo_vma.vm_policy =
+   mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy,
+   index);
+
+   page = alloc_huge_page(&pseudo_vma, addr, avoid_reserve);
+   mpol_cond_put(pseudo_vma.vm_policy);
+   if (IS_ERR(page)) {
+   error = PTR_ERR(page);
+

[PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers()

2015-04-16 Thread Tejun Heo

When a line is printed by multiple printk invocations, each chunk is
directly sent out to console drivers so that they don't get lost.
When the line is completed and stored in the log buffer, the line is
suppressed from going out to consoles as that'd lead to duplicate
outputs.  This is tracked with LOG_NOCONS flag.

The suppression is currently implemented in console_unlock() which
skips invoking call_console_drivers() for LOG_NOCONS messages.  This
patch moves the filtering into call_console_drivers() in preparation
for the planned extended console drivers, which will deal with the
duplicate messages themselves.

While this makes call_console_drivers() iterate over LOG_NOCONS
messages, that is extremely unlikely to matter, especially given that
continuation lines aren't that common, and the change also simplifies
console_unlock() a bit.
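Moving the check into the per-console loop is what lets a later driver opt out of the suppression. A simplified sketch of that dispatch shape follows; struct ex_console, its handles_dups flag and the other names are hypothetical. In this patch itself every console skips LOG_NOCONS records; handles_dups only models the extended consoles the description anticipates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified console: a write callback plus a flag
 * marking a driver that wants to see already-printed records anyway. */
struct ex_console {
	void (*write)(struct ex_console *con, const char *text, size_t len);
	bool handles_dups;	/* stand-in for the planned extended consoles */
	size_t bytes;		/* bytes received, for demonstration */
};

static void count_write(struct ex_console *con, const char *text, size_t len)
{
	(void)text;
	con->bytes += len;
}

/* The LOG_NOCONS decision is made per console, inside the dispatch loop. */
static void ex_call_console_drivers(struct ex_console **cons, size_t n,
				    bool nocons, const char *text, size_t len)
{
	assert(cons != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Skip records already printed directly when received. */
		if (nocons && !cons[i]->handles_dups)
			continue;
		cons[i]->write(cons[i], text, len);
	}
}
```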

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 kernel/printk/printk.c | 46 --
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5ea6709..0175c46 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1417,7 +1417,8 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(int level, const char *text, size_t len)
+static void call_console_drivers(int level, bool nocons,
+const char *text, size_t len)
 {
struct console *con;
 
@@ -1438,6 +1439,13 @@ static void call_console_drivers(int level, const char *text, size_t len)
if (!cpu_online(smp_processor_id()) &&
!(con->flags & CON_ANYTIME))
continue;
+   /*
+* Skip record we have buffered and already printed
+* directly to the console when we received it.
+*/
+   if (nocons)
+   continue;
+
con->write(con, text, len);
}
 }
@@ -1919,7 +1927,8 @@ static struct cont {
 } cont;
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
-static void call_console_drivers(int level, const char *text, size_t len) {}
+static void call_console_drivers(int level, bool nocons,
+const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 bool syslog, char *buf, size_t size) { return 0; }
 static size_t cont_print_text(char *text, size_t size) { return 0; }
@@ -2190,7 +2199,7 @@ static void console_cont_flush(char *text, size_t size)
len = cont_print_text(text, size);
raw_spin_unlock(&logbuf_lock);
stop_critical_timings();
-   call_console_drivers(cont.level, text, len);
+   call_console_drivers(cont.level, false, text, len);
start_critical_timings();
local_irq_restore(flags);
return;
@@ -2234,6 +2243,7 @@ again:
struct printk_log *msg;
size_t len;
int level;
+   bool nocons;
 
raw_spin_lock_irqsave(&logbuf_lock, flags);
if (seen_seq != log_next_seq) {
@@ -2252,38 +2262,30 @@ again:
} else {
len = 0;
}
-skip:
+
if (console_seq == log_next_seq)
break;
 
msg = log_from_idx(console_idx);
-   if (msg->flags & LOG_NOCONS) {
-   /*
-* Skip record we have buffered and already printed
-* directly to the console when we received it.
-*/
-   console_idx = log_next(console_idx);
-   console_seq++;
-   /*
-* We will get here again when we register a new
-* CON_PRINTBUFFER console. Clear the flag so we
-* will properly dump everything later.
-*/
-   msg->flags &= ~LOG_NOCONS;
-   console_prev = msg->flags;
-   goto skip;
-   }
-
level = msg->level;
+   nocons = msg->flags & LOG_NOCONS;
len += msg_print_text(msg, console_prev, false,
  text + len, sizeof(text) - len);
console_idx = log_next(console_idx);
console_seq++;
console_prev = msg->flags;
+
+   /*
+* The log will be processed again when we register a new
+* CON_PRINTBUFFER console. Clear the flag so we will
+* properly dump everything later.
+*/
+   msg->flags &= ~LOG_NOCONS;
+

[RFC PATCH 0/4] hugetlbfs: add fallocate support

2015-04-16 Thread Mike Kravetz

hugetlbfs is used today by applications that want a high degree of
control over huge page usage.  Often, large hugetlbfs files are used
to map a large number of huge pages into the application processes.
The applications know when page ranges within these large files will
no longer be used, and ideally would like to release them back to
the subpool or global pools for other uses.  The fallocate() system
call provides an interface for preallocation and hole punching within
files.  This patch set adds fallocate functionality to hugetlbfs.
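From an application's point of view, releasing a range is a hole punch through fallocate(2). A minimal sketch, assuming glibc exposes fallocate() and the FALLOC_FL_* flags via <fcntl.h> with _GNU_SOURCE; release_range() is a hypothetical helper. FALLOC_FL_PUNCH_HOLE must be ORed with FALLOC_FL_KEEP_SIZE, which matches the mode mask hugetlbfs_fallocate() accepts in patch 3/4.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Give [offset, offset + len) of fd back to the pool without changing
 * the file size.  On hugetlbfs with this series, only huge pages lying
 * entirely inside the range are freed.  Returns 0 or -errno. */
static int release_range(int fd, off_t offset, off_t len)
{
	assert(offset >= 0 && len > 0);
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) != 0)
		return -errno;
	return 0;
}
```

Kernels without this series have no hugetlbfs fallocate handler, so the same call would be expected to fail there (typically with EOPNOTSUPP).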

Mike Kravetz (4):
  hugetlbfs: truncate_hugepages() takes a range of pages
  hugetlbfs: New huge_add_to_page_cache helper routine
  hugetlbfs: add hugetlbfs_fallocate()
  mm: madvise allow remove operation for hugetlbfs

 fs/hugetlbfs/inode.c| 164 ++--
 include/linux/hugetlb.h |   5 ++
 mm/hugetlb.c|  29 ++---
 mm/madvise.c|   2 +-
 4 files changed, 185 insertions(+), 15 deletions(-)

-- 
2.1.0

[RFC PATCH 2/4] hugetlbfs: New huge_add_to_page_cache helper routine

2015-04-16 Thread Mike Kravetz

Currently, there is only a single place where hugetlbfs pages are
added to the page cache.  The new fallocate code will be adding a second
one, so break the functionality out into its own helper.

Signed-off-by: Dave Hansen 
Signed-off-by: Mike Kravetz 
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c| 27 ++-
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7b57850..6425945 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -325,6 +325,8 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
+int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+   pgoff_t idx);
 
 /* arch callback */
 int __init alloc_bootmem_huge_page(struct hstate *h);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c41b2a0..7cda328 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2959,6 +2959,23 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
return page != NULL;
 }
 
+int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+  pgoff_t idx)
+{
+   struct inode *inode = mapping->host;
+   struct hstate *h = hstate_inode(inode);
+   int err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
+
+   if (err)
+   return err;
+   ClearPagePrivate(page);
+
+   spin_lock(&inode->i_lock);
+   inode->i_blocks += blocks_per_huge_page(h);
+   spin_unlock(&inode->i_lock);
+   return 0;
+}
+
 static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
   struct address_space *mapping, pgoff_t idx,
   unsigned long address, pte_t *ptep, unsigned int flags)
@@ -3005,21 +3022,13 @@ retry:
__SetPageUptodate(page);
 
if (vma->vm_flags & VM_MAYSHARE) {
-   int err;
-   struct inode *inode = mapping->host;
-
-   err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
+   int err = huge_add_to_page_cache(page, mapping, idx);
if (err) {
put_page(page);
if (err == -EEXIST)
goto retry;
goto out;
}
-   ClearPagePrivate(page);
-
-   spin_lock(&inode->i_lock);
-   inode->i_blocks += blocks_per_huge_page(h);
-   spin_unlock(&inode->i_lock);
} else {
lock_page(page);
if (unlikely(anon_vma_prepare(vma))) {
-- 
2.1.0

[RFC PATCH 1/4] hugetlbfs: truncate_hugepages() takes a range of pages

2015-04-16 Thread Mike Kravetz

Modify truncate_hugepages() to take a range of pages (start, end)
instead of simply start.  If the value of end is -1, this indicates
the end of the range is the end of the file.  This functionality
will be used for fallocate hole punching.
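The range handling can be sketched as follows: byte offsets become huge-page indices, -1 stands for "through end of file", and each lookup batch is clamped to what is left of the range. The names and the batch size of 14 are hypothetical stand-ins (EX_BATCH plays the role of PAGEVEC_SIZE), and the sentinel is converted explicitly rather than by shifting a negative loff_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batch size standing in for PAGEVEC_SIZE. */
#define EX_BATCH 14

/* Byte range [lstart, lend) -> huge-page index range [*start, *end).
 * lend == -1 means "to the end of the file". */
static void page_range(int64_t lstart, int64_t lend, unsigned int shift,
		       uint64_t *start, uint64_t *end)
{
	assert(lstart >= 0 && (lend == -1 || lend >= lstart));
	*start = (uint64_t)lstart >> shift;
	*end = (lend == -1) ? UINT64_MAX : (uint64_t)lend >> shift;
}

/* Size of the next lookup batch: never more than what remains of [next, end). */
static uint64_t lookup_count(uint64_t next, uint64_t end)
{
	assert(next <= end);
	return (end - next < EX_BATCH) ? end - next : EX_BATCH;
}
```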

Signed-off-by: Dave Hansen 
Signed-off-by: Mike Kravetz 
---
 fs/hugetlbfs/inode.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c274aca..d5b67fd 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -324,11 +324,12 @@ static void truncate_huge_page(struct page *page)
delete_from_page_cache(page);
 }
 
-static void truncate_hugepages(struct inode *inode, loff_t lstart)
+static void truncate_hugepages(struct inode *inode, loff_t lstart, loff_t lend)
 {
struct hstate *h = hstate_inode(inode);
struct address_space *mapping = &inode->i_data;
const pgoff_t start = lstart >> huge_page_shift(h);
+   const pgoff_t end = lend >> huge_page_shift(h);
struct pagevec pvec;
pgoff_t next;
int i, freed = 0;
@@ -336,7 +337,19 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart)
pagevec_init(&pvec, 0);
next = start;
while (1) {
-   if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+   long lookup_nr = PAGEVEC_SIZE;
+
+   /*
+* Make sure to never grab more pages than we
+* might possibly need.
+*/
+   if (end - start < lookup_nr)
+   lookup_nr = end - start;
+   /*
+* This pagevec_lookup() may return pages past 'end',
+* so we must check for page->index > end.
+*/
+   if (!pagevec_lookup(&pvec, mapping, next, lookup_nr)) {
if (next == start)
break;
next = start;
@@ -347,6 +360,10 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart)
struct page *page = pvec.pages[i];
 
lock_page(page);
+   if (page->index >= end) {
+   unlock_page(page);
+   break;
+   }
if (page->index > next)
next = page->index;
++next;
@@ -364,7 +381,7 @@ static void hugetlbfs_evict_inode(struct inode *inode)
 {
struct resv_map *resv_map;
 
-   truncate_hugepages(inode, 0);
+   truncate_hugepages(inode, 0, -1);
resv_map = (struct resv_map *)inode->i_mapping->private_data;
/* root inode doesn't have the resv_map, so we should check it */
if (resv_map)
@@ -410,7 +427,7 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
if (!RB_EMPTY_ROOT(&mapping->i_mmap))
hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
i_mmap_unlock_write(mapping);
-   truncate_hugepages(inode, offset);
+   truncate_hugepages(inode, offset, -1);
return 0;
 }
 
-- 
2.1.0

Re: [trivial PATCH] arm: dts: socfpga: Remove executable permission bits

2015-04-16 Thread Rob Herring

+arm-soc

On Thu, Apr 16, 2015 at 5:49 PM, Joe Perches  wrote:
> Change the .dts file permissions from 755 to 644.
>
> Signed-off-by: Joe Perches 

ARM dts files go in thru arm-soc tree.

Acked-by: Rob Herring 

> ---
>  arch/arm/boot/dts/socfpga_arria10_socdk.dts | 0
>  1 file changed, 0 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/boot/dts/socfpga_arria10_socdk.dts b/arch/arm/boot/dts/socfpga_arria10_socdk.dts
> old mode 100755
> new mode 100644
>
>

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Eric Dumazet

On Fri, 2015-04-17 at 00:00 +0200, Mateusz Guzik wrote:
> On Thu, Apr 16, 2015 at 01:55:39PM -0700, Eric Dumazet wrote:
> > On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote:
> > > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote:
> > > > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > > > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > > > >   cur_fdt = files_fdtable(files);
> > > > >   if (nr >= cur_fdt->max_fds) {
> > > > >   /* Continue as planned */
> > > > > + write_seqcount_begin(&files->fdt_seqcount);
> > > > >   copy_fdtable(new_fdt, cur_fdt);
> > > > >   rcu_assign_pointer(files->fdt, new_fdt);
> > > > > + write_seqcount_end(&files->fdt_seqcount);
> > > > >   if (cur_fdt != &files->fdtab)
> > > > >   call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
> > > > 
> > > > Interesting.  AFAICS, your test doesn't step anywhere near that path,
> > > > does it?  So basically you never hit the retries during that...
> > > 
> > > Right, but then the table is almost never changed for a given process,
> > > as we only increase it by power of two steps.
> > > 
> > > (So I scratch my initial comment, fdt_seqcount is really mostly read)
> > 
> > I tested Mateusz patch with my opensock program, mimicking a bit more
> > what a server does (having lot of sockets)
> > 
> > 24 threads running, doing close(randomfd())/socket() calls like crazy.
> > 
> > Before patch :
> > 
> > # time ./opensock 
> > 
> > real    0m10.863s
> > user    0m0.954s
> > sys     2m43.659s
> > 
> > 
> > After patch :
> > 
> > # time ./opensock
> > 
> > real    0m9.750s
> > user    0m0.804s
> > sys     2m18.034s
> > 
> > So this is an improvement for sure, but not massive.
> > 
> > perf record ./opensock ; report
> > 
> > 87.80%  opensock  [kernel.kallsyms]  [k] _raw_spin_lock 
> > 
> >|--52.70%-- __close_fd
> >|--46.41%-- __alloc_fd
> 
> My crap benchmark is here: http://people.redhat.com/~mguzik/pipebench.c
> (compile with -pthread, run with -s 10 -n 16 for 10 second test + 16
> threads)
> 
> As noted earlier it tends to go from roughly 300k ops/s to 400k.
> 
> The fundamental problem here seems to be this pesky POSIX requirement of
> providing the lowest possible fd on each allocation (as a side note
> Linux breaks this with parallel fd allocs, where one of these backs off
> the reservation, not that I believe this causes trouble).

Note that POSIX never considered multiple threads here. The POSIX
requirement came from traditional Unix stdin/stdout/stderr handling and
legacy programs, before dup2() even existed.


> 
> Ideally a process-wide switch could be implemented (e.g.
> prctl(SCRATCH_LOWEST_FD_REQ)) which would grant the kernel the freedom
> to return any fd it wants, so it would be possible to have fd ranges
> per thread and the like.

A few months ago I played with a SOCK_FD_FASTALLOC ;)

The idea was to use a random starting point instead of 0.

But the bottleneck was really the spinlock, not the bit search, unless I
used 10 million fds in the program...

> 
> Having only an O_SCRATCH_POSIX flag passed to syscalls would still leave
> close() as a bottleneck.
> 
> In the meantime I consider the approach taken in my patch as an ok
> temporary improvement.

Yes, please formally submit this patch.

Note that adding atomic bit operations could eventually allow us to avoid
holding the spinlock at close() time.
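To make the two allocator ideas above concrete, here is a minimal userspace sketch in C11 of an open-descriptor bitmap: allocation looks for the first free bit starting from a hint (a hint of 0 gives POSIX lowest-fd semantics; a random hint is the SOCK_FD_FASTALLOC idea), and release clears the bit with an atomic AND so the close path takes no lock. All names (`fdmap_sketch`, `fdmap_alloc`, `fdmap_release`) are hypothetical. This is not the kernel's fdtable code, and allocation here uses a compare-and-swap loop instead of the spinlock.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define FDMAP_WORDS 4
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define FDMAP_MAX ((int)(FDMAP_WORDS * BITS_PER_WORD))

/* Hypothetical open-descriptor bitmap: a set bit means "in use". */
struct fdmap_sketch {
    _Atomic unsigned long bits[FDMAP_WORDS];
};

/* Claim the first free descriptor at or after 'start', wrapping around.
 * start == 0 yields the lowest free fd (absent concurrent allocators).
 * Scans bit by bit for clarity; a real allocator would search whole words.
 * Returns -1 if every descriptor is in use. */
static int fdmap_alloc(struct fdmap_sketch *m, int start)
{
    assert(start >= 0);
    for (int n = 0; n < FDMAP_MAX; n++) {
        int fd = (start + n) % FDMAP_MAX;
        _Atomic unsigned long *word = &m->bits[fd / BITS_PER_WORD];
        unsigned long mask = 1UL << (fd % BITS_PER_WORD);
        unsigned long old = atomic_load_explicit(word, memory_order_relaxed);

        while (!(old & mask)) {
            /* Try to set our bit; on failure 'old' is refreshed and rechecked. */
            if (atomic_compare_exchange_weak_explicit(word, &old, old | mask,
                    memory_order_acquire, memory_order_relaxed))
                return fd;
        }
    }
    return -1;
}

/* Lock-free release: atomically clear only this descriptor's bit. */
static void fdmap_release(struct fdmap_sketch *m, int fd)
{
    assert(fd >= 0 && fd < FDMAP_MAX);
    unsigned long mask = 1UL << (fd % BITS_PER_WORD);
    atomic_fetch_and_explicit(&m->bits[fd / BITS_PER_WORD], ~mask,
                              memory_order_release);
}
```

Because release touches only one word with a single atomic operation, it never needs to serialize against other closers; only allocation has to search.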



[trivial PATCH] arm: dts: socfpga: Remove executable permission bits

2015-04-16 Thread Joe Perches

Change the .dts file permissions from 755 to 644.

Signed-off-by: Joe Perches 
---
 arch/arm/boot/dts/socfpga_arria10_socdk.dts | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/socfpga_arria10_socdk.dts b/arch/arm/boot/dts/socfpga_arria10_socdk.dts
old mode 100755
new mode 100644



[trivial PATCH] tps6507x.txt: Remove executable permission

2015-04-16 Thread Joe Perches

Documentation text files shouldn't be executable.

Signed-off-by: Joe Perches 
---
 Documentation/devicetree/bindings/mfd/tps6507x.txt | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/mfd/tps6507x.txt b/Documentation/devicetree/bindings/mfd/tps6507x.txt
old mode 100755
new mode 100644



[PATCH] bus: omap_l3_noc: Fix offset for DRA7 CLK1_HOST_CLK1_2 instance

2015-04-16 Thread Nishanth Menon

From: Illia Smyrnov 

The base address for the DRA7 CLK1_HOST_CLK1_2 host instance is
0x4480, so the correct offset is 0x80. DRA7 TRM rev X (Feb 2015)
has updates for this information.

With the wrong offset, these errors are not correctly cleared by the L3
IRQ handler, which causes continuous interrupts and a system lockup.

Signed-off-by: Illia Smyrnov 
Signed-off-by: Nishanth Menon 
---
 drivers/bus/omap_l3_noc.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/omap_l3_noc.h b/drivers/bus/omap_l3_noc.h
index 95254585db86..a314d800f394 100644
--- a/drivers/bus/omap_l3_noc.h
+++ b/drivers/bus/omap_l3_noc.h
@@ -274,7 +274,7 @@ static struct l3_flagmux_data dra_l3_flagmux_clk1 = {
 
 static struct l3_target_data dra_l3_target_data_clk2[] = {
{0x0,   "HOST CLK1",},
-   {0x0,   "HOST CLK2",},
+   {0x80, "HOST CLK2",},
{0xdead, L3_TARGET_NOT_SUPPORTED,},
{0x3400, "SHA2_2",},
{0x0900, "BB2D",},
-- 
1.7.9.5
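To illustrate why a zero offset for "HOST CLK2" is harmful: assuming the handler locates each target's error registers at (flag-mux base + target offset), both entries at 0x0 resolve to the same location, so the CLK2 error is looked up and "cleared" at the CLK1 registers while the real source stays asserted. A minimal C sketch of that lookup; the struct, sentinel handling and base value are hypothetical simplifications, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an {offset, name} target-table entry. */
struct target_sketch {
    uint32_t offset;
    const char *name;
};

#define TARGET_UNSUPPORTED 0xdead   /* sentinel offset marking a gap */

/* Register window of target 'idx' relative to 'base'; 0 if unsupported. */
static uintptr_t target_regs(uintptr_t base, const struct target_sketch *t,
                             size_t idx)
{
    if (t[idx].offset == TARGET_UNSUPPORTED)
        return 0;
    return base + t[idx].offset;
}

static const struct target_sketch before_fix[] = {
    {0x0, "HOST CLK1"}, {0x0, "HOST CLK2"}, {TARGET_UNSUPPORTED, "unsupported"},
};

static const struct target_sketch after_fix[] = {
    {0x0, "HOST CLK1"}, {0x80, "HOST CLK2"}, {TARGET_UNSUPPORTED, "unsupported"},
};
```

With `before_fix`, indices 0 and 1 alias the same window; with `after_fix`, CLK2 lands 0x80 further on.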


Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Mateusz Guzik

On Thu, Apr 16, 2015 at 07:09:32PM +0100, Al Viro wrote:
> On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > cur_fdt = files_fdtable(files);
> > if (nr >= cur_fdt->max_fds) {
> > /* Continue as planned */
> > +   write_seqcount_begin(&files->fdt_seqcount);
> > copy_fdtable(new_fdt, cur_fdt);
> > rcu_assign_pointer(files->fdt, new_fdt);
> > +   write_seqcount_end(&files->fdt_seqcount);
> > if (cur_fdt != &files->fdtab)
> > call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
> 
> Interesting.  AFAICS, your test doesn't step anywhere near that path,
> does it?  So basically you never hit the retries during that...

Well, yeah. In fact, for non-shared tables one could go a step further
and just plop the pointer in, but I don't know if that makes much sense.
Other processes inspecting the table could get away with a data
dependency barrier. Closing would still take the lock, so you could only
suddenly see a filp installed, never one going away.

Now, as far as correctness goes, I think there is a bug in the patch
(which does not invalidate the idea though). Chances are I have a fix as
well.

Benchmark prog is here: http://people.redhat.com/~mguzik/pipebench.c
A modified version: http://people.redhat.com/~mguzik/fdi-fail.c

Benchmark is just doing pipe + close in a loop in multiple threads.

The modified version spawns threads, sleeps 100 ms and does dup2(0, 300) to
reallocate the table while the other threads continue working.

This successfully exercised the retries (including cases where the installed
file got copied and was encountered during a retry).

However, I see sporadic close failures. I presume this is because of a
missing read barrier after write_seqcount_begin. Adding a smp_mb()
seems to solve the problem, but I could only test on 2 * 16 Intel(R)
Xeon(R) CPU E5-2660 0 @ 2.20GHz.

My memory barrier-fu is rather weak and I'm not that confident in my
crap suspicion here.

Thoughts?
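To reason about the barrier question, the retry scheme can be modelled in userspace C11. In this sketch (all names hypothetical, not the patch's code) the lockless "reader" actually performs a store (installing the file) and then re-reads the counter, while the writer bumps the counter and then *reads* the old slots while copying. Both sides therefore need store-to-load ordering, which only a full fence provides; a write barrier such as the one in write_seqcount_begin() does not. That is consistent with the observation that an added smp_mb() made the close failures go away, but it is our reading, not a verified analysis of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS_MAX 64

/* Hypothetical, heavily simplified stand-ins for fdtable/files_struct. */
struct fdt_sketch {
    size_t max_fds;
    _Atomic(void *) fd[SLOTS_MAX];
};

struct files_sketch {
    atomic_uint seq;                   /* even: stable, odd: resize running */
    _Atomic(struct fdt_sketch *) fdt;  /* currently published table */
};

/* Writer side; assumed to run under the files lock, like expand_fdtable(). */
static void expand_sketch(struct files_sketch *f, struct fdt_sketch *new_fdt)
{
    struct fdt_sketch *cur = atomic_load_explicit(&f->fdt, memory_order_relaxed);

    atomic_fetch_add_explicit(&f->seq, 1, memory_order_relaxed);
    /* Full fence: the odd count must be visible before the old slots are
     * read below (store->load ordering). */
    atomic_thread_fence(memory_order_seq_cst);
    for (size_t i = 0; i < cur->max_fds; i++)
        atomic_store_explicit(&new_fdt->fd[i],
            atomic_load_explicit(&cur->fd[i], memory_order_relaxed),
            memory_order_relaxed);
    atomic_store_explicit(&f->fdt, new_fdt, memory_order_release);
    atomic_fetch_add_explicit(&f->seq, 1, memory_order_release);
}

/* Lockless install: store into the current table, then check that no resize
 * ran meanwhile; if one did, the copy may have missed the store, so redo it. */
static void install_sketch(struct files_sketch *f, unsigned fd, void *file)
{
    for (;;) {
        unsigned start;
        while ((start = atomic_load_explicit(&f->seq, memory_order_acquire)) & 1u)
            ;   /* resize in progress: wait */
        struct fdt_sketch *t = atomic_load_explicit(&f->fdt, memory_order_acquire);
        assert(fd < t->max_fds);
        atomic_store_explicit(&t->fd[fd], file, memory_order_relaxed);
        /* Pairs with the fence in expand_sketch(): either the copy sees our
         * store, or we see the counter change and retry. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&f->seq, memory_order_relaxed) == start)
            return;
    }
}
```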

-- 
Mateusz Guzik

Re: [tip:x86/pmem] x86/mm: Add support for the non-standard protected e820 type

2015-04-16 Thread Andy Lutomirski

On Thu, Apr 2, 2015 at 12:51 PM, Andy Lutomirski  wrote:
> On Thu, Apr 2, 2015 at 12:13 PM, Ingo Molnar  wrote:
>>
>> * Andy Lutomirski  wrote:
>>
>>> On Thu, Apr 2, 2015 at 5:31 AM, tip-bot for Christoph Hellwig wrote:
>>> > Commit-ID:  ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
>>> > Gitweb:     http://git.kernel.org/tip/ec776ef6bbe1734c29cd6bd05219cd93b2731bd4
>>> > Author: Christoph Hellwig 
>>> > AuthorDate: Wed, 1 Apr 2015 09:12:18 +0200
>>> > Committer:  Ingo Molnar 
>>> > CommitDate: Wed, 1 Apr 2015 17:02:43 +0200
>>> >
>>> > x86/mm: Add support for the non-standard protected e820 type
>>> >
>>> > Various recent BIOSes support NVDIMMs or ADR using a
>>> > non-standard e820 memory type, and Intel supplied reference
>>> > Linux code using this type to various vendors.
>>> >
>>> > Wire this e820 table type up to export platform devices for the
>>> > pmem driver so that we can use it in Linux.
>>>
>>> This scares me a bit.  Do we know that the upcoming ACPI 6.0
>>> enumeration mechanism *won't* use e820 type 12? [...]
>>
>> So I know nothing about it, but I'd be surprised if e820 was touched
>> at all, as e820 isn't really well suited to enumerate more complex
>> resources, and it appears pmem wants to grow into complex directions?
>
> I hope so, but I have no idea what the ACPI committee's schemes are.
>
> We could require pmem.enable_legacy_e820=Y to load the driver for now
> if we're concerned about it.
>

ACPI 6.0 is out:

http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

AFAICT from a quick read, ACPI 6.0 systems will show EFI type 14
(EfiPersistentMemory), ACPI type 7 (AddressRangePersistentMemory) and
e820 type 7.

Type 12 is still "OEM defined".  See table 15-312.  Maybe I'm reading
this wrong.

*However*, ACPI 6.0 unsurprisingly also has a real enumeration
mechanism for NVDIMMs and such, and those should take precedence.

So this driver could plausibly be safe even on ACPI 6.0 systems.
Someone from one of the relevant vendors should probably confirm that.
I'm still a bit nervous, though.
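The policy under discussion can be sketched in C: e820 type 7 is the ACPI 6.0 persistent-memory type Andy mentions, type 12 is the non-standard type the patch wires up, and the `enable_legacy_e820` flag models the pmem.enable_legacy_e820=Y parameter that was only *proposed* above. The enum and struct names are hypothetical, this is not the x86 e820 code, and the sketch ignores the ACPI 6.0 NVDIMM enumeration that Andy says should take precedence.

```c
#include <stdbool.h>
#include <stdint.h>

/* e820 type numbers discussed in the thread (names are hypothetical). */
enum {
    E820_TYPE_RAM_SKETCH = 1,
    E820_TYPE_PMEM_ACPI6 = 7,   /* AddressRangePersistentMemory in ACPI 6.0 */
    E820_TYPE_LEGACY_12  = 12,  /* non-standard; "OEM defined" in ACPI 6.0 */
};

struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Decide whether a range should be handed to a pmem driver. */
static bool is_pmem_range(const struct e820_entry_sketch *e,
                          bool enable_legacy_e820)
{
    switch (e->type) {
    case E820_TYPE_PMEM_ACPI6:
        return true;
    case E820_TYPE_LEGACY_12:
        /* Only trust the OEM-defined type when explicitly opted in. */
        return enable_legacy_e820;
    default:
        return false;
    }
}
```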

--Andy

Re: AM335x OMAP2 common clock external fixed-clock registration

2015-04-16 Thread Michael Welling

On Thu, Apr 16, 2015 at 10:37:19PM +0200, Sebastian Hesselbarth wrote:
> On 16.04.2015 18:17, Michael Welling wrote:
> >On Thu, Apr 16, 2015 at 07:32:32AM +0300, Tero Kristo wrote:
> >>On 04/15/2015 11:51 PM, Michael Welling wrote:
> >>>On Wed, Apr 15, 2015 at 01:45:53PM -0700, Mike Turquette wrote:
> On Wed, Apr 15, 2015 at 12:47 PM, Michael Welling wrote:
> [...]
> >There is still an issue with the si5351.
> >
> >I had to comment out the clk_put here for the frequency to show up:
> >http://lxr.free-electrons.com/source/drivers/clk/clk-si5351.c#L1133
> >
> >Ideas?
> 
> What is the most recent upstream commit that you are based on?
> >>>
> >>>I am working from 4.0.0-rc7.
> >>>
> >>>7b43b47373d40d557cd7e1a84a0bd8ebc4d745ab
> >>
> >>Hmm, I wonder why si5351 calls clk_put immediately after of_clk_get
> >>in the first place, as far as I understand this destroys the clock
> >>handle, which is still being used later in the code.
> >
> >Not sure how this ever worked. This has been in the code since the
> >initial commit.
> 
> The reason it worked before may be related to the recent rework of
> clk_put() itself and to clk cookies instead of pointers. I have lost track
> of the recent clk subsystem changes here, sorry.
> 
> However, dropping the clk immediately surely isn't right.
> The thing is, we can remove the clk_put() only because there is no
> _remove() for that driver. I remember that back when the driver
> was mainlined, clk removal wasn't too easy.
> 
> FWIW, as soon as someone adds _remove() support, we'll have
> to rethink passing struct clk* by platform_data or at least
> double-check whether we ever used [of_]clk_get() to obtain it.
> 
> Would you mind sending a patch that removes the clk_put() on !IS_ERR and
> adds a proper error path instead? Since the of_clk_get() calls are the only
> ones that need cleanup on error in si5351_dt_parse(), we should probably
> move those calls to the end of the function. Otherwise we'd also have to
> clean up on every of_parse_foo() failure.

What would be the proper error path?
What cleanup is required?
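One common shape for the error path Sebastian describes is: parse all properties first (nothing to undo), acquire the clocks last, and on failure release only what was already acquired, in reverse order. A minimal userspace sketch of that pattern; `get_clk`/`put_clk` are hypothetical stand-ins for of_clk_get()/clk_put() that just count references, and for simplicity both inputs are treated as required, whereas in the real driver an input may be optional. This is not the si5351 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clock handle plus reference-counting stand-ins for
 * of_clk_get()/clk_put(); 'fail_index' simulates a failed lookup. */
struct clk_sketch { int refs; };

static struct clk_sketch clocks[2];
static int fail_index = -1;

static struct clk_sketch *get_clk(int idx)
{
    if (idx == fail_index)
        return NULL;
    clocks[idx].refs++;
    return &clocks[idx];
}

static void put_clk(struct clk_sketch *c) { c->refs--; }

struct pdata_sketch { struct clk_sketch *in0, *in1; };

/* Parse first, acquire clocks last, unwind in reverse on failure. */
static int dt_parse_sketch(struct pdata_sketch *pdata, bool props_ok)
{
    if (!props_ok)
        return -1;              /* property parsing failed: nothing to release */

    pdata->in0 = get_clk(0);
    if (!pdata->in0)
        return -1;
    pdata->in1 = get_clk(1);
    if (!pdata->in1)
        goto err_put_in0;
    return 0;                   /* keep both references while the device lives */

err_put_in0:
    put_clk(pdata->in0);
    pdata->in0 = NULL;
    return -1;
}
```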

I have also noticed more deep-rooted issues with the driver. For one, the
driver behaves differently when debugging is on and when it is off.

Here is what the kernel reports with debugging off:
root@som3517-som200:~# cat /sys/kernel/debug/clk/clk_summary
   clock enable_cnt  prepare_cntrate   accuracy 
  phase

 ref27002700  0 
0  
xtal  002700  0 
0  
   pllb   00   59994  0 
0  
  ms0 001249  0 
0  
 clk0 001249  0 
0  
   plla   00   59994  0 
0  
  ms2 00 8219178  0 
0  
 clk2 00 8219178  0 
0  
  ms1 0094117646  0 
0  
 clk1 0094117646  0 
0  

Here is what the kernel reports with debugging on:
   clock enable_cnt  prepare_cntrate   accuracy 
  phase

 ref27002700  0 
0  
xtal  002700  0 
0  
   pllb   00   884736000  0 
0  
  ms0 0018432000  0 
0  
 clk0 0018432000  0 
0  
   plla   00   897023997  0 
0  
  ms2 0012287999  0 
0  
 clk2 0012287999  0 
0  
  ms1 00   140709646  0 
0  
 clk1 00   140709646  0 
0 

Note this is with the following devicetree entry:
si5351: clock-generator {
#address-cells = <1>;
#size-cells = <0>;
#clock-cells = <1>;
compatible = "silabs,si5351a-msop";
reg = <0x60>;
status = "okay";

/* connect xtal input to 27MHz reference */
clocks = <>;

/* connect xtal input as source of pll0 and pll1 */

Re: [PATCH] dmaengine: bcm2835: Add slave dma support

2015-04-16 Thread Noralf Trønnes



On 15.04.2015 21:00, Stefan Wahren wrote:
> Hi Noralf,
>
> On 15.04.2015 at 11:56, Noralf Trønnes wrote:
>> Add slave transfer capability to BCM2835 dmaengine driver.
>> This patch is pulled from the bcm2708-dmaengine driver in the
>> Raspberry Pi repo. The work was done by Gellert Weisz.
>>
>> Tested with the bcm2835-mmc driver from the same repo.
>
> Why not with the upstream kernel?

See my answer to Alexander Stein.

>> +unsigned int i, j, splitct, max_size;
>
> I think "split_cnt" would be better.

>> +es = BCM2835_DMA_DATA_TYPE_S32;
>
> Looks like "es" is never used.

>> +break;
>> +default:
>
> A dev_err() might be useful here.

>> +d->frames += 1 + len / max_size;
>
> If it's correct, this would be more intuitive:
>
> d->frames += len / max_size + 1;

>> +for_each_sg(sgl, sgent, sg_len, i) {
>> +dma_addr_t addr = sg_dma_address(sgent);
>> +unsigned int len = sg_dma_len(sgent);
>> +
>> +for (j = 0; j < len; j += max_size) {
>
> It should be possible to move the declaration of "j" down here.

>> +if (sync_type != 0)
>
> if (sync_type) ?

Thanks for your comments Stefan, I'll make a new version of the patch.


Noralf.
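A note on the `d->frames` expression reviewed above: if the inner loop emits one frame per `max_size` chunk (j = 0, max_size, 2*max_size, ... while j < len), an entry of `len` bytes produces ceil(len / max_size) frames. `len / max_size + 1` agrees with that except when `len` is an exact multiple of `max_size`, where it counts one frame too many (possibly harmless if the value only sizes an allocation). A small C check; the helper names are hypothetical.

```c
/* Frames counted by the expression in the patch. */
static unsigned frames_patch(unsigned len, unsigned max_size)
{
    return 1 + len / max_size;
}

/* Frames actually produced by: for (j = 0; j < len; j += max_size) */
static unsigned frames_loop(unsigned len, unsigned max_size)
{
    unsigned n = 0;

    for (unsigned j = 0; j < len; j += max_size)
        n++;
    return n;
}

/* Ceiling division, the same result as the kernel's DIV_ROUND_UP(len, max_size). */
static unsigned frames_ceil(unsigned len, unsigned max_size)
{
    return (len + max_size - 1) / max_size;
}
```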


Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Vivien Didelot

Hi Guenter,

> >  switch (index) {
> > +case 0: /* temp1_input */
> > +if (drv->get_temp)
> > +mode |= S_IRUGO;
> 
> This should be mandatory. Sorry, I don't really understand what you are
> trying to accomplish here.
> 
> Can you give me a real world example where a chip would support setting
> a limit but not reading it ?

I have no such example. I just did not see why this couldn't be allowed
(e.g. setting only set_temp_limit and get_temp_alarm looks fine to me).
But if you say that get_temp should be mandatory, I'm OK with that.

The primary goal of this patchset was to use DEVICE_ATTR_RW to declare
temp1_max, instead of reflecting the minimal permissions needed.

Best,
-v

Re: [PATCH] dmaengine: bcm2835: Add slave dma support

2015-04-16 Thread Noralf Trønnes

On 16.04.2015 21:06, Alexander Stein wrote:
> Hi Stefan,
>
> On Wednesday 15 April 2015, 21:00:26, Stefan Wahren wrote:
>> On 15.04.2015 at 11:56, Noralf Trønnes wrote:
>>> Add slave transfer capability to BCM2835 dmaengine driver.
>>> This patch is pulled from the bcm2708-dmaengine driver in the
>>> Raspberry Pi repo. The work was done by Gellert Weisz.
>>>
>>> Tested with the bcm2835-mmc driver from the same repo.
>>
>> Why not with the upstream kernel?
>
> I also looked at slave DMA support, especially for use in MMC. It turns out
> that bcm2835-mmc was written more or less completely from scratch.
> Mainline Linux uses the sdhci "framework", which internally uses SDMA and/or
> ADMA (both are DMA units internal to the SD/MMC controller) when they are
> supported by an SDHCI-compatible controller.
> AFAIK the SD/MMC controller in bcm2835 lacks both, which is why the driver
> only uses PIO. I don't know if external DMA usage can be integrated into
> sdhci that easily; I have my doubts.

I asked Jonathan Bell (Raspberry Pi) about why a new driver was made
instead of extending sdhci-bcm2835.

On 10.04.2015 20:02, Jonathan Bell wrote:
> Basically, it's impossible to integrate platform DMA channel support
> within the SDHCI framework. The Arasan controller (and the Broadcom
> MMCI controller) both use platform DMA channels to pump data to/from
> the host FIFO. Our old "sdhci-bcm2708" driver basically hacked sdhci.c
> to allow platform DMA support in a way that was guaranteed to cause
> merge conflicts with every new kernel branch. The reasoning behind
> creating an MMC-level driver was to minimise disruption of incorporating
> platform DMA and to have additional control e.g. on sequencing of commands
> that are known to have bugs/problems. There are drivers in the source
> tree that are "SDHCI compliant" but have their own various idiosyncrasies
> - e.g. : http://lxr.free-electrons.com/source/drivers/mmc/host/davinci_mmc.c
> Implementing an MMC-level driver was the easiest way to incorporate all
> our various bits of baggage (random necessary delays here, busy-wait there)
> without disrupting the rest of the codebase. I agree that some functions
> could just substitute the sdhci.c equivalents and deduplicate some of the code.

Stephen Warren made this comment on a previous attempt to upstream the
bcm2835-mmc driver:

On Tue, 28 Oct 2014 19:55:20 -0600, Stephen Warren wrote:
> On 10/28/2014 06:00 PM, Piotr Król wrote:
> > This is driver for Arasan External Mass Media Controller provided in
> > Raspberry Pi single board computer.
>
> We should not have multiple drivers for the same HW. The correct
> approach would be to enhance the existing sdhci-bcm2835.c to support any
> new features or bug-fixes embodied within this driver. Presumably that
> way, you'd also end up with a lot of small feature patches, which would
> make patch review easier. Consequently I haven't reviewed this patch much.


Re: [RFC PATCH 0/5] Add smp booting support for Qualcomm ARMv8 SoCs

2015-04-16 Thread Matt Sealey

Hi Rob,

On Thu, Apr 16, 2015 at 12:17 PM, Rob Clark  wrote:
> On Thu, Apr 16, 2015 at 11:21 AM, Catalin Marinas
>  wrote:
>
>> But I'm definitely going to discourage companies like Qualcomm
>> deliberately ignoring the existing booting protocols while trying to get
>> their code upstream. This patch series is posted by Qualcomm without
>> providing any technical reason on why they don't want to/couldn't use
>> PSCI (well, I guess there is no technical reason but they may not care
>> much about mainline either).
>
> Sure.. just trying to make sure the wrong people don't end up being
> the ones that suffer.  I would assume/expect that it is at least
> possible for qcom to change firmware/bootloader for their dev boards
> and future devices and whatnot, whether they grumble about it or not.
> But I guess most of what the general public has are devices w/ signed
> fw, which is why "go fix your firmware" is an option that sets off
> alarm bells for me.
>
> I guess the first device where this will matter to me and a large
> group of community folks would be the dragonboard 410c..  *hopefully*
> it does not require signed firmware or at least qcom could make
> available signed firmware which supports psci..

For development boards, one would hope there is a way to sign your own
firmware. You can't expect - even for a phone SoC - that the development
boards require entering some kind of Faustian contract for development of
low-level software. What if someone wants to develop a platform that
doesn't require signing?

That said, most of these dev boards have completely mangled JTAG anyway,
and I know Inforce (and Hardkernel, and so on) love their
barely-ever-updated custom firmware binaries, so..

The best thing would be to pick up one of those boards and port a PSCI
firmware to it (ATF or your own..) and just embarrass the SoC vendor by
having better mainline power management support (implemented by 10 lines
in a device tree) with the complicated code hidden away behind the scenes
there, like it should have been done in the first place..

Ta.
Matt Sealey 

Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

2015-04-16 Thread Mateusz Guzik

On Thu, Apr 16, 2015 at 01:55:39PM -0700, Eric Dumazet wrote:
> On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote:
> > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote:
> > > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > > > cur_fdt = files_fdtable(files);
> > > > if (nr >= cur_fdt->max_fds) {
> > > > /* Continue as planned */
> > > > +   write_seqcount_begin(&files->fdt_seqcount);
> > > > copy_fdtable(new_fdt, cur_fdt);
> > > > rcu_assign_pointer(files->fdt, new_fdt);
> > > > +   write_seqcount_end(&files->fdt_seqcount);
> > > > if (cur_fdt != &files->fdtab)
> > > > call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
> > > 
> > > Interesting.  AFAICS, your test doesn't step anywhere near that path,
> > > does it?  So basically you never hit the retries during that...
> > 
> > Right, but then the table is almost never changed for a given process,
> > as we only increase it by power of two steps.
> > 
> > (So I scratch my initial comment, fdt_seqcount is really mostly read)
> 
> I tested Mateusz's patch with my opensock program, which mimics a bit more
> what a server does (having lots of sockets)
> 
> 24 threads running, doing close(randomfd())/socket() calls like crazy.
> 
> Before patch :
> 
> # time ./opensock 
> 
> real  0m10.863s
> user  0m0.954s
> sys   2m43.659s
> 
> 
> After patch :
> 
> # time ./opensock
> 
> real  0m9.750s
> user  0m0.804s
> sys   2m18.034s
> 
> So this is an improvement for sure, but not massive.
> 
> perf record ./opensock ; report
> 
> 87.80%  opensock  [kernel.kallsyms]  [k] _raw_spin_lock
>|--52.70%-- __close_fd
>|--46.41%-- __alloc_fd

My crap benchmark is here: http://people.redhat.com/~mguzik/pipebench.c
(compile with -pthread, run with -s 10 -n 16 for 10 second test + 16
threads)

As noted earlier it tends to go from roughly 300k ops/s to 400k.

The fundamental problem here seems to be this pesky POSIX requirement of
providing the lowest possible fd on each allocation (as a side note
Linux breaks this with parallel fd allocs, where one of these backs off
the reservation, not that I believe this causes trouble).
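The lowest-available-descriptor rule is easy to observe from userspace: after a descriptor is closed, the next allocation in a single-threaded process reuses the lowest free number. A minimal POSIX C sketch (the function name is hypothetical):

```c
#include <fcntl.h>
#include <unistd.h>

/* Open /dev/null twice, close the first descriptor, and return the number the
 * next open() receives (its value is also stored in *closed_fd before closing).
 * Under the POSIX lowest-available rule, with no other thread allocating
 * descriptors concurrently, the two are equal. Returns -1 on error. */
static int reused_fd_after_close(int *closed_fd)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_RDONLY);
    if (b < 0) {
        close(a);
        return -1;
    }

    *closed_fd = a;
    close(a);                           /* 'a' is now the lowest free number */
    int c = open("/dev/null", O_RDONLY);

    close(b);
    if (c >= 0)
        close(c);
    return c;
}
```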

Ideally a process-wide switch could be implemented (e.g.
prctl(SCRATCH_LOWEST_FD_REQ)) which would grant the kernel the freedom
to return any fd it wants, so it would be possible to have fd ranges
per thread and the like.

Having only an O_SCRATCH_POSIX flag passed to syscalls would still leave
close() as a bottleneck.

In the meantime I consider the approach taken in my patch as an ok
temporary improvement.

-- 
Mateusz Guzik

Re: [RFC 1/4] fs: Add generic file system event notifications

2015-04-16 Thread Heinrich Schuchardt

On 15.04.2015 09:15, Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
> 
> The notifications are to be issued through generic
> netlink interface, by a dedicated, for file system
> events, multicast group. The file systems might as
> well use this group to send their own custom messages.
> 
> The events have been split into four base categories:
> information, warnings, errors and threshold notifications,
> with some very basic event types like running out of space
> or file system being remounted as read-only.
> 
> Threshold notifications have been included to allow
> triggering an event whenever the amount of free space
> drops below a certain level - or levels to be more precise
> as two of them are being supported: the lower and the upper
> range. The notifications work both ways: once the threshold
> level has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
> 
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.
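The two-way threshold behaviour described above can be modelled as a small state machine per level: an event fires when free blocks drop below the level, and another fires, re-arming it, when they climb back above. This is a hypothetical C model written from the description alone; it is not the RFC's implementation and ignores the netlink transport.

```c
#include <stdbool.h>
#include <stdint.h>

enum thr_event { THR_NONE, THR_BELOW, THR_ABOVE };

/* One threshold level; 'below' records which side was last reported. */
struct threshold_sketch {
    uint64_t level;
    bool below;
};

/* Feed a new free-block count and report a crossing in either direction. */
static enum thr_event threshold_update(struct threshold_sketch *t,
                                       uint64_t free_blocks)
{
    if (!t->below && free_blocks < t->level) {
        t->below = true;
        return THR_BELOW;       /* dropped under the level */
    }
    if (t->below && free_blocks >= t->level) {
        t->below = false;
        return THR_ABOVE;       /* recovered: threshold re-armed */
    }
    return THR_NONE;
}
```

A lower and an upper level would simply be two such objects fed the same count.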

Having a framework for notification for file systems is a great idea.
Your solution covers an important part of the possible application scope.

Before moving forward I suggest we should analyze if this scope should
be enlarged.

Many filesystems are remote (e.g. CIFS/Samba) or distributed over many
network nodes (e.g. Lustre). How should file system notification work here?

How will fuse file systems be served?

The current point of reference is a single mount point.
Every time I insert a USB stick, several file systems may be automounted.
I would like to receive events for these automounted file systems.

A similar case arises when starting new virtual machines. How will I
receive events on the host system for the file systems of the virtual
machines?

In your implementation events are received via Netlink.
Using Netlink for marking mounts for notification would create a much
more homogeneous interface. So why should we use a virtual file system here?

Best regards

Heinrich Schuchardt


[PATCH] ARM: dts: dra7: Fix efuse register size for ABB

2015-04-16 Thread Nishanth Menon

Fix a typo in the DRA7 dtsi: 12 bytes are needed to describe the ABB
efuse registers, but only 8 bytes are mapped. For some reason, this
does not generate an abort at offset 0x8, probably because io.c already
provides default maps for the bus register ranges.

Reported-by: Matt Gessner 
Reported-by: Suman Anna 
Signed-off-by: Nishanth Menon 
---
 arch/arm/boot/dts/dra7.dtsi |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index 5332b57b4950..92b8bb873f43 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -911,7 +911,7 @@
ti,clock-cycles = <16>;
 
reg = <0x4ae07ddc 0x4>, <0x4ae07de0 0x4>,
- <0x4ae06014 0x4>, <0x4a003b20 0x8>,
+ <0x4ae06014 0x4>, <0x4a003b20 0xc>,
  <0x4ae0c158 0x4>;
reg-names = "setup-address", "control-address",
"int-address", "efuse-address",
@@ -944,7 +944,7 @@
ti,clock-cycles = <16>;
 
reg = <0x4ae07e34 0x4>, <0x4ae07e24 0x4>,
- <0x4ae06010 0x4>, <0x4a0025cc 0x8>,
+ <0x4ae06010 0x4>, <0x4a0025cc 0xc>,
  <0x4a002470 0x4>;
reg-names = "setup-address", "control-address",
"int-address", "efuse-address",
@@ -977,7 +977,7 @@
ti,clock-cycles = <16>;
 
reg = <0x4ae07e30 0x4>, <0x4ae07e20 0x4>,
- <0x4ae06010 0x4>, <0x4a0025e0 0x8>,
+ <0x4ae06010 0x4>, <0x4a0025e0 0xc>,
  <0x4a00246c 0x4>;
reg-names = "setup-address", "control-address",
"int-address", "efuse-address",
@@ -1010,7 +1010,7 @@
ti,clock-cycles = <16>;
 
reg = <0x4ae07de4 0x4>, <0x4ae07de8 0x4>,
- <0x4ae06010 0x4>, <0x4a003b08 0x8>,
+ <0x4ae06010 0x4>, <0x4a003b08 0xc>,
  <0x4ae0c154 0x4>;
reg-names = "setup-address", "control-address",
"int-address", "efuse-address",
-- 
1.7.9.5
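The size arithmetic behind this fix, assuming (as the remark about offset 0x8 suggests) 32-bit accesses up to offset 0x8: the last access ends at 0x8 + 4 = 0xc bytes, so a 0x8-byte window covers only the first two registers. A trivial C bounds check (helper name hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of 'width' bytes at 'offset' fits in a window of 'size'
 * bytes. Written to avoid unsigned overflow in offset + width. */
static bool access_in_window(uint32_t size, uint32_t offset, uint32_t width)
{
    return offset <= size && width <= size - offset;
}
```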


Re: [RFC PATCH 0/5] Add smp booting support for Qualcomm ARMv8 SoCs

2015-04-16 Thread Catalin Marinas

On Thu, Apr 16, 2015 at 01:17:32PM -0400, Rob Clark wrote:
> On Thu, Apr 16, 2015 at 11:21 AM, Catalin Marinas
>  wrote:
> > On Wed, Apr 15, 2015 at 11:01:17AM -0400, Rob Clark wrote:
> >> There are folks who are working to get saner, more-upstream kernels
> >> working on devices.. and improving kernel infrastructure for
> >> device-needs (well, in my neck of the woods, there is drm/kms atomic
> >> and dsi/panel framework stuff.. I'm sure other similar things in other
> >> kernel domains).  And it seems like that is a good thing to encourage,
> >> rather than stymie.
> >
> > I'm not looking to discourage individuals trying to get upstream support
> > for older boards. Should the need arise, we'll look at options which may
> > or may not include kernel changes (e.g. wrap the kernel in a shim).
> 
> I had wondered about that, but afaiu psci is a smc interface, and I
> would assume bootloader drops you out of secure mode before entering
> the kernel or whatever comes after first stage bootloader if you are
> chain-loading?

One option is to run the shim at EL2 (hyp mode) which can intercept the
SMC calls from EL1. Another option is to release all the CPUs from
firmware into the shim and use a spin-table enable method.
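The spin-table enable method works roughly like this: each secondary CPU spins on a release address until the boot CPU writes an entry point there, then jumps to it. Below is a loose userspace model using C11 atomics, with a pthread standing in for the secondary CPU. All names are hypothetical. Real firmware and kernels use a physical cpu-release-addr, cache maintenance and WFE/SEV rather than a busy loop on a variable.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*entry_fn)(void);

/* Models the release address: NULL until the boot CPU releases the secondary. */
static _Atomic(entry_fn) release_addr;
static atomic_int secondary_online;

static void secondary_entry(void)
{
    atomic_store(&secondary_online, 1);
}

/* Secondary "CPU": spin until an entry point is published, then jump to it. */
static void *secondary_cpu(void *arg)
{
    entry_fn fn;

    (void)arg;
    while ((fn = atomic_load_explicit(&release_addr, memory_order_acquire)) == NULL)
        ;   /* real hardware would sleep in WFE here */
    fn();
    return NULL;
}

/* Boot CPU: publishing the entry point releases the spinning secondary. */
static void boot_secondary(void)
{
    atomic_store_explicit(&release_addr, secondary_entry, memory_order_release);
}
```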

-- 
Catalin

[for-next][PATCH] tracing: Fix incorrect enabling of trace events by boot cmdline

2015-04-16 Thread Steven Rostedt

Another small fix that I'll push in this merge window.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git for-next

Head SHA1: 84fce9db4d7eaebd6cb2ee30c15da6d4e4daf846


Joonsoo Kim (1):
  tracing: Fix incorrect enabling of trace events by boot cmdline


 kernel/trace/trace_events.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)
---
commit 84fce9db4d7eaebd6cb2ee30c15da6d4e4daf846
Author: Joonsoo Kim 
Date:   Thu Apr 16 13:44:44 2015 +0900

tracing: Fix incorrect enabling of trace events by boot cmdline

Trace events are not properly enabled via the boot cmdline. If we pass
"trace_event=kmem:mm_page_alloc" to the boot cmdline, it enables all
kmem trace events, not just the mm_page_alloc event.

This is caused by the parsing mechanism. When we parse the cmdline, the
buffer contents are modified by tokenization, so if we use this buffer
again, we get the wrong result.

Unfortunately, this buffer is accessed three times to set trace events
properly at boot time, so we need to handle this situation.

There is already code handling ",", but we need another for ":".
This patch adds it.

Link: http://lkml.kernel.org/r/1429159484-22977-1-git-send-email-iamjoonsoo@lge.com

Cc: sta...@vger.kernel.org # 3.19+
Signed-off-by: Joonsoo Kim 
[ added missing return ret; ]
Signed-off-by: Steven Rostedt 

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index a576bbe75577..36a957c996c7 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -565,6 +565,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
 static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
+   int ret;
 
/*
 * The buf format can be :
@@ -590,7 +591,13 @@ static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 			event = NULL;
 	}
 
-	return __ftrace_set_clr_event(tr, match, sub, event, set);
+	ret = __ftrace_set_clr_event(tr, match, sub, event, set);
+
+	/* Put back the colon to allow this to be called again */
+	if (buf)
+		*(buf - 1) = ':';
+
+	return ret;
 }
 
 /**

Re: [alsa-devel] [PATCH -next] ASoC: Intel: sst_hsw: remove kfree for memory allocated with devm_kzalloc

2015-04-16 Thread Jin, Yao

For HSW, a patch "[PATCH] ASoC: Intel: Remove invalid kfree of devm
allocated data" to fix this issue has been applied.

But yes, we also need a similar patch for Baytrail.

Thanks
Jin Yao

On 2015/4/16 22:08, Jarkko Nikula wrote:
> On 04/16/2015 04:46 PM, weiyj...@163.com wrote:
>> From: Wei Yongjun 
>>
>> It's not necessary to free memory allocated with devm_kzalloc
>> and using kfree leads to a double free.
>>
>> Signed-off-by: Wei Yongjun 
>> ---
>>   sound/soc/intel/haswell/sst-haswell-ipc.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/sound/soc/intel/haswell/sst-haswell-ipc.c
>> b/sound/soc/intel/haswell/sst-haswell-ipc.c
>> index 344a1e9..324eceb 100644
>> --- a/sound/soc/intel/haswell/sst-haswell-ipc.c
>> +++ b/sound/soc/intel/haswell/sst-haswell-ipc.c
>> @@ -2201,7 +2201,6 @@ dma_err:
>>   dsp_new_err:
>>   sst_ipc_fini(ipc);
>>   ipc_init_err:
>> -kfree(hsw);
>>   return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(sst_hsw_dsp_init);
>>
> Similar case to the one with Baytrail. Here it was introduced by commit
> 0e7921e9583b ("ASoC: Intel: Use the generic IPC/mailbox APIs in Broadwell")
> 
> Reviewed-by: Jarkko Nikula 

Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck

On Thu, Apr 16, 2015 at 02:38:19PM -0400, Vivien Didelot wrote:
> A switch driver may provide only one of the temperature limit accessors,
> or only the temperature alarm getter. So register with the hwmon subsystem
> if any of the related functions is provided.
> 
> Also check get_temp to set the visibility of temp1_input.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/dsa.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 67d2983..6b68994 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -157,6 +157,10 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject *kobj,
>   umode_t mode = 0;
>  
>   switch (index) {
> + case 0: /* temp1_input */
> + if (drv->get_temp)
> + mode |= S_IRUGO;

This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real-world example where a chip would support setting
a limit but not reading it?

Thanks,
Guenter

> + break;
>   case 1: /* temp1_max */
>   if (drv->get_temp_limit)
>   mode |= S_IRUGO;
> @@ -310,7 +314,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
>* register with hardware monitoring subsystem.
>* Treat registration error as non-fatal and ignore it.
>*/
> - if (drv->get_temp) {
> + if (drv->get_temp || drv->get_temp_limit || drv->set_temp_limit ||
> + drv->get_temp_alarm) {
>   const char *netname = netdev_name(dst->master_netdev);
>   char hname[IFNAMSIZ + 1];
>   int i, j;
> -- 
> 2.3.5
> 
