date:20140814

Re: [PATCH RFC v4 net-next 23/26] samples: bpf: elf file loader

2014-08-14 Thread Alexei Starovoitov

On Thu, Aug 14, 2014 at 12:29 PM, Brendan Gregg
 wrote:
> On Wed, Aug 13, 2014 at 12:57 AM, Alexei Starovoitov  
> wrote:
> [...]
>> +static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
>> +{
>> +   int fd, event_fd, err;
>> +   char fmt[32];
>> +   char path[256] = DEBUGFS;
>> +
>> +   fd = bpf_prog_load(BPF_PROG_TYPE_TRACING_FILTER, prog, size, license);
>> +
>> +   if (fd < 0) {
>> +           printf("err %d errno %d\n", fd, errno);
>> +           return fd;
>> +   }
>
> Minor suggestion: since this is sample code, I'd always print the bpf
> log after this printf() error message:
>
> printf("%s", bpf_log_buf);
>
> Which has helped me debug my eBPF programs, as will be the case for
> anyone hacking on the examples.

Good point. Will do in V5.

> Or have a function for logdie(), if
> the log buffer may be populated with useful messages from other error
> paths as well.

This log buffer is an optional buffer that the eBPF verifier uses to
store its messages, mainly so that humans can understand why the verifier
rejected the program. It's also used by the verifier testsuite to check
that the reject reason actually matches the test intent.
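
As a minimal sketch of the suggestion above, the error path could format the
return value, errno, and the verifier log together. format_load_error() is a
hypothetical helper that is not part of the posted patch, and the log text is
passed in as a plain string rather than read from the sample's bpf_log_buf:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in the posted patch): writes the
 * "err %d errno %d" line used by load_and_attach() into out and
 * appends the verifier log when it is non-empty.
 * Returns the number of characters written, or -1 if out is too small.
 */
static int format_load_error(char *out, size_t outlen, int fd, int err,
			     const char *log_buf)
{
	int n = snprintf(out, outlen, "err %d errno %d\n", fd, err);

	if (n < 0 || (size_t)n >= outlen)
		return -1;

	if (log_buf && log_buf[0] != '\0') {
		int m = snprintf(out + n, outlen - (size_t)n, "%s", log_buf);

		if (m < 0 || (size_t)m >= outlen - (size_t)n)
			return -1;
		n += m;
	}
	return n;
}
```

Keeping the formatting in one place would also let other error paths reuse it,
which is roughly what the logdie() idea asks for.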
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-14 Thread Mattis Lorentzon

Fabio,

> Do the stalls also happen on a pure 3.16 kernel?

Yes, we just tried this out overnight and we get the same stalls here.
We have seen similar problems on a Zynq-based board. It might be
worth noting that a common chip between all three boards is, for
example, the KSZ9021RN, while the FEC driver, for example, only
runs on the two iMX6-boards.

> How can we reproduce the error?

We mostly run SSH with benchmarks using NFS; it can probably be
triggered by using only SSH with the following loop:

# while : ; do ssh arm-card date; done

Our (pure) 3.16 kernel uses the following config.
http://lkml.iu.edu/hypermail/linux/kernel/1408.1/03045/config.gz

(We have quite generously disabled a lot of sub-systems in our config.)

Best regards,
Mattis Lorentzon


[PATCH V2] I2C: Rework kernel config I2C_ACPI

2014-08-14 Thread Lan Tianyu

Commit da3c6647 (I2C/ACPI: Clean up I2C ACPI code and Add CONFIG_I2C_ACPI
config) added a new kernel config, I2C_ACPI, and made the I2C core built-in
when that config is selected. This is wrong because distributions
generally compile I2C as a module, and the commit broke that.
This patch renames I2C_ACPI to ACPI_I2C_OPREGION. The new config
only controls the ACPI I2C operation region code and depends on I2C=y.

Signed-off-by: Lan Tianyu 
---
 drivers/i2c/Kconfig    | 20 +++-
 drivers/i2c/Makefile   |  2 +-
 drivers/i2c/i2c-acpi.c |  2 ++
 include/linux/i2c.h    | 12 
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/i2c/Kconfig b/drivers/i2c/Kconfig
index 3e3b680..f0937e5 100644
--- a/drivers/i2c/Kconfig
+++ b/drivers/i2c/Kconfig
@@ -2,9 +2,7 @@
 # I2C subsystem configuration
 #
 
-menu "I2C support"
-
-config I2C
+menuconfig I2C
tristate "I2C support"
select RT_MUTEXES
---help---
@@ -23,17 +21,14 @@ config I2C
  This I2C support can also be built as a module.  If so, the module
  will be called i2c-core.
 
-config I2C_ACPI
-   bool "I2C ACPI support"
-   select I2C
-   depends on ACPI
+config ACPI_I2C_OPREGION
+   bool "ACPI I2C Operation region support"
+   depends on I2C=y && ACPI
default y
help
- Say Y here if you want to enable ACPI I2C support. This includes support
- for automatic enumeration of I2C slave devices and support for ACPI I2C
- Operation Regions. Operation Regions allow firmware (BIOS) code to
- access I2C slave devices, such as smart batteries through an I2C host
- controller driver.
+ Say Y here if you want to enable ACPI I2C operation region support.
+ Operation Regions allow firmware (BIOS) code to access I2C slave devices,
+ such as smart batteries through an I2C host controller driver.
 
 if I2C
 
@@ -139,4 +134,3 @@ config I2C_DEBUG_BUS
 
 endif # I2C
 
-endmenu
diff --git a/drivers/i2c/Makefile b/drivers/i2c/Makefile
index a1f590c..e0228b2 100644
--- a/drivers/i2c/Makefile
+++ b/drivers/i2c/Makefile
@@ -3,7 +3,7 @@
 #
 
 i2ccore-y := i2c-core.o
-i2ccore-$(CONFIG_I2C_ACPI) += i2c-acpi.o
+i2ccore-$(CONFIG_ACPI) += i2c-acpi.o
 
 obj-$(CONFIG_I2C_BOARDINFO)+= i2c-boardinfo.o
 obj-$(CONFIG_I2C)  += i2ccore.o
diff --git a/drivers/i2c/i2c-acpi.c b/drivers/i2c/i2c-acpi.c
index e8b6196..0dbc18c 100644
--- a/drivers/i2c/i2c-acpi.c
+++ b/drivers/i2c/i2c-acpi.c
@@ -126,6 +126,7 @@ void acpi_i2c_register_devices(struct i2c_adapter *adap)
dev_warn(&adap->dev, "failed to enumerate I2C slaves\n");
 }
 
+#ifdef CONFIG_ACPI_I2C_OPREGION
 static int acpi_gsb_i2c_read_bytes(struct i2c_client *client,
u8 cmd, u8 *data, u8 data_len)
 {
@@ -360,3 +361,4 @@ void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter)
 
acpi_bus_detach_private_data(handle);
 }
+#endif
diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index ea50766..a95efeb 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -577,16 +577,20 @@ static inline struct i2c_adapter *of_find_i2c_adapter_by_node(struct device_node
 }
 #endif /* CONFIG_OF */
 
-#ifdef CONFIG_I2C_ACPI
-int acpi_i2c_install_space_handler(struct i2c_adapter *adapter);
-void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter);
+#ifdef CONFIG_ACPI
 void acpi_i2c_register_devices(struct i2c_adapter *adap);
 #else
 static inline void acpi_i2c_register_devices(struct i2c_adapter *adap) { }
+#endif /* CONFIG_ACPI */
+
+#ifdef CONFIG_ACPI_I2C_OPREGION
+int acpi_i2c_install_space_handler(struct i2c_adapter *adapter);
+void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter);
+#else
 static inline void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter)
 { }
 static inline int acpi_i2c_install_space_handler(struct i2c_adapter *adapter)
 { return 0; }
-#endif
+#endif /* CONFIG_ACPI_I2C_OPREGION */
 
 #endif /* _LINUX_I2C_H */
-- 
1.8.4.rc0.1.g8f6a3e5.dirty
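
The header change in this patch follows the usual kernel stub pattern: when the
option is off, callers still compile against static inline no-ops. A user-space
sketch of that pattern follows. struct i2c_adapter is reduced to a stand-in,
demo_register_adapter() is a hypothetical caller, and CONFIG_ACPI_I2C_OPREGION
is deliberately left undefined so that the stubs are selected:

```c
/* Stand-in for the kernel structure; only here to make the sketch compile. */
struct i2c_adapter {
	const char *name;
};

/* CONFIG_ACPI_I2C_OPREGION is intentionally not defined in this sketch. */
#ifdef CONFIG_ACPI_I2C_OPREGION
int acpi_i2c_install_space_handler(struct i2c_adapter *adapter);
void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter);
#else
/* Stubs: the calls compile away when operation region support is off. */
static inline int acpi_i2c_install_space_handler(struct i2c_adapter *adapter)
{
	(void)adapter;
	return 0;
}

static inline void acpi_i2c_remove_space_handler(struct i2c_adapter *adapter)
{
	(void)adapter;
}
#endif /* CONFIG_ACPI_I2C_OPREGION */

/* Hypothetical caller: needs no #ifdef of its own around the ACPI hooks. */
static int demo_register_adapter(struct i2c_adapter *adap)
{
	int ret = acpi_i2c_install_space_handler(adap);

	if (ret)
		return ret;

	/* ... the rest of adapter registration would go here ... */

	acpi_i2c_remove_space_handler(adap);
	return 0;
}
```

The point of the split in the patch is that device enumeration is guarded by
CONFIG_ACPI alone, while only the operation region handlers depend on the new
option.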


Re: [PATCH 1/1] mmc: core: Use regulator_get_voltage() if OCR mask is empty.

2014-08-14 Thread Tim Kryger

On Thu, Aug 14, 2014 at 8:19 AM, Mark Brown  wrote:

> Right, there's two things going on here.  One is that as you describe we
> shouldn't be putting constraints in .dtsi files if we don't know they're
> OK for a given board.  The other thing is that on this particular board
> it turns out that there's no support for varying the voltages at all so
> it doesn't make sense to have to specify a range, there's only one value
> anyway so the software really should be able to figure out that fixed
> value all by itself.

If constraints are truly irrelevant when the voltage supplied to
consumers is fixed, why doesn't regulator_list_voltage honor this
exemption and skip the voltage filtering that uses (potentially
unspecified) constraints when output is entirely determined by a
parent (or grandparent) supply that can't change its voltage?

It seems odd to make callers be the ones to handle this subtlety.

Thanks,
Tim Kryger

Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups

2014-08-14 Thread Amit Shah

On (Wed) 13 Aug 2014 [06:00:49], Paul E. McKenney wrote:
> On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> > On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > I know of only virtio-console doing this (via userspace only,
> > > > > > though).
> > > > > 
> > > > > As in userspace within the guest?  That would not work.  The userspace
> > > > > that the qemu is running in might.  There is a way to extract ftrace info
> > > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > > pull the buffer from the resulting dump.  For all I know, there might also
> > > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > > > 
> > > > > Again, I cannot reproduce this, and I have been through the code several
> > > > > times over the past few days, and am not seeing it.  I could start
> > > > > sending you random diagnostic patches, but it would be much better if
> > > > > we could get the trace data from the failure.
> > 
> > I think the only recourse I now have is to dump the guest state from
> > qemu, and attempt to find the ftrace buffers by poking pages and
> > finding some ftrace-like struct... and then dumping the buffers.
> 
> The data exists in the qemu guest state, so it would be good to have
> it one way or another.  My current (perhaps self-serving) guess is that
> you have come up with a way to trick qemu into dropping IPIs.

I didn't get around to doing this yet; will get to it next week.

In the meantime, I tried this on RHEL6 (with RHEL6 qemu and gcc and
seabios), and that exhibits the problem similarly with my .config.



> > > +
> > >   return true;
> > 
> > I have return 1; here.
> > 
> > I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.
> 
> I am working on top of my -rcu tree, which contains the fix from "1" to
> "true" compared to current mainline.  So this will resolve itself, and
> you should be OK fixing up conflict in either direction.

Yep, I did do that.  Just noted here that the hunk didn't directly
apply.

Thanks,

Amit

Re: mm: compaction: buffer overflow in isolate_migratepages_range

2014-08-14 Thread Rafael Aquini

On Fri, Aug 15, 2014 at 07:36:16AM +0400, Konstantin Khlebnikov wrote:
> Don't hurry. The code has been in this state for years.
> I'm working on patches for this, if everything goes well I'll show it today.
> As usual I couldn't stop myself from cleaning the mess, so it will be
> bigger than yours.
>
Sorry,

I didn't see this reply of yours before sending out an adjusted-and-tested
version of that patch and asking Sasha to check it against his test case.

Please do not hesitate to share your ideas for changes, though. I'd really
appreciate your feedback on that code.

Cheers,
-- Rafael

Re: [PATCH RFC] time,signal: protect resource use statistics with seqlock

2014-08-14 Thread Mike Galbraith

On Thu, 2014-08-14 at 19:48 +0200, Oleg Nesterov wrote: 
> On 08/14, Oleg Nesterov wrote:
> >
> > OK, lets forget about alternative approach for now. We can reconsider
> > it later. At least I have to admit that seqlock is more straightforward.
> 
> Yes.
> 
> But just for record, the "lockless" version doesn't look that bad to me,
> 
>   void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>   {
>           struct signal_struct *sig = tsk->signal;
>           bool lockless, is_dead;
>           struct task_struct *t;
>           unsigned long flags;
>           u64 exec;
> 
>           lockless = true;
>           is_dead = !lock_task_sighand(p, &flags);
>    retry:
>           times->utime = sig->utime;
>           times->stime = sig->stime;
>           times->sum_exec_runtime = exec = sig->sum_sched_runtime;
>           if (is_dead)
>                   return;
> 
>           if (lockless)
>                   unlock_task_sighand(p, &flags);
> 
>           rcu_read_lock();
>           for_each_thread(tsk, t) {
>                   cputime_t utime, stime;
>                   task_cputime(t, &utime, &stime);
>                   times->utime += utime;
>                   times->stime += stime;
>                   times->sum_exec_runtime += task_sched_runtime(t);
>           }
>           rcu_read_unlock();
> 
>           if (lockless) {
>                   lockless = false;
>                   is_dead = !lock_task_sighand(p, &flags);
>                   if (is_dead || exec != sig->sum_sched_runtime)
>                           goto retry;
>           }
>           unlock_task_sighand(p, &flags);
>   }
> 
> The obvious problem is that we should shift lock_task_sighand() from the
> callers to thread_group_cputime() first, or add
> thread_group_cputime_lockless() and change the current users one by one.
> 
> And of course, stats_lock is more generic.

Yours looks nice to me, particularly in that it doesn't munge structure
layout, could perhaps be backported to fix up production kernels.

For the N threads doing this on N cores case, seems rq->lock hammering
will still be a source of major box wide pain.  Is there any correctness
reason to add up unaccounted ->on_cpu beans, or is that just value
added?  Seems to me it can't matter, as you traverse, what you added up
on previous threads becomes ever more stale as you proceed, so big boxen
would be better off not doing that.

-Mike
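
For comparison with the lockless variant quoted above, the seqlock idea named in
the subject line can be sketched in user space with C11 atomics. This is only an
illustration under stated assumptions: a single writer, sequentially consistent
atomics, and hypothetical names (struct cputime_stats, stats_add(),
stats_read()) that do not come from the patch under discussion:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the statistics a seqlock would protect. */
struct cputime_stats {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t utime;
	_Atomic uint64_t stime;
};

/* Writer side; this sketch assumes there is only one writer at a time. */
static void stats_add(struct cputime_stats *s, uint64_t utime, uint64_t stime)
{
	atomic_fetch_add(&s->seq, 1);        /* sequence becomes odd */
	atomic_fetch_add(&s->utime, utime);
	atomic_fetch_add(&s->stime, stime);
	atomic_fetch_add(&s->seq, 1);        /* sequence is even again */
}

/* Reader side: retry until a snapshot was taken with no update in between. */
static void stats_read(struct cputime_stats *s, uint64_t *utime,
		       uint64_t *stime)
{
	unsigned int start;

	do {
		start = atomic_load(&s->seq);
		*utime = atomic_load(&s->utime);
		*stime = atomic_load(&s->stime);
	} while ((start & 1) || atomic_load(&s->seq) != start);
}
```

Readers never block the writer here; the cost is that a reader may loop while
updates are in flight.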


[PULL] virtio-rng: add derating factor for use by hwrng core

2014-08-14 Thread Amit Shah

Hi Linus,

Sending directly to you with the commit log changes Ted Ts'o pointed
out.  Not sure if Rusty's back after his travel, but this already has
his s-o-b.

Please pull.

The following changes since commit c9d26423e56ce1ab4d786f92aebecf859d419293:

  Merge tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2014-08-14 18:13:46 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio.git rng-queue

for you to fetch changes up to 34679ec7a0c45da8161507e1f2e1f72749dfd85c:

  virtio: rng: add derating factor for use by hwrng core (2014-08-15 10:26:01 +0530)



Amit Shah (1):
  virtio: rng: add derating factor for use by hwrng core

 drivers/char/hw_random/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

-- 
1.9.3


[PATCH v3 1/1] virtio: rng: add derating factor for use by hwrng core

2014-08-14 Thread Amit Shah

The khwrngd thread is started when a hwrng device of sufficient
quality is registered.  The virtio-rng device is backed by the
hypervisor, and we trust the hypervisor to provide real entropy.

A malicious or badly-implemented hypervisor is a scenario that's
irrelevant -- such a setup is bound to cause all sorts of badness, and a
compromised hwrng is the least of the user's worries.

Given this, we might as well assume that the quality of randomness we
receive is perfectly trustworthy.  Hence, we use 100% for the factor,
indicating maximum confidence in the source.

Signed-off-by: Amit Shah 
Reviewed-by: H. Peter Anvin 
Reviewed-by: Amos Kong 
Signed-off-by: Rusty Russell 

---
Pretty small and contained patch; would be great if it is picked up for
3.17.

v2: re-word commit msg (hpa)
v3: re-word commit msg (tytso)
---
 drivers/char/hw_random/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 0027137..2e3139e 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -116,6 +116,7 @@ static int probe_common(struct virtio_device *vdev)
.cleanup = virtio_cleanup,
.priv = (unsigned long)vi,
.name = vi->name,
+   .quality = 1000,
};
vdev->priv = vi;
 
-- 
1.9.3
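
To make the effect of a derating factor concrete, here is a small arithmetic
sketch. It assumes the hwrng core credits entropy as bytes * 8 * quality / 1024;
that scale is an assumption of this sketch and is not stated in the patch above,
and credited_entropy_bits() is a hypothetical helper:

```c
#include <stddef.h>

/*
 * Hypothetical helper: entropy bits credited for a read of `bytes` bytes,
 * assuming quality is expressed per 1024 bits of input (sketch only).
 */
static size_t credited_entropy_bits(size_t bytes, unsigned short quality)
{
	return (bytes * 8 * quality) >> 10;
}
```

Under that assumed scale, a 64-byte read with quality 1000 would be credited
with 500 of its 512 bits, and a quality of 0 would credit nothing.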


Re: mm: compaction: buffer overflow in isolate_migratepages_range

2014-08-14 Thread Rafael Aquini

Here's a potential final version of the patch mentioned in an earlier message.
The nitpick I raised to myself and a couple of other minor typing issues
are fixed.

I did a preliminary test round in a KVM guest, ballooning memory in and out in
chunks of 1GB while a script within the guest was forcing compaction
concurrently; everything looked alright.

Sasha, could you give this a try to see if that reported KASAN warning
fades away, please?

Cheers,
-- Rafael

---8<---
From: Rafael Aquini 
Subject: [PATCH v2] mm: balloon_compaction: enhance balloon_page_movable()
 checkpoint against races

While testing linux-next for the Kernel Address Sanitizer patchset (KASAN),
Sasha Levin reported a buffer overflow warning triggered for
isolate_migratepages_range(), which was later found to be caused by
a condition where balloon_page_movable() raced against move_to_new_page()
while the latter was copying the page->mapping of an anon page.

Because we can perform balloon_page_movable() in a lockless fashion at
isolate_migratepages_range(), the discovered race has revealed that the scheme
actually used to spot ballooned pages among page blocks, which checks
page_flags_cleared() and dereferences page->mapping to check its mapping flags,
is weak and potentially prone to stumbling across other similar conditions
in the future.

Following Konstantin Khlebnikov's and Andrey Ryabinin's suggestions,
this patch replaces the old page->flags && mapping->flags checking scheme
with a simpler and stronger page->_mapcount read-and-compare test.
Similarly to what is done for PageBuddy() checks, BALLOON_PAGE_MAPCOUNT_VALUE
is introduced here to mark balloon pages. This allows balloon_page_movable()
to skip the proven troublesome dereference of page->mapping for flag checking
during the lockless rounds of isolate_migratepages_range().
The page->mapping dereference and flag checking will be performed later, when
all locks are held properly.
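
The sentinel idea described above can be sketched in user space as follows. This
is an illustration only: struct page is reduced to a single counter, and
balloon_page_set() uses a compare-and-exchange in place of the patch's
VM_BUG_ON_PAGE() plus atomic_set(), which is a substitution made for this sketch:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BALLOON_PAGE_MAPCOUNT_VALUE (-256)

/* Stand-in for struct page; -1 means "not mapped", as in the kernel. */
struct page {
	atomic_int _mapcount;
};

/* Lockless test: one atomic read, no page->mapping dereference. */
static bool balloon_page_movable(struct page *page)
{
	return atomic_load(&page->_mapcount) == BALLOON_PAGE_MAPCOUNT_VALUE;
}

/* Mark a page as ballooned; fails if the page is not in the unmapped state. */
static bool balloon_page_set(struct page *page)
{
	int expected = -1;

	return atomic_compare_exchange_strong(&page->_mapcount, &expected,
					      BALLOON_PAGE_MAPCOUNT_VALUE);
}

/* Return the page to the ordinary unmapped state. */
static void balloon_page_clear(struct page *page)
{
	atomic_store(&page->_mapcount, -1);
}
```

Because the check is a single read of one word, a concurrent update to other
fields of the page cannot make the reader follow a stale pointer.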

Signed-off-by: Rafael Aquini 
---
 include/linux/balloon_compaction.h | 61 +++---
 mm/balloon_compaction.c            | 59 ++--
 2 files changed, 54 insertions(+), 66 deletions(-)

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 089743a..e00d5b0 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -108,54 +108,29 @@ static inline void balloon_mapping_free(struct address_space *balloon_mapping)
 }
 
 /*
- * page_flags_cleared - helper to perform balloon @page ->flags tests.
+ * balloon_page_movable - identify balloon pages that can be moved by
+ *   compaction / migration.
  *
- * As balloon pages are obtained from buddy and we do not play with page->flags
- * at driver level (exception made when we get the page lock for compaction),
- * we can safely identify a ballooned page by checking if the
- * PAGE_FLAGS_CHECK_AT_PREP page->flags are all cleared.  This approach also
- * helps us skip ballooned pages that are locked for compaction or release, thus
- * mitigating their racy check at balloon_page_movable()
+ * BALLOON_PAGE_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine BALLOON_PAGE_MAPCOUNT_VALUE.
  */
-static inline bool page_flags_cleared(struct page *page)
+#define BALLOON_PAGE_MAPCOUNT_VALUE (-256)
+static inline bool balloon_page_movable(struct page *page)
 {
-   return !(page->flags & PAGE_FLAGS_CHECK_AT_PREP);
+   return atomic_read(&page->_mapcount) == BALLOON_PAGE_MAPCOUNT_VALUE;
 }
 
-/*
- * __is_movable_balloon_page - helper to perform @page mapping->flags tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
+static inline void __balloon_page_set(struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   return mapping_balloon(mapping);
+   VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+   atomic_set(&page->_mapcount, BALLOON_PAGE_MAPCOUNT_VALUE);
 }
 
-/*
- * balloon_page_movable - test page->mapping->flags to identify balloon pages
- *   that can be moved by compaction/migration.
- *
- * This function is used at core compaction's page isolation scheme, therefore
- * most pages exposed to it are not enlisted as balloon pages and so, to avoid
- * undesired side effects like racing against __free_pages(), we cannot afford
- * holding the page locked while testing page->mapping->flags here.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, the page->mapping->flags need to be re-tested later,
- * under the proper page lock, at the functions that will be coping with the
- * balloon page case.
- */
-static inline bool balloon_page_movable(struct page *page)
+static inline void __balloon_page_clear(struct page *page)
 {
-   /*
-* Before dereferencing and testing mapping->flags,

Re: [PATCH/RFC v4 00/21] LED / flash API integration

2014-08-14 Thread Sakari Ailus

On Thu, Aug 14, 2014 at 12:35:05PM +0200, Jacek Anaszewski wrote:
> On 08/14/2014 07:03 AM, Sakari Ailus wrote:
> >Hi Jacek,
> >
> >On Thu, Aug 07, 2014 at 10:21:14AM +0200, Jacek Anaszewski wrote:
> >>On 08/06/2014 08:53 AM, Sakari Ailus wrote:
> >>>Hi Jacek,
> >>>
> >>>On Fri, Jul 11, 2014 at 04:04:03PM +0200, Jacek Anaszewski wrote:
> >>>...
> 1) Who should register V4L2 Flash sub-device?
> 
> LED Flash Class devices, after introduction of the Flash Manager,
> are not tightly coupled with any media controller. They are maintained
> by the Flash Manager and made available for dynamic assignment to
> any media system they are connected to through multiplexing devices.
> 
> In the proposed rough solution, when support for V4L2 Flash sub-devices
> is enabled, there is a v4l2_device created for them to register in.
> This however implies that V4L2 Flash device will not be available
> in any media controller, which calls its existence into question.
> 
> Therefore I'd like to consult possible ways of solving this issue.
> The option I see is implementing a mechanism for moving V4L2 Flash
> sub-devices between media controllers. A V4L2 Flash sub-device
> would initially be assigned to one media system in the relevant
> device tree binding, but it could be dynamically reassigned to
> the other one. However I'm not sure if media controller design
> is prepared for dynamic modifications of its graph and how many
> modifications in the existing drivers this solution would require.
> >>>
> >>>Do you have a use case where you would need to strobe a flash from multiple
> >>>media devices at different times, or is this entirely theoretical?
> >>>Typically flash controllers are connected to a single source of hardware
> >>>strobe (if there's one) since the flash LEDs are in fact mounted next to a
> >>>specific camera sensor.
> >>
> >>I took into account such arrangements in response to your message
> >>[1], where you were considering configurations like "one flash but
> >>two cameras", "one camera and two flashes". And you also called for
> >>proposing a generic solution.
> >>
> >>One flash and two (or more) cameras case is easily conceivable -
> >>You even mentioned stereo cameras. One camera and many flashes
> >>arrangement might be useful in case of some professional devices which
> >>might be designed so that they would be able to apply different scene
> >>lighting. I haven't heard about such devices, but as you said
> >>such a configuration isn't unthinkable.
> >>
> >>>If this is a real issue the way to solve it would be to have a single media
> >>>device instead of many.
> >>
> >>I was considering adding media device, that would be a representation
> >>of a flash manager, gathering all the registered flashes. Nonetheless,
> >>finally I came to conclusion that a v4l2-device alone should suffice,
> >>just to provide a Flash Manager representation allowing for
> >>v4l2-flash sub-devices to register in.
> >>All the features provided by the media device are useless in case
> >>of a set of V4L2 Flash sub-devices. They couldn't have any linkage
> >>in such a device. The only benefit from having media device gathering
> >>V4L2 Flash devices would be possibility of listing them.
> >
> >Not quite so. The flash is associated to the sensor (and lens) using the
> >group ID in the Media controller. The user space doesn't need to "know" this
> >association.
> >
> >More complex use cases such as the above may need extensions to the Media
> >controller API.
> 
> I think that I have unnecessarily complicated the issue. Generally
> there will be always one media controller created for all camera
> sensors available in the system. If there is a single media controller
> then we can easily use async subdev registration API. A media-dev
> driver would have to parse list of flash device phandles from
> the ISP device's DT node and register them as async sub-devices.

Currently the media device is created by a driver which is most of the time
the ISP driver on embedded systems. This driver is also responsible for
registering the flash device (nodes) to the media device. So "will" is the
word, I think.

-- 
Sakari Ailus
e-mail: sakari.ai...@iki.fi XMPP: sai...@retiisi.org.uk

[git pull] drm fixes (mostly nouveau)

2014-08-14 Thread Dave Airlie


Hi Linus,

One doc building fix for a file that moved, along with a bunch of
nouveau fixes, one of them for a build problem on ARM.

Dave.

The following changes since commit 899552d6e84babd24611fd36ac7051068cb1eb2d:

  Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild (2014-08-14 11:14:29 -0600)

are available in the git repository at:


  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 251964845fbf539781dd2c6406cb2ba1bf9eddd0:

  drm/doc: Refer to proper source file (2014-08-15 09:50:41 +1000)


Alexandre Courbot (2):
  drm/nouveau/gk20a: add LTC device
  drm/nouveau/platform: fix compilation error

Ben Skeggs (9):
  drm/nouveau/nvif: fix a number of notify thinkos
  drm/nouveau: kill unused variable warning if !__OS_HAS_AGP
  drm/nouveau/bar: behave better if ioremap failed
  drm/nvc0-/fb/ram: fix use of non-existant ram if partitions aren't uniform
  drm/nouveau/ltc: fix tag base address getting truncated if above 4GiB
  drm/nouveau/nvif: return null pointers on failure, in addition to ret != 0
  drm/gf100-/gr: fix -ENOSPC detection when allocating zbc table entries
  drm/nouveau/nvif: fix dac load detect method definition
  drm/nouveau: warn if we fail to re-pin fb on resume

Dave Airlie (1):
  Merge branch 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

Fengguang Wu (1):
  drm/nouveau/kms: nouveau_fbcon_accel_fini can be static

Thierry Reding (1):
  drm/doc: Refer to proper source file

 Documentation/DocBook/drm.tmpl |  2 +-
 drivers/gpu/drm/nouveau/core/core/client.c |  4 +--
 drivers/gpu/drm/nouveau/core/engine/device/nve0.c  |  1 +
 drivers/gpu/drm/nouveau/core/engine/graph/nvc0.c   |  6 +
 drivers/gpu/drm/nouveau/core/include/core/client.h |  2 +-
 drivers/gpu/drm/nouveau/core/subdev/bar/base.c | 14 ---
 drivers/gpu/drm/nouveau/core/subdev/fb/ramnvc0.c   |  4 +--
 drivers/gpu/drm/nouveau/core/subdev/ltc/gf100.c|  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c   |  3 +--
 drivers/gpu/drm/nouveau/nouveau_display.c  |  4 ++-
 drivers/gpu/drm/nouveau/nouveau_fbcon.c|  4 +--
 drivers/gpu/drm/nouveau/nouveau_platform.c |  3 ++-
 drivers/gpu/drm/nouveau/nvif/class.h   |  4 +--
 drivers/gpu/drm/nouveau/nvif/notify.c  | 29 +++---
 drivers/gpu/drm/nouveau/nvif/object.c  |  4 ++-
 15 files changed, 58 insertions(+), 28 deletions(-)

linux-next: Tree for Aug 15

2014-08-14 Thread Stephen Rothwell

Hi all,

Please do not add code intended for v3.18 until after v3.17-rc1 is
released.

Changes since 20140814:

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 774
 873 files changed, 20466 insertions(+), 12050 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 220 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (899552d6e84b Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild)
Merging fixes/master (23cf8d3ca0fd powerpc: Fix "attempt to move .org backwards" error)
Merging kbuild-current/rc-fixes (dd5a6752ae7d firmware: Create directories for external firmware)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (e57e41931134 ARM: wire up memfd_create syscall)
Merging m68k-current/for-linus (9117710a5997 m68k/sun3: Remove define statement no longer needed)
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging mips-fixes/mips-fixes (08a9c3c9afcf MIPS: OCTEON: make get_system_type() thread-safe)
Merging powerpc-merge/merge (396a34340cdf powerpc: Fix endianness of flash_block_list in rtas_flash)
Merging sparc/master (10cf15e1d128 sparc: Hook up memfd_create system call.)
Merging net/master (a61ebdfdb13a Merge tag 'master-2014-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless)
Merging ipsec/master (a0e5ef53aac8 xfrm: Fix installation of AH IPsec SAs)
Merging sound-current/for-linus (61074c1a2d79 ALSA: hda - Set TLV_DB_SCALE_MUTE bit for cx5051 vmaster)
Merging pci-current/for-linus (9baa3c34ac4e PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use)
Merging wireless/master (77b2f2865956 iwlwifi: mvm: disable scheduled scan to prevent firmware crash)
Merging driver-core.current/driver-core-linus (c489d98c8c81 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm)
Merging tty.current/tty-linus (c489d98c8c81 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm)
Merging usb.current/usb-linus (c489d98c8c81 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm)
Merging usb-gadget-fixes/fixes (a8a85b01d185 usb: musb/cppi41: call musb_ep_select() before accessing an endpoint's CSR)
CONFLICT (content): Merge conflict in drivers/usb/musb/musb_host.c
Merging usb-serial-fixes/usb-linus (19583ca584d6 Linux 3.16)
Merging staging.current/staging-linus (c309bfa9b481 Merge tag 'for-linus-20140808' of git://git.infradead.org/linux-mtd)
Merging char-misc.current/char-misc-linus (c489d98c8c81 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm)
Merging input-current/for-linus (a6b48699ae50 Input: joystick - use get_cycles on ARMv8)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (ce5481d01f67 crypto: drbg - fix failure of generating multiple of 2**16 bytes)
Merging ide/master (a53dae49b2fe ide: use module_platform_driver())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (5a12a597a862 arm: Add devicetree fixup machine function)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting module removal.)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 
'for-joerg/arm-smmu/fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux

Re: [PATCH] perf probe: Warn user to rebuild target with debuginfo

2014-08-14 Thread Brendan Gregg

On Thu, Aug 14, 2014 at 6:51 PM, Masami Hiramatsu
 wrote:
> Here is the v2 patch, to which I've added "or install an appropriate debuginfo 
> package." :)
[...]

Looks good, thanks.

Brendan

[PATCH net-next] vhost_net: stop rx net polling when possible

2014-08-14 Thread Jason Wang

After the rx vq is enabled, we never stop polling its socket. This is suboptimal
and may lead to unnecessary wake-ups after the rx net work has already been
queued. This could be optimized by stopping polling of the rx net sock while
processing both rx and tx, and restarting it afterward. This saves unnecessary
wake-ups and even unnecessary spin lock acquisitions, with the help of commit
9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 "net-tun: restructure tun_do_read for
better sleep/wakeup efficiency".

Tests show significant CPU% savings in almost all cases:

Guest rx stream:
size(B)  sessions  throughput  cpu        normalized thru
64       1         +0.7773%    -8.6224%   +10.2866%
64       2         +0.6335%    -13.9109%  +16.8946%
64       4         -0.8182%    -14.8336%  +16.4565%
64       8         +0.4830%    -13.7675%  +16.5256%
256      1         -7.0963%    -12.6880%  +6.4043%
256      2         -1.3982%    -11.5424%  +11.4678%
256      4         -0.0350%    -11.8323%  +13.3806%
256      8         -1.5830%    -12.7693%  +12.8238%
1024     1         -7.4895%    -19.1449%  +14.4152%
1024     2         -7.4575%    -19.4018%  +14.8195%
1024     4         -0.3881%    -9.1183%   +9.6061%
1024     8         +0.4713%    -11.0155%  +12.9087%
4096     1         +0.8786%    -8.4050%   +10.1355%
4096     2         +0.0098%    -15.3094%  +18.0885%
4096     4         +0.0445%    -10.8247%  +12.1886%
4096     8         -2.1317%    -12.5111%  +11.8637%
16384    1         -0.0008%    -6.1891%   +6.5966%
16384    2         -0.0117%    -16.2716%  +19.4198%
16384    4         +0.0001%    -5.9197%   +6.2923%
16384    8         +0.0173%    -7.6681%   +8.3236%
65535    1         +0.0011%    -10.3594%  +11.5578%
65535    2         -0.4108%    -14.4304%  +16.3838%
65535    4         +0.0011%    -10.3594%  +11.5578%
65535    8         -0.4108%    -14.4304%  +16.3838%

Guest tx stream:
size(B)  sessions  throughput  cpu        normalized thru
64       1         -0.6228%    -2.1936%   +1.6060%
64       2         +0.8646%    -3.5063%   +4.5297%
64       4         +0.8733%    -3.2495%   +4.2613%
64       8         +1.4290%    -3.5593%   +5.1724%
256      1         +7.2098%    -3.1122%   +10.6535%
256      2         -10.1408%   -6.8230%   -3.5607%
256      4         -11.3531%   -6.7085%   -4.9785%
256      8         -10.2723%   -6.5628%   -3.9701%
1024     1         -18.9329%   -13.6162%  -6.1547%
1024     2         -0.3728%    -1.3181%   +0.9580%
1024     4         +0.0125%    -3.6338%   +3.7838%
1024     8         -0.0030%    -2.7282%   +2.8017%
4096     1         +16.9367%   -1.9435%   +19.2543%
4096     2         +0.0121%    -6.1682%   +6.5866%
4096     4         +0.0019%    -3.8510%   +4.0072%
4096     8         -0.0222%    -4.1368%   +4.2922%
16384    1         -0.0026%    -8.6892%   +9.5132%
16384    2         -0.0012%    -10.1676%  +11.3171%
16384    4         +0.0196%    -1.2551%   +1.2908%
16384    8         +0.1303%    -3.2634%   +3.5082%
65535    1         +0.0019%    -3.4694%   +3.5961%
65535    2         -0.0003%    -0.7635%   +0.7690%
65535    4         -0.0219%    -2.7875%   +2.8448%
65535    8         +0.1137%    -2.7922%   +2.9894%

TCP_RR:
size(B)  sessions  throughput  cpu        normalized thru
256      1         +1.9004%    -4.7985%   +7.0366%
256      25        -4.7366%    -11.0809%  +7.1349%
256      50        +3.9808%    -5.2037%   +9.6887%
4096     1         +2.1619%    -0.7303%   +2.9134%
4096     25        -13.1836%   -14.7298%  +1.8134%
4096     50        -11.1990%   -15.4763%  +5.0605%
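A note on reading the last column: the mail does not say how "normalized thru"
is derived, but the figures are consistent with throughput per unit of CPU,
i.e. (1 + throughput change) / (1 + cpu change) - 1. The sketch below works
under that assumption only; the helper name is hypothetical and is not part of
the patch or of any benchmark tool.

```c
#include <assert.h>

/*
 * Assumption (not stated in the mail): "normalized thru" is throughput
 * divided by CPU usage, so the relative changes combine as
 * (1 + thru) / (1 + cpu) - 1. Inputs and result are percentages.
 */
static double normalized_change_pct(double thru_pct, double cpu_pct)
{
    /* A CPU change of -100% would mean dividing by zero. */
    assert(cpu_pct > -100.0);
    return ((1.0 + thru_pct / 100.0) / (1.0 + cpu_pct / 100.0) - 1.0) * 100.0;
}
```

Feeding in the first guest rx row (+0.7773, -8.6224) gives a value that rounds
to the +10.2866% reported there, which is why a row can show a throughput loss
and still a normalized gain when the CPU saving is larger.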

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8dae2f7..d4a9742 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -334,6 +334,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
 static void handle_tx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+   struct vhost_virtqueue *rx_vq = &net->vqs[VHOST_NET_VQ_RX].vq;
+   struct vhost_poll *rx_poll = &net->poll[VHOST_NET_VQ_RX];
struct vhost_virtqueue *vq = &nvq->vq;
unsigned out, in, s;
int head;
@@ -348,15 +350,18 @@ static void handle_tx(struct vhost_net *net)
size_t len, total_len = 0;
int err;
size_t hdr_size;
-   struct socket *sock;
+   struct socket *sock, *rxsock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
-   bool zcopy, zcopy_used;
+   bool zcopy, zcopy_used, poll = false;
 
mutex_lock(&vq->mutex);
+   mutex_lock(&rx_vq->mutex);
sock = vq->private_data;
+   rxsock = rx_vq->private_data;
if (!sock)
goto out;
 
+   vhost_poll_stop(rx_poll);
vhost_disable_notify(&net->dev, vq);
 
hdr_size = nvq->vhost_hlen;
@@ -451,11 +456,17 @@ static void handle_tx(struct vhost_net *net)
total_len += len;
vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
-   vhost_poll_queue(&vq->poll);
+   poll = true;
break;
}
}
+
+   if (rxsock)
+   vhost_poll_start(rx_poll, rxsock->file);
+   if (poll)
+   vhost_poll_queue(&vq->poll);
 out:
+   mutex_unlock(&rx_vq->mutex);
mutex_unlock(&vq->mutex);
 }
 
@@ -554,6 +565,7 @@ err:
 static void handle_rx(struct vhost_net *net)
 {
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_RX];
+

Re: mm: compaction: buffer overflow in isolate_migratepages_range

2014-08-14 Thread Konstantin Khlebnikov

On Fri, Aug 15, 2014 at 2:07 AM, Rafael Aquini  wrote:
> On Thu, Aug 14, 2014 at 06:43:50PM -0300, Rafael Aquini wrote:
>> On Thu, Aug 14, 2014 at 10:07:40PM +0400, Andrey Ryabinin wrote:
>> > We discussed this with Konstantin and he suggested a better solution for 
>> > this.
>> > If I understood him correctly, the main idea was to store a bit
>> > identifying a balloon page
>> > in struct page (a special value in _mapcount), so we won't need to check
>> > mapping->flags.
>> >
>>
>> Here is what I thought of doing, following Konstantin's suggestion and 
>> yours. (I haven't tested it yet.)
>>
>> Comments are welcome.
>>
>> Cheers,
>> -- Rafael
>>
>>  8< 
>> From: Rafael Aquini 
>> Subject: mm: balloon_compaction: enhance balloon_page_movable() checkpoint 
>> against races
>>
>> While testing linux-next for the Kernel Address Sanitizer patchset (KASAN)
>> Sasha Levin reported a buffer overflow warning triggered for
>> isolate_migratepages_range(), which was later discovered to be due to
>> a condition where balloon_page_movable() raced against move_to_new_page(),
>> while the latter was copying the page->mapping of an anon page.
>>
>> Because we can perform balloon_page_movable() in a lockless fashion at
>> isolate_migratepages_range(), the discovered race has revealed that the scheme
>> actually used to spot ballooned pages among page blocks, which checks for
>> page_flags_cleared() and dereferences page->mapping to check its mapping flags,
>> is weak and potentially prone to stumbling across other similar conditions
>> in the future.
>>
>> Following Konstantin Khlebnikov's and Andrey Ryabinin's suggestions,
>> this patch replaces the old page->flags && mapping->flags checking scheme
>> with a simpler and stronger page->_mapcount read-and-compare test.
>> Similarly to what is done for PageBuddy() checks, BALLOON_PAGE_MAPCOUNT_VALUE
>> is introduced here to mark balloon pages. This allows balloon_page_movable()
>> to skip the proven troublesome dereference of page->mapping for flag checking
>> during the lockless rounds of isolate_migratepages_range().
>> The page->mapping dereference and flag checking will be performed later, when
>> all locks are held properly.
>>
>> ---
>>  include/linux/balloon_compaction.h | 61 
>> +++---
>>  mm/balloon_compaction.c| 53 +
>>  2 files changed, 45 insertions(+), 69 deletions(-)
>>
>> diff --git a/include/linux/balloon_compaction.h 
>> b/include/linux/balloon_compaction.h
>> index 089743a..1409ccc 100644
>> --- a/include/linux/balloon_compaction.h
>> +++ b/include/linux/balloon_compaction.h
>> @@ -108,54 +108,29 @@ static inline void balloon_mapping_free(struct 
>> address_space *balloon_mapping)
>>  }
>>
>>  /*
>> - * page_flags_cleared - helper to perform balloon @page ->flags tests.
>> + * balloon_page_movable - identify balloon pages that can be moved by
>> + * compaction / migration.
>>   *
>> - * As balloon pages are obtained from buddy and we do not play with 
>> page->flags
>> - * at driver level (exception made when we get the page lock for 
>> compaction),
>> - * we can safely identify a ballooned page by checking if the
>> - * PAGE_FLAGS_CHECK_AT_PREP page->flags are all cleared.  This approach also
>> - * helps us skip ballooned pages that are locked for compaction or release, 
>> thus
>> - * mitigating their racy check at balloon_page_movable()
>> + * BALLOON_PAGE_MAPCOUNT_VALUE must be <= -2 but better not too close to
>> + * -2 so that an underflow of the page_mapcount() won't be mistaken
>> + * for a genuine BALLOON_PAGE_MAPCOUNT_VALUE.
>>   */
>> -static inline bool page_flags_cleared(struct page *page)
>> +#define BALLOON_PAGE_MAPCOUNT_VALUE (-256)
>> +static inline bool balloon_page_movable(struct page *page)
>>  {
>> - return !(page->flags & PAGE_FLAGS_CHECK_AT_PREP);
>> + return atomic_read(&page->_mapcount) == BALLOON_PAGE_MAPCOUNT_VALUE;
>>  }
>>
>> -/*
>> - * __is_movable_balloon_page - helper to perform @page mapping->flags tests
>> - */
>> -static inline bool __is_movable_balloon_page(struct page *page)
>> +static inline void __balloon_page_set(struct page *page)
>>  {
>> - struct address_space *mapping = page->mapping;
>> - return mapping_balloon(mapping);
>> + VM_BUG_ON_PAGE(!atomic_read(&page->_mapcount) != -1, page);
>> + atomic_set(&page->_mapcount, BALLOON_PAGE_MAPCOUNT_VALUE);
>>  }
>>
>> -/*
>> - * balloon_page_movable - test page->mapping->flags to identify balloon 
>> pages
>> - * that can be moved by compaction/migration.
>> - *
>> - * This function is used at core compaction's page isolation scheme, 
>> therefore
>> - * most pages exposed to it are not enlisted as balloon pages and so, to 
>> avoid
>> - * undesired side effects like racing against __free_pages(), we cannot 
>> afford
>> - * holding the page locked while testing page->mapping->flags here.
>> - *
>> - * As we might return false positives in the

Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning

2014-08-14 Thread Dave Chinner

On Wed, Aug 13, 2014 at 12:41:06PM -0400, Waiman Long wrote:
> On 08/13/2014 01:51 AM, Dave Chinner wrote:
> >On Mon, Aug 04, 2014 at 11:44:19AM -0400, Waiman Long wrote:
> >>On 08/04/2014 12:10 AM, Jason Low wrote:
> >>>On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> The rwsem_can_spin_on_owner() function currently allows optimistic
> spinning only if the owner field is defined and is running. That is
> too conservative as it will cause some tasks to miss the opportunity
> of doing spinning in case the owner hasn't been able to set the owner
> field in time or the lock has just become available.
> 
> This patch enables more aggressive use of optimistic spinning by
> assuming that the lock is spinnable unless proved otherwise.
> 
> Signed-off-by: Waiman Long
> ---
>   kernel/locking/rwsem-xadd.c |2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index d058946..dce22b8 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -285,7 +285,7 @@ static inline bool 
> rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>   static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>   {
>   struct task_struct *owner;
> - bool on_cpu = false;
> + bool on_cpu = true; /* Assume spinnable unless proved not to be */
> >>>Hi,
> >>>
> >>>So "on_cpu = true" was recently converted to "on_cpu = false" in order
> >>>to address issues such as a 5x performance regression in the xfs_repair
> >>>workload that was caused by the original rwsem optimistic spinning code.
> >>>
> >>>However, patch 4 in this patchset does address some of the problems with
> >>>spinning when there are readers. CC'ing Dave Chinner, who did the
> >>>testing with the xfs_repair workload.
> >>>
> >>This patch set enables proper reader spinning and so the problem
> >>that we see with xfs_repair workload should go away. I should have
> >>this patch after patch 4 to make it less confusing. BTW, patch 3 can
> >>significantly reduce spinlock contention in rwsem. So I believe the
> >>xfs_repair workload should run faster with this patch than both 3.15
> >>and 3.16.
> >I see lots of handwaving. I documented the test I ran when I
> >reported the problem so anyone with a 16p system and an SSD can
> >reproduce it. I don't have the bandwidth to keep track of the lunacy
> >of making locks scale these days - that's what you guys are doing.
> >
> >I gave you a simple, reliable workload that is extremely sensitive
> >to rwsem perturbations, so you should be adding it to your
> >regression tests rather than leaving it for others to notice you
> >screwed up
> >
> >Cheers,
> >
> >Dave.
> 
> If you can send me a rwsem workload that I can use for testing
> purpose, it will be highly appreciated.


xfs_io -f -c "truncate 500t" -c "extsize 1m" /path/to/vm/image/file



In vm:

download and build fsmark from here:

git://oss.sgi.com/dgc/fs_mark

download and install xfsprogs v3.2.1 from here:

git://oss.sgi.com/xfs/cmds/xfsprogs.git tags/v3.2.1

Set up the target filesystem:

# mkfs.xfs -f -m "crc=1,finobt=1" /dev/vda
# mount -o logbsize=262144,nobarrier /dev/vda /mnt/scratch


Run:

# fs_mark  -D  1  -S0  -n  5  -s  0  -L  32 \
-d  /mnt/scratch/0  -d  /mnt/scratch/1 \
-d  /mnt/scratch/2  -d  /mnt/scratch/3 \
-d  /mnt/scratch/4  -d  /mnt/scratch/5 \
-d  /mnt/scratch/6  -d  /mnt/scratch/7 \
-d  /mnt/scratch/8  -d  /mnt/scratch/9 \
-d  /mnt/scratch/10  -d  /mnt/scratch/11 \
-d  /mnt/scratch/12  -d  /mnt/scratch/13 \
-d  /mnt/scratch/14  -d  /mnt/scratch/15

If you've got everything set up right, that should run at around
200-250,000 file creates/s. When finished, unmount and run:

# xfs_repair -o bhash=50 /dev/vda

And that should spend quite a long while pounding on the mmap_sem
until the userspace buffer cache stops growing.

I just ran the above on 3.16, saw this from perf:

  37.30%  [kernel]  [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
  - 62.00% rwsem_wake
 - call_rwsem_wake
+ 83.52% sys_mprotect
+ 16.23% __do_page_fault
  + 35.15% try_to_wake_up
  + 0.96% update_blocked_averages
  + 0.61% pagevec_lru_move_fn
-  23.35%  [kernel]  [k] _raw_spin_unlock_irq
   - _raw_spin_unlock_irq
  + 51.37% finish_task_switch
  + 39.37% rwsem_down_write_failed
  + 8.49% rwsem_down_read_failed
0.62% run_timer_softirq
+   5.22%  [kernel]  [k] native_read_tsc
+   3.89%  [kernel]  [k] rwsem_down_write_failed
.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com

[PATCH] zram: add num_discards for discarded pages stat

2014-08-14 Thread Chao Yu

We now support handling discard requests sent by the filesystem, but there is
no interface that shows information about discards.
This patch adds num_discards to count discarded pages, and exports it to sysfs
for display.
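The accounting can be pictured with a small userspace sketch. It is only an
illustration under stated assumptions: the names sketch_stats, sketch_discard
and SKETCH_PAGE_SIZE are hypothetical, C11 atomics stand in for the kernel's
atomic64_t, and only the whole-page loop of the discard path is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical stand-in for the driver's statistics block. */
struct sketch_stats {
    atomic_uint_fast64_t num_discards;
};

/* Walk a discard range page by page and count each whole page dropped. */
static void sketch_discard(struct sketch_stats *stats, size_t nbytes)
{
    assert(stats != NULL);
    while (nbytes >= SKETCH_PAGE_SIZE) {
        /* The real driver frees the page slot here before counting it. */
        atomic_fetch_add_explicit(&stats->num_discards, 1,
                                  memory_order_relaxed);
        nbytes -= SKETCH_PAGE_SIZE;
    }
}

/* Read the counter, as a read-only sysfs attribute would. */
static uint64_t sketch_read_discards(struct sketch_stats *stats)
{
    return atomic_load_explicit(&stats->num_discards, memory_order_relaxed);
}
```

In this model a request covering three pages plus 100 trailing bytes bumps the
counter by three; the trailing partial page is not counted.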

Signed-off-by: Chao Yu 
---
 Documentation/ABI/testing/sysfs-block-zram | 10 ++
 drivers/block/zram/zram_drv.c  |  3 +++
 drivers/block/zram/zram_drv.h  |  1 +
 3 files changed, 14 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block-zram 
b/Documentation/ABI/testing/sysfs-block-zram
index 70ec992..fa8936e 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -57,6 +57,16 @@ Description:
The failed_writes file is read-only and specifies the number of
failed writes happened on this device.
 
+
+What:  /sys/block/zram/num_discards
+Date:  August 2014
+Contact:   Chao Yu 
+Description:
+   The num_discards file is read-only and specifies the number of
+   physical blocks which are discarded by this device. These blocks
+   are included in discard requests sent by the filesystem because
+   the blocks are no longer used.
+
 What:  /sys/block/zram/max_comp_streams
 Date:  February 2014
 Contact:   Sergey Senozhatsky 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d00831c..904e7a5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -606,6 +606,7 @@ static void zram_bio_discard(struct zram *zram, u32 index,
bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
zram_free_page(zram, index);
bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
+   atomic64_inc(&zram->stats.num_discards);
index++;
n -= PAGE_SIZE;
}
@@ -866,6 +867,7 @@ ZRAM_ATTR_RO(num_reads);
 ZRAM_ATTR_RO(num_writes);
 ZRAM_ATTR_RO(failed_reads);
 ZRAM_ATTR_RO(failed_writes);
+ZRAM_ATTR_RO(num_discards);
 ZRAM_ATTR_RO(invalid_io);
 ZRAM_ATTR_RO(notify_free);
 ZRAM_ATTR_RO(zero_pages);
@@ -879,6 +881,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_num_writes.attr,
&dev_attr_failed_reads.attr,
&dev_attr_failed_writes.attr,
+   &dev_attr_num_discards.attr,
&dev_attr_invalid_io.attr,
&dev_attr_notify_free.attr,
&dev_attr_zero_pages.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e0f725c..2994aaf 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -86,6 +86,7 @@ struct zram_stats {
atomic64_t num_writes;  /* --do-- */
atomic64_t failed_reads;/* can happen when memory is too low */
atomic64_t failed_writes;   /* can happen when memory is too low */
+   atomic64_t num_discards;/* no. of discarded pages */
atomic64_t invalid_io;  /* non-page-aligned I/O requests */
atomic64_t notify_free; /* no. of swap slot free notifications */
atomic64_t zero_pages;  /* no. of zero filled pages */
-- 
2.0.1.474.g72c7794



Re: [PATCH -mm] slab: fix cpuset check in fallback_alloc

2014-08-14 Thread Li Zefan

On 2014/8/12 5:05, David Rientjes wrote:
> On Mon, 11 Aug 2014, Vladimir Davydov wrote:
> 
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1963,7 +1963,7 @@ zonelist_scan:
>>>  
>>> /*
>>>  * Scan zonelist, looking for a zone with enough free.
>>> -* See also __cpuset_node_allowed_softwall() comment in kernel/cpuset.c.
>>> +* See __cpuset_node_allowed() comment in kernel/cpuset.c.
>>>  */
>>> for_each_zone_zonelist_nodemask(zone, z, zonelist,
>>> high_zoneidx, nodemask) {
>>> @@ -1974,7 +1974,7 @@ zonelist_scan:
>>> continue;
>>> if (cpusets_enabled() &&
>>> (alloc_flags & ALLOC_CPUSET) &&
>>> -   !cpuset_zone_allowed_softwall(zone, gfp_mask))
>>> +   !cpuset_zone_allowed(zone, gfp_mask))
>>> continue;
>>
>> So, this is get_page_from_freelist. It's called from
>> __alloc_pages_nodemask with alloc_flags always having ALLOC_CPUSET bit
>> set and from __alloc_pages_slowpath with alloc_flags having ALLOC_CPUSET
>> bit set only for __GFP_WAIT allocations. That said, w/o your patch we
>> try to respect cpusets for all allocations, including atomic, and only
>> ignore cpusets if tight on memory (freelist's empty) for !__GFP_WAIT
>> allocations, while with your patch we always ignore cpusets for
>> !__GFP_WAIT allocations. Not sure if it really matters though, because
>> usually one uses cpuset.mems in conjunction with cpuset.cpus and it
>> won't make any difference then. It also doesn't conflict with any cpuset
>> documentation.
>>
> 
> Yeah, that's why I'm asking Li, the cpuset maintainer, if we can do this.  

I'm not quite sure. That code was already there before I got involved in cpuset.

> The only thing that we get by falling back to the page allocator slowpath 
> is that kswapd gets woken up before the allocation is attempted without 
> ALLOC_CPUSET.  It seems pointless to wakeup kswapd when the allocation can 
> succeed on any node.  Even with the patch, if the allocation fails because 
> all nodes are below their min watermark, then we still fallback to the 
> slowpath and wake up kswapd but there's nothing much else we can do 
> because it's !__GFP_WAIT.
> .

But I tend to agree with you. If we want to do this, though, we should split this
change from the cleanup.

Regarding the cleanup, I found there used to be a single 
cpuset_node_allowed(),
and your cleanup is exactly a revert of that ancient commit:

commit 02a0e53d8227aff5e62e0433f82c12c1c2805fd6
Author: Paul Jackson 
Date:   Wed Dec 13 00:34:25 2006 -0800

[PATCH] cpuset: rework cpuset_zone_allowed api

Seems the major intention was to avoid accidental sleep-in-atomic bugs, because
callback_mutex might be held.

I don't see any reason callback_mutex can't be a spinlock. I thought
about this when Gu Zhen fixed the bug that callback_mutex is nested inside
rcu_read_lock().

--
 kernel/cpuset.c | 81 ++---
 1 file changed, 49 insertions(+), 32 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index baa155c..9d9e239 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -284,7 +284,7 @@ static struct cpuset top_cpuset = {
  */
 
 static DEFINE_MUTEX(cpuset_mutex);
-static DEFINE_MUTEX(callback_mutex);
+static DEFINE_SPINLOCK(callback_lock);
 
 /*
  * CPU / memory hotplug is handled asynchronously.
@@ -848,6 +848,7 @@ static void update_tasks_cpumask(struct cpuset *cs)
  */
 static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 {
+   unsigned long flags;
struct cpuset *cp;
struct cgroup_subsys_state *pos_css;
bool need_rebuild_sched_domains = false;
@@ -875,9 +876,9 @@ static void update_cpumasks_hier(struct cpuset *cs, struct 
cpumask *new_cpus)
continue;
rcu_read_unlock();
 
-   mutex_lock(&callback_mutex);
+   spin_lock_irqsave(&callback_lock, flags);
cpumask_copy(cp->effective_cpus, new_cpus);
-   mutex_unlock(&callback_mutex);
+   spin_unlock_irqrestore(&callback_lock, flags);
 
WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
@@ -910,6 +911,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct 
cpumask *new_cpus)
 static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
  const char *buf)
 {
+   unsigned long flags;
int retval;
 
/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
@@ -942,9 +944,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset 
*trialcs,
if (retval < 0)
return retval;
 
-   mutex_lock(&callback_mutex);
+   spin_lock_irqsave(&callback_lock, flags);
cpumask_copy(cs->cpus_allowed,

Re: [PATCH v2] PC, KVM, CMA: Fix regression caused by wrong get_order() use

2014-08-14 Thread Alexey Kardashevskiy

On 08/14/2014 11:40 PM, Alexander Graf wrote:
> 
> On 14.08.14 07:13, Aneesh Kumar K.V wrote:
>> Alexey Kardashevskiy  writes:
>>
>>> fc95ca7284bc54953165cba76c3228bd2cdb9591 claims that there is no
>>> functional change but this is not true as it calls get_order() (which
>>> takes bytes) where it should have called ilog2() and the kernel stops
>>> on VM_BUG_ON().
>>>
>>> This replaces get_order() with order_base_2() (round-up version of ilog2).
>>>
>>> Suggested-by: Paul Mackerras 
>>> Cc: Alexander Graf 
>>> Cc: Aneesh Kumar K.V 
>>> Cc: Joonsoo Kim 
>>> Cc: Benjamin Herrenschmidt 
>>> Signed-off-by: Alexey Kardashevskiy 
>> Reviewed-by: Aneesh Kumar K.V 
> 
> So this affects 3.17?

Yes.


-- 
Alexey
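For anyone following along, the difference between the three helpers named in
the commit message can be shown with userspace re-implementations. This is a
sketch, not the kernel code: the sketch_* names are hypothetical, a 4 KiB page
is assumed, and the functions only mirror the documented behaviour (ilog2()
rounds down, order_base_2() rounds up, get_order() takes a size in bytes).

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12  /* assumed 4 KiB pages, for illustration only */

/* floor(log2(n)) for n > 0, mirroring what ilog2() computes. */
static unsigned int sketch_ilog2(unsigned long n)
{
    unsigned int r = 0;

    assert(n > 0);
    while ((n >>= 1) != 0)
        r++;
    return r;
}

/* ceil(log2(n)) for n > 0, mirroring order_base_2(). */
static unsigned int sketch_order_base_2(unsigned long n)
{
    assert(n > 0);
    return n > 1 ? sketch_ilog2(n - 1) + 1 : 0;
}

/* Page order needed to hold `size` bytes, mirroring get_order(). */
static unsigned int sketch_get_order(unsigned long size)
{
    unsigned long pages;

    assert(size > 0);
    pages = (size + (1UL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
    return sketch_order_base_2(pages);
}
```

With a count of 64 pages, sketch_order_base_2(64) and sketch_ilog2(64) both
give 6, while sketch_get_order(64) treats 64 as a byte count and gives 0. For
65 pages the rounded-up variant gives 7 where the rounded-down one gives 6.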

Re: [PATCH v4] x86, hotplug: fix llc shared map unreleased during cpu hotplug

2014-08-14 Thread Wanpeng Li

Hi Peter,
On Fri, Aug 08, 2014 at 04:40:57PM -0600, Linn Crosetto wrote:
[...]
>
>Tested with a CPU hotplug stress test, run on a large system with 240 CPUs.
>Thanks.
>
>Tested-by: Linn Crosetto 

Is it OK for you to apply this patch, or does it still need an update?

Regards,
Wanpeng Li 

[RFC v3 2/2] btrfs: use the new VFS super_block_dev

2014-08-14 Thread Luis R. Rodriguez

From: "Luis R. Rodriguez" 

Use the new VFS layer struct super_block_dev instead of carrying
the anonymous bdev's on our own. This makes the VFS layer aware of
all of our anonymous dev's on the super block.

Signed-off-by: Luis R. Rodriguez 
Signed-off-by: Filipe Manana 
fdmanana: fix for running qgroup sanity tests
---
 fs/btrfs/ctree.h   | 7 ++-
 fs/btrfs/disk-io.c | 7 +++
 fs/btrfs/inode.c   | 2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..0ece396 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1846,11 +1846,8 @@ struct btrfs_root {
 * protected by inode_lock
 */
struct radix_tree_root delayed_nodes_tree;
-   /*
-* right now this just gets used so that a root has its own devid
-* for stat.  It may be used for more later
-*/
-   dev_t anon_dev;
+
+   struct super_block_dev sbdev;
 
spinlock_t root_item_lock;
atomic_t refs;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..7c65307 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1270,7 +1270,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
root->defrag_trans_start = 0;
init_completion(&root->kobj_unregister);
root->root_key.objectid = objectid;
-   root->anon_dev = 0;
 
spin_lock_init(&root->root_item_lock);
 }
@@ -1573,7 +1572,7 @@ int btrfs_init_fs_root(struct btrfs_root *root)
spin_lock_init(&root->cache_lock);
init_waitqueue_head(&root->cache_wait);
 
-   ret = get_anon_bdev(&root->anon_dev);
+   ret = insert_anon_sbdev(root->fs_info->sb, &root->sbdev);
if (ret)
goto free_writers;
return 0;
@@ -3532,8 +3531,8 @@ static void free_fs_root(struct btrfs_root *root)
WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
btrfs_free_block_rsv(root, root->orphan_block_rsv);
root->orphan_block_rsv = NULL;
-   if (root->anon_dev)
-   free_anon_bdev(root->anon_dev);
+   if (likely(!test_bit(BTRFS_ROOT_DUMMY_ROOT, &root->state)))
+   remove_anon_sbdev(&root->sbdev);
if (root->subv_writers)
btrfs_free_subvolume_writers(root->subv_writers);
free_extent_buffer(root->node);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..0e8f604 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8277,7 +8277,7 @@ static int btrfs_getattr(struct vfsmount *mnt,
u32 blocksize = inode->i_sb->s_blocksize;
 
generic_fillattr(inode, stat);
-   stat->dev = BTRFS_I(inode)->root->anon_dev;
+   stat->dev = BTRFS_I(inode)->root->sbdev.anon_dev;
stat->blksize = PAGE_CACHE_SIZE;
 
spin_lock(&BTRFS_I(inode)->lock);
-- 
2.0.3


[RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev

2014-08-14 Thread Luis R. Rodriguez

From: "Luis R. Rodriguez" 

Modern filesystems use get_anon_bdev() for internal notions of
volumes and snapshots under a single super block, but never expose
them directly to the VFS layer. While this works, it leaves the
VFS layer unaware of what filesystems are doing.
This creates a new super block subdevice which we can use to start
stuffing in information about the underlying bdevs and their
associated super block to start off with. This at least now lets
us implement proper support for ustat() once filesystems are
modified to use this data structure and respective helpers.
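
The lookup idea, reduced to a userspace model: a super block carries its own
device number plus a list of anonymous sub-device numbers, and a lookup by
device number matches either. The sketch below is illustrative only; all
sketch_* names are hypothetical, a plain singly linked list replaces the
kernel's list_head, and the sb_lock locking of the real patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int sketch_dev_t;  /* stand-in for dev_t */

/* One anonymous sub-device hanging off a super block. */
struct sketch_sbdev {
    sketch_dev_t anon_dev;
    struct sketch_sbdev *next;
};

/* Minimal super block: primary device number plus sub-device list. */
struct sketch_super {
    sketch_dev_t s_dev;
    struct sketch_sbdev *sbdevs;
};

/* Register a sub-device number on the super block (no locking here). */
static void sketch_insert_sbdev(struct sketch_super *sb,
                                struct sketch_sbdev *sbdev, sketch_dev_t dev)
{
    assert(sb != NULL && sbdev != NULL);
    sbdev->anon_dev = dev;
    sbdev->next = sb->sbdevs;
    sb->sbdevs = sbdev;
}

/* True if dev is the primary device or any registered sub-device. */
static bool sketch_dev_match(const struct sketch_super *sb, sketch_dev_t dev)
{
    const struct sketch_sbdev *it;

    if (sb->s_dev == dev)
        return true;
    for (it = sb->sbdevs; it != NULL; it = it->next)
        if (it->anon_dev == dev)
            return true;
    return false;
}
```

A ustat()-style lookup that previously compared only s_dev would, in this
model, call sketch_dev_match() instead and so also find the super block when
handed the device number of one of its subvolumes.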

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 68 --
 include/linux/fs.h | 10 
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d20d5b1..d871892 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -133,6 +133,68 @@ static unsigned long super_cache_count(struct shrinker 
*shrink,
return total_objects;
 }
 
+static bool super_dev_match(struct super_block *sb, dev_t dev)
+{
+   struct super_block_dev *sbdev;
+
+   if (sb->s_dev == dev)
+       return true;
+
+   if (list_empty(&sb->s_sbdevs))
+       return false;
+
+   list_for_each_entry(sbdev, &sb->s_sbdevs, entry)
+       if (sbdev->anon_dev == dev)
+           return true;
+
+   return false;
+}
+
+int insert_anon_sbdev(struct super_block *sb, struct super_block_dev *sbdev)
+{
+   int ret;
+
+   ret = get_anon_bdev(&sbdev->anon_dev);
+   if (ret)
+       return ret;
+
+   sbdev->sb = sb;
+
+   spin_lock(&sb_lock);
+   list_add_tail(&sbdev->entry, &sb->s_sbdevs);
+   spin_unlock(&sb_lock);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(insert_anon_sbdev);
+
+void remove_anon_sbdev(struct super_block_dev *sbdev)
+{
+   struct super_block *sb;
+   struct super_block_dev *sbdev_i, *tmp;
+
+   if (!sbdev)
+       return;
+
+   sb = sbdev->sb;
+
+   spin_lock(&sb_lock);
+
+   WARN_ON(list_empty(&sb->s_sbdevs));
+
+   list_for_each_entry_safe(sbdev_i, tmp, &sb->s_sbdevs, entry) {
+       if (sbdev == sbdev_i) {
+           list_del_init(&sbdev_i->entry);
+           break;
+       }
+   }
+
+   spin_unlock(&sb_lock);
+
+   free_anon_bdev(sbdev->anon_dev);
+}
+EXPORT_SYMBOL_GPL(remove_anon_sbdev);
+
 /**
  * destroy_super   -   frees a superblock
  * @s: superblock to free
@@ -148,6 +210,7 @@ static void destroy_super(struct super_block *s)
percpu_counter_destroy(&s->s_writers.counter[i]);
security_sb_free(s);
WARN_ON(!list_empty(&s->s_mounts));
+   WARN_ON(!list_empty(&s->s_sbdevs));
kfree(s->s_subtype);
kfree(s->s_options);
kfree_rcu(s, rcu);
@@ -188,6 +251,7 @@ static struct super_block *alloc_super(struct 
file_system_type *type, int flags)
INIT_HLIST_NODE(&s->s_instances);
INIT_HLIST_BL_HEAD(&s->s_anon);
INIT_LIST_HEAD(&s->s_inodes);
+   INIT_LIST_HEAD(&s->s_sbdevs);
 
if (list_lru_init(&s->s_dentry_lru))
goto fail;
@@ -652,7 +716,7 @@ restart:
spin_unlock(&sb_lock);
return NULL;
 }
- 
+
 struct super_block *user_get_super(dev_t dev)
 {
struct super_block *sb;
@@ -662,7 +726,7 @@ rescan:
list_for_each_entry(sb, &super_blocks, s_list) {
if (hlist_unhashed(&sb->s_instances))
continue;
-   if (sb->s_dev ==  dev) {
+   if (super_dev_match(sb, dev)) {
sb->s_count++;
spin_unlock(&sb_lock);
down_read(&sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f0890e4..c9152ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1197,6 +1197,13 @@ struct sb_writers {
 #endif
 };
 
+/* we can expand this to help the VFS layer with modern filesystems */
+struct super_block_dev {
+	struct super_block	*sb;
+	struct list_head	entry;		/* For struct sb->s_sbdevs */
+	dev_t			anon_dev;
+};
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1221,6 +1228,7 @@ struct super_block {
 
 	struct list_head	s_inodes;	/* all inodes */
 	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
+	struct list_head	s_sbdevs;	/* internal fs dev_t */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
 	struct backing_dev_info *s_bdi;
@@ -1821,6 +1829,8 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+int insert_anon_sbdev(struct super_block

[RFC v3 0/2] vfs / btrfs: add support for ustat()

2014-08-14 Thread Luis R. Rodriguez

From: "Luis R. Rodriguez" 

This v3 includes a small fix identified by Filipe Manana in the
btrfs-specific patch. The v2 series was briefly discussed, but
since I provided a use case and the reasoning for the way things
were changed I haven't gotten any further advice or
feedback.

Christoph had noted that this seemed associated with the problem
that btrfs uses different assignments for st_dev than s_dev,
but much as I'd like to see that changed, based on discussions so
far it's unclear if this is going to be possible unless strong
commitment is reached. What this tries to do is take the
other way around the problem, by slowly shifting out junk. I
think this approach might be more feasible over time. I see this
as an extension to Al's original commit 0ee5dc676, but more in line
with how they really are used, and it exposes more information to
the VFS. As it stands now other filesystems can pop up and do
similar things; this at least extends the original API to fit
the use case a bit more closely to how it's used and allows more
room to grow.

Let's consider this userspace case:

struct stat buf;
struct ustat ubuf;

/* Find a valid device number */
if (stat("/", &buf)) {
	fprintf(stderr, "Stat failed: %s\n", strerror(errno));
	return 1;
}

/* Call ustat on it */
if (ustat(buf.st_dev, &ubuf)) {
	fprintf(stderr, "Ustat failed: %s\n", strerror(errno));
	return 1;
}

btrfs has an inode op for getattr, which is used here, and it sets
the dev to an anonymous dev_t. Later ustat() uses user_get_super(), which
can only find a super block when the dev_t passed in is the super block's
single s_dev. Since btrfs maps many anonymous dev_t's to one super block,
the search cannot complete for btrfs and ustat() fails with -EINVAL.
The series expands the number of dev_t's that a super
block can have and allows this search to complete.
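
The lookup change described above can be illustrated with a small
userspace model. This is only a sketch, not the kernel code: the model_*
names and the singly linked list are hypothetical stand-ins for dev_t,
struct super_block and the list_head used in the patch, and entries are
pushed at the head rather than appended at the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's dev_t. */
typedef unsigned int model_dev_t;

/* One extra (anonymous) device number attached to a super block. */
struct model_sbdev {
	model_dev_t anon_dev;
	struct model_sbdev *next;	/* simple singly linked list */
};

/* Minimal model of a super block: one primary dev plus extra ones. */
struct model_super_block {
	model_dev_t s_dev;		/* primary device number */
	struct model_sbdev *s_sbdevs;	/* extra device numbers, may be NULL */
};

/* Register one more device number on the super block (head insert). */
void model_insert_sbdev(struct model_super_block *sb,
			struct model_sbdev *sbdev, model_dev_t dev)
{
	assert(sb != NULL && sbdev != NULL);
	sbdev->anon_dev = dev;
	sbdev->next = sb->s_sbdevs;
	sb->s_sbdevs = sbdev;
}

/* True if dev is the primary number or any registered extra number. */
bool model_super_dev_match(const struct model_super_block *sb,
			   model_dev_t dev)
{
	const struct model_sbdev *it;

	if (sb->s_dev == dev)
		return true;

	for (it = sb->s_sbdevs; it != NULL; it = it->next)
		if (it->anon_dev == dev)
			return true;

	return false;
}
```

With only s_dev compared, a lookup by one of the extra numbers fails; once
the extra numbers are registered, the same lookup succeeds.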

Luis R. Rodriguez (2):
  fs/super.c: add new super block sub devices super_block_dev
  btrfs: use the new VFS super_block_dev

 fs/btrfs/ctree.h   |  7 ++
 fs/btrfs/disk-io.c |  7 +++---
 fs/btrfs/inode.c   |  2 +-
 fs/super.c | 68 --
 include/linux/fs.h | 10 
 5 files changed, 82 insertions(+), 12 deletions(-)

-- 
2.0.3


Re: [PATCH RFC] time,signal: protect resource use statistics with seqlock

2014-08-14 Thread Frederic Weisbecker

2014-08-14 16:39 GMT+02:00 Oleg Nesterov :
> On 08/14, Frederic Weisbecker wrote:
>>
>> 2014-08-14 3:57 GMT+02:00 Rik van Riel :
>> > On 08/13/2014 08:43 PM, Frederic Weisbecker wrote:
>> >> On Wed, Aug 13, 2014 at 05:03:24PM -0400, Rik van Riel wrote:
>> >>
>> >> I'm worried about such lockless solution based on RCU or read
>> >> seqcount because we lose the guarantee that an update is
>> >> immediately visible by all subsequent readers.
>> >>
>> >> Say CPU 0 updates the thread time and both CPU 1 and CPU 2 right
>> >> after that call clock_gettime(), with the spinlock we were
>> >> guaranteed to see the new update. Now with a pure seqlock read
>> >> approach, we guarantee a read sequence coherency but we don't
>> >> guarantee the freshest update result.
>> >>
>> >> So that looks like a source of non monotonic results.
>> >
>> > Which update are you worried about, specifically?
>> >
>> > The seq_write_lock to update the usage stat in p->signal will lock out
>> > the seqlock read side used to check those results.
>> >
>> > Is there another kind of thing read by cpu_clock_sample_group that you
>> > believe is not excluded by the seq_lock?
>>
>> I mean the read side doesn't use a lock with seqlocks. It's only made
>> of barriers and sequence numbers to ensure the reader doesn't read
>> some half-complete update. But other than that it can as well see the
>> update n - 1 since barriers don't enforce latest results.
>
> Yes, sure, read_seqcount_begin/read_seqcount_retry "right after"
> write_seqcount_begin-update-write_seqcount_end can miss "update" part
> along with ->sequence modifications.
>
> But I still can't understand how this can lead to non-monotonic results,
> could you spell?

Well, let's say clock = T.
CPU 0 updates at T + 1.
Then I call clock_gettime() from CPU 1 and CPU 2. CPU 1 reads T + 1
while CPU 2 still reads T.
If I do yet another round of clock_gettime() on CPU 1 and CPU 2, it's
possible that CPU 2 still sees T. With the spinlocked version that
can't happen: the second round would read at least T + 1 on
both CPUs.
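
For readers following the thread, the read-side loop under discussion can
be modelled in userspace roughly as follows. This is only a sketch using
C11 atomics with the default sequentially consistent ordering; every
model_* name is hypothetical and none of this is the kernel's seqcount
API. It shows what the sequence counter does guarantee (a reader never
returns a half-updated utime/stime pair) and does not attempt to settle
the freshness question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a sequence counter: odd while a write is running. */
struct model_seqcount {
	atomic_uint sequence;
};

/* Two values that must always be read as a consistent pair. */
struct model_times {
	struct model_seqcount seq;
	atomic_ulong utime;
	atomic_ulong stime;
};

/* Reader side: wait for an even sequence and remember it. */
unsigned int model_read_begin(struct model_seqcount *s)
{
	unsigned int seq;

	do {
		seq = atomic_load(&s->sequence);
	} while (seq & 1);

	return seq;
}

/* Reader side: true if a writer ran since model_read_begin(). */
bool model_read_retry(struct model_seqcount *s, unsigned int start)
{
	return atomic_load(&s->sequence) != start;
}

/* Writer side: bump to odd, update both values, bump back to even. */
void model_update(struct model_times *t, unsigned long u, unsigned long st)
{
	atomic_fetch_add(&t->seq.sequence, 1);
	atomic_store(&t->utime, u);
	atomic_store(&t->stime, st);
	atomic_fetch_add(&t->seq.sequence, 1);
}

/* Lockless read: retry until no writer interfered with the snapshot. */
unsigned long model_read_total(struct model_times *t)
{
	unsigned int start;
	unsigned long u, st;

	do {
		start = model_read_begin(&t->seq);
		u = atomic_load(&t->utime);
		st = atomic_load(&t->stime);
	} while (model_read_retry(&t->seq, start));

	return u + st;
}
```

The reader takes no lock; it only detects that a writer ran and retries.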

Re: [RFC v2 0/2] vfs / btrfs: add support for ustat()

2014-08-14 Thread Luis R. Rodriguez

On Thu, Jul 17, 2014 at 1:49 PM, Luis R. Rodriguez  wrote:
> On Thu, Jul 17, 2014 at 01:03:01AM -0700, Christoph Hellwig wrote:
>> On Wed, Jul 16, 2014 at 02:37:56PM -0700, Luis R. Rodriguez wrote:
>> > From: "Luis R. Rodriguez" 
>> >
>> > This makes the implementation simpler by stuffing the struct on
>> > the driver and just letting the driver insert it and remove it
>> > onto the sb list. This avoids the kzalloc() completely.
>>
>> Again, NAK.  Make btrfs report the proper anon dev_t in stat and
>> everything will just work.
>
> Let's consider this userspace case:
>
> struct stat buf;
> struct ustat ubuf;
>
> /* Find a valid device number */
> if (stat("/", &buf)) {
> 	fprintf(stderr, "Stat failed: %s\n", strerror(errno));
> 	return 1;
> }
>
> /* Call ustat on it */
> if (ustat(buf.st_dev, &ubuf)) {
> 	fprintf(stderr, "Ustat failed: %s\n", strerror(errno));
> 	return 1;
> }
>
> btrfs has an inode op for getattr, which is used here, and it sets
> the dev to an anonymous dev_t. Later ustat() uses user_get_super(), which
> can only find a super block when the dev_t passed in is the super block's
> single s_dev. Since btrfs maps many anonymous dev_t's to one super block,
> the search cannot complete for btrfs and ustat() fails with -EINVAL.
> The series expands the number of dev_t's that a super
> block can have and allows this search to complete.

Any further advice? I'll submit a v3 RFC with a small fix, identified
by Filipe Manana, for an issue seen in stress testing.

  Luis

[PATCHv2] HID:hid-logitech: Prevent possibility of infinite loop when using /sys interface

2014-08-14 Thread Simon Wood

If the device data is not accessible for some reason, returning 0 causes
the store function to be called again continuously, as none of the string
has been 'consumed'.

Signed-off-by: Simon Wood 
---
 drivers/hid/hid-lg4ff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-lg4ff.c b/drivers/hid/hid-lg4ff.c
index cc2bd20..7835717 100644
--- a/drivers/hid/hid-lg4ff.c
+++ b/drivers/hid/hid-lg4ff.c
@@ -451,13 +451,13 @@ static ssize_t lg4ff_range_store(struct device *dev, struct device_attribute *at
 	drv_data = hid_get_drvdata(hid);
 	if (!drv_data) {
 		hid_err(hid, "Private driver data not found!\n");
-		return 0;
+		return -EINVAL;
 	}
 
 	entry = drv_data->device_props;
 	if (!entry) {
 		hid_err(hid, "Device properties not found!\n");
-		return 0;
+		return -EINVAL;
 	}
 
 	if (range == 0)
-- 
1.9.1
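
The failure mode described in the commit message can be modelled in
userspace as below. This is a sketch under assumptions: the caller is
modelled as a loop that keeps calling the store function until all bytes
are consumed or an error comes back, and all model_* names plus the
max_calls cap are hypothetical, added only so the example terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical store callback: returns bytes consumed or a negative errno. */
typedef long (*model_store_fn)(const char *buf, size_t count);

/* Old behaviour: nothing consumed and no error reported. */
long model_store_returns_zero(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return 0;
}

/* Patched behaviour: report an error to the caller. */
long model_store_returns_einval(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return -EINVAL;
}

/*
 * Model of a caller that retries until the whole buffer is consumed.
 * Returns the number of store calls made; max_calls caps the loop so the
 * sketch terminates even when the store function makes no progress.
 */
int model_write_all(model_store_fn store, const char *buf, size_t count,
		    int max_calls)
{
	int calls = 0;

	while (count > 0 && calls < max_calls) {
		long ret = store(buf, count);

		calls++;
		if (ret < 0)
			break;		/* error: the caller gives up */
		buf += ret;
		count -= (size_t)ret;
	}

	return calls;
}
```

With the zero-returning variant the loop only stops at the artificial cap,
while the -EINVAL variant stops after a single call.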


Re: [GIT PULL] Staging driver patches for 3.17-rc1

2014-08-14 Thread Andreas Mohr

Hi,

late reply due to fatal environment failure, sorry.

On Mon, Aug 04, 2014 at 09:31:43PM -0700, Greg KH wrote:
> On Tue, Aug 05, 2014 at 06:13:25AM +0200, Andreas Mohr wrote:
> > Oh well, yet another driver where it became more difficult rather than
> > easier to make forward progress.
> 
> If you are willing to do the work, I will gladly revert the patch and
> look forward to patches to fix the remaining changes.

I've now done further device/driver research
and it looks like this course of action was actually best:
the keucr driver was duplicate/improved functionality of the
ene_ub6250.c driver,
with some extended card type support then relocated into ene_ub6250.c.
The card type relocation work that's still missing is SmartMedia,
but then I do have SM cards and would thus be able to test it.
Thus the duplicated keucr driver could probably stay removed,
with the penalty of having SM support lost at the moment
(since one had to switch drivers anyway
it's uncertain whether that's much of a loss).

The driver also seems to be missing several USB IDs,
as seen from IDs in keucr and Windows .inf file.

Also, firmware files are very old compared to content in current Windows driver
(but it remains to be seen whether new binaries remained compatible
with an old unchanged driver handling, especially given that on Windows
the firmware data is usefully(?) shipped right next to it,
within the same binary).

I'd like to definitely mention here
that this particular device might be well worth improving,
since as opposed to many other USB-based readers
it has a custom programming interface
plus external firmware data,
which could mean that we might be able
to add support for CompactFlash SMART readout
and/or SD card Card Information Struct readout,
on an external USB-based(!) reader.

I will have more time in the near future,
thus I should be able to work on some of these items -
after having completed some other pending driver submissions that is :-P

Andreas Mohr

-- 
GNU/Linux. It's not the software that's free, it's you.

[BUG] usb_dev_resume returns -113 due to work items queued by usb on pm_wq is not executed before suspending.

2014-08-14 Thread Du, ChangbinX

Hi, All,
As described in runtime_pm.txt for pm_wq:
The power management work queue pm_wq in which bus types and device
drivers can put their PM-related work items. It is strongly recommended that
pm_wq be used for queuing all work items related to runtime PM, because
this allows them to be synchronized with system-wide power transitions
(suspend to RAM, hibernation and resume from system sleep states).

Per my understanding, all runtime PM related work items queued on pm_wq
should be completed before suspend to RAM, to ensure the device state is
runtime active before suspending. Having checked the PM code, I found this
is not true for work items queued by drivers.

The usb driver uses pm_wq to run a work item that resumes the root hub
(see function xhci_resume()->usb_hcd_resume_root_hub()). But sometimes
this work is not completed before the usb device suspends. That is to say,
the root hub device may still be in the runtime suspend state when system
suspend starts, and this can cause problems. One case is shown in the
error log below:

[  108.046248] PM: Entering mem sleep
[  108.050487] Suspending console(s) (use no_console_suspend to debug)
[  108.426510] active wakeup source: event5-576
[  108.426529] PM: Some devices failed to suspend
[  108.426887] dpm_run_callback(): usb_dev_resume+0x0/0x20 returns -113
[  108.426918] PM: Device 1-2 failed to resume async: error -113
[  108.428299] PM: resume of devices complete after 1.755 msecs

usb_dev_resume() returns error -113 (-EHOSTUNREACH), which means the host
is in the suspended state when resuming a device. The scenario is:
1) Just before system suspend, the PM core runs the hcd runtime resume
  routine if the host is in the runtime suspend state.
2) The hcd runtime resume function xhci_resume() returns, and the root hub
  resume worker has been queued by usb_hcd_resume_root_hub().
3) System suspend continues before the root hub resume worker starts
  executing. Thus the host is still in the runtime suspend state.
4) An event aborts the suspend process before the hcd is suspended. The PM
  core then calls the resume routines for the devices it just suspended.
  But when resuming a usb device it finds the host is suspended, and
  returns error -113.

If my analysis is correct, could you share your ideas for this issue?

Regards and Thanks!
Du, Changbin


Re: [PATCH RFC] time,signal: protect resource use statistics with seqlock

2014-08-14 Thread Rik van Riel

On Thu, 14 Aug 2014 18:12:47 +0200
Oleg Nesterov  wrote:

> Or you can expand the scope of write_seqlock/write_sequnlock, so that
> __unhash_process in called from inside the critical section. This looks
> simpler at first glance.
> 
> Hmm, wait, it seems there is yet another problem ;) Afaics, you also
> need to modify __exit_signal() so that ->sum_sched_runtime/etc are
> accounted unconditionally, even if the group leader exits.

OK, this is what I have now.

I am still getting backwards time sometimes, but only tiny
increments. This suggests that cputime_adjust() may be the
culprit, and I have no good idea on how to fix that yet...

Should task_cputime_adjusted and thread_group_cputime_adjusted
pass in the address of a seqlock to use in case the values in
prev need to be updated?

Should we check whether the values in prev changed during the
time spent in the function?

Is this a race between task_cputime_adjusted and other writers
of signal->utime and signal->stime, instead of task_cputime_adjusted
racing with itself?

I am not sure what the best approach here is...

---8<---

Subject: time,signal: protect resource use statistics with seqlock

Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
issues on large systems, due to both functions being serialized with a
lock.

The lock protects against reporting a wrong value, due to a thread in the
task group exiting, its statistics reporting up to the signal struct, and
that exited task's statistics being counted twice (or not at all).

Protecting that with a lock results in times and clock_gettime being
completely serialized on large systems.

This can be fixed by using a seqlock around the events that gather and
propagate statistics. As an additional benefit, the protection code can
be moved into thread_group_cputime, slightly simplifying the calling
functions.

In the case of posix_cpu_clock_get_task things can be simplified a
lot, because the calling function already ensures tsk sticks around,
and the rest is now taken care of in thread_group_cputime.

This way the statistics reporting code can run lockless.

Signed-off-by: Rik van Riel 
---
 include/linux/sched.h  |  1 +
 kernel/exit.c  | 48 +++---
 kernel/fork.c  |  1 +
 kernel/sched/cputime.c | 36 +++
 kernel/sys.c   |  2 --
 kernel/time/posix-cpu-timers.c | 14 
 6 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 857ba40..91f9209 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -646,6 +646,7 @@ struct signal_struct {
 * Live threads maintain their own counters and add to these
 * in __exit_signal, except for the group leader.
 */
+   seqlock_t stats_lock;
cputime_t utime, stime, cutime, cstime;
cputime_t gtime;
cputime_t cgtime;
diff --git a/kernel/exit.c b/kernel/exit.c
index 32c58f7..c1a0ef2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -115,32 +115,34 @@ static void __exit_signal(struct task_struct *tsk)

if (tsk == sig->curr_target)
sig->curr_target = next_thread(tsk);
-   /*
-* Accumulate here the counters for all threads but the
-* group leader as they die, so they can be added into
-* the process-wide totals when those are taken.
-* The group leader stays around as a zombie as long
-* as there are other threads.  When it gets reaped,
-* the exit.c code will add its counts into these totals.
-* We won't ever get here for the group leader, since it
-* will have been the last reference on the signal_struct.
-*/
-   task_cputime(tsk, &utime, &stime);
-   sig->utime += utime;
-   sig->stime += stime;
-   sig->gtime += task_gtime(tsk);
-   sig->min_flt += tsk->min_flt;
-   sig->maj_flt += tsk->maj_flt;
-   sig->nvcsw += tsk->nvcsw;
-   sig->nivcsw += tsk->nivcsw;
-   sig->inblock += task_io_get_inblock(tsk);
-   sig->oublock += task_io_get_oublock(tsk);
-   task_io_accounting_add(&sig->ioac, &tsk->ioac);
-   sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
}

+   /*
+* Accumulate here the counters for all threads but the
+* group leader as they die, so they can be added into
+* the process-wide totals when those are taken.
+* The group leader stays around as a zombie as long
+* as there are other threads.  When it gets reaped,
+* the exit.c code will add its counts into these totals.
+* We won't ever get here for the group leader, since it
+* will have been the last reference on the signal_struct.
+*/
+

[PATCH 1/1] iommu/vt-d: Add new macros for invalidation event

2014-08-14 Thread Li, Zhen-Hua

According to Intel's spec,
Intel® Virtualization Technology for Directed I/O,
Revision: 1.3, February 2011,
Chapter 10.4.25 to 10.4.28,

there are four registers:

IECTL_REG    0xa0    Invalidation event control register
IEDATA_REG   0xa4    Invalidation event data register
IEADDR_REG   0xa8    Invalidation event address register
IEUADDR_REG  0xac    Invalidation event upper address register

Though they are not used by the latest kernel, their definitions
should be added to the kernel alongside the other registers.

Signed-off-by: Li, Zhen-Hua 
---
 include/linux/intel-iommu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a65208a..15fafd5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -56,6 +56,10 @@
 #define DMAR_IQ_SHIFT	4	/* Invalidation queue head/tail shift */
 #define DMAR_IQA_REG	0x90	/* Invalidation queue addr register */
 #define DMAR_ICS_REG	0x9c	/* Invalidation complete status register */
+#define DMAR_IECTL_REG	0xa0	/* Invalidation event control register */
+#define DMAR_IEDATA_REG	0xa4	/* Invalidation event data register */
+#define DMAR_IEADDR_REG	0xa8	/* Invalidation event address register */
+#define DMAR_IEUADDR_REG 0xac	/* Invalidation event upper address register */
 #define DMAR_IRTA_REG	0xb8	/* Interrupt remapping table addr register */
 
 #define OFFSET_STRIDE  (9)
-- 
2.0.0-rc0
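
As an illustration of how such offsets are typically consumed, here is a
userspace sketch that reads the four registers from a plain byte buffer
standing in for the register block. The model_* helpers are hypothetical,
no real MMIO is performed, and combining IEADDR/IEUADDR into one 64-bit
value is an assumption based only on the register names above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from the patch above. */
#define DMAR_IECTL_REG   0xa0	/* Invalidation event control register */
#define DMAR_IEDATA_REG  0xa4	/* Invalidation event data register */
#define DMAR_IEADDR_REG  0xa8	/* Invalidation event address register */
#define DMAR_IEUADDR_REG 0xac	/* Invalidation event upper address register */

/* Read a 32-bit value at a byte offset in an ordinary buffer. */
uint32_t model_readl(const uint8_t *regs, uint32_t offset)
{
	uint32_t v;

	memcpy(&v, regs + offset, sizeof(v));
	return v;
}

/* Write a 32-bit value at a byte offset in an ordinary buffer. */
void model_writel(uint8_t *regs, uint32_t offset, uint32_t v)
{
	memcpy(regs + offset, &v, sizeof(v));
}

/* Assumption: the upper register holds bits 63:32 of the address. */
uint64_t model_ie_address(const uint8_t *regs)
{
	uint64_t lo = model_readl(regs, DMAR_IEADDR_REG);
	uint64_t hi = model_readl(regs, DMAR_IEUADDR_REG);

	return (hi << 32) | lo;
}
```

The four offsets are consecutive 32-bit slots, so the buffer model only
needs to cover offsets up to 0xaf.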


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Namhyung Kim

Hi Jiri,

2014-08-14 (Thu), 16:10 +0200, Jiri Olsa:
> On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> > However, with --children feature added, it now can show all callees of
> > the entry.  For example, "start_kernel" entry now can display it calls
> > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > 
> >  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
> >|
> > --- start_kernel
> > rest_init
> > cpu_idle
> >|
> >|--97.52%-- cpuidle_idle_call
> >|  cpuidle_enter_tk
> >|  |
> >|  |--99.91%-- cpuidle_wrap_enter
> >|  |  cpuidle_enter
> >|  |  intel_idle
> >|   --0.09%-- [...]
> > --2.48%-- [...]
> > 
> > Note that start_kernel has no self overhead - meaning that it never
> > get sampled by itself but constructs such a nice callgraph.  But,
> > sadly, if an entry has self overhead, callchain will get confused with
> > generated callchain (like above) and self callchains (which reversed
> > order) like the eariler example.
> > 
> > To be consistent with other entries, I'd like to make it just to show
> > a single entry - itself - like below since it doesn't have callees
> > (children) at all.  But still use the whole callchain to construct
> > children entries (like the start_kernel) as usual.
> > 
> > 40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
> > |
> > --- intel_idle
> 
> I understand the consistency point, but I think we'd lose
> useful info by cutting this off
> 
> I guess I can run 'report -g callee' to find out who called intel_idle
> instead.. but I would not need to if the callchain stays here

Yeah, but the current behavior intermixes caller callchains and
callee callchains, which confuses users.  This is a
problem IMHO.

And with --children you can easily see the callers right above the entry,
as they are likely to have the same or higher children overhead.

Thanks,
Namhyung



Re: [PATCH] perf probe: Warn user to rebuild target with debuginfo

2014-08-14 Thread Masami Hiramatsu

Here is the v2 patch, to which I've added "or install an appropriate
debuginfo package." :)

Thank you,

(2014/08/15 10:44), Masami Hiramatsu wrote:
> Warn user to rebuild target with debuginfo when the perf probe
> fails to find debug information in the target binary.
> Without this, perf probe just reports the failure, but it's
> no hint for users. This gives more hint for users.
> 
> Without this,
> 
>   $ strip perf
>   $ ./perf probe -x perf -L argv_split
>   Failed to open debuginfo file.
> Error: Failed to show lines.
> 
> With this,
> 
>   $ strip perf
>   $ ./perf probe -x perf -L argv_split
>   The /home/fedora/ksrc/linux-3/tools/perf/perf file has no debug information.
>   Rebuild with -g, or install an appropriate debuginfo package.
> Error: Failed to show lines.
> 
> The "rebuild with ..." part changes to "rebuild with CONFIG_DEBUG_INFO"
> if the target is the kernel or a kernel module.
> 
> Signed-off-by: Masami Hiramatsu 
> Reported-by: Arnaldo Carvalho de Melo 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: David Ahern 
> Cc: Brendan Gregg 
> ---
>  tools/perf/util/probe-event.c |   41 
> +++--
>  1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 784ea42..9a29c72 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -258,21 +258,33 @@ static void clear_probe_trace_events(struct 
> probe_trace_event *tevs, int ntevs)
>  #ifdef HAVE_DWARF_SUPPORT
>  
>  /* Open new debuginfo of given module */
> -static struct debuginfo *open_debuginfo(const char *module)
> +static struct debuginfo *open_debuginfo(const char *module, bool silent)
>  {
>   const char *path = module;
> + struct debuginfo *ret;
>  
>   if (!module || !strchr(module, '/')) {
>   path = kernel_get_module_path(module);
>   if (!path) {
> - pr_err("Failed to find path of %s module.\n",
> -module ?: "kernel");
> + if (!silent)
> + pr_err("Failed to find path of %s module.\n",
> +module ?: "kernel");
>   return NULL;
>   }
>   }
> - return debuginfo__new(path);
> + ret = debuginfo__new(path);
> + if (!ret && !silent) {
> + pr_warning("The %s file has no debug information.\n", path);
> + if (!module || !strtailcmp(path, ".ko"))
> + pr_warning("Rebuild with CONFIG_DEBUG_INFO=y, ");
> + else
> + pr_warning("Rebuild with -g, ");
> +		pr_warning("or install an appropriate debuginfo package.\n");
> + }
> + return ret;
>  }
>  
> +
>  static int get_text_start_address(const char *exec, unsigned long *address)
>  {
>   Elf *elf;
> @@ -333,15 +345,13 @@ static int find_perf_probe_point_from_dwarf(struct 
> probe_trace_point *tp,
>   pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
>tp->module ? : "kernel");
>  
> - dinfo = open_debuginfo(tp->module);
> + dinfo = open_debuginfo(tp->module, verbose == 0);
>   if (dinfo) {
>   ret = debuginfo__find_probe_point(dinfo,
>(unsigned long)addr, pp);
>   debuginfo__delete(dinfo);
> - } else {
> - pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n", addr);
> + } else
>   ret = -ENOENT;
> - }
>  
>   if (ret > 0) {
>   pp->retprobe = tp->retprobe;
> @@ -457,13 +467,11 @@ static int try_to_find_probe_trace_events(struct 
> perf_probe_event *pev,
>   struct debuginfo *dinfo;
>   int ntevs, ret = 0;
>  
> - dinfo = open_debuginfo(target);
> + dinfo = open_debuginfo(target, !need_dwarf);
>  
>   if (!dinfo) {
> - if (need_dwarf) {
> - pr_warning("Failed to open debuginfo file.\n");
> + if (need_dwarf)
>   return -ENOENT;
> - }
>   pr_debug("Could not open debuginfo. Try to use symbols.\n");
>   return 0;
>   }
> @@ -620,11 +628,9 @@ static int __show_line_range(struct line_range *lr, 
> const char *module)
>   char *tmp;
>  
>   /* Search a line range */
> - dinfo = open_debuginfo(module);
> - if (!dinfo) {
> - pr_warning("Failed to open debuginfo file.\n");
> + dinfo = open_debuginfo(module, false);
> + if (!dinfo)
>   return -ENOENT;
> - }
>  
>   ret = debuginfo__find_line_range(dinfo, lr);
>   debuginfo__delete(dinfo);
> @@ -772,9 +778,8 @@ int show_available_vars(struct perf_probe_event *pevs, 
> int npevs,
>   if (ret < 0)
>   return ret;
>  
> - dinfo = open_debuginfo(module);
> + dinfo = open_debuginfo(module, false);
>   if (!dinfo) {
> -

[PATCH] perf probe: Warn user to rebuild target with debuginfo

2014-08-14 Thread Masami Hiramatsu

Warn the user to rebuild the target with debuginfo when perf probe
fails to find debug information in the target binary.
Without this, perf probe just reports the failure and gives
users no hint about the cause. This gives users a more helpful hint.

Without this,

  $ strip perf
  $ ./perf probe -x perf -L argv_split
  Failed to open debuginfo file.
Error: Failed to show lines.

With this,

  $ strip perf
  $ ./perf probe -x perf -L argv_split
  The /home/fedora/ksrc/linux-3/tools/perf/perf file has no debug information.
  Rebuild with -g, or install an appropriate debuginfo package.
Error: Failed to show lines.

The "rebuild with ..." part changes to "rebuild with CONFIG_DEBUG_INFO"
if the target is the kernel or a kernel module.

Signed-off-by: Masami Hiramatsu 
Reported-by: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: David Ahern 
Cc: Brendan Gregg 
---
 tools/perf/util/probe-event.c |   41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 784ea42..9a29c72 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -258,21 +258,33 @@ static void clear_probe_trace_events(struct probe_trace_event *tevs, int ntevs)
 #ifdef HAVE_DWARF_SUPPORT
 
 /* Open new debuginfo of given module */
-static struct debuginfo *open_debuginfo(const char *module)
+static struct debuginfo *open_debuginfo(const char *module, bool silent)
 {
const char *path = module;
+   struct debuginfo *ret;
 
if (!module || !strchr(module, '/')) {
path = kernel_get_module_path(module);
if (!path) {
-   pr_err("Failed to find path of %s module.\n",
-  module ?: "kernel");
+   if (!silent)
+   pr_err("Failed to find path of %s module.\n",
+  module ?: "kernel");
return NULL;
}
}
-   return debuginfo__new(path);
+   ret = debuginfo__new(path);
+   if (!ret && !silent) {
+   pr_warning("The %s file has no debug information.\n", path);
+   if (!module || !strtailcmp(path, ".ko"))
+   pr_warning("Rebuild with CONFIG_DEBUG_INFO=y, ");
+   else
+   pr_warning("Rebuild with -g, ");
+   pr_warning("or install an appropriate debuginfo package.\n");
+   }
+   return ret;
 }
 
+
 static int get_text_start_address(const char *exec, unsigned long *address)
 {
Elf *elf;
@@ -333,15 +345,13 @@ static int find_perf_probe_point_from_dwarf(struct probe_trace_point *tp,
pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
 tp->module ? : "kernel");
 
-   dinfo = open_debuginfo(tp->module);
+   dinfo = open_debuginfo(tp->module, verbose == 0);
if (dinfo) {
ret = debuginfo__find_probe_point(dinfo,
 (unsigned long)addr, pp);
debuginfo__delete(dinfo);
-   } else {
-   pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n", addr);
+   } else
ret = -ENOENT;
-   }
 
if (ret > 0) {
pp->retprobe = tp->retprobe;
@@ -457,13 +467,11 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
struct debuginfo *dinfo;
int ntevs, ret = 0;
 
-   dinfo = open_debuginfo(target);
+   dinfo = open_debuginfo(target, !need_dwarf);
 
if (!dinfo) {
-   if (need_dwarf) {
-   pr_warning("Failed to open debuginfo file.\n");
+   if (need_dwarf)
return -ENOENT;
-   }
pr_debug("Could not open debuginfo. Try to use symbols.\n");
return 0;
}
@@ -620,11 +628,9 @@ static int __show_line_range(struct line_range *lr, const char *module)
char *tmp;
 
/* Search a line range */
-   dinfo = open_debuginfo(module);
-   if (!dinfo) {
-   pr_warning("Failed to open debuginfo file.\n");
+   dinfo = open_debuginfo(module, false);
+   if (!dinfo)
return -ENOENT;
-   }
 
ret = debuginfo__find_line_range(dinfo, lr);
debuginfo__delete(dinfo);
@@ -772,9 +778,8 @@ int show_available_vars(struct perf_probe_event *pevs, int npevs,
if (ret < 0)
return ret;
 
-   dinfo = open_debuginfo(module);
+   dinfo = open_debuginfo(module, false);
if (!dinfo) {
-   pr_warning("Failed to open debuginfo file.\n");
ret = -ENOENT;
goto out;
}
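
The hint selection described in the commit message (kernel or kernel
module versus a user binary) can be sketched in isolation as follows.
This is an illustrative userspace model, not the perf code: the model_*
names are hypothetical, model_ends_with() stands in for the tail
comparison used in the patch, and a NULL module means the kernel itself,
as in open_debuginfo().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if str ends with suffix. */
bool model_ends_with(const char *str, const char *suffix)
{
	size_t n = strlen(str);
	size_t m = strlen(suffix);

	return n >= m && strcmp(str + n - m, suffix) == 0;
}

/*
 * Pick the first half of the hint. A NULL module (the kernel) or a path
 * ending in ".ko" (a kernel module) gets the CONFIG_DEBUG_INFO advice;
 * anything else is treated as a user binary built with a compiler flag.
 */
const char *model_rebuild_hint(const char *module, const char *path)
{
	if (!module || model_ends_with(path, ".ko"))
		return "Rebuild with CONFIG_DEBUG_INFO=y, ";

	return "Rebuild with -g, ";
}
```

Either result is followed by the same "or install an appropriate
debuginfo package" tail in the patch.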


Re: pull request: bluetooth-next 2014-08-14

2014-08-14 Thread Marcel Holtmann

Hi Stephen,

>> Here's our first bluetooth-next pull request for the 3.18 kernel. Our
>> tree is based on net-next so you'd need to pull from there first before
>> pulling from our tree.
> 
> This is in a branch that will be included in linux-next today.  I guess
> you didn't read my (very often) repeated request that no v3.18 material
> be included until after v3.17-rc1 is released ... Please do not do that.

to be honest I did not know about this rule. Next time, I will create a
special tree for the time during the merge window so that we can continue
pushing out patches for the next release. For us the world is not standing
still just because we have a 2 week merge window.

Regards

Marcel


Re: Re: [PATCH] perf probe: Warn user to rebuild target with debuginfo

2014-08-14 Thread Masami Hiramatsu

(2014/08/15 10:07), Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 14, 2014 at 01:07:28PM -0700, Brendan Gregg escreveu:
>> On Thu, Aug 14, 2014 at 11:29 AM, Masami Hiramatsu
>>  wrote:
>> [...]
>>> The "rebuild with ..." part changes to "rebuild with CONFIG_DEBUG_INFO"
>>> if the target is the kernel or a kernel module.
>  
>> Thanks, definitely an improvement! Should the kernel message also
>> mention kernel debuginfo packages? Depends on the distribution and
>> environment, but I think for some users the solution is to add the
>> package.

I see. At least Fedora/RHEL have debuginfo for all packages, so
we'll need to do that not only for the kernel but also for user
applications.

> Yeah, something like what is suggested by gdb and documented here:
> 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Developer_Guide/intro.debuginfo.html
> 
> 
> In some cases (such as loading a core file), GDB does not know the
> name, version, or release of a name-debuginfo-version-release.rpm
> package; it only knows the build-id. In such cases, GDB suggests a
> different command:
> 
> gdb -c ./core
> [...]
> Missing separate debuginfo for the main executable filename
> Try: yum --disablerepo='*' --enablerepo='*debug*' install 
> /usr/lib/debug/.build-id/ef/dd0b5e69b0742fa5e5bad0771df4d1df2459d1
> -

ah, that's nice :)

> 
> This is something I want to have eventually, i.e. to have per distro
> plugins to automatically download packages required for some features,
> like probing and annotation, for instance.

Yeah, however, it depends on the distro. AFAIK, Ubuntu provides a
debuginfo package only for the kernel. So, at this point, I think
all we can do is say "please install debuginfo package",
as below.

  $ ./perf probe -x perf -L argv_split
  The /home/fedora/ksrc/linux-3/tools/perf/perf file has no debug information, rebuild with -g.
  Or install appropriate debuginfo package.
Error: Failed to show lines.
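
The wording above could be produced by a small helper along the following
lines. This is only a sketch under assumptions: debuginfo_hint() and
enum target_kind are hypothetical names and are not part of perf. Only the
message text follows the output shown above, together with the earlier note
that kernel and module targets get "CONFIG_DEBUG_INFO" instead of "-g".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the probe target (not a perf type). */
enum target_kind { TARGET_USER_BINARY, TARGET_KERNEL_OR_MODULE };

/*
 * Format the "no debug information" hint into buf.
 * Returns what snprintf() returns: the length the full message needs.
 */
int debuginfo_hint(char *buf, size_t len, const char *path,
		   enum target_kind kind)
{
	/* Kernel and module targets are rebuilt via a config option. */
	const char *how = (kind == TARGET_KERNEL_OR_MODULE) ?
			  "CONFIG_DEBUG_INFO" : "-g";

	assert(buf != NULL && path != NULL);
	return snprintf(buf, len,
			"The %s file has no debug information, rebuild with %s.\n"
			"Or install appropriate debuginfo package.\n",
			path, how);
}
```

A caller would pass the resolved target path and print the buffer through
whatever warning routine it already uses.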

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com



Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

2014-08-14 Thread Andy Lutomirski

On Thu, Aug 14, 2014 at 6:21 PM, Liu, Chuansheng
 wrote:
>
>
>> -Original Message-
>> From: Andy Lutomirski [mailto:l...@amacapital.net]
>> Sent: Friday, August 15, 2014 5:23 AM
>> To: Peter Zijlstra
>> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
>> linux...@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
>> Chakravarty, Souvik K
>> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
>> back to DEFAULT
>>
>> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra 
>> wrote:
>> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
>> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
>> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> >> >>
>> >> >> So seeing how you're from @intel.com I'm assuming you're using x86
>> here.
>> >> >>
>> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
>> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> >> >>
>> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
>> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() 
>> >> >> which
>> >> >> should pick a new C state.
>> >> >>
>> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
>> >> >
>> >> > I think the issue is related to the poll_idle state, in
>> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
>> >> > It is a bit confusing because this state is not listed in the acpi /
>> >> > intel idle driver but inserted implicitly at the beginning of the idle
>> >> > table by the cpuidle framework when the driver is registered.
>> >> >
>> >> > static int poll_idle(struct cpuidle_device *dev,
>> >> > struct cpuidle_driver *drv, int index)
>> >> > {
>> >> > local_irq_enable();
>> >> > if (!current_set_polling_and_test()) {
>> >> > while (!need_resched())
>> >> > cpu_relax();
>> >> > }
>> >> > current_clr_polling();
>> >> >
>> >> > return index;
>> >> > }
>> >>
>> >> As the most recent person to have modified this function, and as an
>> >> avowed hater of pointless IPIs, let me ask a rather different question:
>> >> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
>> >> supports the polling idle interface :)
>> >>
>> >> Can't you just do:
>> >>
>> >> if (set_nr_if_polling(rq->idle)) {
>> >>   trace_sched_wake_idle_without_ipi(cpu);
>> >> } else {
>> >>   spin_lock_irqsave(&rq->lock, flags);
>> >>   if (rq->curr == rq->idle)
>> >>       smp_send_reschedule(cpu);
>> >>   // else the CPU wasn't idle; nothing to do
>> >>   raw_spin_unlock_irqrestore(&rq->lock, flags);
>> >> }
>> >>
>> >> In the common case (wake from C0, i.e. polling idle), this will skip the
>> >> IPI entirely unless you race with idle entry/exit, saving a few more
>> >> precious electrons and all of the latency involved in poking the APIC
>> >> registers.
>> >
>> > They could and they probably should, but that logic should _not_ live in
>> > the cpuidle driver.
>>
>> Sure.  My point is that fixing the IPI handler is, I think, totally
>> bogus, because the IPI API isn't the right way to do this at all.
>>
>> It would be straightforward to add a new function wake_if_idle(int
>> cpu) to sched/core.c.
>>
> Thanks for the suggestion, Andy and Peter. It will save some IPIs in case
> the cores are not idle.

This isn't quite right.  Using the polling interface correctly will
save IPIs in case the core *is* idle.  But, given that you are trying
to upgrade the chosen idle state, I don't think you need to kick
non-idle CPUs at all, and my example contains that optimization.

Presumably the function should be named something like wake_up_if_idle.

>
> There is a similar API in sched/core.c, wake_up_idle_cpu(), so we just
> need to add one new common SMP API:
>
> void smp_wake_up_cpus(void)
> {
>         int cpu;
>
>         for_each_online_cpu(cpu)
>                 wake_up_idle_cpu(cpu);
> }
>
> Will try one patch for it.

This will have lots of extra overhead if the cpu is *not* idle.  I
think my example will be a lot more efficient.
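
To make the three cases concrete (polling idle, deeper idle, busy), here is
a rough userspace model of the idea. It is a sketch under assumptions, not
kernel code: struct cpu_model, the MODEL_* flag bits and both function names
are hypothetical, C11 atomics stand in for the kernel's cmpxchg on the
thread flags, and the runqueue locking from the example above is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bits modeled on the polling and need-resched thread flags. */
#define MODEL_POLLING      (1u << 0)
#define MODEL_NEED_RESCHED (1u << 1)

/* Hypothetical stand-in for one CPU's runqueue state. */
struct cpu_model {
	atomic_uint idle_flags;   /* flags of this CPU's idle task */
	bool curr_is_idle;        /* models rq->curr == rq->idle */
	unsigned int ipis_sent;   /* counts modeled reschedule IPIs */
};

/*
 * If the idle task is polling, set NEED_RESCHED so its polling loop
 * notices and exits. Returns true when no IPI is needed.
 */
bool model_set_nr_if_polling(atomic_uint *flags)
{
	unsigned int old = atomic_load(flags);

	for (;;) {
		if (!(old & MODEL_POLLING))
			return false;
		if (old & MODEL_NEED_RESCHED)
			return true;
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(flags, &old,
						 old | MODEL_NEED_RESCHED))
			return true;
	}
}

/* Wake the CPU only if it is idle; skip the IPI when it is polling. */
void model_wake_up_if_idle(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	if (model_set_nr_if_polling(&cpu->idle_flags))
		return;             /* polling idle: the flag write suffices */
	if (cpu->curr_is_idle)
		cpu->ipis_sent++;   /* deeper idle state: needs an IPI */
	/* otherwise the CPU was busy; nothing to do */
}
```

In this model a polling CPU costs one atomic update, a CPU in a deeper idle
state costs one IPI, and a busy CPU costs nothing.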

--Andy

RE: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

2014-08-14 Thread Liu, Chuansheng



> -Original Message-
> From: Andy Lutomirski [mailto:l...@amacapital.net]
> Sent: Friday, August 15, 2014 5:23 AM
> To: Peter Zijlstra
> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
> linux...@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
> Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
> 
> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra 
> wrote:
> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >> >>
> >> >> So seeing how you're from @intel.com I'm assuming you're using x86
> here.
> >> >>
> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >> >>
> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> >> should pick a new C state.
> >> >>
> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> >> >
> >> > I think the issue is related to the poll_idle state, in
> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> >> > It is a bit confusing because this state is not listed in the acpi /
> >> > intel idle driver but inserted implicitly at the beginning of the idle
> >> > table by the cpuidle framework when the driver is registered.
> >> >
> >> > static int poll_idle(struct cpuidle_device *dev,
> >> > struct cpuidle_driver *drv, int index)
> >> > {
> >> > local_irq_enable();
> >> > if (!current_set_polling_and_test()) {
> >> > while (!need_resched())
> >> > cpu_relax();
> >> > }
> >> > current_clr_polling();
> >> >
> >> > return index;
> >> > }
> >>
> >> As the most recent person to have modified this function, and as an
> >> avowed hater of pointless IPIs, let me ask a rather different question:
> >> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
> >> supports the polling idle interface :)
> >>
> >> Can't you just do:
> >>
> >> if (set_nr_if_polling(rq->idle)) {
> >>   trace_sched_wake_idle_without_ipi(cpu);
> >> } else {
> >>   spin_lock_irqsave(&rq->lock, flags);
> >>   if (rq->curr == rq->idle)
> >>       smp_send_reschedule(cpu);
> >>   // else the CPU wasn't idle; nothing to do
> >>   raw_spin_unlock_irqrestore(&rq->lock, flags);
> >> }
> >>
> >> In the common case (wake from C0, i.e. polling idle), this will skip the
> >> IPI entirely unless you race with idle entry/exit, saving a few more
> >> precious electrons and all of the latency involved in poking the APIC
> >> registers.
> >
> > They could and they probably should, but that logic should _not_ live in
> > the cpuidle driver.
> 
> Sure.  My point is that fixing the IPI handler is, I think, totally
> bogus, because the IPI API isn't the right way to do this at all.
> 
> It would be straightforward to add a new function wake_if_idle(int
> cpu) to sched/core.c.
> 
Thanks for the suggestion, Andy and Peter. It will save some IPIs in case
the cores are not idle.

There is a similar API in sched/core.c, wake_up_idle_cpu(), so we just
need to add one new common SMP API:

void smp_wake_up_cpus(void)
{
        int cpu;

        for_each_online_cpu(cpu)
                wake_up_idle_cpu(cpu);
}

Will try one patch for it.



Re: [PATCH] perf probe: Warn user to rebuild target with debuginfo

2014-08-14 Thread Arnaldo Carvalho de Melo

Em Thu, Aug 14, 2014 at 01:07:28PM -0700, Brendan Gregg escreveu:
> On Thu, Aug 14, 2014 at 11:29 AM, Masami Hiramatsu
>  wrote:
> [...]
> > The "rebuild with ..." part changes to "rebuild with CONFIG_DEBUG_INFO"
> > if the target is the kernel or a kernel module.
 
> Thanks, definitely an improvement! Should the kernel message also
> mention kernel debuginfo packages? Depends on the distribution and
> environment, but I think for some users the solution is to add the
> package.

Yeah, something like what is suggested by gdb and documented here:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Developer_Guide/intro.debuginfo.html


In some cases (such as loading a core file), GDB does not know the
name, version, or release of a name-debuginfo-version-release.rpm
package; it only knows the build-id. In such cases, GDB suggests a
different command:

gdb -c ./core
[...]
Missing separate debuginfo for the main executable filename
Try: yum --disablerepo='*' --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/ef/dd0b5e69b0742fa5e5bad0771df4d1df2459d1
-

This is something I want to have eventually, i.e. to have per distro
plugins to automatically download packages required for some features,
like probing and annotation, for instance.

E.g:

[acme@zoo ~]$ perf record usleep 1 
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (~615 samples) ]
[acme@zoo ~]$ perf buildid-list
34412145145a7561db0d0063a925c17d9af67a1f [kernel.kallsyms]
efc76c94d401b3b7f0f8ecb893c829d82f10e4b2 /usr/lib64/libc-2.18.so
[acme@zoo ~]$ sudo yum --disablerepo='*' --enablerepo='*debug*' install 
/usr/lib/debug/.build-id/34/412145145a7561db0d0063a925c17d9af67a1f
Loaded plugins: auto-update-debuginfo, langpacks, refresh-packagekit
Resolving Dependencies
--> Running transaction check
---> Package kernel-debuginfo.x86_64 0:3.15.8-200.fc20 will be installed
--> Processing Dependency: kernel-debuginfo-common-x86_64 =
3.15.8-200.fc20 for package: kernel-debuginfo-3.15.8-200.fc20.x86_64
--> Running transaction check
---> Package kernel-debuginfo-common-x86_64.x86_64 0:3.15.8-200.fc20
will be installed
--> Finished Dependency Resolution

Dependencies Resolved


- Arnaldo

Re: [PATCH v3 2/2] x86: Add "make tinyconfig" to configure the tiniest possible kernel

2014-08-14 Thread Luis R. Rodriguez

On Mon, Aug 11, 2014 at 09:36:06PM +0200, Sam Ravnborg wrote:
> On Mon, Aug 11, 2014 at 11:15:21AM -0700, j...@joshtriplett.org wrote:
> > On Fri, Aug 08, 2014 at 06:22:20PM -0700, Linus Torvalds wrote:
> > > On Fri, Aug 8, 2014 at 5:10 PM, Josh Triplett  
> > > wrote:
> > > > Since commit 5d2acfc7b974bbd3858b4dd3f2cdc6362dd8843a ("kconfig: make
> > > > allnoconfig disable options behind EMBEDDED and EXPERT") in 3.15-rc1,
> > > > "make allnoconfig" disables every possible config option.
> > > 
> > > May I suggest pushing these through the kbuild tree, rather than
> > > (judging by the people cc'd) the x86 trees?
> > 
> > By all means; v3 seems sufficiently generic that it makes more sense for
> > both of these patches to go in through a kbuild tree, or something else
> > suitably general such as Andrew's tree.
> > 
> > Would someone mind picking it up, please?
> I will take care that they are channeled through kbuild tree.
> 
> We are too late for this merge window so we are not in a hurry.

This went in with a helper which I had originally submitted to
enable xenconfig, but the xenconfig patch went nowhere for some reason.
What tree should I use to rebase, and who should I send this stuff
to? The last series got reviewed but just collected dust after we
agreed on the approach.

  Luis

Re: [PATCH] drm/msm: Fix missing unlock on error in msm_fbdev_create()

2014-08-14 Thread Rob Clark

On Wed, Aug 13, 2014 at 9:01 PM,   wrote:
> From: Wei Yongjun 
>
> Add the missing unlock before return from function msm_fbdev_create()
> in the error handling case.
>
> Signed-off-by: Wei Yongjun 

Thanks, I've got it queued up..

BR,
-R

> ---
>  drivers/gpu/drm/msm/msm_fbdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
> index 9c5221c..ab5bfd2 100644
> --- a/drivers/gpu/drm/msm/msm_fbdev.c
> +++ b/drivers/gpu/drm/msm/msm_fbdev.c
> @@ -143,7 +143,7 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
> ret = msm_gem_get_iova_locked(fbdev->bo, 0, );
> if (ret) {
> dev_err(dev->dev, "failed to get buffer obj iova: %d\n", ret);
> -   goto fail;
> +   goto fail_unlock;
> }
>
> fbi = framebuffer_alloc(0, dev->dev);
>
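
The pattern behind this fix, sketched outside the driver: every error exit
taken while the lock is held must go through a label that releases it. The
code below is only an illustration under assumptions. struct model_lock,
model_create() and the error parameters are hypothetical, and a plain flag
stands in for the real mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that only tracks whether it is held. */
struct model_lock { bool held; };

void model_lock_acquire(struct model_lock *l)
{
	assert(l != NULL && !l->held);
	l->held = true;
}

void model_lock_release(struct model_lock *l)
{
	assert(l != NULL && l->held);
	l->held = false;
}

/*
 * iova_err and alloc_err stand in for the results of two steps that run
 * under the lock; a non-zero value simulates a failure of that step.
 */
int model_create(struct model_lock *lock, int iova_err, int alloc_err)
{
	int ret;

	model_lock_acquire(lock);

	ret = iova_err;
	if (ret)
		goto fail_unlock;   /* a plain "fail" here would leak the lock */

	ret = alloc_err;
	if (ret)
		goto fail_unlock;

	model_lock_release(lock);
	return 0;

fail_unlock:
	model_lock_release(lock);
	/* cleanup that does not need the lock would follow here */
	return ret;
}
```

Whichever step fails, the caller gets the error code back with the lock
released.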

Re: [PATCH v5 1/3] mfd: devicetree: bindings: Add Qualcomm RPM DT binding

2014-08-14 Thread Bjorn Andersson

On Tue 12 Aug 10:43 PDT 2014, Kumar Gala wrote:

> 
> On Aug 11, 2014, at 5:43 PM, Bjorn Andersson  
> wrote:
> 
[...]
> > diff --git a/Documentation/devicetree/bindings/mfd/qcom,rpm.txt 
> > b/Documentation/devicetree/bindings/mfd/qcom,rpm.txt
[...]
> > +- reg:
> > + Usage: required
> > + Value type: 
> > + Definition: two entries specifying the physical address and size of 
> > the
> > + RPM's message ram
> 
> I’m a little confused here, by ‘two entries’ do you mean two values or really 
> two regions? < A B > or < A B C D >?
> 

Hmm, I agree with you, but I'm not sure what the definition of an entry is in
this context and looking around at other bindings makes me wonder even more.

How about we reduce the confusion by dropping the beginning and letting it be
"the physical address and size of..."?

> > +
[...]
> > +
> > +- qcom,ipc:
> > + Usage: required
> > + Value type: 
> > + Definition: three entries specifying:
> > + - phandle to a syscon node representing the apcs registers
> > + - u32 representing offset to the register within the 
> > syscon
> > + - u32 representing the ipc bit within the register
> 
> Can we clarify that this is ipc from Apps (or ARM processors to RPM)
> 

Makes sense

> > +
> > +
> > += SUBDEVICES
> 
> These should not be children of RPM, but in an RPM container outside of the 
> SoC node with some phandle reference (if needed) to the RPM.  The reason is 
> there isn’t really a technical means to translate from the SoC / MMIO bus 
> address space to the RPM address space that these nodes live in.
> 

I don't agree with this; the regulators, clocks and bus-scalers aren't
components on their own but just parts of the RPM. Your argument implies that
every pmic, i2c, spi... block got this wrong; I think we had better follow
the pattern defined by the rest of these.

One thing that I think should be fixed though is to rename "SUBDEVICES" to
"SUBNODES" as "devices" is an implementation detail in my solution.

> Also, the binding specs should be split out into their own files.
> 

Not according to Rob Herring:
https://lkml.org/lkml/2014/3/10/567

> > +
> > +The RPM exposes resources to its subnodes. The below bindings specify the 
> > set
> > +of valid subnodes that can operate on these resources.
> > +
> > +== Switch-mode Power Supply regulator
> > +
[...]
> > +- reg:
> > + Usage: required
> > + Value type: 
> > + Definition: resource as defined in 
> > +
> 
> Can we spec what subset of values in qcom,rpm.h are actually valid?
> 

Yes, sorry about not improving this part.

[...]
> > +- qcom,switch-mode-frequency:
> > + Usage: required
> > + Value type: 
> > + Definition: Frequency (Hz) of the switch-mode power supply;
> > + must be one of:
> > + 1920, 960, 640, 480, 384, 320,
> > + 274, 240, 213, 192, 175, 160,
> > + 148, 137, 128, 120
> > +
> > +- qcom,force-mode-none:
> 
> I think I asked this last time I took a look at this, but can we have 
> multiple force-mode’s set?  If no, maybe this should be an enum instead.
> 

No, they are mutually exclusive. I just like the boolean representation
instead, but I can change it to an enum and provide some constants in the
header file.

Looking through msm-3.4 almost everything seems to be using force mode "none",
with a few uses of "auto". So if we turn it into an enum then we could specify
"none" in the platform dtsi (or not specify anything?) and only have to
override it in the very few places it needs to be anything else.
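
To illustrate the trade-off, here is a rough sketch of how a consumer might
fold the current boolean properties into a single enum while rejecting
conflicting settings. All names are hypothetical: these are not the
constants that would go into the header, and struct model_force_flags only
models "which boolean properties were present in the node".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum; NONE is the default when nothing is specified. */
enum model_force_mode {
	MODEL_FORCE_MODE_NONE = 0,
	MODEL_FORCE_MODE_LPM,
	MODEL_FORCE_MODE_AUTO,
	MODEL_FORCE_MODE_HPM,
	MODEL_FORCE_MODE_BYPASS,
};

/* One flag per qcom,force-mode-* boolean found in the node. */
struct model_force_flags {
	bool none, lpm, automatic, hpm, bypass;
};

/*
 * Returns 0 and stores the selected mode in *out, or -1 if more than one
 * of the mutually exclusive flags is set.
 */
int model_force_mode_from_flags(const struct model_force_flags *f,
				enum model_force_mode *out)
{
	int set;

	assert(f != NULL && out != NULL);

	set = f->none + f->lpm + f->automatic + f->hpm + f->bypass;
	if (set > 1)
		return -1;

	if (f->lpm)
		*out = MODEL_FORCE_MODE_LPM;
	else if (f->automatic)
		*out = MODEL_FORCE_MODE_AUTO;
	else if (f->hpm)
		*out = MODEL_FORCE_MODE_HPM;
	else if (f->bypass)
		*out = MODEL_FORCE_MODE_BYPASS;
	else
		*out = MODEL_FORCE_MODE_NONE;   /* explicit or defaulted */
	return 0;
}
```

With an enum property the conflict check disappears, since a single cell
cannot hold two modes at once.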

> > + Usage: optional (default if no other qcom,force-mode is specified)
> > + Value type: 
> > + Defintion: indicates that the regulator should not be forced to any
> > +particular mode
> > +
> > +- qcom,force-mode-lpm:
> > + Usage: optional
> > + Value type: 
> > + Definition: indicates that the regulator should be forced to operate 
> > in
> > + low-power-mode
> > +
> > +- qcom,force-mode-auto:
> > + Usage: optional (only available for 8960/8064)
> 
> can we say only available for "qcom,rpm-msm8960”, "qcom,rpm-apq8064"
> 

Indeed.

> > + Value type: 
> > + Definition: indicates that the regulator should be automatically pick
> > + operating mode
> > +
> > +- qcom,force-mode-hpm:
> > + Usage: optional (only available for 8960/8064)
> 
> can we say only available for "qcom,rpm-msm8960”, "qcom,rpm-apq8064"
> 

Indeed.

> > + Value type: 
> > + Definition: indicates that the regulator should be forced to operate 
> > in
> > + high-power-mode
> > +
> > +- qcom,force-mode-bypass: (only for 8960/8064)
> > + Usage: optional (only available for 8960/8064)
> 
> can we say only available for "qcom,rpm-msm8960”, "qcom,rpm-apq8064"
> 

Indeed.

> 
> > + Value type:

Re: pull request: bluetooth-next 2014-08-14

2014-08-14 Thread Stephen Rothwell

Hi Johan,

On Fri, 15 Aug 2014 09:55:47 +1000 Stephen Rothwell  
wrote:
>
> On Thu, 14 Aug 2014 17:50:00 +0300 Johan Hedberg  
> wrote:
> >
> > Here's our first bluetooth-next pull request for the 3.18 kernel. Our
> > tree is based on net-next so you'd need to pull from there first before
> > pulling from our tree.
> 
> This is in a branch that will be included in linux-next today.  I guess
> you didn't read my (very often) repeated request that no v3.18 material
> be included until after v3.17-rc1 is released ... Please do not do that.

It has also all been rebased since yesterday ...

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: pull request: bluetooth-next 2014-08-14

2014-08-14 Thread Stephen Rothwell

Hi Johan,

On Thu, 14 Aug 2014 17:50:00 +0300 Johan Hedberg  
wrote:
>
> Here's our first bluetooth-next pull request for the 3.18 kernel. Our
> tree is based on net-next so you'd need to pull from there first before
> pulling from our tree.

This is in a branch that will be included in linux-next today.  I guess
you didn't read my (very often) repeated request that no v3.18 material
be included until after v3.17-rc1 is released ... Please do not do that.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


[GIT PULL] More ACPI and power management updates for 3.17-rc1

2014-08-14 Thread Rafael J. Wysocki

Hi Linus,

Please pull from

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 pm+acpi-3.17-rc1-2

to receive more ACPI and power management updates for v3.17-rc1 with
top-most commit af5b7e84d022fdea373038d831bb4ca2c0e82108

 Merge branch 'pm-tools'

on top of commit 7725131982477b8ffdea143434dcc69f5d90

 Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

These are a couple of regression fixes, cpuidle menu governor
optimizations, fixes for the ACPI processor and battery drivers,
a hibernation fix to avoid problems related to the e820 memory map,
fixes for a few cpufreq drivers and a new version of the suspend
profiling tool analyze_suspend.py.

Specifics:

 - Fix for an ACPI-based device hotplug regression introduced in 3.14
   that causes a kernel panic to trigger when memory hot-remove is
   attempted with CONFIG_ACPI_HOTPLUG_MEMORY unset from Tang Chen.

 - Fix for a cpufreq regression introduced in 3.16 that triggers a
   "sleeping function called from invalid context" bug in
   dev_pm_opp_init_cpufreq_table() from Stephen Boyd.

 - ACPI battery driver fix for a warning message added in 3.16 that
   prints silly stuff sometimes from Mariusz Ceier.

 - Hibernation fix for safer handling of mismatches in the e820 memory
   map between the configurations during image creation and during
   the subsequent restore from Chun-Yi Lee.

 - ACPI processor driver fix to handle CPU hotplug notifications
   correctly during system suspend/resume from Lan Tianyu.

 - Series of four cpuidle menu governor cleanups that also should
   speed it up a bit from Mel Gorman.

 - Fixes for the speedstep-smi, integrator, cpu0 and arm_big_little
   cpufreq drivers from Hans Wennborg, Himangi Saraogi, Markus Pargmann
   and Uwe Kleine-König.

 - Version 3.0 of the analyze_suspend.py suspend profiling tool
   from Todd E Brandt.

Thanks!


---

Hans Wennborg (1):
  cpufreq: speedstep-smi: fix decimal printf specifiers

Himangi Saraogi (1):
  cpufreq: integrator: Use set_cpus_allowed_ptr

Lan Tianyu (1):
  ACPI / processor: Make acpi_cpu_soft_notify() process CPU FROZEN events

Lee, Chun-Yi (1):
  PM / hibernate: avoid unsafe pages in e820 reserved regions

Mariusz Ceier (1):
  ACPI / battery: Fix warning message in acpi_battery_get_state()

Markus Pargmann (1):
  cpufreq: cpu0: Do not print error message when deferring

Mel Gorman (4):
  cpuidle: menu: Use shifts when calculating averages where possible
  cpuidle: menu: Use ktime_to_us instead of reinventing the wheel
  cpuidle: menu: Call nr_iowait_cpu less times
  cpuidle: menu: Lookup CPU runqueues less

Stephen Boyd (1):
  cpufreq: OPP: Avoid sleeping while atomic

Tang Chen (1):
  ACPI / hotplug: Check scan handlers in acpi_scan_hot_remove()

Todd E Brandt (1):
  PM / tools: analyze_suspend.py: update to v3.0

Uwe Kleine-König (1):
  cpufreq: arm_big_little: fix module license spec

---

 drivers/acpi/battery.c               |    2 +-
 drivers/acpi/processor_driver.c      |    1 +
 drivers/acpi/scan.c                  |    3 +-
 drivers/cpufreq/arm_big_little.c     |    5 +
 drivers/cpufreq/arm_big_little_dt.c  |    2 +-
 drivers/cpufreq/cpufreq-cpu0.c       |    2 +-
 drivers/cpufreq/cpufreq_opp.c        |    2 +-
 drivers/cpufreq/integrator-cpufreq.c |   10 +-
 drivers/cpufreq/speedstep-smi.c      |    4 +-
 drivers/cpuidle/governors/menu.c     |   43 +-
 include/linux/sched.h                |    3 +-
 kernel/power/snapshot.c              |   21 +-
 kernel/sched/core.c                  |    7 +
 kernel/sched/proc.c                  |    7 -
 scripts/analyze_suspend.py           | 3817 ++
 15 files changed, 3051 insertions(+), 878 deletions(-)

[GIT] Sparc

2014-08-14 Thread David Miller


Hook up the memfd syscall, and properly claim all PCI resources
discovered when building the PCI device tree.

Please pull, thanks a lot!

The following changes since commit f0094b28f3038936c1985be64dbe83f0e950b671:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2014-08-13 
18:27:40 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git master

for you to fetch changes up to 10cf15e1d1289aa0bf1d26e9f55176b4c7c5c512:

  sparc: Hook up memfd_create system call. (2014-08-13 22:00:09 -0700)


David S. Miller (4):
  sparc64: Expand PCI bridge probing debug logging.
  sparc64: Skip bogus PCI bridge ranges.
  sparc64: Properly claim resources as each PCI bus is probed.
  sparc: Hook up memfd_create system call.

 arch/sparc/include/uapi/asm/unistd.h |  3 ++-
 arch/sparc/kernel/pci.c              | 67 ++-
 arch/sparc/kernel/systbls_32.S       |  2 +-
 arch/sparc/kernel/systbls_64.S       |  4 ++--
 4 files changed, 71 insertions(+), 5 deletions(-)

[GIT] Networking

2014-08-14 Thread David Miller


I'm sending this out, in particular, to get the iwlwifi fix propagated:

1) Fix build due to missing include in i40e driver, from Lucas Tanure.

2) Memory leak in openvswitch port allocation, from Christoph Jaeger.

3) Check DMA mapping errors in myri10ge, from Stanislaw Gruszka.

4) Fix various deadlock scenarios in sunvnet driver, from Sowmini Varadhan.

5) Fix cxgb4i build failures with incompatible Kconfig settings of the
   driver vs. ipv6, from Anish Bhatt.

6) Fix generation of ACK packet timestamps in the presence of TSO which
   will be split up, from Willem de Bruijn.

7) Don't enable sched scan in iwlwifi driver, it causes firmware crashes
   in some revisions.  From Emmanuel Grumbach.

8) Revert a macvlan simplification that causes crashes.

9) Handle RTT calculations properly in the presence of repair'd SKBs,
   from Andrey Vagin.

10) SIT tunnel lookup uses wrong device index in compares, from
    Shmulik Ladkani.

11) Handle MTU reductions in TCP properly for ipv4 mapped ipv6 sockets,
    from Neal Cardwell.

12) Add missing annotations in rhashtable code, from Thomas Graf.

13) Fix false interpretation of two RTOs as being from the same TCP
    loss event in the FRTO code, from Neal Cardwell.

Please pull, thanks a lot!

The following changes since commit f0094b28f3038936c1985be64dbe83f0e950b671:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2014-08-13 
18:27:40 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to a61ebdfdb13a051f707b408d464f63b991aa21e3:

  Merge tag 'master-2014-08-14' of 
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless (2014-08-14 
15:17:37 -0700)



Andreas Ruprecht (1):
  net: ethernet: ibm: ehea: Remove duplicate object from Makefile

Andrey Vagin (1):
  tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)

Anish Bhatt (1):
  libcxgbi/cxgb4i : Fix ipv6 build failure caught with randconfig

Arend van Spriel (2):
  brcmfmac: fix curly brace mistake in brcmf_pcie_handle_mb_data()
  brcmfmac: fix memory leakage in msgbuf

Christoph Jaeger (1):
  openvswitch: Fix memory leak in ovs_vport_alloc() error path

David S. Miller (5):
  Merge branch 'xen-netback-debugfs'
  Merge branch 'xen-netback-synchronization'
  Merge git://git.kernel.org/.../jkirsher/net
  Revert "macvlan: simplify the structure port"
  Merge tag 'master-2014-08-14' of 
git://git.kernel.org/.../linville/wireless

Emmanuel Grumbach (1):
  iwlwifi: mvm: disable scheduled scan to prevent firmware crash

Govindarajulu Varadarajan (1):
  tg3: fix return value in tg3_get_stats64

Hannes Frederic Sowa (1):
  tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle 
logic

Jean Sacren (2):
  e1000e: fix trivial kernel doc typos
  e1000e: delete excessive space character in debug message

Julia Lawall (1):
  i40e: use correct structure type name in sizeof

Libo Chen (1):
  drivers/net/irda/donauboe.c: convert to module_pci_driver

Lucas Tanure (1):
  i40e: Fix missing uapi/linux/dcbnl.h include in i40e_fcoe.c

Maks Naumov (1):
  irda: Fix rd_frame control field initialization in irlap_send_rd_frame()

Michal Simek (1):
  net: xilinx: Remove .owner field for driver

Neal Cardwell (2):
  tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
  tcp: fix ssthresh and undo for consecutive short FRTO episodes

Rickard Strandqvist (2):
  net: wireless: ipw2x00: ipw2200.c: Cleaning up missing null-terminate 
after strncpy call
  i40e: Cleaning up missing null-terminate in conjunction with strncpy

Ronald Wahl (1):
  carl9170: fix sending URBs with wrong type when using full-speed

Shmulik Ladkani (1):
  sit: Fix ipip6_tunnel_lookup device matching criteria

Sowmini Varadhan (3):
  sunvnet: Do not ask for an ACK for every dring transmit
  sunvnet: Do not spin in an infinite loop when vio_ldc_send() returns 
EAGAIN
  sunvnet: Schedule maybe_tx_wakeup() as a tasklet from ldc_rx path

Stanislaw Gruszka (1):
  myri10ge: check for DMA mapping errors

Thomas Graf (4):
  rhashtable: RCU annotations for next pointers
  rhashtable: unexport and make rht_obj() static
  rhashtable: fix annotations for rht_for_each_entry_rcu()
  netlink: Annotate RCU locking for seq_file walker

Tobias Klauser (1):
  net: xgene: Check negative return value of xgene_enet_get_ring_size()

Wei Liu (5):
  xen-netback: fix debugfs write length check
  xen-netback: fix debugfs entry creation
  xen-netback: move NAPI add/remove calls
  xen-netback: don't stop dealloc kthread too early
  xen-netback: remove loop waiting function

Wei Yongjun (1):
  i40e: fix sparse non static symbol warning

Willem de Bruijn (2):
  net-timestamp: fix missing ACK

Re: [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 06:55:35PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
> > usermode execution.  There would be neither a context switch nor a
> > scheduling-clock interrupt to tell TASKS_RCU that the task in question
> > had passed through a quiescent state.  The grace period would therefore
> > extend indefinitely.  This commit therefore makes RCU's dyntick-idle
> > subsystem record the task_struct structure of the task that is running
> > in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
> > then access this information and record a quiescent state on
> > behalf of any CPU running in dyntick-idle usermode.
> >
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  include/linux/init_task.h |  3 ++-
> >  include/linux/sched.h |  2 ++
> >  kernel/rcu/tree.c |  2 ++
> >  kernel/rcu/tree.h |  2 ++
> >  kernel/rcu/tree_plugin.h  | 16 
> >  kernel/rcu/update.c   |  4 +++-
> >  6 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 78715ea7c30c..642828009324 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -128,7 +128,8 @@ extern struct group_info init_groups;
> >  #define INIT_TASK_RCU_TASKS(tsk)   \
> > .rcu_tasks_holdout = false, \
> > .rcu_tasks_holdout_list =   \
> > -   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), \
> > +   .rcu_tasks_idle_cpu = -1,
> >  #else
> >  #define INIT_TASK_RCU_TASKS(tsk)
> >  #endif
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 3cf124389ec7..5fa041f7a034 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1277,6 +1277,7 @@ struct task_struct {
> > unsigned long rcu_tasks_nvcsw;
> > int rcu_tasks_holdout;
> > struct list_head rcu_tasks_holdout_list;
> > +   int rcu_tasks_idle_cpu;
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> >
> >  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> > @@ -2021,6 +2022,7 @@ static inline void rcu_copy_process(struct task_struct *p)
> >  #ifdef CONFIG_TASKS_RCU
> > p->rcu_tasks_holdout = false;
> > INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> > +   p->rcu_tasks_idle_cpu = -1;
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> >  }
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 645a33efc0d4..0d9ee1e4f446 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
> > atomic_inc(&rdtp->dynticks);
> > smp_mb__after_atomic();  /* Force ordering with next sojourn. */
> > WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> > +   rcu_dynticks_task_enter();
> >
> > /*
> >  * It is illegal to enter an extended quiescent state while
> > @@ -642,6 +643,7 @@ void rcu_irq_exit(void)
> >  static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
> >int user)
> >  {
> > +   rcu_dynticks_task_exit();
> > smp_mb__before_atomic();  /* Force ordering w/previous sojourn. */
> > atomic_inc(&rdtp->dynticks);
> > /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 0f69a79c5b7d..37ff593b7725 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -579,6 +579,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> >  static void rcu_bind_gp_kthread(void);
> >  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
> >  static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
> > +static void rcu_dynticks_task_enter(void);
> > +static void rcu_dynticks_task_exit(void);
> >
> >  #endif /* #ifndef RCU_TREE_NONCORE */
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index a86a363ea453..0d8ef5cb1976 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2852,3 +2852,19 @@ static void rcu_bind_gp_kthread(void)
> > set_cpus_allowed_ptr(current, cpumask_of(cpu));
> >  #endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> > +
> > +/* Record the current task on dyntick-idle entry. */
> > +static void rcu_dynticks_task_enter(void)
> > +{
> > +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> > +   ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
> > +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
> 
> Shouldn't we check that the cpu is actually

Re: [PATCH v3 1/3] init / kthread: add module_long_probe_init() and module_long_probe_exit()

2014-08-14 Thread Luis R. Rodriguez

On Wed, Aug 13, 2014 at 07:51:01PM +0200, Oleg Nesterov wrote:
> On 08/12, Luis R. Rodriguez wrote:
> >
> > +/* To be used by modules which can take over 30 seconds at probe */
> 
> Probably the comment should explain that this hack should only be
> used if the driver is buggy and is waiting for a "real fix".
> 
> > +#define module_long_probe_init(initfn) \
> > +   static struct task_struct *__init_thread;   \
> > +   static int _long_probe_##initfn(void *arg)  \
> > +   {   \
> > +   return initfn();\
> > +   }   \
> > +   static inline __init int __long_probe_##initfn(void)\
> > +   {   \
> > +   __init_thread = kthread_run(_long_probe_##initfn,\
> > +   NULL,   \
> > +   #initfn);   \
> > +   if (IS_ERR(__init_thread))  \
> > +   return PTR_ERR(__init_thread);  \
> > +   return 0;   \
> > +   }   \
> > +   module_init(__long_probe_##initfn);
> > +/* To be used by modules that require module_long_probe_init() */
> > +#define module_long_probe_exit(exitfn) \
> > +   static inline void __long_probe_##exitfn(void)  \
> > +   {   \
> > +   exitfn();   \
> > +   if (__init_thread)  \
> > +   kthread_stop(__init_thread);\
> > +   }   \
> 
> exitfn() should be called after kthread_stop(), and only if initfn()
> returns 0. So it should probably do
> 
>   int err = kthread_stop(__init_thread);
>   if (!err)
>   exitfn();

Thanks! I'd keep the check on __init_thread as well, since it can be
ERR_PTR(-ENOMEM), ERR_PTR(-EINTR), or NULL (for whatever other
reason).
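
To make the ordering concrete, here is a rough userspace analogue using
pthreads instead of kthreads. It is only a sketch under assumptions:
pthread_join() stands in for kthread_stop() (both wait for the thread and
hand back its return value), and demo_initfn(), demo_exitfn() and the
long_probe_*() helpers are made-up names, not the macros from the patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a driver's init and exit routines. */
static int demo_init_result;    /* value the fake initfn returns */
static int demo_exit_calls;     /* how many times the fake exitfn ran */

static int demo_initfn(void) { return demo_init_result; }
static void demo_exitfn(void) { demo_exit_calls++; }

static pthread_t init_thread;
static bool init_thread_valid;  /* plays the role of the __init_thread check */

static void *long_probe_thread(void *arg)
{
    (void)arg;
    /* Pass the init result back through the thread's return value. */
    return (void *)(intptr_t)demo_initfn();
}

/* Start the slow init in its own thread and return immediately. */
static int long_probe_init(void)
{
    int err = pthread_create(&init_thread, NULL, long_probe_thread, NULL);

    init_thread_valid = (err == 0);
    return err ? -err : 0;
}

/* Wait for the init thread first; run exitfn only if init succeeded. */
static void long_probe_exit(void)
{
    void *ret;

    if (!init_thread_valid)
        return;
    if (pthread_join(init_thread, &ret) != 0)
        return;
    init_thread_valid = false;
    if ((intptr_t)ret == 0)
        demo_exitfn();
}
```

With demo_init_result set to 0, long_probe_exit() calls demo_exitfn() once;
with a non-zero result it skips the call, which is the behaviour Oleg's
snippet asks for.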

> But there is an additional complication, you can't use __init_thread
> without get_task_struct(),

Can you elaborate on why? kthread_stop() uses get_task_struct(),
wake_up_process() and finally put_task_struct(), and we're the
only user of this thread. Also, kthread_run() ensures wake_up_process()
gets called on startup, so I'm not sure where the race would be, provided
that all users are the ones here, going through the respective helpers
in buggy drivers.

> so  __long_probe_##initfn() can't use
> kthread_run(). It needs kthread_create() + get_task_struct() + wakeup.

I fail to see why we'd need to add get_task_struct() in
module_long_probe_init(); can you clarify?

  Luis

Re: [PATCH v11] NVMe: Convert to blk-mq

2014-08-14 Thread Keith Busch


On Thu, 14 Aug 2014, Jens Axboe wrote:

> nr_tags must be uninitialized or screwed up somehow, otherwise I don't
> see how that kmalloc() could warn on being too large. Keith, are you
> running with slab debugging? Matias, might be worth trying.


The allocation and freeing of blk-mq parts seems a bit asymmetrical
to me. The 'tags' belong to the tagset, but any request_queue using
that tagset may free the tags. I looked into separating the tag allocation
concerns, but that needs more time than I have, so here is my quick-fix
driver patch, which forces tag access through the hw_ctx.
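
The idea can be sketched in plain C, outside the kernel: rather than caching
a tags pointer in the driver's queue when the tagset is allocated, the queue
keeps a pointer to its hardware context and looks the tags up through it on
every use. The struct and function names below are simplified stand-ins for
illustration, not the blk-mq definitions.

```c
#include <stddef.h>

struct demo_tags { unsigned int nr_tags; };
struct demo_hw_ctx { struct demo_tags *tags; };
struct demo_queue { struct demo_hw_ctx *hctx; };

/*
 * Look the tags up through the hardware context each time, so the queue
 * never holds a private copy of the pointer that could go stale when the
 * owner of the tags replaces or frees them.
 */
static struct demo_tags *demo_queue_tags(const struct demo_queue *q)
{
    if (!q->hctx)
        return NULL;
    return q->hctx->tags;
}
```

If the owner swaps or clears hctx->tags, the next demo_queue_tags() call
observes the change, which a cached copy in the queue would not.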

---
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 384dc91..91432d2 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -109,7 +109,7 @@ struct nvme_queue {
u8 cqe_seen;
u8 q_suspended;
struct async_cmd_info cmdinfo;
-   struct blk_mq_tags *tags;
+   struct blk_mq_hw_ctx *hctx;
 };

 /*
@@ -148,6 +148,7 @@ static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
struct nvme_queue *nvmeq = dev->queues[0];

hctx->driver_data = nvmeq;
+   nvmeq->hctx = hctx;
return 0;
 }

@@ -174,6 +175,7 @@ static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
irq_set_affinity_hint(dev->entry[nvmeq->cq_vector].vector,
hctx->cpumask);
hctx->driver_data = nvmeq;
+   nvmeq->hctx = hctx;
return 0;
 }

@@ -280,8 +282,7 @@ static void async_completion(struct nvme_queue *nvmeq, void *ctx,
 static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,
  unsigned int tag)
 {
-   struct request *req = blk_mq_tag_to_rq(nvmeq->tags, tag);
-
+   struct request *req = blk_mq_tag_to_rq(nvmeq->hctx->tags, tag);
return blk_mq_rq_to_pdu(req);
 }

@@ -654,8 +655,6 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
nvme_submit_flush(nvmeq, ns, req->tag);
else
nvme_submit_iod(nvmeq, iod, ns);
-
- queued:
nvme_process_cq(nvmeq);
spin_unlock_irq(&nvmeq->q_lock);
return BLK_MQ_RQ_QUEUE_OK;
@@ -1051,9 +1050,8 @@ static void nvme_cancel_queue_ios(void *data, unsigned long *tag_map)
if (tag >= qdepth)
break;

-   req = blk_mq_tag_to_rq(nvmeq->tags, tag++);
+   req = blk_mq_tag_to_rq(nvmeq->hctx->tags, tag++);
cmd = blk_mq_rq_to_pdu(req);
if (cmd->ctx == CMD_CTX_CANCELLED)
continue;

@@ -1132,8 +1130,8 @@ static void nvme_clear_queue(struct nvme_queue *nvmeq)
 {
spin_lock_irq(&nvmeq->q_lock);
nvme_process_cq(nvmeq);
-   if (nvmeq->tags)
-   blk_mq_tag_busy_iter(nvmeq->tags, nvme_cancel_queue_ios, nvmeq);
+   if (nvmeq->hctx->tags)
+   blk_mq_tag_busy_iter(nvmeq->hctx->tags, nvme_cancel_queue_ios, nvmeq);
spin_unlock_irq(&nvmeq->q_lock);
 }

@@ -1353,8 +1351,6 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
if (blk_mq_alloc_tag_set(&dev->admin_tagset))
return -ENOMEM;

-   dev->queues[0]->tags = dev->admin_tagset.tags[0];
-
dev->admin_q = blk_mq_init_queue(&dev->admin_tagset);
if (!dev->admin_q) {
blk_mq_free_tag_set(&dev->admin_tagset);
@@ -2055,9 +2051,6 @@ static int nvme_dev_add(struct nvme_dev *dev)
if (blk_mq_alloc_tag_set(&dev->tagset))
goto out;

-   for (i = 1; i < dev->online_queues; i++)
-   dev->queues[i]->tags = dev->tagset.tags[i - 1];
-
id_ns = mem;
for (i = 1; i <= nn; i++) {
res = nvme_identify(dev, i, 0, dma_addr);

[PATCH V3] hwmon, k10temp: Add support for F15h M60h

2014-08-14 Thread Aravind Gopalakrishnan

This patch adds temperature monitoring support for the F15h M60h processor.
 - Add a new PCI device ID for the relevant processor
 - The functionality of REG_REPORTED_TEMPERATURE is moved to
   D0F0xBC_xD820_0CA4 [Reported Temperature Control]
   - So, use this to get the CUR_TEMP value
   - Since we need an indirect register access, protect this with
     a mutex lock
 - Add Kconfig, Doc entries to indicate support for this processor.
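
The indirect access and the temperature conversion can be modelled in
userspace roughly as follows. This is a sketch under assumptions: the
fake_cfg_*() helpers simulate the index (0xb8) and data (0xbc) registers
with a canned value, a pthread mutex stands in for the kernel mutex, and
all demo names are hypothetical. The conversion itself mirrors the
(regval >> 21) * 125 expression in show_temp().

```c
#include <pthread.h>
#include <stdint.h>

#define SMU_INDEX_REG 0xb8u               /* index register written first */
#define SMU_DATA_REG  0xbcu               /* data register read afterwards */
#define DEMO_TEMP_CTRL_OFFSET 0xd8200ca4u /* offset used by the patch */

/* Simulated hardware: remembers the last index that was written. */
static uint32_t fake_index;

static void fake_cfg_write(unsigned int reg, uint32_t val)
{
    if (reg == SMU_INDEX_REG)
        fake_index = val;
}

static uint32_t fake_cfg_read(unsigned int reg)
{
    if (reg != SMU_DATA_REG)
        return 0;
    /* Canned raw value: 436 in bits 31:21, which is 436 * 0.125 = 54.5 C. */
    return fake_index == DEMO_TEMP_CTRL_OFFSET ? 0x36800000u : 0;
}

static pthread_mutex_t smu_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * The index write and the data read must not be interleaved with another
 * caller's index write, hence the lock around the pair.
 */
static uint32_t demo_smu_index_read(uint32_t offset)
{
    uint32_t val;

    pthread_mutex_lock(&smu_lock);
    fake_cfg_write(SMU_INDEX_REG, offset);
    val = fake_cfg_read(SMU_DATA_REG);
    pthread_mutex_unlock(&smu_lock);
    return val;
}

/* Bits 31:21 hold the temperature in 0.125 C steps; return millidegrees. */
static unsigned int demo_regval_to_millicelsius(uint32_t regval)
{
    return (regval >> 21) * 125;
}
```

For the canned value, demo_regval_to_millicelsius(demo_smu_index_read(
DEMO_TEMP_CTRL_OFFSET)) yields 54500, i.e. 54.5 degrees Celsius in the
millidegree units that hwmon reports.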

Signed-off-by: Aravind Gopalakrishnan 
---
Changes in V3:
 - Move helper function that protects indirect register access locally
   until a time when others outside k10temp may need it
   
Changes in V2:
 - Prevent race with other code that may require indirect NB_SMU_REG access
 - Fix some minor style issues

 Documentation/hwmon/k10temp |  2 +-
 drivers/hwmon/Kconfig   |  4 ++--
 drivers/hwmon/k10temp.c | 35 ---
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/Documentation/hwmon/k10temp b/Documentation/hwmon/k10temp
index ee6d30e..254d2f5 100644
--- a/Documentation/hwmon/k10temp
+++ b/Documentation/hwmon/k10temp
@@ -11,7 +11,7 @@ Supported chips:
   Socket S1G2: Athlon (X2), Sempron (X2), Turion X2 (Ultra)
 * AMD Family 12h processors: "Llano" (E2/A4/A6/A8-Series)
 * AMD Family 14h processors: "Brazos" (C/E/G/Z-Series)
-* AMD Family 15h processors: "Bulldozer" (FX-Series), "Trinity", "Kaveri"
+* AMD Family 15h processors: "Bulldozer" (FX-Series), "Trinity", "Kaveri", "Carrizo"
 * AMD Family 16h processors: "Kabini", "Mullins"
 
   Prefix: 'k10temp'
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 02d3d85..57ba400 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -280,8 +280,8 @@ config SENSORS_K10TEMP
  If you say yes here you get support for the temperature
  sensor(s) inside your CPU. Supported are later revisions of
  the AMD Family 10h and all revisions of the AMD Family 11h,
- 12h (Llano), 14h (Brazos), 15h (Bulldozer/Trinity/Kaveri) and
- 16h (Kabini/Mullins) microarchitectures.
+ 12h (Llano), 14h (Brazos), 15h (Bulldozer/Trinity/Kaveri/Carrizo)
+ and 16h (Kabini/Mullins) microarchitectures.
 
  This driver can also be built as a module.  If so, the module
  will be called k10temp.
diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index f7b46f6..36ea152 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -33,6 +33,9 @@ static bool force;
 module_param(force, bool, 0444);
 MODULE_PARM_DESC(force, "force loading on processors with erratum 319");
 
+/* Provide lock for writing to NB_SMU_IND_ADDR */
+DEFINE_MUTEX(nb_smu_ind_mutex);
+
 /* CPUID function 0x80000001, ebx */
 #define CPUID_PKGTYPE_MASK 0xf0000000
 #define CPUID_PKGTYPE_F 0x00000000
@@ -51,13 +54,38 @@ MODULE_PARM_DESC(force, "force loading on processors with erratum 319");
 #define REG_NORTHBRIDGE_CAPABILITIES   0xe8
 #define  NB_CAP_HTC 0x00000400
 
+/*
+ * For F15h M60h, functionality of REG_REPORTED_TEMPERATURE
+ * has been moved to D0F0xBC_xD820_0CA4 [Reported Temperature
+ * Control]
+ */
+#define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET 0xd8200ca4
+#define PCI_DEVICE_ID_AMD_15H_M60H_NB_F3   0x1573
+
+void amd_nb_smu_index_read(struct pci_dev *pdev, unsigned int devfn,
+  int offset, u32 *val)
+{
+   mutex_lock(&nb_smu_ind_mutex);
+   pci_bus_write_config_dword(pdev->bus, devfn,
+  0xb8, offset);
+   pci_bus_read_config_dword(pdev->bus, devfn,
+ 0xbc, val);
+   mutex_unlock(&nb_smu_ind_mutex);
+}
+
 static ssize_t show_temp(struct device *dev,
 struct device_attribute *attr, char *buf)
 {
u32 regval;
-
-   pci_read_config_dword(to_pci_dev(dev),
- REG_REPORTED_TEMPERATURE, &regval);
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model == 0x60) {
+   amd_nb_smu_index_read(pdev, PCI_DEVFN(0, 0),
+ F15H_M60H_REPORTED_TEMP_CTRL_OFFSET,
+ &regval);
+   } else {
+   pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, &regval);
+   }
return sprintf(buf, "%u\n", (regval >> 21) * 125);
 }
 
@@ -211,6 +239,7 @@ static const struct pci_device_id k10temp_id_table[] = {
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_15H_NB_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_15H_M10H_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_15H_M30H_NB_F3) },
+   { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_15H_M60H_NB_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
{}
-- 
2.0.3


[PATCH] at76c50x-usb: fix use after free on failure path in at76_probe()

2014-08-14 Thread Alexey Khoroshilov

After commit 174beab7d445 ("at76c50x-usb: Don't perform DMA from stack memory"),
both at76_delete_device() and usb_put_dev() are called
if at76_init_new_device() fails in at76_probe().
But at76_delete_device() itself calls usb_put_dev(priv->udev),
so usb_put_dev() ends up being called twice.

The patch avoids the problem by moving usb_put_dev() from
at76_delete_device() to at76_disconnect().
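
The reference-count balance this aims for can be modelled in userspace as
below. It is a sketch with made-up names (demo_udev, demo_priv and the
demo_*() helpers); a plain integer stands in for the USB device reference
count. Because the delete helper in this model frees priv, the disconnect
path copies the udev pointer first; that detail is a precaution of the
sketch, not something taken from the patch.

```c
#include <stdlib.h>

struct demo_udev { int refs; };                /* stand-in for a refcounted device */
struct demo_priv { struct demo_udev *udev; };  /* stand-in for driver private data */

/* Frees the private data only; it no longer drops the device reference. */
static void demo_delete_device(struct demo_priv *priv)
{
    free(priv);
}

/* Takes one reference; drops exactly one again if initialisation fails. */
static struct demo_priv *demo_probe(struct demo_udev *udev, int init_fails)
{
    struct demo_priv *priv = malloc(sizeof(*priv));

    if (!priv)
        return NULL;
    udev->refs++;                 /* analogue of usb_get_dev() */
    priv->udev = udev;
    if (init_fails) {
        demo_delete_device(priv);
        udev->refs--;             /* the single put on the error path */
        return NULL;
    }
    return priv;
}

/* Normal teardown: delete the device, then drop the reference once. */
static void demo_disconnect(struct demo_priv *priv)
{
    struct demo_udev *udev = priv->udev;   /* saved before priv is freed */

    demo_delete_device(priv);
    udev->refs--;                 /* analogue of usb_put_dev() */
}
```

Both the failed-probe path and the probe-then-disconnect path leave the
count where it started, which is the invariant the double put broke.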

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/wireless/at76c50x-usb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/at76c50x-usb.c b/drivers/net/wireless/at76c50x-usb.c
index 334c2ece855a..da92bfa76b7c 100644
--- a/drivers/net/wireless/at76c50x-usb.c
+++ b/drivers/net/wireless/at76c50x-usb.c
@@ -2423,8 +2423,6 @@ static void at76_delete_device(struct at76_priv *priv)
 
kfree_skb(priv->rx_skb);
 
-   usb_put_dev(priv->udev);
-
at76_dbg(DBG_PROC_ENTRY, "%s: before freeing priv/ieee80211_hw",
 __func__);
ieee80211_free_hw(priv->hw);
@@ -2558,6 +2556,7 @@ static void at76_disconnect(struct usb_interface *interface)
 
wiphy_info(priv->hw->wiphy, "disconnecting\n");
at76_delete_device(priv);
+   usb_put_dev(priv->udev);
dev_info(&interface->dev, "disconnected\n");
 }
 
-- 
1.9.1


[PATCH] pinctrl: qcom: apq8064: Correct interrupts in example

2014-08-14 Thread Bjorn Andersson

The example in the binding document indicates that interrupt 32 is used
for the TLMM summary IRQ. Correct this to 16 to reduce confusion.

Signed-off-by: Bjorn Andersson 
---
 .../bindings/pinctrl/qcom,apq8064-pinctrl.txt  |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
index 0211c6d..92fae82 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
@@ -62,7 +62,7 @@ Example:
#gpio-cells = <2>;
interrupt-controller;
#interrupt-cells = <2>;
-   interrupts = <0 32 0x4>;
+   interrupts = <0 16 0x4>;
 
pinctrl-names = "default";
pinctrl-0 = <_uart_default>;
-- 
1.7.9.5


Re: [PATCH v5 tip/core/rcu 12/16] rcu: Make TASKS_RCU handle nohz_full= CPUs

2014-08-14 Thread Pranith Kumar

On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
 wrote:
> From: "Paul E. McKenney" 
>
> Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
> usermode execution.  There would be neither a context switch nor a
> scheduling-clock interrupt to tell TASKS_RCU that the task in question
> had passed through a quiescent state.  The grace period would therefore
> extend indefinitely.  This commit therefore makes RCU's dyntick-idle
> subsystem record the task_struct structure of the task that is running
> in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
> then access this information and record a quiescent state on
> behalf of any CPU running in dyntick-idle usermode.
>
> Signed-off-by: Paul E. McKenney 
> ---
>  include/linux/init_task.h |  3 ++-
>  include/linux/sched.h |  2 ++
>  kernel/rcu/tree.c |  2 ++
>  kernel/rcu/tree.h |  2 ++
>  kernel/rcu/tree_plugin.h  | 16 
>  kernel/rcu/update.c   |  4 +++-
>  6 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 78715ea7c30c..642828009324 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -128,7 +128,8 @@ extern struct group_info init_groups;
>  #define INIT_TASK_RCU_TASKS(tsk)   \
> .rcu_tasks_holdout = false, \
> .rcu_tasks_holdout_list =   \
> -   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list), \
> +   .rcu_tasks_idle_cpu = -1,
>  #else
>  #define INIT_TASK_RCU_TASKS(tsk)
>  #endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3cf124389ec7..5fa041f7a034 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1277,6 +1277,7 @@ struct task_struct {
> unsigned long rcu_tasks_nvcsw;
> int rcu_tasks_holdout;
> struct list_head rcu_tasks_holdout_list;
> +   int rcu_tasks_idle_cpu;
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> @@ -2021,6 +2022,7 @@ static inline void rcu_copy_process(struct task_struct *p)
>  #ifdef CONFIG_TASKS_RCU
> p->rcu_tasks_holdout = false;
> INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> +   p->rcu_tasks_idle_cpu = -1;
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>  }
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 645a33efc0d4..0d9ee1e4f446 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -526,6 +526,7 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
> atomic_inc(&rdtp->dynticks);
> smp_mb__after_atomic();  /* Force ordering with next sojourn. */
> WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> +   rcu_dynticks_task_enter();
>
> /*
>  * It is illegal to enter an extended quiescent state while
> @@ -642,6 +643,7 @@ void rcu_irq_exit(void)
>  static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
>int user)
>  {
> +   rcu_dynticks_task_exit();
> smp_mb__before_atomic();  /* Force ordering w/previous sojourn. */
> atomic_inc(&rdtp->dynticks);
> /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 0f69a79c5b7d..37ff593b7725 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -579,6 +579,8 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
>  static void rcu_bind_gp_kthread(void);
>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
>  static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
> +static void rcu_dynticks_task_enter(void);
> +static void rcu_dynticks_task_exit(void);
>
>  #endif /* #ifndef RCU_TREE_NONCORE */
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index a86a363ea453..0d8ef5cb1976 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2852,3 +2852,19 @@ static void rcu_bind_gp_kthread(void)
> set_cpus_allowed_ptr(current, cpumask_of(cpu));
>  #endif /* #ifdef CONFIG_NO_HZ_FULL */
>  }
> +
> +/* Record the current task on dyntick-idle entry. */
> +static void rcu_dynticks_task_enter(void)
> +{
> +#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
> +   ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
> +#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */

Shouldn't we check that the CPU is actually a nohz_full CPU, as follows:

 static void rcu_dynticks_task_enter(void)
 {
 #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
-   ACCESS_ONCE(current->rcu_tasks_idle_cpu) = smp_processor_id();
+   if (tick_nohz_full_cpu(smp_processor_id()))
+

Re: [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks()

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 06:28:53PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > It is expected that many sites will have CONFIG_TASKS_RCU=y, but
> > will never actually invoke call_rcu_tasks().  For such sites, creating
> > rcu_tasks_kthread() at boot is wasteful.  This commit therefore defers
> > creation of this kthread until the time of the first call_rcu_tasks().
> >
> > This of course means that the first call_rcu_tasks() must be invoked
> > from process context after the scheduler is fully operational.
> >
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  kernel/rcu/update.c | 33 ++---
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 1256a900cd01..d997163c7e92 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -378,7 +378,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu);
> >  static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> >
> > -/* Post an RCU-tasks callback. */
> > +static void rcu_spawn_tasks_kthread(void);
> > +
> > +/*
> > + * Post an RCU-tasks callback.  First call must be from process context
> > + * after the scheduler is fully operational.
> > + */
> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> >  {
> > unsigned long flags;
> > @@ -391,8 +396,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
> > *rcu_tasks_cbs_tail = rhp;
> > rcu_tasks_cbs_tail = &rhp->next;
> > raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > -   if (needwake)
> > +   if (needwake) {
> > +   rcu_spawn_tasks_kthread();
> > wake_up(&rcu_tasks_cbs_wq);
> > +   }
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
> >
> > @@ -618,15 +625,27 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > }
> >  }
> >
> > -/* Spawn rcu_tasks_kthread() at boot time. */
> > -static int __init rcu_spawn_tasks_kthread(void)
> > +/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
> > +static void rcu_spawn_tasks_kthread(void)
> >  {
> > -   struct task_struct __maybe_unused *t;
> > +   static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
> > +   static struct task_struct *rcu_tasks_kthread_ptr;
> > +   struct task_struct *t;
> >
> > +   if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
> > +   smp_mb(); /* Ensure caller sees full kthread. */
> > +   return;
> > +   }
> 
> I don't see the need for this smp_mb(). The caller has already seen
> that rcu_tasks_kthread_ptr is assigned. What are we ensuring with this
> barrier again?

We are ensuring that any later operations on rcu_tasks_kthread_ptr
see a fully initialized thread.  Because these later operations
might be loads, we cannot rely on control dependencies.

> an smp_rmb() before this ACCESS_ONCE() and an smp_wmb() after
> assigning to rcu_tasks_kthread_ptr should be enough, right?

Probably.  But given that rcu_spawn_tasks_kthread() is only called
when a CPU is onlined, I am not much inclined to weaken it.

> > +   mutex_lock(&rcu_tasks_kthread_mutex);
> > +   if (rcu_tasks_kthread_ptr) {
> > +   mutex_unlock(&rcu_tasks_kthread_mutex);
> > +   return;
> > +   }
> > t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
> > BUG_ON(IS_ERR(t));
> > -   return 0;
> > +   smp_mb(); /* Ensure others see full kthread. */
> > +   ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
> 
> Isn't it better to reverse these two statements and change as follows?
> 
> ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
> smp_wmb();

This would break.  We need all the task creation stuff to be seen as
having happened before the store to rcu_tasks_kthread_ptr.  Putting
the barrier after the store to rcu_tasks_kthread_ptr would allow
both compiler and CPU to reorder task-creation stuff to follow the
store to the pointer, which would not be good.

> or
> 
> smp_store_release(rcu_tasks_kthread_ptr, t);
> 
> will ensure that this write to rcu_tasks_kthread_ptr is ordered with
> the previous read. I recently read memory-barriers.txt, so please
> excuse me if I am totally wrong. But I am confused! :(

Hmmm...  An smp_store_release() combined with smp_load_acquire()
up earlier might be a good approach.  Maybe as a future cleanup.

But please note that smp_store_release() puts the barrier -before-
the store.  ;-)
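
For illustration only, the publish-once pattern under discussion looks
roughly like this in userspace C11. This is an analogue, not the kernel
code: a release store plays the part of smp_store_release() (everything
written before it is visible to anyone who observes the pointer), an acquire
load plays the part of smp_load_acquire(), a pthread mutex replaces the
kernel mutex, and the helper_* names are made up.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct helper { int initialized; };

static struct helper helper_storage;
static _Atomic(struct helper *) helper_ptr;   /* NULL until published */
static pthread_mutex_t helper_lock = PTHREAD_MUTEX_INITIALIZER;
static int helper_spawn_count;                /* how often setup really ran */

static struct helper *helper_get(void)
{
    /* Fast path: the acquire load pairs with the release store below. */
    struct helper *h = atomic_load_explicit(&helper_ptr,
                                            memory_order_acquire);

    if (h)
        return h;

    pthread_mutex_lock(&helper_lock);
    /* Re-check under the lock; another caller may have won the race. */
    h = atomic_load_explicit(&helper_ptr, memory_order_relaxed);
    if (!h) {
        helper_storage.initialized = 1;       /* all setup happens first */
        helper_spawn_count++;
        h = &helper_storage;
        /* Publish last: the ordering sits before the store. */
        atomic_store_explicit(&helper_ptr, h, memory_order_release);
    }
    pthread_mutex_unlock(&helper_lock);
    return h;
}
```

Repeated helper_get() calls return the same pointer and run the setup once;
moving the ordering to after the pointer store would let a reader see the
pointer before the setup it is supposed to cover.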

Thanx, Paul

> > +   mutex_unlock(&rcu_tasks_kthread_mutex);
> >  }
> > -early_initcall(rcu_spawn_tasks_kthread);
> >
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


[PATCH] nfs: fix kernel warning when removing proc entry

2014-08-14 Thread Cong Wang

I saw the following kernel warning:

[ 1852.321222] ------------[ cut here ]------------
[ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
[ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
[ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
[ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1852.354992] Workqueue: netns cleanup_net
[ 1852.358701]   880116f2fbd0 819c03e9 
880116f2fc18
[ 1852.366474]  880116f2fc08 810744ee 811e0e6e 
8800d4e96238
[ 1852.373507]  81dbe665 8800d46a5948 0005 
880116f2fc68
[ 1852.380224] Call Trace:
[ 1852.381976]  [] dump_stack+0x4d/0x66
[ 1852.385495]  [] warn_slowpath_common+0x7a/0x93
[ 1852.389869]  [] ? remove_proc_entry+0x154/0x16b
[ 1852.393987]  [] warn_slowpath_fmt+0x4c/0x4e
[ 1852.397999]  [] remove_proc_entry+0x154/0x16b
[ 1852.402034]  [] nfs_fs_proc_net_exit+0x53/0x56
[ 1852.406136]  [] nfs_net_exit+0x12/0x1d
[ 1852.409774]  [] ops_exit_list+0x44/0x55
[ 1852.413529]  [] cleanup_net+0xee/0x182
[ 1852.417198]  [] process_one_work+0x209/0x40d
[ 1852.502320]  [] ? process_one_work+0x162/0x40d
[ 1852.587629]  [] worker_thread+0x1f0/0x2c7
[ 1852.673291]  [] ? process_scheduled_works+0x2f/0x2f
[ 1852.759470]  [] kthread+0xc9/0xd1
[ 1852.843099]  [] ? finish_task_switch+0x3a/0xce
[ 1852.926518]  [] ? __kthread_parkme+0x61/0x61
[ 1853.008565]  [] ret_from_fork+0x7c/0xb0
[ 1853.076477]  [] ? __kthread_parkme+0x61/0x61
[ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
but remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().
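
As a toy illustration of why the parent has to match, here is a small
userspace model. It does not reproduce the procfs implementation; demo_dir
and the demo_add()/demo_remove() helpers are invented for this sketch. An
entry added under one parent can only be found, and therefore removed,
through that same parent.

```c
#include <string.h>

#define DEMO_MAX_ENTRIES 8

/* A directory is just a small list of child names in this model. */
struct demo_dir {
    const char *names[DEMO_MAX_ENTRIES];
    int count;
};

static int demo_add(struct demo_dir *parent, const char *name)
{
    if (parent->count >= DEMO_MAX_ENTRIES)
        return -1;
    parent->names[parent->count++] = name;
    return 0;
}

/* Returns 0 if the name was found under this parent and removed. */
static int demo_remove(struct demo_dir *parent, const char *name)
{
    for (int i = 0; i < parent->count; i++) {
        if (strcmp(parent->names[i], name) == 0) {
            /* Fill the hole with the last entry and shrink the list. */
            parent->names[i] = parent->names[--parent->count];
            return 0;
        }
    }
    return -1;
}
```

Adding "nfsfs" under a per-net directory and then calling demo_remove() on
a different directory fails and leaves the entry in place; only the call
against the original parent removes it.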

Fixes: commit 65b38851a17 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
Cc: Eric W. Biederman 
Cc: Trond Myklebust 
Cc: Stanislav Kinsbursky 
Signed-off-by: Cong Wang 
---
 fs/nfs/client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 1c5ff6d..1c57202 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1429,7 +1429,7 @@ void nfs_fs_proc_net_exit(struct net *net)
 
remove_proc_entry("volumes", nn->proc_nfsfs);
remove_proc_entry("servers", nn->proc_nfsfs);
-   remove_proc_entry("fs/nfsfs", NULL);
+   remove_proc_entry("nfsfs", net->proc_net);
 }
 
 /*
-- 
1.8.3.1


Re: [PATCH 0/8] mtd: nand: Support for new DT NAND driver

2014-08-14 Thread Brian Norris

On Wed, Aug 13, 2014 at 10:11:59AM +0100, Lee Jones wrote:
> Hi Brian, Pekon,
> 
> I believe all of your queries have either been answered or addressed
> and I am hoping this will be the last submission. :)
> 
> /me crosses fingers!
> 
> Kind regards,
> Lee

I didn't look through the patches yet, but my build tools tell me you
didn't compile-test this. Please compile, test, and resend your patches.

In file included from drivers/mtd/nand/stm_nand_bch.c:26:0:
include/linux/mtd/stm_nand_bbt.h:16:13: warning: no previous prototype for 'nandi_dump_bad_blocks' [-Wmissing-prototypes]
drivers/mtd/nand/stm_nand_bch.c: In function 'nandi_set_mtd_defaults':
drivers/mtd/nand/stm_nand_bch.c:870:19: error: 'bch_scan_bbt' undeclared (first use in this function)
drivers/mtd/nand/stm_nand_bch.c:870:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/mtd/nand/stm_nand_bch.c:871:20: error: 'bch_block_isbad' undeclared (first use in this function)
drivers/mtd/nand/stm_nand_bch.c:872:24: error: 'bch_block_markbad' undeclared (first use in this function)
drivers/mtd/nand/stm_nand_bch.c: In function 'bch_calc_timing_registers':
drivers/mtd/nand/stm_nand_bch.c:1022:6: warning: variable 'ren_half_off' set but not used [-Wunused-but-set-variable]
drivers/mtd/nand/stm_nand_bch.c:1021:6: warning: variable 'ren_half_on' set but not used [-Wunused-but-set-variable]

Particularly, take a hard look at rewriting
include/linux/mtd/stm_nand_bbt.h.

Thanks,
Brian

Re: [PATCH] ARM: apq8064: Add pinmux and i2c pinctrl nodes

2014-08-14 Thread Bjorn Andersson

On Thu, Aug 14, 2014 at 12:20 AM, Kiran Padwal
 wrote:
> diff --git a/arch/arm/boot/dts/qcom-apq8064.dtsi 
> b/arch/arm/boot/dts/qcom-apq8064.dtsi
> index 92bf793..fbebf5c 100644
> --- a/arch/arm/boot/dts/qcom-apq8064.dtsi
> +++ b/arch/arm/boot/dts/qcom-apq8064.dtsi
> @@ -70,6 +70,17 @@
> ranges;
> compatible = "simple-bus";
>
> +   qcom_pinmux: pinmux@80 {

There are (at least) three different pinmuxes in these platforms: TLMM, PMIC
GPIO, PMIC MPP. Also this is the phandle that is used to reference the gpio
chip throughout the board.

So I would like to suggest that we name it "tlmm" or like in the downstream
kernel "msmgpio".

> +   compatible = "qcom,apq8064-pinctrl";
> +   reg = <0x80 0x4000>;
> +
> +   gpio-controller;
> +   #gpio-cells = <2>;
> +   interrupt-controller;
> +   #interrupt-cells = <2>;
> +   interrupts = <0 32 0x4>;

I must have gotten this wrong in the dt binding example, sorry about that.
interrupts should be <0 16 0x4>.

> +   };

Regards,
Bjorn

Re: [PATCH v5 tip/core/rcu 11/16] rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks()

2014-08-14 Thread Pranith Kumar

On Mon, Aug 11, 2014 at 6:49 PM, Paul E. McKenney
 wrote:
> From: "Paul E. McKenney" 
>
> It is expected that many sites will have CONFIG_TASKS_RCU=y, but
> will never actually invoke call_rcu_tasks().  For such sites, creating
> rcu_tasks_kthread() at boot is wasteful.  This commit therefore defers
> creation of this kthread until the time of the first call_rcu_tasks().
>
> This of course means that the first call_rcu_tasks() must be invoked
> from process context after the scheduler is fully operational.
>
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcu/update.c | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 1256a900cd01..d997163c7e92 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -378,7 +378,12 @@ DEFINE_SRCU(tasks_rcu_exit_srcu);
>  static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
>  module_param(rcu_task_stall_timeout, int, 0644);
>
> -/* Post an RCU-tasks callback. */
> +static void rcu_spawn_tasks_kthread(void);
> +
> +/*
> + * Post an RCU-tasks callback.  First call must be from process context
> + * after the scheduler if fully operational.
> + */
>  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>  {
> unsigned long flags;
> @@ -391,8 +396,10 @@ void call_rcu_tasks(struct rcu_head *rhp, void 
> (*func)(struct rcu_head *rhp))
> *rcu_tasks_cbs_tail = rhp;
> rcu_tasks_cbs_tail = &rhp->next;
> raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> -   if (needwake)
> +   if (needwake) {
> +   rcu_spawn_tasks_kthread();
> wake_up(&rcu_tasks_cbs_wq);
> +   }
>  }
>  EXPORT_SYMBOL_GPL(call_rcu_tasks);
>
> @@ -618,15 +625,27 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> }
>  }
>
> -/* Spawn rcu_tasks_kthread() at boot time. */
> -static int __init rcu_spawn_tasks_kthread(void)
> +/* Spawn rcu_tasks_kthread() at first call to call_rcu_tasks(). */
> +static void rcu_spawn_tasks_kthread(void)
>  {
> -   struct task_struct __maybe_unused *t;
> +   static DEFINE_MUTEX(rcu_tasks_kthread_mutex);
> +   static struct task_struct *rcu_tasks_kthread_ptr;
> +   struct task_struct *t;
>
> +   if (ACCESS_ONCE(rcu_tasks_kthread_ptr)) {
> +   smp_mb(); /* Ensure caller sees full kthread. */
> +   return;
> +   }

I don't see the need for this smp_mb(). The caller has already seen
that rcu_tasks_kthread_ptr is assigned. What are we ensuring with this
barrier again?

An smp_rmb() before this ACCESS_ONCE() and an smp_wmb() after
assigning to rcu_tasks_kthread_ptr should be enough, right?

> +   mutex_lock(&rcu_tasks_kthread_mutex);
> +   if (rcu_tasks_kthread_ptr) {
> +   mutex_unlock(&rcu_tasks_kthread_mutex);
> +   return;
> +   }
> t = kthread_run(rcu_tasks_kthread, NULL, "rcu_tasks_kthread");
> BUG_ON(IS_ERR(t));
> -   return 0;
> +   smp_mb(); /* Ensure others see full kthread. */
> +   ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;

Isn't it better to reverse these two statements and change as follows?

ACCESS_ONCE(rcu_tasks_kthread_ptr) = t;
smp_wmb();

or

smp_store_release(&rcu_tasks_kthread_ptr, t);

will ensure that this write to rcu_tasks_kthread_ptr is ordered with
the previous read. I recently read memory-barriers.txt, so please
excuse me if I am totally wrong. But I am confused! :(
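
As a rough userspace analogue of the ordering question above, here is a minimal sketch using C11 atomics rather than the kernel primitives. The names struct worker, get_worker() and init_calls are made up for illustration, and a trivial spinlock stands in for the mutex. The point it tries to show: the object must be fully set up before the pointer is published, so the release ordering sits on (or before) the pointer store, and the lockless fast path pairs with it through an acquire load. A write barrier placed only after the store would order the pointer store against later stores, not against the earlier initialization.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kthread's task_struct. */
struct worker {
	int id;
	int ready;
};

static struct worker the_worker;
static _Atomic(struct worker *) worker_ptr;	/* NULL until published */
static atomic_flag init_lock = ATOMIC_FLAG_INIT;	/* stand-in for the mutex */
int init_calls;					/* counts real initializations */

struct worker *get_worker(void)
{
	/* Fast path: acquire pairs with the release store below. */
	struct worker *w = atomic_load_explicit(&worker_ptr, memory_order_acquire);

	if (w)
		return w;

	/* Slow path: serialize would-be initializers. */
	while (atomic_flag_test_and_set_explicit(&init_lock, memory_order_acquire))
		;
	w = atomic_load_explicit(&worker_ptr, memory_order_relaxed);
	if (!w) {
		/* Initialize first ... */
		the_worker.id = 42;
		the_worker.ready = 1;
		init_calls++;
		w = &the_worker;
		/* ... then publish: prior writes are ordered before this store. */
		atomic_store_explicit(&worker_ptr, w, memory_order_release);
	}
	atomic_flag_clear_explicit(&init_lock, memory_order_release);
	return w;
}
```

Calling get_worker() repeatedly returns the same pointer and runs the initialization once; any caller that sees a non-NULL pointer through the acquire load also sees id and ready as written.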

> +   mutex_unlock(&rcu_tasks_kthread_mutex);
>  }
> -early_initcall(rcu_spawn_tasks_kthread);
>
>  #endif /* #ifdef CONFIG_TASKS_RCU */
> --
> 1.8.1.5
>



-- 
Pranith

Re: [PATCH RFC v4 net-next 03/26] bpf: introduce syscall(BPF, ...) and BPF maps

2014-08-14 Thread Brendan Gregg

On Wed, Aug 13, 2014 at 12:57 AM, Alexei Starovoitov  wrote:
[...]
> maps can have different types: hash, bloom filter, radix-tree, etc.
>
> The map is defined by:
>   . type
>   . max number of elements
>   . key size in bytes
>   . value size in bytes

Can values be strings or byte arrays? How would user-level bpf read
them? The two types of uses I'm thinking are:

A. Constructing a custom string in kernel-context, and using that as
the value. Eg, a truncated filename, or a dotted quad IP address, or
the raw contents of a packet.
B. I have a pointer to an existing buffer or string, eg a filename,
that will likely be around for some time (>1s). Instead of the value
storing the string, it could just be a ptr, so long as user-level bpf
has a way to read it.

Also, can keys be strings? I'd ask about multiple keys, but if they
can be a string, I can delimit in the key (eg, "PID:filename").
Thanks,

Brendan
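
For illustration only: if map keys are treated as opaque fixed-size byte blobs (an assumption drawn from the "key size in bytes" attribute quoted above, not something confirmed in this thread), a composite "PID:filename" key could be a plain struct instead of a delimited string. The names pid_file_key, make_key and NAME_LEN below are hypothetical. The sketch zeroes the whole key first so that padding and unused name bytes compare equal byte for byte.

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN 24	/* hypothetical truncation length for the filename */

/* Hypothetical fixed-size composite key: PID plus truncated filename. */
struct pid_file_key {
	uint32_t pid;
	char name[NAME_LEN];
};

void make_key(struct pid_file_key *k, uint32_t pid, const char *filename)
{
	/* Zero everything so two keys built from equal inputs are byte-identical. */
	memset(k, 0, sizeof(*k));
	k->pid = pid;
	/* Copy at most NAME_LEN - 1 bytes; the final byte stays NUL. */
	strncpy(k->name, filename, NAME_LEN - 1);
}
```

With this layout the key size passed at map creation would simply be sizeof(struct pid_file_key), and longer filenames are silently truncated, which matches the "truncated filename" use case mentioned above.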

-- 
http://www.brendangregg.com

Overriding -Werror

2014-08-14 Thread Brian Norris

Hi all,

I'm interested in being able to build-test kernels on various
architectures while enabling extra warnings (make W=[123]). I'd like to
be able to finish the builds and see all warnings, rather than seeing a
failed build. However, GCC's -Werror is incompatible with this. There is
plenty of code that will produce at least one warning, when warning
verbosity is turned up. And GCC's -Werror is not guaranteed to remain
stable over time; new versions may develop new warnings that may or may
not be legitimate.

It seems that there are a few problem ARCHes that enable -Werror by
default: SuperH (orphaned), SPARC, and MIPS. There are also a few
scattered Makefiles throughout the build tree. Developers have
previously tried to remove some of the worst offenders [1], but were
mostly rejected [2]. It doesn't seem like we can fully prevent
maintainers from enabling -Werror on their code--or even on their entire
ARCH build, as with MIPS--for better or worse, so I look to other
alternatives.

For the easiest approach, I considered how one might add -Wno-error to
the CFLAGS. 'make KCFLAGS=-Wno-error' looked promising, but
KBUILD_CFLAGS is applied before the sub-directory Makefiles add their
own options to ccflags-y. So it seems like others have come to the same
conclusion as me: that Kbuild doesn't seem to provide a way to override
the -Werror behavior from the top level. [3][4]

So, how can we fix this? -Werror may be useful in some cases to
encourage developers to fix up their code immediately, but it is
decidedly unhelpful when running code through analysis tools.

Possibilities include:

1. make -Werror be applied only when we do not have W=[123]. [5]

2. develop a top-level override for CFLAGS that is applied *after* all
   sub-directory modifications

3. make -Werror opt-in / configurable, like PPC's CONFIG_PPC_WERROR
   (maybe make it a generic CONFIG_WERROR?), and prevent its
   unconditional use in Makefiles

4. better ideas?

Regards,
Brian

[1] http://www.linux-mips.org/archives/linux-mips/2012-04/msg00179.html
http://patchwork.ozlabs.org/patch/146297/

[2] http://www.linux-mips.org/archives/linux-mips/2012-05/msg00064.html

[3] http://lists.linaro.org/pipermail/linaro-toolchain/2011-November/001869.html

[4] http://lists.linaro.org/pipermail/linaro-dev/2011-December/008880.html

[5] http://www.linux-mips.org/archives/linux-mips/2012-05/msg00070.html

Re: [PATCH V2] hwmon, k10temp: Add support for F15h M60h

2014-08-14 Thread Aravind Gopalakrishnan


On 8/14/2014 5:05 PM, Borislav Petkov wrote:
> On Thu, Aug 14, 2014 at 04:57:17PM -0500, Aravind Gopalakrishnan wrote:
>> Actually I don't need it outside of k10temp as of now (or near future)
>> I added it in amd_nb as that was Clemens, Guenter's suggestion on the
>> previous version;
>> Besides, it made sense as it's an indirect access of NB_SMU register and
>> amd_nb seems a good place to put the function in case someone needs it in
>> the future.
>
> Then someone can move it then. But until that happens it is pretty
> pointless of having the Kconfig dependency just for one small function
> with a single user.
>
>> I can move it locally to k10temp and remove the dependency if that's
>> more preferable.
>
> Yeah, it looks like a fabricated and not true dependency, which doesn't
> make any sense currently.
>
> Thanks.

OK, will fix this and send V3.

-Aravind.

Re: [PATCH RFC v4 net-next 25/26] samples: bpf: counting eBPF example in C

2014-08-14 Thread Brendan Gregg

On Wed, Aug 13, 2014 at 12:57 AM, Alexei Starovoitov  wrote:
> this example has two probes in C that use two different maps.
>
> 1st probe is similar to dropmon.c. It attaches to the kfree_skb tracepoint and
> counts the number of packet drops at different locations
>
> 2nd probe attaches to kprobe/sys_write and computes a histogram of different
> write sizes
>
> Usage:
> $ sudo ex2
>
> Should see:
> writing bpf-5 -> /sys/kernel/debug/tracing/events/skb/kfree_skb/filter
> writing bpf-8 -> /sys/kernel/debug/tracing/events/kprobes/sys_write/filter
> location 0x816efc67 count 1
>
> location 0x815d8030 count 1
> location 0x816efc67 count 3
>
> location 0x815d8030 count 4
> location 0x816efc67 count 9
>
>            syscall write() stats
>      byte_size       : count
>        1 -> 1        : 3141
>        2 -> 3        : 2
>        4 -> 7        : 14
>        8 -> 15       : 3268
>       16 -> 31       : 732
>       32 -> 63       : 20042
>       64 -> 127      : 12154
>      128 -> 255      : 2215
>      256 -> 511      : 9
>      512 -> 1023     : 0
>     1024 -> 2047     : 1

This is pretty awesome.

Given that this is tracing two tracepoints at once, I'd like to see a
similar example where time is stored on the first tracepoint,
retrieved on the second for a delta calculation, then presented with a
similar histogram as seen above.
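
A minimal sketch of that delta-plus-histogram idea, written as plain userspace C rather than eBPF. All names (latency_hist, hist_begin, hist_end, log2_slot, NSLOTS) are hypothetical, and it tracks a single in-flight timestamp; a real probe pair would presumably keep the start time in a map keyed by something like the PID, which is an assumption here. Slot i covers values from 2^i to 2^(i+1)-1, matching the byte_size ranges printed above, and a delta of 0 is counted in slot 0.

```c
#include <stdint.h>

#define NSLOTS 32	/* hypothetical number of power-of-two buckets */

/* Hypothetical state: one pending start time plus the bucket counts. */
struct latency_hist {
	uint64_t start_ns;
	uint64_t slots[NSLOTS];
};

/* Map a value to its power-of-two bucket: 1 -> 0, 2..3 -> 1, 4..7 -> 2, ... */
unsigned int log2_slot(uint64_t v)
{
	unsigned int slot = 0;

	while (v > 1 && slot < NSLOTS - 1) {
		v >>= 1;
		slot++;
	}
	return slot;
}

/* First probe: remember when the operation started. */
void hist_begin(struct latency_hist *h, uint64_t now_ns)
{
	h->start_ns = now_ns;
}

/* Second probe: compute the delta and count it in its bucket. */
void hist_end(struct latency_hist *h, uint64_t now_ns)
{
	h->slots[log2_slot(now_ns - h->start_ns)]++;
}
```

The user-level side would then only need to read the slot counts and print one row per bucket.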

Brendan

Re: Kernel 3.16.0 USB crash

2014-08-14 Thread Sarah Sharp

Adding Mathias Nyman.  He is now the USB 3.0 maintainer.

Sarah Sharp

On Thu, Aug 14, 2014 at 11:46:33AM +0200, Hans de Goede wrote:
> Hi,
> 
> On 08/14/2014 10:39 AM, Claudio Bizzarri wrote:
> > Ciao,
> > 
> > thank you very much for the reply, you are right: it's the UAS module. Now I'm
> > using Ubuntu 14.04 with kernel 3.16.1 from
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/, there is no /proc/config.gz,
> > but there is a config file in /boot:
> > 
> > b0@hp850ssd:~⟫ grep USB_UAS /boot/config-3.16.1-031601-generic
> > CONFIG_USB_UAS=m
> > 
> > When I attach my external USB disk I have 30 seconds before my laptop freezes.
> > Here is my dmesg output; the disk is not mounted:
> 
> Hmm, this sounds like a similar problem we've been having with JMicron UAS
> bridges over USB-2.
> 
> Can you collect "lsusb -v" output for the drive in question when connected
> through a USB-3 port (the uas module does not need to be loaded)?
> 
> Also, can you try the following patch and see if that makes uas work?
> 
> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
> index 511b229..6cdc1b9 100644
> --- a/drivers/usb/storage/uas.c
> +++ b/drivers/usb/storage/uas.c
> @@ -1033,6 +1033,7 @@ static int uas_configure_endpoints(struct uas_dev_info 
> *devinfo)
>   3, 256, GFP_NOIO);
>   if (devinfo->qdepth < 0)
>   return devinfo->qdepth;
> + devinfo->qdepth = 32;
>   devinfo->use_streams = 1;
>   }
> 
> 
> This is in essence the fix we've done for using these devices with uas over
> USB-2. I would have expected this not to be necessary at SuperSpeed, since
> there the number of streams the device supports is part of the USB
> descriptors, but maybe the device claims to support more streams than it can
> actually handle.
> 
> Note I'm on vacation next week, so don't expect another reply from me in this
> thread for at least a week.
> 
> Regards,
> 
> Hans

Re: mm: compaction: buffer overflow in isolate_migratepages_range

2014-08-14 Thread Rafael Aquini

On Thu, Aug 14, 2014 at 06:43:50PM -0300, Rafael Aquini wrote:
> On Thu, Aug 14, 2014 at 10:07:40PM +0400, Andrey Ryabinin wrote:
> > We discussed this with Konstantin and he suggested a better solution for
> > this. If I understood him correctly, the main idea was to store a bit
> > identifying a balloon page in struct page (a special value in _mapcount),
> > so we won't need to check mapping->flags.
> >
> 
> Here is what I thought of doing, following that suggestion of Konstantin's
> and yours. (I haven't tested it yet.)
> 
> Comments are welcome.
> 
> Cheers,
> -- Rafael
> 
>  8< 
> From: Rafael Aquini 
> Subject: mm: balloon_compaction: enhance balloon_page_movable() checkpoint 
> against races
> 
> While testing linux-next for the Kernel Address Sanitizer patchset (KASAN),
> Sasha Levin reported a buffer overflow warning triggered for
> isolate_migratepages_range(), which was later discovered to be caused by
> a condition where balloon_page_movable() raced against move_to_new_page()
> while the latter was copying the page->mapping of an anon page.
> 
> Because we can perform balloon_page_movable() in a lockless fashion at
> isolate_migratepages_range(), the discovered race has revealed that the scheme
> actually used to spot ballooned pages among page blocks, which checks
> page_flags_cleared() and dereferences page->mapping to check its mapping flags,
> is weak and potentially prone to stumbling across other similar conditions
> in the future.
> 
> Following Konstantin Khlebnikov's and Andrey Ryabinin's suggestions,
> this patch replaces the old page->flags && mapping->flags checking scheme
> with a simpler and stronger test that reads page->_mapcount and compares it
> against a known value.
> Similarly to what is done for PageBuddy() checks, BALLOON_PAGE_MAPCOUNT_VALUE
> is introduced here to mark balloon pages. This allows balloon_page_movable()
> to skip the proven troublesome dereference of page->mapping for flag checking
> while it goes on isolate_migratepages_range() lockless rounds.
> The page->mapping dereference and flag checking will be performed later, when
> all locks are held properly.
> 
> ---
>  include/linux/balloon_compaction.h | 61 
> +++---
>  mm/balloon_compaction.c| 53 +
>  2 files changed, 45 insertions(+), 69 deletions(-)
> 
> diff --git a/include/linux/balloon_compaction.h 
> b/include/linux/balloon_compaction.h
> index 089743a..1409ccc 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -108,54 +108,29 @@ static inline void balloon_mapping_free(struct 
> address_space *balloon_mapping)
>  }
>  
>  /*
> - * page_flags_cleared - helper to perform balloon @page ->flags tests.
> + * balloon_page_movable - identify balloon pages that can be moved by
> + * compaction / migration.
>   *
> - * As balloon pages are obtained from buddy and we do not play with 
> page->flags
> - * at driver level (exception made when we get the page lock for compaction),
> - * we can safely identify a ballooned page by checking if the
> - * PAGE_FLAGS_CHECK_AT_PREP page->flags are all cleared.  This approach also
> - * helps us skip ballooned pages that are locked for compaction or release, 
> thus
> - * mitigating their racy check at balloon_page_movable()
> + * BALLOON_PAGE_MAPCOUNT_VALUE must be <= -2 but better not too close to
> + * -2 so that an underflow of the page_mapcount() won't be mistaken
> + * for a genuine BALLOON_PAGE_MAPCOUNT_VALUE.
>   */
> -static inline bool page_flags_cleared(struct page *page)
> +#define BALLOON_PAGE_MAPCOUNT_VALUE (-256)
> +static inline bool balloon_page_movable(struct page *page)
>  {
> - return !(page->flags & PAGE_FLAGS_CHECK_AT_PREP);
> + return atomic_read(&page->_mapcount) == BALLOON_PAGE_MAPCOUNT_VALUE;
>  }
>  
> -/*
> - * __is_movable_balloon_page - helper to perform @page mapping->flags tests
> - */
> -static inline bool __is_movable_balloon_page(struct page *page)
> +static inline void __balloon_page_set(struct page *page)
>  {
> - struct address_space *mapping = page->mapping;
> - return mapping_balloon(mapping);
> + VM_BUG_ON_PAGE(!atomic_read(&page->_mapcount) != -1, page);
> + atomic_set(&page->_mapcount, BALLOON_PAGE_MAPCOUNT_VALUE);
>  }
>  
> -/*
> - * balloon_page_movable - test page->mapping->flags to identify balloon pages
> - * that can be moved by compaction/migration.
> - *
> - * This function is used at core compaction's page isolation scheme, 
> therefore
> - * most pages exposed to it are not enlisted as balloon pages and so, to 
> avoid
> - * undesired side effects like racing against __free_pages(), we cannot 
> afford
> - * holding the page locked while testing page->mapping->flags here.
> - *
> - * As we might return false positives in the case of a balloon page being 
> just
> - * released under us, the page->mapping->flags need to be re-tested later,
> - * under the proper page lock, at

Re: [PATCH V2] hwmon, k10temp: Add support for F15h M60h

2014-08-14 Thread Borislav Petkov

On Thu, Aug 14, 2014 at 04:57:17PM -0500, Aravind Gopalakrishnan wrote:
> Actually I don't need it outside of k10temp as of now (or near future)
> I added it in amd_nb as that was Clemens, Guenter's suggestion on the
> previous version;
> Besides, it made sense as it's an indirect access of NB_SMU register and
> amd_nb seems a good place to put the function in case someone needs it in
> the future.

Then someone can move it then. But until that happens it is pretty
pointless of having the Kconfig dependency just for one small function
with a single user.

> I can move it locally to k10temp and remove the dependency if that's
> more preferable.

Yeah, it looks like a fabricated and not true dependency, which doesn't
make any sense currently.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [PATCH 0/3] implement readpages() for block device to optimize sequential read

2014-08-14 Thread Andrew Morton

On Tue,  5 Aug 2014 23:38:31 +0900 Akinobu Mita  wrote:

> This patchset implements readpages() operation for block device by
> using mpage_readpages() which can create multipage BIOs instead of
> BIOs for each page and reduce system CPU time consumption.

Patchset is simple and straightforward enough.  But who the 
heck cares about the performance of buffered reads from /dev/XXX?

Re: [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 05:39:54PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > This commit adds a three-minute RCU-tasks stall warning.  The actual
> > time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
> > with values less than or equal to zero disabling the stall warnings.
> > The default value is three minutes, which means that the tasks that
> > have not yet responded will get their stacks dumped every ten minutes,
> > until they pass through a voluntary context switch.
> >
> > Signed-off-by: Paul E. McKenney 
> 
> Something about 3 minutes and 10 minutes is mixed up here!

Good catch, updated the commit log to also say ten minutes.

Thanx, Paul

> > ---
> >  Documentation/kernel-parameters.txt |  5 +
> >  kernel/rcu/update.c | 27 ---
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index 910c3829f81d..8cdbde7b17f5 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2921,6 +2921,11 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > rcupdate.rcu_cpu_stall_timeout= [KNL]
> > Set timeout for RCU CPU stall warning messages.
> >
> > +   rcupdate.rcu_task_stall_timeout= [KNL]
> > +   Set timeout in jiffies for RCU task stall warning
> > +   messages.  Disable with a value less than or equal
> > +   to zero.
> > +
> > rdinit= [KNL]
> > Format: <full_path>
> > Run specified binary instead of /init from the 
> > ramdisk,
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 8f53a41dd9ee..f1535404a79e 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -374,7 +374,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> >  DEFINE_SRCU(tasks_rcu_exit_srcu);
> >
> >  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till 
> > stall. */
> > -static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
> > +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> >
> >  /* Post an RCU-tasks callback. */
> > @@ -449,7 +449,8 @@ void rcu_barrier_tasks(void)
> >  EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
> >
> >  /* See if tasks are still holding out, complain if so. */
> > -static void check_holdout_task(struct task_struct *t)
> > +static void check_holdout_task(struct task_struct *t,
> > +  bool needreport, bool *firstreport)
> >  {
> > if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> > t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> > @@ -457,7 +458,15 @@ static void check_holdout_task(struct task_struct *t)
> > ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> > list_del_rcu(&t->rcu_tasks_holdout_list);
> > put_task_struct(t);
> > +   return;
> > }
> > +   if (!needreport)
> > +   return;
> > +   if (*firstreport) {
> > +   pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
> > +   *firstreport = false;
> > +   }
> > +   sched_show_task(t);
> >  }
> >
> >  /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> > @@ -465,6 +474,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  {
> > unsigned long flags;
> > struct task_struct *g, *t;
> > +   unsigned long lastreport;
> > struct rcu_head *list;
> > struct rcu_head *next;
> > LIST_HEAD(rcu_tasks_holdouts);
> > @@ -543,13 +553,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  * of holdout tasks, removing any that are no longer
> >  * holdouts.  When the list is empty, we are done.
> >  */
> > +   lastreport = jiffies;
> > while (!list_empty(&rcu_tasks_holdouts)) {
> > +   bool firstreport;
> > +   bool needreport;
> > +   int rtst;
> > +
> > schedule_timeout_interruptible(HZ);
> > +   rtst = ACCESS_ONCE(rcu_task_stall_timeout);
> > +   needreport = rtst > 0 &&
> > +time_after(jiffies, lastreport + rtst);
> > +   if (needreport)
> > +   lastreport = jiffies;
> > +   firstreport = true;
> > WARN_ON(signal_pending(current));
> > rcu_read_lock();
> > list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
> >

Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency

2014-08-14 Thread Pranith Kumar

On Thu, Aug 14, 2014 at 5:55 PM, Paul E. McKenney
 wrote:
> On Thu, Aug 14, 2014 at 05:42:06PM -0400, Pranith Kumar wrote:
>> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>>  wrote:
>> > From: "Paul E. McKenney" 
>> >
>> > The current RCU-tasks implementation uses strict polling to detect
>> > callback arrivals.  This works quite well, but is not so good for
>> > energy efficiency.  This commit therefore replaces the strict polling
>> > with a wait queue.
>> >
>> > Signed-off-by: Paul E. McKenney 
>> > ---
>> >  kernel/rcu/update.c | 14 --
>> >  1 file changed, 12 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
>> > index f1535404a79e..1256a900cd01 100644
>> > --- a/kernel/rcu/update.c
>> > +++ b/kernel/rcu/update.c
>> > @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
>> >  /* Global list of callbacks and associated lock. */
>> >  static struct rcu_head *rcu_tasks_cbs_head;
>> >  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
>> > +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
>> >  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>> >
>> >  /* Track exiting tasks in order to allow them to be waited for. */
>> > @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
>> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head 
>> > *rhp))
>> >  {
>> > unsigned long flags;
>> > +   bool needwake;
>> >
>> > rhp->next = NULL;
>> > rhp->func = func;
>> > raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
>> > +   needwake = !rcu_tasks_cbs_head;
>> > *rcu_tasks_cbs_tail = rhp;
>> > rcu_tasks_cbs_tail = &rhp->next;
>> > raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
>> > +   if (needwake)
>> > +   wake_up(&rcu_tasks_cbs_wq);
>> >  }
>> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
>>
>> I think you want
>>
>> needwake = !!rcu_tasks_cbs_head;
>>
>> otherwise it will wake up when rcu_tasks_cbs_head is null, no?
>
> Well, that is exactly what we want.  Note that we do the test -before-
> the enqueue.  This means that we do the wakeup if the list -was-
> empty before the enqueue, which is exactly the case where the task
> might be asleep without having already been sent a wakeup.
>
> Assuming that wakeups are reliably delivered, of course.  But if they
> are not reliably delivered, that is a bug that needs to be fixed.
>

Ohk, I did not notice the modification through rcu_tasks_cbs_tail! All is well.
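
The point about sampling the head before the enqueue can be seen in a small standalone sketch. This is userspace C with hypothetical names (cb, cb_queue, cbq_init, cbq_enqueue); locking and the actual wakeup are left out, so it only illustrates the tail-pointer mechanics, not the kernel code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback node, standing in for struct rcu_head. */
struct cb {
	struct cb *next;
};

/* Singly linked queue with a pointer to the last ->next slot. */
struct cb_queue {
	struct cb *head;
	struct cb **tail;
};

void cbq_init(struct cb_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;	/* empty list: the tail slot is the head itself */
}

/* Returns true when the list was empty, i.e. the consumer may be asleep. */
bool cbq_enqueue(struct cb_queue *q, struct cb *c)
{
	bool needwake;

	c->next = NULL;
	needwake = !q->head;	/* sampled before the new entry is linked in */
	*q->tail = c;		/* on an empty list this assigns q->head */
	q->tail = &c->next;
	return needwake;
}
```

The first enqueue on an empty queue returns true and sets the head through the tail pointer; later enqueues return false because the consumer has already been told there is work.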

-- 
Pranith

Re: [PATCH V2] hwmon, k10temp: Add support for F15h M60h

2014-08-14 Thread Aravind Gopalakrishnan


On 8/14/2014 4:08 PM, Borislav Petkov wrote:
> On Thu, Aug 14, 2014 at 10:22:31PM +0200, Clemens Ladisch wrote:
>>>> +   depends on X86 && PCI && AMD_NB
>>>
>>> Is the added dependency acceptable?
>>
>> Yes, it is automatically set from CPU_SUP_AMD.
>
> Well, we can always move that function to k10temp but I'll venture a
> guess that Aravind wants to use it somewhere else too? Correct, Aravind?

Actually I don't need it outside of k10temp as of now (or in the near future).
I added it in amd_nb as that was Clemens's and Guenter's suggestion on the
previous version. Besides, it made sense as it's an indirect access of the
NB_SMU register, and amd_nb seems a good place to put the function in case
someone needs it in the future.

I can move it locally to k10temp and remove the dependency if that's
preferable. Do let me know.


Thanks,
-Aravind.

Re: [RFC 0/3] Experimental patchset for CPPC

2014-08-14 Thread Ashwin Chaugule

Hi Peter,

On 14 August 2014 16:51, Peter Zijlstra  wrote:
> On Thu, Aug 14, 2014 at 03:57:07PM -0400, Ashwin Chaugule wrote:
>>
>>
>> What is CPPC:
>> =============
>>
>> CPPC is the new interface for CPU performance control between the OS and the
>> platform defined in ACPI 5.0+. The interface is built on an abstract
>> representation of CPU performance rather than raw frequency.  Basic operation
>> consists of:
>
> Why do we want this? Typically we've ignored ACPI and gone straight to
> MSR access, intel_pstate and intel_idle were created especially to avoid
> ACPI, so why return to it.
>
> Also, the whole interface sounds like trainwreck (one would not expect
> anything else from ACPI).
>
> So _why_?

The overall idea is that tying the notion of CPU performance to CPU
frequency no longer holds these days [1]. So, using some direction
from the OS, platforms want to be able to decide how to adjust CPU
performance by using knowledge that may be very platform specific,
e.g. through the use of performance counters, thermal budgets and
other system-specific constraints. So, CPPC describes a way for the OS
to request performance within certain bounds and then let the
platform optimize it within those constraints. Expressing CPU
performance in an abstract way should also help keep things uniform
across various architecture implementations.
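
To make "request performance within certain bounds" concrete, here is a small sketch on an abstract, unitless performance scale. The struct and function names (perf_caps, perf_request, make_request, clamp_u32) are hypothetical and do not reflect the CPPC register layout or this patchset; the sketch only shows an OS-side request being kept inside what the platform advertises.

```c
#include <stdint.h>

/* Hypothetical platform capabilities on an abstract performance scale. */
struct perf_caps {
	uint32_t lowest;
	uint32_t highest;
};

/* Hypothetical OS request: a window plus a preferred point inside it. */
struct perf_request {
	uint32_t min;
	uint32_t max;
	uint32_t desired;
};

uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Keep the window inside the capabilities and desired inside the window. */
struct perf_request make_request(const struct perf_caps *caps,
				 uint32_t min, uint32_t max, uint32_t desired)
{
	struct perf_request r;

	r.min = clamp_u32(min, caps->lowest, caps->highest);
	r.max = clamp_u32(max, r.min, caps->highest);
	r.desired = clamp_u32(desired, r.min, r.max);
	return r;
}
```

The platform would then be free to pick any delivered performance between min and max, using desired as a hint.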

I don't see CPPC as necessarily an ACPI-specific thing. If the platform
can provide the information which CPPC lays out via MSRs or other
system I/O, then the higher-level algorithms should still work. The CPPC
table itself is nothing but register descriptions. It describes how
and where to access the registers. The registers can be anything:
system I/O addresses, MSRs, CP15, or mailbox-type addresses (PCC). CPPC
doesn't even tell you what to do with that information. So it's really
just a descriptor.

If you see the example in [2], the aperf and mperf reads directly go
to MSR addresses as parsed from the tables. If the platform does not
have ACPI, but knows how to provide the same information, then it can
directly read its MSRs or other sys regs. e.g. as in the case of
core_get_{min,max}_pstate(), which are used to get highest and lowest
performance values.

But I think the problem is that we don't have an algorithm that can
make use of the information that CPPC-supported platforms can provide,
nor can we provide CPPC-related information back to the platform.

Cheers,
Ashwin

[1] - https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL
[2] - http://git.linaro.org/people/ashwin.chaugule/leg-kernel.git/blob/236d901d31fb06fda798880c9ca09d65123c5dd9:/drivers/cpufreq/cppc_x86.c

Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 05:42:06PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > The current RCU-tasks implementation uses strict polling to detect
> > callback arrivals.  This works quite well, but is not so good for
> > energy efficiency.  This commit therefore replaces the strict polling
> > with a wait queue.
> >
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  kernel/rcu/update.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index f1535404a79e..1256a900cd01 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
> >  /* Global list of callbacks and associated lock. */
> >  static struct rcu_head *rcu_tasks_cbs_head;
> >  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> > +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
> >  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
> >
> >  /* Track exiting tasks in order to allow them to be waited for. */
> > @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
> >  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head 
> > *rhp))
> >  {
> > unsigned long flags;
> > +   bool needwake;
> >
> > rhp->next = NULL;
> > rhp->func = func;
> > raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> > +   needwake = !rcu_tasks_cbs_head;
> > *rcu_tasks_cbs_tail = rhp;
> > rcu_tasks_cbs_tail = &rhp->next;
> > raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > +   if (needwake)
> > +   wake_up(&rcu_tasks_cbs_wq);
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu_tasks);
> 
> I think you want
> 
> needwake = !!rcu_tasks_cbs_head;
> 
> otherwise it will wake up when rcu_tasks_cbs_head is null, no?

Well, that is exactly what we want.  Note that we do the test -before-
the enqueue.  This means that we do the wakeup if the list -was-
empty before the enqueue, which is exactly the case where the task
might be asleep without having already been sent a wakeup.

Assuming that wakeups are reliably delivered, of course.  But if they
are not reliably delivered, that is a bug that needs to be fixed.
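
For anyone who wants to play with the idea outside the kernel, here is
a small userspace sketch of the same "wake only on the empty to
non-empty transition" pattern. It uses pthreads rather than kernel wait
queues, and the names (cb, cb_queue, cb_enqueue, cb_take_all) are
invented for the example, so treat it as an analogy rather than the
actual RCU-tasks code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct cb {
	struct cb *next;
};

struct cb_queue {
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	struct cb *head;
	struct cb **tail;
};

void cb_queue_init(struct cb_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
	q->head = NULL;
	q->tail = &q->head;
}

/* Enqueue one callback; returns true if a wakeup was issued. */
bool cb_enqueue(struct cb_queue *q, struct cb *c)
{
	bool needwake;

	assert(c != NULL);
	c->next = NULL;
	pthread_mutex_lock(&q->lock);
	needwake = !q->head;	/* sampled -before- linking the new entry */
	*q->tail = c;
	q->tail = &c->next;
	pthread_mutex_unlock(&q->lock);
	if (needwake)
		pthread_cond_signal(&q->nonempty);
	return needwake;
}

/* Detach and return the whole list, leaving the queue empty. */
struct cb *cb_take_all(struct cb_queue *q)
{
	struct cb *list;

	pthread_mutex_lock(&q->lock);
	list = q->head;
	q->head = NULL;
	q->tail = &q->head;
	pthread_mutex_unlock(&q->lock);
	return list;
}
```

The first enqueue onto an empty queue reports a wakeup, later enqueues
do not until the consumer has drained the list, which is the behavior
the needwake = !rcu_tasks_cbs_head test is after.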

Thanx, Paul

> > @@ -498,8 +503,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >
> > /* If there were none, wait a bit and start over. */
> > if (!list) {
> > -   schedule_timeout_interruptible(HZ);
> > -   WARN_ON(signal_pending(current));
> > +   wait_event_interruptible(rcu_tasks_cbs_wq,
> > +rcu_tasks_cbs_head);
> > +   if (!rcu_tasks_cbs_head) {
> > +   WARN_ON(signal_pending(current));
> > +   schedule_timeout_interruptible(HZ/10);
> > +   }
> > continue;
> > }
> >
> > @@ -605,6 +614,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > list = next;
> > cond_resched();
> > }
> > +   schedule_timeout_uninterruptible(HZ/10);
> > }
> >  }
> >
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 02:44:15PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 14, 2014 at 05:34:53PM -0400, Pranith Kumar wrote:
> > On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
> >  wrote:
> > > From: "Paul E. McKenney" 
> > >
> > > This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> > > would segfault for an RCU flavor lacking a callback-barrier function.
> > >
> > > Signed-off-by: Paul E. McKenney 
> > > Reviewed-by: Josh Triplett 
> > > ---
> > >  include/linux/rcupdate.h |  1 +
> > >  kernel/rcu/rcutorture.c  | 50 
> > > +++-
> > >  2 files changed, 50 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index e6aea256ad39..f504f797c9c8 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -55,6 +55,7 @@ enum rcutorture_type {
> > > RCU_FLAVOR,
> > > RCU_BH_FLAVOR,
> > > RCU_SCHED_FLAVOR,
> > > +   RCU_TASKS_FLAVOR,
> > > SRCU_FLAVOR,
> > > INVALID_RCU_FLAVOR
> > >  };
> > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > index febe07062ac5..52423f2c74da 100644
> > > --- a/kernel/rcu/rcutorture.c
> > > +++ b/kernel/rcu/rcutorture.c
> > > @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
> > > .name   = "sched"
> > >  };
> > >
> > > +#ifdef CONFIG_TASKS_RCU
> > > +
> > > +/*
> > > + * Definitions for RCU-tasks torture testing.
> > > + */
> > > +
> > > +static int tasks_torture_read_lock(void)
> > > +{
> > > +   return 0;
> > > +}
> > > +
> > > +static void tasks_torture_read_unlock(int idx)
> > > +{
> > > +}
> > > +
> > > +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> > > +{
> > > +   call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> > > +}
> > > +
> > > +static struct rcu_torture_ops tasks_ops = {
> > > +   .ttype  = RCU_TASKS_FLAVOR,
> > > +   .init   = rcu_sync_torture_init,
> > > +   .readlock   = tasks_torture_read_lock,
> > > +   .read_delay = rcu_read_delay,  /* just reuse rcu's version. */
> > > +   .readunlock = tasks_torture_read_unlock,
> > > +   .completed  = rcu_no_completed,
> > > +   .deferred_free  = rcu_tasks_torture_deferred_free,
> > > +   .sync   = synchronize_rcu_tasks,
> > > +   .exp_sync   = synchronize_rcu_tasks,
> > > +   .call   = call_rcu_tasks,
> > > +   .cb_barrier = rcu_barrier_tasks,
> > > +   .fqs= NULL,
> > > +   .stats  = NULL,
> > > +   .irq_capable= 1,
> > > +   .name   = "tasks"
> > > +};
> > > +
> > > +#define RCUTORTURE_TASKS_OPS &tasks_ops,
> > 
> > Not sure about the comma here, no harm but still... a minor nit :)
> 
> Good point, it would be better to parenthesize this and put the comma
> at the point of use.  Fixed!

Except that this gives me a syntax error when CONFIG_TASKS_RCU=n because
we end up with a pair of consecutive commas.  Nice try, though!  ;-)
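
A standalone sketch of why the comma has to live inside the macro. The
names (HAVE_EXTRA_OPS, EXTRA_OPS, all_ops) are made up for the example
and are not the rcutorture ones. With the comma at the point of use, the
empty definition would expand to "&base_ops, ," which does not compile.

```c
#include <assert.h>
#include <stddef.h>

struct ops {
	const char *name;
};

static struct ops base_ops = { "base" };

#ifdef HAVE_EXTRA_OPS	/* hypothetical config switch */
static struct ops extra_ops = { "extra" };
#define EXTRA_OPS &extra_ops,	/* comma is part of the macro */
#else
#define EXTRA_OPS		/* expands to nothing, so no stray comma */
#endif

static struct ops *all_ops[] = {
	&base_ops,
	EXTRA_OPS
};

size_t all_ops_count(void)
{
	return sizeof(all_ops) / sizeof(all_ops[0]);
}

const char *all_ops_name(size_t i)
{
	assert(i < all_ops_count());
	return all_ops[i]->name;
}
```

Built without HAVE_EXTRA_OPS the array has one entry; built with
-DHAVE_EXTRA_OPS it has two, and in both cases the initializer list
stays well formed.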

Thanx, Paul

> > > +
> > > +#else /* #ifdef CONFIG_TASKS_RCU */
> > > +
> > > +#define RCUTORTURE_TASKS_OPS
> > > +
> > > +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> > > +
> > >  /*
> > >   * RCU torture priority-boost testing.  Runs one real-time thread per
> > >   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> > > @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
> > > if (atomic_dec_and_test(_cbs_count))
> > > wake_up(_wq);
> > > } while (!torture_must_stop());
> > > -   cur_ops->cb_barrier();
> > > +   if (cur_ops->cb_barrier != NULL)
> > > +   cur_ops->cb_barrier();
> > > destroy_rcu_head_on_stack();
> > > torture_kthread_stopping("rcu_torture_barrier_cbs");
> > > return 0;
> > > @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
> > > int firsterr = 0;
> > > static struct rcu_torture_ops *torture_ops[] = {
> > > _ops, _bh_ops, _busted_ops, _ops, 
> > > _ops,
> > > +   RCUTORTURE_TASKS_OPS
> > > };
> > >
> > > if (!torture_init_begin(torture_type, verbose, 
> > > _runnable))
> > > --
> > > 1.8.1.5
> > >
> > 
> > 
> > 
> > -- 
> > Pranith
> > 


[PATCH 1/2] Move Intel SNB device ids from sb_edac to pci_ids.h

2014-08-14 Thread Andy Lutomirski

The i2c_imc driver will use two of them, and moving only part of
the list seems messier.

Cc: Mauro Carvalho Chehab 
Cc: Rui Wang 
Signed-off-by: Andy Lutomirski 
---
 drivers/edac/sb_edac.c  | 30 --
 include/linux/pci_ids.h | 15 +++
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index deea0dcb..a2597e9313c6 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -52,36 +52,6 @@ static int probed;
 #define GET_BITFIELD(v, lo, hi)\
(((v) & GENMASK_ULL(hi, lo)) >> (lo))
 
-/*
- * sbridge Memory Controller Registers
- */
-
-/*
- * FIXME: For now, let's order by device function, as it makes
- * easier for driver's development process. This table should be
- * moved to pci_id.h when submitted upstream
- */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0   0x3cf4  /* 12.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1   0x3cf6  /* 12.7 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_BR 0x3cf5  /* 13.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0    0x3ca0  /* 14.0 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA 0x3ca8  /* 15.0 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS    0x3c71  /* 15.1 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0   0x3caa  /* 15.2 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1   0x3cab  /* 15.3 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2   0x3cac  /* 15.4 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3   0x3cad  /* 15.5 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO  0x3cb8  /* 17.0 */
-
-   /*
-* Currently, unused, but will be needed in the future
-* implementations, as they hold the error counters
-*/
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0   0x3c72  /* 16.2 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1   0x3c73  /* 16.3 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR2   0x3c76  /* 16.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR3   0x3c77  /* 16.7 */
-
 /* Devices 12 Function 6, Offsets 0x80 to 0xcc */
 static const u32 sbridge_dram_rule[] = {
0x80, 0x88, 0x90, 0x98, 0xa0,
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 7fa31731c854..e0e6801c3d80 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2816,7 +2816,22 @@
 #define PCI_DEVICE_ID_INTEL_UNC_R2PCIE 0x3c43
 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI0 0x3c44
 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI1 0x3c45
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS    0x3c71  /* 15.1 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0   0x3c72  /* 16.2 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1   0x3c73  /* 16.3 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR2   0x3c76  /* 16.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR3   0x3c77  /* 16.7 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0    0x3ca0  /* 14.0 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA 0x3ca8  /* 15.0 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0   0x3caa  /* 15.2 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1   0x3cab  /* 15.3 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2   0x3cac  /* 15.4 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3   0x3cad  /* 15.5 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO  0x3cb8  /* 17.0 */
 #define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX  0x3ce0
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0   0x3cf4  /* 12.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_BR 0x3cf5  /* 13.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1   0x3cf6  /* 12.7 */
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB   0x402f
 #define PCI_DEVICE_ID_INTEL_5100_16    0x65f0
 #define PCI_DEVICE_ID_INTEL_5100_19    0x65f3
-- 
1.9.3


[PATCH 2/2] sb_edac: Claim a different PCI device

2014-08-14 Thread Andy Lutomirski

sb_edac controls a large number of different PCI functions.  Rather
than registering as a normal PCI driver for all of them, it
registers for just one so that it gets probed and, at probe time, it
looks for all the others.

Coincidentally, the device it registers for also contains the SMBUS
registers, so the PCI core will refuse to probe both sb_edac and a
future iMC SMBUS driver.  The drivers don't actually conflict, so
just change sb_edac's device table to probe a different device.

An alternative fix would be to merge the two drivers, but sb_edac
will also refuse to load on non-ECC systems, whereas i2c_imc would
still be useful without ECC.

The only user-visible change should be that sb_edac appears to bind
a different device.

Cc: Mauro Carvalho Chehab 
Cc: Rui Wang 
Signed-off-by: Andy Lutomirski 
---
 drivers/edac/sb_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index a2597e9313c6..e3bc2cced580 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -432,7 +432,7 @@ static const struct pci_id_table 
pci_dev_descr_ibridge_table[] = {
  * pci_device_id   table for which devices we are looking for
  */
 static const struct pci_device_id sbridge_pci_tbl[] = {
-   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA)},
+   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0)},
{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA)},
{0,}/* 0 terminated list. */
 };
-- 
1.9.3


[PATCH 0/2] sb_edac: i2c_imc staging submission prep

2014-08-14 Thread Andy Lutomirski

I'd like to submit my i2c_imc driver to -staging, but the sb_edac
driver is currently squatting on my PCI ID :) sb_edac is a strange
beast: it uses registers from several PCI devices, but the one that
it registers with the driver core is the SMBUS controller.

This trivial series moves sb_edac's PCI ids to pci_ids.h (they're
not exclusive to the EDAC hardware) and changes the PCI ID that is
used to detect the EDAC hardware.

I think that i2c_imc is a good staging candidate: the driver is IMO
quite clean, the hardware is very common, and I know of some users
(unrelated to me!) that use it for development, but it's not yet
acceptable as a real driver.  In particular, it needs confirmation
from Intel as to whether it handshakes correctly with BIOS.  In the
meantime, it's perfectly safe to use *if you know that your system
isn't doing something special with its DIMM SMBUS registers*.

I have reason to believe that I may be able to get information
or a review from the right people at Intel in a couple of months,
and I suspect that some people in the NV-DIMM community would be
interested in this stuff.

I realize that the timing is a bit awkward here.  These patches have
been floating around for almost a year.  I'd be okay with them going
in for 3.17 or 3.18.  If I understand correctly, the deadline for
staging drivers is much later than the merge window, but I don't
want to submit the i2c_imc driver itself to staging until these prep
patches are in.

Andy Lutomirski (2):
  Move Intel SNB device ids from sb_edac to pci_ids.h
  sb_edac: Claim a different PCI device

 drivers/edac/sb_edac.c  | 32 +---
 include/linux/pci_ids.h | 15 +++
 2 files changed, 16 insertions(+), 31 deletions(-)

-- 
1.9.3


Re: mm: compaction: buffer overflow in isolate_migratepages_range

2014-08-14 Thread Rafael Aquini

On Thu, Aug 14, 2014 at 10:07:40PM +0400, Andrey Ryabinin wrote:
> We discussed this with Konstantin and he suggested a better solution for this.
> If I understood him correctly the main idea was to store bit
> identifying a balloon page
> in struct page (special value in _mapcount), so we won't need to check
> mapping->flags.
>

Here is what I had in mind, following Konstantin's suggestion and
yours. (I haven't tested it yet.)

Comments are welcome.

Cheers,
-- Rafael

 8< 
From: Rafael Aquini 
Subject: mm: balloon_compaction: enhance balloon_page_movable() checkpoint 
against races

While testing linux-next for the Kernel Address Sanitizer patchset (KASAN),
Sasha Levin reported a buffer overflow warning triggered in
isolate_migratepages_range(). It was later found to be caused by
balloon_page_movable() racing against move_to_new_page() while the latter
was copying the page->mapping of an anon page.

Because balloon_page_movable() can run locklessly in
isolate_migratepages_range(), the discovered race shows that the scheme
currently used to spot ballooned pages among page blocks, which checks
page_flags_cleared() and dereferences page->mapping to check its mapping
flags, is weak and prone to stumbling across similar conditions in the
future.

Following Konstantin Khlebnikov's and Andrey Ryabinin's suggestions,
this patch replaces the old page->flags && mapping->flags checking scheme
with a simpler and stronger test that reads page->_mapcount and compares
it against a marker value. Similarly to what is done for PageBuddy() checks,
BALLOON_PAGE_MAPCOUNT_VALUE is introduced here to mark balloon pages. This
allows balloon_page_movable() to skip the proven troublesome dereference of
page->mapping for flag checking during the lockless rounds of
isolate_migratepages_range(). The page->mapping dereference and flag
checking are performed later, when all locks are properly held.
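
To illustrate the marker-value idea on its own, here is a userspace
sketch using C11 atomics. struct fake_page and the fake_page_*()
helpers are stand-ins invented for this example; only the -256 marker
and the -1 "unmapped" value mirror the patch below.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FAKE_MAPCOUNT_UNMAPPED	(-1)	/* value of an unmapped page */
#define FAKE_BALLOON_MARK	(-256)	/* well away from -1 and -2 */

/* Hypothetical stand-in for the one struct page field we care about. */
struct fake_page {
	atomic_int mapcount;
};

void fake_page_init(struct fake_page *p)
{
	atomic_init(&p->mapcount, FAKE_MAPCOUNT_UNMAPPED);
}

/* Lockless test: a single read and compare, no pointer to chase. */
bool fake_page_is_balloon(struct fake_page *p)
{
	return atomic_load(&p->mapcount) == FAKE_BALLOON_MARK;
}

void fake_page_set_balloon(struct fake_page *p)
{
	/* Only an unmapped page may be turned into a balloon page. */
	assert(atomic_load(&p->mapcount) == FAKE_MAPCOUNT_UNMAPPED);
	atomic_store(&p->mapcount, FAKE_BALLOON_MARK);
}

void fake_page_clear_balloon(struct fake_page *p)
{
	assert(fake_page_is_balloon(p));
	atomic_store(&p->mapcount, FAKE_MAPCOUNT_UNMAPPED);
}
```

The point of the scheme is that the lockless check never dereferences
anything beyond the page itself, so a concurrent update of the mapping
pointer cannot send it off into unrelated memory.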

---
 include/linux/balloon_compaction.h | 61 +++---
 mm/balloon_compaction.c| 53 +
 2 files changed, 45 insertions(+), 69 deletions(-)

diff --git a/include/linux/balloon_compaction.h 
b/include/linux/balloon_compaction.h
index 089743a..1409ccc 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -108,54 +108,29 @@ static inline void balloon_mapping_free(struct 
address_space *balloon_mapping)
 }

 /*
- * page_flags_cleared - helper to perform balloon @page ->flags tests.
+ * balloon_page_movable - identify balloon pages that can be moved by
+ *   compaction / migration.
  *
- * As balloon pages are obtained from buddy and we do not play with page->flags
- * at driver level (exception made when we get the page lock for compaction),
- * we can safely identify a ballooned page by checking if the
- * PAGE_FLAGS_CHECK_AT_PREP page->flags are all cleared.  This approach also
- * helps us skip ballooned pages that are locked for compaction or release, 
thus
- * mitigating their racy check at balloon_page_movable()
+ * BALLOON_PAGE_MAPCOUNT_VALUE must be <= -2 but better not too close to
+ * -2 so that an underflow of the page_mapcount() won't be mistaken
+ * for a genuine BALLOON_PAGE_MAPCOUNT_VALUE.
  */
-static inline bool page_flags_cleared(struct page *page)
+#define BALLOON_PAGE_MAPCOUNT_VALUE (-256)
+static inline bool balloon_page_movable(struct page *page)
 {
-   return !(page->flags & PAGE_FLAGS_CHECK_AT_PREP);
+   return atomic_read(&page->_mapcount) == BALLOON_PAGE_MAPCOUNT_VALUE;
 }

-/*
- * __is_movable_balloon_page - helper to perform @page mapping->flags tests
- */
-static inline bool __is_movable_balloon_page(struct page *page)
+static inline void __balloon_page_set(struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   return mapping_balloon(mapping);
+   VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+   atomic_set(&page->_mapcount, BALLOON_PAGE_MAPCOUNT_VALUE);
 }

-/*
- * balloon_page_movable - test page->mapping->flags to identify balloon pages
- *   that can be moved by compaction/migration.
- *
- * This function is used at core compaction's page isolation scheme, therefore
- * most pages exposed to it are not enlisted as balloon pages and so, to avoid
- * undesired side effects like racing against __free_pages(), we cannot afford
- * holding the page locked while testing page->mapping->flags here.
- *
- * As we might return false positives in the case of a balloon page being just
- * released under us, the page->mapping->flags need to be re-tested later,
- * under the proper page lock, at the functions that will be coping with the
- * balloon page case.
- */
-static inline bool balloon_page_movable(struct page *page)
+static inline void __balloon_page_clear(struct page *page)
 {
-   /*
-* Before dereferencing and testing mapping->flags, let's make sure
-

Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 05:34:53PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> > would segfault for an RCU flavor lacking a callback-barrier function.
> >
> > Signed-off-by: Paul E. McKenney 
> > Reviewed-by: Josh Triplett 
> > ---
> >  include/linux/rcupdate.h |  1 +
> >  kernel/rcu/rcutorture.c  | 50 
> > +++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index e6aea256ad39..f504f797c9c8 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -55,6 +55,7 @@ enum rcutorture_type {
> > RCU_FLAVOR,
> > RCU_BH_FLAVOR,
> > RCU_SCHED_FLAVOR,
> > +   RCU_TASKS_FLAVOR,
> > SRCU_FLAVOR,
> > INVALID_RCU_FLAVOR
> >  };
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index febe07062ac5..52423f2c74da 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
> > .name   = "sched"
> >  };
> >
> > +#ifdef CONFIG_TASKS_RCU
> > +
> > +/*
> > + * Definitions for RCU-tasks torture testing.
> > + */
> > +
> > +static int tasks_torture_read_lock(void)
> > +{
> > +   return 0;
> > +}
> > +
> > +static void tasks_torture_read_unlock(int idx)
> > +{
> > +}
> > +
> > +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> > +{
> > +   call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> > +}
> > +
> > +static struct rcu_torture_ops tasks_ops = {
> > +   .ttype  = RCU_TASKS_FLAVOR,
> > +   .init   = rcu_sync_torture_init,
> > +   .readlock   = tasks_torture_read_lock,
> > +   .read_delay = rcu_read_delay,  /* just reuse rcu's version. */
> > +   .readunlock = tasks_torture_read_unlock,
> > +   .completed  = rcu_no_completed,
> > +   .deferred_free  = rcu_tasks_torture_deferred_free,
> > +   .sync   = synchronize_rcu_tasks,
> > +   .exp_sync   = synchronize_rcu_tasks,
> > +   .call   = call_rcu_tasks,
> > +   .cb_barrier = rcu_barrier_tasks,
> > +   .fqs= NULL,
> > +   .stats  = NULL,
> > +   .irq_capable= 1,
> > +   .name   = "tasks"
> > +};
> > +
> > +#define RCUTORTURE_TASKS_OPS &tasks_ops,
> 
> Not sure about the comma here, no harm but still... a minor nit :)

Good point, it would be better to parenthesize this and put the comma
at the point of use.  Fixed!

Thanx, Paul

> > +
> > +#else /* #ifdef CONFIG_TASKS_RCU */
> > +
> > +#define RCUTORTURE_TASKS_OPS
> > +
> > +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> > +
> >  /*
> >   * RCU torture priority-boost testing.  Runs one real-time thread per
> >   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> > @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
> > if (atomic_dec_and_test(_cbs_count))
> > wake_up(_wq);
> > } while (!torture_must_stop());
> > -   cur_ops->cb_barrier();
> > +   if (cur_ops->cb_barrier != NULL)
> > +   cur_ops->cb_barrier();
> > destroy_rcu_head_on_stack();
> > torture_kthread_stopping("rcu_torture_barrier_cbs");
> > return 0;
> > @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
> > int firsterr = 0;
> > static struct rcu_torture_ops *torture_ops[] = {
> > _ops, _bh_ops, _busted_ops, _ops, 
> > _ops,
> > +   RCUTORTURE_TASKS_OPS
> > };
> >
> > if (!torture_init_begin(torture_type, verbose, 
> > _runnable))
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


Re: [PATCH v5 tip/core/rcu 09/16] rcu: Improve RCU-tasks energy efficiency

2014-08-14 Thread Pranith Kumar

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
 wrote:
> From: "Paul E. McKenney" 
>
> The current RCU-tasks implementation uses strict polling to detect
> callback arrivals.  This works quite well, but is not so good for
> energy efficiency.  This commit therefore replaces the strict polling
> with a wait queue.
>
> Signed-off-by: Paul E. McKenney 
> ---
>  kernel/rcu/update.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index f1535404a79e..1256a900cd01 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -368,6 +368,7 @@ early_initcall(check_cpu_stall_init);
>  /* Global list of callbacks and associated lock. */
>  static struct rcu_head *rcu_tasks_cbs_head;
>  static struct rcu_head **rcu_tasks_cbs_tail = &rcu_tasks_cbs_head;
> +static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
>  static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>
>  /* Track exiting tasks in order to allow them to be waited for. */
> @@ -381,13 +382,17 @@ module_param(rcu_task_stall_timeout, int, 0644);
>  void call_rcu_tasks(struct rcu_head *rhp, void (*func)(struct rcu_head *rhp))
>  {
> unsigned long flags;
> +   bool needwake;
>
> rhp->next = NULL;
> rhp->func = func;
> raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> +   needwake = !rcu_tasks_cbs_head;
> *rcu_tasks_cbs_tail = rhp;
> rcu_tasks_cbs_tail = &rhp->next;
> raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> +   if (needwake)
> +   wake_up(&rcu_tasks_cbs_wq);
>  }
>  EXPORT_SYMBOL_GPL(call_rcu_tasks);

I think you want

needwake = !!rcu_tasks_cbs_head;

otherwise it will wake up when rcu_tasks_cbs_head is null, no?

>
> @@ -498,8 +503,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>
> /* If there were none, wait a bit and start over. */
> if (!list) {
> -   schedule_timeout_interruptible(HZ);
> -   WARN_ON(signal_pending(current));
> +   wait_event_interruptible(rcu_tasks_cbs_wq,
> +rcu_tasks_cbs_head);
> +   if (!rcu_tasks_cbs_head) {
> +   WARN_ON(signal_pending(current));
> +   schedule_timeout_interruptible(HZ/10);
> +   }
> continue;
> }
>
> @@ -605,6 +614,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> list = next;
> cond_resched();
> }
> +   schedule_timeout_uninterruptible(HZ/10);
> }
>  }
>
> --
> 1.8.1.5
>



-- 
Pranith

Re: [PATCH] net: xilinx: Remove .owner field for driver

2014-08-14 Thread David Miller

From: Michal Simek 
Date: Wed, 13 Aug 2014 13:54:22 +0200

> There is no need to init .owner field.
> 
> Based on the patch from Peter Griffin 
> "mmc: remove .owner field for drivers using module_platform_driver"
> 
> This patch removes the superfluous .owner field for drivers which
> use the module_platform_driver API, as this is overridden in
> platform_driver_register anyway."
> 
> Signed-off-by: Michal Simek 

Applied.

Re: [PATCH v5 tip/core/rcu 08/16] rcu: Add stall-warning checks for RCU-tasks

2014-08-14 Thread Pranith Kumar

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
 wrote:
> From: "Paul E. McKenney" 
>
> This commit adds a three-minute RCU-tasks stall warning.  The actual
> time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
> with values less than or equal to zero disabling the stall warnings.
> The default value is three minutes, which means that the tasks that
> have not yet responded will get their stacks dumped every ten minutes,
> until they pass through a voluntary context switch.
>
> Signed-off-by: Paul E. McKenney 

Something about 3 minutes and 10 minutes is mixed up here!

> ---
>  Documentation/kernel-parameters.txt |  5 +
>  kernel/rcu/update.c | 27 ---
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 910c3829f81d..8cdbde7b17f5 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2921,6 +2921,11 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
> rcupdate.rcu_cpu_stall_timeout= [KNL]
> Set timeout for RCU CPU stall warning messages.
>
> +   rcupdate.rcu_task_stall_timeout= [KNL]
> +   Set timeout in jiffies for RCU task stall warning
> +   messages.  Disable with a value less than or equal
> +   to zero.
> +
> rdinit= [KNL]
> Format: 
> Run specified binary instead of /init from the 
> ramdisk,
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 8f53a41dd9ee..f1535404a79e 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -374,7 +374,7 @@ static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
>  DEFINE_SRCU(tasks_rcu_exit_srcu);
>
>  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. 
> */
> -static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 3;
> +static int rcu_task_stall_timeout __read_mostly = HZ * 60 * 10;
>  module_param(rcu_task_stall_timeout, int, 0644);
>
>  /* Post an RCU-tasks callback. */
> @@ -449,7 +449,8 @@ void rcu_barrier_tasks(void)
>  EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
>
>  /* See if tasks are still holding out, complain if so. */
> -static void check_holdout_task(struct task_struct *t)
> +static void check_holdout_task(struct task_struct *t,
> +  bool needreport, bool *firstreport)
>  {
> if (!ACCESS_ONCE(t->rcu_tasks_holdout) ||
> t->rcu_tasks_nvcsw != ACCESS_ONCE(t->nvcsw) ||
> @@ -457,7 +458,15 @@ static void check_holdout_task(struct task_struct *t)
> ACCESS_ONCE(t->rcu_tasks_holdout) = 0;
> list_del_rcu(&t->rcu_tasks_holdout_list);
> put_task_struct(t);
> +   return;
> }
> +   if (!needreport)
> +   return;
> +   if (*firstreport) {
> +   pr_err("INFO: rcu_tasks detected stalls on tasks:\n");
> +   *firstreport = false;
> +   }
> +   sched_show_task(t);
>  }
>
>  /* RCU-tasks kthread that detects grace periods and invokes callbacks. */
> @@ -465,6 +474,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  {
> unsigned long flags;
> struct task_struct *g, *t;
> +   unsigned long lastreport;
> struct rcu_head *list;
> struct rcu_head *next;
> LIST_HEAD(rcu_tasks_holdouts);
> @@ -543,13 +553,24 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  * of holdout tasks, removing any that are no longer
>  * holdouts.  When the list is empty, we are done.
>  */
> +   lastreport = jiffies;
> while (!list_empty(&rcu_tasks_holdouts)) {
> +   bool firstreport;
> +   bool needreport;
> +   int rtst;
> +
> schedule_timeout_interruptible(HZ);
> +   rtst = ACCESS_ONCE(rcu_task_stall_timeout);
> +   needreport = rtst > 0 &&
> +time_after(jiffies, lastreport + rtst);
> +   if (needreport)
> +   lastreport = jiffies;
> +   firstreport = true;
> WARN_ON(signal_pending(current));
> rcu_read_lock();
> list_for_each_entry_rcu(t, &rcu_tasks_holdouts,
> rcu_tasks_holdout_list)
> -   check_holdout_task(t);
> +   check_holdout_task(t, needreport, &firstreport);
> rcu_read_unlock();
> }
>
> --
> 1.8.1.5
>
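
As a reading aid for the hunk above: the stall report is rate-limited by
comparing jiffies against lastreport + rtst with time_after(), which stays
correct when the counter wraps. The following is a minimal userspace sketch,
not the kernel code itself. The names model_time_after() and need_report()
are hypothetical, and a plain unsigned long stands in for jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the kernel's time_after(a, b): true if a is after b.
 * Taking the difference as a signed value keeps the answer right even
 * when the counter has wrapped around.
 */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper mirroring the patch's needreport computation:
 * report only if the timeout is enabled (> 0) and has elapsed since
 * the last report.
 */
static bool need_report(unsigned long now, unsigned long lastreport,
			int timeout)
{
	return timeout > 0 &&
	       model_time_after(now, lastreport + (unsigned long)timeout);
}
```

With a timeout of 1000 ticks, need_report() stays false through tick 1000
and becomes true at tick 1001; a timeout of zero or less disables reporting,
matching the "Disable with <= 0" comment in the patch.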



-- 
Pranith

Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)

2014-08-14 Thread David Miller

From: Andrey Vagin 
Date: Wed, 13 Aug 2014 16:03:10 +0400

> We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> aren't good, because some congestion modules heavily depend on them.
> 
> This patch adds the TCPCB_REPAIRED flag, which is included in
> TCPCB_RETRANS.
> 
> Thanks to Eric for the advice how to fix this issue.
> 
> This patch fixes the warning:
 ...
> v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
> where it's set for other skb-s.
> 
> Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages")
> Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")
> Cc: Eric Dumazet 
> Cc: Pavel Emelyanov 
> Cc: "David S. Miller" 
> Signed-off-by: Andrey Vagin 

Applied.

Re: [PATCH v5 tip/core/rcu 06/16] rcutorture: Add torture tests for RCU-tasks

2014-08-14 Thread Pranith Kumar

On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
 wrote:
> From: "Paul E. McKenney" 
>
> This commit adds torture tests for RCU-tasks.  It also fixes a bug that
> would segfault for an RCU flavor lacking a callback-barrier function.
>
> Signed-off-by: Paul E. McKenney 
> Reviewed-by: Josh Triplett 
> ---
>  include/linux/rcupdate.h |  1 +
>  kernel/rcu/rcutorture.c  | 50 +++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e6aea256ad39..f504f797c9c8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -55,6 +55,7 @@ enum rcutorture_type {
> RCU_FLAVOR,
> RCU_BH_FLAVOR,
> RCU_SCHED_FLAVOR,
> +   RCU_TASKS_FLAVOR,
> SRCU_FLAVOR,
> INVALID_RCU_FLAVOR
>  };
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index febe07062ac5..52423f2c74da 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -601,6 +601,52 @@ static struct rcu_torture_ops sched_ops = {
> .name   = "sched"
>  };
>
> +#ifdef CONFIG_TASKS_RCU
> +
> +/*
> + * Definitions for RCU-tasks torture testing.
> + */
> +
> +static int tasks_torture_read_lock(void)
> +{
> +   return 0;
> +}
> +
> +static void tasks_torture_read_unlock(int idx)
> +{
> +}
> +
> +static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
> +{
> +   call_rcu_tasks(&p->rtort_rcu, rcu_torture_cb);
> +}
> +
> +static struct rcu_torture_ops tasks_ops = {
> +   .ttype  = RCU_TASKS_FLAVOR,
> +   .init   = rcu_sync_torture_init,
> +   .readlock   = tasks_torture_read_lock,
> +   .read_delay = rcu_read_delay,  /* just reuse rcu's version. */
> +   .readunlock = tasks_torture_read_unlock,
> +   .completed  = rcu_no_completed,
> +   .deferred_free  = rcu_tasks_torture_deferred_free,
> +   .sync   = synchronize_rcu_tasks,
> +   .exp_sync   = synchronize_rcu_tasks,
> +   .call   = call_rcu_tasks,
> +   .cb_barrier = rcu_barrier_tasks,
> +   .fqs= NULL,
> +   .stats  = NULL,
> +   .irq_capable= 1,
> +   .name   = "tasks"
> +};
> +
> +#define RCUTORTURE_TASKS_OPS &tasks_ops,


Not sure about the comma here, no harm but still... a minor nit :)

> +
> +#else /* #ifdef CONFIG_TASKS_RCU */
> +
> +#define RCUTORTURE_TASKS_OPS
> +
> +#endif /* #else #ifdef CONFIG_TASKS_RCU */
> +
>  /*
>   * RCU torture priority-boost testing.  Runs one real-time thread per
>   * CPU for moderate bursts, repeatedly registering RCU callbacks and
> @@ -1295,7 +1341,8 @@ static int rcu_torture_barrier_cbs(void *arg)
> if (atomic_dec_and_test(_cbs_count))
> wake_up(_wq);
> } while (!torture_must_stop());
> -   cur_ops->cb_barrier();
> +   if (cur_ops->cb_barrier != NULL)
> +   cur_ops->cb_barrier();
> destroy_rcu_head_on_stack();
> torture_kthread_stopping("rcu_torture_barrier_cbs");
> return 0;
> @@ -1534,6 +1581,7 @@ rcu_torture_init(void)
> int firsterr = 0;
> static struct rcu_torture_ops *torture_ops[] = {
> _ops, _bh_ops, _busted_ops, _ops, _ops,
> +   RCUTORTURE_TASKS_OPS
> };
>
> if (!torture_init_begin(torture_type, verbose, _runnable))
> --
> 1.8.1.5
>
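
On the comma nit and the cb_barrier check: one plausible reason for keeping
the comma inside the macro is that the array initializer then stays valid
whether the macro expands to an entry or to nothing. The sketch below is a
userspace illustration under that assumption. DEMO_TASKS, struct demo_ops
and the other demo_* names are hypothetical stand-ins, not rcutorture
identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TASKS 1	/* stand-in for CONFIG_TASKS_RCU; remove to disable */

/* Hypothetical, cut-down stand-in for struct rcu_torture_ops. */
struct demo_ops {
	const char *name;
	void (*cb_barrier)(void);	/* optional hook: may be NULL */
};

static struct demo_ops base_ops = { .name = "base", .cb_barrier = NULL };

#ifdef DEMO_TASKS
static int barrier_calls;

static void demo_barrier(void)
{
	barrier_calls++;
}

static struct demo_ops tasks_ops = {
	.name = "tasks",
	.cb_barrier = demo_barrier,
};
#define DEMO_TASKS_OPS &tasks_ops,	/* the comma travels with the entry */
#else
#define DEMO_TASKS_OPS			/* expands to nothing at all */
#endif

/* Valid in both configurations: no dangling or missing comma. */
static struct demo_ops *all_ops[] = {
	&base_ops,
	DEMO_TASKS_OPS
};

/* Call the optional hook only when the flavor provides one. */
static void run_barrier(const struct demo_ops *ops)
{
	assert(ops != NULL);
	if (ops->cb_barrier != NULL)
		ops->cb_barrier();
}
```

Calling run_barrier() on base_ops is a no-op instead of a NULL dereference,
which is the same shape as the guard added around cur_ops->cb_barrier().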



-- 
Pranith

RE: [PATCH net-next,v2] hyperv: Increase the buffer length for netvsc_channel_cb()

2014-08-14 Thread Haiyang Zhang



> -Original Message-
> From: David Miller [mailto:da...@redhat.com]
> Sent: Thursday, August 14, 2014 5:29 PM
> To: Haiyang Zhang
> Cc: net...@vger.kernel.org; KY Srinivasan; o...@aepfle.de;
> jasow...@redhat.com; linux-kernel@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org
> Subject: Re: [PATCH net-next,v2] hyperv: Increase the buffer length for
> netvsc_channel_cb()
> 
> From: Haiyang Zhang 
> Date: Wed, 13 Aug 2014 18:03:44 +
> 
> > When the buffer is too small for a packet from VMBus, a bigger buffer will be
> > allocated in netvsc_channel_cb() and retry reading the packet from VMBus.
> > Increasing this buffer size will reduce the retry overhead.
> >
> > Signed-off-by: Haiyang Zhang 
> > Reviewed-by: Dexuan Cui 
> ...
> > -   net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
> > +   net_device = vzalloc(sizeof(*net_device));
> 
> This isn't what I suggested that you do.
> 
> I said that the buffer inside of netvsc_device should be made an
> indirect pointer and thus allocated separately.
> 
> Thus you're still kzalloc() net_device, but net_device->cb_buffer
> becomes "unsigned char *" and another allocation is made for it.

I will change the patch to do it this way.
Thanks,

- Haiyang


Re: [PATCH v5 tip/core/rcu 05/16] rcu: Export RCU-tasks APIs to GPL modules

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 03:08:06PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>  wrote:
> > From: Steven Rostedt 
> >
> > This commit exports the RCU-tasks APIs, call_rcu_tasks(),
> > synchronize_rcu_tasks(), and rcu_barrier_tasks(), to GPL-licensed
> > kernel modules.
> 
> Only two of these are being exported in this patch. Patch 1 is adding
> the export for call_rcu_tasks().

Good point, updated the commit log accordingly.

Thanx, Paul

> > Signed-off-by: Steven Rostedt 
> > Signed-off-by: Paul E. McKenney 
> > Reviewed-by: Josh Triplett 
> > ---
> >  kernel/rcu/update.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 4cece6e886ee..8f53a41dd9ee 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -433,6 +433,7 @@ void synchronize_rcu_tasks(void)
> > /* Wait for the grace period. */
> > wait_rcu_gp(call_rcu_tasks);
> >  }
> > +EXPORT_SYMBOL_GPL(synchronize_rcu_tasks);
> >
> >  /**
> >   * rcu_barrier_tasks - Wait for in-flight call_rcu_tasks() callbacks.
> > @@ -445,6 +446,7 @@ void rcu_barrier_tasks(void)
> > /* There is only one callback queue, so this is easy.  ;-) */
> > synchronize_rcu_tasks();
> >  }
> > +EXPORT_SYMBOL_GPL(rcu_barrier_tasks);
> >
> >  /* See if tasks are still holding out, complain if so. */
> >  static void check_holdout_task(struct task_struct *t)
> > --
> > 1.8.1.5
> >
> 
> 
> 
> -- 
> Pranith
> 


Re: [PATCH net-next,v2] hyperv: Increase the buffer length for netvsc_channel_cb()

2014-08-14 Thread David Miller

From: Haiyang Zhang 
Date: Wed, 13 Aug 2014 18:03:44 +

> When the buffer is too small for a packet from VMBus, a bigger buffer will be
> allocated in netvsc_channel_cb() and retry reading the packet from VMBus.
> Increasing this buffer size will reduce the retry overhead.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: Dexuan Cui 
...
> - net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
> + net_device = vzalloc(sizeof(*net_device));

This isn't what I suggested that you do.

I said that the buffer inside of netvsc_device should be made an
indirect pointer and thus allocated separately.

Thus you're still kzalloc() net_device, but net_device->cb_buffer
becomes "unsigned char *" and another allocation is made for it.
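
A minimal userspace sketch of the layout described above, for illustration
only: the struct stays small and is allocated on its own, while the large
buffer hangs off it as a pointer with a second allocation. struct
demo_device and the demo_* helpers are hypothetical stand-ins for
netvsc_device, and calloc() merely stands in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct netvsc_device. */
struct demo_device {
	int id;
	unsigned char *cb_buffer;	/* indirect: allocated separately */
	size_t cb_buffer_len;
};

/*
 * Allocate the small struct and its large buffer as two allocations.
 * If the buffer allocation fails, the struct is released again so the
 * caller never sees a half-built device.
 */
static struct demo_device *demo_device_alloc(size_t buf_len)
{
	struct demo_device *dev;

	assert(buf_len > 0);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;

	dev->cb_buffer = calloc(1, buf_len);
	if (!dev->cb_buffer) {
		free(dev);
		return NULL;
	}
	dev->cb_buffer_len = buf_len;
	return dev;
}

/* Free the buffer first, then the struct that points to it. */
static void demo_device_free(struct demo_device *dev)
{
	if (!dev)
		return;
	free(dev->cb_buffer);
	free(dev);
}
```

The point of the split is that growing the buffer no longer grows the
struct itself, so the struct's own allocation can stay as it was.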

Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area

2014-08-14 Thread Andrew Morton

On Thu, 14 Aug 2014 14:12:17 +0900 Gioh Kim  wrote:

> This patch tries to solve the problem that long-lasting page caches of the
> ext4 superblock and of the superblock's journaling data disturb page migration.
> 
> I've been testing the CMA feature on my ARM-based platform
> and found that two page caches cannot be migrated.
> They are the page caches of the ext4 superblock and of its journaling data.
> 
> Current ext4 reads the superblock with sb_bread(), which allocates the page
> from the movable area. But the problem is that ext4 holds the page until
> it is unmounted. If the root filesystem is ext4, the page can never be
> migrated. The journaling data for the superblock cannot be migrated either.
> 
> I introduce a new API for allocating page cache from the non-movable area.
> It is useful for ext4/ext3 and others that want to hold page cache for a
> long time.

All seems reasonable to me.  The additional overhead in buffer.c from
additional function arguments is regrettable but I don't see a
non-hacky alternative.

One vital question which the changelog doesn't really address (it
should): how important is this patch?  Is your test system presently
"completely dead in the water utterly unusable" or "occasionally not
quite as good as it could be".  Somewhere in between?

See, the patch adds costs.  I'd like us to have a good understanding of
what benefits it brings.


3.17-rc0: kobject_put on uninitialized object

2014-08-14 Thread Pavel Machek

Hi!

3.17-rc0, but I lost sources from this one. I guess I'll restart with
newer kernel and see if it happens again...
Pavel

Aug 14 19:59:58 duo kernel: perf interrupt took too long (5749 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Aug 14 20:04:20 duo kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 14 20:04:20 duo kernel: e1000e :02:00.0 eth0: Link Speed was downgraded by SmartSpeed
Aug 14 20:04:20 duo kernel: e1000e :02:00.0 eth0: 10/100 speed: disabling TSO
Aug 14 20:04:27 duo kernel: wlan0: deauthenticating from 00:11:95:05:30:d7 by local choice (Reason: 3=DEAUTH_LEAVING)
Aug 14 20:04:27 duo kernel: cfg80211: Calling CRDA to update world regulatory domain
Aug 14 20:05:31 duo kernel: nm-connection-e[24493]: segfault at 2c ip 08073c24 sp bfd50640 error 4 in nm-connection-editor[8048000+56000]
Aug 14 21:08:23 duo kernel: [ cut here ]
Aug 14 21:08:23 duo kernel: WARNING: CPU: 1 PID: 23996 at lib/kobject.c:670 kobject_put+0x61/0x70()
Aug 14 21:08:23 duo kernel: kobject: '(null)' (caff26b8): is not initialized, yet kobject_put() is being called.
Aug 14 21:08:23 duo kernel: Modules linked in:
Aug 14 21:08:23 duo kernel: CPU: 1 PID: 23996 Comm: umount Not tainted 3.16.0+ #390
Aug 14 21:08:23 duo kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
Aug 14 21:08:23 duo kernel: 029e c2fe5c44 c482b92d c4a6097c c2fe5c74 c4038f2a c4a60b2c c2fe5ca0
Aug 14 21:08:23 duo kernel: 5dbc c4a6097c 029e c4282561 c4282561 dc977800 caff26b0 dc977a20
Aug 14 21:08:23 duo kernel: c2fe5c8c c4038fce 0009 c2fe5c84 c4a60b2c c2fe5ca0 c2fe5cac c4282561
Aug 14 21:08:23 duo kernel: Call Trace:
Aug 14 21:08:23 duo kernel: [] dump_stack+0x41/0x52
Aug 14 21:08:23 duo kernel: [] warn_slowpath_common+0x7a/0xa0
Aug 14 21:08:23 duo kernel: [] ? kobject_put+0x61/0x70
Aug 14 21:08:23 duo kernel: [] ? kobject_put+0x61/0x70
Aug 14 21:08:23 duo kernel: [] warn_slowpath_fmt+0x2e/0x30
Aug 14 21:08:23 duo kernel: [] kobject_put+0x61/0x70
Aug 14 21:08:23 duo kernel: [] put_device+0xf/0x20
Aug 14 21:08:23 duo kernel: [] scsi_host_dev_release+0xb2/0xe0
Aug 14 21:08:23 duo kernel: [] device_release+0x27/0x90
Aug 14 21:08:23 duo kernel: [] ? cache_free_debugcheck+0x258/0x310
Aug 14 21:08:23 duo kernel: [] kobject_release+0x7a/0x1c0
Aug 14 21:08:23 duo kernel: [] ? debug_check_no_obj_freed+0x124/0x190
Aug 14 21:08:23 duo kernel: [] kobject_put+0x2f/0x70
Aug 14 21:08:23 duo kernel: [] put_device+0xf/0x20
Aug 14 21:08:23 duo kernel: [] scsi_target_dev_release+0x15/0x20
Aug 14 21:08:23 duo kernel: [] device_release+0x27/0x90
Aug 14 21:08:23 duo kernel: [] ? cache_free_debugcheck+0x258/0x310
Aug 14 21:08:23 duo kernel: [] kobject_release+0x7a/0x1c0
Aug 14 21:08:23 duo kernel: [] ? debug_check_no_obj_freed+0x124/0x190
Aug 14 21:08:23 duo kernel: [] kobject_put+0x2f/0x70
Aug 14 21:08:23 duo kernel: [] put_device+0xf/0x20
Aug 14 21:08:23 duo kernel: [] scsi_device_dev_release_usercontext+0xe0/0xf0
Aug 14 21:08:23 duo kernel: [] ? scsi_device_dev_release+0x20/0x20
Aug 14 21:08:23 duo kernel: [] execute_in_process_context+0x74/0x80
Aug 14 21:08:23 duo kernel: [] scsi_device_dev_release+0x13/0x20
Aug 14 21:08:23 duo kernel: [] device_release+0x27/0x90
Aug 14 21:08:23 duo kernel: [] kobject_release+0x7a/0x1c0
Aug 14 21:08:23 duo kernel: [] ? kobject_release+0x98/0x1c0
Aug 14 21:08:23 duo kernel: [] kobject_put+0x2f/0x70
Aug 14 21:08:23 duo kernel: [] ? kobject_put+0x2f/0x70
Aug 14 21:08:23 duo kernel: [] put_device+0xf/0x20
Aug 14 21:08:23 duo kernel: [] scsi_device_put+0x33/0x50
Aug 14 21:08:23 duo kernel: [] scsi_disk_put+0x2d/0x50
Aug 14 21:08:23 duo kernel: [] sd_release+0x2f/0x60
Aug 14 21:08:23 duo kernel: [] __blkdev_put+0x138/0x170
Aug 14 21:08:23 duo kernel: [] __blkdev_put+0xe7/0x170
Aug 14 21:08:23 duo kernel: [] blkdev_put+0x47/0x130
Aug 14 21:08:23 duo kernel: [] kill_block_super+0x3e/0x70
Aug 14 21:08:23 duo kernel: [] deactivate_locked_super+0x48/0x70
Aug 14 21:08:23 duo kernel: [] deactivate_super+0x51/0x70
Aug 14 21:08:23 duo kernel: [] mntput_no_expire+0x12f/0x1f0
Aug 14 21:08:23 duo kernel: [] ? SyS_umount+0xa7/0x430
Aug 14 21:08:23 duo kernel: [] SyS_umount+0xa7/0x430
Aug 14 21:08:23 duo kernel: [] ? SyS_oldumount+0x19/0x20
Aug 14 21:08:23 duo kernel: [] SyS_oldumount+0x19/0x20
Aug 14 21:08:23 duo kernel: [] syscall_call+0x7/0x7
Aug 14 21:08:23 duo kernel: ---[ end trace 7486deb8b98bf654 ]---
Linux duo 3.16.0+ #390 SMP Mon Aug 11 11:07:27 CEST 2014 i686 GNU/Linux

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH net-next] r8169:add support for RTL8168H and RTL8107E

2014-08-14 Thread David Miller

From: Chun-Hao Lin 
Date: Wed, 13 Aug 2014 23:06:13 +0800

> RTL8168H is Realtek PCIe Gigabit Ethernet controller.
> RTL8107E is Realtek PCIe Fast Ethernet controller.
> 
> This patch add support for these two chips.
> 
> Signed-off-by: Chun-Hao Lin 

Please don't break the indentation like you have here:

> + rtl_w1w0_eri(tp,
> + 0x0dc,
> + ERIAR_MASK_0100,
> + MagicPacket_v2,
> + 0x,
> + ERIAR_EXGMAC);
 ...
> + rtl_w1w0_eri(tp,
> + 0x0dc,
> + ERIAR_MASK_0100,
> + 0x,
> + MagicPacket_v2,
> + ERIAR_EXGMAC);
 ...
> @@ -4495,15 +4791,19 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
>   tp->mac_version == RTL_GIGA_MAC_VER_31) {
>   rtl_udelay_loop_wait_low(tp, _npq_cond, 20, 42*42);
>   } else if (tp->mac_version == RTL_GIGA_MAC_VER_34 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_35 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_36 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_37 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_40 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_41 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_42 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_43 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_44 ||
> -tp->mac_version == RTL_GIGA_MAC_VER_38) {
> + tp->mac_version == RTL_GIGA_MAC_VER_35 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_36 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_37 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_38 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_40 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_41 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_42 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_43 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_44 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_45 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_46 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_47 ||
> + tp->mac_version == RTL_GIGA_MAC_VER_48) {
>   RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
>   rtl_udelay_loop_wait_high(tp, _txcfg_empty_cond, 100, 666);
>   } else {

None of these changes are indented correctly, and in the last hunk the original
code was perfectly indented and you should not have adjusted it.

On a multi-line conditional or function call, the second and subsequent lines
should start exactly at the first column after the opening parenthesis of
the initial line.

You must use the appropriate number of TAB then SPACE characters necessary
to achieve this.

If you are indenting these lines only using TAB characters, you are doing it
incorrectly.

Please audit your entire patch for this problem, thanks.
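
To make the TAB-then-SPACE rule concrete, here is a small sketch that
splits a target column into leading TABs and SPACEs. It assumes tab stops
every 8 columns, as in kernel style; struct indent and indent_for_column()
are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Leading whitespace needed to reach a given column. */
struct indent {
	int tabs;
	int spaces;
};

/*
 * Split a 0-based target column into as many TABs as fit, followed by
 * the SPACEs needed to reach the exact column.  tab_width is the
 * distance between tab stops (8 in kernel style).
 */
static struct indent indent_for_column(int column, int tab_width)
{
	struct indent in;

	assert(column >= 0 && tab_width > 0);
	in.tabs = column / tab_width;
	in.spaces = column % tab_width;
	return in;
}
```

For a line that starts with one TAB followed by "} else if (", the opening
parenthesis sits at column 18, so continuation lines start at column 19:
indent_for_column(19, 8) yields two TABs plus three SPACEs. TABs alone can
only reach columns 16 or 24, which is why TAB-only indentation cannot line
up with the parenthesis here.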

Re: [PATCH v5 tip/core/rcu 01/16] rcu: Add call_rcu_tasks()

2014-08-14 Thread Paul E. McKenney

On Thu, Aug 14, 2014 at 04:46:34PM -0400, Pranith Kumar wrote:
> On Mon, Aug 11, 2014 at 6:48 PM, Paul E. McKenney
>  wrote:
> > From: "Paul E. McKenney" 
> >
> > This commit adds a new RCU-tasks flavor of RCU, which provides
> > call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
> > context switch (not preemption!), userspace execution, and the idle loop.
> > Note that unlike other RCU flavors, these quiescent states occur in tasks,
> > not necessarily CPUs.  Includes fixes from Steven Rostedt.
> >
> > This RCU flavor is assumed to have very infrequent latency-tolerant
> > updaters.  This assumption permits significant simplifications, including
> > a single global callback list protected by a single global lock, along
> > with a single linked list containing all tasks that have not yet passed
> > through a quiescent state.  If experience shows this assumption to be
> > incorrect, the required additional complexity will be added.
> >
> > Suggested-by: Steven Rostedt 
> > Signed-off-by: Paul E. McKenney 
> 
> Please find comments below. I did not read all the ~100 emails in this
> series, so please forgive if I ask something repetitive and just point
> that out. I will go digging :)

;-)

> > ---
> >  include/linux/init_task.h |   9 +++
> >  include/linux/rcupdate.h  |  36 ++
> >  include/linux/sched.h |  23 ---
> >  init/Kconfig  |  10 +++
> >  kernel/rcu/tiny.c |   2 +
> >  kernel/rcu/tree.c |   2 +
> >  kernel/rcu/update.c   | 171 ++
> >  7 files changed, 242 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> > index 6df7f9fe0d01..78715ea7c30c 100644
> > --- a/include/linux/init_task.h
> > +++ b/include/linux/init_task.h
> > @@ -124,6 +124,14 @@ extern struct group_info init_groups;
> >  #else
> >  #define INIT_TASK_RCU_PREEMPT(tsk)
> >  #endif
> > +#ifdef CONFIG_TASKS_RCU
> > +#define INIT_TASK_RCU_TASKS(tsk)   \
> > +   .rcu_tasks_holdout = false, \
> > +   .rcu_tasks_holdout_list =   \
> > +   LIST_HEAD_INIT(tsk.rcu_tasks_holdout_list),
> > +#else
> > +#define INIT_TASK_RCU_TASKS(tsk)
> > +#endif
> 
> rcu_tasks_holdout is defined as an int. So maybe use 0?

Good point.  I started with a bool, but then needed to do
smp_store_release(), which doesn't support bool.

> I see that there are other locations which set it to 'false'. So may
> just change the definition to bool, as it seems more appropriate.

If I no longer use smp_store_release, yep.

And it appears that I no longer do, so changed back to bool.

> Also why is rcu_tasks_nvcsw not being initialized? I see that it can
> be read before initialized, no?

It is initialized by rcu_tasks_kthread() before putting a given task on the
rcu_tasks_holdouts list.  It is only read for tasks on that list.  So
there is no use before initialization.

> >  extern struct cred init_cred;
> >
> > @@ -231,6 +239,7 @@ extern struct task_group root_task_group;
> > INIT_FTRACE_GRAPH   \
> > INIT_TRACE_RECURSION\
> > INIT_TASK_RCU_PREEMPT(tsk)  \
> > +   INIT_TASK_RCU_TASKS(tsk)\
> > INIT_CPUSET_SEQ(tsk)\
> > INIT_RT_MUTEXES(tsk)\
> > INIT_VTIME(tsk) \
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 6a94cc8b1ca0..829efc99df3e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -197,6 +197,26 @@ void call_rcu_sched(struct rcu_head *head,
> >
> >  void synchronize_sched(void);
> >
> > +/**
> > + * call_rcu_tasks() - Queue an RCU for invocation task-based grace period
> 
> -ENOPARSE :(
> 
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual callback function to be invoked after the grace period
> > + *
> > + * The callback function will be invoked some time after a full grace
> > + * period elapses, in other words after all currently executing RCU
> > + * read-side critical sections have completed. call_rcu_tasks() assumes
> > + * that the read-side critical sections end at a voluntary context
> > + * switch (not a preemption!), entry into idle, or transition to usermode
> > + * execution.  As such, there are no read-side primitives analogous to
> > + * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
> > + * to determine that all tasks have passed through a safe state, not so
> > + * much for data-strcuture synchronization.
> 
> s/strcuture/structure
> 
> > + *
> > + * See the description of call_rcu() for more

Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

2014-08-14 Thread Andy Lutomirski

On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra  wrote:
> On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
>> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
>> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> >>
>> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>> >>
>> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> >> just fine, which means we'll fall out of the cpuidle_enter(), which
>> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> >>
>> >> It will indeed not leave the cpu_idle_loop() function and go right back
>> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> >> should pick a new C state.
>> >>
>> >> So the interrupt _should_ work. If it doesn't you need to explain why.
>> >
>> > I think the issue is related to the poll_idle state, in
>> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
>> > It is a bit confusing because this state is not listed in the acpi /
>> > intel idle driver but inserted implicitly at the beginning of the idle
>> > table by the cpuidle framework when the driver is registered.
>> >
>> > static int poll_idle(struct cpuidle_device *dev,
>> > struct cpuidle_driver *drv, int index)
>> > {
>> > local_irq_enable();
>> > if (!current_set_polling_and_test()) {
>> > while (!need_resched())
>> > cpu_relax();
>> > }
>> > current_clr_polling();
>> >
>> > return index;
>> > }
>>
>> As the most recent person to have modified this function, and as an
>> avowed hater of pointless IPIs, let me ask a rather different question:
>> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
>> supports the polling idle interface :)
>>
>> Can't you just do:
>>
>> if (set_nr_if_polling(rq->idle)) {
>>   trace_sched_wake_idle_without_ipi(cpu);
>> } else {
>>   raw_spin_lock_irqsave(&rq->lock, flags);
>>   if (rq->curr == rq->idle)
>>       smp_send_reschedule(cpu);
>>   // else the CPU wasn't idle; nothing to do
>>   raw_spin_unlock_irqrestore(&rq->lock, flags);
>> }
>>
>> In the common case (wake from C0, i.e. polling idle), this will skip the
>> IPI entirely unless you race with idle entry/exit, saving a few more
>> precious electrons and all of the latency involved in poking the APIC
>> registers.
>
> They could and they probably should, but that logic should _not_ live in
> the cpuidle driver.

Sure.  My point is that fixing the IPI handler is, I think, totally
bogus, because the IPI API isn't the right way to do this at all.

It would be straightforward to add a new function wake_if_idle(int
cpu) to sched/core.c.

--Andy

Re: [PATCH 1/2] fs/buffer.c: allocate buffer cache from non-movable area

2014-08-14 Thread Andrew Morton

On Thu, 14 Aug 2014 14:15:40 +0900 Gioh Kim  wrote:

> A buffer cache is allocated from movable area
> because it is referred for a while and released soon.
> But some filesystems are taking buffer cache for a long time
> and it can disturb page migration.
> 
> A new API should be introduced to allocate buffer cache from
> non-movable area.

I think the API could and should be more flexible than this.

Rather than making the API be "movable or not movable", let's permit
callers to specify the gfp_t and leave it at that.  That way, if
someone later wants to allocate a buffer head with, I dunno,
__GFP_NOTRACK then they can do so.

So the word "movable" shouldn't appear in buffer.c at all, except in a
single place.

> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device *bdev,
>   */
>  static int
>  grow_dev_page(struct block_device *bdev, sector_t block,
> -   pgoff_t index, int size, int sizebits)
> + pgoff_t index, int size, int sizebits, gfp_t movable_mask)

s/movable_mask/gfp/

>  {
> struct inode *inode = bdev->bd_inode;
> struct page *page;
> @@ -1003,7 +1003,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> gfp_t gfp_mask;
> 
> gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
> -   gfp_mask |= __GFP_MOVABLE;
> +   if (movable_mask & __GFP_MOVABLE)
> +   gfp_mask |= __GFP_MOVABLE;

This becomes

gfp_mask |= gfp;

> /*
>  * XXX: __getblk_slow() can not really deal with failure and
>  * will endlessly loop on improvised global reclaim.  Prefer
> @@ -1058,7 +1059,8 @@ failed:
>   * that page was dirty, the buffers are set dirty also.
>   */
>  static int
> -grow_buffers(struct block_device *bdev, sector_t block, int size)
> +grow_buffers(struct block_device *bdev, sector_t block,
> +int size, gfp_t movable_mask)

gfp

>  {
> pgoff_t index;
> int sizebits;
> @@ -1085,11 +1087,12 @@ grow_buffers(struct block_device *bdev, sector_t block, int size)
> }
> 
> /* Create a page with the proper size buffers.. */
> -   return grow_dev_page(bdev, block, index, size, sizebits);
> +   return grow_dev_page(bdev, block, index, size, sizebits, movable_mask);
>  }
> 
>  static struct buffer_head *
> -__getblk_slow(struct block_device *bdev, sector_t block, int size)
> +__getblk_slow(struct block_device *bdev, sector_t block,
> + int size, gfp_t movable_mask)

gfp

>  {
> /* Size must be multiple of hard sectorsize */
> if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
> @@ -,7 +1114,7 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
> if (bh)
> return bh;
> 
> -   ret = grow_buffers(bdev, block, size);
> +   ret = grow_buffers(bdev, block, size, movable_mask);

gfp

> if (ret < 0)
> return NULL;
> if (ret == 0)
> @@ -1385,11 +1388,34 @@ __getblk(struct block_device *bdev, sector_t block, unsigned size)
> 
> might_sleep();
> if (bh == NULL)
> -   bh = __getblk_slow(bdev, block, size);
> +   bh = __getblk_slow(bdev, block, size, __GFP_MOVABLE);

Here is the place where buffer.c mentions "movable".

> return bh;
>  }
>  EXPORT_SYMBOL(__getblk);
> 
> + /*
> + * __getblk_nonmovable will locate (and, if necessary, create) the buffer_head
> + * which corresponds to the passed block_device, block and size. The
> + * returned buffer has its reference count incremented.
> + *
> + * The page cache is allocated from non-movable area
> + * not to prevent page migration.
> + *
> + * __getblk()_nonmovable will lock up the machine
> + * if grow_dev_page's try_to_free_buffers() attempt is failing. FIXME, perhaps?
> + */
> +struct buffer_head *
> +__getblk_nonmovable(struct block_device *bdev, sector_t block, unsigned size)
> +{
> +   struct buffer_head *bh = __find_get_block(bdev, block, size);
> +
> +   might_sleep();
> +   if (bh == NULL)
> +   bh = __getblk_slow(bdev, block, size, 0);
> +   return bh;
> +}
> +EXPORT_SYMBOL(__getblk_nonmovable);

Suggest this be called __getblk_gfp(bdev, block, size, gfp) and then
__getblk() be changed to call __getblk_gfp(..., __GFP_MOVABLE).

We could then write a __getblk_nonmovable() which calls __getblk_gfp()
(a static inlined one-line function) or we can just call
__getblk_gfp(..., 0) directly from filesystems.
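
The shape of that suggestion, sketched in userspace for illustration: one
general entry point takes the flags, and the existing call plus an optional
non-movable variant become one-line wrappers. All demo_* names, demo_gfp_t
and the value of DEMO_GFP_MOVABLE are hypothetical; the struct returned
here only records what was requested and does no buffer lookup.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;	/* stand-in for gfp_t */
#define DEMO_GFP_MOVABLE 0x08u		/* illustrative value, not the kernel's */

/* Records the request so the flags chosen by each wrapper are visible. */
struct demo_bh {
	unsigned long block;
	unsigned size;
	demo_gfp_t gfp;
};

/* General entry point: the caller chooses the allocation flags. */
static struct demo_bh demo_getblk_gfp(unsigned long block, unsigned size,
				      demo_gfp_t gfp)
{
	struct demo_bh bh = { block, size, gfp };

	assert(size > 0);
	return bh;
}

/* Existing behaviour: the single place that mentions "movable". */
static inline struct demo_bh demo_getblk(unsigned long block, unsigned size)
{
	return demo_getblk_gfp(block, size, DEMO_GFP_MOVABLE);
}

/* Optional one-line wrapper for callers that hold buffers for a long time. */
static inline struct demo_bh demo_getblk_nonmovable(unsigned long block,
						    unsigned size)
{
	return demo_getblk_gfp(block, size, 0);
}
```

Because the general function accepts any flags, a later caller with a
different requirement can pass its own mask without another API being
added.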

> @@ -1423,6 +1450,28 @@ __bread(struct block_device *bdev, sector_t block, unsigned size)
>  }
>  EXPORT_SYMBOL(__bread);
> 
> +/**
> + *  __bread_nonmovable() - reads a specified block and returns the bh
> + *  @bdev: the block_device to read from
> + *  @block: number of block
> + *  @size: size (in bytes) to read
> + *
> + *  Reads a specified block, and returns

Re: [PATCH RFC v4 net-next 17/26] tracing: allow eBPF programs to be attached to events

2014-08-14 Thread Brendan Gregg

On Wed, Aug 13, 2014 at 12:57 AM, Alexei Starovoitov  wrote:
[...]
> +/* For tracing filters save first six arguments of tracepoint events.
> + * On 64-bit architectures argN fields will match one to one to arguments passed
> + * to tracepoint events.
> + * On 32-bit architectures u64 arguments to events will be seen into two
> + * consecutive argN, argN+1 fields. Pointers, u32, u16, u8, bool types will
> + * match one to one
> + */
> +struct bpf_context {
> +   unsigned long arg1;
> +   unsigned long arg2;
> +   unsigned long arg3;
> +   unsigned long arg4;
> +   unsigned long arg5;
> +   unsigned long arg6;
> +   unsigned long ret;
> +};

While this works, the argN+1 shift for 32-bit is a gotcha to learn.
Let's say arg1 was 64-bit, and my program only examined arg2. I'd need
two programs: one for 64-bit (using arg2) and one for 32-bit (using
arg3). If there was a way not to shift arguments, I could have one
program for both. E.g., additional arg1hi, arg2hi, ... fields for the
higher-order u32s.
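
A small userspace model of the two layouts, to illustrate the difference.
Everything here is hypothetical: the struct names, the array form of the
argN fields, and the assumption that the low half of a u64 is stored
first are mine, not the patch's.

```c
#include <stdint.h>

/* 32-bit layout as described in the patch: a u64 spills into the next slot. */
struct ctx32_shifted {
	uint32_t arg[6];
};

/* Hypothetical alternative: argN keeps its index, upper half goes in argNhi. */
struct ctx32_hi {
	uint32_t arg[6];
	uint32_t arghi[6];
};

static uint64_t join_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Event with arguments (u64 a1, u32 a2) stored in the shifted layout. */
static void fill_shifted(struct ctx32_shifted *c, uint64_t a1, uint32_t a2)
{
	c->arg[0] = (uint32_t)a1;		/* assumed: low half first */
	c->arg[1] = (uint32_t)(a1 >> 32);
	c->arg[2] = a2;				/* second argument lands in slot 3 */
}

/* Same event stored in the non-shifting layout. */
static void fill_hi(struct ctx32_hi *c, uint64_t a1, uint32_t a2)
{
	c->arg[0] = (uint32_t)a1;
	c->arghi[0] = (uint32_t)(a1 >> 32);
	c->arg[1] = a2;				/* second argument stays in slot 2 */
	c->arghi[1] = 0;
}
```

In the second layout a program that only reads the second argument uses
the same index on both 32-bit and 64-bit, at the cost of a larger context.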

Brendan

Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

2014-08-14 Thread Peter Zijlstra

On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >>
> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
> >>
> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >>
> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> should pick a new C state.
> >>
> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> > 
> > I think the issue is related to the poll_idle state, in
> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> > It is a bit confusing because this state is not listed in the acpi /
> > intel idle driver but inserted implicitly at the beginning of the idle
> > table by the cpuidle framework when the driver is registered.
> > 
> > static int poll_idle(struct cpuidle_device *dev,
> > 		struct cpuidle_driver *drv, int index)
> > {
> > 	local_irq_enable();
> > 	if (!current_set_polling_and_test()) {
> > 		while (!need_resched())
> > 			cpu_relax();
> > 	}
> > 	current_clr_polling();
> > 
> > 	return index;
> > }
> 
> As the most recent person to have modified this function, and as an
> avowed hater of pointless IPIs, let me ask a rather different question:
> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
> supports the polling idle interface :)
> 
> Can't you just do:
> 
> if (set_nr_if_polling(rq->idle)) {
> 	trace_sched_wake_idle_without_ipi(cpu);
> } else {
> 	spin_lock_irqsave(&rq->lock, flags);
> 	if (rq->curr == rq->idle)
> 		smp_send_reschedule(cpu);
> 	// else the CPU wasn't idle; nothing to do
> 	raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
> 
> In the common case (wake from C0, i.e. polling idle), this will skip the
> IPI entirely unless you race with idle entry/exit, saving a few more
> precious electrons and all of the latency involved in poking the APIC
> registers.

They could and they probably should, but that logic should _not_ live in
the cpuidle driver.

And as stated elsewhere in the thread, they also need to fix their
kick_all_cpus_sync() usage, because that's similarly wrecked.


[Patch v3 2/2] freezer: remove obsolete comments in __thaw_task()

2014-08-14 Thread Cong Wang

__thaw_task() no longer clears frozen flag.

Cc: David Rientjes 
Cc: "Rafael J. Wysocki" 
Cc: Tejun Heo 
Cc: Andrew Morton 
Cc: Michal Hocko 
Signed-off-by: Cong Wang 
---
 kernel/freezer.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 5b25351..33cbcb0 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -156,12 +156,6 @@ void __thaw_task(struct task_struct *p)
 {
 	unsigned long flags;
 
-	/*
-	 * Clear freezing and kick @p if FROZEN.  Clearing is guaranteed to
-	 * be visible to @p as waking up implies wmb.  Waking up inside
-	 * freezer_lock also prevents wakeups from leaking outside
-	 * refrigerator.
-	 */
 	spin_lock_irqsave(&freezer_lock, flags);
 	if (frozen(p))
 		wake_up_process(p);
-- 
1.8.3.1


[Patch v3 1/2] freezer: check OOM kill while being frozen

2014-08-14 Thread Cong Wang

There is a race condition between the OOM killer and the freezer when
they try to operate on the same process, something like below:

Process A                    Process B          Process C
trigger page fault
then trigger oom
B=oom_scan_process_thread()
                                                cgroup freezer freeze(A, B)
                             ...
                             try_to_freeze()
                             stay in D state
oom_kill_process(B)
restart page fault
...

In this case, process A triggered a page fault in user space, and the
kernel page fault handler triggered OOM. The kernel then selected
process B as the victim, but right before being killed, process B was
frozen by process C and went into D state. The kernel then sent
SIGKILL, but too late: process B no longer acts on pending signals.

David Rientjes tried to fix the same issue with commit
f660daac474c6f (oom: thaw threads if oom killed thread is
frozen before deferring), but it doesn't work any more, because
__thaw_task() just checks if the task is frozen and then wakes it up,
while the frozen task, after waking up, will check if freezing()
is still true and continue to freeze itself if so. __thaw_task()
can't make freezing() return false since it doesn't change any
of these conditions, especially cgroup_freezing().

Fix this directly by checking whether the frozen process itself
has been killed by the OOM killer, so that the frozen process will
thaw itself and finally be killed.

Cc: David Rientjes 
Cc: Michal Hocko 
Cc: "Rafael J. Wysocki" 
Cc: Tejun Heo 
Cc: Andrew Morton 
Acked-by: Michal Hocko 
Signed-off-by: Cong Wang 
---
 kernel/freezer.c | 13 +++--
 mm/oom_kill.c|  2 --
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index aa6a8aa..5b25351 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -52,6 +52,16 @@ bool freezing_slow_path(struct task_struct *p)
 }
 EXPORT_SYMBOL(freezing_slow_path);
 
+static bool should_thaw_current(bool check_kthr_stop)
+{
+	if (!freezing(current) ||
+	    (check_kthr_stop && kthread_should_stop()) ||
+	    test_thread_flag(TIF_MEMDIE))
+		return true;
+	else
+		return false;
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
@@ -67,8 +77,7 @@ bool __refrigerator(bool check_kthr_stop)
 
 	spin_lock_irq(&freezer_lock);
 	current->flags |= PF_FROZEN;
-	if (!freezing(current) ||
-	    (check_kthr_stop && kthread_should_stop()))
+	if (should_thaw_current(check_kthr_stop))
 		current->flags &= ~PF_FROZEN;
 	spin_unlock_irq(&freezer_lock);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e11df8..112c278 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -266,8 +266,6 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 	 * Don't allow any other task to have access to the reserves.
 	 */
 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (unlikely(frozen(task)))
-			__thaw_task(task);
 		if (!force_kill)
 			return OOM_SCAN_ABORT;
 	}
-- 
1.8.3.1


Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

2014-08-14 Thread Andy Lutomirski

On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>>
>> So seeing how you're from @intel.com I'm assuming you're using x86 here.
>>
>> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> just fine, which means we'll fall out of the cpuidle_enter(), which
>> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>>
>> It will indeed not leave the cpu_idle_loop() function and go right back
>> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> should pick a new C state.
>>
>> So the interrupt _should_ work. If it doesn't you need to explain why.
> 
> I think the issue is related to the poll_idle state, in
> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state.
> It is a bit confusing because this state is not listed in the acpi /
> intel idle driver but inserted implicitly at the beginning of the idle
> table by the cpuidle framework when the driver is registered.
> 
> static int poll_idle(struct cpuidle_device *dev,
> 		struct cpuidle_driver *drv, int index)
> {
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched())
> 			cpu_relax();
> 	}
> 	current_clr_polling();
> 
> 	return index;
> }

As the most recent person to have modified this function, and as an
avowed hater of pointless IPIs, let me ask a rather different question:
why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
supports the polling idle interface :)

Can't you just do:

if (set_nr_if_polling(rq->idle)) {
	trace_sched_wake_idle_without_ipi(cpu);
} else {
	spin_lock_irqsave(&rq->lock, flags);
	if (rq->curr == rq->idle)
		smp_send_reschedule(cpu);
	// else the CPU wasn't idle; nothing to do
	raw_spin_unlock_irqrestore(&rq->lock, flags);
}

In the common case (wake from C0, i.e. polling idle), this will skip the
IPI entirely unless you race with idle entry/exit, saving a few more
precious electrons and all of the latency involved in poking the APIC
registers.
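
For anyone unfamiliar with the polling interface, the idea behind
set_nr_if_polling() can be modelled in userspace roughly like this. It is
a sketch using C11 atomics, not the kernel source: the flag values are
made up, and a bare atomic word stands in for the idle task's thread
flags.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Made-up flag bits; the kernel's TIF_* values differ per architecture. */
#define FLAG_NEED_RESCHED 0x1u
#define FLAG_POLLING      0x2u

/*
 * Returns true if the target was polling: NEED_RESCHED is then set and the
 * poll loop will notice it without an IPI. Returns false if the target was
 * not polling, in which case the caller has to send the IPI itself.
 */
static bool set_nr_if_polling(atomic_uint *flags)
{
	unsigned int val = atomic_load(flags);

	for (;;) {
		if (!(val & FLAG_POLLING))
			return false;
		if (val & FLAG_NEED_RESCHED)
			return true;
		/* On failure val is refreshed with the current word; retry. */
		if (atomic_compare_exchange_weak(flags, &val,
						 val | FLAG_NEED_RESCHED))
			return true;
	}
}
```

The compare-and-exchange is what closes the race with idle entry/exit: if
the target clears its polling bit in between, the next loop iteration
sees that and falls back to the IPI path.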

--Andy

P.S. "30mV" in the patch description is presumably a typo.

Re: [PATCH V2] hwmon, k10temp: Add support for F15h M60h

2014-08-14 Thread Borislav Petkov

On Thu, Aug 14, 2014 at 10:22:31PM +0200, Clemens Ladisch wrote:
> >> +  depends on X86 && PCI && AMD_NB
> >
> > Is the added dependency acceptable ?
> 
> Yes, it is automatically set from CPU_SUP_AMD.

Well, we can always move that function to k10temp but I'll venture a
guess that Aravind wants to use it somewhere else too? Correct, Aravind?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH] perf annotate: Don't truncate Intel style addresses

2014-08-14 Thread Alex Converse

Instructions like "mov r9,QWORD PTR [rdx+0x8]" were being truncated to
"mov r9,QWORD" by code that assumed operands cannot have spaces.

Signed-off-by: Alex Converse 
---
 tools/perf/util/annotate.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 809b4c5..cc6f72c 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -232,9 +232,16 @@ static int mov__parse(struct ins_operands *ops)
 		return -1;
 
 	target = ++s;
+	comment = strchr(s, '#');
 
-	while (s[0] != '\0' && !isspace(s[0]))
-		++s;
+	if (comment != NULL)
+		s = comment - 1;
+	else
+		s = strchr(s, '\0') - 1;
+
+	while (s > target && isspace(s[0]))
+		--s;
+	s++;
 	prev = *s;
 	*s = '\0';
 
@@ -244,7 +251,6 @@ static int mov__parse(struct ins_operands *ops)
 	if (ops->target.raw == NULL)
 		goto out_free_source;
 
-	comment = strchr(s, '#');
 	if (comment == NULL)
 		return 0;
 
 
-- 
2.1.0.rc2.206.gedb03e5
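
For illustration only, the trimming rule the patch switches to can be
written as a standalone helper. operand_len() is a hypothetical name and
this is not the perf code itself; it just shows "end at the comment or
end of string, then back up over trailing whitespace" in isolation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Length of the operand starting at target, excluding any trailing
 * '#' comment and the whitespace in front of it. Spaces inside the
 * operand (as in Intel syntax) are kept.
 */
static size_t operand_len(const char *target)
{
	const char *comment = strchr(target, '#');
	const char *end = comment ? comment : target + strlen(target);

	while (end > target && isspace((unsigned char)end[-1]))
		--end;
	return (size_t)(end - target);
}
```

Scanning backwards from the end, rather than forwards to the first
space, is what keeps "QWORD PTR [rdx+0x8]" in one piece.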


Re: [PATCH v2] power: twl4030-madc-battery: Convert to iio consumer.

2014-08-14 Thread Sebastian Reichel

Hi Marek,

On Mon, Aug 11, 2014 at 09:52:52PM +0200, Belisko Marek wrote:
> can you please take this series (I'll post update version with
> removing debug code). Thanks.

mh. I will not pull this with "(dis)charging-calibration-data" as
DT property name without an ACK from the DT binding maintainers.

I would feel fine with pulling this when they are prefixed with
"ti,". Otherwise the series looks good to me.

-- Sebastian
