Re: BUG_ON() in workingset_node_shadows_dec() triggers

2016-10-06 Thread Willy Tarreau
On Thu, Oct 06, 2016 at 04:59:20PM -0700, Linus Torvalds wrote:
> We should just switch BUG() over and be done with it. The whole point
> is that since it should never trigger in the first place, the
> semantics of BUG() should never matter.
> 
> And if you have some code that depends on the semantics of BUG(), that
> code is buggy crap *by*definition*.

I totally agree with this. If a developer writes BUG() somewhere, it
means he doesn't see how it would be possible to end up in that
situation. Thus we cannot hope that the BUG() call does anything right
to fix what the code author didn't expect to happen. It just means
"try to limit the risks, but I don't really know which ones".

Also, we won't make things worse. Where people currently get an oops,
they'll get one or more warnings instead. The side effects (lockups,
panics, etc.) will be more or less the same, but many of us already
don't want to continue after an oops, and despite this our systems
work fine, so I don't see why anyone would suffer from such a change.
On the contrary, some developers may get more details about issues
than they could in the past.

Willy


Re: [PATCH 0/2 v2] userns: show current values of user namespace counters

2016-10-06 Thread Andrei Vagin
Hello Eric,

What do you think about this series? It should be useful to know the
current usage of the user namespace counters.

Thanks,
Andrei

On Mon, Aug 15, 2016 at 01:10:20PM -0700, Andrei Vagin wrote:
> Recently Eric added user namespace counters. User namespace counters are
> a feature that allows limiting the number of various kernel objects a
> user can create. These limits are set via /proc/sys/user/ sysctls on a
> per-user-namespace basis and are applicable to all users in that
> namespace.
> 
> User namespace counters are not in the upstream tree yet,
> you can find them in Eric's tree:
> https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git/log/?h=for-testing
> 
> This patch adds /proc//userns_counts files to provide current usage
> of user namespace counters.
> 
>   > cat /proc/813/userns_counts
>   user_namespaces  101000   1
>   pid_namespaces   101000   1
>   ipc_namespaces   101000   4
>   net_namespaces   101000   2
>   mnt_namespaces   101000   5
>   mnt_namespaces   10   1
> 
> The meanings of the columns are as follows, from left to right:
> 
>   Name Object name
>   UID  User ID
>   Usage  Current usage
> 
> The full documentation is in the second patch.
> 
> v2: - describe this file in Documentation/filesystems/proc.txt
> - move and rename into /proc//userns_counts
> 
> Cc: Serge Hallyn 
> Cc: Kees Cook 
> Cc: "Eric W. Biederman" 
> Signed-off-by: Andrei Vagin 
> 
> Andrei Vagin (1):
>   kernel: show current values of user namespace counters
> 
> Kirill Kolyshkin (1):
>   Documentation: describe /proc//userns_counts
> 
>  Documentation/filesystems/proc.txt |  30 +++
>  fs/proc/array.c|  55 
>  fs/proc/base.c |   1 +
>  fs/proc/internal.h |   1 +
>  include/linux/user_namespace.h |   8 +++
>  kernel/ucount.c| 102 
> +
>  6 files changed, 197 insertions(+)
> 
> -- 
> 2.5.5


Re: [PATCH 30/54] md/raid5: Delete two error messages for a failed memory allocation

2016-10-06 Thread Hannes Reinecke

On 10/06/2016 11:30 AM, SF Markus Elfring wrote:

From: Markus Elfring 
Date: Wed, 5 Oct 2016 09:43:40 +0200

Omit extra messages for a memory allocation failure in this function.

Link: 
http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf

Signed-off-by: Markus Elfring 
---
 drivers/md/raid5.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d864871..ef180c0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6613,12 +6613,9 @@ static struct r5conf *setup_conf(struct mddev *mddev)
memory = conf->min_nr_stripes * (sizeof(struct stripe_head) +
 max_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
atomic_set(&conf->empty_inactive_list_nr, NR_STRIPE_HASH_LOCKS);
-   if (grow_stripes(conf, conf->min_nr_stripes)) {
-   printk(KERN_ERR
-  "md/raid:%s: couldn't allocate %dkB for buffers\n",
-  mdname(mddev), memory);
+   if (grow_stripes(conf, conf->min_nr_stripes))
goto free_conf;
-   } else
+   else
printk(KERN_INFO "md/raid:%s: allocated %dkB\n",
   mdname(mddev), memory);
/*
@@ -6640,12 +6637,8 @@ static struct r5conf *setup_conf(struct mddev *mddev)

sprintf(pers_name, "raid%d", mddev->new_level);
conf->thread = md_register_thread(raid5d, mddev, pers_name);
-   if (!conf->thread) {
-   printk(KERN_ERR
-  "md/raid:%s: couldn't allocate thread.\n",
-  mdname(mddev));
+   if (!conf->thread)
goto free_conf;
-   }

return conf;
 free_conf:

Actually I prefer having error messages, especially if you have several 
possible failures all leading to the same return value.

Without it debugging becomes really hard.

Cheers,

Hannes
--
Dr. Hannes Reinecke    Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: scripts/coccicheck: Update for a comment?

2016-10-06 Thread Julia Lawall


On Fri, 7 Oct 2016, SF Markus Elfring wrote:

> Hello,
>
> A commit like "docs: sphinxify coccinelle.txt and add it to dev-tools"
> also caught my software development attention.
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/Documentation/coccinelle.txt?id=4b9033a33494ec9154d63e706e9e47f7eb3fd59e
>
> Has a comment in the script "coccicheck" become outdated because of
> these changes to the documentation format?
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/scripts/coccicheck?id=c802e87fbe2d4dd58982d01b3c39bc5a781223aa#n4

How about submitting a patch to fix the problem?

julia


scripts/coccicheck: Update for a comment?

2016-10-06 Thread SF Markus Elfring
Hello,

A commit like "docs: sphinxify coccinelle.txt and add it to dev-tools"
also caught my software development attention.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/Documentation/coccinelle.txt?id=4b9033a33494ec9154d63e706e9e47f7eb3fd59e

Has a comment in the script "coccicheck" become outdated because of
these changes to the documentation format?
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/scripts/coccicheck?id=c802e87fbe2d4dd58982d01b3c39bc5a781223aa#n4

Regards,
Markus


[PATCH 2/4] mm: prevent double decrease of nr_reserved_highatomic

2016-10-06 Thread Minchan Kim
There is a race between page freeing and unreserving highatomic pageblocks.

 CPU 0  CPU 1

free_hot_cold_page
  mt = get_pfnblock_migratetype
  set_pcppage_migratetype(page, mt)
unreserve_highatomic_pageblock
spin_lock_irqsave(&zone->lock)
move_freepages_block
set_pageblock_migratetype(page)
spin_unlock_irqrestore(&zone->lock)
  free_pcppages_bulk
__free_one_page(mt) <- mt is stale

Because of the above race, a page on CPU 0 could go onto a
non-highorderatomic free list since the pageblock's type was changed.
As a result, the unreserve logic for highorderatomic pageblocks can
decrease the reserved count for the same pageblock several times,
creating a mismatch between nr_reserved_highatomic and the number of
reserved pageblocks.

So, this patch verifies whether the pageblock is highatomic or not
and decreases the count only if the pageblock is highatomic.

Signed-off-by: Minchan Kim 
---
 mm/page_alloc.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e7cbb3cc22fa..d110cd640264 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2133,13 +2133,25 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
continue;
 
/*
-* It should never happen but changes to locking could
-* inadvertently allow a per-cpu drain to add pages
-* to MIGRATE_HIGHATOMIC while unreserving so be safe
-* and watch for underflows.
+* In page freeing path, migratetype change is racy so
+* we can encounter several free pages in a pageblock
+* in this loop although we changed the pageblock type
+* from highatomic to ac->migratetype. So we should
+* adjust the count once.
 */
-   zone->nr_reserved_highatomic -= min(pageblock_nr_pages,
-   zone->nr_reserved_highatomic);
+   if (get_pageblock_migratetype(page) ==
+   MIGRATE_HIGHATOMIC) {
+   /*
+* It should never happen but changes to
+* locking could inadvertently allow a per-cpu
+* drain to add pages to MIGRATE_HIGHATOMIC
+* while unreserving so be safe and watch for
+* underflows.
+*/
+   zone->nr_reserved_highatomic -= min(
+   pageblock_nr_pages,
+   zone->nr_reserved_highatomic);
+   }
 
/*
 * Convert to ac->migratetype and avoid the normal
-- 
2.7.4



[PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-06 Thread Minchan Kim
After fixing the race in the highatomic page count, I still encounter
OOM with a lot of free memory reserved as highatomic.

One reason, in my testing, was that we unreserve free pages only if
reclaim has made progress; otherwise, we never get a chance to
unreserve.

Another problem, after fixing that, was that it doesn't guarantee
unreserving every page of the highatomic pageblocks, because it just
releases *a* pageblock, which could hold only a few free pages; another
context could then steal it easily, so a process stuck in direct
reclaim can finally hit OOM although there are free pages which could
be unreserved.

This patch changes the logic so that it unreserves pageblocks
proportionally to no_progress_loops. IOW, on the first reclaim retry it
will try to unreserve one pageblock, on the Nth retry
N/MAX_RECLAIM_RETRIES of the reserved pageblocks, and finally all
reserved pageblocks before the OOM.

Signed-off-by: Minchan Kim 
---
 mm/page_alloc.c | 57 -
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d110cd640264..eeb047bb0e9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -71,6 +71,12 @@
 #include 
 #include "internal.h"
 
+/*
+ * Maximum number of reclaim retries without any progress before OOM killer
+ * is considered the only way to move forward.
+ */
+#define MAX_RECLAIM_RETRIES 16
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION   (8)
@@ -2107,7 +2113,8 @@ static void reserve_highatomic_pageblock(struct page 
*page, struct zone *zone,
  * intense memory pressure but failed atomic allocations should be easier
  * to recover from than an OOM.
  */
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static int unreserve_highatomic_pageblock(const struct alloc_context *ac,
+   int no_progress_loops)
 {
struct zonelist *zonelist = ac->zonelist;
unsigned long flags;
@@ -2115,15 +2122,40 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
struct zone *zone;
struct page *page;
int order;
+   int unreserved_pages = 0;
 
for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
ac->nodemask) {
-   /* Preserve at least one pageblock */
-   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+   unsigned long unreserve_pages_max;
+
+   /*
+* Try to preserve at least one pageblock but use up before
+* OOM kill.
+*/
+   if (no_progress_loops < MAX_RECLAIM_RETRIES &&
+   zone->nr_reserved_highatomic <= pageblock_nr_pages)
continue;
 
spin_lock_irqsave(&zone->lock, flags);
-   for (order = 0; order < MAX_ORDER; order++) {
+   if (no_progress_loops < MAX_RECLAIM_RETRIES) {
+   unreserve_pages_max = no_progress_loops *
+   zone->nr_reserved_highatomic /
+   MAX_RECLAIM_RETRIES;
+   unreserve_pages_max = max(unreserve_pages_max,
+   pageblock_nr_pages);
+   } else {
+   /*
+* By racing with the page free functions, !highatomic
+* pageblocks can have free pages on the highatomic
+* migratetype free list. So if we are about to
+* kill some process, unreserve every free page
+* in highorderatomic.
+*/
+   unreserve_pages_max = -1UL;
+   }
+
+   for (order = 0; order < MAX_ORDER &&
+   unreserve_pages_max > 0; order++) {
struct free_area *area = &(zone->free_area[order]);
 
page = list_first_entry_or_null(
@@ -2151,6 +2183,9 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
zone->nr_reserved_highatomic -= min(
pageblock_nr_pages,
zone->nr_reserved_highatomic);
+   unreserve_pages_max -= min(pageblock_nr_pages,
+   zone->nr_reserved_highatomic);
+   unreserved_pages += 1 << page_order(page);
}
 
/*
@@ -2164,11 +2199,11 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
 */
   

[PATCH 1/4] mm: adjust reserved highatomic count

2016-10-06 Thread Minchan Kim
In the page freeing path, migratetype is racy, so a highorderatomic
page could be freed into a non-highorderatomic free list. If that page
is then allocated, the VM can change the pageblock from highorderatomic
to something else. In that case, we should adjust
nr_reserved_highatomic; otherwise, the VM cannot reserve
highorderatomic pageblocks any more although it hasn't reached the 1%
limit, which means high-order atomic allocation failures would become
more frequent.

So, this patch decreases the reserved count as well as changing the
migratetype if the pageblock was MIGRATE_HIGHATOMIC.

Signed-off-by: Minchan Kim 
---
 mm/page_alloc.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 55ad0229ebf3..e7cbb3cc22fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -282,6 +282,9 @@ EXPORT_SYMBOL(nr_node_ids);
 EXPORT_SYMBOL(nr_online_nodes);
 #endif
 
+static void dec_highatomic_pageblock(struct zone *zone, struct page *page,
+   int migratetype);
+
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -1935,7 +1938,14 @@ static void change_pageblock_range(struct page 
*pageblock_page,
int nr_pageblocks = 1 << (start_order - pageblock_order);
 
while (nr_pageblocks--) {
-   set_pageblock_migratetype(pageblock_page, migratetype);
+   if (get_pageblock_migratetype(pageblock_page) !=
+   MIGRATE_HIGHATOMIC)
+   set_pageblock_migratetype(pageblock_page,
+   migratetype);
+   else
+   dec_highatomic_pageblock(page_zone(pageblock_page),
+   pageblock_page,
+   migratetype);
pageblock_page += pageblock_nr_pages;
}
 }
@@ -1996,8 +2006,14 @@ static void steal_suitable_fallback(struct zone *zone, 
struct page *page,
 
/* Claim the whole block if over half of it is free */
if (pages >= (1 << (pageblock_order-1)) ||
-   page_group_by_mobility_disabled)
-   set_pageblock_migratetype(page, start_type);
+   page_group_by_mobility_disabled) {
+   int mt = get_pageblock_migratetype(page);
+
+   if (mt != MIGRATE_HIGHATOMIC)
+   set_pageblock_migratetype(page, start_type);
+   else
+   dec_highatomic_pageblock(zone, page, start_type);
+   }
 }
 
 /*
@@ -2037,6 +2053,17 @@ int find_suitable_fallback(struct free_area *area, 
unsigned int order,
return -1;
 }
 
+static void dec_highatomic_pageblock(struct zone *zone, struct page *page,
+   int migratetype)
+{
+   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+   return;
+
+   zone->nr_reserved_highatomic -= min(pageblock_nr_pages,
+   zone->nr_reserved_highatomic);
+   set_pageblock_migratetype(page, migratetype);
+}
+
 /*
  * Reserve a pageblock for exclusive use of high-order atomic allocations if
  * there are no empty page blocks that contain a page with a suitable order
@@ -2555,9 +2582,14 @@ int __isolate_free_page(struct page *page, unsigned int 
order)
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
int mt = get_pageblock_migratetype(page);
-   if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
-   set_pageblock_migratetype(page,
- MIGRATE_MOVABLE);
+   if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+   if (mt != MIGRATE_HIGHATOMIC)
+   set_pageblock_migratetype(page,
+   MIGRATE_MOVABLE);
+   else
+   dec_highatomic_pageblock(zone, page,
+   MIGRATE_MOVABLE);
+   }
}
}
 
-- 
2.7.4



[PATCH 3/4] mm: unreserve highatomic free pages fully before OOM

2016-10-06 Thread Minchan Kim
After fixing the race of highatomic page count, I still encounter
OOM with many free memory reserved as highatomic.

One of reason in my testing was we unreserve free pages only if
reclaim has progress. Otherwise, we cannot have chance to unreseve.

Other problem after fixing it was it doesn't guarantee every pages
unreserving of highatomic pageblock because it just release *a*
pageblock which could have few free pages so other context could
steal it easily so that the process stucked with direct reclaim
finally can encounter OOM although there are free pages which can
be unreserved.

This patch changes the logic so that it unreserves pageblocks with
no_progress_loop proportionally. IOW, in first retrial of reclaim,
it will try to unreserve a pageblock. In second retrial of reclaim,
it will try to unreserve 1/MAX_RECLAIM_RETRIES * reserved_pageblock
and finally all reserved pageblock before the OOM.

Signed-off-by: Minchan Kim 
---
 mm/page_alloc.c | 57 -
 1 file changed, 44 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d110cd640264..eeb047bb0e9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -71,6 +71,12 @@
 #include 
 #include "internal.h"
 
+/*
+ * Maximum number of reclaim retries without any progress before OOM killer
+ * is consider as the only way to move forward.
+ */
+#define MAX_RECLAIM_RETRIES 16
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION   (8)
@@ -2107,7 +2113,8 @@ static void reserve_highatomic_pageblock(struct page 
*page, struct zone *zone,
  * intense memory pressure but failed atomic allocations should be easier
  * to recover from than an OOM.
  */
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static int unreserve_highatomic_pageblock(const struct alloc_context *ac,
+   int no_progress_loops)
 {
struct zonelist *zonelist = ac->zonelist;
unsigned long flags;
@@ -2115,15 +2122,40 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
struct zone *zone;
struct page *page;
int order;
+   int unreserved_pages = 0;
 
for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
ac->nodemask) {
-   /* Preserve at least one pageblock */
-   if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+   unsigned long unreserve_pages_max;
+
+   /*
+* Try to preserve at least one pageblock but use up before
+* OOM kill.
+*/
+   if (no_progress_loops < MAX_RECLAIM_RETRIES &&
+   zone->nr_reserved_highatomic <= pageblock_nr_pages)
continue;
 
spin_lock_irqsave(>lock, flags);
-   for (order = 0; order < MAX_ORDER; order++) {
+   if (no_progress_loops < MAX_RECLAIM_RETRIES) {
+   unreserve_pages_max = no_progress_loops *
+   zone->nr_reserved_highatomic /
+   MAX_RECLAIM_RETRIES;
+   unreserve_pages_max = max(unreserve_pages_max,
+   pageblock_nr_pages);
+   } else {
+   /*
+* By race with page free functions, !highatomic
+* pageblocks can have a free page in highatomic
+* migratetype free list. So if we are about to
+* kill some process, unreserve every free pages
+* in highorderatomic.
+*/
+   unreserve_pages_max = -1UL;
+   }
+
+   for (order = 0; order < MAX_ORDER &&
+   unreserve_pages_max > 0; order++) {
struct free_area *area = &(zone->free_area[order]);
 
page = list_first_entry_or_null(
@@ -2151,6 +2183,9 @@ static void unreserve_highatomic_pageblock(const struct 
alloc_context *ac)
zone->nr_reserved_highatomic -= min(
pageblock_nr_pages,
zone->nr_reserved_highatomic);
+   unreserve_pages_max -= min(pageblock_nr_pages,
+   zone->nr_reserved_highatomic);
+   unreserved_pages += 1 << page_order(page);
}
 
/*
@@ -2164,11 +2199,11 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 */
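The cap computed in the hunk above — proportional to no_progress_loops, clamped to at least one pageblock, and unlimited once the retry budget is exhausted — can be sketched in isolation. Constants are assumptions for the sketch: MAX_RECLAIM_RETRIES is 16 in current kernels, and a pageblock is taken as 512 pages (2MB with 4kB pages).

```c
#include <assert.h>

/* Assumed constants mirroring the kernel's (sketch only). */
#define MAX_RECLAIM_RETRIES 16
#define PAGEBLOCK_NR_PAGES 512UL

/* How many highatomic pages the patch allows to be unreserved per zone:
 * proportional to how long reclaim has made no progress, never less than
 * one pageblock, and unlimited when we are about to OOM. */
static unsigned long unreserve_cap(int no_progress_loops,
				   unsigned long nr_reserved_highatomic)
{
	unsigned long max;

	if (no_progress_loops >= MAX_RECLAIM_RETRIES)
		return (unsigned long)-1;	/* release everything */

	max = (unsigned long)no_progress_loops * nr_reserved_highatomic /
	      MAX_RECLAIM_RETRIES;
	return max > PAGEBLOCK_NR_PAGES ? max : PAGEBLOCK_NR_PAGES;
}
```

With 4096 reserved pages, half the retry budget spent unlocks half the reserve; early on, the one-pageblock floor dominates.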

[PATCH 4/4] mm: skip to reserve pageblock crossed zone boundary for HIGHATOMIC

2016-10-06 Thread Minchan Kim
With CONFIG_SPARSEMEM, the VM shares the pageblock_flags of a mem_section
between two zones if a pageblock crosses a zone boundary. This means a
single zone lock cannot protect against races on pageblock migratetype
changes.

That used to be tolerable because migratetype was inherently racy, but
with the introduction of CMA it no longer was, and it has been fixed for
CMA. (I hope it will eventually be solved with a more general approach,
however.) Now it's time to do the same for MIGRATE_HIGHATOMIC.

More importantly, the HIGHATOMIC reserve is small (i.e., 1% of the
system), so let's skip such crippled pageblocks when reserving and keep
the full 1% of free memory usable.

Debugged-by: Joonsoo Kim 
Signed-off-by: Minchan Kim 
---
 mm/page_alloc.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eeb047bb0e9d..d76bb50baf61 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2098,6 +2098,24 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
mt = get_pageblock_migratetype(page);
if (mt != MIGRATE_HIGHATOMIC &&
!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+   /*
+* If the pageblock crosses a zone boundary, we would need
+* both zone locks. The highatomic reserve is small, so
+* rather than adding that complexity, only reserve
+* pageblocks that lie entirely within this zone.
+*/
+   unsigned long start_pfn, end_pfn;
+
+   start_pfn = page_to_pfn(page);
+   start_pfn = start_pfn & ~(pageblock_nr_pages - 1);
+
+   if (!zone_spans_pfn(zone, start_pfn))
+   goto out_unlock;
+
+   end_pfn = start_pfn + pageblock_nr_pages - 1;
+   if (!zone_spans_pfn(zone, end_pfn))
+   goto out_unlock;
+
zone->nr_reserved_highatomic += pageblock_nr_pages;
set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
-- 
2.7.4
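The boundary check added above can be modeled in a few lines: align the pfn down to its pageblock, then require both the first and the last pfn of that pageblock to fall inside the zone. This is a toy sketch, not the kernel code; pageblock_nr_pages is assumed to be 512 and the zone is taken as the half-open range [zone_start, zone_end).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pageblock size for the sketch (2MB with 4kB pages). */
#define PAGEBLOCK_NR_PAGES 512UL

/* Return true only if the whole pageblock containing pfn lies in the
 * zone [zone_start, zone_end) — the condition the patch requires before
 * reserving a pageblock as HIGHATOMIC. */
static bool pageblock_within_zone(unsigned long zone_start,
				  unsigned long zone_end,
				  unsigned long pfn)
{
	unsigned long start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES - 1;

	return start_pfn >= zone_start && start_pfn < zone_end &&
	       end_pfn >= zone_start && end_pfn < zone_end;
}
```

A pageblock straddling either end of the zone fails the check and is skipped instead of requiring both zones' locks.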



[PATCH 0/4] use up highorder free pages before OOM

2016-10-06 Thread Minchan Kim
I got an OOM report from our production team on a v4.4 kernel.
The system has enough free memory but fails to allocate an order-0
page and finally hits an OOM kill.
I could reproduce it easily with my test. Look below.
The reason is that the free pages (19M) of the DMA32 zone are reserved
for HIGHORDERATOMIC and are not unreserved before the OOM.

balloon invoked oom-killer: 
gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
balloon cpuset=/ mems_allowed=0
CPU: 1 PID: 8473 Comm: balloon Tainted: GW  OE   
4.8.0-rc7-00219-g3f74c9559583-dirty #3161
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
  88007f15bbc8 8138eb13 88007f15bd88
 88005a72a4c0 88007f15bc28 811d2d13 88007f15bc08
 8146a5ca 81c8df60 0015 0206
Call Trace:
 [] dump_stack+0x63/0x90
 [] dump_header+0x5c/0x1ce
 [] ? virtballoon_oom_notify+0x2a/0x80
 [] oom_kill_process+0x22e/0x400
 [] out_of_memory+0x1ac/0x210
 [] __alloc_pages_nodemask+0x101e/0x1040
 [] handle_mm_fault+0xa0a/0xbf0
 [] __do_page_fault+0x1dd/0x4d0
 [] trace_do_page_fault+0x43/0x130
 [] do_async_page_fault+0x1a/0xa0
 [] async_page_fault+0x28/0x30
Mem-Info:
active_anon:383949 inactive_anon:106724 isolated_anon:0
 active_file:15 inactive_file:44 isolated_file:0
 unevictable:0 dirty:0 writeback:24 unstable:0
 slab_reclaimable:2483 slab_unreclaimable:3326
 mapped:0 shmem:0 pagetables:1906 bounce:0
 free:6898 free_pcp:291 free_cma:0
Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB 
inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB 
pages_scanned:1418 all_unreclaimable? no
DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB 
inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB 
writepending:0kB present:15992kB managed:15908kB mlocked:0kB 
slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 1952 1952 1952
DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB 
inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB 
writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB 
slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB 
pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 
2*4096kB (H) = 8192kB
DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 
2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
51131 total pagecache pages
50795 pages in swap cache
Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
Free swap  = 8kB
Total swap = 255996kB
524158 pages RAM
0 pages HighMem/MovableOnly
12658 pages reserved
0 pages cma reserved
0 pages hwpoisoned
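As a sanity check on the dump above, the per-order counts in the DMA32 buddy line do reproduce its 19484kB total. A minimal sketch of the buddy-list arithmetic (4kB base pages assumed):

```c
#include <assert.h>

/* Total free memory in kB from a buddy free-list dump: sum over orders of
 * count * (4kB << order), exactly how the "DMA32: 7*4kB ..." line is built. */
static unsigned long buddy_total_kb(const unsigned int counts[], int nr_orders)
{
	unsigned long total = 0;
	int order;

	for (order = 0; order < nr_orders; order++)
		total += counts[order] * (4UL << order);
	return total;
}
```

Feeding in the DMA32 counts (7, 8, 30, 31, 14, 9, 2, 2, 4, 5, 0 for orders 0..10) gives 19484kB — all of it sitting on (H) highatomic lists while the order-0 allocation fails.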

During the investigation, I found some problems with highatomic, so
this patchset aims to solve them; the final goal is to unreserve every
highatomic free page before the OOM kill.

Patch 1 fixes accounting bugs in several places of the page allocator.
Patch 2 fixes an accounting bug caused by a subtle race between the
freeing functions and unreserve_highatomic_pageblock.
Patch 3 changes the unreserve scheme to use up every reserved page.
Patch 4 fixes an accounting bug caused by a mem_section shared by two
zones.

Minchan Kim (4):
  mm: adjust reserved highatomic count
  mm: prevent double decrease of nr_reserved_highatomic
  mm: unreserve highatomic free pages fully before OOM
  mm: skip to reserve pageblock crossed zone boundary for HIGHATOMIC

 mm/page_alloc.c | 143 ++--
 1 file changed, 118 insertions(+), 25 deletions(-)

-- 
2.7.4




[PATCH] hwmon: fix platform_no_drv_owner.cocci warnings

2016-10-06 Thread Julia Lawall
No need to set .owner here. The core will do it.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: Julia Lawall 
Signed-off-by: Fengguang Wu 
---

tree:   https://github.com/0day-ci/linux
Chris-Packham/hwmon-Add-tc654-driver/20161007-054116
head:   7b9f81e69fbc7077c55136daefe7546cf88925ae
commit: 7b9f81e69fbc7077c55136daefe7546cf88925ae [1/1] hwmon: Add tc654
driver

 tc654.c |1 -
 1 file changed, 1 deletion(-)

--- a/drivers/hwmon/tc654.c
+++ b/drivers/hwmon/tc654.c
@@ -517,7 +517,6 @@ MODULE_DEVICE_TABLE(i2c, tc654_id);
 static struct i2c_driver tc654_driver = {
.driver = {
   .name = "tc654",
-  .owner = THIS_MODULE,
   .of_match_table = of_match_ptr(tc654_dt_match),
   },
.probe = tc654_probe,



Re: [GIT PULL] MD update for 4.9

2016-10-06 Thread Doug Dumitru
Mr. Li,

There is another thread in [linux-raid] discussing pre-fetches in the
raid-6 AVX2 code.  My testing implies that the prefetch distance is
too short.  In your new AVX512 code, it looks like there are 24
instructions, each with latencies of 1, between the prefetch and the
actual memory load.  I don't have an AVX512 CPU to try this on, but the
prefetch might do better at a bigger distance.  If I am not mistaken,
it takes a lot longer than 24 clocks to fetch 4 cache lines.

Just a comment while the code is still fluid.

Doug Dumitru
EasyCo LLC
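The prefetch-distance idea above can be illustrated with a small userspace sketch: while consuming the current bytes, prefetch data a fixed distance ahead so it is already in cache when the loop catches up. The 256-byte distance here is a made-up tuning knob, not the kernel's value, and the byte-wise XOR stands in for the real syndrome computation; prefetching past the end of the buffer is harmless since prefetch instructions do not fault.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative prefetch distance (bytes ahead of the current position). */
#define PREFETCH_DISTANCE 256

/* XOR all bytes together, issuing one prefetch per 64-byte cache line,
 * PREFETCH_DISTANCE bytes ahead of the data being consumed. */
static unsigned char xor_sum(const unsigned char *src, size_t len)
{
	unsigned char acc = 0;

	for (size_t i = 0; i < len; i++) {
		if ((i & 63) == 0)	/* once per cache line */
			__builtin_prefetch(src + i + PREFETCH_DISTANCE, 0, 0);
		acc ^= src[i];
	}
	return acc;
}
```

Tuning means growing PREFETCH_DISTANCE until the prefetched line reliably arrives before the loop reaches it — Doug's point is that ~24 single-cycle instructions of distance is far less than a memory-fetch latency.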

On Thu, Oct 6, 2016 at 5:38 PM, Shaohua Li  wrote:
> Hi Linus,
> Please pull MD update for 4.9. This update includes:
> - new AVX512 instruction based raid6 gen/recovery algorithm
> - A couple of md-cluster related bug fixes
> - Fix a potential deadlock
> - Set nonrotational bit for raid array with SSD
> - Set correct max_hw_sectors for raid5/6, which hopefully can improve
>   performance a little bit
> - Other minor fixes
>
> Thanks,
> Shaohua
>
> The following changes since commit 7d1e042314619115153a0f6f06e4552c09a50e13:
>
>   Merge tag 'usercopy-v4.8-rc8' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux (2016-09-20 17:11:19 
> -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.9-rc1
>
> for you to fetch changes up to bb086a89a406b5d877ee616f1490fcc81f8e1b2b:
>
>   md: set rotational bit (2016-10-03 10:20:27 -0700)
>
> 
> Chao Yu (1):
>   raid5: fix to detect failure of register_shrinker
>
> Gayatri Kammela (5):
>   lib/raid6: Add AVX512 optimized gen_syndrome functions
>   lib/raid6: Add AVX512 optimized recovery functions
>   lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
>   lib/raid6: Add AVX512 optimized xor_syndrome functions
>   raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to 
> the char arrays
>
> Guoqing Jiang (9):
>   md-cluster: call md_kick_rdev_from_array once ack failed
>   md-cluster: use FORCEUNLOCK in lockres_free
>   md-cluster: remove some unnecessary dlm_unlock_sync
>   md: changes for MD_STILL_CLOSED flag
>   md-cluster: clean related infos of cluster
>   md-cluster: protect md_find_rdev_nr_rcu with rcu lock
>   md-cluster: convert the completion to wait queue
>   md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
>   md-cluster: make resync lock also could be interruptted
>
> Shaohua Li (5):
>   raid5: allow arbitrary max_hw_sectors
>   md/bitmap: fix wrong cleanup
>   md: fix a potential deadlock
>   raid5: handle register_shrinker failure
>   md: set rotational bit
>
>  arch/x86/Makefile|   5 +-
>  drivers/md/bitmap.c  |   4 +-
>  drivers/md/md-cluster.c  |  99 ++---
>  drivers/md/md.c  |  44 +++-
>  drivers/md/md.h  |   5 +-
>  drivers/md/raid5.c   |  11 +-
>  include/linux/raid/pq.h  |   4 +
>  lib/raid6/Makefile   |   2 +-
>  lib/raid6/algos.c|  12 +
>  lib/raid6/avx512.c   | 569 
> +++
>  lib/raid6/recov_avx512.c | 388 
>  lib/raid6/test/Makefile  |   5 +-
>  lib/raid6/test/test.c|   7 +-
>  lib/raid6/x86.h  |  10 +
>  14 files changed,  insertions(+), 54 deletions(-)
>  create mode 100644 lib/raid6/avx512.c
>  create mode 100644 lib/raid6/recov_avx512.c
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Doug Dumitru
EasyCo LLC



[PATCH V5 05/10] dmaengine: qcom_hidma: make pending_tre_count atomic

2016-10-06 Thread Sinan Kaya
Getting ready for the MSI interrupts. The pending_tre_count is used
in the interrupt handler to make sure all outstanding requests are
serviced.

The driver will allocate 11 MSI interrupts. Each MSI interrupt can be
assigned to a different CPU, so the handlers can run in parallel: they
share the same handler code, each invoked with a different cause bit,
which creates a race condition on common variables. Make this variable
atomic so that it can be updated safely from multiple processor
contexts.
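The decrement-and-clamp done in hidma_post_completed() can be sketched with C11 atomics (names and the userspace setting are illustrative, not the driver's): several handlers may decrement the pending counter concurrently, and if it ever goes negative a completion was counted twice, so it is clamped back to zero and the mismatch reported.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for lldev->pending_tre_count. */
static atomic_int pending_tre_count;

/* Submission side: count one more outstanding TRE; return the new count. */
static int submit_one(void)
{
	return atomic_fetch_add(&pending_tre_count, 1) + 1;
}

/* Completion side: decrement; on underflow, clamp to 0 like the driver's
 * "tre count mismatch on completion" path, and return -1. */
static int complete_one(void)
{
	int remaining = atomic_fetch_sub(&pending_tre_count, 1) - 1;

	if (remaining < 0) {
		atomic_store(&pending_tre_count, 0);
		return -1;
	}
	return remaining;
}
```

atomic_fetch_sub returns the previous value, so the post-decrement result is computed locally without a second read that another CPU could race against.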

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma.h |  2 +-
 drivers/dma/qcom/hidma_dbg.c |  3 ++-
 drivers/dma/qcom/hidma_ll.c  | 13 ++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/qcom/hidma.h b/drivers/dma/qcom/hidma.h
index b209942..afaeb9a 100644
--- a/drivers/dma/qcom/hidma.h
+++ b/drivers/dma/qcom/hidma.h
@@ -58,7 +58,7 @@ struct hidma_lldev {
void __iomem *evca; /* Event Channel address  */
struct hidma_tre
**pending_tre_list; /* Pointers to pending TREs   */
-   s32 pending_tre_count;  /* Number of TREs pending */
+   atomic_t pending_tre_count; /* Number of TREs pending */
 
void *tre_ring; /* TRE ring   */
dma_addr_t tre_dma; /* TRE ring to be shared with HW  */
diff --git a/drivers/dma/qcom/hidma_dbg.c b/drivers/dma/qcom/hidma_dbg.c
index 3d83b99..3bdcb80 100644
--- a/drivers/dma/qcom/hidma_dbg.c
+++ b/drivers/dma/qcom/hidma_dbg.c
@@ -74,7 +74,8 @@ static void hidma_ll_devstats(struct seq_file *s, void *llhndl)
seq_printf(s, "tre_ring_handle=%pap\n", &lldev->tre_dma);
seq_printf(s, "tre_ring_size = 0x%x\n", lldev->tre_ring_size);
seq_printf(s, "tre_processed_off = 0x%x\n", lldev->tre_processed_off);
-   seq_printf(s, "pending_tre_count=%d\n", lldev->pending_tre_count);
+   seq_printf(s, "pending_tre_count=%d\n",
+   atomic_read(&lldev->pending_tre_count));
seq_printf(s, "evca=%p\n", lldev->evca);
seq_printf(s, "evre_ring=%p\n", lldev->evre_ring);
seq_printf(s, "evre_ring_handle=%pap\n", &lldev->evre_dma);
diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index ad20dfb..a4fc941 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -218,10 +218,9 @@ static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
 * Keep track of pending TREs that SW is expecting to receive
 * from HW. We got one now. Decrement our counter.
 */
-   lldev->pending_tre_count--;
-   if (lldev->pending_tre_count < 0) {
+   if (atomic_dec_return(&lldev->pending_tre_count) < 0) {
dev_warn(lldev->dev, "tre count mismatch on completion");
-   lldev->pending_tre_count = 0;
+   atomic_set(&lldev->pending_tre_count, 0);
}
 
spin_unlock_irqrestore(&lldev->lock, flags);
@@ -321,7 +320,7 @@ void hidma_cleanup_pending_tre(struct hidma_lldev *lldev, u8 err_info,
u32 tre_read_off;
 
tre_iterator = lldev->tre_processed_off;
-   while (lldev->pending_tre_count) {
+   while (atomic_read(&lldev->pending_tre_count)) {
if (hidma_post_completed(lldev, tre_iterator, err_info,
 err_code))
break;
@@ -564,7 +563,7 @@ void hidma_ll_queue_request(struct hidma_lldev *lldev, u32 tre_ch)
tre->err_code = 0;
tre->err_info = 0;
tre->queued = 1;
-   lldev->pending_tre_count++;
+   atomic_inc(&lldev->pending_tre_count);
lldev->tre_write_offset = (lldev->tre_write_offset + HIDMA_TRE_SIZE)
% lldev->tre_ring_size;
spin_unlock_irqrestore(&lldev->lock, flags);
@@ -670,7 +669,7 @@ int hidma_ll_setup(struct hidma_lldev *lldev)
u32 val;
u32 nr_tres = lldev->nr_tres;
 
-   lldev->pending_tre_count = 0;
+   atomic_set(&lldev->pending_tre_count, 0);
lldev->tre_processed_off = 0;
lldev->evre_processed_off = 0;
lldev->tre_write_offset = 0;
@@ -834,7 +833,7 @@ int hidma_ll_uninit(struct hidma_lldev *lldev)
tasklet_kill(>rst_task);
memset(lldev->trepool, 0, required_bytes);
lldev->trepool = NULL;
-   lldev->pending_tre_count = 0;
+   atomic_set(&lldev->pending_tre_count, 0);
lldev->tre_write_offset = 0;
 
rc = hidma_ll_reset(lldev);
-- 
1.9.1



[PATCH V5 08/10] dmaengine: qcom_hidma: protect common data structures

2016-10-06 Thread Sinan Kaya
When MSI interrupts are supported, the error and transfer interrupts can
come from multiple processor contexts.

Each error interrupt is an MSI interrupt. If the channel is disabled by
the first error interrupt, the remaining error interrupts will gracefully
return in the interrupt handler.

If an error is observed while servicing the completions in the success
case, the posting of completions is aborted as soon as the
channel-disabled state is observed. The error interrupt handler will
take it from there and finish the remaining completions. We don't want
multiple success and error messages to be delivered to the client in
mixed order.
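The mechanism behind this is visible in the diff below: tre_processed_off is now read and advanced inside one critical section, so the success path and the error path can never hand out the same ring slot twice. A minimal sketch of that pattern, with a plain variable standing in for the spinlock-protected field and made-up sizes (names are illustrative, not the driver's):

```c
#include <assert.h>

/* Assumed sizes for the sketch. */
#define TRE_SIZE 32U
#define TRE_RING_SIZE (8U * TRE_SIZE)

/* Stand-in for lldev->tre_processed_off, protected by the (elided) lock. */
static unsigned int tre_processed_off;

/* Called with the channel lock held: claim the current slot and advance
 * the shared offset in one step, wrapping around the ring. */
static unsigned int claim_next_slot(void)
{
	unsigned int slot = tre_processed_off / TRE_SIZE;

	tre_processed_off = (tre_processed_off + TRE_SIZE) % TRE_RING_SIZE;
	return slot;
}
```

Because the read and the advance happen under the same lock, two contexts calling this concurrently always receive distinct, consecutive slots.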

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma_ll.c | 44 +++-
 1 file changed, 11 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index 9d78c86..c4e8b64 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -198,13 +198,16 @@ static void hidma_ll_tre_complete(unsigned long arg)
}
 }
 
-static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
-   u8 err_info, u8 err_code)
+static int hidma_post_completed(struct hidma_lldev *lldev, u8 err_info,
+   u8 err_code)
 {
struct hidma_tre *tre;
unsigned long flags;
+   u32 tre_iterator;
 
spin_lock_irqsave(&lldev->lock, flags);
+
+   tre_iterator = lldev->tre_processed_off;
tre = lldev->pending_tre_list[tre_iterator / HIDMA_TRE_SIZE];
if (!tre) {
spin_unlock_irqrestore(&lldev->lock, flags);
@@ -223,6 +226,9 @@ static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
atomic_set(&lldev->pending_tre_count, 0);
}
 
+   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
+lldev->tre_ring_size);
+   lldev->tre_processed_off = tre_iterator;
spin_unlock_irqrestore(&lldev->lock, flags);
 
tre->err_info = err_info;
@@ -244,13 +250,11 @@ static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
 static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
 {
u32 evre_ring_size = lldev->evre_ring_size;
-   u32 tre_ring_size = lldev->tre_ring_size;
u32 err_info, err_code, evre_write_off;
-   u32 tre_iterator, evre_iterator;
+   u32 evre_iterator;
u32 num_completed = 0;
 
evre_write_off = readl_relaxed(lldev->evca + HIDMA_EVCA_WRITE_PTR_REG);
-   tre_iterator = lldev->tre_processed_off;
evre_iterator = lldev->evre_processed_off;
 
if ((evre_write_off > evre_ring_size) ||
@@ -273,12 +277,9 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
err_code =
(cfg >> HIDMA_EVRE_CODE_BIT_POS) & HIDMA_EVRE_CODE_MASK;
 
-   if (hidma_post_completed(lldev, tre_iterator, err_info,
-err_code))
+   if (hidma_post_completed(lldev, err_info, err_code))
break;
 
-   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
-tre_ring_size);
HIDMA_INCREMENT_ITERATOR(evre_iterator, HIDMA_EVRE_SIZE,
 evre_ring_size);
 
@@ -295,16 +296,10 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
if (num_completed) {
u32 evre_read_off = (lldev->evre_processed_off +
 HIDMA_EVRE_SIZE * num_completed);
-   u32 tre_read_off = (lldev->tre_processed_off +
-   HIDMA_TRE_SIZE * num_completed);
-
evre_read_off = evre_read_off % evre_ring_size;
-   tre_read_off = tre_read_off % tre_ring_size;
-
writel(evre_read_off, lldev->evca + HIDMA_EVCA_DOORBELL_REG);
 
/* record the last processed tre offset */
-   lldev->tre_processed_off = tre_read_off;
lldev->evre_processed_off = evre_read_off;
}
 
@@ -314,27 +309,10 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
 void hidma_cleanup_pending_tre(struct hidma_lldev *lldev, u8 err_info,
   u8 err_code)
 {
-   u32 tre_iterator;
-   u32 tre_ring_size = lldev->tre_ring_size;
-   int num_completed = 0;
-   u32 tre_read_off;
-
-   tre_iterator = lldev->tre_processed_off;
while (atomic_read(&lldev->pending_tre_count)) {
-   if (hidma_post_completed(lldev, tre_iterator, err_info,
-err_code))
+   if (hidma_post_completed(lldev, err_info, err_code))
break;
-   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
-

[PATCH V5 05/10] dmaengine: qcom_hidma: make pending_tre_count atomic

2016-10-06 Thread Sinan Kaya
Getting ready for the MSI interrupts. The pending_tre_count is used
in the interrupt handler to make sure all outstanding requests are
serviced.

The driver will allocate 11 MSI interrupts. Each MSI interrupt can be
assigned to a different CPU. Then, we have a race condition for common
variables as they share the same interrupt handler with a different
cause bit and they can potentially be executed in parallel. Making this
variable atomic so that it can be updated from multiple processor
contexts.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma.h |  2 +-
 drivers/dma/qcom/hidma_dbg.c |  3 ++-
 drivers/dma/qcom/hidma_ll.c  | 13 ++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/qcom/hidma.h b/drivers/dma/qcom/hidma.h
index b209942..afaeb9a 100644
--- a/drivers/dma/qcom/hidma.h
+++ b/drivers/dma/qcom/hidma.h
@@ -58,7 +58,7 @@ struct hidma_lldev {
void __iomem *evca; /* Event Channel address  */
struct hidma_tre
**pending_tre_list; /* Pointers to pending TREs   */
-   s32 pending_tre_count;  /* Number of TREs pending */
+   atomic_t pending_tre_count; /* Number of TREs pending */
 
void *tre_ring; /* TRE ring   */
dma_addr_t tre_dma; /* TRE ring to be shared with HW  */
diff --git a/drivers/dma/qcom/hidma_dbg.c b/drivers/dma/qcom/hidma_dbg.c
index 3d83b99..3bdcb80 100644
--- a/drivers/dma/qcom/hidma_dbg.c
+++ b/drivers/dma/qcom/hidma_dbg.c
@@ -74,7 +74,8 @@ static void hidma_ll_devstats(struct seq_file *s, void 
*llhndl)
seq_printf(s, "tre_ring_handle=%pap\n", >tre_dma);
seq_printf(s, "tre_ring_size = 0x%x\n", lldev->tre_ring_size);
seq_printf(s, "tre_processed_off = 0x%x\n", lldev->tre_processed_off);
-   seq_printf(s, "pending_tre_count=%d\n", lldev->pending_tre_count);
+   seq_printf(s, "pending_tre_count=%d\n",
+   atomic_read(>pending_tre_count));
seq_printf(s, "evca=%p\n", lldev->evca);
seq_printf(s, "evre_ring=%p\n", lldev->evre_ring);
seq_printf(s, "evre_ring_handle=%pap\n", >evre_dma);
diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index ad20dfb..a4fc941 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -218,10 +218,9 @@ static int hidma_post_completed(struct hidma_lldev *lldev, 
int tre_iterator,
 * Keep track of pending TREs that SW is expecting to receive
 * from HW. We got one now. Decrement our counter.
 */
-   lldev->pending_tre_count--;
-   if (lldev->pending_tre_count < 0) {
+   if (atomic_dec_return(>pending_tre_count) < 0) {
dev_warn(lldev->dev, "tre count mismatch on completion");
-   lldev->pending_tre_count = 0;
+   atomic_set(>pending_tre_count, 0);
}
 
spin_unlock_irqrestore(>lock, flags);
@@ -321,7 +320,7 @@ void hidma_cleanup_pending_tre(struct hidma_lldev *lldev, 
u8 err_info,
u32 tre_read_off;
 
tre_iterator = lldev->tre_processed_off;
-   while (lldev->pending_tre_count) {
+   while (atomic_read(>pending_tre_count)) {
if (hidma_post_completed(lldev, tre_iterator, err_info,
 err_code))
break;
@@ -564,7 +563,7 @@ void hidma_ll_queue_request(struct hidma_lldev *lldev, u32 
tre_ch)
tre->err_code = 0;
tre->err_info = 0;
tre->queued = 1;
-   lldev->pending_tre_count++;
+   atomic_inc(>pending_tre_count);
lldev->tre_write_offset = (lldev->tre_write_offset + HIDMA_TRE_SIZE)
% lldev->tre_ring_size;
spin_unlock_irqrestore(>lock, flags);
@@ -670,7 +669,7 @@ int hidma_ll_setup(struct hidma_lldev *lldev)
u32 val;
u32 nr_tres = lldev->nr_tres;
 
-   lldev->pending_tre_count = 0;
+   atomic_set(&lldev->pending_tre_count, 0);
lldev->tre_processed_off = 0;
lldev->evre_processed_off = 0;
lldev->tre_write_offset = 0;
@@ -834,7 +833,7 @@ int hidma_ll_uninit(struct hidma_lldev *lldev)
tasklet_kill(&lldev->rst_task);
memset(lldev->trepool, 0, required_bytes);
lldev->trepool = NULL;
-   lldev->pending_tre_count = 0;
+   atomic_set(&lldev->pending_tre_count, 0);
lldev->tre_write_offset = 0;
 
rc = hidma_ll_reset(lldev);
-- 
1.9.1



[PATCH V5 08/10] dmaengine: qcom_hidma: protect common data structures

2016-10-06 Thread Sinan Kaya
When MSI interrupts are supported, error and the transfer interrupt can
come from multiple processor contexts.

Each error interrupt is an MSI interrupt. If the channel is disabled by
the first error interrupt, the remaining error interrupts will gracefully
return in the interrupt handler.

If an error is observed while servicing the completions in the success
case, posting of the completions is aborted as soon as the channel
disabled state is observed. The error interrupt handler will take it from
there and finish the remaining completions. We don't want success and
error messages to be delivered to the client in mixed order.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma_ll.c | 44 +++-
 1 file changed, 11 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index 9d78c86..c4e8b64 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -198,13 +198,16 @@ static void hidma_ll_tre_complete(unsigned long arg)
}
 }
 
-static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
-   u8 err_info, u8 err_code)
+static int hidma_post_completed(struct hidma_lldev *lldev, u8 err_info,
+   u8 err_code)
 {
struct hidma_tre *tre;
unsigned long flags;
+   u32 tre_iterator;
 
spin_lock_irqsave(&lldev->lock, flags);
+
+   tre_iterator = lldev->tre_processed_off;
tre = lldev->pending_tre_list[tre_iterator / HIDMA_TRE_SIZE];
if (!tre) {
spin_unlock_irqrestore(&lldev->lock, flags);
@@ -223,6 +226,9 @@ static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
atomic_set(&lldev->pending_tre_count, 0);
}
 
+   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
+lldev->tre_ring_size);
+   lldev->tre_processed_off = tre_iterator;
spin_unlock_irqrestore(&lldev->lock, flags);
 
tre->err_info = err_info;
@@ -244,13 +250,11 @@ static int hidma_post_completed(struct hidma_lldev *lldev, int tre_iterator,
 static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
 {
u32 evre_ring_size = lldev->evre_ring_size;
-   u32 tre_ring_size = lldev->tre_ring_size;
u32 err_info, err_code, evre_write_off;
-   u32 tre_iterator, evre_iterator;
+   u32 evre_iterator;
u32 num_completed = 0;
 
evre_write_off = readl_relaxed(lldev->evca + HIDMA_EVCA_WRITE_PTR_REG);
-   tre_iterator = lldev->tre_processed_off;
evre_iterator = lldev->evre_processed_off;
 
if ((evre_write_off > evre_ring_size) ||
@@ -273,12 +277,9 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
err_code =
(cfg >> HIDMA_EVRE_CODE_BIT_POS) & HIDMA_EVRE_CODE_MASK;
 
-   if (hidma_post_completed(lldev, tre_iterator, err_info,
-err_code))
+   if (hidma_post_completed(lldev, err_info, err_code))
break;
 
-   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
-tre_ring_size);
HIDMA_INCREMENT_ITERATOR(evre_iterator, HIDMA_EVRE_SIZE,
 evre_ring_size);
 
@@ -295,16 +296,10 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
if (num_completed) {
u32 evre_read_off = (lldev->evre_processed_off +
 HIDMA_EVRE_SIZE * num_completed);
-   u32 tre_read_off = (lldev->tre_processed_off +
-   HIDMA_TRE_SIZE * num_completed);
-
evre_read_off = evre_read_off % evre_ring_size;
-   tre_read_off = tre_read_off % tre_ring_size;
-
writel(evre_read_off, lldev->evca + HIDMA_EVCA_DOORBELL_REG);
 
/* record the last processed tre offset */
-   lldev->tre_processed_off = tre_read_off;
lldev->evre_processed_off = evre_read_off;
}
 
@@ -314,27 +309,10 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
 void hidma_cleanup_pending_tre(struct hidma_lldev *lldev, u8 err_info,
   u8 err_code)
 {
-   u32 tre_iterator;
-   u32 tre_ring_size = lldev->tre_ring_size;
-   int num_completed = 0;
-   u32 tre_read_off;
-
-   tre_iterator = lldev->tre_processed_off;
while (atomic_read(&lldev->pending_tre_count)) {
-   if (hidma_post_completed(lldev, tre_iterator, err_info,
-err_code))
+   if (hidma_post_completed(lldev, err_info, err_code))
break;
-   HIDMA_INCREMENT_ITERATOR(tre_iterator, HIDMA_TRE_SIZE,
-tre_ring_size);
-   

[PATCH V5 07/10] dmaengine: qcom_hidma: add a common API to setup the interrupt

2016-10-06 Thread Sinan Kaya
Introducing the hidma_ll_setup_irq function to set up the interrupt
type externally from the OS interface.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma.h|  2 ++
 drivers/dma/qcom/hidma_ll.c | 27 +++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/qcom/hidma.h b/drivers/dma/qcom/hidma.h
index afaeb9a..b74a56e 100644
--- a/drivers/dma/qcom/hidma.h
+++ b/drivers/dma/qcom/hidma.h
@@ -46,6 +46,7 @@ struct hidma_tre {
 };
 
 struct hidma_lldev {
+   bool msi_support;   /* flag indicating MSI support*/
bool initialized;   /* initialized flag   */
u8 trch_state;  /* trch_state of the device   */
u8 evch_state;  /* evch_state of the device   */
@@ -148,6 +149,7 @@ int hidma_ll_disable(struct hidma_lldev *lldev);
 int hidma_ll_enable(struct hidma_lldev *llhndl);
 void hidma_ll_set_transfer_params(struct hidma_lldev *llhndl, u32 tre_ch,
dma_addr_t src, dma_addr_t dest, u32 len, u32 flags);
+void hidma_ll_setup_irq(struct hidma_lldev *lldev, bool msi);
 int hidma_ll_setup(struct hidma_lldev *lldev);
 struct hidma_lldev *hidma_ll_init(struct device *dev, u32 max_channels,
void __iomem *trca, void __iomem *evca,
diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index 015df4b..9d78c86 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -715,17 +715,36 @@ int hidma_ll_setup(struct hidma_lldev *lldev)
writel(HIDMA_EVRE_SIZE * nr_tres,
lldev->evca + HIDMA_EVCA_RING_LEN_REG);
 
-   /* support IRQ only for now */
+   /* configure interrupts */
+   hidma_ll_setup_irq(lldev, lldev->msi_support);
+
+   rc = hidma_ll_enable(lldev);
+   if (rc)
+   return rc;
+
+   return rc;
+}
+
+void hidma_ll_setup_irq(struct hidma_lldev *lldev, bool msi)
+{
+   u32 val;
+
+   lldev->msi_support = msi;
+
+   /* disable interrupts again after reset */
+   writel(0, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
+   writel(0, lldev->evca + HIDMA_EVCA_IRQ_EN_REG);
+
+   /* support IRQ by default */
val = readl(lldev->evca + HIDMA_EVCA_INTCTRL_REG);
val &= ~0xF;
-   val |= 0x1;
+   if (!lldev->msi_support)
+   val = val | 0x1;
writel(val, lldev->evca + HIDMA_EVCA_INTCTRL_REG);
 
/* clear all pending interrupts and enable them */
writel(ENABLE_IRQS, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
writel(ENABLE_IRQS, lldev->evca + HIDMA_EVCA_IRQ_EN_REG);
-
-   return hidma_ll_enable(lldev);
 }
 
 struct hidma_lldev *hidma_ll_init(struct device *dev, u32 nr_tres,
-- 
1.9.1



[PATCH V5 06/10] dmaengine: qcom_hidma: bring out interrupt cause

2016-10-06 Thread Sinan Kaya
Bring out the interrupt cause to the top level so that MSI interrupts
can be hooked at a later stage.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma_ll.c | 57 ++---
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index a4fc941..015df4b 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -432,12 +432,24 @@ static void hidma_ll_abort(unsigned long arg)
  * requests traditionally to the destination, this concept does not apply
  * here for this HW.
  */
-irqreturn_t hidma_ll_inthandler(int chirq, void *arg)
+static void hidma_ll_int_handler_internal(struct hidma_lldev *lldev, int cause)
 {
-   struct hidma_lldev *lldev = arg;
-   u32 status;
-   u32 enable;
-   u32 cause;
+   if (cause & HIDMA_ERR_INT_MASK) {
+   dev_err(lldev->dev, "error 0x%x, disabling...\n",
+   cause);
+
+   /* Clear out pending interrupts */
+   writel(cause, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
+
+   /* No further submissions. */
+   hidma_ll_disable(lldev);
+
+   /* Driver completes the txn and intimates the client.*/
+   hidma_cleanup_pending_tre(lldev, 0xFF,
+ HIDMA_EVRE_STATUS_ERROR);
+
+   return;
+   }
 
/*
 * Fine tuned for this HW...
@@ -446,30 +458,28 @@ irqreturn_t hidma_ll_inthandler(int chirq, void *arg)
 * read and write accessors are used for performance reasons due to
 * interrupt delivery guarantees. Do not copy this code blindly and
 * expect that to work.
+*
+* Try to consume as many EVREs as possible.
 */
+   hidma_handle_tre_completion(lldev);
+
+   /* We consumed TREs or there are pending TREs or EVREs. */
+   writel_relaxed(cause, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
+}
+
+irqreturn_t hidma_ll_inthandler(int chirq, void *arg)
+{
+   struct hidma_lldev *lldev = arg;
+   u32 status;
+   u32 enable;
+   u32 cause;
+
status = readl_relaxed(lldev->evca + HIDMA_EVCA_IRQ_STAT_REG);
enable = readl_relaxed(lldev->evca + HIDMA_EVCA_IRQ_EN_REG);
cause = status & enable;
 
while (cause) {
-   if (cause & HIDMA_ERR_INT_MASK) {
-   dev_err(lldev->dev, "error 0x%x, resetting...\n",
-   cause);
-
-   /* Clear out pending interrupts */
-   writel(cause, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
-
-   tasklet_schedule(&lldev->rst_task);
-   goto out;
-   }
-
-   /*
-* Try to consume as many EVREs as possible.
-*/
-   hidma_handle_tre_completion(lldev);
-
-   /* We consumed TREs or there are pending TREs or EVREs. */
-   writel_relaxed(cause, lldev->evca + HIDMA_EVCA_IRQ_CLR_REG);
+   hidma_ll_int_handler_internal(lldev, cause);
 
/*
 * Another interrupt might have arrived while we are
@@ -480,7 +490,6 @@ irqreturn_t hidma_ll_inthandler(int chirq, void *arg)
cause = status & enable;
}
 
-out:
return IRQ_HANDLED;
 }
 
-- 
1.9.1



[PATCH V5 09/10] dmaengine: qcom_hidma: break completion processing on error

2016-10-06 Thread Sinan Kaya
We try to consume as many successful transfers as possible. Now that we
support MSI interrupts, an error interrupt might be observed by another
processor while we are finishing the successful ones.

Try to abort successful processing if this is the case.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma_ll.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/dma/qcom/hidma_ll.c b/drivers/dma/qcom/hidma_ll.c
index c4e8b64..aa76ec1 100644
--- a/drivers/dma/qcom/hidma_ll.c
+++ b/drivers/dma/qcom/hidma_ll.c
@@ -291,6 +291,13 @@ static int hidma_handle_tre_completion(struct hidma_lldev *lldev)
evre_write_off =
readl_relaxed(lldev->evca + HIDMA_EVCA_WRITE_PTR_REG);
num_completed++;
+
+   /*
+* An error interrupt might have arrived while we are processing
+* the completed interrupt.
+*/
+   if (!hidma_ll_isenabled(lldev))
+   break;
}
 
if (num_completed) {
-- 
1.9.1



[PATCH V5 10/10] dmaengine: qcom_hidma: add MSI support for interrupts

2016-10-06 Thread Sinan Kaya
The interrupts can now be delivered as platform MSI interrupts on newer
platforms. The code looks for a new OF and ACPI strings in order to enable
the functionality.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma.c| 143 ++--
 drivers/dma/qcom/hidma.h|   2 +
 drivers/dma/qcom/hidma_ll.c |   8 +++
 3 files changed, 147 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index 10a9e3a..7b13213 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../dmaengine.h"
 #include "hidma.h"
@@ -70,6 +71,7 @@
 #define HIDMA_ERR_INFO_SW  0xFF
 #define HIDMA_ERR_CODE_UNEXPECTED_TERMINATE0x0
 #define HIDMA_NR_DEFAULT_DESC  10
+#define HIDMA_MSI_INTS 11
 
 static inline struct hidma_dev *to_hidma_dev(struct dma_device *dmadev)
 {
@@ -530,6 +532,15 @@ static irqreturn_t hidma_chirq_handler(int chirq, void *arg)
return hidma_ll_inthandler(chirq, lldev);
 }
 
+static irqreturn_t hidma_chirq_handler_msi(int chirq, void *arg)
+{
+   struct hidma_lldev **lldevp = arg;
+   struct hidma_dev *dmadev = to_hidma_dev_from_lldev(lldevp);
+
+   return hidma_ll_inthandler_msi(chirq, *lldevp,
+  1 << (chirq - dmadev->msi_virqbase));
+}
+
 static ssize_t hidma_show_values(struct device *dev,
 struct device_attribute *attr, char *buf)
 {
@@ -584,6 +595,104 @@ static int hidma_sysfs_init(struct hidma_dev *dev)
return device_create_file(dev->ddev.dev, dev->chid_attrs);
 }
 
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+static void hidma_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
+{
+   struct device *dev = msi_desc_to_dev(desc);
+   struct hidma_dev *dmadev = dev_get_drvdata(dev);
+
+   if (!desc->platform.msi_index) {
+   writel(msg->address_lo, dmadev->dev_evca + 0x118);
+   writel(msg->address_hi, dmadev->dev_evca + 0x11C);
+   writel(msg->data, dmadev->dev_evca + 0x120);
+   }
+}
+#endif
+
+static void hidma_free_msis(struct hidma_dev *dmadev)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+   struct device *dev = dmadev->ddev.dev;
+   struct msi_desc *desc;
+
+   /* free allocated MSI interrupts above */
+   for_each_msi_entry(desc, dev)
+   devm_free_irq(dev, desc->irq, &dmadev->lldev);
+
+   platform_msi_domain_free_irqs(dev);
+#endif
+}
+
+static int hidma_request_msi(struct hidma_dev *dmadev,
+struct platform_device *pdev)
+{
+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+   int rc;
+   struct msi_desc *desc;
+   struct msi_desc *failed_desc = NULL;
+
+   rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
+   hidma_write_msi_msg);
+   if (rc)
+   return rc;
+
+   for_each_msi_entry(desc, &pdev->dev) {
+   if (!desc->platform.msi_index)
+   dmadev->msi_virqbase = desc->irq;
+
+   rc = devm_request_irq(&pdev->dev, desc->irq,
+  hidma_chirq_handler_msi,
+  0, "qcom-hidma-msi",
+  &dmadev->lldev);
+   if (rc) {
+   failed_desc = desc;
+   break;
+   }
+   }
+
+   if (rc) {
+   /* free allocated MSI interrupts above */
+   for_each_msi_entry(desc, &pdev->dev) {
+   if (desc == failed_desc)
+   break;
+   devm_free_irq(&pdev->dev, desc->irq,
+ &dmadev->lldev);
+   }
+   } else {
+   /* Add callback to free MSIs on teardown */
+   hidma_ll_setup_irq(dmadev->lldev, true);
+
+   }
+   if (rc)
+   dev_warn(&pdev->dev,
+"failed to request MSI irq, falling back to wired IRQ\n");
+   return rc;
+#else
+   return -EINVAL;
+#endif
+}
+
+static bool hidma_msi_capable(struct device *dev)
+{
+   struct acpi_device *adev = ACPI_COMPANION(dev);
+   const char *of_compat;
+   int ret = -EINVAL;
+
+   if (!adev || acpi_disabled) {
+   ret = device_property_read_string(dev, "compatible",
+ &of_compat);
+   if (ret)
+   return false;
+
+   ret = strcmp(of_compat, "qcom,hidma-1.1");
+   } else {
+#ifdef CONFIG_ACPI
+   ret = strcmp(acpi_device_hid(adev), "QCOM8062");
+#endif
+   }
+   return ret == 0;
+}
+
 static int hidma_probe(struct platform_device *pdev)
 {
struct hidma_dev *dmadev;
@@ -593,6 +702,7 @@ static int hidma_probe(struct platform_device *pdev)
void __iomem *evca;

[PATCH V5 03/10] of: irq: make of_msi_configure accessible from modules

2016-10-06 Thread Sinan Kaya
The of_msi_configure routine is only accessible by the built-in
kernel drivers. Export this function so that modules can use it
too.

This function is useful for configuring MSI on child device tree
nodes on hierarchical objects.

Acked-by: Rob Herring 
Signed-off-by: Sinan Kaya 
---
 drivers/of/irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index a2e68f7..20c09e0 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -767,3 +767,4 @@ void of_msi_configure(struct device *dev, struct device_node *np)
dev_set_msi_domain(dev,
   of_msi_get_domain(dev, np, DOMAIN_BUS_PLATFORM_MSI));
 }
+EXPORT_SYMBOL_GPL(of_msi_configure);
-- 
1.9.1



[PATCH V5 02/10] Documentation: DT: qcom_hidma: correct spelling mistakes

2016-10-06 Thread Sinan Kaya
Fix the spelling mistakes and remove a duplicated "and" in the sentences.

Acked-by: Rob Herring 
Signed-off-by: Sinan Kaya 
---
 Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt b/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
index 2c5e4b8..55492c2 100644
--- a/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
+++ b/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
@@ -5,13 +5,13 @@ memcpy and memset capabilities. It has been designed for virtualized
 environments.
 
 Each HIDMA HW instance consists of multiple DMA channels. These channels
-share the same bandwidth. The bandwidth utilization can be parititioned
+share the same bandwidth. The bandwidth utilization can be partitioned
 among channels based on the priority and weight assignments.
 
 There are only two priority levels and 15 weigh assignments possible.
 
 Other parameters here determine how much of the system bus this HIDMA
-instance can use like maximum read/write request and and number of bytes to
+instance can use like maximum read/write request and number of bytes to
 read/write in a single burst.
 
 Main node required properties:
-- 
1.9.1



[PATCH V5 01/10] Documentation: DT: qcom_hidma: update binding for MSI

2016-10-06 Thread Sinan Kaya
Adding a new binding for qcom,hidma-1.1 to distinguish HW supporting
MSI interrupts from the older revision.

Signed-off-by: Sinan Kaya 
---
 Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt b/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
index fd5618b..2c5e4b8 100644
--- a/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
+++ b/Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt
@@ -47,12 +47,18 @@ When the OS is not in control of the management interface (i.e. it's a guest),
 the channel nodes appear on their own, not under a management node.
 
 Required properties:
-- compatible: must contain "qcom,hidma-1.0"
+- compatible: must contain "qcom,hidma-1.0" for initial HW or "qcom,hidma-1.1"
+for MSI capable HW.
 - reg: Addresses for the transfer and event channel
 - interrupts: Should contain the event interrupt
 - desc-count: Number of asynchronous requests this channel can handle
 - iommus: required a iommu node
 
+Optional properties for MSI:
+- msi-parent : See the generic MSI binding described in
+ devicetree/bindings/interrupt-controller/msi.txt for a description of the
+ msi-parent property.
+
 Example:
 
 Hypervisor OS configuration:
-- 
1.9.1



[PATCH V5 04/10] dmaengine: qcom_hidma: configure DMA and MSI for OF

2016-10-06 Thread Sinan Kaya
Configure the DMA bindings for the device tree based firmware.

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma_mgmt.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/qcom/hidma_mgmt.c b/drivers/dma/qcom/hidma_mgmt.c
index 82f36e4..185d29c 100644
--- a/drivers/dma/qcom/hidma_mgmt.c
+++ b/drivers/dma/qcom/hidma_mgmt.c
@@ -375,8 +375,15 @@ static int __init hidma_mgmt_of_populate_channels(struct 
device_node *np)
ret = PTR_ERR(new_pdev);
goto out;
}
+   of_node_get(child);
+   new_pdev->dev.of_node = child;
of_dma_configure(&new_pdev->dev, child);
-
+   /*
+* It is assumed that calling of_msi_configure is safe on
+* platforms with or without MSI support.
+*/
+   of_msi_configure(&new_pdev->dev, child);
+   of_node_put(child);
kfree(res);
res = NULL;
}
-- 
1.9.1




Re: [PATCH V1 05/10] thermal: da9062/61: Thermal junction temperature monitoring driver

2016-10-06 Thread Keerthy

Steve,

On Thursday 06 October 2016 02:13 PM, Steve Twiss wrote:

From: Steve Twiss 

Add junction temperature monitoring supervisor device driver, compatible
with the DA9062 and DA9061 PMICs.

If the PMIC's internal junction temperature rises above TEMP_WARN (125
degC) an interrupt is issued. This TEMP_WARN level is defined as the
THERMAL_TRIP_HOT trip-wire inside the device driver. A kernel work queue
is configured to repeatedly poll this temperature trip-wire, between 1 and
10 second intervals (defaulting to 3 seconds).

This first level of temperature supervision is intended for non-invasive
temperature control, where the necessary measures for cooling the system
down are left to the host software. In this case, inside the thermal
notification function da9062_thermal_notify().

Signed-off-by: Steve Twiss 

---
This patch applies against linux-next and v4.8

Regards,
Steve Twiss, Dialog Semiconductor Ltd.


 drivers/thermal/Kconfig  |  10 ++
 drivers/thermal/Makefile |   1 +
 drivers/thermal/da9062-thermal.c | 313 +++
 3 files changed, 324 insertions(+)
 create mode 100644 drivers/thermal/da9062-thermal.c

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 2d702ca..da58e54 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -272,6 +272,16 @@ config DB8500_CPUFREQ_COOLING
  bound cpufreq cooling device turns active to set CPU frequency low to
  cool down the CPU.

+config DA9062_THERMAL
+   tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
+   depends on MFD_DA9062
+   depends on OF
+   help
+ Enable this for the Dialog Semiconductor thermal sensor driver.
+ This will report PMIC junction over-temperature for one thermal trip
+ zone.
+ Compatible with the DA9062 and DA9061 PMICs.
+
 config INTEL_POWERCLAMP
tristate "Intel PowerClamp idle injection driver"
depends on THERMAL
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 10b07c1..0a2b3f2 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_ARMADA_THERMAL)  += armada_thermal.o
 obj-$(CONFIG_TANGO_THERMAL)+= tango_thermal.o
 obj-$(CONFIG_IMX_THERMAL)  += imx_thermal.o
 obj-$(CONFIG_DB8500_CPUFREQ_COOLING)   += db8500_cpufreq_cooling.o
+obj-$(CONFIG_DA9062_THERMAL)   += da9062-thermal.o
 obj-$(CONFIG_INTEL_POWERCLAMP) += intel_powerclamp.o
 obj-$(CONFIG_X86_PKG_TEMP_THERMAL) += x86_pkg_temp_thermal.o
 obj-$(CONFIG_INTEL_SOC_DTS_IOSF_CORE)  += intel_soc_dts_iosf.o
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
new file mode 100644
index 000..feeabf6
--- /dev/null
+++ b/drivers/thermal/da9062-thermal.c
@@ -0,0 +1,313 @@
+/*
+ * Thermal device driver for DA9062 and DA9061
+ * Copyright (C) 2016  Dialog Semiconductor Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define DA9062_DEFAULT_POLLING_MS_PERIOD   3000
+#define DA9062_MAX_POLLING_MS_PERIOD   10000
+#define DA9062_MIN_POLLING_MS_PERIOD   1000
+
+#define DA9062_MILLI_CELSIUS(t)((t)*1000)
+
+struct da9062_thermal_config {
+   const char *name;
+};
+
+struct da9062_thermal {
+   struct da9062 *hw;
+   struct delayed_work work;
+   struct thermal_zone_device *zone;
+   enum thermal_device_mode mode;
+   unsigned int polling_period;
+   struct mutex lock;
+   int temperature;
+   int irq;
+   const struct da9062_thermal_config *config;
+   struct device *dev;
+};
+
+static void da9062_thermal_poll_on(struct work_struct *work)
+{
+   struct da9062_thermal *thermal = container_of(work,
+   struct da9062_thermal,
+   work.work);
+   unsigned int val;
+   int ret;
+
+   /* clear E_TEMP */
+   ret = regmap_write(thermal->hw->regmap,
+   DA9062AA_EVENT_B,
+   DA9062AA_E_TEMP_MASK);
+   if (ret < 0) {
+   dev_err(thermal->dev,
+   "Cannot clear the TJUNC temperature status\n");
+   goto err_enable_irq;
+   }
+
+   /* Now read E_TEMP again: it is acting like a status bit.
+    * If over-temperature, then this status will be true.

Re: [PATCH] mm/slab: fix kmemcg cache creation delayed issue

2016-10-06 Thread Joonsoo Kim
On Thu, Oct 06, 2016 at 09:02:00AM -0700, Doug Smythies wrote:
> It was my (limited) understanding that the subsequent 2 patch set
> superseded this patch. Indeed, the 2 patch set seems to solve
> both the SLAB and SLUB bug reports.

It would mean that patch 1 solves both the SLAB and SLUB bug reports
since patch 2 is only effective for SLUB.

The reason I sent this patch is that although patch 1 fixes the
issue that too many kworkers are created, kmem_cache creation/destruction
is still slowed by synchronize_sched() and that would delay kmemcg
usage accounting. I'm not sure how bad it is, but it's generally
better to start accounting as soon as possible. With patch 2 for SLUB
and this patch for SLAB, the performance of kmem_cache
creation/destruction would recover.

Thanks.

> 
> References:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=172981
> https://bugzilla.kernel.org/show_bug.cgi?id=172991
> https://patchwork.kernel.org/patch/9361853
> https://patchwork.kernel.org/patch/9359271




Re: [RFC PATCH] mm, compaction: allow compaction for GFP_NOFS requests

2016-10-06 Thread Vlastimil Babka

On 10/04/2016 10:12 AM, Michal Hocko wrote:

From: Michal Hocko 

compaction has been disabled for GFP_NOFS and GFP_NOIO requests since
the direct compaction was introduced by 56de7263fcf3 ("mm: compaction:
direct compact when a high-order allocation fails"). The main reason
is that the migration of page cache pages might recurse back to fs/io
layer and we could potentially deadlock. This is overly conservative
because all the anonymous memory is migrateable in the GFP_NOFS context
just fine.  This might be a large portion of the memory in many/most
workloads.

Remove the GFP_NOFS restriction and make sure that we skip all fs pages
(those with a mapping) while isolating pages to be migrated. We cannot
consider clean fs pages because they might need a metadata update so
only isolate pages without any mapping for nofs requests.

The effect of this patch will be probably very limited in many/most
workloads because higher order GFP_NOFS requests are quite rare,
although different configurations might lead to very different results
as GFP_NOFS usage is rather unleashed (e.g. I had a hard time triggering
any with my setup). But still there shouldn't be any strong reason to
completely back off and do nothing in that context. In the worst case
we just skip parts of the block with fs pages. This might be still
sufficient to make a progress for small orders.

Signed-off-by: Michal Hocko 
---

Hi,
I am sending this as an RFC because I am not completely sure this a) is
really worth it and b) it is 100% correct. I couldn't find any problems
when staring into the code but as mentioned in the changelog I wasn't
really able to trigger high order GFP_NOFS requests in my setup.

Thoughts?

 mm/compaction.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index badb92bf14b4..07254a73ee32 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -834,6 +834,13 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
page_count(page) > page_mapcount(page))
goto isolate_fail;

+   /*
+* Only allow to migrate anonymous pages in GFP_NOFS context
+* because those do not depend on fs locks.
+*/
+   if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
+   goto isolate_fail;


Unless page can acquire a page_mapping between this check and migration, 
I don't see a problem with allowing this.


But make sure you don't break kcompactd and manual compaction from 
/proc, as they don't currently set cc->gfp_mask. Looks like until now it 
was only used to determine direct compactor's migratetype which is 
irrelevant in those contexts.



+
/* If we already hold the lock, we can skip some rechecking */
if (!locked) {
locked = compact_trylock_irqsave(zone_lru_lock(zone),
@@ -1696,14 +1703,16 @@ enum compact_result try_to_compact_pages(gfp_t 
gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
enum compact_priority prio)
 {
-   int may_enter_fs = gfp_mask & __GFP_FS;
int may_perform_io = gfp_mask & __GFP_IO;
struct zoneref *z;
struct zone *zone;
enum compact_result rc = COMPACT_SKIPPED;

-   /* Check if the GFP flags allow compaction */
-   if (!may_enter_fs || !may_perform_io)
+   /*
+* Check if the GFP flags allow compaction - GFP_NOIO is really
+* tricky context because the migration might require IO and
+*/
+   if (!may_perform_io)
return COMPACT_SKIPPED;

trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);








Re: Scrolling down broken with "perf top --hierarchy"

2016-10-06 Thread Markus Trippelsdorf
On 2016.10.07 at 06:56 +0200, Markus Trippelsdorf wrote:
> On 2016.10.07 at 06:32 +0200, Markus Trippelsdorf wrote:
> > On 2016.10.07 at 13:22 +0900, Namhyung Kim wrote:
> > > On Fri, Oct 07, 2016 at 05:51:18AM +0200, Markus Trippelsdorf wrote:
> > > > On 2016.10.07 at 10:17 +0900, Namhyung Kim wrote:
> > > > > On Thu, Oct 06, 2016 at 06:33:33PM +0200, Markus Trippelsdorf wrote:
> > > > > > Scrolling down is broken when using "perf top --hierarchy".
> > > > > > When it starts up everything is OK and one can scroll up and down 
> > > > > > to all
> > > > > > entries. But as further and further new entries get added to the 
> > > > > > list,
> > > > > > scrolling down is blocked (at the position of the last entry that 
> > > > > > was
> > > > > > shown directly after startup).
> > > > > 
> > > > > I think below patch will fix the problem.  Please check.
> > > > 
> > > > Yes. It works fine now. Many thanks.
> > > 
> > > Good.  Can I add your Tested-by then?
> > 
> > Sure. 
> 
> And BTW symbols are currently always cut off at 60 characters in
> expanded entries.

Hmm, no. Sometimes they are cut off, sometimes they are not. I haven't
figured out what triggered this strange behavior.

-- 
Markus



[PATCH] perf top: Fix refreshing hierarchy entries on TUI

2016-10-06 Thread Namhyung Kim
Markus reported that 'perf top --hierarchy' cannot scroll down after
refresh.  This was because the number of entries is not updated when
hierarchy mode is enabled.

Unlike normal report view, hierarchy mode needs to keep its own entry
count since it can have non-leaf entries which can expand/collapse.

Reported-and-tested-by: Markus Trippelsdorf 
Fixes: f5b763feebe9 ("perf hists browser: Count number of hierarchy entries")
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/hists.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index fb8e42c7507a..47be9299 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -601,7 +601,8 @@ int hist_browser__run(struct hist_browser *browser, const 
char *help)
u64 nr_entries;
hbt->timer(hbt->arg);
 
-   if (hist_browser__has_filter(browser))
+   if (hist_browser__has_filter(browser) ||
+   symbol_conf.report_hierarchy)
hist_browser__update_nr_entries(browser);
 
nr_entries = hist_browser__nr_entries(browser);
-- 
2.9.3




[GIT PULL] drm-vc4-next-2016-10-06

2016-10-06 Thread Eric Anholt
These are fixes that have been on the list for 1-3 weeks that didn't
make it into 4.9.  I've been running most of them most of the time,
some have been merged downstream, and some have also been merged to
the Fedora kernel build.  This is about as much testing as we ever get
on vc4, so I feel pretty good about them.

The branch base is your current -next, because I wanted the merge
forward of drm-vc4-fixes to avoid conflicts.

The following changes since commit c2cbc38b9715bd8318062e600668fc30e5a3fbfa:

  drm: virtio: reinstate drm_virtio_set_busid() (2016-10-04 13:10:30 +1000)

are available in the git repository at:

  https://github.com/anholt/linux tags/drm-vc4-next-2016-10-06

for you to fetch changes up to dfccd937deec9283d6ced73e138808e62bec54e8:

  drm/vc4: Add support for double-clocked modes. (2016-10-06 11:58:28 -0700)


This pull request brings in several fixes for drm-next, mostly for
HDMI.


Eric Anholt (7):
  drm/vc4: Fix races when the CS reads from render targets.
  drm/vc4: Enable limited range RGB output on HDMI with CEA modes.
  drm/vc4: Fall back to using an EDID probe in the absence of a GPIO.
  drm/vc4: Increase timeout for HDMI_SCHEDULER_CONTROL changes.
  drm/vc4: Fix support for interlaced modes on HDMI.
  drm/vc4: Set up the AVI and SPD infoframes.
  drm/vc4: Add support for double-clocked modes.

Masahiro Yamada (1):
  drm/vc4: cleanup with list_first_entry_or_null()

 drivers/gpu/drm/vc4/vc4_crtc.c  |  64 +-
 drivers/gpu/drm/vc4/vc4_drv.h   |  30 +++--
 drivers/gpu/drm/vc4/vc4_gem.c   |  13 ++
 drivers/gpu/drm/vc4/vc4_hdmi.c  | 231 +---
 drivers/gpu/drm/vc4/vc4_regs.h  |  19 ++-
 drivers/gpu/drm/vc4/vc4_render_cl.c |  21 +++-
 drivers/gpu/drm/vc4/vc4_validate.c  |  17 ++-
 7 files changed, 306 insertions(+), 89 deletions(-)



Re: Scrolling down broken with "perf top --hierarchy"

2016-10-06 Thread Markus Trippelsdorf
On 2016.10.07 at 06:32 +0200, Markus Trippelsdorf wrote:
> On 2016.10.07 at 13:22 +0900, Namhyung Kim wrote:
> > On Fri, Oct 07, 2016 at 05:51:18AM +0200, Markus Trippelsdorf wrote:
> > > On 2016.10.07 at 10:17 +0900, Namhyung Kim wrote:
> > > > On Thu, Oct 06, 2016 at 06:33:33PM +0200, Markus Trippelsdorf wrote:
> > > > > Scrolling down is broken when using "perf top --hierarchy".
> > > > > When it starts up everything is OK and one can scroll up and down to 
> > > > > all
> > > > > entries. But as further and further new entries get added to the list,
> > > > > scrolling down is blocked (at the position of the last entry that was
> > > > > shown directly after startup).
> > > > 
> > > > I think below patch will fix the problem.  Please check.
> > > 
> > > Yes. It works fine now. Many thanks.
> > 
> > Good.  Can I add your Tested-by then?
> 
> Sure. 

And BTW symbols are currently always cut off at 60 characters in expanded 
entries.

-- 
Markus



Re: Scrolling down broken with "perf top --hierarchy"

2016-10-06 Thread Namhyung Kim
Cc-ing perf maintainers,

On Fri, Oct 07, 2016 at 06:32:29AM +0200, Markus Trippelsdorf wrote:
> On 2016.10.07 at 13:22 +0900, Namhyung Kim wrote:
> > On Fri, Oct 07, 2016 at 05:51:18AM +0200, Markus Trippelsdorf wrote:
> > > On 2016.10.07 at 10:17 +0900, Namhyung Kim wrote:
> > > > On Thu, Oct 06, 2016 at 06:33:33PM +0200, Markus Trippelsdorf wrote:
> > > > > Scrolling down is broken when using "perf top --hierarchy".
> > > > > When it starts up everything is OK and one can scroll up and down to 
> > > > > all
> > > > > entries. But as further and further new entries get added to the list,
> > > > > scrolling down is blocked (at the position of the last entry that was
> > > > > shown directly after startup).
> > > > 
> > > > I think below patch will fix the problem.  Please check.
> > > 
> > > Yes. It works fine now. Many thanks.
> > 
> > Good.  Can I add your Tested-by then?
> 
> Sure.

Ok, I'll send a formal patch with it.

> 
> (And in the long run you should think of making "perf top --hierarchy"
> the default for perf top, because it gives a much better (uncluttered)
> overview of what is going on.)

I think it's a matter of taste.  Some people prefer to see the top
single function or something (i.e. current behavior) while others
prefer to see a higher-level view.

But we can think again about the default at least for perf-top.  I
worried about changing default behavior because last time we did it
for children mode many people complained about it.  But I do think the
hierarchy mode is useful for many people though.

Hmm.. I thought that it already has a config option to enable hierarchy
mode by default, but I cannot find it now.

Thanks,
Namhyung


[PATCH 01/01] drivers:input:byd fix greedy detection of Sentelic FSP by the BYD touchpad driver

2016-10-06 Thread Christophe Tordeux
From: Christophe TORDEUX 

With kernel v4.6 and later, the Sentelic touchpad STL3888_C0 and
probably other Sentelic FSP touchpads are detected as a BYD touchpad and
lose multitouch features.

During the BYD handshake in the byd_detect function, the BYD driver
mistakenly interprets a standard PS/2 protocol status request answer
from the Sentelic touchpad as a successful handshake with a BYD
touchpad. This is clearly a bug of the BYD driver.

Description of the patch: In byd_detect function, remove positive
detection result based on standard PS/2 protocol status request answer.
Replace it with positive detection based on handshake answers as they
can be inferred from the BYD touchpad datasheets found on BYD website.

Signed-off-by: Christophe TORDEUX 

---
Resubmitting this patch because I got no feedback on my first 
submission.
Fixes kernel bug 175421 which is impacting multiple users.

---
 drivers/input/mouse/byd.c | 76 ++-
 1 file changed, 62 insertions(+), 14 deletions(-)

diff --git a/drivers/input/mouse/byd.c b/drivers/input/mouse/byd.c
index b27aa63..b5acca0 100644
--- a/drivers/input/mouse/byd.c
+++ b/drivers/input/mouse/byd.c
@@ -35,6 +35,18 @@
  * BYD pad constants
  */
 
+/* Handshake answer of BTP6034 */
+#define BYD_MODEL_BTP6034  0x00E801
+/* Handshake answer of BTP6740 */
+#define BYD_MODEL_BTP6740  0x001155
+/* Handshake answers of BTP8644, BTP10463 and BTP11484 */
+#define BYD_MODEL_BTP8644  0x011155
+
+/* Handshake SETRES byte of BTP6034 and BTP6740 */
+#define BYD_SHAKE_BYTE_A   0x00
+/* Handshake SETRES byte of BTP8644, BTP10463 and BTP11484 */
+#define BYD_SHAKE_BYTE_B   0x03
+
 /*
  * True device resolution is unknown, however experiments show the
  * resolution is about 111 units/mm.
@@ -434,23 +446,59 @@ static void byd_disconnect(struct psmouse *psmouse)
}
 }
 
+u32 byd_try_model(u32 model)
+{
+   size_t i;
+
+   u32 byd_model[] = {
+   BYD_MODEL_BTP6034,
+   BYD_MODEL_BTP6740,
+   BYD_MODEL_BTP8644
+   };
+
+   for (i=0; i < ARRAY_SIZE(byd_model); i++) {
+   if (model ==  byd_model[i])
+   return model;
+   }
+
+   return 0;
+}
+
 int byd_detect(struct psmouse *psmouse, bool set_properties)
 {
struct ps2dev *ps2dev = &psmouse->ps2dev;
-   u8 param[4] = {0x03, 0x00, 0x00, 0x00};
-
-   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
-   return -1;
-   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
-   return -1;
-   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
-   return -1;
-   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
-   return -1;
-   if (ps2_command(ps2dev, param, PSMOUSE_CMD_GETINFO))
-   return -1;
-
-   if (param[1] != 0x03 || param[2] != 0x64)
+   size_t i;
+
+   u8 byd_shbyte[] = {
+   BYD_SHAKE_BYTE_A,
+   BYD_SHAKE_BYTE_B
+   };
+
+   bool detect = false;
+   for (i=0; i < ARRAY_SIZE(byd_shbyte); i++) {
+   u32 model;
+   u8 param[4] = {byd_shbyte[i], 0x00, 0x00, 0x00};
+
+   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
+   return -1;
+   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
+   return -1;
+   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
+   return -1;
+   if (ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES))
+   return -1;
+   if (ps2_command(ps2dev, param, PSMOUSE_CMD_GETINFO))
+   return -1;
+
+   model = param[2];
+   model += param[1] << 8;
+   model += param[0] << 16;
+   model = byd_try_model(model);
+   if (model)
+   detect = true;
+   }
+
+   if (!detect)
return -ENODEV;
 
psmouse_dbg(psmouse, "BYD touchpad detected\n");


signature.asc
Description: PGP signature


[RESEND PATCH v3] scsi: ufshcd: fix possible unclocked register access

2016-10-06 Thread Subhash Jadavani
Vendor specific setup_clocks callback may require the clocks managed
by the ufshcd driver to be ON. So if the vendor specific setup_clocks
callback is called while the required clocks are turned off, it could
result in unclocked register access.

To prevent possible unclocked register access, this change adds one more
argument to the setup_clocks callback to let it know whether it is called
before or after the clock changes by the core driver.

Signed-off-by: Subhash Jadavani 
---
Changes from v2:
* Added one more argument to setup_clocks callback, this should address
  Kiwoong Kim's comments on v2.

Changes from v1:
* Don't call ufshcd_vops_setup_clocks() again for clock off
---
 drivers/scsi/ufs/ufs-qcom.c | 10 ++
 drivers/scsi/ufs/ufshcd.c   | 17 -
 drivers/scsi/ufs/ufshcd.h   |  8 +---
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index 3aedf73..3c4f602 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1094,10 +1094,12 @@ static void ufs_qcom_set_caps(struct ufs_hba *hba)
  * ufs_qcom_setup_clocks - enables/disable clocks
  * @hba: host controller instance
  * @on: If true, enable clocks else disable them.
+ * @status: PRE_CHANGE or POST_CHANGE notify
  *
  * Returns 0 on success, non-zero on failure.
  */
-static int ufs_qcom_setup_clocks(struct ufs_hba *hba, bool on)
+static int ufs_qcom_setup_clocks(struct ufs_hba *hba, bool on,
+enum ufs_notify_change_status status)
 {
struct ufs_qcom_host *host = ufshcd_get_variant(hba);
int err;
@@ -1111,7 +1113,7 @@ static int ufs_qcom_setup_clocks(struct ufs_hba *hba, bool on)
if (!host)
return 0;
 
-   if (on) {
+   if (on && (status == POST_CHANGE)) {
err = ufs_qcom_phy_enable_iface_clk(host->generic_phy);
if (err)
goto out;
@@ -1130,7 +1132,7 @@ static int ufs_qcom_setup_clocks(struct ufs_hba *hba, bool on)
if (vote == host->bus_vote.min_bw_vote)
ufs_qcom_update_bus_bw_vote(host);
 
-   } else {
+   } else if (!on && (status == PRE_CHANGE)) {
 
/* M-PHY RMMI interface clocks can be turned off */
ufs_qcom_phy_disable_iface_clk(host->generic_phy);
@@ -1254,7 +1256,7 @@ static int ufs_qcom_init(struct ufs_hba *hba)
ufs_qcom_set_caps(hba);
ufs_qcom_advertise_quirks(hba);
 
-   ufs_qcom_setup_clocks(hba, true);
+   ufs_qcom_setup_clocks(hba, true, POST_CHANGE);
 
if (hba->dev->id < MAX_UFS_QCOM_HOSTS)
ufs_qcom_hosts[hba->dev->id] = host;
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 05c7456..571a2f6 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5389,6 +5389,10 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
if (!head || list_empty(head))
goto out;
 
+   ret = ufshcd_vops_setup_clocks(hba, on, PRE_CHANGE);
+   if (ret)
+   return ret;
+
list_for_each_entry(clki, head, list) {
if (!IS_ERR_OR_NULL(clki->clk)) {
if (skip_ref_clk && !strcmp(clki->name, "ref_clk"))
@@ -5410,7 +5414,10 @@ static int __ufshcd_setup_clocks(struct ufs_hba *hba, bool on,
}
}
 
-   ret = ufshcd_vops_setup_clocks(hba, on);
+   ret = ufshcd_vops_setup_clocks(hba, on, POST_CHANGE);
+   if (ret)
+   return ret;
+
 out:
if (ret) {
list_for_each_entry(clki, head, list) {
@@ -5500,8 +5507,6 @@ static void ufshcd_variant_hba_exit(struct ufs_hba *hba)
if (!hba->vops)
return;
 
-   ufshcd_vops_setup_clocks(hba, false);
-
ufshcd_vops_setup_regulators(hba, false);
 
ufshcd_vops_exit(hba);
@@ -5905,10 +5910,6 @@ disable_clks:
if (ret)
goto set_link_active;
 
-   ret = ufshcd_vops_setup_clocks(hba, false);
-   if (ret)
-   goto vops_resume;
-
if (!ufshcd_is_link_active(hba))
ufshcd_setup_clocks(hba, false);
else
@@ -5925,8 +5926,6 @@ disable_clks:
ufshcd_hba_vreg_set_lpm(hba);
goto out;
 
-vops_resume:
-   ufshcd_vops_resume(hba, pm_op);
 set_link_active:
ufshcd_vreg_set_hpm(hba);
if (ufshcd_is_link_hibern8(hba) && !ufshcd_uic_hibern8_exit(hba))
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 430bef1..afff7f4 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -273,7 +273,8 @@ struct ufs_hba_variant_ops {
u32 (*get_ufs_hci_version)(struct ufs_hba *);
int (*clk_scale_notify)(struct ufs_hba *, bool,
enum ufs_notify_change_status);
-   int (*setup_clocks)(struct ufs_hba *, bool);
+   int (*setup_clocks)(struct ufs_hba *, bool,
+   enum ufs_notify_change_status);


Re: [PATCH v2] mount: dont execute propagate_umount() many times for same mounts

2016-10-06 Thread Eric W. Biederman
Andrei Vagin  writes:

> On Thu, Oct 06, 2016 at 02:46:30PM -0500, Eric W. Biederman wrote:
>> Andrei Vagin  writes:
>> 
>> > The reason of this optimization is that umount() can hold namespace_sem
>> > for a long time, this semaphore is global, so it affects all users.
>> > Recently Eric W. Biederman added a per mount namespace limit on the
>> > number of mounts. The default number of mounts allowed per mount
>> > namespace is 100,000. Currently this limit allows constructing a tree
>> > which requires hours to be unmounted.
>> 
>> I am going to take a hard look at this as this problem sounds very
>> unfortunate.  My memory of going through this code before strongly
>> suggests that changing the last list_for_each_entry to
>> list_for_each_entry_reverse is going to impact the correctness of this
>> change.
>
> I have read this code again and you are right, list_for_each_entry can't
> be changed on list_for_each_entry_reverse here.
>
> I tested these changes more carefully and find one more issue, so I am
> going to send a new patch and would like to get your comments to it.
>
> Thank you for your time.

No problem.

A quick question.  You have introduced lookup_mnt_cont.  Is that a core
part of your fix, or do you truly have problematic long hash chains?

Simply increasing the hash table size should fix problems with long hash
chains (and there are other solutions, like rhashtable, that may be more
appropriate than pre-allocating large hash chains).

If it is not long hash chains introducing lookup_mnt_cont in your patch
is a distraction to the core of what is going on.

Perhaps I am blind, but if the hash chains are not long I don't see how
mount propagation could be more than quadratic in the worst case, as
there is only a loop within a loop.  Or is the tree walking in
propagation_next that bad?

Eric


Re: [PATCH] staging: sm750fb: Fix printk() style warning

2016-10-06 Thread Edward Lipinsky
On Sun, Oct 02, 2016 at 08:13:01PM +0200, Greg KH wrote:
> On Sun, Oct 02, 2016 at 11:05:05AM -0700, Edward Lipinsky wrote:
> > This patch fixes the checkpatch.pl warning:
> > 
> > WARNING: printk() should include KERN_ facility level
> > 
> > Signed-off-by: Edward Lipinsky 
> > ---
> >  drivers/staging/sm750fb/ddk750_help.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/staging/sm750fb/ddk750_help.c 
> > b/drivers/staging/sm750fb/ddk750_help.c
> > index 9637dd3..e72a29c 100644
> > --- a/drivers/staging/sm750fb/ddk750_help.c
> > +++ b/drivers/staging/sm750fb/ddk750_help.c
> > @@ -11,7 +11,7 @@ void ddk750_set_mmio(void __iomem *addr, unsigned short 
> > devId, char revId)
> > devId750 = devId;
> > revId750 = revId;
> > if (revId == 0xfe)
> > -   printk("found sm750le\n");
> > +   pr_info("found sm750le\n");
> 
> Why can't you use dev_info() here?
> 
> thanks,
> 
> greg k-h

It should work, but I'm not sure what should change in the header files to
do it, especially to make the dev parameter available in ddk750_help.c.  (Only
sm750.c uses dev_* style logging now; the rest of the driver still uses pr_*.)

Thanks,

Ed Lipinsky


Re: [tip:x86/apic] x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping

2016-10-06 Thread Dou Liyang

Hi Yinghai

At 10/07/2016 05:20 AM, Yinghai Lu wrote:

On Thu, Oct 6, 2016 at 1:06 AM, Dou Liyang  wrote:


I seem to remember that in x2APIC Spec the x2APIC ID may be at 255 or
greater.


Good to know. Maybe later when one package have more cores like 30 cores etc.


If we do that check, it may affect x2APIC's work in some other places.

Looking at the MADT, the main reason may be that we assign 0xff to acpi_id
in LAPIC mode.
As you said, it was like:
[   42.107902] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.120125] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
[   42.132361] ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled)
...

How about doing the acpi_id check when we parse it in
acpi_parse_lapic().

8<

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -233,6 +233,11 @@ acpi_parse_lapic(struct acpi_subtable_header * header,
const unsigned long end)

acpi_table_print_madt_entry(header);

+   if (processor->id >= 255) {
+   ++disabled_cpus;
+   return -EINVAL;
+   }
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size


Yes, that should work, but we should do the same thing for x2apic.

acpi_parse_x2apic should have:


+   if (processor->local_apic_id == -1) {
+   ++disabled_cpus;
+   return -EINVAL;
+   }


That is the reason why I want to extend acpi_register_lapic()
to take an extra disabled_id (one is 0xff and the other is 0xffffffff),
so it could save some lines.



Yes, I understood.
But I think adding an extra disabled_id is not a good way to
validate the apic_id. If the disabled_id is not just one id (-1 or
255), but may be two or more, or even a range, what should we do to
extend our code?

Firstly, I am not sure that the "-1" could appear in the MADT, even if
the ACPI tables are unreasonable.

Secondly, I guess if we need the check, there are already some methods
in the kernel, such as "default_apic_id_valid", "x2apic_apic_id_valid"
and so on. We should extend all of them and use them for the check.


CC'ed: Rafael and Lv

May I ask a question?

Is it possible that "-1/0xffffffff" could appear in the MADT, which
is one of the ACPI tables?




Thanks

Yinghai








Re: [PATCH 1/2] watchdog: Introduce update_arch_nmi_watchdog

2016-10-06 Thread Sam Ravnborg
On Thu, Oct 06, 2016 at 03:16:42PM -0700, Babu Moger wrote:
> Currently we do not have a way to enable/disable arch specific
> watchdog handlers if it was implemented by any of the architectures.
> 
> This patch introduces new function update_arch_nmi_watchdog
> which can be used to enable/disable architecture specific NMI
> watchdog handlers. Also exposes watchdog_enabled variable outside
> so that arch specific nmi watchdogs can use it to implement
> enable/disable behaviour.
> 
> Signed-off-by: Babu Moger 
> ---
>  include/linux/nmi.h |1 +
>  kernel/watchdog.c   |   16 +---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 4630eea..01b4830 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -66,6 +66,7 @@ static inline bool trigger_allbutself_cpu_backtrace(void)
>  
>  #ifdef CONFIG_LOCKUP_DETECTOR
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
> +extern unsigned long watchdog_enabled;

The extern is within an #ifdef, but the definition later is
always valid, so the extern declaration should be outside the #ifdef
to match the actual implementation.

To manipulate / read watchdog_enabled two constants are used: 
NMI_WATCHDOG_ENABLED, SOFT_WATCHDOG_ENABLED

They should be visible too, so that users do not fall into the trap
of open-coding these constants (like in patch 2).

Sam


Re: Scrolling down broken with "perf top --hierarchy"

2016-10-06 Thread Markus Trippelsdorf
On 2016.10.07 at 13:22 +0900, Namhyung Kim wrote:
> On Fri, Oct 07, 2016 at 05:51:18AM +0200, Markus Trippelsdorf wrote:
> > On 2016.10.07 at 10:17 +0900, Namhyung Kim wrote:
> > > On Thu, Oct 06, 2016 at 06:33:33PM +0200, Markus Trippelsdorf wrote:
> > > > Scrolling down is broken when using "perf top --hierarchy".
> > > > When it starts up everything is OK and one can scroll up and down to all
> > > > entries. But as further and further new entries get added to the list,
> > > > scrolling down is blocked (at the position of the last entry that was
> > > > shown directly after startup).
> > > 
> > > I think below patch will fix the problem.  Please check.
> > 
> > Yes. It works fine now. Many thanks.
> 
> Good.  Can I add your Tested-by then?

Sure. 

(And in the long run you should think of making "perf top --hierarchy"
the default for perf top, because it gives a much better (uncluttered)
overview of what is going on.)

-- 
Markus


Re: Scrolling down broken with "perf top --hierarchy"

2016-10-06 Thread Namhyung Kim
On Fri, Oct 07, 2016 at 05:51:18AM +0200, Markus Trippelsdorf wrote:
> On 2016.10.07 at 10:17 +0900, Namhyung Kim wrote:
> > On Thu, Oct 06, 2016 at 06:33:33PM +0200, Markus Trippelsdorf wrote:
> > > Scrolling down is broken when using "perf top --hierarchy".
> > > When it starts up everything is OK and one can scroll up and down to all
> > > entries. But as further and further new entries get added to the list,
> > > scrolling down is blocked (at the position of the last entry that was
> > > shown directly after startup).
> > 
> > I think below patch will fix the problem.  Please check.
> 
> Yes. It works fine now. Many thanks.

Good.  Can I add your Tested-by then?

Thanks,
Namhyung


Re: [PATCH] ftrace: Support full glob matching

2016-10-06 Thread Namhyung Kim
Hi Masami,

On Wed, Oct 05, 2016 at 08:58:15PM +0900, Masami Hiramatsu wrote:
> Use glob_match() to support flexible glob wildcards (*,?)
> and character classes ([) for ftrace.
> Since the full glob matching is slower than the current
> partial matching routines(*pat, pat*, *pat*), this leaves
> those routines and just add MATCH_GLOB for complex glob
> expression.
> 
> e.g.
> 
> [root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
> [root@localhost tracing]# cat set_ftrace_filter
> sched_free_group
> sched_change_group
> sched_create_group
> sched_online_group
> sched_destroy_group
> sched_offline_group
> [root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
> [root@localhost tracing]# head set_ftrace_filter
> sys_arch_prctl
> sys_rt_sigreturn
> sys_ioperm
> SyS_iopl
> sys_modify_ldt
> SyS_mmap
> SyS_set_thread_area
> SyS_get_thread_area
> SyS_set_tid_address
> sys_fork
> 
> 
> Signed-off-by: Masami Hiramatsu 

Nice!

Acked-by: Namhyung Kim 

Thanks,
Namhyung


> ---
>  Documentation/trace/events.txt |9 +++--
>  Documentation/trace/ftrace.txt |9 +++--
>  kernel/trace/Kconfig   |2 ++
>  kernel/trace/ftrace.c  |4 
>  kernel/trace/trace.c   |2 +-
>  kernel/trace/trace.h   |2 ++
>  kernel/trace/trace_events_filter.c |   17 -
>  7 files changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> index 08d74d7..2cc08d4 100644
> --- a/Documentation/trace/events.txt
> +++ b/Documentation/trace/events.txt
> @@ -189,16 +189,13 @@ And for string fields they are:
>  
>  ==, !=, ~
>  
> -The glob (~) only accepts a wild card character (*) at the start and or
> -end of the string. For example:
> +The glob (~) accepts a wild card character (*,?) and character classes
> +([). For example:
>  
>prev_comm ~ "*sh"
>prev_comm ~ "sh*"
>prev_comm ~ "*sh*"
> -
> -But does not allow for it to be within the string:
> -
> -  prev_comm ~ "ba*sh"   <-- is invalid
> +  prev_comm ~ "ba*sh"
>  
>  5.2 Setting filters
>  ---
> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> index a6b3705..b26abc7 100644
> --- a/Documentation/trace/ftrace.txt
> +++ b/Documentation/trace/ftrace.txt
> @@ -2218,16 +2218,13 @@ hrtimer_interrupt
>  sys_nanosleep
>  
>  
> -Perhaps this is not enough. The filters also allow simple wild
> -cards. Only the following are currently available
> +Perhaps this is not enough. The filters also allow glob(7) matching.
>  
>*  - will match functions that begin with 
>*  - will match functions that end with 
>** - will match functions that have  in it
> -
> -These are the only wild cards which are supported.
> -
> -  * will not work.
> +  * - will match functions that begin with
> +   and end with 
>  
>  Note: It is better to use quotes to enclose the wild cards,
>otherwise the shell may expand the parameters into names
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index ba33267..aa6eb15 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -70,6 +70,7 @@ config FTRACE_NMI_ENTER
>  
>  config EVENT_TRACING
>   select CONTEXT_SWITCH_TRACER
> +select GLOB
>   bool
>  
>  config CONTEXT_SWITCH_TRACER
> @@ -133,6 +134,7 @@ config FUNCTION_TRACER
>   select KALLSYMS
>   select GENERIC_TRACER
>   select CONTEXT_SWITCH_TRACER
> +select GLOB
>   help
> Enable the kernel to trace every kernel function. This is done
> by using a compiler feature to insert a small, 5-byte No-Operation
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 84752c8..5741184 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -3493,6 +3493,10 @@ static int ftrace_match(char *str, struct ftrace_glob 
> *g)
>   memcmp(str + slen - g->len, g->search, g->len) == 0)
>   matched = 1;
>   break;
> + case MATCH_GLOB:
> + if (glob_match(g->search, str))
> + matched = 1;
> + break;
>   }
>  
>   return matched;
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 37824d9..ae343e7 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4065,7 +4065,7 @@ static const char readme_msg[] =
>   "\n  available_filter_functions - list of functions that can be 
> filtered on\n"
>   "  set_ftrace_filter\t- echo function name in here to only trace 
> these\n"
>   "\t\t\t  functions\n"
> - "\t accepts: func_full_name, *func_end, func_begin*, 
> *func_middle*\n"
> + "\t accepts: func_full_name or glob-matching-pattern\n"
>   "\t modules: Can select a group via module\n"
>   "\t  Format: :mod:\n"
>   "\t example: echo :mod:ext3 > 

Re: [PATCH] tools lib traceevent: Fix kbuffer_read_at_offset()

2016-10-06 Thread Namhyung Kim
Hi Steve,

On Wed, Oct 05, 2016 at 09:28:01AM -0400, Steven Rostedt wrote:
> On Sat,  1 Oct 2016 19:17:00 +0900
> Namhyung Kim  wrote:
> 
> > When it's called with an offset less than or equal to the first event,
> > it'll return a garbage value since the data is not initialized.
> 
> Well, it can at most be equal to (unless offset is negative) because
> kbuffer_load_subbuffer() sets kbuf->curr to zero.

Actually kbuffer_load_subbuffer() calls kbuf->next_event().  Inside
the function it has a loop updating the next valid event.  Sometimes the
data starts with a TIME_EXTEND with a value of 0 and the loop skips it,
which ends up setting kbuf->curr to 8. :)

I'll take a look at it later.

> 
> But that said, it looks like offset == 0 is buggy.
> 
> Acked-by: Steven Rostedt 

Thanks,
Namhyung

> 
> 
> -- Steve
> 
> > 
> > Cc: Steven Rostedt 
> > Signed-off-by: Namhyung Kim 
> > ---
> >  tools/lib/traceevent/kbuffer-parse.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/tools/lib/traceevent/kbuffer-parse.c 
> > b/tools/lib/traceevent/kbuffer-parse.c
> > index 3bcada3ae05a..65984f1c2974 100644
> > --- a/tools/lib/traceevent/kbuffer-parse.c
> > +++ b/tools/lib/traceevent/kbuffer-parse.c
> > @@ -622,6 +622,7 @@ void *kbuffer_read_at_offset(struct kbuffer *kbuf, int 
> > offset,
> >  
> > /* Reset the buffer */
> > kbuffer_load_subbuffer(kbuf, kbuf->subbuffer);
> > +   data = kbuffer_read_event(kbuf, ts);
> >  
> > while (kbuf->curr < offset) {
> > data = kbuffer_next_event(kbuf, ts);
> 


[PATCH - stable 4.1 backport] block: don't release bdi while request_queue has live references

2016-10-06 Thread NeilBrown

Hi,
 This patch was marked for stable v4.2+, but is needed for v4.1 as well.
 It fixes a regression introduced by:
  Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")

 This is a backport to 4.1.33 which has been tested and confirmed to
 work.
 Bug report at
  https://bugzilla.kernel.org/show_bug.cgi?id=173031

 Please queue for 4.1.y

Thanks,
NeilBrown



From: Tejun Heo 
Date: Tue, 8 Sep 2015 12:20:22 -0400
Subject: [PATCH] block: don't release bdi while request_queue has live
 references

[ Upstream commit: b02176f30cd30acccd3b633ab7d9aed8b5da52ff ]

bdi's are initialized in two steps, bdi_init() and bdi_register(), but
destroyed in a single step by bdi_destroy() which, for a bdi embedded
in a request_queue, is called during blk_cleanup_queue() which makes
the queue invisible and starts the draining of remaining usages.

A request_queue's user can access the congestion state of the embedded
bdi as long as it holds a reference to the queue.  As such, it may
access the congested state of a queue which finished
blk_cleanup_queue() but hasn't reached blk_release_queue() yet.
Because the congested state was embedded in backing_dev_info which in
turn is embedded in request_queue, accessing the congested state after
bdi_destroy() was called was fine.  The bdi was destroyed but the
memory region for the congested state remained accessible till the
queue got released.

a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in
bdi_writeback") changed the situation.  Now, the root congested state
which is expected to be pinned while request_queue remains accessible
is separately reference counted and the base ref is put during
bdi_destroy().  This means that the root congested state may go away
prematurely while the queue is between bdi_destroy() and
blk_cleanup_queue(), which was detected by Andrey's KASAN tests.

The root cause of this problem is that bdi doesn't distinguish the two
steps of destruction, unregistration and release, and now the root
congested state actually requires a separate release step.  To fix the
issue, this patch separates out bdi_unregister() and bdi_exit() from
bdi_destroy().  bdi_unregister() is called from blk_cleanup_queue()
and bdi_exit() from blk_release_queue().  bdi_destroy() is now just a
simple wrapper calling the two steps back-to-back.

While at it, the prototype of bdi_destroy() is moved right below
bdi_setup_and_register() so that the counterpart operations are
located together.

Signed-off-by: Tejun Heo 
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in 
bdi_writeback")
Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
Cc: sta...@vger.kernel.org # v4.2+
Reported-and-tested-by: Andrey Konovalov 
Reported-and-tested-by: Francesco Dolcini  (for 4.1 
backport)
Link: 
http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kcy5gxjbw...@mail.gmail.com
Reviewed-by: Jan Kara 
Reviewed-by: Jeff Moyer 
Signed-off-by: Jens Axboe 
Signed-off-by: NeilBrown 

---
 block/blk-core.c|  2 +-
 block/blk-sysfs.c   |  1 +
 include/linux/backing-dev.h |  5 -
 mm/backing-dev.c| 17 ++---
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bbbf36e6066b..edf8d72daa83 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -554,7 +554,7 @@ void blk_cleanup_queue(struct request_queue *q)
q->queue_lock = &q->__queue_lock;
spin_unlock_irq(lock);

-   bdi_destroy(&q->backing_dev_info);
+   bdi_unregister(&q->backing_dev_info);

/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 2b8fd302f677..c0bb3291859c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -501,6 +501,7 @@ static void blk_release_queue(struct kobject *kobj)
struct request_queue *q =
container_of(kobj, struct request_queue, kobj);

+   bdi_exit(&q->backing_dev_info);
blkcg_exit_queue(q);

if (q->elevator) {
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index d87d8eced064..17d1799f8552 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -110,12 +110,15 @@ struct backing_dev_info {
 struct backing_dev_info *inode_to_bdi(struct inode *inode);

 int __must_check bdi_init(struct backing_dev_info *bdi);
-void bdi_destroy(struct backing_dev_info *bdi);
+void bdi_exit(struct backing_dev_info *bdi);

 __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
+void bdi_unregister(struct backing_dev_info *bdi);
+void bdi_destroy(struct backing_dev_info *bdi);
+
int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);

loop mount: kernel BUG at lib/percpu-refcount.c:231

2016-10-06 Thread Dave Young
Hi,

The bug below happened to me while loop-mounting a file image after
stopping a kvm guest. But it has only happened once so far.

[ 4761.031686] [ cut here ]
[ 4761.075984] kernel BUG at lib/percpu-refcount.c:231!
[ 4761.120184] invalid opcode:  [#1] SMP
[ 4761.164307] Modules linked in: loop(+) macvtap macvlan tun ccm rfcomm fuse 
snd_hda_codec_hdmi cmac bnep vfat fat kvm_intel kvm irqbypass arc4 i915 
rtsx_pci_sdmmc intel_gtt drm_kms_helper iwlmvm syscopyarea sysfillrect 
sysimgblt fb_sys_fops mac80211 drm snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_intel snd_hda_codec btusb snd_hwdep iwlwifi snd_hda_core input_leds 
btrtl snd_seq pcspkr serio_raw btbcm snd_seq_device i2c_i801 btintel cfg80211 
bluetooth snd_pcm i2c_smbus rtsx_pci mfd_core e1000e ptp pps_core snd_timer 
thinkpad_acpi wmi snd soundcore rfkill video nfsd auth_rpcgss nfs_acl lockd 
grace sunrpc
[ 4761.323045] CPU: 1 PID: 25890 Comm: modprobe Not tainted 4.8.0+ #168
[ 4761.377791] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET86WW (2.36 
) 12/04/2015
[ 4761.433704] task: 986fd1b7d780 task.stack: a85842528000
[ 4761.490120] RIP: 0010:[]  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4761.548138] RSP: 0018:a8584252bb38  EFLAGS: 00010246
[ 4761.604673] RAX:  RBX: 986fbdca3200 RCX: 
[ 4761.662416] RDX: 00983288 RSI: 0001 RDI: 986fbdca3958
[ 4761.720473] RBP: a8584252bb80 R08: 0008 R09: 0008
[ 4761.779270] R10:  R11:  R12: 
[ 4761.837603] R13: 9870fa22c800 R14: 9870fa22c80c R15: 986fbdca3200
[ 4761.895870] FS:  7fc286eb4640() GS:98711f24() 
knlGS:
[ 4761.954596] CS:  0010 DS:  ES:  CR0: 80050033
[ 4762.012978] CR2: 555c3a20ee78 CR3: 000212988000 CR4: 001406e0
[ 4762.072454] Stack:
[ 4762.131283]  9870f2f37800 9870c8e46000 9870fa22c880 
a8584252bbb8
[ 4762.190776]  ae2a147c ba169577 986fbdca3200 
9870fa22c870
[ 4762.251149]  9870fa22c800 a8584252bb90 ae2b3294 
a8584252bbc8
[ 4762.311657] Call Trace:
[ 4762.371157]  [] ? kobject_uevent_env+0xfc/0x3b0
[ 4762.431483]  [] percpu_ref_switch_to_percpu+0x14/0x20
[ 4762.492093]  [] blk_register_queue+0xbe/0x120
[ 4762.552727]  [] device_add_disk+0x1c4/0x470
[ 4762.614155]  [] loop_add+0x1d9/0x260 [loop]
[ 4762.674042]  [] loop_init+0x119/0x16c [loop]
[ 4762.733949]  [] ? 0xc02ff000
[ 4762.793563]  [] do_one_initcall+0x4b/0x180
[ 4762.853068]  [] ? free_vmap_area_noflush+0x43/0xb0
[ 4762.913665]  [] do_init_module+0x55/0x1c4
[ 4762.973400]  [] load_module+0x1fc4/0x23e0
[ 4763.033545]  [] ? __symbol_put+0x60/0x60
[ 4763.094281]  [] SYSC_init_module+0x138/0x150
[ 4763.154985]  [] SyS_init_module+0x9/0x10
[ 4763.215577]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 4763.277044] Code: 00 48 c7 c7 20 c7 a8 ae 48 63 d2 e8 63 ef ff ff 3b 05 81 
a9 7d 00 89 c2 7c cd 48 8b 43 08 48 83 e0 fe 48 89 43 08 e9 3c ff ff ff <0f> 0b 
e8 81 b6 d9 ff 90 55 48 89 e5 41 54 4c 8d 67 d8 53 48 89 
[ 4763.342964] RIP  [] 
__percpu_ref_switch_to_percpu+0xf8/0x100
[ 4763.407151]  RSP 

Thanks
Dave


[GIT PULL] Please pull powerpc/linux.git powerpc-4.9-1 tag

2016-10-06 Thread Michael Ellerman
Hi Linus,

Please pull the first batch of powerpc updates for 4.9:

The following changes since commit c6935931c1894ff857616ff8549b61236a19148f:

  Linux 4.8-rc5 (2016-09-04 14:31:46 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.9-1

for you to fetch changes up to b7b7013cac55d794940bd9cb7b7c55c9dececac4:

  powerpc/bpf: Add support for bpf constant blinding (2016-10-04 20:33:20 +1100)


powerpc updates for 4.9

Highlights:
 - Major rework of Book3S 64-bit exception vectors (Nicholas Piggin)
   - Use gas sections for arranging exception vectors et al.
 - Large set of TM cleanups and selftests (Cyril Bur)
 - Enable transactional memory (TM) lazily for userspace (Cyril Bur)
 - Support for XZ compression in the zImage wrapper (Oliver O'Halloran)
 - Add support for bpf constant blinding (Naveen N. Rao)
 - Beginnings of upstream support for PA Semi Nemo motherboards (Darren Stevens)

Fixes:
 - Ensure .mem(init|exit).text are within _stext/_etext (Michael Ellerman)
 - xmon: Don't use ld on 32-bit (Michael Ellerman)
 - vdso64: Use double word compare on pointers (Anton Blanchard)
 - powerpc/nvram: Fix an incorrect partition merge (Pan Xinhui)
 - powerpc: Fix usage of _PAGE_RO in hugepage (Christophe Leroy)
 - powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K (Aneesh 
Kumar K.V)
 - Fix memory leak in queue_hotplug_event() error path (Andrew Donnellan)
 - Replay hypervisor maintenance interrupt first (Nicholas Piggin)

Cleanups & features:
 - Sparse fixes/cleanups (Daniel Axtens)
 - Preserve CFAR value on SLB miss caused by access to bogus address (Paul 
Mackerras)
 - Radix MMU fixups for POWER9 (Aneesh Kumar K.V)
 - Support for setting used_(vsr|vr|spe) in sigreturn path (for CRIU) (Simon 
Guo)
 - Optimise syscall entry for virtual, relocatable case (Nicholas Piggin)
 - Optimise MSR handling in exception handling (Nicholas Piggin)
 - Support for kexec with Radix MMU (Benjamin Herrenschmidt)
 - powernv EEH fixes (Russell Currey)
 - Surprise PCI hotplug support for powernv (Gavin Shan)
 - Endian/sparse fixes for powernv PCI (Gavin Shan)
 - Defconfig updates (Anton Blanchard)
 - Various performance optimisations (Anton Blanchard)
   - Align hot loops of memset() and backwards_memcpy()
   - During context switch, check before setting mm_cpumask
   - Remove static branch prediction in atomic{, 64}_add_unless
   - Only disable HAVE_EFFICIENT_UNALIGNED_ACCESS on POWER7 little endian
   - Set default CPU type to POWER8 for little endian builds

 - KVM: PPC: Book3S HV: Migrate pinned pages out of CMA (Balbir Singh)
 - cxl: Flush PSL cache before resetting the adapter (Frederic Barrat)
 - cxl: replace loop with for_each_child_of_node(), remove unneeded 
of_node_put() (Andrew Donnellan)
 - Fix HV facility unavailable to use correct handler (Nicholas Piggin)
 - Remove unnecessary syscall trampoline (Nicholas Piggin)
 - fadump: Fix build break when CONFIG_PROC_VMCORE=n (Michael Ellerman)
 - Quieten EEH message when no adapters are found (Anton Blanchard)
 - powernv: Add PHB register dump debugfs handle (Russell Currey)
 - Use kprobe blacklist for exception handlers & asm functions (Nicholas Piggin)
 - Document the syscall ABI (Nicholas Piggin)
 - MAINTAINERS: Update cxl maintainers (Michael Neuling)
 - powerpc: Remove all usages of NO_IRQ (Michael Ellerman)

Minor cleanups:
 - Andrew Donnellan, Christophe Leroy, Colin Ian King, Cyril Bur, Frederic 
Barrat,
   Pan Xinhui, PrasannaKumar Muralidharan, Rui Teng, Simon Guo.


Andrew Donnellan (3):
  powerpc/pseries: fix memory leak in queue_hotplug_event() error path
  powerpc/powernv: Fix comment style and spelling
  cxl: replace loop with for_each_child_of_node(), remove unneeded 
of_node_put()

Aneesh Kumar K.V (6):
  powerpc/book3s: Add a cpu table entry for different POWER9 revs
  powerpc/mm/radix: Use different RTS encoding for different POWER9 revs
  powerpc/mm/radix: Use different pte update sequence for different POWER9 
revs
  powerpc/mm: Update the HID bit when switching from radix to hash
  powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K
  powerpc/mm: Add radix flush all with IS=3

Anton Blanchard (11):
  powerpc/vdso64: Use double word compare on pointers
  powerpc/64: Align hot loops of memset() and backwards_memcpy()
  powerpc/configs: Enable VMX crypto
  powerpc/configs: Bump kernel ring buffer size on 64 bit configs
  powerpc/configs: Change a few things from built in to modules
  powerpc/configs: Enable Intel i40e on 64 bit configs
  powerpc/eeh: Quieten EEH message when no adapters are found
  powerpc: During context switch, check before setting mm_cpumask
  powerpc: Remove static branch prediction in atomic{, 64}_add_unless
  powerpc: Only 

Re: [PATCH 4.8 00/10] 4.8.1-stable review

2016-10-06 Thread Greg Kroah-Hartman
On Thu, Oct 06, 2016 at 11:51:01AM -0700, Guenter Roeck wrote:
> On Thu, Oct 06, 2016 at 10:18:23AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.8.1 release.
> > There are 10 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat Oct  8 07:47:33 UTC 2016.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 149 pass: 149 fail: 0
> Qemu test results:
>   total: 108 pass: 108 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Great, thanks for testing all of these and letting me know.

greg k-h


Re: CONFIG_DEBUG_TEST_DRIVER_REMOVE needs a warning

2016-10-06 Thread Rob Herring
On Thu, Oct 6, 2016 at 6:53 PM, Laura Abbott  wrote:
> On a whim, I decided to turn on CONFIG_DEBUG_TEST_DRIVER_REMOVE on
> Fedora rawhide since it sounded harmless enough. It spewed warnings
> and panicked some systems. Clearly it's doing its job
> well of finding drivers that can't handle remove properly and I
> underestimated it. I was expecting to maybe find a driver or two.
> Can we get stronger Kconfig text indicating that this shouldn't be
> turned on lightly? I'll be turning the option off in Fedora but sending
> out reports from what was found.

It hides behind CONFIG_DEBUG already. Is there a better option that
distros won't turn on?

Rob


Re: [PATCH 4.7 000/141] 4.7.7-stable review

2016-10-06 Thread Greg Kroah-Hartman
On Thu, Oct 06, 2016 at 11:54:02AM -0700, Guenter Roeck wrote:
> On Thu, Oct 06, 2016 at 10:27:16AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.7.7 release.
> > There are 141 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat Oct  8 07:44:08 UTC 2016.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 149 pass: 148 fail: 1
> Failed builds:
>   powerpc:ppc6xx_defconfig
> 
> Qemu test results:
>   total: 108 pass: 108 fail: 0
> 
> Adding upstream commit c1a23f6d6455 ("scsi: sas: provide stub implementation
> for scsi_is_sas_rphy") fixes the build problem.

Thanks, this should now be fixed.

greg k-h


Re: Change CONFIG_DEVKMEM default value to n

2016-10-06 Thread Greg Kroah-Hartman
On Fri, Oct 07, 2016 at 10:04:11AM +0800, Dave Young wrote:
> Kconfig comment suggests setting it as "n" if in doubt thus move the
> default value to 'n'.
> 
> Signed-off-by: Dave Young 
> Suggested-by: Kees Cook 
> ---
>  drivers/char/Kconfig |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-x86.orig/drivers/char/Kconfig
> +++ linux-x86/drivers/char/Kconfig
> @@ -17,7 +17,7 @@ config DEVMEM
>  
>  config DEVKMEM
>   bool "/dev/kmem virtual device support"
> - default y
> + default n

If you remove the "default" line, it defaults to 'n'.
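A hypothetical sketch of the entry with the default line simply dropped,
relying on the fact that Kconfig bool symbols default to n (illustrative
excerpt, not the actual tree):

```kconfig
config DEVKMEM
	bool "/dev/kmem virtual device support"
	# no "default" line: a bool symbol with no default is n
	help
	  Say Y here if you want to support the /dev/kmem device. The
	  /dev/kmem device is rarely used, but can be used for certain
	  kind of kernel debugging operations.
```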

And is it really "safe" to default this to n now?

thanks,

greg k-h


Re: [PATCH 4.8 00/10] 4.8.1-stable review

2016-10-06 Thread Greg Kroah-Hartman
On Thu, Oct 06, 2016 at 01:56:28PM -0600, Shuah Khan wrote:
> On 10/06/2016 02:18 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.8.1 release.
> > There are 10 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat Oct  8 07:47:33 UTC 2016.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.8.1-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.8.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH] staging: lustre: lprocfs_status.h: fix sparse error: symbol redeclared with different type

2016-10-06 Thread Greg KH
On Thu, Oct 06, 2016 at 06:52:07PM +0200, Samuele Baisi wrote:
> drivers/staging/lustre/lustre/obdclass/lprocfs_status.c:1554:5: error:
> symbol 'lprocfs_wr_root_squash' redeclared with different type (originally
> declared at 
> drivers/staging/lustre/lustre/obdclass/../include/lprocfs_status.h:704)
> - incompatible argument 1 (different address spaces)
> 
> drivers/staging/lustre/lustre/obdclass/lprocfs_status.c:1618:5: error:
> symbol 'lprocfs_wr_nosquash_nids' redeclared with different type (originally
> declared at 
> drivers/staging/lustre/lustre/obdclass/../include/lprocfs_status.h:706)
> - incompatible argument 1 (different address spaces)
> 
> Added __user annotation to the header definitions arguments (which are
> indeed userspace buffers).

Are they really?  Have you tested this?  The last time this was looked
at, it was a non-trivial problem...

And any reason you didn't cc the lustre maintainers with this change?
If you think it is correct, please resend it with the testing
information and cc: them.

thanks,

greg k-h


Re: [PATCH 4.7 122/141] scsi: ses: use scsi_is_sas_rphy instead of is_sas_attached

2016-10-06 Thread Greg Kroah-Hartman
On Thu, Oct 06, 2016 at 09:25:34PM +0800, James Bottomley wrote:
> On Thu, 2016-10-06 at 10:29 +0200, Greg Kroah-Hartman wrote:
> > 4.7-stable review patch.  If anyone has any objections, please let me
> > know.
> 
> This doesn't build if SCSI_SAS_ATTRS isn't set without this patch:
> 
> 
> commit c1a23f6d64552b4480208aa584ec7e9c13d6d9c3
> Author: Johannes Thumshirn 
> Date:   Wed Aug 17 11:46:16 2016 +0200
> 
> scsi: sas: provide stub implementation for scsi_is_sas_rphy
> 
> Does it?

You are right, we have ppc build failures without this, thanks for
letting me know.

greg k-h


Re: CONFIG_DEBUG_TEST_DRIVER_REMOVE needs a warning

2016-10-06 Thread Greg Kroah-Hartman
On Thu, Oct 06, 2016 at 04:53:20PM -0700, Laura Abbott wrote:
> On a whim, I decided to turn on CONFIG_DEBUG_TEST_DRIVER_REMOVE on
> Fedora rawhide since it sounded harmless enough. It spewed warnings
> and panicked some systems. Clearly it's doing its job
> well of finding drivers that can't handle remove properly and I
> underestimated it.

Yes, we knew it was going to find bugs, you were brave :)

> I was expecting to maybe find a driver or two.
> Can we get stronger Kconfig text indicating that this shouldn't be
> turned on lightly? I'll be turning the option off in Fedora but sending
> out reports from what was found.

Care to send a patch with the wording change you would have found better
to warn yourself not to do this?

thanks,

greg k-h
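One possible shape for the stronger wording Greg asks for, sketched as a
hypothetical Kconfig excerpt (the option's real prompt and dependencies may
differ from this):

```kconfig
config DEBUG_TEST_DRIVER_REMOVE
	bool "Test driver remove calls during probe (UNSTABLE)"
	depends on DEBUG_KERNEL
	help
	  Say Y here to test driver remove functions by calling probe,
	  remove, probe. Enabling this option is known to spew warnings
	  and can panic machines whose drivers do not handle remove
	  correctly; it is only intended for driver developers, not for
	  distribution kernels.

	  If unsure, say N.
```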

