[PATCH] ASoC: sh: Fix typo in Kconfig

2015-10-09 Thread Masanari Iida
s/SUR/SRU/g

Signed-off-by: Masanari Iida 
---
 sound/soc/sh/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/sh/Kconfig b/sound/soc/sh/Kconfig
index 6ca90aa..206d1ed 100644
--- a/sound/soc/sh/Kconfig
+++ b/sound/soc/sh/Kconfig
@@ -41,7 +41,7 @@ config SND_SOC_RCAR
select SND_SIMPLE_CARD
select REGMAP_MMIO
help
- This option enables R-Car SUR/SCU/SSIU/SSI sound support
+ This option enables R-Car SRU/SCU/SSIU/SSI sound support
 
 config SND_SOC_RSRC_CARD
tristate "Renesas Sampling Rate Convert Sound Card"
-- 
2.6.1.133.gf5b6079



[RESEND PATCH] soc: qcom: smd: Correct SMEM items for upper channels

2015-10-09 Thread bjorn
From: Bjorn Andersson 

Update the SMEM items for the second set of SMD channels, as these were
incorrect.

Signed-off-by: Bjorn Andersson 
---

Corrected .gitconfig mishap which gave wrong author.

 drivers/soc/qcom/smd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/qcom/smd.c b/drivers/soc/qcom/smd.c
index 18964f1..e7fb8fa 100644
--- a/drivers/soc/qcom/smd.c
+++ b/drivers/soc/qcom/smd.c
@@ -87,8 +87,8 @@ static const struct {
.fifo_base_id = 338
},
{
-   .alloc_tbl_id = 14,
-   .info_base_id = 266,
+   .alloc_tbl_id = 266,
+   .info_base_id = 138,
.fifo_base_id = 202,
},
 };
-- 
2.3.2 (Apple Git-55)



Re: [PATCH] blk-mq: fix waitqueue_active without memory barrier in block/blk-mq-tag.c

2015-10-09 Thread Kosuke Tatsukawa
Jens Axboe wrote:
> On 10/08/2015 06:35 PM, Kosuke Tatsukawa wrote:
>> blk_mq_tag_update_depth() seems to be missing a memory barrier which
>> might cause the waker to not notice the waiter and fail to send a
>> wake_up as in the following figure.
>> 
>>   blk_mq_tag_update_depth                   bt_get
>> 
>>   if (waitqueue_active(&bs->wait))
>>   /* The CPU might reorder the test for
>>      the waitqueue up here, before
>>      prior writes complete */
>>                                              prepare_to_wait(&bs->wait, &wait,
>>                                                TASK_UNINTERRUPTIBLE);
>>                                              tag = __bt_get(hctx, bt, last_tag,
>>                                                tags);
>>                                              /* Value set in bt_update_count not
>>                                                 visible yet */
>>   bt_update_count(&tags->bitmap_tags, tdepth);
>>   /* blk_mq_tag_wakeup_all(tags, false); */
>>     bt = &tags->bitmap_tags;
>>     wake_index = atomic_read(&bt->wake_index);
>>   ...
>>                                              io_schedule();
>> 
>> 
>> This patch adds the missing memory barrier.
>> 
>> I found this issue when I was looking through the linux source code
>> for places calling waitqueue_active() before wake_up*(), but without
>> preceding memory barriers, after sending a patch to fix a similar
>> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> found here: https://lkml.org/lkml/2015/9/28/849).
>> 
>> Signed-off-by: Kosuke Tatsukawa 
>> ---
>>   block/blk-mq-tag.c |4 
>>   1 files changed, 4 insertions(+), 0 deletions(-)
>> 
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index ed96474..7a6b6e2 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -75,6 +75,10 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>>  struct blk_mq_bitmap_tags *bt;
>>  int i, wake_index;
>>   
>> +/*
>> + * Make sure all changes prior to this are visible from other CPUs.
>> + */
>> +smp_mb();
>>  	bt = &tags->bitmap_tags;
>>  	wake_index = atomic_read(&bt->wake_index);
>>  for (i = 0; i < BT_WAIT_QUEUES; i++) {
>> 
> 
> Thanks, after looking at this, I think this patch is fine. It's not a
> super hot path, so not worth it to further optimize this or look into
> ways to avoid the barrier. I do wonder if there are archs where
> atomic_read() is a memory barrier, in that case we need not do it at
> all. And perhaps we have some weird smp_before_bla variant that could be
> used here instead to improve upon that case.

Roughly looking at include/asm/atomic.h in various architectures, it
seems atomic_read is defined as a macro or an inline function calling
  ACCESS_ONCE((v)->counter)
in many architectures which doesn't imply a memory barrier.

blackfin seems to be calling an assembler function which does "flush
core internal write buffer".

I'm not sure about the memory ordering of the assembler instructions for
metag, powerpc and s390 though.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com


Re: [PATCH v2] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c

2015-10-09 Thread Kosuke Tatsukawa
Peter Hurley wrote:
> On 10/09/2015 01:28 PM, Peter Hurley wrote:
>> Tatsukawa-san,
>> 
>> I would still like to root-cause the reported stall; is the reported
>> stall resolved if smp_mb() is added before the waitqueue_active()
>> in __receive_buf()?
>
> Nevermind, I see it now.
>
> The store to commit_head is deferred until after the load of read_wait->next;
> a full memory barrier would properly order the store before the load but,
> since that is roughly equivalent to taking the spin lock for the wake up 
> anyway,
> it makes sense to just always do the wakeup.

The stall problem is resolved if smp_mb() is added both before the
waitqueue_active() in __receive_buf(), and before the line containing
input_available_p() in n_tty_read().
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com


Re: [PATCH] btrfs: fix waitqueue_active without memory barrier in btrfs

2015-10-09 Thread Kosuke Tatsukawa
David Sterba wrote:
> On Fri, Oct 09, 2015 at 12:35:48AM +, Kosuke Tatsukawa wrote:
>> This patch removes the call to waitqueue_active() leaving just wake_up()
>> behind.  This fixes the problem because the call to spin_lock_irqsave()
>> in wake_up() will be an ACQUIRE operation.
>
> Either we can switch it to wake_up or put the barrier before the check.
> Not all instances of waitqueue_active need the barrier though.
>
>> I found this issue when I was looking through the linux source code
>> for places calling waitqueue_active() before wake_up*(), but without
>> preceding memory barriers, after sending a patch to fix a similar
>> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> found here: https://lkml.org/lkml/2015/9/28/849).
>
> There are more in btrfs:
>
> https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg41914.html

Thank you for the pointer.
Your patch seems better than mine.

I think the other places in btrfs that use waitqueue_active() before
wake_up are preceded by either a smp_mb or some kind of atomic
operation.

The latter still needs smp_mb__after_atomic() but it's light-weight
compared to smp_mb().


>> @@ -918,9 +918,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
>>  void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
>>  {
>>  	percpu_counter_sub(&fs_info->bio_counter, amount);
>> -
>> -	if (waitqueue_active(&fs_info->replace_wait))
>> -		wake_up(&fs_info->replace_wait);
>> +	wake_up(&fs_info->replace_wait);
>
> Chris had a comment on that one in
> https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg42551.html
> it's in performance critical context and the explicit wake_up is even
> worse than the barrier.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com


Re: pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zhang Haoyu

On 10/10/15 12:40, Zhang Haoyu wrote:
> On 10/10/15 11:35, Zefan Li wrote:
>> On 2015/10/9 18:29, Zhang Haoyu wrote:
>>> I started multiple docker containers in centos6.6 (linux-2.6.32-504.16.2),
>>> and one bad program was running in one container.
>>> This program continuously produced child threads without freeing them, so
>>> more and more pid numbers were consumed by it, until it hit the pid_max
>>> limit (32768 by default on my system).
>>>
>>> What's worse is that containers and the host share the pid number space,
>>> so no new program could be started in the host or in the other containers.
>>>
>>> I also cloned the upstream kernel source from
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> and the problem seems to still be there, but I'm not sure.
>>>
>>> IMO, we should isolate the pid accounting and pid_max between pid
>>> namespaces, and make them per pidns.
>>> The post below requested making pid_max per pidns.
>>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>>
>> Mainline kernel already supports per-cgroup pid limit, which should solve
>> your problem.
>>
> What about pid accounting?
> If one pidns consumes too many pids, does it influence the other pid
> namespaces?
I found it, thanks very much.
>
> Thanks,
> Zhang Haoyu




Re: pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zhang Haoyu

On 10/10/15 11:35, Zefan Li wrote:
> On 2015/10/9 18:29, Zhang Haoyu wrote:
>> I started multiple docker containers in centos6.6 (linux-2.6.32-504.16.2),
>> and one bad program was running in one container.
>> This program continuously produced child threads without freeing them, so
>> more and more pid numbers were consumed by it, until it hit the pid_max
>> limit (32768 by default on my system).
>>
>> What's worse is that containers and the host share the pid number space,
>> so no new program could be started in the host or in the other containers.
>>
>> I also cloned the upstream kernel source from
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> and the problem seems to still be there, but I'm not sure.
>>
>> IMO, we should isolate the pid accounting and pid_max between pid namespaces,
>> and make them per pidns.
>> The post below requested making pid_max per pidns.
>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>
>
> Mainline kernel already supports per-cgroup pid limit, which should solve
> your problem.
>
What about pid accounting?
If one pidns consumes too many pids, does it influence the other pid namespaces?

Thanks,
Zhang Haoyu



[PATCH] Staging: most: Fix typo in staging/most

2015-10-09 Thread Masanari Iida
This patch fixes spelling typos found in most.

Signed-off-by: Masanari Iida 
---
 drivers/staging/most/Documentation/ABI/sysfs-class-most.txt | 2 +-
 drivers/staging/most/hdm-dim2/Kconfig   | 2 +-
 drivers/staging/most/hdm-usb/hdm_usb.c  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/most/Documentation/ABI/sysfs-class-most.txt b/drivers/staging/most/Documentation/ABI/sysfs-class-most.txt
index 380c137..42ff0d8 100644
--- a/drivers/staging/most/Documentation/ABI/sysfs-class-most.txt
+++ b/drivers/staging/most/Documentation/ABI/sysfs-class-most.txt
@@ -47,7 +47,7 @@ Date: June 2015
 KernelVersion: 4.3
 Contact:   Christian Gromm 
 Description:
-   Indicates the type of peripherial interface the current device
+   Indicates the type of peripheral interface the current device
uses.
 Users:
 
diff --git a/drivers/staging/most/hdm-dim2/Kconfig b/drivers/staging/most/hdm-dim2/Kconfig
index fc54876..28a0e17 100644
--- a/drivers/staging/most/hdm-dim2/Kconfig
+++ b/drivers/staging/most/hdm-dim2/Kconfig
@@ -9,7 +9,7 @@ config HDM_DIM2
 
---help---
  Say Y here if you want to connect via MediaLB to network transceiver.
- This device driver is platform dependent and needs an addtional
+ This device driver is platform dependent and needs an additional
  platform driver to be installed. For more information contact
  maintainer of this driver.
 
diff --git a/drivers/staging/most/hdm-usb/hdm_usb.c b/drivers/staging/most/hdm-usb/hdm_usb.c
index fcd7559..a73eb5f 100644
--- a/drivers/staging/most/hdm-usb/hdm_usb.c
+++ b/drivers/staging/most/hdm-usb/hdm_usb.c
@@ -431,7 +431,7 @@ static void hdm_write_completion(struct urb *urb)
 }
 
 /**
- * hdm_read_completion - completion funciton for submitted Rx URBs
+ * hdm_read_completion - completion function for submitted Rx URBs
  * @urb: the URB that has been completed
  *
  * This checks the status of the completed URB. In case the URB has been
@@ -767,7 +767,7 @@ static int hdm_configure_channel(struct most_interface *iface, int channel,
tmp_val = conf->buffer_size / frame_size;
conf->buffer_size = tmp_val * frame_size;
dev_notice(dev,
-  "Channel %d - rouding buffer size to %d bytes, "
+  "Channel %d - rounding buffer size to %d bytes, "
   "channel config says %d bytes\n",
   channel,
   conf->buffer_size,
-- 
2.6.1.133.gf5b6079



Re: [PATCH] thermal: exynos: fix register read in TMU

2015-10-09 Thread Krzysztof Kozlowski
On 09.10.2015 at 21:07, Lukasz Majewski wrote:
> Hi Krzysztof,
> 
>> 2015-10-08 23:21 GMT+09:00 Sudip Mukherjee
>> :
>>> On Fri, Oct 02, 2015 at 08:43:52AM +0900, Krzysztof Kozlowski wrote:
 2015-10-01 23:12 GMT+09:00 Sudip Mukherjee
 :
> On Thu, Oct 01, 2015 at 10:18:57PM +0900, Krzysztof Kozlowski
> wrote:
>> 2015-10-01 20:39 GMT+09:00 Sudip Mukherjee
>> :
>>> The value of emul_con was getting overwritten if the selected
>>> soc is SOC_ARCH_EXYNOS5260. And so as a result we were
>>> reading from the wrong register in the case of
>>> SOC_ARCH_EXYNOS5260.
>>
>> How is the value overwritten if the soc is Exynos5260? I can't
>> see it (although the "else if" is still more obvious than "if"
>> but how does the description match the code?).
> The code here is:
> if (data->soc == SOC_ARCH_EXYNOS5260)
> emul_con = EXYNOS5260_EMUL_CON;
> if (data->soc == SOC_ARCH_EXYNOS5433)
> emul_con = EXYNOS5433_TMU_EMUL_CON;
> else if (data->soc == SOC_ARCH_EXYNOS7)
> emul_con = EXYNOS7_TMU_REG_EMUL_CON;
> else
> emul_con = EXYNOS_EMUL_CON;
>
> So if data->soc is SOC_ARCH_EXYNOS5260 , then emul_con becomes
> EXYNOS5260_EMUL_CON. But again for the else part it will become
> EXYNOS_EMUL_CON.

 Indeed!

 Fixes: 488c7455d74c ("thermal: exynos: Add the support for
 Exynos5433 TMU")

 Reviewed-by: Krzysztof Kozlowski 
>>>
>>> Hi Krzysztof,
>>> Who will pick this one up? I still do not see it in linux-next.
>>
>> Hi!
>>
>> I guess it is a patch for Lukasz.
>>
>> Lukasz,
>> Do you plan to pick it up or maybe this should go through samsung-soc
>> tree?
> 
> This is the only one patch, which is waiting in my queue. Since no more
> fixes available, please feel free to grab this patch to samsung-soc.
> 
> Acked-by: Lukasz Majewski 

Okay, thanks! Applied to fixes for the current cycle. I'll send it later to
Kukjin (unless he picks it up as well).

Best regards,
Krzysztof



[PATCH] soc: qcom: smd: Correct SMEM items for upper channels

2015-10-09 Thread Bjorn Andersson
Update the SMEM items for the second set of SMD channels, as these were
incorrect.

Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/smd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/qcom/smd.c b/drivers/soc/qcom/smd.c
index 18964f1..e7fb8fa 100644
--- a/drivers/soc/qcom/smd.c
+++ b/drivers/soc/qcom/smd.c
@@ -87,8 +87,8 @@ static const struct {
.fifo_base_id = 338
},
{
-   .alloc_tbl_id = 14,
-   .info_base_id = 266,
+   .alloc_tbl_id = 266,
+   .info_base_id = 138,
.fifo_base_id = 202,
},
 };
-- 
2.3.2 (Apple Git-55)



Re: [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif

2015-10-09 Thread Bob Liu

On 10/05/2015 10:55 PM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Split per-ring information out into a new structure, xen_blkif_ring, so that
>> one vbd device can be associated with one or more rings/hardware queues.
>>
>> This patch is a preparation for supporting multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini 
>> Signed-off-by: Bob Liu 
>> ---
>>  drivers/block/xen-blkback/blkback.c |  365 ++-
>>  drivers/block/xen-blkback/common.h  |   52 +++--
>>  drivers/block/xen-blkback/xenbus.c  |  130 +++--
>>  3 files changed, 295 insertions(+), 252 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 954c002..fd02240 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
>>  /* Number of free pages to remove on each call to gnttab_free_pages */
>>  #define NUM_BATCH_FREE_PAGES 10
>>  
>> -static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
>> +static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>>  {
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	if (list_empty(&blkif->free_pages)) {
>> -		BUG_ON(blkif->free_pages_num != 0);
>> -		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	if (list_empty(&ring->free_pages)) {
> 
> I'm afraid the pool of free pages should be per-device, not per-ring.
> 
>> +		BUG_ON(ring->free_pages_num != 0);
>> +		spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  		return gnttab_alloc_pages(1, page);
>>  	}
>> -	BUG_ON(blkif->free_pages_num == 0);
>> -	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
>> +	BUG_ON(ring->free_pages_num == 0);
>> +	page[0] = list_first_entry(&ring->free_pages, struct page, lru);
>>  	list_del(&page[0]->lru);
>> -	blkif->free_pages_num--;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	ring->free_pages_num--;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  
>>  return 0;
>>  }
>>  
>> -static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
>> +static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
>>  				  int num)
>>  {
>>  	unsigned long flags;
>>  	int i;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  	for (i = 0; i < num; i++)
>> -		list_add(&page[i]->lru, &blkif->free_pages);
>> -	blkif->free_pages_num += num;
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +		list_add(&page[i]->lru, &ring->free_pages);
>> +	ring->free_pages_num += num;
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  }
>>  
>> -static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
>> +static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
>>  {
>>  	/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
>>  	struct page *page[NUM_BATCH_FREE_PAGES];
>>  	unsigned int num_pages = 0;
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> -	while (blkif->free_pages_num > num) {
>> -		BUG_ON(list_empty(&blkif->free_pages));
>> -		page[num_pages] = list_first_entry(&blkif->free_pages,
>> +	spin_lock_irqsave(&ring->free_pages_lock, flags);
>> +	while (ring->free_pages_num > num) {
>> +		BUG_ON(list_empty(&ring->free_pages));
>> +		page[num_pages] = list_first_entry(&ring->free_pages,
>>  						   struct page, lru);
>>  		list_del(&page[num_pages]->lru);
>> -		blkif->free_pages_num--;
>> +		ring->free_pages_num--;
>>  		if (++num_pages == NUM_BATCH_FREE_PAGES) {
>> -			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +			spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  			gnttab_free_pages(num_pages, page);
>> -			spin_lock_irqsave(&blkif->free_pages_lock, flags);
>> +			spin_lock_irqsave(&ring->free_pages_lock, flags);
>>  			num_pages = 0;
>>  		}
>>  	}
>> -	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
>> +	spin_unlock_irqrestore(&ring->free_pages_lock, flags);
>>  	if (num_pages != 0)
>>  		gnttab_free_pages(num_pages, page);
>>  }
>>  
>>  #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
>>  
>> -static int do_block_io_op(struct xen_blkif *blkif);
>> -static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> +static int do_block_io_op(struct xen_blkif_ring *ring);
>> +static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
>>  

Re: CFS scheduler unfairly prefers pinned tasks

2015-10-09 Thread Wanpeng Li

Hi Paul,
On 10/8/15 4:19 PM, Mike Galbraith wrote:

On Tue, 2015-10-06 at 04:45 +0200, Mike Galbraith wrote:

On Tue, 2015-10-06 at 08:48 +1100, paul.sz...@sydney.edu.au wrote:

The Linux CFS scheduler prefers pinned tasks and unfairly
gives more CPU time to tasks that have set CPU affinity.
This effect is observed with or without CGROUP controls.

To demonstrate: on an otherwise idle machine, as some user
run several processes pinned to each CPU, one for each CPU
(as many as CPUs present in the system) e.g. for a quad-core
non-HyperThreaded machine:

   taskset -c 0 perl -e 'while(1){1}' &
   taskset -c 1 perl -e 'while(1){1}' &
   taskset -c 2 perl -e 'while(1){1}' &
   taskset -c 3 perl -e 'while(1){1}' &

and (as that same or some other user) run some without
pinning:

   perl -e 'while(1){1}' &
   perl -e 'while(1){1}' &

and use e.g.   top   to observe that the pinned processes get
more CPU time than "fair".


Interesting, I can reproduce it w/ your simple script. However, they are
fair when the number of pinned perl tasks is equal to the number of
unpinned perl tasks. I will dig into it more deeply.
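
For reference, the setup above can also be reproduced without perl and
taskset; the following C sketch is purely illustrative (CPU numbers and
process counts are whatever you pass on the command line):

/* Rough C equivalent of the taskset/perl demonstration above: pass a CPU
 * number to pin the busy loop, or no argument to leave it unpinned. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	if (argc > 1) {                       /* "./burn 2 &" ~ "taskset -c 2 ..." */
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(atoi(argv[1]), &set);
		if (sched_setaffinity(0, sizeof(set), &set))
			perror("sched_setaffinity");
	}
	for (;;)                              /* "perl -e 'while(1){1}'" */
		;
	return 0;
}

Start one pinned instance per CPU plus a couple of unpinned ones and watch
top, as in the original report.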


Regards,
Wanpeng Li


Re: pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zefan Li

On 2015/10/9 18:29, Zhang Haoyu wrote:

I started multiple docker containers in centos6.6 (linux-2.6.32-504.16.2),
and one bad program was running in one container.
This program continuously produced child threads without freeing them, so more
and more pid numbers were consumed by it, until it hit the pid_max limit
(32768 by default on my system).

What's worse is that containers and the host share the pid number space, so no
new program could be started in the host or in the other containers.

I also cloned the upstream kernel source from
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and the problem seems to still be there, but I'm not sure.

IMO, we should isolate the pid accounting and pid_max between pid namespaces,
and make them per pidns.
The post below requested making pid_max per pidns.
http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210



Mainline kernel already supports per-cgroup pid limit, which should solve
your problem.
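
As a concrete illustration of the per-cgroup pid limit mentioned above, the
sketch below caps a group with the pids controller and shows fork() failing
with EAGAIN once the limit is reached.  The mount point and group name are
assumptions (they depend on your distribution and on whether the legacy or
unified hierarchy is used); adjust the paths for your system.

/* Sketch only: put ourselves in a pids-limited cgroup, then fork until the
 * controller refuses.  Paths below are assumed, not guaranteed. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	char buf[32];

	mkdir("/sys/fs/cgroup/pids/demo", 0755);           /* assumed mount point */
	write_file("/sys/fs/cgroup/pids/demo/pids.max", "16");
	snprintf(buf, sizeof(buf), "%d", getpid());
	write_file("/sys/fs/cgroup/pids/demo/cgroup.procs", buf);

	for (;;) {
		pid_t pid = fork();

		if (pid < 0) {                  /* EAGAIN once pids.max is reached */
			fprintf(stderr, "fork: %s\n", strerror(errno));
			return 0;
		}
		if (pid == 0)
			pause();                /* children just sit in the group */
	}
}

Processes in other groups (and in the host's root group) keep forking
normally, which is the isolation being asked about here.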



[PATCH 05/14] cgroup: relocate cgroup_[try]get/put()

2015-10-09 Thread Tejun Heo
Relocate cgroup_get(), cgroup_tryget() and cgroup_put() upwards.  This
is pure code reorganization to prepare for future changes.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 855313d..ab5c9a5 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -428,6 +428,22 @@ static inline bool cgroup_is_dead(const struct cgroup 
*cgrp)
return !(cgrp->self.flags & CSS_ONLINE);
 }
 
+static void cgroup_get(struct cgroup *cgrp)
+{
+	WARN_ON_ONCE(cgroup_is_dead(cgrp));
+	css_get(&cgrp->self);
+}
+
+static bool cgroup_tryget(struct cgroup *cgrp)
+{
+	return css_tryget(&cgrp->self);
+}
+
+static void cgroup_put(struct cgroup *cgrp)
+{
+	css_put(&cgrp->self);
+}
+
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
struct cgroup *cgrp = of->kn->parent->priv;
@@ -1177,22 +1193,6 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
return mode;
 }
 
-static void cgroup_get(struct cgroup *cgrp)
-{
-	WARN_ON_ONCE(cgroup_is_dead(cgrp));
-	css_get(&cgrp->self);
-}
-
-static bool cgroup_tryget(struct cgroup *cgrp)
-{
-	return css_tryget(&cgrp->self);
-}
-
-static void cgroup_put(struct cgroup *cgrp)
-{
-	css_put(&cgrp->self);
-}
-
 /**
  * cgroup_calc_child_subsys_mask - calculate child_subsys_mask
  * @cgrp: the target cgroup
-- 
2.4.3



[PATCH 10/14] cgroup: reorganize css_task_iter functions

2015-10-09 Thread Tejun Heo
* Rename css_advance_task_iter() to css_task_iter_advance_css_set()
  and make it also clear it->task_pos at the end of the iteration.

* Factor out css_task_iter_advance() from css_task_iter_next().  The
  new function whines if called on a terminated iterator.

Except for the termination check, this is pure reorganization and
doesn't introduce any behavior changes.  This will help the planned
locking update for css_task_iter.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 49 -
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4e239e4..0a58445 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3791,12 +3791,12 @@ bool css_has_online_children(struct cgroup_subsys_state 
*css)
 }
 
 /**
- * css_advance_task_iter - advance a task itererator to the next css_set
+ * css_task_iter_advance_css_set - advance a task itererator to the next 
css_set
  * @it: the iterator to advance
  *
  * Advance @it to the next css_set to walk.
  */
-static void css_advance_task_iter(struct css_task_iter *it)
+static void css_task_iter_advance_css_set(struct css_task_iter *it)
 {
struct list_head *l = it->cset_pos;
struct cgrp_cset_link *link;
@@ -3807,6 +3807,7 @@ static void css_advance_task_iter(struct css_task_iter 
*it)
l = l->next;
if (l == it->cset_head) {
it->cset_pos = NULL;
+   it->task_pos = NULL;
return;
}
 
@@ -3830,6 +3831,28 @@ static void css_advance_task_iter(struct css_task_iter 
*it)
it->mg_tasks_head = >mg_tasks;
 }
 
+static void css_task_iter_advance(struct css_task_iter *it)
+{
+   struct list_head *l = it->task_pos;
+
+   WARN_ON_ONCE(!l);
+
+   /*
+* Advance iterator to find next entry.  cset->tasks is consumed
+* first and then ->mg_tasks.  After ->mg_tasks, we move onto the
+* next cset.
+*/
+   l = l->next;
+
+   if (l == it->tasks_head)
+   l = it->mg_tasks_head->next;
+
+   if (l == it->mg_tasks_head)
+   css_task_iter_advance_css_set(it);
+   else
+   it->task_pos = l;
+}
+
 /**
  * css_task_iter_start - initiate task iteration
  * @css: the css to walk tasks of
@@ -3862,7 +3885,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
it->cset_head = it->cset_pos;
 
-   css_advance_task_iter(it);
+   css_task_iter_advance_css_set(it);
 }
 
 /**
@@ -3876,28 +3899,12 @@ void css_task_iter_start(struct cgroup_subsys_state 
*css,
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
struct task_struct *res;
-   struct list_head *l = it->task_pos;
 
-   /* If the iterator cg is NULL, we have no tasks */
if (!it->cset_pos)
return NULL;
-   res = list_entry(l, struct task_struct, cg_list);
-
-   /*
-* Advance iterator to find next entry.  cset->tasks is consumed
-* first and then ->mg_tasks.  After ->mg_tasks, we move onto the
-* next cset.
-*/
-   l = l->next;
-
-   if (l == it->tasks_head)
-   l = it->mg_tasks_head->next;
-
-   if (l == it->mg_tasks_head)
-   css_advance_task_iter(it);
-   else
-   it->task_pos = l;
 
+   res = list_entry(it->task_pos, struct task_struct, cg_list);
+   css_task_iter_advance(it);
return res;
 }
 
-- 
2.4.3



[PATCH 03/14] cgroup: replace cgroup_has_tasks() with cgroup_is_populated()

2015-10-09 Thread Tejun Heo
Currently, cgroup_has_tasks() tests whether the target cgroup has any
css_set linked to it.  This works because a css_set's refcnt converges
with the number of tasks linked to it and thus there's no css_set
linked to a cgroup if it doesn't have any live tasks.

To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch replaces cgroup_has_tasks() with cgroup_is_populated()
which tests cgroup->nr_populated instead which locally counts the
number of populated css_sets.  Unlike cgroup_has_tasks(),
cgroup_is_populated() is recursive - if any of the descendants is
populated, the cgroup is populated too.  While this changes the
meaning of the test, all the existing users are okay with the change.

While at it, replace the open-coded ->populated_cnt test in
cgroup_events_show() with cgroup_is_populated().

Signed-off-by: Tejun Heo 
Cc: Li Zefan 
Cc: Johannes Weiner 
Cc: Michal Hocko 
---
 include/linux/cgroup.h | 4 ++--
 kernel/cgroup.c| 6 +++---
 kernel/cpuset.c| 2 +-
 mm/memcontrol.c| 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e9c3eac..bdfdb3a 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -456,9 +456,9 @@ static inline struct cgroup *task_cgroup(struct task_struct 
*task,
 }
 
 /* no synchronization, the result can only be used as a hint */
-static inline bool cgroup_has_tasks(struct cgroup *cgrp)
+static inline bool cgroup_is_populated(struct cgroup *cgrp)
 {
-	return !list_empty(&cgrp->cset_links);
+   return cgrp->populated_cnt;
 }
 
 /* returns ino associated with a cgroup */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e5231d0..435aa68 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3121,7 +3121,7 @@ static ssize_t cgroup_subtree_control_write(struct 
kernfs_open_file *of,
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
seq_printf(seq, "populated %d\n",
-  (bool)seq_css(seq)->cgroup->populated_cnt);
+  cgroup_is_populated(seq_css(seq)->cgroup));
return 0;
 }
 
@@ -5558,7 +5558,7 @@ void cgroup_exit(struct task_struct *tsk)
 
 static void check_for_release(struct cgroup *cgrp)
 {
-	if (notify_on_release(cgrp) && !cgroup_has_tasks(cgrp) &&
+	if (notify_on_release(cgrp) && !cgroup_is_populated(cgrp) &&
 	    !css_has_online_children(&cgrp->self) && !cgroup_is_dead(cgrp))
 		schedule_work(&cgrp->release_agent_work);
 }
@@ -5806,7 +5806,7 @@ static int cgroup_css_links_read(struct seq_file *seq, 
void *v)
 
 static u64 releasable_read(struct cgroup_subsys_state *css, struct cftype *cft)
 {
-	return (!cgroup_has_tasks(css->cgroup) &&
+	return (!cgroup_is_populated(css->cgroup) &&
 		!css_has_online_children(&css->cgroup->self));
 }
 
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e4d..d7ccb87 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -498,7 +498,7 @@ static int validate_change(struct cpuset *cur, struct 
cpuset *trial)
 * be changed to have empty cpus_allowed or mems_allowed.
 */
ret = -ENOSPC;
-   if ((cgroup_has_tasks(cur->css.cgroup) || cur->attach_in_progress)) {
+   if ((cgroup_is_populated(cur->css.cgroup) || cur->attach_in_progress)) {
if (!cpumask_empty(cur->cpus_allowed) &&
cpumask_empty(trial->cpus_allowed))
goto out;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 33c8dad..0ddd0ff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2920,7 +2920,7 @@ static int memcg_activate_kmem(struct mem_cgroup *memcg,
 * of course permitted.
 */
 	mutex_lock(&memcg_create_mutex);
-	if (cgroup_has_tasks(memcg->css.cgroup) ||
+	if (cgroup_is_populated(memcg->css.cgroup) ||
 	    (memcg->use_hierarchy && memcg_has_children(memcg)))
 		err = -EBUSY;
 	mutex_unlock(&memcg_create_mutex);
-- 
2.4.3



[PATCH 09/14] cgroup: factor out css_set_move_task()

2015-10-09 Thread Tejun Heo
A task is associated and disassociated with its css_set in three
places - during migration, after a new task is created and when a task
exits.  The first is handled by cgroup_task_migrate() and the latter
two are open-coded.

These are similar operations and spreading them over multiple places
makes it harder to follow and update.  This patch collects all task
css_set [dis]association operations into css_set_move_task().

While css_set_move_task() may check whether populated state needs to
be updated when not strictly necessary, the behavior is essentially
equivalent before and after this patch.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 104 ++--
 1 file changed, 56 insertions(+), 48 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5cb28d2..4e239e4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -664,6 +664,52 @@ static void css_set_update_populated(struct css_set *cset, 
bool populated)
cgroup_update_populated(link->cgrp, populated);
 }
 
+/**
+ * css_set_move_task - move a task from one css_set to another
+ * @task: task being moved
+ * @from_cset: css_set @task currently belongs to (may be NULL)
+ * @to_cset: new css_set @task is being moved to (may be NULL)
+ * @use_mg_tasks: move to @to_cset->mg_tasks instead of ->tasks
+ *
+ * Move @task from @from_cset to @to_cset.  If @task didn't belong to any
+ * css_set, @from_cset can be NULL.  If @task is being disassociated
+ * instead of moved, @to_cset can be NULL.
+ *
+ * This function automatically handles populated_cnt updates but the caller
+ * is responsible for managing @from_cset and @to_cset's reference counts.
+ */
+static void css_set_move_task(struct task_struct *task,
+			      struct css_set *from_cset, struct css_set *to_cset,
+			      bool use_mg_tasks)
+{
+	lockdep_assert_held(&css_set_rwsem);
+
+   if (from_cset) {
+		WARN_ON_ONCE(list_empty(&task->cg_list));
+		list_del_init(&task->cg_list);
+   if (!css_set_populated(from_cset))
+   css_set_update_populated(from_cset, false);
+   } else {
+		WARN_ON_ONCE(!list_empty(&task->cg_list));
+   }
+
+   if (to_cset) {
+   /*
+* We are synchronized through cgroup_threadgroup_rwsem
+* against PF_EXITING setting such that we can't race
+* against cgroup_exit() changing the css_set to
+* init_css_set and dropping the old one.
+*/
+   WARN_ON_ONCE(task->flags & PF_EXITING);
+
+   if (!css_set_populated(to_cset))
+   css_set_update_populated(to_cset, true);
+   rcu_assign_pointer(task->cgroups, to_cset);
+		list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks :
+							     &to_cset->tasks);
+   }
+}
+
 /*
  * hash table for cgroup groups. This improves the performance to find
  * an existing css_set. This hash doesn't (currently) take into
@@ -2260,47 +2306,6 @@ struct task_struct *cgroup_taskset_next(struct 
cgroup_taskset *tset)
 }
 
 /**
- * cgroup_task_migrate - move a task from one cgroup to another.
- * @tsk: the task being migrated
- * @new_cset: the new css_set @tsk is being attached to
- *
- * Must be called with cgroup_mutex, threadgroup and css_set_rwsem locked.
- */
-static void cgroup_task_migrate(struct task_struct *tsk,
-   struct css_set *new_cset)
-{
-   struct css_set *old_cset;
-
-	lockdep_assert_held(&cgroup_mutex);
-	lockdep_assert_held(&css_set_rwsem);
-
-   /*
-* We are synchronized through cgroup_threadgroup_rwsem against
-* PF_EXITING setting such that we can't race against cgroup_exit()
-* changing the css_set to init_css_set and dropping the old one.
-*/
-   WARN_ON_ONCE(tsk->flags & PF_EXITING);
-   old_cset = task_css_set(tsk);
-
-   if (!css_set_populated(new_cset))
-   css_set_update_populated(new_cset, true);
-
-   get_css_set(new_cset);
-   rcu_assign_pointer(tsk->cgroups, new_cset);
-	list_move_tail(&tsk->cg_list, &new_cset->mg_tasks);
-
-   if (!css_set_populated(old_cset))
-   css_set_update_populated(old_cset, false);
-
-   /*
-* We just gained a reference on old_cset by taking it from the
-* task. As trading it for new_cset is protected by cgroup_mutex,
-* we're safe to drop it here; it will be freed under RCU.
-*/
-   put_css_set_locked(old_cset);
-}
-
-/**
  * cgroup_taskset_migrate - migrate a taskset to a cgroup
  * @tset: taget taskset
  * @dst_cgrp: destination cgroup
@@ -2340,8 +2345,14 @@ static int cgroup_taskset_migrate(struct cgroup_taskset 
*tset,
 */
 	down_write(&css_set_rwsem);
 	list_for_each_entry(cset, &tset->src_csets, mg_node) {
-   

[PATCH 08/14] cgroup: keep css_set and task lists in chronological order

2015-10-09 Thread Tejun Heo
css task iteration will be updated to not leak cgroup internal locking
to iterator users.  In preparation, update css_set and task lists to
be in chronological order.

For tasks, as migration path is already using list_splice_tail_init(),
only cgroup_enable_task_cg_lists() and cgroup_post_fork() need
updating.  For css_sets, link_css_set() is the only place which needs
to be updated.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b8da97a..5cb28d2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -913,12 +913,11 @@ static void link_css_set(struct list_head *tmp_links, 
struct css_set *cset,
link->cset = cset;
link->cgrp = cgrp;
 
-	list_move(&link->cset_link, &cgrp->cset_links);
-
/*
-* Always add links to the tail of the list so that the list
-* is sorted by order of hierarchy creation
+* Always add links to the tail of the lists so that the lists are
+	 * in chronological order.
 	 */
+	list_move_tail(&link->cset_link, &cgrp->cset_links);
 	list_add_tail(&link->cgrp_link, &cset->cgrp_links);
 
cgroup_get(cgrp);
@@ -1778,7 +1777,7 @@ static void cgroup_enable_task_cg_lists(void)
 
if (!css_set_populated(cset))
css_set_update_populated(cset, true);
-		list_add(&p->cg_list, &cset->tasks);
+		list_add_tail(&p->cg_list, &cset->tasks);
 		get_css_set(cset);
 	}
 	spin_unlock_irq(&p->sighand->siglock);
@@ -5478,7 +5477,7 @@ void cgroup_post_fork(struct task_struct *child,
 	cset = task_css_set(current);
 	if (list_empty(&child->cg_list)) {
 		rcu_assign_pointer(child->cgroups, cset);
-		list_add(&child->cg_list, &cset->tasks);
+		list_add_tail(&child->cg_list, &cset->tasks);
 		get_css_set(cset);
 	}
 	up_write(&css_set_rwsem);
-- 
2.4.3



[PATCH 07/14] cgroup: make cgroup_destroy_locked() test cgroup_is_populated()

2015-10-09 Thread Tejun Heo
cgroup_destroy_locked() currently tests whether any css_sets are
associated to reject removal if the cgroup contains tasks.  This works
because a css_set's refcnt converges with the number of tasks linked
to it and thus there's no css_set linked to a cgroup if it doesn't
have any live tasks.

To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch updates cgroup_destroy_locked() so that it tests
cgroup_is_populated(), which counts the number of populated css_sets,
instead of whether cgrp->cset_links is empty to determine whether the
cgroup is populated or not.  This ensures that rmdirs won't be
incorrectly rejected for cgroups which only contain zombie tasks.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ba11496..b8da97a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4996,16 +4996,15 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
 	struct cgroup_subsys_state *css;
-	bool empty;
 	int ssid;
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	/* css_set_rwsem synchronizes access to ->cset_links */
-	down_read(&css_set_rwsem);
-	empty = list_empty(&cgrp->cset_links);
-	up_read(&css_set_rwsem);
-	if (!empty)
+   /*
+* Only migration can raise populated from zero and we're already
+* holding cgroup_mutex.
+*/
+   if (cgroup_is_populated(cgrp))
return -EBUSY;
 
/*
-- 
2.4.3



[PATCH 06/14] cgroup: make css_sets pin the associated cgroups

2015-10-09 Thread Tejun Heo
Currently, css_sets don't pin the associated cgroups.  This is okay as
a cgroup with css_sets associated are not allowed to be removed;
however, to help resource tracking for zombie tasks, this is scheduled
to change such that a cgroup can be removed even when it has css_sets
associated as long as none of them are populated.

To ensure that a cgroup doesn't go away while css_sets are still
associated with it, make each associated css_set hold a reference on
the cgroup.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ab5c9a5..ba11496 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -705,6 +705,7 @@ static void put_css_set_locked(struct css_set *cset)
list_for_each_entry_safe(link, tmp_link, >cgrp_links, cgrp_link) {
list_del(>cset_link);
list_del(>cgrp_link);
+   cgroup_put(link->cgrp);
kfree(link);
}
 
@@ -919,6 +920,8 @@ static void link_css_set(struct list_head *tmp_links, 
struct css_set *cset,
 * is sorted by order of hierarchy creation
 */
 	list_add_tail(&link->cgrp_link, &cset->cgrp_links);
+
+   cgroup_get(cgrp);
 }
 
 /**
@@ -4998,10 +5001,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 
lockdep_assert_held(_mutex);
 
-   /*
-* css_set_rwsem synchronizes access to ->cset_links and prevents
-* @cgrp from being removed while put_css_set() is in progress.
-*/
+   /* css_set_rwsem synchronizes access to ->cset_links */
 	down_read(&css_set_rwsem);
 	empty = list_empty(&cgrp->cset_links);
 	up_read(&css_set_rwsem);
-- 
2.4.3



[PATCH 01/14] cgroup: remove an unused parameter from cgroup_task_migrate()

2015-10-09 Thread Tejun Heo
cgroup_task_migrate() no longer uses @old_cgrp.  Remove it.

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ae23814..49f30f1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2235,14 +2235,12 @@ struct task_struct *cgroup_taskset_next(struct 
cgroup_taskset *tset)
 
 /**
  * cgroup_task_migrate - move a task from one cgroup to another.
- * @old_cgrp: the cgroup @tsk is being migrated from
  * @tsk: the task being migrated
  * @new_cset: the new css_set @tsk is being attached to
  *
  * Must be called with cgroup_mutex, threadgroup and css_set_rwsem locked.
  */
-static void cgroup_task_migrate(struct cgroup *old_cgrp,
-   struct task_struct *tsk,
+static void cgroup_task_migrate(struct task_struct *tsk,
struct css_set *new_cset)
 {
struct css_set *old_cset;
@@ -2311,8 +2309,7 @@ static int cgroup_taskset_migrate(struct cgroup_taskset 
*tset,
 	down_write(&css_set_rwsem);
 	list_for_each_entry(cset, &tset->src_csets, mg_node) {
 		list_for_each_entry_safe(task, tmp_task, &cset->mg_tasks, cg_list)
-			cgroup_task_migrate(cset->mg_src_cgrp, task,
-					    cset->mg_dst_cset);
+			cgroup_task_migrate(task, cset->mg_dst_cset);
 	}
 	up_write(&css_set_rwsem);
 
-- 
2.4.3



[PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller

2015-10-09 Thread Tejun Heo
Hello,

cgroup currently disassociates a task from its cgroups on exit and
reassigns it to the root cgroup.  This behavior turns out to be
problematic for several reasons.

* Resources can't be tracked for zombies.  This breaks pids controller
  as zombies escape resource restriction.  A cgroup can easily go way
  above its limits by creating a bunch of zombies.

* It's difficult to tell where zombies came from.  /proc/PID/cgroup
  gets reset to / on exit so given a zombie it's difficult to tell
  from which cgroup the zombie came from.

* It creates an extra work for controllers for no reason.  cpu and
  perf_events controllers implement exit callbacks to switch the
  exiting task's membership to root when just leaving it as-is is
  enough.

Unfortunately, fixing this involves opening a few cans of worms.

* Decoupling tasks being on a css_set from its reference counting so
  that css_set can be pinned w/o tasks being on it and decoupling
  css_set existence from whether a cgroup is populated so that pinning
  a css_set doesn't confuse populated state tracking and populated
  state can be used to decide whether certain operations are allowed.

* Making css task iteration drop css_set_rwsem between iteration steps
  so that internal locking is not exposed to iterator users and
  css_set_rwsem can be converted to a spinlock which can be grabbed
  from task free path.

After this patchset, besides pids controller being fixed, the visible
behavior isn't changed on traditional hierarchies but on the default
hierarchy a zombie reports its cgroup at the time of exit in
/proc/PID/cgroup.  If the cgroup gets removed before the task is
reaped, " (deleted)" is appended to the reported path.

This patchset contains the following 14 patches.

 0001-cgroup-remove-an-unused-parameter-from-cgroup_task_m.patch
 0002-cgroup-make-cgroup-nr_populated-count-the-number-of-.patch
 0003-cgroup-replace-cgroup_has_tasks-with-cgroup_is_popul.patch
 0004-cgroup-move-check_for_release-invocation.patch
 0005-cgroup-relocate-cgroup_-try-get-put.patch
 0006-cgroup-make-css_sets-pin-the-associated-cgroups.patch
 0007-cgroup-make-cgroup_destroy_locked-test-cgroup_is_pop.patch
 0008-cgroup-keep-css_set-and-task-lists-in-chronological-.patch
 0009-cgroup-factor-out-css_set_move_task.patch
 0010-cgroup-reorganize-css_task_iter-functions.patch
 0011-cgroup-don-t-hold-css_set_rwsem-across-css-task-iter.patch
 0012-cgroup-make-css_set_rwsem-a-spinlock-and-rename-it-t.patch
 0013-cgroup-keep-zombies-associated-with-their-original-c.patch
 0014-cgroup-add-cgroup_subsys-free-method-and-use-it-to-f.patch

0001-0007 decouple populated state tracking from css_set existence and
allows css_sets to be pinned without tasks on them.

0008-0012 update css_set task iterator to not hold lock across
iteration steps and replace css_set_rwsem with a spinlock.

0013 makes zombies keep their cgroup associations.  0014 introduces
->exit() method and fixes pids controller.

The patchset is pretty lightly tested and I need to verify that the
corner cases behave as expected.

This patchset is on top of cgroup/for-4.4 a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()") and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-zombies

diffstat follows.  Thanks.

 Documentation/cgroups/cgroups.txt   |4 
 Documentation/cgroups/unified-hierarchy.txt |4 
 include/linux/cgroup-defs.h |   16 
 include/linux/cgroup.h  |   14 
 kernel/cgroup.c |  522 +---
 kernel/cgroup_pids.c|8 
 kernel/cpuset.c |2 
 kernel/events/core.c|   16 
 kernel/fork.c   |1 
 kernel/sched/core.c |   16 
 mm/memcontrol.c |2 
 11 files changed, 354 insertions(+), 251 deletions(-)

--
tejun


[PATCH 02/14] cgroup: make cgroup->nr_populated count the number of populated css_sets

2015-10-09 Thread Tejun Heo
Currently, cgroup->nr_populated counts whether the cgroup has any
css_sets linked to it and the number of children which has non-zero
->nr_populated.  This works because a css_set's refcnt converges with
the number of tasks linked to it and thus there's no css_set linked to
a cgroup if it doesn't have any live tasks.

To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch updates cgroup->nr_populated so that for the cgroup itself
it counts the number of css_sets which have tasks associated with them
so that empty css_sets don't skew the populated test.

Signed-off-by: Tejun Heo 
---
 include/linux/cgroup-defs.h |  8 +++---
 kernel/cgroup.c | 65 -
 2 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index df589a0..1744450 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -232,10 +232,10 @@ struct cgroup {
int id;
 
/*
-* If this cgroup contains any tasks, it contributes one to
-* populated_cnt.  All children with non-zero popuplated_cnt of
-* their own contribute one.  The count is zero iff there's no task
-* in this cgroup or its subtree.
+* Each non-empty css_set associated with this cgroup contributes
+* one to populated_cnt.  All children with non-zero popuplated_cnt
+* of their own contribute one.  The count is zero iff there's no
+* task in this cgroup or its subtree.
 */
int populated_cnt;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 49f30f1..e5231d0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -582,14 +582,25 @@ struct css_set init_css_set = {
 static int css_set_count   = 1;/* 1 for init_css_set */
 
 /**
+ * css_set_populated - does a css_set contain any tasks?
+ * @cset: target css_set
+ */
+static bool css_set_populated(struct css_set *cset)
+{
+	lockdep_assert_held(&css_set_rwsem);
+
+	return !list_empty(&cset->tasks) || !list_empty(&cset->mg_tasks);
+}
+
+/**
  * cgroup_update_populated - updated populated count of a cgroup
  * @cgrp: the target cgroup
  * @populated: inc or dec populated count
  *
- * @cgrp is either getting the first task (css_set) or losing the last.
- * Update @cgrp->populated_cnt accordingly.  The count is propagated
- * towards root so that a given cgroup's populated_cnt is zero iff the
- * cgroup and all its descendants are empty.
+ * One of the css_sets associated with @cgrp is either getting its first
+ * task or losing the last.  Update @cgrp->populated_cnt accordingly.  The
+ * count is propagated towards root so that a given cgroup's populated_cnt
+ * is zero iff the cgroup and all its descendants don't contain any tasks.
  *
  * @cgrp's interface file "cgroup.populated" is zero if
  * @cgrp->populated_cnt is zero and 1 otherwise.  When @cgrp->populated_cnt
@@ -618,6 +629,24 @@ static void cgroup_update_populated(struct cgroup *cgrp, 
bool populated)
} while (cgrp);
 }
 
+/**
+ * css_set_update_populated - update populated state of a css_set
+ * @cset: target css_set
+ * @populated: whether @cset is populated or depopulated
+ *
+ * @cset is either getting the first task or losing the last.  Update the
+ * ->populated_cnt of all associated cgroups accordingly.
+ */
+static void css_set_update_populated(struct css_set *cset, bool populated)
+{
+	struct cgrp_cset_link *link;
+
+	lockdep_assert_held(&css_set_rwsem);
+
+	list_for_each_entry(link, &cset->cgrp_links, cgrp_link)
+		cgroup_update_populated(link->cgrp, populated);
+}
+
 /*
  * hash table for cgroup groups. This improves the performance to find
  * an existing css_set. This hash doesn't (currently) take into
@@ -663,10 +692,8 @@ static void put_css_set_locked(struct css_set *cset)
 		list_del(&link->cgrp_link);
 
 		/* @cgrp can't go away while we're holding css_set_rwsem */
-		if (list_empty(&cgrp->cset_links)) {
-			cgroup_update_populated(cgrp, false);
+		if (list_empty(&cgrp->cset_links))
check_for_release(cgrp);
-   }
 
kfree(link);
}
@@ -875,8 +902,6 @@ static void link_css_set(struct list_head *tmp_links, struct css_set *cset,
 	link->cset = cset;
 	link->cgrp = cgrp;
 
-	if (list_empty(&cgrp->cset_links))
-		cgroup_update_populated(cgrp, true);
 	list_move(&link->cset_link, &cgrp->cset_links);
 
/*
@@ -1754,6 +1779,8 @@ static void cgroup_enable_task_cg_lists(void)
if (!(p->flags & PF_EXITING)) {
struct css_set *cset = task_css_set(p);
 
+   if (!css_set_populated(cset))
+   

[PATCH 04/14] cgroup: move check_for_release() invocation

2015-10-09 Thread Tejun Heo
To trigger release agent when the last task leaves the cgroup,
check_for_release() is called from put_css_set_locked(); however,
css_set being unlinked is being decoupled from task leaving the cgroup
and the correct condition to test is cgroup->nr_populated dropping to
zero which check_for_release() is already updated to test.

This patch moves check_for_release() invocation from
put_css_set_locked() to cgroup_update_populated().

Signed-off-by: Tejun Heo 
---
 kernel/cgroup.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 435aa68..855313d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -623,6 +623,7 @@ static void cgroup_update_populated(struct cgroup *cgrp, 
bool populated)
if (!trigger)
break;
 
+   check_for_release(cgrp);
 	cgroup_file_notify(&cgrp->events_file);
 
cgrp = cgroup_parent(cgrp);
@@ -686,15 +687,8 @@ static void put_css_set_locked(struct css_set *cset)
css_set_count--;
 
 	list_for_each_entry_safe(link, tmp_link, &cset->cgrp_links, cgrp_link) {
-		struct cgroup *cgrp = link->cgrp;
-
 		list_del(&link->cset_link);
 		list_del(&link->cgrp_link);
-
-		/* @cgrp can't go away while we're holding css_set_rwsem */
-		if (list_empty(&cgrp->cset_links))
-			check_for_release(cgrp);
-
kfree(link);
}
 
-- 
2.4.3



Re: [Linux] Linux PID algorithm is BRAINDEAD!

2015-10-09 Thread yalin wang

> On Oct 10, 2015, at 10:00, Dave Goel  wrote:
> 
> Pardon the subject line!  I think the PID algo. is actually pretty
> good and cheap.
> 
> 
> I just think that a very minor tweak could actually make it *actually* do
> what it always intended to do (that is, satisfy the PID-AIM listed below)!
> No expanded PID renumbering, no incompatibility introduction, nothing, just
> a relatively *minor tweak*.
> 
> 
> *** QUICK SUMMARY:
> 
> 
> PROBLEM:  PID gets re-used immediately as soon as it is freed.
> 
> EXAMPLE: My program with PID 1323 uses temporary files that are of the form
> PID-. It finishes in two days By this time, linux has circled
> PIDs back around five times. And, the counter happens to be right at 1323
> right as the program finishes. As soon as the PID is freed, Linux thus
> re-uses the PID 1323. A trash-cleaner program (I know, I know) gets
> confused. 1323 exited uncleanly. The cleaner sees no 1323 running. It then
> proceeds to delete the temp files. But, by that time, the kernel started
> another 1323 which happens to use the very files. The cleaner ends up
> deleting files for the wrong, running program!
> 
> This is just one example. There are many more examples of race conditions.
> 
> 
> 
> A TINY TWEAK AS SOLUTION:
> 
> The kernel already tries to do the right thing. The only minor tweak needed
> is that:
> 
> ***When Linux Re-Uses A Pid, It Uses The Pid That Has Been Free For
> The Longest.***
> 
> That's it!
> 
> 
> RESULT:
> 
> All PID-related race conditions are eliminated, period. No "good enough"
> hacks needed any more by utilities trying to avoid race conditions. A
> freshly freed PID will never get re-used immediately any more . No more race
> conditions. (The only time you will ever see an immediate re-use any more is
> when your machine actually has 32767(!)  or (2^22-1) processes!  And, by
> that time, you have FAR bigger problems.)
> 
> 
> 
> 
> 
>  DETAILS:
> 
> 
> You don't have to google very much to see the 1000 algos and other bashisms
> that exist to avoid the very race condition. For example, when you want to
> read a PID list and deleting temporary files based on a PID. The concern
> is that Linux might have created a new process with the same PID by the
> time you read the file-list.  We could argue endlessly that these bashisms
> are stupid and there are better ways. But, it seems to me that these better
> ways are not foolproof either; they are themselves hacks. And, a very simple
> tweak could alleviate 100% of these problems.
> 
> Now, 32768, or even 2^22 are actually very small numbers. OTOH, 2^200 is
> not. In an ideal world, the PID would just sample from the 2^200 space and
> declare that there will never be PID re-use or conflicts, period. If there's
> a 32-bit limit, it could use multiple bytes and still use a 2^200 space,
> etc.  But, all that would be a drastic change, which is why we are stuck
> with the 2^15 space.
> 
> Is there a way to continue using the 2^15 (or 2^22) space, but more
> reasonably?
> 
> I argue that the kernel already tries to satisfy a "PID-AIM" and already
> tries to Do The Right Thing, but there's just a tiny thinko in its current
> implementation that's easily corrected.
> 
> 
> PID-AIM:
> "No immediate re-use." The aim is to NOT immediately re-use a PID right
> after it's been freed.  I argue that this aim can easily be satisfied, even
> within the limited 2^15 space.
> 
> 
> CURRENT IMPLEMENTATION:
> The creators had the right idea. Linux indeed tries to satisfy the PID-AIM
> condition. This is why it increments the counter even if the PID is actually
> available.
> 
> But, looping happens (within a few hours for me). And, as soon as looping
> happens, satisfying the PID-AIM goes out the window.
> 
> This tweak would ensure that immediate re-use never happens, period.
> 
> COMPLEXITY, ETC:
> 
> All that the entire system needs is one queue of free PIDs. Any time you
> need a PID, take it from the head. Any time a PID is newly freed, push it at
> the back of the queue.  That's it! The overhead seems minimal to me.
> 
> The queue is initially populated by 2-32768, of course.
> 
> In fact, we could even use a smaller queue and not even activate the
> queue-mode till it is actually necessary; we could use various optimizing
> conditions and shortcuts, say, to push PIDs in the queue in batches. After
> all, it's not STRICTLY necessary to maintain exactly the correct order. The
> ONLY aim we really want to satisfy is that the latest-freed PID NOT go near
> the *head* of the queue; that's all.  Also, the queue is only even needed in
> the first place until you've actually looped around.  So, tiny rescue disk
> bootups would not even need to go the queue-mode.. (unless they've been
> running many days.)
> 
> 
> (Thanks to jd_1 and _mjg on for humoring and discussing this idea when I
> presented it on ##kernel.)
> 
> 
> Dave Goel (Please CC responses.)
> --
> To unsubscribe from this list: send the line "unsubscribe 

Re: [RFT 0/3] usb: usb3503: Fix probing on Arndale board (missing phy)

2015-10-09 Thread Krzysztof Kozlowski
W dniu 10.10.2015 o 04:18, Kevin Hilman pisze:
> Hi Krzystof,
> 
> Krzysztof Kozlowski  writes:
> 
>> Introduction
>> 
>> This patchset tries to fix probing of usb3503 on Arndale board
>> if the Samsung PHY driver is probed later (or built as a module).
>>
>> *The patchset was not tested on Arndale board.*
>> I don't have that board. Please test it and say if the usb3503 deferred probe
>> works fine and the issue is solved.
> 
> FYI... I built this series on top of  next-20151009 and using
> exynos_defconfig.  I booted it on my arndale, and I still don't see the
> networking come up.  Full boot log attached.

+cc Tyler

Kevin,
Thanks for testing, but I am not sure whether this boot failure is related
to the patch, i.e. whether the patch should fix this particular boot failure.
The board stopped booting after DWC2 and the network USB gadget were enabled:
http://www.spinics.net/lists/linux-samsung-soc/msg47009.html

The boot log shows that usb3503 was probed but the asix network adapter was
not. I wonder how these things are related to each other...

Nevertheless it would be difficult to debug the issue without the
Arndale board. :)

Best regards,
Krzysztof


Re: [RESEND] mfd: rtsx: add support for rts522A

2015-10-09 Thread 敬锐
Hi Lee

Sorry to bother you, but I still can't see this patch applied.
Is there something wrong?

Regards.
micky.
On 07/08/2015 03:38 PM, Lee Jones wrote:
> On Wed, 08 Jul 2015, 敬锐 wrote:
>
>>
>> On 07/07/2015 07:46 PM, Lee Jones wrote:
>>> On Mon, 29 Jun 2015, micky_ch...@realsil.com.cn wrote:
>>>
 From: Micky Ching 

 rts522a(rts5227s) is derived from rts5227, and mainly same with rts5227.
 Add it to file mfd/rts5227.c to support this chip.

 Signed-off-by: Micky Ching 
 ---
drivers/mfd/Kconfig  |  7 ++--
drivers/mfd/rts5227.c| 77 
 ++--
drivers/mfd/rtsx_pcr.c   |  5 +++
drivers/mfd/rtsx_pcr.h   |  3 ++
include/linux/mfd/rtsx_pci.h |  6 
5 files changed, 93 insertions(+), 5 deletions(-)
>>> I Acked this once already.
>>>
>>> What's changed since then?
>> It's not changed, but I don't have time to fix magic numbers these days,
>> so, I prefer you apply this patch not waiting next patch.
> Subsequent patches are irrelevant.  I Acked this patch, so the Ack
> should carry forward.
>
> I'll apply the patch for now, but please bear this in mind for the
> future.
>
> Patch applied, thanks.
>
 diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
 index 6538159..614c146 100644
 --- a/drivers/mfd/Kconfig
 +++ b/drivers/mfd/Kconfig
 @@ -686,9 +686,10 @@ config MFD_RTSX_PCI
select MFD_CORE
help
  This supports for Realtek PCI-Express card reader including 
 rts5209,
 -rts5229, rtl8411, etc. Realtek card reader supports access to many
 -types of memory cards, such as Memory Stick, Memory Stick Pro,
 -Secure Digital and MultiMediaCard.
 +rts5227, rts522A, rts5229, rts5249, rts524A, rts525A, rtl8411, etc.
 +Realtek card reader supports access to many types of memory cards,
 +such as Memory Stick, Memory Stick Pro, Secure Digital and
 +MultiMediaCard.

config MFD_RT5033
tristate "Richtek RT5033 Power Management IC"
 diff --git a/drivers/mfd/rts5227.c b/drivers/mfd/rts5227.c
 index ce012d7..cf13e66 100644
 --- a/drivers/mfd/rts5227.c
 +++ b/drivers/mfd/rts5227.c
 @@ -26,6 +26,14 @@

#include "rtsx_pcr.h"

 +static u8 rts5227_get_ic_version(struct rtsx_pcr *pcr)
 +{
 +  u8 val;
 +
 +  rtsx_pci_read_register(pcr, DUMMY_REG_RESET_0, );
 +  return val & 0x0F;
 +}
 +
static void rts5227_fill_driving(struct rtsx_pcr *pcr, u8 voltage)
{
u8 driving_3v3[4][3] = {
 @@ -88,7 +96,7 @@ static void rts5227_force_power_down(struct rtsx_pcr 
 *pcr, u8 pm_state)
rtsx_pci_write_register(pcr, AUTOLOAD_CFG_BASE + 3, 0x01, 0);

if (pm_state == HOST_ENTER_S3)
 -  rtsx_pci_write_register(pcr, PM_CTRL3, 0x10, 0x10);
 +  rtsx_pci_write_register(pcr, pcr->reg_pm_ctrl3, 0x10, 0x10);

rtsx_pci_write_register(pcr, FPDCTL, 0x03, 0x03);
}
 @@ -121,7 +129,7 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG, 0xB8, 
 0xB8);
else
rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG, 0xB8, 
 0x88);
 -  rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PM_CTRL3, 0x10, 0x00);
 +  rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, pcr->reg_pm_ctrl3, 0x10, 0x00);

return rtsx_pci_send_cmd(pcr, 100);
}
 @@ -298,8 +306,73 @@ void rts5227_init_params(struct rtsx_pcr *pcr)
pcr->tx_initial_phase = SET_CLOCK_PHASE(27, 27, 15);
pcr->rx_initial_phase = SET_CLOCK_PHASE(30, 7, 7);

 +  pcr->ic_version = rts5227_get_ic_version(pcr);
pcr->sd_pull_ctl_enable_tbl = rts5227_sd_pull_ctl_enable_tbl;
pcr->sd_pull_ctl_disable_tbl = rts5227_sd_pull_ctl_disable_tbl;
pcr->ms_pull_ctl_enable_tbl = rts5227_ms_pull_ctl_enable_tbl;
pcr->ms_pull_ctl_disable_tbl = rts5227_ms_pull_ctl_disable_tbl;
 +
 +  pcr->reg_pm_ctrl3 = PM_CTRL3;
 +}
 +
 +static int rts522a_optimize_phy(struct rtsx_pcr *pcr)
 +{
 +  int err;
 +
 +  err = rtsx_pci_write_register(pcr, RTS522A_PM_CTRL3, D3_DELINK_MODE_EN,
 +  0x00);
 +  if (err < 0)
 +  return err;
 +
 +  if (is_version(pcr, 0x522A, IC_VER_A)) {
 +  err = rtsx_pci_write_phy_register(pcr, PHY_RCR2,
 +  PHY_RCR2_INIT_27S);
 +  if (err)
 +  return err;
 +
 +  rtsx_pci_write_phy_register(pcr, PHY_RCR1, PHY_RCR1_INIT_27S);
 +  rtsx_pci_write_phy_register(pcr, PHY_FLD0, PHY_FLD0_INIT_27S);
 +  rtsx_pci_write_phy_register(pcr, 

[PATCH v10 2/6] ARM/PCI: remove align_resource in pci_sys_data

2015-10-09 Thread Zhou Wang
From: gabriele paoloni 

This patch is needed in order to unify the PCIe designware framework for ARM and
ARM64 architectures. In the PCIe designware unification process we are calling
pci_create_root_bus() passing a "sysdata" parameter that is the same for both
ARM and ARM64 and is of type "struct pcie_port*". In the ARM case this will
cause a problem with the function pcibios_align_resource(); in fact this will
cast "dev->sysdata" to "struct pci_sys_data*", whereas designware had passed a
"struct pcie_port*" pointer.

This patch solves the issue by removing "align_resource" from the
"pci_sys_data" struct and defining a static global function pointer in
"bios32.c".

Signed-off-by: Gabriele Paoloni 
Signed-off-by: Zhou Wang 
Acked-by: Pratyush Anand 
---
 arch/arm/include/asm/mach/pci.h |  6 --
 arch/arm/kernel/bios32.c| 12 
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/mach/pci.h b/arch/arm/include/asm/mach/pci.h
index 8857d28..0070e85 100644
--- a/arch/arm/include/asm/mach/pci.h
+++ b/arch/arm/include/asm/mach/pci.h
@@ -52,12 +52,6 @@ struct pci_sys_data {
u8  (*swizzle)(struct pci_dev *, u8 *);
/* IRQ mapping  
*/
int (*map_irq)(const struct pci_dev *, u8, u8);
-   /* Resource alignement requirements 
*/
-   resource_size_t (*align_resource)(struct pci_dev *dev,
- const struct resource *res,
- resource_size_t start,
- resource_size_t size,
- resource_size_t align);
void*private_data;  /* platform controller private data 
*/
 };
 
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 874e182..6551d28 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -17,6 +17,11 @@
 #include 
 
 static int debug_pci;
+static resource_size_t (*align_resource)(struct pci_dev *dev,
+ const struct resource *res,
+ resource_size_t start,
+ resource_size_t size,
+ resource_size_t align) = NULL;
 
 /*
  * We can't use pci_get_device() here since we are
@@ -456,7 +461,7 @@ static void pcibios_init_hw(struct device *parent, struct 
hw_pci *hw,
sys->busnr   = busnr;
sys->swizzle = hw->swizzle;
sys->map_irq = hw->map_irq;
-   sys->align_resource = hw->align_resource;
+   align_resource = hw->align_resource;
INIT_LIST_HEAD(>resources);
 
if (hw->private_data)
@@ -572,7 +577,6 @@ resource_size_t pcibios_align_resource(void *data, const 
struct resource *res,
resource_size_t size, resource_size_t align)
 {
struct pci_dev *dev = data;
-   struct pci_sys_data *sys = dev->sysdata;
resource_size_t start = res->start;
 
if (res->flags & IORESOURCE_IO && start & 0x300)
@@ -580,8 +584,8 @@ resource_size_t pcibios_align_resource(void *data, const 
struct resource *res,
 
start = (start + align - 1) & ~(align - 1);
 
-   if (sys->align_resource)
-   return sys->align_resource(dev, res, start, size, align);
+   if (align_resource)
+   return align_resource(dev, res, start, size, align);
 
return start;
 }
-- 
1.9.1



[PATCH v10 4/6] PCI: hisi: Add PCIe host support for HiSilicon SoC Hip05

2015-10-09 Thread Zhou Wang
This patch adds PCIe host support for HiSilicon SoC Hip05.

Signed-off-by: Zhou Wang 
Signed-off-by: Gabriele Paoloni 
Signed-off-by: liudongdong 
Signed-off-by: Kefeng Wang 
Signed-off-by: qiuzhenfa 
---
 drivers/pci/host/Kconfig |   8 ++
 drivers/pci/host/Makefile|   1 +
 drivers/pci/host/pcie-hisi.c | 320 +++
 3 files changed, 329 insertions(+)
 create mode 100644 drivers/pci/host/pcie-hisi.c

diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index d5e58ba..ae873be 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -145,4 +145,12 @@ config PCIE_IPROC_BCMA
  Say Y here if you want to use the Broadcom iProc PCIe controller
  through the BCMA bus interface
 
+config PCI_HISI
+   depends on OF && ARM64
+   bool "HiSilicon SoC HIP05 PCIe controller"
+   select PCIEPORTBUS
+   select PCIE_DW
+   help
+ Say Y here if you want PCIe controller support on HiSilicon HIP05 SoC
+
 endmenu
diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
index 140d66f..ea1dbf2 100644
--- a/drivers/pci/host/Makefile
+++ b/drivers/pci/host/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o
 obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o
 obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o
 obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o
+obj-$(CONFIG_PCI_HISI) += pcie-hisi.o
diff --git a/drivers/pci/host/pcie-hisi.c b/drivers/pci/host/pcie-hisi.c
new file mode 100644
index 000..26aa0d9
--- /dev/null
+++ b/drivers/pci/host/pcie-hisi.c
@@ -0,0 +1,320 @@
+/*
+ * PCIe host controller driver for HiSilicon Hip05 SoC
+ *
+ * Copyright (C) 2015 HiSilicon Co., Ltd. http://www.hisilicon.com
+ *
+ * Author: Zhou Wang 
+ * Dacai Zhu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define PCIE_SUBCTRL_MODE_REG   0x2800
+#define PCIE_SUBCTRL_SYS_STATE4_REG 0x6818
+#define PCIE_SLV_DBI_MODE   0x0
+#define PCIE_SLV_SYSCTRL_MODE   0x1
+#define PCIE_SLV_CONTENT_MODE   0x2
+#define PCIE_SLV_MSI_ASID   0x10
+#define PCIE_LTSSM_LINKUP_STATE 0x11
+#define PCIE_LTSSM_STATE_MASK   0x3F
+#define PCIE_MSI_ASID_ENABLE(0x1 << 12)
+#define PCIE_MSI_ASID_VALUE (0x1 << 16)
+#define PCIE_MSI_TRANS_ENABLE   (0x1 << 12)
+#define PCIE_MSI_TRANS_REG  0x1c8
+#define PCIE_MSI_LOW_ADDRESS0x1b4
+#define PCIE_MSI_HIGH_ADDRESS   0x1c4
+#define PCIE_GITS_TRANSLATER0x10040
+
+#define PCIE_SYS_CTRL20_REG 0x20
+#define PCIE_RD_TAB_SEL BIT(31)
+#define PCIE_RD_TAB_EN  BIT(30)
+
+#define to_hisi_pcie(x)container_of(x, struct hisi_pcie, pp)
+
+struct hisi_pcie {
+   struct regmap *subctrl;
+   void __iomem *reg_base;
+   u32 port_id;
+   struct pcie_port pp;
+};
+
+static inline void hisi_pcie_apb_writel(struct hisi_pcie *pcie,
+   u32 val, u32 reg)
+{
+   writel(val, pcie->reg_base + reg);
+}
+
+static inline u32 hisi_pcie_apb_readl(struct hisi_pcie *pcie, u32 reg)
+{
+   return readl(pcie->reg_base + reg);
+}
+
+/*
+ * Change mode to indicate the same reg_base to base of PCIe host configure
+ * registers, base of RC configure space or base of vmid/asid context table
+ */
+static void hisi_pcie_change_apb_mode(struct hisi_pcie *pcie, u32 mode)
+{
+   u32 val;
+   u32 bit_mask;
+   u32 bit_shift;
+   u32 port_id = pcie->port_id;
+   u32 reg = PCIE_SUBCTRL_MODE_REG + 0x100 * port_id;
+
+   if ((port_id == 1) || (port_id == 2)) {
+   bit_mask = 0xc;
+   bit_shift = 0x2;
+   } else {
+   bit_mask = 0x6;
+   bit_shift = 0x1;
+   }
+
+   regmap_update_bits(pcie->subctrl, reg, bit_mask, mode << bit_shift);
+}
+
+/* Configure vmid/asid table in PCIe host */
+static void hisi_pcie_config_context(struct hisi_pcie *pcie)
+{
+   int i;
+   u32 val;
+
+   /* enable to clean vmid and asid tables though apb bus */
+   hisi_pcie_change_apb_mode(pcie, PCIE_SLV_SYSCTRL_MODE);
+
+   val = hisi_pcie_apb_readl(pcie, PCIE_SYS_CTRL20_REG);
+   /* enable ar channel */
+   val |= PCIE_RD_TAB_SEL | PCIE_RD_TAB_EN;
+   hisi_pcie_apb_writel(pcie, val, PCIE_SYS_CTRL20_REG);
+
+   

[PATCH v10 3/6] PCI: designware: Add ARM64 support

2015-10-09 Thread Zhou Wang
This patch tries to unify ARM32 and ARM64 PCIe in the designware driver. It
deletes the functions dw_pcie_setup, dw_pcie_scan_bus and dw_pcie_map_irq as
well as struct hw_pci, and moves the related operations to dw_pcie_host_init.

This patch also tries to use of_pci_get_host_bridge_resources for both ARM32
and ARM64, following the suggestion from Gabriele [1].

This patch reverts commit f4c55c5a3f7f ("PCI: designware: Program ATU with
untranslated address") based on 1/6 in this series; we delete *_mod_base in
pcie-designware. This was discussed in [2].

I have compiled the driver with multi_v7_defconfig. However, I don't have an
ARM32 PCIe board to test with. It would be appreciated if someone could help
test it.

Signed-off-by: Zhou Wang 
Signed-off-by: Gabriele Paoloni 
Signed-off-by: Arnd Bergmann 
Tested-by: James Morse 
Tested-by: Gabriel Fernandez 
Tested-by: Minghuan Lian 
Acked-by: Pratyush Anand 

[1] http://www.spinics.net/lists/linux-pci/msg42194.html
[2] http://www.spinics.net/lists/arm-kernel/msg436779.html
---
 drivers/pci/host/pci-dra7xx.c  |  14 +--
 drivers/pci/host/pci-keystone-dw.c |   2 +-
 drivers/pci/host/pcie-designware.c | 238 -
 drivers/pci/host/pcie-designware.h |  14 +--
 4 files changed, 90 insertions(+), 178 deletions(-)

diff --git a/drivers/pci/host/pci-dra7xx.c b/drivers/pci/host/pci-dra7xx.c
index ebdffa0..b88c248 100644
--- a/drivers/pci/host/pci-dra7xx.c
+++ b/drivers/pci/host/pci-dra7xx.c
@@ -153,15 +153,15 @@ static void dra7xx_pcie_host_init(struct pcie_port *pp)
 {
dw_pcie_setup_rc(pp);
 
-   if (pp->io_mod_base)
-   pp->io_mod_base &= CPU_TO_BUS_ADDR;
+   if (pp->io_base)
+   pp->io_base &= CPU_TO_BUS_ADDR;
 
-   if (pp->mem_mod_base)
-   pp->mem_mod_base &= CPU_TO_BUS_ADDR;
+   if (pp->mem_base)
+   pp->mem_base &= CPU_TO_BUS_ADDR;
 
-   if (pp->cfg0_mod_base) {
-   pp->cfg0_mod_base &= CPU_TO_BUS_ADDR;
-   pp->cfg1_mod_base &= CPU_TO_BUS_ADDR;
+   if (pp->cfg0_base) {
+   pp->cfg0_base &= CPU_TO_BUS_ADDR;
+   pp->cfg1_base &= CPU_TO_BUS_ADDR;
}
 
dra7xx_pcie_establish_link(pp);
diff --git a/drivers/pci/host/pci-keystone-dw.c 
b/drivers/pci/host/pci-keystone-dw.c
index e71da99..8062ddb 100644
--- a/drivers/pci/host/pci-keystone-dw.c
+++ b/drivers/pci/host/pci-keystone-dw.c
@@ -322,7 +322,7 @@ static void ks_dw_pcie_clear_dbi_mode(void __iomem 
*reg_virt)
 void ks_dw_pcie_setup_rc_app_regs(struct keystone_pcie *ks_pcie)
 {
struct pcie_port *pp = _pcie->pp;
-   u32 start = pp->mem.start, end = pp->mem.end;
+   u32 start = pp->mem->start, end = pp->mem->end;
int i, tr_size;
 
/* Disable BARs for inbound access */
diff --git a/drivers/pci/host/pcie-designware.c 
b/drivers/pci/host/pcie-designware.c
index 75338a6..8ef74da 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -11,6 +11,7 @@
  * published by the Free Software Foundation.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -69,16 +70,7 @@
 #define PCIE_ATU_FUNC(x)   (((x) & 0x7) << 16)
 #define PCIE_ATU_UPPER_TARGET  0x91C
 
-static struct hw_pci dw_pci;
-
-static unsigned long global_io_offset;
-
-static inline struct pcie_port *sys_to_pcie(struct pci_sys_data *sys)
-{
-   BUG_ON(!sys->private_data);
-
-   return sys->private_data;
-}
+static struct pci_ops dw_pcie_ops;
 
 int dw_pcie_cfg_read(void __iomem *addr, int where, int size, u32 *val)
 {
@@ -255,7 +247,7 @@ static void dw_pcie_msi_set_irq(struct pcie_port *pp, int 
irq)
 static int assign_irq(int no_irqs, struct msi_desc *desc, int *pos)
 {
int irq, pos0, i;
-   struct pcie_port *pp = sys_to_pcie(msi_desc_to_pci_sysdata(desc));
+   struct pcie_port *pp = (struct pcie_port *) 
msi_desc_to_pci_sysdata(desc);
 
pos0 = bitmap_find_free_region(pp->msi_irq_in_use, MAX_MSI_IRQS,
   order_base_2(no_irqs));
@@ -298,7 +290,7 @@ static int dw_msi_setup_irq(struct msi_controller *chip, 
struct pci_dev *pdev,
 {
int irq, pos;
struct msi_msg msg;
-   struct pcie_port *pp = sys_to_pcie(pdev->bus->sysdata);
+   struct pcie_port *pp = pdev->bus->sysdata;
 
if (desc->msi_attrib.is_msix)
return -EINVAL;
@@ -327,7 +319,7 @@ static void dw_msi_teardown_irq(struct msi_controller 
*chip, unsigned int irq)
 {
struct irq_data *data = irq_get_irq_data(irq);
struct msi_desc *msi = irq_data_get_msi_desc(data);
-   struct pcie_port *pp = sys_to_pcie(msi_desc_to_pci_sysdata(msi));
+   struct pcie_port *pp = (struct pcie_port *) 
msi_desc_to_pci_sysdata(msi);
 
clear_irq_range(pp, irq, 1, data->hwirq);
 }
@@ -362,14 +354,12 @@ int dw_pcie_host_init(struct pcie_port *pp)
 {
struct device_node *np = pp->dev->of_node;
struct platform_device *pdev = to_platform_device(pp->dev);
- 

[PATCH v10 6/6] MAINTAINERS: Add pcie-hisi maintainer

2015-10-09 Thread Zhou Wang
Signed-off-by: Zhou Wang 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ba7ab7..944a229 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8047,6 +8047,13 @@ S:   Maintained
 F: Documentation/devicetree/bindings/pci/xgene-pci-msi.txt
 F: drivers/pci/host/pci-xgene-msi.c
 
+PCIE DRIVER FOR HISILICON
+M: Zhou Wang 
+L: linux-...@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
+F: drivers/pci/host/pcie-hisi.c
+
 PCMCIA SUBSYSTEM
 P: Linux PCMCIA Team
 L: linux-pcm...@lists.infradead.org
-- 
1.9.1



[PATCH v10 1/6] PCI: designware: move calculation of bus addresses to DRA7xx

2015-10-09 Thread Zhou Wang
From: gabriele paoloni 

Commit f4c55c5a3f7f ("PCI: designware: Program ATU with untranslated
address") added the calculation of PCI BUS addresses in designware,
storing them in new fields added in "struct pcie_port". This
calculation is done for every designware user even though it is only
applicable to DRA7xx.
This patch moves the calculation of the bus addresses to the DRA7xx
driver and is needed to allow the rework of designware to use
the new DT parsing API.

Signed-off-by: Gabriele Paoloni 
Signed-off-by: Zhou Wang 
Acked-by: Pratyush Anand 
---
 drivers/pci/host/pci-dra7xx.c  | 13 +
 drivers/pci/host/pcie-designware.c | 15 ---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/host/pci-dra7xx.c b/drivers/pci/host/pci-dra7xx.c
index 199e29a..ebdffa0 100644
--- a/drivers/pci/host/pci-dra7xx.c
+++ b/drivers/pci/host/pci-dra7xx.c
@@ -62,6 +62,7 @@
 
 #definePCIECTRL_DRA7XX_CONF_PHY_CS 0x010C
 #defineLINK_UP BIT(16)
+#defineCPU_TO_BUS_ADDR 0x0FFF
 
 struct dra7xx_pcie {
void __iomem*base;
@@ -151,6 +152,18 @@ static void dra7xx_pcie_enable_interrupts(struct pcie_port 
*pp)
 static void dra7xx_pcie_host_init(struct pcie_port *pp)
 {
dw_pcie_setup_rc(pp);
+
+   if (pp->io_mod_base)
+   pp->io_mod_base &= CPU_TO_BUS_ADDR;
+
+   if (pp->mem_mod_base)
+   pp->mem_mod_base &= CPU_TO_BUS_ADDR;
+
+   if (pp->cfg0_mod_base) {
+   pp->cfg0_mod_base &= CPU_TO_BUS_ADDR;
+   pp->cfg1_mod_base &= CPU_TO_BUS_ADDR;
+   }
+
dra7xx_pcie_establish_link(pp);
if (IS_ENABLED(CONFIG_PCI_MSI))
dw_pcie_msi_init(pp);
diff --git a/drivers/pci/host/pcie-designware.c 
b/drivers/pci/host/pcie-designware.c
index 52aa6e3..75338a6 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -365,14 +365,10 @@ int dw_pcie_host_init(struct pcie_port *pp)
struct of_pci_range range;
struct of_pci_range_parser parser;
struct resource *cfg_res;
-   u32 val, na, ns;
+   u32 val, ns;
const __be32 *addrp;
int i, index, ret;
 
-   /* Find the address cell size and the number of cells in order to get
-* the untranslated address.
-*/
-   of_property_read_u32(np, "#address-cells", );
ns = of_n_size_cells(np);
 
cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config");
@@ -415,8 +411,7 @@ int dw_pcie_host_init(struct pcie_port *pp)
pp->io_base = range.cpu_addr;
 
/* Find the untranslated IO space address */
-   pp->io_mod_base = of_read_number(parser.range -
-parser.np + na, ns);
+   pp->io_mod_base = range.cpu_addr;
}
if (restype == IORESOURCE_MEM) {
of_pci_range_to_resource(, np, >mem);
@@ -425,8 +420,7 @@ int dw_pcie_host_init(struct pcie_port *pp)
pp->mem_bus_addr = range.pci_addr;
 
/* Find the untranslated MEM space address */
-   pp->mem_mod_base = of_read_number(parser.range -
- parser.np + na, ns);
+   pp->mem_mod_base = range.cpu_addr;
}
if (restype == 0) {
of_pci_range_to_resource(, np, >cfg);
@@ -436,8 +430,7 @@ int dw_pcie_host_init(struct pcie_port *pp)
pp->cfg1_base = pp->cfg.start + pp->cfg0_size;
 
/* Find the untranslated configuration space address */
-   pp->cfg0_mod_base = of_read_number(parser.range -
-  parser.np + na, ns);
+   pp->cfg0_mod_base = range.cpu_addr;
pp->cfg1_mod_base = pp->cfg0_mod_base +
pp->cfg0_size;
}
-- 
1.9.1



[PATCH v10 5/6] Documentation: DT: Add HiSilicon PCIe host binding

2015-10-09 Thread Zhou Wang
This patch adds the related DT binding document for the HiSilicon PCIe host driver.

Signed-off-by: Zhou Wang 
---
 .../bindings/arm/hisilicon/hisilicon.txt   | 17 +
 .../devicetree/bindings/pci/hisilicon-pcie.txt | 44 ++
 2 files changed, 61 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/hisilicon-pcie.txt

diff --git a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt 
b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
index 3504dca..6ac7c00 100644
--- a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
+++ b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
@@ -171,6 +171,23 @@ Example:
};
 
 ---
+Hisilicon HiP05 PCIe-SAS system controller
+
+Required properties:
+- compatible : "hisilicon,pcie-sas-subctrl", "syscon";
+- reg : Register address and size
+
+The HiP05 PCIe-SAS system controller is shared by PCIe and SAS controllers in
+HiP05 Soc to implement some basic configurations.
+
+Example:
+   /* for HiP05 PCIe-SAS system */
+   pcie_sas: system_controller@0xb000 {
+   compatible = "hisilicon,pcie-sas-subctrl", "syscon";
+   reg = <0xb000 0x1>;
+   };
+
+---
 Hisilicon CPU controller
 
 Required properties:
diff --git a/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt 
b/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
new file mode 100644
index 000..17c6ed9
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
@@ -0,0 +1,44 @@
+HiSilicon PCIe host bridge DT description
+
+HiSilicon PCIe host controller is based on Designware PCI core.
+It shares common functions with PCIe Designware core driver and inherits
+common properties defined in
+Documentation/devicetree/bindings/pci/designware-pci.txt.
+
+Additional properties are described here:
+
+Required properties:
+- compatible: Should contain "hisilicon,hip05-pcie".
+- reg: Should contain rc_dbi, config registers location and length.
+- reg-names: Must include the following entries:
+  "rc_dbi": controller configuration registers;
+  "config": PCIe configuration space registers.
+- msi-parent: Should be its_pcie which is an ITS receiving MSI interrupts.
+- port-id: Should be 0, 1, 2 or 3.
+
+Optional properties:
+- status: Either "ok" or "disabled".
+- dma-coherent: Present if DMA operations are coherent.
+
+Example:
+   pcie@0xb008 {
+   compatible = "hisilicon,hip05-pcie", "snps,dw-pcie";
+   reg = <0 0xb008 0 0x1>, <0x220 0x 0 0x2000>;
+   reg-names = "rc_dbi", "config";
+   bus-range = <0  15>;
+   msi-parent = <_pcie>;
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   dma-coherent;
+   ranges = <0x8200 0 0x 0x220 0x 0 
0x1000>;
+   num-lanes = <8>;
+   port-id = <1>;
+   #interrupts-cells = <1>;
+   interrupts-map-mask = <0xf800 0 0 7>;
+   interrupts-map = <0x0 0 0 1 _pcie 1 10
+ 0x0 0 0 2 _pcie 2 11
+ 0x0 0 0 3 _pcie 3 12
+ 0x0 0 0 4 _pcie 4 13>;
+   status = "ok";
+   };
-- 
1.9.1



[PATCH v10 0/6] PCI: hisi: Add PCIe host support for HiSilicon SoC Hip05

2015-10-09 Thread Zhou Wang
This patchset adds PCIe host support for HiSilicon SoC Hip05. The PCIe hosts
use the PCIe IP core from Synopsys, so this driver is based on the designware
PCIe driver.

Hip05 is an ARMv8 architecture SoC. It should be able to use the ARM64 PCIe API
in the designware PCIe driver, so this patchset also adds ARM64 support for
designware PCIe.

This patchset is based on v4.3-rc1.

Change from v9:
- Use syscon to get subctrl base address.
- 5/6 is based on [PATCH v3 0/2] arm64: Support Hisilicon Hip05-D02 board
  from Ding Tianhong
- Add hisi_pcie_cfg_read in pcie-hisi.c to match
  [PATCH v6 0/3] PCI: designware: change dw_pcie_cfg_write() and 
dw_pcie_cfg_read()
  from Gabriele.

Change from v8:
- Rebase on v4.3-rc1.
- Add Tested-by from Gabriel and Minghuan.
- Remove ITS domain parsing in msi_host_init in pcie-hisi.c; this is not needed
  as the PCI core does the related job. Add ITS base address parsing in
  msi_host_init.
- Change vmid/asid table configuration, previous configuration was wrong.
- Add wr_own_conf callback in pcie-hisi.c.
- Use subsys_initcall to init hisi PCIe.

Change from v7:
- Remove pp->root_bus_nr = 0 in dra7xx, exynos, imx6, keystone, layerscape,
  spear13xx. Pass pp->busn->start to pci_create_root_bus as root bus number.
- Remove bus-range parsing in pcie-hisi.c.

Change from v6:
- Add Pratyush's Acked-by for 1/6 and 2/6.
- Add James' Tested-by for 3/6.

Change from v5:
- Merge 1/6 in this series, discussion about this can be found in [1]

Change from v4:
- Change the author of 1/5 to Gabriele.
- Modify problems in 3/5 pointed by Bjorn.
- Modify spelling problems in 4/5.

Change from v3:
- Change 1/5 to what Gabriele suggested.
- Use win->__res.start to get *_mod_base in 2/5, this fix a bug in v3 series.

Change from v2:
- Move struct pci_dev *dev and struct pci_sys_data *sys in
  pcibios_align_resource in 1/5.
- Add Gabriele's codes in 2/5 which delete unnecessary information parse and
  use of_pci_get_host_bridge_resources for both ARM32 and ARM64.
- Add maintainer patch 5/5.

Change from RFC v1:
- Add 1/4 patch by Arnd which removes align_resource callback in ARM
  pcibios_align_resource.
- Change head file in pcie-designware from asm/hardirq.h to linux/hardirq.h.
- Set pp->root_bus_nr = 0 in dra7xx, exynos, imx6, keystone, layerscape,
  spear13xx.
- Remove unnecessary parentheses of some macros in pcie-hisi.
- Use macro to replace some magic values.
- Merge two loops together and add some comments about it in context_config
  function in pcie-hisi.
- Modify some value of items in pcie node example in binding document. 

Change from RFC:
- delete dw_pcie_setup, dw_pcie_scan_bus, dw_pcie_map_irq and struct hw_pci,
  merge related operations into dw_pcie_host_init.

Link of v9:
- http://www.spinics.net/lists/linux-pci/msg44545.html
Link of v8:
- http://www.spinics.net/lists/linux-pci/msg44192.html
Link of v7:
- http://www.spinics.net/lists/devicetree/msg90690.html
Link of v6:
- http://www.spinics.net/lists/linux-pci/msg43669.html
Link of v5:
- http://www.spinics.net/lists/devicetree/msg87959.html
Link of v4:
- http://www.spinics.net/lists/arm-kernel/msg433050.html
Link of v3:
- http://www.spinics.net/lists/linux-pci/msg42539.html
Link of v2:
- http://www.spinics.net/lists/linux-pci/msg41844.html
Link of RFC v1:
- http://www.spinics.net/lists/linux-pci/msg41305.html
Link of RFC:
- http://www.spinics.net/lists/linux-pci/msg40434.html

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/359741.html

Zhou Wang (4):
  PCI: designware: Add ARM64 support
  PCI: hisi: Add PCIe host support for HiSilicon SoC Hip05
  Documentation: DT: Add HiSilicon PCIe host binding
  MAINTAINERS: Add pcie-hisi maintainer

gabriele paoloni (2):
  PCI: designware: move calculation of bus addresses to DRA7xx
  ARM/PCI: remove align_resource in pci_sys_data

 .../bindings/arm/hisilicon/hisilicon.txt   |  17 ++
 .../devicetree/bindings/pci/hisilicon-pcie.txt |  44 +++
 MAINTAINERS|   7 +
 arch/arm/include/asm/mach/pci.h|   6 -
 arch/arm/kernel/bios32.c   |  12 +-
 drivers/pci/host/Kconfig   |   8 +
 drivers/pci/host/Makefile  |   1 +
 drivers/pci/host/pci-dra7xx.c  |  13 +
 drivers/pci/host/pci-keystone-dw.c |   2 +-
 drivers/pci/host/pcie-designware.c | 245 +---
 drivers/pci/host/pcie-designware.h |  14 +-
 drivers/pci/host/pcie-hisi.c   | 320 +
 12 files changed, 501 insertions(+), 188 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
 create mode 100644 drivers/pci/host/pcie-hisi.c

-- 
1.9.1


[PATCH 1/1] I have a board that blocks on i8259A_shutdown when I want to power off. It does not always reproduce.

2015-10-09 Thread Figo
There is the log:
[   27.758391] xhci_hcd :00:14.0: shutdown start
[   27.768329] xhci_hcd :00:14.0: shutdown stop
[   27.773532] pci :00:0b.0: shutdown start
[   27.778335] pci :00:0b.0: shutdown stop
[   27.783041] pci :00:0a.0: shutdown start
[   27.787847] pci :00:0a.0: shutdown stop
[   27.792550] pci :00:03.0: shutdown start
[   27.797362] pci :00:03.0: shutdown stop
[   27.802087] i915 :00:02.0: shutdown start
[   27.816006] i915 :00:02.0: shutdown stop
[   27.832384] PM: Calling mce_syscore_shutdown+0x0/0x50 start
[   27.838651] PM: Calling mce_syscore_shutdown+0x0/0x50 stop
[   27.844813] PM: Calling i8259A_shutdown+0x0/0x20 start

It seems there is a potential race in i8259A_shutdown().

Signed-off-by: Figo 
---
 arch/x86/kernel/i8259.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 16cb827..06906d4 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -257,12 +257,16 @@ static int i8259A_suspend(void)
 
 static void i8259A_shutdown(void)
 {
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(_lock, flags);
/* Put the i8259A into a quiescent state that
 * the kernel initialization code can get it
 * out of.
 */
outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
outb(0xff, PIC_SLAVE_IMR);  /* mask all of 8259A-2 */
+   raw_spin_unlock_irqrestore(_lock, flags);
 }
 
 static struct syscore_ops i8259_syscore_ops = {
-- 
1.7.9.5



You normally contact three thousand firms at most; where are the other hundred-plus thousand customers worldwide?

2015-10-09 Thread iSayor
Hello:
Are you still using the Ali platform to develop foreign-trade customers?
   Still using trade shows to promote your company and products?
 You're behind the times!!!
 Developing foreign-trade customers is hard these days; are you also looking for good channels beyond trade shows and B2B?
 Your industry has well over a hundred thousand customers worldwide, yet you normally contact three thousand at most; do you want to reach the rest as well?
 Add QQ 2652697913 for a demonstration of proactive customer development; the sooner you use it, the sooner you benefit, and nearly ten thousand companies are already using it ahead of you!!
 Recommended by the Guangdong Chamber of Commerce, the number-one brand for proactive customer development; nearly ten thousand companies are benefiting. You may not use it, but you cannot afford not to know about it.


RE: [PATCH][v3] ACPI / PM: Fix incorrect wakeup irq setting before suspend-to-idle

2015-10-09 Thread Zheng, Lv
Hi,

> From: Chen, Yu C
> Sent: Friday, October 09, 2015 5:50 PM
> 
> Hi, LV
> 
> > From: Zheng, Lv
> > Sent: Friday, October 09, 2015 4:33 PM
> >
> > Hi, Yu
> >
> > > From: Chen, Yu C
> > > Sent: Friday, October 09, 2015 4:20 PM
> > >
> > >
> > >  acpi_status acpi_os_remove_interrupt_handler(u32 irq,
> > > acpi_osd_handler handler)
> >
> > Why don't you rename irq here to gsi to improve the readability?
> > The false naming and the wrong example written for this function are
> > probably the root causes of all other bad code.
> > So if we want to stop people making future mistakes, we need to cleanup
> > ourselves.
> >
> OK, will rewrite in next version.

You can add Acked-by: Lv Zheng 
And don't forget to mark it as a stable material.

One more question is:
Do you want the modules to use "acpi_sci_irq"?
If the answer is no, then please ignore this question.

Thanks and best regards
-Lv 


> 
> > Thanks and best regards
> > -Lv
> >



Re: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-09 Thread Xishi Qiu
On 2015/10/9 18:36, Xishi Qiu wrote:

> On 2015/10/9 17:24, Kamezawa Hiroyuki wrote:
> 
>> On 2015/10/09 15:46, Xishi Qiu wrote:
>>> On 2015/10/9 22:56, Taku Izumi wrote:
>>>
 Xeon E7 v3 based systems supports Address Range Mirroring
 and UEFI BIOS complied with UEFI spec 2.5 can notify which
 ranges are reliable (mirrored) via EFI memory map.
 Now Linux kernel utilize its information and allocates
 boot time memory from reliable region.

 My requirement is:
- allocate kernel memory from reliable region
- allocate user memory from non-reliable region

 In order to meet my requirement, ZONE_MOVABLE is useful.
 By arranging non-reliable range into ZONE_MOVABLE,
 reliable memory is only used for kernel allocations.


Hi,

If we reuse the movable zone, we should set appropriate sizes for the
mirrored memory region (normal zone) and the non-mirrored memory
region (movable zone). In some cases the kernel will take more memory
than user space, e.g. when things such as modules run in kernel space.

I think the user can set the size in the BIOS interface, right?

Thanks,
Xishi Qiu

 





Re: [RFC] arm: add relocate initrd support

2015-10-09 Thread yalin wang

> On Oct 10, 2015, at 00:10, Russell King - ARM Linux  
> wrote:
> 
> On Fri, Oct 09, 2015 at 11:55:09PM +0800, yalin wang wrote:
>> Add support for initrd on ARM arch, in case
>> mem= boot option change the memory size or the initrd are
>> not placed in low memory region, we need copy the initrd
>> to low memory region.
>> 
>> Signed-off-by: yalin wang 
>> ---
>> arch/arm/include/asm/fixmap.h |  1 +
>> arch/arm/kernel/setup.c   | 72 
>> +++
>> 2 files changed, 73 insertions(+)
>> 
>> diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
>> index 58cfe9f..18ad90f 100644
>> --- a/arch/arm/include/asm/fixmap.h
>> +++ b/arch/arm/include/asm/fixmap.h
>> @@ -10,6 +10,7 @@
>> 
>> enum fixed_addresses {
>>  FIX_EARLYCON_MEM_BASE,
>> +FIX_RELOCATE_INITRD,
>>  __end_of_permanent_fixed_addresses,
>> 
>>  FIX_KMAP_BEGIN = __end_of_permanent_fixed_addresses,
>> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
>> index 20edd34..4260d59 100644
>> --- a/arch/arm/kernel/setup.c
>> +++ b/arch/arm/kernel/setup.c
>> @@ -811,6 +811,77 @@ static void __init request_standard_resources(const 
>> struct machine_desc *mdesc)
>>  request_resource(_resource, );
>> }
>> 
>> +#if defined(CONFIG_BLK_DEV_INITRD) && defined(CONFIG_MMU)
>> +/*
>> + * Relocate initrd if it is not completely within the linear mapping.
>> + * This would be the case if mem= cuts out all or part of it
>> + * or the initrd are not in low mem region place.
>> + */
>> +static void __init relocate_initrd(void)
>> +{
>> +phys_addr_t orig_start = __virt_to_phys(initrd_start);
>> +phys_addr_t orig_end = __virt_to_phys(initrd_end);
> 
> If initrd_start or initrd_end are outside of the lowmem region, it's
> quite possible for these to return incorrect physical addresses.
> The generic kernel's idea of using virtual addresses for the initrd
> stuff is painfully wrong IMHO.
> 
> The unfortunate thing is that the DT code propagates this stuff:
> 
>initrd_start = (unsigned long)__va(start);
>initrd_end = (unsigned long)__va(end);
> 
> and even this can give wrong results for the virtual address when the
> physical is outside of lowmem.  For addresses outside of lowmem,
> __virt_to_phys(__va(start)) is not guaranteed to return 'start'.
> 
> This is why I've said that if we want to support ramdisks outside of
> the lowmem mapping, we need to get rid of the initrd_start/initrd_end
> virtual addresses.
> 
> I'm sorry, but we need much wider code changes before we can cope with
> this.
> 
>> +phys_addr_t ram_end = memblock_end_of_DRAM();
>> +phys_addr_t new_start;
>> +phys_addr_t src;
>> +unsigned long size, to_free = 0;
>> +unsigned long slop, clen, p;
>> +void *dest;
>> +
>> +if (orig_end <= memblock_get_current_limit())
>> +return;
>> +
>> +/*
>> + * Any of the original initrd which overlaps the linear map should
>> + * be freed after relocating.
> 
> How does this work?  The code in arm_memblock_init() will have already
> reserved the physical addresses for the ramdisk:
> 
>   memblock_reserve(phys_initrd_start, phys_initrd_size);
> 
> So any new allocation shouldn't overlap the existing ramdisk - unless
> this is wrong.
>  
I see, I will send a V2 patch for review.

Thanks






[RFC V2] arm: add relocate initrd support

2015-10-09 Thread yalin wang
Add support for relocating the initrd on the ARM arch: in case the
mem= boot option changes the memory size or the initrd is not placed
in the low memory region, we need to copy the initrd to the low
memory region.

Signed-off-by: yalin wang 
---
 arch/arm/include/asm/fixmap.h |  1 +
 arch/arm/kernel/setup.c   | 70 +++
 drivers/of/fdt.c  |  2 ++
 include/linux/initrd.h|  1 +
 init/do_mounts_initrd.c   |  2 ++
 5 files changed, 76 insertions(+)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 58cfe9f..18ad90f 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -10,6 +10,7 @@
 
 enum fixed_addresses {
FIX_EARLYCON_MEM_BASE,
+   FIX_RELOCATE_INITRD,
__end_of_permanent_fixed_addresses,
 
FIX_KMAP_BEGIN = __end_of_permanent_fixed_addresses,
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 20edd34..036473b 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -811,6 +811,75 @@ static void __init request_standard_resources(const struct 
machine_desc *mdesc)
request_resource(_resource, );
 }
 
+#if defined(CONFIG_BLK_DEV_INITRD) && defined(CONFIG_MMU)
+/*
+ * Relocate initrd if it is not completely within the linear mapping.
+ * This would be the case if mem= cuts out all or part of it
+ * or the initrd are not in low mem region place.
+ */
+static void __init relocate_initrd(void)
+{
+   phys_addr_t ram_end = memblock_end_of_DRAM();
+   phys_addr_t new_start;
+   phys_addr_t src;
+   unsigned long size, to_free = 0;
+   unsigned long slop, clen, p;
+   void *dest;
+
+   if (initrd_end_phys <= __virt_to_phys(memblock_get_current_limit()))
+   return;
+
+   /*
+* Any of the original initrd which overlaps the linear map should
+* be freed after relocating.
+*/
+   if (initrd_start_phys < ram_end)
+   to_free = min(ram_end, initrd_end_phys) - initrd_start_phys;
+
+   size = initrd_end_phys - initrd_start_phys;
+
+   /* initrd needs to be relocated completely inside linear mapping */
+   new_start = memblock_find_in_range(0, 0, size, PAGE_SIZE);
+   if (!new_start)
+   panic("Cannot relocate initrd of size %ld\n", size);
+   memblock_reserve(new_start, size);
+
+   initrd_start = __phys_to_virt(new_start);
+   initrd_end   = initrd_start + size;
+
+   pr_info("Moving initrd from [%llx-%llx] to [%llx-%llx]\n",
+   (unsigned long long)initrd_start_phys,
+   (unsigned long long)(initrd_start_phys + size - 1),
+   (unsigned long long)new_start,
+   (unsigned long long)(new_start + size - 1));
+
+   dest = (void *)initrd_start;
+
+   src = initrd_start_phys;
+   while (size) {
+   slop = src & ~PAGE_MASK;
+   clen = min(PAGE_SIZE - slop, size);
+   p = set_fixmap_offset(FIX_RELOCATE_INITRD, src);
+   memcpy(dest, (void *)p, clen);
+   clear_fixmap(FIX_RELOCATE_INITRD);
+   dest += clen;
+   src += clen;
+   size -= clen;
+   }
+
+   if (to_free) {
+   pr_info("Freeing original RAMDISK from [%llx-%llx]\n",
+   (unsigned long long)initrd_start_phys,
+   (unsigned long long)(initrd_start_phys + to_free - 1));
+   memblock_free(initrd_start_phys, to_free);
+   }
+}
+#else
+static inline void __init relocate_initrd(void)
+{
+}
+#endif
+
 #if defined(CONFIG_VGA_CONSOLE) || defined(CONFIG_DUMMY_CONSOLE)
 struct screen_info screen_info = {
  .orig_video_lines = 30,
@@ -969,6 +1038,7 @@ void __init setup_arch(char **cmdline_p)
arm_memblock_init(mdesc);
 
paging_init(mdesc);
+   relocate_initrd();
request_standard_resources(mdesc);
 
if (mdesc->restart)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 0749656..3287ecb 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -777,6 +777,8 @@ static void __init early_init_dt_check_for_initrd(unsigned 
long node)
return;
end = of_read_number(prop, len/4);
 
+   initrd_start_phys = start;
+   initrd_end_phys = end;
initrd_start = (unsigned long)__va(start);
initrd_end = (unsigned long)__va(end);
initrd_below_start_ok = 1;
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 55289d2..0698f37 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -15,6 +15,7 @@ extern int initrd_below_start_ok;
 
 /* free_initrd_mem always gets called with the next two as arguments.. */
 extern unsigned long initrd_start, initrd_end;
+extern phys_addr_t initrd_start_phys, initrd_end_phys;
 extern void free_initrd_mem(unsigned long, unsigned long);
 
 extern unsigned int real_root_dev;
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 

[Linux] Linux PID algorithm is BRAINDEAD!

2015-10-09 Thread Dave Goel
Pardon the subject line!  I think the PID algo. is actually pretty
good and cheap.


I just think that a very minor tweak could actually make it *actually* do
what it always intended to do (that is, satisfy the PID-AIM listed below)!
No expanded PID renumbering, no incompatibility introduction, nothing, just
a relatively *minor tweak*.


*** QUICK SUMMARY:


PROBLEM:  PID gets re-used immediately as soon as it is freed.

EXAMPLE: My program with PID 1323 uses temporary files that are of the form
PID-. It finishes in two days. By this time, Linux has circled
PIDs back around five times. And, the counter happens to be right at 1323
right as the program finishes. As soon as the PID is freed, Linux thus
re-uses the PID 1323. A trash-cleaner program (I know, I know) gets
confused. 1323 exited uncleanly. The cleaner sees no 1323 running. It then
proceeds to delete the temp files. But, by that time, the kernel started
another 1323 which happens to use the very files. The cleaner ends up
deleting files for the wrong, running program!

This is just one example. There are many more examples of race conditions.



A TINY TWEAK AS SOLUTION:

The kernel already tries to do the right thing. The only minor tweak needed
is that:

***When Linux Re-Uses A Pid, It Uses The Pid That Has Been Free For
The Longest.***

That's it!


RESULT:

All PID-related race conditions are eliminated, period. No "good enough"
hacks needed any more by utilities trying to avoid race conditions. A
freshly freed PID will never get re-used immediately any more . No more race
conditions. (The only time you will ever see an immediate re-use any more is
when your machine actually has 32767(!)  or (2^22-1) processes!  And, by
that time, you have FAR bigger problems.)





 DETAILS:


You don't have to google very much to see the 1000 algos and other bashisms
that exist to avoid the very race condition. For example, when you want to
read a PID list and delete temporary files based on a PID. The concern
is that Linux might have created a new process with the same PID by the
time you read the file-list.  We could argue endlessly that these bashisms
are stupid and there are better ways. But, it seems to me that these better
ways are not foolproof either; they are themselves hacks. And, a very simple
tweak could alleviate 100% of these problems.

Now, 32768, or even 2^22 are actually very small numbers. OTOH, 2^200 is
not. In an ideal world, the PID would just sample from the 2^200 space and
declare that there will never be PID re-use or conflicts, period. If there's
a 32-bit limit, it could use multiple bytes and still use a 2^200 space,
etc.  But, all that would be a drastic change, which is why we are stuck
with the 2^15 space.

Is there a way to continue using the 2^15 (or 2^22) space, but more
reasonably?

I argue that the kernel already tries to satisfy a "PID-AIM" and already
tries to Do The Right Thing, but there's just a tiny thinko in its current
implementation that's easily corrected.


PID-AIM:
"No immediate re-use." The aim is to NOT immediately re-use a PID right
after it's been freed.  I argue that this aim can easily be satisfied, even
within the limited 2^15 space.


CURRENT IMPLEMENTATION:
The creators had the right idea. Linux indeed tries to satisfy the PID-AIM
condition. This is why it increments the counter even if the PID is actually
available.

But, looping happens (within a few hours for me). And, as soon as looping
happens, satisfying the PID-AIM goes out the window.

This tweak would ensure that immediate re-use never happens, period.

COMPLEXITY, ETC:

All that the entire system needs is one queue of free PIDs. Any time you
need a PID, take it from the head. Any time a PID is newly freed, push it at
the back of the queue.  That's it! The overhead seems minimal to me.

The queue is initially populated by 2-32768, of course.
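
For illustration only, here is a minimal user-space sketch of such a
free-PID FIFO (hypothetical code, not from the kernel; it ignores locking,
PID namespaces and the 2^22 case):

#include <stdio.h>

#define PID_MAX		32768

static int free_pids[PID_MAX];
static int head, tail, nr_free;

static void init_free_pids(void)
{
	int pid;

	/* PIDs 0 and 1 are never handed out */
	for (pid = 2; pid < PID_MAX; pid++) {
		free_pids[tail] = pid;
		tail = (tail + 1) % PID_MAX;
		nr_free++;
	}
}

/* take the PID that has been free for the longest */
static int alloc_pid(void)
{
	int pid;

	if (!nr_free)
		return -1;	/* table genuinely full */
	pid = free_pids[head];
	head = (head + 1) % PID_MAX;
	nr_free--;
	return pid;
}

/* a freshly freed PID goes to the back of the queue, so it is re-used last */
static void release_pid(int pid)
{
	free_pids[tail] = pid;
	tail = (tail + 1) % PID_MAX;
	nr_free++;
}

int main(void)
{
	init_free_pids();
	printf("first pid: %d\n", alloc_pid());	/* 2 */
	release_pid(2);
	printf("next pid:  %d\n", alloc_pid());	/* 3, not 2 again */
	return 0;
}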

In fact, we could even use a smaller queue and not even activate the
queue-mode till it is actually necessary; we could use various optimizing
conditions and shortcuts, say, to push PIDs in the queue in batches. After
all, it's not STRICTLY necessary to maintain exactly the correct order. The
ONLY aim we really want to satisfy is that the latest-freed PID NOT go near
the *head* of the queue; that's all.  Also, the queue is only even needed in
the first place until you've actually looped around.  So, tiny rescue disk
bootups would not even need to go to queue-mode (unless they've been
running many days.)


(Thanks to jd_1 and _mjg for humoring and discussing this idea when I
presented it on ##kernel.)


Dave Goel (Please CC responses.)


Re: [RFC v2 5/7] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants

2015-10-09 Thread Boqun Feng
Hi Peter,

Sorry for replying late.

On Thu, Oct 01, 2015 at 02:27:16PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 16, 2015 at 11:49:33PM +0800, Boqun Feng wrote:
> > Unlike other atomic operation variants, cmpxchg{,64}_acquire and
> > atomic{,64}_cmpxchg_acquire don't have acquire semantics if the cmp part
> > fails, so we need to implement these using assembly.
> 
> I think that is actually expected and documented. That is, a cmpxchg
> only implies barriers on success. See:
> 
>   ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory 
> ordering semantics")

I probably didn't make myself clear here. My point is that if we use
__atomic_op_acquire() to build *_cmpxchg_acquire (for ARM and PowerPC),
the barrier will be implied _unconditionally_, meaning that no matter
whether the cmp fails or not, there will be a barrier after the cmpxchg
operation. Therefore we have to use assembly to implement the operations
right now.
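
For reference, the generic wrapper looks roughly like the following (a
simplified sketch from memory; the in-tree definition may differ in
details):

/*
 * Rough sketch of the generic acquire wrapper: the full barrier is
 * emitted after the relaxed op regardless of whether the cmpxchg
 * succeeded or failed, which is stronger than what the _acquire
 * variant needs on the failure path.
 */
#define __atomic_op_acquire(op, args...)				\
({									\
	typeof(op##_relaxed(args)) __ret = op##_relaxed(args);		\
	smp_mb__after_atomic();						\
	__ret;								\
})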

Regards,
Boqun




Re: [PATCH 1/5] thermal: exynos: Fix unbalanced regulator disable on probe failure

2015-10-09 Thread Alim Akhtar
Hello,

On Fri, Oct 9, 2015 at 4:28 PM, Krzysztof Kozlowski
 wrote:
> W dniu 09.10.2015 o 01:45, Alim Akhtar pisze:
>> Hello,
>>
>> On Thu, Oct 8, 2015 at 11:04 AM, Krzysztof Kozlowski
>>  wrote:
>>> During probe if the regulator could not be enabled, the error exit path
>>> would still disable it. This could lead to unbalanced counter of
>>> regulator enable/disable.
>>>
>> Do you see a regulator unbalanced reported here during boot? You may
>> want to add that to commit message.
>
> I did not see the warning/error message about unbalanced disable. It
> would happen in certain condition only - no other enables of regulator
> and count going below 0.
>
> I would have to simulate this error to get the warning message. I don't
> think it is worth the effort.
>
OK, looking at the code, it does look like regulator disable is called
when regulator enable fails.
Feel free to add
Reviewed-by: Alim Akhtar 
Thanks!!

> Best regards,
> Krzysztof
>
>>
>>> The patch moves code for getting and enabling the regulator from
>>> exynos_map_dt_data() to probe function because it is really not a part
>>> of getting Device Tree properties.
>>>
>>> Signed-off-by: Krzysztof Kozlowski 
>>> Fixes: 5f09a5cbd14a ("thermal: exynos: Disable the regulator on probe 
>>> failure")
>>> Cc: 
>>> ---
>>>  drivers/thermal/samsung/exynos_tmu.c | 34 
>>> +-
>>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/thermal/samsung/exynos_tmu.c 
>>> b/drivers/thermal/samsung/exynos_tmu.c
>>> index 0bae8cc6c23a..23f4320f8ef7 100644
>>> --- a/drivers/thermal/samsung/exynos_tmu.c
>>> +++ b/drivers/thermal/samsung/exynos_tmu.c
>>> @@ -1168,27 +1168,10 @@ static int exynos_map_dt_data(struct 
>>> platform_device *pdev)
>>> struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>>> struct exynos_tmu_platform_data *pdata;
>>> struct resource res;
>>> -   int ret;
>>>
>>> if (!data || !pdev->dev.of_node)
>>> return -ENODEV;
>>>
>>> -   /*
>>> -* Try enabling the regulator if found
>>> -* TODO: Add regulator as an SOC feature, so that regulator enable
>>> -* is a compulsory call.
>>> -*/
>>> -   data->regulator = devm_regulator_get(>dev, "vtmu");
>>> -   if (!IS_ERR(data->regulator)) {
>>> -   ret = regulator_enable(data->regulator);
>>> -   if (ret) {
>>> -   dev_err(>dev, "failed to enable vtmu\n");
>>> -   return ret;
>>> -   }
>>> -   } else {
>>> -   dev_info(>dev, "Regulator node (vtmu) not found\n");
>>> -   }
>>> -
>>> data->id = of_alias_get_id(pdev->dev.of_node, "tmuctrl");
>>> if (data->id < 0)
>>> data->id = 0;
>>> @@ -1312,6 +1295,23 @@ static int exynos_tmu_probe(struct platform_device 
>>> *pdev)
>>> pr_err("thermal: tz: %p ERROR\n", data->tzd);
>>> return PTR_ERR(data->tzd);
>>> }
>>> +
>>> +   /*
>>> +* Try enabling the regulator if found
>>> +* TODO: Add regulator as an SOC feature, so that regulator enable
>>> +* is a compulsory call.
>>> +*/
>>> +   data->regulator = devm_regulator_get(&pdev->dev, "vtmu");
>>> +   if (!IS_ERR(data->regulator)) {
>>> +   ret = regulator_enable(data->regulator);
>>> +   if (ret) {
>>> +   dev_err(&pdev->dev, "failed to enable vtmu\n");
>>> +   return ret;
>>> +   }
>>> +   } else {
>>> +   dev_info(&pdev->dev, "Regulator node (vtmu) not found\n");
>>> +   }
>>> +
>>> ret = exynos_map_dt_data(pdev);
>>> if (ret)
>>> goto err_sensor;
>>> --
>>> 1.9.1
>>>
>>>
>>> ___
>>> linux-arm-kernel mailing list
>>> linux-arm-ker...@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
>>
>



-- 
Regards,
Alim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/2] clk: imx6: Add SPDIF_GCLK clock in clock tree

2015-10-09 Thread Shengjiu Wang
On Sat, Oct 10, 2015 at 09:11:55AM +0800, Shawn Guo wrote:
> On Fri, Oct 09, 2015 at 05:15:30PM +0800, Shengjiu Wang wrote:
> > SPDIF_GCLK is also spdif's clock; it uses the same enable bit as
> > SPDIF_ROOT_CLK,
> > but we didn't separate them in the clock tree before.
> 
> Is it the clock described as "Global clock" in Reference Manual, SPDIF
Yes.
> chapter?  If that's the case, you are just adding a missing SPDIF clock
> rather than fixing a low power mode issue, and I will be fine.  But
> still you should reword the commit log to make it clear that the patch
> is to correct a SPDIF clock setting issue, which was just discovered by
> low power mode support.
> 
Ok, I will refine the patch comments, and send it later.

> Shawn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] namei: results of d_is_negative() should be checked after dentry revalidation

2015-10-09 Thread Al Viro
On Fri, Oct 09, 2015 at 05:19:02PM -0700, Linus Torvalds wrote:

> So in general, we should always either (a) verify all sequence points
> or (b) return -ECHILD to go into slow mode. The patch seems
> 
> However, this thing was explicitly made to be this way by commit
> 766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
> validation"), so while my gut feel is to consider this fix
> ObviouslyCorrect(tm), I will delay it a bit in the hope to get an ACK
> and comment from Al about the patch.
> 
> Al?

Umm...  I agree that the current version is wrong and it looks like this
patch is a complete fix.  The only problem is the commit message -
what really happens is that 766c4cbfacd8 got things subtly wrong.
We used to treat d_is_negative() after lookup_fast() as "fail with ENOENT".
That was wrong - checking ->d_flags outside of ->d_seq protection is
unreliable and failing with a hard error on what should've fallen back to
non-RCU pathname resolution is a bug.

Unfortunately, we'd pulled the test too far up and ran afoul of another
kind of staleness.  Dentry might have been absolutely stable from the
RCU point of view (and we might be on UP, etc.), but stale from the
remote fs point of view.  If ->d_revalidate() returns "it's actually
stale", dentry gets thrown away and original code wouldn't even have looked
at its ->d_flags.  What we need is to check ->d_flags where 766c4cbfacd8 does
(prior to ->d_seq validation) but only use the result in cases where we
do not discard this dentry outright.
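
In other words, the intended ordering is roughly this (a sketch, not the
actual fs/namei.c code; negative() stands in for the d_flags test):

	int status;
	unsigned dflags = READ_ONCE(dentry->d_flags);	/* sampled before ->d_seq check */

	if (read_seqcount_retry(&dentry->d_seq, seq))
		return -ECHILD;				/* unreliable: retry in ref-walk mode */

	status = d_revalidate(dentry, nd->flags);
	if (status <= 0)
		return status;				/* stale dentry is discarded; its
							 * d_flags are never consulted */
	if (negative(dflags))
		return -ENOENT;				/* only now do we trust "negative" */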

With some explanation along the lines of the above added, consider the patch
ACKed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC V3] regmap: change bool to 1 bit variable in struct regmap

2015-10-09 Thread yalin wang
Hi,

I have synced this branch,
and I see my patch has already been merged :)
It seems correct.

Thanks
> On Oct 9, 2015, at 19:35, Mark Brown  wrote:
> 
> On Fri, Oct 09, 2015 at 03:51:22PM +0800, yalin wang wrote:
>> This patch changes some bool variables in struct regmap {  }
>> to be u8 v : 1 bitfields, so that we can shrink the size of struct regmap.
> 
> This still doesn't apply against current code - I'm looking for
> something that applies at least against
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git for-next

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] workqueue: Allocate the unbound pool using local node memory

2015-10-09 Thread pang . xunlei
Hello,

"Hillf Danton"  wrote 2015-10-09 PM 06:05:20:
> 
> Re: [PATCH] workqueue: Allocate the unbound pool using local node memory
> 
> > From: Xunlei Pang 
> > 
> > Currently, get_unbound_pool() uses kzalloc() to allocate the
> > worker pool. Actually, we can use the right node to do the
> > allocation, achieving local memory access.
> > 
> > This patch selects target node first, and uses kzalloc_node()
> > instead.
> > 
> > Signed-off-by: Xunlei Pang 
> > ---
> >  kernel/workqueue.c | 26 ++
> >  1 file changed, 14 insertions(+), 12 deletions(-)
> > 
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index ca71582..96d3747 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -3199,6 +3199,7 @@ static struct worker_pool *get_unbound_pool
> (const struct workqueue_attrs *attrs)
> > u32 hash = wqattrs_hash(attrs);
> > struct worker_pool *pool;
> > int node;
> > +   int target_node = NUMA_NO_NODE;
> > 
> > lockdep_assert_held(&wq_pool_mutex);
> > 
> > @@ -3210,13 +3211,25 @@ static struct worker_pool 
> *get_unbound_pool(const struct workqueue_attrs *attrs)
> >}
> > }
> > 
> > +   /* if cpumask is contained inside a NUMA node, we belong to that 
node */
> > +   if (wq_numa_enabled) {
> > +  for_each_node(node) {
> > + if (cpumask_subset(attrs->cpumask,
> > +  wq_numa_possible_cpumask[node])) {
> > +target_node = node;
> > +break;
> > + }
> > +  }
> > +   }
> > +
> > /* nope, create a new one */
> > -   pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> > +   pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, target_node);
> 
> What if target_node is short of pages at the moment?

IIRC, the zonelist to be used contains all nodes' zones in order
(unless __GFP_THISNODE is set), with target_node's zones having
the highest priority, so the allocation should automatically fall back to
other nodes if target_node is short of pages.
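
As a small illustration (assuming a plain GFP_KERNEL allocation, i.e. no
__GFP_THISNODE):

	pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, target_node);
	/*
	 * target_node's zones are tried first; if they are short of pages
	 * the allocation falls back through the zonelist to other nodes,
	 * and NUMA_NO_NODE simply means "no placement preference".
	 */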

Regards,
-Xunlei

> 
> > if (!pool || init_worker_pool(pool) < 0)
> >goto fail;
> > 
> > lockdep_set_subclass(&pool->lock, 1);   /* see put_pwq() */
> > copy_workqueue_attrs(pool->attrs, attrs);
> > +   pool->node = target_node;
> > 
> > /*
> >  * no_numa isn't a worker_pool attribute, always clear it.  See
> > @@ -3224,17 +3237,6 @@ static struct worker_pool *get_unbound_pool
> (const struct workqueue_attrs *attrs)
> >  */
> > pool->attrs->no_numa = false;
> > 
> > -   /* if cpumask is contained inside a NUMA node, we belong to that 
node */
> > -   if (wq_numa_enabled) {
> > -  for_each_node(node) {
> > - if (cpumask_subset(pool->attrs->cpumask,
> > -  wq_numa_possible_cpumask[node])) {
> > -pool->node = node;
> > -break;
> > - }
> > -  }
> > -   }
> > -
> > if (worker_pool_assign_id(pool) < 0)
> >goto fail;
> > 
> > --
> > 1.9.1
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] (swiotlb) stable/for-linus-4.3

2015-10-09 Thread Konrad Rzeszutek Wilk

Hey Linus,

Please git pull the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git  
stable/for-linus-4.3

which enables the SWIOTLB under 32-bit PAE kernels. Nowadays most
distros enable this due to CONFIG_HYPERVISOR|CONFIG_XEN=y, which
selects SWIOTLB. But for those that are not interested in virtualization,
want to use 32-bit PAE kernels, and want working DMA operations
- this configures it for them.


 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Christian Melki (1):
  swiotlb: Enable it under x86 PAE

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/2] clk: imx6: Add SPDIF_GCLK clock in clock tree

2015-10-09 Thread Shawn Guo
On Fri, Oct 09, 2015 at 05:15:30PM +0800, Shengjiu Wang wrote:
> SPDIF_GCLK is also spdif's clock; it uses the same enable bit as
> SPDIF_ROOT_CLK,
> but we didn't separate them in the clock tree before.

Is it the clock described as "Global clock" in Reference Manual, SPDIF
chapter?  If that's the case, you are just adding a missing SPDIF clock
rather than fixing a low power mode issue, and I will be fine.  But
still you should reword the commit log to make it clear that the patch
is to correct a SPDIF clock setting issue, which was just discovered by
low power mode support.

Shawn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 09/20] frv: fix compiler warning from definition of __pmd()

2015-10-09 Thread Dan Williams
Take into account that the pmd_t type is an array inside a struct, so it
needs two levels of braces to initialize.  Otherwise, a usage of __pmd()
generates a warning:

include/linux/mm.h:986:2: warning: missing braces around initializer 
[-Wmissing-braces]

Signed-off-by: Dan Williams 
---
 arch/frv/include/asm/page.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/frv/include/asm/page.h b/arch/frv/include/asm/page.h
index 8c97068ac8fc..688d8076a43a 100644
--- a/arch/frv/include/asm/page.h
+++ b/arch/frv/include/asm/page.h
@@ -34,7 +34,7 @@ typedef struct page *pgtable_t;
 #define pgprot_val(x)  ((x).pgprot)
 
 #define __pte(x)   ((pte_t) { (x) } )
-#define __pmd(x)   ((pmd_t) { (x) } )
+#define __pmd(x)   ((pmd_t) { { (x) } } )
 #define __pud(x)   ((pud_t) { (x) } )
 #define __pgd(x)   ((pgd_t) { (x) } )
 #define __pgprot(x)((pgprot_t) { (x) } )

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 01/20] block: generic request_queue reference counting

2015-10-09 Thread Dan Williams
Allow pmem, and other synchronous/bio-based block drivers, to fall back
on a per-cpu reference count managed by the core for tracking queue
live/dead state.

The existing per-cpu reference count for the blk_mq case is promoted to
be used in all block i/o scenarios.  This involves initializing it by
default, waiting for it to drop to zero at exit, and holding a live
reference over the invocation of q->make_request_fn() in
generic_make_request().  The blk_mq code continues to take its own
reference per blk_mq request and retains the ability to freeze the
queue, but the check that the queue is frozen is moved to
generic_make_request().
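
The resulting guard around ->make_request_fn() looks roughly like this
(a sketch of the generic_make_request() change in the hunk below; the
failure leg is assumed to simply fail the bio):

	struct request_queue *q = bdev_get_queue(bio->bi_bdev);

	if (likely(blk_queue_enter(q, __GFP_WAIT) == 0)) {
		q->make_request_fn(q, bio);	/* queue is guaranteed live here */
		blk_queue_exit(q);		/* drop the q_usage_counter reference */
	} else {
		bio_io_error(bio);		/* queue dying or frozen */
	}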

This fixes crash signatures like the following:

 BUG: unable to handle kernel paging request at 88014000
 [..]
 Call Trace:
  [] ? copy_user_handle_tail+0x5f/0x70
  [] pmem_do_bvec.isra.11+0x70/0xf0 [nd_pmem]
  [] pmem_make_request+0xd1/0x200 [nd_pmem]
  [] ? mempool_alloc+0x72/0x1a0
  [] generic_make_request+0xd6/0x110
  [] submit_bio+0x76/0x170
  [] submit_bh_wbc+0x12f/0x160
  [] submit_bh+0x12/0x20
  [] jbd2_write_superblock+0x8d/0x170
  [] jbd2_mark_journal_empty+0x5d/0x90
  [] jbd2_journal_destroy+0x24b/0x270
  [] ? put_pwq_unlocked+0x2a/0x30
  [] ? destroy_workqueue+0x225/0x250
  [] ext4_put_super+0x64/0x360
  [] generic_shutdown_super+0x6a/0xf0

Cc: Jens Axboe 
Cc: Keith Busch 
Cc: Ross Zwisler 
Suggested-by: Christoph Hellwig 
Signed-off-by: Dan Williams 
---
 block/blk-core.c   |   71 +--
 block/blk-mq-sysfs.c   |6 
 block/blk-mq.c |   80 ++--
 block/blk-sysfs.c  |3 +-
 block/blk.h|   14 
 include/linux/blk-mq.h |1 -
 include/linux/blkdev.h |2 +
 7 files changed, 102 insertions(+), 75 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2eb722d48773..9b4d735cb5b8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -554,13 +554,10 @@ void blk_cleanup_queue(struct request_queue *q)
 * Drain all requests queued before DYING marking. Set DEAD flag to
 * prevent that q->request_fn() gets invoked after draining finished.
 */
-   if (q->mq_ops) {
-   blk_mq_freeze_queue(q);
-   spin_lock_irq(lock);
-   } else {
-   spin_lock_irq(lock);
+   blk_freeze_queue(q);
+   spin_lock_irq(lock);
+   if (!q->mq_ops)
__blk_drain_queue(q, true);
-   }
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
 
@@ -570,6 +567,7 @@ void blk_cleanup_queue(struct request_queue *q)
 
if (q->mq_ops)
blk_mq_free_queue(q);
+   percpu_ref_exit(&q->q_usage_counter);
 
spin_lock_irq(lock);
if (q->queue_lock != &q->__queue_lock)
@@ -629,6 +627,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
+int blk_queue_enter(struct request_queue *q, gfp_t gfp)
+{
+   while (true) {
+   int ret;
+
+   if (percpu_ref_tryget_live(&q->q_usage_counter))
+   return 0;
+
+   if (!(gfp & __GFP_WAIT))
+   return -EBUSY;
+
+   ret = wait_event_interruptible(q->mq_freeze_wq,
+   !atomic_read(&q->mq_freeze_depth) ||
+   blk_queue_dying(q));
+   if (blk_queue_dying(q))
+   return -ENODEV;
+   if (ret)
+   return ret;
+   }
+}
+
+void blk_queue_exit(struct request_queue *q)
+{
+   percpu_ref_put(&q->q_usage_counter);
+}
+
+static void blk_queue_usage_counter_release(struct percpu_ref *ref)
+{
+   struct request_queue *q =
+   container_of(ref, struct request_queue, q_usage_counter);
+
+   wake_up_all(&q->mq_freeze_wq);
+}
+
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 {
struct request_queue *q;
@@ -690,11 +722,22 @@ struct request_queue *blk_alloc_queue_node(gfp_t 
gfp_mask, int node_id)
 
init_waitqueue_head(&q->mq_freeze_wq);
 
-   if (blkcg_init_queue(q))
+   /*
+* Init percpu_ref in atomic mode so that it's faster to shutdown.
+* See blk_register_queue() for details.
+*/
+   if (percpu_ref_init(&q->q_usage_counter,
+   blk_queue_usage_counter_release,
+   PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
goto fail_bdi;
 
+   if (blkcg_init_queue(q))
+   goto fail_ref;
+
return q;
 
+fail_ref:
+   percpu_ref_exit(&q->q_usage_counter);
 fail_bdi:
bdi_destroy(&q->backing_dev_info);
 fail_split:
@@ -1966,9 +2009,19 @@ void generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
-   q->make_request_fn(q, bio);
+   if (likely(blk_queue_enter(q, __GFP_WAIT) == 0)) {
+
+

[PATCH v2 10/20] um: kill pfn_t

2015-10-09 Thread Dan Williams
The core has developed a need for a "pfn_t" type [1].  Convert the usage
of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
to drop its expectation of a typed pfn.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html

Cc: Dave Hansen 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Signed-off-by: Dan Williams 
---
 arch/um/include/asm/page.h   |6 +++---
 arch/um/include/asm/pgtable-3level.h |4 ++--
 arch/um/include/asm/pgtable.h|2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/um/include/asm/page.h b/arch/um/include/asm/page.h
index 71c5d132062a..fe26a5e06268 100644
--- a/arch/um/include/asm/page.h
+++ b/arch/um/include/asm/page.h
@@ -18,6 +18,7 @@
 
 struct page;
 
+#include 
 #include 
 #include 
 
@@ -76,7 +77,6 @@ typedef struct { unsigned long pmd; } pmd_t;
 #define pte_is_zero(p) (!((p).pte & ~_PAGE_NEWPAGE))
 #define pte_set_val(p, phys, prot) (p).pte = (phys | pgprot_val(prot))
 
-typedef unsigned long pfn_t;
 typedef unsigned long phys_t;
 
 #endif
@@ -109,8 +109,8 @@ extern unsigned long uml_physmem;
 #define __pa(virt) to_phys((void *) (unsigned long) (virt))
 #define __va(phys) to_virt((unsigned long) (phys))
 
-#define phys_to_pfn(p) ((pfn_t) ((p) >> PAGE_SHIFT))
-#define pfn_to_phys(pfn) ((phys_t) ((pfn) << PAGE_SHIFT))
+#define phys_to_pfn(p) ((p) >> PAGE_SHIFT)
+#define pfn_to_phys(pfn) PFN_PHYS(pfn)
 
 #define pfn_valid(pfn) ((pfn) < max_mapnr)
 #define virt_addr_valid(v) pfn_valid(phys_to_pfn(__pa(v)))
diff --git a/arch/um/include/asm/pgtable-3level.h 
b/arch/um/include/asm/pgtable-3level.h
index 2b4274e7c095..bae8523a162f 100644
--- a/arch/um/include/asm/pgtable-3level.h
+++ b/arch/um/include/asm/pgtable-3level.h
@@ -98,7 +98,7 @@ static inline unsigned long pte_pfn(pte_t pte)
return phys_to_pfn(pte_val(pte));
 }
 
-static inline pte_t pfn_pte(pfn_t page_nr, pgprot_t pgprot)
+static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
pte_t pte;
phys_t phys = pfn_to_phys(page_nr);
@@ -107,7 +107,7 @@ static inline pte_t pfn_pte(pfn_t page_nr, pgprot_t pgprot)
return pte;
 }
 
-static inline pmd_t pfn_pmd(pfn_t page_nr, pgprot_t pgprot)
+static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
return __pmd((page_nr << PAGE_SHIFT) | pgprot_val(pgprot));
 }
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index 18eb9924dda3..7485398d0737 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -271,7 +271,7 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
 
 #define phys_to_page(phys) pfn_to_page(phys_to_pfn(phys))
 #define __virt_to_page(virt) phys_to_page(__pa(virt))
-#define page_to_phys(page) pfn_to_phys((pfn_t) page_to_pfn(page))
+#define page_to_phys(page) pfn_to_phys(page_to_pfn(page))
 #define virt_to_page(addr) __virt_to_page((const unsigned long) addr)
 
 #define mk_pte(page, pgprot) \

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 08/20] hugetlb: fix compile error on tile

2015-10-09 Thread Dan Williams
Include asm/pgtable.h to get the definition of pud_t and fix:

include/linux/hugetlb.h:203:29: error: unknown type name 'pud_t'

Signed-off-by: Dan Williams 
---
 include/linux/hugetlb.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5e35379f58a5..ad5539cf52bf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct ctl_table;
 struct user_struct;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 02/20] dax: increase granularity of dax_clear_blocks() operations

2015-10-09 Thread Dan Williams
dax_clear_blocks() currently performs a cond_resched() after every
PAGE_SIZE memset.  We need not check so frequently; for example, md-raid
only calls cond_resched() at stripe granularity.  Also, in preparation
for introducing a dax_map_atomic() operation that temporarily pins a dax
mapping, move the call to cond_resched() to the outer loop.

Signed-off-by: Dan Williams 
---
 fs/dax.c |   27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index cc9a6e3d7389..7031b0312596 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * dax_clear_blocks() is called from within transaction context from XFS,
@@ -43,24 +44,20 @@ int dax_clear_blocks(struct inode *inode, sector_t block, 
long size)
do {
void __pmem *addr;
unsigned long pfn;
-   long count;
+   long count, sz;
 
-   count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
+   sz = min_t(long, size, SZ_1M);
+   count = bdev_direct_access(bdev, sector, &addr, &pfn, sz);
if (count < 0)
return count;
-   BUG_ON(size < count);
-   while (count > 0) {
-   unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
-   if (pgsz > count)
-   pgsz = count;
-   clear_pmem(addr, pgsz);
-   addr += pgsz;
-   size -= pgsz;
-   count -= pgsz;
-   BUG_ON(pgsz & 511);
-   sector += pgsz / 512;
-   cond_resched();
-   }
+   if (count < sz)
+   sz = count;
+   clear_pmem(addr, sz);
+   addr += sz;
+   size -= sz;
+   BUG_ON(sz & 511);
+   sector += sz / 512;
+   cond_resched();
} while (size);
 
wmb_pmem();

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 03/20] block, dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()

2015-10-09 Thread Dan Williams
The DAX implementation needs to protect new calls to ->direct_access()
and usage of its return value against unbind of the underlying block
device.  Use blk_queue_enter()/blk_queue_exit() to either prevent
blk_cleanup_queue() from proceeding, or fail the dax_map_atomic() if the
request_queue is being torn down.
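
The resulting usage pattern inside fs/dax.c is then (simplified sketch):

	addr = dax_map_atomic(bdev, to_sector(bh, inode), bh->b_size);
	if (IS_ERR(addr))
		return PTR_ERR(addr);
	/* ... access persistent memory while the queue is pinned ... */
	dax_unmap_atomic(bdev, addr);	/* drops the blk_queue_enter() reference */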

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Boaz Harrosh 
Cc: Dave Chinner 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 block/blk.h|2 -
 fs/dax.c   |  165 
 include/linux/blkdev.h |2 +
 3 files changed, 112 insertions(+), 57 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 5b2cd393afbe..0f8de0dda768 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -72,8 +72,6 @@ void blk_dequeue_request(struct request *rq);
 void __blk_queue_free_tags(struct request_queue *q);
 bool __blk_end_bidi_request(struct request *rq, int error,
unsigned int nr_bytes, unsigned int bidi_bytes);
-int blk_queue_enter(struct request_queue *q, gfp_t gfp);
-void blk_queue_exit(struct request_queue *q);
 void blk_freeze_queue(struct request_queue *q);
 
 static inline void blk_queue_enter_live(struct request_queue *q)
diff --git a/fs/dax.c b/fs/dax.c
index 7031b0312596..9549cd523649 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -30,6 +30,40 @@
 #include 
 #include 
 
+static void __pmem *__dax_map_atomic(struct block_device *bdev, sector_t 
sector,
+   long size, unsigned long *pfn, long *len)
+{
+   long rc;
+   void __pmem *addr;
+   struct request_queue *q = bdev->bd_queue;
+
+   if (blk_queue_enter(q, GFP_NOWAIT) != 0)
+   return (void __pmem *) ERR_PTR(-EIO);
+   rc = bdev_direct_access(bdev, sector, &addr, pfn, size);
+   if (len)
+   *len = rc;
+   if (rc < 0) {
+   blk_queue_exit(q);
+   return (void __pmem *) ERR_PTR(rc);
+   }
+   return addr;
+}
+
+static void __pmem *dax_map_atomic(struct block_device *bdev, sector_t sector,
+   long size)
+{
+   unsigned long pfn;
+
+   return __dax_map_atomic(bdev, sector, size, &pfn, NULL);
+}
+
+static void dax_unmap_atomic(struct block_device *bdev, void __pmem *addr)
+{
+   if (IS_ERR(addr))
+   return;
+   blk_queue_exit(bdev->bd_queue);
+}
+
 /*
  * dax_clear_blocks() is called from within transaction context from XFS,
  * and hence this means the stack from this point must follow GFP_NOFS
@@ -47,9 +81,9 @@ int dax_clear_blocks(struct inode *inode, sector_t block, 
long size)
long count, sz;
 
sz = min_t(long, size, SZ_1M);
-   count = bdev_direct_access(bdev, sector, &addr, &pfn, sz);
-   if (count < 0)
-   return count;
+   addr = __dax_map_atomic(bdev, sector, size, &pfn, &count);
+   if (IS_ERR(addr))
+   return PTR_ERR(addr);
if (count < sz)
sz = count;
clear_pmem(addr, sz);
@@ -57,6 +91,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, 
long size)
size -= sz;
BUG_ON(sz & 511);
sector += sz / 512;
+   dax_unmap_atomic(bdev, addr);
cond_resched();
} while (size);
 
@@ -65,14 +100,6 @@ int dax_clear_blocks(struct inode *inode, sector_t block, 
long size)
 }
 EXPORT_SYMBOL_GPL(dax_clear_blocks);
 
-static long dax_get_addr(struct buffer_head *bh, void __pmem **addr,
-   unsigned blkbits)
-{
-   unsigned long pfn;
-   sector_t sector = bh->b_blocknr << (blkbits - 9);
-   return bdev_direct_access(bh->b_bdev, sector, addr, &pfn, bh->b_size);
-}
-
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
loff_t pos, loff_t end)
@@ -102,19 +129,30 @@ static bool buffer_size_valid(struct buffer_head *bh)
return bh->b_state != 0;
 }
 
+
+static sector_t to_sector(const struct buffer_head *bh,
+   const struct inode *inode)
+{
+   sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
+
+   return sector;
+}
+
 static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
  loff_t start, loff_t end, get_block_t get_block,
  struct buffer_head *bh)
 {
-   ssize_t retval = 0;
-   loff_t pos = start;
-   loff_t max = start;
-   loff_t bh_max = start;
-   void __pmem *addr;
+   loff_t pos = start, max = start, bh_max = start;
+   struct block_device *bdev = NULL;
+   int rw = iov_iter_rw(iter), rc;
+   long map_len = 0;
+   unsigned long pfn;
+   void __pmem *addr = NULL;
+   void __pmem *kmap = (void __pmem *) ERR_PTR(-EIO);
bool hole = false;
bool need_wmb = false;
 
-   if (iov_iter_rw(iter) != WRITE)
+   if (rw == READ)

[PATCH v2 11/20] kvm: rename pfn_t to kvm_pfn_t

2015-10-09 Thread Dan Williams
The core has developed a need for a "pfn_t" type [1].  Move the existing
pfn_t in KVM to kvm_pfn_t [2].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html

Cc: Dave Hansen 
Cc: Gleb Natapov 
Cc: Paolo Bonzini 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Ralf Baechle 
Cc: Alexander Graf 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Signed-off-by: Dan Williams 
---
 arch/arm/include/asm/kvm_mmu.h|5 ++--
 arch/arm/kvm/mmu.c|   10 ---
 arch/arm64/include/asm/kvm_mmu.h  |3 +-
 arch/mips/include/asm/kvm_host.h  |6 ++--
 arch/mips/kvm/emulate.c   |2 +
 arch/mips/kvm/tlb.c   |   14 +-
 arch/powerpc/include/asm/kvm_book3s.h |4 +--
 arch/powerpc/include/asm/kvm_ppc.h|2 +
 arch/powerpc/kvm/book3s.c |6 ++--
 arch/powerpc/kvm/book3s_32_mmu_host.c |2 +
 arch/powerpc/kvm/book3s_64_mmu_host.c |2 +
 arch/powerpc/kvm/e500.h   |2 +
 arch/powerpc/kvm/e500_mmu_host.c  |8 +++---
 arch/powerpc/kvm/trace_pr.h   |2 +
 arch/x86/kvm/iommu.c  |   11 
 arch/x86/kvm/mmu.c|   37 +-
 arch/x86/kvm/mmu_audit.c  |2 +
 arch/x86/kvm/paging_tmpl.h|6 ++--
 arch/x86/kvm/vmx.c|2 +
 arch/x86/kvm/x86.c|2 +
 include/linux/kvm_host.h  |   37 +-
 include/linux/kvm_types.h |2 +
 virt/kvm/kvm_main.c   |   47 +
 23 files changed, 110 insertions(+), 104 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 405aa1883307..8ebd282dfc2b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -182,7 +182,8 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t 
pfn,
+static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
+  kvm_pfn_t pfn,
   unsigned long size,
   bool ipa_uncached)
 {
@@ -246,7 +247,7 @@ static inline void __kvm_flush_dcache_pte(pte_t pte)
 static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
 {
unsigned long size = PMD_SIZE;
-   pfn_t pfn = pmd_pfn(pmd);
+   kvm_pfn_t pfn = pmd_pfn(pmd);
 
while (size) {
void *va = kmap_atomic_pfn(pfn);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6984342da13d..e2dcbfdc4a8c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -988,9 +988,9 @@ out:
return ret;
 }
 
-static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
+static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 {
-   pfn_t pfn = *pfnp;
+   kvm_pfn_t pfn = *pfnp;
gfn_t gfn = *ipap >> PAGE_SHIFT;
 
if (PageTransCompound(pfn_to_page(pfn))) {
@@ -1202,7 +1202,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm 
*kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
+static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
  unsigned long size, bool uncached)
 {
__coherent_cache_guest_page(vcpu, pfn, size, uncached);
@@ -1219,7 +1219,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
-   pfn_t pfn;
+   kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool fault_ipa_uncached;
bool logging_active = memslot_is_logging(memslot);
@@ -1347,7 +1347,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa)
 {
pmd_t *pmd;
pte_t *pte;
-   pfn_t pfn;
+   kvm_pfn_t pfn;
bool pfn_valid = false;
 
trace_kvm_access_fault(fault_ipa);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 61505676d085..385fc8cef82d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -230,7 +230,8 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t 
pfn,
+static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
+  

[PATCH v2 04/20] mm: introduce __get_dev_pagemap()

2015-10-09 Thread Dan Williams
There are several scenarios where we need to retrieve and update
metadata associated with a given devm_memremap_pages() mapping, and the
only lookup key available is a pfn in the range:

1/ We want to augment vmemmap_populate() (called via arch_add_memory())
   to allocate memmap storage from pre-allocated pages reserved by the
   device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
   rather than page allocator pages.  This is in support of
   devm_memremap_pages() mappings where the memmap is too large to fit in
   main memory (i.e. large persistent memory devices).

2/ Taking a reference against the mapping when inserting device pages
   into the address_space radix of a given inode.  This facilitates
   unmap_mapping_range() and truncate_inode_pages() operations when the
   driver is tearing down the mapping.

3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
   reference against the mapping so that the driver teardown path can
   revoke and drain usage of device pages.
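
All three scenarios reduce to the same lookup; a minimal sketch of the
expected caller pattern (the rcu_read_lock() requirement matches the
WARN_ON_ONCE() below):

	struct dev_pagemap *pgmap;
	struct device *dev = NULL;

	rcu_read_lock();
	pgmap = __get_dev_pagemap(pfn << PAGE_SHIFT);	/* pfn -> owning mapping, or NULL */
	if (pgmap)
		dev = pgmap->dev;	/* the driver that established the range */
	rcu_read_unlock();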

Cc: Christoph Hellwig 
Cc: Dave Chinner 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 include/linux/mm.h |   18 ++
 kernel/memremap.c  |   40 
 2 files changed, 58 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80001de019ba..30c3c8764649 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -717,6 +717,24 @@ static inline enum zone_type page_zonenum(const struct 
page *page)
return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
+/**
+ * struct dev_pagemap - metadata for ZONE_DEVICE mappings
+ * @dev: host device of the mapping for debug
+ */
+struct dev_pagemap {
+   /* TODO: vmem_altmap and percpu_ref count */
+   struct device *dev;
+};
+
+#ifdef CONFIG_ZONE_DEVICE
+struct dev_pagemap *__get_dev_pagemap(resource_size_t phys);
+#else
+static inline struct dev_pagemap *get_dev_pagemap(resource_size_t phys)
+{
+   return NULL;
+}
+#endif
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 3218e8b1fc28..64bfd9fa93aa 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -10,6 +10,7 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  * General Public License for more details.
  */
+#include 
 #include 
 #include 
 #include 
@@ -138,18 +139,52 @@ void devm_memunmap(struct device *dev, void *addr)
 EXPORT_SYMBOL(devm_memunmap);
 
 #ifdef CONFIG_ZONE_DEVICE
+static LIST_HEAD(ranges);
+static DEFINE_SPINLOCK(range_lock);
+
 struct page_map {
struct resource res;
+   struct dev_pagemap pgmap;
+   struct list_head list;
 };
 
+static void add_page_map(struct page_map *page_map)
+{
+   spin_lock(&range_lock);
+   list_add_rcu(&page_map->list, &ranges);
+   spin_unlock(&range_lock);
+}
+
+static void del_page_map(struct page_map *page_map)
+{
+   spin_lock(&range_lock);
+   list_del_rcu(&page_map->list);
+   spin_unlock(&range_lock);
+}
+
 static void devm_memremap_pages_release(struct device *dev, void *res)
 {
struct page_map *page_map = res;
 
+   del_page_map(page_map);
+
/* pages are dead and unused, undo the arch mapping */
arch_remove_memory(page_map->res.start, resource_size(&page_map->res));
 }
 
+/* assumes rcu_read_lock() held at entry */
+struct dev_pagemap *__get_dev_pagemap(resource_size_t phys)
+{
+   struct page_map *page_map;
+
+   WARN_ON_ONCE(!rcu_read_lock_held());
+
+   list_for_each_entry_rcu(page_map, &ranges, list)
+   if (phys >= page_map->res.start && phys <= page_map->res.end)
+   return &page_map->pgmap;
+   return NULL;
+}
+
 void *devm_memremap_pages(struct device *dev, struct resource *res)
 {
int is_ram = region_intersects(res->start, resource_size(res),
@@ -173,12 +208,17 @@ void *devm_memremap_pages(struct device *dev, struct 
resource *res)
 
memcpy(&page_map->res, res, sizeof(*res));
 
+   page_map->pgmap.dev = dev;
+   INIT_LIST_HEAD(&page_map->list);
+   add_page_map(page_map);
+
nid = dev_to_node(dev);
if (nid < 0)
nid = numa_mem_id();
 
error = arch_add_memory(nid, res->start, resource_size(res), true);
if (error) {
+   del_page_map(page_map);
devres_free(page_map);
return ERR_PTR(error);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 05/20] x86, mm: introduce vmem_altmap to augment vmemmap_populate()

2015-10-09 Thread Dan Williams
In support of providing struct page for large persistent memory
capacities, use struct vmem_altmap to change the default policy for
allocating memory for the memmap array.  The default vmemmap_populate()
allocates page table storage area from the page allocator.  Given
persistent memory capacities relative to DRAM it may not be feasible to
store the memmap in 'System Memory'.  Instead vmem_altmap represents
pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
requests.
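
For orientation, the altmap is essentially a small allocator descriptor
over pages reserved on the device itself; roughly (the field list here is
an approximation, the real definition is added to include/linux/mm.h):

	struct vmem_altmap {
		unsigned long base_pfn;	/* first pfn of the device range          */
		unsigned long reserve;	/* pages at the start to leave untouched  */
		unsigned long free;	/* device pages usable for memmap storage */
		unsigned long align;
		unsigned long alloc;	/* pages handed out so far                */
	};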

Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Dave Hansen 
Cc: Andrew Morton 
Signed-off-by: Dan Williams 
---
 arch/m68k/include/asm/page_mm.h |1 
 arch/m68k/include/asm/page_no.h |1 
 arch/x86/mm/init_64.c   |   32 ++---
 drivers/nvdimm/pmem.c   |6 ++
 include/linux/io.h  |   17 ---
 include/linux/memory_hotplug.h  |3 +
 include/linux/mm.h  |   95 ++-
 kernel/memremap.c   |   69 +---
 mm/memory_hotplug.c |   66 +++
 mm/page_alloc.c |   10 
 mm/sparse-vmemmap.c |   37 ++-
 mm/sparse.c |8 ++-
 12 files changed, 282 insertions(+), 63 deletions(-)

diff --git a/arch/m68k/include/asm/page_mm.h b/arch/m68k/include/asm/page_mm.h
index 5029f73e6294..884f2f7e4caf 100644
--- a/arch/m68k/include/asm/page_mm.h
+++ b/arch/m68k/include/asm/page_mm.h
@@ -125,6 +125,7 @@ static inline void *__va(unsigned long x)
  */
 #define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
 #define pfn_to_virt(pfn)   __va((pfn) << PAGE_SHIFT)
+#define__pfn_to_phys(pfn)  PFN_PHYS(pfn)
 
 extern int m68k_virt_to_node_shift;
 
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index ef209169579a..7845eca0b36d 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -24,6 +24,7 @@ extern unsigned long memory_end;
 
 #define virt_to_pfn(kaddr) (__pa(kaddr) >> PAGE_SHIFT)
 #define pfn_to_virt(pfn)   __va((pfn) << PAGE_SHIFT)
+#define__pfn_to_phys(pfn)  PFN_PHYS(pfn)
 
 #define virt_to_page(addr) (mem_map + (((unsigned long)(addr)-PAGE_OFFSET) 
>> PAGE_SHIFT))
 #define page_to_virt(page) __va(page) - mem_map) << PAGE_SHIFT) + 
PAGE_OFFSET))
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e5d42f1a2a71..cabf8ceb0a6b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -714,6 +714,12 @@ static void __meminit free_pagetable(struct page *page, 
int order)
 {
unsigned long magic;
unsigned int nr_pages = 1 << order;
+   struct vmem_altmap *altmap = to_vmem_altmap((unsigned long) page);
+
+   if (altmap) {
+   vmem_altmap_free(altmap, nr_pages);
+   return;
+   }
 
/* bootmem page has reserved flag */
if (PageReserved(page)) {
@@ -1018,13 +1024,19 @@ int __ref arch_remove_memory(u64 start, u64 size)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct page *page = pfn_to_page(start_pfn);
+   struct vmem_altmap *altmap;
struct zone *zone;
int ret;
 
-   zone = page_zone(pfn_to_page(start_pfn));
-   kernel_physical_mapping_remove(start, start + size);
+   /* With altmap the first mapped page is offset from @start */
+   altmap = to_vmem_altmap((unsigned long) page);
+   if (altmap)
+   page += vmem_altmap_offset(altmap);
+   zone = page_zone(page);
ret = __remove_pages(zone, start_pfn, nr_pages);
WARN_ON_ONCE(ret);
+   kernel_physical_mapping_remove(start, start + size);
 
return ret;
 }
@@ -1234,7 +1246,7 @@ static void __meminitdata *p_start, *p_end;
 static int __meminitdata node_start;
 
 static int __meminit vmemmap_populate_hugepages(unsigned long start,
-   unsigned long end, int node)
+   unsigned long end, int node, struct vmem_altmap *altmap)
 {
unsigned long addr;
unsigned long next;
@@ -1257,7 +1269,7 @@ static int __meminit vmemmap_populate_hugepages(unsigned 
long start,
if (pmd_none(*pmd)) {
void *p;
 
-   p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+   p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
if (p) {
pte_t entry;
 
@@ -1278,7 +1290,8 @@ static int __meminit vmemmap_populate_hugepages(unsigned 
long start,
addr_end = addr + PMD_SIZE;
p_end = p + PMD_SIZE;
continue;
-   }
+   } else if (altmap)
+   return -ENOMEM; /* no fallback */
} else if (pmd_large(*pmd)) {
   

Re: [PATCH 2/2] i2c: add ACPI support for I2C mux ports

2015-10-09 Thread kbuild test robot
Hi Dustin,

[auto build test ERROR on wsa/i2c/for-next -- if it's inappropriate base, 
please ignore]

config: xtensa-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   drivers/i2c/i2c-mux.c: In function 'i2c_add_mux_adapter':
>> drivers/i2c/i2c-mux.c:181:3: error: implicit declaration of function 
>> 'acpi_preset_companion' [-Werror=implicit-function-declaration]
  acpi_preset_companion(&priv->adap.dev, ACPI_COMPANION(mux_dev),
  ^
   cc1: some warnings being treated as errors

vim +/acpi_preset_companion +181 drivers/i2c/i2c-mux.c

   175  }
   176  
   177  /*
   178   * Associate the mux channel with an ACPI node.
   179   */
   180  if (has_acpi_companion(mux_dev))
  > 181  acpi_preset_companion(&priv->adap.dev, 
 > ACPI_COMPANION(mux_dev),
   182chan_id);
   183  
   184  if (force_nr) {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH v2 07/20] avr32: convert to asm-generic/memory_model.h

2015-10-09 Thread Dan Williams
Switch avr32/include/asm/page.h to use the common definitions for
pfn_to_page(), page_to_pfn(), and ARCH_PFN_OFFSET.

Signed-off-by: Dan Williams 
---
 arch/avr32/include/asm/page.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/avr32/include/asm/page.h b/arch/avr32/include/asm/page.h
index f805d1cb11bc..c5d2a3e2c62f 100644
--- a/arch/avr32/include/asm/page.h
+++ b/arch/avr32/include/asm/page.h
@@ -83,11 +83,9 @@ static inline int get_order(unsigned long size)
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 
-#define PHYS_PFN_OFFSET(CONFIG_PHYS_OFFSET >> PAGE_SHIFT)
+#define ARCH_PFN_OFFSET(CONFIG_PHYS_OFFSET >> PAGE_SHIFT)
 
-#define pfn_to_page(pfn)   (mem_map + ((pfn) - PHYS_PFN_OFFSET))
-#define page_to_pfn(page)  ((unsigned long)((page) - mem_map) + 
PHYS_PFN_OFFSET)
-#define pfn_valid(pfn) ((pfn) >= PHYS_PFN_OFFSET && (pfn) < 
(PHYS_PFN_OFFSET + max_mapnr))
+#define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < 
(ARCH_PFN_OFFSET + max_mapnr))
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
 #define virt_to_page(kaddr)pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
@@ -101,4 +99,6 @@ static inline int get_order(unsigned long size)
  */
 #define HIGHMEM_START  0x2000UL
 
+#include 
+
 #endif /* __ASM_AVR32_PAGE_H */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 13/20] mm, dax, pmem: introduce pfn_t

2015-10-09 Thread Dan Williams
In preparation for enabling get_user_pages() operations on dax mappings,
introduce a type that encapsulates a page-frame-number and can also be
used to encode other information.  This other information is the
historical "page_link" encoding in a scatterlist, but it can also denote
"device memory", i.e. a set of pfns that are not
part of the kernel's linear mapping by default, but are accessed via the
same memory controller as ram.  The motivation for this new type is
large capacity persistent memory that optionally has struct page entries
in the 'memmap'.

When a driver, like pmem, has established a devm_memremap_pages()
mapping it needs to communicate to upper layers that the pfn has a page
backing.  This property will be leveraged in a later patch to enable
dax-gup.  For now, update all the ->direct_access() implementations to
communicate whether the returned pfn range is mapped.
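
The encoding itself is small; roughly the following, where the exact flag
bit positions are an assumption of this sketch:

	typedef struct {
		unsigned long val;	/* pfn in the low bits, flags in the top bits */
	} pfn_t;

	#define PFN_DEV	(1UL << (BITS_PER_LONG - 3))	/* not in the linear map by default */
	#define PFN_MAP	(1UL << (BITS_PER_LONG - 4))	/* has a struct page in the memmap  */

	static inline pfn_t phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
	{
		return (pfn_t) { .val = (addr >> PAGE_SHIFT) | flags };
	}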

Cc: Christoph Hellwig 
Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 arch/powerpc/sysdev/axonram.c |8 ++---
 drivers/block/brd.c   |4 +--
 drivers/nvdimm/pmem.c |   25 ++--
 drivers/s390/block/dcssblk.c  |   10 +++---
 fs/block_dev.c|2 +
 fs/dax.c  |   19 ++--
 include/linux/blkdev.h|4 +--
 include/linux/mm.h|   65 +
 include/linux/pfn.h   |9 ++
 9 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index d2b79bc336c1..59ca4c0ab529 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,15 +141,13 @@ axon_ram_make_request(struct request_queue *queue, struct 
bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-  void __pmem **kaddr, unsigned long *pfn)
+  void __pmem **kaddr, pfn_t *pfn)
 {
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
-   void *addr = (void *)(bank->ph_addr + offset);
-
-   *kaddr = (void __pmem *)addr;
-   *pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
+   *kaddr = (void __pmem __force *) bank->io_addr + offset;
+   *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b9794aeeb878..0bbc60463779 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -374,7 +374,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t 
sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-   void __pmem **kaddr, unsigned long *pfn)
+   void __pmem **kaddr, pfn_t *pfn)
 {
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
@@ -385,7 +385,7 @@ static long brd_direct_access(struct block_device *bdev, 
sector_t sector,
if (!page)
return -ENOSPC;
*kaddr = (void __pmem *)page_address(page);
-   *pfn = page_to_pfn(page);
+   *pfn = page_to_pfn_t(page);
 
return PAGE_SIZE;
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bb66158c0505..c950602bbf0b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -39,6 +39,7 @@ struct pmem_device {
phys_addr_t phys_addr;
/* when non-zero this device is hosting a 'pfn' instance */
phys_addr_t data_offset;
+   unsigned long   pfn_flags;
void __pmem *virt_addr;
size_t  size;
 };
@@ -100,26 +101,15 @@ static int pmem_rw_page(struct block_device *bdev, 
sector_t sector,
 }
 
 static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, unsigned long *pfn)
+ void __pmem **kaddr, pfn_t *pfn)
 {
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
-   resource_size_t size;
-
-   if (pmem->data_offset) {
-   /*
-* Limit the direct_access() size to what is covered by
-* the memmap
-*/
-   size = (pmem->size - offset) & ~ND_PFN_MASK;
-   } else
-   size = pmem->size - offset;
 
-   /* FIXME convert DAX to comprehend that this mapping has a lifetime */
*kaddr = pmem->virt_addr + offset;
-   *pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
+   *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
-   return size;
+   return pmem->size - offset;
 }
 
 static const struct block_device_operations pmem_fops = {
@@ -150,10 +140,12 @@ static struct pmem_device *pmem_alloc(struct device 

[PATCH v2 15/20] mm, dax: convert vmf_insert_pfn_pmd() to pfn_t

2015-10-09 Thread Dan Williams
Similar to the conversion of vm_insert_mixed(), use pfn_t in
vmf_insert_pfn_pmd() to tag the resulting pmd with _PAGE_DEVMAP when the
pfn is backed by a devm_memremap_pages() mapping.

Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Cc: Alexander Viro 
Signed-off-by: Dan Williams 
---
 arch/sparc/include/asm/pgtable_64.h |2 ++
 arch/x86/include/asm/pgtable.h  |6 ++
 arch/x86/mm/pat.c   |4 ++--
 drivers/gpu/drm/exynos/exynos_drm_gem.c |2 +-
 drivers/gpu/drm/msm/msm_gem.c   |2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c  |4 ++--
 fs/dax.c|2 +-
 include/asm-generic/pgtable.h   |6 --
 include/linux/huge_mm.h |2 +-
 include/linux/mm.h  |   18 +-
 mm/huge_memory.c|   10 ++
 mm/memory.c |2 +-
 12 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h 
b/arch/sparc/include/asm/pgtable_64.h
index 131d36fcd07a..496ef783c68c 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -234,6 +234,7 @@ extern struct page *mem_map_zero;
  * the first physical page in the machine is at some huge physical address,
  * such as 4GB.   This is common on a partitioned E1, for example.
  */
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 {
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -244,6 +245,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t 
prot)
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pfn_pmd pfn_pmd
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
pte_t pte = pfn_pte(page_nr, pgprot);
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 02a54e5b7930..84d1346e1cda 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -282,6 +282,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
 }
 
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
+{
+   return pmd_set_flags(pmd, _PAGE_DEVMAP);
+}
+
 static inline pmd_t pmd_mkhuge(pmd_t pmd)
 {
return pmd_set_flags(pmd, _PAGE_PSE);
@@ -346,6 +351,7 @@ static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t 
pgprot)
 massage_pgprot(pgprot));
 }
 
+#define pfn_pmd pfn_pmd
 static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
 {
return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) |
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 188e3e07eeeb..98efd3c02374 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -949,7 +949,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t 
*prot,
 }
 
 int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
-unsigned long pfn)
+pfn_t pfn)
 {
enum page_cache_mode pcm;
 
@@ -957,7 +957,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t 
*prot,
return 0;
 
/* Set prot based on lookup */
-   pcm = lookup_memtype((resource_size_t)pfn << PAGE_SHIFT);
+   pcm = lookup_memtype(pfn_t_to_phys(pfn));
*prot = __pgprot((pgprot_val(vma->vm_page_prot) & (~_PAGE_CACHE_MASK)) |
 cachemode2protval(pcm));
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 778764bebc00..aa7709ed9ae2 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -480,7 +480,7 @@ int exynos_drm_gem_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)
 
pfn = page_to_pfn(exynos_gem_obj->pages[page_offset]);
ret = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
-   pfn_to_pfn_t(pfn, PFN_DEV));
+   __pfn_to_pfn_t(pfn, PFN_DEV));
 
 out:
switch (ret) {
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 0f4ed5bfda83..6509d9b23912 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -223,7 +223,7 @@ int msm_gem_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf)
pfn, pfn << PAGE_SHIFT);
 
ret = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
-   pfn_to_pfn_t(pfn, PFN_DEV));
+   __pfn_to_pfn_t(pfn, PFN_DEV));
 
 out_unlock:
mutex_unlock(>struct_mutex);
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
b/drivers/gpu/drm/omapdrm/omap_gem.c
index 910cb276a7ea..94b6d23ec202 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -386,7 +386,7 @@ static int fault_1d(struct drm_gem_object *obj,
pfn, pfn << 

[PATCH v2 17/20] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

2015-10-09 Thread Dan Williams
get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
mapped pfn-range (devm_memremap_pages()) while the resulting struct page
objects are in use.  Unlike get_page() it may fail if the device is, or
is in the process of being, disabled.  While the initial lookup of the
range may be an expensive list walk, the result is cached to speed up
subsequent lookups which are likely to be in the same mapped range.
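
A sketch of the intended caller pattern (simplified, error handling
omitted):

	struct dev_pagemap *pgmap = NULL;
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn++) {
		/* reuses the cached pgmap while pfn stays in the same range */
		pgmap = get_dev_pagemap(pfn, pgmap);
		if (!pgmap)
			break;		/* device disabled: stop taking references */
		get_page(pfn_to_page(pfn));
	}
	if (pgmap)
		put_dev_pagemap(pgmap);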

devm_memremap_pages() now requires a reference counter to be specified
at init time.  For pmem this means moving request_queue allocation into
pmem_alloc() so the existing queue usage counter can track "device
pages".

Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Cc: Alexander Viro 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pmem.c|   42 +-
 include/linux/mm.h   |   40 ++--
 include/linux/mm_types.h |5 +
 kernel/memremap.c|   46 ++
 4 files changed, 110 insertions(+), 23 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c950602bbf0b..f7acce594fa0 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -123,6 +123,7 @@ static struct pmem_device *pmem_alloc(struct device *dev,
struct resource *res, int id)
 {
struct pmem_device *pmem;
+   struct request_queue *q;
 
pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL);
if (!pmem)
@@ -140,19 +141,26 @@ static struct pmem_device *pmem_alloc(struct device *dev,
return ERR_PTR(-EBUSY);
}
 
+   q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
+   if (!q)
+   return ERR_PTR(-ENOMEM);
+
pmem->pfn_flags = PFN_DEV;
if (pmem_should_map_pages(dev)) {
pmem->virt_addr = (void __pmem *) devm_memremap_pages(dev, res,
-   NULL);
+   &q->q_usage_counter, NULL);
pmem->pfn_flags |= PFN_MAP;
} else
pmem->virt_addr = (void __pmem *) devm_memremap(dev,
pmem->phys_addr, pmem->size,
ARCH_MEMREMAP_PMEM);
 
-   if (IS_ERR(pmem->virt_addr))
+   if (IS_ERR(pmem->virt_addr)) {
+   blk_cleanup_queue(q);
return (void __force *) pmem->virt_addr;
+   }
 
+   pmem->pmem_queue = q;
return pmem;
 }
 
@@ -169,20 +177,15 @@ static void pmem_detach_disk(struct pmem_device *pmem)
 static int pmem_attach_disk(struct device *dev,
struct nd_namespace_common *ndns, struct pmem_device *pmem)
 {
-   int nid = dev_to_node(dev);
struct gendisk *disk;
 
-   pmem->pmem_queue = blk_alloc_queue_node(GFP_KERNEL, nid);
-   if (!pmem->pmem_queue)
-   return -ENOMEM;
-
blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
blk_queue_physical_block_size(pmem->pmem_queue, PAGE_SIZE);
blk_queue_max_hw_sectors(pmem->pmem_queue, UINT_MAX);
blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
 
-   disk = alloc_disk_node(0, nid);
+   disk = alloc_disk_node(0, dev_to_node(dev));
if (!disk) {
blk_cleanup_queue(pmem->pmem_queue);
return -ENOMEM;
@@ -318,6 +321,7 @@ static int nvdimm_namespace_attach_pfn(struct 
nd_namespace_common *ndns)
struct vmem_altmap *altmap;
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
+   struct request_queue *q;
phys_addr_t offset;
int rc;
struct vmem_altmap __altmap = {
@@ -369,9 +373,10 @@ static int nvdimm_namespace_attach_pfn(struct 
nd_namespace_common *ndns)
 
/* establish pfn range for lookup, and switch to direct map */
pmem = dev_get_drvdata(dev);
+   q = pmem->pmem_queue;
devm_memunmap(dev, (void __force *) pmem->virt_addr);
pmem->virt_addr = (void __pmem *) devm_memremap_pages(dev, >res,
-   altmap);
+   &q->q_usage_counter, altmap);
pmem->pfn_flags |= PFN_MAP;
if (IS_ERR(pmem->virt_addr)) {
rc = PTR_ERR(pmem->virt_addr);
@@ -410,19 +415,22 @@ static int nd_pmem_probe(struct device *dev)
dev_set_drvdata(dev, pmem);
ndns->rw_bytes = pmem_rw_bytes;
 
-   if (is_nd_btt(dev))
+   if (is_nd_btt(dev)) {
+   /* btt allocates its own request_queue */
+   blk_cleanup_queue(pmem->pmem_queue);
+   pmem->pmem_queue = NULL;
return nvdimm_namespace_attach_btt(ndns);
+   }
 
if (is_nd_pfn(dev))
return nvdimm_namespace_attach_pfn(ndns);
 
-   if (nd_btt_probe(ndns, pmem) == 0) {
-   /* we'll come back as btt-pmem */
-   return -ENXIO;
-   }

[PATCH v2 19/20] mm, pmem: devm_memunmap_pages(), truncate and unmap ZONE_DEVICE pages

2015-10-09 Thread Dan Williams
Before we allow ZONE_DEVICE pages to be put into active use outside of
the pmem driver, we need to arrange for them to be reclaimed when the
driver is shut down.  devm_memunmap_pages() must wait for all pages to
return to the initial mapcount of 1.  If a given page is mapped by a
process we will truncate it out of its inode mapping and unmap it out of
the process vma.

This truncation is done while the dev_pagemap reference count is "dead",
preventing new references from being taken while the truncate+unmap scan
is in progress.

Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: Ross Zwisler 
Cc: Matthew Wilcox 
Cc: Alexander Viro 
Cc: Dave Chinner 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pmem.c |   42 --
 fs/dax.c  |2 ++
 include/linux/mm.h|5 +
 kernel/memremap.c |   48 
 4 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f7acce594fa0..2c9aebbc3fea 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -24,12 +24,15 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include "pfn.h"
 #include "nd.h"
 
+static ASYNC_DOMAIN_EXCLUSIVE(async_pmem);
+
 struct pmem_device {
struct request_queue*pmem_queue;
struct gendisk  *pmem_disk;
@@ -164,14 +167,43 @@ static struct pmem_device *pmem_alloc(struct device *dev,
return pmem;
 }
 
-static void pmem_detach_disk(struct pmem_device *pmem)
+
+static void async_blk_cleanup_queue(void *data, async_cookie_t cookie)
+{
+   struct pmem_device *pmem = data;
+
+   blk_cleanup_queue(pmem->pmem_queue);
+}
+
+static void pmem_detach_disk(struct device *dev)
 {
+   struct pmem_device *pmem = dev_get_drvdata(dev);
+   struct request_queue *q = pmem->pmem_queue;
+
if (!pmem->pmem_disk)
return;
 
del_gendisk(pmem->pmem_disk);
put_disk(pmem->pmem_disk);
-   blk_cleanup_queue(pmem->pmem_queue);
+   async_schedule_domain(async_blk_cleanup_queue, pmem, _pmem);
+
+   if (pmem->pfn_flags & PFN_MAP) {
+   /*
+* Wait for queue to go dead so that we know no new
+* references will be taken against the pages allocated
+* by devm_memremap_pages().
+*/
+   blk_wait_queue_dead(q);
+
+   /*
+* Manually release the page mapping so that
+* blk_cleanup_queue() can complete queue draining.
+*/
+   devm_memunmap_pages(dev, (void __force *) pmem->virt_addr);
+   }
+
+   /* Wait for blk_cleanup_queue() to finish */
+   async_synchronize_full_domain(_pmem);
 }
 
 static int pmem_attach_disk(struct device *dev,
@@ -299,11 +331,9 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 static int nvdimm_namespace_detach_pfn(struct nd_namespace_common *ndns)
 {
struct nd_pfn *nd_pfn = to_nd_pfn(ndns->claim);
-   struct pmem_device *pmem;
 
/* free pmem disk */
-   pmem = dev_get_drvdata(_pfn->dev);
-   pmem_detach_disk(pmem);
+   pmem_detach_disk(_pfn->dev);
 
/* release nd_pfn resources */
kfree(nd_pfn->pfn_sb);
@@ -446,7 +476,7 @@ static int nd_pmem_remove(struct device *dev)
else if (is_nd_pfn(dev))
nvdimm_namespace_detach_pfn(pmem->ndns);
else
-   pmem_detach_disk(pmem);
+   pmem_detach_disk(dev);
 
return 0;
 }
diff --git a/fs/dax.c b/fs/dax.c
index 87a070d6e6dc..208e064fafe5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -46,6 +46,7 @@ static void __pmem *__dax_map_atomic(struct block_device 
*bdev, sector_t sector,
blk_queue_exit(q);
return (void __pmem *) ERR_PTR(rc);
}
+   rcu_read_lock();
return addr;
 }
 
@@ -62,6 +63,7 @@ static void dax_unmap_atomic(struct block_device *bdev, void 
__pmem *addr)
if (IS_ERR(addr))
return;
blk_queue_exit(bdev->bd_queue);
+   rcu_read_unlock();
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8a84bfb6fa6a..af7597410cb9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -801,6 +801,7 @@ struct dev_pagemap {
 
 #ifdef CONFIG_ZONE_DEVICE
 struct dev_pagemap *__get_dev_pagemap(resource_size_t phys);
+void devm_memunmap_pages(struct device *dev, void *addr);
 void *devm_memremap_pages(struct device *dev, struct resource *res,
struct percpu_ref *ref, struct vmem_altmap *altmap);
 struct vmem_altmap *to_vmem_altmap(unsigned long memmap_start);
@@ -810,6 +811,10 @@ static inline struct dev_pagemap 
*__get_dev_pagemap(resource_size_t phys)
return NULL;
 }
 
+static inline void devm_memunmap_pages(struct device *dev, void *addr)
+{
+}
+
 static inline void *devm_memremap_pages(struct device *dev, struct resource 
*res,
   

[PATCH v2 20/20] mm, x86: get_user_pages() for dax mappings

2015-10-09 Thread Dan Williams
A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
has established a devm_memremap_pages() mapping, i.e. when the pfn_t
return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
encountering _PAGE_DEVMAP during a page table walk we look up and pin a
struct dev_pagemap instance to keep the result of pfn_to_page() valid
until put_page().
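
The pinning itself reduces to the following pattern, condensed from the
gup_pte_range() hunk below (fragment for illustration only):

        page = pte_page(pte);
        if (pte_devmap(pte)) {
                /* pin the hosting dev_pagemap before taking a page reference */
                pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
                if (unlikely(!pgmap)) {
                        /* driver is tearing down: undo and fall back to the slow path */
                        undo_dev_pagemap(nr, nr_start, pages);
                        pte_unmap(ptep);
                        return 0;
                }
        }
        get_page(page);
        put_dev_pagemap(pgmap); /* the page reference now keeps pfn_to_page() valid */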

Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: Christoph Hellwig 
Cc: Ross Zwisler 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: Jeff Moyer 
Cc: Peter Zijlstra 
Cc: Matthew Wilcox 
Cc: Alexander Viro 
Cc: Dave Chinner 
Signed-off-by: Dan Williams 
---
 arch/ia64/include/asm/pgtable.h |1 +
 arch/x86/include/asm/pgtable.h  |2 +
 arch/x86/mm/gup.c   |   56 +--
 include/linux/mm.h  |   40 +++-
 mm/gup.c|   11 +++-
 mm/hugetlb.c|   18 -
 mm/swap.c   |   15 ++
 7 files changed, 124 insertions(+), 19 deletions(-)

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 9f3ed9ee8f13..81d2af23958f 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -273,6 +273,7 @@ extern unsigned long VMALLOC_END;
 #define pmd_clear(pmdp)(pmd_val(*(pmdp)) = 0UL)
 #define pmd_page_vaddr(pmd)((unsigned long) __va(pmd_val(pmd) & 
_PFN_MASK))
 #define pmd_page(pmd)  virt_to_page((pmd_val(pmd) + 
PAGE_OFFSET))
+#define pmd_pfn(pmd)   (pmd_val(pmd) >> PAGE_SHIFT)
 
 #define pud_none(pud)  (!pud_val(pud))
 #define pud_bad(pud)   (!ia64_phys_addr_valid(pud_val(pud)))
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 84d1346e1cda..d29dc7b4924b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -461,7 +461,7 @@ static inline int pte_present(pte_t a)
 #define pte_devmap pte_devmap
 static inline int pte_devmap(pte_t a)
 {
-   return pte_flags(a) & _PAGE_DEVMAP;
+   return (pte_flags(a) & _PAGE_DEVMAP) == _PAGE_DEVMAP;
 }
 
 #define pte_accessible pte_accessible
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 81bf3d2af3eb..7254ba4f791d 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -63,6 +63,16 @@ retry:
 #endif
 }
 
+static void undo_dev_pagemap(int *nr, int nr_start, struct page **pages)
+{
+   while ((*nr) - nr_start) {
+   struct page *page = pages[--(*nr)];
+
+   ClearPageReferenced(page);
+   put_page(page);
+   }
+}
+
 /*
  * The performance critical leaf functions are made noinline otherwise gcc
  * inlines everything into a single function which results in too much
@@ -71,7 +81,9 @@ retry:
 static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
unsigned long end, int write, struct page **pages, int *nr)
 {
+   struct dev_pagemap *pgmap = NULL;
unsigned long mask;
+   int nr_start = *nr;
pte_t *ptep;
 
mask = _PAGE_PRESENT|_PAGE_USER;
@@ -89,13 +101,21 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long 
addr,
return 0;
}
 
-   if ((pte_flags(pte) & (mask | _PAGE_SPECIAL)) != mask) {
+   page = pte_page(pte);
+   if (pte_devmap(pte)) {
+   pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
+   if (unlikely(!pgmap)) {
+   undo_dev_pagemap(nr, nr_start, pages);
+   pte_unmap(ptep);
+   return 0;
+   }
+   } else if ((pte_flags(pte) & (mask | _PAGE_SPECIAL)) != mask) {
pte_unmap(ptep);
return 0;
}
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-   page = pte_page(pte);
get_page(page);
+   put_dev_pagemap(pgmap);
SetPageReferenced(page);
pages[*nr] = page;
(*nr)++;
@@ -114,6 +134,32 @@ static inline void get_head_page_multiple(struct page 
*page, int nr)
SetPageReferenced(page);
 }
 
+static int __gup_device_huge_pmd(pmd_t pmd, unsigned long addr,
+   unsigned long end, struct page **pages, int *nr)
+{
+   int nr_start = *nr;
+   unsigned long pfn = pmd_pfn(pmd);
+   struct dev_pagemap *pgmap = NULL;
+
+   pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
+   do {
+   struct page *page = pfn_to_page(pfn);
+
+   pgmap = get_dev_pagemap(pfn, pgmap);
+   if (unlikely(!pgmap)) {
+   undo_dev_pagemap(nr, nr_start, pages);
+   return 0;
+   }
+   SetPageReferenced(page);
+   pages[*nr] = page;
+   

[PATCH v2 16/20] list: introduce list_poison() and LIST_POISON3

2015-10-09 Thread Dan Williams
ZONE_DEVICE pages always have an elevated count and will never be on an
lru reclaim list.  That space in 'struct page' can be redirected for
other uses, but for safety introduce a poison value that will always
trip __list_add() to assert.  This allows half of the struct list_head
storage to be reclaimed with some assurance to back up the assumption
that the page count never goes to zero and a list_add() is never
attempted.
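
As an illustration (a hypothetical caller, not part of this patch):

static void example_retire_lru_entry(struct page *page, struct list_head *lru)
{
        /* next and prev both become LIST_POISON3 */
        list_del_poison(&page->lru);

        /*
         * Any later, buggy attempt to re-add the entry now trips the
         * "list_add attempted on poisoned entry" WARN when
         * CONFIG_DEBUG_LIST=y, instead of silently corrupting a list.
         */
        list_add(&page->lru, lru);
}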

Signed-off-by: Dan Williams 
---
 include/linux/list.h   |   14 ++
 include/linux/poison.h |1 +
 lib/list_debug.c   |2 ++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/list.h b/include/linux/list.h
index 3e3e64a61002..af38cc80ae4c 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -114,6 +114,20 @@ extern void list_del(struct list_head *entry);
 #endif
 
 /**
+ * list_del_poison - poison an entry to always assert on list_add
+ * @entry: the element to delete and poison
+ *
+ * Note: the assertion on list_add() only occurs when CONFIG_DEBUG_LIST=y,
+ * otherwise this is identical to list_del()
+ */
+static inline void list_del_poison(struct list_head *entry)
+{
+   __list_del(entry->prev, entry->next);
+   entry->next = LIST_POISON3;
+   entry->prev = LIST_POISON3;
+}
+
+/**
  * list_replace - replace old entry by new one
  * @old : the element to be replaced
  * @new : the new element to insert
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 317e16de09e5..31d048b3ba06 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -21,6 +21,7 @@
  */
 #define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
 #define LIST_POISON2  ((void *) 0x200 + POISON_POINTER_DELTA)
+#define LIST_POISON3  ((void *) 0x300 + POISON_POINTER_DELTA)
 
 /** include/linux/timer.h **/
 /*
diff --git a/lib/list_debug.c b/lib/list_debug.c
index c24c2f7e296f..ec69e2b8e0fc 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -23,6 +23,8 @@ void __list_add(struct list_head *new,
  struct list_head *prev,
  struct list_head *next)
 {
+   WARN(new->next == LIST_POISON3 || new->prev == LIST_POISON3,
+   "list_add attempted on poisoned entry\n");
WARN(next->prev != prev,
"list_add corruption. next->prev should be "
"prev (%p), but was %p. (next=%p).\n",

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 18/20] block: notify queue death confirmation

2015-10-09 Thread Dan Williams
The pmem driver arranges for references to be taken against the queue
while pages it allocated via devm_memremap_pages() are in use.  At
shutdown time, before those pages can be deallocated, they need to be
truncated, unmapped, and guaranteed to be idle.  Scanning the pages to
initiate truncation can only be done once we are certain no new page
references will be taken.  Once the blk queue percpu_ref is confirmed
dead __get_dev_pagemap() will cease allowing new references and we can
reclaim these "device" pages.
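
Schematically, the hand-off added here (condensed from the hunks below
together with the pmem side in patch 19; not literal code):

        /* freeze path: no new q_usage_counter references once confirmed */
        percpu_ref_kill_and_confirm(&q->q_usage_counter,
                        blk_confirm_queue_death);

        /* confirm callback: runs once the percpu_ref is confirmed dead */
        q->q_usage_dead = 1;
        wake_up_all(&q->q_freeze_wq);

        /* pmem shutdown path (patch 19) */
        blk_wait_queue_dead(q); /* wait_event(q->q_freeze_wq, q->q_usage_dead) */
        /* ... now safe to truncate and unmap the ZONE_DEVICE pages */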

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 block/blk-core.c   |   12 +---
 block/blk-mq.c |   19 +++
 include/linux/blkdev.h |4 +++-
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9b4d735cb5b8..74aaa208a8e9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -516,6 +516,12 @@ void blk_set_queue_dying(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
+void blk_wait_queue_dead(struct request_queue *q)
+{
+   wait_event(q->q_freeze_wq, q->q_usage_dead);
+}
+EXPORT_SYMBOL(blk_wait_queue_dead);
+
 /**
  * blk_cleanup_queue - shutdown a request queue
  * @q: request queue to shutdown
@@ -638,7 +644,7 @@ int blk_queue_enter(struct request_queue *q, gfp_t gfp)
if (!(gfp & __GFP_WAIT))
return -EBUSY;
 
-   ret = wait_event_interruptible(q->mq_freeze_wq,
+   ret = wait_event_interruptible(q->q_freeze_wq,
!atomic_read(>mq_freeze_depth) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
@@ -658,7 +664,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>q_freeze_wq);
 }
 
 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
@@ -720,7 +726,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, >queue_flags);
 
-   init_waitqueue_head(>mq_freeze_wq);
+   init_waitqueue_head(>q_freeze_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c371aeda2986..d52f9d91f5c1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -77,13 +77,23 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx 
*hctx,
clear_bit(CTX_TO_BIT(hctx, ctx), >word);
 }
 
+static void blk_confirm_queue_death(struct percpu_ref *ref)
+{
+   struct request_queue *q = container_of(ref, typeof(*q),
+   q_usage_counter);
+
+   q->q_usage_dead = 1;
+   wake_up_all(>q_freeze_wq);
+}
+
 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
int freeze_depth;
 
freeze_depth = atomic_inc_return(>mq_freeze_depth);
if (freeze_depth == 1) {
-   percpu_ref_kill(>q_usage_counter);
+   percpu_ref_kill_and_confirm(>q_usage_counter,
+   blk_confirm_queue_death);
blk_mq_run_hw_queues(q, false);
}
 }
@@ -91,7 +101,7 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
 static void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter));
+   wait_event(q->q_freeze_wq, percpu_ref_is_zero(>q_usage_counter));
 }
 
 /*
@@ -129,7 +139,8 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
percpu_ref_reinit(>q_usage_counter);
-   wake_up_all(>mq_freeze_wq);
+   q->q_usage_dead = 0;
+   wake_up_all(>q_freeze_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
@@ -148,7 +159,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
 * dying, we need to ensure that processes currently waiting on
 * the queue are notified as well.
 */
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>q_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fb3e6886c479..a1340654e360 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -427,6 +427,7 @@ struct request_queue {
 */
unsigned intflush_flags;
unsigned intflush_not_queueable:1;
+   unsigned intq_usage_dead:1;
struct blk_flush_queue  *fq;
 
struct list_headrequeue_list;
@@ -449,7 +450,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct rcu_head rcu_head;
-   wait_queue_head_t   mq_freeze_wq;
+   wait_queue_head_t   q_freeze_wq;
 

[PATCH v2 00/20] get_user_pages() for dax mappings

2015-10-09 Thread Dan Williams
Changes since v1 [1]:
1/ Rebased on the accepted cleanups to the memremap() api and the NUMA
   hints for devm allocations. (see libnvdimm-for-next [2]).

2/ Rebased on DAX fixes from Ross [3], currently in -mm, and Dave [4],
   applied locally for now.

3/ Renamed __pfn_t to pfn_t and converted KVM and UM accordingly (Dave
   Hansen)

4/ Make pfn-to-pfn_t conversions a nop (binary identical) for typical
   mapped pfns (Dave Hansen)

5/ Fixed up the devm_memremap_pages() api to require passing in a
   percpu_ref object.  Addresses a crash reported-by Logan.

6/ Moved the back pointer from a page to its hosting 'struct
   dev_pagemap' to share storage with the 'lru' field rather than
   'mapping'.  Enables us to revoke mappings at devm_memunmap_pages()
   time and addresses a crash reported-by Logan.

7/ Rework dax_map_bh() into dax_map_atomic() to avoid proliferating
   buffer_head usage deeper into the dax implementation.  Also addresses
   a crash reported by Logan (Dave Chinner)

8/ Include an initial, only lightly tested, implementation of revoking
   usages of ZONE_DEVICE pages when the driver disables the pmem device.
   This coordinates with blk_cleanup_queue() for the pmem gendisk, see
   patch 19.

9/ Include a cleaned up version of the vmem_altmap infrastructure
   allowing the struct page memmap to optionally be allocated from pmem
   itself.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: 
https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git/log/?h=libnvdimm-for-next
[3]: 
https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git/commit/?h=dax-fixes=93fdde069dce
[4]: https://lists.01.org/pipermail/linux-nvdimm/2015-October/002286.html

---
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 to flag pages that are owned and dynamically mapped by a device
driver.  The pmem driver, after mapping a persistent memory range into
the system memmap via devm_memremap_pages(), arranges for DAX to
distinguish pfn-only versus page-backed pmem-pfns via flags in the new
__pfn_t type.  The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn,
flags the resulting pte(s) inserted into the process page tables with a
new _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it
keys off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
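
As a concrete, illustrative example of what the series enables from userspace
(paths and sizes are made up, error handling omitted): a buffer mmap()ed from
a file on a DAX filesystem can be handed to O_DIRECT I/O on another file,
which drives get_user_pages() over the dax mapping:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t len = 1 << 20;
        int dax_fd  = open("/mnt/pmem/buffer", O_RDWR);        /* file on a DAX fs */
        int data_fd = open("/data/input", O_RDONLY | O_DIRECT);
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                         dax_fd, 0);

        /* the direct-i/o read lands straight in persistent memory */
        ssize_t ret = read(data_fd, buf, len);

        munmap(buf, len);
        close(data_fd);
        close(dax_fd);
        return ret < 0;
}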

This series is available via git here:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm libnvdimm-pending

---

Dan Williams (20):
  block: generic request_queue reference counting
  dax: increase granularity of dax_clear_blocks() operations
  block, dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()
  mm: introduce __get_dev_pagemap()
  x86, mm: introduce vmem_altmap to augment vmemmap_populate()
  libnvdimm, pfn, pmem: allocate memmap array in persistent memory
  avr32: convert to asm-generic/memory_model.h
  hugetlb: fix compile error on tile
  frv: fix compiler warning from definition of __pmd()
  um: kill pfn_t
  kvm: rename pfn_t to kvm_pfn_t
  mips: fix PAGE_MASK definition
  mm, dax, pmem: introduce pfn_t
  mm, dax, gpu: convert vm_insert_mixed to pfn_t, introduce _PAGE_DEVMAP
  mm, dax: convert vmf_insert_pfn_pmd() to pfn_t
  list: introduce list_poison() and LIST_POISON3
  mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
  block: notify queue death confirmation
  mm, pmem: devm_memunmap_pages(), truncate and unmap ZONE_DEVICE pages
  mm, x86: get_user_pages() for dax mappings


 arch/alpha/include/asm/pgtable.h|1 
 arch/arm/include/asm/kvm_mmu.h  |5 -
 arch/arm/kvm/mmu.c  |   10 +
 arch/arm64/include/asm/kvm_mmu.h|3 
 arch/avr32/include/asm/page.h   |8 -
 arch/frv/include/asm/page.h |2 
 arch/ia64/include/asm/pgtable.h |1 
 arch/m68k/include/asm/page_mm.h |1 
 arch/m68k/include/asm/page_no.h |1 
 arch/mips/include/asm/kvm_host.h|6 -
 arch/mips/include/asm/page.h|2 
 arch/mips/kvm/emulate.c |2 
 arch/mips/kvm/tlb.c |   14 +
 arch/parisc/include/asm/pgtable.h   |1 
 arch/powerpc/include/asm/kvm_book3s.h   |4 
 arch/powerpc/include/asm/kvm_ppc.h  |2 
 arch/powerpc/include/asm/pgtable.h  |1 
 arch/powerpc/kvm/book3s.c   |6 -
 arch/powerpc/kvm/book3s_32_mmu_host.c   |2 
 arch/powerpc/kvm/book3s_64_mmu_host.c 

[PATCH v2 12/20] mips: fix PAGE_MASK definition

2015-10-09 Thread Dan Williams
Make PAGE_MASK an unsigned long, like it is on x86, to avoid:

In file included from arch/mips/kernel/asm-offsets.c:14:0:
include/linux/mm.h: In function '__pfn_to_pfn_t':
include/linux/mm.h:1050:2: warning: left shift count >= width of type
  pfn_t pfn_t = { .val = pfn | (flags & PFN_FLAGS_MASK), };

...where PFN_FLAGS_MASK is:

#define PFN_FLAGS_MASK (~PAGE_MASK << (BITS_PER_LONG - PAGE_SHIFT))
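
A minimal userspace illustration of the type problem (not kernel code): the
old definition yields a plain int, so ~PAGE_MASK is only 32 bits wide and
shifting it by BITS_PER_LONG - PAGE_SHIFT (48 here) overflows the type, while
the new definition inherits unsigned long from PAGE_SIZE:

#include <stdio.h>

#define PAGE_SHIFT      16
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define OLD_PAGE_MASK   (~((1 << PAGE_SHIFT) - 1))      /* plain int */
#define NEW_PAGE_MASK   (~(PAGE_SIZE - 1))              /* unsigned long */

int main(void)
{
        /* prints "4 8" on a 64-bit host */
        printf("%zu %zu\n", sizeof(OLD_PAGE_MASK), sizeof(NEW_PAGE_MASK));
        return 0;
}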

Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Signed-off-by: Dan Williams 
---
 arch/mips/include/asm/page.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 89dd7fed1a57..ad1fccdb8d13 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -33,7 +33,7 @@
 #define PAGE_SHIFT 16
 #endif
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
-#define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
+#define PAGE_MASK  (~(PAGE_SIZE - 1))
 
 /*
  * This is used for calculating the real page sizes

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 14/20] mm, dax, gpu: convert vm_insert_mixed to pfn_t, introduce _PAGE_DEVMAP

2015-10-09 Thread Dan Williams
Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose
of evaluating the PFN_MAP and PFN_DEV flags.  When both are set it
triggers _PAGE_DEVMAP to be set in the resulting pte.  This flag will
later be used in the get_user_pages() path to pin the page mapping,
dynamically allocated by devm_memremap_pages(), until all the resulting
pages are released.

There are no functional changes to the gpu drivers as a result of this
conversion.

This uncovered several architectures with no local definition for
pfn_pte(); in response, pfn_t_pte() is only defined when an arch opts in
by "#define pfn_pte pfn_pte".
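
The opt-in works roughly as follows; this is a sketch of the assumed
generic-side helper, since the include/linux/mm.h hunk is not quoted above:

/* generic code: only provide pfn_t_pte() where the arch opted in */
#ifdef pfn_pte          /* set by an arch via "#define pfn_pte pfn_pte" */
static inline pte_t pfn_t_pte(pfn_t pfn, pgprot_t pgprot)
{
        return pfn_pte(pfn_t_to_pfn(pfn), pgprot);
}
#endif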

Cc: Dave Hansen 
Cc: Andrew Morton 
Cc: David Airlie 
Signed-off-by: Dan Williams 
---
 arch/alpha/include/asm/pgtable.h|1 +
 arch/parisc/include/asm/pgtable.h   |1 +
 arch/powerpc/include/asm/pgtable.h  |1 +
 arch/tile/include/asm/pgtable.h |1 +
 arch/um/include/asm/pgtable-3level.h|1 +
 arch/x86/include/asm/pgtable.h  |   18 ++
 arch/x86/include/asm/pgtable_types.h|7 ++-
 drivers/gpu/drm/exynos/exynos_drm_gem.c |3 ++-
 drivers/gpu/drm/gma500/framebuffer.c|3 ++-
 drivers/gpu/drm/msm/msm_gem.c   |3 ++-
 drivers/gpu/drm/omapdrm/omap_gem.c  |6 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c |3 ++-
 fs/dax.c|2 +-
 include/linux/mm.h  |   29 -
 mm/memory.c |   15 +--
 15 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index a9a119592372..a54050fe867e 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -216,6 +216,7 @@ extern unsigned long __zero_page(void);
 })
 #endif
 
+#define pfn_pte pfn_pte
 extern inline pte_t pfn_pte(unsigned long physpfn, pgprot_t pgprot)
 { pte_t pte; pte_val(pte) = (PHYS_TWIDDLE(physpfn) << 32) | 
pgprot_val(pgprot); return pte; }
 
diff --git a/arch/parisc/include/asm/pgtable.h 
b/arch/parisc/include/asm/pgtable.h
index f93c4a4e6580..dde7dd7200bd 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -377,6 +377,7 @@ static inline pte_t pte_mkspecial(pte_t pte){ 
return pte; }
 
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
 
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 {
pte_t pte;
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 0717693c8428..8448ff1542e0 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -67,6 +67,7 @@ static inline int pte_present(pte_t pte)
  * Even if PTEs can be unsigned long long, a PFN is always an unsigned
  * long for now.
  */
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) {
return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
 pgprot_val(pgprot)); }
diff --git a/arch/tile/include/asm/pgtable.h b/arch/tile/include/asm/pgtable.h
index 2b05ccbebed9..37c9aa3a3f0c 100644
--- a/arch/tile/include/asm/pgtable.h
+++ b/arch/tile/include/asm/pgtable.h
@@ -275,6 +275,7 @@ static inline unsigned long pte_pfn(pte_t pte)
 extern pgprot_t set_remote_cache_cpu(pgprot_t prot, int cpu);
 extern int get_remote_cache_cpu(pgprot_t prot);
 
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 {
return hv_pte_set_pa(prot, PFN_PHYS(pfn));
diff --git a/arch/um/include/asm/pgtable-3level.h 
b/arch/um/include/asm/pgtable-3level.h
index bae8523a162f..b7b51db14c2f 100644
--- a/arch/um/include/asm/pgtable-3level.h
+++ b/arch/um/include/asm/pgtable-3level.h
@@ -98,6 +98,7 @@ static inline unsigned long pte_pfn(pte_t pte)
return phys_to_pfn(pte_val(pte));
 }
 
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
pte_t pte;
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 867da5bbb4a3..02a54e5b7930 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -248,6 +248,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
return pte_set_flags(pte, _PAGE_SPECIAL);
 }
 
+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+   return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP);
+}
+
 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
 {
pmdval_t v = native_pmd_val(pmd);
@@ -334,6 +339,7 @@ static inline pgprotval_t massage_pgprot(pgprot_t pgprot)
return protval;
 }
 
+#define pfn_pte pfn_pte
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
@@ -446,6 +452,12 @@ static inline int pte_present(pte_t a)
return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
 }
 
+#define 

[PATCH v2 06/20] libnvdimm, pfn, pmem: allocate memmap array in persistent memory

2015-10-09 Thread Dan Williams
Use the new vmem_altmap capability to enable the pmem driver to arrange
for a struct page memmap to be established in persistent memory.

Cc: Christoph Hellwig 
Cc: Dave Chinner 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/pfn_devs.c |3 +--
 drivers/nvdimm/pmem.c |   19 +--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 71805a1aa0f3..a642cfacee07 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -83,8 +83,7 @@ static ssize_t mode_store(struct device *dev,
 
if (strncmp(buf, "pmem\n", n) == 0
|| strncmp(buf, "pmem", n) == 0) {
-   /* TODO: allocate from PMEM support */
-   rc = -ENOTTY;
+   nd_pfn->mode = PFN_MODE_PMEM;
} else if (strncmp(buf, "ram\n", n) == 0
|| strncmp(buf, "ram", n) == 0)
nd_pfn->mode = PFN_MODE_RAM;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3c5b8f585441..bb66158c0505 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -322,12 +322,16 @@ static int nvdimm_namespace_attach_pfn(struct 
nd_namespace_common *ndns)
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
struct nd_pfn *nd_pfn = to_nd_pfn(ndns->claim);
struct device *dev = _pfn->dev;
-   struct vmem_altmap *altmap;
struct nd_region *nd_region;
+   struct vmem_altmap *altmap;
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
phys_addr_t offset;
int rc;
+   struct vmem_altmap __altmap = {
+   .base_pfn = __phys_to_pfn(nsio->res.start),
+   .reserve = __phys_to_pfn(SZ_8K),
+   };
 
if (!nd_pfn->uuid || !nd_pfn->ndns)
return -ENODEV;
@@ -355,6 +359,17 @@ static int nvdimm_namespace_attach_pfn(struct 
nd_namespace_common *ndns)
return -EINVAL;
nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns);
altmap = NULL;
+   } else if (nd_pfn->mode == PFN_MODE_PMEM) {
+   nd_pfn->npfns = (resource_size(>res) - offset)
+   / PAGE_SIZE;
+   if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns)
+   dev_info(_pfn->dev,
+   "number of pfns truncated from %lld to 
%ld\n",
+   le64_to_cpu(nd_pfn->pfn_sb->npfns),
+   nd_pfn->npfns);
+   altmap = & __altmap;
+   altmap->free = __phys_to_pfn(offset - SZ_8K);
+   altmap->alloc = 0;
} else {
rc = -ENXIO;
goto err;
@@ -364,7 +379,7 @@ static int nvdimm_namespace_attach_pfn(struct 
nd_namespace_common *ndns)
pmem = dev_get_drvdata(dev);
devm_memunmap(dev, (void __force *) pmem->virt_addr);
pmem->virt_addr = (void __pmem *) devm_memremap_pages(dev, >res,
-   NULL);
+   altmap);
if (IS_ERR(pmem->virt_addr)) {
rc = PTR_ERR(pmem->virt_addr);
goto err;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched_clock: add data pointer argument to read callback

2015-10-09 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Sat, Oct 10, 2015 at 12:48:22AM +0100, Måns Rullgård wrote:
>> Russell King - ARM Linux  writes:
>> 
>> > On Fri, Oct 09, 2015 at 10:57:35PM +0100, Mans Rullgard wrote:
>> >> This passes a data pointer specified in the sched_clock_register()
>> >> call to the read callback allowing simpler implementations thereof.
>> >> 
>> >> In this patch, existing uses of this interface are simply updated
>> >> with a null pointer.
>> >
>> > This is a bad description.  It tells us what the patch is doing,
>> > (which we can see by reading the patch) but not _why_.  Please include
>> > information on why the change is necessary - describe what you are
>> > trying to achieve.
>> 
>> Currently most of the callbacks use a global variable to store the
>> address of a counter register.  This has several downsides:
>> 
>> - Loading the address of a global variable can be more expensive than
>>   keeping a pointer next to the function pointer.
>> 
>> - It makes it impossible to have multiple instances of a driver call
>>   sched_clock_register() since the caller can't know which clock will
>>   win in the end.
>> 
>> - Many of the existing callbacks are practically identical and could be
>>   replaced with a common generic function if it had a pointer argument.
>> 
>> If I've missed something that makes this a stupid idea, please tell.
>
> So my next question is whether you intend to pass an iomem pointer
> through this, or a some kind of structure, or both.  It matters,
> because iomem pointers have a __iomem attribute to keep sparse
> happy.  Having to force that attribute on and off pointers is frowned
> upon, as it defeats the purpose of the sparse static checker.

So this is an instance where tools like sparse get in the way of doing
the simplest, most efficient, and obviously correct thing.  Who wins in
such cases?
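
Purely as an illustration of the trade-off (names made up): keeping the
__iomem pointer inside a driver-private struct and passing that struct as the
callback data avoids forcing the attribute on or off at all:

struct my_clksrc {
        void __iomem *reg;      /* annotation preserved, sparse stays quiet */
};

static u64 notrace my_sched_clock_read(void *data)
{
        struct my_clksrc *cs = data;

        return readl_relaxed(cs->reg);
}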

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] i2c: add ACPI support for I2C mux ports

2015-10-09 Thread Dustin Byford
Although I2C mux devices are easily enumerated using ACPI (_HID/_CID or
device property compatible string match), enumerating I2C client devices
connected through an I2C mux device requires a little extra work.

This change implements a method for describing an I2C device hierarchy that
includes mux devices by using an ACPI Device() for each mux channel along
with an _ADR to set the channel number for the device.  See
Documentation/acpi/i2c-muxes.txt for a simple example.

Signed-off-by: Dustin Byford 
---
 Documentation/acpi/i2c-muxes.txt | 58 
 drivers/i2c/i2c-core.c   | 18 +++--
 drivers/i2c/i2c-mux.c|  8 ++
 3 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/acpi/i2c-muxes.txt

diff --git a/Documentation/acpi/i2c-muxes.txt b/Documentation/acpi/i2c-muxes.txt
new file mode 100644
index 000..efdcf0d
--- /dev/null
+++ b/Documentation/acpi/i2c-muxes.txt
@@ -0,0 +1,58 @@
+ACPI I2C Muxes
+--
+
+Describing an I2C device hierarchy that includes I2C muxes requires an ACPI
+Device() scope per mux channel.
+
+Consider this topology:
+
++--+   +--+
+| SMB1 |-->| MUX0 |--CH00--> i2c client A (0x50)
+|  |   | 0x70 |--CH01--> i2c client B (0x50)
++--+   +--+
+
+which corresponds to the following ASL:
+
+Device(SMB1)
+{
+Name (_HID, ...)
+Device(MUX0)
+{
+Name (_HID, ...)
+Name (_CRS, ResourceTemplate () {
+I2cSerialBus (0x70, ControllerInitiated, I2C_SPEED,
+  AddressingMode7Bit, "^SMB1", 0x00,
+  ResourceConsumer,,)
+}
+
+Device(CH00)
+{
+Name (_ADR, 0)
+
+Device(CLIA)
+{
+Name (_HID, ...)
+Name (_CRS, ResourceTemplate () {
+I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
+  AddressingMode7Bit, "^CH00", 0x00,
+  ResourceConsumer,,)
+}
+}
+}
+
+Device(CH01)
+{
+Name (_ADR, 1)
+
+Device(CLIB)
+{
+Name (_HID, ...)
+Name (_CRS, ResourceTemplate () {
+I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
+  AddressingMode7Bit, "^CH01", 0x00,
+  ResourceConsumer,,)
+}
+}
+}
+}
+}
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 3a4c54e..a2de010 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -156,7 +156,10 @@ static acpi_status acpi_i2c_add_device(acpi_handle handle, 
u32 level,
info.fwnode = acpi_fwnode_handle(adev);
 
memset(, 0, sizeof(lookup));
-   lookup.adapter_handle = ACPI_HANDLE(adapter->dev.parent);
+   if (i2c_parent_is_i2c_adapter(adapter))
+   lookup.adapter_handle = ACPI_HANDLE(>dev);
+   else
+   lookup.adapter_handle = ACPI_HANDLE(adapter->dev.parent);
lookup.device_handle = handle;
lookup.info = 
 
@@ -210,9 +213,20 @@ static acpi_status acpi_i2c_add_device(acpi_handle handle, 
u32 level,
  */
 static void acpi_i2c_register_devices(struct i2c_adapter *adap)
 {
+   struct device *dev;
acpi_status status;
 
-   if (!adap->dev.parent || !has_acpi_companion(adap->dev.parent))
+   /*
+* Typically we look at the ACPI device's parent for an ACPI companion.
+* However, in the case of an I2C-connected I2C mux, the "virtual" I2C
+* adapter allocated for the mux channel has that association.
+*/
+   if (i2c_parent_is_i2c_adapter(adap))
+   dev = >dev;
+   else
+   dev = adap->dev.parent;
+
+   if (!has_acpi_companion(dev))
return;
 
status = acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_ROOT_OBJECT,
diff --git a/drivers/i2c/i2c-mux.c b/drivers/i2c/i2c-mux.c
index 2ba7c0f..00fc5b1 100644
--- a/drivers/i2c/i2c-mux.c
+++ b/drivers/i2c/i2c-mux.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* multiplexer per channel data */
 struct i2c_mux_priv {
@@ -173,6 +174,13 @@ struct i2c_adapter *i2c_add_mux_adapter(struct i2c_adapter 
*parent,
}
}
 
+   /*
+* Associate the mux channel with an ACPI node.
+*/
+   if (has_acpi_companion(mux_dev))
+   acpi_preset_companion(>adap.dev, ACPI_COMPANION(mux_dev),
+ chan_id);
+
if (force_nr) {
priv->adap.nr = force_nr;
ret = i2c_add_numbered_adapter(>adap);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] i2c: scan entire ACPI namespace for I2C connections

2015-10-09 Thread Dustin Byford
An I2cSerialBus connection resource descriptor may indicate a
ResourceSource (a string uniquely identifying the I2C bus controller)
anywhere in the ACPI namespace.  However, when enumerating connections to an
I2C bus controller, i2c-core.c:acpi_i2c_register_devices() is only
searching devices that are descendants of the bus controller.

This change corrects acpi_i2c_register_devices() to walk the entire ACPI
namespace searching for I2C connections.

Suggested-by: Mika Westerberg 
Signed-off-by: Dustin Byford 
---
 drivers/i2c/i2c-core.c | 82 --
 1 file changed, 59 insertions(+), 23 deletions(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 5f89f1e..3a4c54e 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -99,27 +99,40 @@ struct gsb_buffer {
};
 } __packed;
 
-static int acpi_i2c_add_resource(struct acpi_resource *ares, void *data)
+struct acpi_i2c_lookup {
+   struct i2c_board_info *info;
+   acpi_handle adapter_handle;
+   acpi_handle device_handle;
+};
+
+static int acpi_i2c_find_address(struct acpi_resource *ares, void *data)
 {
-   struct i2c_board_info *info = data;
+   struct acpi_i2c_lookup *lookup = data;
+   struct i2c_board_info *info = lookup->info;
+   struct acpi_resource_i2c_serialbus *sb;
+   acpi_handle adapter_handle;
+   acpi_status status;
 
-   if (ares->type == ACPI_RESOURCE_TYPE_SERIAL_BUS) {
-   struct acpi_resource_i2c_serialbus *sb;
+   if (info->addr || ares->type != ACPI_RESOURCE_TYPE_SERIAL_BUS)
+   return 1;
 
-   sb = >data.i2c_serial_bus;
-   if (!info->addr && sb->type == ACPI_RESOURCE_SERIAL_TYPE_I2C) {
-   info->addr = sb->slave_address;
-   if (sb->access_mode == ACPI_I2C_10BIT_MODE)
-   info->flags |= I2C_CLIENT_TEN;
-   }
-   } else if (!info->irq) {
-   struct resource r;
+   sb = >data.i2c_serial_bus;
+   if (sb->type != ACPI_RESOURCE_SERIAL_TYPE_I2C)
+   return 1;
 
-   if (acpi_dev_resource_interrupt(ares, 0, ))
-   info->irq = r.start;
+   /*
+* Extract the ResourceSource and make sure that the handle matches
+* with the I2C adapter handle.
+*/
+   status = acpi_get_handle(lookup->device_handle,
+sb->resource_source.string_ptr,
+_handle);
+   if (ACPI_SUCCESS(status) && adapter_handle == lookup->adapter_handle) {
+   info->addr = sb->slave_address;
+   if (sb->access_mode == ACPI_I2C_10BIT_MODE)
+   info->flags |= I2C_CLIENT_TEN;
}
 
-   /* Tell the ACPI core to skip this resource */
return 1;
 }
 
@@ -128,6 +141,8 @@ static acpi_status acpi_i2c_add_device(acpi_handle handle, 
u32 level,
 {
struct i2c_adapter *adapter = data;
struct list_head resource_list;
+   struct acpi_i2c_lookup lookup;
+   struct resource_entry *entry;
struct i2c_board_info info;
struct acpi_device *adev;
int ret;
@@ -140,14 +155,37 @@ static acpi_status acpi_i2c_add_device(acpi_handle 
handle, u32 level,
memset(, 0, sizeof(info));
info.fwnode = acpi_fwnode_handle(adev);
 
+   memset(, 0, sizeof(lookup));
+   lookup.adapter_handle = ACPI_HANDLE(adapter->dev.parent);
+   lookup.device_handle = handle;
+   lookup.info = 
+
+   /*
+* Look up for I2cSerialBus resource with ResourceSource that
+* matches with this adapter.
+*/
INIT_LIST_HEAD(_list);
ret = acpi_dev_get_resources(adev, _list,
-acpi_i2c_add_resource, );
+acpi_i2c_find_address, );
acpi_dev_free_resource_list(_list);
 
if (ret < 0 || !info.addr)
return AE_OK;
 
+   /* Then fill IRQ number if any */
+   ret = acpi_dev_get_resources(adev, _list, NULL, NULL);
+   if (ret < 0)
+   return AE_OK;
+
+   resource_list_for_each_entry(entry, _list) {
+   if (resource_type(entry->res) == IORESOURCE_IRQ) {
+   info.irq = entry->res->start;
+   break;
+   }
+   }
+
+   acpi_dev_free_resource_list(_list);
+
adev->power.flags.ignore_parent = true;
strlcpy(info.type, dev_name(>dev), sizeof(info.type));
if (!i2c_new_device(adapter, )) {
@@ -160,6 +198,8 @@ static acpi_status acpi_i2c_add_device(acpi_handle handle, 
u32 level,
return AE_OK;
 }
 
+#define ACPI_I2C_MAX_SCAN_DEPTH 32
+
 /**
  * acpi_i2c_register_devices - enumerate I2C slave devices behind adapter
  * @adap: pointer to adapter
@@ -170,17 +210,13 @@ static acpi_status acpi_i2c_add_device(acpi_handle 
handle, u32 level,
  */
 static void 

[PATCH 0/2] i2c: acpi: scan ACPI enumerated I2C mux channels

2015-10-09 Thread Dustin Byford
Two patches ready from my RFC.  The first, from Mika, scans more of the ACPI
namespace looking for I2C connections.  It's not strictly a dependency of
the other patch but they are easy to review together.  I was able to test
this by overriding my DSDT and moving I2C resource macros around in the
device hierarchy.

The next adds support for describing I2C mux ports like this (added as
Documentation/acpi/i2c-muxes.txt):

+--+   +--+
| SMB1 |-->| MUX0 |--CH00--> i2c client A (0x50)
|  |   | 0x70 |--CH01--> i2c client B (0x50)
+--+   +--+

Device(SMB1)
{
Name (_HID, ...)
Device(MUX0)
{
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x70, ControllerInitiated, I2C_SPEED,
  AddressingMode7Bit, "^SMB1", 0x00,
  ResourceConsumer,,)
}

Device(CH00)
{
Name (_ADR, 0)

Device(CLIA)
{
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
  AddressingMode7Bit, "^CH00", 0x00,
  ResourceConsumer,,)
}
}
}

Device(CH01)
{
Name (_ADR, 1)

Device(CLIB)
{
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
  AddressingMode7Bit, "^CH01", 0x00,
  ResourceConsumer,,)
}
}
}
}
}

Dustin Byford (2):
  i2c: scan entire ACPI namespace for I2C connections
  i2c: add ACPI support for I2C mux ports

 Documentation/acpi/i2c-muxes.txt | 58 +
 drivers/i2c/i2c-core.c   | 94 ++--
 drivers/i2c/i2c-mux.c|  8 
 3 files changed, 138 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/acpi/i2c-muxes.txt

-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] strscpy powerpc fix for 4.3

2015-10-09 Thread Stephen Rothwell
Hi Linus,

On Wed, 7 Oct 2015 20:27:38 -0400 Chris Metcalf  wrote:
>
> On 10/7/2015 6:44 PM, Stephen Rothwell wrote:
> >
> > After merging Linus' tree, today's linux-next build (powerpc
> > ppc64_defconfig) failed like this:
> >
> > lib/string.c: In function 'strscpy':
> > lib/string.c:209:4: error: implicit declaration of function 'zero_bytemask' 
> > [-Werror=implicit-function-declaration]
> >  *(unsigned long *)(dest+res) = c & zero_bytemask(data);
> >  ^
> >
> > Caused by commit
> >
> >30035e45753b ("string: provide strscpy()")
> 
> I posted a change equivalent to yours earlier today:
> 
> http://lkml.kernel.org/r/1444229188-19640-1-git-send-email-cmetc...@ezchip.com
> 
> I also did no testing, but since the rest of the PPC code is similar to the
> asm-generic version, I believe the zero_bytemask() definition should be OK.
> 
> It probably should go through Linus' tree, like the previous set of patches.
> I just pushed it up to the linux-tile tree for Linus to grab as:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git strscpy
> 
> Chris Metcalf (1):
>arch/powerpc: provide zero_bytemask() for big-endian
> 
>   arch/powerpc/include/asm/word-at-a-time.h | 5 +
>   1 file changed, 5 insertions(+)
> 
> -- 
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com

Can you please do this pull as most of the powerpc build testing is
failing at the moment ... :-(

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/10] clk: ns2: add clock support for Broadcom Northstar 2 SoC

2015-10-09 Thread Stephen Boyd
On 10/02, Jon Mason wrote:
> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
> index 23800a1..2790f21 100644
> --- a/arch/arm64/Kconfig.platforms
> +++ b/arch/arm64/Kconfig.platforms
> @@ -2,6 +2,7 @@ menu "Platform selection"
>  
>  config ARCH_BCM_IPROC
>   bool "Broadcom iProc SoC Family"
> + select COMMON_CLK_IPROC

Given that this is a visible option, I'd expect the defconfig to
enable this.

>   help
> This enables support for Broadcom iProc based SoCs
>  
> diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
> index d08b3e5..ea81eaa 100644
> --- a/drivers/clk/Makefile
> +++ b/drivers/clk/Makefile
> @@ -47,7 +47,8 @@ obj-$(CONFIG_COMMON_CLK_WM831X) += clk-wm831x.o
>  obj-$(CONFIG_COMMON_CLK_XGENE)   += clk-xgene.o
>  obj-$(CONFIG_COMMON_CLK_PWM) += clk-pwm.o
>  obj-$(CONFIG_COMMON_CLK_AT91)+= at91/
> -obj-$(CONFIG_ARCH_BCM)   += bcm/
> +obj-$(CONFIG_CLK_BCM_KONA)   += bcm/
> +obj-$(CONFIG_COMMON_CLK_IPROC)   += bcm/

Also, perhaps we need some sort of Kconfig thing for overall bcm
clock drivers, so that we don't have duplicate Makefile rules.

config COMMON_CLK_BCM
bool "Support for Broadcom clocks"

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/10] clk: iproc: Add PLL base write function

2015-10-09 Thread Stephen Boyd
On 10/02, Jon Mason wrote:
> diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
> index e029ab3..a4602aa 100644
> --- a/drivers/clk/bcm/clk-iproc-pll.c
> +++ b/drivers/clk/bcm/clk-iproc-pll.c
> @@ -137,6 +137,18 @@ static int pll_wait_for_lock(struct iproc_pll *pll)
>   return -EIO;
>  }
>  
> +static void iproc_pll_write(struct iproc_pll *pll, void __iomem *base,
> + u32 offset, u32 val)
> +{
> + const struct iproc_pll_ctrl *ctrl = pll->ctrl;
> +
> + writel(val, base + offset);
> +
> + if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK &&
> +  base == pll->pll_base))
> + val = readl(base + offset);

Is there any point to assign val here?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/10] clk: ns2: add clock support for Broadcom Northstar 2 SoC

2015-10-09 Thread Stephen Boyd
On 10/02, Jon Mason wrote:
> diff --git a/drivers/clk/bcm/clk-ns2.c b/drivers/clk/bcm/clk-ns2.c
> new file mode 100644
> index 000..1d08281
> --- /dev/null
> +++ b/drivers/clk/bcm/clk-ns2.c
> @@ -0,0 +1,290 @@
> +/*
> + * Copyright (C) 2015 Broadcom Corporation
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation version 2.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

clkdev looks unused here too?

> +#include 
> +#include 

And this one?

> +
> +#include 
> +#include "clk-iproc.h"
> +
> +#define reg_val(o, s, w) { .offset = o, .shift = s, .width = w, }

I guess we missed this one already, but this isn't a macro
resembling a function. Kernel style is to capitalize this sort of
macro.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] namei: results of d_is_negative() should be checked after dentry revalidation

2015-10-09 Thread Linus Torvalds
On Fri, Oct 9, 2015 at 10:44 AM, Trond Myklebust
 wrote:
>
> The issue is that revalidation may cause the dentry to be dropped in NFS
> if, say, the client notes that the directory timestamps have changed.

Ack.

We've had this bug before, where we returned something else than
-ECHILD while we were doing RCU lookups. See for example commit
97242f99a013 ("link_path_walk(): be careful when failing with
ENOTDIR").

So in general, we should always (a) either verify all sequence points
or (b) return -ECHILD to go into slow mode. The patch seems

However, this thing was explicitly made to be this way by commit
766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
validation"), so while my gut feel is to consider this fix
ObviouslyCorrect(tm), I will delay it a bit in the hope to get an ACK
and comment from Al about the patch.

Al?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/10] ARM: dts: enable clock support for BCM5301X

2015-10-09 Thread Stephen Boyd
On 10/09, Jon Mason wrote:
> On Fri, Oct 09, 2015 at 12:35:40AM -0700, Stephen Boyd wrote:
> > On 10/02, Jon Mason wrote:
> > 
> > >  arch/arm/boot/dts/bcm5301x.dtsi | 67 
> > > -
> > >  1 file changed, 60 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/arm/boot/dts/bcm5301x.dtsi 
> > > b/arch/arm/boot/dts/bcm5301x.dtsi
> > > index 6f50f67..f717859 100644
> > > --- a/arch/arm/boot/dts/bcm5301x.dtsi
> > > +++ b/arch/arm/boot/dts/bcm5301x.dtsi
> > > @@ -55,14 +56,14 @@
> > >   compatible = "arm,cortex-a9-global-timer";
> > >   reg = <0x0200 0x100>;
> > >   interrupts = ;
> > > - clocks = <_periph>;
> > > + clocks = <_clk>;
> > >   };
> > >  
> > >   local-timer@0600 {
> > >   compatible = "arm,cortex-a9-twd-timer";
> > >   reg = <0x0600 0x100>;
> > >   interrupts = ;
> > > - clocks = <_periph>;
> > > + clocks = <_clk>;
> > >   };
> > >  
> > >   gic: interrupt-controller@1000 {
> > > @@ -94,14 +95,66 @@
> > >  
> > >   clocks {
> > 
> > I'd expect this to only contain nodes that don't have a reg
> > property. Clock providers that have a reg property would go into
> > some soc node or bus. Perhaps that's the chipcommonA node, or
> > axi?
> 
> This might get a little ugly, as some of the clocks are in the
> 0x1800 and others are in 0x1900.  I would think it better to
> have them all in one place (as that is more readable).  Do you prefer
> I split the pieces up into their respective DT nodes?

Are there two clock controllers? Sorry I don't understand the
architecture here very well. Nodes with reg properties in the
same range should be near each other. We don't group all i2c
controllers into the same node because they're logically i2c
controllers. We express the hierarchy of devices with container
nodes. The clocks node is only useful for board-level clocks, not
things that are inside the SoC.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] devicetree: add binding for generic mmio clocksource

2015-10-09 Thread Stephen Boyd
On 10/09, Måns Rullgård wrote:
> Stephen Boyd  writes:
> >
> > Does that mean a flag day? Urgh. Pain. I'm not opposed to adding
> > a pointer, in fact it might be better for performance so that we
> > don't take a cache miss in read() functions that need to load
> > some pointer. We were talking about that problem a few months
> > ago, but nothing came of it.
> 
> I've sent a patch.  Let the flames begin.
> 

I never got it. Was I Cced?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] devicetree: add binding for generic mmio clocksource

2015-10-09 Thread Stephen Boyd
On 10/09, Måns Rullgård wrote:
> Stephen Boyd  writes:
> 
> > On 10/09, Rob Herring wrote:
> >> 
> >> Adding a ptr to the callback seems fine to me.
> >> 
> >
> > Does that mean a flag day? Urgh. Pain. I'm not opposed to adding
> > a pointer, in fact it might be better for performance so that we
> > don't take a cache miss in read() functions that need to load
> > some pointer. We were talking about that problem a few months
> > ago, but nothing came of it.
> 
> Flag day in what sense?  There aren't all that many users of the
> interface (56, to be precise), and sched_clock_register() isn't
> exported. 

That's exactly what a flag day is. Lots of coordination, lots of
acks, etc. Last time when I changed the registration API I made a
new registration API, moved every caller over one by one, and
then deleted the old registration API. That's how you manage a
flag day.

We could probably do the same thing again with two different
types of registration APIs so that we move over users one by one.
The old registration API would be wrapped with a sched_clock
local function that doesn't pass the pointer it gets called with,
while the new registration API would fill in the function pointer
that we call directly from the core code. The double function
call is probably bad for performance, so I guess we should get
rid of it and always pass the pointer to the callback. But this
is at least a method to convert everything gradually without
breaking users that may be going through different trees.
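
A rough sketch of that transitional shape (every name here is an assumption,
not an existing API):

/* new-style callback takes a data pointer */
typedef u64 (*sched_clock_read_t)(void *data);

/* transitional: old entry point wraps the no-argument callback */
static u64 (*legacy_read)(void);

static u64 legacy_read_wrapper(void *data)
{
        return legacy_read();           /* legacy callbacks ignore the pointer */
}

void sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
{
        legacy_read = read;
        sched_clock_register_data(legacy_read_wrapper, NULL, bits, rate);
}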

> Verifying the change will be a minor pain, but I don't see
> why it should have any major consequences.  Obviously I'd just set the
> pointer to null for existing users and leave it for the respective
> maintainers to make proper use of it where sensible.
> 

Sure.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched_clock: add data pointer argument to read callback

2015-10-09 Thread Russell King - ARM Linux
On Sat, Oct 10, 2015 at 12:48:22AM +0100, Måns Rullgård wrote:
> Russell King - ARM Linux  writes:
> 
> > On Fri, Oct 09, 2015 at 10:57:35PM +0100, Mans Rullgard wrote:
> >> This passes a data pointer specified in the sched_clock_register()
> >> call to the read callback allowing simpler implementations thereof.
> >> 
> >> In this patch, existing uses of this interface are simply updated
> >> with a null pointer.
> >
> > This is a bad description.  It tells us what the patch is doing,
> > (which we can see by reading the patch) but not _why_.  Please include
> > information on why the change is necessary - describe what you are
> > trying to achieve.
> 
> Currently most of the callbacks use a global variable to store the
> address of a counter register.  This has several downsides:
> 
> - Loading the address of a global variable can be more expensive than
>   keeping a pointer next to the function pointer.
> 
> - It makes it impossible to have multiple instances of a driver call
>   sched_clock_register() since the caller can't know which clock will
>   win in the end.
> 
> - Many of the existing callbacks are practically identical and could be
>   replaced with a common generic function if it had a pointer argument.
> 
> If I've missed something that makes this a stupid idea, please tell.

So my next question is whether you intend to pass an iomem pointer
through this, or a some kind of structure, or both.  It matters,
because iomem pointers have a __iomem attribute to keep sparse
happy.  Having to force that attribute on and off pointers is frowned
upon, as it defeats the purpose of the sparse static checker.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 0/2] DRA72/DRA74: Add 2 lane support

2015-10-09 Thread Bjorn Helgaas
On Sat, Oct 10, 2015 at 04:46:55AM +0530, Kishon Vijay Abraham I wrote:
> Hi Bjorn,
> 
> On Saturday 10 October 2015 04:20 AM, Bjorn Helgaas wrote:
> > [+cc Arnd, Rob]
> > 
> > On Mon, Sep 28, 2015 at 06:27:36PM +0530, Kishon Vijay Abraham I wrote:
> >> Add driver modifications in pci-dra7xx to get x2 mode working in
> >> DRA72 and DRA74. Certain modifications is needed in PHY driver also
> >> which I'll send as a separate series.
> >>
> >> Kishon Vijay Abraham I (2):
> >>   pci: host: pci-dra7xx: use "num-lanes" property to find phy count
> >>   pci: host: pci-dra7xx: Enable x2 mode support
> > 
> > Applied to pci/host-dra7xx for v4.4, thanks!
> > 
> > I adjusted the subject line capitalization & format to match the history.
> > 
> > Arnd, Rob, any comments on the DT updates or the "num-lanes" vs number of
> > strings in "phy-names" changes?
> 
> I sent it as RFC since I didn't have the board to test 2 lane mode with
> $patch. And just now I got a board to test x2 and I found a problem with
> the 2nd patch.
> 
> .b1co_mode_sel_mask = GENMASK(2, 3), in the patch should be replaced with
> .b1co_mode_sel_mask = GENMASK(3, 2).
> 
> I'll resend the patch fixing the above.

OK, I dropped these two patches.  Thanks for letting me know.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched_clock: add data pointer argument to read callback

2015-10-09 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Fri, Oct 09, 2015 at 10:57:35PM +0100, Mans Rullgard wrote:
>> This passes a data pointer specified in the sched_clock_register()
>> call to the read callback allowing simpler implementations thereof.
>> 
>> In this patch, existing uses of this interface are simply updated
>> with a null pointer.
>
> This is a bad description.  It tells us what the patch is doing,
> (which we can see by reading the patch) but not _why_.  Please include
> information on why the change is necessary - describe what you are
> trying to achieve.

Currently most of the callbacks use a global variable to store the
address of a counter register.  This has several downsides:

- Loading the address of a global variable can be more expensive than
  keeping a pointer next to the function pointer.

- It makes it impossible to have multiple instances of a driver call
  sched_clock_register() since the caller can't know which clock will
  win in the end.

- Many of the existing callbacks are practically identical and could be
  replaced with a common generic function if it had a pointer argument.

If I've missed something that makes this a stupid idea, please tell.
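
To make that concrete, here is a rough sketch (the data-pointer variant is
the proposal, so its exact signature is hypothetical at this point):

#include <linux/types.h>
#include <linux/io.h>

/* Today: every driver carries its own global plus a trivial wrapper. */
static void __iomem *my_counter_reg;

static u64 notrace my_clock_read(void)
{
	return readl_relaxed(my_counter_reg);
}

/*
 * With a data pointer travelling alongside the function pointer, one
 * generic callback can be shared, and a second instance of the driver
 * no longer fights over the single global above.
 */
static u64 notrace generic_mmio_clock_read(void *reg)
{
	return readl_relaxed((__force void __iomem *)reg);
}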

-- 
Måns Rullgård
m...@mansr.com


Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

2015-10-09 Thread Rafael J. Wysocki

On 10/10/2015 12:52 AM, Al Stone wrote:

On 10/09/2015 03:02 PM, Rafael J. Wysocki wrote:

On Thursday, October 08, 2015 05:05:00 PM Al Stone wrote:

On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:

On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:

On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:

On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:

On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:

On 10/08/2015 05:44 AM, Hanjun Guo wrote:

On 10/08/2015 11:21 AM, kernel test robot wrote:

FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
bad_madt_entry() function to eventually replace the macro")

[0.00] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)

[snip]


In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
in the MADT and see what the consensus is there.  It may just be a matter of
clarifying the language in the spec.

One additional question to ask is what checks have been present in the OSes
and what they do if they see a reserved MADT subtable ID.  If they haven't been
doing anything so far, I'm afraid this particular train may be gone already.

It may be gone.  The silence so far is deafening :).


It's also on my plate to really dig into an ACPI test suite and see about
building something really robust for that -- this can be added as an example.
I'll see if I have time to send in a patch for FWTS, too, which is pretty
good about capturing such things.

Sounds good!

Thanks,
Rafael


Let me know if I need to send the patch to fix the regression elsewhere; it
dawned on me long after I sent it that this may not be the right place for it
to go...



Please send it to linux-a...@vger.kernel.org.

Thanks,
Rafael



Re: [PATCH] x86/pci/legacy: make pci_subsys_init static

2015-10-09 Thread Bjorn Helgaas
On Fri, Oct 09, 2015 at 12:51:46AM +0600, Alexander Kuleshov wrote:
> The pci_subsys_init() is a subsys_initcall that can be declared
> static.
> 
> Signed-off-by: Alexander Kuleshov 

Applied to pci/misc for v4.4, thanks, Alexander!

I tweaked the subject to match the file's history.

> ---
>  arch/x86/pci/legacy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 5b662c0..ea6f380 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -54,7 +54,7 @@ void pcibios_scan_specific_bus(int busn)
>  }
>  EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
>  
> -int __init pci_subsys_init(void)
> +static int __init pci_subsys_init(void)
>  {
>   /*
>* The init function returns an non zero value when
> -- 
> 2.6.0
> 


Re: arm/arm64: GICv2 driver does not have irq_disable implemented

2015-10-09 Thread Duc Dang
On Fri, Oct 9, 2015 at 3:21 PM, Duc Dang  wrote:
> On Fri, Oct 9, 2015 at 2:52 PM, Thomas Gleixner  wrote:
>> On Fri, 9 Oct 2015, Duc Dang wrote:
>>> On Fri, Oct 9, 2015 at 10:52 AM, Thomas Gleixner  wrote:
>>> > On Fri, 9 Oct 2015, Duc Dang wrote:
>>> >> In APM ARM64 X-Gene Enet controller driver, we use disable_irq_nosync to
>>> >> disable interrupt before calling __napi_schedule to schedule packet 
>>> >> handler
>>> >> to process the Tx/Rx packets.
>>> >
>>> > Which is wrong to begin with. Disable the interrupt at the device
>>> > level not at the interrupt line level.
>>> >
>>> We could not disable the interrupt at Enet controller level due to the
>>> controller limitation. As you said, using  disable_irq_nosync is wrong
>>> but it looks like that the only option that we have.
>>
>> Oh well.
>>
>>> Do you have any suggestion about different approach that we could try?
>>
>> Try the patch below and add
>>
>> irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY);
>>
>> to your driver before requesting the interrupt. Unset it when freeing
>> the interrupt.
>
> Thanks, Thomas!
>
> We will try and let you know soon.

Hi Thomas,

We use irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY) in our X-Gene
Ethernet driver along with your patch and interrupt count works as
expected.
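
Roughly like this, in case it is useful to others (sketch only -- the ring
structure and handler names below are placeholders, not the final driver
code):

#include <linux/interrupt.h>
#include <linux/irq.h>

static int xgene_enet_register_irq(struct device *dev,
				   struct xgene_enet_desc_ring *ring)
{
	/* Force non-lazy disable so disable_irq_nosync() really masks the line. */
	irq_set_status_flags(ring->irq, IRQ_DISABLE_UNLAZY);

	return devm_request_irq(dev, ring->irq, xgene_enet_rx_irq, 0,
				ring->irq_name, ring);
}

static void xgene_enet_free_irq(struct device *dev,
				struct xgene_enet_desc_ring *ring)
{
	irq_clear_status_flags(ring->irq, IRQ_DISABLE_UNLAZY);
	devm_free_irq(dev, ring->irq, ring);
}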

Are you going to commit your patch soon?

We will post the patch for our X-Gene Ethernet driver after your patch
is available.

Thanks,
>
>>
>> Thanks,
>>
>> tglx
>>
>> 8<--
>>
>> Subject: genirq: Add flag to force mask in disable_irq[_nosync]()
>> From: Thomas Gleixner 
>> Date: Fri, 09 Oct 2015 23:28:58 +0200
>>
>> If an irq chip does not implement the irq_disable callback, then we
>> use a lazy approach for disabling the interrupt. That means that the
>> interrupt is marked disabled, but the interrupt line is not
>> immediately masked in the interrupt chip. It only becomes masked if
>> the interrupt is raised while it's marked disabled. We use this to avoid
>> possibly expensive mask/unmask operations for common case operations.
>>
>> Unfortunately there are devices which do not allow the interrupt to be
>> disabled easily at the device level. They are forced to use
>> disable_irq_nosync(). This can result in taking each interrupt twice.
>>
>> Instead of enforcing the non lazy mode on all interrupts of a irq
>> chip, provide a settings flag, which can be set by the driver for that
>> particular interrupt line.
>>
>> Signed-off-by: Thomas Gleixner 
>> ---
>>  include/linux/irq.h   |4 +++-
>>  kernel/irq/chip.c |9 +
>>  kernel/irq/settings.h |7 +++
>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> Index: tip/include/linux/irq.h
>> ===
>> --- tip.orig/include/linux/irq.h
>> +++ tip/include/linux/irq.h
>> @@ -72,6 +72,7 @@ enum irqchip_irq_state;
>>   * IRQ_IS_POLLED   - Always polled by another interrupt. Exclude
>>   *   it from the spurious interrupt detection
>>   *   mechanism and from core side polling.
>> + * IRQ_DISABLE_UNLAZY  - Disable lazy irq disable
>>   */
>>  enum {
>> IRQ_TYPE_NONE   = 0x,
>> @@ -97,13 +98,14 @@ enum {
>> IRQ_NOTHREAD= (1 << 16),
>> IRQ_PER_CPU_DEVID   = (1 << 17),
>> IRQ_IS_POLLED   = (1 << 18),
>> +   IRQ_DISABLE_UNLAZY  = (1 << 19),
>>  };
>>
>>  #define IRQF_MODIFY_MASK   \
>> (IRQ_TYPE_SENSE_MASK | IRQ_NOPROBE | IRQ_NOREQUEST | \
>>  IRQ_NOAUTOEN | IRQ_MOVE_PCNTXT | IRQ_LEVEL | IRQ_NO_BALANCING | \
>>  IRQ_PER_CPU | IRQ_NESTED_THREAD | IRQ_NOTHREAD | IRQ_PER_CPU_DEVID 
>> | \
>> -IRQ_IS_POLLED)
>> +IRQ_IS_POLLED | IRQ_DISABLE_UNLAZY)
>>
>>  #define IRQ_NO_BALANCING_MASK  (IRQ_PER_CPU | IRQ_NO_BALANCING)
>>
>> Index: tip/kernel/irq/chip.c
>> ===
>> --- tip.orig/kernel/irq/chip.c
>> +++ tip/kernel/irq/chip.c
>> @@ -241,6 +241,13 @@ void irq_enable(struct irq_desc *desc)
>>   * disabled. If an interrupt happens, then the interrupt flow
>>   * handler masks the line at the hardware level and marks it
>>   * pending.
>> + *
>> + * If the interrupt chip does not implement the irq_disable callback,
>> + * a driver can disable the lazy approach for a particular irq line by
>> + * calling 'irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY)'. This can be
>> + * used for devices which cannot disable the interrupt at the device
>> + * level under certain circumstances and have to use
>> + * disable_irq[_nosync] instead.
>>   */
>>  void irq_disable(struct irq_desc *desc)
>>  {
>> @@ -248,6 +255,8 @@ void irq_disable(struct irq_desc *desc)
>> if (desc->irq_data.chip->irq_disable) {
>> desc->irq_data.chip->irq_disable(>irq_data);
>> irq_state_set_masked(desc);
>> +   } else if 

Re: [PATCH v3 1/7] drm/vc4: Add devicetree bindings for VC4.

2015-10-09 Thread Sebastian Reichel
Hi,

On Fri, Oct 09, 2015 at 02:27:42PM -0700, Eric Anholt wrote:
> VC4 is the GPU (display and 3D) subsystem present on the 2835 and some
> other Broadcom SoCs.
> 
> This binding follows the model of msm, imx, sti, and others, where
> there is a subsystem node for the whole GPU, with nodes for the
> individual HW components within it.

I think it would be useful to have the acronyms written out one time
in the document (VC4, HVS).

-- Sebastian




Re: [PATCH] PCI: generic: Fix address window calculation for non-zero starting bus.

2015-10-09 Thread Bjorn Helgaas
On Thu, Oct 08, 2015 at 12:54:16PM -0700, David Daney wrote:
> From: David Daney 
> 
> Make the offset from the beginning of the "reg" property be from the
> starting bus number, rather than zero.  Hoist the invariant size
> calculation out of the mapping for loop.
> 
> Update host-generic-pci.txt to clarify the semantics of the "reg"
> property with respect to non-zero starting bus numbers.
> 
> Signed-off-by: David Daney 

Applied to pci/host-generic for v4.4 with Reviewed-by and Acked-by from
Arnd and Rob, thanks David!

Bjorn

> ---
> This is a replacement for:
> 
> https://lkml.org/lkml/2015/10/2/653
> 
> Since that patch is getting too much push-back, this is the only other
> option for fixing the driver to be internally consistent in its
> treatment of the offset from "reg" to the first bus in the "bus-range"
> property.
> 
>  Documentation/devicetree/bindings/pci/host-generic-pci.txt | 5 +++--
>  drivers/pci/host/pci-host-generic.c| 4 ++--
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/pci/host-generic-pci.txt 
> b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
> index cf3e205..3f1d3fc 100644
> --- a/Documentation/devicetree/bindings/pci/host-generic-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/host-generic-pci.txt
> @@ -34,8 +34,9 @@ Properties of the host controller node:
>  - #size-cells: Must be 2.
>  
>  - reg: The Configuration Space base address and size, as accessed
> -   from the parent bus.
> -
> +   from the parent bus.  The base address corresponds to
> +   the first bus in the "bus-range" property.  If no
> +   "bus-range" is specified, this will be bus 0 (the 
> default).
>  
>  Properties of the /chosen node:
>  
> diff --git a/drivers/pci/host/pci-host-generic.c 
> b/drivers/pci/host/pci-host-generic.c
> index fe9a81b..bb93346 100644
> --- a/drivers/pci/host/pci-host-generic.c
> +++ b/drivers/pci/host/pci-host-generic.c
> @@ -169,6 +169,7 @@ static int gen_pci_parse_map_cfg_windows(struct gen_pci 
> *pci)
>   struct resource *bus_range;
>   struct device *dev = pci->host.dev.parent;
>   struct device_node *np = dev->of_node;
> + u32 sz = 1 << pci->cfg.ops->bus_shift;
>  
>   err = of_address_to_resource(np, 0, >cfg.res);
>   if (err) {
> @@ -196,10 +197,9 @@ static int gen_pci_parse_map_cfg_windows(struct gen_pci 
> *pci)
>   bus_range = pci->cfg.bus_range;
>   for (busn = bus_range->start; busn <= bus_range->end; ++busn) {
>   u32 idx = busn - bus_range->start;
> - u32 sz = 1 << pci->cfg.ops->bus_shift;
>  
>   pci->cfg.win[idx] = devm_ioremap(dev,
> -  pci->cfg.res.start + busn * sz,
> +  pci->cfg.res.start + idx * sz,
>sz);
>   if (!pci->cfg.win[idx])
>   return -ENOMEM;
> -- 
> 1.9.1
> 


Re: [PATCH] sched_clock: add data pointer argument to read callback

2015-10-09 Thread Russell King - ARM Linux
On Fri, Oct 09, 2015 at 10:57:35PM +0100, Mans Rullgard wrote:
> This passes a data pointer specified in the sched_clock_register()
> call to the read callback allowing simpler implementations thereof.
> 
> In this patch, existing uses of this interface are simply updated
> with a null pointer.

This is a bad description.  It tells us what the patch is doing,
(which we can see by reading the patch) but not _why_.  Please include
information on why the change is necessary - describe what you are
trying to achieve.

I generally don't accept patches that add new stuff to the kernel with
no users of that new stuff - that's called experience, experience of
people who submit stuff like that, and then vanish leaving their junk
in the kernel without any users.  Please ensure that this gets a user
very quickly, or better still, submit this patch as part of a series
which makes use of it.

Also, copying so many people is guaranteed to get the mail silently dropped by
mailing lists.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [RFC PATCH 0/2] DRA72/DRA74: Add 2 lane support

2015-10-09 Thread Kishon Vijay Abraham I
Hi,

On Saturday 10 October 2015 04:46 AM, Kishon Vijay Abraham I wrote:
> Hi Bjorn,
> 
> On Saturday 10 October 2015 04:20 AM, Bjorn Helgaas wrote:
>> [+cc Arnd, Rob]
>>
>> On Mon, Sep 28, 2015 at 06:27:36PM +0530, Kishon Vijay Abraham I wrote:
>>> Add driver modifications in pci-dra7xx to get x2 mode working in
>>> DRA72 and DRA74. Certain modifications are needed in the PHY driver also
>>> which I'll send as a separate series.
>>>
>>> Kishon Vijay Abraham I (2):
>>>   pci: host: pci-dra7xx: use "num-lanes" property to find phy count
>>>   pci: host: pci-dra7xx: Enable x2 mode support
>>
>> Applied to pci/host-dra7xx for v4.4, thanks!
>>
>> I adjusted the subject line capitalization & format to match the history.
>>
>> Arnd, Rob, any comments on the DT updates or the "num-lanes" vs number of
>> strings in "phy-names" changes?
> 
> I sent it as RFC since I didn't have the board to test 2 lane mode with
> $patch. And just now I got a board to test x2 and I found a problem with
> the 2nd patch.
> 
> .b1co_mode_sel_mask = GENMASK(2, 3), in the patch should be replaced with
> .b1co_mode_sel_mask = GENMASK(3, 2).
> 
> I'll resend the patch fixing the above.

I'll resend after a couple of days to fix any comments from Arnd or Rob.

Thanks
Kishon

> 
> Thanks
> Kishon


Re: [RFC PATCH 0/2] DRA72/DRA74: Add 2 lane support

2015-10-09 Thread Kishon Vijay Abraham I
Hi Bjorn,

On Saturday 10 October 2015 04:20 AM, Bjorn Helgaas wrote:
> [+cc Arnd, Rob]
> 
> On Mon, Sep 28, 2015 at 06:27:36PM +0530, Kishon Vijay Abraham I wrote:
>> Add driver modifications in pci-dra7xx to get x2 mode working in
>> DRA72 and DRA74. Certain modifications are needed in the PHY driver also
>> which I'll send as a separate series.
>>
>> Kishon Vijay Abraham I (2):
>>   pci: host: pci-dra7xx: use "num-lanes" property to find phy count
>>   pci: host: pci-dra7xx: Enable x2 mode support
> 
> Applied to pci/host-dra7xx for v4.4, thanks!
> 
> I adjusted the subject line capitalization & format to match the history.
> 
> Arnd, Rob, any comments on the DT updates or the "num-lanes" vs number of
> strings in "phy-names" changes?

I sent it as RFC since I didn't have the board to test 2 lane mode with
$patch. And just now I got a board to test x2 and I found a problem with
the 2nd patch.

.b1co_mode_sel_mask = GENMASK(2, 3), in the patch should be replaced with
.b1co_mode_sel_mask = GENMASK(3, 2).

I'll resend the patch fixing the above.
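
(For reference, GENMASK() takes the high bit first, so the swapped arguments
do not merely transpose the field -- with the kernel's current definition
they collapse to an empty mask:)

#include <linux/bitops.h>

/*
 * GENMASK(h, l) covers bits h..l inclusive, high bit first:
 *   GENMASK(3, 2) == 0x0c   -- bits 3 and 2, the intended field
 *   GENMASK(2, 3) == 0x00   -- swapped arguments yield no bits at all
 */
.b1co_mode_sel_mask = GENMASK(3, 2),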

Thanks
Kishon


Re: [PATCH v8 3/6] pci:host: Add Altera PCIe host controller driver

2015-10-09 Thread Bjorn Helgaas
On Thu, Oct 08, 2015 at 06:03:24PM +0800, Ley Foon Tan wrote:
> On Thu, Oct 8, 2015 at 5:47 PM, Russell King - ARM Linux
>  wrote:
> >
> > On Thu, Oct 08, 2015 at 05:43:11PM +0800, Ley Foon Tan wrote:
> > > +static int altera_pcie_cfg_write(struct pci_bus *bus, unsigned int devfn,
> > > +  int where, int size, u32 value)
> > > +{
> > > + struct altera_pcie *pcie = bus->sysdata;
> > > + u32 data32;
> > > + u32 shift = 8 * (where & 3);
> > > + int ret;
> > > +
> > > + if (!altera_pcie_valid_config(pcie, bus, PCI_SLOT(devfn)))
> > > + return PCIBIOS_DEVICE_NOT_FOUND;
> > > +
> > > + /* write partial */
> > > + if (size != sizeof(u32)) {
> > > + ret = tlp_cfg_dword_read(pcie, bus->number, devfn,
> > > +  where & ~DWORD_MASK, );
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + switch (size) {
> > > + case 1:
> > > + data32 = (data32 & ~(0xff << shift)) |
> > > + ((value & 0xff) << shift);
> > > + break;
> > > + case 2:
> > > + data32 = (data32 & ~(0x << shift)) |
> > > + ((value & 0x) << shift);
> > > + break;
> > > + default:
> > > + data32 = value;
> >
> > Can you generate proper 1, 2 and 4 byte configuration accesses?  That
> > is much preferred over the above read-modify-write, as there are
> > registers in PCI and PCIe that are read/write-1-to-clear.  The above
> > has the effect of inadvertently clearing those RW1C bits.
> No, the hardware can only access 4-byte aligned addresses.

This is non-spec compliant, and we really should have some way of
flagging that because it may break things in ways that would be very
difficult to debug, e.g., we can lose RW1C status bits when writing an
adjacent register, so they would just silently disappear.

I don't know if this should be a kernel taint, a simple warning in
dmesg, or what.  I guess the tainting mechanism is probably too
general-purpose for this, and add_taint() doesn't give any dmesg
indication.  We wouldn't see the taint unless the problem actually
caused an oops or panic.  In this case, I think I want a clue in dmesg
so we have a chance of seeing it even if there is no oops.  So
probably something like a dev_warn("non-compliant config accesses")
would work.

You really should double-check with the hardware guys, because it's
pretty obvious that the PCI spec requires 1- and 2-byte config
accesses to work correctly.  For example, if you read/modify/write to
update PCI_COMMAND, you will inadvertently clear the RW1C bits in
PCI_STATUS.
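
To spell out the failure mode (sketch only -- cfg_read32()/cfg_write32()
below are stand-ins for the driver's dword accessors, not real kernel
functions):

#include <linux/types.h>
#include <linux/pci_regs.h>

u32 cfg_read32(int where);		/* stand-in: 32-bit config read  */
void cfg_write32(int where, u32 val);	/* stand-in: 32-bit config write */

/* PCI_COMMAND (0x04) and PCI_STATUS (0x06) share one 32-bit dword. */
static void rmw_write_command(u16 new_cmd)
{
	u32 dword = cfg_read32(PCI_COMMAND);	/* low half COMMAND, high half STATUS */

	dword = (dword & ~0xffffU) | new_cmd;	/* splice in the new COMMAND value */
	cfg_write32(PCI_COMMAND, dword);	/*
						 * STATUS is written back as read: any
						 * RW1C bit that was set, for example
						 * PCI_STATUS_DETECTED_PARITY, is
						 * cleared by writing 1 to it.
						 */
}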

Bjorn


Re: [PATCH v3 2/7] MAINTAINERS: Add myself for the new VC4 (RPi GPU) graphics driver.

2015-10-09 Thread Eric Anholt
Emil Velikov  writes:

> Hi Eric,
>
> On 9 October 2015 at 22:27, Eric Anholt  wrote:
>> Signed-off-by: Eric Anholt 
>> ---
>>
>> v2: Mark it Supported, not Maintained.
>>
>>  MAINTAINERS | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 7ba7ab7..e331e46 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3653,6 +3653,12 @@ S:   Maintained
>>  F: drivers/gpu/drm/sti
>>  F: Documentation/devicetree/bindings/gpu/st,stih4xx.txt
>>
>> +DRM DRIVERS FOR VC4
>> +M: Eric Anholt 
>> +T: git git://github.com/anholt/linux
> You might want to add dri-devel as mailing list, at least initially.

I believe that's already implied by the general entry for dri-devel:

./scripts/get_maintainer.pl -f drivers/gpu/drm/vc4/vc4_drv.c
Eric Anholt  (supporter:DRM DRIVERS FOR VC4)
David Airlie  (maintainer:DRM DRIVERS)
dri-de...@lists.freedesktop.org (open list:DRM DRIVERS)




[PATCH] f2fs: set GFP_NOFS for grab_cache_page

2015-10-09 Thread Jaegeuk Kim
For normal inodes, their pages are allocated with __GFP_FS, which can cause
filesystem calls when reclaiming memory.
This can accordingly lead to a deadlock condition.

So, this patch addresses this problem by introducing
f2fs_grab_cache_page(.., bool for_write), which calls
grab_cache_page_write_begin() with AOP_FLAG_NOFS.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 16 +---
 fs/f2fs/dir.c  |  6 +++---
 fs/f2fs/f2fs.h | 12 ++--
 fs/f2fs/file.c |  6 +++---
 fs/f2fs/gc.c   |  6 +++---
 5 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index bc04e92..d4f1c74 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -275,7 +275,8 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
return f2fs_reserve_block(dn, index);
 }
 
-struct page *get_read_data_page(struct inode *inode, pgoff_t index, int rw)
+struct page *get_read_data_page(struct inode *inode, pgoff_t index,
+   int rw, bool for_write)
 {
struct address_space *mapping = inode->i_mapping;
struct dnode_of_data dn;
@@ -292,7 +293,7 @@ struct page *get_read_data_page(struct inode *inode, 
pgoff_t index, int rw)
if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
return read_mapping_page(mapping, index, NULL);
 
-   page = grab_cache_page(mapping, index);
+   page = f2fs_grab_cache_page(mapping, index, for_write);
if (!page)
return ERR_PTR(-ENOMEM);
 
@@ -352,7 +353,7 @@ struct page *find_data_page(struct inode *inode, pgoff_t 
index)
return page;
f2fs_put_page(page, 0);
 
-   page = get_read_data_page(inode, index, READ_SYNC);
+   page = get_read_data_page(inode, index, READ_SYNC, false);
if (IS_ERR(page))
return page;
 
@@ -372,12 +373,13 @@ struct page *find_data_page(struct inode *inode, pgoff_t 
index)
  * Because, the callers, functions in dir.c and GC, should be able to know
  * whether this page exists or not.
  */
-struct page *get_lock_data_page(struct inode *inode, pgoff_t index)
+struct page *get_lock_data_page(struct inode *inode, pgoff_t index,
+   bool for_write)
 {
struct address_space *mapping = inode->i_mapping;
struct page *page;
 repeat:
-   page = get_read_data_page(inode, index, READ_SYNC);
+   page = get_read_data_page(inode, index, READ_SYNC, for_write);
if (IS_ERR(page))
return page;
 
@@ -411,7 +413,7 @@ struct page *get_new_data_page(struct inode *inode,
struct dnode_of_data dn;
int err;
 repeat:
-   page = grab_cache_page(mapping, index);
+   page = f2fs_grab_cache_page(mapping, index, true);
if (!page) {
/*
 * before exiting, we should make sure ipage will be released
@@ -439,7 +441,7 @@ repeat:
} else {
f2fs_put_page(page, 1);
 
-   page = get_read_data_page(inode, index, READ_SYNC);
+   page = get_read_data_page(inode, index, READ_SYNC, true);
if (IS_ERR(page))
goto repeat;
 
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 6726c4a..7c1678b 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -258,7 +258,7 @@ struct f2fs_dir_entry *f2fs_parent_dir(struct inode *dir, 
struct page **p)
if (f2fs_has_inline_dentry(dir))
return f2fs_parent_inline_dir(dir, p);
 
-   page = get_lock_data_page(dir, 0);
+   page = get_lock_data_page(dir, 0, false);
if (IS_ERR(page))
return NULL;
 
@@ -740,7 +740,7 @@ bool f2fs_empty_dir(struct inode *dir)
return f2fs_empty_inline_dir(dir);
 
for (bidx = 0; bidx < nblock; bidx++) {
-   dentry_page = get_lock_data_page(dir, bidx);
+   dentry_page = get_lock_data_page(dir, bidx, false);
if (IS_ERR(dentry_page)) {
if (PTR_ERR(dentry_page) == -ENOENT)
continue;
@@ -854,7 +854,7 @@ static int f2fs_readdir(struct file *file, struct 
dir_context *ctx)
min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES));
 
for (; n < npages; n++) {
-   dentry_page = get_lock_data_page(inode, n);
+   dentry_page = get_lock_data_page(inode, n, false);
if (IS_ERR(dentry_page))
continue;
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6f2310c..6ba5a59 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1233,6 +1233,14 @@ static inline unsigned int valid_inode_count(struct 
f2fs_sb_info *sbi)
return sbi->total_valid_inode_count;
 }
 
+static inline struct page *f2fs_grab_cache_page(struct address_space *mapping,
+   pgoff_t index, bool for_write)
+{
+   if (!for_write)
+   return grab_cache_page(mapping, index);
+   return grab_cache_page_write_begin(mapping, index, AOP_FLAG_NOFS);
+}

[PATCH v9] gpio: Add GPIO support for the ACCES 104-IDIO-16

2015-10-09 Thread William Breathitt Gray
The ACCES 104-IDIO-16 family of PC/104 utility boards features 16
optically isolated inputs and 16 optically isolated FET solid state
outputs. This driver provides GPIO support for these 32 channels of
digital I/O. Change-of-State detection interrupts are not supported.

GPIO 0-15 correspond to digital outputs 0-15, while GPIO 16-31
correspond to digital inputs 0-15. The base port address for the device
may be set via the a_104_idio_16_base module parameter.

Signed-off-by: William Breathitt Gray 
---
Changes in v9:
  - Initialize GPIO device private data structure to 0 to prevent
garbage data pollution

 drivers/gpio/Kconfig|  10 ++
 drivers/gpio/Makefile   |   1 +
 drivers/gpio/gpio-104-idio-16.c | 219 
 3 files changed, 230 insertions(+)
 create mode 100644 drivers/gpio/gpio-104-idio-16.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 8949b3f..dc843e0 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -684,6 +684,16 @@ config GPIO_SX150X
 
 endmenu
 
+menu "ISA GPIO drivers"
+
+config GPIO_104_IDIO_16
+   tristate "ACCES 104-IDIO-16 GPIO support"
+   depends on X86
+   help
+ Enables GPIO support for the ACCES 104-IDIO-16 family.
+
+endmenu
+
 menu "MFD GPIO expanders"
 
 config GPIO_ADP5520
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index f79a7c4..6f2fea5 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_GPIO_ACPI)   += gpiolib-acpi.o
 # Device drivers. Generally keep list sorted alphabetically
 obj-$(CONFIG_GPIO_GENERIC) += gpio-generic.o
 
+obj-$(CONFIG_GPIO_104_IDIO_16) += gpio-104-idio-16.o
 obj-$(CONFIG_GPIO_74X164)  += gpio-74x164.o
 obj-$(CONFIG_GPIO_74XX_MMIO)   += gpio-74xx-mmio.o
 obj-$(CONFIG_GPIO_ADNP)+= gpio-adnp.o
diff --git a/drivers/gpio/gpio-104-idio-16.c b/drivers/gpio/gpio-104-idio-16.c
new file mode 100644
index 000..a85ae05
--- /dev/null
+++ b/drivers/gpio/gpio-104-idio-16.c
@@ -0,0 +1,219 @@
+/*
+ * GPIO driver for the ACCES 104-IDIO-16 family
+ * Copyright (C) 2015 William Breathitt Gray
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static unsigned a_104_idio_16_base;
+module_param(a_104_idio_16_base, uint, 0);
+MODULE_PARM_DESC(a_104_idio_16_base, "ACCES 104-IDIO-16 base address");
+
+/**
+ * struct a_104_idio_16_gpio - GPIO device private data structure
+ * @chip:  instance of the gpio_chip
+ * @lock:  synchronization lock to prevent gpio_set race conditions
+ * @base:  base port address of the GPIO device
+ * @extent:extent of port address region of the GPIO device
+ * @out_state: output bits state
+ */
+struct a_104_idio_16_gpio {
+   struct gpio_chip chip;
+   spinlock_t lock;
+   unsigned base;
+   unsigned extent;
+   unsigned out_state;
+};
+
+static int a_104_idio_16_gpio_get_direction(struct gpio_chip *chip,
+   unsigned offset)
+{
+   if (offset > 15)
+   return 1;
+
+   return 0;
+}
+
+static int a_104_idio_16_gpio_direction_input(struct gpio_chip *chip,
+   unsigned offset)
+{
+   return 0;
+}
+
+static int a_104_idio_16_gpio_direction_output(struct gpio_chip *chip,
+   unsigned offset, int value)
+{
+   chip->set(chip, offset, value);
+   return 0;
+}
+
+static struct a_104_idio_16_gpio *to_a104idio16gp(struct gpio_chip *gc)
+{
+   return container_of(gc, struct a_104_idio_16_gpio, chip);
+}
+
+static int a_104_idio_16_gpio_get(struct gpio_chip *chip, unsigned offset)
+{
+   struct a_104_idio_16_gpio *const a104idio16gp = to_a104idio16gp(chip);
+   const unsigned BIT_MASK = 1U << (offset-16);
+
+   if (offset < 16)
+   return -EINVAL;
+
+   if (offset < 24)
+   return !!(inb(a104idio16gp->base + 1) & BIT_MASK);
+
+   return !!(inb(a104idio16gp->base + 5) & (BIT_MASK>>8));
+}
+
+static void a_104_idio_16_gpio_set(struct gpio_chip *chip, unsigned offset,
+   int value)
+{
+   struct a_104_idio_16_gpio *const a104idio16gp = to_a104idio16gp(chip);
+   const unsigned BIT_MASK = 1U << offset;
+   unsigned long flags;
+
+   if (offset > 15)
+   return;
+
+   spin_lock_irqsave(>lock, flags);
+
+   if (value)
+   a104idio16gp->out_state |= BIT_MASK;
+   else
+   a104idio16gp->out_state &= ~BIT_MASK;
+
+   if (offset > 7)
+   

[PULL v2] GIC changes for Linux 4.4

2015-10-09 Thread Marc Zyngier
Hi Thomas, Jason,

Here's the pull request for the GIC updates I stashed over the past
few weeks. The only real new feature is the 32bit support from
Jean-Philippe, the rest is all about dealing with errata and firmware.

Please pull!

Thanks,

M.

The following changes since commit 0e841b04c829f59a5d5745f98d2857f48882efe9:

  irqchip/sunxi-nmi: Switch to of_io_request_and_map() from of_iomap() 
(2015-10-09 22:47:28 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
tags/gic-4.4

for you to fetch changes up to 4f64cb65bf76fbd89c62d8e69c7bf75091950739:

  arm/arm64: KVM: Only allow 64bit hosts to build VGICv3 (2015-10-09 23:11:57 
+0100)


GIC updates for Linux 4.4:

- Enable basic GICv3 support on 32bit ARM (mostly for running VMs with
  more than 8 virtual CPUs)
- arm64 changes to deal with firmware limitations that forces a GICv3
  to be used as a GICv2
- A GICv2m erratum workaround on Applied Micro X-Gene2


Duc Dang (1):
  irqchip/gic-v2m: Add workaround for APM X-Gene GICv2m erratum

Jean-Philippe Brucker (6):
  irqchip/gic-v3: Refactor the arm64 specific parts
  irqchip/gic-v3: Change unsigned types for AArch32 compatibility
  irqchip/gic-v3: Specialize readq and writeq accesses
  ARM: add 32bit support to GICv3
  ARM: virt: select ARM_GIC_V3
  arm/arm64: KVM: Only allow 64bit hosts to build VGICv3

Marc Zyngier (5):
  arm64: el2_setup: Make sure ICC_SRE_EL2.SRE sticks before using GICv3 
sysregs
  irqchip/gic-v3: Make gic_enable_sre an inline function
  arm64: cpufeatures: Check ICC_EL1_SRE.SRE before enabling 
ARM64_HAS_SYSREG_GIC_CPUIF
  irqchip/gic: Warn if GICv3 system registers are enabled
  arm64: Update booting requirements for GICv3 in GICv2 mode

 Documentation/arm64/booting.txt |  11 ++-
 arch/arm/Kconfig|   1 +
 arch/arm/include/asm/arch_gicv3.h   | 188 
 arch/arm64/include/asm/arch_gicv3.h | 170 
 arch/arm64/kernel/cpufeature.c  |  19 +++-
 arch/arm64/kernel/head.S|   2 +
 arch/arm64/kvm/Kconfig  |   4 +
 drivers/irqchip/irq-gic-v2m.c   |  22 +
 drivers/irqchip/irq-gic-v3.c| 120 ++-
 drivers/irqchip/irq-gic.c   |  15 +++
 include/kvm/arm_vgic.h  |   4 +-
 include/linux/irqchip/arm-gic-v3.h  | 104 +---
 virt/kvm/arm/vgic.c |   4 +-
 13 files changed, 487 insertions(+), 177 deletions(-)
 create mode 100644 arch/arm/include/asm/arch_gicv3.h
 create mode 100644 arch/arm64/include/asm/arch_gicv3.h


Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

2015-10-09 Thread Al Stone
On 10/09/2015 03:02 PM, Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 05:05:00 PM Al Stone wrote:
>> On 10/08/2015 04:50 PM, Rafael J. Wysocki wrote:
>>> On Thursday, October 08, 2015 02:32:15 PM Al Stone wrote:
 On 10/08/2015 02:41 PM, Rafael J. Wysocki wrote:
> On Thursday, October 08, 2015 10:37:55 PM Rafael J. Wysocki wrote:
>> On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
>>> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
 On 10/08/2015 11:21 AM, kernel test robot wrote:
> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> master
> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> bad_madt_entry() function to eventually replace the macro")
>
> [0.00] ACPI: undefined MADT subtable type for FADT 4.0: 127 
> (length 12)
[snip]

>> In the meantime, I'll poke the spec folks on the use of reserved subtable IDs
>> in the MADT and see what the consensus is there.  It may just be a matter of
>> clarifying the language in the spec.
> 
> One additional question to ask is what checks have been present in the OSes
> and what they do if they see a reserved MADT subtable ID.  If they haven't 
> been
> doing anything so far, I'm afraid this particular train may be gone already.

It may be gone.  The silence so far is deafening :).

>> It's also on my plate to really dig into an ACPI test suite and see about
>> building something really robust for that -- this can be added as an example.
>> I'll see if I have time to send in a patch for FWTS, too, which is pretty
>> good about capturing such things.
> 
> Sounds good!
> 
> Thanks,
> Rafael
> 

Let me know if I need to send the patch to fix the regression elsewhere; it
dawned on me long after I sent it that this may not be the right place for it
to go...

-- 
ciao,
al
---
Al Stone
Software Engineer
Linaro Enterprise Group
al.st...@linaro.org
---


  1   2   3   4   5   6   7   8   9   10   >