Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations

2015-06-17 Thread Vlastimil Babka
On 18.6.2015 3:23, Xishi Qiu wrote:
> On 2015/6/16 17:46, Vlastimil Babka wrote:
> 
>> On 06/16/2015 10:17 AM, Xishi Qiu wrote:
>>> On 2015/6/16 15:53, Vlastimil Babka wrote:
>>>
>>>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:
>>>>>
>>>>> I think adding a new migratetype is better and easier than a new zone,
>>>>> so I use
>>>>
>>>> If the mirrored memory is in a single reasonably compact (no large holes)
>>>> range (per NUMA node) and won't dynamically change its size, then zone
>>>> might be a better option. For one thing, it will still allow
>>>> distinguishing movable and unmovable allocations within the mirrored
>>>> memory.
>>>>
>>>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to
>>>> allocator hot paths, and even CMA is now considering moving to a separate
>>>> zone.
>>>>
>>>
>>> Hi, how about the problem of this case:
>>> e.g. node 0: 0-4G(dma and dma32)
>>>  node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
>>> so more than one normal zone in a node? or normal zone just span the mirror zone?
>>
>> Normal zone can span the mirror zone just fine. However, it will result in
>> zone scanners such as compaction to skip over the mirror zone inefficiently.
>> Hmm...

On the other hand, it would skip just as inefficiently over MIGRATE_MIRROR
pageblocks within a Normal zone, since migrating pages between MIGRATE_MIRROR
pageblocks and pageblocks of other types would violate what the allocations
requested.

Having a separate zone instead would allow compaction to run specifically on
that zone and defragment movable allocations there (i.e. userspace pages if/when
userspace requesting mirrored memory is supported).

>>
> 
> Hi Vlastimil,
> 
> If there are many mirror regions in one node, then there will be many holes in
> the normal zone, is this fine?

Yeah, it doesn't matter how many holes there are.

> Thanks,
> Xishi Qiu
> 
>>
>> .
>>
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/1] perf tools: Check access permission when reading /proc/kcore file.

2015-06-17 Thread Sukadev Bhattiprolu
Li Zhang [zhlci...@linux.vnet.ibm.com] wrote:

| >For consistency with rest of the file, use pr_warning() or pr_err().
| 
| ui_warning can report the message to users directly when this
| program is running.
| But if we considered the consistency, pr_warning or pr_err should be better.
| And users can get this message by trying another time.

That seems to be the way perf currently operates - silent by default for
non-fatal errors. -v or -vvv increases verbosity and reports non-fatal
warnings/errors also.

| 
| >
| >Also, we could drop the access() call and report the error when open()
| >fails below?
| 
| I think we can drop this access. But /proc/kcore also require the
| process with CAP_SYS_RAWIO
| capability. Even if chown this file, access report right result, but
| open still fails.

Maybe the error message could hint that CAP_SYS_RAWIO would be needed.
| 
| >
| >|fd = open(kcore_filename, O_RDONLY);
| >|if (fd < 0)
| >|return -EINVAL;
| >
| >Further, if user specifies the file with --kallsyms and we are not
| >able to read it, we should treat it as a fatal error and exit - this
| >would be easier when parsing command line args.
| I have another patch which checks this files. I will merge it to this patch.
| 
| >
| >If user did not specify the option and we are proactively trying to
| >use /proc/kcore, we should not treat errors as fatal? i.e report
| >a warning message and continue without symbols?
| 
| In the current program, even if open fails, the program still
| continue to run.
| Is it helpful for users to get the address without symbols?

Well, if profiling applications, user may not care about kernel symbols,
so being unable to open /proc/kcore would be ok? If OTOH, user specifies
--kallsyms, then they care about the kernel symbols so we should treat
the open() error as fatal.
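
The policy discussed above (warn and continue when /proc/kcore was picked
proactively, fail when the user named the file, and hint at CAP_SYS_RAWIO)
could be sketched roughly as follows. This is a user-space illustration only,
not perf's actual code: open_kcore_sketch() and kcore_open_is_fatal() are
hypothetical names, and printing to stderr stands in for pr_warning()/pr_err().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's actual code: open a kcore-style file
 * and report a failure according to whether the user asked for the file.
 * Returns a file descriptor on success, -errno on failure.
 */
static int open_kcore_sketch(const char *path, bool user_specified)
{
	int fd = open(path, O_RDONLY);
	int err;

	if (fd >= 0)
		return fd;

	err = errno;
	fprintf(stderr, "%s: cannot open %s: %s%s\n",
		user_specified ? "error" : "warning", path, strerror(err),
		(err == EPERM || err == EACCES) ?
			" (reading /proc/kcore may need CAP_SYS_RAWIO)" : "");
	return -err;
}

/* True when the caller should give up instead of continuing without symbols. */
static bool kcore_open_is_fatal(int ret, bool user_specified)
{
	return ret < 0 && user_specified;
}
```

A caller would pass user_specified = true only when the path came from the
command line, and would keep running without kernel symbols otherwise.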



Re: [PATCH 1/1] scsi: use kzalloc for allocating one thing

2015-06-17 Thread Hannes Reinecke
On 06/18/2015 06:36 AM, Maninder Singh wrote:
> Use kzalloc rather than kcalloc(1,...) for allocating one thing
> 
> Signed-off-by: Maninder Singh 
> Reviewed-by: Vaneet Narang 
> ---
>  drivers/scsi/mvsas/mv_init.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> index d40d734..65e47eb 100644
> --- a/drivers/scsi/mvsas/mv_init.c
> +++ b/drivers/scsi/mvsas/mv_init.c
> @@ -558,7 +558,7 @@ static int mvs_pci_init(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>   chip = &mvs_chips[ent->driver_data];
>   SHOST_TO_SAS_HA(shost) =
> - kcalloc(1, sizeof(struct sas_ha_struct), GFP_KERNEL);
> + kzalloc(sizeof(struct sas_ha_struct), GFP_KERNEL);
>   if (!SHOST_TO_SAS_HA(shost)) {
>   kfree(shost);
>   rc = -ENOMEM;
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                zSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
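
Why kcalloc(1, ...) and kzalloc(...) are interchangeable for a single object
can be illustrated with user-space stand-ins. The sketch below assumes the
usual relationship (an array allocator is a zeroing allocator plus an overflow
check on n * size); zalloc_sketch() and calloc_sketch() are hypothetical names,
not kernel functions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for kzalloc(): one zeroed allocation of the given size. */
static void *zalloc_sketch(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* Stand-in for kcalloc(): overflow check, then the same zeroed allocation. */
static void *calloc_sketch(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return zalloc_sketch(n * size);
}
```

With n == 1 the overflow check can never fire, so the array form only adds
noise, which is the motivation given in the patch description.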


[PATCH] pinctrl: simplify of_pinctrl_get()

2015-06-17 Thread Masahiro Yamada
This commit does not change the logic at all.

Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/devicetree.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 0bbf7d7..fe04e74 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -97,13 +97,7 @@ static int dt_remember_or_free_map(struct pinctrl *p, const char *statename,
 
 struct pinctrl_dev *of_pinctrl_get(struct device_node *np)
 {
-   struct pinctrl_dev *pctldev;
-
-   pctldev = get_pinctrl_dev_from_of_node(np);
-   if (!pctldev)
-   return NULL;
-
-   return pctldev;
+   return get_pinctrl_dev_from_of_node(np);
 }
 
 static int dt_to_map_one_config(struct pinctrl *p, const char *statename,
-- 
1.9.1



Re: Panic when cpu hot-remove

2015-06-17 Thread Jiang Liu
On 2015/6/17 22:36, Alex Williamson wrote:
> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
>>> Hi maintainer,
>>>
>>> We found a problem where a panic happens when a CPU is hot-removed. We also
>>> traced the problem using the call trace information.
>>> An endless loop occurs because head never becomes equal to tail
>>> in the function qi_check_fault().
>>> The relevant code is as follows:
>>>
>>>
>>> do {
>>>         if (qi->desc_status[head] == QI_IN_USE)
>>>                 qi->desc_status[head] = QI_ABORT;
>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>> } while (head != tail);
>>
>> Hmm, this code iterates only over every second QI descriptor, and tail
>> probably points to a descriptor that is not iterated over.
>>
>> Jiang, can you please have a look?
> 
> I think that part is normal, the way we use the queue is to always
> submit a work operation followed by a wait operation so that we can
> determine the work operation is complete.  That's done via
> qi_submit_sync().  We have had spurious reports of the queue getting
> impossibly out of sync though.  I saw one that was somehow linked to the
> I/O AT DMA engine.  Roland Dreier saw something similar[1].  I'm not
> sure if they're related to this, but maybe worth comparing.  Thanks,
Thanks, Alex and Joerg!

Hi Dongdong,
Could you please help to give some instructions about how to
reproduce this issue? I will try to reproduce it if possible.
Thanks!
Gerry

> 
> Alex
> 
> [1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html
> 
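
The failure mode discussed for the loop quoted above can be reproduced in
isolation: stepping by two around a ring of even length preserves the parity
of head, so a tail of the other parity is never met. This is a stand-alone
sketch, not the driver code; the ring size of 256 and the iteration cap are
assumptions made only so that the demonstration terminates.

```c
#include <stdbool.h>

#define QI_LENGTH_SKETCH 256	/* assumed ring size for this sketch */

/*
 * Walk backwards from head in steps of two, as in the quoted loop,
 * and report whether tail is ever reached.
 */
static bool walk_reaches_tail(int head, int tail)
{
	int steps = 0;

	do {
		head = (head - 2 + QI_LENGTH_SKETCH) % QI_LENGTH_SKETCH;
		if (++steps > QI_LENGTH_SKETCH)
			return false;	/* went around the ring without meeting tail */
	} while (head != tail);

	return true;
}
```

With head = 10, a tail of 4 is reached after three steps, while a tail of 5
is never reached; without the cap that second case would spin forever, which
matches the reported hang if the queue indices really did get out of sync.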


[PATCH] arm64: dts: mt8173: add clock_null

2015-06-17 Thread Eddie Huang
Add clk_null, which represents clocks that cannot or need not be
controlled by software.
Many clocks have their parent set to clk_null.

Signed-off-by: James Liao 
Signed-off-by: Eddie Huang 
---
Based on 4.1-rc1

Change-Id: I4db9b40d07e28f54f7bae9b676316cbd6a962124
---
 arch/arm64/boot/dts/mediatek/mt8173.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 924fdb6..4798f44 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -81,6 +81,12 @@
cpu_on = <0x84000003>;
};
 
+   clk_null: clk_null {
+   compatible = "fixed-clock";
+   clock-frequency = <0>;
+   #clock-cells = <0>;
+   };
+
uart_clk: dummy26m {
compatible = "fixed-clock";
clock-frequency = <26000000>;
-- 
1.8.1.1.dirty



Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-06-17 Thread Ming Lin
On Wed, Jun 10, 2015 at 2:46 PM, Mike Snitzer  wrote:
> On Wed, Jun 10 2015 at  5:20pm -0400,
> Ming Lin  wrote:
>
>> On Mon, Jun 8, 2015 at 11:09 PM, Ming Lin  wrote:
>> > On Thu, 2015-06-04 at 17:06 -0400, Mike Snitzer wrote:
>> >> We need to test on large HW raid setups like a Netapp filer (or even
>> >> local SAS drives connected via some SAS controller).  Like a 8+2 drive
>> >> RAID6 or 8+1 RAID5 setup.  Testing with MD raid on JBOD setups with 8
>> >> devices is also useful.  It is larger RAID setups that will be more
>> >> sensitive to IO sizes being properly aligned on RAID stripe and/or chunk
>> >> size boundaries.
>> >
>> > Here are test results of xfs/ext4/btrfs read/write on HW RAID6/MD RAID6/DM stripe target.
>> > Each case ran for 0.5 hour, so it took 36 hours to finish all the tests on 4.1-rc4 and 4.1-rc4-patched kernels.
>> >
>> > No performance regressions were introduced.
>> >
>> > Test server: Dell R730xd(2 sockets/48 logical cpus/264G memory)
>> > HW RAID6/MD RAID6/DM stripe target were configured with 10 HDDs, each 280G
>> > Stripe size 64k and 128k were tested.
>> >
>> > devs="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk"
>> > spare_devs="/dev/sdl /dev/sdm"
>> > stripe_size=64 (or 128)
>> >
>> > MD RAID6 was created by:
>> > mdadm --create --verbose /dev/md0 --level=6 --raid-devices=10 $devs --spare-devices=2 $spare_devs -c $stripe_size
>> >
>> > DM stripe target was created by:
>> > pvcreate $devs
>> > vgcreate striped_vol_group $devs
>> > lvcreate -i10 -I${stripe_size} -L2T -nstriped_logical_volume striped_vol_group
>
> DM had a regression relative to merge_bvec that wasn't fixed until
> recently (it wasn't in 4.1-rc4), see commit 1c220c69ce0 ("dm: fix
> casting bug in dm_merge_bvec()").  It was introduced in 4.1.
>
> So your 4.1-rc4 DM stripe testing may have effectively been with
> merge_bvec disabled.
>
>> > Here is an example of fio script for stripe size 128k:
>> > [global]
>> > ioengine=libaio
>> > iodepth=64
>> > direct=1
>> > runtime=1800
>> > time_based
>> > group_reporting
>> > numjobs=48
>> > gtod_reduce=0
>> > norandommap
>> > write_iops_log=fs
>> >
>> > [job1]
>> > bs=1280K
>> > directory=/mnt
>> > size=5G
>> > rw=read
>> >
>> > All results here: http://minggr.net/pub/20150608/fio_results/
>> >
>> > Results summary:
>> >
>> > 1. HW RAID6: stripe size 64k
>> >               4.1-rc4   4.1-rc4-patched
>> >               -------   ---------------
>> >               (MB/s)    (MB/s)
>> > xfs read:     821.23    812.20            -1.09%
>> > xfs write:    753.16    754.42            +0.16%
>> > ext4 read:    827.80    834.82            +0.84%
>> > ext4 write:   783.08    777.58            -0.70%
>> > btrfs read:   859.26    871.68            +1.44%
>> > btrfs write:  815.63    844.40            +3.52%
>> >
>> > 2. HW RAID6: stripe size 128k
>> >               4.1-rc4   4.1-rc4-patched
>> >               -------   ---------------
>> >               (MB/s)    (MB/s)
>> > xfs read:     948.27    979.11            +3.25%
>> > xfs write:    820.78    819.94            -0.10%
>> > ext4 read:    978.35    997.92            +2.00%
>> > ext4 write:   853.51    847.97            -0.64%
>> > btrfs read:   1013.1    1015.6            +0.24%
>> > btrfs write:  854.43    850.42            -0.46%
>> >
>> > 3. MD RAID6: stripe size 64k
>> >               4.1-rc4   4.1-rc4-patched
>> >               -------   ---------------
>> >               (MB/s)    (MB/s)
>> > xfs read:     847.34    869.43            +2.60%
>> > xfs write:    198.67    199.03            +0.18%
>> > ext4 read:    763.89    767.79            +0.51%
>> > ext4 write:   281.44    282.83            +0.49%
>> > btrfs read:   756.02    743.69            -1.63%
>> > btrfs write:  268.37    265.93            -0.90%
>> >
>> > 4. MD RAID6: stripe size 128k
>> >               4.1-rc4   4.1-rc4-patched
>> >               -------   ---------------
>> >               (MB/s)    (MB/s)
>> > xfs read:     993.04    1014.1            +2.12%
>> > xfs write:    293.06    298.95            +2.00%
>> > ext4 read:    1019.6    1020.9            +0.12%
>> > ext4 write:   371.51    371.47            -0.01%
>> > btrfs read:   1000.4    1020.8            +2.03%
>> > btrfs write:  241.08    246.77            +2.36%
>> >
>> > 5. DM: stripe size 64k
>> >               4.1-rc4   4.1-rc4-patched
>> >               -------   ---------------
>> >               (MB/s)    (MB/s)
>> > xfs read:     1084.4    1080.1            -0.39%
>> > xfs write:    1071.1    1063.4            -0.71%
>> > ext4 read:    991.54    1003.7            +1.22%
>> > ext4 write:   1069.7    1052.2            -1.63%
>> > btrfs read:   1076.1    1082.1            +0.55%
>> > btrfs write:  968.98    965.07            -0.40%
>> >
>> > 6. DM: stripe size 128k
>> > 4.1-rc4 4.1-rc4-patched
>> > --- 

[RFC][PATCH] arm64:Modify the dump mem for 64 bit addresses

2015-06-17 Thread Maninder Singh
From: Rohit Thapliyal 

On a 64-bit kernel, dump_mem prints 32-bit values in the stack dump.
This gives disorganized information about the 64-bit values on the
stack. Hence, modify it to produce a complete 64-bit memory dump.

With patch:
Process insmod (pid: 1587, stack limit = 0xffc976be4058)
Stack: (0xffc976be7cf0 to 0xffc976be8000)
7ce0:   ffc976be7d00 ffc8163c
7d00: ffc976be7d40 ffcf8a44 ffc00098ef38 ffbffc88
7d20: ffc00098ef50 ffbffcc0 0001 ffbffc70
7d40: ffc976be7e40 ffcf935c  2b424090
7d60: 2b424010 007facc555f4 8000 0015
7d80: 0116 0069 ffc00097b000 ffc976be4000
7da0: 0064 0072 006e 003f
7dc0: feff fff1 ffbffc002028 0124
7de0: ffc976be7e10 0001 ff80 ffbb
7e00: ffc976be7e60   
7e20:    
7e40: 007fcc474550 ffc841ec 2b424010 007facda0710
7e60:  ffcbe6dc ff80007d2000 0001c010
7e80: ff80007e0ae0 ff80007e09d0 ff80007edf70 0288
7ea0: 02e8   001c001b
7ec0: 0009 0007 2b424090 0001c010
7ee0: 2b424010 007faccd3a48  
7f00: 007fcc4743f8 007fcc4743f8 0069 0003
7f20: 0101010101010101 0004 0020 03f3
7f40: 007facb95664 007facda7030 007facc555d0 00498378
7f60:  2b424010 007facda0710 2b424090
7f80: 007fcc474698 00498000 007fcc474ebb 00474f58
7fa0: 00498000   007fcc474550
7fc0: 004104bc 007fcc474430 007facc555f4 8000
7fe0: 2b424090 0069 0950020128000244 41040804
Call trace:

The above output makes a debugger's life a lot easier.
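
The row layout described above (a 16-bit row offset followed by up to four
64-bit words of 16 hex digits each) can be sketched in user space as below.
This is only an illustration of the formatting, under the assumption of four
words per 32-byte row; format_row64() is a hypothetical helper and does not
read memory the way the kernel function does.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format up to four 64-bit words as one dump row: the low 16 bits of
 * the row address, then each word as 16 hex digits.
 * buf should hold at least 5 + 4 * 17 + 1 = 74 bytes.
 * Returns the number of characters written (excluding the final NUL).
 */
static int format_row64(char *buf, size_t len, uint64_t addr,
			const uint64_t *words, int nwords)
{
	int off = snprintf(buf, len, "%04" PRIx64 ":", addr & 0xffff);
	int i;

	for (i = 0; i < nwords && i < 4 && off > 0 && (size_t)off < len; i++)
		off += snprintf(buf + off, len - off, " %016" PRIx64, words[i]);
	return off;
}
```

Printing each stack slot as a single 16-digit value keeps pointers intact on
one column instead of splitting them across two 32-bit fields.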

Signed-off-by: Rohit Thapliyal 
Signed-off-by: Maninder Singh 
Reviewed-by: Akhilesh Kumar  
---
 arch/arm64/kernel/traps.c |   62 +++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 1ef2940..6e9f19b 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -51,6 +51,48 @@ int show_unhandled_signals = 1;
 /*
  * Dump out the contents of some memory nicely...
  */
+
+static void dump_mem64(const char *lvl, const char *str, unsigned long bottom,
+   unsigned long top)
+{
+   unsigned long first;
+   mm_segment_t fs;
+   int i;
+
+   /*
+* We need to switch to kernel mode so that we can use __get_user
+* to safely read from kernel space.  Note that we now dump the
+* code first, just in case the backtrace kills us.
+*/
+   fs = get_fs();
+   set_fs(KERNEL_DS);
+
+   pr_alert("%s%s(0x%016lx to 0x%016lx)\n", lvl, str, bottom, top);
+
+   for (first = bottom & ~31; first < top; first += 32) {
+   unsigned long p;
+   char str[sizeof(" 1234567812345678") * 8 + 1];
+
+   memset(str, ' ', sizeof(str));
+   str[sizeof(str) - 1] = '\0';
+
+   for (p = first, i = 0; i < 4 && p < top; i++, p += 8) {
+   if (p >= bottom && p < top) {
+   unsigned long val;
+
+   if (__get_user(val, (unsigned long *)p) == 0)
+   sprintf(str + i * 17, " %016lx", val);
+   else
+   sprintf(str + i * 17,
+   " ????????????????");
+   }
+   }
+   pr_alert("%s%04lx:%s\n", lvl, first & 0xffff, str);
+   }
+
+   set_fs(fs);
+}
+
 static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
 unsigned long top)
 {
@@ -206,8 +248,24 @@ static int __die(const char *str, int err, struct thread_info *thread,
 TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1);
 
if (!user_mode(regs) || in_interrupt()) {
-   dump_mem(KERN_EMERG, "Stack: ", regs->sp,
-THREAD_SIZE + (unsigned long)task_stack_page(tsk));
+
+   if (regs->sp > (unsigned long)task_stack_page(tsk)) {
+   dump_mem64(KERN_EMERG, "Stack: ", regs->sp,
+   THREAD_SIZE +
+   (unsigned long)task_stack_page(tsk));
+   } else {
+   

Re: [PATCH] Fixes: 9697dffb098d ("drm: Turn off Legacy Context Functions")

2015-06-17 Thread Daniel Vetter
On Wed, Jun 17, 2015 at 02:03:47PM -0600, Eddie Kovsky wrote:
> Commit 9697dffb098d ("drm: Turn off Legacy Context Functions")
> added checks for legacy features to several functions in the 
> drm driver. It is now possible for the void functions changed by 
> this commit to return an int error code. This patch updates
> the function definitions to return int. This fixes the build warnings:
> 
> warning: ‘return’ with a value, in function returning void
>return -EINVAL
> 
> Signed-off-by: Eddie Kovsky 

Oops sorry, I spotted this while applying but somehow screwed up and still
pushed out the patch. Dropped it now for real.

Thanks, Daniel

> ---
>  drivers/gpu/drm/drm_context.c | 12 +---
>  drivers/gpu/drm/drm_legacy.h  |  6 +++---
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_context.c b/drivers/gpu/drm/drm_context.c
> index f3ea657f6574..3c2f4a76f934 100644
> --- a/drivers/gpu/drm/drm_context.c
> +++ b/drivers/gpu/drm/drm_context.c
> @@ -51,7 +51,7 @@ struct drm_ctx_list {
>   * in drm_device::ctx_idr, while holding the drm_device::struct_mutex
>   * lock.
>   */
> -void drm_legacy_ctxbitmap_free(struct drm_device * dev, int ctx_handle)
> +int drm_legacy_ctxbitmap_free(struct drm_device * dev, int ctx_handle)
>  {
>   if (!drm_core_check_feature(dev, DRIVER_KMS_LEGACY_CONTEXT) &&
>   drm_core_check_feature(dev, DRIVER_MODESET))
> @@ -60,6 +60,8 @@ void drm_legacy_ctxbitmap_free(struct drm_device * dev, int ctx_handle)
>   mutex_lock(&dev->struct_mutex);
>   idr_remove(&dev->ctx_idr, ctx_handle);
>   mutex_unlock(&dev->struct_mutex);
> +
> + return 0;
>  }
>  
>  /**
> @@ -107,7 +109,7 @@ int drm_legacy_ctxbitmap_init(struct drm_device * dev)
>   * Free all idr members using drm_ctx_sarea_free helper function
>   * while holding the drm_device::struct_mutex lock.
>   */
> -void drm_legacy_ctxbitmap_cleanup(struct drm_device * dev)
> +int drm_legacy_ctxbitmap_cleanup(struct drm_device * dev)
>  {
>   if (!drm_core_check_feature(dev, DRIVER_KMS_LEGACY_CONTEXT) &&
>   drm_core_check_feature(dev, DRIVER_MODESET))
> @@ -116,6 +118,8 @@ void drm_legacy_ctxbitmap_cleanup(struct drm_device * dev)
>   mutex_lock(&dev->struct_mutex);
>   idr_destroy(&dev->ctx_idr);
>   mutex_unlock(&dev->struct_mutex);
> +
> + return 0;
>  }
>  
>  /**
> @@ -127,7 +131,7 @@ void drm_legacy_ctxbitmap_cleanup(struct drm_device * dev)
>   * @file. Note that after this call returns, new contexts might be added if
>   * the file is still alive.
>   */
> -void drm_legacy_ctxbitmap_flush(struct drm_device *dev, struct drm_file *file)
> +int drm_legacy_ctxbitmap_flush(struct drm_device *dev, struct drm_file *file)
>  {
>   struct drm_ctx_list *pos, *tmp;
>  
> @@ -150,6 +154,8 @@ void drm_legacy_ctxbitmap_flush(struct drm_device *dev, struct drm_file *file)
>   }
>  
>   mutex_unlock(&dev->ctxlist_mutex);
> +
> + return 0;
>  }
>  
>  /*@}*/
> diff --git a/drivers/gpu/drm/drm_legacy.h b/drivers/gpu/drm/drm_legacy.h
> index c1dc61473db5..26c16220e475 100644
> --- a/drivers/gpu/drm/drm_legacy.h
> +++ b/drivers/gpu/drm/drm_legacy.h
> @@ -43,9 +43,9 @@ struct drm_file;
>  #define DRM_RESERVED_CONTEXTS1
>  
>  int drm_legacy_ctxbitmap_init(struct drm_device *dev);
> -void drm_legacy_ctxbitmap_cleanup(struct drm_device *dev);
> -void drm_legacy_ctxbitmap_free(struct drm_device *dev, int ctx_handle);
> -void drm_legacy_ctxbitmap_flush(struct drm_device *dev, struct drm_file *file);
> +int drm_legacy_ctxbitmap_cleanup(struct drm_device *dev);
> +int drm_legacy_ctxbitmap_free(struct drm_device *dev, int ctx_handle);
> +int drm_legacy_ctxbitmap_flush(struct drm_device *dev, struct drm_file *file);
>  
>  int drm_legacy_resctx(struct drm_device *d, void *v, struct drm_file *f);
>  int drm_legacy_addctx(struct drm_device *d, void *v, struct drm_file *f);
> -- 
> 2.4.3
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] NVMe: Fixed race between nvme_thread & probe path.

2015-06-17 Thread Parav Pandit
The nvme_thread kernel thread and the driver load process can execute
in parallel on different CPUs. This leads to a race condition whenever
the stores in nvme_alloc_queue() are executed out of order, which can
expose an incorrect value to nvme_thread.
A memory barrier in nvme_alloc_queue() ensures that the order is
maintained, and a data dependency read barrier in the reader thread
ensures that the CPU cache is synced.

Signed-off-by: Parav Pandit 
---
 drivers/block/nvme-core.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 5961ed7..90fb0ce 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1403,8 +1403,10 @@ static struct nvme_queue *nvme_alloc_queue(struct nvme_dev *dev, int qid,
nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride];
nvmeq->q_depth = depth;
nvmeq->qid = qid;
-   dev->queue_count++;
dev->queues[qid] = nvmeq;
+   /* update queues first before updating queue_count */
+   smp_wmb();
+   dev->queue_count++;
 
return nvmeq;
 
@@ -2073,7 +2075,13 @@ static int nvme_kthread(void *data)
continue;
}
for (i = 0; i < dev->queue_count; i++) {
-   struct nvme_queue *nvmeq = dev->queues[i];
+   struct nvme_queue *nvmeq;
+
+   /* make sure to read queue_count before
+* traversing queues.
+*/
+   smp_read_barrier_depends();
+   nvmeq = dev->queues[i];
if (!nvmeq)
continue;
spin_lock_irq(&nvmeq->q_lock);
-- 
1.7.1
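
The ordering this patch is after (fill the queue slot first, then make it
visible through the count) can be sketched in portable user-space C. Note that
this is a swapped-in technique: it uses C11 release/acquire atomics rather than
the smp_wmb()/smp_read_barrier_depends() pair from the patch, and it assumes a
single writer. All names ending in _sketch are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_QUEUES_SKETCH 8

struct queue_sketch { int qid; };

struct dev_sketch {
	struct queue_sketch *queues[MAX_QUEUES_SKETCH];
	atomic_int queue_count;
};

/* Writer: store the slot first, then publish it by bumping the count. */
static int publish_queue(struct dev_sketch *dev, struct queue_sketch *q)
{
	int n = atomic_load_explicit(&dev->queue_count, memory_order_relaxed);

	if (n >= MAX_QUEUES_SKETCH)
		return -1;
	dev->queues[n] = q;
	/* Release: the slot store above cannot be reordered after this. */
	atomic_store_explicit(&dev->queue_count, n + 1, memory_order_release);
	return 0;
}

/* Reader: the acquire load of the count orders the later slot reads. */
static int count_live_queues(struct dev_sketch *dev)
{
	int n = atomic_load_explicit(&dev->queue_count, memory_order_acquire);
	int i, live = 0;

	for (i = 0; i < n; i++)
		if (dev->queues[i])
			live++;
	return live;
}
```

A reader that observes a count of n through the acquire load is guaranteed to
see the n slot stores that preceded the matching release store.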



[PATCH] cgroup: add documentation for the PIDs controller

2015-06-17 Thread Aleksa Sarai
The attached patch adds documentation concerning the PIDs controller.
This should be applied alongside the rest of this patchset[1] (I'm not
entirely sure what the kernel policy is for new patches that should be
appended to an existing patchset).

[1]: https://lkml.org/lkml/2015/6/9/320

8<---

Add documentation derived from kernel/cgroup_pids.c to the relevant
Documentation/ directory, along with a few examples of how to use the
PIDs controller as well as an explanation of its peculiarities.

Signed-off-by: Aleksa Sarai 
---
 Documentation/cgroups/00-INDEX |  2 +
 Documentation/cgroups/pids.txt | 85 ++
 2 files changed, 87 insertions(+)
 create mode 100644 Documentation/cgroups/pids.txt

diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
index 96ce071..3f5a40f 100644
--- a/Documentation/cgroups/00-INDEX
+++ b/Documentation/cgroups/00-INDEX
@@ -22,6 +22,8 @@ net_cls.txt
- Network classifier cgroups details and usages.
 net_prio.txt
- Network priority cgroups details and usages.
+pids.txt
+   - Process number cgroups details and usages.
 resource_counter.txt
- Resource Counter API.
 unified-hierarchy.txt
diff --git a/Documentation/cgroups/pids.txt b/Documentation/cgroups/pids.txt
new file mode 100644
index 0000000..1a078b5
--- /dev/null
+++ b/Documentation/cgroups/pids.txt
@@ -0,0 +1,85 @@
+  Process Number Controller
+  =========================
+
+Abstract
+--------
+
+The process number controller is used to allow a cgroup hierarchy to stop any
+new tasks from being fork()'d or clone()'d after a certain limit is reached.
+
+Since it is trivial to hit the task limit without hitting any kmemcg limits in
+place, PIDs are a fundamental resource. As such, PID exhaustion must be
+preventable in the scope of a cgroup hierarchy by allowing resource limiting of
+the number of tasks in a cgroup.
+
+Usage
+-----
+
+In order to use the `pids` controller, set the maximum number of tasks in
+pids.max (this is not available in the root cgroup for obvious reasons). The
+number of processes currently in the cgroup is given by pids.current.
+
+Organisational operations are not blocked by cgroup policies, so it is possible
+to have pids.current > pids.max. This can be done by either setting the limit to
+be smaller than pids.current, or attaching enough processes to the cgroup such
+that pids.current > pids.max. However, it is not possible to violate a cgroup
+policy through fork() or clone(). fork() and clone() will return -EAGAIN if the
+creation of a new process would cause a cgroup policy to be violated.
+
+To set a cgroup to have no limit, set pids.max to "max". This is the default for
+all new cgroups (N.B. that PID limits are hierarchical, so the most stringent
+limit in the hierarchy is followed).
+
+pids.current tracks all child cgroup hierarchies, so parent/pids.current is a
+superset of parent/child/pids.current.
+
+Example
+-------
+
+First, we mount the pids controller:
+# mkdir -p /sys/fs/cgroup/pids
+# mount -t cgroup -o pids none /sys/fs/cgroup/pids
+
+Then we create a hierarchy, set limits and attach processes to it:
+# mkdir -p /sys/fs/cgroup/pids/parent/child
+# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
+# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+#
+
+It should be noted that attempts to overcome the set limit (2 in this case) will
+fail:
+
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+# ( /bin/echo "Here's some processes for you." | cat )
+sh: fork: Resource temporarily unavailable
+#
+
+Even if we migrate to a child cgroup (which doesn't have a set limit), we will
+not be able to overcome the most stringent limit in the hierarchy (in this case,
+parent's):
+
+# echo $$ > /sys/fs/cgroup/pids/parent/child/cgroup.procs
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+# cat /sys/fs/cgroup/pids/parent/child/pids.current
+2
+# cat /sys/fs/cgroup/pids/parent/child/pids.max
+max
+# ( /bin/echo "Here's some processes for you." | cat )
+sh: fork: Resource temporarily unavailable
+#
+
+We can set a limit that is smaller than pids.current, which will stop any new
+processes from being forked at all (note that the shell itself counts towards
+pids.current):
+
+# echo 1 > /sys/fs/cgroup/pids/parent/pids.max
+# /bin/echo "We can't even spawn a single process now."
+sh: fork: Resource temporarily unavailable
+# echo 0 > /sys/fs/cgroup/pids/parent/pids.max
+# /bin/echo "We can't even spawn a single process now."
+sh: fork: Resource temporarily unavailable
+#
-- 
2.4.4
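
The hierarchical rule documented above (a fork is refused if it would push any
level of the hierarchy past its limit, and pids.current of a parent includes
its children) can be modelled with a small toy. This is a sketch under those
stated rules only, not the code in kernel/cgroup_pids.c; struct pids_sketch and
pids_try_charge_sketch() are hypothetical names.

```c
#include <stddef.h>

#define PIDS_MAX_SKETCH (-1L)	/* stands for the "max" (no limit) setting */

struct pids_sketch {
	struct pids_sketch *parent;
	long limit;	/* PIDS_MAX_SKETCH or a non-negative cap */
	long count;	/* tasks in this cgroup and all of its descendants */
};

/*
 * Try to account one new task in cg and in every ancestor.
 * Returns 0 on success, or -1 if any level would exceed its limit,
 * in which case no counter is changed.
 */
static int pids_try_charge_sketch(struct pids_sketch *cg)
{
	struct pids_sketch *p;

	for (p = cg; p; p = p->parent)
		if (p->limit != PIDS_MAX_SKETCH && p->count + 1 > p->limit)
			return -1;

	for (p = cg; p; p = p->parent)
		p->count++;
	return 0;
}
```

With a parent limited to 2 and a child left at "max", the third charge in the
child fails because of the parent's limit, mirroring the shell session in the
documentation.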


Re: [PATCH v2 12/13] KVM: x86: add SMM to the MMU role, support SMRAM address space

2015-06-17 Thread Xiao Guangrong



On 06/17/2015 04:18 PM, Paolo Bonzini wrote:
>
> On 09/06/2015 06:01, Xiao Guangrong wrote:
>>
>> On 05/28/2015 01:05 AM, Paolo Bonzini wrote:
>>> This is now very simple to do.  The only interesting part is a simple
>>> trick to find the right memslot in gfn_to_rmap, retrieving the address
>>> space from the spte role word.  The same trick is used in the auditing
>>> code.
>>>
>>> The comment on top of union kvm_mmu_page_role has been stale forever,
>>
>> Fortunately, we have documented these fields in mmu.txt, please do it for
>> 'smm' as well. :)
>
> Right, done.
>
>>> +/*
>>> + * This is left at the top of the word so that
>>> + * kvm_memslots_for_spte_role can extract it with a
>>> + * simple shift.  While there is room, give it a whole
>>> + * byte so it is also faster to load it from memory.
>>> + */
>>> +unsigned smm:8;
>>
>> I suspect if we really need this trick, smm is not the hottest field in
>> this struct anyway.
>
> Note that after these patches it is used by gfn_to_rmap, and hence for
> example rmap_add.

However, role->level is hotter than role->smm, so it is also a good
candidate for this kind of trick.

And this is only 32 bits, which can be brought into a CPU register by a single
memory load; that is why I was worried whether it is really needed.
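
The trick under discussion, keeping a field in the top byte of a 32-bit word so
that a single shift extracts it, can be sketched with explicit shifts instead
of bitfields (bitfield layout is implementation-defined). The bit positions
below are assumptions for illustration only and are not the actual layout of
union kvm_mmu_page_role.

```c
#include <stdint.h>

#define ROLE_SMM_SHIFT_SKETCH   24	/* assumed: smm occupies bits 31..24 */
#define ROLE_LEVEL_SHIFT_SKETCH 4	/* assumed: level occupies bits 7..4 */

/* Place an 8-bit address-space id in the top byte of a role word. */
static uint32_t role_set_smm(uint32_t word, uint8_t smm)
{
	return (word & 0x00ffffffu) | ((uint32_t)smm << ROLE_SMM_SHIFT_SKETCH);
}

/* A field at the top of the word needs only a shift to extract. */
static unsigned role_get_smm(uint32_t word)
{
	return word >> ROLE_SMM_SHIFT_SKETCH;
}

/* A field in the middle of the word needs a shift and a mask. */
static unsigned role_get_level(uint32_t word)
{
	return (word >> ROLE_LEVEL_SHIFT_SKETCH) & 0xf;
}
```

The difference between the two getters is one mask operation, which is the
scale of the saving being weighed in this exchange.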




Re: [RFC v2 23/24] m68k/mac: Fix PRAM accessors

2015-06-17 Thread Finn Thain

Hi Geert,

Further to my previous email,

On Tue, 16 Jun 2015, in which I wrote:

> 
> On Mon, 15 Jun 2015, Geert Uytterhoeven wrote:
> 
> > 
> > More magic values...
> 
> [...] The only useful RTC documentation I've ever come across is this: 
> http://mac.linux-m68k.org/devel/plushw.php

This document appears to be Inside Macintosh vol. III ch. 2. It describes 
the early RTC chip that lacks two-byte operations and XPRAM, and pre-dates 
all Mac hardware supported in mainline Linux. But it does offer some 
useful data, though not enough to answer all of your criticisms (as I 
said).

> [...] I think they should be applied across the entire file, and in a 
> different patch. Inconsistent use of such macros would be undesirable 
> IMHO.

So, unless you have other ideas, I will revise this patch and insert an 
earlier patch to address existing code, and codify what little reliable 
chip data we have.

-- 


Re: [PATCH v2 0/9] fix sm7xxfb

2015-06-17 Thread Greg Kroah-Hartman
On Wed, Jun 17, 2015 at 09:43:51PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Jun 17, 2015 at 04:54:39PM +0530, Sudip Mukherjee wrote:
> > fixing the few remaining issues of sm7xxfb before sending the patch to
> > remove it from staging.

Oh, the BIG_ENDIAN defines need to be fixed up, surely that can be done
in a more "correct" way.

thanks,

greg k-h


Re: [PATCH v2 09/13] KVM: x86: pass kvm_mmu_page to gfn_to_rmap

2015-06-17 Thread Xiao Guangrong



On 06/17/2015 04:15 PM, Paolo Bonzini wrote:
> On 09/06/2015 05:28, Xiao Guangrong wrote:
>>> -rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
>>> +slots = kvm_memslots(kvm);
>>> +slot = __gfn_to_memslot(slots, sp->gfn);
>>> +rmapp = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
>>
>> Why is @sp not available here?
>
> Because the function forces the level to be PT_PAGE_TABLE_LEVEL rather
> than sp->level.

Oh, right, thanks for your explanation. :)

Reviewed-by: Xiao Guangrong 


Re: [PATCH v2 0/9] fix sm7xxfb

2015-06-17 Thread Greg Kroah-Hartman
On Wed, Jun 17, 2015 at 04:54:39PM +0530, Sudip Mukherjee wrote:
> fixing the few remaining issues of sm7xxfb before sending the patch to
> remove it from staging.
> Also attempted to setup a tree and all the patches of this series are
> available there for you.
> 
> 
> The following changes since commit f0feeaff9c60bfb3dbadf09da15d70cf35700f29:
> 
>   staging: wilc1000: remove unwanted code (2015-06-16 19:23:25 -0700)
> 
> are available in the git repository at:
> 
>   git://git.vectorindia.net/staging staging-testing

Sorry, I don't take git pulls for staging patches.



[PATCH 1/1] scsi: use kzalloc for allocating one thing

2015-06-17 Thread Maninder Singh
Use kzalloc rather than kcalloc(1,...) for allocating one thing

Signed-off-by: Maninder Singh 
Reviewed-by: Vaneet Narang 
---
 drivers/scsi/mvsas/mv_init.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index d40d734..65e47eb 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -558,7 +558,7 @@ static int mvs_pci_init(struct pci_dev *pdev, const struct pci_device_id *ent)
 
chip = _chips[ent->driver_data];
SHOST_TO_SAS_HA(shost) =
-   kcalloc(1, sizeof(struct sas_ha_struct), GFP_KERNEL);
+   kzalloc(sizeof(struct sas_ha_struct), GFP_KERNEL);
if (!SHOST_TO_SAS_HA(shost)) {
kfree(shost);
rc = -ENOMEM;
-- 
1.7.1
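
For readers outside the kernel tree, the equivalence the patch relies on can
be sketched in userspace. This is only an analogue under stated assumptions:
calloc() stands in for kcalloc(1, ...), malloc() plus memset() stands in for
kzalloc(), and struct demo_ha and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct sas_ha_struct. */
struct demo_ha {
	int id;
	void *priv;
};

/* Userspace analogue of kcalloc(1, size, flags): array API with n == 1. */
static void *alloc_one_via_calloc(size_t size)
{
	return calloc(1, size);
}

/* Userspace analogue of kzalloc(size, flags): one zeroed object. */
static void *alloc_one_zeroed(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}
```

Both helpers hand back one zero-filled object, so the single-object form
simply states the intent more directly.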



Re: [PATCH 1/8] of: overlay: Implement indirect target support

2015-06-17 Thread Guenter Roeck

On 06/17/2015 05:10 PM, Rob Herring wrote:
> On Fri, Jun 12, 2015 at 2:54 PM, Pantelis Antoniou wrote:
>> Some applications require applying the same overlay to a different
>> target according to some external condition (for instance depending
>> on the slot a card has been inserted, the overlay target is different).
>>
>> The indirect target use requires using the new
>> of_overlay_create_indirect() API which uses a text selector.
>>
>> The format requires the use of a target-indirect node as follows:
>>
>>  fragment@0 {
>>  target-indirect {
>>  foo { target = <_target>; };
>>  bar { target = <_target>; };
>>  };
>>  };
>
> The problem with this is it does not scale. The overlay has to be
> changed for every new target. If you had an add-on board (possibly


Not really. For our use case, at least, each target is a slot in the
chassis, so we end up with something like

target-indirect {
slot0 { target = <>; };
slot1 { target = <>; };
slot2 { target = <>; };
slot3 { target = <>; };
slot4 { target = <>; };
slot5 { target = <>; };
slot6 { target = <>; };
slot7 { target = <>; };
slot8 { target = <>; };
};

where sib[0-8]i2c is defined in the master dts file.

Since the number of slots is well defined, the overlay will
always work. Sure, it may have to be updated if it is used in
a chassis with 20 slots, but that doesn't happen that often.


> providing an overlay from an EEPROM), you would not want to have to
> rebuild overlays with every new host board. It also only handles
> translations of where to apply the overlay, but doesn't provide
> translations of phandles within the overlay. Say a node that is a
> clock or regulator consumer for example. Or am I missing something.
>
> Grant and I discussed this briefly. I think we need a connector
> definition in the base dtb which can provide the target for an
> overlay. The connector should provide the translation between
> connector local signals/buses and host signals/buses. We need to
> define what this translation would look like for each binding.
>
> At least for GPIO, we could have something similar to interrupt-map
> properties. For example, to map connector gpio 0 to host gpio 66 and
> connector gpio 1 to host gpio 44:
>
> gpio-map = <0  66>, <1  44>;
>
> We'd need to define how to handle I2C, SPI, regulators, and clocks
> minimally. Perhaps rather than trying to apply nodes into the base
> dtb, they should be under the connector and the kernel has to learn to
> not just look for child nodes for various bindings. Just thinking
> aloud...


Anything is fine with me, as long as it is usable (and does not require
us to create 9 overlay files for sib[0-8] from the example above).

The real tricky part is pci, where it is not just about simple target
redirection but irq numbers, memory address ranges, and bus number ranges.
It would be quite useful to have a workable solution for that as well,
but at least I don't have an idea how it could be done.

Talking about connector ...

Right now we have something like

sib0 {
compatible = "jnx,sib-connector", "simple-bus";
... (various properties)
};

Maybe we could use something like the following ?

sib0 {
compatible = "jnx,sib-connector", "simple-bus";
... (various attributes)
ref0 = <>;
ref1 = <>;
};

and then just reference ref0 and ref1 as targets in the overlay itself ?

Guenter



Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

2015-06-17 Thread Mike Galbraith
On Wed, 2015-06-17 at 20:46 -0700, Josef Bacik wrote:
> On 06/17/2015 05:55 PM, Mike Galbraith wrote:
> > On Wed, 2015-06-17 at 11:06 -0700, Josef Bacik wrote:
> >> On 06/11/2015 10:35 PM, Mike Galbraith wrote:
> >>> On Thu, 2015-05-28 at 13:05 +0200, Peter Zijlstra wrote:
> >
> >>> If sd == NULL, we fall through and try to pull wakee despite nacked-by
> >>> tsk_cpus_allowed() or wake_affine().
> >>>
> >>
> >> So maybe add a check in the if (sd_flag & SD_BALANCE_WAKE) for something
> >> like this
> >>
> >> if (tmp >= 0) {
> >>new_cpu = tmp;
> >>goto unlock;
> >> } else if (!want_affine) {
> >>new_cpu = prev_cpu;
> >> }
> >>
> >> so we can make sure we're not being pushed onto a cpu that we aren't
> >> allowed on?  Thanks,
> >
> > The buglet is a messenger methinks.  You saying the patch helped without
> > SD_BALANCE_WAKE being set is why I looked.  The buglet would seem to say
> > that preferring cache is not harming your load after all.  It now sounds
> > as though wake_wide() may be what you're squabbling with.
> >
> > Things aren't adding up all that well.
> 
> Yeah I'm horribly confused.  The other thing is I had to switch clusters 
> (I know, I know, I'm changing the parameters of the test).  So these new 
> boxes are haswell boxes, but basically the same otherwise, 2 socket 12 
> core with HT, just newer/faster CPUs.  I'll re-run everything again and 
> give the numbers so we're all on the same page again, but as it stands 
> now I think we have this
> 
> 3.10 with wake_idle forward ported - good
> 4.0 stock - 20% perf drop
> 4.0 w/ Peter's patch - good
> 4.0 w/ Peter's patch + SD_BALANCE_WAKE - 5% perf drop
> 
> I can do all these iterations again to verify, is there any other 
> permutation you'd like to see?  Thanks,

Yeah, after re-baseline, please apply/poke these buttons individually in
4.0-virgin.

(cat /sys/kernel/debug/sched_features, prepend NO_, echo it back)
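
As a rough userspace model of what those buttons do: writing NAME enables a
feature bit and writing NO_NAME clears it, and sched_feat() just tests the
bit. Everything prefixed demo_ below is hypothetical and only mirrors the
two knobs from the patch; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature bits, named after the two knobs added by the patch. */
enum demo_feat {
	DEMO_PREFER_IDLE,
	DEMO_WAKE_WIDE,
	DEMO_FEAT_NR,
};

static const char *const demo_feat_names[DEMO_FEAT_NR] = {
	"PREFER_IDLE",
	"WAKE_WIDE",
};

/* Both features default to true, as in the SCHED_FEAT() lines. */
static unsigned int demo_features =
	(1u << DEMO_PREFER_IDLE) | (1u << DEMO_WAKE_WIDE);

/* Analogue of sched_feat(x): test one bit. */
static bool demo_feat(enum demo_feat f)
{
	return (demo_features & (1u << f)) != 0;
}

/*
 * Apply one token as it would be echoed into sched_features:
 * "NAME" sets the bit, "NO_NAME" clears it.
 * Returns 0 on success, -1 if the name is unknown.
 */
static int demo_feat_write(const char *token)
{
	bool enable = true;
	int i;

	if (strncmp(token, "NO_", 3) == 0) {
		enable = false;
		token += 3;
	}

	for (i = 0; i < DEMO_FEAT_NR; i++) {
		if (strcmp(token, demo_feat_names[i]) == 0) {
			if (enable)
				demo_features |= 1u << i;
			else
				demo_features &= ~(1u << i);
			return 0;
		}
	}
	return -1;
}
```

With both bits set the patched code paths behave as before; clearing one bit
at a time isolates the effect of wake_wide() or of the idle-target shortcut.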

---
 kernel/sched/fair.c |4 ++--
 kernel/sched/features.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4506,7 +4506,7 @@ static int wake_affine(struct sched_doma
 * If we wake multiple tasks be careful to not bounce
 * ourselves around too much.
 */
-   if (wake_wide(p))
+   if (sched_feat(WAKE_WIDE) && wake_wide(p))
return 0;
 
idx   = sd->wake_idx;
@@ -4682,7 +4682,7 @@ static int select_idle_sibling(struct ta
struct sched_group *sg;
int i = task_cpu(p);
 
-   if (idle_cpu(target))
+   if (!sched_feat(PREFER_IDLE) || idle_cpu(target))
return target;
 
/*
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -59,6 +59,8 @@ SCHED_FEAT(TTWU_QUEUE, true)
 SCHED_FEAT(FORCE_SD_OVERLAP, false)
 SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
+SCHED_FEAT(PREFER_IDLE, true)
+SCHED_FEAT(WAKE_WIDE, true)
 
 /*
  * Apply the automatic NUMA scheduling policy. Enabled automatically




Re: [PATCH v2 0/7] staging: board: armadillo800eva: Board staging for sh_mobile_lcdc_fb

2015-06-17 Thread Simon Horman
On Wed, Jun 17, 2015 at 10:38:49AM +0200, Geert Uytterhoeven wrote:
>   Hi Greg, Simon, Magnus,
> 
> This patch series adds board staging support for the Renesas R-Mobile A1
> (r8a7740) based Armadillo-800 EVA board. It allows supporting the frame
> buffer device for the on-board LCD (which isn't supported by a DT-aware
> driver yet) in modern DT-based multi-platform kernels.
> 
> The board staging area was introduced last year to allow continuous
> upstream in-tree development and integration of platform devices. It
> helps developers integrate devices as platform devices for device
> drivers that only provide platform device bindings.  This in turn allows
> for incremental development of both hardware feature support and DT
> binding work in parallel.
> 
> The goal is to complete the move to ARM multi-platform kernels for all
> shmobile platforms, and drop the existing board files
> (arch/arm/mach-shmobile/board-*). Once this series is accepted, more
> than 3000 lines of legacy armadillo board code and r8a7740 SoC code can
> be removed.
> 
> This series consists of 5 parts:
>   - Patch 1 re-enables compilation of the board staging area, which was
> disabled after a compile breakage, but has been fixed in the
> meantime,
>   - Patch 2 moves initialization of staging board code to an earlier
> moment, as currently it happens after unused PM domains are powered
> down,
>   - Patches 3 and 4 (hopefully) fix the existing kzm9d board staging
> code, which was presumably "broken" by commit 9a1091ef0017c40a
> ("irqchip: gic: Support hierarchy irq domain."),
>   - Patches 5 and 6 add support for registering platform devices with
> complex dependencies (clocks and PM domains), and add armadillo
> board staging code for enabling a frame buffer on the on-board LCD,
>   - Patch 7 (new) adds pinctrl and gpio-hog configuration to enable the
> LCD.
> 
> The first 6 patches should go in through the staging tree, the last one
> through the shmobile tree.
> 
> Major changes since v1 (more detailed changelogs in the individual
> patches):
>   - Add support for low/high edge/level interrupts in hwirq translation,
>   - Move pinctrl and GPIO configuration from board staging code to DT,
>   - Use clk_add_alias() instead of open coding.
> 
> Dependencies:
>   - This is against next-20150617,
>   - The gpio-hog in patch 7 depends on a bug fix like "[PATCH] [RFC]
> gpio: Retry deferred GPIO hogging on pin range change"
> (https://lkml.org/lkml/2015/6/16/455). It can be applied as-is
> though.
> 
> This was tested on r8a7740/armadillo.
> This was not tested on emev2/kzm9d, due to lack of hardware.

I have verified that kzm9d still boots with your patches applied on top
of renesas-devel-20150617-v4.1-rc8. I used shmobile_defconfig and
then enabled CONFIG_STAGING and in turn CONFIG_STAGING_BOARD.

Let me know if you think further testing is appropriate.

> Thanks for applying!
> 
> Geert Uytterhoeven (7):
>   Revert "staging: board: disable as it breaks the build"
>   staging: board: Initialize staging board code earlier
>   staging: board: Add support for translating hwirq to virq numbers
>   staging: board: kzm9d: Translate hwirq numbers to virq numbers
>   staging: board: Add support for devices with complex dependencies
>   staging: board: armadillo800eva: Board staging for sh_mobile_lcdc_fb

Feel free to add:

Acked-by: Simon Horman 

to the above.

>   ARM: shmobile: armadillo800eva dts: Add pinctrl and gpio-hog for lcdc0
> 
>  arch/arm/boot/dts/r8a7740-armadillo800eva.dts |  13 +++
>  drivers/staging/board/Kconfig |   1 -
>  drivers/staging/board/Makefile|   3 +-
>  drivers/staging/board/armadillo800eva.c   | 105 
>  drivers/staging/board/board.c | 136 
> ++
>  drivers/staging/board/board.h |  27 -
>  drivers/staging/board/kzm9d.c |  10 +-
>  7 files changed, 290 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/staging/board/armadillo800eva.c
> 
> -- 
> 1.9.1
> 
> Gr{oetje,eeting}s,
> 
>   Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
>   -- Linus Torvalds
> 


[RFC PATCH V2 07/10] arm64: Handle TRAP_BRKPT for user mode as well

2015-06-17 Thread Pratyush Anand
uprobe is registered at break_hook with a unique ESR code. So, when a
TRAP_BRKPT occurs, call_break_hook checks whether it was for uprobe. If
not, a SIGTRAP is sent to the user task.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/debug-monitors.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 7eb13dcf09fa..1fe912e77f62 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -311,8 +311,18 @@ static int brk_handler(unsigned long addr, unsigned int esr,
   struct pt_regs *regs)
 {
siginfo_t info;
+   bool handler_found = false;
+
+#ifdef CONFIG_KPROBES
+   if ((esr & BRK64_ESR_MASK) == BRK64_ESR_KPROBES) {
+   if (kprobe_breakpoint_handler(regs, esr) == DBG_HOOK_HANDLED)
+   handler_found = true;
+   }
+#endif
+   if (!handler_found && call_break_hook(regs, esr) == DBG_HOOK_HANDLED)
+   handler_found = true;
 
-   if (user_mode(regs)) {
+   if (!handler_found && user_mode(regs)) {
info = (siginfo_t) {
.si_signo = SIGTRAP,
.si_errno = 0,
@@ -321,15 +331,8 @@ static int brk_handler(unsigned long addr, unsigned int esr,
};
 
force_sig_info(SIGTRAP, , current);
-   }
-#ifdef CONFIG_KPROBES
-   else if ((esr & BRK64_ESR_MASK) == BRK64_ESR_KPROBES) {
-   if (kprobe_breakpoint_handler(regs, esr) != DBG_HOOK_HANDLED)
-   return -EFAULT;
-   }
-#endif
-   else if (call_break_hook(regs, esr) != DBG_HOOK_HANDLED) {
-   pr_warn("Unexpected kernel BRK exception at EL1\n");
+   } else if (!handler_found) {
+   pr_warning("Unexpected kernel BRK exception at EL1\n");
return -EFAULT;
}
 
-- 
2.1.0
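
The reordered control flow can be summarized with a small model. This is a
sketch only: the three booleans stand in for the kprobe handler result, the
call_break_hook() result and user_mode(regs), and the enum and function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only. */
enum demo_brk_result {
	DEMO_BRK_HANDLED,	/* a registered hook consumed the BRK */
	DEMO_BRK_SIGTRAP,	/* no hook claimed it: user task gets SIGTRAP */
	DEMO_BRK_FAULT,		/* no hook, kernel mode: warning and -EFAULT */
};

/*
 * Hooks are consulted first, regardless of mode; the user/kernel split
 * only matters when nothing claimed the breakpoint.
 */
static enum demo_brk_result demo_brk_dispatch(bool kprobe_handled,
					      bool hook_handled,
					      bool user_mode)
{
	bool handler_found = kprobe_handled;

	if (!handler_found && hook_handled)
		handler_found = true;

	if (handler_found)
		return DEMO_BRK_HANDLED;

	return user_mode ? DEMO_BRK_SIGTRAP : DEMO_BRK_FAULT;
}
```

Before the patch, a user-mode BRK always took the SIGTRAP branch, so a
uprobe hook never got a chance to claim it.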



[RFC PATCH V2 08/10] arm64: rename enum debug_el to enum debug_elx to fix "wrong kind of tag"

2015-06-17 Thread Pratyush Anand
asm/debug-monitors.h contains the definitions of the debug opcodes, so it
will be needed by asm/uprobes.h.

Once asm/uprobes.h is included, enum debug_el triggers the following
compilation error:

lib/list_sort.c:160:8: error: ‘debug_el’ defined as wrong kind of tag
 struct debug_el {

Therefore rename enum debug_el to enum debug_elx.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/include/asm/debug-monitors.h | 6 +++---
 arch/arm64/kernel/debug-monitors.c  | 4 ++--
 arch/arm64/kernel/hw_breakpoint.c   | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index 92d7ceac9adf..d9e79b01d09e 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -132,13 +132,13 @@ void unregister_break_hook(struct break_hook *hook);
 
 u8 debug_monitors_arch(void);
 
-enum debug_el {
+enum debug_elx {
DBG_ACTIVE_EL0 = 0,
DBG_ACTIVE_EL1,
 };
 
-void enable_debug_monitors(enum debug_el el);
-void disable_debug_monitors(enum debug_el el);
+void enable_debug_monitors(enum debug_elx el);
+void disable_debug_monitors(enum debug_elx el);
 
 void user_rewind_single_step(struct task_struct *task);
 void user_fastforward_single_step(struct task_struct *task);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 1fe912e77f62..237a21f675fd 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -83,7 +83,7 @@ early_param("nodebugmon", early_debug_disable);
 static DEFINE_PER_CPU(int, mde_ref_count);
 static DEFINE_PER_CPU(int, kde_ref_count);
 
-void enable_debug_monitors(enum debug_el el)
+void enable_debug_monitors(enum debug_elx el)
 {
u32 mdscr, enable = 0;
 
@@ -103,7 +103,7 @@ void enable_debug_monitors(enum debug_el el)
}
 }
 
-void disable_debug_monitors(enum debug_el el)
+void disable_debug_monitors(enum debug_elx el)
 {
u32 mdscr, disable = 0;
 
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index e7d934d3afe0..43b74a3ddaef 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -157,7 +157,7 @@ static void write_wb_reg(int reg, int n, u64 val)
  * Convert a breakpoint privilege level to the corresponding exception
  * level.
  */
-static enum debug_el debug_exception_level(int privilege)
+static enum debug_elx debug_exception_level(int privilege)
 {
switch (privilege) {
case AARCH64_BREAKPOINT_EL0:
@@ -231,7 +231,7 @@ static int hw_breakpoint_control(struct perf_event *bp,
struct perf_event **slots;
struct debug_info *debug_info = >thread.debug;
int i, max_slots, ctrl_reg, val_reg, reg_enable;
-   enum debug_el dbg_el = debug_exception_level(info->ctrl.privilege);
+   enum debug_elx dbg_el = debug_exception_level(info->ctrl.privilege);
u32 ctrl;
 
if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
@@ -538,7 +538,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
  * exception level at the register level.
  * This is used when single-stepping after a breakpoint exception.
  */
-static void toggle_bp_registers(int reg, enum debug_el el, int enable)
+static void toggle_bp_registers(int reg, enum debug_elx el, int enable)
 {
int i, max_slots, privilege;
u32 ctrl;
-- 
2.1.0
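
The error comes from a C rule: struct, union and enum tags share a single
namespace, so one translation unit cannot declare both struct debug_el and
enum debug_el. A minimal sketch of the post-rename situation follows; the
struct member and demo_el_index() are hypothetical.

```c
#include <assert.h>

/* Stand-in for the struct in lib/list_sort.c; its member is hypothetical. */
struct debug_el {
	int value;
};

/*
 * Spelling this "enum debug_el" would reuse the tag of the struct above
 * and fail with "defined as wrong kind of tag"; the renamed tag coexists.
 */
enum debug_elx {
	DBG_ACTIVE_EL0 = 0,
	DBG_ACTIVE_EL1,
};

static int demo_el_index(enum debug_elx el)
{
	return (int)el;
}
```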



[RFC PATCH V2 05/10] arm64: Re-factor flush_ptrace_access

2015-06-17 Thread Pratyush Anand
Refactor flush_ptrace_access so that its vma-independent part can be
reused by functions such as arch_uprobe_copy_ixol, which do not have
a vma.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/mm/flush.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index b6f14e8d2121..9a4dd6f39cfb 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -34,19 +34,25 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
__flush_icache_all();
 }
 
+static void __flush_ptrace_access(struct page *page, unsigned long uaddr,
+   void *kaddr, unsigned long len)
+{
+   unsigned long addr = (unsigned long)kaddr;
+
+   if (icache_is_aliasing()) {
+   __flush_dcache_area(kaddr, len);
+   __flush_icache_all();
+   } else {
+   flush_icache_range(addr, addr + len);
+   }
+}
+
 static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
unsigned long uaddr, void *kaddr,
unsigned long len)
 {
-   if (vma->vm_flags & VM_EXEC) {
-   unsigned long addr = (unsigned long)kaddr;
-   if (icache_is_aliasing()) {
-   __flush_dcache_area(kaddr, len);
-   __flush_icache_all();
-   } else {
-   flush_icache_range(addr, addr + len);
-   }
-   }
+   if (vma->vm_flags & VM_EXEC)
+   __flush_ptrace_access(page, uaddr, kaddr, len);
 }
 
 /*
-- 
2.1.0



[RFC PATCH V2 06/10] arm64: Handle TRAP_HWBRKPT for user mode as well

2015-06-17 Thread Pratyush Anand
uprobe registers a handler at step_hook, so single_step_handler now also
checks for a valid hook when the exception comes from user mode.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/debug-monitors.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 486ee94304a0..7eb13dcf09fa 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -238,7 +238,14 @@ static int single_step_handler(unsigned long addr, unsigned int esr,
if (!reinstall_suspended_bps(regs))
return 0;
 
-   if (user_mode(regs)) {
+#ifdef CONFIG_KPROBES
+   if (kprobe_single_step_handler(regs, esr) == DBG_HOOK_HANDLED)
+   handler_found = true;
+#endif
+   if (!handler_found && call_step_hook(regs, esr) == DBG_HOOK_HANDLED)
+   handler_found = true;
+
+   if (!handler_found && user_mode(regs)) {
info.si_signo = SIGTRAP;
info.si_errno = 0;
info.si_code  = TRAP_HWBKPT;
@@ -252,22 +259,13 @@ static int single_step_handler(unsigned long addr, unsigned int esr,
 * to the active-not-pending state).
 */
user_rewind_single_step(current);
-   } else {
-#ifdef CONFIG_KPROBES
-   if (kprobe_single_step_handler(regs, esr) == DBG_HOOK_HANDLED)
-   handler_found = true;
-#endif
-   if (call_step_hook(regs, esr) == DBG_HOOK_HANDLED)
-   handler_found = true;
-
-   if (!handler_found) {
-   pr_warn("Unexpected kernel single-step exception at EL1\n");
-   /*
-* Re-enable stepping since we know that we will be
-* returning to regs.
-*/
-   set_regs_spsr_ss(regs);
-   }
+   } else if (!handler_found) {
+   pr_warning("Unexpected kernel single-step exception at EL1\n");
+   /*
+* Re-enable stepping since we know that we will be
+* returning to regs.
+*/
+   set_regs_spsr_ss(regs);
}
 
return 0;
-- 
2.1.0



[RFC PATCH V2 09/10] arm64: Add uprobe support

2015-06-17 Thread Pratyush Anand
This patch adds support for uprobe on ARM64 architecture.

Unit tests for the following cases have been run so far, and all of them
were found to work:
1. Step-able instructions, like sub, ldr, add etc.
2. Simulation-able like ret.
3. uretprobe
4. Reject-able instructions like sev, wfe etc.
5. trapped and abort xol path
6. probe at unaligned user address.

Currently it does not support aarch32 instruction probing.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/Kconfig  |   3 +
 arch/arm64/include/asm/debug-monitors.h |   3 +
 arch/arm64/include/asm/probes.h |   1 +
 arch/arm64/include/asm/thread_info.h|   5 +-
 arch/arm64/include/asm/uprobes.h|  37 ++
 arch/arm64/kernel/Makefile  |   3 +
 arch/arm64/kernel/signal.c  |   4 +-
 arch/arm64/kernel/uprobes.c | 213 
 arch/arm64/mm/flush.c   |   6 +
 9 files changed, 273 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/uprobes.h
 create mode 100644 arch/arm64/kernel/uprobes.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5312be5b40ad..3ff4e038a365 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -156,6 +156,9 @@ config PGTABLE_LEVELS
default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
default 4 if ARM64_4K_PAGES && ARM64_VA_BITS_48
 
+config ARCH_SUPPORTS_UPROBES
+   def_bool y
+
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index d9e79b01d09e..1c3a4e635f1c 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -94,6 +94,9 @@
 #define BRK64_ESR_MASK 0x
 #define BRK64_ESR_KPROBES  0x0004
 #define BRK64_OPCODE_KPROBES   (AARCH64_BREAK_MON | (BRK64_ESR_KPROBES << 5))
+/* uprobes BRK opcodes with ESR encoding  */
+#define BRK64_ESR_UPROBES  0x0008
+#define BRK64_OPCODE_UPROBES   (AARCH64_BREAK_MON | (BRK64_ESR_UPROBES << 5))
 
 /* AArch32 */
 #define DBG_ESR_EVT_BKPT   0x4
diff --git a/arch/arm64/include/asm/probes.h b/arch/arm64/include/asm/probes.h
index f07968f1335f..52db4e4c47c7 100644
--- a/arch/arm64/include/asm/probes.h
+++ b/arch/arm64/include/asm/probes.h
@@ -19,6 +19,7 @@ struct kprobe;
 struct arch_specific_insn;
 
 typedef u32 kprobe_opcode_t;
+typedef u32 uprobe_opcode_t;
 typedef unsigned long (kprobes_pstate_check_t)(unsigned long);
 typedef unsigned long
 (probes_condition_check_t)(u32 opcode, struct arch_specific_insn *asi,
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index dcd06d18a42a..2e0644e0600e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -101,6 +101,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_NEED_RESCHED   1
 #define TIF_NOTIFY_RESUME  2   /* callback before returning to user */
 #define TIF_FOREIGN_FPSTATE3   /* CPU's FP state is not current's */
+#define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_NOHZ   7
 #define TIF_SYSCALL_TRACE  8
 #define TIF_SYSCALL_AUDIT  9
@@ -122,10 +123,12 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_SECCOMP   (1 << TIF_SECCOMP)
+#define _TIF_UPROBE(1 << TIF_UPROBE)
 #define _TIF_32BIT (1 << TIF_32BIT)
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-_TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
+_TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
+_TIF_UPROBE)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/include/asm/uprobes.h b/arch/arm64/include/asm/uprobes.h
new file mode 100644
index ..9d64317d1e21
--- /dev/null
+++ b/arch/arm64/include/asm/uprobes.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2014-2015 Pratyush Anand 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+
+#include 
+#include 
+#include 
+
+#define MAX_UINSN_BYTESAARCH64_INSN_SIZE
+
+#define UPROBE_SWBP_INSN   BRK64_OPCODE_UPROBES
+#define UPROBE_SWBP_INSN_SIZE  4
+#define UPROBE_XOL_SLOT_BYTES  MAX_UINSN_BYTES
+
+struct arch_uprobe_task {
+   unsigned long saved_fault_code;
+};
+
+struct arch_uprobe {
+   union {
+   u8 insn[MAX_UINSN_BYTES];
+   u8 ixol[MAX_UINSN_BYTES];
+   };
+   struct 

[RFC PATCH V2 04/10] arm64: Add helper for link pointer

2015-06-17 Thread Pratyush Anand
In many places we program the procedure link pointer, i.e. regs[30], so add
a helper to do that.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/include/asm/ptrace.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 3ea7f5a04bfc..fa2c122e5bd6 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -227,6 +227,13 @@ static inline int valid_user_regs(struct user_pt_regs *regs)
 #include 
 
 #define stack_pointer(regs)((regs)->sp)
+#define procedure_link_pointer(regs)   ((regs)->regs[30])
+
+static inline void procedure_link_pointer_set(struct pt_regs *regs,
+  unsigned long val)
+{
+   procedure_link_pointer(regs) = val;
+}
 
 #ifdef CONFIG_SMP
 #undef profile_pc
-- 
2.1.0



[RFC PATCH V2 10/10] arm64: uprobes: check conditions before simulating instructions

2015-06-17 Thread Pratyush Anand
From: Steve Capper 

Currently uprobes just simulates any instruction that it can't execute
in place. This can lead to unpredictable behaviour if the execution
condition fails and the instruction wouldn't otherwise have been
executed.

This patch adds the condition check.

Signed-off-by: Steve Capper 
---
 arch/arm64/kernel/uprobes.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/uprobes.c b/arch/arm64/kernel/uprobes.c
index 2cc9114deac2..a6d12b81e9ae 100644
--- a/arch/arm64/kernel/uprobes.c
+++ b/arch/arm64/kernel/uprobes.c
@@ -119,15 +119,22 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
kprobe_opcode_t insn;
unsigned long addr;
+   struct arch_specific_insn *ainsn;
 
if (!auprobe->simulate)
return false;
 
insn = *(kprobe_opcode_t *)(>insn[0]);
addr = instruction_pointer(regs);
+   ainsn = >ainsn;
+
+   if (ainsn->handler) {
+   if (!ainsn->check_condn || ainsn->check_condn(insn, ainsn, regs))
+   ainsn->handler(insn, addr, regs);
+   else
+   instruction_pointer_set(regs, instruction_pointer(regs) + 4);
+   }
 
-   if (auprobe->ainsn.handler)
-   auprobe->ainsn.handler(insn, addr, regs);
 
return true;
 }
-- 
2.1.0
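
The decision the patch adds can be modelled in isolation: simulate only when
there is no condition check or the check passes, otherwise step over the
4-byte instruction. The types, the x0-based condition and the branch handler
below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down register frame. */
struct demo_regs {
	unsigned long pc;
	unsigned long x0;
};

/* Hypothetical per-instruction callbacks; check_condn may be NULL. */
struct demo_insn {
	bool (*check_condn)(uint32_t opcode, struct demo_regs *regs);
	void (*handler)(uint32_t opcode, unsigned long addr,
			struct demo_regs *regs);
};

static void demo_simulate(uint32_t opcode, const struct demo_insn *ainsn,
			  struct demo_regs *regs)
{
	if (!ainsn->handler)
		return;

	if (!ainsn->check_condn || ainsn->check_condn(opcode, regs))
		ainsn->handler(opcode, regs->pc, regs);
	else
		regs->pc += 4;	/* condition failed: skip the instruction */
}

/* Example condition: "execute only if x0 is zero". */
static bool demo_cond_x0_zero(uint32_t opcode, struct demo_regs *regs)
{
	(void)opcode;
	return regs->x0 == 0;
}

/* Example handler: pretend the instruction branches forward by 0x100. */
static void demo_branch_handler(uint32_t opcode, unsigned long addr,
				struct demo_regs *regs)
{
	(void)opcode;
	regs->pc = addr + 0x100;
}
```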



[RFC PATCH V2 03/10] arm64: include asm-generic/ptrace.h in asm/ptrace.h

2015-06-17 Thread Pratyush Anand
instruction_pointer_set is needed for uprobe implementation.
asm-generic/ptrace.h already defines it. So include it in asm/ptrace.h.

But inclusion of asm-generic/ptrace.h, needs definition of GET_USP,
SET_USP, GET_FP & SET_FP as they are different than the generic
definition. So, define them in asm/ptrace.h.

user_stack_pointer, instruction_pointer and profile_pc have already been
defined by asm-generic/ptrace.h now, therefore remove them from asm/ptrace.h.

To modify the instruction pointer in kprobes, use
instruction_pointer_set(regs, val) instead of instruction_pointer(regs)
= val; otherwise the build fails with an lvalue error.
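
A minimal user-space sketch of why the assignment form stops compiling, assuming the generic header provides instruction_pointer() as an inline function rather than as a macro that expands to a struct field. struct mock_pt_regs and skip_one_insn() are hypothetical; only the two accessor names come from this patch.

```c
/* Hypothetical user-space mock; only the accessor names come from the patch. */
struct mock_pt_regs {
	unsigned long pc;
};

/* Function-style accessor: the call yields a value, not an lvalue. */
static inline unsigned long instruction_pointer(struct mock_pt_regs *regs)
{
	return regs->pc;
}

static inline void instruction_pointer_set(struct mock_pt_regs *regs,
					   unsigned long val)
{
	regs->pc = val;
}

/* Advance past one 4-byte instruction, in the style of the kprobes change. */
static void skip_one_insn(struct mock_pt_regs *regs)
{
	/* "instruction_pointer(regs) += 4;" would be rejected by the compiler. */
	instruction_pointer_set(regs, instruction_pointer(regs) + 4);
}
```

The old macro form, ((regs)->pc), expanded to a struct member and could be assigned to directly; a function call result cannot.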

Signed-off-by: Pratyush Anand 
---
 arch/arm64/include/asm/ptrace.h  | 32 +---
 arch/arm64/kernel/kprobes.c  | 13 +++--
 arch/arm64/kernel/probes-simulate-insn.c | 16 
 3 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index aadf61a334eb..3ea7f5a04bfc 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -144,10 +144,6 @@ struct pt_regs {
 
 #define fast_interrupts_enabled(regs) \
(!((regs)->pstate & PSR_F_BIT))
-
-#define user_stack_pointer(regs) \
-   (!compat_user_mode(regs) ? (regs)->sp : (regs)->compat_sp)
-
 /**
  * regs_get_register() - get register value from its offset
  * @regs: pt_regs from which register value is gotten
@@ -206,13 +202,35 @@ static inline int valid_user_regs(struct user_pt_regs *regs)
return 0;
 }
 
-#define instruction_pointer(regs)  ((regs)->pc)
+#define GET_USP(regs) \
+   (!compat_user_mode(regs) ? (regs)->sp : (regs)->compat_sp)
+
+#define SET_USP(regs, val) \
+   do {\
+   if (compat_user_mode(regs)) \
+   (regs)->compat_sp = val;\
+   else\
+   (regs)->sp = val;   \
+   } while (0)
+
+#define GET_FP(regs) \
+   (!compat_user_mode(regs) ? (regs)->regs[29] : (regs)->compat_fp)
+
+#define SET_FP(regs, val)  \
+   do {\
+   if (compat_user_mode(regs)) \
+   (regs)->compat_fp = val;\
+   else\
+   (regs)->regs[29] = val; \
+   } while (0)
+
+#include <asm-generic/ptrace.h>
+
 #define stack_pointer(regs)((regs)->sp)
 
 #ifdef CONFIG_SMP
+#undef profile_pc
 extern unsigned long profile_pc(struct pt_regs *regs);
-#else
-#define profile_pc(regs) instruction_pointer(regs)
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
index 740f71695b07..6c9f8b5f04ce 100644
--- a/arch/arm64/kernel/kprobes.c
+++ b/arch/arm64/kernel/kprobes.c
@@ -228,7 +228,8 @@ static void __kprobes
 skip_singlestep_missed(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
 {
/* set return addr to next pc to continue */
-   instruction_pointer(regs) += sizeof(kprobe_opcode_t);
+   instruction_pointer_set(regs,
+   instruction_pointer(regs) + sizeof(kprobe_opcode_t));
 }
 
 static void __kprobes setup_singlestep(struct kprobe *p,
@@ -257,7 +258,7 @@ static void __kprobes setup_singlestep(struct kprobe *p,
/* IRQs and single stepping do not mix well. */
kprobes_save_local_irqflag(regs);
kernel_enable_single_step(regs);
-   instruction_pointer(regs) = slot;
+   instruction_pointer_set(regs, slot);
} else  {
/* insn simulation */
arch_simulate_insn(p, regs);
@@ -304,7 +305,7 @@ post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
 
/* return addr restore if non-branching insn */
if (cur->ainsn.restore.type == RESTORE_PC) {
-   instruction_pointer(regs) = cur->ainsn.restore.addr;
+   instruction_pointer_set(regs, cur->ainsn.restore.addr);
if (!instruction_pointer(regs))
BUG();
}
@@ -341,7 +342,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
 * and allow the page fault handler to continue as a
 * normal page fault.
 */
-   instruction_pointer(regs) = (unsigned long)cur->addr;
+   instruction_pointer_set(regs, (unsigned long)cur->addr);
if (!instruction_pointer(regs))
BUG();
if (kcb->kprobe_status == KPROBE_REENTER)
@@ -507,7 +508,7 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
memcpy(kcb->jprobes_stack, (void *)stack_ptr,
   MIN_STACK_SIZE(stack_ptr));
 
-   

[RFC PATCH V2 01/10] arm64: kprobe: Make prepare and handler function independent of 'struct kprobe'

2015-06-17 Thread Pratyush Anand
The prepare and handler functions will also be used by uprobes, so make
them independent of struct kprobe.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/include/asm/probes.h   |  5 +++--
 arch/arm64/kernel/kprobes-arm64.c | 33 +
 arch/arm64/kernel/kprobes.c   |  7 ---
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/probes.h b/arch/arm64/include/asm/probes.h
index 7f5a27fa071c..f07968f1335f 100644
--- a/arch/arm64/include/asm/probes.h
+++ b/arch/arm64/include/asm/probes.h
@@ -21,9 +21,10 @@ struct arch_specific_insn;
 typedef u32 kprobe_opcode_t;
 typedef unsigned long (kprobes_pstate_check_t)(unsigned long);
 typedef unsigned long
-(probes_condition_check_t)(struct kprobe *p, struct pt_regs *);
+(probes_condition_check_t)(u32 opcode, struct arch_specific_insn *asi,
+   struct pt_regs *);
 typedef void
-(probes_prepare_t)(struct kprobe *, struct arch_specific_insn *);
+(probes_prepare_t)(u32 insn, struct arch_specific_insn *);
 typedef void (kprobes_handler_t) (u32 opcode, long addr, struct pt_regs *);
 
 enum pc_restore_type {
diff --git a/arch/arm64/kernel/kprobes-arm64.c b/arch/arm64/kernel/kprobes-arm64.c
index 8a7e6b0290a7..d8f6e79b4de0 100644
--- a/arch/arm64/kernel/kprobes-arm64.c
+++ b/arch/arm64/kernel/kprobes-arm64.c
@@ -26,68 +26,61 @@
  * condition check functions for kprobes simulation
  */
 static unsigned long __kprobes
-__check_pstate(struct kprobe *p, struct pt_regs *regs)
+__check_pstate(u32 opcode, struct arch_specific_insn *asi, struct pt_regs *regs)
 {
-   struct arch_specific_insn *asi = &p->ainsn;
unsigned long pstate = regs->pstate & 0x;
 
return asi->pstate_cc(pstate);
 }
 
 static unsigned long __kprobes
-__check_cbz(struct kprobe *p, struct pt_regs *regs)
+__check_cbz(u32 opcode, struct arch_specific_insn *asi, struct pt_regs *regs)
 {
-   return check_cbz((u32)p->opcode, regs);
+   return check_cbz(opcode, regs);
 }
 
 static unsigned long __kprobes
-__check_cbnz(struct kprobe *p, struct pt_regs *regs)
+__check_cbnz(u32 opcode, struct arch_specific_insn *asi, struct pt_regs *regs)
 {
-   return check_cbnz((u32)p->opcode, regs);
+   return check_cbnz(opcode, regs);
 }
 
 static unsigned long __kprobes
-__check_tbz(struct kprobe *p, struct pt_regs *regs)
+__check_tbz(u32 opcode, struct arch_specific_insn *asi, struct pt_regs *regs)
 {
-   return check_tbz((u32)p->opcode, regs);
+   return check_tbz(opcode, regs);
 }
 
 static unsigned long __kprobes
-__check_tbnz(struct kprobe *p, struct pt_regs *regs)
+__check_tbnz(u32 opcode, struct arch_specific_insn *asi, struct pt_regs *regs)
 {
-   return check_tbnz((u32)p->opcode, regs);
+   return check_tbnz(opcode, regs);
 }
 
 /*
  * prepare functions for instruction simulation
  */
 static void __kprobes
-prepare_none(struct kprobe *p, struct arch_specific_insn *asi)
+prepare_none(u32 insn, struct arch_specific_insn *asi)
 {
 }
 
 static void __kprobes
-prepare_bcond(struct kprobe *p, struct arch_specific_insn *asi)
+prepare_bcond(u32 insn, struct arch_specific_insn *asi)
 {
-   kprobe_opcode_t insn = p->opcode;
-
asi->check_condn = __check_pstate;
asi->pstate_cc = kprobe_condition_checks[insn & 0xf];
 }
 
 static void __kprobes
-prepare_cbz_cbnz(struct kprobe *p, struct arch_specific_insn *asi)
+prepare_cbz_cbnz(u32 insn, struct arch_specific_insn *asi)
 {
-   kprobe_opcode_t insn = p->opcode;
-
asi->check_condn = (insn & (1 << 24)) ? __check_cbnz : __check_cbz;
 }
 
 static void __kprobes
-prepare_tbz_tbnz(struct kprobe *p, struct arch_specific_insn *asi)
+prepare_tbz_tbnz(u32 insn, struct arch_specific_insn *asi)
 {
-   kprobe_opcode_t insn = p->opcode;
-
asi->check_condn = (insn & (1 << 24)) ? __check_tbnz : __check_tbz;
 }
 
diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
index 7e34ef381055..740f71695b07 100644
--- a/arch/arm64/kernel/kprobes.c
+++ b/arch/arm64/kernel/kprobes.c
@@ -60,7 +60,7 @@ static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
 static void __kprobes arch_prepare_simulate(struct kprobe *p)
 {
if (p->ainsn.prepare)
-   p->ainsn.prepare(p, &p->ainsn);
+   p->ainsn.prepare(p->opcode, &p->ainsn);
 
/* This instructions is not executed xol. No need to adjust the PC */
p->ainsn.restore.addr = 0;
@@ -271,7 +271,8 @@ static int __kprobes reenter_kprobe(struct kprobe *p,
switch (kcb->kprobe_status) {
case KPROBE_HIT_SSDONE:
case KPROBE_HIT_ACTIVE:
-   if (!p->ainsn.check_condn || p->ainsn.check_condn(p, regs)) {
+   if (!p->ainsn.check_condn ||
+   p->ainsn.check_condn((u32)p->opcode, &p->ainsn, regs)) {
kprobes_inc_nmissed_count(p);
setup_singlestep(p, regs, kcb, 1);
} else  {
@@ -402,7 +403,7 @@ void __kprobes kprobe_handler(struct 

[RFC PATCH V2 02/10] arm64: fix kgdb_step_brk_fn to ignore other's exception

2015-06-17 Thread Pratyush Anand
The ARM64 step exception does not carry any syndrome information. So, it
is the responsibility of each exception handler to handle the exception
only if it was raised for that handler.
After kprobe support, both kprobe and kgdb use the register_step_hook
mechanism to register their step handlers. So, if call_step_hook called
the kgdb handler first, it always returned 0, and in that case an
exception raised for kprobe would never be handled.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/kgdb.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a0d10c55f307..9469465a5e03 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -229,6 +229,9 @@ static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int esr)
 
 static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
 {
+   if (!kgdb_single_step)
+   return DBG_HOOK_ERROR;
+
kgdb_handle_exception(1, SIGTRAP, 0, regs);
return 0;
 }
-- 
2.1.0
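
The hook-chain problem described in the commit message can be sketched in user space as follows. Everything here is a hypothetical mock: the MOCK_HOOK_* values, call_hooks() and the two handlers only imitate the idea of "return an error unless the exception is yours" and are not the kernel's call_step_hook implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes standing in for DBG_HOOK_HANDLED/ERROR. */
#define MOCK_HOOK_HANDLED 0
#define MOCK_HOOK_ERROR   1

typedef int (*step_fn)(void);

static bool kgdb_stepping;	/* stands in for kgdb_single_step */
static bool kprobe_stepping;	/* hypothetical: a kprobe step is pending */
static int kgdb_hits, kprobe_hits;

/* Patched behaviour: decline exceptions that kgdb did not ask for. */
static int mock_kgdb_step(void)
{
	if (!kgdb_stepping)
		return MOCK_HOOK_ERROR;
	kgdb_hits++;
	return MOCK_HOOK_HANDLED;
}

static int mock_kprobe_step(void)
{
	if (!kprobe_stepping)
		return MOCK_HOOK_ERROR;
	kprobe_hits++;
	return MOCK_HOOK_HANDLED;
}

/* Try each registered hook in order; stop at the first one that handles it. */
static int call_hooks(const step_fn *hooks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (hooks[i]() == MOCK_HOOK_HANDLED)
			return MOCK_HOOK_HANDLED;
	return MOCK_HOOK_ERROR;
}
```

If mock_kgdb_step() returned MOCK_HOOK_HANDLED unconditionally, as the old handler effectively did, the loop would stop there and the kprobe handler registered after it would never run.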



[RFC PATCH V2 00/10] ARM64: Uprobe support added

2015-06-17 Thread Pratyush Anand
These patches have been prepared on top of the ARM64 kprobe v7 patches [1].
Keeping them as RFC, because kprobe-v7 still needs to be ACKed.

The following have been unit tested so far and found to be working:
1. Step-able instructions, like sub, ldr, add etc.
2. Simulation-able instructions, like ret.
3. uretprobe
4. Reject-able instructions, like sev, wfe etc.
5. Trapped and aborted xol paths
6. Probe at an unaligned user address.

Currently, probing of aarch32 instructions is not supported.

RFC PATCH V1 is here [2].

Changes since V1:
===
* Most parts of V1-(1-2) have been merged into the kprobe patches.
* V1 Patch-3 has been removed.
* Other patches have also been re-arranged.
* Patch-1 in this series does changes to make 'prepare' and 'handler'
function independent of 'struct kprobe', so that they can be reused for uprobe.
* Patch-2 fixes kgdb_step_brk_fn to ignore exceptions raised for others.
* Patches 3-8 are preparations for the uprobe patch to work.
* Patches 9-10 are the actual work for uprobe support.

Other significant changes

* Now relying on uprobe_task->vaddr, and so removed saved_user_pc and
ss_ctx from struct arch_uprobe_task.
* irqs disabling around uprobe_pre/post_sstep_notifier removed.
* Now returning DBG_HOOK_HANDLED from breakpoint and step handler only
on success.
* Removed step_ctx logic.
* A comment for not supporting compat.
* unaligned address check in arch_uprobe_analyze_insn
* includes asm-generic/ptrace.h in asm/ptrace.h
* rename enum debug_el to enum debug_elx 

[1] http://marc.info/?l=linux-arm-kernel&m=143439540523827&w=2
[2] http://marc.info/?l=linux-arm-kernel&m=142003951103185&w=2

Pratyush Anand (9):
  arm64: kprobe: Make prepare and handler function independent of
'struct kprobe'
  arm64: fix kgdb_step_brk_fn to ignore other's exception
  arm64: include asm-generic/ptrace.h in asm/ptrace.h
  arm64: Add helper for link pointer
  arm64: Re-factor flush_ptrace_access
  arm64: Handle TRAP_HWBRKPT for user mode as well
  arm64: Handle TRAP_BRKPT for user mode as well
  arm64: rename enum debug_el to enum debug_elx to fix "wrong kind of
tag"
  arm64: Add uprobe support

Steve Capper (1):
  arm64: uprobes: check conditions before simulating instructions

 arch/arm64/Kconfig   |   3 +
 arch/arm64/include/asm/debug-monitors.h  |   9 +-
 arch/arm64/include/asm/probes.h  |   6 +-
 arch/arm64/include/asm/ptrace.h  |  39 +-
 arch/arm64/include/asm/thread_info.h |   5 +-
 arch/arm64/include/asm/uprobes.h |  37 ++
 arch/arm64/kernel/Makefile   |   3 +
 arch/arm64/kernel/debug-monitors.c   |  59 +
 arch/arm64/kernel/hw_breakpoint.c|   6 +-
 arch/arm64/kernel/kgdb.c |   3 +
 arch/arm64/kernel/kprobes-arm64.c|  33 ++---
 arch/arm64/kernel/kprobes.c  |  20 +--
 arch/arm64/kernel/probes-simulate-insn.c |  16 +--
 arch/arm64/kernel/signal.c   |   4 +-
 arch/arm64/kernel/uprobes.c  | 220 +++
 arch/arm64/mm/flush.c|  30 +++--
 16 files changed, 401 insertions(+), 92 deletions(-)
 create mode 100644 arch/arm64/include/asm/uprobes.h
 create mode 100644 arch/arm64/kernel/uprobes.c

-- 
2.1.0



Re: [PATCH v3] watchdog: bcm2835: Fix poweroff behaviour

2015-06-17 Thread Guenter Roeck

On 06/17/2015 07:04 AM, Noralf Trønnes wrote:

Currently poweroff/halt results in a reboot on the Raspberry Pi.
The firmware uses the RSTS register to know which partition to
boot from. The partition value is spread into bits
0, 2, 4, 6, 8, 10. Partition 63 is a special partition used by
the firmware to indicate halt.

The firmware made this change on 19 Aug 2013 and was matched
by the downstream commit:
Changes for new NOOBS multi partition booting from gsh

Signed-off-by: Noralf Trønnes 


Reviewed-by: Guenter Roeck 
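
As an illustration of the bit layout described in the commit message, here is a small sketch that spreads a 6-bit partition number into bits 0, 2, 4, 6, 8 and 10. The helper name rsts_spread_partition() and the HALT_PARTITION macro are hypothetical, and the sketch assumes partition bit i lands in register bit 2*i; the message only says the value is spread over those bits. For partition 63 the ordering does not matter, since all six bits are set.

```c
#include <stdint.h>

/* Hypothetical name for the special "halt" partition from the message. */
#define HALT_PARTITION 63u

/*
 * Spread a 6-bit partition number into bits 0, 2, 4, 6, 8 and 10.
 * Assumption: partition bit i maps to register bit 2*i.
 */
static uint32_t rsts_spread_partition(uint32_t partition)
{
	uint32_t val = 0;

	for (unsigned int i = 0; i < 6; i++)
		if (partition & (1u << i))
			val |= 1u << (2 * i);
	return val;
}
```

Under that assumption, rsts_spread_partition(HALT_PARTITION) yields 0x555, i.e. every even bit from 0 to 10 set.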



Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Sergey Senozhatsky
On (06/18/15 12:39), Minchan Kim wrote:
[..]
> > ah, I see.
> > it doesn't hold the lock `until all the pages are done`. it holds it
> > as long as zs_can_compact() returns > 0. hm, I'm not entirely sure that
> > this patch set has increased the locking time (in average).
> 
> I see your point. Sorry for the confusion.
> My point is not average but max time. I bet your patch will increase
> it and it will affect others who want to allocate zspage in parallel on
> another CPU.

makes sense.

[..]
> > > Yes, it's not easy and I believe a few artificial testing are not enough
> > > to prove no regression but we don't have any choice.
> > > Actually, I think this patchset does make sense. Although it might have
> > > a problem on situation heavy memory pressure by lacking of fragment space,
> > 
> > 
> > I tested exactly this scenario yesterday (and sent an email). We leave
> > `no holes' in classes only in ~1.35% of cases. so, no, this argument is
> > not valid. we preserve fragmentation.
> 
> Thanks, Sergey.
> 
> I want to test by myself to simulate worst case scenario to make to use up
> reserved memory by zram. For it, please fix below first and resubmit, please.
> 
> 1. Don't hold the lock until class compaction is done.
>It could prevent another allocation on another CPU.
>I want to make worst case scenario and it needs it.
> 
> 2. No touch ZS_ALMOST_FULL waterline. It can put more zspages
>in ZS_ALMOST_FULL list so it couldn't be selected by migration
>source.
> 
> With new patchset, I want to watch min(free_pages of the system),
> zram.max_used_pages, testing time and so on.
> 
> Really sorry for bothering you, Sergey but I think it's important
> feature on zram so I want to be careful because risk management is
> my role.

ok. will take a day or two to gather new numbers.

-ss


RE: [PATCH v3 1/6] hv: Modify vmbus to search for all MMIO ranges available

2015-06-17 Thread KY Srinivasan


> -Original Message-
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Wednesday, June 17, 2015 12:11 PM
> To: Jake Oshins
> Cc: Vitaly Kuznetsov; KY Srinivasan; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; Haiyang
> Zhang; Mike Ebersol
> Subject: Re: [PATCH v3 1/6] hv: Modify vmbus to search for all MMIO ranges
> available
> 
> On Wed, Jun 17, 2015 at 05:41:58PM +, Jake Oshins wrote:
> > > -Original Message-
> > > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> > > Sent: Wednesday, June 17, 2015 10:28 AM
> > > To: Jake Oshins
> > > Cc: gre...@linuxfoundation.org; KY Srinivasan; linux-
> > > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> > > a...@canonical.com; Haiyang Zhang; Mike Ebersol
> > > Subject: Re: [PATCH v3 1/6] hv: Modify vmbus to search for all MMIO
> ranges
> > > available
> > > >  }
> > > >
> > > > @@ -1047,6 +1121,7 @@ static struct acpi_driver vmbus_acpi_driver = {
> > > > .ids = vmbus_acpi_device_ids,
> > > > .ops = {
> > > > .add = vmbus_acpi_add,
> > > > +   .remove = vmbus_acpi_remove,
> > >
> > > This will probably need rebasing on top of current char-misc-next tree
> > > as we already have commit e4ecb41c: "Drivers: hv: vmbus: introduce
> > > vmbus_acpi_remove" there.
> > >
> >
> > Thanks.  Please educate me since I'm new around here.  What should I
> > do in response to this message?  Wait for this tree to be pulled into
> > the mainline and resend after rebasing?  Something more proactive?
> 
> It's KY's job to do this type of thing, he should handle this for you :)

I have been on vacation for the last 10 days; currently waiting in the Dubai
airport on the way back. I will work with Jake soon on this.

K. Y
> 
> greg k-h


Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

2015-06-17 Thread Josef Bacik

On 06/17/2015 05:55 PM, Mike Galbraith wrote:

On Wed, 2015-06-17 at 11:06 -0700, Josef Bacik wrote:

On 06/11/2015 10:35 PM, Mike Galbraith wrote:

On Thu, 2015-05-28 at 13:05 +0200, Peter Zijlstra wrote:



If sd == NULL, we fall through and try to pull wakee despite nacked-by
tsk_cpus_allowed() or wake_affine().



So maybe add a check in the if (sd_flag & SD_BALANCE_WAKE) for something
like this

	if (tmp >= 0) {
		new_cpu = tmp;
		goto unlock;
	} else if (!want_affine) {
		new_cpu = prev_cpu;
	}

so we can make sure we're not being pushed onto a cpu that we aren't
allowed on?  Thanks,


The buglet is a messenger methinks.  You saying the patch helped without
SD_BALANCE_WAKE being set is why I looked.  The buglet would seem to say
that preferring cache is not harming your load after all.  It now sounds
as though wake_wide() may be what you're squabbling with.

Things aren't adding up all that well.


Yeah I'm horribly confused.  The other thing is I had to switch clusters 
(I know, I know, I'm changing the parameters of the test).  So these new 
boxes are haswell boxes, but basically the same otherwise, 2 socket 12 
core with HT, just newer/faster CPUs.  I'll re-run everything again and 
give the numbers so we're all on the same page again, but as it stands 
now I think we have this


3.10 with wake_idle forward ported - good
4.0 stock - 20% perf drop
4.0 w/ Peter's patch - good
4.0 w/ Peter's patch + SD_BALANCE_WAKE - 5% perf drop

I can do all these iterations again to verify, is there any other 
permutation you'd like to see?  Thanks,


Josef



Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Minchan Kim
On Thu, Jun 18, 2015 at 12:01:36PM +0900, Sergey Senozhatsky wrote:
> On (06/18/15 11:41), Sergey Senozhatsky wrote:
> [..]
> > > My concern is not compaction overhead but the higher memory footprint
> > > consumed by zram in reserved memory.
> > > It might hang system if zram used up reserved memory of system with
> > > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > > to use more memory with uncompressible pages or file-backed pages
> > > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > > 
> 
> a couple of _not really related_ ideas that I want to voice.
> 
> (a) I'm thinking of extending zramX/compact attr. right now it's WO,
>   and I think it makes sense to make it RW:
> ->write will trigger compaction
> ->read will return estimated number of bytes
>   "zs_can_compact() * pages per zspage * page_size" that can be freed.
>   so user-space will have at least minimal idea whether compaction is
>   reasonable. but sure, this is racy and in general case things may
>   change between `cat compact` and `echo 1 > compact`.

It's a good idea. With that, the memory manager on the platform could be smart.

if memory pressure == soft and zram.can_compact > 20M
	do zram.compact
if memory pressure == hard and zram.can_compact > 5M
	do zram.compact

With this, userspace has more flexibility. :)
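
A rough sketch of such a userspace policy, combining the estimate quoted above ("zs_can_compact() * pages per zspage * page_size") with the two thresholds. All names here (enum pressure, estimate_freeable_bytes(), should_compact()) are hypothetical, "20M" and "5M" are read as MiB, and how userspace would actually obtain the pressure level or the can_compact value is left open, since the RW compact attribute is only a proposal at this point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pressure levels reported by some platform memory manager. */
enum pressure { PRESSURE_NONE, PRESSURE_SOFT, PRESSURE_HARD };

#define MiB (1024ULL * 1024ULL)

/* Estimated bytes compaction could free, per the formula in the thread. */
static uint64_t estimate_freeable_bytes(uint64_t can_compact,
					uint64_t pages_per_zspage,
					uint64_t page_size)
{
	return can_compact * pages_per_zspage * page_size;
}

/* Compact sooner (at a lower threshold) the harder the pressure is. */
static bool should_compact(enum pressure p, uint64_t freeable)
{
	switch (p) {
	case PRESSURE_SOFT:
		return freeable > 20 * MiB;
	case PRESSURE_HARD:
		return freeable > 5 * MiB;
	default:
		return false;
	}
}
```

As noted in the quoted text, the estimate is racy: the value can change between reading it and triggering compaction, so a policy like this can only be a heuristic.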

However, FYI, I want to make auto-compact default in future
so let's see how auto-compact is going.

> 
> 
> (b) adding a knob (yeah, like we don't have enough knobs already :-))
> that will allow 'enable/disable auto compaction'.

I agree.

> 
>   -ss


Crypto Fixes for 4.1

2015-06-17 Thread Herbert Xu
Hi Linus:

This push fixes the following issues:

1) Crash in caam hash due to uninitialised buffer lengths.
2) Alignment issue in caam RNG that may lead to non-random output.

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git


Steve Cornelius (2):
  crypto: caam - improve initalization for context state saves
  crypto: caam - fix RNG buffer cache alignment

 drivers/crypto/caam/caamhash.c |2 ++
 drivers/crypto/caam/caamrng.c  |2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Minchan Kim
On Thu, Jun 18, 2015 at 11:41:07AM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> On (06/18/15 10:50), Minchan Kim wrote:
> [..]
> > > hm, what's the difference with the existing implementation?
> > > The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> > > It holds the class lock less time than current compaction.
> > 
> > At old, it unlocks periodically(ie, per-zspage migration) so other who
> > want to allocate a zspage in the class can have a chance but your patch
> > increases lock holding time until all of zspages in the class is done
> > so other will be blocked until all of zspage migration in the class is
> > done.
> 
> ah, I see.
> it doesn't hold the lock `until all the pages are done`. it holds it
> as long as zs_can_compact() returns > 0. hm, I'm not entirely sure that
> this patch set has increased the locking time (in average).

I see your point. Sorry for the confusion.
My point is not average but max time. I bet your patch will increase
it and it will affect others who want to allocate zspage in parallel on
another CPU.

> 
> 
> > > 
> > > > I will review remain parts tomorrow(I hope) but what I want to say
> > > > before going sleep is:
> > > > 
> > > > I like the idea but still have a concern to lack of fragmented zspages
> > > > during memory pressure because auto-compaction will prevent fragment
> > > > most of time. Surely, using fragment space as buffer in heavy memory
> > > > pressure is not intended design so it could be fragile but I'm afraid
> > > > this feature might accelerate it and it ends up having a problem and
> > > > change current behavior in zram as swap.
> > > 
> > > Well, it's nearly impossible to prove anything with the numbers obtained
> > > during some particular case. I agree that fragmentation can be both
> > > 'good' (depending on IO pattern) and 'bad'.
> > 
> > Yes, it's not easy and I believe a few artificial testing are not enough
> > to prove no regression but we don't have any choice.
> > Actually, I think this patchset does make sense. Although it might have
> > a problem on situation heavy memory pressure by lacking of fragment space,
> 
> 
> I tested exactly this scenario yesterday (and sent an email). We leave
> `no holes' in classes only in ~1.35% of cases. so, no, this argument is
> not valid. we preserve fragmentation.

Thanks, Sergey.

I want to test by myself to simulate worst case scenario to make to use up
reserved memory by zram. For it, please fix below first and resubmit, please.

1. Don't hold the lock until class compaction is done.
   It could prevent another allocation on another CPU.
   I want to make worst case scenario and it needs it.

2. No touch ZS_ALMOST_FULL waterline. It can put more zspages
   in ZS_ALMOST_FULL list so it couldn't be selected by migration
   source.

With new patchset, I want to watch min(free_pages of the system),
zram.max_used_pages, testing time and so on.

Really sorry for bothering you, Sergey but I think it's important
feature on zram so I want to be careful because risk management is
my role.

> 
>   -ss
> 
> > I think we should go with this patchset and fix the problem with another way
> > (e,g. memory pooling rather than relying on the luck of fragment).
> > But I need something to take the risk. That's why I ask the number
> > although it's not complete. It can cover a case at least, it is better than
> > none. :)
> > 
> > > 
> > > 
> > > Auto-compaction of IDLE zram devices certainly makes sense, when system
> > > is getting low on memory. zram devices are not always 'busy', serving
> > > heavy IO. There may be N idle zram devices simply sitting and wasting
> > > memory; or being 'moderately' busy; so compaction will not cause any
> > > significant slow down there.
> > > 
> > > Auto-compaction of BUSY zram devices is less `desired', of course;
> > > but not entirely terrible I think (zs_can_compact() can help here a
> > > lot).
> > 
> > My concern is not compaction overhead but the higher memory footprint
> > consumed by zram in reserved memory.
> > It might hang system if zram used up reserved memory of system with
> > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > to use more memory with uncompressible pages or file-backed pages
> > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > 


[PATCH 4/6] ARM: Mediatek: enable GPT6 on boot up to make arch timer working for MT6580

2015-06-17 Thread Scott Shu
We enable GPT6, which ungates the arch timer clock.
---
 arch/arm/mach-mediatek/mediatek.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-mediatek/mediatek.c b/arch/arm/mach-mediatek/mediatek.c
index 6b38d67..469d332 100644
--- a/arch/arm/mach-mediatek/mediatek.c
+++ b/arch/arm/mach-mediatek/mediatek.c
@@ -28,7 +28,8 @@ static void __init mediatek_timer_init(void)
 {
void __iomem *gpt_base = 0;
 
-   if (of_machine_is_compatible("mediatek,mt6589") ||
+   if (of_machine_is_compatible("mediatek,mt6580") ||
+   of_machine_is_compatible("mediatek,mt6589") ||
of_machine_is_compatible("mediatek,mt8135") ||
of_machine_is_compatible("mediatek,mt8127")) {
/* turn on GPT6 which ungates arch timer clocks */
@@ -46,6 +47,7 @@ static void __init mediatek_timer_init(void)
 };
 
 static const char * const mediatek_board_dt_compat[] = {
+   "mediatek,mt6580",
"mediatek,mt6589",
"mediatek,mt6592",
"mediatek,mt8127",
-- 
1.8.1.1.dirty



[PATCH 2/6] soc: Mediatek: Add SCPSYS CPU power domain driver

2015-06-17 Thread Scott Shu
This adds a CPU power domain driver for the Mediatek SCPSYS unit on
MT6580.
---
 arch/arm/mach-mediatek/Makefile  |   2 +-
 arch/arm/mach-mediatek/generic.h |  24 +
 arch/arm/mach-mediatek/hotplug.c | 228 +++
 3 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/mach-mediatek/generic.h
 create mode 100644 arch/arm/mach-mediatek/hotplug.c

diff --git a/arch/arm/mach-mediatek/Makefile b/arch/arm/mach-mediatek/Makefile
index 2116460..b2e4ef5 100644
--- a/arch/arm/mach-mediatek/Makefile
+++ b/arch/arm/mach-mediatek/Makefile
@@ -1,4 +1,4 @@
 ifeq ($(CONFIG_SMP),y)
-obj-$(CONFIG_ARCH_MEDIATEK) += platsmp.o
+obj-$(CONFIG_ARCH_MEDIATEK) += platsmp.o hotplug.o
 endif
 obj-$(CONFIG_ARCH_MEDIATEK) += mediatek.o
diff --git a/arch/arm/mach-mediatek/generic.h b/arch/arm/mach-mediatek/generic.h
new file mode 100644
index 000..2a0d0c8
--- /dev/null
+++ b/arch/arm/mach-mediatek/generic.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2015 Mediatek Inc.
+ * Author: Scott Shu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#ifndef __MACH_MTK_COMMON__
+#define __MACH_MTK_COMMON__
+
+#include 
+
+int spm_cpu_mtcmos_init(void);
+int spm_cpu_mtcmos_on(int cpu);
+int spm_cpu_mtcmos_off(int cpu, bool wfi);
+
+#endif
diff --git a/arch/arm/mach-mediatek/hotplug.c b/arch/arm/mach-mediatek/hotplug.c
new file mode 100644
index 000..be0305d
--- /dev/null
+++ b/arch/arm/mach-mediatek/hotplug.c
@@ -0,0 +1,228 @@
+/*
+ * Copyright (c) 2015 Mediatek Inc.
+ * Author: Scott Shu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* SCPSYS registers */
+#define SPM_POWERON_CONFIG_SET 0x
+
+#define SPM_CA7_CPU0_PWR_CON   0x0200
+#define SPM_CA7_CPU1_PWR_CON   0x0218
+#define SPM_CA7_CPU2_PWR_CON   0x021c
+#define SPM_CA7_CPU3_PWR_CON   0x0220
+
+#define SPM_CA7_CPU0_L1_PDN    0x025c
+#define SPM_CA7_CPU1_L1_PDN    0x0264
+#define SPM_CA7_CPU2_L1_PDN    0x026c
+#define SPM_CA7_CPU3_L1_PDN    0x0274
+
+#define SPM_PWR_STATUS         0x060c
+#define SPM_PWR_STATUS_2ND     0x0610
+#define SPM_SLEEP_TIMER_STA    0x0720
+
+/*
+ * bit definition in SPM_CA7_CPUx_PWR_CON
+ */
+#define SRAM_ISOINT_B  BIT(6)
+#define SRAM_CKISO     BIT(5)
+#define PWR_CLK_DIS    BIT(4)
+#define PWR_ON_2ND     BIT(3)
+#define PWR_ON         BIT(2)
+#define PWR_ISO        BIT(1)
+#define PWR_RST_B      BIT(0)
+
+/*
+ * bit definition in SPM_CA7_CPUx_L1_PDN
+ */
+#define L1_PDN_ACK BIT(8)
+#define L1_PDN BIT(0)
+
+void __iomem *spm_cpu_base;
+
+u32 spm_cpu_pwr_con[4] = {
+   SPM_CA7_CPU0_PWR_CON,
+   SPM_CA7_CPU1_PWR_CON,
+   SPM_CA7_CPU2_PWR_CON,
+   SPM_CA7_CPU3_PWR_CON,
+};
+
+u32 spm_cpu_l1_pdn[4] = {
+   SPM_CA7_CPU0_L1_PDN,
+   SPM_CA7_CPU1_L1_PDN,
+   SPM_CA7_CPU2_L1_PDN,
+   SPM_CA7_CPU3_L1_PDN,
+};
+
+#define SPM_REGWR_EN   (1U << 0)
+#define SPM_PROJECT_CODE   0x0B16
+
+int spm_cpu_mtcmos_on(int cpu)
+{
+   static DEFINE_SPINLOCK(spm_cpu_lock);
+   unsigned long flags;
+   static u32 spmcpu_pwr_con, spmcpu_l1_pdn;
+   unsigned int temp;
+
+   temp = (SPM_PROJECT_CODE << 16) | SPM_REGWR_EN;
+   writel_relaxed(temp, spm_cpu_base + SPM_POWERON_CONFIG_SET);
+
+   spmcpu_pwr_con = spm_cpu_pwr_con[cpu];
+   spmcpu_l1_pdn = spm_cpu_l1_pdn[cpu];
+
+   spin_lock_irqsave(&spm_cpu_lock, flags);
+
+   temp = readl_relaxed(spm_cpu_base + spmcpu_pwr_con);
+   temp |= PWR_ON;
+   writel_relaxed(temp, spm_cpu_base + spmcpu_pwr_con);
+
+   udelay(1);
+
+   temp = readl_relaxed(spm_cpu_base + spmcpu_pwr_con);
+   temp |= PWR_ON_2ND;
+   writel_relaxed(temp, spm_cpu_base + spmcpu_pwr_con);
+
+   while (((readl_relaxed(spm_cpu_base + SPM_PWR_STATUS) &
+   (1U << (13 - cpu))) != (1U << (13 - cpu))) ||
+   ((readl_relaxed(spm_cpu_base + SPM_PWR_STATUS_2ND) &
+   (1U << (13 - cpu))) != (1U << (13 - 

[PATCH 6/6] ARM: dts: mt6580: enable basic SMP bringup for mt6580

2015-06-17 Thread Scott Shu
Add an arch timer node to enable arch-timer support. The MT6580 firmware
doesn't correctly set up the arch-timer frequency and CNTVOFF, so add
properties to work around this.

Also set the cpu enable-method to enable SMP.
---
 arch/arm/boot/dts/mt6580.dtsi | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm/boot/dts/mt6580.dtsi b/arch/arm/boot/dts/mt6580.dtsi
index a974830..a7071b38 100644
--- a/arch/arm/boot/dts/mt6580.dtsi
+++ b/arch/arm/boot/dts/mt6580.dtsi
@@ -23,26 +23,31 @@
cpus {
#address-cells = <1>;
#size-cells = <0>;
+   enable-method = "mediatek,mt6580-smp";
 
cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a7";
reg = <0x0>;
+   clock-frequency = <17>;
};
cpu@1 {
device_type = "cpu";
compatible = "arm,cortex-a7";
reg = <0x1>;
+   clock-frequency = <17>;
};
cpu@2 {
device_type = "cpu";
compatible = "arm,cortex-a7";
reg = <0x2>;
+   clock-frequency = <17>;
};
cpu@3 {
device_type = "cpu";
compatible = "arm,cortex-a7";
reg = <0x3>;
+   clock-frequency = <17>;
};
 
};
@@ -72,6 +77,21 @@
};
};
 
+   timer {
+   compatible = "arm,armv7-timer";
+   interrupt-parent = <>;
+   interrupts = ,
+,
+,
+;
+   clock-frequency = <1300>;
+   arm,cpu-registers-not-fw-configured;
+   };
+
soc {
#address-cells = <1>;
#size-cells = <1>;
-- 
1.8.1.1.dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/6] Document: bindings: DT: Add SMP enable method for MT6580 SoC platform

2015-06-17 Thread Scott Shu
On the MT6580 SoC platform, the secondary cores are powered off by
default, so compared with the MT65xx series SoCs, a new enable method
is needed. This method uses the SPM (System Power Manager) inside
the SCPSYS to control the CPU power.
---
 Documentation/devicetree/bindings/arm/cpus.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index ac2903d..fb80b2e 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -194,6 +194,7 @@ nodes to be present and contain the properties described below.
"marvell,armada-380-smp"
"marvell,armada-390-smp"
"marvell,armada-xp-smp"
+   "mediatek,mt6580-smp"
"mediatek,mt65xx-smp"
"mediatek,mt81xx-tz-smp"
"qcom,gcc-msm8660"
-- 
1.8.1.1.dirty



[PATCH 5/6] ARM: dts: mt6580: Add device nodes to the MT6580 dtsi file

2015-06-17 Thread Scott Shu
This adds the SCPSYS device node to the MT6580 dtsi file.
---
 arch/arm/boot/dts/mt6580.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/mt6580.dtsi b/arch/arm/boot/dts/mt6580.dtsi
index ae3cdb6..a974830 100644
--- a/arch/arm/boot/dts/mt6580.dtsi
+++ b/arch/arm/boot/dts/mt6580.dtsi
@@ -78,6 +78,11 @@
compatible = "simple-bus";
ranges;
 
+   scpsys: scpsys@10006000 {
+   compatible = "mediatek,mt6580-scpsys";
+   reg = <0x10006000 0x1000>;
+   };
+
timer: timer@10008000 {
compatible = "mediatek,mt6580-timer", 
"mediatek,mt6577-timer";
reg = <0x10008000 0x80>;
-- 
1.8.1.1.dirty



[PATCH 3/6] ARM: mediatek: add smp bringup code for MT6580

2015-06-17 Thread Scott Shu
Add support for cpu enable-method "mediatek,mt6580-smp" for booting
secondary CPUs on MT6580.
---
 arch/arm/mach-mediatek/platsmp.c | 107 +++
 1 file changed, 107 insertions(+)

diff --git a/arch/arm/mach-mediatek/platsmp.c b/arch/arm/mach-mediatek/platsmp.c
index 12fefb3..2985913 100644
--- a/arch/arm/mach-mediatek/platsmp.c
+++ b/arch/arm/mach-mediatek/platsmp.c
@@ -21,10 +21,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include "generic.h"
 
 #define MTK_MAX_CPU        8
 #define MTK_SMP_REG_SIZE   0x1000
 
+static DEFINE_SPINLOCK(boot_lock);
+
 struct mtk_smp_boot_info {
unsigned long smp_base;
unsigned int jump_reg;
@@ -57,6 +62,101 @@ static const struct of_device_id mtk_smp_boot_infos[] __initconst = {
 static void __iomem *mtk_smp_base;
 static const struct mtk_smp_boot_info *mtk_smp_info;
 
+static void __cpuinit write_pen_release(int val)
+{
+   pen_release = val;
+   /* Make sure this is visible to other CPUs */
+   smp_wmb();
+   sync_cache_w(&pen_release);
+}
+
+static int mt6580_boot_secondary(unsigned int cpu, struct task_struct *idle)
+{
+   unsigned long timeout;
+
+   /*
+* Set synchronisation state between this boot processor
+* and the secondary one
+*/
+   spin_lock(&boot_lock);
+
+   /*
+* The secondary processor is waiting to be released from
+* the holding pen - release it, then wait for it to flag
+* that it has been released by resetting pen_release.
+*
+* Note that "pen_release" is the hardware CPU ID, whereas
+* "cpu" is Linux's internal ID.
+*/
+   write_pen_release(cpu);
+
+   /*
+* CPU power on control by SPM
+*/
+   spm_cpu_mtcmos_on(cpu);
+
+   timeout = jiffies + (1 * HZ);
+   while (time_before(jiffies, timeout)) {
+   /* Read barrier */
+   smp_rmb();
+
+   if (pen_release == -1)
+   break;
+
+   usleep_range(10, 1000);
+   }
+
+   /*
+* Now the secondary core is starting up let it run its
+* calibrations, then wait for it to finish
+*/
+   spin_unlock(&boot_lock);
+
+   return (pen_release != -1 ? -EINVAL : 0);
+}
+
+static void mt6580_secondary_init(unsigned int cpu)
+{
+   /*
+* Let the primary processor know we're out of the
+* pen, then head off into the C entry point
+*/
+   write_pen_release(-1);
+
+   /*
+* Synchronise with the boot thread.
+*/
+   spin_lock(&boot_lock);
+   spin_unlock(&boot_lock);
+}
+
+#define MT6580_INFRACFG_AO 0x10001000
+#define SW_ROM_PD  BIT(31)
+
+static void __init mt6580_smp_prepare_cpus(unsigned int max_cpus)
+{
+   static void __iomem *infracfg_ao_base;
+
+   infracfg_ao_base = ioremap(MT6580_INFRACFG_AO, 0x1000);
+
+   if (!infracfg_ao_base)
+   pr_err("%s: Unable to map I/O memory\n", __func__);
+
+   /* Enable bootrom power down mode */
+   writel_relaxed(readl(infracfg_ao_base + 0x804) | SW_ROM_PD,
+  infracfg_ao_base + 0x804);
+
+   /* Write the address of slave startup into boot address
+  register for bootrom power down mode */
+   writel_relaxed(virt_to_phys(secondary_startup_arm),
+  infracfg_ao_base + 0x800);
+
+   iounmap(infracfg_ao_base);
+
+   /* Initial spm cpu mtcmos memory map */
+   spm_cpu_mtcmos_init();
+}
+
 static int mtk_boot_secondary(unsigned int cpu, struct task_struct *idle)
 {
if (!mtk_smp_base)
@@ -143,3 +243,10 @@ static struct smp_operations mt65xx_smp_ops __initdata = {
.smp_boot_secondary = mtk_boot_secondary,
 };
 CPU_METHOD_OF_DECLARE(mt65xx_smp, "mediatek,mt65xx-smp", &mt65xx_smp_ops);
+
+static struct smp_operations mt6580_smp_ops __initdata = {
+   .smp_prepare_cpus = mt6580_smp_prepare_cpus,
+   .smp_secondary_init = mt6580_secondary_init,
+   .smp_boot_secondary = mt6580_boot_secondary,
+};
+CPU_METHOD_OF_DECLARE(mt6580_smp, "mediatek,mt6580-smp", &mt6580_smp_ops);
-- 
1.8.1.1.dirty
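
For readers following the handshake in mt6580_boot_secondary() and
mt6580_secondary_init() above, here is a minimal single-threaded user-space
model of it. It is only a sketch and not part of the patch: the function
names and the step() hook are hypothetical, the poll count stands in for the
jiffies timeout, and C11 atomics stand in for the kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* -1 means "no CPU is waiting to be released", as in the patch. */
static atomic_int pen_release = -1;

/* Secondary side: report that this CPU is up (models write_pen_release(-1)). */
void secondary_init_model(void)
{
	atomic_store(&pen_release, -1);
}

/* Example hook: the secondary core "comes up" during the third poll. */
void start_on_third_poll(int poll)
{
	if (poll == 2)
		secondary_init_model();
}

/*
 * Primary side: publish the CPU id, then poll at most max_polls times.
 * step() is a hypothetical hook standing in for time passing while the
 * secondary core powers up; pass NULL to model a core that never starts.
 * Returns 0 if the secondary checked in, -1 on timeout.
 */
int boot_secondary_model(int cpu, int max_polls, void (*step)(int poll))
{
	int i;

	assert(cpu >= 0 && max_polls >= 0);
	atomic_store(&pen_release, cpu);

	for (i = 0; i < max_polls; i++) {
		if (atomic_load(&pen_release) == -1)
			return 0;
		if (step)
			step(i);
	}

	/* Same final check as the patch: success only if the flag was reset. */
	return atomic_load(&pen_release) == -1 ? 0 : -1;
}
```

The bounded loop is the important part: a core that never powers on ends in
an error return rather than a hang.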



[PATCH 0/6] This series adds SMP support for the MediaTek MT6580.

2015-06-17 Thread Scott Shu
This patchset adds SMP support for the MediaTek MT6580 quad-core Cortex-A7 SoC.

This is based on v4.1-rc1 and the following patch series:
(1) Yingjoe Chen's "Add SMP bringup support for mt65xx socs" [1]
(2) Mars Cheng's "Add mt6580 basic chip support" [2]
(3) Sascha Hauer's "Mediatek SCPSYS power domain support" [3]

The secondary cores are powered off by default on MT6580, so this series adds
a new enable-method to power on the cores during the boot process.

The System Power Manager (SPM) inside the SCPSYS controls the CPU MTCMOS
power domains. Please check [3] for more information about SCPSYS.

[1] https://lkml.org/lkml/2015/5/16/33
[2] https://lkml.org/lkml/2015/6/3/113
[3] https://lkml.org/lkml/2015/6/9/172


Scott Shu (6):
  Document: bindings: DT: Add SMP enable method for MT6580 SoC platform
  soc: Mediatek: Add SCPSYS CPU power domain driver
  ARM: mediatek: add smp bringup code
  ARM: Mediatek: enable GPT6 on boot up to make arch timer working for MT6580
  ARM: dts: mt6580: Add device nodes to the MT6580 dtsi file.
  ARM: dts: mt6580: enable basic SMP bringup for MT6580

 Documentation/devicetree/bindings/arm/cpus.txt |   1 +
 arch/arm/boot/dts/mt6580.dtsi  |  25 +++
 arch/arm/mach-mediatek/Makefile|   2 +-
 arch/arm/mach-mediatek/generic.h   |  24 +++
 arch/arm/mach-mediatek/hotplug.c   | 229 +
 arch/arm/mach-mediatek/mediatek.c  |   4 +-
 arch/arm/mach-mediatek/platsmp.c   | 113 +++-
 7 files changed, 395 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/mach-mediatek/generic.h
 create mode 100644 arch/arm/mach-mediatek/hotplug.c



Re: [PATCH v2 1/1] perf tools: Check access permission when reading /proc/kcore file.

2015-06-17 Thread Li Zhang

On 2015-06-17 14:09, Sukadev Bhattiprolu wrote:

Li Zhang [zhlci...@linux.vnet.ibm.com] wrote:
| When using command perf report --kallsyms=/proc/kallsyms with a non-root
| user, symbols are resolved. Then select one symbol and annotate it, it
| reports the error as the following:
| Can't annotate __clear_user: No vmlinux file with build id xxx was
| found.
|
| The problem is caused by reading /proc/kcore without access permission.
| It needs to change access permission to allow a specific user to read
| /proc/kcore or use root to execute the perf command.
|
| This patch is to check access permission when reading kcore file.
|
| Signed-off-by: Li Zhang 
| ---
|  v2 -> v1:
| * Report one useful message to users about the access permission,
|   then go back to the tools. Suggested by Arnaldo Carvalho de Melo.
|
|  tools/perf/util/symbol.c | 4 
|  1 file changed, 4 insertions(+)
|
| diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
| index 201f6c4c..1bcd8dc 100644
| --- a/tools/perf/util/symbol.c
| +++ b/tools/perf/util/symbol.c
| @@ -1125,6 +1125,10 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
|   md.type = map->type;
|   INIT_LIST_HEAD(&md.maps);
|
| + if (access(kcore_filename, R_OK))
| + ui__warning("Insufficient permission to access %s.\n",
| + kcore_filename);
| +

Couple of comments.

For consistency with rest of the file, use pr_warning() or pr_err().


ui__warning() reports the message to users directly while the program is
running.

But for the sake of consistency, pr_warning() or pr_err() would be better.
Users can still get this message by trying again.



Also, we could drop the access() call and report the error when open()
fails below?


I think we can drop the access() call. But opening /proc/kcore also
requires the process to have the CAP_SYS_RAWIO capability. Even after a
chown of this file, access() reports success, but open() still fails.
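
To illustrate the point, here is a minimal user-space sketch, not part of the
patch, that runs both calls on the same path and records what each reports.
The names probe_readable() and struct probe_result are hypothetical. On a
file such as /proc/kcore the two results can differ, assuming open() applies
a capability check that access() does not evaluate.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type: errno from each check, 0 meaning success. */
struct probe_result {
	int access_errno;
	int open_errno;
};

/*
 * Run access(R_OK) and open(O_RDONLY) on the same path and record what
 * each one reports, so that a disagreement between them becomes visible.
 */
struct probe_result probe_readable(const char *path)
{
	struct probe_result res = { 0, 0 };
	int fd;

	assert(path != NULL);

	if (access(path, R_OK) != 0)
		res.access_errno = errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		res.open_errno = errno;
	else
		close(fd);

	return res;
}
```

If the two can disagree, reporting the errno of the failing open() covers
both cases, which is an argument for dropping the separate access() check.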




|   fd = open(kcore_filename, O_RDONLY);
|   if (fd < 0)
|   return -EINVAL;

Further, if user specifies the file with --kallsyms and we are not
able to read it, we should treat it as a fatal error and exit - this
would be easier when parsing command line args.

I have another patch which checks these files. I will merge it into this patch.



If user did not specify the option and we are proactively trying to
use /proc/kcore, we should not treat errors as fatal? i.e report
a warning message and continue without symbols?


In the current program, even if open() fails, the program still continues
to run.

Is it helpful for users to get the addresses without symbols?



| --
| 2.1.0


--

Li Zhang
IBM China Linux Technology Centre



Re: [Intel-gfx] [BUG, bisect] Re: drm/i915: WARN_ON(dev_priv->mm.busy)

2015-06-17 Thread Jeremiah Mahler
Jani,

On Mon, Jun 15, 2015 at 02:40:42PM +0300, Jani Nikula wrote:
> On Mon, 15 Jun 2015, Ville Syrjälä  wrote:
> > On Mon, Jun 15, 2015 at 01:25:38AM -0700, Jeremiah Mahler wrote:
> >> Daniel,
> >> 
> >> On Mon, Jun 15, 2015 at 08:57:47AM +0200, Daniel Vetter wrote:
> >> > Can you please retest with
> >> > 
> >> > commit 0aedb1626566efd72b369c01992ee7413c82a0c5
> >> > Author: Ville Syrjälä 
> >> > Date:   Thu May 28 18:32:36 2015 +0300
> >> > 
> >> > drm/i915: Don't skip request retirement if the active list is empty
> >> > 
> >> > Thanks, Daniel
> >> > 
> >> 
> >> The bug is still present with that patch applied.  And it is still
> >> present up to linux-next 20150611.
> >
> > The patch was misapplied, so what's in the tree at the moment isn't what
> > I sent to the list.
> 
> This should be rectified in current drm-intel-nightly branch of
> [1]. Jeremiah, please give that a try.
> 
> BR,
> Jani.
> 
> 
> [1] http://cgit.freedesktop.org/drm-intel
> 
> 
> >
> > -- 
> > Ville Syrjälä
> > Intel OTC
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center

I tested drm-intel-nightly and all the warnings appear to be resolved
in there.  So when these get to -next it should be good.

-- 
- Jeremiah Mahler


Re: Suspicious RCU usage at boot w/ arm ipi trace events?

2015-06-17 Thread Steven Rostedt
On Wed, 17 Jun 2015 18:49:38 -0700
Stephen Boyd  wrote:

> From: Stephen Boyd 
> Subject: [PATCH] ARM: smp: Silence suspicious RCU usage with ipi tracepoints
> 
> John Stultz reports an RCU splat on boot with ARM ipi trace
> events enabled.
> 
> ===
> [ INFO: suspicious RCU usage. ]
> 4.1.0-rc7-00033-gb5bed2f #153 Not tainted
> ---
> include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7-00033-gb5bed2f #153
> Hardware name: Qualcomm (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x70/0xbc)
> [] (dump_stack) from [] (handle_IPI+0x428/0x604)
> [] (handle_IPI) from [] (gic_handle_irq+0x54/0x5c)
> [] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)
> Exception stack(0xc09f3f48 to 0xc09f3f90)
> 3f40:   0001 0001  c09f73b8 c09f4528 c0a5de9c
> 3f60: c076b4f0   c09ef108 c0a5cec1 0001  c09f3f90
> 3f80: c026bf60 c0210ab8 2113 
> [] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c)
> [] (arch_cpu_idle) from [] (cpu_startup_entry+0x2c0/0x5dc)
> [] (cpu_startup_entry) from [] (start_kernel+0x358/0x3c4)
> [] (start_kernel) from [<8020807c>] (0x8020807c)
> 
> At this point in the IPI handling path we haven't called
> irq_enter() yet, so RCU doesn't know that we're about to exit
> idle and properly warns that we're using RCU from an idle CPU.
> Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
> that RCU is informed about our exit from idle.
> 
> Reported-by: John Stultz 

Acked-by: Steven Rostedt 

-- Steve

> Fixes: 365ec7b17327 "ARM: add IPI tracepoints"
> Signed-off-by: Stephen Boyd 
> ---
>  arch/arm/kernel/smp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 13a91d390832..03eb8a446dca 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -589,7 +589,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>   struct pt_regs *old_regs = set_irq_regs(regs);
>  
>   if ((unsigned)ipinr < NR_IPI) {
> - trace_ipi_entry(ipi_types[ipinr]);
> + trace_ipi_entry_rcuidle(ipi_types[ipinr]);
>   __inc_irq_stat(cpu, ipi_irqs[ipinr]);
>   }
>  
> @@ -648,7 +648,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>   }
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_exit(ipi_types[ipinr]);
> + trace_ipi_exit_rcuidle(ipi_types[ipinr]);
>   set_irq_regs(old_regs);
>  }
>  
> 



[lkp] [drm/i915] 2def4ad99be: +182.8% piglit.time.system_time

2015-06-17 Thread Huang Ying
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 ("drm/i915: Optimistically spin for the request completion")


=
tbox_group/testcase/rootfs/kconfig/compiler/cpufreq_governor/group:
  lkp-t410/piglit/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/performance/igt-052

commit: 
  a1b2278e4dfcd2dbea85e319ebf73a6b7b2f180b
  2def4ad99befa25775dd2f714fdd4d92faec6e34

a1b2278e4dfcd2db  2def4ad99befa25775dd2f714f
----------------  --------------------------
     %stddev      %change      %stddev
  6.00 ±  0%      +116.7%   13.00 ±  0%  piglit.time.percent_of_cpu_this_job_got
  6.71 ±  0%      +182.8%   18.96 ±  0%  piglit.time.system_time
  5598 ±  0%        +2.1%    5718 ±  0%  vmstat.system.in
  6.00 ±  0%      +116.7%   13.00 ±  0%  time.percent_of_cpu_this_job_got
  6.71 ±  0%      +182.8%   18.96 ±  0%  time.system_time
  4.49 ±  0%       -26.1%    3.32 ±  0%  time.user_time


lkp-t410: Westmere
Memory: 2G


[ASCII plots omitted: piglit.time.system_time, piglit.time.percent_of_cpu_this_job_got, time.user_time, time.system_time]

[lkp] [jbd2] de92c8caf16: no primary result change, +270.9% vmstat.procs.b, -61.0% vmstat.procs.r

2015-06-17 Thread Huang Ying
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca ("jbd2: speedup jbd2_journal_get_[write|undo]_access()")

It appears that more processes are put in uninterruptible state after the commit.

=
tbox_group/testcase/rootfs/kconfig/compiler/wait_disks_timeout/runtime/disk/md/iosched/fs/nr_threads:
  lkp-st02/dd-write/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/300/5m/11HDD/JBOD/cfq/ext4/10dd

commit: 
  8b00f400eedf91d074f831077003c0d4d9147377
  de92c8caf16ca84926fa31b7a5590c0fb9c0d5ca

8b00f400eedf91d0  de92c8caf16ca84926fa31b7a5
----------------  --------------------------
     %stddev      %change      %stddev
 48241 ±  8%+139.2% 115399 ±  2%  softirqs.SCHED
  1565 ±  5% +12.4%   1759 ±  6%  time.involuntary_context_switches
391202 ±  5%+110.7% 824392 ±  1%  proc-vmstat.pgactivate
 70008 ± 56% -73.1%  18828 ± 50%  proc-vmstat.pgrotated
 21.50 ± 10%+270.9%  79.75 ±  1%  vmstat.procs.b
 86.00 ±  1% -61.0%  33.50 ±  2%  vmstat.procs.r
 23827 ±  0%  -2.7%  23187 ±  0%  vmstat.system.in
  2310 ±  4% -16.9%   1919 ± 10%  slabinfo.ext4_io_end.active_objs
  2310 ±  4% -16.9%   1919 ± 10%  slabinfo.ext4_io_end.num_objs
  2110 ± 18% +18.4%   2499 ±  3%  slabinfo.kmalloc-96.active_objs
  2163 ± 13% +15.5%   2499 ±  3%  slabinfo.kmalloc-96.num_objs
 1.683e+12 ±  1%  -6.1%   1.58e+12 ±  0%  perf-stat.L1-dcache-loads
 6.332e+08 ±  2%  -6.8%  5.904e+08 ±  1%  perf-stat.L1-dcache-prefetches
 9.849e+11 ±  1%  -4.0%  9.457e+11 ±  0%  perf-stat.L1-dcache-stores
 6.932e+10 ±  1%  +6.7%  7.397e+10 ±  0%  perf-stat.L1-icache-load-misses
 5.064e+12 ±  1%  -5.7%  4.777e+12 ±  0%  perf-stat.L1-icache-loads
 1.823e+09 ±  1%  +8.0%  1.969e+09 ±  0%  perf-stat.LLC-load-misses
 8.519e+11 ±  1%  -3.5%  8.222e+11 ±  0%  perf-stat.branch-instructions
  8.99e+09 ±  2% -23.1%  6.916e+09 ±  0%  perf-stat.branch-load-misses
 8.513e+11 ±  1%  -3.5%  8.216e+11 ±  0%  perf-stat.branch-loads
 8.967e+09 ±  2% -23.5%  6.858e+09 ±  0%  perf-stat.branch-misses
 8.059e+11 ±  1%  -3.4%  7.786e+11 ±  0%  perf-stat.bus-cycles
 3.658e+09 ±  1%  +5.6%  3.863e+09 ±  0%  perf-stat.cache-misses
 1.612e+11 ±  2%  +5.6%  1.701e+11 ±  0%  perf-stat.cache-references
 6.452e+12 ±  1%  -3.5%  6.227e+12 ±  0%  perf-stat.cpu-cycles
 92243 ±  6%+175.6% 254241 ±  2%  perf-stat.cpu-migrations
 1.305e+10 ± 12% -15.2%  1.106e+10 ±  4%  perf-stat.dTLB-load-misses
 1.683e+12 ±  1%  -6.2%  1.579e+12 ±  0%  perf-stat.dTLB-loads
 9.845e+11 ±  1%  -4.0%  9.446e+11 ±  0%  perf-stat.dTLB-stores
  31933083 ±  9% -31.3%   21923765 ±  2%  perf-stat.iTLB-load-misses
 4.638e+12 ±  1%  -3.6%  4.471e+12 ±  0%  perf-stat.iTLB-loads
 4.639e+12 ±  1%  -3.6%  4.471e+12 ±  0%  perf-stat.instructions
 6.448e+12 ±  1%  -3.4%  6.229e+12 ±  0%  perf-stat.ref-cycles
  1.36 ±  4% +17.9%   1.60 ±  3%  perf-profile.cpu-cycles.__clear_user.iov_iter_zero.read_iter_zero.__vfs_read.vfs_read
  3.48 ±  4% +11.1%   3.87 ±  2%  perf-profile.cpu-cycles.__ext4_get_inode_loc.ext4_get_inode_loc.ext4_reserve_inode_write.ext4_mark_inode_dirty.ext4_dirty_inode
  5.15 ±  1% +22.6%   6.31 ±  2%  perf-profile.cpu-cycles.__ext4_handle_dirty_metadata.ext4_mark_iloc_dirty.ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty
  7.90 ±  3% -87.7%   0.98 ± 14%  perf-profile.cpu-cycles.__ext4_journal_get_write_access.ext4_reserve_inode_write.ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty
  8.53 ±  0% +17.7%  10.04 ±  1%  perf-profile.cpu-cycles.__ext4_journal_start_sb.ext4_da_write_begin.generic_perform_write.__generic_file_write_iter.ext4_file_write_iter
  0.91 ±  3% +11.3%   1.01 ±  9%  perf-profile.cpu-cycles.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty.generic_write_end.ext4_da_write_end
  2.99 ±  2% +17.4%   3.50 ±  4%  perf-profile.cpu-cycles.__ext4_journal_stop.ext4_da_write_end.generic_perform_write.__generic_file_write_iter.ext4_file_write_iter
  0.78 ±  6% +22.4%   0.96 ±  6%  perf-profile.cpu-cycles.__ext4_journal_stop.ext4_dirty_inode.__mark_inode_dirty.generic_write_end.ext4_da_write_end
  1.19 ±  3% +29.1%   1.53 ±  4%  perf-profile.cpu-cycles.__find_get_block.__getblk_gfp.__ext4_get_inode_loc.ext4_get_inode_loc.ext4_reserve_inode_write
  1.88 ±  4% +20.2%   2.26 ±  4%  perf-profile.cpu-cycles.__getblk_gfp.__ext4_get_inode_loc.ext4_get_inode_loc.ext4_reserve_inode_write.ext4_mark_inode_dirty
 26.69 ±  0% -16.5%  22.29 ±  1%  

Re: [PATCH 3.4 000/172] 3.4.108-rc1 review

2015-06-17 Thread Zefan Li
On 2015/6/16 23:13, Ian Campbell wrote:
> On Tue, 2015-06-16 at 16:33 +0800, l...@kernel.org wrote:
>> From: Zefan Li 
>>
>> This is the start of the stable review cycle for the 3.4.108 release.
>> There are 172 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Thu Jun 18 08:30:58 UTC 2015.
>> Anything received after that time might be too late.
> 
> Would it be possible to also include 31a418986a58 "xen: netback: read
> hotplug script once at start of day." which has started trickling into
> other stable branches already, please.
> 
> If not now then for 109.
> 

Queued up for 3.4.108. :)


Re: [PATCH 3.4 035/172] ALSA: hdspm - Constrain periods to 2 on older cards

2015-06-17 Thread Zefan Li
>> @@ -6042,6 +6042,12 @@ static int snd_hdspm_capture_open(struct 
>> snd_pcm_substream *substream)
>>  snd_pcm_hw_constraint_minmax(runtime,
>>   SNDRV_PCM_HW_PARAM_PERIOD_SIZE,
>>   64, 8192);
>> +snd_pcm_hw_constraint_minmax(runtime,
>> + SNDRV_PCM_HW_PARAM_PERIODS,
>> + 2, 2);
>> +snd_pcm_hw_constraint_minmax(runtime,
>> + SNDRV_PCM_HW_PARAM_PERIODS,
>> + 2, 2);
>>  break;
>>  }
> 
> This is not correct, those lines need to go to two different functions
> (snd_hdspm_playback_open and snd_hdspm_capture_open).
> 
> Here is how the patch should look like:
> 

Now fixed. Thanks for reviewing the patch.



Re: [PATCH 3.4 000/172] 3.4.108-rc1 review

2015-06-17 Thread Zefan Li
On 2015/6/16 16:49, Guenter Roeck wrote:
> On 06/16/2015 01:33 AM, l...@kernel.org wrote:
>> From: Zefan Li 
>>
>> This is the start of the stable review cycle for the 3.4.108 release.
>> There are 172 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Thu Jun 18 08:30:58 UTC 2015.
>> Anything received after that time might be too late.
>>
> 
> Build results:
> total: 98 pass: 94 fail: 4
> Failed builds:
> arm:allmodconfig
> score:defconfig
> sparc64:allmodconfig
> xtensa:allmodconfig
> Qemu test results:
> total: 23 pass: 22 fail: 1
> Failed tests:
> arm:arm_versatile_defconfig
> 
> Results are as expected.
> Details are available at http://server.roeck-us.net:8010/builders.
> 

Thanks for testing!



Re: [PATCH 3/4] cgroup: require write perm on common ancestor when moving processes on the default hierarchy

2015-06-17 Thread Zefan Li
Hi Tejun,

> -static int cgroup_procs_write_permission(struct task_struct *task)
> +static int cgroup_procs_write_permission(struct task_struct *task,
> +  struct cgroup *dst_cgrp,
> +  struct kernfs_open_file *of)
>  {
>   const struct cred *cred = current_cred();
>   const struct cred *tcred = get_task_cred(task);
> @@ -2407,6 +2409,26 @@ static int cgroup_procs_write_permission(struct 
> task_struct *task)
>   !uid_eq(cred->euid, tcred->suid))
>   ret = -EACCES;
>  
> + if (cgroup_on_dfl(dst_cgrp)) {

This should be:

	if (!ret && cgroup_on_dfl(dst_cgrp))

Otherwise the -EACCES from the credential check above gets overwritten.

> + struct super_block *sb = of->file->f_path.dentry->d_sb;
> + struct cgroup *cgrp;
> + struct inode *inode;
> +
> + down_read(&css_set_rwsem);
> + cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
> + up_read(&css_set_rwsem);
> +
> + while (!cgroup_is_descendant(dst_cgrp, cgrp))
> + cgrp = cgroup_parent(cgrp);
> +
> + ret = -ENOMEM;
> + inode = kernfs_get_inode(sb, cgrp->procs_kn);
> + if (inode) {
> + ret = inode_permission(inode, MAY_WRITE);
> + iput(inode);
> + }
> + }
> +
>   put_cred(tcred);
>   return ret;
>  }
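
The walk up to the common ancestor in the quoted hunk can be modelled in
user space as follows. This is only a sketch under stated assumptions:
struct node, is_descendant() and common_ancestor() are hypothetical
stand-ins for struct cgroup, cgroup_is_descendant() and the while loop, not
kernel code, and both nodes are assumed to live in the same tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup: only the parent link matters. */
struct node {
	struct node *parent;	/* NULL for the root */
};

/* Return 1 if n is anc itself or lies somewhere below it, else 0. */
int is_descendant(const struct node *n, const struct node *anc)
{
	for (; n; n = n->parent)
		if (n == anc)
			return 1;
	return 0;
}

/*
 * Walk up from src until the current node is dst or an ancestor of dst.
 * Both nodes must belong to the same tree, so the loop ends at the root
 * at the latest.
 */
struct node *common_ancestor(struct node *src, const struct node *dst)
{
	assert(src != NULL && dst != NULL);

	while (!is_descendant(dst, src))
		src = src->parent;
	return src;
}
```

Write permission is then checked on the procs file of the node this walk
returns, which is why moving a task between two siblings needs access to
their shared parent.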



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-17 Thread Steven Rostedt
On Fri, 12 Jun 2015 11:50:38 -0400
Steven Rostedt  wrote:

> I reverted the following commits:
> 
> c627d31ba0696cbd829437af2be2f2dee3546b1e
> 9e2b9f37760e129cee053cc7b6e7288acc2a7134
> caf4ccd4e88cf2795c927834bc488c8321437586
> 
> And the issue goes away. That is, I watched the port go from
> ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.
> 
> In fact, I watched the port with my portlist.c module, and it
> disappeared there too when it entered the TIME_WAIT state.
> 

I've been running v4.0.5 with the above commits reverted for 5 days
now, and there's still no hidden port appearing.

What's the status on this? Should those commits be reverted or is there
another solution to this bug?

-- Steve


Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Sergey Senozhatsky
On (06/18/15 11:41), Sergey Senozhatsky wrote:
[..]
> > My concern is not a compaction overhead but higher memory footprint
> > consumed by zram in reserved memory.
> > It might hang system if zram used up reserved memory of system with
> > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > to use more memory with uncompressible pages or file-backed pages
> > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > 

a couple of _not really related_ ideas that I want to voice.

(a) I'm thinking of extending zramX/compact attr. right now it's WO,
  and I think it makes sense to make it RW:
->write will trigger compaction
->read will return estimated number of bytes
  "zs_can_compact() * pages per zspage * page_size" that can be freed.
  so user-space will have at least minimal idea whether compaction is
  reasonable. but sure, this is racy and in general case things may
  change between `cat compact` and `echo 1 > compact`.


(b) adding a knob (yeah, like we don't have enough knobs already :-))
that will allow 'enable/disable auto compaction'.
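
a minimal sketch of the estimate from (a), in plain userspace C. the helper
name and its parameters are hypothetical; in the driver the first factor would
be whatever zs_can_compact() reports for a class.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of zram/zsmalloc): estimate how many
 * bytes compaction could free. freeable_zspages stands in for the
 * value zs_can_compact() would return.
 */
static size_t zram_compact_estimate(size_t freeable_zspages,
                                    size_t pages_per_zspage,
                                    size_t page_size)
{
    return freeable_zspages * pages_per_zspage * page_size;
}
```

as noted in (a), such a number is only a hint: it can be stale by the time
compaction is actually triggered.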

-ss


Re: [PATCH] tipc:Make the function tipc_buf_append have a return type of bool

2015-06-17 Thread Ying Xue
On 06/18/2015 10:44 AM, Nicholas Krause wrote:
> This converts tipc_buf_append() to return bool, since the
> function only ever returns one or zero as its return value.
> 
> Signed-off-by: Nicholas Krause 

Acked-by: Ying Xue 

> ---
>  net/tipc/msg.c | 12 ++--
>  net/tipc/msg.h |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index c3e96e8..52f2978 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -115,9 +115,9 @@ struct sk_buff *tipc_msg_create(uint user, uint type,
>   *out: set when successful non-complete reassembly, otherwise NULL
>   * @*buf: in:  the buffer to append. Always defined
>   *out: head buf after successful complete reassembly, otherwise NULL
> - * Returns 1 when reassembly complete, otherwise 0
> + * Returns true when reassembly complete, otherwise false
>   */
> -int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
> +bool tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
>  {
>   struct sk_buff *head = *headbuf;
>   struct sk_buff *frag = *buf;
> @@ -144,7 +144,7 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
>   skb_frag_list_init(head);
>   TIPC_SKB_CB(head)->tail = NULL;
>   *buf = NULL;
> - return 0;
> + return false;
>   }
>  
>   if (!head)
> @@ -171,16 +171,16 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
>   *buf = head;
>   TIPC_SKB_CB(head)->tail = NULL;
>   *headbuf = NULL;
> - return 1;
> + return true;
>   }
>   *buf = NULL;
> - return 0;
> + return false;
>  err:
>   pr_warn_ratelimited("Unable to build fragment list\n");
>   kfree_skb(*buf);
>   kfree_skb(*headbuf);
>   *buf = *headbuf = NULL;
> - return 0;
> + return false;
>  }
>  
>  /* tipc_msg_validate - validate basic format of received message
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index e1d3595e..00d3357 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -771,7 +771,7 @@ void tipc_msg_init(u32 own_addr, struct tipc_msg *m, u32 user, u32 type,
>  struct sk_buff *tipc_msg_create(uint user, uint type, uint hdr_sz,
>   uint data_sz, u32 dnode, u32 onode,
>   u32 dport, u32 oport, int errcode);
> -int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf);
> +bool tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf);
>  bool tipc_msg_bundle(struct sk_buff *bskb, struct sk_buff *skb, u32 mtu);
>  
>  bool tipc_msg_make_bundle(struct sk_buff **skb, u32 mtu, u32 dnode);
> 



Re: [Resend PATCH v8 0/4] sched: Rewrite runnable load and utilization average tracking

2015-06-17 Thread Yuyang Du
On Wed, Jun 17, 2015 at 09:06:17PM +0800, Boqun Feng wrote:
> 
> > So the problem is:
> > 
> > 1) The tasks in the workload have too small weight (only 79), because
> >they share a task group.
> > 
> > 2) Probably some "high" weight tasks, even when runnable for only a short
> >    time, contribute "big" to cfs_rq's load_avg.
> 
> Thank you for your analysis.
> 
> Some updates:
> 
> I created a task group /g and set /g/cpu.shares to 13312 (1024 * 13),
> and then ran `stress --cpu 12` and `dbench 1` simultaneously in that
> group. The situation is much better, only one CPU is not fully loaded,
> and its utilization rate stays around 85%.
> 

Hi,

That is good. You can as well disable autogroup, or "nicer" the autogroup,
or exec the dbench from another shell, etc...

Thank you for the tests. This may not be intuitive, but actually the results
showcased that:

1) the patchset improves the task group share management and finally
   accomplishes what it is said to do in terms of fair share.

2) the seamlessly combined runnable + blocked load_avg improves the share
   of the sometimes runnable, sometimes blocked tasks by preserving the blocked
   load in the avg. Fairness is achieved because the dbench has the same weight
   as the 12 stress tasks, and the performance of the dbench (buried in CPU
   hogging tasks) is thus improved.

Peter?

In addition, the following patch should correct the odd util_avg value.
I am sending it here before I send another version.
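
To illustrate why the old initialization gives an odd value, here is a small
userspace sketch. It assumes util_avg is derived from util_sum as
(util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX, with LOAD_AVG_MAX = 47742 and
SCHED_LOAD_SHIFT = 10; that derivation is not shown in the patch below, and the
helper name is made up for the example.

```c
#include <stdint.h>

#define LOAD_AVG_MAX      47742ULL  /* assumed maximum of the decayed sum */
#define SCHED_LOAD_SHIFT  10        /* assumed: SCHED_LOAD_SCALE == 1024 */
#define SCHED_LOAD_SCALE  (1ULL << SCHED_LOAD_SHIFT)

/* Assumed derivation of util_avg from util_sum (hypothetical helper). */
static uint64_t util_avg_from_sum(uint64_t util_sum)
{
    return (util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
}
```

Under these assumptions, util_sum = SCHED_LOAD_SCALE * LOAD_AVG_MAX yields
util_avg = 1024 * 1024 at the first update, while util_sum = LOAD_AVG_MAX
yields the intended 1024.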

Thanks,
Yuyang

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8fd7b9..2b0907c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -687,7 +687,7 @@ void init_entity_runnable_average(struct sched_entity *se)
sa->load_avg = scale_load_down(se->load.weight);
sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
-   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+   sa->util_sum = LOAD_AVG_MAX;
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 #else


Re: [PATCH] drm: bridge/dw_hdmi: Filter modes > 165MHz for DVI

2015-06-17 Thread Doug Anderson
Russell,

On Wed, Jun 17, 2015 at 4:30 PM, Russell King - ARM Linux
 wrote:
> On Wed, Jun 17, 2015 at 04:14:07PM -0700, Doug Anderson wrote:
>> If you plug in a DVI monitor to your HDMI port, you need to filter out
>> clocks > 165MHz.  That's because 165MHz is the maximum clock rate that
>> we can run single-link DVI at.
>>
>> If you want to run high resolutions to DVI, you'd need some type of an
>> active adapter that pretended that it was HDMI, interpreted the
>> signal, and produced a new dual link DVI signal at a lower clock rate.
>>
>> Signed-off-by: Doug Anderson 
>> ---
>> Note: this patch was tested against a 3.14 kernel with backports.  It
>> was only compile tested against linuxnext, but the code is
>> sufficiently similar that I'm convinced it will work there.
>
> Really?  I have to wonder what your testing was...
>
> hdmi->vic = drm_match_cea_mode(mode);
>
> if (!hdmi->vic) {
> dev_dbg(hdmi->dev, "Non-CEA mode used in HDMI\n");
> hdmi->hdmi_data.video_mode.mdvi = true;
> } else {
> dev_dbg(hdmi->dev, "CEA mode used vic=%d\n", hdmi->vic);
> hdmi->hdmi_data.video_mode.mdvi = false;
> }
>
> mdvi indicates whether the _currently set mode_ is a CEA mode or not (imho,
> it's mis-named).  It doesn't indicate whether we have a HDMI display device
> or a DVI display device connected, which seems to be what you want to use
> it for below.
>
> To sort that, what you need to do is detect a HDMI display device using
> drm_detect_hdmi_monitor() on the EDID received from the device before
> parsing the modes, and save that value in a dw_hdmi struct member, and
> I'd suggest that it's a top-level struct member, not buried in 'hdmi_data'
> or 'video_mode'.
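
A rough sketch of the approach described above, written as standalone C rather
than driver code. The struct and function names are hypothetical stand-ins (not
the real struct dw_hdmi or its mode_valid hook); the flag is assumed to be
filled in from drm_detect_hdmi_monitor() when the EDID is read.

```c
#include <stdbool.h>

/* Single-link DVI limit from the patch description, in kHz. */
#define DVI_SINGLE_LINK_MAX_KHZ 165000

/* Hypothetical stand-in for top-level driver state. */
struct sink_state {
    /* Assumed to be saved from drm_detect_hdmi_monitor() at EDID time. */
    bool sink_is_hdmi;
};

/* Return true if a mode with this pixel clock (kHz) should be kept. */
static bool mode_clock_ok(const struct sink_state *s, int clock_khz)
{
    if (!s->sink_is_hdmi && clock_khz > DVI_SINGLE_LINK_MAX_KHZ)
        return false;
    return true;
}
```

The point of keeping the flag separate from mdvi is that it describes the
connected device, not the currently set mode.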

OK, so clearly my patch won't work against mainline.  I guess it's a
good thing that I pointed out that it was only tested locally (would
have been better to test against mainline, but I don't think that's so
easy since there are several unlanded patches in mainline for
Rockchip).

As pointed out by others at , locally
our kernel has a slightly older version of
, which would change mdvi to be
as needed.

...so I guess my change is blocked on someone reviewing/landing that
series.  If that series is rejected (or is changed sufficiently so
that mdvi is no longer set via drm_detect_hdmi_monitor()), then my patch
will need to be re-spun.

-Doug


[PATCH v2] perf probe: Fix failure to probe events on arm

2015-06-17 Thread He Kuang
Fix failure to probe events on arm. The problem was introduced by commit
5a51fcd1f30c ("perf probe: Skip kernel symbols which is out of
.text"). On some architectures, the label '_etext' is not in the .text
section (it is in the .notes section on arm/arm64). A label outside the
.text section is not loaded as a symbol, so we get a zero value when
looking up its address, which causes all events to be wrongly skipped.

This patch skips the text address range check when it fails to get the
address of '_etext', which fixes the problem.
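
The condition can be sketched in isolation as follows. The helper name is
hypothetical and only mirrors the check added in the diff below; a zero
etext_addr is taken to mean that the '_etext' lookup failed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched condition: an address is
 * reported as out of .text only when the address of '_etext' is known
 * (non-zero) and lies below the probed address.
 */
static bool out_of_text(uint64_t etext_addr, uint64_t address)
{
    return etext_addr != 0 && etext_addr < address;
}
```

With the old check, etext_addr == 0 made every address compare as out of range.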

The problem can be reproduced on arm as follows:

  # perf probe --add='generic_perform_write'
  generic_perform_write+0 is out of .text, skip it.
  Probe point 'generic_perform_write' not found.
Error: Failed to add events.

After this patch:

  # perf probe --add='generic_perform_write'
  Added new event:
probe:generic_perform_write (on generic_perform_write)

  You can now use it in all perf tools, such as:

perf record -e probe:generic_perform_write -aR sleep 1

Signed-off-by: He Kuang 
---
 tools/perf/util/probe-event.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 076527b..381f23a 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -249,8 +249,12 @@ static void clear_probe_trace_events(struct probe_trace_event *tevs, int ntevs)
 static bool kprobe_blacklist__listed(unsigned long address);
 static bool kprobe_warn_out_range(const char *symbol, unsigned long address)
 {
+   u64 etext_addr;
+
/* Get the address of _etext for checking non-probable text symbol */
-   if (kernel_get_symbol_address_by_name("_etext", false) < address)
+   etext_addr = kernel_get_symbol_address_by_name("_etext", false);
+
+   if (etext_addr != 0 && etext_addr < address)
pr_warning("%s is out of .text, skip it.\n", symbol);
else if (kprobe_blacklist__listed(address))
pr_warning("%s is blacklisted function, skip it.\n", symbol);
-- 
1.8.5.2



Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Sergey Senozhatsky
Hi,

On (06/18/15 10:50), Minchan Kim wrote:
[..]
> > hm, what's the difference with the existing implementation?
> > The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> > It holds the class lock less time than current compaction.
> 
> At old, it unlocks periodically(ie, per-zspage migration) so other who
> want to allocate a zspage in the class can have a chance but your patch
> increases lock holding time until all of zspages in the class is done
> so other will be blocked until all of zspage migration in the class is
> done.

ah, I see.
it doesn't hold the lock `until all the pages are done`. it holds it
as long as zs_can_compact() returns > 0. hm, I'm not entirely sure that
this patch set has increased the locking time (on average).


> > 
> > > I will review remain parts tomorrow(I hope) but what I want to say
> > > before going sleep is:
> > > 
> > > I like the idea but still have a concern to lack of fragmented zspages
> > > during memory pressure because auto-compaction will prevent fragment
> > > most of time. Surely, using fragment space as buffer in heavy memory
> > > pressure is not intended design so it could be fragile but I'm afraid
> > > this feature might accelerate it and it ends up having a problem and
> > > change current behavior in zram as swap.
> > 
> > Well, it's nearly impossible to prove anything with the numbers obtained
> > during some particular case. I agree that fragmentation can be both
> > 'good' (depending on IO pattern) and 'bad'.
> 
> Yes, it's not easy and I believe a few artificial testing are not enough
> to prove no regression but we don't have any choice.
> Actually, I think this patchset does make sense. Although it might have
> a problem on situation heavy memory pressure by lacking of fragment space,


I tested exactly this scenario yesterday (and sent an email). We leave `no holes'
in classes only in ~1.35% of cases. so, no, this argument is not valid. we preserve
fragmentation.

-ss

> I think we should go with this patchset and fix the problem with another way
> (e,g. memory pooling rather than relying on the luck of fragment).
> But I need something to take the risk. That's why I ask the number
> although it's not complete. It can cover a case at least, it is better than
> none. :)
> 
> > 
> > 
> > Auto-compaction of IDLE zram devices certainly makes sense, when system
> > is getting low on memory. zram devices are not always 'busy', serving
> > heavy IO. There may be N idle zram devices simply sitting and wasting
> > memory; or being 'moderately' busy; so compaction will not cause any
> > significant slow down there.
> > 
> > Auto-compaction of BUSY zram devices is less `desired', of course;
> > but not entirely terrible I think (zs_can_compact() can help here a
> > lot).
> 
> My concern is not a compaction overhead but higher memory footprint
> consumed by zram in reserved memory.
> It might hang system if zram used up reserved memory of system with
> ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> to use more memory with uncompressible pages or file-backed pages
> so zram-swap can use more reserved memory. We need to evaluate it, I think.
> 


Re: Suspicious RCU usage at boot w/ arm ipi trace events?

2015-06-17 Thread Paul E. McKenney
On Wed, Jun 17, 2015 at 06:49:38PM -0700, Stephen Boyd wrote:
> On 06/16/2015 09:46 PM, Paul E. McKenney wrote:
> > On Tue, Jun 16, 2015 at 05:41:29PM -0700, Stephen Boyd wrote:
> >>
> >> The tracepoint 'trace_ipi_entry' in handle_IPI() is using RCU and we
> >> haven't called irq_enter() yet at that point. Does this tracepoint need
> >> to have _rcuidle() added to it?
> > Yes, I believe that would fix this problem.
> >
> 
> Ok... here's the patch. I see the problem on my device and applying this
> patch fixes it.
> 
> 8<
> 
> From: Stephen Boyd 
> Subject: [PATCH] ARM: smp: Silence suspicious RCU usage with ipi tracepoints
> 
> John Stultz reports an RCU splat on boot with ARM ipi trace
> events enabled.
> 
> ===
> [ INFO: suspicious RCU usage. ]
> 4.1.0-rc7-00033-gb5bed2f #153 Not tainted
> ---
> include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7-00033-gb5bed2f #153
> Hardware name: Qualcomm (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x70/0xbc)
> [] (dump_stack) from [] (handle_IPI+0x428/0x604)
> [] (handle_IPI) from [] (gic_handle_irq+0x54/0x5c)
> [] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)
> Exception stack(0xc09f3f48 to 0xc09f3f90)
> 3f40:   0001 0001  c09f73b8 c09f4528 c0a5de9c
> 3f60: c076b4f0   c09ef108 c0a5cec1 0001  c09f3f90
> 3f80: c026bf60 c0210ab8 2113 
> [] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c)
> [] (arch_cpu_idle) from [] (cpu_startup_entry+0x2c0/0x5dc)
> [] (cpu_startup_entry) from [] (start_kernel+0x358/0x3c4)
> [] (start_kernel) from [<8020807c>] (0x8020807c)
> 
> At this point in the IPI handling path we haven't called
> irq_enter() yet, so RCU doesn't know that we're about to exit
> idle and properly warns that we're using RCU from an idle CPU.
> Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
> that RCU is informed about our exit from idle.
> 
> Reported-by: John Stultz 
> Fixes: 365ec7b17327 "ARM: add IPI tracepoints"
> Signed-off-by: Stephen Boyd 

Reviewed-by: Paul E. McKenney 

> ---
>  arch/arm/kernel/smp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 13a91d390832..03eb8a446dca 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -589,7 +589,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>   struct pt_regs *old_regs = set_irq_regs(regs);
> 
>   if ((unsigned)ipinr < NR_IPI) {
> - trace_ipi_entry(ipi_types[ipinr]);
> + trace_ipi_entry_rcuidle(ipi_types[ipinr]);
>   __inc_irq_stat(cpu, ipi_irqs[ipinr]);
>   }
> 
> @@ -648,7 +648,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>   }
> 
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_exit(ipi_types[ipinr]);
> + trace_ipi_exit_rcuidle(ipi_types[ipinr]);
>   set_irq_regs(old_regs);
>  }
> 
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 



Re: [PATCH] irqchip: bcm2835: Add FIQ support

2015-06-17 Thread Stephen Warren
On 06/12/2015 11:26 AM, Noralf Trønnes wrote:
> Add a duplicate irq range with an offset on the hwirq's so the
> driver can detect that enable_fiq() is used.
> Tested with downstream dwc_otg USB controller driver.

This basically looks OK, but a few comments/thoughts:

a) Should the Kconfig change be a separate patch since it's a separate
subsystem?

b) Doesn't the driver need to refuse some operation (handler
registration, IRQ setup, IRQ enable, ...?) for more than 1 IRQ in the
FIQ range, since the FIQ control register only allows routing 1 IRQ to FIQ.

c) The DT binding needs updating to describe the extra IRQs:

> Documentation/devicetree/bindings/interrupt-controller/brcm,bcm28armctrl-ic.txt

d) I wonder how the FIQ handler actually gets routed to this controller
and hooked to its handler etc. I assume there's a separate patch for
that coming?

BTW, I'll be on vacation for just over a couple weeks starting Saturday,
so responses will certainly be delayed for quite a while.


[PATCH v4 0/2] locking/qrwlock: More optimizations in qrwlock

2015-06-17 Thread Waiman Long
v3->v4:
 - Remove the unnecessary _QW_WMASK check in
   queue_read_lock_slowpath().

v2->v3:
 - Fix incorrect commit log message in patch 1.

v1->v2:
 - Add microbenchmark data for the second patch

This patch set contains 2 patches on qrwlock. The first one is to
optimize the interrupt context reader slowpath.  The second one is
to optimize the writer slowpath.

Waiman Long (2):
  locking/qrwlock: Better optimization for interrupt context readers
  locking/qrwlock: Don't contend with readers when setting _QW_WAITING

 include/asm-generic/qrwlock.h |4 ++--
 kernel/locking/qrwlock.c  |   41 +++--
 2 files changed, 33 insertions(+), 12 deletions(-)



[PATCH v4 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING

2015-06-17 Thread Waiman Long
The current cmpxchg() loop in setting the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers
causing possibly extra cmpxchg() operations that are wasteful. This
patch changes the code to do a byte cmpxchg() to eliminate contention
with new readers.
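
The idea can be sketched in userspace C as below. This is only an illustration
under stated assumptions: it uses the GCC/Clang __atomic builtins instead of
the kernel's cmpxchg(), assumes a little-endian layout so that the writer byte
is the low byte of the 32-bit word, and the type and function names are made
up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define QW_WAITING 1u          /* assumed value: a writer is waiting */
#define QR_BIAS    (1u << 8)   /* assumed: one reader, above the writer byte */

/* Userspace stand-in for the lock word; layout assumes little-endian. */
union lock_word {
    uint32_t cnts;
    struct {
        uint8_t wmode;      /* writer mode: low byte of cnts */
        uint8_t rcnts[3];   /* reader count */
    };
};

/*
 * Try to set the waiting flag by touching only the writer byte.
 * A concurrent change of the reader count does not make this fail,
 * which is the point of doing a byte-sized compare-and-exchange.
 */
static bool set_waiting_byte(union lock_word *l)
{
    uint8_t expected = 0;

    return __atomic_compare_exchange_n(&l->wmode, &expected,
                                       (uint8_t)QW_WAITING, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

A full-word compare-and-exchange would have to be retried whenever a reader
changed the upper bytes between the read and the exchange.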

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

Threads    w/o patch    with patch    % change
-------    ---------    ----------    --------
   2           990           895        -9.6%
   3          2136          1912       -10.5%
   4          3166          2830       -10.6%
   5          3953          3629        -8.2%
   6          4628          4405        -4.8%
   7          5344          5197        -2.8%
   8          6065          6004        -1.0%
   9          6826          6811        -0.2%
  10          7599          7599         0.0%
  15          9757          9766        +0.1%
  20         13767         13817        +0.4%

With a small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.

Signed-off-by: Waiman Long 
---
 kernel/locking/qrwlock.c |   28 
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 7fd223c..38ad7e0 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -22,6 +22,26 @@
 #include 
 #include 
 
+/*
+ * This internal data structure is used for optimizing access to some of
+ * the subfields within the atomic_t cnts.
+ */
+struct __qrwlock {
+   union {
+   atomic_t cnts;
+   struct {
+#ifdef __LITTLE_ENDIAN
+   u8 wmode;   /* Writer mode   */
+   u8 rcnts[3];/* Reader counts */
+#else
+   u8 rcnts[3];/* Reader counts */
+   u8 wmode;   /* Writer mode   */
+#endif
+   };
+   };
+   arch_spinlock_t lock;
+};
+
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
@@ -108,10 +128,10 @@ void queue_write_lock_slowpath(struct qrwlock *lock)
 * or wait for a previous writer to go away.
 */
for (;;) {
-   cnts = atomic_read(&lock->cnts);
-   if (!(cnts & _QW_WMASK) &&
-   (atomic_cmpxchg(&lock->cnts, cnts,
-   cnts | _QW_WAITING) == cnts))
+   struct __qrwlock *l = (struct __qrwlock *)lock;
+
+   if (!READ_ONCE(l->wmode) &&
+  (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
break;
 
cpu_relax_lowlatency();
-- 
1.7.1



[PATCH v4 1/2] locking/qrwlock: Better optimization for interrupt context readers

2015-06-17 Thread Waiman Long
The qrwlock is fair in process context, but becomes unfair in
interrupt context to support use cases like the tasklist_lock.

The current code doesn't document well what happens in
interrupt context. The rspin_until_writer_unlock() will only
spin if the writer has gotten the lock. If the writer is still in the
waiting state, the increment in the reader count will cause the writer
to remain in the waiting state and the new interrupt context reader
will get the lock and return immediately. The current code, however,
does an additional read of the lock value, which is not necessary as
the information is already available from the fast path. This may
sometimes cause an additional cacheline transfer when the lock is
highly contended.

This patch passes the lock value obtained in the fast path
to the slow path to eliminate the additional read. It also documents
the action for the interrupt context readers more clearly.

Signed-off-by: Waiman Long 
---
 include/asm-generic/qrwlock.h |4 ++--
 kernel/locking/qrwlock.c  |   13 +++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 6383d54..865d021 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -36,7 +36,7 @@
 /*
  * External function declarations
  */
-extern void queue_read_lock_slowpath(struct qrwlock *lock);
+extern void queue_read_lock_slowpath(struct qrwlock *lock, u32 cnts);
 extern void queue_write_lock_slowpath(struct qrwlock *lock);
 
 /**
@@ -105,7 +105,7 @@ static inline void queue_read_lock(struct qrwlock *lock)
return;
 
/* The slowpath will decrement the reader count, if necessary. */
-   queue_read_lock_slowpath(lock);
+   queue_read_lock_slowpath(lock, cnts);
 }
 
 /**
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 00c12bb..7fd223c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -43,22 +43,23 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
  * queue_read_lock_slowpath - acquire read lock of a queue rwlock
  * @lock: Pointer to queue rwlock structure
  */
-void queue_read_lock_slowpath(struct qrwlock *lock)
+void queue_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 {
-   u32 cnts;
-
/*
 * Readers come here when they cannot get the lock without waiting
 */
if (unlikely(in_interrupt())) {
/*
-* Readers in interrupt context will spin until the lock is
-* available without waiting in the queue.
+* Readers in interrupt context will get the lock immediately
+* if the writer is just waiting (not holding the lock yet).
+* The rspin_until_writer_unlock() function returns immediately
+* in this case. Otherwise, they will spin until the lock
+* is available without waiting in the queue.
 */
-   cnts = smp_load_acquire((u32 *)&lock->cnts);
rspin_until_writer_unlock(lock, cnts);
return;
}
+
atomic_sub(_QR_BIAS, &lock->cnts);
 
/*
-- 
1.7.1



Re: [PATCH v3] watchdog: bcm2835: Fix poweroff behaviour

2015-06-17 Thread Stephen Warren
On 06/17/2015 08:04 AM, Noralf Trønnes wrote:
> Currently poweroff/halt results in a reboot on the Raspberry Pi.
> The firmware uses the RSTS register to know which partition to
> boot from. The partition value is spread into bits
> 0, 2, 4, 6, 8, 10. Partition 63 is a special partition used by
> the firmware to indicate halt.
> 
> The firmware made this change in 19 Aug 2013 and was matched
> by the downstream commit:
> Changes for new NOOBS multi partition booting from gsh
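
As a worked example of the bit layout described above, here is a small
standalone sketch. The function name is hypothetical, and the mapping of
partition bit i to register bit 2*i is an assumption; the description only
says which register bits are used.

```c
#include <stdint.h>

/*
 * Spread a 6-bit partition number into bits 0, 2, 4, 6, 8, 10.
 * Assumption: partition bit i lands in register bit 2*i.
 */
static uint32_t rsts_spread_partition(uint32_t partition)
{
    uint32_t val = 0;
    int i;

    for (i = 0; i < 6; i++)
        val |= ((partition >> i) & 1u) << (2 * i);
    return val;
}
```

For partition 63 all six bits are set, so the result is 0x555 regardless of
the bit ordering assumed here.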

Tested-by: Stephen Warren 
Acked-by: Stephen Warren 

I wonder if, sometime down the road, it's worth querying the firmware
version via the firmware driver and dynamically doing the right thing.
Still, the combination of a bleeding edge kernel and an ancient firmware
seems a bit unlikely, so perhaps not worth the bother.


Re: kexec_load(2) bypasses signature verification

2015-06-17 Thread Dave Young
On 06/18/15 at 09:16am, Dave Young wrote:
> On 06/16/15 at 09:47pm, Vivek Goyal wrote:
> > On Tue, Jun 16, 2015 at 08:32:37PM -0500, Eric W. Biederman wrote:
> > > Vivek Goyal  writes:
> > > 
> > > > On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
> > > >> 
> > > >> Adding Vivek as he is the one who implemented kexec_file_load.
> > > >> I was hoping he would respond to this thread, and it looks like he
> > > >> simply has not ever been Cc'd.
> > > >> 
> > > >> Theodore Ts'o  writes:
> > > >> 
> > > >> > On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
> > > >> >> The bits that actually read Secure Boot state out of the UEFI
> > > >> >> variables, and apply protections to the machine to avoid compromise
> > > >> >> under the SB threat model.  Things like disabling the old kexec...
> > > >> >
> > > >> > I don't have any real interest in using Secure Boot, but I *am*
> > > >> > interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
> > > >> > have something similar to what we have with signed modules in terms of
> > > >> > CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
> > > >> > KEXEC_VERIFY_SIG.  This would mean creating a separate flag
> > > >> > independent of the one Linus suggested for Secure Boot, but since we
> > > >> > have one for signed modules, we do have precedent for this sort of
> > > >> > thing.
> > > >> 
> > > >> My overall request with respect to kexec has been that we implement
> > > >> things that make sense outside of the bizarre threat model of the Linux
> > > >> folks who were talking about secure boot.
> > > >> 
> > > >> I have not navigated the labyrinth of config options but having a way to
> > > >> only boot signed things with kexec seems a completely sensible way to
> > > >> operate in the context of signed images.
> > > >> 
> > > >> I don't know how much that will help given that actors with sufficient
> > > >> resources have demonstrated the ability to steal private keys, but
> > > >> assuming binary signing is an effective technique (or why else do it)
> > > >> then having an option to limit kexec to only loading signed images seems
> > > >> sensible.
> > > >
> > > > I went through the mail chain on web and here are my thoughts.
> > > >
> > > > - So yes, upstream does not have the logic which automatically disables
> > > >   the old syscall (kexec_load()) on secureboot systems. Distributions
> > > >   carry those patches.
> > > >
> > > > - This KEXEC_VERIFY_SIG option only controls the behavior for
> > > >   kexec_file_load() syscall and is not meant to directly affect any
> > > >   behavior of old syscall (kexec_load()). I think I should have named
> > > >   it KEXEC_FILE_VERIFY_SIG. Though help text makes it clear.
> > > >   "Verify kernel signature during kexec_file_load() syscall".
> > > >
> > > > - I think disabling old system call if KEXEC_VERIFY_SIG() is set
> > > >   will break existing setup which use old system call by default, except
> > > >   the case of secureboot system. And old syscall path is well tested
> > > >   and new syscall might not be in a position to support all the corner
> > > >   cases, at least as of now.
> > > >
> > > > Ted, 
> > > >
> > > > So looks like you are looking for a system/option where you just want to
> > > > always make use of kexec_file_load() and disable kexec_load(). This sounds
> > > > like you want a kernel where kexec_load() is compiled out and you want
> > > > only kexec_file_load() in.
> > > >
> > > > Right now one can't do that because kexec_file_load() depends on
> > > > CONFIG_KEXEC option.
> > > >
> > > > I am wondering that how about making CONFIG_KEXEC_FILE_LOAD independent
> > > > of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
> > > > only signed kernel can be kexeced on that system.
> > > >
> > > > This should gel well with long term strategy of deprecating kexec_load()
> > > > at some point of time when kexec_file_load() is ready to completely
> > > > replace it.
> > > 
> > > Interesting.
> > > 
> > > I suspect that what we want is to have CONFIG_KEXEC for the core
> > > and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.
> > > 
> > > That should make it trivially easy to disable the kexec_load system call
> > > in cases where people care.
> > 
> > Or, we could create another option CONFIG_KEXEC_CORE/CONFIG_KEXEC_COMMON
> > which will be automatically selected when either CONFIG_KEXEC or
> > CONFIG_KEXEC_FILE are selected.
> > 
> > All common code can go under this option and rest can go under respective
> > config options.
> > 
> > That way, those who have CONFIG_KEXEC=y in old config files will not be
> > broken. They don't have to learn about new options at all.
> 
> Or simply add a new config option KEXEC_VERIFY_SIG_FORCE, so we can return
> error in kexec_load and print some error message.

Just like below, does this work for you, Ted?

---
 arch/x86/Kconfig |7 

Re: [PATCH] livepatch: add sysfs interface /sys/kernel/livepatch/state

2015-06-17 Thread Li Bin
On 2015/6/17 21:20, Miroslav Benes wrote:
> On Wed, 17 Jun 2015, Li Bin wrote:
> 
>> On 2015/6/17 16:13, Miroslav Benes wrote:
>>> On Wed, 17 Jun 2015, Li Bin wrote:
>>
>>> The list of applied patches can be obtained just by 'ls 
>>> /sys/kernel/livepatch' and their state is in enabled attribute in each 
>>> respective patch (no, you cannot obtain the order in the stack).
>>
>> But why can't we obtain it? I think we do need the stack order when we
>> disable a patch; at least, if disabling a patch fails, we can find out
>> whether it is on top of the stack.
> 
> I meant with the current means. It is correct that we do not export 
> information about stacking order anywhere.
> 
> What we do in kGraft is that there is something like refcount for each 
> patch. When the patch is being applied the refcount of all the previous 
> patches is increased. Only the patch with the refcount equal to 0 can be 
> removed. This information is exported and gives one a clue about the 
> order.
> 

It sounds good, but the information is limited and cannot show the full stack
order, right? (The refcount of every disabled patch is equal to 0, yet when
enabling a disabled patch the stack order is also needed.)

refcount  Patch
--------  -----
 3        patch1(enabled)
 2        patch2(enabled)
 1        patch3(enabled)
 0        patch4(enabled)
 0        patch5(disabled)
 0        patch6(disabled)

Unless the refcount is allowed to go below 0: when a patch is disabled, the
refcount of all patches is decreased; when a patch is enabled, the refcount
of all patches is increased. Only the patch with refcount equal to 0 can be
disabled, only the patch with refcount equal to -1 can be enabled, and only
patches with refcount less than or equal to 0 can be removed (which livepatch
does not support right now).

refcount  Patch
--------  -----
 3        patch1(enabled)
 2        patch2(enabled)
 1        patch3(enabled)
 0        patch4(enabled)
-1        patch5(disabled)
-2        patch6(disabled)
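
The signed-refcount bookkeeping described above can be modeled in a few lines
of plain C. This is only a userspace sketch of the arithmetic, not livepatch
code: struct patch_stack, stack_disable() and stack_enable() are hypothetical
names, and the six-entry array simply mirrors the second table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPATCHES 6

/* Hypothetical model: one signed refcount per patch, in stack order. */
struct patch_stack {
	int refcount[NPATCHES];
};

/* In this scheme a patch is enabled iff its refcount is >= 0. */
bool patch_enabled(const struct patch_stack *s, size_t i)
{
	return s->refcount[i] >= 0;
}

/* Every enable/disable shifts the refcount of all patches by one. */
void adjust_all(struct patch_stack *s, int delta)
{
	for (size_t i = 0; i < NPATCHES; i++)
		s->refcount[i] += delta;
}

/* Only the patch on top of the enabled stack (refcount == 0) may go. */
int stack_disable(struct patch_stack *s, size_t i)
{
	if (s->refcount[i] != 0)
		return -1;
	adjust_all(s, -1);
	return 0;
}

/* Only the disabled patch next in stack order (refcount == -1) may return. */
int stack_enable(struct patch_stack *s, size_t i)
{
	if (s->refcount[i] != -1)
		return -1;
	adjust_all(s, +1);
	return 0;
}
```

Starting from the second table, disabling patch4 moves every refcount down by
one, which makes patch3 the only patch that can be disabled next and patch4
the only one that can be re-enabled.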

Thanks,
Li Bin

> So if there is a need to have something like this there would certainly 
> be a way (or ways to be precise) how to do it. The question is if we need 
> it right now.
> 
> Regards,
> Miroslav
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction

2015-06-17 Thread Minchan Kim
Hi Sergey,

On Wed, Jun 17, 2015 at 12:45:29AM +0900, Sergey Senozhatsky wrote:
> On (06/16/15 23:47), Minchan Kim wrote:
> [..]
> > > 
> > > Compaction now has a relatively quick pool scan so we are able to
> > > estimate the number of pages that will be freed easily, which makes it
> > > possible to call this function from a shrinker->count_objects() callback.
> > > We also abort compaction as soon as we detect that we can't free any
> > > pages any more, preventing wasteful objects migrations. In the example
> > > above, "6074 objects were migrated" implies that we actually released
> > > zspages back to system.
> > > 
> > > The initial patch was triggering compaction from zs_free() for
> > > every ZS_ALMOST_EMPTY page. Minchan Kim proposed to use a slab
> > > shrinker.
> > 
> > First of all, thanks for mentioning me as proposer.
> > However, it's not a helpful comment for other reviewers and
> > anonymous people who will review this in future.
> > 
> > At least, write why I suggested it so others can understand
> > the pros/cons.
> 
> OK, this one is far from perfect. Will try to improve later.
> 
> > > 
> > > Signed-off-by: Sergey Senozhatsky 
> > > Reported-by: Minchan Kim 
> > 
> > I didn't report anything. ;-).
> 
> :-)
> 
> > 
> > > ---
> 
> [..]
> 
> > 
> > So should we hold class lock until finishing the compaction of the class?
> > It would make horrible latency for other allocation from the class
> > in parallel.
> 
> hm, what's the difference with the existing implementation?
> The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> It holds the class lock less time than current compaction.

The old code unlocks periodically (i.e., after each zspage migration), so
others who want to allocate a zspage in the class get a chance. Your patch
extends the lock hold time until all zspages in the class are done, so
others are blocked until every zspage migration in the class has finished.
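
The difference between the two locking patterns can be sketched in userspace
C. Nothing here is zsmalloc code: struct size_class, class_lock() and the
compact_*() helpers are hypothetical stand-ins, and the "lock" is a plain
flag that only records how many migrations happen per hold.

```c
#include <assert.h>

/* Hypothetical stand-in for a size class protected by a spinlock. */
struct size_class {
	int locked;              /* 1 while the (fake) class lock is held */
	int fragmented_zspages;  /* zspages that compaction can still migrate */
	int work_in_hold;        /* migrations done in the current hold */
	int longest_hold;        /* most migrations done under a single hold */
};

void class_lock(struct size_class *c)
{
	assert(!c->locked);
	c->locked = 1;
	c->work_in_hold = 0;
}

void class_unlock(struct size_class *c)
{
	assert(c->locked);
	if (c->work_in_hold > c->longest_hold)
		c->longest_hold = c->work_in_hold;
	c->locked = 0;
}

/* Pretend to migrate one zspage; the class lock must be held. */
void migrate_one_zspage(struct size_class *c)
{
	assert(c->locked);
	c->fragmented_zspages--;
	c->work_in_hold++;
}

/* Lock dropped after every zspage, so allocators can get in between. */
void compact_unlock_per_zspage(struct size_class *c)
{
	for (;;) {
		class_lock(c);
		if (c->fragmented_zspages == 0) {
			class_unlock(c);
			break;
		}
		migrate_one_zspage(c);
		class_unlock(c);
	}
}

/* Lock held until the whole class has been compacted. */
void compact_single_hold(struct size_class *c)
{
	class_lock(c);
	while (c->fragmented_zspages > 0)
		migrate_one_zspage(c);
	class_unlock(c);
}
```

In this model the per-zspage variant never does more than one migration
under the lock, while the single-hold variant does all of them in one
critical section, which is the latency concern raised above.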

> 
> > I will review remain parts tomorrow(I hope) but what I want to say
> > before going sleep is:
> > 
> > I like the idea but still have a concern about the lack of fragmented zspages
> > during memory pressure because auto-compaction will prevent fragmentation
> > most of the time. Surely, using fragmented space as a buffer under heavy memory
> > pressure is not the intended design, so it could be fragile, but I'm afraid
> > this feature might accelerate it, end up causing a problem, and
> > change the current behavior of zram as swap.
> 
> Well, it's nearly impossible to prove anything with the numbers obtained
> during some particular case. I agree that fragmentation can be both
> 'good' (depending on IO pattern) and 'bad'.

Yes, it's not easy, and I believe a few artificial tests are not enough
to prove there is no regression, but we don't have any choice.
Actually, I think this patchset does make sense. Although it might have
a problem under heavy memory pressure due to the lack of fragmented space,
I think we should go with this patchset and fix the problem another way
(e.g. memory pooling rather than relying on the luck of fragmentation).
But I need something to justify taking the risk. That's why I asked for
numbers, even though they are not complete. Covering at least one case is
better than none. :)

> 
> 
> Auto-compaction of IDLE zram devices certainly makes sense, when system
> is getting low on memory. zram devices are not always 'busy', serving
> heavy IO. There may be N idle zram devices simply sitting and wasting
> memory; or being 'moderately' busy; so compaction will not cause any
> significant slow down there.
> 
> Auto-compaction of BUSY zram devices is less `desired', of course;
> but not entirely terrible I think (zs_can_compact() can help here a
> lot).

My concern is not the compaction overhead but the higher memory footprint
consumed by zram in reserved memory.
It might hang the system if zram uses up the system's reserved memory with
ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
to use more memory with incompressible pages or file-backed pages,
so zram-swap can use more reserved memory. We need to evaluate it, I think.

> 
> Just an idea
> we can move shrinker registration from zsmalloc to zram. zram will be
> able to STOP (or forbid) any shrinker activities while it [zram] serves
> IO requests (or has requests in its request_queue).
> 
> But, again, advocating fragmentation is tricky.
> 
> 
> I'll quote from the cover letter
> 
> : zsmalloc in some cases can suffer from a notable fragmentation and
> : compaction can release some considerable amount of memory. The problem
> : here is that currently we fully rely on user space to perform compaction
> : when needed. However, performing zsmalloc compaction is not always an
> : obvious thing to do. For example, suppose we have a `idle' fragmented
> : (compaction was never performed) zram device and system is getting low
> : on memory due to some 3rd party user processes (gcc LTO, or firefox, etc.).
> : It's quite unlikely that user space will issue zpool 

Re: Suspicious RCU usage at boot w/ arm ipi trace events?

2015-06-17 Thread Stephen Boyd
On 06/16/2015 09:46 PM, Paul E. McKenney wrote:
> On Tue, Jun 16, 2015 at 05:41:29PM -0700, Stephen Boyd wrote:
>>
>> The tracepoint 'trace_ipi_entry' in handle_IPI() is using RCU and we
>> haven't called irq_enter() yet at that point. Does this tracepoint need
>> to have _rcuidle() added to it?
> Yes, I believe that would fix this problem.
>

Ok... here's the patch. I see the problem on my device and applying this
patch fixes it.

8<

From: Stephen Boyd 
Subject: [PATCH] ARM: smp: Silence suspicious RCU usage with ipi tracepoints

John Stultz reports an RCU splat on boot with ARM ipi trace
events enabled.

===
[ INFO: suspicious RCU usage. ]
4.1.0-rc7-00033-gb5bed2f #153 Not tainted
---
include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc7-00033-gb5bed2f #153
Hardware name: Qualcomm (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x70/0xbc)
[] (dump_stack) from [] (handle_IPI+0x428/0x604)
[] (handle_IPI) from [] (gic_handle_irq+0x54/0x5c)
[] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)
Exception stack(0xc09f3f48 to 0xc09f3f90)
3f40:   0001 0001  c09f73b8 c09f4528 c0a5de9c
3f60: c076b4f0   c09ef108 c0a5cec1 0001  c09f3f90
3f80: c026bf60 c0210ab8 2113 
[] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c)
[] (arch_cpu_idle) from [] (cpu_startup_entry+0x2c0/0x5dc)
[] (cpu_startup_entry) from [] (start_kernel+0x358/0x3c4)
[] (start_kernel) from [<8020807c>] (0x8020807c)

At this point in the IPI handling path we haven't called
irq_enter() yet, so RCU doesn't know that we're about to exit
idle and properly warns that we're using RCU from an idle CPU.
Use trace_ipi_entry_rcuidle() instead of trace_ipi_entry() so
that RCU is informed about our exit from idle.

Reported-by: John Stultz 
Fixes: 365ec7b17327 "ARM: add IPI tracepoints"
Signed-off-by: Stephen Boyd 
---
 arch/arm/kernel/smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 13a91d390832..03eb8a446dca 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -589,7 +589,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 	struct pt_regs *old_regs = set_irq_regs(regs);
 
 	if ((unsigned)ipinr < NR_IPI) {
-		trace_ipi_entry(ipi_types[ipinr]);
+		trace_ipi_entry_rcuidle(ipi_types[ipinr]);
 		__inc_irq_stat(cpu, ipi_irqs[ipinr]);
 	}
 
@@ -648,7 +648,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 	}
 
 	if ((unsigned)ipinr < NR_IPI)
-		trace_ipi_exit(ipi_types[ipinr]);
+		trace_ipi_exit_rcuidle(ipi_types[ipinr]);
 	set_irq_regs(old_regs);
 }
 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v7 05/14] crypto: marvell/CESA: add TDMA support

2015-06-17 Thread Herbert Xu
On Wed, Jun 17, 2015 at 05:34:02PM +0200, Boris Brezillon wrote:
>
> I can check for that too, but note that it doesn't prevent one from
> providing different scatterlist structures pointing to the same memory
> region.

Pointing to the same memory should be fine, it's the act of passing
the same SG list to dma_map_sg twice that's wrong because dma_map_sg
will modify the SG list.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v7 05/14] crypto: marvell/CESA: add TDMA support

2015-06-17 Thread Herbert Xu
On Wed, Jun 17, 2015 at 05:58:28PM +0200, Boris Brezillon wrote:
>
> Here is an incremental patch [1], please let me know if something else
> is missing.

Looks good.  Thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 2/2] locking/qrwlock: Don't contend with readers when setting _QW_WAITING

2015-06-17 Thread Waiman Long

On 06/16/2015 02:02 PM, Will Deacon wrote:

On Mon, Jun 15, 2015 at 11:24:03PM +0100, Waiman Long wrote:

The current cmpxchg() loop in setting the _QW_WAITING flag for writers
in queue_write_lock_slowpath() will contend with incoming readers
causing possibly extra cmpxchg() operations that are wasteful. This
patch changes the code to do a byte cmpxchg() to eliminate contention
with new readers.

A multithreaded microbenchmark running a 5M read_lock/write_lock loop
on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel
with the qspinlock patch has the following execution times (in ms)
with and without the patch:

With R:W ratio = 5:1

Threads    w/o patch    with patch    % change
-------    ---------    ----------    --------
   2           990           895        -9.6%
   3          2136          1912       -10.5%
   4          3166          2830       -10.6%
   5          3953          3629        -8.2%
   6          4628          4405        -4.8%
   7          5344          5197        -2.8%
   8          6065          6004        -1.0%
   9          6826          6811        -0.2%
  10          7599          7599         0.0%
  15          9757          9766        +0.1%
  20         13767         13817        +0.4%

With small number of contending threads, this patch can improve
locking performance by up to 10%. With more contending threads,
however, the gain diminishes.

Signed-off-by: Waiman Long
---
  kernel/locking/qrwlock.c |   28 
  1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index d7d7557..559198a 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -22,6 +22,26 @@
  #include
  #include

+/*
+ * This internal data structure is used for optimizing access to some of
+ * the subfields within the atomic_t cnts.
+ */
+struct __qrwlock {
+	union {
+		atomic_t cnts;
+		struct {
+#ifdef __LITTLE_ENDIAN
+			u8 wmode;	/* Writer mode   */
+			u8 rcnts[3];	/* Reader counts */
+#else
+			u8 rcnts[3];	/* Reader counts */
+			u8 wmode;	/* Writer mode   */
+#endif
+		};
+	};
+	arch_spinlock_t lock;
+};
+
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
@@ -109,10 +129,10 @@ void queue_write_lock_slowpath(struct qrwlock *lock)
 	 * or wait for a previous writer to go away.
 	 */
 	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if (!(cnts & _QW_WMASK) &&
-		    (atomic_cmpxchg(&lock->cnts, cnts,
-				    cnts | _QW_WAITING) == cnts))
+		struct __qrwlock *l = (struct __qrwlock *)lock;
+
+		if (!READ_ONCE(l->wmode) &&
+		   (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
 			break;

Maybe you could also update the x86 implementation of queue_write_unlock
to write the wmode field instead of casting to u8 *?

Will


The queue_write_unlock() function is in the header file. I don't want to 
expose the internal structure to other files.
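
Why the byte-wide cmpxchg() avoids contention with readers can be shown with
a small userspace model. This is a sketch under assumptions, not the kernel
code: it uses the GCC/Clang __atomic builtins instead of the kernel's
atomic_t and cmpxchg(), the model_* names are hypothetical, and mixing 8-bit
and 32-bit atomic accesses to the same word is not portable ISO C.

```c
#include <assert.h>
#include <stdint.h>

#define QW_WAITING 1u            /* a writer is waiting */
#define QW_WMASK   0xffu         /* writer byte */
#define QR_SHIFT   8             /* reader count starts at bit 8 */
#define QR_BIAS    (1u << QR_SHIFT)

/* Userspace model of the overlay; the layout mirrors the patch. */
struct model_qrwlock {
	union {
		uint32_t cnts;
		struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
			uint8_t wmode;     /* writer mode   */
			uint8_t rcnts[3];  /* reader counts */
#else
			uint8_t rcnts[3];  /* reader counts */
			uint8_t wmode;     /* writer mode   */
#endif
		};
	};
};

/* A reader arriving bumps the reader count in the upper 24 bits. */
void model_reader_arrive(struct model_qrwlock *l)
{
	__atomic_fetch_add(&l->cnts, QR_BIAS, __ATOMIC_ACQUIRE);
}

/* Word-wide attempt: fails if any part of cnts changed since it was read. */
int model_set_waiting_word(struct model_qrwlock *l, uint32_t seen)
{
	if (seen & QW_WMASK)
		return 0;
	return __atomic_compare_exchange_n(&l->cnts, &seen, seen | QW_WAITING,
					   0, __ATOMIC_ACQUIRE,
					   __ATOMIC_RELAXED);
}

/* Byte-wide attempt: only the writer byte has to be unchanged. */
int model_set_waiting_byte(struct model_qrwlock *l)
{
	uint8_t expected = 0;

	return __atomic_compare_exchange_n(&l->wmode, &expected,
					   (uint8_t)QW_WAITING, 0,
					   __ATOMIC_ACQUIRE,
					   __ATOMIC_RELAXED);
}
```

If a reader arrives between the writer's read of cnts and its compare-and-swap,
the word-wide attempt fails and has to be retried, while the byte-wide attempt
still succeeds because the writer byte itself did not change.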


Cheers,
Longman




Re: [PATCH] mm: use srcu for shrinkers

2015-06-17 Thread Davidlohr Bueso
On Wed, 2015-06-17 at 09:47 +0200, Michal Hocko wrote:
> On the other hand using srcu is a neat idea. Shrinkers only need the
> existence guarantee when racing with unregister. Register even shouldn't
> be that interesting because such a shrinker wouldn't have much to
> shrink anyway so we can safely miss it AFAIU. With the srcu read lock
> we can finally get rid of the try_lock. I do not think you need an
> ugly spin_is_locked as the replacement though. We have the existence
> guarantee and that should be sufficient.

So the reason for the spin_is_locked check was that I was concerned
about new reader(s) that come in while doing the registry. Currently
this is forbidden by the trylock and fake-ish retry. But yes, perhaps I
was being overly safe and we shouldn't be blocking the reclaim simply
because a shrinker is registering. And it would be cleaner to get rid of
the whole retry idea and just use rcu guarantees.

This is probably a little late in the game to try to push for 4.2, so
I'll send a v2 with any other updates that might come up once the merge
window closes.

Thanks,
Davidlohr



Re: [PATCH v3 1/2] locking/qrwlock: Better optimization for interrupt context readers

2015-06-17 Thread Waiman Long

On 06/16/2015 08:17 AM, Will Deacon wrote:

Hi Waiman,

On Mon, Jun 15, 2015 at 11:24:02PM +0100, Waiman Long wrote:

The qrwlock is fair in the process context, but becomes unfair when
in the interrupt context to support use cases like the tasklist_lock.

The current code isn't that well-documented on what happens when
in the interrupt context. The rspin_until_writer_unlock() will only
spin if the writer has gotten the lock. If the writer is still in the
waiting state, the increment in the reader count will cause the writer
to remain in the waiting state and the new interrupt context reader
will get the lock and return immediately. The current code, however,
does an additional read of the lock value which is not necessary as the
information is already available from the fast path. This may sometimes
cause an additional cacheline load when the lock is highly contended.

This patch passes the lock value information gotten in the fast path
to the slow path to eliminate the additional read. It also clarifies the
action for the interrupt context readers more explicitly.

Signed-off-by: Waiman Long
---
  include/asm-generic/qrwlock.h |4 ++--
  kernel/locking/qrwlock.c  |   14 --
  2 files changed, 10 insertions(+), 8 deletions(-)

[...]


diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 00c12bb..d7d7557 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -43,22 +43,24 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
   * queue_read_lock_slowpath - acquire read lock of a queue rwlock
   * @lock: Pointer to queue rwlock structure
   */
-void queue_read_lock_slowpath(struct qrwlock *lock)
+void queue_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 {
-	u32 cnts;
-
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
 	if (unlikely(in_interrupt())) {
 		/*
-		 * Readers in interrupt context will spin until the lock is
-		 * available without waiting in the queue.
+		 * Readers in interrupt context will get the lock immediately
+		 * if the writer is just waiting (not holding the lock yet)
+		 * or they will spin until the lock is available without
+		 * waiting in the queue.
 		 */
-		cnts = smp_load_acquire((u32 *)&lock->cnts);
+		if ((cnts & _QW_WMASK) != _QW_LOCKED)
+			return;

I really doubt the check here is gaining you any performance, given
rspin_until_writer_unlock does the same check immediately and should be
inlined. Just dropping the acquire and passing cnts through should be
sufficient.


Yes, you are right. I can just pass cnts to 
rspin_until_writer_unlock() and be done with it.
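
The agreed simplification, passing the fast-path value straight into the spin
loop, can be sketched in userspace C as follows. This is a model under
assumptions: model_rspin_until_writer_unlock() is a hypothetical name, the
lock is a bare 32-bit word, and the GCC/Clang __atomic_load_n builtin stands
in for the kernel's smp_load_acquire().

```c
#include <assert.h>
#include <stdint.h>

#define QW_WMASK  0xffu   /* writer byte */
#define QW_LOCKED 0xffu   /* a writer holds the lock */

/*
 * Model of the interrupt-context reader path: spin only while a writer
 * actually holds the lock, starting from the value the fast path saw.
 * Returns how many times the lock word had to be re-read.
 */
unsigned int model_rspin_until_writer_unlock(const uint32_t *lock_word,
					     uint32_t cnts)
{
	unsigned int rereads = 0;

	while ((cnts & QW_WMASK) == QW_LOCKED) {
		cnts = __atomic_load_n(lock_word, __ATOMIC_ACQUIRE);
		rereads++;
	}
	return rereads;
}
```

When the fast path saw only a waiting writer, the loop body never runs and
the lock word is not read again, so the separate early-return check adds
nothing.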


Cheers,
Longman


Re: kexec_load(2) bypasses signature verification

2015-06-17 Thread Dave Young
On 06/15/15 at 04:01pm, Theodore Ts'o wrote:
> On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
> > The bits that actually read Secure Boot state out of the UEFI
> > variables, and apply protections to the machine to avoid compromise
> > under the SB threat model.  Things like disabling the old kexec...
> 
> I don't have any real interest in using Secure Boot, but I *am*
> interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
> have something similar to what we have with signed modules in terms of
> CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
> KEXEC_VERIFY_SIG.  This would mean creating a separate flag
> independent of the one Linus suggested for Secure Boot, but since we
> have one for signed modules, we do have precedent for this sort of
> thing.

Agree and vote for this way as I replied in another email about
CONFIG_KEXEC_VERIFY_SIG_FORCE.

Thanks
Dave


Re: [RFC PATCH 00/12] mm: mirrored memory support for page buddy allocations

2015-06-17 Thread Xishi Qiu
On 2015/6/16 17:46, Vlastimil Babka wrote:

> On 06/16/2015 10:17 AM, Xishi Qiu wrote:
>> On 2015/6/16 15:53, Vlastimil Babka wrote:
>>
>>> On 06/04/2015 02:54 PM, Xishi Qiu wrote:

>>>> I think adding a new migratetype is better and easier than a new zone, so I use
>>>
>>> If the mirrored memory is in a single reasonably compact (no large holes) range
>>> (per NUMA node) and won't dynamically change its size, then zone might be a
>>> better option. For one thing, it will still allow distinguishing movable and
>>> unmovable allocations within the mirrored memory.
>>>
>>> We had enough fun with MIGRATE_CMA and all kinds of checks it added to allocator
>>> hot paths, and even CMA is now considering moving to a separate zone.
>>>
>>
>> Hi, how about the problem of this case:
>> e.g. node 0: 0-4G(dma and dma32)
>>  node 1: 4G-8G(normal), 8-12G(mirror), 12-16G(normal),
>> so more than one normal zone in a node? or normal zone just span the mirror zone?
> 
> Normal zone can span the mirror zone just fine. However, it will result in zone
> scanners such as compaction to skip over the mirror zone inefficiently. Hmm...
> 

Hi Vlastimil,

If there are many mirror regions in one node, then there will be many holes in
the normal zone. Is this fine?

Thanks,
Xishi Qiu

> 
> .
> 





Re: [RFC 3/4] mm, thp: try fault allocations only if we expect them to succeed

2015-06-17 Thread David Rientjes
On Mon, 11 May 2015, Vlastimil Babka wrote:

> Since we track THP availability for khugepaged THP collapses, we can use it
> also for page fault THP allocations. If khugepaged with its sync compaction
> is not able to allocate a hugepage, then it's unlikely that the less involved
> attempt on page fault would succeed, and the cost could be higher than THP
> benefits. Also clear the THP availability flag if we do attempt and fail to
> allocate during page fault, and set the flag if we are freeing a large enough
> page from any context. The latter doesn't include merges, as that's a fast
> path and unlikely to make much difference.
> 

That depends on how long {scan,alloc}_sleep_millisecs are, so if 
khugepaged fails to allocate a hugepage on all nodes, it sleeps for 
alloc_sleep_millisecs (default 60s), and then there's immediate memory 
freeing, thp page faults don't happen again for 60s.  That's scary to me 
when thp_avail_nodes is clear, a large process terminates, and then 
immediately starts back up.  None of its memory is faulted as thp and 
depending on how large it is, khugepaged may fail to allocate hugepages 
when it wakes back up so it never scans (the only reason why 
thp_avail_nodes was clear before it terminated originally).

I'm not sure that approach can work unless the inference of whether a 
hugepage can be allocated at a given time is a very good indicator of 
whether a hugepage can be allocated alloc_sleep_millisecs later, and I'm 
afraid that's not the case.

I'm very happy that you're looking at thp fault latency and the role that 
khugepaged can play in accepting responsibility for defragmentation, 
though.  It's an area that has caused me some trouble lately and I'd like 
to be able to improve.

We see an immediate benefit when experimenting with doing synchronous 
memory compactions of all memory every 15s.  That's done using a cronjob 
rather than khugepaged, but the idea is the same.

What would your thoughts be about doing something radical like

 - having khugepaged do synchronous memory compaction of all memory at
   regular intervals,

 - track how many pageblocks are free for thp memory to be allocated,

 - terminate collapsing if free pageblocks are below a threshold,

 - trigger a khugepaged wakeup at page fault when that number of 
   pageblocks falls below a threshold,

 - determine the next full sync memory compaction based on how many
   pageblocks were defragmented on the last wakeup, and

 - avoid memory compaction for all thp page faults.

(I'd ignore what is actually the responsibility of khugepaged and what is 
done in task work at this time.)


Re: kexec_load(2) bypasses signature verification

2015-06-17 Thread Dave Young
On 06/16/15 at 09:47pm, Vivek Goyal wrote:
> On Tue, Jun 16, 2015 at 08:32:37PM -0500, Eric W. Biederman wrote:
> > Vivek Goyal  writes:
> > 
> > > On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
> > >> 
> > >> Adding Vivek as he is the one who implemented kexec_file_load.
> > >> I was hoping he would respond to this thread, and it looks like he
> > >> simply has not ever been Cc'd.
> > >> 
> > >> Theodore Ts'o  writes:
> > >> 
> > >> > On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
> > >> >> The bits that actually read Secure Boot state out of the UEFI
> > >> >> variables, and apply protections to the machine to avoid compromise
> > >> >> under the SB threat model.  Things like disabling the old kexec...
> > >> >
> > >> > I don't have any real interest in using Secure Boot, but I *am*
> > >> > interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
> > >> > have something similar to what we have with signed modules in terms of
> > >> > CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
> > >> > KEXEC_VERIFY_SIG.  This would mean creating a separate flag
> > >> > independent of the one Linus suggested for Secure Boot, but since we
> > >> > have one for signed modules, we do have precedent for this sort of
> > >> > thing.
> > >> 
> > >> My overall request with respect to kexec has been that we implement
> > >> things that make sense outside of the bizarre threat model of the Linux
> > >> folks who were talking about secure boot.
> > >> 
> > >> I have not navigated the labyrinth of config options but having a way to
> > >> only boot signed things with kexec seems a completely sensible way to
> > >> operate in the context of signed images.
> > >> 
> > >> I don't know how much that will help given that actors with sufficient
> > >> resources have demonstrated the ability to steal private keys, but
> > >> assuming binary signing is an effective technique (or why else do it)
> > >> then having an option to limit kexec to only loading signed images seems
> > >> sensible.
> > >
> > > I went through the mail chain on web and here are my thoughts.
> > >
> > > - So yes, upstream does not have the logic which automatically disables
> > >   the old syscall (kexec_load()) on secureboot systems. Distributions
> > >   carry those patches.
> > >
> > > - This KEXEC_VERIFY_SIG option only controls the behavior for
> > >   kexec_file_load() syscall and is not meant to directly affect any
> > >   behavior of old syscall (kexec_load()). I think I should have named
> > >   it KEXEC_FILE_VERIFY_SIG. Though help text makes it clear.
> > >   "Verify kernel signature during kexec_file_load() syscall".
> > >
> > > - I think disabling old system call if KEXEC_VERIFY_SIG() is set
> > >   will break existing setup which use old system call by default, except
> > >   the case of secureboot system. And old syscall path is well tested
> > >   and new syscall might not be in a position to support all the corner
> > >   cases, at least as of now.
> > >
> > > Ted, 
> > >
> > > So looks like you are looking for a system/option where you just want to
> > > always make use of kexec_file_load() and disable kexec_load(). This sounds
> > > like you want a kernel where kexec_load() is compiled out and you want
> > > only kexec_file_load() in.
> > >
> > > Right now one can't do that because kexec_file_load() depends on
> > > CONFIG_KEXEC option.
> > >
> > > I am wondering that how about making CONFIG_KEXEC_FILE_LOAD independent
> > > of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
> > > only signed kernel can be kexeced on that system.
> > >
> > > This should gel well with long term strategy of deprecating kexec_load()
> > > at some point of time when kexec_file_load() is ready to completely
> > > replace it.
> > 
> > Interesting.
> > 
> > I suspect that what we want is to have CONFIG_KEXEC for the core
> > and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.
> > 
> > That should make it trivially easy to disable the kexec_load system call
> > in cases where people care.
> 
> Or, we could create another option CONFIG_KEXEC_CORE/CONFIG_KEXEC_COMMON
> which will be automatically selected when either CONFIG_KEXEC or
> CONFIG_KEXEC_FILE are selected.
> 
> All common code can go under this option and rest can go under respective
> config options.
> 
> That way, those who have CONFIG_KEXEC=y in old config files will not be
> broken. They don't have to learn about new options at all.

Or simply add a new config option KEXEC_VERIFY_SIG_FORCE, so we can return
error in kexec_load and print some error message.
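
As a rough illustration of that behaviour, here is a userspace sketch of the
check. It is not kernel code: struct kexec_policy, its verify_sig_force field
and model_kexec_load_permitted() are hypothetical names, loosely modeled on
how CONFIG_MODULE_SIG_FORCE gates unsigned modules, and the choice of -EPERM
and the message text are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy knob standing in for KEXEC_VERIFY_SIG_FORCE. */
struct kexec_policy {
	bool verify_sig_force;
};

/*
 * Gate for the old kexec_load() path, which cannot verify signatures.
 * Returns 0 if the call may proceed, or -EPERM if it must be refused.
 */
int model_kexec_load_permitted(const struct kexec_policy *p)
{
	if (p->verify_sig_force) {
		fprintf(stderr,
			"kexec_load: refused, signature verification is enforced\n");
		return -EPERM;
	}
	return 0;
}
```

With the knob off, existing kexec_load() users keep working; with it on, only
the kexec_file_load() path remains available.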

Thanks
Dave


Re: [PATCH 01/44] kernel: Add support for poweroff handler call chain

2015-06-17 Thread Stephen Boyd
On 10/06/2014 10:28 PM, Guenter Roeck wrote:
> Various drivers implement architecture and/or device specific means to
> remove power from the system.  For the most part, those drivers set the
> global variable pm_power_off to point to a function within the driver.
>
> This mechanism has a number of drawbacks.  Typically only one scheme
> to remove power is supported (at least if pm_power_off is used).
> At least in theory there can be multiple means to remove power, some of
> which may be less desirable. For example, some mechanisms may only
> power off the CPU or the CPU card, while another may power off the
> entire system.  Others may really just execute a restart sequence
> or drop into the ROM monitor. Using pm_power_off can also be racy
> if the function pointer is set from a driver built as module, as the
> driver may be in the process of being unloaded when pm_power_off is
> called. If there are multiple poweroff handlers in the system, removing
> a module with such a handler may inadvertently reset the pointer to
> pm_power_off to NULL, leaving the system with no means to remove power.
>
> Introduce a system poweroff handler call chain to solve the described
> problems.  This call chain is expected to be executed from the
> architecture specific machine_power_off() function.  Drivers providing
> system poweroff functionality are expected to register with this call chain.
> By using the priority field in the notifier block, callers can control
> poweroff handler execution sequence and thus ensure that the poweroff
> handler with the optimal capabilities to remove power for a given system
> is called first.

What happened to this series? I want to add shutdown support to my
platform and I need to write a register on the PMIC in one driver to
configure it for shutdown instead of restart and then write an MMIO
register to tell the PMIC to actually do the shutdown in another driver.
It seems that the notifier solves this case for me, albeit with the
slight complication that I need to order the two with some priority.

I'm also considering putting the PMIC configuration part into the reboot
notifier chain, because it only does things to change the configuration
and not actually any shutdown/restart itself. That removes any
requirement to get the priority of notifiers right. This series will
still be useful for the MMIO register that needs to be toggled though.
Right now I have to assign pm_power_off or hook the reboot notifier with
a different priority to make this work.
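
The priority-ordered call chain discussed above can be sketched in plain
userspace C. This is a minimal illustration only, not the kernel notifier
API: chain_register(), chain_call(), pmic_configure() and mmio_power_off()
are hypothetical names, and the sketch assumes (as with kernel notifier
chains) that higher-priority handlers run first.

```c
#include <stddef.h>

#define CHAIN_DONE 0    /* keep walking the chain */
#define CHAIN_STOP 1    /* this handler finished the job, stop here */

/* Hypothetical handler node; stands in for a notifier block. */
struct handler {
    int priority;
    int (*call)(struct handler *self);
    struct handler *next;
};

/* Records the order in which the example handlers ran. */
static char call_log[8];
static int call_log_len;

static void log_call(char tag)
{
    if (call_log_len < (int)sizeof(call_log) - 1)
        call_log[call_log_len++] = tag;
}

/* Insert sorted by descending priority; equal priorities keep registration order. */
static void chain_register(struct handler **head, struct handler *h)
{
    while (*head && (*head)->priority >= h->priority)
        head = &(*head)->next;
    h->next = *head;
    *head = h;
}

/* Walk the chain until a handler returns CHAIN_STOP; return how many ran. */
static int chain_call(struct handler *head)
{
    int calls = 0;

    for (; head; head = head->next) {
        calls++;
        if (head->call(head) == CHAIN_STOP)
            break;
    }
    return calls;
}

/* Stand-in for "configure the PMIC for shutdown instead of restart". */
static int pmic_configure(struct handler *self)
{
    (void)self;
    log_call('C');
    return CHAIN_DONE;
}

/* Stand-in for "write the MMIO register that actually powers off". */
static int mmio_power_off(struct handler *self)
{
    (void)self;
    log_call('P');
    return CHAIN_STOP;
}
```

Registering the MMIO handler with a lower priority than the PMIC handler
makes the configuration step run first regardless of registration order,
which is the ordering requirement described in the reply.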

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] regulator: qcom_spmi: Fix calculating number of voltages

2015-06-17 Thread Stephen Boyd
On 06/17/2015 05:50 PM, Axel Lin wrote:
> n /= range->step_uV + 1; is equivalent to n /= (range->step_uV + 1);
> which is wrong. Fix it.
>
> Signed-off-by: Axel Lin 
> ---

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3 4/4] gpio: brcmstb: support wakeup from S5 cold boot

2015-06-17 Thread Gregory Fong
For wake from S5, we need to:
- register a reboot handler
- set wakeup capability before requesting IRQ so wakeup count is
  incremented
- mask all GPIO IRQs and clear any pending interrupts during driver
  probe, since no driver will yet be registered to handle any IRQs
  carried over from boot at that time, and it's possible that the
  booted kernel does not request the same IRQ anyway.

This means that /sys/.../power/wakeup_count is valid at boot time, and
we can properly account for S5 wakeup stats. e.g.:

  ### After waking from S5 from a GPIO key
  # cat /sys/bus/platform/drivers/brcmstb-gpio/f04172c0.gpio/power/wakeup
  enabled
  # cat /sys/bus/platform/drivers/brcmstb-gpio/f04172c0.gpio/power/wakeup_count
  1
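
The reboot-handler step can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the driver code: struct
notifier, struct gpio_priv, the EV_* constants and gpio_reboot() are
hypothetical stand-ins for the kernel's notifier block, the driver's private
data and the reboot action codes. It shows how a callback that receives only
a pointer to the embedded notifier can recover the enclosing structure and
arm wakeup only for a power-off event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical action codes; the real kernel values are not assumed here. */
enum { EV_RESTART = 1, EV_HALT, EV_POWER_OFF };

struct notifier {
    int (*call)(struct notifier *nb, unsigned long action);
};

struct gpio_priv {
    bool wake_armed;                  /* models "wake IRQ enabled" */
    struct notifier reboot_notifier;  /* deliberately not the first member */
};

static int gpio_reboot(struct notifier *nb, unsigned long action)
{
    struct gpio_priv *priv =
        container_of(nb, struct gpio_priv, reboot_notifier);

    /* Only a power-off should arm the GPIO block as a wakeup source. */
    if (action == EV_POWER_OFF)
        priv->wake_armed = true;

    return 0;
}
```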

Signed-off-by: Gregory Fong 
---
New in v3.

 drivers/gpio/gpio-brcmstb.c | 56 -
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/gpio/gpio-brcmstb.c b/drivers/gpio/gpio-brcmstb.c
index 141509b..dedb35c 100644
--- a/drivers/gpio/gpio-brcmstb.c
+++ b/drivers/gpio/gpio-brcmstb.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define GIO_BANK_SIZE   0x20
 #define GIO_ODEN(bank)  (((bank) * GIO_BANK_SIZE) + 0x00)
@@ -48,6 +49,7 @@ struct brcmstb_gpio_priv {
int gpio_base;
bool can_wake;
int parent_wake_irq;
+   struct notifier_block reboot_notifier;
 };
 
 #define MAX_GPIO_PER_BANK   32
@@ -167,10 +169,9 @@ static int brcmstb_gpio_irq_set_type(struct irq_data *d, unsigned int type)
return 0;
 }
 
-static int brcmstb_gpio_irq_set_wake(struct irq_data *d, unsigned int enable)
+static int __brcmstb_gpio_irq_set_wake(struct brcmstb_gpio_priv *priv,
+   unsigned int enable)
 {
-   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
-   struct brcmstb_gpio_priv *priv = brcmstb_gpio_gc_to_priv(gc);
int ret = 0;
 
/*
@@ -188,6 +189,14 @@ static int brcmstb_gpio_irq_set_wake(struct irq_data *d, unsigned int enable)
return ret;
 }
 
+static int brcmstb_gpio_irq_set_wake(struct irq_data *d, unsigned int enable)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct brcmstb_gpio_priv *priv = brcmstb_gpio_gc_to_priv(gc);
+
+   return __brcmstb_gpio_irq_set_wake(priv, enable);
+}
+
 static irqreturn_t brcmstb_gpio_wake_irq_handler(int irq, void *data)
 {
struct brcmstb_gpio_priv *priv = data;
@@ -247,6 +256,19 @@ static void brcmstb_gpio_irq_handler(unsigned int irq, struct irq_desc *desc)
chained_irq_exit(chip, desc);
 }
 
+static int brcmstb_gpio_reboot(struct notifier_block *nb,
+   unsigned long action, void *data)
+{
+   struct brcmstb_gpio_priv *priv =
+   container_of(nb, struct brcmstb_gpio_priv, reboot_notifier);
+
+   /* Enable GPIO for S5 cold boot */
+   if (action == SYS_POWER_OFF)
+   __brcmstb_gpio_irq_set_wake(priv, 1);
+
+   return NOTIFY_DONE;
+}
+
 /* Make sure that the number of banks matches up between properties */
 static int brcmstb_gpio_sanity_check_banks(struct device *dev,
struct device_node *np, struct resource *res)
@@ -286,6 +308,12 @@ static int brcmstb_gpio_remove(struct platform_device *pdev)
if (ret)
dev_err(>dev, "gpiochip_remove fail in cleanup\n");
}
+   if (priv->reboot_notifier.notifier_call) {
+   ret = unregister_reboot_notifier(>reboot_notifier);
+   if (ret)
+   dev_err(>dev,
+   "failed to unregister reboot notifier\n");
+   }
return ret;
 }
 
@@ -343,7 +371,16 @@ static int brcmstb_gpio_irq_setup(struct platform_device *pdev,
dev_warn(dev,
"Couldn't get wake IRQ - GPIOs will not be able to wake from sleep");
} else {
-   int err = devm_request_irq(dev, priv->parent_wake_irq,
+   int err;
+
+   /*
+* Set wakeup capability before requesting wakeup
+* interrupt, so we can process boot-time "wakeups"
+* (e.g., from S5 cold boot)
+*/
+   device_set_wakeup_capable(dev, true);
+   device_wakeup_enable(dev);
+   err = devm_request_irq(dev, priv->parent_wake_irq,
brcmstb_gpio_wake_irq_handler, 0,
"brcmstb-gpio-wake", priv);
 
@@ -352,8 +389,9 @@ static int brcmstb_gpio_irq_setup(struct platform_device *pdev,
return err;
}
 
-   device_set_wakeup_capable(dev, true);
-   device_wakeup_enable(dev);
+   priv->reboot_notifier.notifier_call =
+ 

[PATCH v3 3/4] gpio: brcmstb: Add interrupt and wakeup source support

2015-06-17 Thread Gregory Fong
Uses the gpiolib irqchip helpers.  For this to work, the irq setup
function is called once per bank instead of once per device.  Note
that all known uses of this block have a BCM7120 L2 interrupt
controller as a parent.  Supports interrupts for all GPIOs.

In the IRQ handler, we check for raised IRQs for invalid GPIOs and
warn (ratelimited) if they're encountered.

Also, several drivers (e.g. gpio-keys) allow for GPIOs to be
configured as wakeup sources, and this GPIO controller supports that
through a separate interrupt path.

The de-facto standard DT property "wakeup-source" is checked, since
that indicates whether the GPIO controller hardware can wake.  Uses
the IRQCHIP_MASK_ON_SUSPEND irq_chip flag because UPG GIO doesn't have
any of its own wakeup source configuration.

Aside regarding gpiolib irqchip helpers: It wasn't obvious (to me)
that you can have multiple chained irqchips and associated IRQ domains
for a single parent IRQ, and as long as the xlate function is written
correctly, a GPIO IRQ request ends up checking the correct domain and
will get associated with the correct IRQ.  What helps make this clear
is to read
  drivers/gpio/gpiolib-of.c:
   - of_gpiochip_find_and_xlate()
   - of_get_named_gpiod_flags()
  drivers/gpio/gpiolib.c:
   - gpiochip_find()

Signed-off-by: Gregory Fong 
---
v3:
- combine commits to add interrupt support and allow GPIOs to be wakeup sources
- change to use the gpiolib irqchip helpers, reducing unnecessary code
  duplication.

 drivers/gpio/Kconfig|   1 +
 drivers/gpio/gpio-brcmstb.c | 263 +++-
 2 files changed, 258 insertions(+), 6 deletions(-)

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 5f79b7f..f723c7e 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -131,6 +131,7 @@ config GPIO_BRCMSTB
default y if ARCH_BRCMSTB
depends on OF_GPIO && (ARCH_BRCMSTB || COMPILE_TEST)
select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
help
  Say yes here to enable GPIO support for Broadcom STB (BCM7XXX) SoCs.
 
diff --git a/drivers/gpio/gpio-brcmstb.c b/drivers/gpio/gpio-brcmstb.c
index 4630a81..141509b 100644
--- a/drivers/gpio/gpio-brcmstb.c
+++ b/drivers/gpio/gpio-brcmstb.c
@@ -17,6 +17,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #define GIO_BANK_SIZE   0x20
 #define GIO_ODEN(bank)  (((bank) * GIO_BANK_SIZE) + 0x00)
@@ -34,14 +37,17 @@ struct brcmstb_gpio_bank {
struct bgpio_chip bgc;
struct brcmstb_gpio_priv *parent_priv;
u32 width;
+   struct irq_chip irq_chip;
 };
 
 struct brcmstb_gpio_priv {
struct list_head bank_list;
void __iomem *reg_base;
-   int num_banks;
struct platform_device *pdev;
+   int parent_irq;
int gpio_base;
+   bool can_wake;
+   int parent_wake_irq;
 };
 
 #define MAX_GPIO_PER_BANK   32
@@ -63,6 +69,184 @@ brcmstb_gpio_gc_to_priv(struct gpio_chip *gc)
return bank->parent_priv;
 }
 
+static void brcmstb_gpio_set_imask(struct brcmstb_gpio_bank *bank,
+   unsigned int offset, bool enable)
+{
+   struct bgpio_chip *bgc = >bgc;
+   struct brcmstb_gpio_priv *priv = bank->parent_priv;
+   u32 mask = bgc->pin2mask(bgc, offset);
+   u32 imask;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   imask = bgc->read_reg(priv->reg_base + GIO_MASK(bank->id));
+   if (enable)
+   imask |= mask;
+   else
+   imask &= ~mask;
+   bgc->write_reg(priv->reg_base + GIO_MASK(bank->id), imask);
+   spin_unlock_irqrestore(>lock, flags);
+}
+
+/*  IRQ chip functions  */
+
+static void brcmstb_gpio_irq_mask(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct brcmstb_gpio_bank *bank = brcmstb_gpio_gc_to_bank(gc);
+
+   brcmstb_gpio_set_imask(bank, d->hwirq, false);
+}
+
+static void brcmstb_gpio_irq_unmask(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct brcmstb_gpio_bank *bank = brcmstb_gpio_gc_to_bank(gc);
+
+   brcmstb_gpio_set_imask(bank, d->hwirq, true);
+}
+
+static int brcmstb_gpio_irq_set_type(struct irq_data *d, unsigned int type)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+   struct brcmstb_gpio_bank *bank = brcmstb_gpio_gc_to_bank(gc);
+   struct brcmstb_gpio_priv *priv = bank->parent_priv;
+   u32 mask = BIT(d->hwirq);
+   u32 edge_insensitive, iedge_insensitive;
+   u32 edge_config, iedge_config;
+   u32 level, ilevel;
+   unsigned long flags;
+
+   switch (type) {
+   case IRQ_TYPE_LEVEL_LOW:
+   level = 0;
+   edge_config = 0;
+   edge_insensitive = 0;
+   break;
+   case IRQ_TYPE_LEVEL_HIGH:
+   level = mask;
+   edge_config = 0;
+  

[PATCH v3 2/4] dt-bindings: brcmstb-gpio: document properties for wakeup

2015-06-17 Thread Gregory Fong
Some brcmstb GPIO controllers can be used to wake from suspend, so use
the de facto standard property 'wakeup-source' to mark the nodes of
controllers with that capability.

Also document interrupts-extended, which will be used for wakeup
handling because the interrupt parent for the wake IRQ is different
from the regular IRQ.

While we're at it, a few more fixes: We don't actually use the
"interrupt-names" property, so remove it from the listed optional
properties and from the examples.  And since we're modifying the
examples, also follow Brian's suggestions to:
- change #gpio-cells, #interrupt-cells, and brcm,gpio-bank-widths from
  hex to dec
- use phandles

Reviewed-by: Brian Norris 
Signed-off-by: Gregory Fong 
---
v3: Update per Brian's suggestions described in above message.

 .../devicetree/bindings/gpio/brcm,brcmstb-gpio.txt | 35 +-
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt b/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
index 435f1bc..b405b44 100644
--- a/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
+++ b/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
@@ -33,6 +33,13 @@ Optional properties:
 - interrupt-parent:
 phandle of the parent interrupt controller
 
+- interrupts-extended:
+Alternate form of specifying interrupts and parents that allows for
+multiple parents.  This takes precedence over 'interrupts' and
+'interrupt-parent'.  Wakeup-capable GPIO controllers often route their
+wakeup interrupt lines through a different interrupt controller than the
+primary interrupt line, making this property necessary.
+
 - #interrupt-cells:
 Should be <2>.  The first cell is the GPIO number, the second should specify
 flags.  The following subset of flags is supported:
@@ -47,19 +54,33 @@ Optional properties:
 - interrupt-controller:
 Marks the device node as an interrupt controller
 
-- interrupt-names:
-The name of the IRQ resource used by this controller
+- wakeup-source:
+GPIOs for this controller can be used as a wakeup source
 
 Example:
upg_gio: gpio@f040a700 {
-   #gpio-cells = <0x2>;
-   #interrupt-cells = <0x2>;
+   #gpio-cells = <2>;
+   #interrupt-cells = <2>;
compatible = "brcm,bcm7445-gpio", "brcm,brcmstb-gpio";
gpio-controller;
interrupt-controller;
reg = <0xf040a700 0x80>;
-   interrupt-parent = <0xf>;
+   interrupt-parent = <_intc>;
+   interrupts = <0x6>;
+   brcm,gpio-bank-widths = <32 32 32 24>;
+   };
+
+   upg_gio_aon: gpio@f04172c0 {
+   #gpio-cells = <2>;
+   #interrupt-cells = <2>;
+   compatible = "brcm,bcm7445-gpio", "brcm,brcmstb-gpio";
+   gpio-controller;
+   interrupt-controller;
+   reg = <0xf04172c0 0x40>;
+   interrupt-parent = <_aon_intc>;
interrupts = <0x6>;
-   interrupt-names = "upg_gio";
-   brcm,gpio-bank-widths = <0x20 0x20 0x20 0x18>;
+   interrupts-extended = <_aon_intc 0x6>,
+   <_pm_l2_intc 0x5>;
+   wakeup-source;
+   brcm,gpio-bank-widths = <18 4>;
};
-- 
1.9.1



[PATCH v3 1/4] gpio: brcmstb: fix null ptr dereference in driver remove

2015-06-17 Thread Gregory Fong
If a failure occurs during probe, brcmstb_gpio_remove() is called. In
remove, we call platform_get_drvdata(), but at the time of failure in
the probe the driver data hadn't yet been set which leads to a NULL
ptr dereference in the remove's list_for_each.  Call
platform_set_drvdata() and set up the list head right after allocating the
priv struct to avoid the null pointer dereference that could
occur today.  To guard against potential future changes, check for
null pointer in remove.
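
The ordering described above can be modelled in userspace C. This is a
minimal sketch, not the driver: struct fake_device, struct drv_priv,
drv_probe() and drv_remove() are hypothetical stand-ins. It shows why
publishing the private data and initializing the list head before the first
possible failure lets the remove path run safely on a partially probed
device, with a NULL check as a second line of defence.

```c
#include <errno.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

struct drv_priv {
    struct list_node bank_list;   /* banks registered so far */
};

struct fake_device {
    void *drvdata;                /* models platform drvdata */
};

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* Returns the number of banks walked, or -EFAULT if drvdata was never set. */
static int drv_remove(struct fake_device *dev)
{
    struct drv_priv *priv = dev->drvdata;
    struct list_node *pos;
    int banks = 0;

    if (!priv)
        return -EFAULT;

    for (pos = priv->bank_list.next; pos != &priv->bank_list; pos = pos->next)
        banks++;

    return banks;
}

/* Set drvdata and the list head first, so an early failure can clean up. */
static int drv_probe(struct fake_device *dev, struct drv_priv *priv,
                     int fail_early)
{
    dev->drvdata = priv;
    list_init(&priv->bank_list);

    if (fail_early) {
        (void)drv_remove(dev);    /* safe: list is empty but valid */
        return -EINVAL;
    }
    return 0;
}
```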

Reported-by: Tim Ross 
Signed-off-by: Gregory Fong 
---
New in v3.

 drivers/gpio/gpio-brcmstb.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-brcmstb.c b/drivers/gpio/gpio-brcmstb.c
index 7a3cb1f..4630a81 100644
--- a/drivers/gpio/gpio-brcmstb.c
+++ b/drivers/gpio/gpio-brcmstb.c
@@ -87,6 +87,15 @@ static int brcmstb_gpio_remove(struct platform_device *pdev)
struct brcmstb_gpio_bank *bank;
int ret = 0;
 
+   if (!priv) {
+   dev_err(>dev, "called %s without drvdata!\n", __func__);
+   return -EFAULT;
+   }
+
+   /*
+* You can lose return values below, but we report all errors, and it's
+* more important to actually perform all of the steps.
+*/
list_for_each(pos, >bank_list) {
bank = list_entry(pos, struct brcmstb_gpio_bank, node);
ret = bgpio_remove(>bgc);
@@ -143,6 +152,8 @@ static int brcmstb_gpio_probe(struct platform_device *pdev)
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
+   platform_set_drvdata(pdev, priv);
+   INIT_LIST_HEAD(>bank_list);
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
reg_base = devm_ioremap_resource(dev, res);
@@ -153,7 +164,6 @@ static int brcmstb_gpio_probe(struct platform_device *pdev)
priv->reg_base = reg_base;
priv->pdev = pdev;
 
-   INIT_LIST_HEAD(>bank_list);
if (brcmstb_gpio_sanity_check_banks(dev, np, res))
return -EINVAL;
 
@@ -221,8 +231,6 @@ static int brcmstb_gpio_probe(struct platform_device *pdev)
dev_info(dev, "Registered %d banks (GPIO(s): %d-%d)\n",
priv->num_banks, priv->gpio_base, gpio_base - 1);
 
-   platform_set_drvdata(pdev, priv);
-
return 0;
 
 fail:
-- 
1.9.1



[PATCH v3 0/4] GPIO support for BRCMSTB

2015-06-17 Thread Gregory Fong
Adds interrupt support for the GPIO controller (UPG GIO) used on Broadcom's
various BRCMSTB SoCs (BCM7XXX and others).  For all existing hardware, this
block hooks up to the BCM7120 L2 IRQ controller and so will require
CONFIG_BCM7120_L2_IRQ=y.

New in v3:
- fix a null pointer dereference noticed by Tim Ross
- use the gpiolib irqchip helpers as recommended by Linus Walleij
- update device tree bindings as suggested by Brian Norris:
  https://lkml.org/lkml/2015/5/29/802
- add S5 (cold boot) support

The following are not included in this patchset:
- Initial device tree bindings (merged from v1 to GPIO tree)
- Initial GPIO support w/o interrupts (merged from v2 to GPIO tree)
- ARM Kconfig changes (merged from v2 to arm-soc tree)

Previous revisions:
v1: https://lkml.org/lkml/2015/5/6/199
v2: https://lkml.org/lkml/2015/5/28/853

Gregory Fong (4):
  gpio: brcmstb: fix null ptr dereference in driver remove
  dt-bindings: brcmstb-gpio: document properties for wakeup
  gpio: brcmstb: Add interrupt and wakeup source support
  gpio: brcmstb: support wakeup from S5 cold boot

 .../devicetree/bindings/gpio/brcm,brcmstb-gpio.txt |  35 ++-
 drivers/gpio/Kconfig   |   1 +
 drivers/gpio/gpio-brcmstb.c| 321 -
 3 files changed, 341 insertions(+), 16 deletions(-)

-- 
1.9.1



Re: [RFC 2/4] mm, thp: khugepaged checks for THP allocability before scanning

2015-06-17 Thread David Rientjes
On Mon, 11 May 2015, Vlastimil Babka wrote:

> Khugepaged could be scanning for collapse candidates uselessly, if it cannot
> allocate a hugepage in the end. The hugepage preallocation mechanism prevented
> this, but only for !NUMA configurations. It was removed by the previous patch,
> and this patch replaces it with a more generic mechanism.
> 
> The patch introduces a thp_avail_nodes nodemask, which initially assumes that
> hugepage can be allocated on any node. Whenever khugepaged fails to allocate
> a hugepage, it clears the corresponding node bit. Before scanning for collapse
> candidates, it tries to allocate a hugepage on each online node with the bit
> cleared, and set it back on success. It tries to hold on to the hugepage if
> it doesn't hold any other yet. But the assumption is that even if the hugepage
> is freed back, it should be possible to allocate it in near future without
> further reclaim and compaction attempts.
> 
> During the scanning, khugepaged avoids collapsing on nodes with the bit cleared,
> as soon as possible. If no nodes have hugepages available, scanning is skipped
> altogether.
> 

I'm not exactly sure what you mean by avoiding to do something as soon as 
possible.
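
For readers following the thread, the availability tracking described in the
quoted changelog can be sketched with a plain bitmask. This is an
illustration under stated assumptions, not the patch itself: the mask is a
uint32_t rather than a nodemask_t, and mark_alloc_failed(), check_nodes()
and the two stub allocators are hypothetical names. The sketch also leaves
out the heuristic of holding on to a successfully allocated page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for one hugepage allocation attempt on a node. */
typedef bool (*alloc_fn)(int nid);

/* A failed allocation on @nid clears that node's availability bit. */
static void mark_alloc_failed(uint32_t *avail, int nid)
{
    *avail &= ~(UINT32_C(1) << nid);
}

/*
 * Before scanning: retry every node whose bit is clear and set the bit
 * again on success. Returns true if at least one node looks usable, so
 * the caller can skip scanning entirely when it returns false.
 */
static bool check_nodes(uint32_t *avail, int nr_nodes, alloc_fn try_alloc)
{
    bool any = false;

    for (int nid = 0; nid < nr_nodes; nid++) {
        uint32_t bit = UINT32_C(1) << nid;

        if (*avail & bit) {
            any = true;
            continue;
        }
        if (try_alloc(nid)) {
            *avail |= bit;
            any = true;
        }
    }
    return any;
}

/* Stub allocators used only to exercise the sketch. */
static bool alloc_never(int nid)
{
    (void)nid;
    return false;
}

static bool alloc_only_node1(int nid)
{
    return nid == 1;
}
```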

> During testing, the patch did not show much difference in preventing
> thp_collapse_failed events from khugepaged, but this can be attributed to the
> sync compaction, which only khugepaged is allowed to use, and which is
> heavyweight enough to succeed frequently enough nowadays. The next patch will
> however extend the nodemask check to page fault context, where it has much
> larger impact. Also, with the future plan to convert THP collapsing to
> task_work context, this patch is a preparation to avoid useless scanning or
> heavyweight THP allocations in that context.
> 
> Signed-off-by: Vlastimil Babka 
> ---
>  mm/huge_memory.c | 71 +---
>  1 file changed, 63 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 565864b..b86a72a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -102,7 +102,7 @@ struct khugepaged_scan {
>  static struct khugepaged_scan khugepaged_scan = {
>   .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
>  };
> -
> +static nodemask_t thp_avail_nodes = NODE_MASK_ALL;

Seems like it should have khugepaged in its name so it's understood that 
the nodemask doesn't need to be synchronized, even though it will later be 
read outside of khugepaged, or at least a comment to say only khugepaged 
can store to it.

>  
>  static int set_recommended_min_free_kbytes(void)
>  {
> @@ -2273,6 +2273,14 @@ static bool khugepaged_scan_abort(int nid)
>   int i;
>  
>   /*
> +  * If it's clear that we are going to select a node where THP
> +  * allocation is unlikely to succeed, abort
> +  */
> + if (khugepaged_node_load[nid] == (HPAGE_PMD_NR / 2) &&
> + !node_isset(nid, thp_avail_nodes))
> + return true;
> +
> + /*
>* If zone_reclaim_mode is disabled, then no extra effort is made to
>* allocate memory locally.
>*/

If khugepaged_node_load for a node doesn't reach HPAGE_PMD_NR / 2, then 
this doesn't cause an abort.  I don't think it's necessary to try to 
optimize and abort the scan early when this is met, I think this should 
only be checked before collapse_huge_page().

> @@ -2356,6 +2364,7 @@ static struct page
>   if (unlikely(!*hpage)) {
>   count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>   *hpage = ERR_PTR(-ENOMEM);
> + node_clear(node, thp_avail_nodes);
>   return NULL;
>   }
>  
> @@ -2363,6 +2372,42 @@ static struct page
>   return *hpage;
>  }
>  
> +/* Return true, if THP should be allocatable on at least one node */
> +static bool khugepaged_check_nodes(struct page **hpage)
> +{
> + bool ret = false;
> + int nid;
> + struct page *newpage = NULL;
> + gfp_t gfp = alloc_hugepage_gfpmask(khugepaged_defrag());
> +
> + for_each_online_node(nid) {
> + if (node_isset(nid, thp_avail_nodes)) {
> + ret = true;
> + continue;
> + }
> +
> + newpage = alloc_hugepage_node(gfp, nid);
> +
> + if (newpage) {
> + node_set(nid, thp_avail_nodes);
> + ret = true;
> + /*
> +  * Heuristic - try to hold on to the page for collapse
> +  * scanning, if we don't hold any yet.
> +  */
> + if (IS_ERR_OR_NULL(*hpage)) {
> + *hpage = newpage;
> + //NIXME: should we count all/no allocations?
> + count_vm_event(THP_COLLAPSE_ALLOC);

Seems like we'd only count the event when the node load has selected a 
target node and the hugepage that is 

Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

2015-06-17 Thread Mike Galbraith
On Wed, 2015-06-17 at 11:06 -0700, Josef Bacik wrote:
> On 06/11/2015 10:35 PM, Mike Galbraith wrote:
> > On Thu, 2015-05-28 at 13:05 +0200, Peter Zijlstra wrote:

> > If sd == NULL, we fall through and try to pull wakee despite nacked-by
> > tsk_cpus_allowed() or wake_affine().
> >
> 
> So maybe add a check in the if (sd_flag & SD_BALANCE_WAKE) for something 
> like this
> 
> if (tmp >= 0) {
>   new_cpu = tmp;
>   goto unlock;
> } else if (!want_affine) {
>   new_cpu = prev_cpu;
> }
> 
> so we can make sure we're not being pushed onto a cpu that we aren't 
> allowed on?  Thanks,

The buglet is a messenger methinks.  You saying the patch helped without
SD_BALANCE_WAKE being set is why I looked.  The buglet would seem to say
that preferring cache is not harming your load after all.  It now sounds
as though wake_wide() may be what you're squabbling with.

Things aren't adding up all that well.

-Mike



[PATCH] regulator: qcom_spmi: Fix calculating number of voltages

2015-06-17 Thread Axel Lin
n /= range->step_uV + 1; is equivalent to n /= (range->step_uV + 1);
which is wrong. Fix it.

Signed-off-by: Axel Lin 
---
 drivers/regulator/qcom_spmi-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 162b865..42a60b4 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -1108,7 +1108,7 @@ static void spmi_calculate_num_voltages(struct spmi_voltage_set_points *points)
n = 0;
if (range->set_point_max_uV) {
n = range->set_point_max_uV - range->set_point_min_uV;
-   n /= range->step_uV + 1;
+   n = (n / range->step_uV) + 1;
}
range->n_voltages = n;
points->n_voltages += n;
-- 
2.1.0
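
The precedence issue fixed by this patch can be reproduced in a few lines of
standalone C. This is an illustrative sketch: num_voltages_buggy() and
num_voltages_fixed() are hypothetical helpers that mirror the two forms of
the expression, and the example range values are made up.

```c
/*
 * "n /= step + 1" parses as "n = n / (step + 1)" because compound
 * assignment has lower precedence than "+".
 */
static unsigned int num_voltages_buggy(unsigned int min_uV,
                                       unsigned int max_uV,
                                       unsigned int step_uV)
{
    unsigned int n = 0;

    if (max_uV) {
        n = max_uV - min_uV;
        n /= step_uV + 1;           /* divides by (step_uV + 1) */
    }
    return n;
}

/* Intended form: number of steps in the range, plus the starting point. */
static unsigned int num_voltages_fixed(unsigned int min_uV,
                                       unsigned int max_uV,
                                       unsigned int step_uV)
{
    unsigned int n = 0;

    if (max_uV) {
        n = max_uV - min_uV;
        n = (n / step_uV) + 1;
    }
    return n;
}
```

For a range of 1000000 uV to 1400000 uV in 100000 uV steps there are five
selectable voltages; the fixed form yields 5, while the buggy form computes
400000 / 100001 and yields 3.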





Re: [RFC 1/4] mm, thp: stop preallocating hugepages in khugepaged

2015-06-17 Thread David Rientjes
On Mon, 11 May 2015, Vlastimil Babka wrote:

> Khugepaged tries to preallocate a hugepage before scanning for THP collapse
> candidates. If the preallocation fails, scanning is not attempted. This makes
> sense, but it is only restricted to !NUMA configurations, where it does not
> need to predict on which node to preallocate.
> 
> Besides the !NUMA restriction, the preallocated page may also end up being
> unused and put back when no collapse candidate is found. I have observed the
> thp_collapse_alloc vmstat counter to have 3+ times the value of the counter
> of actually collapsed pages in /sys/.../khugepaged/pages_collapsed. On the
> other hand, the periodic hugepage allocation attempts involving sync
> compaction can be beneficial for the antifragmentation mechanism, but that's
> however harder to evaluate.
> 
> The following patch will introduce per-node THP availability tracking, which
> has more benefits than current preallocation and is applicable to CONFIG_NUMA.
> We can therefore remove the preallocation, which also allows a cleanup of the
> functions involved in khugepaged allocations. Another small benefit of the
> patch is that NUMA configs can now reuse an allocated hugepage for another
> collapse attempt, if the previous one was for the same node and failed.
> 
> Signed-off-by: Vlastimil Babka 

I think this is fine if the rest of the series is adopted, and I 
understand how the removal and cleanup is easier when done first before 
the following patches.  I think you can unify alloc_hugepage_node() for 
both NUMA and !NUMA configs and inline it in khugepaged_alloc_page().


Re: [PATCH v3 1/2] kconfig: allow use of relations other than (in)equality

2015-06-17 Thread Ulf Magnusson
On Mon, Jun 15, 2015 at 5:29 PM, Randy Dunlap  wrote:
>
> Hi,
>
> Please update Documentation/kbuild/kconfig-language.txt where 
> syntax is defined.
>
>
> --
> ~Randy

Is this likely to be the final version? Approx. when will it go in?

/Ulf


Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-06-17 Thread Hugh Dickins
On Wed, Jun 17, 2015 at 12:45 PM, Morten Stevens wrote:
> 2015-06-15 8:09 GMT+02:00 Daniel Wagner :
>> On 06/14/2015 06:48 PM, Hugh Dickins wrote:
>>> It appears that, at some point last year, XFS made directory handling
>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>> but that has been so for many years.
>>>
>>> Since those few lockdep traces that I've seen all implicated selinux,
>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>
>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>>
>>> Reported-and-tested-by: Prarit Bhargava 
>>> Reported-by: Daniel Wagner 
>>
>> Reported-and-tested-by: Daniel Wagner 
>>
>> Sorry for the long delay. It took me a while to figure out my original
>> setup. I could verify that this patch made the lockdep message go away
>> on 4.0-rc6 and also on 4.1-rc8.
>
> Yes, it's also fixed for me after applying this patch to 4.1-rc8.

Thank you - Hugh

>
> Best regards,
>
> Morten


Re: call_rcu from trace_preempt

2015-06-17 Thread Paul E. McKenney
On Wed, Jun 17, 2015 at 04:58:48PM -0700, Alexei Starovoitov wrote:
> On 6/17/15 2:36 PM, Paul E. McKenney wrote:
> >Well, you do need to have something in each element to allow them to be
> >tracked.  You could indeed use llist_add() to maintain the per-CPU list,
> >and then use llist_del_all() bulk-remove all the elements from the per-CPU
> >list.  You can then pass each element in turn to kfree_rcu().  And yes,
> >I am suggesting that you open-code this, as it is going to be easier to
> >handle your special case than to provide a fully general solution.  For
> >one thing, the general solution would require a full rcu_head to track
> >offset and next.  In contrast, you can special-case the offset.  And
> >ignore the overload special cases.
> 
> yes. all makes sense.
> 
> > Locklessly enqueue onto a per-CPU list, but yes.  The freeing is up to
> 
> yes. per-cpu llist indeed.
> 
> > you -- you get called just before exit from __call_rcu(), and get to
> > figure out what to do.
> >
> > My guess would be if not in interrupt and not recursively invoked,
> > atomically remove all the elements from the list, then pass each to
> > kfree_rcu(), and finally let things take their course from there.
> > The llist APIs look like they would work.
> 
> Above and 'just before the exit from __call_rcu()' part of suggestion
> I still don't understand.
> To avoid reentry into call_rcu I can either create 1 or N new kthreads
> or work_queue and do manual wakeups, but that's very specialized and I
> don't want to permanently waste them, so I'm thinking to llist_add into
> per-cpu llists and do llist_del_all in rcu_process_callbacks() to take
> them from these llists and call kfree_rcu on them.

Another option is to drain the lists the next time you do an allocation.
That would avoid hooking both __call_rcu() and rcu_process_callbacks().

Thanx, Paul

> The llist_add part will also do:
> if (!rcu_is_watching()) invoke_rcu_core();
> to raise softirq when necessary.
> So at the end it will look like two phase kfree_rcu.
> I'll try to code it up and see if it explodes :)
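
For readers following the thread, the two-phase scheme discussed above
(lockless enqueue, then bulk removal and reclaim) can be sketched in
userspace C11. This is only an illustration under stated assumptions: it is
not the kernel llist implementation, the names list_add(), list_del_all(),
list_drain() and struct elem are hypothetical, and free() stands in for
kfree_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type; 'next' plays the role of struct llist_node. */
struct elem {
	struct elem *next;
	int payload;
};

/* One list head; the scheme in the thread would keep one of these per CPU. */
struct lockless_list {
	_Atomic(struct elem *) first;
};

/* Lock-free push, analogous to llist_add(). Returns 1 if the list was empty. */
static int list_add(struct lockless_list *l, struct elem *e)
{
	struct elem *old = atomic_load(&l->first);

	assert(e != NULL);
	do {
		e->next = old; /* 'old' is refreshed by a failed compare-exchange */
	} while (!atomic_compare_exchange_weak(&l->first, &old, e));
	return old == NULL;
}

/* Detach the whole chain in one atomic step, analogous to llist_del_all(). */
static struct elem *list_del_all(struct lockless_list *l)
{
	return atomic_exchange(&l->first, NULL);
}

/* Second phase: hand every detached element to a reclaim callback. */
static int list_drain(struct lockless_list *l, void (*reclaim)(struct elem *))
{
	struct elem *e = list_del_all(l);
	int n = 0;

	while (e) {
		struct elem *next = e->next; /* read before the element is freed */

		reclaim(e);
		e = next;
		n++;
	}
	return n;
}

/* Stand-in for kfree_rcu(): frees immediately, with no grace period. */
static void reclaim_free(struct elem *e)
{
	free(e);
}
```

The point of the sketch is that producers never take a lock, and the
consumer takes ownership of everything queued so far with a single atomic
exchange, so it can run later from whatever context is safe.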



Re: [PATCH 1/8] of: overlay: Implement indirect target support

2015-06-17 Thread Rob Herring
On Fri, Jun 12, 2015 at 2:54 PM, Pantelis Antoniou
 wrote:
> Some applications require applying the same overlay to a different
> target according to some external condition (for instance depending
> on the slot a card has been inserted, the overlay target is different).
>
> The indirect target use requires using the new
> of_overlay_create_indirect() API which uses a text selector.
>
> The format requires the use of a target-indirect node as follows:
>
> fragment@0 {
> target-indirect {
> foo { target = <&foo_target>; };
> bar { target = <&bar_target>; };
> };
> };

The problem with this is it does not scale. The overlay has to be
changed for every new target. If you had an add-on board (possibly
providing an overlay from an EEPROM), you would not want to have to
rebuild overlays with every new host board. It also only handles
translations of where to apply the overlay, but doesn't provide
translations of phandles within the overlay. Say a node that is a
clock or regulator consumer for example. Or am I missing something?

Grant and I discussed this briefly. I think we need a connector
definition in the base dtb which can provide the target for an
overlay. The connector should provide the translation between
connector local signals/buses and host signals/buses. We need to
define what this translation would look like for each binding.

At least for GPIO, we could have something similar to interrupt-map
properties. For example, to map connector gpio 0 to host gpio 66 and
connector gpio 1 to host gpio 44:

gpio-map = <0  66>, <1  44>;

We'd need to define how to handle I2C, SPI, regulators, and clocks
minimally. Perhaps rather than trying to apply nodes into the base
dtb, they should be under the connector and the kernel has to learn to
not just look for child nodes for various bindings. Just thinking
aloud...
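
To make the GPIO mapping idea above concrete, here is a minimal userspace
sketch of the lookup such a property implies. It is an assumption-laden
illustration, not a proposed binding or kernel API: struct gpio_map_entry
and gpio_map_translate() are hypothetical names, and the host GPIO
controller reference is left out, so each entry is just a (connector line,
host line) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one gpio-map entry. */
struct gpio_map_entry {
	unsigned int connector_gpio; /* line number as seen by the add-on board */
	unsigned int host_gpio;      /* line number on the host controller */
};

/* The example from the text: connector 0 -> host 66, connector 1 -> host 44. */
static const struct gpio_map_entry example_map[] = {
	{ 0, 66 },
	{ 1, 44 },
};

/* Translate a connector-local GPIO to a host GPIO; -1 if it is not mapped. */
static int gpio_map_translate(const struct gpio_map_entry *map, size_t n,
			      unsigned int connector_gpio)
{
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (map[i].connector_gpio == connector_gpio)
			return (int)map[i].host_gpio;
	}
	return -1;
}
```

With a table like this in the base dtb, the overlay only ever names
connector-local lines, so the same overlay could be reused on a different
host board that supplies a different table.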

Rob


Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-17 Thread Hemant Kumar

Hi Arnaldo,

On 06/16/2015 09:08 PM, Arnaldo Carvalho de Melo wrote:

On Tue, Jun 16, 2015 at 08:20:53AM +0530, Hemant Kumar wrote:

"perf kvm {record|report}" is used to record and report the performance
profile of any workload on a guest. From the host, we can collect
guest kernel statistics, which are useful in finding out any contentions
in guest kernel symbols for a certain workload.

This feature is not available on powerpc because "perf" relies on the
"cycles" event (a PMU event) to profile the guest. However, for powerpc,
this can't be used from the host because the PMUs are controlled by the
guest rather than the host.

Due to this problem, we need a different approach to profile the
workload in the guest. There exists a tracepoint "kvm_hv:kvm_guest_exit"
in powerpc which is hit whenever any of the threads exit the guest
context. The guest instruction pointer dumped along with this
tracepoint data in the field "pc", can be used as guest instruction
pointer while postprocessing the trace data to map this IP to symbol
from guest.kallsyms.

However, to have some kind of periodicity, we can't use all the kvm
exits, rather exits which are bound to happen in certain intervals.
HV_DECREMENTER Interrupt forces the threads to exit after an interval
of 10 ms.

This patch makes use of the "kvm_guest_exit" tracepoint and checks the
exit reason for any kvm exit. If it is HV_DECREMENTER, then the
instruction pointer dumped along with this tracepoint is retrieved and
mapped with the guest kallsyms.

This patch is a prototype asking for suggestions/comments as to whether
the approach is right or is there any way better than this (like using
a different event to profile for, etc) to profile the guest from the
host.

Thank You.

Signed-off-by: Hemant Kumar 
---
  tools/perf/arch/powerpc/Makefile|  1 +
  tools/perf/arch/powerpc/util/parse-tp.c | 55 +
  tools/perf/builtin-report.c |  9 ++
  tools/perf/util/event.c |  7 -
  tools/perf/util/evsel.c |  7 +
  tools/perf/util/evsel.h |  4 +++
  tools/perf/util/session.c   |  7 +++--
  7 files changed, 86 insertions(+), 4 deletions(-)
  create mode 100644 tools/perf/arch/powerpc/util/parse-tp.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 6f7782b..992a0d5 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -4,3 +4,4 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/skip-callchain-idx.o
  endif
  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/parse-tp.o
diff --git a/tools/perf/arch/powerpc/util/parse-tp.c b/tools/perf/arch/powerpc/util/parse-tp.c
new file mode 100644
index 000..4c6e49c
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/parse-tp.c
@@ -0,0 +1,55 @@
+#include "../../util/evsel.h"
+#include "../../util/trace-event.h"
+#include "../../util/session.h"
+
+#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
+#define HV_DECREMENTER 2432
+#define HV_BIT 3
+#define PR_BIT 49
+#define PPC_MAX 63
+
+/*
+ * Get the instruction pointer from the tracepoint data
+ */
+u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *data)
+{
+   u64 tp_ip = data->ip;
+   int trap;
+
+   if (!strcmp(KVMPPC_EXIT, evsel->name)) {

Can't you cache this somewhere? I.e. something like
  
	static int kvmppc_exit = -1;


	if (evsel->attr.type != PERF_TYPE_TRACEPOINT)
		goto out;

	if (unlikely(kvmppc_exit == -1)) {
		if (strcmp(KVMPPC_EXIT, evsel->name))
			goto out;

		kvmppc_exit = evsel->attr.config;
	} else if (kvmppc_exit != evsel->attr.config)
		goto out;


Will try this.
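
A self-contained version of the caching idea suggested above, for anyone
who wants to try it outside the perf tree. This is a sketch under
assumptions: struct fake_evsel is a hypothetical stand-in for the relevant
fields of struct perf_evsel, TYPE_TRACEPOINT stands in for
PERF_TYPE_TRACEPOINT, and is_kvmppc_exit() is an illustrative name, not an
existing perf helper.

```c
#include <assert.h>
#include <string.h>

#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit"
#define TYPE_TRACEPOINT 2 /* stand-in for PERF_TYPE_TRACEPOINT */

/* Hypothetical stand-in for the fields of struct perf_evsel used here. */
struct fake_evsel {
	unsigned int type;         /* event type, like attr.type */
	unsigned long long config; /* tracepoint id, like attr.config */
	const char *name;          /* "subsystem:event" */
};

/* Cached tracepoint id of KVMPPC_EXIT; -1 until the first name match. */
static long long kvmppc_exit_id = -1;

/* Returns 1 if evsel is the kvm_guest_exit tracepoint, 0 otherwise. */
static int is_kvmppc_exit(const struct fake_evsel *evsel)
{
	assert(evsel != NULL && evsel->name != NULL);

	if (evsel->type != TYPE_TRACEPOINT)
		return 0;

	if (kvmppc_exit_id == -1) {
		/* Slow path: compare names until the first match is seen. */
		if (strcmp(KVMPPC_EXIT, evsel->name))
			return 0;
		kvmppc_exit_id = (long long)evsel->config;
		return 1;
	}

	/* Fast path: one integer comparison per sample. */
	return kvmppc_exit_id == (long long)evsel->config;
}
```

Once the id is cached, every later sample costs an integer compare instead
of a strcmp(), which matters because this check runs once per sample.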




+   trap = raw_field_value(evsel->tp_format, "trap", data->raw_data);
+
+   if (trap == HV_DECREMENTER)
+   tp_ip = raw_field_value(evsel->tp_format, "pc",
+   data->raw_data);

out:


+   return tp_ip;
+}


Also we have:

u64 perf_evsel__intval(struct perf_evsel *evsel,
   struct perf_sample *sample, const char *name);

So:

trap = perf_evsel__intval(evsel, sample, "trap");

And:

tp_ip = perf_evsel__intval(evsel, sample, "pc");

Makes it a bit shorter and allows for optimizations in how to find that
field by name made at the evsel code.


Thanks, missed perf_evsel__intval, will use this in the next iteration.


- Arnaldo


+
+/*
+ * Get the HV and PR bits and accordingly, determine the cpumode
+ */
+u8 arch__get_cpumode(union perf_event *event, struct perf_evsel *evsel,
+struct perf_sample *data)
+{
+   unsigned long hv, pr, msr;
+   u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+   if (strcmp(KVMPPC_EXIT, evsel->name))
+   goto ret;
+
+   

[PATCH 02/15] libnvdimm: infrastructure for btt devices

2015-06-17 Thread Dan Williams
Block devices from an nd bus, in addition to accepting "struct bio"
based requests, also have the capability to perform byte-aligned
accesses.  By default only the bio/block interface is used.  However, if
another driver can make effective use of the byte-aligned capability it
can claim the block interface and use the byte-aligned ->rw_bytes()
interface.

The BTT driver is the first consumer of this mechanism to allow
layering atomic sector update guarantees on top of ->rw_bytes() capable
libnvdimm-block-devices, or their partitions.

Cc: Greg KH 
Cc: Neil Brown 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/Kconfig |3 
 drivers/nvdimm/Makefile|1 
 drivers/nvdimm/btt.h   |   45 +
 drivers/nvdimm/btt_devs.c  |  431 
 drivers/nvdimm/bus.c   |   82 
 drivers/nvdimm/core.c  |   20 ++
 drivers/nvdimm/nd-core.h   |   34 +++
 drivers/nvdimm/nd.h|   19 ++
 drivers/nvdimm/pmem.c  |6 -
 include/uapi/linux/ndctl.h |2 
 10 files changed, 637 insertions(+), 6 deletions(-)
 create mode 100644 drivers/nvdimm/btt.h
 create mode 100644 drivers/nvdimm/btt_devs.c

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 07a29113b870..f16ba9d14740 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -33,4 +33,7 @@ config BLK_DEV_PMEM
 
  Say Y if you want to use an NVDIMM
 
+config ND_BTT_DEVS
+   def_bool y
+
 endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index abce98f87f16..eb1bbce86592 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -11,3 +11,4 @@ libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
 libnvdimm-y += namespace_devs.o
 libnvdimm-y += label.o
+libnvdimm-$(CONFIG_ND_BTT_DEVS) += btt_devs.o
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
new file mode 100644
index ..e8f6d8e0ddd3
--- /dev/null
+++ b/drivers/nvdimm/btt.h
@@ -0,0 +1,45 @@
+/*
+ * Block Translation Table library
+ * Copyright (c) 2014-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _LINUX_BTT_H
+#define _LINUX_BTT_H
+
+#include 
+
+#define BTT_SIG_LEN 16
+#define BTT_SIG "BTT_ARENA_INFO\0"
+
+struct btt_sb {
+   u8 signature[BTT_SIG_LEN];
+   u8 uuid[16];
+   u8 parent_uuid[16];
+   __le32 flags;
+   __le16 version_major;
+   __le16 version_minor;
+   __le32 external_lbasize;
+   __le32 external_nlba;
+   __le32 internal_lbasize;
+   __le32 internal_nlba;
+   __le32 nfree;
+   __le32 infosize;
+   __le64 nextoff;
+   __le64 dataoff;
+   __le64 mapoff;
+   __le64 logoff;
+   __le64 info2off;
+   u8 padding[3968];
+   __le64 checksum;
+};
+
+#endif
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
new file mode 100644
index ..2148fd8f535b
--- /dev/null
+++ b/drivers/nvdimm/btt_devs.c
@@ -0,0 +1,431 @@
+/*
+ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "nd-core.h"
+#include "btt.h"
+#include "nd.h"
+
+static DEFINE_IDA(btt_ida);
+
+static void nd_btt_release(struct device *dev)
+{
+   struct nd_btt *nd_btt = to_nd_btt(dev);
+
+   dev_dbg(dev, "%s\n", __func__);
+   WARN_ON(nd_btt->backing_dev);
+   ida_simple_remove(&btt_ida, nd_btt->id);
+   kfree(nd_btt->uuid);
+   kfree(nd_btt);
+}
+
+static struct device_type nd_btt_device_type = {
+   .name = "nd_btt",
+   .release = nd_btt_release,
+};
+
+bool is_nd_btt(struct device *dev)
+{
+   return dev->type == &nd_btt_device_type;
+}
+
+struct nd_btt *to_nd_btt(struct device *dev)
+{
+   struct nd_btt *nd_btt = container_of(dev, struct nd_btt, dev);
+
+   WARN_ON(!is_nd_btt(dev));
+   return nd_btt;
+}
+EXPORT_SYMBOL(to_nd_btt);
+
+static const unsigned long btt_lbasize_supported[] = { 512, 4096, 0 };
+
+static ssize_t sector_size_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct nd_btt *nd_btt = 

[PATCH 03/15] nd_btt: atomic sector updates

2015-06-17 Thread Dan Williams
From: Vishal Verma 

BTT stands for Block Translation Table, and is a way to provide power
fail sector atomicity semantics for block devices that have the ability
to perform byte granularity IO. It relies on the capability of libnvdimm
namespace devices to do byte aligned IO.

The BTT works as a stacked block device, and reserves a chunk of space
from the backing device for its accounting metadata. It is a bio-based
driver because all IO is done synchronously, and there is no queuing or
asynchronous completions at either the device or the driver level.

The BTT uses 'lanes' to index into various 'on-disk' data structures,
and lanes also act as a synchronization mechanism in case there are more
CPUs than available lanes. We did a comparison between two lane lock
strategies - first where we kept an atomic counter around that tracked
which was the last lane that was used, and 'our' lane was determined by
atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
theoretically, no CPU would be blocked waiting for a lane. The other
strategy was to use the cpu number we're scheduled on and hash it to
a lane number. Theoretically, this could block an IO that could've
otherwise run using a different, free lane. But some fio workloads
showed that the direct cpu -> lane hash performed faster than tracking
'last lane' - my reasoning is the cache thrash caused by moving the
atomic variable made that approach slower than simply waiting out the
in-progress IO. This supports the conclusion that the driver can be a
very simple bio-based one that does synchronous IOs instead of queuing.
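
The two lane-selection strategies compared above can be sketched as
follows. This is a simplified userspace illustration, not the driver code:
lane_from_counter() and lane_from_cpu() are hypothetical names, and a plain
modulo is assumed as the "hash" from CPU number to lane.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Strategy 1: a shared atomic counter hands out lanes in turn.
 * Every caller writes the same variable, so its cache line moves
 * between CPUs on each IO.
 */
static unsigned int lane_from_counter(atomic_uint *last_lane,
				      unsigned int nr_lanes)
{
	assert(nr_lanes > 0);
	return atomic_fetch_add(last_lane, 1) % nr_lanes;
}

/*
 * Strategy 2: derive the lane from the CPU we are running on.
 * No shared state is written, but two CPUs can map to the same lane
 * and then have to wait for each other.
 */
static unsigned int lane_from_cpu(unsigned int cpu, unsigned int nr_lanes)
{
	assert(nr_lanes > 0);
	return cpu % nr_lanes;
}
```

With 4 lanes, CPUs 1 and 5 both map to lane 1 under the second strategy,
which is the blocking case described above; the fio results in this
message suggest that wait is still cheaper than bouncing the counter.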

Cc: Andy Lutomirski 
Cc: Boaz Harrosh 
Cc: H. Peter Anvin 
Cc: Jens Axboe 
Cc: Ingo Molnar 
Cc: Christoph Hellwig 
Cc: Neil Brown 
Cc: Jeff Moyer 
Cc: Dave Chinner 
Cc: Greg KH 
[jmoyer: fix nmi watchdog timeout in btt_map_init]
[jmoyer: move btt initialization to module load path]
[jmoyer: fix memory leak in the btt initialization path]
[jmoyer: Don't overwrite corrupted arenas]
Signed-off-by: Vishal Verma 
Signed-off-by: Dan Williams 
---
 Documentation/nvdimm/btt.txt |  273 
 drivers/acpi/nfit.c  |1 
 drivers/nvdimm/Kconfig   |   27 +
 drivers/nvdimm/Makefile  |3 
 drivers/nvdimm/btt.c | 1449 ++
 drivers/nvdimm/btt.h |  141 
 drivers/nvdimm/btt_devs.c|3 
 drivers/nvdimm/nd.h  |   10 
 drivers/nvdimm/region.c  |   89 +++
 drivers/nvdimm/region_devs.c |   10 
 include/linux/libnvdimm.h|1 
 11 files changed, 2004 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/nvdimm/btt.txt
 create mode 100644 drivers/nvdimm/btt.c

diff --git a/Documentation/nvdimm/btt.txt b/Documentation/nvdimm/btt.txt
new file mode 100644
index ..95134d5ec4a0
--- /dev/null
+++ b/Documentation/nvdimm/btt.txt
@@ -0,0 +1,273 @@
+BTT - Block Translation Table
+=============================
+
+
+1. Introduction
+---------------
+
+Persistent memory based storage is able to perform IO at byte (or more
+accurately, cache line) granularity. However, we often want to expose such
+storage as traditional block devices. The block drivers for persistent memory
+will do exactly this. However, they do not provide any atomicity guarantees.
+Traditional SSDs typically provide protection against torn sectors in hardware,
+using stored energy in capacitors to complete in-flight block writes, or perhaps
+in firmware. We don't have this luxury with persistent memory - if a write is in
+progress, and we experience a power failure, the block will contain a mix of old
+and new data. Applications may not be prepared to handle such a scenario.
+
+The Block Translation Table (BTT) provides atomic sector update semantics for
+persistent memory devices, so that applications that rely on sector writes not
+being torn can continue to do so. The BTT manifests itself as a stacked block
+device, and reserves a portion of the underlying storage for its metadata. At
+the heart of it, is an indirection table that re-maps all the blocks on the
+volume. It can be thought of as an extremely simple file system that only
+provides atomic sector updates.
+
+
+2. Static Layout
+----------------
+
+The underlying storage on which a BTT can be laid out is not limited in any way.
+The BTT, however, splits the available space into chunks of up to 512 GiB,
+called "Arenas".
+
+Each arena follows the same layout for its metadata, and all references in an
+arena are internal to it (with the exception of one field that points to the
+next arena). The following depicts the "On-disk" metadata layout:
+
+
+  Backing Store     +------->  Arena
++---------------+   |   +------------------+
+|               |   |   | Arena info block |
+|    Arena 0    +---+   |        4K        |
+|     512G      |       +------------------+
+|               |       |                  |
++---------------+       |                  |
+|               |       | 

[PATCH 10/15] libnvdimm: fix up max_hw_sectors

2015-06-17 Thread Dan Williams
There is no hardware limit to enforce on the size of the i/o that can be
passed to an nd block device, so set it to UINT_MAX.  Do this centrally for
all nd block devices in nd_blk_queue_init().

Reviewed-by: Vishal Verma 
Signed-off-by: Dan Williams 
---
 drivers/nvdimm/blk.c  |3 +--
 drivers/nvdimm/btt.c  |3 +--
 drivers/nvdimm/core.c |7 +++
 drivers/nvdimm/nd.h   |1 +
 drivers/nvdimm/pmem.c |3 +--
 5 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index feddad325f97..8a6345797a71 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -272,9 +272,8 @@ static int nd_blk_probe(struct device *dev)
}
 
blk_queue_make_request(blk_dev->queue, nd_blk_make_request);
-   blk_queue_max_hw_sectors(blk_dev->queue, 1024);
-   blk_queue_bounce_limit(blk_dev->queue, BLK_BOUNCE_ANY);
blk_queue_logical_block_size(blk_dev->queue, blk_dev->sector_size);
+   nd_blk_queue_init(blk_dev->queue);
 
disk = blk_dev->disk = alloc_disk(0);
if (!disk) {
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index c337b7abfb43..380e01cedd24 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1285,9 +1285,8 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->flags = GENHD_FL_EXT_DEVT;
 
blk_queue_make_request(btt->btt_queue, btt_make_request);
-   blk_queue_max_hw_sectors(btt->btt_queue, 1024);
-   blk_queue_bounce_limit(btt->btt_queue, BLK_BOUNCE_ANY);
blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
+   nd_blk_queue_init(btt->btt_queue);
btt->btt_queue->queuedata = btt;
 
set_capacity(btt->btt_disk, 0);
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 8f466c384b30..d27b13357873 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -214,6 +214,13 @@ ssize_t nd_sector_size_store(struct device *dev, const char *buf,
}
 }
 
+void nd_blk_queue_init(struct request_queue *q)
+{
+   blk_queue_max_hw_sectors(q, UINT_MAX);
+   blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
+}
+EXPORT_SYMBOL(nd_blk_queue_init);
+
 static ssize_t commands_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 9b5fdb2215b1..2f20d5dca028 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -171,5 +171,6 @@ struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id, resource_size_t start,
resource_size_t n);
 int nd_blk_region_init(struct nd_region *nd_region);
+void nd_blk_queue_init(struct request_queue *q);
 resource_size_t nd_namespace_blk_validate(struct nd_namespace_blk *nsblk);
 #endif /* __ND_H__ */
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 1f4767150975..b825a2201aa8 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -172,8 +172,7 @@ static struct pmem_device *pmem_alloc(struct device *dev,
goto out_unmap;
 
blk_queue_make_request(pmem->pmem_queue, pmem_make_request);
-   blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);
-   blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+   nd_blk_queue_init(pmem->pmem_queue);
 
disk = alloc_disk(0);
if (!disk)



[PATCH 15/15] libnvdimm, nfit: handle acpi_nfit_memory_map flags

2015-06-17 Thread Dan Williams
The flags in this NFIT sub-structure indicate the state of the data on
the nvdimm relative to its energy source or last "flush to persistence".
For the most part there is nothing the driver can do but advertise the
state of these flags in sysfs and emit a message if firmware indicates
that the contents of the device may be corrupted.  However, for the case
of ACPI_NFIT_MEM_ARMED, the driver can arrange for the block devices
incorporating that nvdimm to be marked read-only.  This is a safe
default as the data is still available and new writes are held off until
the administrator either forces read-write mode, or the energy source
becomes armed.

A module parameter "force_rw" is added to allow the default to be
overridden.
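
As a rough illustration of the policy described above, the following
userspace sketch decodes a flags word into the sysfs-style string and
applies the read-only decision. It is a sketch under assumptions: the
FLAG_* names and bit values are placeholders rather than the ACPICA
definitions, and format_mem_flags() and dimm_is_read_only() are
hypothetical helpers, not functions from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Placeholder bits; the real values come from the ACPICA headers. */
#define FLAG_SAVE_FAILED     (1u << 0)
#define FLAG_RESTORE_FAILED  (1u << 1)
#define FLAG_FLUSH_FAILED    (1u << 2)
#define FLAG_ARMED           (1u << 3) /* set when the DIMM failed to arm */
#define FLAG_HEALTH_OBSERVED (1u << 4)

/* Build the space-separated flag list, in the style of flags_show(). */
static int format_mem_flags(unsigned int flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s%s%s%s%s",
			flags & FLAG_SAVE_FAILED ? "save " : "",
			flags & FLAG_RESTORE_FAILED ? "restore " : "",
			flags & FLAG_FLUSH_FAILED ? "flush " : "",
			flags & FLAG_ARMED ? "arm " : "",
			flags & FLAG_HEALTH_OBSERVED ? "smart " : "");
}

/* Read-only unless the administrator forces read-write mode. */
static int dimm_is_read_only(unsigned int flags, int force_rw)
{
	return (flags & FLAG_ARMED) && !force_rw;
}
```

Only the arm flag changes behaviour here; the other flags are reported
but do not make the device read-only, matching the description above.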

Signed-off-by: Dan Williams 
---
 drivers/acpi/nfit.c  |   35 +++
 drivers/acpi/nfit.h  |3 +++
 drivers/nvdimm/blk.c |1 +
 drivers/nvdimm/bus.c |   27 +++
 drivers/nvdimm/nd-core.h |1 +
 drivers/nvdimm/nd.h  |1 +
 drivers/nvdimm/pmem.c|1 +
 drivers/nvdimm/region_devs.c |   13 +
 include/linux/libnvdimm.h|2 ++
 tools/testing/nvdimm/test/nfit.c |3 +++
 10 files changed, 87 insertions(+)

diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
index 9363e4b0e6a7..5f645823d7d7 100644
--- a/drivers/acpi/nfit.c
+++ b/drivers/acpi/nfit.c
@@ -27,6 +27,10 @@ static bool force_enable_dimms;
 module_param(force_enable_dimms, bool, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(force_enable_dimms, "Ignore _STA (ACPI DIMM device) status");
 
+static bool force_rw;
+module_param(force_rw, bool, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(force_rw, "Enable writes to DIMMs that failed to arm");
+
 static u8 nfit_uuid[NFIT_UUID_MAX][16];
 
 const u8 *to_nfit_uuid(enum nfit_uuids id)
@@ -664,6 +668,20 @@ static ssize_t serial_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(serial);
 
+static ssize_t flags_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   u16 flags = to_nfit_memdev(dev)->flags;
+
+   return sprintf(buf, "%s%s%s%s%s\n",
+   flags & ACPI_NFIT_MEM_SAVE_FAILED ? "save " : "",
+   flags & ACPI_NFIT_MEM_RESTORE_FAILED ? "restore " : "",
+   flags & ACPI_NFIT_MEM_FLUSH_FAILED ? "flush " : "",
+   flags & ACPI_NFIT_MEM_ARMED ? "arm " : "",
+   flags & ACPI_NFIT_MEM_HEALTH_OBSERVED ? "smart " : "");
+}
+static DEVICE_ATTR_RO(flags);
+
 static struct attribute *acpi_nfit_dimm_attributes[] = {
 &dev_attr_handle.attr,
 &dev_attr_phys_id.attr,
@@ -672,6 +690,7 @@ static struct attribute *acpi_nfit_dimm_attributes[] = {
 &dev_attr_format.attr,
 &dev_attr_serial.attr,
 &dev_attr_rev_id.attr,
+   &dev_attr_flags.attr,
NULL,
 };
 
@@ -764,6 +783,7 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
struct nvdimm *nvdimm;
unsigned long flags = 0;
u32 device_handle;
+   u16 mem_flags;
int rc;
 
device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
@@ -781,6 +801,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
if (nfit_mem->bdw && nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;
 
+   mem_flags = __to_nfit_memdev(nfit_mem)->flags;
+   if ((mem_flags & ACPI_NFIT_MEM_ARMED) && !force_rw)
+   flags |= NDD_UNARMED;
+
rc = acpi_nfit_add_dimm(acpi_desc, nfit_mem, device_handle);
if (rc)
continue;
@@ -793,6 +817,17 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
 
nfit_mem->nvdimm = nvdimm;
dimm_count++;
+
+   if ((mem_flags & ACPI_NFIT_MEM_FAILED_MASK) == 0)
+   continue;
+
+   dev_info(acpi_desc->dev, "%s: failed: %s%s%s%s\n",
+   nvdimm_name(nvdimm),
+   mem_flags & ACPI_NFIT_MEM_SAVE_FAILED ? "save " : "",
+   mem_flags & ACPI_NFIT_MEM_RESTORE_FAILED ? "restore " : "",
+   mem_flags & ACPI_NFIT_MEM_FLUSH_FAILED ? "flush " : "",
+   mem_flags & ACPI_NFIT_MEM_ARMED ? "arm " : "");
+
}
 
return nvdimm_bus_check_dimm_count(acpi_desc->nvdimm_bus, dimm_count);
diff --git a/drivers/acpi/nfit.h b/drivers/acpi/nfit.h
index c62fffea8423..81f2e8c5a79c 100644
--- a/drivers/acpi/nfit.h
+++ b/drivers/acpi/nfit.h
@@ -22,6 +22,9 @@
 
 #define UUID_NFIT_BUS "2f10e7a4-9e91-11e4-89d3-123b93f75cba"
 #define UUID_NFIT_DIMM "4309ac30-0d11-11e4-9191-0800200c9a66"
+#define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
+   | ACPI_NFIT_MEM_RESTORE_FAILED | 
