Re: [PATCH 08/12] pinctrl: axp209: account for const type of of_device_id.data

2018-01-02 Thread Linus Walleij
On Tue, Jan 2, 2018 at 2:28 PM, Julia Lawall  wrote:

> The return value of of_device_get_match_data has type const void *.
> The desc field of the pctl structure also has a const type, so there
> is no need for the const-discarding cast between them.
>
> Done using Coccinelle.
>
> Signed-off-by: Julia Lawall 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH] ethernet: mlx4: Delete an error message for a failed memory allocation in five functions

2018-01-02 Thread Tariq Toukan



On 01/01/2018 10:46 PM, SF Markus Elfring wrote:

> From: Markus Elfring
> Date: Mon, 1 Jan 2018 21:42:27 +0100
>
> Omit an extra message for a memory allocation failure in these functions.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring
> ---


Is this an issue? Why? What is your motivation?
These are error messages, very informative, appear only upon errors, and 
in control flow.


Re: [PATCH 02/12] pinctrl: at91-pio4: account for const type of of_device_id.data

2018-01-02 Thread Linus Walleij
On Tue, Jan 2, 2018 at 2:27 PM, Julia Lawall  wrote:

> This driver creates a const structure that it stores in the data field
> of an of_device_id array.
>
> Adding const to the declaration of the location that receives the
> const value from the data field ensures that the compiler will
> continue to check that the value is not modified.  Furthermore, the
> const-discarding cast on the extraction from the data field is no
> longer needed.
>
> Done using Coccinelle.
>
> Signed-off-by: Julia Lawall 

Patch applied.

Yours,
Linus Walleij
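
A minimal userspace sketch of the const pattern these two pinctrl patches describe. The names match_entry, get_match_data, chip_desc and the pin count are hypothetical stand-ins for struct of_device_id, of_device_get_match_data() and a driver descriptor; this is not the driver code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct of_device_id: .data is const void *. */
struct match_entry {
	const char *compatible;
	const void *data;
};

/* Hypothetical driver descriptor, defined const as the patch text describes. */
struct chip_desc {
	int npins;
};

static const struct chip_desc demo_desc = { .npins = 22 };

static const struct match_entry demo_matches[] = {
	{ .compatible = "vendor,demo", .data = &demo_desc },
	{ .compatible = NULL, .data = NULL },
};

/* Models of_device_get_match_data(): the return type is const void *. */
static const void *get_match_data(const struct match_entry *m)
{
	return m->data;
}

static int demo_npins(void)
{
	/*
	 * The receiving pointer is const-qualified, so no cast is needed,
	 * and a write such as desc->npins = 0 would be rejected at compile time.
	 */
	const struct chip_desc *desc = get_match_data(&demo_matches[0]);

	return desc->npins;
}
```

Dropping the const from the declaration of desc would require the const-discarding cast again and would let a later write to the shared descriptor slip past the compiler.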


Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Geert Uytterhoeven
Hi Michael,

On Wed, Jan 3, 2018 at 7:24 AM, Michael Ellerman  wrote:
> Geert Uytterhoeven  writes:
>
>> On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman wrote:
>>> Christoph Hellwig writes:
>>>
>>>> We want to use the dma_direct_ namespace for a generic implementation,
>>>> so rename powerpc to the second best choice: dma_nommu_.
>>>
>>> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
>>> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
>>> than mapping dynamically.
>>>
>>> Though I don't have a good idea for a better name, maybe "1to1",
>>> "linear", "premapped" ?
>>
>> "identity"?
>
> I think that would be wrong, but thanks for trying to help :)
>
> The address on the device side is sometimes (often?) offset from the CPU
> address. So eg. the device can DMA to RAM address 0x0 using address
> 0x800.
>
> Identity would imply 0 == 0 etc.
>
> I think "bijective" is the correct term, but that's probably a bit
> esoteric.

OK, didn't know about the offset.
Then "linear" is what we tend to use, right?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
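
A small sketch of why "identity" does not fit while "linear" does, assuming a single fixed offset between CPU and device addresses. DEMO_DMA_OFFSET and both helper names are hypothetical; the 0x800 value is only the figure quoted in the thread, not a real platform constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed offset between CPU physical and device addresses. */
#define DEMO_DMA_OFFSET 0x800ULL

/* CPU physical -> device address: linear and invertible, not the identity. */
static uint64_t demo_phys_to_dma(uint64_t phys)
{
	return phys + DEMO_DMA_OFFSET;
}

/* Device address -> CPU physical: the inverse of the mapping above. */
static uint64_t demo_dma_to_phys(uint64_t dma)
{
	return dma - DEMO_DMA_OFFSET;
}
```

With this mapping RAM address 0x0 is reached through device address 0x800, so 0 does not map to 0, yet every address still has exactly one counterpart on the other side.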


Re: [PATCH v3 18/27] pinctrl: replace devm_ioremap_nocache with devm_ioremap

2018-01-02 Thread Linus Walleij
On Wed, Jan 3, 2018 at 7:15 AM, Yisheng Xie  wrote:
> On 2018/1/2 16:43, Linus Walleij wrote:
>> On Sat, Dec 23, 2017 at 12:00 PM, Yisheng Xie  wrote:
>>
>>> Default ioremap is ioremap_nocache, so devm_ioremap has the same
>>> function with devm_ioremap_nocache, which can just be killed to
>>> save the size of devres.o
>>>
>>> This patch is to use devm_ioremap instead of devm_ioremap_nocache,
>>> which should not have any function change but prepare for killing
>>> devm_ioremap_nocache.
>>>
>>> Cc: Linus Walleij 
>>> Cc: linux-g...@vger.kernel.org
>>> Signed-off-by: Yisheng Xie 
>>
>> Patch applied.
>
> Well, the ARCHs I listed for the changed files do not include cris, ia64,
> mn10300 and openrisc, whose ioremap is not the same as ioremap_nocache, as
> discussed in the cover letter. So please let me know if I need to update
> the comment.

Yeah, same comment as the GPIO patch.

Yours,
Linus Walleij


Re: [PATCH v3 06/27] gpio: replace devm_ioremap_nocache with devm_ioremap

2018-01-02 Thread Linus Walleij
On Wed, Jan 3, 2018 at 7:05 AM, Yisheng Xie  wrote:
> On 2018/1/2 16:41, Linus Walleij wrote:
>> On Sat, Dec 23, 2017 at 11:58 AM, Yisheng Xie  wrote:
>>
>>> Default ioremap is ioremap_nocache, so devm_ioremap has the same
>>> function with devm_ioremap_nocache, which can just be killed to
>>> save the size of devres.o
>>>
>>> This patch is to use devm_ioremap instead of devm_ioremap_nocache,
>>> which should not have any function change but prepare for killing
>>> devm_ioremap_nocache.
>>>
>>> Cc: Linus Walleij 
>>> Cc: linux-g...@vger.kernel.org
>>> Signed-off-by: Yisheng Xie 
>
> Well, the ARCHs I listed for the changed files do not include cris, ia64,
> mn10300 and openrisc, whose ioremap is not the same as ioremap_nocache, as
> discussed in the cover letter. So please let me know if I need to update
> the comment.

I dropped the patch until it's figured out that none of these arches
are affected by the change.

Please resend with a comment explaining why the change is harmless on the
architectures these drivers are for.

Yours,
Linus Walleij


Re: general protection fault in __netlink_ns_capable

2018-01-02 Thread Andrei Vagin
On Tue, Jan 02, 2018 at 04:35:11PM -0800, Andrei Vagin wrote:
> On Tue, Jan 02, 2018 at 10:58:01AM -0800, syzbot wrote:
> > Hello,
> > 
> > syzkaller hit the following crash on
> > 75aa5540627fdb3d8f86229776ea87f995275351
> > git://git.cmpxchg.org/linux-mmots.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> > 
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+e432865c29eb4c48c...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > netlink: 3 bytes leftover after parsing attributes in process
> > `syzkaller140561'.
> > netlink: 3 bytes leftover after parsing attributes in process
> > `syzkaller140561'.
> > netlink: 3 bytes leftover after parsing attributes in process
> > `syzkaller140561'.
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault:  [#1] SMP KASAN
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868
> 
> NETLINK_CB(skb).sk is NULL here. It looks like we have to use
> sk_ns_capable instead of netlink_ns_capable:
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index c688dc564b11..408c75de52ea 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1762,7 +1762,7 @@ static struct net *get_target_net(struct sk_buff *skb, int netnsid)
>         /* For now, the caller is required to have CAP_NET_ADMIN in
>          * the user namespace owning the target net ns.
>          */
> -       if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
> +       if (!sk_ns_capable(skb->sk, net->user_ns, CAP_NET_ADMIN)) {
>                 put_net(net);
>                 return ERR_PTR(-EACCES);
>         }
>

get_target_net() is used twice in the code. In rtnl_getlink(), we need
to use netlink_ns_capable(skb, ...), but in rtnl_dump_ifinfo, we need to
use sk_ns_capable(skb->sk, ...).

Pls, take a look at this patch:
https://patchwork.ozlabs.org/patch/854896/
Subject: rtnetlink: give a user socket to get_target_net()


Re: [PATCH v3] f2fs: add an ioctl to disable GC for specific file

2018-01-02 Thread Chao Yu
On 2018/1/3 11:21, Jaegeuk Kim wrote:
> This patch gives a flag to disable GC on given file, which would be useful,
> when user wants to keep its block map. It also conducts in-place-update for
> dontmove file.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
> 
> Change log from v2:
>  - modify ioctl to allow users unpin the file
> 
>  fs/f2fs/data.c          |  2 ++
>  fs/f2fs/f2fs.h          | 28 +-
>  fs/f2fs/file.c          | 64 +
>  fs/f2fs/gc.c            | 11 +
>  fs/f2fs/gc.h            |  2 ++
>  fs/f2fs/sysfs.c         |  2 ++
>  include/linux/f2fs_fs.h |  9 ++-
>  7 files changed, 116 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 449b0aaa3905..45f65a5b9871 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1395,6 +1395,8 @@ static inline bool need_inplace_update(struct f2fs_io_info *fio)
>  {
>   struct inode *inode = fio->page->mapping->host;
>  
> + if (f2fs_is_pinned_file(inode))
> + return true;
>   if (S_ISDIR(inode->i_mode) || f2fs_is_atomic_file(inode))
>   return false;
>   if (is_cold_data(fio->page))
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a0e8eec23125..f4b7d73695a7 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -350,6 +350,7 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal,
>  #define F2FS_IOC_GARBAGE_COLLECT_RANGE   _IOW(F2FS_IOCTL_MAGIC, 11,  \
>                                           struct f2fs_gc_range)
>  #define F2FS_IOC_GET_FEATURES            _IOR(F2FS_IOCTL_MAGIC, 12, __u32)
> +#define F2FS_IOC_SET_PIN_FILE            _IOW(F2FS_IOCTL_MAGIC, 13, __u32)
>  
>  #define F2FS_IOC_SET_ENCRYPTION_POLICY   FS_IOC_SET_ENCRYPTION_POLICY
>  #define F2FS_IOC_GET_ENCRYPTION_POLICY   FS_IOC_GET_ENCRYPTION_POLICY
> @@ -587,7 +588,10 @@ struct f2fs_inode_info {
>   unsigned long i_flags;  /* keep an inode flags for ioctl */
>   unsigned char i_advise; /* use to give file attribute hints */
>   unsigned char i_dir_level;  /* use for dentry level for large dir */
> - unsigned int i_current_depth;   /* use only in directory structure */
> + union {
> + unsigned int i_current_depth;   /* only for directory depth */
> + unsigned short i_gc_failures;   /* only for regular file */
> + };
>   unsigned int i_pino;/* parent inode number */
>   umode_t i_acl_mode; /* keep file acl mode temporarily */
>  
> @@ -1133,6 +1137,9 @@ struct f2fs_sb_info {
>   /* threshold for converting bg victims for fg */
>   u64 fggc_threshold;
>  
> + /* threshold for gc trials on pinned files */
> + u64 gc_pin_file_threshold;
> +
>   /* maximum # of trials to find a victim segment for SSR and GC */
>   unsigned int max_victim_search;
>  
> @@ -2124,6 +2131,7 @@ enum {
>   FI_HOT_DATA,/* indicate file is hot */
>   FI_EXTRA_ATTR,  /* indicate file has extra attribute */
>   FI_PROJ_INHERIT,/* indicate file inherits projectid */
> + FI_PIN_FILE,/* indicate file should not be gced */
>  };
>  
>  static inline void __mark_inode_dirty_flag(struct inode *inode,
> @@ -2137,6 +2145,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode,
>   return;
>   case FI_DATA_EXIST:
>   case FI_INLINE_DOTS:
> + case FI_PIN_FILE:
>   f2fs_mark_inode_dirty_sync(inode, true);
>   }
>  }
> @@ -2217,6 +2226,13 @@ static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth)
>   f2fs_mark_inode_dirty_sync(inode, true);
>  }
>  
> +static inline void f2fs_i_gc_failures_write(struct inode *inode,
> + unsigned int count)
> +{
> + F2FS_I(inode)->i_gc_failures = count;
> + f2fs_mark_inode_dirty_sync(inode, true);
> +}
> +
>  static inline void f2fs_i_xnid_write(struct inode *inode, nid_t xnid)
>  {
>   F2FS_I(inode)->i_xattr_nid = xnid;
> @@ -2245,6 +2261,8 @@ static inline void get_inline_info(struct inode *inode, struct f2fs_inode *ri)
>   set_bit(FI_INLINE_DOTS, &fi->flags);
>   if (ri->i_inline & F2FS_EXTRA_ATTR)
>   set_bit(FI_EXTRA_ATTR, &fi->flags);
> + if (ri->i_inline & F2FS_PIN_FILE)
> + set_bit(FI_PIN_FILE, &fi->flags);
>  }
>  
>  static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri)
> @@ -2263,6 +2281,8 @@ static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri)
>   ri->i_inline |= F2FS_INLINE_DOTS;
>   if (is_inode_flag_set(inode, FI_EXTRA_ATTR))
>   ri->i_inline |= F2FS_EXTRA_ATTR;
> + if (is_inode_flag_set(inode, FI_PIN_FILE))
> + ri->i_inline |= F2FS_PIN_FILE;
>  }
>  
>  static inline int f2fs_has_extra_attr(struct 
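
A hedged userspace sketch of how the new ioctl might be issued. The command number 13 and the __u32 argument come from the quoted hunk; the magic value 0xf5 is my understanding of F2FS_IOCTL_MAGIC and is not shown in this excerpt. The DEMO_ names and demo_set_pin() are hypothetical, and the excerpt ends before the handler, so the meaning of the argument (presumably nonzero to pin, zero to unpin, per the v3 change log) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Assumed to match F2FS_IOCTL_MAGIC; not visible in the quoted hunk. */
#define DEMO_F2FS_IOCTL_MAGIC 0xf5

/* Mirrors the quoted definition: _IOW(F2FS_IOCTL_MAGIC, 13, __u32). */
#define DEMO_F2FS_IOC_SET_PIN_FILE \
	_IOW(DEMO_F2FS_IOCTL_MAGIC, 13, uint32_t)

/*
 * Issue the pin ioctl on an open file descriptor.
 * Returns the ioctl() result: -1 with errno set on failure.
 */
static int demo_set_pin(int fd, uint32_t pin)
{
	return ioctl(fd, DEMO_F2FS_IOC_SET_PIN_FILE, &pin);
}
```

A caller would open a regular file on an f2fs mount and pass its descriptor; on any other descriptor the call is expected to fail.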


Re: [PATCH 04/13] powerpc/powernv: Add platform-specific services for opencapi

2018-01-02 Thread Andrew Donnellan

On 19/12/17 02:21, Frederic Barrat wrote:

Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
   driver can find some common ground and configure the device and host
   appropriately.

- provide the hw interrupt to be used for translation faults raised by
   the NPU

- map/unmap some NPU mmio registers to get the fault context when the
   NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.


Signed-off-by: Frederic Barrat 
---
  arch/powerpc/include/asm/pnv-ocxl.h |  36 ++
  arch/powerpc/platforms/powernv/Makefile |   1 +
  arch/powerpc/platforms/powernv/ocxl.c   | 187 
  3 files changed, 224 insertions(+)
  create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
  create mode 100644 arch/powerpc/platforms/powernv/ocxl.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
new file mode 100644
index ..b9ab3f0a9634
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_PVN_OCXL_H
+#define _ASM_PVN_OCXL_H


I assume you meant "PNV" here.


+
+#include 
+
+#define PNV_OCXL_TL_MAX_TEMPLATE    63
+#define PNV_OCXL_TL_BITS_PER_RATE   4
+#define PNV_OCXL_TL_RATE_BUF_SIZE   ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
+
+extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+   char *rate_buf, int rate_buf_size);
+extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+   uint64_t rate_buf_phys, int rate_buf_size);
+
+extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
+extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+   void __iomem *tfc, void __iomem *pe_handle);
+extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+   void __iomem **dar, void __iomem **tfc,
+   void __iomem **pe_handle);
+
+extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
+   void **platform_data);
+extern void pnv_ocxl_spa_release(void *platform_data);
+extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+
+#endif /* _ASM_PVN_OCXL_H */


And here


diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 3732118a0482..6c9d5199a7e2 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
  obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
  obj-$(CONFIG_PPC_VAS)      += vas.o vas-window.o vas-debug.o
  obj-$(CONFIG_PPC_FTW)      += nx-ftw.o
+obj-$(CONFIG_OCXL_BASE)     += ocxl.o
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
new file mode 100644
index ..3378b75cf5e5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/ocxl.c
+int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq)
+{
+   int rc;
+
+   rc = of_property_read_u32(dev->dev.of_node, "ibm,opal-xsl-irq", hwirq);
+   if (rc) {
+   dev_err(&dev->dev,
+   "Can't translation xsl interrupt for device\n");


Can't get?


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
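
A worked check of the rate buffer size from the quoted header, using the three macro values as posted. demo_rate_buf_size() is a hypothetical helper that only exposes the arithmetic.

```c
#include <assert.h>

/* Values copied from the quoted pnv-ocxl.h hunk. */
#define PNV_OCXL_TL_MAX_TEMPLATE   63
#define PNV_OCXL_TL_BITS_PER_RATE  4
#define PNV_OCXL_TL_RATE_BUF_SIZE \
	((PNV_OCXL_TL_MAX_TEMPLATE + 1) * PNV_OCXL_TL_BITS_PER_RATE / 8)

/* 64 templates at 4 bits each is 256 bits, which is 32 bytes. */
static int demo_rate_buf_size(void)
{
	return PNV_OCXL_TL_RATE_BUF_SIZE;
}
```

Templates are numbered 0 through 63, hence the +1 before multiplying by the bits per rate.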



Re: [PATCH 06/13] ocxl: Driver code for 'generic' opencapi devices

2018-01-02 Thread Andrew Donnellan

On 19/12/17 02:21, Frederic Barrat wrote:

Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Alastair D'Silva 


A bunch of sparse warnings we should look at. (there's a few more that 
appear in later patches too)



---
  drivers/misc/ocxl/config.c        | 718 ++
  drivers/misc/ocxl/context.c       | 237 +
  drivers/misc/ocxl/file.c          | 405 +
  drivers/misc/ocxl/link.c          | 610 
  drivers/misc/ocxl/main.c          |  40 +++
  drivers/misc/ocxl/ocxl_internal.h | 200 +++
  drivers/misc/ocxl/pasid.c         | 114 ++
  drivers/misc/ocxl/pci.c           | 592 +++
  drivers/misc/ocxl/sysfs.c         | 150 
  include/uapi/misc/ocxl.h          |  47 +++
  10 files changed, 3113 insertions(+)
  create mode 100644 drivers/misc/ocxl/config.c
  create mode 100644 drivers/misc/ocxl/context.c
  create mode 100644 drivers/misc/ocxl/file.c
  create mode 100644 drivers/misc/ocxl/link.c
  create mode 100644 drivers/misc/ocxl/main.c
  create mode 100644 drivers/misc/ocxl/ocxl_internal.h
  create mode 100644 drivers/misc/ocxl/pasid.c
  create mode 100644 drivers/misc/ocxl/pci.c
  create mode 100644 drivers/misc/ocxl/sysfs.c
  create mode 100644 include/uapi/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
new file mode 100644
index 000000000000..bb2fde5967e2
--- /dev/null
+++ b/drivers/misc/ocxl/config.c
@@ -0,0 +1,718 @@
+/*
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include "ocxl_internal.h"
+
+#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
+#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
+
+#define OCXL_DVSEC_AFU_IDX_MASK          GENMASK(5, 0)
+#define OCXL_DVSEC_ACTAG_MASK            GENMASK(11, 0)
+#define OCXL_DVSEC_PASID_MASK            GENMASK(19, 0)
+#define OCXL_DVSEC_PASID_LOG_MASK        GENMASK(4, 0)
+
+#define OCXL_DVSEC_TEMPL_VERSION         0x0
+#define OCXL_DVSEC_TEMPL_NAME            0x4
+#define OCXL_DVSEC_TEMPL_AFU_VERSION     0x1C
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL     0x20
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ  0x28
+#define OCXL_DVSEC_TEMPL_MMIO_PP         0x30
+#define OCXL_DVSEC_TEMPL_MMIO_PP_SZ      0x38
+#define OCXL_DVSEC_TEMPL_MEM_SZ          0x3C
+#define OCXL_DVSEC_TEMPL_WWID            0x40
+
+#define OCXL_MAX_AFU_PER_FUNCTION 64
+#define OCXL_TEMPL_LEN            0x58
+#define OCXL_TEMPL_NAME_LEN       24
+#define OCXL_CFG_TIMEOUT          3
+
+static int find_dvsec(struct pci_dev *dev, int dvsec_id)
+{
+	int vsec = 0;
+	u16 vendor, id;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec,
+						    OCXL_EXT_CAP_ID_DVSEC))) {
+		pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+				&vendor);
+		pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+		if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
+			return vsec;
+	}
+	return 0;
+}
+
+static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
+{
+	int vsec = 0;
+	u16 vendor, id;
+	u8 idx;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec,
+						    OCXL_EXT_CAP_ID_DVSEC))) {
+		pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+				&vendor);
+		pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+
+		if (vendor == PCI_VENDOR_ID_IBM &&
+			id == OCXL_DVSEC_AFU_CTRL_ID) {
+			pci_read_config_byte(dev,
+					vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
+					&idx);
+			if (idx == afu_idx)
+				return vsec;
+		}
+	}
+	return 0;
+}
+
+static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+   u16 val;
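
As a side note on the EXTRACT_BIT/EXTRACT_BITS helpers defined near the top of this file, here is a userspace sketch of what they compute. The kernel's BIT() and GENMASK() come from kernel headers; the SK_ versions below are simplified 32-bit stand-ins written for this illustration (an assumption, not the kernel definitions), and the two wrapper functions are hypothetical.

```c
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK(h, l). */
#define SK_BIT(n)         (UINT32_C(1) << (n))
#define SK_GENMASK(h, l)  ((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Same shape as the patch's macros, with the arguments parenthesised. */
#define SK_EXTRACT_BIT(val, bit)   (!!((val) & SK_BIT(bit)))
#define SK_EXTRACT_BITS(val, s, e) (((val) & SK_GENMASK(e, s)) >> (s))

/* Hypothetical wrapper: returns 1 if the given bit is set, else 0. */
static uint32_t extract_bit(uint32_t val, unsigned int bit)
{
	return SK_EXTRACT_BIT(val, bit);
}

/* Hypothetical wrapper: returns bits s..e (inclusive), shifted down to bit 0. */
static uint32_t extract_bits(uint32_t val, unsigned int s, unsigned int e)
{
	return SK_EXTRACT_BITS(val, s, e);
}
```

Note that the macros as posted do not parenthesise val and s, so passing an expression such as a | b would probably not expand as intended; the sketch adds the parentheses for that reason.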

[PATCHv4 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar 

Add a sysctl variable kernel.controlled_userns_caps_whitelist. The capability
mask is stored in the kernel as kernel_cap_t (an array of u32), and this sysctl
takes its input as comma-separated hex u32 words. A string-based interface
might look simpler, but the value is not expected to change often during the
life of a kernel boot, so it makes more sense to reuse the widely available
API than to add another string-parsing routine just for this.

The default value (for kernel.controlled_userns_caps_whitelist) is
CAP_FULL_SET, meaning that no capability is controlled by default, which
maintains compatibility with the existing behavior of user-ns. An administrator
has to modify this sysctl to control any capability. E.g. to control
CAP_NET_RAW the mask needs to be changed like this -

  # sysctl -q kernel.controlled_userns_caps_whitelist
  kernel.controlled_userns_caps_whitelist = 1f,ffffffff
  # sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
  kernel.controlled_userns_caps_whitelist = 1f,ffffdfff

For bit-to-mask conversion please check include/uapi/linux/capability.h
file.

Any capabilities that are not part of this mask will be controlled and
will not be allowed to processes in controlled user-ns. In above example
CAP_NET_RAW will not be available to controlled-user-namespaces.
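
A small sketch of the bit-to-mask conversion, assuming the usual layout from include/uapi/linux/capability.h where capability number c lives in word (c >> 5) at bit (c & 31), and CAP_NET_RAW is 13. The sk_ names are made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

#define SK_CAP_NET_RAW 13 /* value of CAP_NET_RAW in the uapi header */

/* Which u32 word of the mask holds this capability (0 = low word). */
static unsigned int sk_cap_to_index(int cap)
{
	return (unsigned int)cap >> 5;
}

/* The bit for this capability within its word. */
static uint32_t sk_cap_to_mask(int cap)
{
	return UINT32_C(1) << (cap & 31);
}

/* Return the word with the capability cleared, i.e. made "controlled". */
static uint32_t sk_word_without(uint32_t word, int cap)
{
	return word & ~sk_cap_to_mask(cap);
}
```

CAP_NET_RAW falls in word 0, and clearing bit 13 from an all-ones low word gives 0xffffdfff. In the sysctl output the high word is printed first, so the low word is the one after the comma.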

Acked-by: Serge Hallyn 
Signed-off-by: Mahesh Bandewar 
---
v4:
  commit message changes.
v3:
  Added couple of comments as requested by Serge Hallyn
v2:
  Rebase
v1:
  Initial submission

 Documentation/sysctl/kernel.txt | 21 ++
 include/linux/capability.h      |  3 +++
 kernel/capability.c             | 47 +
 kernel/sysctl.c                 |  5 +
 4 files changed, 76 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c7523c..6aa1e087afee 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
 - bootloader_version[ X86 only ]
 - callhome  [ S390 only ]
 - cap_last_cap
+- controlled_userns_caps_whitelist
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
@@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
 
 ==============================================================
 
+controlled_userns_caps_whitelist
+
+Capability mask that is whitelisted for "controlled" user namespaces.
+Any capability that is missing from this mask will not be allowed to
+any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
+is not part of this mask, then processes running inside any controlled
+userns's will not be allowed to perform action that needs CAP_NET_RAW
+capability. However, processes that are attached to a parent user-ns
+hierarchy that is *not* controlled and has CAP_NET_RAW can continue
+performing those actions. User-namespaces are marked "controlled" at
+the time of their creation based on the capabilities of the creator.
+A process that does not have CAP_SYS_ADMIN will create user-namespaces
+that are controlled.
+
+The value is expressed as two comma separated hex words (u32). This
+sysctl is available in init-ns and users with CAP_SYS_ADMIN in init-ns
+are allowed to make changes.
+
+==============================================================
+
 core_pattern:
 
 core_pattern is used to specify a core dumpfile pattern name.
diff --git a/include/linux/capability.h b/include/linux/capability.h
index f640dcbc880c..7d79a4689625 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -14,6 +14,7 @@
 #define _LINUX_CAPABILITY_H
 
 #include 
+#include 
 
 
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
@@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct 
user_namespace *ns);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct 
cpu_vfs_cap_data *cpu_caps);
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+void __user *buff, size_t *lenp, loff_t *ppos);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t 
size);
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 1e1c0236f55b..4a859b7d4902 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set);
 
 int file_caps_enabled = 1;
 
+kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
+
 static int __init file_caps_disable(char *str)
 {
file_caps_enabled = 0;
@@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct 
user_namespace *ns)
rcu_read_unlock();
return (ret == 0);
 }
+
+/* Controlled-userns capabilities routines */
+#ifdef CONFIG_SYSCTL
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+

[PATCHv4 2/2] userns: control capabilities of some user namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar 

With this new notion of "controlled" user-namespaces, the controlled
user-namespaces are marked at the time of their creation while the
capabilities of processes that belong to them are controlled using the
global mask.

Init-user-ns is always uncontrolled and a process that has SYS_ADMIN
that belongs to uncontrolled user-ns can create another (child) user-
namespace that is uncontrolled. Any other process (that either does
not have SYS_ADMIN or belongs to a controlled user-ns) can only
create a user-ns that is controlled.
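
The marking rule described above can be sketched in userspace as follows. The struct and function names are hypothetical stand-ins that only model the decision; this is not the kernel's create_user_ns().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a user namespace: only the parent link and the flag. */
struct sk_userns {
	const struct sk_userns *parent; /* NULL for the init user-ns */
	bool controlled;
};

/*
 * A new user-ns is controlled if its creator lacks SYS_ADMIN in the
 * parent, or if the parent itself is already controlled.
 */
static struct sk_userns sk_create_userns(const struct sk_userns *parent,
					 bool creator_has_sys_admin)
{
	struct sk_userns ns = {
		.parent = parent,
		.controlled = !creator_has_sys_admin || parent->controlled,
	};
	return ns;
}

/* Convenience wrapper: would a child of such a parent come out controlled? */
static bool sk_child_is_controlled(bool parent_controlled,
				   bool creator_has_sys_admin)
{
	struct sk_userns parent = { NULL, parent_controlled };

	return sk_create_userns(&parent, creator_has_sys_admin).controlled;
}
```

Only the combination "uncontrolled parent, creator has SYS_ADMIN" yields an uncontrolled child; every other combination yields a controlled one.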

global-capability-whitelist (controlled_userns_caps_whitelist) is used
at the capability check-time and keeps the semantics for the processes
that belong to uncontrolled user-ns as it is. Processes that belong to
controlled user-ns however are subjected to different checks-

   (a) if the capability in question is controlled and process belongs
   to controlled user-ns, then it's always denied.
   (b) if the capability in question is NOT controlled then fall back
   to the traditional check.
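
The two cases above can be sketched as a small decision function. This models only what the commit message says, assuming a single 32-bit mask word for brevity; the sk_ names are hypothetical and the real check in security/commoncap.c may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* A capability (0..31 here) is controlled if it is missing from the mask. */
static bool sk_is_capability_controlled(uint32_t whitelist, int cap)
{
	return !(whitelist & (UINT32_C(1) << cap));
}

/*
 * Decision for a process in a given user-ns:
 *  (a) controlled ns and controlled capability: always denied;
 *  (b) otherwise: whatever the traditional check says.
 */
static bool sk_cap_allowed(bool ns_controlled, uint32_t whitelist, int cap,
			   bool traditional_result)
{
	if (ns_controlled && sk_is_capability_controlled(whitelist, cap))
		return false;
	return traditional_result;
}
```

The traditional result is passed in as a parameter so the sketch stays self-contained; processes in uncontrolled namespaces never take the early-deny branch.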

Acked-by: Serge Hallyn 
Signed-off-by: Mahesh Bandewar 
---
v4:
  Rebase
v3:
  Rebase
v2:
  Don't recalculate user-ns flags for every setns() call.
v1:
  Initial submission.

 include/linux/capability.h     |  4 
 include/linux/user_namespace.h | 25 +
 kernel/capability.c            |  5 +
 kernel/user_namespace.c        |  4 
 security/commoncap.c           |  8 
 5 files changed, 46 insertions(+)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 7d79a4689625..383f31f066f0 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,10 @@ extern bool ptracer_capable(struct task_struct *tsk, 
struct user_namespace *ns);
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct 
cpu_vfs_cap_data *cpu_caps);
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 void __user *buff, size_t *lenp, loff_t *ppos);
+/* Controlled capability is capability that is missing from the capability-mask
+ * controlled_userns_caps_whitelist controlled via sysctl.
+ */
+bool is_capability_controlled(int cap);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t 
size);
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index d6b74b91096b..a5c48684b317 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
 };
 
 #define USERNS_SETGROUPS_ALLOWED 1UL
+#define USERNS_CONTROLLED   2UL
 
 #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
 
@@ -112,6 +113,21 @@ static inline void put_user_ns(struct user_namespace *ns)
__put_user_ns(ns);
 }
 
+/* Controlled user-ns is the one that is created by a process that does not
+ * have CAP_SYS_ADMIN (or descended from such an user-ns).
+ * For more details please see the sysctl description of
+ * controlled_userns_caps_whitelist.
+ */
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+   return ns->flags & USERNS_CONTROLLED;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+   ns->flags |= USERNS_CONTROLLED;
+}
+
 struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
@@ -170,6 +186,15 @@ static inline struct ns_common *ns_get_owner(struct 
ns_common *ns)
 {
return ERR_PTR(-EPERM);
 }
+
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+   return false;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/capability.c b/kernel/capability.c
index 4a859b7d4902..bffe249922de 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -511,6 +511,11 @@ bool ptracer_capable(struct task_struct *tsk, struct 
user_namespace *ns)
 }
 
 /* Controlled-userns capabilities routines */
+bool is_capability_controlled(int cap)
+{
+   return !cap_raised(controlled_userns_caps_whitelist, cap);
+}
+
 #ifdef CONFIG_SYSCTL
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 void __user *buff, size_t *lenp, loff_t *ppos)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 246d4d4ce5c7..ca0556d466b6 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -141,6 +141,10 @@ int create_user_ns(struct cred *new)
goto fail_keyring;
 
set_cred_user_ns(new, ns);
+   if (!ns_capable(parent_ns, CAP_SYS_ADMIN) ||
+   is_user_ns_controlled(parent_ns))
+   mark_user_ns_controlled(ns);
+
return 0;
 fail_keyring:
 #ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/security/commoncap.c b/security/commoncap.c
index 4f8e09340956..5454e9c03ee8 100644
--- 

[PATCHv4 0/2] capability controlled user-namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar 

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what these sandboxed processes can engage in, e.g.
CVE-2017-6074, CVE-2017-7184, CVE-2017-7308, just to name a few.
The current form of user-namespaces, however, if changed a bit, can
allow us to create a sandbox environment without locking down
user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in their current form have increased the attack surface, as
any process can acquire capabilities which are not available to it (by
default) by performing a combination of clone()/unshare()/setns() syscalls.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(int ac, char **av)
{
	int sock = -1;

	/* First attempt: in the original namespaces. */
	printf("Attempting to open RAW socket before unshare()...\n");
	sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
	if (sock < 0) {
		perror("socket() SOCK_RAW failed: ");
	} else {
		printf("Successfully opened RAW-Sock before unshare().\n");
		close(sock);
		sock = -1;
	}

	/* Move into a new user namespace and a new network namespace. */
	if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
		perror("unshare() failed: ");
		return 1;
	}

	/* Second attempt: inside the new namespaces. */
	printf("Attempting to open RAW socket after unshare()...\n");
	sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
	if (sock < 0) {
		perror("socket() SOCK_RAW failed: ");
	} else {
		printf("Successfully opened RAW-Sock after unshare().\n");
		close(sock);
		sock = -1;
	}

	return 0;
}

The above example shows how easy it is to acquire NET_RAW capabilities.
Once they are acquired, these processes could exploit the above-mentioned
or similar issues (discovered or undiscovered) with malicious intent. Note
that this is just an example and the problem/solution is not limited
to the NET_RAW capability *only*.

The easiest fix one can apply here is to lock-down user-namespaces which
many of the distros do (i.e. don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is allowed to create user-namespaces (governed by the limit
imposed by per-ns sysctl); however, user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is used at the
time of the capability check in conjunction with a global capability
whitelist. If a capability is not whitelisted, processes that belong to
controlled user-namespaces will not be allowed to use it.

Processes that do not have CAP_SYS_ADMIN in init-ns can *only* create
controlled user-namespaces. In other words, user-namespaces created by
privileged processes (those which have CAP_SYS_ADMIN in init-ns) are
not controlled. A hierarchy underneath any controlled user-ns is always
controlled.

The global whitelist is a list of capabilities governed by a sysctl
(kernel.controlled_userns_caps_whitelist). A (privileged) user in init-ns
can modify it, and it applies to all controlled user-namespaces on the
host irrespective of when each user-ns was created.

Marking user-namespaces controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities so that compatibility is maintained. However, it
gives admins a fine-grained ability to control various capabilities system-wide
without locking down user-namespaces.
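
To illustrate the compatibility point, a minimal sketch that counts how many capabilities a given mask word leaves controlled. It assumes a single 32-bit word (capabilities 0 to 31) and a hypothetical helper name; with the all-ones default nothing is controlled, so behavior is unchanged.

```c
#include <stdint.h>

/* Count capabilities in 0..31 that are missing from the low mask word. */
static int sk_count_controlled(uint32_t whitelist_low)
{
	int cap, n = 0;

	for (cap = 0; cap < 32; cap++)
		if (!(whitelist_low & (UINT32_C(1) << cap)))
			n++;
	return n;
}
```

Each bit an administrator clears from the mask adds exactly one controlled capability, which is what makes the control fine-grained.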

Example
-------
Here is an example that demonstrates the behavior of a kernel that has
this patch-set applied. It uses the same C code from this commit log,
saved as acquire_raw.c -

(a) The 'root' user has all the capabilities all the time (before and
after the capability is taken out of the whitelist).

root@vm0:~# id
uid=0(root) gid=0(root) groups=0(root)

root@vm0:~# sysctl -q kernel.controlled_userns_caps_whitelist
kernel.controlled_userns_caps_whitelist = 1f,ffffffff

root@vm0:~# ./acquire_raw 
Attempting to open RAW socket before unshare()...
Successfully opened RAW-Sock before unshare().
Attempting to open RAW socket after unshare()...
Successfully opened RAW-Sock after unshare().

root@vm0:~# sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
kernel.controlled_userns_caps_whitelist = 1f,ffffdfff

root@vm0:~# ./acquire_raw 
Attempting to open RAW socket before unshare()...
Successfully opened RAW-Sock before unshare().
Attempting to open RAW socket after unshare()...
Successfully opened RAW-Sock after unshare().

(b) Unprivileged user cannot change the mask.

mahesh@vm0:~$ id
uid=1000(mahesh) gid=1000(mahesh) groups=1000(mahesh),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),118(lpadmin),128(sambashare)

mahesh@vm0:~$ sysctl -q kernel.controlled_userns_caps_whitelist 
kernel.controlled_userns_caps_whitelist = 1f,

 

Re: [PATCH V8 0/3] OPP: Allow OPP table to be used for power-domains

2018-01-02 Thread Viresh Kumar
On 18-12-17, 15:51, Viresh Kumar wrote:
> Hi,
> 
> Now that the performance states of PM domains are supported by the kernel
> (merged in linux-next), I am trying once again to define the bindings,
> which we had dropped until the code was merged.
> 
> Summary:
> 
> Power-domains can also have their active states and this patchset
> enhances the OPP binding to define those.
> 
> The power domains can use the OPP bindings mostly as is. Though there
> are some changes required to support special cases:
> 
> - Allow "operating-points-v2" to contain multiple phandles for power
>   domain providers providing multiple domains.
> 
> - A new property "required-opp" is added for the devices to specify the
>   minimum required OPP of the master domain or any other type of device.
> 
> - Allow some of the OPP properties to accept magic values (firmware
>   dependent) as the OS doesn't know the real freq/voltage values.
> 
> V7->V8:
> - V7 1/2 divided into two patches. 1/3 is unchanged from V7.
> - 2/3 renamed the property from "power-domain-opp" to "required-opp", as
>   suggested by Rob.
> - Added Ulf's reviewed-by for 1/3 and 3/3.
> 
> --
> viresh
> 
> Viresh Kumar (3):
>   OPP: Allow OPP table to be used for power-domains
>   OPP: Introduce "required-opp" property
>   OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

Discussions are still going on for the last commit, though the first
two are already Acked by Rob and Ulf and are quite independent of the
third one.

Any objections to getting the first two merged for 4.16-rc1? I will
send them to Rafael on Friday if no one objects.

-- 
viresh


Re: [Intel-gfx] [PATCH v2] drm/i915: Try EDID bitbanging on HDMI after failed read

2018-01-02 Thread Jani Nikula
On Tue, 02 Jan 2018, Chris Wilson  wrote:
> Quoting Rodrigo Vivi (2018-01-02 19:12:18)
>> On Sun, Dec 31, 2017 at 10:34:54PM +, Stefan Brüns wrote:
>> > + edid = drm_get_edid(connector, i2c);
>> > +
>> > + if (!edid && !intel_gmbus_is_forced_bit(i2c)) {
>> > + DRM_DEBUG_KMS("HDMI GMBUS EDID read failed, retry using GPIO bit-banging\n");
>> > + intel_gmbus_force_bit(i2c, true);
>> > + edid = drm_get_edid(connector, i2c);
>> > + intel_gmbus_force_bit(i2c, false);
>> > + }
>> 
>> Approach seems fine for this case.
>> I just wonder what the risks would be of forcing this bit and reading the
>> EDID when nothing is present on the other end?
>
> Should be no more risky than using GMBUS as the bit-banging is the
> underlying HW protocol; it should just be adding an extra delay to
> the disconnected probe. Offset against the chance that it fixes
> detection of borderline devices.
>
> I would say that given the explanation above, the question is why not
> apply it universally? (Bonus points for including the explanation as
> comments.)

I'm wondering, is gmbus too fast for the adapters, does gmbus generally
have different timing for the ack/nak as described in the commit message
than bit banging, or are the adapters just plain buggy? Do we have any
control over gmbus timings (don't have the time to peruse the bspec just
now)?
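
For readers following along, the retry pattern in the quoted hunk can be
modelled outside the driver roughly as below. This is a sketch only: struct
bus_model, borderline_device(), no_device() and read_edid_with_fallback()
are hypothetical stand-ins and none of them are i915 or DRM functions. It
shows the two cases discussed above: a borderline device that only answers
when bit-banged, and an empty connector where the fallback just costs one
extra read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an I2C adapter; not the i915 gmbus structure. */
struct bus_model {
    bool forced_bit; /* true while bit-banging is forced */
    int reads;       /* number of EDID read attempts, for illustration */
    const void *(*read_edid)(struct bus_model *bus);
};

static const char fake_edid[] = "edid";

/* Device that only answers when the bus is bit-banged. */
static const void *borderline_device(struct bus_model *bus)
{
    bus->reads++;
    return bus->forced_bit ? fake_edid : NULL;
}

/* Nothing connected: every read fails. */
static const void *no_device(struct bus_model *bus)
{
    bus->reads++;
    return NULL;
}

/* Try the normal path first; on failure retry once with bit-banging
 * forced, then restore the previous mode. */
static const void *read_edid_with_fallback(struct bus_model *bus)
{
    const void *edid = bus->read_edid(bus);

    if (!edid && !bus->forced_bit) {
        bus->forced_bit = true;
        edid = bus->read_edid(bus);
        bus->forced_bit = false;
    }
    return edid;
}
```

In this model the disconnected case performs exactly two reads and leaves
the bus in its original mode, which corresponds to the "extra delay" Chris
mentions.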

BR,
Jani.

> -Chris
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center


Re: 4.15-rc6 PTI regression: L1 TLB mismatch MCE on Athlon64

2018-01-02 Thread Meelis Roos
> > These MCE-s do not happen on 4.14 and 4.15.0-rc4-00041-gace52288edf0. 
> > They do happen on each boot into 4.15-rc6. Will try to bisect.
> 
> Please do. And try -rc5 too.

4.15-rc5 is OK. Will try CONFIG_X86_PTDUMP on the next kernel.
 
> And then Linus' pti merges:
> 
> 52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd
> 5aa90a84589282b87666f92b6c3c917c8080a9bf
> caf9a82657b313106aae8f4a35936c116a152299
> 64a48099b3b31568ac45716b7fafcb74a0c2fcfe


-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH] exec: Weaken dumpability for secureexec

2018-01-02 Thread Serge E. Hallyn
On Tue, Jan 02, 2018 at 03:21:33PM -0800, Kees Cook wrote:
> This is a logical revert of:
> 
> commit e37fdb785a5f ("exec: Use secureexec for setting dumpability")
> 
> This weakens dumpability back to checking only for uid/gid changes in
> current (which is useless), but userspace depends on dumpability not
> being tied to secureexec.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1528633
> 
> Reported-by: Tom Horsley 
> Fixes: e37fdb785a5f ("exec: Use secureexec for setting dumpability")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
>  fs/exec.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5688b5e1b937..7eb8d21bcab9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1349,9 +1349,14 @@ void setup_new_exec(struct linux_binprm * bprm)
>  
>   current->sas_ss_sp = current->sas_ss_size = 0;
>  
> - /* Figure out dumpability. */
> + /*
> +  * Figure out dumpability. Note that this checking only of current
> +  * is wrong, but userspace depends on it. This should be testing
> +  * bprm->secureexec instead.
> +  */
>   if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> - bprm->secureexec)
> + !(uid_eq(current_euid(), current_uid()) &&
> +   gid_eq(current_egid(), current_gid())))

So what about the pdeath_signal?  Is that going to be another subtle
time-bomb?

>   set_dumpable(current->mm, suid_dumpable);
>   else
>   set_dumpable(current->mm, SUID_DUMP_USER);
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security
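
The condition in the quoted hunk can be modelled in userspace roughly as
follows. This is a sketch, not the kernel code: restrict_dumpable() is a
hypothetical helper, the IDs are passed in as plain values instead of being
read from current, and the enforce_nondump flag stands in for the
BINPRM_FLAGS_ENFORCE_NONDUMP test.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Returns true where the quoted code would call
 * set_dumpable(current->mm, suid_dumpable), and false where it would
 * fall through to SUID_DUMP_USER.
 */
static bool restrict_dumpable(bool enforce_nondump,
                              uid_t uid, uid_t euid,
                              gid_t gid, gid_t egid)
{
    return enforce_nondump || !(uid == euid && gid == egid);
}
```

Note that only a uid/euid or gid/egid mismatch is considered here, which is
the weakening relative to testing bprm->secureexec that the patch
description refers to.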


Re: About the try to remove cross-release feature entirely by Ingo

2018-01-02 Thread Theodore Ts'o
On Wed, Jan 03, 2018 at 11:10:37AM +0900, Byungchul Park wrote:
> > The point I was trying to drive home is that "all we have to do is
> > just classify everything well or just invalidate the right lock
> 
> Just to be sure, we don't have to invalidate lock objects at all but
> a problematic waiter only.

So essentially you are proposing that we have to play "whack-a-mole"
as we find false positives, and where we may have to put in ad-hoc
plumbing to only invalidate "a problematic waiter" when it's
problematic --- or to entirely suppress the problematic waiter
altogether.  And in that case, a file system developer might be forced
to invalidate a lock/"waiter"/"completion" in another subsystem.

I will also remind you that doing this will trigger a checkpatch.pl
*error*:

ERROR("LOCKDEP", "lockdep_no_validate class is reserved for device->mutex.\n" . $herecurr);

- Ted


Re: [PATCH] exec: Weaken dumpability for secureexec

2018-01-02 Thread Serge E. Hallyn
On Tue, Jan 02, 2018 at 03:21:33PM -0800, Kees Cook wrote:
> This is a logical revert of:
> 
> commit e37fdb785a5f ("exec: Use secureexec for setting dumpability")
> 
> This weakens dumpability back to checking only for uid/gid changes in
> current (which is useless), but userspace depends on dumpability not
> being tied to secureexec.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1528633
> 
> Reported-by: Tom Horsley 

Seems right, any chance we could get a tested-by: Tom?  (Did we already
get that?)

> Fixes: e37fdb785a5f ("exec: Use secureexec for setting dumpability")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
>  fs/exec.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 5688b5e1b937..7eb8d21bcab9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1349,9 +1349,14 @@ void setup_new_exec(struct linux_binprm * bprm)
>  
>   current->sas_ss_sp = current->sas_ss_size = 0;
>  
> - /* Figure out dumpability. */
> + /*
> +  * Figure out dumpability. Note that this checking only of current
> +  * is wrong, but userspace depends on it. This should be testing
> +  * bprm->secureexec instead.
> +  */
>   if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
> - bprm->secureexec)
> + !(uid_eq(current_euid(), current_uid()) &&
> +   gid_eq(current_egid(), current_gid())))
>   set_dumpable(current->mm, suid_dumpable);
>   else
>   set_dumpable(current->mm, SUID_DUMP_USER);
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security


Re: [PATCH] bonding: Delete an error message for a failed memory allocation in bond_update_slave_arr()

2018-01-02 Thread महेश बंडेवार
On Mon, Jan 1, 2018 at 8:07 AM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Mon, 1 Jan 2018 17:00:04 +0100
>
> Omit an extra message for a memory allocation failure in this function.
>
> This issue was detected by using the Coccinelle software.
>
What is the issue with this message?

> Signed-off-by: Markus Elfring 
> ---
>  drivers/net/bonding/bond_main.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index c669554d70bb..a96e0c9cc4bf 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3910,7 +3910,6 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>   GFP_KERNEL);
> if (!new_arr) {
> ret = -ENOMEM;
> -   pr_err("Failed to build slave-array.\n");
> goto out;
> }
> if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> --
> 2.15.1
>


Re: [PATCH] nokia N9: Add support for magnetometer and touchscreen

2018-01-02 Thread Filip Matijević
Hi,

On 01/02/2018 06:27 PM, Sebastian Reichel wrote:
> Hi,
> 
> On Tue, Jan 02, 2018 at 02:17:22PM +0100, Pavel Machek wrote:
>> This adds dts support for magnetometer and touchscreen on Nokia N9.
> 
> I think it makes sense to have this split.
> 
>> Signed-off-by: Pavel Machek 
>>
>> diff --git a/arch/arm/boot/dts/omap3-n9.dts b/arch/arm/boot/dts/omap3-n9.dts
>> index 39e35f8..57a6679 100644
>> --- a/arch/arm/boot/dts/omap3-n9.dts
>> +++ b/arch/arm/boot/dts/omap3-n9.dts
>> @@ -36,6 +57,22 @@
>>  };
>>  };
>>  };
>> +
>> +touch@4b {
> 
> touchscreen@
> 
>> +compatible = "atmel,maxtouch";
>> +reg = <0x4b>;
>> +interrupt-parent = <>;
>> +interrupts = <29 2>; /* gpio_61, IRQF_TRIGGER_FALLING*/
> 
> reset-gpios = < 17 GPIO_ACTIVE_SOMETHING>;
> 

I'm using reset-gpios = < 17 0>;

>> +vdd-supply = <>;
>> +avdd-supply = <>;
> 
> Those two are not mentioned in the binding and not supported by the
> driver as far as I can see?
> 

Right, but vio and vaux1 need to be on - the reason why it's working at
all is because lis302 uses the same regulators and turns them on. IMHO
either we add the support for regulators to maxtouch driver or we add
regulator-always-on to vio and vaux1.

>> +};
>> +};
> 
> Touchscreen with the same settings is required for n950, so it
> should be in the shared n950 + n9 file.
> 

As a side-note, there is no pinmux mentioned and usually I'd use
OMAP3_CORE1_IOPAD(0x20c8, PIN_INPUT | MUX_MODE4) /* gpio_61*/
OMAP3_CORE1_IOPAD(0x20f2, PIN_OUTPUT | MUX_MODE4) /* gpio_81*/

For reasons that I can't explain, the first line (gpmc_nbe1->gpio_61) breaks
it for me, so I've commented it out. Still, if anyone has an idea what
is wrong with that, please let me know.

>> + {
>> +ak8975@0f {
>> +compatible = "asahi-kasei,ak8975";
>> +reg = <0x0f>;
>> +};
>>  };
> 
> Looking at the N9 board file this is missing a rotation matrix. This
> is supported by the binding:
> 
> Documentation/devicetree/bindings/iio/magnetometer/ak8975.txt
> 
>>  
>>   {
> 
> -- Sebastian
> 

Best regards,
Filip


Re: [PATCH] mm/fadvise: discard partial pages iff endbyte is also eof

2018-01-02 Thread 夷则(Caspar)


> On Dec 23, 2017, at 12:16, 十刀 wrote:
> 
> From: "shidao.ytt" 
> 
> In commit 441c228f817f7 ("mm: fadvise: document the
> fadvise(FADV_DONTNEED) behaviour for partial pages"), Mel Gorman
> explained why partial pages should be preserved instead of discarded
> when using fadvise(FADV_DONTNEED). However, the actual code to calculate
> end_index was wrong, so the behavior didn't match the statement in the
> comments. Luckily, in another commit 18aba41cbf
> ("mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED"),
> Oleg Drokin fixed this behavior.
> 
> Here I propose a new idea: we can still discard the last partial page
> iff the page-unaligned endbyte is also the end of the file, since no
> one else will use the rest of the page and it should be safe enough to
> discard.

+akpm...

Hi Mel, Andrew:

Would you please take a look at this patch and see whether this proposal
is reasonable? Thanks in advance!

Thanks,
Caspar

> 
> Signed-off-by: shidao.ytt 
> Signed-off-by: Caspar Zhang 
> ---
> mm/fadvise.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/fadvise.c b/mm/fadvise.c
> index ec70d6e..f74b21e 100644
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -127,7 +127,8 @@
>*/
>   start_index = (offset+(PAGE_SIZE-1)) >> PAGE_SHIFT;
>   end_index = (endbyte >> PAGE_SHIFT);
> - if ((endbyte & ~PAGE_MASK) != ~PAGE_MASK) {
> + if ((endbyte & ~PAGE_MASK) != ~PAGE_MASK &&
> + endbyte != inode->i_size - 1) {
>   /* First page is tricky as 0 - 1 = -1, but pgoff_t
>* is unsigned, so the end_index >= start_index
>* check below would be true and we'll discard the whole
> -- 
> 1.8.3.1
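
The index arithmetic in the quoted hunk can be sketched in userspace as
follows. This is an illustration under assumptions, not the kernel code:
dontneed_range(), struct page_range and the SKETCH_* macros are hypothetical
names, a 4 KiB page size is assumed, and file_size stands in for
inode->i_size (with an extra guard for empty files that the patch itself
does not have).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Inclusive range of page indexes to discard; valid is false if none. */
struct page_range {
    bool valid;
    uint64_t start;
    uint64_t end;
};

static struct page_range dontneed_range(uint64_t offset, uint64_t endbyte,
                                        uint64_t file_size)
{
    struct page_range r = { false, 0, 0 };
    /* Round the start up so a partial first page is preserved. */
    uint64_t start = (offset + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    uint64_t end = endbyte >> SKETCH_PAGE_SHIFT;
    /* True when endbyte is the last byte of its page. */
    bool ends_on_page_boundary =
        (endbyte & (SKETCH_PAGE_SIZE - 1)) == SKETCH_PAGE_SIZE - 1;
    /* The proposed extra condition: endbyte is the last byte of the file. */
    bool at_eof = file_size > 0 && endbyte == file_size - 1;

    if (!ends_on_page_boundary && !at_eof) {
        if (end == 0)
            return r; /* nothing to discard; avoids 0 - 1 underflow */
        end--;        /* preserve the partial last page */
    }
    if (end >= start) {
        r.valid = true;
        r.start = start;
        r.end = end;
    }
    return r;
}
```

For example, with offset 0 and endbyte 6000, the sketch keeps page 1 when
the file is larger than 6001 bytes, but includes it when the file is exactly
6001 bytes long, which is the case the patch targets.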



Re: [PATCH 2/3] dt-bindings: mtd: atmel-quadspi: add an optional property 'dmacap,memcpy'

2018-01-02 Thread ludovic.desroc...@microchip.com
On Tue, Jan 02, 2018 at 07:18:58PM +, Trent Piepho wrote:
> On Tue, 2018-01-02 at 11:22 +0100, Ludovic Desroches wrote:
> > On Wed, Dec 27, 2017 at 10:40:00PM +0100, Cyrille Pitchen wrote:
> > 
> > > Or maybe no change at all is required at the at_xdmac.c driver side: we
> > > just don't care about the provided flags in the "dmas" property,
> > > especially the "peripheral id". They would be ignored anyway when the
> > > atmel-quadspi.c driver later calls dmaengine_prep_dma_memcpy(). So I
> > > could simply set the dma cells to 0 in the device-tree?
> > > 
> > > Ludovic, what do you think about that ?
> > 
> > It may work but I won't do this. Channels requested through the xlate
> > function usually have their capabilities set to DMA_SLAVE and not
> > DMA_MEMCPY. In the at_xdmac case, it won't be an issue, but if you have
> > a controller whose channels can support only mem-to-mem or only
> > peripheral transfers, it won't work.
> 
> Maybe one could create an "AT91_XDMAC_DT_" macro to indicate a memcpy
> channel.  There are still unused bits for another flag.  It also looks
> like at_xdma uses peripheral id 0x3f for memcpy transfers (will that
> work with memcpy DMA on multiple channels at the same time?).  So
> perhaps perid 0x3f could be the indication of wanting a memcpy channel,
> rather than another flag bit.  But however it's done, one writes:
> 
> dmas = < AT91_XDMAC_DT_MEMCPY>; dma-names = "rx-tx";
> 

I have no objection to doing that. My concerns are:
- most (all?) of the dma controllers use the xlate function to provide
  slave channels. Does it have to provide slave channels, or can we
  use it for all kinds of channels? From my point of view we can do it;
  I just need confirmation.
- this set of patches is focused on the atmel qspi controller, but other
  ones may be interested in doing the same thing, so they would have to
  update the behavior of the xlate function of the DMA controller they
  are using. So having the request of a DMA_MEMCPY channel inside the
  spi/qspi controller doesn't seem to be a wrong idea. Moreover, it may
  be confusing for users who don't know the context: why do I have to
  use memcpy and not slave as usual?

Honestly I have no opinion about the way to do it. Both have pros and
cons.

> I think one could have the quadspi driver automatically fill in the dma
> cell in the dma specifier if it is not present in the device tree.  So
> one could write "dmas = <>" and the driver adds the
> AT91_XDMAC_DT_MEMCPY cell before xlating.  I'm not sure if that's a
> good idea or not.

I don't think so, there is enough black magic, let's try to not add more
:p

Regards

Ludovic
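
The two signalling options discussed above (a spare flag bit, or reserving perid 0x3f) can be compared with a small decoder. This is only a sketch under stated assumptions: the bit positions and every `SKETCH_*`/`sketch_*` name are invented for illustration and are not the real at_xdmac binding or driver code.

```c
#include <stdint.h>

/* Hypothetical cell layout, for illustration only. */
#define SKETCH_DT_PERID_SHIFT	24
#define SKETCH_DT_PERID_MASK	0x7fu
#define SKETCH_DT_PERID(cell) \
	(((cell) >> SKETCH_DT_PERID_SHIFT) & SKETCH_DT_PERID_MASK)

/* Option 1: a spare flag bit marks a memcpy channel (position assumed). */
#define SKETCH_DT_MEMCPY	(1u << 15)
/* Option 2: peripheral id 0x3f doubles as the memcpy marker. */
#define SKETCH_PERID_MEMCPY	0x3fu

enum sketch_chan_type { SKETCH_CHAN_SLAVE, SKETCH_CHAN_MEMCPY };

/* What an xlate-style helper would decide from one dma specifier cell. */
static enum sketch_chan_type sketch_xlate(uint32_t cell)
{
	if (cell & SKETCH_DT_MEMCPY)
		return SKETCH_CHAN_MEMCPY;
	if (SKETCH_DT_PERID(cell) == SKETCH_PERID_MEMCPY)
		return SKETCH_CHAN_MEMCPY;
	return SKETCH_CHAN_SLAVE;
}
```

Either way the decision lives in the DMA controller's xlate path, which is the per-controller change the second concern above points at.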


Re: [PATCH 0/2] perf-probe: Improve warning message for buildid mismatch

2018-01-02 Thread Ravi Bangoria


On 12/18/2017 12:58 PM, Masami Hiramatsu wrote:
> Hello,
>
> This series ensures that the build-ids for the target binary and debuginfo
> match. If there is a mismatch, it warns the user to check the
> package versions.

For the series,

Reviewed-by: Ravi Bangoria 



Re: [f2fs-dev] [PATCH v3] f2fs: add reserved blocks for root user

2018-01-02 Thread Chao Yu
On 2018/1/3 3:24, Jaegeuk Kim wrote:
>> How about adding uid & gid verification also like ext4?
> 
> Again, that's another feature which requires a mount option. I think it'd be
> better to add that, once we have a use-case.

That's OK. ;)

Thanks,



Re: [f2fs-dev] [PATCH v5] f2fs: add reserved blocks for root user

2018-01-02 Thread Chao Yu
On 2018/1/3 10:21, Jaegeuk Kim wrote:
> This patch allows root to reserve some blocks via mount option.
> 
> "-o reserve_root=N" means N x 4KB-sized blocks for root only.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
> 
> Change log from v4:
>  - fix f_bfree in statfs

Could you fix the f_bfree calculation issue in a separate patch prior to this
one? That would be better for tracking patch history and for git bisect
when tracking down issues.

One more thing: should we move the reserve_root_limit check into parse_options?
As it stands, it looks like a remount can set root_reserved_blocks beyond
our defined limit.
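
To make the limit concern concrete, here is a user-space sketch of the 0.2% cap and of the available-block calculation from the quoted patch. Assumptions: `block_t` is taken to be 32 bits, the shift is widened to 64 bits to avoid overflow (the quoted helper does not do this), and clamping in `clamp_root_reserved()` is just one possible policy; the truncated patch does not show what the real option parsing does.

```c
#include <stdint.h>

typedef uint32_t block_t;	/* assumed width, for illustration */

/* 0.2% of the user blocks, as in the quoted reserve_root_limit(). */
static block_t reserve_root_limit(block_t user_block_count)
{
	/* Widened to 64 bits here so the shift cannot overflow. */
	return (block_t)(((uint64_t)user_block_count << 1) / 1000);
}

/* Hypothetical helper: apply the cap wherever the option is parsed. */
static block_t clamp_root_reserved(block_t requested, block_t user_block_count)
{
	block_t limit = reserve_root_limit(user_block_count);

	return requested > limit ? limit : requested;
}

/*
 * Mirrors the check added to inc_valid_block_count(): the root reserve is
 * subtracted unless the option is set and the caller is privileged.
 */
static block_t avail_user_blocks(block_t user_block_count,
				 block_t current_reserved,
				 block_t root_reserved,
				 int reserve_root_opt, int privileged)
{
	block_t avail = user_block_count - current_reserved;

	if (!(reserve_root_opt && privileged))
		avail -= root_reserved;
	return avail;
}
```

If the cap is applied in the shared parsing path, mount and remount go through the same check.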

Thanks,

> 
>  fs/f2fs/f2fs.h  | 26 ++
>  fs/f2fs/super.c | 34 +-
>  fs/f2fs/sysfs.c |  3 ++-
>  3 files changed, 53 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 07e03990420b..a0e8eec23125 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -95,6 +95,7 @@ extern char *fault_name[FAULT_MAX];
>  #define F2FS_MOUNT_PRJQUOTA  0x0020
>  #define F2FS_MOUNT_QUOTA 0x0040
>  #define F2FS_MOUNT_INLINE_XATTR_SIZE 0x0080
> +#define F2FS_MOUNT_RESERVE_ROOT  0x0100
>  
>  #define clear_opt(sbi, option)   ((sbi)->mount_opt.opt &= 
> ~F2FS_MOUNT_##option)
>  #define set_opt(sbi, option) ((sbi)->mount_opt.opt |= F2FS_MOUNT_##option)
> @@ -1105,6 +1106,7 @@ struct f2fs_sb_info {
>   block_t last_valid_block_count; /* for recovery */
>   block_t reserved_blocks;/* configurable reserved blocks 
> */
>   block_t current_reserved_blocks;/* current reserved blocks */
> + block_t root_reserved_blocks;   /* root reserved blocks */
>  
>   unsigned int nquota_files;  /* # of quota sysfile */
>  
> @@ -1554,6 +1556,12 @@ static inline bool f2fs_has_xattr_block(unsigned int 
> ofs)
>   return ofs == XATTR_NODE_OFFSET;
>  }
>  
> +static inline block_t reserve_root_limit(struct f2fs_sb_info *sbi)
> +{
> + /* limit is 0.2% */
> + return (sbi->user_block_count << 1) / 1000;
> +}
> +
>  static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool);
>  static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
>struct inode *inode, blkcnt_t *count)
> @@ -1583,11 +1591,17 @@ static inline int inc_valid_block_count(struct 
> f2fs_sb_info *sbi,
>   sbi->total_valid_block_count += (block_t)(*count);
>   avail_user_block_count = sbi->user_block_count -
>   sbi->current_reserved_blocks;
> +
> + if (!(test_opt(sbi, RESERVE_ROOT) && capable(CAP_SYS_RESOURCE)))
> + avail_user_block_count -= sbi->root_reserved_blocks;
> +
>   if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
>   diff = sbi->total_valid_block_count - avail_user_block_count;
> + if (diff > *count)
> + diff = *count;
>   *count -= diff;
>   release = diff;
> - sbi->total_valid_block_count = avail_user_block_count;
> + sbi->total_valid_block_count -= diff;
>   if (!*count) {
>   spin_unlock(&sbi->stat_lock);
>   percpu_counter_sub(&sbi->alloc_valid_block_count, diff);
> @@ -1776,9 +1790,13 @@ static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
>  
>   spin_lock(&sbi->stat_lock);
>  
> - valid_block_count = sbi->total_valid_block_count + 1;
> - if (unlikely(valid_block_count + sbi->current_reserved_blocks >
> - sbi->user_block_count)) {
> + valid_block_count = sbi->total_valid_block_count +
> + sbi->current_reserved_blocks + 1;
> +
> + if (!(test_opt(sbi, RESERVE_ROOT) && capable(CAP_SYS_RESOURCE)))
> + valid_block_count += sbi->root_reserved_blocks;
> +
> + if (unlikely(valid_block_count > sbi->user_block_count)) {
>   spin_unlock(&sbi->stat_lock);
>   goto enospc;
>   }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 5c6a02b558f0..e814340bc2f0 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -107,6 +107,7 @@ enum {
>   Opt_noextent_cache,
>   Opt_noinline_data,
>   Opt_data_flush,
> + Opt_reserve_root,
>   Opt_mode,
>   Opt_io_size_bits,
>   Opt_fault_injection,
> @@ -157,6 +158,7 @@ static match_table_t f2fs_tokens = {
>   {Opt_noextent_cache, "noextent_cache"},
>   {Opt_noinline_data, "noinline_data"},
>   {Opt_data_flush, "data_flush"},
> + {Opt_reserve_root, "reserve_root=%u"},
>   {Opt_mode, "mode=%s"},
>   {Opt_io_size_bits, "io_bits=%u"},
>   {Opt_fault_injection, "fault_injection=%u"},
> @@ -488,6 +490,18 @@ static int parse_options(struct super_block *sb, char 
> *options)
>   case Opt_data_flush:
>   

[PATCH v2 1/4] dmaengine: xilinx_dma: populate dma caps properly

2018-01-02 Thread Kedareswara rao Appana
When a client driver calls the dma_get_slave_caps() API, it checks
certain fields of struct dma_device. Currently the driver does not
set the directions and addr_widths fields, so dma_get_slave_caps()
returns failure.

This patch fixes the issue by populating the directions and
addr_widths fields of struct dma_device with proper values.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
--> Improved commit message title and description 
as suggested by Vinod.

 drivers/dma/xilinx/xilinx_dma.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 88d317d..21ac954 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -2398,6 +2398,7 @@ static int xilinx_dma_chan_probe(struct xilinx_dma_device 
*xdev,
chan->direction = DMA_MEM_TO_DEV;
chan->id = chan_id;
chan->tdest = chan_id;
+   xdev->common.directions = BIT(DMA_MEM_TO_DEV);
 
chan->ctrl_offset = XILINX_DMA_MM2S_CTRL_OFFSET;
if (xdev->dma_config->dmatype == XDMA_TYPE_VDMA) {
@@ -2415,6 +2416,7 @@ static int xilinx_dma_chan_probe(struct xilinx_dma_device 
*xdev,
chan->direction = DMA_DEV_TO_MEM;
chan->id = chan_id;
chan->tdest = chan_id - xdev->nr_channels;
+   xdev->common.directions |= BIT(DMA_DEV_TO_MEM);
 
chan->ctrl_offset = XILINX_DMA_S2MM_CTRL_OFFSET;
if (xdev->dma_config->dmatype == XDMA_TYPE_VDMA) {
@@ -2629,6 +2631,8 @@ static int xilinx_dma_probe(struct platform_device *pdev)
dma_cap_set(DMA_PRIVATE, xdev->common.cap_mask);
}
 
+   xdev->common.dst_addr_widths = BIT(addr_width / 8);
+   xdev->common.src_addr_widths = BIT(addr_width / 8);
xdev->common.device_alloc_chan_resources =
xilinx_dma_alloc_chan_resources;
xdev->common.device_free_chan_resources =
-- 
2.7.4
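
For readers unfamiliar with the capability masks, the following user-space sketch shows how the values in the patch come out. All `sketch_*`/`SKETCH_*` names are illustrative; the direction values are assumed to follow the order of the kernel's enum dma_transfer_direction, and the sketch uses `|=` for both directions where the patch uses `=` for the first one.

```c
#include <stdint.h>

#define SKETCH_BIT(n)	(1u << (n))

/* Assumed to mirror the ordering of enum dma_transfer_direction. */
enum sketch_dir {
	SKETCH_MEM_TO_MEM = 0,
	SKETCH_MEM_TO_DEV = 1,
	SKETCH_DEV_TO_MEM = 2
};

struct sketch_caps {
	uint32_t directions;		/* bitmask of supported directions */
	uint32_t src_addr_widths;	/* bit N set: N-byte accesses supported */
	uint32_t dst_addr_widths;
};

/* addr_width_bits is the address width in bits, e.g. 32 or 64. */
static struct sketch_caps sketch_populate_caps(int has_mm2s, int has_s2mm,
						unsigned int addr_width_bits)
{
	struct sketch_caps caps = { 0, 0, 0 };

	if (has_mm2s)
		caps.directions |= SKETCH_BIT(SKETCH_MEM_TO_DEV);
	if (has_s2mm)
		caps.directions |= SKETCH_BIT(SKETCH_DEV_TO_MEM);

	/* Same expression as the patch: BIT(addr_width / 8). */
	caps.src_addr_widths = SKETCH_BIT(addr_width_bits / 8);
	caps.dst_addr_widths = SKETCH_BIT(addr_width_bits / 8);
	return caps;
}
```

A 32-bit device with both channels ends up with directions 0x6 and address-width masks 0x10.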



[PATCH v2 3/4] dmaengine: xilinx_dma: Fix warning variable prev set but not used

2018-01-02 Thread Kedareswara rao Appana
This patch fixes the below sparse warning in the driver
drivers/dma/xilinx/xilinx_dma.c: In function ‘xilinx_vdma_dma_prep_interleaved’:
drivers/dma/xilinx/xilinx_dma.c:1614:43: warning: variable ‘prev’ set but not used [-Wunused-but-set-variable]
  struct xilinx_vdma_tx_segment *segment, *prev = NULL;

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
--> Improved commit message title and description 
as suggested by Vinod.

 drivers/dma/xilinx/xilinx_dma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 8467671..845e638 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -1611,7 +1611,7 @@ xilinx_vdma_dma_prep_interleaved(struct dma_chan *dchan,
 {
struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
struct xilinx_dma_tx_descriptor *desc;
-   struct xilinx_vdma_tx_segment *segment, *prev = NULL;
+   struct xilinx_vdma_tx_segment *segment;
struct xilinx_vdma_desc_hw *hw;
 
if (!is_slave_direction(xt->dir))
@@ -1665,8 +1665,6 @@ xilinx_vdma_dma_prep_interleaved(struct dma_chan *dchan,
/* Insert the segment into the descriptor segments list. */
list_add_tail(&segment->node, &desc->segments);
 
-   prev = segment;
-
/* Link the last hardware descriptor with the first. */
segment = list_first_entry(&desc->segments,
   struct xilinx_vdma_tx_segment, node);
-- 
2.7.4



[PATCH v2 0/4] dmaengine: xilinx_dma: Bug fixes

2018-01-02 Thread Kedareswara rao Appana
This patch series does the following:
--> Fixes sparse warnings in the driver.
--> Properly configures the SG mode bit in the driver for cdma.
--> Populates dma caps properly.

This patch series was created on top of Linux tag 4.15-rc4,
i.e. the slave-dma.git next branch.

Kedareswara rao Appana (4):
  dmaengine: xilinx_dma: populate dma caps properly
  dmaengine: xilinx_dma: properly configure the SG mode bit in the
driver for cdma
  dmaengine: xilinx_dma: Fix warning variable prev set but not used
  dmaengine: xilinx_dma: Free BD consistent memory

 drivers/dma/xilinx/xilinx_dma.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH v2 2/4] dmaengine: xilinx_dma: properly configure the SG mode bit in the driver for cdma

2018-01-02 Thread Kedareswara rao Appana
If the hardware is configured for Scatter Gather (SG) mode and is
idle, software must set the SG mode bit in the control register to 0
and then back to 1 to force the CDMA SG engine to use a new value
written to the CURDESC_PNTR register. Failure to do so could result
in errors from the dmaengine.

This patch implements that sequence.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
--> Improved commit message title and description 
as suggested by Vinod.

 drivers/dma/xilinx/xilinx_dma.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 21ac954..8467671 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -1204,6 +1204,12 @@ static void xilinx_cdma_start_transfer(struct 
xilinx_dma_chan *chan)
}
 
if (chan->has_sg) {
+   dma_ctrl_clr(chan, XILINX_DMA_REG_DMACR,
+XILINX_CDMA_CR_SGMODE);
+
+   dma_ctrl_set(chan, XILINX_DMA_REG_DMACR,
+XILINX_CDMA_CR_SGMODE);
+
xilinx_write(chan, XILINX_DMA_REG_CURDESC,
 head_desc->async_tx.phys);
 
@@ -2052,6 +2058,10 @@ static int xilinx_dma_terminate_all(struct dma_chan 
*dchan)
chan->cyclic = false;
}
 
+   if ((chan->xdev->dma_config->dmatype == XDMA_TYPE_CDMA) && chan->has_sg)
+   dma_ctrl_clr(chan, XILINX_DMA_REG_DMACR,
+XILINX_CDMA_CR_SGMODE);
+
return 0;
 }
 
-- 
2.7.4
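
The clear/set/write ordering can be illustrated with a toy register model. This is a sketch only: the `sketch_*` names, the bit position of `SKETCH_CR_SGMODE` and the "rearmed" latch are assumptions made up to mimic the behaviour the commit message describes, not a model taken from the hardware documentation.

```c
#include <stdint.h>

#define SKETCH_CR_SGMODE	(1u << 3)	/* bit position assumed */

/* Toy device: picks up a new descriptor only after an SG 0 -> 1 edge. */
struct sketch_cdma {
	uint32_t dmacr;		/* control register */
	uint32_t curdesc_reg;	/* last value written to CURDESC */
	uint32_t active_desc;	/* descriptor the engine will really use */
	int rearmed;		/* set by a 0 -> 1 edge on SGMODE */
};

static void sketch_ctrl_clr(struct sketch_cdma *c, uint32_t bits)
{
	c->dmacr &= ~bits;
}

static void sketch_ctrl_set(struct sketch_cdma *c, uint32_t bits)
{
	if ((bits & SKETCH_CR_SGMODE) && !(c->dmacr & SKETCH_CR_SGMODE))
		c->rearmed = 1;
	c->dmacr |= bits;
}

static void sketch_write_curdesc(struct sketch_cdma *c, uint32_t desc)
{
	c->curdesc_reg = desc;
	if (c->rearmed) {
		c->active_desc = desc;
		c->rearmed = 0;
	}
}

/* Same order as the patch: clear SGMODE, set it again, then write CURDESC. */
static void sketch_start_transfer(struct sketch_cdma *c, uint32_t desc)
{
	sketch_ctrl_clr(c, SKETCH_CR_SGMODE);
	sketch_ctrl_set(c, SKETCH_CR_SGMODE);
	sketch_write_curdesc(c, desc);
}
```

In this model a bare CURDESC write on an idle engine with SGMODE already set leaves the old descriptor active, which is the failure the patch guards against.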



Re: [PATCH v3 00/27] kill devm_ioremap_nocache

2018-01-02 Thread Yisheng Xie
+ cris/ia64/mn10300/openrisc maintainers

On 2017/12/25 9:09, Yisheng Xie wrote:
> hi Christophe and Greg,
> 
> On 2017/12/24 16:55, christophe leroy wrote:
>>
>>
>> Le 23/12/2017 à 16:57, Guenter Roeck a écrit :
>>> On 12/23/2017 05:48 AM, Greg KH wrote:
>>>> On Sat, Dec 23, 2017 at 06:55:25PM +0800, Yisheng Xie wrote:
>>>>> Hi all,
>>>>>
>>>>> When I tried to use devm_ioremap function and review related code, I found
>>>>> devm_ioremap and devm_ioremap_nocache is almost the same with each other,
>>>>> except one use ioremap while the other use ioremap_nocache.
>>>>
>>>> For all arches?  Really?  Look at MIPS, and x86, they have different
>>>> functions.
>>>>
>>>
>>> Both mips and x86 end up mapping the same function, but other arches don't.
>>> mn10300 is one where ioremap and ioremap_nocache are definitely different.
>>
>> alpha: identical
>> arc: identical
>> arm: identical
>> arm64: identical
>> cris: different<==
>> frv: identical
>> hexagone: identical
>> ia64: different<==
>> m32r: identical
>> m68k: identical
>> metag: identical
>> microblaze: identical
>> mips: identical
>> mn10300: different <==
>> nios: identical
>> openrisc: different<==
>> parisc: identical
>> riscv: identical
>> s390: identical
>> sh: identical
>> sparc: identical
>> tile: identical
>> um: rely on asm/generic
>> unicore32: identical
>> x86: identical
>> asm/generic (no mmu): identical
> 
> Wow, that's correct. Sorry, I had only checked the main archs, I mean
> x86, arm, arm64 and mips.
> 
> However, I still have no idea why these 4 archs want a different ioremap
> function from the others. Drivers seem to be unaware of this? If a driver calls
> ioremap, what does it really want on these 4 archs, cached or uncached?

Could you please help with this? It is beyond my knowledge.

Thanks
Yisheng

> 
>>
>> So 4 among all arches seems to have ioremap() and ioremap_nocache() being 
>> different.
>>
>> Could we have a define set by the 4 arches on which ioremap() and 
>> ioremap_nocache() are different, something like 
>> HAVE_DIFFERENT_IOREMAP_NOCACHE ?
> 
> Then, what would HAVE_DIFFERENT_IOREMAP_NOCACHE be used for?
> 
> Thanks
> Yisheng
>>
>> Christophe
>>
>>>
>>> Guenter
>>>
>>>>> While ioremap's
>>>>> default function is ioremap_nocache, so devm_ioremap_nocache also have the
>>>>> same function with devm_ioremap, which can just be killed to reduce the size
>>>>> of devres.o(from 20304 bytes to 18992 bytes in my compile environment).
>>>>>
>>>>> I have posted two versions, which use macro instead of function for
>>>>> devm_ioremap_nocache[1] or devm_ioremap[2]. And Greg suggest me to kill
>>>>> devm_ioremap_nocache for no need to keep a macro around for the duplicate
>>>>> thing. So here comes v3 and please help to review.
>>>>
>>>> I don't think this can be done, what am I missing?  These functions are
>>>> not identical, sorry for missing that before.
> 
> Never mind, I should have checked all the arches, sorry about that.
> 
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>>
>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> ---
>> L'absence de virus dans ce courrier électronique a été vérifiée par le 
>> logiciel antivirus Avast.
>> https://www.avast.com/antivirus
>>
>>
>> .
>>
> 
> 
> .
> 
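
One way to read Christophe's suggestion is a compile-time switch that aliases the two devm helpers unless an architecture opts out. The sketch below is user-space only and every `sketch_*`/`SKETCH_*` name is hypothetical; the stand-in mapping functions merely record which primitive ran and say nothing about what the four architectures actually do differently.

```c
#include <stddef.h>

enum sketch_primitive { SKETCH_NONE, SKETCH_PLAIN, SKETCH_NOCACHE };

static enum sketch_primitive sketch_last_primitive = SKETCH_NONE;
static char sketch_window[64];	/* pretend MMIO window */

/* Stand-ins for the arch-provided ioremap() and ioremap_nocache(). */
static inline void *sketch_ioremap(size_t off)
{
	sketch_last_primitive = SKETCH_PLAIN;
	return &sketch_window[off];
}

static inline void *sketch_ioremap_nocache(size_t off)
{
	sketch_last_primitive = SKETCH_NOCACHE;
	return &sketch_window[off];
}

static inline void *sketch_devm_ioremap(size_t off)
{
	return sketch_ioremap(off);
}

#ifdef SKETCH_HAVE_DIFFERENT_IOREMAP_NOCACHE
/* Arches where the two primitives differ keep a real second helper. */
static inline void *sketch_devm_ioremap_nocache(size_t off)
{
	return sketch_ioremap_nocache(off);
}
#else
/* Everywhere else the nocache name is only an alias. */
#define sketch_devm_ioremap_nocache sketch_devm_ioremap
#endif
```

Built without the define, a call through the nocache name lands in the plain helper, so only the opted-in architectures would carry the extra function.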



Re: [PATCH v3 00/27] kill devm_ioremap_nocache

2018-01-02 Thread Yisheng Xie
+ cris/ia64/mn10300/openrisc maintainers

On 2017/12/25 9:09, Yisheng Xie wrote:
> hi Christophe and Greg,
> 
> On 2017/12/24 16:55, christophe leroy wrote:
>>
>>
>> Le 23/12/2017 à 16:57, Guenter Roeck a écrit :
>>> On 12/23/2017 05:48 AM, Greg KH wrote:
 On Sat, Dec 23, 2017 at 06:55:25PM +0800, Yisheng Xie wrote:
> Hi all,
>
> When I tried to use devm_ioremap function and review related code, I found
> devm_ioremap and devm_ioremap_nocache is almost the same with each other,
> except one use ioremap while the other use ioremap_nocache.

 For all arches?  Really?  Look at MIPS, and x86, they have different
 functions.

>>>
>>> Both mips and x86 end up mapping the same function, but other arches don't.
>>> mn10300 is one where ioremap and ioremap_nocache are definitely different.
>>
>> alpha: identical
>> arc: identical
>> arm: identical
>> arm64: identical
>> cris: different<==
>> frv: identical
>> hexagone: identical
>> ia64: different<==
>> m32r: identical
>> m68k: identical
>> metag: identical
>> microblaze: identical
>> mips: identical
>> mn10300: different <==
>> nios: identical
>> openrisc: different<==
>> parisc: identical
>> riscv: identical
>> s390: identical
>> sh: identical
>> sparc: identical
>> tile: identical
>> um: rely on asm/generic
>> unicore32: identical
>> x86: identical
>> asm/generic (no mmu): identical
> 
> Wow, that's correct, sorry for I have just checked the main archs, I means
> x86,arm, arm64, mips.
> 
> However, I stall have no idea about why these 4 archs want different ioremap
> function with others. Drivers seems cannot aware this? If driver call ioremap
> want he really want for there 4 archs, cache or nocache?

Could you please help about this? it is out of my knowledge.

Thanks
Yisheng

> 
>>
>> So 4 among all arches seems to have ioremap() and ioremap_nocache() being 
>> different.
>>
>> Could we have a define set by the 4 arches on which ioremap() and 
>> ioremap_nocache() are different, something like 
>> HAVE_DIFFERENT_IOREMAP_NOCACHE ?
> 
> Then, what the HAVE_DIFFERENT_IOREMAP_NOCACHE is uesed for ?
> 
> Thanks
> Yisheng
>>
>> Christophe
>>
>>>
>>> Guenter
>>>
> Since ioremap defaults to ioremap_nocache, devm_ioremap_nocache does the
> same thing as devm_ioremap and can simply be removed to reduce the size
> of devres.o (from 20304 bytes to 18992 bytes in my build environment).
>
> I have posted two versions, which use a macro instead of a function for
> devm_ioremap_nocache[1] or devm_ioremap[2]. Greg suggested that I remove
> devm_ioremap_nocache, since there is no need to keep a macro around for a
> duplicate. So here comes v3; please help to review.

 I don't think this can be done, what am I missing?  These functions are
 not identical, sorry for missing that before.
> 
> Never mind, I should have checked all the arches, sorry about that.
> 

 thanks,

 greg k-h
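
For what it's worth, one possible reading of the HAVE_DIFFERENT_IOREMAP_NOCACHE
idea from the thread above is sketched below as a user-space analogy. Only the
macro name comes from the thread; the demo_* helpers, the mapping struct and
the map_kind enum are hypothetical stand-ins, not kernel signatures. The idea
would be that arches defining the macro keep a real nocache function, while
all others alias it to the default one.

```c
#include <stddef.h>

/* Hypothetical stand-ins: they only record which variant was called. */
enum map_kind { MAP_DEFAULT, MAP_NOCACHE };

struct mapping {
	unsigned long phys;
	size_t size;
	enum map_kind kind;
};

static struct mapping demo_ioremap(unsigned long phys, size_t size)
{
	struct mapping m = { phys, size, MAP_DEFAULT };
	return m;
}

#ifdef HAVE_DIFFERENT_IOREMAP_NOCACHE
/* An arch where the two variants differ keeps a separate function. */
static struct mapping demo_ioremap_nocache(unsigned long phys, size_t size)
{
	struct mapping m = { phys, size, MAP_NOCACHE };
	return m;
}
#else
/* Everywhere else the nocache variant collapses to the default one. */
#define demo_ioremap_nocache(phys, size) demo_ioremap((phys), (size))
#endif
```

Built without the define, demo_ioremap_nocache(0x1000, 16) goes through
demo_ioremap(); built with -DHAVE_DIFFERENT_IOREMAP_NOCACHE it stays a
distinct function. Whether this is what Christophe had in mind is a guess.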




[PATCH v2 4/4] dmaengine: xilinx_dma: Free BD consistent memory

2018-01-02 Thread Kedareswara rao Appana
Free the BD consistent memory when freeing the channel,
i.e. in free_chan_resources.

Signed-off-by: Radhey Shyam Pandey 
Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
--> None.

 drivers/dma/xilinx/xilinx_dma.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 845e638..a9edbd8 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -764,6 +764,11 @@ static void xilinx_dma_free_chan_resources(struct dma_chan *dchan)
 	INIT_LIST_HEAD(&chan->free_seg_list);
 	spin_unlock_irqrestore(&chan->lock, flags);
 
+   /* Free memory that is allocated for BD */
+   dma_free_coherent(chan->dev, sizeof(*chan->seg_v) *
+ XILINX_DMA_NUM_DESCS, chan->seg_v,
+ chan->seg_p);
+
/* Free Memory that is allocated for cyclic DMA Mode */
dma_free_coherent(chan->dev, sizeof(*chan->cyclic_seg_v),
  chan->cyclic_seg_v, chan->cyclic_seg_p);
-- 
2.7.4



Re: [PATCH v2] regulator: sc2731: Fix defines for SC2731_WR_UNLOCK and SC2731_PWR_WR_PROT_VALUE

2018-01-02 Thread Erick Chen
Hi Axel,

On Mon, Jan 01, 2018 at 08:38:50PM +0800, Axel Lin wrote:
> The defines for SC2731_WR_UNLOCK and SC2731_PWR_WR_PROT_VALUE make the
> regmap_write() call look strange because it takes the reg parameter first,
> then val.
> Based on Erick's suggestion, define SC2731_PWR_WR_PROT and
> SC2731_WR_UNLOCK_VALUE instead.
> 
> Signed-off-by: Axel Lin 

Reviewed-by: Erick Chen 

> 


Re: WARNING in adjust_ptr_min_max_vals

2018-01-02 Thread Alexei Starovoitov
On Tue, Jan 02, 2018 at 08:58:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 0e08c463db387a2adcb0243b15ab868a73f87807
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6d362cadd45dc0a12...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> audit: type=1400 audit(1514685224.971:7): avc:  denied  { map } for
> pid=3144 comm="syzkaller663366" path="/root/syzkaller663366580" dev="sda1"
> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> WARNING: CPU: 1 PID: 3144 at kernel/bpf/verifier.c:2359
> adjust_ptr_min_max_vals+0x977/0x20a0 kernel/bpf/verifier.c:2359
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 1 PID: 3144 Comm: syzkaller663366 Not tainted 4.15.0-rc4-next-20171221+
> #78
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:177
>  fixup_bug arch/x86/kernel/traps.c:246 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:295
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:314
>  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
> RIP: 0010:adjust_ptr_min_max_vals+0x977/0x20a0 kernel/bpf/verifier.c:2359
> RSP: 0018:8801c97ef198 EFLAGS: 00010293
> RAX: 8801c94e8240 RBX: 8801c8ee4b00 RCX: 817eebb7
> RDX:  RSI: c9002048 RDI: c9002049
> RBP: 8801c97ef228 R08:  R09: 858fa920
> R10: 0071 R11: 858f9d00 R12: 
> R13: 0001 R14: c9002048 R15: 8801c8e02040
>  adjust_reg_min_max_vals kernel/bpf/verifier.c:2799 [inline]
>  check_alu_op kernel/bpf/verifier.c:2997 [inline]
>  do_check+0x67e0/0xae20 kernel/bpf/verifier.c:4448
>  bpf_check+0x2b1b/0x49f0 kernel/bpf/verifier.c:5374
>  bpf_prog_load+0xa2a/0x1b00 kernel/bpf/syscall.c:1192
>  SYSC_bpf kernel/bpf/syscall.c:1724 [inline]
>  SyS_bpf+0x1044/0x4420 kernel/bpf/syscall.c:1686
>  entry_SYSCALL_64_fastpath+0x1f/0x96

that's an interesting bug.
If I decipher fuzzed bpf insns correctly the sequence:
 r0 = 0x0
 if r0 s<= 0x0 goto pc+0
 r0 -= r1
causes:
 if (WARN_ON_ONCE(known && (smin_val != smax_val))) {
and smin_val=1 smax_val=0
since the verifier did:
case BPF_JSLE:
false_reg->smin_value = max_t(s64, false_reg->smin_value, val + 1);
Not sure what the best fix is yet.
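
To make the arithmetic above concrete, here is a minimal user-space sketch of
how the inverted range can arise. It is not the verifier code: struct
demo_bounds and the demo_* helpers are hypothetical simplifications that track
only the signed minimum and maximum of one register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the signed bounds tracked for one register. */
struct demo_bounds {
	int64_t smin;
	int64_t smax;
};

static int64_t demo_max(int64_t a, int64_t b)
{
	return a > b ? a : b;
}

/*
 * Bounds on the fall-through path of "if reg s<= val": on that path
 * reg > val holds, so the lower bound is raised to val + 1. The upper
 * bound is left alone, as in the quoted BPF_JSLE case.
 */
static struct demo_bounds demo_jsle_false_branch(struct demo_bounds reg,
						 int64_t val)
{
	reg.smin = demo_max(reg.smin, val + 1);
	return reg;
}

/* A range with smin > smax describes a path that can never be taken. */
static bool demo_bounds_are_sane(struct demo_bounds reg)
{
	return reg.smin <= reg.smax;
}
```

Starting from the known constant r0 = 0 (smin = smax = 0) and applying
demo_jsle_false_branch() with val = 0 leaves smin = 1 and smax = 0, which is
the combination reported above.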



Re: [PATCH v5 2/2] PCI: mediatek: Set up class type and vendor ID for MT7622

2018-01-02 Thread Honghui Zhang
On Tue, 2018-01-02 at 10:56 +, Lorenzo Pieralisi wrote:
> On Thu, Dec 28, 2017 at 09:39:12AM +0800, Honghui Zhang wrote:
> > On Wed, 2017-12-27 at 12:45 -0600, Bjorn Helgaas wrote:
> > > On Wed, Dec 27, 2017 at 08:59:54AM +0800, honghui.zh...@mediatek.com wrote:
> > > > From: Honghui Zhang 
> > > > 

> > > > +   /* Set up class code for MT7622 */
> > > > +   val = PCI_CLASS_BRIDGE_PCI << 16;
> > > > +   writel(val, port->base + PCIE_CONF_CLASS);
> > > 
> > > 1) Your comments mention MT7622 specifically, but this code is run for
> > > both mt2712-pcie and mt7622-pcie.  If this code is safe and necessary
> > > for both mt2712-pcie and mt7622-pcie, please remove the mention of
> > > MT7622.
> > 
> > Hmm, the code snippet added here will only be executed on MT7622, since
> > MT2712 will not enter this "if (pcie->base) {" condition.
> > Should the mention of MT7622 still be removed in this case?
> 
> You should add an explicit way (eg of_device_is_compatible() match for
> instance) to apply the quirk just on the platform that requires it.
> 
> Checking for "if (pcie->base)" is really not the way to do it.
> 

hi, Lorenzo,
Thanks very much for your advice.
Passing the compatible string or platform data into this function would
require adding a new field to struct mtk_pcie_port, so I guess setting
these values for both MT2712 and MT7622 is the easier way, since
re-setting them on MT2712 is safe.

> > > 2) The first comment mentions both "vendor ID and device ID" but you
> > > don't write the device ID.  Since this code applies to both
> > > mt2712-pcie and mt7622-pcie, my guess is that you don't *want* to
> > > write the device ID.  If that's the case, please fix the comment.
> > > 
> > 
> > My bad, I did not check the comments carefully.
> > Thanks.
> > 
> > > 3) If you only need to set the vendor ID, you're performing a 32-bit
> > > write (writel()) to update a 16-bit value.  Please use writew()
> > > instead.
> > > 
> > 
> > Ok, thanks, I guess I could use the following code snippet in the next
> > version:
> > val = readl(port->base + PCIE_CONF_VENDOR_ID);
> > val &= ~GENMASK(15, 0);
> > val |= PCI_VENDOR_ID_MEDIATEK;
> > writel(val, port->base + PCIE_CONF_VENDOR_ID);
> 
> Have you read Bjorn's comment ? Or there is a problem with using
> a writew() ?
> 

This control register is a 32-bit register, and I'm not sure whether the APB
bus supports writing a 16-bit value to an address that is 16-bit but not
32-bit aligned. I prefer the safer, established approach of using writel().

I need to do more testing of writew() if code elegance is more
important.

thanks.
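
The read-modify-write from the quoted snippet can be sketched outside the
kernel as follows, operating on a plain 32-bit value instead of an MMIO
register. DEMO_VENDOR_MASK and demo_set_vendor_id() are hypothetical names;
the mask is assumed to cover the same bits as GENMASK(15, 0) in the snippet.

```c
#include <stdint.h>

#define DEMO_VENDOR_MASK 0x0000ffffu	/* assumed equal to GENMASK(15, 0) */

/*
 * Replace the low 16 bits of a 32-bit register value with vendor_id and
 * leave the upper 16 bits untouched.
 */
static uint32_t demo_set_vendor_id(uint32_t reg_val, uint16_t vendor_id)
{
	reg_val &= ~DEMO_VENDOR_MASK;
	reg_val |= vendor_id;
	return reg_val;
}
```

With the PCI_VENDOR_ID_MEDIATEK value from the patch (0x14c3), a register
holding 0x12345678 becomes 0x123414c3. This keeps the access 32 bits wide,
which is the trade-off being discussed against a 16-bit writew().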

> Lorenzo
> 
> > > 4) If you only need to set the vendor ID, please use a definition like
> > > "PCIE_CONF_VENDOR_ID" instead of the ambiguous "PCIE_CONF_ID".
> > > 
> > > 5) If you only need to set the vendor ID, please update the changelog
> > > to mention "vendor ID" specifically instead of the ambiguous "IDs".
> > 
> > > 6) Please add a space before the closing "*/" of the first comment.
> > > 
> > > 7) PCI_CLASS_BRIDGE_PCI is for a PCI-to-PCI bridge, i.e., one that has
> > > PCI on both the primary (upstream) side and the secondary (downstream)
> > > side.  That kind of bridge has a type 1 config header (see
> > > PCI_HEADER_TYPE) and the PCI_PRIMARY_BUS and PCI_SECONDARY_BUS
> > > registers tell us the bus number of the primary and secondary sides.
> > > 
> > > I don't believe this device is a PCI-to-PCI bridge.  I think it's a
> > > *host* bridge that has some non-PCI interface on the upstream side and
> > > should have a type 0 config header.  If that's the case you should use
> > > PCI_CLASS_BRIDGE_HOST instead.
> > > 
> > 
> > Thanks very much for your help with the review, I will fix the other
> > issue in the next version.
> > 
> > > > }
> > > >  
> > > > /* Assert all reset signals */
> > > > diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> > > > index ab20dc5..2480b0e 100644
> > > > --- a/include/linux/pci_ids.h
> > > > +++ b/include/linux/pci_ids.h
> > > > @@ -2113,6 +2113,8 @@
> > > >  
> > > >  #define PCI_VENDOR_ID_MYRICOM  0x14c1
> > > >  
> > > > +#define PCI_VENDOR_ID_MEDIATEK 0x14c3
> > > > +
> > > >  #define PCI_VENDOR_ID_TITAN    0x14D2
> > > >  #define PCI_DEVICE_ID_TITAN_010L   0x8001
> > > >  #define PCI_DEVICE_ID_TITAN_100L   0x8010
> > > > -- 
> > > > 2.6.4
> > > > 
> > 
> > 




[PATCH] irqchip/gic-v3-its: Add workaround for ThunderX2 erratum #174

2018-01-02 Thread Ganapatrao Kulkarni
When an interrupt is moved across node collections on a ThunderX2
multi-socket platform, the interrupt stops being routed to the new
collection, which results in lost interrupts.

Add a workaround that issues an INV after MOVI for cross-node collection
moves to flush out the cached entry.

Signed-off-by: Ganapatrao Kulkarni 
---
 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig | 11 +++
 drivers/irqchip/irq-gic-v3-its.c   | 24 
 3 files changed, 36 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index fc1c884..fb27cb5 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
 | Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
 | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
+| Cavium         | ThunderX2 ITS   | #174            | CAVIUM_ERRATUM_174          |
 | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
 | Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
 |                |                 |                 |                             |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e..71a7e30 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -461,6 +461,17 @@ config ARM64_ERRATUM_843419
 
  If unsure, say Y.
 
+config CAVIUM_ERRATUM_174
+   bool "Cavium ThunderX2 erratum 174"
+   depends on NUMA
+   default y
+   help
+ LPI stops routed to redistributors after inter node collection
+ move in ITS. Enable workaround to invalidate ITS entry after
+ inter-node collection move.
+
+ If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 06f025f..d8b9c96 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -46,6 +46,7 @@
 #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING  (1ULL << 0)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_22375  (1ULL << 1)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_23144  (1ULL << 2)
+#define ITS_FLAGS_WORKAROUND_CAVIUM_174    (1ULL << 3)
 
 #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING(1 << 0)
 
@@ -1119,6 +1120,12 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
if (cpu != its_dev->event_map.col_map[id]) {
target_col = &its_dev->its->collections[cpu];
its_send_movi(its_dev, target_col, id);
+   if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
+   /* Issue INV for cross node collection move. */
+   if (cpu_to_node(cpu) !=
+   cpu_to_node(its_dev->event_map.col_map[id]))
+   its_send_inv(its_dev, id);
+   }
its_dev->event_map.col_map[id] = cpu;
irq_data_update_effective_affinity(d, cpumask_of(cpu));
}
@@ -2904,6 +2911,15 @@ static int its_force_quiescent(void __iomem *base)
}
 }
 
+static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
+{
+   struct its_node *its = data;
+
+   its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
+
+   return true;
+}
+
 static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
 {
struct its_node *its = data;
@@ -3031,6 +3047,14 @@ static const struct gic_quirk its_quirks[] = {
.init   = its_enable_quirk_hip07_161600802,
},
 #endif
+#ifdef CONFIG_CAVIUM_ERRATUM_174
+   {
+   .desc   = "ITS: Cavium ThunderX2 erratum 174",
+   .iidr   = 0x13f, /* ThunderX2 pass A1/A2/B0 */
+   .mask   = 0x,
+   .init   = its_enable_quirk_cavium_174,
+   },
+#endif
{
}
 };
-- 
2.9.4
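
The decision the its_set_affinity() hunk above makes can be sketched in
isolation like this. It is a user-space illustration only: the CPU-to-node
layout (four CPUs per node) and the demo_* names are assumptions for the
example, not anything taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int demo_cpu_to_node(int cpu)
{
	return cpu / 4;
}

/*
 * True when moving an interrupt from old_cpu to new_cpu crosses a node
 * boundary while the erratum flag is set, i.e. the case in which the
 * patch issues INV after MOVI.
 */
static bool demo_needs_inv_after_movi(bool erratum_174, int old_cpu,
				      int new_cpu)
{
	if (!erratum_174)
		return false;
	return demo_cpu_to_node(old_cpu) != demo_cpu_to_node(new_cpu);
}
```

Note that the patch evaluates the comparison before col_map[id] is updated,
so old_cpu here corresponds to the previous collection's CPU.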



Re: [PATCH] KVM: nVMX: remove unnecessary vmwrite from L2->L1 vmexit

2018-01-02 Thread Quan Xu



On 2018/01/02 17:47, Liran Alon wrote:
> On 02/01/18 00:58, Paolo Bonzini wrote:
>> The POSTED_INTR_NV field is constant (though it differs between the vmcs01 and
>> vmcs02), there is no need to reload it on vmexit to L1.
>>
>> Signed-off-by: Paolo Bonzini
>> ---
>>   arch/x86/kvm/vmx.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index e6223fe8faa1..1e184830a295 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11610,9 +11610,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>>            */
>>           vmx_flush_tlb(vcpu, true);
>>       }
>> -    /* Restore posted intr vector. */
>> -    if (nested_cpu_has_posted_intr(vmcs12))
>> -        vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>>
>>       vmcs_write32(GUEST_SYSENTER_CS, vmcs12->host_ia32_sysenter_cs);
>>       vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->host_ia32_sysenter_esp);
>
> Reviewed-by: Liran Alon
>
> I would also add to commit message:
> Fixes: 06a5524f091b ("KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode")



Reviewed-by: Quan Xu 



Business Opportunity

2018-01-02 Thread Mr Yin Lianchen
Hello,

How are you and your family?
Thanks for accepting my connection.
I am connecting you due to a Business Opportunity.
Should you like to know more about it.
Do get back to me so i give you further details.

I hope to hear from you soon

Regards,

MR. YIN LIANCHEN
CHIEF INVESTMENT OFFICER
CHINA EVERBRIGHT LIMITED.
210 CENTURY CENTER BUILDING, 25th FLOOR, 21
CENTURY AVENUE, PUDONG NEW AREA,
SHANGHAI, CHINA


Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Michael Ellerman
Geert Uytterhoeven  writes:

> On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman  wrote:
>> Christoph Hellwig  writes:
>>
>>> We want to use the dma_direct_ namespace for a generic implementation,
>>> so rename powerpc to the second best choice: dma_nommu_.
>>
>> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
>> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
>> than mapping dynamically.
>>
>> Though I don't have a good idea for a better name, maybe "1to1",
>> "linear", "premapped" ?
>
> "identity"?

I think that would be wrong, but thanks for trying to help :)

The address on the device side is sometimes (often?) offset from the CPU
address. So eg. the device can DMA to RAM address 0x0 using address
0x800.

Identity would imply 0 == 0 etc.

I think "bijective" is the correct term, but that's probably a bit
esoteric.

cheers
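
As a small illustration of the naming point above (a fixed-offset mapping is
one-to-one but not the identity), here is a sketch with a made-up offset.
DEMO_DMA_OFFSET and the demo_* helpers are hypothetical and do not correspond
to any particular platform or to the powerpc code being renamed.

```c
#include <stdint.h>

/* Hypothetical fixed offset between the CPU and device views of RAM. */
#define DEMO_DMA_OFFSET 0x1000u

/* CPU physical address -> address the device puts on the bus. */
static uint64_t demo_cpu_to_bus(uint64_t cpu_addr)
{
	return cpu_addr + DEMO_DMA_OFFSET;
}

/* Inverse direction; together the two form a one-to-one mapping. */
static uint64_t demo_bus_to_cpu(uint64_t bus_addr)
{
	return bus_addr - DEMO_DMA_OFFSET;
}
```

Here CPU address 0 maps to bus address 0x1000, so "identity" would not
describe it, yet every bus address still maps back to exactly one CPU
address.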


Re: general protection fault in copy_verifier_state

2018-01-02 Thread Alexei Starovoitov
On Tue, Jan 02, 2018 at 02:58:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 6bb8824732f69de0f233ae6b1a8158e149627b38
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+32ac5a3e473f2e01c...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> R10:  R11: 0246 R12: 
> R13: 656c6c616b7a7973 R14: 000e R15: 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 3197 Comm: syzkaller425062 Not tainted 4.15.0-rc5+ #170
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:copy_func_state kernel/bpf/verifier.c:403 [inline]
> RIP: 0010:copy_verifier_state+0x364/0x590 kernel/bpf/verifier.c:431
> RSP: 0018:8801c7fff130 EFLAGS: 00010203
> RAX: 0070 RBX: dc00 RCX: 0384
> RDX:  RSI: 8801c938d800 RDI: 8801c938d800
> RBP: 8801c7fff188 R08: 8801c938d700 R09: 8801c938d700
> R10:  R11:  R12: 8801c8066940
> R13: 8801c938d700 R14:  R15: 8801c938d800
> FS:  01581880() GS:8801db30() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20a97000 CR3: 0001c839a001 CR4: 001606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  pop_stack+0x8c/0x270 kernel/bpf/verifier.c:449
>  push_stack kernel/bpf/verifier.c:491 [inline]
>  check_cond_jmp_op kernel/bpf/verifier.c:3598 [inline]
>  do_check+0x4b60/0xa050 kernel/bpf/verifier.c:4731
>  bpf_check+0x3296/0x58c0 kernel/bpf/verifier.c:5489
>  bpf_prog_load+0xa2a/0x1b00 kernel/bpf/syscall.c:1198
>  SYSC_bpf kernel/bpf/syscall.c:1807 [inline]
>  SyS_bpf+0x1044/0x4420 kernel/bpf/syscall.c:1769
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x4404f9
> RSP: 002b:7fff03dc4a48 EFLAGS: 0246 ORIG_RAX: 0141
> RAX: ffda RBX: 0001 RCX: 004404f9
> RDX: 0048 RSI: 20903000 RDI: 0005
> RBP: 000f R08: 0002 R09: 3332
> R10:  R11: 0246 R12: 
> R13: 656c6c616b7a7973 R14: 000e R15: 
> Code: 4b 8d 3c f7 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 05 02 00 00 4f 8b
> 34 f7 49 8d 8e 84 03 00 00 48 89 c8 48 89 4d c8 48 c1 e8 03 <0f> b6 14 18 48
> 89 c8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
> RIP: copy_func_state kernel/bpf/verifier.c:403 [inline] RSP:
> 8801c7fff130
> RIP: copy_verifier_state+0x364/0x590 kernel/bpf/verifier.c:431 RSP:
> 8801c7fff130
> ---[ end trace 18f3ab976ca58c6c ]---

thanks for the report.
Looks like it needs this fix:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 98d8637cf70d..0876d4402dc3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -375,6 +375,8 @@ static int realloc_func_state(struct bpf_func_state *state, int size,
 
 static void free_func_state(struct bpf_func_state *state)
 {
+	if (!state)
+		return;
 	kfree(state->stack);
 	kfree(state);
 }
@@ -487,6 +489,8 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
 	}
 	return >st;
 err:
+	free_verifier_state(env->cur_state, true);
+	env->cur_state = NULL;
 	/* pop all elements and return */
 	while (!pop_stack(env, NULL, NULL));
 	return NULL;

will submit it properly after few more tests.
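
The two ideas in the fix above (a free routine that tolerates NULL, and an
error path that clears the pointer after freeing) can be sketched in
user-space C. This is only an illustration under simplified assumptions:
struct func_state, struct env and drop_cur_state() are hypothetical
stand-ins, and free() replaces kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the verifier structures. */
struct func_state {
	int *stack;
};

struct env {
	struct func_state *cur_state;
};

/* Tolerates NULL, so error paths can call it unconditionally. */
void free_func_state(struct func_state *state)
{
	if (!state)
		return;
	free(state->stack);
	free(state);
}

/*
 * Error path: release the current state and clear the pointer so that
 * later cleanup code cannot dereference or free it a second time.
 */
void drop_cur_state(struct env *env)
{
	assert(env != NULL);
	free_func_state(env->cur_state);
	env->cur_state = NULL;
}
```

Calling drop_cur_state() twice in a row is harmless in this sketch, because
the second call sees a NULL pointer and returns early.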



Re: [PATCH v3 18/27] pinctrl: replace devm_ioremap_nocache with devm_ioremap

2018-01-02 Thread Yisheng Xie


On 2018/1/2 16:43, Linus Walleij wrote:
> On Sat, Dec 23, 2017 at 12:00 PM, Yisheng Xie  wrote:
> 
>> Default ioremap is ioremap_nocache, so devm_ioremap has the same
>> function with devm_ioremap_nocache, which can just be killed to
>> save the size of devres.o
>>
>> This patch is to use devm_ioremap instead of devm_ioremap_nocache,
>> which should not have any function change but prepare for killing
>> devm_ioremap_nocache.
>>
>> Cc: Linus Walleij 
>> Cc: linux-g...@vger.kernel.org
>> Signed-off-by: Yisheng Xie 
> 
> Patch applied.

Well, I have listed the ARCHs related to each changed file below. They do not
include cris, ia64, mn10300 and openrisc, whose ioremap is not the same as
ioremap_nocache, as discussed in the cover letter. So please let me know if I
need to update the comment.

 changed file                                            ARCH
 drivers/pinctrl/bcm/pinctrl-ns2-mux.c         | 2 +-    arm/arm64
 drivers/pinctrl/bcm/pinctrl-nsp-mux.c         | 4 ++--  arm
 drivers/pinctrl/freescale/pinctrl-imx1-core.c | 2 +-    arm
 drivers/pinctrl/pinctrl-amd.c                 | 4 ++--  x86/arm
Thanks
Yisheng
> 
> Yours,
> Linus Walleij
> 
> 



Re: [PATCH v2 0/4] Address error and recovery for AER and DPC

2018-01-02 Thread poza

On 2018-01-03 00:32, Bjorn Helgaas wrote:
> On Fri, Dec 29, 2017 at 12:54:15PM +0530, Oza Pawandeep wrote:
>> This patch set brings in support for DPC and AER to co-exist and not to
>> race for recovery.
>>
>> The current implementation of AER and error message broadcasting to the
>> EP driver is tightly coupled and limited to AER service driver.
>> It is important to factor out broadcasting and other link handling
>> callbacks. So that not only when AER gets triggered, but also when DPC get
>> triggered, or both get triggered simultaneously (for e.g. ERR_FATAL),
>> callbacks are handled appropriately.
>> having modularized the code, the race between AER and DPC is handled
>> gracefully.
>> for e.g. when DPC is active and kicked in, AER should not attempt to do
>> recovery, because DPC takes care of it.
>
> High-level question:
>
> We have some convoluted code in negotiate_os_control() and
> aer_service_init() that (I think) essentially disables AER unless the
> platform firmware grants us permission to use it.
>
> The last implementation note in PCIe r3.1, sec 6.2.10 says
>
>   DPC may be controlled in some configurations by platform firmware
>   and in other configurations by the operating system. DPC
>   functionality is strongly linked with the functionality in Advanced
>   Error Reporting. To avoid conflicts over whether platform firmware
>   or the operating system have control of DPC, it is recommended that
>   platform firmware and operating systems always link the control of
>   DPC to the control of Advanced Error Reporting.
>
> I read that as suggesting that we should enable DPC support in Linux
> if and only if we also enable AER.  But I don't see anything in DPC
> that looks like that.  Should there be something there?  Should DPC be
> restructured so it's enabled and handled inside the AER driver instead
> of being a separate driver?
>
> Bjorn


The whole idea of factoring out error handling and plugging it back into DPC
is to enable DPC to participate synchronously in the
pcie_port_service_driver hooks.

AER and DPC are both port service drivers, so it makes sense for DPC to be
able to do as much with those callbacks as AER currently can, but today
those callbacks are tightly coupled to the AER driver.

That way DPC and AER can act independently in their own space, by gaining
more control, and if needed both can synchronize the callbacks.

Regards,
Oza.


RE: [LINUX PATCH 3/4] dmaengine: xilinx_dma: Fix compilation warning

2018-01-02 Thread Appana Durga Kedareswara Rao
Hi Vinod,


>On Wed, Jan 03, 2018 at 05:13:29AM +, Appana Durga Kedareswara Rao
>wrote:
>> Hi Vinod,
>>
>>  Thanks for the review...
>>
>> >
>> >On Thu, Dec 21, 2017 at 03:41:37PM +0530, Kedareswara rao Appana wrote:
>> >
>> >Fix title here too
>>
>> Sure will fix in v2...
>>
>> >
>> >BTW whats with LINUX tag in patches, pls drop them
>>
>> Ok will mention the Linux tag info in the cover letter patch from the
>> next patch series on wards...
>
>Please wrap your replies within 80chars. It is very hard to read! I have
>reflown for readability

Sure, will take care of it from next time onwards...

>
>Can you explain what you mean by that info, what are you trying to convey?

What I mean here is that I will mention the Linux kernel tag
information in the cover letter patch...

Regards,
Kedar.

>
>--
>~Vinod


[PATCH] iommu/of: Only do IOMMU lookup for available ones

2018-01-02 Thread Jeffy Chen
The for_each_matching_node_and_match() macro returns every matching
node, including unavailable ones.

It's pointless to init unavailable IOMMUs, so add a sanity check to
avoid that.

Signed-off-by: Jeffy Chen 
---

 drivers/iommu/of_iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 50947ebb6d17..6f7456caa30d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -240,6 +240,9 @@ static int __init of_iommu_init(void)
 	for_each_matching_node_and_match(np, matches, &match) {
 		const of_iommu_init_fn init_fn = match->data;
 
+		if (!of_device_is_available(np))
+			continue;
+
 		if (init_fn && init_fn(np))
 			pr_err("Failed to initialise IOMMU %pOF\n", np);
 	}
-- 
2.11.0
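
The effect of the added check can be modelled in user-space C. This is a
sketch under stated assumptions rather than the kernel implementation:
struct node, node_is_available() and init_available() are hypothetical, and
the availability rule used here (no "status" property, or a value of "okay"
or "ok") is the usual device tree convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced model of a device tree node. */
struct node {
	const char *name;
	const char *status;	/* NULL models a missing "status" property */
};

/* Available if "status" is absent, "okay" or "ok". */
int node_is_available(const struct node *np)
{
	assert(np != NULL);
	if (!np->status)
		return 1;
	return !strcmp(np->status, "okay") || !strcmp(np->status, "ok");
}

/* Returns how many nodes would be initialised; unavailable ones are skipped. */
size_t init_available(const struct node *nodes, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!node_is_available(&nodes[i]))
			continue;
		count++;	/* a real driver would call its init function here */
	}
	return count;
}
```

With three nodes whose status is missing, "disabled" and "okay", only the
first and the last are counted.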




Re: [PATCH v3 06/27] gpio: replace devm_ioremap_nocache with devm_ioremap

2018-01-02 Thread Yisheng Xie


On 2018/1/2 16:41, Linus Walleij wrote:
> On Sat, Dec 23, 2017 at 11:58 AM, Yisheng Xie  wrote:
> 
>> Default ioremap is ioremap_nocache, so devm_ioremap has the same
>> function with devm_ioremap_nocache, which can just be killed to
>> save the size of devres.o
>>
>> This patch is to use devm_ioremap instead of devm_ioremap_nocache,
>> which should not have any function change but prepare for killing
>> devm_ioremap_nocache.
>>
>> Cc: Linus Walleij 
>> Cc: linux-g...@vger.kernel.org
>> Signed-off-by: Yisheng Xie 

Well, I have listed the ARCHs related to each changed file below. They do not
include cris, ia64, mn10300 and openrisc, whose ioremap is not the same as
ioremap_nocache, as discussed in the cover letter. So please let me know if I
need to update the comment.

 changed file                            ARCH
 drivers/gpio/gpio-ath79.c     | 3 +--   mips
 drivers/gpio/gpio-em.c        | 6 ++    arm
 drivers/gpio/gpio-htc-egpio.c | 4 ++--  arm
 drivers/gpio/gpio-xgene.c     | 3 +--   arm64

Thanks
Yisheng
> 
> Patch applied.
> 
> Yours,
> Linus Walleij
> 
> 



linux-next: Tree for Jan 3

2018-01-02 Thread Stephen Rothwell
Hi all,

Changes since 20180102:

The clk tree lost its build failure.

The kvm-arm tree gained a conflict against Linus' tree.

Non-merge commits (relative to Linus' tree): 6587
 6916 files changed, 273638 insertions(+), 194470 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 255 trees (counting Linus' and 43 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (30a7acd57389 Linux 4.15-rc6)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (cfe17c9bbe6a kbuild: move cc-option and cc-disable-warning after incl. arch Makefile)
Merging arc-current/for-curr (d3b388559fac ARC: handle gcc generated __builtin_trap for older compiler)
Merging arm-current/fixes (36b0cb84ee85 ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7333b5aca412 KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp())
Merging sparc/master (59585b4be9ae sparc64: repair calling incorrect hweight function from stubs)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (bd30ffc414e5 NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625)
Merging bpf/master (2758b3e3e630 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (2f10a61cee8f xfrm: fix rcu usage in xfrm_get_type_offload)
Merging netfilter/master (8bea728dce89 netfilter: nf_tables: fix potential NULL-ptr deref in nf_tables_dump_obj_done())
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (a41886f56b7b Merge tag 'iwlwifi-for-kalle-2017-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (04a7279ff12f cfg80211: ship certificates as hex files)
Merging sound-current/for-linus (fe08f34d066f ALSA: pcm: Remove incorrect snd_BUG_ON() usages)
Merging pci-current/for-linus (1291a0d5049d Linux 4.15-rc4)
Merging driver-core.current/driver-core-linus (30a7acd57389 Linux 4.15-rc6)
Merging tty.current/tty-linus (30a7acd57389 Linux 4.15-rc6)
Merging usb.current/usb-linus (30a7acd57389 Linux 4.15-rc6)
Merging usb-gadget-fixes/fixes (1291a0d5049d Linux 4.15-rc4)
Merging usb-serial-fixes/usb-linus (4307413256ac USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (30a7acd57389 Linux 4.15-rc6)
Merging char-misc.current/char-misc-linus (30a7acd57389 Linux 4.15-rc6)
Merging input-current/for-linus (8b7e9d9e2d8b Input: hideep - fix compile error due to missing include file)
Merging crypto-current/master (2973633e9f09 crypto: inside-secure - do not use areq->result for partial results)
Merging ide/master (0c86a6bd85ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging vfio-fixes/for-linus (e4

Re: [ANNOUNCE] Git v2.16.0-rc0

2018-01-02 Thread Jonathan Nieder
Bryan Turner wrote:
> On Tue, Jan 2, 2018 at 9:07 PM, Jonathan Nieder  wrote:

>> So my first question is why the basename detection is not working for
>> you.  What value of GIT_SSH, GIT_SSH_COMMAND, or core.sshCommand are
>> you using?
>
> So I'd been digging further into this for the last hour because I
> wasn't seeing quite the behavior I was expecting when I ran Git from
> the command line on Ubuntu 12.04 or 14.04, and this nudged me to the
> right answer: We're setting GIT_SSH to a wrapper script. In our case,
> that wrapper script is just calling OpenSSH's ssh with all the
> provided arguments (plus a couple extra ones), but because we're
> setting GIT_SSH at all, that's why the auto variant code is running.
> That being the case, explicitly setting GIT_SSH_VARIANT=ssh may be the
> correct thing to do, to tell Git that we want to be treated like
> "normal" OpenSSH, as opposed to expecting Git to assume we behave like
> OpenSSH (when the Android repo use case clearly shows that assumption
> also doesn't hold).

Ah, that's a comfort.  Setting GIT_SSH_VARIANT would avoid this
autodetection code and is the recommended thing to do.

That said, we can't go back in time and update everyone's tools to do
that (e.g. there is not even a release of repo with [1] out yet), so
this is still considered a regression and I'm glad you found it.

Jonathan

[1] https://gerrit-review.googlesource.com/c/git-repo/+/134950
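
The interaction described above can be sketched as a simplified model in C.
It is not Git's actual code: enum ssh_variant, base_name() and
pick_variant() are hypothetical names, and only a few variant names are
handled. The point is that a wrapper script's basename is not a known
client name, so autodetection runs unless an explicit override is given.

```c
#include <assert.h>
#include <string.h>

enum ssh_variant {
	VARIANT_AUTO,
	VARIANT_SSH,
	VARIANT_PLINK,
	VARIANT_TORTOISEPLINK
};

/* Return the part of path after the last '/'. */
const char *base_name(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * override models GIT_SSH_VARIANT (NULL when unset);
 * command models the value of GIT_SSH.
 */
enum ssh_variant pick_variant(const char *override, const char *command)
{
	const char *name = override ? override : base_name(command);

	if (!strcmp(name, "ssh"))
		return VARIANT_SSH;
	if (!strcmp(name, "plink"))
		return VARIANT_PLINK;
	if (!strcmp(name, "tortoiseplink"))
		return VARIANT_TORTOISEPLINK;
	return VARIANT_AUTO;	/* unknown name: fall back to autodetection */
}
```

In this model, a GIT_SSH value such as /opt/app/ssh-wrapper.sh (a made-up
path) yields VARIANT_AUTO, while passing "ssh" as the override yields
VARIANT_SSH regardless of the wrapper's name.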


Re: About the try to remove cross-release feature entirely by Ingo

2018-01-02 Thread Byungchul Park

On 1/3/2018 11:58 AM, Dave Chinner wrote:

On Wed, Jan 03, 2018 at 11:28:44AM +0900, Byungchul Park wrote:

On 1/1/2018 7:18 PM, Matthew Wilcox wrote:

On Sat, Dec 30, 2017 at 06:00:57PM -0500, Theodore Ts'o wrote:

Also, what to do with TCP connections which are created in userspace
(with some authentication exchanges happening in userspace), and then
passed into kernel space for use in kernel space, is an interesting
question.


Yes!  I'd love to have a lockdep expert weigh in here.  I believe it's
legitimate to change a lock's class after it's been used, essentially
destroying it and reinitialising it.  If not, it should be because it's
a reasonable design for an object to need different lock classes for
different phases of its existence.


I also think it should ultimately be done. I think it's very hard,
since it requires changing lockdep's dependency graph, but it is
possible anyway. It's up to the lockdep maintainers, though.


We used to do this in XFS to work around the fact that the memory
reclaim context "locks" were too stupid to understand that an object
referenced and locked above memory allocation could not be
accessed below in memory reclaim because memory reclaim only accesses
/unreferenced objects/. We played whack-a-mole with lockdep for
years to get most of the false positives sorted out.

Hence for a long time we had to re-initialise the lock context for
the XFS inode iolock in ->evict_inode() so we could lock it for
reclaim processing.  Eventually we ended up completely reworking the
inode reclaim locking in XFS primarily to get rid of all the nasty
lockdep hacks we had strewn throughout the code. It was ~2012 we
got rid of the last inode re-init code, IIRC. Yeah:

commit 4f59af758f9092bc7b266ca919ce6067170e5172
Author: Christoph Hellwig 
Date:   Wed Jul 4 11:13:33 2012 -0400

 xfs: remove iolock lock classes
 
 Now that we never take the iolock during inode reclaim we don't need
 to play games with lock classes.
 
 Signed-off-by: Christoph Hellwig 
 Reviewed-by: Rich Johnston 
 Signed-off-by: Ben Myers 

We still have problems with lockdep false positives w.r.t. memory
allocation contexts, mainly with code that can be called from
both above and below memory allocation contexts. We've finally
got __GFP_NOLOCKDEP to be able to annotate memory allocation points
within such code paths, but that doesn't help with locks

Byungchul, lockdep has a long, long history of having sharp edges
and being very unfriendly to developers. We've all been scarred by
lockdep at one time or another and so there's a fair bit of
resistance to repeating past mistakes and allowing lockdep to
inflict more scars on us


I understand what you have suffered from, so I don't really want to
push it forward too strongly.

So far I have handled all the problems myself, including the
final one, i.e. the completion in submit_bio_wait(), with the
invalidation if that is allowed. But yes, who knows the future? In
the future, that terrible thing you mentioned might or might
not happen because of cross-release.

I just felt like someone was misunderstanding what the problem
came from, what the problem was, how we could avoid it, why
cross-release should be removed and so on..

I believe the 3 ways I suggested can help, but I don't want to
strongly insist if all of you don't think so.

Thanks a lot anyway for your opinion.

--
Thanks,
Byungchul


Re: [PATCH 10/13] ocxl: Add Makefile and Kconfig

2018-01-02 Thread Andrew Donnellan

On 19/12/17 02:21, Frederic Barrat wrote:

OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat 
---
  drivers/misc/Kconfig   |  1 +
  drivers/misc/Makefile  |  1 +
  drivers/misc/ocxl/Kconfig  | 25 +
  drivers/misc/ocxl/Makefile | 10 ++
  4 files changed, 37 insertions(+)
  create mode 100644 drivers/misc/ocxl/Kconfig
  create mode 100644 drivers/misc/ocxl/Makefile

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c2357b14..0534f338c84a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
  source "drivers/misc/genwqe/Kconfig"
  source "drivers/misc/echo/Kconfig"
  source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ocxl/Kconfig"
  endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64df478..73326d54e246 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE)+= cxl/
  obj-$(CONFIG_ASPEED_LPC_CTRL) += aspeed-lpc-ctrl.o
  obj-$(CONFIG_ASPEED_LPC_SNOOP)+= aspeed-lpc-snoop.o
  obj-$(CONFIG_PCI_ENDPOINT_TEST)   += pci_endpoint_test.o
+obj-$(CONFIG_OCXL) += ocxl/

  lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o
  lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
new file mode 100644
index ..4496b61f48db
--- /dev/null
+++ b/drivers/misc/ocxl/Kconfig
@@ -0,0 +1,25 @@
+#
+# Open Coherent Accelerator (OCXL) compatible devices
+#
+
+config OCXL_BASE
+   bool
+   default n
+   select PPC_COPRO_BASE
+
+config OCXL
+   tristate "Support for Open Coherent Accelerators (OCXL)"
+   depends on PPC_POWERNV && PCI && EEH
+   select OCXL_BASE
+   default m
+   help
+
+ Select this option to enable driver support for Open
+ Coherent Accelerators (OCXL).  OCXL is otherwise known as
+ Open Coherent Accelerator Processor Interface (OCAPI).
+ OCAPI allows accelerators in FPGAs to be coherently attached
+ to a CPU through a Open CAPI link.  This driver enables
+ userspace programs to access these accelerators through
+ devices found in /dev/ocxl/


I'd prefer more consistency in how we refer to OpenCAPI. "ocxl" is a 
driver name that we have purely for historical reasons, it's not really 
the name of anything else. I know throughout the various specs and code, 
we use "OCAPI" a lot, but that's not really an abbreviation that should 
be "user-facing".


Something like:

config OCXL
 tristate "OpenCAPI coherent accelerator support"
 help

   Select this option to enable the ocxl driver for Open Coherent
   Accelerator Processor Interface (OpenCAPI) devices.

   OpenCAPI allows FPGA and ASIC accelerators to be coherently
   attached to a CPU over an OpenCAPI link.

   The ocxl driver enables userspace programs to access these
   accelerators through devices in /dev/ocxl/.

   For more information, see http://opencapi.org.

   If unsure, say N.


+
+ If unsure, say N.
diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
new file mode 100644
index ..f75853411cfd
--- /dev/null
+++ b/drivers/misc/ocxl/Makefile
@@ -0,0 +1,10 @@
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+
+ocxl-y += main.o pci.o config.o file.o pasid.o
+ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+obj-$(CONFIG_OCXL) += ocxl.o
+
+# For tracepoints to include our trace.h from tracepoint infrastructure:
+CFLAGS_trace.o := -I$(src)
+
+# ccflags-y += -DDEBUG



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH 02/11] clk: sunxi-ng: a83t: Add M divider to TCON1 clock

2018-01-02 Thread Chen-Yu Tsai
On Sun, Dec 31, 2017 at 5:01 AM, Jernej Skrabec  wrote:
> TCON1 also has M divider, contrary to TCON0.
>
> Fixes: 05359be1176b ("clk: sunxi-ng: Add driver for A83T CCU")
>
> Signed-off-by: Jernej Skrabec 

Added "And the mux is only 2 bits wide, instead of 3." to the commit
message and applied.

ChenYu

