date:20151002

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-02 Thread Mathias Krause

On 2 October 2015 at 22:43, Jason Baron  wrote:
> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
> queue associated with the socket s that we are poll'ing against, but also calls
> sock_poll_wait() for a remote peer socket p, if it is connected. Thus,
> if we call poll()/select()/epoll() for the socket s, there are then
> a couple of code paths in which the remote peer socket p and its associated
> peer_wait queue can be freed before poll()/select()/epoll() have a chance
> to remove themselves from the remote peer socket.
>
> The ways that the remote peer socket can be freed are:
>
> 1. If s calls connect() to connect to a new socket other than p, it will
> drop its reference on p, and thus a close() on p will free it.
>
> 2. If we call close() on p, then a subsequent sendmsg() from s will drop
> the final reference to p, allowing it to be freed.
>
> Address this issue, by reverting unix_dgram_poll() to only register with
> the wait queue associated with s and register a callback with the remote peer
> socket on connect() that will wake up the wait queue associated with s. If
> scenarios 1 or 2 occur above we then simply remove the callback from the
> remote peer. This then presents the expected semantics to poll()/select()/
> epoll().
>
> I've implemented this for sock-type, SOCK_RAW, SOCK_DGRAM, and SOCK_SEQPACKET
> but not for SOCK_STREAM, since SOCK_STREAM does not use unix_dgram_poll().
>
> Signed-off-by: Jason Baron 
> ---
>  include/net/af_unix.h |  1 +
>  net/unix/af_unix.c| 32 +++-
>  2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b3..9698aff 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -62,6 +62,7 @@ struct unix_sock {
>  #define UNIX_GC_CANDIDATE  0
>  #define UNIX_GC_MAYBE_CYCLE 1
> struct socket_wq peer_wq;
> +   wait_queue_t wait;
>  };
>  #define unix_sk(__sk) ((struct unix_sock *)__sk)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..f789423 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -420,6 +420,9 @@ static void unix_release_sock(struct sock *sk, int embrion)
> skpair = unix_peer(sk);
>
> if (skpair != NULL) {
> +   if (sk->sk_type != SOCK_STREAM)
> +   remove_wait_queue(&unix_sk(skpair)->peer_wait,
> + &u->wait);
> if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
> unix_state_lock(skpair);
> /* No more writes */
> @@ -636,6 +639,16 @@ static struct proto unix_proto = {
>   */
>  static struct lock_class_key af_unix_sk_receive_queue_lock_key;
>
> +static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +   struct unix_sock *u;
> +
> +   u = container_of(wait, struct unix_sock, wait);
> +   wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key);
> +
> +   return 0;
> +}
> +
>  static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
>  {
> struct sock *sk = NULL;
> @@ -664,6 +677,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
> INIT_LIST_HEAD(&u->link);
> mutex_init(&u->readlock); /* single task reading lock */
> init_waitqueue_head(&u->peer_wait);
> +   init_waitqueue_func_entry(&u->wait, peer_wake);
> unix_insert_socket(unix_sockets_unbound(sk), sk);
>  out:
> if (sk == NULL)
> @@ -1030,7 +1044,11 @@ restart:
>  */
> if (unix_peer(sk)) {
> struct sock *old_peer = unix_peer(sk);
> +
> +   remove_wait_queue(&unix_sk(old_peer)->peer_wait,
> + &unix_sk(sk)->wait);
> unix_peer(sk) = other;
> +   add_wait_queue(&unix_sk(other)->peer_wait, &unix_sk(sk)->wait);
> unix_state_double_unlock(sk, other);
>
> if (other != old_peer)
> @@ -1038,8 +1056,12 @@ restart:
> sock_put(old_peer);
> } else {
> unix_peer(sk) = other;
> +   add_wait_queue(&unix_sk(other)->peer_wait, &unix_sk(sk)->wait);
> unix_state_double_unlock(sk, other);
> }
> +   /* New remote may have created write space for us */
> +   wake_up_interruptible_sync_poll(sk_sleep(sk),
> +   POLLOUT | POLLWRNORM | POLLWRBAND);
> return 0;
>
>  out_unlock:
> @@ -1194,6 +1216,8 @@ restart:
>
> sock_hold(sk);
> unix_peer(newsk)= sk;
> +   if (sk->sk_type == SOCK_SEQPACKET)
> +   add_wait_queue(&unix_sk(sk)->peer_wait, &unix_sk(newsk)->wait);
> newsk->sk_state = TCP_ESTABLISHED;
> newsk->sk_type  = sk->sk_type;
> init_peercred(newsk);
> @@ -1220,6 +1244,8 @@ restart:
>
>
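
For readers following the changelog above, the relay idea (hook a callback entry into the peer's queue on connect, unhook it before the peer can go away) can be modelled in user space. This is only a rough sketch under stated assumptions: `struct wq_head`, `struct wq_entry`, `struct toy_sock`, `wq_add()`, `wq_remove()`, `wq_wake()` and `relay_wake()` are invented stand-ins and not the kernel's wait-queue API, there is no locking, and a counter stands in for waking the socket's own waiters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's wait queue types. */
struct wq_entry;
typedef void (*wq_func)(struct wq_entry *entry);

struct wq_entry {
	wq_func func;
	struct wq_entry *next;
};

struct wq_head {
	struct wq_entry *first;
};

/* A toy socket: a peer_wait queue plus one relay entry embedded in it. */
struct toy_sock {
	struct wq_head peer_wait;  /* connected senders hook their relay here */
	struct wq_entry relay;     /* embedded entry, playing the role of u->wait */
	int wakeups;               /* stands in for waking the socket's own waiters */
};

static void wq_add(struct wq_head *h, struct wq_entry *e)
{
	e->next = h->first;
	h->first = e;
}

static void wq_remove(struct wq_head *h, struct wq_entry *e)
{
	struct wq_entry **pp = &h->first;

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	if (*pp)
		*pp = e->next;
	e->next = NULL;
}

/* Invoke the callback of every entry currently on the queue. */
static void wq_wake(struct wq_head *h)
{
	struct wq_entry *e;

	for (e = h->first; e; e = e->next)
		e->func(e);
}

/* Relay callback: recover the owning socket from the embedded entry. */
static void relay_wake(struct wq_entry *entry)
{
	struct toy_sock *s = (struct toy_sock *)
		((char *)entry - offsetof(struct toy_sock, relay));

	s->wakeups++;
}

static void toy_init(struct toy_sock *s)
{
	s->peer_wait.first = NULL;
	s->relay.func = relay_wake;
	s->relay.next = NULL;
	s->wakeups = 0;
}

/* connect(): hook s's relay entry into the peer's peer_wait queue. */
static void toy_connect(struct toy_sock *s, struct toy_sock *peer)
{
	wq_add(&peer->peer_wait, &s->relay);
}

/* reconnect or close: unhook the relay so the peer holds no pointer to s. */
static void toy_disconnect(struct toy_sock *s, struct toy_sock *peer)
{
	wq_remove(&peer->peer_wait, &s->relay);
}
```

After `toy_disconnect()` the peer's queue no longer references the socket, which is the property the changelog relies on: nothing is left registered on a queue that may be freed.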

Re: [PATCH 1/3] Input: db9 - store object at correct index

2015-10-02 Thread Sudip Mukherjee

On Fri, Oct 02, 2015 at 11:11:18AM -0700, Dmitry Torokhov wrote:
> Hi Sudip,
> 
> On Fri, Oct 02, 2015 at 04:58:33PM +0530, Sudip Mukherjee wrote:
> > The variable i is used to check the port to attach to and we are
> > supposed to save the reference of struct db9 in the location given by
> > db9_base[i]. But after finding out the index i is getting modified again
> > so we saved in a wrong index.
> > 
> > Fixes: 2260c419b52b ("Input: db9 - use parallel port device model")
> > Signed-off-by: Sudip Mukherjee 
> > ---
> >  drivers/input/joystick/db9.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/input/joystick/db9.c b/drivers/input/joystick/db9.c
> > index cf1f602..f6ecd4a 100644
> > --- a/drivers/input/joystick/db9.c
> > +++ b/drivers/input/joystick/db9.c
> > @@ -560,7 +560,7 @@ static void db9_attach(struct parport *pp)
> > const struct db9_mode_data *db9_mode;
> > struct pardevice *pd;
> > struct input_dev *input_dev;
> > -   int i, j;
> > +   int i, j, k;
> > int mode;
> > struct pardev_cb db9_parport_cb;
> >  
> > @@ -577,6 +577,7 @@ static void db9_attach(struct parport *pp)
> > pr_debug("Not using parport%d.\n", pp->number);
> > return;
> > }
> > +   k = i;
> 
> Hmm, I'd prefer we did not reuse 'i' at all. Can we instead of 'k' add
> 'port_idx' and use it instead of 'i' in the first loop?
Yes, that will be great. This reuse of i caused the mistake. Sorry
about that, I should have been more careful.
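
To make the point concrete, here is a small stand-alone sketch of the suggested shape. It is not the driver code: `NPORTS`, `NDEVS`, `configured[]`, `base[]` and `attach()` are invented for the example, with `base[]` standing in for db9_base[].

```c
#include <assert.h>

#define NPORTS 3	/* hypothetical number of configured ports */
#define NDEVS  2	/* hypothetical number of devices per port */

static int configured[NPORTS] = { 7, 4, 9 };	/* made-up parport numbers */
static int base[NPORTS];			/* stand-in for db9_base[] */
static int devs_set_up;				/* counts per-device setup steps */

/*
 * Returns the slot that was written, or -1 if the port is not configured.
 * port_idx is found once and never reused as a loop counter.
 */
static int attach(int port_number)
{
	int port_idx, j;

	for (port_idx = 0; port_idx < NPORTS; port_idx++)
		if (configured[port_idx] == port_number)
			break;
	if (port_idx == NPORTS)
		return -1;

	/* The second loop has its own counter, so port_idx stays intact. */
	for (j = 0; j < NDEVS; j++)
		devs_set_up++;

	base[port_idx] = port_number;
	return port_idx;
}
```

With a dedicated name for the port index, a later loop cannot clobber it by accident, so the store always lands in the slot that was looked up.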

regards
sudip
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ceph:Remove unused goto labels in decode crush map functions

2015-10-02 Thread Ilya Dryomov

On Fri, Oct 2, 2015 at 9:48 PM, Nicholas Krause  wrote:
> This removes unused goto labels in decode crush map functions related
> to error paths due to them never being used on any error path for these
> particular functions in the file, osdmap.c.
>
> Signed-off-by: Nicholas Krause 
> ---
>  net/ceph/osdmap.c | 10 --
>  1 file changed, 10 deletions(-)
>
> diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
> index 7d8f581..2f8e41c 100644
> --- a/net/ceph/osdmap.c
> +++ b/net/ceph/osdmap.c
> @@ -59,8 +59,6 @@ static int crush_decode_uniform_bucket(void **p, void *end,
> ceph_decode_need(p, end, (1+b->h.size) * sizeof(u32), bad);
^^^
> b->item_weight = ceph_decode_32(p);
> return 0;
> -bad:
   ^^^
> -   return -EINVAL;
>  }

I realize that these macros are sneaky, but you should at least
compile-test your patches before you send them out.
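
As a rough illustration of why such a label only looks unused, here is a minimal user-space sketch. `decode_need()` and `decode_u8()` are hypothetical and much simpler than the real ceph_decode_need(); the only thing they share with it is that the label is passed as a macro argument and the goto is hidden inside the macro body.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro modelled on the pattern under discussion: it jumps
 * to the label given as its last argument when fewer than n bytes remain.
 */
#define decode_need(p, end, n, bad)				\
	do {							\
		if ((size_t)((end) - (p)) < (size_t)(n))	\
			goto bad;				\
	} while (0)

/* Reads one byte from [p, end); returns it, or -1 if the buffer is short. */
static int decode_u8(const unsigned char *p, const unsigned char *end)
{
	decode_need(p, end, 1, bad);	/* the hidden "goto bad" lives here */
	return *p;

bad:	/* never named in a visible goto, yet removing it breaks the build */
	return -1;
}
```

Searching the function for "goto bad" finds nothing, but the label is still referenced through the macro expansion, so deleting it results in a compile error.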

Thanks,

Ilya

Re: [PATCH] mtd: mtdram: check offs and len in mtdram->erase

2015-10-02 Thread Sudip Mukherjee

On Sat, Oct 03, 2015 at 11:31:04AM +0800, Dongsheng Yang wrote:
> On 10/03/2015 01:38 AM, Brian Norris wrote:
> >On Fri, Oct 02, 2015 at 03:31:33PM +0530, Sudip Mukherjee wrote:
> >>On Fri, Oct 02, 2015 at 05:39:02PM +0800, Dongsheng Yang wrote:
> >>>On 10/01/2015 12:41 AM, Sudip Mukherjee wrote:
> >>>>We should prevent user to erasing mtd device with an unaligned offset
> >>>>or length.
> >>>>
> >>>>Signed-off-by: Sudip Mukherjee 
> >>>>---
> >>>>
> >>>>I am not sure if I should add the Signed-off-by of
> >>>>Dongsheng Yang  . He is the original author
> >>>>and he should get the credit for that.
> >>>
> >>>But I had sent a a patch out to fix this problem before your v1.
> >>>
> >>>http://lists.infradead.org/pipermail/linux-mtd/2015-September/062234.html
> >>I didn't know that. I think your v1 was applied.
> >
> >Sorry if I left any confusion. Dongsheng's v1 was applied and reverted.
> >v2 is still under review (and was sent slightly before (?) your v1).
> 
> Yea, sorry I should have mentioned it earlier. But I was and am still
> in a vacation, then I did not point it out in time.
> 
> Sudip, any comment or test for my patch there is always welcome.
Sorry, I do not know enough about this driver to comment. My main patch
was to fix the build failure. And since your patch had been reverted by
that time, I sent another patch with my patch and your patch combined
together.

regards
sudip

[git pull] drm fixes

2015-10-02 Thread Dave Airlie


Hi Linus,

Bunch of fixes all over the place, all pretty small,

amdgpu, i915, exynos, one qxl and one vmwgfx,

There is also a bunch of mst fixes, I left some cleanups in the series
as I didn't think it was worth splitting up the tested series,

Dave.


The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to ccf03d6995fa4b784f5b987726ba98f4859bf326:

  drm/dp/mst: add some defines for logical/physical ports (2015-10-02 15:34:42 +1000)


Alex Deucher (4):
  drm/amdgpu:  Restore LCD backlight level on resume
  drm/amdgpu/cgs: remove import_gpu_mem
  drm: handle cursor_set2 in restore_fbdev_mode
  drm/radeon: drop radeon_fb_helper_set_par

Christian König (1):
  drm/amdgpu: only print meaningful VM faults

Dan Carpenter (1):
  drm/amdgpu: signedness bug in amdgpu_cs_parser_init()

Daniel Kurtz (1):
  drm/exynos: Remove useless EXPORT_SYMBOL_GPLs

Dave Airlie (10):
  Merge branch 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Merge branch 'exynos-drm-fixes' of git://git.kernel.org/.../daeinki/drm-exynos into drm-fixes
  Merge tag 'vmwgfx-fixes-4.3-151001' of git://people.freedesktop.org/~thomash/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2015-10-01' of git://anongit.freedesktop.org/drm-intel into drm-fixes
  drm/dp/mst: don't pass port into the path builder function
  drm/dp/mst: fixup handling hotplug on port removal.
  drm/dp/mst: update the link_address_sent before sending the link address (v3)
  drm/dp/mst: split connector registration into two parts (v2)
  drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
  drm/dp/mst: add some defines for logical/physical ports

Egbert Eich (2):
  drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
  drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2

Fabiano Fidêncio (1):
  drm/qxl: recreate the primary surface when the bo is not primary

Gustavo Padovan (4):
  drm/exynos: remove fimd_mode_fixup()
  drm/exynos: remove decon_mode_fixup()
  drm/exynos: remove unused mode_fixup() code
  drm/exynos: fimd: actually disable dp clock

Inki Dae (1):
  drm/exynos: dp: remove suspend/resume functions

Joonyoung Shim (8):
  drm/exynos: fix layering violation of address
  drm/exynos: fix missed calling of drm_prime_gem_destroy()
  drm/exynos: remove unnecessary NULL assignment
  drm/exynos: staticize exynos_drm_gem_init()
  drm/exynos: cleanup function calling written twice
  drm/exynos: cleanup line feed in exynos_drm_gem_get_ioctl
  drm/exynos: remove call to drm_gem_free_mmap_offset()
  drm/exynos: create a fake mmap offset with gem creation

Krzysztof Kozlowski (1):
  drm/exynos: Staticize local function in exynos_drm_gem.c

Michel Dänzer (1):
  drm/radeon: Restore LCD backlight level on resume (>= R5xx)

Michel Thierry (1):
  drm/i915: Consider HW CSB write pointer before resetting the sw read pointer

Rodrigo Vivi (1):
  drm/i915/skl: Don't call intel_prepare_ddi when encoder list isn't yet initialized.

Thierry Reding (3):
  drm/exynos: Suspend/resume is unused if !PM
  drm/exynos: fimc: Clock control is unused if !PM
  drm/exynos: rotator: Clock control is unused if !PM

Thomas Hellstrom (1):
  drm/vmwgfx: Fix a command submission hang regression

 drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c| 39 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/atombios_encoders.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  8 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  8 ++-
 drivers/gpu/drm/amd/include/cgs_linux.h| 17 -
 drivers/gpu/drm/drm_dp_mst_topology.c  | 85 ++-
 drivers/gpu/drm/drm_fb_helper.c|  6 +-
 drivers/gpu/drm/drm_probe_helper.c | 19 +-
 drivers/gpu/drm/exynos/exynos7_drm_decon.c | 12 
 drivers/gpu/drm/exynos/exynos_dp_core.c| 23 ---
 drivers/gpu/drm/exynos/exynos_drm_core.c   |  6 --
 drivers/gpu/drm/exynos/exynos_drm_crtc.c   | 15 
 drivers/gpu/drm/exynos/exynos_drm_drv.c|  2 +
 drivers/gpu/drm/exynos/exynos_drm_drv.h|  4 --
 drivers/gpu/drm/exynos/exynos_drm_fimc.c   | 36 +-
 drivers/gpu/drm/exynos/exynos_drm_fimd.c   | 14 +---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c|  3 -
 drivers/gpu/drm/exynos/exynos_drm_gem.c| 94 +++---
 drivers/gpu/drm/exynos/exynos_drm_gem.h|  6 +-
 drivers/gpu/drm/exynos/exynos_drm_rotator.c|  2 +-
 drivers/gpu/drm/i915/intel_dp_mst.c|  9 ++-

Re: [BUG] RCU stall in cursor_timer_handler

2015-10-02 Thread Scot Doyle

On Sat, 3 Oct 2015, Alistair Popple wrote:
> Hi,
> 
> We have been intermittently seeing the below RCU stall at boot on a 
> PPC64LE 4.2.1 kernel which has been preventing the system from booting.
> Further investigation indicates that ops->cur_blink_jiffies is 
> potentially being used uninitialised in cursor_timer_handler():
> 
> static void cursor_timer_handler(unsigned long dev_addr)
> {
>   struct fb_info *info = (struct fb_info *) dev_addr;
>   struct fbcon_ops *ops = info->fbcon_par;
> 
>   queue_work(system_power_efficient_wq, &info->queue);
>   mod_timer(&ops->cursor_timer, jiffies + ops->cur_blink_jiffies);
> }
...


Hi Alistair, thanks so much for the detailed report. Does this patch 
correct the stalls?


diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 1aaf893..92f3949 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -1093,6 +1093,7 @@ static void fbcon_init(struct vc_data *vc, int init)
con_copy_unimap(vc, svc);
 
ops = info->fbcon_par;
+   ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
p->con_rotate = initial_rotation;
set_blitting_type(vc, info);
 

[PATCH] staging: i20: Added a blank line after declaration

2015-10-02 Thread Anjali Menon

Added a blank line after declaration to fix the coding style
warning detected by checkpatch.pl

WARNING: Missing a blank line after declarations

Signed-off-by: Anjali Menon 
---
 drivers/staging/i2o/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/i2o/pci.c b/drivers/staging/i2o/pci.c
index 49804c9..18b6c11 100644
--- a/drivers/staging/i2o/pci.c
+++ b/drivers/staging/i2o/pci.c
@@ -458,6 +458,7 @@ static int i2o_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void i2o_pci_remove(struct pci_dev *pdev)
 {
struct i2o_controller *c;
+
c = pci_get_drvdata(pdev);
 
i2o_iop_remove(c);
-- 
1.9.1


Re: [PATCH] kselftest: replace $(RM) with rm -f command

2015-10-02 Thread Darren Hart

On Mon, Sep 28, 2015 at 03:16:53AM +, Mathieu Desnoyers wrote:
> - On Sep 27, 2015, at 10:10 PM, Wang Long long.wangl...@huawei.com wrote:
> 
> > Some test's Makefile using "$(RM)" while the other's
> > using "rm -f". It is better to use one of them in all
> > tests.
> 
> I agree that this disparity appears to be unwanted. We
> should settle on one or the other.
> 
> > 
> > "rm -f" is better, because it is less magic, and everyone
> > knows what it does.
> 
> "$(RM)" is clearly defined as a Makefile implicit variable
> which defaults to "rm -f".
> Ref. 
> https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html
> 
> Leaving it as a variable is more flexible because then the
> default behavior can be overridden if need be, which is
> not the case of a hardcoded "rm -f".
> 
> Following your line of argumentation, we should then
> invoke "gcc" directly in every Makefile because it is
> less magic than "$(CC)". This makes no sense.

I don't think they can be compared so simply. Specifying a compiler is a common
use case. Customizing the rm command is not, in my experience anyway, and like
Michael, I would definitely have to look up what RM means.

That said, I care more about consistency than which is used. Both are valid, but
$(RM), while more flexible, isn't commonly used, so it will cost people more
time looking up what it does than any benefit we're likely to see from its use.

Meh. :-)

-- 
Darren Hart
Intel Open Source Technology Center

Re: linux-next: bad base of the h8300 tree

2015-10-02 Thread Stephen Rothwell

Hi Yoshinori,

On Fri, 02 Oct 2015 15:11:48 +0900 Yoshinori Sato wrote:
>
> On Tue, 15 Sep 2015 08:12:01 +0900,
> Stephen Rothwell wrote:
> > 
> > On Tue, 15 Sep 2015 09:10:30 +1000 Stephen Rothwell wrote:
> > >
> > > Please do "gitk 9751a9e449da..h8300-next" in your tree and look at
> > > it (9751a9e449da is in Linus' tree).  As I suggested above, maybe you
> > > should go back to commit 99bcfda85f66 and work forward from there
> > > (after amending it have a SOB).  Or just start all over with v4.3 ...
> > 
> > I meant v4.3-rc1, of course.
> 
> OK.
> fixed it.

Well, not really :-(

It is now based on next-20150925.  As I have said before, you cannot
base your linux-next included branch on a linux-next release.  You
should either base it on Linus' tree, or one of the other linux-next
included branches (as long as that other branch is never rebased).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

Re: [PATCH v6 0/2] Enable capsule loader interface for efi firmware updating

2015-10-02 Thread Andy Lutomirski

On Oct 2, 2015 10:37 AM, "Borislav Petkov"  wrote:
>
> On Fri, Oct 02, 2015 at 05:05:52AM +0800, Kweh, Hock Leong wrote:
> > From: "Kweh, Hock Leong" 
> >
> > Dear maintainers & communities,
> >
> > This patchset is created on top of Matt's patchset:
> > 1.)https://lkml.org/lkml/2014/10/7/390
> > "[PATCH 1/2] efi: Move efi_status_to_err() to efi.h"
> > 2.)https://lkml.org/lkml/2014/10/7/391
> > "[PATCH 2/2] efi: Capsule update support"
> >
> > It expose a misc char interface for user to upload the capsule binary and
> > calling efi_capsule_update() API to pass the binary to EFI firmware.
> >
> > The steps to update efi firmware are:
> > 1.) cat firmware.cap > /dev/efi_capsule_loader
> > 2.) reboot
> >
> > Any failed upload error message will be returned while doing "cat" through
> > Write() function call.
> >
> > Tested the code with Intel Quark Galileo platform.
>
> What does the error case look like? A standard glibc message about
> write(2) failing?
>

close(2), right?

--Andy

[PATCH v2] jbd2: gate checksum calculations on crc driver presence, not sb flags

2015-10-02 Thread Darrick J. Wong

Change the journal's checksum functions to gate on whether or not the
crc32c driver is loaded, and gate the loading on the superblock bits.
This prevents a journal crash if someone loads a journal in no-csum
mode and then randomizes the superblock, thus flipping on the feature
bits.

v2: Create a feature-test helper, use it everywhere.

Tested-By: Nikolay Borisov 
Reported-by: Nikolay Borisov 
Signed-off-by: Darrick J. Wong 
---
 fs/jbd2/journal.c|6 +++---
 include/linux/jbd2.h |   13 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 8270fe9..00f7dbd 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -124,7 +124,7 @@ EXPORT_SYMBOL(__jbd2_debug);
 /* Checksumming functions */
 static int jbd2_verify_csum_type(journal_t *j, journal_superblock_t *sb)
 {
-   if (!jbd2_journal_has_csum_v2or3(j))
+   if (!jbd2_journal_has_csum_v2or3_feature(j))
return 1;
 
return sb->s_checksum_type == JBD2_CRC32C_CHKSUM;
@@ -1531,7 +1531,7 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}
 
-   if (jbd2_journal_has_csum_v2or3(journal) &&
+   if (jbd2_journal_has_csum_v2or3_feature(journal) &&
JBD2_HAS_COMPAT_FEATURE(journal, JBD2_FEATURE_COMPAT_CHECKSUM)) {
/* Can't have checksum v1 and v2 on at the same time! */
printk(KERN_ERR "JBD2: Can't enable checksumming v1 and v2/3 "
@@ -1545,7 +1545,7 @@ static int journal_get_superblock(journal_t *journal)
}
 
/* Load the checksum driver */
-   if (jbd2_journal_has_csum_v2or3(journal)) {
+   if (jbd2_journal_has_csum_v2or3_feature(journal)) {
journal->j_chksum_driver = crypto_alloc_shash("crc32c", 0, 0);
if (IS_ERR(journal->j_chksum_driver)) {
printk(KERN_ERR "JBD2: Cannot load crc32c driver.\n");
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index df07e78..6da6f89 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1338,13 +1338,18 @@ static inline int tid_geq(tid_t x, tid_t y)
 extern int jbd2_journal_blocks_per_page(struct inode *inode);
 extern size_t journal_tag_bytes(journal_t *journal);
 
+static inline bool jbd2_journal_has_csum_v2or3_feature(journal_t *j)
+{
+   return JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V2) ||
+  JBD2_HAS_INCOMPAT_FEATURE(j, JBD2_FEATURE_INCOMPAT_CSUM_V3);
+}
+
 static inline int jbd2_journal_has_csum_v2or3(journal_t *journal)
 {
-   if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2) ||
-   JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V3))
-   return 1;
+   WARN_ON_ONCE(jbd2_journal_has_csum_v2or3_feature(journal) &&
+journal->j_chksum_driver == NULL);
 
-   return 0;
+   return journal->j_chksum_driver != NULL;
 }
 
 /*
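
The split this patch makes, between what the superblock claims and what is actually loaded, can be sketched in user space as below. This is a simplified model rather than the jbd2 code: `struct toy_journal`, the `TOY_*` bit values, and the `toy_*()` helpers are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INCOMPAT_CSUM_V2 0x08u	/* hypothetical bit values */
#define TOY_INCOMPAT_CSUM_V3 0x10u

struct toy_journal {
	unsigned int incompat;		/* on-disk feature bits, may change later */
	const void *csum_driver;	/* non-NULL once the checksum driver is loaded */
};

/* What the superblock claims. Used when deciding whether to load the driver. */
static bool toy_has_csum_feature(const struct toy_journal *j)
{
	return (j->incompat & (TOY_INCOMPAT_CSUM_V2 | TOY_INCOMPAT_CSUM_V3)) != 0;
}

/* What is actually usable. Used on every checksum calculation. */
static bool toy_has_csum(const struct toy_journal *j)
{
	return j->csum_driver != NULL;
}

/* Mount-time step: load the driver only if the feature bits ask for it. */
static void toy_load(struct toy_journal *j, const void *driver)
{
	j->csum_driver = toy_has_csum_feature(j) ? driver : NULL;
}
```

If the feature bits flip after a no-checksum mount, `toy_has_csum_feature()` starts returning true while `toy_has_csum()` stays false, so the checksum paths are never entered without a driver.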

Re: [PATCH] mtd: mtdram: check offs and len in mtdram->erase

2015-10-02 Thread Dongsheng Yang


On 10/03/2015 01:38 AM, Brian Norris wrote:
> On Fri, Oct 02, 2015 at 03:31:33PM +0530, Sudip Mukherjee wrote:
>> On Fri, Oct 02, 2015 at 05:39:02PM +0800, Dongsheng Yang wrote:
>>> On 10/01/2015 12:41 AM, Sudip Mukherjee wrote:
>>>> We should prevent user to erasing mtd device with an unaligned offset
>>>> or length.
>>>>
>>>> Signed-off-by: Sudip Mukherjee 
>>>> ---
>>>>
>>>> I am not sure if I should add the Signed-off-by of
>>>> Dongsheng Yang  . He is the original author
>>>> and he should get the credit for that.
>>>
>>> But I had sent a a patch out to fix this problem before your v1.
>>>
>>> http://lists.infradead.org/pipermail/linux-mtd/2015-September/062234.html
>> I didn't know that. I think your v1 was applied.
>
> Sorry if I left any confusion. Dongsheng's v1 was applied and reverted.
> v2 is still under review (and was sent slightly before (?) your v1).

Yea, sorry I should have mentioned it earlier. But I was and am still
in a vacation, then I did not point it out in time.

Sudip, any comment or test for my patch there is always welcome.

Thanx
Yang
>
> Feel free to comment there.
>
> Brian
> .




RE: [PATCH v6 0/2] Enable capsule loader interface for efi firmware updating

2015-10-02 Thread Kweh, Hock Leong

> -Original Message-
> From: Borislav Petkov [mailto:b...@alien8.de]
> Sent: Saturday, October 03, 2015 1:37 AM
> To: Kweh, Hock Leong
> Cc: Matt Fleming; Greg Kroah-Hartman; Ong, Boon Leong; LKML; linux-
> e...@vger.kernel.org; Sam Protsenko; Peter Jones; Andy Lutomirski; Roy
> Franz; James Bottomley; Linux FS Devel; Fleming, Matt
> Subject: Re: [PATCH v6 0/2] Enable capsule loader interface for efi firmware
> updating
> 
> On Fri, Oct 02, 2015 at 05:05:52AM +0800, Kweh, Hock Leong wrote:
> > From: "Kweh, Hock Leong" 
> >
> > Dear maintainers & communities,
> >
> > This patchset is created on top of Matt's patchset:
> > 1.)https://lkml.org/lkml/2014/10/7/390
> > "[PATCH 1/2] efi: Move efi_status_to_err() to efi.h"
> > 2.)https://lkml.org/lkml/2014/10/7/391
> > "[PATCH 2/2] efi: Capsule update support"
> >
> > It expose a misc char interface for user to upload the capsule binary
> > and calling efi_capsule_update() API to pass the binary to EFI firmware.
> >
> > The steps to update efi firmware are:
> > 1.) cat firmware.cap > /dev/efi_capsule_loader
> > 2.) reboot
> >
> > Any failed upload error message will be returned while doing "cat"
> > through
> > Write() function call.
> >
> > Tested the code with Intel Quark Galileo platform.
> 
> What does the error case look like? A standard glibc message about
> write(2) failing?
> 

Any upload failure is reported as an error code such as -ENOMEM, -EINVAL or
-EIO, as well as any error returned by the efi_capsule_update() API.
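
For a user-space tool that wants to see these errors directly rather than rely on what "cat" prints, a minimal sketch could look like the following. It is an illustration only: `upload()` is a made-up helper, it checks both write(2) and close(2) without assuming which of the two the driver uses to report a failure, and it does not retry on EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copies src to dst and returns 0 on success or a negative errno value.
 * Errors from write(2) and from close(2) on dst are both reported.
 */
static int upload(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n = 0;
	int in, out, err = 0;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -errno;
	out = open(dst, O_WRONLY);
	if (out < 0) {
		err = -errno;
		close(in);
		return err;
	}

	while (err == 0 && (n = read(in, buf, sizeof(buf))) > 0) {
		ssize_t off = 0;

		/* write(2) may be partial, so loop until the chunk is out */
		while (off < n) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));

			if (w < 0) {
				err = -errno;
				break;
			}
			off += w;
		}
	}
	if (err == 0 && n < 0)
		err = -errno;	/* read(2) failed */

	close(in);
	if (close(out) < 0 && err == 0)
		err = -errno;	/* failure surfaced only at close time */
	return err;
}
```

A call such as `upload("firmware.cap", "/dev/efi_capsule_loader")` would then return 0 or a negative errno that the caller can print with strerror().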

Thanks.

Re: [PATCH v4 0/5] PCI: Add support for PCI Enhanced Allocation "BARs"

2015-10-02 Thread Yinghai Lu

On Fri, Oct 2, 2015 at 3:37 PM, David Daney  wrote:
> From: David Daney 
>
> PCI Enhanced Allocation is a new method of allocating MMIO & IO
> resources for PCI devices & bridges. It can be used instead
> of the traditional PCI method of using BARs.
>
> EA entries are hardware-initialized to a fixed address.
> Unlike BARs, regions described by EA cannot be moved.
> Because of this, only devices which are permanently connected to
> the PCI bus can use EA. A removable PCI card must not use EA.
>
> The Enhanced Allocation ECN is publicly available here:
> https://www.pcisig.com/specifications/conventional/ECN_Enhanced_Allocation_23_Oct_2014_Final.pdf

Looks like the EA will support more than just fixed address later.

"Enhanced Allocation is an optional Conventional PCI Capability that
may be implemented by
Functions to indicate fixed (non reprogrammable) I/O and memory ranges
assigned to the
Function, as well as supporting new resource “type” definitions and
future extensibility to also
support reprogrammable allocations."

So I would prefer that we think more about making the framework
configurable, to leave room for that.

Bjorn,

I wonder if we need to revive the add-on resource support patchset
that I suggested a couple of years ago,
so we can extend it to support EA features.

URL: https://lkml.org/lkml/2012/3/19/86

Thanks

Yinghai

[PATCH] RDMA/amso1100: use offset_in_page macro

2015-10-02 Thread Geliang Tang

Use offset_in_page macro instead of (addr & ~PAGE_MASK).

Signed-off-by: Geliang Tang 
---
 drivers/staging/rdma/amso1100/c2_alloc.c| 2 +-
 drivers/staging/rdma/amso1100/c2_provider.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/amso1100/c2_alloc.c b/drivers/staging/rdma/amso1100/c2_alloc.c
index 78d247e..039872d 100644
--- a/drivers/staging/rdma/amso1100/c2_alloc.c
+++ b/drivers/staging/rdma/amso1100/c2_alloc.c
@@ -131,7 +131,7 @@ void c2_free_mqsp(__be16 *mqsp)
*mqsp = (__force __be16) head->head;
 
/* Compute the shared_ptr index */
-   idx = ((unsigned long) mqsp & ~PAGE_MASK) >> 1;
+   idx = (offset_in_page(mqsp)) >> 1;
idx -= (unsigned long) &(((struct sp_chunk *) 0)->shared_ptr[0]) >> 1;
 
/* Point this index at the head */
diff --git a/drivers/staging/rdma/amso1100/c2_provider.c b/drivers/staging/rdma/amso1100/c2_provider.c
index 25c3f00..956d76b 100644
--- a/drivers/staging/rdma/amso1100/c2_provider.c
+++ b/drivers/staging/rdma/amso1100/c2_provider.c
@@ -359,7 +359,7 @@ static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd,
 
for (i = 0; i < num_phys_buf; i++) {
 
-   if (buffer_list[i].addr & ~PAGE_MASK) {
+   if (offset_in_page(buffer_list[i].addr)) {
pr_debug("Unaligned Memory Buffer: 0x%x\n",
(unsigned int) buffer_list[i].addr);
return ERR_PTR(-EINVAL);
-- 
2.5.0
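
For anyone wanting to convince themselves that the two spellings agree, here is a small user-space check. The `TOY_*` macros are stand-ins written for this example and assume 4 KiB pages; they are not the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins; 4 KiB pages are assumed for the example. */
#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))
#define toy_offset_in_page(p) ((unsigned long)(uintptr_t)(p) & ~TOY_PAGE_MASK)

/* Old open-coded form: non-zero when addr is not page aligned. */
static unsigned long offset_open_coded(unsigned long addr)
{
	return addr & ~TOY_PAGE_MASK;
}

/* The same computation expressed through the helper macro. */
static unsigned long offset_with_macro(unsigned long addr)
{
	return toy_offset_in_page(addr);
}
```

Both functions keep only the low bits below the page size, so the change in the patch is purely a readability one.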



RE: [PATCH v6 1/2] efi: export efi_capsule_supported() function symbol

2015-10-02 Thread Kweh, Hock Leong

> -Original Message-
> From: Borislav Petkov [mailto:b...@alien8.de]
> Sent: Saturday, October 03, 2015 1:37 AM
> To: Kweh, Hock Leong
> Cc: Matt Fleming; Greg Kroah-Hartman; Ong, Boon Leong; LKML; linux-
> e...@vger.kernel.org; Sam Protsenko; Peter Jones; Andy Lutomirski; Roy
> Franz; James Bottomley; Linux FS Devel; Fleming, Matt
> Subject: Re: [PATCH v6 1/2] efi: export efi_capsule_supported() function
> symbol
> 
> On Fri, Oct 02, 2015 at 05:05:53AM +0800, Kweh, Hock Leong wrote:
> > From: "Kweh, Hock Leong" 
> >
> > This patch export efi_capsule_supported() function symbol for capsule
> > kernel module to use.
> >
> > Cc: Matt Fleming 
> > Signed-off-by: Kweh, Hock Leong 
> > ---
> >  drivers/firmware/efi/capsule.c |1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/firmware/efi/capsule.c
> > b/drivers/firmware/efi/capsule.c index d8cd75c0..738d437 100644
> > --- a/drivers/firmware/efi/capsule.c
> > +++ b/drivers/firmware/efi/capsule.c
> > @@ -101,6 +101,7 @@ out:
> > kfree(capsule);
> > return rv;
> >  }
> > +EXPORT_SYMBOL_GPL(efi_capsule_supported);
> >
> >  /**
> >   * efi_capsule_update - send a capsule to the firmware
> 
> Why is this a separate patch?
> 

It is because the author of this code is Matt. Submitting it separately
allows him to easily squash it into his patch:
https://lkml.org/lkml/2014/10/7/391

Thanks.

Re: [PATCH v4 3/5] PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.

2015-10-02 Thread Yinghai Lu

On Fri, Oct 2, 2015 at 4:38 PM, David Daney  wrote:
> On 10/02/2015 04:14 PM, Yinghai Lu wrote:
>>
>> https://patchwork.kernel.org/patch/7304371/
>> [v6,06/53] PCI: Claim fixed resource during remove/rescan path
>
>
> This one is interesting, but I don't think it will work.
>
> pci_claim_resource() calls pci_find_parent_resource(), which will fail in
> important use cases.
>
> It is perfectly legal for a bridge provisioned by EA to not specify any
> resources.  In this case we must walk up the bus tree until we find
> something that contains the device resource, and can thus be a parent.
>
> That is a big part of what my patch is doing.

Looks like we need another resource flag for EA in addition to
FIXED, as it could be outside of the bridge MMIO range.

>
> As for the merits of assigning fixed resources from the FINAL fixup, rather
> than in __pci_bus_assign_resources(), I am unsure.

Usually __pci_bus_assign_resources() is used for unassigned resources, so
I don't want to mix requesting resources into it.

Thanks

Yinghai
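
The "walk up the bus tree until something contains the device resource" idea quoted above can be sketched in plain user-space C. This is only an illustration under assumptions: the `sketch_` types and functions are hypothetical, and this is not the code from either patch series.

```c
#include <stddef.h>

struct sketch_res {
	unsigned long start, end;	/* inclusive range; start > end means "none" */
};

struct sketch_bus {
	struct sketch_bus *parent;	/* NULL at the root */
	struct sketch_res window;
};

static int sketch_contains(const struct sketch_res *outer, const struct sketch_res *inner)
{
	return outer->start <= outer->end &&
	       outer->start <= inner->start && inner->end <= outer->end;
}

/* Walk towards the root until some bus window contains the device resource. */
static struct sketch_bus *sketch_find_parent(struct sketch_bus *bus,
					     const struct sketch_res *dev)
{
	for (; bus; bus = bus->parent)
		if (sketch_contains(&bus->window, dev))
			return bus;
	return NULL;
}
```

A bridge that specifies no window is simply skipped here, which is the case the quoted objection is about.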
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 00/23] ILP32 for ARM64

2015-10-02 Thread Kapoor, Prasun



On 10/2/15, 2:37 AM, "Catalin Marinas"  wrote:

>On Thu, Oct 01, 2015 at 09:49:46PM +, Pinski, Andrew wrote:
>> Ok, we will rewrite these patches using 32bit time_t and 32bit off_t
>> and redo the toolchain support for them.  Note this is going back to
>> the abi I had originally done when I submitted my original version
>> when it was asked to change time_t to be 64bit.
>
>One of the key aspects of kernel development is the ability to adapt
>quickly to new requests/insights. This implies releasing early and often
>rather than a new version roughly twice a year (IIRC, v1 was posted
>September 2013). Moreover, the success of the kernel is partly based on
>not getting stuck on old decisions (well, unless it breaks accepted user
>ABI).
>
>So, at the time, following x32 discussions, we thought of using the
>native ABI as much as possible. However, two important things happened
>since:
>
>1. libc community didn't like breaking the POSIX compliance
>2. No-one seems desperate for ILP32 on AArch64
>
>(1) is a fair point and I would rather be careful as we don't know the
>extent of the code affected. In the meantime, we've also had ongoing
>work for addressing the 2038 issue on 32-bit architectures.
>
>The second point is equally important. The benchmarks I've seen didn't
>show a significant improvement and the messages I got on various
>channels pretty much labeled ILP32 as a transitional stage to full LP64.
>In this case, we need to balance the benefits of a close to native ABI
>(future proof, slightly higher performance) vs. the cost of maintaining
>such ABI in the kernel on the long term, especially if it's not widely
>used/tested.


For us ILP32  is not about putting this into our product flier at all, it
is about supporting real applications. We have an existing product line of
MIPS based SoCs where a large number of N32 (an exact equivalent of ILP32)
applications are currently in production. Our customers are looking to
bring those applications (mostly in Networking and Telecom space) over to
ARMv8. 

We think it's an extremely risky strategy to say that future processors must
either incur the additional cost (power and complexity) of implementing the
AArch32 instruction set or have no way of supporting 32-bit applications
at all.

Apart from there being an installed base of 32 bit networking and telecom
applications, we have also seen non-trivial performance gains with ILP32
(for example, our SPECINT score goes up by 7% with ILP32 compared to
LP64).  



>
>We've seen the kernel patches and, following discussions on the lists,
>decided to change the original recommendation. IIRC, the main ideas (but
>you need to read various threads as I can't remember the details from
>6-7 months ago):
>
>a) separate syscall table for ILP32
>b) close to compat ABI with 32-bit time_t, off_t
>c) asm-generic/unistd.h rather than asm/unistd32.h (that's where it
>   would differ from compat, together with places where pointers are
>   passed)
>
>As I said previously, I'm not going to pay any attention to the patches
>in this series, it's nothing more than a rebase of a version I already
>reviewed.
>
>-- 
>Catalin


[PATCH] IB/ipath: use offset_in_page macro

2015-10-02 Thread Geliang Tang

Use offset_in_page macro instead of (addr & ~PAGE_MASK).

Signed-off-by: Geliang Tang 
---
 drivers/staging/rdma/ipath/ipath_user_sdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rdma/ipath/ipath_user_sdma.c b/drivers/staging/rdma/ipath/ipath_user_sdma.c
index cc04b7b..e82b3ee 100644
--- a/drivers/staging/rdma/ipath/ipath_user_sdma.c
+++ b/drivers/staging/rdma/ipath/ipath_user_sdma.c
@@ -239,7 +239,7 @@ static int ipath_user_sdma_num_pages(const struct iovec *iov)
 /* truncate length to page boundary */
 static int ipath_user_sdma_page_length(unsigned long addr, unsigned long len)
 {
-   const unsigned long offset = addr & ~PAGE_MASK;
+   const unsigned long offset = offset_in_page(addr);
 
return ((offset + len) > PAGE_SIZE) ? (PAGE_SIZE - offset) : len;
 }
@@ -298,7 +298,7 @@ static int ipath_user_sdma_pin_pages(const struct ipath_devdata *dd,
dma_addr_t dma_addr =
dma_map_page(&dd->pcidev->dev,
 pages[j], 0, flen, DMA_TO_DEVICE);
-   unsigned long fofs = addr & ~PAGE_MASK;
+   unsigned long fofs = offset_in_page(addr);
 
if (dma_mapping_error(&dd->pcidev->dev, dma_addr)) {
ret = -ENOMEM;
-- 
2.5.0
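
For readers unfamiliar with the macro, here is a minimal user-space sketch of the equivalence the patch above relies on. The 4096-byte page size and all `sketch_`/`SKETCH_` names are assumptions for illustration; the kernel's own definitions are architecture specific.

```c
/* Assumed page geometry for illustration; the kernel derives these per arch. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Mirrors the idea of offset_in_page(): keep only the bits below the page size. */
#define sketch_offset_in_page(p) ((unsigned long)(p) & ~SKETCH_PAGE_MASK)

/* Truncate a length so that [addr, addr + len) does not cross a page boundary. */
static unsigned long sketch_page_length(unsigned long addr, unsigned long len)
{
	const unsigned long offset = sketch_offset_in_page(addr);

	return ((offset + len) > SKETCH_PAGE_SIZE) ? (SKETCH_PAGE_SIZE - offset) : len;
}
```

The macro form only names the intent; the computed value is the same as with the open-coded mask.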



[PATCH] IB/hfi1: use offset_in_page macro

2015-10-02 Thread Geliang Tang

Use offset_in_page macro instead of (addr & ~PAGE_MASK).

Signed-off-by: Geliang Tang 
---
 drivers/staging/rdma/hfi1/file_ops.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index 9a77221..7d28680 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -168,7 +168,7 @@ enum mmap_types {
HFI1_MMAP_TOKEN_SET(TYPE, type) | \
HFI1_MMAP_TOKEN_SET(CTXT, ctxt) | \
HFI1_MMAP_TOKEN_SET(SUBCTXT, subctxt) | \
-   HFI1_MMAP_TOKEN_SET(OFFSET, ((unsigned long)addr & ~PAGE_MASK)))
+   HFI1_MMAP_TOKEN_SET(OFFSET, (offset_in_page(addr))))
 
 #define EXP_TID_SET(field, value)  \
(((value) & EXP_TID_TID##field##_MASK) <<   \
@@ -1335,9 +1335,9 @@ static int get_base_info(struct file *fp, void __user *ubase, __u32 len)
 */
binfo.user_regbase = HFI1_MMAP_TOKEN(UREGS, uctxt->ctxt,
subctxt_fp(fp), 0);
-   offset = ((((uctxt->ctxt - dd->first_user_ctxt) *
+   offset = offset_in_page((((uctxt->ctxt - dd->first_user_ctxt) *
HFI1_MAX_SHARED_CTXTS) + subctxt_fp(fp)) *
- sizeof(*dd->events)) & ~PAGE_MASK;
+ sizeof(*dd->events));
binfo.events_bufbase = HFI1_MMAP_TOKEN(EVENTS, uctxt->ctxt,
  subctxt_fp(fp),
  offset);
@@ -1573,7 +1573,7 @@ static int exp_tid_setup(struct file *fp, struct hfi1_tid_info *tinfo)
 
vaddr = tinfo->vaddr;
 
-   if (vaddr & ~PAGE_MASK) {
+   if (offset_in_page(vaddr)) {
ret = -EINVAL;
goto bail;
}
-- 
2.5.0



Re: Missing operand for tlbie instruction on Power7

2015-10-02 Thread Peter Bergner

On Fri, 2015-10-02 at 17:00 -0500, Segher Boessenkool wrote:
> On Sat, Oct 03, 2015 at 12:37:35AM +0300, Denis Kirjanov wrote:
> > >> -0: tlbie   r4; \
> > >> +0: tlbie   r4, 0;  \
> > >
> > > This isn't correct.  With POWER7 and later (which this compile
> > > is, since it's on LE), the tlbie instruction takes two register
> > > operands:
> > >
> > > tlbie RB, RS
> > >
> > > The tlbie instruction on pre POWER7 cpus had one required register
> > > operand (RB) and an optional second L operand, where if you omitted
> > > it, it was the same as using "0":
> > >
> > > tlbie RB, L
> > >
> > > This is a POWER7 and later build, so your change which adds the "0"
> > > above is really adding r0 for RS.  The new tlbie instruction doesn't
> > > treat r0 specially, so you'll be using whatever random bits which
> > > happen to be in r0 which I don't think that is what you want.
> > 
> > Ok, than we can just zero out r5 for example and use it in tlbie as RS,
> > right?
> 
> That won't assemble _unless_ your assembler is in POWER7 mode.  It also
> won't do the right thing at run time on older machines.

Correct, getting this to work on both pre-power7 and power7 and later
is tricky.  One really horrible hack would be to do:

  li r0,0
  tlbie r4,0

On pre-power7, the "0" will be taken as a zero L operand and on
power7 and later, it'll be r0, but with a zero value we loaded in
the insn before.  I know, really ugly. :-)

Peter



Re: [PATCH 1/3] drivers: staging: wilc1000: Check for errors before kfree

2015-10-02 Thread Chandra Gorentla

On Fri, Oct 02, 2015 at 04:39:11PM +0300, Dan Carpenter wrote:
> On Fri, Oct 02, 2015 at 06:47:35PM +0530, Chandra S Gorentla wrote:
> > During the clean-up of the function, it is necessary to check whether
> > errors occurred, not the memory pointer.
> >
> 
> The bug here is that we have a use after free on the success path.  It
> should have been mentioned in the changelog.
> 
> Anyway, this patch is buggy.  If result == -EFAULT then it will crash.
> Also this patch is really ugly.  There is someone who is going to send a
> correct fix (just add a return 0).
> 
> This driver uses "do everything" style error handling.  It is a bug
> prone anti-pattern because doing everything is more complicated than
> doing one thing.  You can easily see it is bug prone, because it made
> you introduce a bug, right?
> 
> Instead the error handling should look like this:
> 
>   return 0;
> 
> err_free_msg:
>   kfree(pstrMessage);
> 
>   return ret;
> 
> There are no error paths where we need to free "pstrMessage->pvBuffer"
> but if we were to add one it would look like this:
> 
>   return 0;
> 
> err_pvbuffer:
>   kfree(pstrMessage->pvBuffer);
> err_msg:
>   kfree(pstrMessage);
> 
>   return ret;
> 
> This is a minimal, uncomplicated, no indenting, no if statement way of
> unwinding.
> 
> regards,
> dan carpenter
> 
OK.  There is a problem in this patch.  I will correct it and reorganize the
patch series.

Thank you,
chandra
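
The unwinding style Dan describes can be tried out in user space. The following is a minimal sketch, assuming a hypothetical `sketch_msg` type with `malloc()`/`free()` standing in for the kernel allocators; it is not the wilc1000 code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct sketch_msg {
	void *buf;
	size_t len;
};

/*
 * Hypothetical helper: duplicate a payload into a freshly allocated message.
 * On success the caller owns *out and must release it with sketch_msg_free().
 */
static int sketch_msg_create(const void *data, size_t len, struct sketch_msg **out)
{
	struct sketch_msg *msg;
	int ret;

	if (!data || !len)
		return -EINVAL;

	msg = malloc(sizeof(*msg));
	if (!msg)
		return -ENOMEM;

	msg->buf = malloc(len);
	if (!msg->buf) {
		ret = -ENOMEM;
		goto err_free_msg;
	}

	memcpy(msg->buf, data, len);
	msg->len = len;
	*out = msg;

	return 0;	/* the success path never falls into the error labels */

err_free_msg:
	free(msg);

	return ret;
}

static void sketch_msg_free(struct sketch_msg *msg)
{
	free(msg->buf);
	free(msg);
}
```

Each label undoes exactly one earlier step, so the early `return 0` is what keeps the success path from freeing memory that is still in use.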


Re: [fuse-devel] [PATCH] fuse: break infinite loop in fuse_fill_write_pages()

2015-10-02 Thread Konstantin Khlebnikov

On Sat, Oct 3, 2015 at 1:04 AM, Andrew Morton  wrote:
> On Fri, 2 Oct 2015 12:27:45 -0700 Maxim Patlasov wrote:
>
>> On 10/02/2015 04:21 AM, Konstantin Khlebnikov wrote:
>> > Bump. Add more people in CC.
>> >
>> > On Mon, Sep 21, 2015 at 1:02 PM, Roman Gushchin wrote:
>> >> I got a report about an unkillable task eating CPU. Further
>> >> investigation shows, that the problem is in the fuse_fill_write_pages()
>> >> function. If iov's first segment has zero length, we get an infinite
>> >> loop, because we never reach iov_iter_advance() call.
>>
>> iov_iter_copy_from_user_atomic() eventually calls iterate_iovec(). The
>> latter silently consumes zero-length iov. So I don't think "iov's first
>> segment has zero length" can cause infinite loop.
>
> I'm suspecting it got stuck because local variable `bytes' is zero, so
> the code does `goto again' repeatedly.
>
> Or maybe not.   A more complete description of the bug would help.

I suspect this is the same scenario as in 124d3b7041f: a zero-length
segment is followed by a segment with an invalid address.
iov_iter_fault_in_readable() checks only the first (zero-length) segment;
iov_iter_copy_from_user_atomic() skips it, fails at the second segment and
returns zero, so we `goto again' without having skipped the zero-length segment.

The patch calls iov_iter_advance() before `goto again': we skip the zero-length
segment on the second iteration, and iov_iter_fault_in_readable() then detects
the invalid address.
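
The suspected loop can be modelled in a few lines of user-space C. This is a toy sketch under assumptions: the `sketch_` helpers only imitate the behaviour described above (fault-in looks at the current segment, the copy skips empty segments), and a retry cap stands in for the real unbounded loop.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an iovec segment: only a length and a "readable" flag. */
struct sketch_seg {
	size_t len;
	int readable;
};

struct sketch_iter {
	const struct sketch_seg *seg;
	size_t nr;	/* segments left, including the current one */
};

/* Like the fault-in check described above: looks at the current segment only. */
static int sketch_fault_in(const struct sketch_iter *it)
{
	if (it->nr && it->seg->len && !it->seg->readable)
		return -EFAULT;
	return 0;
}

/* Like the atomic copy: skips empty segments, returns 0 bytes on a bad one. */
static size_t sketch_copy(const struct sketch_iter *it)
{
	size_t i;

	for (i = 0; i < it->nr; i++) {
		if (!it->seg[i].len)
			continue;
		return it->seg[i].readable ? it->seg[i].len : 0;
	}
	return 0;
}

/* Advancing by zero bytes still steps over leading zero-length segments. */
static void sketch_advance_zero(struct sketch_iter *it)
{
	while (it->nr && !it->seg->len) {
		it->seg++;
		it->nr--;
	}
}

/*
 * Returns bytes copied, -EFAULT, or -ELOOP if max_tries retries were used up
 * (the stand-in for "spins forever").
 */
static long sketch_fill(struct sketch_iter it, int advance_before_retry, int max_tries)
{
	while (max_tries--) {
		size_t copied;

		if (sketch_fault_in(&it))
			return -EFAULT;
		copied = sketch_copy(&it);
		if (copied)
			return (long)copied;
		if (advance_before_retry)
			sketch_advance_zero(&it);
		/* falling through here corresponds to "goto again" */
	}
	return -ELOOP;
}
```

With a zero-length segment followed by an unreadable one, the variant without the advance exhausts its retries, while the variant with it reports the fault on the second pass.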

[PATCH] blk-mq: use after free when freeing tag sets

2015-10-02 Thread Sasha Levin

Commit f26cdc853 ("blk-mq: Shared tag enhancements") has introduced a use after
free where it tried to free the cpumask var out of the tag set but the tag set
was already freed by blk_mq_free_rq_map().

Signed-off-by: Sasha Levin 
---
 block/blk-mq-tag.c |1 +
 block/blk-mq.c |7 ++-
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index f6020c6..c03eeb8 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -628,6 +628,7 @@ void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
bt_free(&tags->bitmap_tags);
bt_free(&tags->breserved_tags);
+   free_cpumask_var(tags->cpumask);
kfree(tags);
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5692c15..197a615 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2270,12 +2270,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
int i;
 
-   for (i = 0; i < set->nr_hw_queues; i++) {
-   if (set->tags[i]) {
+   for (i = 0; i < set->nr_hw_queues; i++)
+   if (set->tags[i])
blk_mq_free_rq_map(set, set->tags[i], i);
-   free_cpumask_var(set->tags[i]->cpumask);
-   }
-   }
 
kfree(set->tags);
set->tags = NULL;
-- 
1.7.10.4
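
The ordering rule behind this fix (release a member before its container, and only in the container's own free routine) can be shown with a small user-space sketch. The `sketch_` names are hypothetical and `calloc()`/`free()` stand in for the kernel allocators.

```c
#include <stdlib.h>

struct sketch_tags {
	unsigned long *cpumask;	/* separately allocated member */
};

static struct sketch_tags *sketch_tags_alloc(void)
{
	struct sketch_tags *tags = calloc(1, sizeof(*tags));

	if (!tags)
		return NULL;
	tags->cpumask = calloc(1, sizeof(*tags->cpumask));
	if (!tags->cpumask) {
		free(tags);
		return NULL;
	}
	return tags;
}

/* Release members first, then the container; nothing may touch tags afterwards. */
static void sketch_tags_free(struct sketch_tags *tags)
{
	free(tags->cpumask);
	free(tags);
}

/* Caller side: free each entry exactly once and clear the slot. */
static int sketch_set_free(struct sketch_tags **set, int nr)
{
	int i, freed = 0;

	for (i = 0; i < nr; i++) {
		if (set[i]) {
			sketch_tags_free(set[i]);
			set[i] = NULL;
			freed++;
		}
	}
	return freed;
}
```

Keeping the member free inside the container's free routine means the caller never has to dereference an object it has just handed off.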


[BUG] RCU stall in cursor_timer_handler

2015-10-02 Thread Alistair Popple

Hi,

We have been intermittently seeing the RCU stall below at boot on a PPC64LE
4.2.1 kernel, which has been preventing the system from booting. Further
investigation indicates that ops->cur_blink_jiffies is potentially being used
uninitialised in cursor_timer_handler():

static void cursor_timer_handler(unsigned long dev_addr)
{
struct fb_info *info = (struct fb_info *) dev_addr;
struct fbcon_ops *ops = info->fbcon_par;

queue_work(system_power_efficient_wq, &info->queue);
mod_timer(&ops->cursor_timer, jiffies + ops->cur_blink_jiffies);
}

Adding WARN_ON(ops->cur_blink_jiffies < 1) to the above occasionally triggers 
and further testing shows ops->cur_blink_jiffies == 0, therefore I suspect 
cur_blink_jiffies is being used uninitialised. This patch seems to work around 
the problem:

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 1aaf893..45d2a0a 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -416,6 +416,8 @@ static void fbcon_add_cursor_timer(struct fb_info *info)
INIT_WORK(&info->queue, fb_flashcursor);
 
init_timer(&ops->cursor_timer);
+   if (ops->cur_blink_jiffies < 1)
+   ops->cur_blink_jiffies = msecs_to_jiffies(200);
ops->cursor_timer.function = cursor_timer_handler;
ops->cursor_timer.expires = jiffies + ops->cur_blink_jiffies;
ops->cursor_timer.data = (unsigned long ) info;

It looks like the issue was introduced by:

commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle
Date:   Thu Mar 26 13:56:38 2015 +

    fbcon: use the cursor blink interval provided by vt

    vt now provides a cursor blink interval via vc_data. Use this
    interval instead of the currently hardcoded 200 msecs. Store it in
    fbcon_ops to avoid locking the console in cursor_timer_handler().

    Signed-off-by: Scot Doyle
    Acked-by: Pavel Machek
    Signed-off-by: Greg Kroah-Hartman

Regards,

Alistair

[006280989170] Generic RTC Driver v1.07
[006281152624] powernv_rng: Registered powernv hwrng.
[006281250526] [drm] Initialized drm 1.1.0 20060810
[006281323420] [drm] radeon kernel modesetting enabled.
[006281570416] ast 0001:10:00.0: enabling device (0140 -> 0142)
[006281932424] [drm] platform has no IO space, trying MMIO
[006281978870] [drm] AST 2400 detected
[006282070130] [drm] VGA not enabled on entry, requesting chip POST
[006282131234] [drm] Analog VGA only
[006282174414] [drm] dram 163200 7 16 00c0
[006282324794] [TTM] Zone  kernel: Available graphics memory: 66806240 kiB
[006282376184] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[006282551070] [TTM] Initializing pool allocator
[027281989546] INFO: rcu_sched self-detected stall on CPU
[027281996420]  154: (2100 ticks this GP) idle=e25/142/0 softirq=323/326 fqs=2099
[027282010008]   (t=2100 jiffies g=-260 c=-261 q=4321)
[027282013264] Task dump for CPU 154:
[027282017276] swapper/136 R  running task0 1  0 0x0804
[027282018170] Call Trace:
[027282023368] [c01fe0301c50] [c007fe98] sched_show_task+0x178/0x184
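
To see why a zero blink interval is a problem, and what the workaround in the report does, here is a minimal user-space sketch. The tick rate and the `sketch_` helpers are assumptions for illustration, not fbcon code.

```c
/* Assumed tick rate for illustration; the kernel's HZ is a build-time option. */
#define SKETCH_HZ 250UL

/* Round-up conversion, in the spirit of msecs_to_jiffies(). */
static unsigned long sketch_msecs_to_ticks(unsigned long ms)
{
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* Fall back to the old 200 ms default when no interval has been stored yet. */
static unsigned long sketch_blink_ticks(unsigned long stored)
{
	return stored < 1 ? sketch_msecs_to_ticks(200) : stored;
}

/* Next expiry; with a zero interval this would equal `now` and re-fire at once. */
static unsigned long sketch_next_expiry(unsigned long now, unsigned long stored)
{
	return now + sketch_blink_ticks(stored);
}
```

A timer that re-arms itself for "now" never lets the CPU leave the handler path, which would be consistent with the stall reported above.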

Re: [patch 1/2] x86/process: Add proper bound checks in 64bit get_wchan()

2015-10-02 Thread Andy Lutomirski

On Fri, Oct 2, 2015 at 6:15 PM, Sasha Levin  wrote:
> On 09/30/2015 04:38 AM, Thomas Gleixner wrote:
>> Dmitry Vyukov reported the following using trinity and the memory
>> error detector AddressSanitizer
>> (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
>>
>> [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
>> address 88002e28
>> [ 124.576801] 88002e28 is located 131938492886538 bytes to
>> the left of 28857600-byte region [81282e0a, 82e0830a)
>> [ 124.578633] Accessed by thread T10915:
>> [ 124.579295] inlined in describe_heap_address
>> ./arch/x86/mm/asan/report.c:164
>> [ 124.579295] #0 810dd277 in asan_report_error
>> ./arch/x86/mm/asan/report.c:278
>> [ 124.580137] #1 810dc6a0 in asan_check_region
>> ./arch/x86/mm/asan/asan.c:37
>> [ 124.581050] #2 810dd423 in __tsan_read8 ??:0
>> [ 124.581893] #3 8107c093 in get_wchan
>> ./arch/x86/kernel/process_64.c:444
>>
>> The address checks in the 64bit implementation of get_wchan() are
>> wrong in several ways:
>>
>>  - The lower bound of the stack is not the start of the stack
>>page. It's the start of the stack page plus sizeof (struct
>>thread_info)
>>
>>  - The upper bound must be:
>>
>>top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
>>
>>The 2 * sizeof(unsigned long) is required because the stack pointer
>>points at the frame pointer. The layout on the stack is: ... IP FP
>>... IP FP. So we need to make sure that both IP and FP are in the
>>bounds.
>>
>> Fix the bound checks and get rid of the mix of numeric constants, u64
>> and unsigned long. Making all unsigned long allows us to use the same
>> function for 32bit as well.
>>
>> Use READ_ONCE() when accessing the stack. This does not prevent a
>> concurrent wakeup of the task and the stack changing, but at least it
>> avoids TOCTOU.
>>
>> Also check task state at the end of the loop. Again that does not
>> prevent concurrent changes, but it avoids walking for nothing.
>>
>> Add proper comments while at it.
>>
>> Reported-by: Dmitry Vyukov 
>> Reported-by: Sasha Levin 
>> Based-on-patch-from: Wolfram Gloger 
>> Signed-off-by: Thomas Gleixner 
>
> I'm seeing a different issue with this patch:
>
> [ 5228.736320] BUG: KASAN: out-of-bounds in get_wchan+0xf9/0x1b0 at addr 88049d2b7c50

This could be a real bug, but it also could plausibly be kasan not
understanding that this code can legitimately read random addresses
within the stack page.  In particular, it can read up into the entry
asm stack, which kasan might consider off-limits.  (kasan may also
consider the return address itself off limits.)

--Andy

Re: [patch 1/2] x86/process: Add proper bound checks in 64bit get_wchan()

2015-10-02 Thread Sasha Levin

On 09/30/2015 04:38 AM, Thomas Gleixner wrote:
> Dmitry Vyukov reported the following using trinity and the memory
> error detector AddressSanitizer
> (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
> 
> [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
> address 88002e28
> [ 124.576801] 88002e28 is located 131938492886538 bytes to
> the left of 28857600-byte region [81282e0a, 82e0830a)
> [ 124.578633] Accessed by thread T10915:
> [ 124.579295] inlined in describe_heap_address
> ./arch/x86/mm/asan/report.c:164
> [ 124.579295] #0 810dd277 in asan_report_error
> ./arch/x86/mm/asan/report.c:278
> [ 124.580137] #1 810dc6a0 in asan_check_region
> ./arch/x86/mm/asan/asan.c:37
> [ 124.581050] #2 810dd423 in __tsan_read8 ??:0
> [ 124.581893] #3 8107c093 in get_wchan
> ./arch/x86/kernel/process_64.c:444
> 
> The address checks in the 64bit implementation of get_wchan() are
> wrong in several ways:
> 
>  - The lower bound of the stack is not the start of the stack
>page. It's the start of the stack page plus sizeof (struct
>thread_info)
> 
>  - The upper bound must be:
> 
>top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
> 
>The 2 * sizeof(unsigned long) is required because the stack pointer
>points at the frame pointer. The layout on the stack is: ... IP FP
>... IP FP. So we need to make sure that both IP and FP are in the
>bounds.
> 
> Fix the bound checks and get rid of the mix of numeric constants, u64
> and unsigned long. Making all unsigned long allows us to use the same
> function for 32bit as well.
> 
> Use READ_ONCE() when accessing the stack. This does not prevent a
> concurrent wakeup of the task and the stack changing, but at least it
> avoids TOCTOU.
> 
> Also check task state at the end of the loop. Again that does not
> prevent concurrent changes, but it avoids walking for nothing.
> 
> Add proper comments while at it.
> 
> Reported-by: Dmitry Vyukov 
> Reported-by: Sasha Levin 
> Based-on-patch-from: Wolfram Gloger 
> Signed-off-by: Thomas Gleixner 

I'm seeing a different issue with this patch:

[ 5228.736320] BUG: KASAN: out-of-bounds in get_wchan+0xf9/0x1b0 at addr 88049d2b7c50
[ 5228.737560] Read of size 8 by task killall/22177
[ 5228.738304] page:ea001274adc0 count:0 mapcount:0 mapping:  (null) index:0x0
[ 5228.739374] flags: 0x6f8000()
[ 5228.739862] page dumped because: kasan: bad access detected
[ 5228.741764] CPU: 8 PID: 22177 Comm: killall Not tainted 4.3.0-rc3-next-20151002-sasha-00076-gde7fa56-dirty #2590
[ 5228.743337]  882c80967828 7a901a83 882c80967790 acd2c8c8
[ 5228.744409]  88049d2b7c50 882c80967818 ab74befb 882c8bd0
[ 5228.745436]  0002 0282 882c8bd00cf8 0001
[ 5228.746446] Call Trace:
[ 5228.746881] dump_stack (lib/dump_stack.c:52)
[ 5228.747720] kasan_report_error (include/linux/kasan.h:28 mm/kasan/report.c:170 mm/kasan/report.c:237)
[ 5228.748670] __asan_report_load8_noabort (mm/kasan/report.c:279)
[ 5228.750563] get_wchan (arch/x86/kernel/process.c:561)
[ 5228.751378] do_task_stat (fs/proc/array.c:458)
[ 5228.755912] proc_tgid_stat (fs/proc/array.c:565)
[ 5228.756770] proc_single_show (./arch/x86/include/asm/atomic.h:118 include/linux/sched.h:2012 fs/proc/base.c:789)
[ 5228.759066] seq_read (fs/seq_file.c:238)
[ 5228.762360] __vfs_read (fs/read_write.c:432)
[ 5228.767957] vfs_read (fs/read_write.c:454)
[ 5228.769368] SyS_read (fs/read_write.c:570 fs/read_write.c:562)
[ 5228.778344] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[ 5228.779272] Memory state around the buggy address:
[ 5228.779971]  88049d2b7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5228.780992]  88049d2b7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5228.782021] >88049d2b7c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5228.783066] ^
[ 5228.783936]  88049d2b7c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 5228.784994]  88049d2b7d00: 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3

fp = READ_ONCE(*(unsigned long *)sp);
do {
if (fp < bottom || fp > top)
return 0;
ip = READ_ONCE(*(unsigned long *)(fp + sizeof(unsigned long)));
if (!in_sched_functions(ip))
return ip;
fp = READ_ONCE(*(unsigned long *)fp); <=== Here
} while (count++ < 16 && p->state != TASK_RUNNING);

Thanks,
Sasha
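
The bound rules from the quoted changelog can be written down as a small user-space sketch. All three sizes are placeholders and the `sketch_` names are hypothetical; the layout (thread_info at the start of the stack area) is taken from the changelog, not verified against any particular tree.

```c
/* All three sizes are placeholders; the real values are arch/config specific. */
#define SKETCH_THREAD_SIZE	16384UL
#define SKETCH_THREAD_INFO_SIZE	  104UL
#define SKETCH_TOP_PADDING	    0UL

struct sketch_bounds {
	unsigned long bottom;	/* lowest address a frame pointer may have */
	unsigned long top;	/* highest address a frame pointer may have */
};

static struct sketch_bounds sketch_stack_bounds(unsigned long stack_base)
{
	struct sketch_bounds b;

	/* thread_info is assumed to sit at the start of the stack area */
	b.bottom = stack_base + SKETCH_THREAD_INFO_SIZE;
	/* leave room for the FP slot and the IP slot right above it */
	b.top = stack_base + SKETCH_THREAD_SIZE - SKETCH_TOP_PADDING
		- 2 * sizeof(unsigned long);
	return b;
}

/* With top chosen as above, fp in range implies the IP slot is in range too. */
static int sketch_fp_ok(const struct sketch_bounds *b, unsigned long fp)
{
	return fp >= b->bottom && fp <= b->top;
}
```

Note that such a check only proves the read stays inside the stack area; a tool like KASAN may still flag the access if it tracks finer-grained regions within the stack.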

RE: [alsa-devel] [RESEND PATCH v2 1/1] ASoC: dwc: fix dma stop transferring issue

2015-10-02 Thread yitian

Hi Mark:
> This doesn't apply because it's been corrupted by line wrapping - git am
> can't understand it.  Since it's just that one line I fixed it up by
> hand but please look at your mail setup to make sure this works (this
> might've been what happened with your other patch yesterday).

After you pointed out that the patch was too corrupted for git am, I used
git send-email to send the "correct irq clear method" patch; hopefully that one
came through correctly. I will be more careful next time. Thanks a lot for
your patient help.



[PATCH] staging: rtl8723au: Fix Sparse errors in rtl8723a_cmd.c

2015-10-02 Thread Jacob Kiefer

From: Jacob Kiefer 

This patch fixes the following sparse errors:


  CHECK   drivers/staging/rtl8723au/hal/rtl8723a_cmd.c
...
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:118:25: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:118:25:    expected unsigned int [unsigned] [usertype]
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:118:25:    got restricted __le32 [usertype]
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:130:14: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:130:14:    expected unsigned int [unsigned] [usertype] mask
drivers/staging/rtl8723au/hal/rtl8723a_cmd.c:130:14:    got restricted __le32 [usertype]
  CC [M]  drivers/staging/rtl8723au/hal/rtl8723a_cmd.o

Signed-off-by: Jacob Kiefer 
---
 drivers/staging/rtl8723au/hal/rtl8723a_cmd.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8723au/hal/rtl8723a_cmd.c b/drivers/staging/rtl8723au/hal/rtl8723a_cmd.c
index 9733aa6..111a24d 100644
--- a/drivers/staging/rtl8723au/hal/rtl8723a_cmd.c
+++ b/drivers/staging/rtl8723au/hal/rtl8723a_cmd.c
@@ -115,9 +115,11 @@ exit:
 
 int rtl8723a_set_rssi_cmd(struct rtw_adapter *padapter, u8 *param)
 {
-   *((u32 *)param) = cpu_to_le32(*((u32 *)param));
+   __le32 leparam;
 
-   FillH2CCmd(padapter, RSSI_SETTING_EID, 3, param);
+   leparam = cpu_to_le32(*((u32 *)param));
+
+   FillH2CCmd(padapter, RSSI_SETTING_EID, 3, (u8 *)&leparam);
 
return _SUCCESS;
 }
@@ -125,10 +127,11 @@ int rtl8723a_set_rssi_cmd(struct rtw_adapter *padapter, u8 *param)
 int rtl8723a_set_raid_cmd(struct rtw_adapter *padapter, u32 mask, u8 arg)
 {
u8 buf[5];
+   __le32 lemask;
 
memset(buf, 0, 5);
-   mask = cpu_to_le32(mask);
-   memcpy(buf, &mask, 4);
+   lemask = cpu_to_le32(mask);
+   memcpy(buf, (u32 *)&lemask, 4);
buf[4]  = arg;
 
FillH2CCmd(padapter, MACID_CONFIG_EID, 5, buf);
-- 
1.8.3.2
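
The point of the sparse warning is that a host-order value and its little-endian wire form should not share one variable. A portable user-space sketch of the same command layout follows; note that it swaps in explicit byte shifts in place of the kernel's cpu_to_le32(), and the `sketch_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le32() + memcpy(): emit the 4 bytes LSB first. */
static void sketch_put_le32(uint8_t *dst, uint32_t host)
{
	dst[0] = (uint8_t)(host & 0xff);
	dst[1] = (uint8_t)((host >> 8) & 0xff);
	dst[2] = (uint8_t)((host >> 16) & 0xff);
	dst[3] = (uint8_t)((host >> 24) & 0xff);
}

/* Build a 5-byte command: little-endian mask followed by one argument byte. */
static void sketch_build_raid_cmd(uint8_t buf[5], uint32_t mask, uint8_t arg)
{
	memset(buf, 0, 5);
	sketch_put_le32(buf, mask);	/* host-order `mask` itself is left untouched */
	buf[4] = arg;
}
```

Because the conversion writes straight into the buffer, the result is the same on big-endian and little-endian hosts.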


Re: [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info

2015-10-02 Thread Bob Liu


On 10/03/2015 01:02 AM, Roger Pau Monné wrote:
> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>> Split per ring information to an new structure:blkfront_ring_info, also 
>> rename
>> per blkfront_info to blkfront_dev_info.
>   ^ removed.
>>
>> A ring is the representation of a hardware queue, every vbd device can 
>> associate
>> with one or more blkfront_ring_info depending on how many hardware
>> queues/rings to be used.
>>
>> This patch is a preparation for supporting real multi hardware queues/rings.
>>
>> Signed-off-by: Arianna Avanzini 
>> Signed-off-by: Bob Liu 
>> ---
>>  drivers/block/xen-blkfront.c |  854 
>> ++
>>  1 file changed, 445 insertions(+), 409 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 5dd591d..bf416d5 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
>>  module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, 
>> S_IRUGO);
>>  MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used 
>> for the shared ring");
>>  
>> -#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * 
>> (info)->nr_ring_pages)
>> +#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * 
>> (dinfo)->nr_ring_pages)
> 
> This change looks pointless, any reason to use dinfo instead of info?
> 
>>  #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * 
>> XENBUS_MAX_RING_PAGES)
>>  /*
>>   * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
>> @@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order 
>> of pages to be used for the
>>  #define RINGREF_NAME_LEN (20)
>>  
>>  /*
>> + *  Per-ring info.
>> + *  Every blkfront device can associate with one or more blkfront_ring_info,
>> + *  depending on how many hardware queues to be used.
>> + */
>> +struct blkfront_ring_info
>> +{
>> +struct blkif_front_ring ring;
>> +unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
>> +unsigned int evtchn, irq;
>> +struct work_struct work;
>> +struct gnttab_free_callback callback;
>> +struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>> +struct list_head grants;
>> +struct list_head indirect_pages;
>> +unsigned int persistent_gnts_c;
> 
> persistent grants should be per-device, not per-queue IMHO. Is it really
> hard to make this global instead of per-queue?
> 

The most important thing at this stage is to keep the changes minimal so that
they are easier to review.
I'll check which approach needs the fewest modifications.

>> +unsigned long shadow_free;
>> +struct blkfront_dev_info *dinfo;
>> +};
>> +
>> +/*
>>   * We have one of these per vbd, whether ide, scsi or 'other'.  They
>>   * hang in private_data off the gendisk structure. We may end up
>>   * putting all kinds of interesting stuff here :-)
>>   */
>> -struct blkfront_info
>> -{
>> +struct blkfront_dev_info {
> 
> IMHO, you can leave this as blkfront_info (unless I'm missing something).
> 
>>  spinlock_t io_lock;
> 
> Shouldn't the spinlock be per-queue instead of per-device?
> 

That is done in a separate patch to make review easier:
'[PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock'.

>>  struct mutex mutex;
>>  struct xenbus_device *xbdev;
>> @@ -129,18 +148,7 @@ struct blkfront_info
>>  int vdevice;
>>  blkif_vdev_t handle;
>>  enum blkif_state connected;
>> -int ring_ref[XENBUS_MAX_RING_PAGES];
>> -unsigned int nr_ring_pages;
>> -struct blkif_front_ring ring;
>> -unsigned int evtchn, irq;
>>  struct request_queue *rq;
>> -struct work_struct work;
>> -struct gnttab_free_callback callback;
>> -struct blk_shadow shadow[BLK_MAX_RING_SIZE];
>> -struct list_head grants;
>> -struct list_head indirect_pages;
>> -unsigned int persistent_gnts_c;
>> -unsigned long shadow_free;
>>  unsigned int feature_flush;
>>  unsigned int feature_discard:1;
>>  unsigned int feature_secdiscard:1;
>> @@ -149,7 +157,9 @@ struct blkfront_info
>>  unsigned int feature_persistent:1;
>>  unsigned int max_indirect_segments;
>>  int is_ready;
>> +unsigned int nr_ring_pages;
> 
> Spurious change? You are removing it in the chunk above and adding it
> back here.
> 

Will fix.

> [...]
> 
>> @@ -246,33 +257,33 @@ out_of_memory:
>>  }
>>  
>>  static struct grant *get_grant(grant_ref_t *gref_head,
>> -   unsigned long pfn,
>> -   struct blkfront_info *info)
>> +   unsigned long pfn,
>> +   struct blkfront_ring_info *rinfo)
> 
> Indentation? (or my email client is mangling emails one more time...)
> 

Will fix.

> In order to make this easier to review, do you think you can leave
> blkfront_info as "info" for now, and do the renaming to dinfo in a later
> patch. That would

Re: [RFC] asm-generic/pci_iomap.h: make custom PCI BAR requirements explicit

2015-10-02 Thread Luis R. Rodriguez

On Fri, Sep 11, 2015 at 10:14:09AM +0200, Martin Schwidefsky wrote:
> On Tue, 08 Sep 2015 15:42:40 +0200
> Arnd Bergmann  wrote:
> 
> > On Thursday 03 September 2015 03:44:15 Luis R. Rodriguez wrote:
> > > On Sun, Aug 30, 2015 at 09:30:26PM +0200, Arnd Bergmann wrote:
> > > > On Friday 28 August 2015 17:17:27 Luis R. Rodriguez wrote:
> > > > > While at it, as with the ioremap*() variants, since we have no clear
> > > > > semantics yet well defined provide a solution for them that returns
> > > > > NULL. This allows architectures to move forward by defining 
> > > > > pci_ioremap*()
> > > > > variants without requiring immediate changes to all architectures. 
> > > > > Each
> > > > > architecture then can implement their own solution as needed and
> > > > > when they get to it.
> > > > 
> > > > Which architectures are you thinking about here?
> > > 
> > > Really only S390 would benefit from this now.
> > 
> > Ok
> > 
> > > > > Build tested with allyesconfig on:
> > > > > 
> > > > > * S390
> > > > > * x86_64
> > > > > 
> > > > > Signed-off-by: Luis R. Rodriguez 
> > > > 
> > > > It's not really clear to me what the purpose of the patch is, is this 
> > > > meant as a cleanup, or are you trying to avoid some real-life bugs
> > > > you ran into?
> > > 
> > > Upon adding a new helper into CONFIG_PCI_IOMAP it was only through
> > > 0-day build testing that I found that I needed to add something for S390.
> > > This means we fix S390 reactively. With the asm-generic stuff in place
> > > to return NULL we don't need to do anything but a respective return
> > > NULL static inline, the moment that is done the author would know some
> > > architectures may not get support for the functionality they are adding.
> > > Without this we only find out reactively.
> > 
> > Hmm, my gut feeling tells me that your approach won't solve the problem
> > in general. s390 PCI is just weird in many ways and it will occasionally
> > suffer from problems like this (as do other aspects of the s390 architecture
> > that are unlike the rest of the world).
> > 
> > Maybe Martin and Heiko can comment on this, they may have a preference
> > from the s390 point of view.
> 
> I do not see how the additional Kconfig ARCH_PCI_NON_DISJUNCTIVE and the
> #ifdef indirections help with anything. An extension to lib/pci_iomap.c
> now requires an extra inline function in include/asm-generic/pci_iomap.h
> which I am sure will be added blindly without any consideration what
> s390 needs.

The purpose here was to enable evolution of this code *without* requiring a
solution to be in place for S390. Instead, on S390 such additions would simply
not fail to compile, and when and if folks needed them they'd write them.

> Actually the patch makes it worse as the new inline will cover things up.
> Instead of a zero day compile error we will be left with a silently broken
> extension.
> 
> I prefer a compile error as it points out that there is a problem.

Of course you would, though. The patch's intention is to let folks evolve this
code without having to consider a solution for S390 up front; it would let
*you*, the maintainers of S390, eventually get to it, without always requiring
a solution to be defined for S390 in this area because of the overlapping PCI
BAR situation on S390. This is how we devised support requirements for the
ioremap_*() variants, for instance: it means architectures and/or drivers that
need these variants can evolve within Linux without having to wait to iron
things out for other architectures. Is that a design mistake? I figured we'd
want to take advantage of a similar practice here, but if S390 is so quirky
that we'd end up with too many of these, I can understand that this is
undesirable. That reason, though, is different from *wanting* a compile error
to prompt a solution before code gets merged upstream, or hoping that a bot
compile test will somehow figure these issues out.

Now, even if these quirky design considerations are spread all over S390, it
would seem *good* to me to have them well documented. That does not seem to be
the case today, so using something like this should be considered for the
gain of having them properly identified.

You guys can decide. I was just trying to be proactive about all this.
I really don't think waiting for a compile bot to tell you there is an
issue scales well, nor do I think it's the best possible design.
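
To make the trade-off in this thread concrete, here is a minimal userspace
sketch of the "generic stub returns NULL" pattern. It is not the kernel code:
DEMO_ARCH_HAS_IOMAP_WC, demo_iomap_wc() and demo_probe() are hypothetical names
made up for illustration, and the real helpers take different arguments.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an architecture that has not implemented a new
 * mapping helper yet. None of these names are real kernel symbols.
 */
#ifdef DEMO_ARCH_HAS_IOMAP_WC
void *demo_iomap_wc(void *bar_base, size_t len);	/* arch provides it */
#else
/* Generic fallback: builds everywhere, but always reports "unsupported". */
static inline void *demo_iomap_wc(void *bar_base, size_t len)
{
	(void)bar_base;
	(void)len;
	return NULL;
}
#endif

/* A caller only stays correct if it checks for the NULL fallback. */
int demo_probe(void *bar_base, size_t len)
{
	void *regs = demo_iomap_wc(bar_base, len);

	if (!regs)
		return -1;	/* feature is unavailable on this architecture */
	return 0;
}
```

With the fallback in place the build succeeds on every architecture, which is
the benefit argued for above. The cost Martin points out is also visible: the
gap only shows up at run time, and only for callers that check the return value.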

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 2/2] Revert "dax: fix race between simultaneous faults"

2015-10-02 Thread Ross Zwisler

This reverts commit 843172978bb92997310d2f7fbc172ece423cfc02.

The following two locking commits in the DAX code:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

introduced a number of deadlocks and other issues, and need to be
reverted for the v4.3 kernel.  The list of issues in DAX after these
commits (some newly introduced by the commits, some preexisting) can be
found here:

https://lkml.org/lkml/2015/9/25/602

Signed-off-by: Ross Zwisler 
---
 fs/dax.c| 33 -
 mm/memory.c | 11 +++
 2 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index de3f53e..f364c90 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -285,6 +285,7 @@ static int copy_user_bh(struct page *to, struct buffer_head 
*bh,
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+   struct address_space *mapping = inode->i_mapping;
sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
unsigned long vaddr = (unsigned long)vmf->virtual_address;
void __pmem *addr;
@@ -292,6 +293,8 @@ static int dax_insert_mapping(struct inode *inode, struct 
buffer_head *bh,
pgoff_t size;
int error;
 
+   i_mmap_lock_read(mapping);
+
/*
 * Check truncate didn't happen while we were allocating a block.
 * If it did, this block may or may not be still allocated to the
@@ -321,6 +324,8 @@ static int dax_insert_mapping(struct inode *inode, struct 
buffer_head *bh,
error = vm_insert_mixed(vma, vaddr, pfn);
 
  out:
+   i_mmap_unlock_read(mapping);
+
return error;
 }
 
@@ -382,17 +387,15 @@ int __dax_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf,
 * from a read fault and we've raced with a truncate
 */
error = -EIO;
-   goto unlock;
+   goto unlock_page;
}
-   } else {
-   i_mmap_lock_write(mapping);
}
 
error = get_block(inode, block, &bh, 0);
if (!error && (bh.b_size < PAGE_SIZE))
error = -EIO;   /* fs corruption? */
if (error)
-   goto unlock;
+   goto unlock_page;
 
if (!buffer_mapped(&bh) && !buffer_unwritten(&bh) && !vmf->cow_page) {
if (vmf->flags & FAULT_FLAG_WRITE) {
@@ -403,9 +406,8 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault 
*vmf,
if (!error && (bh.b_size < PAGE_SIZE))
error = -EIO;
if (error)
-   goto unlock;
+   goto unlock_page;
} else {
-   i_mmap_unlock_write(mapping);
return dax_load_hole(mapping, page, vmf);
}
}
@@ -417,15 +419,17 @@ int __dax_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf,
else
clear_user_highpage(new_page, vaddr);
if (error)
-   goto unlock;
+   goto unlock_page;
vmf->page = page;
if (!page) {
+   i_mmap_lock_read(mapping);
/* Check we didn't race with truncate */
size = (i_size_read(inode) + PAGE_SIZE - 1) >>
PAGE_SHIFT;
if (vmf->pgoff >= size) {
+   i_mmap_unlock_read(mapping);
error = -EIO;
-   goto unlock;
+   goto out;
}
}
return VM_FAULT_LOCKED;
@@ -461,8 +465,6 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault 
*vmf,
WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
}
 
-   if (!page)
-   i_mmap_unlock_write(mapping);
  out:
if (error == -ENOMEM)
return VM_FAULT_OOM | major;
@@ -471,14 +473,11 @@ int __dax_fault(struct vm_area_struct *vma, struct 
vm_fault *vmf,
return VM_FAULT_SIGBUS | major;
return VM_FAULT_NOPAGE | major;
 
- unlock:
+ unlock_page:
if (page) {
unlock_page(page);
page_cache_release(page);
-   } else {
-   i_mmap_unlock_write(mapping);
}
-
goto out;
 }
 EXPORT_SYMBOL(__dax_fault);
@@ -556,10 +555,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned 
long address,
block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
 
bh.b_size = PMD_SIZE;
-   i_mmap_lock_write(mapping);
length = get_block(inode, block, ,

[PATCH v2 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"

2015-10-02 Thread Ross Zwisler

This reverts commits 46c043ede4711e8d598b9d63c5616c1fedb0605e
and 8346c416d17bf5b4ea1508662959bb62e73fd6a5.

The following two locking commits in the DAX code:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

introduced a number of deadlocks and other issues, and need to be
reverted for the v4.3 kernel. The list of issues in DAX after these
commits (some newly introduced by the commits, some preexisting) can be
found here:

https://lkml.org/lkml/2015/9/25/602

This revert keeps the PMEM API changes to the zeroing code in
__dax_pmd_fault(), which were added by this commit:

commit d77e92e270ed ("dax: update PMD fault handler with PMEM API")

Signed-off-by: Ross Zwisler 
---
 fs/dax.c| 50 ++
 mm/memory.c | 11 +--
 2 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bcfb14b..de3f53e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -569,38 +569,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned 
long address,
if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
goto fallback;
 
-   sector = bh.b_blocknr << (blkbits - 9);
-
-   if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-   int i;
-
-   length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
-   bh.b_size);
-   if (length < 0) {
-   result = VM_FAULT_SIGBUS;
-   goto out;
-   }
-   if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
-   goto fallback;
-
-   for (i = 0; i < PTRS_PER_PMD; i++)
-   clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
-   wmb_pmem();
-   count_vm_event(PGMAJFAULT);
-   mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-   result |= VM_FAULT_MAJOR;
-   }
-
-   /*
-* If we allocated new storage, make sure no process has any
-* zero pages covering this hole
-*/
-   if (buffer_new(&bh)) {
-   i_mmap_unlock_write(mapping);
-   unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
-   i_mmap_lock_write(mapping);
-   }
-
/*
 * If a truncate happened while we were allocating blocks, we may
 * leave blocks allocated to the file that are beyond EOF.  We can't
@@ -615,6 +583,13 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned 
long address,
if ((pgoff | PG_PMD_COLOUR) >= size)
goto fallback;
 
+   /*
+* If we allocated new storage, make sure no process has any
+* zero pages covering this hole
+*/
+   if (buffer_new(&bh))
+   unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
+
if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
spinlock_t *ptl;
pmd_t entry;
@@ -635,6 +610,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned 
long address,
result = VM_FAULT_NOPAGE;
spin_unlock(ptl);
} else {
+   sector = bh.b_blocknr << (blkbits - 9);
length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
bh.b_size);
if (length < 0) {
@@ -644,6 +620,16 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned 
long address,
if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
goto fallback;
 
+   if (buffer_unwritten(&bh) || buffer_new(&bh)) {
+   int i;
+   for (i = 0; i < PTRS_PER_PMD; i++)
+   clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
+   wmb_pmem();
+   count_vm_event(PGMAJFAULT);
+   mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+   result |= VM_FAULT_MAJOR;
+   }
+
result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
}
 
diff --git a/mm/memory.c b/mm/memory.c
index 9cb2747..5ec066f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2426,10 +2426,17 @@ void unmap_mapping_range(struct address_space *mapping,
if (details.last_index < details.first_index)
details.last_index = ULONG_MAX;
 
-   i_mmap_lock_write(mapping);
+
+   /*
+* DAX already holds i_mmap_lock to serialise file truncate vs
+* page fault and page fault vs page fault.
+*/
+   if (!IS_DAX(mapping->host))
+   i_mmap_lock_write(mapping);
if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
unmap_mapping_range_tree(&mapping->i_mmap, &details);
-   i_mmap_unlock_write(mapping);
+   if (!IS_DAX(mapping->host))
+   i_mmap_unlock_write(mapping);
 }

[PATCH v2 0/2] Revert locking changes in DAX for v4.3

2015-10-02 Thread Ross Zwisler

This series reverts some recent changes to the locking scheme in DAX introduced
by these two commits:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

Changes from v1:
 -  Squashed patches 1 and 2 from the first series into a single patch to avoid
adding another spot in the git history where we could end up referencing an
uninitialized pointer.

Ross Zwisler (2):
  Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"
  Revert "dax: fix race between simultaneous faults"

 fs/dax.c| 83 +
 mm/memory.c |  2 ++
 2 files changed, 36 insertions(+), 49 deletions(-)

-- 
2.1.0


Re: [Xen-devel] [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings

2015-10-02 Thread Bob Liu


On 10/03/2015 12:22 AM, Roger Pau Monné wrote:
> El 02/10/15 a les 18.12, Wei Liu ha escrit:
>> On Fri, Oct 02, 2015 at 06:04:35PM +0200, Roger Pau Monné wrote:
>>> El 05/09/15 a les 14.39, Bob Liu ha escrit:
>>>> Document multi queues/rings of xen-block.
>>>>
>>>> Signed-off-by: Bob Liu 
>>>
>>> As said by Konrad, you should send this against the Xen public headers
>>> also (or even before). I have a comment below.
>>>

Sure, I'll do that and also rebase this series after getting more comments.

>>>> ---
>>>>  include/xen/interface/io/blkif.h |   32
>>>>  1 file changed, 32 insertions(+)
>>>>
>>>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
>>>> index c33e1c4..b453b70 100644
>>>> --- a/include/xen/interface/io/blkif.h
>>>> +++ b/include/xen/interface/io/blkif.h
>>>> @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
>>>>  typedef uint64_t blkif_sector_t;
>>>>  
>>>>  /*
>>>> + * Multiple hardware queues/rings:
>>>> + * If supported, the backend will write the key "multi-queue-max-queues" to
>>>> + * the directory for that vbd, and set its value to the maximum supported
>>>> + * number of queues.
>>>> + * Frontends that are aware of this feature and wish to use it can write the
>>>> + * key "multi-queue-num-queues", set to the number they wish to use, which
>>>> + * must be greater than zero, and no more than the value reported by the backend
>>>> + * in "multi-queue-max-queues".
>>>> + *
>>>> + * For frontends requesting just one queue, the usual event-channel and
>>>> + * ring-ref keys are written as before, simplifying the backend processing
>>>> + * to avoid distinguishing between a frontend that doesn't understand the
>>>> + * multi-queue feature, and one that does, but requested only one queue.
>>>> + *
>>>> + * Frontends requesting two or more queues must not write the toplevel
>>>> + * event-channel and ring-ref keys, instead writing those keys under sub-keys
>>>> + * having the name "queue-N" where N is the integer ID of the queue/ring for
>>>> + * which those keys belong. Queues are indexed from zero.
>>>> + * For example, a frontend with two queues must write the following set of
>>>> + * queue-related keys:
>>>> + *
>>>> + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
>>>> + * /local/domain/1/device/vbd/0/queue-0 = ""
>>>> + * /local/domain/1/device/vbd/0/queue-0/ring-ref = ""
>>>> + * /local/domain/1/device/vbd/0/queue-0/event-channel = ""
>>>> + * /local/domain/1/device/vbd/0/queue-1 = ""
>>>> + * /local/domain/1/device/vbd/0/queue-1/ring-ref = ""
>>>> + * /local/domain/1/device/vbd/0/queue-1/event-channel = ""
>>>
>>> AFAICT, it's impossible by design to use multiple queues together with
>>> multipage rings, is that right?
>>>
>>
>> As far as I can tell, these two features are not inherently coupled.
>> Whether you want to make (by design) them coupled together or not is
>> another matter. :-)
> 
> I haven't looked at the implementation yet, but some mention of whether
> multipage-rings are allowed with multiqueue would be good. For example
> if both can indeed be used in conjunction I would mention:
> 
> If multi-page rings are also used, the format of the grant references
> will be:
> 
> /local/domain/1/device/vbd/0/queue-0/ring-ref0 = ""
> /local/domain/1/device/vbd/0/queue-0/ring-ref1 = ""
> /local/domain/1/device/vbd/0/queue-0/ring-ref2 = ""
> [...]
> 

True, and this is already supported. I'll update the document in the next version.
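
For reference, a rough sketch of how the key names could be built when both
features are in use. This is only an illustration: demo_ring_ref_key() and its
parameters are made-up names, and the layout is an assumption pieced together
from the quoted comment and Roger's example above, not the final documented
format.

```c
#include <stdio.h>

/*
 * Build the xenstore key, relative to the frontend's device directory, for
 * one ring-ref page. Assumed layout (hypothetical helper, see lead-in):
 *   one queue:      "ring-ref"         or "ring-refN"
 *   several queues: "queue-Q/ring-ref" or "queue-Q/ring-refN"
 * Returns what snprintf() returns: the length the key needs, or a negative
 * value on error.
 */
int demo_ring_ref_key(char *buf, size_t len, unsigned int nr_queues,
		      unsigned int queue, unsigned int nr_pages,
		      unsigned int page)
{
	char prefix[32] = "";

	/* Only multi-queue frontends move the keys under "queue-Q/". */
	if (nr_queues > 1)
		snprintf(prefix, sizeof(prefix), "queue-%u/", queue);

	/* Only multi-page rings append the page index to the key name. */
	if (nr_pages > 1)
		return snprintf(buf, len, "%sring-ref%u", prefix, page);

	return snprintf(buf, len, "%sring-ref", prefix);
}
```

Called with two queues and a four-page ring, queue 0 and page 2 yield
"queue-0/ring-ref2", which matches the form suggested above.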

-- 
Regards,
-Bob

Re: [RFC] asm-generic/pci_iomap.h: make custom PCI BAR requirements explicit

2015-10-02 Thread Luis R. Rodriguez

On Tue, Sep 08, 2015 at 03:42:40PM +0200, Arnd Bergmann wrote:
> On Thursday 03 September 2015 03:44:15 Luis R. Rodriguez wrote:
> > On Sun, Aug 30, 2015 at 09:30:26PM +0200, Arnd Bergmann wrote:
> > > On Friday 28 August 2015 17:17:27 Luis R. Rodriguez wrote:
> > > > While at it, as with the ioremap*() variants, since we have no clear
> > > > semantics yet well defined provide a solution for them that returns
> > > > NULL. This allows architectures to move forward by defining 
> > > > pci_ioremap*()
> > > > variants without requiring immediate changes to all architectures. Each
> > > > architecture then can implement their own solution as needed and
> > > > when they get to it.
> > > 
> > > Which architectures are you thinking about here?
> > 
> > Really only S390 would benefit from this now.
> 
> Ok
> 
> > > > Build tested with allyesconfig on:
> > > > 
> > > > * S390
> > > > * x86_64
> > > > 
> > > > Signed-off-by: Luis R. Rodriguez 
> > > 
> > > It's not really clear to me what the purpose of the patch is, is this 
> > > meant as a cleanup, or are you trying to avoid some real-life bugs
> > > you ran into?
> > 
> > Upon adding a new helper into CONFIG_PCI_IOMAP it was only through
> > 0-day build testing that I found that I needed to add something for S390.
> > This means we fix S390 reactively. With the asm-generic stuff in place
> > to return NULL we don't need to do anything but a respective return
> > NULL static inline, the moment that is done the author would know some
> > architectures may not get support for the functionality they are adding.
> > Without this we only find out reactively.
> 
> Hmm, my gut feeling tells me that your approach won't solve the problem
> in general. s390 PCI is just weird in many ways and it will occasionally
> suffer from problems like this (as do other aspects of the s390 architecture
> that are unlike the rest of the world).
> 
> Maybe Martin and Heiko can comment on this, they may have a preference
> from the s390 point of view.

Hrm, so S390 is quirky in really odd ways that no other architecture is, or
is at least for now not expected to be?

> > > The version from lib/iomap.c seems correct for uses of 
> > > CONFIG_GENERIC_IOMAP,
> > > but most architectures can do better without that option.
> > 
> > By do better do you mean a more optimized solution ?
> 
> Yes: most architectures access the PCI I/O space through memory mapped I/O,
> so we can return a regular __iomem pointer from ioport_map, rather than
> a garbled pointer that the x86 version has to use.
> 
> This means we also get to define iowrite32() to be identical to writel()
> and can save the conditional for each caller.
>
> The lib/iomap.c version is really only needed for architectures that use
> pointer access for PCI memory space, and special instructions for PCI
> I/O space, like x86. s390 has special instructions for both, some
> architectures do not have any I/O port access at all, and most of them
> treat memory and I/O space the same way.

I see, thanks.
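
A minimal userspace sketch of the distinction Arnd describes, under stated
assumptions: the demo_* names and the two constants are hypothetical and are
not claimed to match lib/iomap.c, and port space is faked with an array. It
shows the x86-style case, where ioport_map() hands back a small integer
disguised as a pointer and every iowrite32() has to test which kind of address
it was given.

```c
#include <stdint.h>

/* Illustrative constants only; not claimed to match the kernel's values. */
#define DEMO_PIO_OFFSET   0x10000UL
#define DEMO_PIO_RESERVED 0x40000UL

/* Fake 16-bit I/O port space, one 32-bit register per port number. */
static uint32_t demo_port_regs[0x10000];

/* Stand-in for outl(): an access that needs a special instruction. */
static void demo_outl(uint32_t val, unsigned long port)
{
	demo_port_regs[port & 0xffff] = val;
}

/* Stand-in for writel(): a plain store through a pointer. */
static void demo_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* ioport_map()-style cookie: not a dereferenceable pointer. */
void *demo_ioport_map(unsigned long port)
{
	return (void *)(uintptr_t)(port + DEMO_PIO_OFFSET);
}

/* x86-style iowrite32(): every call pays for the "is this a port?" test. */
void demo_iowrite32(uint32_t val, void *addr)
{
	uintptr_t cookie = (uintptr_t)addr;

	if (cookie < DEMO_PIO_RESERVED)
		demo_outl(val, cookie & 0xffff);	/* port cookie */
	else
		demo_writel(val, addr);			/* real MMIO pointer */
}
```

On an architecture where port space is itself memory mapped, the cookie would
be an ordinary pointer, the branch disappears, and demo_iowrite32() could
simply be demo_writel().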

  Luis

Re: [PATCH v4 0/5] PCI: Add support for PCI Enhanced Allocation "BARs"

2015-10-02 Thread Sean O. Stalley

Hi David,

I did a quick look through & overall I like what you have done.
I will try to find some time to do a full review early next week.

Thanks Again,
Sean

On Fri, Oct 02, 2015 at 03:37:51PM -0700, David Daney wrote:
> From: David Daney 
> 
> The original patches are from Sean O. Stalley. I made a few tweaks,
> but feel that it is substantially Sean's work, so I am keeping the
> patch set version numbering scheme going.
> 
> Tested on Cavium ThunderX system with 4 Root Complexes containing 50
> devices/bridges provisioned with EA.
> 
> Here is Sean's description of the patches:
> 
> PCI Enhanced Allocation is a new method of allocating MMIO & IO
> resources for PCI devices & bridges. It can be used instead
> of the traditional PCI method of using BARs.
> 
> EA entries are hardware-initialized to a fixed address.
> Unlike BARs, regions described by EA cannot be moved.
> Because of this, only devices which are permanently connected to
> the PCI bus can use EA. A removable PCI card must not use EA.
> 
> This patchset adds support for using EA entries instead of BARs
> on Root Complex Integrated Endpoints.
> 
> The Enhanced Allocation ECN is publicly available here:
> https://www.pcisig.com/specifications/conventional/ECN_Enhanced_Allocation_23_Oct_2014_Final.pdf
> 
> 
> Changes from V1:
>   - Use generic PCI resource claim functions (instead of EA-specific 
> functions)
>   - Only add support for RCiEPs (instead of all devices).
>   - Removed some debugging messages leftover from early testing.
> 
> Changes from V2 (By David Daney):
>   - Add ea_cap to struct pci_device, to aid in finding the EA capability.
>   - Factored EA entity decoding into a separate function.
>   - Add functions to find EA entities by BEI or Property.
>   - Add handling of EA provisioned bridges.
>   - Add handling of EA SRIOV BARs.
>   - Try to assign proper resource parent so that SRIOV device creation 
> can occur.
> 
> Changes from V3 (By David Daney):
>   - Discarded V3 changes and started over fresh based on Sean's V2.
>   - Add more support/checking for Entry Properties.
>   - Allow EA behind bridges.
>   - Rewrite some error messages.
>   - Add patch 3/5 to prevent resizing, and better handle
>   assigning, of fixed EA resources.
>   - Add patch 4/5 to handle EA provisioned SRIOV devices.
>   - Add patch 5/5 to handle EA provisioned bridges.
> 
> David Daney (3):
>   PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.
>   PCI: Handle Enhanced Allocation (EA) capability for SRIOV devices.
>   PCI: Handle Enhanced Allocation (EA) capability for bridges
> 
> Sean O. Stalley (2):
>   PCI: Add Enhanced Allocation register entries
>   PCI: Add support for Enhanced Allocation devices
> 
>  drivers/pci/bus.c |   7 ++
>  drivers/pci/iov.c |  11 ++-
>  drivers/pci/pci.c | 202 
> ++
>  drivers/pci/pci.h |   1 +
>  drivers/pci/probe.c   |  34 ++-
>  drivers/pci/setup-bus.c   |  63 -
>  include/linux/pci.h   |   1 +
>  include/uapi/linux/pci_regs.h |  44 -
>  8 files changed, 355 insertions(+), 8 deletions(-)
> 
> -- 
> 1.9.1
> 

Re: [PATCH v4 3/5] PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.

2015-10-02 Thread David Daney


On 10/02/2015 04:14 PM, Yinghai Lu wrote:
> On Fri, Oct 2, 2015 at 3:37 PM, David Daney  wrote:
>> From: David Daney 
>>
>> The new Enhanced Allocation (EA) capability support creates resources
>> with the IORESOURCE_PCI_FIXED set.  This creates a couple of problems:
>>
>> 1) Since these resources cannot be relocated or resized, their
>> alignment is not really defined, and it is therefore not specified.
>> This causes a problem in pbus_size_mem() where resources with
>> unspecified alignment are disabled.
>>
>> 2) During resource assignment in pci_bus_assign_resources(),
>> IORESOURCE_PCI_FIXED resources are not given a parent.  This, in
>> turn, causes pci_enable_resources() to fail with a "not claimed"
>> error.
>>
>> So, in pbus_size_mem() skip IORESOURCE_PCI_FIXED resources, instead of
>> disabling them.
>>
>> In __pci_bus_assign_resources(), for IORESOURCE_PCI_FIXED resources,
>> try to request the resource from a parent bus.
>
> Can you check if
>
> https://patchwork.kernel.org/patch/7304971/
> [v6,05/53] PCI: Don't release fixed resource for realloc

This one isn't relevant as the problem is seen when we are acquiring 
resources, not releasing them.




> https://patchwork.kernel.org/patch/7304371/
> [v6,06/53] PCI: Claim fixed resource during remove/rescan path


This one is interesting, but I don't think it will work.

pci_claim_resource() calls pci_find_parent_resource(), which will fail 
in important use cases.


It is perfectly legal for a bridge provisioned by EA to not specify any 
resources.  In this case we must walk up the bus tree until we find 
something that contains the device resource, and can thus be a parent.


That is a big part of what my patch is doing.

As for the merits of assigning fixed resources from the FINAL fixup, 
rather than in __pci_bus_assign_resources(), I am unsure.
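
To illustrate the walk up the bus tree described above, here is a small
self-contained sketch. It is not the patch itself: struct demo_res,
struct demo_bus and demo_find_parent() are hypothetical stand-ins for the real
resource and bus structures, and each bus is simplified to a single window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified resource: an inclusive address range. */
struct demo_res {
	uint64_t start, end;
	bool valid;		/* false: the bridge specifies no window */
};

/* Hypothetical, simplified bus with at most one window. */
struct demo_bus {
	struct demo_bus *parent;	/* NULL at the root bus */
	struct demo_res window;
};

static bool demo_res_contains(const struct demo_res *outer,
			      const struct demo_res *inner)
{
	return outer->valid && inner->valid &&
	       outer->start <= inner->start && inner->end <= outer->end;
}

/*
 * Walk from the device's bus towards the root and return the first bus
 * whose window contains the fixed device resource, or NULL if none does.
 * Bridges without a window are skipped rather than treated as an error.
 */
struct demo_bus *demo_find_parent(struct demo_bus *bus,
				  const struct demo_res *dev)
{
	for (; bus; bus = bus->parent)
		if (demo_res_contains(&bus->window, dev))
			return bus;
	return NULL;
}
```

A lookup that only inspects the immediate upstream bridge would fail in the
case described above, where that bridge specifies no resources and the
containing window lives further up the tree.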


Thanks,
David Daney




> address the problem that you met?
>
>  Yinghai




Re: [PATCH 1/3] Revert "dax: fix NULL pointer in __dax_pmd_fault()"

2015-10-02 Thread Ross Zwisler

On Fri, Oct 02, 2015 at 02:11:03PM -0700, Dan Williams wrote:
> On Fri, Oct 2, 2015 at 2:02 PM, Ross Zwisler
>  wrote:
> > This reverts commit 8346c416d17bf5b4ea1508662959bb62e73fd6a5.
> >
> > This commit did fix the issue it intended to fix, but it turns out that
> > the locking changes introduced by these two commits:
> >
> > commit 843172978bb9 ("dax: fix race between simultaneous faults")
> > commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for 
> > DAX")
> >
> > had other issues as well, so they need to just be reverted.
> 
> Wait, why introduce two points in the kernel history where we have a
> known uninitialized variable?  I'd say fix up the revert of "mm: take
> i_mmap_lock in unmap_mapping_range() for DAX" to address the conflict
> with the fix, one less patch and keeps the stability rolling forward.

Essentially because I wasn't sure about the rules regarding reverts, if there
are any.  I assumed (perhaps incorrectly) that you'd want a 1:1 relationship
between original commits and reverts.  If it's better to not have intermediate
breakage, sure, let's squash them.

Re: [RFC] asm-generic/pci_iomap.h: make custom PCI BAR requirements explicit

2015-10-02 Thread Luis R. Rodriguez

On Fri, Aug 28, 2015 at 07:13:08PM -0700, Randy Dunlap wrote:
> On 08/28/15 17:17, Luis R. Rodriguez wrote:
> > 
> >  arch/s390/Kconfig |  8 +
> >  arch/s390/include/asm/io.h| 11 ---
> >  arch/s390/include/asm/pci.h   |  2 --
> >  arch/s390/include/asm/pci_iomap.h | 33 +
> >  arch/s390/pci/pci.c   |  2 ++
> >  include/asm-generic/io.h  | 12 
> >  include/asm-generic/iomap.h   | 10 ---
> >  include/asm-generic/pci_iomap.h   | 62 
> > +++
> >  lib/Kconfig   |  1 +
> >  lib/pci_iomap.c   |  5 
> >  10 files changed, 105 insertions(+), 41 deletions(-)
> >  create mode 100644 arch/s390/include/asm/pci_iomap.h
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index 1d57000b1b24..1217b7db4265 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -614,6 +614,14 @@ endif  # PCI
> >  config PCI_DOMAINS
> > def_bool PCI
> >  
> > +config ARCH_PCI_NON_DISJUNCTIVE
> > +   def_bool PCI
> > +   help
> > + On the S390 architecture PCI BAR spaces are not disjunctive, as such
> 
>   are not disjoint?  may be overlapping?
> 
> > + the PCI bar is required on a series of otherwise asm generic PCI
> > + routines, as such S390 requires itw own implemention for these
> 
> its own implementation

Thanks, I've re-written this as:

mcgrof@ergon ~/linux-next (git::(no branch, rebasing 20150805-pend-all))$ git 
diff
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index f4725d1af438..8ba5826ed13b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -618,10 +618,10 @@ config PCI_DOMAINS
 config ARCH_PCI_NON_DISJUNCTIVE
def_bool PCI
help
- On the S390 architecture PCI BAR spaces are not disjunctive, as such
- the PCI bar is required on a series of otherwise asm generic PCI
- routines, as such S390 requires itw own implemention for these
- routines.
+ On the S390 architecture PCI BAR spaces may overlap with each other,
+ because of this the PCI bar is required on a series of otherwise asm
+ generic PCI routines and this in turn requires S390 to provide its
+ own implementation for these routines.
 
 config HAS_IOMEM
def_bool PCI
> 


Re: [PATCH v6 01/53] sparc/PCI: Add mem64 resource parsing for root bus

2015-10-02 Thread Yinghai Lu

On Fri, Oct 2, 2015 at 4:05 PM, Khalid Aziz  wrote:
> On 10/02/2015 04:05 PM, Yinghai Lu wrote:

> I still see lots of "no compatible
> bridge window" messages but overlapping address ranges are no longer
> reserved

Please send me the boot log; I'd like to fix them.

Thanks

Yinghai

Re: [PATCH] ver_linux: module-init-tools.patch

2015-10-02 Thread Jim Davis

On Fri, Oct 2, 2015 at 1:57 PM, Alexander Kapshuk
 wrote:
> On Fri, Oct 2, 2015 at 11:22 PM, Alexander Kapshuk
>  wrote:
>> On Fri, Oct 2, 2015 at 10:45 PM, Jim Davis  wrote:
>>> On Fri, Oct 2, 2015 at 12:35 PM, Jim Davis  wrote:
>>>> On Fri, Oct 2, 2015 at 12:03 PM, Alexander Kapshuk

>>> +depmod=`whereis depmod | awk '{print $2}'`
>>
>>>
>>>> I suspect it'll be hard to come up with something that's 100%
>>>> foolproof and respects user's choices.  Sticking with searching the
>>>> user's $PATH at least won't lead to surprises about which program is
>>>> being run...
>>>
>>> Though looking back at your patch, what might work is to look first
>>> for depmod in the user's $PATH and then try whereis only if that
>>> fails.  I'm not convinced that's much better than just searching
>>> $PATH, but that at least would go with the user's preference first.
>>>
>>> --
>>> Jim
>>
>> Seems like the way to go. Thanks.
>>
>> I'll resubmit this and the other patches tomorrow with this
>> consideration in mind.
>
> What do you think of this?
>
> which depmod >/dev/null 2>&1 && depmod=depmod ||
> depmod=`whereis depmod | awk '{print $2}'`
>
> test -n "$depmod" -a -x "$depmod" &&
> $depmod -V 2>&1 |
> sed '
> /[0-9]$/!d
> s/[^0-9\.]//g
> s/^/module-init-tools\t/
> '

Looks good, thanks.
-- 
Jim

Re: [PATCH v4 3/5] PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.

2015-10-02 Thread Yinghai Lu

On Fri, Oct 2, 2015 at 3:37 PM, David Daney  wrote:
> From: David Daney 
>
> The new Enhanced Allocation (EA) capability support creates resources
> with the IORESOURCE_PCI_FIXED set.  This creates a couple of problems:
>
> 1) Since these resources cannot be relocated or resized, their
>alignment is not really defined, and it is therefore not specified.
>This causes a problem in pbus_size_mem() where resources with
>unspecified alignment are disabled.
>
> 2) During resource assignment in pci_bus_assign_resources(),
>IORESOURCE_PCI_FIXED resources are not given a parent.  This, in
>turn, causes pci_enable_resources() to fail with a "not claimed"
>error.
>
> So, in pbus_size_mem() skip IORESOURCE_PCI_FIXED resources, instead of
> disabling them.
>
> In __pci_bus_assign_resources(), for IORESOURCE_PCI_FIXED resources,
> try to request the resource from a parent bus.

Can you check if

https://patchwork.kernel.org/patch/7304971/
[v6,05/53] PCI: Don't release fixed resource for realloc

https://patchwork.kernel.org/patch/7304371/
[v6,06/53] PCI: Claim fixed resource during remove/rescan path

address the problem that you met?

Yinghai

Re: [PATCH v6 01/53] sparc/PCI: Add mem64 resource parsing for root bus

2015-10-02 Thread Khalid Aziz


On 10/02/2015 04:05 PM, Yinghai Lu wrote:

On Fri, Oct 2, 2015 at 1:00 PM, Khalid Aziz  wrote:

On Wed, 2015-09-30 at 22:52 -0700, Yinghai Lu wrote:

Found "no compatible bridge window" warning in boot log from T5-8.

pci :00:01.0: can't claim BAR 15 [mem 0x1-0x4afff pref]: no 
compatible bridge window

That resource is above 4G, but does not get offset correctly as the
root bus only reports io and mem32.

pci_sun4v f02dbcfc: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x8040-0x80400fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x8000-0x80007eff] (bus 
address [0x-0x7eff])
pci_bus :00: root bus resource [bus 00-77]

Add mem64 handling in pci_common for sparc, so we can have 64bit resource
registered for root bus at first.

After patch, will have:
pci_sun4v f02dbcfc: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x8040-0x80400fff] (bus 
address [0x-0xfff])
pci_bus :00: root bus resource [mem 0x8000-0x80007eff] (bus 
address [0x-0x7eff])
pci_bus :00: root bus resource [mem 0x8001-0x8007] (bus 
address [0x1-0x7])
pci_bus :00: root bus resource [bus 00-77]

-v2: mem64_space should use mem_space.start as offset.
-v3: add IORESOURCE_MEM_64 flag

...

PCI: Scanning PBM /pci@301
pci_sun4v f0339c2c: PCI host bridge to bus 0009:00
pci_bus 0009:00: root bus resource [io  0x2027e4000-0x2027e4fff] (bus 
address [0x-0xfff])
pci_bus 0009:00: root bus resource [mem 0x202400010-0x202407eff] (bus 
address [0x-0x7eef])
pci_bus 0009:00: root bus resource [mem 0x20241-0x2024d] (bus 
address [0xfff0-0xdffef])


Looks like offset for mmio64 is not right.

Please check the attached patch on this platform and T5-8.


Good catch! That code change looks like the right thing to do and it 
fixed my issues with ixgbe not attaching. I still see lots of "no 
compatible bridge window" messages but overlapping address ranges are no 
longer reserved and as a result drivers are able to ioremap BARs 
successfully.


I have tested it on a T7 and one other platform where I was seeing 
problems without mem64 and both platforms work with this patch. I have 
not been able to get my hands on a T5-8 yet but I will try again on Monday.


Thanks,
Khalid


Regression: at24 eeprom writing

2015-10-02 Thread Peter Rosin

Hi!

I recently upgraded from the atmel linux-3.18-at91 kernel to vanilla 4.2
and everything seemed fine. Until I tried to write to the little eeprom
chip. I then tried the linux-4.1-at91 kernel and that suffers too.

The symptoms are that it seems like writes get interrupted, and restarted
again without properly initializing everything again. Inspecting the i2c
bus during these failures gets me something like this (in hex) when I

echo abcdefghijklmnopqr > /sys/bus/i2c/devices/0-0050/eeprom

S a0 00 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 P
S a0 10 (clk and data low for a "long" time) 10 71 72 0a P

Notice how the address byte in the second chunk (10) is repeated after
the strange event on the i2c bus.
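
For reference, the two chunks in the trace are what one would expect from
splitting the 19-byte write (18 letters plus the newline added by echo) at
the 16-byte page boundary given by pagesize in the devicetree. A minimal
sketch of that arithmetic follows; the helper names are hypothetical and
are not taken from the at24 driver:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes that can be written starting at
 * `offset` without crossing an EEPROM page boundary.
 */
static size_t page_chunk_len(size_t offset, size_t remaining, size_t page_size)
{
	size_t room;

	assert(page_size > 0);
	room = page_size - (offset % page_size);
	return remaining < room ? remaining : room;
}

/* Hypothetical helper: how many separate page writes a request needs. */
static size_t count_page_chunks(size_t offset, size_t len, size_t page_size)
{
	size_t chunks = 0;

	while (len > 0) {
		size_t n = page_chunk_len(offset, len, page_size);

		offset += n;
		len -= n;
		chunks++;
	}
	return chunks;
}
```

With offset 0, length 19 and a page size of 16 this gives a 16-byte chunk
at address 0x00 followed by a 3-byte chunk at address 0x10, which matches
the "S a0 00 ..." and "S a0 10 ..." transfers above.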

I looked around and found that if I revert 
a839ce663b3183209fdf7b1fc4796bfe2a4679c3
"eeprom: at24: extend driver to allow writing via i2c_smbus_write_byte_data"
eeprom writing starts working again.

AFAICT, the i2c-at91 bus driver makes the eeprom driver use the
i2c_transfer code path both with that patch and with it reverted,
so I sadly don't see why the patch makes a difference.

I'm on a board that is based on the sama5d31 evaluation kit, with a
NXP SE97BTP,547 chip and this in the devicetree:

i2c0: i2c@f0014000 {
status = "okay";

jc42@18 {
compatible = "jc42";
reg = <0x18>;
};

eeprom@50 {
compatible = "24c02";
reg = <0x50>;
pagesize = <16>;
};
};

Any ideas?

Cheers,
Peter

[PATCH 0/10] clk: iproc: add support for BCM NS, NSP, and NS2

2015-10-02 Thread Jon Mason

This patch series adds support for the Broadcom Northstar, Northstar
Plus, and Northstar 2 clocks.  Some slight modifications were necessary
to clk-iproc-pll to get Northstar and Northstar Plus working, due to
differences in register layout.  This is the reason why the first patch
is necessary.  Some more modifications were necessary to clk-iproc-pll
to get Northstar 2 working, due to differences in register layout (and
resulting fallout in Cygnus and NSP).  This is the reason why the sixth
and seventh patches are necessary.  The fifth patch is clean-up to
prevent accidentally forgetting to adjust for the base write errata
(which happened a few times, but was caught in internal review).

There is a potential merge "race" between the device tree changes and
the clk changes.  If the device tree changes go in before the clk
changes, there is a window where there are non-working clk entries in
the device tree.  So, it makes the most sense for this series to be
pulled into the clk maintainer's tree solely.  

Also, the Northstar Plus device tree modifications were left out of this
series due to potential complications with the merging of this series.
Northstar Plus was recently accepted, and only exists in Florian's tree.
This would cause merge issues in the clk tree.  So, the NSP device tree
changes will be submitted at a later date.

Thanks,
Jon


 .../bindings/clock/brcm,iproc-clocks.txt   |  78 ++
 arch/arm/boot/dts/bcm5301x.dtsi|  67 -
 arch/arm64/Kconfig.platforms   |   1 +
 arch/arm64/boot/dts/broadcom/ns2.dtsi  |  81 ++
 drivers/clk/Makefile   |   3 +-
 drivers/clk/bcm/Makefile   |   3 +
 drivers/clk/bcm/clk-cygnus.c   |  17 +-
 drivers/clk/bcm/clk-iproc-pll.c| 183 +++--
 drivers/clk/bcm/clk-iproc.h|  22 +-
 drivers/clk/bcm/clk-ns2.c  | 290 +
 drivers/clk/bcm/clk-nsp.c  | 143 ++
 include/dt-bindings/clock/bcm-ns2.h|  72 +
 include/dt-bindings/clock/bcm-nsp.h|  51 
 13 files changed, 918 insertions(+), 93 deletions(-)

[PATCH 01/10] clk: iproc: Add PWRCTRL support

2015-10-02 Thread Jon Mason

Some iProc SoC clocks use a different way to control clock power, via
the PWRDWN bit in the PLL control register.  Since the PLL control
register is used to access the PWRDWN bit, there is no need for the
pwr_base when this is being used.  A new flag, IPROC_CLK_EMBED_PWRCTRL,
has been added to identify this usage.  We can use the AON interface to
write the values to enable/disable PWRDOWN.

Signed-off-by: Jon Mason 
---
 drivers/clk/bcm/clk-iproc-pll.c | 55 -
 drivers/clk/bcm/clk-iproc.h |  6 +
 2 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
index 2dda4e8..e029ab3 100644
--- a/drivers/clk/bcm/clk-iproc-pll.c
+++ b/drivers/clk/bcm/clk-iproc-pll.c
@@ -148,14 +148,25 @@ static void __pll_disable(struct iproc_pll *pll)
writel(val, pll->asiu_base + ctrl->asiu.offset);
}
 
-   /* latch input value so core power can be shut down */
-   val = readl(pll->pwr_base + ctrl->aon.offset);
-   val |= (1 << ctrl->aon.iso_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
-
-   /* power down the core */
-   val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
+   if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
+   val = readl(pll->pll_base + ctrl->aon.offset);
+   val |= (bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
+   writel(val, pll->pll_base + ctrl->aon.offset);
+
+   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
+   readl(pll->pll_base + ctrl->aon.offset);
+   }
+
+   if (pll->pwr_base) {
+   /* latch input value so core power can be shut down */
+   val = readl(pll->pwr_base + ctrl->aon.offset);
+   val |= (1 << ctrl->aon.iso_shift);
+   writel(val, pll->pwr_base + ctrl->aon.offset);
+
+   /* power down the core */
+   val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
+   writel(val, pll->pwr_base + ctrl->aon.offset);
+   }
 }
 
 static int __pll_enable(struct iproc_pll *pll)
@@ -163,11 +174,22 @@ static int __pll_enable(struct iproc_pll *pll)
const struct iproc_pll_ctrl *ctrl = pll->ctrl;
u32 val;
 
-   /* power up the PLL and make sure it's not latched */
-   val = readl(pll->pwr_base + ctrl->aon.offset);
-   val |= bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift;
-   val &= ~(1 << ctrl->aon.iso_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
+   if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
+   val = readl(pll->pll_base + ctrl->aon.offset);
+   val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
+   writel(val, pll->pll_base + ctrl->aon.offset);
+
+   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
+   readl(pll->pll_base + ctrl->aon.offset);
+   }
+
+   if (pll->pwr_base) {
+   /* power up the PLL and make sure it's not latched */
+   val = readl(pll->pwr_base + ctrl->aon.offset);
+   val |= bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift;
+   val &= ~(1 << ctrl->aon.iso_shift);
+   writel(val, pll->pwr_base + ctrl->aon.offset);
+   }
 
/* certain PLLs also need to be ungated from the ASIU top level */
if (ctrl->flags & IPROC_CLK_PLL_ASIU) {
@@ -610,9 +632,8 @@ void __init iproc_pll_clk_setup(struct device_node *node,
if (WARN_ON(!pll->pll_base))
goto err_pll_iomap;
 
+   /* Some SoCs do not require the pwr_base, thus failing is not fatal */
pll->pwr_base = of_iomap(node, 1);
-   if (WARN_ON(!pll->pwr_base))
-   goto err_pwr_iomap;
 
/* some PLLs require gating control at the top ASIU level */
if (pll_ctrl->flags & IPROC_CLK_PLL_ASIU) {
@@ -695,9 +716,9 @@ err_pll_register:
iounmap(pll->asiu_base);
 
 err_asiu_iomap:
-   iounmap(pll->pwr_base);
+   if (pll->pwr_base)
+   iounmap(pll->pwr_base);
 
-err_pwr_iomap:
iounmap(pll->pll_base);
 
 err_pll_iomap:
diff --git a/drivers/clk/bcm/clk-iproc.h b/drivers/clk/bcm/clk-iproc.h
index d834b7a..ff7bfad 100644
--- a/drivers/clk/bcm/clk-iproc.h
+++ b/drivers/clk/bcm/clk-iproc.h
@@ -49,6 +49,12 @@
 #define IPROC_CLK_PLL_NEEDS_SW_CFG BIT(4)
 
 /*
+ * Some PLLs use a different way to control clock power, via the PWRDWN bit in
+ * the PLL control register
+ */
+#define IPROC_CLK_EMBED_PWRCTRL BIT(5)
+
+/*
  * Parameters for VCO frequency configuration
  *
  * VCO frequency =
-- 
1.9.1
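
The PWRDWN handling in the patch above comes down to setting or clearing a
bit field in the PLL control register. Below is a minimal user-space sketch
that operates on a plain 32-bit value instead of an MMIO register;
field_mask(), pwrdwn_set() and pwrdwn_clear() are illustrative names and
are not functions in the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (illustrative helper). */
static uint32_t field_mask(unsigned int width)
{
	assert(width > 0 && width < 32);
	return (UINT32_C(1) << width) - 1;
}

/* Power down: set every bit of the PWRDWN field. */
static uint32_t pwrdwn_set(uint32_t reg, unsigned int width, unsigned int shift)
{
	assert(width + shift <= 32);
	return reg | (field_mask(width) << shift);
}

/* Power up: clear every bit of the PWRDWN field. */
static uint32_t pwrdwn_clear(uint32_t reg, unsigned int width, unsigned int shift)
{
	assert(width + shift <= 32);
	return reg & ~(field_mask(width) << shift);
}
```

With a pwr_width of 1 and a pwr_shift of 12, as used for the NSP genpll in
patch 02 of this series, powering down sets bit 12 (0x1000) and powering up
clears it again while leaving all other bits untouched.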


Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook

On Fri, Oct 2, 2015 at 3:57 PM, Daniel Borkmann  wrote:
> On 10/03/2015 12:44 AM, Tycho Andersen wrote:
>>
>> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
>
> ...
>>
>> Ok, how about,
>>
>> struct sock_filter insns[BPF_MAXINSNS];
>> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
>
>
> Would also be good that when the storage buffer (insns) is NULL,
> it just returns you the number of sock_filter insns (or 0 when
> nothing attached).
>
> That would be consistent with classic socket filters (see
> sk_get_filter()), and user space could allocate a specific
> size instead of always passing in max insns.

Yes please. :)

>> when asking for the ith filter? It returns either the number of
>> instructions, -EINVAL if something was wrong (i, pid,
>> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
>> succeed now, if/when the underlying filter was not created from a bpf
>> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
>> picked this mostly based on what sounds nice.)

We can bikeshed the non-classic case when we need it, but I think
EINVAL is "not under seccomp", and ENOENT is "no such index".
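
To make the calling convention discussed here concrete, the following is a
self-contained mock and not a ptrace() call: get_filter(), struct insn and
struct filter are hypothetical stand-ins, and the only things taken from
the thread are the ideas that a NULL buffer returns just the instruction
count and that a bad index yields -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct sock_filter, defined here to stay self-contained. */
struct insn {
	unsigned short code;
	unsigned char jt;
	unsigned char jf;
	unsigned int k;
};

/* Hypothetical container for one attached filter. */
struct filter {
	const struct insn *insns;
	size_t len;
};

/*
 * Mock of the proposed semantics: return the instruction count of the
 * filter at `index`, copy the instructions only when `buf` is non-NULL,
 * and report -ENOENT when there is no filter at that index.
 */
static long get_filter(const struct filter *filters, size_t nfilters,
		       size_t index, struct insn *buf)
{
	if (index >= nfilters)
		return -ENOENT;
	if (buf && filters[index].len)
		memcpy(buf, filters[index].insns,
		       filters[index].len * sizeof(*buf));
	return (long)filters[index].len;
}
```

A caller would first pass NULL to learn the count, allocate exactly that
many entries, and then call again with the buffer, rather than always
reserving room for the maximum number of instructions.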

-Kees

-- 
Kees Cook
Chrome OS Security

[PATCH 03/10] clk: iproc: define Broadcom NSP iProc clock binding

2015-10-02 Thread Jon Mason

Document the device tree bindings for Broadcom Northstar Plus
architecture based clock controller

Signed-off-by: Jon Mason 
---
 .../bindings/clock/brcm,iproc-clocks.txt   | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
index da8d9bb..b3c3e9d 100644
--- a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
@@ -130,3 +130,33 @@ These clock IDs are defined in:
 ch3_unused mipipll  4   BCM_CYGNUS_MIPIPLL_CH3_UNUSED
 ch4_unused mipipll  5   BCM_CYGNUS_MIPIPLL_CH4_UNUSED
 ch5_unused mipipll  6   BCM_CYGNUS_MIPIPLL_CH5_UNUSED
+
+Northstar and Northstar Plus
+--
+PLL and leaf clock compatible strings for Northstar and Northstar Plus are:
+ "brcm,nsp-armpll"
+ "brcm,nsp-genpll"
+ "brcm,nsp-lcpll0"
+
+The following table defines the set of PLL/clock index and ID for Northstar and
+Northstar Plus.  These clock IDs are defined in:
+"include/dt-bindings/clock/bcm-nsp.h"
+
+Clock  Source  Index   ID
+----   -   -
+crystalN/A N/A N/A
+
+armpll crystal N/A N/A
+
+genpll crystal 0   BCM_NSP_GENPLL
+phygenpll  1   BCM_NSP_GENPLL_PHY_CLK
+ethernetclkgenpll  2   BCM_NSP_GENPLL_ENET_SW_CLK
+usbclk genpll  3   BCM_NSP_GENPLL_USB_PHY_REF_CLK
+iprocfast  genpll  4   BCM_NSP_GENPLL_IPROCFAST_CLK
+sata1  genpll  5   BCM_NSP_GENPLL_SATA1_CLK
+sata2  genpll  6   BCM_NSP_GENPLL_SATA2_CLK
+
+lcpll0 crystal 0   BCM_NSP_LCPLL0
+pcie_phy   lcpll0  1   BCM_NSP_LCPLL0_PCIE_PHY_REF_CLK
+sdio   lcpll0  2   BCM_NSP_LCPLL0_SDIO_CLK
+ddr_phylcpll0  3   BCM_NSP_LCPLL0_DDR_PHY_CLK
-- 
1.9.1


[PATCH 04/10] ARM: dts: enable clock support for BCM5301X

2015-10-02 Thread Jon Mason

Replace current device tree dummy clocks with real clock support for
Broadcom Northstar SoCs.

Signed-off-by: Jon Mason 
---
 arch/arm/boot/dts/bcm5301x.dtsi | 67 -
 1 file changed, 60 insertions(+), 7 deletions(-)

diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi
index 6f50f67..f717859 100644
--- a/arch/arm/boot/dts/bcm5301x.dtsi
+++ b/arch/arm/boot/dts/bcm5301x.dtsi
@@ -8,6 +8,7 @@
  * Licensed under the GNU/GPL. See COPYING for details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -55,14 +56,14 @@
compatible = "arm,cortex-a9-global-timer";
reg = <0x0200 0x100>;
interrupts = ;
-   clocks = <_periph>;
+   clocks = <_clk>;
};
 
local-timer@0600 {
compatible = "arm,cortex-a9-twd-timer";
reg = <0x0600 0x100>;
interrupts = ;
-   clocks = <_periph>;
+   clocks = <_clk>;
};
 
gic: interrupt-controller@1000 {
@@ -94,14 +95,66 @@
 
clocks {
#address-cells = <1>;
-   #size-cells = <0>;
+   #size-cells = <1>;
+   ranges;
 
-   /* As long as we do not have a real clock driver us this
-* fixed clock */
-   clk_periph: periph {
+   osc: oscillator {
+   #clock-cells = <0>;
compatible = "fixed-clock";
+   clock-frequency = <2500>;
+   };
+
+   lcpll0: lcpll0@1800c100 {
+   #clock-cells = <1>;
+   compatible = "brcm,nsp-lcpll0";
+   reg = <0x1800c100 0x14>;
+   clocks = <>;
+   clock-output-names = "lcpll0", "pcie_phy", "sdio",
+"ddr_phy";
+   };
+
+   genpll: genpll@1800c140 {
+   #clock-cells = <1>;
+   compatible = "brcm,nsp-genpll";
+   reg = <0x1800c140 0x24>;
+   clocks = <>;
+   clock-output-names = "genpll", "phy", "ethernetclk",
+"usbclk", "iprocfast", "sata1",
+"sata2";
+   };
+
+   iprocmed: iprocmed {
+   #clock-cells = <0>;
+   compatible = "fixed-factor-clock";
+   clocks = < BCM_NSP_GENPLL_IPROCFAST_CLK>;
+   clock-div = <2>;
+   clock-mult = <1>;
+   clock-output-names = "iprocmed";
+   };
+
+   iprocslow: iprocslow {
+   #clock-cells = <0>;
+   compatible = "fixed-factor-clock";
+   clocks = < BCM_NSP_GENPLL_IPROCFAST_CLK>;
+   clock-div = <4>;
+   clock-mult = <1>;
+   clock-output-names = "iprocslow";
+   };
+
+
+   a9pll: arm_clk@1900 {
+   #clock-cells = <0>;
+   compatible = "brcm,nsp-armpll";
+   clocks = <>;
+   reg = <0x1900 0x1000>;
+   };
+
+   periph_clk: periph_clk {
#clock-cells = <0>;
-   clock-frequency = <4>;
+   compatible = "fixed-factor-clock";
+   clocks = <>;
+   clock-div = <2>;
+   clock-mult = <1>;
};
};
 
-- 
1.9.1


[PATCH 02/10] clk: nsp: add clock support for Broadcom Northstar Plus SoC

2015-10-02 Thread Jon Mason

The Broadcom Northstar Plus SoC is architected under the iProc
architecture. It has the following PLLs: ARMPLL, GENPLL, LCPLL0, all
derived from an onboard crystal.

Signed-off-by: Jon Mason 
---
 drivers/clk/bcm/Makefile|   2 +
 drivers/clk/bcm/clk-nsp.c   | 139 
 include/dt-bindings/clock/bcm-nsp.h |  51 +
 3 files changed, 192 insertions(+)
 create mode 100644 drivers/clk/bcm/clk-nsp.c
 create mode 100644 include/dt-bindings/clock/bcm-nsp.h

diff --git a/drivers/clk/bcm/Makefile b/drivers/clk/bcm/Makefile
index 8a7a477..e258b28 100644
--- a/drivers/clk/bcm/Makefile
+++ b/drivers/clk/bcm/Makefile
@@ -4,3 +4,5 @@ obj-$(CONFIG_CLK_BCM_KONA)  += clk-bcm281xx.o
 obj-$(CONFIG_CLK_BCM_KONA) += clk-bcm21664.o
 obj-$(CONFIG_COMMON_CLK_IPROC) += clk-iproc-armpll.o clk-iproc-pll.o clk-iproc-asiu.o
 obj-$(CONFIG_ARCH_BCM_CYGNUS)  += clk-cygnus.o
+obj-$(CONFIG_ARCH_BCM_NSP) += clk-nsp.o
+obj-$(CONFIG_ARCH_BCM_5301X)   += clk-nsp.o
diff --git a/drivers/clk/bcm/clk-nsp.c b/drivers/clk/bcm/clk-nsp.c
new file mode 100644
index 000..708961a
--- /dev/null
+++ b/drivers/clk/bcm/clk-nsp.c
@@ -0,0 +1,139 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "clk-iproc.h"
+
+#define reg_val(o, s, w) { .offset = o, .shift = s, .width = w, }
+
+#define aon_val(o, pw, ps, is) { .offset = o, .pwr_width = pw, \
+   .pwr_shift = ps, .iso_shift = is }
+
+#define reset_val(o, rs, prs, kis, kiw, kps, kpw, kas, kaw) { .offset = o, \
+   .reset_shift = rs, .p_reset_shift = prs, .ki_shift = kis, \
+   .ki_width = kiw, .kp_shift = kps, .kp_width = kpw, .ka_shift = kas, \
+   .ka_width = kaw }
+
+#define vco_ctrl_val(uo, lo) { .u_offset = uo, .l_offset = lo }
+
+#define enable_val(o, es, hs, bs) { .offset = o, .enable_shift = es, \
+   .hold_shift = hs, .bypass_shift = bs }
+
+static void __init nsp_armpll_init(struct device_node *node)
+{
+   iproc_armpll_setup(node);
+}
+CLK_OF_DECLARE(nsp_armpll, "brcm,nsp-armpll", nsp_armpll_init);
+
+static const struct iproc_pll_ctrl genpll = {
+   .flags = IPROC_CLK_PLL_HAS_NDIV_FRAC | IPROC_CLK_EMBED_PWRCTRL,
+   .aon = aon_val(0x0, 1, 12, 0),
+   .reset = reset_val(0x0, 11, 10, 4, 3, 0, 4, 7, 3),
+   .ndiv_int = reg_val(0x14, 20, 10),
+   .ndiv_frac = reg_val(0x14, 0, 20),
+   .pdiv = reg_val(0x18, 24, 3),
+   .status = reg_val(0x20, 12, 1),
+};
+
+static const struct iproc_clk_ctrl genpll_clk[] = {
+   [BCM_NSP_GENPLL_PHY_CLK] = {
+   .channel = BCM_NSP_GENPLL_PHY_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 12, 6, 18),
+   .mdiv = reg_val(0x18, 16, 8),
+   },
+   [BCM_NSP_GENPLL_ENET_SW_CLK] = {
+   .channel = BCM_NSP_GENPLL_ENET_SW_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 13, 7, 19),
+   .mdiv = reg_val(0x18, 8, 8),
+   },
+   [BCM_NSP_GENPLL_USB_PHY_REF_CLK] = {
+   .channel = BCM_NSP_GENPLL_USB_PHY_REF_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 14, 8, 20),
+   .mdiv = reg_val(0x18, 0, 8),
+   },
+   [BCM_NSP_GENPLL_IPROCFAST_CLK] = {
+   .channel = BCM_NSP_GENPLL_IPROCFAST_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 15, 9, 21),
+   .mdiv = reg_val(0x1c, 16, 8),
+   },
+   [BCM_NSP_GENPLL_SATA1_CLK] = {
+   .channel = BCM_NSP_GENPLL_SATA1_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 16, 10, 22),
+   .mdiv = reg_val(0x1c, 8, 8),
+   },
+   [BCM_NSP_GENPLL_SATA2_CLK] = {
+   .channel = BCM_NSP_GENPLL_SATA2_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x4, 17, 11, 23),
+   .mdiv = reg_val(0x1c, 0, 8),
+   },
+};
+
+static void __init nsp_genpll_clk_init(struct device_node *node)
+{
+   iproc_pll_clk_setup(node, , NULL, 0, genpll_clk,
+   ARRAY_SIZE(genpll_clk));
+}
+CLK_OF_DECLARE(nsp_genpll_clk, "brcm,nsp-genpll", nsp_genpll_clk_init);
+
+static const struct iproc_pll_ctrl lcpll0 = {
+   .flags = IPROC_CLK_PLL_HAS_NDIV_FRAC | IPROC_CLK_EMBED_PWRCTRL,
+   .aon = aon_val(0x0, 1, 24, 0),
+   .reset = reset_val(0x0, 23, 22, 16,

[PATCH 07/10] clk: iproc: Separate status and control variables

2015-10-02 Thread Jon Mason

Some PLLs have separate registers for Status and Control.  This means the
pll_base needs to be split into 2 new variables, so that those PLLs can
specify device tree registers for those independently.  Also, add a new
driver flag to identify the presence of the split, and let the driver
know that additional registers need to be used.

Signed-off-by: Jon Mason 
---
 drivers/clk/bcm/clk-iproc-pll.c | 96 -
 drivers/clk/bcm/clk-iproc.h |  6 +++
 2 files changed, 62 insertions(+), 40 deletions(-)

diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
index 882aced..c8c993d 100644
--- a/drivers/clk/bcm/clk-iproc-pll.c
+++ b/drivers/clk/bcm/clk-iproc-pll.c
@@ -74,7 +74,8 @@ struct iproc_clk {
 };
 
 struct iproc_pll {
-   void __iomem *pll_base;
+   void __iomem *status_base;
+   void __iomem *control_base;
void __iomem *pwr_base;
void __iomem *asiu_base;
 
@@ -127,7 +128,7 @@ static int pll_wait_for_lock(struct iproc_pll *pll)
const struct iproc_pll_ctrl *ctrl = pll->ctrl;
 
for (i = 0; i < LOCK_DELAY; i++) {
-   u32 val = readl(pll->pll_base + ctrl->status.offset);
+   u32 val = readl(pll->status_base + ctrl->status.offset);
 
if (val & (1 << ctrl->status.shift))
return 0;
@@ -145,7 +146,7 @@ static void iproc_pll_write(struct iproc_pll *pll, void __iomem *base,
writel(val, base + offset);
 
if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK &&
-base == pll->pll_base))
+(base == pll->status_base || base == pll->control_base)))
val = readl(base + offset);
 }
 
@@ -161,9 +162,9 @@ static void __pll_disable(struct iproc_pll *pll)
}
 
if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
-   val = readl(pll->pll_base + ctrl->aon.offset);
+   val = readl(pll->control_base + ctrl->aon.offset);
val |= (bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   iproc_pll_write(pll, pll->pll_base, ctrl->aon.offset, val);
+   iproc_pll_write(pll, pll->control_base, ctrl->aon.offset, val);
}
 
if (pll->pwr_base) {
@@ -184,9 +185,9 @@ static int __pll_enable(struct iproc_pll *pll)
u32 val;
 
if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
-   val = readl(pll->pll_base + ctrl->aon.offset);
+   val = readl(pll->control_base + ctrl->aon.offset);
val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   iproc_pll_write(pll, pll->pll_base, ctrl->aon.offset, val);
+   iproc_pll_write(pll, pll->control_base, ctrl->aon.offset, val);
}
 
if (pll->pwr_base) {
@@ -213,9 +214,9 @@ static void __pll_put_in_reset(struct iproc_pll *pll)
const struct iproc_pll_ctrl *ctrl = pll->ctrl;
const struct iproc_pll_reset_ctrl *reset = >reset;
 
-   val = readl(pll->pll_base + reset->offset);
+   val = readl(pll->control_base + reset->offset);
val &= ~(1 << reset->reset_shift | 1 << reset->p_reset_shift);
-   iproc_pll_write(pll, pll->pll_base, reset->offset, val);
+   iproc_pll_write(pll, pll->control_base, reset->offset, val);
 }
 
 static void __pll_bring_out_reset(struct iproc_pll *pll, unsigned int kp,
@@ -226,17 +227,17 @@ static void __pll_bring_out_reset(struct iproc_pll *pll, unsigned int kp,
const struct iproc_pll_reset_ctrl *reset = >reset;
const struct iproc_pll_dig_filter_ctrl *dig_filter = >dig_filter;
 
-   val = readl(pll->pll_base + dig_filter->offset);
+   val = readl(pll->control_base + dig_filter->offset);
val &= ~(bit_mask(dig_filter->ki_width) << dig_filter->ki_shift |
bit_mask(dig_filter->kp_width) << dig_filter->kp_shift |
bit_mask(dig_filter->ka_width) << dig_filter->ka_shift);
val |= ki << dig_filter->ki_shift | kp << dig_filter->kp_shift |
   ka << dig_filter->ka_shift;
-   iproc_pll_write(pll, pll->pll_base, dig_filter->offset, val);
+   iproc_pll_write(pll, pll->control_base, dig_filter->offset, val);
 
-   val = readl(pll->pll_base + reset->offset);
+   val = readl(pll->control_base + reset->offset);
val |= 1 << reset->reset_shift | 1 << reset->p_reset_shift;
-   iproc_pll_write(pll, pll->pll_base, reset->offset, val);
+   iproc_pll_write(pll, pll->control_base, reset->offset, val);
 }
 
 static int pll_set_rate(struct iproc_clk *clk, unsigned int rate_index,
@@ -291,9 +292,9 @@ static int pll_set_rate(struct iproc_clk *clk, unsigned int rate_index,
/* put PLL in reset */
__pll_put_in_reset(pll);
 
-   iproc_pll_write(pll, pll->pll_base, ctrl->vco_ctrl.u_offset, 0);
+   iproc_pll_write(pll, pll->control_base, ctrl->vco_ctrl.u_offset, 0);
 
-   val = readl(pll->pll_base + ctrl->vco_ctrl.l_offset);
+

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Daniel Borkmann


On 10/03/2015 12:44 AM, Tycho Andersen wrote:

On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:

...

Ok, how about,

struct sock_filter insns[BPF_MAXINSNS];
insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);


Would also be good that when the storage buffer (insns) is NULL,
it just returns you the number of sock_filter insns (or 0 when
nothing attached).

That would be consistent with classic socket filters (see
sk_get_filter()), and user space could allocate a specific
size instead of always passing in max insns.


when asking for the ith filter? It returns either the number of
instructions, -EINVAL if something was wrong (i, pid,
CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
succeed now, if/when the underlying filter was not created from a bpf
classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
picked this mostly based on what sounds nice.)

Tycho




[PATCH 06/10] clk: iproc: Split off dig_filter

2015-10-02 Thread Jon Mason

The PLL loop filter/gain can be located in a separate register on some
SoCs.  Split these off into a separate variable, so that an offset can
be added if necessary.  Also, make the necessary modifications to the
Cygnus and NSP drivers for this change.

Signed-off-by: Jon Mason 
---
 drivers/clk/bcm/clk-cygnus.c| 17 +++--
 drivers/clk/bcm/clk-iproc-pll.c | 14 +-
 drivers/clk/bcm/clk-iproc.h | 10 +-
 drivers/clk/bcm/clk-nsp.c   | 14 +-
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/clk/bcm/clk-cygnus.c b/drivers/clk/bcm/clk-cygnus.c
index 316c603..c526143 100644
--- a/drivers/clk/bcm/clk-cygnus.c
+++ b/drivers/clk/bcm/clk-cygnus.c
@@ -34,9 +34,11 @@
{ .offset = o, .en_shift = es, .high_shift = hs, \
.high_width = hw, .low_shift = ls, .low_width = lw }
 
-#define reset_val(o, rs, prs, kis, kiw, kps, kpw, kas, kaw) { .offset = o, \
-   .reset_shift = rs, .p_reset_shift = prs, .ki_shift = kis, \
-   .ki_width = kiw, .kp_shift = kps, .kp_width = kpw, .ka_shift = kas, \
+#define reset_val(o, rs, prs) { .offset = o, .reset_shift = rs, \
+   .p_reset_shift = prs }
+
+#define df_val(o, kis, kiw, kps, kpw, kas, kaw) { .offset = o, .ki_shift = kis,\
+   .ki_width = kiw, .kp_shift = kps, .kp_width = kpw, .ka_shift = kas,\
.ka_width = kaw }
 
 #define vco_ctrl_val(uo, lo) { .u_offset = uo, .l_offset = lo }
@@ -56,7 +58,8 @@ static const struct iproc_pll_ctrl genpll = {
.flags = IPROC_CLK_AON | IPROC_CLK_PLL_HAS_NDIV_FRAC |
IPROC_CLK_PLL_NEEDS_SW_CFG,
.aon = aon_val(0x0, 2, 1, 0),
-   .reset = reset_val(0x0, 11, 10, 4, 3, 0, 4, 7, 3),
+   .reset = reset_val(0x0, 11, 10),
+   .dig_filter = df_val(0x0, 4, 3, 0, 4, 7, 3),
.sw_ctrl = sw_ctrl_val(0x10, 31),
.ndiv_int = reg_val(0x10, 20, 10),
.ndiv_frac = reg_val(0x10, 0, 20),
@@ -114,7 +117,8 @@ CLK_OF_DECLARE(cygnus_genpll, "brcm,cygnus-genpll", cygnus_genpll_clk_init);
 static const struct iproc_pll_ctrl lcpll0 = {
.flags = IPROC_CLK_AON | IPROC_CLK_PLL_NEEDS_SW_CFG,
.aon = aon_val(0x0, 2, 5, 4),
-   .reset = reset_val(0x0, 31, 30, 27, 3, 23, 4, 19, 4),
+   .reset = reset_val(0x0, 31, 30),
+   .dig_filter = df_val(0x0, 27, 3, 23, 4, 19, 4),
.sw_ctrl = sw_ctrl_val(0x4, 31),
.ndiv_int = reg_val(0x4, 16, 10),
.pdiv = reg_val(0x4, 26, 4),
@@ -191,7 +195,8 @@ static const struct iproc_pll_ctrl mipipll = {
 IPROC_CLK_NEEDS_READ_BACK,
.aon = aon_val(0x0, 4, 17, 16),
.asiu = asiu_gate_val(0x0, 3),
-   .reset = reset_val(0x0, 11, 10, 4, 3, 0, 4, 7, 4),
+   .reset = reset_val(0x0, 11, 10),
+   .dig_filter = df_val(0x0, 4, 3, 0, 4, 7, 4),
.ndiv_int = reg_val(0x10, 20, 10),
.ndiv_frac = reg_val(0x10, 0, 20),
.pdiv = reg_val(0x14, 0, 4),
diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
index a4602aa..882aced 100644
--- a/drivers/clk/bcm/clk-iproc-pll.c
+++ b/drivers/clk/bcm/clk-iproc-pll.c
@@ -224,13 +224,17 @@ static void __pll_bring_out_reset(struct iproc_pll *pll, unsigned int kp,
u32 val;
const struct iproc_pll_ctrl *ctrl = pll->ctrl;
const struct iproc_pll_reset_ctrl *reset = &ctrl->reset;
+   const struct iproc_pll_dig_filter_ctrl *dig_filter = &ctrl->dig_filter;
+
+   val = readl(pll->pll_base + dig_filter->offset);
+   val &= ~(bit_mask(dig_filter->ki_width) << dig_filter->ki_shift |
+   bit_mask(dig_filter->kp_width) << dig_filter->kp_shift |
+   bit_mask(dig_filter->ka_width) << dig_filter->ka_shift);
+   val |= ki << dig_filter->ki_shift | kp << dig_filter->kp_shift |
+  ka << dig_filter->ka_shift;
+   iproc_pll_write(pll, pll->pll_base, dig_filter->offset, val);
 
val = readl(pll->pll_base + reset->offset);
-   val &= ~(bit_mask(reset->ki_width) << reset->ki_shift |
-bit_mask(reset->kp_width) << reset->kp_shift |
-bit_mask(reset->ka_width) << reset->ka_shift);
-   val |=  ki << reset->ki_shift | kp << reset->kp_shift |
-   ka << reset->ka_shift;
val |= 1 << reset->reset_shift | 1 << reset->p_reset_shift;
iproc_pll_write(pll, pll->pll_base, reset->offset, val);
 }
diff --git a/drivers/clk/bcm/clk-iproc.h b/drivers/clk/bcm/clk-iproc.h
index ff7bfad..b71c197 100644
--- a/drivers/clk/bcm/clk-iproc.h
+++ b/drivers/clk/bcm/clk-iproc.h
@@ -94,12 +94,19 @@ struct iproc_pll_aon_pwr_ctrl {
 };
 
 /*
- * Control of the PLL reset, with Ki, Kp, and Ka parameters
+ * Control of the PLL reset
  */
 struct iproc_pll_reset_ctrl {
unsigned int offset;
unsigned int reset_shift;
unsigned int p_reset_shift;
+};
+
+/*
+ * Control of the Ki, Kp, and Ka parameters
+ */
+struct iproc_pll_dig_filter_ctrl {
+   unsigned int offset;
unsigned int ki_shift;

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Tycho Andersen

On Sat, Oct 03, 2015 at 12:57:49AM +0200, Daniel Borkmann wrote:
> On 10/03/2015 12:44 AM, Tycho Andersen wrote:
> >On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> ...
> >Ok, how about,
> >
> >struct sock_filter insns[BPF_MAXINSNS];
> >insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
> 
> Would also be good that when the storage buffer (insns) is NULL,
> it just returns you the number of sock_filter insns (or 0 when
> nothing attached).
> 
> That would be consistent with classic socket filters (see
> sk_get_filter()), and user space could allocate a specific
> size instead of always passing in max insns.

Yep, the current set does this with SECCOMP_FD_DUMP and I agree that
it's nice behavior, so I'll plan on preserving it.
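
To make the two-call convention concrete, here is a small user-space sketch.
get_filter() is a hypothetical stand-in for the proposed ptrace request
(no real ptrace call is made, and the argument order is an assumption),
and struct insn only mimics the shape of struct sock_filter.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same general shape as struct sock_filter; redefined to stay self-contained. */
struct insn {
	unsigned short code;
	unsigned char jt, jf;
	unsigned int k;
};

/* Hypothetical filter attached to the "tracee". */
static const struct insn stored[] = {
	{ 0x20, 0, 0, 0 }, { 0x15, 0, 1, 59 },
	{ 0x06, 0, 0, 0 }, { 0x06, 0, 0, 0x7fff0000 },
};

/*
 * Mock of the proposed interface: returns the instruction count, copies
 * the instructions only when buf is non-NULL, and fails with -EINVAL for
 * an index that does not exist.
 */
long get_filter(unsigned long idx, struct insn *buf)
{
	if (idx != 0)
		return -EINVAL;
	if (buf)
		memcpy(buf, stored, sizeof(stored));
	return (long)(sizeof(stored) / sizeof(stored[0]));
}

/* Query the size first, then allocate exactly that many instructions. */
struct insn *fetch_filter(unsigned long idx, long *count)
{
	long n = get_filter(idx, NULL);
	struct insn *buf;

	*count = n;
	if (n <= 0)
		return NULL;
	buf = malloc((size_t)n * sizeof(*buf));
	if (!buf) {
		*count = -ENOMEM;
		return NULL;
	}
	*count = get_filter(idx, buf);
	return buf;
}
```

The caller never needs a BPF_MAXINSNS-sized buffer; it frees the returned
array when done.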

Thanks,

Tycho

[PATCH 08/10] clk: iproc: define Broadcom NS2 iProc clock binding

2015-10-02 Thread Jon Mason

Document the device tree bindings for the Broadcom Northstar 2
architecture-based clock controller.

Signed-off-by: Jon Mason 
---
 .../bindings/clock/brcm,iproc-clocks.txt   | 48 ++
 1 file changed, 48 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
index b3c3e9d..ede65a5 100644
--- a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
@@ -160,3 +160,51 @@ Northstar Plus.  These clock IDs are defined in:
     pcie_phy        lcpll0        1       BCM_NSP_LCPLL0_PCIE_PHY_REF_CLK
     sdio            lcpll0        2       BCM_NSP_LCPLL0_SDIO_CLK
     ddr_phy         lcpll0        3       BCM_NSP_LCPLL0_DDR_PHY_CLK
+
+Northstar 2
+-----------
+PLL and leaf clock compatible strings for Northstar 2 are:
+"brcm,ns2-genpll-scr"
+"brcm,ns2-genpll-sw"
+"brcm,ns2-lcpll-ddr"
+"brcm,ns2-lcpll-ports"
+
+The following table defines the set of PLL/clock index and ID for Northstar 2.
+These clock IDs are defined in:
+"include/dt-bindings/clock/bcm-ns2.h"
+
+    Clock           Source        Index   ID
+    -----           ------        -----   --
+    crystal         N/A           N/A     N/A
+
+    genpll_scr      crystal       0       BCM_NS2_GENPLL_SCR
+    scr             genpll_scr    1       BCM_NS2_GENPLL_SCR_SCR_CLK
+    fs              genpll_scr    2       BCM_NS2_GENPLL_SCR_FS_CLK
+    audio_ref       genpll_scr    3       BCM_NS2_GENPLL_SCR_AUDIO_CLK
+    ch3_unused      genpll_scr    4       BCM_NS2_GENPLL_SCR_CH3_UNUSED
+    ch4_unused      genpll_scr    5       BCM_NS2_GENPLL_SCR_CH4_UNUSED
+    ch5_unused      genpll_scr    6       BCM_NS2_GENPLL_SCR_CH5_UNUSED
+
+    genpll_sw       crystal       0       BCM_NS2_GENPLL_SW
+    rpe             genpll_sw     1       BCM_NS2_GENPLL_SW_RPE_CLK
+    250             genpll_sw     2       BCM_NS2_GENPLL_SW_250_CLK
+    nic             genpll_sw     3       BCM_NS2_GENPLL_SW_NIC_CLK
+    chimp           genpll_sw     4       BCM_NS2_GENPLL_SW_CHIMP_CLK
+    port            genpll_sw     5       BCM_NS2_GENPLL_SW_PORT_CLK
+    sdio            genpll_sw     6       BCM_NS2_GENPLL_SW_SDIO_CLK
+
+    lcpll_ddr       crystal       0       BCM_NS2_LCPLL_DDR
+    pcie_sata_usb   lcpll_ddr     1       BCM_NS2_LCPLL_DDR_PCIE_SATA_USB_CLK
+    ddr             lcpll_ddr     2       BCM_NS2_LCPLL_DDR_DDR_CLK
+    ch2_unused      lcpll_ddr     3       BCM_NS2_LCPLL_DDR_CH2_UNUSED
+    ch3_unused      lcpll_ddr     4       BCM_NS2_LCPLL_DDR_CH3_UNUSED
+    ch4_unused      lcpll_ddr     5       BCM_NS2_LCPLL_DDR_CH4_UNUSED
+    ch5_unused      lcpll_ddr     6       BCM_NS2_LCPLL_DDR_CH5_UNUSED
+
+    lcpll_ports     crystal       0       BCM_NS2_LCPLL_PORTS
+    wan             lcpll_ports   1       BCM_NS2_LCPLL_PORTS_WAN_CLK
+    rgmii           lcpll_ports   2       BCM_NS2_LCPLL_PORTS_RGMII_CLK
+    ch2_unused      lcpll_ports   3       BCM_NS2_LCPLL_PORTS_CH2_UNUSED
+    ch3_unused      lcpll_ports   4       BCM_NS2_LCPLL_PORTS_CH3_UNUSED
+    ch4_unused      lcpll_ports   5       BCM_NS2_LCPLL_PORTS_CH4_UNUSED
+    ch5_unused      lcpll_ports   6       BCM_NS2_LCPLL_PORTS_CH5_UNUSED
-- 
1.9.1


[PATCH 10/10] ARM: dts: enable clock support for Broadcom NS2

2015-10-02 Thread Jon Mason

Add device tree entries for clock support for Broadcom Northstar 2 SoC

Signed-off-by: Jon Mason 
---
 arch/arm64/boot/dts/broadcom/ns2.dtsi | 81 +++
 1 file changed, 81 insertions(+)

diff --git a/arch/arm64/boot/dts/broadcom/ns2.dtsi b/arch/arm64/boot/dts/broadcom/ns2.dtsi
index 3c92d92..0b8921e 100644
--- a/arch/arm64/boot/dts/broadcom/ns2.dtsi
+++ b/arch/arm64/boot/dts/broadcom/ns2.dtsi
@@ -31,6 +31,7 @@
  */
 
 #include 
+#include 
 
 /memreserve/ 0x84b0 0x0008;
 
@@ -89,6 +90,86 @@
  IRQ_TYPE_EDGE_RISING)>;
};
 
+   clocks {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0 0 0x6501c000 0x2000>;
+
+   osc: oscillator {
+   #clock-cells = <0>;
+   compatible = "fixed-clock";
+   clock-frequency = <2500>;
+   };
+
+   lcpll_ddr: lcpll_ddr@1058 {
+   #clock-cells = <1>;
+   compatible = "brcm,ns2-lcpll-ddr";
+   reg = <0x1058 0x20>,
+ <0x0020 0x4>,
+ <0x104c 0x4>;
+   clocks = <&osc>;
+   clock-output-names = "lcpll_ddr", "pcie_sata_usb",
+"ddr", "ddr_ch2_unused",
+"ddr_ch3_unused", "ddr_ch4_unused",
+"ddr_ch5_unused";
+   };
+
+   lcpll_ports: lcpll_ports@1078 {
+   #clock-cells = <1>;
+   compatible = "brcm,ns2-lcpll-ports";
+   reg = <0x1078 0x20>,
+ <0x0020 0x4>,
+ <0x1054 0x4>;
+   clocks = <&osc>;
+   clock-output-names = "lcpll_ports", "wan", "rgmii",
+"ports_ch2_unused",
+"ports_ch3_unused",
+"ports_ch4_unused",
+"ports_ch5_unused";
+   };
+
+   genpll_scr: genpll_scr@1098 {
+   #clock-cells = <1>;
+   compatible = "brcm,ns2-genpll-scr";
+   reg = <0x1098 0x32>,
+ <0x0020 0x4>,
+ <0x1044 0x4>;
+   clocks = <&osc>;
+   clock-output-names = "genpll_scr", "scr", "fs",
+"audio_ref", "scr_ch3_unused",
+"scr_ch4_unused", "scr_ch5_unused";
+   };
+
+   genpll_sw: genpll_sw@10c4 {
+   #clock-cells = <1>;
+   compatible = "brcm,ns2-genpll-sw";
+   reg = <0x10c4 0x32>,
+ <0x0020 0x4>,
+ <0x1044 0x4>;
+   clocks = <&osc>;
+   clock-output-names = "genpll_sw", "rpe", "250", "nic",
+"chimp", "port", "sdio";
+   };
+
+   iprocmed: iprocmed {
+   #clock-cells = <0>;
+   compatible = "fixed-factor-clock";
+   clocks = <&genpll_scr BCM_NS2_GENPLL_SCR_SCR_CLK>;
+   clock-div = <2>;
+   clock-mult = <1>;
+   clock-output-names = "iprocmed";
+   };
+
+   iprocslow: iprocslow {
+   #clock-cells = <0>;
+   compatible = "fixed-factor-clock";
+   clocks = <&genpll_scr BCM_NS2_GENPLL_SCR_SCR_CLK>;
+   clock-div = <4>;
+   clock-mult = <1>;
+   clock-output-names = "iprocslow";
+   };
+   };
+
soc: soc {
compatible = "simple-bus";
#address-cells = <1>;
-- 
1.9.1


[PATCH 09/10] clk: ns2: add clock support for Broadcom Northstar 2 SoC

2015-10-02 Thread Jon Mason

The Broadcom Northstar 2 SoC is architected under the iProc
architecture. It has the following PLLs: GENPLL SCR, GENPLL SW,
LCPLL DDR, LCPLL Ports, all derived from an onboard crystal.

Signed-off-by: Jon Mason 
---
 arch/arm64/Kconfig.platforms|   1 +
 drivers/clk/Makefile|   3 +-
 drivers/clk/bcm/Makefile|   1 +
 drivers/clk/bcm/clk-ns2.c   | 290 
 include/dt-bindings/clock/bcm-ns2.h |  72 +
 5 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100644 drivers/clk/bcm/clk-ns2.c
 create mode 100644 include/dt-bindings/clock/bcm-ns2.h

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 23800a1..2790f21 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -2,6 +2,7 @@ menu "Platform selection"
 
 config ARCH_BCM_IPROC
bool "Broadcom iProc SoC Family"
+   select COMMON_CLK_IPROC
help
  This enables support for Broadcom iProc based SoCs
 
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index d08b3e5..ea81eaa 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -47,7 +47,8 @@ obj-$(CONFIG_COMMON_CLK_WM831X)   += clk-wm831x.o
 obj-$(CONFIG_COMMON_CLK_XGENE) += clk-xgene.o
 obj-$(CONFIG_COMMON_CLK_PWM)   += clk-pwm.o
 obj-$(CONFIG_COMMON_CLK_AT91)  += at91/
-obj-$(CONFIG_ARCH_BCM) += bcm/
+obj-$(CONFIG_CLK_BCM_KONA) += bcm/
+obj-$(CONFIG_COMMON_CLK_IPROC) += bcm/
 obj-$(CONFIG_ARCH_BERLIN)  += berlin/
 obj-$(CONFIG_ARCH_HISI)+= hisilicon/
 obj-$(CONFIG_ARCH_MXC) += imx/
diff --git a/drivers/clk/bcm/Makefile b/drivers/clk/bcm/Makefile
index e258b28..2d1cbc5 100644
--- a/drivers/clk/bcm/Makefile
+++ b/drivers/clk/bcm/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_CLK_BCM_KONA)  += clk-kona-setup.o
 obj-$(CONFIG_CLK_BCM_KONA) += clk-bcm281xx.o
 obj-$(CONFIG_CLK_BCM_KONA) += clk-bcm21664.o
 obj-$(CONFIG_COMMON_CLK_IPROC) += clk-iproc-armpll.o clk-iproc-pll.o clk-iproc-asiu.o
+obj-$(CONFIG_COMMON_CLK_IPROC) += clk-ns2.o
 obj-$(CONFIG_ARCH_BCM_CYGNUS)  += clk-cygnus.o
 obj-$(CONFIG_ARCH_BCM_NSP) += clk-nsp.o
 obj-$(CONFIG_ARCH_BCM_5301X)   += clk-nsp.o
diff --git a/drivers/clk/bcm/clk-ns2.c b/drivers/clk/bcm/clk-ns2.c
new file mode 100644
index 0000000..1d08281
--- /dev/null
+++ b/drivers/clk/bcm/clk-ns2.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "clk-iproc.h"
+
+#define reg_val(o, s, w) { .offset = o, .shift = s, .width = w, }
+
+#define aon_val(o, pw, ps, is) { .offset = o, .pwr_width = pw, \
+   .pwr_shift = ps, .iso_shift = is }
+
+#define reset_val(o, rs, prs) { .offset = o, .reset_shift = rs, \
+   .p_reset_shift = prs }
+
+#define df_val(o, kis, kiw, kps, kpw, kas, kaw) { .offset = o, .ki_shift = kis,\
+   .ki_width = kiw, .kp_shift = kps, .kp_width = kpw, .ka_shift = kas,\
+   .ka_width = kaw }
+
+#define vco_ctrl_val(uo, lo) { .u_offset = uo, .l_offset = lo }
+
+#define enable_val(o, es, hs, bs) { .offset = o, .enable_shift = es, \
+   .hold_shift = hs, .bypass_shift = bs }
+
+static const struct iproc_pll_ctrl genpll_scr = {
+   .flags = IPROC_CLK_AON | IPROC_CLK_PLL_SPLIT_STAT_CTRL,
+   .aon = aon_val(0x0, 1, 15, 12),
+   .reset = reset_val(0x4, 2, 1),
+   .dig_filter = df_val(0x0, 9, 3, 5, 4, 2, 3),
+   .ndiv_int = reg_val(0x8, 4, 10),
+   .pdiv = reg_val(0x8, 0, 4),
+   .vco_ctrl = vco_ctrl_val(0x10, 0xc),
+   .status = reg_val(0x0, 27, 1),
+};
+
+
+static const struct iproc_clk_ctrl genpll_scr_clk[] = {
+   /* bypass_shift, the last value passed into enable_val(), is not defined
+* in NS2.  However, it doesn't appear to be used anywhere, so setting
+* it to 0.
+*/
+   [BCM_NS2_GENPLL_SCR_SCR_CLK] = {
+   .channel = BCM_NS2_GENPLL_SCR_SCR_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x0, 18, 12, 0),
+   .mdiv = reg_val(0x18, 0, 8),
+   },
+   [BCM_NS2_GENPLL_SCR_FS_CLK] = {
+   .channel = BCM_NS2_GENPLL_SCR_FS_CLK,
+   .flags = IPROC_CLK_AON,
+   .enable = enable_val(0x0, 19, 13, 0),
+   .mdiv = reg_val(0x18, 8, 8),
+   },
+   [BCM_NS2_GENPLL_SCR_AUDIO_CLK] = {
+

[PATCH 05/10] clk: iproc: Add PLL base write function

2015-10-02 Thread Jon Mason

All writes to the PLL base address must be flushed if the
IPROC_CLK_NEEDS_READ_BACK flag is set.  If we add a function to make the
necessary write and reads, we can make sure that any future code which
makes PLL base writes will do the correct thing.
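
The idea can be sketched outside the kernel as follows. This is only an
illustration under stated assumptions: fake_pll, NEEDS_READ_BACK and the
readbacks counter are hypothetical names, and plain arrays stand in for the
memory-mapped register blocks.

```c
#include <stdint.h>

/* Hypothetical flag, playing the role of IPROC_CLK_NEEDS_READ_BACK. */
#define NEEDS_READ_BACK (1u << 0)

struct fake_pll {
	uint32_t flags;
	volatile uint32_t *pll_base;	/* stand-in for the PLL register block */
	unsigned int readbacks;		/* counts flushes, for illustration only */
};

/*
 * Single write path: every caller goes through here, so the read-back
 * after a PLL base write cannot be forgotten. Writes to other blocks
 * (power, ASIU) are not read back.
 */
void pll_write(struct fake_pll *pll, volatile uint32_t *base,
	       unsigned int idx, uint32_t val)
{
	base[idx] = val;

	if ((pll->flags & NEEDS_READ_BACK) && base == pll->pll_base) {
		(void)base[idx];	/* read back to flush the write */
		pll->readbacks++;
	}
}
```

Centralizing the check in one helper is what lets the call sites shrink to
a single line each.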

Signed-off-by: Jon Mason 
---
 drivers/clk/bcm/clk-iproc-pll.c | 80 +
 1 file changed, 33 insertions(+), 47 deletions(-)

diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
index e029ab3..a4602aa 100644
--- a/drivers/clk/bcm/clk-iproc-pll.c
+++ b/drivers/clk/bcm/clk-iproc-pll.c
@@ -137,6 +137,18 @@ static int pll_wait_for_lock(struct iproc_pll *pll)
return -EIO;
 }
 
+static void iproc_pll_write(struct iproc_pll *pll, void __iomem *base,
+   u32 offset, u32 val)
+{
+   const struct iproc_pll_ctrl *ctrl = pll->ctrl;
+
+   writel(val, base + offset);
+
+   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK &&
+base == pll->pll_base))
+   val = readl(base + offset);
+}
+
 static void __pll_disable(struct iproc_pll *pll)
 {
const struct iproc_pll_ctrl *ctrl = pll->ctrl;
@@ -145,27 +157,24 @@ static void __pll_disable(struct iproc_pll *pll)
if (ctrl->flags & IPROC_CLK_PLL_ASIU) {
val = readl(pll->asiu_base + ctrl->asiu.offset);
val &= ~(1 << ctrl->asiu.en_shift);
-   writel(val, pll->asiu_base + ctrl->asiu.offset);
+   iproc_pll_write(pll, pll->asiu_base, ctrl->asiu.offset, val);
}
 
if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
val = readl(pll->pll_base + ctrl->aon.offset);
val |= (bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   writel(val, pll->pll_base + ctrl->aon.offset);
-
-   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
-   readl(pll->pll_base + ctrl->aon.offset);
+   iproc_pll_write(pll, pll->pll_base, ctrl->aon.offset, val);
}
 
if (pll->pwr_base) {
/* latch input value so core power can be shut down */
val = readl(pll->pwr_base + ctrl->aon.offset);
val |= (1 << ctrl->aon.iso_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
+   iproc_pll_write(pll, pll->pwr_base, ctrl->aon.offset, val);
 
/* power down the core */
val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
+   iproc_pll_write(pll, pll->pwr_base, ctrl->aon.offset, val);
}
 }
 
@@ -177,10 +186,7 @@ static int __pll_enable(struct iproc_pll *pll)
if (ctrl->flags & IPROC_CLK_EMBED_PWRCTRL) {
val = readl(pll->pll_base + ctrl->aon.offset);
val &= ~(bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift);
-   writel(val, pll->pll_base + ctrl->aon.offset);
-
-   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
-   readl(pll->pll_base + ctrl->aon.offset);
+   iproc_pll_write(pll, pll->pll_base, ctrl->aon.offset, val);
}
 
if (pll->pwr_base) {
@@ -188,14 +194,14 @@ static int __pll_enable(struct iproc_pll *pll)
val = readl(pll->pwr_base + ctrl->aon.offset);
val |= bit_mask(ctrl->aon.pwr_width) << ctrl->aon.pwr_shift;
val &= ~(1 << ctrl->aon.iso_shift);
-   writel(val, pll->pwr_base + ctrl->aon.offset);
+   iproc_pll_write(pll, pll->pwr_base, ctrl->aon.offset, val);
}
 
/* certain PLLs also need to be ungated from the ASIU top level */
if (ctrl->flags & IPROC_CLK_PLL_ASIU) {
val = readl(pll->asiu_base + ctrl->asiu.offset);
val |= (1 << ctrl->asiu.en_shift);
-   writel(val, pll->asiu_base + ctrl->asiu.offset);
+   iproc_pll_write(pll, pll->asiu_base, ctrl->asiu.offset, val);
}
 
return 0;
@@ -209,9 +215,7 @@ static void __pll_put_in_reset(struct iproc_pll *pll)
 
val = readl(pll->pll_base + reset->offset);
val &= ~(1 << reset->reset_shift | 1 << reset->p_reset_shift);
-   writel(val, pll->pll_base + reset->offset);
-   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
-   readl(pll->pll_base + reset->offset);
+   iproc_pll_write(pll, pll->pll_base, reset->offset, val);
 }
 
 static void __pll_bring_out_reset(struct iproc_pll *pll, unsigned int kp,
@@ -228,9 +232,7 @@ static void __pll_bring_out_reset(struct iproc_pll *pll, unsigned int kp,
val |=  ki << reset->ki_shift | kp << reset->kp_shift |
ka << reset->ka_shift;
val |= 1 << reset->reset_shift | 1 << reset->p_reset_shift;
-   writel(val, pll->pll_base + reset->offset);
-   if (unlikely(ctrl->flags & IPROC_CLK_NEEDS_READ_BACK))
-

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Tycho Andersen

On Fri, Oct 02, 2015 at 03:52:03PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 2, 2015 at 3:44 PM, Tycho Andersen
>  wrote:
> > On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> >> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
> >>  wrote:
> >> > Hi all,
> >> >
> >> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
> >> > changes, but two highlights are:
> >> >
> >> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/
> >> >   and will need to be built with that patch applied. This gets rid of
> >> >   two incorrect patches in the previous series and is a nicer API.
> >> >
> >> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD
> >> >   return the same struct file across calls, so we still need a kcmp
> >> >   command. I've narrowed the scope of the one being added to only
> >> >   compare seccomp fds.
> >> >
> >> > Thoughts welcome,
> >>
> >> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
> >>
> >> Happy bit:
> >> - avoiding eBPF and just saving the original filters makes things much
> >>   easier.
> >>
> >> Sad bit:
> >> - inventing a new interface for seccompfds feels like massive overkill
> >>   to me.
> >>
> >> While Andy has big dreams, we're not presently doing seccompfd
> >> monitoring, etc. There's no driving user for that kind of interface,
> >> and accepting the maintenance burden of it only for CRIU seems unwise.
> >>
> >> So, I'll go back to what I originally proposed at LSS (which it looks
> >> like we're half way there now):
> >>
> >> - save the original filter (done!)
> >> - extract filters through a single special-purpose interface (looks
> >> like ptrace is the way to go: root-only, stopped process, etc)
> >> - compare filter content and issue TSYNCs to merge detected sibling
> >> threads, since merging things that weren't merged before creates no
> >> problems.
> >>
> >> This means the parenting logic is heuristic, but it's entirely in
> >> userspace, so the complexity burden doesn't live in seccomp which we,
> >> by design, want to keep as simple as possible.
> >
> > Ok, how about,
> >
> > struct sock_filter insns[BPF_MAXINSNS];
> > insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
> >
> > when asking for the ith filter? It returns either the number of
> > instructions, -EINVAL if something was wrong (i, pid,
> > CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
> > succeed now, if/when the underlying filter was not created from a bpf
> > classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
> > picked this mostly based on what sounds nice.)
> >
> 
> Are we still requiring global permissions or that the caller isn't
> seccomped at all?  I've now lost track of how we're resolving the case
> where the caller and the tracee have exactly the same seccomp state
> (or the tracee is derived from the caller's state or they're totally
> unrelated states).

At least for now, I think requiring real root and no seccomp is fine,
so I'll do that.

Tycho

Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

2015-10-02 Thread Dan Williams

On Fri, Oct 2, 2015 at 3:42 PM, Logan Gunthorpe  wrote:
>
>
> On 02/10/15 03:53 PM, Dan Williams wrote:
>>
>> Yes, this location for dev_pagemap will not work.  I've since moved it
>> to a union with the lru list_head since ZONE_DEVICE pages memory
>> should always have an elevated page count and never land on a slab
>> allocator lru.
>
>
> Oh, also, I was actually hoping to make use of the lru list_head in the
> future with ZONE_DEVICE memory. One thought I had was once we have a PCIe
> device with a BAR space, we'd then need to have a way of allocating these
> buffers when user space needs them. The simple way I was thinking was to
> just use the lru list head to store lists of used and unused pages -- though
> there are probably other solutions to this that don't require using struct
> pages.
>

The current assumption is the ZONE_DEVICE ranges are being managed by
a physical address allocator.  In the case of persistent memory this
is the block allocator of the filesystem sitting on top of a pmem
block device.  The struct page is really only there to facilitate
in-flight I/O requests.  If it weren't for complexity we'd allocate
them on demand.  So your "unused" case should be a raw pfn and then
for the time-limited duration it is in use as a struct page it should
hold a reference against the mapping.

[PATCH v3] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments

2015-10-02 Thread John Stultz

Recently an issue was reported that was difficult to detect except
by tweaking the adjtimex tick value and noticing how long the
adjustment took to be made:
https://lkml.org/lkml/2015/9/1/488

Thus this patch introduces a new test which manipulates the adjtimex
tick value and validates the results are what we expect.
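
For readers who want the arithmetic spelled out, here is a hedged sketch of
the kind of check such a test performs. The helper names (tick_to_ppm,
measured_ppm, within_bound) are illustrative assumptions and are not taken
from the patch; the 100ppm bound is the one mentioned in the changelog below.

```c
#define MILLION 1000000LL

/*
 * Expected frequency offset, in ppm, when the adjtimex tick is set to
 * `tick` usec and the nominal tick length is `systick` usec.
 */
long long tick_to_ppm(long tick, long systick)
{
	return ((long long)tick * MILLION) / systick - MILLION;
}

/*
 * Observed drift, in ppm, of the adjusted clock against the raw clock,
 * given the two intervals in nanoseconds measured over the same period.
 */
long long measured_ppm(long long adjusted_ns, long long raw_ns)
{
	return (adjusted_ns * MILLION) / raw_ns - MILLION;
}

/* Nonzero when the measurement is within `bound` ppm of the expectation. */
int within_bound(long long expected, long long measured, long long bound)
{
	long long err = expected - measured;

	if (err < 0)
		err = -err;
	return err <= bound;
}
```

For example, assuming USER_HZ is 100 (a nominal tick of 10000 usec), a tick
of 10100 corresponds to +10000 ppm, and a 15 second raw interval should then
show up as roughly 15.15 seconds on the adjusted clock.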

Cc: Nuno Gonçalves 
Cc: Miroslav Lichvar 
Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Shuah Khan 
Signed-off-by: John Stultz 
---
v2: - Use sysconf(_SC_CLK_TCK) to properly get USER_HZ value.
v3: - Set STA_PLL to ensure offset is cleared
    - Add rationale for 100ppm error bound

 tools/testing/selftests/timers/Makefile  |   3 +-
 tools/testing/selftests/timers/adjtick.c | 210 +++
 2 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timers/adjtick.c

diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile
index 89a3f44..4a1be1b 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -8,7 +8,7 @@ LDFLAGS += -lrt -lpthread
 TEST_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \
 inconsistency-check raw_skew threadtest rtctest
 
-TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex change_skew \
+TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \
  skew_consistency clocksource-switch leap-a-day \
  leapcrash set-tai set-2038
 
@@ -24,6 +24,7 @@ include ../lib.mk
 run_destructive_tests: run_tests
./alarmtimer-suspend
./valid-adjtimex
+   ./adjtick
./change_skew
./skew_consistency
./clocksource-switch
diff --git a/tools/testing/selftests/timers/adjtick.c b/tools/testing/selftests/timers/adjtick.c
new file mode 100644
index 0000000..2aeedbd
--- /dev/null
+++ b/tools/testing/selftests/timers/adjtick.c
@@ -0,0 +1,210 @@
+/* adjtimex() tick adjustment test
+ * by:   John Stultz 
+ * (C) Copyright Linaro Limited 2015
+ * Licensed under the GPLv2
+ *
+ *  To build:
+ * $ gcc adjtick.c -o adjtick -lrt
+ *
+ *   This program is free software: you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#ifdef KTEST
+#include "../kselftest.h"
+#else
+static inline int ksft_exit_pass(void)
+{
+   exit(0);
+}
+static inline int ksft_exit_fail(void)
+{
+   exit(1);
+}
+#endif
+
+
+#define CLOCK_MONOTONIC_RAW 4
+#define NSEC_PER_SEC 1000000000LL
+#define USEC_PER_SEC 1000000
+#define MILLION 1000000
+
+long systick;
+
+long long llabs(long long val)
+{
+   if (val < 0)
+   val = -val;
+   return val;
+}
+
+unsigned long long ts_to_nsec(struct timespec ts)
+{
+   return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+}
+
+struct timespec nsec_to_ts(long long ns)
+{
+   struct timespec ts;
+
+   ts.tv_sec = ns/NSEC_PER_SEC;
+   ts.tv_nsec = ns%NSEC_PER_SEC;
+   return ts;
+}
+
+long long diff_timespec(struct timespec start, struct timespec end)
+{
+   long long start_ns, end_ns;
+
+   start_ns = ts_to_nsec(start);
+   end_ns = ts_to_nsec(end);
+   return end_ns - start_ns;
+}
+
+void get_monotonic_and_raw(struct timespec *mon, struct timespec *raw)
+{
+   struct timespec start, mid, end;
+   long long diff = 0, tmp;
+   int i;
+
+   clock_gettime(CLOCK_MONOTONIC, mon);
+   clock_gettime(CLOCK_MONOTONIC_RAW, raw);
+
+   /* Try to get a more tightly bound pairing */
+   for (i = 0; i < 3; i++) {
+   long long newdiff;
+
+   clock_gettime(CLOCK_MONOTONIC, &start);
+   clock_gettime(CLOCK_MONOTONIC_RAW, &mid);
+   clock_gettime(CLOCK_MONOTONIC, &end);
+
+   newdiff = diff_timespec(start, end);
+   if (diff == 0 || newdiff < diff) {
+   diff = newdiff;
+   *raw = mid;
+   tmp = (ts_to_nsec(start) + ts_to_nsec(end))/2;
+   *mon = nsec_to_ts(tmp);
+   }
+   }
+}
+
+long long get_ppm_drift(void)
+{
+   struct timespec mon_start, raw_start, mon_end, raw_end;
+   long long delta1, delta2, eppm;
+
+   get_monotonic_and_raw(&mon_start, &raw_start);
+
+   sleep(15);
+
+   get_monotonic_and_raw(&mon_end, &raw_end);
+
+   delta1 = diff_timespec(mon_start, mon_end);
+   delta2 =

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Andy Lutomirski

On Fri, Oct 2, 2015 at 3:44 PM, Tycho Andersen
 wrote:
> On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>  wrote:
>> > Hi all,
>> >
>> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
>> > changes, but two highlights are:
>> >
>> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>> >   will need to be built with that patch applied. This gets rid of two
>> >   incorrect patches in the previous series and is a nicer API.
>> >
>> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>> >   same struct file across calls, so we still need a kcmp command. I've
>> >   narrowed the scope of the one being added to only compare seccomp fds.
>> >
>> > Thoughts welcome,
>>
>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>
>> Happy bit:
>> - avoiding eBPF and just saving the original filters makes things much
>>   easier.
>>
>> Sad bit:
>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>
>> While Andy has big dreams, we're not presently doing seccompfd
>> monitoring, etc. There's no driving user for that kind of interface,
>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>
>> So, I'll go back to what I originally proposed at LSS (which it looks
>> like we're half way there now):
>>
>> - save the original filter (done!)
>> - extract filters through a single special-purpose interface (looks
>> like ptrace is the way to go: root-only, stopped process, etc)
>> - compare filter content and issue TSYNCs to merge detected sibling
>> threads, since merging things that weren't merged before creates no
>> problems.
>>
>> This means the parenting logic is heuristic, but it's entirely in
>> userspace, so the complexity burden doesn't live in seccomp which we,
>> by design, want to keep as simple as possible.
>
> Ok, how about,
>
> struct sock_filter insns[BPF_MAXINSNS];
> insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);
>
> when asking for the ith filter? It returns either the number of
> instructions, -EINVAL if something was wrong (i, pid,
> CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
> succeed now, if/when the underlying filter was not created from a bpf
> classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
> picked this mostly based on what sounds nice.)
>

Are we still requiring global permissions or that the caller isn't
seccomped at all?  I've now lost track of how we're resolving the case
where the caller and the tracee have exactly the same seccomp state
(or the tracee is derived from the caller's state or they're totally
unrelated states).

--Andy

Re: [PATCH v2] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-10-02 Thread Ray Jui



On 10/2/2015 3:36 PM, Arnd Bergmann wrote:
> On Thursday 01 October 2015 17:44:36 Ray Jui wrote:
>>
>> Sorry for stealing this discussion, :)
>>
>> I have some questions here, since this affects how I should implement
>> the MSI support for iProc based PCIe controller. I understand it makes
>> more sense to use separate device node for MSI and have "msi-parent"
>> from the pci node points to the MSI node, and that MSI node can be
>> gicv2m or gicv3 based on more advanced ARMv8 platforms.
>>
>> Then I would assume the MSI controller would deserve its own driver?
>> Which is what a lot of people do nowadays. In that case, how should I handle
>> the case when the iProc MSI support is handled through some event queue
>> mechanism with their registers embedded in the PCIe controller register
>> space?
>>
>> Does the following logic make sense to you?
>>
>> 1. parse phandle of "msi-parent"
>> 2. Call of_pci_find_msi_chip_by_node to hook it up to msi chip already
>> registered (in the gicv2m and gicv3 case)
>> 3. If failed, fall back to use the iProc's own event queue logic by
>> calling iproc_pcie_msi_init.
>>
>> The iProc MSI still has its own node that looks like this:
>> msi0: msi@2002 {
>>         msi-controller;
>>         interrupt-parent = <>;
>>         interrupts = ,
>>                      ,
>>                      ,
>>                      ,
>>                      ,
>>                      ;
>>         brcm,num-eq-region = <1>;
>>         brcm,num-msi-msg-region = <1>;
>> };
>>
>> But it does not have its own "reg" since they are embedded in the PCI
>> controller's registers and it relies on one calling iproc_pcie_msi_init
>> to pass in base register value and some other information.
> 
> I don't think I have a perfect answer to this. One way would be to
> separate the actual PCI root device node from the IP block that
> contains both the PCI root and the MSI catcher, but I guess that
> would require an incompatible change to your binding and it's not
> worth the pain.

Indeed, that's going to be very painful given that this iProc PCIe
controller driver is used on multiple platforms including Northstar,
Cygnus, Northstar+, and Northstar 2.

> 
> It's probably also ok to make the PCI host node itself be the msi-controller
> node and have an msi-parent phandle that points to the node itself. Not
> sure if that violates any rules that we may want or need to follow though.
> 
> Having a device node without registers is also a bit problematic,
> especially the 'msi@2002' name doesn't make sense if 0x2002
> is not the first number in the reg property. Maybe it's best to 
> put that node directly under the PCI host controller and not assign
> any registers. This is still a bit ugly because we'd expect devices
> under the host bridge to be PCI devices rather than random other things,
> but it may be the least of the evils.

This is what I have right now: the msi node sits under the PCIe
controller node, and msi-parent points to that msi node. Maybe it
will be a lot easier to discuss this when I submit the code for review
within the next couple of weeks.

> 
>   Arnd
> 

Thanks,

Ray

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Tycho Andersen

On Fri, Oct 02, 2015 at 02:10:24PM -0700, Kees Cook wrote:
> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>  wrote:
> > Hi all,
> >
> > Here's v5 of the seccomp filter c/r set. The individual patch notes have
> > changes, but two highlights are:
> >
> > * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
> >   will need to be built with that patch applied. This gets rid of two 
> > incorrect
> >   patches in the previous series and is a nicer API.
> >
> > * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
> >   same struct file across calls, so we still need a kcmp command. I've 
> > narrowed
> >   the scope of the one being added to only compare seccomp fds.
> >
> > Thoughts welcome,
> 
> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
> 
> Happy bit:
> - avoiding eBPF and just saving the original filters makes things much easier.
> 
> Sad bit:
> - inventing a new interface for seccompfds feels like massive overkill to me.
> 
> While Andy has big dreams, we're not presently doing seccompfd
> monitoring, etc. There's no driving user for that kind of interface,
> and accepting the maintenance burden of it only for CRIU seems unwise.
> 
> So, I'll go back to what I originally proposed at LSS (which it looks
> like we're half way there now):
> 
> - save the original filter (done!)
> - extract filters through a single special-purpose interface (looks
> like ptrace is the way to go: root-only, stopped process, etc)
> - compare filter content and issue TSYNCs to merge detected sibling
> threads, since merging things that weren't merged before creates no
> problems.
> 
> This means the parenting logic is heuristic, but it's entirely in
> userspace, so the complexity burden doesn't live in seccomp which we,
> by design, want to keep as simple as possible.

Ok, how about,

struct sock_filter insns[BPF_MAXINSNS];
insn_cnt = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, insns, i);

when asking for the ith filter? It returns either the number of
instructions, -EINVAL if something was wrong (i, pid,
CONFIG_CHECKPOINT_RESTORE isn't enabled). While it would always
succeed now, if/when the underlying filter was not created from a bpf
classic filter, we can return -EMEDIUMTYPE? (Suggestions welcome, I
picked this mostly based on what sounds nice.)

Tycho
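
The "compare filter content" step from the plan quoted above could be sketched in userspace as a plain instruction-by-instruction comparison of two dumped filters. This is only an illustration: `struct insn` is a local mirror of the classic BPF instruction layout (code, jt, jf, k), and `filters_equal()` is a hypothetical helper name, not part of any interface proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction layout (struct sock_filter). */
struct insn {
	uint16_t code;	/* opcode */
	uint8_t  jt;	/* jump offset if true */
	uint8_t  jf;	/* jump offset if false */
	uint32_t k;	/* generic immediate field */
};

/*
 * Hypothetical helper: treat two dumped filters as "the same filter" when
 * they have the same length and every instruction matches field by field.
 */
static bool filters_equal(const struct insn *a, size_t a_cnt,
			  const struct insn *b, size_t b_cnt)
{
	size_t i;

	assert(a && b);
	if (a_cnt != b_cnt)
		return false;
	for (i = 0; i < a_cnt; i++) {
		if (a[i].code != b[i].code || a[i].jt != b[i].jt ||
		    a[i].jf != b[i].jf || a[i].k != b[i].k)
			return false;
	}
	return true;
}
```

Equal content does not prove that two threads share one filter object, which is why the quoted plan calls the parenting logic heuristic.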

Re: [PATCH v2] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-10-02 Thread Arnd Bergmann

On Thursday 01 October 2015 17:44:36 Ray Jui wrote:
> 
> Sorry for stealing this discussion, :)
> 
> I have some questions here, since this affects how I should implement
> the MSI support for iProc based PCIe controller. I understand it makes
> more sense to use separate device node for MSI and have "msi-parent"
> from the pci node points to the MSI node, and that MSI node can be
> gicv2m or gicv3 based on more advanced ARMv8 platforms.
> 
> Then I would assume the MSI controller would deserve its own driver?
> Which is what a lot of people do nowadays. In that case, how should I handle
> the case when the iProc MSI support is handled through some event queue
> mechanism with their registers embedded in the PCIe controller register
> space?
> 
> Does the following logic make sense to you?
> 
> 1. parse phandle of "msi-parent"
> 2. Call of_pci_find_msi_chip_by_node to hook it up to msi chip already
> registered (in the gicv2m and gicv3 case)
> 3. If failed, fall back to use the iProc's own event queue logic by
> calling iproc_pcie_msi_init.
> 
> The iProc MSI still has its own node that looks like this:
> msi0: msi@2002 {
>         msi-controller;
>         interrupt-parent = <>;
>         interrupts = ,
>                      ,
>                      ,
>                      ,
>                      ,
>                      ;
>         brcm,num-eq-region = <1>;
>         brcm,num-msi-msg-region = <1>;
> };
> 
> But it does not have its own "reg" since they are embedded in the PCI
> controller's registers and it relies on one calling iproc_pcie_msi_init
> to pass in base register value and some other information.

I don't think I have a perfect answer to this. One way would be to
separate the actual PCI root device node from the IP block that
contains both the PCI root and the MSI catcher, but I guess that
would require an incompatible change to your binding and it's not
worth the pain.

It's probably also ok to make the PCI host node itself be the msi-controller
node and have an msi-parent phandle that points to the node itself. Not
sure if that violates any rules that we may want or need to follow though.

Having a device node without registers is also a bit problematic,
especially the 'msi@2002' name doesn't make sense if 0x2002
is not the first number in the reg property. Maybe it's best to 
put that node directly under the PCI host controller and not assign
any registers. This is still a bit ugly because we'd expect devices
under the host bridge to be PCI devices rather than random other things,
but it may be the least of the evils.

Arnd

Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

2015-10-02 Thread Logan Gunthorpe




On 02/10/15 03:53 PM, Dan Williams wrote:

> Yes, this location for dev_pagemap will not work.  I've since moved it
> to a union with the lru list_head since ZONE_DEVICE pages memory
> should always have an elevated page count and never land on a slab
> allocator lru.


Oh, also, I was actually hoping to make use of the lru list_head in the 
future with ZONE_DEVICE memory. One thought I had was once we have a 
PCIe device with a BAR space, we'd then need to have a way of allocating 
these buffers when user space needs them. The simple way I was thinking 
was to just use the lru list head to store lists of used and unused 
pages -- though there are probably other solutions to this that don't 
require using struct pages.


Thanks,

Logan

Re: [PATCH v4 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status

2015-10-02 Thread Andrew Morton

On Fri,  2 Oct 2015 15:35:51 +0200 Vlastimil Babka  wrote:

> From: Jerome Marchand 
> 
> It's currently inconvenient to retrieve MM_ANONPAGES value from status
> and statm files and there is no way to separate MM_FILEPAGES and
> MM_SHMEMPAGES. Add RssAnon, RssFile and RssShm lines in /proc/<pid>/status
> to solve these issues.

This changelog is also head-spinning.

Why is it talking about MM_ANONPAGES and MM_FILEPAGES in the context of
procfs files?  Those terms are kernel-internal stuff and are
meaningless to end users.

So can we please start over with the changelogs?

- What is wrong with the current user interface?

- What changes are we proposing making?

- What new fields are added to the UI?  What is their meaning to users?

- Are any existing UI fields altered?  If so how and why and what
  impact will that have?

Extra points will be awarded for example procfs output.

This is the important stuff!  Once this is all clearly described,
understood and reviewed, then we can get into the
kernel-internal-microdetails like MM_ANONPAGES.

(And "What is wrong with the current user interface?" is important. 
What value does this patchset provide to end users?  Why does anyone
even want these changes?)


[PATCH v6] Documentation: add Device tree bindings for hwmon/nct7802

2015-10-02 Thread Constantine Shulyupin

Introduced subnodes sensor, fan and peci with properties.

Signed-off-by: Constantine Shulyupin 
---
Changed in v6:
- Removed previous definition.
- Introduced subnodes sensor, fan and peci with properties.

Changed in v5:
- Fixed typos

Changed in v4:
- Removed registers initialization by names
- Added properties nuvoton,sensor*-type

Changed in v3:
- Fixed vendor prefix
- Added short registers description,
  full registers description is available at
https://www.nuvoton.com/hq/products/cloud-computing/hardware-monitors/desktop-server-series/nct7802y/

Changed in v2:
- Removed nct7802,reg-init
- Added registers initialization by names

Introduced in v1:
 - nct7802,reg-init
---
 .../devicetree/bindings/hwmon/nct7802.txt  | 66 ++
 1 file changed, 66 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/nct7802.txt

diff --git a/Documentation/devicetree/bindings/hwmon/nct7802.txt b/Documentation/devicetree/bindings/hwmon/nct7802.txt
new file mode 100644
index 0000000..88ee599
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/nct7802.txt
@@ -0,0 +1,66 @@
+Nuvoton NCT7802Y Hardware Monitoring IC
+
+Required node properties:
+
+ - "compatible": must be "nuvoton,nct7802"
+ - "reg": I2C bus address of the device
+ - #address-cells=<1>;
+ - #size-cells=<0>;
+
+Optional subnodes:
+
+Subnode "sensor" properties:
+ - "reg": index in range 0..3
+ - "compatible", allowed values:
+- "nuvoton,nct7802-thermal-diode"
+- "nuvoton,nct7802-thermistor"
+- "nuvoton,nct7802-voltage"
+
+Sensor at address 2 can't be "nuvoton,nct7802-thermal-diode".
+
+Subnode "fan":
+Required properties:
+ - "reg" :index in range 0..2
+ - "compatible", allowed values:
+   "nuvoton,nct7802-fan-pwm" :PWM powered fan
+   "nuvoton,nct7802-fan-dc" :direct current powered fan
+Optional boolean properties:
+ - "invert"  :inverted polarity
+ - "open-drain" :open-drain output instead of push-pull
+
+Subnode "peci" properties:
+ - "reg" :index in range 0..1
+ - "compatible", should be "nuvoton,nct7802-peci"
+
+Sensors, fans and PECI modules that are not defined will be disabled.
+
+Example nct7802 node:
+
+nct7802 {
+   compatible = "nuvoton,nct7802";
+   reg = <0x2a>;
+   #address-cells=<1>;
+   #size-cells=<0>;
+   sensor@0 {
+   reg = <0>;
+   compatible = "nuvoton,nct7802-thermistor";
+   };
+   sensor@2 {
+   reg = <2>;
+   compatible = "nuvoton,nct7802-thermal-diode";
+   };
+   fan@0 {
+   reg = <0>;
+   compatible = "nuvoton,nct7802-fan-pwm";
+   };
+   fan@1 {
+   reg = <1>;
+   compatible = "nuvoton,nct7802-fan-dc";
+   invert;
+   open-drain;
+   };
+   peci@0 {
+   reg = <0>;
+   compatible = "nuvoton,nct7802-peci";
+   };
+};
-- 
1.9.1
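
The constraints the binding text places on a "sensor" subnode can be written down as a small check. This is a sketch of one reading of the text above, not driver code: `nct7802_sensor_valid()` and `NCT7802_NUM_SENSORS` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The binding text gives the sensor "reg" range as 0..3. */
#define NCT7802_NUM_SENSORS 4

/*
 * Hypothetical validator for a "sensor" subnode as the binding describes it:
 * reg must be in 0..3, compatible must be one of the three listed strings,
 * and the sensor at address 2 must not be a thermal diode.
 */
static bool nct7802_sensor_valid(unsigned int reg, const char *compatible)
{
	static const char *const allowed[] = {
		"nuvoton,nct7802-thermal-diode",
		"nuvoton,nct7802-thermistor",
		"nuvoton,nct7802-voltage",
	};
	size_t i;

	assert(compatible);
	if (reg >= NCT7802_NUM_SENSORS)
		return false;
	/* Address 2 cannot be configured as a thermal diode. */
	if (reg == 2 && strcmp(compatible, allowed[0]) == 0)
		return false;
	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		if (strcmp(compatible, allowed[i]) == 0)
			return true;
	}
	return false;
}
```

Under this reading, the `sensor@2` entry in the example node, which uses the thermal-diode compatible, would be rejected, so either the rule or the example may need adjusting.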


Re: [PATCH 2/2] gpio: omap: convert to use generic irq handler

2015-10-02 Thread Grygorii Strashko


On 10/02/2015 03:17 PM, Linus Walleij wrote:
> On Fri, Sep 25, 2015 at 12:28 PM, Grygorii Strashko
>  wrote:
>
>> This patch converts TI OMAP GPIO driver to use generic irq handler
>> instead of chained IRQ handler. This way OMAP GPIO driver will be
>> compatible with RT kernel where it will be forced thread IRQ handler
>> while in non-RT kernel it still will be executed in HW IRQ context.
>> As part of this change the IRQ wakeup configuration is applied to
>> GPIO Bank IRQ as it now will be under control of IRQ PM Core during
>> suspend.
>>
>> There are also additional benefits:
>>   - on RT kernel there will be no complaints any more about PM runtime usage
>>     in atomic context "BUG: sleeping function called from invalid context";
>>   - GPIO bank IRQs will appear in /proc/interrupts and their usage statistics
>>     will be visible;
>>   - GPIO bank IRQs could be configured through IRQ proc_fs interface and,
>>     as result, could be a part of IRQ balancing process if needed;
>>   - GPIO bank IRQs will be under control of IRQ PM Core during
>>     suspend to RAM.
>>
>> Disadvantage:
>>   - additional runtime overhead as call chain till
>>     omap_gpio_irq_handler() will be longer now
>>   - necessity to use wa_lock in omap_gpio_irq_handler() to W/A warning
>>     in handle_irq_event_percpu()
>>     WARNING: CPU: 1 PID: 35 at kernel/irq/handle.c:149 handle_irq_event_percpu+0x51c/0x638()
>>
>> This patch doesn't fully follow recommendations provided by Sebastian
>> Andrzej Siewior [1], because it's required to go through and check all
>> GPIO IRQ pin states as fast as possible and pass control to handle_level_irq
>> or handle_edge_irq. handle_level_irq or handle_edge_irq will perform actions
>> specific for IRQ triggering type and wake up the corresponding registered
>> threaded IRQ handler (at least it's expected to be threaded).
>> IRQs can be lost if handle_nested_irq() will be used, because execution
>> time of some pin specific GPIO IRQ handler can be very significant and
>> require accessing ext. devices (I2C).
>>
>> Idea of such kind reworking was also discussed in [2].
>>
>> [1] http://www.spinics.net/lists/linux-omap/msg120665.html
>> [2] http://www.spinics.net/lists/linux-omap/msg119516.html
>>
>> Tested-by: Tony Lindgren 
>> Tested-by: Austin Schuh 
>> Signed-off-by: Grygorii Strashko 
>
> Patch applied.

Thanks.

> I'm thinking that we need some recommendations on how to write
> IRQ handlers in order to be RT-compatible. Can you help me lining
> up the requirements in Documentation/gpio/driver.txt?
>
> I will write an RFC patch and let you write some additional text
> to it in response then we can iterate it a bit.

Sure. I'll try to help.


--
regards,
-grygorii
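
The changelog's point about walking all pending GPIO pins quickly and handing each one to its flow handler can be illustrated with a small scan of a bank status word. This is only a sketch under simplifying assumptions: `collect_pending()` is a hypothetical helper that records pin numbers into an array, where a real driver would pass each pin's IRQ on to the generic IRQ layer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan a 32-bit bank status word and record the number
 * of every pin whose bit is set, lowest pin first. Returns how many pins
 * were recorded. The caller provides room for up to 32 entries.
 */
static unsigned int collect_pending(uint32_t pending, unsigned int pins[32])
{
	unsigned int n = 0;
	unsigned int bit;

	assert(pins);
	for (bit = 0; bit < 32; bit++) {
		if (pending & (UINT32_C(1) << bit))
			pins[n++] = bit;
	}
	return n;
}
```

Keeping this loop short matters for the reason given in the changelog: the per-pin handlers may be slow, so the bank handler itself only identifies pins and passes control on.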

[PATCH v4 4/5] PCI: Handle Enhanced Allocation (EA) capability for SRIOV devices.

2015-10-02 Thread David Daney

From: David Daney 

SRIOV BARs can be specified via EA entries.  Extend the EA parser to
extract the SRIOV BAR resources, and modify sriov_init() to use
resources previously obtained via EA.

Signed-off-by: David Daney 
---
 drivers/pci/iov.c | 11 +--
 drivers/pci/pci.c |  5 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..c789e68 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -436,8 +436,15 @@ found:
nres = 0;
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = &dev->resource[i + PCI_IOV_RESOURCES];
-   bar64 = __pci_read_base(dev, pci_bar_unknown, res,
-   pos + PCI_SRIOV_BAR + i * 4);
+   /*
+* If it is already FIXED, don't change it, something
+* (perhaps EA or header fixups) wants it this way.
+*/
+   if (res->flags & IORESOURCE_PCI_FIXED)
+   bar64 = (res->flags & IORESOURCE_MEM_64) ? 1 : 0;
+   else
+   bar64 = __pci_read_base(dev, pci_bar_unknown, res,
+   pos + PCI_SRIOV_BAR + i * 4);
if (!res->flags)
continue;
if (resource_size(res) & (PAGE_SIZE - 1)) {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 54a0a98..6750edf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2178,6 +2178,11 @@ static struct resource *pci_ea_get_resource(struct pci_dev *dev, u8 bei, u8 prop
 {
if (bei <= PCI_EA_BEI_BAR5 && prop <= PCI_EA_P_IO)
return &dev->resource[bei];
+#ifdef CONFIG_PCI_IOV
+   else if (bei >= PCI_EA_BEI_VF_BAR0 && bei <= PCI_EA_BEI_VF_BAR5 &&
+(prop == PCI_EA_P_VIRT_MEM || prop == PCI_EA_P_VIRT_MEM_PREFETCH))
+   return &dev->resource[PCI_IOV_RESOURCES + bei - PCI_EA_BEI_VF_BAR0];
+#endif
else if (bei == PCI_EA_BEI_ROM)
return &dev->resource[PCI_ROM_RESOURCE];
else
-- 
1.9.1
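
The BEI-to-resource mapping that this patch extends can be sketched as a standalone function returning a resource index. The BEI and property values are copied from the pci_regs.h additions in this series; the two resource indices (`ROM_RESOURCE_IDX`, `IOV_RESOURCES_IDX`) and the function name `ea_resource_index()` are assumptions made for illustration, chosen to mirror the usual ordering of BARs, ROM, then SR-IOV resources.

```c
#include <assert.h>

/* BEI and property values as defined by the register patch in this series. */
#define EA_BEI_BAR5		5
#define EA_BEI_ROM		8
#define EA_BEI_VF_BAR0		9
#define EA_BEI_VF_BAR5		14
#define EA_P_IO			0x02
#define EA_P_VIRT_MEM_PREFETCH	0x03
#define EA_P_VIRT_MEM		0x04

/* Assumed resource indices: six BARs first, then ROM, then the VF BARs. */
#define ROM_RESOURCE_IDX	6
#define IOV_RESOURCES_IDX	7

/*
 * Map an EA entry's BEI and property to a device resource index,
 * or return -1 when the entry does not describe a supported resource.
 */
static int ea_resource_index(unsigned int bei, unsigned int prop)
{
	assert(bei <= 15);	/* BEI is a 4-bit field */

	if (bei <= EA_BEI_BAR5 && prop <= EA_P_IO)
		return (int)bei;
	if (bei >= EA_BEI_VF_BAR0 && bei <= EA_BEI_VF_BAR5 &&
	    (prop == EA_P_VIRT_MEM || prop == EA_P_VIRT_MEM_PREFETCH))
		return IOV_RESOURCES_IDX + (int)(bei - EA_BEI_VF_BAR0);
	if (bei == EA_BEI_ROM)
		return ROM_RESOURCE_IDX;
	return -1;
}
```

Note that a VF BAR entry is only accepted with one of the two VF memory properties, matching the condition added in the hunk above.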


[PATCH v4 1/5] PCI: Add Enhanced Allocation register entries

2015-10-02 Thread David Daney

From: "Sean O. Stalley" 

Add registers defined in PCI-SIG's Enhanced allocation ECN.

Signed-off-by: Sean O. Stalley 
[david.da...@cavium.com: Added more definitions for PCI_EA_BEI_*]
Signed-off-by: David Daney 
---
 include/uapi/linux/pci_regs.h | 44 ++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 413417f..352e418 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -216,7 +216,8 @@
 #define  PCI_CAP_ID_MSIX   0x11/* MSI-X */
 #define  PCI_CAP_ID_SATA   0x12/* SATA Data/Index Conf. */
 #define  PCI_CAP_ID_AF 0x13/* PCI Advanced Features */
-#define  PCI_CAP_ID_MAX    PCI_CAP_ID_AF
+#define  PCI_CAP_ID_EA 0x14/* PCI Enhanced Allocation */
+#define  PCI_CAP_ID_MAX    PCI_CAP_ID_EA
 #define PCI_CAP_LIST_NEXT  1   /* Next capability in the list */
 #define PCI_CAP_FLAGS  2   /* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF 4
@@ -353,6 +354,47 @@
 #define  PCI_AF_STATUS_TP  0x01
 #define PCI_CAP_AF_SIZEOF  6   /* size of AF registers */
 
+/* PCI Enhanced Allocation registers */
+
+#define PCI_EA_NUM_ENT 2   /* Number of Capability Entries */
+#define PCI_EA_NUM_ENT_MASK    0x3f    /* Num Entries Mask */
+#define PCI_EA_FIRST_ENT   4   /* First EA Entry in List */
+#define PCI_EA_FIRST_ENT_BRIDGE    8   /* First EA Entry for Bridges */
+#define PCI_EA_ES  0x7 /* Entry Size */
+#define PCI_EA_BEI(x)  (((x) >> 4) & 0xf) /* BAR Equivalent Indicator */
+/* 0-5 map to BARs 0-5 respectively */
+#define  PCI_EA_BEI_BAR0   0
+#define  PCI_EA_BEI_BAR5   5
+#define  PCI_EA_BEI_BRIDGE 6   /* Resource behind bridge */
+#define  PCI_EA_BEI_ENI    7   /* Equivalent Not Indicated */
+#define  PCI_EA_BEI_ROM    8   /* Expansion ROM */
+/* 9-14 map to VF BARs 0-5 respectively */
+#define  PCI_EA_BEI_VF_BAR0    9
+#define  PCI_EA_BEI_VF_BAR5    14
+#define  PCI_EA_BEI_RESERVED   15  /* Reserved - Treat like ENI */
+
+#define PCI_EA_PP(x)   (((x) >>  8) & 0xff)/* Primary Properties */
+#define PCI_EA_SP(x)   (((x) >> 16) & 0xff)/* Secondary Properties */
+#define  PCI_EA_P_MEM  0x00/* Non-Prefetch Memory */
+#define  PCI_EA_P_MEM_PREFETCH 0x01/* Prefetchable Memory */
+#define  PCI_EA_P_IO   0x02/* I/O Space */
+#define  PCI_EA_P_VIRT_MEM_PREFETCH    0x03/* VF Prefetchable Memory */
+#define  PCI_EA_P_VIRT_MEM 0x04/* VF Non-Prefetch Memory */
+#define  PCI_EA_P_BRIDGE_MEM   0x05/* Bridge Non-Prefetch Memory */
+#define  PCI_EA_P_BRIDGE_MEM_PREFETCH  0x06/* Bridge Prefetchable Memory */
+#define  PCI_EA_P_BRIDGE_IO    0x07/* Bridge I/O Space */
+/* 0x08-0xfc reserved */
+#define  PCI_EA_P_MEM_RESERVED 0xfd/* Reserved Memory */
+#define  PCI_EA_P_IO_RESERVED  0xfe/* Reserved I/O Space */
+#define  PCI_EA_P_UNAVAILABLE  0xff/* Entry Unavailable */
+#define PCI_EA_WRITEABLE   BIT(30) /* Writable, 1 = RW, 0 = HwInit */
+#define PCI_EA_ENABLE  BIT(31) /* Enable for this entry */
+#define PCI_EA_BASE    4   /* Base Address Offset */
+#define PCI_EA_MAX_OFFSET  8   /* MaxOffset (resource length) */
+/* bit 0 is reserved */
+#define PCI_EA_IS_64   BIT(1)  /* 64-bit field flag */
+#define PCI_EA_FIELD_MASK  0xfffffffc  /* For Base & Max Offset */
+
 /* PCI-X registers (Type 0 (non-bridge) devices) */
 
 #define PCI_X_CMD  2   /* Modes & Features */
-- 
1.9.1
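
To show how the first DWORD of an EA entry breaks down under these definitions, here is a small decoder. It is a sketch for illustration only: `struct ea_header` and `ea_decode_dw0()` are hypothetical names, and the bit positions are taken from the macros in the diff above (entry size in bits 2:0, BEI in bits 7:4, primary and secondary properties in bits 15:8 and 23:16, writable in bit 30, enable in bit 31).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the first DWORD of an Enhanced Allocation entry. */
struct ea_header {
	unsigned int size_bytes;	/* total entry size, header included */
	unsigned int bei;		/* BAR Equivalent Indicator */
	unsigned int pp;		/* Primary Properties */
	unsigned int sp;		/* Secondary Properties */
	bool writable;
	bool enabled;
};

static struct ea_header ea_decode_dw0(uint32_t dw0)
{
	struct ea_header h;

	/* The size field counts the DWORDs that follow the first one. */
	h.size_bytes = ((dw0 & 0x7u) + 1u) << 2;
	h.bei = (dw0 >> 4) & 0xfu;
	h.pp = (dw0 >> 8) & 0xffu;
	h.sp = (dw0 >> 16) & 0xffu;
	h.writable = (dw0 & (UINT32_C(1) << 30)) != 0;
	h.enabled = (dw0 & (UINT32_C(1) << 31)) != 0;
	return h;
}
```

For example, `0x80010022` decodes to an enabled, hardware-initialized 12-byte entry for BAR 2 with primary property 0x00 and secondary property 0x01.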


[PATCH v4 3/5] PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.

2015-10-02 Thread David Daney

From: David Daney 

The new Enhanced Allocation (EA) capability support creates resources
with the IORESOURCE_PCI_FIXED set.  This creates a couple of problems:

1) Since these resources cannot be relocated or resized, their
   alignment is not really defined, and it is therefore not specified.
   This causes a problem in pbus_size_mem() where resources with
   unspecified alignment are disabled.

2) During resource assignment in pci_bus_assign_resources(),
   IORESOURCE_PCI_FIXED resources are not given a parent.  This, in
   turn, causes pci_enable_resources() to fail with a "not claimed"
   error.

So, in pbus_size_mem() skip IORESOURCE_PCI_FIXED resources, instead of
disabling them.

In __pci_bus_assign_resources(), for IORESOURCE_PCI_FIXED resources,
try to request the resource from a parent bus.

Signed-off-by: David Daney 
---
 drivers/pci/setup-bus.c | 60 ++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 508cc56..8f9ed9b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1037,9 +1037,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
struct resource *r = &dev->resource[i];
resource_size_t r_size;
 
-   if (r->parent || ((r->flags & mask) != type &&
- (r->flags & mask) != type2 &&
- (r->flags & mask) != type3))
+   if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+   ((r->flags & mask) != type &&
+(r->flags & mask) != type2 &&
+(r->flags & mask) != type3))
continue;
r_size = resource_size(r);
 #ifdef CONFIG_PCI_IOV
@@ -1340,6 +1341,57 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
+static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
+{
+   int i;
+   struct resource *parent_r;
+   unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
+   IORESOURCE_PREFETCH;
+
+   pci_bus_for_each_resource(b, parent_r, i) {
+   if (!parent_r)
+   continue;
+
+   if ((r->flags & mask) == (parent_r->flags & mask) &&
+   resource_contains(parent_r, r))
+   request_resource(parent_r, r);
+   }
+}
+
+/*
+ * Try to assign any resources marked as IORESOURCE_PCI_FIXED, as they
+ * are skipped by pbus_assign_resources_sorted().
+ */
+static void pdev_assign_fixed_resources(struct pci_dev *dev)
+{
+   int i;
+
+   for (i = 0; i <  PCI_NUM_RESOURCES; i++) {
+   struct pci_bus *b;
+   struct resource *r = &dev->resource[i];
+
+   if (r->parent || !(r->flags & IORESOURCE_PCI_FIXED) ||
+   !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+   continue;
+
+   b = dev->bus;
+   while (b && !r->parent) {
+   assign_fixed_resource_on_bus(b, r);
+   b = b->parent;
+   }
+   if (!r->parent) {
+   /*
+* If that didn't work, try to fallback to the
+* ultimate parent.
+*/
+   if (r->flags & IORESOURCE_IO)
+   request_resource(&ioport_resource, r);
+   else
+   request_resource(&iomem_resource, r);
+   }
+   }
+}
+
 void __pci_bus_assign_resources(const struct pci_bus *bus,
struct list_head *realloc_head,
struct list_head *fail_head)
@@ -1350,6 +1402,8 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
list_for_each_entry(dev, &bus->devices, bus_list) {
+   pdev_assign_fixed_resources(dev);
+
b = dev->subordinate;
if (!b)
continue;
-- 
1.9.1
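
The parent search described in this patch (walk from the device's bus towards the root and look for a window of the same type that contains the fixed resource) can be sketched outside the kernel as follows. All names here (`struct win`, `struct bus_node`, `find_parent_window()`, the `WIN_*` bits) are hypothetical stand-ins, and unlike the patch this sketch only returns the first matching window instead of requesting the resource from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the resource type bits; values are illustrative. */
#define WIN_IO		0x0100UL
#define WIN_MEM		0x0200UL
#define WIN_PREFETCH	0x2000UL
#define WIN_TYPE_MASK	(WIN_IO | WIN_MEM | WIN_PREFETCH)

#define MAX_WINDOWS 4

struct win {
	uint64_t start;
	uint64_t end;		/* inclusive */
	unsigned long flags;
};

struct bus_node {
	const struct win *windows[MAX_WINDOWS];
	size_t nr_windows;
	const struct bus_node *parent;	/* NULL at the root bus */
};

/* True when inner lies entirely within outer. */
static int win_contains(const struct win *outer, const struct win *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Walk from bus up to the root and return the first window whose type bits
 * match r and which fully contains r, or NULL if no bus has such a window.
 */
static const struct win *find_parent_window(const struct bus_node *bus,
					    const struct win *r)
{
	size_t i;

	assert(r);
	for (; bus; bus = bus->parent) {
		for (i = 0; i < bus->nr_windows; i++) {
			const struct win *w = bus->windows[i];

			if (!w)
				continue;
			if ((w->flags & WIN_TYPE_MASK) == (r->flags & WIN_TYPE_MASK) &&
			    win_contains(w, r))
				return w;
		}
	}
	return NULL;
}
```

When the walk finds nothing, the patch falls back to the global I/O or memory resource, which this sketch leaves to the caller.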


[PATCH v4 2/5] PCI: Add support for Enhanced Allocation devices

2015-10-02 Thread David Daney

From: "Sean O. Stalley" 

Add support for devices using Enhanced Allocation entries instead of BARs.
This patch allows the kernel to parse the EA Extended Capability structure
in PCI configspace and claim the BAR-equivalent resources.

Signed-off-by: Sean O. Stalley 
[david.da...@cavium.com: Add more support/checking for Entry Properties,
allow EA behind bridges, rewrite some error messages.]
Signed-off-by: David Daney 
---
 drivers/pci/pci.c   | 184 
 drivers/pci/pci.h   |   1 +
 drivers/pci/probe.c |   3 +
 3 files changed, 188 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6a9a111..54a0a98 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2148,6 +2148,190 @@ void pci_pm_init(struct pci_dev *dev)
}
 }
 
+static unsigned long pci_ea_set_flags(struct pci_dev *dev, u8 prop)
+{
+   unsigned long flags = IORESOURCE_PCI_FIXED;
+
+   switch (prop) {
+   case PCI_EA_P_MEM:
+   case PCI_EA_P_VIRT_MEM:
+   case PCI_EA_P_BRIDGE_MEM:
+   flags |= IORESOURCE_MEM;
+   break;
+   case PCI_EA_P_MEM_PREFETCH:
+   case PCI_EA_P_VIRT_MEM_PREFETCH:
+   case PCI_EA_P_BRIDGE_MEM_PREFETCH:
+   flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH;
+   break;
+   case PCI_EA_P_IO:
+   case PCI_EA_P_BRIDGE_IO:
+   flags |= IORESOURCE_IO;
+   break;
+   default:
+   return 0;
+   }
+
+   return flags;
+}
+
+static struct resource *pci_ea_get_resource(struct pci_dev *dev, u8 bei, u8 prop)
+{
+   if (bei <= PCI_EA_BEI_BAR5 && prop <= PCI_EA_P_IO)
+   return &dev->resource[bei];
+   else if (bei == PCI_EA_BEI_ROM)
+   return &dev->resource[PCI_ROM_RESOURCE];
+   else
+   return NULL;
+}
+
+/* Read an Enhanced Allocation (EA) entry */
+static int pci_ea_read(struct pci_dev *dev, int offset)
+{
+   struct resource *res;
+   int ent_offset = offset;
+   int ent_size;
+   resource_size_t start;
+   resource_size_t end;
+   unsigned long flags;
+   u32 dw0;
+   u32 base;
+   u32 max_offset;
+   u8 prop;
+   bool support_64 = (sizeof(resource_size_t) >= 8);
+
+   pci_read_config_dword(dev, ent_offset, &dw0);
+   ent_offset += 4;
+
+   /* Entry size field indicates DWORDs after 1st */
+   ent_size = ((dw0 & PCI_EA_ES) + 1) << 2;
+
+   if (!(dw0 & PCI_EA_ENABLE)) /* Entry not enabled */
+   goto out;
+
+   prop = PCI_EA_PP(dw0);
+   /*
+* If the Property is in the reserved range, try the Secondary
+* Property instead.
+*/
+   if (prop > PCI_EA_P_BRIDGE_IO && prop < PCI_EA_P_MEM_RESERVED)
+   prop = PCI_EA_SP(dw0);
+   if (prop > PCI_EA_P_BRIDGE_IO)
+   goto out;
+
+   res = pci_ea_get_resource(dev, PCI_EA_BEI(dw0), prop);
+   if (!res) {
+   dev_err(&dev->dev, "Unsupported EA entry BEI: %u\n",
+   PCI_EA_BEI(dw0));
+   goto out;
+   }
+
+   flags = pci_ea_set_flags(dev, prop);
+   if (!flags) {
+   dev_err(&dev->dev, "Unsupported EA properties: %u\n", prop);
+   goto out;
+   }
+
+   /* Read Base */
+   pci_read_config_dword(dev, ent_offset, &base);
+   start = (base & PCI_EA_FIELD_MASK);
+   ent_offset += 4;
+
+   /* Read MaxOffset */
+   pci_read_config_dword(dev, ent_offset, &max_offset);
+   ent_offset += 4;
+
+   /* Read Base MSBs (if 64-bit entry) */
+   if (base & PCI_EA_IS_64) {
+   u32 base_upper;
+
+   pci_read_config_dword(dev, ent_offset, &base_upper);
+   ent_offset += 4;
+
+   flags |= IORESOURCE_MEM_64;
+
+   /* entry starts above 32-bit boundary, can't use */
+   if (!support_64 && base_upper)
+   goto out;
+
+   if (support_64)
+   start |= ((u64)base_upper << 32);
+   }
+
+   dev_dbg(&dev->dev,
+   "EA (%u,%u) start = %pa\n", PCI_EA_BEI(dw0), prop, &start);
+
+   end = start + (max_offset | 0x03);
+
+   /* Read MaxOffset MSBs (if 64-bit entry) */
+   if (max_offset & PCI_EA_IS_64) {
+   u32 max_offset_upper;
+
+   pci_read_config_dword(dev, ent_offset, &max_offset_upper);
+   ent_offset += 4;
+
+   flags |= IORESOURCE_MEM_64;
+
+   /* entry too big, can't use */
+   if (!support_64 && max_offset_upper)
+   goto out;
+
+   if (support_64)
+   end += ((u64)max_offset_upper << 32);
+   }
+
+   dev_dbg(&dev->dev,
+   "EA (%u,%u) end = %pa\n", PCI_EA_BEI(dw0), prop, &end);
+
+   if (end < start) {
+   dev_err(&dev->dev, "EA Entry crosses address boundary\n");
+   goto out;
+   }
+
+   if (ent_size != ent_offset - offset) {
+
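
The address arithmetic in pci_ea_read() above (mask the flag bits off Base, add the upper halves for 64-bit fields, and reject a range that wraps) can be restated as a standalone function. This is a sketch assuming a 64-bit resource_size_t and a field mask that clears only the two low flag bits (written here as 0xfffffffc); `ea_range()` and the `EA_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EA_IS_64	(UINT32_C(1) << 1)	/* 64-bit field flag */
#define EA_FIELD_MASK	UINT32_C(0xfffffffc)	/* assumed: clears the two flag bits */

/*
 * Compute the inclusive [start, end] range of an EA entry from its Base and
 * MaxOffset fields. The *_upper arguments are only used when the matching
 * low DWORD has the 64-bit flag set. Returns false if the range wraps.
 */
static bool ea_range(uint32_t base, uint32_t base_upper,
		     uint32_t max_offset, uint32_t max_offset_upper,
		     uint64_t *start, uint64_t *end)
{
	assert(start && end);

	*start = base & EA_FIELD_MASK;
	if (base & EA_IS_64)
		*start |= (uint64_t)base_upper << 32;

	/* MaxOffset has its two low bits treated as set: sizes are DWORD granular. */
	*end = *start + (uint64_t)(max_offset | 0x03u);
	if (max_offset & EA_IS_64)
		*end += (uint64_t)max_offset_upper << 32;

	return *end >= *start;
}
```

A 32-bit entry with Base 0x80000000 and MaxOffset 0xffc therefore covers 0x80000000 through 0x80000fff.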

[PATCH v4 0/5] PCI: Add support for PCI Enhanced Allocation "BARs"

2015-10-02 Thread David Daney

From: David Daney 

The original patches are from Sean O. Stalley. I made a few tweaks,
but feel that it is substantially Sean's work, so I am keeping the
patch set version numbering scheme going.

Tested on Cavium ThunderX system with 4 Root Complexes containing 50
devices/bridges provisioned with EA.

Here is Sean's description of the patches:

PCI Enhanced Allocation is a new method of allocating MMIO & IO
resources for PCI devices & bridges. It can be used instead
of the traditional PCI method of using BARs.

EA entries are hardware-initialized to a fixed address.
Unlike BARs, regions described by EA cannot be moved.
Because of this, only devices which are permanently connected to
the PCI bus can use EA. A removable PCI card must not use EA.

This patchset adds support for using EA entries instead of BARs
on Root Complex Integrated Endpoints.

The Enhanced Allocation ECN is publicly available here:
https://www.pcisig.com/specifications/conventional/ECN_Enhanced_Allocation_23_Oct_2014_Final.pdf


Changes from V1:
- Use generic PCI resource claim functions (instead of EA-specific
  functions)
- Only add support for RCiEPs (instead of all devices).
- Removed some debugging messages leftover from early testing.

Changes from V2 (By David Daney):
- Add ea_cap to struct pci_dev, to aid in finding the EA capability.
- Factored EA entity decoding into a separate function.
- Add functions to find EA entities by BEI or Property.
- Add handling of EA provisioned bridges.
- Add handling of EA SRIOV BARs.
- Try to assign proper resource parent so that SRIOV device creation
  can occur.

Changes from V3 (By David Daney):
- Discarded V3 changes and started over fresh based on Sean's V2.
- Add more support/checking for Entry Properties.
- Allow EA behind bridges.
- Rewrite some error messages.
- Add patch 3/5 to prevent resizing, and better handle
  assigning, of fixed EA resources.
- Add patch 4/5 to handle EA provisioned SRIOV devices.
- Add patch 5/5 to handle EA provisioned bridges.

David Daney (3):
  PCI: Handle IORESOURCE_PCI_FIXED when sizing and assigning resources.
  PCI: Handle Enhanced Allocation (EA) capability for SRIOV devices.
  PCI: Handle Enhanced Allocation (EA) capability for bridges

Sean O. Stalley (2):
  PCI: Add Enhanced Allocation register entries
  PCI: Add support for Enhanced Allocation devices

 drivers/pci/bus.c |   7 ++
 drivers/pci/iov.c |  11 ++-
 drivers/pci/pci.c | 202 ++
 drivers/pci/pci.h |   1 +
 drivers/pci/probe.c   |  34 ++-
 drivers/pci/setup-bus.c   |  63 -
 include/linux/pci.h   |   1 +
 include/uapi/linux/pci_regs.h |  44 -
 8 files changed, 355 insertions(+), 8 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v4 5/5] PCI: Handle Enhanced Allocation (EA) capability for bridges

2015-10-02 Thread David Daney

From: David Daney 

PCI bridges may have their properties specified via EA entries.

Extend the EA parser to extract the bridge resources, and modify
pci_read_bridge_{io,mmio,mmio_pref}() to use resources previously
obtained via EA.

Save the offset to the EA capability in struct pci_dev, and use it to
easily find the EA bridge subordinate and secondary bus numbers.

When assigning the bridge resources, a couple of changes are required
so that the EA-obtained IORESOURCE_PCI_FIXED resources are not resized
and are correctly linked into the resource tree.

1) In pbus_size_mem() do not attempt to resize the bridge resources if
   they are marked as IORESOURCE_PCI_FIXED.

2) In pci_bus_alloc_from_region(), for IORESOURCE_PCI_FIXED resources, just
   try to request the resource as is, without attempting to resize it.
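
The two rules above can be illustrated with a small userspace model. It is only a sketch: struct window, RES_FIXED, size_window() and claim_fixed() are hypothetical stand-ins, not the kernel's struct resource, IORESOURCE_PCI_FIXED, pbus_size_mem() or request_resource().

```c
#include <stdbool.h>
#include <stdint.h>

#define RES_FIXED 0x1u              /* stand-in for IORESOURCE_PCI_FIXED */

struct window {
    uint64_t start, end;            /* inclusive range */
    unsigned int flags;
};

/* Rule 1: resize a bridge window to 'needed' bytes (needed > 0),
 * unless the window is fixed. */
void size_window(struct window *w, uint64_t needed)
{
    if (w->flags & RES_FIXED)
        return;                     /* EA-provided: leave it untouched */
    w->end = w->start + needed - 1;
}

/* Rule 2: a fixed window is claimed as is; it fits only if it already
 * lies entirely inside the parent region. */
bool claim_fixed(const struct window *parent, const struct window *w)
{
    return w->start >= parent->start && w->end <= parent->end;
}
```

In this model a fixed window keeps its hardware-assigned range no matter what size is requested, and the only remaining question is whether a parent region contains it.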

Signed-off-by: David Daney 
---
 drivers/pci/bus.c   |  7 +++
 drivers/pci/pci.c   | 13 +
 drivers/pci/probe.c | 31 +--
 drivers/pci/setup-bus.c |  3 +++
 include/linux/pci.h |  1 +
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..0556b33 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -153,6 +153,13 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, 
struct resource *res,
!(res->flags & IORESOURCE_PREFETCH))
continue;
 
+   if (res->flags & IORESOURCE_PCI_FIXED) {
+   /* Cannot change it, just try to claim it. */
+   if (request_resource(r, res))
+   continue;
+   return 0;
+   }
+
avail = *r;
pci_clip_resource_to_region(bus, &avail, region);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6750edf..c857632 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2183,6 +2183,17 @@ static struct resource *pci_ea_get_resource(struct 
pci_dev *dev, u8 bei, u8 prop
 (prop == PCI_EA_P_VIRT_MEM || prop == 
PCI_EA_P_VIRT_MEM_PREFETCH))
return &dev->resource[PCI_IOV_RESOURCES + bei - 
PCI_EA_BEI_VF_BAR0];
 #endif
+   else if (bei == PCI_EA_BEI_BRIDGE)
+   switch (prop) {
+   case PCI_EA_P_BRIDGE_IO:
+   return &dev->resource[PCI_BRIDGE_RESOURCES + 0];
+   case PCI_EA_P_BRIDGE_MEM:
+   return &dev->resource[PCI_BRIDGE_RESOURCES + 1];
+   case PCI_EA_P_BRIDGE_MEM_PREFETCH:
+   return &dev->resource[PCI_BRIDGE_RESOURCES + 2];
+   default:
+   return NULL;
+   }
else if (bei == PCI_EA_BEI_ROM)
return &dev->resource[PCI_ROM_RESOURCE];
else
@@ -2321,6 +2332,8 @@ void pci_ea_init(struct pci_dev *dev)
if (!ea)
return;
 
+   dev->ea_cap = ea;
+
/* determine the number of entries */
pci_bus_read_config_byte(dev->bus, dev->devfn, ea + PCI_EA_NUM_ENT,
&num_ent);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4293eec..e4bcb0b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -348,6 +348,10 @@ static void pci_read_bridge_io(struct pci_bus *child)
}
 
res = child->resource[0];
+   if (res->flags & IORESOURCE_PCI_FIXED) {
+   dev_dbg(&dev->dev, "  bridge window %pR (fixed)\n", res);
+   return;
+   }
pci_read_config_byte(dev, PCI_IO_BASE, &io_base_lo);
pci_read_config_byte(dev, PCI_IO_LIMIT, &io_limit_lo);
base = (io_base_lo & io_mask) << 8;
@@ -380,6 +384,10 @@ static void pci_read_bridge_mmio(struct pci_bus *child)
struct resource *res;
 
res = child->resource[1];
+   if (res->flags & IORESOURCE_PCI_FIXED) {
+   dev_dbg(&dev->dev, "  bridge window %pR (fixed)\n", res);
+   return;
+   }
pci_read_config_word(dev, PCI_MEMORY_BASE, &mem_base_lo);
pci_read_config_word(dev, PCI_MEMORY_LIMIT, &mem_limit_lo);
base = ((unsigned long) mem_base_lo & PCI_MEMORY_RANGE_MASK) << 16;
@@ -403,6 +411,10 @@ static void pci_read_bridge_mmio_pref(struct pci_bus 
*child)
struct resource *res;
 
res = child->resource[2];
+   if (res->flags & IORESOURCE_PCI_FIXED) {
+   dev_dbg(&dev->dev, "  bridge window %pR (fixed)\n", res);
+   return;
+   }
pci_read_config_word(dev, PCI_PREF_MEMORY_BASE, &mem_base_lo);
pci_read_config_word(dev, PCI_PREF_MEMORY_LIMIT, &mem_limit_lo);
base64 = (mem_base_lo & PCI_PREF_RANGE_MASK) << 16;
@@ -801,8 +813,23 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev 
*dev, int max, int pass)
 
pci_read_config_dword(dev, PCI_PRIMARY_BUS, &buses);
primary = buses & 0xFF;
-   secondary = (buses >> 8) & 0xFF;
-   subordinate = (buses >> 16) & 0xFF;
+   if (dev->ea_cap) {
+   u32 dw1;

Re: [PATCH] ovs: do not allocate memory from offline numa node

2015-10-02 Thread Pravin Shelar

On Fri, Oct 2, 2015 at 3:18 AM, Konstantin Khlebnikov
 wrote:
> When openvswitch tries to allocate memory from offline numa node 0:
> stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
> It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
> This patch disables numa affinity in this case.
>
> Signed-off-by: Konstantin Khlebnikov 
>
> ---
>
> <4>[   24.368805] [ cut here ]
> <2>[   24.368846] kernel BUG at include/linux/gfp.h:325!
> <4>[   24.368868] invalid opcode:  [#1] SMP
> <4>[   24.368892] Modules linked in: openvswitch vxlan udp_tunnel 
> ip6_udp_tunnel gre libcrc32c kvm_amd kvm crc32_pclmul ghash_clmulni_intel 
> aesni_intel ablk_helper cryptd lrw mgag200 ttm drm_kms_helper drm gf128mul 
> glue_helper serio_raw aes_x86_64 sysimgblt sysfillrect syscopyarea sp5100_tco 
> amd64_edac_mod edac_core edac_mce_amd i2c_piix4 k10temp fam15h_power 
> microcode raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
> async_tx xor raid6_pq raid1 raid0 igb multipath i2c_algo_bit i2c_core linear 
> dca psmouse ptp ahci pata_atiixp pps_core libahci
> <4>[   24.369225] CPU: 22 PID: 987 Comm: ovs-vswitchd Not tainted 3.18.19-24 
> #1
> <4>[   24.369255] Hardware name: Supermicro H8DGU/H8DGU, BIOS 3.0b   
> 05/07/2013
> <4>[   24.369286] task: 8807f2433240 ti: 8807ec9a task.ti: 
> 8807ec9a
> <4>[   24.369317] RIP: 0010:[]  [] 
> new_slab+0x2d4/0x380
> <4>[   24.369359] RSP: 0018:8807ec9a35d8  EFLAGS: 00010246
> <4>[   24.369383] RAX:  RBX: 8807ff403c00 RCX: 
> 
> <4>[   24.369412] RDX:  RSI:  RDI: 
> 002012d0
> <4>[   24.369441] RBP: 8807ec9a3608 R08: 8807f193cfe0 R09: 
> 0001008a
> <4>[   24.369471] R10: f193cf01 R11: 00015f38 R12: 
> 
> <4>[   24.369501] R13: 0080 R14:  R15: 
> 00d0
> <4>[   24.369531] FS:  7febb0cbe980() GS:8807ffd8() 
> knlGS:
> <4>[   24.369563] CS:  0010 DS:  ES:  CR0: 80050033
> <4>[   24.369588] CR2: 7efc53abc1b8 CR3: 0007f213f000 CR4: 
> 000407e0
> <4>[   24.369618] Stack:
> <4>[   24.369630]  8807ec9a3618   
> 8807ffd958c0
> <4>[   24.369669]  8807ff403c00 80d0 8807ec9a36f8 
> 816cc548
> <4>[   24.370755]  8807ec9a3708 0296 0004 
> 
> <4>[   24.371777] Call Trace:
> <4>[   24.372929]  [] __slab_alloc+0x33b/0x459
> <4>[   24.374179]  [] ? ovs_flow_alloc+0x59/0x110 
> [openvswitch]
> <4>[   24.375390]  [] ? get_page_from_freelist+0x483/0x9f0
> <4>[   24.376623]  [] ? memzero_explicit+0xe/0x10
> <4>[   24.377767]  [] ? ovs_flow_alloc+0x59/0x110 
> [openvswitch]
> <4>[   24.378951]  [] kmem_cache_alloc_node+0x9c/0x1b0
> <4>[   24.379916]  [] ? kmem_cache_alloc+0x18b/0x1a0
> <4>[   24.390806]  [] ? ovs_flow_alloc+0x1d/0x110 
> [openvswitch]
> <4>[   24.391779]  [] ovs_flow_alloc+0x59/0x110 
> [openvswitch]
> <4>[   24.392875]  [] ovs_flow_cmd_new+0x5b/0x360 
> [openvswitch]
> <4>[   24.394004]  [] ? __alloc_pages_nodemask+0x16c/0xaf0
> <4>[   24.394973]  [] ? __alloc_skb+0x87/0x2a0
> <4>[   24.395926]  [] ? nla_parse+0x90/0x110
> <4>[   24.476276]  [] genl_family_rcv_msg+0x373/0x3d0
> <4>[   24.477704]  [] ? 
> __kmalloc_node_track_caller+0x6c/0x220
> <4>[   24.478859]  [] genl_rcv_msg+0x44/0x80
> <4>[   24.479987]  [] ? genl_family_rcv_msg+0x3d0/0x3d0
> <4>[   24.481325]  [] netlink_rcv_skb+0xb9/0xe0
> <4>[   24.482466]  [] genl_rcv+0x2c/0x40
> <4>[   24.483554]  [] netlink_unicast+0x12b/0x1c0
> <4>[   24.484739]  [] netlink_sendmsg+0x392/0x6d0
> <4>[   24.485942]  [] sock_sendmsg+0xaf/0xc0
> <4>[   24.486953]  [] ? netlink_sendmsg+0x202/0x6d0
> <4>[   24.487969]  [] ___sys_sendmsg.part.19+0x322/0x330
> <4>[   24.489167]  [] ? SYSC_sendto+0xf9/0x130
> <4>[   24.490217]  [] ___sys_sendmsg+0x4a/0x70
> <4>[   24.491162]  [] __sys_sendmsg+0x49/0x90
> <4>[   24.492082]  [] SyS_sendmsg+0x19/0x20
> <4>[   24.493181]  [] system_call_fastpath+0x12/0x17
> <4>[   24.494124] Code: 40 e9 ea fe ff ff 90 e8 6b 69 ff ff 49 89 c4 e9 07 fe 
> ff ff 4c 89 f7 ff d0 e9 26 ff ff ff 49 c7 04 06 00 00 00 00 e9 3c ff ff ff 
> <0f> 0b ba 00 10 00 00 be 5a 00 00 00 4c 89 ef 48 d3 e2 e8 65 2a
> <1>[   24.496071] RIP  [] new_slab+0x2d4/0x380
> <4>[   24.497152]  RSP 
> <4>[   24.498945] ---[ end trace 6f97360ff4a9ee45 ]---
> ---
>  net/openvswitch/flow_table.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index f2ea83ba4763..c7f74aab34b9 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -93,7 +93,8 @@ struct sw_flow *ovs_flow_alloc(void)
>
> /* Initialize the default

Re: [PATCH v4 3/4] mm, shmem: Add shmem resident memory accounting

2015-10-02 Thread Andrew Morton

On Fri,  2 Oct 2015 15:35:50 +0200 Vlastimil Babka  wrote:

> From: Jerome Marchand 

Changelog is a bit weird.

> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implication in
> actual memory use is quite different.

OK, that's a bunch of stuff about the user interface.

> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.

And that has nothing to do with the user interface.

So now this little reader is all confused.  The patch doesn't actually
address the described problem at all, does it?  It's preparatory stuff
only?  No changes to the kernel's user interface?


Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps

2015-10-02 Thread Andrew Morton

On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka  wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since we will now also count pages that the process in question
> never accessed, but that another process populated and then let be swapped
> out. I believe it is still less confusing and subtle than not showing any
> swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for
> either kind of mapping.
> 
> ...
>
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct 
> address_space *mapping,
>  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t 
> end);
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
>  
> +#ifdef CONFIG_SWAP
> +extern unsigned long shmem_swap_usage(struct inode *inode);
> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> + pgoff_t start, pgoff_t end);
> +#endif

CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.

I'd just do

--- 
a/include/linux/shmem_fs.h~mm-proc-account-for-shmem-swap-in-proc-pid-smaps-fix
+++ a/include/linux/shmem_fs.h
@@ -60,11 +60,9 @@ extern struct page *shmem_read_mapping_p
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t 
end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
-#ifdef CONFIG_SWAP
 extern unsigned long shmem_swap_usage(struct inode *inode);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end);
-#endif
 
 static inline struct page *shmem_read_mapping_page(
struct address_space *mapping, pgoff_t index)


We don't need the ifdefs around declarations and they're a pain to
maintain and they'd add a *ton* of clutter if we even tried to do this
for real.

[PATCH 1/2] ARM: dts: bcm5301x: Add BCM SVK DT files

2015-10-02 Thread Jon Mason

Add device tree files for Broadcom Northstar based SVKs.  Since the
bcm5301x.dtsi already exists, all that is necessary is the dts files to
enable the UARTs (and specify the RAM size for the 4708/9).  With these
files, the SVKs are able to boot to shell.

Signed-off-by: Jon Mason 
---
 arch/arm/boot/dts/Makefile   |  5 +++-
 arch/arm/boot/dts/bcm94708.dts   | 52 +++
 arch/arm/boot/dts/bcm94709.dts   | 52 +++
 arch/arm/boot/dts/bcm953012k.dts | 59 
 4 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/bcm94708.dts
 create mode 100644 arch/arm/boot/dts/bcm94709.dts
 create mode 100644 arch/arm/boot/dts/bcm953012k.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 233159d..96a1b58 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -72,7 +72,10 @@ dtb-$(CONFIG_ARCH_BCM_5301X) += \
bcm47081-buffalo-wzr-900dhp.dtb \
bcm4709-asus-rt-ac87u.dtb \
bcm4709-buffalo-wxr-1900dhp.dtb \
-   bcm4709-netgear-r8000.dtb
+   bcm4709-netgear-r8000.dtb \
+   bcm94708.dtb \
+   bcm94709.dtb \
+   bcm953012k.dtb
 dtb-$(CONFIG_ARCH_BCM_63XX) += \
bcm963138dvt.dtb
 dtb-$(CONFIG_ARCH_BCM_CYGNUS) += \
diff --git a/arch/arm/boot/dts/bcm94708.dts b/arch/arm/boot/dts/bcm94708.dts
new file mode 100644
index 000..57a74c6
--- /dev/null
+++ b/arch/arm/boot/dts/bcm94708.dts
@@ -0,0 +1,52 @@
+/*
+ *  BSD LICENSE
+ *
+ *  Copyright(c) 2015 Broadcom Corporation.  All rights reserved.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ ** Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ ** Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in
+ *  the documentation and/or other materials provided with the
+ *  distribution.
+ ** Neither the name of Broadcom Corporation nor the names of its
+ *  contributors may be used to endorse or promote products derived
+ *  from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/dts-v1/;
+
+#include "bcm5301x.dtsi"
+
+/ {
+   model = "NorthStar SVK (BCM94708)";
+   compatible = "brcm,bcm4708";
+
+   aliases {
+   serial0 = 
+   };
+
+   chosen {
+   bootargs = "console=ttyS0,115200 mem=128M";
+   };
+};
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/bcm94709.dts b/arch/arm/boot/dts/bcm94709.dts
new file mode 100644
index 000..bf4b6e1
--- /dev/null
+++ b/arch/arm/boot/dts/bcm94709.dts
@@ -0,0 +1,52 @@
+/*
+ *  BSD LICENSE
+ *
+ *  Copyright(c) 2015 Broadcom Corporation.  All rights reserved.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ ** Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ ** Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in
+ *  the documentation and/or other materials provided with the
+ *  distribution.
+ ** Neither the name of Broadcom Corporation nor the names of its
+ *  contributors may be used to endorse or promote products derived
+ *  from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,

[PATCH 2/2] dt-bindings: Add new boards to bcm4708 DT bindings

2015-10-02 Thread Jon Mason

Add the 4708, 4709, and 953012k SVKs to the documentation for the
Broadcom Northstar device tree bindings.

Signed-off-by: Jon Mason 
---
 Documentation/devicetree/bindings/arm/bcm/brcm,bcm4708.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/bcm/brcm,bcm4708.txt 
b/Documentation/devicetree/bindings/arm/bcm/brcm,bcm4708.txt
index 6b0f49f..bdf4c06 100644
--- a/Documentation/devicetree/bindings/arm/bcm/brcm,bcm4708.txt
+++ b/Documentation/devicetree/bindings/arm/bcm/brcm,bcm4708.txt
@@ -5,4 +5,11 @@ Boards with the BCM4708 SoC shall have the following 
properties:
 
 Required root node property:
 
+bcm94708
 compatible = "brcm,bcm4708";
+
+bcm94709
+compatible = "brcm,bcm4709", "brcm,bcm4708";
+
+bcm953012k
+compatible = "brcm,bcm5301k", "brcm,bcm4708";
-- 
1.9.1


[PATCH 1/1] irqchip/GICv2m: Add workaround for APM X-Gene GICv2m erratum

2015-10-02 Thread Duc Dang

APM X-Gene GICv2m implementation has an erratum where the
MSI data needs to be the offset from the spi_start in order to
trigger the correct MSI interrupt. This is different from the
standard GICv2m implementation where the MSI data is the absolute
value within the range from spi_start to (spi_start + num_spis)
of each v2m frame.

This patch reads MSI_IIDR register (present in all GICv2m
implementations) to identify X-Gene GICv2m implementation and
apply workaround to change the data portion of MSI vector.
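
The difference between the two encodings can be sketched as a pure function. This is an illustration only: v2m_msi_data() is a hypothetical helper, not part of the driver, and the IIDR value is simply the constant from the patch below, passed in by the caller instead of being read from the register.

```c
#include <stdint.h>

/* Constant taken from the patch; treated here as an opaque identifier. */
#define XGENE_GICV2M_MSI_IIDR 0x06000170u

/* Return the data word to program into the MSI message for 'hwirq'. */
uint32_t v2m_msi_data(uint32_t hwirq, uint32_t spi_start, uint32_t iidr)
{
    if (iidr == XGENE_GICV2M_MSI_IIDR)
        return hwirq - spi_start;   /* erratum: offset within the frame */
    return hwirq;                   /* standard: absolute SPI number */
}
```

With spi_start = 96, hwirq 100 would be sent as 4 on the affected part and as 100 on a standard implementation.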

Signed-off-by: Duc Dang 
---
 drivers/irqchip/irq-gic-v2m.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index db04fc1..470972c 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -43,6 +43,10 @@
 
 #define V2M_MSI_TYPER_NUM_SPI(x)   ((x) & V2M_MSI_TYPER_NUM_MASK)
 
+#define V2M_MSI_IIDR   0xFCC
+/* APM X-Gene with GICv2m MSI_IIDR register value */
+#define XGENE_GICV2M_MSI_IIDR  0x06000170
+
 struct v2m_data {
spinlock_t msi_cnt_lock;
struct resource res;/* GICv2m resource */
@@ -98,6 +102,16 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, 
struct msi_msg *msg)
msg->address_hi = (u32) (addr >> 32);
msg->address_lo = (u32) (addr);
msg->data = data->hwirq;
+   /*
+* APM X-Gene GICv2m implementation has an erratum where
+* the MSI data needs to be the offset from the spi_start
+* in order to trigger the correct MSI interrupt. This is
+* different from the standard GICv2m implementation where
+* the MSI data is the absolute value within the range from
+* spi_start to (spi_start + num_spis).
+*/
+   if (readl_relaxed(v2m->base + V2M_MSI_IIDR) == XGENE_GICV2M_MSI_IIDR)
+   msg->data = data->hwirq - v2m->spi_start;
 }
 
 static struct irq_chip gicv2m_irq_chip = {
-- 
1.9.1


Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()

2015-10-02 Thread Dave Chinner

On Fri, Oct 02, 2015 at 01:29:57PM -0400, Waiman Long wrote:
> In __percpu_counter_compare(), if the current imprecise count is
> within (batch*nr_cpus) of the input value to be compared, a call
> to percpu_counter_sum() will be made to get the precise count. The
> percpu_counter_sum() call, however, can be expensive especially on
> large systems where there are a lot of CPUs. Large systems also make
> it more likely that percpu_counter_sum() will be called.
> 
> The xfs_mod_fdblocks() function calls __percpu_counter_compare()
> twice. First to see if a smaller batch size should be used for
> __percpu_counter_add() and the second call to compare the actual
> size needed. This can potentially lead to 2 calls to the expensive
> percpu_counter_sum() function.

There should not be that much overhead in __percpu_counter_compare()
through this path in normal operation. The slow path is only taken
as you near ENOSPC...

> This patch added an extra argument to __percpu_counter_compare()
> to return the precise count, if computed. The caller will need to
> initialize it to an invalid value that it can tell if the precise
> count is being returned.

This doesn't work. ENOSPC detection is a lockless algorithm that
requires absolute precision. Assuming the XFS_ALLOC_SET_ASIDE()
definition of ENOSPC is 0 blocks free, your change allows this race:

free space: 1 block

thread 1                thread 2                free space
allocate 1 block        allocate 1 block          1
sample pcount = 1                                 1
                        sample pcount = 1         1
add fdblocks, -1, 1)                              0
                        add fdblocks, -1, 1)      -1
if (pcount - 1 >= 0)    if (pcount - 1 >= 0)
   OK!                     OK!                    -1

So, we've just failed to detect ENOSPC correctly. One of those two
threads should have returned ENOSPC and failed the allocation,
but instead we've just allowed XFS to allocate a block that doesn't
exist. Hence we have to resample the percpu counter after the
modification to ensure that we don't miss this race condition.
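
The interleaving above can be replayed in a single-threaded sketch. It is not the XFS code: a plain global stands in for the percpu counter, the two function names are made up for illustration, and the "threads" are simply calls made in the order shown in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Free-block count; a plain variable stands in for the percpu counter. */
int64_t fdblocks;

/* Racy variant: decide using a count sampled before the modification. */
bool alloc_with_stale_sample(int64_t sampled, int64_t delta)
{
    fdblocks -= delta;
    return sampled - delta >= 0;    /* can succeed after space is gone */
}

/* Resampling variant: modify, re-read the counter, undo on failure. */
bool alloc_with_resample(int64_t delta)
{
    fdblocks -= delta;
    if (fdblocks >= 0)
        return true;
    fdblocks += delta;              /* back out and report ENOSPC */
    return false;
}
```

Starting from one free block, two calls to the first variant that both sampled 1 will both succeed and leave the counter at -1; two calls to the second variant let only one allocation through.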

Sure, the current code could race on the second comparison and
return ENOSPC to both threads, but that is a perfectly OK thing
to do. It is vitally important that we don't oversubscribe
filesystem space, because that will lead to all sorts of other
problems (deadlocks, hangs, shutdowns, etc) that are very difficult
to identify the cause of.

FWIW, I'm guessing that you didn't run this patch through xfstests?
xfstests will find these ENOSPC accounting bugs, and usually quite
quickly...

> Running the AIM7 disk workload with XFS filesystem, the jobs/min
> on a 40-core 80-thread 4-socket Haswell-EX system increases from
> 3805k to 4276k (12% increase) with this patch applied. As measured
> by the perf tool, the %CPU cycle consumed by __percpu_counter_sum()
> decreases from 12.64% to 7.08%.

XFS should only hit the slow __percpu_counter_sum() path as
the fs gets close to ENOSPC, which for your system will be less
than:

threshold = num_online_cpus * XFS_FDBLOCKS_BATCH * 2 blocks
  = 80 * 1024 * 2 blocks
  = 163,840 blocks (~160,000)
  = 640MB of disk space.
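
The arithmetic above can be checked with a few lines of C. The helper names are made up for this sketch, and the 4096-byte block size is an assumption: it is not stated in the mail, but it is the size for which the block count works out to 640MB.

```c
#include <stdint.h>

/* Free-space level (in blocks) below which the precise sum is needed. */
uint64_t slow_path_threshold_blocks(uint64_t cpus, uint64_t batch)
{
    return cpus * batch * 2;
}

/* Convert a block count to MiB for a given block size in bytes. */
uint64_t blocks_to_mib(uint64_t blocks, uint64_t block_size)
{
    return blocks * block_size / (1024 * 1024);
}
```

With 80 CPUs and a batch of 1024, the threshold is 163,840 blocks, which is 640 MiB at the assumed block size.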

Having less than 1GB of free space in an XFS filesystem is
considered to be "almost ENOSPC" - when you have TB to PB of space,
less than 1GB really is "moments before ENOSPC".

XFS trades off low overhead for fast path allocation with slowdowns
as we near ENOSPC in allocation routines. It gets harder to find
contiguous free space, files get more fragmented, IO takes longer
because we seek more, etc. Hence we accept that performance slows
down as the need for precision increases as we near ENOSPC.

I'd suggest you retry your benchmark with larger filesystems, and
see what happens...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Andy Lutomirski

On Fri, Oct 2, 2015 at 3:06 PM, Kees Cook  wrote:
> On Fri, Oct 2, 2015 at 3:04 PM, Andy Lutomirski  wrote:
>> On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook  wrote:
>>> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski  wrote:
>>>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook  wrote:
>>>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>>>>  wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>>>> changes, but two highlights are:
>>>>>>
>>>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/
>>>>>>   and will need to be built with that patch applied. This gets rid of two
>>>>>>   incorrect patches in the previous series and is a nicer API.
>>>>>>
>>>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return
>>>>>>   the same struct file across calls, so we still need a kcmp command. I've
>>>>>>   narrowed the scope of the one being added to only compare seccomp fds.
>>>>>>
>>>>>> Thoughts welcome,
>>>>>
>>>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>>>
>>>>> Happy bit:
>>>>> - avoiding eBPF and just saving the original filters makes things much
>>>>>   easier.
>>>>>
>>>>> Sad bit:
>>>>> - inventing a new interface for seccompfds feels like massive overkill to
>>>>>   me.
>>>>>
>>>>> While Andy has big dreams, we're not presently doing seccompfd
>>>>> monitoring, etc. There's no driving user for that kind of interface,
>>>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>>>
>>>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>>>> like we're half way there now):
>>>>>
>>>>> - save the original filter (done!)
>>>>> - extract filters through a single special-purpose interface (looks
>>>>> like ptrace is the way to go: root-only, stopped process, etc)
>>>>> - compare filter content and issue TSYNCs to merge detected sibling
>>>>> threads, since merging things that weren't merged before creates no
>>>>> problems.
>>>>>
>>>>> This means the parenting logic is heuristic, but it's entirely in
>>>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>>>> by design, want to keep as simple as possible.
>>>>
>>>> This is okay with me with a future-proofing caveat: I think that
>>>> whatever reads out the filter should be clearly documented as
>>>> returning some special error code that indicates that that filter it
>>>> tried to read wasn't in the expected form.  That would happen for
>>>> native eBPF filters, and it would also happen for seccomp monitors
>>>> even if those monitors use classic BPF.
>>>
>>> As in, it should have something like "give me BPF" and that'll start
>>> failing when it's only eBPF in the future?
>>
>> Yes, but it might also start failing when if my dreams come true, it's
>> still classic BPF, but it's no longer a classic seccomp bpf filter
>> layer with the semantics we expect today.  (E.g. if it's classic bpf
>> but has a monitor attached, then the read should fail because
>> restoring it without restoring the monitor will cause all kinds of
>> mess.)
>
> Ah-ha! Understood, and yeah, that seems fine.
>
> Speaking of dreams -- what do you think about re-running seccomp in
> the face of changed syscalls due to ptrace? Closing the ptrace hole
> would be really nice.

Yes, absolutely!  We might even want to just move the seccomp check
after ptrace (except for seccomp-induced ptrace).

Unfortunately, I backed us into a corner with two-phase seccomp on
x86, and it's a big mess.  (I wrote the seccomp vs ptrace patches, and
I don't think they're acceptable.)  My big x86 low-level rewrite is an
attempt to get back out of that corner, and I'm hoping to resubmit the
bulk of it today or tomorrow.  Once that happens, I just need to fix
up the 64-bit native case (trivial, I know) and then revert two-phase
seccomp.

One nice outcome of all of this will be that the syscall tables will
contain bona fide C ABI compliant function pointers, which is
currently not the case.

--Andy

Re: [RFC] arm: Add generic smc wrapper

2015-10-02 Thread Arnd Bergmann

On Friday 02 October 2015 13:13:24 kbuild test robot wrote:
> 
> 15  
> 16  #include 
> 17  
> 18  /* int invoke_smc(u32 function_id, u32 arg0, u32 arg1, u32 arg2) */
> 19  ENTRY(invoke_smc)
> 20  __SMC(0)
>   > 21  bx  lr
> 22  ENDPROC(invoke_smc)
> 
> 

Maybe compile this only for ARMv6 and ARMv7? The error was for an ARMv3/v4
build, which has neither SMC nor bx.

Arnd

Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

2015-10-02 Thread Logan Gunthorpe


Hi Dan,

Good to know you've already addressed the struct page issue. We'll watch 
out for an updated patchset to try.



On 02/10/15 03:53 PM, Dan Williams wrote:
> Hmm, I didn't have peer-to-peer PCI-E in mind for this mechanism, but
> the test report is welcome nonetheless.  The definition of dma_addr_t
> is the device view of host memory, not necessarily the device view of
> a peer device's memory range, so I expect you'll run into issues with
> IOMMUs and other parts of the kernel that assume this definition.


Yeah, we've actually been doing this with a number of more "hacky" 
techniques for some time. ZONE_DEVICE just provides us with a much 
cleaner way to set this up that doesn't require patching around 
get_user_pages in various places in the kernel.


We've never had any issues with the IOMMU getting in the way (at least 
on Intel x86). My understanding always was that the IOMMU sits between a 
PCI card and main memory; it doesn't get in the way of peer-to-peer 
transfers. Though admittedly, I don't have a complete understanding of 
how the IOMMU works in the kernel. I'm just speaking from experimental 
experience. We've never actually tried this on other architectures.


Thanks,

Logan

Re: [PATCH] staging/lustre: Make nrs_policy_get_info_locked() static

2015-10-02 Thread Arnd Bergmann

On Saturday 03 October 2015 06:10:12 kbuild test robot wrote:
> >> drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c:456:13: error: static 
> >> declaration of 'nrs_policy_get_info_locked' follows non-static declaration
> static void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
> ^
>In file included from 
> drivers/staging/lustre/lustre/ptlrpc/../include/lustre_lib.h:64:0,
> from 
> drivers/staging/lustre/lustre/ptlrpc/../include/obd.h:52,
> from 
> drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c:40:
>drivers/staging/lustre/lustre/ptlrpc/../include/lustre_net.h:1542:6: note: 
> previous declaration of 'nrs_policy_get_info_locked' was here
> void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
>  ^
> 

fwiw, the patch should be fine on staging-testing, just not on mainline at the 
moment.

Arnd

Re: [PATCH] staging/lustre: Make nrs_policy_get_info_locked() static

2015-10-02 Thread kbuild test robot

Hi Rocco,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
ignore]

config: sparc64-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

>> drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c:456:13: error: static 
>> declaration of 'nrs_policy_get_info_locked' follows non-static declaration
static void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
^
   In file included from 
drivers/staging/lustre/lustre/ptlrpc/../include/lustre_lib.h:64:0,
from 
drivers/staging/lustre/lustre/ptlrpc/../include/obd.h:52,
from drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c:40:
   drivers/staging/lustre/lustre/ptlrpc/../include/lustre_net.h:1542:6: note: 
previous declaration of 'nrs_policy_get_info_locked' was here
void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
 ^

vim +/nrs_policy_get_info_locked +456 
drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c

   450   *
   451   * Information is copied in \a info.
   452   *
   453   * \param[in] policy The policy
   454   * \param[out] info  Holds returned status information
   455   */
 > 456  static void nrs_policy_get_info_locked(struct ptlrpc_nrs_policy *policy,
   457  struct ptlrpc_nrs_pol_info *info)
   458  {
   459  LASSERT(policy != NULL);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data

Re: Missing operand for tlbie instruction on Power7

2015-10-02 Thread Laura Abbott


On 10/02/2015 03:00 PM, Segher Boessenkool wrote:
> On Sat, Oct 03, 2015 at 12:37:35AM +0300, Denis Kirjanov wrote:
>>>> -0: tlbie   r4; \
>>>> +0: tlbie   r4, 0;  \
>>>
>>> This isn't correct.  With POWER7 and later (which this compile
>>> is, since it's on LE), the tlbie instruction takes two register
>>> operands:
>>>
>>>     tlbie RB, RS
>>>
>>> The tlbie instruction on pre POWER7 cpus had one required register
>>> operand (RB) and an optional second L operand, where if you omitted
>>> it, it was the same as using "0":
>>>
>>>     tlbie RB, L
>>>
>>> This is a POWER7 and later build, so your change which adds the "0"
>>> above is really adding r0 for RS.  The new tlbie instruction doesn't
>>> treat r0 specially, so you'll be using whatever random bits which
>>> happen to be in r0 which I don't think that is what you want.
>>
>> Ok, then we can just zero out r5 for example and use it in tlbie as RS,
>> right?
>
> That won't assemble _unless_ your assembler is in POWER7 mode.  It also
> won't do the right thing at run time on older machines.
>
> Where is this tlbia macro used at all, for 64-bit machines?

[labbott@labbott-redhat-machine linux_upstream]$ make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu-
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  CALL    arch/powerpc/kernel/systbl_chk.sh
  AS      arch/powerpc/kernel/swsusp_asm64.o
arch/powerpc/kernel/swsusp_asm64.S: Assembler messages:
arch/powerpc/kernel/swsusp_asm64.S:188: Error: missing operand
scripts/Makefile.build:294: recipe for target 'arch/powerpc/kernel/swsusp_asm64.o' failed
make[1]: *** [arch/powerpc/kernel/swsusp_asm64.o] Error 1
Makefile:941: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2

This is a piece of code protected by CONFIG_PPC_BOOK3S_64.

> Segher




Re: [PATCH 2/2] pinctrl: pinconf-generic: add "input-schmitt" DT property

2015-10-02 Thread Linus Walleij

On Wed, Sep 30, 2015 at 5:07 AM, Masahiro Yamada
 wrote:

> PIN_CONFIG_INPUT_SCHMITT is defined in enum_pin_config_param,
> but the corresponding DT property is missing.
>
> Signed-off-by: Masahiro Yamada 

Patch applied.

Yours,
Linus Walleij

Re: [PATCH 1/2] pinctrl: pinconf-generic: sort pin configuration params alphabetically

2015-10-02 Thread Linus Walleij

On Wed, Sep 30, 2015 at 5:07 AM, Masahiro Yamada
 wrote:

> Currently, the dt_params array in drivers/pinctrl/pinconf-generic.c
> is not sorted in the same order as the enum pin_config_param in
> include/linux/pinctrl/pinconf-generic.h.
>
> Sort enum pin_config_param, conf_items, dt_params, alphabetically
> for consistency.
>
> Signed-off-by: Masahiro Yamada 

Patch applied.

Yours,
Linus Walleij

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook

On Fri, Oct 2, 2015 at 3:04 PM, Andy Lutomirski  wrote:
> On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook  wrote:
>> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski  wrote:
>>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook  wrote:
>>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen wrote:
>>>>> Hi all,
>>>>>
>>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>>> changes, but two highlights are:
>>>>>
>>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>>>   patches in the previous series and is a nicer API.
>>>>>
>>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>>>   the scope of the one being added to only compare seccomp fds.
>>>>>
>>>>> Thoughts welcome,
>>>>
>>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>>
>>>> Happy bit:
>>>> - avoiding eBPF and just saving the original filters makes things much easier.
>>>>
>>>> Sad bit:
>>>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>>>
>>>> While Andy has big dreams, we're not presently doing seccompfd
>>>> monitoring, etc. There's no driving user for that kind of interface,
>>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>>
>>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>>> like we're half way there now):
>>>>
>>>> - save the original filter (done!)
>>>> - extract filters through a single special-purpose interface (looks
>>>> like ptrace is the way to go: root-only, stopped process, etc)
>>>> - compare filter content and issue TSYNCs to merge detected sibling
>>>> threads, since merging things that weren't merged before creates no
>>>> problems.
>>>>
>>>> This means the parenting logic is heuristic, but it's entirely in
>>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>>> by design, want to keep as simple as possible.
>>>
>>> This is okay with me with a future-proofing caveat: I think that
>>> whatever reads out the filter should be clearly documented as
>>> returning some special error code that indicates that that filter it
>>> tried to read wasn't in the expected form.  That would happen for
>>> native eBPF filters, and it would also happen for seccomp monitors
>>> even if those monitors use classic BPF.
>>
>> As in, it should have something like "give me BPF" and that'll start
>> failing when it's only eBPF in the future?
>
> Yes, but it might also start failing when, if my dreams come true, it's
> still classic BPF, but it's no longer a classic seccomp bpf filter
> layer with the semantics we expect today.  (E.g. if it's classic bpf
> but has a monitor attached, then the read should fail because
> restoring it without restoring the monitor will cause all kinds of
> mess.)

Ah-ha! Understood, and yeah, that seems fine.

Speaking of dreams -- what do you think about re-running seccomp in
the face of changed syscalls due to ptrace? Closing the ptrace hole
would be really nice.

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH v6 01/53] sparc/PCI: Add mem64 resource parsing for root bus

2015-10-02 Thread Yinghai Lu

On Fri, Oct 2, 2015 at 1:00 PM, Khalid Aziz  wrote:
> On Wed, 2015-09-30 at 22:52 -0700, Yinghai Lu wrote:
>> Found "no compatible bridge window" warning in boot log from T5-8.
>>
>> pci :00:01.0: can't claim BAR 15 [mem 0x1-0x4afff pref]: no 
>> compatible bridge window
>>
>> That resource is above 4G, but does not get offset correctly as
>> root bus only report io and mem32.
>>
>> pci_sun4v f02dbcfc: PCI host bridge to bus :00
>> pci_bus :00: root bus resource [io  0x8040-0x80400fff] (bus 
>> address [0x-0xfff])
>> pci_bus :00: root bus resource [mem 0x8000-0x80007eff] (bus 
>> address [0x-0x7eff])
>> pci_bus :00: root bus resource [bus 00-77]
>>
>> Add mem64 handling in pci_common for sparc, so we can have 64bit resource
>> registered for root bus at first.
>>
>> After patch, will have:
>> pci_sun4v f02dbcfc: PCI host bridge to bus :00
>> pci_bus :00: root bus resource [io  0x8040-0x80400fff] (bus 
>> address [0x-0xfff])
>> pci_bus :00: root bus resource [mem 0x8000-0x80007eff] (bus 
>> address [0x-0x7eff])
>> pci_bus :00: root bus resource [mem 0x8001-0x8007] (bus 
>> address [0x1-0x7])
>> pci_bus :00: root bus resource [bus 00-77]
>>
>> -v2: mem64_space should use mem_space.start as offset.
>> -v3: add IORESOURCE_MEM_64 flag
...
> PCI: Scanning PBM /pci@301
> pci_sun4v f0339c2c: PCI host bridge to bus 0009:00
> pci_bus 0009:00: root bus resource [io  0x2027e4000-0x2027e4fff] (bus 
> address [0x-0xfff])
> pci_bus 0009:00: root bus resource [mem 0x202400010-0x202407eff] (bus 
> address [0x-0x7eef])
> pci_bus 0009:00: root bus resource [mem 0x20241-0x2024d] (bus 
> address [0xfff0-0xdffef])

Looks like offset for mmio64 is not right.

Please check the attached patch on this platform and on T5-8.

Thanks

Yinghai
---
 arch/sparc/kernel/pci.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/sparc/kernel/pci.c
===
--- linux-2.6.orig/arch/sparc/kernel/pci.c
+++ linux-2.6/arch/sparc/kernel/pci.c
@@ -659,7 +659,7 @@ struct pci_bus *pci_scan_one_pbm(struct
 pbm->mem_space.start);
 	if (pbm->mem64_space.flags)
 		pci_add_resource_offset(&resources, &pbm->mem64_space,
-	pbm->mem_space.start);
+	pbm->mem64_space.start);
 	pbm->busn.start = pbm->pci_first_busno;
 	pbm->busn.end	= pbm->pci_last_busno;
 	pbm->busn.flags	= IORESOURCE_BUS;
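
The offset chosen in the hunk above decides which bus addresses the 64-bit window is reported with. A minimal sketch of that arithmetic, assuming the usual relation bus address = CPU address - offset for a host bridge window. The helper name cpu_to_bus() and the addresses used in the usage note are made up for illustration and are not taken from the boot logs in this thread:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): translate a CPU address
 * inside a host bridge window into a bus address, given the offset that
 * was registered for that window.
 */
uint64_t cpu_to_bus(uint64_t cpu_addr, uint64_t offset)
{
	/* A window can never start below its own offset. */
	assert(cpu_addr >= offset);
	return cpu_addr - offset;
}
```

With an illustrative mem32 window starting at 0x800000000000 and a mem64 window starting at 0x800100000000, registering the mem64 window with the mem32 start as its offset makes its bus addresses begin at 0x100000000, while registering it with its own start makes them begin at 0. Which of the two matches the hardware is exactly what the test request above is meant to find out.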

Re: [fuse-devel] [PATCH] fuse: break infinite loop in fuse_fill_write_pages()

2015-10-02 Thread Andrew Morton

On Fri, 2 Oct 2015 12:27:45 -0700 Maxim Patlasov  
wrote:

> On 10/02/2015 04:21 AM, Konstantin Khlebnikov wrote:
> > Bump. Add more people in CC.
> >
> > On Mon, Sep 21, 2015 at 1:02 PM, Roman Gushchin  
> > wrote:
> >> I got a report about an unkillable task eating CPU. Further
> >> investigation shows that the problem is in the fuse_fill_write_pages()
> >> function. If iov's first segment has zero length, we get an infinite
> >> loop, because we never reach iov_iter_advance() call.
> 
> iov_iter_copy_from_user_atomic() eventually calls iterate_iovec(). The 
> latter silently consumes zero-length iov. So I don't think "iov's first 
> segment has zero length" can cause infinite loop.

I'm suspecting it got stuck because local variable `bytes' is zero, so
the code does `goto again' repeatedly.

Or maybe not.   A more complete description of the bug would help. 
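
For illustration only, the suspicion above can be modelled in userspace. This sketch is not the fuse code: fill_pages_model(), seg_len and max_iters are hypothetical names, and the zero-byte copy is simply assumed rather than shown to happen. It only demonstrates why a "retry when nothing was copied" path that does not advance the iterator makes no progress:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a "copy, then retry if nothing was copied" loop.
 * seg_len[] holds the segment lengths; max_iters bounds the loop so the
 * sketch always terminates. Returns the number of iterations used, or -1
 * if the bound was hit without finishing.
 */
int fill_pages_model(const size_t *seg_len, size_t nsegs, int max_iters)
{
	size_t seg = 0;
	int iters = 0;

	assert(seg_len != NULL);

	while (seg < nsegs) {
		size_t bytes;

again:
		if (++iters > max_iters)
			return -1;	/* an unbounded loop would spin here forever */

		bytes = seg_len[seg];	/* stand-in for the atomic copy */
		if (bytes == 0)
			goto again;	/* retry without advancing: no progress */

		seg++;			/* stand-in for advancing the iterator */
	}
	return iters;
}
```

Calling it with segment lengths {0, 4096} hits the iteration bound, while {4096, 4096} finishes in two iterations. Whether the real copy can return zero for a zero-length first segment is the open question in this thread.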

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Andy Lutomirski

On Fri, Oct 2, 2015 at 3:02 PM, Kees Cook  wrote:
> On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski  wrote:
>> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook  wrote:
>>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>>  wrote:
>>>> Hi all,
>>>>
>>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>>> changes, but two highlights are:
>>>>
>>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>>   will need to be built with that patch applied. This gets rid of two incorrect
>>>>   patches in the previous series and is a nicer API.
>>>>
>>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>>   same struct file across calls, so we still need a kcmp command. I've narrowed
>>>>   the scope of the one being added to only compare seccomp fds.
>>>>
>>>> Thoughts welcome,
>>>
>>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>>
>>> Happy bit:
>>> - avoiding eBPF and just saving the original filters makes things much 
>>> easier.
>>>
>>> Sad bit:
>>> - inventing a new interface for seccompfds feels like massive overkill to 
>>> me.
>>>
>>> While Andy has big dreams, we're not presently doing seccompfd
>>> monitoring, etc. There's no driving user for that kind of interface,
>>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>>
>>> So, I'll go back to what I originally proposed at LSS (which it looks
>>> like we're half way there now):
>>>
>>> - save the original filter (done!)
>>> - extract filters through a single special-purpose interface (looks
>>> like ptrace is the way to go: root-only, stopped process, etc)
>>> - compare filter content and issue TSYNCs to merge detected sibling
>>> threads, since merging things that weren't merged before creates no
>>> problems.
>>>
>>> This means the parenting logic is heuristic, but it's entirely in
>>> userspace, so the complexity burden doesn't live in seccomp which we,
>>> by design, want to keep as simple as possible.
>>
>> This is okay with me with a future-proofing caveat: I think that
>> whatever reads out the filter should be clearly documented as
>> returning some special error code that indicates that that filter it
>> tried to read wasn't in the expected form.  That would happen for
>> native eBPF filters, and it would also happen for seccomp monitors
>> even if those monitors use classic BPF.
>
> As in, it should have something like "give me BPF" and that'll start
> failing when it's only eBPF in the future?

Yes, but it might also start failing when, if my dreams come true, it's
still classic BPF, but it's no longer a classic seccomp bpf filter
layer with the semantics we expect today.  (E.g. if it's classic bpf
but has a monitor attached, then the read should fail because
restoring it without restoring the monitor will cause all kinds of
mess.)
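
A rough sketch of that read-out rule, as a userspace model. Everything here is hypothetical: struct filter_layer, read_filter_model() and the choice of ENOTSUP are placeholders, since the thread only asks for "some special error code" and does not pick one:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of one filter layer. */
struct filter_layer {
	bool has_classic_prog;	/* the original classic BPF program was saved */
	bool has_monitor;	/* a (future, hypothetical) monitor is attached */
	size_t insn_count;	/* number of classic BPF instructions */
};

/* Placeholder for the "special error code"; the real value is undecided. */
#define FILTER_NOT_CLASSIC_FORM ENOTSUP

/*
 * Returns the instruction count on success, or a negative error code when
 * the layer cannot be faithfully described as a plain classic filter.
 */
long read_filter_model(const struct filter_layer *layer)
{
	assert(layer != NULL);

	if (!layer->has_classic_prog)
		return -FILTER_NOT_CLASSIC_FORM;	/* e.g. a native eBPF filter */
	if (layer->has_monitor)
		return -FILTER_NOT_CLASSIC_FORM;	/* classic BPF, but extra semantics */

	return (long)layer->insn_count;
}
```

The point of the second check is that a dump which silently drops the monitor would restore a filter with different behaviour, so failing loudly is the safer contract for a checkpoint/restore tool.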

--Andy

Re: v5 of seccomp filter c/r patches

2015-10-02 Thread Kees Cook

On Fri, Oct 2, 2015 at 2:29 PM, Andy Lutomirski  wrote:
> On Fri, Oct 2, 2015 at 2:10 PM, Kees Cook  wrote:
>> On Fri, Oct 2, 2015 at 9:27 AM, Tycho Andersen
>>  wrote:
>>> Hi all,
>>>
>>> Here's v5 of the seccomp filter c/r set. The individual patch notes have
>>> changes, but two highlights are:
>>>
>>> * This series is now based on http://patchwork.ozlabs.org/patch/525492/ and
>>>   will need to be built with that patch applied. This gets rid of two 
>>> incorrect
>>>   patches in the previous series and is a nicer API.
>>>
>>> * I couldn't figure out a nice way to have SECCOMP_GET_FILTER_FD return the
>>>   same struct file across calls, so we still need a kcmp command. I've 
>>> narrowed
>>>   the scope of the one being added to only compare seccomp fds.
>>>
>>> Thoughts welcome,
>>
>> Hi, sorry I've been slow/busy. I'm finally reading through these threads.
>>
>> Happy bit:
>> - avoiding eBPF and just saving the original filters makes things much 
>> easier.
>>
>> Sad bit:
>> - inventing a new interface for seccompfds feels like massive overkill to me.
>>
>> While Andy has big dreams, we're not presently doing seccompfd
>> monitoring, etc. There's no driving user for that kind of interface,
>> and accepting the maintenance burden of it only for CRIU seems unwise.
>>
>> So, I'll go back to what I originally proposed at LSS (which it looks
>> like we're half way there now):
>>
>> - save the original filter (done!)
>> - extract filters through a single special-purpose interface (looks
>> like ptrace is the way to go: root-only, stopped process, etc)
>> - compare filter content and issue TSYNCs to merge detected sibling
>> threads, since merging things that weren't merged before creates no
>> problems.
>>
>> This means the parenting logic is heuristic, but it's entirely in
>> userspace, so the complexity burden doesn't live in seccomp which we,
>> by design, want to keep as simple as possible.
>
> This is okay with me with a future-proofing caveat: I think that
> whatever reads out the filter should be clearly documented as
> returning some special error code that indicates that that filter it
> tried to read wasn't in the expected form.  That would happen for
> native eBPF filters, and it would also happen for seccomp monitors
> even if those monitors use classic BPF.

As in, it should have something like "give me BPF" and that'll start
failing when it's only eBPF in the future?

-Kees

-- 
Kees Cook
Chrome OS Security

Re: Missing operand for tlbie instruction on Power7

2015-10-02 Thread Segher Boessenkool

On Sat, Oct 03, 2015 at 12:37:35AM +0300, Denis Kirjanov wrote:
> >> -0: tlbie   r4; \
> >> +0: tlbie   r4, 0;  \
> >
> > This isn't correct.  With POWER7 and later (which this compile
> > is, since it's on LE), the tlbie instruction takes two register
> > operands:
> >
> > tlbie RB, RS
> >
> > The tlbie instruction on pre POWER7 cpus had one required register
> > operand (RB) and an optional second L operand, where if you omitted
> > it, it was the same as using "0":
> >
> > tlbie RB, L
> >
> > This is a POWER7 and later build, so your change which adds the "0"
> > above is really adding r0 for RS.  The new tlbie instruction doesn't
> > treat r0 specially, so you'll be using whatever random bits which
> > happen to be in r0 which I don't think that is what you want.
> 
> Ok, then we can just zero out r5 for example and use it in tlbie as RS,
> right?

That won't assemble _unless_ your assembler is in POWER7 mode.  It also
won't do the right thing at run time on older machines.

Where is this tlbia macro used at all, for 64-bit machines?


Segher

Re: [PATCH] staging/lustre: Make nrs_policy_get_info_locked() static

2015-10-02 Thread Arnd Bergmann

On Friday 02 October 2015 23:54:26 Rocco Folino wrote:
> This patch fixes the warning generated by sparse: "symbol 
> 'nrs_policy_get_info_locked' was not
> declared. Should it be static?" by declaring the function static.
> 
> Signed-off-by: Rocco Folino 
> 

Reviewed-by: Arnd Bergmann 

This probably triggered a sparse warning after the unused declaration was
removed in my "staging/lustre: remove lots of dead code".

RE: [PATCH] x86: guest: rely on leaf 0x40000001 to detect Hyper-V

2015-10-02 Thread KY Srinivasan



> -----Original Message-----
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Friday, October 2, 2015 1:07 PM
> To: KY Srinivasan 
> Cc: Paolo Bonzini ; linux-kernel@vger.kernel.org;
> Haiyang Zhang ; x...@kernel.org;
> de...@linuxdriverproject.org; alex.william...@redhat.com
> Subject: RE: [PATCH] x86: guest: rely on leaf 0x40000001 to detect Hyper-V
> 
> On Fri, 2 Oct 2015, KY Srinivasan wrote:
> > > Change ms_hyperv_platform to actually do what the specification suggests.
> > > This roughly matches what Windows looks for, though Windows actually
> > > ignores HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS completely.
> > >
> > > Signed-off-by: Paolo Bonzini 
> >
> > Thanks Paolo.
> >
> > Signed-off-by: K. Y. Srinivasan 
> 
> Does that mean Acked-by or Reviewed-by? SOB certainly does not apply
> here.
Thomas,

I have reviewed the patch and am acking this change.

Regards,

K. Y
> 
> Thanks,
> 
>   tglx

Re: [PATCH v7 5/6] Documentation: dt-bindings: pci: altera pcie device tree binding

2015-10-02 Thread Arnd Bergmann

On Friday 02 October 2015 15:53:44 Ley Foon Tan wrote:
> > Strictly speaking, if you have undocumented bindings downstream that
> > is your problem and we don't have to accept them as-is upstream. I'm
> > not going to worry about that here.
> >
> >>> txs contains the config space?
> >> It is not the config space, but a memory slave port.
> >
> > Then where is the config space? It should not be part of "ranges" is
> > all I care about.
> The config space is not part of "ranges". Our IP uses TLP packet to
> access config space.
> 

It took me a bit to figure out what you mean here. To save others
from reading the source, here is what I found:

* The config space is accessed indirectly through registers from the
  "cra" register range, which is the right approach according to the
  point that Rob made.
* hardware-wise this basically looks like bit-banged PCIe, which is
  both awesome and scary ;-)

Arnd
