Re: [PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay

2018-12-04 Thread Roland Dreier
>  /*
>   * Proper multiline comments look like this not like
>   * the above.
>   */

Got it, will fix next time around.

> That aside. Why are you trying to do heuristics on the delta?
>
> We have way better information than that. The watchdog timer expiry time is
> known and we can determine the exact delay of the timer.
>
> The watchdog clocksource provides the maximum 'idle' time, i.e. the time
> between two reads, in clocksource::max_idle_ns. That value is filled in
> when the clocksource is configured.
>
> So without doing speculation we can make an informed decision:
>
> elapsed = jiffies_to_nsec(jiffies - watchdog_timer->expires) +
>   WATCHDOG_INTERVAL_NS;
>
> if (elapsed > wdcs->max_idle_ns) {
> Skip ..
> }

Yes, that makes more sense than what I was doing, although I'm not
sure about the details.  I just missed that idea.

Why are you adding the watchdog interval to the calculated elapsed
time?  It seems to me the problem case is exactly when jiffies -
watchdog_timer->expires is too big, without also adding the interval
we meant to wait.  Also, I think we need to make sure that jiffies
is >= the expires time before subtracting - or can a timer never
fire one jiffy early?

Also, for full generality it seems we should check against the
max_idle_ns of the clocksource being checked as well - on x86 the TSC
is wider than the HPET, but there may be other architectures that
could hit the same problem, just with the clocksource being checked
wrapping around instead of the watchdog clocksource.  Right?
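
To make sure I'm reading the suggestion right, here is a rough userspace
sketch of the check.  Everything in it is an assumption for illustration:
struct fake_cs, the SKETCH_* constants and the helper names are made up and
are not the kernel's definitions.  It keeps the interval in the sum as in
the quoted suggestion, clamps a timer that appears to fire early, and
compares against the smaller max_idle_ns of the two clocksources:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HZ           250u                      /* assumed tick rate */
#define SKETCH_NSEC_PER_SEC 1000000000ull
#define SKETCH_INTERVAL_NS  (SKETCH_NSEC_PER_SEC / 2) /* assumed 0.5 s watchdog interval */

/* Hypothetical stand-in for the one clocksource field used here. */
struct fake_cs {
	uint64_t max_idle_ns;
};

static uint64_t sketch_jiffies_to_ns(uint64_t j)
{
	return j * (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}

/*
 * Return true if the time since the previous watchdog read may exceed what
 * either clocksource can measure without wrapping, in which case the
 * comparison for this iteration should be skipped.
 */
static bool watchdog_delay_too_long(uint64_t now_jiffies, uint64_t expires_jiffies,
				    const struct fake_cs *wd, const struct fake_cs *cs)
{
	/* Treat a timer that seems to fire early as having no extra delay. */
	uint64_t late = now_jiffies > expires_jiffies ?
			now_jiffies - expires_jiffies : 0;
	/* As in the quoted suggestion: lateness plus the programmed interval. */
	uint64_t elapsed = sketch_jiffies_to_ns(late) + SKETCH_INTERVAL_NS;
	/* Use the tighter of the two limits so either side wrapping is caught. */
	uint64_t limit = wd->max_idle_ns < cs->max_idle_ns ?
			 wd->max_idle_ns : cs->max_idle_ns;

	return elapsed > limit;
}
```

With a 100 second limit on the watchdog side, a timer that runs on time is
checked as usual, while one that fires 200 seconds late is skipped.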

Thanks!
  Roland


[tip:x86/timers] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-12-04 Thread tip-bot for Roland Dreier
Commit-ID:  d999c0ec2498e54b9328db6b2c1037710025add1
Gitweb: https://git.kernel.org/tip/d999c0ec2498e54b9328db6b2c1037710025add1
Author: Roland Dreier 
AuthorDate: Fri, 30 Nov 2018 13:14:50 -0800
Committer:  Borislav Petkov 
CommitDate: Tue, 4 Dec 2018 12:17:21 +0100

x86/hpet: Remove unused FSEC_PER_NSEC define

The FSEC_PER_NSEC macro has had zero users since commit

  ab0e08f15d23 ("x86: hpet: Cleanup the clockevents init and register code").

Remove it.

Signed-off-by: Roland Dreier 
Signed-off-by: Borislav Petkov 
Acked-by: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20181130211450.5200-1-rol...@purestorage.com
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8


[PATCH] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-12-04 Thread Roland Dreier
The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23
("x86: hpet: Cleanup the clockevents init and register code").

Signed-off-by: Roland Dreier 
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8
-- 
2.19.1



[PATCH] clocksource: Add heuristics to avoid switching away from TSC due to timer delay

2018-11-30 Thread Roland Dreier
On a modern x86 system, the TSC is used as a clocksource, with HPET
used in the clocksource watchdog to make sure that the TSC is stable.

If the clocksource watchdog_timer is delayed for an extremely long
time (for example if softirqs are being serviced in ksoftirqd, and
realtime threads are starving ksoftirqd), then the 32-bit HPET counter
may wrap around.  For example, with an HPET running at 24 MHz, 2^32
cycles is about 179 seconds - a long time for timers to be starved,
but possible with a poorly behaved realtime thread.
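
For reference, the wraparound time quoted above is just the counter range
divided by the counter frequency.  A tiny C helper shows the arithmetic;
the function name is made up for illustration and the 24 MHz rate is only
the example figure used here:

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter of the given width wraps.
 * This sketch is only valid for bits < 64.
 */
static double counter_wrap_seconds(unsigned int bits, double freq_hz)
{
	return (double)(UINT64_C(1) << bits) / freq_hz;
}
```

counter_wrap_seconds(32, 24e6) comes out just under 179 seconds, which is
the window the watchdog timer has to stay inside.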

If this happens, since the TSC is a 64-bit counter and won't wrap, the
watchdog will detect skew - the TSC interval will be 179 seconds
longer than the HPET interval - and will mark the TSC as unstable.
This causes the system to switch to the HPET as a clocksource, which
has a huge negative performance impact.

In this case, switching to the HPET turns a bad situation (starved
timers) that the system might recover from into a permanently worse
one (more expensive clock_gettime() calls), all because of a spurious
detection of TSC instability.

To improve this, add some heuristics to detect cases where the
watchdog is delayed long enough for the instability detection to be
likely to be wrong:

 - If the clocksource being tested (e.g. TSC) has counted so many cycles
   that converting to nsecs will overflow multiplication, *AND* the
   watchdog clocksource (e.g. HPET) shows that the watchdog timer has
   missed its interval by at least a factor of 3, skip marking the
   clocksource as unstable for a timer iteration.  This is not
   perfect - for example it is possible for the watchdog clocksource
   to wrap around and show a small interval - but at least in the
   specific x86 case it is unlikely, since the watchdog interval is a
   small fraction of the wraparound interval.

 - If there is a skew between the clocksource being tested and the
   watchdog clocksource that is at least as big as the wraparound
   interval for the watchdog clocksource, then don't mark the
   clocksource as unstable.  Again, this might fail to mark a
   clocksource as unstable for one iteration, but it is unlikely that
   the instability is bad enough that we will see a larger skew than
   the wraparound interval for many iterations.

These heuristics are imperfect but are chosen to make false detection
of instability much less likely, while leaving detection of true
instability very likely within a few clocksource watchdog iterations.
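
As a rough userspace model of the two checks described above (not the
patch itself): struct wd_sample and both function names are hypothetical,
and all values are passed in as plain nanosecond or cycle counts rather
than read from real clocksources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs for one watchdog iteration; not kernel types. */
struct wd_sample {
	uint64_t cs_delta_cycles; /* cycles counted by the clocksource under test */
	uint64_t cs_max_cycles;   /* largest delta that converts to ns safely */
	int64_t  cs_nsec;         /* interval seen by the clocksource under test */
	int64_t  wd_nsec;         /* interval seen by the watchdog clocksource */
	int64_t  wd_wrap_nsec;    /* watchdog wraparound interval in ns */
	int64_t  interval_nsec;   /* nominal watchdog timer interval */
	int64_t  threshold_nsec;  /* skew allowed before declaring instability */
};

/* Heuristic 1: delta too large to convert and the timer overslept by 3x. */
static bool skip_for_overflow(const struct wd_sample *s)
{
	return s->cs_delta_cycles > s->cs_max_cycles &&
	       s->wd_nsec > 3 * s->interval_nsec;
}

/* Heuristic 2: skew is about as large as the watchdog's wraparound interval. */
static bool skip_for_wrap(const struct wd_sample *s)
{
	int64_t skew = s->cs_nsec - s->wd_nsec;

	if (skew < 0)
		skew = -skew;
	return skew > s->wd_wrap_nsec - s->threshold_nsec;
}
```

If either function returns true, the iteration is skipped instead of the
clocksource being marked unstable.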

Signed-off-by: Roland Dreier 
---
 kernel/time/clocksource.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ffe081623aec..f1b3d8ff2437 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -243,12 +243,47 @@ static void clocksource_watchdog(struct timer_list *unused)
 watchdog->shift);
 
delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
+
+   /* If the cycle delta is beyond what we can safely
+* convert to nsecs, and the watchdog clocksource
+* suggests that we've overslept, skip checking this
+* iteration to avoid marking a clocksource as
+* unstable because of a severely delayed timer. */
+   if (delta > cs->max_cycles &&
+   wd_nsec > 3 * jiffies_to_nsecs(WATCHDOG_INTERVAL)) {
+   pr_warn("timekeeping watchdog: Clocksource '%s' not checked due to apparent long timer delay:\n",
+   cs->name);
+   pr_warn("  Delta %llx > max_cycles %llx, wd_nsec %lld\n",
+   delta, cs->max_cycles, wd_nsec);
+   continue;
+   }
+
cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
wdlast = cs->wd_last; /* save these in case we print them */
cslast = cs->cs_last;
cs->cs_last = csnow;
cs->wd_last = wdnow;
 
+   /* If the clocksource interval is far off from the
+* watchdog clocksource interval but the interval is
+* big enough that the watchdog may have wrapped
+* around (again due to a severely delayed timer),
+* skip this iteration.  For example, this saves us
+* from marking the TSC as unstable just because the
+* 32-bit HPET wrapped around on x86. */
+   if (abs(cs_nsec - wd_nsec) >
+   clocksource_cyc2ns(watchdog->max_cycles, watchdog->mult,
+  watchdog->shift) - WATCHDOG_THRESHOLD) {
+   pr_warn("timekeeping watchdog: Clocksource '%s' not 
checked due to apparent t

[PATCH] x86/hpet: Remove unused FSEC_PER_NSEC define

2018-11-30 Thread Roland Dreier
The FSEC_PER_NSEC macro has had zero users since commit ab0e08f15d23
("x86: hpet: Cleanup the clockevents init and register code").

Signed-off-by: Roland Dreier 
---
 arch/x86/kernel/hpet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b0acb22e5a46..dfd3aca82c61 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,10 +21,6 @@
 
 #define HPET_MASK  CLOCKSOURCE_MASK(32)
 
-/* FSEC = 10^-15
-   NSEC = 10^-9 */
-#define FSEC_PER_NSEC  1000000L
-
 #define HPET_DEV_USED_BIT  2
 #define HPET_DEV_USED  (1 << HPET_DEV_USED_BIT)
 #define HPET_DEV_VALID 0x8
-- 
2.19.1



Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-05 Thread Roland Dreier
> The sensible thing to do in nvme is to use different paths for
> different queues.  That is e.g. in the RDMA case use the HCA closer
> to a given CPU by default.  We might allow to override this for
> cases where there is a good reason, but what I really don't want is
> configurability for configurability's sake.

That makes sense but I'm not sure it covers everything.  Probably the
most common way to do NVMe/RDMA will be with a single HCA that has
multiple ports, so there's no sensible CPU locality.  On the other
hand we want to keep both ports to the fabric busy.  Setting different
paths for different queues makes sense, but there may be
single-threaded applications that want a different policy.

I'm not saying anything very profound, but we have to find the right
balance between too many and too few knobs.

 - R.


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-06-04 Thread Roland Dreier
> Moreover, I also wanted to point out that fabrics array vendors are
> building products that rely on standard nvme multipathing (and probably
> multipathing over dispersed namespaces as well), and keeping a knob that
> will keep nvme users with dm-multipath will probably not help them
> educate their customers as well... So there is another angle to this.

As a vendor who is building an NVMe-oF storage array, I can say that
clarity around how Linux wants to handle NVMe multipath would
definitely be appreciated.  It would be great if we could all converge
around the upstream native driver but right now it doesn't look
adequate - having only a single active path is not the best way to use
a multi-controller storage system.  Unfortunately it looks like we're
headed to a world where people have to write separate "best practices"
documents to cover RHEL, SLES and other vendors.

We plan to implement all the fancy NVMe standards like ANA, but it
seems that there is still a requirement to let the host side choose
policies about how to use paths (round-robin vs least queue depth for
example).  Even in the modern SCSI world with VPD pages and ALUA,
there are still knobs that are needed.  Maybe NVMe will be different
and we can find defaults that work in all cases but I have to admit
I'm skeptical...
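
To illustrate why the choice is really a host-side policy, here is a
minimal sketch of the two selectors mentioned above.  struct path_state
and both function names are hypothetical and not taken from any driver,
and both functions assume npaths > 0:

```c
#include <stddef.h>

/* Hypothetical per-path state; not taken from any driver. */
struct path_state {
	unsigned int queued; /* I/Os currently outstanding on this path */
};

/* Round-robin: return the current path index and advance the cursor. */
static size_t pick_round_robin(size_t *cursor, size_t npaths)
{
	size_t idx = *cursor % npaths;

	*cursor = (idx + 1) % npaths;
	return idx;
}

/* Least queue depth: return the path with the fewest outstanding I/Os. */
static size_t pick_least_queued(const struct path_state *paths, size_t npaths)
{
	size_t best = 0;

	for (size_t i = 1; i < npaths; i++)
		if (paths[i].queued < paths[best].queued)
			best = i;
	return best;
}
```

Neither one is obviously the right default: round-robin ignores how busy a
path is, while least queue depth needs per-path accounting on every I/O.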

 - R.


Re: KASAN: use-after-free Read in __list_add_valid (5)

2018-05-15 Thread Roland Dreier
> Still reproducible on Linus' tree (commit 66e1c94db3cd4e) and on linux-next
> (next-20180511).  Here's a simplified reproducer:

Thanks!  That's a fantastic test case.

The issue is a race where rdma_listen() sees invalid state in the
middle of an rdma_bind_addr() call that will ultimately fail.  I'll
send a proposed patch shortly.

 - R.


Re: [Patch v2 00/19] CIFS: Implement SMBDirect

2017-08-29 Thread Roland Dreier
> Starting with SMB2 dialect 3.0, Microsoft introduced SMBDirect transport 
> protocol for transferring upper layer (SMB2) payload over RDMA via 
> InfiniBand, RoCE or iWARP. The protocol is published in [MS-SMBD]
> (https://msdn.microsoft.com/en-us/library/hh536346.aspx).

This is great to see.  Is there a Linux implementation of the server
side (in Samba?) so that the client can be tested without needing a
Windows server?

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Roland Dreier
On Fri, Jul 8, 2016 at 9:51 AM, Jason Gunthorpe wrote:
> So, it appears, the dst and neigh can be used for all performance cases.
>
> For the non-performance dst == null case, can we just burn cycles and
> stuff the daddr in front of the packet at hardheader time, even if we
> have to copy?

OK, sounds interesting.

Unfortunately the scope of this work has gotten to the point where I
can't take it on right now.  My system is running 4.4.y for now
(before struct skb_gso_cb grew) so I think shrinking struct skb_gso_cb
to 8 bytes plus changing SKB_SGO_CB_OFFSET to 20 will work for now.
Hope someone is able to come up with a real fix before I need to
upgrade to 4.10.y...

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Roland Dreier
On Thu, Jul 7, 2016 at 4:14 PM, Jason Gunthorpe wrote:
> We have neighbour_priv, and ndo_neigh_construct/destruct now ..
>
> At first blush that would seem to be enough to let ipoib store the AH
> and other path information in the neigh and avoid the cb? At least the
> example in clip sure looks like what ipoib needs to do.

Do you think those new facilities let us go back to using the neigh
and still avoid the issues that led to commit b63b70d87741 ("IPoIB:
Use a private hash table for path lookup in xmit path")?

 - R.


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-07 Thread Roland Dreier
>> struct skb_gso_cb {
>>         int     mac_offset;
>>         int     encap_level;
>>         __u16   csum_start;
>> };

> This is based on an outdated version of this struct.  The 4.7 RC
> kernel has a few more fields that were added to support local checksum
> offload for encapsulated frames.

Thanks for pointing that out.  I hit the perf regression on 4.4.y
(stable) and looked at the struct there.  I see that latest upstream
has changed, and I agree that this struct really can't shrink below 10
bytes.

Since IP needs 20 bytes, GSO needs 10 bytes and IPoIB needs 20 bytes,
we're 2 bytes over the 48 that are available in cb[].  So this is
harder to fix than just changing skb_gso_cb and SKB_SGO_CB_OFFSET
unfortunately.
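
Spelling out the arithmetic as a small sketch (the enum and function names
are made up for illustration; the byte counts are the ones from this
thread):

```c
#include <stddef.h>

enum {
	CB_BYTES       = 48, /* size of skb->cb[] */
	IP_CB_BYTES    = 20, /* IP control block */
	GSO_CB_BYTES   = 10, /* smallest workable struct skb_gso_cb */
	IPOIB_CB_BYTES = 20, /* IPoIB hardware address stash */
};

/*
 * Bytes by which the three users exceed cb[] when laid out back to back,
 * or 0 if they all fit.
 */
static size_t cb_overcommit(size_t ip, size_t gso, size_t ipoib, size_t total)
{
	size_t need = ip + gso + ipoib;

	return need > total ? need - total : 0;
}
```

cb_overcommit(IP_CB_BYTES, GSO_CB_BYTES, IPOIB_CB_BYTES, CB_BYTES) is 2,
and it only drops to 0 if the GSO part could be squeezed to 8 bytes.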

>> What is the best way to keep the crash fix but not kill IPoIB performance?
>
> It seems like what would probably need to happen is to move where the
> IPoIB address is stored.  I'm not sure the control buffer is really
> the best place for it since the cb gets overwritten at various levels,
> and storing 20 bytes makes it hard to avoid bumping up against the
> size restrictions of the buffer.  Seeing as how the IPoIB hwaddr is
> generated around the same time we generate the L2 header for the
> frame, I wonder if you couldn't get away with using a bit of extra skb
> headroom to store it and then use an offset from the MAC header to
> access it.  An added bonus would be that with a few tricks with
> SKB_GSO_CB(skb)->mac_offset you might even be able to set things up so
> that you copy the hwaddr when you copy the header for each fragment
> instead of having to go and copy the hwaddr out of the cb and clone it
> for each frame.

Can we assume there are 20 bytes of skb headroom available?  What if
we're forwarding an skb received on an Ethernet device?

The reason we moved to the cb storage is that in the past, trying to
hide some data in the actual skb buffer that we don't actually send
led to some awkward-at-best code.  (As I recall GRO was difficult to
handle before commit 936d7de3d736 "IPoIB: Stop lying about
hard_header_len and use skb->cb to stash LL addresses")  But maybe
there's a third way to handle this other than the old way and the
skb->cb way.

 - R.


Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-07 Thread Roland Dreier
On Thu, Jan 7, 2016 at 3:00 AM, Konstantin Khlebnikov wrote:
> Or just shift GSO CB and add couple checks like
> BUILD_BUG_ON(sizeof(SKB_GSO_CB(skb)->room) < sizeof(*IPCB(skb)));

Resurrecting this old thread, because the patch that ultimately went
upstream (commit 9207f9d45b0a / net: preserve IP control block during
GSO segmentation) causes a huge IPoIB performance regression (to the
point of being unusable):
https://bugzilla.kernel.org/show_bug.cgi?id=111921

I don't think anyone has explained what goes wrong or why IPoIB works
the way it does.  The underlying difference that IPoIB has from other
drivers is that there are two levels of address resolution.  First,
normal ARP / ND resolves an IP address to a "hardware" address.  The
difference is that in IPoIB, the hardware address is an IB GID (plus a
QPN, but we can ignore that).  To actually send data to that GID, the
IPoIB driver has to do a second lookup - it needs to ask the IB subnet
manager for a path record that tells it how to reach that GID.

In particular this means that "destination address" (as the IP / ARP
layer understands it) actually isn't in the packet anywhere - there's
nothing like an ethernet header as there is for "normal" network
drivers.  Instead, the driver stashes the address in skb->cb during
hard_header_ops->create() and then looks at it in the xmit routine -
this was designed way back around when commit a0417fa3a18a / net: Make
qdisc_skb_cb upper size bound explicit. was merged.  The expectation
was that the part of the cb after sizeof (struct qdisc_skb_cb) would
be preserved.

The problem with commit 9207f9d45b0a is that GSO operations now access
cb after SKB_SGO_CB_OFFSET==32, which lands right in the middle of
where IPoIB stashes its hwaddr.
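
A rough sketch of the collision, treating each user of cb[] as a byte range. The struct and function names are hypothetical; the offsets are the ones quoted in this message (GSO at 32, the IPoIB part starting after 28 bytes, 20 bytes of hwaddr) and the 10-byte GSO size is the figure used elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open byte range [start, start + len) inside skb->cb[]. */
struct cb_range {
	unsigned int start;
	unsigned int len;
};

/* True if the two ranges share at least one byte. */
static bool cb_ranges_overlap(struct cb_range a, struct cb_range b)
{
	assert(a.len > 0 && b.len > 0);

	return a.start < b.start + b.len && b.start < a.start + a.len;
}
```

With IPoIB at {28, 20} and GSO at {32, 10} the check reports an overlap, which is the corruption described above.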

It seems that the intent of the commit is to preserve the IP control
block - struct inet_skb_parm (and presumably struct inet6_skb_parm) -
even when using SKB_GSO_CB().  Seems like both inet_skb_parm and
inet6_skb_parm are 20 bytes.  IPoIB uses the part of cb after 28
bytes, so if we could squeeze struct skb_gso_cb down to 8 bytes and
set SKB_SGO_CB_OFFSET to 20, then everything would work.  The struct
is

struct skb_gso_cb {
int mac_offset;
int encap_level;
__u16   csum_start;
};

Is it feasible to make encap_level a __u16 (which would make the
overall struct exactly 8 bytes)?  If I understand this correctly, 64K
nested encapsulations seems like quite a bit for a packet...
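
For illustration, the size claim can be checked in user space. This is only a sketch: the two struct names are invented, and int32_t/uint16_t stand in for the kernel's int and __u16.

```c
#include <assert.h>
#include <stdint.h>

/* Layout quoted above: 10 bytes of fields. */
struct gso_cb_current {
	int32_t  mac_offset;
	int32_t  encap_level;
	uint16_t csum_start;
};

/* Proposed variant with encap_level narrowed to 16 bits. */
struct gso_cb_narrow {
	int32_t  mac_offset;
	uint16_t encap_level;
	uint16_t csum_start;
};

/* 4 + 2 + 2 bytes with no padding needed. */
static_assert(sizeof(struct gso_cb_narrow) == 8,
	      "narrow variant should be exactly 8 bytes");
```

Note that sizeof() of the current layout is usually rounded up past 10 by tail padding, so the 10-byte figure refers to the fields actually used.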

Or, earlier in this thread, having the GSO in ip_output and other gso
paths save and restore the IP/IP6 control block was suggested as an
alternate approach.  I don't know if there are performance
implications to that.

What is the best way to keep the crash fix but not kill IPoIB performance?

Thanks!
 - R.


[PATCH] iommu/vt-d: Don't reject NTB devices due to scope mismatch

2016-06-02 Thread Roland Dreier
From: Roland Dreier <rol...@purestorage.com>

On a system with an Intel PCIe port configured as an NTB device, iommu
initialization fails with

DMAR: Device scope type does not match for 0000:80:03.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):

[0A0h 0160   1]  Device Scope Entry Type : 02
[0A1h 0161   1]             Entry Length : 08
[0A2h 0162   2]                 Reserved : 0000
[0A4h 0164   1]           Enumeration ID : 00
[0A5h 0165   1]           PCI Bus Number : 80

[0A6h 0166   2]                 PCI Path : 03,00

but the device has a type 0 PCI header:

80:03.0 Bridge [0680]: Intel Corporation Device [8086:2f0d] (rev 02)
00: 86 80 0d 2f 00 00 10 00 02 00 80 06 10 00 80 00
10: 0c 00 c0 00 c0 38 00 00 0c 00 00 00 80 38 00 00
20: 00 00 00 c8 00 00 10 c8 00 00 00 00 86 80 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 ff 01 00 00

VT-d works perfectly on this system, so there's no reason to bail out
on initialization due to this apparent scope mismatch.  Use the class
0x0680 ("Other bridge device") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: Roland Dreier <rol...@purestorage.com>
---
 drivers/iommu/dmar.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6a86b5d1defa..2eff7b6c6c98 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -241,8 +241,20 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
 		if (!dmar_match_pci_path(info, scope->bus, path, level))
 			continue;
 
-		if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^
-		    (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL)) {
+		/*
+		 * We expect devices with endpoint scope to have normal PCI
+		 * headers, and devices with bridge scope to have bridge PCI
+		 * headers.  However PCI NTB devices may be listed in the
+		 * DMAR table with bridge scope, even though they have a
+		 * normal PCI header.  NTB devices are identified by class
+		 * "BRIDGE_OTHER" (0680h) - we don't declare a scope mismatch
+		 * for this special case.
+		 */
+		if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
+		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
+		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
+		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
+		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
 			pr_warn("Device scope type does not match for %s\n",
 				pci_name(info->dev));
 			return -EINVAL;
-- 
2.7.4
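
As a sanity check on the new condition, here is a stand-alone sketch that compares the old and new tests outside the kernel. The function names are hypothetical and the constants are redefined locally with the values the kernel headers use, so treat it as an illustration rather than the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCOPE_ENDPOINT		1	/* ACPI_DMAR_SCOPE_TYPE_ENDPOINT */
#define SCOPE_BRIDGE		2	/* ACPI_DMAR_SCOPE_TYPE_BRIDGE */
#define HDR_NORMAL		0	/* PCI_HEADER_TYPE_NORMAL */
#define HDR_BRIDGE		1	/* PCI_HEADER_TYPE_BRIDGE */
#define CLASS_BRIDGE_OTHER	0x0680	/* PCI_CLASS_BRIDGE_OTHER */

/* Check before the patch: endpoint scope XOR normal header. */
static bool scope_mismatch_old(int entry_type, int hdr_type)
{
	return (entry_type == SCOPE_ENDPOINT) ^ (hdr_type == HDR_NORMAL);
}

/* Check after the patch; class_code is the 24-bit value in pci_dev->class. */
static bool scope_mismatch_new(int entry_type, int hdr_type,
			       uint32_t class_code)
{
	assert(class_code <= 0xffffff);

	return (entry_type == SCOPE_ENDPOINT && hdr_type != HDR_NORMAL) ||
	       (entry_type == SCOPE_BRIDGE &&
		(hdr_type == HDR_NORMAL &&
		 class_code >> 8 != CLASS_BRIDGE_OTHER));
}
```

For the NTB device above (bridge scope, type 0 header, class 0x068000) the old check reports a mismatch and the new one does not, while a type 0 device of any other class listed with bridge scope is still rejected.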



Re: Regression in IO resource allocation

2016-06-01 Thread Roland Dreier
On Tue, May 31, 2016 at 3:31 PM, Rafael J. Wysocki <raf...@kernel.org> wrote:
> It may not be called at all if _PTC is used on that system, for example.

Yes, that's exactly the case on my system.

So from my POV:

Tested-by: Roland Dreier <rol...@purestorage.com>

Thanks!


Re: Regression in IO resource allocation

2016-05-31 Thread Roland Dreier
On Tue, May 31, 2016 at 2:11 PM, Rafael J. Wysocki  wrote:
> Can you please try the appended patch (untested)?

Thanks for the quick reply.  Patch looks OK on my system... it boots
(which is very good :) and I see

system 00:01: [io  0x0400-0x047f] has been reserved

however I don't see the "ACPI CPU throttle" region reserved in
/proc/ioports... haven't debugged why acpi_processor_get_throttling()
isn't getting called or what is happening yet.

Will dig a bit deeper and let you know.

 - R.


Regression in IO resource allocation

2016-05-31 Thread Roland Dreier
Hi,

I recently updated one of my systems from 3.10.y to 4.4.11, and
discovered a regression that stops it from booting.  It's actually
very similar to https://bugzilla.kernel.org/show_bug.cgi?id=99831
(which I reported about the same system last year).

The problem is that commit ac212b6980d8 ("ACPI / processor: Use common
hotplug infrastructure") changes the order that the ACPI processor and
PnP initialization run.  pnp_system_init() is run at fs_initcall time,
while acpi_processor_init() is run from acpi_scan_init(), earlier at
subsys_initcall time.  Pre-ac212b6980d8, the ACPI processor
initialization all ran from acpi_processor_init() at module_init time.
So the processor driver initialization has flipped from after to
before pnp_system_init().

Just as before, the failure is that the resource allocation code puts
some AHCI IO BARs around 0x400, and reservation fails because some
other ACPI stuff is also there.  The problem is that when acpi_processor_init()
runs, it reserves a range 0x410 - 0x415 for "ACPI CPU throttle", and
if that happens before pnp_system_init(), then I get

system 00:01: [io  0x0400-0x047f] could not be reserved

because that overlaps the already-reserved range.  Then the PCI
resource allocation code is free to put PCI resources into that range
and tons of things go south after that.
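
The ordering dependence can be shown with a toy model. This is a sketch under stated assumptions, not the kernel's resource code: all names are invented, and it assumes a new region may nest inside an existing one but may not partially overlap or swallow one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8

struct io_region { unsigned int start, end; };	/* inclusive bounds */

struct io_map {
	struct io_region used[MAX_REGIONS];
	size_t count;
};

/*
 * Reserve [start, end].  Succeeds if the range is disjoint from every
 * existing region or sits entirely inside it (becoming a child).
 */
static bool toy_request_region(struct io_map *map, unsigned int start,
			       unsigned int end)
{
	assert(start <= end && map->count < MAX_REGIONS);

	for (size_t i = 0; i < map->count; i++) {
		const struct io_region *r = &map->used[i];
		bool disjoint = end < r->start || r->end < start;
		bool inside = r->start <= start && end <= r->end;

		if (!disjoint && !inside)
			return false;
	}
	map->used[map->count++] = (struct io_region){ start, end };
	return true;
}
```

In this model, reserving 0x410-0x415 first makes the later request for 0x400-0x47f fail, whereas the opposite order lets the throttle range nest inside the PnP range.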

For now I've worked around it by commenting out the request_region()
in acpi_processor.c but that doesn't seem like a very good long-term
solution.  Does it make sense to resurrect the patches you had to let
ACPI and PnP coexist in resource reservation?  Or could we move the
request_region() for CPU throttle into the still-modular
initialization done from acpi_processor_driver_init()?

Thanks!
  Roland


Re: Running out of IO space because of innocuous-looking DSDT change

2015-10-19 Thread Roland Dreier
On Mon, Oct 19, 2015 at 10:00 AM, Yinghai Lu  wrote:
> I would suggest to expand standard_io_resources[] to include all
> possible conflict that we should avoid, like the io port for serial
> and cf8/cf9.
>
> Then we could just set PCIBIOS_MIN_IO to 0 for x86.

That would work on my system, which is a well-behaved standard server.
But I thought the issue was weird vendor-specific stuff (Sony
laptops?) where there are undocumented nonstandard IO resources that
also aren't reserved in ACPI?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Running out of IO space because of innocuous-looking DSDT change

2015-10-19 Thread Roland Dreier
I recently ran into an interesting issue with IO space allocation, and
I'm looking for opinions on whether this is a BIOS issue, a kernel
issue, both, or neither ;)

What happened is that a BIOS update for my system changed the DSDT
from having three ranges in PCI0:

WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
    0x0000, // Granularity
    0x0000, // Range Minimum
    0x03AF, // Range Maximum
    0x0000, // Translation Offset
    0x03B0, // Length
    ,, , TypeStatic)
WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
    0x0000, // Granularity
    0x03E0, // Range Minimum
    0x0CF7, // Range Maximum
    0x0000, // Translation Offset
    0x0918, // Length
    ,, , TypeStatic)
WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
    0x0000, // Granularity
    0x03B0, // Range Minimum
    0x03DF, // Range Maximum
    0x0000, // Translation Offset
    0x0030, // Length
    ,, , TypeStatic)

to a single range:

WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
    0x0000, // Granularity
    0x0000, // Range Minimum
    0x0CF7, // Range Maximum
    0x0000, // Translation Offset
    0x0CF8, // Length
    ,, , TypeStatic)

Naively it seems like this shouldn't make a difference, since in the
end we've covered the space 0...0xCF7.  However because of the code

min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;

/* First, try exact prefetching match.. */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
 IORESOURCE_PREFETCH,
 pcibios_align_resource, dev);

in pci_bus_alloc_resource(), the single range ultimately means we end
up running out of IO space for our devices (we have various devices
asking for IO space as well as quite a few downstream PCI switch ports
that get allocated IO space).

What happens is that PCIBIOS_MIN_IO is 0x1000, so with the new BIOS
that code means we can't allocate any IO in the range 0...0xCF7; with
the old BIOS we only ruled out the range 0...0x3AF and happily put small
IO resources (for SMBus controller devices etc.) at places like 0x480.
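
The message does not spell out why the old layout still allowed allocations below 0x1000. One reading that matches the behaviour described is that the minimum only applies to a window starting at 0, while a window with a non-zero start uses its own start. The sketch below encodes that assumption; the function name is hypothetical.

```c
#include <assert.h>

#define PCIBIOS_MIN_IO 0x1000u	/* value quoted in the message */

/*
 * Lowest address that could be handed out from the host bridge window
 * [win_start, win_end], or 0 if nothing in the window is usable.
 * Assumption: the minimum is applied only when the window starts at 0.
 */
static unsigned int lowest_usable_io(unsigned int win_start,
				     unsigned int win_end)
{
	assert(win_start <= win_end);

	unsigned int lo = win_start ? win_start : PCIBIOS_MIN_IO;

	return lo <= win_end ? lo : 0;
}
```

Under that assumption the old 0x03E0-0x0CF7 window is usable from 0x03E0 upwards, while the merged 0x0000-0x0CF7 window yields nothing.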

Looking at the code and history, I see that the code with PCIBIOS_MIN_IO
is there to deal with systems where not all resources are declared
and the kernel might accidentally allocate something that clashes with
strange hardware.  However in my case I'm pretty confident there isn't
anything in the range we used to use (since my system didn't blow up,
and I know there isn't any weird proprietary stuff anyway).

Would it make sense to change the kernel to reduce PCIBIOS_MIN_IO in
my case?  I could make it generic and send it upstream, or just hack
it locally.  Or (given my ignorance of ACPI in the real world) is this
a broken BIOS change that I should ask my BIOS vendor to revert?
Or... ?

Thanks!
  Roland


Re: [PATCH] target/iscsi: fix digest computation for chained SGs

2015-07-21 Thread Roland Dreier
On Tue, Jul 21, 2015 at 1:57 AM, Sagi Grimberg  wrote:
> How were you able to get a chained SG list in the target code?

Local hack.  So this bug can't be hit in current mainline code, but
the patch improves the code and removes a hidden booby-trap, so I think
it makes sense to apply.


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-15 Thread Roland Dreier
On Sat, Jun 13, 2015 at 9:56 AM, Roland Dreier  wrote:
> Below is a more sophisticated, so to speak, version of it with a changelog and
> all.  It works for me, but more testing would be much appreciated.

Yes, the patch works as expected:

Tested-by: Roland Dreier 


It does change the /proc/ioports hierarchy to

  0400-0403 : ACPI PM1a_EVT_BLK
  0404-0405 : ACPI PM1a_CNT_BLK
  0406-0407 : pnp 00:06
  0408-040b : ACPI PM_TMR
  040c-041f : pnp 00:06
    0410-0415 : ACPI CPU throttle
  0420-042f : ACPI GPE0_BLK
  0430-044f : pnp 00:06
    0430-0433 : iTCO_wdt
      0430-0433 : iTCO_wdt
  0450-0450 : ACPI PM2_CNT_BLK
  0451-047f : pnp 00:06
    0460-047f : iTCO_wdt
      0460-047f : iTCO_wdt

where the old kernel had

  0400-047f : pnp 00:06
    0400-0403 : ACPI PM1a_EVT_BLK
    0404-0405 : ACPI PM1a_CNT_BLK
    0408-040b : ACPI PM_TMR
    0410-0415 : ACPI CPU throttle
    0420-042f : ACPI GPE0_BLK
    0430-0433 : iTCO_wdt
      0430-0433 : iTCO_wdt
    0450-0450 : ACPI PM2_CNT_BLK
    0460-047f : iTCO_wdt
      0460-047f : iTCO_wdt

but I don't think that matters.
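
For reference, the split of the pnp 00:06 range in the new listing is what you get by keeping only the parts of 0400-047f that the ACPI fixed-hardware ranges don't already cover. A minimal sketch of that computation follows; the names are hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct io_range { unsigned int start, end; };	/* inclusive bounds */

/*
 * Write the parts of "want" not covered by any range in "taken" to "out".
 * "taken" must be sorted by start and non-overlapping.  Returns the number
 * of pieces written; "out" needs room for ntaken + 1 entries.
 */
static size_t uncovered_parts(struct io_range want,
			      const struct io_range *taken, size_t ntaken,
			      struct io_range *out)
{
	size_t n = 0;
	unsigned int pos = want.start;	/* first address not yet handled */

	assert(want.start <= want.end);

	for (size_t i = 0; i < ntaken && pos <= want.end; i++) {
		if (taken[i].end < pos)
			continue;	/* entirely below what is left */
		if (taken[i].start > want.end)
			break;		/* entirely above the range */
		if (taken[i].start > pos)
			out[n++] = (struct io_range){ pos, taken[i].start - 1 };
		pos = taken[i].end + 1;
	}
	if (pos <= want.end)
		out[n++] = (struct io_range){ pos, want.end };

	return n;
}
```

Feeding it 0400-047f with the PM1a_EVT_BLK, PM1a_CNT_BLK, PM_TMR, GPE0_BLK and PM2_CNT_BLK ranges gives 0406-0407, 040c-041f, 0430-044f and 0451-047f, matching the four pnp 00:06 entries.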

Thanks,
 - R.


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-13 Thread Roland Dreier
On Fri, Jun 12, 2015 at 7:52 PM, Rafael J. Wysocki  wrote:
> Below is a more sophisticated, so to speak, version of it with a changelog and
> all.  It works for me, but more testing would be much appreciated.

Great, I'm convinced by your reasoning that this makes sense.  I'm
building 3.10.80 patched with this (needed a tiny bit of context
adjustment because acpi_dev_filter_resource_type() hadn't been added
to 3.10 yet), and will confirm that it fixes the issue I saw.

Thanks!
  Roland


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-12 Thread Roland Dreier
On Thu, Jun 11, 2015 at 1:50 PM, Rafael J. Wysocki  wrote:
> Changing the ordering between those two routines would work around that
> problem, but in my view that wouldn't be a proper fix.  In fact, the role
> of reserve_range() is to reserve the resources so as to prevent them from
> being used going forward, so they need not be reserved each in one piece.
> Instead, we can just check if they overlap with the ones reserved by
> acpi_reserve_resources() and only request the non-overlapping parts of
> them to avoid conflicts.
>
> So I wonder if the patch below makes any difference?

I will give this a try and make sure it fixes my system, although I'm
pretty sure it will.

However I'm not sure I agree that this is a better fix than just
having pnp reserve ranges before acpi.  It already creates a special
relationship between pnp and acpi, and acpi_reserve_region is a bunch
of extra code.  Could we really have a system where the hierarchy of
acpi being a subset of a pnp bus doesn't work?  I looked at a few
other systems I have, and things like the following seem quite common:

supermicro:

03e0-0cf7 : PCI Bus 0000:00
  03f8-03ff : serial
  0400-0453 : pnp 00:0c
    0400-0403 : ACPI PM1a_EVT_BLK
    0404-0405 : ACPI PM1a_CNT_BLK
    0408-040b : ACPI PM_TMR
    0410-0415 : ACPI CPU throttle
    0420-042f : ACPI GPE0_BLK
    0430-0433 : iTCO_wdt
    0450-0450 : ACPI PM2_CNT_BLK

dell:

03e0-0cf7 : PCI Bus 0000:00
  03f8-03ff : serial
  0800-087f : pnp 00:06
    0800-0803 : ACPI PM1a_EVT_BLK
    0804-0805 : ACPI PM1a_CNT_BLK
    0808-080b : ACPI PM_TMR
    0810-0815 : ACPI CPU throttle
    0820-082f : ACPI GPE0_BLK
    0830-0833 : iTCO_wdt
      0830-0833 : iTCO_wdt
    0850-0850 : ACPI PM2_CNT_BLK
    0860-087f : iTCO_wdt
      0860-087f : iTCO_wdt

but I wasn't able to find anything that required more generality...
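
The property being relied on here is simply that every ACPI range sits inside its PnP parent. A small sketch of that containment check, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port_range { unsigned int start, end; };	/* inclusive bounds */

/* True if every child range lies entirely within the parent range. */
static bool all_inside(struct port_range parent,
		       const struct port_range *kids, size_t nkids)
{
	assert(parent.start <= parent.end);

	for (size_t i = 0; i < nkids; i++)
		if (kids[i].start < parent.start || kids[i].end > parent.end)
			return false;
	return true;
}
```

Both listings above pass this check; a system where an ACPI range straddled the edge of the PnP range would be the case needing more generality.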


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-11 Thread Roland Dreier
On Wed, Jun 10, 2015 at 4:23 PM, Rafael J. Wysocki wrote:
> Can you please file a bug at bugzilla.kernel.org to track this and attach
> the output of acpidump from the affected system in there?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=99831

Thanks!


Re: Regression in 3.10.80 vs. 3.10.79

2015-06-09 Thread Roland Dreier
On Tue, Jun 9, 2015 at 4:43 PM, Roland Dreier wrote:
> I understand that the change here fixed another regression, but I'm
> wondering if there's a way to make everyone happy here?  I can provide
> debugging info from my system as required...

Maybe I sent my mail too quickly, as I have some thoughts after looking
at the code.

From the link order, drivers/acpi init will be called before
drivers/pnp init, right?  In my case, the acpi resources ("ACPI
PM1a_EVT_BLK" etc.) are under a pnp bus.  But if acpi requests the
resources first, then pnp can't request the enclosing range.

Is the right fix to make sure the pnp init happens before acpi
requests resources?
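
To make the ordering question concrete, here is a minimal userspace
sketch of a nested resource tree.  It is a simplified model and not the
kernel's actual resource code: struct res, request() and pnp_reserved()
are names made up for illustration, and the port ranges are the ones
from the log in the original report.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical, simplified stand-in for a node in the I/O port tree. */
struct res {
	unsigned int start, end;	/* inclusive port range */
	const char *name;
	struct res *child[MAX_CHILDREN];
	size_t nchild;
};

/*
 * Try to place 'req' somewhere below 'parent'.
 * - No overlap with any child: insert as a new child.
 * - Fully inside an existing child: nest under that child.
 * - Anything else (partial overlap, or 'req' encloses an
 *   existing child): conflict, the request fails.
 */
static bool request(struct res *parent, struct res *req)
{
	for (size_t i = 0; i < parent->nchild; i++) {
		struct res *c = parent->child[i];

		if (req->end < c->start || req->start > c->end)
			continue;			/* disjoint */
		if (req->start >= c->start && req->end <= c->end)
			return request(c, req);		/* nest inside */
		return false;				/* conflict */
	}
	if (parent->nchild == MAX_CHILDREN)
		return false;
	parent->child[parent->nchild++] = req;
	return true;
}

/* Returns whether the enclosing pnp range could be reserved. */
static bool pnp_reserved(bool acpi_first)
{
	struct res root = { .start = 0x0000, .end = 0xffff, .name = "ioports" };
	struct res pnp  = { .start = 0x0400, .end = 0x047f, .name = "pnp 00:06" };
	struct res acpi = { .start = 0x0410, .end = 0x0415, .name = "ACPI CPU throttle" };
	bool pnp_ok;

	if (acpi_first) {
		request(&root, &acpi);
		pnp_ok = request(&root, &pnp);
	} else {
		pnp_ok = request(&root, &pnp);
		request(&root, &acpi);
	}
	return pnp_ok;
}
```

In this model pnp_reserved(false) succeeds because the ACPI range nests
inside the pnp range, while pnp_reserved(true) fails because the pnp
range would have to enclose a resource that is already in the tree.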


Regression in 3.10.80 vs. 3.10.79

2015-06-09 Thread Roland Dreier
Hi, I recently updated from 3.10.79 to 3.10.80, and my system wouldn't
boot any more.  I tracked this down to commit 92c934b10ec3 ("ACPI /
init: Fix the ordering of acpi_reserve_resources()").  With that
commit reverted, my system is OK again.

What happens is that ahci fails to initialize because
pcim_iomap_regions_request_all() fails with EBUSY, due to a resource
conflict on the first IO region of the ahci device.  Since my root
device is on ahci, that's the end of that.  I'm sure this is due to a
BIOS / ACPI table bug on my particular platform, but that's scant
comfort when the system won't boot :)

I patched 3.10.80 so that ahci continues to initialize after the
EBUSY, and relevant parts of the kernel log seem to be:

[3.836643,26] system 00:06: [io  0x0400-0x047f] could not be reserved
...
[3.844112,26] pci :00:1f.2: BAR 0: assigned [io  0x0410-0x0417]
...
[6.020040,00] ahci :00:1f.2: BAR 0: can't reserve [io  0x0410-0x0417]

and /proc/ioports shows

0410-0415 : ACPI CPU throttle

So if I'm understanding properly, for some reason we discover but fail
to reserve the region with the ACPI resources, then PCI decides to
assign ahci IO ports into that range, then ACPI loads and reserves
0x0410-0x0415, and then ahci fails to load.

If I fully revert the patch, then I see

[3.853857,08] system 00:06: [io  0x0400-0x047f] has been reserved
...
[3.861806,08] pci :00:1f.2: BAR 0: assigned [io  0x0820-0x0827]

We're able to reserve the range, and then PCI assigns ahci into a
non-conflicting range.

I understand that the change here fixed another regression, but I'm
wondering if there's a way to make everyone happy here?  I can provide
debugging info from my system as required...

Thanks,
  Roland


[GIT PULL] please pull infiniband.git

2015-04-22 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA updates for 4.1:
 - IPoIB fixes from Doug Ledford and Erez Shitrit
 - iSER updates from Sagi Grimberg
 - mlx4 GUID handling changes from Yishai Hadas
 - other misc fixes


Bart Van Assche (1):
  IB/srp: Use P_Key cache for P_Key lookups

Doug Ledford (11):
  IB/ipoib: factor out ah flushing
  IB/ipoib: change init sequence ordering
  IB/ipoib: Consolidate rtnl_lock tasks in workqueue
  IB/ipoib: Make the carrier_on_task race aware
  IB/ipoib: Use dedicated workqueues per interface
  IB/ipoib: No longer use flush as a parameter
  IB/ipoib: fix MCAST_FLAG_BUSY usage
  IB/ipoib: deserialize multicast joins
  IB/ipoib: drop mcast_mutex usage
  ib_srpt: convert printk's to pr_* functions
  Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1

Erez Shitrit (6):
  IB/ipoib: Use one linear skb in RX flow
  IB/ipoib: Update broadcast record values after each successful join request
  IB/ipoib: Handle QP in SQE state
  IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's
  IB/ipoib: Remove IPOIB_MCAST_RUN bit
  IB/mlx4: Fix WQE LSO segment calculation

Honggang LI (1):
  mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures

Sagi Grimberg (18):
  IB/iser: Fix unload during ep_poll wrong dereference
  IB/iser: Handle fastreg/local_inv completion errors
  IB/iser: Fix wrong calculation of protection buffer length
  IB/iser: Remove redundant cmd_data_len calculation
  IB/iser: Remove a redundant struct iser_data_buf
  IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
  IB/iser: Move memory reg/dereg routines to iser_memory.c
  IB/iser: Remove redundant assignments in iser_reg_page_vec
  IB/iser: Get rid of struct iser_rdma_regd
  IB/iser: Merge build page-vec into register page-vec
  IB/iser: Move fastreg descriptor pool get/put to helper functions
  IB/iser: Move PI context alloc/free to routines
  IB/iser: Make fastreg pool cache friendly
  IB/iser: Modify struct iser_mem_reg members
  IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
  IB/iser: Remove code duplication for a single DMA entry
  IB/iser: Bump version to 1.6
  IB/iser: Rewrite bounce buffer code path

Sebastian Ott (1):
  infiniband/mlx4: check for mapping error

Selvin Xavier (1):
  MAINTAINERS: Adding list of maintainers for ocrdma

Stephen Hemminger (1):
  rdma: replace deprecated ifconfig in doc

Sébastien Dugué (1):
  ib_uverbs: Fix pages leak when using XRC SRQs

Yann Droneaud (2):
  IB/core: disallow registering 0-sized memory region
  IB/core: don't disallow registering region starting at 0x0

Yishai Hadas (9):
  IB/mlx4: Alias GUID adding persistency support
  net/mlx4_core: Manage alias GUID per VF
  net/mlx4_core: Set initial admin GUIDs for VFs
  IB/mlx4: Manage admin alias GUID upon admin request
  IB/mlx4: Change init flow to request alias GUIDs for active VFs
  IB/mlx4: Request alias GUID on demand
  net/mlx4_core: Raise slave shutdown event upon FLR
  net/mlx4_core: Return the admin alias GUID upon host view request
  IB/mlx4: Change alias guids default to be host assigned

 Documentation/filesystems/nfs/nfs-rdma.txt     |   9 +-
 MAINTAINERS                                    |   9 +
 drivers/infiniband/core/umem.c                 |   7 +-
 drivers/infiniband/core/uverbs_main.c          |  22 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c        | 457 +-
 drivers/infiniband/hw/mlx4/mad.c               |   9 +
 drivers/infiniband/hw/mlx4/main.c              |  26 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h           |  14 +-
 drivers/infiniband/hw/mlx4/qp.c                |   7 +-
 drivers/infiniband/hw/mlx4/sysfs.c             |  44 +-
 drivers/infiniband/ulp/ipoib/ipoib.h           |  31 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 195
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |  73 ++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 520 ++--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  44 +-
 drivers/infiniband/ulp/iser/iscsi_iser.h       |  66 +--
 drivers/infiniband/ulp/iser/iser_initiator.c   |  66 ++-
 drivers/infiniband/ulp/iser/iser_memory.c      | 523 -
 drivers/infiniband/ulp/iser/iser_verbs.c       | 220 +++--
 drivers/infiniband/ulp/srp/ib_srp.c            |   9 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c          | 188
 

Re: [PATCH v3 07/28] IB/Verbs: Reform IB-ulp ipoib

2015-04-16 Thread Roland Dreier
On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe wrote:
>> We can give client->add() callback a return value and make
>> ib_register_device() return -ENOMEM when it failed, just wondering
>> why we don't do this at first, any special reason?

> No idea, but having ib_register_device fail and unwind if a client
> fails to attach makes sense to me.

It seems a bit unfriendly to fail an entire device if one ULP has a
problem.  Let's say you have a system whose main network connection is
IPoIB.  Would you want that connection to come up even if, say, the
NFS/RDMA server fails to find the memory registration type it likes?

 - R.


[GIT PULL] please pull infiniband.git

2015-04-02 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


One 4.0 RDMA change:
 - Fix for exploitable integer overflow in uverbs interface.


Shachar Raindel (1):
  IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic

 drivers/infiniband/core/umem.c | 8 ++++++++
 1 file changed, 8 insertions(+)


[GIT PULL] please pull infiniband.git

2015-02-20 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA changes for 3.20 merge window:
 - Re-enable on-demand paging changes with stable ABI
 - Fairly large set of ocrdma HW driver fixes
 - Some qib HW driver fixes
 - Other miscellaneous changes


Andreea-Cristina Bernat (2):
  IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_qp.c
  IB/qib: Replace rcu_assign_pointer() with RCU_INIT_POINTER() in qib_keys.c

Ariel Nahum (1):
  IB/iser: Release the iscsi endpoint if ep_disconnect wasn't called

Bart Van Assche (1):
  MAINTAINERS: Update SRP initiator entry

Dan Carpenter (2):
  IB/mlx5: Fix error code in get_port_caps()
  RDMA/ocrdma: Fix off by one in ocrdma_query_gid()

Devesh Sharma (4):
  RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device
  RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE
  RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac
  RDMA/ocrdma: set vlan present bit for user AH

Eli Cohen (1):
  IB/core: Add support for extended query device caps

Haggai Eran (3):
  IB/core: Properly handle registration of on-demand paging MRs after dereg
  IB/core: Add on demand paging caps to ib_uverbs_ex_query_device
  IB/mlx5: Enable the ODP capability query verb

Hariprasad S (2):
  RDMA/cxgb4: Serialize CQ event upcalls with CQ destruction
  RDMA/cxgb4: Don't hang threads forever waiting on WR replies

Ilya Nelkenbaum (1):
  IB/core: When marshaling ucma path from user-space, clear unused fields

Jack Morgenstein (1):
  IB/mlx4: In mlx4_ib_demux_cm, print out GUID in host-endian order

Majd Dibbiny (3):
  IB/mlx4: Fix memory leak in __mlx4_ib_modify_qp
  IB/mlx4: Bug fixes in mlx4_ib_resize_cq
  IB/mlx5: Update the dev in reg_create

Mike Marciniszyn (3):
  IB/qib: Fix sizeof checkpatch warnings
  IB/qib: Fix checkpatch warnings
  IB/qib: Add blank line after declaration

Mitesh Ahuja (7):
  RDMA/ocrdma: Add support for IB stack compliant stats in sysfs.
  RDMA/ocrdma: Increase the GID table size.
  RDMA/ocrdma: Move PD resource management to driver.
  RDMA/ocrdma: Host crash on destroying device resources
  RDMA/ocrdma: Add support for interrupt moderation
  RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure
  RDMA/ocrdma: Update the ocrdma module version string

Mitko Haralanov (1):
  IB/qib: Do not write EEPROM

Moshe Lazer (1):
  IB/core: Fix deadlock on uverbs modify_qp error flow

Or Gerlitz (1):
  IB/mlx4: Fix wrong usage of IPv4 protocol for multicast attach/detach

Padmanabh Ratnakar (1):
  RDMA/ocrdma: Report correct state in ibv_query_qp

Rasmus Villemoes (2):
  RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit
  RDMA/ocrdma: Use unsigned for bit index

Rickard Strandqvist (1):
  IB/ipath: Remove unused function in ipath_wc_ppc64

Roi Dayan (1):
  IB/iser: Use correct dma direction when unmapping SGs

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next

Sagi Grimberg (1):
  IB/iser: Fix memory regions possible leak

Selvin Xavier (2):
  RDMA/ocrdma: Debugfs enhancments for ocrdma driver
  RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP

Vinit Agnihotri (1):
  IB/qib: Add support for the new QMH7360 card

 MAINTAINERS                                   |   2 +-
 drivers/infiniband/core/ucma.c                |   3 +
 drivers/infiniband/core/umem_odp.c            |   3 +-
 drivers/infiniband/core/uverbs.h              |   1 +
 drivers/infiniband/core/uverbs_cmd.c          | 158 +
 drivers/infiniband/core/uverbs_main.c         |   1 +
 drivers/infiniband/hw/cxgb4/ev.c              |   9 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h        |  29 ++-
 drivers/infiniband/hw/ipath/ipath_kernel.h    |   3 -
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c  |  13 --
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c |  15 --
 drivers/infiniband/hw/mlx4/cm.c               |   2 +-
 drivers/infiniband/hw/mlx4/cq.c               |   7 +-
 drivers/infiniband/hw/mlx4/main.c             |  10 +-
 drivers/infiniband/hw/mlx4/qp.c               |   6 +-
 drivers/infiniband/hw/mlx5/main.c             |   4 +-
 drivers/infiniband/hw/mlx5/mr.c               |   1 +
 drivers/infiniband/hw/ocrdma/ocrdma.h         |  38 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c      |  38 +++-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h      |   6 +
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c      | 312 ++
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h      |   2 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  12

Re: linux-next: build failure after merge of the infiniband tree

2015-02-17 Thread Roland Dreier
On Tue, Feb 17, 2015 at 6:32 PM, Stephen Rothwell wrote:
> After merging the livepatching tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> In file included from drivers/infiniband/hw/qib/qib_cq.c:41:0:
> drivers/infiniband/hw/qib/qib.h: In function 'qib_flush_wc':
> drivers/infiniband/hw/qib/qib.h:1470:1: error: expected ';' before '}' token
>  }
>  ^
>
> and it went badly down hill from there :-(


Weird, I could have sworn I fixed that before I pushed the tree out.
Anyway I'll try adding the missing ';' again and push it out again :(


Re: [PATCH 1/1] IB/mthca: remove deprecated use of pci api

2015-02-17 Thread Roland Dreier
On Wed, Feb 4, 2015 at 6:09 AM, Quentin Lambert wrote:
> -   dev->eq_table.icm_dma  = pci_map_page(dev->pdev, dev->eq_table.icm_page, 0,
> -                                         PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -   if (pci_dma_mapping_error(dev->pdev, dev->eq_table.icm_dma)) {
> +   dev->eq_table.icm_dma  = dma_map_page(&dev->pdev->dev,
> +                                         dev->eq_table.icm_page, 0,
> +                                         PAGE_SIZE,
> +                                         (enum dma_data_direction)PCI_DMA_BIDIRECTIONAL);

Surely this can't be right?  Shouldn't the direction just change to
DMA_BIDIRECTIONAL?

Are we really sweeping through the kernel and getting rid of pci_map_
etc. calls?

If so please respin your semantic patch so that it doesn't add crazy stuff like

(enum dma_data_direction)PCI_DMA_BIDIRECTIONAL

and resend the change.
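
For reference, a small userspace sketch of why the cast adds nothing.
The enum and macro values below are copied by hand to mirror the
kernel's dma_data_direction and PCI_DMA_* definitions of that era (an
assumption worth checking against the tree being patched);
map_page_stub() and cast_is_redundant() are hypothetical helpers, not
kernel functions.

```c
/* Hand-copied for illustration; the kernel defines these itself. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Legacy PCI names, assumed to carry the same numeric values. */
#define PCI_DMA_BIDIRECTIONAL	0
#define PCI_DMA_TODEVICE	1
#define PCI_DMA_FROMDEVICE	2
#define PCI_DMA_NONE		3

/* Hypothetical stand-in for a mapping call: just reports the direction. */
static enum dma_data_direction map_page_stub(enum dma_data_direction dir)
{
	return dir;
}

/* Nonzero if the casted legacy constant and the enum value are the same. */
static int cast_is_redundant(void)
{
	return map_page_stub((enum dma_data_direction)PCI_DMA_BIDIRECTIONAL) ==
	       map_page_stub(DMA_BIDIRECTIONAL);
}
```

Since both spellings reach the callee as the same value, a converted
call can pass DMA_BIDIRECTIONAL directly and drop the cast.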

Thanks,
  Roland


[GIT PULL] please pull infiniband.git

2015-02-06 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


One more last-second RDMA change for 3.19:
 - Yann realized that the previous revert of new userspace ABI did not
   go far enough, and we're still exposing a change that we don't want.
   Revert even closer to 3.18 interface to make sure we get things right
   in the long run.

Sorry for sending this at the very end of the release cycle, but we
didn't realize the scope of the required fix until just now.


Yann Droneaud (1):
  Revert "IB/core: Add support for extended query device caps"

 drivers/infiniband/core/uverbs.h |   1 -
 drivers/infiniband/core/uverbs_cmd.c | 137 +++
 drivers/infiniband/hw/mlx5/main.c|   2 -
 include/rdma/ib_verbs.h  |   5 +-
 include/uapi/rdma/ib_user_verbs.h|  27 ---
 5 files changed, 42 insertions(+), 130 deletions(-)


[GIT PULL] please pull infiniband.git

2015-02-03 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Last minute InfiniBand/RDMA changes for 3.19:
 - Revert IPoIB driver back to 3.18 state.  We had a number of fixes go
   into 3.19, but they introduced regressions.  We tried to get everything
   fixed up but ran out of time, so we'll try again for 3.20.
 - Similarly, turn off the new "extended query device" verb.  Late in the
   cycle we realized the ABI is not quite right, and rather than freeze
   something in a rush and make a mistake, we'll take a bit more time
   and get it right in 3.20.


Haggai Eran (1):
  IB/core: Temporarily disable ex_query_device uverb

Roland Dreier (9):
  Revert "IPoIB: No longer use flush as a parameter"
  Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue"
  Revert "IPoIB: Use dedicated workqueues per interface"
  Revert "IPoIB: change init sequence ordering"
  Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race"
  Revert "IPoIB: fix MCAST_FLAG_BUSY usage"
  Revert "IPoIB: Make the carrier_on_task race aware"
  Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue"
  Merge branches 'ipoib' and 'odp' into for-next

 drivers/infiniband/core/uverbs_main.c  |   1 -
 drivers/infiniband/ulp/ipoib/ipoib.h   |  19 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  18 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  27 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  49 ++---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 239 +
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  22 +--
 7 files changed, 134 insertions(+), 241 deletions(-)


[GIT PULL] please pull infiniband.git

2014-12-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main batch of InfiniBand/RDMA changes for 3.19:

 - On-demand paging support in core midlayer and mlx5 driver.  This lets
   userspace create non-pinned memory regions and have the adapter HW
   trigger page faults.
 - iSER and IPoIB updates and fixes.
 - Low-level HW driver updates for cxgb4, mlx4 and ocrdma.
 - Other miscellaneous fixes.


Ariel Nahum (2):
  IB/iser: Collapse cleanup and disconnect handlers
  IB/iser: Fix possible NULL derefernce ib_conn->device in session_create

Devesh Sharma (1):
  RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs

Doug Ledford (8):
  IPoIB: Consolidate rtnl_lock tasks in workqueue
  IPoIB: Make the carrier_on_task race aware
  IPoIB: fix MCAST_FLAG_BUSY usage
  IPoIB: fix mcast_dev_flush/mcast_restart_task race
  IPoIB: change init sequence ordering
  IPoIB: Use dedicated workqueues per interface
  IPoIB: Make ipoib_mcast_stop_thread flush the workqueue
  IPoIB: No longer use flush as a parameter

Eli Cohen (1):
  IB/core: Add support for extended query device caps

Haggai Eran (14):
  IB/mlx5: Remove per-MR pas and dma pointers
  IB/mlx5: Enhance UMR support to allow partial page table update
  IB/core: Replace ib_umem's offset field with a full address
  IB/core: Add umem function to read data from user-space
  IB/mlx5: Add function to read WQE from user-space
  IB/core: Implement support for MMU notifiers regarding on demand paging regions
  mlx5_core: Add support for page faults events and low level handling
  IB/mlx5: Implement the ODP capability query verb
  IB/mlx5: Changes in memory region creation to support on-demand paging
  IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
  IB/mlx5: Page faults handling infrastructure
  IB/mlx5: Handle page faults
  IB/mlx5: Add support for RDMA read/write responder page faults
  IB/mlx5: Implement on demand paging by adding support for MMU notifiers

Hariprasad S (1):
  RDMA/cxgb4: Handle NET_XMIT return codes

Hariprasad Shenai (2):
  RDMA/cxgb4: Fix locking issue in process_mpa_request
  RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices

Jack Morgenstein (2):
  IB/core: Fix mgid key handling in SA agent multicast data-base
  IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr

Max Gurtovoy (1):
  IB/iser: Fix possible SQ overflow

Minh Tran (1):
  IB/iser: Re-adjust CQ and QP send ring sizes to HW limits

Mitesh Ahuja (1):
  RDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs

Moni Shoua (1):
  IB/core: Do not resolve VLAN if already resolved

Or Gerlitz (1):
  IB/iser: Bump version to 1.5

Or Kehati (1):
  IB/addr: Improve address resolution callback scheduling

Pramod Kumar (2):
  RDMA/cxgb4: Increase epd buff size for debug interface
  RDMA/cxgb4: Configure 0B MRs to match HW implementation

Roland Dreier (2):
  mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag
  Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next

Sagi Grimberg (13):
  IB/iser: Fix catastrophic error flow hang
  IB/iser: Decrement CQ's active QPs accounting when QP creation fails
  IB/iser: Fix sparse warnings
  IB/iser: Fix race between iser connection teardown and scsi TMFs
  IB/iser: Terminate connection before cleaning inflight tasks
  IB/iser: Centralize memory region invalidation to a function
  IB/iser: Remove redundant is_mr indicator
  IB/iser: Use more completion queues
  IB/iser: Micro-optimize iser logging
  IB/iser: Micro-optimize iser_handle_wc
  IB/iser: DIX update
  IB/core: Add flags for on demand paging support
  IB/srp: Allow newline separator for connection string

Shachar Raindel (1):
  IB/core: Add support for on demand paging regions

Steve Wise (1):
  RDMA/cxgb4: Wake up waiters after flushing the qp

Yuval Shaia (1):
  mlx4_core: Check for DPDP violation only when DPDP is not supported

 drivers/infiniband/Kconfig |  11 +
 drivers/infiniband/core/Makefile   |   1 +
 drivers/infiniband/core/addr.c |   4 +-
 drivers/infiniband/core/multicast.c|  11 +-
 drivers/infiniband/core/umem.c |  72 ++-
 drivers/infiniband/core/umem_odp.c | 668 +
 drivers/infiniband/core/umem_rbtree.c  |  94 +++
 drivers/infiniband/core/uverbs.h   |   1 +
 drivers/infiniband/core/uverbs_cmd.c   | 171 --
 drivers/infiniband/core/uverbs_main.c  |   5 +-
 drivers/infiniband/core/verbs.c|   3 +-
 d

Re: linux-next: build failure after merge of the infiniband tree

2014-12-15 Thread Roland Dreier
On Mon, Dec 15, 2014 at 5:56 PM, Roland Dreier wrote:
> I'll add a partial revert of that patch to my tree to get back the
> now-used enum values.

I rebased my tree on top of the merge-window merge of davem's tree,
and added the missing flag on top of the "remove this flag" commit.

 - R.


Re: linux-next: build failure after merge of the infiniband tree

2014-12-15 Thread Roland Dreier
On Mon, Dec 15, 2014 at 5:47 PM, Stephen Rothwell wrote:
> Hi all,
>
> After merging the infiniband tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_query_device':
> drivers/infiniband/hw/mlx5/main.c:248:34: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
>   ^
> drivers/net/ethernet/mellanox/mlx5/core/fw.c: In function 'mlx5_query_odp_caps':
> drivers/net/ethernet/mellanox/mlx5/core/fw.c:79:30: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (!(dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG))
>   ^
> drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function 'mlx5_start_eqs':
> drivers/net/ethernet/mellanox/mlx5/core/eq.c:459:28: error: 'MLX5_DEV_CAP_FLAG_ON_DMND_PG' undeclared (first use in this function)
>   if (dev->caps.gen.flags & MLX5_DEV_CAP_FLAG_ON_DMND_PG)
>   ^
>
> Really?  Code added half way through the merge window not even build
> tested?

It's not quite as bad as it seems.  The infiniband tree itself builds;
the problem is the merged tree.

The Mellanox guys merged the "cleanup"

commit 0c7aac854f52
Author: Eli Cohen 
Date:   Tue Dec 2 02:26:14 2014

net/mlx5_core: Remove unused dev cap enum fields

These enumerations are not used so remove them.

Signed-off-by: Eli Cohen 
Signed-off-by: David S. Miller 

through davem's tree, and then went ahead and used at least
MLX5_DEV_CAP_FLAG_ON_DMND_PG (which that patch removes) in patches
they merged through my tree.

I'll add a partial revert of that patch to my tree to get back the
now-used enum values.

 - R.


[GIT PULL] please pull infiniband.git

2014-10-16 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main set of InfiniBand/RDMA updates for 3.18 merge window:

 - Large set of iSER initiator improvements
 - Hardware driver fixes for cxgb4, mlx5 and ocrdma
 - Small fixes to core midlayer


Ariel Nahum (3):
  IB/iser: Unbind at conn_stop stage
  IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release
  IB/iser: Change iscsi_conn_stop log level to info

Devesh Sharma (3):
  RDMA/ocrdma: Add default GID at index 0
  RDMA/ocrdma: Convert kernel VA to PA for mmap in user
  IB/core: Clear AH attr variable to prevent garbage data

Eli Cohen (5):
  IB/mlx5: Clear umr resources after ib_unregister_device
  IB/mlx5: Improve debug prints in mlx5_ib_reg_user_mr
  IB/core: Avoid leakage from kernel to user space
  IB/mlx5: Fix possible array overflow
  IB/mlx5: Remove duplicate code from mlx5_set_path

Hariprasad S (3):
  RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
  RDMA/cxgb4: Add missing neigh_release in find_route
  RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line

Jack Morgenstein (1):
  IB/core: Fix XRC race condition in ib_uverbs_open_qp

Jes Sorensen (3):
  RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
  RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
  RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis

Li RongQing (1):
  RDMA/ocrdma: Remove a unused-label warning

Or Gerlitz (1):
  IB/iser: Bump version, add maintainer

Roi Dayan (1):
  IB/iser: Remove unused variables and dead code

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-next

Sagi Grimberg (23):
  IB/iser: Rename ib_conn -> iser_conn
  IB/iser: Re-introduce ib_conn
  IB/iser: Extend iser_free_ib_conn_res()
  IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon
  IB/iser: Don't bound release_work completions timeouts
  IB/iser: Protect tasks cleanup in case IB device was already released
  IB/iser: Signal iSCSI layer that transport is broken in error completions
  IB/iser: Centralize iser completion contexts
  IB/iser: Use internal polling budget to avoid possible live-lock
  IB/iser: Use single CQ for RX and TX
  IB/iser: Use beacon to indicate all completions were consumed
  IB/iser: Optimize completion polling
  IB/iser: Suppress scsi command send completions
  IB/iser: Nit - add space after __func__ in iser logging
  IB/iser: Add/Fix kernel doc style descriptions in iscsi_iser.h
  IB/iser: Fix/add kernel-doc style description in iscsi_iser.c
  IB/mlx5: Use enumerations for PI copy mask
  IB/iser: Remove redundant assignment
  IB/iser: Set IP_CSUM as default guard type
  IB/mlx5: Use extended internal signature layout
  IB/iser: Centralize ib_sig_domain settings
  Target/iser: Centralize ib_sig_domain setting
  IB/mlx5, iser, isert: Add Signature API additions

Selvin Xavier (1):
  RDMA/ocrdma: Get vlan tag from ib_qp_attrs

Steve Wise (1):
  RDMA/cxgb4: Make c4iw_wr_log_size_order static

Yishai Hadas (1):
  IB/mlx5: Modify to work with arbitrary page size

 MAINTAINERS  |   1 +
 drivers/infiniband/core/uverbs_cmd.c |   2 +
 drivers/infiniband/core/uverbs_main.c|   5 +
 drivers/infiniband/hw/cxgb4/cm.c |  32 +-
 drivers/infiniband/hw/cxgb4/device.c |   2 +-
 drivers/infiniband/hw/mlx5/main.c|   8 +-
 drivers/infiniband/hw/mlx5/mem.c |  18 +-
 drivers/infiniband/hw/mlx5/mr.c  |   6 +-
 drivers/infiniband/hw/mlx5/qp.c  | 149 +++---
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |  25 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |  12 +
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h| 238 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |  10 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c | 313 ++---
 drivers/infiniband/ulp/iser/iscsi_iser.h | 408 +++-
 drivers/infiniband/ulp/iser/iser_initiator.c | 198 
 drivers/infiniband/ulp/iser/iser_memory.c|  99 ++--
 drivers/infiniband/ulp/iser/iser_verbs.c | 667 +++
 drivers/infiniband/ulp/isert/ib_isert.c  |  65 ++-
 include/linux/mlx5/qp.h  |  35 +-
 include/rdma/ib_verbs.h  |  32 +-
 21 files changed, 1372 insertions(+), 953 deletions(-)


[GIT PULL] please pull infiniband.git

2014-09-23 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

This is later and bigger than I would like, and the blame is all on
me: I got very busy with other stuff for a few weeks during the 3.17
cycle, and didn't prepare this tree as soon as I should have.  However
I don't think there's anything risky here, and no one really cares if
we break InfiniBand in 3.17 anyway...


Last late set of InfiniBand/RDMA fixes for 3.17:

 - Fixes for the new memory region re-registration support
 - iSER initiator error path fixes
 - Grab bag of small fixes for the qib and ocrdma hardware drivers
 - Larger set of fixes for mlx4, especially in RoCE mode


Alex Estrin (1):
  IPoIB: Remove unnecessary port query

Devesh Sharma (2):
  RDMA/ocrdma: Report correct value of max_fast_reg_page_list_len
  RDMA/ocrdma: Do not skip setting deferred_arm

Jack Morgenstein (6):
  IB/mlx4: Fix lockdep splat for the iboe lock
  mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
  IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
  IB/mlx4: Don't update QP1 in native mode
  IB/mlx4: Do not allow APM under RoCE
  IB/mlx4: Fix VF mac handling in RoCE

Markus Stockhausen (1):
  IB/mlx4: Disable TSO for Connect-X rev. A0 HCAs

Matan Barak (2):
  mlx4: Correct error flows in rereg_mr
  IB/core: When marshaling uverbs path, clear unused fields

Mike Marciniszyn (3):
  IB/ipath: Change get_user_pages() usage to always NULL vmas
  IB/qib: Change get_user_pages() usage to always NULL vmas
  IB/qib: Correct reference counting in debugfs qp_stats

Moni Shoua (5):
  IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
  IB/mlx4: Don't duplicate the default RoCE GID
  IB/mlx4: Reorder steps in RoCE GID table initialization
  IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
  IB/mlx4: Avoid executing gid task when device is being removed

Or Gerlitz (1):
  IB/iser: Bump version to 1.4.1

Roi Dayan (1):
  IB/iser: Fix RX/TX CQ resource leak on error flow

Roland Dreier (1):
  Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next

Sagi Grimberg (1):
  IB/iser: Allow bind only when connection state is UP

Shawn Bohrer (1):
  IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get

devesh.sha...@emulex.com (2):
  RDMA/ocrdma: Resolve L2 address when creating user AH
  RDMA/ocrdma: Use right macro in query AH

 drivers/infiniband/core/umem.c |  19 ++-
 drivers/infiniband/core/uverbs_marshall.c  |   4 +
 drivers/infiniband/hw/ipath/ipath_user_pages.c |   6 +-
 drivers/infiniband/hw/mlx4/main.c  | 169 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h   |   1 +
 drivers/infiniband/hw/mlx4/mr.c|   7 +-
 drivers/infiniband/hw/mlx4/qp.c|  60 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |  43 +--
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c|   6 +-
 drivers/infiniband/hw/qib/qib_debugfs.c|   3 +-
 drivers/infiniband/hw/qib/qib_qp.c |   8 --
 drivers/infiniband/hw/qib/qib_user_pages.c |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  10 +-
 drivers/infiniband/ulp/iser/iscsi_iser.c   |  19 ++-
 drivers/infiniband/ulp/iser/iscsi_iser.h   |   2 +-
 drivers/infiniband/ulp/iser/iser_verbs.c   |  24 ++--
 drivers/net/ethernet/mellanox/mlx4/mr.c|  33 +++--
 drivers/net/ethernet/mellanox/mlx4/port.c  |  11 +-
 include/rdma/ib_umem.h |   1 +
 19 files changed, 277 insertions(+), 155 deletions(-)


Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-03 Thread Roland Dreier
> I would like to note that we at Los Alamos National Laboratory are very
> interested in this functionality and it would be great if it gets accepted.

Have you done any review or testing of these changes?  If so can you
share the results?

 - R.


[GIT PULL] please pull infiniband.git

2014-08-14 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


Main set of InfiniBand/RDMA updates for 3.17 merge window:

 - MR reregistration support
 - MAD support for RMPP in userspace
 - iSER and SRP initiator updates
 - ocrdma hardware driver updates
 - other fixes...


Alex Estrin (1):
  IB/ipoib: Avoid multicast join attempts with invalid P_key

Ariel Nahum (3):
  IB/iser: Seperate iser_conn and iscsi_endpoint storage space
  IB/iser: Protect iser state machine with a mutex
  IB/iser: Replace connection waitqueue with completion object

Bart Van Assche (3):
  scsi_transport_srp: Fix fast_io_fail_tmo=dev_loss_tmo=off behavior
  IB/srp: Fix deadlock between host removal and multipathd
  IB/srp: Fix residual handling

Dan Carpenter (1):
  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()

Devesh Sharma (7):
  RDMA/ocrdma: Avoid posting DPP requests for RDMA READ
  be2net: Issue shutdown event to ocrdma driver
  RDMA/ocrdma: Handle shutdown event from be2net driver
  RDMA/ocrdma: Remove hardcoding of the max DPP QPs supported
  RDMA/ocrdma: Delete AH table if ocrdma_init_hw fails after AH table creation
  RDMA/ocrdma: Obtain SL from device structure
  RDMA/ocrdma: Update sli data structure for endianness

Doug Ledford (2):
  IB/srpt: Handle GID change events
  RDMA/uapi: Include socket.h in rdma_user_cm.h

Erez Shitrit (2):
  IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
  IB/ipoib: Avoid flushing the workqueue from worker context

Fabian Frederick (3):
  IPoIB: Remove unnecessary test for NULL before debugfs_remove()
  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]

Ira Weiny (5):
  IB/umad: Update module to [pr|dev]_* style print messages
  IB/mad: Update module to [pr|dev]_* style print messages
  IB/mad: Add dev_notice messages for various umad/mad registration failures
  IB/mad: add new ioctl to ABI to support new registration options
  IB/mad: Add user space RMPP support

Jack Morgenstein (1):
  mlx4_core: Add support for secure-host and SMP firewall

Matan Barak (3):
  IB/core: Add user MR re-registration support
  mlx4_core: Add helper functions to support MR re-registration
  IB/mlx4_ib: Add support for user MR re-registration

Mitesh Ahuja (4):
  RDMA/ocrdma: Allow only SEND opcode in case of UD QPs
  RDMA/ocrdma: Do proper cleanup even if FW is in error state
  RDMA/ocrdma: Return proper value for max_mr_size
  RDMA/ocrdma: report asic-id in query device

Or Gerlitz (1):
  IB/ipath: Add P_Key change event support

Roi Dayan (3):
  IB/iser: Support IPv6 address family
  IB/iser: Add TIMEWAIT_EXIT event handling
  IB/iser: Clarify a duplicate counters check

Roland Dreier (1):
  Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next

Sagi Grimberg (2):
  IB/iser: Fix responder resources advertisement
  IB/iser: Remove redundant return code in iser_free_ib_conn_res()

Selvin Xavier (8):
  RDMA/ocrdma: Query and initalize the PFC SL
  RDMA/ocrdma: Add hca_type and fixing fw_version string in device atrributes
  RDMA/ocrdma: Avoid reporting wrong completions in case of error CQEs
  RDMA/ocrdma: Add missing adapter mailbox opcodes
  RDMA/ocrdma: Increase the size of STAG array in dev structure to 16K
  RDMA/ocrdma: Initialize the GID table while registering the device
  RDMA/ocrdma: Fix a sparse warning
  RDMA/ocrdma: Update the ocrdma module version string

Steve Wise (2):
  RDMA/cxgb4: Only call CQ completion handler if it is armed
  RDMA/iwcm: Use a default listen backlog if needed

Wei Yongjun (1):
  IB/srp: Fix return value check in srp_init_module()

 Documentation/infiniband/user_mad.txt  |  13 +-
 drivers/infiniband/core/agent.c|  16 +-
 drivers/infiniband/core/cm.c   |   5 +-
 drivers/infiniband/core/iwcm.c |  27 ++
 drivers/infiniband/core/mad.c  | 283 +---
 drivers/infiniband/core/mad_priv.h |   3 -
 drivers/infiniband/core/sa_query.c |   2 +-
 drivers/infiniband/core/user_mad.c | 188 +++--
 drivers/infiniband/core/uverbs.h   |   1 +
 drivers/infiniband/core/uverbs_cmd.c   |  93 +++
 drivers/infiniband/core/uverbs_main.c  |   1 +
 drivers/infiniband/hw/amso1100/c2_cq.c |   7 +-
 drivers/infiniband/hw/cxgb4/ev.c   |   1 +
 drivers/infiniband/hw/cxgb4/qp.c   |  37 ++-
 drivers/infiniband

[GIT PULL] please pull infiniband.git

2014-07-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA fixes for 3.16

 - cxgb4 hardware driver regression fixes
 - mlx5 hardware driver regression fixes


Hariprasad S (2):
  RDMA/cxgb4: Fix skb_leak in reject_cr()
  RDMA/cxgb4: Clean up connection on ARP error

Or Gerlitz (1):
  IB/mlx5: Enable "block multicast loopback" for kernel consumers

Roland Dreier (1):
  Merge branches 'cxgb4' and 'mlx5' into for-next

Sagi Grimberg (1):
  mlx5_core: Fix possible race between mr tree insert/delete

Steve Wise (2):
  RDMA/cxgb4: Initialize the device status page
  RDMA/cxgb4: Call iwpm_init() only once

 drivers/infiniband/hw/cxgb4/cm.c | 14 +++---
 drivers/infiniband/hw/cxgb4/device.c | 18 +++---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  2 +-
 drivers/infiniband/hw/mlx5/qp.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/mr.c | 19 +++
 5 files changed, 39 insertions(+), 16 deletions(-)


[GIT PULL] please pull infiniband.git

2014-06-10 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.16:

 - Add iWARP port mapper to avoid conflicts between RDMA and normal
   stack TCP connections.

 - Fixes for i386 / x86-64 structure padding differences (ABI
   compatibility for 32-on-64) from Yann Droneaud.

 - A pile of SRP initiator fixes from Bart Van Assche.

 - Fixes for a writeback / memory allocation deadlock with NFS over
   IPoIB connected mode from Jiri Kosina.

 - The usual fixes and cleanups to mlx4, mlx5, cxgb4 and other
   low-level drivers.
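
The i386 / x86-64 padding item above can be illustrated with a small
sketch.  The struct names and fields here are hypothetical (picked only
to look like a userspace command layout, not copied from any driver
header).  The point is that a 32-bit member following 64-bit members
leaves tail padding whose size depends on the ABI, unless the hole is
filled with an explicit field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical command layout.  On x86-64 the compiler adds 4 bytes of
 * tail padding (sizeof == 24); on i386, where 64-bit integers are only
 * 4-byte aligned inside structs, it adds none (sizeof == 20).
 */
struct demo_create_cq_unpadded {
	uint64_t buf_addr;
	uint64_t db_addr;
	uint32_t cqe_size;
};

/* Same layout with the hole made explicit, so both ABIs agree. */
struct demo_create_cq_padded {
	uint64_t buf_addr;
	uint64_t db_addr;
	uint32_t cqe_size;
	uint32_t reserved;	/* explicit padding */
};

/* These hold on both i386 and x86-64. */
static_assert(sizeof(struct demo_create_cq_padded) == 24,
	      "padded layout must not depend on the ABI");
static_assert(offsetof(struct demo_create_cq_padded, reserved) == 20,
	      "reserved field fills the tail hole");
```

With the explicit member, a 32-bit process on a 64-bit kernel and a
native 64-bit process hand over the same number of bytes for the same
command.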


Ariel Nahum (2):
  IB/iser: Simplify connection management
  IB/iser: Fix a possible race in iser connection states transition

Bart Van Assche (11):
  IB/srp: Fix a sporadic crash triggered by cable pulling
  IB/srp: Fix kernel-doc warnings
  IB/srp: Introduce an additional local variable
  IB/srp: Introduce srp_map_fmr()
  IB/srp: Introduce srp_finish_mapping()
  IB/srp: Introduce the 'register_always' kernel module parameter
  IB/srp: One FMR pool per SRP connection
  IB/srp: Rename FMR-related variables
  IB/srp: Add fast registration support
  IB/umad: Fix error handling
  IB/umad: Fix use-after-free on close

Christoph Jaeger (1):
  RDMA/cxgb4: Fix memory leaks in c4iw_alloc() error paths

Colin Ian King (1):
  IB/mlx4: fix unitialised variable is_mcast

Dan Carpenter (2):
  RDMA/cxgb3: Fix information leak in send_abort()
  RDMA/cxgb3: Remove a couple unneeded conditions

Dennis Dalessandro (1):
  IB/ipath: Translate legacy diagpkt into newer extended diagpkt

Dotan Barak (1):
  mlx4_core: Fix memory leaks in SR-IOV error paths

Duan Jiong (1):
  RDMA/ocrdma: Convert to use simple_open()

Haggai Eran (7):
  IB/mlx5: Fix error handling in reg_umr
  IB/mlx5: Add MR to radix tree in reg_mr_callback
  mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMR
  IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs
  IB/core: Remove unneeded kobject_get/put calls
  IB/core: Fix port kobject deletion during error flow
  IB/core: Fix kobject leak on device register error flow

Jack Morgenstein (5):
  mlx4_core: Fix incorrect FLAGS1 bitmap test in mlx4_QUERY_FUNC_CAP
  IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped
  IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses
  mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs
  IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs

Jiri Kosina (2):
  IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
  IB/mlx4: Fix gfp passing in create_qp_common()

Joe Perches (1):
  IB/srp: Avoid problems if a header uses pr_fmt

Manuel Schölling (1):
  IB/ipath: Use time_before()/_after()

Mike Marciniszyn (1):
  IB/qib: Fix port in pkey change event

Or Gerlitz (3):
  IB/iser: Bump version to 1.4
  IB: Return error for unsupported QP creation flags
  IB: Add a QP creation flag to use GFP_NOIO allocations

Roi Dayan (1):
  IB/iser: Add missing newlines to logging messages

Roland Dreier (6):
  IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bits
  mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
  IB/mad: Fix sparse warning about gfp_t use
  IB/core: Fix sparse warnings about redeclared functions
  mlx4_core: Fix GFP flags parameters to be gfp_t
  Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next

Sagi Grimberg (3):
  mlx5_core: Fix signature handover operation for interleaved buffers
  mlx5_core: Simplify signature handover wqe for interleaved buffers
  mlx5_core: Copy DIF fields only when input and output space values match

Shachar Raindel (1):
  IB/mlx5: Refactor UMR to have its own context struct

Steve Wise (2):
  RDMA/cxgb4: Fix vlan support
  RDMA/cxgb4: Add support for iWARP Port Mapper user space service

Tatyana Nikolova (2):
  RDMA/core: Add support for iWARP Port Mapper user space service
  RDMA/nes: Add support for iWARP Port Mapper user space service

Upinder Malhi (1):
  IB/usnic: Fix source file missing copyright and license

Vinit Agnihotri (1):
  IB/qib: Additional Intel branding changes

Yann Droneaud (5):
  IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq
  IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq
  RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp
  IB: Allow build of hw/ and ulp/ subdirectories independently
  RDMA/cxgb4: add missing padding at end of struct

Re: [PATCH v1 for-next 0/3] IB: Use GFP_NOIO calls in IPoIB connected mode TX path

2014-05-19 Thread Roland Dreier
On Sat, May 17, 2014 at 1:52 PM, Or Gerlitz wrote:
> Roland, we're soon on -rc6 and there's no reason for this to miss
> 3.16, could you please comment whether you want it to go through your
> tree or net-next?

I will pick it up.


[tip:x86/mm] x86, ioremap: Speed up check for RAM pages

2014-05-02 Thread tip-bot for Roland Dreier
Commit-ID:  c81c8a1eeede61e92a15103748c23d100880cc8a
Gitweb: http://git.kernel.org/tip/c81c8a1eeede61e92a15103748c23d100880cc8a
Author: Roland Dreier 
AuthorDate: Fri, 2 May 2014 11:18:41 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 2 May 2014 11:52:26 -0700

x86, ioremap: Speed up check for RAM pages

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and check each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.
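
A rough userspace model of the change described above, not the kernel
code: ram_map, walk_ram_range(), refuse_per_page() and
refuse_single_walk() are hypothetical names, the RAM map is made up,
and the -1 "nothing intersected" return is this sketch's own
convention.  It only shows why one walk over the whole range does far
less work than one walk per pfn when the range is MMIO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RAM map; pfn ranges are inclusive and made up. */
struct ram_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

static const struct ram_range ram_map[] = {
	{ 0x00000UL, 0x0009fUL },
	{ 0x00100UL, 0x7ffffUL },
};

typedef int (*ram_fn)(unsigned long start_pfn, unsigned long nr_pages,
		      void *arg);

static unsigned long walks;	/* how many times the map was walked */

/*
 * Call fn on each piece of [start_pfn, start_pfn + nr_pages) that is RAM.
 * Stops at the first non-zero return; returns -1 if nothing intersected.
 */
static int walk_ram_range(unsigned long start_pfn, unsigned long nr_pages,
			  void *arg, ram_fn fn)
{
	unsigned long end_pfn = start_pfn + nr_pages - 1;
	int ret = -1;
	size_t i;

	assert(nr_pages > 0);
	walks++;

	for (i = 0; i < sizeof(ram_map) / sizeof(ram_map[0]); i++) {
		unsigned long lo = ram_map[i].start_pfn;
		unsigned long hi = ram_map[i].end_pfn;

		if (lo < start_pfn)
			lo = start_pfn;
		if (hi > end_pfn)
			hi = end_pfn;
		if (lo > hi)
			continue;	/* no overlap with this RAM range */

		ret = fn(lo, hi - lo + 1, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Sketch only: treat every RAM page as in use, so any overlap refuses. */
static int ram_in_use(unsigned long start_pfn, unsigned long nr_pages,
		      void *arg)
{
	(void)start_pfn;
	(void)nr_pages;
	(void)arg;
	return 1;
}

/* Old shape: one map walk per pfn in the range. */
static int refuse_per_page(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		if (walk_ram_range(pfn + i, 1, NULL, ram_in_use) == 1)
			return 1;
	return 0;
}

/* New shape: a single map walk covering the whole range. */
static int refuse_single_walk(unsigned long pfn, unsigned long nr_pages)
{
	return walk_ram_range(pfn, nr_pages, NULL, ram_in_use) == 1;
}
```

For a range that lies entirely outside the RAM map, refuse_per_page()
walks the map once per pfn while refuse_single_walk() walks it once;
both give the same answer.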

Signed-off-by: Roland Dreier 
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-rol...@kernel.org
Signed-off-by: H. Peter Anvin 
---
 arch/x86/mm/ioremap.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 597ac15..bc7527e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
+static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
+			       void *arg)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (pfn_valid(start_pfn + i) &&
+		    !PageReserved(pfn_to_page(start_pfn + i)))
+			return 1;
+
+	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
+
+	return 0;
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. Needed when the kernel wants to access high addresses
@@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
+	pfn      = phys_addr >> PAGE_SHIFT;
 	last_pfn = last_addr >> PAGE_SHIFT;
-	for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) {
-		int is_ram = page_is_ram(pfn);
-
-		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
-			return NULL;
-		WARN_ON_ONCE(is_ram);
-	}
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+				  __ioremap_check_ram) == 1)
+		return NULL;
 
 	/*
 	 * Mappings have to be page-aligned


[PATCH] x86, ioremap: Speed up check for RAM pages

2014-05-02 Thread Roland Dreier
From: Roland Dreier <rol...@purestorage.com>

In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and check each one individually with page_is_ram().
For large ioremaps, this can be very slow.  For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!

Internally, page_is_ram() calls walk_system_ram_range() on a single
page.  Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find.  For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.

With this change, ioremap on our 256 GiB BAR takes less than 1 second.

Signed-off-by: Roland Dreier <rol...@purestorage.com>
---
 arch/x86/mm/ioremap.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 597ac155c91c..bc7527e109c8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -50,6 +50,21 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 	return err;
 }
 
+static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
+			       void *arg)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; ++i)
+		if (pfn_valid(start_pfn + i) &&
+		    !PageReserved(pfn_to_page(start_pfn + i)))
+			return 1;
+
+	WARN_ONCE(1, "ioremap on RAM pfn 0x%lx\n", start_pfn);
+
+	return 0;
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. Needed when the kernel wants to access high addresses
@@ -93,14 +108,11 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..
 	 */
+	pfn      = phys_addr >> PAGE_SHIFT;
 	last_pfn = last_addr >> PAGE_SHIFT;
-	for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) {
-		int is_ram = page_is_ram(pfn);
-
-		if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
-			return NULL;
-		WARN_ON_ONCE(is_ram);
-	}
+	if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
+				  __ioremap_check_ram) == 1)
+		return NULL;
 
 	/*
 	 * Mappings have to be page-aligned
-- 
1.9.1
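
The cost argument in the patch description above can be illustrated with a
small userspace sketch in C.  This is not kernel code and not the patch
itself: struct ram_range, toy_walk_ram_range(), count_ram_pages(),
check_per_page() and check_whole_range() are hypothetical names invented
for this example, and the RAM map is assumed to be a plain array of
inclusive pfn ranges.  The only point is that the per-page approach walks
the map once per page, while the range approach walks it once in total and
only invokes the callback for chunks that actually overlap RAM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the system RAM map (inclusive). */
struct ram_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

typedef int (*ram_walk_fn)(unsigned long start_pfn, unsigned long nr_pages,
			   void *arg);

/*
 * Toy model of a range walker: call fn for each chunk of
 * [start_pfn, start_pfn + nr_pages) that overlaps a RAM range.
 * Returns the first non-zero callback result, 0 if every callback
 * returned 0, or -1 if nothing overlapped (an assumption of this sketch).
 */
static int toy_walk_ram_range(const struct ram_range *map, size_t nr_ranges,
			      unsigned long start_pfn, unsigned long nr_pages,
			      void *arg, ram_walk_fn fn)
{
	unsigned long last_pfn;
	int ret = -1;
	size_t i;

	assert(nr_pages > 0);
	last_pfn = start_pfn + nr_pages - 1;

	for (i = 0; i < nr_ranges; i++) {
		unsigned long lo = map[i].start_pfn > start_pfn ?
				   map[i].start_pfn : start_pfn;
		unsigned long hi = map[i].end_pfn < last_pfn ?
				   map[i].end_pfn : last_pfn;

		if (lo > hi)
			continue;	/* no overlap with this RAM range */
		ret = fn(lo, hi - lo + 1, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Callback that only counts the RAM pages it is shown. */
static int count_ram_pages(unsigned long start_pfn, unsigned long nr_pages,
			   void *arg)
{
	(void)start_pfn;
	*(unsigned long *)arg += nr_pages;
	return 0;
}

/* Old shape: one map walk per page.  Returns the number of walks done. */
static unsigned long check_per_page(const struct ram_range *map, size_t n,
				    unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long *ram_pages)
{
	unsigned long walks = 0, i;

	for (i = 0; i < nr_pages; i++) {
		toy_walk_ram_range(map, n, start_pfn + i, 1, ram_pages,
				   count_ram_pages);
		walks++;
	}
	return walks;
}

/* New shape: a single walk covering the whole range. */
static unsigned long check_whole_range(const struct ram_range *map, size_t n,
				       unsigned long start_pfn,
				       unsigned long nr_pages,
				       unsigned long *ram_pages)
{
	toy_walk_ram_range(map, n, start_pfn, nr_pages, ram_pages,
			   count_ram_pages);
	return 1;
}
```

In this model both helpers report the same number of RAM pages for a given
range, but check_per_page() performs nr_pages walks where
check_whole_range() performs one.  For a range that does not touch RAM at
all (the common MMIO case described above), the callback is never invoked.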



[GIT PULL] please pull infiniband.git

2014-05-01 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus


InfiniBand/RDMA updates for 3.15-rc4:

 - cxgb4 hardware driver fixes


Hariprasad S (1):
  RDMA/cxgb4: Update Kconfig to include Chelsio T5 adapter

Steve Wise (3):
  RDMA/cxgb4: Fix endpoint mutex deadlocks
  RDMA/cxgb4: Force T5 connections to use TAHOE congestion control
  RDMA/cxgb4: Only allow kernel db ringing for T4 devs

 drivers/infiniband/hw/cxgb4/Kconfig       |  6 ++---
 drivers/infiniband/hw/cxgb4/cm.c          | 39 ++-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h    |  1 +
 drivers/infiniband/hw/cxgb4/qp.c          | 13 +++
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 14 +++
 5 files changed, 55 insertions(+), 18 deletions(-)


[GIT PULL] please pull infiniband.git

2014-04-18 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



InfiniBand/RDMA updates for 3.15-rc2:

 - Mostly cxgb4 fixes unblocked by the merge of some prerequisites via
   the net tree.

 - Drop deprecated MSI-X API use.

 - A couple other miscellaneous things.


Alexander Gordeev (2):
  IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()
  IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()

Eli Cohen (1):
  IB/mlx5: Add block multicast loopback support

Hariprasad Shenai (1):
  RDMA/cxgb4: Use pr_warn_ratelimited

Roland Dreier (1):
  Merge branches 'cxgb4', 'misc', 'mlx5' and 'qib' into for-next

Steve Wise (9):
  RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
  RDMA/cxgb4: Endpoint timeout fixes
  RDMA/cxgb4: rmb() after reading valid gen bit
  RDMA/cxgb4: SQ flush fix
  RDMA/cxgb4: Max fastreg depth depends on DSGL support
  RDMA/cxgb4: Initialize reserved fields in a FW work request
  RDMA/cxgb4: Add missing debug stats
  RDMA/cxgb4: Use uninitialized_var()
  RDMA/cxgb4: Fix over-dereference when terminating

 drivers/infiniband/hw/cxgb4/cm.c         | 89
 drivers/infiniband/hw/cxgb4/cq.c         | 24 -
 drivers/infiniband/hw/cxgb4/device.c     | 41 ---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h   |  2 +
 drivers/infiniband/hw/cxgb4/mem.c        |  6 ++-
 drivers/infiniband/hw/cxgb4/provider.c   |  2 +-
 drivers/infiniband/hw/cxgb4/qp.c         | 70 +++--
 drivers/infiniband/hw/cxgb4/resource.c   | 10 ++--
 drivers/infiniband/hw/cxgb4/t4.h         | 72 --
 drivers/infiniband/hw/mlx5/main.c        |  2 +
 drivers/infiniband/hw/mlx5/qp.c          | 12 +
 drivers/infiniband/hw/mthca/mthca_main.c |  8 +--
 drivers/infiniband/hw/qib/qib_pcie.c     | 55 ++--
 include/linux/mlx5/device.h              |  1 +
 include/linux/mlx5/qp.h                  |  1 +
 15 files changed, 270 insertions(+), 125 deletions(-)


[GIT PULL] please pull infiniband.git

2014-04-03 Thread Roland Dreier
Hi Linus,

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus



Main batch of InfiniBand/RDMA changes for 3.15:

 - The biggest change is core API extensions and mlx5 low-level driver
   support for handling DIF/DIX-style protection information, and the
   addition of PI support to the iSER initiator.  Target support will be
   arriving shortly through the SCSI target tree.

 - A nice simplification to the "umem" memory pinning library now that
   we have chained sg lists.  Kudos to Yishai Hadas for realizing our
   code didn't have to be so crazy.

 - Another nice simplification to the sg wrappers used by qib, ipath and
   ehca to handle their mapping of memory to adapter.

 - The usual batch of fixes to bugs found by static checkers etc. from
   intrepid people like Dan Carpenter and Yann Droneaud.

 - A large batch of cxgb4, ocrdma, qib driver updates.


Alex Tabachnik (2):
  IB/iser: Introduce pi_enable, pi_guard module parameters
  IB/iser: Initialize T10-PI resources

Ariel Nahum (1):
  IB/iser: Remove struct iscsi_iser_conn

Bart Van Assche (7):
  IB/mlx4: Fix a sparse endianness warning
  scsi_transport_srp: Fix two kernel-doc warnings
  IB/srp: Add more logging
  IB/srp: Avoid duplicate connections
  IB/srp: Make writing into the "add_target" sysfs attribute interruptible
  IB/srp: Avoid that writing into "add_target" hangs due to a cable pull
  IB/srp: Fix a race condition between failing I/O and I/O completion

CQ Tang (1):
  IB/qib: Change SDMA progression mode depending on single- or multi-rail

Dan Carpenter (7):
  IB/qib: Remove duplicate check in get_a_ctxt()
  RDMA/nes: Clean up a condition
  RDMA/cxgb4: Fix underflows in c4iw_create_qp()
  RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()
  IB/qib: Cleanup qib_register_observer()
  mlx4_core: Fix some indenting in mlx4_ib_add()
  mlx4_core: Make buffer larger to avoid overflow warning

Dennis Dalessandro (3):
  IB/qib: Fix potential buffer overrun in sending diag packet routine
  IB/ipath: Fix potential buffer overrun in sending diag packet routine
  IB/qib: Fix memory leak of recv context when driver fails to initialize.

Devesh Sharma (9):
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: Update version string
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Code clean-up

Fabio Estevam (1):
  IB/usnic: Remove '0x' when using %pa format

Mike Marciniszyn (7):
  IB/qib: Fix debugfs ordering issue with multiple HCAs
  IB/qib: Add percpu counter replacing qib_devdata int_counter
  IB/qib: Modify software pma counters to use percpu variables
  IB/qib: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/ipath: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/ehca: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads
  IB/core: Remove overload in ib_sg_dma*

Moni Shoua (1):
  IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler

Or Gerlitz (3):
  IB/iser: Print QP information once connection is established
  IB/iser: Update Mellanox copyright note
  IB/iser: Bump driver version to 1.3

Prarit Bhargava (1):
  RDMA/ocrdma: Fix compiler warning

Randy Dunlap (1):
  IB/iser: Fix sector_t format warning

Roi Dayan (1):
  IB/iser: Drain the tx cq once before looping on the rx cq

Roland Dreier (2):
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next

Sagi Grimberg (23):
  IB/core: Introduce protected memory regions
  IB/core: Introduce signature verbs API
  mlx5: Implement create_mr and destroy_mr
  IB/mlx5: Initialize mlx5_ib_qp signature-related members
  IB/mlx5: Break up wqe handling into begin & finish routines
  IB/mlx5: Remove MTT access mode from umr flags helper function
  IB/mlx5: Keep mlx5 MRs in a radix tree under device
  IB/mlx5: Support IB_WR_REG_SIG_MR
  IB/mlx5: Collect signature error completion
  IB/mlx5: Expose support for signature MR feature
  IB/iser: Suppress completions for fast registration work requests
  IB/iser: Avoid FRWR notation, use fastreg instead
  IB/iser: Push the decision what memory key to use into fast_reg_mr routine
  IB/iser: Move fast_reg_descriptor initialization to a function

Re: linux rdma 3.14 merge plans

2014-03-07 Thread Roland Dreier
Sure, no problem.

Do you have a git tree with the latest versions of all the changes you
want for 3.15 in a branch?  That would be helpful as I catch up on
applying things, so that I don't miss anything.

If you don't have one, taking a little time to set one up on github or
wherever would be nice.  You can base your set of changes on Linus's
latest tree.

Thanks!
  Roland

On Thu, Mar 6, 2014 at 9:07 PM, Devesh Sharma <devesh.sha...@emulex.com> wrote:
> Hi Roland,
>
> Is it okay to send next series of patches even if previous series is not
> accepted yet in your tree? Of course I will cut patches on top of previous
> series of patches.
>
> -Regards
>  Devesh
>
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> Sent: Thursday, March 06, 2014 12:34 AM
> To: Roland Dreier
> Cc: Or Gerlitz; Hefty Sean; linux-rdma; Martin K. Petersen; target-devel; Sagi Grimberg; linux-kernel
> Subject: Re: linux rdma 3.14 merge plans
>
> On Wed, 2014-03-05 at 07:18 -0800, Roland Dreier wrote:
>> On Wed, Mar 5, 2014 at 1:54 AM, Nicholas A. Bellinger
>> <n...@linux-iscsi.org> wrote:
>> > That all said, do you have an objection wrt taking these bits through
>> > target-pending..?  Given the dependencies involved, that would seem
>> > the most logical path to take.
>>
>> Perhaps not surprisingly, I would prefer to get a chance to review a
>> major change to the core RDMA midlayer rather than having you merge it
>> through your tree.  So yes I do object.  Please give me a chance to
>> review and merge it.  I am working on that this week.
>>
>
> Great.  We'll be looking for a response by the end of the week.
>
> Otherwise if you end up not having time, we'd still like to move forward for 
> v3.15 given the amount of review the series has already gotten on the list.
>
> Thank you,
>
> --nab

