Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-10 Thread Rafael Tinoco
Paul E. McKenney, Eric Biederman, David Miller (and/or anyone else interested):

It was brought to my attention that netns creation/execution might
have suffered a scalability/performance regression after v3.8.

I would like you, or anyone else interested, to review these
charts/data and check whether there is anything worth discussing
before I move further.

The following script was used for all the tests and charts generation:


#!/bin/bash
IP=/sbin/ip

function add_fake_router_uuid() {
    j=$(uuidgen)
    $IP netns add bar-${j}
    $IP netns exec bar-${j} $IP link set lo up
    $IP netns exec bar-${j} sysctl -w net.ipv4.ip_forward=1 > /dev/null
    k=$(echo $j | cut -b -11)
    $IP link add qro-${k} type veth peer name qri-${k} netns bar-${j}
    $IP link add qgo-${k} type veth peer name qgi-${k} netns bar-${j}
}

for i in $(seq 1 $1); do
    if [ $(expr $i % 250) -eq 0 ]; then
        echo $i by $(date +%s)
    fi
    add_fake_router_uuid
done
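For reference, the progress lines the script prints ("250 by <epoch>",
"500 by <epoch>", ...) can be converted into the routers/sec figures used
in the charts with a short awk filter. This is only a sketch, with sample
input inlined; the field layout is assumed from the echo above:

```shell
# Convert the script's "<mark> by <epoch>" progress lines into
# throughput (routers created per second) between consecutive marks.
awk '
    prev_ts { printf "%d %.2f\n", $1, ($1 - prev_mark) / ($3 - prev_ts) }
    { prev_mark = $1; prev_ts = $3 }
' <<'EOF'
250 by 1000
500 by 1002
750 by 1007
EOF
# prints:
# 500 125.00
# 750 50.00
```

Each output line pairs a creation mark with the throughput observed while
reaching it, which is exactly what the charts plot.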


This script measures how many fake routers are added per second (from 0
up to, e.g., the 3000-router creation mark). With it, and a git bisect
on the kernel tree, I was led to one specific commit causing the
scalability/performance regression: #911af50 "rcu: Provide
compile-time control for no-CBs CPUs". Even though this change was
experimental at that point, it introduced a performance/scalability
regression (explained below) that still persists.

The RCU-related code appeared to be responsible for the problem. Given
that, the kernel was checked out, compiled, and tested for every commit
from tag v3.8 to master that changed any of these files:
kernel/rcutree.c, kernel/rcutree.h, kernel/rcutree_plugin.h,
include/trace/events/rcu.h, include/linux/rcupdate.h. The idea was to
track any performance regression across RCU development, if RCU was
indeed the cause. In the worst case, with the regression unrelated to
RCU, I would still have chronological data to interpret.

All text below refers to 2 groups of charts generated during the study:


1) Kernel git tags from 3.8 to 3.14.
*** http://people.canonical.com/~inaddy/lp1328088/charts/250-tag.html ***

2) Kernel git commits for rcu development (111 commits) - Clearly
shows regressions:
*** http://people.canonical.com/~inaddy/lp1328088/charts/250.html ***

Obs:

1) There is a general chart with 111 commits. With this chart you can
see the performance evolution/regression at each test mark. A test mark
goes from 0 to 2500 and refers to the number of fake routers already
created. Example: throughput was 50 routers/sec at the
250-routers-created mark and 30 routers/sec at the 1250 mark.

2) Clicking on a specific commit will show that commit's evolution from
the 0-routers-created mark to the 2500 mark.


Since results differed depending on how many cpus were available and on
how the no-CB cpus were configured, 3 kernel config options were used
for every measurement, with 1 and 4 cpus:


- CONFIG_RCU_NOCB_CPU (disabled): nocbno
- CONFIG_RCU_NOCB_CPU_ALL (enabled): nocball
- CONFIG_RCU_NOCB_CPU_NONE (enabled): nocbnone

Obs: For the 1-cpu cases, nocbno, nocbnone and nocball behave the same
(or should), since with only 1 cpu there is no no-CB cpu.
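As a quick sanity check of which configuration a given test kernel
actually booted with, the rcu_nocbs= boot parameter can be inspected at
runtime. A minimal sketch, assuming standard procfs paths:

```shell
# Report how RCU callback offloading is configured on the running kernel.
# rcu_nocbs=<cpulist> on the command line selects offloaded CPUs when
# CONFIG_RCU_NOCB_CPU=y; CONFIG_RCU_NOCB_CPU_ALL offloads every CPU
# regardless of the boot line.
if grep -q 'rcu_nocbs=' /proc/cmdline 2>/dev/null; then
    echo "offloaded CPUs: $(grep -o 'rcu_nocbs=[^ ]*' /proc/cmdline)"
else
    echo "no rcu_nocbs= boot parameter (offloading per CONFIG_RCU_NOCB_CPU_*)"
fi
```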


After the charts were generated, it was clear that NOCB_CPU_ALL (4
cpus) hurt the fake-router creation performance and that this
regression persists up to the current upstream version. It was also
clear that, after commit #911af50, having more than 1 cpu does not
improve netns performance/scalability; it makes it worse.

#911af50

...
+#ifdef CONFIG_RCU_NOCB_CPU_ALL
+	pr_info("\tExperimental no-CBs for all CPUs\n");
+ cpumask_setall(rcu_nocb_mask);
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
...


Comparing stand-out points (see charts):

#81e5949 - good
#911af50 - bad

I was able to see, from the script above, that the following lines
cause the biggest impact on netns scalability/performance:

1) ip netns add - huge performance regression:

 1 cpu: no regression
 4 cpu: regression for NOCB_CPU_ALL

 obs: regression from 250 netns/sec to 50 netns/sec at the
500-netns-already-created mark

2) ip netns exec - some performance regression

 1 cpu: no regression
 4 cpu: regression for NOCB_CPU_ALL

 obs: regression from 40 netns/sec (+1 exec per netns creation) to 20
netns/sec at the 500-netns-created mark



FULL NOTE: http://people.canonical.com/~inaddy/lp1328088/

** Assumption: RCU callbacks being offloaded to multiple cpus
(cpumask_setall) caused the regression in copy_net_ns ->
create_new_namespaces or unshare(CLONE_NEWNET).

** Next Steps: I'll probably start tracing (function_graph) netns creation/execution.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
 ... you need to use the rcu_nocbs= boot parameter.


Absolutely, I'll try with 2 or 3 commits before #911af50 just in case.

 Again, what you are seeing is the effect of callback offloading on
 a workload not particularly suited for it.  That said, I don't understand
 why you are seeing any particular effect when offloading is completely
 disabled unless your workload is sensitive to grace-period latency.


Wanted to make sure the results were correct. Starting to investigate
the netns functions (copied some of the netns developers here also).
Totally agree, and this confirms my hypothesis.


 Some questions:

 o   Why does the throughput drop all the way to zero at various points?

Explained earlier. Check whether it is 0.00 or 0.xx: 0.00 can mean an unbootable kernel.


 o   What exactly is this benchmark doing?

Explained earlier. It simulates a cloud infrastructure migrating netns instances on failure.


 o   Is this benchmark sensitive to grace-period latency?
 (You can check this by changing the value of HZ, give or take.)

Will do that.

 o   How many runs were taken at each point?  If more than one, what
 was the variability?

For all commits, only one. For the commits pointed out, more than one;
results tend to be the same, with minimal variation. I'm trying to
balance effort between digging into the problem and getting more
results.

If you think, after my next answers (changing HZ, FAST_NOHZ), that
remeasuring everything is a must, let me know and I'll work on the
deviation for you.


 o   Routers per second means what?

Explained earlier.


 o   How did you account for the effects of other non-RCU commits?
 Did you rebase the RCU commits on top of an older release without
 the other commits or something similar?

I used the Linus git tree, checking out specific commits and compiling
the kernel. I've only used commits that changed RCU, because of the
bisect result. Besides these commits, I have only built kernels for the
main release tags.

In my point of view, if this is related to RCU, several things have to
be discussed: Is using NOCB_CPU_ALL for a general-purpose kernel a good
option? Is the netns code too dependent on low grace-period latency to
scale? Is there a way of minimizing this?

 Thanx, Paul

No Paul, I have to thank you. Really appreciate your time.

Rafael (tinoco@canonical/~inaddy)


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
Eric,

I'll test the patch with the same testcase and let you all know.

Really appreciate everybody's efforts.

On Wed, Jun 11, 2014 at 5:55 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Paul E. McKenney paul...@linux.vnet.ibm.com writes:

 On Wed, Jun 11, 2014 at 01:27:07PM -0500, Dave Chiluk wrote:
 On 06/11/2014 11:18 AM, Paul E. McKenney wrote:
  On Wed, Jun 11, 2014 at 10:46:00AM -0500, David Chiluk wrote:
  Now think about what happens when a gateway goes down, the namespaces
  need to be migrated, or a new machine needs to be brought up to replace
  it.  When we're talking about 3000 namespaces, the amount of time it
  takes simply to recreate the namespaces becomes very significant.
 
  The script is a stripped down example of what exactly is being done on
  the neutron gateway in order to create namespaces.
 
  Are the namespaces torn down and recreated one at a time, or is there some
  syscall, ioctl(), or whatever that allows bulk tear down and recreating?
 
 Thanx, Paul

 In the normal running case, the namespaces are created one at a time, as
 new customers create a new set of VMs on the cloud.

 However, in the case of failover to a new neutron gateway the namespaces
 are created all at once using the ip command (more or less serially).

 As far as I know there is no syscall or ioctl that allows bulk tear down
 and recreation.  if such a beast exists that might be helpful.

 The solution might be to create such a beast.  I might be able to shave
 a bit of time off of this benchmark, but at the cost of significant
 increases in RCU's CPU consumption.  A bulk teardown/recreation API could
 reduce the RCU grace-period overhead by several orders of magnitude by
 having a single RCU grace period cover a few thousand changes.

 This is why other bulk-change syscalls exist.

 Just out of curiosity, what syscalls does the ip command use?

 You can look in iproute2 ip/ipnetns.c

 But roughly, ip netns add does:

 unshare(CLONE_NEWNET);
 mkdir /var/run/netns/name
 mount --bind /proc/self/ns/net /var/run/netns/name

 I don't know if there is any sensible way to batch that work.

 (The unshare gets you into copy_net_ns in net/core/net_namespace.c
  and to find all of the code it can call you have to trace all
  of the register_pernet_subsys and register_pernet_device calls).

 At least for creation I would like to see if we can make all of the
 rcu_callback synchronize_rcu calls go away.  That seems preferable
 to batching at creation time.

 Eric



-- 
Rafael David Tinoco
Software Sustaining Engineer @ Canonical
Canonical Technical Services Engineering Team
# Email: rafael.tin...@canonical.com (GPG: 87683FC0)
# Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo)
# LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
I'm getting a kernel panic with your patch:

-- panic
-- mount_block_root
-- mount_root
-- prepare_namespace
-- kernel_init_freeable

It is giving me an unknown block device error for the same config file
I used on the other builds. Since my test runs on a kvm guest under a
ramdisk, I'm still checking whether there are any differences between
this build and the other ones, but I think there aren't.

Any chance that prepare_namespace might be breaking mount_root?

Tks

On Wed, Jun 11, 2014 at 9:14 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Paul E. McKenney paul...@linux.vnet.ibm.com writes:

 On Wed, Jun 11, 2014 at 04:12:15PM -0700, Eric W. Biederman wrote:
 Paul E. McKenney paul...@linux.vnet.ibm.com writes:

  On Wed, Jun 11, 2014 at 01:46:08PM -0700, Eric W. Biederman wrote:
  On the chance it is dropping the old nsproxy which calls synchronize_rcu
  in switch_task_namespaces that is causing you problems I have attached
  a patch that changes from rcu_read_lock to task_lock for code that
  calls task_nsproxy from a different task.  The code should be safe
  and it should be an unquestioned performance improvement but I have only
  compile tested it.
 
  If you can try the patch it will tell us if the problem is the rcu
  access in switch_task_namespaces (the only one I am aware of in network
  namespace creation) or if the problematic rcu case is somewhere else.
 
  If nothing else knowing which rcu accesses are causing the slow down
  seem important at the end of the day.
 
  Eric
 
 
  If this is the culprit, another approach would be to use workqueues from
  RCU callbacks.  The following (untested, probably does not even build)
  patch illustrates one such approach.

 For reference the only reason we are using rcu_lock today for nsproxy is
 an old lock ordering problem that does not exist anymore.

 I can say that in some workloads setns is a bit heavy today because of
 the synchronize_rcu, and setns is more important than I had previously
 thought because pthreads break the classic unix ability to do things in
 your process after fork() (sigh).

 Today daemonize is gone, and notify the parent process with a signal
 relies on task_active_pid_ns which does not use nsproxy.  So the old
 lock ordering problem/race is gone.

 The description of what was happening when the code switched from
 task_lock to rcu_read_lock to protect nsproxy.

 OK, never mind, then!  ;-)

 I appreciate you posting your approach.  I just figured I should do
 my homework, and verify my fuzzy memory.

 Who knows there might be different performance problems with my
 approach.  But I am hoping this is one of those happy instances where we
 can just make everything simpler.

 Eric





Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
Ok, some misconfiguration here probably, never mind. I'll finish the
tests tomorrow, compare with the existing ones and let you know asap. Tks.

On Wed, Jun 11, 2014 at 10:09 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Rafael Tinoco rafael.tin...@canonical.com writes:

 I'm getting a kernel panic with your patch:

 -- panic
 -- mount_block_root
 -- mount_root
 -- prepare_namespace
 -- kernel_init_freeable

 It is giving me an unknown block device for the same config file i
 used on other builds. Since my test is running on a kvm guest under a
 ramdisk, i'm still checking if there are any differences between this
 build and other ones but I think there aren't.

 Any chances that prepare_namespace might be breaking mount_root ?

 My patch boots for me

 Eric





Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-13 Thread Rafael Tinoco
Okay,

Tests with the same script were done. I'm comparing the following
versions:

1) master + suggested patch
2) 3.15.0-rc5 (last rcu commit in my clone)
3) 3.9-rc2 (last bisect good)

         master + sug patch      3.15.0-rc5 (last rcu)   3.9-rc2 (bisect good)
mark     no      none    all     no      none    all     no

# (netns add) / sec

 250  125.00  250.00  250.00  20.8322.7350.00  83.33
 500  250.00  250.00  250.00  22.7322.7350.00  125.00
 750  250.00  125.00  125.00  20.8322.7362.50  125.00
1000  125.00  250.00  125.00  20.8320.8350.00  250.00
1250  125.00  125.00  250.00  22.7322.7350.00  125.00
1500  125.00  125.00  125.00  22.7322.7341.67  125.00
1750  125.00  125.00  83.33   22.7322.7350.00  83.33
2000  125.00  83.33   125.00  22.7325.0050.00  125.00

- From 3.15 to patched tree, netns add performance was ***
restored/improved *** OK

# (netns add + 1 x exec) / sec

 250  11.90   14.71   31.25   5.00 6.76 15.63  62.50
 500  11.90   13.89   31.25   5.10 7.14 15.63  41.67
 750  11.90   13.89   27.78   5.10 7.14 15.63  50.00
1000  11.90   13.16   25.00   4.90 6.41 15.63  35.71
1250  11.90   13.89   25.00   4.90 6.58 15.63  27.78
1500  11.36   13.16   25.00   4.72 6.25 15.63  25.00
1750  11.90   12.50   22.73   4.63 5.56 14.71  20.83
2000  11.36   12.50   22.73   4.55 5.43 13.89  17.86

- From 3.15 to the patched tree, performance improves by ~100% but is
still ~50% of 3.9-rc2

# (netns add + 2 x exec) / sec

250   6.588.6216.67   2.81 3.97 9.26   41.67
500   6.588.3315.63   2.78 4.10 9.62   31.25
750   5.957.8115.63   2.69 3.85 8.93   25.00
1000  5.957.3513.89   2.60 3.73 8.93   20.83
1250  5.817.3513.89   2.55 3.52 8.62   16.67
1500  5.817.3513.16   0.00 3.47 8.62   13.89
1750  5.436.7613.16   0.00 3.47 8.62   11.36
2000  5.326.5812.50   0.00 3.38 8.339.26

- Same as before.

# netns add + 2 x exec + 1 x ip link to netns

250   7.148.3314.71   2.87 3.97 8.62   35.71
500   6.948.3313.89   2.91 3.91 8.93   25.00
750   6.107.5813.89   2.75 3.79 8.06   19.23
1000  5.566.9412.50   2.69 3.85 8.06   14.71
1250  5.686.5811.90   2.58 3.57 7.81   11.36
1500  5.566.5810.87   0.00 3.73 7.58   10.00
1750  5.436.4110.42   0.00 3.57 7.14   8.62
2000  5.216.2510.00   0.00 3.33 7.14   6.94

- ip link add to netns did not change the performance proportions much.

# netns add + 2 x exec + 2 x ip link to netns

250   7.358.6213.89   2.94 4.03 8.33   31.25
500   7.148.0612.50   2.94 4.03 8.06   20.83
750   6.417.5811.90   2.81 3.85 7.81   15.63
1000  5.957.1410.87   2.69 3.79 7.35   12.50
1250  5.816.7610.00   2.66 3.62 7.14   10.00
1500  5.686.419.623.73 6.76 8.06
1750  5.326.258.933.68 6.58 7.35
2000  5.436.108.333.42 6.10 6.41

- Same as before.

OBS:

1) It seems that performance improved for network namespace addition,
but there may also be room for improvement in netns execution. That
way we might achieve the same performance that 3.9-rc2 (good bisect)
had.

2) These tests were made with 4 cpu only.

3) Initial charts showed that the 1-cpu case with all cpus as no-CB
(without this patch) had something like 50% of the bisect-good
throughput. The 4-cpu (nocball) case had 26% of bisect good (as shown
above in the last case: 31.25 -> 8.33).

4) With the patch, using 4 cpus and nocball, we now get 44% of the
bisect-good performance (against the 26% we had).

5) NOCB_* is still an issue. It is clear that only the NOCB_CPU_ALL
option gives us something near the last good commit's performance.

Thank you

Rafael


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-16 Thread Rafael Tinoco
...

On Fri, Jun 13, 2014 at 9:02 PM, Eric W. Biederman
ebied...@xmission.com wrote:
 Rafael Tinoco rafael.tin...@canonical.com writes:

 Okay,

 Tests with the same script were done.
 I'm comparing : master + patch vs 3.15.0-rc5 (last sync'ed rcu commit)
 and 3.9 last bisect good.

 Same tests were made. I'm comparing the following versions:

 1) master + suggested patch
 2) 3.15.0-rc5 (last rcu commit in my clone)
 3) 3.9-rc2 (last bisect good)

 I am having a hard time making sense of your numbers.

 If I have read your email correctly my suggested patch caused:
 ip netns add numbers to improve
 1x ip netns exec to improve some
 2x ip netns exec to show no improvement
 ip link add to show no effect (after the 2x ip netns exec)

 - netns add are as good as they were before this regression.
 - netns exec are improved but still 50% of the last good bisect commit.
 - link add didn't show difference.

 This is interesting in a lot of ways.
  - This seems to confirm that the only rcu usage in ip netns add
    was switch_task_namespaces.  Which is convenient as that rules
    out most of the network stack when looking for performance oddities.

 - ip netns exec had an expected performance improvement
 - ip netns exec is still slow (so something odd is still going on)
 - ip link add appears immaterial to the performance problem.

 It would be interesting to switch the ip link add and ip netns exec
 in your test case to confirm that there is nothing interesting/slow
 going on in ip link add

 - will do that.


 Which leaves me with the question of what in ip netns exec remains
 that is using rcu and slowing all of this down.

 - will check this also.

 Eric

Tks

Rafael


Re: [PATCH 4.4 00/92] 4.4.133-stable review

2018-05-24 Thread Rafael Tinoco
> > kernel: 4.4.133-rc1
> > git repo: 
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > git branch: linux-4.4.y
> > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> > git describe: v4.4.132-93-g915a3d7cdea9
> > Test details: 
> > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
> >
> >
> > No regressions (compared to build v4.4.132-71-g180635995c36)
>
> It should have gotten better, as there was a fix in here for at least 2
> LTP tests that we previously were not passing.  I don't know why you all
> were not reporting that in the past, it took someone else randomly
> deciding to run LTP to report it to me...
>
> Why did an improvement in results not show up?

Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
(commit 49eea433b326 in mainline) so this "fix" is changing the
timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
ARM64. Needs review though :\


Re: [PATCH 4.4 00/92] 4.4.133-stable review

2018-05-24 Thread Rafael Tinoco
Thank you Daniel! Will investigate those.

Meanwhile, Greg, I referred to:

time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

We're not using this type of clock in arm64's 4.4 kernel vDSO
functions. The commit's description says it is responsible for fixing
flaky kselftest tests, which we wouldn't get on 4.4, according to the
backport status below:

stable-rc-linux-4.14.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.16.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.4.y


stable-rc-linux-4.9.y
99f66b5182a4 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

Yet, the second fix was backported to all (including 4.4.y):

stable-rc-linux-4.14.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.16.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.4.y
7c8bd6e07430 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.9.y
a53bfdda06ac time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

Not sure you want to keep it in 4.4; thought it was worth mentioning.

Cheers.

On 24 May 2018 at 22:34, Daniel Sangorrin
<daniel.sangor...@toshiba.co.jp> wrote:
> Hello Rafael,
>
> The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
> https://www.spinics.net/lists/stable/msg239475.html
>
> Looking at
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
> I see that these two tests (and other important tests as well) are being 
> SKIPPED.
>
> By the way, I see that select04 FAILS in your case. But in my setup, select04 
> was working fine (x86_64) in 4.4.132. I will confirm that it still works in 
> 4.4.133
>
> Thanks,
> Daniel Sangorrin
>
>> -Original Message-
>> From: stable-ow...@vger.kernel.org [mailto:stable-ow...@vger.kernel.org] On
>> Behalf Of Rafael Tinoco
>> Sent: Friday, May 25, 2018 5:32 AM
>> To: Greg Kroah-Hartman <gre...@linuxfoundation.org>
>> Cc: linux-kernel@vger.kernel.org; sh...@kernel.org; patc...@kernelci.org;
>> lkft-tri...@lists.linaro.org; ben.hutchi...@codethink.co.uk;
>> sta...@vger.kernel.org; a...@linux-foundation.org;
>> torva...@linux-foundation.org; li...@roeck-us.net
>> Subject: Re: [PATCH 4.4 00/92] 4.4.133-stable review
>>
>> > > kernel: 4.4.133-rc1
>> > > git repo:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> > > git branch: linux-4.4.y
>> > > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
>> > > git describe: v4.4.132-93-g915a3d7cdea9
>> > > Test details:
>> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
>> 3d7cdea9
>> > >
>> > >
>> > > No regressions (compared to build v4.4.132-71-g180635995c36)
>> >
>> > It should have gotten better, as there was a fix in here for at least 2
>> > LTP tests that we previously were not passing.  I don't know why you all
>> > were not reporting that in the past, it took someone else randomly
>> > deciding to run LTP to report it to me...
>> >
>> > Why did an improvement in results not show up?
>>
>> Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
>> I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
>> (commit 49eea433b326 in mainline) so this "fix" is changing the
>> timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
>> ARM64. Needs review though :\
>
>
>


Re: [LTP] [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-14 Thread Rafael Tinoco
Jan, Naresh,

Patch has been queued to 4.4 (for the next review round, yet to be
merged to stable-rc branch):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.4

as "clarify-and-fix-max_lfs_filesize-macros.patch"

Thank you!

On 14 June 2018 at 07:36, Jan Stancek  wrote:
>
> - Original Message -
>> On Thu, Jun 14, 2018 at 05:49:52AM -0400, Jan Stancek wrote:
>> >
>> > - Original Message -
>> > > On Thu, Jun 14, 2018 at 02:24:25PM +0530, Naresh Kamboju wrote:
>> > > > On 14 June 2018 at 12:04, Greg Kroah-Hartman
>> > > > 
>> > > > wrote:
>> > > > > On Wed, Jun 13, 2018 at 10:48:50PM -0300, Rafael Tinoco wrote:
>> > > > >> On 13 June 2018 at 18:08, Rafael David Tinoco
>> > > > >>  wrote:
>> > > > >> > On Wed, Jun 13, 2018 at 6:00 PM, Greg Kroah-Hartman
>> > > > >> >  wrote:
>> > > > >> >> On Wed, Jun 13, 2018 at 05:47:49PM -0300, Rafael Tinoco wrote:
>> > > > >> >>> Results from Linaro’s test farm.
>> > > > >> >>> Regressions detected.
>> > > > >> >>>
>> > > > >> >>> NOTE:
>> > > > >> >>>
>> > > > >> >>> 1) LTP vma03 test (cve-2011-2496) broken on v4.4-137-rc1 because
>> > > > >> >>> of:
>> > > > >> >>>
>> > > > >> >>>  6ea1dc96a03a mmap: relax file size limit for regular files
>> > > > >> >>>  bd2f9ce5bacb mmap: introduce sane default mmap limits
>> > > > >> >>>
>> > > > >> >>>discussion:
>> > > > >> >>>
>> > > > >> >>>  https://github.com/linux-test-project/ltp/issues/341
>> > > > >> >>>
>> > > > >> >>>mainline commit (v4.13-rc7):
>> > > > >> >>>
>> > > > >> >>>  0cc3b0ec23ce Clarify (and fix) MAX_LFS_FILESIZE macros
>> > > > >> >>>
>> > > > >> >>>should be backported to 4.4.138-rc2 and fixes the issue.
>> > > > >> >>
>> > > > >> >> Really?  That commit says it fixes c2a9737f45e2 ("vfs,mm: fix a
>> > > > >> >> dead
>> > > > >> >> loop in truncate_inode_pages_range()") which is not in 4.4.y at
>> > > > >> >> all.
>> > > > >> >>
>> > > > >> >> Did you test this out?
>> > > > >> >
>> > > > >> > Yes, the LTP contains the tests (last comment is the final test
>> > > > >> > for
>> > > > >> > arm32, right before Jan tests i686).
>> > > > >> >
>> > > > >> > Fixing MAX_LFS_FILESIZE fixes the new limit for mmap() brought by
>> > > > >> > those 2 commits (file_mmap_size_max()).
>> > > > >> > offset tested by the LTP test is 0xfffe000.
>> > > > >> > file_mmap_size_max gives: 0x000 as max value, but only
>> > > > >> > after
>> > > > >> > the mentioned patch.
>> > > > >> >
>> > > > >> > Original intent for this fix was other though.
>> > > > >>
>> > > > >> To clarify this a bit further.
>> > > > >>
>> > > > >> The LTP CVE test is breaking in the first call to mmap(), even
>> > > > >> before
>> > > > >> trying to remap and test the security issue. That start happening in
>> > > > >> this round because of those mmap() changes and the offset used in
>> > > > >> the
>> > > > >> LTP test. Linus changed limit checks and made them to be related to
>> > > > >> MAX_LFS_FILESIZE. Unfortunately, in 4.4 stable, we were missing the
>> > > > >> fix for MAX_LFS_FILESIZE (which before commit 0cc3b0ec23ce was less
>> > > > >> than the REAL 32 bit limit).
>> > > > >>
>> > > > >> Commit 0cc3b0ec23ce was made because an user noticed the FS limit
>> > > > >> not
>> > > > >> being what i

Re: [PATCH 4.9 00/31] 4.9.108-stable review

2018-06-13 Thread Rafael Tinoco
On 12 June 2018 at 13:46, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.9.108 release.
> There are 31 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 14 16:46:09 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.108-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

NOTE:
There is an intermittent LTP fs read_all_sys failure, specific to the
arm64 HiKey board, which we will investigate further.
https://bugs.linaro.org/show_bug.cgi?id=3903

Summary


kernel: 4.9.108-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 9b3f06c8225324c48370ed02288023578494f050
git describe: v4.9.107-32-g9b3f06c82253
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.107-32-g9b

No Regressions (compared to build v4.9.107)


Ran 11373 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

--
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-13 Thread Rafael Tinoco
Results from Linaro’s test farm.
Regressions detected.

NOTE:

1) LTP vma03 test (cve-2011-2496) is broken on v4.4.137-rc1 because of:

 6ea1dc96a03a mmap: relax file size limit for regular files
 bd2f9ce5bacb mmap: introduce sane default mmap limits

   discussion:

 https://github.com/linux-test-project/ltp/issues/341

   mainline commit (v4.13-rc7):

 0cc3b0ec23ce Clarify (and fix) MAX_LFS_FILESIZE macros

   should be backported to 4.4.138-rc2 and fixes the issue.

2) select04 failure on x15 board will be investigated in:

 https://bugs.linaro.org/show_bug.cgi?id=3852

   and seems to be a timing issue (HW related).

Summary


kernel: 4.4.137-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 678437d36d4e14a029309f1c282802ce47fda36a
git describe: v4.4.136-25-g678437d36d4e
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.136-25-g678437d36d4e

Regressions (compared to build v4.4.136)


qemu_arm:
  ltp-cve-tests:
* cve-2011-2496
* runltp_cve

* test src: git://github.com/linux-test-project/ltp.git

x15 - arm:
  ltp-cve-tests:
* cve-2011-2496
* runltp_cve

* test src: git://github.com/linux-test-project/ltp.git
  ltp-syscalls-tests:
* runltp_syscalls
* select04

* test src: git://github.com/linux-test-project/ltp.git

Ran 7100 total tests in the following environments and test suites.

Environments
--
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

Summary


kernel: 4.4.137-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.137-rc1-hikey-20180612-214
git commit: e5d5cb57472f9f98a68f872664de3d70610019e1
git describe: 4.4.137-rc1-hikey-20180612-214
Test details: 
https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.137-rc1-hikey-20180612-214

No regressions (compared to build 4.4.136-rc2-hikey-20180606-212)

Ran 2611 total tests in the following environments and test suites.

Environments
--
- hi6220-hikey - arm64
- qemu_arm64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-fs-tests

--
Linaro LKFT
https://lkft.linaro.org

On 12 June 2018 at 13:51, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.4.137 release.
> There are 24 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 14 16:48:07 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.137-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Re: [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-13 Thread Rafael Tinoco
On 13 June 2018 at 18:08, Rafael David Tinoco
 wrote:
> On Wed, Jun 13, 2018 at 6:00 PM, Greg Kroah-Hartman
>  wrote:
>> On Wed, Jun 13, 2018 at 05:47:49PM -0300, Rafael Tinoco wrote:
>>> Results from Linaro’s test farm.
>>> Regressions detected.
>>>
>>> NOTE:
>>>
>>> 1) LTP vma03 test (cve-2011-2496) broken on v4.4-137-rc1 because of:
>>>
>>>  6ea1dc96a03a mmap: relax file size limit for regular files
>>>  bd2f9ce5bacb mmap: introduce sane default mmap limits
>>>
>>>discussion:
>>>
>>>  https://github.com/linux-test-project/ltp/issues/341
>>>
>>>mainline commit (v4.13-rc7):
>>>
>>>  0cc3b0ec23ce Clarify (and fix) MAX_LFS_FILESIZE macros
>>>
>>>should be backported to 4.4.138-rc2 and fixes the issue.
>>
>> Really?  That commit says it fixes c2a9737f45e2 ("vfs,mm: fix a dead
>> loop in truncate_inode_pages_range()") which is not in 4.4.y at all.
>>
>> Did you test this out?
>
> Yes, the LTP contains the tests (last comment is the final test for
> arm32, right before Jan tests i686).
>
> Fixing MAX_LFS_FILESIZE fixes the new limit for mmap() brought by
> those 2 commits (file_mmap_size_max()).
> offset tested by the LTP test is 0xfffe000.
> file_mmap_size_max gives: 0x000 as max value, but only after
> the mentioned patch.
>
> Original intent for this fix was other though.

To clarify this a bit further.

The LTP CVE test is breaking in the first call to mmap(), even before
trying to remap and test the security issue. That started happening in
this round because of those mmap() changes and the offset used in the
LTP test. Linus changed the limit checks and made them relative to
MAX_LFS_FILESIZE. Unfortunately, in 4.4 stable, we were missing the
fix for MAX_LFS_FILESIZE (which before commit 0cc3b0ec23ce was less
than the REAL 32-bit limit).

Commit 0cc3b0ec23ce was made because a user noticed the FS limit not
being what it should be. In our case, the 4.4 stable kernel, we are
hitting this lower limit (lower than the real 32-bit limit) because of
the LTP CVE test, so we need this fix to set the real 32-bit limit for
that macro (mmap limits did not use that macro before).

I have tested on arm32, and Jan Stancek, who first responded to the LTP
issue, has tested on i686; both worked after that patch was included
in v4.4.137-rc1 (my last test was even with 4.4.138-rc1).

Hope that helps a bit.
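As a back-of-the-envelope illustration (not part of the original
thread), the pre- and post-fix 32-bit limits can be reproduced with
shell arithmetic. PAGE_SHIFT=12 (4 KiB pages) is assumed, and the macro
forms are paraphrased from commit 0cc3b0ec23ce, so treat this as a
sketch only:

```shell
# Hypothetical sanity check of the 32-bit MAX_LFS_FILESIZE values.
# Assumes 4 KiB pages; macro forms paraphrased from commit 0cc3b0ec23ce.
PAGE_SHIFT=12
BITS_PER_LONG=32
# pre-fix:  (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1)) - 1)
old=$(( ((1 << PAGE_SHIFT) << (BITS_PER_LONG - 1)) - 1 ))
# post-fix: ((loff_t)ULONG_MAX << PAGE_SHIFT), roughly one bit higher
new=$(( ((1 << BITS_PER_LONG) - 1) << PAGE_SHIFT ))
printf 'pre-fix  max: 0x%x\npost-fix max: 0x%x\n' "$old" "$new"
```

The point being that the pre-fix macro was about a factor of two below
the real page-cache limit, which is what the LTP test's large offset
tripped over.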


Re: [PATCH 4.16 000/272] 4.16.13-stable review

2018-05-29 Thread Rafael Tinoco
The following bug has been opened for LTP:

https://github.com/linux-test-project/ltp/issues/319

for CVE-2017-5669's wrong assumptions (based on Davidlohr's work).

I'll change the test to cover both scenarios and expect the right results from 
them.

> On 29 May 2018, at 04:08, Greg Kroah-Hartman  
> wrote:
> 
> On Tue, May 29, 2018 at 10:55:34AM +0530, Naresh Kamboju wrote:
>> On 28 May 2018 at 15:30, Greg Kroah-Hartman  
>> wrote:
>>> This is the start of the stable review cycle for the 4.16.13 release.
>>> There are 272 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>> 
>>> Responses should be made by Wed May 30 10:01:02 UTC 2018.
>>> Anything received after that time might be too late.
>>> 
>>> The whole patch series can be found in one patch at:
>>>
>>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.13-rc1.gz
>>> or in the git tree and branch at:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
>>> linux-4.16.y
>>> and the diffstat can be found below.
>>> 
>>> thanks,
>>> 
>>> greg k-h
>> 
>> Results from Linaro’s test farm.
>> No regressions on arm64, arm and x86_64.
>> 
>> NOTE:
>> The failed LTP test case "cve-2017-5669" is a waiver here.
> 
> Thanks for figuring that one out :)
> 
> Also, thanks for testing all of these and letting me know.



Re: [PATCH] selftests: gpio: gpio-mockup-chardev GPIOHANDLE_REQUEST_OUTPUT fix

2018-06-27 Thread Rafael Tinoco
Linus, Bartosz,

This was discovered during our investigations of a functional tests
regression/error:

https://bugs.linaro.org/show_bug.cgi?id=3769

Which turned out to be related to missing CONFIG_ARM{64}_MODULE_PLTS
config in our builds.

However, during investigations, we realized the functional test had
the issues best described in comment:

https://bugs.linaro.org/show_bug.cgi?id=3769#c3

It is related to the errno variable being evaluated outside the error scope.

Thank you
Rafael

On Thu, 14 Jun 2018 at 11:42, Linus Walleij  wrote:
>
> On Wed, Jun 6, 2018 at 7:44 PM, Rafael David Tinoco
>  wrote:
>
> > Following logic from commit: 22f6592b23, GPIOHANDLE_REQUEST_OUTPUT
> > should handle errors same way as GPIOHANDLE_REQUEST_INPUT does, or else
> > the following error occurs:
> >
> > gpio-mockup-chardev: gpio line<0> test flag<0x2> value<0>: No
> > such file or directory
> >
> > despite the real result of gpio_pin_test(), gpio_debugfs_get() and
> > gpiotools_request_linehandle() functions.
> >
> > Signed-off-by: Rafael David Tinoco 
>
> Bartosz, does this look OK to you?
>
> Yours,
> Linus Walleij


nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release

2018-07-03 Thread Rafael Tinoco
BUG: https://bugs.linaro.org/show_bug.cgi?id=3731

During Linaro's Kernel Functional tests, we have observed the
following situation:

[   52.651490] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)1UL))
[   52.651506] WARNING: CPU: 2 PID: 1457 at
./kernel/locking/rwsem.c:217 up_read_non_owner+0x5d/0x70
[   52.674398] Modules linked in: x86_pkg_temp_thermal fuse
[   52.679719] CPU: 2 PID: 1457 Comm: kworker/2:2 Not tainted 4.16.0 #1
[   52.687448] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   52.694922] Workqueue: nfsiod rpc_async_release
[   52.699454] RIP: 0010:up_read_non_owner+0x5d/0x70
[   52.704157] RSP: 0018:9cbf81a23dd0 EFLAGS: 00010282
[   52.709376] RAX:  RBX: 8dc1983c76c0 RCX: 
[   52.716500] RDX: bd2d26c9 RSI: 0001 RDI: bd2d2889
[   52.723652] RBP: 9cbf81a23dd8 R08:  R09: 
[   52.730782] R10: 9cbf81a23dd0 R11:  R12: 8dc19abf8600
[   52.737906] R13: 8dc19b6c R14:  R15: 8dc19bacad80
[   52.745029] FS:  () GS:8dc1afd0()
knlGS:
[   52.753108] CS:  0010 DS:  ES:  CR0: 80050033
[   52.758845] CR2: 7f33794665d8 CR3: 00016c41e006 CR4: 003606e0
[   52.765968] DR0:  DR1:  DR2: 
[   52.773091] DR3:  DR6: fffe0ff0 DR7: 0400
[   52.780215] Call Trace:
[   52.782695]  nfs_async_unlink_release+0x32/0x80
[   52.787220]  rpc_free_task+0x30/0x50
[   52.790789]  rpc_async_release+0x12/0x20
[   52.794707]  process_one_work+0x25e/0x660
[   52.798713]  worker_thread+0x4b/0x410
[   52.802377]  kthread+0x10d/0x140
[   52.805600]  ? rescuer_thread+0x3a0/0x3a0
[   52.809652]  ? kthread_create_worker_on_cpu+0x70/0x70
[   52.814702]  ? do_syscall_64+0x69/0x1b0
[   52.818540]  ret_from_fork+0x3a/0x50

TEST RUNS:

https://lkft.validation.linaro.org/scheduler/job/167146#L2361
https://lkft.validation.linaro.org/scheduler/job/177145#L883

COMMENTS:

This started happening after commit 5149cbac4235 (locking/rwsem: Add
DEBUG_RWSEMS to look for lock/unlock mismatches) introduced checks for
the semaphores.

After some investigations (in Bug: #3731), the feedback, and assumptions, are:

Function "nfs_rmdir()" acquires the write semaphore in order to
guarantee that no unlink operations will happen to inodes inside the
directory being removed, making it safe to unlink a directory on NFS.
The parent dentry's (read) semaphore serves as synchronization
(mutual exclusion) against rmdir (read/write), guaranteeing there are
no files being unlinked inside the directory while it is being removed.

The "nfs_call_unlink() -> nfs_do_call_unlink()" logic acquires the
unlinked inode's parent dentry->rmdir_sem (read) and holds it until the
inode's nfs_unlinkdata (needed for the silly unlink) is finally cleaned
up asynchronously (via RPC call) by nfs_async_unlink_release(), which
is also where the parent dentry->rmdir_sem is released, as originally
described in commit 884be175351e. The purpose of this is to guarantee
that the parent directory won't be unlinked while the inode is being
asynchronously unlinked.

Unfortunately, it seems that there is some condition that either caused
the read semaphore (from the inode's parent dentry) to be released (by
the same async logic?) before nfs_async_unlink_release() ran through
the RPC call, OR the inode was moved (rmdir_sem doesn't seem to be
acquired in other situations) to a different directory (caching a new
dentry without the read semaphore acquired), causing DEBUG_LOCKS_WARN_ON
to catch this.

Thank you for reviewing this.
-- 
Linaro LKFT
https://lkft.linaro.org


Re: LTP CVE cve-2017-17053 test failed on x86_64 device

2018-06-20 Thread Rafael Tinoco
I believe the error message on boot is solved by LKML thread:

[PATCH] locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS

Looks like that is what is tainting the kernel.

On 20 June 2018 at 08:11, Naresh Kamboju  wrote:
> On 20 June 2018 at 12:51, Michael Moese  wrote:
>> Hi,
>>
>> On Wed, Jun 20, 2018 at 12:14:22PM +0530, Naresh Kamboju wrote:
>>> Test FAIL case output,
>>> tst_test.c:1015: INFO: Timeout per run is 0h 15m 00s
>>> tst_taint.c:88: BROK: Kernel is already tainted: 512
>> The kernel is already tainted. In this case, the test refuses to run,
>> because it could not tell if the test is pass or fail.
>>
>> Could you please check if you could run the test directly after a
>> reboot?
>
> This single test ran immediately after the boot and bug reproduced.
>
> tst_taint.c:88: BROK: Kernel is already tainted: 512
> https://lkft.validation.linaro.org/scheduler/job/293222#L1204
>
> Test command for 10 iterations and it failed for all 10 iterations.
> + ./runltp -s cve-2017-17053 -I 10
>
> NOTE:
> We still see kernel warning while booting the x86_64 machine.
> DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
>
> - Naresh
>
>>
>> Regards,
>> Michael
>> --
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
>> 21284 (AG Nürnberg)


Re: [PATCH 4.9 00/71] 4.9.134-stable review

2018-10-17 Thread Rafael Tinoco
On Tue, Oct 16, 2018 at 2:23 PM Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.9.134-rc2
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 9e48abe2679cbb419f7472c31d11c06711b5ebc7
git describe: v4.9.133-71-g9e48abe2679c
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.133-71-g9e48abe2679c


No regressions (compared to build v4.9.133-72-gda849e5647be)


No fixes (compared to build v4.9.133-72-gda849e5647be)


Ran 20932 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest
* libhugetlbfs
* ltp-fs-tests
* ltp-fsx-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4.14 000/173] 4.14.72-stable review

2018-09-25 Thread Rafael Tinoco
Greg,

> > > This is the start of the stable review cycle for the 4.14.72 release.
> > > There are 173 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Wed Sep 26 11:30:10 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > > 
> > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.72-rc1.gz
> > > or in the git tree and branch at:
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > > linux-4.14.y
> > > and the diffstat can be found below.
> >
> > -rc2 is out to resolve some reported problems:
> >   
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.72-rc2.gz
>
> -rc2 looks good. There is a problem on dragonboard during boot that was
> introduced in v4.14.71 that I didn't notice last week. We'll bisect it
> and report back later this week. dragonboard on the other branches (4.9,
> 4.18, mainline) looks fine.

As Dan pointed out, during validation, we have bisected this issue on
a dragonboard 410c (can't find root device) to the following commit
for v4.14:

[1ed3a9307230] rpmsg: core: add support to power domains for devices

There is an on-going discussion on "[PATCH] rpmsg: core: add support
to power domains for devices" about this patch having other
dependencies and breaking something else on v4.14 as well.

Do you think we could drop this patch, for now, in a possible -rc3 for
v4.14.72? Dragonboards haven't been tested, because of this, since
v4.14.70. Hopefully it isn't too late for this release =).

BTW, I have just tested removing the commit from -rc2 and the board boots okay.

Thank you
-Rafael


Re: [PATCH 4.14 000/173] 4.14.72-stable review

2018-09-26 Thread Rafael Tinoco
> > Do you think we could drop this patch, for now, in a possible -rc3 for
> > v4.14.72 ? Dragonboards aren't being tested, because of this, since
> > v4.14.70. Hopefully it isn't too late for this release =).
>
> I can't "drop" it as it is already in a released kernel, 4.14.71 and
> 4.18.9.  I can revert it though, and will do so for the next round of
> releases after this one.

Yes, bad wording, sorry. That is what I meant and tested. Thank you.


Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-10 Thread Rafael Tinoco
Paul E. McKenney, Eric Biederman, David Miller (and/or anyone else interested):

It was brought to my attention that netns creation/execution might
have suffered scalability/performance regression after v3.8.

I would like you, or anyone interested, to review these charts/data
and check if there is something that could be discussed/said before I
move further.

The following script was used for all the tests and charts generation:


#!/bin/bash
IP=/sbin/ip

# Create one "fake router": a netns with loopback up, IPv4 forwarding
# enabled, and two veth pairs whose peer ends live inside the netns.
function add_fake_router_uuid() {
    j=`uuidgen`
    $IP netns add bar-${j}
    $IP netns exec bar-${j} $IP link set lo up
    $IP netns exec bar-${j} sysctl -w net.ipv4.ip_forward=1 > /dev/null
    k=`echo $j | cut -b -11`
    $IP link add qro-${k} type veth peer name qri-${k} netns bar-${j}
    $IP link add qgo-${k} type veth peer name qgi-${k} netns bar-${j}
}

# Print a timestamp every 250 routers so throughput can be derived.
for i in `seq 1 $1`; do
    if [ `expr $i % 250` -eq 0 ]; then
        echo "$i by `date +%s`"
    fi
    add_fake_router_uuid
done


This script measures how many "fake routers" are added per second
(e.g. from 0 up to the 3000-router creation mark). With this, and a git
bisect on the kernel tree, I was led to one specific commit causing the
scalability/performance regression: #911af50 "rcu: Provide
compile-time control for no-CBs CPUs". Even though this change was
experimental at that point, it introduced a performance/scalability
regression (explained below) that still lasts.
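As an aside (not in the original post), the script's "N by <epoch>"
progress lines can be turned into routers-per-second figures like
this; the timestamps below are made up purely for illustration:

```shell
# Derive routers/sec from the progress log the script prints every
# 250 routers.  The epoch timestamps here are invented for the example.
log='250 by 1402400000
500 by 1402400005
750 by 1402400015'
rates=$(printf '%s\n' "$log" |
    awk '{ if (prev) printf "%s:%.0f\n", $1, 250 / ($3 - prev); prev = $3 }')
echo "$rates"   # 500:50 (250 routers in 5s), 750:25 (250 routers in 10s)
```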

RCU-related code looked to be responsible for the problem. With that,
for every commit from tag v3.8 to master that changed any of these
files: "kernel/rcutree.c kernel/rcutree.h kernel/rcutree_plugin.h
include/trace/events/rcu.h include/linux/rcupdate.h", the kernel was
checked out/compiled/tested. The idea was to catch any performance
regression during rcu development, if that was the case. In the worst
case, with the regression not being related to rcu, I would still have
chronological data to interpret.

All text below this refer to 2 groups of charts, generated during the study:


1) Kernel git tags from 3.8 to 3.14.
*** http://people.canonical.com/~inaddy/lp1328088/charts/250-tag.html ***

2) Kernel git commits for rcu development (111 commits) -> Clearly
shows regressions:
*** http://people.canonical.com/~inaddy/lp1328088/charts/250.html ***

Obs:

1) There is a general chart with 111 commits. With this chart you can
see the performance evolution/regression at each test mark. The test
mark goes from 0 to 2500 and refers to "fake routers already created".
Example: throughput was 50 routers/sec at the 250-routers-created mark
and 30 routers/sec at the 1250 mark.

2) Clicking on a specific commit will give you that commit's evolution
from the 0 mark to the 2500-routers-created mark.


Since there were differences in the results depending on how many cpus
were used and on how the no-CBs cpus were configured, 3 kernel config
options were tested on every measurement, for 1 and 4 cpus.


- CONFIG_RCU_NOCB_CPU (disabled): nocbno
- CONFIG_RCU_NOCB_CPU_ALL (enabled): nocball
- CONFIG_RCU_NOCB_CPU_NONE (enabled): nocbnone

Obs: For the 1-cpu cases, nocbno, nocbnone and nocball behave the same
(or should), since with only 1 cpu there is no no-CBs cpu.
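For reference, the three combinations correspond roughly to the
following .config fragments (a sketch; the option spellings assume the
Kconfig choice added around commit #911af50, so check the exact tree
being built):

```shell
# nocbno   -- callback offloading compiled out entirely
# CONFIG_RCU_NOCB_CPU is not set

# nocbnone -- offloading compiled in, but no CPUs offloaded by default
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_NONE=y

# nocball  -- offloading compiled in, all CPUs offloaded
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
```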


After the charts were generated, it was clear that NOCB_CPU_ALL (4
cpus) hurt the "fake router" creation performance, and this regression
continues up to the current upstream version. It was also clear that,
after commit #911af50, having more than 1 cpu does not improve
performance/scalability for netns; it makes it worse.

#911af50

...
+#ifdef CONFIG_RCU_NOCB_CPU_ALL
+ pr_info("\tExperimental no-CBs for all CPUs\n");
+ cpumask_setall(rcu_nocb_mask);
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
...


Comparing standing out points (see charts):

#81e5949 - good
#911af50 - bad

I was able to see that, from the script above, the following lines
cause a major impact on netns scalability/performance:

1) ip netns add -> huge performance regression:

 1 cpu: no regression
 4 cpu: regression for NOCB_CPU_ALL

 obs: regression from 250 netns/sec to 50 netns/sec at the 500-netns-
already-created mark

2) ip netns exec -> some performance regression

 1 cpu: no regression
 4 cpu: regression for NOCB_CPU_ALL

 obs: regression from 40 netns/sec (+1 exec per netns creation) to 20
netns/sec at the 500-netns-created mark



FULL NOTE: http://people.canonical.com/~inaddy/lp1328088/

** Assumption: RCU callbacks being offloaded to multiple cpus
(cpumask_setall) caused the regression in
copy_net_ns <- create_new_namespaces, or in unshare(CLONE_NEWNET).

** Next Steps: I'll probably begin to ftrace (function_graph) the netns
creation/execution path.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
I'm aware of that, but I just wrote an automated testing tool for
this, and "make nconfig" fixed my non-existent CONFIG_* options for
kernels before that specific commit.

> If you want to see the real CONFIG_RCU_NOCB_CPU_ALL effect before that
> commit, you need to use the rcu_nocbs= boot parameter.
>

Absolutely, I'll try with 2 or 3 commits before #911af50 just in case.

> Again, what you are seeing is the effect of callback offloading on
> a workload not particularly suited for it.  That said, I don't understand
> why you are seeing any particular effect when offloading is completely
> disabled unless your workload is sensitive to grace-period latency.
>

Wanted to make sure the results were correct. Starting to investigate
the netns functions (copied some of the netns developers here also).
Totally agree, and this confirms my hypothesis.

>
> Some questions:
>
> o   Why does the throughput drop all the way to zero at various points?

Explained earlier. Check if it is 0.00 or 0.xx; 0.00 can mean an unbootable kernel.

>
> o   What exactly is this benchmark doing?

Explained earlier. Simulating cloud infrastructure migrating netns on failure.

>
> o   Is this benchmark sensitive to grace-period latency?
> (You can check this by changing the value of HZ, give or take.)

Will do that.

> o   How many runs were taken at each point?  If more than one, what
> was the variability?

For all commits, only one. For the pointed-out commits, more than one.
Results tend to be the same with minimal variation. Trying to balance
efforts on digging into the problem versus getting more results.

If you think, after my next answers (changing HZ, FAST_NOHZ) that
remeasuring everything is a must, let me know then I'll work on
deviation for you.

>
> o   Routers per second means what?

Explained earlier.

>
> o   How did you account for the effects of other non-RCU commits?
> Did you rebase the RCU commits on top of an older release without
> the other commits or something similar?

I used the Linus git tree, checking out specific commits and compiling the
kernel. I've only used commits that changed RCU because of the bisect
result. Besides these commits I have only generated kernel for main
release tags.

In my point of view, if this is related to RCU, several things have to
be discussed: Is using NOCB_CPU_ALL for a general-purpose kernel a
good option? Is the netns code too dependent on low grace-period
latency to scale? Is there a way of minimizing this?

> Thanx, Paul

No Paul, I have to thank you. Really appreciate your time.

Rafael (tinoco@canonical/~inaddy)


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
Eric,

I'll test the patch with the same testcase and let you all know.

Really appreciate everybody's efforts.

On Wed, Jun 11, 2014 at 5:55 PM, Eric W. Biederman
 wrote:
> "Paul E. McKenney"  writes:
>
>> On Wed, Jun 11, 2014 at 01:27:07PM -0500, Dave Chiluk wrote:
>>> On 06/11/2014 11:18 AM, Paul E. McKenney wrote:
>>> > On Wed, Jun 11, 2014 at 10:46:00AM -0500, David Chiluk wrote:
>>> >> Now think about what happens when a gateway goes down, the namespaces
>>> >> need to be migrated, or a new machine needs to be brought up to replace
>>> >> it.  When we're talking about 3000 namespaces, the amount of time it
>>> >> takes simply to recreate the namespaces becomes very significant.
>>> >>
>>> >> The script is a stripped down example of what exactly is being done on
>>> >> the neutron gateway in order to create namespaces.
>>> >
>>> > Are the namespaces torn down and recreated one at a time, or is there some
>>> > syscall, ioctl(), or whatever that allows bulk tear down and recreating?
>>> >
>>> >Thanx, Paul
>>>
>>> In the normal running case, the namespaces are created one at a time, as
>>> new customers create a new set of VMs on the cloud.
>>>
>>> However, in the case of failover to a new neutron gateway the namespaces
>>> are created all at once using the ip command (more or less serially).
>>>
>>> As far as I know there is no syscall or ioctl that allows bulk tear down
>>> and recreation.  if such a beast exists that might be helpful.
>>
>> The solution might be to create such a beast.  I might be able to shave
>> a bit of time off of this benchmark, but at the cost of significant
>> increases in RCU's CPU consumption.  A bulk teardown/recreation API could
>> reduce the RCU grace-period overhead by several orders of magnitude by
>> having a single RCU grace period cover a few thousand changes.
>>
>> This is why other bulk-change syscalls exist.
>>
>> Just out of curiosity, what syscalls does the ip command use?
>
> You can look in iproute2 ip/ipnetns.c
>
> But rought ip netns add does:
>
> unshare(CLONE_NEWNET);
> mkdir /var/run/netns/
> mount --bind /proc/self/ns/net /var/run/netns/
>
> I don't know if there is any sensible way to batch that work.
>
> (The unshare gets you into copy_net_ns in net/core/net_namespace.c
>  and to find all of the code it can call you have to trace all
>  of the register_pernet_subsys and register_pernet_device calls).
>
> At least for creation I would like to see if we can make all of the
> rcu_callback synchronize_rcu calls go away.  That seems preferable
> to batching at creation time.
>
> Eric



-- 
-- 
Rafael David Tinoco
Software Sustaining Engineer @ Canonical
Canonical Technical Services Engineering Team
# Email: rafael.tin...@canonical.com (GPG: 87683FC0)
# Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo)
# LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
I'm getting a kernel panic with your patch:

-- panic
-- mount_block_root
-- mount_root
-- prepare_namespace
-- kernel_init_freeable

It is giving me an unknown block device error for the same config file
I used on other builds. Since my test is running on a KVM guest under a
ramdisk, I'm still checking if there are any differences between this
build and the other ones, but I think there aren't.

Any chance that "prepare_namespace" might be breaking mount_root?

Tks

On Wed, Jun 11, 2014 at 9:14 PM, Eric W. Biederman
 wrote:
> "Paul E. McKenney"  writes:
>
>> On Wed, Jun 11, 2014 at 04:12:15PM -0700, Eric W. Biederman wrote:
>>> "Paul E. McKenney"  writes:
>>>
>>> > On Wed, Jun 11, 2014 at 01:46:08PM -0700, Eric W. Biederman wrote:
>>> >> On the chance it is dropping the old nsproxy which calls syncrhonize_rcu
>>> >> in switch_task_namespaces that is causing you problems I have attached
>>> >> a patch that changes from rcu_read_lock to task_lock for code that
>>> >> calls task_nsproxy from a different task.  The code should be safe
>>> >> and it should be an unquestions performance improvement but I have only
>>> >> compile tested it.
>>> >>
>>> >> If you can try the patch it will tell is if the problem is the rcu
>>> >> access in switch_task_namespaces (the only one I am aware of network
>>> >> namespace creation) or if the problem rcu case is somewhere else.
>>> >>
>>> >> If nothing else knowing which rcu accesses are causing the slow down
>>> >> seem important at the end of the day.
>>> >>
>>> >> Eric
>>> >>
>>> >
>>> > If this is the culprit, another approach would be to use workqueues from
>>> > RCU callbacks.  The following (untested, probably does not even build)
>>> > patch illustrates one such approach.
>>>
>>> For reference the only reason we are using rcu_lock today for nsproxy is
>>> an old lock ordering problem that does not exist anymore.
>>>
>>> I can say that in some workloads setns is a bit heavy today because of
>>> the synchronize_rcu, and setns is more important than I had previously
>>> thought, because pthreads break the classic unix ability to do things in
>>> your process after fork() (sigh).
>>>
>>> Today daemonize is gone, and notifying the parent process with a signal
>>> relies on task_active_pid_ns, which does not use nsproxy.  So the old
>>> lock ordering problem/race is gone.
>>>
>>> The description of what was happening when the code switched from
>>> task_lock to rcu_read_lock to protect nsproxy therefore no longer applies.
>>
>> OK, never mind, then!  ;-)
>
> I appreciate you posting your approach.  I just figured I should do
> my homework, and verify my fuzzy memory.
>
> Who knows there might be different performance problems with my
> approach.  But I am hoping this is one of those happy instances where we
> can just make everything simpler.
>
> Eric



-- 
-- 
Rafael David Tinoco
Software Sustaining Engineer @ Canonical
Canonical Technical Services Engineering Team
# Email: rafael.tin...@canonical.com (GPG: 87683FC0)
# Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo)
# LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-11 Thread Rafael Tinoco
Ok, some misconfiguration here probably, never mind. I'll finish the
tests tomorrow, compare with existent ones and let you know asap. Tks.

On Wed, Jun 11, 2014 at 10:09 PM, Eric W. Biederman
 wrote:
> Rafael Tinoco  writes:
>
>> I'm getting a kernel panic with your patch:
>>
>> -- panic
>> -- mount_block_root
>> -- mount_root
>> -- prepare_namespace
>> -- kernel_init_freeable
>>
>> It is giving me an unknown block device error for the same config file I
>> used on other builds. Since my test runs on a KVM guest with a ramdisk,
>> I'm still checking whether there are any differences between this build
>> and the other ones, but I don't think there are.
>>
>> Any chance that "prepare_namespace" might be breaking mount_root?
>
> My patch boots for me
>
> Eric



-- 
-- 
Rafael David Tinoco
Software Sustaining Engineer @ Canonical
Canonical Technical Services Engineering Team
# Email: rafael.tin...@canonical.com (GPG: 87683FC0)
# Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo)
# LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-13 Thread Rafael Tinoco
Okay,

Tests with the same script were done. I'm comparing the following versions:

1) master + suggested patch
2) 3.15.0-rc5 (last rcu commit in my clone)
3) 3.9-rc2 (last bisect good)

          master + sug patch       3.15.0-rc5 (last rcu)   3.9-rc2 (bisect good)
mark     no      none    all       no     none    all       no

# (netns add) / sec

 250   125.00  250.00  250.00   20.83  22.73  50.00    83.33
 500   250.00  250.00  250.00   22.73  22.73  50.00   125.00
 750   250.00  125.00  125.00   20.83  22.73  62.50   125.00
1000   125.00  250.00  125.00   20.83  20.83  50.00   250.00
1250   125.00  125.00  250.00   22.73  22.73  50.00   125.00
1500   125.00  125.00  125.00   22.73  22.73  41.67   125.00
1750   125.00  125.00   83.33   22.73  22.73  50.00    83.33
2000   125.00   83.33  125.00   22.73  25.00  50.00   125.00

-> From 3.15 to patched tree, netns add performance was ***
restored/improved *** OK

# (netns add + 1 x exec) / sec

 250    11.90  14.71  31.25    5.00   6.76  15.63   62.50
 500    11.90  13.89  31.25    5.10   7.14  15.63   41.67
 750    11.90  13.89  27.78    5.10   7.14  15.63   50.00
1000    11.90  13.16  25.00    4.90   6.41  15.63   35.71
1250    11.90  13.89  25.00    4.90   6.58  15.63   27.78
1500    11.36  13.16  25.00    4.72   6.25  15.63   25.00
1750    11.90  12.50  22.73    4.63   5.56  14.71   20.83
2000    11.36  12.50  22.73    4.55   5.43  13.89   17.86

-> From 3.15 to patched tree, performance improves ~100% but is still
only ~50% of 3.9-rc2

# (netns add + 2 x exec) / sec

 250    6.58   8.62  16.67    2.81   3.97   9.26   41.67
 500    6.58   8.33  15.63    2.78   4.10   9.62   31.25
 750    5.95   7.81  15.63    2.69   3.85   8.93   25.00
1000    5.95   7.35  13.89    2.60   3.73   8.93   20.83
1250    5.81   7.35  13.89    2.55   3.52   8.62   16.67
1500    5.81   7.35  13.16    0.00   3.47   8.62   13.89
1750    5.43   6.76  13.16    0.00   3.47   8.62   11.36
2000    5.32   6.58  12.50    0.00   3.38   8.33    9.26

-> Same as before.

# netns add + 2 x exec + 1 x ip link to netns

 250    7.14   8.33  14.71    2.87   3.97   8.62   35.71
 500    6.94   8.33  13.89    2.91   3.91   8.93   25.00
 750    6.10   7.58  13.89    2.75   3.79   8.06   19.23
1000    5.56   6.94  12.50    2.69   3.85   8.06   14.71
1250    5.68   6.58  11.90    2.58   3.57   7.81   11.36
1500    5.56   6.58  10.87    0.00   3.73   7.58   10.00
1750    5.43   6.41  10.42    0.00   3.57   7.14    8.62
2000    5.21   6.25  10.00    0.00   3.33   7.14    6.94

-> Adding an ip link to the netns did not change the performance proportions much.

# netns add + 2 x exec + 2 x ip link to netns

 250    7.35   8.62  13.89    2.94   4.03   8.33   31.25
 500    7.14   8.06  12.50    2.94   4.03   8.06   20.83
 750    6.41   7.58  11.90    2.81   3.85   7.81   15.63
1000    5.95   7.14  10.87    2.69   3.79   7.35   12.50
1250    5.81   6.76  10.00    2.66   3.62   7.14   10.00
1500    5.68   6.41   9.62      -    3.73   6.76    8.06
1750    5.32   6.25   8.93      -    3.68   6.58    7.35
2000    5.43   6.10   8.33      -    3.42   6.10    6.41

-> Same as before.

OBS:

1) It seems that performance improved for network namespace addition,
but there is probably still room for improvement in netns execution.
That way we might achieve the same performance that 3.9-rc2 (good
bisect) had.

2) These tests were made with 4 cpus only.

3) The initial charts showed that the 1-cpu case with all cpus as no-cb
(without this patch) achieved something like 50% of the bisect-good
rate. The 4-cpu (nocb all) case achieved 26% of bisect good (as shown
above in the last table: 8.33 vs 31.25).

4) With the patch, using 4 cpus and nocb all, we now get 44% of the
bisect-good performance (against the 26% we had).

5) NOCB_* is still an issue. It is clear that only the NOCB_CPU_ALL
option gives us something near the last good commit's performance.
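As a sanity check, the percentages in items (3) and (4) can be recomputed from the 250-router row of the last table; the column assignments below are my reading of that table (3.9-rc2 bisect good, 3.15.0-rc5 with NOCB_CPU_ALL, master + patch with NOCB_CPU_ALL):

```python
# Recompute the "percent of bisect good" figures from the last table's
# 250 mark: 3.9-rc2 (bisect good) = 31.25 routers/sec,
# 3.15.0-rc5 nocb-all = 8.33, master + patch nocb-all = 13.89.
def pct_of_good(rate, good=31.25):
    # Truncate, matching the 26%/44% figures quoted in the observations.
    return int(100 * rate / good)

print(pct_of_good(8.33))   # 3.15.0-rc5 nocb-all -> 26
print(pct_of_good(13.89))  # master + patch nocb-all -> 44
```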

Thank you

Rafael


Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-16 Thread Rafael Tinoco
...

On Fri, Jun 13, 2014 at 9:02 PM, Eric W. Biederman
 wrote:
> Rafael Tinoco  writes:
>
>> Okay,
>>
>> Tests with the same script were done.
>> I'm comparing : master + patch vs 3.15.0-rc5 (last sync'ed rcu commit)
>> and 3.9 last bisect good.
>>
>> Same tests were made. I'm comparing the following versions:
>>
>> 1) master + suggested patch
>> 2) 3.15.0-rc5 (last rcu commit in my clone)
>> 3) 3.9-rc2 (last bisect good)
>
> I am having a hard time making sense of your numbers.
>
> If I have read your email correctly my suggested patch caused:
> "ip netns add" numbers to improve
> 1x "ip netns exec" to improve some
> 2x "ip netns exec" to show no improvement
> "ip link add" to show no effect (after the 2x ip netns exec)

 - "netns add" are as good as they were before this regression.
 - "netns exec" are improved but still 50% of the last good bisect commit.
 - "link add" didn't show difference.

> This is interesting in a lot of ways.
> - This seems to confirm that the only rcu usage in ip netns add
>   was switch_task_namespaces.  Which is convenient as that rules
>   out most of the network stack when looking for performance oddities.
>
> - "ip netns exec" had an expected performance improvement
> - "ip netns exec" is still slow (so something odd is still going on)
> - "ip link add" appears immaterial to the performance problem.
>
> It would be interesting to switch the "ip link add" and "ip netns exec"
> in your test case to confirm that there is nothing interesting/slow
> going on in "ip link add"

 - will do that.

>
> Which leaves me with the question of what in "ip netns exec" remains
> that is using rcu and is slowing all of this down.

 - will check this also.

> Eric

Tks

Rafael


Re: LTP CVE cve-2017-17053 test failed on x86_64 device

2018-06-20 Thread Rafael Tinoco
I believe the error message on boot is solved by LKML thread:

[PATCH] locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS

Looks like that is what is tainting the kernel.

On 20 June 2018 at 08:11, Naresh Kamboju  wrote:
> On 20 June 2018 at 12:51, Michael Moese  wrote:
>> Hi,
>>
>> On Wed, Jun 20, 2018 at 12:14:22PM +0530, Naresh Kamboju wrote:
>>> Test FAIL case output,
>>> tst_test.c:1015: INFO: Timeout per run is 0h 15m 00s
>>> tst_taint.c:88: BROK: Kernel is already tainted: 512
>> The kernel is already tainted. In this case, the test refuses to run,
>> because it cannot tell whether the result would be a pass or a fail.
>>
>> Could you please check if you could run the test directly after a
>> reboot?
>
> This single test ran immediately after the boot and bug reproduced.
>
> tst_taint.c:88: BROK: Kernel is already tainted: 512
> https://lkft.validation.linaro.org/scheduler/job/293222#L1204
>
> Test command for 10 iterations and it failed for all 10 iterations.
> + ./runltp -s cve-2017-17053 -I 10
>
> NOTE:
> We still see kernel warning while booting the x86_64 machine.
> DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))
>
> - Naresh
>
>>
>> Regards,
>> Michael
>> --
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
>> 21284 (AG Nürnberg)
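For context, the value in "Kernel is already tainted: 512" is a bitmask; a quick sketch of decoding it (bit numbering per the kernel's taint flags, where bit 9 is TAINT_WARN, set whenever the kernel hits a WARN_ON() such as the DEBUG_LOCKS_WARN_ON above):

```python
def taint_bits(value):
    """Return the set bit positions of a kernel taint value
    (as read from /proc/sys/kernel/tainted)."""
    return [b for b in range(64) if value & (1 << b)]

# 512 == 1 << 9: only TAINT_WARN ('W') is set, which matches a kernel
# that has hit a WARN_ON() during boot.
print(taint_bits(512))  # [9]
```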


Re: [PATCH] selftests: gpio: gpio-mockup-chardev GPIOHANDLE_REQUEST_OUTPUT fix

2018-06-27 Thread Rafael Tinoco
Linus, Bartosz,

This was discovered during our investigations of a functional tests
regression/error:

https://bugs.linaro.org/show_bug.cgi?id=3769

Which turned out to be related to missing CONFIG_ARM{64}_MODULE_PLTS
config in our builds.

However, during the investigation, we realized the functional test had
the issue best described in this comment:

https://bugs.linaro.org/show_bug.cgi?id=3769#c3

related to the errno variable being checked outside the error scope.
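The errno-scope pitfall referenced here is the classic one: errno is only meaningful immediately after a failing call, before any other library call can overwrite it. The selftest itself is C; the sketch below just illustrates the pitfall from Python via ctypes, assuming a Linux/glibc environment:

```python
import ctypes
import errno

libc = ctypes.CDLL(None, use_errno=True)

fd = libc.open(b"/nonexistent/path", 0)  # fails with -1, sets errno
assert fd == -1
saved = ctypes.get_errno()               # capture inside the error scope

libc.printf(b"some logging in between\n")  # may clobber the C errno

# Checking the saved copy is reliable; re-reading errno here would give
# a possibly-stale value -- the bug pattern described in the comment.
print(errno.errorcode[saved])  # ENOENT
```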

Thank you
Rafael

On Thu, 14 Jun 2018 at 11:42, Linus Walleij  wrote:
>
> On Wed, Jun 6, 2018 at 7:44 PM, Rafael David Tinoco
>  wrote:
>
> > Following logic from commit: 22f6592b23, GPIOHANDLE_REQUEST_OUTPUT
> > should handle errors same way as GPIOHANDLE_REQUEST_INPUT does, or else
> > the following error occurs:
> >
> > gpio-mockup-chardev: gpio line<0> test flag<0x2> value<0>: No
> > such file or directory
> >
> > despite the real result of gpio_pin_test(), gpio_debugfs_get() and
> > gpiotools_request_linehandle() functions.
> >
> > Signed-off-by: Rafael David Tinoco 
>
> Bartosz, does this look OK to you?
>
> Yours,
> Linus Walleij


Re: [PATCH 4.4 00/92] 4.4.133-stable review

2018-05-24 Thread Rafael Tinoco
> > kernel: 4.4.133-rc1
> > git repo: 
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > git branch: linux-4.4.y
> > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
> > git describe: v4.4.132-93-g915a3d7cdea9
> > Test details: 
> > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9
> >
> >
> > No regressions (compared to build v4.4.132-71-g180635995c36)
>
> It should have gotten better, as there was a fix in here for at least 2
> LTP tests that we previously were not passing.  I don't know why you all
> were not reporting that in the past, it took someone else randomly
> deciding to run LTP to report it to me...
>
> Why did an improvement in results not show up?

Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
(commit 49eea433b326 in mainline) so this "fix" is changing the
timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
ARM64. Needs review though :\


Re: [PATCH 4.4 00/92] 4.4.133-stable review

2018-05-24 Thread Rafael Tinoco
Thank you Daniel! Will investigate those.

Meanwhile, Greg, I referred to:

time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

We're not using this type of clock in arm64's 4.4 kernel vDSO
functions. The commit's description notes that it is responsible for
fixing flaky kselftest tests, but we wouldn't get that on 4.4,
according to the backport status below:

stable-rc-linux-4.14.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.16.y
dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.4.y


stable-rc-linux-4.9.y
99f66b5182a4 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

Yet, the second fix was backported to all (including 4.4.y):

stable-rc-linux-4.14.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.16.y
3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.4.y
7c8bd6e07430 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.9.y
a53bfdda06ac time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

Not sure you want to keep it in 4.4, thought it was worth mentioning it.
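For reference, the clock in question can be exercised from userspace as below; whether the read is serviced by the vDSO fast path or falls back to the clock_gettime() syscall depends on the kernel/arch support discussed above (missing on arm64 in 4.4.y). A Linux environment is assumed:

```python
import time

# CLOCK_MONOTONIC_RAW is the NTP-unadjusted monotonic clock whose arm64
# vDSO fast path is the subject of the backport lists above.
raw = time.clock_gettime(time.CLOCK_MONOTONIC_RAW)
mono = time.clock_gettime(time.CLOCK_MONOTONIC)

print(raw > 0 and mono > 0)  # True
```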

Cheers.

On 24 May 2018 at 22:34, Daniel Sangorrin
 wrote:
> Hello Rafael,
>
> The tests fcntl35 and fcntl35_64 should have go from FAIL to PASS.
> https://www.spinics.net/lists/stable/msg239475.html
>
> Looking at
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a3d7cdea9/testrun/228569/suite/ltp-syscalls-tests/tests/
> I see that these two tests (and other important tests as well) are being 
> SKIPPED.
>
> By the way, I see that select04 FAILS in your case. But in my setup, select04 
> was working fine (x86_64) in 4.4.132. I will confirm that it still works in 
> 4.4.133
>
> Thanks,
> Daniel Sangorrin
>
>> -Original Message-
>> From: stable-ow...@vger.kernel.org [mailto:stable-ow...@vger.kernel.org] On
>> Behalf Of Rafael Tinoco
>> Sent: Friday, May 25, 2018 5:32 AM
>> To: Greg Kroah-Hartman 
>> Cc: linux-kernel@vger.kernel.org; sh...@kernel.org; patc...@kernelci.org;
>> lkft-tri...@lists.linaro.org; ben.hutchi...@codethink.co.uk;
>> sta...@vger.kernel.org; a...@linux-foundation.org;
>> torva...@linux-foundation.org; li...@roeck-us.net
>> Subject: Re: [PATCH 4.4 00/92] 4.4.133-stable review
>>
>> > > kernel: 4.4.133-rc1
>> > > git repo:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> > > git branch: linux-4.4.y
>> > > git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
>> > > git describe: v4.4.132-93-g915a3d7cdea9
>> > > Test details:
>> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-g915a
>> 3d7cdea9
>> > >
>> > >
>> > > No regressions (compared to build v4.4.132-71-g180635995c36)
>> >
>> > It should have gotten better, as there was a fix in here for at least 2
>> > LTP tests that we previously were not passing.  I don't know why you all
>> > were not reporting that in the past, it took someone else randomly
>> > deciding to run LTP to report it to me...
>> >
>> > Why did an improvement in results not show up?
>>
>> Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO ?
>> I think that CLOCK_MONOTONIC_RAW in VDSO wasn't backported to 4.4.y
>> (commit 49eea433b326 in mainline) so this "fix" is changing the
>> timekeeping sauce (that would fix MONOTONIC RAW) but not for 4.4.y in
>> ARM64. Needs review though :\
>
>
>


Re: [PATCH 4.9 00/71] 4.9.134-stable review

2018-10-17 Thread Rafael Tinoco
On Tue, Oct 16, 2018 at 2:23 PM Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 4.9.134 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.9.134-rc2
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 9e48abe2679cbb419f7472c31d11c06711b5ebc7
git describe: v4.9.133-71-g9e48abe2679c
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.133-71-g9e48abe2679c


No regressions (compared to build v4.9.133-72-gda849e5647be)


No fixes (compared to build v4.9.133-72-gda849e5647be)


Ran 20932 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest
* libhugetlbfs
* ltp-fs-tests
* ltp-fsx-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4.16 000/272] 4.16.13-stable review

2018-05-29 Thread Rafael Tinoco
The following bug has been opened for LTP:

https://github.com/linux-test-project/ltp/issues/319

for CVE-2017-5669's wrong assumptions (based on Davidlohr's work).

I'll change the test to cover both scenarios and expect the right results from 
them.

> On 29 May 2018, at 04:08, Greg Kroah-Hartman  
> wrote:
> 
> On Tue, May 29, 2018 at 10:55:34AM +0530, Naresh Kamboju wrote:
>> On 28 May 2018 at 15:30, Greg Kroah-Hartman  
>> wrote:
>>> This is the start of the stable review cycle for the 4.16.13 release.
>>> There are 272 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>> 
>>> Responses should be made by Wed May 30 10:01:02 UTC 2018.
>>> Anything received after that time might be too late.
>>> 
>>> The whole patch series can be found in one patch at:
>>>
>>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.13-rc1.gz
>>> or in the git tree and branch at:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
>>> linux-4.16.y
>>> and the diffstat can be found below.
>>> 
>>> thanks,
>>> 
>>> greg k-h
>> 
>> Results from Linaro’s test farm.
>> No regressions on arm64, arm and x86_64.
>> 
>> NOTE:
>> The failed LTP test case "cve-2017-5669" is a waiver here.
> 
> Thanks for figuring that one out :)
> 
> Also, thanks for testing all of these and letting me know.



nfs: possible sync issue between nfs_call_unlink <-> nfs_async_unlink_release

2018-07-03 Thread Rafael Tinoco
BUG: https://bugs.linaro.org/show_bug.cgi?id=3731

During Linaro's Kernel Functional tests, we have observed the
following situation:

[   52.651490] DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)1UL))
[   52.651506] WARNING: CPU: 2 PID: 1457 at
./kernel/locking/rwsem.c:217 up_read_non_owner+0x5d/0x70
[   52.674398] Modules linked in: x86_pkg_temp_thermal fuse
[   52.679719] CPU: 2 PID: 1457 Comm: kworker/2:2 Not tainted 4.16.0 #1
[   52.687448] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   52.694922] Workqueue: nfsiod rpc_async_release
[   52.699454] RIP: 0010:up_read_non_owner+0x5d/0x70
[   52.704157] RSP: 0018:9cbf81a23dd0 EFLAGS: 00010282
[   52.709376] RAX:  RBX: 8dc1983c76c0 RCX: 
[   52.716500] RDX: bd2d26c9 RSI: 0001 RDI: bd2d2889
[   52.723652] RBP: 9cbf81a23dd8 R08:  R09: 
[   52.730782] R10: 9cbf81a23dd0 R11:  R12: 8dc19abf8600
[   52.737906] R13: 8dc19b6c R14:  R15: 8dc19bacad80
[   52.745029] FS:  () GS:8dc1afd0()
knlGS:
[   52.753108] CS:  0010 DS:  ES:  CR0: 80050033
[   52.758845] CR2: 7f33794665d8 CR3: 00016c41e006 CR4: 003606e0
[   52.765968] DR0:  DR1:  DR2: 
[   52.773091] DR3:  DR6: fffe0ff0 DR7: 0400
[   52.780215] Call Trace:
[   52.782695]  nfs_async_unlink_release+0x32/0x80
[   52.787220]  rpc_free_task+0x30/0x50
[   52.790789]  rpc_async_release+0x12/0x20
[   52.794707]  process_one_work+0x25e/0x660
[   52.798713]  worker_thread+0x4b/0x410
[   52.802377]  kthread+0x10d/0x140
[   52.805600]  ? rescuer_thread+0x3a0/0x3a0
[   52.809652]  ? kthread_create_worker_on_cpu+0x70/0x70
[   52.814702]  ? do_syscall_64+0x69/0x1b0
[   52.818540]  ret_from_fork+0x3a/0x50

TEST RUNS:

https://lkft.validation.linaro.org/scheduler/job/167146#L2361
https://lkft.validation.linaro.org/scheduler/job/177145#L883

COMMENTS:

This started happening after commit 5149cbac4235 (locking/rwsem: Add
DEBUG_RWSEMS to look for lock/unlock mismatches) introduced checks for
the semaphores.

After some investigation (in bug #3731), the feedback and assumptions are:

Function "nfs_rmdir()" acquires the write semaphore in order to
guarantee that no unlink operations will happen to inodes inside the
directory to be unlinked, making it safe to unlink a directory on NFS.
The parent's dentry (read) semaphore serve as a synchronization
(mutual exclusive) for rmdir (read/write), to guarantee there are no
files being unlinked inside the directory when this is being unlinked.

The logic "nfs_call_unlink() -> nfs_do_call_unlink()" acquires the
inode's (that was unlinked) parent dentry->rmdir_sem (read) until the
inode's nfs_unlinkdata (needed for silly unlink) is finally
asynchronously (rpc call) cleaned by nfs_async_unlink_release() call,
the same place where inode's parent dentry->rmdir_sem is released, as
originally described in commit: 884be175351e. Purpose of this is to
guarantee that the parent directory won't be unlinked while the inode
is being async unlinked.

Unfortunately, it seems that there is some condition that either made
the read semaphore (from the inode's parent dentry) be released (by the
same async logic?) before nfs_async_unlink_release() ran through the
RPC call, OR the inode was moved (rmdir_sem doesn't seem to be acquired
in other situations) to a different directory (caching a new dentry
without the read semaphore acquired), causing the DEBUG_LOCKS_WARN_ON
to catch this.

Thank you for reviewing this.
-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4.9 00/31] 4.9.108-stable review

2018-06-13 Thread Rafael Tinoco
On 12 June 2018 at 13:46, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.9.108 release.
> There are 31 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 14 16:46:09 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.108-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

NOTE:
There is an intermittent failure in the LTP fs test read_all_sys,
specific to the arm64 HiKey board, which we will investigate further.
https://bugs.linaro.org/show_bug.cgi?id=3903

Summary


kernel: 4.9.108-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 9b3f06c8225324c48370ed02288023578494f050
git describe: v4.9.107-32-g9b3f06c82253
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.107-32-g9b

No Regressions (compared to build v4.9.107)


Ran 11373 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

--
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-13 Thread Rafael Tinoco
Results from Linaro’s test farm.
Regressions detected.

NOTE:

1) LTP vma03 test (cve-2011-2496) broken on v4.4-137-rc1 because of:

 6ea1dc96a03a mmap: relax file size limit for regular files
 bd2f9ce5bacb mmap: introduce sane default mmap limits

   discussion:

 https://github.com/linux-test-project/ltp/issues/341

   mainline commit (v4.13-rc7):

 0cc3b0ec23ce Clarify (and fix) MAX_LFS_FILESIZE macros

   should be backported to 4.4.138-rc2 and fixes the issue.

2) select04 failure on x15 board will be investigated in:

 https://bugs.linaro.org/show_bug.cgi?id=3852

   and seems to be a timing issue (HW related).

Summary


kernel: 4.4.137-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 678437d36d4e14a029309f1c282802ce47fda36a
git describe: v4.4.136-25-g678437d36d4e
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.136-25-g678437d36d4e

Regressions (compared to build v4.4.136)


qemu_arm:
  ltp-cve-tests:
* cve-2011-2496
* runltp_cve

* test src: git://github.com/linux-test-project/ltp.git

x15 - arm:
  ltp-cve-tests:
* cve-2011-2496
* runltp_cve

* test src: git://github.com/linux-test-project/ltp.git
  ltp-syscalls-tests:
* runltp_syscalls
* select04

* test src: git://github.com/linux-test-project/ltp.git

Ran 7100 total tests in the following environments and test suites.

Environments
--
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

Summary


kernel: 4.4.137-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.137-rc1-hikey-20180612-214
git commit: e5d5cb57472f9f98a68f872664de3d70610019e1
git describe: 4.4.137-rc1-hikey-20180612-214
Test details: 
https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.137-rc1-hikey-20180612-214

No regressions (compared to build 4.4.136-rc2-hikey-20180606-212)

Ran 2611 total tests in the following environments and test suites.

Environments
--
- hi6220-hikey - arm64
- qemu_arm64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-fs-tests

--
Linaro LKFT
https://lkft.linaro.org

On 12 June 2018 at 13:51, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.4.137 release.
> There are 24 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Jun 14 16:48:07 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.137-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Re: [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-13 Thread Rafael Tinoco
On 13 June 2018 at 18:08, Rafael David Tinoco
 wrote:
> On Wed, Jun 13, 2018 at 6:00 PM, Greg Kroah-Hartman
>  wrote:
>> On Wed, Jun 13, 2018 at 05:47:49PM -0300, Rafael Tinoco wrote:
>>> Results from Linaro’s test farm.
>>> Regressions detected.
>>>
>>> NOTE:
>>>
>>> 1) LTP vma03 test (cve-2011-2496) broken on v4.4-137-rc1 because of:
>>>
>>>  6ea1dc96a03a mmap: relax file size limit for regular files
>>>  bd2f9ce5bacb mmap: introduce sane default mmap limits
>>>
>>>discussion:
>>>
>>>  https://github.com/linux-test-project/ltp/issues/341
>>>
>>>mainline commit (v4.13-rc7):
>>>
>>>  0cc3b0ec23ce Clarify (and fix) MAX_LFS_FILESIZE macros
>>>
>>>should be backported to 4.4.138-rc2 and fixes the issue.
>>
>> Really?  That commit says it fixes c2a9737f45e2 ("vfs,mm: fix a dead
>> loop in truncate_inode_pages_range()") which is not in 4.4.y at all.
>>
>> Did you test this out?
>
> Yes, the LTP issue contains the tests (the last comment is the final
> test for arm32, right before Jan tests i686).
>
> Fixing MAX_LFS_FILESIZE fixes the new mmap() limit brought in by those
> 2 commits (file_mmap_size_max()).
> The offset tested by the LTP test is 0xfffe000.
> file_mmap_size_max() gives 0x000 as the max value, but only after
> the mentioned patch.
>
> The original intent of this fix was different, though.

To clarify this a bit further.

The LTP CVE test is breaking in the first call to mmap(), even before
trying to remap and test the security issue. That started happening in
this round because of those mmap() changes and the offset used in the
LTP test. Linus changed the limit checks and made them relative to
MAX_LFS_FILESIZE. Unfortunately, 4.4 stable was missing the fix for
MAX_LFS_FILESIZE (which, before commit 0cc3b0ec23ce, was less than the
real 32-bit limit).

Commit 0cc3b0ec23ce was made because a user noticed the FS limit was
not what it should be. In our case, the 4.4 stable kernel, we are
hitting this lower-than-real 32-bit limit because of the LTP CVE test,
so we need that fix to set the real 32-bit limit for the macro (mmap
limits did not use that macro before).
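
The arithmetic above can be sketched roughly as follows. This is a
hedged Python illustration of the 32-bit macro values from mainline
commit 0cc3b0ec23ce; the mmap offset used here is a hypothetical
page-aligned value near the top of the 32-bit range, not the exact
offset used by the LTP vma03 test.

```python
# Sketch of the 32-bit MAX_LFS_FILESIZE arithmetic, mirroring commit
# 0cc3b0ec23ce. All values assume a 32-bit kernel (e.g. arm32, i686).

PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
BITS_PER_LONG = 32
ULONG_MAX = (1 << BITS_PER_LONG) - 1

# Before the fix: one bit short of the real 32-bit page-index limit.
MAX_LFS_FILESIZE_OLD = (PAGE_SIZE << (BITS_PER_LONG - 1)) - 1

# After the fix: the full range addressable with a 32-bit page index.
MAX_LFS_FILESIZE_NEW = ULONG_MAX << PAGE_SHIFT

# Hypothetical page-aligned offset near the top of the 32-bit range:
offset = (ULONG_MAX - 1) << PAGE_SHIFT

# file_mmap_size_max() caps regular-file mappings at MAX_LFS_FILESIZE,
# so the same mmap() is rejected with the old macro value but allowed
# with the fixed one.
print(hex(MAX_LFS_FILESIZE_OLD))       # 0x7ffffffffff
print(hex(MAX_LFS_FILESIZE_NEW))       # 0xffffffff000
print(offset > MAX_LFS_FILESIZE_OLD)   # True  -> mmap() fails on unfixed 4.4
print(offset <= MAX_LFS_FILESIZE_NEW)  # True  -> mmap() succeeds after backport
```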

I have tested this on arm32, and Jan Stancek, who first responded to
the LTP issue, has tested it on i686; both worked after that patch was
included in v4.4-137-rc1 (my last test was even with 4.4.138-rc1).

Hope that helps a bit.


Re: [LTP] [PATCH 4.4 00/24] 4.4.137-stable review

2018-06-14 Thread Rafael Tinoco
Jan, Naresh,

Patch has been queued to 4.4 (for the next review round, yet to be
merged to stable-rc branch):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.4

as "clarify-and-fix-max_lfs_filesize-macros.patch"

Thank you!


Re: [PATCH 4.14 000/173] 4.14.72-stable review

2018-09-25 Thread Rafael Tinoco
Greg,

> > > This is the start of the stable review cycle for the 4.14.72 release.
> > > There are 173 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Wed Sep 26 11:30:10 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > > 
> > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.72-rc1.gz
> > > or in the git tree and branch at:
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > > linux-4.14.y
> > > and the diffstat can be found below.
> >
> > -rc2 is out to resolve some reported problems:
> >   
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.72-rc2.gz
>
> -rc2 looks good. There is a problem on dragonboard during boot that was
> introduced in v4.14.71 that I didn't notice last week. We'll bisect it
> and report back later this week. dragonboard on the other branches (4.9,
> 4.18, mainline) looks fine.

As Dan pointed out, during validation we bisected this issue on a
dragonboard 410c (the board can't find its root device) to the
following commit for v4.14:

[1ed3a9307230] rpmsg: core: add support to power domains for devices

There is an ongoing discussion on "[PATCH] rpmsg: core: add support
to power domains for devices" about this patch having other
dependencies and breaking other things on v4.14 as well.

Do you think we could drop this patch, for now, in a possible -rc3 for
v4.14.72? Dragonboards haven't been testable since v4.14.70 because of
this. Hopefully it isn't too late for this release =).

BTW, I have just tested removing the commit from -rc2 and the board boots okay.
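
The bisect mentioned above boils down to a binary search over the
commit range for the first commit where the board stops booting. A
rough Python sketch of that logic (commit IDs here are illustrative
stand-ins, except the culprit named in this mail):

```python
# Hedged sketch of the `git bisect` idea: binary-search an ordered
# commit list for the first "bad" commit, assuming badness is
# monotonic (everything after the culprit is also bad).

def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo]

# Stand-ins for the v4.14.70..v4.14.71 range; only the culprit id is
# taken from this thread. In the real bisect, is_bad() is "flash the
# kernel and see whether the board finds its root device".
commits = ["aaaa", "bbbb", "1ed3a9307230", "cccc", "dddd"]
bad_from = commits.index("1ed3a9307230")
culprit = first_bad(commits, lambda c: commits.index(c) >= bad_from)
print(culprit)  # 1ed3a9307230
```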

Thank you
-Rafael


Re: [PATCH 4.14 000/173] 4.14.72-stable review

2018-09-26 Thread Rafael Tinoco
> > Do you think we could drop this patch, for now, in a possible -rc3 for
> > v4.14.72 ? Dragonboards aren't being tested, because of this, since
> > v4.14.70. Hopefully it isn't too late for this release =).
>
> I can't "drop" it as it is already in a released kernel, 4.14.71 and
> 4.18.9.  I can revert it though, and will do so for the next round of
> releases after this one.

Yes, bad wording, sorry. That is what I meant and tested. Thank you.