Stable update request: kernel changes to fix PIE with large stack

2017-03-23 Thread Ben Hutchings
On Thu, 2017-03-23 at 17:06 +, James Cowgill wrote:
[...]
> The reason the program and the heap are at these very high addresses is
> that xsltproc is built with PIE and the kernel is treating the
> executable like a mmap and grouping it with all the other libraries. In
> d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") the behavior
> changed and now the program and its heap will be mapped at a lower
> address so the bug does not affect newer kernels. Using "setarch -L" or
> "setarch -R" is another workaround for this bug because that moves the
> program so that there is a much larger gap between the heap and the stack.
> 
> This might affect other applications as well. Effectively it means that
> PIE executables which use lots of stack space might not work properly
> with jessie's kernel. The chance that the bug will be hit seems to vary
> between arches however (depending on what each arch does in
> arch_pick_mmap_layout and arch_randomize_brk) - mips64el seems to be hit
> pretty frequently. In xsltproc's case, PIE was enabled some time ago
> which is why this bug is quite old.
> 
> I believe any of the following will fix this (though not all have been tested):
> - Reduce the stack usage in xsltproc (the upstream bug)
> - Upgrade the relevant buildds to Linux >= 4.1
> - Apply d1fd836dcf00 to jessie's kernel

That's part of a series of 10 commits covering multiple architectures. 
I already picked one of them as a dependency for fixing CVE-2016-3672,
which leaves 9 to do.  I think it is worth doing this in stable to
support chroots and partial upgrades, but I would like to hear the
release team ack/nak this in principle before I start preparing the
change for Debian stable.

Kees Cook quotes the list of commits here:
http://lists.openwall.net/linux-kernel/2015/07/27/964
(I can't find the original message).

Ben.

> - Disable PIE in xsltproc.
> - Run xsltproc inside setarch -L / setarch -R
[...]

-- 
Ben Hutchings
The first rule of tautology club is the first rule of tautology club.




Why doesn't Debian offer a low-latency kernel?

2017-03-23 Thread shirish शिरीष
Dear Friends,

I have been wondering (for quite some time) why Debian has never
offered a low-latency kernel. I know we do provide RT kernels.

The typical use cases given for real-time kernels are recording
studios, live streaming, and gaming servers, where latency is important.

From what I recall from several years ago, a low-latency kernel was
much better for specific applications, though at the cost of
everything else.

Is it because the Debian kernel team does not have enough people, or
because there has not been a need for it?

Sorry for the long-winded question.

-- 
  Regards,
  Shirish Agarwal  शिरीष अग्रवाल
  My quotes in this email licensed under CC 3.0
http://creativecommons.org/licenses/by-nc/3.0/
http://flossexperiences.wordpress.com
EB80 462B 08E1 A0DE A73A  2C2F 9F3D C7A4 E1C4 D2D8



Bug#858125: e1000: ethernet interface hangs occasionally, kernel reports hang

2017-03-23 Thread Bruce Momjian
On Wed, Mar 22, 2017 at 11:41:57PM -0400, Bruce Momjian wrote:
> > OK, I am running this after setting flow control on/default on the
> > switch and Debian, and rebooting:
> > 
> > daemon -- sh -c "while :; do date;ethtool -S eth0| grep flow_control;
> > sleep 1;done > /root/ethtool"
> > 
> > I will report back with the relevant logging lines once it hangs again. 
> 
> OK, I have results of a hang after 24 hours of uptime.  The hangs are
> listed here via dmesg -T:
> 
>   http://momjian.us/expire/eth0/dmesg.txt
> 
> showing the watchdog warning/hang/reset at 23:01 and port hang/reset at
> 23:10.
> 
> I have also produced the ethtool -S output every second for the entire
> 24-hour period, gzipped, at:
> 
>   http://momjian.us/expire/eth0/ethtool.gz
> 
> You will see reception of a large number of rx_flow_control_xoff
> messages about 50 minutes before the hangs, and just before the hangs.

I had four more hangs 14 hours later, so I created new files that also
include the earlier ones:

http://momjian.us/expire/eth0/dmesg2.txt
http://momjian.us/expire/eth0/ethtool2.gz

The last two dmesg lines at 13:29 are from me turning off flow control
on the switch, so they do not indicate a problem.
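
For anyone going through the per-second ethtool log, here is a minimal
Python sketch that reports when a flow control counter jumps. It assumes
the layout produced by the logging loop quoted above (one `date` line
followed by the matching `ethtool -S` counter lines). The function name
and the sample values are made up for illustration; they are not taken
from the real log.

```python
import re

# Matches ethtool -S counter lines such as "     rx_flow_control_xoff: 12".
COUNTER = re.compile(r"^\s*(\w*flow_control\w*):\s*(\d+)\s*$")


def counter_increases(lines, name="rx_flow_control_xoff"):
    """Yield (timestamp, delta) each time the named counter grows.

    Any non-empty line that is not a counter line is treated as the
    timestamp written by `date` for the counter lines that follow it.
    Decreases (for example after an adapter reset) are ignored.
    """
    timestamp = None
    previous = None
    for line in lines:
        match = COUNTER.match(line)
        if match is None:
            if line.strip():
                timestamp = line.strip()
            continue
        if match.group(1) != name:
            continue
        value = int(match.group(2))
        if previous is not None and value > previous:
            yield timestamp, value - previous
        previous = value


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
Thu Mar 23 22:10:01 EDT 2017
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 7
Thu Mar 23 22:10:02 EDT 2017
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 12
Thu Mar 23 22:10:03 EDT 2017
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 12
"""

if __name__ == "__main__":
    for when, delta in counter_increases(SAMPLE.splitlines()):
        print(when, "+%d" % delta)
```

On the gzipped log, the same function can be fed the lines from
gzip.open(path, "rt"), where path is wherever the file was saved.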

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Bug#858405: xmlto: intermittent Segmentation fault when building manpages for libreswan on mips64el

2017-03-23 Thread James Cowgill
reassign 858405 xsltproc
forcemerge 750593 858405
retitle 750593 xsltproc: bus error on some arches with linux < 4.1
thanks

Hi,

On 22/03/17 21:01, Daniel Kahn Gillmor wrote:
> On Wed 2017-03-22 06:22:41 -0400, James Cowgill wrote:
>> On 22/03/17 01:29, Daniel Kahn Gillmor wrote:
>>> For debian revisions of 3.20, failures happened on:
>>>
>>>   mipsel-manda-02
>>>   eberlin
>>> 
>>> Also for revisions of 3.20, successes happened on:
>>>
>>>   mipsel-sil-01
>>>   mipsel-manda-03
>>>   mipsel-manda-01
>>
>> This is a known issue and it only affects Loongson buildds.
>> Interestingly mipsel-manda-01 is Loongson and didn't fail there so there
>> may be a random element involved here. I don't think anyone's tracked
>> down the underlying issue though.
> 
> thanks, is there a public reference for the known issue that we can
> point to?

I think #750593 looks a lot like the bug here.

After some investigation, it seems I was being a bit unfair to Loongson.
This is arguably a non-MIPS-specific bug in Linux < 4.1. It just so
happens that all the Loongson buildds run jessie's 3.16 kernel and all
the other buildds run >= 4.7 from backports.

In #750593 there was lots of talk about stack overflows causing this, but
there is actually another element to it. Indeed, when I reduced the stack
size with ulimit, the segfaults became more frequent, but
increasing the stack size didn't help at all. After looking at the
mappings for a failing process, I saw this (taken just after starting
xsltproc):

[...]
> fff7f5-fff7f5c000 ---p 4000 fd:00 1060250
> /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
> fff7f5c000-fff7f6 rw-p  fd:00 1060250
> /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
> fff7f6-fff7f88000 r-xp  fd:00 1060375
> /lib/mips64el-linux-gnuabi64/ld-2.24.so
> fff7f94000-fff7f98000 rw-p 00024000 fd:00 1060375
> /lib/mips64el-linux-gnuabi64/ld-2.24.so
> fff7f98000-fff7fa r-xp  fd:00 947544 
> /usr/bin/xsltproc
> fff7fa4000-fff7fac000 rw-p  00:00 0
> fff7fac000-fff7fb rw-p 4000 fd:00 947544 
> /usr/bin/xsltproc
> 1d4000-384000 rwxp  00:00 0  
> [heap]
> 9e-a04000 rwxp  00:00 0  
> [stack]
> ffc000-100 r-xp  00:00 0 
> [vdso]

Notice that there is a very small gap between the heap and the stack
here (at least compared to working xsltproc runs). I think that the heap
is growing to a point where it limits the maximum size of the stack and
so increasing the stack size with ulimit doesn't help.
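
To check for this condition on another process, the gap can be computed
from /proc/<pid>/maps. The following is a minimal Python sketch, assuming
the usual "start-end perms offset dev inode path" layout of that file.
The helper names and the sample addresses are made up for illustration
and are not taken from the listing above.

```python
import re

# One /proc/<pid>/maps record: "start-end perms offset dev inode [path]".
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_maps(text):
    """Return a list of (start, end, perms, name) tuples from maps text."""
    regions = []
    for line in text.splitlines():
        match = MAPS_LINE.match(line.strip())
        if match:
            start, end, perms, name = match.groups()
            regions.append((int(start, 16), int(end, 16), perms, name.strip()))
    return regions


def heap_stack_gap(regions):
    """Bytes between the end of [heap] and the start of [stack], or None."""
    heap = next((r for r in regions if r[3] == "[heap]"), None)
    stack = next((r for r in regions if r[3] == "[stack]"), None)
    if heap is None or stack is None:
        return None
    return stack[0] - heap[1]


# Made-up addresses for illustration only; not taken from the report.
SAMPLE = """\
fff1d40000-fff3840000 rwxp 00000000 00:00 0                              [heap]
fff9e00000-fffa040000 rwxp 00000000 00:00 0                              [stack]
"""

if __name__ == "__main__":
    gap = heap_stack_gap(parse_maps(SAMPLE))
    print("heap-to-stack gap: %d bytes" % gap)
```

On a live system the text would come from open("/proc/%d/maps" % pid).read().
A gap smaller than the stack limit set with ulimit means the stack cannot
actually grow to that limit.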

The reason the program and the heap are at these very high addresses is
that xsltproc is built with PIE and the kernel is treating the
executable like a mmap and grouping it with all the other libraries. In
d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") the behavior
changed and now the program and its heap will be mapped at a lower
address so the bug does not affect newer kernels. Using "setarch -L" or
"setarch -R" is another workaround for this bug because that moves the
program so that there is a much larger gap between the heap and the stack.

This might affect other applications as well. Effectively it means that
PIE executables which use lots of stack space might not work properly
with jessie's kernel. The chance that the bug will be hit seems to vary
between arches however (depending on what each arch does in
arch_pick_mmap_layout and arch_randomize_brk) - mips64el seems to be hit
pretty frequently. In xsltproc's case, PIE was enabled some time ago
which is why this bug is quite old.

I believe any of the following will fix this (though not all have been tested):
- Reduce the stack usage in xsltproc (the upstream bug)
- Upgrade the relevant buildds to Linux >= 4.1
- Apply d1fd836dcf00 to jessie's kernel
- Disable PIE in xsltproc.
- Run xsltproc inside setarch -L / setarch -R
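
For the last option, a build script could wrap the call roughly as in the
Python sketch below. It assumes the util-linux setarch, where -R is
--addr-no-randomize and -L is --addr-compat-layout, and it passes the
architecture explicitly because older setarch versions require it. The
function names are hypothetical and this has not been tried on the
affected buildds.

```python
import platform
import subprocess


def setarch_command(command, flag="-R", arch=None):
    """Return `command` prefixed with a setarch invocation.

    flag is "-R" (disable address randomization) or "-L" (legacy
    address-space layout). arch defaults to the running machine type.
    """
    if flag not in ("-R", "-L"):
        raise ValueError("flag must be -R or -L")
    arch = arch or platform.machine()
    return ["setarch", arch, flag] + list(command)


def run_with_workaround(command, flag="-R"):
    """Run `command` under setarch and return the CompletedProcess."""
    return subprocess.run(setarch_command(command, flag), check=False)


if __name__ == "__main__":
    # Only print the command line; nothing is executed here.
    print(" ".join(setarch_command(["xsltproc", "--version"])))
```

Note that -R turns off ASLR for the wrapped process, so this trades away
some hardening for the duration of the call.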

>> For the moment, I'll rebuild libreswan again and hope a good buildd is
>> picked.
> 
> i see 5 mips64el rebuilds now at
> https://buildd.debian.org/status/logs.php?pkg=libreswan=3.20-6=sid,
> but none of them have succeeded yet :/
> 
> 3 of the builds are from mipsel-manda-02, 1 is from eberlin, and one
> additional new "bad" builder is:
> 
>   mipsel-aql-01

There are 3 non-Loongson buildds: mipsel-aql-03, mipsel-manda-03 and
mipsel-sil-01. I expect libreswan will only build on one of those
buildds at the moment.

Thanks,
James


