dcn10_get_dig_frontend problem like this fixed in "drm/amd/display: Add get_dig_frontend implementation for DCEx"

2021-01-13 Thread Andreas Hartmann
Hello,

I'm probably facing a similar problem on this machine during resume after s2ram 
with Linux 5.10.7 (see attached file "trace").

The warning is triggered in 
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 (see the 
^^^-marked line):

unsigned int dcn10_get_dig_frontend(struct link_encoder *enc)
{
	struct dcn10_link_encoder *enc10 = TO_DCN10_LINK_ENC(enc);
	int32_t value;
	enum engine_id result;

	REG_GET(DIG_BE_CNTL, DIG_FE_SOURCE_SELECT, &value);

	switch (value) {
	case DCN10_DIG_FE_SOURCE_SELECT_DIGA:
		result = ENGINE_ID_DIGA;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGB:
		result = ENGINE_ID_DIGB;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGC:
		result = ENGINE_ID_DIGC;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGD:
		result = ENGINE_ID_DIGD;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGE:
		result = ENGINE_ID_DIGE;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGF:
		result = ENGINE_ID_DIGF;
		break;
	case DCN10_DIG_FE_SOURCE_SELECT_DIGG:
		result = ENGINE_ID_DIGG;
		break;
	default:
		// invalid source select DIG
		ASSERT(false);
		result = ENGINE_ID_UNKNOWN;
^^^
	}

	return result;
}


About the machine:
It's a notebook with two GPUs. AMD is the primary GPU - the secondary GPU 
(Nvidia) is unused (nouveau is not loaded at all - the proprietary driver isn't 
even installed)

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] 
Picasso (rev c1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 18f1
Flags: bus master, fast devsel, latency 0, IRQ 24
Memory at e000 (64-bit, prefetchable) [size=256M]
Memory at f000 (64-bit, prefetchable) [size=2M]
I/O ports at c000 [size=256]
Memory at f750 (32-bit, non-prefetchable) [size=512K]
Capabilities: [48] Vendor Specific Information: Len=08 
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=3 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 

Capabilities: [200] #15
Capabilities: [270] #19
Capabilities: [2a0] Access Control Services
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Kernel driver in use: amdgpu
Kernel modules: amdgpu

01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 
Mobile / Max-Q] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 109f
Flags: fast devsel, IRQ 255
Memory at f600 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at c000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at d000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at f000 [disabled] [size=128]
Expansion ROM at f700 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting 
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 

Capabilities: [900] #19
Capabilities: [bb0] #15
Kernel modules: nouveau

CPU: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx

Could you please fix this problem, too?
Please CC me for any answer because I'm not regularly reading the kernel 
mailing list.


Thanks
Andreas Hartmann
2021-01-13T10:52:02.135202+01:00 localhost kernel: [  155.645178] [ cut here ]
2021-01-13T10:52:02.135204+01:00 localhost kernel: [  155.645330] WARNING: CPU: 6 PID: 4116 at ../drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x65/0xb0 [amdgpu]
2021-01-13T10:52:02.135205+01:00 localhost kernel: [  155.645331] Modules linked in: fuse iptable_mangle xt_TCPMSS xt_tcpudp bpfilter ip_tables x_tables af_packet dmi_sysfs uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc msr snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd

Re: [PATCH 4.19 13/99] netfilter: nf_conncount: fix argument order to find_next_bit

2019-04-22 Thread Andreas Hartmann
On 22.04.19 at 20:57 Florian Westphal wrote:
> Andreas Hartmann  wrote:
>>> Could you at least tell us how you're using nf_conncount (nf/iptables
>>> rules)?
>>
>> # Generated by iptables-save v1.6.2 on Mon Apr 22 20:19:30 2019
>> *filter
>> :INPUT DROP [0:0]
>> :FORWARD ACCEPT [0:0]
>> :OUTPUT DROP [4423:248703]
>> -A INPUT -s 127.0.0.1/32 -d 239.255.255.250/32 -i lo -p udp -j ACCEPT
>> -A INPUT -p tcp -m tcp --dport 113 -j REJECT --reject-with 
>> icmp-port-unreachable
>> -A INPUT -d 255.255.255.255/32 -p udp -j ACCEPT
>> -A INPUT -d 224.0.0.1/32 -j ACCEPT
>> -A INPUT -s 127.0.0.1/32 -d 127.0.0.2/32 -i lo -j ACCEPT
>> -A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -i lo -j ACCEPT
>> -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
>> -A INPUT -s 192.168.22.0/24 -j ACCEPT
>> -A INPUT -j LOG --log-prefix "In Input gesperrt: "
>> -A INPUT -s 169.254.2.1/32 -d 169.254.2.2/32 -i br1 -p tcp -m tcp --sport 80 
>> -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 224.0.0.22/32 -o lo -p igmp -j ACCEPT
>> -A OUTPUT -d 192.168.6.173/32 -o br1 -p tcp -m tcp --dport 80 -j ACCEPT
>> -A OUTPUT -s 169.254.2.2/32 -d 239.255.255.250/32 -o br1 -p udp -j DROP
>> -A OUTPUT -s 192.168.22.6/32 -d 224.0.0.251/32 -o br1 -p udp -j ACCEPT
>> -A OUTPUT -s 127.0.0.1/32 -d 239.255.255.250/32 -o lo -p udp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 255.255.255.255/32 -o br1 -p udp -m udp 
>> --dport 1900 -j ACCEPT
>> -A OUTPUT -s 127.0.0.1/32 -d 127.255.255.255/32 -o br1 -p udp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 239.0.0.250/32 -o br1 -p igmp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 239.255.255.250/32 -o br1 -p igmp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 239.255.255.250/32 -o br1 -p udp -m udp 
>> --dport 1900 -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 239.1.1.1/32 -o br1 -p udp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 239.1.1.1/32 -o br1 -p igmp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -d 224.0.0.251/32 -o br1 -p igmp -j ACCEPT
>> -A OUTPUT -s 192.168.22.6/32 -p tcp -m tcp --dport 1935 -j ACCEPT
>> -A OUTPUT -s 192.168.22.0/24 -d 192.168.3.0/24 -j ACCEPT
>> -A OUTPUT -s 127.0.0.1/32 -d 127.0.0.2/32 -o lo -j ACCEPT
>> -A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -o lo -j ACCEPT
>> -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
>> -A OUTPUT -s 192.168.22.0/24 -d 192.168.22.0/24 -j ACCEPT
>> -A OUTPUT -j LOG --log-prefix "In Output gesperrt: "
>> -A OUTPUT -s 169.254.2.2/32 -d 169.254.2.1/32 -o br1 -p tcp -m tcp --dport 
>> 80 -j ACCEPT
>> COMMIT
> 
> I don't see connlimit match is in use.
> 
> Could you post output of
> 
> lsmod | grep nf_conncount
> 
> and
> 
> grep CONNCOUNT ~/your_kernel_conf

True - it's not in use (it's not even configured) at all. I'm surprised that 
removing the patch series seems to fix the problem anyway.
OK - I'll keep testing for a few more weeks. If the problem comes up again, 
this was a false positive.
If it doesn't, I wouldn't know what else to do at the moment.

Regarding git bisect, the only other changes that remain as possible candidates 
at the moment are:

tty: Don't hold ldisc lock in tty_reopen() if ldisc present (Dmitry Safonov)
tty: Simplify tty->count math in tty_reopen() (Dmitry Safonov)
tty: Hold tty_ldisc_lock() during tty_reopen() (Dmitry Safonov)
tty/ldsem: Wake up readers after timed out down_write() (Dmitry Safonov)

But I don't know how these changes could break video streaming using serviio.


Thanks
Andreas


Re: [PATCH 4.19 13/99] netfilter: nf_conncount: fix argument order to find_next_bit

2019-04-22 Thread Andreas Hartmann
On 22.04.19 at 19:27 Florian Westphal wrote:
> Andreas Hartmann  wrote:
>> Since 4.19.17, I've been facing problems I've never seen before while 
>> streaming videos. This means:
>>
>> - video from the internet stutters although enough data flow can be seen in bmon.
>> - the GPU locks up:
>>   radeon :0a:00.0: ring 0 stalled for more than 14084msec
>>   radeon :0a:00.0: GPU lockup (current fence id 0x00053ed7 last fence id 0x00053f0f on ring 0)
>> - The connection for videos streamed locally by the machine to a TV suddenly 
>> breaks (UPnP, with serviio as server).
>>
>> After a very long time of testing, I found that removing the complete 
>> patch series for 4.19.17 regarding netfilter: nf_conncount makes the problem 
>> disappear.
>>
>> Please remove / fix this patchset!
> 
> The state in 4.19.y is same as in mainline.

I don't use mainline.

> Could you at least tell us how you're using nf_conncount (nf/iptables
> rules)?

The host has a Ryzen 7 1700 CPU and runs 4 KVM VMs, with one Ethernet interface 
and 2 bridges (2 different networks). One VM works as a bridge between both networks.
The iptables rules are the following:

# Generated by iptables-save v1.6.2 on Mon Apr 22 20:19:30 2019
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT DROP [4423:248703]
-A INPUT -s 127.0.0.1/32 -d 239.255.255.250/32 -i lo -p udp -j ACCEPT
-A INPUT -p tcp -m tcp --dport 113 -j REJECT --reject-with icmp-port-unreachable
-A INPUT -d 255.255.255.255/32 -p udp -j ACCEPT
-A INPUT -d 224.0.0.1/32 -j ACCEPT
-A INPUT -s 127.0.0.1/32 -d 127.0.0.2/32 -i lo -j ACCEPT
-A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 192.168.22.0/24 -j ACCEPT
-A INPUT -j LOG --log-prefix "In Input gesperrt: "
-A INPUT -s 169.254.2.1/32 -d 169.254.2.2/32 -i br1 -p tcp -m tcp --sport 80 -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 224.0.0.22/32 -o lo -p igmp -j ACCEPT
-A OUTPUT -d 192.168.6.173/32 -o br1 -p tcp -m tcp --dport 80 -j ACCEPT
-A OUTPUT -s 169.254.2.2/32 -d 239.255.255.250/32 -o br1 -p udp -j DROP
-A OUTPUT -s 192.168.22.6/32 -d 224.0.0.251/32 -o br1 -p udp -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 239.255.255.250/32 -o lo -p udp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 255.255.255.255/32 -o br1 -p udp -m udp --dport 1900 -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.255.255.255/32 -o br1 -p udp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 239.0.0.250/32 -o br1 -p igmp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 239.255.255.250/32 -o br1 -p igmp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 239.255.255.250/32 -o br1 -p udp -m udp --dport 1900 -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 239.1.1.1/32 -o br1 -p udp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 239.1.1.1/32 -o br1 -p igmp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -d 224.0.0.251/32 -o br1 -p igmp -j ACCEPT
-A OUTPUT -s 192.168.22.6/32 -p tcp -m tcp --dport 1935 -j ACCEPT
-A OUTPUT -s 192.168.22.0/24 -d 192.168.3.0/24 -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.2/32 -o lo -j ACCEPT
-A OUTPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -o lo -j ACCEPT
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -s 192.168.22.0/24 -d 192.168.22.0/24 -j ACCEPT
-A OUTPUT -j LOG --log-prefix "In Output gesperrt: "
-A OUTPUT -s 169.254.2.2/32 -d 169.254.2.1/32 -o br1 -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT
# Completed on Mon Apr 22 20:19:30 2019


br0: flags=4163  mtu 9000
ether .  txqueuelen 1000  (Ethernet)
RX packets 1376  bytes 139220 (135.9 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br1: flags=4163  mtu 1512
inet 192.168.22.6  netmask 255.255.255.0  broadcast 192.168.22.255
ether .  txqueuelen 1000  (Ethernet)
RX packets 1161816  bytes 2806028482 (2.6 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 1427306  bytes 2032637199 (1.8 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163  mtu 1512
ether .  txqueuelen 1000  (Ethernet)
RX packets 119990  bytes 110191277 (105.0 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 204094  bytes 234832004 (223.9 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
device interrupt 36  memory 0xfc8c-fc8e

lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
loop  txqueuelen 1000  (Local Loopback)
RX packets 2474  bytes 16626724 (15.8 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 2474  bytes 16626724 (15.8 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tap0: flags=4163  mtu 1512
ether   txqueuelen 1000  (Ethernet)

Re: [PATCH 4.19 13/99] netfilter: nf_conncount: fix argument order to find_next_bit

2019-04-22 Thread Andreas Hartmann
Hello!

Since 4.19.17, I've been facing problems I've never seen before while 
streaming videos. This means:

- video from the internet stutters although enough data flow can be seen in bmon.
- the GPU locks up:
  radeon :0a:00.0: ring 0 stalled for more than 14084msec
  radeon :0a:00.0: GPU lockup (current fence id 0x00053ed7 last fence id 0x00053f0f on ring 0)
- The connection for videos streamed locally by the machine to a TV suddenly 
breaks (UPnP, with serviio as server).

After a very long time of testing, I found that removing the complete patch 
series for 4.19.17 regarding netfilter: nf_conncount makes the problem 
disappear.

Please remove / fix this patchset!


Thanks
Andreas Hartmann


On 21.01.19 at 14:48 Greg Kroah-Hartman wrote:
> 4.19-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Florian Westphal 
> 
> commit a007232066f6839d6f256bab21e825d968f1a163 upstream.
> 
> Size and 'next bit' were swapped, this bug could cause worker to
> reschedule itself even if system was idle.
> 
> Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
> Reviewed-by: Shawn Bohrer 
> Signed-off-by: Florian Westphal 
> Signed-off-by: Pablo Neira Ayuso 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  net/netfilter/nf_conncount.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -488,7 +488,7 @@ next:
>   clear_bit(tree, data->pending_trees);
>  
>   next_tree = (tree + 1) % CONNCOUNT_SLOTS;
> - next_tree = find_next_bit(data->pending_trees, next_tree, CONNCOUNT_SLOTS);
> + next_tree = find_next_bit(data->pending_trees, CONNCOUNT_SLOTS, next_tree);
>  
>   if (next_tree < CONNCOUNT_SLOTS) {
>   data->gc_tree = next_tree;
> 
> 
> 





Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
On 06/04/2018 at 04:12 PM Alan Cox wrote:
>> A malicious program most probably won't care about that. Therefore, my
>> next question is: which memory regions can be exploited by a malicious
>> program? The complete physical memory or only the memory provided to the
>> malicious program? Should be the latter if this approach should have any
>> impact.
> 
> Spectre is not about memory regions. It's about speculative execution
> leaving measurable footprints. What footprints you leave depend upon what
> code you are executing. Thus the question becomes 'what can the target
> access'.
> 
> In order to attack something you need both a way to influence the code
> concerned and a way to measure it. In addition it needs to have some
> secret you want.
> 
> In practice that usually means something on the same system with its own
> memory space/privilege level. The usual cases then are user<->kernel and
> managed application<->runtime.

Would this be a practical test case: gathering keys and passwords used by an
ssh login by running a malicious program in parallel to sshd as another
ordinary user without root access?


Thanks,
Andreas


Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
Hello Mark,

On 06/04/2018 at 11:19 AM Mark Rutland wrote:
> On Mon, Jun 04, 2018 at 10:50:07AM +0200, Andreas Hartmann wrote:
>> Hello Peter,
>>
>> thanks for your answer! I appreciate it!
>>
>> On 06/04/2018 at 10:15 AM Peter Zijlstra wrote:
>>> On Fri, Jun 01, 2018 at 02:19:38PM +0200, Andreas Hartmann wrote:
>>>
>>>> I tested the spectre mitigation of different machines and kernels with
>>>> https://github.com/crozone/SpectrePoC
>>>>
>>>> You can see the results below.
>>>
>>>> My question: Did I miss something?
>>>
>>> Yes.
>>>
>>>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
>>>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
>>>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
>>>
>>>    
>>>
>>> The POC is a v1 on itself. V1 needs to be fixed for every individual
>>> executable (worse, for every individual location in the code, and we're
>>> still finding them). The kernel mitigation status for v1 only indicates
>>> the kernel itself has mitigations (for some locations).
>>>
>>> The POC is meant to test effectiveness of these mitigations, either the
>>> original LFENCE or the dependent instruction thing, but you have to
>>> enable one or the other.
>>
>> Ok, this means every program running on the machine has to take care
>> of being Spectre v1-safe itself.
> 
> Correct. Primarily this matters for things like JITs, where untrusted code may
> be run in the same address space as sensitive data.
> 
>> A malicious program most probably won't care about that. Therefore, my
>> next question is: which memory regions can be exploited by a malicious
>> program? The complete physical memory or only the memory provided to the
>> malicious program? Should be the latter if this approach should have any
>> impact.
> 
> Assuming you have a CPU which is not vulnerable to meltdown / variant-3, or you
> have mitigated this, (e.g. with KPTI), a malicious program can only access data
> within its own address space.
> 
> Spectre variant-1 alone only gives access to memory in the address space of the
> program itself.

Thanks Mark! I now have a better understanding of the effects of the
different vulnerabilities around Spectre and Meltdown, and I'm
hopefully better able to assess them.

As I mostly use AMD CPUs (e.g. Ryzen 1) for virtualization, I
should be safe by default from unwanted global memory access from
a VM to host memory, because the Ryzen 1 CPU is not affected by
Meltdown at all.


Regards,
Andreas


Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
Hello Peter,

thanks for your answer! I appreciate it!

On 06/04/2018 at 10:15 AM Peter Zijlstra wrote:
> On Fri, Jun 01, 2018 at 02:19:38PM +0200, Andreas Hartmann wrote:
> 
>> I tested the spectre mitigation of different machines and kernels with
>> https://github.com/crozone/SpectrePoC
>>
>> You can see the results below.
> 
>> My question: Did I miss something?
> 
> Yes.
> 
>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
>> Build: ... INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
> 
>    
> 
> The POC is a v1 on itself. V1 needs to be fixed for every individual
> executable (worse, for every individual location in the code, and we're
> still finding them). The kernel mitigation status for v1 only indicates
> the kernel itself has mitigations (for some locations).
> 
> The POC is meant to test effectiveness of these mitigations, either the
> original LFENCE or the dependent instruction thing, but you have to
> enable one or the other.

Ok, this means every program running on the machine has to take care
of being Spectre v1-safe itself.

A malicious program most probably won't care about that. Therefore, my
next question is: which memory regions can be exploited by a malicious
program? The complete physical memory or only the memory provided to the
malicious program? Should be the latter if this approach should have any
impact.


Thanks,
Andreas


Re: Spectre mitigation doesn't seem to work at all?!

2018-06-04 Thread Andreas Hartmann
Hello!

Sorry for the ping, but I think the behavior shown below should really be
investigated!


Thanks,
Andreas




On 06/01/2018 at 02:19 PM Andreas Hartmann wrote:
> Hello!
> 
> I tested the spectre mitigation of different machines and kernels with
> https://github.com/crozone/SpectrePoC
> 
> You can see the results below.
> 
> 
> My question: Did I miss something?
> My expectation was that, based on the output of
> /sys/devices/system/cpu/vulnerabilities/spectre_v* as shown below, the
> problem should have gone away.
> But the results seem to tell me something different.
> 
> 
> Thanks
> Andreas
> 
> 
> 
> 
> --
> 
> CPU:    AMD Ryzen 7 1700X Eight-Core Processor
> Bios:   BIOS 4011 04/19/2018 - ibpb is listed in /proc/cpuinfo
> Kernel: 4.14.44-1.1-default
> cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
> Mitigation: Full AMD retpoline, IBPB
> cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
> Mitigation: __user pointer sanitization
> 
>  ./spectre.out
> Using a cache hit threshold of 80.
> Build: RDTSCP_SUPPORTED MFENCE_SUPPORTED CLFLUSH_SUPPORTED
> INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
> Reading 40 bytes:
> Reading at malicious_x = 0xffdfec18... Success: 0x54=’T’ score=2
> Reading at malicious_x = 0xffdfec19... Success: 0x68=’h’ score=2
> Reading at malicious_x = 0xffdfec1a... Success: 0x65=’e’ score=2
> Reading at malicious_x = 0xffdfec1b... Success: 0x20=’ ’ score=2
> Reading at malicious_x = 0xffdfec1c... Success: 0x4D=’M’ score=2
> Reading at malicious_x = 0xffdfec1d... Success: 0x61=’a’ score=2
> Reading at malicious_x = 0xffdfec1e... Success: 0x67=’g’ score=2
> Reading at malicious_x = 0xffdfec1f... Success: 0x69=’i’ score=2
> Reading at malicious_x = 0xffdfec20... Success: 0x63=’c’ score=2
> Reading at malicious_x = 0xffdfec21... Success: 0x20=’ ’ score=2
> Reading at malicious_x = 0xffdfec22... Success: 0x57=’W’ score=2
> Reading at malicious_x = 0xffdfec23... Success: 0x6F=’o’ score=2
> Reading at malicious_x = 0xffdfec24... Success: 0x72=’r’ score=2
> Reading at malicious_x = 0xffdfec25... Success: 0x64=’d’ score=2
> Reading at malicious_x = 0xffdfec26... Success: 0x73=’s’ score=2
> Reading at malicious_x = 0xffdfec27... Success: 0x20=’ ’ score=2
> Reading at malicious_x = 0xffdfec28... Success: 0x61=’a’ score=2
> Reading at malicious_x = 0xffdfec29... Success: 0x72=’r’ score=2
> Reading at malicious_x = 0xffdfec2a... Success: 0x65=’e’ score=2
> Reading at malicious_x = 0xffdfec2b... Success: 0x20=’ ’ score=2
> Reading at malicious_x = 0xffdfec2c... Success: 0x53=’S’ score=2
> Reading at malicious_x = 0xffdfec2d... Success: 0x71=’q’ score=2
> Reading at malicious_x = 0xffdfec2e... Success: 0x75=’u’ score=2
> Reading at malicious_x = 0xffdfec2f... Success: 0x65=’e’ score=2
> Reading at malicious_x = 0xffdfec30... Success: 0x61=’a’ score=2
> Reading at malicious_x = 0xffdfec31... Success: 0x6D=’m’ score=2
> Reading at malicious_x = 0xffdfec32... Success: 0x69=’i’ score=2
> Reading at malicious_x = 0xffdfec33... Success: 0x73=’s’ score=2
> Reading at malicious_x = 0xffdfec34... Success: 0x68=’h’ score=2
> Reading at malicious_x = 0xffdfec35... Success: 0x20=’ ’ score=2
> Reading at malicious_x = 0xffdfec36... Success: 0x4F=’O’ score=2
> Reading at malicious_x = 0xffdfec37... Success: 0x73=’s’ score=2
> Reading at malicious_x = 0xffdfec38... Success: 0x73=’s’ score=2
> Reading at malicious_x = 0xffdfec39... Success: 0x69=’i’ score=2
> Reading at malicious_x = 0xffdfec3a... Success: 0x66=’f’ score=2
> Reading at malicious_x = 0xffdfec3b... Success: 0x72=’r’ score=2
> Reading at malicious_x = 0xffdfec3c... Success: 0x61=’a’ score=2
> Reading at malicious_x = 0xffdfec3d... Success: 0x67=’g’ score=2
> Reading at malicious_x = 0xffdfec3e... Success: 0x65=’e’ score=2
> Reading at malicious_x = 0xffdfec3f... Success: 0x2E=’.’ score=2
> 
> 
> --
> 
> CPU:    AMD G-T40E Processor
> Kernel: 4.14.44-1.el6.x86_64
> cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
> Mitigation: __user pointer sanitization
> cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
> Mitigation: Full AMD retpoline
> 
> ./spectre.out 130
> Using a cache hit threshold of 13

Spectre mitigation doesn't seem to work at all?!

2018-06-01 Thread Andreas Hartmann

Hello!

I tested the spectre mitigation of different machines and kernels with
https://github.com/crozone/SpectrePoC

You can see the results below.


My question: Did I miss something?
Based on the output of /sys/devices/system/cpu/vulnerabilities/spectre_v*
shown below, I expected the problem to be gone.
But the results seem to tell me otherwise.
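
For anyone who wants to collect the same sysfs status on several machines, here is a minimal sketch. It assumes only the directory named above; the function name `read_mitigations` and the returned dictionary layout are illustrative choices, not part of any existing tool.

```python
from pathlib import Path

# Directory named in the message; everything else here is illustrative.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(base=VULN_DIR, prefix="spectre_v"):
    """Return {file name: reported status} for each matching sysfs entry."""
    base = Path(base)
    status = {}
    if not base.is_dir():
        # Older kernels and non-Linux systems do not expose this directory.
        return status
    for entry in sorted(base.glob(prefix + "*")):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, text in read_mitigations().items():
        print(f"{name}: {text}")
```

These files report what the running kernel says about itself; they say nothing about how a user-space program such as spectre.out was compiled.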


Thanks
Andreas




--
CPU:    AMD Ryzen 7 1700X Eight-Core Processor
Bios:   BIOS 4011 04/19/2018 - ibpb is listed in /proc/cpuinfo
Kernel: 4.14.44-1.1-default
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Full AMD retpoline, IBPB
cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Mitigation: __user pointer sanitization

 ./spectre.out
Using a cache hit threshold of 80.
Build: RDTSCP_SUPPORTED MFENCE_SUPPORTED CLFLUSH_SUPPORTED 
INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
Reading 40 bytes:
Reading at malicious_x = 0xffdfec18... Success: 0x54=’T’ score=2
Reading at malicious_x = 0xffdfec19... Success: 0x68=’h’ score=2
Reading at malicious_x = 0xffdfec1a... Success: 0x65=’e’ score=2
Reading at malicious_x = 0xffdfec1b... Success: 0x20=’ ’ score=2
Reading at malicious_x = 0xffdfec1c... Success: 0x4D=’M’ score=2
Reading at malicious_x = 0xffdfec1d... Success: 0x61=’a’ score=2
Reading at malicious_x = 0xffdfec1e... Success: 0x67=’g’ score=2
Reading at malicious_x = 0xffdfec1f... Success: 0x69=’i’ score=2
Reading at malicious_x = 0xffdfec20... Success: 0x63=’c’ score=2
Reading at malicious_x = 0xffdfec21... Success: 0x20=’ ’ score=2
Reading at malicious_x = 0xffdfec22... Success: 0x57=’W’ score=2
Reading at malicious_x = 0xffdfec23... Success: 0x6F=’o’ score=2
Reading at malicious_x = 0xffdfec24... Success: 0x72=’r’ score=2
Reading at malicious_x = 0xffdfec25... Success: 0x64=’d’ score=2
Reading at malicious_x = 0xffdfec26... Success: 0x73=’s’ score=2
Reading at malicious_x = 0xffdfec27... Success: 0x20=’ ’ score=2
Reading at malicious_x = 0xffdfec28... Success: 0x61=’a’ score=2
Reading at malicious_x = 0xffdfec29... Success: 0x72=’r’ score=2
Reading at malicious_x = 0xffdfec2a... Success: 0x65=’e’ score=2
Reading at malicious_x = 0xffdfec2b... Success: 0x20=’ ’ score=2
Reading at malicious_x = 0xffdfec2c... Success: 0x53=’S’ score=2
Reading at malicious_x = 0xffdfec2d... Success: 0x71=’q’ score=2
Reading at malicious_x = 0xffdfec2e... Success: 0x75=’u’ score=2
Reading at malicious_x = 0xffdfec2f... Success: 0x65=’e’ score=2
Reading at malicious_x = 0xffdfec30... Success: 0x61=’a’ score=2
Reading at malicious_x = 0xffdfec31... Success: 0x6D=’m’ score=2
Reading at malicious_x = 0xffdfec32... Success: 0x69=’i’ score=2
Reading at malicious_x = 0xffdfec33... Success: 0x73=’s’ score=2
Reading at malicious_x = 0xffdfec34... Success: 0x68=’h’ score=2
Reading at malicious_x = 0xffdfec35... Success: 0x20=’ ’ score=2
Reading at malicious_x = 0xffdfec36... Success: 0x4F=’O’ score=2
Reading at malicious_x = 0xffdfec37... Success: 0x73=’s’ score=2
Reading at malicious_x = 0xffdfec38... Success: 0x73=’s’ score=2
Reading at malicious_x = 0xffdfec39... Success: 0x69=’i’ score=2
Reading at malicious_x = 0xffdfec3a... Success: 0x66=’f’ score=2
Reading at malicious_x = 0xffdfec3b... Success: 0x72=’r’ score=2
Reading at malicious_x = 0xffdfec3c... Success: 0x61=’a’ score=2
Reading at malicious_x = 0xffdfec3d... Success: 0x67=’g’ score=2
Reading at malicious_x = 0xffdfec3e... Success: 0x65=’e’ score=2
Reading at malicious_x = 0xffdfec3f... Success: 0x2E=’.’ score=2
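
The per-byte lines above can be condensed into the recovered string with a short script. This is a sketch that assumes the output format is exactly as quoted here ("Success:" or "Unclear:" followed by a hex byte); the function name `summarize_poc_output` is hypothetical.

```python
import re

# Matches result lines as quoted above, for example:
#   Reading at malicious_x = 0xffdfec18... Success: 0x54=’T’ score=2
# The format is taken from this message, not from the PoC source.
LINE_RE = re.compile(
    r"Reading at malicious_x = 0x[0-9a-fA-F]+\.\.\. "
    r"(Success|Unclear): 0x([0-9a-fA-F]{2})="
)


def summarize_poc_output(output):
    """Return (recovered text, number of bytes the PoC marked 'Unclear')."""
    values = []
    unclear = 0
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # banner lines, wrapped "second best" lines, and so on
        if match.group(1) == "Unclear":
            unclear += 1
        values.append(int(match.group(2), 16))
    # latin-1 maps every byte value to one character, so decoding cannot fail.
    return bytes(values).decode("latin-1"), unclear
```

Feeding it the saved output of a run, for example `summarize_poc_output(open("run.log").read())` with a hypothetical log file, gives the leaked text and a count of uncertain bytes, which makes runs on different machines easier to compare.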


--
CPU:    AMD G-T40E Processor
Kernel: 4.14.44-1.el6.x86_64
cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Mitigation: __user pointer sanitization
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Full AMD retpoline

./spectre.out 130
Using a cache hit threshold of 130.
Build: RDTSCP_SUPPORTED MFENCE_SUPPORTED CLFLUSH_SUPPORTED 
INTEL_MITIGATION_DISABLED LINUX_KERNEL_MITIGATION_DISABLED
Reading 40 bytes:
Reading at malicious_x = 0xffdfebf0... Unclear: 0x54=’T’ score=999 
(second best: 0x00=’?’ score=992)
Reading at malicious_x = 0xffdfebf1... Unclear: 0x68=’h’ score=996 
(second best: 0x00=’?’ score=988)
Reading at malicious_x = 0xffdfebf2... Unclear: 0x65=’e’ score=999 
(second best: 0x00=’?’ score=985)
Reading at malicious_x = 0xffdfebf3... Unclear: 0x20=’ ’ score=997 
(second best: 0x00=’?’ score=989)
Reading at malicious_x = 

Re: [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen

2017-07-31 Thread Andreas Hartmann
On 07/31/2017 at 02:10 PM Alan Cox wrote:
> On Wed, 26 Jul 2017 06:54:01 +0900
> Satoru Takeuchi  wrote:
> 
>> # I'm a LKML subscriber, but not a x86 list subscriber
>>
>> I found the following new linux kernel bugzilla about Ryzen related problem.
>> Since many developers don't check this bugzilla and I've also
>> encountered this problem,
>> I decided to introduce this problem here.
> 
> Historically we've seen exactly these symptoms on all kinds of systems
> where the memory is at fault, even in cases where memtest86 passes.
> Whether there's a specific problem on some Ryzen boards is a question for
> AMD, but if I saw this without knowing the CPU I'd suspect memory
> firstly. GCC it turns out is by accident an amazingly effective memory
> testing tool.

That's surely true. Meanwhile, I got rid of my memory problems (no
more traces like these [1] or even system hangs) with a correct memory
configuration, but the gcc segfaults remain, mostly with kernel 4.12.
Kernels 4.11.x and 4.9.39ff work mostly fine. I say "mostly" because
I stopped testing and therefore can't say whether they are really
stable. In any case, (k)aslr must always be disabled.
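
To check how a given machine is configured in this respect, a small sketch follows. It assumes the standard Linux locations /proc/sys/kernel/randomize_va_space (user-space ASLR) and /proc/cmdline (where a `nokaslr` boot parameter would show up); the helper names are illustrative.

```python
from pathlib import Path

# Meaning of the values in /proc/sys/kernel/randomize_va_space.
ASLR_MODES = {
    "0": "disabled",
    "1": "partial (stack, mmap base, vDSO)",
    "2": "full (also heap/brk)",
}


def describe_aslr(raw_value):
    """Translate the sysctl value into a readable description."""
    return ASLR_MODES.get(raw_value.strip(), "unknown value")


def has_nokaslr(cmdline):
    """True if the kernel command line contains the nokaslr parameter."""
    return "nokaslr" in cmdline.split()


def _read(path):
    try:
        return Path(path).read_text()
    except OSError:
        return None  # file missing or unreadable on this system


if __name__ == "__main__":
    aslr = _read("/proc/sys/kernel/randomize_va_space")
    cmdline = _read("/proc/cmdline")
    if aslr is not None:
        print("user-space ASLR:", describe_aslr(aslr))
    if cmdline is not None:
        print("nokaslr on command line:", has_nokaslr(cmdline))
```

A missing `nokaslr` does not prove that kernel ASLR is active, since that also depends on how the kernel was built; the script only reports what is visible in these two files.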

FreeBSD meanwhile provides this workaround after long research [2]:

https://reviews.freebsd.org/D11780

Please port it to Linux!


[1] https://www.spinics.net/lists/kernel/msg2565491.html
[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399#c89


Re: [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen

2017-07-25 Thread Andreas Hartmann
On 07/26/2017 at 12:00 AM Satoru Takeuchi wrote:
> # I'm a LKML subscriber, but not a x86 list subscriber
> 
> I found the following new linux kernel bugzilla about Ryzen related problem.
> Since many developers don't check this bugzilla and I've also
> encountered this problem,
> I decided to introduce this problem here.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=196481:

I'm affected, too.

I'm using Asus PRIME X370-PRO / 32 GB RAM (Kingston Hyperx
HX424C15FBK2/32) configured as suggested by bios 0805 (Agesa 1.0.0.6):
2400 MHz.
Problems happen with linux 4.12.x or 4.9.x (didn't test others).

Things seem to run more stably if the machine is booted twice:
first boot up to the password prompt for hd encryption, then do a hard reset.


While compiling the kernel, I see crashes like these, as well as hard lockups:

  CC [M]  drivers/video/backlight/adp8870_bl.o
  CC [M]  drivers/usb/host/r8a66597-hcd.o
../scripts/Makefile.build:315: recipe for target
'drivers/usb/host/r8a66597-hcd.o' failed
../drivers/usb/host/r8a66597-hcd.c: In function 'r8a66597_timer':
../drivers/usb/host/r8a66597-hcd.c:1824:1: internal compiler error:
Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[5]: *** [drivers/usb/host/r8a66597-hcd.o] Error 1
../scripts/Makefile.build:568: recipe for target 'drivers/usb/host' failed
make[4]: *** [drivers/usb/host] Error 2
make[4]: *** Waiting for unfinished jobs
  CC [M]  drivers/scsi/lpfc/lpfc_mbox.o


or


  CC [M]  drivers/staging/lustre/lustre/obdclass/lustre_handles.o
  CC [M]  drivers/net/ethernet/intel/e1000e/82571.o
../scripts/Makefile.build:309: recipe for target
'drivers/net/ethernet/intel/e1000e/82571.o' failed
../drivers/net/ethernet/intel/e1000e/82571.c: In function
'e1000_init_hw_82571':
../drivers/net/ethernet/intel/e1000e/82571.c:1152:1: internal compiler
error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[7]: *** [drivers/net/ethernet/intel/e1000e/82571.o] Error 1
../scripts/Makefile.build:568: recipe for target
'drivers/net/ethernet/intel/e1000e' failed
  CC [M]  drivers/scsi/fcoe/fcoe.o
make[6]: *** [drivers/net/ethernet/intel/e1000e] Error 2
make[6]: *** Waiting for unfinished jobs


It has also happened that compilation just hangs because two processes wait
for each other.


Sometimes I get entries like these in messages:


Jul 25 17:08:03 dualc kernel: traps: cc1[17305] general protection ip:48960c 
sp:7fff9910 error:0
Jul 25 17:08:03 dualc kernel:  in cc1[40+c73000]
Jul 25 17:08:03 dualc kernel: Modules linked in: vhost_net tun vhost macvtap 
macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio 
br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs it87(O) hwmon_vid 
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel eeepc_wmi asus_wmi 
snd_hda_codec sparse_keymap rfkill snd_hda_core video snd_hwdep mxm_wmi snd_pcm 
snd_seq snd_seq_device kvm_amd snd_timer kvm irqbypass snd pcspkr e1000e 
sp5100_tco soundcore i2c_piix4 ptp pps_core acpi_cpufreq fjes tpm_tis 
gpio_amdpt 8250_dw gpio_generic pinctrl_amd i2c_designware_platform wmi 
tpm_tis_core shpchp i2c_designware_core button tpm nfsd auth_rpcgss nfs_acl 
lockd grace
Jul 25 17:08:03 dualc kernel:  sunrpc xfs libcrc32c dm_crypt hid_generic usbhid 
raid1 md_mod amdkfd amd_iommu_v2 radeon crct10dif_pclmul crc32_pclmul 
crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_kms_helper syscopyarea 
sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp sr_mod cdrom xhci_pci 
xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd 
ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath 
dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
Jul 25 17:08:03 dualc kernel: CPU: 9 PID: 17378 Comm: sh Tainted: G   O 
   4.9.39-2.4-default #1
Jul 25 17:08:03 dualc kernel: Hardware name: System manufacturer System Product 
Name/PRIME X370-PRO, BIOS 0805 06/20/2017
Jul 25 17:08:03 dualc kernel: task: 968a69f72140 task.stack: 
b69e9468
Jul 25 17:08:03 dualc kernel: RIP: 0010:[]  
[] lock_page_memcg+0x4f/0x80
Jul 25 17:08:03 dualc kernel: RSP: 0018:b69e94683c30  EFLAGS: 00010286
Jul 25 17:08:03 dualc kernel: RAX: 968a69f72140 RBX: cfff96844ca83000 RCX: 
007f8a32
Jul 25 17:08:03 dualc kernel: RDX: e459dfe28c80 RSI:  RDI: 
e459dfe28c80
Jul 25 17:08:03 dualc kernel: RBP: b69e94683c48 R08: 96880c194480 R09: 
77988000
Jul 25 17:08:03 dualc kernel: R10: 968abe85ccd8 R11: 00020002 R12: 
7795d000
Jul 25 17:08:03 dualc kernel: R13: e459dfe28c80 R14: b69e94683dd0 R15: 
7795e000
Jul 25 17:08:03 dualc kernel: FS: 

Regression - Linux 4.9: ums_eneub6250 broken: transfer buffer not dma capable - Trace

2017-04-15 Thread Andreas Hartmann
Hello!

ums_eneub6250 has been broken since Linux 4.9. It works fine if
CONFIG_VMAP_STACK is disabled.
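
With CONFIG_VMAP_STACK, kernel stacks are allocated from vmalloc space, so a buffer that lives on the stack cannot be handed to DMA, which is consistent with the warning in the trace below. To see how a kernel was built, the option can be looked up in its config file. The sketch below parses config text passed in as a string; the function name `config_option_state` is hypothetical, and /boot/config-<release> is only a common location for the file, not a guaranteed one.

```python
import os
import re


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or 'absent'."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if line == unset_line:
            return "not set"
    return "absent"  # the option does not appear in this config at all


if __name__ == "__main__":
    # Assumed location; many distributions install the config here.
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as handle:
            print(config_option_state(handle.read(), "CONFIG_VMAP_STACK"))
    except OSError:
        print("no config file found at", path)
```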

I would be glad if this could be fixed.


Thanks,
kind regards,
Andreas


Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: new high-speed USB device number 3 
using ehci-pci
Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: New USB device found, 
idVendor=0cf2, idProduct=6250
Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: New USB device strings: Mfr=1, 
Product=2, SerialNumber=4
Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: Product: UB6250   
Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: Manufacturer: ENE Flash  
Apr 15 17:58:54 notebook2 kernel: usb 1-1.1: SerialNumber: 606569746801
Apr 15 17:58:54 notebook2 mtp-probe[2134]: checking bus 1, device 3: 
"/sys/devices/pci:00/:00:1a.0/usb1/1-1/1-1.1"
Apr 15 17:58:54 notebook2 mtp-probe[2134]: bus: 1, device: 3 was not an MTP 
device
Apr 15 17:58:55 notebook2 kernel: usbcore: registered new interface driver 
usb-storage
Apr 15 17:58:55 notebook2 kernel: usbcore: registered new interface driver uas
Apr 15 17:58:55 notebook2 kernel: ums_eneub6250 1-1.1:1.0: USB Mass Storage 
device detected
Apr 15 17:58:55 notebook2 kernel: scsi host6: usb-storage 1-1.1:1.0
Apr 15 17:58:55 notebook2 kernel: ------------[ cut here ]------------
Apr 15 17:58:55 notebook2 kernel: WARNING: CPU: 2 PID: 2133 at 
../drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x4ba/0x4f0 [usbcore]
Apr 15 17:58:55 notebook2 kernel: transfer buffer not dma capable
Apr 15 17:58:55 notebook2 kernel: Modules linked in: ums_eneub6250(+) uas 
usb_storage fuse binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep 
snd_pcm_oss msi_wmi iTCO_wdt iTCO_vendor_support snd_pcm wmi snd_seq battery ac 
msi_laptop sparse_keymap rfkill joydev snd_seq_device snd_timer r8169 mii 
snd_mixer_oss intel_powerclamp coretemp kvm_intel snd mei_me mei kvm i2c_i801 
lpc_ich soundcore intel_ips shpchp mfd_core i2c_smbus fjes acpi_cpufreq tpm_tis 
pcspkr thermal tpm_tis_core tpm irqbypass fan dm_crypt crc32c_intel serio_raw 
sr_mod cdrom ehci_pci i915 ehci_hcd i2c_algo_bit usbcore drm_kms_helper 
syscopyarea sysfillrect sysimgblt fb_sys_fops drm video button dm_mirror 
dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua
Apr 15 17:58:55 notebook2 kernel: CPU: 2 PID: 2133 Comm: systemd-udevd Not 
tainted 4.9.21-1-default #1
Apr 15 17:58:55 notebook2 kernel: Hardware name: Micro-Star International 
CR620/CR620, BIOS E1681IMS VER.10C 04/12/2011
Apr 15 17:58:55 notebook2 kernel:  baf681b477f0 af3c854a 
baf681b47840 
Apr 15 17:58:55 notebook2 kernel:  baf681b47830 af085c71 
0633af0bd0de 8d35b2844e40
Apr 15 17:58:55 notebook2 kernel:   0200 
0002 8d360fafd800
Apr 15 17:58:55 notebook2 kernel: Call Trace:
Apr 15 17:58:55 notebook2 kernel:  [] dump_stack+0x63/0x89
Apr 15 17:58:55 notebook2 kernel:  [] __warn+0xd1/0xf0
Apr 15 17:58:55 notebook2 kernel:  [] 
warn_slowpath_fmt+0x4f/0x60
Apr 15 17:58:55 notebook2 kernel:  [] ? 
put_prev_entity+0x48/0x720
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_hcd_map_urb_for_dma+0x4ba/0x4f0 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] ? 
finish_task_switch+0x78/0x1e0
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_hcd_submit_urb+0x1c9/0xb30 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] ? schedule+0x3d/0x90
Apr 15 17:58:55 notebook2 kernel:  [] ? 
schedule_timeout+0x220/0x3c0
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_submit_urb.part.6+0x295/0x550 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_submit_urb+0x34/0x70 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_stor_msg_common+0x9d/0x120 [usb_storage]
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_stor_bulk_transfer_buf+0x56/0xa0 [usb_storage]
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_stor_bulk_transfer_sg+0x4e/0x60 [usb_storage]
Apr 15 17:58:55 notebook2 kernel:  [] 
ene_send_scsi_cmd+0x97/0x160 [ums_eneub6250]
Apr 15 17:58:55 notebook2 kernel:  [] 
ene_get_card_type.constprop.19+0x5b/0x60 [ums_eneub6250]
Apr 15 17:58:55 notebook2 kernel:  [] 
ene_ub6250_probe+0x8f/0x110 [ums_eneub6250]
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_probe_interface+0x157/0x2f0 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] 
driver_probe_device+0x227/0x440
Apr 15 17:58:55 notebook2 kernel:  [] 
__driver_attach+0xdd/0xe0
Apr 15 17:58:55 notebook2 kernel:  [] ? 
driver_probe_device+0x440/0x440
Apr 15 17:58:55 notebook2 kernel:  [] 
bus_for_each_dev+0x5d/0x90
Apr 15 17:58:55 notebook2 kernel:  [] driver_attach+0x1e/0x20
Apr 15 17:58:55 notebook2 kernel:  [] 
bus_add_driver+0x45/0x270
Apr 15 17:58:55 notebook2 kernel:  [] 
driver_register+0x60/0xe0
Apr 15 17:58:55 notebook2 kernel:  [] 
usb_register_driver+0x82/0x150 [usbcore]
Apr 15 17:58:55 notebook2 kernel:  [] ? 0xc03b9000
Apr 15 17:58:55 notebook2 kernel:  [] 
ene_ub6250_driver_init+0x38/0x1000 

ata3.00: failed command: WRITE FPDMA QUEUED since Linux 4.1

2015-07-24 Thread Andreas Hartmann
Hello!

Since Linux 4.1, ATA errors like the following occur frequently:

[1.154572] libata version 3.00 loaded.
[1.787436] ahci :00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps
0x3f impl SATA mode
[1.788731] ata1: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff100 irq 19
[1.788733] ata2: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff180 irq 19
[1.788734] ata3: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff200 irq 19
[1.788736] ata4: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff280 irq 19
[1.788738] ata5: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff300 irq 19
[1.788740] ata6: SATA max UDMA/133 abar m1024@0xfdfff000 port
0xfdfff380 irq 19
[2.105906] ata4: SATA link down (SStatus 0 SControl 300)
[2.109960] ata6: SATA link down (SStatus 0 SControl 300)
[2.281699] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[2.281717] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[2.281737] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[2.281752] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[2.282446] ata2.00: ATA-8: ST3000DM001-1CH166, CC24, max UDMA/133
[2.282448] ata2.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth
31/32), AA
[2.282893] ata3.00: ATA-9: ST3000DM001-1CH166, CC29, max UDMA/133
[2.282895] ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth
31/32), AA
[2.283287] ata2.00: configured for UDMA/133
[2.283763] ata3.00: configured for UDMA/133
[2.289800] ata5.00: ATAPI: HL-DT-ST BD-RE  BH10LS38, 1.00, max UDMA/133
[2.293383] ata1.00: ATA-8: Corsair Force GT, 1.3.3, max UDMA/133
[2.293385] ata1.00: 468862128 sectors, multi 16: LBA48 NCQ (depth
31/32), AA
[2.293754] ata5.00: configured for UDMA/133
[2.303356] ata1.00: configured for UDMA/133
[2.303469] scsi 0:0:0:0: Direct-Access ATA  Corsair Force GT
3PQ: 0 ANSI: 5
[2.303760] scsi 1:0:0:0: Direct-Access ATA  ST3000DM001-1CH1
CC24 PQ: 0 ANSI: 5
[2.304055] scsi 2:0:0:0: Direct-Access ATA  ST3000DM001-1CH1
CC29 PQ: 0 ANSI: 5
[   48.689195] ata3.00: exception Emask 0x0 SAct 0x18 SErr 0x0 action
0x6 frozen
[   48.690421] ata3.00: failed command: WRITE FPDMA QUEUED
[   48.691597] ata3.00: cmd 61/58:18:21:ab:eb/05:00:58:00:00/40 tag 3
ncq 700416 out
[   48.693977] ata3.00: status: { DRDY }
[   48.695115] ata3.00: failed command: WRITE FPDMA QUEUED
[   48.696257] ata3.00: cmd 61/00:20:21:a3:eb/08:00:58:00:00/40 tag 4
ncq 1048576 out
[   48.698702] ata3.00: status: { DRDY }
[   48.699856] ata3: hard resetting link
[   49.188612] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   49.385330] ata3.00: configured for UDMA/133
[   49.385356] ata3.00: device reported invalid CHS sector 0
[   49.385380] ata3.00: device reported invalid CHS sector 0
[   49.385393] ata3: EH complete
[   79.630109] ata3.00: exception Emask 0x0 SAct 0xc0 SErr 0x0 action
0x6 frozen
[   79.631069] ata3.00: failed command: WRITE FPDMA QUEUED
[   79.632057] ata3.00: cmd 61/00:30:21:a3:eb/08:00:58:00:00/40 tag 6
ncq 1048576 out
[   79.634185] ata3.00: status: { DRDY }
[   79.635267] ata3.00: failed command: WRITE FPDMA QUEUED
[   79.636378] ata3.00: cmd 61/58:38:21:ab:eb/05:00:58:00:00/40 tag 7
ncq 700416 out
[   79.638743] ata3.00: status: { DRDY }
[   79.639935] ata3: hard resetting link
[   80.129527] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   80.145631] ata3.00: configured for UDMA/133
[   80.145661] ata3.00: device reported invalid CHS sector 0
[   80.145680] ata3.00: device reported invalid CHS sector 0
[   80.145693] ata3: EH complete
[  110.571021] ata3.00: exception Emask 0x0 SAct 0x1800 SErr 0x0 action
0x6 frozen
[  110.572263] ata3.00: failed command: WRITE FPDMA QUEUED
[  110.573505] ata3.00: cmd 61/58:58:21:ab:eb/05:00:58:00:00/40 tag 11
ncq 700416 out
[  110.576028] ata3.00: status: { DRDY }
[  110.577267] ata3.00: failed command: WRITE FPDMA QUEUED
[  110.578508] ata3.00: cmd 61/00:60:21:a3:eb/08:00:58:00:00/40 tag 12
ncq 1048576 out
[  110.580954] ata3.00: status: { DRDY }
[  110.582183] ata3: hard resetting link
[  111.070441] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  111.072173] ata3.00: configured for UDMA/133
[  111.072198] ata3.00: device reported invalid CHS sector 0
[  111.07] ata3.00: device reported invalid CHS sector 0
[  111.072235] ata3: EH complete
[  141.511934] ata3.00: NCQ disabled due to excessive errors
[  141.511943] ata3.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action
0x6 frozen
[  141.512904] ata3.00: failed command: WRITE FPDMA QUEUED
[  141.513894] ata3.00: cmd 61/00:80:21:a3:eb/08:00:58:00:00/40 tag 16
ncq 1048576 out
[  141.516018] ata3.00: status: { DRDY }
[  141.517106] ata3.00: failed command: WRITE FPDMA QUEUED
[  141.518224] ata3.00: cmd 61/58:88:21:ab:eb/05:00:58:00:00/40 tag 17
ncq 700416 out
[  141.520587] ata3.00: status: { DRDY }
[  141.521791] ata3: hard resetting link
[  142.011355] ata3: SATA link 
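
The "cmd" lines in the log above can be decoded by hand. As a rough aid, here is a small C sketch that does so, assuming the byte order libata uses when it prints a taskfile (command/feature:count:LBA bytes 0-2/high-order feature:count:LBA bytes 3-5/device). The struct and function names are made up for illustration; only the byte layout is taken from the log format.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields recovered from one libata "cmd ..." line (NCQ commands only). */
struct ncq_cmd {
	unsigned command;  /* ATA opcode; 0x61 is WRITE FPDMA QUEUED */
	unsigned tag;      /* NCQ tag, bits 7:3 of the count register */
	unsigned sectors;  /* transfer length in logical sectors */
	uint64_t lba;      /* 48-bit start LBA */
};

/*
 * Parse "cmd CC/FF:NN:L0:L1:L2/ff:nn:L3:L4:L5/DD" (all hex bytes).
 * Returns 0 on success, -1 if the text does not have that shape.
 */
int decode_ncq_cmd(const char *text, struct ncq_cmd *out)
{
	unsigned b[12];

	if (sscanf(text, "cmd %x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
		   &b[6], &b[7], &b[8], &b[9], &b[10], &b[11]) != 12)
		return -1;

	out->command = b[0];
	/* FPDMA commands carry the sector count in the FEATURE registers. */
	out->sectors = (b[6] << 8) | b[1];
	if (out->sectors == 0)
		out->sectors = 65536;  /* a count of 0 encodes 65536 sectors */
	out->tag = b[2] >> 3;
	out->lba = ((uint64_t)b[10] << 40) | ((uint64_t)b[9] << 32) |
		   ((uint64_t)b[8] << 24) | ((uint64_t)b[5] << 16) |
		   ((uint64_t)b[4] << 8) | (uint64_t)b[3];
	return 0;
}
```

Applied to the first failing command (61/58:18:21:ab:eb/05:00:58:00:00/40), this gives tag 3, 0x0558 = 1368 sectors (1368 * 512 = 700416 bytes, matching "ncq 700416 out") and a start LBA of 0x58ebab21. The second command (tag 4) starts at 0x58eba321 and covers 0x800 sectors, so the two failing writes are adjacent on disk.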

Since Linux 4.1: A lot of AMD-Vi IO_PAGE_FAULTs

2015-07-21 Thread Andreas Hartmann
Hello!

Since Linux 4.1, I'm getting a lot of IO_PAGE_FAULT events like this one

[   17.048609] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0
domain=0x0008 address=0x40ebaaab00618000 flags=0x0010]

with different addresses:

0x40ebaaab00618000
0x40ebaaab00618040
0x
0x0180
0x00c0
0x0080
0x0100
0x0040
0x0140
0x01c0
0x0200
0x0240
0x0280

...

device=00:11.0 is:

# lspci -vvs 00:11.0
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI
1.0])
Subsystem: Gigabyte Technology Co., Ltd Device b002
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at ff00 [size=8]
Region 1: I/O ports at fe00 [size=4]
Region 2: I/O ports at fd00 [size=8]
Region 3: I/O ports at fc00 [size=4]
Region 4: I/O ports at fb00 [size=16]
Region 5: Memory at fdfff000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [70] SATA HBA v1.0 InCfgSpace
Capabilities: [a4] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: ahci
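
When many such events are logged, it can help to pull the fields out mechanically, for example to count faults per device. The following C sketch parses one event line in the format shown above. The struct and function names are hypothetical; the field layout is taken only from the log line in this message, and other kernel versions may print it differently.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "AMD-Vi: Event logged [IO_PAGE_FAULT ...]" line. */
struct iommu_fault {
	unsigned bus, dev, fn;  /* PCI address of the faulting device */
	unsigned domain;        /* IOMMU protection domain ID */
	uint64_t address;       /* I/O virtual address that faulted */
	unsigned flags;         /* raw event flags */
};

/* Returns 0 on success, -1 if the line is not an IO_PAGE_FAULT event. */
int parse_io_page_fault(const char *line, struct iommu_fault *f)
{
	unsigned long long addr;
	const char *p = strstr(line, "IO_PAGE_FAULT device=");

	if (!p)
		return -1;
	/* Blanks in the format also match the line wrap added by the mailer. */
	if (sscanf(p, "IO_PAGE_FAULT device=%x:%x.%x domain=%x address=%llx flags=%x",
		   &f->bus, &f->dev, &f->fn, &f->domain, &addr, &f->flags) != 6)
		return -1;
	f->address = addr;
	return 0;
}
```

For the event quoted above this yields bus 0x00, device 0x11, function 0, domain 0x0008, which is the AHCI controller shown in the lspci output.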


Any idea what could be wrong?


Thanks,
kind regards,
Andreas Hartmann
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: f_op->read seems to be always NULL since Linux 4.1

2015-06-28 Thread Andreas Hartmann

On Sat, Jun 27, 2015 at 8:10 PM, Richard Weinberger wrote:
> On Sat, Jun 27, 2015 at 7:32 PM, Andreas Hartmann wrote:
>> [...]
>
> See __vfs_read().
> Your module must not rely on such internals.
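
For readers who have not looked at __vfs_read(): it first tries the file's read method and, if that is NULL, falls back to read_iter. The user-space C model below sketches that order of checks. It is not kernel code: all toy_* names are invented for illustration, and the read_iter signature is simplified (the real one takes a struct kiocb and a struct iov_iter). That the file in question only provides read_iter on 4.1 is an assumption consistent with what this thread reports.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

/* Toy counterpart of struct file_operations, reduced to two methods. */
struct toy_file_operations {
	ssize_t (*read)(struct toy_file *, char *, size_t, long *);
	ssize_t (*read_iter)(struct toy_file *, char *, size_t, long *);
};

struct toy_file {
	const struct toy_file_operations *f_op;
	const char *data;  /* file contents, NUL-terminated */
	long f_pos;
};

/* Copy up to len bytes starting at *pos and advance *pos. */
static ssize_t toy_read_iter(struct toy_file *f, char *buf, size_t len, long *pos)
{
	size_t size = strlen(f->data);
	size_t off = (size_t)*pos;

	if (off >= size)
		return 0;  /* end of file */
	if (len > size - off)
		len = size - off;
	memcpy(buf, f->data + off, len);
	*pos += (long)len;
	return (ssize_t)len;
}

/* A file type that provides read_iter only, so ->read is NULL. */
const struct toy_file_operations iter_only_fops = {
	.read = NULL,
	.read_iter = toy_read_iter,
};

/* Same order of checks as __vfs_read(): ->read, then ->read_iter. */
ssize_t toy_vfs_read(struct toy_file *f, char *buf, size_t len, long *pos)
{
	if (f->f_op->read)
		return f->f_op->read(f, buf, len, pos);
	if (f->f_op->read_iter)
		return f->f_op->read_iter(f, buf, len, pos);
	return -EINVAL;
}
```

A caller that tests f_op->read directly sees NULL for such a file and gives up, while a caller that goes through the dispatch function still gets its data. That matches the observation that vfs_read() keeps working.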


Thanks for the pointer to that function, which has existed since 3.19.

Is there a site that lists the relevant changes in each kernel version,
along with recommendations on how to handle them correctly?



Kind regards,
Andreas


f_op->read seems to be always NULL since Linux 4.1

2015-06-27 Thread Andreas Hartmann
Hello!

Given is a module like the following snippet, which runs fine with Linux 4.0
and an ext4 fs but doesn't work with Linux 4.1, because f->f_op->read is
no longer defined (= NULL). Is this the intended behavior now?

vfs_read(f, buf, 128, &f->f_pos) works fine.


module.c

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <asm/uaccess.h>

int init_module(void)
{
	struct file *f;
	char buf[128];
	mm_segment_t fs;
	int i;
	int len = 128;

	for (i = 0; i < len; i++)
		buf[i] = 0;

	printk(KERN_INFO "My module is loaded\n");

	f = filp_open("/etc/fedora-release", O_RDONLY, 0);
	/* filp_open() reports failure via ERR_PTR(), never NULL */
	if (IS_ERR(f)) {
		printk(KERN_ALERT "filp_open error!!.\n");
		return PTR_ERR(f);
	}

	/* allow the read method to take a kernel-space buffer */
	fs = get_fs();
	set_fs(get_ds());

	if (f->f_op->read) {
		/* read at most len - 1 bytes so buf stays NUL-terminated */
		f->f_op->read(f, buf, len - 1, &f->f_pos);
		printk(KERN_INFO "buf:%s\n", buf);
	} else {
		printk(KERN_INFO "No read method\n");
	}

	set_fs(fs);
	filp_close(f, NULL);
	return 0;
}

void cleanup_module(void)
{
	printk(KERN_INFO "My module is unloaded\n");
}
---

Makefile:
---
obj-m += module.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean




Regards,
Andreas


Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

2015-01-12 Thread Andreas Hartmann
Hello Alex!

Alex Williamson wrote:
> On Mon, 2015-01-12 at 16:20 +0100, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Thu, 2015-01-08 at 09:07 -0700, Bjorn Helgaas wrote:
>>>> On Fri, Nov 21, 2014 at 11:24:27AM -0700, Alex Williamson wrote:
>>>>> Reports against the TL-WDN4800 card indicate that PCI bus reset of
>>>>> this Atheros device cause system lock-ups and resets.  I've also
>>>>> been able to confirm this behavior on multiple systems.  The device
>>>>> never returns from reset and attempts to access config space of the
>>>>> device after reset result in hangs.  Blacklist bus reset for the
>>>>> device to avoid this issue.
>>>>>
>>>>> Reported-by: Andreas Hartmann 
>>>>> Signed-off-by: Alex Williamson 
>>>>> Tested-by: Andreas Hartmann 
>>>>
>>>> If I understand correctly, these two (patches 3 & 4) fix a v3.14 regression
>>>> caused by 425c1b223dac ("PCI: Add Virtual Channel to save/restore 
>>>> support").
>>>>
>>>> If so, these should go to for-linus for v3.19.  What about patches 1 & 2?
>>>> Do they fix a regression?  Is there a pointer to a bugzilla or problem
>>>> report about that issue?
>>>>
>>>> I don't understand the connection between 425c1b223dac and
>>>> PCI_DEV_FLAGS_NO_BUS_RESET, because 425c1b223dac doesn't seem to do any
>>>> resets.  Is that the wrong commit, or can you outline the connection for
>>>> me?
>>>
>>> TBH, I don't have a lot of faith in associating this to 425c1b223dac,
>>> I'm not sure how Andreas' bisect landed there. 
>>
>> Because removing this patch made it work again :-)
>>
>> See also:
>> http://thread.gmane.org/gmane.linux.kernel.pci/35170/focus=35984
>>
>> Kernel 2.10. and 2.12. and 2.13. did work fine for me. 2.14 is the first
>> kernel, which hangs the machine at startup of the VM. The userland
>> (qemu) didn't change in between.
> 
> s/2\./3\./

Thanks :-) It seems I don't like the number 3 :-)

> Ok, so what about VC save/restore (425c1b223dac) is the problem then?
> When we tried to determine that, you found that if we continue from the
> top of the save loop, everything works (ie. no VC state saved), but if
> you continue after the variable declaration of the same loop (ie. still
> no VC state saved), it breaks:
> 
> http://www.spinics.net/lists/linux-pci/msg36166.html
> 
> So, please forgive me if I don't have a whole lot of faith that
> 425c1b223dac is involved.

It's hard for me, too. Really. It's kind of a mystery.

> We also both independently determined that this particular device never
> recovers from a PCI bus reset, even when done from userspace with setpci
> and absolutely no save/restore wrappers.

Yes.

>  Config space on the device is
> never accessible after the reset.

Yes.

>  Therefore, how could any sort of bus
> reset with save/restore ever work for this device?

I can't say. What I can definitely say is that I never had problems
running VMs w/ qemu until 3.14 came out. Do you think I'm lying? I
used 3.10 and 3.12 for a long time w/o (known!) problems (3.12 only on
the first start of the VM). Otherwise I would have been here long ago :-))).

>> Therefore: from my point of view, it is a regression, because things
>> have been working < 2.14.
>>
>> Besides that: It is undisputed that there is a problem with resetting
>> this card. But the difference between >= 3.14 and < 3.14 is that < 3.14
>> worked nevertheless. The patch
>> 425c1b223dac456d00a61fd6b451b6d1cf00d065 obviously changed something
>> that I can't pin down. Therefore, the quirk patch is
>> definitely required, because things work completely fine again w/ this
>> patch.
>>
>> "Working" means for me here: I was able to start (and use) the VM w/o
>> crashing the machine and this isn't possible w/ unpatched 2.14+ any
>> more. Yes, w/ 2.12, I wasn't able to restart the VM (it then crashed the
>> machine), but w/ 2.10 even this was possible.
> 
> What?!  So v3.12 still had a machine crash when assigning this device.

Yes, if you *re*start the VM (for a long time I didn't know that at all;
I only discovered it during testing while analyzing the problem :-)).
The first start (after reboot) was not a problem, and that was the usual
use case here :-)).

Believe me, I'm really convinced that this card does have a problem with
resets. I'm just wondering why it worked for me until 3.13. That's all.

> The vfio hot reset interface was added i

Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

2015-01-12 Thread Andreas Hartmann
Alex Williamson wrote:
> On Thu, 2015-01-08 at 09:07 -0700, Bjorn Helgaas wrote:
>> On Fri, Nov 21, 2014 at 11:24:27AM -0700, Alex Williamson wrote:
>>> Reports against the TL-WDN4800 card indicate that PCI bus reset of
>>> this Atheros device cause system lock-ups and resets.  I've also
>>> been able to confirm this behavior on multiple systems.  The device
>>> never returns from reset and attempts to access config space of the
>>> device after reset result in hangs.  Blacklist bus reset for the
>>> device to avoid this issue.
>>>
>>> Reported-by: Andreas Hartmann 
>>> Signed-off-by: Alex Williamson 
>>> Tested-by: Andreas Hartmann 
>>
>> If I understand correctly, these two (patches 3 & 4) fix a v3.14 regression
>> caused by 425c1b223dac ("PCI: Add Virtual Channel to save/restore support").
>>
>> If so, these should go to for-linus for v3.19.  What about patches 1 & 2?
>> Do they fix a regression?  Is there a pointer to a bugzilla or problem
>> report about that issue?
>>
>> I don't understand the connection between 425c1b223dac and
>> PCI_DEV_FLAGS_NO_BUS_RESET, because 425c1b223dac doesn't seem to do any
>> resets.  Is that the wrong commit, or can you outline the connection for
>> me?
> 
> TBH, I don't have a lot of faith in associating this to 425c1b223dac,
> I'm not sure how Andreas' bisect landed there. 

Because removing this patch made it work again :-)

See also:
http://thread.gmane.org/gmane.linux.kernel.pci/35170/focus=35984

Kernel 2.10. and 2.12. and 2.13. did work fine for me. 2.14 is the first
kernel, which hangs the machine at startup of the VM. The userland
(qemu) didn't change in between.

Therefore: from my point of view, it is a regression, because things
have been working < 2.14.

Besides that: It is undisputed that there is a problem with resetting
this card. But the difference between >= 3.14 and < 3.14 is that < 3.14
worked nevertheless. The patch
425c1b223dac456d00a61fd6b451b6d1cf00d065 obviously changed something
that I can't pin down. Therefore, the quirk patch is
definitely required, because things work completely fine again w/ this
patch.

"Working" means for me here: I was able to start (and use) the VM w/o
crashing the machine and this isn't possible w/ unpatched 2.14+ any
more. Yes, w/ 2.12, I wasn't able to restart the VM (it then crashed the
machine), but w/ 2.10 even this was possible.


> IME, this device cannot,
> and has never been able to handle a bus reset.  A simple setpci
> experiment on the commandline can confirm this.  What I think happened
> is that with the PCI bus reset infrastructure we added, we switched QEMU
> to prefer PCI bus resets over things like PM D3hot->D0 resets.  So it's
> just more prolific use of bus resets by userspace.
> 
> There's also no regression in 1 & 2, PM reset has never done anything
> useful on those devices.  Thanks,
> 
> Alex
> 
>>> ---
>>>
>>>  drivers/pci/quirks.c |   14 ++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 561e10d..ebbd5b4 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>>>  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
>>>PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>>>  
>>> +static void quirk_no_bus_reset(struct pci_dev *dev)
>>> +{
>>> +   dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
>>> +}
>>> +
>>> +/*
>>> + * Atheros AR93xx chips do not behave after a bus reset.  The device will
>>> + * throw a Link Down error on AER capable system and regardless of AER,
>>> + * config space of the device is never accessible again and typically
>>> + * causes the system to hang or reset when access is attempted.
>>> + * http://www.spinics.net/lists/linux-pci/msg34797.html
>>> + */
>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
>>> +
>>>  #ifdef CONFIG_ACPI
>>>  /*
>>>   * Apple: Shutdown Cactus Ridge Thunderbolt controller.
>>>
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
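
To make the mechanism of the quoted patch easier to follow, here is a user-space C model of it: a fixup table matches a device by vendor and device ID and sets a flag, and the reset path refuses a bus reset when that flag is set. All toy_* names and the flag's bit position are invented for illustration. The only values taken from the real world are the device ID 0x0030 from the patch and 0x168c, the PCI vendor ID registered to Atheros. Returning -ENOTTY for "this reset method is not available" is an assumption about how the reset path reacts to the flag.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VENDOR_ID_ATHEROS  0x168c
#define TOY_FLAG_NO_BUS_RESET  (1u << 0)  /* illustrative bit position */

struct toy_pci_dev {
	uint16_t vendor, device;
	unsigned dev_flags;
};

struct toy_fixup {
	uint16_t vendor, device;
	void (*hook)(struct toy_pci_dev *);
};

/* Counterpart of quirk_no_bus_reset() from the patch. */
static void toy_quirk_no_bus_reset(struct toy_pci_dev *dev)
{
	dev->dev_flags |= TOY_FLAG_NO_BUS_RESET;
}

/* Stand-in for the table that DECLARE_PCI_FIXUP_HEADER() builds. */
static const struct toy_fixup toy_fixups[] = {
	{ TOY_VENDOR_ID_ATHEROS, 0x0030, toy_quirk_no_bus_reset },
};

/* Run every fixup whose IDs match the device, as done at enumeration. */
void toy_apply_fixups(struct toy_pci_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(toy_fixups) / sizeof(toy_fixups[0]); i++)
		if (toy_fixups[i].vendor == dev->vendor &&
		    toy_fixups[i].device == dev->device)
			toy_fixups[i].hook(dev);
}

/* Returns 0 if a bus reset would be issued, -ENOTTY if it is refused. */
int toy_bus_reset(const struct toy_pci_dev *dev)
{
	if (dev->dev_flags & TOY_FLAG_NO_BUS_RESET)
		return -ENOTTY;
	return 0;  /* a real implementation would toggle the bridge here */
}
```

With the flag set, a caller can fall back to another reset method or skip the reset, instead of issuing one from which the card never returns.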



Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

2015-01-12 Thread Andreas Hartmann
Hello Alex!

Alex Williamson wrote:
> On Mon, 2015-01-12 at 16:20 +0100, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Thu, 2015-01-08 at 09:07 -0700, Bjorn Helgaas wrote:
>>>> On Fri, Nov 21, 2014 at 11:24:27AM -0700, Alex Williamson wrote:
>>>>> Reports against the TL-WDN4800 card indicate that PCI bus reset of
>>>>> this Atheros device cause system lock-ups and resets.  I've also
>>>>> been able to confirm this behavior on multiple systems.  The device
>>>>> never returns from reset and attempts to access config space of the
>>>>> device after reset result in hangs.  Blacklist bus reset for the
>>>>> device to avoid this issue.
>>>>>
>>>>> Reported-by: Andreas Hartmann <andihartm...@freenet.de>
>>>>> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
>>>>> Tested-by: Andreas Hartmann <andihartm...@freenet.de>
>>>>
>>>> If I understand correctly, these two (patches 3 & 4) fix a v3.14 regression
>>>> caused by 425c1b223dac ("PCI: Add Virtual Channel to save/restore
>>>> support").
>>>>
>>>> If so, these should go to for-linus for v3.19.  What about patches 1 & 2?
>>>> Do they fix a regression?  Is there a pointer to a bugzilla or problem
>>>> report about that issue?
>>>>
>>>> I don't understand the connection between 425c1b223dac and
>>>> PCI_DEV_FLAGS_NO_BUS_RESET, because 425c1b223dac doesn't seem to do any
>>>> resets.  Is that the wrong commit, or can you outline the connection for
>>>> me?
>>>
>>> TBH, I don't have a lot of faith in associating this to 425c1b223dac,
>>> I'm not sure how Andreas' bisect landed there.
>>
>> Because removing this patch made it working again :-)
>>
>> And too:
>> http://thread.gmane.org/gmane.linux.kernel.pci/35170/focus=35984
>>
>> Kernel 2.10. and 2.12. and 2.13. did work fine for me. 2.14 is the first
>> kernel, which hangs the machine at startup of the VM. The userland
>> (qemu) didn't change in between.
>
> s/2\./3\./

Thanks :-) It seems I don't like the number 3 :-)

> Ok, so what about VC save/restore (425c1b223dac) is the problem then?
> When we tried to determine that, you found that if we continue from the
> top of the save loop, everything works (ie. no VC state saved), but if
> you continue after the variable declaration of the same loop (ie. still
> no VC state saved), it breaks:
>
> http://www.spinics.net/lists/linux-pci/msg36166.html
>
> So, please forgive me if I don't have a whole lot of faith that
> 425c1b223dac is involved.

It's hard for me, too. Really. It's a bit of a mystery.

> We also both independently determined that this particular device never
> recovers from a PCI bus reset, even when done from userspace with setpci
> and absolutely no save/restore wrappers.

Yes.

> Config space on the device is
> never accessible after the reset.

Yes.

> Therefore, how could any sort of bus
> reset with save/restore ever work for this device?

I can't say. What I definitely can say, is that I never had problems
with running VMs w/ qemu until 3.14 came up. Do you think I'm lying? I
used 3.10. and 3.12. for long time w/o (known!) problems (3.12 only on
first start of VM). Otherwise I would have been here long time before :-))).

>> Therefore: from my point of view, it is a regression, because things
>> have been working < 2.14.
>>
>> Besides that: It is undoubted, that there is a problem with resetting
>> this card. But the difference between >= 3.14 and < 3.14 is, that < 3.14
>> has been working nevertheless. The patch
>> 425c1b223dac456d00a61fd6b451b6d1cf00d065 obviously changed something
>> which I can't say and I don't know off. Therefore, the quirk-patch is
>> definitely required, because things work completely fine again w/ this
>> patch.
>>
>> Working means for me here: I was able to start (and use) the VM w/o
>> crashing the machine and this isn't possible w/ unpatched 2.14+ any
>> more. Yes, w/ 2.12, I wasn't able to restart the VM (it then crashed the
>> machine), but w/ 2.10 even this was possible.
>
> What?!  So v3.12 still had a machine crash when assigning this device.

Yes. If you *re*start the VM (for a long time, I didn't know that at all
- I just discovered it during testing while analyzing the problem :-)).
The first start (after reboot) was not a problem. This was the usual use
case here :-)).

Believe me, I'm really convinced that this card does have a problem with
resets. I'm just wondering why it had worked for me until 3.13. That's all.

> The vfio hot reset interface was added in v3.12, so v3.10 didn't have
> any way to do a reset other than what pci_reset_function() decided to
> do.  That all seems to associate the machine crash to the ability to do
> a bus reset on the device.  I'm not sure why the behavior changed
> between v3.14 and v3.12 (maybe the try-reset addition), but there's some
> sort of pre-existing issue before we even got to 425c1b223dac.

Most probably.

> I'm perfectly happy tagging this for stable,

Thanks!! I'm really very comfortable with your patch and your support!
Really! Thanks a lot! It's just odd for me, why it partly worked (first
start of VM worked) w/ 3.12 and 3.13 and 3.14 suddenly no more at all.

You have been accidentally the sufferer - most probably it could have
hit any other

Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"

2015-01-01 Thread Andreas Hartmann
Arend van Spriel wrote:
> On 12/31/14 16:14, Andreas Hartmann wrote:
[...]
>> All in all:
>> If you want to get rid of wext, you still have to go a *very* long way
>> to get the same *stable* and high throughput quality with *all* chips
>> depending on mac80211 and not just a few flagship drivers like Atheros.
> 
> Hi Andreas,
> 
> That's a nice list of unrelated stuff. This has all nothing to do with
> WEXT. Actually, you can build rt5572sta with cfg80211 support
> (RT_CFG80211_SUPPORT).

You seem to know sources I don't know of. Could you please tell me,
where to find them?

I have DPO_RT5572_LinuxSTA_2.6.0.1_20120629 which doesn't compile with
HAS_CFG80211_SUPPORT=y because -DCONFIG_AP_SUPPORT, on which
RT_CFG80211_SUPPORT relies, is broken.

DPO_RT5572_LinuxSTA_2.6.1.3_20121022 removed the necessary broken AP
code completely.

> This thread is about the configuration API and
> not about driver performance.

I know.

I tried to show, why WEXT as a whole is still necessary even if there is
a mac80211 based driver, because of the weakness of rt2800usb:
Nip it in the bud.



Kind regards,
Andreas


Re: [PATCH] Revert "cfg80211: make WEXT compatibility unselectable"

2014-12-31 Thread Andreas Hartmann
Jiri Kosina wrote:
> On Wed, 31 Dec 2014, Arend van Spriel wrote:
> 
>> The thing with WEXT is that it will stay as is. So if tools like wicd 
>> want to support new features like P2P it will need to make the switch. I 
>> checked out wicd repo and found a number of iwconfig calls and they kick 
>> off wpa_supplicant with wext driver.
> 
> Unfortunately this is by no means just about wicd. I have already received 
> a few off-list mails from people who were wondering why their home-made 
> scripts / tools, which are running 'iwconfig' directly suddenly stopped to 
> work, and that it was indeed fallout of WEXT going away. Given the very 
> short time this has been in mainline, you can probably imagine the 
> fireworks once this appears in major release.

It is not just the userspace tools (I prefer them, too) which need
wext, but a lot of drivers, too, e.g. the MediaTek drivers, which
perform *much* better compared to rt2x00, especially concerning USB
chips like the one used by Linksys AE3000 (3x3 MIMO)
(https://wikidevi.com/wiki/Linksys_AE3000), which achieves *average*
throughputs around 14 MB/s with scp of big (> 10 GB) encrypted
files even through a reinforced-concrete floor(!) - rt2x00 is *far* away
from providing such performance.

Next bad point of rt2x00 e.g. is the huge CPU overhead - compare
rt5572sta on Raspi with rt2x00 running netperf and you will see the huge
problem of rt2x00 (which is covered on x86 by mostly oversized multi
core CPUs).

Another big advantage of rt5572sta is: it is *stable* over a lot of
kernel versions (as long as the kernel didn't break interfaces - but
there are patches to catch them).

Even ath9k, which usually is a really fine driver, is broken on some
kernel versions (link and throughput is not stable - my use case depends
*heavily* on very high and longterm stable throughput). That's why I'm
using a VM for my ath9k-device to be independent of these quality
problems of mac80211 (or maybe ath9k - don't know) over different kernel
versions.


All in all:
If you want to get rid of wext, you still have to go a *very* long way
to get the same *stable* and high throughput quality with *all* chips
depending on mac80211 and not just a few flagship drivers like Atheros.



Kind regards,
Andreas Hartmann


Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

2014-12-26 Thread Andreas Hartmann
Hello Bjorn,

I have been running this patch and the corresponding "[PATCH 3/4] PCI: Allow
device quirks to exclude bus reset" patch for a month now w/
kernel 3.14.x and couldn't find any problem. Would it be possible to
apply these patches to the mainline kernel? Or even to the longterm kernel 3.14?


Thanks.
kind regards,
Andreas Hartmann


Alex Williamson wrote:
> Reports against the TL-WDN4800 card indicate that PCI bus reset of
> this Atheros device cause system lock-ups and resets.  I've also
> been able to confirm this behavior on multiple systems.  The device
> never returns from reset and attempts to access config space of the
> device after reset result in hangs.  Blacklist bus reset for the
> device to avoid this issue.
> 
> Reported-by: Andreas Hartmann <andihartm...@freenet.de>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> Tested-by: Andreas Hartmann <andihartm...@freenet.de>
> ---
> 
>  drivers/pci/quirks.c |   14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 561e10d..ebbd5b4 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
>  PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>  
> +static void quirk_no_bus_reset(struct pci_dev *dev)
> +{
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> +}
> +
> +/*
> + * Atheros AR93xx chips do not behave after a bus reset.  The device will
> + * throw a Link Down error on AER capable system and regardless of AER,
> + * config space of the device is never accessible again and typically
> + * causes the system to hang or reset when access is attempted.
> + * http://www.spinics.net/lists/linux-pci/msg34797.html
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
> +
>  #ifdef CONFIG_ACPI
>  /*
>   * Apple: Shutdown Cactus Ridge Thunderbolt controller.
> 
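
As a rough illustration of what the quirk above keys on: the fixup is registered for one vendor/device pair, so it only fires for devices reporting exactly that ID. The sketch below checks the same pair from userspace via the sysfs "vendor" and "device" files. It assumes PCI_VENDOR_ID_ATHEROS is 0x168c; the function name and the PCI_SYSFS override are made up for this example and are not part of the patch.

```shell
#!/bin/sh
# Sketch only: does the PCI device at ADDR report the ID pair that the
# quirk in the patch is registered for?
#   0x168c  assumed value of PCI_VENDOR_ID_ATHEROS
#   0x0030  device ID taken from the DECLARE_PCI_FIXUP_HEADER line
# PCI_SYSFS is a hypothetical override of /sys/bus/pci for dry runs.
# Returns 0 on a match, 1 on no match, 2 if the device cannot be read.
matches_no_bus_reset_quirk() {
    addr=$1
    pci=${PCI_SYSFS:-/sys/bus/pci}
    vendor=$(cat "$pci/devices/$addr/vendor" 2>/dev/null) || return 2
    device=$(cat "$pci/devices/$addr/device" 2>/dev/null) || return 2
    [ "$vendor" = "0x168c" ] && [ "$device" = "0x0030" ]
}
```

Called with the full PCI address of the wireless card, a zero exit status would mean the device is one the kernel refuses to bus-reset once this patch is applied.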



Re: Strange problem with vxlan!

2014-01-08 Thread Andreas Hartmann
Hi!

For all others, having problems w/ broken multicast:

See the solution here:
http://article.gmane.org/gmane.linux.kernel/1625590


Regards,
Andreas


Out of the box (ootb) multicast broken since Linux 3.5 until min. 3.12.

2014-01-08 Thread Andreas Hartmann
Hello!

This patch:

commit c5c23260594c5701af66ef754916775ba6a46bbc
Author: Herbert Xu <herb...@gondor.apana.org.au>
Date:   Fri Apr 13 02:37:42 2012 +

    bridge: Add multicast_querier toggle and disable queries by default

    Sending general queries was implemented as an optimisation to speed
    up convergence on start-up.  In order to prevent interference with
    multicast routers a zero source address has to be used.

    Unfortunately these packets appear to cause some multicast-aware
    switches to misbehave, e.g., by disrupting multicast packets to us.

    Since the multicast snooping feature still functions without sending
    our own queries, this patch will change the default to not send
    queries.

    For those that need queries in order to speed up convergence on
    start-up, a toggle is provided to restore the previous behaviour.

    Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
    Signed-off-by: David S. Miller <da...@davemloft.net>


incompatibly broke ootb multicast in Linux until 3.12 or even higher
(didn't test) for this use case:

http://thread.gmane.org/gmane.linux.kernel/1622550


It is necessary to manually add this switch

echo "1" > /sys/devices/virtual/net/br0/bridge/multicast_querier

to get multicast working again.
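
For anyone who wants to script the switch above, here is a minimal sketch. It wraps the same sysfs write in a function that takes the bridge name and reads the value back. The function name and the SYSFS_ROOT override are assumptions made for this example; only the path under /sys comes from the command above.

```shell
#!/bin/sh
# Sketch only: turn the bridge's own multicast querier back on.
# SYSFS_ROOT is a hypothetical override (default /sys) so the function
# can be tried against a scratch directory instead of the live system.
enable_multicast_querier() {
    bridge=$1
    root=${SYSFS_ROOT:-/sys}
    knob="$root/devices/virtual/net/$bridge/bridge/multicast_querier"
    if [ ! -e "$knob" ]; then
        echo "no multicast_querier knob for $bridge at $knob" >&2
        return 1
    fi
    echo 1 > "$knob" || return 1
    # Read the value back so the caller can see what is now set.
    cat "$knob"
}
```

Run as root, "enable_multicast_querier br0" would correspond to the echo command above. The setting does not survive a reboot or re-creation of the bridge, so it would have to be repeated from whatever script sets up br0.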


Would be nice to get the old behaviour (= working multicast ootb) back
again. This would have saved a lot of time, probably not only in my case
here (e.g. see https://bugzilla.redhat.com/show_bug.cgi?id=880035).


Regards,
Andreas


Re: Strange problem with vxlan!

2014-01-05 Thread Andreas Hartmann
On Fri, 3 Jan 2014 15:27:19 +0100
Andreas Hartmann <andihartm...@01019freenet.de> wrote:

[...]

> Now the problem:
> 
> If the VM (=AP) runs e.g. Linux 3.4.x, all is working fine as expected. 
> If the VM runs 3.12.x or even 3.10.x, the tunnel works fine a few minutes 
> after creation. Afterwards it is broken.
> 
> Broken means:
> A "dhcpcd eth0" e.g. on the notebook times out, doesn't work any more. Traces 
> show:
> The udp-tunnel-packages sent by the STA through vxlan0 can be seen on the 
> host / tap0, but they can't be seen on vxlan0 (if it works, they can be seen 
> on the vxlan0 device, too).
> 
> On the host runs Linux 3.10.x, on the STA 3.11.6.

Some more findings:

- Problem can be seen with Linux 3.7 in the AP (VM), too.
- *Problem disappears* if the bridge device br0 on the host is set to
  promiscuous mode.
- Sometimes, there can be seen the warning 
  "notebook dhcpcd[2784]: eth0: bad UDP checksum, ignoring" 
  when starting dhcpcd on the notebook with br0 / host set to promiscuous
  mode (nevertheless dhcpcd worked fine). I never saw this warning
  before.


Any idea how to fix the problem w/o running the bridge br0 on the host
in promiscuous mode?



Thanks for any hint,
Andreas


Strange problem with vxlan!

2014-01-03 Thread Andreas Hartmann
Given is the following network architecture: connection of a virtual bridge br0 
and a remote ethernet-switch through vxlan tunnel via WLAN:



host                    [br0: tap0, vxlan0]
                               |      ||
                               |      ====================
                               |                        ||
VM (WLAN access point)  [br0: eth0, wlan0]              ||
                                      |                 ||
                                      -                 ||
                                      |                 ||
STA                                [wlan0, br0: eth0, vxlan0]
                                                 |
                                              Switch
                                                 |
notebook                                      [eth0]



The configuration of the vxlan is:

host: route add -net 224.0.0.0 netmask 240.0.0.0 dev br0
      ip li add vxlan0 type vxlan id 1 group 239.1.1.1 dev br0

STA:  route add -net 224.0.0.0 netmask 240.0.0.0 dev wlan0
      ip li add vxlan0 type vxlan id 1 group 239.1.1.1 dev wlan0

This means: the endpoints of the vxlan tunnel are br0 (host) and STA (wlan0). 
Between them, there is the WLAN AP (a VM belonging to the host).
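
The two endpoint configurations above differ only in the underlying device. A small sketch that makes this explicit: the function below prints (it does not run) the two commands for one endpoint. The function name and the optional VNI and group arguments are assumptions for this example; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch only: print the two commands that configure one VXLAN endpoint.
# $1 = underlying device (br0 on the host, wlan0 on the STA)
# $2 = VXLAN id, defaults to 1 as in the setup above
# $3 = multicast group, defaults to 239.1.1.1 as in the setup above
vxlan_endpoint_cmds() {
    dev=$1
    vni=${2:-1}
    group=${3:-239.1.1.1}
    echo "route add -net 224.0.0.0 netmask 240.0.0.0 dev $dev"
    echo "ip li add vxlan0 type vxlan id $vni group $group dev $dev"
}
```

"vxlan_endpoint_cmds br0" reproduces the host lines and "vxlan_endpoint_cmds wlan0" the STA lines. Printing instead of executing keeps the sketch harmless; the output can be reviewed and then run by hand as root.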


Now the problem:

If the VM (=AP) runs e.g. Linux 3.4.x, all is working fine as expected. 
If the VM runs 3.12.x or even 3.10.x, the tunnel works fine a few minutes after 
creation. Afterwards it is broken.

Broken means:
A "dhcpcd eth0" e.g. on the notebook times out, doesn't work any more. Traces 
show:
The udp-tunnel-packages sent by the STA through vxlan0 can be seen on the host 
/ tap0, but they can't be seen on vxlan0 (if it works, they can be seen on the 
vxlan0 device, too).

On the host runs Linux 3.10.x, on the STA 3.11.6.


Any idea why vxlan is broken w/ Linux 3.12.x or 3.10.x on the VM (AP)?



Thanks in advance for any hint,
regards,
Andreas


Re: [ 102/127] iommu/amd: Workaround for ERBT1312

2013-06-29 Thread Andreas Hartmann

Joerg Roedel wrote:
> On Sat, Jun 29, 2013 at 07:54:20AM +0200, Andreas Hartmann wrote:
>> Sorry, but it doesn't work for me at all :-(. Behaviour is unchanged. It
>> is exactly as described in the other mail: at the moment of binding vfio
>> to 14.0, the fire begins.
>
> Hmm, VFIO attaches the device to a new domain. That clears the bit, how
> about this patch:

Didn't help either :-(


Regards,
Andreas


Re: [ 102/127] iommu/amd: Workaround for ERBT1312

2013-06-28 Thread Andreas Hartmann
Joerg Roedel wrote:
> Alex, Andreas,
> 
> On Fri, Jun 28, 2013 at 08:42:05PM +0200, Andreas Hartmann wrote:
>> You're right, there is exactly one entry directly after loading of vfio.
>> I can see this message, too, with linux 3.4.43.
> 
> Can you please test this patch? It should reduce the noise
> significantly, but a few of those error messages are still expected.

Sorry, but it doesn't work for me at all :-(. Behaviour is unchanged. It
is exactly as described in the other mail: at the moment of binding vfio
to 14.0, the fire begins.

echo "1002 4385" > /sys/bus/pci/drivers/vfio-pci/new_id
echo :00:14.0 > /sys/bus/pci/devices/:00:14.0/driver/unbind
echo :00:14.0 > /sys/bus/pci/drivers/vfio-pci/bind


Regards,
Andreas


Re: [ 102/127] iommu/amd: Workaround for ERBT1312

2013-06-28 Thread Andreas Hartmann
Alex Williamson wrote:
> On Fri, 2013-06-28 at 18:11 +0200, Andreas Hartmann wrote:
>> Hello Joerg, hello Alex,
>>
>> the subsequent patch and the patch "iommu/amd: Re-enable IOMMU event log
>> interrupt after handling." 925fe08bce38d1ff052fe2209b9e2b8d5fbb7f98
>> spread /var/log/messages with the following line (> 700 lines/second)
>> right after loading vfio:
>>
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.0 domain=0x 
>> address=0x00fdf9103300 flags=0x0600]
> 
> That's interesting, I PXE boot my system from one NIC then use a
> different NIC for the iSCSI root.  The PXE boot NIC now screams like
> this, _until_ I attach it to vfio, then it quiets down.

Hmm, I just remembered a workaround I had implemented to "resolve"
an error like the following. It appeared when starting my VM with my Intel
PCI ethernet device passed through, after I had moved to a new kvm version:


qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to set iommu for
container: Device or resource busy

qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to setup container
for group 12

qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to get group 12

qemu-kvm: -device vfio-pci,host=06:06.0: Device 'vfio-pci' could not be
initialized


The workaround was to bind each of the multifunction devices to vfio once
during boot, release them again after 2 seconds, and rebind them to the
drivers they were originally bound to (if they were bound to any).

I did this with a script beginning like this:

#!/bin/sh
modprobe vfio-pci

echo "1002 4385" > /sys/bus/pci/drivers/vfio-pci/new_id
echo :00:14.0 > /sys/bus/pci/devices/:00:14.0/driver/unbind
echo :00:14.0 > /sys/bus/pci/drivers/vfio-pci/bind
...

sleep 2

echo :00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo "1002 4385" > /sys/bus/pci/drivers/vfio-pci/remove_id
...

The logs in messages:

Jun 28 15:54:12 . kernel: [   48.860147] VFIO - User Level meta-driver version: 
0.3
Jun 28 15:54:12 . kernel: [   48.875243] AMD-Vi: Event logged [IO_PAGE_FAULT 
device=00:14.0 domain=0x address=0x00fdf9103300 flags=0x0600]
...

Therefore, the log output most probably started when device 14.0 was
bound to vfio: the two timestamps above are only about 15 ms apart. Had it
started only after the unbind from vfio, I would have expected a gap of
2 seconds between the vfio start message and the first occurrence of the
IO_PAGE_FAULT.

Today, I'm using kvm 1.3.1 and it isn't necessary to use the complete
workaround anymore. It is enough to bind / unbind the pci bridge
as described above before starting the VM with the passed through pci
ethernet device.
Because I now don't touch the 14.0 device any more, the IO_PAGE_FAULT
messages disappeared completely.

@Joerg:
Anyway, I'm going to test your provided patch tomorrow!

BTW: what does IO_PAGE_FAULT mean, and what do I have to expect if I
see this message?



Thanks,
regards,
Andreas


Re: [ 102/127] iommu/amd: Workaround for ERBT1312

2013-06-28 Thread Andreas Hartmann
Hello Joerg,

Joerg Roedel wrote:
> Hi Andreas,
> 
> On Fri, Jun 28, 2013 at 06:11:36PM +0200, Andreas Hartmann wrote:
>> Hello Joerg, hello Alex,
>>
>> the subsequent patch and the patch "iommu/amd: Re-enable IOMMU event log
>> interrupt after handling." 925fe08bce38d1ff052fe2209b9e2b8d5fbb7f98
>> spread /var/log/messages with the following line (> 700 lines/second)
>> right after loading vfio:
>>
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.0 domain=0x 
>> address=0x00fdf9103300 flags=0x0600]
>>
>> lspci -vvvs 0:14.0
>> 00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller 
>> (rev 42)
>> Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
>> Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> 
> Most likely a BIOS issue that is uncovered by re-enabling the event-log
> interrupt patch. The device itself is only used by the BIOS and not by
> the Linux kernel

Thanks for this info! Good to know.

[...]

>> I removed the two mentioned patches and all is working
>> fine again as before.
> 
> Without these two patches, can you check dmesg after boot if there are
> other lines which report IO_PAGE_FAULTs?

You're right, there is exactly one entry directly after loading of vfio.
I can see this message, too, with linux 3.4.43.


Regards,
Andreas


Re: [ 102/127] iommu/amd: Workaround for ERBT1312

2013-06-28 Thread Andreas Hartmann
Hello Joerg, hello Alex,

the subsequent patch and the patch "iommu/amd: Re-enable IOMMU event log
interrupt after handling." 925fe08bce38d1ff052fe2209b9e2b8d5fbb7f98
spread /var/log/messages with the following line (> 700 lines/second)
right after loading vfio:

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.0 domain=0x 
address=0x00fdf9103300 flags=0x0600]

lspci -vvvs 0:14.0
00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 
42)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
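Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-


Besides the enormous log pollution I couldn't see any malfunction at all.
At first, I didn't notice it at all (the SSD was fast enough to
cover it silently). I only saw it when I rebooted, because X didn't start
any more: the /var partition was completely full.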

I removed the two mentioned patches and all is working
fine again as before.

Any idea?


Thanks,
kind regards,
Andreas


Greg Kroah-Hartman wrote:
> 3.9-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Joerg Roedel <j...@8bytes.org>
> 
> commit d3263bc29706e42f74d8800807c2dedf320d77f1 upstream.
> 
> Work around an IOMMU hardware bug where clearing the
> EVT_INT or PPR_INT bit in the status register may race with
> the hardware trying to set it again. When not handled the
> bit might not be cleared and we lose all future event or ppr
> interrupts.
> 
> Reported-by: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
> Signed-off-by: Joerg Roedel <j...@8bytes.org>
> Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> 
> ---
>  drivers/iommu/amd_iommu.c |   34 ++
>  1 file changed, 26 insertions(+), 8 deletions(-)
> 
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -700,14 +700,23 @@ retry:
>  
>  static void iommu_poll_events(struct amd_iommu *iommu)
>  {
> - u32 head, tail;
> + u32 head, tail, status;
>   unsigned long flags;
>  
> - /* enable event interrupts again */
> - writel(MMIO_STATUS_EVT_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
> -
>   spin_lock_irqsave(&iommu->lock, flags);
>  
> + /* enable event interrupts again */
> + do {
> + /*
> +  * Workaround for Erratum ERBT1312
> +  * Clearing the EVT_INT bit may race in the hardware, so read
> +  * it again and make sure it was really cleared
> +  */
> + status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
> + writel(MMIO_STATUS_EVT_INT_MASK,
> +iommu->mmio_base + MMIO_STATUS_OFFSET);
> + } while (status & MMIO_STATUS_EVT_INT_MASK);
> +
>   head = readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
>   tail = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET);
>  
> @@ -744,16 +753,25 @@ static void iommu_handle_ppr_entry(struc
>  static void iommu_poll_ppr_log(struct amd_iommu *iommu)
>  {
>   unsigned long flags;
> - u32 head, tail;
> + u32 head, tail, status;
>  
>   if (iommu->ppr_log == NULL)
>   return;
>  
> - /* enable ppr interrupts again */
> - writel(MMIO_STATUS_PPR_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
> -
>   spin_lock_irqsave(&iommu->lock, flags);
>  
> + /* enable ppr interrupts again */
> + do {
> + /*
> +  * Workaround for Erratum ERBT1312
> +  * Clearing the PPR_INT bit may race in the hardware, so read
> +  * it again and make sure it was really cleared
> +  */
> + status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
> + writel(MMIO_STATUS_PPR_INT_MASK,
> +iommu->mmio_base + MMIO_STATUS_OFFSET);
> + } while (status & MMIO_STATUS_PPR_INT_MASK);
> +
>   head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
>   tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
>  


Re: [PATCH RFC] pci: ACS quirk for AMD southbridge

2013-06-26 Thread Andreas Hartmann
Bjorn Helgaas wrote:
> [fix Joerg's email address]
> 
> On Tue, Jun 25, 2013 at 10:15 PM, Bjorn Helgaas <bhelg...@google.com> wrote:
>> On Wed, Jul 11, 2012 at 11:18 PM, Alex Williamson
>> <alex.william...@redhat.com> wrote:
>>> We've confirmed that peer-to-peer between these devices is
>>> not possible.  We can therefore claim that they support a
>>> subset of ACS.
>>>
>>> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
>>> Cc: Joerg Roedel <joerg.roe...@amd.com>
>>> ---
>>>
>>> Two things about this patch make me a little nervous.  The
>>> first is that I'd really like to have a pci_is_pcie() test
>>> in pci_mf_no_p2p_acs_enabled(), but these devices don't
>>> have a PCIe capability.  That means that if there was a
>>> topology where these devices sit on a legacy PCI bus,
>>> we incorrectly return that we're ACS safe here.  That leads
>>> to my second problem, pciids seems to suggest that some of
>>> these functions have been around for a while.  Is it just
>>> this package that's peer-to-peer safe, or is it safe to
>>> assume that any previous assembly of these functions is
>>> also p2p safe.  Maybe we need to factor in device revs if
>>> that uniquely identifies this package?
>>>
>>> Looks like another useful device to potentially quirk
>>> would be:
>>>
>>> 00:15.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 
>>> PCI to PCI bridge (PCIE port 0)
>>> 00:15.1 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 
>>> PCI to PCI bridge (PCIE port 1)
>>> 00:15.2 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
>>> bridge (PCIE port 2)
>>> 00:15.3 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
>>> bridge (PCIE port 3)
>>>
>>> 00:15.0 0604: 1002:43a0
>>> 00:15.1 0604: 1002:43a1
>>> 00:15.2 0604: 1002:43a2
>>> 00:15.3 0604: 1002:43a3
>>>
>>>  drivers/pci/quirks.c |   29 +
>>>  1 file changed, 29 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 4ebc865..2c84961 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3271,11 +3271,40 @@ struct pci_dev *pci_get_dma_source(struct pci_dev 
>>> *dev)
>>> return pci_dev_get(dev);
>>>  }
>>>
>>> +/*
>>> + * Multifunction devices that do not support peer-to-peer between
>>> + * functions can claim to support a subset of ACS.  Such devices
>>> + * effectively enable request redirect (RR) and completion redirect (CR)
>>> + * since all transactions are redirected to the upstream root complex.
>>> + */
>>> +static int pci_mf_no_p2p_acs_enabled(struct pci_dev *dev, u16 acs_flags)
>>> +{
>>> +   if (!dev->multifunction)
>>> +   return -ENODEV;
>>> +
>>> +   /* Filter out flags not applicable to multifunction */
>>> +   acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT);
>>> +
>>> +   return acs_flags & ~(PCI_ACS_RR | PCI_ACS_CR) ? 0 : 1;
>>> +}
>>> +
>>>  static const struct pci_dev_acs_enabled {
>>> u16 vendor;
>>> u16 device;
>>> int (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
>>>  } pci_dev_acs_enabled[] = {
>>> +   /*
>>> +* AMD/ATI multifunction southbridge devices.  AMD has confirmed
>>> +* that peer-to-peer between these devices is not possible, so
>>> +* they do support a subset of ACS even though the capability is
>>> +* not exposed in config space.
>>> +*/
>>> +   { PCI_VENDOR_ID_ATI, 0x4385, pci_mf_no_p2p_acs_enabled },
>>> +   { PCI_VENDOR_ID_ATI, 0x439c, pci_mf_no_p2p_acs_enabled },
>>> +   { PCI_VENDOR_ID_ATI, 0x4383, pci_mf_no_p2p_acs_enabled },
>>> +   { PCI_VENDOR_ID_ATI, 0x439d, pci_mf_no_p2p_acs_enabled },
>>> +   { PCI_VENDOR_ID_ATI, 0x4384, pci_mf_no_p2p_acs_enabled },
>>> +   { PCI_VENDOR_ID_ATI, 0x4399, pci_mf_no_p2p_acs_enabled },
>>> { 0 }
>>>  };
>>>
>>>
>>
>> I was looking for something else and found this old email.  This patch
>> hasn't been applied and I haven't seen any discussion about it.  Is it
>> still of interest?  It seems relevant to the current ACS discussion
>> [1].

It is absolutely relevant. I always have to patch my kernel to get it
working to put my pci device to VM. Meanwhile I'm doing it for
kernel 3.9. I would be very glad to get these patches to the kernel as
they don't do anything bad!

My multifunction devices are the ones listed in the patch. The PCI
device I currently pass through is an Intel ethernet device:

-[:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge 
(external gfx0 port B)
   +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory 
Management Unit (IOMMU)
   +-02.0-[01]--+-00.0  Advanced Micro Devices [AMD] nee ATI Turks 
[Radeon HD 6570]
   |\-00.1  Advanced Micro Devices [AMD] nee ATI Turks HDMI 
Audio [Radeon HD 6000 Series]
   +-04.0-[02]00.0  Etron Technology, Inc. EJ168 USB 3.0 Host 
Controller
   +-05.0-[03]00.0  Atheros Communications Inc. AR9300 Wireless LAN 
adaptor
   

Re: [PATCH RFC] pci: ACS quirk for AMD southbridge

2013-06-26 Thread Andreas Hartmann
   +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA 
Controller [AHCI mode]
   +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
OHCI0 Controller
   +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
EHCI Controller
   +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
OHCI0 Controller
   +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
EHCI Controller
   +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
   +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
   +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC 
host controller
   +-14.4-[06]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro 100
   |\-0e.0  VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] 
IEEE 1394 OHCI Controller
   +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
OHCI2 Controller
   +-15.0-[07]--
   +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
OHCI0 Controller
   +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB 
EHCI Controller
   +-18.0  Advanced Micro Devices [AMD] Family 15h Processor Function 0
   +-18.1  Advanced Micro Devices [AMD] Family 15h Processor Function 1
   +-18.2  Advanced Micro Devices [AMD] Family 15h Processor Function 2
   +-18.3  Advanced Micro Devices [AMD] Family 15h Processor Function 3
   +-18.4  Advanced Micro Devices [AMD] Family 15h Processor Function 4
   \-18.5  Advanced Micro Devices [AMD] Family 15h Processor Function 5


lspci -v -s 14.4
00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI 
Bridge (rev 40) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
Bus: primary=00, secondary=06, subordinate=06, sec-latency=64
I/O behind bridge: 9000-9fff
Memory behind bridge: fd80-fd8f
Prefetchable memory behind bridge: fd70-fd7f
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-


lspci -v -s 6.0
06:06.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 
(rev 0c)
Subsystem: Intel Corporation EtherExpress PRO/100 S Desktop Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 20
Region 0: Memory at fd8ff000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at 9f00 [size=64]
Region 2: Memory at fd8c (32-bit, non-prefetchable) [size=128K]
Expansion ROM at fd70 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA 
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
Kernel driver in use: vfio-pci


Thanks,
kind regards,
Andreas Hartmann


Re: [PATCH RFC] pci: ACS quirk for AMD southbridge

2013-06-26 Thread Andreas Hartmann
Alex Williamson wrote:
 On Wed, 2013-06-26 at 17:14 +0200, Andreas Hartmann wrote:
 Bjorn Helgaas wrote:
 [fix Joerg's email address]

 On Tue, Jun 25, 2013 at 10:15 PM, Bjorn Helgaas bhelg...@google.com wrote:
 On Wed, Jul 11, 2012 at 11:18 PM, Alex Williamson
 alex.william...@redhat.com wrote:
 We've confirmed that peer-to-peer between these devices is
 not possible.  We can therefore claim that they support a
 subset of ACS.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Cc: Joerg Roedel joerg.roe...@amd.com
 ---

 Two things about this patch make me a little nervous.  The
 first is that I'd really like to have a pci_is_pcie() test
 in pci_mf_no_p2p_acs_enabled(), but these devices don't
 have a PCIe capability.  That means that if there was a
 topology where these devices sit on a legacy PCI bus,
 we incorrectly return that we're ACS safe here.  That leads
 to my second problem, pciids seems to suggest that some of
 these functions have been around for a while.  Is it just
 this package that's peer-to-peer safe, or is it safe to
 assume that any previous assembly of these functions is
 also p2p safe.  Maybe we need to factor in device revs if
 that uniquely identifies this package?

 Looks like another useful device to potentially quirk
 would be:

 00:15.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI 
 SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
 00:15.1 PCI bridge: Advanced Micro Devices [AMD] nee ATI 
 SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)
 00:15.2 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
 bridge (PCIE port 2)
 00:15.3 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
 bridge (PCIE port 3)

 00:15.0 0604: 1002:43a0
 00:15.1 0604: 1002:43a1
 00:15.2 0604: 1002:43a2
 00:15.3 0604: 1002:43a3

  drivers/pci/quirks.c |   29 +
  1 file changed, 29 insertions(+)

 diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
 index 4ebc865..2c84961 100644
 --- a/drivers/pci/quirks.c
 +++ b/drivers/pci/quirks.c
 @@ -3271,11 +3271,40 @@ struct pci_dev *pci_get_dma_source(struct pci_dev 
 *dev)
 return pci_dev_get(dev);
  }

 +/*
 + * Multifunction devices that do not support peer-to-peer between
 + * functions can claim to support a subset of ACS.  Such devices
 + * effectively enable request redirect (RR) and completion redirect (CR)
 + * since all transactions are redirected to the upstream root complex.
 + */
 +static int pci_mf_no_p2p_acs_enabled(struct pci_dev *dev, u16 acs_flags)
 +{
 +   if (!dev->multifunction)
 +           return -ENODEV;
 +
 +   /* Filter out flags not applicable to multifunction */
 +   acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT);
 +
 +   return acs_flags & ~(PCI_ACS_RR | PCI_ACS_CR) ? 0 : 1;
 +}
 +
  static const struct pci_dev_acs_enabled {
 u16 vendor;
 u16 device;
 int (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
  } pci_dev_acs_enabled[] = {
 +   /*
 +* AMD/ATI multifunction southbridge devices.  AMD has confirmed
 +* that peer-to-peer between these devices is not possible, so
 +* they do support a subset of ACS even though the capability is
 +* not exposed in config space.
 +*/
 +   { PCI_VENDOR_ID_ATI, 0x4385, pci_mf_no_p2p_acs_enabled },
 +   { PCI_VENDOR_ID_ATI, 0x439c, pci_mf_no_p2p_acs_enabled },
 +   { PCI_VENDOR_ID_ATI, 0x4383, pci_mf_no_p2p_acs_enabled },
 +   { PCI_VENDOR_ID_ATI, 0x439d, pci_mf_no_p2p_acs_enabled },
 +   { PCI_VENDOR_ID_ATI, 0x4384, pci_mf_no_p2p_acs_enabled },
 +   { PCI_VENDOR_ID_ATI, 0x4399, pci_mf_no_p2p_acs_enabled },
 { 0 }
  };



 I was looking for something else and found this old email.  This patch
 hasn't been applied and I haven't seen any discussion about it.  Is it
 still of interest?  It seems relevant to the current ACS discussion
 [1].

 It is absolutely relevant. I always have to patch my kernel to get it
 working to put my pci device to VM. Meanwhile I'm doing it for
 kernel 3.9. I would be very glad to get these patches to the kernel as
 they don't do anything bad!
 
 I'd still like to see this get in too.  IIRC, where we left off was that
 Joerg had confirmed with the hardware folks that there is no
 peer-to-peer between these devices, but we still had questions about
 whether that was true for any instance of these vendor/device IDs.
 These devices are re-used in several packages and I'm not sure if we
 need to somehow figure out what package (ie. which chipset generation)
 we're looking at to know if p2p is used. 

Does this statement cover your question?
http://article.gmane.org/gmane.comp.emulators.kvm.devel/99402

Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 104/173] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails

2013-01-07 Thread Andreas Hartmann
Hello Stanislaw!

Stanislaw Gruszka wrote:
> On Mon, Jan 07, 2013 at 07:38:35PM +0100, Andreas Hartmann wrote:
>> Stanislaw Gruszka wrote:
>>> On Mon, Jan 07, 2013 at 04:04:01PM +0100, Andreas Hartmann wrote:
>>>> Ben Hutchings wrote:
>>>>> On Mon, 2013-01-07 at 09:10 +0100, Stanislaw Gruszka wrote:
>>>>>> On Mon, Jan 07, 2013 at 09:05:32AM +0100, Stanislaw Gruszka wrote:
>>>>>>>> To be clear, I have all of these in the queue:
>>>>>>>>
>>>>>>>> be03d4a45c09 rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
>>>>>>>> 5b632fe85ec8 mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL
>>>>>>>> ab9d6e4ffe19 Revert: "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails"
>>>>>>>>
>>>>>>>> and I'm intending to drop/defer them all.
>>>>>>>
>>>>>>> Patch 3 is a revert of patch 1 (questioned patch). Please apply all 3
>>>>>>> patches, or only patch 2.
>>>>>>
>>>>>> No, actually all 3 patches have to be applied. Because last one, except
>>>>>> revert, include flag IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL setting in
>>>>>> rt2x00 driver, which make patch 2 work.
>>>>>
>>>>> Andreas said that that after ab9d6e4ffe19 there was still a regression.
>>>
>>> That's not true. There will be no regression after ab9d6e4ffe19. The
>>> only thing is that solution is not perfect. But perfect solution require
>>> lot of changes i.e. is not -stable appropriate (and does not exist
>>> currently).
>>>
>>>>> But maybe he was confused.  I know I'm confused.
>>>> :-))
>>>>
>>>> No, the thing is:
>>>> rt2800pci misses an appropriate handling of aggregation (which meets the
>>>> requirements of mac80211).
>>>>
>>>> Both workarounds, mine and the new workaround from Stanislaw (which is
>>>> nothing more than a restricted version of my initial workaround), work
>>>
>>> Your workaround broke STA mode on some environment.
>>
>> Why are you sure, that this workaround doesn't break some other devices
>> running in AP mode? We believed at that time too, it wouldn't harm even
>> STA. But this was wrong for some (which?) devices.
> 
> Because it make behaviour the same as it was before 3.2, which introduce
> those issues.

You're so right, Stanislaw! I should have looked at your patch again
before writing those stupid lines about differentiating between
STA and AP.

My apologies!


Kind regards,
Andreas


Re: [ 104/173] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails

2013-01-07 Thread Andreas Hartmann
Hello Helmut!

Helmut Schaa wrote:
> On Mon, Jan 7, 2013 at 4:04 PM, Andreas Hartmann wrote:
>> The solution would be IMHO, to implement an own aggregation handling,
>> maybe the same way as it was done for carl9170, which had the same problem:
>>
>> http://thread.gmane.org/gmane.linux.kernel.wireless.general/100793/focus=1405
>>
>> I prefer to have solutions (if one is known) instead of another workaround.
> 
> JFI, I'm just working on exactly that (handling BAR TX status in
> driver to implement proper RX reorder window flushing at the peer). I'll
> post it for further testing to the rt2x00 list once I'm done.

Thank you for your time spent on this problem! I really appreciate it!


Kind regards!
Andreas


Re: [ 104/173] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails

2013-01-07 Thread Andreas Hartmann
Stanislaw Gruszka wrote:
> On Mon, Jan 07, 2013 at 04:04:01PM +0100, Andreas Hartmann wrote:
>> Ben Hutchings wrote:
>>> On Mon, 2013-01-07 at 09:10 +0100, Stanislaw Gruszka wrote:
>>>> On Mon, Jan 07, 2013 at 09:05:32AM +0100, Stanislaw Gruszka wrote:
>>>>>> To be clear, I have all of these in the queue:
>>>>>>
>>>>>> be03d4a45c09 rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
>>>>>> 5b632fe85ec8 mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL
>>>>>> ab9d6e4ffe19 Revert: "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails"
>>>>>>
>>>>>> and I'm intending to drop/defer them all.
>>>>>
>>>>> Patch 3 is a revert of patch 1 (questioned patch). Please apply all 3
>>>>> patches, or only patch 2.
>>>>
>>>> No, actually all 3 patches have to be applied. Because last one, except
>>>> revert, include flag IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL setting in
>>>> rt2x00 driver, which make patch 2 work.
>>>
>>> Andreas said that that after ab9d6e4ffe19 there was still a regression.
> 
> That's not true. There will be no regression after ab9d6e4ffe19. The
> only thing is that solution is not perfect. But perfect solution require
> lot of changes i.e. is not -stable appropriate (and does not exist currently).
> 
>>> But maybe he was confused.  I know I'm confused.
>> :-))
>>
>> No, the thing is:
>> rt2800pci misses an appropriate handling of aggregation (which meets the
>> requirements of mac80211).
>>
>> Both workarounds, mine and the new workaround from Stanislaw (which is
>> nothing more than a restricted version of my initial workaround), work
> 
> Your workaround broke STA mode on some environment.

Why are you sure, that this workaround doesn't break some other devices
running in AP mode? We believed at that time too, it wouldn't harm even
STA. But this was wrong for some (which?) devices.


Anyway: since Helmut has meanwhile mentioned that he is, thankfully, working
on a solution, I'm fine with the second round of workarounds.



Kind regards,
Andreas


Re: [ 104/173] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails

2013-01-07 Thread Andreas Hartmann
Ben Hutchings wrote:
> On Mon, 2013-01-07 at 09:10 +0100, Stanislaw Gruszka wrote:
>> On Mon, Jan 07, 2013 at 09:05:32AM +0100, Stanislaw Gruszka wrote:
>>>> To be clear, I have all of these in the queue:
>>>>
>>>> be03d4a45c09 rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
>>>> 5b632fe85ec8 mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL
>>>> ab9d6e4ffe19 Revert: "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails"
>>>>
>>>> and I'm intending to drop/defer them all.
>>>
>>> Patch 3 is a revert of patch 1 (questioned patch). Please apply all 3
>>> patches, or only patch 2.
>>
>> No, actually all 3 patches have to be applied. Because last one, except
>> revert, include flag IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL setting in rt2x00
>> driver, which make patch 2 work.
> 
> Andreas said that that after ab9d6e4ffe19 there was still a regression.
> But maybe he was confused.  I know I'm confused.

:-))

No, the thing is:
rt2800pci misses an appropriate handling of aggregation (which meets the
requirements of mac80211).

Both workarounds, mine and the new workaround from Stanislaw (which is
nothing more than a restricted version of my initial workaround), work
like this:
Let the peer do the aggregation handling. If it's not done by the peer,
the connection will break down.

Therefore:
The solution would be IMHO, to implement an own aggregation handling,
maybe the same way as it was done for carl9170, which had the same problem:

http://thread.gmane.org/gmane.linux.kernel.wireless.general/100793/focus=1405

I prefer to have solutions (if one is known) instead of another workaround.
If I use my device as STA instead of as an AP, it even works fine without
Stanislaw's patch. Do you understand what I'm trying to say?



Thanks,
kind regards,
Andreas


Re: [ 104/173] rt2x00: Dont let mac80211 send a BAR when an AMPDU subframe fails

2012-12-29 Thread Andreas Hartmann
Ben Hutchings wrote:
> 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Andreas Hartmann 
> 
> commit be03d4a45c09ee5100d3aaaedd087f19bc20d01f upstream.

[...]

This patch is a workaround for

mac80211: retry sending failed BAR frames later instead of tearing down
aggr (http://www.spinics.net/lists/linux-wireless/msg76379.html -
f0425beda4d404a6e751439b562100b902ba9c98)
See:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/83297/focus=83304


Meanwhile there was a bug report complaining about problems with
be03d4a45 when used as STA:
http://thread.gmane.org/gmane.linux.drivers.rt2x00.user/1257
A few other workaround proposals can be found there.


Stanislaw Gruszka proposed a final(?) workaround here, which refines
workaround be03d4a45c by restricting it to AP mode:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/100793


carl9170 had the same problem with f0425beda. There it was fixed like
this:
http://thread.gmane.org/gmane.linux.kernel.wireless.general/100793/focus=1405
This approach fixes the real problem (no aggregation handling by the
firmware / hardware) by implementing it in the driver.

Unfortunately, I haven't seen any implementation of c9122c0d63a50 for
rt2x00 so far.



Kind regards,
Andreas Hartmann
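
The behaviour argued about in this thread comes down to one decision taken
when a BAR frame fails to transmit. The sketch below is an illustration only,
not mac80211 code: the types and the function on_bar_tx_failure are
hypothetical. Only the flag name IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL and
the two behaviours (retry the BAR later, as described for f0425beda, or tear
down the aggregation session, as before 3.2) are taken from the messages.

```c
#include <stdbool.h>

/* Hypothetical names; only the two behaviours come from the thread. */
enum bar_fail_action {
    BAR_RETRY_LATER,      /* behaviour described for f0425beda */
    BAR_TEARDOWN_SESSION  /* behaviour as it was before 3.2 */
};

/* Stand-in for the hardware flags a driver would advertise. */
struct hw_caps {
    /* models IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL being set */
    bool teardown_aggr_on_bar_fail;
};

/* Decide what to do after a BAR frame could not be delivered. */
enum bar_fail_action on_bar_tx_failure(const struct hw_caps *hw)
{
    if (hw->teardown_aggr_on_bar_fail)
        return BAR_TEARDOWN_SESSION;
    return BAR_RETRY_LATER;
}
```

Under this reading, a driver that sets the flag (as the third patch is said
to do for rt2x00) gets the old teardown behaviour back, while all other
drivers keep the retry behaviour.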


Re: [PATCH RFC] pci: ACS quirk for AMD southbridge

2012-07-12 Thread Andreas Hartmann
Hello Alex,

I tested the patch below against linux 3.4.4 and with this
PCI WLAN-device:

06:07.0 Network controller: Ralink corp. RT2800 802.11n PCI
06:07.0 0280: 1814:0601

The device resides behind a PCI to PCI bridge:

00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI 
Bridge (rev 40) (prog-if 01 [Subtractive decode])
00:14.4 0604: 1002:4384 (rev 40) (prog-if 01 [Subtractive decode])

The device works fine in kvm / 64bit. Surprisingly, it isn't necessary
at all to assign the PCI to PCI bridge to the VM. It's enough to assign
the WLAN device to the VM and bind it to vfio-pci. That's all. The bridge
isn't bound to vfio-pci (it's bound to nothing).

I dropped linux-pci from the recipients because I'm not a member of that list.


Thanks.
kind regards,
Andreas


Alex Williamson wrote:
> We've confirmed that peer-to-peer between these devices is
> not possible.  We can therefore claim that they support a
> subset of ACS.
> 
> Signed-off-by: Alex Williamson 
Tested-by: Andreas Hartmann 
> Cc: Joerg Roedel 
> ---
> 
> Two things about this patch make me a little nervous.  The
> first is that I'd really like to have a pci_is_pcie() test
> in pci_mf_no_p2p_acs_enabled(), but these devices don't
> have a PCIe capability.  That means that if there was a
> topology where these devices sit on a legacy PCI bus,
> we incorrectly return that we're ACS safe here.  That leads
> to my second problem, pciids seems to suggest that some of
> these functions have been around for a while.  Is it just
> this package that's peer-to-peer safe, or is it safe to
> assume that any previous assembly of these functions is
> also p2p safe.  Maybe we need to factor in device revs if
> that uniquely identifies this package?
> 
> Looks like another useful device to potentially quirk
> would be:
> 
> 00:15.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 
> PCI to PCI bridge (PCIE port 0)
> 00:15.1 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB700/SB800/SB900 
> PCI to PCI bridge (PCIE port 1)
> 00:15.2 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
> bridge (PCIE port 2)
> 00:15.3 PCI bridge: Advanced Micro Devices [AMD] nee ATI SB900 PCI to PCI 
> bridge (PCIE port 3)
> 
> 00:15.0 0604: 1002:43a0
> 00:15.1 0604: 1002:43a1
> 00:15.2 0604: 1002:43a2
> 00:15.3 0604: 1002:43a3
> 
>  drivers/pci/quirks.c |   29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4ebc865..2c84961 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3271,11 +3271,40 @@ struct pci_dev *pci_get_dma_source(struct pci_dev *dev)
>   return pci_dev_get(dev);
>  }
>  
> +/*
> + * Multifunction devices that do not support peer-to-peer between
> + * functions can claim to support a subset of ACS.  Such devices
> + * effectively enable request redirect (RR) and completion redirect (CR)
> + * since all transactions are redirected to the upstream root complex.
> + */
> +static int pci_mf_no_p2p_acs_enabled(struct pci_dev *dev, u16 acs_flags)
> +{
> + if (!dev->multifunction)
> + return -ENODEV;
> +
> + /* Filter out flags not applicable to multifunction */
> + acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT);
> +
> + return acs_flags & ~(PCI_ACS_RR | PCI_ACS_CR) ? 0 : 1;
> +}
> +
>  static const struct pci_dev_acs_enabled {
>   u16 vendor;
>   u16 device;
>   int (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
>  } pci_dev_acs_enabled[] = {
> + /*
> +  * AMD/ATI multifunction southbridge devices.  AMD has confirmed
> +  * that peer-to-peer between these devices is not possible, so
> +  * they do support a subset of ACS even though the capability is
> +  * not exposed in config space.
> +  */
> + { PCI_VENDOR_ID_ATI, 0x4385, pci_mf_no_p2p_acs_enabled },
> + { PCI_VENDOR_ID_ATI, 0x439c, pci_mf_no_p2p_acs_enabled },
> + { PCI_VENDOR_ID_ATI, 0x4383, pci_mf_no_p2p_acs_enabled },
> + { PCI_VENDOR_ID_ATI, 0x439d, pci_mf_no_p2p_acs_enabled },
> + { PCI_VENDOR_ID_ATI, 0x4384, pci_mf_no_p2p_acs_enabled },
> + { PCI_VENDOR_ID_ATI, 0x4399, pci_mf_no_p2p_acs_enabled },
>   { 0 }
>  };
>  
> 
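
The flag filtering in pci_mf_no_p2p_acs_enabled() from the quoted patch can
be tried outside the kernel. What follows is only a minimal user-space
sketch, not the kernel code: the function name mf_no_p2p_acs_enabled, the
plain "multifunction" argument standing in for struct pci_dev, and the -1
return in place of -ENODEV are assumptions made for illustration. The
PCI_ACS_* values should match those in include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* ACS control bits; values assumed to match the kernel's pci_regs.h. */
#define PCI_ACS_SV 0x01 /* Source Validation */
#define PCI_ACS_RR 0x04 /* P2P Request Redirect */
#define PCI_ACS_CR 0x08 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x10 /* Upstream Forwarding */
#define PCI_ACS_EC 0x20 /* P2P Egress Control */
#define PCI_ACS_DT 0x40 /* Direct Translated P2P */

/*
 * Hypothetical stand-in for pci_mf_no_p2p_acs_enabled().
 * Returns -1 for a non-multifunction device (the patch returns -ENODEV),
 * 1 if every requested flag is covered by the quirk, 0 otherwise.
 */
int mf_no_p2p_acs_enabled(int multifunction, uint16_t acs_flags)
{
    if (!multifunction)
        return -1;

    /* Keep only the flags that apply to multifunction devices. */
    acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT);

    /* Anything left besides RR and CR is not vouched for by the quirk. */
    return (acs_flags & ~(PCI_ACS_RR | PCI_ACS_CR)) ? 0 : 1;
}
```

With these definitions, a request for RR and CR alone on a multifunction
device yields 1, while additionally asking for EC or DT yields 0, because
the quirk only claims request and completion redirect.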


crash with linux 2.6.16 under high network traffic

2007-06-06 Thread Andreas Hartmann
 kernel: Node 0 Normal: 4041*4kB 1*8kB 1*16kB
1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 16796kB
Jun  6 13:15:36 pscudb01 kernel: Node 0 HighMem: empty
Jun  6 13:15:36 pscudb01 kernel: Swap cache: add 73569, delete 72373, find
23938/34427, race 0+2
Jun  6 13:15:36 pscudb01 kernel: Free swap  = 4170160kB
Jun  6 13:15:36 pscudb01 kernel: Total swap = 4200956kB
Jun  6 13:15:36 pscudb01 kernel: Free swap:   4170160kB
Jun  6 13:15:36 pscudb01 kernel: 6291456 pages of RAM
Jun  6 13:15:36 pscudb01 kernel: 214414 reserved pages
Jun  6 13:15:36 pscudb01 kernel: 28836 pages shared
Jun  6 13:15:36 pscudb01 kernel: 1209 pages swap cached
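
The free-list line for the Normal zone can be checked by hand: 4041*4 + 8 + 16 + 32 + 64 + 512 = 16796 kB, which matches the reported total. A minimal C sketch of the same check follows; free_list_total_kb() is a hypothetical helper written for this example, not a kernel or libc function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum the "count*sizekB" terms of one free-list line from the kernel's
 * memory report and return the total in kB, or -1 if no term is found.
 */
static long free_list_total_kb(const char *line)
{
	long total = 0;
	int seen = 0;
	const char *p = line;

	while ((p = strchr(p, '*')) != NULL) {
		/* Walk back to the first digit of the count before '*'. */
		const char *q = p;
		while (q > line && q[-1] >= '0' && q[-1] <= '9')
			q--;

		long count, size;
		if (sscanf(q, "%ld*%ldkB", &count, &size) == 2) {
			total += count * size;
			seen = 1;
		}
		p++; /* continue searching after this '*' */
	}
	return seen ? total : -1;
}
```

Passing the "Node 0 Normal" line to the helper gives the sum of its terms, which can then be compared with the figure after the "=" sign.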


Sometimes the OOM killer also becomes active before the machine crashes.


Does anybody have an idea how to narrow down this problem? How can
I see how much memory the network driver module needs?

Background:
I suspect the cassini driver is the problem (a memory leak?), because
I did not have this problem when I used another NIC and driver instead
of the cassini driver.




Kind regards,
Andreas Hartmann


Re: [2.6.18.2] ide_core bug: kobject_add failed for ide ... - with vanilla kernel

2007-01-08 Thread Andreas Hartmann
Hello Lee,

Lee Revell wrote:
> On Sun, 2007-01-07 at 18:44 +0100, Andreas Hartmann wrote:
>> Hello,
>> 
>> ide_core is loaded (while putting in an USB stick) as module the first
>> time after reboot - all works fine. The USB stick got mounted and a ls
>> is done to show the files on the root of the filesystem of the stick.
>> Afterwards, the stick is securely removed from the system.
>> Afterwards, ide_core is unloaded with rmmod (after usb-storage has been
>> unloaded) - ok.
>> 
>> Next step is to load ide_core again. Now, the following error can be
>> found in /var/log/messages:
>> 
>> 
>> Jan  7 11:48:18 notebook1 kernel: Uniform Multi-Platform E-IDE driver
>> Revision: 7.00alpha2
>> Jan  7 11:48:18 notebook1 kernel: ide: Assuming 33MHz system bus speed
>> for PIO modes; override with idebus=xx
>> Jan  7 11:48:18 notebook1 kernel: kobject_add failed for ide with
>> -EEXIST, don't try to register things with the same name in the same
>> directory.
> 
> You seem to be running a SuSE kernel - please report the issue to them.

You are right - but the same error appears with the vanilla kernel, too.
That's why I reported it here.

> It's probably useful to repeat your test but run "find /sys/module >
> sys1" before loading ide_core the first time, then "find /sys/module >
> sys2" after "rmmod ide_core", and save the output of "diff sys1 sys2".

There isn't any difference.


Kind regards,
Andreas Hartmann


[2.6.18.2] ide_core bug: kobject_add failed for ide ...

2007-01-07 Thread Andreas Hartmann
Capabilities: [50] Subsystem: Mitac Unknown device 8048

00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface
Bridge (rev 04)
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 0

00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA
Controller (rev 04) (prog-if 80 [Master])
Subsystem: Mitac Unknown device 8048
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 177
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at 1100 [size=16]
Capabilities: [70] Power Management version 2

00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
SMBus Controller (rev 04)
Subsystem: Mitac Unknown device 8048
Flags: medium devsel, IRQ 177
I/O ports at 1400 [size=32]

01:02.0 CardBus bridge: Texas Instruments PCIxx21/x515 Cardbus Controller
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 168, IRQ 169
Memory at cc009000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=01, secondary=02, subordinate=05, sec-latency=176
Memory window 0: 9c00-9dfff000 (prefetchable)
Memory window 1: ce00-c000
I/O window 0: c400-c4ff
I/O window 1: c800-c8ff
16-bit legacy interface ports at 0001

01:02.2 FireWire (IEEE 1394): Texas Instruments OHCI Compliant IEEE 1394
Host Controller (prog-if 10 [OHCI])
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 128, IRQ 177
Memory at fedff800 (32-bit, non-prefetchable) [size=2K]
Memory at cc00c000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [44] Power Management version 2

01:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 128, IRQ 209
I/O ports at c000 [size=256]
Memory at cc008000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

01:05.0 Network controller: RaLink RT2561/RT61 rev B 802.11g
Subsystem: Micro-Star International Co., Ltd. Unknown device b833
Flags: bus master, slow devsel, latency 128, IRQ 11
Memory at cc00 (32-bit, non-prefetchable) [size=32K]
Capabilities: [40] Power Management version 2

notebook1:~ # lsmod
Module             Size  Used by
ide_core         129992  0
af_packet         29320  2
ipv6             263584  12
button            10896  0
battery           14340  0
ac                 9476  0
twofish           47488  3
cryptoloop         7680  3
ohci_hcd          23428  0
apparmor          55572  0
aamatch_pcre      18304  1 apparmor
loop              20488  7 cryptoloop
dm_mod            60184  13
pcmcia            40892  0
firmware_class    14080  1 pcmcia
usbhid            52192  0
yenta_socket      30348  1
ohci1394          37040  0
rsrc_nonstatic    17024  1 yenta_socket
pcmcia_core       43412  3 pcmcia,yenta_socket,rsrc_nonstatic
ieee1394         102584  1 ohci1394
snd_hda_intel     23060  1
snd_hda_codec    164352  1 snd_hda_intel
snd_pcm           86916  2 snd_hda_intel,snd_hda_codec
snd_timer         27908  1 snd_pcm
snd               61188  6 snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
8139too           30592  0
soundcore         13792  1 snd
intel_agp         27804  1
snd_page_alloc    14472  2 snd_hda_intel,snd_pcm
mii                9600  1 8139too
agpgart           35528  2 intel_agp
ehci_hcd          34696  0
uhci_hcd          26892  0
i2c_i801          11660  0
usbcore          114896  4 ohci_hcd,usbhid,ehci_hcd,uhci_hcd
i2c_core          25216  1 i2c_i801
reiserfs         237312  7
sr_mod            20132  0
cdrom             38432  1 sr_mod
edd               13892  0
fan                8964  1
sg                38044  0
ata_piix          19332  3
ahci              25860  0
libata           119188  2 ata_piix,ahci
thermal           18568  1
processor         34664  1 thermal
sd_mod            24576  4
scsi_mod         136712  5 sr_mod,sg,ahci,libata,sd_mod



Kind regards,
Andreas Hartmann


[2.6.18.2] ide_core oops

2007-01-07 Thread Andreas Hartmann
 behind bridge: cc00-cfff
Prefetchable memory behind bridge: 9c00-9fff
Capabilities: [50] Subsystem: Mitac Unknown device 8048

00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface
Bridge (rev 04)
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 0

00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA
Controller (rev 04) (prog-if 80 [Master])
Subsystem: Mitac Unknown device 8048
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 177
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at 1100 [size=16]
Capabilities: [70] Power Management version 2

00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
SMBus Controller (rev 04)
Subsystem: Mitac Unknown device 8048
Flags: medium devsel, IRQ 177
I/O ports at 1400 [size=32]

01:02.0 CardBus bridge: Texas Instruments PCIxx21/x515 Cardbus Controller
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 168, IRQ 169
Memory at cc009000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=01, secondary=02, subordinate=05, sec-latency=176
Memory window 0: 9c00-9dfff000 (prefetchable)
Memory window 1: ce00-c000
I/O window 0: c400-c4ff
I/O window 1: c800-c8ff
16-bit legacy interface ports at 0001

01:02.2 FireWire (IEEE 1394): Texas Instruments OHCI Compliant IEEE 1394
Host Controller (prog-if 10 [OHCI])
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 128, IRQ 177
Memory at fedff800 (32-bit, non-prefetchable) [size=2K]
Memory at cc00c000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [44] Power Management version 2

01:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Mitac Unknown device 8048
Flags: bus master, medium devsel, latency 128, IRQ 209
I/O ports at c000 [size=256]
Memory at cc008000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

01:05.0 Network controller: RaLink RT2561/RT61 rev B 802.11g
Subsystem: Micro-Star International Co., Ltd. Unknown device b833
Flags: bus master, slow devsel, latency 128, IRQ 11
Memory at cc00 (32-bit, non-prefetchable) [size=32K]
Capabilities: [40] Power Management version 2


lsmod (when there is no error)
button            10896  0
usb_storage       82112  0
ide_core         129992  1 usb_storage
ohci_hcd          23428  0
uhci_hcd          26892  0
ohci1394          37040  0
ieee1394         102584  1 ohci1394
nls_iso8859_1      8320  0
nls_cp437          9984  0
vfat              16640  0
fat               55324  1 vfat
af_packet         29320  2
ipv6             263584  16
battery           14340  0
ac                 9476  0
twofish           47488  3
cryptoloop         7680  3
apparmor          55572  0
aamatch_pcre      18304  1 apparmor
loop              20488  7 cryptoloop
dm_mod            60184  13
pcmcia            40892  0
firmware_class    14080  1 pcmcia
usbhid            52192  0
snd_hda_intel     23060  1
snd_hda_codec    164352  1 snd_hda_intel
snd_pcm           86916  2 snd_hda_intel,snd_hda_codec
8139too           30592  0
yenta_socket      30348  1
snd_timer         27908  1 snd_pcm
rsrc_nonstatic    17024  1 yenta_socket
snd               61188  6 snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
mii                9600  1 8139too
pcmcia_core       43412  3 pcmcia,yenta_socket,rsrc_nonstatic
soundcore         13792  1 snd
ehci_hcd          34696  0
snd_page_alloc    14472  2 snd_hda_intel,snd_pcm
usbcore          114896  5 usb_storage,ohci_hcd,uhci_hcd,usbhid,ehci_hcd
i2c_i801          11660  0
intel_agp         27804  1
agpgart           35528  3 intel_agp
i2c_core          25216  1 i2c_i801
reiserfs         237312  7
sr_mod            20132  0
cdrom             38432  1 sr_mod
edd               13892  0
fan                8964  1
sg                38044  0
ata_piix          19332  3
ahci              25860  0
libata           119188  2 ata_piix,ahci
thermal           18568  1
processor         34664  1 thermal
sd_mod            24576  4
scsi_mod         136712  6 usb_storage,sr_mod,sg,ahci,libata,sd_mod


Kind regards,
Andreas Hartmann

Re: forbid to strace a program

2005-09-04 Thread Andreas Hartmann
Chase Venters wrote:
>> Is there another way to do this? If the password is crypted, I need a
>> passphrase or something other to decrypt it again. Not really a solution
>> of the problem.
>>
>> Therefore, it would be best, to hide it by preventing stracing of the
>> application to all users and root.
>>
>> Ok, root could search for the password directly in the memory, but this
>> would be not as easy as a strace.
> 
> Obfuscation isn't really valid security. Making something 'harder' to break 
> isn't a solution unless you're making it hard enough that current technology 
> can't break it (eg... you always have the brute force option, but good crypto 
> intends to make such an option impossible without expending zillions of clock 
> cycles). 

You're right. If I had a solution that could do this, I would
prefer it.

> 
> Can I ask why you want to hide the database password from root?

It's simple: for security reasons. There can always be a bug in some
software that makes it possible for another user to gain root
privileges. That user could then easily strace the program for
information they should not be able to see. The password they would see
is not only used for the DB, but for some other applications, too.
That's the disadvantage of general (single sign-on) passwords.


Kind regards,
Andreas Hartmann


Re: forbid to strace a program

2005-09-03 Thread Andreas Hartmann
Alex Riesen wrote:
> On 9/3/05, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
>> Hello!
>> 
>> Is it possible to prevent a program to be straced on x86?
>> What do I have to do, eg., to prevent a perl-program to be straced?
>> 
> 
> So that none can see what are you doing? Or because your program is
> breaking because of this? Probably nothing, but someone would like
> to know what it is you are doing and exactly how it breaks (and, if
> you don't mind -
> why it breaks).

That's not really the problem. I want to hide a clear-text password in
that program (something like ssh-agent or gpg-agent; the latter can be
straced, too :-() which I need for a database while the program runs.

Is there another way to do this? If the password is encrypted, I need a
passphrase or something else to decrypt it again, so that does not
really solve the problem.

Therefore, it would be best to hide it by preventing all users,
including root, from stracing the application.

OK, root could search for the password directly in memory, but that
would not be as easy as running strace.
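
One partial measure on Linux is the PR_SET_DUMPABLE process flag. The following is a minimal sketch in C, not a complete answer to the question: refuse_unprivileged_tracing() is a hypothetical helper name, a Perl program would need an equivalent call of its own, and the flag only stops tracers that lack CAP_SYS_PTRACE, so root can still attach.

```c
#include <sys/prctl.h>

/*
 * Mark the calling process as non-dumpable. On Linux this also makes
 * ptrace attachment (and therefore "strace -p") fail for callers that
 * lack CAP_SYS_PTRACE. Root is NOT stopped by this.
 * Returns 0 on success, -1 on error (errno is set by prctl).
 */
static int refuse_unprivileged_tracing(void)
{
	return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}
```

Calling the helper early in main() covers later attach attempts by ordinary users. It does not detach a tracer that started the program itself, and it does nothing about reading the password from memory as root.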



Kind regards,
Andreas Hartmann


Re: More performance for the TCP stack by using additional hardware chip on NIC

2005-04-17 Thread Andreas Hartmann
Willy Tarreau wrote:
> Hello !
> 
> On Sun, Apr 17, 2005 at 01:29:14PM +0300, Avi Kivity wrote:
>> On Sun, 2005-04-17 at 12:07, Arjan van de Ven wrote:
>> > On Sun, 2005-04-17 at 10:17 +0200, Andreas Hartmann wrote:
>> > > Hello!
>> > > 
>> > > Alacritech developed a new chip for NIC's
>> > > (http://www.alacritech.com/html/tech_review.html), which makes it 
>> > > possible
>> > > to take away the TCP stack from the host CPU. Therefore, the host CPU has
>> > > more performance for the applications according Alacritech.
>> > 
>> > there are very many good reasons why this for linux is not the right
>> > solution, including the fact that the linux tcp/ip stack already is
>> > quite fast so the "gains" achieved aren't that stellar as the gains you
>> > get when comparing to windows.
>> > 
>> 
>> TOEs can remove the data copy on receive. In some applications (notably
>> storage), where the application does not touch most of the data, this is
>> a significant advantage that cannot be achieved in a software-only
>> solution.
> 
> Well, if the application does not touch most of the data, either it
> is playing as a relay, and the data will at least have to be copied,
> or it will play as a client or server which reads from/writes to disk,
> and in this case, I wonder how the NIC will send its writes directly
> to the disk controller without some help.
> 
> What worries me with those NICs is that you have no control on the
> TCP stack. You often have to disable the acceleration when you
> want to insert even 1 firewall rule, use policy routing or even
> do a simple anti-spoofing check. It is exactly like the routers
> which do many things in hardware at wire speed, but jump to snail
> speed when you enable any advanced feature.
> 
>> > Also these types of solution always add quite a bit of overhead to
>> > connection setup/teardown making it actually a *loss* for the "many
>> > short connections" types of workloads. Now guess which things certain
>> > benchmarks use, and guess what real world servers do :)
>> > 
>> 
>> again, this depends on the application.
> 
> The speed itself depends on the application. An application which
> goal is to achieve 10 Gbps needs to be written with this goal in
> mind from start, and needs fine usage of the kernel internals, and
> even sometimes good knowledge of the hardware itself.

Alacritech says the hardware solution would make things very easy for
applications, because _every_ application would gain without having to
take the hardware it runs on into account. These are things CEOs like
to hear, because they think they could save time and money during
development of the application.


I don't think it has to be a problem that the hardware TCP stack does
not run any filters or other additional functions, because machines
(often clusters) with high workloads are usually dedicated servers with
separate dedicated firewall machines in front of them.


I think it would be good to support this hardware, because users can
then decide (after testing) which is the best choice for their
specific application and workload.



Kind regards,
Andreas Hartmann


More performance for the TCP stack by using additional hardware chip on NIC

2005-04-17 Thread Andreas Hartmann
Hello!

Alacritech has developed a new chip for NICs
(http://www.alacritech.com/html/tech_review.html) that makes it possible
to offload the TCP stack from the host CPU. According to Alacritech, this
leaves the host CPU more capacity for applications.

This sounds interesting.

Unfortunately, two patents cover this solution.

Now I'm wondering whether it is possible to implement support for these
chips in the Linux kernel. If this hardware solution really has the
advantages described by Alacritech, it would be a pity if Linux couldn't
use this hardware.

What do you think about that?



Kind regards,
Andreas Hartmann


Re: More performance for the TCP stack by using additional hardware chip on NIC

2005-04-17 Thread Andreas Hartmann
Willy Tarreau schrieb:
> Hello !
> 
> On Sun, Apr 17, 2005 at 01:29:14PM +0300, Avi Kivity wrote:
>> On Sun, 2005-04-17 at 12:07, Arjan van de Ven wrote:
>> > On Sun, 2005-04-17 at 10:17 +0200, Andreas Hartmann wrote:
>> > > Hello!
>> > > 
>> > > Alacritech developed a new chip for NIC's
>> > > (http://www.alacritech.com/html/tech_review.html), which makes it possible
>> > > to take away the TCP stack from the host CPU. Therefore, the host CPU has
>> > > more performance for the applications according Alacritech.
>> > 
>> > there are very many good reasons why this for linux is not the right
>> > solution, including the fact that the linux tcp/ip stack already is
>> > quite fast so the gains achieved aren't that stellar as the gains you
>> > get when comparing to windows.
>> > 
>> 
>> TOEs can remove the data copy on receive. In some applications (notably
>> storage), where the application does not touch most of the data, this is
>> a significant advantage that cannot be achieved in a software-only
>> solution.
>> 
> Well, if the application does not touch most of the data, either it
> is playing as a relay, and the data will at least have to be copied,
> or it will play as a client or server which reads from/writes to disk,
> and in this case, I wonder how the NIC will send its writes directly
> to the disk controller without some help.
> 
> What worries me with those NICs is that you have no control on the
> TCP stack. You often have to disable the acceleration when you
> want to insert even 1 firewall rule, use policy routing or even
> do a simple anti-spoofing check. It is exactly like the routers
> which do many things in hardware at wire speed, but jump to snail
> speed when you enable any advanced feature.
> 
>> > Also these types of solution always add quite a bit of overhead to
>> > connection setup/teardown making it actually a *loss* for the many
>> > short connections types of workloads. Now guess which things certain
>> > benchmarks use, and guess what real world servers do :)
>> > 
>> 
>> again, this depends on the application.
>> 
> The speed itself depends on the application. An application which
> goal is to achieve 10 Gbps needs to be written with this goal in
> mind from start, and needs fine usage of the kernel internals, and
> even sometimes good knowledge of the hardware itself.
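
(Side note on the relay case mentioned above: the copy in question can be
pictured with a plain userspace relay loop. This is only a minimal sketch,
assuming ordinary POSIX read()/write() on two file descriptors. The name
relay() is made up for illustration and nothing in it is specific to TOE
hardware or to any Alacritech interface.)

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: forward everything from in_fd to out_fd.
 * Every chunk is copied from the kernel into buf by read() and
 * back into the kernel by write(), even though the relay itself
 * never looks at the payload.
 * Returns the number of bytes relayed, or -1 on error.
 */
static long relay(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        ssize_t off = 0;

        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }

        /* write() may accept fewer bytes than requested */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

The user-space buffer in the middle is the copy a relay cannot avoid with
this kind of interface, whatever the NIC does.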

Alacritech says the hardware solution makes things very easy for the
application, because _every_ application would gain without having to take
the underlying hardware into account. These are things CEOs like to hear,
because they think they can save time and money during application
development.


I don't think it has to be a problem that the hardware TCP stack runs no
filters or other additional functions, because machines (often clusters)
with high workloads usually run as dedicated servers with separate,
dedicated firewall machines in front of them.


I think it would be good to support this hardware, because users can then
decide for themselves, after testing, which is the best choice for their
specific application and workload.



Kind regards,
Andreas Hartmann


crypting filesystems

2005-04-04 Thread Andreas Hartmann
Hello,

I want to encrypt some filesystems (/var, /home, /Data). All of these
partitions already run on LVM 1.

I searched for how to do this with Linux and found three ways to achieve
what I want:

1. crypto-loop (with kernel 2.6)
2. loop-AES (with kernel 2.2.x, 2.4.x and 2.6.x)
3. dm-crypt (with kernel 2.6.x)

Because I'm new to filesystem encryption, I looked for documentation on
all of these solutions and found that crypto-loop no longer seems to be
maintained. That left loop-AES and dm-crypt. dm-crypt uses the device
mapper concept, which I have known for a long time from LVM and which
therefore seems the most logical solution to me. There is no need to patch
the mount utility, and integration works "out of the box".

So I decided to use dm-crypt with 2.6.11.6. I built three partitions with
cryptsetup (LUKS), using ESSIV and 256-bit keys, on top of LVM 1 with
reiserfs as the filesystem. The swap partition is encrypted with a random
key that is generated at each boot.

Some questions remain open concerning the security and stability of this
solution.

1. So that I only have to enter the passphrase once at boot, I put the
passphrase in a gpg-encrypted file (cipher AES256, 256-bit key size),
which is decrypted at boot time to /tmp (-> tmpfs) and removed with shred
immediately after the three partitions have been activated. Is it possible
to see the cleartext password in tmpfs after this?

2. Is it possible to recover the passphrase from the active encrypted
partitions (because the passphrase is held somewhere in RAM)?

3. I read at clemens.endorphin.org about four different cipher modes (CBC,
CMC, EME and LRW). What dm-crypt currently implements is the public-IV
on-disk format and ESSIV, both using CBC mode. The other cipher modes
(CMC, EME, LRW) are not implemented yet, although they promise more
security.

My question is:
Has anybody been able to break one of these two implemented on-disk
formats, or, to put it another way: are the known problems still mainly a
theoretical discussion?

4. Do any master keys exist that could be used to open every encrypted
filesystem?

5. I read about problems (corrupted filesystems) with reiserfs (I'm using
version 3.6). Are they fixed in 2.6.11.6? Would it be better to use XFS?
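
(To make questions 1 and 2 more concrete: a userspace tool that handles the
passphrase can at least overwrite its own copy once it is no longer needed,
instead of leaving it in a file. The following is only a minimal sketch
under that assumption. wipe_secret() is a made-up helper, and it says
nothing about what dm-crypt itself keeps in kernel memory while a mapping
is active.)

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: overwrite a secret held in process memory.
 * The volatile pointer keeps the compiler from dropping the stores
 * as "dead" writes to a buffer that is never read again.
 */
static void wipe_secret(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL || len == 0);

    while (len--)
        *p++ = 0;
}
```

A caller would read the passphrase into a local buffer, hand it to the
setup step, and then call wipe_secret(buf, sizeof buf) before returning.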



I would be very glad if somebody could give me some advice.


Kind regards,
Andreas Hartmann




Re: 2.4.x oops with X

2005-02-05 Thread Andreas Hartmann
Andreas Hartmann wrote:
> Andreas Hartmann wrote:
> [...]
>> But now, the question is:
>> Why does X crash running kernel 2.4.x with glibc 2.3.4 and not with kernel
>> 2.6.10? Why does X run fine using kernel 2.4 and 2.6 with glibc 2.3.3?
>> 
>> ---------------------------------------
>>        |            glibc
>>        |   2.3.3        2.3.4
>> -------|-------------------------------
>> kernel |
>> 2.4    |   X ok         X segfaults
>> 2.6    |   X ok         X ok
> 
> 
> Meanwhile, I could find where X crashes using glibc 2.3.4 with kernel 2.4.
> It's this piece of code in linux_vm86.c:267
> 
> static int
> vm86_rep(struct vm86_struct *ptr)
> {
>     int __res;
> 
> #ifdef __PIC__
>     /* When compiling with -fPIC, we can't use asm constraint "b" because
>        %ebx is already taken by gcc. */
>     __asm__ __volatile__("pushl %%ebx\n\t"
>                          "movl %2,%%ebx\n\t"
>                          "movl %1,%%eax\n\t"
>                          "int $0x80\n\t"
>                          "popl %%ebx"
>                          :"=a" (__res)
>                          :"n" ((int)113), "r" ((struct vm86_struct *)ptr));
> #else
>     __asm__ __volatile__("int $0x80\n\t"
>                          :"=a" (__res):"a" ((int)113),
>                          "b" ((struct vm86_struct *)ptr));
> #endif
> 
>     if (__res < 0) {
>         errno = -__res;
>         __res = -1;
>     }
>     else errno = 0;
>     return __res;
> }
> 
> 
> The function ExecX86int10 (vbe.c) calls do_vm86 (linux_vm86.c), which
> calls vm86_rep (linux_vm86.c).
> 
> 
> I don't understand, why this piece of assembler code works fine with glibc
> 2.3.3, but not with glibc 2.3.4, running kernel 2.4.x. It works fine again
> with kernel 2.6.

A solution for this problem is now available at
https://bugs.freedesktop.org/show_bug.cgi?id=2431
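
(For readers following the quoted vm86_rep() code: its tail converts the raw
value left in %eax by "int $0x80" into the usual libc convention, where a
negative return value becomes errno plus a result of -1. The sketch below
isolates just that step. raw_to_libc() and raw_to_libc_example() are
made-up names for illustration and are not part of the X sources.)

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: map a raw kernel return value to the libc
 * convention, the same way the end of vm86_rep() does.
 */
static long raw_to_libc(long raw)
{
    if (raw < 0) {
        errno = (int)-raw;      /* e.g. -EFAULT becomes errno == EFAULT */
        return -1;
    }
    errno = 0;                  /* vm86_rep() also clears errno on success */
    return raw;
}

/* Small self-check showing the intended behaviour. */
static void raw_to_libc_example(void)
{
    assert(raw_to_libc(-EFAULT) == -1 && errno == EFAULT);
    assert(raw_to_libc(3) == 3 && errno == 0);
}
```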


Kind regards,
Andreas Hartmann




Re: Software Suspend for 2.4 Final Release

2005-01-30 Thread Andreas Hartmann
Nigel Cunningham wrote:
> Hi everyone.
> 
> SoftwareSuspend 2.1.5.7B for the 2.4.28 kernel is now available from
> softwaresuspend.berlios.de.
> 
> Bug fixes and forward ports to 2.4.29 and later kernels notwithstanding,
> it is intended to be the last release of SoftwareSuspend for the 2.4
> series kernels.
> 
> The 2.4 version of Suspend is generally pretty easily to get going, but
> if you have any questions or problems, you will find lots of resources
> at softwaresuspend.berlios.de. In particular, there are HOWTOs, FAQs,
> and a Wiki that you can consult before asking on the mailing lists
> you'll also find there.
> 
> Fuller instructions regarding applying the package can be found in the
> README file, included in the package.
> 
> Nigel


Re: Software Suspend for 2.4 Final Release

2005-01-30 Thread Andreas Hartmann
Nigel Cunningham wrote:
> Hi everyone.
> 
> SoftwareSuspend 2.1.5.7B for the 2.4.28 kernel is now available from
> softwaresuspend.berlios.de.

I'm wondering why you didn't provide a patch against 2.4.29.

Anyway, I tested it against 2.4.29. I couldn't apply the preemption patch.
The other patches could be applied with a few changes. 2.1.5.7B works
fine afterwards - even without restarting sleeping hard disks during
hibernation! Thank you very much for fixing this problem!



Kind regards,
Andreas Hartmann




Re: 2.6.10 dies when X uses PCI radeon 9200 SE, binary search result

2005-01-22 Thread Andreas Hartmann
Helge Hafting wrote:
> On Fri, Jan 21, 2005 at 09:05:12PM +0100, Andreas Hartmann wrote:
>> Hello Helge,
>> 
>> Helge Hafting schrieb:
>> > On Sun, Jan 16, 2005 at 10:41:23PM +1100, Dave Airlie wrote:
>> >> > 
>> >> > I'm fine with adding this code, but we still don't know if this is the
>> >> > cause of his problem. The debug output can determine if this really is
>> >> > the source of the problem or if it is somewhere else.
>> >> > 
>> >> 
>> >> I actually doubt it is this stuff.. my guess is that it is something
>> >> nasty like ACPI breaking int10 for X or something like that... it
>> >> seems a lot more subtle than the usually things that break when we
>> >> mess with the DRM :-)
>> 
>> Which glibc do you use? I have problems with glibc 2.3.4, kernel 2.4.x and
>> X / Xorg while executing the int10-code of X. glibc 2.3.3 works fine for
>> me. But I could find another posting, which describes, that there are even
>> problems with glibc 2.3.3 and kernel 2.4.x.
>> 
>> It's new for me, that there could be problems with kernelversions of 2.6, 
>> too.
>> 
>> Therefore, it would be really interessting to know, which glibc version
>> you are using.
>> 
> I use glibc 2.3.2 from debian testing (or unstable).  
> This is not the problem though, because a reboot into 2.6.8.1 makes
> X work without crashing.  The crash only happens with 2.6.9-rc2
> or later kernels.

Did you try another version of glibc?

> So the only way glibc could be the culprit, is if the newer kernel
> exports some new interface that this glibc manages to mess up.  Still,
> even a buggy glibc shouldn't hang the kernel anyway.

That's certainly correct.

> Such issues
> could crash (all) user apps, but shouldn't prevent the machine from
> responding to sysrq sequences.

You emphasize the differences in the effects. But all the cases I know of
share one cause: int10 crashes X or even the whole kernel.

I was able to narrow the problem down to the following point:

--
static int
vm86_rep(struct vm86_struct *ptr)
{
    int __res;

#ifdef __PIC__
    /* When compiling with -fPIC, we can't use asm constraint "b" because
       %ebx is already taken by gcc. */
    __asm__ __volatile__("pushl %%ebx\n\t"
                         "movl %2,%%ebx\n\t"
                         "movl %1,%%eax\n\t"
                         "int $0x80\n\t"
                         "popl %%ebx"
                         :"=a" (__res)
                         :"n" ((int)113), "r" ((struct vm86_struct *)ptr));
#else
    __asm__ __volatile__("int $0x80\n\t"
                         :"=a" (__res):"a" ((int)113),
                         "b" ((struct vm86_struct *)ptr));
#endif
    /* Comment from me */
    xf86MsgVerb(X_INFO,3,"my comment\n");
    if (__res < 0) {
        errno = -__res;
        __res = -1;
    }
    else errno = 0;
    return __res;
}

#endif
---

I could see that X crashes inside glibc 2.3.4 with kernel 2.4.x (not with
kernel 2.6.x for x <= 10; x > 10 not tested) during the first malloc call
after int10, which is made while executing the function
xf86MsgVerb(X_INFO,3,"my comment\n");


The crashes depend on the versions of the software in use:

glibc 2.3.3 or 2.3.4 with kernel 2.4.x
glibc 2.3.2 with kernel > 2.6.9rc2

I asked an X developer, but so far he couldn't help either.


I can't say whether glibc or the kernel is the problem. It can't be
reliably attributed to glibc, to the kernel, or to X. Therefore it
_seems_ to me that nobody really cares about the problem.

I'm willing to help find the problem - but I'm neither a kernel
developer nor a glibc developer nor an X developer. I depend on the
developers' support.

I think one developer from each project should work together to find the
problem. I could ask an X developer I know whether he is willing to help
track it down together with a developer from the kernel and one from
glibc (I don't know whom to ask on the glibc team).


Kind regards,
Andreas Hartmann




Re: 2.6.10 dies when X uses PCI radeon 9200 SE, binary search result

2005-01-21 Thread Andreas Hartmann
Hello Helge,

Helge Hafting wrote:
> On Sun, Jan 16, 2005 at 10:41:23PM +1100, Dave Airlie wrote:
>> > 
>> > I'm fine with adding this code, but we still don't know if this is the
>> > cause of his problem. The debug output can determine if this really is
>> > the source of the problem or if it is somewhere else.
>> > 
>> 
>> I actually doubt it is this stuff.. my guess is that it is something
>> nasty like ACPI breaking int10 for X or something like that... it
>> seems a lot more subtle than the usually things that break when we
>> mess with the DRM :-)

Which glibc do you use? I have problems with glibc 2.3.4, kernel 2.4.x and
X / Xorg while executing the int10 code of X. glibc 2.3.3 works fine for
me. But I found another posting which reports problems even with
glibc 2.3.3 and kernel 2.4.x.

It's new to me that there could also be problems with 2.6 kernel versions.

Therefore, it would be really interesting to know which glibc version
you are using.
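
(One way to report both versions from the affected machine is a few lines of
C. This is only a minimal sketch, assuming a glibc-based system, since
gnu_get_libc_version() is a glibc extension. describe_versions() is a
made-up helper name.)

```c
#include <assert.h>
#include <stdio.h>
#include <gnu/libc-version.h>   /* gnu_get_libc_version(), glibc only */
#include <sys/utsname.h>        /* uname() */

/*
 * Hypothetical helper: write "glibc X, kernel Y" into buf.
 * Returns 0 on success, -1 if uname() fails or buf is too small.
 */
static int describe_versions(char *buf, size_t len)
{
    struct utsname u;
    int n;

    assert(buf != NULL && len > 0);

    if (uname(&u) != 0)
        return -1;

    n = snprintf(buf, len, "glibc %s, kernel %s",
                 gnu_get_libc_version(), u.release);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Note that this reports the glibc the program is running against and the
kernel that is currently booted, which is the combination that matters
here.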


Kind regards,
Andreas Hartmann



