Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2017-01-03 Thread Mark Lord
On 17-01-02 07:40 PM, Ansis Atteka wrote:
..
> I think that I am getting closer to the root cause of this bug. Also,
> I have a workaround that at least makes r8152 functionally stable in
> my Dell TB15 dock. Mark, would you mind giving a chance to the patch
> that I have in the bottom of this email to see if it helps your issue
> too (you might have to tweak those settings slightly differently if
> you use something else than USB 3.0)

 /* USB_RX_EARLY_TIMEOUT */
-#define COALESCE_SUPER  85000U
-#define COALESCE_HIGH  25U
-#define COALESCE_SLOW  524280U
+#define COALESCE_SUPER  8500U
+#define COALESCE_HIGH  25000U
+#define COALESCE_SLOW  52428U

The RTL_VER_02 chip that I was using does not support interrupt coalescing
in the driver [see the rtl8152_set_coalesce() function].  So that workaround
would not help here.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-12-09 Thread Mark Lord
On 16-12-08 10:23 PM, Hayes Wang wrote:
> Mark Lord <ml...@pobox.com>
> 
> I find an issue about autosuspend, and it may result in the same
> problem with you. I don't sure if this is helpful to you, because
> it only occurs when enabling the autosuspend.

Thanks.  I am using ASIX adapters now.

I did try the latest 4.9-rc8, and 4.8.12 kernels with the r8152 dongle 
yesterday,
in hope that perhaps the many EHCI fixes from those kernels might help out.

The dongle was unusable with those newer kernels.
Most of the time it failed with "Get ether addr fail\n" at startup.

On the occasions where it got past that point, it often failed
the DHCP negotiation, but this looks more like a bug elsewhere in
the kernel, possibly racing against initialization of the random
number generators.  Adding a 2-second sleep the the r8151 probe
function made this error mostly go away.

Cheers
-- 
Mark Lord


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 09:22 AM, Greg KH wrote:
> On Fri, Nov 25, 2016 at 07:41:42AM -0500, Mark Lord wrote:
>> On 16-11-25 07:34 AM, Mark Lord wrote:
>>> On 16-11-25 04:53 AM, Greg KH wrote:
>>>> Note, there are "cheap" USB monitors that can be quite handy and that work 
>>>> on Linux:
>>>>http://www.totalphase.com/products/beagle-usb12/
>>>
>>> USD$455/each in quantity, vs. USD$8 for the USB ethernet dongle.
>>
>> Oh, wrong model.  That one doesn't do USB2.
>> The USB2 version is a mere USD$1300 in quantity.
>>
>> Seems like rather a lot of money just to report a bug in a USB driver.
>> Perhaps the Linux Foundation might purchase one and loan it for this task?
> 
> You already have access to a USB analyzer you said, why would I try to
> buy one and ship it around the world instead?  Makes no sense...

No, the company where I am consulting has a paperweight called a "USB analyzer".
It doesn't work with Linux machines.

You are the one who suggested purchase of a working Linux compatible unit,
so I was just following up to see if you were serious about that.

No worries.
I'll see if the paperweight can be converted into something useful next week.

Cheers


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 04:52 AM, Hayes Wang wrote:
..
> What is the value of /sys/bus/usb/devices/.../power/control ?

That entry does not exist -- power control is completely
disabled on this board.

Good try, though -- USB power control still causes me trouble
on PCs with mice and remote controls.  But not here.



Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 04:53 AM, Greg KH wrote:
> On Thu, Nov 24, 2016 at 10:49:33PM -0500, Mark Lord wrote:
>> There is no possibility for them to be used for anything other than
>> USB receive buffers, for this driver only.  Nothing in the driver
>> or kernel ever writes to those buffers after initial allocation,
>> and only the driver and USB host controller ever have pointers to the 
>> buffers.
> 
> You really are going to have to break out that USB monitor to verify
> that this is the data coming across the wire.

Not sure why, because there really is no other way for the data to
appear where it does at the beginning of that URB buffer.

This does seem a rather unexpected burden to place upon someone
reporting a regression in a USB network driver that corrupts user data.

I have already spent about 50 hours looking at this issue,
and everything now points firmly at some kind of FIFO overflow
within the dongle itself.  There is no evidence to the contrary.

I am very happy to test any driver updates, or data collection mods
provided by the author, to help the author find/fix the issue.

One idea, might be to have the author try testing with the dongle
connected through a USB1.1 hub, forcing it to slower speeds.
This might make reproducing the issue (if indeed a FIFO overflow)
easier, as the host transfers will then be slower than the
ethernet wire speed.

I have access to the hardware here next Tuesday.
If we can scrounge up the USB analyzer, cables, and a suitable
MS-Windows (ugh) machine of some kind, then I'll see if it can
be programmed to somewhow capture the event.  Probably just set it
in continuous capture mode, and have the target system halt
when it sees bad data at offset zero.

This can take days to reproduce, so don't hold your breaths.

Something useful to do in the meanwhile, is to then think
about "what next" after the analyzer confirms the issue.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 04:53 AM, Greg KH wrote:
> Note, there are "cheap" USB monitors that can be quite handy and that work on 
> Linux:
>   http://www.totalphase.com/products/beagle-usb12/

USD$455/each in quantity, vs. USD$8 for the USB ethernet dongle.



Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 07:34 AM, Mark Lord wrote:
> On 16-11-25 04:53 AM, Greg KH wrote:
>> Note, there are "cheap" USB monitors that can be quite handy and that work 
>> on Linux:
>>  http://www.totalphase.com/products/beagle-usb12/
> 
> USD$455/each in quantity, vs. USD$8 for the USB ethernet dongle.

Oh, wrong model.  That one doesn't do USB2.
The USB2 version is a mere USD$1300 in quantity.

Seems like rather a lot of money just to report a bug in a USB driver.
Perhaps the Linux Foundation might purchase one and loan it for this task?


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 01:51 AM, Hayes Wang wrote:
>
> Forgive me. I provide wrong information. This is about RTL8153, not RTL8152.

No problem.  Thanks for trying though.


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-25 Thread Mark Lord
On 16-11-25 01:11 AM, Hayes Wang wrote:
> Mark Lord [mailto:ml...@pobox.com]
..
>> If that "return 0" statement is ever executed, doesn't it result
>> in the loss/leak of a buffer?
> 
> They would be found back by calling rtl_start_rx(), when the rx
> is restarted.

Good. I figured it was probably something like that, but wasn't
entirely sure about the control flow around stop/restart there.

Thanks.


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord
On 16-11-24 07:27 PM, Francois Romieu wrote:
>
> Through aliasing the URB was given a page that contains said (previously)
> received file. The ethernet chip/usb host does not write anything in it.

I don't see how that could be possible.  Please elaborate.

The URB buffers are statically allocated by the driver at probe time,
ten of them in all, allocated with usb_alloc_coherent() in the copy of
the driver I am testing with.

There is no possibility for them to be used for anything other than
USB receive buffers, for this driver only.  Nothing in the driver
or kernel ever writes to those buffers after initial allocation,
and only the driver and USB host controller ever have pointers to the buffers.
-- 
Mark Lord


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord
On 16-11-24 02:00 PM, Greg KH wrote:
> On Thu, Nov 24, 2016 at 01:34:08PM -0500, Mark Lord wrote:
>> One thought:  bulk data streams are byte streams, not packets.
>> Scheduling on the USB bus can break up larger transfers across
>> multiple in-kernel buffers.  A "real" URB buffer on USB2 is max 512 bytes.
>> The driver is providing 16384-byte buffers, and assumes that data will
>> never spill over from one such buffer to the next.
>> Yet the observations here consistently show otherwise.
> 
> Wait, how do you know that data will not spill over?  What is making
> that guarantee?  Will the USB device send a "zero packet" in order to
> show that all of the "logical" data is now sent for this specific
> endpoint?  Is there some sort of "framing" that the device does with the
> USB data so that the driver "knows" where the end of packet is?

Exactly my point.

> Check the zero-packet stuff for this device, that's tripped up many a
> USB driver writer over the years, myself included.

I haven't tripped over it myself, but only because we were careful
to allow for such in the USB drivers I have worked on.

The r8152 driver just assumes it never happens.


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord
On 16-11-24 01:42 PM, Greg KH wrote:
>
> Have you tried using usbmon?

This system is running rootfs over NFS, so usbmon
isn't realistically going to be usable in that scenario
without a lot of reconfiguration of the setup (which in itself
might obscure the original problem).

There is a hardware USB analyzer in the building though.

But it requires a MS-Windows machine (very scarce here, I don't have one)
for the incredibly user-unfriendly software.  I'm not sure if it can be
setup to stop the trace somehow at the right point either, as it takes
overnight runs usually to catch an occurrence of the issue.

I also seem to recall that it only exports data captures in a proprietary
format that only that brand of software/device can read, but perhaps
that might not be true.  Would still need to find a MS-Windows machine/license
to even check it out though.



Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord
On 16-11-24 01:34 PM, Mark Lord wrote:
>From tracing through the powerpc arch code, this is the buffer that
> is being directly DMA'd into.  And the USB layer does an invalidate_dcache
> on that entire buffer before initiating the DMA (confirmed via printk).

Slight correction:  the invalidate_dcache_range() is only done when
using kmalloc'd buffers.  I have converted the driver here
to use usb_alloc_coherent() instead, so that now gets skipped
since the memory is never cached.


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord
On 16-11-24 12:11 PM, David Miller wrote:
> From: Mark Lord <ml...@pobox.com>
> Date: Thu, 24 Nov 2016 11:43:53 -0500
> 
>> So even if this were a platform memory coherency issue, one should
>> still never see ASCII data at the beginning of an rx buffer.
> 
> I'm not so convinced, since this is the kind of random corruption one
> would expect to see when dealing with virtual caches that have
> aliasing or similar issues.
> 
> Writes to address X that show up at address Y or not at all are
> precisely the signature of virtual cache aliasing problems.
> 
> Is it a case of the chip writing to X but the cpu is still seeing
> stale data from a previous CPU store?
> 
> For NFS the cpu is writing into the page cache, so we know that
> cpu side stores are where the ASCII text is coming from.
> 
> Now is the r8152 buffer one that the USB host controller is DMA'ing
> into directly, or is it one that SWIOMMU or similar bounce buffering
> is copying into?  In the latter case we are doing cpu stores into
> the area and the writes aren't coming from the device.

>From tracing through the powerpc arch code, this is the buffer that
is being directly DMA'd into.  And the USB layer does an invalidate_dcache
on that entire buffer before initiating the DMA (confirmed via printk).

The driver itself NEVER writes anything to that buffer,
and nobody else has a pointer to it other than the USB host controller,
so there's nothing else that can write to it either.

According to the driver writer, the chip should only ever write a fresh
rx_desc struct at the beginning of a buffer, never ASCII data.

So how does that buffer end up containing ASCII data from the NFS transfers?

The only explanation I can see, is if the URB itself contains
the data that we see in the URB buffer.  Which is what one would expect.
So for that to happen, the ethernet chip must be transferring that data.

The thing that is special about the situation here, is that the processor
is very slow (800Mhz 32-bit powerpc), and very busy elsewhere.
So it can easily fall way behind in servicing the ethernet dongle,
something that never happens with most modern faster machines.
So perhaps this results in a FIFO overflow somewhere in the chip.

We can boot/run this same machine from a USB memory stick, and nary a problem.
Ditto for other types of ethernet dongles.
But boot/run from that specific ethernet dongle, and we get regular
random segfaults from corrupted page fetches over NFS.

The only end-to-end data integrity available here is the rx checksum,
when verified by software rather than trusting it to the chip/driver.

One thought:  bulk data streams are byte streams, not packets.
Scheduling on the USB bus can break up larger transfers across
multiple in-kernel buffers.  A "real" URB buffer on USB2 is max 512 bytes.
The driver is providing 16384-byte buffers, and assumes that data will
never spill over from one such buffer to the next.
Yet the observations here consistently show otherwise.

Cheers
-- 
Mark Lord


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord

On 16-11-24 11:43 AM, Mark Lord wrote:
..

But how does this ASCII data end up at offset zero of the rx buffer??
Not possible -- this isn't even stale data, because only an rx_desc could
be at that offset in that buffer.


Answering my own question here, I suspect it ends up there as a result
of overrunning the previous URB.  So I have updated the test copy of the driver
here now to check for that exact situation.  It's running now, but could take
hours or a day for the bug to occur again.

It seems I am being overly helpful here.

Perhaps I should have just stopped with the original regression report
(driver works in 3.12.xx, fails on all newer kernels, as a result of enabling
hardware checksums).

Had I left it there, one might reasonably expect the onus to be on the driver
developer to sort it out, with me providing retests of supplied patches as need 
be.

But I've gone WAY BEYOND that, even questioning the sanity of the platform on
which it is being used, just to avoid blaming a buggy USB dongle for some other 
issue.
And this is leading people to suspect that I really think the platform is buggy.

It isn't.   It's been running for years, with a variety of USB hardware 
attached,
and nary a problem.  Except with this r8152 dongle on kernels > 3.12.

So, yeah, the driver is fixed in our local tree, and has been for some time now.
I just was hoping that perhaps others might be interested in it too,
since the bug (whatever it is) corrupts data on the NFS server.

Cheers




Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord

On 16-11-24 11:21 AM, David Miller wrote:

From: Hayes Wang 
Date: Thu, 24 Nov 2016 13:26:55 +


I don't think the garbage results from our driver or device.

This is my impression with what has been presented so far as well.


It's not garbage.

The latest run with the debug code I posted here earlier just spat out this 
below.
Using coherent (guarded, non-cacheable) RX buffers, with mb() calls:

[   15.199157] r8152_check_rx_desc: rx_desc looks bad.
[   15.204270] r8152_rx_bottom: offset=0/3376 bad rx_desc
[   15.209584] r8152_dump_rx_desc: 3d435253 3034336d 202f3a30 47524154 2f3d5445 
3034336d rx_len=21075

The bad data in this case is ASCII:

"SRC=m3400:/ TARGET=/m340"

This data is what is seen in /run/mount/utab, a file that is read/written over 
NFS on each boot.

"SRC=m3400:/ TARGET=/m3400 ROOT=/ ATTRS=nolock,addr=192.168.8.1\n"

But how does this ASCII data end up at offset zero of the rx buffer??
Not possible -- this isn't even stale data, because only an rx_desc could
be at that offset in that buffer.

So even if this were a platform memory coherency issue, one should still
never see ASCII data at the beginning of an rx buffer.  The driver NEVER
writes anything to the rx buffers.  Only the USB hardware ever does.

And only the r8152 dongle/driver exhibits this issue.
Other USB dongles do not.  They *might* still have such issues,
but because they use software checksums, the bad packets are caught/rejected.

The r8152 driver, without the debug/error-checking additions, would have tried
to interpret that ASCII data as an "rx_desc", and would have interpreted the
"checksum bits" therein as "valid checksum", and the packet would have passed
through the network stack, corrupting data.

This driver worked without noticeable issues in 3.12.xx.
It hasn't worked since.  Because it now trusts the hardware checksums,
without first checking to see if noise-on-the-line or something else
has corrupted the data before receipt in the rx buffer.

Based on the above capture, I suspect a bug in the chip itself, which perhaps
is only manifest on a very slow CPU.

Nobody here tests with slow CPUs, but they are very prevalent in embedded space.
And very few people use USB network dongles nowadays either, as nearly all 
"computers"
have built-in networking.  The market for USB network dongles is mostly 
embedded space.

Ergo.

Cheers


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord

On 16-11-24 08:26 AM, Hayes Wang wrote:
..

Besides, it doesn't seem to occur for all platforms. I have
tested the iperf more than 26 hours, and it still works fine.
I think I would get the same result on x86 or x86_64 platform.

..

x86 has near fully-coherent memory, so it is the "easy" platform
to get things working on.  But Linux supports a very diverse number
of platforms, with varying degrees of cache/memory coherency,
and it can be tricky for things to work correctly on all of them.

If you are testing with the driver as currently in 4.4.34,
then you won't even notice when things are screwing up,
because the driver just silently drops packets.
Or it passes them on without noticing that they have bad data.

Here (attached) is the instrumented driver I am using here now.
I suggest you use it or something similar when testing,
and not the stock driver.

This one has also been converted to use non-cacheable RAM for the
receive buffers -- something that is probably a Good Thing
for it to do regardless of this investigation.

It also never drops a packet without logging the event,
so we can see just how often there's an issue.

This version behaves almost perfectly here, but I am still experimenting
to see what is actually necessary, and what is not.  In particular,
there are some mb() calls I had put in there that shouldn't be required,
so I have yet to try removing them again and see what changes.

It takes at least an overnight run to pop up one or two errors,
so do expect to hear back again until after the weekend at this point.

Also, unrelated, but inside r8152_submit_rx() there is this code:

/* The rx would be stopped, so skip submitting */
if (test_bit(RTL8152_UNPLUG, >flags) ||
!test_bit(WORK_ENABLE, >flags) || !netif_carrier_ok(tp->netdev))
   return 0;

If that "return 0" statement is ever executed, doesn't it result
in the loss/leak of a buffer?

Thanks


/*
 *  Copyright (c) 2014 Realtek Semiconductor Corp. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * version 2 as published by the Free Software Foundation.
 *
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/* Information for net-next */
#define NETNEXT_VERSION		"08"

/* Information for net */
#define NET_VERSION		"2"

#define DRIVER_VERSION		"v1." NETNEXT_VERSION "." NET_VERSION
#define DRIVER_AUTHOR "Realtek linux nic maintainers "
#define DRIVER_DESC "Realtek RTL8152/RTL8153 Based USB Ethernet Adapters"
#define MODULENAME "r8152"

#define R8152_PHY_ID		32

#define PLA_IDR			0xc000
#define PLA_RCR			0xc010
#define PLA_RMS			0xc016
#define PLA_RXFIFO_CTRL0	0xc0a0
#define PLA_RXFIFO_CTRL1	0xc0a4
#define PLA_RXFIFO_CTRL2	0xc0a8
#define PLA_DMY_REG0		0xc0b0
#define PLA_FMC			0xc0b4
#define PLA_CFG_WOL		0xc0b6
#define PLA_TEREDO_CFG		0xc0bc
#define PLA_MAR			0xcd00
#define PLA_BACKUP		0xd000
#define PAL_BDC_CR		0xd1a0
#define PLA_TEREDO_TIMER	0xd2cc
#define PLA_REALWOW_TIMER	0xd2e8
#define PLA_LEDSEL		0xdd90
#define PLA_LED_FEATURE		0xdd92
#define PLA_PHYAR		0xde00
#define PLA_BOOT_CTRL		0xe004
#define PLA_GPHY_INTR_IMR	0xe022
#define PLA_EEE_CR		0xe040
#define PLA_EEEP_CR		0xe080
#define PLA_MAC_PWR_CTRL	0xe0c0
#define PLA_MAC_PWR_CTRL2	0xe0ca
#define PLA_MAC_PWR_CTRL3	0xe0cc
#define PLA_MAC_PWR_CTRL4	0xe0ce
#define PLA_WDT6_CTRL		0xe428
#define PLA_TCR0		0xe610
#define PLA_TCR1		0xe612
#define PLA_MTPS		0xe615
#define PLA_TXFIFO_CTRL		0xe618
#define PLA_RSTTALLY		0xe800
#define PLA_CR			0xe813
#define PLA_CRWECR		0xe81c
#define PLA_CONFIG12		0xe81e	/* CONFIG1, CONFIG2 */
#define PLA_CONFIG34		0xe820	/* CONFIG3, CONFIG4 */
#define PLA_CONFIG5		0xe822
#define PLA_PHY_PWR		0xe84c
#define PLA_OOB_CTRL		0xe84f
#define PLA_CPCR		0xe854
#define PLA_MISC_0		0xe858
#define PLA_MISC_1		0xe85a
#define PLA_OCP_GPHY_BASE	0xe86c
#define PLA_TALLYCNT		0xe890
#define PLA_SFF_STS_7		0xe8de
#define PLA_PHYSTATUS		0xe908
#define PLA_BP_BA		0xfc26
#define PLA_BP_0		0xfc28
#define PLA_BP_1		0xfc2a
#define PLA_BP_2		0xfc2c
#define PLA_BP_3		0xfc2e
#define PLA_BP_4		0xfc30
#define PLA_BP_5		0xfc32
#define PLA_BP_6		0xfc34
#define PLA_BP_7		0xfc36
#define PLA_BP_EN		0xfc38

#define USB_USB2PHY		0xb41e
#define USB_SSPHYLINK2		0xb428
#define USB_U2P3_CTRL		0xb460
#define USB_CSR_DUMMY1		0xb464
#define USB_CSR_DUMMY2		0xb466
#define USB_DEV_STAT		0xb808
#define USB_CONNECT_TIMER	0xcbf8
#define USB_BURST_SIZE		0xcfc0
#define USB_USB_CTRL		0xd406
#define USB_PHY_CTRL		0xd408
#define USB_TX_AGG		0xd40a
#define USB_RX_BUF_TH		0xd40c
#define USB_USB_TIMER		0xd428
#define USB_RX_EARLY_TIMEOUT	0xd42c
#define USB_RX_EARLY_SIZE	0xd42e
#define USB_PM_CTRL_STATUS	0xd432
#define USB_TX_DMA		0xd434
#define USB_TOLERANCE		0xd490
#define USB_LPM_CTRL		0xd41a
#define USB_UPS_CTRL	

Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-24 Thread Mark Lord

On 16-11-23 02:29 PM, Mark Lord wrote:

On 16-11-23 10:12 AM, Hayes Wang wrote:

Mark Lord [ml...@pobox.com]
[...]

What does this code do:



static void r8153_set_rx_early_size(struct r8152 *tp)
{
   u32 mtu = tp->netdev->mtu;
   u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4;

   ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data);
}


This only works for RTL8153. However, what you use is RTL8152.
It is like delay completion. It is used to reduce the loading of CPU
by letting a transfer contain more data to reduce the number of
transfers.


How is ocp_data used by the hardware?
Shouldn't the calculation also include sizeof(rx_desc) in there somewhere?


The algorithm is from our hw engineers, and it should be

   (agg_buf_sz - packet size) / 8

You could refer to commit a59e6d815226 ("r8152: correct the rx early size").


Thanks.

Right now I am working quite hard trying to narrow things down exactly.
You are correct that the driver does appear to be careful about accesses
beyond the filled portion of a URB buffer -- for some reason I thought
the original driver had issues there, but looking again it does not seem to.

One idea that is now looking more likely:
Things could be suffering from speculative CPU accesses to RAM
(the system here has non-coherent d-cache/RAM).
This could incorrectly pre-load data from adjacent URB buffers
into the d-cache, creating coherency issues.  I am testing now
with cacheline-sized guard zones between the buffers to see if
that is the issue or not.


Nope.  Guard zones did not fix it, so it's probably not a prefetch issue.
Oddly, adding a couple of memory barriers to specific places in the driver
does help, A LOT.  Still not 100%, but it did pass 1800 reboot tests over night
with only three bad rx_desc's reported.

That's a new record here for the driver using kmalloc'd buffers,
and put reliability on par with using non-cacheable buffers.

Any way we look at it though, the chip/driver are simply unreliable,
and relying upon hardware checksums (which fail due to the driver
looking at garbage rather than the checksum bits) leads to data corruption.

Cheers





Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-23 Thread Mark Lord

On 16-11-23 10:12 AM, Hayes Wang wrote:

Mark Lord [ml...@pobox.com]
[...]

What does this code do:



static void r8153_set_rx_early_size(struct r8152 *tp)
{
   u32 mtu = tp->netdev->mtu;
   u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4;

   ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data);
}


This only works for RTL8153. However, what you use is RTL8152.
It is like delay completion. It is used to reduce the loading of CPU
by letting a transfer contain more data to reduce the number of
transfers.


How is ocp_data used by the hardware?
Shouldn't the calculation also include sizeof(rx_desc) in there somewhere?


The algorithm is from our hw engineers, and it should be

   (agg_buf_sz - packet size) / 8

You could refer to commit a59e6d815226 ("r8152: correct the rx early size").


Thanks.

Right now I am working quite hard trying to narrow things down exactly.
You are correct that the driver does appear to be careful about accesses
beyond the filled portion of a URB buffer -- for some reason I thought
the original driver had issues there, but looking again it does not seem to.

One idea that is now looking more likely:
Things could be suffering from speculative CPU accesses to RAM
(the system here has non-coherent d-cache/RAM).
This could incorrectly pre-load data from adjacent URB buffers
into the d-cache, creating coherency issues.  I am testing now
with cacheline-sized guard zones between the buffers to see if
that is the issue or not.

Worth repeating: other dongles we have tried, eg. those using the asix driver,
do not cause us any troubles here.  Only the r8152 dongles do.

The other drivers do not use hardware checksums, so even if they did
incur similar bad packets, whatever the reason, those bad packets
would be detected/rejected by the Linux network stack (software checksums).
So everything appears to behave fine with them, as it does with
the r8152 driver when hardware checksums are disabled.

Still trying to understand exactly how these errors are happening.
It takes a very long time to do a conclusive test of anything here,
and I only have the hardware for a day or two a week.
So my apologies if I am slow in getting back to you on stuff.

Cheers





Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-23 Thread Mark Lord
What does this code do:

>static void r8153_set_rx_early_size(struct r8152 *tp)
>{
>u32 mtu = tp->netdev->mtu;
>u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4;
>
>ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data);
>}

How is ocp_data used by the hardware?
Shouldn't the calculation also include sizeof(rx_desc) in there somewhere?

Thanks
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-22 Thread Mark Lord

On 16-11-18 07:03 AM, Mark Lord wrote:

On 16-11-18 02:57 AM, Hayes Wang wrote:
..

Besides, the maximum data length which the RTL8152 would send to
the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
wouldn't split it. However, you still see problems for it.


How does the RTL8152 know that the limit is 16KB,
rather than some other number?  Is this a hardwired number
in the hardware, or is it a parameter that the software
sends to the chip during initialization?

..

The first issue is that a packet sometimes begins in one URB,
and completes in the next URB, without an rx_desc at the start
of the second URB.  This I have already reported earlier.


Long run tests over the weekend, with the invalidate_dcache_range() call
before the inner loop of r8152_rx_bottom(), turned up a few instances
where packets were truncated inside a 16384 byte URB buffer, without filling 
the URB.

[10.293228] r8152_rx_bottom: 4278 corrupted urb: head=9d21 
urb_offset=2856/3376 pkt_len(1518) exceeds remainder(496)
[10.304523] r8152_dump_rx_desc: 044805ee 4008 006005dc 0602  
 rx_len=1518
..
[   16.660431] r8152_rx_bottom: 7802 corrupted urb: head=9d1f8000 
urb_offset=1544/2064 pkt_len(1518) exceeds remainder(496)
[   16.671719] r8152_dump_rx_desc: 044805ee 4048 004005dc 46020006  
 rx_len=1518

The r8152.c driver attempted to build skb's for the entire packet size,
even though the 1518-byte packets had only 496-bytes of data in the URB.
It is not clear what the chip did with the rest of the packets in question,
but the next URBs in each case began with a new/real rx_desc and new packet.

There were also unconnected events during the test runs where the
test code noticed totally invalid rx_desc structs in the middles of URBs.
The stock driver would again have attempted to treat those as "valid" (ugh).

..
[   10.273906] r8152_check_rx_desc: rx_desc looks bad.
[   10.279012] r8152_rx_bottom: 4338 corrupted urb. head=9d21 
urb_offset=2856/3376 len_used=2880
[   10.288196] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 
202f3a30 rx_len=12857

..
[7.184565] r8152_check_rx_desc: rx_desc looks bad.
[7.189657] r8152_rx_bottom: 1678 corrupted urb. head=9d21 
urb_offset=2856/3376 len_used=2880
[7.198852] r8152_dump_rx_desc: a1388402 803c9001 84380810 a67c5c4c a77c782b 
c64c782b rx_len=1026
..
[   10.351251] r8152_check_rx_desc: rx_desc looks bad.
[   10.356356] r8152_rx_bottom: 4397 corrupted urb. head=9d20c000 
urb_offset=4400/7984 len_used=4424
[   10.365543] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 
202f3a30 rx_len=12857
..
[   10.518119] r8152_check_rx_desc: rx_desc looks bad.
[   10.523204] r8152_rx_bottom: 4458 corrupted urb. head=9d21 
urb_offset=4400/7984 len_used=4424
[   10.532416] r8152_dump_rx_desc: 54544120 6e3d5352 636f6c6f 65762c6b 343d7372 
6464612c rx_len=16672
..


But the driver, as written, sometimes accesses bytes outside
of the 16KB URB buffer, because it trusts the non-existent
rx_desc in these cases, and also because it accesses bytes
from the rx_desc without first checking whether there is
sufficient remaining space in the URB to hold an rx_desc.

These incorrect accesses sometimes touch memory outside
of the URB buffer.  Since the driver allocates all of its
rx URB buffers at once, they are highly likely to be
physically (and therefore virtually) adjacent in memory.

So mistakenly accessing beyond the end of one buffer will
often result in a read from memory of the next URB buffer.
Which causes a portion of it to be loaded in the the D-cache.

When that URB is subsequently filled by DMA, there then exists
a data-consistency issue:  the D-cache contains stale information
from before the latest DMA cycle.

So this explains the strange memory behaviour observed earlier on.
When I add a call to invalidate_dcache_range() to the driver
just before it begins examining a new rx URB, the problems go away.
So this confirms the observations.

Using non-cacheable RAM also makes the problem go away.
But neither is a fix for the real buffer overrun accesses in the driver.

Fix the "packet spans URBs" bug, and fix the driver to ALWAYS
test lengths/ranges before accessing the actual buffer,
and everything should begin working reliably.


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-18 Thread Mark Lord
On 16-11-18 02:57 AM, Hayes Wang wrote:
..
> Besides, the maximum data length which the RTL8152 would send to
> the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
> wouldn't split it. However, you still see problems for it.

How does the RTL8152 know that the limit is 16KB,
rather than some other number?  Is this a hardwired number
in the hardware, or is it a parameter that the software
sends to the chip during initialization?

I have a USB analyzer, but it is difficult to figure out how
to program an appropriate trigger point for the capture,
since the problem (with 16KB URBs) takes minutes to hours
or even days to trigger.

And the output from the analyzer is in some proprietary format.
The in-kernel software analzer could be useful, but I have never
figured out how to use it.  :)

Since my earlier email, I have figured out another piece of the
puzzle with this dongle.

The first issue is that a packet sometimes begins in one URB,
and completes in the next URB, without an rx_desc at the start
of the second URB.  This I have already reported earlier.

But the driver, as written, sometimes accesses bytes outside
of the 16KB URB buffer, because it trusts the non-existent
rx_desc in these cases, and also because it accesses bytes
from the rx_desc without first checking whether there is
sufficient remaining space in the URB to hold an rx_desc.

These incorrect accesses sometimes touch memory outside
of the URB buffer.  Since the driver allocates all of its
rx URB buffers at once, they are highly likely to be
physically (and therefore virtually) adjacent in memory.

So mistakenly accessing beyond the end of one buffer will
often result in a read from memory of the next URB buffer.
Which causes a portion of it to be loaded in the the D-cache.

When that URB is subsequently filled by DMA, there then exists
a data-consistency issue:  the D-cache contains stale information
from before the latest DMA cycle.

So this explains the strange memory behaviour observed earlier on.
When I add a call to invalidate_dcache_range() to the driver
just before it begins examining a new rx URB, the problems go away.
So this confirms the observations.

Using non-cacheable RAM also makes the problem go away.
But neither is a fix for the real buffer overrun accesses in the driver.

Fix the "packet spans URBs" bug, and fix the driver to ALWAYS
test lengths/ranges before accessing the actual buffer,
and everything should begin working reliably.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-17 Thread Mark Lord

(resending.. not sure if the original had mailer errors)

On 16-11-16 10:36 PM, Hayes Wang wrote:
> [...]
>> Fix the hw rx checksum is always enabled, and the user couldn't switch
>> it to sw rx checksum.
>>
>> Note that the RTL_VER_01 only supports sw rx checksum only. Besides,
>> the hw rx checksum for RTL_VER_02 is disabled after
>> commit b9a321b48af4 ("r8152: Fix broken RX checksums."). Re-enable it.
>
> Excuse me. If I want to re-send this one patch, should I let
> RTL_VER_02 use rx hw checksum?

Definitely NOT.

I am still doing low-level tracing through the driver as time permits,
and just now found some really interesting evidence.

Using coherent buffers (non-cacheable, allocated with usb_alloc_coherent),
I can get it to fail extremely regularly by simply reducing the buffer size
(agg_buf_sz) from 16KB down to 4KB.   This makes reproducing the issue
much much easier -- the same problems do happen with the larger 16KB size,
but much less often than with smaller sizes.

So.. with a 4KB URB transfer_buffer size, along with a ton of added 
error-checking,
I see this behaviour every 10 (rx) URBs or so:

First URB (number 593):
[   34.260667] r8152_rx_bottom: 593 corrupted urb: head=bf014000 
urb_offset=2856/4096 pkt_len(1518) exceeds remainder(1216)
[   34.271931] r8152_dump_rx_desc: 044805ee 4008 006005dc 0602  
 rx_len=1518

Next URB (number 594):
[   34.281172] r8152_check_rx_desc: rx_desc looks bad.
[   34.286228] r8152_rx_bottom: 594 corrupted urb. head=bf018000 
urb_offset=0/304 len_used=24
[   34.294774] r8152_dump_rx_desc: 8300 8400 8500 8600 8700 
8800 rx_len=768

What the above sample shows, is the URB transfer buffer ran out of space in the 
middle
of a packet, and the hardware then tried to just continue that same packet in 
the next URB,
without an rx_desc header inserted.  The r8152.c driver always assumes the URB 
buffer begins
with an rx_desc, so of course this behaviour produces really weird effects, and 
system crashes, etc..

So until that driver bug is addressed, I would advise disabling hardware RX 
checksums
for all chip versions, not only for version 02.

It is not clear to me how the chip decides when to forward an rx URB to the 
host.
If you could describe how that part works for us, then it would help in further
understanding why fast systems (eg. a PC) don't generally notice the issue,
while much slower embedded systems do see the issue regularly.

Thanks
Mark



Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-17 Thread Mark Lord

On 16-11-17 09:14 AM, Mark Lord wrote:
..

Using coherent buffers (non-cacheable, allocated with usb_alloc_coherent),


Note that the same behaviour also happens with the original kmalloc'd buffers.


I can get it to fail extremely regularly by simply reducing the buffer size
(agg_buf_sz) from 16KB down to 4KB.   This makes reproducing the issue
much much easier -- the same problems do happen with the larger 16KB size,
but much less often than with smaller sizes.


Increasing the buffer size to 64KB makes the problem much less frequent,
as one might expect.  Thus far I haven't seen it happen at all, but a longer
run (1-3 days) is needed to make sure.  This however is NOT a "fix".


So.. with a 4KB URB transfer_buffer size, along with a ton of added 
error-checking,
I see this behaviour every 10 (rx) URBs or so:

First URB (number 593):
[   34.260667] r8152_rx_bottom: 593 corrupted urb: head=bf014000 
urb_offset=2856/4096 pkt_len(1518) exceeds remainder(1216)
[   34.271931] r8152_dump_rx_desc: 044805ee 4008 006005dc 0602  
 rx_len=1518

Next URB (number 594):
[   34.281172] r8152_check_rx_desc: rx_desc looks bad.
[   34.286228] r8152_rx_bottom: 594 corrupted urb. head=bf018000 
urb_offset=0/304 len_used=24
[   34.294774] r8152_dump_rx_desc: 8300 8400 8500 8600 8700 
8800 rx_len=768

What the above sample shows, is the URB transfer buffer ran out of space in the 
middle
of a packet, and the hardware then tried to just continue that same packet in 
the next URB,
without an rx_desc header inserted.  The r8152.c driver always assumes the URB 
buffer begins
with an rx_desc, so of course this behaviour produces really weird effects, and 
system crashes, etc..

So until that driver bug is addressed, I would advise disabling hardware RX 
checksums
for all chip versions, not only for version 02.

It is not clear to me how the chip decides when to forward an rx URB to the 
host.
If you could describe how that part works for us, then it would help in further
understanding why fast systems (eg. a PC) don't generally notice the issue,
while much slower embedded systems do see the issue regularly.


That last part is critical to understanding things:
How does the chip decide that a URB is "full enough" before sending it to the 
host?
Why does a really fast host see fewer packets jammed together into a single URB 
than a slower host?

The answers will help understand if there are more bugs to be found/fixed,
or if everything is explained by what has been observed thus far.

To recap:  the hardware sometimes fills a URB to the very end, and then 
continues the
current packet at the first byte of the following URB.  The r8152.c driver does 
NOT
handle this situation; instead it always interprets the first 24 bytes of every 
URB
as an "rx_desc" structure, without any kind of sanity/validation.  This results 
in
buffer overruns (it trusts the packet length field, even though the URB is too 
small
to hold such a packet), and other semi-random behaviour.

Using software rx checksums prevents Bad Things(tm) happening from most of this,
but even that is not perfect given the severity of the bug.

Cheers


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Mark Lord
On 16-11-13 03:34 PM, Mark Lord wrote:
>
> The system I use it with is a 32-bit ppc476, with non-coherent RAM,
> and using 16KB page sizes.
> 
> The dongle instantly becomes a lot more reliable when r8152.c is updated
> to use usb_alloc_coherent() for URB buffers, rather than kmalloc().
> 
> Not sure why that would be though, as the USB stack normally would handle
> kmalloc'd buffers just fine.  It is calling the appropriate routines,
> which boil down to invalidating the dcache lines (for inbound bulk xfers)
> as part of usb_submit_urb(), and yet the problem there persists.
> 
> It could be caused by cache-line sharing with other allocations, but that 
> seems
> unlikely as the kmalloc() size is 16384 bytes per buffer.  Perhaps the driver
> is somehow accessing the buffer space again after doing usb_submit_urb()?
> That would certainly produce this kind of behaviour.
> 
> Or maybe there's just a memory barrier missing somewhere in path.
> 
> The really weird thing is that ASIX-based dongles (which use a different 
> driver)
> don't have this problem, and yet they also use kmalloc'd buffers.
> 
> I have access to the test system only for a day or two a week,
> and it takes a few hours to do a good test as to whether something helps or 
> not.
> I'll continue to poke at it as time and New Ideas permit.

Oh, and the problems did not exist with the 3.14.xx kernels and earlier.
They began to show up when we tried 3.16.xx and all newer kernels.

The difference there is that RX checksums were enabled in hardware as of 
3.16.xx,
and thus the network stack began accepting bad packets from the r8152 driver.

I don't know if the ASIX driver uses hardware checksums or just software 
checksums.
That might explain why it is more reliable here.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Mark Lord
On 16-11-13 12:39 PM, David Miller wrote:
> From: Hayes Wang <hayesw...@realtek.com>
> Date: Fri, 11 Nov 2016 15:15:41 +0800
> 
>> For some platforms, the data in memory is not the same with the one
>> from the device. That is, the data of memory is unbelievable. The
>> check is used to find out this situation.
>>
>> Signed-off-by: Hayes Wang <hayesw...@realtek.com>
> 
> I'm all for adding consistency checks, but I disagree with proceeding
> in this manner for this.
> 
> If you add this patch now, there is a much smaller likelyhood that you
> will work with a high priority to figure out _why_ this is happening.
> 
> For all we know this could be a platform bug in the DMA API for the
> systems in question.
> 
> It could also be a bug elsewhere in the driver, either in setting up
> the descriptor DMA mappings or how the chip is programmed.
> 
> Either way the true cause must be found before we start throwing
> changes like this into the driver.

I agree.

The system I use it with is a 32-bit ppc476, with non-coherent RAM,
and using 16KB page sizes.

The dongle instantly becomes a lot more reliable when r8152.c is updated
to use usb_alloc_coherent() for URB buffers, rather than kmalloc().

Not sure why that would be though, as the USB stack normally would handle
kmalloc'd buffers just fine.  It is calling the appropriate routines,
which boil down to invalidating the dcache lines (for inbound bulk xfers)
as part of usb_submit_urb(), and yet the problem there persists.

It could be caused by cache-line sharing with other allocations, but that seems
unlikely as the kmalloc() size is 16384 bytes per buffer.  Perhaps the driver
is somehow accessing the buffer space again after doing usb_submit_urb()?
That would certainly produce this kind of behaviour.

Or maybe there's just a memory barrier missing somewhere in path.

The really weird thing is that ASIX-based dongles (which use a different driver)
don't have this problem, and yet they also use kmalloc'd buffers.

I have access to the test system only for a day or two a week,
and it takes a few hours to do a good test as to whether something helps or not.
I'll continue to poke at it as time and New Ideas permit.

New Ideas welcome!
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-12 Thread Mark Lord
On 16-11-11 07:13 AM, Francois Romieu wrote:
> Hayes Wang <hayesw...@realtek.com> :
>> For some platforms, the data in memory is not the same with the one
>> from the device. That is, the data of memory is unbelievable. The
>> check is used to find out this situation.
> 
> Invalid packet size corrupted receive descriptors in Realtek's device
> reminds of CVE-2009-4537.
> 
> Is the silicium of both devices different enough to prevent the same
> exploit to happen ?

I don't know if the hardware can do it, but the existing Linux device
driver regularly attempts to process huge unreal packet sizes here.
I've had to patch it to reject "packets" larger than the configured MRU.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-09 Thread Mark Lord
On 16-11-09 08:09 AM, Hayes Wang wrote:
> Mark Lord [mailto:ml...@pobox.com]
..
>> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 
>> isn't
>> valid here.
>> And the rx_desc values look an awful lot like the rx_data values that follow 
>> it.
>>
>> There's definitely more broken here than just TCP RX checksums.
> 
> I don't think it is the issue of our hw. If it happens, windows or
> other OS may have problems, too. It is like the memory issue described
> in commit 990c9b347245("Merge branch 'r8152-fixes'"). It seems that
> the data in memory is not same with the one from the device.

I am still doing long-term testing of various tweaks to the driver,
and can now confirm that changing from kmalloc() to usb_alloc_coherent()
vastly improves reliability, and re-enabling RX checksums works fine
with that change.

However, even with coherent URB buffers, I still see the occasional bad rx_desc:
like, twice in 36 hours of continuous bashing at it.

So having code in the driver to sanitize the rx_desc is essential.
My current test code (shared with Hayes already) includes validation of various
key fields of the rx_desc, and detects when the chip/driver/whatever gets 
confused.

Hopefully r8152.c will get updated to take more care before trusting
what it sees in the rx_desc fields.

Cheers
-- 
Mark Lord
ml...@pobox.com


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-04 Thread Mark Lord
On 16-11-04 09:50 AM, Mark Lord wrote:
> Yeah, the device or driver is definitely getting confused with rx_desc 
> structures.
> I added code to check for unlikely rx_desc values, and it found this for 
> starters:
> 
> rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400 pkt_len=2045
> rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 
> 00 00 d8 48 00 00 d4
> 48 00
> rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 
> 00 00 b8 48 00 00 b4
> 48 00
> rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 
> 00 00 00 00 02 4d ac
> 00 00
> rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 
> 05 20 01 56 41 17 35
> 00 00
> ...
> 
> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 
> isn't valid here.
> And the rx_desc values look an awful lot like the rx_data values that follow 
> it.
> 
> There's definitely more broken here than just TCP RX checksums.

I spent a bit more time on this again today, and made progress.
The issue seems to be stale rx buffers.
I'll discuss further offline with Hayes Wang.

-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-04 Thread Mark Lord

Yeah, the device or driver is definitely getting confused with rx_desc 
structures.
I added code to check for unlikely rx_desc values, and it found this for 
starters:

rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400 pkt_len=2045
rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 
00 00 d8 48 00 00 d4 48 00
rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 
00 00 b8 48 00 00 b4 48 00
rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 
00 00 00 00 02 4d ac 00 00
rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 
05 20 01 56 41 17 35 00 00
...

The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't 
valid here.
And the rx_desc values look an awful lot like the rx_data values that follow it.

There's definitely more broken here than just TCP RX checksums.

-ml



Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-04 Thread Mark Lord

On 16-11-02 02:29 PM, Mark Lord wrote:


I have poked at it some more, and thus far it appears that it is
only necessary to disable TCP rx checksums.  The system doesn't crash
when only IP/UDP checksums are enabled, but does when TCP checksums are on.

This happens regardless of whether RX_AGG is disabled or enabled,
and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
doesn't seem to affect it.


..

I noticed that BIT(20) was not defined as anything for "opts3",
and so I added a line to rx_csum to check whether or not that bit
was ever set.

It triggered after a few thousand reset/reboot cycles with opts3
having the rather dubious looking value of an ASCII string: 0x5c7d7852.

So to me, it appears that the rx_desc's are getting mixed-up with data 
somewhere,
and when using hardware TCP checksums this doesn't get caught.  So perhaps the
checksums themselves are fine, but there's another bug (driver or hardware?)
that sneaks through when not doing software checksums.






Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-03 Thread Mark Lord
On 16-11-03 04:56 AM, Hayes Wang wrote:
> Mark Lord [mailto:ml...@pobox.com]
>> Sent: Thursday, November 03, 2016 2:30 AM
>> To: Hayes Wang; David Miller
> [...]
>> I have poked at it some more, and thus far it appears that it is
>> only necessary to disable TCP rx checksums.  The system doesn't crash
>> when only IP/UDP checksums are enabled, but does when TCP checksums are on.
>>
>> This happens regardless of whether RX_AGG is disabled or enabled,
>> and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
>> doesn't seem to affect it.
> 
> I test Raspberry Pi v1, but I couldn't boot with NFSROOT through
> both onboard nic and RTL8152. I get following error.
> 
>VFS: Unable to mount root fs via NFS, trying floppy.
>Kernel panic - not syncing: VFS: Unable to mount root fs on 
> unknown-block(2,0)
> 
> However, if I start the system without NFSROOT, I could mount the nfs fs.
> Any idea?

Rather than getting caught up in all of that,
you could then just chroot to the mounted nfs fs
at that point, and continue on from there.

Eg.  chroot /mnt/nfsxxx /bin/sh

Running from NFS is probably not necessary though.
Instead, perhaps just run md5sum on every file on the nfs fs
from the Raspberry Pi, and then repeat the md5sum's on the server,
and compare the results for errors.

The system I am using the dongle with is a custom embedded board,
but I think the important thing is that it has a slow-ish CPU,
which means it is more prone to having the on-chip RX FIFO overflow.
It is also big-endian rather than little-endian, though that seems
to be correctly handled already in the device driver.

I will try the md5sum test on an x86 box for comparison.

Cheers
-- 
Mark Lord


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-02 Thread Mark Lord

On 16-10-31 04:14 AM, Hayes Wang wrote:

The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logig does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

...

Our hw engineer says only VER_01 has the issue about rx checksum.
I need more information for checking it.


I have poked at it some more, and thus far it appears that it is
only necessary to disable TCP rx checksums.  The system doesn't crash
when only IP/UDP checksums are enabled, but does when TCP checksums are on.

This happens regardless of whether RX_AGG is disabled or enabled,
and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
doesn't seem to affect it.

lsusb -vv (from an x86 system, not the failing embedded system) follows:

Bus 001 Device 004: ID 0bda:8152 Realtek Semiconductor Corp.
Device Descriptor:
  bLength18
  bDescriptorType 1
  bcdUSB   2.10
  bDeviceClass0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize064
  idVendor   0x0bda Realtek Semiconductor Corp.
  idProduct  0x8152
  bcdDevice   20.00
  iManufacturer   1 Realtek
  iProduct2 USB 10/100 LAN
  iSerial 3 84E71400257D
  bNumConfigurations  2
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   39
bNumInterfaces  1
bConfigurationValue 1
iConfiguration  0
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower  100mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   3
  bInterfaceClass   255 Vendor Specific Class
  bInterfaceSubClass255 Vendor Specific Subclass
  bInterfaceProtocol  0
  iInterface  0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81  EP 1 IN
bmAttributes2
  Transfer TypeBulk
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0200  1x 512 bytes
bInterval   0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02  EP 2 OUT
bmAttributes2
  Transfer TypeBulk
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0200  1x 512 bytes
bInterval   0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83  EP 3 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0002  1x 2 bytes
bInterval   8
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   80
bNumInterfaces  2
bConfigurationValue 2
iConfiguration  0
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower  100mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   1
  bInterfaceClass 2 Communications
  bInterfaceSubClass  6 Ethernet Networking
  bInterfaceProtocol  0
  iInterface  5 CDC Communications Control
  CDC Header:
bcdCDC   1.10
  CDC Union:
bMasterInterface0
bSlaveInterface 1
  CDC Ethernet:
iMacAddress  3 84E71400257D
bmEthernetStatistics0x
wMaxSegmentSize   1514
wNumberMCFilters0x
bNumberPowerFilters  0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83  EP 3 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0010  1x 16 bytes
bInterval   8
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber1
  bAlternateSetting   0
  bNumEndpoints   0
  

Re: [PATCH net] r8152: Fix broken RX checksums.

2016-10-31 Thread Mark Lord
On 16-10-31 04:14 AM, Hayes Wang wrote:
>
> Our hw engineer says only VER_01 has the issue about rx checksum.
> I need more information for checking it.

I've been doing driver work for Linux since 1991,
and learned long ago not to trust engineering specs 100%.

Get yourself a Raspberry Pi v1, set up an NFSROOT root filesystem for it,
and boot/run from that using the ethernet dongle to connect.

It should segfault like crazy when hardware RX checksums are enabled.

Definitely something wrong there, and whatever it is goes away
when RX checksums are disabled in the driver.

I have two theories as to why this happens:

1) The hardware buffer on the dongle overflows because the slow host CPU
does not empty it quickly enough.  This results in a bad checksum on the
final/truncated packet in the buffer.  The chip does not detect this.

2) Perhaps the device driver is looking at the wrong bits.

Either way, this results in data corruption and until otherwise fixed,
it is safest to just not enable RX checksums.

If it happens on a slow embedded CPU, then it can also happen on a heavily
loaded Intel/AMD CPU -- just a lot less frequently.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-10-30 Thread Mark Lord
On 16-10-30 08:57 PM, David Miller wrote:
> From: Mark Lord <ml...@pobox.com>
> Date: Sun, 30 Oct 2016 19:28:27 -0400
> 
>> The r8152 driver has been broken since (approx) 3.16.xx
>> when support was added for hardware RX checksums
>> on newer chip versions.  Symptoms include random
>> segfaults and silent data corruption over NFS.
>>
>> The hardware checksum logig does not work on the VER_02
>> dongles I have here when used with a slow embedded system CPU.
>> Google reveals others reporting similar issues on Raspberry Pi.
>>
>> So, disable hardware RX checksum support for VER_02, and fix
>> an obvious coding error for IPV6 checksums in the same function.
>>
>> Because this bug results in silent data corruption,
>> it is a good candidate for back-porting to -stable >= 3.16.xx.
>>
>> Signed-off-by: Mark Lord <ml...@pobox.com>
> 
> Applied and queued up for -stable, thanks.

Thanks.  Now that this is taken care of, I do wonder if perhaps
RX checksums ought to be enabled at all for ANY versions of this chip?

My theory is that the checksums probably work okay most of the time,
except when the hardware RX buffer overflows.

In my case, and in the case of the Raspberry Pi, the receiving CPU
is quite a bit slower than mainstream x86, so it can quite easily
fall behind in emptying the RX buffer on the chip.
The only indication this has happened may be an incorrect RX checksum.

This is only a theory, but I otherwise have trouble explaining
why we are seeing invalid RX checksums -- direct cable connections
to a switch, shared only with the NFS server.  No reason for it
to have bad RX checksums in the first place.

Should we just blanket disable RX checksums for all versions here
unless proven otherwise/safe?

Anyone out there know better?

Cheers
Mark


[PATCH net] r8152: Fix broken RX checksums.

2016-10-30 Thread Mark Lord
The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logig does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord <ml...@pobox.com>

--- old/drivers/net/usb/r8152.c 2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c   2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
u8 checksum = CHECKSUM_NONE;
u32 opts2, opts3;

-   if (tp->version == RTL_VER_01)
+   if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
goto return_result;

opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
checksum = CHECKSUM_NONE;
else
checksum = CHECKSUM_UNNECESSARY;
-   } else if (RD_IPV6_CS) {
+   } else if (opts2 & RD_IPV6_CS) {
if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
checksum = CHECKSUM_UNNECESSARY;
else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))


Re: [PATCH] drivers/net/usb/r8152 fix broken rx checksums

2016-10-26 Thread Mark Lord

On 16-10-26 06:36 PM, Mark Lord wrote:

The r8152 driver has been broken since (approx) 3.6.16,


Correction:  broken since 3.16.xx.


when support was added for hardware rx checksum on newer chip versions.
Symptoms include random segfaults and silent data corruption over NFS.

This does not work on the VER_02 dongle I have here
when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware rx checksum for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.
Patch attached (to deal with buggy mailer) and also below for review.

Signed-off-by: Mark Lord <ml...@pobox.com>

--- old/drivers/net/usb/r8152.c2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 u8 checksum = CHECKSUM_NONE;
 u32 opts2, opts3;

-if (tp->version == RTL_VER_01)
+if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 goto return_result;

 opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 checksum = CHECKSUM_NONE;
 else
 checksum = CHECKSUM_UNNECESSARY;
-} else if (RD_IPV6_CS) {
+} else if (opts2 & RD_IPV6_CS) {
 if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 checksum = CHECKSUM_UNNECESSARY;
 else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))



--
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com


[PATCH] drivers/net/usb/r8152 fix broken rx checksums

2016-10-26 Thread Mark Lord

The r8152 driver has been broken since (approx) 3.6.16,
when support was added for hardware rx checksum on newer chip versions.
Symptoms include random segfaults and silent data corruption over NFS.

This does not work on the VER_02 dongle I have here
when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware rx checksum for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.
Patch attached (to deal with buggy mailer) and also below for review.

Signed-off-by: Mark Lord <ml...@pobox.com>

--- old/drivers/net/usb/r8152.c 2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c   2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
u8 checksum = CHECKSUM_NONE;
u32 opts2, opts3;

-   if (tp->version == RTL_VER_01)
+   if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
goto return_result;

opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
checksum = CHECKSUM_NONE;
else
checksum = CHECKSUM_UNNECESSARY;
-   } else if (RD_IPV6_CS) {
+   } else if (opts2 & RD_IPV6_CS) {
if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
checksum = CHECKSUM_UNNECESSARY;
else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))
The r8152 driver has been broken since (approx) 3.6.16,
when support was added for hardware rx checksum on newer chip versions.
Symptoms include random segfaults and silent data corruption over NFS.

This does not work on the VER_02 dongle I have here
when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware rx checksum for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord <ml...@pobox.com>

--- old/drivers/net/usb/r8152.c	2016-09-30 04:20:43.0 -0400
+++ linux/drivers/net/usb/r8152.c	2016-10-26 14:15:44.932517676 -0400
@@ -1645,7 +1645,7 @@
 	u8 checksum = CHECKSUM_NONE;
 	u32 opts2, opts3;
 
-	if (tp->version == RTL_VER_01)
+	if (tp->version == RTL_VER_01 || tp->version == RTL_VER_02)
 		goto return_result;
 
 	opts2 = le32_to_cpu(rx_desc->opts2);
@@ -1660,7 +1660,7 @@
 			checksum = CHECKSUM_NONE;
 		else
 			checksum = CHECKSUM_UNNECESSARY;
-	} else if (RD_IPV6_CS) {
+	} else if (opts2 & RD_IPV6_CS) {
 		if ((opts2 & RD_UDP_CS) && !(opts3 & UDPF))
 			checksum = CHECKSUM_UNNECESSARY;
 		else if ((opts2 & RD_TCP_CS) && !(opts3 & TCPF))


Re: [PATCH v2] libata: add support for NCQ commands for SG interface

2015-11-16 Thread Mark Lord

On 15-10-27 01:49 AM, vinayak.k...@gmail.com wrote:

From: Vinayak Kale 

This patch is needed to make NCQ commands with FPDMA protocol value
(eg READ/WRITE FPDMA) work over SCSI Generic (SG) interface.

..

+   /* For NCQ commands with FPDMA protocol, copy the tag value */
+   if (tf->protocol == ATA_PROT_NCQ)
+   tf->nsect = qc->tag << 3;
+



What prevents the qc-tag value here from conflicting with in-flight I/O
using the exact same qc-tag ??
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux: sata_nv: adma support

2015-08-25 Thread Mark Lord

On 15-08-01 09:45 PM, Robert Hancock wrote:

On Sat, Aug 1, 2015 at 2:09 PM, Pali Rohár pali.ro...@gmail.com wrote:

On Thursday 25 December 2014 07:22:13 Robert Hancock wrote:

On Tue, Dec 23, 2014 at 1:51 PM, Pali Rohár pali.ro...@gmail.com
wrote:

Hello,

I have nvidia nforce4 motherboard with nvidia sata controller:

..

It looks like something is trying to issue a command to disable APM
power management on the drive, and the command fails (likely because
it doesn't support that command).

..

  /sbin/hdparm -B254 $DRIVE

And that -B254 cause above error message in dmesg log. Output from
hdparm is:

  /dev/sda:
   setting Advanced Power Management level to 0xfe (254)
   APM_level  = not supported

..

  $ sudo hdparm -I /dev/sda | grep -i power
 *Power Management feature set


That's not the same as APM (Advanced Power Management).


However, these NVIDIA SATAs are black boxes, and rather buggy ones at that,
so it's possible there's an unknown issue there.


I wonder if NVIDIA simply bought out the IP from Pacific Digital
when they went bust?  Pacific Digital invented the original ADMA,
and the pdc_adma.c driver in the kernel knows all about it.
If the IP is pretty similar (identical?) then we could probably
improve things.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-05-01 Thread Mark Lord
On 14-04-04 09:58 AM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 07:21:46PM -0400, Jeff Layton wrote:
 So according to the RFC you have to encode both the mode bits and the
 ftype for v2. The type bits seem to be removed from the mode in NFSv3
 though, so perhaps we should only be doing that masking in versions
 above v2?
 
 Right, the problematic patch applied the same mask in both v2 and v3
 cases, so I'm reverting just the v2 part (see below).
 
 With a quick check, it looks like the v3 code doesn't rely on those bits
 and I imagine v4 doesn't either.

 It might also be nice to have the client v2 decode_fattr function to
 throw a warning if the server sends us mismatched type bits and ftype
 values. That would have helped us catch this sooner...
 
 Yes, that might be a reasonable thing to do, though I don't know if it's
 worth it.
 
 --b.
 
 commit 35a8dff14e76c00e5b52140290cfb498dc2454a0
 Author: J. Bruce Fields bfie...@redhat.com
 Date:   Thu Apr 3 15:10:35 2014 -0400
 
 nfsd: revert v2 half of nfsd: don't return high mode bits
 
 This reverts the part of commit 6e14b46b91fee8a049b0940333ce13a820beaaa5
 that changes NFSv2 behavior.
 
 Mark Lord found that it broke nfs-root for Linux clients, because it
 broke NFSv2.
 
 In fact, from RFC 1094:
 
   Notice that the file type is specified both in the mode bits
   and in the file type.  This is really a bug in the protocol and
   will be fixed in future versions.
 
 So NFSv2 clients really are expected to depend on the high bits of the
 mode.
 
 Cc: sta...@kernel.org
 Reported-by: Mark Lord ml...@pobox.com
 Signed-off-by: J. Bruce Fields bfie...@redhat.com
 
 diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
 index b17d932..9c769a4 100644
 --- a/fs/nfsd/nfsxdr.c
 +++ b/fs/nfsd/nfsxdr.c
 @@ -152,7 +152,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct 
 svc_fh *fhp,
   type = (stat-mode  S_IFMT);
  
   *p++ = htonl(nfs_ftypes[type  12]);
 - *p++ = htonl((u32) (stat-mode  S_IALLUGO));
 + *p++ = htonl((u32) stat-mode);
   *p++ = htonl((u32) stat-nlink);
   *p++ = htonl((u32) from_kuid(init_user_ns, stat-uid));
   *p++ = htonl((u32) from_kgid(init_user_ns, stat-gid));
 


Still a regression in 3.14.2 now.
Anyone got plans to push this patch out to mainline, as well as +stable ?

-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: driver skip pci_set_master, fix it? No.

2014-04-09 Thread Mark Lord
On 14-04-08 10:51 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2014-04-08 at 17:18 -0400, Mark Lord wrote:
 I assume you're talking about the one added by cf3e1feba7f9 (PCI:
 Workaround missing pci_set_master in pci drivers), but as far as I
 can tell, it only calls pci_set_master() for *bridge* devices.  What
 am I missing?  Is pci_set_master() being called for your endpoint?
 What path is that?

 Yes, it is being called during execution of the _probe() function in my 
 driver,
 as evidenced by the annoying (and wrong) message it produces.

 Next time I've got the hardware at hand, I'll put a dump_stack() into there
 to see the exact calling path.
 
 Note that one of the reason we want to do it early on bridges is that without 
 it,
 we may also not get the PCIe error messages.

Sure, for bridges.

I'll get a stack trace later today, but what I suspect is happening
is that this multi-function card is being treated by the PCI layers
as a bridge for purposes of the multiple virtual functions it implements.

We will probably need to distinguish this kind of device from real bridges here.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: driver skip pci_set_master, fix it? No.

2014-04-09 Thread Mark Lord
On 14-04-09 09:08 AM, Mark Lord wrote:
 On 14-04-08 10:51 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2014-04-08 at 17:18 -0400, Mark Lord wrote:
 I assume you're talking about the one added by cf3e1feba7f9 (PCI:
 Workaround missing pci_set_master in pci drivers), but as far as I
 can tell, it only calls pci_set_master() for *bridge* devices.  What
 am I missing?  Is pci_set_master() being called for your endpoint?
 What path is that?

 Yes, it is being called during execution of the _probe() function in my 
 driver,
 as evidenced by the annoying (and wrong) message it produces.

 Next time I've got the hardware at hand, I'll put a dump_stack() into 
 there
 to see the exact calling path.

 Note that one of the reason we want to do it early on bridges is that 
 without it,
 we may also not get the PCIe error messages.
 
 Sure, for bridges.
 
 I'll get a stack trace later today, but what I suspect is happening
 is that this multi-function card is being treated by the PCI layers
 as a bridge for purposes of the multiple virtual functions it implements.
 
 We will probably need to distinguish this kind of device from real bridges 
 here.

Here's the call trace, all the way back to k7_probe(),
the driver's PCI probe function, and beyond:

[   30.481454] k7: loading driver version 0.80
[   30.485561] pcieport :00:1c.0: driver skip pci_set_master, fix it!
[   30.485580] CPU: 2 PID: 4401 Comm: insmod Tainted: G   O 3.12.14 #3
[   30.485583] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 
09/17/2012
[   30.485590]  0300 88041c11b9b8 8156c40b 

[   30.485598]  88041d2b7000 88041c11b9d8 812dc493 
0300
[   30.485603]  88041d399000 88041c11ba08 812dc50d 
1000
[   30.485607] Call Trace:
[   30.485616]  [8156c40b] dump_stack+0x4f/0x84
[   30.485622]  [812dc493] pci_enable_bridge+0x93/0xa0
[   30.485627]  [812dc50d] pci_enable_device_flags+0x6d/0xe0
[   30.485631]  [812dc58e] pci_enable_device+0xe/0x10
[   30.485641]  [a0469c0d] k7_enable_device+0x3d/0xa30 [k7]
[   30.485649]  [a0462d72] ? k7_devmem_alloc+0x32/0x140 [k7]
[   30.485654]  [81572ab6] ? _raw_spin_lock+0x16/0x40
[   30.485658]  [81572721] ? _raw_spin_unlock+0x11/0x40
[   30.485666]  [a046aee8] k7_probe+0x458/0x630 [k7]

[   30.485682]  [812de3d6] local_pci_probe+0x46/0x80
[   30.485696]  [812de6f1] pci_device_probe+0x101/0x110
[   30.485702]  [813941d6] driver_probe_device+0x76/0x240
[   30.485705]  [8139443b] __driver_attach+0x9b/0xa0
[   30.485709]  [813943a0] ? driver_probe_device+0x240/0x240
[   30.485713]  [81392385] bus_for_each_dev+0x55/0x90
[   30.485717]  [81393ce9] driver_attach+0x19/0x20
[   30.485720]  [81393814] bus_add_driver+0x104/0x290
[   30.485724]  [81394abf] driver_register+0x5f/0xf0
[   30.485728]  [812dd3f6] __pci_register_driver+0x46/0x50
[   30.485736]  [a024c16e] k7_init+0x16e/0x1000 [k7]
[   30.485746]  [a024c000] ? 0xa024bfff
[   30.485765]  [81000302] do_one_initcall+0x112/0x160
[   30.485779]  [81038143] ? set_memory_nx+0x43/0x50
[   30.485785]  [810abbe1] load_module+0x1e51/0x2480
[   30.485789]  [810a8b10] ? show_initstate+0x50/0x50
[   30.485794]  [810ac2ae] SyS_init_module+0x9e/0xc0
[   30.485799]  [8157389b] tracesys+0xdd/0xe
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: driver skip pci_set_master, fix it? No.

2014-04-09 Thread Mark Lord
On 14-04-09 10:12 AM, Mark Lord wrote:
 On 14-04-09 09:08 AM, Mark Lord wrote:
 On 14-04-08 10:51 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2014-04-08 at 17:18 -0400, Mark Lord wrote:
 I assume you're talking about the one added by cf3e1feba7f9 (PCI:
 Workaround missing pci_set_master in pci drivers), but as far as I
 can tell, it only calls pci_set_master() for *bridge* devices.  What
 am I missing?  Is pci_set_master() being called for your endpoint?
 What path is that?

 Yes, it is being called during execution of the _probe() function in my 
 driver,
 as evidenced by the annoying (and wrong) message it produces.

 Next time I've got the hardware at hand, I'll put a dump_stack() into 
 there
 to see the exact calling path.

 Note that one of the reason we want to do it early on bridges is that 
 without it,
 we may also not get the PCIe error messages.

 Sure, for bridges.

 I'll get a stack trace later today, but what I suspect is happening
 is that this multi-function card is being treated by the PCI layers
 as a bridge for purposes of the multiple virtual functions it implements.

 We will probably need to distinguish this kind of device from real bridges 
 here.
 
 Here's the call trace, all the way back to k7_probe(),
 the driver's PCI probe function, and beyond:
 
 [   30.481454] k7: loading driver version 0.80
 [   30.485561] pcieport :00:1c.0: driver skip pci_set_master, fix it!
 [   30.485580] CPU: 2 PID: 4401 Comm: insmod Tainted: G   O 3.12.14 #3
 [   30.485583] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 
 09/17/2012
 [   30.485590]  0300 88041c11b9b8 8156c40b 
 
 [   30.485598]  88041d2b7000 88041c11b9d8 812dc493 
 0300
 [   30.485603]  88041d399000 88041c11ba08 812dc50d 
 1000
 [   30.485607] Call Trace:
 [   30.485616]  [8156c40b] dump_stack+0x4f/0x84
 [   30.485622]  [812dc493] pci_enable_bridge+0x93/0xa0
 [   30.485627]  [812dc50d] pci_enable_device_flags+0x6d/0xe0
 [   30.485631]  [812dc58e] pci_enable_device+0xe/0x10
 [   30.485641]  [a0469c0d] k7_enable_device+0x3d/0xa30 [k7]
 [   30.485649]  [a0462d72] ? k7_devmem_alloc+0x32/0x140 [k7]
 [   30.485654]  [81572ab6] ? _raw_spin_lock+0x16/0x40
 [   30.485658]  [81572721] ? _raw_spin_unlock+0x11/0x40
 [   30.485666]  [a046aee8] k7_probe+0x458/0x630 [k7]
 
 [   30.485682]  [812de3d6] local_pci_probe+0x46/0x80
 [   30.485696]  [812de6f1] pci_device_probe+0x101/0x110
 [   30.485702]  [813941d6] driver_probe_device+0x76/0x240
 [   30.485705]  [8139443b] __driver_attach+0x9b/0xa0
 [   30.485709]  [813943a0] ? driver_probe_device+0x240/0x240
 [   30.485713]  [81392385] bus_for_each_dev+0x55/0x90
 [   30.485717]  [81393ce9] driver_attach+0x19/0x20
 [   30.485720]  [81393814] bus_add_driver+0x104/0x290
 [   30.485724]  [81394abf] driver_register+0x5f/0xf0
 [   30.485728]  [812dd3f6] __pci_register_driver+0x46/0x50
 [   30.485736]  [a024c16e] k7_init+0x16e/0x1000 [k7]
 [   30.485746]  [a024c000] ? 0xa024bfff
 [   30.485765]  [81000302] do_one_initcall+0x112/0x160
 [   30.485779]  [81038143] ? set_memory_nx+0x43/0x50
 [   30.485785]  [810abbe1] load_module+0x1e51/0x2480
 [   30.485789]  [810a8b10] ? show_initstate+0x50/0x50
 [   30.485794]  [810ac2ae] SyS_init_module+0x9e/0xc0
 [   30.485799]  [8157389b] tracesys+0xdd/0xe
 

The e1000e network driver is suffering from this as well in 3.12.14.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: driver skip pci_set_master, fix it? No.

2014-04-09 Thread Mark Lord
On 14-04-09 11:52 AM, Bjorn Helgaas wrote:
 On Wed, Apr 9, 2014 at 8:18 AM, Mark Lord ml...@pobox.com wrote:
 On 14-04-09 10:12 AM, Mark Lord wrote:
 On 14-04-09 09:08 AM, Mark Lord wrote:
 On 14-04-08 10:51 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2014-04-08 at 17:18 -0400, Mark Lord wrote:
 I assume you're talking about the one added by cf3e1feba7f9 (PCI:
 Workaround missing pci_set_master in pci drivers), but as far as I
 can tell, it only calls pci_set_master() for *bridge* devices.  What
 am I missing?  Is pci_set_master() being called for your endpoint?
 What path is that?

 Yes, it is being called during execution of the _probe() function in my 
 driver,
 as evidenced by the annoying (and wrong) message it produces.

 Next time I've got the hardware at hand, I'll put a dump_stack() into 
 there
 to see the exact calling path.

 Note that one of the reason we want to do it early on bridges is that 
 without it,
 we may also not get the PCIe error messages.

 Sure, for bridges.

 I'll get a stack trace later today, but what I suspect is happening
 is that this multi-function card is being treated by the PCI layers
 as a bridge for purposes of the multiple virtual functions it implements.

 We will probably need to distinguish this kind of device from real bridges 
 here.

 Here's the call trace, all the way back to k7_probe(),
 the driver's PCI probe function, and beyond:

 [   30.481454] k7: loading driver version 0.80
 [   30.485561] pcieport :00:1c.0: driver skip pci_set_master, fix it!
 
 This message says we're enabling bus mastering for a PCIe Root Port,
 which I think is the expected behavior and shouldn't cause trouble for
 your device (correct me if I'm wrong).
 
 I don't know the system topology, but I'm guessing the k7 device is
 below that Root Port.  We might be enabling bus mastering for the k7
 device, too, but that's not what this message is about, and we'd have
 to look at the k7 command register to know for sure whether we did
 anything to it.
 
 [   30.485580] CPU: 2 PID: 4401 Comm: insmod Tainted: G   O 3.12.14 
 #3
 [   30.485583] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 
 09/17/2012
 [   30.485590]  0300 88041c11b9b8 8156c40b 
 
 [   30.485598]  88041d2b7000 88041c11b9d8 812dc493 
 0300
 [   30.485603]  88041d399000 88041c11ba08 812dc50d 
 1000
 [   30.485607] Call Trace:
 [   30.485616]  [8156c40b] dump_stack+0x4f/0x84
 [   30.485622]  [812dc493] pci_enable_bridge+0x93/0xa0
 [   30.485627]  [812dc50d] pci_enable_device_flags+0x6d/0xe0
 [   30.485631]  [812dc58e] pci_enable_device+0xe/0x10
 [   30.485641]  [a0469c0d] k7_enable_device+0x3d/0xa30 [k7]
 [   30.485649]  [a0462d72] ? k7_devmem_alloc+0x32/0x140 [k7]
 [   30.485654]  [81572ab6] ? _raw_spin_lock+0x16/0x40
 [   30.485658]  [81572721] ? _raw_spin_unlock+0x11/0x40
 [   30.485666]  [a046aee8] k7_probe+0x458/0x630 [k7]
...
 The e1000e network driver is suffering from this as well in 3.12.14.
 
 I'll look at this more closely, in 3.12.14 in particular (I was
 looking at 3.14 before).  Can you collect lspci -vv output for one
 or both of these systems (the whole system, not just the device in
 question)?
 
 Maybe you could read the PCI command register after the
 pci_enable_device() and verify that bus mastering is actually being
 enabled when you didn't expect it?

I've checked the master bit now in my own driver,
and you are right -- it is still 0 after pci_enable_device().

So that message is complaining about the root port driver,
not my driver or the e1000e driver.   Confusing at first.

Whoever added the message ought to have taken care of the
root ports already. So a fix may still be needed for that.

To confirm this, here are the messages and the tree view:

pcieport :00:1c.4: driver skip pci_set_master, fix it!
pcieport :00:1c.5: driver skip pci_set_master, fix it!
pcieport :00:1c.0: driver skip pci_set_master, fix it!

lspci -t
-[:00]-+-00.0
   +-01.0-[01]--+-00.0
   |\-00.1
   +-16.0
   +-16.1
   +-1a.0
   +-1c.0-[02]--+-00.0  (these are part of my k7 device)
   |+-00.1
   |+-00.2
   |+-00.3
   |+-00.4
   |+-00.5
   |+-00.6
   |\-00.7
   +-1c.4-[03]00.0  (e1000e)
   +-1c.5-[04]00.0  (e1000e)
   +-1d.0
   +-1e.0-[05]03.0
   +-1f.0
   +-1f.2
   \-1f.3

Here is the simple lspci view.
The device I am working with has not yet been announced/released,
so I have to hide the identification for now (NDA).
The driver is being developed for GPL licensing/distribution though.

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM

driver skip pci_set_master, fix it? No.

2014-04-08 Thread Mark Lord
I am working a couple of drivers for chips that perform extensive bus-mastering 
ops.
These including full SRIOV support, and allow assigning virtual functions to 
virtual machines, etc.

One thing the driver (still in development) does for safety,
is defer the call to pci_set_master() until *after* it has mapped
the MMIO space of the chips, so it can reset/flush the DMA engines
before giving them permission to scribble over host RAM.

But a recent patch to the kernel has removed this from the driver's control.
The core PCI now does pci_set_master() immediately on pci_enable_device().

This could be catastrophic in some situations, depending upon the state
of the DMA engines in the device.  If they have leftovers from a previous 
life
(or a previous VM assignment), then they'll immediately resume accessing host 
RAM
as soon as the driver calls pci_enable_device(), courtesy of the embedded call
to pci_set_master().

This isn't good, and these may not be the only devices affected in this way.

Can we perhaps not do that, or provide some other way to return control of 
bus-mastering
to the device driver ?
--
ml...@pobox.com
Mark Lord
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: driver skip pci_set_master, fix it? No.

2014-04-08 Thread Mark Lord
On 14-04-08 02:27 PM, Bjorn Helgaas wrote:
 [+cc Ben, linux-pci]
 
 On Tue, Apr 8, 2014 at 10:34 AM, Mark Lord ml...@pobox.com wrote:
 I am working a couple of drivers for chips that perform extensive 
 bus-mastering ops.
 These including full SRIOV support, and allow assigning virtual functions to 
 virtual machines, etc.

 One thing the driver (still in development) does for safety,
 is defer the call to pci_set_master() until *after* it has mapped
 the MMIO space of the chips, so it can reset/flush the DMA engines
 before giving them permission to scribble over host RAM.

 But a recent patch to the kernel has removed this from the driver's control.
 The core PCI now does pci_set_master() immediately on pci_enable_device().
 
 I assume you're talking about the one added by cf3e1feba7f9 (PCI:
 Workaround missing pci_set_master in pci drivers), but as far as I
 can tell, it only calls pci_set_master() for *bridge* devices.  What
 am I missing?  Is pci_set_master() being called for your endpoint?
 What path is that?

Yes, it is being called during execution of the _probe() function in my driver,
as evidenced by the annoying (and wrong) message it produces.

Next time I've got the hardware at hand, I'll put a dump_stack() into there
to see the exact calling path.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
On 14-04-03 12:33 PM, Mark Lord wrote:
 This commit from linux-3.14 breaks our NFS-root clients here:
 
 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5
 
 
 - *p++ = htonl((u32) stat-mode);
 + *p++ = htonl((u32) (stat-mode  S_IALLUGO));
 
 
 Reverting the one-liner above (on the server) fixes it for us,
 as does reverting back to linux-3.13.8 on the server.
 
 The NFS-root clients are on PowerPC (big-endian) architecture,
 running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14.
 
 ACL is completely disabled on server and client,
 and we're using NFSv2/v3.  No support for v4.
 
 I instrumented the function to see what other bits were being cleared
 by the (stat-mode  S_IALLUGO) masking.  The results are attached.
..


Looking into the (previously attached) trace, the bits that seem to be
getting chopped most often are S_IFREG (0x8000) and S_IFDIR (0x4000).
It appears that one/both of those are needed for mounting nfsroot.


[ 2733.823753] encode_fattr: mode=0x41ed mask=0x0fff
[ 2733.824217] encode_fattr: mode=0x41ed mask=0x0fff
[ 2733.838769] encode_fattr: mode=0x41ed mask=0x0fff
[ 2733.839175] encode_fattr: mode=0xa1ff mask=0x0fff
[ 2733.839895] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.840388] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.840431] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.841256] encode_fattr: mode=0x41ed mask=0x0fff
[ 2733.841659] encode_fattr: mode=0xa1ff mask=0x0fff
[ 2733.842379] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.842825] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.842879] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.843876] encode_fattr: mode=0x81ed mask=0x0fff
[ 2733.843924] encode_fattr: mode=0x81ed mask=0x0fff
...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
Mark Lord wrote:
 On 14-04-03 12:33 PM, Mark Lord wrote:
 This commit from linux-3.14 breaks our NFS-root clients here:

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5


 - *p++ = htonl((u32) stat-mode);
 + *p++ = htonl((u32) (stat-mode  S_IALLUGO));


 Reverting the one-liner above (on the server) fixes it for us,
 as does reverting back to linux-3.13.8 on the server.

 The NFS-root clients are on PowerPC (big-endian) architecture,
 running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14.

 ACL is completely disabled on server and client,
 and we're using NFSv2/v3.  No support for v4.

 I instrumented the function to see what other bits were being cleared
 by the (stat-mode  S_IALLUGO) masking.  The results are attached.
 ..
 
 
 Looking into the (previously attached) trace, the bits that seem to be
 getting chopped most often are S_IFREG (0x8000) and S_IFDIR (0x4000).
 It appears that one/both of those are needed for mounting nfsroot.
 
 
 [ 2733.823753] encode_fattr: mode=0x41ed mask=0x0fff
 [ 2733.824217] encode_fattr: mode=0x41ed mask=0x0fff
 [ 2733.838769] encode_fattr: mode=0x41ed mask=0x0fff
 [ 2733.839175] encode_fattr: mode=0xa1ff mask=0x0fff
 [ 2733.839895] encode_fattr: mode=0x81ed mask=0x0fff
...

Okay, the NFS root filesystem mounts and appears to work if I use this:

*p++ = htonl((u32) (stat-mode  (S_IALLUGO | S_IFDIR | S_IFREG | 
S_IFCHR)));

But I suspect it may also need S_IFBLK for block device accesses as well.

Cheers

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
On 14-04-03 01:16 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 12:33:55PM -0400, Mark Lord wrote:
 This commit from linux-3.14 breaks our NFS-root clients here:

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5


 - *p++ = htonl((u32) stat-mode);
 + *p++ = htonl((u32) (stat-mode  S_IALLUGO));


 Reverting the one-liner above (on the server) fixes it for us,
 as does reverting back to linux-3.13.8 on the server.

 The NFS-root clients are on PowerPC (big-endian) architecture,
 running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14.

 ACL is completely disabled on server and client,
 and we're using NFSv2/v3.  No support for v4.

 I instrumented the function to see what other bits were being cleared
 by the (stat-mode  S_IALLUGO) masking.  The results are attached.
 
 Hm, it sounds like a bug in the client if it's depending on those high
 bits.

But only for mounting / starting up from the nfsroot, it seems.
I wonder if there's an unusual code path for that in there?
The regular stuff looks mostly fine:

p = xdr_decode_ftype3(p, fmode);
fattr-mode = (be32_to_cpup(p++)  ~S_IFMT) | fmode;

Except perhaps that second line ought to use the same mask
as the server side is using, just in case there are some other
stray high (higher than S_IFMT) bits in there now/someday.

 The original behavior was in practice harmless and changing it broke
 something, so I think we should definitely just revert this patch.

Yup.  Who?

 But the client may need fixing too.

Probably a good thing in the longer term, for better compatibility
with non-Linux servers.  But we'll still have to keep the revert
on the server (nfsd) code for backward compatibility, I think.

Cheers


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
On 14-04-03 04:15 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 01:51:06PM -0400, Mark Lord wrote:
 On 14-04-03 01:16 PM, J. Bruce Fields wrote:
 The original behavior was in practice harmless and changing it broke
 something, so I think we should definitely just revert this patch.

 Yup.  Who?
 
 I'll submit this soon.
..

Thanks, Bruce!

-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
On 14-04-03 03:30 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 01:51:06PM -0400, Mark Lord wrote:
 On 14-04-03 01:16 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 12:33:55PM -0400, Mark Lord wrote:
 This commit from linux-3.14 breaks our NFS-root clients here:

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5


 - *p++ = htonl((u32) stat-mode);
 + *p++ = htonl((u32) (stat-mode  S_IALLUGO));


 Reverting the one-liner above (on the server) fixes it for us,
 as does reverting back to linux-3.13.8 on the server.

 The NFS-root clients are on PowerPC (big-endian) architecture,
 running linux-3.12.16. The NFS server is on an Intel PC running linux-3.14.

 ACL is completely disabled on server and client,
 and we're using NFSv2/v3.  No support for v4.

 I instrumented the function to see what other bits were being cleared
 by the (stat-mode  S_IALLUGO) masking.  The results are attached.

 Hm, it sounds like a bug in the client if it's depending on those high
 bits.

 But only for mounting / starting up from the nfsroot, it seems.
 I wonder if there's an unusual code path for that in there?
 The regular stuff looks mostly fine:

 p = xdr_decode_ftype3(p, fmode);
 fattr-mode = (be32_to_cpup(p++)  ~S_IFMT) | fmode;
 
 Hm, but that's in nfs3xdr.c; in nfs2xdr.c we have just
 
   fattr-mode = be32_to_cpup(p+);
 
 and NFSv2 is the default for nfsroot.  Do you have some reason to
 believe you're not using NFSv2?

Oh, the client here was using NFS2, absolutely.
I just don't know my way around the code very well yet.  :)

But that mask in nfs3xdr.c (client) doesn't match what the server side is using.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-3.14 nfsd regression

2014-04-03 Thread Mark Lord
On 14-04-03 05:28 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 04:48:11PM -0400, Mark Lord wrote:
 On 14-04-03 03:30 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 01:51:06PM -0400, Mark Lord wrote:
 On 14-04-03 01:16 PM, J. Bruce Fields wrote:
 On Thu, Apr 03, 2014 at 12:33:55PM -0400, Mark Lord wrote:
 This commit from linux-3.14 breaks our NFS-root clients here:

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e14b46b91fee8a049b0940333ce13a820beaaa5


 - *p++ = htonl((u32) stat-mode);
 + *p++ = htonl((u32) (stat-mode  S_IALLUGO));


 Reverting the one-liner above (on the server) fixes it for us,
 as does reverting back to linux-3.13.8 on the server.

 The NFS-root clients are on PowerPC (big-endian) architecture,
 running linux-3.12.16. The NFS server is on an Intel PC running 
 linux-3.14.

 ACL is completely disabled on server and client,
 and we're using NFSv2/v3.  No support for v4.

 I instrumented the function to see what other bits were being cleared
 by the (stat-mode  S_IALLUGO) masking.  The results are attached.

 Hm, it sounds like a bug in the client if it's depending on those high
 bits.

 But only for mounting / starting up from the nfsroot, it seems.
 I wonder if there's an unusual code path for that in there?
 The regular stuff looks mostly fine:

 p = xdr_decode_ftype3(p, fmode);
 fattr-mode = (be32_to_cpup(p++)  ~S_IFMT) | fmode;

 Hm, but that's in nfs3xdr.c; in nfs2xdr.c we have just

 fattr-mode = be32_to_cpup(p+);

 and NFSv2 is the default for nfsroot.  Do you have some reason to
 believe you're not using NFSv2?

 Oh, the client here was using NFS2, absolutely.
 I just don't know my way around the code very well yet.  :)

 But that mask in nfs3xdr.c (client) doesn't match what the server side is 
 using.
 
 Not sure there's anything to see there.
 
 Looking at include/uapi/linux/stat.h and include/linux/stat.h, if I'm
 doing this right...
 
 S_ISFMT is   017
 S_IALLUGO is 000
 
 So they're 16-bit complements.  And we're storing the result in a short?

Oh, is it a short?  I was just going by the be32_to_cpup() macro (not a short).
But if the target field is, then okay for now, until someone expands it.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


3.13.2: intel_iommu=on WARN_ONCE()

2014-02-12 Thread Mark Lord
New install: Haswell i7-4770 CPU, Q87 chipset.

When I enable intel_iommu=on at boot, it clutters the log with this WARN_ONCE():


...
[0.313282] Freeing initrd memory: 5092K (880037af7000 - 880037ff)
[0.313603] DMAR: No ATSR found
[0.313774] IOMMU 0 0xfed9: using Queued invalidation
[0.313935] IOMMU: Setting RMRR:
[0.314098] IOMMU: Setting identity map for device :00:1d.0 [0xcec5e000 - 
0xcec6afff]
[0.314419] IOMMU: Setting identity map for device :00:1a.0 [0xcec5e000 - 
0xcec6afff]
[0.314740] IOMMU: Setting identity map for device :00:14.0 [0xcec5e000 - 
0xcec6afff]
[0.315052] IOMMU: Prepare 0-16MiB unity mapping for LPC
[0.315216] IOMMU: Setting identity map for device :00:1f.0 [0x0 - 0xff]
[0.315530] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
[0.315734] [ cut here ]
[0.315897] WARNING: CPU: 0 PID: 1 at drivers/pci/search.c:46 
pci_find_upstream_pcie_bridge+0x5d/0x70()
[0.316201] Modules linked in:
[0.316203] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.2 #1
[0.316204] Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./Q87M-D2H, BIOS F7
01/17/2014
[0.316205]  0009 814f7908  8103c141
[0.316206]  88041b1f2098 88041b1f2000 88041b1f2098 1fff
[0.316208]  88041b805720 812b31bd 8143c683 88041b1f2098
[0.316209] Call Trace:
[0.316213]  [814f7908] ? dump_stack+0x49/0x6a
[0.316216]  [8103c141] ? warn_slowpath_common+0x81/0xb0
[0.316217]  [812b31bd] ? pci_find_upstream_pcie_bridge+0x5d/0x70
[0.316219]  [8143c683] ? intel_iommu_add_device+0x43/0x210
[0.316221]  [81439940] ? bus_set_iommu+0x60/0x60
[0.316222]  [8143996b] ? add_iommu_group+0x2b/0x60
[0.316224]  [8134a90d] ? bus_for_each_dev+0x4d/0x80
[0.316225]  [81439929] ? bus_set_iommu+0x49/0x60
[0.316228]  [81ad8353] ? intel_iommu_init+0xa04/0xb23
[0.316230]  [81ab1b5b] ? memblock_find_dma_reserve+0x148/0x148
[0.316231]  [81ab1b69] ? pci_iommu_init+0xe/0x37
[0.316233]  [81000352] ? do_one_initcall+0x122/0x170
[0.316235]  [81057b0e] ? parse_args+0x1fe/0x340
[0.316237]  [81aadf07] ? kernel_init_freeable+0x145/0x1d3
[0.316238]  [81aad746] ? loglevel+0x31/0x31
[0.316240]  [814f07c0] ? rest_init+0x80/0x80
[0.316241]  [814f07c5] ? kernel_init+0x5/0x120
[0.316243]  [814fd7bc] ? ret_from_fork+0x7c/0xb0
[0.316244]  [814f07c0] ? rest_init+0x80/0x80
[0.316247] ---[ end trace 860cc01b6444c516 ]---
...

00:00.0 Host bridge: Intel Corporation Haswell DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Haswell PCI Express x16 Controller (rev 
06)
00:14.0 USB controller: Intel Corporation Lynx Point USB xHCI Host Controller 
(rev 04)
00:16.0 Communication controller: Intel Corporation Lynx Point MEI Controller 
#1 (rev 04)
00:16.3 Serial controller: Intel Corporation Lynx Point KT Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 
04)
00:1a.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host 
Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation Lynx Point High Definition Audio 
Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #1 (rev 
d4)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4)
00:1c.4 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #5 (rev 
d4)
00:1d.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host 
Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Lynx Point LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Lynx Point 6-port SATA Controller 1 
[AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation Lynx Point SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1284 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0e0f (rev a1)
03:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 41)
04:00.0 Multimedia video controller: Conexant Systems, Inc. CX23880/1/2/3 PCI 
Video and Audio
Decoder (rev 05)
04:00.1 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI Video 
and Audio Decoder
[Audio Port] (rev 05)
04:00.2 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI Video 
and Audio Decoder
[MPEG Port] (rev 05)
05:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II 
Controller (rev 01)


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 60
model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping: 3
microcode   : 0x10
cpu MHz : 3899.375
cache size  : 8192 KB
physical id : 0
siblings: 8
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu 

Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-04 Thread Mark Lord
On 14-01-03 10:40 AM, walt wrote:
 On 01/02/2014 11:15 AM, Sarah Sharp wrote:
 On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote:
 On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote:
 3.12-stable review patch.  If anyone has any objections, please let me 
 know.

 --

 From: David Laight david.lai...@aculab.com

 commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream.

 Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link 
 TRB
 can only occur at a boundary between underlying USB frames (512 bytes for
 high speed devices).

 If this isn't done the USB frames aren't formatted correctly and, for 
 example,
 the USB3 ethernet ax88179_178a card will stop sending...


 Unfortunately this patch causes a regression when copying large files to my
 outboard USB3 drive. (Nothing at all to do with networking.)
 
 Do you have CONFIG_USB_DEBUG turned on for 3.13?  If so, you should see
 dmesg output from this statement shortly before your drive fails:

 if (num_trbs = TRBS_PER_SEGMENT) {
  xhci_err(xhci, Too many fragments %d, max %d\n,
  num_trbs, TRBS_PER_SEGMENT - 1);
  return -ENOMEM;
 }
 
 Well, the answers depend on whether the usb3 drive uses logical volumes or not
 (lvm2), which I can't explain.  What I've described so far is with lvm2.
 
 When using lvm2 on the usb3 drive, turning on USB_DEBUG has *no* effect -- the
 console prints two or three lines stating that the ext4 journal has quit and
 the drive is remounted ro.  That particular drive stays wedged until the next
 reboot, but no other ill effects to the system.
 
 OTOH, when I put a disk with just an ordinary ext4 partition in the usb3 dock,
 (no logical volumes) the copy failure becomes catastrophic, with kernel panic
 messages, leaving the system unresponsive and needing a hard reset to recover.
 
 I also tried your other suggestion:
 
 diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
 index 4265b48..1a6a43d 100644
 --- a/drivers/usb/host/xhci.c
 +++ b/drivers/usb/host/xhci.c
 @@ -4714,7 +4714,7 @@ int xhci_gen_setup(struct usb_hcd *hcd, 
 xhci_get_quirks_t get_quirks)
 int retval;
  
 /* Accept arbitrarily long scatter-gather lists */
 -   hcd-self.sg_tablesize = ~0;
 +   hcd-self.sg_tablesize = 31;
  
 /* support to build packet from discontinuous buffers */
 hcd-self.no_sg_constraint = 1;
 
 Sadly it didn't fix the problem.  Did I get the patch right?


That sounds almost as if the old version is still being loaded/run,
possibly from the initramfs image?

-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-02 Thread Mark Lord
On 14-01-02 02:15 PM, Sarah Sharp wrote:
 On Tue, Dec 31, 2013 at 12:40:16PM -0800, walt wrote:
..
 Unfortunately this patch causes a regression when copying large files to my
 outboard USB3 drive. (Nothing at all to do with networking.)

 When I try to copy a large (20GB) file to the USB3 drive, the copy dies after
 about 7GB, the ext4 journal aborts and the drive is remounted read-only.

 This bug is 100% reproducible (always pretty close to 7GB) and reverting this
 patch completely fixes the problem.
 
 Ok, I had feared that would be a consequence of this patch.  I think the
 problem is that the usb-storage driver submitted an URB with more
 scatter-gather entries than would fit on the ring segment, the xHCI
 driver rejected the URB with -ENOMEM, and the SCSI core eventually gave
 up on the SCSI command.


Is there not a block layer / scheduler tunable for max sg entries or something?
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2013-12-31 Thread Mark Lord
On 13-12-31 03:40 PM, walt wrote:
 On 12/18/2013 01:11 PM, Greg Kroah-Hartman wrote:
 3.12-stable review patch.  If anyone has any objections, please let me know.

 --

 From: David Laight david.lai...@aculab.com

 commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e upstream.

 Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB
 can only occur at a boundary between underlying USB frames (512 bytes for
 high speed devices).

 If this isn't done the USB frames aren't formatted correctly and, for 
 example,
 the USB3 ethernet ax88179_178a card will stop sending...
 
 
 Unfortunately this patch causes a regression when copying large files to my
 outboard USB3 drive. (Nothing at all to do with networking.)
 
 When I try to copy a large (20GB) file to the USB3 drive, the copy dies after
 about 7GB, the ext4 journal aborts and the drive is remounted read-only.
 
 This bug is 100% reproducible (always pretty close to 7GB) and reverting this
 patch completely fixes the problem.
 
 (Note to Sarah:  I recently emailed you about this problem, and I *wrongly*
 said that reverting the patch doesn't help.  That was a mistake, sorry.)
..


I have also had USB3 mass-storage issues with kernels since this patch.
Dunno if the patch itself is the problem, but just too much to do with USB3
is very flakey in the most recent 3.12.* kernels, as well as the 3.13-rc* ones.
So I have gone back to running older kernels and patching them.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: net/usb/ax88179_178a driver broken in linux-3.12

2013-11-19 Thread Mark Lord
On 13-11-19 05:04 AM, David Laight wrote:
 From: Mark Lord
..
 except the ax88179_178a driver still does not work in linux-3.12,
 whereas it works fine in all earlier kernels.

 That's a regression.
 And a simple revert (earlier in this thread) fixes it.
 So.. let's revert it for now, until a proper xhci compatible patch is 
 produced.
...
 There is a patch to xhci-ring.c that should fix the SG problem.
 http://www.spinics.net/lists/linux-usb/msg97176.html
 
 I think it should apply to the 3.12 sources.

I am running with that patch here now (thanks),
and it too appears to prevent the lockups.

But is this patch upstream already?
If yes, then it needs to get pushed out to -stable for 3.12 at least.

If not upstream, then the revert is probably safest for -stable,
rather than new code that has never been upstream before.

Both patches are attached to this email.
One or the other is required for the USB 3.0 network adapters to function in 
3.12.

Thanks
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
Revert USB 3.0 network driver changes that break the adapter (lockups)
in 3.12.  This just puts back the original code from previous kernels.

Signed-off-by: Mark Lord ml...@pobox.com

--- linux/drivers/net/usb/ax88179_178a.c.orig	2013-11-03 18:41:51.0 -0500
+++ linux/drivers/net/usb/ax88179_178a.c	2013-11-17 13:23:39.525734277 -0500
@@ -1177,18 +1177,31 @@
 	int frame_size = dev-maxpacket;
 	int mss = skb_shinfo(skb)-gso_size;
 	int headroom;
+	int tailroom;
 
 	tx_hdr1 = skb-len;
 	tx_hdr2 = mss;
 	if (((skb-len + 8) % frame_size) == 0)
 		tx_hdr2 |= 0x80008000;	/* Enable padding */
 
-	headroom = skb_headroom(skb) - 8;
+	headroom = skb_headroom(skb);
+	tailroom = skb_tailroom(skb);
 
-	if ((skb_header_cloned(skb) || headroom  0) 
-	pskb_expand_head(skb, headroom  0 ? 8 : 0, 0, GFP_ATOMIC)) {
+	if (!skb_header_cloned(skb) 
+	!skb_cloned(skb) 
+	(headroom + tailroom) = 8) {
+		if (headroom  8) {
+			skb-data = memmove(skb-head + 8, skb-data, skb-len);
+			skb_set_tail_pointer(skb, skb-len);
+		}
+	} else {
+		struct sk_buff *skb2;
+
+		skb2 = skb_copy_expand(skb, 8, 0, flags);
 		dev_kfree_skb_any(skb);
-		return NULL;
+		skb = skb2;
+		if (!skb)
+			return NULL;
 	}
 
 	skb_push(skb, 4);
Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB
can only occur at a boundary between underlying USB frames (512 bytes for 480M).

If this isn't done the USB frames aren't formatted correctly and, for example,
the USB3 ethernet ax88179_178a card will stop sending (while still receiving)
when running a netperf tcp transmit test with (say) and 8k buffer.

This should be a candidate for stable, the ax88179_178a driver defaults to
gso and tso enabled so it passes a lot of fragmented skb to the USB stack.

Signed-off-by: David Laight lt;david.laight@xxgt;
---

Changes for v2:

1) Only act on bulk endpoints.
   While isoc endpoints could suffer from the same problem it is much less
   likely and can't be fixed by adding NOP TRBs (they would stop data being
   sent in the poll interval).

2) When writing the NOP TRB use the count of TRBs instead of scanning for
   the link TRB.

 drivers/usb/host/xhci-ring.c | 53 ++--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5480215..c1342dc 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2929,8 +2929,57 @@
 	}
 
 	while (1) {
-		if (room_on_ring(xhci, ep_ring, num_trbs))
-			break;
+		if (room_on_ring(xhci, ep_ring, num_trbs)) {
+			union xhci_trb *trb = ep_ring-enqueue;
+			unsigned int usable = ep_ring-enq_seg-trbs +
+	TRBS_PER_SEGMENT - 1 - trb;
+			u32 nop_cmd;
+
+			/*
+			 * Section 4.11.7.1 TD Fragments states that a link
+			 * TRB must only occur at the boundary between
+			 * data bursts (eg 512 bytes for 480M).
+			 * While it is possible to split a large fragment
+			 * we don't know the size yet.
+			 * Simplest solution is to fill the trb before the
+			 * LINK with nop commands.
+			 */
+			if (num_trbs == 1 || num_trbs = usable || usable == 0)
+break;
+
+			if (ep_ring-type != TYPE_BULK)
+/*
+ * While isoc transfers might have a buffer that
+ * crosses a 64k boundary it is unlikely.
+ * Since we can't add NOPs without generating
+ * gaps in the traffic just hope it never
+ * happens at the end of the ring.
+ * This could be fixed by writing a LINK TRB
+ * instead of the first NOP - however the
+ * TRB_TYPE_LINK_LE32() calls would all need
+ * changing to check the ring length. */
+break;
+
+			if (num_trbs = TRBS_PER_SEGMENT) {
+xhci_err(xhci, Too many fragments %d, max %d\n,
+		num_trbs, TRBS_PER_SEGMENT - 1);
+return -ENOMEM;
+			}
+
+			nop_cmd = cpu_to_le32(TRB_TYPE(TRB_TR_NOOP) |
+	ep_ring-cycle_state);
+			ep_ring-num_trbs_free -= usable;
+			do {
+trb-generic.field[0] = 0;
+trb

Re: net/usb/ax88179_178a driver broken in linux-3.12

2013-11-19 Thread Mark Lord
On 13-11-19 09:15 AM, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 09:02 -0500, Mark Lord wrote:
 On 13-11-19 05:04 AM, David Laight wrote:
 From: Mark Lord
 ..
 except the ax88179_178a driver still does not work in linux-3.12,
 whereas it works fine in all earlier kernels.

 That's a regression.
 And a simple revert (earlier in this thread) fixes it.
 So.. let's revert it for now, until a proper xhci compatible patch is 
 produced.
 ...
 There is a patch to xhci-ring.c that should fix the SG problem.
 http://www.spinics.net/lists/linux-usb/msg97176.html

 I think it should apply to the 3.12 sources.

 I am running with that patch here now (thanks),
 and it too appears to prevent the lockups.

 But is this patch upstream already?
 If yes, then it needs to get pushed out to -stable for 3.12 at least.

 If not upstream, then the revert is probably safest for -stable,
 rather than new code that has never been upstream before.

 
 Both patches are attached to this email.
 One or the other is required for the USB 3.0 network adapters to function in 
 3.12.
 
 I do not see any error in commit f27070158d6754765f2
 (ax88179_178a: avoid copy of tx tcp packets)
 Quite the contrary in fact...
 
 I suspect a TSO bug, and would rather disable TSO for this nic.

David's explanation for the XHCI issue seems to explain it nicely,
and the patch he linked to does indeed address/fix the issue,
without disabling TSO.

So on the evidence, probably NOT a TSO bug.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC v2 12/29] PCI/MSI: Introduce pcim_enable_msi*() family helpers

2013-10-28 Thread Mark Lord
On 13-10-25 06:01 AM, Alexander Gordeev wrote:

 If this problem really exists anywhere besides pSeries?
 
 I can imagine x86 hitting lack of vectors in interrupt table when
 number of CPUs exceeds hundreds, but do we have this problem now?

An awful lot of x86 hardware has a 256 (255?) vector limit for MSI/MSI-X.
Couple that with PCIe Virtual Functions, each wanting 16 vectors (for example),
and that limit is really simple to exceed today.

But this is more a problem for a sysadmin, and I am happy with the current
and the proposed methods.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC v2 12/29] PCI/MSI: Introduce pcim_enable_msi*() family helpers

2013-10-24 Thread Mark Lord
On 13-10-24 07:41 AM, Alexander Gordeev wrote:
 On Thu, Oct 24, 2013 at 11:57:40AM +0100, David Laight wrote:
 The one case it doesn't work is where the driver either
 wants the full number or the minimum number - but not
 a value in between.

 Might be worth adding an extra parameter so that this
 (and maybe other) odd requirements can be met.
 
 IMHO its not worth it, since it is not possible to generalize
 all odd requirements out there. I do not think we should blow
 the API in this case.
 
 Having said that, the min-or-max interface is probably the only
 worth considering. But again, I would prefer to put its semantics
 to function name rather than to extra parameters, i.e.
 
 pcim_enable_msix_min_max(struct pci_dev *dev, struct msix_entry *entries,
unsigned int minvec, unsigned int maxvec);

The hardware I have in mind here works only for powers of two.
Eg. 16, 8, 4, 2, or 1 MSI-X vector.  Not the odd values in between.

But it appears I can still just use a loop for that case,
calling the new function above instead of the old functions.

Cheers

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC v2 12/29] PCI/MSI: Introduce pcim_enable_msi*() family helpers

2013-10-24 Thread Mark Lord
On 13-10-24 10:31 AM, Alexander Gordeev wrote:
 On Thu, Oct 24, 2013 at 11:51:58AM +0100, Tejun Heo wrote:
 I haven't looked into any details but, if the above works for most use
 cases, it looks really good to me.
 
 Well, if we reuse Michael's statistics:
 
  - 58 drivers call pci_enable_msix()
  - 24 try a single allocation and then fallback to MSI/LSI
  - 19 use the loop style allocation
  - 14 try an allocation, and if it fails retry once
 
 ...then I expect most of 19/58 (loop style) could be converted to
 pcim_enable_msix() and pcim_enable_msix_range() and all of 14/58
 (single fallback) should be converted to pcim_enable_msix() users.

Those are just the in-kernel drivers.
There are many, many more out-of-kernel drivers for embedded platforms,
hardware in-development, etc..

Let's not be overly presumptive about the size of the user base.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-11 Thread Mark Lord
On 13-10-11 04:41 AM, Alexander Gordeev wrote:
 On Thu, Oct 10, 2013 at 07:17:18PM -0400, Mark Lord wrote:
 Just to help us all understand the loop issue..

 Here's an example of driver code which uses the existing MSI-X interfaces,
 for a device which can work with either 16, 8, 4, 2, or 1 MSI-X interrupt.
 This is from a new driver I'm working on right now:
..
 Now, this is a loop-free alternative:
 
 static int xx_alloc_msix_irqs(struct xx_dev *dev, int nvec)
 {
   nvec = roundup_pow_of_two(nvec);/* assume 0  nvec = 16 */
 
   xx_disable_all_irqs(dev);
 
   pci_lock_msi(dev-pdev);
 
   rc = pci_get_msix_limit(dev-pdev, nvec);
   if (rc  0)
   goto err;
 
   nvec = min(nvec, rc);   /* if limit is more than requested */
   nvec = rounddown_pow_of_two(nvec);  /* (a) */
 
   xx_prep_for_msix_vectors(dev, nvec);
 
   rc = pci_enable_msix(dev-pdev, dev-irqs, nvec);   /* (b)  */
   if (rc  0)
   goto err;
 
   pci_unlock_msi(dev-pdev);
 
   dev-num_vectors = nvec;/* (b) */
   return 0;
 
 err:
   pci_unlock_msi(dev-pdev);
 
 kerr(dev-name, pci_enable_msix() failed, err=%d, rc);
 dev-num_vectors = 0;
 return rc;
 }

That would still need a loop, to handle the natural race between
the calls to pci_get_msix_limit() and pci_enable_msix() -- the driver and device
can and should fall back to a smaller number of vectors when pci_enable_msix() 
fails.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-10 Thread Mark Lord
Just to help us all understand the loop issue..

Here's an example of driver code which uses the existing MSI-X interfaces,
for a device which can work with either 16, 8, 4, 2, or 1 MSI-X interrupt.
This is from a new driver I'm working on right now:


static int xx_alloc_msix_irqs (struct xx_dev *dev, int nvec)
{
xx_disable_all_irqs(dev);
do {
if (nvec  2)
xx_prep_for_1_msix_vector(dev);
else if (nvec  4)
xx_prep_for_2_msix_vectors(dev);
else if (nvec  8)
xx_prep_for_4_msix_vectors(dev);
else if (nvec  16)
xx_prep_for_8_msix_vectors(dev);
else
xx_prep_for_16_msix_vectors(dev);
nvec = pci_enable_msix(dev-pdev, dev-irqs, dev-num_vectors);
} while (nvec  0);

if (nvec) {
kerr(dev-name, pci_enable_msix() failed, err=%d, nvec);
dev-num_vectors = 0;
return nvec;
}
return 0;   /* success */
}
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Mark Lord
On 13-10-02 06:29 AM, Alexander Gordeev wrote:
..
 This update converts pci_enable_msix() and pci_enable_msi_block()
 interfaces to canonical kernel functions and makes them return a
 error code in case of failure or 0 in case of success.

Rather than silently break dozens of drivers in mysterious ways,
please invent new function names for the replacements to the
existing pci_enable_msix() and pci_enable_msi_block() functions.

That way, both in-tree and out-of-tree drivers will notice the API change,
rather than having it go unseen and just failing for unknown reasons.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/6] PCI/MSI: Factor out pci_get_msi_cap() interface

2013-10-01 Thread Mark Lord
On 13-09-26 09:03 AM, Alexander Gordeev wrote:
 On Thu, Sep 26, 2013 at 08:32:53AM -0400, Mark Lord wrote:
 On 13-09-18 05:48 AM, Alexander Gordeev wrote:
 The last pattern makes most of sense to me and could be updated with a more
 clear sequence - a call to (bit modified) pci_msix_table_size() followed
 by a call to pci_enable_msix(). I think this pattern can effectively
 supersede the currently recommended loop practice.

 The loop is still necessary, because there's a race between those two calls,
 so that pci_enable_msix() can still fail due to lack of MSIX slots.
 
 Moreover, the existing loop pattern is racy and could fail just as easily ;)

Yes, but it then loops again to correct things.

 But (1) that is something drivers should expect and (2) there is basically
 nothing to race against - that is probably the reason it has not been a
 problem for pSeries. So I think we should not care about this.

I always care about race conditions.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/6] PCI/MSI: Factor out pci_get_msi_cap() interface

2013-09-26 Thread Mark Lord
On 13-09-18 05:48 AM, Alexander Gordeev wrote:

 The last pattern makes most of sense to me and could be updated with a more
 clear sequence - a call to (bit modified) pci_msix_table_size() followed
 by a call to pci_enable_msix(). I think this pattern can effectively
 supersede the currently recommended loop practice.

The loop is still necessary, because there's a race between those two calls,
so that pci_enable_msix() can still fail due to lack of MSIX slots.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SATA hdd refuses to reallocate a sector?

2013-06-29 Thread Mark Lord
On 13-06-29 02:47 PM, Henrique de Moraes Holschuh wrote:
 You know, either the long or the offline SMART test routines do exactly
 that on any spinning rust device with a firmware that is not utterly broken.
 
 The HDD's firmware will rewrite, and even reallocate any weak sectors
 found by the surface scan.
 

The drives I have tried this on (smartctl -t long),
abort at the first bad sector.  Not useful.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SATA hdd refuses to reallocate a sector?

2013-06-24 Thread Mark Lord
On 13-06-24 03:14 AM, Ondrej Zary wrote:
..
 Being tired of using hdparm manually, I created a simple hdd_realloc utility
 that reads the disk in big blocks (1 MB). When there's a read error, it reads
 the failed block sector-by-sector and tries to rewrite the sectors that fail
 to read. It work fine for disks with just a couple of pending sectors.

Something like that would work very well if it used the hdparm approach
(directly to the drive) for the sector-by-sector part.

Going through the block layer isn't always going to work,
because the kernel likes to do I/O in PAGE_SIZE multiples.

And the SCSI stack in Linux has rather atrocious error handling.
It lumps multiple requests together, and can fail the entire lot even
if only a single sector is bad.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SATA hdd refuses to reallocate a sector?

2013-06-23 Thread Mark Lord
On 13-06-23 03:00 PM, Pavel Machek wrote:

 Thanks for the hint. (Insert rant about hdparm documentation
 explaining that it is bad idea, but not telling me _why_ is it bad
 idea. Can I expect cache consistency issues after that, or is it just
 simple you are writing to the disk without any checks? Plus, I guess
 documentation should mention what sector number is. I guess sectors
 are 512bytes for the old drives, but is it 512 or 4096 for new
 drives?)

For ATA, use the logical sector size.
For all existing drives out there, that's a 512 byte unit.

 ...but it does not do the trick :-(. It behaves strangely as if it was
 still cached somewhere. Do I need to turn off the write back cache?

No, it works just fine.  You probably have more than one bad sector.
After you see a read failure, run smartctl -a and look at the error
logs to see what sector the drive is choking on.

Or just low-level format it all with hdparm --security-erase.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SATA hdd refuses to reallocate a sector?

2013-06-23 Thread Mark Lord
On 13-06-23 05:51 PM, Pavel Machek wrote:
 On Sun 2013-06-23 17:27:52, Mark Lord wrote:

 For all existing drives out there, that's a 512 byte unit.
 
 I guessed so. (It would be good to actually document it, as well as
 documenting exactly why it is dangerous. Is it okay to send patches?)

Absolutely.  Please, even!

 Well, I definitely have more than one bad sector, but I did try to
 read exactly the same sector and it failed. See below.
..
read failed.
write works.
read failed.
write works.
read works.
dd failed.
read works.
read works.
read failed.

Odd.  The drive must be furiously reshuffling sectors or something,
or more likely pushing a piece of dirt around scuffing up more bits.

hdparm generally talks directly to a drive, not through the block
or filesystem layers.  So the block, filesystem, and page-cache stuff
don't know anything about --read-sector and --write-sector.

Cheers
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-02-25 Thread Mark Lord
On 13-01-17 08:53 AM, J. Bruce Fields wrote:
 On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
 On 13-01-14 11:17 AM, Mark Lord wrote:

 Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:

 /*
  * Remove a dead transport
  */
 static void svc_delete_xprt(struct svc_xprt *xprt)
 {
 struct svc_serv *serv = xprt-xpt_server;
 struct svc_deferred_req *dr;

 /* Only do this once */
 if (test_and_set_bit(XPT_DEAD, xprt-xpt_flags))
 BUG();


Saw this again today on 3.7.9 -- dunno if your changes are in that kernel yet 
though.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT] Networking

2013-02-21 Thread Mark Lord
On 13-02-20 10:05 PM, Linus Torvalds wrote:
 On Wed, Feb 20, 2013 at 2:09 PM, David Miller da...@davemloft.net wrote:

 15) Orphan and delete a bunch of pre-historic networking drivers from
 Paul Gortmaker.
 
 Nooo You killed the 3c501 and 3c503 drivers! Snif.
 
 I wonder if they still worked..

I hope they're not really dead, because we still use them in several machines 
here
as secondary interfaces for test rigs and whatnot.

It's a tad early to be nuking support for such widespread devices.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT] Networking

2013-02-21 Thread Mark Lord
On 13-02-21 09:26 PM, Paul Gortmaker wrote:
 On Thu, Feb 21, 2013 at 9:37 AM, Mark Lord ker...@start.ca wrote:
 On 13-02-20 10:05 PM, Linus Torvalds wrote:
 On Wed, Feb 20, 2013 at 2:09 PM, David Miller da...@davemloft.net wrote:
..
 Nooo You killed the 3c501 and 3c503 drivers! Snif.

 I wonder if they still worked..

 I hope they're not really dead, because we still use them in several 
 machines here
 as secondary interfaces for test rigs and whatnot.
..
 Did you actually look at the drivers deleted?
..

Finally got to one of the boxes here to check.
And you're right, I was confusing drivers.

I always seem to get the 3c509 (ISA) stuff confused with the 3c59x (PCI).
Our boxes here have the 3c59x (PCI) cards.

R.I.P. 3c50x.  :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921 (another one)

2013-02-13 Thread Mark Lord
On 13-02-12 03:52 PM, J. Bruce Fields wrote:
 On Sun, Jan 20, 2013 at 05:51:12PM -0500, Mark Lord wrote:
 Got it again, this time on a different system
 running mostly the same software.
 
 Mark, Paweł, Tom, could any of you confirm whether this helps?
..

No, I cannot confirm one way or the other,
because I haven't noticed it again since the most recent
couple of occurrences I posted earlier here.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921 (another one)

2013-01-20 Thread Mark Lord
Got it again, this time on a different system
running mostly the same software.

This time, I noticed when it happened:  on mounting another system's storage 
over NFS.
I was doing a mount command when suddenly this happened.  Linux-3.7.3.

[ 3342.841487] [ cut here ]
[ 3342.841527] kernel BUG at net/sunrpc/svc_xprt.c:921!
[ 3342.841547] invalid opcode:  [#1] PREEMPT SMP
[ 3342.841579] Modules linked in: nfsv3 nfsv4 sha1_generic ppp_mppe ppp_async 
crc_ccitt ppp_generic
slhc btusb hid_generic arc4 usbhid hid b43 coretemp kvm_intel kvm mac80211 
cfg80211
snd_hda_codec_idt dell_wmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss 
snd_mixer_oss snd_pcm
dell_laptop dcdbas snd_seq_dummy snd_seq_oss microcode snd_seq_midi psmouse 
snd_rawmidi
snd_seq_midi_event snd_seq nouveau ttm drm_kms_helper ssb drm bnep i2c_algo_bit 
snd_timer i2c_core
mxm_wmi snd_seq_device rfcomm snd bluetooth parport_pc soundcore snd_page_alloc 
ppdev video nfsd
auth_rpcgss nfs lockd binfmt_misc sunrpc lp parport firewire_ohci tg3 
firewire_core crc_itu_t libphy
hwmon sdhci_pci sdhci
[ 3342.842104] CPU 1
[ 3342.842120] Pid: 4108, comm: nfsv4.0-svc Not tainted 3.7.3 #3 Dell Inc. 
Precision M6300
   /0JM680
[ 3342.842151] RIP: 0010:[a007ba6e]  [a007ba6e] 
svc_delete_xprt+0x23/0xf3 [sunrpc]
[ 3342.842206] RSP: 0018:8801a680be38  EFLAGS: 00010286
[ 3342.842226] RAX:  RBX: 880215103000 RCX: dead00200200
[ 3342.842247] RDX: dead00100100 RSI: 880215103038 RDI: 0006
[ 3342.842270] RBP: 880215726900 R08: 0606 R09: 88021fd11280
[ 3342.842293] R10: 004f9885 R11: 88021fd11280 R12: 880215726900
[ 3342.842315] R13: 880215103038 R14:  R15: 
[ 3342.842337] FS:  () GS:88021fd0() 
knlGS:
[ 3342.842361] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3342.842380] CR2: 00b0d000 CR3: 0001cf402000 CR4: 07e0
[ 3342.842402] DR0:  DR1:  DR2: 
[ 3342.842425] DR3:  DR6: 0ff0 DR7: 0400
[ 3342.842448] Process nfsv4.0-svc (pid: 4108, threadinfo 8801a680a000, 
task 8802027cf080)
[ 3342.842476] Stack:
[ 3342.842488]  dead00200200 88020970a000 880215103000 
880215726900
[ 3342.842527]  880215726900 a007c960 8801a680bfd8 
00011280
[ 3342.842568]  8801a680bfd8 88020970a000 88020970a000 
8801a680bf08
[ 3342.842608] Call Trace:
[ 3342.842644]  [a007c960] ? svc_recv+0xcc/0x338 [sunrpc]
[ 3342.842678]  [a0488000] ? nfs_callback_authenticate+0x20/0x20 
[nfsv4]
[ 3342.842712]  [a048801d] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
[ 3342.842739]  [8103dcf9] ? kthread+0x81/0x89
[ 3342.842763]  [81040101] ? posix_cpu_nsleep_restart+0x11/0x89
[ 3342.842787]  [8103dc78] ? kthread_freezable_should_stop+0x31/0x31
[ 3342.842813]  [8130302c] ? ret_from_fork+0x7c/0xb0
[ 3342.842835]  [8103dc78] ? kthread_freezable_should_stop+0x31/0x31
[ 3342.842856] Code: c2 84 d2 74 02 eb a0 c3 41 55 4c 8d 6f 38 41 54 4c 89 ee 
55 53 48 89 fb 51 48
8b 6f 40 bf 06 00 00 00 e8 62 fa ff ff 85 c0 74 02 0f 0b 48 8b 43 08 4c 8d 65 
10 48 89 df ff 50 38
4c 89 e7 e8 2a
[ 3342.843192] RIP  [a007ba6e] svc_delete_xprt+0x23/0xf3 [sunrpc]
[ 3342.843235]  RSP 8801a680be38
[ 3342.858471] ---[ end trace f0dbe9f9cd4029c3 ]---
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-18 Thread Mark Lord
On 13-01-18 12:37 AM, Stanislav Kinsbursky wrote:

 You have more than one NFS mount in different network namespaces, haven't you?


No, I don't (knowingly) use (multiple) namespaces at all.
Usually I disable them in the kernel .config,
though it appears the currently running kernel has this:

CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set

The full .config was attached to the first post in this thread.

Cheers

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-17 Thread Mark Lord
On 13-01-14 11:17 AM, Mark Lord wrote:

 Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:
 
 /*
  * Remove a dead transport
  */
 static void svc_delete_xprt(struct svc_xprt *xprt)
 {
 struct svc_serv *serv = xprt-xpt_server;
 struct svc_deferred_req *dr;
 
 /* Only do this once */
 if (test_and_set_bit(XPT_DEAD, xprt-xpt_flags))
 BUG();


Shouldn't there also be a return statement after the BUG() line,
inside the if-stmt ?

I mean, the comment says only do this once, but it actually
appears to end up doing it twice, despite the test.

??
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-17 Thread Mark Lord
On 13-01-17 08:53 AM, J. Bruce Fields wrote:
 On Thu, Jan 17, 2013 at 08:11:52AM -0500, Mark Lord wrote:
 On 13-01-14 11:17 AM, Mark Lord wrote:

 Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:

 /*
  * Remove a dead transport
  */
 static void svc_delete_xprt(struct svc_xprt *xprt)
 {
 struct svc_serv *serv = xprt-xpt_server;
 struct svc_deferred_req *dr;

 /* Only do this once */
 if (test_and_set_bit(XPT_DEAD, xprt-xpt_flags))
 BUG();


 Shouldn't there also be a return statement after the BUG() line,
 inside the if-stmt ?
 
 BUG() kills the thread that calls it

Oh, does it?  Well, taken care of then, I guess.
With a sledgehammer.

:)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-17 Thread Mark Lord
On 13-01-17 08:24 AM, Stanislav Kinsbursky wrote:
..
 This looks like the old issue I was trying to fix with SUNRPC: protect 
 service sockets lists during
 per-net shutdown.
 So, here is the problem as I see it: there is a transport, which is processed 
 by service thread and
 it's processing is racing with per-net service shutdown:
 
 CPU#0:CPU#1:
 
 svc_recvsvc_close_net
 svc_get_next_xprt (list_del_init(xpt_ready))
 svc_close_list (set XPT_BUSY and XPT_CLOSE)
 svc_clear_pools(xprt was gained on CPU#0 already)
 svc_delete_xprt (set XPT_DEAD)
 svc_handle_xprt (is XPT_CLOSE = svc_delete_xprt()
 BUG()
 
 So, from my POW, we need some way to:
 1) Skip such in-progress transports on svc_close_net() call (there is not way 
 to detect them, or at 
 least I don't see one)
 2) Delete the transport after somewhere after svc_xprt_received()
 
 But there is a problem with svc_xprt_received(): there is a call for 
 svc_xprt_put() in it
 (svc_recv-svc_handle_xprt-svc_xprt_received-svc_xprt_put) . And if we are 
 the only user - then
 the transport will be destroyed. But transport is dereferenced later in 
 svc_recv() after the
 svc_handle_xprt call.

Sounds like a reference count type of problem/solution (kref) (?)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-16 Thread Mark Lord
On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:
 
 Mark, could you provide any call traces?

Call traces from where/what?
There's this one, posted earlier in the BUG report:

kernel BUG at net/sunrpc/svc_xprt.c:921!
Call Trace:
 [a016a56a] ? svc_recv+0xcc/0x338 [sunrpc]
 [a0318bfc] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
 [a0318c19] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
 [810407e6] ? kthread+0x81/0x89
 [81040765] ? kthread_freezable_should_stop+0x36/0x36
 [812ea62c] ? ret_from_fork+0x7c/0xb0
 [81040765] ? kthread_freezable_should_stop+0x36/0x36

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-16 Thread Mark Lord
On 13-01-16 05:51 PM, Mark Lord wrote:
 On 13-01-16 12:20 AM, Stanislav Kinsbursky wrote:

 Mark, could you provide any call traces?
 
 Call traces from where/what?
 There's this one, posted earlier in the BUG report:
 
 kernel BUG at net/sunrpc/svc_xprt.c:921!
 Call Trace:
  [a016a56a] ? svc_recv+0xcc/0x338 [sunrpc]
  [a0318bfc] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
  [a0318c19] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
  [810407e6] ? kthread+0x81/0x89
  [81040765] ? kthread_freezable_should_stop+0x36/0x36
  [812ea62c] ? ret_from_fork+0x7c/0xb0
  [81040765] ? kthread_freezable_should_stop+0x36/0x36
..

This might be of some interest.
Here are the first few lines of the same BUG occurance,
with timestamps and the dmesg lines that immediately preceeded it.
Perhaps they might help indicate who's triggering the action
that results in the BUG(?).

Jan 14 10:58:05 zippy kernel: [66045.627952] NFS: Registering the id_resolver 
key type
Jan 14 10:58:05 zippy kernel: [66045.628014] Key type id_resolver registered
Jan 14 10:58:05 zippy kernel: [66045.628020] Key type id_legacy registered
Jan 14 10:58:05 zippy kernel: [66045.636302] [ cut here 
]
Jan 14 10:58:05 zippy kernel: [66045.648342] kernel BUG at 
net/sunrpc/svc_xprt.c:921!

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


BUG at net/sunrpc/svc_xprt.c:921

2013-01-14 Thread Mark Lord
Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
is getting these BUG complaints.  The .config file is gzip'd/attached.

[ cut here ]
kernel BUG at net/sunrpc/svc_xprt.c:921!
invalid opcode:  [#1] SMP
Modules linked in: nfsv4 xt_state xt_tcpudp xt_recent xt_LOG xt_limit 
iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter 
ip_tables x_tables
sc520_wdt btusb snd_usb_audio snd_usbmidi_lib hid_generic ftdi_sio usbserial 
usbhid hid
snd_hda_codec_realtek psmouse snd_hda_codec_hdmi r8169 xhci_hcd mii 
snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi 
snd_seq_midi_event snd_seq bnep
snd_timer rfcomm snd_seq_device bluetooth snd nfsd auth_rpcgss binfmt_misc 
radeon nfs lockd sunrpc
soundcore ttm snd_page_alloc drm_kms_helper drm i2c_algo_bit it87 hwmon_vid 
k10temp hwmon microcode
CPU 0
Pid: 29613, comm: nfsv4.0-svc Not tainted 3.7.2 #1 System manufacturer System 
Product Name/E45M1-I
DELUXE
RIP: 0010:[a01696cd]  [a01696cd] svc_delete_xprt+0x23/0xeb 
[sunrpc]
RSP: 0018:880234f05e38  EFLAGS: 00010286
RAX:  RBX: 8801b931b000 RCX: dead00200200
RDX: dead00100100 RSI: 8801b931b038 RDI: 0006
RBP: 880049125e40 R08: 0606 R09: 88023ec10fc0
R10: 88023ec10fc0 R11: 88023ec10fc0 R12: 880049125e40
R13: 8801b931b038 R14:  R15: 
FS:  7f5bef2fd700() GS:88023ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f2f1800bfa0 CR3: 00015ba2e000 CR4: 07f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process nfsv4.0-svc (pid: 29613, threadinfo 880234f04000, task 
88021b51a280)
Stack:
 1cc7 8801a5f2e000 8801b931b000 880049125e40
 880049125e40 a016a56a 00010fc0 880234f05fd8
 880234f05fd8 8801a5f2e000 8801a5f2e000 880234f05f08
Call Trace:
 [a016a56a] ? svc_recv+0xcc/0x338 [sunrpc]
 [a0318bfc] ? nfs_callback_authenticate+0x20/0x20 [nfsv4]
 [a0318c19] ? nfs4_callback_svc+0x1d/0x3c [nfsv4]
 [810407e6] ? kthread+0x81/0x89
 [81040765] ? kthread_freezable_should_stop+0x36/0x36
 [812ea62c] ? ret_from_fork+0x7c/0xb0
 [81040765] ? kthread_freezable_should_stop+0x36/0x36
Code: c2 84 d2 74 02 eb a0 c3 41 55 4c 8d 6f 38 41 54 4c 89 ee 55 53 48 89 fb 
50 48 8b 6f 40 bf 06
00 00 00 e8 77 fa ff ff 85 c0 74 02 0f 0b 48 8b 43 08 4c 8d 65 10 48 89 df ff 
50 38 4c 89 e7 e8 6d
RIP  [a01696cd] svc_delete_xprt+0x23/0xeb [sunrpc]
 RSP 880234f05e38
---[ end trace 916f6471c0b47e1d ]---


Here's the code with the BUG() at net/sunrpc/svc_xprt.c line 921:

/*
 * Remove a dead transport
 */
static void svc_delete_xprt(struct svc_xprt *xprt)
{
struct svc_serv *serv = xprt-xpt_server;
struct svc_deferred_req *dr;

/* Only do this once */
if (test_and_set_bit(XPT_DEAD, xprt-xpt_flags))
BUG();
...






config.txt.gz
Description: GNU Zip compressed data


Re: BUG at net/sunrpc/svc_xprt.c:921

2013-01-14 Thread Mark Lord
On 13-01-14 03:37 PM, J. Bruce Fields wrote:
 Thanks for the report.
 
 On Mon, Jan 14, 2013 at 11:17:09AM -0500, Mark Lord wrote:
 Since upgrading to 3.7, and now 3.7.2, my AMD-450E based server
 
 It's acting as an NFS client, right?

Client and server, with other Linux boxes all running 3.something kernels.

 What did you upgrade from?

3.4.something, I believe.

 is getting these BUG complaints.  The .config file is gzip'd/attached.
 
 Is this easy to reproduce?

So far, it seems to pop up within a day or so of any reboot.
I normally only reboot that system for a kernel upgrade,
but can do so a bit more often if there's useful info to collect.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression from 3.4.9 to 3.4.16 stable kernel

2012-10-29 Thread Mark Lord
There's something else very wrong when going from 3.4.9 to 3.4.16.
I've done it on two machines here, one the AMD-450 server (64-bit),
and the other my main notebook (Core2duo 32-bit-PAE).

Both systems feel much more sluggish than usual with 3.4.16 running.
Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
and the usual responsive feel has returned.

Vague, I know, but something bad happened in there somewhere.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression from 3.4.9 to 3.4.16 stable kernel

2012-10-29 Thread Mark Lord
On 12-10-29 07:03 PM, Greg Kroah-Hartman wrote:
 On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
 There's something else very wrong when going from 3.4.9 to 3.4.16.
 I've done it on two machines here, one the AMD-450 server (64-bit),
 and the other my main notebook (Core2duo 32-bit-PAE).

 Both systems feel much more sluggish than usual with 3.4.16 running.
 Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
 and the usual responsive feel has returned.

 Vague, I know, but something bad happened in there somewhere.
 
 That's too vague for me to do anything with, sorry.  Bisection would be
 good if you can figure out how to measure this.

Well, I'd bet Donkeys to Daises that reverting the kernel/sched.c changes
will probably fix the responsiveness, but I haven't done that yet.
I've lost enough time already debugging the other issues.

This is more just an indication that perhaps -stable patches need better review
than they're getting.  Take the setup.c breakage: as soon as I pointed it out,
a few people jumped in with knowledge that it was broken, and that patches
existed to fix it.

That kind of thing should be happening before a -stable release,
though I don't know how you would get the Right People to look
at this stuff then rather than after the fact.  Maybe a topic
for a future kernel summit or something.

Best wishes.
-ml

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression from 3.4.9 to 3.4.16 stable kernel

2012-10-29 Thread Mark Lord
On 12-10-29 02:46 AM, Willy Tarreau wrote:
 On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
 My server here runs the 3.4.xx series of stable kernels.
 Until today, it was running 3.4.9.
 Today I tried to upgrade it to 3.4.16.
 It hangs in setup.c.

 I've isolated the fault down to this specific change
 that was made between 3.4.9 and 3.4.16.
 Reverting this change allows the system to boot/run normally again.


 --- linux-3.4.9/arch/x86/kernel/setup.c  2012-08-15 11:17:17.0 
 -0400
 +++ linux-3.4.16/arch/x86/kernel/setup.c 2012-10-28 13:36:33.0 
 -0400
 @@ -927,8 +927,21 @@

  #ifdef CONFIG_X86_64
  if (max_pfn  max_low_pfn) {
 -max_pfn_mapped = init_memory_mapping(1UL32,
 - max_pfnPAGE_SHIFT);
 +int i;
 +for (i = 0; i  e820.nr_map; i++) {
 +struct e820entry *ei = e820.map[i];
 +
 +if (ei-addr + ei-size = 1UL  32)
 +continue;
 +
 +if (ei-type == E820_RESERVED)
 +continue;
 +
 +max_pfn_mapped = init_memory_mapping(
 +ei-addr  1UL  32 ? 1UL  32 : ei-addr,
 +ei-addr + ei-size);
 +}
 +
  /* can we preseve max_low_pfn ?*/
  max_low_pfn = max_pfn;
  }
 
 For the record, it is this commit introduced in 3.4.16 :
 
 commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
 Author: Jacob Shin jacob.s...@amd.com
 Date:   Thu Oct 20 16:15:26 2011 -0500
 
 x86: Exclude E820_RESERVED regions and memory holes above 4 GB from 
 direct mapping.
 
 commit 1e779aabe1f0768c2bf8f8c0a5583679b54a upstream.
 
 On systems with very large memory (1 TB in our case), BIOS may report a
 reserved region or a hole in the E820 map, even above the 4 GB range. 
 Exclude
 these from the direct mapping.
 
 [ hpa: this should be done not just for  4 GB but for everything above 
 the legacy
   region (1 MB), at the very least.  That, however, turns out to require 
 significant
   restructuring.  That work is well underway, but is not suitable for 
 rc/stable. ]
 
 Signed-off-by: Jacob Shin jacob.s...@amd.com
 Link: 
 http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.s...@amd.com
 Signed-off-by: H. Peter Anvin h...@linux.intel.com
 Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org
 
 Willy


Thanks, Willy.

I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for 
patching.
So there's a fix somewhere in between that perhaps could also get backported to 
-stable.

-ml

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression from 3.4.9 to 3.4.16 stable kernel

2012-10-29 Thread Mark Lord
On 12-10-29 10:22 AM, Mark Lord wrote:
 On 12-10-29 02:46 AM, Willy Tarreau wrote:
 On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
 My server here runs the 3.4.xx series of stable kernels.
 Until today, it was running 3.4.9.
 Today I tried to upgrade it to 3.4.16.
 It hangs in setup.c.

 I've isolated the fault down to this specific change
 that was made between 3.4.9 and 3.4.16.
 Reverting this change allows the system to boot/run normally again.
..
 For the record, it is this commit introduced in 3.4.16 :

 commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
 Author: Jacob Shin jacob.s...@amd.com
 Date:   Thu Oct 20 16:15:26 2011 -0500

 x86: Exclude E820_RESERVED regions and memory holes above 4 GB from 
 direct mapping.
 
 commit 1e779aabe1f0768c2bf8f8c0a5583679b54a upstream.
 
 On systems with very large memory (1 TB in our case), BIOS may report a
 reserved region or a hole in the E820 map, even above the 4 GB range. 
 Exclude
 these from the direct mapping.
 
 [ hpa: this should be done not just for  4 GB but for everything above 
 the legacy
   region (1 MB), at the very least.  That, however, turns out to require 
 significant
   restructuring.  That work is well underway, but is not suitable for 
 rc/stable. ]
 
 Signed-off-by: Jacob Shin jacob.s...@amd.com
 Link: 
 http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.s...@amd.com
 Signed-off-by: H. Peter Anvin h...@linux.intel.com
 Signed-off-by: Greg Kroah-Hartman gre...@linuxfoundation.org
..
 I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for 
 patching.
 So there's a fix somewhere in between that perhaps could also get backported 
 to -stable.
..

Heh.. except that kernel has its own issues -- hangs in some kind of screen loop
in the Radeon code (?) when trying to shutdown.  ctrl-alt-sysrq s+u+s+b gets 
out of that,
but it hangs in a similar fashion during the subsequent reboot.

A full power-off was required to get the Radeon video to behave so I could 
reboot
the system with 3.4.16 again.  I'm not going to pursue that issue for now, 
though.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Regression from 3.4.9 to 3.4.16 stable kernel

2012-10-28 Thread Mark Lord
My server here runs the 3.4.xx series of stable kernels.
Until today, it was running 3.4.9.
Today I tried to upgrade it to 3.4.16.
It hangs in setup.c.

I've isolated the fault down to this specific change
that was made between 3.4.9 and 3.4.16.
Reverting this change allows the system to boot/run normally again.


--- linux-3.4.9/arch/x86/kernel/setup.c 2012-08-15 11:17:17.0 -0400
+++ linux-3.4.16/arch/x86/kernel/setup.c2012-10-28 13:36:33.0 
-0400
@@ -927,8 +927,21 @@

 #ifdef CONFIG_X86_64
if (max_pfn  max_low_pfn) {
-   max_pfn_mapped = init_memory_mapping(1UL32,
-max_pfnPAGE_SHIFT);
+   int i;
+   for (i = 0; i  e820.nr_map; i++) {
+   struct e820entry *ei = e820.map[i];
+
+   if (ei-addr + ei-size = 1UL  32)
+   continue;
+
+   if (ei-type == E820_RESERVED)
+   continue;
+
+   max_pfn_mapped = init_memory_mapping(
+   ei-addr  1UL  32 ? 1UL  32 : ei-addr,
+   ei-addr + ei-size);
+   }
+
/* can we preseve max_low_pfn ?*/
max_low_pfn = max_pfn;
}
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Drop support for x86-32

2012-08-29 Thread Mark Lord
On 12-08-26 10:15 AM, wbrana wrote:
 On 8/26/12, Mark Lord ker...@teksavvy.com wrote:
 Here are a couple of real scenarios you don't seem to have thought about.
 A 32-bit kernel on a legacy (or even new) system in 2017 will still need
 regular kernel updates (not long term un0maintained kernels)
 in order to work with new USB devices, new 4KB+ sector hard drives,
 newer generations of SSDs, etc..
 12-years-old machine is trash.

There you go making assumptions again.
Who said anything about a 12-year old machine?

Much more likely is a 5-year old software installation
that gets moved to a new box.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Drop support for x86-32

2012-08-26 Thread Mark Lord
On 12-08-24 12:45 PM, wbrana wrote:
 On 8/24/12, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
 That doesn't work for a variety of reasons x86 hardware is still
 changing, devices are still changing. So please exit cloud cuckoo land
 and go do something useful.
 Hardware will be discontinued if no software will support it.

Here are a couple of real scenarios you don't seem to have thought about.
A 32-bit kernel on a legacy (or even new) system in 2017 will still need
regular kernel updates (not long term un0maintained kernels)
in order to work with new USB devices, new 4KB+ sector hard drives,
newer generations of SSDs, etc..

It's (mostly) all about drivers.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ugly patch] Save .15W-.5W by AHCI powersaving

2008-02-26 Thread Mark Lord

Pavel Machek wrote:

Hi!


This is a patch (very ugly, assumes you have just one disk) to bring
powersaving to AHCI. You need Alan's SCSI autosuspend (attached) patch
as a base.

It saves .5W compared to config with disk spinning, and even .15W
compared to hdparm -y... on my thinkpad x60 anyway.

..

There was a discussion of this here today.


Real-life discussion, or something I could read? :-).


It makes good use of AHCI-specific features.

Has it been tested with a Port-Multiplier yet?


I do not know what port-multiplier is, sorry. But it was not really
tested. It is not expected to work on any other config than notebook
very similar to mine.


This is cool enough that we really ought to do a hardware-independent
version, so that all SATA interfaces could benefit.  Especially ata_piix,
but others too.


Well, it seems like it is 10 lines per driver once Alan's SCSI
autosuspend patches are in...

..

Cool (literally)!

I think I might have gotten your patch confused in my mind
with another AHCI patch, which uses features of the chip itself
to automatically negotiate/change link power status on the fly
(no s/w needed, other than to turn it on).

That one is very ACPI specific, though.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sata-MV, Intergated Sata Device Support

2008-02-26 Thread Mark Lord

saeed wrote:


On Mon, 25 Feb 2008, Jeff Garzik wrote:


...

Saeed:  isn't this what your SOC patches already implemented for us?
As near as I can tell, sata_mv now already has support for the 60x1C0.

Saeed's stuff didn't support PCI though, and Jon Li is definitely talking
about PCI...
yes, my patch added support for the SoC sata like in the 5182, and this 
is what Jon Li was concerned about. he mentioneded the 60x1C0 pci device 
just to suggest to use it's code for the SoC sata as it is very similar.

..

I don't think I understand your english there.

Does the current sata_mv driver work as-is with the chipset this person wants?
If not, then exactly what has to change to make it work?

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ugly patch] Save .15W-.5W by AHCI powersaving

2008-02-25 Thread Mark Lord

Pavel Machek wrote:

Hi!

This is a patch (very ugly, assumes you have just one disk) to bring
powersaving to AHCI. You need Alan's SCSI autosuspend (attached) patch
as a base.

It saves .5W compared to config with disk spinning, and even .15W
compared to hdparm -y... on my thinkpad x60 anyway.

..

There was a discussion of this here today.
It makes good use of AHCI-specific features.

Has it been tested with a Port-Multiplier yet?

This is cool enough that we really ought to do a hardware-independent
version, so that all SATA interfaces could benefit.  Especially ata_piix,
but others too.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Sata-MV, Intergated Sata Device Support

2008-02-25 Thread Mark Lord

Jeff Garzik wrote:

Jon Li wrote:

Hello,

I am curious as to whether there are plans to add support for integrated
sata devices.  I personally want to add support for a 60x1C0 based
device (pci:id = 0x5182).  I think adding support should be relatively
simple, except for a few issues outlined below.

In the original mvSata.c (ver3.4) that has 0x5182 support, the config
space is as such:

case MV_SATA_DEVICE_ID_5182:
pAdapter-numberOfChannels = MV_SATA_5182_PORT_NUM;
pAdapter-numberOfUnits = 1;
pAdapter-portsPerUnit = 2;
pAdapter-sataAdapterGeneration = MV_SATA_GEN_IIE;
/*The integrated sata core chip based on 60x1 C0*/
pAdapter-chipIs60X1C0 = MV_TRUE;
pAdapter-hostInterface = MV_HOST_IF_INTEGRATED;
pAdapter-mainMaskOffset = 0x20024; /*the iobaseaddress is
0x6*/
pAdapter-mainCauseOffset = 0x20020;
break;

I have not yet figured out how all these values are defined in sata-mv.c
(ver 0.8).  Specifically, where do I define numberOfChannels which
should equal 2, and numberOfUnits which obviously equals 1?

I have a current config space (not completed) for sata-mv.c which is:

{  /* chip_5182 */
.sht= mv_sht,
.flags= (MV_COMMON_FLAGS | MV_6XXX_FLAGS |
   MV_FLAG_DUAL_HC),
.pio_mask= 0x1f,/* pio0-4 */
.udma_mask= 0x7f,/* udma0-6 */
.port_ops= mv6_ops,
},

...

Saeed:  isn't this what your SOC patches already implemented for us?
As near as I can tell, sata_mv now already has support for the 60x1C0.

-ml
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-22 Thread Mark Lord

Rafael J. Wysocki wrote:


No.  Again, if there are devices that wake us up from S4, but not from S5,
they need to be handled differently in the *enter S4* case (hibernation) and
in the *enter S5* case (powering off the system).

..

Something I've never understood, is why we would ever want to bother with *S4* 
at all?

I actually like hibernation (great for travelling), but I treat it as if
it were a complete power-off (S5?).  I pull batteries, unplug drives,
boot other operating systems, etc..

And when I put it all back together again with the Linux disk inserted,
I fully expect it to resume from the hibernation of 3 months ago.
And it does.

Why would I ever want anything less than a full poweroff for hibernation 

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-22 Thread Mark Lord

[EMAIL PROTECTED] wrote:
..
I've been watching for kexec hibernate for a little while now, and the 
last I saw was that acpi was incompatible with the kexec hibernate (but 
the suspend folks were still claiming that devices needed to be put in 
the 'right mode' not just powered off. I've been waiting to see this 
resolved.

..

Yeah, exactly.  What's so special about poweroff on hibernation?
Why even bother with the special S4 state there?
I want a real full poweroff, or at least I think I do.  Why wouldn't I?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-22 Thread Mark Lord

Rafael J. Wysocki wrote:

On Friday, 22 of February 2008, Mark Lord wrote:

[EMAIL PROTECTED] wrote:
..
I've been watching for kexec hibernate for a little while now, and the 
last I saw was that acpi was incompatible with the kexec hibernate (but 
the suspend folks were still claiming that devices needed to be put in 
the 'right mode' not just powered off. I've been waiting to see this 
resolved.

..

Yeah, exactly.  What's so special about poweroff on hibernation?
Why even bother with the special S4 state there?


(1) To be able to wake up with the help of devices that can't wake
the system up from S5 (power off)
(2) To handle some platform devices appropriately over the cycle

..

That's the theory.  I've read about it, but have yet to imagine
any real-life situation where it applies.

But this isn't my speciality, so.. do you have experience with any real 
examples?

Thanks!


I want a real full poweroff, or at least I think I do.  Why wouldn't I?




You may want that, some people may not want it.

We are supposed to handle S4, the BIOS/platform may expect us to do that, so
IMO this is a good enough reason to do it.  Especially that we can.

Thanks,
Rafael


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc2 + smartd = hang

2008-02-22 Thread Mark Lord

Anders Eriksson wrote:


[EMAIL PROTECTED] said:

The sysrq-e output is probably just standard ext3 journalling unrelated  to
the problem...  what does dmesg say?  lspci?  What's your hardware setup? 



dmesg ; smartd ; dmesg yields no new entries in dmesg. It seems on disk 
accesses are dead. it still routes packets fine.


This is an old PII-300 with 2 IDS disks and a DVD R/W. 

...

Feb 22 17:38:49 tippex Uniform Multi-Platform E-IDE driver
Feb 22 17:38:49 tippex ide: Assuming 33MHz system bus speed for PIO modes; 
override with idebus=xx
Feb 22 17:38:49 tippex PIIX4: IDE controller (0x8086:0x7111 rev 0x01) at  PCI 
slot :00:07.1
Feb 22 17:38:49 tippex PIIX4: not 100% native mode: will probe irqs later
Feb 22 17:38:49 tippex ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, 
hdb:PIO
Feb 22 17:38:49 tippex ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:PIO, 
hdd:PIO
Feb 22 17:38:49 tippex Probing IDE interface ide0...
Feb 22 17:38:49 tippex hdb: IC35L120AVV207-0, ATA DISK drive
Feb 22 17:38:49 tippex hda: IBM-DTTA-371010, ATA DISK drive
Feb 22 17:38:49 tippex hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Feb 22 17:38:49 tippex hda: UDMA/33 mode selected
Feb 22 17:38:49 tippex hdb: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Feb 22 17:38:49 tippex hdb: UDMA/33 mode selected
Feb 22 17:38:49 tippex Probing IDE interface ide1...
Feb 22 17:38:49 tippex hdd: Maxtor 6L250R0, ATA DISK drive
Feb 22 17:38:49 tippex hdc: AOPEN DUW1608/ARR, ATAPI CD/DVD-ROM drive
Feb 22 17:38:49 tippex hdc: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Feb 22 17:38:49 tippex hdc: UDMA/33 mode selected
Feb 22 17:38:49 tippex hdd: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Feb 22 17:38:49 tippex hdd: UDMA/33 mode selected
Feb 22 17:38:49 tippex ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Feb 22 17:38:49 tippex ide1 at 0x170-0x177,0x376 on irq 15

..

So that's using the old IDE drivers.
And the network and USB are sharing IRQ#11 with each other.

If you are going to be using newer kernels like this (2.6.23+),
then you might consider shifting those drives over to libata drivers.

This involves a little bit of work -- building a kernel with libata
and ata_piix built-in instead of the old IDE drivers,
and then rearranging /etc/fstab to match the new device names
(eg. /dev/sda instead of /dev/hda).

But at this point libata is working much better than the old IDE stuff,
and it really is worth moving things over if you can.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ata_ram driver

2008-02-22 Thread Mark Lord

Matthew Wilcox wrote:

I've ported the scsi_ram driver [1] to libata.  It could use a lot more
work -- there's a lot of stuff in the identify page that I haven't
filled in, and there's a lot of commands it doesn't even try to execute.

For example, when you unload the driver, you get the mildly disturbing
messages:

sd 12:0:0:0: [sdb] Stopping disk
sd 12:0:0:0: [sdb] START_STOP FAILED
sd 12:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK,SUGGEST_OK


..

I see messages like those with *established* libata drivers from time to time.
It could just be a bug in the shutdown sequence, somewhere between libata,
SCSI, block layer, and the device model in general.  Or not.

Cheers
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   >