RE: [PATCH 02/11] media: vsp1: use kernel __packed for structures

2018-03-13 Thread David Laight
From: Kieran Bingham [mailto:kieran.bingham+rene...@ideasonboard.com]
> On 13/03/18 11:20, David Laight wrote:
> > From: Kieran Bingham
> >> Sent: 09 March 2018 22:04
> >> The kernel provides a __packed definition to abstract away from the
> >> compiler specific attributes tag.
> >>
> >> Convert all packed structures in VSP1 to use it.
> >>
> >> Signed-off-by: Kieran Bingham <kieran.bingham+rene...@ideasonboard.com>
> >> ---
> >>  drivers/media/platform/vsp1/vsp1_dl.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c 
> >> b/drivers/media/platform/vsp1/vsp1_dl.c
> >> index 37e2c984fbf3..382e45c2054e 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> >> @@ -29,19 +29,19 @@
> >>  struct vsp1_dl_header_list {
> >>u32 num_bytes;
> >>u32 addr;
> >> -} __attribute__((__packed__));
> >> +} __packed;
> >>
> >>  struct vsp1_dl_header {
> >>u32 num_lists;
> >>struct vsp1_dl_header_list lists[8];
> >>u32 next_header;
> >>u32 flags;
> >> -} __attribute__((__packed__));
> >> +} __packed;
> >>
> >>  struct vsp1_dl_entry {
> >>u32 addr;
> >>u32 data;
> >> -} __attribute__((__packed__));
> >> +} __packed;
> >
> > Do these structures ever actually appear in misaligned memory?
> > If they don't then they shouldn't be marked 'packed'.
> 
> I believe the declaration is to ensure that the struct definition is not 
> altered
> by the compiler as these structures specifically define the layout of how the
> memory is read by the VSP1 hardware.

The C language and ABI define structure layouts.

> Certainly 2 u32's sequentially stored in a struct are unlikely to be moved or
> rearranged by the compiler (though I believe it would be free to do so if it
> chose without this attribute), but I think the declaration shows the intent of
> the memory structure.

The language requires the fields be in order and the ABI stops the compiler
adding 'random' padding.

> Isn't this a common approach throughout the kernel when dealing with hardware
> defined memory structures ?

Absolutely not.
__packed tells the compiler that the structure might be on any address boundary.
On many architectures this means the compiler must use byte accesses with shifts
and ors for every access.
The most a hardware defined structure will have is a compile-time assert
that it is the correct size (to avoid silly errors from changes).

David



RE: [PATCH 02/11] media: vsp1: use kernel __packed for structures

2018-03-13 Thread David Laight
From: Kieran Bingham
> Sent: 09 March 2018 22:04
> The kernel provides a __packed definition to abstract away from the
> compiler specific attributes tag.
> 
> Convert all packed structures in VSP1 to use it.
> 
> Signed-off-by: Kieran Bingham 
> ---
>  drivers/media/platform/vsp1/vsp1_dl.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c 
> b/drivers/media/platform/vsp1/vsp1_dl.c
> index 37e2c984fbf3..382e45c2054e 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -29,19 +29,19 @@
>  struct vsp1_dl_header_list {
>   u32 num_bytes;
>   u32 addr;
> -} __attribute__((__packed__));
> +} __packed;
> 
>  struct vsp1_dl_header {
>   u32 num_lists;
>   struct vsp1_dl_header_list lists[8];
>   u32 next_header;
>   u32 flags;
> -} __attribute__((__packed__));
> +} __packed;
> 
>  struct vsp1_dl_entry {
>   u32 addr;
>   u32 data;
> -} __attribute__((__packed__));
> +} __packed;

Do these structures ever actually appear in misaligned memory?
If they don't then they shouldn't be marked 'packed'.

David



RE: Enabling peer to peer device transactions for PCIe devices

2017-10-24 Thread David Laight
Please don't top post, write shorter lines, and add the odd blank line.
Big blocks of text are hard to read quickly.

> From: Petrosyan, Ludwig [mailto:ludwig.petros...@desy.de]
> Yes I agree it has to be started with the write transaction, according of 
> PCIe standard all write
> transaction are address routed, and I agree with Logan:
> if in write transaction TLP the endpoint address written in header the TLP 
> should not touch CPU, the
> PCIe Switch has to route it to endpoint.

That depends, IIRC there is a feature for PCIe switches to force them
to send all transactions to the root hub.
This is there so that the host can enforce rules to stop p2p transfers.
It might enabled on the switch you have.

> The idea was: in MTCA system there is PCIe Switch on MCH (MTCA crate HUB) 
> this switch connects CPU to
> other Crate Slots, so one port is Upstream and others are Downstream  ports, 
> DMA read from CPU is
> usual write on endpoint side, Xilinx DMA core has two registers Destination 
> Address and Source
> Address,
> device driver to make DMA has to set up these registers,
> usually device driver to start DMA read from Board sets Source address as 
> FPGA memory address and
> Destination address is DMA prepared system address,
> in case of testing p2p I set Destination address as physical address of other 
> endpoint.

Unnecessary detail...

> More detailed:
> we have so called pcie universal driver: the idea behind is
> 1. all pcie configuration staff, find enabled BARs, mapping BARs, usual 
> read/write and common ioctl
> (get slot number, get driver version ...) implemented in universal driver and 
> EXPORTed.
> 2. if some system function in new kernel are changed we change it only in 
> universal parts (easy
> support a big number of drivers )
> so the universal driver something like PCIe Driver API
> 3. the universal driver provides read/writ functions so we have the same 
> device access API for any
> PCIe device, we could use the same user application with any PCIe device

More crap...

> now. during BARs finding and mapping universal driver keeps pcie endpoint 
> physical address in some
> internal structures, any top driver may get physical address
> of other pcie endpoint by slot number.
> in may case during get_resorce the physical address is 0xB200, I check 
> lspci -H1 - -s psie
> switch port bus address (the endpoint connected to this port, checked by 
> lspci -H1 -t) the same
> address (0xB20) is the memory behind bridge,

Overly verbose...

> I want to make p2p writes to offset 0x4, so I set DMA destination address 
> 0xB240
> is something wrong?

Possibly.

You almost certainly need the address that is written into the BAR of the
target endpoint.
This could well be different from the physical address that the cpu uses
to write to the endpoint (as well as the cpu virtual address).

lspci lies [1], run lspci -x  (or hexdump config space through /sys/devices)
to see what is actually in the BAR.

[1] The addresses come from somewhere other than reading the BAR.
If the endpoint resets the BAR lspci will still report the old
addresses.

David



RE: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread David Laight
From: Petrosyan Ludwig
> Sent: 22 October 2017 07:14
> Could be I have done is stupid...
> But at first sight it has to be simple:
> The PCIe Write transactions are address routed, so if in the packet header 
> the other endpoint address
> is written the TLP has to be routed (by PCIe Switch to the endpoint), the DMA 
> reading from the end
> point is really write transactions from the endpoint, usually (Xilinx core) 
> to start DMA one has to
> write to the DMA control register of the endpoint the destination address. So 
> I have change the device
> driver to set in this register the physical address of the other endpoint 
> (get_resource start called
> to other endpoint, and it is the same address which I could see in lspci 
> - -s bus-address of the
> switch port, memories behind bridge), so now the endpoint has to start send 
> writes TLP with the other
> endpoint address in the TLP header.
> But this is not working (I want to understand why ...), but I could see the 
> first address of the
> destination endpoint is changed (with the wrong value 0xFF),
> now I want to try prepare in the driver of one endpoint the DMA buffer , but 
> using physical address of
> the other endpoint,
> Could be it will never work, but I want to understand why, there is my error 
> ...

It is also worth checking that the hardware actually supports p2p transfers.
Writes are more likely to be supported then reads.
ISTR that some intel cpus support some p2p writes, but there could easily
be errata against them.

I'd certainly test a single word write to read/write memory location.
First verify against kernel memory, then against a 'slave' board.

I don't know about Xilinx fpga, but we've had 'fun' getting Altera fpga
to do sensible PCIe cycles (I ended up writing a simple dma controller 
that would generate long TLP).
We also found a bug in the Altera logic that processed interleaved
read completions.

David



RE: [PATCH v4 4/9] em28xx: fix em28xx_dvb_init for KASAN

2017-09-25 Thread David Laight
From: Arnd Bergmann
> Sent: 22 September 2017 22:29
...
> It seems that this is triggered in part by using strlcpy(), which the
> compiler doesn't recognize as copying at most 'len' bytes, since strlcpy
> is not part of the C standard.

Neither is strncpy().

It'll almost certainly be a marker in a header file somewhere,
so it should be possibly to teach it about other functions.

David



RE: [PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-18 Thread David Laight
From: Logan Gunthorpe
> Sent: 13 April 2017 23:05
> Straightforward conversion to the new helper, except due to
> the lack of error path, we have to warn if unmapable memory
> is ever present in the sgl.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/block/xen-blkfront.c | 33 +++--
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 5067a0a..7dcf41d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, 
> struct blkfront_ring_info *ri
>   BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> 
>   if (setup.need_copy) {
> - setup.bvec_off = sg->offset;
> - setup.bvec_data = kmap_atomic(sg_page(sg));
> + setup.bvec_off = 0;
> + setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
> + if (IS_ERR(setup.bvec_data)) {
> + /*
> +  * This should really never happen unless
> +  * the code is changed to use memory that is
> +  * not mappable in the sg. Seeing there is a
> +  * questionable error path out of here,
> +  * we WARN.
> +  */
> + WARN(1, "Non-mappable memory used in sg!");
> + return 1;
> + }
...

Perhaps add a flag to mark failure as 'unexpected' and trace (and panic?)
inside sg_map().

David




RE: [PATCH v3] usb: document that URB transfer_buffer should be aligned

2017-04-07 Thread David Laight
From: Mauro Carvalho Chehab
> Sent: 05 April 2017 14:53
> Several host controllers, commonly found on ARM, like dwc2,
> require buffers that are CPU-word aligned for they to work.
> 
> Failing to do that will cause buffer overflows at the caller
> drivers, with could cause data corruption.
> 
> Such data corruption was found, in practice, with the uvcdriver.
> 
> Document it.

How does this in any way solve the problem.

Ethernet frames will be misaligned, however hard it may be
the usb drivers will have to handle it.

David



RE: [PATCH 22/22] usb: document that URB transfer_buffer should be aligned

2017-03-30 Thread David Laight
From: Mauro Carvalho Chehab
> Sent: 30 March 2017 11:28
...
> While debugging this issue, I saw *a lot* of network-generated URB
> traffic from RPi3 Ethernet port drivers that were using non-aligned
> buffers and were subject to the temporary buffer conversion.

Buffers from the network stack will almost always be 4n+2 aligned.
Receive data being fed into the network stack really needs to be
4n=2 aligned.

The USB stack almost certainly has to live with that.

If the USB ethernet device doesn't have two bytes of 'pad' before
the frame data (destination MAC address) then you have to solve
the problem within the USB stack.

For transmits it might be possible to send an initial 2 byte fragment
from a separate buffer - but only if arbitrary fragment sizes are
allowed.
A normal USB3 controller should allow this - but you have to be very
careful about what happens at the end of the ring.

David




RE: [PATCH 01/26] compiler: introduce noinline_for_kasan annotation

2017-03-03 Thread David Laight
From: Andrey Ryabinin
> Sent: 03 March 2017 13:50
...
> noinline_iff_kasan might be a better name.  noinline_for_kasan gives the 
> impression
> that we always noinline function for the sake of kasan, while 
> noinline_iff_kasan
> clearly indicates that function is noinline only if kasan is used.

noinline_if_stackbloat

David



RE: [PATCH] usb: hcd: get/put device and hcd for hcd_buffers()

2014-12-08 Thread David Laight
From: Greg Kroah-Hartman
 On Fri, Dec 05, 2014 at 09:03:57PM +0100, Sebastian Andrzej Siewior wrote:
  Consider the following scenario:
  - plugin a webcam
  - play the stream via gst-launch-0.10 v4l2src device=/dev/video0
  - remove the USB-HCD during playback via rmmod $HCD
 
  and now wait for the crash
 
 Which you deserve, why did you ever remove a kernel module?  That's racy
 and _never_ recommended, which is why it never happens automatically and
 only root can do it.

Really drivers and subsystems should have the required locking (etc) to
ensure that kernel modules can either be unloaded, or that the unload
request itself fails if the device is busy.

It shouldn't be considered a 'shoot self in foot' operation.
OTOH there are likely to be bugs.

David

N�r��yb�X��ǧv�^�)޺{.n�+{���bj)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥