Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
Ralf Baechle r...@linux-mips.org writes: 16K is a silver bullet solution to all cache aliasing problems. So if your issue persists with 16K page size, it's not a cache aliasing issue. Aside there are generally performance gains from the bigger page size. I wonder why isn't the issue present in other cases. Perhaps remapping of a userspace address and accessing it with kseg0 isn't a frequent operation. Shouldn't we change the default page size (on affected CPUs) to 16 KB then? Alternatively, we could flush/invalidate the cache when needed - is it a viable option? Thanks. -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
On Wed, Oct 09, 2013 at 08:53:20AM +0200, Krzysztof Hałasa wrote: 16K is a silver bullet solution to all cache aliasing problems. So if your issue persists with 16K page size, it's not a cache aliasing issue. Aside there are generally performance gains from the bigger page size. I wonder why isn't the issue present in other cases. Perhaps remapping of a userspace address and accessing it with kseg0 isn't a frequent operation. Shouldn't we change the default page size (on affected CPUs) to 16 KB then? Alternatively, we could flush/invalidate the cache when needed - is it a viable option? The kernel is supposed to perform the necessary cache flushing, so any remaining aliasing issue would be considered a bug. But the code is performance sensitive, some of the problem cases are twisted and complex so bugs and unsolved corner cases show up every now and then. The historic default is 4K page size - on some processors such as the venerable R3000 it's also the only page size available. Some application code wants to know the page size and has wisely hardcoded 4K. Also a fix to binutils many years ago reduced the alignment of generated binaries so they'd not run on a kernel with larger page size. The kernel configuration defaults are chosen to just work out of the box, and 4K page size is the safest choice. Anyway, binutils got unfixed again years ago so chances are 16K will just work. Does it work for you, even solve your problem? Ralf -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
Ralf Baechle r...@linux-mips.org writes: The kernel is supposed to perform the necessary cache flushing, so any remaining aliasing issue would be considered a bug. But the code is performance sensitive, some of the problem cases are twisted and complex so bugs and unsolved corner cases show up every now and then. Ok. This means I should also investigate the V4L2 and the hw driver code, because the cache aliasing shouldn't be there in the first place. Does it work for you, even solve your problem? Sure, with 16 KB page size everything works fine. -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
Ralf Baechle r...@linux-mips.org writes: That's fine. You just need to ensure that there are no virtual aliases. Does this include virtual aliasing between a 4 KB TLB-mapped page and a kseg0 address? I don't really have two TLBs pointing to the same page. One way to do so is to increase the page size to 16kB. Right, this way we will have a unique mapping from the virtual address to the data cache, as the cache size (per way) is 8 KB here. Is it the correct fix in this situation? Note that there is a variant of the 24K which has a VIPT cache but uses hardware to resolve cache aliases. That is, from a kernel cache management perspective it behaves like a PIPT cache. It seems it's not the case here. What I have here is: CPU revision is: 00019374 (MIPS 24Kc) SoC: Atheros AR7161 rev 2 Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes. Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes However as I understand what you're mapping to userspace is actually device memory, right? Not exactly - I'm using PCI DMA to userspace SG buffers in RAM. The userspace first allocates the buffers in normal RAM (with vmalloc() or something, there is an mmap ioctl() for this), the address returned is 0x7xxx. Then the buffers (which consist of several pages each) are presented to the hw driver which obtains separate (kernel) mapping for each page (kseg0) and does dma_map_sg() and so on. The driver also simply writes to the buffers. This isn't a problem, though - only the incoherence between TLB and kseg0 is. The problem is the userspace doesn't see the kernel writes - The 0x7xxx TLB-mapped pages are read-cached and not invalidated while the kernel writes to the same pages using kseg0 addresses. Thanks for looking at this. -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
Ralf Baechle r...@linux-mips.org writes: That's fine. You just need to ensure that there are no virtual aliases. One way to do so is to increase the page size to 16kB. Checked, this thing works fine with 16 KB pages. -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
On Tue, Oct 08, 2013 at 10:24:13AM +0200, Krzysztof Hałasa wrote: That's fine. You just need to ensure that there are no virtual aliases. Does this include virtual aliasing between a 4 KB TLB-mapped page and a kseg0 address? I don't really have two TLBs pointing to the same page. Yes. Note that the terminology used in the manuals may be confusing here. They call KSEG0/1 and - on 64 bit - XKPHYS unmapped spaces. But obviously there is a mapping from virtual to physical addresses. It's just that there is no TLB entry being used for these mappings. It's easier to understand if you see that all memory accesses are going through the cache, TLB mapped or not. One way to do so is to increase the page size to 16kB. Right, this way we will have a unique mapping from the virtual address to the data cache, as the cache size (per way) is 8 KB here. Is it the correct fix in this situation? 16K is a silver bullet solution to all cache aliasing problems. So if your issue persists with 16K page size, it's not a cache aliasing issue. Aside there are generally performance gains from the bigger page size. Note that there is a variant of the 24K which has a VIPT cache but uses hardware to resolve cache aliases. That is, from a kernel cache management perspective it behaves like a PIPT cache. It seems it's not the case here. What I have here is: CPU revision is: 00019374 (MIPS 24Kc) SoC: Atheros AR7161 rev 2 Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes. Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes Yes; it would print PIPT if the cache was aliasing-free. However as I understand what you're mapping to userspace is actually device memory, right? Not exactly - I'm using PCI DMA to userspace SG buffers in RAM. The userspace first allocates the buffers in normal RAM (with vmalloc() or something, there is an mmap ioctl() for this), the address returned is 0x7xxx. Then the buffers (which consist of several pages each) are presented to the hw driver which obtains separate (kernel) mapping for each page (kseg0) and does dma_map_sg() and so on. The driver also simply writes to the buffers. This isn't a problem, though - only the incoherence between TLB and kseg0 is. Now that very much sounds like an aliasing issue! The problem is the userspace doesn't see the kernel writes - The 0x7xxx TLB-mapped pages are read-cached and not invalidated while the kernel writes to the same pages using kseg0 addresses. Thanks for looking at this. You're welcome. I'm just wondering the underlying issue might be a generic problem. Ralf -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
Please forgive me my MIPS TLB ignorance. It seems there is a TLB entry pointing to the userspace buffer at the time the kernel pointer (kseg0) is used. Is is an allowed situation on MIPS 24K? buffer: len 0x1000 (first page), userspace pointer 0x77327000, kernel pointer 0x867ac000 (physical address = 0x067ac000) TLB Index: 15 pgmask=4kb va=77326000 asid=be [pa=01149000 c=3 d=1 v=1 g=0] [pa=067ac000 c=3 d=1 v=1 g=0] Should the TLB entry be deleted before using the kernel pointer (which points at the same page)? -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
On Mon, Oct 07, 2013 at 10:38:49AM +0200, Krzysztof Hałasa wrote: Please forgive me my MIPS TLB ignorance. May the manual be with you :-) It seems there is a TLB entry pointing to the userspace buffer at the time the kernel pointer (kseg0) is used. Is is an allowed situation on MIPS 24K? buffer: len 0x1000 (first page), userspace pointer 0x77327000, kernel pointer 0x867ac000 (physical address = 0x067ac000) TLB Index: 15 pgmask=4kb va=77326000 asid=be [pa=01149000 c=3 d=1 v=1 g=0] [pa=067ac000 c=3 d=1 v=1 g=0] Should the TLB entry be deleted before using the kernel pointer (which points at the same page)? That's fine. You just need to ensure that there are no virtual aliases. One way to do so is to increase the page size to 16kB. Note that there is a variant of the 24K which has a VIPT cache but uses hardware to resolve cache aliases. That is, from a kernel cache management perspective it behaves like a PIPT cache. However as I understand what you're mapping to userspace is actually device memory, right? You probably want to map that uncached. That's a long standing issue in these two macros: /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access */ #define xlate_dev_mem_ptr(p)__va(p) /* * Convert a virtual cached pointer to an uncached pointer */ #define xlate_dev_kmem_ptr(p) p which are defined in arch/mips/include/asm/io.h. These should return a KSEG1 (uncached XKPHYS) address for anything but RAM. Would that explain your observations? Ralf -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
I'm debugging a problem with a SOLO6110-based H.264 PCI video encoder on Atheros AR7100-based (MIPS, big-endian) platform. BTW this CPU obviously has VIPT data cache, this means a physical page with multiple virtual addresses (e.g. mapped multiple times) may and will be cached multiple times. AR7100 = arch/mips/ath79. -- Krzysztof Halasa Research Institute for Automation and Measurements PIAP Al. Jerozolimskie 202, 02-486 Warsaw, Poland -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Suspected cache coherency problem on V4L2 and AR7100 CPU
Hi, I'm debugging a problem with a SOLO6110-based H.264 PCI video encoder on Atheros AR7100-based (MIPS, big-endian) platform. The problem manifests itself with stale data being returned by the driver (using ioctl VIDIOC_DQBUF). The stale date always starts and ends on 32-byte cache line boundary. The driver is drivers/staging/media/solo6x10. Initially I thought the encoder hardware is at fault (though it works on i686 and on (both endians) ARM). But I've eliminated actual DMA accesses from the driver and the problems still persists. The control flow is now basically the following: - userspace program initializes the adapter and allocates 192 KB long buffers (at least 2 of them): open(/dev/video1); various ioctl() calls for (cnt = 0; cnt buffer_count; cnt++) { struct v4l2_buffer buf = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, .memory = V4L2_MEMORY_MMAP, .index = cnt, }; ioctl(stream-fd, VIDIOC_QUERYBUF, buf); mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, stream-fd, buf.m.offset); } and then: for (cnt = 0; cnt buffer_count; cnt++) { struct v4l2_buffer buf = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE, .memory = V4L2_MEMORY_MMAP, .index = cnt, ioctl(stream-fd, VIDIOC_QBUF, buf); } The buffers are now queued. The driver (upon receiving an encoded frame) now mostly does: various spin_lock() etc. vb = list_first_entry(solo_enc-vidq_active, struct solo_vb2_buf, list); list_del(vb-list); struct vb2_dma_sg_desc *vbuf = vb2_dma_sg_plane_desc(vb, 0); /* now we have vbuf-sglist which corresponds to a single userspace 192-KB buffer */ vb2_set_plane_payload(vb, 0, 1024 /* bytes */); static u32 n = 0; /* a counter to mark each buffer */ /* the following is normally done using dma_map_sg() and DMA, and also with sg_copy_from_buffer() - eliminated for now */ /* I do the following instead */ struct page *page = sg_page(vbuf-sglist); u32 *addr = kmap_atomic(page); /* 4 times as large, I know, the buffer is much longer though */ for (i = 0; i 1024; i++) addr[i] = n; flush_dcache_page(page); /* and/or */ flush_kernel_dcache_page(page); kunmap_atomic(addr); vb-v4l2_buf.sequence = solo_enc-sequence++; vb-v4l2_buf.timestamp.tv_sec = vop_sec(vh); vb-v4l2_buf.timestamp.tv_usec = vop_usec(vh); vb2_buffer_done(vb, VB2_BUF_STATE_DONE); The userspace server now does ioctl(VIDIOC_DQBUF), sends it using UDP, and populates buffer pool again with ioctl(VIDIOC_QBUF). The driver uses directly-mapped cached (kernel) pointers to access the buffers (0x8000-0x9FFF kseg0 region) while (obviously) userspace uses TLB-mapped pointers. I have verified with a JTAG-based debugger (OpenOCD) that the buffers are flushed to DRAM (0xAxxx uncached directly-mapped region has valid data), however the userspace TLB-mapped buffers (which correspond to the same physical DRAM addresses) partially contain old cached data (from previous iterations). The question is which part of the code is at fault, and how do I fix it. I understand invalidating (and perhaps first flushing) userspace buffers (cache) should generally fix the problem. This could also be a simple bug rather than API/platform incompatibility because usually (though not always) only 1 of the buffers gets corrupted (the second one of two). It looks like this - valid buffer, counter n = 0x499, a fragment of actual UDP packet: 0x0030: 0499 0499 0499 0499 0x0040: 0499 0499 0499 0499 0x0050: 0499 0499 0499 0499 0x0060: 0499 0499 0499 0499 0x0070: 0499 0499 0499 0499 0x0080: 0499 0499 0499 0499 0x0090: 0499 0499 0499 0499 0x00a0: 0499 0499 0499 0499 0x00b0: 0499 0499 0499 0499 next buffer is corrupted, n = 0x49A: 0x0030: 049a 049a 049a 049a 0x0040: 049a 0468 0468 0468 ...h...h...h 0x0050: 0468 0468 0468 0468 ...h...h...h...h 0x0060: 0468 049a 049a 049a ...h 0x0070: 049a 049a 049a 049a 0x0080: 049a 049a 049a 049a 0x0090: