Re: High CPU usage when video streaming on EHCI, 3.17.8 kernel with DMA enabled

2015-07-02 Thread Tony Lindgren
* Tom Mises tommi...@gmail.com [150701 11:24]:
 On Mon, May 18, 2015 at 8:59 AM, Tony Lindgren t...@atomide.com wrote:
  For the memcpy part with DMA this patch should help too:
 
  [PATCH] dmaengine: omap-dma: Add support for memcpy
 
 The patch should help, but not out of the box as far as I understand.
 My understanding is that to take advantage of this, it would be
 necessary to explicitly call the DMA API either by replacing memcpy()
 in uvc_video_decode_data() or by modyfying the memcpy()
 implementation. Is that correct?

Yes you should be able to configure the DMA to do memory-to-memory
copy where both source and destination address is looping over
the desired buffer size. Ideally you would move the data from camera
directly to the buffer without doing a memcpy inbetween.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: High CPU usage when video streaming on EHCI, 3.17.8 kernel with DMA enabled

2015-07-01 Thread Tom Mises
I has been some time, but I have an update to share.

On Sun, May 17, 2015 at 8:06 AM, Alan Stern st...@rowland.harvard.edu wrote:
 You might be able to learn more from ftrace.  See the instructions in
 Documentation/trace/ftrace.txt.  The irqsoff tracer may be the best
 one to try.

I used ftrace with the irqoff tracer as recommended. irqoff seems to
have some known issues on ARM and it traces the idle process, however,
I was able to get some meaningful output when running this tracer for
a very short time. I ran the function_graph tracer as well, which is
quite informative, but it also adds a lot of overhead.

On Mon, May 18, 2015 at 2:27 AM, Matthieu CASTET
matthieu.cas...@parrot.com wrote:
 last time I checked uvc driver, it was doing memcpy of video buffer : see 
 uvc_video_decode_data.

I found out Matthieu was on the right track and most of the time is
indeed spent in uvc_video_decode_data(), which uses memcpy() to copy
the URBs to the video buffers. For 1280x960px 15fps single 8-bit
channel,  uvc_video_decode_data() seems to consume between 40% and
50%. What is still a mystery to me is why I see relatively high CPU
usage even for a very tiny resolution. I was able to confirm that it
is not uvc_video_decode_data() in that case, however, given
function_graph overhead and irqoff being partially broken I was unable
to figure out much more.

I attached some tracer outputs at the end.

On Mon, May 18, 2015 at 8:59 AM, Tony Lindgren t...@atomide.com wrote:
 For the memcpy part with DMA this patch should help too:

 [PATCH] dmaengine: omap-dma: Add support for memcpy

The patch should help, but not out of the box as far as I understand.
My understanding is that to take advantage of this, it would be
necessary to explicitly call the DMA API either by replacing memcpy()
in uvc_video_decode_data() or by modyfying the memcpy()
implementation. Is that correct?


***
/sys/kernel/debug/tracing# echo 0  tracing_max_latency  echo 1 
tracing_on  usleep 10  echo 0  tracing_on
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.17.8-custom
# 
# latency: 579 us, #26/26, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0 #P:1)
#-
#| task: sshd-2980 (uid:0 nice:0 policy:0 rt_prio:0)
#-
#  = started at: __usb_hcd_giveback_urb
#  = ended at:   __usb_hcd_giveback_urb
#
#
#  _--= CPU#
# / _-= irqs-off
#| / _= need-resched
#|| / _---= hardirq/softirq
#||| / _--= preempt-depth
# / delay
#  cmd pid   | time  |   caller
# \   /  |  \|   /
sshd-29800d.s.0us : trace_hardirqs_on -tasklet_action
sshd-29800d.s.0us : time_hardirqs_on -tasklet_action
sshd-29800d.s.0us : trace_hardirqs_off -_raw_spin_lock_irq
sshd-29800d.s.0us : trace_hardirqs_on -_raw_spin_unlock_irq
sshd-29800d.s.0us : time_hardirqs_on -_raw_spin_unlock_irq
sshd-29800d.s.0us+: trace_hardirqs_off -__usb_hcd_giveback_urb
sshd-29800d.s.   31us : uvc_video_complete -__usb_hcd_giveback_urb
sshd-29800d.s.   31us : uvc_video_decode_bulk -uvc_video_complete
sshd-29800d.s.   31us : uvc_video_decode_start -uvc_video_decode_bulk
sshd-29800d.s.   31us : ktime_get_ts64 -uvc_video_decode_start
sshd-29800d.s.   31us : clocksource_mmio_readl_up -ktime_get_ts64
sshd-29800d.s.   31us!: uvc_video_decode_data.isra.17
-uvc_video_decode_bulk
sshd-29800d.s.  519us : usb_submit_urb -uvc_video_complete
sshd-29800d.s.  519us : usb_hcd_submit_urb -usb_submit_urb
sshd-29800d.s.  519us : usb_get_urb -usb_hcd_submit_urb
sshd-29800d.s.  519us : usb_hcd_map_urb_for_dma -usb_hcd_submit_urb
sshd-29800d.s.  519us+: ehci_urb_enqueue -usb_hcd_submit_urb
sshd-29800d.s.  549us : qh_urb_transaction -ehci_urb_enqueue
sshd-29800d.s.  549us : ehci_qtd_alloc.isra.71 -qh_urb_transaction
sshd-29800d.s.  549us : dma_pool_alloc -ehci_qtd_alloc.isra.71
sshd-29800d.s.  549us : qtd_fill.isra.54 -qh_urb_transaction
sshd-29800d.s.  549us : usb_hcd_link_urb_to_ep -ehci_urb_enqueue
sshd-29800d.s.  549us+: qh_append_tds -ehci_urb_enqueue
sshd-29800d.s.  580us : trace_hardirqs_on -__usb_hcd_giveback_urb
sshd-29800d.s.  580us+: time_hardirqs_on -__usb_hcd_giveback_urb
sshd-29800d.s.  671us : stack trace
 = trace_hardirqs_on
 = __usb_hcd_giveback_urb
 = usb_giveback_urb_bh
 = tasklet_action
 = __do_softirq
 = irq_exit
 = handle_IRQ
 = omap3_intc_handle_irq
 = __irq_usr
 =
***


***
# tracer: function_graph
#
# TIME  DURATIONFUNCTION CALLS
#  | |   |  |   |   |   |
 5853.350952   0.000 us| 

Re: High CPU usage when video streaming on EHCI, 3.17.8 kernel with DMA enabled

2015-05-21 Thread Ash Charles
On Mon, May 18, 2015 at 7:59 AM, Tony Lindgren t...@atomide.com wrote:
 [PATCH] dmaengine: omap-dma: Add support for memcpy

Hi Tom,

To follow up on your corresponding mail to the gumstix list [1], I
tried on the 3.17.8 kernel on an Overo COM (DM37390) with the
suggested dmaengine patch applied.  Using hdparm -t --DIRECT
/dev/sda with a USB hard drive attached via the USB EHCI port, I see
a read speed of ~24MB/s (vs ~33MB/s on my laptop).  I see the same
behaviour on a recent kernel (from tony/omap-for-v4.2/cleanup,
otherwise works nicely on an Overo FWIW) with and without the
dmaengine patch.

What would be the expected USB read speed for an OMAP3 platform?

--Ash
[1] http://gumstix.8.x6.nabble.com/Low-USB-speed-on-Overo-td4969907.html
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html