RE: [maemo-developers] Xvideo support for Nokia 770?

2007-02-08 Thread Simon Pickering
 
 2. Reliable information that is detailed enough for performing graphics
 and audio output from the DSP, see
 http://maemo.org/pipermail/maemo-developers/2007-February/007949.html

 Again throwing some options in the air:
 1. Dump the DSP. If I remember correctly, the driver for the audio chip
 is in the omap tree. So one option (probably never tried) is to remove
 all DSP-related components from the image and then have everything run
 purely on ARM [maybe worth a shot]
 2. Use ALSA DSP

It seems a shame to drop use of the DSP. It may be generally hard to program
well, but even off-loading some parts (of otherwise ARM code) to the DSP will
free up the ARM CPU to do more.

Add to this the interest involved in hacking/writing code for the DSP and this
is something I certainly want to pursue.

As a final point, if the hardware is there, don't you find it frustrating not
being able to use it? DSP, IVA, 2D/3D acceleration - all just sat there waiting
to be exploited (if we can find out how to do so)!

It would just be nice if the DSP learning curve could be made a little less
steep (and less opaque). With that said I'll keep fiddling with the DSP tools in
the hope I find the right combination.

Cheers,


Simon

___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] Xvideo support for Nokia 770?

2007-02-07 Thread Siarhei Siamashka
Hello, 

It would be probably a good idea to discuss different possibilities for
improving multimedia support on 770/N800.

Now we have a fast JIT scaler that runs on the ARM core; it solves all the
video-resolution-related performance problems. I'm going to work on
improving its quality and performance and on its inclusion into the upstream
ffmpeg library; this task is next in my plans:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-January/051209.html

As for the ways of improving multimedia support on Nokia 770, it may be 
done in the following ways (in no particular order):
1. Continue ffmpeg optimizations (motion compensation functions, fine-tune
the IDCT, look at the possibilities to optimize codecs other than mpeg4 and
its variants)
2. Implement Xvideo extension support for Nokia 770 (using scaling done on 
ARM core)
3. Implement XvMC in some way (using C55x DSP for it as it is supposedly good
for IDCT and motion compensation stuff)
4. Improve GStreamer plugins (replacements for dspfbsink and dspmpeg4sink
running on the ARM core; this could probably improve mpeg4 playback
performance a lot and allow using higher video bitrates and resolutions than
are currently available in MPlayer)
5. Try to offload color format conversion and scaling to the DSP. If it works
as expected, video scaling can be done with almost zero overhead for the ARM
core. Theoretically the same trick could probably also work for GStreamer,
if the video output sink can provide its own buffer (::buffer_alloc). The first
step would be to try just doing nonscaled color format conversion. If that is
successful, some more advanced stuff can be tried, such as JIT dynamic
code generation on the C55x.
6. Try porting vorbis decoder (tremor) to DSP
7. Try porting libmpeg2 to DSP. With audio decoding and scaling done on the
ARM core, it might improve overall mpeg2 playback performance. I wonder
whether unconverted DVD video playback is even theoretically possible on the
Nokia 770.

That's quite a big list, and it contains some things that might be generally
nice to have but have relatively low practical value and are actually not
worth the effort of implementing :)

There are two issues that need to be solved for this all to become reality:
1. We need some way of applying community-developed upgrades to core
system components such as xserver and xlib (if we go after Xvideo support on
the Nokia 770). They must be easy for end users to install, otherwise all this
development does not make much sense. It would also be nice to integrate
these improvements into the official firmware later, but I wonder if Nokia has
spare resources for doing this integration and its quality assurance.
2. Reliable information that is detailed enough for performing graphics and
audio output from DSP, see 
http://maemo.org/pipermail/maemo-developers/2007-February/007949.html


Re: [maemo-developers] Xvideo support for Nokia 770?

2007-02-07 Thread Frantisek Dufka

Siarhei Siamashka wrote:


6. Try porting vorbis decoder (tremor) to DSP


Thanks to Johannes Sandvall and Erik Montnémery this was already done
http://fanoush.wz.cz/maemo/sandvall-thesis.pdf
http://fanoush.wz.cz/maemo/sandvall-tremor.patch

We just need a way to output audio from a DSP task, and to compile this thing.

Frantisek


Re: [maemo-developers] Xvideo support for Nokia 770?

2007-01-17 Thread Siarhei Siamashka
On Wednesday 10 January 2007 01:51, Charles 'Buck' Krasic wrote:

 Siarhei Siamashka wrote:
  Actually I have been thinking about trying to implement Xvideo
  support on 770 for some time already. Now as N800 has Xvideo
  support, it would be nice to have it on 770 as well for better
  consistency and software compatibility.

 As you may recall, I was considering this back in August/September.
 I tried a few things, and reported some of my findings to this list.
 The code for all that is still available here:
 http://qstream.org/~krasic/770/dsp/

Yes, sure I remember. Thanks for doing these experiments and making 
the results available. It really helps to have more information around.

  I see the following possible options:
 
  1. Implement it just using ARM core and optimize it as much as
  possible (using dynamically generated code for scaling to get the
  best performance). It is quite a straightforward solution and only
  needs time to implement it.

 It is my impression that this might be the most attractive option.
 I noticed that TCPMP which seems to be the most performant player for
 the ARM uses this approach, and it is available under GPL, so it may
 be possible to adapt some of its code.

 In the long run, I would hope that integrating TCPMP scaling code into
 libswscale of the ffmpeg project might be the most elegant approach,
 since that seems to be the most performant/featureful/widely adopted
 open-source scaling code (but not yet on ARM).   For mplayer, it works
 out of the box, since libswscale actually originated from mplayer, and
 only recently migrated to ffmpeg.

I see, thanks for the information (I checked TCPMP sources some time ago, 
but was interested in runtime cpu capabilities detection code and did not look
at the scaler that time). Using TCPMP code may be an interesting option. But I
also still may try to make my own scaler implementation for two reasons:
1. TCPMP is covered by the GPL license, and most parts of ffmpeg are LGPL, so
it probably makes sense to make a clean-room implementation of a JIT-powered
scaler for ARM under the LGPL license.
2. I'm worried about performance. Knowing how the cache and write buffer
work on the arm926 core, it is possible to tune the generated code for it and
get the best performance possible. So the results can be better than TCPMP's.

I have just committed some initial assembly optimizations for an unscaled
yuv420p -> yuyv422 color format converter to maemo mplayer SVN. It already
provides some performance improvement; for example, on my test video file
(640x480 resolution, 24 fps) I now get the following results:

BENCHMARKs: VC: 114.526s VO:  21.055s A:   0.000s Sys:   1.582s =  137.163s
BENCHMARK%: VC: 83.4962% VO: 15.3503% A:  0.0000% Sys:  1.1535% = 100.0000%

We can compare it with the older results (decoding time was also 
improved a bit since that time because of recent assembly optimizations 
for dequantizer):
http://maemo.org/pipermail/maemo-developers/2006-December/006646.html

BENCHMARKs: VC: 121.282s VO:  31.538s A:   0.000s Sys:   1.577s =  154.397s
BENCHMARK%: VC: 78.5517% VO: 20.4267% A:  0.0000% Sys:  1.0216% = 100.0000%

Most of the speed improvement in color conversion and video output (the VO:
part) comes just from loop unrolling and from avoiding some extra instructions
that gcc emits when compiling C code, but using the STM instruction to store 16
bytes at once at an aligned location [1] provides at least a 10% gain here.
If we estimate memory copy speed here with the additional colorspace conversion
applied, it is about 70MB/s now for 640x480 24 fps video (though we need to
read a bit less data than we write here, so it is a bit different from memcpy).
And I have observed peak memcpy performance of about 110MB/s on the Nokia 770.
So this color converter is quite close to the memory bandwidth limit now. The
code can be optimized further by processing two image lines at once, so we can
get rid of some data read instructions and improve performance. Also,
experimenting with prefetch reads may provide some improvement.
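
For reference, the conversion being optimized can be sketched in plain C. This
is an illustrative, unoptimized version (the actual code in maemo mplayer SVN
is hand-written ARM assembly with different stride/alignment handling):

```c
#include <stdint.h>
#include <stddef.h>

/* Reference planar YUV420 -> packed YUYV422 conversion.  Each pair of
 * horizontal luma pixels shares one U and one V sample, and each chroma
 * row is reused for two luma rows (2x2 chroma subsampling in the source,
 * 2x1 in the destination). */
static void yuv420p_to_yuyv422(const uint8_t *y, const uint8_t *u,
                               const uint8_t *v, uint8_t *dst,
                               int width, int height)
{
    for (int row = 0; row < height; row++) {
        const uint8_t *yp = y + (size_t)row * width;
        const uint8_t *up = u + (size_t)(row / 2) * (width / 2);
        const uint8_t *vp = v + (size_t)(row / 2) * (width / 2);
        for (int col = 0; col < width; col += 2) {
            *dst++ = *yp++;        /* Y0 */
            *dst++ = up[col / 2];  /* U  */
            *dst++ = *yp++;        /* Y1 */
            *dst++ = vp[col / 2];  /* V  */
        }
    }
}
```

Note that for every 4 output bytes only 3 input bytes are read, which matches
the observation above that the converter reads a bit less data than it writes.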

JIT generated code should have slightly worse performance, but not much. If we
decide to do 'nearest neighbour' scaling, the result should probably be as
fast as this nonscaled conversion. But I want to try a simplified variation
of bilinear scaling: each pixel in the destination buffer is either a copy of
some pixel in the source buffer or the average value of two pixels. This way it
should introduce at most two extra instructions for each output byte:
an addition of two pixel color components and a right shift.
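
The copy-or-average idea can be sketched in C as a table-driven interpreter
(a JIT would instead bake the table into straight-line generated code; all
names here are illustrative, and the table-building heuristic is an
assumption):

```c
#include <stdint.h>

/* One destination pixel is either a copy of src[idx[x]] or the average of
 * src[idx[x]] and src[idx[x]+1]; idx/avg are precomputed per output column. */
static void scale_line(const uint8_t *src, uint8_t *dst, int dst_w,
                       const int *idx, const uint8_t *avg)
{
    for (int x = 0; x < dst_w; x++) {
        uint8_t p = src[idx[x]];
        if (avg[x])  /* the two extra instructions: add + right shift */
            p = (uint8_t)((p + src[idx[x] + 1]) >> 1);
        dst[x] = p;
    }
}

/* Build the column table for src_w -> dst_w horizontal scaling (dst_w > 1):
 * work in half-pixel positions, averaging when the exact source position
 * falls between two pixels.  Endpoints map to endpoints. */
static void build_table(int src_w, int dst_w, int *idx, uint8_t *avg)
{
    for (int x = 0; x < dst_w; x++) {
        int pos2 = x * (src_w - 1) * 2 / (dst_w - 1);
        idx[x] = pos2 / 2;
        avg[x] = (uint8_t)(pos2 & 1);
    }
}
```

For example, scaling a 2-pixel line {0, 100} up to 3 pixels copies the two
endpoints and averages the middle one to 50.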

  2. Try using dsp tasks that already exist on the device and are
  used for dspfbsink. But the sources of gst plugins contain code
  that limits video resolution for dspfbsink. I wonder if this check
  was introduced artificially or it is the limitation of DSP scaler
  and it can't handle anything larger than that. Also I wonder if
  existing video scaler DSP task can support direct rendering [2].

 I tried direct rendering in the 

RE: [maemo-developers] Xvideo support for Nokia 770?

2007-01-11 Thread Simon Pickering
 
 As you may recall, I was considering this back in August/September.  
 I tried a few things, and reported some of my findings to this list.  
 The code for all that is still available here:  
 http://qstream.org/~krasic/770/dsp/

snip

  2. Try using dsp tasks that already exist on the device and 
 are used 
  for dspfbsink. But the sources of gst plugins contain code 
 that limits 
  video resolution for dspfbsink. I wonder if this check was 
 introduced 
  artificially or it is the limitation of DSP scaler and it 
 can't handle 
  anything larger than that. Also I wonder if existing video 
 scaler DSP 
  task can support direct rendering [2].
 
 I tried direct rendering in the above mentioned 
 experimentation.  I never got it to work exactly correctly, 
 i.e. I could get image fragments on the screen, but they 
 were not the whole image, and never
 in exactly the correct screen position.   I suspected this was tied to
 the baroque memory addressing constraints of the DSP (e.g. 16bit data
 item limitations).   I tried very hard to work around them but was not
 successful.

Was this the demo_fb task, or something different? I see that demo_console
has been compiled (in dspgw-3.3-dsp/apps/demo_mod), but I can't see demo_fb
having been compiled in situ (dspgw-3.3-dsp/apps/demo). If it was something
different, could you point me to the code please?

I ask as I'm trying to get the demo_fb code to work. Demo_console works fine
and outputs the message to the screen, but demo_fb complains with the
following message: 

# ./demo_fb fbadr=30
open: Device or resource busy

Anyone have any ideas why this might be? I assume this is caused by the
open() call in the arm-side demo_fb app (see dspgw-3.3-arm/apps/demo):

fd = open("/dev/dsptask/demo_fb", O_RDWR);

I'm just not sure what would cause the busy message when the demo_console
runs fine before and after I try demo_fb.

I altered the demo_fb.c code slightly to add an #if defined() block for
the Nokia 770, which I hope should set the screen dimensions correctly. I
must add that I've not tried it without this modification, but will do so
this evening to check.
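
The kind of change described might look like the following sketch. The macro
name NOKIA_770 and the fallback dimensions are assumptions; the 770 screen
itself is 800x480:

```c
/* Hypothetical compile-time selection of framebuffer dimensions.
 * NOKIA_770 would normally come from the build system; it is defined
 * here only so the sketch is self-contained. */
#define NOKIA_770 1

#if defined(NOKIA_770)
#define FB_WIDTH  800   /* Nokia 770 screen is 800x480 */
#define FB_HEIGHT 480
#else
#define FB_WIDTH  640   /* fallback dimensions for other targets */
#define FB_HEIGHT 480
#endif
```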

I also pulled the framebuffer address out of /lib/dsp/avs_kernelcfg.cmd on
the 770. Is this the address I should use?

  3. Try implementing a new DSP based scaler from scratch. The most 
  important thing to know is how to access framebuffer 
 directly from DSP 
  and move data to it from mapped buffer without any overhead.
  The first test implementation can just perform nonscaled planar
  YV12 - packed YUV422 conversion, if it proves to be fast 
 and useful, 
  it could be extended to also support scaling.
 
 This is what I did in August. I did YUV -> YUV scaling plus RGB
 conversion on the DSP. I think I did YUV -> YUV scaling later. The
 results (performance) were abysmal. Maybe I committed some mortal
 DSP programming sins that dragged the performance down, but it was so
 slow I gave up even hoping. I think my DSP code was maxed out on the
 DSP at like 20 fps, where the ARM was able to do 24fps with 
 about 10-20% cpu.
 
 Anyway, my code is still there which may be a start if you want to
 attempt it.   However, I think your first option is probably the most
 fruitful option. My little project made me very cynical of the
 value of the DSP.  ;-)

Again, could you give me a pointer to the directory under which to find this
code?

Thanks,


Simon




Re: [maemo-developers] Xvideo support for Nokia 770?

2007-01-09 Thread Charles 'Buck' Krasic

Siarhei Siamashka wrote:

 On Tuesday 09 January 2007 20:59, Charles 'Buck' Krasic wrote:

 Any chance the Xvideo support in the Bora 3.0 will turn up in a
 770 OS?


 I asked the same question on #maemo irc channel and daniels
 explained that video scaling is done by gpu on N800, so probably
 the same code can't be reused on 770:
 https://mg.pov.lt/maemo-irclog/%23maemo.2007-01-08.log.html

 Actually I have been thinking about trying to implement Xvideo
 support on 770 for some time already. Now as N800 has Xvideo
 support, it would be nice to have it on 770 as well for better
 consistency and software compatibility.

As you may recall, I was considering this back in August/September.  
I tried a few things, and reported some of my findings to this list.  
The code for all that is still available here:  
http://qstream.org/~krasic/770/dsp/


 I see the following possible options:

 1. Implement it just using ARM core and optimize it as much as
 possible (using dynamically generated code for scaling to get the
 best performance). It is quite a straightforward solution and only
 needs time to implement it.

It is my impression that this might be the most attractive option.   
I noticed that TCPMP which seems to be the most performant player for
the ARM uses this approach, and it is available under GPL, so it may
be possible to adapt some of its code.

In the long run, I would hope that integrating TCPMP scaling code into
libswscale of the ffmpeg project might be the most elegant approach,
since that seems to be the most performant/featureful/widely adopted
open-source scaling code (but not yet on ARM).   For mplayer, it works
out of the box, since libswscale actually originated from mplayer, and
only recently migrated to ffmpeg.


 2. Try using dsp tasks that already exist on the device and are
 used for dspfbsink. But the sources of gst plugins contain code
 that limits video resolution for dspfbsink. I wonder if this check
 was introduced artificially or it is the limitation of DSP scaler
 and it can't handle anything larger than that. Also I wonder if
 existing video scaler DSP task can support direct rendering [2].

I tried direct rendering in the above mentioned experimentation.  I
never got it to work exactly correctly, i.e. I could get image
fragments on the screen, but they were not the whole image, and never
in exactly the correct screen position.   I suspected this was tied to
the baroque memory addressing constraints of the DSP (e.g. 16bit data
item limitations).   I tried very hard to work around them but was not
successful.
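
The addressing constraint can be illustrated on a byte-addressable host: on
the C55x the smallest addressable unit is 16 bits, so byte-oriented pixel data
arriving from the ARM side is packed two bytes per word, and any byte-level
framebuffer layout has to be emulated with shifts and masks. A sketch (the
byte order within each word is an assumption):

```c
#include <stdint.h>

/* Unpack 8-bit pixels that were stored two-per-16-bit-word, low byte
 * first.  On the C55x itself there is no uint8_t addressing, so the
 * equivalent code works entirely in 16-bit words with shifts/masks. */
static void unpack_words(const uint16_t *src, uint8_t *dst, int nwords)
{
    for (int i = 0; i < nwords; i++) {
        dst[2 * i]     = (uint8_t)(src[i] & 0xff);  /* low byte  */
        dst[2 * i + 1] = (uint8_t)(src[i] >> 8);    /* high byte */
    }
}
```

Having to shuffle every pixel through this kind of repacking is one plausible
reason a byte-oriented framebuffer blit from the DSP never lined up cleanly.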

I think the benefits of direct rendering may be a false temptation on
the DSP anyway.My impression was that the DSP access to
framebuffer memory slowed down the scaling algorithm tremendously, so
it was actually faster to scale into DSP local memory, and then do a
fast bulk copy to the FB, or to SDRAM on the ARM side.Plus you
have all the AV synchronization headaches.

I think these gains pale compared to the gain from just using the fb
in YUV mode, and doing all the video stuff on the ARM side.
Hence, option 1 sounds very attractive.

 It would need to support arbitrary number of memory mapped buffers
 for video output in order to avoid unnecessary memcpy, otherwise
 performance will suffer.

 Maybe we can ask Nokia developers to provide some information about
 the internals of these plugins. The most important questions are:
 * What are the real capabilities of the DSP based scaler? Can it be used
 for resolutions, let's say, up to 800x480?

I doubt 800x480.   The added quality benefit over 400x240 with pixel
doubling in the fb is probably way too marginal to justify the
effort.   The DSP hardware doesn't seem to have any meaningful support
for general scaling (beyond doubling).
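
The memory-mapped buffers mentioned in the quote above are the usual way to
avoid the extra memcpy: the sink exposes its buffer as a mapping that the
decoder writes into directly. A minimal sketch, using an ordinary temporary
file in place of a real device node (the path and size are illustrative):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a (device or file) buffer so frames can be written in place
 * instead of being memcpy()ed through an intermediate buffer. */
static void *map_video_buffer(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after close */
    return buf == MAP_FAILED ? NULL : buf;
}

/* Self-contained round trip: create a temp file standing in for the
 * device buffer, map it, and write one byte "in place".  Returns 0 on
 * success, -1 on any failure. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/vbufXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) < 0) { close(fd); unlink(path); return -1; }
    close(fd);
    uint8_t *buf = map_video_buffer(path, 4096);
    if (!buf) { unlink(path); return -1; }
    buf[0] = 0x42;                 /* decoder writes a frame byte in place */
    int ok = (buf[0] == 0x42);
    munmap(buf, 4096);
    unlink(path);
    return ok ? 0 : -1;
}
```

Supporting an arbitrary number of such mappings is what lets the decoder
rotate through output buffers without ever copying a frame on the ARM side.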

 * Where is the screen update performed after dsp has finished
 scaling/converting video from mapped buffer to framebuffer? Is it
 done on ARM side, or probably screen update can be also triggered
 from DSP directly?

I seem to have the rough impression from inspecting X code that ARM
side does the final update (copy) to fb memory.  I'm not 100% sure on
that right now though.

 * Is it possible to get direct rendering [2] support with existing
 dsp tasks on 770? If not, would it be too hard to implement this
 feature?
 * How are timestamps handled in the dsp? Is it possible to just send a
 one-shot signal to the dsp task for rendering a video frame from a
 mapped buffer as fast as possible?

 A brief dsp interface description would be welcome. Maybe some
 questions may be trivial, but unfortunately I did not have much
 time for a detailed walk through the sources in order to figure out
 how this all works. If any Nokia developer finds time for some
 short answers, it would really help a lot.

Agreed.


 3. Try implementing a new DSP based scaler from scratch. The most
 important thing to know is how to access framebuffer directly from
 DSP and move data