Re: On optimizing Theora

2009-02-22 Thread Tiago Marques
Hi,

Can you please also try both options with the following flags:
-ftree-vectorize -funroll-loops -m3dnow

Also, it may be a good idea to test both geode and i586 with -m3dnow and
-mno-3dnow, since the compiler may be causing problems while vectorizing.
Another option is to test i486 builds as well, following what I had
already found in this thread:

http://geode.insideo.net/info-linux_archives/msg00396.html

Let me underscore my colleague's statement.  Do not use the 586 target.
In testing we've found that the 586 optimized version can be up to 3x
slower vs. the 386/486 versions on the Geode LX.


This is probably because the Geode LX is not a superscalar processor (while
the i586 is), which may cause problems even with -march=i586.
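
To make the suggested test matrix concrete, here is a minimal sketch that enumerates one CFLAGS string per combination of target and 3DNow! switch. The function and constant names are made up for illustration, and applying the two extra flags to every build is an assumption, not something decided in this thread.

```python
from itertools import product

# Targets and 3DNow! switches named in the message above; the extra flags
# are the ones suggested for testing. Pairing them all up is an assumption.
MARCH_TARGETS = ["geode", "i586", "i486"]
SIMD_SWITCHES = ["-m3dnow", "-mno-3dnow"]
EXTRA_FLAGS = ["-ftree-vectorize", "-funroll-loops"]


def cflags_matrix(targets=MARCH_TARGETS, switches=SIMD_SWITCHES, extra=EXTRA_FLAGS):
    """Return one CFLAGS string per (target, 3DNow! switch) combination."""
    return [
        " ".join([f"-march={march}", *extra, switch])
        for march, switch in product(targets, switches)
    ]


if __name__ == "__main__":
    # Print each combination in a form that can be pasted into a build command.
    for flags in cflags_matrix():
        print(f'CFLAGS="{flags}"')
```

Each printed line could be passed to a separate libtheora build, and the same encode and decode timings repeated for each.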

Best regards,
  Tiago Marques



On Fri, Feb 20, 2009 at 2:23 PM, Benjamin M. Schwartz 
bmsch...@fas.harvard.edu wrote:


 Tomeu Vizoso wrote:
  On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
  On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
  GCC 4.3 evidently does not do a very good job of optimizing for geode.
  What percentage of CPU time was spent in libtheora?

 100%.  The encoder was operating in a continuous loop.

  Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
  were involved during your tests, you may have seen little of theora
  itself.

 Neither X nor jffs2 was involved.  The input file (y4m or ogv) was cached
 in memory, and the output stream (ogv or y4m) was being sent directly to
 /dev/null, and not displayed.

 The only action being taken in X was to display, in the Terminal activity,
 a text-only progress bar, rendered by the encoder_example, or dump_video
 command.  These commands are part of libtheora, and were recompiled with
 it, so the point remains.

 --Ben

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: On optimizing Theora

2009-02-22 Thread Benjamin M. Schwartz

Tiago Marques wrote:
 Can you please try both options with also the following
 ones: -ftree-vectorize -funroll-loops -m3dnow

(1) libtheora automatically adds the flags -O3 -fforce-addr
-fomit-frame-pointer -finline-functions -funroll-loops to any specified
CFLAGS.

(2) libtheora's inner loops are largely hand-optimized MMX assembly, so
vectorization and 3dnow are unlikely to have a significant impact.

(3) I am not particularly interested in trolling through every combination
of relevant gcc flags in search of performance benefit.  That's the
compiler's (and compiler writers') job.  My point, instead, was that gcc
(at least the version in build 767) does not have a good code generator for
Geode, and therefore we should not expect any performance increase from
rebuilding everything with -march=geode.

If you are interested in searching for the perfect compiler flags, perhaps
you would like to try Acovea (http://www.coyotegulch.com/products/acovea/).

--Ben


Re: On optimizing Theora

2009-02-20 Thread Tomeu Vizoso
On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
 On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.

 What percentage of CPU time was spent in libtheora?

Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
were involved during your tests, you may have seen little of theora
itself.

Regards,

Tomeu


Re: On optimizing Theora

2009-02-20 Thread Benjamin M. Schwartz

Tomeu Vizoso wrote:
 On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
 On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.
 What percentage of CPU time was spent in libtheora?

100%.  The encoder was operating in a continuous loop.

 Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
 were involved during your tests, you may have seen little of theora
 itself.

Neither X nor jffs2 was involved.  The input file (y4m or ogv) was cached
in memory, and the output stream (ogv or y4m) was being sent directly to
/dev/null, and not displayed.

The only action being taken in X was to display, in the Terminal activity,
a text-only progress bar, rendered by the encoder_example, or dump_video
command.  These commands are part of libtheora, and were recompiled with
it, so the point remains.

--Ben


On optimizing Theora

2009-02-19 Thread Benjamin M. Schwartz

I have been testing libtheora-1.0 on a MP XO.  On build 767, using F9's
gcc-4.3, I compiled libtheora with CFLAGS=-march=geode.  I tested
encode, with the command

time encoder_example -v 1 coastguard_cif.y4m > /dev/null
using the test video from
http://media.xiph.org/video/derf/y4m/coastguard_qcif.y4m.  This test ran
in 44.15 +/- 0.15 seconds (all times are user time).

I then tested decode, with the command
time dump_video coastguard_cif1.ogv > /dev/null
using the ogg video that would be produced by the encoder above were it
not redirected to /dev/null.  This test ran in 4.60 +/- 0.05 seconds.
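
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the procedure: run the command several times with stdout discarded, record the user CPU time of each run, and report a mean and spread. The helper names are hypothetical, and using the sample standard deviation for the "+/-" figure is an assumption; the message does not say how its error bars were obtained.

```python
import os
import statistics
import subprocess


def child_user_seconds(cmd):
    """Run cmd with stdout discarded and return the user CPU time it consumed."""
    before = os.times().children_user
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return os.times().children_user - before


def summarize(samples):
    """Return (mean, sample standard deviation) of the timings."""
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


def benchmark(cmd, runs=5):
    """Time cmd several times; one untimed warm-up run primes the page cache."""
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return summarize([child_user_seconds(cmd) for _ in range(runs)])
```

A call such as `benchmark(["encoder_example", "-v", "1", "coastguard_cif.y4m"])` would then return the mean and spread in seconds. The warm-up run is there so that the input file is read from memory rather than from flash during the timed runs.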

I then repeated these tests after recompiling with -march=i586
-mtune=generic, which I assume are approximately the CFLAGS used by
Fedora.  The resultant times were 41.6 +/- 0.1 and 4.45 +/- 0.05.

In conclusion, compiling libtheora with -march=geode causes it to run
significantly (20 sigma, 7%) slower than -march=i586 -mtune=generic for
encoding, and possibly slightly slower for decoding as well.  GCC 4.3
evidently does not do a very good job of optimizing for geode.
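
The significance and percentage figures can be recomputed from the reported times. This is a sketch under one stated assumption: the two uncertainties are independent and are combined in quadrature. The message does not say which convention was used, and `compare` is a hypothetical helper.

```python
import math


def compare(slow, slow_err, fast, fast_err):
    """Return (difference, sigma, percent slower) for two timed results.

    Assumes the two uncertainties are independent, so they add in quadrature.
    """
    diff = slow - fast
    combined_err = math.hypot(slow_err, fast_err)
    return diff, diff / combined_err, 100.0 * diff / fast


if __name__ == "__main__":
    # Encode times reported above: -march=geode vs -march=i586 -mtune=generic.
    diff, sigma, pct = compare(44.15, 0.15, 41.6, 0.1)
    print(f"encode: {diff:.2f} s slower, {sigma:.1f} sigma, {pct:.1f}%")
    # Decode times reported above.
    diff, sigma, pct = compare(4.60, 0.05, 4.45, 0.05)
    print(f"decode: {diff:.2f} s slower, {sigma:.1f} sigma, {pct:.1f}%")
```

Under the quadrature assumption the encode difference works out to roughly 14 sigma and 6%. Other ways of combining the error bars give different numbers, but in every case the encode difference is far outside the measurement noise.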

--Ben


Re: On optimizing Theora

2009-02-19 Thread quozl
On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.

What percentage of CPU time was spent in libtheora?

-- 
James Cameron    mailto:qu...@us.netrek.org    http://quozl.netrek.org/