Re: On optimizing Theora

2009-02-22 Thread Tiago Marques
Hi,

Can you please try both options together with the following
flags: -ftree-vectorize -funroll-loops -m3dnow

Also, it may be a good idea to test both geode and i586 with -m3dnow and
-mno-3dnow, since the compiler may be causing problems while vectorizing.
Another option is to also test i486 builds, following what I had
already found in this thread:

http://geode.insideo.net/info-linux_archives/msg00396.html

Let me underscore my colleague's statement.  Do not use the 586 target.
In testing we've found that the 586 optimized version can be up to 3x
slower vs. the 386/486 versions on the Geode LX.


This is probably because the Geode LX is not a superscalar processor (while
the i586 is), which may cause problems even with -march=i586.
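
As a rough sketch, the test matrix suggested above could be enumerated like
this. Only the flags come from this message; the helper name cflags_matrix
and the idea of feeding each string to a rebuild as CFLAGS are assumptions
made for illustration.

```python
from itertools import product

# -march targets and 3DNow! switches named in this message.
# GCC spells the negative form -mno-3dnow.
MARCH_TARGETS = ["geode", "i586", "i486"]
THREEDNOW_SWITCHES = ["-m3dnow", "-mno-3dnow"]
EXTRA_FLAGS = ["-ftree-vectorize", "-funroll-loops"]


def cflags_matrix(targets=MARCH_TARGETS,
                  switches=THREEDNOW_SWITCHES,
                  extra=EXTRA_FLAGS):
    """Return one CFLAGS string per (target, 3DNow! switch) combination.

    Hypothetical helper: it only builds strings, it does not run a build.
    """
    return [
        " ".join([f"-march={target}", switch, *extra])
        for target, switch in product(targets, switches)
    ]


if __name__ == "__main__":
    for flags in cflags_matrix():
        print(flags)
```

Each resulting string would then be used for one rebuild and one timing run,
so that the effect of the target and of 3DNow! can be told apart.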

Best regards,
  Tiago Marques



On Fri, Feb 20, 2009 at 2:23 PM, Benjamin M. Schwartz 
bmsch...@fas.harvard.edu wrote:


 Tomeu Vizoso wrote:
  On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
  On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
  GCC 4.3 evidently does not do a very good job of optimizing for geode.
  What percentage of CPU time was spent in libtheora?

 100%.  The encoder was operating in a continuous loop.

  Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
  were involved during your tests, you may have seen little of theora
  itself.

 Neither X nor jffs2 was involved.  The input file (y4m or ogv) was cached
 in memory, and the output stream (ogv or y4m) was being sent directly to
 /dev/null, and not displayed.

 The only action being taken in X was to display, in the Terminal activity,
 a text-only progress bar, rendered by the encoder_example, or dump_video
 command.  These commands are part of libtheora, and were recompiled with
 it, so the point remains.

 --Ben
 ___
 Devel mailing list
 Devel@lists.laptop.org
 http://lists.laptop.org/listinfo/devel



Re: On optimizing Theora

2009-02-22 Thread Benjamin M. Schwartz

Tiago Marques wrote:
 Can you please try both options with also the following
 ones: -ftree-vectorize -funroll-loops -m3dnow

(1) libtheora automatically adds the flags -O3 -fforce-addr
-fomit-frame-pointer -finline-functions -funroll-loops to any specified
CFLAGS.

(2) libtheora's inner loops are largely hand-optimized MMX assembly, so
vectorization and 3dnow are unlikely to have a significant impact.

(3) I am not particularly interested in trolling through every combination
of relevant gcc flags in search of performance benefit.  That's the
compiler's (and compiler writers') job.  My point, instead, was that gcc
(at least the version in 767) does not have a good code generator for
Geode, and therefore we should not expect any performance increase by
rebuilding everything with -march=geode.
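
To illustrate point (1), here is a minimal sketch of which requested flags
are already covered by what libtheora adds. The function names and the
ordering (user flags first, added flags after) are assumptions for
illustration; only the list of added flags is taken from the text above.

```python
# Flags that, per point (1), libtheora adds to any specified CFLAGS.
LIBTHEORA_ADDED = [
    "-O3",
    "-fforce-addr",
    "-fomit-frame-pointer",
    "-finline-functions",
    "-funroll-loops",
]


def effective_cflags(user_cflags):
    """Combine user flags with the added ones, dropping exact duplicates.

    Hypothetical helper; placing the user flags first is an assumption.
    """
    seen = set()
    combined = []
    for flag in user_cflags.split() + LIBTHEORA_ADDED:
        if flag not in seen:
            seen.add(flag)
            combined.append(flag)
    return combined


def redundant_flags(user_cflags):
    """Return the user flags that libtheora would have added anyway."""
    return [flag for flag in user_cflags.split() if flag in LIBTHEORA_ADDED]


if __name__ == "__main__":
    requested = "-march=geode -ftree-vectorize -funroll-loops -m3dnow"
    print(redundant_flags(requested))
    print(" ".join(effective_cflags(requested)))
```

Of the three flags requested, only -funroll-loops is already in the added
set, so it changes nothing when passed explicitly.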

If you are interested in searching for the perfect compiler flags, perhaps
you would like to try Acovea (http://www.coyotegulch.com/products/acovea/).

--Ben


Re: On optimizing Theora

2009-02-20 Thread Tomeu Vizoso
On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
 On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.

 What percentage of CPU time was spent in libtheora?

Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
were involved during your tests, you may have seen little of theora
itself.

Regards,

Tomeu


Re: On optimizing Theora

2009-02-20 Thread Benjamin M. Schwartz

Tomeu Vizoso wrote:
 On Fri, Feb 20, 2009 at 06:41,  qu...@laptop.org wrote:
 On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.
 What percentage of CPU time was spent in libtheora?

100%.  The encoder was operating in a continuous loop.

 Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
 were involved during your tests, you may have seen little of theora
 itself.

Neither X nor jffs2 was involved.  The input file (y4m or ogv) was cached
in memory, and the output stream (ogv or y4m) was being sent directly to
/dev/null, and not displayed.

The only action being taken in X was to display, in the Terminal activity,
a text-only progress bar, rendered by the encoder_example, or dump_video
command.  These commands are part of libtheora, and were recompiled with
it, so the point remains.
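
For anyone wanting to repeat this kind of measurement, a minimal sketch of
the timing setup described above follows: run the command with its output
discarded and take the best wall-clock time. The helper name time_command is
made up, and the exact encoder_example or dump_video command lines are not
given in this thread, so the demo uses a stand-in command.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv with stdout discarded; return the best wall-clock seconds.

    Hypothetical helper. Discarding stdout plays the role of sending the
    output stream to /dev/null; caching the input file in memory
    beforehand is left to the caller.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in command; replace with the real encoder or decoder invocation.
    stand_in = [sys.executable, "-c", "print('frame data')"]
    print(f"best of 3: {time_command(stand_in):.3f} s")
```

Taking the minimum over a few runs reduces the influence of unrelated
activity on the machine, which matters on a system as small as the XO.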

--Ben


Re: On optimizing Theora

2009-02-19 Thread quozl
On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
 GCC 4.3 evidently does not do a very good job of optimizing for geode.

What percentage of CPU time was spent in libtheora?

-- 
James Cameron    mailto:qu...@us.netrek.org    http://quozl.netrek.org/