Re: On optimizing Theora
Hi,

Can you please try both options together with the following flags:
-ftree-vectorize -funroll-loops -m3dnow

It may also be a good idea to test both geode and i586 with -m3dnow and
-mno-3dnow, since the compiler may be causing problems while vectorizing.

Another option is to also test i486 builds, as per what I had already
found in this thread:
http://geode.insideo.net/info-linux_archives/msg00396.html

> Let me underscore my colleague's statement. Do not use the 586 target.
> In testing we've found that the 586 optimized version can be up to 3x
> slower vs. the 386/486 versions on the Geode LX.

This is probably because the Geode LX is not a superscalar processor
(while the i586 is), which may cause problems even with -march=i586.

Best regards,
Tiago Marques

On Fri, Feb 20, 2009 at 2:23 PM, Benjamin M. Schwartz
bmsch...@fas.harvard.edu wrote:
> Tomeu Vizoso wrote:
>> On Fri, Feb 20, 2009 at 06:41, qu...@laptop.org wrote:
>>> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
>>>> GCC 4.3 evidently does not do a very good job of optimizing for
>>>> geode.
>>>
>>> What percentage of CPU time was spent in libtheora?
>
> 100%. The encoder was operating in a continuous loop.
>
>> Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
>> were involved during your tests, you may have seen little of theora
>> itself.
>
> Neither X nor jffs2 was involved. The input file (y4m or ogv) was
> cached in memory, and the output stream (ogv or y4m) was being sent
> directly to /dev/null, and not displayed. The only action being taken
> in X was to display, in the Terminal activity, a text-only progress
> bar, rendered by the encoder_example, or dump_video command. These
> commands are part of libtheora, and were recompiled with it, so the
> point remains.
>
> --Ben

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
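The test matrix proposed in the message above can be written down explicitly. The following is only a sketch: the flag names and -march targets come from the message, but the way they are grouped (both 3DNow! settings for geode and i586, a single plain i486 build, the two extra flags applied everywhere) is an assumption, and build_matrix is a hypothetical helper, not part of libtheora or its build system.

```python
from itertools import product

# Flags and targets named in the message; the grouping is an assumption.
EXTRA_FLAGS = ["-ftree-vectorize", "-funroll-loops"]
# GCC spells the negative form with a hyphen: -mno-3dnow.
THREEDNOW_FLAGS = ["-m3dnow", "-mno-3dnow"]


def build_matrix():
    """Return one CFLAGS string per build configuration to benchmark."""
    matrix = []
    # geode and i586 are each tried with 3DNow! enabled and disabled.
    for march, threednow in product(["geode", "i586"], THREEDNOW_FLAGS):
        matrix.append(" ".join([f"-march={march}", *EXTRA_FLAGS, threednow]))
    # i486 is tried once, without forcing either 3DNow! setting.
    matrix.append(" ".join(["-march=i486", *EXTRA_FLAGS]))
    return matrix


if __name__ == "__main__":
    for cflags in build_matrix():
        print(cflags)
```

Each printed line is meant to be passed as CFLAGS to a separate rebuild before timing it.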
Re: On optimizing Theora
Tiago Marques wrote:
> Can you please try both options with also the following ones:
> -ftree-vectorize -funroll-loops -m3dnow

(1) libtheora automatically adds the flags
    -O3 -fforce-addr -fomit-frame-pointer -finline-functions -funroll-loops
    to any specified CFLAGS.

(2) libtheora's inner loops are largely hand-optimized MMX assembly, so
    vectorization and 3dnow are unlikely to have a significant impact.

(3) I am not particularly interested in trolling through every
    combination of relevant gcc flags in search of performance benefit.
    That's the compiler's (and compiler writers') job.

My point, instead, was that gcc (at least the version in 767) does not
have a good code generator for Geode, and therefore we should not expect
any performance increase by rebuilding everything with -march=geode.

If you are interested in searching for the perfect compiler flags,
perhaps you would like to try Acovea
(http://www.coyotegulch.com/products/acovea/).

--Ben
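Point (1) means that some of the requested flags are already in effect. A small sketch makes this concrete. The list of appended flags is taken from the message above; the helper names and the assumption that the flags are simply appended after the user's CFLAGS are illustrative only and do not describe libtheora's actual configure logic.

```python
# Flags that, per the message above, libtheora adds to any specified CFLAGS.
LIBTHEORA_APPENDED = [
    "-O3",
    "-fforce-addr",
    "-fomit-frame-pointer",
    "-finline-functions",
    "-funroll-loops",
]


def effective_cflags(user_cflags):
    """Return the user's flags followed by the appended ones (ordering assumed)."""
    return user_cflags.split() + LIBTHEORA_APPENDED


def redundant_flags(user_cflags):
    """Return the user flags that libtheora would have added anyway."""
    return [flag for flag in user_cflags.split() if flag in LIBTHEORA_APPENDED]


if __name__ == "__main__":
    requested = "-ftree-vectorize -funroll-loops -m3dnow"
    print("effective:", " ".join(effective_cflags(requested)))
    print("already implied:", redundant_flags(requested))
```

Of the three flags requested in the quoted message, only -ftree-vectorize and -m3dnow would actually change the build.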
Re: On optimizing Theora
On Fri, Feb 20, 2009 at 06:41, qu...@laptop.org wrote:
> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
>> GCC 4.3 evidently does not do a very good job of optimizing for geode.
>
> What percentage of CPU time was spent in libtheora?

Yeah, both X and jffs2 seem to use a lot of CPU on the XO, so if they
were involved during your tests, you may have seen little of theora
itself.

Regards,

Tomeu
Re: On optimizing Theora
Tomeu Vizoso wrote:
> On Fri, Feb 20, 2009 at 06:41, qu...@laptop.org wrote:
>> On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
>>> GCC 4.3 evidently does not do a very good job of optimizing for
>>> geode.
>>
>> What percentage of CPU time was spent in libtheora?

100%. The encoder was operating in a continuous loop.

> Yeah, both X and jffs2 seem to use a lot of cpu on the XO, so if they
> were involved during your tests, you may have seen little of theora
> itself.

Neither X nor jffs2 was involved. The input file (y4m or ogv) was cached
in memory, and the output stream (ogv or y4m) was sent directly to
/dev/null and not displayed. The only action being taken in X was to
display, in the Terminal activity, a text-only progress bar rendered by
the encoder_example or dump_video command. These commands are part of
libtheora, and were recompiled with it, so the point remains.

--Ben
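The measurement setup described above (output discarded, nothing displayed) can be approximated with a small timing harness. This is a sketch under assumptions: time_command is a hypothetical helper, the command in the example is a placeholder rather than a real encoder_example or dump_video invocation, and taking the best of several runs is a choice made here, not something stated in the thread.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times with stdout discarded; return the best wall time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # stdout goes to the null device, mirroring "sent directly to /dev/null".
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder workload; substitute the real encoder or decoder command.
    cmd = [sys.executable, "-c", "print(sum(range(10**6)))"]
    print(f"best of 3: {time_command(cmd):.3f} s")
```

Running the command once beforehand, so that the input file is already in the page cache, would match the "cached in memory" condition.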
Re: On optimizing Theora
On Fri, Feb 20, 2009 at 12:28:42AM -0500, Benjamin M. Schwartz wrote:
> GCC 4.3 evidently does not do a very good job of optimizing for geode.

What percentage of CPU time was spent in libtheora?

-- 
James Cameron    mailto:qu...@us.netrek.org    http://quozl.netrek.org/