--- Comment #6 from pinskia at gcc dot gnu dot org 2008-09-02 20:45 ---
The main difference between -O2 and -Os is that csum_partial is inlined for -Os
and unrolling is disabled for -Os.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37312
--- Comment #5 from pinskia at gmail dot com 2008-09-01 20:41 ---
Subject: Re: -Os significantly faster than -O2 on test case
This is mostly because of extra register moves that IRA sometimes
introduces. There is another bug about inline asm and the return
register.
Sent from my iPhone
On Sep 1, 2008, at 7:36, "rguenth at gcc dot gnu dot org" <[EMAIL PROTECTED]
> wrote:
--- Comment #4 from rguenth at gcc dot gnu dot org 2008-09-01 14:36 ---
Well, now -Os -funroll-all-loops doesn't do any unrolling anymore while it did
before. With -O2 you get what you ask for - unrolled loops.
-funroll-all-loops isn't really a flag to be used in general.
--- Comment #3 from andi-gcc at firstfloor dot org 2008-09-01 14:20 ---
Thanks for the us^whelpful comment. If you can suggest a way to do carry
preserving addition without inline assembler that would be fine, otherwise not.
-Os seems to do something that improves it at least (and that
--- Comment #2 from rguenth at gcc dot gnu dot org 2008-09-01 13:42 ---
Uh, well. The code is mostly inline assembly, which doesn't give GCC much
freedom to do anything. I guess -O2 simply optimizes "too much" around the
asm. Try not using inline assembly instead.
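
For reference, a minimal sketch of what carry-preserving addition can look
like in plain C, without inline assembly. This is not taken from the attached
test case; the names add32_with_carry and fold32_to_16 are hypothetical, and
no claim is made here about what code GCC generates for it at -O2 or -Os.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit add with end-around carry, as used in
 * ones' complement checksums. If a + b wraps around, the unsigned sum
 * is smaller than either operand, so (sum < b) is exactly the carry bit. */
static uint32_t add32_with_carry(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;      /* unsigned arithmetic wraps modulo 2^32 */
    return sum + (sum < b);    /* add the carry back in; cannot wrap again */
}

/* Hypothetical helper: fold a 32-bit partial sum down to 16 bits,
 * adding the carries out of the low half back in each time. */
static uint16_t fold32_to_16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);   /* result is at most 0x1fffe */
    sum = (sum & 0xffffu) + (sum >> 16);   /* absorbs the remaining carry */
    return (uint16_t)sum;
}
```

The carry is recovered with an unsigned comparison instead of reading the
CPU carry flag, so the compiler remains free to schedule and allocate
registers around it.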
--- Comment #1 from andi-gcc at firstfloor dot org 2008-09-01 11:22 ---
Created an attachment (id=16178)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16178&action=view)
test case
checksum functions extracted from the Linux kernel.
Not preprocessed, but should compile on any x86