[tip:x86/asm] x86/defconfig: Turn on CONFIG_CC_OPTIMIZE_FOR_SIZE=y in the 64-bit defconfig

2013-01-26 Thread tip-bot for Ma Ling
Commit-ID:  d94ffd677469ef729e9d6e968191872577a6119e
Gitweb:     http://git.kernel.org/tip/d94ffd677469ef729e9d6e968191872577a6119e
Author:     Ma Ling
AuthorDate: Fri, 25 Jan 2013 09:11:01 -0500
Committer:  Ingo Molnar
CommitDate: Sat, 26 Jan 2013 13:09:15 +0100
x86/defconfig: Turn on

[tip:x86/asm] x86/asm: Clean up copy_page_*() comments and code

2012-10-24 Thread tip-bot for Ma Ling
Commit-ID:  269833bd5a0f4443873da358b71675a890b47c3c
Gitweb:     http://git.kernel.org/tip/269833bd5a0f4443873da358b71675a890b47c3c
Author:     Ma Ling
AuthorDate: Thu, 18 Oct 2012 03:52:45 +0800
Committer:  Ingo Molnar
CommitDate: Wed, 24 Oct 2012 12:42:47 +0200
x86/asm: Clean up copy_page_

RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-14 Thread Ma, Ling
2 6:58 PM
> To: Ma, Ling
> Cc: Konrad Rzeszutek Wilk; mi...@elte.hu; h...@zytor.com;
> t...@linutronix.de; linux-kernel@vger.kernel.org; i...@google.com;
> George Spelvin
> Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging
> instruction sequence and saving r

RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-12 Thread Ma, Ling
> If you can't test the CPUs who run this code I think it's safer if you
> add a new variant for Atom, not change the existing well tested code.
> Otherwise you risk performance regressions on these older CPUs.
I found one older machine, and tested the code on it, the results between them are alm

RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-12 Thread Ma, Ling
> > > So is that also true for AMD CPUs?
> > Although Bulldozer put 32byte instruction into decoupled 16byte entry
> > buffers, it still decode 4 instructions per cycle, so 4 instructions
> > will be fed into execution unit and
> > 2 loads ,1 write will be issued per cycle.
> > I'd be very interes

RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-11 Thread Ma, Ling
> > Load and write operation occupy about 35% and 10% respectively for
> > most industry benchmarks. Fetched 16-aligned bytes code include about
> > 4 instructions, implying 1.34(0.35 * 4) load, 0.4 write.
> > Modern CPU support 2 load and 1 write per cycle, so throughput from
> > write is bottlene
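The arithmetic in the quoted figures can be checked with a small port-pressure model. This is only a sketch under loose assumptions and is not the method used in the thread. It treats the number of load and store ports (2 and 1 per cycle, as quoted) as the only limit and ignores decode width, latency and cache behaviour. The helper names `per_fetch_block` and `port_bound_cycles` are hypothetical. Note that 0.35 × 4 works out to 1.4 loads per fetch block, not the 1.34 given in the quoted text.

```python
# Back-of-the-envelope port-pressure model (assumption: only the number of
# load/store ports limits issue rate; decode width, latency and cache misses
# are ignored). Helper names are hypothetical, for illustration only.
LOAD_PORTS = 2   # loads issued per cycle, as quoted in the thread
STORE_PORTS = 1  # stores issued per cycle, as quoted in the thread


def per_fetch_block(load_frac: float, store_frac: float, insns: int = 4):
    """Expected loads and stores in one 16-byte fetch block of `insns` instructions."""
    return load_frac * insns, store_frac * insns


def port_bound_cycles(loads: float, stores: float) -> float:
    """Lower bound on cycles needed just to issue the given loads and stores."""
    return max(loads / LOAD_PORTS, stores / STORE_PORTS)


# Benchmark mix quoted in the thread: 35% loads, 10% stores.
loads, stores = per_fetch_block(0.35, 0.10)
print(f"{loads:.2f} loads, {stores:.2f} stores per fetch block")
print(f"load ports {loads / LOAD_PORTS:.0%} busy, "
      f"store port {stores / STORE_PORTS:.0%} busy")

# A page copy with 8-byte moves: one load plus one store per quadword.
quads = 4096 // 8
print(f"copy_page lower bound: {port_bound_cycles(quads, quads):.0f} cycles")
```

In a page copy done with 8-byte moves, every load is paired with a store. Under this model, the single store port therefore sets the lower bound for 4096 bytes (512 cycles), not the two load ports (256 cycles).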