Re: Use of vector instructions in memmov/memset expanding

2011-11-07 Thread Jan Hubicka
Hi, Jan! I was just preparing my version of the patch, but it seems a bit late now. Please see my comments on this and your previous letter below. By the way, would it be possible to commit the other part of the patch (the middle-end part) - probably also in small pieces - and some other tuning

Re: Use of vector instructions in memmov/memset expanding

2011-11-06 Thread Jan Hubicka
There is a rolled-loop algorithm that doesn't use SSE modes - such architectures could use it instead of unrolled_loop. I think the performance wouldn't suffer much from that. For most modern processors, SSE moves are faster than several word-sized moves, so this change in

Re: Use of vector instructions in memmov/memset expanding

2011-11-02 Thread Jan Hubicka
Hi, I am going to benchmark the following hunk separately tonight. It is an independent change. Rth, Vladimir: there are obviously several options for making GCC use SSE for 64bit loads/stores in 32bit codegen (and 128bit loads/stores in 128bit codegen). What do you think is the best variant here? (an

Re: Use of vector instructions in memmov/memset expanding

2011-11-02 Thread Jan Hubicka
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..6ce240a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10 +561,14 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
 COSTS_N_BYTES (2), /* cost of

Re: Use of vector instructions in memmov/memset expanding

2011-10-28 Thread Richard Henderson
On 10/28/2011 05:41 AM, Michael Zolotukhin wrote:
+/* Target hook. Returns rtx of mode MODE with promoted value VAL, that is
+ supposed to represent one byte. MODE could be a vector mode.
+ Example:
+ 1) VAL = const_int (0xAB), mode = SImode,
+ the result is const_int
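As a rough illustration of what "promoting" a byte value usually means for memset expansion, here is a minimal scalar sketch. It is not the target hook from the patch: the function names `promote_byte_u32` and `promote_byte_u64` are hypothetical, and the sketch works on plain integers rather than on rtx values or vector modes.

```c
#include <stdint.h>

/* Hypothetical helpers, not the GCC hook itself: replicate the low byte
   of VAL into every byte of a wider integer, which is the scalar analogue
   of promoting a memset value to a wider mode.  */
uint32_t
promote_byte_u32 (uint32_t val)
{
  /* Multiplying a byte by 0x01010101 copies it into all four byte lanes.  */
  return (val & 0xffu) * UINT32_C (0x01010101);
}

uint64_t
promote_byte_u64 (uint64_t val)
{
  /* Same trick for eight byte lanes.  */
  return (val & 0xffu) * UINT64_C (0x0101010101010101);
}
```

Only the low byte of the argument is used, so a caller can pass the `int` value that memset receives without masking it first.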

Re: Use of vector instructions in memmov/memset expanding

2011-10-27 Thread Jan Hubicka
Hi, sorry for the delay with the review. This is my first pass through the backend part; hopefully someone else will do the middle-end bits.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..d7c4330 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10

Re: Use of vector instructions in memmov/memset expanding

2011-10-26 Thread Michael Zolotukhin
Any questions on these patches? Are they ok for the trunk? On 20 October 2011 12:37, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: And, finally, part with the tests. On 20 October 2011 12:36, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Back-end part of the patch is

Re: Use of vector instructions in memmov/memset expanding

2011-10-20 Thread Michael Zolotukhin
Middle-end part of the patch is attached. On 20 October 2011 12:34, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I fixed the tests, updated my branch, and fixed the bugs introduced during this process. Here is the fixed complete patch (other parts will be sent in subsequent

Re: Use of vector instructions in memmov/memset expanding

2011-09-29 Thread Jakub Jelinek
Hi! On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote: +/* { dg-options -O2 -march=atom -mtune=atom -m64 -dp } */ The testcases are wrong, -m64 or -m32 should never appear in dg-options,

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
Attached is part 1 of the patch that enables use of vector instructions in memset and memcpy (middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode selection was changed – now it checks if alignment is known at compile
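The snippet is cut off before it describes the selection rule, so the following is only a sketch of one plausible reading: pick the widest power-of-two chunk that fits both the bytes still to be copied and the alignment known at compile time. The names `widest_move_size` and `count_moves` are hypothetical and this is not the code in move_by_pieces; `max_size` is assumed to be a power of two.

```c
#include <stddef.h>

/* Hypothetical sketch, not GCC's move_by_pieces: return the widest
   power-of-two chunk size (in bytes) usable for the next move.
   MAX_SIZE is assumed to be a power of two, e.g. 16 for an SSE move.  */
size_t
widest_move_size (size_t remaining, size_t known_align, size_t max_size)
{
  size_t size = max_size;

  /* Shrink the chunk until it fits both the remaining bytes and the
     alignment that is known at compile time.  */
  while (size > 1 && (size > remaining || size > known_align))
    size /= 2;
  return size;
}

/* Count how many moves a copy of LEN bytes needs under that rule.  */
size_t
count_moves (size_t len, size_t known_align, size_t max_size)
{
  size_t moves = 0;

  while (len > 0)
    {
      len -= widest_move_size (len, known_align, max_size);
      moves++;
    }
  return moves;
}
```

Under these assumptions a 35-byte copy with 16-byte alignment takes 4 moves (16, 16, 2, 1), while the same copy with only 4-byte alignment takes 10, which is the kind of difference wider move modes are meant to exploit.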

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? -Andi -- a...@linux.intel.com -- Speaking for myself only

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jakub Jelinek
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jan Hubicka
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
This expansion only works on relatively small sizes (up to 4k), where the overhead of a library call could be quite significant. In some cases the new implementation gives a 5x speedup (especially on small sizes - less than ~256 bytes). On almost all sizes from 16 to 4096 bytes there is some gain, in

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
Do you know the glibc version number in which the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13.

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jan Hubicka
Do you know the glibc version number in which the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say yes and thus would

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
(I worry about the tables in i386.c deciding what strategy to use for a block of a given size. This is more or less unrelated to the actual patch) Yep, the threshold values I mentioned above are the values in these tables. Even with fast glibcs there are some cases when inlining is profitable (e.g.
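To make the idea of a size-indexed strategy table concrete, here is a simplified sketch. The enum, struct, and function names are hypothetical, and the thresholds are illustrative placeholders rather than values from i386.c or from the patch; only the 4k upper bound echoes the figure mentioned earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical strategies; the real tables in i386.c use their own names.  */
enum strategy
{
  STRAT_LOOP,
  STRAT_UNROLLED_LOOP,
  STRAT_SSE_LOOP,
  STRAT_LIBCALL
};

/* One row: use ALG for blocks of up to MAX_SIZE bytes; -1 means no limit.  */
struct size_rule
{
  long max_size;
  enum strategy alg;
};

/* Illustrative thresholds only, not tuned values.  */
static const struct size_rule example_table[] = {
  { 16, STRAT_LOOP },
  { 256, STRAT_UNROLLED_LOOP },
  { 4096, STRAT_SSE_LOOP },
  { -1, STRAT_LIBCALL }
};

/* Return the strategy of the first row whose limit covers SIZE.  */
enum strategy
choose_strategy (const struct size_rule *table, size_t rows, long size)
{
  for (size_t i = 0; i < rows; i++)
    if (table[i].max_size == -1 || size <= table[i].max_size)
      return table[i].alg;
  /* Fall back to the library call when no row matches.  */
  return STRAT_LIBCALL;
}
```

With a layout like this, retuning for a faster libc means editing thresholds in the table rather than touching the expansion code, which is why the thread treats the tables as a separate concern from the patch.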

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Jack Howarth
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote: Attached is part 1 of the patch that enables use of vector instructions in memset and memcpy (middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
It appears that part 1 of the patch wasn't really attached. Thanks, resending. [Attachment: memfunc-mid.patch]

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote: Do you know the glibc version number in which the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
You could add a check to configure and generate based on that? Do you mean check if glibc is newer than 2.13? I think that when a new glibc version is released, the tables should be re-examined anyway - we shouldn't just stop inlining, or stop generating libcalls. BTW I know that the tables need

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics have changed in Nehalem and Sandy-Bridge over earlier Intel CPUs.

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Andi Kleen
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote: There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
Michael, Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap failure... I checked bootstrap, specs and 'make check' with the complete patch. Separate patches for ME and BE were only tested for build (no bootstrap) and 'make check'. I think it's better to apply the

Re: Use of vector instructions in memmov/memset expanding

2011-09-28 Thread Michael Zolotukhin
Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was slower than generic. For now, -mtune=corei7 is triggering use of generic cost-table (I'm not sure about corei7-avx, but assume the same) - so it won't be slower. Adding new tables shouldn't be very difficult, even if they

Re: Use of vector instructions in memmov/memset expanding

2011-08-22 Thread Michael Zolotukhin
Ping. On 18 July 2011 15:00, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer your further questions if

Re: Use of vector instructions in memmov/memset expanding

2011-07-26 Thread Michael Zolotukhin
Any updates/questions on this? On 18 July 2011 15:00, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer

Re: Use of vector instructions in memmov/memset expanding

2011-07-18 Thread Michael Zolotukhin
Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer your further questions if they appear. The changes could be logically divided into two parts (though, these parts

Re: Use of vector instructions in memmov/memset expanding

2011-07-15 Thread Jan Hubicka
New algorithm for move-mode selection is implemented for move_by_pieces, store_by_pieces. x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in similar way, x86 cost-models parameters are slightly changed to support this. This implementation checks if array's

Re: Use of vector instructions in memmov/memset expanding

2011-07-13 Thread Uros Bizjak
Hello! Please don't use -m32/-m64 in testcases directly. You should use /* { dg-do compile { target { ia32 } } } */ for 32bit insns and /* { dg-do compile { target { ! ia32 } } } */ for 64bit insns. Also, there is no need to add -mtune if -march is already specified. -mtune will follow
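A testcase header following this advice might look like the sketch below. In the GCC testsuite the `ia32` effective target matches 32-bit x86, so `{ ! ia32 }` restricts the test to the other x86 configurations. The file contents are hypothetical: the option string reuses flags seen elsewhere in the thread minus `-m64` and `-mtune`, and `copy_block` and `BLOCK_SIZE` are illustrative names. A real test would normally also carry a scan directive for the expected instructions, which is omitted here because the thread does not show one.

```c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=atom -dp" } */

#define BLOCK_SIZE 64

char src[BLOCK_SIZE];
char dst[BLOCK_SIZE];

/* A fixed-size copy that the compiler may choose to expand inline.  */
void
copy_block (void)
{
  __builtin_memcpy (dst, src, BLOCK_SIZE);
}
```

The target selector lives in the `dg-do` line, so the same source can be run by the harness under whatever multilib flags the tester chooses.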

Re: Use of vector instructions in memmov/memset expanding

2011-07-11 Thread Michael Zolotukhin
Resending in plain text: On 11 July 2011 23:50, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: The attached patch enables use of vector instructions in memmov/memset expanding. New algorithm for move-mode selection is implemented for move_by_pieces, store_by_pieces. x86

Re: Use of vector instructions in memmov/memset expanding

2011-07-11 Thread H.J. Lu
On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Sorry, for sending once again - forgot to attach the patch. On 11 July 2011 23:50, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: The attached patch enables use of vector instructions in memmov