Hi, Jan!
I was just preparing my version of the patch, but it seems a bit late
now. Please see my comments on this and your previous letter below.
By the way, would it be possible to commit the other part of the patch
(the middle-end part), probably also in small parts, and some other
tuning
There is a rolled-loop algorithm that doesn't use SSE modes - such
architectures could use it instead of unrolled_loop. I think the
performance wouldn't suffer much from that.
For most modern processors, SSE moves are faster than several
word-sized moves, so this change in
Hi,
I am going to benchmark the following hunk separately tonight. It is
an independent change.
Rth, Vladimir: there are obviously several options for making GCC use SSE for
64bit loads/stores in 32bit codegen (and 128bit loads/stores in 128bit
codegen). Which variant do you think is best here?
(an
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..6ce240a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10 +561,14 @@ struct processor_costs ix86_size_cost = {/* costs for
tuning for size */
COSTS_N_BYTES (2), /* cost of
On 10/28/2011 05:41 AM, Michael Zolotukhin wrote:
+/* Target hook. Returns rtx of mode MODE with promoted value VAL, that is
+ supposed to represent one byte. MODE could be a vector mode.
+ Example:
+ 1) VAL = const_int (0xAB), mode = SImode,
+ the result is const_int
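To make the idea of a promoted byte concrete, here is a minimal C sketch. It assumes that "promoted" means the single byte is replicated into every byte of the wider value, which is what a memset expansion needs. The helper name promote_byte_to_u32 is hypothetical and is not part of the patch; it models only a plain 32-bit integer, not GCC's rtx objects or vector modes.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: replicate one byte into all
   four bytes of a 32-bit value.  Multiplying by 0x01010101 copies the
   byte into each byte lane, because no lane carries into the next.  */
static uint32_t
promote_byte_to_u32 (uint8_t val)
{
  return (uint32_t) val * UINT32_C (0x01010101);
}
```

The same multiply trick extends to 64 bits with the constant 0x0101010101010101; a vector mode would instead need a broadcast of the byte to every element.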
Hi,
sorry for the delay with the review. This is my first pass through the backend
part; hopefully someone else will do the middle-end bits.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..d7c4330 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10
Any questions on these patches? Are they ok for the trunk?
On 20 October 2011 12:37, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
And, finally, part with the tests.
On 20 October 2011 12:36, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
Back-end part of the patch is
Middle-end part of the patch is attached.
On 20 October 2011 12:34, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
I fixed the tests, updated my branch, and fixed the bugs introduced
during this process.
Here is the fixed complete patch (other parts will be sent in subsequent
Hi!
On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote:
+/* { dg-options -O2 -march=atom -mtune=atom -m64 -dp } */
The testcases are wrong, -m64 or -m32 should never appear in dg-options,
Attached is part 1 of the patch, which enables the use of vector instructions
in memset and memcpy (middle-end part).
The main changes are in the functions
move_by_pieces/set_by_pieces. In the new version, the algorithm for move-mode
selection was changed: it now checks if alignment is known at compile
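As an illustration of what width selection in a by-pieces expansion can look like, here is a minimal C sketch. It is not GCC's move_by_pieces code: the function names, the convention that align == 0 means "alignment unknown", and the policy of capping the piece width at a known alignment are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sketch, not GCC's move_by_pieces: pick the widest
   power-of-two piece (in bytes) that fits in the remaining length, does
   not exceed MAX_WIDTH (the widest move the target offers, assumed to
   be a power of two), and does not exceed the known alignment.
   ALIGN == 0 stands for "alignment unknown"; in that case no alignment
   cap is applied, which assumes unaligned moves are acceptable.  */
static size_t
pick_piece_width (size_t remaining, size_t align, size_t max_width)
{
  size_t width = max_width;
  while (width > 1
         && (width > remaining || (align != 0 && width > align)))
    width /= 2;
  return width;
}

/* Count how many moves are needed to cover LEN bytes.  */
static size_t
count_pieces (size_t len, size_t align, size_t max_width)
{
  size_t moves = 0;
  while (len > 0)
    {
      len -= pick_piece_width (len, align, max_width);
      moves++;
    }
  return moves;
}
```

With 16-byte moves available, a 35-byte block needs far fewer moves than with pieces capped at a 4-byte alignment, which is the kind of difference a wider move mode is meant to exploit.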
Michael Zolotukhin michael.v.zolotuk...@gmail.com writes:
Build and 'make check' was tested.
Could you expand a bit on the performance benefits? Where does it help?
-Andi
--
a...@linux.intel.com -- Speaking for myself only
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
Michael Zolotukhin michael.v.zolotuk...@gmail.com writes:
Build and 'make check' was tested.
Could you expand a bit on the performance benefits? Where does it help?
Especially when glibc these days has very well optimized
This expansion only applies to relatively small sizes (up to 4k), where
the overhead of a library call can be quite significant. In some cases the new
implementation gives a 5x speedup (especially on small sizes - less
than ~256 bytes). On almost all sizes from 16 to 4096 bytes there is
some gain, in
Do you know the glibc version number in which
the optimized string functions were introduced?
Afaik, it's 2.13.
I also compared my implementation to 2.13.
I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13
based, too. I would tend to say yes, and thus would
(I worry about the tables in i386.c deciding what strategy to use for a block of
a given size. This is more or less unrelated to the actual patch)
Yep, the threshold values I mentioned above are the values in these
tables. Even with fast glibcs there are some cases when inlining is
profitable (e.g.
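The size-based tables discussed here can be pictured with a small C sketch. Everything in it is hypothetical: the type and function names are invented for illustration and do not match the definitions in i386.c, and the thresholds are placeholders rather than tuned values.

```c
#include <stddef.h>

/* All names below are hypothetical and only mimic the idea of a
   per-size strategy table; the thresholds are made-up placeholders,
   not values from i386.c.  */
enum block_strategy
{
  STRATEGY_INLINE_LOOP,
  STRATEGY_INLINE_VECTOR,
  STRATEGY_LIBCALL
};

struct size_rule
{
  size_t max_size;              /* Rule applies to sizes <= max_size.  */
  enum block_strategy strategy;
};

static const struct size_rule example_table[] = {
  { 16, STRATEGY_INLINE_LOOP },
  { 4096, STRATEGY_INLINE_VECTOR },
};

/* Return the strategy of the first rule that covers SIZE, or a library
   call when the block is larger than every rule.  */
static enum block_strategy
choose_strategy (size_t size)
{
  size_t n = sizeof example_table / sizeof example_table[0];
  for (size_t i = 0; i < n; i++)
    if (size <= example_table[i].max_size)
      return example_table[i].strategy;
  return STRATEGY_LIBCALL;
}
```

Keeping the thresholds in data rather than in code is what makes it cheap to re-examine them when a new processor or library version changes the break-even points.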
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote:
Attached is part 1 of the patch, which enables the use of vector instructions
in memset and memcpy (middle-end part).
The main changes are in the functions
move_by_pieces/set_by_pieces. In the new version, the algorithm for move-mode
It appears that part 1 of the patch wasn't really attached.
Thanks, resending.
memfunc-mid.patch
Description: Binary data
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote:
Do you know the glibc version number in which
the optimized string functions were introduced?
Afaik, it's 2.13.
I also compared my implementation to 2.13.
I wonder if we can assume that most of GCC 4.7 based systems will be glibc
You could add a check to configure and generate based on that?
Do you mean a check for whether glibc is newer than 2.13?
I think that when a new glibc version is released, the tables should be
re-examined anyway - we shouldn't just stop inlining, or stop
generating libcalls.
BTW I know that the tables need
There is no separate cost table for Nehalem or SandyBridge - however,
I tuned the generic32 and generic64 tables, which should improve
performance on modern processors. In the old version REP-MOV was used - it
The recommended heuristics have changed in Nehalem and Sandy-Bridge
over earlier Intel CPUs.
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote:
There is no separate cost table for Nehalem or SandyBridge - however,
I tuned the generic32 and generic64 tables, which should improve
performance on modern processors. In the old version REP-MOV was used - it
The recommended heuristics
Michael,
Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
failure...
I checked bootstrap, specs and 'make check' with the complete patch.
Separate patches for ME and BE were only tested for build (no
bootstrap) and 'make check'. I think it's better to apply the
Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was
slower than generic.
For now, -mtune=corei7 is triggering use of generic cost-table (I'm
not sure about corei7-avx, but assume the same) - so it won't be
slower.
Adding new tables shouldn't be very difficult, even if they
Ping.
On 18 July 2011 15:00, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
Here is a summary - probably, it doesn't cover every single piece in
the patch, but I tried to describe the major changes. I hope this will
help you a bit - and of course I'll answer your further questions if
Any updates/questions on this?
On 18 July 2011 15:00, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
Here is a summary - probably, it doesn't cover every single piece in
the patch, but I tried to describe the major changes. I hope this will
help you a bit - and of course I'll answer
Here is a summary - probably, it doesn't cover every single piece in
the patch, but I tried to describe the major changes. I hope this will
help you a bit - and of course I'll answer your further questions if
they appear.
The changes could be logically divided into two parts (though, these
parts
A new algorithm for move-mode selection is implemented for move_by_pieces and
store_by_pieces.
The x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
a similar way, and the x86 cost-model parameters are slightly changed to support
this. This implementation checks if array's
Hello!
Please don't use -m32/-m64 in testcases directly.
You should use
/* { dg-do compile { target { ! ia32 } } } */
for 64bit insns and
/* { dg-do compile { target { ia32 } } } */
for 32bit insns.
Also, there is no need to add -mtune if -march is already specified.
-mtune will follow
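For reference, a test file header following this advice might look like the sketch below. The function copy_block and the 64-byte size are hypothetical and not taken from the patch; the selector { ! ia32 } keeps the test off 32-bit x86 targets without putting -m64 in dg-options, and only -march is given.

```c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=atom" } */

#include <string.h>

/* Hypothetical test body: a fixed-size copy that the compiler may
   choose to expand inline instead of calling the library.  */
void
copy_block (char *dst, const char *src)
{
  memcpy (dst, src, 64);
}
```

A real test would normally add a dg-final scan for the instructions it expects; that part is left out here because the expected output depends on the patch.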
Resending in plain text:
On 11 July 2011 23:50, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
The attached patch enables the use of vector instructions in memmov/memset
expansion.
A new algorithm for move-mode selection is implemented for move_by_pieces and
store_by_pieces.
x86
On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
Sorry for sending once again - I forgot to attach the patch.
On 11 July 2011 23:50, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
The attached patch enables use of vector instructions in memmov