Re: recent troubles with float vectors bitwise ops
given that we know that the processor supports bitwise-or on floating point values, using an instruction different from the one for bitwise-or on integer values, then it is fair to ask why we don't support vector | vector for floating point vectors. Because processors may add weird instructions for internal reasons, especially in an area where you want to extract every little bit of performance. It is up to the back-end to ensure that the right instruction is generated whenever appropriate. Paolo
Re: recent troubles with float vectors bitwise ops
Let's assume that the recent change is what we want, i.e., that the answer to (1) is No, these operations should not be part of the vector extensions because they are not valid scalar extensions. So, that means we need to answer (2). We still have the problem that users now can't write machine-independent code to do this operation. Assuming the operations are useful for something (and, if they weren't, why would Intel want to have instructions for them, and why would tbp want to use them?), I'm not sure that it is *so* useful for a user to have access to it, except for specialized cases:

1) neg, abs and copysign operations on vectors. These we can make available via builtins (for - of course you don't need it); we already support them in many back-ends.

abs:
  cmpeqps xmm1, xmm1   ; xmm1 = all-ones
  psrlq xmm1, 31       ; xmm1 = all 1000...
  andnps xmm2, xmm1    ; xmm2 = abs (xmm2)

neg:
  cmpeqps xmm1, xmm1   ; xmm1 = all-ones
  psrlq xmm1, 31       ; xmm1 = all 1000...
  xorps xmm2, xmm1     ; xmm2 = -xmm2

copysign:
  cmpeqps xmm1, xmm1   ; xmm1 = all-ones
  psrlq xmm1, 31       ; xmm1 = all 1000...
  andnps xmm2, xmm1    ; xmm2 = abs (xmm2)
  andps xmm1, xmm3     ; xmm1 = signbit (xmm3)
  orps xmm2, xmm1      ; xmm2 = copysign (xmm2, xmm3)

2) selection operations on vectors, kind of (v1 = v2 ? v3 : v4). These can be written for example like this:

  cmpleps xmm1, xmm2   ; xmm1 = xmm1 <= xmm2 ? all-ones : 0
  andnps xmm4, xmm1    ; xmm4 = xmm1 <= xmm2 ? 0 : xmm4
  andps xmm1, xmm3     ; xmm1 = xmm1 <= xmm2 ? xmm3 : 0
  orps xmm1, xmm4      ; xmm1 = xmm1 <= xmm2 ? xmm3 : xmm4

And we can add these as an extension to our vector arithmetic set; they are already supported as VEC_COND_EXPR by the middle-end. For other cases, which do not come to mind at the moment, introducing a couple of casts is not a big deal IMNSHO, especially if we make sure that the generated code is good. Right now, we have good code for SSE, and a prototype patch was posted for SSE2 and up. 
If we have a good extension for vector arithmetic, we should aim at improving it consistently rather than extending it in unpredictable ways. For example, another useful extension would be the ability to access vectors by item using x[n] (at least with constant expressions). What are these operations used for? Can someone give an example of a kernel that benefits from this kind of thing? See above. Paolo
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: 2) selection operations on vectors, kind of (v1 = v2 ? v3 : v4). These can be written for example like this:

  cmpleps xmm1, xmm2   ; xmm1 = xmm1 <= xmm2 ? all-ones : 0
  andnps xmm4, xmm1    ; xmm4 = xmm1 <= xmm2 ? 0 : xmm4
  andps xmm1, xmm3     ; xmm1 = xmm1 <= xmm2 ? xmm3 : 0
  orps xmm1, xmm4      ; xmm1 = xmm1 <= xmm2 ? xmm3 : xmm4

SSE4 introduces specific instruction support, with a shorter sequence for this purpose. It seems to be quite difficult to persuade gcc to use it.
Re: recent troubles with float vectors bitwise ops
Mark Mitchell wrote: One option is for the user to use intrinsics. It's been claimed that this results in worse code. There doesn't seem to be any obvious reason for that, but, if true, we should try to fix it; we don't want to penalize people who are using the intrinsics. So, let's assume using intrinsics is just as efficient, either because it already is, or because we make it so. I maintain that empirical claim; if i compare a simple SOA hybrid 3-coordinates type implemented via intrinsics, builtins and the vector extension, used as the basic component of a raytracer kernel, i get as many codegen variations: register allocations differ, stack footprints differ, branch code organization differs, etc... so it's not that surprising performance also differs. It appears the vector builtin (which isn't using __m128 but straight v4sf) implementations are mostly on par while the intrinsic based version is slightly slower. Then you factor in how convenient it is, well... was, to use that vector extension to write such a thing... Another issue is that for MSVC and ICC, __m128 is a class, but not for gcc, so you need more wrapping in C++; but if you know you can, you let some naked v4sf escape because the compiler always does the right thing with them. Now while there are some subtleties (and annoying 'features'), i should state that gcc 4.3, if you're careful, generates mostly excellent SSE code (especially on x86-64, even more so if compared to icc). We still have the problem that users now can't write machine-independent code to do this operation. Assuming the operations are useful for That and writing, say, a generic int,float,double something takes much much more work. What are these operations used for? Can someone give an example of a kernel that benefits from this kind of thing? There's of course what Paolo Bonzini described, but also all kinds of tricks that knowing such operations are extremely efficient encourages. 
While it would be nice to have such builtins also operate on vectors, if only because they are so common, it's not quite the same as having full freedom and hardware features exposed.
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: I'm not sure that it is *so* useful for a user to have access to it, except for specialized cases: As there's other means, it may not be that useful but for sure it's extremely convenient. 2) selection operations on vectors, kind of (v1 = v2 ? v3 : v4). These can be written for example like this:

  cmpleps xmm1, xmm2   ; xmm1 = xmm1 <= xmm2 ? all-ones : 0
  andnps xmm4, xmm1    ; xmm4 = xmm1 <= xmm2 ? 0 : xmm4
  andps xmm1, xmm3     ; xmm1 = xmm1 <= xmm2 ? xmm3 : 0
  orps xmm1, xmm4      ; xmm1 = xmm1 <= xmm2 ? xmm3 : xmm4

I suppose you'll find such a variant of a conditional move pattern in every piece of SSE code. But you can't condense bitwise vs float usage to a few patterns, because when writing SSE, the efficiency of those operations is taken for granted. If we have a good extension for vector arithmetic, we should aim at improving it consistently rather than extending it in unpredictable ways. For example, another useful extension would be the ability to access vectors by item using x[n] (at least with constant expressions). Yes, yes and yes.
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini [EMAIL PROTECTED] writes: 1) neg, abs and copysign operations on vectors. These we can make available via builtins (for - of course you don't need it); we already support them in many back-ends. Here is my point of view. People using the vector extensions are already writing inherently machine specific code, and they are (ideally) familiar with the instruction set of their processor. I see no significant disadvantage to gcc in granting them easy access to the capabilities of their processor. Saying that these capabilities are available in other ways just amounts to putting an obstacle in their path. If there is a reason to put in that obstacle--e.g., because we are implementing a language standard and the language standard forbids it--then fine. But citing a PowerPC specific standard to forbid code appropriate for the x86 does not count as a sufficient reason in my book. Permitting this extension continues the preexisting behaviour, and it helps programmers and helps existing code. Who does it hurt to permit this extension? Who does it help to forbid this extension? Ian
Re: recent troubles with float vectors bitwise ops
On Friday 24 August 2007, Ian Lance Taylor wrote: Paolo Bonzini [EMAIL PROTECTED] writes: 1) neg, abs and copysign operations on vectors. These we can make available via builtins (for - of course you don't need it); we already support them in many back-ends. Here is my point of view. People using the vector extensions are already writing inherently machine specific code, and they are (ideally) familiar with the instruction set of their processor. By the same argument, if you're already writing machine specific code then there shouldn't be a problem using machine specific intrinsics. I admit I've never been convinced that the generic vector support was sufficient to write useful code without resorting to machine specific intrinsics. Permitting this extension continues the preexisting behaviour, and it helps programmers and helps existing code. Who does it hurt to permit this extension? Who does it help to forbid this extension? I'm partly worried about cross-platform compatibility, and what this implies for other SIMD targets. At minimum we need to fix the internals documentation to say how to support this extension. The current docs are unclear whether (ior:V2SF ...) is valid RTL. Paul
Re: recent troubles with float vectors bitwise ops
On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote: Permitting this extension continues the preexisting behaviour, and it helps programmers and helps existing code. Who does it hurt to permit this extension? Who does it help to forbid this extension? Aren't builtins the designated way to access processor-specific features like this? Why do there have to be C operators for obscure features like this? Wouldn't it be better to fix the code generator to do the right thing regardless of how the user presents it? There is a lot of code that uses casts (including the builtin implementations themselves) - it seems worthwhile to generate instructions for the right domain for this code as well. -Chris
Re: recent troubles with float vectors bitwise ops
Chris Lattner [EMAIL PROTECTED] writes: On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote: Permitting this extension continues the preexisting behaviour, and it helps programmers and helps existing code. Who does it hurt to permit this extension? Who does it help to forbid this extension? Aren't builtins the designated way to access processor-specific features like this? Why does there have to be C operators for obscure features like this? A fair question, but we've already decided to support vector + vector and such operations, and we've decided that that is one valid way to generate vector instructions. That decision may itself have been a mistake. But once we accept that decision, then, given that we know that the processor supports bitwise-or on floating point values, using an instruction different from the one for bitwise-or on integer values, then it is fair to ask why we don't support vector | vector for floating point vectors. Wouldn't it be better to fix the code generator to do the right thing regardless of how the user presents it? There is a lot of code that uses casts (including the builtin implementations themselves) - it seems worthwhile to generate instructions for the right domain for this code as well. I completely agree. Ian
Re: recent troubles with float vectors bitwise ops
Paul Brook wrote: On Friday 24 August 2007, Ian Lance Taylor wrote: Paolo Bonzini [EMAIL PROTECTED] writes: 1) neg, abs and copysign operations on vectors. These we can make available via builtins (for - of course you don't need it); we already support them in many back-ends. Here is my point of view. People using the vector extensions are already writing inherently machine specific code, and they are (ideally) familiar with the instruction set of their processor. By the same argument, if you're already writing machine specific code then there shouldn't be a problem using machine specific intrinsics. I admit I've never been convinced that the generic vector support was sufficient to write useful code without resorting to machine specific intrinsics. Our VSIPL++ team is using it for some things. My guess is that it's probably not sufficient for all things, but probably is sufficient for many things. Also, I expect some users get (say) a 4x speedup over C code easily by using the vector extension, and could get an 8x speedup by using intrinsics, but with a lot more work. So, the vector extensions give them a sweet spot on the performance/effort/portability curve. I'm partly worried about cross-platform compatibility, and what this implies for other SIMD targets. Yes. Here's a proposed definition: Let a and b be floating-point operands of type F, where F is a floating-point type. Let N be the number of bytes in F. Then, a | b is defined as:

({ union fi { F f; char bytes[N]; };
   union fi au; union fi bu;
   au.f = a; bu.f = b;
   for (i = 0; i < N; ++i)
     au.bytes[i] |= bu.bytes[i];
   au.f; })

If the resulting floating-point value is denormal, NaN, etc., whether or not exceptions are raised is unspecified. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: recent troubles with float vectors bitwise ops
On Aug 24, 2007, at 8:37 AM, Ian Lance Taylor wrote: Chris Lattner [EMAIL PROTECTED] writes: On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote: Permitting this extension continues the preexisting behaviour, and it helps programmers and helps existing code. Who does it hurt to permit this extension? Who does it help to forbid this extension? Aren't builtins the designated way to access processor-specific features like this? Why does there have to be C operators for obscure features like this? A fair question, but we've already decided to support vector + vector and such operations, and we've decided that that is one valid way to generate vector instructions. That decision may itself have been a mistake. But once we accept that decision, then, given that we know that the processor supports bitwise-or on floating point values, using an instruction different from the one for bitwise-or on integer values, then it is fair to ask why we don't support vector | vector for floating point vectors. My personal opinion is that the grammar and type rules of the language should be defined independently of the target. + is allowed on all generic vectors for all targets. Allowing &, ^ and | to be used on FP vectors on some targets but not others seems extremely inconsistent (generic vectors are supposed to provide some amount of portability after all). Allowing these operators on all targets also seems strange to me, but is a better solution than allowing them on some targets but not others. I consider pollution of the IR to be a significant problem. If you allow this, you suddenly have tree nodes and RTL nodes for logical operations that have to handle operands that are FP vectors. I imagine that this will result in either 1) subtle bugs in various transformations that work on these or 2) special case code to handle this in various cases, spread through the optimizer. -Chris
Re: recent troubles with float vectors bitwise ops
On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote: Let a and b be floating-point operands of type F, where F is a floating-point type. Let N be the number of bytes in F. Then, a | b is defined as: Yes that makes sense, not. Since most of the time, you have a mask and that is what is being used. Like masking the sign bit or doing a selection. The mask is most likely a NaN anyways so having that undefined just does not make sense. So is this going to be on scalars? If not, then we should still not accept it on vectors. -- Pinski
Re: recent troubles with float vectors bitwise ops
Andrew Pinski wrote: On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote: Let a and b be floating-point operands of type F, where F is a floating-point type. Let N be the number of bytes in F. Then, a | b is defined as: Yes that makes sense, not. I'm not following. Are you agreeing or disagreeing? Since most of the time, you have a mask and that is what is being used. Like masking the the sign bit or doing a selection. The mask is most likely a NaN anyways so having that undefined just does not make sense. I'm not following. What I meant was that if the result was a NaN, whether or not floating-point exceptions were signalled was unspecified. Where does undefined come into it, and what does that have to do with the mask? If we think that no hardware will ever signal an exception in this case, then we can say that the operation never signals an exception. But, I was afraid that might be too strong a constraint. So is this going to be on scalars? If not, then we should still not accept it on vectors. Yes, from a language-design point of view, it should be for both scalars and vectors, so I wrote the strawman definition in terms of scalars. Of course, if where it's actually useful is vectors, then implementing it for vectors is the important case, and whether or not we get around to doing it on scalars is secondary. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
RE: recent troubles with float vectors bitwise ops
On 24 August 2007 17:04, Andrew Pinski wrote: On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote: Let a and b be floating-point operands of type F, where F is a floating-point type. Let N be the number of bytes in F. Then, a | b is defined as: Yes that makes sense, not. Since most of the time, you have a mask and that is what is being used. http://en.wikipedia.org/wiki/Weasel_word. Like masking the the sign bit or doing a selection. The mask is most likely a NaN anyways so having that undefined just does not make sense. What are you talking about? I can't even parse this rant. cheers, DaveK -- Can't think of a witty .sigline today
Re: recent troubles with float vectors bitwise ops
I'm partly worried about cross-platform compatibility, and what this implies for other SIMD targets. Yes. Here's a proposed definition: snip I agree this is the only sane definition. I probably wasn't clear: My main concern is that if we do support this extension the internals should be implemented and documented in such a way that target maintainers (i.e. me) can figure out how to make it work on their favourite target. We should not just quietly flip some bit in the x86 backend. Paul
Re: recent troubles with float vectors bitwise ops
Paul Brook wrote: I probably wasn't clear: My main concern is that if we do support this extension the internals should be implemented and documented in such a way that target maintainers (i.e. me) can figure out how to make it work on their favourite target. We should not just quietly flip some bit in the x86 backend. Totally agreed. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: recent troubles with float vectors bitwise ops
If there is a reason to put in that obstacle--e.g., because we are implementing a language standard and the language standard forbids it--then fine. But citing a PowerPC specific standard to forbid code appropriate for the x86 does not count as a sufficient reason in my book. The code I want to forbid is actually appropriate not only for the x86; the exact same code is appropriate for PowerPC, because the same kind of masking operations can be used there. However, for some reason, the PowerPC spec chose *not* to allow vector float bitwise operations, and I agree with it; the reason I want to avoid this is that it goes against our guideline for vector extensions (i.e. valarray). Users can also achieve the same effect with casts, and in addition I would like to trade this lost ability for two gained abilities. First, I want GCC to produce the exact same code with and without casts. Second, I want GCC to have builtins supporting most common uses of the idiom, so that users can actually do without casts *and* bitwise operations 99% of the time. Paolo
Re: recent troubles with float vectors bitwise ops
On Fri, Aug 24, 2007 at 02:34:27PM -0400, Ross Ridge wrote: Mark Mitchell Let's assume that the recent change is what we want, i.e., that the answer to (1) is No, these operations should not be part of the vector extensions because they are not valid scalar extensions. I don't think we should assume that. If we were to we'd also have to change vector casts to work like scalar casts and actually convert the values. (Or like valarray, disallow them completely.) That would force a solution like Paolo Bonzini's to use unions instead of casts, making it even more cumbersome. In C++, you could use reinterpret_cast (meaning that values are not converted, just reinterpreted as integers of the same size). That would avoid the need for unions, you'd just cast. But this solution doesn't work for C. Using vector casts that behave differently than scalar casts has a lot more potential to generate confusion than allowing bitwise operations on vector floats does. I suppose you could have an appropriately named intrinsic for doing a reinterpret_cast in C (that is, the type would be reinterpreted but it would be a no-op at machine level). Then, to do a masking operation you could write ovec = __as_float_vector(MASK | __as_int_vector(ivec));
Re: recent troubles with float vectors bitwise ops
Mark Mitchell Let's assume that the recent change is what we want, i.e., that the answer to (1) is No, these operations should not be part of the vector extensions because they are not valid scalar extensions. I don't think we should assume that. If we were to we'd also have to change vector casts to work like scalar casts and actually convert the values. (Or like valarray, disallow them completely.) That would force a solution like Paolo Bonzini's to use unions instead of casts, making it even more cumbersome. If you look at what these bitwise operations are doing, they're taking a floating point vector and applying an operation (eg. negation) to certain members of the vector according to a (normally) constant mask. They're really unary floating-point vector operations. I don't think it's unreasonable to want to express these operations using floating-point vector types directly. Using vector casts that behave differently than scalar casts has a lot more potential to generate confusion than allowing bitwise operations on vector floats does. As I see it, there are two ways you can express these kinds of operations without using casts that are both cumbersome and misleading. The easy way would be to just revert the change, and allow bitwise operations on vector floats. This is essentially an old-school programmer-knows-best solution where the compiler provides operators that represent the sort of operations generally supported by CPUs. Even on Altivec these bitwise operations on vector floats are meaningful and useful. The other way is to provide a complete set of operations that would make using the bitwise operators pretty much unnecessary, like it is with scalar floats. For example, you can express masked negation by multiplying with a constant vector of -1.0 and 1.0 elements. It shouldn't be too hard for GCC to optimize this into an appropriate bitwise instruction for the target. For other operations the solution isn't as nice. 
You could implement a set of builtin functions easily enough, but it wouldn't be much better than using target specific intrinsics. Chances are though that operations are going to be missed. For example, I doubt anyone unfamiliar with 3D programming would've seen the need for only negating part of a vector. (A more concise way to eliminate the need for the bitwise operations on vector floats would be to implement either the swizzles used in 3D shaders or array indexing on vectors. It would require a lot of work to implement properly, so I don't see it happening.) Ross Ridge
Re: recent troubles with float vectors bitwise ops
The IA-32 instruction set does distinguish between integer and floating point bitwise operations. In addition to the single-precision floating-point bitwise instructions that tbp mentioned (ORPS, ANDPS, ANDNPS and XORPS) there are both distinct double-precision floating-point bitwise instructions (ORPD, ANDPD, ANDNPD and XORPD) and integer bitwise instructions (POR, PAND, PANDN and PXOR). While these operations all do the same thing, they can differ in performance depending on the context. Oops, I only remembered PS vs. PD (I remembered POR as MMX instructions only). I believe that optimizing this should be a task for the x86 machine dependent reorg. Paolo
Re: recent troubles with float vectors bitwise ops
Why did Intel split up these instructions in the first place, is it because they wanted to have a separate vector unit in some cases? I don't know and I don't care that much. To some extent I agree with Andrew Pinski here. Saying that you need support in a generic vector extension for vector float | vector float in order to generate ANDPS and not PXOR is just wrong. That should be done by the back-end. Paolo
Re: recent troubles with float vectors bitwise ops
Ross Ridge wrote: If I were tbp, I'd just code all his vector operations using intrinsics. The other responses in this thread have made it clear that GCC's vector arithmetic operations are really only designed to be used with the Cell Broadband Engine and other Power PC processors. Thing is my main use for that extension is for a specialization (made on a rainy day out of boredom) of a basic something re-used all over in my code; the default implementation uses intrinsics. It turns out, when benchmarked, that i get better code with the specialization. So it's more convenient and faster, win/win. I'm unsure why the code is better in the end, perhaps because of the may_alias attribute of __m128, perhaps because some builtins which are used to implement those intrinsics are mistyped (ie v4si __builtin_ia32_cmpltps (v4sf, v4sf))... i don't know, i'd need to try a builtin based specialization. In any case that vector extension is now totally useless on x86 and conflicts with the documentation.
Re: recent troubles with float vectors bitwise ops
Andrew Pinski wrote: Which hardware (remember GCC is a generic compiler)? VMX/Altivec and SPU actually do not have different instructions for bitwise and/ior/xor for different vector types (it is all the same instruction). I have run into ICEs with even bitwise on vector float/double on x86 also in the past, which is the other reason why I disabled them. Since this is an extension, it would be nice if it was a nicely defined extension, which means disabling them for vector float/double. It *was* neatly defined: The types defined in this manner can be used with a subset of normal C operations. Currently, GCC will allow using the following operators on these types: +, -, *, /, unary minus, ^, |, &, ~. So can you, pretty please, also patch the documentation and maybe point to the Altivec spec as it's obviously the only one relevant no matter what platform you're on?
Re: recent troubles with float vectors bitwise ops
The types defined in this manner can be used with a subset of normal C operations. Currently, GCC will allow using the following operators on these types: +, -, *, /, unary minus, ^, |, &, ~. What was missing is "when allowed by the base type". E.g. << is not supported. Paolo
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: To some extent I agree with Andrew Pinski here. Saying that you need support in a generic vector extension for vector float | vector float in order to generate ANDPS and not PXOR is just wrong. That should be done by the back-end. I guess i fail to grasp the logic mandating that the intended source level, strictly typed, 'vector float | vector float' should be mangled into an int op with frantic casts to magically emerge out from the backend as the original 'vector float | vector float', but i'm not a compiler maintainer: for me it smells like a regression.
Re: recent troubles with float vectors bitwise ops
tbp wrote: Paolo Bonzini wrote: To some extent I agree with Andrew Pinski here. Saying that you need support in a generic vector extension for vector float | vector float in order to generate ANDPS and not PXOR is just wrong. That should be done by the back-end. I guess i fail to grasp the logic mandating that the intended source level, strictly typed, 'vector float | vector float' should be mangled into an int op with frantic casts to magically emerge out from the backend as the original 'vector float | vector float', but i'm not a compiler maintainer: for me it smells like a regression. Because it's *not* strictly typed. Strict typing means that you accept the same things accepted for the element type. So it's not a regression, it's a bug fix. Paolo
Re: recent troubles with float vectors bitwise ops
GCC makes the problem even worse if only SSE and not SSE 2 instructions are enabled. Since the integer bitwise instructions are only available with SSE 2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions. This was also a bug and a patch for this has been posted and approved. Paolo
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: Because it's *not* strictly typed. Strict typing means that you accept the same things accepted for the element type. So it's not a regression, it's a bug fix.

# cat regressionorbugfix.cc
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
typedef int v4si_t __attribute__ ((__vector_size__ (16)));
v4sf_t foo(v4sf_t a, v4sf_t b, v4sf_t c) { return a + (b | c); }
v4sf_t bar(v4sf_t a, v4sf_t b, v4sf_t c) { return a + (v4sf_t) ((v4si_t) b | (v4si_t) c); }
int main() { return 0; }

00400a30 foo(float __vector, float __vector, float __vector):
  400a30: orps %xmm2,%xmm1
  400a33: addps %xmm1,%xmm0
  400a36: retq
00400a40 bar(float __vector, float __vector, float __vector):
  400a40: por %xmm2,%xmm1
  400a44: addps %xmm1,%xmm0
  400a47: retq

I'm surely not qualified to argue about typing, but you'd need a rather strong distortion field to not characterize that as a regression.
Re: recent troubles with float vectors bitwise ops
# cat regressionorbugfix.cc
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
typedef int v4si_t __attribute__ ((__vector_size__ (16)));
v4sf_t foo(v4sf_t a, v4sf_t b, v4sf_t c) { return a + (b | c); }
v4sf_t bar(v4sf_t a, v4sf_t b, v4sf_t c) { return a + (v4sf_t) ((v4si_t) b | (v4si_t) c); }
int main() { return 0; }

00400a30 foo(float __vector, float __vector, float __vector):
  400a30: orps %xmm2,%xmm1
  400a33: addps %xmm1,%xmm0
  400a36: retq
00400a40 bar(float __vector, float __vector, float __vector):
  400a40: por %xmm2,%xmm1
  400a44: addps %xmm1,%xmm0
  400a47: retq

I'm surely not qualified to argue about typing, but you'd need a rather strong distortion field to not characterize that as a regression. I've added 5 minutes ago an XFAILed test for exactly this code. OTOH, I have also committed a fix that will avoid producing tons of shuffle and unpacking instructions when function bar is compiled with -msse but without -msse2. I'm also going to file a missed optimization bug soon. I'm curious, does ICC support vector arithmetic like this? Do both functions compile? What code does it produce for bar? Paolo
Re: recent troubles with float vectors bitwise ops
On 8/23/07, Paolo Bonzini [EMAIL PROTECTED] wrote: I've added 5 minutes ago an XFAILed test for exactly this code. OTOH, I have also committed a fix that will avoid producing tons of shuffle and unpacking instructions when function bar is compiled with -msse but without -msse2. Thanks. I'm also going to file a missed optimization bug soon. Ditto. I'm curious, does ICC support vector arithmetic like this? Do both functions compile? What code does it produce for bar? No, icc9/10 only provide basic support for that extension (and then only on linux i think)

# /opt/intel/cce/9.1.051/bin/icpc regressionorbugfix.cc
regressionorbugfix.cc(5): error: no operator | matches these operands
    operand types are: v4sf_t | v4sf_t
    return a + (b | c);
               ^
regressionorbugfix.cc(8): error: no operator | matches these operands
    operand types are: v4si_t | v4si_t
    return a + (v4sf_t) ((v4si_t) b | (v4si_t) c);
               ^

but then it's more aggressive about intrinsics than gcc. Like i said somewhere i got slightly better results when using that extension than intrinsics with gcc 4.3 but haven't checked if i could get the same result with builtins yet.
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: I'm curious, does ICC support vector arithmetic like this?

The primary icc/icl use of SSE/SSE2 masking operations, of course, is in the auto-vectorization of fabs[f] and conditional operations:

sum = 0.f;
i__2 = *n;
for (i__ = 1; i__ <= i__2; ++i__)
    if (a[i__] > 0.f)
        sum += a[i__];

(Windows/Intel asm syntax)

pxor    xmm2, xmm2
cmpltps xmm2, xmm3
andps   xmm3, xmm2
addps   xmm0, xmm3
...
Re: recent troubles with float vectors bitwise ops
On 8/23/07, Tim Prince [EMAIL PROTECTED] wrote: The primary icc/icl use of SSE/SSE2 masking operations, of course, is in the auto-vectorization of fabs[f] and conditional operations: sum = 0.f; i__2 = *n; for (i__ = 1; i__ <= i__2; ++i__) if (a[i__] > 0.f) sum += a[i__]; ... pxor xmm2, xmm2; cmpltps xmm2, xmm3; andps xmm3, xmm2; addps xmm0, xmm3 ...

Note that icc9 has a strong bias towards the Pentium 4, which had no stall penalty for mistyped fp vectors (for Intel, that penalty came with the Pentium M line), so you see a pxor even when generating code for the Core 2.

# cat autoicc.cc
float foo(const float *a, int n) {
    float sum = 0.f;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
int main() { return 0; }

# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
  4007a9: pxor    %xmm4,%xmm4
  4007ad: cmpltps %xmm3,%xmm4
  4007b1: andps   %xmm3,%xmm4

# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
  400b50: xorps   %xmm3,%xmm3
  400b53: cmpltps %xmm4,%xmm3
  400b57: andps   %xmm3,%xmm4
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini [EMAIL PROTECTED] writes: "The types defined in this manner can be used with a subset of normal C operations. Currently, GCC will allow using the following operators on these types: +, -, *, /, unary minus, ^, |, &, ~." What was missing is "when allowed by the base type". E.g. << is not supported.

I think we should revert the patch, and continue permitting the bitwise operations on vector float. There seem to be solid reasons to permit this, and no very strong ones to prohibit it. We can consider it to be a GNU extension for vectors. Vectors are of course themselves an extension already.

Ian
Re: recent troubles with float vectors bitwise ops
The types defined in this manner can be used with a subset of normal C operations. Currently, GCC will allow using the following operators on these types: +, -, *, /, unary minus, ^, |, &, ~. What was missing is "when allowed by the base type". E.g. << is not supported.

I think we should revert the patch, and continue permitting the bitwise operations on vector float. There seem to be solid reasons to permit this, and no very strong ones to prohibit it.

I'm not sure. I think it's better if we improve the compiler to generate better code for the version with casts. So we get no pessimization, and better typechecking. I think that, in an ideal world, intrinsics would be implemented using the generic vector extensions, and they would generate as good code as builtins, or better, because of simplifications that can be done after inlining. We should move in that direction.

Paolo
Re: recent troubles with float vectors bitwise ops
There seem to be solid reasons to permit this, and no very strong ones to prohibit it. We can consider it to be a GNU extension for vectors. Vectors are of course themselves an extension already. How are you suggesting it be implemented? Will the front/middle-end convert it to (vNsf)((vNsi)a | (vNsi)b), or do all vector backends need to lie about having float vector bitwise operations? Paul
Re: recent troubles with float vectors bitwise ops
Hi, On Thu, 23 Aug 2007, Paul Brook wrote: There seem to be solid reasons to permit this, and no very strong ones to prohibit it. We can consider it to be a GNU extension for vectors. Vectors are of course themselves an extension already. How are you suggesting it be implemented? Will the front/middle-end convert it to (vNsf)((vNsi)a | (vNsi)b), or do all vector backends need to lie about having float vector bitwise operations? optabs and open-coding on expand when unavailable? Like other constructs? Ciao, Michael.
Re: recent troubles with float vectors bitwise ops
On Thu, 23 Aug 2007, Ian Lance Taylor wrote: I think we should revert the patch, and continue permitting the bitwise operations on vector float. There seem to be solid reasons to permit this, and no very strong ones to prohibit it. We can consider it to be a GNU extension for vectors. Vectors are of course themselves an extension already. We decided long ago that the extension would be based on what's permitted by C++ valarray rather than by a particular CPU's vector intrinsics. So unless C++ valarray allows this operation, I think we should leave it prohibited and ensure that the compiler can generate appropriate code for these bitwise operations in the presence of casts (the particular integer element type used should of course not affect the code for these operations either.) -- Joseph S. Myers [EMAIL PROTECTED]
Re: recent troubles with float vectors bitwise ops
On 8/23/07, Joseph S. Myers [EMAIL PROTECTED] wrote: On Thu, 23 Aug 2007, Ian Lance Taylor wrote: I think we should revert the patch, and continue permitting the bitwise operations on vector float. There seem to be solid reasons to permit this, and no very strong ones to prohibit it. We can consider it to be a GNU extension for vectors. Vectors are of course themselves an extension already. We decided long ago that the extension would be based on what's permitted by C++ valarray rather than by a particular CPU's vector intrinsics. So unless C++ valarray allows this operation, I think we should leave it prohibited

And it is not supported by valarray. Testcase:

#include <valarray>
using std::valarray;
valarray<float> a, b;
int f(void) { a = a | b; }

--- cut ---

Error messages:

/usr/include/c++/4.0.0/bits/valarray_before.h: In member function '_Tp std::__bitwise_or::operator()(const _Tp&, const _Tp&) const [with _Tp = float]':
/usr/include/c++/4.0.0/bits/valarray_before.h:527: instantiated from 'typename std::__fun<_Oper, typename _Arg::value_type>::result_type std::_BinBase<_Oper, _FirstArg, _SecondArg>::operator[](size_t) const [with _Oper = std::__bitwise_or, _FirstArg = std::valarray<float>, _SecondArg = std::valarray<float>]'
/usr/include/c++/4.0.0/bits/valarray_after.h:220: instantiated from '_Tp std::_Expr<_Clos, _Tp>::operator[](size_t) const [with _Clos = std::_BinClos<std::__bitwise_or, std::_ValArray, std::_ValArray, float, float>, _Tp = float]'
/usr/include/c++/4.0.0/bits/valarray_array.tcc:149: instantiated from 'void std::__valarray_copy(const std::_Expr<_Dom, _Tp>&, size_t, std::_Array<_Tp>) [with _Tp = float, _Dom = std::_BinClos<std::__bitwise_or, std::_ValArray, std::_ValArray, float, float>]'
/usr/include/c++/4.0.0/valarray:696: instantiated from 'std::valarray<_Tp>& std::valarray<_Tp>::operator=(const std::_Expr<_Dom, _Tp>&) [with _Dom = std::_BinClos<std::__bitwise_or, std::_ValArray, std::_ValArray, float, float>, _Tp = float]'
t.cc:8: instantiated from here
/usr/include/c++/4.0.0/bits/valarray_before.h:243: error: invalid operands of types 'const float' and 'const float' to binary 'operator|' Thanks, Andrew Pinski
Re: recent troubles with float vectors bitwise ops
Joseph S. Myers [EMAIL PROTECTED] writes: | On Thu, 23 Aug 2007, Ian Lance Taylor wrote: | | I think we should revert the patch, and continue permitting the | bitwise operations on vector float. | | There seem to be solid reasons to permit this, and no very strong ones | to prohibit it. We can consider it to be a GNU extension for vectors. | Vectors are of course themselves an extension already. | | We decided long ago that the extension would be based on what's permitted | by C++ valarray rather than by a particular CPU's vector intrinsics. In C++, the broadcast operations are allowed on arrays if, and only if, they are allowed on element types. -- Gaby
Re: recent troubles with float vectors bitwise ops
On 8/23/07, Andrew Pinski [EMAIL PROTECTED] wrote: On 8/23/07, Joseph S. Myers [EMAIL PROTECTED] wrote: We decided long ago that the extension would be based on what's permitted by C++ valarray rather than by a particular CPU's vector intrinsics. So unless C++ valarray allows this operation, I think we should leave it prohibited And it is not supported by valarray.

Plus this is already documented: "The operations behave like C++ valarrays. Addition is defined as the addition of the corresponding elements of the operands." So if one reads the documentation, vector float | vector float would mean taking each element and ioring it with the corresponding element of the other vector, so then you have float | float, which is invalid.

-- Pinski
Re: recent troubles with float vectors bitwise ops
Paolo Bonzini wrote: Why did Intel split up these instructions in the first place? Is it because they wanted to have separate vector units in some cases? I don't know and I don't care that much. To some extent I agree with Andrew Pinski here. Saying that you need support in a generic vector extension for vector float | vector float in order to generate ANDPS and not PXOR is just wrong. That should be done by the back-end.

Rather than accusing Intel of bad ISA design and the GCC maintainers of Altivec prejudice, let's just figure out what to do. We all agree that:

(1) On Intel CPUs, it's more efficient to use the floating-point bitwise instructions.
(2) In C, you can't do a bitwise-or on two floating-point types.

So, we have two questions:

(1) Should GCC's vector extensions permit floating-point bitwise operations?
(2) If not, how can a user get efficient code?

Let's assume that the recent change is what we want, i.e., that the answer to (1) is no: these operations should not be part of the vector extensions because they are not valid scalar operations. So, that means we need to answer (2).

One option is for the user to use intrinsics. It's been claimed that this results in worse code. There doesn't seem to be any obvious reason for that, but, if true, we should try to fix it; we don't want to penalize people who are using the intrinsics. So, let's assume using intrinsics is just as efficient, either because it already is, or because we make it so.

We still have the problem that users now can't write machine-independent code to do this operation. Assuming the operations are useful for something (and, if they weren't, why would Intel want to have instructions for them, and why would tbp want to use them?), it seems unfortunate to restrict the extension in this way. We could always support the scalar form too, if we want to maintain consistency between the scalar and vector forms.
Presumably, the reason this isn't standard C or C++ is that the standards don't specify a floating-point format. At most, they could have made the behavior implementation-defined. But, if nobody thought it was a useful operation, they probably didn't see any point.

What are these operations used for? Can someone give an example of a kernel that benefits from this kind of thing?

Assuming there's a plausible use, my suggestion is that we just undo the patch that turned off this functionality. If it doesn't work well on some systems, and we don't have any volunteers to write a fully generic fallback (e.g., move the float operands to integer registers, do the bitwise operation, move the result back), then we could always issue a sorry. Users may then have to use #ifdefs on some platforms, but that's no worse than using intrinsics.

-- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: recent troubles with float vectors bitwise ops
Apparently enough for a small vendor like Intel to propose such things as orps, andps, andnps, and xorps. I think you're running too far with your sarcasm. SSE's instructions do not go so far as to specify integer vs. floating point. To me, ps means 32-bit SIMD, independent of integerness.

So, that's what i feared... it was intentional. And now i guess the only sanctioned access to those ops is via builtins/intrinsics.

No, you can do so with casts. Floating-point to integer vector casts preserve the bit pattern. For example, you can do

vector float f = { 5, 5, 5, 5 };
vector int g = { 0x80000000, 0, 0x80000000, 0 };
vector int f_int = (vector int) f;
f = (vector float) (f_int ^ g);

For Altivec, I get exactly

addis r2,r10,ha16(LC0-gibberish)
la r2,lo16(LC0-gibberish)(r2)
lvx v0,0,r2
vxor v2,v2,v0
...
LC0:
.long -2147483648
.long 0
.long -2147483648
.long 0

Paolo
RE: recent troubles with float vectors bitwise ops
On 22 August 2007 06:10, Ian Lance Taylor wrote: tbp [EMAIL PROTECTED] writes:

vecop.cc:4: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator|'
vecop.cc:5: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator&'
vecop.cc:6: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator^'

Apparently it's still there as of right now, on x86-64 at least. I think this is not supposed to happen but i'm not sure, hence the mail.

What does it mean to do a bitwise-or of a floating point value? This code also gets an error: double foo(double a, double b) { return a | b; }

There are some notable fp hacks and speedups that make use of integer ops on floating point operands, so it's not an entirely insane notion. However, as Paolo points out upthread, that can be done with casts, which is more correct anyway, so I don't think there's a problem with blocking the unadorned usage.

cheers, DaveK -- Can't think of a witty .sigline today
Re: recent troubles with float vectors bitwise ops
On 8/21/07, tbp [EMAIL PROTECTED] wrote: # /usr/local/gcc-4.3-svn.old6/bin/g++ vecop.cc

vecop.cc: In function 'T foo() [with T = float __vector__]':
vecop.cc:13: instantiated from here
vecop.cc:4: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator|'
vecop.cc:5: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator&'
vecop.cc:6: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator^'

Apparently it's still there as of right now, on x86-64 at least. I think this is not supposed to happen but i'm not sure, hence the mail.

This is intentional: float|float does not make sense, so how can vector float | vector float make sense (likewise for & and ^)? This was PR 30428.

-- Pinski
RE: recent troubles with float vectors bitwise ops
On 22 August 2007 11:13, Andrew Pinski wrote: On 8/21/07, tbp [EMAIL PROTECTED] wrote: # /usr/local/gcc-4.3-svn.old6/bin/g++ vecop.cc vecop.cc: In function 'T foo() [with T = float __vector__]': vecop.cc:13: instantiated from here vecop.cc:4: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator|' vecop.cc:5: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator&' vecop.cc:6: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator^' Apparently it's still there as of right now, on x86-64 at least. I think this is not supposed to happen but i'm not sure, hence the mail. This is intentional: float|float does not make sense, so how can vector float | vector float make sense (likewise for & and ^)?

float InvSqrt (float x) {
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

It's not exactly what you're referring to, but it's evidence in favour of the argument that we should never presume anything, no matter how unusual, might not have a reasonable use. (However, as I said, I'm not arguing against the error message, since it's still possible to express this intent using casts.)

cheers, DaveK -- Can't think of a witty .sigline today
Re: recent troubles with float vectors bitwise ops
On 8/22/07, Dave Korn [EMAIL PROTECTED] wrote:

float InvSqrt (float x) {
    float xhalf = 0.5f*x;
    int i = *(int*)&x;

You are violating C/C++ aliasing rules here anyways.

    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;

Likewise. So I guess you like to depend on undefined code :).

-- Pinski
RE: recent troubles with float vectors bitwise ops
On 22 August 2007 11:40, Andrew Pinski wrote: On 8/22/07, Dave Korn [EMAIL PROTECTED] wrote: float InvSqrt (float x) { float xhalf = 0.5f*x; int i = *(int*)&x; You are violating C/C++ aliasing rules here anyways. i = 0x5f3759df - (i >> 1); x = *(float*)&i; Likewise. So I guess you like to depend on undefined code :).

Well, I like to think that I could cast the address to unsigned char*, memcpy a bunch of them to the address of an int, then dereference the int, and the compiler would realise it was a no-op and optimise it away, but I doubt that would actually happen...

cheers, DaveK -- Can't think of a witty .sigline today
Re: recent troubles with float vectors bitwise ops
On Wed, Aug 22, 2007 at 11:47:52AM +0100, Dave Korn wrote: Well, I like to think that I could cast the address to unsigned char*, memcpy a bunch of them to the address of an int, then dereference the int and the compiler would realise it was a no-op and optimise it away, but I doubt thatt would actually happen... It did a few months ago, at least for scalar variables. Have we regressed in this area? -- Rask Ingemann Lambertsen
RE: recent troubles with float vectors bitwise ops
On 22 August 2007 14:06, Rask Ingemann Lambertsen wrote: On Wed, Aug 22, 2007 at 11:47:52AM +0100, Dave Korn wrote: Well, I like to think that I could cast the address to unsigned char*, memcpy a bunch of them to the address of an int, then dereference the int and the compiler would realise it was a no-op and optimise it away, but I doubt thatt would actually happen... It did a few months ago, at least for scalar variables. I'm proper impressed! Have we regressed in this area? Not that I know of. cheers, DaveK -- Can't think of a witty .sigline today
Re: recent troubles with float vectors bitwise ops
tbp writes: Apparently enough for a small vendor like Intel to propose such things as orps, andps, andnps, and xorps.

Paolo Bonzini writes: I think you're running too far with your sarcasm. SSE's instructions do not go so far as to specify integer vs. floating point. To me, ps means 32-bit SIMD, independent of integerness.

The IA-32 instruction set does distinguish between integer and floating-point bitwise operations. In addition to the single-precision floating-point bitwise instructions that tbp mentioned (ORPS, ANDPS, ANDNPS and XORPS), there are both distinct double-precision floating-point bitwise instructions (ORPD, ANDPD, ANDNPD and XORPD) and integer bitwise instructions (POR, PAND, PANDN and PXOR). While these operations all do the same thing, they can differ in performance depending on the context. Intel's IA-32 Software Developer's Manual gives this warning: "In this example: XORPS or PXOR can be used in place of XORPD and yield the same correct result. However, because of the type mismatch between the operand data type and the instruction data type, a latency penalty will be incurred due to implementations of the instructions at the microarchitecture level."

And now i guess the only sanctioned access to those ops is via builtins/intrinsics.

No, you can do so with casts.

tbp is correct. Using casts gets you the integer bitwise instructions, not the single-precision bitwise instructions that are more optimal for flipping bits in single-precision vectors. If you want GCC to generate better code using the single-precision bitwise instructions, you're now forced to use the intrinsics.

Ross Ridge
Re: recent troubles with float vectors bitwise ops
On 8/22/07, Paolo Bonzini [EMAIL PROTECTED] wrote: I think you're running too far with your sarcasm. SSE's instructions do not go so far as to specify integer vs. floating point. To me, ps means 32-bit SIMD, independent of integerness.

Excuse me if i'm amazed at being told that bitwise ops on floating values make no sense as the justification for breaking something that used to work and match hardware features. I naively thought that was the purpose of that convenient extension.

So, that's what i feared... it was intentional. And now i guess the only sanctioned access to those ops is via builtins/intrinsics. No, you can do so with casts. Floating-point to integer vector casts preserve the bit pattern. For example, you can do

Again, SIMD ops (among them bitwise stuff) come in 3 mostly symmetric flavors on x86, namely for ints, floats and doubles; casting isn't innocuous because there's a penalty for type mismatch (1 cycle of re-categorization, if i remember correctly, on both the k8 and core2), so it's either that or some moving around. Let me cite the Intel(r) 64 and IA-32 Architectures Optimization Reference Manual, 5-1: "When writing SIMD code that works for both integer and floating-point data, use the subset of SIMD convert instructions or load/store instructions to ensure that the input operands in XMM registers contain data types that are properly defined to match the instruction. Code sequences containing cross-typed usage produce the same result across different implementations but incur a significant performance penalty. Using SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data in the XMM register is strongly discouraged." You could find a similar note in AMD's doc for the k8.
Re: recent troubles with float vectors bitwise ops
On 8/22/07, tbp [EMAIL PROTECTED] wrote: On 8/22/07, Paolo Bonzini [EMAIL PROTECTED] wrote: I think you're running too far with your sarcasm. SSE's instructions do not go so far as to specify integer vs. floating point. To me, ps means 32-bit SIMD, independent of integerness. Excuse me if i'm amazed being replied bitwise ops on floating values make no sense as the justification for breaking something that used to work and match hardware features. I naively thought that was the purpose of that convenient extension.

Which hardware (remember GCC is a generic compiler)? VMX/Altivec and SPU actually do not have different instructions for bitwise and/ior/xor for different vector types (it is all the same instruction). I have run into ICEs with bitwise ops on vector float/double on x86 in the past as well, which is the other reason why I disabled them. Since this is an extension, it would be nice if it were a nicely defined extension, which means disabling them for vector float/double.

Thanks, Andrew Pinski
Re: recent troubles with float vectors bitwise ops
On 8/22/07, Andrew Pinski [EMAIL PROTECTED] wrote: Which hardware (remember GCC is a generic compiler)? VMX/Altivec and SPU actually do not have different instructions for bitwise and/ior/xor for different vector types (it is all the same instruction). I have run into ICEs with bitwise ops on vector float/double on x86 in the past as well, which is the other reason why I disabled them. Since this is an extension, it would be nice if it were a nicely defined extension, which means disabling them for vector float/double.

One more note: the C/C++ Language Extensions for the CBEA specification says that the bitwise operators don't work on vector float/double but do work on the integer vector types. So the other reason for the change I made was to bring us more into conformance with that standard (yes, I worked on that spec, but I did not write that part).

-- Pinski
Re: recent troubles with float vectors bitwise ops
Ross Ridge writes: tbp is correct. Using casts gets you the integer bitwise instructions, not the single-precision bitwise instructions that are more optimal for flipping bits in single-precision vectors. If you want GCC to generate better code using single-precision bitwise instructions you're now forced to use the intrinsics.

GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions. If I were tbp, I'd just code all his vector operations using intrinsics. The other responses in this thread have made it clear that GCC's vector arithmetic operations are really only designed to be used with the Cell Broadband Engine and other PowerPC processors.

Ross Ridge
Re: recent troubles with float vectors bitwise ops
On 8/22/07, Ross Ridge [EMAIL PROTECTED] wrote: GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions.

And why make a bad decision based on another bad decision? Why did Intel split up these instructions in the first place? Is it because they wanted to have separate vector units in some cases? I don't know and I don't care that much. This extension is supposed to be generic, and doing weird stuff by allowing bitwise operators on vector float just confuses people more. Yes, Intel/AMD's specific instruction set includes that, but not everyone else's.

If I were tbp, I'd just code all his vector operations using intrinsics. The other responses in this thread have made it clear that GCC's vector arithmetic operations are really only designed to be used with the Cell Broadband Engine and other PowerPC processors.

No, they were designed to be generic. The issue comes down to what is generic. I am saying that since we don't allow it for scalar fp types, why allow it for vector fp types? The genericism here is that vector is just an expansion on top of the scalar types. Not many new features are supposed to be added.

-- Pinski
Re: recent troubles with float vectors bitwise ops
Ross Ridge [EMAIL PROTECTED] wrote: GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions.

Andrew Pinski writes: ... Why did Intel split up these instructions in the first place? Is it because they wanted to have separate vector units in some cases? I don't know and I don't care that much.

Well, if you would rather remain ignorant, I suppose there's little point in discussing this with you. However, please don't try to pretend that the vector extensions are supposed to be generic when you use justifications like "it's how Altivec works", and "it's compatible with a proprietary standard called C/C++ Language Extensions for Cell Broadband Engine Architecture". If you're going to continue to use justifications like this and ignore the performance implications of your changes on IA-32, then you should accept the fact that the vector extensions are not meant for platforms that you don't know and don't care that much about.

Ross Ridge
recent troubles with float vectors bitwise ops
Hello,

# cat vecop.cc
template<typename T> T foo() {
    T a = { 0, 1, 2, 3 }, b = { 4, 5, 6, 7 },
      c = a | b, d = c & b, e = d ^ b;
    return e;
}
int main() {
    typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
    typedef int v4si_t __attribute__ ((__vector_size__ (16)));
    foo<v4si_t>();
    foo<v4sf_t>();
    return 0;
}

# /usr/local/gcc-4.3-svn.old5/bin/g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr/local/gcc-4.3-svn --enable-languages=c,c++ --enable-threads=posix --disable-checking --disable-nls --disable-shared --disable-win32-registry --with-system-zlib --disable-multilib --verbose --with-gcc=gcc-4.2 --with-gnu-ld --with-gnu-as --enable-checking=none --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20070808 (experimental)
# /usr/local/gcc-4.3-svn.old5/bin/g++ vecop.cc

# /usr/local/gcc-4.3-svn.old6/bin/g++ -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr/local/gcc-4.3-svn --enable-languages=c,c++ --enable-threads=posix --disable-checking --disable-nls --disable-shared --disable-win32-registry --with-system-zlib --disable-multilib --verbose --with-gcc=gcc-4.2 --with-gnu-ld --with-gnu-as --enable-checking=none --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20070819 (experimental)
# /usr/local/gcc-4.3-svn.old6/bin/g++ vecop.cc
vecop.cc: In function 'T foo() [with T = float __vector__]':
vecop.cc:13: instantiated from here
vecop.cc:4: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator|'
vecop.cc:5: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator&'
vecop.cc:6: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator^'

Apparently it's still there as of right now, on x86-64 at least. I think this is not supposed to happen but i'm not sure, hence the mail.
Re: recent troubles with float vectors bitwise ops
tbp [EMAIL PROTECTED] writes:

vecop.cc:4: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator|'
vecop.cc:5: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator&'
vecop.cc:6: error: invalid operands of types 'float __vector__' and 'float __vector__' to binary 'operator^'

Apparently it's still there as of right now, on x86-64 at least. I think this is not supposed to happen but i'm not sure, hence the mail.

What does it mean to do a bitwise-or of a floating point value? This code also gets an error:

double foo(double a, double b) { return a | b; }

Ian
Re: recent troubles with float vectors bitwise ops
Ian Lance Taylor wrote: What does it mean to do a bitwise-or of a floating point value? Apparently enough for a small vendor like Intel to propose such things as orps, andps, andnps, and xorps. So, that's what i feared... it was intentional. And now i guess the only sanctioned access to those ops is via builtins/intrinsics. Great. If only i could get the same quality of code when using intrinsics to begin with...