Mathieu Desnoyers wrote:
Will fix. I noticed that it was because I had too many "register" char
variables declared and used at the same time. Putting a "g" constraint
gave the same result.
"g" is the same as "rmi", which is probably *NOT* what you want.
Don't use "register" variables.
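As a rough illustration of the constraint point above, the sketch below contrasts an "r" input constraint with "g". The function names are hypothetical and are not taken from the patch. The asm is only compiled on x86; elsewhere a plain C fallback keeps the example buildable. With "r" the source operand is always a register, whereas with "g" the compiler may pick a register, a memory reference, or an immediate, so the form and length of the emitted instruction are not fixed.

```c
/* Hypothetical helpers for illustration only. */
static inline int copy_via_r(int v)
{
#if defined(__i386__) || defined(__x86_64__)
	int out;
	/* "r": the input is always placed in a general register. */
	__asm__ ("movl %1, %0" : "=r" (out) : "r" (v));
	return out;
#else
	return v;	/* non-x86 fallback so the sketch still builds */
#endif
}

static inline int copy_via_g(int v)
{
#if defined(__i386__) || defined(__x86_64__)
	int out;
	/* "g": register, memory or immediate, at the compiler's choice. */
	__asm__ ("movl %1, %0" : "=r" (out) : "g" (v));
	return out;
#else
	return v;	/* non-x86 fallback so the sketch still builds */
#endif
}
```

Both helpers return their argument unchanged; the difference only shows up in the generated assembly (for example with gcc -S), where the "g" variant may use a memory or immediate source operand.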
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers wrote:
> >
> >I have tried generating asm-to-"register" c variables for char, short
> >and int on i386 and I do not see this happening. The char opcode is
> >always 1 byte, short 2 bytes and int 1 byte. Result:
> >
>
> The comment
> - Either I use a "r" constraint and let gcc produce the instructions,
> that I need to assume to have correct size so I can align their
> immediate values (therefore, taking the offset from the end of the
> instruction will not help). Here, if gas changes its behavior
> dramatically for
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
Mathieu Desnoyers wrote:
I have tried generating asm-to-register c variables for char, short
and int on i386 and I do not see this happening. The char opcode is
always 1 byte, short 2 bytes and int 1 byte. Result:
The comment was referring to
Mathieu Desnoyers wrote:
Will fix. I noticed that it was because I had too much register char
variables declared and used at the same time. Putting a g constraint
gave the same result.
g is the same as rmi, which is probably *NOT* what you want.
Don't use register variables. That's a poor
Mathieu Desnoyers wrote:
I have tried generating asm-to-"register" c variables for char, short
and int on i386 and I do not see this happening. The char opcode is
always 1 byte, short 2 bytes and int 1 byte. Result:
The comment was referring to x86-64, but I incorrectly remembered that
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> H. Peter Anvin wrote:
> > Allowing different registers should be doable, but if so, one would have
> > to put 0: at the *end* of the instruction and use (0f)-4 instead, since
> > the non-%eax forms are one byte longer.
> >
>
> OK, that's
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
H. Peter Anvin wrote:
Allowing different registers should be doable, but if so, one would have
to put 0: at the *end* of the instruction and use (0f)-4 instead, since
the non-%eax forms are one byte longer.
OK, that's already a
Mathieu Desnoyers wrote:
I have tried generating asm-to-register c variables for char, short
and int on i386 and I do not see this happening. The char opcode is
always 1 byte, short 2 bytes and int 1 byte. Result:
The comment was referring to x86-64, but I incorrectly remembered that
* Denys Vlasenko ([EMAIL PROTECTED]) wrote:
> On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
> > i386 optimization of the immediate values which uses a movl with code
> > patching
> > to set/unset the value used to populate the register used as variable
> > source.
> >
> >
* Denys Vlasenko ([EMAIL PROTECTED]) wrote:
On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
i386 optimization of the immediate values which uses a movl with code
patching
to set/unset the value used to populate the register used as variable
source.
Changelog:
- Use
On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
> i386 optimization of the immediate values which uses a movl with code patching
> to set/unset the value used to populate the register used as variable source.
>
> Changelog:
> - Use text_poke_early with cr0 WP save/restore to patch
On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.
Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge wrote:
> >
> > Cross-cache-line, sure. But what about just not sizeof aligned? If its
> > enough to avoid cross-cache-line, then that's simpler.
> >
>
> Not really. It pretty much comes down to the same problem.
>
> > Which
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> H. Peter Anvin wrote:
> > Mathieu Desnoyers wrote:
> >
> >> Ok, let's have a good look at what we want:
> >>
> >> 1 - get a pointer to the beginning of the immediate value within the
> >> instruction.
> >> 2 - make sure that the immediate
Jeremy Fitzhardinge wrote:
>
> Cross-cache-line, sure. But what about just not sizeof aligned? If its
> enough to avoid cross-cache-line, then that's simpler.
>
Not really. It pretty much comes down to the same problem.
> Which is something I was going to comment on: Mathieu, you try to
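To make the two conditions in this exchange concrete, here is a minimal sketch of "crosses a cache line" versus "not sizeof aligned" as address arithmetic. The helper names are hypothetical and the 64-byte line size used in the usage note is an assumption (older CPUs use other sizes). It does not settle which condition is sufficient for safe patching; it only shows that an immediate can be misaligned without crossing a line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if the byte range [addr, addr + size) spans more than one cache
 * line. `line` must be a power of two and `size` must be non-zero.
 */
static bool crosses_cache_line(uintptr_t addr, size_t size, size_t line)
{
	uintptr_t mask = ~(uintptr_t)(line - 1);

	return (addr & mask) != ((addr + size - 1) & mask);
}

/* True if addr is a multiple of size (size must be a power of two). */
static bool is_size_aligned(uintptr_t addr, size_t size)
{
	return (addr & (size - 1)) == 0;
}
```

For example, with an assumed 64-byte line, a 4-byte immediate at address 57 is not sizeof aligned but stays within one line, while one at address 62 straddles two lines.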
H. Peter Anvin wrote:
> Mathieu Desnoyers wrote:
>
>> Ok, let's have a good look at what we want:
>>
>> 1 - get a pointer to the beginning of the immediate value within the
>> instruction.
>> 2 - make sure that the immediate value, within the instruction, is
>> written to atomically wrt
Mathieu Desnoyers wrote:
>
> Ok, let's have a good look at what we want:
>
> 1 - get a pointer to the beginning of the immediate value within the
> instruction.
> 2 - make sure that the immediate value, within the instruction, is
> written to atomically wrt all CPUs, even on older
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> > H. Peter Anvin wrote:
> > > Allowing different registers should be doable, but if so, one would have
> > > to put 0: at the *end* of the instruction and use (0f)-4 instead, since
> > > the
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> H. Peter Anvin wrote:
> > Allowing different registers should be doable, but if so, one would have
> > to put 0: at the *end* of the instruction and use (0f)-4 instead, since
> > the non-%eax forms are one byte longer.
> >
>
> OK, that's
> And yes, it's a pity there is no way to produce the long-nops there. :(
To be honest I consider it extremely unlikely you would be able
to measure a difference.
-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More
* Andi Kleen ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> >
> > It's a pity that gas seems to generate plain 0x90 nops rather than
> > long-nop forms here. I thought it could do that.
>
> .p2align does it.
>
Sadly, p2align does not apply well to my context. I
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge wrote:
> > Mathieu Desnoyers wrote:
> >
> >> +#define immediate_read(name) \
> >> + ({ \
> >> +
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers wrote:
>
> > +#define immediate_read(name) \
> > + ({ \
> > + __typeof__(name##__immediate) value;
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
Mathieu Desnoyers wrote:
+#define immediate_read(name) \
+ ({ \
+ __typeof__(name##__immediate) value;
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
Jeremy Fitzhardinge wrote:
Mathieu Desnoyers wrote:
+#define immediate_read(name) \
+ ({ \
+ __typeof__(name##__immediate)
* Andi Kleen ([EMAIL PROTECTED]) wrote:
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
It's a pity that gas seems to generate plain 0x90 nops rather than
long-nop forms here. I thought it could do that.
.p2align does it.
Sadly, p2align does not apply well to my context. I have to
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
H. Peter Anvin wrote:
Allowing different registers should be doable, but if so, one would have
to put 0: at the *end* of the instruction and use (0f)-4 instead, since
the non-%eax forms are
Mathieu Desnoyers wrote:
Ok, let's have a good look at what we want:
1 - get a pointer to the beginning of the immediate value within the
instruction.
2 - make sure that the immediate value, within the instruction, is
written to atomically wrt all CPUs, even on older architectures
H. Peter Anvin wrote:
Mathieu Desnoyers wrote:
Ok, let's have a good look at what we want:
1 - get a pointer to the beginning of the immediate value within the
instruction.
2 - make sure that the immediate value, within the instruction, is
written to atomically wrt all CPUs,
Jeremy Fitzhardinge wrote:
Cross-cache-line, sure. But what about just not sizeof aligned? If its
enough to avoid cross-cache-line, then that's simpler.
Not really. It pretty much comes down to the same problem.
Which is something I was going to comment on: Mathieu, you try to align
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
H. Peter Anvin wrote:
Mathieu Desnoyers wrote:
Ok, let's have a good look at what we want:
1 - get a pointer to the beginning of the immediate value within the
instruction.
2 - make sure that the immediate value, within the
* H. Peter Anvin ([EMAIL PROTECTED]) wrote:
Jeremy Fitzhardinge wrote:
Cross-cache-line, sure. But what about just not sizeof aligned? If its
enough to avoid cross-cache-line, then that's simpler.
Not really. It pretty much comes down to the same problem.
Which is something I
On Tue, Sep 18, 2007 at 03:29:50PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> >
> >> It's a pity that gas seems to generate plain 0x90 nops rather than
> >> long-nop forms here. I thought it could do that.
> >>
> >
> >
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>>
>>> It's a pity that gas seems to generate plain 0x90 nops rather than
>>> long-nop forms here. I thought it could do that.
>>>
>> .p2align does it
>
> Just .p2align? Not align, balign,
Andi Kleen wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>
>> It's a pity that gas seems to generate plain 0x90 nops rather than
>> long-nop forms here. I thought it could do that.
>>
>
> .p2align does it
Just .p2align? Not align, balign, org or skip? Seems... strange.
H. Peter Anvin wrote:
> Allowing different registers should be doable, but if so, one would have
> to put 0: at the *end* of the instruction and use (0f)-4 instead, since
> the non-%eax forms are one byte longer.
>
OK, that's already a problem since it's using "=r" as the constraint.
> This
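The "label at the end, minus 4" idea quoted above can be sketched in plain C by modelling instructions as byte arrays. This is only an illustration under stated assumptions: the helper names are hypothetical, and the two byte strings are standard x86 encodings picked to show a length difference (the second carries an x86-64 REX prefix); they are not taken from the patch. Because the 32-bit immediate is always the last four bytes, an offset computed from the end finds it regardless of how long the opcode part is.

```c
#include <stdint.h>

/*
 * Given a pointer just past the end of an instruction whose last four
 * bytes are a 32-bit immediate, return a pointer to that immediate.
 * This mirrors the "(0f)-4" expression: end label minus 4.
 */
static const uint8_t *imm32_from_end(const uint8_t *insn_end)
{
	return insn_end - sizeof(uint32_t);
}

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_imm32(const uint8_t *imm)
{
	return (uint32_t)imm[0] |
	       (uint32_t)imm[1] << 8 |
	       (uint32_t)imm[2] << 16 |
	       (uint32_t)imm[3] << 24;
}

/* movl $0x12345678, %eax: 1 opcode byte + 4 immediate bytes */
static const uint8_t mov_eax[] = { 0xb8, 0x78, 0x56, 0x34, 0x12 };

/* movl $0x12345678, %r8d: REX prefix + opcode + 4 immediate bytes */
static const uint8_t mov_r8d[] = { 0x41, 0xb8, 0x78, 0x56, 0x34, 0x12 };
```

Counting from the start, the immediate sits at offset 1 in the first encoding and offset 2 in the second; counting from the end, it is at the same place in both.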
Jeremy Fitzhardinge wrote:
> Mathieu Desnoyers wrote:
>
>> +#define immediate_read(name) \
>> +({ \
>> +__typeof__(name##__immediate) value;\
>> +
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>
> It's a pity that gas seems to generate plain 0x90 nops rather than
> long-nop forms here. I thought it could do that.
.p2align does it.
-Andi
Mathieu Desnoyers wrote:
> +#define immediate_read(name) \
> + ({ \
> + __typeof__(name##__immediate) value;\
> + switch (sizeof(value)) {
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.
Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only
On Mon, Sep 17, 2007 at 02:42:28PM -0400, Mathieu Desnoyers wrote:
> i386 optimization of the immediate values which uses a movl with code patching
> to set/unset the value used to populate the register used as variable source.
>
> Changelog:
> - Use text_poke_early with cr0 WP save/restore to
On Mon, Sep 17, 2007 at 02:42:28PM -0400, Mathieu Desnoyers wrote:
i386 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.
Changelog:
- Use text_poke_early with cr0 WP save/restore to patch
Mathieu Desnoyers wrote:
+#define immediate_read(name) \
+ ({ \
+ __typeof__(name##__immediate) value;\
+ switch (sizeof(value)) {
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
It's a pity that gas seems to generate plain 0x90 nops rather than
long-nop forms here. I thought it could do that.
.p2align does it.
-Andi
Jeremy Fitzhardinge wrote:
Mathieu Desnoyers wrote:
+#define immediate_read(name) \
+({ \
+__typeof__(name##__immediate) value;\
+
H. Peter Anvin wrote:
Allowing different registers should be doable, but if so, one would have
to put 0: at the *end* of the instruction and use (0f)-4 instead, since
the non-%eax forms are one byte longer.
OK, that's already a problem since its using =r as the constraint.
This also
Andi Kleen wrote:
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
It's a pity that gas seems to generate plain 0x90 nops rather than
long-nop forms here. I thought it could do that.
.p2align does it
Just .p2align? Not align, balign, org or skip? Seems... strange.
J
Jeremy Fitzhardinge wrote:
Andi Kleen wrote:
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
It's a pity that gas seems to generate plain 0x90 nops rather than
long-nop forms here. I thought it could do that.
.p2align does it
Just .p2align? Not align, balign, org or skip?
On Tue, Sep 18, 2007 at 03:29:50PM -0700, Jeremy Fitzhardinge wrote:
Andi Kleen wrote:
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:
It's a pity that gas seems to generate plain 0x90 nops rather than
long-nop forms here. I thought it could do that.
.p2align does it
Just