Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-20 Thread Clemens Kolbitsch
Hi Peter,

sure, that's fine too :). I just meant "great that it will be picked up in
a future version" :)

Thanks!
-Clemens


On Sat, Apr 19, 2014 at 3:50 PM, Peter Maydell wrote:

> On 19 April 2014 23:41, Clemens Kolbitsch  wrote:
> > Thanks guys, awesome feedback and glad to see it was picked up in QEMU
> 2.0
> > :)
>
> This didn't go into 2.0 -- it arrived somewhat late for that. It'll get
> into 2.1
> (and perhaps 2.0.1, if anybody cares enough to cc stable on it).
>
> thanks
> -- PMM
>


Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-19 Thread Peter Maydell
On 19 April 2014 23:41, Clemens Kolbitsch  wrote:
> Thanks guys, awesome feedback and glad to see it was picked up in QEMU 2.0
> :)

This didn't go into 2.0 -- it arrived somewhat late for that. It'll get into 2.1
(and perhaps 2.0.1, if anybody cares enough to cc stable on it).

thanks
-- PMM



Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-19 Thread Clemens Kolbitsch
Thanks guys, awesome feedback and glad to see it was picked up in QEMU 2.0
:)


On Sat, Apr 19, 2014 at 3:21 PM, Peter Maydell wrote:

> On 19 April 2014 23:04, Paolo Bonzini  wrote:
> > The new code should apply to btc/btr/bts too.
>
> ...see also RTH's patch:
>
> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01455.html
>
> thanks
> -- PMM
>



-- 
Clemens Kolbitsch
Security Researcher
kolbit...@lastline.com
Mobile +1 (206) 356-7745
Land +1 (805) 456-7076

Lastline, Inc.
6950 Hollister Avenue, Suite 101
Goleta, CA 93117

www.lastline.com


Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-19 Thread Peter Maydell
On 19 April 2014 23:04, Paolo Bonzini  wrote:
> The new code should apply to btc/btr/bts too.

...see also RTH's patch:

https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01455.html

thanks
-- PMM



Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-19 Thread Paolo Bonzini

Il 09/04/2014 13:55, Clemens Kolbitsch ha scritto:

Hi,

quick follow-up. *As always* you find a problem right after asking for
help :). The updated patch does not cause BSOD on Windows guests, but
neither does it fix the actual problem (of the program seg-faulting)

I would really appreciate any feedback on the proposed patch below - the
difference to the previous patch is that we clear out undefined flags
and only keep the Z-flag (and update the C-flag)


The new code should apply to btc/btr/bts too.  Basically the "switch" 
statement keeps just the shr, and everything else moves below.


Or even simpler, you could exploit the undefinedness of non-CF, non-ZF 
flags and keep using CC_OP_SARB.  Because CC_OP_SARB uses "cc_dst == 0" 
to compute ZF, if ZF=1 you want to set it to 0, and if ZF=0 you want to 
set it to non-zero.  So just compute !ZF into CC_DST, and place the 
tested bit (CF) into CC_SRC.  Basically:


gen_setcc1(s, (JCC_Z << 1) | 1, cpu_cc_dst);
switch(op) {
...
}
tcg_gen_andi_tl(cpu_cc_src, cpu_tmp4, CC_C);
set_cc_op(s, CC_OP_SARB + ot);


Last another general question: Does TCG make any assumptions that
undefined flags are set to 0? I see that most flag-computations set
undefined flags to 0 - is this just a convention or really a requirement?


Just a convention.

Paolo


Thanks guys!
-Clemens




On Wed, Apr 9, 2014 at 10:33 AM, Clemens Kolbitsch
mailto:kolbit...@lastline.com>> wrote:

Hi guys,

I have to revive a rather old thread [1,2]. A quick summary of the
issue:

TCG emulates the BT (bit-test) instruction using a SAR to re-compute
eflags. While SAR affects all flags, BT only affects the C-flag and
leaves the Z-flag unaffected/unchanged [3].

According to the x86 manual, the current behavior is correct
(although not perfect performance-wise), but NOT correct according
to newer Intel CPU instruction sets (such as x86-64). What has
caused some confusion in the past it seems is that AMD's manual
mentions that all flags other than the C-flag are "undefined" (just
like in Intel's old x86 manual).

A patch has been suggested (although not applied) before [2] and I
was trying to update and try the patch to the QEMU 2.0 RC2, but it
seems not to be working (causes a BSOD on Windows guests).

BTW: I have a program that seg-faults on Windows clients (32 as well
as 64 bit guest OSes) in vanilla QEMU 2.0 RC2 (just like in previous
versions), so this is not just of theoretical importance :)

Can somebody say if there is something that I'm doing obviously wrong?

*NON-FUNCTIONAL PATCH!*

--- ../orig/qemu-2.0.0-rc2/target-i386/translate.c  2014-04-08
12:38:58.0 -0700
+++ target-i386/translate.c 2014-04-09 10:08:43.252084947 -0700
@@ -6710,8 +6710,14 @@
 tcg_gen_andi_tl(cpu_T[1], cpu_T[1], (1 << (3 + ot)) - 1);
 switch(op) {
 case 0:
-tcg_gen_shr_tl(cpu_cc_src, cpu_T[0], cpu_T[1]);
-tcg_gen_movi_tl(cpu_cc_dst, 0);
+/* whatever the last CC-op is - recompute now so we can
OR-in
+ * updated results */
+gen_update_cc_op(s);
+gen_compute_eflags(s);
+tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
+tcg_gen_andi_tl(cpu_tmp4, cpu_tmp4, CC_C);
+tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, cpu_tmp4);
+set_cc_op(s, CC_OP_EFLAGS);
 break;
 case 1:
 tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
@@ -6734,8 +6740,8 @@
 tcg_gen_xor_tl(cpu_T[0], cpu_T[0], cpu_tmp0);
 break;
 }
-set_cc_op(s, CC_OP_SARB + ot);
 if (op != 0) {
+set_cc_op(s, CC_OP_SARB + ot);
 if (mod != 3) {
 gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
 } else {

Thanks, I'd really appreciate any input you can provide.
-Clemens


[1] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00442.html
[2] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00764.html
[3] page
148 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf







Re: [Qemu-devel] TCG x86-64 'bt' insn

2014-04-09 Thread Clemens Kolbitsch
Hi,

quick follow-up. *As always* you find a problem right after asking for help
:). The updated patch does not cause BSOD on Windows guests, but neither
does it fix the actual problem (of the program seg-faulting)

I would really appreciate any feedback on the proposed patch below - the
difference to the previous patch is that we clear out undefined flags and
only keep the Z-flag (and update the C-flag)

--- ../orig/qemu-2.0.0-rc2/target-i386/translate.c  2014-04-08
12:38:58.0 -0700
+++ target-i386/translate.c 2014-04-09 10:48:25.264200230 -0700
@@ -6710,8 +6710,15 @@
 tcg_gen_andi_tl(cpu_T[1], cpu_T[1], (1 << (3 + ot)) - 1);
 switch(op) {
 case 0:
-tcg_gen_shr_tl(cpu_cc_src, cpu_T[0], cpu_T[1]);
-tcg_gen_movi_tl(cpu_cc_dst, 0);
+/* whatever the last CC-op is - recompute now so we can OR-in
+ * updated results */
+gen_update_cc_op(s); // ? needed ?
+gen_compute_eflags(s);
+tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
+tcg_gen_andi_tl(cpu_tmp4, cpu_tmp4, CC_C);
+tcg_gen_andi_tl(cpu_cc_src, cpu_cc_src, CC_Z);
+tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, cpu_tmp4);
+set_cc_op(s, CC_OP_EFLAGS);
 break;
 case 1:
 tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
@@ -6734,8 +6741,8 @@
 tcg_gen_xor_tl(cpu_T[0], cpu_T[0], cpu_tmp0);
 break;
 }
-set_cc_op(s, CC_OP_SARB + ot);
 if (op != 0) {
+set_cc_op(s, CC_OP_SARB + ot);
 if (mod != 3) {
 gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
 } else {

Last another general question: Does TCG make any assumptions that undefined
flags are set to 0? I see that most flag-computations set undefined flags
to 0 - is this just a convention or really a requirement?

Thanks guys!
-Clemens




On Wed, Apr 9, 2014 at 10:33 AM, Clemens Kolbitsch
wrote:

> Hi guys,
>
> I have to revive a rather old thread [1,2]. A quick summary of the issue:
>
> TCG emulates the BT (bit-test) instruction using a SAR to re-compute
> eflags. While SAR affects all flags, BT only affects the C-flag and leaves
> the Z-flag unaffected/unchanged [3].
>
> According to the x86 manual, the current behavior is correct (although not
> perfect performance-wise), but NOT correct according to newer Intel CPU
> instruction sets (such as x86-64). What has caused some confusion in the
> past it seems is that AMD's manual mentions that all flags other than the
> C-flag are "undefined" (just like in Intel's old x86 manual).
>
> A patch has been suggested (although not applied) before [2] and I was
> trying to update and try the patch to the QEMU 2.0 RC2, but it seems not to
> be working (causes a BSOD on Windows guests).
>
> BTW: I have a program that seg-faults on Windows clients (32 as well as 64
> bit guest OSes) in vanilla QEMU 2.0 RC2 (just like in previous versions),
> so this is not just of theoretical importance :)
>
> Can somebody say if there is something that I'm doing obviously wrong?
>
> *NON-FUNCTIONAL PATCH!*
>
> --- ../orig/qemu-2.0.0-rc2/target-i386/translate.c  2014-04-08
> 12:38:58.0 -0700
> +++ target-i386/translate.c 2014-04-09 10:08:43.252084947 -0700
> @@ -6710,8 +6710,14 @@
>  tcg_gen_andi_tl(cpu_T[1], cpu_T[1], (1 << (3 + ot)) - 1);
>  switch(op) {
>  case 0:
> -tcg_gen_shr_tl(cpu_cc_src, cpu_T[0], cpu_T[1]);
> -tcg_gen_movi_tl(cpu_cc_dst, 0);
> +/* whatever the last CC-op is - recompute now so we can OR-in
> + * updated results */
> +gen_update_cc_op(s);
> +gen_compute_eflags(s);
> +tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
> +tcg_gen_andi_tl(cpu_tmp4, cpu_tmp4, CC_C);
> +tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, cpu_tmp4);
> +set_cc_op(s, CC_OP_EFLAGS);
>  break;
>  case 1:
>  tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
> @@ -6734,8 +6740,8 @@
>  tcg_gen_xor_tl(cpu_T[0], cpu_T[0], cpu_tmp0);
>  break;
>  }
> -set_cc_op(s, CC_OP_SARB + ot);
>  if (op != 0) {
> +set_cc_op(s, CC_OP_SARB + ot);
>  if (mod != 3) {
>  gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
>  } else {
>
> Thanks, I'd really appreciate any input you can provide.
> -Clemens
>
>
> [1] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00442.html
> [2] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00764.html
> [3] page 148
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
>


[Qemu-devel] TCG x86-64 'bt' insn

2014-04-09 Thread Clemens Kolbitsch
Hi guys,

I have to revive a rather old thread [1,2]. A quick summary of the issue:

TCG emulates the BT (bit-test) instruction using a SAR to re-compute
eflags. While SAR affects all flags, BT only affects the C-flag and leaves
the Z-flag unaffected/unchanged [3].

According to the x86 manual, the current behavior is correct (although not
perfect performance-wise), but NOT correct according to newer Intel CPU
instruction sets (such as x86-64). What has caused some confusion in the
past it seems is that AMD's manual mentions that all flags other than the
C-flag are "undefined" (just like in Intel's old x86 manual).

A patch has been suggested (although not applied) before [2] and I was
trying to update and try the patch to the QEMU 2.0 RC2, but it seems not to
be working (causes a BSOD on Windows guests).

BTW: I have a program that seg-faults on Windows clients (32 as well as 64
bit guest OSes) in vanilla QEMU 2.0 RC2 (just like in previous versions),
so this is not just of theoretical importance :)

Can somebody say if there is something that I'm doing obviously wrong?

*NON-FUNCTIONAL PATCH!*

--- ../orig/qemu-2.0.0-rc2/target-i386/translate.c  2014-04-08
12:38:58.0 -0700
+++ target-i386/translate.c 2014-04-09 10:08:43.252084947 -0700
@@ -6710,8 +6710,14 @@
 tcg_gen_andi_tl(cpu_T[1], cpu_T[1], (1 << (3 + ot)) - 1);
 switch(op) {
 case 0:
-tcg_gen_shr_tl(cpu_cc_src, cpu_T[0], cpu_T[1]);
-tcg_gen_movi_tl(cpu_cc_dst, 0);
+/* whatever the last CC-op is - recompute now so we can OR-in
+ * updated results */
+gen_update_cc_op(s);
+gen_compute_eflags(s);
+tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
+tcg_gen_andi_tl(cpu_tmp4, cpu_tmp4, CC_C);
+tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, cpu_tmp4);
+set_cc_op(s, CC_OP_EFLAGS);
 break;
 case 1:
 tcg_gen_shr_tl(cpu_tmp4, cpu_T[0], cpu_T[1]);
@@ -6734,8 +6740,8 @@
 tcg_gen_xor_tl(cpu_T[0], cpu_T[0], cpu_tmp0);
 break;
 }
-set_cc_op(s, CC_OP_SARB + ot);
 if (op != 0) {
+set_cc_op(s, CC_OP_SARB + ot);
 if (mod != 3) {
 gen_op_st_v(s, ot, cpu_T[0], cpu_A0);
 } else {

Thanks, I'd really appreciate any input you can provide.
-Clemens


[1] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00442.html
[2] http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00764.html
[3] page 148
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

-- 
Clemens Kolbitsch
Security Researcher
kolbit...@lastline.com
Mobile +1 (206) 356-7745
Land +1 (805) 456-7076

Lastline, Inc.
6950 Hollister Avenue, Suite 101
Goleta, CA 93117

www.lastline.com