Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-28 Thread Paolo Bonzini


On 28/06/2016 17:42, Peter Maydell wrote:
> Ping for review?

The patch is trivial, the hard part was coming up with the message for
the user. :)  Go ahead!

Paolo


Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-28 Thread Peter Maydell
On 28 June 2016 at 18:49, Paolo Bonzini wrote:
> On 28/06/2016 17:42, Peter Maydell wrote:
>> Ping for review?
>
> The patch is trivial, the hard part was coming up with the message for
> the user. :)

Sure, but review includes whether the message makes sense :-)

> Go ahead!

I'll push it to master in a bit.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-28 Thread Richard Henderson

On 06/20/2016 10:07 AM, Peter Maydell wrote:

> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort(). This results in the message
>   qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
> followed by a guest register dump, and then QEMU dumps core.
>
> This situation happens in one of two cases:
>  (1) a guest kernel bug, where it jumped off into nowhere
>  (2) a user command line mistake, where they tried to run an image for
>  board A on a QEMU model of board B, or where they didn't provide
>  an image at all, and QEMU executed through a ROM or RAM full of
>  NOP instructions and then fell off the end
>
> In either case, a core dump of QEMU itself is entirely useless, and
> only confuses users into thinking that this is a bug in QEMU rather
> than a bug in the guest or a problem with their command line. (This
> is a variation on the general idea that we shouldn't assert() on
> something the user can accidentally provoke.)
>
> Replace the cpu_abort() with something that explains the situation
> a bit better and exits QEMU without dumping core.
>
> (See LP:1062220 for several examples of confused users.)
>
> Signed-off-by: Peter Maydell
> ---


Reviewed-by: Richard Henderson  


r~



Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-28 Thread Peter Maydell
Ping for review?

thanks
-- PMM

On 20 June 2016 at 18:07, Peter Maydell wrote:
> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort(). This results in the message
>   qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
> followed by a guest register dump, and then QEMU dumps core.
>
> This situation happens in one of two cases:
>  (1) a guest kernel bug, where it jumped off into nowhere
>  (2) a user command line mistake, where they tried to run an image for
>  board A on a QEMU model of board B, or where they didn't provide
>  an image at all, and QEMU executed through a ROM or RAM full of
>  NOP instructions and then fell off the end
>
> In either case, a core dump of QEMU itself is entirely useless, and
> only confuses users into thinking that this is a bug in QEMU rather
> than a bug in the guest or a problem with their command line. (This
> is a variation on the general idea that we shouldn't assert() on
> something the user can accidentally provoke.)
>
> Replace the cpu_abort() with something that explains the situation
> a bit better and exits QEMU without dumping core.
>
> (See LP:1062220 for several examples of confused users.)
>
> Signed-off-by: Peter Maydell 
> ---
> I've been meaning to do this for a while now...hopefully the
> expanded error message should reduce user confusion.
>
>  cputlb.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index 23c9b91..079e497 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -30,6 +30,8 @@
>  #include "exec/ram_addr.h"
>  #include "exec/exec-all.h"
>  #include "tcg/tcg.h"
> +#include "qemu/error-report.h"
> +#include "exec/log.h"
>
>  /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
>  /* #define DEBUG_TLB */
> @@ -427,6 +429,39 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>  prot, mmu_idx, size);
>  }
>
> +static void report_bad_exec(CPUState *cpu, target_ulong addr)
> +{
> +    /* Accidentally executing outside RAM or ROM is quite common for
> +     * several user-error situations, so report it in a way that
> +     * makes it clear that this isn't a QEMU bug and provide suggestions
> +     * about what a user could do to fix things.
> +     */
> +    error_report("Trying to execute code outside RAM or ROM at 0x"
> +                 TARGET_FMT_lx, addr);
> +    error_printf("This usually means one of the following happened:\n\n"
> +                 "(1) You told QEMU to execute a kernel for the wrong machine "
> +                 "type, and it crashed on startup (eg trying to run a "
> +                 "raspberry pi kernel on a versatilepb QEMU machine)\n"
> +                 "(2) You didn't give QEMU a kernel or BIOS filename at all, "
> +                 "and QEMU executed a ROM full of no-op instructions until "
> +                 "it fell off the end\n"
> +                 "(3) Your guest kernel has a bug and crashed by jumping "
> +                 "off into nowhere\n\n"
> +                 "This is almost always one of the first two, so check your "
> +                 "command line and that you are using the right type of kernel "
> +                 "for this machine.\n"
> +                 "If you think option (3) is likely then you can try debugging "
> +                 "your guest with the -d debug options; in particular "
> +                 "-d guest_errors will cause the log to include a dump of the "
> +                 "guest register state at this point.\n\n"
> +                 "Execution cannot continue; stopping here.\n\n");
> +
> +    /* Report also to the logs, with more detail including register dump */
> +    qemu_log_mask(LOG_GUEST_ERROR, "qemu: fatal: Trying to execute code "
> +                  "outside RAM or ROM at 0x" TARGET_FMT_lx "\n", addr);
> +    log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
> +}
> +
>  /* NOTE: this function can trigger an exception */
>  /* NOTE2: the returned address is not exactly the physical address: it
>   * is actually a ram_addr_t (in system mode; the user mode emulation
> @@ -455,8 +490,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>          if (cc->do_unassigned_access) {
>              cc->do_unassigned_access(cpu, addr, false, true, 0, 4);
>          } else {
> -            cpu_abort(cpu, "Trying to execute code outside RAM or ROM at 0x"
> -                      TARGET_FMT_lx "\n", addr);
> +            report_bad_exec(cpu, addr);
> +            exit(1);
>          }
>      }
>      p = (void *)((uintptr_t)addr + env1->tlb_table[mmu_idx][page_index].addend);
> --
> 1.9.1
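
The control flow the patch changes can be illustrated outside QEMU. The sketch below is hypothetical: GuestRegion, find_region(), run_nop_slide(), report_bad_exec_sketch() and check_exec() are invented names, and a flat table of regions stands in for the real TLB and MemoryRegion lookup in get_page_addr_code(). It models case (2) from the commit message, a guest executing a ROM full of NOPs until the PC is no longer backed by any RAM or ROM, and then reports the problem and hands back a failure status for the caller to pass to exit(), rather than calling abort().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical guest memory region (a stand-in for QEMU's MemoryRegion). */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} GuestRegion;

/* Return the RAM/ROM region containing addr, or NULL if none covers it. */
static const GuestRegion *find_region(const GuestRegion *map, size_t n,
                                      uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Compare the offset so that base + size cannot overflow. */
        if (addr >= map[i].base && addr - map[i].base < map[i].size) {
            return &map[i];
        }
    }
    return NULL;
}

/*
 * Model case (2): every instruction is a NOP of insn_size bytes, so the PC
 * simply advances until it is no longer backed by RAM or ROM. Assumes no
 * region reaches the top of the address space (otherwise the PC would wrap).
 */
static uint64_t run_nop_slide(const GuestRegion *map, size_t n,
                              uint64_t pc, uint64_t insn_size)
{
    /* A zero size is a programming error, not something a user can cause. */
    assert(insn_size > 0);
    while (find_region(map, n, pc) != NULL) {
        pc += insn_size;
    }
    return pc;
}

/* Explain the likely user error rather than printing "qemu: fatal". */
static void report_bad_exec_sketch(FILE *out, uint64_t addr)
{
    fprintf(out, "Trying to execute code outside RAM or ROM at 0x%08" PRIx64 "\n",
            addr);
    fprintf(out, "Check the machine type and the kernel/BIOS image on the "
                 "command line.\nExecution cannot continue; stopping here.\n");
}

/*
 * Return EXIT_SUCCESS if pc is executable; otherwise report the problem and
 * return EXIT_FAILURE so the caller can exit() cleanly instead of abort()ing.
 */
static int check_exec(FILE *out, const GuestRegion *map, size_t n, uint64_t pc)
{
    if (find_region(map, n, pc) != NULL) {
        return EXIT_SUCCESS;
    }
    report_bad_exec_sketch(out, pc);
    return EXIT_FAILURE;
}
```

Returning a status instead of exiting inside the helper keeps the sketch testable; the patch itself calls exit(1) directly after report_bad_exec(). The difference that matters to users is exit() versus abort(): abort() raises SIGABRT, whose default action terminates the process with a core dump where core dumps are enabled, while exit() is a normal termination. The assert() in run_nop_slide() follows the rule quoted in the commit message: it guards only an internal invariant, never state a user can provoke from the command line.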



Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-20 Thread Peter Maydell
On 20 June 2016 at 20:16, Mark Cave-Ayland wrote:
> Excellent! Another use case I see here is with HelenOS/ppc whose
> bootloader is fixed at address 0x8000000 (128MB) and so if you don't
> increase the memory above the default then you end up with this panic,
> which as you rightly point out is often confusing.

For that one, if the real-life machine always has more RAM
and we don't mind breaking migration back-compat for it,
then we could set its default_ram_size to something
other than 128MB.
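
The HelenOS case comes down to simple arithmetic: 128 MiB is 128 * 2^20 = 0x8000000 bytes. Assuming guest RAM is one contiguous block mapped from address 0 (an assumption about the board's memory map that the thread does not state), the last RAM byte with the default 128 MiB is 0x7ffffff, so an entry point at 128 MiB is the first address past the end of RAM. The addr_in_ram() helper below is a hypothetical illustration, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One mebibyte; "128MB" of guest RAM means 128 * 2^20 bytes. */
#define MIB (UINT64_C(1) << 20)

/* Compile-time check of the arithmetic: an entry point at 128 MiB. */
static_assert(UINT64_C(0x8000000) == 128 * MIB, "0x8000000 is exactly 128 MiB");

/*
 * Hypothetical helper: assumes guest RAM is one contiguous block starting at
 * guest physical address 0, so valid RAM addresses are [0, ram_size).
 */
static bool addr_in_ram(uint64_t addr, uint64_t ram_size)
{
    return addr < ram_size;
}
```

Under that assumption, the default 128 MiB leaves the entry point unbacked, which is exactly the situation the new message reports, while running with -m 256 (or raising the board's default RAM size as suggested above) puts it inside RAM.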

thanks
-- PMM



Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM

2016-06-20 Thread Mark Cave-Ayland
On 20/06/16 18:07, Peter Maydell wrote:

> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort(). This results in the message
>   qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
> followed by a guest register dump, and then QEMU dumps core.
> 
> This situation happens in one of two cases:
>  (1) a guest kernel bug, where it jumped off into nowhere
>  (2) a user command line mistake, where they tried to run an image for
>  board A on a QEMU model of board B, or where they didn't provide
>  an image at all, and QEMU executed through a ROM or RAM full of
>  NOP instructions and then fell off the end
> 
> In either case, a core dump of QEMU itself is entirely useless, and
> only confuses users into thinking that this is a bug in QEMU rather
> than a bug in the guest or a problem with their command line. (This
> is a variation on the general idea that we shouldn't assert() on
> something the user can accidentally provoke.)
> 
> Replace the cpu_abort() with something that explains the situation
> a bit better and exits QEMU without dumping core.
> 
> (See LP:1062220 for several examples of confused users.)

Excellent! Another use case I see here is with HelenOS/ppc whose
bootloader is fixed at address 0x8000000 (128MB) and so if you don't
increase the memory above the default then you end up with this panic,
which as you rightly point out is often confusing.