[gem5-dev] Change in gem5/gem5[develop]: dev-amdgpu,configs: checkpoint before MMIOs
Matthew Poremba has submitted this change. ( https://gem5-review.googlesource.com/c/public/gem5/+/46161 )

Change subject: dev-amdgpu,configs: checkpoint before MMIOs
..

dev-amdgpu,configs: checkpoint before MMIOs

The flow for Full System amdgpu is to use KVM to boot Linux and begin
loading the driver module. However, the amdgpu module requires reading
the VGA ROM located at 0xc in X86. KVM does not support having a small
128KiB hole at this location, therefore we take a checkpoint and switch
to a timing CPU to continue loading the drivers before the VGA ROM is
read.

This creates a checkpoint just before the first MMIOs. This is indicated
by three interrupts being sent to the PCI device. After three interrupts
in a row are counted a checkpoint exit event occurs. The interrupt
counter is reset if a non-interrupt PCI read is seen.

Change-Id: I23b320abe81ff6e766cb3f604eca2979339938e5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/46161
Reviewed-by: Matt Sinclair
Reviewed-by: Jason Lowe-Power
Maintainer: Matt Sinclair
Tested-by: kokoro
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/amdgpu.py
M src/dev/amdgpu/AMDGPU.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
5 files changed, 54 insertions(+), 3 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass

diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 2e60633..fc32a61 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -58,6 +58,8 @@
     parser.add_argument("--host-parallel", default=False,
                         action="store_true",
                         help="Run multiple host threads in KVM mode")
+    parser.add_argument("--restore-dir", type=str, default=None,
+                        help="Directory to restore checkpoints from")
     parser.add_argument("--disk-image", default="",
                         help="The boot disk image to mount (/dev/sda)")
     parser.add_argument("--second-disk", default=None,
@@ -66,6 +68,11 @@
     parser.add_argument("--gpu-rom", default=None, help="GPU BIOS to load")
     parser.add_argument("--gpu-mmio-trace", default=None,
                         help="GPU MMIO trace to load")
+    parser.add_argument("--checkpoint-before-mmios", default=False,
+                        action="store_true",
+                        help="Take a checkpoint before driver sends MMIOs. "
+                        "This is used to switch out of KVM mode and into "
+                        "timing mode required to read the VGA ROM on boot.")


 def runGpuFSSystem(args):
@@ -89,7 +96,10 @@
     if args.script is not None:
         system.readfile = args.script

-    m5.instantiate()
+    if args.restore_dir is None:
+        m5.instantiate()
+    else:
+        m5.instantiate(args.restore_dir)


     print("Running the simulation")
@@ -97,6 +107,20 @@

     exit_event = m5.simulate(sim_ticks)

+    # Keep executing while there is something to do
+    while True:
+        if exit_event.getCause() == "m5_exit instruction encountered" or \
+            exit_event.getCause() == "user interrupt received" or \
+            exit_event.getCause() == "simulate() limit reached":
+            break
+        elif "checkpoint" in exit_event.getCause():
+            assert(args.checkpoint_dir is not None)
+            m5.checkpoint(args.checkpoint_dir)
+            break
+        else:
+            print('Unknown exit event: %s. Continuing...'
+                    % exit_event.getCause())
+
     print('Exiting @ tick %i because %s' %
           (m5.curTick(), exit_event.getCause()))

diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py
index b6bf821..b30892c 100644
--- a/configs/example/gpufs/system/amdgpu.py
+++ b/configs/example/gpufs/system/amdgpu.py
@@ -148,3 +148,5 @@

     system.pc.south_bridge.gpu.trace_file = args.gpu_mmio_trace
     system.pc.south_bridge.gpu.rom_binary = args.gpu_rom
+    system.pc.south_bridge.gpu.checkpoint_before_mmios = \
+        args.checkpoint_before_mmios
diff --git a/src/dev/amdgpu/AMDGPU.py b/src/dev/amdgpu/AMDGPU.py
index 29526c4..092a380 100644
--- a/src/dev/amdgpu/AMDGPU.py
+++ b/src/dev/amdgpu/AMDGPU.py
@@ -70,3 +70,5 @@

     rom_binary = Param.String("ROM binary dumped from hardware")
     trace_file = Param.String("MMIO trace collected on hardware")
+    checkpoint_before_mmios = Param.Bool(False, "Take a checkpoint before the"
+        " device begins sending MMIOs")
diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index afe28f3..08e6987 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -40,9 +40,11 @@
 #include
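
The exit handling added to runfs.py can be tried outside gem5 with a small stand-in. The following is only a sketch under stated assumptions: classify_exit and run_until_done are hypothetical names that do not appear in the change, and the simulate and take_checkpoint callables stand in for m5.simulate and m5.checkpoint. Unlike the loop in the diff, this sketch calls simulate again after an unrecognised cause; that is a choice made here so the loop can make progress, not something taken from the change.

```python
# Exit causes that end the run, copied from the strings checked in the diff.
TERMINAL_CAUSES = {
    "m5_exit instruction encountered",
    "user interrupt received",
    "simulate() limit reached",
}


def classify_exit(cause):
    """Map an exit-cause string to 'stop', 'checkpoint', or 'unknown'.

    Hypothetical helper: mirrors the if/elif/else chain in the diff.
    """
    if cause in TERMINAL_CAUSES:
        return "stop"
    if "checkpoint" in cause:
        return "checkpoint"
    return "unknown"


def run_until_done(simulate, take_checkpoint, checkpoint_dir=None):
    """Drive a simulate() callable until a terminal or checkpoint cause.

    simulate: callable returning an exit-cause string (stand-in for
        m5.simulate(...).getCause(); an assumption of this sketch).
    take_checkpoint: callable taking a directory (stand-in for m5.checkpoint).
    Returns the cause string that ended the loop.
    """
    cause = simulate()
    while True:
        action = classify_exit(cause)
        if action == "stop":
            return cause
        if action == "checkpoint":
            if checkpoint_dir is None:
                raise ValueError("a checkpoint directory is required")
            take_checkpoint(checkpoint_dir)
            return cause
        print("Unknown exit event: %s. Continuing..." % cause)
        # Assumption of this sketch: resume simulation to get a new cause.
        cause = simulate()
```

Keeping the classification separate from the loop makes the string matching easy to check without a simulator attached.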
[gem5-dev] Change in gem5/gem5[develop]: dev-amdgpu,configs: checkpoint before MMIOs
Matthew Poremba has uploaded this change for review. ( https://gem5-review.googlesource.com/c/public/gem5/+/46161 )

Change subject: dev-amdgpu,configs: checkpoint before MMIOs
..

dev-amdgpu,configs: checkpoint before MMIOs

The flow for Full System amdgpu is to use KVM to boot Linux and begin
loading the driver module. However, the amdgpu module requires reading
the VGA ROM located at 0xc in X86. KVM does not support having a small
128KiB hole at this location, therefore we take a checkpoint and switch
to a timing CPU to continue loading the drivers before the VGA ROM is
read.

This creates a checkpoint just before the first MMIOs. This is indicated
by three interrupts being sent to the PCI device. After three interrupts
in a row are counted a checkpoint exit event occurs. The interrupt
counter is reset if a non-interrupt PCI read is seen.

Change-Id: I23b320abe81ff6e766cb3f604eca2979339938e5
---
M configs/example/gpufs/runfs.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 2e60633..231cf2d 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -58,6 +58,8 @@
     parser.add_argument("--host-parallel", default=False,
                         action="store_true",
                         help="Run multiple host threads in KVM mode")
+    parser.add_argument("--restore-dir", type=str, default=None,
+                        help="Directory to restore checkpoints from")
     parser.add_argument("--disk-image", default="",
                         help="The boot disk image to mount (/dev/sda)")
     parser.add_argument("--second-disk", default=None,
@@ -89,7 +91,10 @@
     if args.script is not None:
         system.readfile = args.script

-    m5.instantiate()
+    if args.restore_dir is None:
+        m5.instantiate()
+    else:
+        m5.instantiate(args.restore_dir)


     print("Running the simulation")
@@ -97,6 +102,20 @@

     exit_event = m5.simulate(sim_ticks)

+    # Keep executing while there is something to do
+    while True:
+        if exit_event.getCause() == "m5_exit instruction encountered" or \
+            exit_event.getCause() == "user interrupt received" or \
+            exit_event.getCause() == "simulate() limit reached":
+            break
+        elif "checkpoint" in exit_event.getCause():
+            assert(args.checkpoint_dir is not None)
+            m5.checkpoint(args.checkpoint_dir)
+            break
+        else:
+            print('Unknown exit event: %s. Continuing...'
+                    % exit_event.getCause())
+
     print('Exiting @ tick %i because %s' %
           (m5.curTick(), exit_event.getCause()))

diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 26dd2b3..20efce6 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -40,9 +40,10 @@
 #include "mem/packet_access.hh"
 #include "params/AMDGPUDevice.hh"
 #include "sim/byteswap.hh"
+#include "sim/sim_exit.hh"

 AMDGPUDevice::AMDGPUDevice(const AMDGPUDeviceParams &p)
-    : PciDevice(p)
+    : PciDevice(p), init_interrupt_count(0)
 {
     // Loading the rom binary dumped from hardware.
     std::ifstream romBin;
@@ -100,7 +101,23 @@
     DPRINTF(AMDGPUDevice, "Read Config: from offset: %#x size: %#x "
             "data: %#x\n", offset, pkt->getSize(), config.data[offset]);

-    return PciDevice::readConfig(pkt);
+    Tick delay = PciDevice::readConfig(pkt);
+
+    // Before sending MMIOs the driver sends three interrupts in a row.
+    // Use this to trigger creating a checkpoint to restore in timing mode.
+    // This is only necessary until we can create a "hole" in the KVM VM
+    // around the VGA ROM region such that KVM exits and sends requests to
+    // this device rather than the KVM VM.
+    if (offset == PCI0_INTERRUPT_PIN) {
+        if (++init_interrupt_count == 3) {
+            DPRINTF(AMDGPUDevice, "Checkpointing before first MMIO\n");
+            exitSimLoop("checkpoint", 0, curTick() + delay + 1);
+        }
+    } else {
+        init_interrupt_count = 0;
+    }
+
+    return delay;
 }

 Tick
diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh
index 892d021..418cc50 100644
--- a/src/dev/amdgpu/amdgpu_device.hh
+++ b/src/dev/amdgpu/amdgpu_device.hh
@@ -102,6 +102,8 @@
      */
     std::unordered_map regs;

+    int init_interrupt_count;
+
   public:
     AMDGPUDevice(const AMDGPUDeviceParams );

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/46161
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I23b320abe81ff6e766cb3f604eca2979339938e5
Gerrit-Change-Number: 46161
Gerrit-PatchSet: 1
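
The trigger in readConfig (count consecutive reads of the interrupt pin register, reset on any other config read, fire on the third) can be modelled as a small state machine. This is a minimal sketch in Python rather than the device's C++, assuming a hypothetical InterruptPinWatcher class; the offset 0x3D is the Interrupt Pin register of the standard PCI configuration header and is assumed here to correspond to PCI0_INTERRUPT_PIN.

```python
# Interrupt Pin register offset in the standard PCI configuration header.
# Assumed here to be what PCI0_INTERRUPT_PIN refers to in the diff.
PCI_INTERRUPT_PIN_OFFSET = 0x3D


class InterruptPinWatcher:
    """Hypothetical stand-in for the init_interrupt_count logic."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # the diff hard-codes 3
        self.count = 0

    def on_config_read(self, offset):
        """Return True when this read should raise the checkpoint exit."""
        if offset == PCI_INTERRUPT_PIN_OFFSET:
            self.count += 1
            # Equality, as in the diff: fires once, on exactly the third
            # consecutive read, and not again on the fourth or fifth.
            return self.count == self.threshold
        # Any other config read breaks the run of interrupt-pin reads.
        self.count = 0
        return False


watcher = InterruptPinWatcher()
reads = [0x3D, 0x3D, 0x00, 0x3D, 0x3D, 0x3D, 0x3D]
fired = [watcher.on_config_read(offset) for offset in reads]
```

In this trace the read of offset 0x00 resets the counter, so only the sixth read in the list reports a trigger.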