[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute,configs: Make sim exits conditional

2023-07-07 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email )


Change subject: gpu-compute,configs: Make sim exits conditional
..

gpu-compute,configs: Make sim exits conditional

The unconditional exit event raised when a kernel completes, which was
added in c644eae2ddd34cf449a9c4476730bd29703c4dd7, causes scripts that
do not ignore unknown exit events to end simulation prematurely. One
such script is apu_se.py, which is used for SE mode GPU simulation.
Make this exit conditional on a new GPUDispatcher parameter,
kernel_exit_events (default False), which the GPUFS config sets only
when args.exit_at_gpu_kernel is a valid value (greater than -1), to
avoid this problem.
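As a rough illustration (a toy model in Python, not gem5 code), the sketch below shows why a run script that stops on any exit event it does not recognize ends early once the dispatcher raises "GPU Kernel Completed", and how gating that event on a kernel_exit_events-style flag avoids it. The generator and run_strict are hypothetical stand-ins for the simulation loop, and the guest exit cause string is only an example.

```python
from typing import Iterator


def kernel_exit_events_enabled(exit_at_gpu_kernel: int) -> bool:
    # Same test as the config change: any value greater than -1 is treated
    # as a valid kernel count and turns the dispatcher exit events on.
    return exit_at_gpu_kernel > -1


def dispatcher_exit_causes(num_kernels: int, kernel_exit_events: bool) -> Iterator[str]:
    """Yield the exit causes a hypothetical simulation would produce."""
    for _ in range(num_kernels):
        # Mirrors the dispatcher.cc change: the sim loop is only exited
        # after a kernel when kernelExitEvents is set.
        if kernel_exit_events:
            yield "GPU Kernel Completed"
    # The guest workload finally asks to end the simulation.
    yield "m5_exit instruction encountered"


def run_strict(causes: Iterator[str]) -> str:
    # A script that does not ignore unknown exit events: whichever exit
    # event arrives first ends the simulation.
    return next(causes)
```

With the flag off (the default in GPU.py), the strict script only sees the guest's own exit, which is the behavior scripts such as apu_se.py rely on.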

Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/72098
Reviewed-by: Jason Lowe-Power 
Maintainer: Matt Sinclair 
Maintainer: Jason Lowe-Power 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M configs/example/gpufs/system/system.py
M src/gpu-compute/GPU.py
M src/gpu-compute/dispatcher.cc
M src/gpu-compute/dispatcher.hh
4 files changed, 13 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 40e0016..19df310 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -115,7 +115,8 @@
 numHWQueues=args.num_hw_queues,
 walker=hsapp_pt_walker,
 )
-dispatcher = GPUDispatcher()
+dispatcher_exit_events = True if args.exit_at_gpu_kernel > -1 else False
+dispatcher = GPUDispatcher(kernel_exit_events=dispatcher_exit_events)
 cp_pt_walker = VegaPagetableWalker()
 gpu_cmd_proc = GPUCommandProcessor(
 hsapp=gpu_hsapp, dispatcher=dispatcher, walker=cp_pt_walker
diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index c5449cc..c64a6b7 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -328,6 +328,10 @@
 cxx_class = "gem5::GPUDispatcher"
 cxx_header = "gpu-compute/dispatcher.hh"

+kernel_exit_events = Param.Bool(
+False, "Enable exiting sim loop after a kernel"
+)
+

 class GPUCommandProcessor(DmaVirtDevice):
 type = "GPUCommandProcessor"
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index b19bccc..7b36bce 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -51,7 +51,8 @@
 : SimObject(p), shader(nullptr), gpuCmdProc(nullptr),
   tickEvent([this]{ exec(); },
   "GPU Dispatcher tick", false, Event::CPU_Tick_Pri),
-  dispatchActive(false), stats(this)
+  dispatchActive(false), kernelExitEvents(p.kernel_exit_events),
+  stats(this)
 {
 schedule(&tickEvent, 0);
 }
@@ -332,7 +333,9 @@
 curTick(), kern_id);
 DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);

-exitSimLoop("GPU Kernel Completed");
+if (kernelExitEvents) {
+exitSimLoop("GPU Kernel Completed");
+}
 }

 if (!tickEvent.scheduled()) {
diff --git a/src/gpu-compute/dispatcher.hh b/src/gpu-compute/dispatcher.hh
index 7699cef..eafa080 100644
--- a/src/gpu-compute/dispatcher.hh
+++ b/src/gpu-compute/dispatcher.hh
@@ -92,6 +92,8 @@
 std::queue<int> doneIds;
 // is there a kernel in execution?
 bool dispatchActive;
+// Enable exiting sim loop after each kernel completion
+bool kernelExitEvents;

   protected:
 struct GPUDispatcherStats : public statistics::Group

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Gerrit-Change-Number: 72098
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Bobby Bruce 
Gerrit-CC: kokoro 
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute,configs: Make sim exits conditional

2023-07-06 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email )



Change subject: gpu-compute,configs: Make sim exits conditional
..

gpu-compute,configs: Make sim exits conditional

The unconditional exit event raised when a kernel completes, which was
added in c644eae2ddd34cf449a9c4476730bd29703c4dd7, causes scripts that
do not ignore unknown exit events to end simulation prematurely. One
such script is apu_se.py, which is used for SE mode GPU simulation.
Make this exit conditional on a new GPUDispatcher parameter,
kernel_exit_events (default False), which the GPUFS config sets only
when args.exit_at_gpu_kernel is a valid value (greater than -1), to
avoid this problem.

Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
---
M configs/example/gpufs/system/system.py
M src/gpu-compute/GPU.py
M src/gpu-compute/dispatcher.cc
M src/gpu-compute/dispatcher.hh
4 files changed, 13 insertions(+), 3 deletions(-)



diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 40e0016..19df310 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -115,7 +115,8 @@
 numHWQueues=args.num_hw_queues,
 walker=hsapp_pt_walker,
 )
-dispatcher = GPUDispatcher()
+dispatcher_exit_events = True if args.exit_at_gpu_kernel > -1 else False
+dispatcher = GPUDispatcher(kernel_exit_events=dispatcher_exit_events)
 cp_pt_walker = VegaPagetableWalker()
 gpu_cmd_proc = GPUCommandProcessor(
 hsapp=gpu_hsapp, dispatcher=dispatcher, walker=cp_pt_walker
diff --git a/src/gpu-compute/GPU.py b/src/gpu-compute/GPU.py
index c5449cc..c64a6b7 100644
--- a/src/gpu-compute/GPU.py
+++ b/src/gpu-compute/GPU.py
@@ -328,6 +328,10 @@
 cxx_class = "gem5::GPUDispatcher"
 cxx_header = "gpu-compute/dispatcher.hh"

+kernel_exit_events = Param.Bool(
+False, "Enable exiting sim loop after a kernel"
+)
+

 class GPUCommandProcessor(DmaVirtDevice):
 type = "GPUCommandProcessor"
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index b19bccc..7b36bce 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -51,7 +51,8 @@
 : SimObject(p), shader(nullptr), gpuCmdProc(nullptr),
   tickEvent([this]{ exec(); },
   "GPU Dispatcher tick", false, Event::CPU_Tick_Pri),
-  dispatchActive(false), stats(this)
+  dispatchActive(false), kernelExitEvents(p.kernel_exit_events),
+  stats(this)
 {
 schedule(&tickEvent, 0);
 }
@@ -332,7 +333,9 @@
 curTick(), kern_id);
 DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);

-exitSimLoop("GPU Kernel Completed");
+if (kernelExitEvents) {
+exitSimLoop("GPU Kernel Completed");
+}
 }

 if (!tickEvent.scheduled()) {
diff --git a/src/gpu-compute/dispatcher.hh b/src/gpu-compute/dispatcher.hh
index 7699cef..eafa080 100644
--- a/src/gpu-compute/dispatcher.hh
+++ b/src/gpu-compute/dispatcher.hh
@@ -92,6 +92,8 @@
 std::queue<int> doneIds;
 // is there a kernel in execution?
 bool dispatchActive;
+// Enable exiting sim loop after each kernel completion
+bool kernelExitEvents;

   protected:
 struct GPUDispatcherStats : public statistics::Group

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/72098?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I1d2c082291fdbcf27390913ffdffb963ec8080dd
Gerrit-Change-Number: 72098
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [L] Change in gem5/gem5[develop]: configs: Create base GPUFS vega config and atomic config

2023-06-30 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71939?usp=email )


Change subject: configs: Create base GPUFS vega config and atomic config
..

configs: Create base GPUFS vega config and atomic config

Move the Vega KVM script code to a common base file, vega10.py, and add
scripts for KVM (vega10_kvm.py) and atomic (vega10_atomic.py) CPUs.
Since atomic CPU simulation is now possible in GPUFS, this gives a way
to run it without editing the current scripts.
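A minimal sketch of the split this change describes, assuming the per-CPU scripts do little more than call the shared runVegaGPUFS(cpu_type) entry point with a CPU model name. Only the -a/--app and -o/--opts options are taken from the new vega10.py; the argv parameter, the returned dictionary, the wrapper function names and the CPU type strings are assumptions for illustration, and the real base script builds and runs a gem5 system instead.

```python
import argparse
from typing import Dict, List, Optional


def runVegaGPUFS(cpu_type: str, argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Shared entry point: all option handling lives in one place."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-a", "--app", default=None, help="GPU application to run"
    )
    parser.add_argument(
        "-o", "--opts", default="", help="GPU application arguments"
    )
    args = parser.parse_args(argv)
    if args.app is None:
        raise SystemExit("No application given")
    # The real base script builds and runs the gem5 system here; this
    # stand-in only reports what would be configured.
    return {"cpu_type": cpu_type, "app": args.app, "opts": args.opts}


# Hypothetical bodies of the two per-CPU scripts.
def vega10_kvm(argv: Optional[List[str]] = None) -> Dict[str, str]:
    return runVegaGPUFS("X86KvmCPU", argv)


def vega10_atomic(argv: Optional[List[str]] = None) -> Dict[str, str]:
    return runVegaGPUFS("AtomicSimpleCPU", argv)
```

Keeping option parsing in one function means another CPU variant only needs one more short wrapper.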

Change-Id: I094bc4d4df856563535c28c1f6d6cc045d6734cd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71939
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
A configs/example/gpufs/vega10.py
A configs/example/gpufs/vega10_atomic.py
M configs/example/gpufs/vega10_kvm.py
3 files changed, 188 insertions(+), 124 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/vega10.py b/configs/example/gpufs/vega10.py
new file mode 100644
index 000..9eff5a2
--- /dev/null
+++ b/configs/example/gpufs/vega10.py
@@ -0,0 +1,153 @@
+# Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from this
+# software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+import m5
+import runfs
+import base64
+import tempfile
+import argparse
+import sys
+import os
+
+from amd import AmdGPUOptions
+from common import Options
+from common import GPUTLBOptions
+from ruby import Ruby
+
+
+demo_runscript_without_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+/sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+./myapp {}
+/sbin/m5 exit
+"""
+
+demo_runscript_with_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+/sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+/sbin/m5 checkpoint
+./myapp {}
+/sbin/m5 exit
+"""
+
+
+def addDemoOptions(parser):
+    parser.add_argument(
+        "-a", "--app", default=None, help="GPU application to run"
+    )
+    parser.add_argument(
+        "-o", "--opts", default="", help="GPU application arguments"
+    )
+
+
+def runVegaGPUFS(cpu_type):
+    parser = argparse.ArgumentParser()
+    runfs.addRunFSOptions(parser)
+    Options.addCommonOptions(parser)
+    AmdGPUOptions.addAmdGPUOptions(parser)
+    Ruby.define_options(parser)
+    GPUTLBOptions.tlb_options(parser)
+    addDemoOptions(parser)
+
+    # Parse now so we can override options
+    args = parser.parse_args()
+    demo_runscript = ""
+
+    # Create temp script to run application
+    if args.app is None:
+        print(f"No application given. Use {sys.argv[0]} -a ")
+        sys.exit(1)
+    elif args.kernel is None:
+        print(f"No kernel path given. Use {sys.argv[0]} --kernel

[gem5-dev] [L] Change in gem5/gem5[develop]: configs: Create base GPUFS vega config and atomic config

2023-06-29 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71939?usp=email )



Change subject: configs: Create base GPUFS vega config and atomic config
..

configs: Create base GPUFS vega config and atomic config

Move the Vega KVM script code to a common base file, vega10.py, and add
scripts for KVM (vega10_kvm.py) and atomic (vega10_atomic.py) CPUs.
Since atomic CPU simulation is now possible in GPUFS, this gives a way
to run it without editing the current scripts.
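The new base file embeds the GPU application directly into the guest runscript: the binary is base64-encoded on the host and recreated in the guest with base64 -d. Because the rest of runVegaGPUFS is cut off in this archive, the sketch below assumes the four {} placeholders are filled with the application path, its options, the encoded binary and the options again, in that order; build_runscript is a hypothetical helper and only the tail of the template is reproduced.

```python
import base64

# Trimmed copy of the tail of demo_runscript_without_checkpoint; the driver
# loading steps that come before it are omitted here.
RUNSCRIPT_TAIL = """\
echo "Running {} {}"
echo "{}" | base64 -d > myapp
chmod +x myapp
./myapp {}
/sbin/m5 exit
"""


def build_runscript(app_path: str, opts: str) -> str:
    # Read the host binary and encode it as ASCII so it can be embedded in
    # a shell script and recreated in the guest with `base64 -d`.
    with open(app_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # Placeholder order (path, options, payload, options) is assumed from
    # the template text.
    return RUNSCRIPT_TAIL.format(app_path, opts, encoded, opts)
```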

Change-Id: I094bc4d4df856563535c28c1f6d6cc045d6734cd
---
A configs/example/gpufs/vega10.py
A configs/example/gpufs/vega10_atomic.py
M configs/example/gpufs/vega10_kvm.py
3 files changed, 188 insertions(+), 124 deletions(-)



diff --git a/configs/example/gpufs/vega10.py b/configs/example/gpufs/vega10.py
new file mode 100644
index 000..9eff5a2
--- /dev/null
+++ b/configs/example/gpufs/vega10.py
@@ -0,0 +1,153 @@
+# Copyright (c) 2022-2023 Advanced Micro Devices, Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from this
+# software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+import m5
+import runfs
+import base64
+import tempfile
+import argparse
+import sys
+import os
+
+from amd import AmdGPUOptions
+from common import Options
+from common import GPUTLBOptions
+from ruby import Ruby
+
+
+demo_runscript_without_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+/sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+./myapp {}
+/sbin/m5 exit
+"""
+
+demo_runscript_with_checkpoint = """\
+export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
+export HSA_ENABLE_INTERRUPT=0
+dmesg -n8
+dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
+if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
+echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
+/sbin/m5 exit
+fi
+modprobe -v amdgpu ip_block_mask=0xff ppfeaturemask=0 dpm=0 audio=0
+echo "Running {} {}"
+echo "{}" | base64 -d > myapp
+chmod +x myapp
+/sbin/m5 checkpoint
+./myapp {}
+/sbin/m5 exit
+"""
+
+
+def addDemoOptions(parser):
+    parser.add_argument(
+        "-a", "--app", default=None, help="GPU application to run"
+    )
+    parser.add_argument(
+        "-o", "--opts", default="", help="GPU application arguments"
+    )
+
+
+def runVegaGPUFS(cpu_type):
+    parser = argparse.ArgumentParser()
+    runfs.addRunFSOptions(parser)
+    Options.addCommonOptions(parser)
+    AmdGPUOptions.addAmdGPUOptions(parser)
+    Ruby.define_options(parser)
+    GPUTLBOptions.tlb_options(parser)
+    addDemoOptions(parser)
+
+    # Parse now so we can override options
+    args = parser.parse_args()
+    demo_runscript = ""
+
+    # Create temp script to run application
+    if args.app is None:
+        print(f"No application given. Use {sys.argv[0]} -a ")
+        sys.exit(1)
+    elif args.kernel is None:
+        print(f"No kernel path given. Use {sys.argv[0]} --kernel ")
+        sys.exit(1)
+    elif args.disk_image is None:
+        print(f"No disk path given. Use {sys.argv[0]} --disk-image ")
+        sys.exit(1)
+    elif args.gpu_mmio_trace is None:
+        print(f"No MMIO trace path. Use

[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add GPUFS --root-partition option

2023-06-29 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email )


Change subject: configs: Add GPUFS --root-partition option
..

configs: Add GPUFS --root-partition option

Different GPUFS disk images have different root partitions that Linux
needs to boot from. In particular, Ubuntu's new installer has a GRUB
partition that cannot seem to be removed. Adding this as an option
(default /dev/sda1, passed to the kernel as root=...) removes the need
to edit a config script to change one character each time a different
disk image is used.
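A small self-contained sketch of how the option flows into the kernel command line. The argument definition mirrors the runfs.py hunk and the boot options are the ones visible in the system.py hunk; the parse_args/kernel_cmdline helpers and joining the list with spaces are assumptions, since the rest of system.py is not shown here.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Same definition as in the runfs.py hunk.
    parser.add_argument(
        "--root-partition",
        type=str,
        default="/dev/sda1",
        help="Root partition of disk image",
    )
    return parser.parse_args(argv)


def kernel_cmdline(args: argparse.Namespace) -> str:
    # Only the boot options visible in the system.py hunk are listed.
    boot_options = [
        "earlyprintk=ttyS0",
        "console=ttyS0,9600",
        "lpj=723",
        f"root={args.root_partition}",
        "drm_kms_helper.fbdev_emulation=0",
        "modprobe.blacklist=amdgpu",
        "modprobe.blacklist=psmouse",
    ]
    return " ".join(boot_options)
```

For example, an image whose root filesystem lives on its second partition would be booted with --root-partition /dev/sda2, with no edit to the config script.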

Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71918
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/system.py
2 files changed, 8 insertions(+), 1 deletion(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index b045b80..5346622 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -151,6 +151,13 @@
 help="Exit simulation after running this many kernels",
 )

+parser.add_argument(
+"--root-partition",
+type=str,
+default="/dev/sda1",
+help="Root partition of disk image",
+)
+

 def runGpuFSSystem(args):
 """
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 263ffc0..40e0016 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -50,7 +50,7 @@
 "earlyprintk=ttyS0",
 "console=ttyS0,9600",
 "lpj=723",
-"root=/dev/sda1",
+f"root={args.root_partition}",
 "drm_kms_helper.fbdev_emulation=0",
 "modprobe.blacklist=amdgpu",
 "modprobe.blacklist=psmouse",

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Gerrit-Change-Number: 71918
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

2023-06-29 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email )


Change subject: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
..

arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions, each of which covers
32 opcodes. The top-level entry for opcodes 32-63 pointed at
decode_invalid instead of subDecode_OP_FLAT, so it is added here.
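A toy Python model of the two-level lookup, assuming (as the commit message says) that each top-level FLAT entry covers a block of 32 consecutive opcodes, so the block is chosen by opcode // 32, and assuming four blocks to match the four consecutive entries in the first hunk. The function names echo the decoder table entries in the diff but are stand-ins, not the real gem5 decoder, and the second level just reports the opcode.

```python
from typing import Callable, List

Decoder = Callable[[int], str]


def decode_invalid(opcode: int) -> str:
    return "invalid"


def subDecode_OP_FLAT(opcode: int) -> str:
    # Stand-in for the second-level table, which would pick the
    # instruction for this specific opcode.
    return f"FLAT opcode {opcode}"


def top_level_flat_entries(with_fix: bool) -> List[Decoder]:
    # One entry per block of 32 opcodes: 0-31, 32-63, 64-95, 96-127.
    return [
        subDecode_OP_FLAT,
        subDecode_OP_FLAT if with_fix else decode_invalid,  # 32-63
        subDecode_OP_FLAT,
        subDecode_OP_FLAT,
    ]


def decode_flat(entries: List[Decoder], opcode: int) -> str:
    return entries[opcode // 32](opcode)
```

Without the fix, every opcode in 32-63 falls through to decode_invalid even though the second-level table could decode it.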

The opcode for V_SWAP_B32 is also off by one: in the ISA manual this
instruction is opcode 81, the instruction before it is 79, and there is
no opcode 80, so the decoder entry is swapped with the invalid decoding
below it.
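Because the VOP1 table is indexed directly by opcode, placing an entry one slot early makes the real opcode fall through to decode_invalid. The sketch below models this with a plain Python list; the 256-entry size is an assumption (one slot per 8-bit opcode), and placing V_SAT_PK_U8_I16 at 79 is inferred from the diff, which lists it immediately before V_SWAP_B32.

```python
VOP1_TABLE_SIZE = 256  # assumed: one entry per 8-bit VOP1 opcode


def build_vop1_table(swap_b32_index: int) -> list:
    # Every opcode starts out invalid; only the entries discussed in the
    # commit message are filled in.
    table = ["decode_invalid"] * VOP1_TABLE_SIZE
    table[79] = "V_SAT_PK_U8_I16"
    table[swap_b32_index] = "V_SWAP_B32"
    return table


before_fix = build_vop1_table(80)  # entry placed one slot early
after_fix = build_vop1_table(81)   # entry at the ISA manual's opcode
```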

Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71899
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index fd3a803..a86dd66 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -495,7 +495,7 @@
 ::decode_invalid,
 ::decode_invalid,
 ::subDecode_OP_FLAT,
-::decode_invalid,
+::subDecode_OP_FLAT,
 ::subDecode_OP_FLAT,
 ::subDecode_OP_FLAT,
 ::decode_invalid,
@@ -3140,8 +3140,8 @@
 ::decode_OP_VOP1__V_CVT_NORM_I16_F16,
 ::decode_OP_VOP1__V_CVT_NORM_U16_F16,
 ::decode_OP_VOP1__V_SAT_PK_U8_I16,
-::decode_OP_VOP1__V_SWAP_B32,
 ::decode_invalid,
+::decode_OP_VOP1__V_SWAP_B32,
 ::decode_invalid,
 ::decode_invalid,
 ::decode_invalid,

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Gerrit-Change-Number: 71899
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: kokoro 


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Perform frame writes atomically

2023-06-29 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email )


Change subject: dev-amdgpu: Perform frame writes atomically
..

dev-amdgpu: Perform frame writes atomically

The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.

The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, writes can sit
in the DMA device's queue long enough that a later read to one of the
queued addresses arrives first and returns stale data. This happens
deterministically with the AtomicSimpleCPU and causes GPUFS to break
with that CPU.

This change makes writes to the frame BAR atomic, the same as reads, by
writing the value functionally to device memory instead of issuing a
gpuMemMgr write request. This avoids that problem, and as a result the
AtomicSimpleCPU can now load the driver for GPUFS simulations.
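One way to picture the race is the toy model below (plain Python, not gem5 code; the class and method names are hypothetical). Before the change a frame write is only queued for a DMA engine, so a read of the same address that arrives before the queue drains sees the old value; with the change the write updates device memory before returning, matching how reads were already handled.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class FrameBarModel:
    """Toy model of a BAR that mirrors device memory (hypothetical)."""

    def __init__(self, functional_writes: bool) -> None:
        self.vram: Dict[int, int] = {}
        self.dma_queue: Deque[Tuple[int, int]] = deque()
        self.functional_writes = functional_writes

    def write(self, addr: int, value: int) -> None:
        if self.functional_writes:
            # After the change: device memory is updated before returning.
            self.vram[addr] = value
        else:
            # Before the change: the write is handed to a DMA engine and
            # the call returns without waiting for a completion.
            self.dma_queue.append((addr, value))

    def read(self, addr: int) -> int:
        # Reads were already serviced immediately from device memory.
        return self.vram.get(addr, 0)

    def drain_dma(self) -> None:
        # Performs the queued writes, as the DMA engine eventually would.
        while self.dma_queue:
            addr, value = self.dma_queue.popleft()
            self.vram[addr] = value
```

In the old model, a write followed immediately by a read of the same address returns 0 until drain_dma() runs; in the new model the read returns the written value right away.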

Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71898
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matthew Poremba 
Reviewed-by: Matthew Poremba 
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 16 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass
  Matthew Poremba: Looks good to me, approved; Looks good to me, approved




diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3260d05..d1058f1 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -349,6 +349,22 @@
 }

 nbio.writeFrame(pkt, offset);
+
+/*
+ * Write the value to device memory. This must be done functionally
+ * because this method is called by the PCIDevice::write method which
+ * is a non-timing write.
+ */
+RequestPtr req = std::make_shared<Request>(offset, pkt->getSize(), 0,
+   vramRequestorId());
+PacketPtr writePkt = Packet::createWrite(req);
+uint8_t *dataPtr = new uint8_t[pkt->getSize()];
+std::memcpy(dataPtr, pkt->getPtr<uint8_t>(),
+pkt->getSize() * sizeof(uint8_t));
+writePkt->dataDynamic(dataPtr);
+
+auto system = cp->shader()->gpuCmdProc.system();
+system->getDeviceMemory(writePkt)->access(writePkt);
 }

 void
@@ -489,8 +505,6 @@

 switch (barnum) {
   case FRAMEBUFFER_BAR:
-  gpuMemMgr->writeRequest(offset, pkt->getPtr<uint8_t>(),
-  pkt->getSize(), 0, nullptr);
   writeFrame(pkt, offset);
   break;
   case DOORBELL_BAR:

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Gerrit-Change-Number: 71898
Gerrit-PatchSet: 4
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add GPUFS --root-partition option

2023-06-29 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email )



Change subject: configs: Add GPUFS --root-partition option
..

configs: Add GPUFS --root-partition option

Different GPUFS disk images have different root partitions that Linux
needs to boot from. In particular, Ubuntu's new installer has a GRUB
partition that cannot seem to be removed. Adding this as an option
(default /dev/sda1, passed to the kernel as root=...) removes the need
to edit a config script to change one character each time a different
disk image is used.

Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/system.py
2 files changed, 8 insertions(+), 1 deletion(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index b045b80..5346622 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -151,6 +151,13 @@
 help="Exit simulation after running this many kernels",
 )

+parser.add_argument(
+"--root-partition",
+type=str,
+default="/dev/sda1",
+help="Root partition of disk image",
+)
+

 def runGpuFSSystem(args):
 """
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 263ffc0..40e0016 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -50,7 +50,7 @@
 "earlyprintk=ttyS0",
 "console=ttyS0,9600",
 "lpj=723",
-"root=/dev/sda1",
+f"root={args.root_partition}",
 "drm_kms_helper.fbdev_emulation=0",
 "modprobe.blacklist=amdgpu",
 "modprobe.blacklist=psmouse",

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71918?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Iac2996ea096047281891a70aa2901401ac9746fc
Gerrit-Change-Number: 71918
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

2023-06-28 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email )



Change subject: arch-vega: Add Vega D16 decodings and fix V_SWAP_B32
..

arch-vega: Add Vega D16 decodings and fix V_SWAP_B32

Vega adds multiple new D16 instructions which load a byte or short into
the lower or upper 16 bits of a register for packed math. The decoder
table has subDecode tables for FLAT instructions, each of which covers
32 opcodes. The top-level entry for opcodes 32-63 pointed at
decode_invalid instead of subDecode_OP_FLAT, so it is added here.

The opcode for V_SWAP_B32 is also off by one: in the ISA manual this
instruction is opcode 81, the instruction before it is 79, and there is
no opcode 80, so the decoder entry is swapped with the invalid decoding
below it.

Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
---
M src/arch/amdgpu/vega/decoder.cc
1 file changed, 2 insertions(+), 2 deletions(-)



diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index fd3a803..a86dd66 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -495,7 +495,7 @@
 ::decode_invalid,
 ::decode_invalid,
 ::subDecode_OP_FLAT,
-::decode_invalid,
+::subDecode_OP_FLAT,
 ::subDecode_OP_FLAT,
 ::subDecode_OP_FLAT,
 ::decode_invalid,
@@ -3140,8 +3140,8 @@
 ::decode_OP_VOP1__V_CVT_NORM_I16_F16,
 ::decode_OP_VOP1__V_CVT_NORM_U16_F16,
 ::decode_OP_VOP1__V_SAT_PK_U8_I16,
-::decode_OP_VOP1__V_SWAP_B32,
 ::decode_invalid,
+::decode_OP_VOP1__V_SWAP_B32,
 ::decode_invalid,
 ::decode_invalid,
 ::decode_invalid,

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71899?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I278fea574ea684ccc6302d5b4d0f5dd8813a88ad
Gerrit-Change-Number: 71899
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Perform frame writes atomically

2023-06-28 Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email )



Change subject: dev-amdgpu: Perform frame writes atomically
..

dev-amdgpu: Perform frame writes atomically

The PCI read/write functions are atomic functions in gem5, meaning they
expect a response with a latency value on the same simulation Tick. For
reads to a PCI device, the response must also include a data value read
from the device.

The AMDGPU device has a PCI BAR which mirrors the frame buffer memory.
Currently reads are done atomically, but writes are sent to a DMA device
without waiting for a write completion ACK. As a result, writes can sit
in the DMA device's queue long enough that a later read to one of the
queued addresses arrives first and returns stale data. This happens
deterministically with the AtomicSimpleCPU and causes GPUFS to break
with that CPU.

This change makes writes to the frame BAR atomic, the same as reads. This
avoids that problem, and as a result the AtomicSimpleCPU can now load the
driver for GPUFS simulations.
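To make the ordering problem concrete, here is a toy Python model (not gem5 code; the class and method names are hypothetical). With posted writes, a read that reaches the BAR before the queued write has drained sees stale data. With atomic writes, the data is visible as soon as the write returns.

```python
from collections import deque


class PostedWriteBar:
    """Toy frame BAR whose writes are queued for a DMA engine (old behavior)."""

    def __init__(self, size):
        self.mem = bytearray(size)   # backing frame buffer memory
        self.pending = deque()       # writes accepted but not yet applied

    def write(self, offset, data):
        # Posted write: returns immediately, lands only when drained later.
        self.pending.append((offset, bytes(data)))

    def drain_one(self):
        # Models the DMA engine completing the oldest queued write.
        offset, data = self.pending.popleft()
        self.mem[offset:offset + len(data)] = data

    def read(self, offset, size):
        # Reads are serviced atomically from backing memory.
        return bytes(self.mem[offset:offset + size])


class AtomicWriteBar(PostedWriteBar):
    """Toy frame BAR whose writes update memory before returning (new behavior)."""

    def write(self, offset, data):
        self.mem[offset:offset + len(data)] = data


def write_then_read(bar, offset=0x10, data=b"\xde\xad\xbe\xef"):
    # Back-to-back write and read to the same address, as a CPU might issue.
    bar.write(offset, data)
    return bar.read(offset, len(data))
```

Calling `drain_one()` on the posted model before the read hides the problem. This matches the description above: the failure only appears when a read arrives while the write is still queued.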

Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 16 insertions(+), 2 deletions(-)



diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3260d05..226fc99 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -349,6 +349,22 @@
 }

 nbio.writeFrame(pkt, offset);
+
+/*
+ * Read the value from device memory. This must be done functionally
+ * because this method is called by the PCIDevice::read method which
+ * is a non-timing read.
+ */
+RequestPtr req = std::make_shared(offset, pkt->getSize(), 0,
+   vramRequestorId());
+PacketPtr writePkt = Packet::createWrite(req);
+uint8_t *dataPtr = new uint8_t[pkt->getSize()];
+std::memcpy(dataPtr, pkt->getPtr(),
+pkt->getSize() * sizeof(uint8_t));
+writePkt->dataDynamic(dataPtr);
+
+auto system = cp->shader()->gpuCmdProc.system();
+system->getDeviceMemory(writePkt)->access(writePkt);
 }

 void
@@ -489,8 +505,6 @@

 switch (barnum) {
   case FRAMEBUFFER_BAR:
-  gpuMemMgr->writeRequest(offset, pkt->getPtr(),
-  pkt->getSize(), 0, nullptr);
   writeFrame(pkt, offset);
   break;
   case DOORBELL_BAR:

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71898?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9a8e8b172712c78b667ebcec81a0c5d0060234db
Gerrit-Change-Number: 71898
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Helper methods for SDWA/DPP for VOP2

2023-06-15 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70738?usp=email )


Change subject: arch-vega: Helper methods for SDWA/DPP for VOP2
..

arch-vega: Helper methods for SDWA/DPP for VOP2

Many of the outstanding issues with the GPU model are related to
instructions that lack SDWA/DPP implementations and execute while
ignoring the special registers, which leads to incorrect execution.
Adding SDWA/DPP is currently very cumbersome, as there is a lot of
boilerplate code.

This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
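Here is a rough Python sketch of the refactoring idea. It assumes a simplified model where vector registers are lists and the exec mask is a list of booleans. `vop2_helper`, `mul_u32_u24`, and the `src0_select` hook are hypothetical names. The point is only that the shared operand and SDWA/DPP handling lives in one place, while each instruction supplies a small per-lane function, as `V_MUL_U32_U24` does in the diff below.

```python
def bits(value, hi, lo):
    """Return value[hi:lo] (inclusive), similar in spirit to gem5's bits()."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def vop2_helper(src0, src1, exec_mask, op_impl, src0_select=None):
    """Hypothetical generic VOP2 driver.

    Owns the boilerplate every VOP2 instruction repeats (source
    preprocessing, destination allocation, returning the result) and
    delegates only the per-lane arithmetic to op_impl.
    """
    if src0_select is not None:
        # Stand-in for SDWA/DPP source processing applied once, here,
        # instead of being copy-pasted into each instruction.
        src0 = [src0_select(v) for v in src0]
    vdst = [0] * len(src0)  # inactive lanes stay 0 in this sketch
    op_impl(src0, src1, vdst, exec_mask)
    return vdst


def mul_u32_u24(src0, src1, vdst, exec_mask):
    """Per-lane body of V_MUL_U32_U24: multiply the low 24 bits of each source."""
    for lane, active in enumerate(exec_mask):
        if active:
            product = bits(src0[lane], 23, 0) * bits(src1[lane], 23, 0)
            vdst[lane] = product & 0xFFFFFFFF  # destination is a 32-bit register
```

Adding SDWA/DPP support to another instruction then means writing only its per-lane body, not duplicating the operand handling.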

Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70738
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/op_encodings.hh
2 files changed, 97 insertions(+), 52 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 6c014bc..0d3f2dc 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -6384,65 +6384,17 @@
 void
 Inst_VOP2__V_MUL_U32_U24::execute(GPUDynInstPtr gpuDynInst)
 {
-Wavefront *wf = gpuDynInst->wavefront();
-ConstVecOperandU32 src0(gpuDynInst, instData.SRC0);
-VecOperandU32 src1(gpuDynInst, instData.VSRC1);
-VecOperandU32 vdst(gpuDynInst, instData.VDST);
-
-src0.readSrc();
-src1.read();
-
-if (isSDWAInst()) {
-VecOperandU32 src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0);
-// use copies of original src0, src1, and dest during selecting
-VecOperandU32 origSrc0_sdwa(gpuDynInst,
-extData.iFmt_VOP_SDWA.SRC0);
-VecOperandU32 origSrc1(gpuDynInst, instData.VSRC1);
-VecOperandU32 origVdst(gpuDynInst, instData.VDST);
-
-src0_sdwa.read();
-origSrc0_sdwa.read();
-origSrc1.read();
-
-DPRINTF(VEGA, "Handling V_MUL_U32_U24 SRC SDWA. SRC0: register "
-"v[%d], DST_SEL: %d, DST_U: %d, CLMP: %d, SRC0_SEL: "
-"%d, SRC0_SEXT: %d, SRC0_NEG: %d, SRC0_ABS: %d, SRC1_SEL: "
-"%d, SRC1_SEXT: %d, SRC1_NEG: %d, SRC1_ABS: %d\n",
-extData.iFmt_VOP_SDWA.SRC0, extData.iFmt_VOP_SDWA.DST_SEL,
-extData.iFmt_VOP_SDWA.DST_U,
-extData.iFmt_VOP_SDWA.CLMP,
-extData.iFmt_VOP_SDWA.SRC0_SEL,
-extData.iFmt_VOP_SDWA.SRC0_SEXT,
-extData.iFmt_VOP_SDWA.SRC0_NEG,
-extData.iFmt_VOP_SDWA.SRC0_ABS,
-extData.iFmt_VOP_SDWA.SRC1_SEL,
-extData.iFmt_VOP_SDWA.SRC1_SEXT,
-extData.iFmt_VOP_SDWA.SRC1_NEG,
-extData.iFmt_VOP_SDWA.SRC1_ABS);
-
-processSDWA_src(extData.iFmt_VOP_SDWA, src0_sdwa, origSrc0_sdwa,
-src1, origSrc1);
-
-for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
-if (wf->execMask(lane)) {
-vdst[lane] = bits(src0_sdwa[lane], 23, 0) *
- bits(src1[lane], 23, 0);
-origVdst[lane] = vdst[lane]; // keep copy consistent
-}
-}
-
-processSDWA_dst(extData.iFmt_VOP_SDWA, vdst, origVdst);
-} else {
+auto opImpl = [](VecOperandU32& src0, VecOperandU32& src1,
+ VecOperandU32& vdst, Wavefront* wf) {
 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (wf->execMask(lane)) {
 vdst[lane] = bits(src0[lane], 23, 0) *
  bits(src1[lane], 23, 0);
 }
 }
-}
+};

-
-vdst.write();
+vop2Helper(gpuDynInst, opImpl);
 } // execute
 // --- Inst_VOP2__V_MUL_HI_U32_U24 class methods ---

diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh
index 1071ead..f195472 100644
--- a/src/arch/amdgpu/vega/insts/op_encodings.hh
+++ b/src/arch/amdgpu/vega/insts/op_encodings.hh
@@ -272,6 +272,99 @@
 InstFormat extData;
 uint32_t varSize;

+template
+T sdwaSrcHelper(GPUDynInstPtr gpuDynInst, T & src1)
+{
+T src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0);
+// use copies of original src0, src1, and dest during selecting
+ 

[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: GPUFS: Only use parallel eventqs for KVM

2023-06-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email )


Change subject: configs: GPUFS: Only use parallel eventqs for KVM
..

configs: GPUFS: Only use parallel eventqs for KVM

Parallel event queues are turned on by default when multiple CPUs are
used in the GPUFS configs, which causes other CPU types (e.g.,
AtomicSimpleCPU) to fail an assertion. Only enable parallel event queues
for KVM CPUs to avoid this issue.

Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71419
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M configs/example/gpufs/runfs.py
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 01203bb..b045b80 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -162,7 +162,8 @@
 # GPUFS is primarily designed to use the X86 KVM CPU. This model needs to
 # use multiple event queues when more than one CPU is simulated. Force it
 # on if that is the case.
-args.host_parallel = True if args.num_cpus > 1 else False
+if ObjectList.is_kvm_cpu(ObjectList.cpu_list.get(args.cpu_type)):
+args.host_parallel = True if args.num_cpus > 1 else False

 # These are used by the protocols. They should not be set by the user.
 n_cu = args.num_compute_units

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36
Gerrit-Change-Number: 71419
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: kokoro 


[gem5-dev] [S] Change in gem5/gem5[develop]: configs,gpu-compute: Kernel dispatch-based exit events

2023-06-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email )


Change subject: configs,gpu-compute: Kernel dispatch-based exit events
..

configs,gpu-compute: Kernel dispatch-based exit events

Add two kernel dispatch-based exit events that are useful for limiting
the simulation and enabling debug flags at specific GPU kernels. Since
the KVM CPU typically used with GPUFS is not deterministic, this helps
with enabling debug flags when the Tick number may vary. The exit at GPU
kernel option can also limit simulation, for example by simulating only
a few hundred kernels and then exiting at a deterministic point.
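The counting logic this change adds to runfs.py can be exercised without a simulator. In this Python sketch, a hypothetical list of exit-cause strings stands in for successive `m5.simulate()` results. `replay_exit_events` and its return tuple are assumptions for illustration, not part of gem5.

```python
KERNEL_DONE = "GPU Kernel Completed"


def replay_exit_events(causes, debug_at_kernel=-1, exit_at_kernel=-1):
    """Replay exit causes through logic modeled on the runfs.py loop.

    causes stands in for the getCause() strings of successive
    m5.simulate() calls. Returns (kernels_completed,
    kernel_count_when_tracing_enabled, stopped_early).
    """
    kernels = 0
    tracing_enabled_at = None
    for cause in causes:
        if KERNEL_DONE in cause:
            kernels += 1
        # Any other cause would print "Unknown exit event ... Continuing..."
        if kernels == debug_at_kernel and tracing_enabled_at is None:
            tracing_enabled_at = kernels  # m5.trace.enable() in the real script
        if kernels == exit_at_kernel:
            return kernels, tracing_enabled_at, True
    return kernels, tracing_enabled_at, False
```

The trigger is keyed on a kernel count rather than a Tick, which is what makes it repeatable under the non-deterministic KVM CPU.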

Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71418
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M configs/example/gpufs/runfs.py
M src/gpu-compute/dispatcher.cc
2 files changed, 30 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index f8ef70d..01203bb 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -137,6 +137,20 @@
 "MI200 (gfx90a)",
 )

+parser.add_argument(
+"--debug-at-gpu-kernel",
+type=int,
+default=-1,
+help="Turn on debug flags starting with this kernel",
+)
+
+parser.add_argument(
+"--exit-at-gpu-kernel",
+type=int,
+default=-1,
+help="Exit simulation after running this many kernels",
+)
+

 def runGpuFSSystem(args):
 """
@@ -184,6 +198,9 @@

 print("Running the simulation")
 sim_ticks = args.abs_max_tick
+kernels_launched = 0
+if args.debug_at_gpu_kernel != -1:
+m5.trace.disable()

 exit_event = m5.simulate(sim_ticks)

@@ -199,11 +216,21 @@
 assert args.checkpoint_dir is not None
 m5.checkpoint(args.checkpoint_dir)
 break
+elif "GPU Kernel Completed" in exit_event.getCause():
+kernels_launched += 1
 else:
 print(
 f"Unknown exit event: {exit_event.getCause()}. Continuing..."
 )

+if kernels_launched == args.debug_at_gpu_kernel:
+m5.trace.enable()
+if kernels_launched == args.exit_at_gpu_kernel:
+print(f"Exiting @ GPU kernel {kernels_launched}")
+break
+
+exit_event = m5.simulate(sim_ticks - m5.curTick())
+
 print(
 "Exiting @ tick %i because %s" % (m5.curTick(), exit_event.getCause())
 )
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index a76ba7c..b19bccc 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -40,6 +40,7 @@
 #include "gpu-compute/hsa_queue_entry.hh"
 #include "gpu-compute/shader.hh"
 #include "gpu-compute/wavefront.hh"
+#include "sim/sim_exit.hh"
 #include "sim/syscall_emul_buf.hh"
 #include "sim/system.hh"

@@ -330,6 +331,8 @@
 DPRINTF(GPUWgLatency, "Kernel Complete ticks:%d kernel:%d\n",
 curTick(), kern_id);
 DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);
+
+exitSimLoop("GPU Kernel Completed");
 }

 if (!tickEvent.scheduled()) {

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d
Gerrit-Change-Number: 71418
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: VISHNU RAMADAS 
Gerrit-CC: kokoro 


[gem5-dev] [S] Change in gem5/gem5[develop]: configs,gpu-compute: Kernel dispatch-based exit events

2023-06-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email )



Change subject: configs,gpu-compute: Kernel dispatch-based exit events
..

configs,gpu-compute: Kernel dispatch-based exit events

Add two kernel dispatch-based exit events that are useful for limiting
the simulation and enabling debug flags at specific GPU kernels. Since
the KVM CPU typically used with GPUFS is not deterministic, this helps
with enabling debug flags when the Tick number may vary. The exit at GPU
kernel option can also limit simulation, for example by simulating only
a few hundred kernels and then exiting at a deterministic point.

Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d
---
M configs/example/gpufs/runfs.py
M src/gpu-compute/dispatcher.cc
2 files changed, 30 insertions(+), 0 deletions(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index f8ef70d..01203bb 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -137,6 +137,20 @@
 "MI200 (gfx90a)",
 )

+parser.add_argument(
+"--debug-at-gpu-kernel",
+type=int,
+default=-1,
+help="Turn on debug flags starting with this kernel",
+)
+
+parser.add_argument(
+"--exit-at-gpu-kernel",
+type=int,
+default=-1,
+help="Exit simulation after running this many kernels",
+)
+

 def runGpuFSSystem(args):
 """
@@ -184,6 +198,9 @@

 print("Running the simulation")
 sim_ticks = args.abs_max_tick
+kernels_launched = 0
+if args.debug_at_gpu_kernel != -1:
+m5.trace.disable()

 exit_event = m5.simulate(sim_ticks)

@@ -199,11 +216,21 @@
 assert args.checkpoint_dir is not None
 m5.checkpoint(args.checkpoint_dir)
 break
+elif "GPU Kernel Completed" in exit_event.getCause():
+kernels_launched += 1
 else:
 print(
 f"Unknown exit event: {exit_event.getCause()}. Continuing..."
 )

+if kernels_launched == args.debug_at_gpu_kernel:
+m5.trace.enable()
+if kernels_launched == args.exit_at_gpu_kernel:
+print(f"Exiting @ GPU kernel {kernels_launched}")
+break
+
+exit_event = m5.simulate(sim_ticks - m5.curTick())
+
 print(
 "Exiting @ tick %i because %s" % (m5.curTick(), exit_event.getCause())
 )
diff --git a/src/gpu-compute/dispatcher.cc b/src/gpu-compute/dispatcher.cc
index a76ba7c..b19bccc 100644
--- a/src/gpu-compute/dispatcher.cc
+++ b/src/gpu-compute/dispatcher.cc
@@ -40,6 +40,7 @@
 #include "gpu-compute/hsa_queue_entry.hh"
 #include "gpu-compute/shader.hh"
 #include "gpu-compute/wavefront.hh"
+#include "sim/sim_exit.hh"
 #include "sim/syscall_emul_buf.hh"
 #include "sim/system.hh"

@@ -330,6 +331,8 @@
 DPRINTF(GPUWgLatency, "Kernel Complete ticks:%d kernel:%d\n",
 curTick(), kern_id);
 DPRINTF(GPUKernelInfo, "Completed kernel %d\n", kern_id);
+
+exitSimLoop("GPU Kernel Completed");
 }

 if (!tickEvent.scheduled()) {

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71418?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I81bae92a80c25fc38c41e999aa662e1417b7a20d
Gerrit-Change-Number: 71418
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: GPUFS: Only use parallel eventqs for KVM

2023-06-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email )



Change subject: configs: GPUFS: Only use parallel eventqs for KVM
..

configs: GPUFS: Only use parallel eventqs for KVM

Parallel event queues are turned on by default when multiple CPUs are
used in the GPUFS configs, which causes other CPU types (e.g.,
AtomicSimpleCPU) to fail an assertion. Only enable parallel event queues
for KVM CPUs to avoid this issue.

Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36
---
M configs/example/gpufs/runfs.py
1 file changed, 2 insertions(+), 1 deletion(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 01203bb..b045b80 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -162,7 +162,8 @@
 # GPUFS is primarily designed to use the X86 KVM CPU. This model needs to
 # use multiple event queues when more than one CPU is simulated. Force it
 # on if that is the case.
-args.host_parallel = True if args.num_cpus > 1 else False
+if ObjectList.is_kvm_cpu(ObjectList.cpu_list.get(args.cpu_type)):
+args.host_parallel = True if args.num_cpus > 1 else False

 # These are used by the protocols. They should not be set by the user.
 n_cu = args.num_compute_units

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71419?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ic8235437caf0150560e2b360a4544d82dfc26c36
Gerrit-Change-Number: 71419
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [XS] Change in gem5/gem5[develop]: gpu-compute: Gfx version check for FS and SE mode

2023-05-31 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email )


Change subject: gpu-compute: Gfx version check for FS and SE mode
..

gpu-compute: Gfx version check for FS and SE mode

In SE mode there is no GPU device to get the gfx version from, and in FS
mode there is no GPU driver to get it from, so a conditional is needed to
select the source of the gfx version depending on the mode.
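The selection reduces to querying whichever object exists in the current mode. Below is a minimal Python sketch. `GpuDevice`, `ComputeDriver`, and `select_gfx_version` are hypothetical stand-ins for the C++ device, the driver, and the `FullSystem` conditional in the diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GfxVersion(Enum):
    gfx900 = "gfx900"
    gfx908 = "gfx908"
    gfx90a = "gfx90a"


@dataclass
class GpuDevice:      # hypothetical stand-in for the FS-mode device model
    gfx_version: GfxVersion


@dataclass
class ComputeDriver:  # hypothetical stand-in for the SE-mode emulated driver
    gfx_version: GfxVersion


def select_gfx_version(full_system: bool,
                       gpu_device: Optional[GpuDevice],
                       driver: Optional[ComputeDriver]) -> GfxVersion:
    # Only one of the two objects exists in a given mode, so query that one.
    if full_system:
        return gpu_device.gfx_version
    return driver.gfx_version
```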

Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71078
Tested-by: kokoro 
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
---
M src/gpu-compute/gpu_command_processor.cc
M src/gpu-compute/gpu_compute_driver.hh
2 files changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc
index 9755180..8f748bd 100644
--- a/src/gpu-compute/gpu_command_processor.cc
+++ b/src/gpu-compute/gpu_command_processor.cc
@@ -227,9 +227,11 @@

 DPRINTF(GPUKernelInfo, "Kernel name: %s\n", kernel_name.c_str());

+GfxVersion gfxVersion = FullSystem ? gpuDevice->getGfxVersion()
+  : driver()->getGfxVersion();
 HSAQueueEntry *task = new HSAQueueEntry(kernel_name, queue_id,
 dynamic_task_id, raw_pkt, , host_pkt_addr, machine_code_addr,
-gpuDevice->getGfxVersion());
+gfxVersion);

 DPRINTF(GPUCommandProc, "Task ID: %i Got AQL: wg size (%dx%dx%d), "
 "grid size (%dx%dx%d) kernarg addr: %#x, completion "
diff --git a/src/gpu-compute/gpu_compute_driver.hh b/src/gpu-compute/gpu_compute_driver.hh
index def40f4..9a3c647 100644
--- a/src/gpu-compute/gpu_compute_driver.hh
+++ b/src/gpu-compute/gpu_compute_driver.hh
@@ -142,6 +142,8 @@
 };
 typedef class EventTableEntry ETEntry;

+GfxVersion getGfxVersion() const { return gfxVersion; }
+
   private:
 /**
  * GPU that is controlled by this driver.

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31
Gerrit-Change-Number: 71078
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: gpu-compute: Gfx version check for FS and SE mode

2023-05-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email )



Change subject: gpu-compute: Gfx version check for FS and SE mode
..

gpu-compute: Gfx version check for FS and SE mode

In SE mode there is no GPU device to get the gfx version from, and in FS
mode there is no GPU driver to get it from, so a conditional is needed to
select the source of the gfx version depending on the mode.

Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31
---
M src/gpu-compute/gpu_command_processor.cc
M src/gpu-compute/gpu_compute_driver.hh
2 files changed, 5 insertions(+), 1 deletion(-)



diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc
index 9755180..8f748bd 100644
--- a/src/gpu-compute/gpu_command_processor.cc
+++ b/src/gpu-compute/gpu_command_processor.cc
@@ -227,9 +227,11 @@

 DPRINTF(GPUKernelInfo, "Kernel name: %s\n", kernel_name.c_str());

+GfxVersion gfxVersion = FullSystem ? gpuDevice->getGfxVersion()
+  : driver()->getGfxVersion();
 HSAQueueEntry *task = new HSAQueueEntry(kernel_name, queue_id,
 dynamic_task_id, raw_pkt, , host_pkt_addr, machine_code_addr,
-gpuDevice->getGfxVersion());
+gfxVersion);

 DPRINTF(GPUCommandProc, "Task ID: %i Got AQL: wg size (%dx%dx%d), "
 "grid size (%dx%dx%d) kernarg addr: %#x, completion "
diff --git a/src/gpu-compute/gpu_compute_driver.hh b/src/gpu-compute/gpu_compute_driver.hh
index def40f4..9a3c647 100644
--- a/src/gpu-compute/gpu_compute_driver.hh
+++ b/src/gpu-compute/gpu_compute_driver.hh
@@ -142,6 +142,8 @@
 };
 typedef class EventTableEntry ETEntry;

+GfxVersion getGfxVersion() const { return gfxVersion; }
+
   private:
 /**
  * GPU that is controlled by this driver.

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/71078?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I33fdafb60d351ebc5148e2248244537fb5bebd31
Gerrit-Change-Number: 71078
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [S] Change in gem5/gem5[develop]: mem: Handle DRAM write queue drain and disabled power down

2023-05-25 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email )


Change subject: mem: Handle DRAM write queue drain and disabled power down
..

mem: Handle DRAM write queue drain and disabled power down

The write queue drain logic appears to be incorrect. An event is
scheduled if the write queue is empty instead of non-empty, there is no
check to see if draining is complete when the bus is in write mode, and
the power down check on drain always fails if DRAM powerdown is disabled.

This changeset reverses the drain conditional for the write queue to
schedule an event if the write queue is *not* empty and checks in the
event processing method that the queues are all empty so that
signalDrainDone can be called. Lastly, the powerdown state is ignored if
DRAM powerdown is disabled. Powerdown is disabled in the GPU_VIPER
protocol by default. This changeset successfully drains and checkpoints
a GPUFS simulation using GPU_VIPER protocol.
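Below is a simplified Python model of the corrected drain handshake. It assumes one queued write retires per event and ignores the bus-mode state machine. `ToyMemCtrl` and `drain_until_done` are hypothetical. `in_pwr_idle_state` mirrors the new `inPwrIdleState()` behavior, and `drain_done` stands in for `signalDrainDone()`.

```python
from dataclasses import dataclass


@dataclass
class ToyMemCtrl:
    write_q: int = 0
    read_q: int = 0
    resp_q: int = 0
    powerdown_enabled: bool = False
    rank_pwr_idle: bool = False
    draining: bool = False
    drain_done: bool = False
    next_req_scheduled: bool = False

    def in_pwr_idle_state(self) -> bool:
        # Ranks never reach PWR_IDLE when powerdown is disabled, so treat
        # them as idle instead of blocking the drain forever.
        if not self.powerdown_enabled:
            return True
        return self.rank_pwr_idle

    def all_drained(self) -> bool:
        return (self.write_q == 0 and self.read_q == 0 and self.resp_q == 0
                and self.in_pwr_idle_state())

    def drain(self) -> str:
        if self.all_drained():
            return "Drained"
        self.draining = True
        # Corrected condition: kick the write queue only if it has work.
        if self.write_q and not self.next_req_scheduled:
            self.next_req_scheduled = True
        return "Draining"

    def process_next_req(self) -> None:
        # Retire one queued write per event (a simplification).
        self.next_req_scheduled = False
        if self.write_q:
            self.write_q -= 1
        # New check: finish the drain once every queue is empty.
        if self.draining and self.all_drained():
            self.drain_done = True  # stands in for signalDrainDone()
        elif self.write_q:
            self.next_req_scheduled = True


def drain_until_done(ctrl: ToyMemCtrl, max_events: int = 100):
    state = ctrl.drain()
    events = 0
    while ctrl.next_req_scheduled and not ctrl.drain_done and events < max_events:
        ctrl.process_next_req()
        events += 1
    return state, ctrl.drain_done, events
```

In this toy model, the old condition (schedule only when the write queue is empty) would leave the `write_q=2` case with no scheduled event, so the drain would never finish.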

Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69917
Tested-by: kokoro 
Maintainer: Jason Lowe-Power 
Reviewed-by: Jason Lowe-Power 
---
M src/mem/dram_interface.hh
M src/mem/mem_ctrl.cc
2 files changed, 23 insertions(+), 4 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/mem/dram_interface.hh b/src/mem/dram_interface.hh
index fa9d319..e20e33f 100644
--- a/src/mem/dram_interface.hh
+++ b/src/mem/dram_interface.hh
@@ -380,7 +380,18 @@
  * @param Return true if the rank is idle from a bank
  *and power point of view
  */
-bool inPwrIdleState() const { return pwrState == PWR_IDLE; }
+bool
+inPwrIdleState() const
+{
+// If powerdown is not enabled, then the ranks never go to idle
+// states. In that case return true here to prevent checkpointing
+// from getting stuck waiting for DRAM to be idle.
+if (!dram.enableDRAMPowerdown) {
+return true;
+}
+
+return pwrState == PWR_IDLE;
+}

 /**
  * Trigger a self-refresh exit if there are entries enqueued
diff --git a/src/mem/mem_ctrl.cc b/src/mem/mem_ctrl.cc
index 543d637..290db3e 100644
--- a/src/mem/mem_ctrl.cc
+++ b/src/mem/mem_ctrl.cc
@@ -908,6 +908,13 @@
 }
 }

+if (drainState() == DrainState::Draining && !totalWriteQueueSize &&
+!totalReadQueueSize && respQEmpty() && allIntfDrained()) {
+
+DPRINTF(Drain, "MemCtrl controller done draining\n");
+signalDrainDone();
+}
+
 // updates current state
 busState = busStateNext;

@@ -1411,8 +1418,8 @@
 {
 // if there is anything in any of our internal queues, keep track
 // of that as well
-if (!(!totalWriteQueueSize && !totalReadQueueSize && respQueue.empty() &&
-  allIntfDrained())) {
+if (totalWriteQueueSize || totalReadQueueSize || !respQueue.empty() ||
+  !allIntfDrained()) {

 DPRINTF(Drain, "Memory controller not drained, write: %d, read: %d,"
 " resp: %d\n", totalWriteQueueSize, totalReadQueueSize,
@@ -1420,7 +1427,8 @@

 // the only queue that is not drained automatically over time
 // is the write queue, thus kick things into action if needed
-if (!totalWriteQueueSize && !nextReqEvent.scheduled()) {
+if (totalWriteQueueSize && !nextReqEvent.scheduled()) {
+DPRINTF(Drain,"Scheduling nextReqEvent from drain\n");
 schedule(nextReqEvent, curTick());
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
Gerrit-Change-Number: 69917
Gerrit-PatchSet: 4
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: Nikos Nikoleris 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Matt Sinclair 
Gerrit-CC: Matt Sinclair 
Gerrit-CC: Melissa Jost 
Gerrit-CC: VISHNU RAMADAS 


[gem5-dev] [M] Change in gem5/gem5[develop]: configs,dev-amdgpu: GPUFS MI200/gfx90a support

2023-05-25 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70317?usp=email )


Change subject: configs,dev-amdgpu: GPUFS MI200/gfx90a support
..

configs,dev-amdgpu: GPUFS MI200/gfx90a support

Add support for an MI200-like device. This includes adding PCI IDs and new
MMIOs for the device, a different MAP_PROCESS packet, and a different
calculation for the number of VGPRs.
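The per-device values introduced by this change can be summarized as data. This Python sketch transcribes the PCI device IDs, gfx versions, and MI200 SDMA bases from the diffs in this message. The dictionary and helper functions are hypothetical and do not reflect how the gem5 configs are structured.

```python
# Values transcribed from the diffs in this change; the layout is hypothetical.
GPU_DEVICES = {
    "Vega10": {"device_id": 0x6863, "gfx": "gfx900"},
    "MI100": {"device_id": 0x738C, "gfx": "gfx908"},
    "MI200": {
        "device_id": 0x740F,
        "gfx": "gfx90a",
        "sdma_bases": [0x4980, 0x6180, 0x78000, 0x79000, 0x7A000],
    },
}


def describe_device(name: str) -> str:
    if name not in GPU_DEVICES:
        # Mirrors the "Unknown GPU device" error in the config scripts.
        raise ValueError(f"Unknown GPU device {name}")
    info = GPU_DEVICES[name]
    return f"{name}: PCI device ID 0x{info['device_id']:04X}, {info['gfx']}"


def sdma_regions(name: str, size: int = 0x1000):
    """Pair each SDMA engine base with its MMIO size (0x1000 per engine for MI200)."""
    bases = GPU_DEVICES[name].get("sdma_bases", [])
    return [(base, size) for base in bases]
```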

Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70317
Reviewed-by: Jason Lowe-Power 
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/amdgpu.py
M configs/example/gpufs/system/system.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
M src/dev/amdgpu/amdgpu_nbio.cc
M src/dev/amdgpu/amdgpu_nbio.hh
M src/dev/amdgpu/amdgpu_vm.hh
M src/dev/amdgpu/pm4_defines.hh
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_packet_processor.hh
M src/gpu-compute/GPU.py
M src/gpu-compute/gpu_command_processor.cc
M src/gpu-compute/hsa_queue_entry.hh
14 files changed, 173 insertions(+), 27 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 4c90601..f8ef70d 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -132,8 +132,9 @@
 parser.add_argument(
 "--gpu-device",
 default="Vega10",
-choices=["Vega10", "MI100"],
-help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)",
+choices=["Vega10", "MI100", "MI200"],
+help="GPU model to run: Vega10 (gfx900), MI100 (gfx908), or "
+"MI200 (gfx90a)",
 )


diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py
index 5f98b55..9697e50 100644
--- a/configs/example/gpufs/system/amdgpu.py
+++ b/configs/example/gpufs/system/amdgpu.py
@@ -177,6 +177,10 @@
 system.pc.south_bridge.gpu.DeviceID = 0x738C
 system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
 system.pc.south_bridge.gpu.SubsystemID = 0x0C34
+elif args.gpu_device == "MI200":
+system.pc.south_bridge.gpu.DeviceID = 0x740F
+system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
+system.pc.south_bridge.gpu.SubsystemID = 0x0C34
 elif args.gpu_device == "Vega10":
 system.pc.south_bridge.gpu.DeviceID = 0x6863
 else:
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 90c5c01..263ffc0 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -152,6 +152,16 @@
 0x7D000,
 ]
 sdma_sizes = [0x1000] * 8
+elif args.gpu_device == "MI200":
+num_sdmas = 5
+sdma_bases = [
+0x4980,
+0x6180,
+0x78000,
+0x79000,
+0x7A000,
+]
+sdma_sizes = [0x1000] * 5
 else:
 m5.util.panic(f"Unknown GPU device {args.gpu_device}")

diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 7037e6f..3260d05 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -115,7 +115,7 @@
 sdmaFunc.insert({0x10b, ::setPageDoorbellOffsetLo});
 sdmaFunc.insert({0xe0, ::setPageSize});
 sdmaFunc.insert({0x113, ::setPageWptrLo});
-} else if (p.device_name == "MI100") {
+} else if (p.device_name == "MI100" || p.device_name == "MI200") {
 sdmaFunc.insert({0xd9, ::setPageBaseLo});
 sdmaFunc.insert({0xe1, ::setPageRptrLo});
 sdmaFunc.insert({0xe0, ::setPageRptrHi});
@@ -144,10 +144,19 @@
 if (p.device_name == "Vega10") {
 setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24);
 setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24);
+gfx_version = GfxVersion::gfx900;
 } else if (p.device_name == "MI100") {
 setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24);
 setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24);
 setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory
+gfx_version = GfxVersion::gfx908;
+} else if (p.device_name == "MI200") {
+// This device can have either 64GB or 128GB of device memory.
+// This limits to 16GB for simulation.
+setRegVal(MI200_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(MI200_FB_LOCATION_TOP, mmhubTop >> 24);
+setRegVal(MI200_MEM_SIZE_REG, 0x3ff0);
+gfx_version = GfxVersion::gfx90a;
 } else {
 panic("Unknown GPU device %s\n", p.device_name);
 }
diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh
index cab7991..56ed2f4 100644
--- a/src/dev/amdgpu/amdgpu_device.hh
+++ 

[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-x86: Fix CPUID function 0

2023-05-25 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email )


Change subject: arch-x86: Fix CPUID function 0
..

arch-x86: Fix CPUID function 0

This should return the highest standard function number, not the highest
extended function number.
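The sketch below shows what CPUID function 0 returns. It uses a hypothetical standard-leaf count (kNumStandardFuncs), not gem5's real function tables. EAX carries the highest standard function number. The 12-byte vendor string is split across EBX, EDX and ECX, four characters per register, with the first character in the lowest byte. packVendorChunk is a hypothetical helper playing roughly the role that stringToRegister plays in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical count for illustration: leaves 0x0 .. 0xD are implemented.
constexpr uint32_t kNumStandardFuncs = 14;

// Pack four vendor characters into one register, first character in the
// lowest byte (x86 is little endian).
uint32_t packVendorChunk(const char *s)
{
    uint32_t reg = 0;
    for (int i = 3; i >= 0; --i)
        reg = (reg << 8) | static_cast<uint8_t>(s[i]);
    return reg;
}

struct Leaf0Result { uint32_t eax, ebx, ecx, edx; };

// CPUID function 0: EAX = highest *standard* leaf (count - 1), and the
// vendor string in EBX, EDX, ECX, in that order. Requires a 12-char vendor.
Leaf0Result cpuidLeaf0(const std::string &vendor)
{
    Leaf0Result r{};
    r.eax = kNumStandardFuncs - 1;
    r.ebx = packVendorChunk(vendor.c_str());
    r.edx = packVendorChunk(vendor.c_str() + 4);
    r.ecx = packVendorChunk(vendor.c_str() + 8);
    return r;
}
```

Returning the extended count here would advertise standard leaves that the model does not implement. That mismatch is the kind of bug this change fixes.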

Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70778
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/x86/cpuid.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/x86/cpuid.cc b/src/arch/x86/cpuid.cc
index 4ce66df..ac4709c 100644
--- a/src/arch/x86/cpuid.cc
+++ b/src/arch/x86/cpuid.cc
@@ -162,7 +162,7 @@
   ISA *isa = dynamic_cast<ISA *>(tc->getIsaPtr());
   auto vendor_string = isa->getVendorString();
   result = CpuidResult(
-  NumExtendedCpuidFuncs - 1,
+  NumStandardCpuidFuncs - 1,
   stringToRegister(vendor_string.c_str()),
   stringToRegister(vendor_string.c_str() + 4),
   stringToRegister(vendor_string.c_str() + 8));

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561
Gerrit-Change-Number: 70778
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Gabe Black 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Update SDMA checkpointing

2023-05-23 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email )


Change subject: dev-amdgpu: Update SDMA checkpointing
..

dev-amdgpu: Update SDMA checkpointing

Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this, an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. To get the correct pointer in
unserialize, we now need to store the ID in the checkpoint and use it to
index the new map. We can't simply use the loop variable as the ID: the
SDMAs might not be in order in the checkpoint, and the checkpoint
contains both the gfx and page offsets for the SDMA engines, so each
SDMA is inserted into the SDMA offset map (sdmaEngs) twice.
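The sketch below illustrates why the checkpoint has to carry the engine ID. It uses hypothetical types and functions (Engine, saveOffsets, restoreOffsets) in place of gem5's SDMAEngine and its SERIALIZE_ARRAY/UNSERIALIZE_ARRAY code. Suppose two engines are each registered under two offsets. The offset map then has four entries, so a loop index runs 0..3 while only IDs 0 and 1 exist in the ID map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SDMA engine; only its ID matters here.
struct Engine { int id; };

// Save (offset, engine ID) pairs. Each engine may appear under several
// offsets, so an entry's position does not equal the engine's ID.
std::vector<std::pair<uint64_t, int>>
saveOffsets(const std::map<uint64_t, Engine *> &offsetMap)
{
    std::vector<std::pair<uint64_t, int>> ckpt;
    for (const auto &[offset, eng] : offsetMap)
        ckpt.emplace_back(offset, eng->id);  // store the ID, not the index
    return ckpt;
}

// Rebuild the offset map by looking each engine up through its saved ID.
std::map<uint64_t, Engine *>
restoreOffsets(const std::vector<std::pair<uint64_t, int>> &ckpt,
               const std::map<int, Engine *> &idMap)
{
    std::map<uint64_t, Engine *> offsetMap;
    for (const auto &[offset, id] : ckpt) {
        assert(idMap.count(id));
        offsetMap.emplace(offset, idMap.at(id));
    }
    return offsetMap;
}
```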

Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70878
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/sdma_engine.hh
2 files changed, 5 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index f58d1f7..7037e6f 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -604,7 +604,7 @@
 idx = 0;
 for (auto & it : sdmaEngs) {
 sdma_engs_offset[idx] = it.first;
-sdma_engs[idx] = idx;
+sdma_engs[idx] = it.second->getId();
 ++idx;
 }

@@ -675,8 +675,9 @@
 UNSERIALIZE_ARRAY(sdma_engs, sizeof(sdma_engs)/sizeof(sdma_engs[0]));

 for (int idx = 0; idx < sdma_engs_size; ++idx) {
-assert(sdmaIds.count(idx));
-SDMAEngine *sdma = sdmaIds[idx];
+int sdma_id = sdma_engs[idx];
+assert(sdmaIds.count(sdma_id));
+SDMAEngine *sdma = sdmaIds[sdma_id];
 sdmaEngs.insert(std::make_pair(sdma_engs_offset[idx], sdma));
 }
 }
diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh
index 1e4f965..bcbd497 100644
--- a/src/dev/amdgpu/sdma_engine.hh
+++ b/src/dev/amdgpu/sdma_engine.hh
@@ -165,6 +165,7 @@
 void setGPUDevice(AMDGPUDevice *gpu_device);

 void setId(int _id) { id = _id; }
+int getId() const { return id; }
 /**
  * Returns the client id for the Interrupt Handler.
  */

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Gerrit-Change-Number: 70878
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: VISHNU RAMADAS 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Update SDMA checkpointing

2023-05-22 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email )



Change subject: dev-amdgpu: Update SDMA checkpointing
..

dev-amdgpu: Update SDMA checkpointing

Patch https://gem5-review.googlesource.com/c/public/gem5/+/70040 added
support for a variable number of SDMA engines to support newer GPU
models. As part of this, an SDMA IDs map was added to map from SDMA ID
number to the SDMA SimObject pointer. To get the correct pointer in
unserialize, we now need to store the ID in the checkpoint and use it to
index the new map. We can't simply use the loop variable as the ID: the
SDMAs might not be in order in the checkpoint, and the checkpoint
contains both the gfx and page offsets for the SDMA engines, so each
SDMA is inserted into the SDMA offset map (sdmaEngs) twice.

Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
---
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/sdma_engine.hh
2 files changed, 5 insertions(+), 3 deletions(-)



diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index f58d1f7..7037e6f 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -604,7 +604,7 @@
 idx = 0;
 for (auto & it : sdmaEngs) {
 sdma_engs_offset[idx] = it.first;
-sdma_engs[idx] = idx;
+sdma_engs[idx] = it.second->getId();
 ++idx;
 }

@@ -675,8 +675,9 @@
 UNSERIALIZE_ARRAY(sdma_engs, sizeof(sdma_engs)/sizeof(sdma_engs[0]));

 for (int idx = 0; idx < sdma_engs_size; ++idx) {
-assert(sdmaIds.count(idx));
-SDMAEngine *sdma = sdmaIds[idx];
+int sdma_id = sdma_engs[idx];
+assert(sdmaIds.count(sdma_id));
+SDMAEngine *sdma = sdmaIds[sdma_id];
 sdmaEngs.insert(std::make_pair(sdma_engs_offset[idx], sdma));
 }
 }
diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh
index 1e4f965..bcbd497 100644
--- a/src/dev/amdgpu/sdma_engine.hh
+++ b/src/dev/amdgpu/sdma_engine.hh
@@ -165,6 +165,7 @@
 void setGPUDevice(AMDGPUDevice *gpu_device);

 void setId(int _id) { id = _id; }
+int getId() const { return id; }
 /**
  * Returns the client id for the Interrupt Handler.
  */

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70878?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I08e9a8d785f467b6eebff8ab0a9336851c87258d
Gerrit-Change-Number: 70878
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix nbio psp ring assert

2023-05-22 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email )


Change subject: dev-amdgpu: Fix nbio psp ring assert
..

dev-amdgpu: Fix nbio psp ring assert

The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how
the address is set based on the incoming packet size so that both
versions continue to work for now.
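The sketch below is a standalone version of the size-based decode. It assumes the payload has already been read as a little-endian integer. sysAddrLow is a hypothetical parameter standing in for getVM().getSysAddrRangeLow(). An exception replaces gem5's panic() so the logic can be exercised on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Decode a write to psp_ring_listen_addr based on the packet size.
uint64_t decodePspRingAddr(unsigned size, uint64_t value, uint64_t sysAddrLow)
{
    if (size == 4)  // ROCm 4.x: 4-byte framebuffer address, used as-is
        return static_cast<uint32_t>(value);
    if (size == 8)  // ROCm 5.x: system address, rebased to the framebuffer
        return value - sysAddrLow;
    throw std::invalid_argument("Invalid write size to psp_ring_listen_addr");
}
```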

Change-Id: I91694e4760198fd9129e60140df4e863666be2e2
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70677
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/dev/amdgpu/amdgpu_nbio.cc
1 file changed, 17 insertions(+), 3 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/dev/amdgpu/amdgpu_nbio.cc b/src/dev/amdgpu/amdgpu_nbio.cc
index 8064fd2..69e4373 100644
--- a/src/dev/amdgpu/amdgpu_nbio.cc
+++ b/src/dev/amdgpu/amdgpu_nbio.cc
@@ -162,9 +162,23 @@
 AMDGPUNbio::writeFrame(PacketPtr pkt, Addr offset)
 {
 if (offset == psp_ring_listen_addr) {
-assert(pkt->getSize() == 8);
-psp_ring_dev_addr = pkt->getLE<uint64_t>()
-  - gpuDevice->getVM().getSysAddrRangeLow();
+DPRINTF(AMDGPUDevice, "Saw psp_ring_listen_addr with size %ld value "
+"%ld\n", pkt->getSize(), pkt->getUintX(ByteOrder::little));
+
+/*
+ * In ROCm versions 4.x this packet is a 4 byte value. In ROCm 5.x
+ * the packet is 8 bytes and mapped as a system address which needs
+ * to be subtracted out to get the framebuffer address.
+ */
+if (pkt->getSize() == 4) {
+psp_ring_dev_addr = pkt->getLE<uint32_t>();
+} else if (pkt->getSize() == 8) {
+psp_ring_dev_addr = pkt->getUintX(ByteOrder::little)
+  - gpuDevice->getVM().getSysAddrRangeLow();
+} else {
+panic("Invalid write size to psp_ring_listen_addr\n");
+}
+
 DPRINTF(AMDGPUDevice, "Setting PSP ring device address to %#lx\n",
 psp_ring_dev_addr);
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I91694e4760198fd9129e60140df4e863666be2e2
Gerrit-Change-Number: 70677
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: VISHNU RAMADAS 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: arch-x86: Fix CPUID function 0

2023-05-19 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email )



Change subject: arch-x86: Fix CPUID function 0
..

arch-x86: Fix CPUID function 0

This should return the highest standard function number, not the highest
extended function number.

Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561
---
M src/arch/x86/cpuid.cc
1 file changed, 1 insertion(+), 1 deletion(-)



diff --git a/src/arch/x86/cpuid.cc b/src/arch/x86/cpuid.cc
index 4ce66df..ac4709c 100644
--- a/src/arch/x86/cpuid.cc
+++ b/src/arch/x86/cpuid.cc
@@ -162,7 +162,7 @@
   ISA *isa = dynamic_cast<ISA *>(tc->getIsaPtr());
   auto vendor_string = isa->getVendorString();
   result = CpuidResult(
-  NumExtendedCpuidFuncs - 1,
+  NumStandardCpuidFuncs - 1,
   stringToRegister(vendor_string.c_str()),
   stringToRegister(vendor_string.c_str() + 4),
   stringToRegister(vendor_string.c_str() + 8));

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70778?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ieb3a36d832cee603f1efd39b4f430b5ac0478561
Gerrit-Change-Number: 70778
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Helper methods for SDWA/DPP for VOP2

2023-05-17 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70738?usp=email )



Change subject: arch-vega: Helper methods for SDWA/DPP for VOP2
..

arch-vega: Helper methods for SDWA/DPP for VOP2

Many of the outstanding issues with the GPU model are related to
instructions not having SDWA/DPP implementations and executing by
ignoring the special registers, leading to incorrect execution.
Adding SDWA/DPP is currently very cumbersome as there is a lot of
boilerplate code.

This changeset adds helper methods for VOP2 with one instruction
changed as an example. This review is intended to get feedback
before applying this change to all VOP2 instructions that support
SDWA/DPP.
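The refactoring pattern can be sketched as follows. This is a heavily simplified, hypothetical model, not the vop2Helper from the patch. It differs from the patch in four ways:

- It uses four lanes instead of a full wavefront.
- A byte select stands in for the SDWA source processing.
- The operation is per lane rather than a lambda over whole vector operands.
- The helper is named runVop2 to keep it distinct from the patch's vop2Helper.

The idea it shows is that the SDWA handling lives in one helper and each instruction only supplies its arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;  // illustrative; real wavefronts are wider
using Vec = std::array<uint32_t, kLanes>;

// Hypothetical, simplified SDWA source select: keep one byte of each lane.
Vec selectByte(const Vec &v, int byteSel)
{
    Vec out{};
    for (int l = 0; l < kLanes; ++l)
        out[l] = (v[l] >> (8 * byteSel)) & 0xff;
    return out;
}

// Shared helper: apply the optional SDWA source processing once, then run
// the instruction-specific operation on every lane enabled in execMask.
template <typename OpImpl>
Vec runVop2(Vec src0, const Vec &src1, uint64_t execMask,
            bool isSdwa, int src0Sel, OpImpl opImpl)
{
    if (isSdwa)
        src0 = selectByte(src0, src0Sel);
    Vec vdst{};
    for (int l = 0; l < kLanes; ++l)
        if (execMask & (1ULL << l))
            vdst[l] = opImpl(src0[l], src1[l]);
    return vdst;
}

// V_MUL_U32_U24-style body: multiply the low 24 bits of each source.
inline uint32_t mulU24(uint32_t a, uint32_t b)
{
    return (a & 0xffffff) * (b & 0xffffff);
}
```

With this shape, adding SDWA support to another VOP2 instruction only means passing a different operation to the shared helper.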

Change-Id: I1edbc3f3bb166d34f151545aa9f47a94150e1406
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/op_encodings.hh
2 files changed, 97 insertions(+), 52 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 6c014bc..0d3f2dc 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -6384,65 +6384,17 @@
 void
 Inst_VOP2__V_MUL_U32_U24::execute(GPUDynInstPtr gpuDynInst)
 {
-Wavefront *wf = gpuDynInst->wavefront();
-ConstVecOperandU32 src0(gpuDynInst, instData.SRC0);
-VecOperandU32 src1(gpuDynInst, instData.VSRC1);
-VecOperandU32 vdst(gpuDynInst, instData.VDST);
-
-src0.readSrc();
-src1.read();
-
-if (isSDWAInst()) {
-VecOperandU32 src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0);
-// use copies of original src0, src1, and dest during selecting
-VecOperandU32 origSrc0_sdwa(gpuDynInst,
-extData.iFmt_VOP_SDWA.SRC0);
-VecOperandU32 origSrc1(gpuDynInst, instData.VSRC1);
-VecOperandU32 origVdst(gpuDynInst, instData.VDST);
-
-src0_sdwa.read();
-origSrc0_sdwa.read();
-origSrc1.read();
-
-DPRINTF(VEGA, "Handling V_MUL_U32_U24 SRC SDWA. SRC0: register "
-"v[%d], DST_SEL: %d, DST_U: %d, CLMP: %d, SRC0_SEL: "
-"%d, SRC0_SEXT: %d, SRC0_NEG: %d, SRC0_ABS: %d, SRC1_SEL: "
-"%d, SRC1_SEXT: %d, SRC1_NEG: %d, SRC1_ABS: %d\n",
-extData.iFmt_VOP_SDWA.SRC0, extData.iFmt_VOP_SDWA.DST_SEL,
-extData.iFmt_VOP_SDWA.DST_U,
-extData.iFmt_VOP_SDWA.CLMP,
-extData.iFmt_VOP_SDWA.SRC0_SEL,
-extData.iFmt_VOP_SDWA.SRC0_SEXT,
-extData.iFmt_VOP_SDWA.SRC0_NEG,
-extData.iFmt_VOP_SDWA.SRC0_ABS,
-extData.iFmt_VOP_SDWA.SRC1_SEL,
-extData.iFmt_VOP_SDWA.SRC1_SEXT,
-extData.iFmt_VOP_SDWA.SRC1_NEG,
-extData.iFmt_VOP_SDWA.SRC1_ABS);
-
-processSDWA_src(extData.iFmt_VOP_SDWA, src0_sdwa, origSrc0_sdwa,
-src1, origSrc1);
-
-for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
-if (wf->execMask(lane)) {
-vdst[lane] = bits(src0_sdwa[lane], 23, 0) *
- bits(src1[lane], 23, 0);
-origVdst[lane] = vdst[lane]; // keep copy consistent
-}
-}
-
-processSDWA_dst(extData.iFmt_VOP_SDWA, vdst, origVdst);
-} else {
+auto opImpl = [](VecOperandU32& src0, VecOperandU32& src1,
+ VecOperandU32& vdst, Wavefront* wf) {
 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (wf->execMask(lane)) {
 vdst[lane] = bits(src0[lane], 23, 0) *
  bits(src1[lane], 23, 0);
 }
 }
-}
+};

-
-vdst.write();
+vop2Helper(gpuDynInst, opImpl);
 } // execute
 // --- Inst_VOP2__V_MUL_HI_U32_U24 class methods ---

diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh
index 1071ead..f195472 100644
--- a/src/arch/amdgpu/vega/insts/op_encodings.hh
+++ b/src/arch/amdgpu/vega/insts/op_encodings.hh
@@ -272,6 +272,99 @@
 InstFormat extData;
 uint32_t varSize;

+template<typename T>
+T sdwaSrcHelper(GPUDynInstPtr gpuDynInst, T & src1)
+{
+T src0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0);
+// use copies of original src0, src1, and dest during selecting
+T origSrc0_sdwa(gpuDynInst, extData.iFmt_VOP_SDWA.SRC0);
+T origSrc1(gpuDynInst, instData.VSRC1);
+
+src0_sdwa.read();
+origSrc0_sdwa.read();
+origSrc1.read();
+
+DPRINTF(VEGA, 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix nbio psp ring assert

2023-05-16 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email )



Change subject: dev-amdgpu: Fix nbio psp ring assert
..

dev-amdgpu: Fix nbio psp ring assert

The size of the packet changes between ROCm 4.x and ROCm 5.x. Change how
the address is set based on the incoming packet size so that both
versions continue to work for now.

Change-Id: I91694e4760198fd9129e60140df4e863666be2e2
---
M src/dev/amdgpu/amdgpu_nbio.cc
1 file changed, 12 insertions(+), 3 deletions(-)



diff --git a/src/dev/amdgpu/amdgpu_nbio.cc b/src/dev/amdgpu/amdgpu_nbio.cc
index 8064fd2..8722c50 100644
--- a/src/dev/amdgpu/amdgpu_nbio.cc
+++ b/src/dev/amdgpu/amdgpu_nbio.cc
@@ -162,9 +162,18 @@
 AMDGPUNbio::writeFrame(PacketPtr pkt, Addr offset)
 {
 if (offset == psp_ring_listen_addr) {
-assert(pkt->getSize() == 8);
-psp_ring_dev_addr = pkt->getLE<uint64_t>()
-  - gpuDevice->getVM().getSysAddrRangeLow();
+DPRINTF(AMDGPUDevice, "Saw psp_ring_listen_addr with size %ld value "
+"%ld\n", pkt->getSize(), pkt->getUintX(ByteOrder::little));
+
+if (pkt->getSize() == 4) {
+psp_ring_dev_addr = pkt->getLE<uint32_t>();
+} else if (pkt->getSize() == 8) {
+psp_ring_dev_addr = pkt->getUintX(ByteOrder::little)
+  - gpuDevice->getVM().getSysAddrRangeLow();
+} else {
+panic("Invalid write size to psp_ring_listen_addr\n");
+}
+
 DPRINTF(AMDGPUDevice, "Setting PSP ring device address to %#lx\n",
 psp_ring_dev_addr);
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70677?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I91694e4760198fd9129e60140df4e863666be2e2
Gerrit-Change-Number: 70677
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-gcn3,arch-vega: Fix ds_read2st64_b32

2023-05-13 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email )


Change subject: arch-gcn3,arch-vega: Fix ds_read2st64_b32
..

arch-gcn3,arch-vega: Fix ds_read2st64_b32

This instruction has two issues. The first is that it should write two
consecutive registers, starting with vdst, because it is writing two
dwords. The second is that the data assignment to the lanes from the
dynamic instruction should cast to a U32 type; otherwise, the array
index goes out of bounds and returns the wrong data.

The first issue was fixed in GCN3 a few years ago in this review:
https://gem5-review.googlesource.com/c/public/gem5/+/32236. This
changeset makes the same change for Vega and applies the U32 cast in
both ISAs.

Tested with a rocPRIM unit test. The test was failing before this
changeset and now passes.
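The sketch below models the lane unpacking. It assumes d_data holds two consecutive dwords per lane: one for vdst and one for vdst + 1. Reading the buffer as 32-bit elements keeps index lane * 2 + 1 inside it. memcpy replaces reinterpret_cast so the standalone code avoids aliasing pitfalls.

The bytesTouched helper is hypothetical. It shows that a 64-bit element type would read twice as far as the buffer extends. The out-of-bounds index described above suggests the old cast used such a type, but that is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Dword [lane * 2] goes to vdst; dword [lane * 2 + 1] goes to vdst + 1.
void unpackRead2B32(const uint8_t *dData, int numLanes,
                    std::vector<uint32_t> &vdst0,
                    std::vector<uint32_t> &vdst1)
{
    for (int lane = 0; lane < numLanes; ++lane) {
        std::memcpy(&vdst0[lane], dData + (lane * 2) * sizeof(uint32_t),
                    sizeof(uint32_t));
        std::memcpy(&vdst1[lane], dData + (lane * 2 + 1) * sizeof(uint32_t),
                    sizeof(uint32_t));
    }
}

// Bytes that must exist for index [lane * 2 + 1] of the last lane to be
// valid when d_data is viewed as elements of elemSize bytes.
constexpr std::size_t bytesTouched(int numLanes, std::size_t elemSize)
{
    return (static_cast<std::size_t>(numLanes - 1) * 2 + 2) * elemSize;
}
```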

Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70577
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.cc
2 files changed, 5 insertions(+), 5 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc
index 8c51af5..478b1d3 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -32123,9 +32123,9 @@

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
-vdst0[lane] = (reinterpret_cast(
+vdst0[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2];
-vdst1[lane] = (reinterpret_cast(
+vdst1[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2 + 1];
 }
 }
diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 45c8491..6c014bc 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -35665,13 +35665,13 @@
 Inst_DS__DS_READ2ST64_B32::completeAcc(GPUDynInstPtr gpuDynInst)
 {
 VecOperandU32 vdst0(gpuDynInst, extData.VDST);
-VecOperandU32 vdst1(gpuDynInst, extData.VDST + 2);
+VecOperandU32 vdst1(gpuDynInst, extData.VDST + 1);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
-vdst0[lane] = (reinterpret_cast(
+vdst0[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2];
-vdst1[lane] = (reinterpret_cast(
+vdst1[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2 + 1];
 }
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
Gerrit-Change-Number: 70577
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-gcn3,arch-vega: Fix ds_read2st64_b32

2023-05-12 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email )



Change subject: arch-gcn3,arch-vega: Fix ds_read2st64_b32
..

arch-gcn3,arch-vega: Fix ds_read2st64_b32

This instruction has two issues. The first is that it should write two
consecutive registers, starting with vdst, because it is writing two
dwords. The second is that the data assignment to the lanes from the
dynamic instruction should cast to a U32 type; otherwise, the array
index goes out of bounds and returns the wrong data.

The first issue was fixed in GCN3 a few years ago in this review:
https://gem5-review.googlesource.com/c/public/gem5/+/32236. This
changeset makes the same change for Vega and applies the U32 cast in
both ISAs.

Tested with a rocPRIM unit test. The test was failing before this
changeset and now passes.

Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
---
M src/arch/amdgpu/gcn3/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.cc
2 files changed, 5 insertions(+), 5 deletions(-)



diff --git a/src/arch/amdgpu/gcn3/insts/instructions.cc b/src/arch/amdgpu/gcn3/insts/instructions.cc
index 8c51af5..478b1d3 100644
--- a/src/arch/amdgpu/gcn3/insts/instructions.cc
+++ b/src/arch/amdgpu/gcn3/insts/instructions.cc
@@ -32123,9 +32123,9 @@

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
-vdst0[lane] = (reinterpret_cast(
+vdst0[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2];
-vdst1[lane] = (reinterpret_cast(
+vdst1[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2 + 1];
 }
 }
diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 45c8491..6c014bc 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -35665,13 +35665,13 @@
 Inst_DS__DS_READ2ST64_B32::completeAcc(GPUDynInstPtr gpuDynInst)
 {
 VecOperandU32 vdst0(gpuDynInst, extData.VDST);
-VecOperandU32 vdst1(gpuDynInst, extData.VDST + 2);
+VecOperandU32 vdst1(gpuDynInst, extData.VDST + 1);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
-vdst0[lane] = (reinterpret_cast(
+vdst0[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2];
-vdst1[lane] = (reinterpret_cast(
+vdst1[lane] = (reinterpret_cast(
 gpuDynInst->d_data))[lane * 2 + 1];
 }
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70577?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ifb110fc9a36ad198da7eaf86b1e3e37eccd3bb10
Gerrit-Change-Number: 70577
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [M] Change in gem5/gem5[develop]: configs,dev-amdgpu: GPUFS MI200/gfx90a support

2023-05-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70317?usp=email )



Change subject: configs,dev-amdgpu: GPUFS MI200/gfx90a support
..

configs,dev-amdgpu: GPUFS MI200/gfx90a support

Add support for an MI200-like device. This includes adding PCI IDs and
new MMIOs for the device, a different MAP_PROCESS packet, and a
different calculation for the number of VGPRs.
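The sketch below is a hypothetical lookup table collecting the per-device values that appear in these configs: the PCI DeviceID and the gfx version. GpuDeviceInfo and lookupDevice are illustrative names, not gem5 code. Returning an empty optional mirrors the panic the configs raise for an unknown device.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Values below are taken from the config and device diffs in this thread.
struct GpuDeviceInfo {
    uint16_t deviceId;       // PCI DeviceID
    const char *gfxVersion;  // ISA target reported for the device
};

std::optional<GpuDeviceInfo> lookupDevice(const std::string &name)
{
    if (name == "Vega10") return GpuDeviceInfo{0x6863, "gfx900"};
    if (name == "MI100")  return GpuDeviceInfo{0x738C, "gfx908"};
    if (name == "MI200")  return GpuDeviceInfo{0x740F, "gfx90a"};
    return std::nullopt;  // the configs panic on an unknown device
}
```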

Change-Id: I0fb7b3ad928826beaa5386d52a94ba504369cb0d
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/amdgpu.py
M configs/example/gpufs/system/system.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
M src/dev/amdgpu/amdgpu_nbio.cc
M src/dev/amdgpu/amdgpu_nbio.hh
M src/dev/amdgpu/amdgpu_vm.hh
M src/dev/amdgpu/pm4_defines.hh
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_packet_processor.hh
M src/gpu-compute/GPU.py
M src/gpu-compute/gpu_command_processor.cc
M src/gpu-compute/hsa_queue_entry.hh
14 files changed, 173 insertions(+), 27 deletions(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 4c90601..f8ef70d 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -132,8 +132,9 @@
 parser.add_argument(
 "--gpu-device",
 default="Vega10",
-choices=["Vega10", "MI100"],
-help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)",
+choices=["Vega10", "MI100", "MI200"],
+help="GPU model to run: Vega10 (gfx900), MI100 (gfx908), or "
+"MI200 (gfx90a)",
 )


diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py
index 5f98b55..9697e50 100644
--- a/configs/example/gpufs/system/amdgpu.py
+++ b/configs/example/gpufs/system/amdgpu.py
@@ -177,6 +177,10 @@
 system.pc.south_bridge.gpu.DeviceID = 0x738C
 system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
 system.pc.south_bridge.gpu.SubsystemID = 0x0C34
+elif args.gpu_device == "MI200":
+system.pc.south_bridge.gpu.DeviceID = 0x740F
+system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
+system.pc.south_bridge.gpu.SubsystemID = 0x0C34
 elif args.gpu_device == "Vega10":
 system.pc.south_bridge.gpu.DeviceID = 0x6863
 else:
diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 90c5c01..263ffc0 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -152,6 +152,16 @@
 0x7D000,
 ]
 sdma_sizes = [0x1000] * 8
+elif args.gpu_device == "MI200":
+num_sdmas = 5
+sdma_bases = [
+0x4980,
+0x6180,
+0x78000,
+0x79000,
+0x7A000,
+]
+sdma_sizes = [0x1000] * 5
 else:
 m5.util.panic(f"Unknown GPU device {args.gpu_device}")

diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index f58d1f7..734f0d7 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -115,7 +115,7 @@
 sdmaFunc.insert({0x10b, ::setPageDoorbellOffsetLo});
 sdmaFunc.insert({0xe0, ::setPageSize});
 sdmaFunc.insert({0x113, ::setPageWptrLo});
-} else if (p.device_name == "MI100") {
+} else if (p.device_name == "MI100" || p.device_name == "MI200") {
 sdmaFunc.insert({0xd9, ::setPageBaseLo});
 sdmaFunc.insert({0xe1, ::setPageRptrLo});
 sdmaFunc.insert({0xe0, ::setPageRptrHi});
@@ -144,10 +144,19 @@
 if (p.device_name == "Vega10") {
 setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24);
 setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24);
+gfx_version = GfxVersion::gfx900;
 } else if (p.device_name == "MI100") {
 setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24);
 setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24);
 setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory
+gfx_version = GfxVersion::gfx908;
+} else if (p.device_name == "MI200") {
+// This device can have either 64GB or 128GB of device memory.
+// This limits to 16GB for simulation.
+setRegVal(MI200_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(MI200_FB_LOCATION_TOP, mmhubTop >> 24);
+setRegVal(MI200_MEM_SIZE_REG, 0x3ff0);
+gfx_version = GfxVersion::gfx90a;
 } else {
 panic("Unknown GPU device %s\n", p.device_name);
 }
diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh
index cab7991..56ed2f4 100644
--- a/src/dev/amdgpu/amdgpu_device.hh
+++ b/src/dev/amdgpu/amdgpu_device.hh
@@ -42,6 +42,7 @@
 #include "dev/amdgpu/mmio_reader.hh"
 #include "dev/io_device.hh"
 #include "dev/pci/device.hh"
+#include "enums/GfxVersion.hh"
 #include "params/AMDGPUDevice.hh"

 namespace gem5
@@ -145,6 +146,9 @@
  */
 

[gem5-dev] [L] Change in gem5/gem5[develop]: dev-amdgpu: Enable more GPUs with device specific registers

2023-04-27 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70041?usp=email )


Change subject: dev-amdgpu: Enable more GPUs with device specific registers
..

dev-amdgpu: Enable more GPUs with device specific registers

Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.

This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. The offsets used for MMIOs are reworked slightly to use offsets
rather than absolute addresses. This is because we cannot determine the
absolute address in the constructor since the BAR has not been assigned
by the OS yet.

Change-Id: I14b364374e086e185978334425a4e265cf2760d0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70041
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/dev/amdgpu/SConscript
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
A src/dev/amdgpu/amdgpu_nbio.cc
A src/dev/amdgpu/amdgpu_nbio.hh
M src/dev/amdgpu/amdgpu_vm.hh
6 files changed, 369 insertions(+), 46 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/SConscript b/src/dev/amdgpu/SConscript
index 713f0a6..9f8eeac 100644
--- a/src/dev/amdgpu/SConscript
+++ b/src/dev/amdgpu/SConscript
@@ -39,6 +39,7 @@
 tags='x86 isa')

 Source('amdgpu_device.cc', tags='x86 isa')
+Source('amdgpu_nbio.cc', tags='x86 isa')
 Source('amdgpu_vm.cc', tags='x86 isa')
 Source('interrupt_handler.cc', tags='x86 isa')
 Source('memory_manager.cc', tags='x86 isa')
diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 2acf1f4..f58d1f7 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -34,6 +34,7 @@
 #include 

 #include "debug/AMDGPUDevice.hh"
+#include "dev/amdgpu/amdgpu_nbio.hh"
 #include "dev/amdgpu/amdgpu_vm.hh"
 #include "dev/amdgpu/interrupt_handler.hh"
 #include "dev/amdgpu/pm4_packet_processor.hh"
@@ -129,6 +130,32 @@
 pm4PktProc->setGPUDevice(this);
 cp->hsaPacketProc().setGPUDevice(this);
 cp->setGPUDevice(this);
+
+// Address aperture for device memory. We tell this to the driver and
+// could possibly be anything, but these are the values used by hardware.
+uint64_t mmhubBase = 0x8000ULL << 24;
+uint64_t mmhubTop = 0x83ffULL << 24;
+
+// These are hardcoded register values to return what the driver expects
+setRegVal(AMDGPU_MP0_SMN_C2PMSG_33, 0x8000);
+
+// There are different registers for different GPUs, so we set the value
+// based on the GPU type specified by the user.
+if (p.device_name == "Vega10") {
+setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24);
+} else if (p.device_name == "MI100") {
+setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24);
+setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory
+} else {
+panic("Unknown GPU device %s\n", p.device_name);
+}
+
+gpuvm.setMMHUBBase(mmhubBase);
+gpuvm.setMMHUBTop(mmhubTop);
+
+nbio.setGPUDevice(this);
 }

 void
@@ -236,35 +263,25 @@
  * first, ignoring any writes from driver. (2) Any other address from
  * device backing store / abstract memory class functionally.
  */
-if (offset == 0xa28000) {
-/*
- * Handle special counter addresses in framebuffer. These counter
- * addresses expect the read to return previous value + 1.
- */
-if (regs.find(pkt->getAddr()) == regs.end()) {
-regs[pkt->getAddr()] = 1;
-} else {
-regs[pkt->getAddr()]++;
-}
-
-pkt->setUintX(regs[pkt->getAddr()], ByteOrder::little);
-} else {
-/*
- * Read the value from device memory. This must be done functionally
- * because this method is called by the PCIDevice::read method which
- * is a non-timing read.
- */
-RequestPtr req = std::make_shared(offset, pkt->getSize(), 0,
-   vramRequestorId());
-PacketPtr readPkt = Packet::createRead(req);
-uint8_t *dataPtr = new uint8_t[pkt->getSize()];
-readPkt->dataDynamic(dataPtr);
-
-auto system = cp->shader()->gpuCmdProc.system();
-system->getDeviceMemory(readPkt)->access(readPkt);
-
-pkt->setUintX(readPkt->getUintX(ByteOrder::little), ByteOrder::little);
+if (nbio.readFrame(pkt, offset)) {
+return;
 }
+
+

[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Refactor MMIO interface for SDMA engines

2023-04-27 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70040?usp=email )


(1 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: dev-amdgpu: Refactor MMIO interface for SDMA engines
..

dev-amdgpu: Refactor MMIO interface for SDMA engines

Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.

To support a variable number of SDMAs, and because the MMIO offsets of
each device may differ, the MMIO interface for SDMAs is changed to use an
SDMA class method dispatch table which forwards a 32-bit value from the
MMIO packet to the SDMA MMIO functions, which have the format
`void method(uint32_t)`. Several changes are made to enable this:

 - Allow the SDMA to have a variable MMIO base and size. These are
   configured in python.
 - An SDMA class method dispatch table which contains the MMIO offset
   relative to the SDMA's MMIO base address.
 - An updated writeMMIO method to iterate over the SDMA MMIO address
   ranges and call the appropriate SDMA MMIO method which matches the
   MMIO offset.
 - Moved all SDMA related MMIO data bit twiddling, masking, etc. into
   the MMIO methods themselves instead of in the writeMMIO method in
   SDMAEngine.
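The dispatch scheme in the list above can be sketched in Python. This is a hypothetical model, not the gem5 SDMAEngine: the register offsets, the `set_gfx_base`/`set_gfx_rptr` method names, and the shift in `set_gfx_rptr` are illustrative. Only the Vega10 bases and sizes (0x4980, 0x5180, size 0x800) come from this change's config.

```python
# Hypothetical model of an offset-keyed SDMA MMIO dispatch table.
# Register offsets, method names, and the bit twiddling are illustrative.


class SdmaEngine:
    def __init__(self, mmio_base, mmio_size):
        self.mmio_base = mmio_base
        self.mmio_size = mmio_size
        self.gfx_base = 0
        self.gfx_rptr = 0
        # Offset relative to this engine's MMIO base -> method taking a
        # 32-bit value, mirroring the `void method(uint32_t)` format.
        self.mmio_methods = {
            0x0: self.set_gfx_base,
            0x4: self.set_gfx_rptr,
        }

    def contains(self, offset):
        return self.mmio_base <= offset < self.mmio_base + self.mmio_size

    # Any masking or bit twiddling lives inside the MMIO method itself.
    def set_gfx_base(self, data):
        self.gfx_base = data

    def set_gfx_rptr(self, data):
        self.gfx_rptr = data >> 2  # illustrative twiddling only


def write_mmio(sdmas, offset, data):
    """Dispatch a write to the engine whose MMIO range covers `offset`."""
    for sdma in sdmas:
        if sdma.contains(offset):
            method = sdma.mmio_methods.get(offset - sdma.mmio_base)
            if method is not None:
                method(data & 0xFFFFFFFF)  # forward only a 32-bit value
            return True
    return False


# Two engines using the Vega10 bases and sizes from the config diff.
sdmas = [SdmaEngine(base, 0x800) for base in (0x4980, 0x5180)]
write_mmio(sdmas, 0x5180 + 0x4, 0x100)
print(hex(sdmas[1].gfx_rptr))  # prints 0x40
```

Keying each table by offset relative to the engine's own base lets one set of methods serve engines at different addresses, so two or eight engines differ only in their configured bases and sizes.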

Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70040
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M configs/example/gpufs/system/system.py
M src/dev/amdgpu/AMDGPU.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
M src/dev/amdgpu/interrupt_handler.cc
M src/dev/amdgpu/interrupt_handler.hh
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
8 files changed, 182 insertions(+), 57 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 93f0194..90c5c01 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -129,15 +129,45 @@
 device_ih = AMDGPUInterruptHandler()
 system.pc.south_bridge.gpu.device_ih = device_ih

-# Setup the SDMA engines
-sdma0_pt_walker = VegaPagetableWalker()
-sdma1_pt_walker = VegaPagetableWalker()
+# Setup the SDMA engines depending on device. The MMIO base addresses
+# can be found in the driver code under:
+# include/asic_reg/sdmaX/sdmaX_Y_Z_offset.h
+num_sdmas = 2
+sdma_bases = []
+sdma_sizes = []
+if args.gpu_device == "Vega10":
+num_sdmas = 2
+sdma_bases = [0x4980, 0x5180]
+sdma_sizes = [0x800] * 2
+elif args.gpu_device == "MI100":
+num_sdmas = 8
+sdma_bases = [
+0x4980,
+0x6180,
+0x78000,
+0x79000,
+0x7A000,
+0x7B000,
+0x7C000,
+0x7D000,
+]
+sdma_sizes = [0x1000] * 8
+else:
+m5.util.panic(f"Unknown GPU device {args.gpu_device}")

-sdma0 = SDMAEngine(walker=sdma0_pt_walker)
-sdma1 = SDMAEngine(walker=sdma1_pt_walker)
+sdma_pt_walkers = []
+sdma_engines = []
+for sdma_idx in range(num_sdmas):
+sdma_pt_walker = VegaPagetableWalker()
+sdma_engine = SDMAEngine(
+walker=sdma_pt_walker,
+mmio_base=sdma_bases[sdma_idx],
+mmio_size=sdma_sizes[sdma_idx],
+)
+sdma_pt_walkers.append(sdma_pt_walker)
+sdma_engines.append(sdma_engine)

-system.pc.south_bridge.gpu.sdma0 = sdma0
-system.pc.south_bridge.gpu.sdma1 = sdma1
+system.pc.south_bridge.gpu.sdmas = sdma_engines

 # Setup PM4 packet processor
 pm4_pkt_proc = PM4PacketProcessor()
@@ -155,22 +185,22 @@
 system._dma_ports.append(gpu_hsapp)
 system._dma_ports.append(gpu_cmd_proc)
 system._dma_ports.append(system.pc.south_bridge.gpu)
-system._dma_ports.append(sdma0)
-system._dma_ports.append(sdma1)
+for sdma in sdma_engines:
+system._dma_ports.append(sdma)
 system._dma_ports.append(device_ih)
 system._dma_ports.append(pm4_pkt_proc)
 system._dma_ports.append(system_hub)
 system._dma_ports.append(gpu_mem_mgr)
 system._dma_ports.append(hsapp_pt_walker)
 system._dma_ports.append(cp_pt_walker)
-system._dma_ports.append(sdma0_pt_walker)
-system._dma_ports.append(sdma1_pt_walker)
+for sdma_pt_walker in sdma_pt_walkers:
+system._dma_ports.append(sdma_pt_walker)

 gpu_hsapp.pio = system.iobus.mem_side_ports
 gpu_cmd_proc.pio = system.iobus.mem_side_ports
   

[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Default MMIO reads when previously written

2023-04-27 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email )


(1 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: dev-amdgpu: Default MMIO reads when previously written
..

dev-amdgpu: Default MMIO reads when previously written

If an MMIO register was previously written and the driver reads it, we
should return the value that was previously written. This overrides the
MMIO trace value, which is the last-resort fallback for finding an MMIO
value. This is needed to initialize newer GPU devices in gem5.
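The read priority described above can be sketched as a small Python function. This is a hypothetical model: the dictionaries stand in for the MMIO trace and the `regs` map in the diff, and returning 0 for an address missing from both is an assumption of the sketch.

```python
# Hypothetical model of the MMIO read fallback order: the trace supplies
# a default, and a value the driver wrote earlier takes precedence.


def read_mmio(addr, trace, written):
    value = trace.get(addr, 0)  # last-resort fallback (0 is an assumption)
    if addr in written:         # driver wrote this register before
        value = written[addr]
    return value


trace = {0x1000: 0xAAAA, 0x1004: 0xBBBB}
written = {0x1004: 0x1234}      # a previous driver write
print(hex(read_mmio(0x1004, trace, written)))  # prints 0x1234
```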

Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70039
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 7 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3605882..7e6304a 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -248,6 +248,13 @@
 DPRINTF(AMDGPUDevice, "Read MMIO %#lx\n", offset);
 mmioReader.readFromTrace(pkt, MMIO_BAR, offset);

+if (regs.find(pkt->getAddr()) != regs.end()) {
+uint64_t value = regs[pkt->getAddr()];
+DPRINTF(AMDGPUDevice, "Reading what kernel wrote before: %#x\n",
+value);
+pkt->setUintX(value, ByteOrder::little);
+}
+
 switch (aperture) {
   case NBIO_BASE:
 switch (aperture_offset) {

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Gerrit-Change-Number: 70039
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu,configs: Add human readable names for different GPUs

2023-04-27 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email )


Change subject: dev-amdgpu,configs: Add human readable names for different GPUs
..

dev-amdgpu,configs: Add human readable names for different GPUs

Add a human readable string for GPU device names rather than using the
device ID in the code. This is intended to make code more readable.
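One way to picture the name-to-ID mapping is a lookup table. The Python below is a hypothetical, table-driven sketch rather than the if/elif chain the change uses. `GPU_DEVICE_IDS` and `pci_ids_for` are illustrative names, and the ID values are the ones set in this change's amdgpu.py diff.

```python
from types import SimpleNamespace

# Hypothetical table mapping human-readable device names to PCI IDs.
# Values match those assigned in this change's amdgpu.py.
GPU_DEVICE_IDS = {
    "Vega10": {"DeviceID": 0x6863},
    "MI100": {
        "DeviceID": 0x738C,
        "SubsystemVendorID": 0x1002,
        "SubsystemID": 0x0C34,
    },
}


def pci_ids_for(device_name):
    try:
        return GPU_DEVICE_IDS[device_name]
    except KeyError:
        raise ValueError(f"Unknown GPU device: {device_name}") from None


# Stand-in for the simulated GPU object; apply the IDs by attribute name.
gpu = SimpleNamespace(device_name="MI100")
for attr, value in pci_ids_for(gpu.device_name).items():
    setattr(gpu, attr, value)
print(hex(gpu.DeviceID))  # prints 0x738c
```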

Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70038
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Maintainer: Matt Sinclair 
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/amdgpu.py
M src/dev/amdgpu/AMDGPU.py
3 files changed, 24 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass
  Jason Lowe-Power: Looks good to me, approved




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 52b79ab..4c90601 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -126,6 +126,16 @@
 help="type of memory to use",
 )

+# These are the models that are both supported in gem5 and supported
+# by the versions of ROCm supported by gem5 in full system mode. For
+# other gfx versions there is some support in syscall emulation mode.
+parser.add_argument(
+"--gpu-device",
+default="Vega10",
+choices=["Vega10", "MI100"],
+help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)",
+)
+

 def runGpuFSSystem(args):
 """
diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py
index 1fd3e2f..5f98b55 100644
--- a/configs/example/gpufs/system/amdgpu.py
+++ b/configs/example/gpufs/system/amdgpu.py
@@ -170,3 +170,14 @@
 system.pc.south_bridge.gpu.checkpoint_before_mmios = (
 args.checkpoint_before_mmios
 )
+
+system.pc.south_bridge.gpu.device_name = args.gpu_device
+
+if args.gpu_device == "MI100":
+system.pc.south_bridge.gpu.DeviceID = 0x738C
+system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
+system.pc.south_bridge.gpu.SubsystemID = 0x0C34
+elif args.gpu_device == "Vega10":
+system.pc.south_bridge.gpu.DeviceID = 0x6863
+else:
+panic("Unknown GPU device: {}".format(args.gpu_device))
diff --git a/src/dev/amdgpu/AMDGPU.py b/src/dev/amdgpu/AMDGPU.py
index f9d953f..1e78672 100644
--- a/src/dev/amdgpu/AMDGPU.py
+++ b/src/dev/amdgpu/AMDGPU.py
@@ -46,6 +46,9 @@
 cxx_header = "dev/amdgpu/amdgpu_device.hh"
 cxx_class = "gem5::AMDGPUDevice"

+# Human readable name for device ID
+device_name = Param.String("Vega10", "Codename for device")
+
 # IDs for AMD Vega 10
 VendorID = 0x1002
 DeviceID = 0x6863

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a
Gerrit-Change-Number: 70038
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add decodings for new MI100 VOP2 insts

2023-04-27 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70042?usp=email )


(1 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: arch-vega: Add decodings for new MI100 VOP2 insts
..

arch-vega: Add decodings for new MI100 VOP2 insts

VOP2 instructions with opcodes 55-61 were added in MI100 and are not
present in Vega10. This changeset adds the decodings for these
instructions.

The changeset does not implement the instructions; however, the fatal
message is much more helpful for debugging than a generic
decode_invalid handler.
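The 28 replaced table entries correspond to 7 opcodes with 4 entries each. The Python sketch below shows why, assuming the top-level Vega decode table has 512 entries and is indexed by instruction bits [31:23]. That indexing is an assumption of the sketch. A VOP2 word has bit 31 = 0 and a 6-bit opcode in bits [30:25], so bits [24:23] (the low bits of VDST) are "don't care" and each opcode fills four consecutive slots.

```python
# Sketch, assuming a 512-entry top-level table indexed by bits [31:23].
# Opcode-to-name mapping follows the order of the entries in the diff.
NEW_MI100_VOP2 = {
    55: "V_DOT2C_F32_F16",
    56: "V_DOT2C_I32_I16",
    57: "V_DOT4C_I32_I8",
    58: "V_DOT8C_I32_I4",
    59: "V_FMAC_F32",
    60: "V_PK_FMAC_F16",
    61: "V_XNOR_B32",
}


def table_index(inst_word):
    """Top-level table index: instruction bits [31:23]."""
    return (inst_word >> 23) & 0x1FF


def table_slots(vop2_opcode):
    """All top-level indices that decode to this VOP2 opcode."""
    base = vop2_opcode << 2  # bit 31 = 0, opcode in bits [30:25]
    return list(range(base, base + 4))


for opcode, name in NEW_MI100_VOP2.items():
    print(opcode, name, table_slots(opcode))
```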

Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70042
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/gpu_decoder.hh
2 files changed, 84 insertions(+), 28 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index 291dd69..fd3a803 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -274,34 +274,34 @@
 ::decode_OP_VOP2__V_SUBREV_U32,
 ::decode_OP_VOP2__V_SUBREV_U32,
 ::decode_OP_VOP2__V_SUBREV_U32,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
 ::subDecode_OP_VOPC,
 ::subDecode_OP_VOPC,
 ::subDecode_OP_VOPC,
@@ -4172,6 +4172,55 @@
 }

 GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT2C_F32_F16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT2C_I32_I16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT4C_I32_I8(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT8C_I32_I4(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_FMAC_F32(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_PK_FMAC_F16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_XNOR_B32(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
 Decoder::decode_OP_SOP2__S_ADD_U32(MachInst iFmt)
 {
 return new Inst_SOP2__S_ADD_U32(>iFmt_SOP2);
diff --git 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Add writeROM method

2023-04-22 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email )


Change subject: dev-amdgpu: Add writeROM method
..

dev-amdgpu: Add writeROM method

For non-KVM CPUs, the VBIOS memory falls into an I/O hole and is
therefore routed to the PIO bus in gem5, which routes it to the GPU in
the case of a ROM write. We write to the ROM as a way to "load" the VBIOS
without creating holes in the KVM VM.

This write method allows the same scripts used with KVM to be used here:
they write to the ROM area, overwriting whatever may already be there
from the --gpu-rom option.
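The writeROM logic can be modeled in a few lines of Python. This is a hypothetical sketch: `ROM_START` and `ROM_SIZE` are assumptions taken from the GPUFS run scripts, which load the VBIOS with `dd ... bs=1k seek=768 count=128` (768 KiB = 0xC0000, 128 KiB long). The steps mirror the C++: check the range, compute the offset from the range start, and copy the packet's little-endian bytes into the ROM image.

```python
# Hypothetical model of writeROM. ROM_START/ROM_SIZE are assumptions
# derived from the run scripts' `dd bs=1k seek=768 count=128`.
ROM_START = 768 * 1024  # 0xC0000
ROM_SIZE = 128 * 1024

rom = bytearray(ROM_SIZE)


def is_rom(addr):
    return ROM_START <= addr < ROM_START + ROM_SIZE


def write_rom(addr, data, size):
    """Copy a little-endian value of `size` bytes into the ROM image."""
    assert is_rom(addr) and is_rom(addr + size - 1)
    rom_offset = addr - ROM_START
    rom[rom_offset:rom_offset + size] = data.to_bytes(size, "little")
    return rom_offset


offset = write_rom(0xC0010, 0xDEADBEEF, 4)
print(hex(offset), rom[offset:offset + 4].hex())  # prints 0x10 efbeadde
```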

Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/70037
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Maintainer: Matt Sinclair 
---
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
2 files changed, 23 insertions(+), 0 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index cb180b6..3605882 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -107,6 +107,20 @@
 pkt->getAddr(), rom_offset, rom_data);
 }

+void
+AMDGPUDevice::writeROM(PacketPtr pkt)
+{
+assert(isROM(pkt->getAddr()));
+
+Addr rom_offset = pkt->getAddr() - romRange.start();
+uint64_t rom_data = pkt->getUintX(ByteOrder::little);
+
+memcpy(rom.data() + rom_offset, _data, pkt->getSize());
+
+DPRINTF(AMDGPUDevice, "Write to addr %#x on ROM offset %#x data: %#x\n",
+pkt->getAddr(), rom_offset, rom_data);
+}
+
 AddrRangeList
 AMDGPUDevice::getAddrRanges() const
 {
@@ -386,6 +400,14 @@
 Tick
 AMDGPUDevice::write(PacketPtr pkt)
 {
+if (isROM(pkt->getAddr())) {
+writeROM(pkt);
+
+dispatchAccess(pkt, false);
+
+return pioDelay;
+}
+
 int barnum = -1;
 Addr offset = 0;
 getBAR(pkt->getAddr(), barnum, offset);
diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh
index ac31b95..b64067a 100644
--- a/src/dev/amdgpu/amdgpu_device.hh
+++ b/src/dev/amdgpu/amdgpu_device.hh
@@ -94,6 +94,7 @@
 AddrRange romRange;
 bool isROM(Addr addr) const { return romRange.contains(addr); }
 void readROM(PacketPtr pkt);
+void writeROM(PacketPtr pkt);

 std::array rom;


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Gerrit-Change-Number: 70037
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Use higher dmesg level for GPUFS

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email )


Change subject: configs: Use higher dmesg level for GPUFS
..

configs: Use higher dmesg level for GPUFS

The dmesg level is currently set to 3, which will not display errors if
the amdgpu driver fails to load. Changing to level 8 shows errors in the
gem5 terminal without being too spammy. This will help GPUFS developers
with bug reports, since we will actually be able to observe an error.
Currently, if the driver fails to load there is no way to detect it, and
applications will attempt to run, usually failing when getting device
properties.
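Level 3 hides errors because `dmesg -n LEVEL` prints to the console only messages whose priority number is lower than LEVEL. KERN_ERR is priority 3, so it is suppressed at level 3 and shown at level 8. The Python sketch below models that rule using the standard kernel log-level names.

```python
# Kernel log priorities 0..7, from most to least severe.
KERN_LEVELS = ["EMERG", "ALERT", "CRIT", "ERR",
               "WARNING", "NOTICE", "INFO", "DEBUG"]


def console_visible(console_level):
    """Priorities that reach the console at a given `dmesg -n` level."""
    return [name for prio, name in enumerate(KERN_LEVELS)
            if prio < console_level]


print(console_visible(3))  # prints ['EMERG', 'ALERT', 'CRIT']
print(console_visible(8))  # all eight levels, including ERR
```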

Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69977
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M configs/example/gpufs/hip_cookbook.py
M configs/example/gpufs/hip_rodinia.py
M configs/example/gpufs/hip_samples.py
M configs/example/gpufs/vega10_kvm.py
4 files changed, 4 insertions(+), 4 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/hip_cookbook.py b/configs/example/gpufs/hip_cookbook.py
index 87c7547..6a7bb42 100644
--- a/configs/example/gpufs/hip_cookbook.py
+++ b/configs/example/gpufs/hip_cookbook.py
@@ -42,7 +42,7 @@
 cookbook_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/hip_rodinia.py b/configs/example/gpufs/hip_rodinia.py
index 8ed951b..b8a7858 100644
--- a/configs/example/gpufs/hip_rodinia.py
+++ b/configs/example/gpufs/hip_rodinia.py
@@ -43,7 +43,7 @@
 rodinia_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/hip_samples.py b/configs/example/gpufs/hip_samples.py
index ccc1719..9f83c25 100644
--- a/configs/example/gpufs/hip_samples.py
+++ b/configs/example/gpufs/hip_samples.py
@@ -42,7 +42,7 @@
 samples_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/vega10_kvm.py b/configs/example/gpufs/vega10_kvm.py
index 54253be..9c7e457 100644
--- a/configs/example/gpufs/vega10_kvm.py
+++ b/configs/example/gpufs/vega10_kvm.py
@@ -44,7 +44,7 @@
 demo_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d
Gerrit-Change-Number: 69977
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add simple check for valid GPU MMIO trace

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email )


Change subject: configs: Add simple check for valid GPU MMIO trace
..

configs: Add simple check for valid GPU MMIO trace

The MMIO trace file is a required input to the simulator for GPUFS.
Several users appear to be confused and are not providing this input.
This usually results in the amdgpu driver failing to load, and the
application under test exits along with it.

This changeset adds a simple MD5 checksum check that compares the
provided file against the known good MMIO trace located in the
gem5-resources repository.
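A similar check can be sketched as a standalone Python helper. This is a hypothetical variant, not the committed code: `file_md5` and `mmio_trace_is_valid` are illustrative names. Unlike the one-liner in the diff, it reads the file in chunks inside a `with` block, so the file is closed and large traces are not loaded into memory at once. The expected digest is the one used in this change.

```python
import hashlib

# Known-good digest from this change.
EXPECTED_MMIO_MD5 = "c4ff3326ae8a036e329b8b595c83bd6d"


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mmio_trace_is_valid(path, expected=EXPECTED_MMIO_MD5):
    return file_md5(path) == expected
```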

Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69978
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M configs/example/gpufs/runfs.py
1 file changed, 6 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 4a28068a..52b79ab 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -30,6 +30,7 @@
 # System includes
 import argparse
 import math
+import hashlib

 # gem5 related
 import m5
@@ -145,6 +146,11 @@
 math.ceil(float(n_cu) / args.cu_per_scalar_cache)
 )

+# Verify MMIO trace is valid
+mmio_md5 = hashlib.md5(open(args.gpu_mmio_trace, "rb").read()).hexdigest()
+if mmio_md5 != "c4ff3326ae8a036e329b8b595c83bd6d":
+m5.util.panic("MMIO file does not match gem5 resources")
+
 system = makeGpuFSSystem(args)

 root = Root(

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
Gerrit-Change-Number: 69978
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Allow other CPU types in GPUFS

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email )


Change subject: configs: Allow other CPU types in GPUFS
..

configs: Allow other CPU types in GPUFS

Previously the CPU type and memory mode were hardcoded for KVM because
of a deadlock bug. Recent testing shows that this deadlock no longer
occurs with the simple CPU models. This changeset therefore changes the
configs to allow other CPU models, as a first step toward lifting the
KVM requirement from GPUFS.
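The selection logic in this change's system.py diff reduces to two decisions, sketched below in Python. This is a hypothetical model: `gpufs_cpu_setup` is an illustrative name, and the substring test is only a stand-in for `ObjectList.is_kvm_cpu()`. An "atomic" memory mode is upgraded to "atomic_noncaching", and a KvmVM object is created only when the chosen CPU is a KVM CPU.

```python
# Hypothetical model of the CPU-type handling in the system.py diff.


def gpufs_cpu_setup(cpu_type, mem_mode):
    """Return (memory mode to use, whether a KvmVM object is needed)."""
    if mem_mode == "atomic":
        mem_mode = "atomic_noncaching"
    # Stand-in for ObjectList.is_kvm_cpu(TestCPUClass).
    needs_kvm_vm = "Kvm" in cpu_type
    return mem_mode, needs_kvm_vm


print(gpufs_cpu_setup("AtomicSimpleCPU", "atomic"))
# prints ('atomic_noncaching', False)
```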

Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/69979
Tested-by: kokoro 
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
---
M configs/example/gpufs/system/system.py
1 file changed, 6 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index a1b59ef..93f0194 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -61,7 +61,9 @@
 panic("Need at least 2GB of system memory to load amdgpu module")

 # Use the common FSConfig to setup a Linux X86 System
-(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
+(TestCPUClass, test_mem_mode) = Simulation.getCPUClass(args.cpu_type)
+if test_mem_mode == "atomic":
+test_mem_mode = "atomic_noncaching"
 disks = [args.disk_image]
 if args.second_disk is not None:
 disks.extend([args.second_disk])
@@ -91,10 +93,11 @@

 # Create specified number of CPUs. GPUFS really only needs one.
 system.cpu = [
-X86KvmCPU(clk_domain=system.cpu_clk_domain, cpu_id=i)
+TestCPUClass(clk_domain=system.cpu_clk_domain, cpu_id=i)
 for i in range(args.num_cpus)
 ]
-system.kvm_vm = KvmVM()
+if ObjectList.is_kvm_cpu(TestCPUClass):
+system.kvm_vm = KvmVM()

 # Create AMDGPU and attach to southbridge
 shader = createGPU(system, args)

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email


Gerrit-MessageType: merged
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
Gerrit-Change-Number: 69979
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add decodings for new MI100 VOP2 insts

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70042?usp=email )



Change subject: arch-vega: Add decodings for new MI100 VOP2 insts
..

arch-vega: Add decodings for new MI100 VOP2 insts

VOP2 instructions with opcodes 55-61 were added in MI100 and are not
present in Vega10. This changeset adds the decodings for these
instructions.

The changeset does not implement the instructions; however, the fatal
message is much more helpful for debugging than a generic
decode_invalid handler.

Change-Id: Ibde0880c35ff915bf8e50772df9ce263e55ca893
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/gpu_decoder.hh
2 files changed, 84 insertions(+), 28 deletions(-)



diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index 291dd69..fd3a803 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -274,34 +274,34 @@
 ::decode_OP_VOP2__V_SUBREV_U32,
 ::decode_OP_VOP2__V_SUBREV_U32,
 ::decode_OP_VOP2__V_SUBREV_U32,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
-::decode_invalid,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_F32_F16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT2C_I32_I16,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT4C_I32_I8,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_DOT8C_I32_I4,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_FMAC_F32,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_PK_FMAC_F16,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
+::decode_OP_VOP2__V_XNOR_B32,
 ::subDecode_OP_VOPC,
 ::subDecode_OP_VOPC,
 ::subDecode_OP_VOPC,
@@ -4172,6 +4172,55 @@
 }

 GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT2C_F32_F16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT2C_I32_I16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT4C_I32_I8(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_DOT8C_I32_I4(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_FMAC_F32(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_PK_FMAC_F16(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
+Decoder::decode_OP_VOP2__V_XNOR_B32(MachInst iFmt)
+{
+fatal("Trying to decode instruction without a class\n");
+return nullptr;
+}
+
+GPUStaticInst*
 Decoder::decode_OP_SOP2__S_ADD_U32(MachInst iFmt)
 {
 return new Inst_SOP2__S_ADD_U32(&iFmt->iFmt_SOP2);
diff --git a/src/arch/amdgpu/vega/gpu_decoder.hh b/src/arch/amdgpu/vega/gpu_decoder.hh
index 1be4386..af989e0 100644
--- a/src/arch/amdgpu/vega/gpu_decoder.hh
+++ b/src/arch/amdgpu/vega/gpu_decoder.hh
@@ -1358,6 +1358,13 @@
 GPUStaticInst* decode_OP_VOP2__V_ADD_U32(MachInst);
 GPUStaticInst* decode_OP_VOP2__V_SUB_U32(MachInst);
 GPUStaticInst* 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Add writeROM method

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email )



Change subject: dev-amdgpu: Add writeROM method
..

dev-amdgpu: Add writeROM method

For non-KVM CPUs, the VBIOS memory falls into an I/O hole and is
therefore routed to the PIO bus in gem5, which delivers ROM writes to
the GPU. We write to the ROM as a way to "load" the VBIOS without
creating holes in the KVM VM.

This write method allows the same scripts used with KVM to be used with
other CPUs: they write to the ROM area, overwriting whatever may already
be there from the --gpu-rom option.
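A minimal Python model of the new write path follows. The ROM placement is an assumption derived from the run scripts' `dd ... bs=1k seek=768 count=128` command (768 KiB = 0xC0000, 128 KiB = 0x20000); `RomModel` and its methods are hypothetical stand-ins for `isROM`, `readROM` and `writeROM`.

```python
# Toy model of AMDGPUDevice's ROM handling; not the gem5 implementation.
ROM_START = 0xC0000      # assumed: dd seek=768 with bs=1k -> 768 KiB
ROM_SIZE = 128 * 1024    # assumed: dd count=128 with bs=1k -> 128 KiB


class RomModel:
    def __init__(self, start=ROM_START, size=ROM_SIZE):
        self.start = start
        self.rom = bytearray(size)

    def is_rom(self, addr):
        return self.start <= addr < self.start + len(self.rom)

    def write_rom(self, addr, value, size):
        # Like writeROM: offset into the ROM, then copy the little-endian
        # packet data over whatever was loaded before (e.g. via --gpu-rom).
        assert self.is_rom(addr) and self.is_rom(addr + size - 1)
        offset = addr - self.start
        self.rom[offset:offset + size] = value.to_bytes(size, "little")

    def read_rom(self, addr, size):
        assert self.is_rom(addr)
        offset = addr - self.start
        return int.from_bytes(self.rom[offset:offset + size], "little")
```

Writing `0xAA55` at the ROM start stores the bytes `55 AA`, which is what a later 2-byte little-endian read returns.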

Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
---
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
2 files changed, 23 insertions(+), 0 deletions(-)



diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index cb180b6..3605882 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -107,6 +107,20 @@
 pkt->getAddr(), rom_offset, rom_data);
 }

+void
+AMDGPUDevice::writeROM(PacketPtr pkt)
+{
+assert(isROM(pkt->getAddr()));
+
+Addr rom_offset = pkt->getAddr() - romRange.start();
+uint64_t rom_data = pkt->getUintX(ByteOrder::little);
+
+memcpy(rom.data() + rom_offset, &rom_data, pkt->getSize());
+
+DPRINTF(AMDGPUDevice, "Write to addr %#x on ROM offset %#x data: %#x\n",
+pkt->getAddr(), rom_offset, rom_data);
+}
+
 AddrRangeList
 AMDGPUDevice::getAddrRanges() const
 {
@@ -386,6 +400,14 @@
 Tick
 AMDGPUDevice::write(PacketPtr pkt)
 {
+if (isROM(pkt->getAddr())) {
+writeROM(pkt);
+
+dispatchAccess(pkt, false);
+
+return pioDelay;
+}
+
 int barnum = -1;
 Addr offset = 0;
 getBAR(pkt->getAddr(), barnum, offset);
diff --git a/src/dev/amdgpu/amdgpu_device.hh b/src/dev/amdgpu/amdgpu_device.hh
index ac31b95..b64067a 100644
--- a/src/dev/amdgpu/amdgpu_device.hh
+++ b/src/dev/amdgpu/amdgpu_device.hh
@@ -94,6 +94,7 @@
 AddrRange romRange;
 bool isROM(Addr addr) const { return romRange.contains(addr); }
 void readROM(PacketPtr pkt);
+void writeROM(PacketPtr pkt);

 std::array rom;


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70037?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8c2d2aa05a823569a774dfdd3bf2d2e773f38683
Gerrit-Change-Number: 70037
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [XS] Change in gem5/gem5[develop]: dev-amdgpu: Default MMIO reads when previously written

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email )



Change subject: dev-amdgpu: Default MMIO reads when previously written
..

dev-amdgpu: Default MMIO reads when previously written

If an MMIO register was previously written and the driver reads it, we
should return the value that was previously written. This overrides the
value from the MMIO trace, which is the last-resort fallback for finding
an MMIO value. This is needed to initialize newer GPU devices in gem5.
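The lookup order described above can be sketched as follows, assuming the trace is a plain address-to-value map and that an address missing from both sources reads as 0 (both are assumptions; `mmio_read` and `mmio_write` are hypothetical names).

```python
# Sketch of the MMIO read precedence; not gem5's AMDGPUDevice code.


def mmio_write(addr, value, written_regs):
    # Remember what the driver wrote so later reads can return it.
    written_regs[addr] = value


def mmio_read(addr, trace_values, written_regs):
    # Last-resort fallback: the value recorded in the MMIO trace
    # (assumed to be 0 if the trace has no entry for this address).
    value = trace_values.get(addr, 0)
    # A value the driver wrote earlier takes precedence over the trace.
    if addr in written_regs:
        value = written_regs[addr]
    return value
```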

Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
---
M src/dev/amdgpu/amdgpu_device.cc
1 file changed, 7 insertions(+), 0 deletions(-)



diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 3605882..7e6304a 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -248,6 +248,13 @@
 DPRINTF(AMDGPUDevice, "Read MMIO %#lx\n", offset);
 mmioReader.readFromTrace(pkt, MMIO_BAR, offset);

+if (regs.find(pkt->getAddr()) != regs.end()) {
+uint64_t value = regs[pkt->getAddr()];
+DPRINTF(AMDGPUDevice, "Reading what kernel wrote before: %#x\n",
+value);
+pkt->setUintX(value, ByteOrder::little);
+}
+
 switch (aperture) {
   case NBIO_BASE:
 switch (aperture_offset) {

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70039?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ida2435290b706288e88518b5d920691cdb6dcc09
Gerrit-Change-Number: 70039
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Refactor MMIO interface for SDMA engines

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70040?usp=email )



Change subject: dev-amdgpu: Refactor MMIO interface for SDMA engines
..

dev-amdgpu: Refactor MMIO interface for SDMA engines

Currently the amdgpu simulated device is assumed to be a Vega10. As a
result there are a few things that are hardcoded. One of those is the
number of SDMAs. In order to add a newer device, such as MI100+, we need
to enable a flexible number of SDMAs.

To support a variable number of SDMAs, whose MMIO offsets may differ
between devices, the MMIO interface for SDMAs is changed to use an SDMA
class method dispatch table which forwards a 32-bit value from the MMIO
packet to the SDMA MMIO functions of the form `void method(uint32_t)`.
Several changes are made to enable this:

 - Allow the SDMA to have a variable MMIO base and size. These are
   configured in python.
 - An SDMA class method dispatch table which contains the MMIO offset
   relative to the SDMA's MMIO base address.
 - An updated writeMMIO method to iterate over the SDMA MMIO address
   ranges and call the appropriate SDMA MMIO method which matches the
   MMIO offset.
 - Moved all SDMA related MMIO data bit twiddling, masking, etc. into
   the MMIO methods themselves instead of in the writeMMIO method in
   SDMAEngine.
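The dispatch scheme in the list above might look roughly like this in Python. The base addresses and sizes are the Vega10 values from the config diff; the method offsets (0x0, 0x4), register names, and all class and function names are hypothetical.

```python
# Sketch of offset-based MMIO dispatch across several SDMA engines.


class SDMAEngineModel:
    def __init__(self, mmio_base, mmio_size):
        self.mmio_base = mmio_base
        self.mmio_size = mmio_size
        self.regs = {}
        # Dispatch table keyed by offset relative to mmio_base.
        # Hypothetical offsets and handlers, for illustration only.
        self.mmio_methods = {
            0x0: self.set_gfx_base_lo,
            0x4: self.set_gfx_base_hi,
        }

    def contains(self, offset):
        return self.mmio_base <= offset < self.mmio_base + self.mmio_size

    # Each handler takes a 32-bit value and does its own masking.
    def set_gfx_base_lo(self, data):
        self.regs["gfx_base_lo"] = data

    def set_gfx_base_hi(self, data):
        self.regs["gfx_base_hi"] = data


def dispatch_sdma_write(sdmas, offset, value):
    """Forward a write to the SDMA whose MMIO range contains offset."""
    for sdma in sdmas:
        if sdma.contains(offset):
            method = sdma.mmio_methods.get(offset - sdma.mmio_base)
            if method is not None:
                method(value & 0xFFFFFFFF)  # forward only the low 32 bits
            return True
    return False
```

Because each engine carries its own base and size, adding MI100's eight engines only means building a longer list.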

Change-Id: Ifce626f84d52f9e27e4438ba4e685e30dbf06dbc
---
M configs/example/gpufs/system/system.py
M src/dev/amdgpu/AMDGPU.py
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
M src/dev/amdgpu/interrupt_handler.cc
M src/dev/amdgpu/interrupt_handler.hh
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
8 files changed, 182 insertions(+), 57 deletions(-)



diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index 93f0194..90c5c01 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -129,15 +129,45 @@
 device_ih = AMDGPUInterruptHandler()
 system.pc.south_bridge.gpu.device_ih = device_ih

-# Setup the SDMA engines
-sdma0_pt_walker = VegaPagetableWalker()
-sdma1_pt_walker = VegaPagetableWalker()
+# Setup the SDMA engines depending on device. The MMIO base addresses
+# can be found in the driver code under:
+# include/asic_reg/sdmaX/sdmaX_Y_Z_offset.h
+num_sdmas = 2
+sdma_bases = []
+sdma_sizes = []
+if args.gpu_device == "Vega10":
+num_sdmas = 2
+sdma_bases = [0x4980, 0x5180]
+sdma_sizes = [0x800] * 2
+elif args.gpu_device == "MI100":
+num_sdmas = 8
+sdma_bases = [
+0x4980,
+0x6180,
+0x78000,
+0x79000,
+0x7A000,
+0x7B000,
+0x7C000,
+0x7D000,
+]
+sdma_sizes = [0x1000] * 8
+else:
+m5.util.panic(f"Unknown GPU device {args.gpu_device}")

-sdma0 = SDMAEngine(walker=sdma0_pt_walker)
-sdma1 = SDMAEngine(walker=sdma1_pt_walker)
+sdma_pt_walkers = []
+sdma_engines = []
+for sdma_idx in range(num_sdmas):
+sdma_pt_walker = VegaPagetableWalker()
+sdma_engine = SDMAEngine(
+walker=sdma_pt_walker,
+mmio_base=sdma_bases[sdma_idx],
+mmio_size=sdma_sizes[sdma_idx],
+)
+sdma_pt_walkers.append(sdma_pt_walker)
+sdma_engines.append(sdma_engine)

-system.pc.south_bridge.gpu.sdma0 = sdma0
-system.pc.south_bridge.gpu.sdma1 = sdma1
+system.pc.south_bridge.gpu.sdmas = sdma_engines

 # Setup PM4 packet processor
 pm4_pkt_proc = PM4PacketProcessor()
@@ -155,22 +185,22 @@
 system._dma_ports.append(gpu_hsapp)
 system._dma_ports.append(gpu_cmd_proc)
 system._dma_ports.append(system.pc.south_bridge.gpu)
-system._dma_ports.append(sdma0)
-system._dma_ports.append(sdma1)
+for sdma in sdma_engines:
+system._dma_ports.append(sdma)
 system._dma_ports.append(device_ih)
 system._dma_ports.append(pm4_pkt_proc)
 system._dma_ports.append(system_hub)
 system._dma_ports.append(gpu_mem_mgr)
 system._dma_ports.append(hsapp_pt_walker)
 system._dma_ports.append(cp_pt_walker)
-system._dma_ports.append(sdma0_pt_walker)
-system._dma_ports.append(sdma1_pt_walker)
+for sdma_pt_walker in sdma_pt_walkers:
+system._dma_ports.append(sdma_pt_walker)

 gpu_hsapp.pio = system.iobus.mem_side_ports
 gpu_cmd_proc.pio = system.iobus.mem_side_ports
 system.pc.south_bridge.gpu.pio = system.iobus.mem_side_ports
-sdma0.pio = system.iobus.mem_side_ports
-sdma1.pio = system.iobus.mem_side_ports
+for sdma in sdma_engines:
+sdma.pio = system.iobus.mem_side_ports
 device_ih.pio = system.iobus.mem_side_ports
 pm4_pkt_proc.pio = system.iobus.mem_side_ports
 system_hub.pio = 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu,configs: Add human readable names for different GPUs

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email )



Change subject: dev-amdgpu,configs: Add human readable names for different GPUs
..

dev-amdgpu,configs: Add human readable names for different GPUs

Add a human readable string for GPU device names rather than using the
device ID in the code. This is intended to make code more readable.
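The name-to-ID mapping in the amdgpu.py diff could also be written table-driven, as in this sketch. The ID values are copied from the diff; `GPU_DEVICES` and `pci_ids` are hypothetical names, and raising `ValueError` stands in for the config's `panic`.

```python
# Table-driven variant of the device-name -> PCI ID selection.
GPU_DEVICES = {
    "Vega10": {"DeviceID": 0x6863},
    "MI100": {
        "DeviceID": 0x738C,
        "SubsystemVendorID": 0x1002,
        "SubsystemID": 0x0C34,
    },
}


def pci_ids(device_name):
    """Return a copy of the PCI ID parameters for a named GPU."""
    try:
        return dict(GPU_DEVICES[device_name])
    except KeyError:
        raise ValueError(f"Unknown GPU device: {device_name}") from None
```

In a config script this might be applied with `for param, value in pci_ids(args.gpu_device).items(): setattr(system.pc.south_bridge.gpu, param, value)`.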

Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a
---
M configs/example/gpufs/runfs.py
M configs/example/gpufs/system/amdgpu.py
M src/dev/amdgpu/AMDGPU.py
3 files changed, 21 insertions(+), 0 deletions(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 52b79ab..efea26b 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -126,6 +126,13 @@
 help="type of memory to use",
 )

+parser.add_argument(
+"--gpu-device",
+default="Vega10",
+choices=["Vega10", "MI100"],
+help="GPU model to run: Vega10 (gfx900) or MI100 (gfx908)",
+)
+

 def runGpuFSSystem(args):
 """
diff --git a/configs/example/gpufs/system/amdgpu.py b/configs/example/gpufs/system/amdgpu.py
index 1fd3e2f..5f98b55 100644
--- a/configs/example/gpufs/system/amdgpu.py
+++ b/configs/example/gpufs/system/amdgpu.py
@@ -170,3 +170,14 @@
 system.pc.south_bridge.gpu.checkpoint_before_mmios = (
 args.checkpoint_before_mmios
 )
+
+system.pc.south_bridge.gpu.device_name = args.gpu_device
+
+if args.gpu_device == "MI100":
+system.pc.south_bridge.gpu.DeviceID = 0x738C
+system.pc.south_bridge.gpu.SubsystemVendorID = 0x1002
+system.pc.south_bridge.gpu.SubsystemID = 0x0C34
+elif args.gpu_device == "Vega10":
+system.pc.south_bridge.gpu.DeviceID = 0x6863
+else:
+panic("Unknown GPU device: {}".format(args.gpu_device))
diff --git a/src/dev/amdgpu/AMDGPU.py b/src/dev/amdgpu/AMDGPU.py
index f9d953f..1e78672 100644
--- a/src/dev/amdgpu/AMDGPU.py
+++ b/src/dev/amdgpu/AMDGPU.py
@@ -46,6 +46,9 @@
 cxx_header = "dev/amdgpu/amdgpu_device.hh"
 cxx_class = "gem5::AMDGPUDevice"

+# Human readable name for device ID
+device_name = Param.String("Vega10", "Codename for device")
+
 # IDs for AMD Vega 10
 VendorID = 0x1002
 DeviceID = 0x6863

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/70038?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id3ea74ca37422b1f4a0f09e5a9522d37b5998c1a
Gerrit-Change-Number: 70038
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [L] Change in gem5/gem5[develop]: dev-amdgpu: Enable more GPUs with device specific registers

2023-04-21 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/70041?usp=email )



Change subject: dev-amdgpu: Enable more GPUs with device specific registers
..

dev-amdgpu: Enable more GPUs with device specific registers

Currently gem5 assumes the amdgpu device to be Vega10. In order to
support more devices we need to handle situations where different
registers and addresses have the same functionality but different
offsets on different devices.

This changeset adds an NBIO class to handle device discovery and driver
initialization related tasks, pulling them out of the AMDGPUDevice
class. MMIO registers are now identified by offsets rather than absolute
addresses, because the absolute address cannot be determined in the
constructor: the BAR has not been assigned by the OS yet.
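The benefit of offset-keyed registers can be sketched as follows: values are stored at construction time and resolved to absolute addresses only once the BAR is known. The MMHUB constants and the `>> 24` encoding follow the amdgpu_device.cc diff; the register offsets, the BAR address, and the assumption that TOP is inclusive are hypothetical.

```python
# Sketch of an offset-keyed register file; not gem5's NBIO class.
MMHUB_BASE = 0x8000 << 24
MMHUB_TOP = 0x83FF << 24

# Hypothetical offsets within the MMIO BAR; real values differ per device.
FB_LOCATION_BASE = 0x1000
FB_LOCATION_TOP = 0x1004


class OffsetRegisterFile:
    def __init__(self):
        self.bar_base = None  # unknown until the OS assigns the BAR
        self.regs = {}        # keyed by offset within the BAR

    def set_reg_val(self, offset, value):
        self.regs[offset] = value

    def assign_bar(self, base):
        self.bar_base = base

    def read(self, addr):
        if self.bar_base is None:
            raise RuntimeError("MMIO BAR has not been assigned yet")
        return self.regs.get(addr - self.bar_base, 0)


def make_device_regs():
    # Can run in the "constructor" because nothing depends on the BAR.
    regs = OffsetRegisterFile()
    regs.set_reg_val(FB_LOCATION_BASE, MMHUB_BASE >> 24)
    regs.set_reg_val(FB_LOCATION_TOP, MMHUB_TOP >> 24)
    return regs


def fb_aperture_bytes(base_reg, top_reg):
    # Assumes both registers count 16 MiB (1 << 24) units and TOP is inclusive.
    return (top_reg - base_reg + 1) << 24
```

Under those assumptions, the aperture from 0x8000 to 0x83FF spans 0x400 units of 16 MiB, i.e. 16 GiB.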

Change-Id: I14b364374e086e185978334425a4e265cf2760d0
---
M src/dev/amdgpu/SConscript
M src/dev/amdgpu/amdgpu_device.cc
M src/dev/amdgpu/amdgpu_device.hh
A src/dev/amdgpu/amdgpu_nbio.cc
A src/dev/amdgpu/amdgpu_nbio.hh
M src/dev/amdgpu/amdgpu_vm.hh
6 files changed, 371 insertions(+), 46 deletions(-)



diff --git a/src/dev/amdgpu/SConscript b/src/dev/amdgpu/SConscript
index 713f0a6..9f8eeac 100644
--- a/src/dev/amdgpu/SConscript
+++ b/src/dev/amdgpu/SConscript
@@ -39,6 +39,7 @@
 tags='x86 isa')

 Source('amdgpu_device.cc', tags='x86 isa')
+Source('amdgpu_nbio.cc', tags='x86 isa')
 Source('amdgpu_vm.cc', tags='x86 isa')
 Source('interrupt_handler.cc', tags='x86 isa')
 Source('memory_manager.cc', tags='x86 isa')
diff --git a/src/dev/amdgpu/amdgpu_device.cc b/src/dev/amdgpu/amdgpu_device.cc
index 2acf1f4..519ea7a 100644
--- a/src/dev/amdgpu/amdgpu_device.cc
+++ b/src/dev/amdgpu/amdgpu_device.cc
@@ -36,6 +36,7 @@
 #include "debug/AMDGPUDevice.hh"
 #include "dev/amdgpu/amdgpu_vm.hh"
 #include "dev/amdgpu/interrupt_handler.hh"
+#include "dev/amdgpu/nbio_mmio.hh"
 #include "dev/amdgpu/pm4_packet_processor.hh"
 #include "dev/amdgpu/sdma_engine.hh"
 #include "dev/hsa/hw_scheduler.hh"
@@ -129,6 +130,34 @@
 pm4PktProc->setGPUDevice(this);
 cp->hsaPacketProc().setGPUDevice(this);
 cp->setGPUDevice(this);
+
+// Address aperture for device memory. We tell this to the driver and
+// could possibly be anything, but these are the values used by hardware.
+uint64_t mmhubBase = 0x8000ULL << 24;
+uint64_t mmhubTop = 0x83ffULL << 24;
+
+// All read-before-write MMIOs go here
+//triggered_reads[AMDGPU_MP0_SMN_C2PMSG_64] = 0x8000;
+
+// These are hardcoded register values to return what the driver expects
+setRegVal(AMDGPU_MP0_SMN_C2PMSG_33, 0x8000);
+
+// Different registers for MI200, MI100, and Vega10
+if (p.device_name == "Vega10") {
+setRegVal(VEGA10_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(VEGA10_FB_LOCATION_TOP, mmhubTop >> 24);
+} else if (p.device_name == "MI100") {
+setRegVal(MI100_FB_LOCATION_BASE, mmhubBase >> 24);
+setRegVal(MI100_FB_LOCATION_TOP, mmhubTop >> 24);
+setRegVal(MI100_MEM_SIZE_REG, 0x3ff0); // 16GB of memory
+} else {
+panic("Unknown GPU device %s\n", p.device_name);
+}
+
+gpuvm.setMMHUBBase(mmhubBase);
+gpuvm.setMMHUBTop(mmhubTop);
+
+nbio.setGPUDevice(this);
 }

 void
@@ -236,35 +265,25 @@
  * first, ignoring any writes from driver. (2) Any other address from
  * device backing store / abstract memory class functionally.
  */
-if (offset == 0xa28000) {
-/*
- * Handle special counter addresses in framebuffer. These counter
- * addresses expect the read to return previous value + 1.
- */
-if (regs.find(pkt->getAddr()) == regs.end()) {
-regs[pkt->getAddr()] = 1;
-} else {
-regs[pkt->getAddr()]++;
-}
-
-pkt->setUintX(regs[pkt->getAddr()], ByteOrder::little);
-} else {
-/*
- * Read the value from device memory. This must be done functionally
- * because this method is called by the PCIDevice::read method which
- * is a non-timing read.
- */
-RequestPtr req = std::make_shared<Request>(offset, pkt->getSize(), 0,
-   vramRequestorId());
-PacketPtr readPkt = Packet::createRead(req);
-uint8_t *dataPtr = new uint8_t[pkt->getSize()];
-readPkt->dataDynamic(dataPtr);
-
-auto system = cp->shader()->gpuCmdProc.system();
-system->getDeviceMemory(readPkt)->access(readPkt);
-
-pkt->setUintX(readPkt->getUintX(ByteOrder::little), ByteOrder::little);
+if (nbio.readFrame(pkt, offset)) {
+return;
 }
+
+/*
+ * Read the value from device memory. This must be done functionally
+ * because this method is called by the PCIDevice::read method which
+ 

[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Add simple check for valid GPU MMIO trace

2023-04-20 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email )



Change subject: configs: Add simple check for valid GPU MMIO trace
..

configs: Add simple check for valid GPU MMIO trace

The GPU MMIO trace is a required input to the simulator for GPUFS.
Several users appear to be running without providing it. This usually
causes the amdgpu driver to fail to load, and the application under test
exits along with it.

This changeset adds a simple MD5 checksum check that compares the file
against the known good MMIO trace located in the gem5-resources repository.
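A variant of the check below hashes the file in chunks inside a `with` block, so the file handle is closed, unlike the one-line `open(...).read()` in the diff. The expected digest is the one from the diff; `file_md5` and `mmio_trace_is_valid` are hypothetical helper names.

```python
import hashlib

# Digest of the known good MMIO trace, as hardcoded in the change.
EXPECTED_MMIO_MD5 = "c4ff3326ae8a036e329b8b595c83bd6d"


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mmio_trace_is_valid(path, expected=EXPECTED_MMIO_MD5):
    return file_md5(path) == expected
```

In runfs.py this could be used as `if not mmio_trace_is_valid(args.gpu_mmio_trace): m5.util.panic("MMIO file does not match gem5 resources")`.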

Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
---
M configs/example/gpufs/runfs.py
1 file changed, 6 insertions(+), 0 deletions(-)



diff --git a/configs/example/gpufs/runfs.py b/configs/example/gpufs/runfs.py
index 4a28068a..52b79ab 100644
--- a/configs/example/gpufs/runfs.py
+++ b/configs/example/gpufs/runfs.py
@@ -30,6 +30,7 @@
 # System includes
 import argparse
 import math
+import hashlib

 # gem5 related
 import m5
@@ -145,6 +146,11 @@
 math.ceil(float(n_cu) / args.cu_per_scalar_cache)
 )

+# Verify MMIO trace is valid
+mmio_md5 = hashlib.md5(open(args.gpu_mmio_trace, "rb").read()).hexdigest()
+if mmio_md5 != "c4ff3326ae8a036e329b8b595c83bd6d":
+m5.util.panic("MMIO file does not match gem5 resources")
+
 system = makeGpuFSSystem(args)

 root = Root(

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69978?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I59819fc795a6bc4bc6badbd4d120db1246498987
Gerrit-Change-Number: 69978
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Allow other CPU types in GPUFS

2023-04-20 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email )



Change subject: configs: Allow other CPU types in GPUFS
..

configs: Allow other CPU types in GPUFS

Previously, the CPU type and memory modes were hardcoded for KVM because
of a deadlock bug. Recent testing shows that this deadlock no longer
occurs with the simple CPU models. This changeset therefore changes the
configs to allow other CPU models, as a first step toward lifting the
KVM requirement from GPUFS.
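The two decisions the diff makes can be summarized in a sketch. The `atomic` to `atomic_noncaching` substitution comes from the diff; `needs_kvm_vm` is a hypothetical name-based stand-in for `ObjectList.is_kvm_cpu()`, and `plan_cpu_setup` is likewise hypothetical.

```python
# Sketch of the CPU-type decisions in system.py; not gem5 config code.


def gpufs_mem_mode(cpu_mem_mode):
    # As in the change: plain "atomic" becomes "atomic_noncaching".
    return "atomic_noncaching" if cpu_mem_mode == "atomic" else cpu_mem_mode


def needs_kvm_vm(cpu_class_name):
    # Hypothetical stand-in for ObjectList.is_kvm_cpu(TestCPUClass).
    return "Kvm" in cpu_class_name


def plan_cpu_setup(cpu_class_name, cpu_mem_mode):
    return {
        "mem_mode": gpufs_mem_mode(cpu_mem_mode),
        "create_kvm_vm": needs_kvm_vm(cpu_class_name),
    }
```

Only KVM CPUs get a `KvmVM`, so a non-KVM run no longer requires KVM on the host.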

Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
---
M configs/example/gpufs/system/system.py
1 file changed, 6 insertions(+), 3 deletions(-)



diff --git a/configs/example/gpufs/system/system.py b/configs/example/gpufs/system/system.py
index a1b59ef..93f0194 100644
--- a/configs/example/gpufs/system/system.py
+++ b/configs/example/gpufs/system/system.py
@@ -61,7 +61,9 @@
 panic("Need at least 2GB of system memory to load amdgpu module")

 # Use the common FSConfig to setup a Linux X86 System
-(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(args)
+(TestCPUClass, test_mem_mode) = Simulation.getCPUClass(args.cpu_type)
+if test_mem_mode == "atomic":
+test_mem_mode = "atomic_noncaching"
 disks = [args.disk_image]
 if args.second_disk is not None:
 disks.extend([args.second_disk])
@@ -91,10 +93,11 @@

 # Create specified number of CPUs. GPUFS really only needs one.
 system.cpu = [
-X86KvmCPU(clk_domain=system.cpu_clk_domain, cpu_id=i)
+TestCPUClass(clk_domain=system.cpu_clk_domain, cpu_id=i)
 for i in range(args.num_cpus)
 ]
-system.kvm_vm = KvmVM()
+if ObjectList.is_kvm_cpu(TestCPUClass):
+system.kvm_vm = KvmVM()

 # Create AMDGPU and attach to southbridge
 shader = createGPU(system, args)

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69979?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ib616c3ef60f173871421b55a8bb73b25ce2990b5
Gerrit-Change-Number: 69979
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [XS] Change in gem5/gem5[develop]: configs: Use higher dmesg level for GPUFS

2023-04-20 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email )



Change subject: configs: Use higher dmesg level for GPUFS
..

configs: Use higher dmesg level for GPUFS

The dmesg console log level is currently set to 3, which does not
display errors if the amdgpu driver fails to load. Changing it to level 8
shows these errors in the gem5 terminal without being too spammy. This
will help GPUFS developers with bug reports, since we will actually be
able to observe an error. Currently, if the driver fails to load, there
is no way to detect it and applications will attempt to run anyway,
usually failing when getting device properties.
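The reason level 3 hides driver errors follows from the kernel's console filtering rule: a message reaches the console only if its log level is numerically lower than the console log level set by `dmesg -n`. The sketch below models that rule; the helper names are hypothetical.

```python
# Linux kernel message levels, 0 (most severe) through 7.
KERN_LEVELS = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]


def printed_on_console(msg_level, console_level):
    # A message is printed only if its level is below the console level.
    return msg_level < console_level


def visible_levels(console_level):
    return [
        name
        for level, name in enumerate(KERN_LEVELS)
        if printed_on_console(level, console_level)
    ]
```

With `dmesg -n3` only EMERG, ALERT and CRIT get through, so KERN_ERR (level 3) messages from a failing amdgpu load are suppressed; `dmesg -n8` lets every level through.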

Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d
---
M configs/example/gpufs/hip_cookbook.py
M configs/example/gpufs/hip_rodinia.py
M configs/example/gpufs/hip_samples.py
M configs/example/gpufs/vega10_kvm.py
4 files changed, 4 insertions(+), 4 deletions(-)



diff --git a/configs/example/gpufs/hip_cookbook.py b/configs/example/gpufs/hip_cookbook.py
index 87c7547..6a7bb42 100644
--- a/configs/example/gpufs/hip_cookbook.py
+++ b/configs/example/gpufs/hip_cookbook.py
@@ -42,7 +42,7 @@
 cookbook_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/hip_rodinia.py b/configs/example/gpufs/hip_rodinia.py
index 8ed951b..b8a7858 100644
--- a/configs/example/gpufs/hip_rodinia.py
+++ b/configs/example/gpufs/hip_rodinia.py
@@ -43,7 +43,7 @@
 rodinia_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/hip_samples.py b/configs/example/gpufs/hip_samples.py
index ccc1719..9f83c25 100644
--- a/configs/example/gpufs/hip_samples.py
+++ b/configs/example/gpufs/hip_samples.py
@@ -42,7 +42,7 @@
 samples_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."
diff --git a/configs/example/gpufs/vega10_kvm.py b/configs/example/gpufs/vega10_kvm.py
index 54253be..9c7e457 100644
--- a/configs/example/gpufs/vega10_kvm.py
+++ b/configs/example/gpufs/vega10_kvm.py
@@ -44,7 +44,7 @@
 demo_runscript = """\
 export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
 export HSA_ENABLE_INTERRUPT=0
-dmesg -n3
+dmesg -n8
 dd if=/root/roms/vega10.rom of=/dev/mem bs=1k seek=768 count=128
 if [ ! -f /lib/modules/`uname -r`/updates/dkms/amdgpu.ko ]; then
 echo "ERROR: Missing DKMS package for kernel `uname -r`. Exiting gem5."

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69977?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I56b9581c1a12a8ce329066d18d6a072d006c096d
Gerrit-Change-Number: 69977
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [S] Change in gem5/gem5[develop]: mem: Handle DRAM write queue drain and disabled power down

2023-04-19 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email )



Change subject: mem: Handle DRAM write queue drain and disabled power down
..

mem: Handle DRAM write queue drain and disabled power down

The write queue drain logic currently appears to be wrong. An event is
scheduled if the write queue is empty instead of non-empty. There is no
check for whether draining is complete when the bus is in write mode.
Finally, the power-down check on drain always fails if DRAM power-down
is disabled.

This changeset reverses the drain conditional for the write queue to
schedule an event if the write queue is *not* empty and checks in the
event processing method that the queues are all empty so that
signalDrainDone can be called. Lastly the powerdown state is ignored if
DRAM powerdown is disabled. Powerdown is disabled in the GPU_VIPER
protocol by default. This changeset successfully drains and checkpoints
a GPUFS simulation using GPU_VIPER protocol.
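The three fixes can be illustrated with a toy model. It is not gem5's `MemCtrl`: it assumes one write is serviced per event and tracks the request event as a simple flag, and every name in it is hypothetical.

```python
# Toy model of the drain fixes; not gem5's MemCtrl or DRAMInterface.


class MemCtrlDrainModel:
    def __init__(self, enable_powerdown):
        self.enable_powerdown = enable_powerdown
        self.pwr_state = "PWR_ACT"  # ranks start active in this model
        self.write_queue = []
        self.read_queue = []
        self.resp_queue = []
        self.draining = False
        self.next_req_scheduled = False
        self.drain_done = False

    def in_pwr_idle_state(self):
        # Fix 3: with power-down disabled, power state cannot block draining.
        return not self.enable_powerdown or self.pwr_state == "PWR_IDLE"

    def queues_empty(self):
        return not (self.write_queue or self.read_queue or self.resp_queue)

    def drain(self):
        if self.queues_empty() and self.in_pwr_idle_state():
            return "Drained"
        self.draining = True
        # Fix 1: kick the request event only if writes are pending.
        if self.write_queue and not self.next_req_scheduled:
            self.next_req_scheduled = True
        return "Draining"

    def process_next_req(self):
        self.next_req_scheduled = False
        if self.write_queue:
            self.write_queue.pop(0)  # service one write per event
            self.next_req_scheduled = bool(self.write_queue)
        # Fix 2: signal completion once every queue is empty.
        if self.draining and self.queues_empty():
            self.drain_done = True
```

Running the event until nothing is scheduled drains pending writes and then signals completion, mirroring the `signalDrainDone` call added in `mem_ctrl.cc`.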

Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
---
M src/mem/dram_interface.hh
M src/mem/mem_ctrl.cc
2 files changed, 14 insertions(+), 2 deletions(-)



diff --git a/src/mem/dram_interface.hh b/src/mem/dram_interface.hh
index fa9d319..206f8e8 100644
--- a/src/mem/dram_interface.hh
+++ b/src/mem/dram_interface.hh
@@ -380,7 +380,11 @@
  * @param Return true if the rank is idle from a bank
  *and power point of view
  */
-bool inPwrIdleState() const { return pwrState == PWR_IDLE; }
+bool
+inPwrIdleState() const
+{
+return !dram.enableDRAMPowerdown || pwrState == PWR_IDLE;
+}

 /**
  * Trigger a self-refresh exit if there are entries enqueued
diff --git a/src/mem/mem_ctrl.cc b/src/mem/mem_ctrl.cc
index 543d637..074a31f 100644
--- a/src/mem/mem_ctrl.cc
+++ b/src/mem/mem_ctrl.cc
@@ -908,6 +908,13 @@
 }
 }

+if (drainState() == DrainState::Draining && !totalWriteQueueSize &&
+!totalReadQueueSize && respQEmpty()) {
+
+DPRINTF(Drain, "MemCtrl controller done draining\n");
+signalDrainDone();
+}
+
 // updates current state
 busState = busStateNext;

@@ -1420,7 +1427,8 @@

 // the only queue that is not drained automatically over time
 // is the write queue, thus kick things into action if needed
-if (!totalWriteQueueSize && !nextReqEvent.scheduled()) {
+if (totalWriteQueueSize && !nextReqEvent.scheduled()) {
+DPRINTF(Drain,"Scheduling nextReqEvent from drain\n");
 schedule(nextReqEvent, curTick());
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/69917?usp=email


Gerrit-MessageType: newchange
Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5459856a694c9054b28677049a06b99b9ad91bbb
Gerrit-Change-Number: 69917
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Update API for some flat atomics

2023-02-15 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email )


Change subject: arch-vega: Update API for some flat atomics
..

arch-vega: Update API for some flat atomics

Some recently submitted atomic instructions were using two older APIs.
Update them to use the newer APIs in order to support all apertures and
avoid a compilation issue.

Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67977
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 23 insertions(+), 15 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index b6a78b2..45c8491 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -44984,13 +44984,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -44999,8 +44997,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void
@@ -45091,13 +45088,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -45106,8 +45101,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void
@@ -45226,13 +45220,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -45241,8 +45233,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852
Gerrit-Change-Number: 67977
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Jason Lowe-Power 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Update API for some flat atomics

2023-02-15 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email )



Change subject: arch-vega: Update API for some flat atomics
..

arch-vega: Update API for some flat atomics

Some recently submitted atomic instructions were using two older APIs.
Update these to use the newer APIs in order to support all apertures and
avoid a compilation issue.

Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 19 insertions(+), 15 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index b6a78b2..45c8491 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -44984,13 +44984,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -44999,8 +44997,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void
@@ -45091,13 +45088,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -45106,8 +45101,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void
@@ -45226,13 +45220,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU32 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -45241,8 +45233,7 @@
 }
 }

-gpuDynInst->computeUnit()->globalMemoryPipe.
-issueRequest(gpuDynInst);
+issueRequestHelper(gpuDynInst);
 } // execute

 void

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67977?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ibd6bc00177d33236946f54ef8e5c7544af322852
Gerrit-Change-Number: 67977
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Update deprecated ports

2023-02-14 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email )


Change subject: dev-amdgpu: Update deprecated ports
..

dev-amdgpu: Update deprecated ports

Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67837
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/common/tlb_coalescer.hh
M src/dev/amdgpu/memory_manager.hh
2 files changed, 16 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/common/tlb_coalescer.hh b/src/arch/amdgpu/common/tlb_coalescer.hh
index 59d8ebe..56d72d7 100644
--- a/src/arch/amdgpu/common/tlb_coalescer.hh
+++ b/src/arch/amdgpu/common/tlb_coalescer.hh
@@ -152,7 +152,7 @@
   public:
 MemSidePort(const std::string &_name, TLBCoalescer *tlb_coalescer,
 PortID _index)
-: RequestPort(_name, tlb_coalescer), coalescer(tlb_coalescer),
+: RequestPort(_name), coalescer(tlb_coalescer),
   index(_index) { }

 std::deque<PacketPtr> retries;
diff --git a/src/dev/amdgpu/memory_manager.hh b/src/dev/amdgpu/memory_manager.hh
index e18ec64..0bd08d6 100644
--- a/src/dev/amdgpu/memory_manager.hh
+++ b/src/dev/amdgpu/memory_manager.hh
@@ -45,11 +45,11 @@

 class AMDGPUMemoryManager : public ClockedObject
 {
-class GPUMemPort : public MasterPort
+class GPUMemPort : public RequestPort
 {
   public:
 GPUMemPort(const std::string &_name, AMDGPUMemoryManager &_gpuMemMgr)
-: MasterPort(_name, &_gpuMemMgr), gpu_mem(_gpuMemMgr)
+: RequestPort(_name), gpu_mem(_gpuMemMgr)
 {
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882
Gerrit-Change-Number: 67837
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Gabriel B. 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_smax

2023-02-14 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/64513?usp=email )


Change subject: arch-vega: Implementing global_atomic_smax
..

arch-vega: Implementing global_atomic_smax

Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64513
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 67 insertions(+), 1 deletion(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index e3639a5..b6a78b2 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -45079,8 +45079,59 @@
 void
 Inst_FLAT__FLAT_ATOMIC_SMAX::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_FLAT__FLAT_ATOMIC_SMAX::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+initAtomicAccess(gpuDynInst);
+} // initiateAcc
+
+void
+Inst_FLAT__FLAT_ATOMIC_SMAX::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+if (isAtomicRet()) {
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (reinterpret_cast<VecElemU32*>(
+gpuDynInst->d_data))[lane];
+}
+}
+
+vdst.write();
+}
+} // completeAcc
 // --- Inst_FLAT__FLAT_ATOMIC_UMAX class methods ---

 Inst_FLAT__FLAT_ATOMIC_UMAX::Inst_FLAT__FLAT_ATOMIC_UMAX(InFmt_FLAT *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 8b0c8c4..d45a84c 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -42691,6 +42691,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_FLAT__FLAT_ATOMIC_SMAX

 class Inst_FLAT__FLAT_ATOMIC_UMAX : public Inst_FLAT
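
The new `FLAT_ATOMIC_SMAX` code gathers DATA per active lane and hands the read-modify-write to the memory system. Per lane, the ISA semantics it models are:

- read the old value;
- store the signed maximum of the old value and DATA;
- return the old value, which the diff writes to VDST only when `isAtomicRet()` holds.

The sketch below shows those semantics on their own, assuming a hypothetical 4-lane wavefront and plain arrays in place of gem5's operand and memory classes. The comparison uses `int32_t` even though the buffers hold `uint32_t`:

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4; // hypothetical lane count for illustration

struct SmaxResult {
    std::array<uint32_t, kLanes> mem; // memory contents after the atomic
    std::array<uint32_t, kLanes> ret; // pre-op values (the return data)
};

// Per-lane signed max. Lanes that are off in exec_mask are not touched
// and their return slot stays 0.
SmaxResult atomicSmaxSketch(std::array<uint32_t, kLanes> mem,
                            const std::array<uint32_t, kLanes> &data,
                            std::bitset<kLanes> exec_mask)
{
    std::array<uint32_t, kLanes> ret{};
    for (int lane = 0; lane < kLanes; ++lane) {
        if (!exec_mask[lane])
            continue;
        ret[lane] = mem[lane];
        int32_t old_val = static_cast<int32_t>(mem[lane]);
        int32_t src = static_cast<int32_t>(data[lane]);
        if (src > old_val)
            mem[lane] = data[lane];
    }
    return {mem, ret};
}
```

In lane 0 of the test below, memory holds `0xFFFFFFFF` (-1 as a signed value) and DATA is 1, so the signed max stores 1. An unsigned max would have kept `0xFFFFFFFF`.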

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/64513?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id4053424c98eec1e98eb555bb35b48f0b5d2407b
Gerrit-Change-Number: 64513
Gerrit-PatchSet: 2
Gerrit-Owner: Alexandru Duțu (Alex) 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_smin

2023-02-14 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/64512?usp=email )


Change subject: arch-vega: Implementing global_atomic_smin
..

arch-vega: Implementing global_atomic_smin

Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64512
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 67 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 987474f..e3639a5 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -44972,8 +44972,59 @@
 void
 Inst_FLAT__FLAT_ATOMIC_SMIN::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_FLAT__FLAT_ATOMIC_SMIN::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+initAtomicAccess(gpuDynInst);
+} // initiateAcc
+
+void
+Inst_FLAT__FLAT_ATOMIC_SMIN::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+if (isAtomicRet()) {
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (reinterpret_cast<VecElemU32*>(
+gpuDynInst->d_data))[lane];
+}
+}
+
+vdst.write();
+}
+} // completeAcc
 // --- Inst_FLAT__FLAT_ATOMIC_UMIN class methods ---

 Inst_FLAT__FLAT_ATOMIC_UMIN::Inst_FLAT__FLAT_ATOMIC_UMIN(InFmt_FLAT *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index ddf228a..8b0c8c4 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -42615,6 +42615,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_FLAT__FLAT_ATOMIC_SMIN

 class Inst_FLAT__FLAT_ATOMIC_UMIN : public Inst_FLAT

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/64512?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Iffb366190f9e3f7ffbacde5dbb3abc97226926d4
Gerrit-Change-Number: 64512
Gerrit-PatchSet: 2
Gerrit-Owner: Alexandru Duțu (Alex) 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implementing global_atomic_or

2023-02-14 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/64511?usp=email )


Change subject: arch-vega: Implementing global_atomic_or
..

arch-vega: Implementing global_atomic_or

Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/64511
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 68 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index f019dfd..987474f 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -45112,8 +45112,60 @@
 void
 Inst_FLAT__FLAT_ATOMIC_OR::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decVMemInstsIssued();
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());
+
+ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->globalMemoryPipe.
+issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_FLAT__FLAT_ATOMIC_OR::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+initAtomicAccess(gpuDynInst);
+} // initiateAcc
+
+void
+Inst_FLAT__FLAT_ATOMIC_OR::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+if (isAtomicRet()) {
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (reinterpret_cast<VecElemU32*>(
+gpuDynInst->d_data))[lane];
+}
+}
+
+vdst.write();
+}
+} // completeAcc
+
 // --- Inst_FLAT__FLAT_ATOMIC_XOR class methods ---

 Inst_FLAT__FLAT_ATOMIC_XOR::Inst_FLAT__FLAT_ATOMIC_XOR(InFmt_FLAT *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index dc2ee08..ddf228a 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -42800,6 +42800,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_FLAT__FLAT_ATOMIC_OR

 class Inst_FLAT__FLAT_ATOMIC_XOR : public Inst_FLAT

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/64511?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I13065186313ca784054956e1165b1b2fd8ce4a19
Gerrit-Change-Number: 64511
Gerrit-PatchSet: 2
Gerrit-Owner: Alexandru Duțu (Alex) 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet

2023-02-14 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email )


Change subject: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet
..

dev-amdgpu: Fix address in POLL_REGMEM SDMA packet

The address in the POLL_REGMEM packet should not be shifted when the
mode is 1 (memory); the relevant driver code, linked below, does not
shift it. The shift produces an incorrect address, which causes a page
fault.

This changeset removes the shift so the correct address is translated.

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/
roc-4.3.x/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L903
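
Shifting right by 3 divides the byte address by 8, so the DMA read lands on an unrelated virtual page, which matches the page fault described above. Here is a minimal illustration, assuming 4 KiB pages and an arbitrary example address. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 4096; // assumption: 4 KiB pages

// Virtual page number that a byte address falls in.
constexpr uint64_t pageOf(uint64_t addr) { return addr / kPageSize; }

// Before the fix: the packet address was shifted before dmaReadVirt.
constexpr uint64_t oldPollAddr(uint64_t pkt_address) { return pkt_address >> 3; }

// After the fix: the packet address is used unchanged.
constexpr uint64_t newPollAddr(uint64_t pkt_address) { return pkt_address; }
```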

Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67877
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/dev/amdgpu/sdma_engine.cc
1 file changed, 23 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 4c03bf5..736df45 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -832,7 +832,7 @@
 auto cb = new DmaVirtCallback<uint32_t>(
 [ = ] (const uint32_t &dma_buffer) {
 pollRegMemRead(q, header, pkt, dma_buffer, 0); });
-dmaReadVirt(pkt->address >> 3, sizeof(uint32_t), cb,
+dmaReadVirt(pkt->address, sizeof(uint32_t), cb,
 (void *)&cb->dmaBuffer);
 } else {
 panic("SDMA poll mem operation not implemented.");

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751
Gerrit-Change-Number: 67877
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet

2023-02-13 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email )



Change subject: dev-amdgpu: Fix address in POLL_REGMEM SDMA packet
..

dev-amdgpu: Fix address in POLL_REGMEM SDMA packet

The address in the POLL_REGMEM packet should not be shifted when the
mode is 1 (memory); the relevant driver code, linked below, does not
shift it. The shift produces an incorrect address, which causes a page
fault.

This changeset removes the shift so the correct address is translated.

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/
roc-4.3.x/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c#L903

Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751
---
M src/dev/amdgpu/sdma_engine.cc
1 file changed, 19 insertions(+), 1 deletion(-)



diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 4c03bf5..736df45 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -832,7 +832,7 @@
 auto cb = new DmaVirtCallback<uint32_t>(
 [ = ] (const uint32_t &dma_buffer) {
 pollRegMemRead(q, header, pkt, dma_buffer, 0); });
-dmaReadVirt(pkt->address >> 3, sizeof(uint32_t), cb,
+dmaReadVirt(pkt->address, sizeof(uint32_t), cb,
 (void *)&cb->dmaBuffer);
 } else {
 panic("SDMA poll mem operation not implemented.");

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67877?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I7a0ec3245ca14376670df24c5d3773958c08d751
Gerrit-Change-Number: 67877
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Update deprecated ports

2023-02-10 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email )



Change subject: dev-amdgpu: Update deprecated ports
..

dev-amdgpu: Update deprecated ports

Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882
---
M src/arch/amdgpu/common/tlb_coalescer.hh
M src/dev/amdgpu/memory_manager.hh
2 files changed, 12 insertions(+), 3 deletions(-)



diff --git a/src/arch/amdgpu/common/tlb_coalescer.hh b/src/arch/amdgpu/common/tlb_coalescer.hh
index 59d8ebe..56d72d7 100644
--- a/src/arch/amdgpu/common/tlb_coalescer.hh
+++ b/src/arch/amdgpu/common/tlb_coalescer.hh
@@ -152,7 +152,7 @@
   public:
 MemSidePort(const std::string &_name, TLBCoalescer *tlb_coalescer,
 PortID _index)
-: RequestPort(_name, tlb_coalescer), coalescer(tlb_coalescer),
+: RequestPort(_name), coalescer(tlb_coalescer),
   index(_index) { }

 std::deque<PacketPtr> retries;
diff --git a/src/dev/amdgpu/memory_manager.hh b/src/dev/amdgpu/memory_manager.hh
index e18ec64..0bd08d6 100644
--- a/src/dev/amdgpu/memory_manager.hh
+++ b/src/dev/amdgpu/memory_manager.hh
@@ -45,11 +45,11 @@

 class AMDGPUMemoryManager : public ClockedObject
 {
-class GPUMemPort : public MasterPort
+class GPUMemPort : public RequestPort
 {
   public:
 GPUMemPort(const std::string &_name, AMDGPUMemoryManager &_gpuMemMgr)
-: MasterPort(_name, &_gpuMemMgr), gpu_mem(_gpuMemMgr)
+: RequestPort(_name), gpu_mem(_gpuMemMgr)
 {
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67837?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Icbc5636c33b437c7396ee27363eed1cf006f8882
Gerrit-Change-Number: 67837
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Make VGPR-offset for global SGPR-base signed

2023-02-09 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email )


Change subject: arch-vega: Make VGPR-offset for global SGPR-base signed
..

arch-vega: Make VGPR-offset for global SGPR-base signed

In Vega, the VGPR offset used with SGPR-base addressing can be signed.
These are global instructions of the form
`global_load_dword v0, v1, s[0:1]`. The ISA manual does not state this
explicitly; however, compiler output shows that the offset can be
negative.

This changeset assigns the offset to a signed 32-bit integer so that the
compiler sign-extends it in the expression that calculates the final
address. This fixes a bad address calculation in a rocPRIM unit test.
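
The sketch below shows the two address calculations side by side, with plain integers in place of gem5's operand classes. The function names are hypothetical. Treating the 32-bit VGPR value as unsigned zero-extends it. Converting it to `int32_t` first makes a value such as `0xFFFFFFF0` act as -16 when it is added to the 64-bit SGPR base:

```cpp
#include <cassert>
#include <cstdint>

// Before: the 32-bit VGPR offset is zero-extended to 64 bits.
uint64_t addrUnsignedOffset(uint64_t sbase, uint32_t vaddr, uint64_t imm)
{
    return sbase + static_cast<uint64_t>(vaddr) + imm;
}

// After: the VGPR offset is reinterpreted as int32_t. When it is added
// to the unsigned 64-bit base, it converts modulo 2^64, which is the same
// as sign extension.
uint64_t addrSignedOffset(uint64_t sbase, uint32_t vaddr, uint64_t imm)
{
    int32_t voffset = static_cast<int32_t>(vaddr);
    return sbase + voffset + imm;
}
```

With a base of `0x10000` and a VGPR value of `0xFFFFFFF0`, the signed version yields `0xFFF0` (16 bytes below the base). The unsigned version yields `0x10000FFF0`, nearly 4 GiB past it.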

Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67412
Reviewed-by: Bobby Bruce 
Maintainer: Bobby Bruce 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/op_encodings.hh
1 file changed, 25 insertions(+), 1 deletion(-)

Approvals:
  Bobby Bruce: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh
index 34f6040..1071ead 100644
--- a/src/arch/amdgpu/vega/insts/op_encodings.hh
+++ b/src/arch/amdgpu/vega/insts/op_encodings.hh
@@ -1007,8 +1007,9 @@
 // mask any upper bits from the vaddr.
 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
+ScalarRegI32 voffset = vaddr[lane];
 gpuDynInst->addr.at(lane) =
-saddr.rawData() + (vaddr[lane] & 0x) + offset;
+saddr.rawData() + voffset + offset;
 }
 }
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
Gerrit-Change-Number: 67412
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write_b8_d16_hi

2023-02-09 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67411?usp=email )


Change subject: arch-vega: Implement ds_write_b8_d16_hi
..

arch-vega: Implement ds_write_b8_d16_hi

Writes a byte from the upper 16-bit half of the input data (bits
[23:16]) to an address.
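
The stored byte is bits [23:16] of the DATA value, per the `.arch` description `MEM[ADDR] = DATA[23:16]`. The sketch below uses a hypothetical stand-in for gem5's `bits(val, first, last)` helper. It models the source as 32 bits wide, because bits [23:16] only exist in a value that is at least 24 bits wide:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for bits(val, first, last): returns bits
// [first:last] of val shifted down to bit 0. Valid for widths below 32.
constexpr uint32_t bitsSketch(uint32_t val, int first, int last)
{
    return (val >> last) & ((1u << (first - last + 1)) - 1u);
}

// Byte that ds_write_b8_d16_hi stores for one lane.
constexpr uint8_t d16HiByte(uint32_t data)
{
    return static_cast<uint8_t>(bitsSketch(data, 23, 16));
}
```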

Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67411
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
3 files changed, 112 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index 18c72a4..291dd69 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -7706,8 +7706,7 @@
 GPUStaticInst*
 Decoder::decode_OP_DS__DS_WRITE_B8_D16_HI(MachInst iFmt)
 {
-fatal("Trying to decode instruction without a class\n");
-return nullptr;
+return new Inst_DS__DS_WRITE_B8_D16_HI(&iFmt->iFmt_DS);
 }

 GPUStaticInst*
diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 6cf01fb..f019dfd 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34877,6 +34877,68 @@
 Inst_DS__DS_WRITE_B8::completeAcc(GPUDynInstPtr gpuDynInst)
 {
 } // completeAcc
+// --- Inst_DS__DS_WRITE_B8_D16_HI class methods ---
+
+Inst_DS__DS_WRITE_B8_D16_HI::Inst_DS__DS_WRITE_B8_D16_HI(InFmt_DS *iFmt)
+: Inst_DS(iFmt, "ds_write_b8_d16_hi")
+{
+setFlag(MemoryRef);
+setFlag(Store);
+} // Inst_DS__DS_WRITE_B8_D16_HI
+
+Inst_DS__DS_WRITE_B8_D16_HI::~Inst_DS__DS_WRITE_B8_D16_HI()
+{
+} // ~Inst_DS__DS_WRITE_B8_D16_HI
+
+// --- description from .arch file ---
+// MEM[ADDR] = DATA[23:16].
+// Byte write in to high word.
+void
+Inst_DS__DS_WRITE_B8_D16_HI::execute(GPUDynInstPtr gpuDynInst)
+{
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU8 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU8*>(gpuDynInst->d_data))[lane]
+= bits(data[lane], 23, 16);
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
+} // execute
+
+void
+Inst_DS__DS_WRITE_B8_D16_HI::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initMemWrite(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_WRITE_B8_D16_HI::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_WRITE_B16 class methods ---

 Inst_DS__DS_WRITE_B16::Inst_DS__DS_WRITE_B16(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 2896732..dc2ee08 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31934,6 +31934,40 @@
 void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_WRITE_B8

+class Inst_DS__DS_WRITE_B8_D16_HI : public Inst_DS
+{
+  public:
+Inst_DS__DS_WRITE_B8_D16_HI(InFmt_DS*);
+~Inst_DS__DS_WRITE_B8_D16_HI();
+
+int
+getNumOperands() override
+{
+return numDstRegOperands() + numSrcRegOperands();
+} // getNumOperands
+
+int numDstRegOperands() override { return 0; }
+int numSrcRegOperands() override { return 2; }
+
+int
+getOperandSize(int opIdx) override
+{
+switch (opIdx) {
+  case 0: //vgpr_a
+return 4;
+  case 1: //vgpr_d0
+return 1;
+  default:
+fatal("op idx %i out of bounds\n", opIdx);
+return -1;
+}
+} // getOperandSize
+
+void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+  

[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Make VGPR-offset for global SGPR-base signed

2023-01-19 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email )



Change subject: arch-vega: Make VGPR-offset for global SGPR-base signed
..

arch-vega: Make VGPR-offset for global SGPR-base signed

When SGPR-base addressing is used in Vega, the VGPR offset can be signed.
These are global instructions of the format:
`global_load_dword v0, v1, s[0:1]`. The ISA manual does not state this
explicitly. However, compiler output shows that the offset can be negative.

This changeset assigns the offset to a signed 32-bit integer. The
compiler then handles the signedness in the expression that calculates
the final address. This fixes a bad address calculation in a rocPRIM
unit test.
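The following minimal C++ sketch shows why the signedness matters. It assumes a 64-bit SGPR base and a VGPR whose low 32 bits hold the offset. The helpers `addrUnsignedOffset` and `addrSignedOffset` are hypothetical illustrations and are not gem5 functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not gem5 APIs. Both compute
//   address = SGPR base + 32-bit VGPR offset + instruction offset,
// using only the low 32 bits of the VGPR value.

// Old behaviour: the masked offset stays unsigned, so 0xfffffffc (-4)
// is added as +4294967292.
uint64_t addrUnsignedOffset(uint64_t sgprBase, uint64_t vgpr, int64_t instOffset)
{
    uint64_t voffset = vgpr & 0xffffffffULL;
    return sgprBase + voffset + static_cast<uint64_t>(instOffset);
}

// New behaviour: the low 32 bits are reinterpreted as int32_t and the
// widening to int64_t sign-extends it, so -4 moves the address down.
uint64_t addrSignedOffset(uint64_t sgprBase, uint64_t vgpr, int64_t instOffset)
{
    int32_t voffset = static_cast<int32_t>(static_cast<uint32_t>(vgpr));
    return sgprBase + static_cast<uint64_t>(static_cast<int64_t>(voffset))
        + static_cast<uint64_t>(instOffset);
}
```

Take a base of 0x1000 and a VGPR holding 0xfffffffc (-4). The unsigned version yields 0x100000ffc, while the signed version yields 0xffc. The uint32_t to int32_t conversion wraps modulo 2^32 as of C++20 and on mainstream compilers before that.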

Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
---
M src/arch/amdgpu/vega/insts/op_encodings.hh
1 file changed, 21 insertions(+), 1 deletion(-)



diff --git a/src/arch/amdgpu/vega/insts/op_encodings.hh b/src/arch/amdgpu/vega/insts/op_encodings.hh
index 34f6040..1f52c75 100644
--- a/src/arch/amdgpu/vega/insts/op_encodings.hh
+++ b/src/arch/amdgpu/vega/insts/op_encodings.hh
@@ -1007,8 +1007,9 @@
 // mask any upper bits from the vaddr.
 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
+ScalarRegI32 voffset = vaddr[lane] & 0xffffffff;
 gpuDynInst->addr.at(lane) =
-saddr.rawData() + (vaddr[lane] & 0xffffffff) + offset;
+saddr.rawData() + voffset + offset;
 }
 }
 }

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67412?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I271edfbb4c6344cb1a6a69a0fd3df58a6198d599
Gerrit-Change-Number: 67412
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write_b8_d16_hi

2023-01-19 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67411?usp=email )



Change subject: arch-vega: Implement ds_write_b8_d16_hi
..

arch-vega: Implement ds_write_b8_d16_hi

Writes a byte taken from the upper 16-bit word of the input data (bits 23:16) to an address.
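The byte selection in the .arch description, MEM[ADDR] = DATA[23:16], can be sketched as follows. The sketch assumes a 32-bit data value. `extractBits` is a hypothetical stand-in for gem5's `bits()` helper, and `dsWriteB8D16HiByte` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for gem5's bits(val, first, last): returns bits
// [first:last] of val (first >= last), shifted down to bit 0.
uint32_t extractBits(uint32_t val, unsigned first, unsigned last)
{
    assert(first >= last && first < 32);
    unsigned width = first - last + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (val >> last) & mask;
}

// MEM[ADDR] = DATA[23:16]: the stored byte is the low byte of the
// upper 16-bit half of the 32-bit data value.
uint8_t dsWriteB8D16HiByte(uint32_t data)
{
    return static_cast<uint8_t>(extractBits(data, 23, 16));
}
```

For example, data 0x11223344 stores the byte 0x22.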

Change-Id: I0bfd573526b9c46585d0008cde07c769b1d29ebd
---
M src/arch/amdgpu/vega/decoder.cc
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
3 files changed, 108 insertions(+), 2 deletions(-)



diff --git a/src/arch/amdgpu/vega/decoder.cc b/src/arch/amdgpu/vega/decoder.cc
index 18c72a4..291dd69 100644
--- a/src/arch/amdgpu/vega/decoder.cc
+++ b/src/arch/amdgpu/vega/decoder.cc
@@ -7706,8 +7706,7 @@
 GPUStaticInst*
 Decoder::decode_OP_DS__DS_WRITE_B8_D16_HI(MachInst iFmt)
 {
-fatal("Trying to decode instruction without a class\n");
-return nullptr;
+return new Inst_DS__DS_WRITE_B8_D16_HI(&iFmt->iFmt_DS);
 }

 GPUStaticInst*
diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 6cf01fb..f019dfd 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34877,6 +34877,68 @@
 Inst_DS__DS_WRITE_B8::completeAcc(GPUDynInstPtr gpuDynInst)
 {
 } // completeAcc
+// --- Inst_DS__DS_WRITE_B8_D16_HI class methods ---
+
+Inst_DS__DS_WRITE_B8_D16_HI::Inst_DS__DS_WRITE_B8_D16_HI(InFmt_DS *iFmt)
+: Inst_DS(iFmt, "ds_write_b8_d16_hi")
+{
+setFlag(MemoryRef);
+setFlag(Store);
+} // Inst_DS__DS_WRITE_B8_D16_HI
+
+Inst_DS__DS_WRITE_B8_D16_HI::~Inst_DS__DS_WRITE_B8_D16_HI()
+{
+} // ~Inst_DS__DS_WRITE_B8_D16_HI
+
+// --- description from .arch file ---
+// MEM[ADDR] = DATA[23:16].
+// Byte write in to high word.
+void
+Inst_DS__DS_WRITE_B8_D16_HI::execute(GPUDynInstPtr gpuDynInst)
+{
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU8 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU8*>(gpuDynInst->d_data))[lane]
+= bits(data[lane], 23, 16);
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
+} // execute
+
+void
+Inst_DS__DS_WRITE_B8_D16_HI::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initMemWrite(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_WRITE_B8_D16_HI::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_WRITE_B16 class methods ---

 Inst_DS__DS_WRITE_B16::Inst_DS__DS_WRITE_B16(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 2896732..dc2ee08 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31934,6 +31934,40 @@
 void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_WRITE_B8

+class Inst_DS__DS_WRITE_B8_D16_HI : public Inst_DS
+{
+  public:
+Inst_DS__DS_WRITE_B8_D16_HI(InFmt_DS*);
+~Inst_DS__DS_WRITE_B8_D16_HI();
+
+int
+getNumOperands() override
+{
+return numDstRegOperands() + numSrcRegOperands();
+} // getNumOperands
+
+int numDstRegOperands() override { return 0; }
+int numSrcRegOperands() override { return 2; }
+
+int
+getOperandSize(int opIdx) override
+{
+switch (opIdx) {
+  case 0: //vgpr_a
+return 4;
+  case 1: //vgpr_d0
+return 1;
+  default:
+fatal("op idx %i out of bounds\n", opIdx);
+return -1;
+}
+} // getOperandSize
+
+void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
+}; // Inst_DS__DS_WRITE_B8_D16_HI
+
 class Inst_DS__DS_WRITE_B16 : public Inst_DS
 {
   public:

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67411?usp=email

[gem5-dev] [S] Change in gem5/gem5[develop]: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot

2023-01-17 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email )


Change subject: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot
..

dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot

As of Linux 5.11, the MC146818 code was changed to avoid reading garbage
data that may be returned if a read occurs while the registers are being
updated:

github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8

Previously, toggling this bit was fine because Linux would check it twice.
Linux now checks it before and after reading the time information. With a
toggling bit, it retries indefinitely until bootup eventually fails due to
a watchdog timeout.

This changeset always sets the update-in-progress (UIP) bit to false.
Since this is a simulation, register updates are unlikely to happen at
the same time as a read.
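The following simplified, hypothetical model reproduces the failure mode. `RtcRegA` and `readTimeAttempts` are illustrative names. The reader loop only approximates the Linux 5.11+ logic: check UIP, read the time, check UIP again, and retry if either check saw UIP set. It is neither the gem5 device model nor the kernel code.

```cpp
#include <cassert>

// Simplified model of the UIP bit in RTC status register A.
struct RtcRegA
{
    explicit RtcRegA(bool toggle) : toggleUip(toggle) {}

    // Returns the UIP bit as seen by one read of register A.
    bool readUip()
    {
        if (toggleUip)
            uip = !uip;     // old gem5 behaviour: flip UIP on every read
        else
            uip = false;    // patched behaviour: never report an update
        return uip;
    }

    bool toggleUip;
    bool uip = false;
};

// Approximation of the newer Linux reader: UIP is checked before and
// after reading the time, and the attempt is retried if either check
// sees it set. Returns the successful attempt number, or -1 if
// maxAttempts is exhausted.
int readTimeAttempts(RtcRegA &reg, int maxAttempts)
{
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (reg.readUip())
            continue;       // update in progress before the read
        // ... the time registers would be read here ...
        if (reg.readUip())
            continue;       // update may have started mid-read: retry
        return attempt;
    }
    return -1;
}
```

With toggling, every attempt sees UIP set on at least one of its checks, so the loop never succeeds. With UIP pinned to 0, the first attempt succeeds.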

Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66731
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Maintainer: Bobby Bruce 
Reviewed-by: Jason Lowe-Power 
---
M src/dev/mc146818.cc
1 file changed, 31 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, but someone else must approve
  Bobby Bruce: Looks good to me, approved
  Jason Lowe-Power: Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/mc146818.cc b/src/dev/mc146818.cc
index 919efb0..2bfe877 100644
--- a/src/dev/mc146818.cc
+++ b/src/dev/mc146818.cc
@@ -233,8 +233,9 @@
 else {
 switch (addr) {
   case RTC_STAT_REGA:
-// toggle UIP bit for linux
-stat_regA.uip = !stat_regA.uip;
+// Linux after v5.10 checks this multiple times so toggling
+// leads to a deadlock on bootup.
+stat_regA.uip = 0;
 return stat_regA;
 break;
   case RTC_STAT_REGB:

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Gerrit-Change-Number: 66731
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Andreas Sandberg 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Gabe Black 
Gerrit-Reviewer: Gabe Black 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Read one dword for SGPR base global insts

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67077?usp=email )


Change subject: arch-vega: Read one dword for SGPR base global insts
..

arch-vega: Read one dword for SGPR base global insts

Global instructions in Vega can use either a VGPR base address plus an
instruction offset, or an SGPR base address plus a VGPR offset plus an
instruction offset. Currently, the VGPR address/offset is always read as
two dwords. This causes problems if the VGPR is the last VGPR allocated
to a wavefront, since the second dword would be beyond the allocation
and trip an assert.

This changeset sets the operand size of the VGPR operand to one dword
when SGPR base is used and two dwords otherwise, so initDynOperandInfo
does not assert. It also moves the read of the VGPR into the calcAddr
method so that the correct ConstVecOperandU## is used, which prevents
another assertion failure when reading from the register file. These two
changes are made to all flat instructions, since global instructions are
a subsegment of flat instructions.
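The following hypothetical sketch shows the operand-size rule and why a two-dword read can overrun the allocation. `vaddrOperandDwords` and `readFitsAllocation` are illustrative helpers, not gem5 functions. The sketch assumes a wavefront's VGPRs are numbered contiguously from 0.

```cpp
#include <cassert>

// Hypothetical: a 64-bit VGPR address needs two dwords, while a 32-bit
// VGPR offset used with an SGPR base needs only one.
int vaddrOperandDwords(bool usesSgprBase)
{
    return usesSgprBase ? 1 : 2;
}

// Hypothetical: a wavefront owns VGPRs [0, numAllocated). Reading
// numDwords consecutive registers starting at vgpr must stay inside it.
bool readFitsAllocation(int vgpr, int numDwords, int numAllocated)
{
    assert(vgpr >= 0 && numDwords > 0 && numAllocated > 0);
    return vgpr + numDwords <= numAllocated;
}
```

With 32 VGPRs allocated, an SGPR-base instruction whose offset lives in v31 fits when read as one dword. The old two-dword read would reach the unallocated v32.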

Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67077
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
M src/arch/amdgpu/vega/insts/op_encodings.hh
3 files changed, 101 insertions(+), 107 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index c803656..4b27afa 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -43831,11 +43831,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -43919,11 +43915,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44008,11 +44000,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44067,11 +44055,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44126,11 +44110,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44194,11 +44174,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44266,13 +44242,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU8 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, 

[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write2st64_b64

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email )


Change subject: arch-vega: Implement ds_write2st64_b64
..

arch-vega: Implement ds_write2st64_b64

Write two qwords at offsets multiplied by 8 * 64 bytes.
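The sketch below assumes the per-lane VGPR address has already been computed. The two LDS byte addresses then follow from scaling each 8-bit instruction offset by 8 * 64 = 512 bytes, matching the initiateAcc method added in this change. `write2st64B64Addrs` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper, not a gem5 API: LDS byte addresses of the two
// qwords written by ds_write2st64_b64 for one lane. Each 8-bit offset is
// scaled by 8 * 64 = 512 bytes, i.e. a stride of 64 qwords.
std::pair<uint32_t, uint32_t>
write2st64B64Addrs(uint32_t vgprAddr, uint8_t offset0, uint8_t offset1)
{
    const uint32_t stride = 8 * 64;
    return {vgprAddr + offset0 * stride, vgprAddr + offset1 * stride};
}
```

For example, address 0x100 with offsets 1 and 2 writes at 0x300 and 0x500. The largest reachable displacement is 255 * 512 = 130560 bytes.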

Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67078
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 62 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 4b27afa..6cf01fb 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -36595,8 +36595,52 @@
 void
 Inst_DS__DS_WRITE2ST64_B64::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU64 data0(gpuDynInst, extData.DATA0);
+ConstVecOperandU64 data1(gpuDynInst, extData.DATA1);
+
+addr.read();
+data0.read();
+data1.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU64*>(
+gpuDynInst->d_data))[lane * 2] = data0[lane];
+(reinterpret_cast<VecElemU64*>(
+gpuDynInst->d_data))[lane * 2 + 1] = data1[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_WRITE2ST64_B64::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0 * 8 * 64;
+Addr offset1 = instData.OFFSET1 * 8 * 64;
+
+initDualMemWrite(gpuDynInst, offset0, offset1);
+}
+
+void
+Inst_DS__DS_WRITE2ST64_B64::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+}
 // --- Inst_DS__DS_CMPST_B64 class methods ---

 Inst_DS__DS_CMPST_B64::Inst_DS__DS_CMPST_B64(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 9f017f9..2896732 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33572,6 +33572,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_WRITE2ST64_B64

 class Inst_DS__DS_CMPST_B64 : public Inst_DS

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b
Gerrit-Change-Number: 67078
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_read_i8

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email )


Change subject: arch-vega: Implement ds_read_i8
..

arch-vega: Implement ds_read_i8

Read one byte from LDS and sign-extend it.
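Sign extension of the loaded byte is the role `sext<8>` plays in completeAcc, and it can be sketched with plain integer conversions. `signExtendByte` is a hypothetical helper, not gem5's `sext`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sign-extend an 8-bit LDS value into a 32-bit VGPR.
uint32_t signExtendByte(uint8_t val)
{
    // Reinterpreting as int8_t makes bit 7 the sign bit. Widening to
    // int32_t then replicates that bit into bits [31:8].
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(val)));
}
```

For example, 0x80 becomes 0xffffff80, while 0x7f stays 0x0000007f.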

Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67076
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 60 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index a54f426..c803656 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -35636,8 +35636,50 @@
 void
 Inst_DS__DS_READ_I8::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+
+addr.read();
+
+calcAddr(gpuDynInst, addr);
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_READ_I8::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initMemRead(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_READ_I8::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (VecElemU32)sext<8>((reinterpret_cast<VecElemI8*>(
+gpuDynInst->d_data))[lane]);
+}
+}
+
+vdst.write();
+} // completeAcc
 // --- Inst_DS__DS_READ_U8 class methods ---

 Inst_DS__DS_READ_U8::Inst_DS__DS_READ_U8(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index f8fc98b..b2cf2b9 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -32848,6 +32848,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_READ_I8

 class Inst_DS__DS_READ_U8 : public Inst_DS

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be
Gerrit-Change-Number: 67076
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u64

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email )


Change subject: arch-vega: Implement ds_add_u64
..

arch-vega: Implement ds_add_u64

This instruction atomically adds unsigned 64-bit data from a VGPR to a
value in LDS, without returning the previous value.
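The following single-threaded, hypothetical model shows the no-return semantics. Each active lane adds its 64-bit value to LDS at its VGPR address plus the 16-bit instruction offset, built as (OFFSET1 << 8) | OFFSET0 as in initiateAcc, and nothing is written back to a VGPR. `dsAddU64`, the byte-vector LDS, and the 64-lane width are illustrative assumptions. Lanes are applied one after another. Because addition is commutative, this gives the same result as atomic updates when lanes share an address.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int kLanes = 64;  // assumption: 64 lanes per wavefront

// Hypothetical model: MEM[ADDR + offset] += DATA for each active lane.
void dsAddU64(std::vector<uint8_t> &lds, const std::bitset<kLanes> &exec,
              const uint32_t (&addr)[kLanes], const uint64_t (&data)[kLanes],
              uint8_t offset0, uint8_t offset1)
{
    // 16-bit instruction offset, as in initiateAcc.
    const uint32_t offset = (uint32_t(offset1) << 8) | offset0;
    for (int lane = 0; lane < kLanes; ++lane) {
        if (!exec[lane])
            continue;                          // inactive lanes touch nothing
        const uint32_t a = addr[lane] + offset;
        assert(a + sizeof(uint64_t) <= lds.size());
        uint64_t mem;
        std::memcpy(&mem, &lds[a], sizeof(mem));   // alignment-safe load
        mem += data[lane];                          // wraps modulo 2^64
        std::memcpy(&lds[a], &mem, sizeof(mem));    // no value is returned
    }
}
```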

Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67075
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 64 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 3d9808a..a54f426 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -36088,6 +36088,10 @@
 Inst_DS__DS_ADD_U64::Inst_DS__DS_ADD_U64(InFmt_DS *iFmt)
 : Inst_DS(iFmt, "ds_add_u64")
 {
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_U64

 Inst_DS__DS_ADD_U64::~Inst_DS__DS_ADD_U64()
@@ -36096,14 +36100,53 @@

 // --- description from .arch file ---
 // 64b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA[0:1];
-// RETURN_DATA[0:1] = tmp.
 void
 Inst_DS__DS_ADD_U64::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU64 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemU64*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_ADD_U64::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_U64::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_SUB_U64 class methods ---

 Inst_DS__DS_SUB_U64::Inst_DS__DS_SUB_U64(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 05a0002..f8fc98b 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33079,6 +33079,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_U64


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077
Gerrit-Change-Number: 67075
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: base: Specialize bitwise atomics so FP types can be used

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email )


Change subject: base: Specialize bitwise atomics so FP types can be used
..

base: Specialize bitwise atomics so FP types can be used

The current atomic memory operations are templated so that any type can
be used. However, bitwise operations cannot be performed on
floating-point types. The GPU model contains some instructions that
perform atomics on floating-point types, so these types must be
supported. To allow this, template specialization is added to atomic
AND, OR, and XOR so that they do nothing if the type is floating point
and operate as normal for integral types.
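Here is a minimal sketch of the pattern, written as a standalone `AndFunctor` class rather than the actual gem5 `AtomicOpAnd`. It uses SFINAE on the return type via `std::enable_if`, so exactly one `executeImpl` overload is viable for a given `T`.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
class AndFunctor
{
    // Viable only for integral types: perform the bitwise AND.
    template <typename B>
    typename std::enable_if<std::is_integral<B>::value, void>::type
    executeImpl(B *b) { *b &= a; }

    // Viable only for non-integral types (e.g. float): do nothing.
    template <typename B>
    typename std::enable_if<!std::is_integral<B>::value, void>::type
    executeImpl(B *) { }

  public:
    T a;
    explicit AndFunctor(T _a) : a(_a) { }
    // B is deduced as T from the argument, selecting one overload.
    void execute(T *b) { executeImpl(b); }
};
```

For `unsigned`, the AND is applied. For `float`, the call still compiles and leaves the value unchanged.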

Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67073
Reviewed-by: Matt Sinclair 
Reviewed-by: Daniel Carvalho 
Tested-by: kokoro 
Maintainer: Matt Sinclair 
---
M src/base/amo.hh
1 file changed, 52 insertions(+), 3 deletions(-)

Approvals:
  kokoro: Regressions pass
  Daniel Carvalho: Looks good to me, approved
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/base/amo.hh b/src/base/amo.hh
index 81bf069..c990d15 100644
--- a/src/base/amo.hh
+++ b/src/base/amo.hh
@@ -129,30 +129,57 @@
 template <typename T>
 class AtomicOpAnd : public TypedAtomicOpFunctor<T>
 {
+// Bitwise operations are only legal on integral types
+template <typename B>
+typename std::enable_if<std::is_integral<B>::value, void>::type
+executeImpl(B *b) { *b &= a; }
+
+template <typename B>
+typename std::enable_if<!std::is_integral<B>::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpAnd(T _a) : a(_a) { }
-void execute(T *b) { *b &= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpAnd(a); }
 };

 template <typename T>
 class AtomicOpOr : public TypedAtomicOpFunctor<T>
 {
+// Bitwise operations are only legal on integral types
+template <typename B>
+typename std::enable_if<std::is_integral<B>::value, void>::type
+executeImpl(B *b) { *b |= a; }
+
+template <typename B>
+typename std::enable_if<!std::is_integral<B>::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpOr(T _a) : a(_a) { }
-void execute(T *b) { *b |= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpOr(a); }
 };

 template <typename T>
 class AtomicOpXor : public TypedAtomicOpFunctor<T>
 {
+// Bitwise operations are only legal on integral types
+template <typename B>
+typename std::enable_if<std::is_integral<B>::value, void>::type
+executeImpl(B *b) { *b ^= a; }
+
+template <typename B>
+typename std::enable_if<!std::is_integral<B>::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpXor(T _a) : a(_a) {}
-void execute(T *b) { *b ^= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpXor(a); }
 };


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c
Gerrit-Change-Number: 67073
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Daniel Carvalho 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_f32 atomic

2023-01-05 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email )


Change subject: arch-vega: Implement ds_add_f32 atomic
..

arch-vega: Implement ds_add_f32 atomic

This instruction atomically adds a 32-bit float from a VGPR to a value
in LDS, without returning the previous value.
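On the host side, an atomic float add without a return value can be sketched with a compare-exchange loop. This hypothetical `atomicAddNoReturn` illustrates the general technique; it is not how gem5 or the GPU hardware perform ds_add_f32. A CAS loop is used because `std::atomic<float>::fetch_add` is only available from C++20.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically performs mem += data, discarding the
// old value (the "no return" form).
void atomicAddNoReturn(std::atomic<float> &mem, float data)
{
    float expected = mem.load();
    // On failure, compare_exchange_weak reloads 'expected' with the
    // current value, so the sum is recomputed from fresh data each retry.
    while (!mem.compare_exchange_weak(expected, expected + data)) {
    }
}
```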

Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67074
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 64 insertions(+), 3 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index afdfde3..3d9808a 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34755,6 +34755,10 @@
 : Inst_DS(iFmt, "ds_add_f32")
 {
 setFlag(F32);
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_F32

 Inst_DS__DS_ADD_F32::~Inst_DS__DS_ADD_F32()
@@ -34763,15 +34767,54 @@

 // --- description from .arch file ---
 // 32b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA;
-// RETURN_DATA = tmp.
 // Floating point add that handles NaN/INF/denormal values.
 void
 Inst_DS__DS_ADD_F32::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandF32 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast<VecElemF32*>(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_ADD_F32::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_F32::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_WRITE_B8 class methods ---

 Inst_DS__DS_WRITE_B8::Inst_DS__DS_WRITE_B8(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 33be33e..05a0002 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31895,6 +31895,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_F32


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb
Gerrit-Change-Number: 67074
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix signed BFE instructions

2023-01-03 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email )


Change subject: arch-vega: Fix signed BFE instructions
..

arch-vega: Fix signed BFE instructions

The bitfield extract instructions come in unsigned and signed variants.
The Vega documentation for them is incorrect; however, the GCN3
documentation gives some clues. The instruction should extract an N-bit
integer, where N is given by a source operand, starting at a bit
position also given by a source operand. For the signed variants, the
N-bit integer should be sign-extended, but currently it is not.

This changeset does sign extension using the runtime value of N by ORing
the upper bits with ones if the most significant bit is one. This was
verified by writing these instructions in assembly and running on a real
GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64.
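As a rough illustration of the sign-extension step described above, here is a minimal standalone C++ sketch of a signed 32-bit bitfield extract. The function name `bfeSigned32` and the width bounds (1 to 31 bits) are assumptions made for the sketch; it does not model how the hardware treats a width of 0 or 32, and it is not the gem5 code itself.

```cpp
#include <cstdint>

// Hypothetical sketch: extract `width` bits of `src` starting at bit
// `offset`, then sign-extend using the runtime width.
// Assumes 1 <= width <= 31 and offset < 32.
int32_t bfeSigned32(uint32_t src, unsigned offset, unsigned width)
{
    uint32_t field = (src >> offset) & ((1u << width) - 1);
    // If the field's most significant bit is one, OR ones into all
    // bits above the field (the approach the commit describes).
    if ((field >> (width - 1)) & 1u)
        field |= ~0u << width;
    // Two's-complement reinterpretation (well-defined since C++20).
    return static_cast<int32_t>(field);
}
```

For example, extracting the 4-bit field 0xF from bits [7:4] of 0xF0 yields -1, while 0x7 from 0x70 stays 7.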

Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66751
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 55 insertions(+), 0 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index f5b08b7..c9e57bc 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -1302,6 +1302,21 @@

 sdst = (src0.rawData() >> bits(src1.rawData(), 4, 0))
 & ((1 << bits(src1.rawData(), 22, 16)) - 1);
+
+// Above extracted a signed int of size src1[22:16] bits which needs
+// to be signed-extended. Check if the MSB of our src1[22:16]-bit
+// integer is 1, and sign extend it is.
+//
+// Note: The description in the Vega ISA manual does not mention to
+// sign-extend the result. An update description can be found in the
+// more recent RDNA3 manual here:
+// https://developer.amd.com/wp-content/resources/
+//  RDNA3_Shader_ISA_December2022.pdf
+if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) {
+sdst = sdst.rawData()
+ | (0x << bits(src1.rawData(), 22, 16));
+}
+
 scc = sdst.rawData() ? 1 : 0;

 sdst.write();
@@ -1373,6 +1388,14 @@

 sdst = (src0.rawData() >> bits(src1.rawData(), 5, 0))
 & ((1 << bits(src1.rawData(), 22, 16)) - 1);
+
+// Above extracted a signed int of size src1[22:16] bits which needs
+// to be signed-extended. Check if the MSB of our src1[22:16]-bit
+// integer is 1, and sign extend it is.
+if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) {
+sdst = sdst.rawData()
+ | 0x << bits(src1.rawData(), 22, 16);
+}
 scc = sdst.rawData() ? 1 : 0;

 sdst.write();
@@ -30544,6 +30567,13 @@
 if (wf->execMask(lane)) {
 vdst[lane] = (src0[lane] >> bits(src1[lane], 4, 0))
 & ((1 << bits(src2[lane], 4, 0)) - 1);
+
+// Above extracted a signed int of size src2 bits which needs
+// to be signed-extended. Check if the MSB of our src2-bit
+// integer is 1, and sign extend it is.
+if (vdst[lane] >> (bits(src2[lane], 4, 0) - 1)) {
+vdst[lane] |= 0x << bits(src2[lane], 4, 0);
+}
 }
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Gerrit-Change-Number: 66751
Gerrit-PatchSet: 4
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix several issues with DPP

2023-01-03 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66752?usp=email )


(2 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: arch-vega: Fix several issues with DPP
..

arch-vega: Fix several issues with DPP

DPP processing has several issues, which are fixed in this changeset:

1) An incorrect comment is corrected
2) The newLane calculation for shift/rotate instructions is corrected
3) A copy of the original data is made so that a copy of a copy is not
made
4) All booleans (OOB, zeroSrc, laneDisabled) are reset after each lane
iteration

The shift, rotate, and broadcast variants were tested by implementing
them in assembly and running on silicon.
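The diff replaces `rowNum | localRowOffset` with `(rowNum * ROW_SIZE) | localRowOffset`: `rowNum` is a row index, so it has to be scaled by the row size before it is combined with the in-row offset. Below is a minimal sketch of the corrected source-lane computation for row shifts and rotates, assuming 16-lane rows and the convention from the updated comment (a positive count reads from a higher lane, so a left shift moves data from lane n to lane n-1). The helper names are hypothetical, and an out-of-bounds source is simply reported as `std::nullopt`, whereas the diff sets `outOfBounds`.

```cpp
#include <optional>

constexpr int kRowSize = 16;  // lanes per DPP row (assumed, as ROW_SIZE)

// Source lane for a row shift by `count` lanes (count > 0: shift left,
// count < 0: shift right). Returns std::nullopt when the source lies
// outside the current row.
std::optional<int> rowShiftSource(int lane, int count)
{
    int rowNum = lane / kRowSize;
    int rowOffset = lane % kRowSize + count;
    if (rowOffset < 0 || rowOffset >= kRowSize)
        return std::nullopt;
    // rowNum * kRowSize has its low 4 bits clear, so OR acts like addition.
    return (rowNum * kRowSize) | rowOffset;
}

// Source lane for a row rotate by `count` lanes (|count| < kRowSize);
// wraps around within the row instead of going out of bounds.
int rowRotateSource(int lane, int count)
{
    int rowNum = lane / kRowSize;
    int rowOffset = (lane % kRowSize + count + kRowSize) % kRowSize;
    return (rowNum * kRowSize) | rowOffset;
}
```

For lane 18 and a left shift of 1, the old expression would give `1 | 3 = 3`, a lane in row 0, whereas the corrected form gives `16 | 3 = 19`.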

Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66752
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/inst_util.hh
1 file changed, 58 insertions(+), 23 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/inst_util.hh b/src/arch/amdgpu/vega/insts/inst_util.hh
index 01925f9..7ec2e2d 100644
--- a/src/arch/amdgpu/vega/insts/inst_util.hh
+++ b/src/arch/amdgpu/vega/insts/inst_util.hh
@@ -303,9 +303,9 @@
  * Currently the values are:
  * 0x0 - 0xFF: full permute of four threads
  * 0x100: reserved
- * 0x101 - 0x10F: row shift right by 1-15 threads
+ * 0x101 - 0x10F: row shift left by 1-15 threads
  * 0x111 - 0x11F: row shift right by 1-15 threads
- * 0x121 - 0x12F: row shift right by 1-15 threads
+ * 0x121 - 0x12F: row rotate right by 1-15 threads
  * 0x130: wavefront left shift by 1 thread
  * 0x134: wavefront left rotate by 1 thread
  * 0x138: wavefront right shift by 1 thread
@@ -322,7 +322,8 @@
 // newLane will be the same as the input lane unless swizzling happens
 int newLane = currLane;
 // for shift/rotate permutations; positive values are LEFT rotates
-int count = 1;
+// shift/rotate left means lane n -> lane n-1 (e.g., lane 1 -> lane 0)
+int count = 0;
 int localRowOffset = rowOffset;
 int localRowNum = rowNum;

@@ -335,51 +336,47 @@
 panic("ERROR: instruction using reserved DPP_CTRL value\n");
 } else if ((dppCtrl >= SQ_DPP_ROW_SL1) &&
(dppCtrl <= SQ_DPP_ROW_SL15)) { // DPP_ROW_SL{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_SL1 + 1);
+count = (dppCtrl - SQ_DPP_ROW_SL1 + 1);
 if ((localRowOffset + count >= 0) &&
 (localRowOffset + count < ROW_SIZE)) {
 localRowOffset += count;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else {
 outOfBounds = true;
 }
 } else if ((dppCtrl >= SQ_DPP_ROW_SR1) &&
(dppCtrl <= SQ_DPP_ROW_SR15)) { // DPP_ROW_SR{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_SR1 + 1);
+count = -(dppCtrl - SQ_DPP_ROW_SR1 + 1);
 if ((localRowOffset + count >= 0) &&
 (localRowOffset + count < ROW_SIZE)) {
 localRowOffset += count;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else {
 outOfBounds = true;
 }
 } else if ((dppCtrl >= SQ_DPP_ROW_RR1) &&
(dppCtrl <= SQ_DPP_ROW_RR15)) { // DPP_ROW_RR{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_RR1 + 1);
+count = -(dppCtrl - SQ_DPP_ROW_RR1 + 1);
 localRowOffset = (localRowOffset + count + ROW_SIZE) % ROW_SIZE;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else if (dppCtrl == SQ_DPP_WF_SL1) { // DPP_WF_SL1
-count = 1;
 if ((currLane >= 0) && (currLane < NumVecElemPerVecReg)) {
-newLane += count;
+newLane += 1;
 } else {
 outOfBounds = true;
 }
 } else if (dppCtrl == SQ_DPP_WF_RL1) { // DPP_WF_RL1
-count = 1;
-newLane = (currLane + count + NumVecElemPerVecReg) %
+newLane = (currLane - 1 + NumVecElemPerVecReg) %
   NumVecElemPerVecReg;
 } else if (dppCtrl == SQ_DPP_WF_SR1) { // DPP_WF_SR1
-count = -1;
-int currVal = (currLane + count);
+int currVal = (currLane - 1);
 if ((currVal >= 0) && (currVal < NumVecElemPerVecReg)) {
-newLane += count;
+newLane -= 1;
 } 

[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add missing operand size for ds_write2st64_b64

2023-01-03 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email )


Change subject: arch-vega: Add missing operand size for ds_write2st64_b64
..

arch-vega: Add missing operand size for ds_write2st64_b64

This instruction takes three operands (an address and two data
operands), but operand sizes were defined for only two of them, so the
third operand tripped the assert in the default case.
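Before the fix only cases 0 and 1 existed, so querying operand 2 fell through to the default case. The sketch below restates the corrected mapping as a standalone function (hypothetical name), with gem5's `fatal()` replaced by an exception so it runs outside the simulator.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical mirror of the fixed getOperandSize() for ds_write2st64_b64:
// operand 0 is the 32-bit LDS address VGPR; operands 1 and 2 are the two
// 64-bit data operands. Any other index is an error.
int write2st64B64OperandSize(int opIdx)
{
    switch (opIdx) {
      case 0: return 4;  // vgpr_a
      case 1: return 8;  // vgpr_d0
      case 2: return 8;  // vgpr_d1
      default:
        throw std::out_of_range("op idx " + std::to_string(opIdx) +
                                " out of bounds");
    }
}
```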

Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67071
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.hh
1 file changed, 20 insertions(+), 1 deletion(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 0671df8..1c42248 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33553,7 +33553,9 @@
 switch (opIdx) {
   case 0: //vgpr_a
 return 4;
-  case 1: //vgpr_d1
+  case 1: //vgpr_d0
+return 8;
+  case 2: //vgpr_d1
 return 8;
   default:
 fatal("op idx %i out of bounds\n", opIdx);

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Gerrit-Change-Number: 67071
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u32 atomic

2023-01-03 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email )


Change subject: arch-vega: Implement ds_add_u32 atomic
..

arch-vega: Implement ds_add_u32 atomic

This instruction atomically adds unsigned 32-bit data from a VGPR to a
value in LDS, without returning the original value.
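A simplified functional sketch of what a no-return LDS atomic add computes. It is not gem5's timing path (the diff stages per-lane data in `a_data` and issues through `localMemoryPipe`); the function name, the 64-lane wavefront, and the choice to throw on out-of-range addresses are assumptions for illustration.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

constexpr int kWaveSize = 64;  // lanes per wavefront (assumed 64 for Vega)

// Hypothetical model of ds_add_u32 (no return): each active lane adds
// data[lane] to the 32-bit word at byte address addr[lane] + offset.
// Lanes are applied one at a time, so lanes that hit the same word all
// contribute; nothing is written back to a VGPR.
// addr and data must each hold at least kWaveSize entries.
void ldsAtomicAddU32(std::vector<uint8_t> &lds,
                     const std::vector<uint32_t> &addr,
                     const std::vector<uint32_t> &data,
                     const std::bitset<kWaveSize> &exec,
                     uint32_t offset)
{
    for (int lane = 0; lane < kWaveSize; ++lane) {
        if (!exec.test(lane))
            continue;                       // inactive lanes are skipped
        uint64_t byteAddr = uint64_t(addr[lane]) + offset;
        if (byteAddr + sizeof(uint32_t) > lds.size())
            throw std::out_of_range("LDS access out of range");
        uint32_t word;
        std::memcpy(&word, lds.data() + byteAddr, sizeof(word));
        word += data[lane];                 // wraps modulo 2^32 on overflow
        std::memcpy(lds.data() + byteAddr, &word, sizeof(word));
    }
}
```

With 63 active lanes all adding 1 to the same word, the word grows by 63, which is the behaviour expected of an atomic add with no return value.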

Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/67072
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 64 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 1f37ff1..afdfde3 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34071,6 +34071,10 @@
 Inst_DS__DS_ADD_U32::Inst_DS__DS_ADD_U32(InFmt_DS *iFmt)
 : Inst_DS(iFmt, "ds_add_u32")
 {
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_U32

 Inst_DS__DS_ADD_U32::~Inst_DS__DS_ADD_U32()
@@ -34079,14 +34083,53 @@

 // --- description from .arch file ---
 // 32b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA;
-// RETURN_DATA = tmp.
 void
 Inst_DS__DS_ADD_U32::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_ADD_U32::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_U32::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_SUB_U32 class methods ---

 Inst_DS__DS_SUB_U32::Inst_DS__DS_SUB_U32(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 1c42248..33be33e 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31211,6 +31211,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_U32


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e
Gerrit-Change-Number: 67072
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Add DPP support for V_AND_B32

2023-01-03 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email )


(1 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: arch-vega: Add DPP support for V_AND_B32
..

arch-vega: Add DPP support for V_AND_B32

A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the
unit tests for rocPRIM scan_inclusive are passing.
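To show why a DPP form of a simple bitwise op is useful in a scan, here is a hedged sketch of an inclusive AND-scan across one 16-lane row, built from row_shr-style steps of 1, 2, 4 and 8 lanes (a Hillis-Steele scan). Whether rocPRIM's scan_inclusive uses exactly this sequence is an assumption. The sketch models only the data movement; lanes whose source falls outside the row keep their value, which is one plausible treatment of disabled lanes and not a statement of the hardware's bound_ctrl semantics.

```cpp
#include <array>
#include <cstdint>

constexpr int kRowSize = 16;
using Row = std::array<uint32_t, kRowSize>;

// One DPP-style step: lane n reads lane n - shift (row_shr) and ANDs it
// with its own value. Reads come from `x`, so every lane sees pre-step values.
Row andStepRowShr(const Row &x, int shift)
{
    Row out = x;  // lanes with an out-of-row source keep their value
    for (int lane = shift; lane < kRowSize; ++lane)
        out[lane] = x[lane - shift] & x[lane];
    return out;
}

// After steps of 1, 2, 4 and 8, lane i holds x[0] & x[1] & ... & x[i].
Row inclusiveAndScan(Row x)
{
    for (int shift = 1; shift < kRowSize; shift <<= 1)
        x = andStepRowShr(x, shift);
    return x;
}
```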

Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66753
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 46 insertions(+), 4 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index c9e57bc..1f37ff1 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -6844,15 +6844,41 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 ConstVecOperandU32 src0(gpuDynInst, instData.SRC0);
-ConstVecOperandU32 src1(gpuDynInst, instData.VSRC1);
+VecOperandU32 src1(gpuDynInst, instData.VSRC1);
 VecOperandU32 vdst(gpuDynInst, instData.VDST);

 src0.readSrc();
 src1.read();

-for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
-if (wf->execMask(lane)) {
-vdst[lane] = src0[lane] & src1[lane];
+if (isDPPInst()) {
+VecOperandU32 src0_dpp(gpuDynInst, extData.iFmt_VOP_DPP.SRC0);
+src0_dpp.read();
+
+DPRINTF(VEGA, "Handling V_AND_B32 SRC DPP. SRC0: register v[%d], "
+"DPP_CTRL: 0x%#x, SRC0_ABS: %d, SRC0_NEG: %d, "
+"SRC1_ABS: %d, SRC1_NEG: %d, BC: %d, "
+"BANK_MASK: %d, ROW_MASK: %d\n", extData.iFmt_VOP_DPP.SRC0,
+extData.iFmt_VOP_DPP.DPP_CTRL,
+extData.iFmt_VOP_DPP.SRC0_ABS,
+extData.iFmt_VOP_DPP.SRC0_NEG,
+extData.iFmt_VOP_DPP.SRC1_ABS,
+extData.iFmt_VOP_DPP.SRC1_NEG,
+extData.iFmt_VOP_DPP.BC,
+extData.iFmt_VOP_DPP.BANK_MASK,
+extData.iFmt_VOP_DPP.ROW_MASK);
+
+processDPP(gpuDynInst, extData.iFmt_VOP_DPP, src0_dpp, src1);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (wf->execMask(lane)) {
+vdst[lane] = src0_dpp[lane] & src1[lane];
+}
+}
+} else {
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (wf->execMask(lane)) {
+vdst[lane] = src0[lane] & src1[lane];
+}
 }
 }


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942
Gerrit-Change-Number: 66753
Gerrit-PatchSet: 5
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_write2st64_b64

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email )



Change subject: arch-vega: Implement ds_write2st64_b64
..

arch-vega: Implement ds_write2st64_b64

Write two qwords to LDS at offsets OFFSET0 and OFFSET1, each scaled by
8 * 64 bytes.
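Per the diff, each 8-bit offset field is multiplied by 8 * 64 = 512 bytes (a stride of 64 qwords). The sketch below computes the two byte addresses one lane writes, assuming each scaled offset is added to that lane's LDS address; the helper name is hypothetical.

```cpp
#include <cstdint>
#include <utility>

constexpr uint32_t kSt64Stride = 8 * 64;  // 512 bytes per offset unit

// Hypothetical helper: byte addresses of the two qwords written by one lane.
std::pair<uint32_t, uint32_t>
write2st64B64Addrs(uint32_t laneAddr, uint8_t offset0, uint8_t offset1)
{
    return {laneAddr + offset0 * kSt64Stride,
            laneAddr + offset1 * kSt64Stride};
}
```

For a lane address of 0x100 with OFFSET0 = 1 and OFFSET1 = 2, the qwords land at 0x300 and 0x500.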

Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 58 insertions(+), 1 deletion(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 7594f9c..3ef11c4 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -36589,8 +36589,52 @@
 void
 Inst_DS__DS_WRITE2ST64_B64::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU64 data0(gpuDynInst, extData.DATA0);
+ConstVecOperandU64 data1(gpuDynInst, extData.DATA1);
+
+addr.read();
+data0.read();
+data1.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 2] = data0[lane];
+(reinterpret_cast(
+gpuDynInst->d_data))[lane * 2 + 1] = data1[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);
 } // execute
+
+void
+Inst_DS__DS_WRITE2ST64_B64::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0 * 8 * 64;
+Addr offset1 = instData.OFFSET1 * 8 * 64;
+
+initDualMemWrite(gpuDynInst, offset0, offset1);
+}
+
+void
+Inst_DS__DS_WRITE2ST64_B64::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+}
 // --- Inst_DS__DS_CMPST_B64 class methods ---

 Inst_DS__DS_CMPST_B64::Inst_DS__DS_CMPST_B64(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 9f017f9..2896732 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33572,6 +33572,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_WRITE2ST64_B64

 class Inst_DS__DS_CMPST_B64 : public Inst_DS

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67078?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I0d0e05f3e848c2fd02d32095e32b7f023bd8803b
Gerrit-Change-Number: 67078
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Read one dword for SGPR base global insts

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67077?usp=email )



Change subject: arch-vega: Read one dword for SGPR base global insts
..

arch-vega: Read one dword for SGPR base global insts

Global instructions in Vega can use either a VGPR base address plus an
instruction offset, or an SGPR base address plus a VGPR offset plus an
instruction offset. Currently the VGPR address/offset is always read as
two dwords. This causes problems if the VGPR number is the last VGPR
allocated to a wavefront, since the second dword would be beyond the
allocation and trip an assert.

This changeset sets the operand size of the VGPR operand to one dword
when an SGPR base is used and to two dwords otherwise, so that
initDynOperandInfo does not assert. It also moves the read of the VGPR
into the calcAddr method so that the correct ConstVecOperandU## is used,
preventing another assertion failure when reading from the register
file. Both changes are made to all flat instructions, as global
instructions are a subsegment of flat instructions.
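A hedged sketch of the two addressing modes and of why a two-dword read of the last allocated VGPR goes out of bounds. All names are hypothetical. Treating the VGPR offset as an unsigned 32-bit value in SGPR-base mode, and the instruction offset as a signed value, are assumptions; the sketch does not model the real SADDR encoding.

```cpp
#include <cstdint>

// Size of the VGPR address operand: one dword with an SGPR base,
// two dwords (a full 64-bit address) otherwise.
int globalAddrVgprBytes(bool sgprBase) { return sgprBase ? 4 : 8; }

// A read of `bytes` starting at VGPR `vgpr` must stay within the
// wavefront's allocation v0 .. v[numAllocated - 1].
bool vgprReadInBounds(int vgpr, int bytes, int numAllocated)
{
    int dwords = bytes / 4;
    return vgpr >= 0 && vgpr + dwords <= numAllocated;
}

// Per-lane effective address for a global access.
uint64_t globalEffectiveAddr(bool sgprBase, uint64_t sgprBaseAddr,
                             uint64_t vgprValue, int64_t instOffset)
{
    uint64_t base = sgprBase
        ? sgprBaseAddr + static_cast<uint32_t>(vgprValue)  // 32-bit offset
        : vgprValue;                                       // 64-bit address
    // Adding the two's-complement image applies a negative offset
    // modulo 2^64.
    return base + static_cast<uint64_t>(instOffset);
}
```

With 32 VGPRs allocated, a one-dword read of v31 stays in bounds, but a two-dword read of v31 would also touch v32 and fail the bounds check.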

Change-Id: I79030771aa6deec05ffa5853ca2d8b68943ee0a0
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
M src/arch/amdgpu/vega/insts/op_encodings.hh
3 files changed, 97 insertions(+), 107 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index f0fb1aa..7594f9c 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -43825,11 +43825,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -43913,11 +43909,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44002,11 +43994,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44061,11 +44049,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44120,11 +44104,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44188,11 +44168,7 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
-
-addr.read();
-
-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 issueRequestHelper(gpuDynInst);
 } // execute
@@ -44260,13 +44236,11 @@
 gpuDynInst->latency.init(gpuDynInst->computeUnit());
 gpuDynInst->latency.set(gpuDynInst->computeUnit()->clockPeriod());

-ConstVecOperandU64 addr(gpuDynInst, extData.ADDR);
 ConstVecOperandU8 data(gpuDynInst, extData.DATA);

-addr.read();
 data.read();

-calcAddr(gpuDynInst, addr, extData.SADDR, instData.OFFSET);
+calcAddr(gpuDynInst, extData.ADDR, extData.SADDR, instData.OFFSET);

 for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
 if (gpuDynInst->exec_mask[lane]) {
@@ -44319,13 +44293,11 @@
 

[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add missing operand size for ds_write2st64_b64

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email )



Change subject: arch-vega: Add missing operand size for ds_write2st64_b64
..

arch-vega: Add missing operand size for ds_write2st64_b64

This instruction takes three operands (an address and two data
operands), but operand sizes were defined for only two of them, so the
third operand tripped the assert in the default case.

Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
---
M src/arch/amdgpu/vega/insts/instructions.hh
1 file changed, 16 insertions(+), 1 deletion(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 0671df8..1c42248 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33553,7 +33553,9 @@
 switch (opIdx) {
   case 0: //vgpr_a
 return 4;
-  case 1: //vgpr_d1
+  case 1: //vgpr_d0
+return 8;
+  case 2: //vgpr_d1
 return 8;
   default:
 fatal("op idx %i out of bounds\n", opIdx);

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67071?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3f505b6432aee5f3f265acac46b83c0c7daff3e7
Gerrit-Change-Number: 67071
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: base: Specialize bitwise atomics so FP types can be used

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email )



Change subject: base: Specialize bitwise atomics so FP types can be used
..

base: Specialize bitwise atomics so FP types can be used

The current atomic memory operations are templated so any type can be
used. However, bitwise operations cannot be performed on floating-point
types. The GPU model contains some instructions which perform atomics on
floating-point types, so these types need to be supported. To allow
this, template specialization is added to atomic AND, OR, and XOR, which
does nothing if the type is floating point and operates as normal for
integral types.
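A self-contained sketch of the `std::enable_if` overload technique described above, using a hypothetical `AndOp` class rather than gem5's `AtomicOpAnd`. Here the no-op overload is selected for any non-integral type, which covers `float` and `double`.

```cpp
#include <type_traits>

// Hypothetical stand-in for a typed atomic AND functor.
template <typename T>
class AndOp
{
    // Viable only when B is integral: perform the bitwise AND.
    template <typename B>
    typename std::enable_if<std::is_integral<B>::value, void>::type
    executeImpl(B *b) { *b &= a; }

    // Viable otherwise (e.g. float, double): do nothing, so AndOp<float>
    // still compiles even though float has no operator&=.
    template <typename B>
    typename std::enable_if<!std::is_integral<B>::value, void>::type
    executeImpl(B *) { }

  public:
    T a;
    explicit AndOp(T _a) : a(_a) { }
    void execute(T *b) { executeImpl<T>(b); }
};
```

Because the `enable_if` condition depends on the member template parameter `B`, the body containing `*b &= a` is only instantiated for integral types. With C++17, `if constexpr (std::is_integral_v<T>)` inside a single `execute` achieves the same result without the pair of member templates.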

Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c
---
M src/base/amo.hh
1 file changed, 47 insertions(+), 3 deletions(-)



diff --git a/src/base/amo.hh b/src/base/amo.hh
index 81bf069..c990d15 100644
--- a/src/base/amo.hh
+++ b/src/base/amo.hh
@@ -129,30 +129,57 @@
 template
 class AtomicOpAnd : public TypedAtomicOpFunctor
 {
+// Bitwise operations are only legal on integral types
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { *b &= a; }
+
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpAnd(T _a) : a(_a) { }
-void execute(T *b) { *b &= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpAnd(a); }
 };

 template
 class AtomicOpOr : public TypedAtomicOpFunctor
 {
+// Bitwise operations are only legal on integral types
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { *b |= a; }
+
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpOr(T _a) : a(_a) { }
-void execute(T *b) { *b |= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpOr(a); }
 };

 template
 class AtomicOpXor : public TypedAtomicOpFunctor
 {
+// Bitwise operations are only legal on integral types
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { *b ^= a; }
+
+template
+typename std::enable_if::value, void>::type
+executeImpl(B *b) { }
+
   public:
 T a;
 AtomicOpXor(T _a) : a(_a) {}
-void execute(T *b) { *b ^= a; }
+void execute(T *b) { executeImpl(b); }
 AtomicOpFunctor* clone () { return new AtomicOpXor(a); }
 };


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67073?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I60f935756355462e99c59a9da032c5bf5afa246c
Gerrit-Change-Number: 67073
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u64

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email )



Change subject: arch-vega: Implement ds_add_u64
..

arch-vega: Implement ds_add_u64

This instruction atomically adds unsigned 64-bit data from a VGPR to a
value in LDS, without returning the original value.
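The `initiateAcc()` added in this change rebuilds a single 16-bit LDS byte offset from the two 8-bit offset fields of the DS encoding as `(OFFSET1 << 8) | OFFSET0`. Below is a minimal sketch of that composition; the function name is hypothetical.

```cpp
#include <cstdint>

// OFFSET1 supplies the high byte and OFFSET0 the low byte of the offset.
uint32_t dsCombinedOffset(uint8_t offset0, uint8_t offset1)
{
    return (static_cast<uint32_t>(offset1) << 8) | offset0;
}
```

This contrasts with the ds_write2st64_b64 change in this archive, where the two fields are scaled separately and used as two independent offsets.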

Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 60 insertions(+), 3 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index a0308c8..511a767 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -36082,6 +36082,10 @@
 Inst_DS__DS_ADD_U64::Inst_DS__DS_ADD_U64(InFmt_DS *iFmt)
 : Inst_DS(iFmt, "ds_add_u64")
 {
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_U64

 Inst_DS__DS_ADD_U64::~Inst_DS__DS_ADD_U64()
@@ -36090,14 +36094,53 @@

 // --- description from .arch file ---
 // 64b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA[0:1];
-// RETURN_DATA[0:1] = tmp.
 void
 Inst_DS__DS_ADD_U64::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU64 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);

 } // execute
+
+void
+Inst_DS__DS_ADD_U64::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_U64::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_SUB_U64 class methods ---

 Inst_DS__DS_SUB_U64::Inst_DS__DS_SUB_U64(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 05a0002..f8fc98b 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -33079,6 +33079,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_U64
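The no-return semantics in this change (MEM[ADDR] += DATA[0:1], with nothing written back to a VGPR) can be modeled on the host. This is a minimal sketch, not gem5 code: the LDS is assumed to be a flat byte vector, the wavefront is assumed to have 64 lanes, and `dsAddU64NoReturn` is a hypothetical name. Lanes are processed one after another, so each active lane's read-modify-write finishes before the next begins, which is why several lanes adding to the same address never lose an update.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kLanes = 64;  // assumed wavefront size

// Hypothetical model of ds_add_u64 without return: for each active lane,
// MEM[ADDR] += DATA[0:1]. The old value is not returned to any VGPR.
void dsAddU64NoReturn(std::vector<uint8_t> &lds,
                      const std::array<uint32_t, kLanes> &addr,
                      const std::array<uint64_t, kLanes> &data,
                      const std::bitset<kLanes> &execMask)
{
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (!execMask[lane])
            continue;  // inactive lanes do not touch LDS
        assert(std::size_t{addr[lane]} + sizeof(uint64_t) <= lds.size());
        uint64_t val;
        std::memcpy(&val, lds.data() + addr[lane], sizeof(val));
        val += data[lane];  // unsigned add, wraps modulo 2^64
        std::memcpy(lds.data() + addr[lane], &val, sizeof(val));
    }
}
```

For example, if three active lanes all target byte offset 0 with DATA = 1, the 64-bit value at offset 0 increases by 3.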


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67075?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I6a7d6713b256607c4e69ddbdef5c83172493c077
Gerrit-Change-Number: 67075
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_f32 atomic

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email )



Change subject: arch-vega: Implement ds_add_f32 atomic
..

arch-vega: Implement ds_add_f32 atomic

This instruction atomically adds a 32-bit float from a VGPR to a value
in LDS, without returning the previous value.

Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 60 insertions(+), 3 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 5332687..a0308c8 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34749,6 +34749,10 @@
 : Inst_DS(iFmt, "ds_add_f32")
 {
 setFlag(F32);
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_F32

 Inst_DS__DS_ADD_F32::~Inst_DS__DS_ADD_F32()
@@ -34757,15 +34761,54 @@

 // --- description from .arch file ---
 // 32b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA;
-// RETURN_DATA = tmp.
 // Floating point add that handles NaN/INF/denormal values.
 void
 Inst_DS__DS_ADD_F32::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandF32 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);

 } // execute
+
+void
+Inst_DS__DS_ADD_F32::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_F32::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_WRITE_B8 class methods ---

 Inst_DS__DS_WRITE_B8::Inst_DS__DS_WRITE_B8(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 33be33e..05a0002 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31895,6 +31895,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_F32
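On a host CPU, one common portable way to get the same no-return read-modify-write for a float is a compare-and-swap loop on its bit pattern. The sketch below uses that swapped-in technique purely for illustration. It is not how gem5 or the GPU implements ds_add_f32, the helper names are hypothetical, and the add follows the host's IEEE-754 rules, whose denormal handling may differ from the GPU's.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Illustrative host-side emulation of a no-return float atomic add
// (MEM[ADDR] += DATA) via a compare-and-swap loop on the raw 32 bits.
void atomicAddF32NoReturn(std::atomic<uint32_t> &word, float data)
{
    uint32_t expected = word.load();
    for (;;) {
        float cur;
        std::memcpy(&cur, &expected, sizeof(cur));
        float sum = cur + data;  // ordinary IEEE-754 add: NaN/INF propagate
        uint32_t desired;
        std::memcpy(&desired, &sum, sizeof(desired));
        // On failure, compare_exchange_weak reloads `expected` with the
        // current contents and the loop retries with the fresh value.
        if (word.compare_exchange_weak(expected, desired))
            return;
    }
}

// Reads the float currently stored in `word`.
float loadF32(const std::atomic<uint32_t> &word)
{
    uint32_t bits = word.load();
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Starting from an all-zero word (0.0f), adding 1.5f and then 2.25f leaves 3.75f.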


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67074?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Id4f23a1ab587a23edfd1d88ede1cbcc5bdedc0cb
Gerrit-Change-Number: 67074
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_add_u32 atomic

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email )



Change subject: arch-vega: Implement ds_add_u32 atomic
..

arch-vega: Implement ds_add_u32 atomic

This instruction atomically adds unsigned 32-bit data from a VGPR to a
value in LDS, without returning the previous value.

Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 60 insertions(+), 3 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 3570e32..5332687 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -34065,6 +34065,10 @@
 Inst_DS__DS_ADD_U32::Inst_DS__DS_ADD_U32(InFmt_DS *iFmt)
 : Inst_DS(iFmt, "ds_add_u32")
 {
+setFlag(MemoryRef);
+setFlag(GroupSegment);
+setFlag(AtomicAdd);
+setFlag(AtomicNoReturn);
 } // Inst_DS__DS_ADD_U32

 Inst_DS__DS_ADD_U32::~Inst_DS__DS_ADD_U32()
@@ -34073,14 +34077,53 @@

 // --- description from .arch file ---
 // 32b:
-// tmp = MEM[ADDR];
 // MEM[ADDR] += DATA;
-// RETURN_DATA = tmp.
 void
 Inst_DS__DS_ADD_U32::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+ConstVecOperandU32 data(gpuDynInst, extData.DATA0);
+
+addr.read();
+data.read();
+
+calcAddr(gpuDynInst, addr);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+(reinterpret_cast(gpuDynInst->a_data))[lane]
+= data[lane];
+}
+}
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);

 } // execute
+
+void
+Inst_DS__DS_ADD_U32::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initAtomicAccess(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_ADD_U32::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+} // completeAcc
 // --- Inst_DS__DS_SUB_U32 class methods ---

 Inst_DS__DS_SUB_U32::Inst_DS__DS_SUB_U32(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index 1c42248..33be33e 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -31211,6 +31211,8 @@
 }
 } // getOperandSize

+void initiateAcc(GPUDynInstPtr gpuDynInst) override;
+void completeAcc(GPUDynInstPtr gpuDynInst) override;
 void execute(GPUDynInstPtr) override;
 }; // Inst_DS__DS_ADD_U32
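Each initiateAcc() above combines the two 8-bit DS offset fields with `(offset1 << 8) | offset0` before starting the access. A small standalone sketch of that arithmetic follows. `dsOffset` and `dsLaneAddress` are hypothetical names, and simply adding the offset to the per-lane VGPR address is a simplification that ignores any base or bounds handling the hardware or simulator may apply.

```cpp
#include <cstdint>

// Combines the two 8-bit DS offset fields into one 16-bit byte offset,
// mirroring `(offset1 << 8) | offset0` in the initiateAcc() methods.
uint32_t dsOffset(uint8_t offset0, uint8_t offset1)
{
    return (static_cast<uint32_t>(offset1) << 8) | offset0;
}

// Simplified per-lane LDS address: VGPR address plus the instruction offset.
uint32_t dsLaneAddress(uint32_t vgprAddr, uint8_t offset0, uint8_t offset1)
{
    return vgprAddr + dsOffset(offset0, offset1);
}
```

For example, OFFSET1 = 0x01 and OFFSET0 = 0x10 give a 0x110-byte offset, so a lane whose address VGPR holds 0x100 accesses 0x210.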


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67072?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I87579a94f6200a9a066f8f7390e57fb5fb6eff8e
Gerrit-Change-Number: 67072
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Implement ds_read_i8

2022-12-30 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email )



Change subject: arch-vega: Implement ds_read_i8
..

arch-vega: Implement ds_read_i8

Read one byte from LDS and sign-extend it to 32 bits.

Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be
---
M src/arch/amdgpu/vega/insts/instructions.cc
M src/arch/amdgpu/vega/insts/instructions.hh
2 files changed, 56 insertions(+), 1 deletion(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 511a767..f0fb1aa 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -35630,8 +35630,50 @@
 void
 Inst_DS__DS_READ_I8::execute(GPUDynInstPtr gpuDynInst)
 {
-panicUnimplemented();
+Wavefront *wf = gpuDynInst->wavefront();
+
+if (gpuDynInst->exec_mask.none()) {
+wf->decLGKMInstsIssued();
+return;
+}
+
+gpuDynInst->execUnitId = wf->execUnitId;
+gpuDynInst->latency.init(gpuDynInst->computeUnit());
+gpuDynInst->latency.set(
+gpuDynInst->computeUnit()->cyclesToTicks(Cycles(24)));
+ConstVecOperandU32 addr(gpuDynInst, extData.ADDR);
+
+addr.read();
+
+calcAddr(gpuDynInst, addr);
+
+gpuDynInst->computeUnit()->localMemoryPipe.issueRequest(gpuDynInst);

 } // execute
+
+void
+Inst_DS__DS_READ_I8::initiateAcc(GPUDynInstPtr gpuDynInst)
+{
+Addr offset0 = instData.OFFSET0;
+Addr offset1 = instData.OFFSET1;
+Addr offset = (offset1 << 8) | offset0;
+
+initMemRead(gpuDynInst, offset);
+} // initiateAcc
+
+void
+Inst_DS__DS_READ_I8::completeAcc(GPUDynInstPtr gpuDynInst)
+{
+VecOperandU32 vdst(gpuDynInst, extData.VDST);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (gpuDynInst->exec_mask[lane]) {
+vdst[lane] = (VecElemU32)sext<8>((reinterpret_cast(
+gpuDynInst->d_data))[lane]);
+}
+}
+
+vdst.write();
+} // completeAcc
 // --- Inst_DS__DS_READ_U8 class methods ---

 Inst_DS__DS_READ_U8::Inst_DS__DS_READ_U8(InFmt_DS *iFmt)
diff --git a/src/arch/amdgpu/vega/insts/instructions.hh b/src/arch/amdgpu/vega/insts/instructions.hh
index f8fc98b..b2cf2b9 100644
--- a/src/arch/amdgpu/vega/insts/instructions.hh
+++ b/src/arch/amdgpu/vega/insts/instructions.hh
@@ -32848,6 +32848,8 @@
 } // getOperandSize

 void execute(GPUDynInstPtr) override;
+void initiateAcc(GPUDynInstPtr) override;
+void completeAcc(GPUDynInstPtr) override;
 }; // Inst_DS__DS_READ_I8

 class Inst_DS__DS_READ_U8 : public Inst_DS
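ds_read_i8 loads a byte and sign-extends it into a 32-bit VGPR; the diff does this with gem5's sext<8>() helper. As a standalone sketch (the name `sext8To32` is hypothetical and this is not gem5's implementation), the same result can be computed portably with an XOR-and-subtract trick:

```cpp
#include <cstdint>

// Sign-extends an 8-bit value to 32 bits without relying on signed casts.
// XOR with 0x80 flips the sign bit; subtracting 0x80 then borrows through
// all upper bits exactly when the original bit 7 was set.
uint32_t sext8To32(uint8_t byte)
{
    return (static_cast<uint32_t>(byte) ^ 0x80u) - 0x80u;
}
```

For example, 0x7F stays 0x0000007F, while 0x80 becomes 0xFFFFFF80 and 0xFF becomes 0xFFFFFFFF.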

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/67076?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I9cb9b4033c6f834241cba944bc7e6a7ebc5401be
Gerrit-Change-Number: 67076
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Add DPP support for V_AND_B32

2022-12-16 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email )



Change subject: arch-vega: Add DPP support for V_AND_B32
..

arch-vega: Add DPP support for V_AND_B32

A DPP variant of V_AND_B32 was found in rocPRIM. With this changeset the
unit tests for rocPRIM scan_inclusive are passing.

Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 42 insertions(+), 4 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 5612f29..3570e32 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -6838,15 +6838,41 @@
 {
 Wavefront *wf = gpuDynInst->wavefront();
 ConstVecOperandU32 src0(gpuDynInst, instData.SRC0);
-ConstVecOperandU32 src1(gpuDynInst, instData.VSRC1);
+VecOperandU32 src1(gpuDynInst, instData.VSRC1);
 VecOperandU32 vdst(gpuDynInst, instData.VDST);

 src0.readSrc();
 src1.read();

-for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
-if (wf->execMask(lane)) {
-vdst[lane] = src0[lane] & src1[lane];
+if (isDPPInst()) {
+VecOperandU32 src0_dpp(gpuDynInst, extData.iFmt_VOP_DPP.SRC0);
+src0_dpp.read();
+
+DPRINTF(VEGA, "Handling V_AND_B32 SRC DPP. SRC0: register v[%d], "
+"DPP_CTRL: 0x%#x, SRC0_ABS: %d, SRC0_NEG: %d, "
+"SRC1_ABS: %d, SRC1_NEG: %d, BC: %d, "
+"BANK_MASK: %d, ROW_MASK: %d\n", extData.iFmt_VOP_DPP.SRC0,
+extData.iFmt_VOP_DPP.DPP_CTRL,
+extData.iFmt_VOP_DPP.SRC0_ABS,
+extData.iFmt_VOP_DPP.SRC0_NEG,
+extData.iFmt_VOP_DPP.SRC1_ABS,
+extData.iFmt_VOP_DPP.SRC1_NEG,
+extData.iFmt_VOP_DPP.BC,
+extData.iFmt_VOP_DPP.BANK_MASK,
+extData.iFmt_VOP_DPP.ROW_MASK);
+
+processDPP(gpuDynInst, extData.iFmt_VOP_DPP, src0_dpp, src1);
+
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (wf->execMask(lane)) {
+vdst[lane] = src0_dpp[lane] & src1[lane];
+}
+}
+} else {
+for (int lane = 0; lane < NumVecElemPerVecReg; ++lane) {
+if (wf->execMask(lane)) {
+vdst[lane] = src0[lane] & src1[lane];
+}
 }
 }
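DPP lets an instruction read src0 from another lane before the ALU operation. The sketch below applies only the quad_perm form (DPP_CTRL 0x00 to 0xFF) to a V_AND_B32-style operation. It assumes the usual encoding in which 2-bit field i of DPP_CTRL selects the source lane for position i of each group of four lanes (so 0xE4 is the identity). The function name is hypothetical, row/bank masks and bound control are omitted, and this is not gem5's processDPP().

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 64;  // assumed wavefront size
using Vec = std::array<uint32_t, kLanes>;

// vdst[lane] = src0[quad-permuted lane] & src1[lane] for active lanes.
Vec vAndB32QuadPerm(const Vec &src0, const Vec &src1, uint32_t dppCtrl,
                    const std::bitset<kLanes> &exec, const Vec &oldVdst)
{
    assert(dppCtrl <= 0xFF);  // only the quad_perm range is modeled
    Vec vdst = oldVdst;       // inactive lanes keep their previous value
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (!exec[lane])
            continue;
        std::size_t quadBase = lane & ~std::size_t{3};
        std::size_t sel = (dppCtrl >> (2 * (lane & 3))) & 0x3;
        vdst[lane] = src0[quadBase + sel] & src1[lane];
    }
    return vdst;
}
```

With src0 holding each lane's index and src1 all ones, DPP_CTRL 0x1B (quad_perm [3,2,1,0]) reverses each quad, so lanes 0 to 3 receive 3, 2, 1, 0.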


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66753?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5a65f2cf6b56ac13609b191e3b3dfeb55e630942
Gerrit-Change-Number: 66753
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: arch-vega: Fix several issues with DPP

2022-12-16 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66752?usp=email )



Change subject: arch-vega: Fix several issues with DPP
..

arch-vega: Fix several issues with DPP

DPP processing has several issues which are fixed in this changeset:

1) An incorrect comment is corrected
2) The newLane calculation for shift/rotate instructions is corrected
3) A copy of the original data is made so that a copy of a copy is not made
4) All booleans (OOB, zeroSrc, laneDisabled) are reset after each lane
iteration

The shift, rotate, and broadcast variants were tested by implementing
them in assembly and running on silicon.
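The corrected row shift/rotate mapping can be summarized in a short standalone sketch. It assumes 16-lane rows (ROW_SIZE in the diff) and follows the fixed comment that a left shift moves data from lane n to lane n-1, so each lane reads from a higher lane in its row. The enum and function names are hypothetical, and wavefront-wide shifts, row masks, and bound control are left out; this is not gem5's processDPP().

```cpp
#include <cassert>
#include <optional>

constexpr int kRowSize = 16;  // assumed DPP row width in lanes

enum class RowOp { ShiftLeft, ShiftRight, RotateRight };

// Returns the lane that `currLane` reads src0 from for a row shift/rotate by
// `n` (1-15), or std::nullopt when a shift runs off the end of the row.
std::optional<int> dppRowSourceLane(int currLane, RowOp op, int n)
{
    assert(currLane >= 0 && n >= 1 && n <= 15);
    int rowNum = currLane / kRowSize;
    int rowOffset = currLane % kRowSize;
    int count = (op == RowOp::ShiftLeft) ? n : -n;  // positive means left
    int newOffset = rowOffset + count;
    if (op == RowOp::RotateRight) {
        newOffset = (newOffset + kRowSize) % kRowSize;  // wrap inside the row
    } else if (newOffset < 0 || newOffset >= kRowSize) {
        return std::nullopt;  // out of bounds for a shift
    }
    // Same as (rowNum * ROW_SIZE) | localRowOffset, since offset < 16.
    return rowNum * kRowSize + newOffset;
}
```

For instance, with a row shift left by 1, lane 17 reads from lane 18, while lane 31 (the last lane of row 1) has no source and is out of bounds; a row rotate right by 1 makes lane 16 read from lane 31.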

Change-Id: If86fbb26c87eaca4ef0587fd846978115858b168
---
M src/arch/amdgpu/vega/insts/inst_util.hh
1 file changed, 48 insertions(+), 23 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/inst_util.hh b/src/arch/amdgpu/vega/insts/inst_util.hh
index 01925f9..10bb629 100644
--- a/src/arch/amdgpu/vega/insts/inst_util.hh
+++ b/src/arch/amdgpu/vega/insts/inst_util.hh
@@ -303,9 +303,9 @@
  * Currently the values are:
  * 0x0 - 0xFF: full permute of four threads
  * 0x100: reserved
- * 0x101 - 0x10F: row shift right by 1-15 threads
+ * 0x101 - 0x10F: row shift left by 1-15 threads
  * 0x111 - 0x11F: row shift right by 1-15 threads
- * 0x121 - 0x12F: row shift right by 1-15 threads
+ * 0x121 - 0x12F: row rotate right by 1-15 threads
  * 0x130: wavefront left shift by 1 thread
  * 0x134: wavefront left rotate by 1 thread
  * 0x138: wavefront right shift by 1 thread
@@ -322,7 +322,8 @@
 // newLane will be the same as the input lane unless swizzling happens
 int newLane = currLane;
 // for shift/rotate permutations; positive values are LEFT rotates
-int count = 1;
+// shift/rotate left means lane n -> lane n-1 (e.g., lane 1 -> lane 0)
+int count = 0;
 int localRowOffset = rowOffset;
 int localRowNum = rowNum;

@@ -335,51 +336,47 @@
 panic("ERROR: instruction using reserved DPP_CTRL value\n");
 } else if ((dppCtrl >= SQ_DPP_ROW_SL1) &&
(dppCtrl <= SQ_DPP_ROW_SL15)) { // DPP_ROW_SL{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_SL1 + 1);
+count = (dppCtrl - SQ_DPP_ROW_SL1 + 1);
 if ((localRowOffset + count >= 0) &&
 (localRowOffset + count < ROW_SIZE)) {
 localRowOffset += count;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else {
 outOfBounds = true;
 }
 } else if ((dppCtrl >= SQ_DPP_ROW_SR1) &&
(dppCtrl <= SQ_DPP_ROW_SR15)) { // DPP_ROW_SR{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_SR1 + 1);
+count = -(dppCtrl - SQ_DPP_ROW_SR1 + 1);
 if ((localRowOffset + count >= 0) &&
 (localRowOffset + count < ROW_SIZE)) {
 localRowOffset += count;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else {
 outOfBounds = true;
 }
 } else if ((dppCtrl >= SQ_DPP_ROW_RR1) &&
(dppCtrl <= SQ_DPP_ROW_RR15)) { // DPP_ROW_RR{1:15}
-count -= (dppCtrl - SQ_DPP_ROW_RR1 + 1);
+count = -(dppCtrl - SQ_DPP_ROW_RR1 + 1);
 localRowOffset = (localRowOffset + count + ROW_SIZE) % ROW_SIZE;
-newLane = (rowNum | localRowOffset);
+newLane = ((rowNum * ROW_SIZE) | localRowOffset);
 } else if (dppCtrl == SQ_DPP_WF_SL1) { // DPP_WF_SL1
-count = 1;
 if ((currLane >= 0) && (currLane < NumVecElemPerVecReg)) {
-newLane += count;
+newLane += 1;
 } else {
 outOfBounds = true;
 }
 } else if (dppCtrl == SQ_DPP_WF_RL1) { // DPP_WF_RL1
-count = 1;
-newLane = (currLane + count + NumVecElemPerVecReg) %
+newLane = (currLane - 1 + NumVecElemPerVecReg) %
   NumVecElemPerVecReg;
 } else if (dppCtrl == SQ_DPP_WF_SR1) { // DPP_WF_SR1
-count = -1;
-int currVal = (currLane + count);
+int currVal = (currLane - 1);
 if ((currVal >= 0) && (currVal < NumVecElemPerVecReg)) {
-newLane += count;
+newLane -= 1;
 } else {
 outOfBounds = true;
 }
 } else if (dppCtrl == SQ_DPP_WF_RR1) { // DPP_WF_RR1
-count = -1;
-newLane = (currLane + count + NumVecElemPerVecReg) %
+newLane = (currLane - 1 + NumVecElemPerVecReg) %
   NumVecElemPerVecReg;
 } else if (dppCtrl == SQ_DPP_ROW_MIRROR) { // 

[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix signed BFE instructions

2022-12-16 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email )



Change subject: arch-vega: Fix signed BFE instructions
..

arch-vega: Fix signed BFE instructions

The bitfield extract instructions come in unsigned and signed variants.
The documentation for these instructions is not correct; however, the GCN3
documentation gives some clues. The instruction should extract an N-bit
integer, where N is defined in a source operand, starting at a bit that is
also defined by a source operand. For signed variants of this instruction,
the N-bit integer should be sign extended, but currently it is not.

This changeset does sign extension using the runtime value of N by ORing
the upper bits with ones if the most significant bit is one. This was
verified by writing these instructions in assembly and running on a real
GPU. Changes are made to v_bfe_i32, s_bfe_i32, and s_bfe_i64.

Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 45 insertions(+), 0 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index f5b08b7..5612f29 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -1302,6 +1302,15 @@

 sdst = (src0.rawData() >> bits(src1.rawData(), 4, 0))
 & ((1 << bits(src1.rawData(), 22, 16)) - 1);
+
+// Above extracted a signed int of size src1[22:16] bits which needs
+// to be signed-extended. Check if the MSB of our src1[22:16]-bit
+// integer is 1, and sign extend it is.
+if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) {
+sdst = sdst.rawData()
+ | (0x << bits(src1.rawData(), 22, 16));
+}
+
 scc = sdst.rawData() ? 1 : 0;

 sdst.write();
@@ -1373,6 +1382,14 @@

 sdst = (src0.rawData() >> bits(src1.rawData(), 5, 0))
 & ((1 << bits(src1.rawData(), 22, 16)) - 1);
+
+// Above extracted a signed int of size src1[22:16] bits which needs
+// to be signed-extended. Check if the MSB of our src1[22:16]-bit
+// integer is 1, and sign extend it is.
+if (sdst.rawData() >> (bits(src1.rawData(), 22, 16) - 1)) {
+sdst = sdst.rawData()
+ | 0x << bits(src1.rawData(), 22, 16);
+}
 scc = sdst.rawData() ? 1 : 0;

 sdst.write();
@@ -30544,6 +30561,13 @@
 if (wf->execMask(lane)) {
 vdst[lane] = (src0[lane] >> bits(src1[lane], 4, 0))
 & ((1 << bits(src2[lane], 4, 0)) - 1);
+
+// Above extracted a signed int of size src2 bits which needs
+// to be signed-extended. Check if the MSB of our src2-bit
+// integer is 1, and sign extend it is.
+if (vdst[lane] >> (bits(src2[lane], 4, 0) - 1)) {
+vdst[lane] |= 0x << bits(src2[lane], 4, 0);
+}
 }
 }
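The sign-extension rule described in this change (OR ones into the bits above the field when its MSB is set) can be written as a self-contained function. This is a sketch with a hypothetical name, `bfeI32`; offset and width are assumed to be pre-masked to 5 bits as in the v_bfe_i32 hunk, and returning 0 for a zero width is this sketch's own choice rather than something stated in the change.

```cpp
#include <cassert>
#include <cstdint>

// Signed bitfield extract: take `width` bits of `src` starting at bit
// `offset`, then sign-extend the field using the runtime width.
uint32_t bfeI32(uint32_t src, uint32_t offset, uint32_t width)
{
    assert(offset < 32 && width < 32);
    if (width == 0)
        return 0;
    uint32_t field = (src >> offset) & ((1u << width) - 1);
    // If the field's MSB is set, OR ones into all bits above the field.
    if ((field >> (width - 1)) & 1u)
        field |= ~0u << width;
    return field;
}
```

For example, extracting 4 bits at offset 4 from 0x80 gives the field 0x8, whose MSB is set, so the result is 0xFFFFFFF8 (-8); from 0x70 the field is 0x7 and the result stays 7.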


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66751?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ia192f5940200c6de48867b02f709a7f1b2daa974
Gerrit-Change-Number: 66751
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Fix ABI init for DispatchId

2022-12-16 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email )


Change subject: gpu-compute: Fix ABI init for DispatchId
..

gpu-compute: Fix ABI init for DispatchId

DispatchId should allocate two SGPRs instead of one. Allocating one was
causing all subsequent SGPR index values to be off by one, leading to
bad addresses for things like flat scratch and private segment. This
field is not used very often so it was not impacting most applications.
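Why one missing SGPR shifts everything after it can be seen by packing the enabled initialization fields in order. The sketch below assumes the field order and sizes of the AMDGPU code object ABI linked in the diff (private segment buffer 4 SGPRs, then dispatch pointer, queue pointer, kernarg segment pointer, dispatch ID and flat scratch init at 2 each, then private segment size at 1). The enum and function names are hypothetical and are not gem5's.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative enable-field order; names are hypothetical.
enum Field : std::size_t {
    PrivateSegmentBuffer,
    DispatchPtr,
    QueuePtr,
    KernargSegmentPtr,
    DispatchId,
    FlatScratchInit,
    PrivateSegmentSize,
    NumFields
};

// SGPRs consumed by each field when enabled. DispatchId is a 64-bit value,
// so it needs two 32-bit SGPRs, not one.
constexpr std::array<int, NumFields> kSgprCount = {4, 2, 2, 2, 2, 2, 1};

// Returns the first SGPR index assigned to `target`: enabled fields are
// packed in enum order, so it is the sum of the sizes of earlier enabled ones.
int firstSgprOf(Field target, const std::array<bool, NumFields> &enabled)
{
    assert(enabled[target]);
    int idx = 0;
    for (std::size_t f = 0; f < target; ++f) {
        if (enabled[f])
            idx += kSgprCount[f];
    }
    return idx;
}
```

With all seven fields enabled, DispatchId starts at SGPR 10 and FlatScratchInit at SGPR 12; counting DispatchId as a single SGPR would place FlatScratchInit at 11, the off-by-one described above.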

Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/66711
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/wavefront.cc
1 file changed, 28 insertions(+), 2 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/gpu-compute/wavefront.cc b/src/gpu-compute/wavefront.cc
index 7e4b36f..8a1adfe 100644
--- a/src/gpu-compute/wavefront.cc
+++ b/src/gpu-compute/wavefront.cc
@@ -118,8 +118,10 @@
 {
 int regInitIdx = 0;

-// iterate over all the init fields and check which
-// bits are enabled
+// Iterate over all the init fields and check which
+// bits are enabled. Useful information can be found here:
+// https://github.com/ROCm-Developer-Tools/ROCm-ComputeABI-Doc/
+//blob/master/AMDGPU-ABI.md
 for (int en_bit = 0; en_bit < NumScalarInitFields; ++en_bit) {

 if (task->sgprBitEnabled(en_bit)) {
@@ -263,6 +265,12 @@
 computeUnit->cu_id, simdId,
 wfSlotId, wfDynId, physSgprIdx,
 task->dispatchId());
+
+// Dispatch ID in gem5 is an int. Set upper 32-bits to zero.
+physSgprIdx
+= computeUnit->registerManager->mapSgpr(this, regInitIdx);
+computeUnit->srf[simdId]->write(physSgprIdx, 0);
+++regInitIdx;
 break;
   case FlatScratchInit:
 physSgprIdx

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06
Gerrit-Change-Number: 66711
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot

2022-12-15 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email )



Change subject: dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot
..

dev: Ignore MC146818 UIP bit / Fix x86 Linux 5.11+ boot

As of Linux 5.11, the MC146818 code was changed to avoid reading garbage
data that may be returned if a read happens while the registers are being
updated:

github.com/torvalds/linux/commit/05a0302c35481e9b47fb90ba40922b0a4cae40d8

Previously, toggling this bit was fine because Linux would check it twice.
Linux now checks it before and after reading the time information. Since
the bit alternates on every read, one of the two checks always sees it
set, so Linux retries indefinitely until bootup eventually fails due to a
watchdog timeout.

This changeset always clears the update-in-progress (UIP) bit. Since this
is a simulation, the registers are unlikely to be updated at the same
moment a read occurs.

Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
---
M src/dev/mc146818.cc
1 file changed, 26 insertions(+), 2 deletions(-)



diff --git a/src/dev/mc146818.cc b/src/dev/mc146818.cc
index 919efb0..2bfe877 100644
--- a/src/dev/mc146818.cc
+++ b/src/dev/mc146818.cc
@@ -233,8 +233,9 @@
 else {
 switch (addr) {
   case RTC_STAT_REGA:
-// toggle UIP bit for linux
-stat_regA.uip = !stat_regA.uip;
+// Linux after v5.10 checks this multiple times so toggling
+// leads to a deadlock on bootup.
+stat_regA.uip = 0;
 return stat_regA;
 break;
   case RTC_STAT_REGB:
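The failure mode in the commit message can be reproduced with a toy model of a reader that checks the UIP bit before and after reading the time, roughly as the message describes the newer Linux code. This is not Linux or gem5 code: `readTimeAttempts` and the `readUip` callback are hypothetical stand-ins for the driver loop and a read of status register A.

```cpp
#include <functional>

// Accepts a time sample only if UIP reads as clear both before and after
// the time registers are read. Returns the attempt that succeeded, or -1
// if no attempt within maxTries produced a stable sample.
int readTimeAttempts(const std::function<bool()> &readUip, int maxTries)
{
    for (int attempt = 1; attempt <= maxTries; ++attempt) {
        bool before = readUip();
        // ... the time and date registers would be read here ...
        bool after = readUip();
        if (!before && !after)
            return attempt;
    }
    return -1;  // never saw a stable sample; Linux would keep retrying
}
```

A callback that flips its result on every call, like the old `stat_regA.uip = !stat_regA.uip`, returns -1 for any `maxTries`, because one of the two reads in each attempt sees the bit set. A callback that always returns false, matching `stat_regA.uip = 0`, succeeds on attempt 1.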

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66731?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If0f440de9f9a6bc5a773fc935d1d5af5b98a9a4b
Gerrit-Change-Number: 66731
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Fix ABI init for DispatchId

2022-12-15 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email )



Change subject: gpu-compute: Fix ABI init for DispatchId
..

gpu-compute: Fix ABI init for DispatchId

DispatchId should allocate two SGPRs instead of one. Allocating one was
causing all subsequent SGPR index values to be off by one, leading to
bad addresses for things like flat scratch and private segment. This
field is not used very often so it was not impacting most applications.

Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06
---
M src/gpu-compute/wavefront.cc
1 file changed, 24 insertions(+), 2 deletions(-)



diff --git a/src/gpu-compute/wavefront.cc b/src/gpu-compute/wavefront.cc
index 7e4b36f..8a1adfe 100644
--- a/src/gpu-compute/wavefront.cc
+++ b/src/gpu-compute/wavefront.cc
@@ -118,8 +118,10 @@
 {
 int regInitIdx = 0;

-// iterate over all the init fields and check which
-// bits are enabled
+// Iterate over all the init fields and check which
+// bits are enabled. Useful information can be found here:
+// https://github.com/ROCm-Developer-Tools/ROCm-ComputeABI-Doc/
+//blob/master/AMDGPU-ABI.md
 for (int en_bit = 0; en_bit < NumScalarInitFields; ++en_bit) {

 if (task->sgprBitEnabled(en_bit)) {
@@ -263,6 +265,12 @@
 computeUnit->cu_id, simdId,
 wfSlotId, wfDynId, physSgprIdx,
 task->dispatchId());
+
+// Dispatch ID in gem5 is an int. Set upper 32-bits to zero.
+physSgprIdx
+= computeUnit->registerManager->mapSgpr(this, regInitIdx);
+computeUnit->srf[simdId]->write(physSgprIdx, 0);
+++regInitIdx;
 break;
   case FlatScratchInit:
 physSgprIdx

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/66711?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I17744e2d099fbc0447f400211ba7f8a42675ea06
Gerrit-Change-Number: 66711
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Writeback RLC queue MQD when unmapped

2022-12-01 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65791?usp=email )


Change subject: dev-amdgpu: Writeback RLC queue MQD when unmapped
..

dev-amdgpu: Writeback RLC queue MQD when unmapped

Currently, when RLC queues (user mode queues) are mapped, the read/write
pointers of the ring buffer are set to zero. However, these queues could
be unmapped and then remapped later. In that situation the read/write
pointers should keep the values they had before unmapping occurred. Since
the read pointer gets reset to zero, the queue begins reading from the
start of the ring, which usually contains older packets. Those packets
almost certainly contain addresses that are no longer in the page tables,
which causes a page fault.

To fix this, we update the MQD with the current read/write pointer values
and then write back the MQD to memory when the queue is unmapped. This
requires adding a pointer to the MQD and the host address of the MQD
where it should be written back to. The interface for registering RLC
queues is also simplified: since we need to pass the MQD anyway, we can
get the other values from it as well.
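The save-on-unmap, restore-on-map idea, along with the two values the new registerRLCQueue() derives from the MQD, can be sketched in a simplified model. All names here are hypothetical and the layout is not gem5's SDMAQueueDesc; the ring-size decode mirrors `4UL << bits(mqd->sdmax_rlcx_rb_cntl, 6, 1)` and assumes bits [6:1] hold log2 of the ring size in dwords.

```cpp
#include <cstdint>

// Simplified stand-in for the queue descriptor (MQD).
struct QueueDesc {
    uint32_t rb_cntl = 0;  // bits [6:1]: assumed log2(ring size in dwords)
    uint32_t rptr_addr_lo = 0;
    uint32_t rptr_addr_hi = 0;
    uint64_t rptr = 0;
    uint64_t wptr = 0;
};

// Ring size in bytes, as in 4UL << bits(rb_cntl, 6, 1).
uint64_t ringSizeBytes(uint32_t rbCntl)
{
    return uint64_t{4} << ((rbCntl >> 1) & 0x3F);
}

// Read-pointer writeback address assembled from its 32-bit halves.
uint64_t rptrWbAddr(const QueueDesc &mqd)
{
    return (uint64_t{mqd.rptr_addr_hi} << 32) | mqd.rptr_addr_lo;
}

struct RingQueue {
    uint64_t rptr = 0;
    uint64_t wptr = 0;

    void map(const QueueDesc &mqd)
    {
        rptr = mqd.rptr;  // restore saved pointers instead of zeroing them
        wptr = mqd.wptr;
    }

    void unmap(QueueDesc &mqd) const
    {
        mqd.rptr = rptr;  // write the live pointers back into the MQD
        mqd.wptr = wptr;
    }
};
```

With bits [6:1] of the control word set to 10, the ring is 4 << 10 = 4096 bytes, and a queue mapped from a written-back descriptor resumes at the saved pointers rather than at the start of the ring.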

Fixes b+tree and streamcluster from rodinia (when using RLC queues).

Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65791
Tested-by: kokoro 
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
---
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_queues.hh
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
4 files changed, 110 insertions(+), 19 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc
index f78f833..152fd4d 100644
--- a/src/dev/amdgpu/pm4_packet_processor.cc
+++ b/src/dev/amdgpu/pm4_packet_processor.cc
@@ -458,9 +458,7 @@
 SDMAEngine *sdma_eng = gpuDevice->getSDMAById(pkt->engineSel - 2);

 // Register RLC queue with SDMA
-sdma_eng->registerRLCQueue(pkt->doorbellOffset << 2,
-   mqd->rb_base << 8, rlc_size,
-   rptr_wb_addr);
+sdma_eng->registerRLCQueue(pkt->doorbellOffset << 2, addr, mqd);

 // Register doorbell with GPU device
 gpuDevice->setSDMAEngine(pkt->doorbellOffset << 2, sdma_eng);
diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh
index 8b6626d..ddadd65 100644
--- a/src/dev/amdgpu/pm4_queues.hh
+++ b/src/dev/amdgpu/pm4_queues.hh
@@ -33,6 +33,8 @@
 #ifndef __DEV_AMDGPU_PM4_QUEUES_HH__
 #define __DEV_AMDGPU_PM4_QUEUES_HH__

+#include "dev/amdgpu/pm4_defines.hh"
+
 namespace gem5
 {

@@ -201,10 +203,24 @@
 };
 uint64_t rb_base;
 };
-uint32_t sdmax_rlcx_rb_rptr;
-uint32_t sdmax_rlcx_rb_rptr_hi;
-uint32_t sdmax_rlcx_rb_wptr;
-uint32_t sdmax_rlcx_rb_wptr_hi;
+union
+{
+struct
+{
+uint32_t sdmax_rlcx_rb_rptr;
+uint32_t sdmax_rlcx_rb_rptr_hi;
+};
+uint64_t rptr;
+};
+union
+{
+struct
+{
+uint32_t sdmax_rlcx_rb_wptr;
+uint32_t sdmax_rlcx_rb_wptr_hi;
+};
+uint64_t wptr;
+};
 uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
 uint32_t sdmax_rlcx_rb_rptr_addr_hi;
 uint32_t sdmax_rlcx_rb_rptr_addr_lo;
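The anonymous unions above let `rptr`/`wptr` alias the two 32-bit register halves. That relies on a little-endian host (low word first) and on union type punning, which GCC documents as supported but ISO C++ leaves undefined. A portable sketch of the same mapping, using only the standard library (the helper names are hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Combine the two 32-bit register halves into the 64-bit value that the
// union exposes as rptr/wptr.
constexpr uint64_t
combineLoHi(uint32_t lo, uint32_t hi)
{
    return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Layout mirroring the anonymous struct inside each union: low word first.
struct LoHiPair
{
    uint32_t lo;
    uint32_t hi;
};

// Reinterpret the pair's bytes as a uint64_t via memcpy, which is
// well-defined C++. The result equals combineLoHi(lo, hi) only on a
// little-endian host, the same assumption the union makes.
inline uint64_t
viewAsU64(const LoHiPair &pair)
{
    static_assert(sizeof(LoHiPair) == sizeof(uint64_t),
                  "no padding expected between the two halves");
    uint64_t value;
    std::memcpy(&value, &pair, sizeof(value));
    return value;
}

// Runtime check for a little-endian host.
inline bool
hostIsLittleEndian()
{
    const uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

The explicit shift form is also how `registerRLCQueue()` assembles `rptr_wb_addr` from `sdmax_rlcx_rb_rptr_addr_hi` and `sdmax_rlcx_rb_rptr_addr_lo`.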
diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 02203c8..4c03bf5 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -165,30 +165,40 @@
 }

 void
-SDMAEngine::registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size,
- Addr rptr_wb_addr)
+SDMAEngine::registerRLCQueue(Addr doorbell, Addr mqdAddr, SDMAQueueDesc *mqd)
 {
+uint32_t rlc_size = 4UL << bits(mqd->sdmax_rlcx_rb_cntl, 6, 1);
+Addr rptr_wb_addr = mqd->sdmax_rlcx_rb_rptr_addr_hi;
+rptr_wb_addr <<= 32;
+rptr_wb_addr |= mqd->sdmax_rlcx_rb_rptr_addr_lo;
+
 // Get first free RLC
 if (!rlc0.valid()) {
 DPRINTF(SDMAEngine, "Doorbell %lx mapped to RLC0\n", doorbell);
 rlcInfo[0] = doorbell;
 rlc0.valid(true);
-rlc0.base(rb_base);
+rlc0.base(mqd->rb_base << 8);
+rlc0.size(rlc_size);
 rlc0.rptr(0);
-rlc0.wptr(0);
+rlc0.incRptr(mqd->rptr);
+rlc0.setWptr(mqd->wptr);
 rlc0.rptrWbAddr(rptr_wb_addr);
 rlc0.processing(false);
-rlc0.size(size);
+rlc0.setMQD(mqd);
+rlc0.setMQDAddr(mqdAddr);
 } else if (!rlc1.valid()) {
 DPRINTF(SDMAEngine, "Doorbell %lx mapped to RLC1\n", doorbell);
 rlcInfo[1] = doorbell;
 rlc1.valid(true);
-rlc1.base(rb_base);
+rlc1.base(mqd->rb_base << 8);
+rlc1.size(rlc_size);
 rlc1.rptr(0);
-

[gem5-dev] [S] Change in gem5/gem5[develop]: configs: Set CPU vendor to M5 Simulator in apu_se.py

2022-11-28 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email )


Change subject: configs: Set CPU vendor to M5 Simulator in apu_se.py
..

configs: Set CPU vendor to M5 Simulator in apu_se.py

Other vendor strings cause, for some reason, bad addresses to be
computed when running the GPU model. This change reverts to the M5
Simulator vendor string for apu_se.py only.

Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65991
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Reviewed-by: Jason Lowe-Power 
---
M configs/example/apu_se.py
1 file changed, 23 insertions(+), 0 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, but someone else must approve; Looks good to me, approved
  Jason Lowe-Power: Looks good to me, approved




diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 39def02..8e8bc60 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -757,6 +757,11 @@
 (cpu_list[i], future_cpu_list[i]) for i in range(args.num_cpus)
 ]

+# Other CPU strings cause bad addresses in ROCm. Revert back to M5 Simulator.
+for (i, cpu) in enumerate(cpu_list):
+    for j in range(len(cpu)):
+        cpu.isa[j].vendor_string = "M5 Simulator"
+
 # Full list of processing cores in the system.
 cpu_list = cpu_list + [shader] + cp_list


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845
Gerrit-Change-Number: 65991
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Bobby Bruce 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [S] Change in gem5/gem5[develop]: configs: Set CPU vendor to M5 Simulator in apu_se.py

2022-11-23 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email )



Change subject: configs: Set CPU vendor to M5 Simulator in apu_se.py
..

configs: Set CPU vendor to M5 Simulator in apu_se.py

Other vendor strings cause, for some reason, bad addresses to be
computed when running the GPU model. This change reverts to the M5
Simulator vendor string for apu_se.py only.

Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845
---
M configs/example/apu_se.py
1 file changed, 18 insertions(+), 0 deletions(-)



diff --git a/configs/example/apu_se.py b/configs/example/apu_se.py
index 39def02..8e8bc60 100644
--- a/configs/example/apu_se.py
+++ b/configs/example/apu_se.py
@@ -757,6 +757,11 @@
 (cpu_list[i], future_cpu_list[i]) for i in range(args.num_cpus)
 ]

+# Other CPU strings cause bad addresses in ROCm. Revert back to M5 Simulator.
+for (i, cpu) in enumerate(cpu_list):
+    for j in range(len(cpu)):
+        cpu.isa[j].vendor_string = "M5 Simulator"
+
 # Full list of processing cores in the system.
 cpu_list = cpu_list + [shader] + cp_list


--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65991?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I5992b4e31569f5c0e5e49e523908c8fa0602f845
Gerrit-Change-Number: 65991
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Writeback RLC queue MQD when unmapped

2022-11-18 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65791?usp=email )



Change subject: dev-amdgpu: Writeback RLC queue MQD when unmapped
..

dev-amdgpu: Writeback RLC queue MQD when unmapped

Currently when RLC queues (user mode queues) are mapped, the read/write
pointers of the ring buffer are set to zero. However, these queues could
be unmapped and then remapped later. In that situation the read/write
pointers should be the previous value before unmapping occurred. Since
the read pointer gets reset to zero, the queue begins reading from the
start of the ring, which usually contains older packets. Those packets
almost certainly contain addresses that are no longer in the page
tables, which causes a page fault.

To fix this, we update the MQD with the current read/write pointer values
and then write back the MQD to memory when the queue is unmapped. This
requires storing a pointer to the MQD and the host address the MQD
should be written back to. The interface for registering RLC queues is
also simplified: since we need to pass the MQD anyway, we can read the
other values from it as well.

Fixes b+tree and streamcluster from rodinia (when using RLC queues).

Change-Id: Ie5dad4d7d90ea240c3e9f0cddf3e844a3cd34c4f
---
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_queues.hh
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
4 files changed, 106 insertions(+), 19 deletions(-)



diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc
index f78f833..152fd4d 100644
--- a/src/dev/amdgpu/pm4_packet_processor.cc
+++ b/src/dev/amdgpu/pm4_packet_processor.cc
@@ -458,9 +458,7 @@
 SDMAEngine *sdma_eng = gpuDevice->getSDMAById(pkt->engineSel - 2);

 // Register RLC queue with SDMA
-sdma_eng->registerRLCQueue(pkt->doorbellOffset << 2,
-   mqd->rb_base << 8, rlc_size,
-   rptr_wb_addr);
+sdma_eng->registerRLCQueue(pkt->doorbellOffset << 2, addr, mqd);

 // Register doorbell with GPU device
 gpuDevice->setSDMAEngine(pkt->doorbellOffset << 2, sdma_eng);
diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh
index 8b6626d..ddadd65 100644
--- a/src/dev/amdgpu/pm4_queues.hh
+++ b/src/dev/amdgpu/pm4_queues.hh
@@ -33,6 +33,8 @@
 #ifndef __DEV_AMDGPU_PM4_QUEUES_HH__
 #define __DEV_AMDGPU_PM4_QUEUES_HH__

+#include "dev/amdgpu/pm4_defines.hh"
+
 namespace gem5
 {

@@ -201,10 +203,24 @@
 };
 uint64_t rb_base;
 };
-uint32_t sdmax_rlcx_rb_rptr;
-uint32_t sdmax_rlcx_rb_rptr_hi;
-uint32_t sdmax_rlcx_rb_wptr;
-uint32_t sdmax_rlcx_rb_wptr_hi;
+union
+{
+struct
+{
+uint32_t sdmax_rlcx_rb_rptr;
+uint32_t sdmax_rlcx_rb_rptr_hi;
+};
+uint64_t rptr;
+};
+union
+{
+struct
+{
+uint32_t sdmax_rlcx_rb_wptr;
+uint32_t sdmax_rlcx_rb_wptr_hi;
+};
+uint64_t wptr;
+};
 uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
 uint32_t sdmax_rlcx_rb_rptr_addr_hi;
 uint32_t sdmax_rlcx_rb_rptr_addr_lo;
diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 02203c8..4c03bf5 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -165,30 +165,40 @@
 }

 void
-SDMAEngine::registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size,
- Addr rptr_wb_addr)
+SDMAEngine::registerRLCQueue(Addr doorbell, Addr mqdAddr, SDMAQueueDesc *mqd)
 {
+uint32_t rlc_size = 4UL << bits(mqd->sdmax_rlcx_rb_cntl, 6, 1);
+Addr rptr_wb_addr = mqd->sdmax_rlcx_rb_rptr_addr_hi;
+rptr_wb_addr <<= 32;
+rptr_wb_addr |= mqd->sdmax_rlcx_rb_rptr_addr_lo;
+
 // Get first free RLC
 if (!rlc0.valid()) {
 DPRINTF(SDMAEngine, "Doorbell %lx mapped to RLC0\n", doorbell);
 rlcInfo[0] = doorbell;
 rlc0.valid(true);
-rlc0.base(rb_base);
+rlc0.base(mqd->rb_base << 8);
+rlc0.size(rlc_size);
 rlc0.rptr(0);
-rlc0.wptr(0);
+rlc0.incRptr(mqd->rptr);
+rlc0.setWptr(mqd->wptr);
 rlc0.rptrWbAddr(rptr_wb_addr);
 rlc0.processing(false);
-rlc0.size(size);
+rlc0.setMQD(mqd);
+rlc0.setMQDAddr(mqdAddr);
 } else if (!rlc1.valid()) {
 DPRINTF(SDMAEngine, "Doorbell %lx mapped to RLC1\n", doorbell);
 rlcInfo[1] = doorbell;
 rlc1.valid(true);
-rlc1.base(rb_base);
+rlc1.base(mqd->rb_base << 8);
+rlc1.size(rlc_size);
 rlc1.rptr(0);
-rlc1.wptr(0);
+rlc1.incRptr(mqd->rptr);
+rlc1.setWptr(mqd->wptr);
 rlc1.rptrWbAddr(rptr_wb_addr);
 rlc1.processing(false);
-rlc1.size(size);
+rlc1.setMQD(mqd);
+rlc1.setMQDAddr(mqdAddr);
 } 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Store SDMA queue type, use for ring ID

2022-11-18 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email )


Change subject: dev-amdgpu: Store SDMA queue type, use for ring ID
..

dev-amdgpu: Store SDMA queue type, use for ring ID

Currently the SDMA queue type is guessed in the trap method by looking
at which queue in the engine is processing packets. It is possible for
both queues to be processing (e.g., one queue has sent a DMA and is
waiting when the engine switches to another queue), triggering an assert.

Instead, store the queue type in the queue itself and use that type in
trap to determine which ring ID to use for the interrupt packet.
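A small sketch of the new rule, assuming a local stand-in enum whose values mirror the `SDMAGfx`/`SDMAPage` names used in the diff (`SketchQueue` and `trapRingId` are hypothetical). The ring ID depends only on the type stored in the queue that executed the trap, so it no longer matters how many queues are processing at once. Per the diff, IB queues carry their parent's type (`gfxIb` is `SDMAGfx`, `pageIb` is `SDMAPage`).

```cpp
#include <cstdint>

// Local stand-in mirroring the SDMAType names used in the diff.
enum SDMAType { SDMAGfx, SDMAPage };

// Hypothetical queue model; the real SDMAQueue defaults _type to SDMAGfx.
struct SketchQueue
{
    SDMAType type;
    bool processing;
};

// Ring ID for the trap interrupt cookie, taken from the queue that ran
// the trap packet rather than inferred from engine-wide state.
inline uint32_t
trapRingId(const SketchQueue &q)
{
    return (q.type == SDMAPage) ? 3 : 0;
}
```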

Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65691
Maintainer: Jason Lowe-Power 
Reviewed-by: Jason Lowe-Power 
Tested-by: kokoro 
---
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
2 files changed, 30 insertions(+), 6 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 59c5027..02203c8 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -55,11 +55,15 @@
 gfxIb.parent();
 gfx.valid(true);
 gfxIb.valid(true);
+gfx.queueType(SDMAGfx);
+gfxIb.queueType(SDMAGfx);

 page.ib();
 pageIb.parent();
 page.valid(true);
 pageIb.valid(true);
+page.queueType(SDMAPage);
+pageIb.queueType(SDMAPage);

 rlc0.ib();
 rlc0Ib.parent();
@@ -727,11 +731,7 @@

 DPRINTF(SDMAEngine, "Trap contextId: %p\n", pkt->intrContext);

-uint32_t ring_id = 0;
-assert(page.processing() ^ gfx.processing());
-if (page.processing()) {
-ring_id = 3;
-}
+uint32_t ring_id = (q->queueType() == SDMAPage) ? 3 : 0;

 gpuDevice->getIH()->prepareInterruptCookie(pkt->intrContext, ring_id,
getIHClientId(), TRAP_ID);
diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh
index d0afaf7..0bfee12 100644
--- a/src/dev/amdgpu/sdma_engine.hh
+++ b/src/dev/amdgpu/sdma_engine.hh
@@ -64,9 +64,10 @@
 bool _processing;
 SDMAQueue *_parent;
 SDMAQueue *_ib;
+SDMAType _type;
   public:
 SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false),
-_parent(nullptr), _ib(nullptr) {}
+_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {}

 Addr base() { return _base; }
 Addr rptr() { return _base + _rptr; }
@@ -80,6 +81,7 @@
 bool processing() { return _processing; }
 SDMAQueue* parent() { return _parent; }
 SDMAQueue* ib() { return _ib; }
+SDMAType queueType() { return _type; }

 void base(Addr value) { _base = value; }

@@ -111,6 +113,7 @@
 void processing(bool value) { _processing = value; }
 void parent(SDMAQueue* q) { _parent = q; }
 void ib(SDMAQueue* ib) { _ib = ib; }
+void queueType(SDMAType type) { _type = type; }
 };

 /* SDMA Engine ID */

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Gerrit-Change-Number: 65691
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Store SDMA queue type, use for ring ID

2022-11-17 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email )



Change subject: dev-amdgpu: Store SDMA queue type, use for ring ID
..

dev-amdgpu: Store SDMA queue type, use for ring ID

Currently the SDMA queue type is guessed in the trap method by looking
at which queue in the engine is processing packets. It is possible for
both queues to be processing (e.g., one queue has sent a DMA and is
waiting when the engine switches to another queue), triggering an assert.

Instead, store the queue type in the queue itself and use that type in
trap to determine which ring ID to use for the interrupt packet.

Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
---
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
2 files changed, 26 insertions(+), 6 deletions(-)



diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index 59c5027..02203c8 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -55,11 +55,15 @@
 gfxIb.parent();
 gfx.valid(true);
 gfxIb.valid(true);
+gfx.queueType(SDMAGfx);
+gfxIb.queueType(SDMAGfx);

 page.ib();
 pageIb.parent();
 page.valid(true);
 pageIb.valid(true);
+page.queueType(SDMAPage);
+pageIb.queueType(SDMAPage);

 rlc0.ib();
 rlc0Ib.parent();
@@ -727,11 +731,7 @@

 DPRINTF(SDMAEngine, "Trap contextId: %p\n", pkt->intrContext);

-uint32_t ring_id = 0;
-assert(page.processing() ^ gfx.processing());
-if (page.processing()) {
-ring_id = 3;
-}
+uint32_t ring_id = (q->queueType() == SDMAPage) ? 3 : 0;

 gpuDevice->getIH()->prepareInterruptCookie(pkt->intrContext, ring_id,
getIHClientId(), TRAP_ID);
diff --git a/src/dev/amdgpu/sdma_engine.hh b/src/dev/amdgpu/sdma_engine.hh
index d0afaf7..0bfee12 100644
--- a/src/dev/amdgpu/sdma_engine.hh
+++ b/src/dev/amdgpu/sdma_engine.hh
@@ -64,9 +64,10 @@
 bool _processing;
 SDMAQueue *_parent;
 SDMAQueue *_ib;
+SDMAType _type;
   public:
 SDMAQueue() : _rptr(0), _wptr(0), _valid(false), _processing(false),
-_parent(nullptr), _ib(nullptr) {}
+_parent(nullptr), _ib(nullptr), _type(SDMAGfx) {}

 Addr base() { return _base; }
 Addr rptr() { return _base + _rptr; }
@@ -80,6 +81,7 @@
 bool processing() { return _processing; }
 SDMAQueue* parent() { return _parent; }
 SDMAQueue* ib() { return _ib; }
+SDMAType queueType() { return _type; }

 void base(Addr value) { _base = value; }

@@ -111,6 +113,7 @@
 void processing(bool value) { _processing = value; }
 void parent(SDMAQueue* q) { _parent = q; }
 void ib(SDMAQueue* ib) { _ib = ib; }
+void queueType(SDMAType type) { _type = type; }
 };

 /* SDMA Engine ID */

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65691?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: If91c458e60a03f2013c0dc42bab0b1673e3dbd84
Gerrit-Change-Number: 65691
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix SOPK instruction sign extends

2022-11-09 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email )


Change subject: arch-vega: Fix SOPK instruction sign extends
..

arch-vega: Fix SOPK instruction sign extends

See: https://gem5-review.googlesource.com/c/public/gem5/+/37495

This is the same patch, but for Vega. It fixes issues with lulesh and
probably with rodinia heartwall as well in full-system mode.
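To illustrate the bug being fixed: `SIMM16` is a 16-bit field, so casting it directly to a 32-bit signed type zero-extends it, turning `0xffff` into 65535 instead of -1. The sketch below uses a hypothetical `signExtend<N>()` written in the spirit of gem5's `sext<N>()` (not a copy of it). Converting the out-of-range unsigned value to `int32_t` wraps modulo 2^32 on mainstream compilers and is guaranteed to since C++20.

```cpp
#include <cstdint>

// Sign-extend the low N bits of val: if bit N-1 is set, fill every
// higher bit with ones.
template <int N>
constexpr uint64_t
signExtend(uint64_t val)
{
    static_assert(N > 0 && N < 64, "N must be in [1, 63]");
    const uint64_t low_mask = (uint64_t(1) << N) - 1;
    val &= low_mask;
    return ((val >> (N - 1)) & 1) ? (val | ~low_mask) : val;
}

// Old behavior: a plain cast zero-extends the 16-bit immediate.
inline int32_t
simm16ZeroExtended(uint16_t simm16)
{
    return static_cast<int32_t>(simm16);
}

// Fixed behavior: sign-extend first, then narrow to 32 bits.
inline int32_t
simm16SignExtended(uint16_t simm16)
{
    return static_cast<int32_t>(signExtend<16>(simm16));
}
```

With the zero-extended value, an instruction such as `S_CMPK_LT_I32` compares against 65535 when the program meant -1, which is the class of error the patch removes.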

Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65432
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
Tested-by: kokoro 
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 28 insertions(+), 10 deletions(-)

Approvals:
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 76bb8aa..f5b08b7 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -1553,7 +1553,7 @@
 void
 Inst_SOPK__S_MOVK_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ScalarOperandI32 sdst(gpuDynInst, instData.SDST);

 sdst = simm16;
@@ -1579,7 +1579,7 @@
 void
 Inst_SOPK__S_CMOVK_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ScalarOperandI32 sdst(gpuDynInst, instData.SDST);
 ConstScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1607,7 +1607,7 @@
 void
 Inst_SOPK__S_CMPK_EQ_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1634,7 +1634,7 @@
 void
 Inst_SOPK__S_CMPK_LG_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1661,7 +1661,7 @@
 void
 Inst_SOPK__S_CMPK_GT_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1688,7 +1688,7 @@
 void
 Inst_SOPK__S_CMPK_GE_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1715,7 +1715,7 @@
 void
 Inst_SOPK__S_CMPK_LT_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1742,7 +1742,7 @@
 void
 Inst_SOPK__S_CMPK_LE_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1939,7 +1939,7 @@

 src.read();

-sdst = src.rawData() + (ScalarRegI32)simm16;
+sdst = src.rawData() + (ScalarRegI32)sext<16>(simm16);
 scc = (bits(src.rawData(), 31) == bits(simm16, 15)
 && bits(src.rawData(), 31) != bits(sdst.rawData(), 31)) ? 1 : 0;

@@ -1969,7 +1969,7 @@

 src.read();

-sdst = src.rawData() * (ScalarRegI32)simm16;
+sdst = src.rawData() * (ScalarRegI32)sext<16>(simm16);

 sdst.write();
 } // execute
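The `S_ADDK_I32` hunk computes SCC as signed overflow: the two operands have the same sign bit and the result's sign bit differs from it. A standalone sketch of that check, assuming unsigned 32-bit arithmetic so wrap-around is well-defined (`addkI32` and `AddkResult` are hypothetical names):

```cpp
#include <cstdint>

struct AddkResult
{
    uint32_t value; // wrapped 32-bit sum written to SDST
    bool scc;       // set on signed overflow
};

inline AddkResult
addkI32(uint32_t src, uint16_t simm16)
{
    // Sign-extend the 16-bit immediate to 32 bits.
    const uint32_t imm =
        (simm16 & 0x8000u) ? (0xffff0000u | simm16) : simm16;
    const uint32_t sum = src + imm;
    const bool src_sign = (src >> 31) & 1;
    const bool imm_sign = (simm16 >> 15) & 1; // same as bits(simm16, 15)
    const bool sum_sign = (sum >> 31) & 1;
    return {sum, src_sign == imm_sign && src_sign != sum_sign};
}
```

For example, `addkI32(5, 0xffff)` yields 4 with SCC clear, whereas the pre-patch zero extension would have added 65535.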

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d
Gerrit-Change-Number: 65432
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-CC: Jason Lowe-Power 
Gerrit-MessageType: merged

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Handle ring buffer wrap for PM4 queue

2022-11-09 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email )


Change subject: dev-amdgpu: Handle ring buffer wrap for PM4 queue
..

dev-amdgpu: Handle ring buffer wrap for PM4 queue

Change-Id: I27bc274327838add709423b072d437c4e727a714
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65431
Maintainer: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Matt Sinclair 
---
M src/dev/amdgpu/pm4_mmio.hh
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_packet_processor.hh
M src/dev/amdgpu/pm4_queues.hh
4 files changed, 31 insertions(+), 4 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/dev/amdgpu/pm4_mmio.hh b/src/dev/amdgpu/pm4_mmio.hh
index a3ce5f1..3801223 100644
--- a/src/dev/amdgpu/pm4_mmio.hh
+++ b/src/dev/amdgpu/pm4_mmio.hh
@@ -60,6 +60,7 @@
 #define mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI   0x1251
 #define mmCP_HQD_PQ_WPTR_POLL_ADDR        0x1252
 #define mmCP_HQD_PQ_WPTR_POLL_ADDR_HI     0x1253
+#define mmCP_HQD_PQ_CONTROL               0x1256
 #define mmCP_HQD_IB_CONTROL               0x125a
 #define mmCP_HQD_PQ_WPTR_LO               0x127b
 #define mmCP_HQD_PQ_WPTR_HI               0x127c
diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc
index 4f98f18..f78f833 100644
--- a/src/dev/amdgpu/pm4_packet_processor.cc
+++ b/src/dev/amdgpu/pm4_packet_processor.cc
@@ -147,8 +147,8 @@
 gpuDevice->setDoorbellType(offset, qt);

 DPRINTF(PM4PacketProcessor, "New PM4 queue %d, base: %p offset: %p, me: "
-"%d, pipe %d queue: %d\n", id, q->base(), q->offset(), q->me(),
-q->pipe(), q->queue());
+"%d, pipe %d queue: %d size: %d\n", id, q->base(), q->offset(),
+q->me(), q->pipe(), q->queue(), q->size());
 }

 void
@@ -790,6 +790,9 @@
   case mmCP_HQD_PQ_WPTR_POLL_ADDR_HI:
 setHqdPqWptrPollAddrHi(pkt->getLE());
 break;
+  case mmCP_HQD_PQ_CONTROL:
+setHqdPqControl(pkt->getLE());
+break;
   case mmCP_HQD_IB_CONTROL:
 setHqdIbCtrl(pkt->getLE());
 break;
@@ -912,6 +915,12 @@
 }

 void
+PM4PacketProcessor::setHqdPqControl(uint32_t data)
+{
+kiq.hqd_pq_control = data;
+}
+
+void
 PM4PacketProcessor::setHqdIbCtrl(uint32_t data)
 {
 kiq.hqd_ib_control = data;
diff --git a/src/dev/amdgpu/pm4_packet_processor.hh b/src/dev/amdgpu/pm4_packet_processor.hh
index 4806671..4617a21 100644
--- a/src/dev/amdgpu/pm4_packet_processor.hh
+++ b/src/dev/amdgpu/pm4_packet_processor.hh
@@ -171,6 +171,7 @@
 void setHqdPqRptrReportAddrHi(uint32_t data);
 void setHqdPqWptrPollAddr(uint32_t data);
 void setHqdPqWptrPollAddrHi(uint32_t data);
+void setHqdPqControl(uint32_t data);
 void setHqdIbCtrl(uint32_t data);
 void setRbVmid(uint32_t data);
 void setRbCntl(uint32_t data);
diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh
index 19973b1..8b6626d 100644
--- a/src/dev/amdgpu/pm4_queues.hh
+++ b/src/dev/amdgpu/pm4_queues.hh
@@ -396,14 +396,14 @@
 rptr()
 {
 if (ib()) return q->ibBase + q->ibRptr;
-else return q->base + q->rptr;
+else return q->base + (q->rptr % size());
 }

 Addr
 wptr()
 {
 if (ib()) return q->ibBase + _ibWptr;
-else return q->base + _wptr;
+else return q->base + (_wptr % size());
 }

 Addr
@@ -470,6 +470,9 @@
 uint32_t pipe() { return _pkt.pipe; }
 uint32_t queue() { return _pkt.queueSlot; }
 bool privileged() { return _pkt.queueSel == 0 ? 1 : 0; }
+
+// Same computation as processMQD. See comment there for details.
+uint64_t size() { return 4UL << ((q->hqd_pq_control & 0x3f) + 1); }
 };

 } // namespace gem5
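A sketch of the wrap computation added to `rptr()`/`wptr()`, using the same size expression as the new `size()` accessor: 4 bytes shifted left by the low 6 bits of `CP_HQD_PQ_CONTROL` plus one. The function names are hypothetical, and the sketch assumes a realistic field value (a value of 62 or more breaks the shift).

```cpp
#include <cstdint>

// Ring size in bytes from CP_HQD_PQ_CONTROL, same expression as size().
constexpr uint64_t
pm4RingSizeBytes(uint32_t hqd_pq_control)
{
    return 4ULL << ((hqd_pq_control & 0x3f) + 1);
}

// The pointer keeps growing past the end of the ring; the address that
// is actually fetched wraps back to the ring base.
constexpr uint64_t
pm4RingAddr(uint64_t base, uint64_t ptr, uint32_t hqd_pq_control)
{
    return base + (ptr % pm4RingSizeBytes(hqd_pq_control));
}
```

With a field value of 9 the ring is 4096 bytes, so a pointer of 4112 maps back to `base + 16`; bits above the low six in the register are ignored.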

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I27bc274327838add709423b072d437c4e727a714
Gerrit-Change-Number: 65431
Gerrit-PatchSet: 2
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [M] Change in gem5/gem5[develop]: dev-amdgpu: Fix SDMA ring buffer wrap around

2022-11-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65351?usp=email )


Change subject: dev-amdgpu: Fix SDMA ring buffer wrap around
..

dev-amdgpu: Fix SDMA ring buffer wrap around

The current SDMA wrap around handling only considers the ring buffer
location as seen by the GPU. Eventually when the end of the SDMA ring
buffer is reached, the driver waits until the rptr written back to the
host catches up to what the driver sees before wrapping around back to
the beginning of the buffer. This writeback currently does not happen at
all, causing hangs for applications with a lot of SDMA commands.

This changeset first fixes the sizes of the queues, especially RLC
queues, so that the wrap around occurs in the correct place. Second, we
now store the rptr writeback address and the absolute (unwrapped) rptr
value in each SDMA queue. The absolute rptr is what the driver sends to
the device and what it expects to be written back.
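A sketch of the absolute-rptr scheme under stated assumptions: the ring size uses the same expression as the diff, `4 << rb_cntl[6:1]`; `extractBits` is a local stand-in for gem5's `bits()`; and `SketchSdmaRing` is a hypothetical model in which the unwrapped rptr keeps growing, fetches wrap modulo the ring size, and the unwrapped value is what gets written back for the driver to compare against.

```cpp
#include <cstdint>

// Extract bits [hi:lo] inclusive (stand-in for gem5's bits()).
constexpr uint32_t
extractBits(uint32_t val, int hi, int lo)
{
    return (val >> lo) & ((1u << (hi - lo + 1)) - 1);
}

// RLC ring size in bytes, same expression as the diff: 4 << rb_cntl[6:1].
constexpr uint64_t
sdmaRingSizeBytes(uint32_t rb_cntl)
{
    return 4ULL << extractBits(rb_cntl, 6, 1);
}

class SketchSdmaRing
{
  public:
    SketchSdmaRing(uint64_t base, uint64_t size) : _base(base), _size(size) {}

    // Address the engine reads from: wrapped into the ring.
    uint64_t fetchAddr() const { return _base + (_absRptr % _size); }

    // Consume packet bytes; the unwrapped rptr is then "written back"
    // (a stand-in for writing it to the rptr writeback address).
    void consume(uint64_t bytes)
    {
        _absRptr += bytes;
        _wbValue = _absRptr;
    }

    uint64_t absRptr() const { return _absRptr; }
    uint64_t writtenBack() const { return _wbValue; }

  private:
    uint64_t _base;
    uint64_t _size;
    uint64_t _absRptr = 0;
    uint64_t _wbValue = 0;
};
```

Because the written-back value is never wrapped, the driver can see that the engine has caught up with its own unwrapped wptr and safely wrap to the start of the buffer.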

This was tested with an application that performs a few hundred
thousand hipMemcpy() calls in a loop. It should also fix the issue with
pannotia BC in full-system mode.

Change-Id: I53ebdcc6b02fb4eb4da435c9a509544066a97069
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65351
Maintainer: Jason Lowe-Power 
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Reviewed-by: Matt Sinclair 
Maintainer: Matt Sinclair 
---
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/sdma_engine.cc
M src/dev/amdgpu/sdma_engine.hh
3 files changed, 79 insertions(+), 15 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved




diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc
index 404beab..4f98f18 100644
--- a/src/dev/amdgpu/pm4_packet_processor.cc
+++ b/src/dev/amdgpu/pm4_packet_processor.cc
@@ -441,12 +441,17 @@
 PM4PacketProcessor::processSDMAMQD(PM4MapQueues *pkt, PM4Queue *q, Addr addr,
 SDMAQueueDesc *mqd, uint16_t vmid)
 {
+uint32_t rlc_size = 4UL << bits(mqd->sdmax_rlcx_rb_cntl, 6, 1);
+Addr rptr_wb_addr = mqd->sdmax_rlcx_rb_rptr_addr_hi;
+rptr_wb_addr <<= 32;
+rptr_wb_addr |= mqd->sdmax_rlcx_rb_rptr_addr_lo;
+
 DPRINTF(PM4PacketProcessor, "SDMAMQD: rb base: %#lx rptr: %#x/%#x wptr: "
-"%#x/%#x ib: %#x/%#x size: %d ctrl: %#x\n", mqd->rb_base,
-mqd->sdmax_rlcx_rb_rptr, mqd->sdmax_rlcx_rb_rptr_hi,
+"%#x/%#x ib: %#x/%#x size: %d ctrl: %#x rptr wb addr: %#lx\n",
+mqd->rb_base, mqd->sdmax_rlcx_rb_rptr, mqd->sdmax_rlcx_rb_rptr_hi,
 mqd->sdmax_rlcx_rb_wptr, mqd->sdmax_rlcx_rb_wptr_hi,
 mqd->sdmax_rlcx_ib_base_lo, mqd->sdmax_rlcx_ib_base_hi,
-mqd->sdmax_rlcx_ib_size, mqd->sdmax_rlcx_rb_cntl);
+rlc_size, mqd->sdmax_rlcx_rb_cntl, rptr_wb_addr);

 // Engine 2 points to SDMA0 while engine 3 points to SDMA1
 assert(pkt->engineSel == 2 || pkt->engineSel == 3);
@@ -454,7 +459,8 @@

 // Register RLC queue with SDMA
 sdma_eng->registerRLCQueue(pkt->doorbellOffset << 2,
-   mqd->rb_base << 8);
+   mqd->rb_base << 8, rlc_size,
+   rptr_wb_addr);

 // Register doorbell with GPU device
 gpuDevice->setSDMAEngine(pkt->doorbellOffset << 2, sdma_eng);
diff --git a/src/dev/amdgpu/sdma_engine.cc b/src/dev/amdgpu/sdma_engine.cc
index e9a4c17..59c5027 100644
--- a/src/dev/amdgpu/sdma_engine.cc
+++ b/src/dev/amdgpu/sdma_engine.cc
@@ -161,7 +161,8 @@
 }

 void
-SDMAEngine::registerRLCQueue(Addr doorbell, Addr rb_base)
+SDMAEngine::registerRLCQueue(Addr doorbell, Addr rb_base, uint32_t size,
+ Addr rptr_wb_addr)
 {
 // Get first free RLC
 if (!rlc0.valid()) {
@@ -171,19 +172,19 @@
 rlc0.base(rb_base);
 rlc0.rptr(0);
 rlc0.wptr(0);
+rlc0.rptrWbAddr(rptr_wb_addr);
 rlc0.processing(false);
-// TODO: size - I think pull from MQD 2^rb_cntrl[6:1]-1
-rlc0.size(1024*1024);
+rlc0.size(size);
 } else if (!rlc1.valid()) {
 DPRINTF(SDMAEngine, "Doorbell %lx mapped to RLC1\n", doorbell);
 rlcInfo[1] = doorbell;
 rlc1.valid(true);
 rlc1.base(rb_base);
-rlc1.rptr(1);
-rlc1.wptr(1);
+rlc1.rptr(0);
+rlc1.wptr(0);
+rlc1.rptrWbAddr(rptr_wb_addr);
 rlc1.processing(false);
-// TODO: size - I think pull from MQD 2^rb_cntrl[6:1]-1
-rlc1.size(1024*1024);
+rlc1.size(size);
 } else {
 panic("No free RLCs. Check they are properly unmapped.");
 }
@@ -291,6 +292,17 @@
 { decodeHeader(q, header); });
 dmaReadVirt(q->rptr(), sizeof(uint32_t), cb, 

[gem5-dev] [S] Change in gem5/gem5[develop]: dev-amdgpu: Handle ring buffer wrap for PM4 queue

2022-11-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email )



Change subject: dev-amdgpu: Handle ring buffer wrap for PM4 queue
..

dev-amdgpu: Handle ring buffer wrap for PM4 queue

Change-Id: I27bc274327838add709423b072d437c4e727a714
---
M src/dev/amdgpu/pm4_mmio.hh
M src/dev/amdgpu/pm4_packet_processor.cc
M src/dev/amdgpu/pm4_packet_processor.hh
M src/dev/amdgpu/pm4_queues.hh
4 files changed, 27 insertions(+), 4 deletions(-)



diff --git a/src/dev/amdgpu/pm4_mmio.hh b/src/dev/amdgpu/pm4_mmio.hh
index a3ce5f1..3801223 100644
--- a/src/dev/amdgpu/pm4_mmio.hh
+++ b/src/dev/amdgpu/pm4_mmio.hh
@@ -60,6 +60,7 @@
 #define mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI   0x1251
 #define mmCP_HQD_PQ_WPTR_POLL_ADDR        0x1252
 #define mmCP_HQD_PQ_WPTR_POLL_ADDR_HI     0x1253
+#define mmCP_HQD_PQ_CONTROL               0x1256
 #define mmCP_HQD_IB_CONTROL               0x125a
 #define mmCP_HQD_PQ_WPTR_LO               0x127b
 #define mmCP_HQD_PQ_WPTR_HI               0x127c
diff --git a/src/dev/amdgpu/pm4_packet_processor.cc b/src/dev/amdgpu/pm4_packet_processor.cc
index 4f98f18..f78f833 100644
--- a/src/dev/amdgpu/pm4_packet_processor.cc
+++ b/src/dev/amdgpu/pm4_packet_processor.cc
@@ -147,8 +147,8 @@
 gpuDevice->setDoorbellType(offset, qt);

 DPRINTF(PM4PacketProcessor, "New PM4 queue %d, base: %p offset: %p, me: "
-"%d, pipe %d queue: %d\n", id, q->base(), q->offset(), q->me(),
-q->pipe(), q->queue());
+"%d, pipe %d queue: %d size: %d\n", id, q->base(), q->offset(),
+q->me(), q->pipe(), q->queue(), q->size());
 }

 void
@@ -790,6 +790,9 @@
   case mmCP_HQD_PQ_WPTR_POLL_ADDR_HI:
 setHqdPqWptrPollAddrHi(pkt->getLE<uint32_t>());
 break;
+  case mmCP_HQD_PQ_CONTROL:
+setHqdPqControl(pkt->getLE<uint32_t>());
+break;
   case mmCP_HQD_IB_CONTROL:
 setHqdIbCtrl(pkt->getLE<uint32_t>());
 break;
@@ -912,6 +915,12 @@
 }

 void
+PM4PacketProcessor::setHqdPqControl(uint32_t data)
+{
+kiq.hqd_pq_control = data;
+}
+
+void
 PM4PacketProcessor::setHqdIbCtrl(uint32_t data)
 {
 kiq.hqd_ib_control = data;
diff --git a/src/dev/amdgpu/pm4_packet_processor.hh b/src/dev/amdgpu/pm4_packet_processor.hh
index 4806671..4617a21 100644
--- a/src/dev/amdgpu/pm4_packet_processor.hh
+++ b/src/dev/amdgpu/pm4_packet_processor.hh
@@ -171,6 +171,7 @@
 void setHqdPqRptrReportAddrHi(uint32_t data);
 void setHqdPqWptrPollAddr(uint32_t data);
 void setHqdPqWptrPollAddrHi(uint32_t data);
+void setHqdPqControl(uint32_t data);
 void setHqdIbCtrl(uint32_t data);
 void setRbVmid(uint32_t data);
 void setRbCntl(uint32_t data);
diff --git a/src/dev/amdgpu/pm4_queues.hh b/src/dev/amdgpu/pm4_queues.hh
index 19973b1..8b6626d 100644
--- a/src/dev/amdgpu/pm4_queues.hh
+++ b/src/dev/amdgpu/pm4_queues.hh
@@ -396,14 +396,14 @@
 rptr()
 {
 if (ib()) return q->ibBase + q->ibRptr;
-else return q->base + q->rptr;
+else return q->base + (q->rptr % size());
 }

 Addr
 wptr()
 {
 if (ib()) return q->ibBase + _ibWptr;
-else return q->base + _wptr;
+else return q->base + (_wptr % size());
 }

 Addr
@@ -470,6 +470,9 @@
 uint32_t pipe() { return _pkt.pipe; }
 uint32_t queue() { return _pkt.queueSlot; }
 bool privileged() { return _pkt.queueSel == 0 ? 1 : 0; }
+
+// Same computation as processMQD. See comment there for details.
+uint64_t size() { return 4UL << ((q->hqd_pq_control & 0x3f) + 1); }
 };

 } // namespace gem5
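
The new size() helper decodes the ring size in bytes from the low six bits of CP_HQD_PQ_CONTROL as `4 << (field + 1)`, and rptr()/wptr() now reduce the raw pointers modulo that size so that accesses wrap back to the start of the ring. A minimal standalone sketch of that arithmetic follows. `pqSizeBytes` and `ringAddr` are hypothetical helper names, not gem5 functions, and the shift guard is an addition of this sketch.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Ring size in bytes from the low 6 bits of a PQ_CONTROL value,
// using the same expression as the size() helper in the diff.
std::uint64_t
pqSizeBytes(std::uint32_t hqd_pq_control)
{
    const unsigned field = hqd_pq_control & 0x3f;
    // Guard against shift counts that would overflow a 64-bit value.
    assert(field + 1 < 62);
    return std::uint64_t{4} << (field + 1);
}

// Address of a ring pointer after wrapping it into [base, base + size).
Addr
ringAddr(Addr base, std::uint64_t ptr, std::uint64_t size)
{
    return base + (ptr % size);
}
```

For example, a field value of 15 yields a 256 KiB ring, so a raw write pointer of 256 KiB + 0x10 maps back to base + 0x10.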

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65431?usp=email
To unsubscribe, or for help writing mail filters, visit  
https://gem5-review.googlesource.com/settings


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I27bc274327838add709423b072d437c4e727a714
Gerrit-Change-Number: 65431
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange
___
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org


[gem5-dev] [S] Change in gem5/gem5[develop]: arch-vega: Fix SOPK instruction sign extends

2022-11-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has uploaded this change for review. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email )



Change subject: arch-vega: Fix SOPK instruction sign extends
..

arch-vega: Fix SOPK instruction sign extends

See: https://gem5-review.googlesource.com/c/public/gem5/+/37495

Same patch, but for Vega. This fixes issues with lulesh, and probably
with rodinia heartwall as well, in full-system mode.

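SIMM16 is a 16-bit immediate. Casting it directly to ScalarRegI32 zero-extends it, so an encoded -1 (0xFFFF) is read as 65535. Wrapping it in sext<16>() first restores the sign. The sketch below shows the difference in isolation. `signExtend`, `oldSimm16`, and `newSimm16` are hypothetical helpers written for illustration, not gem5's own sext<> implementation.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low N bits of val to a 64-bit signed value.
template <int N>
std::int64_t
signExtend(std::uint64_t val)
{
    static_assert(N > 0 && N < 64, "N must be in (0, 64)");
    const std::uint64_t mask = (std::uint64_t{1} << N) - 1;
    const std::uint64_t sign = std::uint64_t{1} << (N - 1);
    val &= mask;
    // Flip the sign bit, then subtract it back: negative values borrow
    // through the upper bits, non-negative values are unchanged.
    return static_cast<std::int64_t>((val ^ sign) - sign);
}

// Zero-extending cast, as in the old code: 0xFFFF -> 65535.
std::int32_t
oldSimm16(std::uint16_t simm16)
{
    return static_cast<std::int32_t>(simm16);
}

// Sign-extending cast, as in the fixed code: 0xFFFF -> -1.
std::int32_t
newSimm16(std::uint16_t simm16)
{
    return static_cast<std::int32_t>(signExtend<16>(simm16));
}
```
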
Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d
---
M src/arch/amdgpu/vega/insts/instructions.cc
1 file changed, 24 insertions(+), 10 deletions(-)



diff --git a/src/arch/amdgpu/vega/insts/instructions.cc b/src/arch/amdgpu/vega/insts/instructions.cc
index 76bb8aa..f5b08b7 100644
--- a/src/arch/amdgpu/vega/insts/instructions.cc
+++ b/src/arch/amdgpu/vega/insts/instructions.cc
@@ -1553,7 +1553,7 @@
 void
 Inst_SOPK__S_MOVK_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ScalarOperandI32 sdst(gpuDynInst, instData.SDST);

 sdst = simm16;
@@ -1579,7 +1579,7 @@
 void
 Inst_SOPK__S_CMOVK_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ScalarOperandI32 sdst(gpuDynInst, instData.SDST);
 ConstScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1607,7 +1607,7 @@
 void
 Inst_SOPK__S_CMPK_EQ_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1634,7 +1634,7 @@
 void
 Inst_SOPK__S_CMPK_LG_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1661,7 +1661,7 @@
 void
 Inst_SOPK__S_CMPK_GT_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1688,7 +1688,7 @@
 void
 Inst_SOPK__S_CMPK_GE_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1715,7 +1715,7 @@
 void
 Inst_SOPK__S_CMPK_LT_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1742,7 +1742,7 @@
 void
 Inst_SOPK__S_CMPK_LE_I32::execute(GPUDynInstPtr gpuDynInst)
 {
-ScalarRegI32 simm16 = (ScalarRegI32)instData.SIMM16;
+ScalarRegI32 simm16 = (ScalarRegI32)sext<16>(instData.SIMM16);
 ConstScalarOperandI32 src(gpuDynInst, instData.SDST);
 ScalarOperandU32 scc(gpuDynInst, REG_SCC);

@@ -1939,7 +1939,7 @@

 src.read();

-sdst = src.rawData() + (ScalarRegI32)simm16;
+sdst = src.rawData() + (ScalarRegI32)sext<16>(simm16);
 scc = (bits(src.rawData(), 31) == bits(simm16, 15)
 && bits(src.rawData(), 31) != bits(sdst.rawData(), 31)) ? 1 : 0;

@@ -1969,7 +1969,7 @@

 src.read();

-sdst = src.rawData() * (ScalarRegI32)simm16;
+sdst = src.rawData() * (ScalarRegI32)sext<16>(simm16);

 sdst.write();
 } // execute
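
In S_ADDK_I32 the SCC update is a signed-overflow test. Overflow is only possible when both operands share a sign, and it happened if the sign of the result differs from theirs. Below is a standalone sketch of that check using the sign-extended immediate. `addkI32` and `AddkResult` are hypothetical names. The 32-bit add is done in unsigned arithmetic so that it wraps like a hardware register without signed-overflow undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

struct AddkResult
{
    std::int32_t value;
    bool scc;  // true on signed overflow
};

AddkResult
addkI32(std::int32_t src, std::uint16_t simm16)
{
    // Sign-extend the 16-bit immediate before adding.
    const std::int32_t imm = static_cast<std::int16_t>(simm16);
    // Wrap-around add in unsigned arithmetic.
    const std::uint32_t raw = static_cast<std::uint32_t>(src) +
                              static_cast<std::uint32_t>(imm);
    const std::int32_t result = static_cast<std::int32_t>(raw);
    const bool srcNeg = src < 0;
    const bool immNeg = (simm16 >> 15) & 1;  // bits(simm16, 15)
    const bool resNeg = result < 0;
    return {result, srcNeg == immNeg && srcNeg != resNeg};
}
```
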

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65432?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I3af36bb9b60d32dc96cc3b439bb1167be1b0945d
Gerrit-Change-Number: 65432
Gerrit-PatchSet: 1
Gerrit-Owner: Matthew Poremba 
Gerrit-MessageType: newchange


[gem5-dev] [M] Change in gem5/gem5[develop]: gpu-compute: Chunkify AMDKernelCode read from device

2022-11-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65251?usp=email )


Change subject: gpu-compute: Chunkify AMDKernelCode read from device
..

gpu-compute: Chunkify AMDKernelCode read from device

The AMDKernelCode object can potentially span two pages. Currently
the copy loop from device memory only translates once, at the base
address.

This changeset translates one cache line at a time before copying and
has the ancillary benefit of cleaning up this code a bit.

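Whether an object crosses a page boundary depends only on the page numbers of its first and last bytes. If it does, a single translation at the base address yields the wrong physical address for the bytes on the second page. A minimal sketch of that check follows. `spansPages` is a hypothetical helper, and the 4 KiB page size and 256-byte object size used in the checks are illustrative assumptions, not values taken from this change.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// True when [addr, addr + size) touches more than one page.
// pageSize must be a power of two and size must be non-zero.
bool
spansPages(Addr addr, Addr size, Addr pageSize)
{
    assert(size > 0 && (pageSize & (pageSize - 1)) == 0);
    const Addr firstPage = addr / pageSize;
    const Addr lastPage = (addr + size - 1) / pageSize;
    return firstPage != lastPage;
}
```
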
Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65251
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
Reviewed-by: Jason Lowe-Power 
Maintainer: Jason Lowe-Power 
---
M src/gpu-compute/gpu_command_processor.cc
1 file changed, 42 insertions(+), 9 deletions(-)

Approvals:
  Jason Lowe-Power: Looks good to me, but someone else must approve; Looks good to me, approved
  kokoro: Regressions pass
  Matt Sinclair: Looks good to me, approved




diff --git a/src/gpu-compute/gpu_command_processor.cc b/src/gpu-compute/gpu_command_processor.cc
index d46ace6..af59b78 100644
--- a/src/gpu-compute/gpu_command_processor.cc
+++ b/src/gpu-compute/gpu_command_processor.cc
@@ -118,6 +118,7 @@
 {
 static int dynamic_task_id = 0;
 _hsa_dispatch_packet_t *disp_pkt = (_hsa_dispatch_packet_t*)raw_pkt;
+assert(!(disp_pkt->kernel_object & (system()->cacheLineSize() - 1)));

 /**
  * we need to read a pointer in the application's address
@@ -150,6 +151,10 @@
 is_system_page);
 }

+DPRINTF(GPUCommandProc, "kernobj vaddr %#lx paddr %#lx size %d s:%d\n",
+disp_pkt->kernel_object, phys_addr, sizeof(AMDKernelCode),
+is_system_page);
+
 /**
  * The kernel_object is a pointer to the machine code, whose entry
  * point is an 'amd_kernel_code_t' type, which is included in the
@@ -167,20 +172,27 @@
 } else {
 assert(FullSystem);
 DPRINTF(GPUCommandProc, "kernel_object in device, using device mem\n");
-// Read from GPU memory manager
-uint8_t raw_akc[sizeof(AMDKernelCode)];
-for (int i = 0; i < sizeof(AMDKernelCode) / sizeof(uint8_t); ++i) {
-Addr mmhubAddr = phys_addr + i*sizeof(uint8_t);
+
+// Read from GPU memory manager one cache line at a time to prevent
+// rare cases where the AKC spans two memory pages.
+ChunkGenerator gen(disp_pkt->kernel_object, sizeof(AMDKernelCode),
+   system()->cacheLineSize());
+for (; !gen.done(); gen.next()) {
+Addr chunk_addr = gen.addr();
+int vmid = 1;
+unsigned dummy;
+walker->startFunctional(gpuDevice->getVM().getPageTableBase(vmid),
+chunk_addr, dummy, BaseMMU::Mode::Read,
+is_system_page);
+
 Request::Flags flags = Request::PHYSICAL;
-RequestPtr request = std::make_shared<Request>(
-mmhubAddr, sizeof(uint8_t), flags, walker->getDevRequestor());
+RequestPtr request = std::make_shared<Request>(chunk_addr,
+system()->cacheLineSize(), flags, walker->getDevRequestor());
 Packet *readPkt = new Packet(request, MemCmd::ReadReq);
-readPkt->allocate();
+readPkt->dataStatic((uint8_t *)&akc + gen.complete());
 system()->getDeviceMemory(readPkt)->access(readPkt);
-raw_akc[i] = readPkt->getLE<uint8_t>();
 delete readPkt;
 }
-memcpy(&akc, &raw_akc, sizeof(AMDKernelCode));
 }

 DPRINTF(GPUCommandProc, "GPU machine code is %lli bytes from start of the "

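The new loop uses ChunkGenerator to walk the AMDKernelCode region one cache line at a time. Each chunk is translated separately and copied to offset gen.complete() in the destination. The added assert requires kernel_object to be cache-line aligned, and a page holds a whole number of cache lines, so no aligned chunk can straddle a page. The sketch below is a simplified standalone imitation of that iteration pattern. `LineChunks` and `copyByChunks` are hypothetical names, not gem5's ChunkGenerator. The `translate` callback stands in for the page-table walk, and a byte vector stands in for device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

using Addr = std::uint64_t;

// Splits [start, start + total) at chunk-size boundaries.
class LineChunks
{
  public:
    LineChunks(Addr start, Addr total, Addr chunk)
        : startAddr(start), cur(start), endAddr(start + total),
          chunkSize(chunk)
    {
        assert(chunk > 0);
    }

    Addr addr() const { return cur; }

    // Bytes in the current chunk: up to the next boundary or the end.
    Addr
    size() const
    {
        const Addr boundary = (cur / chunkSize + 1) * chunkSize;
        return std::min(boundary, endAddr) - cur;
    }

    // Bytes already covered by earlier chunks.
    Addr complete() const { return cur - startAddr; }
    bool done() const { return cur >= endAddr; }
    void next() { cur += size(); }

  private:
    Addr startAddr;
    Addr cur;
    Addr endAddr;
    Addr chunkSize;
};

// Copy `total` bytes at virtual address `vaddr` into `dst`, translating
// each chunk on its own instead of only the base address.
void
copyByChunks(Addr vaddr, Addr total, Addr lineSize,
             const std::vector<std::uint8_t> &physMem, std::uint8_t *dst,
             const std::function<Addr(Addr)> &translate)
{
    for (LineChunks gen(vaddr, total, lineSize); !gen.done(); gen.next()) {
        const Addr paddr = translate(gen.addr());
        assert(paddr + gen.size() <= physMem.size());
        std::memcpy(dst + gen.complete(), physMem.data() + paddr,
                    gen.size());
    }
}
```
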
--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65251?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I602bc12d8f8c5d3a3e57ab3f42f7dd3df58dc144
Gerrit-Change-Number: 65251
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Alexandru Duțu (Alex) 
Gerrit-Reviewer: Jason Lowe-Power 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged


[gem5-dev] [S] Change in gem5/gem5[develop]: gpu-compute: Add granulated SGPR computation for gfx9

2022-11-08 Thread Matthew Poremba (Gerrit) via gem5-dev
Matthew Poremba has submitted this change. (  
https://gem5-review.googlesource.com/c/public/gem5/+/65252?usp=email )


 (1 is the latest approved patch-set. No files were changed between the
latest approved patch-set and the submitted one.)

Change subject: gpu-compute: Add granulated SGPR computation for gfx9
..

gpu-compute: Add granulated SGPR computation for gfx9

The granulated SGPR size is used when the number of SGPRs is unknown.
The computation for this has changed since gfx8 and was marked as a
TODO in the code.

This changeset implements the change and also checks for an invalid SGPR
count. According to LLVM code this could happen "due to a compiler bug
or when using inline asm.":
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L723

Change-Id: Ie487a53940b323a0002341075e0f81af4147a7d8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/65252
Maintainer: Matt Sinclair 
Reviewed-by: Matt Sinclair 
Tested-by: kokoro 
---
M src/gpu-compute/hsa_queue_entry.hh
1 file changed, 39 insertions(+), 3 deletions(-)

Approvals:
  Matt Sinclair: Looks good to me, approved; Looks good to me, approved
  kokoro: Regressions pass




diff --git a/src/gpu-compute/hsa_queue_entry.hh b/src/gpu-compute/hsa_queue_entry.hh
index 4261f2c..fbe0efe 100644
--- a/src/gpu-compute/hsa_queue_entry.hh
+++ b/src/gpu-compute/hsa_queue_entry.hh
@@ -96,9 +96,22 @@
 if (!numVgprs)
 numVgprs = (akc->granulated_workitem_vgpr_count + 1) * 4;

-// TODO: Granularity changes for GFX9!
-if (!numSgprs)
-numSgprs = (akc->granulated_wavefront_sgpr_count + 1) * 8;
+if (!numSgprs || numSgprs ==
+std::numeric_limits<decltype(akc->wavefront_sgpr_count)>::max()) {
+// Supported major generation numbers: 0 (BLIT kernels), 8, and 9
+uint16_t version = akc->amd_machine_version_major;
+assert((version == 0) || (version == 8) || (version == 9));
+// SGPR allocation granularities:
+// - GFX8: 8
+// - GFX9: 16
+// Source: https://llvm.org/docs/AMDGPUUsage.html
+if ((version == 0) || (version == 8)) {
+// We assume that BLIT kernels use the same granularity as GFX8
+numSgprs = (akc->granulated_wavefront_sgpr_count + 1) * 8;
+} else if (version == 9) {
+numSgprs = ((akc->granulated_wavefront_sgpr_count + 1) * 16)/2;
+}
+}

 initialVgprState.reset();
 initialSgprState.reset();
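
The same fallback can be written as a standalone function. It treats both 0 and the maximum value of wavefront_sgpr_count as "unknown" and picks the allocation granularity from the major machine version, with the arithmetic mirroring the diff. `sgprCount` is a hypothetical name. The sketch assumes that numSgprs was first loaded from wavefront_sgpr_count, which the visible hunk does not show, and that the field is 16 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of SGPRs to allocate, following the fallback in the diff.
int
sgprCount(std::uint16_t wavefrontSgprCount, int granulatedSgprCount,
          std::uint16_t machineVersionMajor)
{
    const bool unknown = wavefrontSgprCount == 0 ||
        wavefrontSgprCount == std::numeric_limits<std::uint16_t>::max();
    if (!unknown)
        return wavefrontSgprCount;

    // Supported major versions: 0 (BLIT kernels), 8, and 9.
    assert(machineVersionMajor == 0 || machineVersionMajor == 8 ||
           machineVersionMajor == 9);
    if (machineVersionMajor == 9)
        return ((granulatedSgprCount + 1) * 16) / 2;  // GFX9 formula
    return (granulatedSgprCount + 1) * 8;             // GFX8 and BLIT
}
```
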

--
To view, visit  
https://gem5-review.googlesource.com/c/public/gem5/+/65252?usp=email


Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Ie487a53940b323a0002341075e0f81af4147a7d8
Gerrit-Change-Number: 65252
Gerrit-PatchSet: 3
Gerrit-Owner: Matthew Poremba 
Gerrit-Reviewer: Alexandru Duțu (Alex) 
Gerrit-Reviewer: Kyle Roarty 
Gerrit-Reviewer: Matt Sinclair 
Gerrit-Reviewer: Matthew Poremba 
Gerrit-Reviewer: kokoro 
Gerrit-MessageType: merged

